
VizierDB

Deploy a Containerized Vizier Instance

Vizier is a cloud-enabled tool that makes it easy to explore, validate, transform and debug data.

Components

Vizier has a number of components that are not trivial to set up manually:

  • [Web UI] - React user interface!
  • [API Server] - Python WSGI API server
  • [MimirDB] - virtual probabilistic database
  • [Proxy] - a reverse proxy that provides an endpoint for vizier services
  • [Apache Spark] - distributed data processing
  • [Hadoop] - distributed data processing
  • [S3] - optional data staging endpoint
  • [Analytics] - optional vizier ui access tracking

Though installation instructions for each of these components are available, it is time-consuming and difficult to install them manually. So is there an easier, containerized deployment? Yes! Deployment to a Kubernetes cluster is explained below.

Deploy The Vizier Stack to Kubernetes

If you already have a Kubernetes cluster set up, make sure CoreDNS is enabled (we are using k8s v1.13.2). If not, you can get a single-node cluster set up quickly using microk8s. See the microk8s docs for more details, but basically you can just do the following:

sudo snap install microk8s --classic
microk8s.enable dns dashboard

Once your cluster is ready, get the yaml file for deploying Vizier and make the following adjustments:

  • update the host paths for the persistent volumes if you would like them somewhere other than /mnt/ (YAML Line 15, 28, 41, 310)
  • update the s3-credentials secret with your S3 access key id and secret (YAML Line 330, 331). Base64 encode them first, using echo -n so that a trailing newline is not encoded into the value:
    echo -n "YOUR-S3-ACCESS-KEY-ID" | base64
    echo -n "YOUR-S3-ACCESS-KEY-SECRET" | base64
    
  • update the VIZIER_DOMAIN env variable for the vizier-proxy deployment to the domain you will use to access Vizier. You can use a real domain and DNS entries or the hosts file of a client. (YAML Line 622)
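
The encoding step for the S3 credentials is sensitive to trailing newlines: the base64 of "abc" is YWJj, while the base64 of "abc" followed by a newline is YWJjCg==, and the second form would put the newline into the decoded secret. A minimal sketch of a helper that avoids this, assuming a POSIX shell and a `base64` command on the PATH (the function name `encode_secret` is made up for illustration and is not part of Vizier):

```shell
# Hypothetical helper: base64-encode a value for a Kubernetes Secret
# without encoding a trailing newline.
encode_secret() {
  # printf '%s' writes the argument exactly as given, with no newline.
  # tr removes the line wrapping that some base64 implementations
  # insert into long output.
  printf '%s' "$1" | base64 | tr -d '\n'
}
```

For example, `encode_secret "YOUR-S3-ACCESS-KEY-ID"` prints a value that can be pasted into the s3-credentials secret.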

Deploy vizier

kubectl create -f vizier-deployment.yaml

You may need to set the default policy of the iptables FORWARD chain to ACCEPT to allow containers to access the internet:

sudo iptables -P FORWARD ACCEPT

Find the ClusterIP or ExternalIP of the vizier-proxy service

kubectl get service vizier-proxy 
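
If only the IP is needed, for example to reuse it in a script, kubectl's jsonpath output can extract it. The following is a sketch rather than part of the Vizier tooling; the function name `proxy_ip` is an assumption, and it reads the ClusterIP only (for a LoadBalancer service the external address would be found under `.status.loadBalancer.ingress` instead):

```shell
# Hypothetical helper: print the ClusterIP of a service.
# Defaults to the vizier-proxy service when no name is given.
proxy_ip() {
  kubectl get service "${1:-vizier-proxy}" -o jsonpath='{.spec.clusterIP}'
}
```

Calling `proxy_ip` with no arguments prints the ClusterIP of vizier-proxy in the current namespace.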

After you have the IP of the vizier-proxy service, add the following entries to either DNS (for a real domain) or the hosts file of the client. For example, where VIZIER_DOMAIN=vizier.dev:

| IP Address | Host Name | Purpose |
|---|---|---|
| IP of vizier-proxy service | demo.vizier.dev | web ui for vizier |
| IP of vizier-proxy service | api.vizier.dev | web api for vizier |
| IP of vizier-proxy service | vizier.vizier.dev | supervisor ctl for api |
| IP of vizier-proxy service | mimir.vizier.dev | supervisor ctl for mimir |
| IP of vizier-proxy service | proxy.vizier.dev | supervisor ctl for proxy |
| IP of vizier-proxy service | analytics.vizier.dev | endpoint for access analytics and ui |
| IP of vizier-proxy service | spark.vizier.dev | web ui for spark master |
| IP of vizier-proxy service | driver.vizier.dev | web ui for spark driver |
| IP of vizier-proxy service | hdfs.vizier.dev | web ui for hadoop |
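
When using a client hosts file rather than DNS, the nine entries can be generated instead of typed by hand. A minimal sketch, assuming a POSIX shell; the function name `hosts_entries` and the IP in the usage note are placeholders, not values from the Vizier deployment:

```shell
# Hypothetical helper: print one hosts-file line per Vizier subdomain.
# $1 = IP of the vizier-proxy service, $2 = value of VIZIER_DOMAIN.
hosts_entries() {
  ip="$1"
  domain="$2"
  for sub in demo api vizier mimir proxy analytics spark driver hdfs; do
    # hosts-file format: address, whitespace, host name
    printf '%s\t%s.%s\n' "$ip" "$sub" "$domain"
  done
}
```

For example, `hosts_entries 10.152.183.10 vizier.dev | sudo tee -a /etc/hosts` would append the entries on a Linux or macOS client; replace the IP with the one reported for your vizier-proxy service.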

Now you should be able to access the Vizier UI from a web browser.

https://demo.<VIZIER_DOMAIN>/vizier-db
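
The UI endpoint can also be checked from the command line before any DNS or hosts entries exist, since curl's `--resolve` option maps a host name to an IP for a single request. This is a sketch under assumptions: the function names are made up for illustration, and `--insecure` is included on the guess that the proxy may present a certificate the client does not trust; drop it if your certificate is valid.

```shell
# Hypothetical helper: build the Vizier UI URL for a given VIZIER_DOMAIN.
vizier_url() {
  printf 'https://demo.%s/vizier-db\n' "$1"
}

# Hypothetical helper: print the HTTP status code returned by the UI.
# $1 = value of VIZIER_DOMAIN, $2 = IP of the vizier-proxy service.
check_vizier() {
  domain="$1"
  ip="$2"
  # --resolve pins demo.<domain>:443 to the given IP for this request only.
  curl --silent --insecure --output /dev/null \
    --write-out '%{http_code}\n' \
    --resolve "demo.${domain}:443:${ip}" \
    "$(vizier_url "$domain")"
}
```

For example, `check_vizier vizier.dev 10.152.183.10` (placeholder IP) prints the status code of the request.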

License

Apache License 2.0