docker/Readme.md
2019-02-13 12:26:52 -05:00

VizierDB

Deploy a Containerized Vizier Instance

Vizier is a cloud-enabled tool that makes it easy to explore, validate, transform and debug data.

Components

Vizier has a number of components that are not trivial to set up manually:

  • [Web UI] - React user interface!
  • [API Server] - Python WSGI API server
  • [MimirDB] - virtual probabilistic database
  • [Proxy] - a reverse proxy that provides an endpoint for Vizier services
  • [Apache Spark] - distributed data processing
  • [Hadoop] - distributed file storage (HDFS) for the backend
  • [S3] - optional data staging endpoint
  • [Analytics] - optional Vizier UI access tracking

Though installation instructions for each of these components are available, installing them manually is time-consuming and difficult. Is there an easier, containerized deployment? Yes! Deployment to a Kubernetes cluster is explained below.

Deploy The Vizier Stack to Kubernetes

If you already have a Kubernetes cluster set up, make sure CoreDNS is enabled (we are using k8s v1.13.2). If not, you can set up a single-node cluster quickly using microk8s. See the microk8s docs for more details, but the basic steps are:

sudo snap install microk8s --classic
microk8s.enable dns dashboard

Once your cluster is ready, get the yaml file for deploying Vizier and make the following adjustments:

  • update the host paths for the persistent volumes if you would like them somewhere other than /mnt/ (YAML Line 15, 28, 41, 310)
  • update the s3-credentials secret with your S3 access key id and secret (YAML Line 330, 331). Base64 encode them first, using echo -n so that no trailing newline is encoded:
    echo -n "YOUR-S3-ACCESS-KEY-ID" | base64
    echo -n "YOUR-S3-ACCESS-KEY-SECRET" | base64
    
  • update the VIZIER_DOMAIN env variable for the vizier-proxy deployment to the domain you will use to access Vizier. You can use a real domain and DNS entries or the hosts file of a client. (YAML Line 622)
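For the S3 credentials step above, one detail worth checking is that no trailing newline ends up inside the encoded value, since a Kubernetes secret stores the decoded bytes exactly as given. The following is a minimal sketch, not part of the Vizier tooling: the helper name encode_secret is hypothetical, and the credential strings are the same placeholders used above.

```shell
# Hypothetical helper: base64-encode a value for a Kubernetes Secret.
# printf '%s' writes the value without the trailing newline that a
# plain echo would append; tr strips any line wrapping base64 adds.
encode_secret() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Placeholder values; substitute your own credentials.
key_id_b64=$(encode_secret "YOUR-S3-ACCESS-KEY-ID")
secret_b64=$(encode_secret "YOUR-S3-ACCESS-KEY-SECRET")

echo "access key id: $key_id_b64"
echo "secret:        $secret_b64"

# Round-trip check: decoding must give back the exact input.
[ "$(printf '%s' "$key_id_b64" | base64 --decode)" = "YOUR-S3-ACCESS-KEY-ID" ] \
  && echo "round trip ok"
```

The two encoded strings are what would go into the s3-credentials secret in the yaml file.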

Deploy Vizier:

kubectl create -f vizier-deployment.yaml

You may need to run the following to allow containers to access the internet:

sudo iptables -P FORWARD ACCEPT

Find the ClusterIP or ExternalIP of the vizier-proxy service:

kubectl get service vizier-proxy 

Once you have the IP of the vizier-proxy service, add the following entries either to DNS (for a real domain) or to the hosts file of the client. For example, where VIZIER_DOMAIN=vizier.dev:

IP Address                 | Host Name            | Purpose
IP of vizier-proxy service | demo.vizier.dev      | web ui for vizier
IP of vizier-proxy service | api.vizier.dev       | web api for vizier
IP of vizier-proxy service | vizier.vizier.dev    | supervisor ctl for api
IP of vizier-proxy service | mimir.vizier.dev     | supervisor ctl for mimir
IP of vizier-proxy service | proxy.vizier.dev     | supervisor ctl for proxy
IP of vizier-proxy service | analytics.vizier.dev | endpoint for access analytics and ui
IP of vizier-proxy service | spark.vizier.dev     | web ui for spark master
IP of vizier-proxy service | driver.vizier.dev    | web ui for spark driver
IP of vizier-proxy service | hdfs.vizier.dev      | web ui for hadoop
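If you go the hosts-file route, typing nine entries by hand is error-prone. The sketch below prints one hosts-file line per host name listed above. The function name vizier_hosts_entries and the IP 10.152.183.10 are assumptions for illustration only; use the IP that your cluster reports for vizier-proxy.

```shell
# Hypothetical helper: print hosts-file lines for every Vizier endpoint.
#   $1: IP of the vizier-proxy service
#   $2: value of VIZIER_DOMAIN
vizier_hosts_entries() {
  ip=$1
  domain=$2
  for sub in demo api vizier mimir proxy analytics spark driver hdfs; do
    printf '%s\t%s.%s\n' "$ip" "$sub" "$domain"
  done
}

# The ClusterIP can also be read directly instead of copied by hand:
#   ip=$(kubectl get service vizier-proxy -o jsonpath='{.spec.clusterIP}')

# Example with a placeholder IP (an assumption, not a real address).
vizier_hosts_entries 10.152.183.10 vizier.dev
```

Review the output first, then append it to the hosts file of the client, for example by piping it to sudo tee -a /etc/hosts on Linux.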

Now you should be able to access the Vizier UI from a web browser at:

https://demo.<VIZIER_DOMAIN>/vizier-db

License

Apache License 2.0