Michael Brachmann 328a0b1447 python cell execution sandboxing docker image fixes for spark image hadoop version mismatch 2020-04-10 09:45:37 -04:00
alpine-openjdk8 move to openjdk, vizier-auth images 2020-02-24 14:49:33 -05:00
analytics-nginx Rebuild repo and make it internal on gitlab 2018-11-26 13:32:49 -05:00
api Update readme for docker deployment and improve deploy scripts 2019-03-06 14:27:43 -05:00
api-async move to openjdk, vizier-auth images 2020-02-24 14:49:33 -05:00
kubernetes proxy acme-client -> certbot 2019-09-18 09:04:03 -04:00
mimir move to openjdk, vizier-auth images 2020-02-24 14:49:33 -05:00
python-executor python cell execution sandboxing docker image fixes for spark image hadoop version mismatch 2020-04-10 09:45:37 -04:00
s3-endpoint Rebuild repo and make it internal on gitlab 2018-11-26 13:32:49 -05:00
spark-docker python cell execution sandboxing docker image fixes for spark image hadoop version mismatch 2020-04-10 09:45:37 -04:00
twilio-video add async api stuff 2019-04-23 13:35:31 -04:00
ui-nginx move to openjdk, vizier-auth images 2020-02-24 14:49:33 -05:00
vizier-auth python cell execution sandboxing docker image fixes for spark image hadoop version mismatch 2020-04-10 09:45:37 -04:00
vizier-nginx-proxy move to openjdk, vizier-auth images 2020-02-24 14:49:33 -05:00
.DS_Store Rebuild repo and make it internal on gitlab 2018-11-26 13:32:49 -05:00
Readme.md Update readme for docker deployment and improve deploy scripts 2019-03-06 14:27:43 -05:00
build-images-async.sh all images to master branch of vizier/mimir repos 2019-09-18 09:07:37 -04:00
build-images-auth.sh python cell execution sandboxing docker image fixes for spark image hadoop version mismatch 2020-04-10 09:45:37 -04:00
build-images.sh fixes for non async. add async build and run scripts. 2019-05-08 14:46:06 -04:00
docker-compose.yml add docker compose file. add bokeh support to api 2019-06-13 11:26:16 -04:00
docs-run-containers.sh add docker compose file. add bokeh support to api 2019-06-13 11:26:16 -04:00
remove-containers.sh python cell execution sandboxing docker image fixes for spark image hadoop version mismatch 2020-04-10 09:45:37 -04:00
remove-images-async.sh use same proxy image for both old and async vizier versions 2019-05-09 11:58:16 -04:00
remove-images.sh use same proxy image for both old and async vizier versions 2019-05-09 11:58:16 -04:00
remove-volumes.sh bug fixes and updates 2019-09-18 11:11:50 -04:00
reset_microk8s.sh Kubernetes deployment is working. Likely straight docker is broken. 2019-01-30 23:29:46 -05:00
run-containers-async.sh move to openjdk, vizier-auth images 2020-02-24 14:49:33 -05:00
run-containers-auth.sh python cell execution sandboxing docker image fixes for spark image hadoop version mismatch 2020-04-10 09:45:37 -04:00
run-containers.sh disable api basic auth by default. expose mimir-api docs. 2019-05-14 14:54:48 -04:00
run_containers_auth_norn.sh python cell execution sandboxing docker image fixes for spark image hadoop version mismatch 2020-04-10 09:45:37 -04:00
run_containers_norn.sh move to openjdk, vizier-auth images 2020-02-24 14:49:33 -05:00
start-containers.sh move to openjdk, vizier-auth images 2020-02-24 14:49:33 -05:00
stop-containers.sh move to openjdk, vizier-auth images 2020-02-24 14:49:33 -05:00

Readme.md

VizierDB

Deploy a Containerized Vizier Instance

Vizier is a cloud-enabled tool that makes it easy to explore, validate, transform and debug data.

Components

Vizier has a number of components that are not trivial to set up manually:

  • [Web UI] - React user interface!
  • [API Server] - Python WSGI API server
  • [MimirDB] - virtual probabilistic database
  • [Proxy] - a reverse proxy that provides an endpoint for vizier services
  • [Apache Spark] - distributed data processing
  • [Hadoop] - distributed data processing
  • [S3] - optional data staging endpoint
  • [Analytics] - optional vizier ui access tracking

Though installation instructions for each of these components are available, installing them manually is time-consuming and difficult. Is there an easier, containerized deployment? Yes! Deployments to a Kubernetes cluster and to Docker are explained below.

Deploy The Vizier Stack to docker

If you already have Docker installed, good. If not, you can get it installed pretty quickly. See the Docker documentation for more details, but on Ubuntu you can basically do the following:

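# Trust Docker's package signing key so apt accepts the repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# Add the Docker repository for this Ubuntu release
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
# Refresh the package index and install Docker
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io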

Once your Docker installation is ready, get the bash script for deploying Vizier and make the following adjustments:

  • update the VIZIER_DOMAIN variable for the vizier-proxy deployment to the domain you will use to access Vizier. You can use a real domain and DNS entries or the hosts file of a client. (run-vizier-containers.sh: Line 26)
  • update the name or host paths for the volumes if you would like them somewhere other than the default (run-vizier-containers.sh: Line 31)
  • update the s3-credentials and bucket name with your S3 access key id, secret, and bucket name: (run-vizier-containers.sh: Line 28, 29, 30)

Deploy vizier

./run-vizier-containers.sh

The IP address of the vizier-proxy service for a local docker deployment will likely be 127.0.0.1

Deploy The Vizier Stack to Kubernetes

If you already have a Kubernetes cluster set up, good; just make sure CoreDNS is enabled (we are using k8s v1.13.2). If not, you can get a single-node cluster set up pretty quickly using microk8s. See the microk8s docs for more details, but basically you can do the following:

sudo snap install microk8s --classic
microk8s.enable dns dashboard

Once your cluster is ready, get the yaml file for deploying Vizier and make the following adjustments:

  • update the host paths for the persistent volumes if you would like them somewhere other than /mnt/ (YAML Line 15, 28, 41, 310)
  • update the s3-credentials secret with your S3 access key id and secret - base64 encode them first, without a trailing newline: (YAML Line 330, 331)
    printf '%s' "YOUR-S3-ACCESS-KEY-ID" | base64
    printf '%s' "YOUR-S3-ACCESS-KEY-SECRET" | base64
    
  • update the VIZIER_DOMAIN env variable for the vizier-proxy deployment to the domain you will use to access Vizier. You can use a real domain and DNS entries or the hosts file of a client. (YAML Line 622)
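When encoding the S3 credentials, a trailing newline in the input becomes part of the stored secret. The following is a minimal sketch of encoding a value exactly and decoding it again to check the result. The helper names encode_secret and decode_secret are hypothetical and are not part of Vizier.

```shell
# Hypothetical helpers for preparing values for the s3-credentials secret.

# Encode a value as base64 with no trailing newline in the input.
encode_secret() {
  # printf '%s' writes the value exactly as given.
  # tr removes the line wrapping that some base64 implementations add.
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Decode a base64 value so the round trip can be checked by eye.
decode_secret() {
  printf '%s' "$1" | base64 --decode
}
```

For example, encode_secret "YOUR-S3-ACCESS-KEY-ID" prints the value to paste into the YAML file, and passing that output to decode_secret should print the original key id.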

Deploy vizier

kubectl create -f vizier-deployment.yaml

You may need to run the following to allow containers to access the internet:

sudo iptables -P FORWARD ACCEPT

Find the ClusterIP or ExternalIP of the vizier-proxy service

kubectl get service vizier-proxy 
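If you want only the address rather than the full service listing, kubectl's jsonpath output can extract it. This is a sketch under the assumption that the service is named vizier-proxy and lives in the current namespace. The helper name vizier_proxy_ip is hypothetical.

```shell
# Hypothetical helper: print the IP to use for the vizier-proxy service.
vizier_proxy_ip() {
  # Prefer an external load balancer IP if the cluster assigned one.
  ip=$(kubectl get service vizier-proxy \
        -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  # Otherwise fall back to the in-cluster ClusterIP.
  if [ -z "$ip" ]; then
    ip=$(kubectl get service vizier-proxy -o jsonpath='{.spec.clusterIP}')
  fi
  printf '%s\n' "$ip"
}
```

Calling vizier_proxy_ip prints a single address that can be used for the DNS or hosts file entries.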

After Deployment

Once you have the IP of the vizier-proxy service, add the following entries to either DNS (for a real domain) or the hosts file of the client. For example, where VIZIER_DOMAIN=vizier.dev:

IP Address                 | Host Name            | Purpose
IP of vizier-proxy service | demo.vizier.dev      | web ui for vizier
IP of vizier-proxy service | api.vizier.dev       | web api for vizier
IP of vizier-proxy service | vizier.vizier.dev    | supervisor ctl for api
IP of vizier-proxy service | mimir.vizier.dev     | supervisor ctl for mimir
IP of vizier-proxy service | proxy.vizier.dev     | supervisor ctl for proxy
IP of vizier-proxy service | analytics.vizier.dev | endpoint for access analytics and ui
IP of vizier-proxy service | spark.vizier.dev     | web ui for spark master
IP of vizier-proxy service | driver.vizier.dev    | web ui for spark driver
IP of vizier-proxy service | hdfs.vizier.dev      | web ui for hadoop
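If you are using the hosts file of a client, the entries for the host names listed above can be generated rather than typed by hand. This is a small sketch. The helper name vizier_hosts_entries is hypothetical, and the subdomain list is copied from the host names above.

```shell
# Hypothetical helper: print one hosts-file line per Vizier subdomain.
# Usage: vizier_hosts_entries <proxy-ip> <vizier-domain>
vizier_hosts_entries() {
  proxy_ip=$1
  domain=$2
  for sub in demo api vizier mimir proxy analytics spark driver hdfs; do
    # Hosts-file format: IP address, whitespace, host name.
    printf '%s\t%s.%s\n' "$proxy_ip" "$sub" "$domain"
  done
}
```

For a local Docker deployment you could review the output of vizier_hosts_entries 127.0.0.1 vizier.dev and then append it to the hosts file, for example by piping it to sudo tee -a /etc/hosts.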

Now you should be able to access the Vizier UI from a web browser.

https://demo.<VIZIER_DOMAIN>/vizier-db
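Before opening a browser, you can check from the command line that the proxy answers at this URL. The sketch below assumes curl is installed. The helper names vizier_ui_url and vizier_ui_status are hypothetical, and the -k flag is an assumption for setups where the domain has no trusted certificate yet.

```shell
# Hypothetical helper: build the UI URL for a given VIZIER_DOMAIN.
vizier_ui_url() {
  printf 'https://demo.%s/vizier-db\n' "$1"
}

# Hypothetical helper: print the HTTP status code returned by the UI URL.
vizier_ui_status() {
  # -k skips certificate verification (assumption: test domain without a
  # trusted certificate); -s hides progress output; -o discards the body.
  curl -k -s -o /dev/null -w '%{http_code}' "$(vizier_ui_url "$1")"
}
```

Running vizier_ui_status vizier.dev prints the status code the proxy returned, which is a quick way to tell a DNS or hosts file problem apart from an application error.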

License

Apache License 2.0