Quickstart¶
On this page, you will find simplified deployment instructions and Helm values to get you started with the latest release of Pachyderm on the Kubernetes engine of your choice: AWS (EKS), Google (GKE), or Azure (AKS).
For each cloud provider, we will give you the option to "quick deploy" Pachyderm with or without Console (Pachyderm UI available with Enterprise).
Important
The deployment steps highlighted in this document are not intended for production. For production settings, please read our infrastructure recommendations. In particular, we recommend:
- the use of a managed PostgreSQL server (RDS, CloudSQL, or PostgreSQL Server) rather than Pachyderm's default bundled PostgreSQL.
- the setup of a TCP Load Balancer in front of your pachd service.
- the setup of an Ingress Controller in front of Console.
Then find your targeted cloud provider in the Deploy and Manage section of this documentation.
1. Prerequisites¶
Pachyderm is deployed on a Kubernetes Cluster.
Install the following clients on your machine before you start creating your cluster. Use the latest available version of the components listed below.
- kubectl: the CLI to interact with your cluster.
- pachctl: the CLI to interact with Pachyderm.
- Install Helm for your deployment.
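Before creating a cluster, it can help to confirm that the three clients are actually on your PATH. The snippet below is a minimal sketch of such a check; `missing_tools` is a hypothetical helper name, not part of any Pachyderm tooling, and it only tests that each command exists, not which version is installed.

```shell
# Print the name of every listed command that is not found on PATH.
# (missing_tools is a hypothetical helper, not a Pachyderm command.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools kubectl pachctl helm)
if [ -n "$missing" ]; then
  echo "Install these clients before continuing:"
  echo "$missing"
else
  echo "kubectl, pachctl, and helm are all on PATH."
fi
```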
Optional - Quick deployment of Pachyderm Enterprise (with Console)
- The deployment of Console (Pachyderm UI) requires a valid enterprise token. To get your free-trial token, fill in this form, get in touch with us at sales@pachyderm.io, or reach out on our Slack.
- When deploying with Console, we create a default mock user (username: admin, password: password) so that you can authenticate to Console without connecting an Identity Provider. The mock user is a Cluster Admin.
For a better understanding of the additional steps and Helm values needed when deploying with Console in a production environment, read our page about deploying Pachyderm with Console.
Select your favorite cloud provider.
2. Create Your Values.yaml¶
AWS¶
- Additional client installation: Install AWS CLI
- Create an S3 bucket for your data
- Create a values.yaml

Without Console:

    deployTarget: "AMAZON"
    pachd:
      storage:
        amazon:
          bucket: "bucket_name"
          # this is an example access key ID taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
          id: "AKIAIOSFODNN7EXAMPLE"
          # this is an example secret access key taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
          secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
          region: "us-east-2"
      externalService:
        enabled: true

With Console:

    deployTarget: "AMAZON"
    pachd:
      storage:
        amazon:
          bucket: "bucket_name"
          # this is an example access key ID taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
          id: "AKIAIOSFODNN7EXAMPLE"
          # this is an example secret access key taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
          secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
          region: "us-east-2"
      # pachyderm enterprise key
      enterpriseLicenseKey: "YOUR_ENTERPRISE_TOKEN"
    console:
      enabled: true

Jump to Helm install
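If you would rather not paste credentials into the file by hand, one option is to generate the AWS values file (the variant without Console) from environment variables. The sketch below assumes the standard AWS CLI variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are set; `write_aws_values` is a hypothetical helper, and the YAML nesting mirrors the keys listed above. The secret still ends up in plain text in the output file, so treat that file accordingly.

```shell
# Write a values.yaml for the AWS quick deploy without Console.
# Arguments: bucket name, region, output path.
# (write_aws_values is a hypothetical helper, not a Pachyderm command.)
write_aws_values() {
  bucket=$1
  region=$2
  out=$3
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first" >&2
    return 1
  fi
  cat > "$out" <<EOF
deployTarget: "AMAZON"
pachd:
  storage:
    amazon:
      bucket: "$bucket"
      id: "$AWS_ACCESS_KEY_ID"
      secret: "$AWS_SECRET_ACCESS_KEY"
      region: "$region"
  externalService:
    enabled: true
EOF
}

# Example call (bucket name and output path are placeholders):
#   write_aws_values my-bucket us-east-2 my_pachyderm_values.yaml
```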
Google¶
- Additional client installation: Install Google Cloud SDK
- Create a GKE cluster. Note: Add --scopes storage-rw to your gcloud container clusters create command.
- Create a GCS Bucket for your data
- Create a values.yaml

Without Console:

    deployTarget: "GOOGLE"
    pachd:
      storage:
        google:
          bucket: "bucket_name"
          cred: |
            INSERT JSON CONTENT HERE
      externalService:
        enabled: true

With Console:

    deployTarget: "GOOGLE"
    pachd:
      storage:
        google:
          bucket: "bucket_name"
          cred: |
            INSERT JSON CONTENT HERE
      # pachyderm enterprise key
      enterpriseLicenseKey: "YOUR_ENTERPRISE_TOKEN"
    console:
      enabled: true

Jump to Helm install
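The JSON pasted under `cred: |` has to be indented deeper than the `cred` key itself, or the YAML will not parse as a block scalar. As a rough sketch, the helper below indents whatever it reads on stdin by a given number of spaces; `indent_block` is a hypothetical name, and the figure of 8 spaces assumes `cred` sits under pachd > storage > google with two-space indentation.

```shell
# Indent every line read from stdin by the given number of spaces.
# (indent_block is a hypothetical helper, not a Pachyderm command.)
indent_block() {
  awk -v n="$1" 'BEGIN { pad = sprintf("%" n "s", "") } { print pad $0 }'
}

# Example call (key.json is a placeholder for your service account key file):
#   indent_block 8 < key.json
```

The indented output can then be pasted in place of INSERT JSON CONTENT HERE.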
Azure¶
Note
- This section assumes that you have an Azure Subscription.
- Additional client installation: Install Azure CLI 2.0.1 or later.
- Create a Storage Container for your data
- Create a values.yaml

Without Console:

    deployTarget: "MICROSOFT"
    pachd:
      storage:
        microsoft:
          # storage container name
          container: "blah"
          # storage account name
          id: "AKIAIOSFODNN7EXAMPLE"
          # storage account key
          secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      externalService:
        enabled: true

With Console:

    deployTarget: "MICROSOFT"
    pachd:
      storage:
        microsoft:
          # storage container name
          container: "blah"
          # storage account name
          id: "AKIAIOSFODNN7EXAMPLE"
          # storage account key
          secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      # pachyderm enterprise key
      enterpriseLicenseKey: "YOUR_ENTERPRISE_TOKEN"
    console:
      enabled: true

Jump to Helm install
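Whichever provider you picked, the sample values contain placeholders (bucket_name, the EXAMPLE keys, YOUR_ENTERPRISE_TOKEN, and so on) that are easy to leave in by accident. The following is a minimal sketch of a check for them; `find_placeholders` is a hypothetical helper, and the pattern list only covers the placeholder strings used on this page.

```shell
# Print each line (with its line number) of a values file that still contains
# one of the placeholder strings from this page. Exit status 0 means at least
# one placeholder was found. (find_placeholders is a hypothetical helper.)
find_placeholders() {
  grep -n -E 'bucket_name|EXAMPLE|YOUR_ENTERPRISE_TOKEN|INSERT JSON CONTENT HERE|"blah"' "$1"
}

# Example call (the file name is a placeholder for your own values file):
#   if find_placeholders my_pachyderm_values.yaml; then
#     echo "Replace the values above before installing."
#   fi
```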
3. Helm Install¶
- You will be deploying the latest GA release of Pachyderm. In the last command, replace my_pachyderm_values.yaml with the path to the values.yaml you created in the previous step:

    helm repo add pach https://helm.pachyderm.com
    helm repo update
    helm install pachyderm -f my_pachyderm_values.yaml pach/pachyderm

- Check your deployment:

    kubectl get pods

Once the pods are up, you should see a pod for pachd running (alongside etcd, pg-bouncer or postgres, and console, depending on your installation). If you are curious about the architecture of Pachyderm, take a look at our high-level diagram.

System Response:

    NAME                     READY   STATUS    RESTARTS   AGE
    etcd-0                   1/1     Running   0          18h
    pachd-5db79fb9dd-b2gdq   1/1     Running   2          18h
    postgres-0               1/1     Running   0          18h

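Rather than reading the pod table by eye, you can filter it for pods that are not ready yet. The sketch below assumes the default `kubectl get pods` column order (NAME, READY, STATUS first); `pods_not_running` is a hypothetical helper name.

```shell
# Read `kubectl get pods` output on stdin and print the name of each pod that
# is either not in the Running status or has fewer ready containers than
# expected (for example 0/1). The header row is skipped.
# (pods_not_running is a hypothetical helper, not a kubectl feature.)
pods_not_running() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Example call:
#   kubectl get pods | pods_not_running
```

An empty result means every listed pod is Running and fully ready.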
4. Have 'pachctl' And Your Cluster Communicate¶
If you deployed without Console:

- Retrieve the external IP address of the pachd service:

    kubectl get services | grep pachd-lb | awk '{print $4}'

- Then update your context for pachctl to point at your cluster:

    echo '{"pachd_address": "grpc://<external-IP-address>:30650"}' | pachctl config set context "<your-cluster-context-name>" --overwrite
    pachctl config set active-context "<your-cluster-context-name>"

If you deployed with Console:

- To connect to your new Pachyderm instance, run:

    pachctl config import-kube local --overwrite
    pachctl config set active-context local

- Then run pachctl port-forward (background this process in a new tab of your terminal).
- Note that you will need to run pachctl auth login, then authenticate to Pachyderm with the mock user (username: admin, password: password) to use pachctl.

- Finally, check that your cluster is up and running:

    pachctl version

System Response:

    COMPONENT           VERSION
    pachctl             2.2.0
    pachd               2.2.0

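For the external IP route, the address lookup and the grpc:// formatting can be combined into one step. The sketch below assumes the load balancer service is named pachd-lb (as in the grep above) and that EXTERNAL-IP is the fourth column of `kubectl get services`; `pachd_address` is a hypothetical helper. It fails instead of printing an address while the IP is still pending.

```shell
# Read `kubectl get services` output on stdin and print the pachd address
# built from the EXTERNAL-IP column of the first pachd-lb row.
# Returns 1 if the row is missing or no external IP has been assigned yet.
# (pachd_address is a hypothetical helper, not a pachctl command.)
pachd_address() {
  ip=$(awk '$1 == "pachd-lb" && !seen++ { print $4 }')
  case $ip in
    ''|'<pending>'|'<none>') return 1 ;;
  esac
  printf 'grpc://%s:30650\n' "$ip"
}

# Example call:
#   kubectl get services | pachd_address
```

The printed value can be used as the pachd_address field when setting your pachctl context.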
5. Connect to Console¶
To connect to your Console (Pachyderm UI):
- Point your browser to http://localhost:4000
- Authenticate as the mock user (username: admin, password: password)
You are all set!
6. Try our beginner tutorial.¶
7. Notebook Users: Install Pachyderm JupyterLab Mount Extension¶
Once your cluster is up and running, you can helm install JupyterHub on your Pachyderm cluster and experiment with your data in Pachyderm from your Notebook cells.
Check out our JupyterHub and Pachyderm Mount Extension page for installation instructions.
Use Pachyderm's default image and values.yaml (jupyterhub-ext-values.yaml), or follow the instructions to update your own.
Note
Make sure to check out our data science notebook examples running on Pachyderm, from a market sentiment NLP implementation using a FinBERT model to pipelines training a regression model on the Boston Housing Dataset.