Quickstart

Learn how to deploy the latest version of Pachyderm quickly with simplified instructions and pre-set Helm values.

March 24, 2023

On this page, you will find simplified deployment instructions and Helm values to get you started with the latest release of Pachyderm on the Kubernetes engine of your choice: AWS (EKS), Google (GKE), or Azure (AKS).

For each cloud provider, we will give you the option to “quick deploy” Pachyderm with or without an enterprise key. A quick deployment allows you to experiment with Pachyderm without having to go through any infrastructure setup. In particular, you do not need to set up any object store or PostgreSQL instance.

💡

The deployment steps highlighted in this document are not intended for production. For production settings, please read our infrastructure recommendations. In particular, we recommend:

  • the use of a managed PostgreSQL server (RDS, CloudSQL, or PostgreSQL Server) rather than Pachyderm’s default bundled PostgreSQL.
  • the setup of a TCP Load Balancer in front of your pachd service.
  • the setup of an Ingress Controller in front of Console.

Then find your target cloud provider in the Deploy and Manage section of this documentation.

⚠️

Pachyderm now ships with an optional embedded proxy that lets your cluster expose a single port externally.

If you choose to deploy Pachyderm with a Proxy, check out our new recommended architecture and deployment instructions.

Deploying with a proxy has several advantages:

  • You only need to set up one TCP Load Balancer (no more Ingress in front of Console).
  • You only need one DNS entry.
  • It simplifies the deployment of Console.
  • You no longer need to port-forward.
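
As a rough sketch of what enabling the proxy could look like, the snippet below writes a small Helm values fragment. The proxy.enabled and proxy.service.type keys are our assumption about the chart's layout, and the file name proxy-values.yaml is hypothetical; check both against the values.yaml of your chart version and the recommended architecture page before relying on them.

```shell
# Write a minimal values fragment that turns on the embedded proxy.
# NOTE: the key names below are assumptions; verify them for your chart version.
cat > proxy-values.yaml <<'EOF'
proxy:
  enabled: true
  service:
    # One TCP load balancer in front of the proxy
    type: LoadBalancer
EOF

# Print the fragment so you can review it before merging it into your values.yaml
cat proxy-values.yaml
```

You would then merge this fragment into the values.yaml you create in step 2.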

1. Prerequisites #

Pachyderm is deployed on a Kubernetes Cluster.

Install the following clients on your machine before you start creating your cluster. Use the latest available version of the components listed below.

⚠️

Get an Enterprise key

To get a free-trial token, fill in this form, email us at sales@pachyderm.io, or reach out on our Slack.

Select your favorite cloud provider.

💡

Note that we often use the acronym CE for Community Edition.

2. Create Your Values.yaml #

ℹ️

Pachyderm comes with a Web UI (Console) by default.

AWS #

  1. Additional client installation: Install AWS CLI

  2. Create an EKS cluster

  3. Create an S3 bucket for your data

  4. Create a values.yaml

Deploy Pachyderm CE (includes Console CE) #

 deployTarget: "AMAZON"
 pachd:
   storage:
     amazon:
       bucket: "bucket_name"      
       # this is an example access key ID taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
       id: "AKIAIOSFODNN7EXAMPLE"                
       # this is an example secret access key taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html  (AWS Credentials)          
       secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
       region: "us-east-2"
   externalService:
     enabled: true
 console:
   enabled: true

Deploy Enterprise with Console #

Note that when deploying Enterprise with Console, we create a default mock user (username: admin, password: password) that you can use to authenticate to Console, so you don’t have to connect an Identity Provider to make things work. The mock user is a Cluster Admin by default.

 deployTarget: "AMAZON"
 pachd:
   storage:
     amazon:
       bucket: "bucket_name"                
       # this is an example access key ID taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
       id: "AKIAIOSFODNN7EXAMPLE"                
       # this is an example secret access key taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html  (AWS Credentials)          
       secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
       region: "us-east-2"
   # Enterprise key 
   enterpriseLicenseKey: "YOUR_ENTERPRISE_TOKEN"
 console:
   enabled: true

Jump to Helm install

Google #

  1. Additional client installation: Install Google Cloud SDK

  2. Create a GKE cluster. Note: Add --scopes storage-rw to your gcloud container clusters create command.

  3. Create a GCS Bucket for your data

  4. Create a values.yaml
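
Steps 2 and 3 above can be sketched as follows. This is a minimal example, not a complete setup: the cluster and bucket names are hypothetical, and we assume your gcloud project and default zone or region are already configured. Only the --scopes storage-rw flag comes from the note above.

```shell
# Hypothetical names; replace them with your own.
CLUSTER_NAME="pachyderm-quickstart"
BUCKET_NAME="bucket_name"

if command -v gcloud >/dev/null 2>&1 && command -v gsutil >/dev/null 2>&1; then
  # Step 2: create the cluster with read/write access to Cloud Storage
  gcloud container clusters create "${CLUSTER_NAME}" --scopes storage-rw &&
    # Step 3: create the bucket that will hold your data
    gsutil mb "gs://${BUCKET_NAME}" ||
    echo "Cluster or bucket creation failed; check the messages above" >&2
else
  echo "gcloud/gsutil not found; install the Google Cloud SDK first" >&2
fi
```

The bucket name you choose here is the one to use for the bucket field of your values.yaml.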

Deploy Pachyderm CE (includes Console CE) #

 deployTarget: "GOOGLE"
 pachd:
   storage:
     google:
       bucket: "bucket_name"
       cred: |
                  INSERT JSON CONTENT HERE
   externalService:
     enabled: true
 console:
   enabled: true

Deploy Enterprise with Console #

Note that when deploying Enterprise with Console, we create a default mock user (username: admin, password: password) that you can use to authenticate to Console, so you don’t have to connect an Identity Provider to make things work. The mock user is a Cluster Admin by default.

 deployTarget: "GOOGLE"
 pachd:
   storage:
     google:
       bucket: "bucket_name"
       cred: |
                  INSERT JSON CONTENT HERE
   # Enterprise key
   enterpriseLicenseKey: "YOUR_ENTERPRISE_TOKEN"
 console:
   enabled: true

Jump to Helm install

Azure #

ℹ️
  1. Additional client installation: Install Azure CLI 2.0.1 or later.

  2. Create an AKS cluster

  3. Create a Storage Container for your data

  4. Create a values.yaml

Deploy Pachyderm CE (includes Console CE) #

 deployTarget: "MICROSOFT"
 pachd:
   storage:
     microsoft:
       # storage container name
       container: "container_name"
       # storage account name
       id: "storage_account_name"
       # storage account key
       secret: "storage_account_key"
   externalService:
     enabled: true
 console:
   enabled: true

Deploy Enterprise with Console #

Note that when deploying Enterprise with Console, we create a default mock user (username: admin, password: password) that you can use to authenticate to Console, so you don’t have to connect an Identity Provider to make things work. The mock user is a Cluster Admin by default.

 deployTarget: "MICROSOFT"
 pachd:
   storage:
     microsoft:
       # storage container name
       container: "container_name"
       # storage account name
       id: "storage_account_name"
       # storage account key
       secret: "storage_account_key"
   # Enterprise key
   enterpriseLicenseKey: "YOUR_ENTERPRISE_TOKEN"
 console:
   enabled: true

Jump to Helm install

3. Helm Install #
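
A minimal sketch of the install, assuming the values file from step 2 is saved as values.yaml in the current directory. The release name is hypothetical, and the repository URL and chart name are what we believe the public Pachyderm chart uses; treat them as assumptions and confirm them for your release.

```shell
# Hypothetical release name and values file; adjust to your setup.
RELEASE_NAME="pachyderm"
VALUES_FILE="values.yaml"

if command -v helm >/dev/null 2>&1; then
  # Register the chart repository (assumed URL), refresh the index, then install
  helm repo add pachyderm https://helm.pachyderm.com &&
    helm repo update &&
    helm install "${RELEASE_NAME}" pachyderm/pachyderm -f "${VALUES_FILE}" ||
    echo "Helm install failed; check the messages above" >&2
else
  echo "helm not found; see the prerequisites in step 1" >&2
fi

# List the pods to watch them come up (requires kubectl pointed at your cluster)
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods || echo "Cluster not reachable yet" >&2
fi
```

Wait until the pods report that they are running before moving on to the next step.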

4. Have ‘pachctl’ And Your Cluster Communicate #

You have deployed Pachyderm without Console #
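
One way to point pachctl at your cluster is to register a context that uses the external address of your pachd service. The sketch below makes several assumptions: the IP is a placeholder (you can look up the real one with kubectl get services), port 30650 is assumed to be the external gRPC port, and the context name is hypothetical.

```shell
# Placeholder address; replace it with the external IP of your pachd service.
PACHD_IP="203.0.113.10"
# Assumed gRPC port of the external pachd service; check your chart values.
PACHD_PORT="30650"
# Hypothetical context name
CONTEXT_NAME="pachyderm"

# Build the context definition that pachctl reads from standard input
CONTEXT_JSON="{\"pachd_address\":\"grpc://${PACHD_IP}:${PACHD_PORT}\"}"
echo "${CONTEXT_JSON}"

if command -v pachctl >/dev/null 2>&1; then
  # Create or overwrite the context, then make it the active one
  echo "${CONTEXT_JSON}" | pachctl config set context "${CONTEXT_NAME}" --overwrite &&
    pachctl config set active-context "${CONTEXT_NAME}" ||
    echo "Could not update the pachctl context" >&2
else
  echo "pachctl not found; see the prerequisites in step 1" >&2
fi
```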

You have deployed Pachyderm with Console #

Check that your cluster is up and running #

pachctl version

System Response:

COMPONENT           VERSION
pachctl             2.5.2
pachd               2.5.2

5. Connect to Console #

To connect to your Console (Pachyderm UI):
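
If you deployed without the proxy, one option is to forward the cluster's ports to your machine and open Console locally. This is a sketch under assumptions: the local port 4000 is our assumption of where Console is forwarded, so adjust the URL if your setup differs.

```shell
# Assumed local address that Console is forwarded to; adjust if yours differs.
CONSOLE_URL="http://localhost:4000"

if command -v pachctl >/dev/null 2>&1; then
  # Forward the cluster's ports to this machine in the background
  pachctl port-forward &
  PORT_FORWARD_PID=$!
  echo "Port-forward running (PID ${PORT_FORWARD_PID}); stop it with: kill ${PORT_FORWARD_PID}"
else
  echo "pachctl not found; see the prerequisites in step 1" >&2
fi

echo "Open ${CONSOLE_URL} in your browser"
```

If you deployed Enterprise, log in with the mock user described in step 2.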

You are all set!

6. Try Our Beginner Tutorial #

7. Notebook Users: Install Pachyderm JupyterLab Mount Extension #

Once your cluster is up and running, you can install JupyterHub on your Pachyderm cluster with Helm and experiment with your data in Pachyderm from your Notebook cells.

Check out our JupyterHub and Pachyderm Mount Extension page for installation instructions.

Use Pachyderm’s default image and values file (jupyterhub-ext-values.yaml), or follow the instructions to update your own.

ℹ️

Make sure to check out our data science notebook examples running on Pachyderm, from a market sentiment NLP implementation using a FinBERT model to pipelines training a regression model on the Boston Housing Dataset.