
Quickstart

On this page, you will find simplified deployment instructions and Helm values to get started with the latest release of Pachyderm on the managed Kubernetes service of your choice: Amazon EKS, Google GKE, or Azure AKS.

For each cloud provider, we will give you the option to "quick deploy" Pachyderm with or without Console (Pachyderm UI).

Important

The deployment steps highlighted in this document are not intended for production. For production settings, please read our infrastructure recommendations. In particular, we recommend:

  • the use of a managed PostgreSQL server (RDS, CloudSQL, or PostgreSQL Server) rather than Pachyderm's default bundled PostgreSQL.
  • the setup of a TCP Load Balancer in front of your pachd service.
  • the setup of an Ingress Controller in front of Console.

Then, find your target cloud provider in the Deploy and Manage section of this documentation.

1. Prerequisites

Pachyderm is deployed on a Kubernetes cluster.

Before you create your cluster, install the following clients on your machine. Use the latest available version of each.

  • kubectl: the CLI to interact with your cluster.
  • pachctl: the CLI to interact with Pachyderm.
  • Helm: the package manager used to install Pachyderm on your cluster.
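Before creating the cluster, you can confirm that all three clients are on your PATH. This is a minimal sketch. The function name check_clients is hypothetical, not part of any of these tools, and it relies only on the shell's built-in command -v.

```shell
# Hypothetical helper: report which of the given command-line tools are
# missing from PATH. Returns 0 when every tool is found, 1 otherwise.
check_clients() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (uncomment to check the three clients listed above):
# check_clients kubectl pachctl helm
```

Once they are installed, kubectl version --client, helm version, and pachctl version --client-only print each client's version without contacting a cluster.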

Optional - Quick deployment of Pachyderm with Console

  • The deployment of Console (Pachyderm UI) requires a valid enterprise token. To get your free-trial token, fill in this form, or get in touch with us at sales@pachyderm.io or on our Slack.
  • When deploying with Console, we create a default mock user (username: admin, password: password) so that you can authenticate to Console without connecting your Identity Provider.

For the additional steps and Helm values needed to deploy with Console in a production environment, read the Deploy Pachyderm with Console page.

Select your favorite cloud provider.

2. Create Your values.yaml

AWS

  1. Additional client installation: Install AWS CLI

  2. Create an EKS cluster

  3. Create an S3 bucket for your data

  4. Create a values.yaml

Deploy Pachyderm without Console:

deployTarget: "AMAZON"
pachd:
  storage:
    amazon:
      bucket: "bucket_name"
      # this is an example access key ID taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
      id: "AKIAIOSFODNN7EXAMPLE"
      # this is an example secret access key taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-2"
  externalService:
    enabled: true

Deploy Pachyderm with Console:

deployTarget: "AMAZON"
pachd:
  storage:
    amazon:
      bucket: "bucket_name"
      # this is an example access key ID taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
      id: "AKIAIOSFODNN7EXAMPLE"
      # this is an example secret access key taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-2"
  # pachyderm enterprise key
  enterpriseLicenseKey: "YOUR_ENTERPRISE_TOKEN"
console:
  enabled: true
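Hardcoding credentials in values.yaml is convenient for a quickstart, but you may prefer to generate the file from the standard AWS credential environment variables. This is a minimal sketch that assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are already exported in your shell. write_aws_values is a hypothetical helper name. It prints the values shown above for a deployment without Console.

```shell
# Hypothetical helper: print an AWS values.yaml (without Console) using the
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables, so the
# keys are never typed into the file by hand.
write_aws_values() {
  if [ "$#" -ne 2 ]; then
    echo "usage: write_aws_values BUCKET REGION" >&2
    return 1
  fi
  # Refuse to write a file with empty credentials.
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
    return 1
  fi
  local bucket="$1" region="$2"
  cat <<EOF
deployTarget: "AMAZON"
pachd:
  storage:
    amazon:
      bucket: "${bucket}"
      id: "${AWS_ACCESS_KEY_ID}"
      secret: "${AWS_SECRET_ACCESS_KEY}"
      region: "${region}"
  externalService:
    enabled: true
EOF
}
```

For example, write_aws_values bucket_name us-east-2 > values.yaml writes the file in the current directory.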

Jump to Helm install

Google

  1. Additional client installation: Install Google Cloud SDK

  2. Create a GKE cluster. Note: Add --scopes storage-rw to your gcloud container clusters create command.

  3. Create a GCS Bucket for your data

  4. Create a values.yaml

Deploy Pachyderm without Console:

deployTarget: "GOOGLE"
pachd:
  storage:
    google:
      bucket: "bucket_name"
      cred: |
        INSERT JSON CONTENT HERE
  externalService:
    enabled: true

Deploy Pachyderm with Console:

deployTarget: "GOOGLE"
pachd:
  storage:
    google:
      bucket: "bucket_name"
      cred: |
        INSERT JSON CONTENT HERE
  # pachyderm enterprise key
  enterpriseLicenseKey: "YOUR_ENTERPRISE_TOKEN"
console:
  enabled: true
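Pasting the JSON content by hand is error-prone, because every line must be indented under cred: | for the YAML block scalar to parse. This is a minimal sketch that assumes you have already downloaded a service account key file. write_gcs_values is a hypothetical helper that prints the values shown above for a deployment without Console, with the key file inlined.

```shell
# Hypothetical helper: print a GCS values.yaml (without Console), inlining a
# service account key file under the "cred: |" block scalar.
write_gcs_values() {
  if [ "$#" -ne 2 ] || [ ! -r "$2" ]; then
    echo "usage: write_gcs_values BUCKET KEY_FILE" >&2
    return 1
  fi
  local bucket="$1" key_file="$2" line
  cat <<EOF
deployTarget: "GOOGLE"
pachd:
  storage:
    google:
      bucket: "${bucket}"
      cred: |
EOF
  # Every line of a YAML block scalar must be indented deeper than "cred:",
  # so prefix each line of the key with 8 spaces. The "|| [ -n ... ]" test
  # keeps a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '        %s\n' "$line"
  done < "$key_file"
  cat <<EOF
  externalService:
    enabled: true
EOF
}
```

For example, write_gcs_values bucket_name key.json > values.yaml, where key.json is your downloaded key file.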

Jump to Helm install

Azure


  1. Additional client installation: Install Azure CLI 2.0.1 or later.

  2. Create an AKS cluster

  3. Create a Storage Container for your data

  4. Create a values.yaml

Deploy Pachyderm without Console:

deployTarget: "MICROSOFT"
pachd:
  storage:
    microsoft:
      # storage container name
      container: "container_name"
      # storage account name
      id: "storage_account_name"
      # storage account key
      secret: "storage_account_key"
  externalService:
    enabled: true

Deploy Pachyderm with Console:

deployTarget: "MICROSOFT"
pachd:
  storage:
    microsoft:
      # storage container name
      container: "container_name"
      # storage account name
      id: "storage_account_name"
      # storage account key
      secret: "storage_account_key"
  # pachyderm enterprise key
  enterpriseLicenseKey: "YOUR_ENTERPRISE_TOKEN"
console:
  enabled: true
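Azure only accepts storage account names that are 3 to 24 characters long and made of lowercase letters and digits. A quick local check of the id value can catch a copy-paste mistake, such as stray uppercase letters or hyphens, before it surfaces as a storage error in pachd. This is a minimal sketch. valid_storage_account_name is a hypothetical helper, and it checks only the naming rule, not whether the account exists.

```shell
# Hypothetical helper: succeed only if the argument satisfies Azure's storage
# account naming rule (3 to 24 characters, lowercase letters and digits only).
valid_storage_account_name() {
  local name="$1"
  [ "${#name}" -ge 3 ] && [ "${#name}" -le 24 ] || return 1
  # Spell the allowed characters out explicitly: a range such as [a-z] can
  # also match uppercase letters in some locales.
  case "$name" in
    *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
  esac
  return 0
}
```

For example, valid_storage_account_name mystorageacct01 && echo ok prints ok.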

Jump to Helm install

3. Helm Install

  • You will be deploying the latest GA release of Pachyderm:

    helm repo add pach https://helm.pachyderm.com
    helm repo update
    helm install pachyderm -f values.yaml pach/pachyderm
    
  • Check your deployment:

    kubectl get pods
    

    Once the pods are up, you should see a running pachd pod, alongside etcd, pg-bouncer or postgres, and console pods, depending on your installation. If you are curious about the architecture of Pachyderm, take a look at our high-level diagram.

    System Response:

    NAME                           READY   STATUS    RESTARTS   AGE
    etcd-0                         1/1     Running   0          18h
    pachd-5db79fb9dd-b2gdq         1/1     Running   2          18h
    postgres-0                     1/1     Running   0          18h
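Rather than scanning the READY and STATUS columns by eye, you can filter the output. This is a minimal sketch that assumes kubectl get pods prints its default columns (NAME, READY, STATUS, ...). pods_not_ready is a hypothetical helper name. Note that it also flags pods in a Completed state, which may be expected in some installations.

```shell
# Hypothetical helper: read "kubectl get pods" output on stdin and print every
# pod that is not Running with all containers ready.
# Exits 0 when nothing is printed, 1 otherwise.
pods_not_ready() {
  awk '
    NR == 1 { next }                 # skip the NAME/READY/STATUS header row
    {
      split($2, ready, "/")          # READY column, e.g. "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) {
        print $1, $2, $3
        bad = 1
      }
    }
    END { exit bad }
  '
}
```

For example, kubectl get pods | pods_not_ready && echo "all pods ready".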
    

4. Have 'pachctl' And Your Cluster Communicate

  If you deployed Pachyderm without Console:

  • Retrieve the external IP address of the pachd service:

    kubectl get services | grep pachd-lb | awk '{print $4}'

  • Then update your pachctl context to point at your cluster:

    echo '{"pachd_address": "grpc://<external-IP-address>:30650"}' | pachctl config set context "<your-cluster-context-name>" --overwrite

    pachctl config set active-context "<your-cluster-context-name>"
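The lookup above uses grep, which matches any line containing pachd-lb, and it prints whatever the fourth column holds, including a <pending> placeholder while the load balancer is still being provisioned. This is a slightly stricter sketch that assumes kubectl get services prints its default columns (NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, ...). Both function names are hypothetical. On AWS, the EXTERNAL-IP column typically holds a load balancer hostname rather than an IP, and that hostname works the same way in the grpc:// address.

```shell
# Hypothetical helpers (not part of pachctl or kubectl).

# Read "kubectl get services" output on stdin and print the EXTERNAL-IP column
# (4th field) of the service named exactly "pachd-lb". Fails while the address
# is still "<pending>" or when the service is absent.
pachd_lb_address() {
  awk '$1 == "pachd-lb" && $4 != "<pending>" { print $4; found = 1 }
       END { exit !found }'
}

# Print the JSON that "pachctl config set context" reads on stdin.
# The port defaults to 30650, as in the command above.
pachd_context_json() {
  local host="$1" port="${2:-30650}"
  printf '{"pachd_address": "grpc://%s:%s"}\n' "$host" "$port"
}
```

For example: addr="$(kubectl get services | pachd_lb_address)" && pachd_context_json "$addr" | pachctl config set context "<your-cluster-context-name>" --overwrite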
    
  If you deployed Pachyderm with Console:

  • To connect to your new Pachyderm instance, run:

    pachctl config import-kube local --overwrite

    pachctl config set active-context local

  • Then run pachctl port-forward, and keep it running in the background (for example, in a separate terminal tab).

  • To use pachctl, run pachctl auth login, then authenticate to Pachyderm as the mock user (username: admin, password: password).

  • Finally, check that your cluster is up and running:

    pachctl version
    

    System Response:

    COMPONENT           VERSION
    pachctl             2.0.2
    pachd               2.0.2
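It is generally a good idea to keep pachctl on the same version as pachd. This is a minimal sketch that checks the pachctl version output above, assuming it keeps the two-column COMPONENT / VERSION layout. versions_match is a hypothetical helper name. It also fails when no pachd line is printed, which usually means pachctl could not reach the cluster.

```shell
# Hypothetical helper: read "pachctl version" output on stdin and succeed only
# when a pachd version is reported and it equals the pachctl version.
versions_match() {
  awk '
    $1 == "pachctl" { client = $2 }
    $1 == "pachd"   { server = $2 }
    END {
      if (server == "") { print "pachd version not reported" > "/dev/stderr"; exit 1 }
      if (client != server) { print "pachctl " client " != pachd " server > "/dev/stderr"; exit 1 }
      print "pachctl and pachd both at " client
    }
  '
}
```

For example, pachctl version | versions_match.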
    

5. Connect to Console

To connect to your Console (Pachyderm UI):

  • Point your browser to http://localhost:4000
  • Log in as the mock user with username admin and password password.

You are all set!

6. Try our beginner tutorial.


Last update: November 15, 2021