
Helm Deployment

The package manager Helm is the authoritative deployment method for Pachyderm.

Reminder

Pachyderm services are exposed on the cluster-internal IP (ClusterIP) instead of each node’s IP (NodePort), except for LOCAL Helm installations, where services are still accessible through NodePorts.

This page gives a high-level view of the steps to follow to install Pachyderm using Helm. Find our chart on Artifact Hub or in our GitHub repository.

Before you start your installation process:

  • Refer to this generic page for more information on how to install and get started with Helm.
  • Read our infrastructure recommendations. You will find instructions on setting up an ingress controller, a TCP load balancer, or connecting an Identity Provider for access control.
  • If you are planning to install Pachyderm UI, read our Console deployment instructions. Note that, unless your deployment is LOCAL (i.e., on a local machine for development only, for example, on Minikube or Docker Desktop), deploying Console requires setting up DNS and an Ingress, and activating authentication.

Install

Prerequisites

  1. Install Helm.

  2. Install pachctl, the command-line utility for interacting with a Pachyderm cluster.

  3. Choose the deployment guidelines that apply to you:

    • Find the deployment page that applies to your Cloud provider (or custom deployment, or on-premises deployment). It lists the installation prerequisites and the Kubernetes deployment instructions that fit your use case.

      For example, if your Cloud provider is Google Cloud Platform, follow the Prerequisites and Deploy Kubernetes sections of the deployment on Google Cloud Platform page.

    • Additionally, those instructions will help you configure the various elements (object store, PostgreSQL instance, secret names...) that relate to your deployment needs. Those parameter values will be specified in your YAML configuration file as follows.

Edit a Values.yaml File

Create a personalized my_pachyderm_values.yaml from this example repository. Pick the example that fits your target deployment and update the relevant values according to the parameters gathered in the previous step.

See the reference values.yaml for the list of all available Helm values.

Warning

No default k8s CPU and memory requests and limits are created for pachd. If you don't provide values in the values.yaml file, then those requests and limits are simply not set.

For Production deployments, Pachyderm strongly recommends that you create your values.yaml file with CPU and memory requests and limits for both pachd and etcd set to values appropriate to your specific environment. For reference, 1 CPU and 2 GB memory for each is a sensible default.
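
As a rough illustration of the recommendation above, the following sketch writes a values fragment that sets requests and limits for pachd and etcd to the 1 CPU / 2 GB reference figures. The file name pachyderm_resources.yaml is hypothetical, and the pachd.resources and etcd.resources keys are assumed to follow the standard Kubernetes resources layout; check them against the reference values.yaml before relying on them.

```shell
# Hypothetical output file; override with OUT=... if needed.
out="${OUT:-pachyderm_resources.yaml}"

# Assumption: the chart reads pachd.resources and etcd.resources
# using the usual Kubernetes requests/limits structure.
cat > "$out" <<'EOF'
pachd:
  resources:
    requests:
      cpu: "1"
      memory: 2G
    limits:
      cpu: "1"
      memory: 2G
etcd:
  resources:
    requests:
      cpu: "1"
      memory: 2G
    limits:
      cpu: "1"
      memory: 2G
EOF

echo "wrote $out"
```

Helm accepts several -f flags, so a fragment like this can be passed after my_pachyderm_values.yaml at install time, or its content can be merged into that file by hand. The figures are only the reference default mentioned above and should be adjusted to your environment.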

Platform Secrets: READ BEFORE ANY INSTALL OR UPGRADE

Pachyderm recommends using "platform secrets" to hold the values needed by a cluster at the time of the deployment (such as the PostgreSQL admin login username and password, OAuth information to set up your IdP, or your enterprise license key). You have the option to:

  1. Create those secrets ahead of time then supply their names in the secretName field of your values.yaml (Recommended option).
  2. For a quick installation, put the secrets' values in the dedicated fields of your values.yaml. In such cases, those will populate Pachyderm's default pachyderm-bootstrap-config secret.

Find the complete list of helm values that can control secret values here:

global.postgresqlExistingSecretName 
console.config.oauthClientSecretSecretName 
pachd.enterpriseLicenseKeySecretName 
pachd.rootTokenSecretName 
pachd.enterpriseSecretSecretName 
pachd.oauthClientSecretSecretName 
pachd.enterpriseRootTokenSecretName 
oidc.upstreamIDPsSecretName 

Note that if no secret name is provided for the fields mentioned above, Pachyderm retrieves the dedicated plain-text secret values from the Helm values and populates a generic, auto-generated default secret (pachyderm-bootstrap-config) at the time of the installation. If neither a secret name nor a plain-text value is provided, default values are used in pachyderm-bootstrap-config. Check the list of all secret values fields and pachyderm-bootstrap-config keys in our upgrade section.

The generic pachyderm-bootstrap-config secret is reset at each upgrade and new default values are created. This causes helm upgrade to fail unless you first retrieve your current default values (for example: kubectl get secret pachyderm-bootstrap-config -o go-template='{{.data.rootToken | base64decode }}'), create a dedicated secret for each, then set the name of each new secret in its corresponding secret name field above.
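
The retrieve-then-recreate procedure described above could be scripted along these lines. This is a minimal sketch, not an official migration tool: the function name save_bootstrap_value is hypothetical, and the key under which the chart expects to find the value inside your dedicated secret is an assumption that you need to look up in the chart documentation.

```shell
# Copy one value out of pachyderm-bootstrap-config into a dedicated secret.
#   $1: key inside pachyderm-bootstrap-config (for example rootToken)
#   $2: name of the new secret (your choice)
#   $3: key to store the value under in the new secret
#       (assumption: must match what the chart expects for that field)
save_bootstrap_value() {
  src_key="$1"
  new_secret="$2"
  new_key="$3"

  # Same retrieval command as in the paragraph above, with the key parameterized.
  value="$(kubectl get secret pachyderm-bootstrap-config \
    -o go-template="{{.data.${src_key} | base64decode }}")" || return 1

  if [ -z "$value" ]; then
    echo "no value found for ${src_key}" >&2
    return 1
  fi

  kubectl create secret generic "$new_secret" \
    --from-literal="${new_key}=${value}"
}

# Example call (placeholders, not real names):
#   save_bootstrap_value rootToken <your-secret-name> <key-expected-by-the-chart>
```

After creating the secret, its name still has to be set in the matching field of your values file (for example pachd.rootTokenSecretName) before running the upgrade.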

Install Pachyderm's Helm Chart

  1. Get your Helm Repo Info

    helm repo add pach https://helm.pachyderm.com
    helm repo update
    

  2. Install Pachyderm

    You are ready to deploy Pachyderm on the environment of your choice.

    helm install pachd -f my_pachyderm_values.yaml pach/pachyderm --version <your_chart_version>
    

    To choose a specific helm chart version

    Each chart version is associated with a given version of Pachyderm. You will find the list of all available chart versions and their associated version of Pachyderm on Artifact Hub.

    • You can choose a specific helm chart version by adding a --version flag (for example, --version 0.3.0) to your helm install.
    • Without a --version flag, helm install picks the latest GA release of Pachyderm by default.
    • You can choose the latest pre-release version of the chart by using the flag --devel (pre-releases are versions of the chart that correspond to releases of Pachyderm that don't have the GA status yet).

    For example, when the 2.0 version of Pachyderm was a release candidate, using the flag --devel would let you install the latest RC of 2.0, while no flag would retrieve the newest GA (1.13.4).

  3. Check your deployment

    kubectl get pods
    

    Once the pods are up, you should see a pod for pachd running (alongside etcd, pg-bouncer or postgres, console, depending on your installation).

    System Response:

    NAME                           READY   STATUS    RESTARTS   AGE
    etcd-0                         1/1     Running   0          18h
    pachd-5db79fb9dd-b2gdq         1/1     Running   2          18h
    postgres-0                     1/1     Running   0          18h
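
The install steps above can be complemented with two small helpers, sketched here under stated assumptions. The function names are hypothetical. list_chart_versions relies on the pach repo alias added in step 1, and wait_for_pachd assumes that pachd runs as a Deployment named pachd, which the pod name in the sample output suggests but which may differ in your installation.

```shell
# List the chart versions available in the "pach" repo.
# Pass "devel" as the first argument to include pre-release charts.
list_chart_versions() {
  if [ "${1:-}" = "devel" ]; then
    helm search repo pach/pachyderm --versions --devel
  else
    helm search repo pach/pachyderm --versions
  fi
}

# Block until the pachd rollout completes or the timeout expires.
#   $1: optional timeout (default 300s)
# Assumption: the Deployment is named "pachd".
wait_for_pachd() {
  kubectl rollout status deployment/pachd --timeout="${1:-300s}"
}

# Example:
#   list_chart_versions devel
#   wait_for_pachd 120s && kubectl get pods
```

Listing versions first makes it easier to pick the value passed to --version, and waiting on the rollout avoids polling kubectl get pods by hand.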
    

Have 'pachctl' and your Cluster Communicate

Assuming your pachd is running as shown above, make sure that pachctl can talk to the cluster.

If you are exposing your cluster publicly:

  1. Retrieve the external IP address of your TCP load balancer or your domain name:

    kubectl get services | grep pachd-lb | awk '{print $4}'
    

  2. Update the context of your cluster with its direct URL, using the external IP address or domain name above:

    echo '{"pachd_address": "grpc://<external-IP-address-or-domain-name>:30650"}' | pachctl config set context "<your-cluster-context-name>" --overwrite
    

  3. Check that you are using the right context:

    pachctl config get active-context
    

    Your cluster context name should show up.
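
The three steps above can be chained in a single helper, sketched below. The function name set_cluster_context is hypothetical; the commands inside it are the ones shown in the steps, with the address substituted automatically. It assumes the load balancer service name contains pachd-lb and that port 30650 is the one you exposed.

```shell
# Point a pachctl context at a publicly exposed pachd.
#   $1: context name
#   $2: optional external IP or domain name; when omitted, it is read
#       from the pachd-lb service as in step 1
set_cluster_context() {
  ctx="$1"
  addr="${2:-$(kubectl get services | grep pachd-lb | awk '{print $4}')}"

  if [ -z "$addr" ]; then
    echo "could not determine the pachd address" >&2
    return 1
  fi

  # Step 2: write the context definition from JSON on stdin.
  printf '{"pachd_address": "grpc://%s:30650"}\n' "$addr" |
    pachctl config set context "$ctx" --overwrite || return 1

  # Step 3: print the active context so it can be compared with $ctx.
  pachctl config get active-context
}

# Example (placeholder name):
#   set_cluster_context "<your-cluster-context-name>"
```

If the printed context is not the one you just configured, pachctl config set active-context followed by the context name should switch to it.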

If you're not exposing pachd publicly, you can run:

# Background this process because it blocks.
pachctl port-forward

Verify that pachctl and your cluster are connected:

pachctl version

System Response:

COMPONENT           VERSION
pachctl             2.2.7
pachd               2.2.7
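
For scripted setups, the verification above can be turned into a check that fails when pachd does not answer. The sketch below is one possible approach: check_connection is a hypothetical helper that parses the table printed by pachctl version and assumes the two-column layout shown in the sample response.

```shell
# Succeed only if `pachctl version` reports both a pachctl and a pachd version.
check_connection() {
  pachctl version | awk '
    $1 == "pachctl" { client = $2 }
    $1 == "pachd"   { server = $2 }
    END {
      if (client == "" || server == "") exit 1
      printf "pachctl=%s pachd=%s\n", client, server
    }'
}

# Example for a cluster that is not exposed publicly:
#   pachctl port-forward &        # runs in the background because it blocks
#   pf_pid=$!
#   trap 'kill "$pf_pid"' EXIT    # stop the port-forward when the script ends
#   check_connection
```

The port-forward may need a moment before it accepts connections, so a short retry loop around check_connection can be useful in practice.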

Uninstall Pachyderm's Helm Chart

Helm uninstalls a release as easily as it installs one.

helm uninstall pachd 

We recommend making sure that everything is properly removed following a helm uninstall:

  • The uninstall leaves your persistent volumes. To clean them up, run kubectl get pvc and delete the claims data-postgres-0 and etcd-storage-etcd-0.

Attention

Deleting PVs will result in the loss of your data.

  • All other resources should have been removed by Helm. Run kubectl get all | grep "etcd\|pachd\|postgres\|pg-bouncer" to make sure of it and delete any remaining resources where necessary.

  • If your uninstall failed, there might be config jobs still running. Run kubectl get jobs.batch | grep pachyderm and delete any remaining job.
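
The post-uninstall checks listed above could be grouped as follows. This is a sketch with hypothetical function names. The PVC names are the ones given above and may differ in your installation, and the deletion helper deliberately refuses to run without an explicit --yes because removing the claims destroys data.

```shell
# Show what is still present after `helm uninstall pachd`.
list_leftovers() {
  echo "== remaining workloads and services =="
  kubectl get all | grep "etcd\|pachd\|postgres\|pg-bouncer" || echo "(none)"
  echo "== persistent volume claims =="
  kubectl get pvc
  echo "== remaining config jobs =="
  kubectl get jobs.batch | grep pachyderm || echo "(none)"
}

# Delete the two claims named on this page. Irreversible: data is lost.
delete_pachyderm_pvcs() {
  if [ "${1:-}" != "--yes" ]; then
    echo "refusing to delete PVCs without --yes (this destroys data)" >&2
    return 1
  fi
  kubectl delete pvc data-postgres-0 etcd-storage-etcd-0
}

# Example:
#   list_leftovers
#   delete_pachyderm_pvcs --yes
```

Reviewing the output of list_leftovers before deleting anything keeps the destructive step separate from the inspection step.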

Upgrade Pachyderm's Helm Chart

When a new version of Pachyderm's chart is released, or when you want to change the configuration of your release, use the helm upgrade command:

helm upgrade pachd -f my_new_pachyderm_values.yaml pach/pachyderm --version <your_chart_version>        

Warning

Make sure that your platform's secret names have been set properly. Refer to this section for the list of secrets we recommend creating prior to installing the product, and what to do if you have not. Failing to provide those values might cause your upgrade to fail.
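
Before running the upgrade, a rough first-pass check of the values file can catch secret name fields that were left empty. The sketch below is an assumption-laden text search, not a YAML parser: missing_secret_names is a hypothetical helper that only looks at leaf key names, so it cannot distinguish pachd.oauthClientSecretSecretName from console.config.oauthClientSecretSecretName. Not every deployment needs every field, so treat the output as a list to review rather than a list of errors.

```shell
# Print the secret name fields that are absent or empty in a values file.
#   $1: path to the values file
missing_secret_names() {
  values_file="$1"
  for key in \
    postgresqlExistingSecretName \
    oauthClientSecretSecretName \
    enterpriseLicenseKeySecretName \
    rootTokenSecretName \
    enterpriseSecretSecretName \
    enterpriseRootTokenSecretName \
    upstreamIDPsSecretName
  do
    # A field counts as set when "key:" is followed by a non-empty,
    # non-comment value, optionally wrapped in quotes.
    if ! grep -Eq "^[[:space:]]*${key}:[[:space:]]*[\"']?[^[:space:]\"'#]" "$values_file"; then
      echo "$key"
    fi
  done
}

# Example:
#   missing_secret_names my_new_pachyderm_values.yaml
```

Any field reported here that your deployment actually uses should point to a dedicated secret before you run helm upgrade.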


Last update: July 7, 2022