This page walks you through the fundamentals of Kubernetes, persistent volumes, and object stores that you need to know to deploy Pachyderm on-premises.
- Read our infrastructure recommendations. You will find instructions on how to set up an ingress controller or a load balancer, and how to connect an Identity Provider for access control.
- If you are planning to install Pachyderm UI, read our Console deployment instructions. Note that unless your deployment is `LOCAL` (i.e., on a local machine for development only, for example, on Minikube or Docker Desktop), deploying Console requires, at a minimum, setting up an Ingress.
- Troubleshooting a deployment? Check out Troubleshooting Deployments.
Deploying Pachyderm successfully on-premises requires a few prerequisites. Pachyderm is built on Kubernetes. Before you can deploy Pachyderm, you will need to perform the following actions:
- Deploy Kubernetes on-premises.
- Deploy two Kubernetes persistent volumes that Pachyderm will use to store its metadata.
- Deploy an on-premises object store using a storage provider like MinIO, EMC's ECS, or SwiftStack to provide S3-compatible access to your data storage.
- Finally, deploy Pachyderm using Helm by running the `helm install` command with the appropriate values configured in your `values.yaml`. If you are unfamiliar with Helm, we recommend reading these generic deployment steps.
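The final step can be sketched in Python as below. This is a hypothetical helper, not part of Pachyderm: it assumes the Pachyderm chart repository is already registered in Helm under the name `pach`, and it uses `pachd` as the release name and `values.yaml` as the values file. All three names are placeholders to adapt to your setup.

```python
import shlex
import shutil
import subprocess


def helm_install_cmd(values_file: str,
                     release: str = "pachd",
                     chart: str = "pach/pachyderm") -> list:
    """Build the argv for `helm install RELEASE CHART -f VALUES_FILE`.

    The default release and chart names are hypothetical placeholders;
    adjust them to match how your Helm repository is registered.
    """
    return ["helm", "install", release, chart, "-f", values_file]


def run_helm_install(values_file: str) -> None:
    """Run the install, failing loudly if helm is missing or the install fails."""
    if shutil.which("helm") is None:
        raise RuntimeError("helm is not on PATH")
    cmd = helm_install_cmd(values_file)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Calling `run_helm_install("values.yaml")` requires `helm` on your PATH and a kubeconfig pointing at the target cluster; building the argv as a list avoids shell quoting issues with file paths.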
Before you start, you will need the following clients installed:
Setting Up To Deploy On-Premises
The Kubernetes docs have instructions for deploying Kubernetes in a variety of on-premises scenarios. We recommend following one of these guides to get Kubernetes running.
Pachyderm recommends running your cluster on Kubernetes 1.19.0 and above.
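To check the version recommendation against an existing cluster, you can parse the JSON printed by `kubectl version -o json`, whose `serverVersion.gitVersion` field holds the server version string. The sketch below is a hypothetical helper; it assumes version strings of the form `v1.21.3`, optionally followed by a distribution suffix such as `+k3s1`.

```python
from __future__ import annotations

import json
import re
import subprocess

# Minimum Kubernetes version recommended on this page.
MIN_K8S_VERSION = (1, 19, 0)


def parse_server_version(kubectl_json: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from `kubectl version -o json` output."""
    data = json.loads(kubectl_json)
    git_version = data["serverVersion"]["gitVersion"]  # e.g. "v1.21.3" or "v1.21.3+k3s1"
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {git_version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return major, minor, patch


def cluster_meets_minimum(kubectl_json: str) -> bool:
    # Tuples compare element by element, so (1, 21, 3) >= (1, 19, 0).
    return parse_server_version(kubectl_json) >= MIN_K8S_VERSION


def current_cluster_json() -> str:
    """Query the cluster your kubeconfig points at (requires kubectl)."""
    result = subprocess.run(["kubectl", "version", "-o", "json"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

In practice you would call `cluster_meets_minimum(current_cluster_json())`; the parsing is kept separate so it can be checked without a live cluster.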
Once you deploy Kubernetes, you will also need to configure storage classes to consume persistent volumes for etcd and PostgreSQL, which hold Pachyderm's metadata.
The database and metadata services (persistent disks) generally require a small persistent volume (e.g., 10 GB) but high IOPS (1500). Depending on your storage provider, you may therefore need to oversize the volume significantly to get enough IOPS.
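To see how much oversizing can be needed, here is a small worked example. It assumes a hypothetical provider that grants IOPS in proportion to volume size, at a fixed rate per GiB; real providers may cap, tier, or provision IOPS independently, so check your provider's documentation. At an assumed 3 IOPS/GiB, reaching 1500 IOPS takes a 500 GiB volume, fifty times the 10 GiB the data itself needs.

```python
import math


def volume_size_for_iops(min_size_gib: int, target_iops: int, iops_per_gib: float) -> int:
    """Smallest volume size (GiB) that meets both the capacity and the IOPS target.

    Assumes IOPS scale linearly with size at `iops_per_gib`; this is an
    illustrative model, not how every storage provider allocates IOPS.
    """
    if iops_per_gib <= 0:
        raise ValueError("iops_per_gib must be positive")
    size_for_iops = math.ceil(target_iops / iops_per_gib)
    return max(min_size_gib, size_for_iops)
```

For example, `volume_size_for_iops(10, 1500, 3)` returns `500`, while a provider with a high per-GiB rate, such as `volume_size_for_iops(10, 1500, 200)`, needs no oversizing and returns `10`.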
Once you have determined the names of the storage classes you are going to use and their sizes, add them to your Helm values file:
etcd:
  storageClass: MyStorageClass
  size: 10Gi
postgresql:
  persistence:
    storageClass: MyStorageClass
    size: 10Gi
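If you generate your values file from a script, a helper like the one below produces the same etcd and PostgreSQL structure as the snippet above. It is a hypothetical sketch, not part of Pachyderm. Because JSON is a subset of YAML, the `json.dumps` output should be usable as a values file without a YAML library; `MyStorageClass` is the placeholder name from the snippet.

```python
import json


def metadata_storage_values(storage_class: str, size_gib: int) -> dict:
    """Helm values for the etcd and PostgreSQL volumes, same shape as the snippet above."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    size = f"{size_gib}Gi"  # Kubernetes quantity string, e.g. "10Gi"
    return {
        "etcd": {"storageClass": storage_class, "size": size},
        "postgresql": {
            "persistence": {"storageClass": storage_class, "size": size},
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this output can be saved as (part of) a values file.
    print(json.dumps(metadata_storage_values("MyStorageClass", 10), indent=2))
```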
Deploying An Object Store
Pachyderm's `pachd` uses an object store to store all your data. The object store you use must be accessible via a low-latency, high-bandwidth connection.
Do not deploy an on-premises Pachyderm cluster against a cloud-based object store (such as S3, GCS, or Azure Blob Storage).
You will, however, access your on-premises object store using the S3 protocol.
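One quick way to sanity-check the low-latency requirement is to time TCP connections from a cluster node to the object store endpoint. The function below is a hypothetical helper, and a TCP handshake is only a rough proxy: it says nothing about bandwidth or the overhead of S3 requests.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host: str, port: int, attempts: int = 5,
                           timeout: float = 2.0) -> float:
    """Median time (ms) to open a TCP connection to host:port.

    Only a rough proxy: it measures the TCP handshake, not bandwidth
    or the overhead of S3 requests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # create_connection raises OSError if the endpoint is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For example, `tcp_connect_latency_ms("minio.example.local", 9000)` would probe a hypothetical MinIO host. Run it from a node inside the cluster, since that is where `pachd` will connect from.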
Sizing And Configuring The Object Store
Start with a large multiple of your current data set size.
You will need four items to configure the object store. Each item below is prefixed with the key used for it in the Helm values file.
endpoint: The access endpoint. For example, MinIO's endpoints are usually something like
Do not begin it with the protocol; it is an endpoint, not a URL. Also, check whether your object store (e.g., MinIO) uses SSL/TLS. If it does not, disable TLS with the `secure` setting in the values file.
bucket: The bucket name you are dedicating to Pachyderm. Pachyderm will need exclusive access to this bucket.
id: The access key id for the object store.
secret: The secret key for the object store.
pachd:
  storage:
    backend: minio
    minio:
      bucket: ""
      endpoint: ""
      id: ""
      secret: ""
      secure: ""
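The sketch below fills in the `pachd.storage` block shown above and applies the endpoint rule from this section: any `http://` or `https://` prefix is stripped, and when present it is used to infer whether TLS is in use. It is a hypothetical helper, not part of Pachyderm. It assumes the chart expects `secure` as the quoted string `"true"` or `"false"`, matching the quoted value in the snippet. The host names and credentials in the usage below are placeholders.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def to_endpoint(address: str) -> tuple[str, bool | None]:
    """Strip any protocol from `address`; return (endpoint, uses_tls).

    uses_tls is inferred from an http/https scheme, or None when the
    address has no scheme (TLS usage cannot be inferred then).
    """
    address = address.strip()
    if "://" not in address:
        return address.rstrip("/"), None
    parts = urlsplit(address)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"cannot derive an endpoint from {address!r}")
    return parts.netloc, parts.scheme == "https"


def object_store_values(bucket: str, address: str, access_key_id: str,
                        secret_key: str, secure: bool | None = None) -> dict:
    """Build the pachd.storage block shown above for a MinIO-compatible store."""
    endpoint, inferred_tls = to_endpoint(address)
    if secure is None:
        secure = inferred_tls
    if secure is None:
        raise ValueError("pass secure=True/False: TLS usage cannot be inferred")
    for name, value in (("bucket", bucket), ("endpoint", endpoint),
                        ("id", access_key_id), ("secret", secret_key)):
        if not value:
            raise ValueError(f"{name} must not be empty")
    return {
        "pachd": {
            "storage": {
                "backend": "minio",
                "minio": {
                    "bucket": bucket,
                    "endpoint": endpoint,
                    "id": access_key_id,
                    "secret": secret_key,
                    # Quoted in the values file, so emitted as a string.
                    "secure": "true" if secure else "false",
                },
            },
        },
    }
```

For example, `object_store_values("pachyderm-data", "http://10.0.0.5:9000", "my-access-key", "my-secret-key")` yields the endpoint `10.0.0.5:9000` with `secure: "false"`. Keep real credentials out of version control, for instance by generating this block at deploy time.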
Next Step: Proceed to your Helm installation
Once you have Kubernetes deployed, your storage classes set up, and your object store configured, follow these steps to install Pachyderm on your cluster with Helm.