Blob/Object Storage
You can enable blob/object storage for Pachyderm by updating the pachd.storage section in your Helm chart. The necessary configuration options depend on your chosen blob storage provider.
Before You Start
- Ensure you have a blob storage provider account (e.g., AWS S3, Google Cloud Storage, Azure Blob Storage).
- Ensure you have the necessary credentials and permissions to create and access a bucket or container in your blob storage provider.
- For production use, make sure your cloud provider credentials are properly configured in your environment.
Query Parameters for Storage URLs
The available query parameters for each storage provider are not exhaustively documented in a single location. They are implemented in the Go CDK source code and can change with new releases. However, you can find the most commonly used parameters in the following locations:
- AWS S3: s3blob.URLOpener and aws.ConfigFromURLParams
- Azure Blob: azureblob.URLOpener
- Google Cloud Storage: gcsblob.URLOpener
Common Query Parameters
Provider | Parameter | Description |
---|---|---|
AWS S3 | region | The AWS region (e.g., “us-west-1”) |
AWS S3 | endpoint | Custom endpoint for S3-compatible storage |
AWS S3 | disableSSL | Set to “true” to disable SSL |
AWS S3 | s3ForcePathStyle | Set to “true” to force path-style addressing |
AWS S3 | awssdk | Set to “v2” to use the AWS SDK v2 |
Azure Blob | domain | Custom domain for the storage account |
Azure Blob | protocol | Set to “http” for local development with Azurite |
Azure Blob | cdn | Can be set to “true” when using a CDN URL pointing to a blob storage account |
Azure Blob | localemu | Set to “true” to use the Azurite emulator |
Google Cloud Storage | access_id | HMAC Access ID for non-OAuth authentication |
Google Cloud Storage | private_key_path | Path to service account JSON key file |
If you need to use a parameter not listed here, consult the Go CDK source code or reach out to Pachyderm support for guidance.
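Because every option is passed as a URL query parameter, it can help to assemble the storage URL programmatically rather than by hand. The following is a minimal sketch using only the Python standard library; the helper name `build_storage_url` is hypothetical and not part of Pachyderm or the Go CDK, and it does not check whether a parameter is valid for a given provider.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode


def build_storage_url(scheme: str, bucket: str,
                      params: Optional[Mapping[str, str]] = None) -> str:
    """Assemble a storage URL such as s3://my-bucket?region=us-west-1.

    Hypothetical helper for illustration; it only joins the pieces and
    URL-encodes the parameter values.
    """
    base = f"{scheme}://{bucket}"
    if not params:
        return base
    # safe=":" keeps host:port values such as localhost:10001 readable.
    return f"{base}?{urlencode(params, safe=':')}"


# Example values taken from the table above.
s3_url = build_storage_url("s3", "my-bucket",
                           {"region": "us-west-1", "awssdk": "v2"})
azure_url = build_storage_url("azblob", "my-container",
                              {"protocol": "http", "domain": "localhost:10001"})
print(s3_url)
print(azure_url)
```

The resulting string is what you would place in the storageURL field of your values.yaml file.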
How to Set Up Blob Storage
- Navigate to your values.yaml file or obtain your current Helm values.yaml overrides:

    helm get values pachyderm > values.yaml

- Add the following pachd.storage fields to your values.yaml file:

    pachd:
      storage:
        gocdkEnabled: true
        storageURL: # The URL for your blob storage provider

- Update the storageURL to include provider-specific configuration options as needed; for options, see the related Go CDK packages. For example:

    "s3://my-bucket?region=us-west-1&awssdk=v2"
    "gs://${BUCKET_NAME}"
    "azblob://my-container?protocol=http&domain=localhost:10001"

- Save your changes and upgrade your cluster:

    helm upgrade pachyderm pachyderm/pachyderm -f values.yaml
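Before upgrading the cluster, you may want to sanity-check the storageURL for typos in the scheme or parameter names. The sketch below is one possible approach using the Python standard library. The function `check_storage_url` and the `KNOWN_PARAMS` mapping are hypothetical: the mapping only mirrors the parameters listed on this page, so an unlisted parameter produces a warning, not an error, since the Go CDK may accept parameters that are not listed here.

```python
from urllib.parse import parse_qs, urlsplit

# Parameters listed on this page, keyed by URL scheme. This list is an
# assumption for illustration and is not exhaustive.
KNOWN_PARAMS = {
    "s3": {"region", "endpoint", "disableSSL", "s3ForcePathStyle", "awssdk"},
    "azblob": {"domain", "protocol", "cdn", "localemu"},
    "gs": {"access_id", "private_key_path"},
}


def check_storage_url(url: str) -> list:
    """Return a list of warnings for a storage URL (empty if none found)."""
    parts = urlsplit(url)
    warnings = []
    if parts.scheme not in KNOWN_PARAMS:
        warnings.append(f"unrecognized scheme: {parts.scheme!r}")
        return warnings
    if not parts.netloc:
        warnings.append("missing bucket or container name")
    # keep_blank_values=True so that "?region=" is still inspected.
    for name in parse_qs(parts.query, keep_blank_values=True):
        if name not in KNOWN_PARAMS[parts.scheme]:
            warnings.append(f"parameter not in the common list: {name!r}")
    return warnings


for candidate in (
    "s3://my-bucket?region=us-west-1&awssdk=v2",
    "s3://my-bucket?regoin=us-west-1",  # deliberate typo in "region"
):
    print(candidate, check_storage_url(candidate))
```

A check like this only catches spelling mistakes; it does not confirm that the bucket exists or that your credentials grant access to it.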
Limitations
Some configuration settings, such as verifySSL, may not be available as storageURL query parameters. In such cases, set these options in the pachd.storage section instead.