Global S3 Gateway
Pachyderm comes with an embedded S3 gateway, deployed in the pachd pod, that lets you access Pachyderm repos through the S3 protocol.
The S3 gateway is designed to work with any S3 client, including:
- AWS S3 CLI
The operations on the HTTP API exposed by the S3 Gateway largely mirror those documented in S3’s official docs. It is typically used when you wish to retrieve data from or expose data to object storage tooling (such as MinIO, boto3, and aws s3 cli).
The pachd service exposes the S3 gateway (s3gateway-port) on port 30600.
Before using the S3 Gateway
Make sure to install and configure the S3 client of your choice as documented here.
The S3 gateway presents each branch from every Pachyderm repository as an S3 bucket. Buckets are named after the branch and the repo, separated by a dot. For example, the master.data bucket corresponds to the master branch of the repo data.
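The mapping between branches and buckets can be sketched in a few lines of Python. This is only an illustration: it assumes the bucket name is the branch name, a dot, then the repo name, as the master.data example suggests, and that branch names contain no dots. The helper names bucket_name and split_bucket are hypothetical and are not part of Pachyderm.

```python
from typing import Tuple


def bucket_name(repo: str, branch: str = "master") -> str:
    """Build the bucket name assumed for a given repo and branch."""
    if not repo or not branch:
        raise ValueError("repo and branch must be non-empty")
    # Assumption: the branch comes first, then a dot, then the repo.
    return f"{branch}.{repo}"


def split_bucket(bucket: str) -> Tuple[str, str]:
    """Split a bucket name into (branch, repo) at the first dot."""
    branch, sep, repo = bucket.partition(".")
    if not sep or not branch or not repo:
        raise ValueError(f"not a branch.repo bucket name: {bucket!r}")
    return branch, repo


print(bucket_name("data"))          # master.data
print(split_bucket("master.data"))  # ('master', 'data')
```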
The following diagram gives a quick overview of the two main aws commands that let you put data into a repo or retrieve data from it via the S3 gateway. For reference, we have also mentioned the corresponding pachctl commands and the equivalent call to a real S3 bucket.
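As a rough sketch of what such calls might look like when scripted, the snippet below builds the argument lists for an upload and a download. It assumes that aws s3 cp is the command used in both directions and that the gateway is reachable at http://localhost:30600; the host, the file names, and the helper aws_cp_args are all illustrative assumptions. The --endpoint-url option is the AWS CLI's standard way to target a non-AWS endpoint.

```python
from typing import List

# Assumed endpoint: the host is a placeholder, 30600 is the gateway port.
GATEWAY_ENDPOINT = "http://localhost:30600"


def aws_cp_args(source: str, destination: str,
                endpoint: str = GATEWAY_ENDPOINT) -> List[str]:
    """Return the argv list for an `aws s3 cp` call against the gateway."""
    return ["aws", "--endpoint-url", endpoint, "s3", "cp", source, destination]


# Put a local file into the master branch of the repo "data".
put_cmd = aws_cp_args("local.txt", "s3://master.data/file.txt")
# Retrieve the same file from the gateway.
get_cmd = aws_cp_args("s3://master.data/file.txt", "copy.txt")

print(" ".join(put_cmd))
print(" ".join(get_cmd))
```

With the AWS CLI installed and configured, either list could be passed to subprocess.run(cmd, check=True).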
If Authentication Is Enabled
Whether the credentials are empty (no authentication) or set, the Access Key must always equal the Secret Key: both are set to the same value.
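The rule that both keys carry the same value can be captured in a small helper that prepares client settings. This is a sketch under stated assumptions: the function gateway_credentials, its token parameter, and the localhost endpoint are hypothetical, and the source does not say here which value to use as the credential. The dictionary keys were chosen to match keyword arguments that boto3.client accepts.

```python
from typing import Dict


def gateway_credentials(token: str = "",
                        endpoint: str = "http://localhost:30600") -> Dict[str, str]:
    """Return client settings in which the access key equals the secret key.

    `token` is a placeholder for whatever credential value applies to your
    deployment; an empty string stands for the no-authentication case.
    """
    return {
        "endpoint_url": endpoint,          # assumed gateway address
        "aws_access_key_id": token,        # access key ...
        "aws_secret_access_key": token,    # ... and secret key are identical
    }


settings = gateway_credentials("my-credential")
print(settings["aws_access_key_id"] == settings["aws_secret_access_key"])  # True
```

If boto3 is installed, the settings could be used as boto3.client("s3", **settings).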
If you do not have direct access to the Kubernetes cluster, you can use port forwarding instead. Run pachctl port-forward to access the S3 gateway through a forwarded local port.
However, the Kubernetes port forwarder incurs substantial overhead and does not recover well from broken connections. Connecting to the cluster directly is faster and more reliable.
Most operations act on the HEAD of the given branch. However, if your object store library or tool supports versioning, you can get objects in non-HEAD commits by using the commit ID as the S3 object version ID.
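As an illustration of using a commit ID as the version ID, the helper below assembles the parameters for a versioned object read. It is a sketch only: get_object_params is a hypothetical name, the commit ID shown is made up, and the bucket is assumed to be the branch name, a dot, then the repo name. The keys Bucket, Key, and VersionId match the parameter names of boto3's get_object call.

```python
from typing import Dict, Optional


def get_object_params(repo: str, branch: str, path: str,
                      commit_id: Optional[str] = None) -> Dict[str, str]:
    """Build GetObject parameters, optionally pinned to a specific commit."""
    params = {
        "Bucket": f"{branch}.{repo}",  # assumed branch.repo bucket naming
        "Key": path.lstrip("/"),       # object keys carry no leading slash
    }
    if commit_id:
        # The commit ID is passed where S3 expects an object version ID.
        params["VersionId"] = commit_id
    return params


# HEAD of the branch: no version ID.
print(get_object_params("data", "master", "/file.txt"))
# A non-HEAD commit, using a made-up commit ID.
print(get_object_params("data", "master", "/file.txt", "0123abcd"))
```

With a boto3 S3 client named client, the result could be used as client.get_object(**params).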