Learn about Pachyderm's embedded S3 gateway, which is compatible with MinIO, AWS S3 CLI, and boto3.
March 24, 2023
Use the embedded S3 gateway to send or receive data through the S3 protocol with object storage tooling such as MinIO, boto3, or the AWS S3 CLI. The supported operations are similar to those officially documented for S3.
S3 Gateway Syntax
The S3 gateway presents each branch from every Pachyderm repository as an S3 bucket. Bucket names follow the pattern [<commit>.]<branch>.<repo>.<project>, where the commit is optional.
For example, the master.foo.bar bucket corresponds to the master branch of the foo repo within the bar project.
The be97b64f110643389f171eb64697d4e1.master.foo.bar bucket corresponds to commit be97b64f110643389f171eb64697d4e1 on the master branch of the foo repo within the bar project.
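As a sketch of the naming pattern, the hypothetical shell function s3_bucket_name below assembles a bucket name from a branch, repo, project, and optional commit ID. It only formats strings; it does not check that the branch, repo, project, or commit actually exists in Pachyderm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an S3 gateway bucket name following the
# [<commit>.]<branch>.<repo>.<project> pattern.
# Usage: s3_bucket_name <branch> <repo> <project> [commit]
s3_bucket_name() {
  local branch="$1" repo="$2" project="$3" commit="${4:-}"
  if [ -n "$commit" ]; then
    # A commit ID, when given, is prepended to pin the bucket to that commit.
    printf '%s.%s.%s.%s\n' "$commit" "$branch" "$repo" "$project"
  else
    printf '%s.%s.%s\n' "$branch" "$repo" "$project"
  fi
}

s3_bucket_name master foo bar
# master.foo.bar
s3_bucket_name master foo bar be97b64f110643389f171eb64697d4e1
# be97b64f110643389f171eb64697d4e1.master.foo.bar
```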
If authentication is enabled, you must pass credentials with every request to the S3 gateway endpoint, as described in the S3 client configuration steps.
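The AWS CLI (and boto3) read credentials from the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The minimal sketch below assumes, without confirmation from this page, that the S3 client configuration steps use your Pachyderm auth token for both values; PACHYDERM_TOKEN is a hypothetical variable name, and the placeholder must be replaced with a real token.

```shell
#!/usr/bin/env bash
# Assumption: with auth enabled, the Pachyderm token serves as both the
# access key ID and the secret access key. Confirm this against the S3
# client configuration steps for your deployment.
PACHYDERM_TOKEN="${PACHYDERM_TOKEN:-<your-pachyderm-token>}"  # hypothetical name

# Standard environment variables read by the AWS CLI and boto3.
export AWS_ACCESS_KEY_ID="$PACHYDERM_TOKEN"
export AWS_SECRET_ACCESS_KEY="$PACHYDERM_TOKEN"
```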
Command Examples
The following command examples assume that you have upgraded to use the embedded proxy, which will become mandatory in future releases.
Put Data Into a Pachyderm Repo
With the AWS CLI:
aws --endpoint-url <pachyderm-address> s3 cp myfile.csv s3://master.foo.bar
With pachctl:
pachctl put file foo@master:/myfile.csv -f myfile.csv --project bar
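Retrieve Data From a Pachyderm Repo
With the AWS CLI (the trailing . is the local destination directory):
aws --endpoint-url <pachyderm-address> s3 cp s3://master.foo.bar/myfile.csv .
With pachctl (which writes the file contents to standard output):
pachctl get file foo@master:/myfile.csv --project bar > myfile.csv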
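Both commands in each example address the same data: in an S3 URI of the form s3://<branch>.<repo>.<project>/<path>, the branch, repo, and project map directly onto the pachctl arguments. The hypothetical helper below sketches that mapping for branch buckets only (commit-prefixed buckets are rejected) and prints the pachctl command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate s3://<branch>.<repo>.<project>/<path> into
# the matching `pachctl get file` command. Commit-pinned buckets
# (four dot-separated components) are not handled here.
s3_uri_to_pachctl_get() {
  local uri="$1"
  local rest="${uri#s3://}"        # strip the scheme
  local bucket="${rest%%/*}"       # text before the first "/"
  local path="${rest#"$bucket"}"   # remaining "/path" (may be empty)
  local branch repo project extra
  IFS=. read -r branch repo project extra <<< "$bucket"
  if [ -z "$project" ] || [ -n "$extra" ]; then
    echo "expected <branch>.<repo>.<project>, got: $bucket" >&2
    return 1
  fi
  printf 'pachctl get file %s@%s:%s --project %s\n' \
    "$repo" "$branch" "${path:-/}" "$project"
}

s3_uri_to_pachctl_get s3://master.foo.bar/myfile.csv
# pachctl get file foo@master:/myfile.csv --project bar
```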
Port Forwarding
You can run pachctl port-forward to access the S3 gateway through the localhost:30600 endpoint. However, the Kubernetes port forwarder incurs substantial overhead and does not recover well from broken connections.
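A minimal sketch for the port-forwarded setup, assuming pachctl port-forward is already running and the gateway answers over plain HTTP on localhost:30600: a small wrapper avoids repeating --endpoint-url on every AWS CLI call. PACH_S3_ENDPOINT and pach_s3 are hypothetical names.

```shell
#!/usr/bin/env bash
# Assumes `pachctl port-forward` is running in another terminal and that the
# S3 gateway is reachable over plain HTTP at localhost:30600.
PACH_S3_ENDPOINT="${PACH_S3_ENDPOINT:-http://localhost:30600}"

# Hypothetical wrapper: run an `aws s3` subcommand against the Pachyderm gateway.
pach_s3() {
  aws --endpoint-url "$PACH_S3_ENDPOINT" s3 "$@"
}

# Example calls (require the port-forward and the AWS CLI):
#   pach_s3 ls s3://master.foo.bar/
#   pach_s3 cp s3://master.foo.bar/myfile.csv .
```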