
Export Your Data with egress

The egress field in the Pachyderm pipeline specification enables you to push the results of a pipeline to an external datastore such as Amazon S3, Google Cloud Storage, or Azure Blob Storage. After the user code has finished running, but before the job is marked as successful, Pachyderm pushes the data to the specified destination.

Use the egress protocol that corresponds to your storage provider:

| Cloud Platform | Protocol | Description |
| --- | --- | --- |
| Google Cloud Storage | gs:// | GCP uses the utility called gsutil to access GCP storage resources from a CLI. This utility uses the gs:// prefix to access these resources. Example: gs://gs-bucket/gs-dir |
| Amazon S3 | s3:// | The Amazon S3 storage protocol requires you to specify an s3:// prefix before the address of an Amazon resource. A valid address must include an endpoint and a bucket, and, optionally, a directory in your Amazon storage. Example: s3://s3-endpoint/s3-bucket/s3-dir |
| Azure Blob Storage | wasb:// | Microsoft Windows Azure Storage Blob (WASB) is the default Azure filesystem that outputs your data through HDInsight. To output your data to Azure Blob Storage, use the wasb:// prefix, the container name, and your storage account in the path to your directory. Example: wasb://default-container@storage-account/az-dir |

Example

"egress": {
   "URL": "s3://bucket/dir"
},
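
For context, the fragment above sits at the top level of a pipeline specification. The following is a minimal sketch of a complete specification, not a definitive configuration. The pipeline name export-results, the input repository data, the container image, and the copy command are all hypothetical placeholders chosen for illustration, and only the egress stanza comes from the example above.

```json
{
  "pipeline": {
    "name": "export-results"
  },
  "input": {
    "pfs": {
      "repo": "data",
      "glob": "/*"
    }
  },
  "transform": {
    "image": "ubuntu:20.04",
    "cmd": ["sh", "-c", "cp -r /pfs/data/* /pfs/out/"]
  },
  "egress": {
    "URL": "s3://bucket/dir"
  }
}
```

In this sketch, the user code writes its results to /pfs/out, and Pachyderm then pushes those results to s3://bucket/dir before it marks the job as successful. To target Google Cloud Storage or Azure Blob Storage instead, replace the URL value with a gs:// or wasb:// address.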

Last update: July 16, 2020