Export Your Data From Pachyderm
After you build a pipeline, you probably want to see the results that the pipeline has produced. Every commit into an input repository results in a corresponding commit into an output repository.
To access the results of a pipeline, you can use one of the following methods:
- By running the `pachctl get file` command. This command returns the contents of the specified file. To get the list of files in a repo, you should first run the `pachctl list file` command. See Export Your Data with `pachctl`.
- By configuring the pipeline. A pipeline can push or expose output data to external sources. You can configure the following data exporting methods in a Pachyderm pipeline:
    - An `egress` property enables you to export your data to an external datastore, such as Amazon S3, Google Cloud Storage, and others. See Export data by using `egress`.
    - A service. A Pachyderm service exposes the results of the pipeline processing on a specific port in the form of a dashboard or similar endpoint. See Service.
    - Configure your code to connect to an external data source. Because a pipeline is a Docker container that runs your code, you can egress your data to any data source, even to those that the `egress` field does not support, by connecting to that source from within your code.
- By using the S3 gateway. Pachyderm Enterprise users can reuse their existing tools and libraries that work with object store to export their data with the S3 gateway. See Using the S3 Gateway.
- By mounting your data to a local filesystem with `pachctl mount`. See Mount a Repo to a Local Computer.
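
As a rough illustration of the `egress` option, the following Python sketch builds a pipeline specification that includes an `egress` field and prints it as JSON. The pipeline name `edges`, the input repo `images`, the Docker image, the command, and the bucket URL are all hypothetical placeholders, and the field layout is an assumption based on the general shape of a Pachyderm pipeline spec, so check it against the pipeline specification reference for your version.

```python
import json


def build_egress_pipeline_spec(name, input_repo, image, cmd, egress_url):
    """Return a pipeline spec dict that includes an egress target.

    All argument values are supplied by the caller; nothing here
    talks to a Pachyderm cluster.
    """
    return {
        "pipeline": {"name": name},
        # Read every top-level file or directory of the input repo.
        "input": {"pfs": {"repo": input_repo, "glob": "/*"}},
        "transform": {"image": image, "cmd": cmd},
        # Assumed field layout: the external datastore that receives
        # the pipeline output.
        "egress": {"URL": egress_url},
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    spec = build_egress_pipeline_spec(
        name="edges",
        input_repo="images",
        image="example/edges:1.0",
        cmd=["python3", "/edges.py"],
        egress_url="s3://example-bucket/edges",
    )
    print(json.dumps(spec, indent=2))
```

You could save the printed JSON to a file and pass that file to `pachctl create pipeline -f` to create the pipeline. Keeping the spec in a small builder function makes it easy to point the same pipeline at a different datastore by changing only the URL.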