Learn about the concept of history (version control) in Pachyderm.

March 24, 2023

Pachyderm implements rich version-control and history semantics. This section describes the core concepts and architecture of Pachyderm’s version control and the various ways to use the system to access historical data.

The following abstractions store the history of your data:

Ancestry Syntax


Resolving ancestry syntax requires traversing chains of commits, so high numbers passed to ^ and low numbers passed to . can take a long time to resolve. If you plan to access the same ancestor repeatedly, consider resolving it once with pachctl inspect commit <repo>@<branch or commitID> and reusing the commit ID that the command reports.
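The traversal cost can be illustrated with a toy model. The following Python sketch is not Pachyderm code. It assumes a commit chain stored as child-to-parent links in which every commit also records its 1-based position. The `Commit` class and the `build_chain`, `resolve_caret`, and `resolve_dot` functions are hypothetical names used only for illustration. The sketch counts how many parent links must be followed to resolve `^n` and `.n` from the head of a branch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Commit:
    """Toy commit (hypothetical, not a Pachyderm type)."""
    commit_id: str
    index: int                      # assumed: 1-based position in the chain
    parent: Optional["Commit"] = None


def build_chain(length: int) -> Commit:
    """Create commits c1..c<length> and return the head (newest) commit."""
    head: Optional[Commit] = None
    for i in range(1, length + 1):
        head = Commit(commit_id=f"c{i}", index=i, parent=head)
    if head is None:
        raise ValueError("length must be at least 1")
    return head


def resolve_caret(head: Commit, n: int) -> Tuple[Commit, int]:
    """Resolve head^n by following n parent links back from the head."""
    commit, hops = head, 0
    for _ in range(n):
        if commit.parent is None:
            raise ValueError("ancestor does not exist")
        commit = commit.parent
        hops += 1
    return commit, hops


def resolve_dot(head: Commit, n: int) -> Tuple[Commit, int]:
    """Resolve head.n by walking back from the head until position n."""
    if not 1 <= n <= head.index:
        raise ValueError("position does not exist")
    commit, hops = head, 0
    while commit.index > n:
        commit = commit.parent
        hops += 1
    return commit, hops
```

In this model, with a chain of 100 commits, `resolve_caret(head, 90)` and `resolve_dot(head, 10)` both follow 90 links to reach the same commit, whereas `resolve_caret(head, 1)` follows only one. Storing the resolved commit ID once avoids repeating the walk.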

View the Pipeline History

Pipelines are the main processing primitive in Pachyderm. However, they expose version-control and history semantics similar to those of filesystem objects, largely because they are implemented in terms of filesystem objects under the hood. You can access previous versions of a pipeline by using the same ancestry syntax that works for commits and branches. For example, pachctl inspect pipeline foo^ gives you the previous version of the pipeline foo, and pachctl inspect pipeline foo.1 returns the first-ever version of that same pipeline. You can use this syntax in all operations and scripts that accept pipeline names.
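To make the two reference forms concrete, here is a minimal Python sketch of how `foo^`, `foo^2`, and `foo.1` could map onto a list of version numbers. It models only the semantics described above and is not how Pachyderm resolves references internally. The function name `resolve_pipeline_version` and the assumption that pipeline names contain neither `^` nor `.` are illustrative.

```python
import re
from typing import List

# name, optionally followed by "^" or "." and an optional number
_REF = re.compile(r"^(?P<name>[^\^.]+)(?:(?P<op>[\^.])(?P<n>\d*))?$")


def resolve_pipeline_version(ref: str, versions: List[int]) -> int:
    """Map a reference such as 'foo', 'foo^', 'foo^2' or 'foo.1' to a version.

    `versions` lists the existing version numbers, oldest first.
    """
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"unsupported reference: {ref!r}")
    op, n = match.group("op"), match.group("n")

    if op is None:
        # Plain name: the latest version.
        index = len(versions) - 1
    elif op == "^":
        # Count back from the latest version; a bare "^" means one step.
        steps = int(n) if n else 1
        index = len(versions) - 1 - steps
    else:
        # "." counts forward from the first version, starting at 1.
        if not n:
            raise ValueError("'.' must be followed by a number")
        index = int(n) - 1

    if not 0 <= index < len(versions):
        raise ValueError(f"no such version for {ref!r}")
    return versions[index]
```

For a pipeline with versions 1, 2, and 3, this sketch resolves `foo^` to version 2 and both `foo^2` and `foo.1` to version 1.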

To view historical versions of a pipeline, use the --history flag with the pachctl list pipeline command:

pachctl list pipeline --history all

System Response:

Pipeline2 1       input2:/* 4 hours ago running / success
Pipeline1 3       input1:/* 4 hours ago running / success
Pipeline1 2       input1:/* 4 hours ago running / success
Pipeline1 1       input1:/* 4 hours ago running / success

View the Job History

Jobs have no versioning semantics of their own. However, because each job is strongly associated with the pipeline that created it, jobs inherit some of the pipeline's versioning semantics. You can use the -p <pipeline> flag with the pachctl list job command to list all the jobs that were run for the latest version of the pipeline.

Furthermore, you can get jobs from multiple versions of a pipeline by passing the --history flag. For example, pachctl list job -p edges --history all returns all jobs from all versions of the pipeline edges.
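If you call these commands from a script, a small wrapper keeps the two variants in one place. The Python sketch below is one possible approach, not an official client. The helper names `list_job_command` and `run_list_job` are hypothetical, and the only `--history` value taken from this page is `all`.

```python
import shutil
import subprocess
from typing import List, Optional


def list_job_command(pipeline: str, history: Optional[str] = None) -> List[str]:
    """Assemble the argument list for `pachctl list job -p <pipeline>`.

    Without `history`, the command covers the latest pipeline version only.
    Passing history="all" adds `--history all`.
    """
    args = ["pachctl", "list", "job", "-p", pipeline]
    if history is not None:
        args += ["--history", history]
    return args


def run_list_job(pipeline: str, history: Optional[str] = None) -> str:
    """Run the command and return its standard output as text."""
    if shutil.which("pachctl") is None:
        raise RuntimeError("pachctl was not found on PATH")
    result = subprocess.run(
        list_job_command(pipeline, history),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `run_list_job("edges", history="all")` runs the same command as pachctl list job -p edges --history all.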