The Pachyderm platform combines version control for data with the tools to build scalable end-to-end ML/AI pipelines, while letting users develop their code in any language, framework, or tool of their choice. Pachyderm serves as a foundation for teams that want to use ML and AI to solve real-world problems reliably.

The Pachyderm platform includes the following main components:

  • Pachyderm File System (PFS)
  • Pachyderm pipelines

To start, you need to understand the foundational concepts of Pachyderm's data versioning and pipeline semantics. After you have a good grasp of the basics, you can use advanced concepts and features for more complicated challenges.

This section describes the following Pachyderm concepts:

Versioned Data Concepts     

Learn about the main abstractions that you work with when using Pachyderm.

Pipeline Concepts    

Learn the main concepts of the Pachyderm pipeline system.

Advanced Concepts     

Learn more about Pachyderm abstractions, including Global IDs, deferred processing, and distributed computing.

Last update: November 1, 2021