
Developer Workflow

In general, the developer workflow for Pachyderm involves adding data to versioned data repositories, creating pipelines to read from those repositories, executing the pipeline's code, and writing the pipeline's output to other data repositories. Both the data and the pipeline can be iterated on independently, with Pachyderm handling the code execution according to the pipeline specification. The workflow steps are shown below.

[Figure: Developer workflow]

Data Workflow - Load Your Data into Pachyderm

You need to add your data to Pachyderm so that your pipeline runs your code against it. You can do so by using one of the following methods:

  • By using the pachctl put file command
  • By using a special type of pipeline, such as a spout or cron
  • By using one of Pachyderm's language clients
  • By using a compatible S3 client

For more information, see Load Your Data Into Pachyderm.
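As a minimal sketch of the first method, the snippet below assembles a pachctl put file invocation of the form `pachctl put file <repo>@<branch>:<path> -f <source>`. The helper name `put_file_command`, the repository `images`, and the file `liberty.png` are hypothetical and used only for illustration. The snippet only builds and prints the command; running it requires a working pachctl installation and a reachable cluster.

```python
import shlex
from typing import List


def put_file_command(repo: str, branch: str, dest_path: str, source: str) -> List[str]:
    """Build the argv for `pachctl put file <repo>@<branch>:<path> -f <source>`."""
    # Paths inside a repository are written here as absolute paths.
    if not dest_path.startswith("/"):
        dest_path = "/" + dest_path
    return ["pachctl", "put", "file", f"{repo}@{branch}:{dest_path}", "-f", source]


# Hypothetical repository and file names, for illustration only.
cmd = put_file_command("images", "master", "liberty.png", "./liberty.png")
print(shlex.join(cmd))
# prints: pachctl put file images@master:/liberty.png -f ./liberty.png

# To run the command against a cluster instead of printing it:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the argument list separately from executing it keeps the command easy to inspect or log before any data is written.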

Pipeline Workflow - Processing Data in Pachyderm

The fundamental concepts of Pachyderm are very powerful, but the manual build steps mentioned in the pipeline workflow can become cumbersome during rapid-iteration development cycles. We've created a few helpful developer workflows and tools to automate steps that are error-prone or repetitive:

  • Build Pipelines map code changes into the pipeline using a default base Docker image, without rebuilding that image. They are most useful when you iterate on the code and rarely change the Docker image.
  • The build flag, --build, is an optional flag that can be passed to the create or update pipeline command. This option is most useful when you need to customize your Docker image or are iterating on the Docker image and code together, since it rebuilds and pushes the image before updating the pipeline.
  • CI/CD Integration provides a way to incorporate Pachyderm functions into the CI process. This is most useful when working with a complex project or for code collaboration.
  • create_python_pipeline is a Python-specific way to quickly update pipelines and was the predecessor to Build Pipelines. It is only available for Python through the Python Pachyderm package. This tool can be useful when using the Pachyderm IDE.
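To make the --build option more concrete, the following sketch generates a small pipeline specification and the matching create or update pipeline command. The helper names, the pipeline name `edges`, the input repository `images`, and the image tag `example/edges:dev` are assumptions chosen for illustration, and the specification contains only a basic set of fields (pipeline name, transform, and a single PFS input). Nothing is executed against a cluster.

```python
import json
from typing import Dict, List


def make_pipeline_spec(name: str, input_repo: str, image: str, cmd: List[str]) -> Dict:
    """Return a minimal pipeline specification as a plain dictionary."""
    return {
        "pipeline": {"name": name},
        # The transform names the Docker image and the command run inside it.
        "transform": {"image": image, "cmd": cmd},
        # A PFS input reads from a versioned repository; "/*" treats each
        # top-level file or directory as a separate unit of work.
        "input": {"pfs": {"repo": input_repo, "glob": "/*"}},
    }


def pipeline_command(spec_path: str, exists: bool, build: bool = False) -> List[str]:
    """Build the argv for `pachctl create pipeline` or `pachctl update pipeline`."""
    argv = ["pachctl", "update" if exists else "create", "pipeline", "-f", spec_path]
    if build:
        # --build rebuilds and pushes the image before the pipeline is updated.
        argv.append("--build")
    return argv


# Hypothetical names, for illustration only.
spec = make_pipeline_spec("edges", "images", "example/edges:dev", ["python3", "/edges.py"])
print(json.dumps(spec, indent=2))
print(" ".join(pipeline_command("edges.json", exists=True, build=True)))
# last line printed: pachctl update pipeline -f edges.json --build
```

In this sketch, the --build variant would fit when the Docker image itself changes, while the plain update command would be enough when only the specification changes.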

Last update: August 18, 2020