Create a Pipeline

To create a pipeline, you need to define a pipeline specification in YAML, JSON, or Jsonnet.

Before You Start

A basic pipeline must have all of the following:

  • pipeline.name: The name of your pipeline.
  • transform.cmd: The command that executes your user code.
  • transform.image: The image that contains your user code.
  • input.pfs.repo: The input repository that contains the data to transform.
  • input.pfs.glob: The glob pattern used to identify the shape of datums.
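
As a rough illustration of the checklist above, the following Python sketch checks a spec that has already been loaded into a dictionary for the required fields. The helper name missing_fields and the dotted-path convention are assumptions made for this example and are not part of Pachyderm. The field names follow the examples in this section, for instance transform.image.

```python
# Required fields for a basic pipeline, written as dotted paths into the spec.
REQUIRED_FIELDS = [
    "pipeline.name",
    "transform.cmd",
    "transform.image",
    "input.pfs.repo",
    "input.pfs.glob",
]


def missing_fields(spec):
    """Return the dotted paths from REQUIRED_FIELDS that are absent or empty.

    Hypothetical helper for illustration only; Pachyderm performs its own
    validation when a pipeline is created.
    """
    missing = []
    for path in REQUIRED_FIELDS:
        node = spec
        for key in path.split("."):
            # Stop descending as soon as a level is missing or not a mapping.
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if not node:
            missing.append(path)
    return missing


# A spec that only names the pipeline is missing the other four fields.
print(missing_fields({"pipeline": {"name": "edges"}}))
```

A check like this can catch an incomplete spec before it is submitted, but it is no substitute for the validation that Pachyderm itself applies.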

How to Create a Pipeline

Info
You can define multiple pipeline specifications in one file by placing the separator --- between them. This works in both JSON and YAML files.
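
To make the multi-spec file layout concrete, here is a minimal Python sketch that reads a JSON file containing several specs separated by ---. It assumes the separator sits on a line of its own. The function name split_specs and the second pipeline name (montage) are hypothetical. The sketch only illustrates the file layout and does not describe how Pachyderm parses such files.

```python
import json


def split_specs(text):
    """Split text on lines that contain only '---' and parse each chunk as JSON."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.strip() == "---":
            # A separator line closes the current spec.
            chunks.append("\n".join(current))
            current = []
        else:
            current.append(line)
    chunks.append("\n".join(current))
    # Skip empty chunks, for example from a leading or trailing separator.
    return [json.loads(chunk) for chunk in chunks if chunk.strip()]


# Two specs in one document; "montage" is a made-up second pipeline.
example = """{"pipeline": {"name": "edges"}}
---
{"pipeline": {"name": "montage"}}
"""

for spec in split_specs(example):
    print(spec["pipeline"]["name"])
```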

Examples

JSON

{
  "pipeline": {
    "name": "edges"
  },
  "description": "A pipeline that performs image edge detection by using the OpenCV library.",
  "transform": {
    "cmd": [ "python3", "/edges.py" ],
    "image": "pachyderm/opencv"
  },
  "input": {
    "pfs": {
      "repo": "images",
      "glob": "/*"
    }
  }
}

YAML

pipeline:
  name: edges
description: A pipeline that performs image edge detection by using the OpenCV library.
transform:
  cmd:
  - python3
  - "/edges.py"
  image: pachyderm/opencv
input:
  pfs:
    repo: images
    glob: "/*"

Considerations

  • When you create a pipeline, Pachyderm automatically creates an output repository with the same name as the pipeline. If a repo with that name already exists, the pipeline takes over its master branch. Files that were already stored in the repo remain in the HEAD of that branch.