Using python-pachyderm with Pachyderm IDE

If you deployed Pachyderm IDE, the python-pachyderm client is preinstalled in your Pachyderm IDE instance. This section describes a few basic operations that you can run from Pachyderm IDE to interact with Pachyderm.

After you log in, use the python-pachyderm client API to manage Pachyderm directly from your Jupyter notebook.

Create a new notebook and add your code to a new cell. To run the code, click the Run button.

The following code initializes the Python Pachyderm client in Pachyderm IDE:

import python_pachyderm
client = python_pachyderm.Client.new_in_cluster()

Note

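This function differs from the one you would call from a local machine, where you typically create the client with python_pachyderm.Client() instead. Use new_in_cluster() only for code that runs inside the cluster, such as a Pachyderm IDE notebook.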
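If the same notebook code must also run on a local machine, a small helper can pick the constructor for you. The following is a minimal sketch, not part of python-pachyderm: the helper names are hypothetical, and the check assumes that Kubernetes exposes the pachd service through the PACHD_SERVICE_HOST and PACHD_SERVICE_PORT environment variables.

```python
import os

# Assumption: these are the variables Kubernetes injects for the pachd service.
_CLUSTER_VARS = ("PACHD_SERVICE_HOST", "PACHD_SERVICE_PORT")


def running_in_cluster(environ=None):
    """Return True when every assumed pachd service variable is set."""
    environ = os.environ if environ is None else environ
    return all(environ.get(name) for name in _CLUSTER_VARS)


def make_client():
    """Create a client that suits the current environment (hypothetical helper)."""
    # Imported lazily so that running_in_cluster() works without the library.
    import python_pachyderm

    if running_in_cluster():
        # Inside Pachyderm IDE or any other pod in the cluster.
        return python_pachyderm.Client.new_in_cluster()
    # Outside the cluster: the plain constructor; pass host and port
    # if pachd is not reachable at the default local address.
    return python_pachyderm.Client()
```

In a notebook cell, `client = make_client()` then replaces the explicit `new_in_cluster()` call.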

For example, you can check the current user by running the following code:

import python_pachyderm
client = python_pachyderm.Client.new_in_cluster()
print(client.who_am_i())

The following screenshot demonstrates how this looks in Pachyderm IDE:

[Screenshot: JupyterHub whoami]

Note

If you have not enabled Pachyderm authentication, this code returns an error.
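If you are not sure whether authentication is enabled, you can wrap the call so that the notebook cell does not stop on the error. This is a minimal sketch: current_user is a hypothetical helper, and it catches a broad Exception on purpose because the exact exception type raised by the client is not specified here.

```python
def current_user(client):
    """Return the who_am_i() response, or None if the call fails."""
    try:
        return client.who_am_i()
    except Exception as err:  # broad on purpose; the exact error type is an assumption
        print(f"Could not determine the current user: {err}")
        return None
```

Calling `current_user(client)` with the client created above prints the reason and returns None when authentication is not enabled.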

Create a Pipeline

As discussed in Difference in Pipeline Creation Methods, you can use the standard create_pipeline method or create_python_pipeline function to create Pachyderm pipelines in Pachyderm IDE. Depending on your choice, use one of the following examples to create a pipeline:

  • By using the create_python_pipeline function:

    import os
    import python_pachyderm
    client = python_pachyderm.Client.new_in_cluster()
    
    def relpath(path):
        # Resolve the path against the notebook's working directory
        # (__file__ is not defined inside a notebook cell).
        return os.path.join(os.getcwd(), path)
    
    # Build and create a pipeline from the code in the local "test" directory.
    python_pachyderm.create_python_pipeline(
        client,
        relpath("test"),
        python_pachyderm.Input(pfs=python_pachyderm.PFSInput(glob="/*", repo="input_repo")),
    )
    client.list_pipeline()
    

    This code will likely not work as is. To run the pipeline, you need to provide main.py and requirements.txt files in the root directory of your JupyterHub notebook. For more information, see the OpenCV example.

  • By using the create_pipeline method:

    Note

    The input repository must exist. Therefore, in the code below we first create the repository and then create the pipeline.

    import python_pachyderm
    client = python_pachyderm.Client.new_in_cluster()
    # Create the input repository that the pipeline reads from.
    client.create_repo('input_repo')
    
    client.create_pipeline(
        "test",
        transform=python_pachyderm.Transform(cmd=["python3", "/test.py"], image="mytest/testimage"),
        input=python_pachyderm.Input(pfs=python_pachyderm.PFSInput(glob="/", repo="input_repo")),
    )
    client.list_pipeline()
    

    System response:

    pipeline_info {
     pipeline {
       name: "test"
     }
     transform {
       image: "mytest/testimage"
       cmd: "python3"
       cmd: "/test.py"
     }
     created_at {
       seconds: 1578520420
       nanos: 789017291
     }
     version: 1
     output_branch: "master"
     resource_requests {
       memory: "64M"
     }
     input {
       pfs {
         name: "input_repo"
         repo: "input_repo"
         branch: "master"
         glob: "/"
       }
     }
     cache_size: "64M"
     salt: "8e57267114c24419a685e250e0fd491b"
     max_queue_size: 1
     spec_commit {
       repo {
         name: "__spec__"
       }
       id: "c83631246a2142cdb93c7a6e4d16fcd2"
     }
     datum_tries: 3
    }
    

For more information, see the OpenCV example.
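The main.py and requirements.txt files that create_python_pipeline expects can be generated from a notebook cell. The following is a minimal sketch under stated assumptions: scaffold_pipeline_dir is a hypothetical helper, the files are assumed to belong in the directory that you pass to create_python_pipeline, and the generated main.py only copies files from /pfs/<input-repo> to /pfs/out as a placeholder for real pipeline code.

```python
import os

# Template for a placeholder main.py that copies the input to the output.
MAIN_PY = '''\
import os
import shutil

# Copy every file from the input repository to the output repository.
SRC = "/pfs/{input_repo}"
DST = "/pfs/out"

for root, _dirs, files in os.walk(SRC):
    for name in files:
        src_path = os.path.join(root, name)
        dst_path = os.path.join(DST, os.path.relpath(src_path, SRC))
        os.makedirs(os.path.dirname(dst_path), exist_ok=True)
        shutil.copy(src_path, dst_path)
'''


def scaffold_pipeline_dir(path, input_repo, requirements=()):
    """Write main.py and requirements.txt into path and return path."""
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "main.py"), "w") as f:
        f.write(MAIN_PY.format(input_repo=input_repo))
    # One requirement per line; an empty file is written if there are none.
    with open(os.path.join(path, "requirements.txt"), "w") as f:
        f.write("".join(f"{req}\n" for req in requirements))
    return path
```

For example, `scaffold_pipeline_dir("test", "input_repo")` prepares a `test` directory that matches the create_python_pipeline example above.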

Create a Repository

To create a repository, run the following code:

import python_pachyderm
client = python_pachyderm.Client.new_in_cluster()
client.create_repo('<repo-name>')
client.list_repo()

Example:

import python_pachyderm
client = python_pachyderm.Client.new_in_cluster()
client.create_repo('test')
client.list_repo()

System response:

[repo {
  name: "test"
}
created {
  seconds: 1576869000
  nanos: 886123695
}
auth_info {
  access_level: OWNER
}
]
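Rerunning a notebook cell that creates a repository that already exists may fail, so it can help to check the list first. The following is a minimal sketch: repo_names and ensure_repo are hypothetical helpers, and they assume that each item returned by list_repo() exposes the name as repo.name, as the response above suggests.

```python
def repo_names(client):
    """Return the names of all repositories visible to the client."""
    return [info.repo.name for info in client.list_repo()]


def ensure_repo(client, name):
    """Create the repository only if it is missing.

    Returns True if the repository was created, False if it already existed.
    """
    if name in repo_names(client):
        return False
    client.create_repo(name)
    return True
```

With the client created above, `ensure_repo(client, "test")` can be run repeatedly without creating the repository twice.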

Add Files to a Repository

To add a file from a URL to a repository, run the following code:

client.put_file_url("<repo-name>/<branch>", "<filename>", "<url-of-file>")

Example:

client.put_file_url("images/master", "46Q8nDz.jpg", "http://imgur.com/46Q8nDz.jpg")
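When you add several files, deriving the file name from the URL avoids repeating it. The following is a minimal sketch: put_url is a hypothetical helper that builds the "<repo-name>/<branch>" string and the file name, then calls put_file_url with the same three arguments as in the example above.

```python
import posixpath
from urllib.parse import urlparse


def put_url(client, repo, branch, url):
    """Add the file at url to repo/branch under the URL's base name."""
    # "http://host/dir/file.jpg" -> "file.jpg"
    filename = posixpath.basename(urlparse(url).path)
    commit = f"{repo}/{branch}"
    client.put_file_url(commit, filename, url)
    return commit, filename
```

For example, `put_url(client, "images", "master", "http://imgur.com/46Q8nDz.jpg")` is equivalent to the call shown above.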

Delete a Repository

To delete a repository, run the following code:

import python_pachyderm
client = python_pachyderm.Client.new_in_cluster()
client.delete_repo('test')
client.list_repo()

System response:

[]

Update Your Pipeline

To update a pipeline, modify the corresponding notebook in Pachyderm IDE and run it again. If you created the pipeline with the create_python_pipeline function, which uses code stored in a local directory, add the update=True parameter to the call, place the code in a new Jupyter notebook cell, and run it.

Example:

import os
import python_pachyderm
client = python_pachyderm.Client.new_in_cluster()
python_pachyderm.create_python_pipeline(
    client,
    "./edges",
    python_pachyderm.Input(pfs=python_pachyderm.PFSInput(glob="/*", repo="images")),
    update=True
)

If you use the standard create_pipeline method, rebuild your Docker image and push it to your image registry. Then update the image tag in your pipeline creation code and run create_pipeline with update=True.
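The create_pipeline update can be sketched as follows. This is an illustration under stated assumptions, not the documented procedure: with_tag and update_pipeline_image are hypothetical helpers, the image has already been rebuilt and pushed under the new tag, and the Transform and Input arguments mirror the create_pipeline example earlier on this page.

```python
def with_tag(image, tag):
    """Return image with its tag replaced, or added if it had none."""
    name, sep, last = image.rpartition(":")
    # A colon followed by a slash belongs to a registry port
    # (for example host:5000/image), not to a tag.
    if sep and "/" not in last:
        return f"{name}:{tag}"
    return f"{image}:{tag}"


def update_pipeline_image(client, name, image, tag, cmd, input_repo, glob="/"):
    """Point an existing pipeline at a new image tag (hypothetical helper)."""
    # Imported lazily so that with_tag() can be used without the library.
    import python_pachyderm

    client.create_pipeline(
        name,
        transform=python_pachyderm.Transform(cmd=cmd, image=with_tag(image, tag)),
        input=python_pachyderm.Input(
            pfs=python_pachyderm.PFSInput(glob=glob, repo=input_repo)
        ),
        update=True,  # update the existing pipeline instead of creating a new one
    )
```

For example, `update_pipeline_image(client, "test", "mytest/testimage", "v2", ["python3", "/test.py"], "input_repo")` reruns the earlier create_pipeline call with the image mytest/testimage:v2, where v2 is a placeholder tag.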

Example

To get started with Pachyderm IDE, try the OpenCV example for JupyterHub.

This example walks you through the same steps as in the Beginner Tutorial but using the python-pachyderm client instead of pachctl or the Pachyderm UI.


Last update: July 16, 2020