Client Initialization (Start Here)
The Pachyderm SDK lets you interact with Pachyderm's API, client, and configuration directly from Python.
1. Installation
Before using the Pachyderm SDK, make sure you have it installed. You can install the SDK using pip:
pip install pachyderm_sdk
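To confirm that the package is visible to the Python interpreter you plan to use, you can run a quick standard-library check. This is a minimal sketch: sdk_installed is a hypothetical helper, not part of the SDK. It only checks that the pachyderm_sdk module can be located, not which version is installed.

```python
import importlib.util


def sdk_installed(module_name: str = "pachyderm_sdk") -> bool:
    """Return True if `module_name` can be found by this interpreter.

    Hypothetical helper: it locates the top-level module without
    importing it and does not check its version.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("pachyderm_sdk installed:", sdk_installed())
```

If this prints False, the pip command above most likely installed the package into a different environment than the one running your script.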
2. Import the Client & API
To use the Client class and APIs, import them from pachyderm_sdk:
from pachyderm_sdk import Client
from pachyderm_sdk.api import pfs, pps
3. Create & Connect a Client Instance
To interact with a Pachyderm cluster, you need to create an instance of the Client class. The Client class provides several ways to create a client instance, depending on your requirements.
Basic Initialization
The simplest way to create a client instance is by calling the constructor without any parameters. This creates a client that connects to the local Pachyderm cluster running on localhost:30650 with default authentication settings.
client = Client()
You can customize the client settings by providing the relevant parameters to the Client constructor. Here’s an example:
client = Client(
host='localhost',
port=8080,
auth_token='your-auth-token',
root_certs=None,
transaction_id=None,
tls=False
)
In the above example, the client is configured to connect to a local Pachyderm cluster running on localhost:8080 without TLS encryption.
- The auth_token parameter specifies an authentication token for accessing the cluster.
- The root_certs parameter provides custom root certificates for secure connections.
- The transaction_id parameter specifies a transaction ID to run operations on.
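The auth_token parameter, together with the environment-variable fallback described in this section, boils down to a simple precedence rule: an explicitly passed token wins; otherwise the environment is consulted. The sketch below illustrates that rule with a hypothetical resolve_auth_token helper. It does not call the SDK, and the environment variable name is passed in as a parameter because the exact name held by the SDK's AUTH_TOKEN_ENV constant is not given here.

```python
import os
from typing import Mapping, Optional


def resolve_auth_token(
    explicit_token: Optional[str],
    env_var: str,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the token a client would use (hypothetical helper).

    An explicitly passed token takes precedence; otherwise the value of
    `env_var` in `environ` is returned, or None if it is unset.
    """
    if explicit_token is not None:
        return explicit_token
    return environ.get(env_var)


if __name__ == "__main__":
    fake_env = {"MY_PACH_TOKEN": "token-from-env"}  # illustrative variable name
    print(resolve_auth_token(None, "MY_PACH_TOKEN", fake_env))  # token-from-env
    print(resolve_auth_token("explicit", "MY_PACH_TOKEN", fake_env))  # explicit
```

With the SDK itself, setting a token after construction is done by assigning to the property, for example client.auth_token = 'your-auth-token'.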
If auth_token is not provided, the client falls back to the token stored in the AUTH_TOKEN_ENV environment variable. You can also set the authentication token after creating the client by assigning to its auth_token property.
From Config File
If you have a Pachyderm config.json file (the format pachctl writes, by default at ~/.pachyderm/config.json), you can create a client instance using the from_config method. It takes the path to the config file and connects using the file's active context:
from pachyderm_sdk import Client
# Reads the active context (pachd address, session token, etc.) from the config file
client = Client.from_config("config.json")
# Checks the version of Pachyderm to confirm the connection works
version = client.get_version()
print("Pachyderm Version:", version)
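To see what a client reads from such a file, the sketch below extracts the active context's pachd_address using only the standard library. It assumes the v2 layout pachctl writes ({"v2": {"active_context": ..., "contexts": {...}}}); pachd_address_from_config and active_pachd_address are hypothetical helpers, not the SDK's parser, and the exact schema may differ between Pachyderm versions.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


def pachd_address_from_config(config: Dict[str, Any]) -> str:
    """Return the active context's pachd_address (hypothetical helper).

    Assumes the pachctl v2 layout:
    {"v2": {"active_context": NAME, "contexts": {NAME: {"pachd_address": ...}}}}
    """
    v2 = config["v2"]
    active = v2["contexts"][v2["active_context"]]
    return active["pachd_address"]


def active_pachd_address(config_path: Union[str, Path]) -> str:
    """Load a config file from disk and return its active pachd_address."""
    path = Path(config_path).expanduser()
    return pachd_address_from_config(json.loads(path.read_text()))


if __name__ == "__main__":
    # Illustrative config with a single context named "local"
    sample = {
        "v2": {
            "active_context": "local",
            "contexts": {"local": {"pachd_address": "grpc://localhost:80"}},
        }
    }
    with tempfile.TemporaryDirectory() as tmp:
        config_file = Path(tmp) / "config.json"
        config_file.write_text(json.dumps(sample))
        print(active_pachd_address(config_file))  # grpc://localhost:80
```

Splitting the parsing from the file I/O keeps the lookup logic easy to test without touching the disk.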
From Within a Cluster
If you're running the code within a Pachyderm cluster, you can use the new_in_cluster method to create a client instance that operates within the cluster. This method reads the cluster configuration from the environment and creates a client based on the available configuration.
client = Client.new_in_cluster(auth_token='your-auth-token', transaction_id='your-transaction-id')
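Inside the cluster, Kubernetes exposes service locations to containers through environment variables such as PACHD_PEER_SERVICE_HOST and PACHD_PEER_SERVICE_PORT. The sketch below shows how an in-cluster address could be derived from them. in_cluster_address is a hypothetical helper, and it is an assumption that new_in_cluster relies on these same variables.

```python
import os
from typing import Mapping


def in_cluster_address(environ: Mapping[str, str] = os.environ) -> str:
    """Build "host:port" for pachd from environment variables (hypothetical helper).

    Raises RuntimeError if either variable is missing, for example when
    the code is not running inside the cluster.
    """
    host = environ.get("PACHD_PEER_SERVICE_HOST")
    port = environ.get("PACHD_PEER_SERVICE_PORT")
    if not host or not port:
        raise RuntimeError(
            "PACHD_PEER_SERVICE_HOST/PACHD_PEER_SERVICE_PORT not set; "
            "are you running inside the cluster?"
        )
    return f"{host}:{int(port)}"  # int() rejects a non-numeric port


if __name__ == "__main__":
    # Illustrative values only
    fake_env = {"PACHD_PEER_SERVICE_HOST": "10.0.0.5", "PACHD_PEER_SERVICE_PORT": "1650"}
    print(in_cluster_address(fake_env))  # 10.0.0.5:1650
```

Failing loudly when the variables are missing makes it obvious when in-cluster code is accidentally run on a local machine.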
Via Pachd Address
If you have the pachd address (host:port) of the Pachyderm cluster, you can create a client instance using the from_pachd_address method:
client = Client.from_pachd_address('pachd-address', auth_token='your-auth-token', root_certs='your-root-certs', transaction_id='your-transaction-id')
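A pachd address is usually written as host:port, and pachctl configs often prefix it with a scheme such as grpc:// (or grpcs:// for TLS). The sketch below splits such a string with urllib.parse. parse_pachd_address is a hypothetical helper; the fallback port of 30650 mirrors the basic-initialization default described earlier, and treating grpcs as "TLS on" is an assumption about the scheme convention. A real deployment may use a different default port, especially behind TLS.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class PachdAddress(NamedTuple):
    host: str
    port: int
    tls: bool


def parse_pachd_address(address: str, default_port: int = 30650) -> PachdAddress:
    """Split "[scheme://]host[:port]" into its parts (hypothetical helper)."""
    if "://" not in address:
        address = "grpc://" + address  # assume plain gRPC when no scheme is given
    parts = urlsplit(address)
    if parts.scheme not in ("grpc", "grpcs"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {address!r}")
    return PachdAddress(
        host=parts.hostname,
        port=parts.port or default_port,  # fall back when no port is given
        tls=parts.scheme == "grpcs",
    )


if __name__ == "__main__":
    print(parse_pachd_address("localhost:30650"))
    # PachdAddress(host='localhost', port=30650, tls=False)
    print(parse_pachd_address("grpcs://pachd.example.com"))
    # PachdAddress(host='pachd.example.com', port=30650, tls=True)
```

Prefixing a default scheme before calling urlsplit matters: without it, "localhost:30650" would be parsed with "localhost" as the scheme.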
Test Connection
If you’d like to quickly test out working with the Pachyderm SDK on your local machine (e.g., using a locally deployed Docker Desktop instance), try out the following:
from pachyderm_sdk import Client
client = Client(host="localhost", port=80)
version = client.get_version()
print(version)
Example Output
Version(major=2, minor=6, micro=4, git_commit='358bd1229130eb262c22caf82ed87b3cc91ec81c', git_tree_modified='false', build_date='2023-06-22T14:49:32Z', go_version='go1.20.5', platform='arm64')
If you see this, you are ready to start working with the SDK.
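Once get_version() succeeds, a script may also want to fail fast on an unexpected server version. The sketch below compares the major, minor, and micro fields shown in the example output against a minimum. The Version dataclass here is a stand-in with only those three fields, used so the example runs without a cluster; it is not the SDK's own class, and meets_minimum is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Version:
    """Stand-in for the fields shown in the example output (not the SDK class)."""

    major: int
    minor: int
    micro: int


def meets_minimum(version: Version, minimum: Tuple[int, int, int]) -> bool:
    """Return True if version >= minimum, comparing (major, minor, micro) in order."""
    return (version.major, version.minor, version.micro) >= minimum


if __name__ == "__main__":
    server = Version(major=2, minor=6, micro=4)  # values from the example output
    print(meets_minimum(server, (2, 6, 0)))  # True
    print(meets_minimum(server, (2, 7, 0)))  # False
```

With a real client, the same comparison can be applied to the object returned by client.get_version(), since it exposes major, minor, and micro as shown in the example output.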