Local Installation

Info

A local installation helps you learn some of the Pachyderm basics and experiment. It is not designed to be a production environment.

This guide walks you through the steps to install Pachyderm on macOS®, Linux®, or Microsoft® Windows®.

To install Pachyderm on Windows, take a look at Deploy Pachyderm on Windows first.

We offer two ways to deploy Pachyderm on a local Kubernetes cluster.

  • The first uses Pachyderm's client pachctl and the command pachctl deploy local.
  • The second uses the deployment tool Helm.

Info

  • Helm support in Pachyderm is a beta release. See our supported releases documentation for details.
  • pachctl deploy local is designed for a single-node cluster. This cluster uses local storage on disk and does not create a PersistentVolume (PV). You cannot add new Kubernetes nodes to it. To deploy a production multi-node cluster, follow the instructions for your cloud provider or on-prem installation as described in Deploy Pachyderm.
  • Pachyderm supports the Docker runtime only. If you want to deploy Pachyderm on a system that uses another container runtime, ask for advice in our Slack channel.

Prerequisites

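Before you deploy Pachyderm, make sure that you have installed the following, as described in the sections below:

  • A local Kubernetes cluster: Minikube or Kubernetes on Docker Desktop.
  • pachctl.
  • Helm, if you choose to deploy with Helm.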

Using Minikube

On your local machine, you can run Pachyderm in a minikube virtual machine. Minikube is a tool that creates a single-node Kubernetes cluster. This limited installation is sufficient to try basic Pachyderm functionality and complete the Beginner Tutorial.

To configure Minikube, follow these steps:

  1. Install minikube and VirtualBox on your operating system as described in the Kubernetes documentation.
  2. Install kubectl.
  3. Start minikube:

    minikube start
    

Note

Any time you want to stop and restart Pachyderm, run minikube delete and minikube start. Minikube is not meant to be a production environment and does not handle being restarted well without a full wipe.
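
The restart described in the note can be wrapped in a small shell function. This is a minimal sketch: the name reset_minikube is hypothetical, and the function uses only the two commands named above. Because minikube delete wipes the cluster, you need to deploy Pachyderm again afterward.

```shell
# Hypothetical helper: wipe and recreate the local Minikube cluster.
# Runs `minikube start` only if `minikube delete` succeeds.
reset_minikube() {
  minikube delete && minikube start
}
```

Run reset_minikube whenever you want a clean cluster.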

Using Kubernetes on Docker Desktop

If you are using Minikube, skip this section.

You can use Kubernetes on Docker Desktop instead of Minikube on macOS or Linux by following these steps:

  1. In the Docker Desktop Preferences, enable Kubernetes.

  2. From the command prompt, confirm that Kubernetes is running:

    kubectl get all
    
    NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
    service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   56d
    

  3. To reset your Kubernetes cluster that runs on Docker Desktop, click the Reset Kubernetes cluster button.

Install pachctl

pachctl is a command-line tool that you can use to interact with a Pachyderm cluster in your terminal.

You need to have pachctl installed on your machine to deploy Pachyderm using the pachctl deploy local command:

  1. Run the corresponding steps for your operating system:

    • For macOS, run:
    brew tap pachyderm/tap && brew install pachyderm/tap/pachctl@1.13
    
    • For 64-bit Debian-based Linux, or Windows 10 or later running WSL, run:
    curl -o /tmp/pachctl.deb -L https://github.com/pachyderm/pachyderm/releases/download/v1.13.0/pachctl_1.13.0_amd64.deb && sudo dpkg -i /tmp/pachctl.deb
    
    • For all other Linux flavors, run:
    curl -o /tmp/pachctl.tar.gz -L https://github.com/pachyderm/pachyderm/releases/download/v1.13.0/pachctl_1.13.0_linux_amd64.tar.gz && tar -xvf /tmp/pachctl.tar.gz -C /tmp && sudo cp /tmp/pachctl_1.13.0_linux_amd64/pachctl /usr/local/bin
    
  2. Verify that the installation was successful by running pachctl version --client-only:

    pachctl version --client-only
    

    System Response:

    COMPONENT           VERSION
    pachctl             1.13.0
    

    If you run pachctl version without the flag --client-only, the command times out. This is expected behavior because Pachyderm has not been deployed yet (pachd is not yet running).
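
If you want to use the client version in a script, you can extract it from the table shown above. This is a minimal sketch: the function name pachctl_client_version is hypothetical, and it assumes the two-column COMPONENT/VERSION layout from the system response.

```shell
# Hypothetical helper: print the pachctl client version.
# Reads the table printed by `pachctl version --client-only` from stdin
# and prints the VERSION column of the row whose COMPONENT is "pachctl".
pachctl_client_version() {
  awk '$1 == "pachctl" { print $2 }'
}
```

For example, pachctl version --client-only | pachctl_client_version prints only the version string.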

Architecture

A look at Pachyderm's high-level architecture diagram will help you build a mental image of Pachyderm's various architectural components.

Install Helm

If you choose to install Pachyderm using Helm, follow this installation guide.
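
Before you move on to deployment, you can check that the prerequisite tools are on your PATH. This is a minimal sketch: the function name missing_tools is hypothetical, and which tools you pass to it depends on the path you chose (for example, minikube only if you use Minikube, helm only if you deploy with Helm).

```shell
# Hypothetical helper: print each named command that is not on the PATH.
# Returns 0 when nothing is missing and 1 otherwise.
missing_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, missing_tools kubectl pachctl prints nothing when both tools are installed.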

Deploy Pachyderm

After you complete the Prerequisites, deploy Pachyderm on your local cluster by using one of the following methods:

Using pachctl

Tip

If you are new to Pachyderm, try Pachyderm Shell. This add-on tool suggests pachctl commands as you type. It will help you learn Pachyderm's main commands faster.

  • For macOS or Linux, run:

    pachctl deploy local
    

    This command generates a Pachyderm manifest and deploys Pachyderm on Kubernetes.

    Try the following dry run to visualize your manifest:

    pachctl deploy local --dry-run > pachyderm.json
    

  • For Windows:

    1. Start Windows Subsystem for Linux.
    2. In WSL, run:

      pachctl deploy local --dry-run > pachyderm.json
      
    3. Copy the pachyderm.json file into your working directory.

    4. From the same directory, run:

      kubectl create -f ./pachyderm.json
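
To get a quick overview of what the dry-run manifest contains, you can count the Kubernetes object kinds in it. This is a rough sketch, not a JSON parser: the function name manifest_kinds is hypothetical, and it matches every "kind" field textually, so nested "kind" fields (for example, inside role bindings) are counted too.

```shell
# Hypothetical helper: summarize the "kind" values found in a manifest.
# Reads JSON from stdin and prints one line per kind with its count.
manifest_kinds() {
  grep -o '"kind": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/' | sort | uniq -c
}
```

For example, run manifest_kinds < pachyderm.json in the directory that contains the generated file.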
      

Using Helm

  • Add the Pachyderm Helm repository and update it:

    helm repo add pachyderm https://pachyderm.github.io/helmchart
    
    helm repo update
    

  • Edit a values file my_pachyderm_values.yaml with pachd.storage.backend set to LOCAL:

    Find a baseline file for local deployments in this example repository and set the backend attribute to LOCAL.

    See also the reference values.yaml for an exhaustive list of all parameters, and the Helm installation documentation for more details.

  • Install the Pachyderm Helm chart (Helm v3):

    helm install pachd -f my_pachyderm_values.yaml pachyderm/pachyderm
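
As an illustration of the values file step, the following sketch writes a minimal my_pachyderm_values.yaml that sets only pachd.storage.backend to LOCAL. It assumes nothing beyond that one key. The baseline file in the example repository may contain additional settings, so treat this as a starting point rather than a complete configuration.

```shell
# Write a minimal values file with pachd.storage.backend set to LOCAL.
# Only this one key comes from the instructions above; add other
# settings from the baseline file as needed.
cat > my_pachyderm_values.yaml <<'EOF'
pachd:
  storage:
    backend: LOCAL
EOF
```

The dotted path pachd.storage.backend corresponds to the nested YAML keys shown in the file.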
    

Check your install

Check the status of the Pachyderm pods by periodically running kubectl get pods. When Pachyderm is ready for use, all Pachyderm pods are in the Running status.

Because Pachyderm needs to pull the Pachyderm Docker image from DockerHub, it might take a few minutes for the status of the Pachyderm pods to change to Running.

kubectl get pods

System Response:

NAME                     READY     STATUS    RESTARTS   AGE
dash-6c9dc97d9c-vb972    2/2       Running   0          6m
etcd-7dbb489f44-9v5jj    1/1       Running   0          6m
pachd-6c878bbc4c-f2h2c   1/1       Running   0          6m

If you see a few restarts on the pachd pods, that means that Kubernetes tried to bring up those pods before etcd was ready and therefore restarted them. You can safely ignore those restarts.
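
Instead of reading the table by eye, you can check the STATUS column with a small filter. This is a minimal sketch: the function name all_pods_running is hypothetical, and it assumes the column layout shown in the system response above, with STATUS as the third column.

```shell
# Hypothetical helper: succeed only if every listed pod is Running.
# Reads the table printed by `kubectl get pods` from stdin, skips the
# header row, and returns 0 when at least one pod is listed and all
# pods have STATUS "Running"; returns 1 otherwise.
all_pods_running() {
  awk 'NR > 1 && NF > 0 { total++; if ($3 != "Running") bad++ }
       END { if (total > 0 && bad == 0) exit 0; exit 1 }'
}
```

For example, kubectl get pods | all_pods_running && echo "Pachyderm is ready" prints the message only when all pods are in the Running status.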

  1. Run pachctl version to verify that pachd has been deployed.

    pachctl version
    

    System Response:

    COMPONENT           VERSION
    pachctl             1.13.0
    pachd               1.13.0
    
  2. Open a new terminal window.

  3. Use port forwarding to access the Pachyderm dashboard (Pachyderm UI).

    pachctl port-forward
    

    This command runs continuously and does not exit unless you interrupt it.

  4. Minikube users: alternatively, you can set up Pachyderm to connect directly to the Minikube instance:

    1. Get your Minikube IP address:

      minikube ip
      
    2. Configure Pachyderm to connect directly to the Minikube instance:

      pachctl config update context `pachctl config get active-context` --pachd-address=<minikube ip>:30650
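
The version check in step 1 can also be scripted. This is a minimal sketch: the function name versions_match is hypothetical, and it assumes that you want pachctl and pachd to report the same version, as in the system response shown in step 1.

```shell
# Hypothetical helper: check that client and server versions are equal.
# Reads the table printed by `pachctl version` from stdin and returns 0
# only when both a pachctl row and a pachd row exist and their VERSION
# values are identical strings; returns 1 otherwise.
versions_match() {
  awk '$1 == "pachctl" { client = $2 }
       $1 == "pachd"   { server = $2 }
       END { if (client != "" && (client "") == (server "")) exit 0; exit 1 }'
}
```

For example, pachctl version | versions_match && echo "client and server match".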
    

Next Steps

  • Complete the Beginner Tutorial to learn the basics of Pachyderm, such as adding data and building analysis pipelines.

  • Explore the Pachyderm Dashboard. By default, Pachyderm deploys the Pachyderm Enterprise dashboard. You can use a FREE trial token to experiment with the dashboard. Point your browser to port 30080 on your minikube IP. Alternatively, if you cannot connect directly, enable port forwarding by running pachctl port-forward, and then point your browser to localhost:30080.


Last update: April 6, 2021