Release Notes Highlights for Pachyderm

2.10.0

  • Feature: For enterprise customers using MLDM (Pachyderm) & MLDE (Determined) in a combined cluster environment, the MLDE Notebooks now include support for the Jupyter Pachyderm extension by default.
  • Feature: You can now set up and maintain metadata on your Pachyderm artifacts. This includes clusters, projects, repos, branches, and commits.
  • Enhancement: The blob storage configuration attribute GOCDK_ENABLED is now set to true by default in your Helm chart values; the option to disable it will be removed in 2.11.0.
  • Enhancement: The Console UI has undergone several improvements, including:
    • Improved file browsing experience
    • Improved DAG visualizations
      • Interactive DAG edge highlighting
      • Distinguishing colors and/or patterns based on pipeline types
      • Enhancing the ability to understand more about connections by their edges (like joins)
      • Visual indications of parallelism for pipelines when spec calls for it and when running
    • Pipeline and repo table paging

2.11.0

  • Feature: Users can now manage metadata (as key:value pairs) in Console for projects and repositories from the User Metadata tab of the details side panel.
  • Enhancement: Projects, pipelines, branches, and commits now include the following derived metadata by default: created_at, created_by, updated_at, and updated_by.
  • Feature: Pre-built Jsonnet templates are now available in Console when creating a pipeline:
    • Snowflake Integration: Creates a cron pipeline that can execute a query against a Snowflake database and return the results in a single output file.
    • Hugging Face Downloader: Creates a cron pipeline to download datasets or models from Hugging Face on demand.
  • Enhancement: Several enhancements have been made to improve the integration between Pachyderm and .
  • Feature: The Pachyderm SDK now has an extras package cdr that you can install (pip install pachyderm_sdk[cdr]) to make use of Common Data Refs (CDRs) in your user code. CDRs improve performance and speed by downloading version-controlled data directly from Pachyderm’s underlying Object Storage bucket and caching that data locally on your machine, allowing datasets to be assembled entirely locally and incrementally updated.
  • Security: The Pachyderm repository is now available at Iron Bank, a hardened container image repository owned and maintained by the U.S. Department of Defense (DoD) that supports the end-to-end lifecycle for modern software development. If you plan to download and install from Iron Bank, please reach out to ai-support@hpe.com or your Customer Success Engineer for assistance.
  • Notice: The gocdk_enabled attribute has been removed from the Helm Chart Values as it is now the default object storage driver.
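
The caching idea behind CDRs described above can be illustrated with a minimal sketch. This is not the pachyderm_sdk[cdr] API: LocalCache, assemble, fake_fetch, and the content-hash keying are hypothetical names and assumptions, chosen only to show how a local cache lets a dataset be assembled locally and updated incrementally.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Dict


class LocalCache:
    """Hypothetical content-addressed cache; not part of pachyderm_sdk."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.downloads = 0  # counts how often the remote store was hit

    def get(self, content_hash: str, fetch: Callable[[str], bytes]) -> bytes:
        path = self.root / content_hash
        if path.exists():
            return path.read_bytes()  # already cached: no download
        data = fetch(content_hash)
        self.downloads += 1
        path.write_bytes(data)
        return data


def assemble(manifest: Dict[str, str], cache: LocalCache,
             fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Build {file path: bytes} from a {file path: content hash} manifest."""
    return {name: cache.get(h, fetch) for name, h in manifest.items()}


# Stand-in for an object storage bucket, keyed by SHA-256 of the content.
_objects = {hashlib.sha256(b).hexdigest(): b for b in (b"alpha", b"beta", b"gamma")}
_hash = {b.decode(): h for h, b in _objects.items()}


def fake_fetch(content_hash: str) -> bytes:
    return _objects[content_hash]


cache = LocalCache(Path(tempfile.mkdtemp()))
v1 = assemble({"a.txt": _hash["alpha"], "b.txt": _hash["beta"]}, cache, fake_fetch)
after_v1 = cache.downloads
# The next version changes only b.txt, so only one new object is downloaded.
v2 = assemble({"a.txt": _hash["alpha"], "b.txt": _hash["gamma"]}, cache, fake_fetch)
after_v2 = cache.downloads
```

In this sketch the second assembly reuses the cached copy of a.txt and fetches only the changed object.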

2.12.2

  • Feature: Users can now snapshot and restore Pachyderm. A new Snapshot API allows you to Create, List, Delete, and Inspect snapshots.
  • Feature: Using Reference Inputs, users can now specify that changes to the files in a particular input should not result in datum reprocessing.
  • Feature: Users can now implement deferred processing with a mechanism called Conditional Propagation. This mechanism is intended to become more robust over time and ultimately replace branch triggers.
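
One way to picture the Reference Inputs behavior above is to derive a datum's identity only from its non-reference inputs. The sketch below is an illustration under that assumption, not Pachyderm's actual datum hashing; InputFile, its reference flag, datum_key, and needs_reprocessing are hypothetical names.

```python
import hashlib
from typing import Dict, NamedTuple


class InputFile(NamedTuple):
    content: bytes
    reference: bool  # hypothetical flag standing in for a Reference Input


def datum_key(inputs: Dict[str, InputFile]) -> str:
    """Hash only the non-reference inputs of a datum."""
    digest = hashlib.sha256()
    for name in sorted(inputs):
        item = inputs[name]
        if item.reference:
            continue  # changes here must not trigger reprocessing
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(hashlib.sha256(item.content).digest())
    return digest.hexdigest()


def needs_reprocessing(old: Dict[str, InputFile], new: Dict[str, InputFile]) -> bool:
    return datum_key(old) != datum_key(new)


before = {
    "data/img.png": InputFile(b"pixels", reference=False),
    "config/labels.json": InputFile(b"v1", reference=True),
}
ref_changed = dict(before, **{"config/labels.json": InputFile(b"v2", reference=True)})
data_changed = dict(before, **{"data/img.png": InputFile(b"new pixels", reference=False)})

skip = needs_reprocessing(before, ref_changed)    # only the reference input changed
rerun = needs_reprocessing(before, data_changed)  # a regular input changed
```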

2.6.0

  • Feature: Datum Batching is now available. Datum Batching is a performance optimization process that enables processing multiple datums sequentially.
  • Feature: The JupyterLab Pipeline Extension (PPS Extension) is now available, allowing users to push notebook code directly into a pipeline to create and run it. This feature is in Alpha, so we encourage you to share your feedback with us as you use it.
  • Enhancement: New RBAC roles have been added to Projects: ProjectViewerRole, ProjectWriterRole, ProjectOwnerRole, and ProjectCreatorRole. You can read about the roles here.
  • Enhancement: The Console UI has undergone some substantial improvements, including a revamped file browser and more detailed information about pipeline and job performance.
  • Enhancement: The Documentation site has undergone a substantial information architecture overhaul, making it easier to find the information you need. Content is now stored in top-level folders that follow the natural progression of learning about and using Pachyderm.
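
The Datum Batching feature above can be pictured with a small sketch. It assumes the benefit comes from running one-time setup once and then processing datums in sequence within the same process. It does not use any Pachyderm API, and load_model, process_datum, run_per_datum, and run_batched are hypothetical names.

```python
from typing import Iterable, List

setup_calls = 0


def load_model() -> dict:
    """Stand-in for expensive one-time setup, such as loading a model."""
    global setup_calls
    setup_calls += 1
    return {"scale": 2}


def process_datum(model: dict, datum: int) -> int:
    return datum * model["scale"]


def run_per_datum(datums: Iterable[int]) -> List[int]:
    """Without batching: user code starts fresh for each datum."""
    return [process_datum(load_model(), d) for d in datums]


def run_batched(datums: Iterable[int]) -> List[int]:
    """With batching: setup runs once, then datums are processed in sequence."""
    model = load_model()
    return [process_datum(model, d) for d in datums]


datums = [1, 2, 3]
unbatched = run_per_datum(datums)
calls_unbatched = setup_calls

setup_calls = 0
batched = run_batched(datums)
calls_batched = setup_calls
```

Both runs produce the same results; the batched run performs the setup step once instead of once per datum.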

2.7.0

  • Feature: The new Pachyderm SDK is now available. Check out the reference documentation, install guide, and example starter project.
  • Feature: Console now has a runtime visualization for jobs in your pipeline.
  • Feature: The documentation site now has a chatbot to help you find what you’re looking for. This feature is in beta, so please let us know if you have any feedback through our Slack community.
  • Feature: Pachyderm’s Helm chart now has a section for preflight checks, allowing you to easily validate whether the upgrade/migrations will be successful. This section can be found at pachd.preflightchecks. Simply set enabled: true and set the image.tag to the new version you want to upgrade to. If the created pod named pachyderm-preflight-check shows a status of Completed, you are ready to perform the upgrade. See the Upgrade steps for more information.
  • Enhancement: Console’s scalability has been improved to handle more concurrent users (50+) and power users who have many pipelines.
  • Enhancement: Console’s DAG visualization has been upgraded to include more information about the state of your pipelines.
  • Enhancement: The JupyterLab Pipeline Specification Extension now supports GPUs.
  • Refactor: The functionality of the Branch Cron Trigger has been refactored to work more intuitively. Previously, cron triggers functioned more like rate limiters; now, they enable you to set up a scheduled recurring event on a repo branch that evaluates and fires the trigger. When a Cron Trigger fires but no new data has been added, there are no new downstream commits or jobs. See our Cron glossary entry for more information on crons in Pachyderm.
  • Deprecation: The original Python SDK (python-pachyderm) will be deprecated in 9 months (May 2024). We recommend that you start trying out the new Pachyderm SDK (pachyderm-sdk) and begin planning your transition.
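
The refactored Branch Cron Trigger behavior described above can be sketched as a small model. This is an illustration of the stated semantics only, not Pachyderm's implementation; Branch, CronTrigger, and fire are hypothetical names, and a branch is reduced to the ID of its head commit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Branch:
    """Hypothetical model of a branch: just the ID of its head commit."""
    head: Optional[str] = None


@dataclass
class CronTrigger:
    source: Branch
    target: Branch
    jobs: List[str] = field(default_factory=list)

    def fire(self) -> bool:
        """Called on each scheduled tick; returns True if work was created."""
        if self.source.head == self.target.head:
            return False  # no new data: no downstream commit or job
        self.target.head = self.source.head
        self.jobs.append(f"job-for-{self.target.head}")
        return True


staging, master = Branch(), Branch()
trigger = CronTrigger(source=staging, target=master)

staging.head = "commit-1"
first = trigger.fire()   # new data since the last tick
second = trigger.fire()  # schedule fires again, nothing changed
staging.head = "commit-2"
third = trigger.fire()   # new data again
```

The second tick fires on schedule but creates nothing, because no data arrived since the first tick.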

2.8.0

  • Feature: You can now create and manage pipelines in Console! To showcase this, we’ve added Console steps to all of our tutorials.
  • Feature: You can now set global defaults for your cluster that are passed down to all pipeline specs. These defaults provide a consistent experience for your data scientists and help manage your cluster. You can manage defaults via the PachCTL CLI or within Console.
  • Beta: You can now try out a beta version of our Unified Deployment experience with Determined.
  • Update: Branch triggers now require the trigger branch to exist before adding a --trigger setting to the target branch.
  • Enhancement: All pipeline specification references have been standardized to use camelCase format; use this format going forward when creating pipeline specifications.
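
The way global defaults are passed down to pipeline specs, as described above, can be pictured as a layered merge. The sketch below assumes that values in a pipeline spec override cluster defaults key by key; the actual merge rules are not described in these notes. apply_defaults is a hypothetical helper, and the camelCase field names and values are illustrative only.

```python
import copy
from typing import Any, Dict


def apply_defaults(defaults: Dict[str, Any], spec: Dict[str, Any]) -> Dict[str, Any]:
    """Return spec layered over defaults; nested dicts are merged key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in spec.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_defaults(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative cluster-wide defaults (assumed values).
cluster_defaults = {
    "datumTries": 3,
    "resourceRequests": {"cpu": 1, "memory": "256Mi"},
}

# A pipeline spec that overrides only the memory request.
pipeline_spec = {
    "pipeline": {"name": "edges"},
    "resourceRequests": {"memory": "1Gi"},
}

effective = apply_defaults(cluster_defaults, pipeline_spec)
```

Here the pipeline keeps the default datumTries and cpu values and overrides only memory.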