- Feature: For enterprise customers using MLDM (Pachyderm) & MLDE (Determined) in a combined cluster environment, the MLDE Notebooks now include support for the Jupyter Pachyderm extension by default.
- Feature: You can now set up and maintain metadata on your Pachyderm artifacts. This includes clusters, projects, repos, branches, and commits.
- Enhancement: The blob storage configuration attribute `GOCDK_ENABLED` is now set to `true` by default in your Helm chart values; in 2.11.0 the option to disable it will be removed.
- Enhancement: The Console UI has undergone several improvements, including:
  - Improved file browsing experience
  - Improved DAG visualizations
    - Interactive DAG edge highlighting
    - Distinguishing colors and/or patterns based on pipeline types
    - Enhancing the ability to understand more about connections by their edges (like joins)
    - Visual indications of parallelism for pipelines when spec calls for it and when running
  - Pipeline and repo table paging
Release Notes Highlights for Pachyderm
2.10.0
2.11.0
- Feature: Users can now manage metadata (as key:value pairs) in Console for projects and repositories from the User Metadata tab of the details side panel.
- Enhancement: Projects, pipelines, branches, and commits now include the following derived metadata by default: `created_at`, `created_by`, `updated_at`, and `updated_by`.
- Feature: Pre-built Jsonnet templates are now available in Console when creating a pipeline:
  - Snowflake Integration: Creates a cron pipeline that can execute a query against a Snowflake database and return the results in a single output file.
  - Hugging Face Downloader: Creates a cron pipeline to download datasets or models from Hugging Face on demand.
- Enhancement: Several enhancements have been made to improve the integration between Pachyderm and .
- Feature: The Pachyderm SDK now has an extras package `cdr` that you can install (`pip install pachyderm_sdk[cdr]`) to make use of Common Data Refs (CDRs) in your user code. CDRs improve performance and speed by downloading version-controlled data directly from Pachyderm’s underlying Object Storage bucket and caching that data locally on your machine, allowing datasets to be assembled entirely locally and incrementally updated.
- Security: The Pachyderm repository is now available at Iron Bank, a hardened container image repository owned and maintained by the U.S. Department of Defense (DoD) that supports the end-to-end lifecycle for modern software development. If you plan to download and install from Iron Bank, please reach out to ai-support@hpe.com or your Customer Success Engineer for assistance.
- Notice: The `gocdk_enabled` attribute has been removed from the Helm Chart Values as it is now the default object storage driver.
2.12.2
- Feature: Users can now snapshot and restore Pachyderm. A new Snapshot API allows you to create, list, delete, and inspect snapshots.
- Feature: Using Reference Inputs, users can now specify that changes to the files in a particular input should not result in datum reprocessing.
- Feature: Users can now implement deferred processing with a mechanism called Conditional Propagation. This mechanism is intended to become more robust over time and ultimately replace branch triggers.
2.6.0
- Feature: Datum Batching is now available. Datum Batching is a performance optimization process that enables processing multiple datums sequentially.
- Feature: The JupyterLab Pipeline Extension (PPS Extension) is now available, allowing users to push notebook code directly into a pipeline to create and run it. This feature is in Alpha, so we encourage you to share your feedback with us as you use it.
- Enhancement: New RBAC roles have been added to Projects: `ProjectViewerRole`, `ProjectWriterRole`, `ProjectOwnerRole`, and `ProjectCreatorRole`. See the roles documentation for details.
- Enhancement: The Console UI has undergone some substantial improvements, including a revamped file browser and more detailed information about pipeline and job performance.
- Enhancement: The Documentation site has undergone a substantial information architecture overhaul, making it easier to find the information you need. Content is now stored in top-level folders that follow the natural progression of learning about and using Pachyderm.
2.7.0
- Feature: The new Pachyderm SDK is now available. Check out the reference documentation, install guide, and example starter project.
- Feature: Console now has a runtime visualization for jobs in your pipeline.
- Feature: The documentation site now has a chatbot to help you find what you’re looking for. This feature is in beta, so please let us know if you have any feedback through our Slack community.
- Feature: Pachyderm’s Helm chart now has a section for preflight checks, allowing you to easily validate whether the upgrade/migrations will be successful. This section can be found at `pachd.preflightchecks`. Simply set `enabled: true` and set the `image.tag` to the new version you want to upgrade to. If the created pod named `pachyderm-preflight-check` shows a status of `Completed`, you are ready to perform the upgrade. See the Upgrade steps for more information.
- Enhancement: Console’s scalability has been improved to handle more concurrent users (50+) and power users who have many pipelines.
- Enhancement: Console’s DAG visualization has been upgraded to include more information about the state of your pipelines.
- Enhancement: The JupyterLab Pipeline Specification Extension now supports GPUs.
- Refactor: The functionality of the Branch Cron Trigger has been refactored to work more intuitively. Previously, cron triggers functioned more like rate limiters; now, they enable you to set up a scheduled recurring event on a repo branch that evaluates and fires the trigger. When a Cron Trigger fires but no new data has been added, no new downstream commits or jobs are created. See our Cron glossary entry for more information on crons in Pachyderm.
- Deprecation: The original Python SDK (`python-pachyderm`) will be deprecated in 9 months (May 2024). We recommend that you start trying out the new Pachyderm SDK (`pachyderm-sdk`) and begin planning your transition.
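
The preflight-check settings described in the 2.7.0 notes above can be sketched as a small values override. This is a minimal illustration, not the chart's documented schema: the helper name `preflight_values` is hypothetical, and the placement of `image.tag` directly under `pachd.preflightchecks` is an assumption, so confirm the exact key names against the values file of your chart version before using it.

```python
import json


def preflight_values(target_version: str) -> dict:
    """Build a Helm values override that enables the preflight check.

    Assumption: `image.tag` is nested under `pachd.preflightchecks`.
    Verify the key names against your chart's values file.
    """
    if not target_version:
        raise ValueError("target_version must be a non-empty string")
    return {
        "pachd": {
            "preflightchecks": {
                "enabled": True,
                # The version you intend to upgrade to, not the running one.
                "image": {"tag": target_version},
            }
        }
    }


if __name__ == "__main__":
    # JSON is a subset of YAML, so this output can be saved to a file
    # and supplied to Helm as an additional values file.
    print(json.dumps(preflight_values("2.7.0"), indent=2))
```

Generating the override rather than editing the main values file keeps the preflight settings easy to drop once the check has completed and the upgrade itself begins.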
2.8.0
- Feature: You can now create and manage pipelines in Console! To showcase this, we’ve added Console steps to all of our tutorials.
- Feature: You can now set global defaults for your cluster that are passed down to all pipeline specs. These defaults provide a consistent experience for your data scientists and help manage your cluster. You can manage defaults via the PachCTL CLI or within Console.
- Beta: You can now try out a beta version of our Unified Deployment experience with Determined.
- Update: Branch triggers now require the trigger branch to exist before adding a `--trigger` setting to the target branch.
- Enhancement: All pipeline specification references have been standardized to use `camelCase` format; use this format going forward when creating pipeline specifications.
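
For existing specifications written with snake_case keys, the move to `camelCase` noted in 2.8.0 can be approximated with a small conversion script. This is a sketch under stated assumptions: the helpers `to_camel_case` and `camelize_spec` are hypothetical and not part of any Pachyderm tooling, the sample spec is illustrative only, and the `preserve` set assumes that maps such as `env` hold user-defined keys that must not be renamed. Review the converted output against the pipeline specification reference before applying it.

```python
import json


def to_camel_case(name: str) -> str:
    """Convert a snake_case name to camelCase; names without underscores pass through."""
    head, *rest = name.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def camelize_spec(spec, preserve=frozenset({"env"})):
    """Recursively rename dict keys to camelCase.

    Values stored under a key listed in `preserve` are copied unchanged,
    on the assumption that their keys are user data (for example
    environment variable names) rather than specification fields.
    """
    if isinstance(spec, dict):
        return {
            to_camel_case(key): (
                value if key in preserve else camelize_spec(value, preserve)
            )
            for key, value in spec.items()
        }
    if isinstance(spec, list):
        return [camelize_spec(item, preserve) for item in spec]
    return spec


if __name__ == "__main__":
    # Illustrative spec only; field values are made up for the example.
    legacy = {
        "pipeline": {"name": "edges"},
        "input": {"pfs": {"repo": "images", "glob": "/*"}},
        "transform": {"cmd": ["python3", "/edges.py"], "env": {"LOG_LEVEL": "debug"}},
        "parallelism_spec": {"constant": 2},
        "resource_requests": {"memory": "256Mi"},
    }
    print(json.dumps(camelize_spec(legacy), indent=2))
```

Keys that are already camelCase contain no underscores and are left as they are, so running the conversion twice produces the same result as running it once.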