Key Features

Key Features and Benefits

The following key features make Pachyderm a powerful data processing platform.

Data-driven Pipelines

Automatically trigger pipelines based on changes in the data.
Orchestrate batch or real-time data pipelines.
Process only the data that has changed and the results that depend on it.
Maintain reproducibility and data lineage across all pipelines.
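
The idea of processing only changed data can be illustrated with a small sketch. This is a conceptual model, not Pachyderm's implementation: commits are modeled as plain dictionaries of path to bytes, and the names `fingerprint`, `changed_paths`, and `run_pipeline` are hypothetical helpers invented for this example.

```python
import hashlib


def fingerprint(files):
    """Map each path to the SHA-256 hex digest of its bytes."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def changed_paths(previous, current):
    """Return paths that are new or whose content differs from the previous commit."""
    old = fingerprint(previous)
    new = fingerprint(current)
    return sorted(path for path, digest in new.items() if old.get(path) != digest)


def run_pipeline(previous, current, transform, cached_results):
    """Re-run `transform` only on changed inputs; reuse cached results for the rest.

    Hypothetical model: a real platform tracks this per unit of work and per
    commit, but the skip-if-unchanged logic is the same in spirit.
    """
    to_process = set(changed_paths(previous, current))
    results = {}
    for path, data in current.items():
        if path in to_process or path not in cached_results:
            results[path] = transform(data)
        else:
            results[path] = cached_results[path]
    return results, sorted(to_process)
```

With two successive commits where only one file changes and one file is added, `run_pipeline` would call `transform` on those two files and reuse the cached result for the untouched file.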

Version Control

Track every change to your data automatically.
Works with any file type.
Supports collaboration through a git-like structure of commits.

Autoscaling and Deduplication

Autoscale jobs based on resource demand.
Automatically parallelize the processing of large data sets.
Automatically deduplicate data across repositories.
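
Deduplication across repositories is commonly achieved with content-addressed storage, where identical bytes are stored once and referenced by their hash. The sketch below shows that general technique under stated assumptions; it is not Pachyderm's storage layer, and `ContentStore` with its methods is a hypothetical class written for this example.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store shared by several repositories."""

    def __init__(self):
        self.blobs = {}  # digest -> bytes, each unique content stored once
        self.repos = {}  # repo name -> {path: digest}

    def put(self, repo, path, data):
        """Record `path` in `repo`; store the bytes only if they are new."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        self.repos.setdefault(repo, {})[path] = digest
        return digest

    def get(self, repo, path):
        """Look up a file's bytes through its digest."""
        return self.blobs[self.repos[repo][path]]

    def stored_bytes(self):
        """Total bytes physically held, after deduplication."""
        return sum(len(blob) for blob in self.blobs.values())
```

Writing the same content under two different repositories leaves a single blob in the store, while both paths remain readable.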

Flexibility and Infrastructure Agnosticism

Use existing cloud or on-premises infrastructure.
Process any data type, size, or scale in batch or real-time pipelines.
Container-native architecture allows for developer autonomy.
Integrates with existing tools and services, including CI/CD, logging, authentication, and data APIs.