Pachyderm stores information about each datum that a pipeline processes, including timing information, size information,
and /pfs snapshots. You can view these statistics by running the
pachctl inspect datum command (or its language client equivalents).
In particular, Pachyderm provides the following information for each datum processed by your pipelines:
- The amount of data that was uploaded and downloaded
- The time spent uploading and downloading data
- The total time spent processing
- Success/failure information, including any error encountered for failed datums
- The directory structure of input data that was seen by the job
Run pachctl list datum <pipeline>@<job ID> to retrieve the list of datums processed by a given job, then pick the ID of the datum you want to inspect. These per-datum statistics can be useful when troubleshooting a failed job.
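
As a rough sketch, the list-then-inspect workflow described here could be wrapped in two small shell helpers. The function names list_datums and inspect_datum are hypothetical. The sketch also assumes that pachctl inspect datum takes the same <pipeline>@<job ID> argument followed by the datum ID, which this page does not state, so check pachctl inspect datum --help for your version before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names are hypothetical, and the argument layout of
# `pachctl inspect datum` is an assumption. Verify it with
# `pachctl inspect datum --help` for your pachctl version.

# List every datum processed by one job of a pipeline.
list_datums() {
  local pipeline="$1" job_id="$2"
  pachctl list datum "${pipeline}@${job_id}"
}

# Show the stored statistics for a single datum of that job.
inspect_datum() {
  local pipeline="$1" job_id="$2" datum_id="$3"
  pachctl inspect datum "${pipeline}@${job_id}" "${datum_id}"
}
```

After sourcing the file, you would call list_datums with a pipeline name and a job ID, copy a datum ID from its output, and pass all three values to inspect_datum. Keeping the two steps as separate functions mirrors the manual workflow, where the datum ID is chosen by a person after reading the list.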