Manage Commits & Delete Data

Learn how to delete and squash commits.

March 24, 2023

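If bad data was committed into a Pachyderm input repository, you might need to delete a commit or surgically delete files from your history. Depending on whether the bad data is in the HEAD commit of the branch, you can perform one of the following actions:

  • Delete the HEAD of a Branch, if the bad data is in the HEAD commit.
  • Delete Files from History, if the bad data is in an earlier commit.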

Additionally, although this is a separate use-case, you have the option to squash non-HEAD commits to rewrite your commit history.

Delete the HEAD of a Branch

To fix a broken HEAD, run the following command:

pachctl delete commit <commit-ID>
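As a minimal sketch of this step, the hypothetical helper below first lists the commits on the branch so you can confirm which one is the HEAD, then deletes it. The name delete_head and its arguments are illustrative and not part of pachctl. The sketch assumes pachctl is on your PATH and connected to a cluster.

```shell
# Hypothetical helper (not part of pachctl): review a branch, then delete its HEAD commit.
# Usage: delete_head <repo> <branch> <commit-ID>
delete_head() {
  repo="$1"
  branch="$2"
  commit_id="$3"

  # List the commits on the branch so you can confirm <commit-ID> is the HEAD.
  pachctl list commit "${repo}@${branch}" || return 1

  # Delete the broken HEAD commit with the command shown above.
  pachctl delete commit "${commit_id}"
}
```

For example, delete_head repo master <commit-ID> prints the history of repo@master and then deletes the given commit.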

When you delete a HEAD commit, Pachyderm performs the following actions:

⚠️

This command succeeds only if the HEAD commit has no children on any branch. If the HEAD commit has children, pachctl delete commit returns an error.

ℹ️

Are you wondering how a HEAD commit can have children?

A commit can be the head of a branch and still have children. For instance, given a master branch in a repository named repo, if you branch master by running pachctl create branch repo@staging --head repo@master, the master’s HEAD will have an alias child on staging.

Squash non-HEAD Commits

If your commit has children, you can use the squash commit command. Squashing rewrites your commit history, which helps clean up and simplify it before you share your work with team members. Squashing a commit in Pachyderm combines all the file changes in the commits of a global commit into their children and then removes the global commit. This behavior is inspired by the squash option in git rebase. No data stored in PFS is removed, since it remains in the child commits.

pachctl squash commit <commit-ID>
⚠️
  • Squashing a global commit on the head of a branch (no children) will fail. Use pachctl delete commit instead.
  • Squash commit only applies to user repositories. For example, you cannot squash a commit that updated a pipeline (a commit that lives in a spec repository).
  • Similarly to pachctl delete commit, pachctl squash commit stops (but does not delete) associated jobs.
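The first caveat above can be sketched as a small wrapper. The helper name squash_or_hint is hypothetical, and the sketch assumes only that pachctl squash commit exits with a non-zero status when it fails. It does not delete anything on its own; it only prints a reminder of the alternative command.

```shell
# Hypothetical helper (not part of pachctl): squash a commit, or print a hint on failure.
# Usage: squash_or_hint <commit-ID>
squash_or_hint() {
  commit_id="$1"

  if pachctl squash commit "${commit_id}"; then
    echo "squashed ${commit_id}"
  else
    # One possible cause: the commit is a HEAD with no children.
    # The hint is only a suggestion; nothing is deleted automatically.
    echo "squash failed; if ${commit_id} is a HEAD with no children, use: pachctl delete commit ${commit_id}" >&2
    return 1
  fi
}
```

Leaving the deletion as a manual step is deliberate: a failed squash can have other causes, such as targeting a commit in a spec repository.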

Example

In the simple example below, we create three successive commits on the master branch of a repository named repo:

We then run pachctl squash commit ID1, then pachctl squash commit ID2, and look at our branch and remaining commit(s).

Squash example

At any moment, pachctl list file repo@master returns the same files: A’, B, C’. The output of pachctl list commit, however, differs in each case because squashing commits removes them from the branch.
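The invariant described above can be checked with a short sketch. The helper name files_unchanged_after_squash is hypothetical, and the comparison assumes that the text printed by pachctl list file is identical between calls when the files are unchanged.

```shell
# Hypothetical helper (not part of pachctl): squash commits and verify that
# the file listing at the branch head is the same before and after.
# Usage: files_unchanged_after_squash <repo>@<branch> <commit-ID>...
files_unchanged_after_squash() {
  branch="$1"
  shift

  # Capture the file listing before squashing.
  before="$(pachctl list file "${branch}")" || return 1

  # Squash each commit in the order given.
  for commit_id in "$@"; do
    pachctl squash commit "${commit_id}" || return 1
  done

  # Capture the listing again and compare the raw text (assumption: stable output).
  after="$(pachctl list file "${branch}")" || return 1
  [ "${before}" = "${after}" ]
}
```

With the example above, files_unchanged_after_squash repo@master ID1 ID2 would return success if the listing is the same after both squashes.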

Delete Files from History

📖

This procedure only works in simple cases where the “bad” changes were made relatively recently; any pipeline update made since then makes it impossible.

In rare cases, you might need to delete a particular file from a given commit and further choose to delete its complete history. In such a case, you will need to:

Example

In the simple example below, we want to delete file C in commit 2. How would we do that?

For now, pachctl list file repo@master returns the files A’, B, C’, E, F.

Delete data example
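The exact steps are not listed on this page, so the following is only a sketch of one plausible first step: removing the file from the branch in a new commit. The helper name remove_file_in_new_commit and the path /C are hypothetical. Removing the file's earlier history would additionally require squashing the commits that contain it with pachctl squash commit, as described in the previous section.

```shell
# Hypothetical helper (not part of pachctl): remove a file in a new commit.
# Assumption: start commit / delete file / finish commit is the intended workflow.
# Usage: remove_file_in_new_commit <repo> <branch> <path>
remove_file_in_new_commit() {
  repo="$1"
  branch="$2"
  file_path="$3"

  # Open a new commit on the branch.
  pachctl start commit "${repo}@${branch}" || return 1

  # Remove the file in the open commit. If this fails, the commit stays open
  # and must be finished or deleted by hand.
  pachctl delete file "${repo}@${branch}:${file_path}" || return 1

  # Close the commit so the change becomes the new HEAD.
  pachctl finish commit "${repo}@${branch}" || return 1

  # List the files at the new HEAD to confirm the file is gone.
  pachctl list file "${repo}@${branch}"
}
```

For the example above, the call would look like remove_file_in_new_commit repo master /C, assuming the file is stored at /C.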