A Pachyderm repository is a location where you store your data inside Pachyderm. It is a top-level data object that contains files and folders. Similar to Git, a Pachyderm repository tracks all changes to the data and creates a history of data modifications that you can access and review. You can store any type of file in a Pachyderm repo, including binary and plain text files.
Unlike a Git repository, which stores history in a `.git` directory in your local copy of the repo, Pachyderm stores the history of your commits in a centralized location. Because of that, you do not run into the merge conflicts that often occur in Git when you try to merge your local `.git` history with the master copy of the repo. With large datasets, resolving such a merge conflict might not be possible.
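Because the commit history lives in the cluster rather than in a local copy, you can review it with `pachctl` from any machine that is connected to that cluster. The following is a minimal sketch, assuming `pachctl` is installed and configured; the `show_history` function and the `PACHCTL` override variable are hypothetical conveniences, not part of Pachyderm.

```shell
#!/bin/sh
# Sketch: print the commit history of one branch of a repository.
# Assumes pachctl is installed and connected to a running cluster.
# PACHCTL is a hypothetical override; set PACHCTL=echo to print the command instead.
PACHCTL="${PACHCTL:-pachctl}"

# show_history REPO [BRANCH]  (hypothetical helper; BRANCH defaults to master)
show_history() {
  "$PACHCTL" list commit "$1@${2:-master}"
}
```

For example, `show_history raw_data` runs `pachctl list commit raw_data@master`.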
A Pachyderm repository is the first entity that you configure when you want to add data to Pachyderm. You can create a repository by running the `pachctl create repo` command or by using the Pachyderm UI. After creating the repository, you can add your data by using the `pachctl put file` command.
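The two steps above can be combined in a small helper script. This is a minimal sketch, assuming `pachctl` is connected to a running cluster and that the file should be committed to the `master` branch; the `create_and_load` function and the `PACHCTL` override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: create an input repository and load one local file into it.
# Assumes pachctl is installed and connected to a running cluster.
# PACHCTL is a hypothetical override; set PACHCTL=echo for a dry run.
PACHCTL="${PACHCTL:-pachctl}"

# create_and_load REPO LOCAL_FILE  (hypothetical helper)
create_and_load() {
  repo="$1"
  local_file="$2"
  # Stop if the repository cannot be created.
  "$PACHCTL" create repo "$repo" || return 1
  # Commit the file to the master branch, stored under its base name.
  "$PACHCTL" put file "$repo@master:/$(basename "$local_file")" -f "$local_file"
}
```

Setting `PACHCTL=echo` before calling `create_and_load raw_data ./sample.csv` prints the two commands instead of running them, which is a quick way to check the `repo@branch:/path` syntax before touching a real cluster.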
The following types of repositories exist in Pachyderm:
- Input repositories: users or external applications outside of Pachyderm can add data to input repositories for further processing.
- Output repositories: Pachyderm automatically creates an output repository for each pipeline, and the pipeline writes the results of its computations into that repository. Any data that your code writes to the `/pfs/out` directory within the pipeline user container is written to that pipeline's output repository.
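To illustrate how a pipeline fills its output repository, here is a minimal sketch of pipeline user code, assuming a pipeline whose single input is the `raw_data` repository mounted at its default location, `/pfs/raw_data`. The line-counting transform is hypothetical, and so is the `PFS_ROOT` variable, which exists only so the script can be tried outside a pipeline.

```shell
#!/bin/sh
# Sketch of pipeline user code: count the lines of each file in the input
# repository raw_data and write one result file per input to the output directory.
# PFS_ROOT is a hypothetical override for local testing; inside a pipeline
# the default /pfs is used.
PFS_ROOT="${PFS_ROOT:-/pfs}"

count_lines() {
  for f in "$PFS_ROOT"/raw_data/*; do
    # Skip anything that is not a regular file, including the
    # unexpanded glob pattern when the input directory is empty.
    [ -f "$f" ] || continue
    # Inside a pipeline, files written under /pfs/out are committed
    # to the pipeline's output repository.
    wc -l < "$f" | tr -d ' ' > "$PFS_ROOT/out/$(basename "$f").count"
  done
}

count_lines
```

The script never creates the output repository itself; Pachyderm does that when the pipeline is created.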
You can view the list of repositories in your Pachyderm cluster by running the `pachctl list repo` command:

    $ pachctl list repo
    NAME     CREATED     SIZE (MASTER)
    raw_data 6 hours ago 0B

The `pachctl inspect repo` command provides a more detailed overview of a specified repository:

    $ pachctl inspect repo raw_data
    Name: raw_data
    Description: A raw data repository
    Created: 6 hours ago
    Size of HEAD on master: 5.121MiB

If you need to delete a repository, you can run the `pachctl delete repo` command. This command deletes all data and information about the specified repository, such as its commit history. The delete operation is irreversible and results in a complete cleanup of your Pachyderm repository.
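Because deletion cannot be undone, you might wrap the command in a confirmation prompt. The following is a minimal sketch, assuming `pachctl` is connected to your cluster; the `delete_repo` function and the `PACHCTL` override variable are hypothetical.

```shell
#!/bin/sh
# Sketch: delete one repository only after an explicit "y" confirmation.
# PACHCTL is a hypothetical override; set PACHCTL=echo for a dry run.
PACHCTL="${PACHCTL:-pachctl}"

# delete_repo REPO  (hypothetical helper)
delete_repo() {
  repo="$1"
  # Prompt on stderr so that stdout carries only pachctl's output.
  printf 'Delete repo %s and its entire commit history? [y/N] ' "$repo" >&2
  read -r answer
  case "$answer" in
    y|Y) "$PACHCTL" delete repo "$repo" ;;
    *) echo "Aborted; $repo was not deleted." >&2; return 1 ;;
  esac
}
```

Any answer other than `y` or `Y` leaves the repository untouched and returns a non-zero status, so the helper is safe to use in scripts that check exit codes.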
If you run the `pachctl delete repo` command with the `--all` flag, Pachyderm
deletes all repositories in this