
S3 Gateway Supported Operations

Learn which S3 Gateway operations are supported.

The Pachyderm S3 gateway supports the following operations:

📖

When using the AWS S3 CLI, append --profile <name-your-profile> to the end of your command to use a specific profile. If you omit it, the session token is retrieved from the default profile. For more information, see the Configure your S3 client page.

For example, in the Create Bucket section below, the command would become:

aws --endpoint-url http://localhost:30600/ s3 mb s3://master.test --profile <name-your-profile>
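
To avoid retyping the endpoint and profile on every call, the commands on this page can be wrapped in a small shell function. This is a minimal sketch, not part of Pachyderm or the AWS CLI: pach_s3, PACH_S3_ENDPOINT, and PACH_S3_PROFILE are hypothetical names chosen for illustration, and the default endpoint is the one used in the examples on this page.

```shell
# Hypothetical wrapper around the AWS CLI calls shown on this page.
# PACH_S3_ENDPOINT and PACH_S3_PROFILE are illustrative variable names;
# neither Pachyderm nor the AWS CLI reads them on its own.
pach_s3() {
  if [ -n "${PACH_S3_PROFILE:-}" ]; then
    # A profile was requested: append --profile at the end of the command.
    aws --endpoint-url "${PACH_S3_ENDPOINT:-http://localhost:30600/}" s3 "$@" --profile "$PACH_S3_PROFILE"
  else
    # No profile set: the AWS CLI falls back to the default profile.
    aws --endpoint-url "${PACH_S3_ENDPOINT:-http://localhost:30600/}" s3 "$@"
  fi
}
```

With this function defined, pach_s3 mb s3://master.test runs the same command as above against the default profile, and setting PACH_S3_PROFILE beforehand adds the --profile option.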

Create Bucket

Use your S3 client's create-bucket command to create a branch in a Pachyderm repository. For example, let’s create the master branch of the repo test.

  1. In MinIO,
    • this would look like:
      mc mb local/master.test
      System Response:
      Bucket created successfully `local/master.test`.
    • verify that the S3 bucket has been successfully created:
      mc ls local
      System Response:
      [2021-04-26 22:46:08]   0B master.test/
  2. If you are using AWS S3 CLI,
    • this would look like:
      aws --endpoint-url http://localhost:30600/ s3 mb s3://master.test
      System Response:
      make_bucket: master.test
    • verify that the S3 bucket has been successfully created:
      aws --endpoint-url http://localhost:30600/ s3 ls
      System Response:
      2021-04-26 22:46:08 master.test
ℹ️

You can also use the `pachctl list repo` command to view the list of repositories. The newly created repository should appear in this list.

Delete Bucket

Use your S3 client's command for deleting an empty S3 bucket to delete a Pachyderm repository.

⚠️

The repo must be completely empty.

  1. In MinIO,
    mc rb local/master.test
    System Response:
    Removed `local/master.test` successfully.
  2. If you are using AWS S3 CLI,
    aws --endpoint-url http://localhost:30600/ s3 rb s3://master.test
    System Response:
    remove_bucket: master.test
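
Because the repo must be empty, it can help to check the listing before attempting the removal. The following is a minimal sketch under stated assumptions: remove_if_empty is a hypothetical helper name, the endpoint is the one used on this page, and an empty `s3 ls` listing is taken to mean the bucket holds no objects.

```shell
# Hypothetical guard: remove <branch>.<repo> only when its listing is empty.
# The body runs in a subshell ( ... ) so its variable does not leak
# into the calling shell, and `exit` only leaves the function.
remove_if_empty() (
  listing="$(aws --endpoint-url http://localhost:30600/ s3 ls "s3://$1/")" || exit 1
  if [ -n "$listing" ]; then
    echo "bucket $1 is not empty; remove its objects first" >&2
    exit 1
  fi
  aws --endpoint-url http://localhost:30600/ s3 rb "s3://$1"
)
```

For example, remove_if_empty master.test issues the rb command shown above only if the listing comes back empty, and otherwise stops with a message on standard error.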

List Buckets

List the buckets exposed by the S3 gateway by running your S3 client's ls command. Each bucket corresponds to a branch of a Pachyderm repository.

  1. In MinIO,

    mc ls local

    System Response:

    [2021-04-26 15:09:50 PDT]      0B master.train/
    [2021-04-26 14:58:50 PDT]      0B master.pre_process/
    [2021-04-26 14:58:09 PDT]      0B master.split/
    [2021-04-26 14:58:09 PDT]      0B stats.split/
  2. If you are using AWS S3 CLI,

    aws --endpoint-url http://localhost:30600 s3 ls

    System Response:

    2021-04-26 15:09:50 master.train
    2021-04-26 14:58:50 master.pre_process
    2021-04-26 14:58:09 master.split
    2021-04-26 14:58:09 stats.split
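
The bucket names in these listings follow the <branch>.<repo> pattern used throughout this page, so they can be split back into a repo and a branch. This is a minimal sketch: the function names are hypothetical, and it assumes the two-part naming shown here with no dot inside the branch name.

```shell
# Split a bucket name of the form <branch>.<repo> into its parts.
# Assumes the branch name itself contains no dot.
bucket_branch() { printf '%s\n' "${1%%.*}"; }
bucket_repo()   { printf '%s\n' "${1#*.}"; }

# Read `aws s3 ls` bucket lines ("DATE TIME NAME") on stdin
# and print one "repo branch" pair per line.
list_branches() {
  while read -r _date _time name; do
    [ -n "$name" ] || continue   # skip blank or malformed lines
    printf '%s %s\n' "$(bucket_repo "$name")" "$(bucket_branch "$name")"
  done
}
```

A possible use is to pipe the AWS S3 CLI listing into it: aws --endpoint-url http://localhost:30600 s3 ls | list_branches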

List Objects

For example, list the contents of the master branch of the repository raw_data.

  1. In MinIO,

    mc ls local/master.raw_data

    System Response:

    [2021-04-26 12:11:37 PDT]  2.6MiB github_issues_medium.csv
  2. If you are using AWS S3 CLI,

    aws --endpoint-url http://localhost:30600/ s3 ls s3://master.raw_data

    System Response:

    2021-04-26  11:22:23    2685061 github_issues_medium.csv

Write Object

For example, add the test.csv file to the master branch of the raw_data repository, where raw_data is an input repository.

ℹ️

Not all the repositories that you see in the results of the ls command can be written to. Some of them might be read-only. You need write access to the input repo to add files to it.

  1. In MinIO,

    • this would look like:

      mc cp test.csv local/master.raw_data/test.csv

      System Response:

      test.csv:                  62 B / 62 B  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓  100.00% 206 B/s 0s
    • Check that the file was added:

      mc ls local/master.raw_data

      System Response:

      [2021-04-26 12:11:37 PDT]  2.6MiB github_issues_medium.csv
      [2021-04-26 12:11:37 PDT]     62B test.csv
  2. If you are using AWS S3 CLI,

    • this would look like:
      aws --endpoint-url http://localhost:30600/ s3 cp test.csv s3://master.raw_data
      System Response:
      upload: ./test.csv to s3://master.raw_data/test.csv
    • Check that the file was added:
      aws --endpoint-url http://localhost:30600/ s3 ls s3://master.raw_data/
      System Response:
      2021-04-26 12:11:37  2685061 github_issues_medium.csv
      2021-04-26 12:11:37       62 test.csv
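
The upload and the follow-up check can be combined into one step. The sketch below is illustrative only: put_and_verify is a hypothetical helper name, the endpoint is the one used on this page, and the check assumes object keys without spaces, since it compares the last field of each `s3 ls` line.

```shell
# Hypothetical helper: upload a local file to <branch>.<repo> and
# confirm that it appears in the bucket listing afterwards.
# Usage: put_and_verify <local-file> <branch>.<repo>
# The body runs in a subshell ( ... ) so `name` stays local.
put_and_verify() (
  name="$(basename "$1")"
  aws --endpoint-url http://localhost:30600/ s3 cp "$1" "s3://$2/$name" || exit 1
  # The object key is the last field of each listing line;
  # awk exits 0 only if one of them equals the uploaded file name.
  aws --endpoint-url http://localhost:30600/ s3 ls "s3://$2/" |
    awk -v key="$name" '$NF == key { found = 1 } END { exit !found }'
)
```

For example, put_and_verify test.csv master.raw_data returns a zero exit status only if the copy succeeds and test.csv then shows up in the listing.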

Get Object

For example, download the file github_issues_medium.csv from the master branch of the repo raw_data.

  1. In MinIO,

    mc cp local/master.raw_data/github_issues_medium.csv .

    System Response:

    github_issues_medium.csv:  2.56 MiB / 2.56 MiB  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 100.00% 1.26 MiB/s 2s
  2. If you are using AWS S3 CLI,

    aws --endpoint-url http://localhost:30600/ s3 cp s3://master.raw_data/github_issues_medium.csv .

    System Response:

    download: s3://master.raw_data/github_issues_medium.csv to ./github_issues_medium.csv

Remove Object

For example, delete the file test.csv in the HEAD of the master branch of the raw_data repo.

  1. In MinIO,

    mc rm local/master.raw_data/test.csv

    System Response:

    Removing `local/master.raw_data/test.csv`.
  2. If you are using AWS S3 CLI,

    aws --endpoint-url http://localhost:30600/ s3 rm s3://master.raw_data/test.csv

    System Response:

    delete: s3://master.raw_data/test.csv