Configure a DVC Remote Storage

Problem

The xFusionCorp Industries ML team uses SeaweedFS as the shared S3-compatible object store for DVC-tracked data. A .dvc/config already declares a remote called s3 for the fraud-detection project, but dvc push currently fails. Correct the configuration and push the tracked data into the SeaweedFS bucket.

  1. A project exists at /root/code/fraud-detection/ with DVC initialised and data/raw/transactions.csv already tracked.

  2. SeaweedFS is already running on the controlplane:

    • S3 endpoint: http://localhost:8333
    • Filer UI: open the SeaweedFS Filer button at the top of the lab (forwarded port 8888) – buckets are visible under /buckets/.
    • Credentials: weedadmin / weedadmin123 (already set in .dvc/config)
    • Bucket name: dvc-storage (already created and visible in the Filer UI under /buckets/dvc-storage)
  3. Review the existing .dvc/config and correct everything that prevents dvc push from succeeding. The remote called s3 must:

    • point at the dvc-storage bucket using s3://;
    • use the correct SeaweedFS S3 endpoint URL;
    • be marked as the default remote.
  4. Push the tracked data. After the push, the dvc-storage bucket in the SeaweedFS Filer UI must contain at least one object under the files/md5/... prefix.
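
The three requirements for the s3 remote can also be checked mechanically. The following is a minimal Python sketch, not part of the lab itself: `check_s3_remote` is a hypothetical helper that parses the text of .dvc/config with the standard-library `configparser` and reports which requirements are still unmet. It assumes the file follows DVC's usual layout, where the remote section header is written as `['remote "s3"']` and the default remote is stored as `remote` under `[core]`.

```python
import configparser


def check_s3_remote(config_text,
                    bucket="dvc-storage",
                    endpoint="http://localhost:8333"):
    """Return a list of problems found for the remote named s3 (empty if none)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    # DVC writes the header as ['remote "s3"'], so the parsed section
    # name keeps the single quotes around it.
    section = "'remote \"s3\"'"
    if not parser.has_section(section):
        return ["no remote named s3 is declared"]

    problems = []
    remote = parser[section]
    if remote.get("url", "").rstrip("/") != f"s3://{bucket}":
        problems.append(f"url should be s3://{bucket}")
    if remote.get("endpointurl", "").rstrip("/") != endpoint:
        problems.append(f"endpointurl should be {endpoint}")
    # The default remote is recorded as `remote = <name>` under [core].
    if parser.get("core", "remote", fallback=None) != "s3":
        problems.append("s3 is not the default remote")
    return problems
```

Passing the contents of /root/code/fraud-detection/.dvc/config to `check_s3_remote` should return an empty list once all three requirements are met.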

Solution

  1. Inspect .dvc/config and correct the s3 remote so that it points at the SeaweedFS bucket through the S3 endpoint:

    ['remote "s3"']
        url = s3://dvc-storage
        endpointurl = http://localhost:8333
        access_key_id = weedadmin
        secret_access_key = weedadmin123
    

    The two corrected values are url (the bucket name) and endpointurl (the SeaweedFS S3 endpoint). Make sure both match the values given in the problem statement.

  2. Set s3 as the default remote:

    dvc remote default s3
  3. Push tracked data:

    dvc push
  4. Verify that the SeaweedFS Filer UI shows at least one object under files/md5/... in dvc-storage.
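
To know which object to look for in step 4, the expected key can be derived from the hash that DVC recorded. This is a minimal sketch under two assumptions: that the remote uses the DVC 3.x layout implied by the files/md5/... prefix (the first two hex characters of the MD5 form a directory and the rest form the object name), and that the hash is stored on an `md5:` line in data/raw/transactions.csv.dvc. Both helper names are hypothetical.

```python
import re


def md5_from_dvc_file(dvc_text):
    """Extract the first 32-character MD5 found on an `md5:` line of a .dvc file."""
    match = re.search(r"^\s*-?\s*md5:\s*([0-9a-f]{32})\b", dvc_text, re.MULTILINE)
    if match is None:
        raise ValueError("no md5 entry found in .dvc file text")
    return match.group(1)


def remote_object_key(md5_hex):
    """Map an MD5 hex digest to the assumed key layout files/md5/<2 chars>/<rest>."""
    return f"files/md5/{md5_hex[:2]}/{md5_hex[2:]}"
```

With the text of the .dvc file in `dvc_text`, `remote_object_key(md5_from_dvc_file(dvc_text))` gives the key to look for; in the Filer UI it would appear under /buckets/dvc-storage/ followed by that key.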

Good to Know

  • DVC remote config lives in .dvc/config; a typo there can break dvc push even when the credentials are correct.
  • For S3-compatible backends, url names the bucket (and optional path) and endpointurl names the service endpoint; both must match the storage provider.
  • dvc remote default s3 matters when the repo has multiple remotes or the push target is not explicit, that is, when dvc push is run without -r.
  • SeaweedFS exposes its S3 API and its Filer UI separately (ports 8333 and 8888 in this lab); DVC talks to the S3 endpoint, not the Filer UI.
  • In this lab, seeing the bucket and, after the push, its objects in the Filer UI under /buckets/ is a quick check that the endpoint and credentials are correct.
  • DVC object data should land under files/md5/...; if nothing appears there after dvc push, the remote config or the default remote is still wrong.
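
Because DVC talks to the S3 endpoint rather than the Filer UI, the same verification can be done over the S3 API. The sketch below assumes boto3 is installed, which the lab does not state. `make_client` and `list_keys` are hypothetical helper names, and the `region_name` value is a placeholder chosen only because boto3 needs some region for request signing.

```python
def make_client(endpoint="http://localhost:8333",
                access_key="weedadmin",
                secret_key="weedadmin123"):
    """Build an S3 client pointed at the SeaweedFS S3 endpoint from this lab."""
    import boto3  # imported lazily; assumes boto3 is available

    return boto3.client(
        "s3",
        endpoint_url=endpoint,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name="us-east-1",  # placeholder region (assumption)
    )


def list_keys(client, bucket="dvc-storage", prefix="files/md5/"):
    """Return every object key in `bucket` that starts with `prefix`."""
    keys = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty listing carry no "Contents" entry.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

After a successful dvc push, `list_keys(make_client())` should return at least one key; an empty list suggests the data went to a different remote or the push did not run.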