Pull DVC-Tracked Data from Remote

Problem

A new xFusionCorp Industries team member has cloned the fraud-detection repository onto a fresh machine. The DVC remote is already configured to point at the team's SeaweedFS bucket, but dvc pull is failing. Diagnose the cause, correct the configuration, and pull the dataset.

  1. A cloned project exists at /root/code/fraud-detection/ with DVC initialised, the data/raw/transactions.csv.dvc pointer file present, but the dataset itself missing from disk and from the local DVC cache.

  2. SeaweedFS is already running on the controlplane and the dataset has already been pushed to the dvc-storage bucket. To confirm that the object is there, open the SeaweedFS Filer button at the top of the lab and navigate to /buckets/dvc-storage/.

    • S3 endpoint: http://localhost:8333
    • Credentials: weedadmin / weedadmin123
  3. Review .dvc/config and correct everything that prevents dvc pull from authenticating against SeaweedFS.

  4. After the fix, the s3 remote must use:

    • The access key (access_key_id) weedadmin
    • The secret key (secret_access_key) weedadmin123
  5. Pull the dataset. After the pull, data/raw/transactions.csv must be present on disk and its content must match the hash recorded in the .dvc pointer.

Solution

  1. Update .dvc/config so DVC uses the SeaweedFS credentials and endpoint:

    [core]
        remote = s3
    
    ['remote "s3"']
        url = s3://dvc-storage
        endpointurl = http://localhost:8333
        access_key_id = weedadmin
        secret_access_key = weedadmin123
    
  2. Run dvc pull from the repo root:

    cd /root/code/fraud-detection
    dvc pull
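
If the pull still reports an authentication or endpoint error, it can help to check the config file mechanically. The following is a minimal sketch, not part of DVC: it assumes .dvc/config uses the simple layout shown in step 1 and approximates DVC's own parsing with Python's standard configparser. The function name missing_remote_keys and the REQUIRED_KEYS tuple are hypothetical helpers introduced for illustration.

```python
import configparser

# Keys this lab expects on the default remote (an assumption for this task,
# not a rule enforced by DVC itself).
REQUIRED_KEYS = ("url", "endpointurl", "access_key_id", "secret_access_key")


def missing_remote_keys(config_text: str) -> list[str]:
    """Return the settings that are absent for the default remote.

    An empty list means [core] remote points at an existing remote
    section and that section defines every key in REQUIRED_KEYS.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    # Step 1: find the default remote name under [core].
    remote = parser.get("core", "remote", fallback=None)
    if remote is None:
        return ["core.remote"]

    # Step 2: DVC writes remote sections as ['remote "<name>"'];
    # configparser keeps the single quotes as part of the section name.
    section = f"'remote \"{remote}\"'"
    if not parser.has_section(section):
        return [f"[{section}]"]

    # Step 3: report every required key the section does not define.
    return [key for key in REQUIRED_KEYS if not parser.has_option(section, key)]
```

Reading the file with open(".dvc/config").read() from the repo root and passing the text to missing_remote_keys should return an empty list once the configuration matches step 1. The sketch only checks that keys are present; it does not validate their values or contact SeaweedFS.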

Good to Know

  • DVC reads [core] remote first to find the default remote, then loads the matching [remote "..."] block. A wrong section name or a typo in a key breaks authentication even if the bucket name is right.
  • SeaweedFS in S3 mode needs endpointurl. Without it, DVC may fall back to the AWS defaults and fail to reach the local object storage.
  • If the pull still fails, run dvc remote list -v or dvc pull -v to see which configuration DVC loaded. A local .dvc/config.local or environment variables can override the repo config.
  • After pull, verify file integrity by checking the hash in the .dvc pointer matches the pulled file's hash. If they differ, something went wrong with the pull or the remote data is corrupted.
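
The integrity check from the last point can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DVC's own verification: it assumes the pointer records a plain MD5 of the file contents in an md5: field (the DVC 3.x behaviour; older versions may normalise line endings of text files before hashing) and it reads that field with a regular expression instead of a YAML parser. The helper names recorded_md5, file_md5 and matches_pointer are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# Matches a line such as "- md5: 0123...abcd" in the pointer file.
# It does not match the "hash: md5" line that newer pointers also contain.
MD5_LINE = re.compile(r"^\s*-?\s*md5:\s*([0-9a-f]{32})\s*$", re.MULTILINE)


def recorded_md5(pointer: Path) -> str:
    """Extract the first md5 value recorded in a .dvc pointer file."""
    match = MD5_LINE.search(pointer.read_text())
    if match is None:
        raise ValueError(f"no md5 entry found in {pointer}")
    return match.group(1)


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(data_file: Path, pointer: Path) -> bool:
    """Return True when the file on disk hashes to the value in the pointer."""
    return file_md5(data_file) == recorded_md5(pointer)
```

From the repo root, matches_pointer(Path("data/raw/transactions.csv"), Path("data/raw/transactions.csv.dvc")) should return True after a successful pull. A False result suggests an incomplete pull or corrupted remote data, as noted above.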