A teammate has added the transactions dataset to the xFusionCorp Industries fraud-detection repository, but it was committed directly to Git instead of being tracked with DVC. Bring the repository in line with the team standard—every dataset under data/ must be tracked by DVC, not by Git.
- A project exists at /root/code/fraud-detection/ with DVC already initialised. The dataset data/raw/transactions.csv is currently tracked by Git, and the team standard requires DVC to own it instead.
- Stop Git from tracking the dataset without deleting it from disk.
- Track the same dataset with DVC so a .dvc pointer file is produced and data/raw/.gitignore excludes the dataset itself.
- Stage the new .dvc pointer and the new .gitignore, then record a Git commit with the message "Track transactions dataset with DVC".
Once tracking is moved to DVC, the DVC TRACKED section in the EXPLORER panel will list the dataset, confirming the extension recognises it as a DVC-managed file.
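- First, stop Git from tracking data/raw/transactions.csv while leaving the file on disk:
    cd /root/code/fraud-detection
    git rm --cached data/raw/transactions.csv
- Add the dataset to DVC tracking:
    dvc add data/raw/transactions.csv
- Check that dvc add created data/raw/.gitignore with an entry for the dataset; if it did not, create it manually:
    echo "transactions.csv" > data/raw/.gitignore
- Stage the pointer file and the .gitignore, then commit (the removal already staged by git rm --cached goes into the same commit):
    git add data/raw/transactions.csv.dvc data/raw/.gitignore
    git commit -m "Track transactions dataset with DVC"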
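The steps above can be wrapped in a small shell function for reuse on other datasets. This is a minimal sketch, not part of the original solution: move_to_dvc is a hypothetical helper name, and it assumes git and dvc are on PATH and that it is called from the repository root.

```shell
# Sketch only: move_to_dvc is a hypothetical helper, not a Git or DVC command.
# Assumes the current directory is the repository root.
move_to_dvc() {
    local dataset="$1"

    # Untrack only if Git currently tracks the file; --cached keeps it on disk.
    if git ls-files --error-unmatch -- "$dataset" >/dev/null 2>&1; then
        git rm --cached -- "$dataset" || return 1
    fi

    # dvc add writes "<dataset>.dvc" and updates the .gitignore beside the dataset.
    dvc add "$dataset" || return 1

    # Stage only the pointer file and the .gitignore, as the task asks.
    git add -- "${dataset}.dvc" "$(dirname "$dataset")/.gitignore"
}

# Example usage (left commented so defining the function has no side effects):
# cd /root/code/fraud-detection
# move_to_dvc data/raw/transactions.csv
# git commit -m "Track transactions dataset with DVC"
```

The guard around git rm --cached lets the helper be re-run safely on a file that Git has already stopped tracking.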
- Keep large or mutable datasets out of Git history; let DVC manage files under data/ so commits stay small and reviewable.
- Use git rm --cached before dvc add when a file is already tracked by Git, so the file stays on disk but Git stops owning it.
- Commit the generated .dvc pointer file and the accompanying .gitignore together to keep the repository state consistent.
- Verify that the dataset appears in the DVC TRACKED section after the change; that is the quickest confirmation that the file is managed correctly.
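
Alongside the EXPLORER panel, the same hand-over can be approximated from a terminal. The sketch below is an illustration rather than an official check: check_dvc_owned is a hypothetical helper name, it uses only standard Git commands and file tests, and it assumes it is run from the repository root.

```shell
# Sketch only: check_dvc_owned is a hypothetical helper, not a Git or DVC command.
# Prints the first failed check and returns non-zero, or prints "ok: ..." on success.
check_dvc_owned() {
    local dataset="$1"

    # The dataset must still exist on disk (git rm --cached does not delete it).
    [ -f "$dataset" ] || { echo "missing on disk: $dataset"; return 1; }

    # dvc add writes the pointer next to the dataset as <name>.dvc.
    [ -f "${dataset}.dvc" ] || { echo "no pointer file: ${dataset}.dvc"; return 1; }

    # Git must no longer have the dataset in its index.
    if git ls-files --error-unmatch -- "$dataset" >/dev/null 2>&1; then
        echo "still tracked by Git: $dataset"
        return 1
    fi

    # The .gitignore entry should make Git ignore the dataset.
    git check-ignore -q -- "$dataset" || { echo "not ignored by Git: $dataset"; return 1; }

    echo "ok: $dataset is owned by DVC"
}

# Example usage:
# cd /root/code/fraud-detection
# check_dvc_owned data/raw/transactions.csv
```

The checks run in the order the workflow changes things, so the first message printed points at the step that was missed.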