|
| 1 | +# Create a Standard ML Project Structure |
| 2 | + |
| 3 | +## Problem |
| 4 | + |
| 5 | +A colleague has started a new ML project at `/root/code/fraud-detection/`, but the layout does not match the xFusionCorp Industries standard. Bring the project in line with the team's conventions. |
| 6 | + |
| 7 | +1. Inspect the existing project at `/root/code/fraud-detection/`. |
| 8 | + |
| 9 | +2. The final layout must match the tree below exactly: |
| 10 | + |
| 11 | + ```files |
| 12 | + fraud-detection/ |
| 13 | + ├── data/ |
| 14 | + │ ├── raw/ |
| 15 | + │ └── processed/ |
| 16 | + ├── models/ |
| 17 | + ├── notebooks/ |
| 18 | + ├── src/ |
| 19 | + │ ├── data/ |
| 20 | + │ ├── features/ |
| 21 | + │ ├── models/ |
| 22 | + │ └── utils/ |
| 23 | + ├── tests/ |
| 24 | + ├── configs/ |
| 25 | + ├── requirements.txt |
| 26 | + └── README.md |
| 27 | + ``` |
| 28 | +
|
| 29 | +3. Every subdirectory under `src/` must contain an `__init__.py` file so that Python recognises it as a package. |
| 30 | +
|
| 31 | +4. `requirements.txt` must list the following dependencies, one per line: `scikit-learn`, `pandas`, `numpy`, and `mlflow`. Use the canonical PyPI name `scikit-learn`, not the import name `sklearn`. |
| 32 | +
|
| 33 | +5. `README.md` must begin with the heading `# fraud-detection`. |
| 34 | +
|
| 35 | +6. Review the existing project and correct everything that does not match the requirements above. |
| 36 | +
|
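The inspection asked for in step 1 can be done with a plain listing of the tree. The sketch below is one way to do it, assuming `find` and `sort` are available; `show_layout` is a hypothetical helper name, not part of the task.

```bash
# Hypothetical helper: print every path under a project directory,
# relative to the project root and sorted, so the output can be
# compared by eye with the required tree.
show_layout() {
    project_dir="${1:-fraud-detection}"
    # The subshell keeps the caller's working directory unchanged.
    (cd "$project_dir" && find . | sort)
}
```

Run from `/root/code/`, `show_layout fraud-detection` lists the current directories and files, which makes misnamed or missing entries easier to spot.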
| 37 | +## Solution |
| 38 | +
|
| 39 | +1. Update `README.md` so that it begins with the heading from requirement 5: |
| 40 | +
|
| 41 | + ```markdown |
| 42 | + # fraud-detection |
| 43 | + |
| 44 | + ``` |
| 45 | +
|
| 46 | +2. Compare the existing layout with the required file structure: |
| 47 | +
|
| 48 | + - The `raw` and `processed` subdirectories are missing under the `data` directory. |
| 49 | + - The `tests` and `configs` directories are also missing. |
| 50 | + - Create them with the following commands, run from `/root/code/`: |
| 51 | +
|
| 52 | + ```bash |
| 53 | + mkdir -p fraud-detection/data/{raw,processed} |
| 54 | + mkdir -p fraud-detection/{tests,configs} |
| 55 | + ``` |
| 56 | +
|
| 57 | +3. In my case, two directories under `src/` were misnamed (`util` and `feature`). Rename them: |
| 58 | +
|
| 59 | + ```bash |
| 60 | + mv fraud-detection/src/feature fraud-detection/src/features |
| 61 | + mv fraud-detection/src/util fraud-detection/src/utils |
| 62 | + ``` |
| 63 | +
|
| 64 | +4. For requirement 3, check that each subdirectory under `src/` contains an `__init__.py` file. If any are missing, create them with the corresponding commands: |
| 65 | +
|
| 66 | + ```bash |
| 67 | + touch fraud-detection/src/data/__init__.py |
| 68 | + touch fraud-detection/src/features/__init__.py |
| 69 | + touch fraud-detection/src/models/__init__.py |
| 70 | + touch fraud-detection/src/utils/__init__.py |
| 71 | + ``` |
| 72 | +
|
| 73 | +5. Update `requirements.txt` so that it lists the required packages, one per line: |
| 74 | +
|
| 75 | + ```bash |
| 76 | + printf '%s\n' scikit-learn pandas numpy mlflow > fraud-detection/requirements.txt |
| 77 | + ``` |
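Once the steps above are done, the result can be checked against the requirements in the Problem section. The following is a sketch rather than an official grader: `check_layout` is a hypothetical name, it assumes the dependencies are listed unpinned (a line such as `pandas==2.0` would be reported as missing), and it does not flag extra files or directories.

```bash
# Hypothetical checker: print every deviation from the required layout
# and return non-zero if any is found.
check_layout() {
    project_dir="${1:-fraud-detection}"
    status=0
    # Required directories from the tree in the Problem section.
    for dir in data/raw data/processed models notebooks \
               src/data src/features src/models src/utils tests configs; do
        [ -d "$project_dir/$dir" ] || { echo "missing directory: $dir"; status=1; }
    done
    # Each package under src/ needs an __init__.py.
    for pkg in data features models utils; do
        [ -f "$project_dir/src/$pkg/__init__.py" ] \
            || { echo "missing file: src/$pkg/__init__.py"; status=1; }
    done
    # Each dependency must appear on a line of its own.
    for dep in scikit-learn pandas numpy mlflow; do
        grep -qxF "$dep" "$project_dir/requirements.txt" 2>/dev/null \
            || { echo "missing dependency: $dep"; status=1; }
    done
    # README.md must start with the project heading.
    [ "$(head -n 1 "$project_dir/README.md" 2>/dev/null)" = "# fraud-detection" ] \
        || { echo "README.md does not start with '# fraud-detection'"; status=1; }
    return "$status"
}
```

Run from `/root/code/`, `check_layout fraud-detection && echo "layout OK"` prints the confirmation only when no deviation is reported.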