Create a Standard ML Project Structure

Problem

A colleague has started a new ML project at /root/code/fraud-detection/, but the layout does not match the xFusionCorp Industries standard. Bring the project in line with the team's conventions.

  1. Inspect the existing project at /root/code/fraud-detection/.

  2. The final layout must match the tree below exactly:

    fraud-detection/
    ├── data/
    │   ├── raw/
    │   └── processed/
    ├── models/
    ├── notebooks/
    ├── src/
    │   ├── data/
    │   ├── features/
    │   ├── models/
    │   └── utils/
    ├── tests/
    ├── configs/
    ├── requirements.txt
    └── README.md
    
  3. Every subdirectory under src/ must contain an __init__.py file so that Python recognises it as a package.

  4. requirements.txt must list the following dependencies, one per line: scikit-learn, pandas, numpy, and mlflow. The canonical PyPI name for the scikit-learn package is scikit-learn.

  5. README.md must begin with the heading # fraud-detection.

  6. Review the existing project and correct everything that does not match the requirements above.
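The inspection in steps 1 and 6 can be done by hand with `ls -R` or `find`, but a small helper makes the differences easier to see. The sketch below is one possible approach, not part of the original task: the function name `report_layout` and its output format are hypothetical, and it assumes `find` supports `-mindepth` (GNU and BSD versions do).

```shell
# Hypothetical helper: report how a project directory differs from the
# expected layout. Usage: report_layout /root/code/fraud-detection
report_layout() {
    root="$1"
    # Directory list taken from the tree in the problem statement.
    expected_dirs="data data/raw data/processed models notebooks src src/data src/features src/models src/utils tests configs"

    # Expected directories that do not exist yet.
    for d in $expected_dirs; do
        [ -d "$root/$d" ] || echo "MISSING dir: $d"
    done

    # Expected top-level files that do not exist yet.
    for f in requirements.txt README.md; do
        [ -f "$root/$f" ] || echo "MISSING file: $f"
    done

    # Directories on disk that are not in the expected list, such as
    # misnamed ones. A .git directory, if present, is skipped.
    (cd "$root" && find . -mindepth 1 -name .git -prune -o -type d -print) |
        sed 's|^\./||' | sort |
        while IFS= read -r rel; do
            case " $expected_dirs " in
                *" $rel "*) ;;
                *) echo "UNEXPECTED dir: $rel" ;;
            esac
        done
}
```

Calling `report_layout /root/code/fraud-detection` prints one line per difference and prints nothing when the directories and top-level files all match. It does not look inside files, so requirements 3 to 5 still need a separate check.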

Solution

  1. Update README.md so that its first line is the heading required by requirement 5:

    # fraud-detection
    
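  2. Compare the existing layout with the required structure:

    • The raw and processed subdirectories are missing under the data directory.
    • The tests and configs directories are also missing.
    • Create them with the following commands, run from /root/code (the paths in this solution are relative to that directory):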
    mkdir -p fraud-detection/data/{raw,processed}
    mkdir -p fraud-detection/{tests,configs}
  3. In my case, two directories under src/ were misnamed (feature and util). Rename them:

    mv fraud-detection/src/feature fraud-detection/src/features
    mv fraud-detection/src/util fraud-detection/src/utils
  4. For requirement 3, check that each subdirectory of src/ contains an __init__.py file. If any are missing, create them with the corresponding commands below (touch leaves the contents of an existing file unchanged):

    touch fraud-detection/src/data/__init__.py
    touch fraud-detection/src/features/__init__.py
    touch fraud-detection/src/models/__init__.py
    touch fraud-detection/src/utils/__init__.py
  5. Overwrite requirements.txt so that it lists the required packages, one per line. printf is used instead of echo -e because echo -e is not handled consistently across shells:

    printf '%s\n' scikit-learn pandas numpy mlflow > fraud-detection/requirements.txt
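Once the fixes are in place, requirements 3 to 5 can be checked in one pass. The function below is a minimal sketch of such a check, not part of the original solution: the name `verify_project` and its messages are hypothetical, and it assumes requirements.txt contains bare package names with no version pins.

```shell
# Hypothetical helper: check requirements 3 to 5 for a project root.
# Prints one line per problem and returns non-zero if any check fails.
verify_project() {
    root="$1"
    status=0

    # Requirement 3: every subdirectory of src/ has an __init__.py.
    for pkg in data features models utils; do
        if [ ! -f "$root/src/$pkg/__init__.py" ]; then
            echo "FAIL: src/$pkg/__init__.py is missing"
            status=1
        fi
    done

    # Requirement 4: each dependency appears on a line of its own.
    # -x matches whole lines only; -F treats the name as a literal string.
    for dep in scikit-learn pandas numpy mlflow; do
        if ! grep -qxF "$dep" "$root/requirements.txt" 2>/dev/null; then
            echo "FAIL: requirements.txt does not list $dep"
            status=1
        fi
    done

    # Requirement 5: the first line of README.md is the required heading.
    if [ "$(head -n 1 "$root/README.md" 2>/dev/null)" != "# fraud-detection" ]; then
        echo "FAIL: README.md does not begin with '# fraud-detection'"
        status=1
    fi

    [ "$status" -eq 0 ] && echo "OK: all checks passed"
    return "$status"
}
```

Run it as `verify_project /root/code/fraud-detection`. The whole-line match means an entry such as `sklearn` or a name with trailing whitespace is reported as a failure, which is the intended behaviour for this task.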