
Commit 47fbd68

Separate entrypoints and standalone scripts (#285)
* Initialize MLflow environment variable for filesystem tracking
* Make extract clips a uv-runnable standalone script
* Fix create zarr dataset guide
* Create zarr module
* Group visualization scripts
* Rename
* Update references
* Add other scripts and README
* Remove old utils. Remove entrypoint for extract clips
1 parent 937d28b commit 47fbd68

14 files changed

Lines changed: 44 additions & 44 deletions

MANIFEST.in

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ recursive-include bash_scripts *.sh
 recursive-include notebooks *.py
 recursive-include notebooks *.ipynb
 recursive-include scripts *.py
+recursive-include scripts *.md
 recursive-include crabs *.yaml
 recursive-include guides *.png

bash_scripts/run_extract_loop_clips_array.sh

Lines changed: 18 additions & 31 deletions
@@ -61,33 +61,26 @@ if [[ $SLURM_ARRAY_TASK_COUNT -ne $NUM_CSV_ROWS ]]; then
 fi

 # -----------------------------
-# Create virtual environment
+# Set up uv
 # -----------------------------
-# TODO: replace with uv
-module load miniconda
+# extract_loop_clips.py is a standalone (PEP 723) script: uv fetches it
+# from the repository, resolves its inline dependencies into an ephemeral
+# environment, and runs it. No package install or virtual environment is
+# needed.
+module load uv
+
+# set uv cache dir to /ceph/scratch/sminano
+# (should be faster than the home directory cache and gets purged regularly)
+export UV_CACHE_DIR=/ceph/scratch/sminano/uv-cache
+# copy (instead of symlink) files across filesystems (ceph cache vs tmpfs)
+export UV_LINK_MODE=copy
+export UV_HTTP_TIMEOUT=120 # seconds
+
+# Remote URL of the standalone script for the selected branch
+SCRIPT_URL="https://raw.githubusercontent.com/SainsburyWellcomeCentre/crabs-exploration/$GIT_BRANCH/scripts/extract_loop_clips.py"

-ENV_NAME=crabs-extract-$SLURM_ARRAY_JOB_ID-$SLURM_ARRAY_TASK_ID
-ENV_PREFIX=$TMPDIR/$ENV_NAME
-
-conda create \
---prefix $ENV_PREFIX \
--y \
-python=3.12
-
-# activate environment
-source activate $ENV_PREFIX
-
-# install crabs package in virtual env
-python -m pip install git+https://github.com/SainsburyWellcomeCentre/crabs-exploration.git@$GIT_BRANCH
-
-# log pip and python locations
-echo $ENV_PREFIX
-which python
-which pip
-
-# print the version of crabs package (last number is the commit hash)
 echo "Git branch: $GIT_BRANCH"
-conda list crabs
+echo "Script: $SCRIPT_URL"
 echo "-----"

 # ---------------------------------------
# ---------------------------------------
@@ -103,7 +96,7 @@ fi
 # -------------------------
 # Run extraction script
 # -------------------------
-extract-loops \
+uv run "$SCRIPT_URL" \
 --csv_filepath $CSV_PATH \
 --input_dir $INPUT_DIR \
 --output_dir $OUTPUT_DIR \
@@ -114,12 +107,6 @@ extract-loops \
 echo "Completed extraction of clip with task ID = $SLURM_ARRAY_TASK_ID"
 echo "--------------------------------------------------------"

-# -----------------------------
-# Cleanup
-# ----------------------------
-conda deactivate
-conda remove --prefix $ENV_PREFIX --all -y
-
 # ------------------
 # Copy logs to LOG_DIR
 # -------------------
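
The change above replaces the installed `extract-loops` entrypoint with `uv run` pointed at a branch-specific raw URL. A minimal Python sketch of how that invocation is assembled is given here for illustration only. The helper names `build_script_url` and `build_uv_command` are hypothetical and do not exist in the repository; only the three flags visible in the hunk are included, and any further flags passed by the real script are omitted.

```python
import shlex

# Base of the raw-file URL used by SCRIPT_URL in the bash script.
RAW_BASE = (
    "https://raw.githubusercontent.com/"
    "SainsburyWellcomeCentre/crabs-exploration"
)


def build_script_url(git_branch: str) -> str:
    """Return the raw URL of the standalone script on the given branch."""
    return f"{RAW_BASE}/{git_branch}/scripts/extract_loop_clips.py"


def build_uv_command(
    git_branch: str, csv_path: str, input_dir: str, output_dir: str
) -> list[str]:
    """Assemble the `uv run` invocation as an argument list.

    Hypothetical helper: it mirrors only the flags shown in the diff.
    """
    return [
        "uv", "run", build_script_url(git_branch),
        "--csv_filepath", csv_path,
        "--input_dir", input_dir,
        "--output_dir", output_dir,
    ]


if __name__ == "__main__":
    # Placeholder paths, chosen only for the example.
    cmd = build_uv_command("main", "clips.csv", "videos/", "clips/")
    print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so paths containing spaces survive without the quoting issues of an unquoted shell expansion.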

guides/CreateZarrDatasetForTracks.md

Lines changed: 6 additions & 6 deletions
@@ -25,9 +25,9 @@

 3. **Download the create-zarr-dataset bash script from the 🦀 repository**

-To do so, run the following command, which will download a bash script called `run_zarr_dataset_creation.sh` to the current working directory.
+To do so, run the following command, which will download a bash script called `run_zarr_dataset.sh` to the current working directory.
 ```
-curl https://raw.githubusercontent.com/SainsburyWellcomeCentre/crabs-exploration/main/bash_scripts/run_zarr_dataset_creation.sh > run_zarr_dataset_creation.sh
+curl https://raw.githubusercontent.com/SainsburyWellcomeCentre/crabs-exploration/main/bash_scripts/run_zarr_dataset.sh > run_zarr_dataset.sh
 ```

 This bash script launches a SLURM array job to create a zarr dataset from a set of input VIA track files. Each job in the array processes files from a single video. With the command above, the version of the bash script downloaded is the one at the tip of the `main` branch in the [🦀 repository](https://github.com/SainsburyWellcomeCentre/crabs-exploration).
@@ -38,11 +38,11 @@
 >
 > - For example, to download the version of the file at the tip of a branch called `<BRANCH-NAME>`, edit the path above to replace `main` with `<BRANCH-NAME>`:
 > ```
-> https://raw.githubusercontent.com/SainsburyWellcomeCentre/crabs-exploration/<BRANCH-NAME>/bash_scripts/run_zarr_dataset_creation.sh
+> https://raw.githubusercontent.com/SainsburyWellcomeCentre/crabs-exploration/<BRANCH-NAME>/bash_scripts/run_zarr_dataset.sh
 > ```
 > - To download the version of the file of a specific commit, replace `main` with `blob/<COMMIT-HASH>`:
 > ```
-> https://raw.githubusercontent.com/SainsburyWellcomeCentre/crabs-exploration/blob/<COMMIT-HASH>/bash_scripts/run_zarr_dataset_creation.sh
+> https://raw.githubusercontent.com/SainsburyWellcomeCentre/crabs-exploration/blob/<COMMIT-HASH>/bash_scripts/run_zarr_dataset.sh
 > ```

 4. **Edit the bash script if required**
@@ -63,7 +63,7 @@
 To launch a job, use the `sbatch` command with the path to the bash script:

 ```
-sbatch path/to/run_zarr_dataset_creation.sh
+sbatch path/to/run_zarr_dataset.sh
 ```

 6. **Check the status of the job**
@@ -97,7 +97,7 @@ Sometimes some of the jobs in the array job fail due to non reproducible issues
 Run the edited bash script, to create a zarr dataset for the previously failed jobs:

 ```bash
-sbatch path/to/edited/run_zarr_dataset_creation.sh
+sbatch path/to/edited/run_zarr_dataset.sh
 ```

 If the array job runs successfully, a new zarr store (that we will call `store_2` here) will be generated.

guides/ExtractLoopClipsCluster.md

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@
 - `CSV_PATH`: path to the input csv file.
 - `INPUT_DIR`: path to the input directory containing the input videos.
 - `OUTPUT_DIR`: path to the output directory for the extracted loop clips.
-- `GIT_BRANCH`: version of the 🦀 package to use. Usually we will use the version at the tip of the `main` branch.
+- `GIT_BRANCH`: version of the standalone extraction script to fetch and run. Usually we will use the version at the tip of the `main` branch.
 - `VERIFY_FRAMES`: whether to verify frame count of the extracted clips matches the value in the csv file.

pyproject.toml

Lines changed: 1 addition & 2 deletions
@@ -75,8 +75,7 @@ train-detector = "crabs.detector.train_model:app_wrapper"
 evaluate-detector = "crabs.detector.evaluate_model:app_wrapper"
 detect-and-track-video = "crabs.tracker.track_video:app_wrapper"
 # support utils
-extract-loops = "crabs.utils.extract_loop_clips:app_wrapper"
-create-zarr-dataset = "crabs.utils.create_zarr_dataset:app_wrapper"
+create-zarr-dataset = "crabs.zarr.create_dataset:app_wrapper"

 [build-system]
 requires = ["setuptools>=77", "wheel", "setuptools_scm[toml]>=8"]

scripts/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+Standalone scripts to support one-off tasks.
+
+They should be runnable with `uv`.

scripts/extract_loop_clips.py

Lines changed: 13 additions & 1 deletion
@@ -1,4 +1,16 @@
-"""Extract loop clips from input videos using ffmpeg."""
+"""Extract loop clips from input videos using ffmpeg.
+
+Standalone script: run with ``uv run scripts/extract_loop_clips.py ...``
+(uv installs the inline dependencies below into an ephemeral env). The
+``ffmpeg``/``ffprobe`` binaries must be available on PATH.
+"""
+
+# /// script
+# requires-python = ">=3.11"
+# dependencies = [
+# "pandas",
+# ]
+# ///

 import argparse
 import subprocess
