Datasets documentation

This folder contains the scripts and notebooks that were used to create the benchmarks in the dataset folder. The format of a benchmark and the process for contributing a new one are specified there.

To make benchmark creation reproducible, please try to follow these principles:

Fix the random seeds so that your script produces the same dataset every time it is run.
Clean up temporary files, especially if they were created inside the package structure (so that they are not accidentally pushed to GitHub).
It is OK to import packages that are not listed in requirements.txt, but avoid adding unnecessary dependencies.
For Jupyter notebooks, it is a good idea to rerun everything at the end using Kernel -> Restart & Run All.
Make sure that the created benchmark can be read, i.e. run the following code:

    from genomic_benchmarks.loc2seq import download_dataset
    download_dataset("YOUR_BENCHMARK_NAME")
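
The first two principles (fixed seeds and temporary-file cleanup) can be sketched roughly as follows. This is only an illustration using the Python standard library, not code from this repository: the names `SEED`, `sample_intervals` and `build_with_scratch_dir`, the file name `intervals.tsv`, and all numeric values are hypothetical.

```python
import random
import shutil
import tempfile
from pathlib import Path

SEED = 42  # hypothetical value; any fixed integer works


def sample_intervals(n, seq_len, width, seed=SEED):
    """Return n pseudo-random (start, end) intervals; the same seed gives the same output."""
    rng = random.Random(seed)  # local generator, independent of global random state
    starts = [rng.randrange(0, seq_len - width) for _ in range(n)]
    return [(s, s + width) for s in starts]


def build_with_scratch_dir(seed=SEED):
    """Do intermediate work in a temporary directory outside the package tree."""
    scratch = Path(tempfile.mkdtemp(prefix="benchmark_"))
    try:
        intervals = sample_intervals(5, seq_len=1000, width=100, seed=seed)
        # Intermediate file; it lives only in the scratch directory.
        (scratch / "intervals.tsv").write_text(
            "\n".join(f"{s}\t{e}" for s, e in intervals)
        )
        return intervals, scratch
    finally:
        # Always remove temporary files, even if an error occurred above.
        shutil.rmtree(scratch, ignore_errors=True)
```

Calling `build_with_scratch_dir()` twice should return identical intervals, and the scratch directory should no longer exist afterwards. If your script uses NumPy, a generator created with `numpy.random.default_rng(seed)` plays the same role as `random.Random(seed)` here.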
