Skip to content

Latest commit

 

History

History
18 lines (12 loc) · 994 Bytes

File metadata and controls

18 lines (12 loc) · 994 Bytes

Datasets documentation

This folder contains scripts and notebooks that have been used for creation of benchmarks in dataset folder. The format and the process of contribution a new benchmark is specified there.

To make the process of benchmarks creation reproducible, please, try to stick to the following principles:

  • Fix the random seeds so your script produce the same dataset when calling repeatedly
  • Clean up temporary files, especially if they were created inside the package structure (so they will not be accidentally pushed to GitHub)
  • It is ok to import packages that are not contained in requirements.txt but avoid adding unnecessary dependencies
  • For Jupyter Notebook, it might be a good idea to rerun everything at the end using Kernel -> Restart & Run All
  • Make sure that the created benchmark can be read, i.e. run the following code
    from genomic_benchmarks.loc2seq import download_dataset
    download_dataset("YOUR_BENCHMARK_NAME")