Site-specific splits can leave some classes out of some splits

When using the [Conser-vision dataset](https://www.drivendata.org/competitions/87/competition-image-classification-wildlife-conservation/), I ran Zamba's site-based splitting, and no "zebra" examples landed in the test split even though seven sites have zebras.

I think site-specific splitting needs to be label-aware so that the actual labels are better balanced across splits. There is probably a good greedy solution here: go through the sites round-robin, but assign each site to the split where its labels are most under-represented relative to the target proportions. I don't remember the theory behind that, though. The relevant Zamba code is [here](https://github.com/drivendataorg/zamba/blob/1cb17199afa5ca0e0c9c7ee9e6bc2f114217548f/zamba/data/metadata.py#L55).
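To make the idea concrete, here is a rough sketch of one possible greedy assignment. It is not Zamba's current behavior or API: the function name `label_aware_site_split`, the input shape (a mapping from site to per-label example counts), and the split names are all made up for illustration. It places sites containing the rarest labels first and gives each site to the split with the largest shortfall on that site's rarest label.

```python
from collections import Counter


def label_aware_site_split(site_label_counts, proportions):
    """Hypothetical helper: assign each site to exactly one split.

    site_label_counts: {site: {label: n_examples}}
    proportions: {split_name: target_fraction}, e.g. {"train": 0.7, ...}
    """
    # Total number of examples per label across all sites.
    totals = Counter()
    for counts in site_label_counts.values():
        totals.update(counts)

    # Number of sites each label occurs at; fewer sites means a rarer label.
    sites_per_label = Counter(
        label for counts in site_label_counts.values() for label in counts
    )

    def site_order(site):
        counts = site_label_counts[site]
        rarest = min((sites_per_label[l] for l in counts), default=0)
        # Sites with the rarest labels go first, larger sites before smaller.
        return (rarest, -sum(counts.values()), site)

    assigned = {}
    split_counts = {split: Counter() for split in proportions}
    for site in sorted(site_label_counts, key=site_order):
        counts = site_label_counts[site]
        # Compare splits on the rarest label first, then the next rarest.
        labels = sorted(counts, key=lambda l: (sites_per_label[l], l))

        def shortfall(split):
            # Target share minus the share of each label the split holds so far.
            return tuple(
                proportions[split] - split_counts[split][l] / totals[l]
                for l in labels
            )

        best = max(proportions, key=shortfall)
        assigned[site] = best
        split_counts[best].update(counts)
    return assigned


# Toy data: seven sites with zebras, thirteen with only blank images.
sites = {f"z{i}": {"zebra": 5, "blank": 5} for i in range(7)}
sites.update({f"b{i}": {"blank": 10} for i in range(13)})
splits = label_aware_site_split(
    sites, {"train": 0.7, "val": 0.15, "holdout": 0.15}
)
```

Because whole sites are still assigned to a single split, this keeps the site-level separation while steering rare labels toward splits that lack them. It is a heuristic, so it will not always hit the target proportions exactly, and a label present at fewer sites than there are splits cannot appear in every split.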

Site-specific splits can leave some classes out of some splits #397
