Site-specific splits can leave some classes out of some splits #397

Description

@jdcc

When using the Conser-vision dataset, I ran Zamba's site-based splitting and "zebra" did not end up in the test split, even though seven sites contain zebras.

I think site-specific splitting needs to be label-aware so that the actual labels are better balanced across splits. I think there's some optimal solution here that assigns sites round-robin, but picks the destination split based on which label is least represented in that split relative to the target proportions. I don't remember the theory behind that, though. Relevant Zamba code is here
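To make the idea concrete, here is a minimal sketch of one greedy, label-aware way to assign whole sites to splits. It is not Zamba's implementation and not a proven optimum; it is similar in spirit to iterative stratification for multi-label data. The function name `assign_sites_to_splits`, its arguments, and the toy data are all hypothetical. It assumes every site has at least one labeled example and that the per-site label counts are already known.

```python
from collections import Counter


def assign_sites_to_splits(site_label_counts, target_proportions):
    """Greedily assign whole sites to splits, rarest labels first.

    site_label_counts: dict of site -> Counter(label -> number of examples)
    target_proportions: dict of split name -> desired fraction (sums to 1)
    Hypothetical helper for illustration; assumes no site is empty.
    """
    # Overall count of each label across all sites.
    totals = Counter()
    for counts in site_label_counts.values():
        totals.update(counts)

    split_counts = {split: Counter() for split in target_proportions}
    assignment = {}

    def rarest_label(site):
        # The label in this site with the fewest examples overall.
        return min(site_label_counts[site], key=lambda lab: (totals[lab], lab))

    # Handle sites that contain rare labels first, while every split
    # still has room for them. Ties are broken by site name.
    ordered = sorted(
        site_label_counts, key=lambda s: (totals[rarest_label(s)], s)
    )

    for site in ordered:
        label = rarest_label(site)

        def deficit(split):
            # How far this split is below its target count for the label.
            wanted = target_proportions[split] * totals[label]
            return wanted - split_counts[split][label]

        # Send the site to the split that is furthest below target.
        best = max(target_proportions, key=deficit)
        assignment[site] = best
        split_counts[best].update(site_label_counts[site])

    return assignment, split_counts


# Toy data: 7 sites with zebras, 13 sites without (made-up numbers).
sites = {f"site_{i:02d}": Counter(impala=50) for i in range(20)}
for i in range(7):
    sites[f"site_{i:02d}"]["zebra"] = 10

targets = {"train": 0.6, "val": 0.2, "test": 0.2}
assignment, split_counts = assign_sites_to_splits(sites, targets)
for split, counts in split_counts.items():
    print(split, dict(counts))
```

Because each site goes to exactly one split, site-level separation is preserved. The trade-off is that only the rarest label in a site drives the decision, so common labels may end up less balanced than a global optimizer would manage.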
