Hi there,
thanks for this repo.
The idea of a model training another one looks promising.
Just one thought: looking at the image in the README, the change of seed is registered as a valid step that lowers the loss (it's the last step on the graph), while it's really just "luck" in a way, isn't it? Unless I'm missing something.
I understand that in the end the change of seed led to a stronger checkpoint as per the eval, but it still feels a bit random to me.
So I'm wondering if there's a way to better separate actual gains (ones that consistently lower the val loss) from lucky gains (e.g. a mere change of seed).
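One rough idea, just a sketch under my own assumptions (the function name `classify_change` and the `k=2.0` threshold are made up, nothing from the repo): rerun both the current best config and the candidate on the same few seeds, and only count the change as a real gain if it lowers val loss on every seed *and* the mean improvement is clearly larger than the seed-to-seed noise. With this setup a seed-only change could never count as a step, since the seed is averaged over rather than tuned.

```python
import statistics


def classify_change(baseline, candidate, k=2.0):
    """Compare per-seed val losses of a baseline and a candidate config.

    Hypothetical helper: baseline[i] and candidate[i] must come from runs
    with the same seed i (paired runs). Returns "real" if the candidate is
    lower on every seed AND the mean improvement exceeds k standard errors
    of the paired differences; otherwise returns "noise".
    """
    if len(baseline) != len(candidate) or len(baseline) < 2:
        raise ValueError("need at least 2 paired runs of equal length")

    # Positive difference means the candidate reached a lower val loss.
    diffs = [b - c for b, c in zip(baseline, candidate)]
    mean_gain = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / len(diffs) ** 0.5

    always_better = all(d > 0 for d in diffs)
    return "real" if always_better and mean_gain > k * std_err else "noise"


# Consistent improvement on all three seeds -> "real"
print(classify_change([3.10, 3.12, 3.08], [3.05, 3.07, 3.03]))
# Better on some seeds, worse on another -> "noise"
print(classify_change([3.10, 3.05, 3.12], [3.08, 3.09, 3.07]))
```

The cost is running each candidate several times, but it would make the graph only show steps that survive a change of seed.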