Replies: 1 comment
-
|
I am not checked the code yet but this is my understanding from what i read, without code |Fantastic tool --> autoresearch which is 'auto architecture researcher'| |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Hi everyone, and thanks for this amazing project!
I'm a beginner in this field and currently trying to learn as much as I can. I am planning to apply the
autoresearchmethod to a Computer Vision (CV) task—specifically, Image-to-Text Retrieval with distillation for a simple research project. My goal is to tune configurations like loss weights, augmentations, and soft-labels, while keeping the model architecture strictly fixed.In the README, under platform support, there is a great piece of advice regarding limited compute:
As I was setting up my script to use the strict 5-minute time budget on my full dataset, I realized a potential fundamental issue with CV training dynamics:
If I stop the training exactly at 5 minutes, my model would only process a very small fraction of the data (< 1 epoch). Because the model would never see the same image twice (no multi-epoch training), I'm worried that settings relying on repeated data exposure—like Data Augmentation, Hard Negative Mining, and EMA—might not converge properly. Could this cause the agent to incorrectly evaluate these settings as "bad" and discard them?
Since my model architecture is fixed (meaning the "speed penalty" of the time budget isn't a factor for me), I've been wondering if an alternative approach inspired by the TinyStories advice might work better for this specific case.
What if I use a Reduced Dataset + Full Epochs approach?
Instead of a strict 5-minute timer on the huge dataset, what if I pre-sample a highly representative small subset and instruct the agent to run a fixed, small number of epochs on it? I would tune the subset size and epoch count so that the total training time still fits within a fast 5-10 minute window perfectly aligning with the fast-iteration philosophy.
My thought process is that this approach might:
Since I am currently setting up my GPU environment, I haven't run the full
autoresearchloop yet but plan to do so very soon. Before I dive in, I would deeply appreciate your insights:Does my reasoning make sense? Has anyone else tried adapting
autoresearchfor Vision tasks this way? I’d love to hear your opinions on substituting the "Strict Time Budget" with a "Strict Mini-Dataset + Epoch Budget" for hyperparameter tuning.Thanks again for your time and the fantastic tool!
(P.S. English is not my first language, so I used translation tools to help write this. I hope my meaning is clear)
Beta Was this translation helpful? Give feedback.
All reactions