Adapting autoresearch for Computer Vision: Time Budget vs. Reduced Dataset + Epochs? #394

HyunKN · 2026-03-23T12:03:35Z

Hi everyone, and thanks for this amazing project!

I'm a beginner in this field and currently trying to learn as much as I can. I am planning to apply the autoresearch method to a Computer Vision (CV) task—specifically, Image-to-Text Retrieval with distillation for a simple research project. My goal is to tune configurations like loss weights, augmentations, and soft-labels, while keeping the model architecture strictly fixed.

In the README, under platform support, there is a great piece of advice regarding limited compute:

"To get half-decent results I'd use a dataset with a lot less entropy, e.g. this TinyStories dataset. ... Because the data is a lot narrower in scope, you will see reasonable results with a lot smaller models."

As I was setting up my script to use the strict 5-minute time budget on my full dataset, I realized there might be a fundamental issue with CV training dynamics:
If I stop training at exactly 5 minutes, my model processes only a small fraction of the data (less than one epoch). Because the model never sees the same image twice (no multi-epoch training), I'm worried that settings that rely on repeated exposure to the data, such as data augmentation, hard negative mining, and EMA, might not converge properly. Could this cause the agent to wrongly judge these settings as "bad" and discard them?
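To make the "less than one epoch" concern concrete, here is a small back-of-the-envelope sketch. The throughput and dataset size are hypothetical placeholders, not measurements from this thread. It assumes constant throughput and per-epoch shuffling without replacement, which is the default behavior of a PyTorch `DataLoader` with `shuffle=True`.

```python
import math


def epoch_fraction(budget_s: float, images_per_s: float, dataset_size: int) -> float:
    """Fraction of one epoch that a wall-clock-capped run gets through.

    Assumes constant throughput (images/second). Real throughput varies with
    augmentation cost, data loading and warm-up, so treat this as an estimate.
    """
    images_seen = budget_s * images_per_s
    return images_seen / dataset_size


def max_views_per_image(fraction: float) -> int:
    """Upper bound on how often a single image can be seen when every epoch
    is a fresh shuffle without replacement."""
    return math.ceil(fraction)


# Hypothetical numbers: 5-minute budget, 1,000 images/s, 3M training images.
frac = epoch_fraction(budget_s=300, images_per_s=1000, dataset_size=3_000_000)
print(f"epochs covered: {frac:.2f}, max views per image: {max_views_per_image(frac)}")
```

With these placeholder numbers, a 300-second run covers about 10% of one epoch, so no image is seen more than once and anything that depends on revisiting the same sample never gets a chance to act.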

Since my model architecture is fixed (meaning the "speed penalty" of the time budget isn't a factor for me), I've been wondering if an alternative approach inspired by the TinyStories advice might work better for this specific case.

What if I use a Reduced Dataset + Full Epochs approach?
Instead of a strict 5-minute timer on the huge dataset, what if I pre-sample a small but highly representative subset and instruct the agent to run a fixed, small number of epochs on it? I would tune the subset size and epoch count so that the total training time still fits within a fast 5-10 minute window, which keeps the fast-iteration philosophy intact.
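One way to size the reduced dataset is to work backwards from the time window: measure throughput once, pick an epoch count, and derive the largest subset that fits. The sketch below does that and then draws a stratified, seeded subset. `labels` is a hypothetical grouping key (for example a coarse category or data source); what actually makes a subset "representative" depends on your dataset.

```python
import math
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def max_subset_size(images_per_s: float, epochs: int, budget_s: float) -> int:
    """Largest subset that fits `epochs` full passes into `budget_s` seconds,
    assuming constant throughput (measure your own images_per_s)."""
    return math.floor(images_per_s * budget_s / epochs)


def stratified_subset(labels: Sequence[Hashable], subset_size: int, seed: int = 0) -> List[int]:
    """Return sorted indices of a subset that keeps each group's share of the data.

    A fixed seed keeps the subset identical across runs. Per-group counts are
    rounded, so the total can differ slightly from subset_size, and very small
    groups may round down to zero samples.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, key in enumerate(labels):
        groups[key].append(idx)

    n_total = len(labels)
    chosen: List[int] = []
    for key in sorted(groups):  # iterate in a fixed order for determinism
        members = groups[key]
        k = round(subset_size * len(members) / n_total)
        chosen.extend(rng.sample(members, min(k, len(members))))
    return sorted(chosen)


# Hypothetical: 1,000 images/s, 5-minute window, 10 epochs on the subset.
size = max_subset_size(images_per_s=1000, epochs=10, budget_s=300)
print(size)
```

At a hypothetical 1,000 images/s, a 5-minute window with 10 epochs allows a subset of about 30,000 images. Fixing the seed matters here: if every experiment drew a different subset, differences between configs would be mixed up with sampling noise.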

My thought process is that this approach might:

Allow PyTorch LR schedulers (like CosineAnnealing) to complete their intended cycles.
Give multi-epoch dynamics (Augmentation variants, EMA) enough time to activate and be evaluated properly.
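Both points can be checked with closed-form arithmetic, without running any training. The sketch uses the cosine annealing formula that PyTorch documents for `CosineAnnealingLR` (without restarts) and the standard weight-EMA update `shadow = decay * shadow + (1 - decay) * param`. The step counts and the decay value are illustrative assumptions.

```python
import math


def cosine_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Closed-form cosine annealing with T_max = total_steps (no restarts)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(step, total_steps) / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))


def ema_init_weight(decay: float, steps: int) -> float:
    """Share of the EMA that still comes from the initial weights after
    `steps` updates of shadow = decay * shadow + (1 - decay) * param."""
    return decay ** steps


# Schedule planned for 10,000 steps, but the run stops after 1,000.
print(cosine_lr(1000, 10000, lr_max=1.0))  # LR relative to peak when training ends
# Schedule sized to the steps the run actually takes.
print(cosine_lr(1000, 1000, lr_max=1.0))
# EMA with decay 0.999 after a short run.
print(ema_init_weight(0.999, 300))
```

If the schedule is configured for 10,000 steps but the run stops after 1,000, the LR is still about 98% of its peak when training ends, so the annealing phase is never evaluated. Similarly, with `decay=0.999` roughly 74% of the EMA after 300 steps is still the initial weights, because the EMA averages over roughly `1 / (1 - decay)` = 1,000 steps. Sizing `T_max` and the EMA decay to the number of steps a run actually takes addresses this under either setup.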

Since I am currently setting up my GPU environment, I haven't run the full autoresearch loop yet but plan to do so very soon. Before I dive in, I would deeply appreciate your insights:
Does my reasoning make sense? Has anyone else tried adapting autoresearch for vision tasks this way? I’d love to hear your opinions on replacing the "Strict Time Budget" with a "Strict Mini-Dataset + Epoch Budget" for hyperparameter tuning.

Thanks again for your time and the fantastic tool!
(P.S. English is not my first language, so I used translation tools to help write this. I hope my meaning is clear.)

jelspace · 2026-03-23T15:23:45Z

I haven't checked the code yet, so this is just my understanding from what I've read:

The "fantastic tool", autoresearch, is essentially an "auto architecture researcher".
Its function is probably to search for new and better training architectures.
So it should work reasonably well with only part of the dataset.
After you find the best architecture, you train on the entire dataset using it.
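The workflow described above (search on part of the data, then train once on everything) can be sketched as a two-phase loop. `search_then_retrain`, `score_on_subset` and `train_full` are hypothetical names standing in for your own proxy run and full training run, not part of autoresearch. A higher score is assumed to be better (for retrieval, something like Recall@K).

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Config = Dict[str, float]


def search_then_retrain(
    configs: Iterable[Config],
    score_on_subset: Callable[[Config], float],
    train_full: Callable[[Config], object],
) -> Tuple[Config, object]:
    """Phase 1: score every candidate config with a cheap run on the subset.
    Phase 2: retrain once on the full dataset with the best-scoring config."""
    best_cfg: Optional[Config] = None
    best_score = float("-inf")
    for cfg in configs:
        score = score_on_subset(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    if best_cfg is None:
        raise ValueError("no configs to search")
    return best_cfg, train_full(best_cfg)


# Toy usage: pretend the proxy metric peaks at a loss weight of 0.3.
candidates = [{"loss_weight": w} for w in (0.1, 0.3, 0.5)]
best, final_model = search_then_retrain(
    candidates,
    score_on_subset=lambda c: -(c["loss_weight"] - 0.3) ** 2,
    train_full=lambda c: f"model trained with {c}",
)
print(best, final_model)
```

This only pays off if the ranking of configs on the subset carries over to the full dataset, which is the same question the original post raises about time-budgeted runs.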
