Replies: 7 comments 1 reply
On first startup, the Agent created a branch named mar10. After one round of training, it changed ADAM_BETAS to (0.9, 0.95).
Because I had set a different data location, the Agent forgot the data path configured at the start after running for a long time, and later steps used incorrect paths. To customize the training data path, both program.md and prepare.py need to be modified; I had only modified prepare.py.
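A quick consistency check can catch this kind of mismatch before a long run. The sketch below is a hypothetical helper, not part of the repository: it only reports which of the given files never mention the custom data path. The file names come from the comment above; the helper name and the example path are assumptions.

```python
from pathlib import Path


def files_missing_path(data_path: str, files: list[str]) -> list[str]:
    """Return the files that do not exist or never mention data_path."""
    missing = []
    for name in files:
        p = Path(name)
        # A file that does not exist is treated as not mentioning the path.
        if not p.is_file() or data_path not in p.read_text(encoding="utf-8"):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "/data/my_corpus" is a placeholder; substitute the real custom path.
    stale = files_missing_path("/data/my_corpus", ["program.md", "prepare.py"])
    if stale:
        print("custom data path not found in:", ", ".join(stale))
```

This is a plain substring check, so it will not notice a path that is assembled from parts at runtime.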
The Agent fixed the cache path and changed WINDOW_PATTERN to LLLL.
The Agent reduced WEIGHT_DECAY to 0.1.
Running the Agent in a sandbox with git and uv whitelisted keeps it running continuously. So far, however, every change has made val_bpb worse than the baseline, and it may take longer to find a useful optimization direction. (I made no training-related interventions during this period.)

| commit | val_bpb | memory_gb | status | description |
| --- | --- | --- | --- | --- |
| c12eef7 | 1.138318 | 44.0 | keep | baseline |
| ceed389 | 1.143254 | 44.0 | discard | change WINDOW_PATTERN to LLLL |
| 50c331c | 1.145991 | 44.0 | discard | reduce WEIGHT_DECAY to 0.1 |
| 23c1832 | 1.144820 | 44.0 | discard | change ADAM_BETAS to (0.9, 0.98) |
| 9267a60 | 1.166732 | 54.1 | discard | increase ASPECT_RATIO to 80 |
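To make the regression concrete, the sketch below computes each run's val_bpb change relative to the baseline. The numbers are copied from the log above; the function name is illustrative, and the code assumes lower val_bpb is better, which matches the keep/discard statuses.

```python
# Rows copied from the results log: (commit, val_bpb, description).
RUNS = [
    ("c12eef7", 1.138318, "baseline"),
    ("ceed389", 1.143254, "change WINDOW_PATTERN to LLLL"),
    ("50c331c", 1.145991, "reduce WEIGHT_DECAY to 0.1"),
    ("23c1832", 1.144820, "change ADAM_BETAS to (0.9, 0.98)"),
    ("9267a60", 1.166732, "increase ASPECT_RATIO to 80"),
]


def deltas_vs_baseline(runs):
    """Return (commit, val_bpb - baseline, description) for each non-baseline run."""
    baseline = runs[0][1]
    return [(commit, round(val - baseline, 6), desc) for commit, val, desc in runs[1:]]


if __name__ == "__main__":
    for commit, delta, desc in deltas_vs_baseline(RUNS):
        # A positive delta means the run is worse than the baseline.
        print(f"{commit}  {delta:+.6f}  {desc}")
```

Every delta is positive, and the ASPECT_RATIO increase is the largest regression while also using the most memory.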
Update 0312

| commit | val_bpb | memory_gb | status | description |
| --- | --- | --- | --- | --- |
| c12eef7 | 1.138318 | 44.0 | keep | baseline |
| ... | | | | |
| 9267a60 | 1.166732 | 54.1 | discard | increase ASPECT_RATIO to 80 |
| d952656 | 1.118683 | 34.0 | keep | decrease ASPECT_RATIO to 48 |
| 9ec0584 | 1.116784 | 34.0 | keep | increase MATRIX_LR to 0.045 and EMBEDDING_LR to 0.7 |
| 824126c | 1.117583 | 34.0 | discard | increase WARMDOWN_RATIO to 0.6 |
| ba4a71a | 1.116779 | 34.0 | keep | increase UNEMBEDDING_LR to 0.005 |
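The statuses in this log are consistent with a simple greedy rule: keep a change only if it beats the best val_bpb seen so far. The sketch below replays that rule over the rows listed above. It is an inference from the logged data, not a confirmed description of how the Agent decides, and it skips the elided rows on the assumption that they were all discards (discarded rows never change the running best).

```python
# Rows listed in the update: (commit, val_bpb, logged status).
ROWS = [
    ("c12eef7", 1.138318, "keep"),
    ("9267a60", 1.166732, "discard"),
    ("d952656", 1.118683, "keep"),
    ("9ec0584", 1.116784, "keep"),
    ("824126c", 1.117583, "discard"),
    ("ba4a71a", 1.116779, "keep"),
]


def replay(rows):
    """Apply keep-if-strictly-better and return (decisions, best val_bpb)."""
    best = float("inf")
    decisions = []
    for commit, val_bpb, _logged in rows:
        if val_bpb < best:
            best = val_bpb
            decisions.append((commit, "keep"))
        else:
            decisions.append((commit, "discard"))
    return decisions, best


if __name__ == "__main__":
    decisions, best = replay(ROWS)
    logged = [(commit, status) for commit, _val, status in ROWS]
    print("matches logged statuses:", decisions == logged)
    print("best val_bpb:", best)
```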
This round achieved a relatively significant improvement: val_bpb went from 1.116445 to 1.031226.

| commit | val_bpb | memory_gb | status | description |
| --- | --- | --- | --- | --- |
| c12eef7 | 1.138318 | 44.0 | keep | baseline |
| ... | | | | |
| 824126c | 1.117583 | 34.0 | discard | increase WARMDOWN_RATIO to 0.6 |
| ba4a71a | 1.116779 | 34.0 | keep | increase UNEMBEDDING_LR to 0.005 |
| a8735cc | 1.116445 | 34.0 | keep | increase UNEMBEDDING_LR to 0.006 |
| 6fc918a | 1.117362 | 34.0 | discard | increase SCALAR_LR to 0.6 |
| 021b2cb | 1.116960 | 34.0 | discard | increase WARMUP_RATIO to 0.05 |
| f585655 | 1.094303 | 45.0 | keep | DEPTH=12, ASPECT_RATIO=32 |
| 0db0786 | 1.111870 | 58.0 | discard | DEPTH=16, ASPECT_RATIO=24 |
| 5ad95cd | 1.095593 | 45.0 | discard | increase MATRIX_LR to 0.05 |
| 85f304b | 1.097688 | 45.0 | discard | change WINDOW_PATTERN to SLL |
| d506eb1 | 1.095182 | 45.0 | discard | decrease WEIGHT_DECAY to 0.15 |
| 7efb159 | 1.031226 | 45.0 | keep | Convergence test: increase TIME_BUDGET to 600 |
| 86411d0 | 1.038559 | 58.0 | discard | Convergence test (DEPTH=16): increase TIME_BUDGET to 600 |
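The overall gain can be split into two steps, since the last kept run also increased TIME_BUDGET and so is not a like-for-like comparison with the earlier runs. The sketch below uses the val_bpb values from the table and assumes that 7efb159 builds on f585655, which the matching 45.0 GB memory figure suggests but the log does not state.

```python
def relative_improvement(before: float, after: float) -> float:
    """Fractional reduction in val_bpb (lower is better)."""
    return (before - after) / before


# Values copied from the table above.
PREV_BEST = 1.116445      # a8735cc, best run before this round
DEPTH_12 = 1.094303       # f585655, DEPTH=12, ASPECT_RATIO=32
LONGER_BUDGET = 1.031226  # 7efb159, TIME_BUDGET increased to 600

total = relative_improvement(PREV_BEST, LONGER_BUDGET)
from_architecture = relative_improvement(PREV_BEST, DEPTH_12)
from_time_budget = relative_improvement(DEPTH_12, LONGER_BUDGET)

if __name__ == "__main__":
    print(f"total:             {total:.2%}")
    print(f"DEPTH/ASPECT step: {from_architecture:.2%}")
    print(f"TIME_BUDGET step:  {from_time_budget:.2%}")
```

Under that assumption, most of the drop comes from the longer training budget rather than from the architecture change.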
I used TRAE CN as the LLM backend, granted the Agent all permissions, and ran autonomous training tests on an H20 device. I will keep updating the results here.