[D] Lower learning rate vs gradient accumulation.

I am trying to optimize a model similar to VQGAN. I am away from my RTX 3090 and stuck with an RTX 3070 for now. I normally use a batch size of 16, but with 8 GB of memory I can only fit batches of 4. That seems too small to me, so I implemented some simple gradient accumulation to get an effective batch size of 16. What I want to know is: would I be better off just using batches of 4 with 1/4 the learning rate? And could we take this to the extreme and use batches of 1 with 1/16 the learning rate? Has anyone experimented with this?
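To make the comparison concrete, here is a minimal sketch in plain Python. It is not the VQGAN setup: it assumes a toy linear model, plain SGD, and a mean-reduced squared-error loss. All names (`grad_mse`, `accumulated_step`, `small_batch_steps`) are made up for illustration. Under those assumptions, accumulating four micro-batches of 4 gives the same update as one batch of 16, while four separate steps at 1/4 the learning rate only approximate it, because each later step sees already-updated weights.

```python
import random

def grad_mse(w, b, batch):
    """Gradient of the mean squared error of y ~ w*x + b over one batch."""
    n = len(batch)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return gw, gb

def split(batch, micro_size):
    """Cut a batch into consecutive micro-batches of micro_size samples."""
    return [batch[i:i + micro_size] for i in range(0, len(batch), micro_size)]

def full_batch_step(w, b, batch, lr):
    """One SGD step on the whole batch."""
    gw, gb = grad_mse(w, b, batch)
    return w - lr * gw, b - lr * gb

def accumulated_step(w, b, batch, lr, micro_size):
    """Gradient accumulation: average micro-batch gradients, then step once."""
    micro = split(batch, micro_size)
    acc_w = acc_b = 0.0
    for mb in micro:
        gw, gb = grad_mse(w, b, mb)  # always evaluated at the same weights
        acc_w += gw / len(micro)
        acc_b += gb / len(micro)
    return w - lr * acc_w, b - lr * acc_b

def small_batch_steps(w, b, batch, lr, micro_size):
    """The alternative: one step per micro-batch at a scaled-down rate."""
    micro = split(batch, micro_size)
    small_lr = lr / len(micro)
    for mb in micro:
        gw, gb = grad_mse(w, b, mb)  # evaluated at the updated weights
        w, b = w - small_lr * gw, b - small_lr * gb
    return w, b

# Synthetic data (an assumption for the demo): y = 3x + 1 plus small noise.
rng = random.Random(0)
data = []
for _ in range(16):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)))

lr = 0.1
w_full, b_full = full_batch_step(0.0, 0.0, data, lr)
w_acc, b_acc = accumulated_step(0.0, 0.0, data, lr, micro_size=4)
w_small, b_small = small_batch_steps(0.0, 0.0, data, lr, micro_size=4)

print("batch of 16:        ", w_full, b_full)
print("accumulated 4 x 4:  ", w_acc, b_acc)
print("4 steps at lr / 4:  ", w_small, b_small)
```

The first two lines of output should agree up to floating-point rounding, and the third should land close by but not on the same values. The exact match relies on the assumptions above: with an adaptive optimizer such as Adam, or with layers whose statistics depend on the batch (batch norm, for example), neither accumulation nor a scaled learning rate is guaranteed to reproduce the large-batch update.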

submitted by /u/chasep255

