I'm trying to optimize a model similar to VQGAN. I'm away from my RTX 3090 and stuck on an RTX 3070 right now. I normally use a batch size of 16, but with 8 GB of memory I can only fit batches of size 4. That seemed too low, so I implemented simple gradient accumulation to get an effective batch size of 16. What I want to know is: would I be better off just using batches of size 4 with 1/4 the learning rate? And could this be taken to the extreme, batches of size 1 with 1/16th the learning rate? Has anyone experimented with this?
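For reference, here is a minimal pure-Python sketch of why accumulation recovers the larger batch: averaging the mean gradients of 4 micro-batches of 4 reproduces (up to floating-point rounding) the mean gradient of the full batch of 16. The toy model and all names (`accum_steps`, `grad`, etc.) are illustrative, not from any particular library:

```python
import random

random.seed(0)

# Toy data for a linear model y ≈ w * x with squared loss.
xs = [random.uniform(-1, 1) for _ in range(16)]
ys = [3.0 * x + random.gauss(0, 0.1) for x in xs]
w = 0.5

def grad(w, xs, ys):
    """Mean gradient of 0.5 * (w*x - y)^2 w.r.t. w over a batch."""
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

# Gradient from one full batch of 16.
g_full = grad(w, xs, ys)

# Gradient accumulation: 4 micro-batches of 4, each mean gradient
# scaled by 1/accum_steps so the sum equals the full-batch mean.
accum_steps = 4
g_accum = 0.0
for i in range(accum_steps):
    mb_x = xs[i * 4:(i + 1) * 4]
    mb_y = ys[i * 4:(i + 1) * 4]
    g_accum += grad(w, mb_x, mb_y) / accum_steps

assert abs(g_full - g_accum) < 1e-12
```

So accumulation gives the same gradient as the big batch before the optimizer step; the open question is whether taking 4 (or 16) small noisy steps with a scaled-down learning rate instead behaves as well in practice, especially with an optimizer like Adam where the scaling isn't a clean 1/N.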