For almost all of my ML projects that involved deep learning, I used AdamW and it just worked. So fucking well.
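By that I mean a basically stock PyTorch setup along these lines (the lr and weight_decay values here are just placeholders, not recommendations; the betas/eps shown are PyTorch's defaults):

```python
import torch

model = torch.nn.Linear(128, 10)  # stand-in for whatever model is being trained

# Stock AdamW: betas=(0.9, 0.999) and eps=1e-8 are PyTorch's defaults.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=3e-4,            # placeholder learning rate
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.01,  # placeholder decay
)
```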
So, a few questions for fellow Redditors who train models frequently:

Do you use a different optimizer? Why?
Do you tune the beta values? (See the snippet after these questions for the kind of tuning I mean.)
Have you ever consciously chosen not to use Adam? Why?
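On the betas question, to make it concrete: the only tuning I've really seen discussed is lowering beta2 from its 0.999 default, e.g. the (0.9, 0.95) setting that shows up in several large-scale LM training setups (treat the exact numbers as illustrative):

```python
# Example beta tuning: lower beta2 from the 0.999 default to 0.95, a
# setting reported in several LLM training papers. lr/weight_decay are
# placeholders again.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4,
                              betas=(0.9, 0.95), weight_decay=0.1)
```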
I have seen some recent fancy optimizers like PCGrad but never found the need to use them. If you have used them, when and why?
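For anyone who hasn't run into it: as I understand it, PCGrad is less a standalone optimizer and more a "gradient surgery" step for multi-task learning that runs before the actual optimizer step. A minimal sketch of the core projection idea (flattened per-task gradients, and omitting the random task ordering from the paper, so this is an approximation rather than the reference implementation):

```python
import torch

def pcgrad_combine(task_grads):
    # task_grads: list of flattened 1-D gradient tensors, one per task.
    adjusted = []
    for i, g_i in enumerate(task_grads):
        g = g_i.clone()
        for j, g_j in enumerate(task_grads):
            if i == j:
                continue
            dot = torch.dot(g, g_j)
            if dot < 0:
                # Conflict (negative cosine similarity): project g onto
                # the normal plane of g_j, removing the conflicting component.
                g = g - (dot / g_j.norm() ** 2) * g_j
        adjusted.append(g)
    # Sum the surgically adjusted per-task gradients into one update direction.
    return torch.stack(adjusted).sum(dim=0)

# Toy example with two conflicting task gradients:
g1 = torch.tensor([1.0, 1.0])
g2 = torch.tensor([-1.0, 0.5])
print(pcgrad_combine([g1, g2]))
```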