What is the relationship between (1) world models, (2) machine theory of mind, and (3) curiosity learning? In my view, they all learn to predict the variations of the […]
The original Transformer paper introduces scaled dot-product attention [1]. The more recent SimCLR paper [2] uses a scaled cosine similarity, where it first computes cosine similarity […]
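
To make the comparison concrete, here is a minimal NumPy sketch of the two scores, assuming the usual formulations: dot products divided by sqrt(d_k) for the Transformer, and cosine similarity divided by a temperature for SimCLR. The function names and the temperature value of 0.5 are illustrative assumptions, not taken from either paper's code.

```python
import numpy as np


def scaled_dot_product(q, k):
    """Transformer-style score: (q . k) / sqrt(d_k).

    q: (n, d_k) array, k: (m, d_k) array. Returns an (n, m) score matrix.
    The score depends on both the direction and the norm of the vectors.
    """
    d_k = q.shape[-1]
    return q @ k.T / np.sqrt(d_k)


def scaled_cosine_similarity(a, b, temperature=0.5):
    """SimCLR-style score: cosine similarity divided by a temperature.

    a: (n, d) array, b: (m, d) array. Returns an (n, m) score matrix.
    The temperature value here is a hypothetical default for illustration.
    """
    # L2-normalize each row so the dot product becomes a cosine similarity.
    a_unit = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a_unit @ b_unit.T / temperature


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    y = rng.normal(size=(4, 8))

    # Rescaling the inputs changes the dot-product score ...
    print(np.allclose(scaled_dot_product(x, y), scaled_dot_product(10 * x, y)))
    # ... but leaves the cosine-based score unchanged.
    print(np.allclose(scaled_cosine_similarity(x, y),
                      scaled_cosine_similarity(10 * x, y)))
```

The practical difference in this sketch is that the cosine variant is bounded in [-1/temperature, 1/temperature] and ignores vector norms, while the dot-product variant grows with the norms and is only rescaled by a constant that depends on the dimension.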