Notes on machine learning, recommender systems, and the things I've had to think through carefully.
What does it actually mean for a model to "remember" something? A first-principles look at the memory problem in sequential modeling, before any architecture enters the picture.
Why did we move from RNNs to LSTMs? What was attention actually solving that LSTMs couldn't? A step-by-step look at the design decisions behind each architecture, and the failure each one was built to fix.