Distractions and Mistakes

The emotional roller coaster of research continues. Previously, I was working on a paper where the results were mediocre, but I thought there was something useful there. I wanted to finish some kind of paper, even if it would never make it to a top conference.
NLP

I want to do self-supervised learning, and I think NLP is better suited to this, even though I've worked on vision previously. There is something special about NLP, where simple self-supervised objectives like next-word or masked-word prediction work well, compared to computer vision, where more complex methods are needed.
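To make the masked-word objective concrete, here is a minimal sketch of how a single training pair could be constructed. This is plain Python with whitespace tokenization; the `[MASK]` token and the `make_masked_pair` helper are illustrative conventions, not any particular library's API:

```python
import random

MASK = "[MASK]"

def make_masked_pair(sentence, rng):
    """Replace one randomly chosen word with [MASK]; return (input, target).

    The model's self-supervised task is then to predict `target` from the
    masked input -- no human labels needed, just raw text.
    """
    tokens = sentence.split()
    i = rng.randrange(len(tokens))
    target = tokens[i]
    masked = tokens[:i] + [MASK] + tokens[i + 1:]
    return " ".join(masked), target

rng = random.Random(0)
masked, target = make_masked_pair("the cat sat on the mat", rng)
print(masked, "->", target)
```

Real masked language models mask a fraction of tokens per sequence rather than exactly one, but the shape of the supervision signal is the same.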
What to work on?

I spent the fall doing an internship at a hedge fund (Shell Street Labs) in Hong Kong, which was a great break from this open-ended research.
My Deep Learning PhD Journey (so far)

2016-2017: Pre-PhD

After messing around with some MMA fight prediction in my undergrad, I became interested in machine learning and decided the best route was to take a master's in statistics.
I spent the last few months interning at Shell Street Labs in Hong Kong (September-December 2019), and this opened my eyes to the mess that is media, propaganda, and group identity.
Limitations of Variational Autoencoders

Deep learning models are powerful, but don't always generalize well to out-of-distribution samples. For example, people have no problem understanding slightly rotated digits:
Rotated MNIST

But Variational Autoencoders (VAEs) do not generalize well to this change in distribution.
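The rotated test set is easy to construct. A minimal NumPy sketch of the idea, using a hand-written nearest-neighbour rotation and a crude stand-in array for a digit (the real experiment would load MNIST and feed the result to a trained VAE):

```python
import numpy as np

def rotate_nn(img, angle_deg):
    """Rotate a 2D image about its centre using nearest-neighbour sampling."""
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    theta = np.deg2rad(angle_deg)
    out = np.zeros_like(img)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse-map each output pixel to its source location in the input.
    sx = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    sy = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    sx, sy = np.rint(sx).astype(int), np.rint(sy).astype(int)
    valid = (0 <= sx) & (sx < w) & (0 <= sy) & (sy < h)
    out[ys[valid], xs[valid]] = img[sy[valid], sx[valid]]
    return out

digit = np.zeros((28, 28), dtype=np.float32)
digit[6:22, 13:15] = 1.0            # crude stand-in for a "1"
rotated = rotate_nn(digit, 30)      # same shape and pixel range as the input
```

The rotated image has the same shape and value range as the training data, so it can be fed straight to the trained model; only the pixel statistics have shifted, which is exactly what trips the VAE up.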
Attention

In the past, the standard design of NLP models was based on recurrent neural networks (RNNs), to ensure the model could encode the long-range dependencies necessary in language modelling. This assumption was called into question by the development of the Transformer, a model in which self-attention layers are used instead of RNNs.
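The core of a self-attention layer fits in a few lines of NumPy. A single-head scaled dot-product sketch (the shapes and random weights are illustrative; a real Transformer uses multiple heads plus residual connections and layer norm):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (no masking, no batch).

    X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: (d_model, d_k).
    Each output position is a weighted mix of *all* positions, so long-range
    dependencies are one step away instead of many recurrent steps.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                          # 5 tokens, d_model=8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
```

Note the attention weights form a full `seq_len x seq_len` matrix: every pair of positions interacts directly, which is what removes the RNN's sequential bottleneck.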
2D pose estimation has improved immensely over the past few years, partly because of the wealth of data stemming from the ease of annotating any RGB video. 3D pose annotation is much more difficult, because accurate 3D labels require motion capture in artificial indoor settings.
Basic Set Up

To do any deep learning, you'll need to set up a VM at a cloud provider (AWS, Azure, Google Cloud, Paperspace, etc.), unless you happen to have a GPU lying around.
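One quick way to tell which situation you're in, assuming a Linux box with the NVIDIA driver stack (`nvidia-smi` ships with the driver; this check is a sketch, not a full environment setup):

```shell
# If nvidia-smi exists, a local GPU and driver are present; otherwise a
# cloud GPU VM is the way to go.
if command -v nvidia-smi >/dev/null 2>&1; then
  gpu_status="$(nvidia-smi --query-gpu=name --format=csv,noheader)"
else
  gpu_status="no local GPU found - rent a cloud GPU instance"
fi
echo "$gpu_status"
```

Cloud images like AWS's Deep Learning AMIs come with the driver, CUDA, and the major frameworks preinstalled, which saves most of the setup pain.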
I completed the 10-day silent Vipassana meditation course at the Dhamma Torana center outside Toronto, Canada. It was my first meditation course, and it was an interesting mix of enlightenment and a big test of self-control.