First whole week at Tuenti completed. Some progress, and a bunch of questions. Here’s the short breakdown:
- Acquired an understanding of the recommendation problem: why recommendations are useful, where they are used, who the big players are, and how the field has evolved over the years.
- Learned about matrix factorization techniques for generating recommendations. I’m not sure I fully understand how they work, but I get the gist. Probably wouldn’t attempt to implement one of the more complicated algorithms on my own yet.
- Tweaked and refactored some code that does simple, sequential matrix factorization on very small datasets. Check the refactored code on GitHub.
- Read numerous papers about algorithms (many of which I don’t understand, but am able to classify).
- Read the few papers I could find on recommendation system architecture, the Google News personalisation paper being the most prominent one.
- Studied the architecture of Mahout, a machine learning framework that can use Hadoop for larger datasets. It works offline only (although Taste, its recommendation framework, is pluggable for online recommendations too) and does not support model-based algorithms yet, afaik.
- Discovered that there isn’t a whole lot of research, papers, or blog posts on actual implementations of recommendation systems. Good for me, I guess.
- Began studying Tuenti’s internal architecture based on technical specifications and requirement documents. Much of it I will never cover here. Next week a colleague is giving me and Toni a walkthrough of the essentials. It will be very interesting, as a lot of things here really do run at large scale. Things we have only studied in papers before.
- Rewrote my thesis position paper based on this week’s findings. As soon as the Tuenti architecture background is complete I should be able to finalise it and send it for a final review. My UPC supervisor isn’t responding to e-mails at the moment (and hasn’t for a week), which is an issue.
- Got some rough ideas on a possible system design if I could do it entirely as I want. That’s obviously neither a good idea, nor feasible. But sketching is always fun.
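For the curious, here is roughly what simple, sequential matrix factorization on a tiny dataset can look like. This is only a minimal sketch and not the refactored code itself: the function names (`factorize`, `predict`, `rmse`), the toy ratings, and the hyperparameters are all made up for illustration, and it assumes plain stochastic gradient descent with L2 regularization, which is just one common way to do it.

```python
import math
import random


def predict(P, Q, u, i):
    """Predicted rating: dot product of user u's and item i's factor vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    """Root-mean-square error over the known (user, item, rating) triples."""
    squared_error = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(squared_error / len(ratings))


def factorize(ratings, n_users, n_items, k=2, epochs=2000,
              lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q from known ratings only.

    All defaults here are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    # Small random starting values for the k latent factors.
    P = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        # Sequential pass: one known rating at a time.
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


if __name__ == "__main__":
    # Hypothetical toy data: (user, item, rating); missing pairs are unknown.
    toy_ratings = [
        (0, 0, 5), (0, 1, 3), (0, 3, 1),
        (1, 0, 4), (1, 3, 1),
        (2, 0, 1), (2, 1, 1), (2, 3, 5),
        (3, 0, 1), (3, 3, 4),
        (4, 1, 1), (4, 2, 5), (4, 3, 4),
    ]
    P, Q = factorize(toy_ratings, n_users=5, n_items=4)
    print("training RMSE:", rmse(toy_ratings, P, Q))
    print("predicted rating for user 1, item 1:", predict(P, Q, 1, 1))
```

The unknown (user, item) pairs are simply left out of training, and the learned factors are then used to fill them in. Everything runs in a single loop on one machine, which is exactly why this kind of code only works for very small datasets.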
It’s been a lot this week. Although I still haven’t been able to specify the exact topic and scope of the work, I’m getting more confident that we’ll find something. The best part is that basically whatever I look at is really interesting, with the possible exception of some of the more complicated math.
Now pizza.