Online recommendations

Since January 12, 2012 I’ve been slowly working on my master’s thesis, “Online recommendations at web-scale using matrix factorisation”. Today I successfully defended it and can happily say I’m satisfied with the results.

Over the course of the semester this blog has served as a place to vent ideas and clarify problems for myself. Perhaps most of all it has been an experiment in documenting my progress. I wanted to be able to look back and see how my perception of the problem changed over time. As I learned more and more about the problem, how did my understanding change? Which decisions led to progress, and which did not? Essentially it has been a tool for personal reflection on my learning process. Once the last few weeks have sunk in a bit, I will try to write a summary on my personal blog.

Anyway, for those of you who are interested, you can download a full copy of the thesis and read all about its juicy details. If you have any questions about the work, don’t hesitate to shoot me an e-mail at marcus@ljungblad.nu.

Abstract

In social networks, e-commerce systems, and other web services the sheer size of available content is overwhelming. Highlighting relevant content is the focus of recommender systems. Most previous research in the area has provided several algorithms for personalising the user experience, but few have addressed the issues of scalability. In this study we show how matrix factorisation, one of the more accurate recommendation techniques, can be used to serve recommendations online for millions of items and millions of users. An approach based on dividing all available items into clusters and restricting the computation to a selected few is outlined. Consequently, we developed a prototype using requirements from a production environment to demonstrate its feasibility. Experimental results show that 600 recommendation requests per second can be served with a latency below 30 ms. We conclude that matrix factorisation can be used online in large-scale settings but specific care has to be taken when clustering the items.
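
For readers who want a feel for the idea in the abstract, here is a minimal sketch in Python. It is not the code from the thesis. It assumes each user and item is already represented by a latent factor vector from some matrix factorisation, that items have been grouped into clusters beforehand, and that a cluster is selected by scoring the user vector against a cluster centroid. The centroid-based selection, the function names, and the toy data are all assumptions made for illustration.

```python
import heapq


def dot(a, b):
    """Dot product of two equally long factor vectors."""
    return sum(x * y for x, y in zip(a, b))


def recommend(user_vec, clusters, n_clusters=1, k=2):
    """Return the ids of the top-k items, scoring only a few clusters.

    clusters is a list of (centroid, items) pairs, where items maps an
    item id to its factor vector. Both the layout and the centroid-based
    cluster selection are hypothetical, chosen for this sketch.
    """
    # Rank clusters by how well their centroid matches the user
    # and keep only the best n_clusters.
    selected = heapq.nlargest(
        n_clusters, clusters, key=lambda c: dot(user_vec, c[0])
    )
    # Score only the items inside the selected clusters.
    scored = (
        (dot(user_vec, vec), item_id)
        for _, items in selected
        for item_id, vec in items.items()
    )
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


# Toy data: two clusters with two items each, two latent factors.
clusters = [
    ([1.0, 0.0], {"a": [0.9, 0.1], "b": [1.1, -0.1]}),
    ([0.0, 1.0], {"c": [0.1, 0.9], "d": [-0.1, 1.2]}),
]
user = [0.2, 1.0]

print(recommend(user, clusters, n_clusters=1, k=2))
```

The point of the restriction is that the cost per request depends on the size of the selected clusters rather than on the whole catalogue. The price is that a good item sitting in a cluster that was not selected can never be recommended, which is why the clustering needs care.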

And though it may not make much sense without me talking, here are the slides from this morning’s defense.

The presentation

This will also mark the last post on this blog. From now on you can only find me on http://ljungblad.nu.

So long and thanks for all the fish!

Posted on 22 Jun 2012 in notes
