Amy Guy

Raw Blog

Wednesday, July 10, 2013

[Notes] Tommaso Di Noia at #SSSW2013

Tools and Techniques

Recommender systems

Input: Set of users + set of items + rating matrix.
Problem - given a user, predict their rating for an item.

In the real world, the rating matrix is sparse.
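A rough sketch of what "sparse" means here, assuming ratings are stored as a dictionary of dictionaries (user to item to rating). The toy users, items and the `sparsity` helper are made up for illustration, not from the talk.

```python
# Hypothetical toy data: user -> {item: rating}.
# Pairs with no rating are simply absent, which suits sparse data.
ratings = {
    "alice": {"item_a": 5, "item_b": 3},
    "bob": {"item_a": 4, "item_c": 2},
    "carol": {"item_b": 1},
}


def sparsity(ratings):
    """Return the fraction of user-item pairs that have no rating."""
    items = {item for user_ratings in ratings.values() for item in user_ratings}
    total = len(ratings) * len(items)
    if total == 0:
        return 0.0
    known = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1 - known / total


if __name__ == "__main__":
    # 5 known ratings out of 3 users x 3 items.
    print(f"sparsity: {sparsity(ratings):.2f}")
```

Real rating matrices tend to be far emptier than this toy one, which is why a layout that only stores the known ratings is a sensible default.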

Can use hybrid approaches.

Collaborative RS:

  • Like Amazon.
  • Based on other users with similar profiles.
  • Experimentally better than content-based, but you don't always have many users.

Knowledge-based RS:

  • No/little user history.
  • Based on domain knowledge.

User-based collaborative recommendation:

  • Pearson's correlation coefficient - baseline.
  • Imagine millions of users - computing similarities takes a lot of time.
  • So ..
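A minimal sketch of the Pearson baseline mentioned above, assuming each user's ratings are a dict of item to rating and that the correlation is computed only over items both users have rated (one common variant; the talk did not specify which). The `pearson` and `all_user_similarities` names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(ratings_u, ratings_v):
    """Pearson correlation between two users over their co-rated items."""
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0  # not enough overlap to say anything
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_v[i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings_v[i] - mean_v) ** 2 for i in common))
    if den_u == 0 or den_v == 0:
        return 0.0  # a user who rates everything the same has no variance
    return num / (den_u * den_v)


def all_user_similarities(ratings):
    """Compare every pair of users: n * (n - 1) / 2 calls to pearson."""
    return {
        (u, v): pearson(ratings[u], ratings[v])
        for u, v in combinations(sorted(ratings), 2)
    }
```

The pairwise loop in `all_user_similarities` is where the cost shows up: the number of pairs grows quadratically with the number of users.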

Item-based collaborative recommendation:

  • Focus on items not users.
  • Compute similarity between each pair of items.
  • Don't have to compute similarity between items that don't have overlapping ratings.
  • Cosine similarity / adjusted cosine similarity (subtracts each user's average rating to eliminate some bias).
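A sketch of adjusted cosine similarity between two items, assuming the same user to item to rating dictionary layout (an assumption, not from the talk). Each rating is offset by that user's mean rating, and only users who rated both items contribute. The `adjusted_cosine` name is hypothetical.

```python
from math import sqrt


def adjusted_cosine(ratings, item_i, item_j):
    """Adjusted cosine similarity between two items.

    ratings: user -> {item: rating}.
    """
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if item_i in user_ratings and item_j in user_ratings:
            # Offset by this user's average over everything they rated.
            mean = sum(user_ratings.values()) / len(user_ratings)
            dev_i = user_ratings[item_i] - mean
            dev_j = user_ratings[item_j] - mean
            num += dev_i * dev_j
            den_i += dev_i ** 2
            den_j += dev_j ** 2
    if den_i == 0 or den_j == 0:
        return 0.0  # no overlapping ratings, or no variation to compare
    return num / (sqrt(den_i) * sqrt(den_j))
```

Pairs of items with no user in common fall straight through to 0.0, which matches the point above that there is nothing to compute for them.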

Content-based RS:

  • Based on description of item and profile of user interests.


  • Items are described in terms of attributes/features.
  • Finite set of values associated with features.
  • Item representation is a vector.
  • Item descriptions aren't necessarily complete - a missing feature is just a 0 in the vector.


  • Similarity between items: 
    • Jaccard similarity.
    • Cosine similarity and TF-IDF (term frequency - inverse document frequency).
    • Batch compute similarities offline, then use similarities to compute ratings on the fly based on user profile.
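The two similarity measures above can be sketched in plain Python. This assumes items are described either by a set of features (for Jaccard) or by a list of terms (for TF-IDF plus cosine); the function names and the particular TF-IDF weighting (relative term frequency times log of inverse document frequency) are illustrative choices, and other weightings exist.

```python
from collections import Counter
from math import log, sqrt


def jaccard(features_a, features_b):
    """Size of the intersection over size of the union of two feature sets."""
    a, b = set(features_a), set(features_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tf_idf_vectors(docs):
    """docs: item -> list of terms. Returns item -> {term: weight}."""
    n_docs = len(docs)
    # Document frequency: in how many item descriptions each term appears.
    df = Counter(term for terms in docs.values() for term in set(terms))
    vectors = {}
    for item, terms in docs.items():
        tf = Counter(terms)
        vectors[item] = {
            term: (count / len(terms)) * log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = sqrt(sum(w * w for w in vec_a.values()))
    norm_b = sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In the offline batch step, one would run `cosine` (or `jaccard`) over the item pairs once and store the results, so that only lookups are needed at recommendation time.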


  • Predict ratings only for items that are among the N nearest neighbours of items in the user profile, and that are not already in the user profile.
  • An item is worth rating if more than x of its N nearest neighbours are in the user profile.
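One possible reading of the neighbour rule above, as a sketch: for each item outside the user profile, look up its N most similar items in the precomputed similarity table, and keep it as a candidate if more than x of those are in the profile. The `similarity` layout (item to other item to score), the function name and the default values of `n` and `x` are all assumptions for illustration.

```python
def items_worth_rating(similarity, profile, n=3, x=1):
    """Pick the items for which a rating should be predicted.

    similarity: item -> {other_item: score}, computed offline.
    profile: set of items the user has already rated.
    """
    worth_rating = []
    for item, scores in similarity.items():
        if item in profile:
            continue  # already rated, nothing to predict
        # The n most similar items to this candidate.
        neighbours = sorted(scores, key=scores.get, reverse=True)[:n]
        overlap = sum(1 for neighbour in neighbours if neighbour in profile)
        if overlap > x:
            worth_rating.append(item)
    return worth_rating
```

With a profile of {"a", "b", "c"}, n=2 and x=1, an item whose two nearest neighbours are "a" and "b" is kept, while an item with only one neighbour in the profile is skipped.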


Using LOD

To mitigate lack of information/descriptions about concepts/entities.

Recommender systems are usually vertical (single-domain), but Linked Data lets you easily build a multi-domain recommender system.

To avoid noisy data, you have to filter it before feeding it to your RS.

Freebase.


Tìpalo

  • Automating typing of DBpedia entities.


Vector space model for LOD

  • MATHS.