Lately I’m reading through AI algorithms and I came across a goody which a lot of developers want to have because everybody uses it all the time and it’s really useful and that is a recommendation system.
Best known through sites like amazon or youtube: You look at A and the website tells you: oh by the way people who looked at A – also looked at B.
So the system relies on data based on the actual behavior of its users. The data comes from the behavior of the users and needs to be aggregated and then evaluated within an endless process.
The algorithms used in this area come from linear algebra and a bunch of stochastics. Typically a set of n-features is represented within an n-dimensional space and to find related entities you compute a clustering of the features or their shortest distance in n-dimensional space.
But let’s don’t go that road yet, because everybody can build a recommendation engine with some very simple ideas.
Let’s don’t care about scalability while doing that because that is a another story because the amount of data and processing time can get really big.
We don’t care about that here. Let’s only focus on some very simple ideas.
We want to build a recommendation system for a shop. We have Users and we have Items.
Each time a user X looks at an item A we want to recommend n items that other users also looked at when they looked at item A because we – of course – want that the users buy more.
1.Step: Track what all users view.
2.Step: Track for each item which users viewed them.
3.Step: Aggregate the users common items with occurrence count.
- Look up all n users which looked at the item x.
- Look up all items which the users of item x viewed and merge the items into a new data structure while count each items occurrence.
- Order the occurrences by descending order.
4.Step: Take the top n results as recommendation.
The good thing about this algorithm is that you can pre-compute the recommendations for item x and you can continuously keep updating them. They don’t have to be perfectly up to date, they only get better if you keep updating them.
But there are a lot of down sides.
- Most annoying thing is that you need to remove deleted items from all users x. You can lazily remove them by adding a hash table with delete items and while making a union looking up each item if that item has been marked as deleted and then remove it from the user’s x list.
- The computation is simple but needs space and time.
Next time we talk about the space and time complexity and improvements. It depends on its implementation.