Matching markets 2.0
In a previous post I talked about one reason why a social networking site like Facebook is valuable: the information that it collects about people, their preferences, and who their friends are in principle allows more efficient targeting of ads to viewers. In economics this is known as a ‘matching’ problem. You have two groups (viewers and advertisers) and you can create some value if you bring these two groups together. However, not all matches are created equal. A match will be more valuable if the two parties that are matched are better ‘suited’ to each other.
Matching problems also come up for content distribution sites like Youtube and various music sites, as well as online retailers like Amazon. Here you want to match viewers or customers with content or products that are most suited to their tastes. Doing this more effectively will lead to more sales for Amazon, or will make Youtube users happier and make the site more popular so that Google can sell more ads on it. Whatever the business model, the basic point is that better matches mean more profits.
Economics gives us a way of thinking about how to measure the value of matches. We can broadly distinguish people’s preferences and therefore the value of matches in two dimensions. The ‘vertical’ dimension measures quality. Everyone prefers a higher quality match to a lower quality one, everything else equal. The ‘horizontal’ dimension measures idiosyncratic differences in people’s preferences that can’t be distinguished as better or worse in a meaningful sense. For example, everyone probably agrees that a more fuel efficient car is of higher quality, everything else equal. Fuel efficiency is a measure of vertical quality. On the other hand, some people prefer red cars and some prefer blue, while there is no real sense in which one colour of car is better than the other. Preferences for colour lie in the horizontal dimension. To give another example, there are many videos of cats doing stupid things on Youtube. Some people like silly cat videos, and some people hate them. There are also cat videos that are lame, and some that are better quality. Youtube wants to (a) show funny cat videos to people who like watching funny cats, and (b) sort the crappy cat videos from the good quality ones. So within this vertical/horizontal framework, a better match is one that (a) more closely matches a person’s horizontal preferences, and (b) is of higher quality.
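To make the vertical/horizontal framework concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that quality is a single number, that tastes sit on a line between 0 and 1, and that the value of a match is quality minus a penalty for the distance between a video's topic and the viewer's taste. The names `Video`, `match_value`, `best_match` and `taste_weight` are hypothetical and not taken from any real site.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    quality: float  # vertical dimension: everyone prefers more of this
    topic: float    # horizontal dimension: a position on a taste line, 0..1


def match_value(video, viewer_taste, taste_weight=1.0):
    """Assumed value of a match: quality minus a penalty for taste distance."""
    return video.quality - taste_weight * abs(video.topic - viewer_taste)


def best_match(videos, viewer_taste, taste_weight=1.0):
    """Return the video with the highest assumed match value for this viewer."""
    return max(videos, key=lambda v: match_value(v, viewer_taste, taste_weight))


videos = [
    Video("good cat video", quality=0.9, topic=0.1),
    Video("lame cat video", quality=0.3, topic=0.1),
    Video("good dog video", quality=0.9, topic=0.9),
]

cat_lover = 0.1
dog_lover = 0.9
print(best_match(videos, cat_lover).title)
print(best_match(videos, dog_lover).title)
```

The two good videos have the same quality, so it is only the horizontal term that sends each one to a different viewer, while the vertical term is what separates the good cat video from the lame one.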
The crucial question, as I mentioned in my earlier post about Facebook, is exactly how to do this matching in the most efficient way possible. The tried-and-tested approach is to use an editor and categorisation. An editor views all videos, places them in the appropriate category, and ranks videos according to his impression of quality. This centralised approach will probably work fine if the volume of content to be evaluated is not too large. However, it’s infeasible for a site like Youtube with millions of videos. The editor model also doesn’t work so well when it comes to matching ads with viewers, unless the editor knows a lot about the preferences of his viewers (as in the case of a specialised magazine, for example).
The social structure of sites like Facebook and Youtube gives them an opportunity to improve on the editor model. As I see it, the idea is to set up some kind of mechanism to collect information about users, and then use some kind of algorithm to automatically work out the best matches, either between viewers and content, or website users and ads. In Facebook’s case the primary mechanism for collecting information is the ‘social network graph’ that its users create. Sites like Youtube and various music distribution sites are also trying to introduce a social aspect to their service. A ‘graph’ in this sense is a bunch of ‘nodes’ (people) and connections between nodes. If A is friends with B then there’s a link between the A-node and the B-node. Looking at the graph reveals who’s friends with whom, and maybe by using some metrics on this graph it will be possible to feed the resulting information into a matching algorithm.
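As a rough illustration of what a ‘graph’ and a ‘metric on the graph’ could look like in code, here is a small Python sketch. It stores friendships as a dictionary of sets and computes two simple signals: the friends two people have in common, and how many of a person's friends like each item. The function names and the `likes` data are made up for the example, and nothing here is meant to describe how Facebook actually does it.

```python
from collections import defaultdict


def build_graph(friendships):
    """Build an undirected graph: each person maps to the set of their friends."""
    graph = defaultdict(set)
    for a, b in friendships:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def mutual_friends(graph, a, b):
    """People who are friends with both a and b."""
    return graph.get(a, set()) & graph.get(b, set())


def friend_endorsements(graph, likes, person):
    """Count, for each item, how many of this person's friends like it."""
    counts = defaultdict(int)
    for friend in graph.get(person, set()):
        for item in likes.get(friend, ()):
            counts[item] += 1
    return dict(counts)


graph = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
likes = {"B": {"cat videos"}, "C": {"cat videos", "car reviews"}}

print(mutual_friends(graph, "A", "B"))
print(friend_endorsements(graph, likes, "A"))
```

Counts like these are the kind of information that could be fed into a matching algorithm, on the untested assumption that friends have similar tastes.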
At this point, it’s hard to say much more without sitting down and trying to design some matching algorithms that use social network data or other user data to generate efficient matches. I do have a couple of thoughts about things that might affect the algorithm design though. One is that the ‘star rating’ type of feedback system used by Youtube, for example, to rate videos confuses the horizontal and vertical dimensions that I discussed above. A funny cat video might get a high rating because a lot of people like watching cat videos, or because it is high quality. It’s hard to distinguish people’s horizontal and vertical preferences using a single rating scale. On the other hand, people might be confused by a two-dimensional rating system. I’m not sure if this problem can be overcome, but a matching algorithm based on user-generated rating data might have to take this into account.
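A toy example in Python shows how a single star rating can hide the difference. The ratings below are invented, and the sketch assumes the site somehow knows whether each rater is a fan of the genre, which is a strong assumption. One video is loved by fans and disliked by everyone else; the other is rated as mediocre by everybody. Their overall averages are identical.

```python
from statistics import mean


def overall_rating(ratings):
    """Average star rating across all raters, the usual single number."""
    return mean(stars for _, stars in ratings)


def rating_by_group(ratings):
    """Average star rating split by whether the rater is a fan of the genre."""
    groups = {}
    for is_fan, stars in ratings:
        groups.setdefault(is_fan, []).append(stars)
    return {is_fan: mean(stars) for is_fan, stars in groups.items()}


# Each rating is (rater is a fan of the genre, stars given). Invented data.
niche_gem = [(True, 5), (True, 5), (False, 1), (False, 1)]
broad_but_mediocre = [(True, 3), (True, 3), (False, 3), (False, 3)]

print(overall_rating(niche_gem), overall_rating(broad_but_mediocre))
print(rating_by_group(niche_gem))
print(rating_by_group(broad_but_mediocre))
```

Only the split by group reveals that the first video is a great match for fans and a poor match for everyone else, which is exactly what the single average throws away.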
A second issue is that you want to expose people to new content or new ads over time. The algorithm should have some random probability of showing new things to people, even if the new things are not highly rated or do not clearly match a person’s preferences. The result of these ‘experiments’ could be fed back into the algorithm to make dynamic improvements. The setting of this probability itself would be an important parameter in the algorithm. You don’t want to introduce too much noise because it reduces the quality of your matches, but you need a bit of noise at any given time to make improvements over time.
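One standard way to build in this kind of randomness is what the machine learning literature calls an ‘epsilon-greedy’ rule. That is my label for the idea rather than a description of what any of these sites do. The Python sketch below assumes each item has a running estimate of how much people like it; `epsilon` is the probability of showing a random item, and `pick_item` and `update` are hypothetical names.

```python
import random


def pick_item(estimates, epsilon, rng=random):
    """With probability epsilon show a random item, otherwise the best-rated one."""
    if rng.random() < epsilon:
        return rng.choice(list(estimates))
    return max(estimates, key=estimates.get)


def update(estimates, counts, item, reward):
    """Feed the result of showing an item back into its running average."""
    counts[item] = counts.get(item, 0) + 1
    estimates[item] += (reward - estimates[item]) / counts[item]


estimates = {"old favourite": 0.8, "new upload": 0.0}
counts = {}

shown = pick_item(estimates, epsilon=0.1)
# Suppose the viewer liked what they saw (reward 1.0); this is invented feedback.
update(estimates, counts, shown, reward=1.0)
```

Setting `epsilon` to zero never tries anything new, and setting it to one ignores everything that has been learned, so the interesting values are somewhere in between.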
A third issue is that the algorithm needs to be as immune as possible to gaming by its users. Gaming isn’t really a problem for an algorithm that matches ads to users, I think. It’s not as if people are going to structure their Facebook friends network to manipulate the ads that they see on the website. But it is important for content-based sites like Youtube. You don’t want people to be able to manipulate your nicely-designed algorithm to make their content more popular to the detriment of the quality of matches that you’re generating. Social news site Digg has constantly struggled with this problem.
So I think I’ve raised more questions than answers, but it seems to me that there’s a lot of promise in using social network graphs and other data to improve matching algorithms. It will be interesting to see the actual algorithms that evolve.
2 Comments
Your post got me thinking about blogging. Unless a blog author doesn’t give a hoot about her audience, she will try to make the content of her blog match the preferences of her readers.
As you say, there are two dimensions of the quality of a blog post. You can measure the popularity of a blog or even a particular blog post by looking at its number of visitors or its Technorati ranking, for example. But how do you know whether an increase in viewers is the result of better quality overall (vertical dimension) or a better match to the audience’s interests (horizontal dimension)?
Should a blogger write “experimental” posts, exploring the preferences of the established audience, or trying to broaden it? Is that why sometimes you’ll find blog posts that don’t match the general content of a blog? Or is it just that the author doesn’t care?
Francisco: Good points. As you said, it’s hard to know whether a blog becomes more popular because its quality improves or it becomes more closely matched to people’s tastes. Also, a blog could be popular because it covers quite general topics (so many people are interested in it to some extent), or because it matches the specific tastes of a particular group. Again it would be hard to know which is which.
I do think experimental posts are a good idea. I certainly do keep track of which posts on my blog attract more views, so that I can get some idea of what the tastes of my readers are.