Online economics
Category Archives: Web 2.0

One good thing about Microsoft

Check out this Google horror story, about a guy who was using Google services for his email, blog, calendar, etc. He fell victim to a phishing attack, and Google deleted his account — all his data gone. At first Google said they weren’t willing to restore it; eventually they did thanks to publicity and connections.

This is exactly the kind of thing that makes me cautious about using online services for anything important. There are currently fourteen web-based spreadsheet services available, and I’m not using any of them. It’s not because they’re no good, but because I worry about whether I’ll be able to access my data a few years down the track. One advantage of having a long-lived de facto monopoly like Microsoft is that, 13 years later, I can still easily open a file I created in Excel 95.

I think I’m not the only person who cares about being able to access his data in 10 or 20 years’ time. Hopefully the online applications providers will realise this and support open document standards, as well as letting me download my data and keep my own copy.

by aaron.

Social networking traffic growth model

Examining the empirical evidence on social network traffic, I have formulated a new theory: Social network traffic grows exponentially for about two years, and then follows a random walk.

Myspace:

myspace.png

Facebook:

facebook.png

by aaron.

The value of content filters

My main frustration with content-based sites like YouTube is that it’s hard to filter the content that interests me from all the rest. This led me to start thinking about two things: How valuable filtering is to users, and how to make better filters. I’ll try to explain some ideas I had about the first question.

I’ve been playing with the following simple model of preferences and filtering. Suppose that a user’s preferences for content on a site are represented by a number between 0 and 1. The number, call it x, represents the user’s ideal content, if only she could locate it on the site. However, the site has a range of different content. Without any filter, a user watches or listens to a random piece of content represented also by a number between 0 and 1, call it y. Receiving content that doesn’t exactly match the user’s preference makes her unhappy. She receives disutility (unhappiness) measured by the absolute distance between x and y.

Now we can calculate, for a user with a given x, their expected disutility if they receive a random piece of content, in the absence of any filter. For example, consider the user with preference x = 0.5. She has a 50% chance of receiving y < 0.5 and getting disutility 0.5 - y, and a 50% chance of receiving y > 0.5 and getting disutility y - 0.5. In either half, y is on average 0.25 away from 0.5, so averaging over all possible values of y between 0 and 1, her expected disutility is 1/4.

For a user with an arbitrary x between 0 and 1, the expected disutility from a random piece of content is the integral of |x - y| over y from 0 to 1, which works out to 0.5(x^2 + (1 - x)^2). Here’s a graph of this for users with different preferences:

filtervalue1.png
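As a quick check on this calculation, here is a minimal Python sketch that computes the expected disutility two ways: from the closed form you get by integrating |x - y| over y, which is (x^2 + (1 - x)^2)/2, and from a brute-force average over a fine grid of y values. The function names and the grid size are just illustrative choices.

```python
def expected_disutility(x):
    """Closed form of E|x - y| when y is uniform on [0, 1]."""
    return 0.5 * (x ** 2 + (1 - x) ** 2)


def expected_disutility_numeric(x, n=100_000):
    """Midpoint-rule average of |x - y| over n evenly spaced values of y."""
    total = 0.0
    for i in range(n):
        y = (i + 0.5) / n
        total += abs(x - y)
    return total / n


if __name__ == "__main__":
    # Compare the closed form with the numerical average for a few users.
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(x, expected_disutility(x), round(expected_disutility_numeric(x), 6))
```

The two columns should agree closely, and the values are largest at x = 0 and x = 1 and smallest at x = 0.5.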

From this we can observe one thing about the value of filters:

Those with the most to gain from filtering content are those with the most extreme preferences.

In other words, users with x close to 0 or 1 suffer a lot from receiving a randomly chosen piece of content. This is because the content they receive can be quite far from their actual preference. Those in the middle, on the other hand, are not likely to suffer as much. Because their preferences are “middle of the road”, a random piece of content is less likely to be very far from their ideal.

Now suppose that we use a filter to split the content up into equal-sized chunks. Then suppose the users can choose the chunk that most closely matches their preference, but still receive a random piece of content within that chunk. Obviously if we could split the content into infinitely many chunks, each user would be able to receive her ideal content, and the gross value of this filtering to a user would equal the red line in the graph above.

However, it is likely to be costly (in terms of time and effort) for users to compare segments that are separated out by the filter. For example if a music site separated its content into 1,000 different genres, the burden on users to compare these choices would be high. Therefore:

Too fine filtering can also be bad, if it imposes too much evaluation cost on users.

We can combine the previous two conclusions to reach another one. The value of filtering differs across users and is highest for those with more extreme preferences. To the extent that it’s possible, we should customise our filtering to suit these preferences:

If possible, present finer segmentation to users with extreme preferences, and coarser segmentation to others.

Users with extreme preferences have more to gain from better segmentation, and so are willing to spend more effort evaluating a finer segmentation to find something that’s closer to their preferences.

Now let’s look at exactly what happens in the model I described above when we segment the content. Suppose we split the content into two groups, those located between 0 and 0.5, and those between 0.5 and 1. Then a user with x between 0 and 0.5 will choose content from the first group, and a user with x between 0.5 and 1 will choose content from the second group. Considering all random pieces of content from within each group, we can calculate the expected disutility of users with this filter applied.
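The calculation just described can be sketched in a few lines of Python. This is a rough sketch under the model's assumptions: content is uniform on [0, 1], the user picks the group that contains her x, and she then receives a uniformly random item from that group. The unfiltered benchmark is the integral of |x - y| over [0, 1]. All function names are made up for illustration.

```python
def unfiltered_disutility(x):
    """E|x - y| with y uniform on [0, 1]."""
    return 0.5 * (x ** 2 + (1 - x) ** 2)


def filtered_disutility(x, k):
    """E|x - y| when y is uniform on the one of k equal groups that contains x."""
    width = 1.0 / k
    index = min(int(x * k), k - 1)  # x = 1.0 belongs to the last group
    lo = index * width
    hi = lo + width
    return ((x - lo) ** 2 + (hi - x) ** 2) / (2 * width)


def filter_gain(x, k):
    """Reduction in expected disutility from filtering into k groups."""
    return unfiltered_disutility(x) - filtered_disutility(x, k)


if __name__ == "__main__":
    # Two groups, for users at a few different positions.
    for x in (0.0, 0.1, 0.25, 0.4, 0.5):
        print(x, round(filtered_disutility(x, 2), 4), round(filter_gain(x, 2), 4))
```

Calling filter_gain with k = 3 or k = 10 instead of 2 gives the same comparison for finer segmentations.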

Here’s a graph showing the expected disutility with the content divided into two groups (blue line) versus the disutility with no filtering (red line). The green line shows the difference between the two, i.e. the value to users of the filter:

filtervalue2.png

From this we see a slightly surprising thing: Those with “middle of the road” preferences gain little or nothing from this filter. The reason is that those who were in the middle with no filter become the extremes once the content is filtered into two groups. A user at exactly x = 0.5 sits at the edge of her chosen group, so a random piece of content from that group is on average just as far from her preference as a random piece of content with no filter applied.

However, this effect goes away if we apply finer segmentation. Here are the results with the content filtered into three groups:

filtervalue3.png

And here’s 10 groups:

filtervalue4.png

So we can say:

Coarse filtering can leave users with non-extreme preferences little or no better off.

Thus, taking all the analysis together, it’s probably better to lean towards finer filtering, but not so fine that users’ costs of comparing the segments are too high. And customise the segmentation to people’s preferences, if that’s possible.
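To make the trade-off a bit more concrete, here is a small Python sketch that picks the number of groups minimising a user's expected disutility plus an evaluation cost. The linear cost (a fixed amount per group the user has to compare) and the parameter cost_per_group are assumptions of mine for illustration; the model above doesn't pin down how evaluation costs behave.

```python
def filtered_disutility(x, k):
    """E|x - y| when y is uniform on the one of k equal groups that contains x."""
    width = 1.0 / k
    index = min(int(x * k), k - 1)  # x = 1.0 belongs to the last group
    lo = index * width
    hi = lo + width
    return ((x - lo) ** 2 + (hi - x) ** 2) / (2 * width)


def total_cost(x, k, cost_per_group):
    """Expected disutility plus an assumed linear cost of comparing k groups."""
    return filtered_disutility(x, k) + cost_per_group * k


def best_group_count(x, cost_per_group, max_groups=50):
    """Number of groups (1..max_groups) with the lowest total cost for user x."""
    return min(range(1, max_groups + 1),
               key=lambda k: total_cost(x, k, cost_per_group))


if __name__ == "__main__":
    for x in (0.0, 0.5):
        print(x, best_group_count(x, cost_per_group=0.005))
```

With a hypothetical cost of 0.005 per group, the user at x = 0 ends up wanting a finer split than the user at x = 0.5, which is in line with the idea of giving finer segmentation to users with extreme preferences.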

by aaron.

Burst my bubble

Check out this music video, it’s kinda funny:

by aaron.

Matching markets 2.0

In a previous post I talked about one reason why a social networking site like Facebook is valuable: the information that it collects about people, their preferences, and who their friends are in principle allows more efficient targeting of ads to viewers. In economics this is known as a ‘matching’ problem. You have two groups (viewers and advertisers) and you can create some value if you bring these two groups together. However, not all matches are created equal. A match will be more valuable if the two parties that are matched are better ‘suited’ to each other, somehow.

Matching problems also come up for content distribution sites like YouTube and various music sites, as well as online retailers like Amazon. Here you want to match viewers or customers with content or products that are most suited to their tastes. Doing this more effectively will lead to more sales for Amazon, or will make YouTube users happier and make the site more popular so that Google can sell more ads on it. Whatever the business model, the basic point is that better matches mean more profits.

Economics gives us a way of thinking about how to measure the value of matches. We can broadly distinguish people’s preferences and therefore the value of matches in two dimensions. The ‘vertical’ dimension measures quality. Everyone prefers a higher quality match to a lower quality one, everything else equal. The ‘horizontal’ dimension measures idiosyncratic differences in people’s preferences that can’t be distinguished as better or worse in a meaningful sense.

For example, everyone probably agrees that a more fuel efficient car is of higher quality, everything else equal. Fuel efficiency is a measure of vertical quality. On the other hand, some people prefer red cars and some prefer blue, while there is no real sense in which one colour of car is better than the other. Preferences for colour lie in the horizontal dimension.

To give another example, there are many videos of cats doing stupid things on YouTube. Some people like silly cat videos, and some people hate them. There are also cat videos that are lame, and some that are better quality. YouTube wants to (a) show funny cat videos to people who like watching funny cats, and (b) sort the crappy cat videos from the good quality ones. So within this vertical/horizontal framework, a better match is one that (a) more closely matches a person’s horizontal preferences, and (b) is of higher quality.
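One simple way to put the two dimensions into a single number is to take quality and subtract a penalty for the distance between the user's taste and the item's position. The Python sketch below does that. The additive form, the taste_weight parameter, and the example items and numbers are all assumptions for illustration, not something the framework itself specifies.

```python
def match_value(quality, user_taste, item_taste, taste_weight=1.0):
    """Vertical quality minus a penalty for horizontal taste distance."""
    return quality - taste_weight * abs(user_taste - item_taste)


def best_match(user_taste, items, taste_weight=1.0):
    """Pick the (name, quality, taste) tuple with the highest match value."""
    return max(
        items,
        key=lambda item: match_value(item[1], user_taste, item[2], taste_weight),
    )


if __name__ == "__main__":
    # Hypothetical catalogue: (name, quality, position on a 0-1 taste line).
    items = [
        ("funny cat", 0.9, 0.1),
        ("lame cat", 0.3, 0.1),
        ("car review", 0.8, 0.9),
    ]
    print(best_match(0.2, items)[0])   # a cat-video fan
    print(best_match(0.95, items)[0])  # someone at the other end of the taste line
```

Raising taste_weight makes horizontal fit matter more relative to quality, and lowering it does the opposite.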

The crucial question, as I referred to in my earlier post about Facebook, is exactly how to do this matching in the most efficient way possible. The tried-and-tested approach is to use an editor and categorisation. An editor views all videos, places them in the appropriate category, and ranks videos according to his impression of quality. This centralised approach will probably work fine if the volume of content to be evaluated is not too huge. However, it’s infeasible for a site like YouTube with millions of videos. The editor model also doesn’t work so well when it comes to matching ads with viewers, unless the editor knows a lot about the preferences of his viewers (as in the case of a specialised magazine, for example).

The social structure of sites like Facebook and YouTube gives them an opportunity to improve on the editor model. As I see it, the idea is to set up some kind of mechanism to collect information about users, and then use some kind of algorithm to automatically work out the best matches, either between viewers and content, or website users and ads. In Facebook’s case the primary mechanism for collecting information is the ‘social network graph’ that its users create. Sites like YouTube and various music distribution sites are also trying to introduce a social aspect to their service. A ‘graph’ in this sense is a bunch of ‘nodes’ (people) and connections between nodes. If A is friends with B then there’s a link between the A-node and the B-node. Looking at the graph reveals who’s friends with who, and maybe by using some metrics on this graph it will be possible to feed the resulting information into a matching algorithm.
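To make the graph idea concrete, here is a minimal Python sketch of a friendship graph stored as a dictionary of sets, with two very simple metrics: how many friends someone has, and which friends two people share. The names and the choice of metrics are purely illustrative. I'm not suggesting this is how Facebook or anyone else actually stores or uses its graph.

```python
from collections import defaultdict


def build_graph(friendships):
    """Build an undirected graph from (person, person) friendship pairs."""
    graph = defaultdict(set)
    for a, b in friendships:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def friend_count(graph, person):
    """Number of links attached to this person's node."""
    return len(graph.get(person, set()))


def mutual_friends(graph, a, b):
    """People who are linked to both a and b."""
    return graph.get(a, set()) & graph.get(b, set())


if __name__ == "__main__":
    graph = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
    print(friend_count(graph, "C"))
    print(mutual_friends(graph, "A", "B"))
```

A matching algorithm could take numbers like these as inputs, for example by guessing that people with many mutual friends have similar tastes, though whether that guess holds up is an empirical question.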

At this point, it’s hard to say much further without sitting down and trying to design some matching algorithms that use social network data or other user data to generate efficient matches. I do have a couple of thoughts about things that might affect the algorithm design though. One is that the ‘star rating’ type of feedback system used by YouTube, for example, to rate videos confuses the horizontal and vertical dimensions that I discussed above. A funny cat video might get a high rating because a lot of people like watching cat videos, or because it is high quality. It’s hard to distinguish people’s horizontal and vertical preferences using a single rating scale. On the other hand, people might be confused by a two-dimensional rating system. I’m not sure if this problem can be overcome, but a matching algorithm based on user-generated rating data might have to take this into account.
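A toy calculation shows how a single star rating can blur the two dimensions. In the Python sketch below I assume, purely for illustration, that viewers who like a genre rate a video according to its quality, and everyone else gives it one star. The numbers are made up.

```python
def mean_star_rating(quality_stars, share_who_like_genre, dislike_stars=1):
    """Average rating if fans rate by quality and everyone else gives dislike_stars."""
    fans = share_who_like_genre * quality_stars
    others = (1 - share_who_like_genre) * dislike_stars
    return fans + others


if __name__ == "__main__":
    # A top-quality video in a genre only half the audience likes...
    niche_gem = mean_star_rating(quality_stars=5, share_who_like_genre=0.5)
    # ...and a middling video in a genre everybody likes.
    broad_average = mean_star_rating(quality_stars=3, share_who_like_genre=1.0)
    print(niche_gem, broad_average)
```

Under these assumptions the two videos end up with the same average rating, so the rating alone can't tell you whether a video is high quality or just in a popular genre.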

A second issue is that you want to expose people to new content or new ads over time. The algorithm should have some random probability of showing new things to people, even if the new things are not highly rated or do not clearly match a person’s preferences. The result of these ‘experiments’ could be fed back into the algorithm to make dynamic improvements. The setting of this probability itself would be an important parameter in the algorithm. You don’t want to introduce too much noise because it reduces the quality of your matches, but you need a bit of noise at any given time to make improvements over time.
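One standard way to build in this kind of experimentation is the "epsilon-greedy" rule: with a small probability epsilon show a random item, otherwise show the item with the best current score. The Python sketch below is a minimal version of that rule, not a proposal for what any real site does. The function names, the score dictionary and the reward values are assumptions for illustration.

```python
import random


def choose_item(scores, epsilon, rng=random):
    """With probability epsilon pick a random item, otherwise the top-scored one."""
    if rng.random() < epsilon:
        return rng.choice(list(scores))
    return max(scores, key=scores.get)


def record_feedback(scores, counts, item, reward):
    """Fold an observed reward into the running average score for item."""
    counts[item] = counts.get(item, 0) + 1
    scores[item] += (reward - scores[item]) / counts[item]


if __name__ == "__main__":
    rng = random.Random(0)
    scores = {"video_a": 0.0, "video_b": 0.0, "video_c": 0.0}
    counts = {}
    for _ in range(1000):
        item = choose_item(scores, epsilon=0.1, rng=rng)
        # Hypothetical feedback: video_b is liked 60% of the time, the others 30%.
        liked = rng.random() < (0.6 if item == "video_b" else 0.3)
        record_feedback(scores, counts, item, 1.0 if liked else 0.0)
    print(scores)
```

Here epsilon is exactly the "setting of this probability" parameter: a higher value means more noise in what people see now, but faster learning about new items.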

A third issue is that the algorithm needs to be as immune as possible to gaming by its users. Gaming isn’t really a problem for an algorithm that matches ads to users, I think. It’s not as if people are going to structure their Facebook friends network to manipulate the ads that they see on the website. But it is important for content-based sites like YouTube. You don’t want people to be able to manipulate your nicely-designed algorithm to make their content more popular to the detriment of the quality of matches that you’re generating. Social news site Digg has constantly struggled with this problem.

So I think I’ve raised more questions than answers, but it seems to me that there’s a lot of promise in using social network graphs and other data to improve matching algorithms. It will be interesting to see the actual algorithms that evolve.

by aaron.
© Copyright 26econ.com 2008