:

Why do my friends have more friends than I do?

Maple

My daughter the psychiatrist recently shared a link with me that mentioned a factoid about Facebook: "84 per cent of people think their friends have more friends than they do".  Actually they don't just think this: for 84 percent of Facebook users, the median friend count of their friends is higher than their own friend count, according to a recent study

The basic reason for this is quite easy to understand: if you pick a random person A, people who have a large number of friends are more likely to be A's friend than are people who have few friends.  In fact, if friending occurred at random, their probability of being A's friend would be essentially proportional to the number of friends they have.  So the friend counts of A's friends will tend to be higher than A's own friend count.

Will it really be as high as 84%?  Time to get out Maple and try some simulations, I guess.  I won't be able to do anything on the scale of the actual Facebook graph (721 million active users), but let's try some smaller random graphs.  The RandomGraph command in the GraphTheory[RandomGraphs] package makes it easy to generate them.  I'll also want the Statistics package to do some analysis of the data.

`> with(GraphTheory): with(RandomGraphs): with(Statistics):`

The vertices of my graph will correspond to the people in a social network; an edge joins two people if they are friends.

I'll write a little function that will check whether the median degree of the neighbours of a vertex is greater than the degree of the vertex itself.

`> fewfriends:= v -> evalb(Median([seq(Degree(G,w),w=Neighbors(G,v))]) > Degree(G,v));`

My random graph might have some vertices with no neighbours: I'd better leave those out, as the median of an empty set would be undefined.  Presumably the Facebook study also left out anybody who had no Facebook friends.

`> hasfriends:= v -> evalb(Degree(G,v) > 0);`

I arbitrarily decided to look at random graphs with 200 vertices, where each pair of vertices corresponds to an edge with probability 0.05.  For each graph I generate, I'll select those vertices that have neighbours according to hasfriends, and take as my data point the fraction of those for which fewfriends returns true.  I'll do this 200 times, to generate 200 data points.

`> data:= Array(1..200):for i from 1 to 200 do  G:= RandomGraph(200, 0.05);  V:= select(hasfriends,Vertices(G));  data[i]:= nops(select(fewfriends,V))/nops(V);end do:`

Here are the minimum, mean and maximum values I found:

`> evalf([min(data), Mean(data), max(data)]);`

And a histogram:

Clearly we have a real effect: more than half of the people in our "social network" have lower friend count than the median of their friends.  But it's nowhere near 84%.  It seems Facebook is structured differently than the kind of random graph I was looking at: in the latter, most vertices will have degrees close to the average, with a binomial (and thus close-to-normal) distribution.  Facebook has a more interesting distribution of friend counts, with significant populations having very high or low friend counts.  To model that will require a more sophisticated "random graph" construction.  Stay tuned for further developments...

﻿