Social Network Analysis 101
Michael Wu, Ph.D. is Lithium's Principal Scientist of Analytics, digging into the complex dynamics of social interaction and online communities.
He's a regular blogger on the Lithosphere and previously wrote in the Analytic Science blog.
You can follow him on Twitter at mich8elwu.
To understand social network analysis (SNA), you must understand what a social network is and what a social graph is. Simply put, SNA is the analysis of social networks, and a social network is just a network of entities connected by the relationships among them. Social networks have existed since humans began walking the earth. In fact, they exist in many social animals besides humans (e.g. wolves, lions, dolphins, bats, and even ants).
Of course, the entities that interest us are people, and the relationships of particular interest include friendship (as in Facebook), working relationships between colleagues (as in LinkedIn), kinship, communication, and several other social interactions. In the context of SNA, you can think of a social graph as simply a diagram that represents the social network (I am not going to bore you with the formal definition of a graph). In a social graph, each dot (a.k.a. node or vertex) represents a person, and an edge between two dots (persons) represents a relationship between them. Because there are many complex relationships among people, there are equally many different social graphs that represent them. I will illustrate this with an example.
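If you want to play with social graphs in code, one common way to store a graph is an adjacency map: each person points to the set of people they share an edge with. The minimal Python sketch below assumes undirected edges (a relationship like friendship goes both ways); the function name build_graph and the three example people are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected graph: each person maps to the set of people sharing an edge with them."""
    graph = defaultdict(set)
    for a, b in edges:
        # An undirected edge is recorded in both directions.
        graph[a].add(b)
        graph[b].add(a)
    return graph


# Hypothetical three-person example: each tuple is one relationship (edge).
g = build_graph([("Ann", "Bob"), ("Bob", "Cy")])
print(sorted(g["Bob"]))  # ['Ann', 'Cy']
```

Using sets means adding the same edge twice does not create a duplicate connection.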
A Representative Social Network and Its Social Graphs
Let's suppose that I, Michael, have a very small social network consisting of only seven friends (see the names in figure 1). Suppose I have a very simple life, and I only have three types of social relationships in my life: colleagues at work (denoted by the red edges), beer buddies (blue edges), and badminton pals (green edges).
So what is my social life like? Part of my social network consists of my colleagues at Lithium (Phil and Joe, who are obviously also colleagues of each other). Before I joined Lithium, I worked with Jack and Ryan at UC Berkeley, and before that, I worked with Ryan and Don at the Los Alamos National Lab. Ryan came to Berkeley for his PhD with me, so we overlapped in two jobs. That is why Ryan also worked with both Jack and Don, but Jack and Don are not colleagues.
The other part of my social life consists of my beer buddies. I often went out for drinks with Doug, Adam, and Ryan during grad school. But Ryan and Doug don't get along and never go out together. After I joined Lithium, I found out that Phil and Jack often go drinking together too, but I've never gone drinking with either of them.
Finally, I love badminton. Everywhere I've worked, I've found a badminton pal. I have played with Joe at Lithium, with Jack at Berkeley, and with Don at Los Alamos. Ryan also plays, and has played with Phil and Doug. But they are much better than I am, so I never actually play with Ryan, Phil, or Doug.
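To make the example concrete, here is a minimal Python sketch that writes the network described above down as a single list of labeled edges, one label per relationship type. The tuple layout and the name EDGES are hypothetical. Only edges the text states outright are included: whether Adam also drank with Doug or Ryan is not spelled out, so those edges are left out, and the Facebook friendship edges are not encoded here.

```python
from collections import Counter

# Hypothetical encoding of the network: (person, person, relationship label).
EDGES = [
    # Colleagues (red): Lithium, UC Berkeley, Los Alamos
    ("Michael", "Phil", "colleague"), ("Michael", "Joe", "colleague"),
    ("Phil", "Joe", "colleague"), ("Michael", "Jack", "colleague"),
    ("Michael", "Ryan", "colleague"), ("Jack", "Ryan", "colleague"),
    ("Michael", "Don", "colleague"), ("Ryan", "Don", "colleague"),
    # Beer buddies (blue)
    ("Michael", "Doug", "beer"), ("Michael", "Adam", "beer"),
    ("Michael", "Ryan", "beer"), ("Phil", "Jack", "beer"),
    # Badminton pals (green)
    ("Michael", "Joe", "badminton"), ("Michael", "Jack", "badminton"),
    ("Michael", "Don", "badminton"), ("Ryan", "Phil", "badminton"),
    ("Ryan", "Doug", "badminton"),
]

people = {name for a, b, _ in EDGES for name in (a, b)}
edge_counts = Counter(label for _, _, label in EDGES)
print(len(people))        # 8
print(dict(edge_counts))  # {'colleague': 8, 'beer': 4, 'badminton': 5}
```

Note that the Michael-Ryan colleague edge appears only once even though we overlapped in two jobs: an edge records that the relationship exists, not how many times it arose.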
If my seven hypothetical friends are all on Facebook, then the friendship graph would look like figure 2a. In this case, the black edges represent friendship, or just people who know each other. My professional network, on the other hand, looks like figure 2b, where the red edges represent the colleague relationship. Note that Adam and Doug are not in my professional network (notice the absence of red edges between us) because we have never worked together.
In my beer buddy graph (figure 2c), the blue edges represent the relationship of drinking together, and it really includes only Doug, Adam, and Ryan, since I have never been out drinking with my other friends. Even though Jack and Phil have been out drinking together, I've never been out with them, so there are no blue edges between us. Jack and Phil are really on an entirely separate beer buddy network.
Finally, my badminton pal graph looks like figure 2d, where the edges represent the relationship formed by playing badminton together. Only Jack, Joe, and Don are in my badminton pal network. Ryan has his own badminton pal network, consisting of Phil and Doug, but none of them are in my network.
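The "separate network" observations can be checked mechanically: keep only the edges with one relationship label, then split the resulting graph into connected components. The Python sketch below assumes the beer and badminton edges stated in the text (any Adam-Doug or Adam-Ryan drinking edges are omitted because the text doesn't pin them down, and they would not change the components anyway). The names EDGES, graph_for, and components are hypothetical; the components are found with a standard depth-first search.

```python
from collections import defaultdict

# Hypothetical encoding of the beer (blue) and badminton (green) edges stated in the text.
EDGES = [
    ("Michael", "Doug", "beer"), ("Michael", "Adam", "beer"),
    ("Michael", "Ryan", "beer"), ("Phil", "Jack", "beer"),
    ("Michael", "Joe", "badminton"), ("Michael", "Jack", "badminton"),
    ("Michael", "Don", "badminton"), ("Ryan", "Phil", "badminton"),
    ("Ryan", "Doug", "badminton"),
]


def graph_for(relation, edges):
    """Keep only edges carrying the given relationship label; return adjacency sets."""
    graph = defaultdict(set)
    for a, b, label in edges:
        if label == relation:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def components(graph):
    """Split a graph into connected components using iterative depth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(graph[node] - comp)
        seen |= comp
        result.append(comp)
    return result


beer = graph_for("beer", EDGES)
print(sorted(beer["Michael"]))  # ['Adam', 'Doug', 'Ryan']
for comp in components(beer):
    print(sorted(comp))  # ['Adam', 'Doug', 'Michael', 'Ryan'], then ['Jack', 'Phil']
```

Running components on graph_for("badminton", EDGES) likewise separates my group (Don, Jack, Joe, and me) from Ryan's (Doug, Phil, Ryan).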
Reading and Interpreting the Social Graph
Notice that we have constructed four different social graphs from a single social network of the same eight people. Each choice of what relationship the edges represent yields a very different graph, with completely different graph metrics. For example, if the edges represent having fun together, then we can construct yet another social graph, and that graph will look like an overlay of my beer buddy graph and badminton pal graph (of course, working at Lithium is fun too, but I'm simplifying here). Since there are many complex relationships among people, many different social graphs can be constructed.
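An overlay of two graphs on the same people amounts to taking the union of their edge sets. A minimal Python sketch, assuming the beer and badminton edges stated in the text and treating each edge as an unordered pair; the names BEER, BADMINTON, undirected, and neighbors are hypothetical.

```python
# Hypothetical edge sets for the two graphs being overlaid (beer buddies and badminton pals).
BEER = {("Michael", "Doug"), ("Michael", "Adam"), ("Michael", "Ryan"), ("Phil", "Jack")}
BADMINTON = {("Michael", "Joe"), ("Michael", "Jack"), ("Michael", "Don"),
             ("Ryan", "Phil"), ("Ryan", "Doug")}


def undirected(pairs):
    """Represent each edge as a frozenset so (a, b) and (b, a) count as the same edge."""
    return {frozenset(p) for p in pairs}


def neighbors(person, edges):
    """Everyone who shares at least one edge with `person`."""
    return {other for edge in edges if person in edge for other in edge if other != person}


# The "having fun together" graph as the union (overlay) of the two edge sets.
fun = undirected(BEER) | undirected(BADMINTON)
print(len(fun))                           # 9
print(sorted(neighbors("Michael", fun)))  # ['Adam', 'Don', 'Doug', 'Jack', 'Joe', 'Ryan']
```

If a pair of people both drank and played badminton together, the union would keep a single "fun" edge for them; in this example the two edge sets happen not to overlap.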
So the most important thing when reading a social graph is to find out what relationships the edges represent. This is even more important than what the vertices represent, because in SNA the vertices will usually be people. Virtually all graph metrics depend heavily on the edges, so if the edge relationship changes, the metrics change too.
For example, the simplest graph metric is degree centrality, which measures how many connections a vertex has. There are seven black edges connecting to me on the friendship graph (figure 2a), so I have seven friends. But there are only five red edges connecting to me on the professional graph (figure 2b), so I have five colleagues. My degree centrality on the beer buddy graph (figure 2c) is three, so I have only three beer buddies. Degree centrality can be computed for every person in the graph; for instance, Ryan's degree centrality on the badminton pal graph (figure 2d) is two.
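Degree centrality is just a count of the edges touching each vertex, so the numbers above fall out of a few lines of Python. This sketch assumes the edge lists stated in the text; as a labeled simplification, the friendship graph lists only my own seven edges (any friend-to-friend edges in figure 2a are omitted, which doesn't change my degree). The names GRAPHS and degree_centrality are hypothetical.

```python
from collections import Counter

FRIENDS = ["Phil", "Joe", "Jack", "Ryan", "Don", "Doug", "Adam"]

# Hypothetical edge lists for the four graphs in figure 2. Assumption: the friendship
# graph lists only Michael's own edges; friend-to-friend edges in figure 2a are omitted.
GRAPHS = {
    "friendship": [("Michael", f) for f in FRIENDS],
    "colleague": [("Michael", "Phil"), ("Michael", "Joe"), ("Phil", "Joe"),
                  ("Michael", "Jack"), ("Michael", "Ryan"), ("Jack", "Ryan"),
                  ("Michael", "Don"), ("Ryan", "Don")],
    "beer": [("Michael", "Doug"), ("Michael", "Adam"), ("Michael", "Ryan"), ("Phil", "Jack")],
    "badminton": [("Michael", "Joe"), ("Michael", "Jack"), ("Michael", "Don"),
                  ("Ryan", "Phil"), ("Ryan", "Doug")],
}


def degree_centrality(edges):
    """Raw (unnormalized) degree: how many edges touch each vertex. Assumes no duplicate edges."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


degrees = {name: degree_centrality(edges) for name, edges in GRAPHS.items()}
print(degrees["friendship"]["Michael"])  # 7
print(degrees["colleague"]["Michael"])   # 5
print(degrees["beer"]["Michael"])        # 3
print(degrees["badminton"]["Ryan"])      # 2
```

Same people, different edges, different numbers. Libraries such as NetworkX also provide a `degree_centrality` function, but it normalizes each degree by dividing by n - 1 (the maximum possible degree), so it returns fractions rather than raw counts.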
The interpretation of a graph metric also depends on the edge relationship. So you cannot say anything about how many colleagues I have based on the friendship graph (figure 2a), because the colleague relationship is not represented in the friendship graph. Even if you assume that everyone I've worked with is my friend, the friendship graph alone still allows my number of colleagues to be anywhere from zero to seven. Therefore, do not try to draw any inference or conclusion from a graph about a relationship that is not explicitly represented by its edges. If you do, you might as well just flip a coin or make a random guess.
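To see how loose that constraint is, the small Python sketch below enumerates every colleague set the friendship graph permits under the stated assumption that all my colleagues are also my friends. The function name consistent_colleague_sets is hypothetical. With seven friends there are 2^7 = 128 candidate sets, covering every count from zero to seven, and the true answer (five colleagues) is just one of them.

```python
from itertools import combinations

# Michael's seven friends, as read off the friendship graph (figure 2a).
FRIENDS = ["Phil", "Joe", "Jack", "Ryan", "Don", "Doug", "Adam"]


def consistent_colleague_sets(friends):
    """Yield every colleague set compatible with 'all my colleagues are my friends'."""
    for k in range(len(friends) + 1):
        for subset in combinations(friends, k):
            yield set(subset)


candidates = list(consistent_colleague_sets(FRIENDS))
print(len(candidates))                       # 128, i.e. 2**7
print(sorted({len(s) for s in candidates}))  # [0, 1, 2, 3, 4, 5, 6, 7]
```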
Please drop me a note if you have any questions or comments, or give me a kudo if you like this post. Now that you know how to read a social graph, next time I will begin a miniseries on the data analytics of social influence. I will begin with an analysis of the social influence process using a very simple model. Stay tuned!