Social Network Analysis 101

MikeW
Lithium Alumni (Retired)
15 years ago

Michael Wu, Ph.D. is Lithium's Principal Scientist of Analytics, digging into the complex dynamics of social interaction and online communities.

He's a regular blogger on the Lithosphere and previously wrote in the Analytic Science blog.

You can follow him on Twitter at mich8elwu.

To understand social network analysis (SNA), you must understand what a social network is and what a social graph is. Simply put, SNA is the analysis of social networks, and a social network is just a network of entities connected by the relationships among them. This concept has existed since humans began walking the earth. In fact, social networks exist in many social animals besides humans (e.g., wolves, lions, dolphins, bats, and even ants).

Of course, the entities that interest us are people, and the relationships of particular interest include friendship (as on Facebook), professional ties between colleagues (as on LinkedIn), kinship, communication, and several other social interactions. In the context of SNA, you can think of a social graph as simply a diagram that represents the social network (I am not going to bore you with the formal definition of a graph). In a social graph, each dot (a.k.a. node or vertex) represents a person, and an edge between two dots (persons) represents a relationship between them. Because there are many complex relationships among people, there are equally many different social graphs that can represent them. I will illustrate this with an example.

A Representative Social Network and Its Social Graphs

Let's suppose that I, Michael, have a very small social network consisting of only seven friends (see the names in figure 1). Suppose also that I lead a very simple life with only three types of social relationships: colleagues at work (denoted by the red edges), beer buddies (blue edges), and badminton pals (green edges).

So what is my social life like? My social network includes my colleagues at Lithium, Phil and Joe, who are obviously also colleagues of each other. Before I joined Lithium, I worked with Jack and Ryan at UC Berkeley, and before that, I worked with Ryan and Don at the Los Alamos National Lab. Ryan came to Berkeley for his PhD with me, so we overlapped in two jobs. That is why Ryan also worked with both Jack and Don, but Jack and Don are not colleagues.

Another part of my social life consists of my beer buddies. I often went out for drinks with Doug, Adam, and Ryan during grad school, but Ryan and Doug don't get along and never go out together. After I joined Lithium, I found out that Phil and Jack often go drinking too, but I've never gone drinking with either of them.

Finally, I love badminton. Everywhere I've worked, I've found a badminton pal: I have played with Joe at Lithium, with Jack at Berkeley, and with Don at Los Alamos. Ryan also plays, and has played with Phil and Doug. But they are much better than I am, so I never actually play with Ryan, Phil, or Doug.

If my seven hypothetical friends are all on Facebook, then the friendship graph would look like figure 2a. In this case, the black edges represent friendship, or simply people who know each other. If you want to look at my professional network, that social graph looks like figure 2b. Here, the red edges represent the colleague relationship. Note that Adam and Doug are not in my professional network (notice the absence of red edges between us), because we have never worked together.

My beer buddy graph (figure 2c), where the blue edges represent the relationship of drinking together, really only consists of Doug, Adam, and Ryan, since I have never been out drinking with my other friends. Even though Jack and Phil have been out drinking together, I've never been out with them, so there are no blue edges between us. Jack and Phil are really on an entirely separate beer buddy network.

Finally, my badminton pal graph looks like figure 2d, where the edges represent the relationship formed by playing badminton together. Only Jack, Joe, and Don are in my badminton pal network. Ryan has his own badminton pal network, consisting of Phil and Doug, but none of them are in mine.
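
The relationship-specific graphs described above can be sketched by storing one labeled edge list and filtering it by relationship. This is a minimal illustration, not Lithium's tooling: `EDGES`, `relation_graph`, and `components` are hypothetical names, and the list holds only the edges the text states explicitly (figures 2b to 2d may show more, for example among Adam, Doug, and Ryan). Splitting each filtered graph into connected components reproduces the observation that Jack and Phil sit on a separate beer buddy network, and that Ryan, Phil, and Doug form their own badminton network.

```python
from collections import defaultdict

# Hypothetical encoding of the toy network: (person, person, relationship).
# Only edges stated explicitly in the post's text are included.
EDGES = [
    ("Michael", "Phil", "colleague"), ("Michael", "Joe", "colleague"),
    ("Phil", "Joe", "colleague"), ("Michael", "Jack", "colleague"),
    ("Michael", "Ryan", "colleague"), ("Jack", "Ryan", "colleague"),
    ("Michael", "Don", "colleague"), ("Ryan", "Don", "colleague"),
    ("Michael", "Doug", "beer"), ("Michael", "Adam", "beer"),
    ("Michael", "Ryan", "beer"), ("Phil", "Jack", "beer"),
    ("Michael", "Joe", "badminton"), ("Michael", "Jack", "badminton"),
    ("Michael", "Don", "badminton"), ("Ryan", "Phil", "badminton"),
    ("Ryan", "Doug", "badminton"),
]


def relation_graph(edges, relation):
    """Build undirected adjacency sets using only edges of one relationship."""
    adj = defaultdict(set)
    for a, b, rel in edges:
        if rel == relation:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def components(adj):
    """Return the connected components of a graph as a list of frozensets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node] - comp)
        seen |= comp
        result.append(frozenset(comp))
    return result


for rel in ("colleague", "beer", "badminton"):
    groups = components(relation_graph(EDGES, rel))
    print(rel, [sorted(g) for g in groups])
```

People who never appear in a relationship's edges (Adam and Doug in the colleague graph) simply have no entry in that graph, which matches their absence from figure 2b.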

Reading and Interpreting the Social Graph

Notice that we have constructed four different social graphs from a single social network of the same eight people. Changing what relationship the edges represent gives a very different graph, with very different graph metrics. For example, if the edges represent having fun together, we can construct yet another social graph, and that graph will look like an overlay of my beer buddy graph and my badminton pal graph (of course, working at Lithium is fun too, but I'm simplifying here). Since there are many complex relationships among people, many different social graphs can be constructed.
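
One way to build the "having fun" graph is to take the union of the beer buddy and badminton pal edge sets. The sketch below uses only the edges stated in the text, under hypothetical names (`BEER`, `BADMINTON`, `overlay`, `neighbors`). Edges are stored as frozensets so that (a, b) and (b, a) count as the same undirected edge.

```python
# Hypothetical edge sets taken from the relationships described in the post.
BEER = {("Michael", "Doug"), ("Michael", "Adam"), ("Michael", "Ryan"), ("Phil", "Jack")}
BADMINTON = {("Michael", "Joe"), ("Michael", "Jack"), ("Michael", "Don"),
             ("Ryan", "Phil"), ("Ryan", "Doug")}


def overlay(*edge_sets):
    """Union of undirected edge sets; frozensets make (a, b) equal to (b, a)."""
    result = set()
    for edges in edge_sets:
        result |= {frozenset(e) for e in edges}
    return result


def neighbors(edges, person):
    """Everyone who shares an edge with `person`."""
    return {other for e in edges if person in e for other in e if other != person}


FUN = overlay(BEER, BADMINTON)
print(len(FUN), sorted(neighbors(FUN, "Michael")))
```

A side effect of overlaying is that the edge labels disappear: if the same pair appeared in both input graphs, the two edges would merge into one unlabeled "fun" edge, and you could no longer tell which activity produced it.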

So the most important thing when reading a social graph is to find out what relationships the edges represent. This matters even more than what the vertices represent, because in SNA the vertices will usually be people. Nearly all graph metrics depend heavily on the edges, so if the edge relationships change, the metrics change too.

The simplest graph metric is degree centrality, which measures how many connections a vertex has. For example, there are seven black edges connecting to me on the friendship graph (figure 2a), so I have seven friends. But there are only five red edges connecting to me on the professional graph (figure 2b), so I have five colleagues. My degree centrality on the beer buddy graph (figure 2c) is three, so I have only three beer buddies. Degree centrality can be computed for every person in the graph; for example, Ryan's degree centrality on the badminton pal graph (figure 2d) is two.
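
Degree centrality is simple to compute from an edge list: count how many edges touch each vertex. The sketch below uses the unnormalized count, as the post does (NetworkX's `degree_centrality`, for instance, divides by n - 1 instead). The edge data and the `degree_centrality` helper here are hypothetical encodings of the text, and the friendship graph is assumed to be the collapse of all three relationships, with a pair such as Michael and Ryan (colleagues and beer buddies) counted once; that assumption reproduces the seven friends counted above.

```python
from collections import Counter

# Hypothetical encoding of the edges stated in the post, grouped by relationship.
EDGES = {
    "colleague": [("Michael", "Phil"), ("Michael", "Joe"), ("Phil", "Joe"),
                  ("Michael", "Jack"), ("Michael", "Ryan"), ("Jack", "Ryan"),
                  ("Michael", "Don"), ("Ryan", "Don")],
    "beer": [("Michael", "Doug"), ("Michael", "Adam"), ("Michael", "Ryan"),
             ("Phil", "Jack")],
    "badminton": [("Michael", "Joe"), ("Michael", "Jack"), ("Michael", "Don"),
                  ("Ryan", "Phil"), ("Ryan", "Doug")],
}


def degree_centrality(edges):
    """Unnormalized degree: the number of edges touching each person."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Assumption: friendship = all relationships collapsed, duplicate pairs merged.
friendship = {frozenset(pair) for pairs in EDGES.values() for pair in pairs}

print("friends:", degree_centrality(friendship)["Michael"])                 # 7
print("colleagues:", degree_centrality(EDGES["colleague"])["Michael"])      # 5
print("beer buddies:", degree_centrality(EDGES["beer"])["Michael"])         # 3
print("Ryan's badminton pals:", degree_centrality(EDGES["badminton"])["Ryan"])  # 2
```

The same function gives different answers for the same person depending on which edge set it is handed, which is exactly why the edge relationship has to be known before the number means anything.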

The interpretation of a graph metric also depends on the edge relationship. So you cannot say anything about how many colleagues I have based on the friendship graph (figure 2a), because the colleague relationship is not represented in the friendship graph. Even if you assume that everyone I've worked with is my friend, the friendship graph alone only tells you that I have anywhere from zero to seven colleagues. Therefore, do not try to draw any inference or conclusion from a graph about a relationship that is not explicitly represented by its edges. If you do, you might as well flip a coin or make a random guess.

Please drop me a note if you have any questions or comments, or give me a kudo if you like this post. Now that you know how to read a social graph, next time I will begin a miniseries on the data analytics of social influence. I will begin with an analysis of the social influence process using a very simple model. Stay tuned!

9 Comments

  • KevinC
    Lithium Alumni (Retired)
    15 years ago

    I find this article to be an excellent plain-language explanation of the importance of context when assessing these social relationships in a business environment. For instance, if you are a product line manager, it is not enough to know that there is a high volume of conversation in your community.

    What you really need to know is what specific products or issues are experiencing the highest conversational volumes.

    Taking it a step further, additional context will help you uncover which individuals are the experts or go-to resources for specific products and areas of expertise.

    Contextual detail will also unearth primary recommenders and detractors.

    This type of context cannot really be meaningfully derived from a static website, freeware forum, or support portal.

    Kudos, MikeW, and thanks for posting this.

  • Dr. Wu,

    There are so many aspects of this one could discuss. One that I thought you might be getting at, when you modeled Facebook vs. LinkedIn, is the venue.

    In my world, I see that some of my community members not only have relationships within our community, but also interact on other communities, read each other's blogs, follow and RT each other on Twitter, share content on Facebook, and comment on or like each other's posts there. They do this in different combinations. No doubt there are several more networks they are on, in various combinations, that I'm not following. I can only keep up with a tiny fraction of the community and the greater social ecosystem. What does it look like?

    Because I'm a simple kind of guy, I'm thinking about each social platform as a layer, seeing which members exist in each layer and how they interact (creating an edge?), and how those layers might stack up to form a three-dimensional model. Quickly, though, this gets beyond our ability to neatly diagram it out or keep up with it in our heads.

    One could imagine how the Lithium platform might evolve to support this kind of model: visualizing interactions within the community and out into Facebook, Twitter, and LinkedIn, adding layers to the model.

    Mark

  • MikeW
    Lithium Alumni (Retired)
    15 years ago

    Hello Kevin,

    Thank you for the comment. I should thank Mark_Hopkins for asking me to cover this topic.

    You are right on. Context is extremely important when reading a social graph. In fact, most of the analytics we do and the metrics we compute are context-sensitive. That is why we construct our social graphs based on the relevant context. This will be the topic of the next article in this series, so I don't want to give away too much here. Stay tuned...

    Thanks for the kudo!

  • MikeW
    Lithium Alumni (Retired)
    15 years ago

    Hello Mark,

    Thank you for the comment...

    What you said is very true. The community is only one channel through which people interact, so through the community we see only a subset of people's social networks, and thus only a subset of their relationships.


    This is exactly the situation in the real world. For example, our colleagues only see the relationships we establish at work: who our teammates are, who works under us, who our manager is, etc. But they may not see our family relationships or friendships, unless that colleague also happens to be a friend as well.


    That is why we are creating the social engagement center (some of you out there are already in the beta program for this product feature). The hope is that we can see slices of the relationships across different social media channels that are relevant to a brand or a company.


    I agree that visualizing the social network beyond communities may be very nice, but I am not certain it would be very useful or meaningful. As we have learned from social anthropology, humans have an inherent limit on how many relationships they can keep track of. This limit is known as Dunbar's number, which is around 150: humans can only keep track of ALL the relationships among about 150 people. Social media today appears to increase the number of relationships we can maintain, but our brains have not changed. We simply know less about each relationship.


    But that is OK; in most cases we only need to know the tiny but relevant bits of someone's relationships. At work we only need to know about our colleagues' work-related relationships, not about their family and friends. That is why social media, social network sites, and communities each focus on one specific relationship at a time (e.g., Facebook is for friendship, LinkedIn is for work and business relationships, and communities are for relationships formed by a common interest).


    If we try to overlay the relationships from all social media channels, the resulting social graph would not be readable. You can see that with my toy example in this post. Even with an overly simplistic social network of my seven hypothetical friends and only three kinds of relationships, the social graph of my entire social network is an overlay of my LinkedIn social graph, my beer buddy graph, and my badminton pal graph. If I want to keep all these relationships visible, we get figure 1, which is rather hard to read. If we collapse all these relationships together and just call them friendship, we get figure 2a. But then we lose the detailed information about who is a colleague, who is a beer buddy, and who is a badminton pal.


    It really is a tradeoff of how much detail you want to know about how many people. It is not possible to know a lot of details for all people. Our brains just cannot process that much information, even if we can visualize them.


    Moreover, we can't really visualize everything at once. Think about this: Facebook has 400 million users, and your computer monitor doesn't have enough pixels to display more than a tiny fraction of the users in that network (even if you represent each user as a 1-pixel dot). If we drew all the lines connecting the dots, the whole screen would be pitch black with edges piled on top of each other, which is clearly meaningless and useless.
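
    A quick back-of-the-envelope check of the pixel argument, as a sketch: the 400 million user figure comes from the comment, while the 1920 x 1080 resolution is an assumption about a typical monitor, not something stated here.

```python
# Assumed display resolution; the user count is the figure quoted in the comment.
width, height = 1920, 1080
users = 400_000_000

pixels = width * height
fraction = pixels / users
print(f"{pixels:,} pixels can show at most {fraction:.2%} of {users:,} users")
```

    Under that assumption the screen holds about 0.52% of the users at one pixel each, so roughly 99.5% of them have no pixel at all, before a single edge is drawn.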

  • Hugh Macken
    15 years ago

    Dr. Wu,

    I'm quite impressed by the graphs. I am very interested in knowing whether you are familiar with how to build demographic profile data based on conversations. Are you able to conduct what I suppose might be called persona analysis? For example, on Twitter, let's say I tweet the following:

    ---
    my wife is amazing. she is always encouraging me.
    listening to bruce springstein on pandora
    golfing with the guys
    sitting in a board meeting at Church
    i wish journalists would talk more about the crisis in Haiti
    my staff is having too much fun today :)
    can't wait to get back home for some authentic New York pizza
    ---

    Using "social media persona analysis," I could deduce (with some margin for error, of course):

    Hugh is male
    Hugh is employed
    Hugh is married
    Hugh plays golf
    Hugh manages employees
    He is a practicing Christian
    He likes listening to rock music
    He is socially conscious
    He is from New York

    So here is my question. Let us say that my client is Apple. If my client were looking to engage with key INFLUENCERS in relation to the iPad, they could use SNA. Correct?

    But if they wanted to reach out to prospective CUSTOMERS (many of whom are not influencers) who might be interested in PURCHASING the iPad, then I suspect they would determine the typical buyer persona and then look for people online with that persona. They might also overlay the SNA influence graph. Does that make sense?

    So if the persona they came up with was:

    Urban lifestyle
    Married
    Employed
    Salary: Above $50k

    Then Hugh would be identified as a "prospect."

    So, ultimately, my question: Are you able to do this type of "persona analysis" in addition to SNA? Does such an analysis type exist? If so, what is its proper name?

    Thanks so much for your help!

    Sincerely,

    Hugh

  • MikeW
    Lithium Alumni (Retired)
    15 years ago

    Hello Hugh,

    Thank you for asking this very interesting question.

    The kind of "persona analysis" you mentioned can definitely be done, but it is pretty much a manual process now. I am not aware of any system out there that does it for you. It is not a trivial problem. Most businesses will probably find it easier to get demographic data from other sources than to infer it directly from the conversations.

    I am not sure what your background is. The following answer might be a bit technical. If so, please let me know and I will try to water it down a bit, but since you asked such a great question, I will give my full-blown response first.

    If you were to build such an analytical system, you would need 2 tools.

    1. The system must have a linguistic component that does Natural Language Processing (NLP). This component will parse the grammar of the conversation and do some simple part-of-speech (POS) tagging, entity extraction, etc. This lets you reliably identify things like the subject and object in the conversation, so the system can understand what the conversation is about.

    2. Then you will need to implement a latent variable inference component. There is a class of models in statistics called Latent Variable Models (LVMs) that infer a latent (unobserved) variable from data collected on a set of observed variables.

    Then you will need to set up the inference for your target persona one variable at a time. Some of these variables will be binary (e.g., using your example, married/not married, or employed/unemployed). Some will be categorical (e.g., lifestyle in your example). For these variables, you will need examples of each possible kind of lifestyle to train the inference engine. Some will have numerical values (e.g., salary in your example). For these variables, you will need to either bin the values into ranges (30K-50K, 50K-75K, 75K-100K, etc.) or, if you want very fine granularity, use a continuous variable, in which case you will have to perform a regression analysis in addition to the LVM.
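
    The binning step for a numerical variable such as salary might look like the sketch below. The range boundaries come from the reply; treating each range as including its lower bound and excluding its upper bound (so 50K falls in 50K-75K) is an assumption, since the ranges as written share endpoints, and `salary_bin` is a hypothetical helper.

```python
import bisect

# Bin edges in dollars, from the example ranges 30K-50K, 50K-75K, 75K-100K.
EDGES = [30_000, 50_000, 75_000, 100_000]
LABELS = ["30K-50K", "50K-75K", "75K-100K"]


def salary_bin(salary):
    """Return the range label for a salary, or None if it is outside all bins.

    Bins are half-open: [lower, upper).
    """
    if salary < EDGES[0] or salary >= EDGES[-1]:
        return None
    return LABELS[bisect.bisect_right(EDGES, salary) - 1]


for s in (42_000, 50_000, 99_999, 120_000):
    print(s, salary_bin(s))
```

    Once binned this way, salary can be treated like any other categorical variable, with one probability per range.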

    Then you set up your LVM using the NLP results as the observed variables and the persona variables as the unobserved latent variables. Train your system with as many examples as you can. Then run the LVM inference. The result will be an inference of each unobserved variable, based on the NLP output, with a confidence bound. You may get results like these:

    For categorical variables, you will get a probability for each category.

         60% certainty: Urban lifestyle
         25% certainty: Suburban lifestyle
         15% certainty: Country lifestyle

    For binary variables, you will get just one probability, because the probability of the other value is then determined (it is one minus that probability).

         92% certainty: Married
         78% certainty: Employed

    For continuous variables, if you bin them into categories, you will get results similar to those for categorical variables. If you use regression analysis, you will need to find the most probable salary.

         30% certainty: Salary: 30K-50K
         30% certainty: Salary: 50K-75K
         40% certainty: Salary: 75K-100K
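
    To make the idea of a probability for a binary persona variable concrete, here is a deliberately simplified sketch. It swaps in a different and much simpler technique than the LVM described in this reply: a naive Bayes classifier over bag-of-words features, trained on a few made-up labeled tweets, with plain word lists standing in for the NLP output. It estimates P(married | words); the names `TRAINING`, `train`, and `p_married` and all training examples are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled examples: (words, is_married). Word lists stand in for NLP output.
TRAINING = [
    (["my", "wife", "is", "amazing"], True),
    (["dinner", "with", "my", "husband"], True),
    (["single", "and", "loving", "it"], False),
    (["looking", "for", "a", "date"], False),
]


def train(examples):
    """Count labels and per-label word frequencies (naive Bayes, not a full LVM)."""
    priors = Counter(label for _, label in examples)
    words = {True: Counter(), False: Counter()}
    for tokens, label in examples:
        words[label].update(tokens)
    vocab = set(words[True]) | set(words[False])
    return priors, words, vocab


def p_married(model, tokens):
    """Posterior P(married | tokens) with add-one (Laplace) smoothing."""
    priors, words, vocab = model
    log_score = {}
    for label in (True, False):
        total = sum(words[label].values())
        score = math.log(priors[label] / sum(priors.values()))
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((words[label][tok] + 1) / (total + len(vocab)))
        log_score[label] = score
    # Normalize in log space to avoid underflow on long texts.
    top = max(log_score.values())
    odds = {label: math.exp(s - top) for label, s in log_score.items()}
    return odds[True] / (odds[True] + odds[False])


model = train(TRAINING)
p = p_married(model, "my wife is always encouraging me".split())
print(f"{p:.0%} certainty: Married")
```

    With this toy data the sketch prints "92% certainty: Married" (12-to-1 odds from the words "my", "wife", and "is"); the match with the illustrative figure above is a coincidence of the made-up examples, not a validation. A real system would need far more training data and richer NLP features.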

    You will get this for every person. Then you need to filter out the ones with low probability. You will have to choose the cutoffs subjectively, based on how much risk you want to take. If you only want to target people you are very certain will match your persona profile, you might draw the line at 85% or above for all variables. If you don't care much about salary, you can tolerate a lower confidence on that variable. After you make the cut and filter out everyone who doesn't meet the criteria, the people who remain are your target persona.
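
    The filtering step can be sketched as a per-variable threshold check. The 85% default comes from the reply; the 60% salary cutoff, the trait names, and the two people are hypothetical. `person_b` reuses the example certainties above (60% urban, 92% married, 78% employed, and 30% + 40% = 70% for a salary above 50K).

```python
# Hypothetical inferred certainties for two people (probability per persona trait).
PEOPLE = {
    "person_a": {"urban": 0.90, "married": 0.92, "employed": 0.88, "salary_over_50k": 0.70},
    "person_b": {"urban": 0.60, "married": 0.92, "employed": 0.78, "salary_over_50k": 0.70},
}

DEFAULT_CUTOFF = 0.85                # "85% or above for all variables"
CUTOFFS = {"salary_over_50k": 0.60}  # tolerate lower confidence on salary


def matches_persona(certainties, cutoffs=CUTOFFS, default=DEFAULT_CUTOFF):
    """True if every trait meets its cutoff (per-variable override or the default)."""
    return all(p >= cutoffs.get(trait, default) for trait, p in certainties.items())


targets = [name for name, c in PEOPLE.items() if matches_persona(c)]
print(targets)
```

    Raising a cutoff shrinks the target list and lowers the risk of reaching people who don't fit the persona; lowering it does the reverse, which is the subjective risk tradeoff described above.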

    This is not SNA. Nonetheless, it is just statistics, albeit more advanced statistics. I can do the analysis, but it is not an automated process, and I don't think such an automated analysis exists out there. So you can call it persona inference, or demographic inference if you like, but those names are probably just marketing. The real underlying analysis is NLP plus LVM inference.

  • Hugh Macken
    15 years ago

    Dr. Wu,

    Thank you so much for your very insightful response. From a social media marketing ROI standpoint, I believe this discussion is critical.

    A little background: As you may know, my company, VMR, is a client of ScoutLabs, which, of course, has just been acquired by Lithium. I believe the main reason the Lithium-ScoutLabs combination makes sense is that it addresses a fundamental problem virtually all marketers are currently facing: we need to be able to measure and then demonstrate positive financial ROI from social media marketing investments. In other words, we need to be able to show our clients that their investment in social media marketing is resulting in either (a) increased revenue or (b) decreased costs. Those two business objectives are really the only ones that matter to executives and shareholders. One way of doing that is to create online spaces that are social, that encourage brand advocacy, and (most important) that bring those online voices one step closer to opening their wallets as paying customers. The online community expertise that Lithium brings to the table helps tremendously in this regard.

    So given that context, I think what we are talking about is doing whatever we can from an analytics standpoint to smooth the path from online conversations to online conversions.

    Specifically, we at VMR want to be able to "categorize" individuals online according to their relation to our client brands. In particular, we want to be able to quickly identify (on a dynamic, real-time basis):

    • Influencers (Centers of Influence) 
    • Potential Customers

    In terms of identifying influencers (who can be, and often are, potential customers), I see the following methods as being very useful:

    • Social Network Analysis
    • Web Analytics, such as the following, that are related to specific online posts authored by the influencer:
      •  # of on-topic posts by influencer
      •  Total Comment Count
      •  On-topic comment count
      •  Total Unique Commenters
      •  On-topic inbound links from other websites
      •  Total inbound links
      •  For forums: On-topic forum replies
      •  # of Retweets

    In terms of identifying potential customers, it seems to me one's level of influence is less relevant than one's demographic, psychographic and (if the potential customer is a business) firmographic profile. It sounds as though you have come up with a method that will be very useful in terms of statistically inferring this information. To your suggestion of

    • Textual Analysis of user-provided data using
      • Natural Language Processing (NLP)
      • Latent Variable Modeling (LVM)

    I would also add 

    • Online Historical Behavior (services such as Sysomos Audience and Tealium Universal Tag provide online behavior translation technologies that are useful in this regard and specifically useful to marketers looking to justify ROI from social media marketing)

    One final note: with regard to demographic analytics, to my knowledge Sysomos does offer such analytics. I do not know their method, but perhaps it does involve a combination of NLP and LVM.

    OK, I need to finish up here. Apologies for this essay, but let's keep the conversation going. I believe this will help our clients achieve more conversions, thus demonstrating the value of social media marketing in terms of financial ROI.

  • Hugh Macken
    14 years ago

    Hi Michael,

    Good article, and an informative blog in general. SNA is data-intensive. Are there any publicly available social network databases? I'd also like to hear your comments on the challenges in SNA.

  • MikeW
    Lithium Alumni (Retired)
    14 years ago

    Hello Feng,

    Thank you for the comment.

    Well, there is good SNA and bad SNA. The good ones have a lot of data to specify all the different kinds of edge relationships and the strength of those relationships, but there is a lot of bad SNA that uses very little data to specify the edge relationships. Technically it is still SNA, and people who claim they are doing SNA are not wrong to say so, but it is just not very useful, because the edge relationships are not very specific and the data behind them are too noisy.

    Most public social data are not designed for doing SNA. Even on our own platform, we have to reconstruct the graph according to the edge relationships we define.

    Certainly some of the challenges are scaling and accuracy. Right now, most people do SNA on one kind of edge, but in reality there are many different kinds of relationships (i.e., many different kinds of edges). That will certainly pose a lot of computational challenges.

    Thanks for the comment. See you next time.