

Community Influencers Step by Step

Lithium Alumni (Retired)

Michael Wu, Ph.D. is Lithium's Principal Scientist of Analytics, digging into the complex dynamics of social interaction and online communities.


He's a regular blogger on the Lithosphere and previously wrote in the Analytic Science blog.


You can follow him on Twitter at mich8elwu.



Suppose you need to find the influencers for your brand in a community. How would you go about doing this? What kind of data do you need, and where do you start? Good questions. Today I am going to show you, step by step, how to find influencers in an online community. I will use Lithium’s own online community, Lithosphere, in the following example, but you can do the same with any social media platform.


1. Identify the Necessary Data

From my earlier post “Finding the Influencers,” we know that an influencer must have high bandwidth and be credible. For finding high bandwidth users, our platform tracks participation velocity data for each user, and also collects a variety of social equity data. Our community platform also allows users to specify friends in the community. However, this feature is more common in enthusiasts’ communities than in other types of sites, so we have only a small amount of data to construct the friendship graph.


For finding credible users, we have reciprocity data (e.g. kudos and accepted solutions) and reputation data from our reputation engine. However, because our community platform does not require users to build an extensive user profile, we do not have self-proclaimed data on interest and expertise. As mentioned above, we probably do not have enough data to construct a reliable friendship graph, but a friendship graph is less useful for finding credible users anyway. Since our community is a highly interactive platform with many topic-specific conversations among users, we can build a topic-specific conversation graph from the conversation data. Social network analysis (SNA) can then be applied to this social graph to identify influencers accurately.


From our analysis, we can see that our community platform has plenty of data for finding influencers, and we can pick and choose what data we would like to use. This may not be true on other platforms or other social media channels. As Phil alluded to in his post, I took the SNA approach because it allows us to identify influencers reliably -- furthermore, it allows us to identify different types of influencers (see Are all Influencers Created Equal?).


2. Build the Relevant Social Graph

Most of the conversations in a branded community will already be focused around the brand, so they are likely all relevant. This eliminates the need to filter out irrelevant content. However, to ensure temporal relevance (i.e. timing), we still need to filter out temporally irrelevant (i.e. old) content before we build the conversation social graph. To do this, I filtered out all conversations older than one month and used only the data within this one-month window.
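As a rough illustration of this time filter, here is a minimal Python sketch. The record layout and the field name `last_message_at` are assumptions made for the example, not how the platform actually stores conversations, and I am assuming a conversation counts as "recent" when its latest message falls inside the window.

```python
from datetime import datetime

def within_window(conversations, start, end):
    """Keep conversations whose most recent message falls inside [start, end]."""
    return [c for c in conversations if start <= c["last_message_at"] <= end]

# Hypothetical sample data; ids and field names are illustrative only.
conversations = [
    {"id": "thread-1", "last_message_at": datetime(2010, 3, 20)},
    {"id": "thread-2", "last_message_at": datetime(2010, 1, 15)},  # too old
    {"id": "thread-3", "last_message_at": datetime(2010, 4, 2)},
]

recent = within_window(conversations, datetime(2010, 3, 5), datetime(2010, 4, 5))
print([c["id"] for c in recent])
```

Only the conversations that survive this filter would feed into the graph-building step.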


Within this relevant time window, I built the social graph by connecting two users if they have participated in a common conversation. So the relationship that is represented by the edges is co-participation in a conversation (recall that this edge relationship is the most important thing to keep in mind when reading social graphs). The conversation can be a thread in a forum, a blog article or an idea via comments. I also did something more sophisticated by computing the strength of connection between the two users, which is determined by three factors.

  1. The connection strength in a conversation is proportional to the number of messages a user contributed to a conversation. The more messages they contribute, the more likely they will be seen and remembered.
  2. The connection strength in a conversation is inversely proportional to the number of unique participants in the conversation. If there are more participants, then each user is less likely to be remembered.
  3. The connection strength is then summed across all the conversations where the two users have co-participated. The more conversations a pair of users co-participated in, the stronger the connection between them.
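The three factors above can be sketched in a few lines of Python. The exact way the two users' message counts are combined is not spelled out here, so this sketch assumes the per-conversation strength is the sum of both users' message counts divided by the number of unique participants. Treat that formula, the function name, and the sample user names as assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

def connection_strengths(conversations):
    """Return {(user_a, user_b): strength} for every co-participating pair.

    `conversations` is a list of conversations, each given as the list of
    author names of its messages (one entry per message).
    """
    strength = defaultdict(float)
    for authors in conversations:
        counts = Counter(authors)      # factor 1: messages per user
        participants = len(counts)     # factor 2: unique participants
        if participants < 2:
            continue                   # a lone poster has no co-participation
        for a, b in combinations(sorted(counts), 2):
            # Assumed formula: more messages raise the strength,
            # more participants dilute it.
            strength[(a, b)] += (counts[a] + counts[b]) / participants  # factor 3: sum
    return dict(strength)

# Hypothetical conversations: each inner list is one thread's message authors.
threads = [
    ["ann", "bob", "ann"],   # ann posts twice, bob once
    ["ann", "bob", "cho"],   # three participants, one message each
    ["dee"],                 # nobody replied, so no edge is created
]
strength = connection_strengths(threads)
print(strength)
```

In this toy data, ann and bob end up with the strongest connection because they co-participate in two threads, while dee does not appear in the graph at all.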


The result is the conversation graph shown here. The data is from Lithosphere for the one-month period between 2010-03-05 and 2010-04-05. I labeled the users with their screen names. I can see myself, MikeW, and see that I’ve co-participated in conversations with seven other Lithosphere members (Mark_Hopkins, jennyb, MikeTD, reinvent_ed, Laura, PhilS, and PaulGi).




Since I have the connection strength between users, I could also filter out weak connections if necessary (when there are too many conversations in a community and the social graph becomes too cluttered to read). In this case, I didn’t need to do this because Lithosphere is still a small community; there were only 57 registered users co-participating in conversations within the one month period of interest. However, keep in mind that this does not mean that only 57 users posted. A user can post, but if no one replies, then there is no co-participation.
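If you did need to prune a cluttered graph, a minimal sketch could look like the following. The threshold value, the function names, and the sample strengths are all hypothetical; a sensible cutoff would depend on the community.

```python
def drop_weak_connections(strength, threshold):
    """Keep only the edges whose connection strength is at least `threshold`."""
    return {pair: w for pair, w in strength.items() if w >= threshold}

def connected_users(strength):
    """Users that still have at least one edge in the graph."""
    return {user for pair in strength for user in pair}

# Hypothetical connection strengths between three users.
strength = {("ann", "bob"): 2.5, ("ann", "cho"): 0.2, ("bob", "cho"): 0.3}

strong = drop_weak_connections(strength, threshold=0.5)
print(strong)
print(connected_users(strong))
```

Note that pruning edges can also remove users from the picture: here cho drops out because both of cho's connections fall below the cutoff.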


3. Social Network Analysis of the Conversation Graph

Once the social graph is built, the ‘hard work’ is done. The rest is number crunching using SNA and interpreting the results. SNA analyzes the social graph and computes node metrics that rank the importance of each user in the network. However, there are many ways that a user can be important. Some are well connected (have many connections), some are reputable (recognized by other important people), yet others may still be important in subtle ways. So SNA actually computes a series of node metrics depending on how we want to measure importance.


Currently, I’ve implemented 10 different such node metrics:

  1. Degree Centrality: How connected the user is (depending on the edge relationship, this centrality measures the number of connections a user has: friends, colleagues, etc.)
  2. Eigenvector Centrality: How reputable and recognized the user is
  3. PageRank Score: How much of an authority the user is (this is the same algorithm that Google uses to find authoritative web pages on the Internet)
  4. Potential Reach: How many people the user can reach within two degrees
  5. Clustering Coefficient: How cliquish the user is (the probability that two of the user’s friends are also friends with each other)
  6. Betweenness Centrality: How critical the user is for information diffusion
  7. Core Number: How central the user is in the network
  8. Vertex Eccentricity: How far away from the center the user is
  9. Closeness Centrality for the connected components: How close the user is to the rest of the connected component of the network
  10. Closeness Centrality for all components: How close the user is to the entire network, including disconnected components
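To make a few of these definitions concrete, here is a small pure-Python sketch of three of the simpler metrics (degree, potential reach within two degrees, and clustering coefficient) on an unweighted toy graph. This is not the implementation behind the graphs in this post; the function names and the sample edges are made up for illustration.

```python
def build_adjacency(edges):
    """Turn a list of undirected (u, v) edges into {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree(adj, node):
    """Number of direct connections."""
    return len(adj[node])

def potential_reach(adj, node):
    """Number of distinct users within two degrees (friends and their friends)."""
    reached = set(adj[node])
    for friend in adj[node]:
        reached |= adj[friend]
    reached.discard(node)
    return len(reached)

def clustering_coefficient(adj, node):
    """Fraction of pairs of the node's friends that are connected to each other."""
    friends = sorted(adj[node])
    k = len(friends)
    if k < 2:
        return 0.0
    linked = sum(
        1
        for i, a in enumerate(friends)
        for b in friends[i + 1:]
        if b in adj[a]
    )
    return linked / (k * (k - 1) / 2)

# Hypothetical toy graph: a triangle (a, b, c) with a tail c-d-e.
adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])
for user in sorted(adj):
    print(user, degree(adj, user), potential_reach(adj, user),
          round(clustering_coefficient(adj, user), 2))
```

In this toy graph, user c has the most connections and the widest reach, but user a is more cliquish because both of a's friends know each other.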


You do not have to know all of them, but you should try to understand the common ones, such as degree centrality, eigenvector centrality, PageRank score, and potential reach. I can overlay these node metrics on the social graph by mapping them to the size and color of the vertices, so we can visually see these metrics along with the social graph.




In the above social graph, I map the degree centrality (connectedness) to the size of the dots, and the eigenvector centrality (reputation) to their color. Clearly, Mark_Hopkins has the most connections, as indicated by the biggest dot. However, PhilS is the most reputable, as indicated by the most yellow color, even though he only talked to six users in that period. Although Mark_Hopkins, PaulGi, IngridS and I are not too far behind on the eigenvector centrality scale, PhilS is more reputable because he has strong connections with other users (indicated by the brown edges) and because he is connected to a lot of other reputable users. Remember, how connected a user is does not necessarily correlate with his reputation.
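To see how a user with fewer connections can still come out as more reputable, here is a small sketch of eigenvector centrality computed by power iteration on a weighted graph. The graph, the user names, and the edge weights are invented for the example, and the iteration uses a common shifted variant (adding each node's own score before normalizing) to keep it stable; it is not necessarily the exact computation behind the graph above.

```python
def eigenvector_centrality(weights, iterations=200):
    """Power iteration on an undirected weighted graph given as {(u, v): w}."""
    nodes = sorted({n for edge in weights for n in edge})
    adj = {n: {} for n in nodes}
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each user's new score is its own score plus the weighted
        # scores of the users it is connected to.
        new = {
            n: score[n] + sum(w * score[m] for m, w in adj[n].items())
            for n in nodes
        }
        norm = sum(v * v for v in new.values()) ** 0.5
        score = {n: v / norm for n, v in new.items()}
    return score

# Hypothetical graph: "hub" has five weak connections, while "pat" has only
# three connections, two of them strong and with well-connected users.
weights = {
    ("hub", "u1"): 0.1, ("hub", "u2"): 0.1, ("hub", "u3"): 0.1,
    ("hub", "u4"): 0.1, ("hub", "pat"): 0.1,
    ("pat", "quinn"): 3.0, ("pat", "rae"): 3.0, ("quinn", "rae"): 3.0,
}
score = eigenvector_centrality(weights)
connections = {n: sum(n in edge for edge in weights) for n in score}
print(connections["hub"], connections["pat"])
print(score["hub"] < score["pat"])
```

Here "hub" has more connections than "pat", yet "pat" gets the higher eigenvector centrality because its connections are stronger and lead to other high-scoring users.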




In this version of the social graph (above), I map the potential reach metric to the size of the dots, and the PageRank score to their color. Again, Mark_Hopkins has the greatest reach and the highest PageRank score for the past month. But reach and authority are not always correlated either. For example, KevinC has greater reach than MatthewT in Lithosphere (KevinC’s dot is bigger), but MatthewT has higher authority than KevinC (MatthewT’s dot has a more yellow tone).


I must emphasize that these rankings are only relevant for about a one-month window, because I have restricted the computation to the period from 2010-03-05 to 2010-04-05. If we plot the social graph today, these node metrics could well be very different. So if you start participating more today, you may be one of the biggest and most yellow dots a month later.


Since Lithosphere is a pretty small community, most of these node metrics are quite well correlated. This will not be the case for larger communities. Therefore, depending on your marketing needs and constraints, you will need help from different kinds of influencers in the community (see Are all Influencers Created Equal?).


Next week is our Lithium Network Conference (LiNC2010). I will be there, so please come by and say hello if you will be attending. If not, we can always meet here at Lithosphere. Unless there are some special requests at LiNC, I plan to show you some more social graphs from larger communities. But for now, I welcome any questions and comments as usual. See you next week at LiNC2010 or here at Lithosphere.



About the Author
Dr. Michael Wu was the Chief Scientist at Lithium Technologies from 2008 until 2018, where he applied data-driven methodologies to investigate and understand the social web. Michael developed many predictive social analytics with actionable insights. His R&D work won him recognition as a 2010 Influential Leader by CRM Magazine. His insights are made accessible through “The Science of Social” and “The Science of Social 2,” two easy-reading e-books for a business audience. Prior to industry, Michael received his Ph.D. from UC Berkeley’s Biophysics program, where he also received his triple major undergraduate degree in Applied Math, Physics, and Molecular & Cell Biology.
Joshua Letourneau
Not applicable

Michael, great post. Your approach is enlightening. However, let me ask: How many companies are attempting to monitor KOLs outside of their own 'closed communities'?

Consumers today are less apt to join brand-specific Communities and Social Networks. They discuss brands, products, etc., but not in a closed arena. It's the era of "open".

Those that do join these closed communities/social networks are largely the 'Raving Fans' . . . so I guess the question becomes whether the fanatical fans mirror the larger consumer base. If anything, we do know that they're our most vocal brand champions/evangelists, so we must acknowledge and respect them. I then ask, however, how 'central' are they, meaning in terms of only influencing those that are already 'on the bus'? And if so, what type of incremental sales can we expect from those that are already consuming our product/service?

Don't we want to reach other clusters of people, perhaps those that aren't overly central to the brand, yet represent new frontiers of sales?

Perhaps instead of trying to find the KOLs in our own 'neighborhood', we should go out into the broader forest of consumers and penetrate new clusters of potential sales. Ultimately, that's the goal of market segmentation and value-proposition management.


Joshua Letourneau

Director, Client Engagement & Solutions Delivery



Lithium Alumni (Retired)

 Hello Joshua,

Thanks for the comment. I'd like to put forth 3 points for further discussion.

1. I only used the community as an example; you can do this for any social media platform you want. Facebook, Twitter, Yelp, anything. As long as you properly build a graph that encompasses the 6 factors of social media influence, you can find the influencers.

2. I wouldn't say that ONLY the raving fans join brand-specific communities. From our data, I definitely see a lot of people who join just to ask a question that they want answered or just to make a comment that they feel strongly about. Besides, there are tons of lurkers who consume community content without ever participating, or even officially joining the community. Our unique visitor counts are usually 6x to 11x the registered member population (depending on how old the community is). Besides, 99% of our clients' communities are open to the public.

Moreover, closed or open really depends on your definition. The era of open could be just OpenID; they still can and will track who you are.

3. Finally, to answer your question... We do want to monitor the conversation beyond the community, and we do provide that service (community is just an example). And there are plenty of brand monitoring services that provide that kind of service. Many big brands hire and use listening platforms for brand monitoring. Small companies probably don't do much yet because these platforms tend to be pretty costly, but prices will drop and more companies will adopt them.

Nonetheless, the methodology does not change. You build the graph of who talked to who at the proper time, and then you perform SNA. You can do that with anything. Cell phone calls, emails, IMs, etc.

I hope this addresses your question.


Lithium Alumni (Retired)

I'm just excited that there is now mathematical proof of my reputability. I wish my mom were still around so I could call her!

Lithium Alumni (Retired)

You should show her the graph and point out that this is the proof! 🙂

Occasional Advisor
@joshua It's also a pretty big assumption that those most influential users are only active within a community and not across other channels/networks. And as MikeW alluded to, publicly accessible communities have the potential to be viewed by a very large audience - and often rise high in search engine results.
Scott Plocharczyk
Not applicable

Michael, great post!  I'm a huge fan and advocate of Influencer ID as a metric to drive a host of marketing, social network and community based initiatives and the associated optimization.


Joshua, you stole my thunder by about 90% (and well written, by the way).  My past five years have been split between Social Media Platform Sales (mZinga) and Social Research (listening) Platform Sales (Cymfony/Collective Intellect).  Having cut my teeth on the community deployment side of Social Media, and having witnessed dozens of "on boarding" phases for community launch, I began to wonder about 6 engagements in: why aren't we aiming Phase I of the on boarding process at the most influential advocates and loyalists?  There has to be some influencer/follower and viral upside here, right?


Well I made the move to the Social Analytics side of the space and immediately developed a passion for the power of Influencer Identification.  Who are the most influential advocates and SMEs amplifying my clients' brand and their category?  Then of course there's analysis of the competitive set of influencers for the purposes of "courting" the competition's most influential customers.


So the 10% I wanted to add centers around identifying your most influential customer through the social sphere in ADVANCE of a branded community roll-out.  I've been banging this drum within my internal organizations for years and it's nice to see someone like Lithium embrace the power of influence and recognize the value of acquiring a Social Research & Monitoring company like ScoutLabs.  The intersection of engagement, collaboration and content creation with social insights, specifically Influencer ID, is where it's at.  Great job!

Lithium Alumni (Retired)

Hello Scott,


Thanks for commenting on my blog here and on LinkedIn.


I'm glad to see that there are fans out there who advocate the importance of influencers.


We are definitely on our way to identifying influencers on the social web. As I mentioned in the blog, I simply used community as an example. But the methodology is general and is applicable to the greater social web. With additional social web data from Scout Labs, we can do this with greater accuracy and reliability.

There is definitely an increased value gain in seeding programs that target influencers. You may want to look at the discussion on my earlier blog article on this topic (specifically my reply to Win Rampen). We found that targeting influencers can increase the profit by up to about 50%. Specifically, a random seeding program in a competing market can achieve a profit gain of 13%, whereas an influencer program can achieve 20%.

Thanks again for the comment 🙂

Duncan Stuart
Not applicable

Excellent post - and it helps us settle a couple of arguments we were having in the office today over best measures. We appreciate your 'guru-ness' in these matters. I like the way you untangle and differentiate between measures such as reputation and reach. It isn't how many dots - it's the quality of the dots. We're working with such issues in org research survey design - and where many of your measures are physical (speed, participation in discussions) surveys are blunter and we have to get there by asking a few canny questions - and fairly non judgemental ones at that. People are scared that the surveys will ask: "So - who would YOU vote off the island??"


Your examples give us a tangible model to help us clarify our own thinking. So thanks.


Also, we're using UCINET, and have to say - our charts are not as pretty as yours!  The truth is - the future doesn't belong to analysts, per se, it belongs to "data stylists." You, sir, have style.

Lithium Alumni (Retired)

Hello Duncan,


Thanks for the comment and the overly flattering praise. I felt that I was literally blushing for a sec.


I’m so glad that you find my post informative and that it helped you resolve some issues. This post is an illustrative example of the theoretical model that I put forth in a mini-series of 4 articles below, which will give you a more general understanding of what I did and why.


  1. The 6 Factors of Social Media Influence
  2. Finding the Influencers
  3. The Right Content at the Right Time
  4. Hitting Your Targets


Thanks again for dropping by. Hope to see you again.

Lauren Klein
Not applicable

Thanks for all this great content, sharing and inspiration.  Just thought I'd mention that you can set up influencer workshops, trainings and environments that foster and harness their passion, input and ideas.  In fact, you can use creative problem solving with them to drive product and service development cycles.


Lithium Alumni (Retired)

Hello Lauren,


Thank you for stopping by and commenting on my article.


Our community platform can definitely provide the environment to foster and harness passions and ideas. We have some best practices on community management that deal specifically with cultivating superusers and influencers. Consequently, many of the superusers and influencers voluntarily provide valuable input through our Ideas App that drives our product and service development cycle. By designing a rewarding and entertaining ranking structure through our reputation engine, we feel that we've already achieved some extraordinary results.


We never really tried workshops and trainings though. But it's definitely another avenue to explore.


Thank you again for the comment. Hope to see you back on Lithosphere later.