
Finding the Influencers: Influence Analytics 2

Lithium Alumni (Retired)

Michael Wu, Ph.D., is Lithium's Principal Scientist of Analytics, digging into the complex dynamics of social interaction and online communities.


He's a regular blogger on the Lithosphere and previously wrote in the Analytic Science blog.


You can follow him on Twitter at mich8elwu.



Last time, I introduced a very simple model of influence and outlined the basic ingredients required for successful word-of-mouth (WOM) influence. Today, I will focus on the influencers and show you how to find them. Since this article builds on my previous post, I recommend you read "The 6 Factors of Social Media Influence" before diving into this article.


Influencers on Social Media Channels

In the real world, pretty much anyone can be an influencer, and the number of effective influencers is quite large compared to the number of potential targets. However, when we move onto a social media channel, the number of effective influencers is significantly reduced. The primary reason for this reduction is that the influencer's ability to transmit his knowledge through the particular social media channel is greatly impeded.


Some of the reasons are:

  1. Proficiency: The influencer may not be as effective or efficient when communicating through the particular social channel of interest.
  2. Incentive: The influencer may have no desire to share his knowledge through social media. Or he may be too busy with other things that are more important to him.
  3. Social Equity: The influencer may not have built up his social equity on the particular social media channel to propagate his knowledge to the target. For example, he may not have enough followers on Twitter, or he may not have a large enough network on Facebook or LinkedIn.

This actually alleviates a problem for marketers: it reduces the number of influencers in the channel of interest to a more manageable amount. That lets marketers engage more deeply with the influencers and drive more effective marketing campaigns.


Because influencer identification is the first step in any WOM or influencer marketing program, it has been the focus of most marketers. Most of the effort has gone into finding influencers with high bandwidth in the social media channel of concern, and less into finding influencers with high credibility in the relevant domain. However, a powerful influencer needs to be both credible and have high bandwidth, and credibility and bandwidth are not always correlated.


Finding the High Bandwidth Users

Finding the high bandwidth users is actually the easiest part, because there is plenty of data available for it. Three types of data have been widely used to identify high bandwidth users:

1. Participation velocity data: For example, number of tweets per day on Twitter, number of articles per month on a blog, number of messages per week in a community, etc.


2. Social equity data: For example, the number of followers on Twitter, number of friends/links on a social network, cumulative number of posts in a community, total viewership or total number of unique readers on a blog, etc.


3. Social graph data: Certain social equity data can also be derived from social network analysis (SNA) of social graph data, but this depends critically on what kind of relationship is encoded in the graph. For example, SNA on the friendship graph of a social network (where the edges indicate friendship) could easily reveal the popular friends and the gregarious social butterflies in the network. Likewise, SNA on the following/follower graph on Twitter would give you the fans, enthusiasts, and celebrities. In these types of social graphs, where users are not connected by a common domain of knowledge, SNA can discover the high bandwidth users.
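To make these three data types concrete, here is a minimal Python sketch that ranks users by a rough bandwidth proxy. It assumes a hypothetical activity log of (user, date) posts and a hypothetical list of (follower, followed) edges. The "reach" score is an illustrative assumption and not a standard metric: posts per day (participation velocity) multiplied by follower count (social equity, which is also the simplest SNA metric, in-degree in the follower graph).

```python
from collections import Counter
from datetime import date

# Hypothetical activity log: (user, date of post)
posts = [
    ("ana", date(2010, 3, 1)), ("ana", date(2010, 3, 1)), ("ana", date(2010, 3, 2)),
    ("bo", date(2010, 3, 1)),
    ("cy", date(2010, 3, 2)), ("cy", date(2010, 3, 3)),
]
# Hypothetical follower graph edges: (follower, followed)
follows = [("bo", "ana"), ("cy", "ana"), ("dee", "ana"), ("ana", "cy")]


def participation_velocity(posts):
    """Posts per day for each user over the observed window."""
    days = [d for _, d in posts]
    window_days = (max(days) - min(days)).days + 1
    counts = Counter(user for user, _ in posts)
    return {user: n / window_days for user, n in counts.items()}


def follower_counts(follows):
    """Social equity as follower count, i.e. in-degree in the follower graph."""
    return Counter(followed for _, followed in follows)


def rank_by_bandwidth(posts, follows):
    """Rank users by an assumed reach proxy: posts per day x followers."""
    velocity = participation_velocity(posts)
    followers = follower_counts(follows)
    users = set(velocity) | set(followers)
    reach = {u: velocity.get(u, 0.0) * followers[u] for u in users}
    # Highest reach first; ties broken alphabetically for stable output
    return sorted(users, key=lambda u: (-reach[u], u))


print(rank_by_bandwidth(posts, follows))  # ['ana', 'cy', 'bo']
```

Here "ana" ranks first because she both posts often and has the most followers. How to weight velocity against equity is a judgment call for your own channel.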



Finding the Credible Users

Four types of data can be used to find credible users:

1. Reciprocity data: The best way to find credible users is to use data on what others say about them in the particular domain of interest. Online, this is usually found in rating data. For example, if a user answered some questions in a support community, did he get good ratings for his answers? What percentage of his answers were marked helpful? If the user has a blog, how many people tweeted his articles, how many hyperlinks point to his blog, and what is the blog's PageRank? If he wrote reviews, how many did he write, and what fraction of the people who read them found them useful? How many people recommend that user on LinkedIn?


2. Reputation data: Reputation engines often summarize reciprocity data and assign a reputation score, or rank, to the user. This data is also valuable for finding credible users.


3. Self-proclaimed data: This is data that users assign to themselves in their user profiles. It can be their career experience on LinkedIn, or any professional or non-professional groups they have joined. Since these data are mostly self-proclaimed and not validated, they are less reliable. However, for very specialized and niche domains where reciprocity and reputation data are not available, self-proclaimed data with some validation from an independent third party does come in handy for identifying credible users.


4. Social graph data: Although social network analysis (SNA) on a social graph can be used to identify credible users, what matters most is how the graph is constructed and what kind of relationship its edges capture. Using the same examples as in my earlier discussion, a friendship graph on a social network would not be appropriate for finding credible users. Just ask yourself: do all your friends on Facebook have the same domain expertise as you? I think the answer is clearly "No." Likewise, a social graph of following/follower relationships on Twitter is not suitable for finding credible users. Unless we can remove the part of the graph that is irrelevant to the domain of interest, these social graphs will only give you the high bandwidth users, who are not necessarily the same as the credible users.
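Reciprocity data usually has to be summarized before it is useful, and that is what a reputation engine does. The Python sketch below shows one hypothetical way to do that summary. It is not Lithium's actual reputation engine. It shrinks each user's helpful-answer fraction toward a prior using a Bayesian average, so a user with 2 helpful answers out of 2 does not outrank one with 80 out of 100. The `ratings` data, `prior_mean`, and `prior_weight` are illustrative assumptions.

```python
# Hypothetical reciprocity data: user -> (answers marked helpful, answers rated)
ratings = {"ana": (2, 2), "bo": (80, 100), "cy": (10, 40)}


def helpful_fraction(helpful, rated):
    """Raw reciprocity signal: share of rated answers marked helpful."""
    return helpful / rated if rated else 0.0


def smoothed_credibility(helpful, rated, prior_mean=0.5, prior_weight=10):
    """Bayesian average: users with few ratings are pulled toward prior_mean."""
    return (helpful + prior_mean * prior_weight) / (rated + prior_weight)


def rank_credible(ratings):
    """Rank users by smoothed credibility, highest first."""
    score = {u: smoothed_credibility(h, n) for u, (h, n) in ratings.items()}
    return sorted(score, key=lambda u: (-score[u], u))


for user, (h, n) in ratings.items():
    print(user, helpful_fraction(h, n), round(smoothed_credibility(h, n), 3))
print(rank_credible(ratings))  # ['bo', 'ana', 'cy']
```

On the raw fraction, "ana" looks perfect on just two ratings. After smoothing, "bo" ranks first because his record rests on far more evidence.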


So what kinds of social graphs can give us the credible users? Graphs whose edges encode a domain-specific relationship. For example, the social graph of members in a LinkedIn group, or the conversation network on a particular subject, would be more suitable for finding credible users.
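The Python sketch below shows one way to apply SNA to a domain-specific graph. It assumes hypothetical reply records of the form (asker, answerer, topic) and keeps only the edges for one topic. It then runs a plain power-iteration PageRank, the same idea behind the blog PageRank mentioned earlier, so that rank flows toward the people others turn to on that subject. The record format, edge direction, and parameters are all assumptions for illustration.

```python
# Hypothetical reply records: (asker, answerer, topic)
replies = [
    ("bo", "ana", "routers"), ("cy", "ana", "routers"), ("dee", "ana", "routers"),
    ("ana", "cy", "routers"),
    ("bo", "cy", "cooking"), ("dee", "cy", "cooking"), ("ana", "cy", "cooking"),
]


def domain_graph(replies, topic):
    """Keep only edges (asker -> answerer) from conversations on one topic."""
    return [(asker, answerer) for asker, answerer, t in replies if t == topic]


def pagerank(edges, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no out-links is spread evenly over all nodes
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        rank = new_rank
    return rank


for topic in ("routers", "cooking"):
    scores = pagerank(domain_graph(replies, topic))
    print(topic, max(scores, key=scores.get))  # prints "routers ana", then "cooking cy"
```

The same four users get different credibility rankings depending on which domain-specific edges are kept, which is the point of filtering the graph before running SNA.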

Alright, now that you know how to find the high bandwidth users as well as the credible users, you are ready to find the influencers in your domain. You simply take the intersection (the users that are in both the high bandwidth group and the credible group), as shown in the Venn diagram above. In the next few weeks, I will continue to cover the science of influence, focusing on the issues of relevance and timing. As always, if you have questions, comments, or feedback, please let us know.
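The intersection step can be sketched in a few lines of Python. This sketch assumes you already have a bandwidth score and a credibility score for each user, derived from whichever of the data sources above you have access to. The numbers below are made up. Keeping the top k of each list is also an assumed cutoff, since no particular threshold is prescribed here.

```python
# Made-up per-user scores, for illustration only
bandwidth = {"ana": 3.0, "bo": 0.0, "cy": 0.67, "dee": 1.5}
credibility = {"ana": 0.58, "bo": 0.77, "cy": 0.30, "dee": 0.70}


def top_k(scores, k):
    """The k highest-scoring users (ties broken alphabetically)."""
    return set(sorted(scores, key=lambda u: (-scores[u], u))[:k])


def find_influencers(bandwidth, credibility, k):
    """Users in the top k for bandwidth AND the top k for credibility."""
    return top_k(bandwidth, k) & top_k(credibility, k)


print(find_influencers(bandwidth, credibility, k=2))  # {'dee'}
```

In this example, "ana" has the highest bandwidth but drops out because she is not among the most credible. That is why bandwidth and credibility have to be measured separately.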




About the Author
Dr. Michael Wu was the Chief Scientist at Lithium Technologies from 2008 until 2018, where he applied data-driven methodologies to investigate and understand the social web. Michael developed many predictive social analytics with actionable insights. His R&D work earned him recognition as a 2010 Influential Leader by CRM Magazine. His insights are made accessible through “The Science of Social” and “The Science of Social 2,” two easy-reading e-books for a business audience. Prior to industry, Michael received his Ph.D. from UC Berkeley’s Biophysics program, where he also received his triple-major undergraduate degree in Applied Math, Physics, and Molecular & Cell Biology.
Lithium Alumni (Retired)
Love this post, man. You really break it down. Keep up the good work.
Commentator KaushalS

Yeah, I like the way you present the information in layman's terms with clear examples. Can't wait for the next one 🙂

Lithium Alumni (Retired)

Hello, MatthewT and KaushalS,


Thanks for the comments. I will try to give more examples in my future posts. 🙂

Thanks for the kudo!


Chris Claydon

Good article. We have a massive amount of social data (10 billion rows) which we are just starting to analyse and draw conclusions from.


Can you justify why you think high bandwidth users are strong influencers? My experience has been the opposite - users who produce a constant stream of low value updates are generally ignored by their friends, whereas those users who publish just a few carefully selected items are given a lot more credibility and can be very influential.


Lithium Alumni (Retired)

Hello Chris Claydon,


Thanks for the comment (please don't be alarmed by the new look). I just want to welcome you to the all-new Lithosphere, which was upgraded today.


Glad to hear that you have plenty of data to work with. That is always a nice thing.


I must clarify that high bandwidth ALONE is definitely NOT sufficient to make someone a strong influencer. High bandwidth is only one of the six necessary factors I described in my previous article, "The 6 Factors of Social Media Influence." This blog post only shows you what data you need to find the high bandwidth users and the credible users (the first two factors). In the end, you must take the intersection of the two lists to get the influencers you want.


Concerning your experience: high bandwidth users are often ignored because their content is not always relevant, or does not reach the target at the right time. That is why influencers must also have content and temporal relevance, both of which are among the 6 factors I mentioned in my earlier post. If their messages are relevant in both content and timing, they will be valuable to the target. (For example, if someone could give you stock market tips every single day with a 90% chance of making you a profit, you wouldn't mind seeing a few more messages from that person.)


But what you said is correct: high bandwidth ALONE does not make a strong influencer. High bandwidth users become strong influencers only if the other 5 factors (domain credibility, relevance, timing, channel alignment, and target confidence) are also met. So the highest bandwidth user who also satisfies the other 5 factors may not be the highest bandwidth user overall.


To get a more complete picture of the social media influence process, I recommend that you take a look at the previous article in this miniseries as well as follow up on the articles that I will post later.


Thanks again for the comment, I'm sure this will also clarify things for others.



Skip Shuda



Enjoyed your 2nd post on this important topic.   Regarding credibility data, you may be aware of Dan Zarrella's work at  .  He does a lot of work on the importance of Retweets to measure the influence and content strength in Twitter.


Also - it would be helpful to list any specific, free tools like and to help us collect some of these data.  Do you have any favorites you can share?   I'm also curious as to how Lithium helps with this kind of analysis and collection? 


Finally, as a nod to the world of Integral Theory, we must always keep in mind that "the map is not the territory." We can collect metrics and turn them into KPIs, but they are only proxies for the interior of human interaction. Only people know what they are truly feeling and thinking as they blog, tweet, interact, and react. That doesn't mean we shouldn't do the analysis, but we always want to keep it in perspective.


Looking forward to your future posts!


Skip @skipshoe Shuda

Lithium Alumni (Retired)

Hello Skip,


Welcome back and thanks for the question and comments.


Yes, I know of Dan Zarrella’s work on retweets. However, I feel that his ReTweetability metric is more a measure of credibility than of true influence. It is a kind of reciprocity data, where others reaffirm your content by retweeting it. And credibility is certainly one of the 6 factors that govern the social media influence process, and an important one that many have overlooked.


As for tools, I play around with Klout, TwitterAnalyzer, Twitalyzer, tweetStats, and TwInfluence. The trouble with these free tools is that you can only monitor a single user or a few terms, hashtags, etc., and they usually do not let you do much drill-down and discovery. If you know whom to monitor, they are great. If you want to know who the influencers are for a brand or a topic, say Lithium, they will give you a list. But you can’t get deeper insights, such as what those users like about Lithium, or why a particular user is an influencer: is it because he has a lot of followers, because he gets a lot of retweets, or because he simply tweets a lot? So if you want to discover deep insights, these tools are probably not sufficient.


At Lithium, we partner with listening platforms, so we have raw data feeds that we capture and store. I am fortunate enough to be able to get this raw data and analyze it myself. This way, I have full control over what algorithms I use, how I measure influence, how I determine relevance, etc.


There are a lot of different ways that a user can be influential, and social network analysis gives you dozens of metrics that you can use to measure these different types of influence. If you are interested, I have an introductory post here: Are all Influencers Created Equal? 


We actually have a lot more metrics than were mentioned in this article, but this was just an intro post. Besides the social network metrics, we also have participation velocity metrics and social equity metrics. That may be too much for most people’s needs, but for a scientist, this is where we can have fun! 🙂