What’s New about the New Community Health Index? Part 1

Lithium Alumni (Retired)

First, an apology for going radio silent for a month. Sorry. But I’m back.


It’s been a crazy October for me. Most of the month seems like a blur, a mixture of sleep deprivation and 80+ hour work weeks. Besides all the traveling for speaking events during the day and the lost luggage on the way (which I’m happy to share with you later if you are interested), I’ve been working at night with our data platform team to implement the new Community Health Index (CHI, denoted by the Greek letter chi). Since we just made this feature available to all customers on October 31st, I thought it would be appropriate to discuss what’s new with CHI today.


For those who are not familiar with my earlier work, CHI was actually the first project I embarked on after joining Lithium ~6 years ago. CHI is an index (ranging from 0 to 1000) that scores the performance of a community on how well it serves its end users. I must reiterate that community health is not the same as community success.

  • healthy communities are those that are meeting the needs of the end users (i.e. consumers)
  • successful communities are those that are meeting the needs of the business (i.e. the brand sponsoring the community)

These 2 concepts are related because meeting the needs of the consumers is probably a necessary condition for business success. However, a healthy community doesn’t automatically guarantee business success (i.e. it’s not a sufficient condition). This subtle distinction is what enabled the initial development of CHI that’s purely based on user behavior data.


If you take it at face value, the new CHI doesn’t seem very different from the old. It is still a score between 0 and 1000. It is still derived from the same 6 health factors as the old CHI: traffic, content, members, liveliness, interaction, responsiveness. However, if you dig deeper and look under the hood, you will start to see the significance of these changes.


Infrastructural Changes

First, we completely revamped the computational infrastructure for computing CHI. CHI used to be computed on our custom-built data warehouse. It relied on a complex sequence of ETL processing to extract the counter-based metrics out of our application database (E), transform them (T), and then load them into our data warehouse (L). As our customer base grew and their customers’ engagement level increased, the data warehouse solution no longer offered the scalability and flexibility we need.


The new CHI is built on top of our new event-log framework that has little dependency on the counter-based metrics within our application database. That means CHI can be computed with little performance impact on the community. Consequently, computing CHI for some of our largest communities becomes feasible. The new CHI is also built on modern big data technology. Hadoop’s HDFS serves as our highly scalable distributed storage engine for all the raw event logs emitted from all our communities. User-defined functions (UDFs) on Hive perform most of the aggregation and number crunching. The results are indexed in Elasticsearch and served through the Lithium Social Intelligence (LSI) app.


One of the benefits of using our event-log framework as the fundamental input data to compute CHI is that it’s easier to filter out the bot traffic that pollutes the traffic health factor of CHI. In this framework, user actions on our community platform are emitted and logged as events that contain rich metadata and contextual information about who took the action, and when, where, what, and how. For example, bot traffic is identified via the user agent string that is tracked with every page view action.
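To make the bot filtering idea concrete, here is a minimal sketch of screening page view events by user agent string. The event field names (`action`, `user_agent`) and the crawler pattern are assumptions made up for illustration; they are not the actual event schema or the rules used in production.

```python
import re

# Hypothetical crawler pattern; a real deployment would likely rely on a
# maintained list of known bot user agents rather than a few keywords.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def is_bot(user_agent):
    """Return True when the user agent string looks like a crawler."""
    return bool(BOT_PATTERN.search(user_agent or ""))


def human_page_views(events):
    """Keep only page view events whose user agent does not look like a bot."""
    return [
        e for e in events
        if e.get("action") == "page_view" and not is_bot(e.get("user_agent"))
    ]


# Made-up events using the assumed field names.
events = [
    {"action": "page_view", "user_agent": "Mozilla/5.0 (Windows NT 10.0) Firefox/115.0"},
    {"action": "page_view", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"action": "post", "user_agent": "Mozilla/5.0 (Macintosh) Safari/605.1.15"},
]

print(len(human_page_views(events)))
```

Because the user agent travels with every logged event, this kind of filter can run before any aggregation, so bot page views never reach the traffic health factor.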


Algorithmic Changes

Besides the modernization of our computing infrastructure, there are also significant algorithmic changes in the way we compute, model, and normalize the health factors, and how we combine them into the final CHI score.


First, we removed the smoothing step and the history dependency that were initially designed to make CHI robust against transient changes. CHI was originally designed to capture the long-term sustained health of the parts of the community that are intended for public participation. It ignored all changes that were not sustained for more than several weeks. Although this is an accurate reflection of the long-term health of the community, we also received feedback from customers who wanted a more sensitive CHI that reflects near real-time changes within the community. Removing the smoothing step and eliminating the history dependency greatly simplified the algorithm, but the tradeoff is that CHI will become more volatile. Although people didn’t like the volatility 5 years ago, they have grown more comfortable with it now. So it’s sensible to make CHI more sensitive and therefore also more responsive to any changes implemented by community managers.


Second, the original CHI ignored all activities within “segregated areas” of the community that are not intended for public participation (e.g. private boards, announcement boards, archive boards, hidden boards, boards that require special permissions to post). The reason was that these segregated areas will often lower the overall CHI score quite significantly. Although most of our communities are externally focused, we are also seeing more people making use of these segregated areas for special purposes (e.g. employee participation). Some of these areas are actually very vibrant. Because they are an integral part of the community, the new CHI includes these segregated areas for now while we study the effect of including them in the calculation. If the impact on the final CHI score isn’t significant, it will make sense to keep including them. Otherwise, we may revert to the previous approach and exclude the segregated areas from the computation of CHI altogether.


An important change to the CHI algorithm is that we have now normalized the raw health factors by converting them to quantile scores. While the computation of the raw health factors didn’t change, we now model the distribution of each health factor at the population level (i.e. across all communities and all time). We then use the fitted cumulative distribution to convert the raw health factors into scores between 0% and 100%. This has 2 important implications:

  • The quantile scores are comparable because they are all normalized to the same scale
  • The quantile scores are meaningful because they represent the percentile of how well a community performs compared to the population of all other communities

Together, this means that the health factor quantile scores offer a simple way to determine what the problem is when your CHI score drops (or why your CHI score rises). This makes CHI much more actionable.
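As a rough illustration of the quantile normalization, here is a minimal sketch that converts a raw health factor into a score between 0 and 1. It uses an empirical cumulative distribution as a stand-in for the fitted distribution described above, and the population values are made up; the actual distribution model is not shown in this post.

```python
from bisect import bisect_right


def quantile_score(raw_value, population):
    """Fraction of the population at or below raw_value (empirical CDF)."""
    ordered = sorted(population)
    return bisect_right(ordered, raw_value) / len(ordered)


# Made-up raw values of one health factor across a population of communities.
population = [12, 40, 55, 70, 90, 130, 200, 450, 900, 5000]

# A community with a raw value of 130 sits at or above 6 of the 10 communities.
print(quantile_score(130, population))  # 0.6
```

Since every health factor ends up on the same 0 to 1 scale, a community's six scores can be compared side by side, and the lowest one points to the factor that is dragging CHI down.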


Finally, we also changed the way we combine the health factors into the final CHI score. Because we have eliminated the smoothing step and history dependency, this final combining step is actually much simpler than in the old CHI. Most of the heavy math is done when we model the population distribution of the raw health factors. While it’s not quite as simple as averaging the 6 quantile scores or computing a weighted average, it’s almost as simple, at least at the conceptual level.


We compute the generalized mean of the 6 normalized quantile scores. Then we apply a linear function to shift and scale the result to between 0 and 1000 to obtain the final CHI score.


You can think of the generalized mean as a non-deterministic, symmetric, weighted average. Now, you all know what a weighted average is, but what do non-deterministic and symmetric mean? Let me explain:

  • Non-deterministic means the weights for the 6 health factor quantile scores are not fixed. It automatically gives more weight to the best quantile scores (among the 6). The more outstanding a particular quantile score is (compared to the other 5), the greater the weight will be.
  • Symmetric means it treats the 6 health factors the same. It will assign the same weight to the best score regardless of which health factor is the best. For some communities, it may be traffic, for others it may be interaction, etc. This is extremely useful, because smaller communities usually can’t beat the larger ones in terms of their diagnostic health factors—traffic, content, and members. But now smaller communities can still be as healthy by having extraordinary predictive health factors—liveliness, interaction, and responsiveness.
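The combining step can be sketched as follows, assuming the standard power mean form of the generalized mean. The exponent `p`, the `scale`, and the `shift` are placeholder values chosen for illustration, since the actual parameters are not given in this post. With any exponent greater than 1, the result is pulled toward the largest scores, which matches the behavior described in the two points above.

```python
def generalized_mean(scores, p):
    """Power mean M_p = ((1/n) * sum(x_i ** p)) ** (1/p), for p != 0."""
    n = len(scores)
    return (sum(x ** p for x in scores) / n) ** (1.0 / p)


def chi_score(quantile_scores, p=3.0, scale=1000.0, shift=0.0):
    """Combine quantile scores in [0, 1] and map the result linearly.

    p, scale, and shift are illustrative assumptions, not the real values.
    """
    return shift + scale * generalized_mean(quantile_scores, p)


# Hypothetical small community: weak traffic, content, and members,
# but strong liveliness, interaction, and responsiveness.
small_but_lively = [0.20, 0.25, 0.20, 0.90, 0.95, 0.90]

plain_average = sum(small_but_lively) / len(small_but_lively)
print(round(1000 * plain_average), round(chi_score(small_but_lively)))
```

The second number comes out higher than the plain average because the three strong scores dominate the sum of powers. Reordering the six scores does not change the result, which is the symmetry property: the formula does not care which health factor is the strong one.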



So what’s new about the new CHI? Despite the fact that the CHI score is still just a number, there are quite a lot of changes, both infrastructural and algorithmic.


Infrastructural Changes:

  1. Entirely based on our event-log framework
  2. Built on highly scalable modern big data technologies
  3. Bot traffic is filtered out


Algorithmic Changes:

  1. Sensitive and responsive to real-time changes, but also more volatile
  2. Includes user activities on the entire community (no exclusion of segregated areas)
  3. Normalized health factor quantile scores that are meaningful, comparable, and therefore more actionable
  4. Compute a non-deterministic, symmetric, weighted average of the 6 quantile scores via the generalized mean. This allows different communities to be successful in different ways, because it automatically gives more weight to the best score regardless of which of the 6 might be the best


These are significant positive developments for us, and many of them are non-trivial infrastructural and algorithmic changes. They definitely make me feel proud to be part of the talented data teams at Lithium. But there is more to come in the near future.


Stay tuned for more details on some forthcoming changes in the new CHI. In the meantime, I welcome any questions, comments, constructive criticisms, kudos, or just candid conversations.



Michael Wu, Ph.D. is Lithium's Chief Scientist. His research includes: deriving insights from big data, understanding the behavioral economics of gamification, engaging + finding true social media influencers, developing predictive + actionable social analytics algorithms, social CRM, and using cyber anthropology + social network analysis to unravel the collective dynamics of communities + social networks.


Michael was voted a 2010 Influential Leader by CRM Magazine for his work on predictive social analytics + its application to Social CRM. He's a blogger on Lithosphere, and you can follow him @mich8elwu or Google+.

About the Author
Dr. Michael Wu was the Chief Scientist at Lithium Technologies from 2008 until 2018, where he applied data-driven methodologies to investigate and understand the social web. Michael developed many predictive social analytics with actionable insights. His R&D work won him the recognition as a 2010 Influential Leader by CRM Magazine. His insights are made accessible through “The Science of Social,” and “The Science of Social 2”—two easy-reading e-books for business audience. Prior to industry, Michael received his Ph.D. from UC Berkeley’s Biophysics program, where he also received his triple major undergraduate degree in Applied Math, Physics, and Molecular & Cell Biology.
Honored Contributor

Been waiting for this and I have to say @MikeW you didn't let us down. Really interesting to see the level of detail about how CHI has transformed "even if I didn't understand it all :)" and to see your passion for data come through in your blog. I'm already tracking the new CHI.


Thanks !

Occasional Contributor



Thanks for the detailed description of the changes. I appreciate the effort that has been put into this.


I do like having "the" number which represents the entire community and do think that it is great from a holistic approach. I also understand the 6 factors that come into play (and the weighting) but... we would really like to drill down and have a score for each of our community forums/areas so that we can see which ones may not be as successful as others, so that we can focus on improving those areas which may be underperforming. Can you not remove the "registrations" portion of the equation and focus on unique individuals who are engaging within each individual area? I did read your Health Measure #3 post, so I understand what you are saying.


We would also like to exclude areas. We would like to look at our Int'l forums  the same way that we do with our English forums, but this # puts everything into 1 basket.  


Maybe we are in a unique position, but we are being asked how some forums "stack" up against others, and while page views, posts, and accepted solutions are all good measurements, they can't provide a true comparison. If we had 1 number that we could give them and put it into perspective for them, that would be much appreciated and helpful.



Mike Pascucci


Lithium Alumni (Retired)

Hello @Fellsteruk,


Thank you for the comment and being supportive of my work.


Glad to hear that I didn't disappoint you. But I'd like to clarify that what I've talked about is actually just the beginning. There are more interesting features forthcoming. Hope to see you when I publish part 2 on the new CHI algorithm.


See you next time.


Lithium Alumni (Retired)

Hello @MPascucci


Thank you for taking the time to comment and voice your usage needs.


I must emphasize that the big investment we made in the infrastructural changes I talked about in this post is to move us towards node-level drill-down capability. That is something that we've been planning in our pipeline. There are many technical challenges involved in node-level drill-down. Basically it's not feasible without a huge infrastructure redesign. The good news is that we did it! In fact, I will talk about this feature in the next post. So stay tuned for that...


Board exclusion is one of those use cases that is fairly unique to every community. Some use it for internal participation, some use it for archiving, etc. Like I said in this post we are still studying the effects of including all areas in the CHI computation. And we still need more data to understand how it impacts the overall CHI score. We might revert to the old logic of excluding those "segregated areas." So thank you for offering your data point.


However, regardless of whether we include or exclude those segregated areas, we probably cannot make this an option where people choose which areas to include or exclude. It will have to be a standard set of rules that the algorithm uses to determine which boards to include or exclude. (FYI: in the original CHI algorithm, we excluded all areas where a newly registered user cannot participate via posting. That was the criterion we used to determine whether an area was intended for public participation or not.) The reason we cannot allow user-determined exclusion is that CHI would no longer be a standard measure that’s comparable across communities. That means we wouldn’t be able to compute the quantile scores for the health factors. A more accurate statement would be that we can still crunch out a number, but it will be meaningless. Then it would be pointless to have CHI, b/c everyone will essentially be measuring their community their own way.


I like to draw the analogy between CHI and someone’s credit score. The reason credit scores are useful is that they are a standard, and they don’t give users the option to tell the credit bureau which of their financial accounts to monitor, or which to exclude. The credit scoring agencies have a fixed set of rules that determine which accounts to include in the credit score calculation, and users do not have a say in that. This is probably for good reason, because the credit score is then meaningful and comparable. However, we all have different sets of accounts and we all use them differently. Perhaps we would like to exclude some accounts that we use for special purposes (some may be very valid and good usage, too). But we don’t have this option. That is why the score is useful, because we can really compare it across individuals. We may not like how they score it (especially if we have a low score), b/c we think that it’s not applicable to us. But that is a self-centered and very myopic perspective.


This is the same reason that we cannot give users the choice to include or exclude particular boards in the CHI calculation. That is not to say that we will not exclude boards. We may still exclude those “segregated areas” of the community from the calculation of CHI if they are not reflective of the overall community health, but which areas to exclude will be determined by the machine.


Once we have implemented the drill-down capability (which I will talk about in the next post), there will be a way for you to compare one internal forum against others. It won’t be CHI anymore b/c CHI is the score that includes all 6 factors. Mathematically they are NOT the same formula. Perhaps we can have a category health index, a board health index, etc., but they are not CHI.


I would love to have you come back again to discuss this after I publish the next post.


Hopefully this clarifies some of the logic and challenges behind creating a standard score for something that has so many unique and different use cases, such as a community. There will be something more exciting next time, and I hope to see you again. So stay tuned...


Occasional Contributor



Thanks for the quick reply. I understand that there are "limitations and complications" with what I am asking for, and glad that you understand our needs to drill down. I really am looking forward to your next post. I do understand that it would not be a "CHI" score, and would be positioned a little differently. Am definitely looking forward to that. 


I compare that to my credit card statement. I have the final tally, but I want to know how much I am spending on groceries vs. gas vs. clothing. If all of a sudden I see that I am driving too much, I can make the decision to ride my bike, or walk to places. That is the way that some of our product teams want to look into their individual communities. Team x is spending more on gas than we are, and their forums are "better" than ours, maybe we should spend more on gas. That is the type of scenario that I am being asked to present. Now what "better" actually means needs to be defined by us, I realize. 


As far as excluding specific areas, I do understand. Maybe having the ability to drill down will help to accomplish what we are looking for, for the time being 😉


Thanks again for the reply Mike I will make sure to come back to comment on your next post!



Lithium Alumni (Retired)

Hello @MPascucci,


Thx for continuing the conversation.


I think that having the drill-down capability would alleviate the need to exclude those segregated areas. For example, you may find that the "worst" area is always the archive board that is not intended for users to participate in. Then you can just ignore that and look for the next one up.


It may, however, impact your final CHI score and the quantile scores compared to other communities. That is something that we will examine once we have more data. Then we can decide whether or not we will systematically exclude certain "segregated areas."


Ok, looking forward to discussing more on the next blog.



It's great to see the CHI score has migrated to Social Intelligence, @MikeW !  Congrats on getting that done!


Please disregard the rest of my original comment below. Since commenting I've discovered other info within the community explaining that the component scores will be coming later.


I wonder if there is still (or will be) a way to see the component scores of the six health factors: traffic, content, members, liveliness, interaction, and responsiveness.


I've been tracking and reporting on these individual data points for several years now and executive management seems to appreciate the spiderweb/radar chart approach which I've been mimicking via an Excel chart.


Is that more granular data on the roadmap for the new CHI scoring or, if it's already available, can I get a UI hint?   🙂


Lithium Alumni (Retired)

Hello @jimr


Thank you for the comment. It's definitely a feat and the result of good teamwork from the various data teams involved in this project. They deserve more of the congrats than I do.


I'm glad that you found the answers you were looking for. That's the beauty of the community, there are answers and solutions hidden all over. If you bother to look and be persistent, you will find it.  😉


But yes, all 6 health factor scores are computed and available. Our teams are building the UI / dashboard / visualization to present those data in a simple and useful way. I will also briefly talk about this in the next post. So stay tuned...


See you next time.


Frequent Advisor

Does the number of discussion topics influence CHI at all? For example, if I have 10 boards but only use 5 of them regularly would my CHI be the same as if I only had 5 boards that we use all the time? This is assuming the same number of posts in either scenario. Thanks @MikeW !

Lithium Alumni (Retired)

 Hello @zachm,


Thank you for taking the time to ask a question.


The answer is yes. It will affect your CHI score. Specifically, it affects your liveliness health factor. By having more boards that are not used regularly, you are diluting your community's contributions and giving spectators a perception of inactivity (i.e. lower liveliness). If you are interested, I recommend you follow the link and read more about the liveliness health factor and what it is intended to measure.


Alright, I hope I answered your question. And if you (or anyone) have any more questions, please keep them coming. All questions are welcome.


See you next time.


Occasional Contributor

I love the idea of CHI, but have always struggled with what it really means. Is our score good? How do we compare versus other communities? How do we compare against other telcos? Without giving away the names of the communities, is this something that Lithium would consider sharing? Or for those with a competitive edge like me, maybe you could publish a CHI Leaderboard?

Lithium Alumni (Retired)

Hello @lushous3


Thank you for the question.


Under the original CHI algorithm, the average score is 500. A score between 400 and 650 is good, above 650 is great, and below 400 is unhealthy. However, due to the recent algorithm change, this distribution may change. I am collecting the data to understand the population behavior of the new algorithm, and I will publish another blog on that subject in part 3 (coming soon).


In terms of how you benchmark against other communities, or other telcos, have a look at part 2 of this post, which was just published yesterday. A leaderboard without names is actually not that useful, b/c other people won't know who's on it, and even people who are on it won't know it either. But we will have a benchmark feature coming soon. So check out part 2. Love to hear your thoughts on that.


See you at the comment section of part 2.