Michael Wu joins us again for the second installment describing how the new Community Health Index was developed:
I wrote previously about how I came to start the development of the Community Health Index (CHI), through my background in neuroscience and through Lithium's extensive data set of online communities. Picking up where I left off, I will start by defining what we mean when we talk about community health.
The performance of any enterprise community has two dimensions: how well it serves its members, and how well it serves the business.

Community health addresses the first dimension: it measures how well the community meets the needs of its members. This is very important, because without customer satisfaction there is no business success.
With this understanding of community health, I set two basic criteria to narrow down the data we must plow through; otherwise, the most complete picture of community health would be a compendium of all the data about the community. First, because our objective is to make the Community Health Index universal, we must use basic data that every community has. This eliminated many of the metrics that only Lithium keeps, bringing the number down to about 20 (I actually analyzed more than 20, but only about 20 are universally available). Among these are the usual metrics plus some less common ones, such as the percent of unanswered threads, average thread depth, average number of unique participants in a thread, and average post length. Although these metrics might not be recorded explicitly by every community platform, they can easily be computed by aggregating and summarizing the records of all the messages and users that every community must have.
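To make that last point concrete, here is a minimal sketch (my own illustration, not Lithium's actual pipeline) of how such metrics fall out of the raw message records. The record schema here is hypothetical; any platform's message table with a thread ID, an author, and a body would do.

```python
from collections import defaultdict

# Hypothetical message records: the bare minimum every community
# platform keeps (thread membership, author, body text).
messages = [
    {"thread_id": 1, "author": "alice", "body": "How do I reset my password?"},
    {"thread_id": 1, "author": "bob",   "body": "Use the link on the login page."},
    {"thread_id": 2, "author": "carol", "body": "Is there an API for exports?"},
]

# Group messages by thread.
threads = defaultdict(list)
for m in messages:
    threads[m["thread_id"]].append(m)

n_threads = len(threads)

# Percent of unanswered threads: threads containing only the opening post.
pct_unanswered = 100.0 * sum(1 for t in threads.values() if len(t) == 1) / n_threads

# Average thread depth: mean number of posts per thread.
avg_depth = sum(len(t) for t in threads.values()) / n_threads

# Average number of unique participants per thread.
avg_participants = sum(len({m["author"] for m in t}) for t in threads.values()) / n_threads

# Average post length, in characters.
avg_post_len = sum(len(m["body"]) for m in messages) / len(messages)

print(pct_unanswered, avg_depth, avg_participants, avg_post_len)
```

Nothing here depends on platform-specific instrumentation, which is exactly why these metrics can be universal.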
After establishing the initial data set, the second criterion we applied is known as Occam's razor: come up with the minimum set of data that gives the greatest predictive power. This is a challenging problem in statistics, known as the bias-variance tradeoff. In plain English, it means there is a tradeoff between the complexity of a model and its predictive power. Although complex models that use many variables will always explain the available data better, their predictive power for unseen future data degrades. On the other hand, simpler models with fewer variables may not explain the current data as well, but they are more predictive of future trends. Why is that? It is simply the nature of statistical uncertainty: a model flexible enough to chase every fluctuation in the data will also chase fluctuations that never recur.
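The tradeoff is easy to see in a toy experiment (my own illustration, not the CHI analysis itself): fit polynomials of increasing degree to noisy samples of a simple underlying trend, then compare the fit on the training data with the fit on fresh, unseen data.

```python
import numpy as np

np.random.seed(0)

# Noisy samples from a simple underlying trend (a quadratic).
def sample(n):
    x = np.linspace(-1, 1, n)
    y = 1.0 + 2.0 * x - 3.0 * x**2 + np.random.normal(0, 0.3, n)
    return x, y

x_train, y_train = sample(20)    # the data we fit on
x_test, y_test = sample(200)     # "unseen future data"

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Fit models of increasing complexity and record (train error, test error).
results = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(degree, results[degree])
```

The more complex model always fits the training data at least as well (that is guaranteed, since the simpler polynomials are special cases of it), but its extra flexibility is spent fitting noise, so its advantage does not carry over to the fresh data.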
Next time we'll start the journey through the Lithium community data set, and I'll turn the number-crunching crank to identify the metrics with the greatest predictive power!
For updates and discussion between Michael's posts, leave your comments here or you can follow Michael on Twitter at mich8elwu.
Photo by xmatt