
Guest Post: Interpreting the Statistics for CHI

Lithium Alumni (Retired)

Welcome back once more to Michael Wu, here for the penultimate installment in his series describing how the new Community Health Index was developed:

 

This is my fourth blog post in the series that describes the development of the Community Health Index. Previous blog posts can be found here:

 

  1. From the Brain to Community Analytics
  2. Criteria for Creating the Community Health Index
  3. Crunching Numbers for the Community Health Index

 

Last time, I crunched some numbers and talked about some of the mathematical challenges that I have overcome. Now, it is time to interpret the results.

 

Running the regression analysis is the easy part. Although it is fairly technical to set up the nonlinear regression equation, it is mechanical in the sense that anyone with a background in math and statistics can do it. The remaining part of the analysis involves interpreting the results to derive meaning and insights. This is often the most challenging aspect of any statistical analysis because it is more an art than a science; yet it must have all the rigor, objectivity and accuracy of science. For example, I have to decide which predictor variable to remove among those with similar predictive power. And when a set of variables is found not to be predictive, is it a failure of the model to harness their predictive power, or are these variables truly independent of the response (in this case, health)? Interpretability of the final model becomes important, and looking at numbers alone is no longer sufficient. In statistics, this process is called variable selection.
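To make the idea of variable selection concrete, here is a minimal sketch in Python. It is not the procedure used for the Community Health Index: it swaps in a simple correlation filter, which only captures linear relationships. The data, the variable names (`posts`, `page_views`, `noise`) and both thresholds are hypothetical and chosen purely for illustration. The sketch drops predictors that are only weakly related to the response, and among predictors that largely duplicate each other it keeps only the one most strongly related to the response.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def select_variables(predictors, response, min_abs_r=0.3, max_redundancy=0.9):
    """Keep predictors that are related to the response and not redundant.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    strength = {name: abs(pearson(values, response))
                for name, values in predictors.items()}
    kept = []
    # Visit the strongest predictors first so that, within a redundant
    # group, the most predictive member is the one that survives.
    for name in sorted(strength, key=strength.get, reverse=True):
        if strength[name] < min_abs_r:
            continue  # too weakly related to the response
        if any(abs(pearson(predictors[name], predictors[other])) > max_redundancy
               for other in kept):
            continue  # nearly duplicates a predictor we already kept
        kept.append(name)
    return kept


# Hypothetical data: 'page_views' is almost a rescaled copy of 'posts',
# and 'noise' has little to do with health.
health = [1, 2, 3, 4, 5, 6, 7, 8]
predictors = {
    "posts": [10, 19, 31, 42, 48, 61, 69, 80],
    "page_views": [101, 188, 312, 421, 479, 612, 688, 803],
    "noise": [5, 1, 4, 2, 5, 1, 4, 2],
}

kept = select_variables(predictors, health)
print(kept)
```

On this toy data only one of `posts` and `page_views` survives, because the two carry nearly the same information, and `noise` is dropped. Note that a filter like this cannot answer the harder question raised above: a variable it rejects might still be predictive in a way a linear measure fails to detect.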

 

After eliminating the predictor variables that are not consistently predictive of health, we have only answered the question of which variables are predictive. But we still don't know how these variables predict health. For example, suppose we know that post count is predictive of health; will the health level increase by 10% if the post count is increased by 10%? Or will the health level increase by 30% if we observe a 10% increase in post count? Or perhaps the health level depends more strongly on post count initially, but becomes less dependent as the post count increases. To answer these questions, we must analyze the nonlinear relationship between the variables that we decide to keep. Not to complicate things, but it is often necessary to repeat the process of variable selection and nonlinear analysis for different subsets of variables and different nonlinearities, and to perform them in different orders.
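The three scenarios above can be illustrated with a small sketch. It assumes, purely for illustration, a power-law relationship of the form health = a * posts^b; this is not necessarily the functional form used in the actual analysis, and the function names are hypothetical. Under that assumption, the exponent b alone determines how a percentage change in post count translates into a percentage change in health.

```python
import math


def health_change(post_increase, exponent):
    """Fractional change in health for a fractional increase in post count,
    assuming health = a * posts ** exponent (an illustrative model only)."""
    return (1.0 + post_increase) ** exponent - 1.0


def exponent_for(post_increase, health_increase):
    """Exponent that would produce the given health change under the same model."""
    return math.log(1.0 + health_increase) / math.log(1.0 + post_increase)


# Scenario 1: exponent 1 means a 10% rise in posts gives a 10% rise in health.
proportional = health_change(0.10, 1.0)

# Scenario 2: which exponent turns a 10% rise in posts into a 30% rise in health?
steep_exponent = exponent_for(0.10, 0.30)

# Scenario 3: an exponent below 1 gives diminishing returns, so each
# additional post matters less as the post count grows.
diminishing = health_change(0.10, 0.5)

print(proportional, steep_exponent, diminishing)
```

An exponent above 1 corresponds to the second scenario and an exponent below 1 to the third. Real data need not follow a power law at all, which is part of why the nonlinear analysis has to be repeated with different candidate nonlinearities.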

 

We are almost done! Next week we'll bring this all together into the new Community Health Index! If you have any questions I'd be more than happy to address them in the comments, or feel free to ask me on Twitter at mich8elwu.

 

 

Photo by Thomas Claveirole
