Hello gdlowy,
Thank you for posting your comment here. I'm glad to hear that my work clarified some misperceptions in the industry.
Yes, statistics can definitely lie. I'm sure you've heard the phrase: "there are lies, damned lies, and statistics." That is why it is very important for data scientists and statisticians to hold the highest integrity in their work. All the hype around big data is really a double-edged sword. It makes businesses more data conscious, but very often that is not enough. They still need proper training in basic stats to be data savvy enough to spot a fraudulent analysis.
That is why I like to write about this subject, to make sure that people have the right information when they are making a decision.
Data quality can definitely help. But at a minimum, we need to properly validate any model we build. Simple cross validation is often enough, although it is possible to overfit to the validation data set if we repeat it too many times while tuning the model. If you've been following my writing on influence scoring, you probably remember the following posts:
- Learning the Science of Prediction: How do You Know Your Influence Score is Correct – part 1
- Validating the Influence Model: How do You Know Your Influence Score is Correct – part 2
These two posts describe a perfect example of how vendors in the influence industry don't properly validate the models they use to infer people's influence from social media activity data. As a result, people never know if their influence score actually means anything. Moreover, people game the influence scoring system, leading to IEO.
I can't stress enough the importance of model validation and data integrity. And cross validation is really not that hard. I do it all the time. As scientists and analysts, we must hold ourselves to a higher standard. And even with such a high bar, the possibilities of predictive analytics are limitless.
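To show how little code simple cross validation takes, here is a minimal sketch in plain Python. It is only an illustration, not the method behind any particular influence score: the straight-line model, the synthetic data, and the helper names (k_fold_indices, fit_line, mse, cross_validate) are all assumptions made up for this example.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_validate(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold, repeat for each fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model,
                          [xs[i] for i in held_out],
                          [ys[i] for i in held_out]))
    return sum(scores) / k, scores

# Synthetic data (an assumption for the demo): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

mean_mse, fold_mse = cross_validate(xs, ys, k=5)
print("per-fold MSE:", [round(s, 3) for s in fold_mse])
print("mean held-out MSE:", round(mean_mse, 3))
```

The number to look at is the held-out error, since each fold is scored by a model that never saw it. If you compare many models against the same folds, it is safer to also keep a separate test set untouched until the very end, so that you don't overfit to the validation data.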
For the rest of the readers who are more interested in the applications of predictive analytics: care to come discuss some of its interesting possibilities? Many brave start-ups are already pursuing some of them. Come and share your predictive analytics story.
Thanks again for your comment, and thanks for the linked in.
BTW, I like the name "anticipatory analytics."
Hope to see you on Lithosphere next time.