Hello Hugh,
Thank you for asking this very interesting question.
The kind of “persona analysis” you mentioned can certainly be done, but today it is largely a manual process. I am not aware of any off-the-shelf system that does it for you; it is not a trivial problem. Most businesses will probably find it easier to get the demographic data from other sources than to infer it directly from the conversation.
I am not sure what your background is, so the following answer might be a bit technical. If so, please let me know and I will try to water it down a bit, but since you asked such a great question, I will give my full-blown response first.
If you were to build such an analytical system, you would need two components.
1. The system must have a linguistic component that does Natural Language Processing (NLP). This component parses the grammar of the conversation and performs some basic part-of-speech (POS) tagging, entity extraction, etc. This lets you reliably identify things like the subject and object in the conversation, so the system can understand what the conversation is about.
2. Then you will need to implement a latent variable inference component. There is a class of models in statistics called Latent Variable Models (LVMs) that infer a latent (unobserved) variable from data collected on a series of observed variables.
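To make the NLP step concrete, here is a minimal sketch. It uses a tiny hand-built lexicon purely for illustration; a real system would use a trained tagger from a library such as spaCy or NLTK, and the word list and sentence here are made up:

```python
# Toy sketch of the NLP component: tokenize a message and tag each
# token with a part of speech from a tiny hand-built lexicon.
# A real system would use a trained tagger (e.g. spaCy or NLTK).
TOY_LEXICON = {
    "my": "PRON", "wife": "NOUN", "i": "PRON", "commute": "VERB",
    "to": "ADP", "the": "DET", "and": "CONJ",
    "downtown": "NOUN", "every": "DET", "day": "NOUN",
}

def pos_tag(sentence):
    """Return (token, tag) pairs; unknown words get the tag 'UNK'."""
    tokens = sentence.lower().replace(",", " ").split()
    return [(t, TOY_LEXICON.get(t, "UNK")) for t in tokens]

def extract_entities(tagged):
    """Naive 'entity extraction': keep the nouns as topic candidates."""
    return [tok for tok, tag in tagged if tag == "NOUN"]

tagged = pos_tag("My wife and I commute downtown every day")
print(tagged)
print(extract_entities(tagged))  # prints ['wife', 'downtown', 'day']
```

The nouns extracted this way ("wife", "downtown") are exactly the kind of observed signals the LVM component consumes downstream.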
Then you will need to set up the inference for your target persona one variable at a time. Some of these variables will be binary (e.g., using your example, married/not married, or employed/unemployed). Some will be categorical (e.g. lifestyle in your example); for these, you will need training examples of each possible lifestyle to train the inference engine. Some will be numerical (e.g. salary in your example); for these, you will need to either bin the values into ranges (30K-50K, 50K-75K, 75K-100K, etc.) or, if you want very fine granularity, use a continuous variable, but then you will have to perform a regression analysis in addition to the LVM.
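The binning step for a numerical variable like salary is just a range lookup. Here is a small sketch; the bin edges are the illustrative ranges from above, not a recommendation:

```python
# Bin a numeric salary into the illustrative ranges from the text.
# The bin edges are assumptions for the example only.
def salary_bin(salary):
    if salary < 30_000:
        return "under 30K"
    elif salary < 50_000:
        return "30K-50K"
    elif salary < 75_000:
        return "50K-75K"
    elif salary < 100_000:
        return "75K-100K"
    return "100K+"

print(salary_bin(62_000))  # prints 50K-75K
```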
Then you set up your LVM using the NLP result as your output (the observed variables) and the persona variables as the unobserved latent variables. Train your system with as many examples as you can, then run the LVM inference. The result will be an inference of the unobserved variables based on the NLP output, with confidence bounds. You may get results like the following.
For categorical variables, you will get a probability for each category.
60% certainty: Urban lifestyle
25% certainty: Suburban lifestyle
15% certainty: Country lifestyle
For binary variables, you will just get one probability, because the probability of the other value is determined by it.
92% certainty: Married
78% certainty: Employed
For continuous variables, if you bin them into categories, you will get results similar to the categorical variables. If you use regression analysis, you will need to find the most probable salary.
30% certainty: Salary: 30K-50K
30% certainty: Salary: 50K-75K
40% certainty: Salary: 75K-100K
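To make the inference step concrete, here is a minimal sketch of one of the simplest latent-variable inferences: a naive-Bayes-style posterior over a categorical lifestyle variable given words from the NLP step. The word likelihoods below are made-up numbers for illustration; in a real system they would come from the training examples:

```python
import math

# Made-up P(word | lifestyle) likelihoods, standing in for values
# that would be learned from training data. Unseen words get a
# small smoothing probability.
LIKELIHOODS = {
    "urban":    {"subway": 0.08, "apartment": 0.06, "downtown": 0.05},
    "suburban": {"lawn": 0.07, "minivan": 0.05, "commute": 0.06},
    "country":  {"farm": 0.09, "acres": 0.06, "tractor": 0.05},
}
PRIOR = {"urban": 1/3, "suburban": 1/3, "country": 1/3}
SMOOTH = 0.01

def infer_lifestyle(words):
    """Posterior P(lifestyle | words) via Bayes' rule with a naive
    independence assumption over words."""
    log_post = {}
    for cat, prior in PRIOR.items():
        logp = math.log(prior)
        for w in words:
            logp += math.log(LIKELIHOODS[cat].get(w, SMOOTH))
        log_post[cat] = logp
    # Normalize back into probabilities that sum to 1.
    z = sum(math.exp(v) for v in log_post.values())
    return {cat: math.exp(v) / z for cat, v in log_post.items()}

post = infer_lifestyle(["subway", "downtown", "commute"])
for cat, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{p:.0%} certainty: {cat.capitalize()} lifestyle")
# prints:
# 85% certainty: Urban lifestyle
# 13% certainty: Suburban lifestyle
# 2% certainty: Country lifestyle
```

Real LVMs can be much richer than this, but the shape of the output, a probability per category, is exactly what is shown above.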
You will get this for every person. Then you will need to filter out the ones with low probability. You will have to do this subjectively, based on how much risk you want to take. If you only want to target people you are very certain match your persona profile, you might draw the line at 85% or above for all variables. If you don't care too much about salary, you can tolerate a lower confidence on that variable. After you make the cut and filter out everyone who did not meet the criteria, the remaining people will be your target persona.
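The filtering step is then just a per-variable threshold check. The thresholds and the two example people below are made-up illustrations:

```python
# Filter inferred personas by per-variable confidence thresholds.
# Thresholds and people are illustrative assumptions, not real data.
THRESHOLDS = {"married": 0.85, "employed": 0.85, "lifestyle": 0.60}

people = [
    {"name": "A", "married": 0.92, "employed": 0.78, "lifestyle": 0.65},
    {"name": "B", "married": 0.95, "employed": 0.90, "lifestyle": 0.70},
]

def matches_persona(person):
    """True only if every variable clears its confidence threshold."""
    return all(person[var] >= t for var, t in THRESHOLDS.items())

targets = [p["name"] for p in people if matches_persona(p)]
print(targets)  # prints ['B']: A fails on employed (0.78 < 0.85)
```

Lowering the "employed" threshold (say, if employment matters less to you than salary did in the example above) would let person A through as well.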
This is not an SNA analysis; nonetheless, it is just statistics, albeit more advanced statistics. I can do the analysis, but it is not an automated process, and I don't think such a system exists out there yet. You can call it persona inference, or demographic inference if you like, but that is probably just marketing. The real underlying statistical machinery is NLP plus LVM inference.