The Project

Many communities, we have noticed, have communities within the community: local groups of users who might want to meet up in person to share experiences rather than always communicating online. We wanted to create such Groups, but because the RapidMiner Community has grown organically over the last 10 years, we were left with the question: where are all those users?

We looked at the LSI/Labs/Geo report, but it didn't go into the detail we needed. That left us with the last IP address, which we added to the Metrics/User report and downloaded into Excel. While IP addresses are far from perfect for identifying individual users, they can give 80-95% accuracy for location, and across such large numbers an indication is enough to guide our Group strategy.

Geo-Locate

So task 1 was to get a geolocation from each IP address. There are a number of free databases you can download, such as FreeGeoIp, but I'm not very techy, so I got a free data tool to do the work for me (of course, I used our own RapidMiner Studio product, but others are available 🙂). We read the data from our Metrics report into RapidMiner Studio and used an 'Enrich Data by Web Service' operator to call out to the Ipinfodb API. This is a free service, but you have to wait 10 minutes between calls or it gets annoyed. It returned the Country, State/Region and City for each IP address in the file and appended them. Next, I used a Split operator to break that result into individual Country, State and City columns for Excel. The Process looked like this. Pretty simple, even for me. And once created, it can be run over and over again on different chunks of community data.

Map It!

Looking at the output as a heatmap or bar chart was interesting, but ultimately you want to see this type of information on a real map, as in LSI. This last step took about 5 minutes with the help of my CEO, who was, coincidentally, sitting next to me. He suggested a tool called Carto.
I set up an account, read in the data, and then we spent the remaining 4 of our 5 minutes selecting the right map background. There are lots of options to choose from. Here's a map of our community by density. Of course, you might want to do this in a tool like Tableau or Qlik, both of which have great geocoding capabilities and also allow a whole range of other visualizations.

So What Did We Learn?

Actually, this simple process threw up a few surprises. First, the greatest density of our members is in London. I would have put money on Düsseldorf (where we started) or Boston (where our corporate headquarters are, plus Harvard, MIT, etc.). Second, there are significant hot spots of users in Schleswig-Holstein (still under investigation), Singapore and Santiago, as well as in Bangalore, Dublin and Milan. All of this should help with setting up self-run local user groups. But now we want to extend this capability to see where new registrants are coming from and how effective the campaigns run in different geographies are.
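For readers who would rather script the enrich, split and count steps described in this post than build them from RapidMiner operators, here is a minimal Python sketch. It is not the process we actually ran. `lookup_ip` is a hypothetical stand-in for the Ipinfodb call: it reads from a small hard-coded table of documentation-range IP addresses instead of making HTTP requests, and the semicolon-separated "Country;Region;City" response format is an assumption. `delay_seconds` is where you would put whatever wait the real service requires between calls.

```python
import csv
import io
import time
from collections import Counter

# HYPOTHETICAL stand-in for the geolocation web service. A real service
# would be queried over HTTP; this table keeps the sketch offline.
# The "Country;Region;City" response format is an assumption.
FAKE_RESPONSES = {
    "203.0.113.5": "United Kingdom;England;London",
    "198.51.100.7": "Germany;Nordrhein-Westfalen;Düsseldorf",
    "192.0.2.44": "United Kingdom;England;London",
}


def lookup_ip(ip: str) -> str:
    """Return a combined 'Country;Region;City' string for one IP address."""
    return FAKE_RESPONSES.get(ip, "Unknown;Unknown;Unknown")


def enrich(ips, delay_seconds=0.0):
    """Geolocate each IP, pausing between calls to respect a rate limit."""
    rows = []
    for i, ip in enumerate(ips):
        if i > 0 and delay_seconds:
            time.sleep(delay_seconds)  # throttle calls to the service
        # The equivalent of the Split step: one column per location field.
        country, region, city = lookup_ip(ip).split(";", 2)
        rows.append({"ip": ip, "country": country, "region": region, "city": city})
    return rows


def users_per_city(rows):
    """Count users per (country, city) to find the densest locations."""
    return Counter((r["country"], r["city"]) for r in rows)


def to_csv(rows) -> str:
    """Serialize the enriched rows as CSV for Excel or a mapping tool."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["ip", "country", "region", "city"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = enrich(["203.0.113.5", "198.51.100.7", "192.0.2.44"])
    print(users_per_city(rows).most_common(1))  # [(('United Kingdom', 'London'), 2)]
```

Keeping the lookup behind a single function means the throttling, splitting and counting logic stays the same if the stub is later swapped for a real HTTP client.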
Date formats exported from LSI in a CSV are not friendly to Excel (or other reporting tools). The exported format is yyyy-mm-dd hh, whereas Metrics gives dd/mm/yyyy hh:mm, which Excel immediately reads as a date/time. You can fix the format in Excel each time, but that is cumbersome for every report. Is there an easier way?
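One way to avoid the manual step is to convert the column once with a small script before opening the file. The following is a minimal Python sketch, assuming every timestamp looks exactly like `2017-03-14 09` (the yyyy-mm-dd hh format described above). The column name is passed in as a parameter because the actual LSI header name is not given here; `date` in the example is hypothetical.

```python
import csv
import io
from datetime import datetime

LSI_FORMAT = "%Y-%m-%d %H"         # e.g. "2017-03-14 09", as exported from LSI
METRICS_FORMAT = "%d/%m/%Y %H:%M"  # e.g. "14/03/2017 09:00", Metrics-style


def convert_timestamp(value: str) -> str:
    """Rewrite one LSI timestamp in the Metrics-style format."""
    return datetime.strptime(value.strip(), LSI_FORMAT).strftime(METRICS_FORMAT)


def convert_csv(text: str, column: str) -> str:
    """Convert every non-empty value in `column`; other columns are untouched."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column]:
            row[column] = convert_timestamp(row[column])
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = "date,posts\n2017-03-14 09,12\n2017-03-14 10,7\n"
    print(convert_csv(sample, "date"), end="")
    # date,posts
    # 14/03/2017 09:00,12
    # 14/03/2017 10:00,7
```

Whether Excel parses dd/mm/yyyy as day-first depends on the computer's regional settings, so `METRICS_FORMAT` may need adjusting; it is a one-line change.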