Honored Contributor

Bulk API vs Community Analytics/LSI?

Hey all - Is anyone using the Bulk Data API, and how do the numbers line up with Community Analytics/LSI? We're trying to get this built out internally, but are finding large discrepancies in the numbers. More often than not LSI reports higher numbers, but not always. Some weeks are much worse than others, and we're seeing gaps of around 20% as we pull different weeks.

From everything I see in the documentation, I would expect these to align extremely closely, but we're clearly missing something.

[Attached screenshots: differents1.PNG, differents2.PNG]

Honored Contributor

Hi Stan,

There is another thread that may be helpful to you: Where is the best place to find accurate completed registrations numbers? 

In particular, the post here states that:

LSI is also the source for Bulk Data API data.

These discrepancies still do not make sense to me either. 🤔

----------------------------------

Lili McDonald
Senior Community Manager @ Udemy
Connect on LinkedIn
Khoros Staff

 

Please recheck the queries used to compute the above metrics. Make sure the time range is correct and that the Bulk API data was downloaded successfully without any errors.

Description of fields available in Bulk API

https://developer.khoros.com/khoroscommunitydevdocs/reference/lithium-bulk-data-api

 

You can find sample queries for computing a few metrics from Bulk API data here:

https://community.khoros.com/t5/Community-Analytics/Mapping-Community-Bulk-Data-API-fields-to-Commun...

 

Pageviews and all other metrics (except Visits and Unique Visitors) in Community Analytics should match Bulk API data.

To compute Visits and Unique Visitors, Community Analytics uses HLL (HyperLogLog). Because this is an approximation algorithm, you may see a variation of -4% to +4% in the Visits and Unique Visitors metrics.

https://community.khoros.com/t5/Khoros-Communities/Regarding-calculation-of-VISITS-and-UNIQUE-VISITO...

https://community.khoros.com/t5/Support-Forum/How-is-unique-visitors-calculated-in-LSI/m-p/195886

 

sudhakara.st

Thanks @SudhakaraS - We've had a ticket open for a while now that they said looks to be a bug, so hoping we'll hear back soon on it. The gaps we are seeing are 20%+ at times, so I don't think it's due to the normal variations expected.

Hi there!

We're trying to establish the same.

I asked Khoros which values we should consider the most accurate, and they eventually told us that for visits we should use the ones from LSI Traffic -> Overview (top of the page / top bar).

We took April as an example and found it strange that we have 4 different values:

ADMIN->  "METRICS": 151538
LSI Traffic Overview - top bar: 155933
LSI Traffic Overview - graph "Visits":  153760
Bulk API: 165 471
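
As a quick sanity check, the April figures above can be compared against the ±4% band that Khoros staff quoted for HLL. This is only a sketch: treating the LSI top bar as the reference value is an assumption based on the recommendation mentioned above, and the helper names are made up for illustration.

```python
# April visit counts quoted in this thread.
LSI_TOP_BAR = 155933  # LSI Traffic Overview, top bar (assumed reference value)
OTHER_SOURCES = {
    "ADMIN -> METRICS": 151538,
    "LSI Traffic Overview graph": 153760,
    "Bulk API": 165471,
}

HLL_TOLERANCE = 0.04  # the +/-4% variation quoted by Khoros staff


def relative_gap(reference: int, other: int) -> float:
    """Signed difference of `other` from `reference`, as a fraction of `reference`."""
    return (other - reference) / reference


def within_tolerance(reference: int, other: int, tolerance: float = HLL_TOLERANCE) -> bool:
    """True when the absolute relative gap does not exceed `tolerance`."""
    return abs(relative_gap(reference, other)) <= tolerance


for name, value in OTHER_SOURCES.items():
    gap = relative_gap(LSI_TOP_BAR, value)
    status = "within" if within_tolerance(LSI_TOP_BAR, value) else "outside"
    print(f"{name}: {gap:+.2%} ({status} the 4% band)")
```

With these numbers the Bulk API value sits roughly 6% above the top bar, which is outside the band, while the other two values fall inside it.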

Hi
If these are the queries raised by the customer community.anaplan.com, there is already an internal Jira ticket for them. We would like to discuss the queries metric by metric in that Jira ticket. The data used to derive Community Analytics is the same data exposed via Bulk API. All metrics (except Visits and Unique Visitors) in Community Analytics should match Bulk API data. Visits and Unique Visitors won't match exactly; you may see -4% to +4% variation. We are happy to take a look at each query and help.

 

sudhakara.st

Thanks @SudhakaraS - I think Kris updated that ticket last week with some more information from our team!

Hi

By "-4% to 4% variation in Visits and Unique Visitor metrics," did you mean the difference between these two metrics?

 

We're trying to establish why the same VISITS metric shows a different value in each of these places:

ADMIN->  "METRICS": 151538
LSI Traffic Overview - top bar: 155933
LSI Traffic Overview - graph "Visits":  153760
Bulk API: 165 471

I hope it's clear 🙂

Khoros Staff

The "-4% to 4% variation in Visits and Unique Visitor metrics" refers to the variance expected (because LSI uses HLL) between the Visits/Unique_Visitor values shown in the LSI (CA) Traffic Overview top bar (Summary) and the Visits/Unique_Visitor values computed from Bulk API data.

The Visits/Unique_Visitor values shown in the LSI (CA) Traffic Overview top bar (Summary) and those computed from Bulk API data are comparable.

Visits metrics query for Bulk API data:

SELECT COUNT (DISTINCT visit.id) FROM <<bulkdata>> WHERE action.key="view" AND event.time.ms >= start_time_ms AND event.time.ms <= end_time_ms

Unique Visitor metrics query for Bulk API data:

SELECT COUNT (DISTINCT visitor.id) FROM <<bulkdata>> WHERE action.key="view" AND event.time.ms >= start_time_ms AND event.time.ms <= end_time_ms

If you look at the queries above, the Visits/Unique_Visitor metrics are derived from page view records. If page views match, then the HLL approximation is the only reason for a difference between the metric shown in the LSI (CA) Traffic Overview top bar (Summary) and the Visits/Unique_Visitor value computed from Bulk API data.

Once again, the data used to derive CA (LSI) metrics is the same as the data exposed via Bulk API. CA (LSI) stores this same data in Elasticsearch and derives its metrics with ES queries.
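
For anyone working with a downloaded export rather than a SQL engine, the two queries above can be mirrored in a few lines of Python. This is a minimal sketch, not Khoros code: it assumes each exported record has already been parsed into a dict keyed by the same field names used in the queries (`action.key`, `event.time.ms`, `visit.id`, `visitor.id`), and the `count_distinct` helper and sample records are hypothetical.

```python
from typing import Any, Iterable, Mapping


def count_distinct(
    records: Iterable[Mapping[str, Any]],
    id_field: str,
    start_time_ms: int,
    end_time_ms: int,
) -> int:
    """Exact distinct count of `id_field` over page view records in the time range."""
    seen = set()
    for record in records:
        # Only page view records feed Visits / Unique Visitors.
        if record.get("action.key") != "view":
            continue
        event_time = int(record["event.time.ms"])
        # Inclusive bounds, matching the >= and <= in the queries.
        if not (start_time_ms <= event_time <= end_time_ms):
            continue
        value = record.get(id_field)
        if value is not None:  # COUNT(DISTINCT ...) ignores missing values
            seen.add(value)
    return len(seen)


# Made-up records, for illustration only.
SAMPLE = [
    {"action.key": "view", "event.time.ms": 1000, "visit.id": "v1", "visitor.id": "u1"},
    {"action.key": "view", "event.time.ms": 2000, "visit.id": "v1", "visitor.id": "u1"},
    {"action.key": "view", "event.time.ms": 3000, "visit.id": "v2", "visitor.id": "u1"},
    {"action.key": "not-a-view", "event.time.ms": 3500, "visit.id": "v3", "visitor.id": "u2"},
    {"action.key": "view", "event.time.ms": 9000, "visit.id": "v4", "visitor.id": "u3"},
]

visits = count_distinct(SAMPLE, "visit.id", 0, 5000)
unique_visitors = count_distinct(SAMPLE, "visitor.id", 0, 5000)
print(visits, unique_visitors)
```

Because this count is exact while LSI uses HLL, a small difference between the two is expected even when the underlying page view records are identical.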


LSI Traffic Overview - top bar: 155933

vs

LSI Traffic Overview - graph "Visits":  153760 

This difference is due to HLL and to visits that overlap month/day/hour boundaries.

Overlapping visits explained here
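
To illustrate only the overlap effect, here is a small sketch with made-up events. It assumes that a bucketed view counts a visit once in every bucket it touches and that buckets are UTC days; neither detail is confirmed in this thread, and the year in the sample timestamps is arbitrary.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_ms(year: int, month: int, day: int, hour: int, minute: int) -> int:
    """Epoch milliseconds for a UTC timestamp."""
    moment = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


def summed_daily_and_overall(events):
    """Return (sum of per-day distinct visit counts, distinct visits overall)."""
    per_day = defaultdict(set)
    overall = set()
    for visit_id, time_ms in events:
        day = datetime.fromtimestamp(time_ms / 1000, tz=timezone.utc).date()
        per_day[day].add(visit_id)
        overall.add(visit_id)
    summed = sum(len(ids) for ids in per_day.values())
    return summed, len(overall)


# Hypothetical page views: visit v1 crosses midnight, v2 does not.
EVENTS = [
    ("v1", to_ms(2023, 4, 1, 23, 50)),
    ("v1", to_ms(2023, 4, 2, 0, 5)),
    ("v2", to_ms(2023, 4, 2, 10, 0)),
]

summed, overall = summed_daily_and_overall(EVENTS)
print(summed, overall)
```

Here the visit that crosses midnight is counted on both days, so the per-day counts add up to 3 while the period has only 2 distinct visits. HLL error comes on top of this and can move a figure in either direction.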

sudhakara.st
