Hey all - Is anyone utilizing the bulk data API, and how do the numbers line up compared to Community Analytics/LSI? We're trying to get this built out internally, but are finding large discrepancies in the numbers. LSI more often than not is providing higher numbers, but not always. Some weeks are much worse than other weeks, but we're seeing a 20% gap as we pull different weeks.
Everything I see in the documentation suggests these should align extremely closely, but we're clearly missing something.
Hi Stan,
There is another thread that may be helpful to you: Where is the best place to find accurate completed registrations numbers?
In particular, the post here states that:
LSI is also the source for Bulk Data API data.
These discrepancies still do not make sense to me either. 🤔
Please recheck the queries used for computing the above metrics. Make sure the time range is correct and that the Bulk API data downloaded successfully without any errors.
Description of fields available in Bulk API
https://developer.khoros.com/khoroscommunitydevdocs/reference/lithium-bulk-data-api
You can find sample queries to compute a few metrics from Bulk API data.
Pageviews and all other metrics (except Visits and Unique Visitors) in Community Analytics should match Bulk API data.
To compute Visits and Unique Visitors, Community Analytics uses HLL (HyperLogLog). Because this is an approximation algorithm, you may see -4% to 4% variation in the Visits and Unique Visitors metrics.
https://community.khoros.com/t5/Support-Forum/How-is-unique-visitors-calculated-in-LSI/m-p/195886
Thanks @SudhakaraS - We've had a ticket open for a while now that they said looks to be a bug, so we're hoping to hear back on it soon. The gaps we are seeing are 20%+ at times, so I don't think they're due to the normal expected variation.
Hi there!
We're trying to establish the same.
I asked Khoros which values we should consider the most accurate, and they finally told us that for visits we should use the ones from LSI Traffic -> Overview (the top bar at the top of the page).
We took April as an example and found it strange that we get 4 different values:
ADMIN-> "METRICS": 151538
LSI Traffic Overview - top bar: 155933
LSI Traffic Overview - graph "Visits": 153760
Bulk API: 165471
Hi
If these are the queries raised by the customer community.anaplan.com, an internal Jira ticket has already been created for them. We would like to discuss the queries metric by metric in that Jira ticket. The data used to derive Community Analytics is the same data exposed via the Bulk API. All metrics (except Visits and Unique Visitors) in Community Analytics should match Bulk API data. Visits and Unique Visitors won't match exactly; you may see -4% to 4% variation. We are happy to take a look at each query and help.
Thanks @SudhakaraS - I think Kris updated that ticket last week with some more information from our team!
Hi
By "-4% to 4% variation in Visits and Unique Visitor metrics", did you mean differences between these two metrics?
We're trying to establish why there are differences between VISITS vs VISITS vs VISITS etc.
ADMIN-> "METRICS": 151538
LSI Traffic Overview - top bar: 155933
LSI Traffic Overview - graph "Visits": 153760
Bulk API: 165471
I hope it's clear 🙂
"-4% to 4% variation in Visits and Unique Visitor metrics" refers to the variance expected (because LSI uses HLL) between the Visits/Unique_Visitor values shown in the LSI (CA) Traffic Overview top bar (Summary) and the Visits/Unique_Visitor values computed from Bulk API data.
Visits/Unique_Visitor shown in LSI(CA) Traffic Overview - top bar(Summary) and Visits/Unique_Visitor computed using Bulk API data are comparable.
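To make the comparison concrete, here is a minimal Python sketch that checks the April visit counts quoted earlier in this thread against the -4% to 4% band. The dictionary keys and the helper names `pct_diff` and `within_tolerance` are made up for illustration, and using the top bar value as the reference is an assumption based on the advice above.

```python
# April visit counts as quoted earlier in this thread.
april_visits = {
    "admin_metrics": 151538,
    "lsi_top_bar": 155933,
    "lsi_graph": 153760,
    "bulk_api": 165471,
}

HLL_TOLERANCE_PCT = 4.0  # the -4% to 4% band described above


def pct_diff(value: int, reference: int) -> float:
    """Signed difference of value from reference, as a percentage of reference."""
    return (value - reference) / reference * 100.0


def within_tolerance(value: int, reference: int,
                     tolerance_pct: float = HLL_TOLERANCE_PCT) -> bool:
    """True when value is no more than tolerance_pct away from reference."""
    return abs(pct_diff(value, reference)) <= tolerance_pct


# Assumption: the LSI top bar is the reference the other sources are compared to.
reference = april_visits["lsi_top_bar"]
for name, value in april_visits.items():
    gap = pct_diff(value, reference)
    ok = within_tolerance(value, reference)
    print(f"{name:14s} {value:7d} {gap:+6.2f}%  within band: {ok}")
```

With these figures the Bulk API count comes out about 6.1% above the top bar value, which is outside the stated band, while the other two values fall inside it.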
Visits metrics query for Bulk API data:
SELECT COUNT (DISTINCT visit.id) FROM <<bulkdata>> WHERE action.key="view" AND event.time.ms >= start_time_ms AND event.time.ms <= end_time_ms
Unique Visitor metrics query for Bulk API data:
SELECT COUNT (DISTINCT visitor.id) FROM <<bulkdata>> WHERE action.key="view" AND event.time.ms >= start_time_ms AND event.time.ms <= end_time_ms
As the queries above show, the Visits/Unique_Visitor metrics are derived from page view records. If page views match, then the HLL approximation is the only reason for a difference between the metric shown in the LSI (CA) Traffic Overview top bar (Summary) and the Visits/Unique_Visitor values computed from Bulk API data.
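For anyone computing these outside a SQL engine, the two queries above can be sketched in Python roughly as follows. This assumes each Bulk API record has already been parsed into a flat dict keyed by the dotted field names used in the queries (`action.key`, `event.time.ms`, `visit.id`, `visitor.id`); the real export layout may differ, and the function name and sample records are hypothetical. The sketch also returns the page view count, so you can confirm page views match before comparing visits.

```python
def count_visits_and_visitors(records, start_time_ms, end_time_ms):
    """Exact distinct counts over page view records in an inclusive time window."""
    visit_ids = set()
    visitor_ids = set()
    page_views = 0
    for rec in records:
        # Same filter as the queries: page views only, inside the window.
        if rec.get("action.key") != "view":
            continue
        event_time = rec.get("event.time.ms")
        if event_time is None or not (start_time_ms <= event_time <= end_time_ms):
            continue
        page_views += 1
        # Like COUNT(DISTINCT ...), ignore records where the id is missing.
        if rec.get("visit.id") is not None:
            visit_ids.add(rec["visit.id"])
        if rec.get("visitor.id") is not None:
            visitor_ids.add(rec["visitor.id"])
    return {
        "page_views": page_views,
        "visits": len(visit_ids),
        "unique_visitors": len(visitor_ids),
    }


# Made-up sample records, for illustration only.
sample = [
    {"action.key": "view", "event.time.ms": 1000, "visit.id": "v1", "visitor.id": "u1"},
    {"action.key": "view", "event.time.ms": 2000, "visit.id": "v1", "visitor.id": "u1"},
    {"action.key": "view", "event.time.ms": 3000, "visit.id": "v2", "visitor.id": "u1"},
    {"action.key": "view", "event.time.ms": 4000, "visit.id": "v3", "visitor.id": "u2"},
    {"action.key": "post", "event.time.ms": 2500, "visit.id": "v4", "visitor.id": "u3"},
    {"action.key": "view", "event.time.ms": 9000, "visit.id": "v5", "visitor.id": "u4"},
]
print(count_visits_and_visitors(sample, start_time_ms=0, end_time_ms=5000))
```

Because this uses exact sets rather than HLL, any remaining gap against the LSI top bar beyond the expected band points to a difference in the time window or in the underlying records, not to the counting method.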
Once again, we would like to repeat that the data used to derive CA (LSI) metrics is the same as the data exposed via the Bulk API. CA (LSI) stores this same data in Elasticsearch and derives metrics with ES queries.
LSI Traffic Overview - top bar: 155933
vs
LSI Traffic Overview - graph "Visits": 153760
This difference is due to HLL and to visits that overlap month, day, or hour boundaries.
Overlapping visits explained here
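One way to see the overlap effect is the small Python sketch below. It assumes that "overlapping" means a single visit with page views on both sides of a bucket boundary, and that buckets are UTC calendar days; both are assumptions for illustration, and the timestamps, visit ids, and helper names are made up.

```python
from collections import defaultdict
from datetime import datetime, timezone


def to_ms(dt):
    """Convert an aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def distinct_visits_per_day(views):
    """views: iterable of (visit_id, event_time_ms). Returns {UTC date: distinct visits}."""
    per_day = defaultdict(set)
    for visit_id, time_ms in views:
        day = datetime.fromtimestamp(time_ms / 1000, tz=timezone.utc).date()
        per_day[day].add(visit_id)
    return {day: len(ids) for day, ids in per_day.items()}


# Visit v1 straddles midnight, so it appears in two daily buckets.
views = [
    ("v1", to_ms(datetime(2023, 4, 1, 23, 59, tzinfo=timezone.utc))),
    ("v1", to_ms(datetime(2023, 4, 2, 0, 1, tzinfo=timezone.utc))),
    ("v2", to_ms(datetime(2023, 4, 2, 12, 0, tzinfo=timezone.utc))),
]

per_day = distinct_visits_per_day(views)
summed_daily = sum(per_day.values())
distinct_overall = len({visit_id for visit_id, _ in views})
print("sum of daily distinct visits:", summed_daily)
print("distinct visits over the whole period:", distinct_overall)
```

The point is only that a total built from per-bucket counts need not equal a single distinct count over the whole period, so a graph and a summary figure can legitimately differ.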