prasath
9 months ago · Helper
API v2 Pagination Error
Hi,
I'm new to the Khoros community and I'm attempting to retrieve users data from February 2023 until the present using pagination with cursor which has around 1.6 million records. The pagination method is generally effective, but at some point, I encountered two different failure with the error messages while running the script at different times.
Note: I'm getting this error after 10/13 iteration.
Error: 403 Client Error: Forbidden for url: https://[Domain]/api/2.0/search?q=SELECT%20id,%20sso_id,%20email,%20banned,%20deleted,%20last_visit_time%20FROM%20users%20WHERE%20last_visit_time%20%3E%202023-02-01T00:00:00%20order%20by%20id%20ASC%20limit%2010000%20CURSOR%20'<cursor_value>'
Error: 504 Server Error: Gateway Time-out for url: https://[Domain]/api/2.0/search?q=SELECT%20id,%20sso_id,%20email,%20banned,%20deleted,%20last_visit_time%20FROM%20users%20WHERE%20last_visit_time%20%3E%202023-02-01T00:00:00%20order%20by%20id%20ASC%20limit%2010000%20CURSOR%20'<cursor_value>'
Additionally, I'm curious whether there is a limit on the number of API calls allowed per day. I'd appreciate any help resolving this. Below is the script I'm using.
import time

import pandas as pd
import requests

# headers: auth headers (defined elsewhere)

first_url = "https://[COMMUNITY DOMAIN]/api/2.0/search?q=SELECT id, sso_id, email, banned, deleted, last_visit_time FROM users WHERE last_visit_time > 2023-02-01T00:00:00 order by id ASC limit 10000"

def fetch_data(url, headers):
    user_response = requests.get(url, headers=headers)
    user_response.raise_for_status()
    response = user_response.json()
    print('Response size: ' + str(response['data']['size']))
    print('Response status: ' + str(response['status']))
    return response['data']

final_df = pd.DataFrame()

def process_data(items):
    # Append the current page of results to the global DataFrame
    global final_df
    df = pd.DataFrame(items)
    final_df = pd.concat([final_df, df], ignore_index=True)

def paginate(url, headers):
    while True:
        data = fetch_data(url, headers)
        process_data(data['items'])
        # If the response carries a next cursor, build the next page's URL
        if 'next_cursor' in data:
            url = first_url + f" CURSOR '{data['next_cursor']}'"
            print('### Next URL ###', url)
            # time.sleep(10)
        else:
            # No more pages available
            break

paginate(first_url, headers)
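While waiting for an answer on the daily call limit, one thing worth trying for the intermittent 403/504 failures is wrapping the request in a retry with exponential backoff, so a transient gateway timeout or throttling response doesn't kill the whole run. This is only a sketch, not Khoros-documented behavior: the set of retryable status codes, the retry count, and the delay values below are all assumptions.

```python
import time
import requests

# Assumption: these statuses are transient/throttling-related and worth retrying.
RETRYABLE_STATUSES = {403, 429, 502, 503, 504}

def backoff_delay(attempt, base=5.0, cap=120.0):
    """Exponential backoff: base * 2**attempt seconds, capped (values are guesses)."""
    return min(base * (2 ** attempt), cap)

def fetch_with_retries(url, headers, max_retries=5):
    """Like requests.get(...).json(), but retries on (assumed) transient statuses."""
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers, timeout=60)
        if response.status_code in RETRYABLE_STATUSES:
            delay = backoff_delay(attempt)
            print(f"Got {response.status_code}, retrying in {delay:.0f}s...")
            time.sleep(delay)
            continue
        # Any other error status is treated as permanent and raised
        response.raise_for_status()
        return response.json()
    raise RuntimeError(f"Gave up after {max_retries} attempts: {url}")
```

In the script above, `fetch_data` could call `fetch_with_retries(url, headers)` instead of `requests.get` + `raise_for_status`. If the 403s turn out to be rate limiting rather than an auth problem, a longer fixed sleep between pages (the commented-out `time.sleep(10)`) may also help.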