Client should not un-paginate large results. Should return a `generator` that does this for you. #433

tnixon · 2024-04-04T00:12:58Z

Is there an existing issue for this?

I have searched the existing issues

Is your feature request related to a problem? Please describe.

When fetching historical data, even simple queries (fetching all trades / quotes for a single symbol on a single day) can have very large result sets, which the API paginates. The data client attempts to un-paginate these and load them all into a single return structure. This is very slow (probably the main cause of #204). It also means that the consumer of the client has no choice but to let this single-threaded process run until it completes, or until it fails with an out-of-memory (OOM) error or similar.

Describe the solution you'd like.

The client should return a data structure that gives easy access to the paginated results, without actually loading them. The consumer can then decide how to access these results - possibly by looping through them in a single-threaded manner, but potentially also by parallelizing this data-loading to make it more efficient. It would also give the consumer the option of serializing each page of results and so avoid the OOM issue of building very large data structures in memory.

A Python generator seems a natural way to provide this functionality. The client can return an object that contains a generator which will (when accessed) fetch the appropriate pages of data from the API. This might look something like:

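from alpaca.data.historical import StockHistoricalDataClient
from alpaca.data.requests import StockTradesRequest

client = StockHistoricalDataClient(...)

trades_request = StockTradesRequest(symbol_or_symbols='NVDA')
# get_stock_trades_resultset is the proposed method; it does not exist yet
trades_resultset = client.get_stock_trades_resultset(trades_request)

for page_data in trades_resultset:
    # do something with the data (summarize it, serialize it, etc.)
    ...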

Note: here I'm assuming that `trades_resultset` is a generator that yields one page of data per iteration.
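To make the idea concrete, here is a minimal sketch of how such a generator could follow page tokens lazily. Everything in it is an assumption for illustration: `iter_pages`, `fake_fetch_page`, and the `data` / `next_page_token` response keys are hypothetical stand-ins, not the client's actual internals.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = List[Dict[str, Any]]


def iter_pages(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
) -> Iterator[Page]:
    """Yield one page of records at a time, requesting each page only when needed."""
    token: Optional[str] = None
    while True:
        response = fetch_page(token)  # exactly one request per iteration
        yield response["data"]
        token = response.get("next_page_token")
        if token is None:  # the last page carries no token
            break


# Hypothetical stand-in for the HTTP call: three pages of fake trades.
_FAKE_PAGES: Dict[Optional[str], Dict[str, Any]] = {
    None: {"data": [{"p": 1.0}, {"p": 2.0}], "next_page_token": "a"},
    "a": {"data": [{"p": 3.0}, {"p": 4.0}], "next_page_token": "b"},
    "b": {"data": [{"p": 5.0}], "next_page_token": None},
}
calls: List[Optional[str]] = []  # records which tokens were requested


def fake_fetch_page(token: Optional[str]) -> Dict[str, Any]:
    calls.append(token)
    return _FAKE_PAGES[token]


# The consumer sees one page at a time and never holds the full result set.
page_sizes = [len(page) for page in iter_pages(fake_fetch_page)]
```

Because the request happens inside the generator body, breaking out of the loop early means the remaining pages are never fetched.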

Describe an alternate solution.

Another way to address this is to provide a client method for fetching an individual page of results, something like:

client = StockHistoricalDataClient(...)

trades_request = StockTradesRequest(symbol_or_symbols='NVDA')
trades_resultset = client.get_stock_trades_resultset(trades_request)

for page in trades_resultset.pages:
    # get_stock_trades_data is a proposed per-page method; it does not exist yet
    page_data = client.get_stock_trades_data(trades_request, page)
    # do something with the data (summarize it, serialize it, etc.)

Note: in this example I'm assuming that `trades_resultset` is an object that exposes an iterator over page identifiers (for example, page tokens).
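With per-page access, the consumer could spread the page loads across threads. The following is a rough sketch under stated assumptions: the page keys, `fetch_page`, and `summarize` are hypothetical, and the in-memory store stands in for the API. It only applies if page identifiers can be listed up front; if each page's token is only known after the previous page has been fetched, the pages have to be requested in sequence.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical page keys and payloads standing in for the API.
_FAKE_STORE: Dict[str, List[float]] = {
    "page-0": [1.0, 2.0],
    "page-1": [3.0],
    "page-2": [4.0, 5.0, 6.0],
}


def fetch_page(page_key: str) -> List[float]:
    """Stand-in for a per-page request (hypothetical)."""
    return _FAKE_STORE[page_key]


def summarize(page_key: str) -> float:
    """Fetch one page and reduce it to a small summary (here, the sum of prices)."""
    return sum(fetch_page(page_key))


page_keys = list(_FAKE_STORE)

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in the same order as page_keys,
    # even if the underlying requests finish out of order.
    page_sums = list(pool.map(summarize, page_keys))
```

Each worker keeps only its own page in memory, and only the small per-page summaries are collected at the end.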

Anything else? (Additional Context)

Give the user the option of how to handle fetching large data. Don't force them to wait on a single-threaded and potentially failure-bound process.
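As an illustration of the serialization option, a consumer could write each page to disk as it arrives instead of accumulating everything in memory. This is only a sketch: `write_pages_jsonl` and `fake_pages` are hypothetical names, and the generator of fake records stands in for whatever page iterator the client would return.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable, Iterator, List

Page = List[Dict[str, Any]]


def write_pages_jsonl(pages: Iterable[Page], path: str) -> int:
    """Write each page's records to a JSON Lines file; return the record count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for page in pages:  # only one page is held in memory at a time
            for record in page:
                f.write(json.dumps(record) + "\n")
                count += 1
    return count


def fake_pages() -> Iterator[Page]:
    """Hypothetical stand-in for a paginated result set."""
    yield [{"symbol": "NVDA", "price": 1.0}, {"symbol": "NVDA", "price": 2.0}]
    yield [{"symbol": "NVDA", "price": 3.0}]


out_path = os.path.join(tempfile.mkdtemp(), "trades.jsonl")
written = write_pages_jsonl(fake_pages(), out_path)
```

Peak memory use is then bounded by the size of one page rather than by the size of the whole result set.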


tnixon · 2024-04-04T18:30:51Z

PS - I am willing to prepare a PR on this (as soon as I can carve out some time).
