Batching #430
Something like transport-level batching? Yes, that could be useful.
You would probably need to create a new transport. Any help is greatly appreciated. Please check the CONTRIBUTING.md file. The PR needs to have 100% test code coverage to be accepted.
I have one from a crypto exchange. We just need to generate a token, but it's pretty straightforward; I've managed to obtain one using a temporary email.
I'd create another transport that derives from the HTTP / requests one, in my case a BatchingTransport. Actually, I have a working prototype of this on another public repo with 100% coverage, using the GitHub CI tooling; I've used GitHub secrets to store the API token. I've used threads and futures to implement this, but we could do better than that. As a prototype though it's OK. You have to use the ExecutionResult instead of a dict to keep an open communication channel within the transport processor. Check out this: the client should take care to not wait for the data from the ExecutionResult, so part of the validation should be split out of it, as a helper function (in my case called ...).
I'm sorry, I've realized I haven't explained myself enough. My implementation uses the same client you have; it receives a single document each time, but the clever thing is that if you execute it many times in a row over a short period of time, then a little delay in the transport code allows packing the documents together before sending the entire batch to the server. So every time you receive an execution, the transport is waiting in another thread for extra documents in a queue. After this short wait, the batcher starts processing, and it repeats the cycle. Every result is sent to the client using a special ExecutionResult that has a future injected through the constructor in order to achieve lazy loading, so it fetches and waits for the result of the future if and only if you get a value from it. The transport resolves each one of the responses in the list received from the server using the future: .set_result or .set_exception. I hope I've explained myself enough; if not let me know, I'm glad to contribute.
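To make that flow concrete, here is a rough, self-contained sketch of the mechanism described above. It is not the actual prototype: `BatchingTransportSketch`, `send_batch` and `batch_interval` are made-up names used only for illustration, and the real gql transports have a different interface.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable, List, Tuple


class BatchingTransportSketch:
    """Collects executions for a short time and sends them as one batch."""

    def __init__(self, send_batch: Callable[[List[Any]], List[Any]], batch_interval: float = 0.01):
        # send_batch is assumed to take a list of payloads and return a list of
        # results in the same order (like a batching-capable GraphQL endpoint).
        self._send_batch = send_batch
        self._batch_interval = batch_interval
        self._queue: "queue.Queue[Tuple[Any, Future]]" = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def execute(self, payload: Any) -> Future:
        # Returns immediately; the caller only blocks when calling future.result().
        future: Future = Future()
        self._queue.put((payload, future))
        return future

    def _worker(self) -> None:
        while True:
            # Block until at least one execution arrives...
            items = [self._queue.get()]
            # ...then wait a little longer to gather more into the same batch.
            while True:
                try:
                    items.append(self._queue.get(timeout=self._batch_interval))
                except queue.Empty:
                    break
            payloads = [payload for payload, _ in items]
            try:
                results = self._send_batch(payloads)
                for (_, future), result in zip(items, results):
                    future.set_result(result)
            except Exception as exc:
                for _, future in items:
                    future.set_exception(exc)
```

A caller would do `future = transport.execute(payload)` and only block later on `future.result()`, which is the "promise"-like behaviour described above.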
I see. So you want to mix two different things:

- batching several requests together so they are sent to the server in a single call
- returning futures from a sync transport, so that each result is only fetched lazily
It's too bad you want to do that with a sync transport, as that would have been much more straightforward with async...

Now, if I understand correctly, in your batch example it does not actually help to wait using a timer, as you have to first provide all the queries and then you run the batch. So that code with the timer only starts to be useful if the queries are provided from different threads; in that case you don't know when multiple different queries from different threads might be needed in the same time frame, and waiting might in some cases be useful. For that use case, we could call that automatic batching.

The reason I'm not really sure about sending a Future from the transport is that if you check the client.py file of gql, you'll see we can already return a Dict or an ExecutionResult from the execute methods, and for that we already have ugly code to handle both cases. In any case we need to separate the concerns, adding only an execute_batch method first.

We could split this in multiple PRs:

- a first PR adding a sync execute_batch which receives a collection of requests and returns a collection of results
- a later PR for the automatic batching with futures
Note: I don't understand how your lazy loading could work. In your code the query will be sent after the timer, except if the user canceled the future before. But if the user is simply receiving the future and not bothering to get the data, then the future is not canceled and the query will be sent to the server anyway. Even if ...
Right! I agree with your concerns. Better if we split all the different variations of transports.

In this case what I call "lazy-load" is actually not the official meaning per se. What it actually means is that the result is not yet fetched at the exact moment you ask for it, but it will be in the future, so it's more like a "promise". In my case the requests never get cancelled; what the delay does is wait for a couple of milliseconds so that a bunch of documents are enqueued by the client and gathered by the transport, so they can be sent at once to the server. It's a useful trick that I believe I've seen in other implementations of GraphQL clients. But I agree with you 100% that this kind of trick is not optimal. My reason for doing it this way is that it's more transparent to the user: they simply use the client as usual and only have to set a configuration option to enable the batching.

I think it's a good idea to start with the sync batch first, receiving a collection of documents and returning another collection with the results. I'll open a PR for it. Thanks!
Alright. To be clear, this should update the existing RequestsHTTPTransport and not create another transport.
Yes, the design would be to add an execute_batch method to the existing transport.

One thing I didn't understand: what did you mean by "that would have been much more straightforward with async"?
Is it applicable in this case? I think you mean that instead of using threads and futures, a simple async/promise approach would have been more straightforward, right?
👍
Right
Agree! Actually I wrote this code about six years ago and I didn't know about async 😄

I'm struggling a little bit with the signature of the execute_batch method. I see two possible solutions: 1) receive a separate iterable for each parameter (documents, variable values, operation names), or 2) receive a single collection of some data structure that groups them per request.
The biggest problem with solution 1 is that it seems very dangerous to me, since the user may end up mixing the order of the params and sending a query matched with the wrong variables or operation name. Solution 2 seems better, but it implies the user has to build a data structure for every request. If I have to choose, I'd go with the second one. Do you have any idea about it? Thanks
Also, I believe some queries don't need to receive variables or an operation name. So if we decide to receive an iterable for every parameter, we'll end up with something like:

```python
transport.execute_batch(
    documents=[doc1, doc2, doc3],
    variable_values=[None, variables2, None],
    operation_names=["operation1", None, "operation3"],
)
```

With a data structure instead:

```python
transport.execute_batch(
    queries=[
        GraphQLQuery(document=doc1, operation_name="operation1"),
        GraphQLQuery(document=doc2, variable_values=variables2),
        GraphQLQuery(document=doc3, operation_name="operation3"),
    ],
)
```
Yes, using a dataclass seems to be a much better option indeed.
But I would use a name like GQLRequest.
The spec says a request consists of a query document, with optional variables and an optional operation name. Therefore:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

from graphql import DocumentNode


@dataclass(frozen=True)
class GQLRequest:
    document: DocumentNode
    variable_values: Optional[Dict[str, Any]] = None
    operation_name: Optional[str] = None
```
I've opened a PR. It's a draft for now; I need to adjust the docstrings. Please let me know if you have any thoughts on anything. Thanks
@itolosa I made the PR #436 to implement the auto batching with sync transports, inspired by your code. Could you please take a look?

You can test it with a max batch of 3 on the countries backend with the following command:
I looked at your code and it seems OK. I can see that your implementation requires the user to send the executions using threads. While this is correct at the implementation level, and also has the advantage of not breaking the interface of the client, I believe there's an alternative way to improve this a little bit. I mean, if we're already using a thread and a queue in the client (so it's already a thread-safe implementation), why not remove the waiting for the future result inside the client:

```python
request = GraphQLRequest(
    document,
    variable_values=variable_values,
    operation_name=operation_name,
)
future_result = self._execute_future(request)
result = future_result.result()  # <<< remove this, client.py:848
```

and find a way to allow the user to call execute in this way:

```python
session.execute(query1)
session.execute(query2)
...
```

internally making a single batch request with those queries:

```python
# >>> to the server
[
    query1,
    query2,
]
```

I know that the real issue is that ... What do you think about it?
Let's talk about use cases.

**Use case 1: automatic batching**

What is the main use for automatically batching GraphQL requests together? In JavaScript, for a modern web page, you could have a multitude of components making GraphQL requests independently at the load of a page. That could be a bit inefficient, so waiting a bit for all those queries to be made and sending them in a single request could be better. This is how apollo-link-batch-http (the main library for batching requests, I think) operates, allowing the developers of the components not to care about it: no changes to the components, no changes to the methods used to send the requests. Note that in this case the requests arrive concurrently, without the callers coordinating with each other.
That approach is how it is currently implemented in the PR #436 introducing automatic batching, with the introduction of the batching options. If we don't receive requests concurrently, meaning from a single thread, then it does not make any sense to wait if we want to combine multiple requests in a batch!

**Use case 2: manual batching**

For that use case, we are in a single thread and we want to send multiple GraphQL requests. We could also use the automatic batching feature, getting futures from multiple requests as you proposed, and waiting for the results afterwards.

**Current state**

In fact, what you want to do is already possible by calling _execute_future directly:

```python
with client as session:
    request_eu = GraphQLRequest(query, variable_values={"continent_code": "EU"})
    future_result_eu = session._execute_future(request_eu)

    request_af = GraphQLRequest(query, variable_values={"continent_code": "AF"})
    future_result_af = session._execute_future(request_af)

    result_eu = future_result_eu.result().data
    result_af = future_result_af.result().data

    assert result_eu["continent"]["name"] == "Europe"
    assert result_af["continent"]["name"] == "Africa"
```

and the equivalent code with execute_batch:

```python
with client as session:
    request_eu = GraphQLRequest(query, variable_values={"continent_code": "EU"})
    request_af = GraphQLRequest(query, variable_values={"continent_code": "AF"})

    result_eu, result_af = session.execute_batch([request_eu, request_af])

    assert result_eu["continent"]["name"] == "Europe"
    assert result_af["continent"]["name"] == "Africa"
```

**Conclusion**

Either we have control of the flow of execution and we should use execute_batch, or we don't and the automatic batching with futures is the way to go. I guess we could make a public method for getting the future at some point, if that turns out to be useful.
Thank you for your thorough explanation. The way I proposed to use this feature has another value for me, mostly about personal style and software design, but that's a topic for another time; I'm not interested in turning this issue into a flamewar. Also, you're right that I can already use execute the way I want, as in your example. I'll review your code in a few hours and I'll let you know if I find something to improve. Thanks!
Hi there,
I've been working with this library for a while, and I'm using it to query an endpoint that is capable of receiving multiple queries at once. The API returns responses in the same order as requested.
I've found this feature is implemented by other libraries under the name of Batching, and I've successfully implemented it in Python too. My question is: have you considered this already? Are you accepting PRs for this?
I'd be glad to contribute
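For illustration, this is roughly what batching looks like on the wire, assuming an apollo-style batch endpoint: the request body is a JSON array of operations and the response is a JSON array of results in the same order. The endpoint URL and queries below are only placeholders (the countries backend mentioned elsewhere in this thread), not part of any library API.

```python
import requests  # the same HTTP library the sync transport is built on

# Two independent operations sent as a single HTTP request (placeholder queries).
batch_payload = [
    {"query": 'query { continent(code: "EU") { name } }'},
    {"query": 'query { continent(code: "AF") { name } }'},
]

response = requests.post(
    "https://countries.trevorblades.com/graphql",  # placeholder batching-capable endpoint
    json=batch_payload,
    timeout=10,
)

# The server is assumed to answer with a list of results, one per operation,
# in the same order as the request.
results = response.json()
assert len(results) == len(batch_payload)
```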