-
Notifications
You must be signed in to change notification settings - Fork 16.4k
[Poc] Introduce Internal API as JSON RPC with JSON serialization #27468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This needs an AIP. And for what it matters please no JSON-RPC imho, but I would be much more in favor of something like grpc. Json is nice for interop but nothing else. |
|
There is AIP for this: |
|
Yep. AIP is even approved - conditionally re- on-the-wire transport mechanism (i.e remote method execution choice and serialization choice) which was deferred to "before founding PR". And while I iniitially advocated for GRPC (which answer both questions), there are some drawbacks of using GRPC (debuggability and threading model used) - this has been pointed by @ashb and @andrewgodwin mostly. There are some benefitst of using more "web standard" approach - JSON-RPC allows us to plug-in better into already existing mechanisms we use and deos not require from us to learn and operate any new technology. The tests performed (initally by me then later by Mateusz) have shown that the performance overhead from using binary RPC vs. JSON are not really meaningful in our context, so they are not decisive factor. I think the last thing that we need to see if whether we can replace Pickling with our own JSON serialization we alredy use in airflow - it seems like bad idea initially (because of the overhead) but again, it's likely negigible and the benefit of using more "standard" for us approach (we serialize internal objects using JSON Serializer a lot in various cases) and being able to debug it "on-the-wire" without extra tools are I think important benefits that should not be under-rated. I think here our own JSON Serializer trumps Pickle, because it already transparently handles a number of structures we will have to serialize including some K8S configuraitons (which require custom serialization when using pickle. The "easiness of development" and maintainability were for me important but as @mhenc mentioned - we did both POCs - GRPC here (I did it): #25094 and this one is using JSON-RPC and for me, it looks like the approach proposed by Mateusz is even easier to develop and maintain - and as long as our serialization approch works transparently for all the object that we need to send/retrieve. So i got convinced that what Ash and Andrew proposed is a good idea. It does not change the principle of the AIP - it's just a matter of implementation detail. From the development perspective, this will be mostly decorating of the methods that we want to turn into "remote" ones. @bolkedebruin - I think if you read through the discussions - maybe you will have other specifc comments about those choices - if they have anything to add on top of the above POCs. Tests and conclusions we have based on those. We deliberately took it slower with Mateusz and gave both of us (and the community) time to comment and raise their concerns, the AIP general approach is already approved, but if there are any specific concerns on those technology choices - we are still open to hear a bit more comments before we make the first foundng "real PR" rather than POC - as we promised when we called for voting on the AIP. Do you have some specific concerns or other criteria that we should consider? Just to summarize those that we took into account:
They are now listed in order of preference - mostly because we found out that speed/performance of serialization and transport is literally orders of magnitude less important than speed/performance of the operations we are going to serialize and there is very little difference of the overhead. |
|
I stand corrected, the AIP was there and I missed the voting. So any argument I make below is mood. The issue I find with JSON-RPC and with JSON in general is its lack of strong typing (yes I know of JSON Schema, but that is like XML and XSL) and that when thing become complex it starts becoming unreadable and you need a seperate IDE/parser for it (ohhh I forgot a ","). It is great for public APIs and interop but for internal ones not so much as you control both ends. This means lack of maintainability ("ohhh you forgot to update the schema") for me which trumps imho the readability over the wire part (as you control both ends and can be enabled for debugging purposes wth grpc/protobuf). Too much stuff happens at runtime rather than at compile time. I have never been happy with any choice for JSON anywhere except public facing APIs. Now I am gonna cry in a corner somewhere. |
The JSON-RPC is actually up for discussion still. Until the founding PR (or even after - I think the way we will implement int the way that we will be able to replace the implementation in the future if we find this is not good. In the implementation proposal of @mhenc this is literally an implementation detail - and vast majority of changes will be to make sure we implement tests amd make sure all the calls are mapped. This is because we are using "standard" HTTP stack rather than Grpc-like solution, because GRPC gets a bit more of a "tie-in" about the processing model for threading etc. And after a lot of reading and talking to some of my friends, and analysing issues - I tend to agree with @ash and @andrewgodwin that (among other issues), the threading model of GRPC is a big, unknown risk (Python Threads and GIL issues are still unsolved issue, and using proven set of backend fori/process-driven/gevent etc. approach is much more reilable, debuggable and scaleable). And JSON RPC is basically piggy-backing on that while providing a simple and standard semantics of mapping input/output parameters of methods. I'd say - let's give it a try and see from the final tests, if JSON-RPC is as good solution as we think it is, and we can revisit it later. Eventually the whole test harness and making sure that there are no DB methods left and that some of the components of ours are truly db-less will let us move forward with it. Once we get it implemented, Replacing the internals of communication between the components will be far less of a problem. |
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.