@mhenc
Contributor

@mhenc mhenc commented Nov 24, 2022

Introduce the Internal API as part of the webserver (a standalone version will come in a follow-up) with a single JSON-RPC endpoint, /internal/v1/rpcapi, accepting three params: "jsonrpc", "method_name", and "params".
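For illustration, a request body for this endpoint could be assembled as in the minimal sketch below. The three top-level keys come from the description above; the "2.0" version string and the method name in the usage line are hypothetical, and plain `json` stands in for Airflow's BaseSerialization.

```python
from __future__ import annotations

import json
from typing import Any


def build_rpc_request(method_name: str, params: dict[str, Any]) -> str:
    """Serialize one call to the internal API as a JSON-RPC style body.

    Sketch only: json.dumps stands in for Airflow's BaseSerialization,
    so params must already be JSON-serializable.
    """
    body = {
        "jsonrpc": "2.0",  # assumed protocol version string
        "method_name": method_name,
        "params": params,
    }
    return json.dumps(body)


# Hypothetical method name and parameters, for illustration only.
print(build_rpc_request("example_module.example_function", {"dag_id": "example_dag"}))
```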

Add a decorator for making calls to the Internal API. When a method is decorated, it is executed in one of two modes, depending on configuration:

  • [core]database_access_isolation=False - run the method locally, as before
  • [core]database_access_isolation=True - make an RPC call to the Internal API. The method must first be registered on the Internal API server side.

All communication is serialized as JSON using BaseSerialization from Airflow.

Also decorate the first function, DagFileProcessor.update_import_errors, with the @internal_api_call decorator.

@boring-cyborg boring-cyborg bot added area:Scheduler including HA (high availability) scheduler area:webserver Webserver related Issues labels Nov 24, 2022
@mhenc
Contributor Author

mhenc commented Nov 24, 2022

Sending for initial review (I will add tests soon).
@potiuk @vincbeck @kosteev Can you take a look?

@mhenc mhenc force-pushed the internal_api_server branch from 7b8918f to a9e2f10 Compare December 6, 2022 12:51
@mhenc mhenc requested review from potiuk and removed request for vincbeck December 6, 2022 12:52
Member

@potiuk potiuk left a comment

I really love how small and succinct it is.

The way we have it is, I think, the absolute minimum amount of code we need to achieve a robust and easy-to-maintain internal API.

I think this is a great "founding PR".

A few nits, but other than that, LGTM.

Member

The primary use of operationId is to let code generators generate method names. In our case, since we dynamically generate the "actual" method names and parameters and pass them via the JSON-RPC construct, it does not really matter (there will always be a single method to call), but I agree with @vincbeck that "json_rpc" is a poor name here.

How about internal_airflow_api?

Member

This is how the OpenAPI specification describes its conformance with JSON-RPC. We should keep it in. Tools we might employ later could use this information (Swagger UI, for example, to generate the documentation and describe the API).

This is not crucial in our case. The Swagger UI and other OpenAPI artifacts are really a "side effect" here: we have no one to consume those artifacts, as our API is purely internal and most of it is not really "fixed" and described. The actual methods and parameters are not validated or processed by the API specification, because they are dynamically generated on both the client and the server side. So this is not strictly needed now, but it might be useful in the future if we build any additional tooling that relies on inspecting the specification and using the JSON-RPC "metadata" under the hood.

BTW, it is perfectly OK that we do not have those methods and parameters described. This is not the "usual" API that you expose and document. Both the client and the server in this communication are guaranteed to share a single source of truth (inspecting the method names and their parameters and serializing them with our own serializer).

So we are good here. The OpenAPI specification we have looks cool:

  • single endpoint
  • JSON-RPC conformance
  • no boilerplate growing with every method added
  • single source of truth for the actual "schema" of passed data
  • data that is easy for a human to inspect without extra tools (method names and parameters are sent as JSON-serialized data)
  • guaranteed client-server compatibility
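The "single source of truth" point can be illustrated with a small sketch: if both the client-side decorator and the server-side registry derive a method's name from the same function object, the two sides cannot drift apart. The helpers qualified_name and build_methods_map are hypothetical names; the PR's own registry is the METHODS_MAP mentioned later in the thread.

```python
from __future__ import annotations

from typing import Any, Callable, Dict, Iterable


def qualified_name(func: Callable[..., Any]) -> str:
    """Derive an RPC method name from the function object itself."""
    return f"{func.__module__}.{func.__qualname__}"


def build_methods_map(funcs: Iterable[Callable[..., Any]]) -> Dict[str, Callable[..., Any]]:
    """Build a name -> callable registry, rejecting duplicate names."""
    methods: Dict[str, Callable[..., Any]] = {}
    for func in funcs:
        name = qualified_name(func)
        if name in methods:
            raise ValueError(f"Duplicate RPC method name: {name}")
        methods[name] = func
    return methods


def ping() -> str:
    return "pong"


# The client would send qualified_name(ping); the server looks up the same key.
METHODS = build_methods_map([ping])
print(METHODS[qualified_name(ping)]())
```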

I really like it :)

@mhenc mhenc force-pushed the internal_api_server branch from a9e2f10 to 15ee13e Compare December 7, 2022 15:22
@mhenc mhenc requested a review from potiuk December 7, 2022 15:30

log.debug("Calling method %s.", method_name)
try:
    output = handler(**params)
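A fuller, self-contained sketch of this server-side dispatch step, assuming a name-to-callable mapping like METHODS_MAP and using plain `json` in place of BaseSerialization. The logging call passes the bare value with a `%s` placeholder, since `%.` is an incomplete format and `{method_name}` would pass a set. The handle_rpc name and the shape of the error response are assumptions.

```python
from __future__ import annotations

import json
import logging
from typing import Any, Callable, Dict

log = logging.getLogger(__name__)


def handle_rpc(body: str, methods_map: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Dispatch one JSON-RPC style request to a registered handler (sketch)."""
    request = json.loads(body)
    method_name = request["method_name"]
    params = request.get("params") or {}
    handler = methods_map.get(method_name)
    if handler is None:
        return {"error": f"Unrecognized method: {method_name}"}
    # Lazy %-style formatting: the message is only built if DEBUG is enabled.
    log.debug("Calling method %s.", method_name)
    try:
        output = handler(**params)
    except Exception as exc:
        log.exception("Error executing method %s.", method_name)
        return {"error": str(exc)}
    return {"result": output}
```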
Contributor

I just realized something. Don't we have an issue here? All functions listed in METHODS_MAP are decorated with @internal_api_call, so when we execute these functions here, we'll trigger the decorator again and, as a result, send an RPC request. Am I wrong?

Contributor Author

Right. I have an issue for that: #28267.
The idea is that some components (like the Scheduler, Webserver, and Internal API) will always run these methods locally.
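The idea can be sketched with a hypothetical process-wide switch that components owning database access (such as the Internal API server itself) turn on at startup, so the decorator never re-sends a call the process is already serving. DATABASE_ACCESS_ISOLATION, force_local_execution, and _send_rpc are illustrative names, and this is not necessarily how #28267 is resolved.

```python
from __future__ import annotations

import functools
from typing import Any, Callable, Dict

# Hypothetical stand-in for [core]database_access_isolation=True.
DATABASE_ACCESS_ISOLATION = True
# Hypothetical switch set once by components that always run methods locally.
_FORCE_LOCAL = False


def force_local_execution() -> None:
    """Called at startup by, e.g., the Internal API server process."""
    global _FORCE_LOCAL
    _FORCE_LOCAL = True


def _send_rpc(method_name: str, params: Dict[str, Any]) -> Any:
    # Placeholder for the real HTTP call; raising makes accidental RPCs visible.
    raise RuntimeError(f"RPC would be sent for {method_name}")


def internal_api_call(func: Callable[..., Any]) -> Callable[..., Any]:
    method_name = f"{func.__module__}.{func.__qualname__}"

    @functools.wraps(func)
    def wrapper(**kwargs: Any) -> Any:
        # Run locally when isolation is off OR this process owns DB access,
        # so a server executing a registered handler does not call itself.
        if not DATABASE_ACCESS_ISOLATION or _FORCE_LOCAL:
            return func(**kwargs)
        return _send_rpc(method_name, kwargs)

    return wrapper
```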

@pierrejeambrun pierrejeambrun added this to the Airflow 2.6.0 milestone Jan 9, 2023
@mhenc mhenc deleted the internal_api_server branch February 13, 2023 08:54
@pierrejeambrun pierrejeambrun added AIP-44 Airflow Internal API changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) labels Feb 27, 2023