asyncio Policies documentation needs clarification #96377

Closed
fancidev opened this issue Aug 29, 2022 · 12 comments
Labels
docs Documentation in the Doc dir topic-asyncio


@fancidev
Contributor

On this page:
https://docs.python.org/3/library/asyncio-policy.html
The second sentence says “Each event loop has a default policy, …”
What does this clause mean exactly? My understanding is that asyncio defines a default policy depending on the platform. But this sentence seems confusing. Would be great if it can be clarified.

@fancidev fancidev changed the title Documentation needs clarification asyncio Policies documentation needs clarification Aug 29, 2022
@AlexWaygood AlexWaygood added docs Documentation in the Doc dir topic-asyncio labels Aug 29, 2022
@gvanrossum
Member

I have stumbled over the same sentence. Something's definitely off there. This whole section is a bit off. It's introducing at least two concepts, policy and context, without giving a clear definition of either. I don't actually think there is a per-eventloop "default policy". If someone can track down the original author of these words using git blame I'd like to ask the original author how they meant it. (Not the people like Elvis or Carol who more recently tried to improve this text -- they didn't change the meaning, just the grammar.)

@kumaraditya303
Contributor

kumaraditya303 commented Sep 25, 2022

Agreed, it could be confusing. A policy is what users configure to use different event loop implementations. A context is more like an instance of a policy that manages loop objects, which can be per thread or anything else.

@gvanrossum
Member

Anyone interested in submitting a PR to clarify the docs here?

@kumaraditya303
Contributor

cc @CAM-Gerlach @slateny

@CAM-Gerlach
Member

As an asyncio beginner (some might use the term n00b :), the main part that confuses me reading the introductory paragraphs is the relationships between the process, event loop, policy and context (e.g. thread).

  • The first sentence implies there is one global policy per interpreter process (1:1), which in turn controls "the" (1:1) event loop,
    i.e. Process -sets 1-> Policy -controls 1-> Event Loop

    ("An event loop policy is a global per-process object that controls the management of the event loop.").

  • The second sentence implies that one of each of multiple event loops (N:1) has its own policy (1:1), which can be customized
    i.e. (Process -has N->) Event Loop(s) -has 1-> Policy

    ("Each event loop has a default policy")

  • The third sentence states that context is defined by the policy, and implies there is a 1:N relationship between policies and event loops,
    i.e. Policy -manages N-> Context(s) (== Thread by default) -has 1-> Event Loop

    ("A policy defines the notion of context and manages a separate event loop per context.")

The sentences 1 and 3 are inconsistent in that 1 implies there is "the" (i.e. exactly one) event loop per process and per policy, while 3 implies that a policy manages multiple event loops, one per context (that it also defines). Meanwhile, 2 is inconsistent with 1 and 3 by implying that each event loop belongs to exactly one policy, while 1 and 3 imply that each policy belongs to either one (1), or one or more (3) event loops. Based on the API reference and the original text, it appears 2 at least is not correct, and either 1, 3 or some combination might be closest to correct, e.g. Process -sets 1-> Policy -manages N-> Context(s) (== Thread by default) -has 1-> Event Loop, but I'm not 100% sure. Could an asyncio expert advise?

There is another issue: sentence 3 implies that each policy defines what a "context" is (e.g. a thread, or something else), and this seems to be an important concept, but this doc neither references nor elaborates on the topic, e.g. what context each policy subclass defines, how that can be accessed or changed on the Policy object, and how the user can interact with it (how do they change contexts, or see the current contexts the Policy object is managing? By creating a new thread with threading?). I'd appreciate some clarity or a reference here as well.

> If someone can track down the original author of these words using git blame I'd like to ask the original author how they meant it. (Not the people like Elvis or Carol who more recently tried to improve this text -- they didn't change the meaning, just the grammar.)

This would appear to be @vstinner (original docs) and @1st1 (more recent major revisions), with the specific sentence in question being introduced by later revisions of the latter.

Full details

@1st1 wrote most of the current iteration, while @vstinner wrote the original, though much has changed since, particularly the latter.

The original wording of what is now the first ≈two paragraphs was:

Event loop management is abstracted with a policy pattern, to provide maximal flexibility for custom platforms and frameworks. Throughout the execution of a process, a single global policy object manages the event loops available to the process based on the calling context. A policy is an object implementing the :class:AbstractEventLoopPolicy interface.

and the earliest version following the move/reorganization to the current file was:

Event loop management is controlled by an event loop policy, which is a global per-process object. There is a default policy and an API to change the policy. A policy defines the notion of context; a policy manages a separate event loop per context. The default policy's notion of context is defined as the current thread.

@gvanrossum
Member

Yeah, this is a web of lies. There is a single per-interpreter Policy object, which is used to create new event loops and to get and set the "current" event loop, which is the loop for the current "context". The notion of context is undefined but in practice always refers to a thread.
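As an illustration of the model described above (not part of the original comment), here is a minimal sketch showing one policy object shared by the whole interpreter while each thread has its own current loop. The `worker` function and `results` dict are hypothetical names. The sketch assumes a Python version where `asyncio.get_event_loop_policy()` is still available; newer releases deprecate the policy API, so `DeprecationWarning` is silenced.

```python
import asyncio
import threading
import warnings

results = {}

def worker(name):
    # Each thread creates and registers its own loop explicitly.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)  # sets the current loop for this thread only
    try:
        policy = asyncio.get_event_loop_policy()
        # Record the policy, the loop, and whether the policy reports
        # this thread's loop as the current one.
        results[name] = (policy, loop, policy.get_event_loop() is loop)
    finally:
        asyncio.set_event_loop(None)
        loop.close()

with warnings.catch_warnings():
    # The policy API is deprecated in newer Python versions.
    warnings.simplefilter("ignore", DeprecationWarning)
    threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

policy_a, loop_a, a_matches = results["a"]
policy_b, loop_b, b_matches = results["b"]
same_policy = policy_a is policy_b      # one policy for both threads
distinct_loops = loop_a is not loop_b   # one current loop per thread
```

Both threads see the same policy object, yet each gets back only the loop it registered itself, which matches the "context is in practice a thread" reading.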

The idea that in the main thread an event loop would be created for you the first time but in other threads it wouldn't is also implemented here. This was meant to make it possible to have a loop per thread but to require apps to be intentional about this, while having a default loop in the main thread. That idea didn't work out so well, and now we're going through the awkward process of recommending everyone use get_running_loop() instead of get_event_loop().
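The recommended get_running_loop() behaviour can be sketched as follows (an illustration, not part of the original comment). The helper `running_loop_or_none` is a hypothetical name. The point is that `asyncio.get_running_loop()` never creates a loop: it raises `RuntimeError` when no loop is running and returns the running loop otherwise.

```python
import asyncio

def running_loop_or_none():
    """Return the running loop, or None when called outside of one."""
    try:
        return asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread; nothing is created implicitly.
        return None

async def main():
    # Inside a coroutine driven by asyncio.run(), a loop is always running.
    return asyncio.get_running_loop()

outside = running_loop_or_none()   # called with no loop running
inside = asyncio.run(main())       # the loop that asyncio.run() created and closed
```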

The other idea here was that there could be other kinds of contexts and event loops, e.g. you could have an event loop subclass supporting UI events, and maybe the UI world had its own notion of when that loop was usable (maybe only in the "main" thread). That never materialized (we still have no Tkinter support other than an alleged hack that people occasionally reinvent as a toy demo). Since then, numerous other implementation details have assumed the "one loop per thread" model.

(Another, somewhat related, idea that didn't work out so well was the concept of passing an explicit loop=... parameter to all asyncio APIs. That has been outright deprecated.)

In the end the Policy concept hasn't been very valuable, but it will be tricky to deprecate it. Perhaps we should conduct a thorough search for get_event_loop_policy and set_event_loop_policy to see what people do with it, if anything. But until we're deprecating it, we should document it properly.

(Also note that there is some mention of "process-wide policy" which is nonsense -- at best it is "interpreter-wide". Maybe that can be abbreviated as "global", since most people probably don't know about multiple interpreters when they get to these docs.)

@gvanrossum
Member

Policies are also tied to the child watcher concept. That ought to be process-wide, since C-level signal handlers are set per-process. But we cannot share the necessary data structures across interpreters, so in practice this is also just per-interpreter (with the additional constraint that it can only work in the main interpreter).

@CAM-Gerlach
Member

Thanks for the detailed and accessible explanation, @gvanrossum! Trying to distill the relevant bits into a replacement for those two paragraphs, here's a start:

An event loop policy is a global (per-interpreter) object used to get and set the currently-running event loop, as well as create new event loops. The default policy can be replaced with built-in alternatives to use different event loop implementations, or substituted by a custom policy class that overrides some or all of these behaviors.

The policy object gets and sets a separate running event loop per context. This is per-thread by default, though custom policies could define context differently.

Is this going in the right direction? Any suggestions for fixes/improvements?

> Perhaps we should conduct a thorough search for get_event_loop_policy and set_event_loop_policy to see what people do with it, if anything.

I don't have anywhere near the expertise to properly interpret the results, but here's a grep.app search for the latter (which seemed more heavily used than the former, particularly outside vendored copies of the stdlib/tests and other contexts that weren't actual runtime uses of the API in question).
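To illustrate the "substituted by a custom policy class" part of the proposed wording, here is a minimal sketch (not from the thread). `CountingPolicy` and its `created` attribute are hypothetical names chosen for the example. It assumes a Python version where `asyncio.DefaultEventLoopPolicy` and `asyncio.set_event_loop_policy()` are still available; newer releases deprecate the policy API, so `DeprecationWarning` is silenced.

```python
import asyncio
import warnings

with warnings.catch_warnings():
    # The policy API is deprecated in newer Python versions.
    warnings.simplefilter("ignore", DeprecationWarning)

    class CountingPolicy(asyncio.DefaultEventLoopPolicy):
        """Hypothetical policy that counts the loops it creates."""

        def __init__(self):
            super().__init__()
            self.created = 0

        def new_event_loop(self):
            # Override only loop creation; everything else is inherited.
            self.created += 1
            return super().new_event_loop()

    original = asyncio.get_event_loop_policy()
    policy = CountingPolicy()
    asyncio.set_event_loop_policy(policy)
    try:
        # asyncio.new_event_loop() delegates to the installed policy.
        loop = asyncio.new_event_loop()
        loop.close()
    finally:
        # Restore the previous policy so the change does not leak.
        asyncio.set_event_loop_policy(original)
```

The module-level `asyncio.new_event_loop()` call goes through the installed policy, which is why setting a policy is far more common in user code than getting one.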

@gvanrossum
Member

> An event loop policy is a global (per-interpreter) object used to get and set the currently-running event loop,

Alas, it's not quite "currently running". There is a case where get_running_loop() raises (when no loop is active) but get_event_loop() returns an EventLoop instance (possibly one it just created). This was intended so that you could call loop.run_forever() or loop.run_until_complete(), but it caused so much confusion that asyncio.get_event_loop() now defers to _get_running_loop() and if that's None it prints a warning before calling policy.get_event_loop(). I think it is even possible to get yourself into a state where the policy returns a different loop than get_running_loop():

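# Untested
from asyncio import new_event_loop, set_event_loop

loop1 = new_event_loop()
loop2 = new_event_loop()
set_event_loop(loop1)  # the policy now records loop1 as the current loop
loop2.run_forever()    # but loop2 is the loop that actually runs (blocks until stopped)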

Code running in loop2 will find that get_event_loop() returns loop2, but get_event_loop_policy().get_event_loop() returns loop1.
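A terminating variant of that scenario can be sketched like this (an illustration, not part of the original comment). It uses `run_until_complete()` instead of `run_forever()` so the script finishes, and the coroutine name `observe` is hypothetical. It assumes a Python version where `asyncio.get_event_loop_policy()` is still available; newer releases deprecate it, so `DeprecationWarning` is silenced.

```python
import asyncio
import warnings

async def observe():
    """Report the running loop and the loop the policy considers current."""
    running = asyncio.get_running_loop()
    with warnings.catch_warnings():
        # The policy API is deprecated in newer Python versions.
        warnings.simplefilter("ignore", DeprecationWarning)
        from_policy = asyncio.get_event_loop_policy().get_event_loop()
    return running, from_policy

loop1 = asyncio.new_event_loop()
loop2 = asyncio.new_event_loop()
asyncio.set_event_loop(loop1)  # the policy records loop1 for this thread
try:
    # Run the coroutine on loop2, which was never registered with the policy.
    running, from_policy = loop2.run_until_complete(observe())
finally:
    asyncio.set_event_loop(None)
    loop1.close()
    loop2.close()

policy_disagrees = running is loop2 and from_policy is loop1
```

Running a loop only marks it as the running loop; it does not register it with the policy, which is why the two lookups can disagree.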

> as well as create new event loops. The default policy can be replaced with built-in alternatives to use different event loop implementations, or substituted by a custom policy class that overrides some or all of these behaviors.
> The policy object gets and sets a separate running event loop per context. This is per-thread by default, though custom policies could define context differently.

The rest looks great!

> Is this going in the right direction? Any suggestions for fixes/improvements?
>
> > Perhaps we should conduct a thorough search for get_event_loop_policy and set_event_loop_policy to see what people do with it, if anything.
>
> I don't have anywhere near the expertise to properly interpret the results, but here's a grep.app search for the latter (which seemed more heavily used than the former, particularly outside vendored copies of the stdlib/tests and other contexts that weren't actual runtime uses of the API in question).

Makes sense that people set the policy more often than they get it, since getting it is implied when calling things like get_event_loop() and new_event_loop().

@CAM-Gerlach
Member

CAM-Gerlach commented Sep 27, 2022

> Alas, it's not quite "currently running".

Ah, yeah, that was one point I wasn't quite sure about. I spent a while playing around with what else to call it, but I mostly drew a blank. Any suggestions here for what to call this? Just "current event loop"? "active event loop"? Something else? In the second instance, I can just drop "running" and it should work fine, but we still need to characterize it in the first instance somehow.

Other than that, I can go ahead with a PR with the above, thanks.

@gvanrossum
Member

Yeah, "current event loop" is probably the closest we can get to this. The docstring for BaseDefaultEventLoopPolicy.get_event_loop() uses "event loop for the current context" which is perhaps more correct but requires definition of "context" which is defined later. We don't have to be complete here, since this is just the intro for the more thorough API specification that follows.

@CAM-Gerlach
Member

Thanks, sounds good. I've opened #97603 to implement that (along with adding reST refs to those two paras where relevant).

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 27, 2022
… accurate (pythonGH-97603)

Also fix up some cross-references in the asyncio docs.
(cherry picked from commit cc0f3a1)

Co-authored-by: C.A.M. Gerlach <[email protected]>
gvanrossum pushed a commit that referenced this issue Sep 27, 2022
…ate (#97603)

Also fix up some cross-references in the asyncio docs.
miss-islington added a commit that referenced this issue Sep 28, 2022
…ate (GH-97603)

Also fix up some cross-references in the asyncio docs.
(cherry picked from commit cc0f3a1)

Co-authored-by: C.A.M. Gerlach <[email protected]>
pablogsal pushed a commit that referenced this issue Oct 22, 2022
…ate (GH-97603)

Also fix up some cross-references in the asyncio docs.
(cherry picked from commit cc0f3a1)

Co-authored-by: C.A.M. Gerlach <[email protected]>
auxsvr added a commit to auxsvr/scrapy that referenced this issue Feb 15, 2023
`asyncio.get_event_loop_policy().get_event_loop() is asyncio.get_running_loop()` is not always true, as mentioned in python/cpython#96377 (comment). The AsyncioSelectorReactor runs in the second thread and uses the already initialised event loop in the main thread to run the crawler, so a single event loop will be running code from two threads.