Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: don't hang on while loop when reconnecting #1663

Merged

Conversation

AstreaTSS
Copy link
Member

@AstreaTSS AstreaTSS commented Apr 23, 2024

Pull Request Type

  • Feature addition
  • Bugfix
  • Documentation update
  • Code refactor
  • Tests improvement
  • CI/CD pipeline enhancement
  • Other: [Replace with a description]

Description

There have been some user reports on 3.12 specifically (not sure why - the bug doesn't appear to have anything to do with anything 3.12 related) that some reconnects cause the CPU to spike to 100% and for the program to freeze. Eventually someone reported it as an issue, and it just so happens that I have the time to investigate now.

It turns out that interactions.py was hanging on a while loop. I suggest reading the old code and a snippet from the guild create processor and spotting the issue yourself, because it sticks out to you when you notice:

# old websocket ready code
while True:
    try:
        await asyncio.wait_for(self._guild_event.wait(), self.guild_event_timeout)
        if len(ready_guilds) == len(expected_guilds):
            break
    except asyncio.TimeoutError:
        break
@Processor.define()
async def _on_raw_guild_create(self, event: "RawGatewayEvent") -> None:
    new_guild = not self.cache.get_guild(event.data["id"])
    guild = self.cache.place_guild_data(event.data)

    self._user._guild_ids.add(to_snowflake(event.data.get("id")))

    self._guild_event.set()
    # more afterwards, but not relevant

Breaking down the old websocket code (let's assume expected_guilds > 1):

  • The code enters a while loop. It does so to wait for every guild to fire off a guild create event processor so that the bot can account for every guild (useful for us since we want to delay the actual Ready event we have until all of these guilds are resolved). Notice how it's a while True block.
  • On its first iteration, it waits for self._guild_event to be set. Let's say that it doesn't time out.
  • The first "guild create" event gets dispersed for one guild. On the processor for the guild create event, self._guild_event gets set, breaking us out of our waiting in our websocket ready.
  • ready_guilds is increased by 1 (making it 1), but this isn't equal to expected_guilds, so the code doesn't reach the break and loops again.
  • Because self._guild_event has been set, the wait instantly completes, even though no new guild create event has dispatched. The same conditions from the prior bullet point thus apply, causing it to repeat infinitely.

I don't have any clue how this clue worked for 3.11 and lower, or if it even does. It's probable since that interactions.py rarely needs to do reconnects where RESUME isn't used, this issue was just never encountered. Either that or the improvements in asyncio in 3.12 were that significant.

Regardless, this PR fixes the issue by copying code from just above it (which only triggers during startup READYs), because... honestly, why were these two different in structure in the first place?

Changes

  • Make the while loop for reconnecting "websocket ready" events have a condition to break out if the expected number of guilds is reached, and clear the guild event... event on each loop to prevent an infinite loop.

Related Issues

#1662

Test Scenarios

@slash_command()
async def test(ctx: SlashContext):
    await ctx.send("Hi!")
    await ctx.bot.ws.reconnect(resume=False)

Python Compatibility

  • I've ensured my code works on Python 3.10.x
  • I've ensured my code works on Python 3.12.x

Checklist

  • I've run the pre-commit code linter over all edited files
  • I've tested my changes on supported Python versions
  • I've added tests for my code, if applicable
  • I've updated / added documentation, where applicable

Copy link
Contributor

@Scrxtchy Scrxtchy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the author of the linked issue, these changes have fixed the issue from appearing on my system

Copy link
Member

@silasary silasary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable

@silasary silasary merged commit 0122f7b into interactions-py:unstable Apr 23, 2024
2 checks passed
@AstreaTSS AstreaTSS deleted the non-resume-reconnect-fix branch April 23, 2024 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants