Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally disable disconnects in read_response #2695

Merged
merged 6 commits into from
May 8, 2023

Conversation

kristjanvalur
Copy link
Contributor

@kristjanvalur kristjanvalur commented Apr 4, 2023

This PR is a re-submit of #2506 to start with a clean slate.
That PR also fixed all the recent issues including #2624 , which prompted recent changes to async code that are not necessary.

Pull Request check-list

Please make sure to review and check all of these items:

  • Does $ tox pass with this change (including linting)?
  • Do the CI tests pass with this change (enable it first in your forked repo and wait for the github action build to finish)?
  • Is the new or changed code fully tested?
  • Is a documentation update included (if this change modifies existing APIs, or introduces new ones)?
  • Is there an example added to the examples folder (if applicable)?
  • Was the change added to CHANGES file?

NOTE: these things are not required to open a PR and can be done
afterwards / while the PR is open.

Description of change

Please see description of #2506 for the original reason for that pull request.

In addition, the relevant changes also solve recently observed issues with CancelledError causing connections to
remain in indeterminate state.

  1. We fix the original regression, allowing overrides for when we do want to not terminate connection.
  2. We undo changes from pr Minor fixes for #2666 and enhanced async test #2673 and related changes, bringing the client.py and cluster.py files to their original form.
  3. We fix the test suite in test_cwe_404.py to be more reliable and less reliant on actual timing.
  4. We add a host_port_remap feature to RedisCluster (see https://redis-py-cluster.readthedocs.io/en/stable/client.html)
  5. We fixe the tests for cluster mode which were previously not doing anything because RedisCluster ignored the ports for
    the DelayProxy. the host_port_remap feature is used to ensure that the connections go through the DelayProxy objects.

@codecov-commenter
Copy link

codecov-commenter commented Apr 4, 2023

Codecov Report

Patch coverage: 71.17% and project coverage change: -13.90 ⚠️

Comparison is base (e52fd67) 92.30% compared to head (447c732) 78.41%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2695       +/-   ##
===========================================
- Coverage   92.30%   78.41%   -13.90%     
===========================================
  Files         116      116               
  Lines       30093    30169       +76     
===========================================
- Hits        27778    23656     -4122     
- Misses       2315     6513     +4198     
Impacted Files Coverage Δ
redis/asyncio/cluster.py 16.93% <0.00%> (-75.30%) ⬇️
redis/asyncio/connection.py 85.59% <50.00%> (-2.17%) ⬇️
redis/connection.py 85.68% <55.55%> (-0.53%) ⬇️
tests/test_asyncio/test_cwe_404.py 71.25% <67.12%> (-25.69%) ⬇️
tests/test_asyncio/test_commands.py 97.98% <90.47%> (-0.08%) ⬇️
redis/asyncio/client.py 93.46% <100.00%> (+0.85%) ⬆️
redis/client.py 88.59% <100.00%> (-0.96%) ⬇️
tests/test_asyncio/test_connection.py 98.06% <100.00%> (ø)
tests/test_commands.py 89.85% <100.00%> (-0.01%) ⬇️
tests/test_connection.py 98.34% <100.00%> (ø)

... and 20 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@kristjanvalur kristjanvalur force-pushed the kristjan/interrupt-2 branch 4 times, most recently from 562be35 to b1dbc5d Compare April 5, 2023 13:03
@kristjanvalur kristjanvalur changed the title Kristjan/interrupt 2 Optionally disable disconnects in read_response Apr 5, 2023
@kristjanvalur kristjanvalur marked this pull request as ready for review April 5, 2023 14:26
@@ -1319,6 +1316,30 @@ async def close(self, attr: str = "nodes_cache") -> None:
)
)

def remap_host_port(self, host: str, port: int) -> Tuple[str, int]:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Necessary, if we are to use the DelayProxy for the cluster tests. See tests/test_asyncio/test_cwe_404.py. In its current form it is not actually doing anything. The format is taken from the redis-py-cluster project, see PR description for more info.

@kristjanvalur
Copy link
Contributor Author

Is there really no interest in fixing this issue properly? @ikonst, @chayim (for context, this is my many months old bugfix, which is also fixes the recently discoveed security problems with async, along with fixes to the newly added CWE tests)

@kristjanvalur kristjanvalur force-pushed the kristjan/interrupt-2 branch 3 times, most recently from cf2ae67 to 2fefdf2 Compare April 12, 2023 13:27
@ikonst
Copy link

ikonst commented Apr 12, 2023

I'm not a redis-py maintainer so I can't do much, but can volunteer a review.

@ikonst
Copy link

ikonst commented Apr 12, 2023

Initial reaction is that with 5 discrete improvements, this is a "workspace". Also consider if 1 year from now someone would need to understand whether this PR caused a regression, it'll be hard to reason about.

Can a change like "We fix the test suite in test_cwe_404.py to be more reliable and less reliant on actual timing" be made in a separate PR that's a no-brainer to merge? (as it would clearly change only a test file and the change would be obvious)

Of course, to have the motivation to break things up, we need some commitment from maintainers (👋 @chayim) to review those separate PRs, otherwise it's more work towards no purpose.

@kristjanvalur
Copy link
Contributor Author

kristjanvalur commented Apr 12, 2023

I'm not a redis-py maintainer so I can't do much, but can volunteer a review.

Sorry, my mistake. I'm sometimes struggling in this project to get attention. And I find that weird given that I have been a large contributor to the async part of redis-py for the past year.

Initial reaction is that with 5 discrete improvements, this is a "workspace". Also consider if 1 year from now someone would need to understand whether this PR caused a regression, it'll be hard to reason about.

This PR is essentially started as a bugfix, to fix a regression (#2499), caused by me in pr #2104.

Keeping connections open after receiving a BaseException seemed like a good idea at the time, but I was convinced that it was more trouble than it was worth, (see discussion in #2506) and I changed my pr into essentially a revert of #2104, but with the option of keeping the connection open for special cases, like timeouts.

That's the pr. This then has lingered for months, while I keep it up-to-date.

Recently, a different incarnation of #2499 showed up. It caused a flurry of changes to the async code to be made by the maintainers, while a fix for the issue was already here, in #2506, a fix to exactly that problem

Therefore, this PR contains the following:

  1. A fix for BaseException at I/O corrupts Connection #2499, which happens to be the same as for Off by 1 - Canceling async Redis command leaves connection open, in unsafe state for future commands #2624 and related issues. This is in connection.py and client.py for both sync and async.
  2. It contains a unit test designed to verify the fix for Off by 1 - Canceling async Redis command leaves connection open, in unsafe state for future commands #2624.
  3. It undoes the changes made in response to Off by 1 - Canceling async Redis command leaves connection open, in unsafe state for future commands #2624, which never should have been done. The changes made, using asyncio.shield() are detrimental both for code, performance and other things. These changes were in client.py and cluster.py in the async part
  4. It embraces and fixes the unit-tests which were added in Off by 1 - Canceling async Redis command leaves connection open, in unsafe state for future commands #2624. They were timing related and the cluster test was not working at all.

As such, this PR is really about undoing changes, and adding unit tests. It really cannot be split into separate PRs because it only works as a whole.

I could theoretically split the fixes to the test_cwe_404.py and the addition of a remap_host_port argument to the RedisCluster off into a separate PR, but it would make this one dependent on it. we really, really, should get the old behaviour back and remove the asyncio.shield() hack from the code.

@ikonst
Copy link

ikonst commented Apr 12, 2023

I think it's fine to make this PR dependent on it. Whatever you can split off, please do.

@dvora-h dvora-h requested a review from chayim April 13, 2023 12:53
@kristjanvalur
Copy link
Contributor Author

kristjanvalur commented Apr 13, 2023

See #2706 . This contains the changes to the unittest, and adds the remap feature to the RedisCluster.
Unfortunately, my fixing the unittests revealed that master is still not fixed. with regards to the problems that test_cwe_404.py was trying to verify that were fixed. Only my original fix does that.

I don't think there is any point trying to fix a unit-test which is trying to verify a fundamentally flawed fix to a problem.

What I can do, if it makes this PR more palatable to people, is to collapse all of the changes into a new set of coherent changes so that it is easier to understand what's going on. There is a lot of history here, since this PR has been open for so long.

@ikonst
Copy link

ikonst commented Apr 14, 2023

What I can do, if it makes this PR more palatable to people, is to collapse all of the changes into a new set of coherent changes so that it is easier to understand what's going on. There is a lot of history here, since this PR has been open for so long.

Given that no serious review has begun, this sounds like a win to me.

  1. We fix the original regression, allowing overrides for when we do want to not terminate connection.
  2. We undo changes from pr Minor fixes for #2666 and enhanced async test #2673 and related changes, bringing the client.py and cluster.py files to their original form.
  3. We fix the test suite in test_cwe_404.py to be more reliable and less reliant on actual timing.
  4. We add a host_port_remap feature to RedisCluster (see https://redis-py-cluster.readthedocs.io/en/stable/client.html)
  5. We fixed the tests for cluster mode which were previously not doing anything because RedisCluster ignored the ports for the DelayProxy. The host_port_remap feature is used to ensure that the connections go through the DelayProxy objects.

Let's say we make a new PR with (1) only, is that possible, Or does (1) break (2)?

@ikonst
Copy link

ikonst commented Apr 14, 2023

Oh, it's not passing tests yet.

@kristjanvalur
Copy link
Contributor Author

I"ve rebased it into three parts.

  • First part is my original fix, to the regression caused by my changes, with the optional "disconnect_on_error" toggle.
  • Second part undoes Minor fixes for #2666 and enhanced async test #2673 which was an attempt to solve the same issue.
  • Third part is updates and fixes to the cwe_404 unit test, spelled out in individual commits for clarity

Let's say we make a new PR with (1) only, is that possible, Or does (1) break (2)?

(2) indeed breaks 1, that's how I became aware of it #2673 in the first place. It makes it impossible to cancel operations.

We don't want (2). It is not the right way to do things. asyncio.shield() is not meant for this usage. We don't want to wrap every operation in a Task (performance). We don't want to acquire and release asyncio.Lock objects in different Tasks (deadlocks). We want to be able to send tasks signals (timeouts, cancels). We don't want unexpected and asymmetric programming constructs in the code without any comment or reason. You mentioned someone looking at the code in one-year's time. Imagine coming across this:

if self.single_connection_client:
await self._single_conn_lock.acquire()
return await asyncio.shield(
self._try_send_command_parse_response(conn, *args, **options)
)

@ikonst
Copy link

ikonst commented Apr 18, 2023

@kristjanvalur I tried reading #2666 and it wasn't immediately obvious from the PR description what's the issue it's solving (for starters, there was no issue linked).

If I understand correctly (I have only passing familiarity with asyncio) asyncio.shield suppresses propagation of cancels. Guided purely by intuition, it does does smell. Maybe even it should be reverted. But... Chesterton's Fence! — let's understand why users wanted #2666 and maybe look for a way to address that? @chayim, can shed some light on the motivation for #2666?

(Right now it's a bit "I wanted to build something else, something good, and your rotten fence was in my way", and I'm saying that as the person who started the saga with this PR. Happy to jump on a call. I really think you're putting plenty of effort into this, and want to make sure this can bear fruit.)

@kristjanvalur
Copy link
Contributor Author

I don't think Chesterton's fence applies.
Please allow me to re-state the situation, again.

  1. Last year I made a series of simplifications and changes to the async/client. One of them was to make it simpler to use subscriptions and I stumbled across code which indiscriminately closed connections when handling BaseExceptions. I changed that code in good faith, because BaseExceptions normally should be handled. This was pr Catch Exception and not BaseException in the Connection #2104
  2. Later, a defect BaseException at I/O corrupts Connection #2499 was opened where it turned out that due to how the connections are pooled, this change broke behaviour. Connections might get returned to the pool in an incorrect state. Much was discussed, and I realized that I had been wrong, it is okay and prudent to close connections in this library when something happens.
  3. I provided a fix to BaseException at I/O corrupts Connection #2499. This was pr Revert #2104, provide way to disable disconnects in read_response() #2506. It reverts Catch Exception and not BaseException in the Connection #2104, but adds a flag, allowing connections to stay open on error. This is to support timeouts and retries in subscriptions, which was the original reason I did the change.
  4. Three months pass, and the fix does not go in.
  5. Another issue shows up. It is the same issue as BaseException at I/O corrupts Connection #2499, just differently stated. There is a lot of activity and a series of experimental fixes are made, while Revert #2104, provide way to disable disconnects in read_response() #2506 just sits there, a fix to my original regression, which puts things back the way they were. No one notifies me about this, even though I've been very active in restructuring the async connection classes for the past year and would have been the obvious person to blame.
  6. Finally a solution is put in place, using asyncio.shield(), And the original bug is still there. The fix is supplemented with unit tests which are buggy (there is a missing await), slow and ineffective, and it will cause deadlocks if one attempts to perform cancellations in some cases.
  7. I become aware of this situation when my unittest, which is designed to trigger exactly this sort of problem with a cancel, deadlocks because of the shield() code. This was, while I was updating my PR, for the umpteenth time, to try to keep it in sync until some maintainer might feel the need to look at it.
  8. I look at the unittests, realize they are broken, fix them, undo the damage done to the async code, and verify that my original PR fixes also this issue.
  9. I create a new PR out of the old one to try to point out: Look, the fix was here all along! No need for any of this! and I fixed your unit tests, too, btw.

It would be nice for the maintainers to actually show interest in my efforts here. Honestly, I'm trying to help.

@ikonst
Copy link

ikonst commented Apr 18, 2023

I think @chayim should chime in at this point, as he's the author of #2666 and IIUC a maintainer.

@chayim
Copy link
Contributor

chayim commented Apr 23, 2023

Folks, apologies on delays. Life definitely has a way to get in the way.

First off @kristjanvalur Please don't think there's a lack of interest! I'm hoping you can instead view it through the lens of way too many things going on, plus holidays, plus RESP3. But you're right, it's time to pick up the ball.

Frankly, I'll be the first to admit that my async abilities are sub optimal. Regrettably, other things took priority, as happens. With the upcoming beta of 5.0.0b2 ... @dvora-h will free up a bit, and can return to contributing, reviews, etc.

Now getting back to this you're bang on the asyncio.shield change is absolutely a bandaid, in order to get things secure, and out. It's not ideal - but at the time, with what was known, I felt (perhaps incorrectly) that this was the right move forward.

Now, back to the PR. I'm a big fan of small, surgical PRs - even if this isn't always the case. To wit, I'm more than happy to rip out the unit tests test_cwe_404.py, especially once there's a full, crafted fix merged. But tl;dr I support the collapsed PR approach.

Comment on lines 3026 to 3036
if sys.version_info.major >= 3 and sys.version_info.minor >= 11:
asyncio.current_task().uncancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks highly dubious; at least it should never ever be necessary here. Is this a workaround for the stdlib bug fixed in v3.11.3?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is necessary in current Python if you decide catch and ignore a Cancel request.
I am the author of pr python/cpython#102815 which fixed the Timeout issue in 3.11.3, and among other things, that PR clarifies this use case. Please see https://github.com/python/cpython/blob/main/Doc/library/asyncio-task.rst#task-cancellation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see now. This is necessary since you're catching CancelledError above.

Copy link
Collaborator

@dvora-h dvora-h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Just one small question...

@@ -827,7 +831,9 @@ async def can_read_destructive(self):
async def read_response(
self,
disable_decoding: bool = False,
*,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you add this asterisk? I prefer not to break... (if someone is using something like read_response(False, 0.5))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it was an afterghought. I was the one who added the timeout argument originally and in retrospect it probably should have been a keyword-only argument. Since it was my addition, I thought changing it retro-actively to a kw-only would be ok (there was only a two-month inverval between those two changes). "timeout" is traditionally a kw-only arg these days.
I guess one should be careful with these things, so I´m moving the asterisk :)

@kristjanvalur
Copy link
Contributor Author

By the way, we could process pr #2706 first. which is a fix for the already existing tests, and then rebase this on top.

@kristjanvalur
Copy link
Contributor Author

I've rebased this now that #2706 was merged.

@dvora-h
Copy link
Collaborator

dvora-h commented May 7, 2023

Thanks @kristjanvalur !

@chayim I think this is ready for merge

@dvora-h dvora-h added the bug Bug label May 7, 2023
@dvora-h dvora-h merged commit c0833f6 into redis:master May 8, 2023
@kristjanvalur
Copy link
Contributor Author

Yay. Let's hope it fixes some of the other issues we've seen 🌵

@kristjanvalur kristjanvalur deleted the kristjan/interrupt-2 branch May 8, 2023 10:14
dvora-h added a commit that referenced this pull request Jul 3, 2023
* fix: do not use asyncio's timeout lib before 3.11.2 (#2659)

There's an issue in asyncio's timeout lib before 3.11.3 that causes
async calls to raise `CancelledError`.

This is a cpython issue that was fixed in this commit [1] and
cherry-picked to previous versions, meaning 3.11.3 will work correctly.

Check [2] for more info.

[1] python/cpython@04adf2d
[2] #2633

* UnixDomainSocketConnection missing constructor argument (#2630)

* removing useless files (#2642)

* Fix issue 2660: PytestUnraisableExceptionWarning from asycio client (#2669)

* Fixing cancelled async futures (#2666)

Co-authored-by: James R T <[email protected]>
Co-authored-by: dvora-h <[email protected]>

* Fix async (#2673)

* Version 4.5.4 (#2674)

* Really do not use asyncio's timeout lib before 3.11.2 (#2699)

4802530 made async-timeout required
only on Python 3.11.2 and earlier. However, according to PEP-508,
python_version marker is compared to first two numbers of Python version
tuple - so it will evaluate to True also on 3.11.3, and install a
package as a dependency.

* asyncio: Fix memory leak caused by hiredis (#2693) (#2694)

* Update example of Redisearch creating index (#2703)

When creating index, fields should be passed inside an iterable (e.g. list or tuple)

* Improving Vector Similarity Search Example (#2661)

* update vss docs

* add embeddings creation and storage examples

* update based on feedback

* fix version and link

* include more realistic search examples and clean up indices

* completely remove initial cap reference

---------

Co-authored-by: Chayim <[email protected]>

* Fix incorrect usage of once flag in async Sentinel (#2718)

In the execute_command of the async Sentinel, the once flag was being
used incorrectly, with its meaning inverted. To fix we just needed to invert
the if and else bodies. This isn't being caught by the tests currently
because the tests of commands that use this flag do not check their
results/effects (for example the "test_ckquorum" test).

* Fix topk list example. (#2724)

* Improve error output for master discovery (#2720)

Make MasterNotFoundError exception more precise in the case of
ConnectionError and TimeoutError to help the user to identify
configuration errors

Co-authored-by: Marc Schöchlin <[email protected]>

* return response in case of KeyError (#2628)

* return response in case of KeyError

* fix code linters error

* fix linters 2

* fix linters 3

* Add WITHSCORES to ZREVRANK Command (#2725)

* add withscores to zrevrank

* change 0 -> 2

* fix errors

* split test

* Fix `ClusterCommandProtocol` not itself being marked as a protocol (#2729)

* Fix `ClusterCommandProtocol` not itself being marked as a protocol

* Update CHANGES

* Fix potential race condition during disconnection (#2719)

When the disconnect() function is called twice in parallel it is possible that
one thread deletes the self._sock reference, while the other thread will
attempt to call .close() on it, leading to an AttributeError.

This situation can routinely be encountered by closing the connection in a
PubSubWorkerThread error handler in a blocking thread (ie. with
sleep_time==None), and then calling .close() on the PubSub object.
The main thread will then run into the disconnect() function, and the listener
thread is woken up by the closure and will race into the disconnect()
function, too.

This can be fixed easily by copying the object reference before doing the
None-check, similar to what we do in the redis.client.close() function.

* add "address_remap" feature to RedisCluster (#2726)

* add cluster "host_port_remap" feature for asyncio.RedisCluster

* Add a unittest for asyncio.RedisCluster

* Add host_port_remap to _sync_ RedisCluster

* add synchronous tests

* rename arg to `address_remap` and take and return an address tuple.

* Add class documentation

* Add CHANGES

* nermina changes from NRedisStack (#2736)

* Updated AWS Elasticache IAM Connection Example (#2702)

Co-authored-by: Nick Gerow <[email protected]>

* pinning urllib3 to fix CI (#2748)

* Add RedisCluster.remap_host_port, Update tests for CWE 404 (#2706)

* Use provided redis address. Bind to IPv4

* Add missing "await" and perform the correct test for pipe eimpty

* Wait for a send event, rather than rely on sleep time. Excpect cancel errors.

* set delay to 0 except for operation we want to cancel
This speeds up the unit tests considerably by eliminating unnecessary delay.

* Release resources in test

* Fix cluster test to use address_remap and multiple proxies.

* Use context manager to manage DelayProxy

* Mark failing pipeline tests

* lint

* Use a common "master_host" test fixture

* Update redismodules.rst (#2747)

Co-authored-by: dvora-h <[email protected]>

* Add support for cluster myshardid (#2704)

* feat: adding support for cluster myshardid

* lint fix

* fix: comment fix and async test

* fix: adding version check

* fix lint:

* linters

---------

Co-authored-by: Anuragkillswitch <[email protected]>
Co-authored-by: dvora-h <[email protected]>
Co-authored-by: dvora-h <[email protected]>

* clean warnings (#2731)

* fix parse_slowlog_get (#2732)

* Optionally disable disconnects in read_response (#2695)

* Add regression tests and fixes for issue #1128

* Fix tests for resumable read_response to use "disconnect_on_error"

* undo prevision fix attempts in async client and cluster

* re-enable cluster test

* Suggestions from code review

* Add CHANGES

* Add client no-touch (#2745)

* Add client no-touch

* Update redis/commands/core.py

Co-authored-by: dvora-h <[email protected]>

* Update test_commands.py

Improve test_client_no_touch

* Update test_commands.py

Add async version test case

* Chore remove whitespace

Oops

---------

Co-authored-by: dvora-h <[email protected]>

* fix create single_connection_client from url (#2752)

* Fix `xadd` allow non negative maxlen (#2739)

* Fix xadd allow non negative maxlen

* Update change log

---------

Co-authored-by: dvora-h <[email protected]>

* Version 4.5.5 (#2753)

* Kristjan/issue #2754: Add missing argument to SentinelManagedConnection.read_response() (#2756)

* Increase timeout for a test which would hang completely if failing.
Timeouts in virtualized CI backends can occasionally fail if too short.

* add "disconnect_on_error" argument to SentinelManagedConnection

* update Changes

* lint

* support JSON.MERGE Command (#2761)

* support JSON.MERGE Command

* linters

* try with abc instead person

* change @skip_ifmodversion_lt to latest ReJSON 2.4.7

* change version

* fix test

* linters

* add async test

* Issue #2749: Remove unnecessary __del__ handlers (#2755)

* Remove unnecessary __del__ handlers
There normally should be no logic attached to del.  Cleanly disconnecting network resources is not needed at that time.

* add CHANGES

* Add WITHSCORE to ZRANK (#2758)

* add withscore to zrank with tests

* fix test

* Fix JSON.MERGE Summary (#2786)

* Fix JSON.MERGE Summary

* linters

* Fixed key error in parse_xinfo_stream (#2788)

* insert newline to prevent sphinx from assuming code block (#2796)

* Introduce OutOfMemoryError exception for Redis write command rejections due to OOM errors (#2778)

* expose OutOfMemoryError as explicit exception type

- handle "OOM" error code string by raising explicit
  exception type instance
- enables callers to avoid string matching after
  catching ResponseError

* add OutOfMemoryError exception class docstring

* Provide more info in the exception docstring

* Fix formatting

* Again

* linters

---------

Co-authored-by: Chayim <[email protected]>
Co-authored-by: Igor Malinovskiy <[email protected]>
Co-authored-by: dvora-h <[email protected]>

* Add unit tests for the `connect` method of all Redis connection classes (#2631)

* tests: move certificate discovery to a separate module

* tests: add 'connect' tests for all Redis connection classes

---------

Co-authored-by: dvora-h <[email protected]>

* Fix dead weakref in sentinel connection causing ReferenceError (#2767) (#2771)

* Fix dead weakref in sentinel conn (#2767)

* Update CHANGES

---------

Co-authored-by: Igor Malinovskiy <[email protected]>
Co-authored-by: dvora-h <[email protected]>

* chore(documentation): fix redirects and some small cleanups (#2801)

* Add waitaof (#2760)

* Add waitaof

* Update test_commands.py

add test_waitaof

* Update test_commands.py

Add test_waitaof

* Fix doc string

---------

Co-authored-by: Chayim <[email protected]>
Co-authored-by: Igor Malinovskiy <[email protected]>

* Extract abstract async connection class (#2734)

* make 'socket_timeout' and 'socket_connect_timeout' equivalent for TCP and UDS connections

* abstract asynio connection in analogy with the synchronous connection

---------

Co-authored-by: dvora-h <[email protected]>

* Fix type hint for retry_on_error in async cluster (#2804)

* fix(asyncio.cluster): fixup retry_on_error type hint

This parameter accepts a list of _classes of Exceptions_, not a list of instantiated Exceptions. Fixup the type hint accordingly.

* chore: update changelog

---------

Co-authored-by: dvora-h <[email protected]>

* Fix CI (#2809)

* Support JSON.MSET Command (#2766)

* support JSON.MERGE Command

* linters

* try with abc instead person

* change @skip_ifmodversion_lt to latest ReJSON 2.4.7

* change version

* fix test

* linters

* add async test

* Support JSON.MSET command

* trying to run CI

* linters

* add async test

* reminder do delete the integration changes

* delete the line from integration

* fix the interface

* change docstring

---------

Co-authored-by: Chayim <[email protected]>
Co-authored-by: dvora-h <[email protected]>

* Version 4.6.0 (#2810)

* master changes

* linters

* fix test_cwe_404 cluster test

---------

Co-authored-by: Thiago Bellini Ribeiro <[email protected]>
Co-authored-by: woutdenolf <[email protected]>
Co-authored-by: Chayim <[email protected]>
Co-authored-by: shacharPash <[email protected]>
Co-authored-by: James R T <[email protected]>
Co-authored-by: Mirek Długosz <[email protected]>
Co-authored-by: Oran Avraham <[email protected]>
Co-authored-by: mzdehbashi-github <[email protected]>
Co-authored-by: Tyler Hutcherson <[email protected]>
Co-authored-by: Felipe Machado <[email protected]>
Co-authored-by: AYMEN Mohammed <[email protected]>
Co-authored-by: Marc Schöchlin <[email protected]>
Co-authored-by: Marc Schöchlin <[email protected]>
Co-authored-by: Avasam <[email protected]>
Co-authored-by: Markus Gerstel <[email protected]>
Co-authored-by: Kristján Valur Jónsson <[email protected]>
Co-authored-by: Nick Gerow <[email protected]>
Co-authored-by: Nick Gerow <[email protected]>
Co-authored-by: Cristian Matache <[email protected]>
Co-authored-by: Anurag Bandyopadhyay <[email protected]>
Co-authored-by: Anuragkillswitch <[email protected]>
Co-authored-by: Seongchuel Ahn <[email protected]>
Co-authored-by: Alibi <[email protected]>
Co-authored-by: Smit Parmar <[email protected]>
Co-authored-by: Brad MacPhee <[email protected]>
Co-authored-by: Igor Malinovskiy <[email protected]>
Co-authored-by: Shahar Lev <[email protected]>
Co-authored-by: Vladimir Mihailenco <[email protected]>
Co-authored-by: Kevin James <[email protected]>
dvora-h added a commit that referenced this pull request Jul 16, 2023
* Reorganizing the parsers code, and add support for RESP3 (#2574)

* Reorganizing the parsers code

* fix build package

* fix imports

* fix flake8

* add resp to Connection class

* core commands

* python resp3 parser

* pipeline

* async resp3 parser

* some asymc tests

* resp3 parser for async cluster

* async commands tests

* linters

* linters

* linters

* fix ModuleNotFoundError

* fix tests

* fix assert_resp_response_in

* fix command_getkeys in cluster

* fail-fast false

* version

---------

Co-authored-by: Chayim I. Kirshen <[email protected]>

* Fix async client with resp3 (#2657)

* Add support for PubSub with RESP3 parser (#2721)

* add resp3 pubsub

* linters

* _set_info_logger func

* async pubsun

* docstring

* 5.0.0b2 (#2723)

* Fix `COMMAND` response in resp3 (redis 7+) (#2740)

* Fix protocol version checking (#2737)

* bumping beta version to 5.0.0b3 (#2743)

* Fix parse resp3 dict response: don't use dict comprehension (#2757)

* Fix parse respp3 dict response

* linters

* pin urlib version

* Sharded pubsub (#2762)

* sharded pubsub

* sharded pubsub

Co-authored-by: Leibale Eidelman <[email protected]>

* Shrded Pubsub TestPubSubSubscribeUnsubscribe

* fix TestPubSubSubscribeUnsubscribe

* more tests

* linters

* TestPubSubSubcommands

* fix @leibale comments

* linters

* fix @chayim comments

---------

Co-authored-by: Leibale Eidelman <[email protected]>

* 5.0.0b4 (#2781)

Co-authored-by: dvora-h <[email protected]>

* RESP3 tests (#2780)

* fix command response in resp3

* linters

* acl_log & acl_getuser

* client_info

* test_commands and test_asyncio/test_commands

* fix test_command_parser

* fix asyncio/test_connection/test_invalid_response

* linters

* all the tests

* push handler sharded pubsub

* Use assert_resp_response wherever possible

* fix test_xreadgroup

* fix cluster_zdiffstore and cluster_zinter

* fix review comments

* fix review comments

* linters

* Fixing asyncio import (#2759)

* asyncio import fix

* pinning urllib3 to fix CI (#2748)

* noqa

* fixint linters

* fix (#2799)

* RESP3 response callbacks (#2798)

* start cleaning

* clean sone callbacks

* response callbacks

* revert redismod-url change

* fix async tests

* linters

* async cluster

---------

Co-authored-by: Chayim <[email protected]>

* RESP3 modules support (#2803)

* start cleaning

* clean sone callbacks

* response callbacks

* modules

* tests

* finish sync search tests

* linters

* async modules

* linters

* revert redismod-url change

* RESP3 fix async tests (#2806)

* fix tests

* add stralgo callback in resp2

* add callback to acl list in resp2

* Adding RESP3 tests support (#2793)

* start cleaning

* clean sone callbacks

* first phase

* tox wrap back

* changing cancel format

* syntax

* lint

* docker

* contain the docker

* tox dev reqs

* back to testing

* response callbacks

* protocol into async conftest

* fix for 3.11 invoke

* docker changes

* fix tests

* linters

* adding

* resp3 tox, until killed

* remove tox

* tests

* requirements.txt

* restoring requirements.txt

* adding a sleep, hopefully enough time for the cluster dockers to settle

* fix search tests

* search test, disable uvloop for pypy due to bug

* syn

* reg

* dialect test improvement

* sleep+, xfail

* tests

* resp

* flaky search test too

* timing

* timing for async test

* test changes

* fix assert_interval_advanced

* revert

* mark async health_check tests with xfail

* change strict to false

* fix github actions package validation

---------

Co-authored-by: dvora-h <[email protected]>

* change sismember return type (#2813)

* Version 5.0.0rc1 (#2815)

* Merge master to 5.0 (#2827)

* fix: do not use asyncio's timeout lib before 3.11.2 (#2659)

There's an issue in asyncio's timeout lib before 3.11.3 that causes
async calls to raise `CancelledError`.

This is a cpython issue that was fixed in this commit [1] and
cherry-picked to previous versions, meaning 3.11.3 will work correctly.

Check [2] for more info.

[1] python/cpython@04adf2d
[2] #2633

* UnixDomainSocketConnection missing constructor argument (#2630)

* removing useless files (#2642)

* Fix issue 2660: PytestUnraisableExceptionWarning from asycio client (#2669)

* Fixing cancelled async futures (#2666)

Co-authored-by: James R T <[email protected]>
Co-authored-by: dvora-h <[email protected]>

* Fix async (#2673)

* Version 4.5.4 (#2674)

* Really do not use asyncio's timeout lib before 3.11.2 (#2699)

4802530 made async-timeout required
only on Python 3.11.2 and earlier. However, according to PEP-508,
python_version marker is compared to first two numbers of Python version
tuple - so it will evaluate to True also on 3.11.3, and install a
package as a dependency.

* asyncio: Fix memory leak caused by hiredis (#2693) (#2694)

* Update example of Redisearch creating index (#2703)

When creating index, fields should be passed inside an iterable (e.g. list or tuple)

* Improving Vector Similarity Search Example (#2661)

* update vss docs

* add embeddings creation and storage examples

* update based on feedback

* fix version and link

* include more realistic search examples and clean up indices

* completely remove initial cap reference

---------

Co-authored-by: Chayim <[email protected]>

* Fix incorrect usage of once flag in async Sentinel (#2718)

In the execute_command of the async Sentinel, the once flag was being
used incorrectly, with its meaning inverted. To fix we just needed to invert
the if and else bodies. This isn't being caught by the tests currently
because the tests of commands that use this flag do not check their
results/effects (for example the "test_ckquorum" test).

* Fix topk list example. (#2724)

* Improve error output for master discovery (#2720)

Make MasterNotFoundError exception more precise in the case of
ConnectionError and TimeoutError to help the user to identify
configuration errors

Co-authored-by: Marc Schöchlin <[email protected]>

* return response in case of KeyError (#2628)

* return response in case of KeyError

* fix code linters error

* fix linters 2

* fix linters 3

* Add WITHSCORES to ZREVRANK Command (#2725)

* add withscores to zrevrank

* change 0 -> 2

* fix errors

* split test

* Fix `ClusterCommandProtocol` not itself being marked as a protocol (#2729)

* Fix `ClusterCommandProtocol` not itself being marked as a protocol

* Update CHANGES

* Fix potential race condition during disconnection (#2719)

When the disconnect() function is called twice in parallel it is possible that
one thread deletes the self._sock reference, while the other thread will
attempt to call .close() on it, leading to an AttributeError.

This situation can routinely be encountered by closing the connection in a
PubSubWorkerThread error handler in a blocking thread (ie. with
sleep_time==None), and then calling .close() on the PubSub object.
The main thread will then run into the disconnect() function, and the listener
thread is woken up by the closure and will race into the disconnect()
function, too.

This can be fixed easily by copying the object reference before doing the
None-check, similar to what we do in the redis.client.close() function.

* add "address_remap" feature to RedisCluster (#2726)

* add cluster "host_port_remap" feature for asyncio.RedisCluster

* Add a unittest for asyncio.RedisCluster

* Add host_port_remap to _sync_ RedisCluster

* add synchronous tests

* rename arg to `address_remap` and take and return an address tuple.

* Add class documentation

* Add CHANGES

* nermina changes from NRedisStack (#2736)

* Updated AWS Elasticache IAM Connection Example (#2702)

Co-authored-by: Nick Gerow <[email protected]>

* pinning urllib3 to fix CI (#2748)

* Add RedisCluster.remap_host_port, Update tests for CWE 404 (#2706)

* Use provided redis address. Bind to IPv4

* Add missing "await" and perform the correct test for pipe eimpty

* Wait for a send event, rather than rely on sleep time. Excpect cancel errors.

* set delay to 0 except for operation we want to cancel
This speeds up the unit tests considerably by eliminating unnecessary delay.

* Release resources in test

* Fix cluster test to use address_remap and multiple proxies.

* Use context manager to manage DelayProxy

* Mark failing pipeline tests

* lint

* Use a common "master_host" test fixture

* Update redismodules.rst (#2747)

Co-authored-by: dvora-h <[email protected]>

* Add support for cluster myshardid (#2704)

* feat: adding support for cluster myshardid

* lint fix

* fix: comment fix and async test

* fix: adding version check

* fix lint:

* linters

---------

Co-authored-by: Anuragkillswitch <[email protected]>
Co-authored-by: dvora-h <[email protected]>
Co-authored-by: dvora-h <[email protected]>

* clean warnings (#2731)

* fix parse_slowlog_get (#2732)

* Optionally disable disconnects in read_response (#2695)

* Add regression tests and fixes for issue #1128

* Fix tests for resumable read_response to use "disconnect_on_error"

* undo prevision fix attempts in async client and cluster

* re-enable cluster test

* Suggestions from code review

* Add CHANGES

* Add client no-touch (#2745)

* Add client no-touch

* Update redis/commands/core.py

Co-authored-by: dvora-h <[email protected]>

* Update test_commands.py

Improve test_client_no_touch

* Update test_commands.py

Add async version test case

* Chore remove whitespace

Oops

---------

Co-authored-by: dvora-h <[email protected]>

* fix create single_connection_client from url (#2752)

* Fix `xadd` allow non negative maxlen (#2739)

* Fix xadd allow non negative maxlen

* Update change log

---------

Co-authored-by: dvora-h <[email protected]>

* Version 4.5.5 (#2753)

* Kristjan/issue #2754: Add missing argument to SentinelManagedConnection.read_response() (#2756)

* Increase timeout for a test which would hang completely if failing.
Timeouts in virtualized CI backends can occasionally fail if too short.

* add "disconnect_on_error" argument to SentinelManagedConnection

* update Changes

* lint

* support JSON.MERGE Command (#2761)

* support JSON.MERGE Command

* linters

* try with abc instead person

* change @skip_ifmodversion_lt to latest ReJSON 2.4.7

* change version

* fix test

* linters

* add async test

* Issue #2749: Remove unnecessary __del__ handlers (#2755)

* Remove unnecessary __del__ handlers
There normally should be no logic attached to del.  Cleanly disconnecting network resources is not needed at that time.

* add CHANGES

* Add WITHSCORE to ZRANK (#2758)

* add withscore to zrank with tests

* fix test

* Fix JSON.MERGE Summary (#2786)

* Fix JSON.MERGE Summary

* linters

* Fixed key error in parse_xinfo_stream (#2788)

* insert newline to prevent sphinx from assuming code block (#2796)

* Introduce OutOfMemoryError exception for Redis write command rejections due to OOM errors (#2778)

* expose OutOfMemoryError as explicit exception type

- handle "OOM" error code string by raising explicit
  exception type instance
- enables callers to avoid string matching after
  catching ResponseError

* add OutOfMemoryError exception class docstring

* Provide more info in the exception docstring

* Fix formatting

* Again

* linters

---------

Co-authored-by: Chayim <[email protected]>
Co-authored-by: Igor Malinovskiy <[email protected]>
Co-authored-by: dvora-h <[email protected]>

* Add unit tests for the `connect` method of all Redis connection classes (#2631)

* tests: move certificate discovery to a separate module

* tests: add 'connect' tests for all Redis connection classes

---------

Co-authored-by: dvora-h <[email protected]>

* Fix dead weakref in sentinel connection causing ReferenceError (#2767) (#2771)

* Fix dead weakref in sentinel conn (#2767)

* Update CHANGES

---------

Co-authored-by: Igor Malinovskiy <[email protected]>
Co-authored-by: dvora-h <[email protected]>

* chore(documentation): fix redirects and some small cleanups (#2801)

* Add waitaof (#2760)

* Add waitaof

* Update test_commands.py

add test_waitaof

* Update test_commands.py

Add test_waitaof

* Fix doc string

---------

Co-authored-by: Chayim <[email protected]>
Co-authored-by: Igor Malinovskiy <[email protected]>

* Extract abstract async connection class (#2734)

* make 'socket_timeout' and 'socket_connect_timeout' equivalent for TCP and UDS connections

* abstract asynio connection in analogy with the synchronous connection

---------

Co-authored-by: dvora-h <[email protected]>

* Fix type hint for retry_on_error in async cluster (#2804)

* fix(asyncio.cluster): fixup retry_on_error type hint

This parameter accepts a list of _classes of Exceptions_, not a list of instantiated Exceptions. Fixup the type hint accordingly.

* chore: update changelog

---------

Co-authored-by: dvora-h <[email protected]>

* Fix CI (#2809)

* Support JSON.MSET Command (#2766)

* support JSON.MERGE Command

* linters

* try with abc instead person

* change @skip_ifmodversion_lt to latest ReJSON 2.4.7

* change version

* fix test

* linters

* add async test

* Support JSON.MSET command

* trying to run CI

* linters

* add async test

* reminder do delete the integration changes

* delete the line from integration

* fix the interface

* change docstring

---------

Co-authored-by: Chayim <[email protected]>
Co-authored-by: dvora-h <[email protected]>

* Version 4.6.0 (#2810)

* master changes

* linters

* fix test_cwe_404 cluster test

---------

Co-authored-by: Thiago Bellini Ribeiro <[email protected]>
Co-authored-by: woutdenolf <[email protected]>
Co-authored-by: Chayim <[email protected]>
Co-authored-by: shacharPash <[email protected]>
Co-authored-by: James R T <[email protected]>
Co-authored-by: Mirek Długosz <[email protected]>
Co-authored-by: Oran Avraham <[email protected]>
Co-authored-by: mzdehbashi-github <[email protected]>
Co-authored-by: Tyler Hutcherson <[email protected]>
Co-authored-by: Felipe Machado <[email protected]>
Co-authored-by: AYMEN Mohammed <[email protected]>
Co-authored-by: Marc Schöchlin <[email protected]>
Co-authored-by: Marc Schöchlin <[email protected]>
Co-authored-by: Avasam <[email protected]>
Co-authored-by: Markus Gerstel <[email protected]>
Co-authored-by: Kristján Valur Jónsson <[email protected]>
Co-authored-by: Nick Gerow <[email protected]>
Co-authored-by: Nick Gerow <[email protected]>
Co-authored-by: Cristian Matache <[email protected]>
Co-authored-by: Anurag Bandyopadhyay <[email protected]>
Co-authored-by: Anuragkillswitch <[email protected]>
Co-authored-by: Seongchuel Ahn <[email protected]>
Co-authored-by: Alibi <[email protected]>
Co-authored-by: Smit Parmar <[email protected]>
Co-authored-by: Brad MacPhee <[email protected]>
Co-authored-by: Igor Malinovskiy <[email protected]>
Co-authored-by: Shahar Lev <[email protected]>
Co-authored-by: Vladimir Mihailenco <[email protected]>
Co-authored-by: Kevin James <[email protected]>

* RESP3 response-callbacks cleanup (#2841)

* cluenup

* sentinel callbacks

* move callbacks

* fix async cluster tests

* _parsers and import fix in tests

* linters

* make modules callbacks private

* fix async search

* fix

---------

Co-authored-by: Chayim I. Kirshen <[email protected]>

* Version 5.0.0rc2 (#2843)

* linters

---------

Co-authored-by: Chayim I. Kirshen <[email protected]>
Co-authored-by: Chayim <[email protected]>
Co-authored-by: Leibale Eidelman <[email protected]>
Co-authored-by: Thiago Bellini Ribeiro <[email protected]>
Co-authored-by: woutdenolf <[email protected]>
Co-authored-by: shacharPash <[email protected]>
Co-authored-by: James R T <[email protected]>
Co-authored-by: Mirek Długosz <[email protected]>
Co-authored-by: Oran Avraham <[email protected]>
Co-authored-by: mzdehbashi-github <[email protected]>
Co-authored-by: Tyler Hutcherson <[email protected]>
Co-authored-by: Felipe Machado <[email protected]>
Co-authored-by: AYMEN Mohammed <[email protected]>
Co-authored-by: Marc Schöchlin <[email protected]>
Co-authored-by: Marc Schöchlin <[email protected]>
Co-authored-by: Avasam <[email protected]>
Co-authored-by: Markus Gerstel <[email protected]>
Co-authored-by: Kristján Valur Jónsson <[email protected]>
Co-authored-by: Nick Gerow <[email protected]>
Co-authored-by: Nick Gerow <[email protected]>
Co-authored-by: Cristian Matache <[email protected]>
Co-authored-by: Anurag Bandyopadhyay <[email protected]>
Co-authored-by: Anuragkillswitch <[email protected]>
Co-authored-by: Seongchuel Ahn <[email protected]>
Co-authored-by: Alibi <[email protected]>
Co-authored-by: Smit Parmar <[email protected]>
Co-authored-by: Brad MacPhee <[email protected]>
Co-authored-by: Igor Malinovskiy <[email protected]>
Co-authored-by: Shahar Lev <[email protected]>
Co-authored-by: Vladimir Mihailenco <[email protected]>
Co-authored-by: Kevin James <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants