Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

B023 false alarms #269

Closed
paw-lu opened this issue Jul 2, 2022 · 24 comments · Fixed by #303 or #305
Closed

B023 false alarms #269

paw-lu opened this issue Jul 2, 2022 · 24 comments · Fixed by #303 or #305

Comments

@paw-lu
Copy link

paw-lu commented Jul 2, 2022

When upgrading, some false positives were found for B023 (implimented in #265).

Example

"""B023.py"""
def filter_values(values: list[list[int]], max_percentage: float):
    for value in values:
        filter_val = max(value) * max_percentage
        yield list(filter(lambda x: x < filter_val, value))
$ flake8 B023.py
bugbear_me.py:4:41: B023 Function definition does not bind loop variable 'filter_val' is valid, and doesn't fall into the 

A silly example here, but one where the use of filter_val is valid.

cc @Zac-HD
In the PR you mentioned some hard to detect false positives that you were okay with. Was this kind of pattern one of the ones you had in mind?

Thanks for this check btw, I think it's a good idea. I've fallen for this in the past!

@Zac-HD
Copy link
Member

Zac-HD commented Jul 2, 2022

Yep, this is one of those false alarms! I'd use a list-comprehension instead, which is both faster and IMO more readable:

def filter_values(values: list[list[int]], max_percentage: float):
    for value in values:
        filter_val = max(value) * max_percentage
        yield [x for x in values if x < filter_val]

Unfortunately it's more difficult to avoid this false alarm than you might think: for example "lambdas are OK if they're the first argument in a call to filter()" would miss bugs where the filter iterable (and therefore lambda) isn't immediately evaluated.

@paw-lu
Copy link
Author

paw-lu commented Jul 2, 2022

I'd use a list-comprehension instead, which is both faster and IMO more readable:

Yeah I agree with you there, this was a silly example to illustrate the false postive. Real example this was triggered on is more convoluted (but less silly).

Unfortunately it's more difficult to avoid this false alarm than you might think

Oh yeah, I can see where this gets annoying to impliment. Wondering outloud though if we value reducing false positives or false negatives more? As in would the rule be more useful if it only flagged an error when it was very sure of the problem (the consequence being it lets some problems slide, and we get more false negatives).

For the record, okay if we mark this as a known false positive!

@cooperlees
Copy link
Collaborator

I'm down to add known false positives into the README to help save people time or maybe even allow for someone smarter to come up with a way to handle these edge cases. Maybe even state that in the README.

@Zac-HD
Copy link
Member

Zac-HD commented Jul 3, 2022

I... don't think the real example is really a false alarm?

(terminology note: I much prefer the "false alarm"/"missed alarm" to the "false positive/negative", because it makes it much easier to tell what you're talking about, especially when the alarm is for a negative outcome.)

    for token in iterator:
        if (token_type := token.type) in open_close_pairs:
            close_tag = open_close_pairs[token_type]
            if token_type == close_tag:
                yield iter([token])
            else:
                yield itertools.chain(
                    (token,),
                    itertools.takewhile(
                        lambda token_: token_.type != close_tag, iterator
                    ),
                )

Unless you consume each yielded iterator before getting the next, you might have the wrong close tag!

Safer to explicitly bind it in lambda token_, ct=close_tag: token_.type != ct 🙂

@AbdealiLoKo
Copy link

AbdealiLoKo commented Jul 4, 2022

Not the same usecase as described here originally, but I found a false alarm with the following code:

$ cat app.py
from app import db


for name in ['a', 'b']:
    def myfunc(name):
        qry = db.query(name=name)
        db.run(qry)

    myfunc(name=name)
    qry = db.query(description__contains=name)
    db.run(qry)

And when I run flake8:

$ venv/bin/flake8 app.py
app.py:7:16: B023 Function definition does not bind loop variable 'qry'.

Basically - if I have a variable defined within the loop - but the same variable name is also within the function.
The local variable shouldn't have any problem in terms of execution ?

But - the rule thinks it is an issue

@davidism
Copy link
Contributor

davidism commented Jul 4, 2022

Just ran into one in Jinja pallets/jinja#1681 where if a file in a list of files doesn't exist, we continue the loop, otherwise we create a function that uses the file and return it, ending the loop.

for name in names:
    if not exists(name):
        continue

    def up_to_date():
        if exists(name):
            return mtime(name)

        return False

    return name, up_to_date

raise TemplateNotFound

I refactored it to something along the lines of the following, which I'm fine with, but it did take a bit to figure out how to use break/else instead.

for name in names:
    if exists(name):
        break
else:
    raise TemplateNotFound

def up_to_date():
    if exists(name):
        return mtime(name)

    return False

return name, up_to_date

@Zac-HD
Copy link
Member

Zac-HD commented Jul 4, 2022

Could also do def up_to_date(name=name): inside the loop, if you prefer.

In principle we could avoid emitting the warning if we were sure that the it only occurs in the last loop iteration; in practice I'm not volunteering to write something that complicated at the moment.

@noamkush
Copy link

noamkush commented Jul 5, 2022

Another false positive:

def foo(list_of_lists):
    for l in list_of_lists:
        priorities = calculate_priorities()
        first = min(l, key=lambda x: priorities.index(x))
        do_something(first)

Unlike filter, min (or max and also sorted) is evaluated immediately so there's no possible bug here. I think these should be easier to allow.

@Zac-HD
Copy link
Member

Zac-HD commented Jul 7, 2022

Unlike filter, min (or max and also sorted) is evaluated immediately so there's no possible bug here. I think these should be easier to allow.

It's possible that min, max, or sorted has been redefined... but in practice I agree, this would be a nice way to reduce the false-alarm rate.

RenskeW added a commit to RenskeW/cwltool that referenced this issue Jul 11, 2022
mr-c pushed a commit to common-workflow-language/cwltool that referenced this issue Jul 13, 2022
mr-c pushed a commit to common-workflow-language/cwltool that referenced this issue Jul 13, 2022
pgjones added a commit to djmattyg007/werkzeug that referenced this issue Jul 22, 2022
This may be related to
PyCQA/flake8-bugbear#269. These errors only
occur if flake8 is run with Python 3.8, which is odd.
pgjones added a commit to pallets/werkzeug that referenced this issue Jul 22, 2022
This may be related to
PyCQA/flake8-bugbear#269. These errors only
occur if flake8 is run with Python 3.8, which is odd.
@asottile-sentry
Copy link

just my 2c, this should maybe be off-by-default as it's difficult-to-impossible to know if the inner lambda / function is used in a deferred manner

there's a bunch of examples similar to this when upgrading in sentry: https://github.com/getsentry/sentry/blob/d89af1dace44c3888e7042d1c4957b9b5d514948/tests/sentry/plugins/bases/test_notify.py#L47-L58

(yes parametrize would be better, we're a bit stuck on unittest things for now 😭)

@asottile-sentry
Copy link

this example appears to trigger an unrelated false positive as well:

def get_group_backfill_attributes(caches, group, events):
    return {
        k: v
        for k, v in reduce(
            lambda data, event: merge_mappings(
                [data, {name: f(caches, data, event) for name, f in backfill_fields.items()}]
            ),
            events,
            {
                name: getattr(group, name)
                for name in set(initial_fields.keys()) | set(backfill_fields.keys())
            },
        ).items()
        if k in backfill_fields
    }

(yes the code is incomprehensible FP nonsense, but the name variable is being confused with an unrelated name variable in a different comprehension)

@ecs-jnguyen
Copy link

def main():
    numbers = list(range(5))
    for number in numbers:
        one_hundred_list = list(range(100))
        squared = next(filter(lambda x: x == number**2, one_hundred_list))
        print(squared)  # This will print 0, 1, 4, 9, 16


if __name__ == "__main__":
    main()

Using filter() with next can also create a false positive

@WhyNotHugo
Copy link

Another false positive here: https://github.com/pimutils/vdirsyncer/blob/c50eabc77e3c30b1d20cd4fa9926459e4a081dc1/tests/storage/servers/davical/__init__.py#L42

tests/storage/servers/davical/init.py:42:50: B023 Function definition does not bind loop variable 's'.

WhyNotHugo added a commit to pimutils/vdirsyncer that referenced this issue Aug 9, 2022
@marxin
Copy link

marxin commented Aug 29, 2022

My false-positive example:

def foo(forbidden_calls, forbidden_functions, strings):
    for fn in forbidden_calls:
        f = forbidden_functions[fn]
        waiver = any(map(lambda string: f['waiver_regex'].search(string), strings))

    return waiver
flake8 /tmp/demo.py 
/tmp/demo.py:4:41: B023 Function definition does not bind loop variable 'f'.

@Zac-HD
Copy link
Member

Zac-HD commented Oct 17, 2022

I expect it'll be quite a while before I get back to this again, so here: main...Zac-HD:flake8-bugbear:B023-false-alarms

This patch fixes the false alarm on functions where a local variable shadows the loop variable, and adds regression tests for the more common key=/filter/reduce -based false alarms but doesn't fix those yet. If anyone would like to finish this and make PR, do so with my blessing!

(and @cooperlees if you're making a release anyway, cherry-pick the partial fix?)

@Zac-HD Zac-HD changed the title False positive B023 B023 false alarms Oct 17, 2022
@jakkdl
Copy link
Contributor

jakkdl commented Oct 25, 2022

I'm on it!

@jakkdl
Copy link
Contributor

jakkdl commented Oct 25, 2022

The PR should fix a bunch of the false alarms mentioned in here, but probably not all of them. Feel free to check it out

@marxin
Copy link

marxin commented Oct 26, 2022

My false-positive example:

def foo(forbidden_calls, forbidden_functions, strings):
    for fn in forbidden_calls:
        f = forbidden_functions[fn]
        waiver = any(map(lambda string: f['waiver_regex'].search(string), strings))

    return waiver

This issue is still present with the latest release and main branch.

@Zac-HD
Copy link
Member

Zac-HD commented Oct 26, 2022

As written, you skip all but the last loop iteration, since waiver is reassigned every time! Did you mean waiver = waiver or any(...)? Other tricks to fix the warning include adding , f=f: to the lambda args, or avoiding a lambda entirely with:

for fn in forbidden_calls:
    searcher = forbidden_functions[fn]['waiver_regex'].search
    waiver = any(map(searcher, strings))

But more generally, yeah, we should handle map the same way as filter and reduce - thanks for the re-report 😅
@jakkdl mind adding that and copy-and-editing some of the existing tests? My bad for omitting them in the first place...

@Zac-HD Zac-HD reopened this Oct 26, 2022
@marxin
Copy link

marxin commented Oct 26, 2022

As written, you skip all but the last loop iteration, since waiver is reassigned every time!

Whoops, you are right, my code snippet is really suspicious.

@jakkdl
Copy link
Contributor

jakkdl commented Oct 26, 2022

Done~ 🙂

@WhyNotHugo
Copy link

This sample still fails. Shall II open a separate issue for it?

for shipment in sender.shipments:
    transaction.on_commit(lambda: update_single_me_shipment.delay(shipment.id)

@Zac-HD
Copy link
Member

Zac-HD commented Oct 27, 2022

That's not a false alarm; unless the lambdas run immediately every single one will use the last shipment.

@Zac-HD
Copy link
Member

Zac-HD commented Oct 27, 2022

Further comments to new issues please, we've fixed all the false alarms we know of.

We've also received several reports where the linter is in fact correct, so consider refactoring your code to make it happy even if you think it's probably a false alarm!

@PyCQA PyCQA locked as resolved and limited conversation to collaborators Oct 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.