Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

analyze.py started to pick up stdlib functions #160

Closed
austin3dickey opened this issue Mar 20, 2020 · 10 comments
Closed

analyze.py started to pick up stdlib functions #160

austin3dickey opened this issue Mar 20, 2020 · 10 comments
Labels
bug Something isn't working question Further information is requested

Comments

@austin3dickey
Copy link

Hi!

With the recent release, I noticed that analyze.py started classifying stdlib functions like the following as exported from python packages:

from functools import reduce
from math import ceil
from warnings import warn

If I put these at the top of my_module.py, and in my __init__.py do something like from my_module import *, these 3 functions get picked up and counted as part of the public interface (though they didn't before the release).

To be fair, these functions are exported from my package. Perhaps it would be better to do something like from functools import reduce as _reduce or just import functools in my package instead. But I was wondering if that's this tool's official stance, or if we're interested in adding functionality to discount stdlib functions like these from the exported list.

@jameslamb jameslamb added the question Further information is requested label Mar 20, 2020
@jameslamb
Copy link
Owner

ah! it was probably from #155

In my opinion, if you do from my_module import * then you really are "exporting" these, since someone could do from my_module import reduce. So I think the default behavior makes sense, although I admit I didn't think through it for this case and it could be better documented.

Have you seen that this only applies to stdlib functions? Like if you do from numpy import log does log show up in the output? I do have tests to explicitly prevent that, but I realize now that I never tested with an __init__.py that uses *

Could you tell me what you'd want the interface for "ignore stdlib functions" to look like? It could be:

  • a special case of "ignore this particular hard-coded list of object names"
  • a special case of "ignore this particular hard-coded list of packages"
  • a flag specific to the standard library which is like --ignore-stdlib where doppel-describe checks if something is from a stdlib package before adding it
  • something else

@jameslamb jameslamb added the enhancement New feature or request label Mar 20, 2020
@jameslamb jameslamb added this to the 0.4.0 (configurability) milestone Mar 20, 2020
@jayqi
Copy link
Contributor

jayqi commented Mar 20, 2020

This won't be limited to standard library functions. Any time you import something, you're adding it to the namespace of that module.

So if you have from numpy import log in my_module.py, and then my_module.log will be a available from that module. If you have import numpy, then my_module.numpy will be available from that module.

If you then do from my_module import * then all of that will end up attached to your namespace in your __init__.py.

This is going to be a pitfall of doing import *. If you really want to filter all of those out, you'd need to check __module__ attribute of objects and see where they come from. I had to do this in pkgnet: https://github.com/jayqi/pkgnet/blob/307dfaeb47be2cdda1bf56a9bf41ad8c195b264f/py-pkg/pkgnet/function_reporter.py#L87

@jameslamb
Copy link
Owner

jameslamb commented Mar 20, 2020

@jayqi sorry, to be clear I'm asking what the observed behavior with doppel-cli is, not generically how such imports work in Python.

doppel-describe does check the __module__ attribute of objects but it's a lot of branches of if-else logic so I'm wondering if everything is getting through or just stdlib.

https://github.com/jameslamb/doppel-cli/blob/master/doppel/bin/analyze.py#L135-L148

I can try to repro tonight with a simple package.

@austin3dickey
Copy link
Author

Yep, this all makes sense! Let me try to clarify my original question. Given:

  • doppel-describe doesn't count some common python patterns as part of a package's interface, like exceptions
  • from math import ceil etc. is a pretty common pattern

I thought stdlib functions qualified as one of those special exceptions (no pun intended), especially because that's how it was handled before 0.3.0. However, I agree that people could want it either way (ex. cloudpickle re-exports pickle.load on purpose) and so we'd have to provide the option to count them or not. Which gets complicated quickly, as you alluded to in your response. Perhaps it's simpler for this tool to take the official stance that "these situations count as exported functions, so be careful with * in your __init__.py."

IMO it's still valuable to try to repro, so that doppel-describe is consistent in its treatment of stdlib vs third-party functions.

@jameslamb
Copy link
Owner

Thanks for the summary! I didn't catch that anything in the 0.3.0 was changing previous behavior (other than bugfixes) because all my test packages use explicit imports (none have an __init__.py with *).

And I was inspired by examples like the cloudpickle one (the one on my mind was feather.

For anyone else arriving at this thread, the workaround if you want doppel-cli 0.3.0 to behave the same way 0.2.1 did:

  • (better, more effort) change uses of import * in __init__.py files to explicit imports like from <module> import <thing>
  • (worse, less effort) increase the --errors-allowed in doppel-test calls and wait for doppel-cli 0.3.1 to be released

@austin3dickey
Copy link
Author

Update: I removed all instances of import * from my package, and still am getting strange behavior. I can reproduce with a simple package:

setup.py
testpkg/
   __init__.py
   module.py

setup.py:

from setuptools import find_packages, setup
setup(
    name='testpkg',
    version='0.0.1',
    packages=find_packages(),
)

testpkg/__init__.py:

from .module import create_warning

testpkg/module.py:

from warnings import warn
def create_warning():
    print('uh oh')
    warn('not good')

After doing pip install -e . I am able to run doppel-describe -p testpkg -l python -d out/. But this gives

{
  "name": "testpkg [python]",
  "language": "python",
  "functions": {
    "create_warning": {
      "args": []
    },
    "warn": {
      "args": []
    }
  },
  "classes": {}
}

I would not expect warn to be there, because it's not exported in the __init__.py of the testpkg module. I think doppel-describe is looping through both the testpkg and testpkg.module modules, and collecting all top-level functions from both of them. Is this intended behavior?

@jameslamb
Copy link
Owner

@austin3dickey thanks for the producible example!

So, this is interesting. It is the current behavior that you don't have to explicitly import something in __init__.py for it to become part of the "API" from doppel-cli's perspective. I just tried with your exact setup, except changing module.py to this:

from warnings import warn
from requests import get
def create_warning():
    print('uh oh')
    warn('not good')

def shmeate_schmarning():
    print('closer to the edge')
    warn('im about to break')

And got

{
  "name": "testpkg [python]",
  "language": "python",
  "functions": {
    "create_warning": {
      "args": []
    },
    "shmeate_schmarning": {
      "args": []
    },
    "warn": {
      "args": []
    }
  },
  "classes": {}
}

shmeate_schmarning() making it through is expected, since from testpkg.module import shmeate_schmarning will still work. Currently the only way to hide something from doppel-cli's perspective is to use _ at the beginning of its name. We could discuss whether or not that is the right approach (it feels wrong) in a separate issue, but that has not changed recently.

As for imports from other packages...that's where this gets weird. Did you notice that get from from requests import get didn't make it through, but warn from from warnings import warn did? That is absolutely a bug.

I think I found the cause, too. this check is not sufficient:

# built-ins like 'min()' are not classes, functions, or modules,
# according to the previous checks, but if they're callable
# they should count as exported functions
elif _is_builtin(obj) and callable(obj):
    out["functions"][obj_name] = {
        "args": []
    }

I think it should be more like

# built-ins like 'min()' are not classes, functions, or modules,
# according to the previous checks, but if they're callable
# they should count as exported functions
elif _is_builtin(obj) and callable(obj):
    if not obj.__module__.startswith(PKG_NAME):
        _log_info("Callable '{}' is a built-in not included in this package's namespace. Skipping it.".format(obj.__name__))
        next
    out["functions"][obj_name] = {
        "args": []
    }

I'm working on a PR. Your reproducible example is just directly going to become a test case, thanks again for putting it together!

@jameslamb
Copy link
Owner

@austin3dickey ok I tried to add a fix in #161 . I'll leave it open for a a few days. Could you try doing what your'e doing from that branch and see if it fixes the issue?

If it does, I'll merge that and cut a 0.3.1 release.

@austin3dickey
Copy link
Author

Yes, your PR completely fixes it! Thanks for the quick turnaround :)

@jameslamb jameslamb added bug Something isn't working and removed enhancement New feature or request labels Mar 27, 2020
@jameslamb
Copy link
Owner

One addition to this.. @austin3dickey while working on #171 , I realized that one case is not working the way I expected. If you do something like this:

wrap_min = min

wrap_min() is not going to show up in the exported functions from doppel-describe. I didn't catch this on #161 because of a very subtle pytest bug. I had two test classes named TestPythonSpecific in the same module. The second one overrode the first, so the tests in the first one were never run. Another one of those situations, like running pytest on an empty directory, where pytest does something unexpected :(

This was found in #171 because flake8 raises a warning when you redefine something.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants