Many scancode dependencies are broken for Mac M1 #3205

Open
rnjudge opened this issue Jan 17, 2023 · 19 comments

@rnjudge

rnjudge commented Jan 17, 2023

Description

This is not necessarily a bug with Scancode, but a handful of the required dependencies are broken and cannot be installed on the Mac M1 platform. This is a huge blocker for anyone using this platform, as it is impossible to install scancode using pip. Perhaps the scancode docs should be updated to note that scancode is not usable on this platform?

How To Reproduce

Try to install scancode on a Mac M1 computer. In my case, I am running an Ubuntu VM using Fusion, but the same issues arise when I try to install scancode directly in my Mac shell (not in a Fusion VM).

(ternenv) rose@rose-vm:~/ternenv/tern$ pip install --upgrade pip setuptools wheel
(ternenv) rose@rose-vm:~/ternenv/tern$ pip install scancode-toolkit

Specific libraries with issues installing:
bitarray intbitset lxml pyahocorasick pycryptodome cffi
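All six packages above contain C extensions. pip installs such a package without compiling only when PyPI has a wheel whose platform tag matches the machine; otherwise it falls back to the source archive and needs a working compiler. As a rough sketch (the helper is hypothetical, not part of pip or scancode), the platform-tag pattern to look for in a package's download list can be derived from `uname`:

```shell
#!/bin/sh
# Hypothetical helper: print the wheel platform-tag pattern pip would need
# to find on PyPI for a given kernel name and machine (as printed by uname).
wheel_platform_hint() {
  os="$1"    # output of `uname -s`, e.g. Linux or Darwin
  arch="$2"  # output of `uname -m`, e.g. x86_64, aarch64 or arm64
  case "$os/$arch" in
    Linux/x86_64)  echo "manylinux*_x86_64" ;;
    Linux/aarch64) echo "manylinux*_aarch64" ;;
    Darwin/x86_64) echo "macosx_*_x86_64 or macosx_*_universal2" ;;
    Darwin/arm64)  echo "macosx_*_arm64 or macosx_*_universal2" ;;
    *)             echo "unknown: $os/$arch" ;;
  esac
}

wheel_platform_hint "$(uname -s)" "$(uname -m)"
```

On the Ubuntu VM in this report, `uname -s` and `uname -m` print `Linux` and `aarch64`, so the sketch prints `manylinux*_aarch64`; the lxml 4.9.2 wheel reported later in this thread matches that pattern.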

System configuration

  • What OS are you running on?
    macOS Monterey 12.2.1 (M1 chip)

  • What version of scancode-toolkit was used to generate the scan file?
    Attempting to install scancode_toolkit-31.2.4-cp310-none-any.whl

  • What installation method was used to install/run scancode? (pip/source download/other)
    pip

@rnjudge
Author

rnjudge commented Jan 17, 2023

Additionally, trying to install scancode with the [full] extra yields:

ERROR: Cannot install scancode-toolkit[full]==3.1.1, scancode-toolkit[full]==3.2.0 and scancode-toolkit[full]==3.2.3 because these package versions have conflicting dependencies.

The conflict is caused by:
    scancode-toolkit[full] 3.2.3 depends on typecode-libmagic; extra == "full"
    scancode-toolkit[full] 3.2.0 depends on extractcode-7z
    scancode-toolkit[full] 3.1.1 depends on extractcode-7z

@rnjudge
Author

rnjudge commented Jan 17, 2023

At least one of the issues I am seeing is being tracked here: #3169.

@pombredanne
Member

@rnjudge Thanks for the report! do you know if there is a free (as in beer) hosted CI service that runs Apple ARM silicon on macOS?

@Jeeppler
Copy link

Is there a big difference between the Apple ARM M1 chip and a normal ARM based chip?

Meaning, could it be enough to make everything work on an ARM chip (e.g. a Raspberry Pi or an AWS A1 instance)? Once it is working, one could test it manually on an M1 chip.

@pombredanne
Member

@Jeeppler re:

Is there a big difference between the Apple ARM M1 chip and a normal ARM based chip?

I have no idea... but I doubt that macOS can be (legally) installed on non-Apple hardware, so the issue may be part ARM and part macOS?
We could likely, with a bit of sweat, cross-compile for ARM on x86, but I do not know if we could get a binary that would run on a Mac.

Though, I recall now that I released macOS fat binary wheels for https://github.com/inveniosoftware/intbitset/ and https://github.com/WojciechMula/pyahocorasick in the last few days.

Could you test if these pre-built binaries work on your machine with a pip install?

@Jeeppler

@pombredanne I have no mac. I was just curious if the solution could be to compile the dependencies on Linux ARM and then try to run it on macOS.

@rnjudge has a macOS with M1 chip, as far as I understand. Only she can help you test.

@rnjudge
Author

rnjudge commented Jan 20, 2023

Though, I recall now that I released macOS fat binary wheels for https://github.com/inveniosoftware/intbitset/ and https://github.com/WojciechMula/pyahocorasick in the last few days.

Could you test if these pre-built binaries work on your machine with a pip install?

I installed the latest pyahocorasick and intbitset libraries with pip but still can't install scancode because building the wheel for intbitset fails when I run pip install scancode-toolkit. What's strange is that when I install intbitset on its own, the wheel is created properly:

(ternenv) rose@rose-vm:~/ternenv/scancode-toolkit$ pip3 install intbitset
Collecting intbitset
  Using cached intbitset-3.0.2.tar.gz (152 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: intbitset
  Building wheel for intbitset (setup.py) ... done
  Created wheel for intbitset: filename=intbitset-3.0.2-cp310-cp310-linux_aarch64.whl size=322885 sha256=b70b76050f66ed1a11940a5ce4c4ddf3592c10c476bc971c364ce587b851eb5b
  Stored in directory: /home/rose/.cache/pip/wheels/bc/f1/4c/1691139855e3477383cd895bdf4a69875d0ae83119ea20d38f
Successfully built intbitset
Installing collected packages: intbitset
Successfully installed intbitset-3.0.2

When running pip install scancode-toolkit, however, the intbitset build fails (end of the log):
      running build_ext
      building 'intbitset' extension
      creating build/temp.linux-aarch64-cpython-310
      creating build/temp.linux-aarch64-cpython-310/intbitset
      aarch64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/home/rose/ternenv/include -I/usr/include/python3.10 -c intbitset/intbitset.c -o build/temp.linux-aarch64-cpython-310/intbitset/intbitset.o -O3 -march=core2 -mtune=native
      intbitset/intbitset.c: In function ‘__pyx_pf_9intbitset_9intbitset___cinit__’:
      intbitset/intbitset.c:2267:13: warning: ‘PyObject_AsReadBuffer’ is deprecated [-Wdeprecated-declarations]
       2267 |             __pyx_t_4 = ((PyObject_AsReadBuffer(__pyx_v_tmp, (&__pyx_v_buf), (&__pyx_v_size)) < 0) != 0);
            |             ^~~~~~~~~
      In file included from /usr/include/python3.10/genobject.h:12,
                       from /usr/include/python3.10/Python.h:110,
                       from intbitset/intbitset.c:4:
      /usr/include/python3.10/abstract.h:343:17: note: declared here
        343 | PyAPI_FUNC(int) PyObject_AsReadBuffer(PyObject *obj,
            |                 ^~~~~~~~~~~~~~~~~~~~~
      intbitset/intbitset.c: In function ‘__pyx_f_9intbitset_9intbitset_fastload’:
      intbitset/intbitset.c:9543:7: warning: ‘PyObject_AsReadBuffer’ is deprecated [-Wdeprecated-declarations]
       9543 |       __pyx_t_9 = ((PyObject_AsReadBuffer(__pyx_v_tmp, (&__pyx_v_buf), (&__pyx_v_size)) < 0) != 0);
            |       ^~~~~~~~~
      /usr/include/python3.10/abstract.h:343:17: note: declared here
        343 | PyAPI_FUNC(int) PyObject_AsReadBuffer(PyObject *obj,
            |                 ^~~~~~~~~~~~~~~~~~~~~
      Assembler messages:
      Error: unknown architecture `core2'
      
      Error: unrecognized option -march=core2
      error: command '/usr/bin/aarch64-linux-gnu-gcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  Rolling back uninstall of intbitset
  Moving to /home/rose/ternenv/lib/python3.10/site-packages/__pycache__/intbitset_helper.cpython-310.pyc
   from /tmp/pip-uninstall-u9t5ci_4/intbitset_helper.cpython-310.pyc
  Moving to /home/rose/ternenv/lib/python3.10/site-packages/__pycache__/intbitset_version.cpython-310.pyc
   from /tmp/pip-uninstall-u9t5ci_4/intbitset_version.cpython-310.pyc
  Moving to /home/rose/ternenv/lib/python3.10/site-packages/intbitset-3.0.2.dist-info/
   from /home/rose/ternenv/lib/python3.10/site-packages/~ntbitset-3.0.2.dist-info
  Moving to /home/rose/ternenv/lib/python3.10/site-packages/intbitset.cpython-310-aarch64-linux-gnu.so
   from /tmp/pip-uninstall-l9czslo5/intbitset.cpython-310-aarch64-linux-gnu.so
  Moving to /home/rose/ternenv/lib/python3.10/site-packages/intbitset_helper.py
   from /tmp/pip-uninstall-l9czslo5/intbitset_helper.py
  Moving to /home/rose/ternenv/lib/python3.10/site-packages/intbitset_version.py
   from /tmp/pip-uninstall-l9czslo5/intbitset_version.py
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> intbitset
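The actual failure is in the last lines of the log: the compiler command ends with `-O3 -march=core2 -mtune=native`, and `core2` names an Intel x86 target, so the aarch64 toolchain rejects it ("unknown architecture `core2'"). A minimal sketch of how a build script could add such flags only on x86 machines; the function name is hypothetical and this is not intbitset's actual build code:

```shell
#!/bin/sh
# Hypothetical helper: print extra optimisation flags only for x86 CPUs.
# -march=core2 names an Intel x86 target, so ARM compilers reject it.
x86_only_cflags() {
  case "$1" in                        # $1: output of `uname -m`
    x86_64|amd64|i386|i686)
      printf '%s\n' "-march=core2 -mtune=native" ;;
    *)
      printf '%s\n' "" ;;             # aarch64, arm64, ...: add nothing
  esac
}

CFLAGS="-O3 $(x86_only_cflags "$(uname -m)")"
printf 'CFLAGS=%s\n' "$CFLAGS"
```

On the aarch64 VM from this report, the guard would leave plain `-O3`, which the aarch64 compiler accepts.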

@pombredanne
Member

pombredanne commented Jan 21, 2023

@rnjudge I had NOT paid attention to all the details you provided... you are not running macOS on an ARM chip ;) .... you are running Linux on ARM, right?

In this case this is totally untested, and I have no pre-built wheels for pyahocorasick or intbitset (nor for several other, more common libs), but I see lxml has aarch64 builds.

Can you try a pip install --upgrade lxml and tell me if the wheel is fetched as-is or if it is built from source?

@rnjudge
Author

rnjudge commented Jan 23, 2023

Apologies for not being clearer, @pombredanne! As I noted in my original issue, I do see the same install issues in both environments (M1 shell vs Ubuntu shell on M1 via Fusion), but I am most interested in getting this working for my Linux environment, given that is where I do all my development for Tern.

I ran the pip upgrade command for lxml but I already had the latest version installed so it didn't try to fetch or build from source. I uninstalled lxml and re-installed it to see what it would pull and it fetched and installed the wheel as is:

(ternenv) rose@rose-vm:~/ternenv$ pip install lxml --no-cache-dir
Collecting lxml
  Downloading lxml-4.9.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.manylinux_2_24_aarch64.whl (6.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.8/6.8 MB 25.4 MB/s eta 0:00:00
Installing collected packages: lxml
Successfully installed lxml-4.9.2
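The file name pip reports is enough to tell whether it is installing a prebuilt binary or compiling: a `.whl` whose platform tag names an ARM architecture is used as-is, while a `.tar.gz` source archive (like `intbitset-3.0.2.tar.gz` earlier in this thread) has to be built locally. A small sketch of that check, with a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper: classify a distribution file name printed by pip.
dist_kind() {
  case "$1" in
    *-any.whl)
      echo "pure-Python wheel" ;;                 # no compiled code
    *aarch64*.whl|*arm64*.whl|*universal2*.whl)
      echo "ARM binary wheel" ;;                  # installs as-is on ARM
    *.whl)
      echo "binary wheel for another platform" ;;
    *.tar.gz|*.zip)
      echo "source archive, needs a compiler" ;;  # pip builds it locally
    *)
      echo "unknown" ;;
  esac
}
```

To make pip refuse source builds outright, its `--only-binary=:all:` option can be added: `pip install --only-binary=:all: intbitset` would then stop with an error on a machine where only the source archive is available, instead of trying to compile it.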

@stevespringett

@pombredanne, my experience on an M1 (both native macOS and Ubuntu w/Parallels) is exactly the same as what @rnjudge experiences. I even have the same issue with intbitset.

@rnjudge
Author

rnjudge commented May 19, 2023

@pombredanne any update on this? This is a large blocker from my POV and one that is making me rethink Tern's integration with Scancode given that there is an increasing developer community (myself included) that still cannot install Scancode :/

@pombredanne
Member

pombredanne commented May 21, 2023

@rnjudge We support running ScanCode Toolkit on macOS M1 using x86 Rosetta emulation, as documented in the installation instructions.

Otherwise, if you have a properly installed toolchain, you should be able to compile intbitset and pyahocorasick from source, and there are ways to create plugins and configure typecode and extractcode to use OS-provided binaries for these native components. This is rather involved and takes some time.

You can also try to use scancode-toolkit-mini for a pip installation that has fewer dependencies (and also fewer features).

So the bottom line is that this is not a trivial thing to do, AND things otherwise work fine using the documented Rosetta/x86 emulation installation with an application archive.

To "support natively macOS on M1 and Linux on ARM" -- we would need all these covered:

    • We need someone to step up to sponsor and financially support a paid CI/CD that provides ARM on Linux and ARM on macOS. Of note, CircleCI announced support for Apple ARM in their paid plans a few weeks ago (https://circleci.com/blog/m1-mac-resource-class/). They seem to also have ARM support for Linux.
    • We need to pre-build and release all the native deps, possibly including a few that are not under our control, forking these as needed. And this has to be redone each time there are updates.
    • We need to adjust the CI/CD and release scripts to also build and test on ARM for macOS and Linux, test all this to ensure it works correctly, and amend the tests if needed.
    • Whoever steps in to help, by either doing or funding, needs to commit for the long term, as this is not just a one-time effort.

A rough estimate is that this is going to take 15 to 20 days of initial setup work and about 1 day per month afterwards to maintain and support it. And the CI/CD should be in the $50/month range.
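Taken together, these figures give a rough first-year total. A small sketch of the arithmetic, assuming 12 months of maintenance and CI in the first year on top of the setup work (the helper names are hypothetical):

```shell
#!/bin/sh
# Rough first-year totals from the estimate above, assuming:
# setup days + 1 maintenance day per month for 12 months,
# and CI/CD at about $50/month for 12 months.
first_year_days() {
  setup_days="$1"
  echo $((setup_days + 12 * 1))
}
first_year_ci_dollars() {
  echo $((12 * 50))
}

echo "low:  $(first_year_days 15) days"
echo "high: $(first_year_days 20) days"
echo "CI:   \$$(first_year_ci_dollars) per year"
```

Under those assumptions, the first year comes to roughly 27 to 32 days of work plus about $600 of CI/CD.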

Since this requires quite a bit of work and demands using extra paid resources for a CI/CD, are you willing to help there in cash or in kind? I cannot see how this can happen without funding and help.

@heliocastro I know you are interested too and started some work there; @stevespringett, you too? So maybe there is a way to pool efforts to fund this?

@pombredanne
Member

@rnjudge note also that we have a GSoC student selected to help in this area, with a project to provide slower, pure-Python alternatives for three core libraries: intbitset, pyahocorasick and lxml. This will run over the summer and is only a partial solution.

@enricozb

responding to ping. I can't contribute. I was just trying to help solve that specific wheel issue, I'm not doing anything on M1 or ARM.

@rnjudge
Author

rnjudge commented Jun 1, 2023

@pombredanne I was able to install scancode-toolkit-mini and will suggest this to Tern's M1 users going forward. I unfortunately can't pledge other resources right now.

@pombredanne
Member

@rnjudge re:

I was able to install scancode-toolkit-mini and will suggest this to Tern's M1 users going forward.

This is awesome and this was designed for this use case too! Glad you got it to work.

I unfortunately can't pledge other resources right now.

that's cool too.

@jayanth-kumar-morem

Facing the same issue.

I'm on a Mac M1 and I am using an Ubuntu Docker shell with a volume mounted:

docker run -it -v <path>:<path> ubuntu:22.04

Inside the Docker shell, when I install the prerequisites and run ./configure --clean && ./configure --dev, the dependencies seem to have conflicts:

Collecting regipy>=3.1.0 (from scancode-toolkit==32.0.8)
  Using cached regipy-3.1.0-py3-none-any.whl (65 kB)
INFO: pip is looking at multiple versions of scancode-toolkit[packages,testing] to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install scancode-toolkit[packages,testing]==32.0.8 because these package versions have conflicting dependencies.

The conflict is caused by:
    scancode-toolkit[packages,testing] 32.0.8 depends on packagedcode-msitools>=0.101.210706; platform_system == "Linux" and extra == "packages"
    The user requested (constraint) packagedcode-msitools==0.101.210706

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
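On Apple silicon, `docker run ubuntu:22.04` pulls the arm64 variant of the image, so the container is Linux on aarch64 and pip looks for aarch64 wheels. A plausible cause of the `packagedcode-msitools` conflict is that no aarch64 build of the pinned version exists, though that is an assumption not verified here. Docker's `--platform` option can instead request the amd64 image, which Docker Desktop runs under (slower) emulation. A sketch with a hypothetical helper that prints such a command:

```shell
#!/bin/sh
# Hypothetical helper: print a docker run command that forces an x86_64
# (linux/amd64) Ubuntu container, mounting one host path at the same path.
amd64_ubuntu_cmd() {
  path="$1"   # host directory to mount (assumed to contain no spaces)
  printf 'docker run --platform linux/amd64 -it -v %s:%s ubuntu:22.04\n' "$path" "$path"
}

amd64_ubuntu_cmd "$PWD"
# Inside the resulting container, `uname -m` should report x86_64.
```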

@pombredanne
Member

@jayanth-kumar-morem the workaround is to enable x86 Rosetta emulation in your shell (this is also done automatically when you use the app archive's root "scancode" script: https://github.com/nexB/scancode-toolkit/blob/f70bbb7d9d9bab40a9d504e664bc945b6a1630e8/scancode#L124), or to use the arch command to force a one-off x86 mode, as that script does.
And of course getting a better way to automate the build of all our native deps on ARM in general would be awesome!
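The `arch` approach described above can be sketched as follows, for use in a macOS shell on the host (not inside a Linux container). `arch -x86_64` and the `sysctl.proc_translated` key are standard macOS features; the function names are hypothetical, and Rosetta 2 must already be installed:

```shell
#!/bin/sh
# Hypothetical helper: succeed only for macOS on Apple silicon, where
# x86_64-only native dependencies need Rosetta emulation.
needs_rosetta() {
  [ "$1" = "Darwin" ] && [ "$2" = "arm64" ]   # $1: uname -s, $2: uname -m
}

# Hypothetical helper: replace the current shell with an x86_64 login shell.
# Inside the new shell, `uname -m` prints x86_64 and
# `sysctl -n sysctl.proc_translated` prints 1.
enter_x86_shell() {
  if needs_rosetta "$(uname -s)" "$(uname -m)"; then
    exec arch -x86_64 "${SHELL:-/bin/zsh}" -l
  fi
  echo "Not macOS on arm64 (or already translated); nothing to do."
}
```

After sourcing the snippet, calling `enter_x86_shell` once is enough; running it again inside the translated shell is a no-op, because `uname -m` there already reports x86_64.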

6 participants