Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add standalone lib build with Python install #417

Merged
merged 1 commit into from
Feb 16, 2023

Conversation

jacobkahn
Copy link
Contributor

@jacobkahn jacobkahn commented Feb 13, 2023

Augments the Python build to also ship C++ library builds of KenLM that are dynamically usable by other C++ libs. Specifically, pip install . in the repo (or pip install git+https://github.com/kpu/kenlm, as it'll likely be most often used) now installs:

  • kenlm.cpython-[platform].{so,pyd} (as before) -- openable by Python
  • libkenlm.{so,dll,lib} [new] -- not loadable by Python but usable by C++ libraries.

In particular, this PR:

  • Moves the KENLM_MAX_ORDER CMake definition to the beginning of the build
  • Adds BuildStandalone.cmake, enabled by BUILD_PYTHON_STANDALONE, which builds libkenlm as a shared lib containing symbols from all project sources
  • Removes forcing building a -dynamiclib on macOS and let CMake handle this. MSVC will build a pyd (special DLL) anyways on Windows, so we need a cross-platform way to build shared libs that are dlopen'ed in C++ land by Python libs.
  • Adds the above custom build components to setup.py and run them with the regular cython extension build.
  • Build dependencies (setuptools, wheel, cmake) are all available from PyPI and are installed automatically.

@jacobkahn jacobkahn force-pushed the standalone_python_build_2 branch 2 times, most recently from 96ecd3d to ff55f1f Compare February 14, 2023 00:53
@jacobkahn jacobkahn marked this pull request as ready for review February 14, 2023 02:25
@jacobkahn
Copy link
Contributor Author

@kpu this should drastically reduce the number of "how do I install KenLM, I'm trying to do X" issues given that a lot of those downstream needs are coming from Python packages requiring it. Hope to merge this shortly -- let me know if you have any questions.

@kpu kpu merged commit a73dbec into kpu:master Feb 16, 2023
@ZJaume
Copy link

ZJaume commented Feb 24, 2023

@jacobkahn had you tested non-default max order like --install-option="--max_order 7" in the python installation? Build is now taking terrribly long for me. Also people is complaining about not being compiled correctly, but I'm not even able to check because it takes a lot of time to build.

May I suggest the use of scikit-build?

@jacobkahn
Copy link
Contributor Author

@ZJaume thanks for letting me know. This change doesn't affect the default Python library build at all, so all of those build options should stay exactly the same... I'll test those again now and see what's going on.

With respect to build speed, this quite literally performs the identical build twice once using the CMake version in order to produce dylib that C++ binaries can consume, so I'm not sure why build speed would suffer so significantly.

I'll spend some time on fixing this right now given those issues; I'll give scikit build a try as well. I'll tag you in the fix.

@jacobkahn
Copy link
Contributor Author

jacobkahn commented Feb 26, 2023

@ZJaume — after further investigation, the increase in build time is due to the confluence of multiple things:

  1. The use of pyproject.toml, as stipulated in PEP 517 and PEP 518, means builds occur in an isolated environment. This means that build dependencies need to be installed, which will increase the build time slightly as wheels for CMake and setuptools need to be downloaded. Isolated builds are the new standard in Python/pip, and other approaches to building will be deprecated. See related documentation.
  2. Configuring via argv/--install-option for pip builds is behavior that precludes using caching of build dependencies (Deprecate --install-options pypa/pip#11358), and is also not recommended. I was able to reproduce a significant build slowdown when using those options waiting for build dependencies to be installed.

In #420, I've changed the behavior to read from a MAX_ORDER environment variable, which is a simpler alternative to --install-option-based configuration. I've updated the documentation accordingly, which didn't document the --max-order install option as it was. If this breaks downstream dependencies, they should swap to using a PEP 517-compliant way of building and installing KenLM via pip. Let me know if you have any other problems here if you decide to switch away from the old commit and onto master.

I explored scikit-build briefly (https://github.com/jacobkahn/kenlm/tree/scikit_build_python), but had some problems. We can revisit this down the line.

cc @kpu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants