
Dependency version check #76

Open
joshlk opened this issue Aug 1, 2022 · 8 comments

Comments

@joshlk
Collaborator

joshlk commented Aug 1, 2022

Hey,

I have a source file with a dependency, and I want to force cppimport to recompile the source when the dependency's version changes. Unfortunately, the dependency's header files can remain unchanged when the version changes, so I can't rely on them.

I have a Python function that obtains the dependency's version. What's the best way to force cppimport to recompile given this information? Would it be possible to add functionality so you can pass a string to be included in the checksum calculation (cppimport.checksum._calc_cur_checksum)?

@tbenthompson
Owner

Would it be feasible to do something like this:

version_str = '1.0.1'
with open('version', 'w') as f:
    f.write(version_str)
cfg['dependencies'].append('version')

The 'dependencies' configuration variable doesn't actually do anything with the contents so the paths provided can be any sort of file you want.
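
To illustrate, a minimal sketch (the helper and `my_get_version` are illustrative names, not part of cppimport): since only the file's bytes feed the checksum, the file can contain any string, e.g. a version number obtained from Python.

```python
# Sketch: cppimport only hashes the dependency file's bytes, so the
# file can contain any string, e.g. a version number.
def write_version_file(version: str, path: str = "version") -> str:
    """Write the version string to `path` and return the path."""
    with open(path, "w") as f:
        f.write(version)
    return path

# In the module's template block you would then register it:
# cfg['dependencies'].append(write_version_file(my_get_version()))
```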

@joshlk
Collaborator Author

joshlk commented Aug 1, 2022

Yeah, that could work. That's not easily doable at the moment, right?

@tbenthompson
Owner

The snippet that I wrote above would work with no changes. In practice, I'd probably write out the version file as part of the process that updates dependencies, or do something like pip freeze > deps (substituting the equivalent list of package versions for your setup). Then, the cfg['dependencies'].append(version_or_deps_filename) would go either in the preamble of the .cpp file or in the code that imports the file.
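
Concretely, writing the pinned versions from Python might look like this sketch (the helper name and the `deps` filename are arbitrary choices, not a cppimport convention):

```python
import subprocess
import sys

# Sketch: dump the environment's pinned packages into a file that the
# .cpp preamble then lists in cfg['dependencies'].
def write_deps_file(deps_path: str = "deps") -> str:
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout
    with open(deps_path, "w") as f:
        f.write(frozen)
    return deps_path
```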

Does that make sense? Am I missing a piece?

@joshlk
Collaborator Author

joshlk commented Aug 2, 2022

Oh, do you mean to add this to the template comment in the source file?

But would that trigger a rebuild, given that the template is only read once it's determined that a rebuild is needed?

@tbenthompson
Owner

Here's a more explicit version of what I'm imagining. Two files:

// cppimport - my_module.cpp

// your C++ code here ...

/*
<%
cfg['dependencies'].append('dependency_versions.lock')
%>
*/

# python_file.py
import cppimport.import_hook
import my_module
# your python code here ...

This will result in a rebuild any time the dependency_versions.lock file is updated. The list of dependencies is stored in an appendix to the shared library.

Then, elsewhere, whenever you want to update the dependency versions, you write to that file. How exactly you write to it will depend on your exact application, but one application might be something like:

# update_dependencies.sh
pip install -U package_names
pip freeze > dependency_versions.lock

Is this useful?

@joshlk
Collaborator Author

joshlk commented Aug 4, 2022

Thanks - the above works, but it would be better if the dependency file were generated on the fly. This code doesn't seem to work:

import hashlib
from pathlib import Path
import os
from cppimport import build_filepath

cpp_code = """// cppimport

#include <pybind11/pybind11.h>

int add(int i, int j) {
    return i + j;
}

PYBIND11_MODULE(source, m) {
    m.def("add", &add);
}

/*
<%
import random
print('Running template')
with open('versionsX', 'w') as f:
    f.write(str(random.random()))
cfg['dependencies'].append('versionsX')
setup_pybind11(cfg)
%>
*/
"""

def md5_file_hash(path: str) -> str:
    # File data + creation time + modified time
    data = Path(path).read_bytes() \
            + bytes(str(os.path.getctime(path)), 'utf8') \
            + bytes(str(os.path.getmtime(path)), 'utf8')
    return hashlib.md5(data).hexdigest()

def test_load_lib_dep_change():
    with open('source.cpp', 'w') as f:
        f.write(cpp_code)
    
    # Compile first time
    print('First compile')
    binary_path = build_filepath('source.cpp')
    assert os.path.exists(binary_path)
    binary_hash = md5_file_hash(binary_path)

    # Compile again
    print('Second compile')
    binary_path = build_filepath('source.cpp')
    assert os.path.exists(binary_path)
    assert binary_hash != md5_file_hash(binary_path)

Here the dependency file is generated in the template. From looking at the cppimport code, I don't understand why the template isn't run a second time.

@tbenthompson
Owner

Yes, it's intended that the template is not run a second time here. You can see the check for whether to enter the build step here:

if is_build_needed(module_data) or not try_load(module_data):

Currently, if the dependencies haven't changed and the extension seems built and importable then the expected behavior is to import the extension without re-running the template.

I'd be open to changing the behavior so that we always run the templates on import, or at least providing an option to run the template. One thing I'd be curious about is the performance impact of running the template during every import. Are we adding 500 µs or 50 ms to the import time in the average case?

Another related issue here is that the cppimport configuration is currently unpleasantly global. If we're adding another configuration option, it might be time to either make the configuration something that can be passed to the imp(...) function or add a built-in configuration context manager so that configuration variables can easily be set just for the duration of a single import.
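
One hypothetical shape for such a context manager (purely illustrative; the `settings` dict and the `rerun_templates` option stand in for cppimport's configuration and do not exist today):

```python
from contextlib import contextmanager

# Illustrative stand-in for cppimport's global configuration.
settings = {"force_rebuild": False, "rerun_templates": False}

@contextmanager
def config(**overrides):
    """Temporarily override configuration for the duration of a block."""
    previous = {k: settings[k] for k in overrides}
    settings.update(overrides)
    try:
        yield settings
    finally:
        # Restore the prior values even if the block raises.
        settings.update(previous)
```

An import could then opt in locally, e.g. `with config(rerun_templates=True): import my_module`, without mutating global state for the rest of the program.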

@joshlk
Collaborator Author

joshlk commented Aug 9, 2022

OK, I tested it here:

scratch.py:

import os
from cppimport.importer import is_build_needed, setup_module_data
from cppimport.templating import run_templating

cpp_code = """// cppimport

#include <pybind11/pybind11.h>

int add(int i, int j) {
    return i + j;
}

PYBIND11_MODULE(source, m) {
    m.def("add", &add);
}

/*
<%
import random
with open('versionsX', 'w') as f:
    f.write(str(0))
cfg['dependencies'].append('versionsX')
setup_pybind11(cfg)
%>
*/
"""

with open('source.cpp', 'w') as f:
    f.write(cpp_code)

filepath = os.path.abspath('source.cpp')
fullname = os.path.splitext(os.path.basename(filepath))[0]
module_data = setup_module_data(fullname, filepath)
is_build_needed(module_data)
# run_templating(module_data)   # << uncomment for comparison

# No templating
$ python3 -m timeit -n 1000 "exec(open('scratch.py').read())"
1000 loops, best of 5: 494 usec per loop

# With templating
$ python3 -m timeit -n 1000 "exec(open('scratch.py').read())"
1000 loops, best of 5: 2.04 msec per loop

So it's around four times slower, adding roughly 1.5 ms per import.


I think for our purpose it should be ok to write to the dependency file before importing the module. I will explore that option.
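
A sketch of that approach (`get_dependency_version` is a stand-in for the function that queries the real dependency; the `versionsX` filename matches the snippet above):

```python
def get_dependency_version() -> str:
    # Stand-in: in practice this would query the actual dependency.
    return "1.0.1"

def write_dependency_version(path: str = "versionsX") -> str:
    """Refresh the dependency file so a version bump changes its bytes
    and therefore invalidates cppimport's checksum."""
    with open(path, "w") as f:
        f.write(get_dependency_version())
    return path

# Before importing:
# write_dependency_version()
# import cppimport; my_module = cppimport.imp("my_module")
```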

Thanks for all the help!
