-
Notifications
You must be signed in to change notification settings - Fork 623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sklearn instrumentation package #1054
Conversation
39a8af0
to
342c88c
Compare
342c88c
to
1f4cb32
Compare
hello and welcome @crflynn! Please sign the CLA, I'll review afterwards. |
instrumentation/opentelemetry-instrumentation-sklearn/CHANGELOG.md
Outdated
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Outdated
Show resolved
Hide resolved
I'm not sure how to get the docs to build. Sphinx doesn't seem to want to cooperate with the type hints I've provided. I have a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to get the docs to build. Sphinx doesn't seem to want to cooperate with the type hints I've provided. I have a
nitpick_ignore
for the sklearn BaseEstimator but that seems to not be enough.
I looked into this issue, apparently sphinx can't find a class for some reason. It seems like a similar issue as this one, which has a workaround here. Nevertheless, I tried it and it did not solve the documentation problem. Apparently the root cause between this issue is not the same as the one the workaround is for, since sklearn.base.BaseEstimator
is not being imported into any other module in the sklearn
for Sphinx to import from this new location. Will look further into this.
|
||
class TestSklearn(TestBase): | ||
def test_package_instrumentation(self): | ||
ski = SklearnInstrumentor(packages=["sklearn"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to pass ["sklearn"]
to the constructor here? I am under the impression that this is done by default because of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not necessary, no. I can remove it.
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Show resolved
Hide resolved
|
||
The base class' new method passes args and kwargs. We override because | ||
we init the class with configuration and Python raises TypeError when | ||
additional arguments are passed to the object.__new__() method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like an issue in the base class implementation of the singleton mechanism, actually... will have to look into this.
for method_name in self.methods: | ||
if hasattr(klass, method_name): | ||
self._instrument_class_method( | ||
estimator=klass, method_name=method_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a concern, what if there is another instrumentation also installed for the same package that is passed as an argument for this instrumentor? Would that cause double instrumentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only way it wouldn't is if that other instrumentation package used the same strategy, i.e. resulted in a True return value from the _check_instrumented
method. Order of instrumentation might also matter in that regard.
Related, one of the things I tried to do here was abstract the span decorator as an instrumentor arg with the idea that multiple instrumentations (say otel + datadog) could be applied to estimators by just instantiating multiple instrumentors with different decorators. However, this isn't feasible with the current code because of how instrumentation is applied with the _original_xxx
attributes and the _check_instrumented
method.
After iterating for a few days I'm a bit stuck on instrumenting the library. Particularly I'm having a hard time patching methods which are class attributes (rather than instance attributes) because of the if_delegate_has_method decorator which exists on some metaestimators. This decorator acts as a conditional property which delegates if and only specific instance attributes exist. The problem is here where
I could omit these methods, as I do for properties, but I believe spanning these methods is important because they delegate to other internal estimators and it would obfuscate some of the model hierarchy if they were missing. |
Hello! I have a PR to move some files you have in this PR to the Contrib repo, please let me know if this gets merged before the PR in the Contrib repo. Please see https://github.com/open-telemetry/opentelemetry-python-contrib/pulls/ |
Is there value in having the per model instrumentation way have all the methods patched, but having the whole package instrumentation omit some of these methods? We can just update our documentations to reflect this. This way at least we have functionality for both. |
d90763f
to
de241d3
Compare
I've got a solution for the autoinstrumentation delegation problems in the latest push, which passes the tests locally. It seems though that
I think this is a lot closer now. Let me know how we should go from here. |
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Outdated
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Outdated
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Outdated
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Outdated
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Outdated
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Show resolved
Hide resolved
.../opentelemetry-instrumentation-sklearn/src/opentelemetry/instrumentation/sklearn/__init__.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks for the contrib.
This PR probably going to be affected the migration. |
Should I just move this over to the contrib repo? |
@crflynn Yes please! That would be super helpful. The contrib repo is ready to accept new PRs like these :) |
@NathanielRN I'll work on that later today |
@ocelotl can you review again, would like to get this merged |
Description
Provides an opentelemetry instrumentation package for sklearn models, instrumenting internal spans at the estimator level. The motivation is to provide observability into machine learning models that run for realtime predictive applications that have many complex transformers and predictors.
The instrumentor adds spans to sklearn estimators according to a set of default estimator methods (namely
fit
,predict
,predict_proba
andtransform
) and other configuration parameters that determine how spans are implemented through the model hierarchy. The default configuration also handlesPipeline
andFeatureUnion
hierarchies. Since sklearn's API is easily extended, the configuration parameters allow for custom model hierarchy traversal, allowing spans to be implemented in custom estimators as well.Type of change
How Has This Been Tested?
The package provides two tests for the implementation.
test_span_properties
uses an sklearn model fixture and asserts span names, kinds, and parent-child relationships.test_attrib_config
uses the same fixture to assert implementation of non-default configuration parameters.I also have an example implementation here: https://github.com/crflynn/opentelemetry-sklearn
Checklist: