Observability Helper Library #23

h2floh · 2020-11-23T04:26:27Z

Implements: #17

Started from MLOpsPython Observability PR at commit and made some smaller modifications:

Added tracing features for logs and http calls
Added a "Console Logger" for local debugging/development
Added exception logging
Added custom properties to get information about the AML run into the logs/traces/exceptions
Added sampling rates and log level as env variables
Moved the namespace to ml_service, since it depends on ml_service.util.env_variables anyway
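
For readers who have not opened the diff, a console logger for local debugging could look roughly like the sketch below. This is an illustration under assumptions, not the code in this PR: the class name `ConsoleLogger` is hypothetical, and the method names `log`, `exception`, and `log_metric` are only borrowed from the names mentioned later in this thread.

```python
import sys
import traceback
from typing import Optional, TextIO


class ConsoleLogger:
    """Hypothetical logger that writes everything to a text stream."""

    def __init__(self, stream: Optional[TextIO] = None) -> None:
        # Default to stdout so output shows up in a local terminal.
        self.stream = stream if stream is not None else sys.stdout

    def log(self, message: str, level: str = "INFO") -> None:
        # Plain text line with the level as a prefix.
        print(f"[{level}] {message}", file=self.stream)

    def log_metric(self, name: str, value: float) -> None:
        # Metrics are only printed here; nothing is sent to AML or App Insights.
        print(f"[METRIC] {name}={value}", file=self.stream)

    def exception(self, exc: BaseException) -> None:
        # Print the exception type and message, then its traceback.
        print(f"[ERROR] {type(exc).__name__}: {exc}", file=self.stream)
        traceback.print_exception(
            type(exc), exc, exc.__traceback__, file=self.stream
        )
```

Passing an `io.StringIO()` as `stream` makes such a logger easy to assert against in unit tests.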

IMPORTANT: For the pipeline step scripts to be able to use the logger, the full repo needs to be made available. TRAIN_SCRIPT_PATH is therefore now set to the repo root ./ and all script variables need to be referenced from there, e.g. ml_model/preprocessing/process_aml.py

Disclaimer

Unit tests have not yet been updated or extended
ML metrics logging to AML has not yet been tested

Signed-off-by: Florian Wagner <[email protected]>

tarockey

This looks great! I really like the integration of the 3 different logging options.
One thing to consider (or maybe I missed it) would be to add the ability to specify which logger a single message goes to. For example, I generally want to log all metrics to both Azure ML and App Insights, but maybe I just want to send exceptions to App Insights and not Azure ML.

Also, be careful with having the TRAIN_SCRIPT_ROOT being set to ./, as that means that every file is uploaded to AzureML for each run - it is worth digging into .amlignore, or looking into another route for accessing the logger (maybe publishing a wheel?)

h2floh · 2020-12-14T05:00:05Z

Hey @tarockey

One thing to consider (or maybe I missed it) would be to add the ability to specify which logger a single message goes to. For example, I generally want to log all metrics to both Azure ML and App Insights, but maybe I just want to send exceptions to App Insights and not Azure ML.

The Azure ML logger is mainly just a "print to console" for .log and .exception; the only difference is that log_metric uses the AML mechanism to record metrics directly on the run within AMLS. This feature is copied more or less as is from the MLOpsPython PR.

You probably want to have the exceptions logged both in the AML log text files and in App Insights. But I am open to adding some flags.
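
One possible shape for such flags is an optional list of target loggers per call. The sketch below is only an assumption about how this could be done, not what the PR implements: `Observability`, `RecordingLogger`, the `targets` parameter, and the logger names `aml` and `appinsights` are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional


class RecordingLogger:
    """Stand-in for a real backend; it only stores what it receives."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def log(self, message: str) -> None:
        self.messages.append(message)


class Observability:
    """Hypothetical facade that fans a message out to named loggers."""

    def __init__(self, loggers: Dict[str, RecordingLogger]) -> None:
        self.loggers = loggers

    def log(self, message: str,
            targets: Optional[Iterable[str]] = None) -> None:
        # No targets given: keep the default and send to every logger.
        names = list(self.loggers) if targets is None else list(targets)
        for name in names:
            # An unknown name raises KeyError, so typos fail loudly.
            self.loggers[name].log(message)


aml = RecordingLogger()
app_insights = RecordingLogger()
obs = Observability({"aml": aml, "appinsights": app_insights})

obs.log("accuracy=0.93")                              # goes to both
obs.log("training failed", targets=["appinsights"])   # App Insights only
```

Defaulting to all loggers keeps existing call sites unchanged, while callers that care can restrict a single message to one backend.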

Also, be careful with having the TRAIN_SCRIPT_ROOT being set to ./, as that means that every file is uploaded to AzureML for each run - it is worth digging into .amlignore, or looking into another route for accessing the logger (maybe publishing a wheel?)

Yeah, thanks for catching that too :) This is the main drawback, and we are already considering packaging the observability lib. We therefore probably won't merge this PR, but will first refactor the repo structure into folders for different examples and tools. The planned structure is:

samples
- image-classification-tensorflow
  - .pipelines
  - docs
  - model
  - ml_service
  - ...
- object-detection-yolo
- non-python
- ...
common
- observability-lib <-- create a package which can be imported by the samples
- docs
  - variable management
  - security best practice
    --- ...
readme

kenakamu · 2021-01-05T05:08:27Z

ml_service/util/logger/app_insights_logger.py

+
+        # Prepare integrations and initialize tracer
+        config_integration.trace_integrations(['httplib', 'logging'])
+        texporter = AzureExporter(connection_string=self.


Is there any reason to separate lines here?

kenakamu · 2021-01-05T05:11:18Z

ml_service/util/logger/app_insights_logger.py

+        # Create AppInsights Handler and set log format
+        self.logger = logging.getLogger(__name__)
+        self.logger.setLevel(
+            getattr(logging, self.env.log_level.upper(), "WARNING"))


Just curious: why is the default set to WARNING when we set DEBUG/INFO in env? Do we want to default to the lowest level instead?
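
For context on the question above: the third argument of `getattr` is only used when the requested attribute does not exist, so a valid name such as DEBUG or INFO still wins over the fallback. A minimal sketch of that lookup, assuming a hypothetical `LOG_LEVEL` environment variable and a hypothetical helper `resolve_log_level` (neither is taken from the PR):

```python
import logging
import os


def resolve_log_level(name: str, default: int = logging.WARNING) -> int:
    """Map a level name such as 'debug' to its numeric logging level."""
    level = getattr(logging, name.upper(), None)
    # Guard against attributes that are not levels, e.g. BASIC_FORMAT,
    # and against unknown or empty names.
    return level if isinstance(level, int) else default


logger = logging.getLogger("observability_example")
# LOG_LEVEL is an assumed variable name; unset means "use the default".
logger.setLevel(resolve_log_level(os.environ.get("LOG_LEVEL", "")))
```

Whether the fallback should be WARNING or the lowest level is a design choice; passing `default=logging.DEBUG` would give the more verbose behaviour.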

kenakamu · 2021-01-05T05:18:27Z

ml_service/util/logger/logger_interface.py

+        environment variable  is set --> build_id
+        - Else --> generate a unique id
+
+        Sets also the custom context dimensions based on On or Offline run


on Online or Offline?

kenakamu · 2021-01-05T05:20:13Z

ml_service/util/logger/logger_interface.py

+        if not run.id.startswith(self.OFFLINE_RUN):
+            run_id = run.id
+            self.custom_dimensions = {
+                'custom_dimensions': {


Adding the run number is useful when displaying the results, since a GUID does not make it easy to identify which run it is.
Adding the workspace is also useful when this module is used in multiple workspaces that log into the same App Insights instance.
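
A sketch of what the extended dimensions could look like. The `custom_dimensions` key comes from the snippet above; the helper name `build_custom_dimensions`, the dimension names, and the `OfflineRun` prefix value are assumptions for illustration. In the Azure ML SDK the values would presumably come from the run object (for example `run.number` and `run.experiment.workspace.name`), which is not exercised here.

```python
import logging
from typing import Dict, Optional

OFFLINE_RUN = "OfflineRun"  # assumed prefix of offline run ids


def build_custom_dimensions(
    run_id: str,
    run_number: Optional[int] = None,
    workspace_name: Optional[str] = None,
) -> Dict[str, Dict[str, str]]:
    """Build the 'extra' dict passed along with each log call."""
    is_offline = run_id.startswith(OFFLINE_RUN)
    dims = {
        "run_id": run_id,
        "run_type": "offline" if is_offline else "online",
    }
    # Only add the optional values when they are known.
    if run_number is not None:
        dims["run_number"] = str(run_number)
    if workspace_name:
        dims["workspace"] = workspace_name
    return {"custom_dimensions": dims}


extra = build_custom_dimensions("a1b2c3", run_number=42,
                                workspace_name="ws-dev")
logging.getLogger("observability_example").warning(
    "training started", extra=extra
)
```

With the run number and workspace attached to every record, queries in App Insights can filter on readable values rather than on the GUID alone.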

liupeirong · 2021-01-09T22:32:20Z

.pipelines/variables-template.yml

    value: flower_custom_preprocess_env

  # AML Compute Cluster Config
+  - name: AML_ENV_TRAIN_CONDA_DEP_FILE


Is this var used?

liupeirong · 2021-01-09T22:34:56Z

.pipelines/variables-template.yml

-  # - name: AML_REBUILD_ENVIRONMENT
-  #  value: "false"
+  - name: AML_REBUILD_ENVIRONMENT
+    value: "true"


We only want to rebuild the first time the dependencies are changed, right? Perhaps leave the default as false, and set the ADO variable to true when needed?

liupeirong · 2021-01-09T22:39:07Z

ml_model/preprocessing/Dockerfile

 RUN conda install -y conda=${CONDA_VERSION} python=${PYTHON_VERSION} && \
    pip install azureml-defaults==${AZUREML_SDK_VERSION} inference-schema==${INFERENCE_SCHEMA_VERSION} &&\
+    pip install python-dotenv==0.12.* dataclasses==0.6 opencensus==0.7.11 opencensus-ext-httplib==0.7.3 \
+    opencensus-ext-azure==1.0.5 opencensus-ext-logging==0.1.0 opencensus-context==0.1.2 && \


Should we use the conda dependency yml for consistency rather than a direct pip install?

liupeirong · 2021-01-09T22:49:32Z

Let's package it into a lib rather than having all samples repeat the observability code. As noted, it'll also keep the training folder smaller.

h2floh · 2021-01-11T01:53:59Z

lib rather than having all samples re

Yes, I didn't have the chance to close this PR yet. It is outdated because we did the folder structure refactoring. We still have to decide on things like account creation and management for the pip package repo, how the package should be named, etc.

After that we will reincorporate the logging methods into the existing samples; this should be more or less copy-and-paste work based on the work done here. It is also a good way to split up the work/PRs for each sample.

Thanks for all the comments; let's incorporate them into the refactored observability lib. I'll close this PR for now.

@liupeirong @kenakamu

h2floh · 2021-02-25T02:07:06Z

Completed with #59

h2floh added the enhancement New feature or request label Nov 23, 2020

h2floh linked an issue Nov 23, 2020 that may be closed by this pull request

AML Observability Library #17

Closed

h2floh mentioned this pull request Nov 23, 2020

Mamokari/observability microsoft/MLOpsPython#307

Open

Florian Wagner added 6 commits December 3, 2020 05:10

Modified Observability added dependency tracing

e36368b

Signed-off-by: Florian Wagner <[email protected]>

Add add env vars to .env.example

df33fa7

Signed-off-by: Florian Wagner <[email protected]>

Improvements for Observability Helper

e2d38dd

Signed-off-by: Florian Wagner <[email protected]>

Update script paths

5faab7e

Signed-off-by: Florian Wagner <[email protected]>

Add observability module to scripts

3f6f317

Signed-off-by: Florian Wagner <[email protected]>

Changes to CI for observability

7893e32

Signed-off-by: Florian Wagner <[email protected]>

h2floh force-pushed the h2floh/17_logger branch from 05a83f2 to 7893e32 Compare December 3, 2020 07:39

Fix log_metrics, adapt test to singleton pattern

fcba561

Signed-off-by: Florian Wagner <[email protected]>

h2floh marked this pull request as ready for review December 4, 2020 02:43

h2floh requested a review from liupeirong December 4, 2020 02:43

Florian Wagner added 8 commits December 4, 2020 03:51

Refactor CI dependency updates

99edaa6

Signed-off-by: Florian Wagner <[email protected]>

CI dependency template fix add to train pipe

89ecf71

Signed-off-by: Florian Wagner <[email protected]>

Fix observability formatting

1d076b3

Signed-off-by: Florian Wagner <[email protected]>

Activate env rebuild for aml pipeline tests

38d11b4

Signed-off-by: Florian Wagner <[email protected]>

Fix parameters.json path reference

48d57ee

Signed-off-by: Florian Wagner <[email protected]>

Fix logger usage in train.py

8390138

Signed-off-by: Florian Wagner <[email protected]>

Fix copy/paste error - print to file

284d2b0

Signed-off-by: Florian Wagner <[email protected]>

Fix duplicate log entries, enrich custom dimension

05284f3

Signed-off-by: Florian Wagner <[email protected]>

h2floh mentioned this pull request Dec 8, 2020

Console Logger approach #29

Closed

Florian Wagner added 8 commits December 8, 2020 06:02

Add appinsights cs (secret) to env-var

293f344

Signed-off-by: Florian Wagner <[email protected]>

change appinsights env variable name

367de8f

Signed-off-by: Florian Wagner <[email protected]>

Fix log init if appinsights cs key is absent in vg

f5e2e83

Signed-off-by: Florian Wagner <[email protected]>

Extend run env with log related variables

3fcbd3e

Signed-off-by: Florian Wagner <[email protected]>

Fix c&p error in register_loggers

943c956

Signed-off-by: Florian Wagner <[email protected]>

Add missing env to build_train_pipeline step

89a9d39

Signed-off-by: Florian Wagner <[email protected]>

Optimize for AppInsights Application Map

2348bd3

Signed-off-by: Florian Wagner <[email protected]>

Move end_span to finally block

46b39f1

Signed-off-by: Florian Wagner <[email protected]>

tarockey approved these changes Dec 11, 2020

kenakamu reviewed Jan 5, 2021

liupeirong reviewed Jan 9, 2021

h2floh closed this Jan 11, 2021

h2floh deleted the h2floh/17_logger branch March 2, 2021 01:21
