Skip to content
This repository was archived by the owner on Dec 2, 2021. It is now read-only.

Conversation

@h2floh
Copy link
Collaborator

@h2floh h2floh commented Nov 23, 2020

Implements: #17

Started from MLOpsPython Observability PR at commit and made some smaller modifications:

  • Added tracing features for logs and http calls
  • Added a "Console Logger" for local debugging/development
  • Added exception logging
  • Added custom properties to get information about the AML run into the logs/traces/exceptions
  • Added sampling rates and log level as env variables
  • Moved namespace to ml_service since it anyway depends on ml_service.util.env_variables

IMPORTANT: In order for the pipeline step scripts to be able to use the logger, the full repo needs to be made available, so the TRAIN_SCRIPT_PATH is now set to repo root ./ and all script variables need to be referenced from there e.g. ml_model/preprocessing/process_aml.py

Disclaimer

  • Not yet worked on or extended the unit tests
  • Not yet tested ML metrics logging to AML

@h2floh h2floh added the enhancement New feature or request label Nov 23, 2020
@h2floh h2floh linked an issue Nov 23, 2020 that may be closed by this pull request
Florian Wagner added 6 commits December 3, 2020 05:10
Signed-off-by: Florian Wagner <[email protected]>
Signed-off-by: Florian Wagner <[email protected]>
Signed-off-by: Florian Wagner <[email protected]>
Signed-off-by: Florian Wagner <[email protected]>
@h2floh h2floh marked this pull request as ready for review December 4, 2020 02:43
@h2floh h2floh requested a review from liupeirong December 4, 2020 02:43
Florian Wagner added 8 commits December 4, 2020 03:51
Signed-off-by: Florian Wagner <[email protected]>
Signed-off-by: Florian Wagner <[email protected]>
Signed-off-by: Florian Wagner <[email protected]>
Signed-off-by: Florian Wagner <[email protected]>
@h2floh h2floh mentioned this pull request Dec 8, 2020
Copy link

@tarockey tarockey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I really like the integration of the 3 different logging options.
One thing to consider (or maybe I missed it), would be to add the ability to specify which logger a singular message goes. For example, I generally want to log all metrics to both Azure ML and App Insights, but maybe I just want to send exceptions to App Insights and not AzureML.

Also, be careful with having the TRAIN_SCRIPT_ROOT being set to ./, as that means that every file is uploaded to AzureML for each run - it is worth digging into .amlignore, or looking into another route for accessing the logger (maybe publishing a wheel?)

@h2floh
Copy link
Collaborator Author

h2floh commented Dec 14, 2020

Hey @tarockey

One thing to consider (or maybe I missed it), would be to add the ability to specify which logger a singular message goes. For example, I generally want to log all metrics to both Azure ML and App Insights, but maybe I just want to send exceptions to App Insights and not AzureML.

Mainly the Azure ML logger is just a "print to console" for .log and .exception, the only difference is that log_metric will use the mechanism to record AML metrics directly within AMLS on the run. This feature is more or less as is copied from the MLOpsPython PR.

You probably want to have the exceptions logged in the AML log text files and in AppInsights. But I am open to add some flags.

Also, be careful with having the TRAIN_SCRIPT_ROOT being set to ./, as that means that every file is uploaded to AzureML for each run - it is worth digging into .amlignore, or looking into another route for accessing the logger (maybe publishing a wheel?)

Yeah, thanks that you also found that :) This is the main drawback and we are already considering packing the observability lib. Therefore we probably won't merge this PR but first refactor the repo structure into folders for different examples and tools. The planned structure is:

  • samples
    • image-classification-tensorflow
      • .pipelines
      • docs
      • model
      • ml_service
      • ...
    • object-detection-yolo
    • non-pyton
    • ...
  • common
    • observability-lib <-- create a package which can be imported by the samples
    • docs
      • variable management
      • security best practice
        --- ...
  • readme


# Prepare integrations and initialize tracer
config_integration.trace_integrations(['httplib', 'logging'])
texporter = AzureExporter(connection_string=self.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to separate lines here?

# Create AppInsights Handler and set log format
self.logger = logging.getLogger(__name__)
self.logger.setLevel(
getattr(logging, self.env.log_level.upper(), "WARNING"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious why default is set to WARNING when we set DEBUG/INFO in env? Do we want to set the lowest level instead?

environment variable is set --> build_id
- Else --> generate a unique id

Sets also the custom context dimensions based on On or Offline run
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on Online or Offline?

if not run.id.startswith(self.OFFLINE_RUN):
run_id = run.id
self.custom_dimensions = {
'custom_dimensions': {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding run number is useful when display the result as guid is not easy to identify which run it is.
Also adding workspace is useful when we use this module in multiple workspace but logging into same appinsights.

value: flower_custom_preprocess_env

# AML Compute Cluster Config
- name: AML_ENV_TRAIN_CONDA_DEP_FILE
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this var used?

# - name: AML_REBUILD_ENVIRONMENT
# value: "false"
- name: AML_REBUILD_ENVIRONMENT
value: "true"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only want to rebuild the first time the dependencies are changed, right? Perhaps leave the default to false, and set the ADO variable to true?

RUN conda install -y conda=${CONDA_VERSION} python=${PYTHON_VERSION} && \
pip install azureml-defaults==${AZUREML_SDK_VERSION} inference-schema==${INFERENCE_SCHEMA_VERSION} &&\
pip install python-dotenv==0.12.* dataclasses==0.6 opencensus==0.7.11 opencensus-ext-httplib==0.7.3 \
opencensus-ext-azure==1.0.5 opencensus-ext-logging==0.1.0 opencensus-context==0.1.2 && \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use conda dependency yml to be consistent rather than directly pip install?

@liupeirong
Copy link
Owner

Let's pack it to a lib rather than having all samples repeat the observability code, as noted, it'll also keep the training folder smaller.

@h2floh
Copy link
Collaborator Author

h2floh commented Jan 11, 2021

lib rather than having all samples re

Yes, didn't had the chance to close this PR. It is outdated because we did the folder structure refactoring. We have to decide about things like account creation and management for the pip package repo. How it should be named etc.

After that we reincorporate the logging methods into the existing samples, this should be more or less copy&paste work based on the work done here. Also good for splitting up the work/PRs for each sample.

Thanks for any comments let's incorporate them into the refactored observability lib. I'll now close this PR for now.

@liupeirong @kenakamu

@h2floh h2floh closed this Jan 11, 2021
@h2floh
Copy link
Collaborator Author

h2floh commented Feb 25, 2021

Completed with #59

@h2floh h2floh deleted the h2floh/17_logger branch March 2, 2021 01:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AML Observability Library

4 participants