Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CU-8694wh3d5 track usage #458

Merged
merged 14 commits into from
Jul 23, 2024
Merged

CU-8694wh3d5 track usage #458

merged 14 commits into from
Jul 23, 2024

Conversation

mart-r
Copy link
Collaborator

@mart-r mart-r commented Jun 21, 2024

Add a usage monitor.

What it does is monitor:

  • The input text length
  • The input preprocessed text length
  • The number of output entities

The usage monitor is configured such that it logs on file every time 10 (configurable) inferences are logged (to avoid constant IO). It also logs the rest of the buffer when the user monitor is dereferenced.

This PR also adds the relevant config part (config.general.usage_monitor). The usage monitoring can be disabled if needed by changing the config.

@tomolopolis
Copy link
Member

Task linked: CU-8694wh3d5 Track model usage

Copy link
Member

@tomolopolis tomolopolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setting this via an env var i.e. MEDCAT_LOGS
If this is False or 0, no logs / usage at all
Also we need logs to persist outside of process exit

@@ -36,6 +36,9 @@ def setUpClass(cls) -> None:
cls.cdb.config.linking.train = True
cls.cdb.config.linking.disamb_length_limit = 5
cls.cdb.config.general.full_unlink = True
cls._temp_logs_folder = tempfile.TemporaryDirectory()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dir and all contents will be destroyed on interpreter process stop(?) We want the logs to persist outside of the process running.

For linux this could be configurable via an env var, set within MedCATService or CogStack-ModelServe or something.

By default for *nix we want something like:
~/.local/share/medcat/logs/

windows:
C:\Users\%USERNAME%\.cache\medcat\logs\

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, of course we want the logs to persist!
But not during test time. We don't want tests to create too many extra files (mostly full of junk) locally (they already do, actually).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it - yes, didn't read module name

@mart-r
Copy link
Collaborator Author

mart-r commented Jul 10, 2024

setting this via an env var i.e. MEDCAT_LOGS If this is False or 0, no logs / usage at all Also we need logs to persist outside of process exit

Do we want dynamic changes to the environmental variables to be reflected in medcat? Or would it be sufficient to check once upon model init / load?
I think it would make sense to be able to change the behaviour dynamically. However, since os.environ is a snapshot of the environmental variables when the process was started, we'd need to somehow retrieve these changes (don't know if this is trivial).

What I'm thinking is something along the following lines:

  • Allow enabled to be True, False, or auto
  • When set to False, no logging (obviously)
  • When set to True, the config-specified files are used
  • When set to auto, the behaviour is automatic
    • Based on the environmental variable
    • The environmental variable is checked periodically (i.e no more often than 10 seconds)
    • The file location is picked automatically based on OS
      • I.e ~/.local/share/medcat/logs/ for *nix and C:\Users\%USERNAME%\.cache\medcat\logs\ for Windows

As for persistent - the current implementation (outside the tests) is persistent.

@tomolopolis
Copy link
Member

Sounds good - happy for auto just make sure the docstrings explain the behaviour.

from medcat.config import UsageMonitor as UsageMonitorConfig


LOGS_ENV = "MEDCAT_LOGS"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth renaming to MEDCAT_USAGE_LOGS and MEDCAT_USAGE_LOGS_LOCATION ?
Don't want this to be confused with dev logs(?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

Copy link
Member

@tomolopolis tomolopolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@mart-r mart-r merged commit 96706c8 into master Jul 23, 2024
8 checks passed
@mart-r mart-r deleted the CU-8694wh3d5-track-usage branch August 12, 2024 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants