CU-8694cd9t2: Allow merging config into model pack config before init #462

Merged
merged 4 commits into from
Aug 12, 2024

Conversation

mart-r
Collaborator

@mart-r mart-r commented Jul 15, 2024

Sometimes one might want to load an existing model from disk and change its config before the model is initialised.
However, this is problematic for parts of the config that only take effect at initialisation time (see e.g. #447).

This PR therefore adds a parameter to CAT.load_model_pack so that the config of a model pack can be amended before initialisation (i.e. before the pipe gets built).
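
As a rough illustration of the idea (not the actual MedCAT implementation), the sketch below merges a dict of overrides into a stored config before an initialisation step reads it. `merge_config`, `build_pipe`, `load_pack`, and the config keys are hypothetical names chosen for the example; the name and shape of the real parameter on `CAT.load_model_pack` may differ.

```python
from copy import deepcopy
from typing import Any, Dict, Optional, Tuple


def merge_config(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `overrides` into a copy of `base` (base is not mutated)."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested section: merge key by key instead of replacing it whole.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def build_pipe(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for initialisation: reads its settings exactly once."""
    return {"spacy_model": config["general"]["spacy_model"]}


def load_pack(stored_config: Dict[str, Any],
              config_overrides: Optional[Dict[str, Any]] = None
              ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Apply overrides first, then initialise, so init-time settings take effect."""
    config = merge_config(stored_config, config_overrides or {})
    pipe = build_pipe(config)
    return config, pipe


# Illustrative values only.
stored = {"general": {"spacy_model": "en_core_web_md", "log_level": 20}}
config, pipe = load_pack(stored, {"general": {"spacy_model": "en_core_web_sm"}})
```

Changing `config` after `load_pack` returns would not affect `pipe` in this sketch, which is the kind of problem the PR description refers to.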

@tomolopolis
Member

…461)

* CU-86951923u: Add option for simplified hash along with a few tests

* CU-86951923u: Make sure simplified hashing test compares regular to simplified timings

* CU-86951923u: Hopefully fix hashing test with simplified hash after saving

* CU-86951923u: Call patched methods when performing fake save during tests for python 3.8 support

* CU-86951923u: Fix fake save during tests for python 3.11 support
* CU-8694vbw6y: Update k-fold metrics to allow including standard deviation in results

* CU-8694vbw6y: Add tests for new parts of k-fold metrics (e.g. standard deviation)

* CU-8694vbw6y: Fix typing issues in k-fold metrics and standard deviation
Member

@tomolopolis tomolopolis left a comment


lgtm

@mart-r mart-r merged commit 76c2fa2 into master Aug 12, 2024
8 checks passed
mart-r added a commit that referenced this pull request Aug 15, 2024
…#462)

* CU-8694cd9t2: Allow merging config into model pack config before init
mart-r added a commit that referenced this pull request Aug 28, 2024
* CU-86956du3q: Move to placeholder-based replacement

* CU-86956du3q: Update regression tests to a more reasonable state.

Make sure to compare the correct annotation, not just hoping for any CUI annotated to match the one we are looking for.
Output the specifics of the type of match that was found:
 - Identical
 - Bigger / smaller span
 - Random overlap
 - Parents / grandparents, or children
Add strictness options to summary (success / failure).
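
The span-based match types listed above could be told apart roughly as in the sketch below. `SpanMatch` and `classify_span` are hypothetical names and do not reflect the actual Finding enum; spans are assumed to be half-open `(start, end)` character offsets, and the parent / child checks are omitted because they need the concept hierarchy.

```python
from enum import Enum, auto
from typing import Tuple

Span = Tuple[int, int]  # half-open (start, end) character offsets


class SpanMatch(Enum):
    """Hypothetical match kinds; not the project's actual enum."""
    IDENTICAL = auto()
    BIGGER_SPAN = auto()
    SMALLER_SPAN = auto()
    PARTIAL_OVERLAP = auto()
    NO_OVERLAP = auto()


def classify_span(expected: Span, found: Span) -> SpanMatch:
    """Describe how the found annotation span relates to the expected one."""
    exp_start, exp_end = expected
    found_start, found_end = found
    if (exp_start, exp_end) == (found_start, found_end):
        return SpanMatch.IDENTICAL
    if found_start <= exp_start and found_end >= exp_end:
        # The found span fully contains the expected one.
        return SpanMatch.BIGGER_SPAN
    if found_start >= exp_start and found_end <= exp_end:
        # The found span lies fully inside the expected one.
        return SpanMatch.SMALLER_SPAN
    if found_start < exp_end and exp_start < found_end:
        return SpanMatch.PARTIAL_OVERLAP
    return SpanMatch.NO_OVERLAP
```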

* CU-86956du3q: Further fixes for regression checking:

Remove 'Failure reason' and 'Failure descriptor' - now using Finding instead.
Remove simplified success/failure metrics wherever relevant.
Fix tests that relied on old logic and fix test-time replacement/cui location.

* CU-86956du3q: Add documentation for new classes and methods

* CU-86956du3q: Rename enum constant (SPAN_OVERLAP -> PARTIAL_OVERLAP)

* CU-86956du3q: Add matching for partially overlapping children

* CU-86956du3q: Add tests for partially overlapping children

* CU-86956du3q: Update regression checking to generate multiple sub-cases for multiple placeholders

* CU-86956du3q: Update some tests for new format

* CU-86956du3q: Remove old / unused / irrelevant tests and test-code

* CU-86956du3q: Some renaming (filter -> placeholders)

* CU-86956du3q: Add some additional fail safes for option set

* CU-86956du3q: Fix option set for only 1 placeholder

* CU-86956du3q: Fix targeting

* CU-86956du3q: Add tests for targeting

* CU-86956du3q: Remove MCT export conversion (at least for now)

* CU-86956du3q: Remove MCT export conversion tests (at least for now)

* CU-86956du3q: Remove suite editing (at least for now)

* CU-86956du3q: Remove category separation (at least for now)

* CU-86956du3q: Remove unused regression utils (at least for now)

* CU-86956du3q: Remove serialisation tests (at least for now)

* CU-86956du3q: Improve quality of default regression test set

* CU-86956du3q: Improve exceptions in targeting

* CU-86956du3q: Fix docstring issue regarding exceptions

* CU-86956du3q: Update test with correct exceptions

* CU-86956du3q: Add utils for partial substitutions and corresponding tests

* CU-86956du3q: Allow multiple of the same placeholder in a phrase.

And more specifically, treat each one as their own sub-case

* CU-86956du3q: Add relevant tests for multi-placeholder checking

* CU-86956du3q: Allow changing of multiple pre-processing placeholders

* CU-86956du3q: Fix 1-placeholder sub-case yielding

* CU-86956du3q: Remove debug output

* CU-86956du3q: Replace separator (~) with whitespace when checking

* CU-86956du3q: Add utility method to limit string length for output

* CU-86956du3q: Improve string length limiting method

* CU-86956du3q: Add a few tests for string length limiting method

* CU-86956du3q: Add an ANYTHING strictness (mostly for example disabling)

* CU-86956du3q: Add storage of examples (of a certain strictness) as well as relevant output

* CU-86956du3q: Fix typo (missing ending bracket) in report output

* CU-86956du3q: Fix examples header appearing for every example

* CU-86956du3q: Print the same phrase fewer times for examples

* CU-86956du3q: Update fake CDB with (default) config

* CU-86956du3q: Add finding to examples and output

* CU-86956du3q: Add config to another fake CDB during test time

* CU-86956du3q: Allow strictness to propagate to parts when looking at examples

* CU-86956du3q: Add placeholder to examples output

* CU-86956du3q: Refactor report output generation slightly

* CU-86956du3q: Show all non-identical examples

* CU-86956du3q: Update example checking with strictness requirement (instead of simple boolean)

* CU-86956du3q: Simplify targeting somewhat (remove unnecessary method)

* CU-86956du3q: Allow changing of output phrase max length

* CU-86956du3q: Fix doc string for changed method

* CU-86956du3q: Small whitespace fix

* CU-86956du3q: Fix total-included checking iteration

* CU-86956du3q: Add strictness and max phrase length to CLI

* CU-86956du3q: Add example strictness to CLI

* CU-86956du3q: Fix default value for strictness in CLI

* CU-86956du3q: Update to use number of sub-cases for tqdm/progress bar

* CU-86956du3q: Remove option to set the total for progress bar (the automated one works fine now)

* CU-86956du3q: Simplify the progress bar by combining all cases

* CU-86956du3q: Split subcase iteration

* CU-86956du3q: Rename regression checker to regression suite

* CU-86956du3q: Streamline typing and the like by using intermediate data classes

* CU-86956du3q: Remove redundant method

* CU-86956du3q: Remove redundant method and accompanying test

* CU-86956du3q: Remove redundant class

* CU-86956du3q: Add another intermediate data class

* CU-86956du3q: Remove completed TODO notes and redundant method

* CU-86956du3q: Add documentation to new methods and classes. Simplify example keeping.

* CU-86956du3q: Small update for how default test suite is handled for CLI

* CU-86956du3q: Small update to report output format

* CU-86956du3q: Add easier to read exception when unable to load a placeholder

* CU-86956du3q: Update percentages output to avoid as many decimal places

* CU-86956du3q: Use preferred name for run-to-run consistency

* CU-86956du3q: Update test time fake CDBs

* CU-86956du3q: Update default regression tests with new extensive (yet simple) test case

* CU-86956du3q: Add initial README for regression stuff

* CU-86956du3q: Add option for failing upon having found another concept.

Added other incorrect cui that was found (if applicable).
Fixed issue with finding grandparents.

* CU-86956du3q: Add tests for parent and grandparent finding; fix tests for new changes (with optionally found alternative CUI)

* CU-86956du3q: Add preferred name to wrong CUI found

* CU-86956du3q: Fix tests for new form of determine cui description; add test for exact span grandchild

* CU-86956du3q: Fix determining partial matches for grandchildren and beyond

* CU-86956du3q: Add test for partial matches of grandchildren

* Fixing bug for metacat

Fix issues with compute_class_weights JSON serialization and enforce fc2 usage when fc3 is enabled

* Resolved an issue where compute_class_weights returns a NumPy array, causing an error when saving the configuration as JSON (since JSON does not support NumPy arrays). The fix ensures compatibility by converting the NumPy array to a JSON-serializable format.

* Added a safeguard in the model_architecture_config for meta_cat_config. The current architecture assumes fc3 is only used when fc2 is enabled. If fc2 is set to False and fc3 is True, the model would fail due to a mismatch in hidden layer sizes. The fix automatically enables fc2 if fc3 is set to True, preventing potential errors.
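
The two fixes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `to_json_safe` and `normalise_architecture` are hypothetical helpers, and a standard-library `array` stands in for the NumPy array returned by `compute_class_weights`, since both expose `tolist()`.

```python
import json
from array import array
from typing import Any, Dict


def to_json_safe(value: Any) -> Any:
    """Fallback for json.dumps: turn array-like values into plain lists."""
    if hasattr(value, "tolist"):
        return value.tolist()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def normalise_architecture(arch: Dict[str, Any]) -> Dict[str, Any]:
    """Enable fc2 whenever fc3 is requested, since fc3 is assumed to build on fc2."""
    fixed = dict(arch)
    if fixed.get("fc3") and not fixed.get("fc2"):
        fixed["fc2"] = True
    return fixed


# Stand-in for the array of class weights (assumption: NumPy is not needed here).
weights = array("d", [0.5, 2.0])
config = {
    "class_weights": weights,
    "model_architecture_config": normalise_architecture({"fc2": False, "fc3": True}),
}
serialised = json.dumps(config, default=to_json_safe)
```

Converting at save time via `default=` is only one option; the description above suggests the conversion happens where the weights are computed, which keeps the config JSON-safe everywhere.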

* CU-86956duhb: Add method to backport a model pack from 1.12 to previous version (#465)

* CU-86956duhb: Add method to backport a model pack from 1.12 to previous version

* CU-86956duhb: Fix some doc string issues

* CU-86956duhb: Add deprecation decorator to old config-fix

* CU-86956duhb: Mark backporting method as deprecated and to be removed in 1.14

* CU-8694cd9t2: Allow merging config into model pack config before init (#462)

* CU-8694cd9t2: Allow merging config into model pack config before init

* CU-8694fwyje: Update all configs with pre-load parts documented (#473)

* CU-86956du3q: Add converter from MCT export

* CU-86956du3q: Add documentation to MCT export converter

* CU-86956du3q: Add option to create a regression suite from an MCT export

* CU-86956du3q: Add option to create a regression suite from an MCT export to CLI

* CU-86956du3q: Add a small note for converter placeholder

* CU-86956du3q: Add tests for MedCATtrainer export converter

* CU-86956du3q: Add tests for regression suite generation based on MCT export

* CU-86956du3q: Simplify regression case creation tests somewhat

* CU-86956du3q: Add option to create a regression suite YAML from MCT export

* CU-86956du3q: Add option to stop at MCT export conversion

* CU-86956du3q: Make use of only-prefnames option

* CU-86956du3q: Fix loading of only-prefnames option from yaml

* CU-86956du3q: Add comment for only using preferred names to the default regression suite yaml

* CU-86956du3q: Fix tests broken due to pref-name only change

* CU-86956du3q: Add utility method to set runtime doc strings for enum constants

* CU-86956du3q: Add tests for runtime doc string addition

* CU-86956du3q: Add more tests for runtime doc string addition (to make sure it fails without the change)

* CU-86956du3q: Make Finding enum have runtime doc strings

* CU-86956du3q: Add CLI option to show the various descriptions of the finding types (--only-describe)

* CU-86956du3q: Update dict and json methods for some results for JSON serialisation

* CU-86956du3q: Add a few json serialisation tests

* CU-86956du3q: Add json serialisation example strictness to CLI

* CU-86956du3q: Add a few more json serialisation tests

* CU-86956du3q: Add usage of regression suite name from the name of the file being read

* CU-86956du3q: Fix tests by adding the regression suite name where applicable

* CU-86956du3q: Avoid examples in ResultDescriptor

* CU-86956du3q: Make sure strictness propagates across all parts of a multi-result descriptor

* CU-86956du3q: Update tests: Use correct reporting for generating fake reports

* CU-86956du3q: Fix small test issue

* CU-86956du3q: Update tests for manual success/fail for results

* CU-86956du3q: Separate calculation section of report finding

* CU-86956du3q: Add a few more tests for report/results

* CU-86956du3q: Add option to force a non-0 exit status upon any regression test failure

* CU-86956du3q: Add files for regression model creation and checking

* CU-86956du3q: Add new part to main workflow to create and regression check a simple model pack

* CU-86956du3q: Update a mistyped comment

* CU-86956du3q: Make regression run at STRICTEST strictness at GHA workflow time

* CU-86956du3q: Fix strictness matrix for anything-typed strictness

* CU-86956du3q: Add strictness matrix information to --describe-only

* CU-86956du3q: Add python version to created model pack for test time

* CU-86956du3q: Use the python version of created model pack during test time to avoid conflicts with other python versions running in parallel

* CU-86956du3q: [TEMP] Remove tests from main workflow (for faster iteration) and add args to output upon regression checking

* Revert "CU-86956du3q: [TEMP] Remove tests from main workflow (for faster iteration) and add args to output upon regression checking"

This reverts commit 4bf3089.

* CU-86956du3q: Make full model path the last line of the output upon creation model for regression

* CU-86956du3q: Move regression workflow logic to a separate bash script

* CU-86956du3q: Update comments in regression bash script

* CU-8694pz44d: Fix model cleanup during regression

* CU-86956du3q: Fix typos in utils

* CU-86956du3q: Fix a bunch of various typos in doc strings and comments

---------

Co-authored-by: shubham-s-agarwal <[email protected]>