
CU-8692my74y / CU-8692kj0tw Clean eval mct export #7

Merged
mart-r merged 30 commits into CogStack:main on Dec 18, 2023

Conversation

@mart-r (Collaborator) commented Oct 11, 2023

What this PR does:

Removed the legacy notebook.

Some fixes to medcat/evaluate_mct_export/mct_analysis.py:

  • Stop duplication of project and document names (i.e. after renaming meta-annotations)
  • Fix project_names containing whole project objects rather than just their names (see the sketch below)
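
For illustration, a minimal sketch of that second fix, assuming the usual MedCATtrainer export layout (a top-level 'projects' list of dicts with a 'name' key); the function name is hypothetical and this is not the PR's exact code:

    import json

    def load_project_names(mct_export_path: str) -> list:
        """Return only the project names from an MCT export.

        Assumes the usual MedCATtrainer export layout: a top-level
        'projects' list whose entries are dicts with a 'name' key.
        """
        with open(mct_export_path) as f:
            export = json.load(f)
        # Keep just the 'name' field instead of the whole project dict
        return [project['name'] for project in export['projects']]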

Some added functionality to medcat/evaluate_mct_export/mct_analysis.py:

  • Add support for multiple date formats (the default wouldn't work for the MCT export I had)
  • Raise a meaningful exception if a model pack (CAT instance) is required but not provided (a sketch of both changes follows this list)
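
A minimal sketch of what those two changes might look like (the format strings, names, and messages are illustrative, not the PR's exact code):

    from datetime import datetime

    # Illustrative format list; the formats the PR actually supports may differ
    DATE_FORMATS = ('%Y-%m-%d %H:%M:%S.%f%z', '%Y-%m-%dT%H:%M:%S%z', '%Y-%m-%d')

    def parse_mct_date(raw: str) -> datetime:
        """Try each supported format in turn instead of assuming a single one."""
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(raw, fmt)
            except ValueError:
                continue
        raise ValueError(f'Date {raw!r} matches none of the supported formats: {DATE_FORMATS}')

    def require_model_pack(cat):
        """Raise a meaningful exception when a method needs a CAT instance."""
        if cat is None:
            raise ValueError('This method requires a MedCAT model pack (CAT instance); '
                             'provide a model pack path when constructing the analysis object.')
        return cat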

Some more minor changes that shouldn't affect operation:

  • Removed unused imports
  • Some minor whitespace changes
  • Some refactoring
    • For lower indentation
    • For better readability
    • For lower code duplication
  • More documentation
    • For renaming of meta-annotations

Added a bunch of tests for medcat/evaluate_mct_export/mct_analysis.py in tests/medcat/evaluate_mct_export/test_mct_analysis.py:

  • A test MCT export resource
  • Testing that most methods actually work
    • Don't raise exceptions
    • Return results of interest
    • Ignoring things that need a CAT instance (not available at test time)
  • Tests for renaming meta-annotation names
  • Tests for renaming meta-annotation values (a sketch of such a test follows this list)
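
For illustration, roughly what one of those rename tests might look like. This is a sketch: the class name (MedcatTrainer_export), method names (rename_meta_anns, annotation_df), argument names, and the resource path are assumptions based on the PR description, not verified signatures.

    import unittest

    # NOTE: class, method, and argument names are assumptions,
    # not verified against the repository.
    from medcat.evaluate_mct_export.mct_analysis import MedcatTrainer_export

    # Hypothetical path to the test MCT export resource mentioned above
    MCT_EXPORT_PATH = 'tests/medcat/evaluate_mct_export/resources/mct_export.json'

    class RenameMetaAnnsTests(unittest.TestCase):

        def setUp(self):
            # No model pack (CAT instance) is needed for the renaming functionality
            self.analysis = MedcatTrainer_export([MCT_EXPORT_PATH], None)

        def test_can_rename_meta_ann_names(self):
            self.analysis.rename_meta_anns(meta_anns2rename={'Status': 'Presence'})
            df = self.analysis.annotation_df()
            self.assertIn('Presence', df.columns)
            self.assertNotIn('Status', df.columns)

    if __name__ == '__main__':
        unittest.main()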

I also tried looking at the few medcat/evaluate_mct_export/mct_analysis.py methods that depend on the model pack:

  • full_annotation_df
    • Got a result with the example MCT export and MedMen model
    • But it didn't seem like there's much I could change here
  • meta_anns_concept_summary
    • I don't have a dataset to get meaningful results from this
    • For the test dataset and the MedMen model, this was empty
  • generate_report
    • Initially reported as "Wouldn't work"; it does work, but requires openpyxl and a model pack
    • I first suspected a version mismatch, though I had tried to pin my versions to what it should need
    • Perhaps openpyxl could be added to requirements, since it's required for this method

EDIT (12.12.2023)
I've since added a set of offline tests. The idea is that these should be run not on GHA but locally on the developer's computer (or perhaps, in the future, in some other isolated environment). The reason is that these tests require model packs, and we don't want to have to download and re-download them on GHA.
I've added the tests in a way that makes them not discoverable by `python -m unittest discover`. As such, to run them, you'd need to run:

python -m unittest tests.medcat.evaluate_mct_export.offline_test_mct_analysis

That said, you also need to have the MedMentions model at tests/medcat/resources/offline/medmen_wstatus_2021_oct.zip in order for these tests to run.
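
For context, `python -m unittest discover` only picks up files matching its default pattern `test*.py`, which is why a module named offline_test_mct_analysis.py is invisible to discovery yet still runnable by its full module path as above. A guard like the following (a sketch, not necessarily the actual file's contents) can additionally make the tests skip cleanly when the model pack is absent:

    import os
    import unittest

    # Path from the description above; the model pack itself is not committed
    MODEL_PACK_PATH = 'tests/medcat/resources/offline/medmen_wstatus_2021_oct.zip'

    @unittest.skipUnless(os.path.exists(MODEL_PACK_PATH),
                         f'offline model pack not found at {MODEL_PACK_PATH}')
    class OfflineMCTAnalysisTests(unittest.TestCase):

        def test_model_pack_is_available(self):
            # Placeholder; the real offline tests exercise the CAT-dependent methods
            self.assertTrue(os.path.exists(MODEL_PACK_PATH))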

PS:
There may still be quite a few things that could use some attention, so I'm open to suggestions.

Couldn't find an MCT export that had the datetime format that was supported. So now supporting the format that was present in the example I had
…in case of subsequent calls (i.e. after renaming meta-annotations)
@antsh3k (Collaborator) commented Nov 27, 2023

Hey, can you elaborate on what "Wouldn't work" means for the generate_report function?

The rest of this PR can be merged.

@mart-r (Collaborator, Author) commented Nov 27, 2023

Hey, can you elaborate on what "Wouldn't work" means for the generate_report function?

Is this what you mean?
https://github.com/CogStack/working_with_cogstack/pull/7/files#diff-915a1b5b3ba4a938a71069d6ab0528a9d3d1ddec244da548fad6981e9adfe7daR417

All I meant was that it requires a CAT instance for it to do anything. I.e. the line right after this:
https://github.com/CogStack/working_with_cogstack/pull/7/files#diff-915a1b5b3ba4a938a71069d6ab0528a9d3d1ddec244da548fad6981e9adfe7daR421
(you'd need to expand the diff to see it, which is also why a direct link wouldn't work)

If you meant something else, I must have missed what you meant.

@mart-r (Collaborator, Author) commented Dec 8, 2023

Hey, can you elaborate on what "Wouldn't work" means for the generate_report function?

Looking at it again, I can now tell what you meant. I hadn't re-read my own description of the PR; I'd only looked at the code.

It's been a while since I did all of this, so I don't remember what the issue was exactly.
But the gist of it was this:

  • I couldn't run it as is (due to it needing openpyxl)
  • After installing openpyxl I couldn't run it because I wasn't specifying a CAT instance/path.

I tried it, and it does work just fine when I specify a model.

In any case, unless we want to bundle this project with a full model pack, it's not (automatically) testable in its current state.
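
On the openpyxl point: one generic way to surface that requirement early, rather than failing deep inside the Excel-writing call, is an import guard with a clear message. A sketch, not code from this PR:

    def require_openpyxl() -> None:
        """Fail early with a clear message if openpyxl is missing.

        Writing an .xlsx report via pandas uses the openpyxl engine;
        without it, the failure shows up as a much less obvious error.
        """
        try:
            import openpyxl  # noqa: F401
        except ImportError as exc:
            raise ImportError(
                "Generating the Excel report requires openpyxl; "
                "install it with 'pip install openpyxl'.") from exc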

@mart-r (Collaborator, Author) commented Dec 13, 2023

I've since added a set of offline tests. The idea is that these should be run not on GHA but locally on the developer's computer (or perhaps, in the future, in some other isolated environment). The reason is that these tests require model packs, and we don't want to have to download and re-download them on GHA.
I've added the tests in a way that makes them not discoverable by `python -m unittest discover`. As such, to run them, you'd need to run:

python -m unittest tests.medcat.evaluate_mct_export.offline_test_mct_analysis

That said, you also need to have the MedMentions model at tests/medcat/resources/offline/medmen_wstatus_2021_oct.zip in order for these tests to run.

@mart-r merged commit a006c31 into CogStack:main on Dec 18, 2023. 2 checks passed.