
CU-8692my74y / CU-8692kj0tw Clean eval mct export #7

Merged
mart-r merged 30 commits into CogStack:main on Dec 18, 2023

Conversation

@mart-r (Collaborator) commented Oct 11, 2023

What this PR does:

Removed the legacy notebook.

Some fixes to medcat/evaluate_mct_export/mct_analysis.py:

  • Stop duplication of project and document names (i.e. after renaming meta-annotations)
  • Fix project_names containing whole project objects rather than just their names (see the sketch below)
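
For illustration, a minimal sketch of that second fix, assuming the usual MedCATtrainer export layout (a top-level 'projects' list of dicts with a 'name' key); the function name is hypothetical and this is not the PR's exact code:

    import json

    def load_project_names(mct_export_path: str) -> list:
        """Return only the project names from an MCT export.

        Assumes the usual MedCATtrainer export layout: a top-level
        'projects' list whose entries are dicts with a 'name' key.
        """
        with open(mct_export_path) as f:
            export = json.load(f)
        # Keep just the 'name' field instead of the whole project dict
        return [project['name'] for project in export['projects']]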

Some added functionality to medcat/evaluate_mct_export/mct_analysis.py:

  • Add support for multiple date formats (the default wouldn't work for the MCT export I had)
  • Raise a meaningful exception if a model pack (CAT instance) is required but not provided (a sketch of both changes follows this list)
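
A minimal sketch of what those two changes might look like (the format strings, names, and messages are illustrative, not the PR's exact code):

    from datetime import datetime

    # Illustrative format list; the formats the PR actually supports may differ
    DATE_FORMATS = ('%Y-%m-%d %H:%M:%S.%f%z', '%Y-%m-%dT%H:%M:%S%z', '%Y-%m-%d')

    def parse_mct_date(raw: str) -> datetime:
        """Try each supported format in turn instead of assuming a single one."""
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(raw, fmt)
            except ValueError:
                continue
        raise ValueError(f'Date {raw!r} matches none of the supported formats: {DATE_FORMATS}')

    def require_model_pack(cat):
        """Raise a meaningful exception when a method needs a CAT instance."""
        if cat is None:
            raise ValueError('This method requires a MedCAT model pack (CAT instance); '
                             'provide a model pack path when constructing the analysis object.')
        return cat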

Some more minor changes that shouldn't affect operation:

  • Removed unused imports
  • Some minor whitespace changes
  • Some refactoring
    • For lower indentation
    • For better readability
    • For lower code duplication
  • More documentation
    • For renaming of meta-annotations

Added a bunch of tests for medcat/evaluate_mct_export/mct_analysis.py in tests/medcat/evaluate_mct_export/test_mct_analysis.py:

  • A test MCT export resource
  • Testing that most methods actually work
    • Don't raise exceptions
    • Return results of interest
    • Ignoring things that need a CAT instance (not available at test time)
  • Tests for renaming meta-annotation names
  • Tests for renaming meta-annotation values (a sketch of such a test follows this list)
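
For illustration, roughly what one of those rename tests might look like. This is a sketch: the class name (MedcatTrainer_export), method names (rename_meta_anns, annotation_df), argument names, and the resource path are assumptions based on the PR description, not verified signatures.

    import unittest

    # NOTE: class, method, and argument names are assumptions,
    # not verified against the repository.
    from medcat.evaluate_mct_export.mct_analysis import MedcatTrainer_export

    # Hypothetical path to the test MCT export resource mentioned above
    MCT_EXPORT_PATH = 'tests/medcat/evaluate_mct_export/resources/mct_export.json'

    class RenameMetaAnnsTests(unittest.TestCase):

        def setUp(self):
            # No model pack (CAT instance) is needed for the renaming functionality
            self.analysis = MedcatTrainer_export([MCT_EXPORT_PATH], None)

        def test_can_rename_meta_ann_names(self):
            self.analysis.rename_meta_anns(meta_anns2rename={'Status': 'Presence'})
            df = self.analysis.annotation_df()
            self.assertIn('Presence', df.columns)
            self.assertNotIn('Status', df.columns)

    if __name__ == '__main__':
        unittest.main()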

I also tried looking at the few medcat/evaluate_mct_export/mct_analysis.py methods that depend on the model pack:

  • full_annotation_df
    • Got a result with the example MCT export and MedMen model
    • But it didn't seem like there's much I could change here
  • meta_anns_concept_summary
    • I don't have a dataset to get meaningful results from this
    • For the test dataset and the MedMen model, this was empty
  • generate_report
    • Initially reported as "Wouldn't work"; it does work, but requires openpyxl and a model pack
    • I first suspected a version mismatch, though I had tried to pin my versions to what it should need
    • Perhaps openpyxl could be added to requirements, since it's required for this method

EDIT (12.12.2023)
I've since added a set of offline tests. The idea is that these should be run not on GHA but locally on the developer's computer (or perhaps, in the future, in some other isolated environment). The reason is that these tests require model packs, and we don't want to have to download and re-download them on GHA.
I've added the tests in a way that makes them not discoverable by `python -m unittest discover`. As such, to run them, you'd need to run:

python -m unittest tests.medcat.evaluate_mct_export.offline_test_mct_analysis

That said, you also need to have the MedMentions model at tests/medcat/resources/offline/medmen_wstatus_2021_oct.zip in order for these tests to run.
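
For context, `python -m unittest discover` only picks up files matching its default pattern `test*.py`, which is why a module named offline_test_mct_analysis.py is invisible to discovery yet still runnable by its full module path as above. A guard like the following (a sketch, not necessarily the actual file's contents) can additionally make the tests skip cleanly when the model pack is absent:

    import os
    import unittest

    # Path from the description above; the model pack itself is not committed
    MODEL_PACK_PATH = 'tests/medcat/resources/offline/medmen_wstatus_2021_oct.zip'

    @unittest.skipUnless(os.path.exists(MODEL_PACK_PATH),
                         f'offline model pack not found at {MODEL_PACK_PATH}')
    class OfflineMCTAnalysisTests(unittest.TestCase):

        def test_model_pack_is_available(self):
            # Placeholder; the real offline tests exercise the CAT-dependent methods
            self.assertTrue(os.path.exists(MODEL_PACK_PATH))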

PS:
There may still be quite a few things that could use some attention, so I'm open to suggestions.

Couldn't find an MCT export that had the datetime format that was supported. So now supporting the format that was present in the example I had
…in case of subsequent calls (i.e. after renaming meta-annotations)
@antsh3k (Collaborator) commented Nov 27, 2023

Hey, can you elaborate on what "Wouldn't work" means for the generate_report function?

The rest of this PR can be merged.

@mart-r (Collaborator, Author) commented Nov 27, 2023

Hey, can you elaborate on what "Wouldn't work" means for the generate_report function?

Is this what you mean?
https://github.com/CogStack/working_with_cogstack/pull/7/files#diff-915a1b5b3ba4a938a71069d6ab0528a9d3d1ddec244da548fad6981e9adfe7daR417

All I meant was that it requires a CAT instance for it to do anything. I.e. the line right after this:
https://github.com/CogStack/working_with_cogstack/pull/7/files#diff-915a1b5b3ba4a938a71069d6ab0528a9d3d1ddec244da548fad6981e9adfe7daR421
(you'd need to expand the diff to see it, which is also why a direct link wouldn't work)

If you meant something else, I must have missed what you meant.

@mart-r (Collaborator, Author) commented Dec 8, 2023

Hey, can you elaborate on what "Wouldn't work" means for the generate_report function?

Looking at it again, I can now tell what you meant. I hadn't re-read my own description of the PR; I'd only looked at the code.

It's been a while since I did all of this, so I don't remember what the issue was exactly.
But the gist of it was this:

  • I couldn't run it as is (due to it needing openpyxl)
  • After installing openpyxl I couldn't run it because I wasn't specifying a CAT instance/path.

I tried it, and it does work just fine when I specify a model.

In any case, unless we want to bundle this project with a full model pack, it's not (automatically) testable in its current state.
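
On the openpyxl point: one generic way to surface that requirement early, rather than failing deep inside the Excel-writing call, is an import guard with a clear message. A sketch, not code from this PR:

    def require_openpyxl() -> None:
        """Fail early with a clear message if openpyxl is missing.

        Writing an .xlsx report via pandas uses the openpyxl engine;
        without it, the failure shows up as a much less obvious error.
        """
        try:
            import openpyxl  # noqa: F401
        except ImportError as exc:
            raise ImportError(
                "Generating the Excel report requires openpyxl; "
                "install it with 'pip install openpyxl'.") from exc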

@mart-r (Collaborator, Author) commented Dec 13, 2023

I've since added a set of offline tests. The idea is that these should be run not on GHA but locally on the developer's computer (or perhaps, in the future, in some other isolated environment). The reason is that these tests require model packs, and we don't want to have to download and re-download them on GHA.
I've added the tests in a way that makes them not discoverable by `python -m unittest discover`. As such, to run them, you'd need to run:

python -m unittest tests.medcat.evaluate_mct_export.offline_test_mct_analysis

That said, you also need to have the MedMentions model at tests/medcat/resources/offline/medmen_wstatus_2021_oct.zip in order for these tests to run.

@mart-r merged commit a006c31 into CogStack:main on Dec 18, 2023. 2 checks passed.