
Implementation of CohortIncidence module. #147

Merged

Conversation

chrisknoll
Contributor

This PR uses ResultModelManager to create the results schema and upload results.

Fixed param typo in ResultDataModel.R
@azimov
Collaborator


My feeling is that this should be in the package for several reasons:

  1. If a change is introduced in the model for any reason, that requires a release of both packages. Updating just one will break interoperability.
  2. Users will always need to work with their data; even if they don't use Strategus, they may wish to use the shiny apps or other report generation tools. Now they will have a dependency they didn't previously need.
  3. The current release of CohortIncidence just introduced a breaking change for shiny apps and any other apps by not implementing migrations. Now, currently existing projects will no longer be upgradable without custom hacks for what is a relatively small change to the data model.
  4. It's inconsistent with other packages and utilities
  5. It places a burden on the maintainer of Strategus to maintain 100% version consistency with CohortIncidence

Contributor Author

chrisknoll commented Jul 17, 2024


Thanks for the questions and comments, @azimov. I'll preface this by saying I gave a lot of consideration to what goes into the package and what is part of the module. The deciding factor was: if it was a Strategus concern, it went into the module; if it was a package concern, it went into the package.

> If a change is introduced in the model for any reason, that requires a release of both packages. Updating just one will break interoperability.

Not necessarily: if the new version of the package produces the same output from the same input, then I don't think you need a new version of Strategus (just reference the new version in renv.lock). If the analytics change, either in inputs or outputs, I think it would make sense that a different version of Strategus would be necessary to represent the changes in dependencies. I would suggest two things on this front: a release of Strategus should come with a published renv.lock file that records which versions of package dependencies have been tested with that version of Strategus, and within a single release of Strategus you may have multiple updates to underlying packages.
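For example, pinning and restoring the tested versions could look like this (a minimal sketch; the version number is hypothetical, and renv would normally manage the lock file for you):

```r
# Record the CohortIncidence version that was tested with this release of
# Strategus in the study's renv.lock (version number is hypothetical).
renv::record("[email protected]")

# Sites executing the study restore the exact pinned versions.
renv::restore(lockfile = "renv.lock")
```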

> Users will always need to work with their data; even if they don't use Strategus, they may wish to use the shiny apps or other report generation tools. Now they will have a dependency they didn't previously need.

While this is true, you can't use OhdsiShinyModules to view results from the underlying packages at the package level (i.e., take the output of the CI package and use OhdsiShinyModules to view the results). The reason is that OhdsiShinyModules has adopted conventions that are specified by Strategus. Two examples: if you want the list of databases, you go to the database meta table (which is created by Strategus), and if you want the cohort definition names, you go to cg_table_prefixcohort_definition, a table created by CohortGenerator, which Strategus requires to be part of every Strategus analysis. See the following:
https://github.com/search?q=repo%3AOHDSI%2FOhdsiShinyModules%20prefixcohort_definition&type=code
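To make the convention concrete, a results viewer ends up with queries roughly like this (a sketch; it assumes the Strategus-created database_meta_data table and CohortGenerator's default cg_ prefix, and the column names are assumptions):

```r
library(DatabaseConnector)

# Placeholder connection to the results database.
connectionDetails <- createConnectionDetails(
  dbms = "postgresql",
  server = "localhost/results",
  user = "user",
  password = "secret"
)
connection <- connect(connectionDetails)

# The list of databases comes from the table Strategus creates ...
databases <- querySql(connection, "SELECT database_id, cdm_source_abbreviation FROM results.database_meta_data;")
# ... and cohort names come from the table CohortGenerator creates.
cohorts <- querySql(connection, "SELECT cohort_definition_id, cohort_name FROM results.cg_cohort_definition;")

disconnect(connection)
```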

Based on these decisions, I would suggest that OhdsiShinyModules becomes something like StrategusShinyModules, or that we embed the shiny modules into the release of Strategus so that the execution of the analysis, the persistence of results, and the report viewer are all bound to the same version of the software. I fully understand that the original intent of OhdsiShinyModules was to view results of the individual HADES packages, but, in my estimation, this principle has been abandoned in favor of Strategus-specific concerns.

> The current release of CohortIncidence just introduced a breaking change for shiny apps and any other apps by not implementing migrations. Now, currently existing projects will no longer be upgradable without custom hacks for what is a relatively small change to the data model.

I thought about this, but I decided on the approach for two reasons: 1) you won't be changing the version of the analysis packages within a single project, and 2) while I would have preferred a migration approach, the most expedient way was to use the functionality in RMM to create a schema based on the results model spec. I'd like to move to a pure-migration approach to managing database schemas (as we do in WebAPI), but given that the schema shouldn't change within a study, I decided the simplest approach would be to use RMM.
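For reference, the schema-creation path through RMM looks roughly like this (a sketch; the spec file path, argument names, and SQL parameter name are assumptions about how the module wires this up):

```r
# connectionDetails created earlier via DatabaseConnector::createConnectionDetails().

# Results model specification shipped with the module (path is hypothetical).
specPath <- system.file("csv", "resultsDataModelSpecification.csv", package = "CohortIncidence")

# Generate DDL from the CSV specification and execute it against the results schema.
sql <- ResultModelManager::generateSqlSchema(csvFilepath = specPath)
connection <- DatabaseConnector::connect(connectionDetails)
DatabaseConnector::renderTranslateExecuteSql(connection, sql, database_schema = "results")
DatabaseConnector::disconnect(connection)
```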

> It's inconsistent with other packages and utilities

There isn't exactly a published standard on this, and the module layer provides that structure. However, it is consistent with the EvidenceSynthesis package, which also uses RMM and is contained in Strategus.

This PR follows the Strategus conventions through the use of the common R6 classes for module implementation, and in this way it is consistent with the other packages. The module layer allows the underlying packages to be independent in how they function most effectively (in their own way) while being consistent in execution across all modules.

There are a number of ways that packages are inconsistent: CI handles JSON serialization/deserialization differently, but that is because serialization via ParallelLogger is broken. Not every package defines inputs via R6 classes. I think there are going to be differences in approach across many of the HADES packages, and while I can understand the desire to keep things consistent, it may be more trouble than it's worth to pin down every possible detail.

> It places a burden on the maintainer of Strategus to maintain 100% version consistency with CohortIncidence

I'm not sure this is the case, as @anthonysena wasn't involved in any of the implementation of this PR for incorporating CI into Strategus. I do expect that we'll have different collaborators as the 'responsible party' for maintaining the underlying modules (who may or may not be the underlying package maintainer), and I think this will be coordinated through the Strategus team.

Thanks again for your thoughts and comments. Let me know if I misunderstood anything, and I will be glad to provide more detail.

@chrisknoll marked this pull request as draft July 17, 2024 14:48
@chrisknoll
Contributor Author

@anthonysena, I've converted this to draft as there's one additional thing I'd like to add (and this is a good illustration of some of the comments I made in response to @azimov's feedback):

In OhdsiShinyModules, the Characterization/CohortIncidence view has some functionality that needs to navigate to the outcomes that belong to specific targets of interest. In other words, the CI reporting UX would like to be able to select outcomes that exist for a given T. This isn't something that is captured in the analysis results, but it is something that is required by Strategus. So, I will add a new table to the results model specification in Strategus that doesn't exist in the underlying package, which will store the T-O-TAR combinations that were specified in the analysis. The module implementation will add this result to the output as part of the execute() method.
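As a rough sketch of what that execute() step would add (the table name, column names, and resultsFolder variable here are hypothetical, not the final specification):

```r
# Hypothetical: derive the target-outcome-TAR combinations from the analysis
# specification and write them alongside the package's own result CSVs.
targetOutcomes <- data.frame(
  target_cohort_id  = c(101, 101, 102),
  outcome_cohort_id = c(201, 202, 201),
  tar_id            = c(1, 1, 2)
)
readr::write_csv(targetOutcomes, file.path(resultsFolder, "ci_target_outcomes.csv"))
```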

@chrisknoll
Contributor Author

Hmm, after I merged the upstream branch into this branch, tests failed in something related to Characterization. @anthonysena, do you have any idea where that came from?

@chrisknoll
Contributor Author

chrisknoll commented Jul 17, 2024

On the topic of module implementation philosophy:

I spoke for an hour with @jreps and @pbr6cornell about the implementation details and what is a Strategus, OhdsiShinyModules, or package concern, and while we do want to reach a consensus on philosophy and approach, we're not quite there yet. To that end, I've decided that the implementation for this PR will stand, but I have committed to revisiting the code after it has been in place for a time, so we can discuss the merits and problems of the approach and address any issues that arise. Moving the implementation from the module into the package would not result in any API changes, so there should not be a backwards-compatibility challenge in making that change in the future.

@anthonysena
Collaborator

> Hmm, after I merged the upstream branch into this branch, tests failed in something related to Characterization. @anthonysena, do you have any idea where that came from?

Yes, the most recent release of FeatureExtraction broke Characterization as described here: OHDSI/Characterization#54. Once we patch Characterization, the unit tests should run without issue.

Given this, we can move forward with merging this work into the other feature branch if you feel it is ready. I did a quick review and it is fine for now, so we can wrap up the initial module work.

@chrisknoll
Contributor Author

chrisknoll commented Jul 19, 2024

@anthonysena, I added a new table to the results model for CI that isn't part of the output of the package but is requested to support reporting. This is a good example of where the module comes in and bridges functionality between the package and the reporting. So this PR is ready to go.

@chrisknoll marked this pull request as ready for review July 19, 2024 17:00
@chrisknoll
Contributor Author

chrisknoll commented Jul 22, 2024

@anthonysena, I added a commit to this PR that touches DatabaseMetaData.R:

I had to set runCheckAndFixCommands = TRUE in uploadResults, because that seems to be the only way to get the database_id int value properly converted to a varchar. Without it, the number comes up as '####.0' instead of an integer ####. I was running into an issue where the database_id in one table was #.0 and in another table was ###, which caused the join to fail. This was working when the database_id was part of a PK column (it performs the int -> as.character conversion if it is a primary key column), but something changed with that in my local env, so now I force these tables to undergo the check-and-fix.

Ideally, we'd just want to convert all columns to the correct underlying type before upload (which is one of the three things that checkAndFixCommands does), so maybe we can have RMM allow just the column conversion to be done separately from the other two operations (one checks for dupes, and I forget what the third is, but those two ops would be very expensive on very large data).
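For context, the workaround in the module call looks roughly like this (a sketch; aside from runCheckAndFixCommands, the argument names are assumptions about uploadResults' signature):

```r
# Upload the module's CSV results, forcing the check-and-fix step so that
# database_id is coerced to character before insert (the workaround above).
ResultModelManager::uploadResults(
  connectionDetails = resultsConnectionDetails,
  schema = "results",
  resultsFolder = resultsFolder,
  specifications = resultsDataModelSpec,
  runCheckAndFixCommands = TRUE
)
```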

@anthonysena
Collaborator

> @anthonysena, I added a commit to this PR that touches DatabaseMetaData.R:

> I had to set runCheckAndFixCommands = TRUE in uploadResults, because that seems to be the only way to get the database_id int value properly converted to a varchar. Without it, the number comes up as '####.0' instead of an integer ####. I was running into an issue where the database_id in one table was #.0 and in another table was ###, which caused the join to fail. This was working when the database_id was part of a PK column (it performs the int -> as.character conversion if it is a primary key column), but something changed with that in my local env, so now I force these tables to undergo the check-and-fix.

> Ideally, we'd just want to convert all columns to the correct underlying type before upload (which is one of the three things that checkAndFixCommands does), so maybe we can have RMM allow just the column conversion to be done separately from the other two operations (one checks for dupes, and I forget what the third is, but those two ops would be very expensive on very large data).

So database_id is a varchar, as declared in https://github.com/OHDSI/Strategus/blob/remove-deps-add-module-interface/inst/csv/databaseMetaDataRdms.csv#L13. Perhaps you need to cast this in your module when you set the database_id? It might be helpful to see how you are testing this out, as the unit tests were passing earlier without this change to DatabaseMetaData.R.

@chrisknoll
Contributor Author

chrisknoll commented Jul 22, 2024

It's all handled by uploadResults:
https://github.com/OHDSI/ResultModelManager/blob/d939010caf5fdc4e898c6e6d1c68d39dc86dc77f/R/DataModel.R#L535

I do not know why it started working differently; I'm not sure if it was due to a merge from upstream or a new version install of RMM, but the solution was to set that flag in uploadResults.

@chrisknoll
Contributor Author

chrisknoll commented Jul 23, 2024

Let me push up a change where I don't use that flag but instead append the databaseId via as.character(). However, there will need to be a corresponding change on the DatabaseModule.R side where you do the same (write database_id as a character value).

Edit:

I see the problem with this now: the value is written to the CSV correctly (there is no .0 in the CSV). However, we lose control over what happens in ResultModelManager::uploadResults(), since it reads the CSVs and works on them based on the results model spec. By default, RMM does not convert the CSV columns into the correct data type (unless you use the runCheckAndFixCommands flag), so there's nothing we can do from outside RMM. Hence, the change was necessary.

I think the long-term solution is to ask RMM to modify its CSV reading to use the data types specified in the results model. By default, read_csv() tries to guess the column types, and database_id values look like numbers.
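Something along these lines is what I have in mind (a simplified sketch using readr directly; a real fix would derive the column types for every table from the results model spec):

```r
# Explicitly declare database_id as character so read_csv() does not guess it
# as a number; RMM would build this cols() spec from the results model.
results <- readr::read_csv(
  "database_meta_data.csv",
  col_types = readr::cols(database_id = readr::col_character())
)
```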

@anthonysena
Collaborator

I did a test to see if there was a difference in behavior when uploading results to SQLite vs. PostgreSQL, based on the notes made by @chrisknoll above. The answer is that there is a difference: when I uploaded the database_meta_data into SQLite, the value was formatted as database_id = 85642205.0, while in PostgreSQL the value is database_id = 85642205.

For now, to keep things consistent, I'll remove the runCheckAndFixCommands = TRUE after I merge this into the other feature branch.

@anthonysena merged commit 00be102 into remove-deps-add-module-interface Jul 26, 2024
2 of 8 checks passed
@anthonysena deleted the module-interface-cohortincidence branch July 26, 2024 18:58
anthonysena added a commit that referenced this pull request Jul 31, 2024
* Implementation of CohortIncidence module. (#147)
* Adjustments from testing

---------

Co-authored-by: Chris Knoll <[email protected]>
anthonysena added a commit that referenced this pull request Oct 7, 2024
* Implementation of CohortIncidence module. (#147)
* Adjustments from testing

---------

Co-authored-by: Chris Knoll <[email protected]>