Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data_dictionary_builder.php unique constraint fix #9201

Merged

Conversation

laemtl
Copy link
Contributor

@laemtl laemtl commented Apr 11, 2024

Fix the data_dictionary_builder.php script
by enforcing the unique key constraint Name + SourceFrom.
by reducing the scope of the main query to instruments. Instruments have unique parameter names, and the script is intended to build a dictionary for instruments.

Error below was encountered with EEG/MRI parameter_file fields.

Error seen when 2 parameter_type entries share the same Name:

lorisadmin@smathew-dev:/var/www/loris/project/tools/exporters$ php data_dictionary_builder.php
Could not execute Select Name, ParameterTypeID from parameter_type. Stack trace#0 /var/www/loris/project/tools/exporters/data_dictionary_builder.php(60): Database->pselectColWithIndexKey()
#1 {main}
PHP Fatal error:  Uncaught DatabaseException: The uniqueKey supplied to pselectColWithIndexKey() does not 
                    appear to be unique or is nullable. This function expects the 
                    key to be both UNIQUE and NOT NULL. in /var/www/loris/php/libraries/Database.class.inc:971
Stack trace:
#0 /var/www/loris/project/tools/exporters/data_dictionary_builder.php(60): Database->pselectColWithIndexKey()

@laemtl laemtl force-pushed the data_dictionary_builder-unique-constraint-fix branch from 4eb5df8 to 4591517 Compare April 11, 2024 17:53
@laemtl laemtl force-pushed the data_dictionary_builder-unique-constraint-fix branch from 4591517 to 812ec37 Compare April 11, 2024 18:05
@sruthymathew123 sruthymathew123 self-requested a review April 11, 2024 19:48
Copy link
Contributor

@sruthymathew123 sruthymathew123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tesed and looks good

@sruthymathew123 sruthymathew123 added the Passed manual tests PR has been successfully tested by at least one peer label Apr 11, 2024
@ridz1208
Copy link
Collaborator

@laemtl what is the usecase here? In my DB, the Name parameter includes the "sourceFrom" basically on my end Name = sourcefrom+sourcefield

@laemtl
Copy link
Contributor Author

laemtl commented Apr 17, 2024

@ridz1208 Some EEG/MRI pipelines use parameters with the same Name, different SourceFrom since they both import BIDS file parameters. We experienced this issue on HBCD.

I guess it's all about how the different pipelines will decide to name the parameters but our only uniqueness guarantee is the one enforced by the db (Name + SourceFrom)

@ridz1208
Copy link
Collaborator

@driusan I think the better fix here would be to make sure this script is looking at instrument fields in parameter_type. What do you think?

@driusan
Copy link
Collaborator

driusan commented Apr 17, 2024

I still don't understand. There are 2 issues:

  1. What do EEG/MRI pipelines have to do with this script? This script parses and generates parameter_type entries for the Instruments category and pipelines shouldn't be categorized as Instruments.
  2. I don't see why the the constraint is (SourceFrom, Name) in the parameter_type schema. That change appears to have been introduced without explanation in PR#7503 (which only talks about the same parameter being duplicated across different IDs, not different IDs having the same name.) "SourceFrom" is just a table name for the "SourceField" which is a field name database field name. They have nothing to do with the name (other than the name sometimes being derived from the table/field name for instruments.) SourceFrom is not a pipeline name which it seems like is being incorrectly assumed here.

@cmadjar
Copy link
Collaborator

cmadjar commented Apr 18, 2024

@driusan in the PR description you mention, it is written that this was discussed during a roadmap meeting so this does not come out of nowhere ;).

How do you categorize things as instrument in the parameter_type table? Maybe from the sourceFrom, if the sourceFrom comes from an instrument table, then build data dictionary?

@cmadjar
Copy link
Collaborator

cmadjar commented Apr 18, 2024

@driusan also, sourcefrom is linking to parameter_file in the case of MRI and physiological_parameter_file tables in the case of EEG. So there are no pipeline names specified in this field but tables used by the imaging pipelines. Maybe the phrasing caused some confusion.

@laemtl
Copy link
Contributor Author

laemtl commented Apr 18, 2024

@cmadjar We can join the table parameter_type_category to filter the instruments.

@laemtl
Copy link
Contributor Author

laemtl commented Apr 18, 2024

@driusan

What do EEG/MRI pipelines have to do with this script? This script parses and generates parameter_type entries for the Instruments category and pipelines shouldn't be categorized as Instruments.

Current script implementation does not distinguish instruments vs other parameters in the first query that collects all the parameters, so the first select in the script is failing since we use pselectColWithIndexKey with a non unique key. This is as far as EEG/MRI is involved.

@laemtl laemtl force-pushed the data_dictionary_builder-unique-constraint-fix branch from c5f69b5 to 40a51a6 Compare April 19, 2024 01:50
@laemtl laemtl requested a review from ridz1208 April 19, 2024 01:54
@driusan driusan merged commit 43f53e9 into aces:main Apr 23, 2024
28 checks passed
@ridz1208 ridz1208 added this to the 26.0.0 milestone Jun 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Passed manual tests PR has been successfully tested by at least one peer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants