Open questions about the ISA JSON Model #4

Open
asishallab opened this issue Jun 26, 2023 · 4 comments

@asishallab

asishallab commented Jun 26, 2023

Open Questions

When parsing and reading through the ISA JSON Model a few questions arose. They are listed here.

How to treat properties of type object

In some cases BrApi JSON data models have properties of type object. We can model them in Zendro in a number of ways.

  • We could create a separate data model for these properties and instantiate, e.g., a one-to-many association.
  • We could serialize these objects and store them as a JSON string.

Probably this should be decided on a case-by-case basis?

Example:
additionalInfo and additionalProperties e.g. in Person.json.

Structure of additionalInfo

The definition taken from Person.json says:

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "properties": {
        "additionalInfo": {
            "additionalProperties": {
                "type": "string"
            },
            "description": "Additional arbitrary info",
            "type": [
                "null",
                "object"
            ]
        },

So, according to this specification, a person can have additional info. But, what is the structure of this object? The object additionalInfo can have a number of additionalProperties that are of type string?

  • But if so, why don't we see an array of these properties?
  • Or are additionalProperties a collection of key and value pairs that can store any information? In that case, we cannot provide a schema, but must use serialized JSON.

Reply from meeting with the BrApi group

additionalInfo should be the only case where we see unformatted data. In the BrApi test server we serialize and store this object as JSON.
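
A minimal Python sketch of the serialization approach described above. The helper names serialize_additional_info and deserialize_additional_info are hypothetical and not part of Zendro or the BrApi test server; the sketch assumes additionalInfo is either null or a flat object of string values, as the Person.json excerpt states.

```python
import json
from typing import Optional


def serialize_additional_info(info: Optional[dict]) -> Optional[str]:
    """Store additionalInfo as a JSON string; null stays null."""
    if info is None:
        return None
    # Person.json declares additionalProperties of type string,
    # so reject anything that is not a str -> str mapping.
    for key, value in info.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"additionalInfo[{key!r}] must be a string")
    # sort_keys gives a stable text representation for storage.
    return json.dumps(info, sort_keys=True)


def deserialize_additional_info(text: Optional[str]) -> Optional[dict]:
    """Inverse of serialize_additional_info."""
    return None if text is None else json.loads(text)
```

Because the keys are arbitrary, no column-level schema is needed; the whole object travels as one string attribute.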

How to model externalReferences?

The array of external references is found in the Person model:

"externalReferences": {
            "description": "An array of external reference ids. These are references to this piece of data in an external system. Could be a simple string or a URI.",
            "items": {
                "properties": {
                    "referenceId": {
                        "description": "The external reference ID. Could be a simple string or a URI.",
                        "type": [
                            "null",
                            "string"
                        ]
                    },
                    "referenceSource": {
                        "description": "An identifier for the source system or database of this reference",
                        "type": [
                            "null",
                            "string"
                        ]
                    }
                },
                "required": [
                    "referenceId",
                    "referenceSource"
                ],
                "type": "object"
            },
            "title": "ExternalReferences",
            "type": [
                "null",
                "array"
            ]
        },

There are several questions about this specification:

  • According to the array specification, external reference properties referenceId and referenceSource are marked as required, but in their type specification null is allowed.
  • Shall we model this as a separate data model or store such information serialized as a JSON string?
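
On the first point: in JSON Schema, required only demands that the key is present, while the type list separately governs which values are allowed. So a required property whose type includes null must appear in the object but may carry null. The hand-written check below illustrates that reading; it is not a JSON Schema validator, and the function name check_external_reference is hypothetical.

```python
def check_external_reference(ref: dict) -> list:
    """Return a list of problems for one externalReferences item."""
    problems = []
    for key in ("referenceId", "referenceSource"):
        if key not in ref:
            # "required" is about the presence of the key ...
            problems.append(f"missing required key {key}")
        elif ref[key] is not None and not isinstance(ref[key], str):
            # ... while "type": ["null", "string"] is about the value.
            problems.append(f"{key} must be null or a string")
    return problems
```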

Response from the BrApi development team

  • Ideally we use a separate data model ExternalReference
  • the referenceId can be e.g. a DOI URL
  • the source would then be Page & Holmes 2012, Inferring Phylogenies (actually this example does not exist)

Another example would be the field-book-App:

  • collects data and sends that data to a database
  • referenceId is the internal DB id (primary key)
  • and the source here would be field-book
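
A rough Python sketch of the separate ExternalReference model suggested above. The fields owner_model and owner_id are assumptions added here to link each reference back to its record; they do not come from the BrApi schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExternalReference:
    # reference_* follow the Person.json excerpt; owner_* are assumptions
    # used to link each reference back to the record it came from.
    reference_id: Optional[str]
    reference_source: Optional[str]
    owner_model: str
    owner_id: str


def split_external_references(model: str, record_id: str, record: dict) -> list:
    """Turn the embedded array into one ExternalReference row per item."""
    items = record.get("externalReferences") or []  # null or missing -> no rows
    return [
        ExternalReference(
            reference_id=item.get("referenceId"),
            reference_source=item.get("referenceSource"),
            owner_model=model,
            owner_id=record_id,
        )
        for item in items
    ]
```

For the field-book case, a sample with referenceId "42" and referenceSource "field-book" would yield exactly one row pointing back to that sample.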

Validation

Zendro has the capability to use any validation function on provided data. The Zendro framework can validate both data formats (syntactically) and data values (semantically). However, if the database has to be queried, we should consider whether this might be a performance bottleneck.

Example taken from Sample.json:

"column": {
            "description": "The Column identifier for this `Sample` location in the `Plate`",
            "maximum": 12,
            "minimum": 1,
            "type": [
                "null",
                "integer"
            ]
        },

Questions:

  • What kind of validations are provided in the whole of the BrApi specification?
  • Is there a way to automatically recognize validation functions?
    If we can recognize validations and the respective function to carry out on data values, we can implement them in Zendro, and generate them automatically.
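
As a sketch of what automatic recognition could look like, the Python function below builds a value check from the keywords seen in the column example (type, minimum, maximum). It covers only these keywords, the name build_validator is hypothetical, and it is not how Zendro generates validations.

```python
def build_validator(prop_schema: dict):
    """Build a value check from the keywords of one property schema."""
    types = prop_schema.get("type", [])
    if isinstance(types, str):
        types = [types]
    minimum = prop_schema.get("minimum")
    maximum = prop_schema.get("maximum")

    def validate(value) -> bool:
        if value is None:
            return "null" in types
        # bool is a subclass of int in Python, so exclude it explicitly.
        if "integer" in types and (
            not isinstance(value, int) or isinstance(value, bool)
        ):
            return False
        if isinstance(value, (int, float)):
            if minimum is not None and value < minimum:
                return False
            if maximum is not None and value > maximum:
                return False
        return True

    return validate
```

Checks of this kind are purely local to the value, so they do not need a database query.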

Relationships / Associations

For some associations we see the foreign keys implemented in the JSON Specs, e.g. in Sample.json:

 "germplasmDbId": {
            "description": "The ID which uniquely identifies a `Germplasm`",
            "type": [
                "null",
                "string"
            ]
        },
        "observationUnitDbId": {
            "description": "The ID which uniquely identifies an `ObservationUnit`",
            "type": [
                "null",
                "string"
            ]
        },
        "plateDbId": {
            "description": "The ID which uniquely identifies a `Plate` of `Sample`",
            "type": [
                "null",
                "string"
            ]
        },

Here, we can conclude from the name of the foreign key and its existence:

  • what type of relationship it is (here many-to-[one|many], i.e. many samples probably belong to many (not one) germplasm)
  • what type the related data model has (here: Germplasm).

However, a formal specification of all relationships would be extremely helpful and would resolve the open questions.
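
The name-based inference described above can be sketched in Python as follows. The function guess_foreign_keys is hypothetical, and it assumes that a model's own <model>DbId property is its primary key. Note that it can only guess the target model, not the cardinality, which is exactly where a formal specification would help.

```python
import re

_DBID = re.compile(r"^(?P<name>[A-Za-z]+)DbId$")


def guess_foreign_keys(model: str, properties: dict) -> dict:
    """Map each <name>DbId property to the data model it probably points to."""
    own_key = model[0].lower() + model[1:] + "DbId"
    targets = {}
    for prop in properties:
        match = _DBID.match(prop)
        if match and prop != own_key:  # the model's own DbId is its primary key
            name = match.group("name")
            targets[prop] = name[0].upper() + name[1:]
    return targets
```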

To be excluded properties

In some data models foreign keys are stated. In addition, to spare the user from sending another request to the RESTful API, some properties of the associated (relationship) models are stored, too. See this example taken from Sample.json:

        "plateDbId": {
            "description": "The ID which uniquely identifies a `Plate` of `Sample`",
            "type": [
                "null",
                "string"
            ]
        },
        "plateName": {
            "description": "The human readable name of a `Plate`",
            "type": [
                "null",
                "string"
            ]
        },

With GraphQL these properties are not required. GraphQL specifically allows the user to fetch all the data they want within a single HTTP request, including properties of related (associated) data models. Furthermore, once we have a formal description of the relationships between data models, foreign keys would ideally no longer be listed among the data model definitions.
Is there a way to recognize these "to be excluded" properties and leave them out of the final GraphQL data model definitions? A quick and dirty solution would be a simple exclusion list.

Response from the GraphQL development group

  • Everything that terminates in DbId is either a primary or a foreign key.
  • A simple exclusion list could be a JSON file of data model names (keys) and arrays of properties to be excluded (values)
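
A small Python sketch of applying such an exclusion list. The shape of the list (model names as keys, arrays of property names as values) follows the suggestion above; the function name apply_exclusion_list and the layout of the schemas argument are assumptions.

```python
import copy


def apply_exclusion_list(schemas: dict, exclusions: dict) -> dict:
    """Return a copy of the schemas without the listed properties."""
    result = copy.deepcopy(schemas)
    for model, props in exclusions.items():
        properties = result.get(model, {}).get("properties", {})
        for prop in props:
            properties.pop(prop, None)  # ignore names that are not present
    return result
```

With an exclusion list of {"Sample": ["plateName"]}, the Sample model would keep plateDbId but lose plateName, and the input schemas stay untouched.
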
@LzLang
Owner

LzLang commented Nov 29, 2023

Request for a uniform way of implementing relationships/associations

Almost all schemas use a uniform format for associations, for example observationVariables from Study.json:

{
    "$defs": {
        "Study": {
            "properties": {
                "observationVariables": {
                    "description": "The list of Observation Variables being used in this study. \n\nThis list is intended to be the wishlist of variables to collect in this study. It may or may not match the set of variables used in the collected observation records. ",
                    "items": {
                        "$ref": "ObservationVariable.json#/$defs/ObservationVariable"
                    },
                    "referencedAttribute": "studies",
                    "relationshipType": "many-to-many",
                    "type": "array"
                }
            }
        }
    },
    "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Core/Study.json",
    "$schema": "http://json-schema.org/draft/2020-12/schema"
}

As you can see, the association is a property itself, so the relationships can be converted automatically without problems.
While working on associations I noticed that three associations are defined differently from all the others.

They are not defined as properties of their own but as nested properties, for example parentGermplasm:

{
    "$defs": {
        "PedigreeNode": {
            "properties": {
                "parents": {
                    "description": "A list of parent germplasm references in the pedigree tree for this germplasm. These represent edges in the tree, connecting to other nodes.\n<br/> Typically, this array should only have one parent (clonal or self) or two parents (cross). In some special cases, there may be more parents, usually when the exact parent is not known. \n<br/> If the parameter 'includeParents' is set to false, then this array should be empty, null, or not present in the response.",
                    "items": {
                        "properties": {
                            "parentGermplasm": {
                                "$ref": "Germplasm.json#/$defs/Germplasm",
                                "description": "The ID which uniquely identifies a parent germplasm",
                                "referencedAttribute": "progenyPedigreeNodes",
                                "relationshipType": "many-to-one"
                            },
                            "parentType": {
                                "description": "The type of parent used during crossing. Accepted values for this field are 'MALE', 'FEMALE', 'SELF', 'POPULATION', and 'CLONAL'. \n\nIn a pedigree record, the 'parentType' describes each parent of a particular germplasm. \n\nIn a progeny record, the 'parentType' is used to describe how this germplasm was crossed to generate a particular progeny. \nFor example, given a record for germplasm A, having a progeny B and C. The 'parentType' field for progeny B item refers \nto the 'parentType' of A toward B. The 'parentType' field for progeny C item refers to the 'parentType' of A toward C.\nIn this way, A could be a male parent to B, but a female parent to C. ",
                                "enum": [
                                    "MALE",
                                    "FEMALE",
                                    "SELF",
                                    "POPULATION",
                                    "CLONAL"
                                ],
                                "type": "string"
                            }
                        },
                        "required": [
                            "germplasmDbId",
                            "parentType"
                        ],
                        "type": "object"
                    },
                    "type": [
                        "null",
                        "array"
                    ]
                }

            }
        }
    },
    "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/PedigreeNode.json",
    "$schema": "http://json-schema.org/draft/2020-12/schema"
}

As you can see, the association is defined here as a nested property of the property parents.
The converter ignores nested properties, so the association is ignored as well.

Is it possible to unify the format of the associations and define each one as an individual property?
This would be very helpful!
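
To illustrate the difference, here is a Python sketch of how a converter could at least detect associations that sit below the top level. It is not the actual converter; find_associations is a hypothetical helper that treats any property carrying relationshipType as an association.

```python
def find_associations(properties: dict, path=()):
    """Yield (path, definition) for every property carrying relationshipType."""
    for name, definition in properties.items():
        here = path + (name,)
        if "relationshipType" in definition:
            yield here, definition
        # Descend into inline objects and into array items, which is
        # where PedigreeNode.json keeps parentGermplasm.
        for child in (definition, definition.get("items")):
            if isinstance(child, dict) and isinstance(child.get("properties"), dict):
                yield from find_associations(child["properties"], here)
```

For Study.json this yields the path ("observationVariables",), while for PedigreeNode.json it yields ("parents", "parentGermplasm"), a path of length two that a top-level-only converter never reaches.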

@BrapiCoordinatorSelby

The siblingsGermplasm is something I can fix immediately. But parentGermplasm and progenyGermplasm are a little bit tricky. While they are referencing an array of Germplasm elements, they also need the additional metadata parentType associated with each Germplasm. I think we need some kind of polymorphism for the Germplasm entity in this case.
I think the model proposed in this Blog post might work for us: https://json-schema.org/blog/posts/modelling-inheritance

It would look something like this:

{
    "$defs": {
        "PedigreeNode": {
            "properties": {
                "parents": {
                    "description": "A list of parent germplasm referen...,",
                    "referencedAttribute": "progenyPedigreeNodes",
                    "relationshipType": "many-to-one",
                    "items": {
                        "type": "object",
                        "$ref": "Germplasm.json#/$defs/Germplasm",
                        "properties": {
                            "parentType": {
                                "description": "The type of parent used du... ",
                                "enum": ["MALE", "FEMALE","SELF", "POPULATION", "CLONAL" ],
                                "type": "string"
                            }
                        },
                        "required": ["germplasmDbId","parentType"]
                    },
                    "type": ["null","array"]
                }
            }
        }
    },
    "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/PedigreeNode.json",
    "$schema": "http://json-schema.org/draft/2020-12/schema"
}

@LzLang Will this work for Zendro? Will it be able to pick up the reference to Germplasm AND keep the additional property parentType? I don't know how polymorphism works with GraphQL...

@asishallab
Author

asishallab commented Mar 15, 2024

Notes on how to resolve the above issue(s)

"Nested relationships"

So, any one-to-one or one-to-many relation to objects that do not have a separate data model definition we dub "nested". It would be helpful to discontinue the use of such nested relationships, have separate JSON data model definitions for those objects instead, and then define the relationships as in all other cases.

List of explicitly defined foreign keys

Some data models have foreign keys stated, which should be excluded from the "standard" data model definition. @LzLang will provide us with a list of these keys in order to remove them from the JSON model definitions.

Note that currently in the context of automated data warehouse generation with Zendro, we automatically create foreign keys for each association.

"Compound foreign keys"

In Zendro we only support single foreign keys. Of course, we could have one for the mother germplasm id and another one for the father. This would be a solution wherever we know how many associations we have to the same data model.

@LzLang
Owner

LzLang commented Apr 16, 2024

Possible Solution for nested properties

Hello @BrapiCoordinatorSelby ,

we worked on the nested properties issue and tried to separate those into their own models.
We used your Cross.json schema and modified it.
Could you please review the idea and tell us your opinion?
Basically we have to modify the schema manually.

Cross.json now (condensed to the changed attributes):

{
    "$defs": {
        "Cross": {
            "properties": {
                "crossAttributes": {
                    "referencedAttribute": "cross",
                    "relationshipType": "one-to-many",
                    "items": {
                        "$ref": "CrossAttribute.json#/$defs/CrossAttribute",
                        "description": "Set of custom attributes associated with a cross"
                    },
                    "type": [
                        "null",
                        "array"
                    ]
                },
                "externalReferences": {
                    "referencedAttribute": "cross",
                    "relationshipType": "one-to-many",
                    "items": {
                        "$ref": "CrossExternalReferences.json#/$defs/CrossExternalReferences",
                        "description": "An array of external reference ids. These are references to this piece of data in an external system. Could be a simple string or a URI."               
                    },
                    "type": [
                        "null",
                        "array"
                    ]
                },
                "parent1": {
                    "$ref": "Germplasm.json#/$defs/Germplasm",
                    "description": "the unique identifier for a germplasm",
                    "referencedAttribute": "parent1Childs",
                    "relationshipType": "many-to-one"
                },
                "parent2": {
                    "$ref": "Germplasm.json#/$defs/Germplasm",
                    "description": "the unique identifier for a germplasm",
                    "referencedAttribute": "parent2Childs",
                    "relationshipType": "many-to-one"
                },
               "pollinationEvents": {
                    "referencedAttribute": "cross",
                    "relationshipType": "one-to-many",
                    "items": {
                        "$ref": "CrossPollinationEvent.json#/$defs/CrossPollinationEvent",
                        "description": "The list of pollination events that occurred for this cross"
                    },
                    "type": [
                        "null",
                        "array"
                    ]
                }
            },
            "required": [
                "crossDbId"
            ],
            "title": "Cross",
            "type": "object"
        }
    },
    "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/Cross.json",
    "$schema": "http://json-schema.org/draft/2020-12/schema"
}

We created the following models:

  • CrossAttribute
  • CrossExternalReferences
  • CrossPollinationEvent

CrossAttribute:

{
    "$defs": {
        "CrossAttribute": {
            "properties": {
                "cross_attribute_ID": {
                    "description": "the unique identifier for a cross attribute",
                    "type": "string"
                },
                "crossAttributeName": {
                    "description": "the human readable name of a cross attribute",
                    "type": [
                        "null",
                        "string"
                    ]
                },
                "crossAttributeValue": {
                    "description": "the value of a cross attribute",
                    "type": [
                        "null",
                        "string"
                    ]
                },
                "cross": {
                    "$ref": "Cross.json#/$defs/Cross",
                    "description": "The unique identifier for a Cross",
                    "referencedAttribute": "crossAttributes",
                    "relationshipType": "many-to-one"
                }
            },
            "required": [
                "cross_attribute_ID"
            ],
            "title": "CrossAttribute",
            "type": "object"
        }
    },
    "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/CrossAttribute.json",
    "$schema": "http://json-schema.org/draft/2020-12/schema"
}

CrossExternalReferences

{
    "$defs": {
        "CrossExternalReferences": {
            "properties": {
                "reference_ID": {
                    "description": "The external reference ID. Could be a simple string or a URI.",
                    "type": [
                        "null",
                        "string"
                    ]
                },
                "referenceSource": {
                    "description": "An identifier for the source system or database of this reference",
                    "type": [
                        "null",
                        "string"
                    ]
                },
                "cross": {
                    "$ref": "Cross.json#/$defs/Cross",
                    "description": "The unique identifier for a Cross",
                    "referencedAttribute": "externalReferences",
                    "relationshipType": "many-to-one"
                }
            },
            "required": [
                "reference_ID"
            ],
            "title": "CrossExternalReferences",
            "type": "object"
        }
    },
    "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/CrossExternalReferences.json",
    "$schema": "http://json-schema.org/draft/2020-12/schema"
}

CrossPollinationEvent

{
    "$defs": {
        "CrossPollinationEvent": {
            "properties": {
                "pollination_ID": {
                    "description": "The unique identifier for this pollination event",
                    "type": [
                        "null",
                        "string"
                    ]
                },
                "pollinationSuccessful": {
                    "description": "True if the pollination was successful",
                    "type": [
                        "null",
                        "boolean"
                    ]
                },
                "pollinationTimeStamp": {
                    "description": "The timestamp when the pollination took place",
                    "format": "date-time",
                    "type": [
                        "null",
                        "string"
                    ]
                },
                "cross": {
                    "$ref": "Cross.json#/$defs/Cross",
                    "description": "The unique identifier for a Cross",
                    "referencedAttribute": "pollinationEvents",
                    "relationshipType": "many-to-one"
                }

            },
            "required": [
                "pollination_ID"
            ],
            "title": "CrossPollinationEvent",
            "type": "object"
        }
    },
    "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/CrossPollinationEvent.json",
    "$schema": "http://json-schema.org/draft/2020-12/schema"
}

In the original Cross model, there were two special nested properties, "parent1" and "parent2".
Those properties were basically just an association to Germplasm to link the parents.
Instead of creating a separate model for those two properties, we just created an association to Germplasm.json.
Cross:

               "parent1": {
                    "$ref": "Germplasm.json#/$defs/Germplasm",
                    "description": "the unique identifier for a germplasm",
                    "referencedAttribute": "parent1Childs",
                    "relationshipType": "many-to-one"
                },
                "parent2": {
                    "$ref": "Germplasm.json#/$defs/Germplasm",
                    "description": "the unique identifier for a germplasm",
                    "referencedAttribute": "parent2Childs",
                    "relationshipType": "many-to-one"
                },

Germplasm.json

               "parent1Childs": {
                    "title": "parent1Childs",
                    "description": "Childs of the germplasm",
                    "referencedAttribute": "parent1",
                    "relationshipType": "one-to-many",
                    "items": {
                        "$ref": "Cross.json#/$defs/Cross",
                        "description": "Crosses"
                    },
                    "type": [
                        "null",
                        "array"
                    ]
                },
                "parent2Childs": {
                    "title": "parent2Childs",
                    "description": "Childs of the germplasm",
                    "referencedAttribute": "parent2",
                    "relationshipType": "one-to-many",
                    "items": {
                        "$ref": "Cross.json#/$defs/Cross",
                        "description": "Crosses"
                    },
                    "type": [
                        "null",
                        "array"
                    ]
                }

Way of standardizing primary and foreign keys

Currently primary and foreign keys are defined the same way, e.g. from Cross:

{
    "$defs": {
        "Cross": {
            "properties": {
                "crossDbId": {
                    "description": "the unique identifier for a cross",
                    "type": "string"
                },
                "parent1": {
                    "properties": {
                        "germplasmDbId": {
                            "description": "the unique identifier for a germplasm",
                            "type": [
                                "null",
                                "string"
                            ]
                        }
                    },
                    "type": [
                        "null",
                        "object"
                    ]
                }
            },
            "required": [
                "crossDbId"
            ],
            "title": "Cross",
            "type": "object"
        }
    },
    "$id": "https://brapi.org/Specification/BrAPI-Schema/BrAPI-Germplasm/Cross.json",
    "$schema": "http://json-schema.org/draft/2020-12/schema"
}

The primary key crossDbId and foreign key germplasmDbId are defined the same way.
In our project we defined primary keys like [model]_ID.

  • Cross -> cross_ID
  • AlleleMatrix -> allele_matrix_ID
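
The naming pattern can be written down as a short Python sketch. The function name primary_key_name is hypothetical, and names containing acronyms (consecutive capitals) are not handled.

```python
import re


def primary_key_name(model: str) -> str:
    """Derive the primary key name from a model name ([model]_ID)."""
    # Insert an underscore before every capital that follows a lowercase
    # letter or digit, then lowercase the result.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", model).lower()
    return f"{snake}_ID"
```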

And for foreign keys we used a similar pattern, for example I use listOwnerPerson from List.json:

                "listOwnerPerson": {
                    "$ref": "Person.json#/$defs/Person",
                    "description": "The unique identifier for a List Owner. (usually a user or person)",
                    "referencedAttribute": "lists",
                    "relationshipType": "many-to-one"
                },

So basically one person can have multiple lists. In Zendro we would define the relationship like this:

        "listOwnerPerson": {
            "type": "many_to_one",
            "implementation": "foreignkeys",
            "reverseAssociation": "lists",
            "target": "Person",
            "targetKey": "lists_ids",
            "sourceKey": "list_owner_person_id",
            "keysIn": "List",
            "targetStorageType": "sql"
        }

So our foreign keys are named after the attribute and use id/ids, depending on whether it's an array or not.
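
The mapping from the BrAPI association to the Zendro definition above can be sketched in Python. This only covers the many-to-one case; the function names are hypothetical, and targetStorageType is assumed to be fixed to "sql" as in the example.

```python
import re


def to_snake(name: str) -> str:
    """listOwnerPerson -> list_owner_person"""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def zendro_association(model: str, attribute: str, definition: dict) -> dict:
    """Translate a many-to-one BrAPI association into the Zendro form."""
    if definition["relationshipType"] != "many-to-one":
        raise ValueError("this sketch only covers many-to-one")
    # "Person.json#/$defs/Person" -> "Person"
    target = definition["$ref"].rsplit("/", 1)[-1]
    reverse = definition["referencedAttribute"]
    return {
        "type": "many_to_one",
        "implementation": "foreignkeys",
        "reverseAssociation": reverse,
        "target": target,
        "targetKey": f"{to_snake(reverse)}_ids",   # array side -> ids
        "sourceKey": f"{to_snake(attribute)}_id",  # single side -> id
        "keysIn": model,
        "targetStorageType": "sql",  # assumed fixed, as in the example
    }
```

Calling zendro_association("List", "listOwnerPerson", ...) with the List.json snippet reproduces the definition shown above.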


Standardizing a way of defining associations

Currently BrAPI uses two different ways to define associations.
X-to-many associations always have the items tag, where a description and the reference are noted:

                "observationUnits": {
                    "title": "observationUnits",
                    "description": "observationUnits",
                    "referencedAttribute": "cross",
                    "relationshipType": "one-to-many",
                    "items": {
                        "$ref": "ObservationUnit.json#/$defs/ObservationUnit",
                        "description": "ObservationUnit"
                    },
                    "type": [
                        "null",
                        "array"
                    ]
                }

On the other hand, many-to-X associations don't have this nesting:

                "crossingProject": {
                    "$ref": "CrossingProject.json#/$defs/CrossingProject",
                    "description": "the unique identifier for a crossing project",
                    "referencedAttribute": "crosses",
                    "relationshipType": "many-to-one"
                },

We don't see a benefit in nesting the reference and giving it a separate description.
Basically you could define this relationship without nesting, like:

                "observationUnits": {
                    "title": "observationUnits",
                    "description": "observationUnits",
                    "referencedAttribute": "cross",
                    "relationshipType": "one-to-many",
                    "$ref": "ObservationUnit.json#/$defs/ObservationUnit",
                    "type": [
                        "null",
                        "array"
                    ]
                }
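
Until the format is unified, a converter has to look in both places for the reference. A minimal Python sketch, with association_target as a hypothetical helper name:

```python
def association_target(definition: dict) -> str:
    """Return the referenced model name for either association form."""
    # X-to-many keeps the $ref under "items"; many-to-X has it at top level.
    ref = definition.get("$ref") or definition.get("items", {}).get("$ref")
    if ref is None:
        raise ValueError("no $ref found")
    # "ObservationUnit.json#/$defs/ObservationUnit" -> "ObservationUnit"
    return ref.rsplit("/", 1)[-1]
```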
