Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support /tracker/event?dataElementIdScheme DHIS2-14968 #19153

Merged
merged 6 commits into from
Nov 15, 2024

Conversation

teleivo
Copy link
Contributor

@teleivo teleivo commented Nov 14, 2024

support /tracker/events?dataElementIdScheme

  • remove interface EventStore and let DefaultEventService depend directly on the concrete implementation. As with other interfaces we've removed: only add an interface to isolate for testing purposes or if there are multiple implementations. None of these cases apply here.
  • add TrackerIdSchemeParams to EventOperationParams and EventQueryParams so we can modify the SQL query only if the dataElementIdScheme is not UID

Query and EventDataValues

EventDataValues are stored as JSONB in the event.eventdatavalues column. See https://dhis2.atlassian.net/browse/DHIS2-4477 for details on why this came to be. This means an event can have multiple EventDataValue in one event row.

EventDataValue.dataElement is of type String. Unlike other types of ours which have a reference to a Metadata object.

The JSONB is an object with a dataValue per dataElement UID

{
  "GieVkTxp4HH": {
    "value": "73",
    "created": "2015-04-21T14:05:54.648",
    "storedBy": "[Unknown]",
    "lastUpdated": "2015-04-21T14:05:54.648",
    "providedElsewhere": false
  },
...
}

Luckily this bug https://dhis2.atlassian.net/browse/DHIS2-10018 was fixed and we can rely on the dataElement keys being UIDs only.

We need to join with the dataelement table to get access to code, name and attributevalues. This means we need to extract/expand the eventdatavalues into individual eventdatavalue objects. This leads to a event x eventdatavalues of rows. Like we would with the classical normalized model of tables event, eventdatavalues and event_eventdatavalues:

left join
    lateral jsonb_each(
        coalesce(event.ev_eventdatavalues, '{}')
    ) as eventdatavalue(dataelement_uid, value)
    on true
left join dataelement de on de.uid = eventdatavalue.dataelement_uid
where eventdatavalue.dataelement_uid is not null

This is the rough anatomy of an event query

select *
from
    (
        select
            ev.uid as ev_uid,
            ou.uid as orgunit_uid,
            ou.code as orgunit_code,
            ou.name as orgunit_name,
            ou.attributevalues as orgunit_attributevalues,
...
        from event ev
        inner join enrollment en on en.enrollmentid = ev.enrollmentid
        inner join program p on p.programid = en.programid
        inner join programstage ps on ps.programstageid = ev.programstageid
        ...
        order by ev_id desc
        limit 50
        offset 0
    ) as event
left join
    (
        select
            evn.eventid as evn_id,
            n.noteid as note_id,
            n.notetext as note_text,
...
        from event_notes evn
        inner join note n on evn.noteid = n.noteid
...
    ) as cm
    on event.ev_id = cm.evn_id
order by ev_id desc

Note that the subquery (with limit/offset) cannot have more than one row per event so pagination using offset and limit is correct. First attempt was changing the main subquery which was too slow as it was working on all events #19123. This change is to the parent query (as we do with notes) so events have already been filtered to one event or a page in most cases. This means we then need to "aggregate" the result set like we do with notes when idScheme != UID.

Latency

These measurements are made using a single curl request to a local instance with no other traffic. They do give an indication of the effect of the changed query on latency though.

dataElementIdScheme other than UID increases latency significantly for large page sizes. This is what we have to face if we align ACL for dataValues in /tracker/events and /tracker/events/{uid}.

Query Parameters PageSize Total Time (ms) Download Size
idScheme=UID 50 51.27 115.71 KiB
idScheme=NAME&dataElementIdScheme=UID 50 50.11 117.39 KiB
idScheme=NAME 50 57.49 49.91 KiB
idScheme=UID 10000 2620 12.83 MiB
idScheme=NAME&dataElementIdScheme=UID 10000 2890 13.35 MiB
idScheme=NAME 10000 4550 6.03 MiB

The difference in download size with idScheme=NAME is likely due to some metadata not having a name which right now will not be returned.

Next

  • deal with default category option/combo options when it comes to attribute values as they do not have any and users cannot add any
  • return an error reporting what metadata does not have the requested idScheme
  • return metadata in relationships using the correct idScheme

@teleivo teleivo force-pushed the DHIS2-14968-datavalues-disaggregate branch from 94ae399 to 41290b2 Compare November 14, 2024 09:18
@teleivo teleivo force-pushed the DHIS2-14968-datavalues-disaggregate branch from 47998ec to 53412c4 Compare November 15, 2024 04:45
@teleivo teleivo force-pushed the DHIS2-14968-datavalues-disaggregate branch from 53412c4 to b5cdaaa Compare November 15, 2024 04:46
@teleivo teleivo marked this pull request as ready for review November 15, 2024 05:06
@teleivo teleivo requested review from a team as code owners November 15, 2024 05:06
Copy link

sonarcloud bot commented Nov 15, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
1 New issue
1 New Code Smells (required ≤ 0)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants