query resource #14

Open
ErikSundvall opened this issue Jun 23, 2016 · 16 comments

@ErikSundvall
Member

There are several issues to discuss regarding how to specify the Query resource, including but not limited to:

  1. Format of result set (see https://openehr.atlassian.net/wiki/display/spec/AQL+Result+Set+work+area )
  2. Support for other Query formalisms than AQL?
  3. Storing and executing stored queries (see previous discussion at https://openehr.atlassian.net/wiki/display/spec/openEHR+REST+APIs )
  4. Long queries do not fit within the length limit of GET-requests, so there must be a way to POST queries too.
  5. ...
@bostjanl
Contributor

  1. In the api spec there are POST as well as GET query calls

@wolandscat
Member

I would expect that a POST would make sense for the logical operation of 'register query' which just returns a query handle / id. Then some later request - a GET? - does an 'execute query' with that id.

In a more sophisticated system, a query could be registered and then used to generate push results every x time, or event driven to a named receiver.

Queries that are registered in this way need to somehow time-out and disappear over time, else the query service will fill up pretty fast...
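The time-out idea could be sketched as follows. This is only an illustration under assumptions: the class `ExpiringQueryRegistry`, its method names, and the "expire when unused for longer than a fixed period" policy are hypothetical and not part of any openEHR specification. The current time is passed in explicitly so that the behaviour is easy to test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry whose entries expire when unused for longer than a TTL. */
final class ExpiringQueryRegistry {

    private static final class Entry {
        final String queryText;
        volatile Instant lastUsed;

        Entry(String queryText, Instant lastUsed) {
            this.queryText = queryText;
            this.lastUsed = lastUsed;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Duration ttl;

    ExpiringQueryRegistry(Duration ttl) {
        this.ttl = ttl;
    }

    /** Registers a query under a caller-supplied id (how ids are minted is out of scope here). */
    void register(String id, String queryText, Instant now) {
        entries.put(id, new Entry(queryText, now));
    }

    /** Returns the query text and refreshes its last-used time, or empty if unknown or expired. */
    Optional<String> lookup(String id, Instant now) {
        Entry e = entries.get(id);
        if (e == null || isExpired(e, now)) {
            entries.remove(id);
            return Optional.empty();
        }
        e.lastUsed = now;
        return Optional.of(e.queryText);
    }

    /** Removes every expired entry and returns how many were removed. */
    int purgeExpired(Instant now) {
        int before = entries.size();
        entries.values().removeIf(e -> isExpired(e, now));
        return before - entries.size();
    }

    private boolean isExpired(Entry e, Instant now) {
        return Duration.between(e.lastUsed, now).compareTo(ttl) > 0;
    }

    public static void main(String[] args) {
        ExpiringQueryRegistry registry = new ExpiringQueryRegistry(Duration.ofMinutes(10));
        Instant start = Instant.now();
        registry.register("q1", "SELECT c FROM EHR e CONTAINS COMPOSITION c", start);
        System.out.println(registry.lookup("q1", start.plus(Duration.ofMinutes(30))).isPresent());
    }
}
```

A periodic job could call `purgeExpired`, so that registered queries disappear even if nobody looks them up again.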

@bostjanl
Contributor

We need POST because GET is constrained by URL length limits. There are some really long queries out there.

@ErikSundvall
Member Author

ErikSundvall commented Jun 27, 2016

A "POST to register/store Query" approach was described in the LiU EEE work; see the section "Querying" (and its subsections) a little way into the chapter "Implementation": http://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-13-57#Sec13

We used a SHA-1 hash to give every differently formulated query an ID, and stored every* new (not previously seen/hashed) Query in its original form (as a help to log what has been queried for). That does not take a lot of space and does not need to be done inside the DBMS "stored procedure" mechanism. Instead, admins (or statistics + config rules) can decide which commonly occurring queries to convert to real DBMS-native stored procedures/queries for improved efficiency.

Also, re-translating from AQL to native can be avoided for repeated queries with identical hash, if you in your implementation also choose to store the translated Query in some form. The translated but seldom used ones could of course be purged after a while if space is a problem.
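A minimal sketch of the hash-and-store idea, assuming an in-memory map instead of a database. The class `HashedQueryStore` and its methods are hypothetical names chosen for illustration and are not taken from the LiU EEE code; only `java.security.MessageDigest` is a real library API. The translator function stands in for whatever AQL-to-native translation an implementation uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical store: queries are identified by the SHA-1 hash of their text. */
final class HashedQueryStore {

    private final Map<String, String> originalByHash = new ConcurrentHashMap<>();
    private final Map<String, String> translatedByHash = new ConcurrentHashMap<>();

    /** Returns the lowercase hex SHA-1 digest of the UTF-8 bytes of the text. */
    static String sha1Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Stores the query once (repeats are no-ops) and returns its hash id. */
    String store(String queryText) {
        String id = sha1Hex(queryText);
        originalByHash.putIfAbsent(id, queryText);
        return id;
    }

    /** Translates a stored query at most once per id; later calls reuse the cached result. */
    String translated(String id, Function<String, String> translator) {
        String original = originalByHash.get(id);
        if (original == null) {
            throw new IllegalArgumentException("Unknown query id: " + id);
        }
        return translatedByHash.computeIfAbsent(id, k -> translator.apply(original));
    }

    /** Drops cached translations (the originals are kept for logging purposes). */
    void purgeTranslations() {
        translatedByHash.clear();
    }

    int storedCount() {
        return originalByHash.size();
    }

    public static void main(String[] args) {
        HashedQueryStore store = new HashedQueryStore();
        String id = store.store("SELECT c FROM EHR e CONTAINS COMPOSITION c");
        System.out.println(id + " -> " + store.translated(id, q -> "native form of: " + q));
    }
}
```

Note that this sketch hashes the raw text, so two queries that differ only in whitespace get different ids; the variable cleaning and ordering mentioned further down would have to run before hashing.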

(In implementations where the standard http-server-log is used as a major part of the system log the original queries need to be kept, since the content of a POST won't be logged.)

I thought the store-query-and-redirect approach was a good way to adhere to REST design patterns, and the only approach needed, but from the openEHR wiki REST discussion I gathered that some developers might see it as complicated, and thus a simpler non-storing approach was wanted. When we get to the week for discussing /query in our timetable, we could revisit this and discuss the pros and cons of the different approaches.

Those interested in Java code implementing Query storage might want to look in...
https://github.com/LiU-IMT/EEE/tree/master/src/main/java/se/liu/imt/mi/eee/db
...and its subdirectories to see an implementation for a specific database.

The variable-cleaning+ordering and SHA-hashing etc can be found in...
https://github.com/LiU-IMT/EEE/blob/master/src/main/java/se/liu/imt/mi/eee/ehr/res/Query.java
Pretty simple stuff to implement.

Regarding space:
Sending long queries via GET will store the entire query string in the standard HTTP log every time, even when the same query is repeated thousands of times. So a POST->store-once->redirect approach might be less space-consuming for many use cases. Zipping logs will of course reduce that space in the long run, but storing unique POSTed queries neatly (and optionally storing their usage statistics) also has other benefits.

*) "Debug"-marked queries were not stored, they were just translated to native Query language and the translated Query returned instead of executed towards patient data.

@wolandscat
Member

@ErikSundvall I like this general approach.
That last point about space is also a good one.

@bjornna
Contributor

bjornna commented Jun 30, 2016

We have implemented POST to behave like a GET call so that long queries work. Search servers such as Apache Solr and Elasticsearch do the same. I guess this is a pragmatic way to handle this use case.

Related to the space and long queries:
We have also implemented a stored query interface. This has the normal CRUD operations, and you may use a stored query identifier as input to the query. Such a stored query may have parameters, which should be serialized as a key/value structure.

@bjornna
Contributor

bjornna commented Jun 30, 2016

One important challenge we are facing is reusing the same query within different scopes. One way to implement this could be the parameter function, but we didn't find that flexible enough. That's why we implemented the concept of query scope. With query scope you can use the same AQL to query for the latest blood pressure of a patient while supplying the scope (episode of care, folder, etc.) as an external parameter.

I think this concept is something that should be added to the openEHR service specification.

@wolandscat
Member

@bjornna that's a nice idea; can it be extended to make the same query run for 'this patient 1234' or over a population of patients?

@bjornna
Contributor

bjornna commented Jun 30, 2016

Related to the GET/POST topic - I think the query resource is something different from an ordinary REST resource. It is by definition a read-only resource. You are not able to change the state of the system (the EHR) by running a query.
Given this, there would be no problem using both GET and POST.

This is of course not true for other resources, which are by nature CRUD-oriented.

We have two resources for working with stored queries and virtual archetype definitions. They are CRUD-based and follow the REST verb pattern. The identifiers of these resources may be used in the query resource.

@ErikSundvall
Member Author

ErikSundvall commented Jun 30, 2016

The hash mechanism combined with a (redirectable) shortcut/name/alias mechanism achieves the same thing as CRUD, except that the U (update) and D (delete) of a named query become a logical change of what the shortcut/name points to rather than a physical change or deletion of a stored query. The old query can still be inspected via its stored/hash URL, which is good for logging and auditing purposes.
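To illustrate the alias idea, here is a rough sketch in which "update" and "delete" only change what a name points to, while the hash-keyed store is never modified. The class `QueryAliases`, its methods, and the example hash strings are hypothetical and chosen for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical alias layer on top of an append-only, hash-keyed query store. */
final class QueryAliases {

    // Append-only: an entry is never changed or removed once stored.
    private final Map<String, String> queryByHash = new HashMap<>();
    // Mutable: aliases can be re-pointed or dropped at any time.
    private final Map<String, String> hashByAlias = new HashMap<>();

    void store(String hash, String queryText) {
        queryByHash.putIfAbsent(hash, queryText);
    }

    /** "Create" or "update": make the alias point to an already stored query. */
    void point(String alias, String hash) {
        if (!queryByHash.containsKey(hash)) {
            throw new IllegalArgumentException("No stored query with hash " + hash);
        }
        hashByAlias.put(alias, hash);
    }

    /** "Delete": drop the alias only; the stored query stays reachable by hash. */
    void unpoint(String alias) {
        hashByAlias.remove(alias);
    }

    Optional<String> resolveAlias(String alias) {
        return Optional.ofNullable(hashByAlias.get(alias)).map(queryByHash::get);
    }

    Optional<String> resolveHash(String hash) {
        return Optional.ofNullable(queryByHash.get(hash));
    }

    public static void main(String[] args) {
        QueryAliases aliases = new QueryAliases();
        aliases.store("hash-v1", "first version of the query");
        aliases.store("hash-v2", "second version of the query");
        aliases.point("latest-bp", "hash-v1");
        aliases.point("latest-bp", "hash-v2"); // logical update
        System.out.println(aliases.resolveAlias("latest-bp").orElse("(none)"));
        System.out.println(aliases.resolveHash("hash-v1").orElse("(none)"));
    }
}
```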

Parameterized stored queries can be fed with parameters via the GET call to the hashed (or named/aliased) URL of the stored query. That is what we did in the above-mentioned Java implementation; see the excerpt from
https://github.com/LiU-IMT/EEE/blob/master/src/main/java/se/liu/imt/mi/eee/ehr/res/Query.java below

        // Create string (currently compact JSON) with keys and values sorted alphabetically; allows multiple values with the same name
        Iterator<String> it = postedQueryAsForm.getNames().iterator();

        while (it.hasNext()) {
            String name = it.next();
            String[] valueArray = postedQueryAsForm.getValuesArray(name);
            // TODO: Separate static & dynamic variables
            if (name.startsWith("_")) {
                // POSTed variable names starting with _ (underscore) are not stored;
                // they are sent on as dynamic parameters in the URI after removing the leading underscore
                for (int i = 0; i < valueArray.length; i++) {
                    uriGetQueryAsForm.add(name.substring(1), valueArray[i]);
                    // System.out.println("Querysource.handleFormPost() adding entry: "+name.substring(1) +" = "+ valueArray[i]);
                }
            } else {
                // All other posted (static) variables get stored in the query map
                if (valueArray.length != 1) {
                    throw new ResourceException(Status.CLIENT_ERROR_BAD_REQUEST, "Keys for static (stored) query variables need to be unique (duplicates are not allowed). Dynamic variables prepended with _ (underscore) will instead be passed on as a URI query and do not need to be unique.");
                }
                sortedMap.put(name, valueArray[0]);
                cleanedStaticForm.add(name, valueArray[0]);
            }
        } // end of while loop (the excerpt stops here)

I don't know whether that is the best approach (or whether something other than _ should be used as the prefix). For a developer or user writing ad-hoc queries, e.g. in a big form field, the result is the same whether or not a variable is prefixed with an underscore. A developer who understands the hashing/storing and knows they will want some dynamic parameters can prefix those variables with underscores, and in later repeated calls use the hashed GET URL with dynamic parameters directly, bypassing the POST/store step.
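For readers without the Restlet classes at hand, the underscore convention can be sketched in plain Java. This is a simplified illustration, not the EEE implementation: the class `PostedQuerySplitter`, the `/query/{id}` path, and the parameter names in `main` are assumptions, and `URLEncoder.encode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical splitter: static variables are stored, underscore-prefixed ones become URI parameters. */
final class PostedQuerySplitter {

    /** Static variables in alphabetical key order, ready to be serialized and hashed. */
    final TreeMap<String, String> staticVars = new TreeMap<>();
    /** Dynamic variables as encoded name=value pairs, with the leading underscore removed. */
    final List<String> dynamicPairs = new ArrayList<>();

    PostedQuerySplitter(Map<String, List<String>> postedForm) {
        // Iterate in sorted key order so that the outcome is deterministic.
        for (Map.Entry<String, List<String>> entry : new TreeMap<>(postedForm).entrySet()) {
            String name = entry.getKey();
            List<String> values = entry.getValue();
            if (name.startsWith("_")) {
                for (String value : values) {
                    dynamicPairs.add(encode(name.substring(1)) + "=" + encode(value));
                }
            } else {
                if (values.size() != 1) {
                    throw new IllegalArgumentException(
                            "Static variable '" + name + "' must have exactly one value");
                }
                staticVars.put(name, values.get(0));
            }
        }
    }

    /** Builds the URI that the client would be redirected to after the POST. */
    String redirectUri(String queryId) {
        String base = "/query/" + queryId; // hypothetical path layout
        return dynamicPairs.isEmpty() ? base : base + "?" + String.join("&", dynamicPairs);
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, List<String>> form = new HashMap<>();
        form.put("aql", Arrays.asList("SELECT c FROM EHR e CONTAINS COMPOSITION c"));
        form.put("_ehr_id", Arrays.asList("1234"));
        PostedQuerySplitter split = new PostedQuerySplitter(form);
        System.out.println(split.staticVars.keySet());
        System.out.println(split.redirectUri("abc123"));
    }
}
```

Later calls can then go straight to the returned URI with different dynamic values, without POSTing the query text again.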

Please have a look at the section called "Benefits from storing and redirecting POSTed queries"
in http://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/1472-6947-13-57

(While in that paper you can also search for "bookmark" to read about a "bookmark" service in LiU EEE, that could use autogenerated (or manually assigned) names. It would work for many query-use-cases too, but I guess some developers might find that general bookmark-approach overkill or complicated if the thing wanted is only named queries. Also, it might hide "GET"-parameters for named queries unless we specify that any query-parameters added to the bookmark URL should be passed along to the target, in this case an AQL query.)

The "scope" concept that you write about, @bjornna, looks interesting. Do you have more information and a list of the parameters/restrictions that you have found useful?

@bjornna
Contributor

bjornna commented Jun 30, 2016

@erik - you have a working server at http://arenaehr.dips.no:9000/api-doc to test some queries

The API doc gives the following template for the request payload. There are several parameters:

  • You may restrict the query to a set of Compositions, a set of EHRs, or a set of tags. Tags are key/value structures on a Composition and can hold anything; what we use most is period of care and episode of care.
  • You may also partition the query by a tag. This makes it convenient to get the latest temperature for the last episode of a given patient or, more commonly, the latest temperature for all patients currently at the hospital (a set of episodes).

    {
      "aql": "string",
      "compositionUids": [
        "string"
      ],
      "ehrIds": [
        "string"
      ],
      "tagScope": {
        "tags": [
          {
            "values": [
              "string"
            ],
            "tag": "string"
          }
        ]
      },
      "partitionBy": {
        "tag": "string",
        "limit": 0
      }
    }

@bjornna
Contributor

bjornna commented Jun 30, 2016

@thomas - I guess the partition question was answered by the comment above. Let me know if I missed something.

@wolandscat
Member

Re-reading everything above, it seems to me we should (largely following Erik):

  • avoid filling up the http-log with repeats of long queries
  • use POST to register a query and execute via a GET that supplies params as needed
  • use hashes created from the query templates to id the stored queries

(The current API POST method says 'Execute an AQL query', but this seems wrong to me.)

In this scheme there is no 'one-shot' query execution approach, unless we provide such via another GET, which appears to be what is in the current API. If there is to be one-shot query execute + return results, what are the semantics? Do these queries get registered as well? Are parameters treated separately in the same manner for queries registered via POST and executed via GET?

@wolandscat
Member

Another idea, following from the hashing concept described by Erik: it would seem obvious to support some kind of 'query set', that would typically be used to populate a whole form. Query sets could be identified by hashes and stored in the same way as single queries. Doing a GET on a query set would get a table of QueryResults, keyed by individual query hashes.
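One possible reading of the query-set idea, sketched under assumptions: the set id is the SHA-1 hash of the sorted member hashes joined by newlines, and executing the set returns a map keyed by the individual query hashes. The class `QuerySets`, the way the set id is derived, and the executor function are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical registry of query sets, each identified by a hash of its member hashes. */
final class QuerySets {

    private final Map<String, TreeSet<String>> membersBySetId = new HashMap<>();

    static String sha1Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Registers a set of query hashes; order and duplicates do not affect the set id. */
    String registerSet(List<String> queryHashes) {
        TreeSet<String> sorted = new TreeSet<>(queryHashes);
        String setId = sha1Hex(String.join("\n", sorted));
        membersBySetId.putIfAbsent(setId, sorted);
        return setId;
    }

    /** Runs every member query through the executor; results are keyed by query hash. */
    <R> Map<String, R> execute(String setId, Function<String, R> executor) {
        TreeSet<String> members = membersBySetId.get(setId);
        if (members == null) {
            throw new IllegalArgumentException("Unknown query set: " + setId);
        }
        Map<String, R> results = new LinkedHashMap<>();
        for (String queryHash : members) {
            results.put(queryHash, executor.apply(queryHash));
        }
        return results;
    }

    public static void main(String[] args) {
        QuerySets sets = new QuerySets();
        String setId = sets.registerSet(Arrays.asList("hash-of-bp-query", "hash-of-temp-query"));
        // The executor here is a stand-in; a real one would run the stored query.
        System.out.println(sets.execute(setId, h -> "result of " + h));
    }
}
```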

Crazy idea?

@bjornna
Contributor

bjornna commented Aug 19, 2017

No - not crazy at all. I think we have implemented almost this feature with VAQM. Clients may send several queries in a batch. A use case is, e.g., a ward list with a query for each distinct column.

In the query result we provide a correlation identifier to match the results.

Let's look into this when we discuss the Query endpoint.

@ppazos
Contributor

ppazos commented Aug 19, 2017 via email

@bostjanl bostjanl added this to the 1.1.0 milestone Nov 5, 2018