WIP: n-dimensional support #20

JonjonHays · 2019-10-15T07:32:29Z

This PR is in-progress, however it could potentially be merged if necessary/desired.

Changes in this PR

N-Dimensional Support for the Back-end Server Extension

Replaced the row and col query parameters with a single select parameter
- The select param is a string and uses the same syntax and semantics as h5py, and h5serv. E.g, ?select=[0:10, 5:10, :, 7]. select determines an n-dimensional hyperslab selection of a dataset.
select is always optional. If it is not specified for calls to the data endpoint, the entire dataset will be returned. If it is not specified for calls to the contents endpoint, the data property of the returned dsetContentDict will be None. It might make sense to change this last rule so there is an easy way to have the entire dataset included in the dsetContentDict, rather than having to construct the query string. Maybe select=all, or a boolean query param on the contents endpoint that indicates we want actual data included in the response.
The api.yaml file has not been updated to reflect these changes yet. I imported the current file in Swagger's online editor, but I was getting a lot of compiling errors. Are these errors expected? Is the Swagger online editor a good way to go about this? (this is my first experience working with an OpenApi specification)

Front-end Compatibility for the N-Dimensional Back-end Changes

Minimal changes to the front end to restore working state after making changes to the back-end
These changes don't actually implement n-dimensional support to the front end, but they do (as far as I know) restore existing functionality (i.e, displaying 2d data and 2d slicing via the Slice UI field)

Next Steps

N-Dimensional Support for the Front-end JupyterLab Extension

I haven't gone too deep into the front end just yet, so I'm mostly including this section as a placeholder. Please feel free to edit with a more comprehensive roadmap.

Currently the extension is requesting extraneous data on higher dimensions, and it’s causing some hefty lag. This will be resolved if we implement a default slice, as @telamonian suggested, that selects the full length of the first two dimensions, and the 0th index of all higher dimensions.
- It might make sense to disallow slices that request extraneous data. E.g, [:, :, :, 0, 0, etc].
Tentative: Modify/extend the ISlice interface to handle the construction of the select query string.
...

…anges

…into merger

JonjonHays · 2019-11-02T23:59:58Z

A bit of a followup: it turns out my time is needed elsewhere on our project since we need to get to pilot by December, so I unfortunately won't have time to implement n-d support for the frontend plugin. Of course, feel free to request changes on this PR, and ping me with any questions on my changes. I plan to write some unit tests for the server extension at some point, and I will push those after I get them in place. Cheers!

…e hyperslabSlices list instead of a slice object

…re_nan flag when serializing the payload

telamonian

Thanks for the PR! It's great that we were both thinking along the same lines in terms of implementing nD support. There's a lot of cool stuff that I think this extension will be able to do, but it's going to take a lot of work, and I would be very happy to have you involved as a member of the jupyterlab-hdf5 community.

Overall I think that the thrust of this PR is good. My one big problem with it is how throwing all of the slice info into a single string in the API makes it so that a complex parser is needed in the backend.

Partly, I think the problem here is that we're trying to shove a list-of-slice into a uri query parameter. It's not a great fit. I've now come up with a simpler API; basically, the list-of-slice will instead be passed around as a JSON object within the body of a POST request. The new API also now has some interactive docs.

telamonian · 2019-12-05T11:37:24Z

jupyterlab_hdf/baseHandler.py

        try:
-            self.finish(json.dumps(self.manager.get(path, uri, row, col)))
-
+            self.finish(simplejson.dumps(self.manager.get(path, uri, select), ignore_nan=True))


The NaN handling issue was a good catch.

Ultimately I want to try the plan that @kgryte sketched out in #22, but that can be a future PR

I trust that the reviver approach is probably the best option for handling this on the front end, but is there a reason to avoid handling it (or an option to) on the server?

telamonian · 2019-12-05T12:18:44Z

jupyterlab_hdf/util.py

+                msg = "Bad Request: invalid selection parameter (can't convert to int) for dimension: " + str(dim)
+                raise HTTPError(400, reason=msg)
+
+        if start < 0 or start > extent:


Here's a good example of why you generally shouldn't write your own parsing function. If we're claiming to support Numpy-like syntax, then both negative indices and indices out of bounds should be treated as valid in slices.

telamonian · 2019-12-05T12:32:57Z

jupyterlab_hdf/util.py

@@ -55,3 +65,88 @@ def uriJoin(*parts):

 def uriName(uri):
    return uri.split('/')[-1]
+
+
+def _getHyperslabSlices(dsetshape, select):


One of my big design goals in how I put the frontend, API, and backend together was to avoid the need for any complex parsing function. They're never worth the trouble, there's almost always an edge case or three that you miss, and maintenance is a pain.

Instead, what I'd really like to do is make sure that the frontend is structured such that the slices it asks for don't require anything other than some simple splitting, if even that. I was even thinking that it might be good to split up the slice input box into n segments given a nD dataset (or it might be really annoying; I need to try that one out). The job of the API and backend is then to not lose the information that gets handed to them (by, eg concating all the info on all the slices into a single string).

If you do ever really want a parsing function like this, I'd strongly recommend against rolling your own. Instead, you should search for prior art, or even better, come up with something that works off of ast.literal_eval (which is safe, so long as you can tolerate the possibility of a user causing a segfault at will).

Agreed. I think sending a list of slices in a post body as you've proposed is a cleaner approach. The idea for the select query param came directly from the HDF group's API, so it seemed like a good idea at the time. The complexity of the parser did get out of hand pretty quick, and as you mentioned is error prone.

telamonian · 2019-12-05T12:34:50Z

src/hdf.ts

-  // require the uri, row, and col query parameters
-  const { fpath, uri, row, col } = parameters;
+  // require the uri query parameter, select is optional
+  const { fpath, uri, ...select } = parameters;


The ... doesn't really mean optional in typescript. Check out https://basarat.gitbooks.io/typescript/docs/spread-operator.html

Oops. I'm familiar with spread syntax -- not sure why I have it there!

telamonian · 2019-12-05T12:42:41Z

jupyterlab_hdf/util.py

+
+    if not select.startswith('['):
+        msg = "Bad Request: selection query missing start bracket"
+        raise HTTPError(400, reason=msg)


It's good practice/encapsulation for a function like this to raise semantic errors (in this case, probably a ValueError). It would then be the responsibility of eg HdfBaseManager (the thing that's actually interacting directly with the HTTP requests) to catch said semantic errors and raise HTTPError as appropriate

Yes that makes sense, good call!

JonjonHays · 2019-12-06T09:10:24Z

Thanks for the very thoughtful feedback @telamonian! I'll look into this more and give a proper reply when I have a bit more time, cheers!

JonjonHays · 2019-12-08T02:39:28Z

Let me know if I should make any fixes to the existing code added in this PR. I'm thinking not since we will be ditching the parser and select param.

telamonian · 2020-10-12T17:12:15Z

If you do ever really want a parsing function like this, I'd strongly recommend against rolling your own. Instead, you should search for prior art, or even better, come up with something that works off of ast.literal_eval (which is safe, so long as you can tolerate the possibility of a user causing a segfault at will).

So I finally went and wrote a proper AST parser for arbitrary numpy-style index expressions: Quansight-Labs/ndindex#91. Turns out that parsing a formal grammar is actually pretty straightforward, and I was able to do most of what I needed just by copying the implementation of ast.parse_literal and then making some tweaks. Best of all, since this will be a contribution to the ongoing ndindex project there will be a number of very component folks helping me to validate the basic functionality of literal_eval_index.

@JonjonHays So what I'm going to do is cherry pick a couple of the commits from this PR (in particular the json fixes, nice work btw), and then rewrite a bunch of stuff to take full advantage of literal_eval_index

JonjonHays · 2020-10-14T21:37:47Z

So I finally went and wrote a proper AST parser for arbitrary numpy-style index expressions: Quansight/ndindex#91. Turns out that parsing a formal grammar is actually pretty straightforward, and I was able to do most of what I needed just by copying the implementation of ast.parse_literal and then making some tweaks. Best of all, since this will be a contribution to the ongoing ndindex project there will be a number of very component folks helping me to validate the basic functionality of literal_eval_index.

@JonjonHays So what I'm going to do is cherry pick a couple of the commits from this PR (in particular the json fixes, nice work btw), and then rewrite a bunch of stuff to take full advantage of literal_eval_index

Nice @telamonian! Great to see this work out and continue moving along. Hope all is well -- cheers Max!

telamonian · 2020-11-20T02:59:09Z

The new PR over at #41 has now implemented fully working n-dimensional support in both frontend and backend, so I'll close this PR in favor of that one

JonjonHays added 4 commits October 14, 2019 17:23

WIP jupyterlab#4: Adds n-dimensional support to backend server extension

759fa3d

WIP jupyterlab#4: Frontend compatibility for n-dimensional backend ch…

013e9f4

…anges

Merge branch 'master' of https://github.com/jupyterlab/jupyterlab-hdf5 …

24ef479

…into merger

Updated select param documentation

c6a8670

JonjonHays added 2 commits November 4, 2019 18:05

Bug fix: for single index slice parameters, use just an integer in th…

2fcf0d3

…e hyperslabSlices list instead of a slice object

Produce valid JSON in the presence of NaN or inf values by using igno…

36dd05c

…re_nan flag when serializing the payload

telamonian reviewed Dec 5, 2019

View reviewed changes

telamonian mentioned this pull request Oct 12, 2020

API YaML needs update #39

Closed

telamonian closed this Nov 20, 2020

telamonian added the type:enhancement label Dec 15, 2020

telamonian added this to the 0.5.0 milestone Dec 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: n-dimensional support #20

WIP: n-dimensional support #20

JonjonHays commented Oct 15, 2019

JonjonHays commented Nov 2, 2019

telamonian left a comment •

edited

Loading

telamonian Dec 5, 2019

JonjonHays Dec 8, 2019

telamonian Dec 5, 2019

telamonian Dec 5, 2019

JonjonHays Dec 8, 2019

telamonian Dec 5, 2019

JonjonHays Dec 8, 2019

telamonian Dec 5, 2019

JonjonHays Dec 8, 2019

JonjonHays commented Dec 6, 2019

JonjonHays commented Dec 8, 2019 •

edited

Loading

telamonian commented Oct 12, 2020

JonjonHays commented Oct 14, 2020

telamonian commented Nov 20, 2020

WIP: n-dimensional support #20

WIP: n-dimensional support #20

Conversation

JonjonHays commented Oct 15, 2019

Changes in this PR

N-Dimensional Support for the Back-end Server Extension

Front-end Compatibility for the N-Dimensional Back-end Changes

Next Steps

N-Dimensional Support for the Front-end JupyterLab Extension

JonjonHays commented Nov 2, 2019

telamonian left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JonjonHays commented Dec 6, 2019

JonjonHays commented Dec 8, 2019 • edited Loading

telamonian commented Oct 12, 2020

JonjonHays commented Oct 14, 2020

telamonian commented Nov 20, 2020

telamonian left a comment •

edited

Loading

JonjonHays commented Dec 8, 2019 •

edited

Loading