example #2
Conversation
cc #1
Wow that was fast! 😲 Amazing work Martin!
I don't see much of a way around this.
Agreed: the […]. There is no automatic chunking of single files in […]. In my original siphon-to-xarray example, I loaded many individual OpenDAP data files into a single xarray dataset via […].
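(For readers unfamiliar with that pattern, here is a minimal sketch of loading several OpenDAP endpoints into one dataset with xarray's open_mfdataset; the URLs are hypothetical and this is not necessarily the exact call the original example used.)

```python
import xarray as xr

# Hypothetical OpenDAP endpoints exposed by a THREDDS server
urls = [
    "https://thredds.example.org/thredds/dodsC/model/run_2019011300.nc",
    "https://thredds.example.org/thredds/dodsC/model/run_2019011312.nc",
]

# open_mfdataset opens each remote file lazily, concatenates them along
# their shared dimension, and returns a dask-backed dataset
ds = xr.open_mfdataset(urls, chunks={"time": 10})
```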
I think that's fine.
👏
Agreed, thredds is probably more generic and recognizable.
This is great! I agree with @rabernat that calling this intake-thredds might be clearer. Also, I will second @rabernat's question about the to_dask() method. I believe that some of @martindurant's thoughts on this can be found here: intake/intake-xarray#26.
This is cool! Going to check out your code when I get a chance.
However, I'm not sure about use in practice: my experience when hitting OpenDAP servers with open_mfdataset and subsetting with multiple dask workers is that the THREDDS server dies or starts to time out. Usually I end up making a mirror(!)
Hoping to do some comparisons between NetCDF on THREDDS, MinIO S3 and Amazon S3 via FUSE in the coming months.
Wondering about any more recent results from https://github.com/pangeo-data/storage-benchmarks
I have had the opposite experience. Well-configured TDS servers seem to be able to handle many simultaneous requests. I have some experiments in this binder:
This project was abandoned by the intern who was working on it and is not going anywhere. However, I have done some of my own benchmarking. Here is throughput from an ESGF THREDDS server running in Google Cloud, accessed in parallel via xarray / dask [throughput plot]. Here is the same access pattern using zarr reading directly from cloud storage [throughput plot]. You can see that the direct approach gets orders of magnitude higher throughput. More here: https://speakerdeck.com/rabernat/cloud-native-climate-data-with-zarr-and-xarray
Is there a way that I could have known (attached metadata or something) that this was the final level and that the entries below it formed a coordinate grid?
With intake/intake#229, the following syntax does work:
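(Purely as a hypothetical illustration of the kind of chained catalogue access being discussed, with invented entry names; the exact syntax enabled by intake/intake#229 is an assumption here, not a quote from this PR.)

```python
import intake

# Hypothetical top-level catalogue; the entry names below are invented
cat = intake.open_catalog("catalog.yaml")

# Walking nested catalogues one entry at a time...
source = cat["Forecast Model Data"]["GFS Quarter Degree"]["Best"]

# ...or, assuming multi-key item access is supported, in a single step
source = cat["Forecast Model Data", "GFS Quarter Degree", "Best"]

ds = source.to_dask()  # lazily open the selected source
```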
This looks really cool! I'd echo the sentiment about […]. In siphon, I think we're looking at adding some syntax for walking through the catalog more simply (Unidata/siphon#263). Not sure if any similar ideas are applicable to the intake world.
@dopplershift, those ideas for walking the catalog will work as-is with the code here, except that "cat" is an Intake catalogue (but every instance has the siphon cat as an attribute too). Since the names are not valid Python identifiers, you cannot use tab-completion, but if they were you could.
@martindurant IPython has hooks that also enable completion of dictionary keys within [].
@dopplershift, no, I was not saying that. Do you know how IPython fetches the set of potential completions?
Looks like you can define _ipython_key_completions_.
^ Done in master
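(For reference, a minimal sketch of that IPython hook on a hypothetical container class; the names are invented and this is not the code that was merged to master.)

```python
class EntryContainer:
    """Hypothetical stand-in for a catalogue holding named entries."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def __getitem__(self, key):
        return self._entries[key]

    def _ipython_key_completions_(self):
        # IPython calls this to offer completions when you type cat["<TAB>
        return list(self._entries)


cat = EntryContainer({"Forecast Model Data": object(), "Radar Data": object()})
# In an IPython session, typing cat["For<TAB> now completes the full key.
```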
@andersy005, can we change the name of this repo to intake-thredds? I'll change the name of the Python package in this PR, and I think we should merge this close to what is here, so that people can try it, and we can iterate on ideas like @rabernat's about merging several like sources using xarray.
@martindurant, this is done.
Thanks @andersy005. I have globally renamed things within the repo, including places where it probably doesn't matter. What do you think remains to be done in this PR? I'd be keen to merge sooner rather than later, so that we can get something up and usable for experimentation.
This looks good, @martindurant! Are you planning on adding tests to this PR? If not, that's also fine. We can merge this and iterate on tests in future PRs.
I do not know where to go for tests; it doesn't seem like a good idea to test against live THREDDS servers, which may change or go down without notice.
Sounds good. I am going to merge this. By the way, I added you as an Admin to the repo.
For siphon, we’ve used vcrpy as a way to record web requests to e.g. THREDDS servers and play them back for testing purposes.
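(For context, a minimal sketch of the record/replay pattern vcrpy provides; the cassette path, test name, and URL are hypothetical, not siphon's actual test setup.)

```python
import requests
import vcr

# The first run records the HTTP exchange into the cassette file;
# subsequent runs replay it without touching the network.
@vcr.use_cassette("tests/cassettes/thredds_catalog.yaml")
def test_fetch_catalog():
    resp = requests.get("https://thredds.example.edu/thredds/catalog.xml")
    assert resp.status_code == 200
```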
I have used vcrpy for gcsfs, and find it an immense pain to work with!
An alternative may be to use pydap to start up a lightweight OpenDAP server. Pydap also serves THREDDS metadata. Since it is pure Python, it can be launched in a range of ways that are compatible with testing environments.
Thanks @rabernat, that sounds like the best option then.
@martindurant It’s interesting that you feel that way. In the scheme of things that cause me pain in testing and maintaining a CI system, vcrpy doesn’t crack my top 20. Would love to know more (but don’t want to belabor the point). As far as pydap is concerned, you’re now introducing a third-party package here to stand in as a mock for TDS, one whose goal isn’t to be a TDS, just to serve THREDDS-compatible catalogs. I feel like this has the potential to be shaking out issues in pydap rather than testing intake-thredds. Just $0.02 from someone who’s not signing up for more work. 😉
!! It may well be that my vcr setup for gcsfs is not as intended, but I first came across it in the context of adlfs (azure-datalake-storage), and the collaborators there claimed to know vcr well, and I mostly copied their prescription. Perhaps imperfectly.
It’s entirely possible our use of vcrpy is too simplistic to encounter the pain points.
Works:
Notes:
The […]() syntax is an artefact of having reference names that aren't valid Python identifiers. It is certainly possible to get rid of the parentheses, and maybe to include the chain as [.., ..].
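(Purely as a hypothetical sketch of the navigation pattern described under "Works:", with assumed class, URL, and entry names rather than the actual code in this PR.)

```python
import intake_thredds  # the package renamed in this PR

# Hypothetical THREDDS catalogue URL; class name is an assumption
cat = intake_thredds.ThreddsCatalog(
    "https://thredds.example.edu/thredds/catalog.xml"
)

# Names that aren't valid Python identifiers are reached by item access;
# the trailing () instantiates each nested sub-catalogue (the artefact
# mentioned in the notes above)
sub = cat["Forecast Model Data"]()["GFS Quarter Degree"]()

# The proposal is to drop the parentheses and perhaps chain the keys,
# e.g. cat["Forecast Model Data", "GFS Quarter Degree"]
ds = sub["Best"]().to_dask()  # lazily load the selected source with dask
```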