Allow browse files #61
@afitzgerrell here is the list of questions that currently block this issue: What are all the allowed browse options (file types)?
What are all the allowed browse options (file types)?
Based on the MimeTypeEnum listing in Earthdata's UMM-G schema, the allowed options are “image/jpeg”, “image/png”, “image/gif”, “image/tiff”, and “image/bmp” (although the only ones I'm familiar with seeing for browse images are .jpg, .png, and .tif). Keep in mind, though, that any of these are also valid types for data or ancillary files, either standalone for a single-file granule + browse, or within a multi-file granule + browse (as noted in the CNM documentation under the Product sub-fields / files description and in the UMM-G documentation at DataGranule, Identifiers section).

Does browse file size need to go into UMM-G or is browse file size excluded from size summary in UMM-G?
Browse are discrete files that should not be considered part of a granule's contents, and thus should not be included in the UMM-G SizeInBytes value.

Are multiple browse per granule an option that we need to allow?
Yes, this would be ideal. Generally, if data producers generate browse images, it has typically been one browse per granule. There are, however, data sets where multiple browse images are included for one granule; an example is NISE, which has two browse, one for each hemisphere.
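As a rough illustration of the allowed browse types listed above, the following sketch maps file extensions to those five MIME types. The extension-to-type mapping and the function name `browse_mime_type` are assumptions made for this example; they are not taken from metgenc or from the UMM-G schema itself.

```python
from pathlib import Path

# MIME types named above as allowed by UMM-G's MimeTypeEnum for browse.
# The file extensions used as keys are an assumption for illustration.
BROWSE_MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".bmp": "image/bmp",
}


def browse_mime_type(filename):
    """Return the MIME type for an allowed browse extension, else None."""
    return BROWSE_MIME_TYPES.get(Path(filename).suffix.lower())
```

Note that an extension check alone cannot tell a browse image from a data or ancillary file of the same type, which is the caveat raised above.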
Somehow the operator needs to indicate which files are the browse files or where the browse files are (either a separate location for browse or a consistent file name ending). -> @afitzgerrell will discuss with Ops to find out their preference.
A browse file cannot at the same time be a science file that is part of the granule.
Browse files will be listed in the CNM message.
Browse files will be listed in the UMM-G as URLs but will not count toward the file size.
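The rules above could be sketched roughly as follows. This is a simplified illustration, not metgenc's implementation: the function name `summarize_granule`, its inputs, and the reduced `name`/`type`/`size` entries are assumptions, and a real CNM message carries more fields than shown here.

```python
def summarize_granule(science_files, browse_files):
    """Build a simplified file listing and a granule size that excludes browse.

    Both arguments map file name -> size in bytes (hypothetical inputs).
    """
    # A browse file cannot also be a science file in the same granule.
    overlap = set(science_files) & set(browse_files)
    if overlap:
        raise ValueError(f"files listed as both science and browse: {sorted(overlap)}")

    # Every file is listed, with browse files marked by their type.
    listing = [{"name": n, "type": "data", "size": s} for n, s in science_files.items()]
    listing += [{"name": n, "type": "browse", "size": s} for n, s in browse_files.items()]

    # Only science files count toward the granule's size.
    size_in_bytes = sum(science_files.values())
    return listing, size_in_bytes
```

For example, a granule with one 100-byte science file and one 20-byte browse image would list both files but report a size of 100 bytes.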
I queried OPS this morning and the consensus is: they'd prefer we (ideally) encourage data producers to include "brws" in the file names for browse images (rather than not altering the names and requiring the browse images to be sequestered in a different directory from the data files). In a pinch, were we to end up with data sans this keyword and a data producer unable or unwilling to rename files, OPS could add it in. |
@afitzgerrell double-checking the plan for browse files: our NSIDC_0081DUCk sample data currently is presented with browse files in a separate directory, but the words in the comments above suggest we will have browse files intermingled with data files, with something like |
@juliacollins I like your suggestion of making the .ini be told what text to recognize as being used to identify browse files!! Seems it'd be a helpful safeguard and allow for flexibility in an inconsistent world of data producer deliveries. Also, if it's not a heavy lift, allowing for the ini to be pointed to a separate browse image directory likely wouldn't hurt as long as it'd be acceptable to allow an ops person to steer the .ini to just look to the /data directory if desired. When I asked ops for their preference on keeping browse in a separate directory from the data vs. adding "_brws" to browse file names instead and letting everything mingle, the latter was the overwhelming choice. BUT just in case there's some crazy scenario to account for, having an option to point the .ini file to look to a separate browse dir seems potentially handy. If it's not super easy though, I'd wax it!! |
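A minimal sketch of the idea discussed above: a configurable pattern identifies browse files by name, with an optional separate browse directory as a fallback. The function names and the `browse_dir` parameter are hypothetical and only illustrate the approach; they are not metgenc's actual code or configuration keys.

```python
import re
from pathlib import Path


def split_browse(filenames, browse_regex="_brws"):
    """Split file names into (science, browse) using a configurable pattern."""
    pattern = re.compile(browse_regex)
    browse = [f for f in filenames if pattern.search(f)]
    science = [f for f in filenames if not pattern.search(f)]
    return science, browse


def find_browse_files(data_dir, browse_regex="_brws", browse_dir=None):
    """Look for browse files in browse_dir if given, otherwise in data_dir.

    browse_dir is a hypothetical option for the separate-directory case.
    """
    search_dir = Path(browse_dir) if browse_dir else Path(data_dir)
    names = sorted(p.name for p in search_dir.iterdir() if p.is_file())
    _, browse = split_browse(names, browse_regex)
    return [search_dir / name for name in browse]
```

Keeping the pattern configurable means an operator could adapt to a producer who uses a different keyword without renaming files.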
changing story points from 3 to 2 in this sprint |
@afitzgerrell if you want to do any reviewing of the README before the changes are merged, please take a look at https://github.com/nsidc/granule-metgen/tree/issue-61?tab=readme-ov-file and confirm I didn't take any inappropriate editing liberties. |
@afitzgerrell you can ignore my previous comment because...the branch is merged! Version 1.3.0 is now available on PyPI -- and as a bonus PyPI is displaying the latest README for your reviewing pleasure (along with GitHub as a reviewing option, of course).
@juliacollins thanks for having tried to give me a very nice heads-up and opportunity to peruse the readme before merging/publishing...regrettably, it was one of the emails i didn't discover i was missing until today. i'll test new pypi and peruse the readme then. |
Development done, only testing open. Change from 2SP to 0SP. |
I fear I'm doing something incorrectly, but who knows, so here's the lowdown: When I do a dry run with an .ini file specifying I want to include browse images, metgenc runs and claims success. No browse images are accounted for in the cnm files generated though. I don't know if perhaps I'm supposed to be doing more than accepting regex to be: "browse_regex = _brws", and that's where the trouble is stemming from(?). My .ini file is attached masquerading as a .txt file here.
Acceptance criteria:
Browse and data file names example:
Update: After chatting with Kevin, who suggested that I rename the browse files, I was able to run metgenc process and see the cnm files updated to contain the browse "type" added for each file's three associated browse images.
What I've stumbled into now, though, is that the metgenc validate command fails when I run it for cnm files, although it succeeds on ummg files. It's not just failing when I want to validate the new, browse-containing cnm files; it also fails on browse-free cnm files.
while the following is what's shown on screen in terminal:
I'm running metgenc A side note: |
@afitzgerrell Looks like your Here's the regex I've been using for my 0081 testing:
@afitzgerrell Could you verify all of the cnm files in the directory being validated actually contain typical "CNM" stuff? Based on the output you included, I'm guessing the file |
@afitzgerrell here's my attempt at a summary of the current implementation:
Example of the last point (no
In summary: unless you tell The question I have now is: Where do we document this business logic? |
@juliacollins re. your question about the cnm file quality/content: I tested running validate with three cnm files to be validated (those records below with 11:46:32 timestamp), and then moving two out, to test just one cnm file. When just one cnm file is in the output/cnm directory (11:46:43 timestamp), the errors are still thrown on-screen, but the log doesn't record anything beyond validate being kicked off:
I'm now mulling over your next comment. Thank you for explaining the intent of the regex(s) in the config file. That confirms my suspicion of failure to understand on my part!! As to where this greater explanation should go, I'm thinking a couple of actual examples added to the README.md would at least indicate "you need to engage brain power here (and not think the placeholder is a panacea)". If it turns out later that more elaborate 'how to regex: for metgenc' information is needed, I can add a more elaborate explanation/examples to the MetGenC Ancillary Resources Confluence page and pop a "for greater detail and more examples" link to it in the README.md. |
@afitzgerrell I'm unable to reproduce the error(s) you're seeing with my local |
thanks @juliacollins...my suspicion is that i've somehow messed up my working directory and that i'd do best by making a new venv and reinstalling metgenc. BUT, i'd enjoy giving you a laugh and providing a sanity check for myself, sooo whenever you have a second, or many seconds, attached you'll find my ini file and three cnm files that presently cause metgenc validate -c ./init/0081duck.ini -t cnm much displeasure: NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk.nc.cnm.json NSIDC0081_SEAICE_PS_S25km_20211103_v2.0_DUCk.nc.cnm.json
@afitzgerrell we are forced to accept mystery and uncertainty. I can kick off a validation of all CNM files with
Something is happening (or has happened) in your local workspace, but I have no idea what! |
@juliacollins thanks for this. it's good to know it's me, not metgenc 🎉. i'll get a new working directory established and move along!! |
oh, one interesting(?) thing: i just cleaned up and reinstalled metgenc locally...it still fails to validate cnm files. but i'd forgotten about the check-jsonschema option. i ran that, and it also gripes about validation, but is a little more to-the-point (to me, certainly!) about what its gripe is:
the path i used to the schemafile is copied from your example above because i forgot where cumulus_sns_schema.json lived, not really paying attention that i don't have a
this is all a prelude to my question: does EDIT: dawned on me i can answer my own last questions by copying the cumulus_sns_schema.json to a recreated |
@afitzgerrell Thanks to your feedback and our standup conversation I am realizing that I didn't try running |
I set up a virtual environment, activated it, and did a I'm now stumped as to the next step we should take to untangle this mystery! |
@juliacollins ok. arrg. welp, that makes two of us 🙃. therefore, i'll shut down and power my laptop back up! probably won't help, but it can't hurt. |
@juliacollins Ignoring other insanity of my working dir/environment, I CAN confirm that all of the acceptance criteria have been met when I test this end to end using a correct granule_regex string 🎉
🎉 |
Allow each granule to have a browse file [likely only CNM message needs to change for this issue]
Details:
Data set: use a netcdf, NSIDC-0081DUCk is a good option, in the end browse should work for all science file types
browse: allow all possible options (Research the options that are allowed for browse) -> see below
Size summary might need to go into UMM-G? (Amy? -this issue is blocked till Amy returns ) -> see below
Are multiple browse per granule possible? -> see below
Acceptance criteria:
All browse files are found and end up in the CNM message
UMM-G: confirm that granule size is correctly calculated (confirm if this is with or without browse)
Browse files need to be staged in correct location (same S3 bucket as data and metadata)