Allow browse files #61

Closed
lisakaser opened this issue Sep 17, 2024 · 27 comments
@lisakaser

lisakaser commented Sep 17, 2024

Allow each granule to have a browse file [likely only CNM message needs to change for this issue]



Details:
Data set: use a NetCDF data set (NSIDC-0081DUCk is a good option); in the end, browse should work for all science file types
browse: allow all possible options (research the options that are allowed for browse) -> see below
Size summary might need to go into UMM-G? (Amy? This issue is blocked until Amy returns) -> see below

Multiple browse per granule are possible? -> see below

Acceptance criteria:
All browse files are found and end up in the CNM message


UMM-G: confirm that granule size is correctly calculated (confirm if this is with or without browse)


Browse files need to be staged in correct location (same S3 bucket as data and metadata)

@lisakaser lisakaser added the enhancement New feature or request label Nov 15, 2024
@lisakaser lisakaser added question Further information is requested and removed enhancement New feature or request labels Dec 4, 2024
@lisakaser lisakaser added this to the Dec-Jan-Feb milestone Dec 4, 2024
@lisakaser
Author

lisakaser commented Dec 18, 2024

@afitzgerrell here is the list of questions that currently blocks this issue:

What are all the allowed browse options (file types)?
Does browse file size need to go into UMM-G or is browse file size excluded from size summary in UMM-G?

Are multiple browse per granule an option that we need to allow?

@afitzgerrell
Contributor

afitzgerrell commented Jan 8, 2025

What are all the allowed browse options (file types)? Based on the MimeTypeEnum listing in Earthdata's UMM-G schema, the allowed options are: “image/jpeg”, “image/png”, “image/gif”, “image/tiff”, & “image/bmp” (although the only ones I'm familiar with seeing for browse images are .jpg, .png, and .tif). Although, something to keep in mind is that any of these are also valid data types for data or ancillary files, either stand alone for a single-file granule + browse, or within a multi-file granule + browse (as noted in the CNM documentation under the Product sub-fields / files description and in the UMM-G documentation at DataGranule, Identifiers section).
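As a rough illustration of the list above, here is a minimal sketch that maps browse file extensions to the MIME types quoted from the UMM-G MimeTypeEnum. The function name `browse_mime_type` and the extension spellings are assumptions made for this example; this is not metgenc code.

```python
from pathlib import Path

# MIME types quoted above from the UMM-G MimeTypeEnum listing.
# The extension spellings on the left are assumptions for illustration.
BROWSE_MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".bmp": "image/bmp",
}


def browse_mime_type(file_name):
    """Return the MIME type for a browse image name, or None if the
    extension is not one of the image types listed above."""
    return BROWSE_MIME_TYPES.get(Path(file_name).suffix.lower())
```

Note that the extension alone cannot say whether an image is browse or data, since the same image types are also valid for data or ancillary files.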

Does browse file size need to go into UMM-G or is browse file size excluded from size summary in UMM-G? Browse are discrete files that should not be considered part of a granule's contents, thus should not be included in the UMM-G SizeInBytes value.

Are multiple browse per granule an option that we need to allow? Yes, this would be ideal. Generally, if data producers generate browse images it's typically been one browse per granule. There are, however, data sets where multiple browse images are included for one granule, an example of this is NISE which has two browse, one for each Hemisphere.

@lisakaser
Author

lisakaser commented Jan 13, 2025

Somehow the operator needs to indicate which files are the browse files or where the browse files are (either a separate location for browse or a consistent file name ending) -> @afitzgerrell will discuss with Ops to find out their preference

A browse file cannot at the same time be a science file as part of the granule.

Browse files will be listed in the CNM message

Browse files will be listed in the UMM-G as URLs but not part of the file size

@lisakaser lisakaser removed the question Further information is requested label Jan 13, 2025
@afitzgerrell
Contributor

afitzgerrell commented Jan 14, 2025

I queried OPS this morning and the consensus is: they'd prefer we (ideally) encourage data producers to include "brws" in the file names for browse images (rather than not altering the names and requiring the browse images to be sequestered in a different directory from the data files). In a pinch, were we to end up with data sans this keyword and a data producer unable or unwilling to rename files, OPS could add it in.

@juliacollins
Contributor

juliacollins commented Feb 19, 2025

@afitzgerrell double-checking the plan for browse files: our NSIDC_0081DUCk sample data currently is presented with browse files in a separate directory, but the words in the comments above suggest we will have browse files intermingled with data files, with something like brws in the file name. I'm wondering if, to be on the safe side, I should extend the ini file to include a separate browse file location as well as the identifying bit of text we use to figure out whether something is a browse file (e.g. brws vs browse vs prettypicture vs ???).

@afitzgerrell
Contributor

afitzgerrell commented Feb 19, 2025

@juliacollins I like your suggestion of making the .ini be told what text to recognize as being used to identify browse files!! Seems it'd be a helpful safeguard and allow for flexibility in an inconsistent world of data producer deliveries.

Also, if it's not a heavy lift, allowing for the ini to be pointed to a separate browse image directory likely wouldn't hurt as long as it'd be acceptable to allow an ops person to steer the .ini to just look to the /data directory if desired. When I asked ops for their preference on keeping browse in a separate directory from the data vs. adding "_brws" to browse file names instead and letting everything mingle, the latter was the overwhelming choice. BUT just in case there's some crazy scenario to account for, having an option to point the .ini file to look to a separate browse dir seems potentially handy. If it's not super easy though, I'd wax it!!
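To make the idea of a configurable browse identifier concrete, here is a minimal sketch of reading such a setting from ini text with Python's standard `configparser`. The section name `Collection`, the helper name `read_browse_regex`, and the sample ini text are assumptions for illustration only; the `_brws` fallback follows the "_brws" keyword OPS preferred. This is not metgenc's actual configuration code.

```python
import configparser

# Hypothetical ini fragment; the section name is an assumption.
EXAMPLE_INI = """
[Collection]
browse_regex = _brws
"""


def read_browse_regex(ini_text, section="Collection", default="_brws"):
    """Return the browse_regex value from ini text, or the default
    when the section or option is missing."""
    # interpolation=None keeps any '%' in a regex from being treated
    # as configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get(section, "browse_regex", fallback=default)
```

With this shape, an operator who receives files tagged "browse" instead of "_brws" would only need to change one line in the ini file.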

@lisakaser
Author

changing story points from 3 to 2 in this sprint

@juliacollins
Contributor

@afitzgerrell if you want to do any reviewing of the README before the changes are merged, please take a look at https://github.com/nsidc/granule-metgen/tree/issue-61?tab=readme-ov-file and confirm I didn't take any inappropriate editing liberties.

@juliacollins
Contributor

@afitzgerrell you can ignore my previous comment because...the branch is merged! Version 1.3.0 is now available on PyPI -- and as a bonus PyPI is displaying the latest README for your reviewing pleasure (along with Github as a reviewing option, of course).

@afitzgerrell
Contributor

@juliacollins thanks for having tried to give me a very nice heads-up and opportunity to peruse the readme before merging/publishing...regrettably, it was one of the emails i didn't discover i was missing until today. i'll test new pypi and peruse the readme then.

@lisakaser
Author

Development done, only testing open. Change from 2SP to 0SP.

@afitzgerrell
Contributor

afitzgerrell commented Mar 12, 2025

I fear I'm doing something incorrectly, but who knows, so here's the lowdown:

When I do a dry run with an .ini file specifying I want to include browse images, metgenc runs and claims success. No browse images are accounted for in the cnm files generated though. I don't know if perhaps I'm supposed to be doing more than accepting regex to be: "browse_regex = _brws", and that's where the trouble is stemming from(?). My .ini file is attached masquerading as a .txt file here.

Acceptance criteria:

  • All browse files are found and end up in the CNM message
  • UMM-G: confirm that granule size is correctly calculated (confirm if this is with or without browse)
  • Browse files need to be staged in correct location (same S3 bucket as data and metadata)

Browse and data file names example:

NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk.nc
NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_F16_DUCk_brws.png
NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_F17_DUCk_brws.png
NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_F18_DUCk_brws.png

0081duckBRWS.ini.txt

@afitzgerrell
Contributor

Update: After chatting with Kevin and his suggesting that I rename the browse files, I was able to run metgenc process and see the cnm files updated to contain the browse "type" added for each file's three associated browse images.
Data and renamed browse files:

NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk.nc
NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk_brws_F16.png
NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk_brws_F17.png
NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk_brws_F18.png

What I've stumbled into now, though, is that the metgenc validate command fails when I run it for cnm files, while it succeeds on ummg files. It's not just failing when I validate the new, browse-containing cnm files; it also fails on browse-free cnm files.
The following is shown in the metgenc.log after I run metgenc validate:

2025-03-13 17:12:57,642|INFO|metgenc|
2025-03-13 17:12:57,643|INFO|metgenc|Validating files in output/cnm...
2025-03-13 17:12:57,661|INFO|metgenc|No validation errors: output/cnm/NSIDC0081_SEAICE_PS_S25km_20211105_v2.0_DUCk.nc.cnm.json

while the following is what's shown on screen in terminal:

Validating files in output/cnm...
No validation errors: output/cnm/NSIDC0081_SEAICE_PS_S25km_20211105_v2.0_DUCk.nc.cnm.json
Traceback (most recent call last):
  File "/Users/afitzger/metgenc/bin/metgenc", line 8, in <module>
    sys.exit(cli())
             ~~~^^
  File "/Users/afitzger/metgenc/lib/python3.13/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/afitzger/metgenc/lib/python3.13/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
  File "/Users/afitzger/metgenc/lib/python3.13/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
  File "/Users/afitzger/metgenc/lib/python3.13/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/afitzger/metgenc/lib/python3.13/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/Users/afitzger/metgenc/lib/python3.13/site-packages/nsidc/metgen/cli.py", line 74, in validate
    metgen.validate(configuration, content_type)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/afitzger/metgenc/lib/python3.13/site-packages/nsidc/metgen/metgen.py", line 868, in validate
    apply_schema(schema, json_file, dummy_json)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/afitzger/metgenc/lib/python3.13/site-packages/nsidc/metgen/metgen.py", line 911, in apply_schema
    json_content = json.load(jf)
  File "/opt/homebrew/Cellar/python@3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/json/__init__.py", line 293, in load
    return loads(fp.read(),
        cls=cls, object_hook=object_hook,
        parse_float=parse_float, parse_int=parse_int,
        parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/opt/homebrew/Cellar/python@3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ~~~~~~~~~~~~~~~~~~~~~~~^^^
  File "/opt/homebrew/Cellar/python@3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/json/decoder.py", line 344, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.13/3.13.0_1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/json/decoder.py", line 362, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I'm running metgenc Version: 1.3.0

A side note:
I also noticed, and maybe (probably) this is to be expected: if data files and browse files are hanging out together in the data directory, and I run metgenc process with a config file lacking the browse_regex = _brws line, the browse images are still added to the cnm files created.

@juliacollins
Contributor

juliacollins commented Mar 18, 2025

@afitzgerrell Looks like your ini file doesn't contain a granule_regex value, so the code has no idea that each granule might have multiple data files and/or associated browse files sharing file name components. I'll look at options for assessing this scenario and reporting it to the user.

Here's the regex I've been using for my 0081 testing:

granule_regex = _(?P<granuleid>[NS]{1}\d{2}km_\d{8})_
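To see what this pattern does with the file names from this thread, here is a minimal sketch that groups files by the `granuleid` named group and splits each group into data and browse files. The helper name `group_by_granule` and the grouping approach are assumptions for illustration; metgenc's own grouping code may work differently.

```python
import re
from collections import defaultdict

# granule_regex quoted above; browse_regex as used earlier in this thread.
GRANULE_REGEX = r"_(?P<granuleid>[NS]{1}\d{2}km_\d{8})_"
BROWSE_REGEX = r"_brws"


def group_by_granule(file_names):
    """Group file names by the 'granuleid' capture and split each
    group into data and browse files (illustrative only)."""
    groups = defaultdict(lambda: {"data": [], "browse": []})
    for name in file_names:
        match = re.search(GRANULE_REGEX, name)
        if match is None:
            continue  # file does not belong to any granule
        kind = "browse" if re.search(BROWSE_REGEX, name) else "data"
        groups[match.group("granuleid")][kind].append(name)
    return dict(groups)


files = [
    "NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk.nc",
    "NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_F16_DUCk_brws.png",
    "NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_F17_DUCk_brws.png",
    "NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_F18_DUCk_brws.png",
]
groups = group_by_granule(files)
```

All four names share the captured text `N25km_20211101`, so in this sketch the original, un-renamed browse files land in the same group as the data file.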

@juliacollins
Contributor

juliacollins commented Mar 18, 2025

@afitzgerrell Could you verify all of the cnm files in the directory being validated actually contain typical "CNM" stuff? Based on the output you included, I'm guessing the file output/cnm/NSIDC0081_SEAICE_PS_S25km_20211105_v2.0_DUCk.nc.cnm.json has valid content, but then the next file in the list does not. That's just a guess, though.
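One way to do that check is a short script like the sketch below, which tries to parse every file in a directory as JSON and reports the ones that fail. The message "Expecting value: line 1 column 1 (char 0)" is what Python's `json` module raises for an empty file or one whose first character is not valid JSON. The helper name `find_unparseable_json` is an assumption for this example; this is not part of metgenc.

```python
import json
from pathlib import Path


def find_unparseable_json(directory):
    """Return {file name: error message} for every file in the
    directory that cannot be parsed as JSON."""
    problems = {}
    # Iterate over every file, including hidden ones, because any
    # stray non-JSON file in the directory would fail to parse.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            with path.open(encoding="utf-8") as handle:
                json.load(handle)
        except (json.JSONDecodeError, UnicodeDecodeError) as error:
            problems[path.name] = str(error)
    return problems
```

Calling `find_unparseable_json("output/cnm")` would list any file in that directory that the JSON parser rejects, along with the parser's message.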

@juliacollins
Contributor

@afitzgerrell here's my attempt at a summary of the current implementation:

  • If no browse_regex exists in the ini file, _brws is used as the default.
  • If a granule_regex value exists in the ini file, it's used to identify files associated with a single granule. Currently the match string doesn't accommodate optional bits in the file name match (those optional bits will be handled as part of issue-103). The granule_regex needs to match both data file(s) and browse file(s) associated with a granule, and the browse files also need to match the browse_regex value.
  • If there is no granule_regex value in the ini file, the granules are assumed to consist of a single data file. The code determines the data files by excluding all file names that match the browse_regex. The browse files must match both the browse_regex value and the name of a data file, less its extension, to be associated with a granule.

Example of the last point (no granule_regex in the ini file):

  • The data file name is NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk.nc. The granule file name to match is NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk.
  • NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_F16_DUCk_brws.png does not match NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk due to the _F16_ string, so no browse file(s) is/are included in the cnm file.
  • When you renamed the browse files to a pattern like NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk_brws_F16.png, the browse files do match the data file base name (NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk) and are included in the cnm file.

In summary: unless you tell metgenc how to gather files into groups via the granule_regex value, it takes the most basic approach possible in terms of identifying a granule's contents: it assumes one data file per granule, and any ancillary (browse, in the current case) files must match the entire data file name without its file type extension. The browse file name can have more text characters before or after the characters matching the data file name, but the data file name must exist as a complete chunk somewhere in the text of the browse file name.
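The no-granule_regex behavior summarized above can be sketched roughly as follows. This is a simplified illustration of the described rules (one data file per granule, browse files must contain the data file's base name), not metgenc's actual implementation; the function name `match_browse_without_granule_regex` is an assumption.

```python
import re
from pathlib import Path


def match_browse_without_granule_regex(file_names, browse_regex="_brws"):
    """Associate browse files with data files when no granule_regex is
    given: every non-browse file is treated as a single-file granule,
    and a browse file belongs to it only if the data file's base name
    (without extension) appears intact in the browse file name."""
    data_files = [n for n in file_names if not re.search(browse_regex, n)]
    browse_files = [n for n in file_names if re.search(browse_regex, n)]
    granules = {}
    for data_file in data_files:
        base_name = Path(data_file).stem  # name without its extension
        granules[data_file] = [b for b in browse_files if base_name in b]
    return granules


data = "NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk.nc"
original = match_browse_without_granule_regex(
    [data, "NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_F16_DUCk_brws.png"]
)
renamed = match_browse_without_granule_regex(
    [data, "NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk_brws_F16.png"]
)
```

In this sketch, `original[data]` is empty because `_F16_` interrupts the base name, while `renamed[data]` contains the renamed browse file.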

The question I have now is: Where do we document this business logic? browse_regex and granule_regex are mentioned in the README, but obviously some extra detail is needed!

@afitzgerrell
Contributor

@juliacollins re. your question about the cnm file quality/content:
I've checked my output/cnm files, and they contain correct, expected cnm stuff (despite the errors implying that NSIDC0081_SEAICE_PS_S25km_20211105_v2.0_DUCk.nc.cnm.json is ok but the other two maybe aren't).

I tested running validate with three cnm files to be validated (those records below with 11:46:32 timestamp), and then moving two out, to test just one cnm file. When just one cnm file is in the output/cnm directory (11:46:43 timestamp), the errors are still thrown on-screen, but the log doesn't record anything beyond validate being kicked off:

2025-03-20 11:46:32,886|INFO|metgenc|
2025-03-20 11:46:32,887|INFO|metgenc|Validating files in output/cnm...
2025-03-20 11:46:32,904|INFO|metgenc|No validation errors: output/cnm/NSIDC0081_SEAICE_PS_S25km_20211105_v2.0_DUCk.nc.cnm.json
2025-03-20 11:46:43,948|INFO|metgenc|
2025-03-20 11:46:43,948|INFO|metgenc|Validating files in output/cnm...

I'm now mulling over your next comment. Thank you for explaining the intent of the regex(s) in the config file. That confirms my suspicion of failure to understand on my part!!

As to where this greater explanation should go, I'm thinking a couple of actual examples added to the README.md would at least indicate "you need to engage brain power here (and not think the placeholder is a panacea)". If it turns out later that more elaborate 'how to regex: for metgenc' information is needed, I can add a more elaborate explanation/examples to the MetGenC Ancillary Resources Confluence page and pop a "for greater detail and more examples" link to it in the README.md.

@juliacollins
Contributor

juliacollins commented Mar 24, 2025

@afitzgerrell I'm unable to reproduce the error(s) you're seeing with my local ini file and data files (the output CNM validates with no extra excitement spit to STDOUT). If you still have the failing CNM output, along with the ini file you used to create them, feel free to attach them to this issue and I'll try to test them out!

@afitzgerrell
Contributor

thanks @juliacollins...my suspicion is that i've somehow messed up my working directory and that i'd do best by making a new venv and reinstalling metgenc.

BUT, i'd enjoy giving you a laugh and providing a sanity check for myself, sooo whenever you have a second, or many seconds, attached you'll find my ini file and three cnm files that presently cause metgenc validate -c ./init/0081duck.ini -t cnm much displeasure:

NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk.nc.cnm.json

NSIDC0081_SEAICE_PS_S25km_20211103_v2.0_DUCk.nc.cnm.json

NSIDC0081_SEAICE_PS_S25km_20211105_v2.0_DUCk.nc.cnm.json

0081duck.ini.txt

@juliacollins
Contributor

@afitzgerrell we are forced to accept mystery and uncertainty. I can kick off a validation of all CNM files with metgenc validate -c 0081duck.ini and it spits out No validation errors: ... for each of the three files (same result if I add -t cnm to ensure CNM files are being validated). I can also successfully use check-jsonschema for each file, for example:

check-jsonschema --schemafile src/nsidc/metgen/json-schema/cumulus_sns_schema.json output/cnm/NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk.nc.cnm.json

Something is happening (or has happened) in your local workspace, but I have no idea what!

@afitzgerrell
Contributor

@juliacollins thanks for this. it's good to know it's me, not metgenc 🎉. i'll get a new working directory established and move along!!

@afitzgerrell
Contributor

afitzgerrell commented Mar 25, 2025

oh, one interesting(?) thing: i just cleaned up and reinstalled metgenc locally...it still fails to validate cnm files. but i'd forgotten about the check-jsonschema option. i ran that, and it also gripes about validation, but is a little more to-the-point (to me, certainly!) about what its gripe is:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/afitzger/metgenc/src/nsidc/metgen/json-schema/cumulus_sns_schema.json'

the path i used to the schemafile is copied from your example above because i forgot where cumulus_sns_schema.json lived, not really paying attention that i don't have a src directory by default when i install metgenc. searching for cumulus_sns_schema.json, reveals it's in lib/python3.13/site-packages/nsidc/metgen/json-schema/. now i can successfully run:

metgenc afitzger$ check-jsonschema --schemafile lib/python3.13/site-packages/nsidc/metgen/json-schema/cumulus_sns_schema.json output/cnm/NSIDC0081_SEAICE_PS_N25km_20211101_v2.0_DUCk.nc.cnm.json
ok -- validation done

this is all a prelude to my question: does metgenc validate rely on the path to cumulus_sns_schema.json, and if so, is it problematic that it lives deep within the lib, not src, dir?

EDIT: dawned on me i can answer my own last questions by copying the cumulus_sns_schema.json to a recreated src/nsidc/metgen/json-schema/cumulus_sns_schema.json path, and metgenc validate still fails, so answers are "no" and "no". i'll move along and when time allows, test on the vagrant vm i made to see how metgenc validate behaves there.

@juliacollins
Contributor

@afitzgerrell Thanks to your feedback and our standup conversation I am realizing that I didn't try running validate in an environment based on a PyPI installation (as opposed to my GitHub repository-populated working environment). I'm betting the schema files are not correctly included in the PyPI package. Stay tuned!

@juliacollins
Contributor

I set up a virtual environment, activated it, and did a pip install nsidc-metgenc. I then populated the directory with the files @afitzgerrell attached above (ini file and example cnm files). Both validate and check-jsonschema run without error. I tried both python 3.12.7 and python 3.13.2.

I'm now stumped as to the next step we should take to untangle this mystery!

@afitzgerrell
Contributor

@juliacollins ok. arrg. welp, that makes two of us 🙃. therefore, i'll shut down and power my laptop back up! probably won't help, but it can't hurt.

@afitzgerrell
Contributor

afitzgerrell commented Mar 25, 2025

@juliacollins Ignoring other insanity of my working dir/environment, I CAN confirm that all of the acceptance criteria have been met when I test this end to end using a correct granule_regex string 🎉
Acceptance criteria:

  • All browse files are found and end up in the CNM message
  • UMM-G: confirm that granule size is correctly calculated (confirm if this is with or without browse)
  • Browse files need to be staged in correct location (same S3 bucket as data and metadata)

@lisakaser
Author

🎉
