Tracking OBIS Submission for WBTS Calanus Data #102
Comments
@albenson-usgs, you can find the final Darwin Core files at https://github.com/ioos/bio_data_guide/tree/main/datasets/WBTS_MBON/data/processed and a description of the process at https://github.com/ioos/bio_data_guide/tree/main/datasets/WBTS_MBON |
Thanks Dylan! A few things to change / add before we can load this in the IPT.
|
Thanks @albenson-usgs! That's very helpful - I'll start working through these today.
|
I've made a number of updates here, and opened a new PR #103 with the updated script & output files. A quick question about the
Currently the data in this field comes from the
Do you have a sense of which of those is preferable?
Metadata
We can use "WBTS_CFIN_2004_2017" as the short name for this dataset. I've also attached the cruise report for the dataset here. Let me know if this format is workable or if you need something else! |
Great! Let's put it all in For the metadata:
|
Thanks Abby! I reached out to Jeff Runge for some clarity on the contact info, license, and preferred citation. I also moved all the comments into the While reviewing I noticed that there were some duplicate Changes can be seen in this PR #104. I'll update here once I hear back from Jeff! |
After reviewing the newest files:
|
Thanks Abby, sorry for my misunderstanding about the
I checked the source data, and the eventID "GC120604WBWB-72" has no occurrences, so that's correct. However, there's something strange going on with the occurrenceID: Thanks for the clarification on the emof file - that concept makes sense (little by little!), and I'm just trying to think of how to achieve that separation programmatically. I'll open a new PR once I implement the changes and check back here. |
Circling back here - I'm wondering how best to handle the event/occurrence split for the MoF file. I think the thing that's tripping me up is that in the original dataset each row corresponds to a sampling event, and several (between 0 and 8) occurrences. During processing I'm expanding each original row into 8 new rows, each representing an occurrence. Because of this, all of the data contained in the MoF file is identical for each expanded row - the only thing that changes is data related to an actual organism ( Does that make sense at all? I'd also be available for a quick call tomorrow to discuss! Here's a diagram of how each input row becomes the 8 output rows: |
Ok I think I understand this better now and I think you are right. In this case all the eMoFs are event level eMoFs and data in the columns for the different life stages (N, CI, CII, CIII, CIV, CV) and sexes (F, M) is best in |
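The row expansion discussed above can be sketched roughly as follows. This is a minimal illustration, not the conversion script from the PRs: the column names (N, CI, CII, CIII, CIV, CV, F, M) come from this thread, while the `expand_row` helper, the `count` key, the example event ID, and the occurrenceID scheme are all hypothetical.

```python
# Stage and sex columns as named in this thread.
STAGE_COLUMNS = ["N", "CI", "CII", "CIII", "CIV", "CV", "F", "M"]
SEX_LABELS = {"F": "female", "M": "male"}


def expand_row(event_id, row):
    """Turn one wide sampling-event row into one occurrence record per column.

    `row` maps a source column name to its recorded count (or None if blank).
    """
    occurrences = []
    for column in STAGE_COLUMNS:
        is_sex_column = column in SEX_LABELS
        occurrences.append({
            "eventID": event_id,
            # Hypothetical ID scheme: event ID plus the source column name.
            "occurrenceID": f"{event_id}_{column}",
            "scientificName": "Calanus finmarchicus",
            # Life-stage columns fill lifeStage; F/M columns fill sex.
            "lifeStage": "" if is_sex_column else column,
            "sex": SEX_LABELS.get(column, ""),
            "count": row.get(column),
        })
    return occurrences


if __name__ == "__main__":
    # Made-up counts for a made-up event, purely for illustration.
    example = {"N": 12, "CI": 3, "CV": 40, "F": 7}
    for record in expand_row("EXAMPLE-1", example):
        print(record)
```

Because every expanded record carries the same `eventID`, anything measured once per sampling event only needs to be stored once against that event rather than repeated on each of the eight occurrence records.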
Great - I think switching to
So my thought would be to have As far as recording absences - there are some records which are blank (or NaN) and others that actually record "0" in the Calanus columns. I'm going to reach out to the PI for clarification, because my understanding is that those would actually be treated differently:
|
Yes, that makes sense to me. For the absences: yes, it's good to get clarification from the PI about that difference. However, I'm still wondering if it makes sense to have absences for males vs. females or different life stages. Note that the definition of
In the example I have above, Calanus finmarchicus (taxon) is present at the location (event1); it's just that not all sexes or life stages are present. I would not include the sex / life stage "absences", so it would look like this:
The time when I would include a row for absent is when absolutely no Calanus finmarchicus are found. But I'm curious what others think about this so I will put this question over into the Slack for more discussion. |
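One way to express the absence rule proposed above is sketched here. It is only an illustration under stated assumptions: blank values are represented as `None` and treated as "not counted", an event whose columns are all blank produces no rows, and the `occurrence_rows` helper, the `stage` and `count` keys, and the ID scheme are hypothetical. `occurrenceStatus` with the values "present" and "absent" is the Darwin Core term.

```python
STAGE_COLUMNS = ["N", "CI", "CII", "CIII", "CIV", "CV", "F", "M"]


def occurrence_rows(event_id, counts):
    """Build occurrence rows for one event, with taxon-level absences only.

    `counts` maps a column name to a number, or to None when the cell is blank.
    """
    # Assumption: a blank (None) cell means the stage was not counted at all.
    counted = {c: counts[c] for c in STAGE_COLUMNS if counts.get(c) is not None}
    present = {c: n for c, n in counted.items() if n > 0}

    if present:
        # The taxon was found: emit only the stages/sexes actually observed.
        return [
            {
                "eventID": event_id,
                "occurrenceID": f"{event_id}_{column}",  # hypothetical scheme
                "occurrenceStatus": "present",
                "stage": column,
                "count": n,
            }
            for column, n in present.items()
        ]

    if counted:
        # Something was counted and every count was zero: one absence row
        # for the taxon as a whole, not one per sex or life stage.
        return [{
            "eventID": event_id,
            "occurrenceID": f"{event_id}_absent",  # hypothetical scheme
            "occurrenceStatus": "absent",
            "stage": "",
            "count": 0,
        }]

    # Nothing was counted for this event: no occurrence rows at all.
    return []
```

For example, `occurrence_rows("EXAMPLE-1", {"N": 0, "CI": 4, "F": None})` yields a single "present" row for CI, while `{"N": 0, "CI": 0}` yields a single "absent" row for the taxon.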
That makes a lot of sense, and I think you're right about the absences. The fact that the definition specifically mentions Taxon definitely helps clarify that in my mind. For the moment I've gone ahead and updated the script to ignore "missing" sex and life stage records, and the output now looks like your example above. I'll keep an eye on the discussion in Slack and can amend this if need be. I also verified that blank records mean that an organism was not counted. From the cruise report:
so those records are now ignored. I've opened a new PR with those changes and some additional corrections: #105
Metadata
The PI also responded to my earlier questions about metadata:
He was unsure about the license question, so I'm following up with a few other people on that. Is this in reference to the source data's existing license, or the license which will be applied to the DwC files? |
Does this page help with the licenses question? It's the license that will be applied to the DwC files. Note that they must select one of three licenses or the data cannot be published to OBIS and GBIF: CC-0, CC-BY, CC-BY-NC. |
Hi @albenson-usgs - just wanted to circle back here! I heard back from the PI, and we'd like to use CC-BY for the license. |
Does that mean we're a go for publishing? Should I load what's here into the OBIS-USA IPT and publish to OBIS and GBIF? |
Sure thing - that sounds good to me! |
Thanks Abby - I'm correcting the longitude values in the source data now, and will also verify the missing occurrences. |
Hi @albenson-usgs - sorry for the super long delay in getting back to you. I've corrected the issue with the missing negative signs, and confirmed that there are 88 events with no occurrences - so the DwC files should be correct. I've opened a PR with the changes here: #108 Hopefully we'll be all set once it's merged in! |
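The two checks mentioned above (restoring missing negative signs on longitude, and confirming which events have no occurrences) could be sketched like this. It is not the code from PR #108: both helper names are hypothetical, and the longitude fix assumes every station in the dataset lies in the western hemisphere, which holds for Wilkinson Basin in the Gulf of Maine but should be checked before reuse elsewhere.

```python
def fix_longitude(longitude):
    """Return the longitude with a negative sign, restoring a dropped minus.

    Assumption: all stations are west of the prime meridian, so any positive
    value is a sign error rather than a genuine eastern-hemisphere position.
    """
    return -abs(longitude)


def events_without_occurrences(event_ids, occurrence_rows):
    """List the event IDs that no occurrence row refers to."""
    referenced = {row["eventID"] for row in occurrence_rows}
    return sorted(set(event_ids) - referenced)


if __name__ == "__main__":
    # Made-up values, purely for illustration.
    print(fix_longitude(69.86))
    events = ["EXAMPLE-1", "EXAMPLE-2", "EXAMPLE-3"]
    occurrences = [{"eventID": "EXAMPLE-1"}, {"eventID": "EXAMPLE-1"}]
    print(events_without_occurrences(events, occurrences))
```

Comparing the length of the second function's result with the expected number of empty events gives a quick consistency check between the event and occurrence files.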
I went ahead and merged it in. |
@Dylan-Pugh is this a one-off that will never be updated, or will it be updated with more observations in the future? I'm trying to decide whether I should include the dates in the title of the resource (Wilkinson Basin Time Series Station (WBTS): MESOZOOPLANKTON 2004-2017) or not (Wilkinson Basin Time Series Station (WBTS): MESOZOOPLANKTON) |
This should be a one-off! I don't think any new data will be added in the future. |
Thanks Abby & Matt! |
Thank you! Teamwork! |
Here's the dataset in OBIS https://obis.org/dataset/5ef55cd8-05a1-4569-8e17-ceb224e40f59 :-) |
I'm creating this issue to track the OBIS submission process for the WBTS Calanus dataset. I've opened a PR which contains the conversion script I used, as well as the three output files: #101.
Tagging @albenson-usgs here for help/guidance on using the IPT!
Please let me know if you have any questions, or see any issues.