Bogus calibrated values after running calibrate_carbon_bymonth() on some sites #72

Closed
jmackinnon109 opened this issue Jun 15, 2021 · 6 comments

jmackinnon109 commented Jun 15, 2021

Describe the bug
Large, erroneous numbers appear in the min and max of calibrated dlta13CCo2 data. This seems widespread: we have tried calibrating the entire eddy covariance (DP4.00200.001) dataset and found that a large portion of the data is affected. The issue only occurs when using the gain and offset method (Bowling_2003).

To Reproduce
Steps to reproduce the behavior:

  1. Use HDFView (or whatever program you like) to open and view the uncalibrated h5 file: NEON.D16.WREF.DP4.00200.001.nsae.2019-02.basic.20201020T154248Z.h5.
  2. Use the calibrate_carbon_bymonth() function to calibrate the WREF site data.
  3. Open the newly calibrated h5 file (NEON.D16.WREF.DP4.00200.001.nsae.2019-02.basic.20201020T154248Z.h5) next to the existing uncalibrated file in HDFView.
  4. In both files, navigate to the 000_010_09m group under the isoCo2 data.
  5. Select the dlta13CCo2 dataset and compare the min and max columns; the calibrated data set shows the erroneous values.

Expected behavior
We expect the min and max in the calibrated file to be much closer to those in the uncalibrated file, not 88447.84 across the board.
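One way to turn this expectation into an automated check is to flag rows where calibration moves a value far from its uncalibrated counterpart. This is a minimal sketch, assuming the paired uncalibrated and calibrated values have already been read from the two files into lists of floats (NaN for missing); the function name `flag_implausible` and the 5 per mil threshold are hypothetical, illustrative choices, not a NEON or NEONiso specification.

```python
import math

def flag_implausible(uncalibrated, calibrated, max_shift=5.0):
    """Return indices where calibration changed a value by more than max_shift.

    max_shift is an arbitrary illustrative threshold (per mil).
    Rows where either value is NaN (no data) are skipped, not flagged.
    """
    flagged = []
    for i, (u, c) in enumerate(zip(uncalibrated, calibrated)):
        if math.isnan(u) or math.isnan(c):
            continue
        if abs(c - u) > max_shift:
            flagged.append(i)
    return flagged

# Usage: the second row mimics the 88447.84 value reported in this issue.
print(flag_implausible([-8.5, -9.1, float("nan")], [-8.2, 88447.84, float("nan")]))
```

A value like 88447.84 is caught by any reasonable threshold, so the exact cutoff matters less than skipping empty rows so they are not miscounted as errors.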

Screenshots
(screenshot not preserved)

Additional context
We have seen this 88447.84 number pop up in the min and max columns of several other calibrated files as well, including files from other NEON data sites.

One thing I noticed is that the errors seem to coincide with missing calibration data in the original file. The CO2High, CO2Med and CO2Low datasets appear to be used to generate the calibration coefficients, and when they are missing (or even just incomplete) the calibration fails silently.
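To illustrate why missing standards should produce an explicit gap rather than a silent wrong value, here is a minimal sketch of a gain-and-offset correction. It assumes a simple least-squares line fitted between the measured and known reference values of the calibration standards (for example the CO2High, CO2Med and CO2Low tanks); the function names are hypothetical, this is not NEONiso's implementation, and it does not attempt to reproduce the full Bowling et al. (2003) procedure.

```python
import math

def fit_gain_offset(measured, reference, min_points=2):
    """Least-squares line mapping measured standard values to reference values.

    Returns (gain, offset), or (nan, nan) when there are too few valid
    standard measurements to define a line.
    """
    pairs = [(m, r) for m, r in zip(measured, reference)
             if not math.isnan(m) and not math.isnan(r)]
    if len(pairs) < min_points:
        return math.nan, math.nan
    n = len(pairs)
    mean_m = sum(m for m, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    sxx = sum((m - mean_m) ** 2 for m, _ in pairs)
    if sxx == 0:
        # All standards read the same value, so the slope is undefined.
        return math.nan, math.nan
    sxy = sum((m - mean_m) * (r - mean_r) for m, r in pairs)
    gain = sxy / sxx
    offset = mean_r - gain * mean_m
    return gain, offset

def apply_calibration(values, gain, offset):
    """Apply the fitted line; NaN coefficients propagate as NaN, not a number."""
    return [gain * v + offset for v in values]

# Usage: three complete standards give a usable line; one valid standard does not.
print(fit_gain_offset([380.0, 420.0, 460.0], [382.0, 423.0, 464.0]))
print(fit_gain_offset([400.0, float("nan"), float("nan")], [402.0, 423.0, 464.0]))
```

The design choice is that an incomplete set of standards yields NaN coefficients, which then propagate to NaN calibrated values that downstream code can recognize, instead of a plausible-looking but meaningless number.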

I am not an R expert, so it's possible I am doing something very wrong, but this calibration process seems straightforward, so it's strange that so much of the data would be bad after running it.

Thanks!

@rfiorella
Collaborator

At some point this error did creep into the GitHub version, but I don't think it was present in the version of the package on CRAN.

In the short term, I suggest using the version on CRAN (if you're not already). If you run into this error with the CRAN version of the package as well, please let me know. Either way, this issue will be fixed in the next CRAN release (0.5). I'm away for the next couple of weeks, but I'm aiming to submit a new version of the package to CRAN within a month or so.

Thanks,
Rich

@jmackinnon109
Author

Thanks for the response! When I first saw this bug, I had followed the directions on the GitHub page (installing with the devtools::install_github() method). After your response I reinstalled the package from CRAN and still saw the issue. In both cases the installed version appears to be 0.4.0. Would trying 0.3.0 be worth it?

@rfiorella
Collaborator

Hmmm... so far I haven't been able to replicate this error on my computer, which suggests there's either an issue with another package, a version issue with this package, or a cross-platform issue still lurking out there. Here's a plot of the data in my file for ABBY, 2019-02 (the month you provided the screenshot from):

(plot not preserved; screenshot dated 2021-06-28)

Could you try the following?

  1. Provide the output of the sessionInfo() command after running the code that generates the files with the erroneous data.
  2. In a completely fresh R session, install and load the GitHub version of the package using devtools::install_github("SPATIAL-Lab/NEONiso") and check whether the issue persists.

I can't imagine why installing v0.3 would solve the problem, and the results in the JGR paper used v0.4 anyway. Thanks!
-Rich

@jmackinnon109
Author

Hello Rich, we have moved our calibration into Docker containers to try to isolate the issue, since this problem seems hard to reproduce (and we may just be doing something dumb, hah!). I have attached a sessionInfo.txt file and the calibrated output from a run on the NEON.D16.WREF.DP4.00200.001.nsae.2020-05.basic.20201110T192806Z.h5 file using version 0.5.0 of NEONiso. We still see the 88447... number, but it's definitely less frequent than with the other versions (0.5.0 is enough of an improvement that it's probably fine for our uses, but I still want to get to the bottom of this).

We ran some tests across versions using 30-minute data files from all 47 terrestrial NEON sites (for all tower levels) to estimate how much of the data is affected, and this is what we found:

NEONiso version 0.3.0 (from devtools):

  • Empty rows (no data in original mean column): 6,740,043 rows (48% of total data)
  • Error in mean_cal column: 737,471 rows (9.86% of total data)

NEONiso version 0.5.0:

  • Empty rows (no data in original mean column): 6,740,043 rows (48% of total data)
  • Error in mean_cal column: 241,645 rows (1.72% of total data)

In both cases the "error" is that same weird large number.
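Because percentages like these depend on whether the denominator is all rows or only rows that had data in the original mean column, it can help to report both. This is a minimal sketch, assuming the original mean column and the calibrated mean_cal column have been loaded as equal-length lists of floats with NaN for missing values; the function name `summarize` and the 0.01 matching tolerance are hypothetical choices for illustration.

```python
import math

SENTINEL = 88447.84  # the suspicious value reported in this issue

def summarize(original_mean, calibrated_mean, sentinel=SENTINEL, tol=0.01):
    """Count empty rows and sentinel rows, with percentages over both denominators."""
    total = len(original_mean)
    empty = sum(1 for v in original_mean if math.isnan(v))
    bad = sum(1 for v in calibrated_mean
              if not math.isnan(v) and abs(v - sentinel) < tol)
    non_empty = total - empty
    return {
        "total": total,
        "empty": empty,
        "sentinel": bad,
        "empty_pct_of_total": 100 * empty / total if total else math.nan,
        "sentinel_pct_of_total": 100 * bad / total if total else math.nan,
        "sentinel_pct_of_non_empty": 100 * bad / non_empty if non_empty else math.nan,
    }

# Usage on a tiny made-up column: two empty rows, one sentinel row.
nan = float("nan")
print(summarize([nan, -8.5, -9.0, nan], [nan, -8.2, 88447.84, nan]))
```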

sessionInfo.txt
cran5WREF2020-05.h5.zip

@rfiorella
Collaborator

Thanks for the update! I'll take a look and hopefully include a fix in the next release.

@rfiorella
Collaborator

Should be fixed by c12f94e
