Add more Landsat readers #3181

Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #3181      +/-   ##
==========================================
- Coverage   96.32%   96.28%   -0.04%
==========================================
  Files         465      465
  Lines       58159    58666     +507
==========================================
+ Hits        56020    56489     +469
- Misses       2139     2177      +38
```
|
Pull Request Test Coverage Report for Build 16914740677

Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

💛 - Coveralls
I don't have time for a full review of this until some time in September, as I'm on annual leave and will then be catching up on work for some time thereafter. But one quick comment: the file …
That’s a big PR, but thanks for putting it together!
I haven’t checked all the details, but I was wondering whether it would be possible to refactor the tests somehow, as there seems to be quite a lot of duplication. Maybe using `pytest.mark.parametrize` could help? But as I said, I haven’t checked the details, so maybe my suggestion doesn’t make sense...
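As an illustration of the parametrize idea: the reader names below are taken from this PR, but the test body is a stand-in to keep the sketch self-contained, not Satpy's actual test code.

```python
import pytest

# Stand-in (reader, band) cases; one parametrized test body replaces
# several near-identical per-sensor tests.
CASES = [
    ("oli_tirs_l1_tif", "B4"),
    ("etm_l1_tif", "B4"),
    ("mss_l1_tif", "B4"),
]

@pytest.mark.parametrize(("reader", "band"), CASES)
def test_band_naming(reader, band):
    # The real test would build a Scene and load `band`; here we only
    # check the naming convention so the sketch runs on its own.
    assert reader.endswith("_tif")
    assert band.startswith("B")
```

Each tuple in `CASES` becomes its own test item, so a failure report names the exact sensor/band combination that broke.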
I tried to merge all the tests into a single file. I used … So I decided to keep the original test files for now and delete them later, because I am not sure whether one single test file or separate test files for each sensor is preferable in this case. Let's discuss which test implementation will be more suitable here! Also, the tests could probably be optimized further (maybe by somehow getting rid of similar fixtures), or maybe more tests should be added (e.g. to test bands that appear only in specific products, like GM bands for ETM+).
Wow, what a pull request. Nice work. I only made it through the `landsat_base.py` module. I had a possible simplification suggestion:
Could you move some of the calibration coefficient paths and saturation paths to either the `file_type` definition in the YAML or to the individual dataset definitions in the YAML? It could even be a format string with a placeholder for `{band}` and/or other information. If it needs to be per-variable, maybe you could add it as part of the `available_datasets` method? Overall this is not a big deal, but I thought it could easily cut out a large chunk of the Python code and make it more obvious that the only difference between the classes/types is where the metadata and factors are stored.
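A minimal sketch of the format-string idea, assuming a hypothetical `cal_mult_path` key in the YAML `file_type` definition; the key name and the path template are illustrative, not the actual Satpy schema.

```python
def resolve_cal_path(filetype_info, band):
    """Expand a per-band calibration path template taken from the YAML."""
    return filetype_info["cal_mult_path"].format(band=band)

# Hypothetical YAML-derived file_type info:
info = {"cal_mult_path": "LEVEL1_RADIOMETRIC_RESCALING/RADIANCE_MULT_BAND_{band}"}
path = resolve_cal_path(info, 4)
# path == "LEVEL1_RADIOMETRIC_RESCALING/RADIANCE_MULT_BAND_4"
```

With this, per-level reader classes would only need different templates in their YAML, not different Python lookup code.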
I refactored … We could probably also keep only one metadata reader class.
Now:

```python
class LandsatL1MDReader(BaseLandsatMDReader):
    """Metadata file handler for Landsat L1 files (tif)."""

    @property
    def band_calibration(self):
        """Return per-band calibration parameters."""
        radcal = self.get_cal_params("LEVEL1_RADIOMETRIC_RESCALING", "RADIANCE_MULT", "RADIANCE_ADD")
        viscal = self.get_cal_params("LEVEL1_RADIOMETRIC_RESCALING", "REFLECTANCE_MULT", "REFLECTANCE_ADD")
        tircal = self.get_cal_params("LEVEL1_THERMAL_CONSTANTS", "K1_CONSTANT", "K2_CONSTANT")
        topcal = viscal | tircal
        return {key: tuple([*radcal[key], *topcal[key]]) for key in radcal}


class LandsatL2MDReader(BaseLandsatMDReader):
    """Metadata file handler for Landsat L2 files (tif)."""

    @property
    def band_calibration(self):
        """Return per-band calibration parameters."""
        viscal = self.get_cal_params("LEVEL2_SURFACE_REFLECTANCE_PARAMETERS", "REFLECTANCE_MULT", "REFLECTANCE_ADD")
        tircal = self.get_cal_params("LEVEL2_SURFACE_TEMPERATURE_PARAMETERS", "TEMPERATURE_MULT", "TEMPERATURE_ADD")
        return viscal | tircal
```

If merged into one class:

```python
@property
def band_calibration(self):
    """Return per-band calibration parameters."""
    if "1" in self.process_level:
        radcal = self.get_cal_params("LEVEL1_RADIOMETRIC_RESCALING", "RADIANCE_MULT", "RADIANCE_ADD")
        viscal = self.get_cal_params("LEVEL1_RADIOMETRIC_RESCALING", "REFLECTANCE_MULT", "REFLECTANCE_ADD")
        tircal = self.get_cal_params("LEVEL1_THERMAL_CONSTANTS", "K1_CONSTANT", "K2_CONSTANT")
        topcal = viscal | tircal
        return {key: tuple([*radcal[key], *topcal[key]]) for key in radcal}
    elif "2" in self.process_level:
        viscal = self.get_cal_params("LEVEL2_SURFACE_REFLECTANCE_PARAMETERS", "REFLECTANCE_MULT", "REFLECTANCE_ADD")
        tircal = self.get_cal_params("LEVEL2_SURFACE_TEMPERATURE_PARAMETERS", "TEMPERATURE_MULT", "TEMPERATURE_ADD")
        return viscal | tircal
    raise ValueError(f"Unknown processing level: {self.process_level}")
```
Improved …
satpy/readers/core/landsat.py (Outdated)

```python
        area_extent = (ext_p1, ext_p2, ext_p3, ext_p4)

        # Return the area extent
        return AreaDefinition(f"EPSG: {proj_code}", pcs_id, pcs_id, proj_code, x_size, y_size, area_extent)
```
@mraspaud @pnuu and anyone else who might care: what do you think about the name of the area being the EPSG code? I asked for it to be made more descriptive, but now using the EPSG code in this way seems wrong. I'm not sure what a better name would be, and I don't know enough about Landsat to suggest something else.
I have all my production areas named like `epsg_3035_1km`, so in principle I'm fine with that.
The problem with the "UTM{utm_zone}" name is that Arctic and Antarctic scenes do not have a UTM zone, so I decided to use the EPSG code instead. I used it only because that was the first idea I came up with, and if there is a more appropriate area definition name, we probably should use it.
If f"EPSG: {proj_code}" is bad mostly because it is not filename-friendly, I can just replace it with f"epsg_{proj_code}".
If the "UTM{utm_zone}" approach is preferable, I can do something like this:
```python
# Read the UTM zone, or use a specific CRS for Arctic and Antarctic scenes
utm_zone_element = self.root.find(".//PROJECTION_ATTRIBUTES/UTM_ZONE")
if utm_zone_element is not None:
    utm_zone = utm_zone_element.text
    pcs_id = f"{datum} / UTM zone {utm_zone}N"
    proj_code = f"EPSG:326{utm_zone.zfill(2)}"
    name = f"UTM{utm_zone}"
else:
    lat_ts = self.root.find(".//PROJECTION_ATTRIBUTES/TRUE_SCALE_LAT").text
    if lat_ts == "-71.00000":
        # Antarctic
        proj_code = "EPSG:3031"
    elif lat_ts == "71.00000":
        # Arctic
        proj_code = "EPSG:3995"
    pcs_id = f"{datum} / EPSG: {proj_code[5:]}N"
    name = f"EPSG_{proj_code[5:]}"
return AreaDefinition(name, ...)
```
Btw, I adapted the code for the Arctic and Antarctic scenes from here
eh let's just leave it the way it is. If it ends up mattering in the future I'll make an issue or a PR.
Ok, we're getting really close to a merge now. The biggest issue I have now is how much CodeScene (the check on this PR) dislikes this implementation. It makes good points about there being a lot of data declarations and a lot of function arguments. On the other hand, these are tests and you're parametrizing them, so you can kind of expect a lot of arguments. I thought about refactoring the tests for you but ran out of time today. I had the thought that maybe it would be nice to have a base class (ex. …).
@mraspaud @pnuu how do you feel about just leaving CodeScene mad and merging this implementation with a single test file?
Other consideration: move the metadata definitions in the tests to text files in the tests directory that are loaded when they are needed (a fixture if it makes sense).
I'll see if I have some time tomorrow to have a look at the complaints. I think at least the ones in …
Pushed the first round of refactorings, let's see what CodeScene thinks about them.
Ok, …
With 31a6f23 the metadata are moved from the test module to separate text files.
Complex test conditionals refactored in 63bbc0a |
Nice cleanup @pnuu. I think I'm fine with this being merged as-is, but I also posted a possible further refactoring of the tests on Slack that may or may not be worth it. I'll copy that message here:
A possible restructuring of the tests could be:
1. Per-reader test classes with one (or more) base classes.
2. The base class defines the test methods but without the test_ prefix.
3. The subclasses define class attributes for the reader and classes to use.
4. The subclasses define the test methods and parametrize as needed, but call the base class's method that does all the actual work. The base class uses the class attributes (self.X), and the parametrized values are passed in as arguments.
This reduces the number of arguments per test and separates things based on reader/instrument. These classes could then be split into per-reader modules if it makes sense.
Ok, I've reduced the number of arguments by quite a bit, but they are still over the threshold for CodeScene. The main changes are: …
I could combine the filename fixtures with the filename_info, since the two are typically together, but it wouldn't reduce the number of arguments enough for most of the test methods. The next option would be to do my subclassing idea, but I'm not sure I want to put that much work into this.
I thought about @djhoese's suggestion about per-reader test classes, and maybe in this case we could get rid of parametrization entirely, as in the current implementation every parameter set usually corresponds to a single reader. So we could probably do something like this:

```python
class BaseLandsatTest:
    def _basicload(self, remote):
        if remote:
            all_files = convert_to_fsfile(self.all_files)
        else:
            all_files = self.all_files
        scn = Scene(reader=self.reader, filenames=all_files)
        if self.thermal_name is not None:
            scn.load([self.spectral_name, self.thermal_name])
        else:
            scn.load([self.spectral_name])
        self._check_basicload_thermal(scn)
        self._check_basicload_mss_l1_tif(scn)
        ...

    def _ch_startend(self):
        """Test correct retrieval of start/end times."""
        scn = Scene(reader=self.reader, filenames=[self.spectral_file, self.mda_file])
        bnds = scn.available_dataset_names()
        assert bnds == [self.spectral_name]
        scn.load(["B4"])
        assert scn.start_time == self.date_time
        assert scn.end_time == self.date_time


class TestLandsatOLITIRSL1(BaseLandsatTest):
    reader = "oli_tirs_l1_tif"
    spectral_name = "B4"
    thermal_name = "B11"
    all_files = lf("oli_tirs_l1_all_files")
    ...

    @pytest.mark.parametrize("remote", [True, False])
    def test_basicload(self, remote):
        self._basicload(remote)

    def test_ch_startend(self):
        self._ch_startend()
```

We could use … Should I implement it this way?
@simonreise should we maybe merge this and then do a separate refactor PR? I'm actually not sure why we haven't merged this already. |
Shouldn't @simonrp84 also review this? He said he would review the PR in September, after his summer vacation, in one of the first messages in the thread.
Landsat readers
Overview
This PR adds readers for Landsat OLI-TIRS L2, ETM+ L1-2, TM L1-2 and MSS products.
All these readers are mostly based on the existing Landsat OLI-TIRS L1 reader.
Readers
Landsat reader classes are hierarchically organized into a nested structure of classes:

- The `BaseLandsatReader` class contains the `__init__` and `get_dataset` functions. Their logic is mostly copied from the original reader, with a few additions and bug fixes.
- The Landsat reader classes for L1 and L2 contain calibration logic, which is the same for every sensor but differs between L1 and L2 products. The L1 logic is copied from the existing reader.
  Note: I moved the * 0.001 scaling logic for the `TRAD`, `URAD`, `CDIST` etc. bands to calibration. Maybe it belongs in the `get_dataset` function, just like the `ang` bands' * 0.01 scaling in the L1 reader? Or should I move the `ang` bands' * 0.01 scaling to calibrations as well?
- `MSSCHReader` also contains the `available_datasets` function that dynamically sets the B4 wavelength (see the YAMLs section for more info).

YAMLs
I created YAML files for every reader.
They are all based on the OLI-TIRS L1 YAML file.
What was updated or needs additional checking:

- Reader namings. Should the ETM+ reader be named "etm_lx_tif" or "etm_plus_lx_tif"?
- A `data_identification_keys` section was added to the head of the file to add custom calibration support to L2 products. Is that necessary? Should custom calibrations be implemented in any other way?
- `{collection_category}` was limited to `{collection_category:2s}`, because otherwise the search for L2 products and Landsat-7 (which also contains GM bands) was faulty. AFAIK the collection category can only contain 2 symbols, but if not, this can cause errors.
- QA band namings: see Add new QA band filename support to Landsat reader #3176.
- The `calibration` `standard_names` and `units` should probably be double-checked.
- `wavelength` was added or updated according to the official Landsat Data Format Control Books, but should probably also be double-checked.
- Again, custom calibrations in the band definitions in the `datasets` section. Are they added the right way?
- MSS bands. In Landsat 1/2/3 MSS products the bands are named B4, B5, B6, B7, while in Landsat 4/5 MSS products the same bands are named B1, B2, B3, B4. The MSS reader currently just reads the bands and their names as-is. The issue with the `B4` band being green in Landsat 1/2/3 and NIR in Landsat 4/5 is solved by changing the `wavelength` in the `available_datasets` function in the reader. But is that the right way to handle such issues? Maybe there is a better way to solve the problem? Or maybe separate readers for Landsat 1/2/3 and Landsat 4/5 MSS products should be added?
- Also, some docs say that Landsat-3 had a thermal band B8, but it failed just after launch. I am not sure if any products with B8 actually exist and whether it should be added to the band list.
Tests

I added separate test files for every reader. They are mostly just adapted versions of the OLI-TIRS L1 test file. The differences are:

- L2 product test files test the `TRAD` file instead of the `sza` file.
- Products other than OLI-TIRS do not have the `test_loading_badchan` test, because they do not have products with only spectral or only thermal bands, so they cannot run into such errors.
- The MSS test covers both Landsat-1 and Landsat-4 products and checks that the B4 wavelength is set correctly.
- Other small differences.

Maybe some other tests should be added?

UPD: I also added an alternative implementation of the same tests that uses one file and `pytest.mark.parametrize`.
Additions

- `eoreader` has C1 support for some reason