Conversation

simonreise
Contributor

@simonreise simonreise commented Jul 30, 2025

Landsat readers

Overview

This PR adds readers for Landsat OLI-TIRS L2, ETM+ L1-2, TM L1-2 and MSS products.

All these readers are mostly based on the existing Landsat OLI-TIRS L1 reader.

Readers

Landsat reader classes are organized into a hierarchy in which:

  • The BaseLandsatReader class contains the __init__ and get_dataset functions. Their logic is mostly copied from the original reader, with a few additions and bug fixes.

  • The L1 and L2 reader classes contain the calibration logic, which is the same for every sensor but differs between L1 and L2 products. The L1 logic is copied from the existing reader.

Note: I moved the * 0.001 scaling logic for the TRAD, URAD, CDIST, etc. bands into calibration. Maybe it belongs in the get_dataset function, just like the * 0.01 scaling of the ang bands in the L1 reader? Or should I also move the ang bands' * 0.01 scaling into calibration?

  • The product readers. They mostly contain attributes such as the spectral and thermal band lists and the sensor name. MSSCHReader also contains an available_datasets function that dynamically sets the B4 wavelength (see the YAMLs section for more info).
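
The three layers described above can be pictured with a small, self-contained sketch. This is not the code from the PR: every class name ending in Sketch is hypothetical, the band lists are only illustrative, and the calibration numbers are made up. It only shows how shared loading logic, level-specific calibration and product-specific attributes could be split across a class hierarchy.

```python
class BaseReaderSketch:
    """Shared entry point; plays the role described for BaseLandsatReader."""

    sensor = None
    spectral_bands = ()
    thermal_bands = ()

    def get_dataset(self, band, counts):
        """Validate the band, then apply the level-specific calibration."""
        if band not in self.spectral_bands + self.thermal_bands:
            raise KeyError(f"{band} is not a band of {self.sensor}")
        return [self.calibrate(band, value) for value in counts]

    def calibrate(self, band, value):
        raise NotImplementedError


class L1ReaderSketch(BaseReaderSketch):
    """Level-1 logic shared by all sensors (toy: counts pass through)."""

    def calibrate(self, band, value):
        return float(value)


class L2ReaderSketch(BaseReaderSketch):
    """Level-2 logic shared by all sensors (toy linear scaling)."""

    scale, offset = 0.5, 1.0  # made-up coefficients, not real Landsat values

    def calibrate(self, band, value):
        return value * self.scale + self.offset


class ETML2Sketch(L2ReaderSketch):
    """Product layer: only attributes, no logic (illustrative band lists)."""

    sensor = "etm_plus"
    spectral_bands = ("B1", "B2", "B3", "B4", "B5", "B7")
    thermal_bands = ("B6",)
```

With this layout, ETML2Sketch().get_dataset("B4", [2, 4]) goes through the shared get_dataset, picks up the L2 calibration, and uses the band lists declared on the product class.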

YAMLs

I created YAML files for every reader.

They are all based on the OLI-TIRS L1 YAML file.

What was updated or needs additional checking:

  • Reader naming. Should the ETM+ reader be named "etm_lx_tif" or "etm_plus_lx_tif"?

  • A data_identification_keys section was added to the head of the file to add custom calibration support for L2 products. Is that necessary? Should custom calibrations be implemented in some other way?

  • {collection_category} was limited to {collection_category:2s} because otherwise the file search for L2 products and Landsat-7 (which also contains GM bands) was faulty. AFAIK the collection category can only contain 2 characters, but if that is not the case, this can cause errors.

  • QA band namings: see Add new QA band filename support to Landsat reader #3176

  • calibration standard_names and units probably should be double-checked

  • wavelength was added or updated according to official Landsat Data Format Control Books. But probably also should be double-checked.

  • Again, custom calibrations in the band definitions in the datasets section. Are they added the right way?

  • MSS bands. In Landsat 1/2/3 MSS products the bands are named B4, B5, B6, B7, and in Landsat 4/5 MSS products the same bands are named B1, B2, B3, B4. Currently the MSS reader just reads the bands and their names as-is. The issue with the B4 band being green in Landsat 1/2/3 and NIR in Landsat 4/5 is solved by changing the wavelength in the available_datasets function in the reader. But is that the right way to handle such issues? Maybe there is a better way to solve the problem? Or maybe separate readers for Landsat 1/2/3 and Landsat 4/5 MSS products should be added?

  • Also, some docs say that Landsat-3 had a thermal band B8, but it failed just after launch. I am not sure whether any products with B8 actually exist and whether it should be added to the band list.
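
To illustrate why the fixed width on the collection category matters, here is a rough analogy using plain regular expressions. It does not reproduce how the YAML file patterns are actually parsed; the two patterns and the example file names are assumptions chosen only to show how an unconstrained field can swallow the extra SR part of an L2 file name.

```python
import re

# Tail of an L1-style pattern: ..._{collection_number}_{collection_category}_B4.TIF
LOOSE = re.compile(r"_(?P<number>\d{2})_(?P<category>\w+)_B4\.TIF$")
# Same tail, but the category is limited to exactly two characters.
FIXED = re.compile(r"_(?P<number>\d{2})_(?P<category>\w{2})_B4\.TIF$")

# Example file names following the usual Landsat Collection 2 layout.
L1_FILE = "LC08_L1TP_026200_20240502_20240513_02_T1_B4.TIF"
L2_FILE = "LC08_L2SP_026200_20240502_20240513_02_T1_SR_B4.TIF"


def category(pattern, filename):
    """Return the matched collection category, or None if there is no match."""
    match = pattern.search(filename)
    return match.group("category") if match else None
```

In this analogy the loose pattern accepts the L2 file and reports a category of "T1_SR", while the fixed-width pattern rejects the L2 file and still matches the L1 file with category "T1".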

Tests

I added separate test files for every reader. They are mostly just the adapted versions of the OLI-TIRS L1 test file. The differences are:

  • L2 product test files test TRAD file instead of sza file

  • Products other than OLI-TIRS do not have a test_loading_badchan test, because they have no products with only spectral or only thermal bands, so they cannot run into such errors

  • MSS test tests both Landsat-1 and Landsat-4 products and tests if B4 wavelength is set correctly

  • Other small differences
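
The B4 check mentioned for the MSS test can be sketched as a small lookup keyed on the platform number. This is only an illustration, not the reader's available_datasets code: the function name is hypothetical, and the central wavelengths are simply the midpoints of the nominal band ranges.

```python
# (min, central, max) wavelength in micrometres; central values are midpoints.
MSS_B4_WAVELENGTH = {
    "green": (0.5, 0.55, 0.6),  # Landsat 1/2/3 naming: B4 is the green band
    "nir": (0.8, 0.95, 1.1),    # Landsat 4/5 naming: B4 is a near-infrared band
}


def mss_b4_wavelength(platform_number):
    """Return the B4 wavelength tuple for a given Landsat platform number."""
    if platform_number in (1, 2, 3):
        return MSS_B4_WAVELENGTH["green"]
    if platform_number in (4, 5):
        return MSS_B4_WAVELENGTH["nir"]
    raise ValueError(f"Landsat-{platform_number} did not carry MSS")
```

A test for both a Landsat-1 and a Landsat-4 product then only has to compare the wavelength of the loaded B4 dataset with the value returned for the corresponding platform.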

Should any other tests be added?

UPD: I also added an alternative implementation of the same tests that uses a single file and pytest.mark.parametrize.

Additions

  • Should I also add Collection-1 support? Collection-1 products are no longer available for download, so it is probably useless. But eoreader has C1 support for some reason.

codecov bot commented Jul 30, 2025

Codecov Report

❌ Patch coverage is 96.40288% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.28%. Comparing base (1b3c974) to head (16f316c).
⚠️ Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| satpy/readers/core/landsat.py | 91.25% | 30 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3181      +/-   ##
==========================================
- Coverage   96.32%   96.28%   -0.04%     
==========================================
  Files         465      465              
  Lines       58159    58666     +507     
==========================================
+ Hits        56020    56489     +469     
- Misses       2139     2177      +38     
| Flag | Coverage Δ |
| --- | --- |
| behaviourtests | 3.61% <0.00%> (-0.04%) ⬇️ |
| unittests | 96.37% <96.40%> (-0.04%) ⬇️ |

@simonreise changed the title from "Added more Landsat readers" to "Add more Landsat readers" on Jul 30, 2025
@coveralls

coveralls commented Jul 30, 2025

Pull Request Test Coverage Report for Build 16914740677

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 806 of 836 (96.41%) changed or added relevant lines in 2 files are covered.
  • 10 unchanged lines in 5 files lost coverage.
  • Overall coverage decreased (-0.03%) to 96.388%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| satpy/readers/core/landsat.py | 314 | 344 | 91.28% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| satpy/enhancements/__init__.py | 1 | 90.0% |
| satpy/tests/enhancement_tests/test_enhancements.py | 1 | 98.57% |
| satpy/tests/utils.py | 2 | 92.98% |
| satpy/tests/reader_tests/gms/test_gms5_vissr_l1b.py | 3 | 98.67% |
| satpy/tests/reader_tests/gms/test_gms5_vissr_navigation.py | 3 | 97.18% |

Totals Coverage Status
Change from base Build 16824161845: -0.03%
Covered Lines: 56816
Relevant Lines: 58945

💛 - Coveralls

@simonrp84
Member

I don't have time for a full review of this until some time in September, as I'm on annual leave and will then be catching up on work for some time thereafter.

But one quick comment, the file oli_tirs_l1_tif.py (as well as the associated tests) should be renamed, it's no longer a reader for OLI/TIRS only. Something like landsat_base.py might be better...

Member

@mraspaud mraspaud left a comment

That’s a big PR, but thanks for putting it together!
I haven’t checked all the details, but I was wondering if it was possible somehow to refactor the tests, as it seems there is quite some duplication. Maybe using "pytest.mark.parametrize" could help? But as I said, I haven’t checked the details, so maybe my suggestion doesn’t make sense...

@simonreise
Contributor Author

simonreise commented Aug 5, 2025

I tried to merge all the tests into a single file test_landsat.py.

I used pytest.mark.parametrize, just as you suggested, and it really helped to get rid of duplicate code, but the final test file is very large and complicated, and the parametrize blocks before the functions are sometimes larger than the functions themselves because of the large number of parameters.

So I decided to keep the original test files for now and delete them later, because I am not sure whether a single test file or separate test files for each sensor is preferable in this case.

Let's discuss which test implementation is more suitable here! Also, the tests could probably be optimized further (maybe by getting rid of similar fixtures), or maybe more tests should be added (e.g. to test bands that appear only in specific products, like the GM bands for ETM+).

Member

@djhoese djhoese left a comment

Wow, what a pull request. Nice work. I only made it through the landsat_base.py module. I had a possible simplification suggestion:

Could you move some of the calibration coefficient paths and saturation paths to either the file_type definition in the YAML or to the individual dataset definitions in the YAML? It could even be a format string with a placeholder for {band} and/or other information. If it needs to be per-variable, maybe you could add it as part of the available_datasets method? Overall this is not a big deal, but I thought it could easily cut out a large chunk of the Python code and make it more obvious that the only difference between the classes/types is where the metadata and factors are stored.
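
As a rough illustration of the format-string idea, the sketch below keeps the metadata key templates in a plain dict (standing in for what could live in the YAML) and fills in the {band} placeholder at read time. The CAL_KEYS layout, the read_coefficients helper and the tiny XML document with its toy values are all hypothetical; only the section and key naming follows the Landsat metadata keys quoted elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration, as it might be declared per file type in YAML.
CAL_KEYS = {
    "section": "LEVEL1_RADIOMETRIC_RESCALING",
    "mult": "RADIANCE_MULT_BAND_{band}",
    "add": "RADIANCE_ADD_BAND_{band}",
}

# Minimal stand-in for a metadata file; the coefficients are toy values.
METADATA = """
<LANDSAT_METADATA_FILE>
  <LEVEL1_RADIOMETRIC_RESCALING>
    <RADIANCE_MULT_BAND_4>0.5</RADIANCE_MULT_BAND_4>
    <RADIANCE_ADD_BAND_4>-2.0</RADIANCE_ADD_BAND_4>
  </LEVEL1_RADIOMETRIC_RESCALING>
</LANDSAT_METADATA_FILE>
"""


def read_coefficients(root, keys, band):
    """Resolve the templated keys for one band and return (mult, add)."""
    section = root.find(f".//{keys['section']}")
    mult = float(section.find(keys["mult"].format(band=band)).text)
    add = float(section.find(keys["add"].format(band=band)).text)
    return mult, add


root = ET.fromstring(METADATA)
```

With this layout, supporting another product level would mostly mean declaring a different set of templates rather than writing another Python class.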

@djhoese added the enhancement (code enhancements, features, improvements) and component:readers labels on Aug 6, 2025
@simonreise
Contributor Author

I refactored MDReaders. Now band_calibration and band_saturation automatically read all the available parameters, so I completely got rid of product-level MDReader classes, leaving only L1 and L2 classes.

We could probably also keep only one MDReader class by letting band_calibration use self.process_level, but it won't simplify the code, because we will have to keep all the coef keys anyway:

Now:

class LandsatL1MDReader(BaseLandsatMDReader):
    """Metadata file handler for Landsat L1 files (tif)."""

    @property
    def band_calibration(self):
        """Return per-band calibration parameters."""
        radcal = self.get_cal_params("LEVEL1_RADIOMETRIC_RESCALING", "RADIANCE_MULT", "RADIANCE_ADD")
        viscal = self.get_cal_params("LEVEL1_RADIOMETRIC_RESCALING", "REFLECTANCE_MULT", "REFLECTANCE_ADD")
        tircal = self.get_cal_params("LEVEL1_THERMAL_CONSTANTS", "K1_CONSTANT", "K2_CONSTANT")
        topcal = viscal | tircal
        return {key: tuple([*radcal[key], *topcal[key]]) for key in radcal}


class LandsatL2MDReader(BaseLandsatMDReader):
    """Metadata file handler for Landsat L2 files (tif)."""

    @property
    def band_calibration(self):
        """Return per-band calibration parameters."""
        viscal = self.get_cal_params("LEVEL2_SURFACE_REFLECTANCE_PARAMETERS", "REFLECTANCE_MULT", "REFLECTANCE_ADD")
        tircal = self.get_cal_params("LEVEL2_SURFACE_TEMPERATURE_PARAMETERS", "TEMPERATURE_MULT", "TEMPERATURE_ADD")
        return viscal | tircal

If merged to one class:

@property
def band_calibration(self):
    """Return per-band calibration parameters."""
    if "1" in self.process_level:
        radcal = self.get_cal_params("LEVEL1_RADIOMETRIC_RESCALING", "RADIANCE_MULT", "RADIANCE_ADD")
        viscal = self.get_cal_params("LEVEL1_RADIOMETRIC_RESCALING", "REFLECTANCE_MULT", "REFLECTANCE_ADD")
        tircal = self.get_cal_params("LEVEL1_THERMAL_CONSTANTS", "K1_CONSTANT", "K2_CONSTANT")
        topcal = viscal | tircal
        return {key: tuple([*radcal[key], *topcal[key]]) for key in radcal}
    elif "2" in self.process_level:
        viscal = self.get_cal_params("LEVEL2_SURFACE_REFLECTANCE_PARAMETERS", "REFLECTANCE_MULT", "REFLECTANCE_ADD")
        tircal = self.get_cal_params("LEVEL2_SURFACE_TEMPERATURE_PARAMETERS", "TEMPERATURE_MULT", "TEMPERATURE_ADD")
        return viscal | tircal
    raise ValueError(f"Unsupported process level: {self.process_level}")
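
To make the shape of the band_calibration result easier to picture, here is a toy example of the merging step on its own. It assumes that get_cal_params returns a dict that maps each band name to a pair of coefficients; that return shape and all the numbers are assumptions for illustration only.

```python
# Toy stand-ins for get_cal_params results: {band: (first_coef, second_coef)}.
radcal = {"B4": (0.01, -0.1), "B10": (0.0003, 0.1)}
viscal = {"B4": (2e-05, -0.1)}       # reflectance coefficients, spectral bands
tircal = {"B10": (774.9, 1321.1)}    # thermal constants, thermal bands

# Dict union (Python 3.9+): every band gets either its visible or thermal pair.
topcal = viscal | tircal

# L1 case: radiance pair followed by the reflectance or thermal pair.
l1_params = {key: (*radcal[key], *topcal[key]) for key in radcal}
```

Under these assumptions each L1 band ends up with a four-element tuple, while the L2 case is just the union of the two dicts with a two-element tuple per band.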

@simonreise
Contributor Author

Improved the build_area_def function: it now creates the area definition using an EPSG code instead of a WKT string, and it can now handle Arctic and Antarctic scenes, which have specific projections. I also added a test for an Antarctic scene.

area_extent = (ext_p1, ext_p2, ext_p3, ext_p4)

# Return the area definition
return AreaDefinition(f"EPSG: {proj_code}", pcs_id, pcs_id, proj_code, x_size, y_size, area_extent)
Member

@mraspaud @pnuu and anyone else who might care, what do you think about the name of the area being the EPSG code? I asked to make it more descriptive, but now using the EPSG code in this way seems wrong. I'm not sure what a better name would be and I don't know enough about Landsat to suggest something else.

Member

I have all my production areas named like epsg_3035_1km, so in principle I'm fine with that.

Contributor Author

@simonreise simonreise Aug 7, 2025

The problem with the "UTM{utm_zone}" name is that Arctic and Antarctic scenes do not have a UTM zone, so I decided to use the EPSG code instead. I used it only because that was the first idea I came up with, and if there is a more appropriate area definition name, we should probably use it.

If f"EPSG: {proj_code}" is bad mostly because it is not filename friendly, I can just replace it with f"epsg_{proj_code}"

If the "UTM{utm_zone}" approach is preferable, I can do something like this:

# Read the UTM zone, or pick a specific CRS for Arctic and Antarctic scenes
utm_zone_elem = self.root.find(".//PROJECTION_ATTRIBUTES/UTM_ZONE")
if utm_zone_elem is not None:
    utm_zone = utm_zone_elem.text
    pcs_id = f"{datum} / UTM zone {utm_zone}N"
    proj_code = f"EPSG:326{utm_zone.zfill(2)}"
    name = f"UTM{utm_zone}"
else:
    lat_ts = self.root.find(".//PROJECTION_ATTRIBUTES/TRUE_SCALE_LAT").text
    if lat_ts == "-71.00000":
        # Antarctic
        proj_code = "EPSG:3031"
    elif lat_ts == "71.00000":
        # Arctic
        proj_code = "EPSG:3995"
    else:
        raise ValueError(f"Unexpected TRUE_SCALE_LAT: {lat_ts}")
    pcs_id = f"{datum} / EPSG: {proj_code[5:]}"
    name = f"EPSG_{proj_code[5:]}"
return AreaDefinition(name, ...)

Btw, I adapted the code for the Arctic and Antarctic scenes from here
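
For reference, the mapping from the projection metadata to an EPSG code and a filename-friendly area name can also be written as a small standalone function. This is only a sketch: the function name and its arguments are hypothetical, and the lowercase epsg_ prefix follows the naming suggested earlier in this thread rather than anything that was merged.

```python
def landsat_projection(utm_zone=None, true_scale_lat=None):
    """Return (EPSG code, area name) for a UTM or polar stereographic scene."""
    if utm_zone is not None:
        zone = int(utm_zone)
        # WGS 84 / UTM north zones are EPSG:32601 to EPSG:32660.
        return f"EPSG:326{zone:02d}", f"UTM{zone}"

    # Polar scenes: the latitude of true scale selects the projection.
    polar_codes = {-71.0: "EPSG:3031", 71.0: "EPSG:3995"}
    try:
        code = polar_codes[float(true_scale_lat)]
    except (KeyError, TypeError, ValueError):
        raise ValueError(f"Unexpected true scale latitude: {true_scale_lat!r}") from None
    return code, "epsg_" + code.split(":")[1]
```

Called with utm_zone="5" this gives ("EPSG:32605", "UTM5"), and with true_scale_lat="-71.00000" it gives ("EPSG:3031", "epsg_3031"); any other latitude raises instead of leaving the code undefined.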

Member

eh let's just leave it the way it is. If it ends up mattering in the future I'll make an issue or a PR.

@djhoese
Member

djhoese commented Aug 8, 2025

Ok we're getting really close to a merge now. The biggest issue I have now is how much CodeScene (the check on this PR) dislikes this implementation. It makes good points about there being a lot of data declarations and there being a lot of function arguments. On the other hand these are tests and you're parametrizing them so you can kind of expect a lot of arguments.

I thought about refactoring the tests for you but ran out of time today. I had the thought that maybe it would be nice to have a base class (ex. LandsatTestBase) that defines these test methods and then the subclasses could have the reader and other reader-specific stuff set in class attributes. However, I'm realizing now that doesn't help you with the parametrization that you'd need to do for each reader. 🤔

@mraspaud @pnuu how do you feel about just leaving CodeScene mad and merging this implementation with a single test file?

Other consideration: Move the metadata definitions in the tests to text files in the tests directory that are loaded when they are needed (a fixture if it makes sense).

@pnuu
Member

pnuu commented Aug 8, 2025

I'll see if I have some time tomorrow to have a look at the complaints. I think at least the ones in landsat.py would be nice to clean up before merging. At a quick glance, adding a few helper methods/functions would solve them quite easily.

@pnuu
Member

pnuu commented Aug 9, 2025

Pushed the first round of refactorings, let's see what CodeScene thinks about them.

@pnuu
Member

pnuu commented Aug 9, 2025

Ok, landsat.py should now be ok. There's not much that we can do about the number of arguments in __init__() at the moment.

@pnuu
Member

pnuu commented Aug 11, 2025

With 31a6f23 the metadata are moved from the test module to separate text files.

@pnuu
Member

pnuu commented Aug 11, 2025

Complex test conditionals refactored in 63bbc0a

@pnuu
Member

pnuu commented Aug 11, 2025

@mraspaud asked me to rename the metadata files as .xml, done in 3ea946c

Member

@djhoese djhoese left a comment

Nice cleanup @pnuu. I think I'm fine with this being merged as-is, but I also posted a possible further refactoring of the tests on slack that may or may not be worth it. I'll copy that message here:

A possible restructuring of the tests could be:

1. Per-reader test classes with one (or more) base classes.
2. The base class defines the test methods, but without the test_ prefix.
3. The subclasses define class attributes for the reader and the classes to use.
4. The subclasses define the test methods and parametrize them as needed, but call the base class's method that does all the actual work. The base class uses the class attributes (self.X), and the parametrized values are passed in as arguments.

This reduces the number of arguments per test and separates the tests by reader/instrument. These classes could then be split into per-reader modules if that makes sense.

@djhoese
Member

djhoese commented Aug 12, 2025

Ok I've reduced the number of arguments by quite a bit, but they are still over the threshold for CodeScene. The main changes are:

  1. Call get_filename_info in the parametrize arguments (4 arguments -> 1)
  2. Use lazy fixtures instead of request. We already depend on this pytest plugin, which means we can drop the usage of request. This reduces the number of arguments by 1. FYI @simonreise, this might be a nice thing to keep in mind for your other projects that use pytest.

I could combine the filename fixtures with the filename_info since the two are typically together but it wouldn't reduce the number of arguments enough for most of the test methods. The next option would be to do my subclassing idea, but I'm not sure I want to put that much work into this.
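
The first change can be pictured with a small sketch in plain Python. The make_filename_info helper below is a hypothetical stand-in (the real get_filename_info signature is not shown in this thread), and the dates are made up; the point is only that each case carries one prebuilt dict instead of several loose values.

```python
from datetime import datetime


def make_filename_info(platform, level, observed, processed):
    """Hypothetical helper: bundle filename-derived values into one dict."""
    return {
        "platform": platform,
        "process_level": level,
        "observation_date": observed,
        "file_date": processed,
    }


# One (reader, info) pair per case instead of five separate values. With
# pytest, these tuples would be the values handed to pytest.mark.parametrize.
CASES = [
    ("oli_tirs_l1_tif",
     make_filename_info("08", "L1TP", datetime(2024, 5, 2), datetime(2024, 5, 13))),
    ("mss_l1_tif",
     make_filename_info("01", "L1GS", datetime(1978, 1, 1), datetime(2020, 1, 1))),
]


def describe_case(reader, info):
    """Toy test body that only needs two arguments."""
    return f"{reader}: Landsat-{int(info['platform'])} {info['process_level']}"
```

The test body then takes two arguments regardless of how many values the filename info contains, which is what brings the argument count down.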

@simonreise
Contributor Author

simonreise commented Sep 1, 2025

I thought about @djhoese's suggestion about per-reader test classes. Maybe in this case we could get rid of parametrization altogether, since in the current implementation every parameter set usually corresponds to a single reader? We could probably do something like this:

import pytest
from satpy import Scene


class BaseLandsatTest:
    def _basicload(self, remote):
        """Test loading a spectral band and, if available, a thermal band."""
        if remote:
            all_files = convert_to_fsfile(self.all_files)
        else:
            all_files = self.all_files

        scn = Scene(reader=self.reader, filenames=all_files)
        if self.thermal_name is not None:
            scn.load([self.spectral_name, self.thermal_name])
        else:
            scn.load([self.spectral_name])

        self._check_basicload_thermal(scn)
        self._check_basicload_mss_l1_tif(scn)

    ...

    def _ch_startend(self):
        """Test correct retrieval of start/end times."""
        scn = Scene(reader=self.reader, filenames=[self.spectral_file, self.mda_file])
        bnds = scn.available_dataset_names()
        assert bnds == [self.spectral_name]

        scn.load(["B4"])
        assert scn.start_time == self.date_time
        assert scn.end_time == self.date_time


class TestLandsatOLITIRSL1(BaseLandsatTest):
    reader = "oli_tirs_l1_tif"
    spectral_name = "B4"
    thermal_name = "B11"
    all_files = lf("oli_tirs_l1_all_files")
    ...

    @pytest.mark.parametrize("remote", [True, False])
    def test_basicload(self, remote):
        self._basicload(remote)

    def test_ch_startend(self):
        self._ch_startend()

We could use self. everywhere we can, so that the number of parameters is reduced to a minimum. We could also use a nested class structure, just like in the Landsat reader classes themselves, which would help us get rid of complex if-else blocks.

Should I implement it this way?

@djhoese
Member

djhoese commented Sep 1, 2025

@simonreise should we maybe merge this and then do a separate refactor PR? I'm actually not sure why we haven't merged this already.

@simonreise
Contributor Author

simonreise commented Sep 1, 2025

Shouldn't @simonrp84 also review this? In one of the first messages in the thread he said he would review the PR in September, after his summer vacation.

Labels
component:readers, enhancement (code enhancements, features, improvements)
7 participants