Add more Landsat readers #3181

Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #3181      +/-   ##
==========================================
- Coverage   96.32%   96.28%   -0.04%
==========================================
  Files         465      465
  Lines       58159    58666     +507
==========================================
+ Hits        56020    56489     +469
- Misses       2139     2177      +38
```
|
Pull Request Test Coverage Report for Build 16914740677

Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

💛 - Coveralls
I don't have time for a full review of this until some time in September, as I'm on annual leave and will then be catching up on work for some time thereafter. But one quick comment: the file …
That’s a big PR, but thanks for putting it together!
I haven’t checked all the details, but I was wondering whether it would be possible to refactor the tests somehow, as there seems to be quite a lot of duplication. Maybe using `pytest.mark.parametrize` could help? But as I said, I haven’t checked the details, so maybe my suggestion doesn’t make sense...
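As an illustration of the parametrize idea: the reader names below are taken from this PR, but the test body is a stand-in to keep the sketch self-contained, not Satpy's actual test code.

```python
import pytest

# Stand-in (reader, band) cases; one parametrized test body replaces
# several near-identical per-sensor tests.
CASES = [
    ("oli_tirs_l1_tif", "B4"),
    ("etm_l1_tif", "B4"),
    ("mss_l1_tif", "B4"),
]

@pytest.mark.parametrize(("reader", "band"), CASES)
def test_band_naming(reader, band):
    # The real test would build a Scene and load `band`; here we only
    # check the naming convention so the sketch runs on its own.
    assert reader.endswith("_tif")
    assert band.startswith("B")
```

Each tuple in `CASES` becomes its own test item, so a failure report names the exact sensor/band combination that broke.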
I tried to merge all the tests into a single file. I used … So I decided to keep the original test files for now and delete them later, because I am not sure whether one single test file or separate test files for each sensor is preferable in this case. Let's discuss which test implementation will be more suitable here! Also, the tests could probably be optimized further (maybe by somehow getting rid of similar fixtures), or maybe more tests should be added (e.g. to test bands that appear only in specific products, like GM bands for ETM+).
Wow, what a pull request. Nice work. I only made it through the `landsat_base.py` module. I had a possible simplification suggestion:
Could you move some of the calibration coefficient paths and saturation paths to either the `file_type` definition in the YAML or to the individual dataset definitions in the YAML? It could even be a format string with a placeholder for `{band}` and/or other information. If it needs to be per-variable, maybe you could add it as part of the `available_datasets` method? Overall this is not a big deal, but I thought it could easily cut out a large chunk of the Python code and make it more obvious that the only difference between the classes/types is where the metadata and factors are stored.
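A minimal sketch of the format-string idea, assuming a hypothetical `cal_mult_path` key in the YAML `file_type` definition; the key name and the path template are illustrative, not the actual Satpy schema.

```python
def resolve_cal_path(filetype_info, band):
    """Expand a per-band calibration path template taken from the YAML."""
    return filetype_info["cal_mult_path"].format(band=band)

# Hypothetical YAML-derived file_type info:
info = {"cal_mult_path": "LEVEL1_RADIOMETRIC_RESCALING/RADIANCE_MULT_BAND_{band}"}
path = resolve_cal_path(info, 4)
# path == "LEVEL1_RADIOMETRIC_RESCALING/RADIANCE_MULT_BAND_4"
```

With this, per-level reader classes would only need different templates in their YAML, not different Python lookup code.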
I refactored … We could probably also keep only one metadata reader class.
Now:

```python
class LandsatL1MDReader(BaseLandsatMDReader):
    """Metadata file handler for Landsat L1 files (tif)."""

    @property
    def band_calibration(self):
        """Return per-band calibration parameters."""
        radcal = self.get_cal_params("LEVEL1_RADIOMETRIC_RESCALING", "RADIANCE_MULT", "RADIANCE_ADD")
        viscal = self.get_cal_params("LEVEL1_RADIOMETRIC_RESCALING", "REFLECTANCE_MULT", "REFLECTANCE_ADD")
        tircal = self.get_cal_params("LEVEL1_THERMAL_CONSTANTS", "K1_CONSTANT", "K2_CONSTANT")
        topcal = viscal | tircal
        return {key: tuple([*radcal[key], *topcal[key]]) for key in radcal}


class LandsatL2MDReader(BaseLandsatMDReader):
    """Metadata file handler for Landsat L2 files (tif)."""

    @property
    def band_calibration(self):
        """Return per-band calibration parameters."""
        viscal = self.get_cal_params("LEVEL2_SURFACE_REFLECTANCE_PARAMETERS", "REFLECTANCE_MULT", "REFLECTANCE_ADD")
        tircal = self.get_cal_params("LEVEL2_SURFACE_TEMPERATURE_PARAMETERS", "TEMPERATURE_MULT", "TEMPERATURE_ADD")
        return viscal | tircal
```

If merged into one class:

```python
@property
def band_calibration(self):
    """Return per-band calibration parameters."""
    if "1" in self.process_level:
        radcal = self.get_cal_params("LEVEL1_RADIOMETRIC_RESCALING", "RADIANCE_MULT", "RADIANCE_ADD")
        viscal = self.get_cal_params("LEVEL1_RADIOMETRIC_RESCALING", "REFLECTANCE_MULT", "REFLECTANCE_ADD")
        tircal = self.get_cal_params("LEVEL1_THERMAL_CONSTANTS", "K1_CONSTANT", "K2_CONSTANT")
        topcal = viscal | tircal
        return {key: tuple([*radcal[key], *topcal[key]]) for key in radcal}
    elif "2" in self.process_level:
        viscal = self.get_cal_params("LEVEL2_SURFACE_REFLECTANCE_PARAMETERS", "REFLECTANCE_MULT", "REFLECTANCE_ADD")
        tircal = self.get_cal_params("LEVEL2_SURFACE_TEMPERATURE_PARAMETERS", "TEMPERATURE_MULT", "TEMPERATURE_ADD")
        return viscal | tircal
    raise ValueError(f"Unknown processing level: {self.process_level}")
```
Improved …
satpy/readers/core/landsat.py (Outdated)

```python
        area_extent = (ext_p1, ext_p2, ext_p3, ext_p4)

        # Return the area extent
        return AreaDefinition(f"EPSG: {proj_code}", pcs_id, pcs_id, proj_code, x_size, y_size, area_extent)
```
@mraspaud @pnuu and anyone else who might care: what do you think about the name of the area being the EPSG code? I asked for it to be made more descriptive, but now using the EPSG code in this way seems wrong. I'm not sure what a better name would be, and I don't know enough about Landsat to suggest something else.
I have all my production areas named like `epsg_3035_1km`, so in principle I'm fine with that.
The problem with the "UTM{utm_zone}" name is that Arctic and Antarctic scenes do not have a UTM zone, so I decided to use the EPSG code instead. I used it only because that was the first idea I came up with, and if there is a more appropriate area definition name, we probably should use it.
If f"EPSG: {proj_code}" is bad mostly because it is not filename-friendly, I can just replace it with f"epsg_{proj_code}".
If the "UTM{utm_zone}" approach is preferable, I can do something like this:
```python
# Read the UTM zone, or use a specific CRS for Arctic and Antarctic scenes
utm_zone_element = self.root.find(".//PROJECTION_ATTRIBUTES/UTM_ZONE")
if utm_zone_element is not None:
    utm_zone = utm_zone_element.text
    pcs_id = f"{datum} / UTM zone {utm_zone}N"
    proj_code = f"EPSG:326{utm_zone.zfill(2)}"
    name = f"UTM{utm_zone}"
else:
    lat_ts = self.root.find(".//PROJECTION_ATTRIBUTES/TRUE_SCALE_LAT").text
    if lat_ts == "-71.00000":
        # Antarctic
        proj_code = "EPSG:3031"
    elif lat_ts == "71.00000":
        # Arctic
        proj_code = "EPSG:3995"
    pcs_id = f"{datum} / EPSG: {proj_code[5:]}N"
    name = f"EPSG_{proj_code[5:]}"
return AreaDefinition(name, ...)
```
Btw, I adapted the code for the Arctic and Antarctic scenes from here
eh let's just leave it the way it is. If it ends up mattering in the future I'll make an issue or a PR.
Ok, we're getting really close to a merge now. The biggest issue I have now is how much CodeScene (the check on this PR) dislikes this implementation. It makes good points about there being a lot of data declarations and a lot of function arguments. On the other hand, these are tests and you're parametrizing them, so you can kind of expect a lot of arguments. I thought about refactoring the tests for you but ran out of time today. I had the thought that maybe it would be nice to have a base class (ex. …).
@mraspaud @pnuu how do you feel about just leaving CodeScene mad and merging this implementation with a single test file?
Other consideration: move the metadata definitions in the tests to text files in the tests directory that are loaded when they are needed (a fixture if it makes sense).
I'll see if I have some time tomorrow to have a look at the complaints. I think at least the ones in …
Pushed the first round of refactorings, let's see what CodeScene thinks about them.
Ok, …
With 31a6f23 the metadata are moved from the test module to separate text files.
Complex test conditionals refactored in 63bbc0a |
Nice cleanup @pnuu. I think I'm fine with this being merged as-is, but I also posted a possible further refactoring of the tests on Slack that may or may not be worth it. I'll copy that message here:
A possible restructuring of the tests could be:
1. Per-reader test classes with one (or more) base classes.
2. The base class defines the test methods but without the test_ prefix.
3. The subclasses define class attributes for the reader and classes to use.
4. The subclasses define the test methods and parametrize as needed, but call the base class's method that does all the actual work. The base class uses the class attributes (self.X), and the parametrized values are passed in as arguments.
This reduces the number of arguments per test and separates things based on reader/instrument. These classes could then be split into per-reader modules if it makes sense.
Ok, I've reduced the number of arguments by quite a bit, but they are still over the threshold for CodeScene. The main changes are: …
I could combine the filename fixtures with the filename_info, since the two are typically together, but it wouldn't reduce the number of arguments enough for most of the test methods. The next option would be to do my subclassing idea, but I'm not sure I want to put that much work into this.
I thought about @djhoese's suggestion about per-reader test classes, and maybe in this case we could get rid of parametrization entirely, as in the current implementation every parameter set usually corresponds to a single reader. So we could probably do something like this:

```python
class BaseLandsatTest:
    def _basicload(self, remote):
        if remote:
            all_files = convert_to_fsfile(self.all_files)
        else:
            all_files = self.all_files
        scn = Scene(reader=self.reader, filenames=all_files)
        if self.thermal_name is not None:
            scn.load([self.spectral_name, self.thermal_name])
        else:
            scn.load([self.spectral_name])
        self._check_basicload_thermal(scn)
        self._check_basicload_mss_l1_tif(scn)
        ...

    def _ch_startend(self):
        """Test correct retrieval of start/end times."""
        scn = Scene(reader=self.reader, filenames=[self.spectral_file, self.mda_file])
        bnds = scn.available_dataset_names()
        assert bnds == [self.spectral_name]
        scn.load(["B4"])
        assert scn.start_time == self.date_time
        assert scn.end_time == self.date_time


class TestLandsatOLITIRSL1(BaseLandsatTest):
    reader = "oli_tirs_l1_tif"
    spectral_name = "B4"
    thermal_name = "B11"
    all_files = lf("oli_tirs_l1_all_files")
    ...

    @pytest.mark.parametrize("remote", [True, False])
    def test_basicload(self, remote):
        self._basicload(remote)

    def test_ch_startend(self):
        self._ch_startend()
```

We could use … Should I implement it this way?
@simonreise should we maybe merge this and then do a separate refactor PR? I'm actually not sure why we haven't merged this already. |
Shouldn't @simonrp84 also review this? He said he would review the PR in September, after his summer vacation, in one of the first messages in the thread.
Landsat readers
Overview
This PR adds readers for Landsat OLI-TIRS L2, ETM+ L1-2, TM L1-2 and MSS products.
All these readers are mostly based on the existing Landsat OLI-TIRS L1 reader.
Readers
Landsat reader classes are hierarchically organized into a nested structure of classes:

- The `BaseLandsatReader` class contains the `__init__` and `get_dataset` functions. Their logic is mostly copied from the original reader, with a few additions and bug fixes.
- The Landsat reader classes for L1 and L2 contain calibration logic, which is the same for every sensor but differs between L1 and L2 products. The L1 logic is copied from the existing reader.
  Note: I moved the * 0.001 scaling logic for the `TRAD`, `URAD`, `CDIST` etc. bands to calibration. Maybe it belongs in the `get_dataset` function, just like the `ang` bands' * 0.01 scaling in the L1 reader? Or should I move the `ang` bands' * 0.01 scaling to calibrations as well?
- `MSSCHReader` also contains the `available_datasets` function that dynamically sets the B4 wavelength (see the YAMLs section for more info).

YAMLs
I created YAML files for every reader.
They are all based on the OLI-TIRS L1 YAML file.
What was updated or needs additional checking:

- Reader namings. Should the ETM+ reader be named "etm_lx_tif" or "etm_plus_lx_tif"?
- A `data_identification_keys` section was added to the head of the file to add custom calibration support to L2 products. Is that necessary? Should custom calibrations be implemented in any other way?
- `{collection_category}` was limited to `{collection_category:2s}`, because otherwise the search for L2 products and Landsat-7 (which also contains GM bands) was faulty. AFAIK the collection category can only contain 2 symbols, but if not, this can cause errors.
- QA band namings: see Add new QA band filename support to Landsat reader #3176.
- The `calibration` `standard_names` and `units` should probably be double-checked.
- `wavelength` was added or updated according to the official Landsat Data Format Control Books, but should probably also be double-checked.
- Again, custom calibrations in the band definitions in the `datasets` section. Are they added the right way?
- MSS bands. In Landsat 1/2/3 MSS products the bands are named B4, B5, B6, B7, while in Landsat 4/5 MSS products the same bands are named B1, B2, B3, B4. The MSS reader currently just reads the bands and their names as-is. The issue with the `B4` band being green in Landsat 1/2/3 and NIR in Landsat 4/5 is solved by changing the `wavelength` in the `available_datasets` function in the reader. But is that the right way to handle such issues? Maybe there is a better way to solve the problem? Or maybe separate readers for Landsat 1/2/3 and Landsat 4/5 MSS products should be added?
- Also, some docs say that Landsat-3 had a thermal band B8, but it failed just after launch. I am not sure if any products with B8 actually exist and whether it should be added to the band list.
Tests

I added separate test files for every reader. They are mostly just adapted versions of the OLI-TIRS L1 test file. The differences are:

- L2 product test files test the `TRAD` file instead of the `sza` file.
- Products other than OLI-TIRS do not have the `test_loading_badchan` test, because they do not have products with only spectral or only thermal bands, so they cannot run into such errors.
- The MSS test covers both Landsat-1 and Landsat-4 products and checks that the B4 wavelength is set correctly.
- Other small differences.

Maybe some other tests should be added?

UPD: I also added an alternative implementation of the same tests that uses one file and `pytest.mark.parametrize`.
Additions

- `eoreader` has C1 support for some reason