[POC] classif step 1 : Create a base class for classification layers #692

Open
adebardo opened this issue Mar 3, 2025 · 1 comment · May be fixed by GlacioHack/geoutils#668
Labels
[POC] Conception To review Tickets needs approval about it conception

Comments

@adebardo
Contributor

adebardo commented Mar 3, 2025

Context

To kick off the implementation of classification layers in an object-oriented way, the first step is to define a base ClassificationLayer class. This class will act as the foundation for all specific classification types (e.g., SegmentationClassificationLayer, SlopeClassificationLayer, FusionClassificationLayer). The purpose of the base class is to encapsulate the common functionality shared by all classification types while allowing flexibility for subclasses to implement their specific logic.

Code

In a new Python file, classification.py, in the xDEM source code, write the abstract ClassificationLayer class:

  1. Define Common Attributes: The base class should define attributes that will be common across all classification layers, such as:
    • dem: the DEM to which the classification will be applied.
    • name: name of the classification layer.
    • req_stats: list of required statistics to compute (optional, all statistics in geoutils.Raster computed by default).
    • req_stats_classes: list of the classes on which the statistics will be computed (optional, all classes by default).
    • class_names: dict connecting the class names to the class indexes (initially set to None).
    • classification: result of the classification, a geoutils.Mask object (initially set to None).
    • stats_dict: required statistics for the required classes, in a dict (initially set to None).
  2. Abstract Method: Since different classification layers will have their own classification logic (e.g., segmentation masks vs. slope ranges), the base class should declare abstract methods that the subclasses are required to implement, such as apply_classification(). This method will be used to compute the classification attribute, which will be a geoutils.Mask object, in which each band will represent one class mask.
  3. Statistics computation: A common feature across all classification layers is the ability to compute statistics on the classified pixels (e.g., mean, standard deviation). The base class should provide a get_stats() method that takes into account the two last attributes. For each required class, this method should use DEM.set_mask() to apply the classification mask and DEM.get_stats() to compute the required statistics. The output, stored under the stats attribute, should be a dict in which the first level represents the classes and the second the statistics.
  4. Saving results: Another common feature is the ability to save the results. The save() method should take an output_dir as input and save:
    • The classification object with the Mask.save() method, under name.tif;
    • The class_names attribute, which maps the name of each class to its index in a dict, under name_classes.json;
    • The stats attribute, under name_stats.json OR name_stats.csv.
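
The four points above could be sketched roughly as follows. This is only a minimal illustration, not the proposed xDEM implementation: plain NumPy arrays stand in for the DEM and for geoutils.Mask, the statistics are computed with NumPy instead of DEM.set_mask() and DEM.get_stats(), the mask is written as .npy instead of name.tif, and ThresholdClassificationLayer is a hypothetical subclass invented for the example.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path

import numpy as np


class ClassificationLayer(ABC):
    """Base class sketch; NumPy arrays stand in for the DEM and geoutils.Mask."""

    def __init__(self, dem, name, req_stats=None, req_stats_classes=None):
        self.dem = np.asarray(dem, dtype=float)  # stand-in for the DEM
        self.name = name
        self.req_stats = req_stats  # None means "all statistics"
        self.req_stats_classes = req_stats_classes  # None means "all classes"
        self.class_names = None  # dict: class name -> class index
        self.classification = None  # boolean array, one band per class
        self.stats = None  # dict: class name -> {statistic name: value}

    @abstractmethod
    def apply_classification(self):
        """Subclasses must fill self.classification and self.class_names."""

    def get_stats(self):
        if self.classification is None:
            raise ValueError("Call apply_classification() first.")
        # Assumption: a small fixed set of statistics replaces DEM.get_stats().
        available = {"mean": np.nanmean, "std": np.nanstd,
                     "min": np.nanmin, "max": np.nanmax}
        stat_names = self.req_stats or list(available)
        classes = self.req_stats_classes or list(self.class_names)
        self.stats = {}
        for cls in classes:
            band = self.classification[self.class_names[cls]]
            values = self.dem[band]  # pixels belonging to this class
            self.stats[cls] = {s: float(available[s](values)) for s in stat_names}
        return self.stats

    def save(self, output_dir):
        out = Path(output_dir)
        out.mkdir(parents=True, exist_ok=True)
        # The issue proposes Mask.save() to name.tif; .npy keeps this sketch dependency-free.
        np.save(out / f"{self.name}.npy", self.classification)
        (out / f"{self.name}_classes.json").write_text(json.dumps(self.class_names))
        (out / f"{self.name}_stats.json").write_text(json.dumps(self.stats))


class ThresholdClassificationLayer(ClassificationLayer):
    """Hypothetical subclass: 'low' and 'high' classes around a threshold."""

    def __init__(self, dem, name, threshold, **kwargs):
        super().__init__(dem, name, **kwargs)
        self.threshold = threshold

    def apply_classification(self):
        self.class_names = {"low": 0, "high": 1}
        self.classification = np.stack(
            [self.dem < self.threshold, self.dem >= self.threshold]
        )


layer = ThresholdClassificationLayer(
    [[1.0, 2.0], [3.0, 4.0]], "elev", threshold=2.5, req_stats=["mean"]
)
layer.apply_classification()
print(layer.get_stats())
```

Keeping apply_classification() abstract means the base class cannot be instantiated directly, while get_stats() and save() are inherited unchanged by every subclass.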

Documentation

We need to start a documentation page on this subject.

@adebardo adebardo added the [POC] Conception To review Tickets needs approval about it conception label Mar 3, 2025
@rhugonnet
Member

Great to have this overview, including #693 to #696! 🙂
I'm commenting for all 5 issues below.

Only a couple conceptual remarks at this stage:

  1. Storing the classif output: I would be in favor of relying on pd.DataFrame objects instead of dictionaries to report bins. They are made for this, by supporting interval indexing (e.g., continuous with open/close support such as [1, 2[, [2, 5[, for binning, or discrete for segmentation such as [1], [5]), and by being able to natively combine several bins through multiple indexing (https://pandas.pydata.org/docs/user_guide/advanced.html; can also use named columns to simplify). That would support both types of binning mentioned (discrete=segmentation and continuous=slope), and through multi-indexing there might not be a need for a specific "fusion" type? (if I understood the objectives there correctly).
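
To make the DataFrame idea concrete, here is a small sketch of what such a table could look like. The pixel values and the column names (elevation, slope, segment) are made up for illustration; only pd.cut, groupby and agg are actual pandas calls.

```python
import pandas as pd

# Hypothetical flattened per-pixel values, for illustration only.
pixels = pd.DataFrame(
    {
        "elevation": [100.0, 120.0, 200.0, 220.0, 300.0, 340.0],
        "slope": [1.0, 1.5, 3.0, 4.0, 2.5, 1.2],
        "segment": [1, 1, 5, 5, 5, 1],
    }
)

# Continuous binning with left-closed intervals, i.e. [1, 2[ and [2, 5[.
slope_bin = pd.cut(pixels["slope"], bins=[1, 2, 5], right=False).rename("slope_bin")

# Grouping by the slope interval and the discrete segment id gives a
# two-level MultiIndex: one row per (slope bin, segment) combination observed.
stats = pixels.groupby([slope_bin, "segment"], observed=True)["elevation"].agg(
    ["mean", "count"]
)
print(stats)
```

The interval level covers the continuous (slope) case and the integer level the discrete (segmentation) case, so the combined index plays the role a dedicated "fusion" type might otherwise have.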
  2. Performing the classif: Note that we have multiple-variable classification (= N-D binning) in xDEM already (code here: https://github.com/GlacioHack/xdem/blob/main/xdem/spatialstats.py#L77; example of 2-D application with a figure here: https://xdem.readthedocs.io/en/stable/uncertainty.html#heteroscedasticity). We chose to rely on SciPy's N-D binning (scipy.stats.binned_statistic_dd) at the time, but we could also switch to pandas' groupby(), which is now more modular than 5 years ago and available for rasters through Xarray as well (so soon in GeoUtils/xDEM through the accessor). With those functionalities already performing the binning, we might not need the ClassificationLayer classes? (I'm not sure I grasped the full objective of the classes! See my final note below 😉)
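
For reference, N-D binning of a statistic can be sketched with SciPy's scipy.stats.binned_statistic_dd. This is not a copy of xDEM's nd_binning(), and the variables (slope, curvature, dh) and bin edges below are hypothetical values chosen for the example.

```python
import numpy as np
from scipy.stats import binned_statistic_dd

# Hypothetical flattened per-pixel values, for illustration only.
slope = np.array([1.0, 1.5, 3.0, 4.0, 2.5, 1.2])
curvature = np.array([0.1, 0.8, 0.2, 0.9, 0.7, 0.3])
dh = np.array([0.5, 1.5, -1.0, 2.0, 4.0, 1.0])  # value summarised in each bin

# One column per binning variable, one row per pixel.
sample = np.column_stack([slope, curvature])

# Explicit bin edges for each dimension: 2 slope bins x 2 curvature bins.
edges = [np.array([1.0, 2.0, 5.0]), np.array([0.0, 0.5, 1.0])]

result = binned_statistic_dd(sample, dh, statistic="mean", bins=edges)
mean_per_bin = result.statistic  # shape (2, 2): rows = slope bins, columns = curvature bins
print(mean_per_bin)
```

The output is a dense array indexed by bin position, so the bin edges have to be carried alongside it; a DataFrame with an interval index keeps both in one object.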
  3. Implement in GeoUtils directly: As classification is also very useful for any raster, as for get_stats(), we could have the binning functionality directly in GeoUtils, for example Raster.nd_binning() or Raster.groupby() (to match the Xarray accessor to come) returning a pd.DataFrame. We had actually planned to move xDEM's nd_binning() (https://github.com/GlacioHack/xdem/blob/main/xdem/spatialstats.py#L77) to geoutils/stats/, see details here: Re-structure spatialstats.py #378.

To understand the implementation better, I think what I'm missing is an explanation of the needs and their link to the class structure 😄 : Do we need to save specific spatial metadata/rasters from the bins that we can't with nd_binning or groupby()? Or other?


2 participants