MultiClean is a Python library for morphological cleaning of multiclass 2D numpy arrays (segmentation masks and classification rasters). It provides efficient tools for edge smoothing and small-island removal across multiple classes, then fills gaps using the nearest valid class.
Below: Land Use before/after cleaning (smoothed edges, small-island removal, nearest-class gap fill).
pip install multicleanor
uv add multicleanimport numpy as np
from multiclean import clean_array
# Create a sample classification array with classes 0, 1, 2, 3
array = np.random.randint(0, 4, (1000, 1000), dtype=np.int32)
# Clean with default parameters
cleaned = clean_array(array)
# Custom parameters
cleaned = clean_array(
array,
class_values=[0, 1, 2, 3],
smooth_edge_size=2, # kernel width, larger value increases smoothness
min_island_size=100, # remove components with area < 100
connectivity=8, # 4 or 8
max_workers=4,
fill_nan=False # enable/disable the filling of nan values in input array
)MultiClean is designed for cleaning segmentation outputs from:
- Remote sensing: Land cover classification, crop mapping
- Computer vision: Semantic segmentation post-processing
- Geospatial analysis: Raster classification cleaning
- Machine learning: Neural network output refinement
- Multi-class processing: Clean all classes in one pass
- Edge smoothing: Morphological opening to reduce jagged boundaries
- Island removal: Remove small connected components per class
- Gap filling: Fill invalids via nearest valid class (distance transform)
- Fast: NumPy + OpenCV + SciPy with parallelism
MultiClean uses morphological operations to clean classification arrays:
- Edge smoothing (per class): Morphological opening with an elliptical kernel.
- Island removal (per class): Find connected components (OpenCV) and mark components with area
< min_island_sizeas invalid. - Gap filling: Compute a distance transform to copy the nearest valid class into invalid pixels.
Classes are processed together and the result maintains a valid label at every pixel.
from multiclean import clean_array
out = clean_array(
array: np.ndarray,
class_values: int | list[int] | None = None,
smooth_edge_size: int = 2,
min_island_size: int = 100,
connectivity: int = 4,
max_workers: int | None = None,
fill_nan: bool = False
)array: 2D numpy array of class labels (int or float). For float arrays,NaNis treated as nodata and will remainNaNunlessfill_nanis set toTrue.class_values: Classes to consider. IfNone, inferred fromarray(ignoresNaNfor floats). An int restricts cleaning to a single class.smooth_edge_size: Kernel size (pixels) for morphological opening. Use0to disable.min_island_size: Remove components with area strictly< min_island_size. Use1to keep single pixels.connectivity: Pixel connectivity for components,4or8.max_workers: Parallelism for per-class operations (None lets the executor choose).fill_nan: If True will fill NAN values from input array with nearest valid value.
Returns a numpy array matching the input shape. Integer inputs return integer outputs. Float arrays with NaN are supported and can be filled or remain as NAN.
from multiclean import clean_array
import rasterio
# Read land cover classification
with rasterio.open('landcover.tif') as src:
landcover = src.read(1)
# Clean with appropriate parameters for satellite data
cleaned = clean_array(
landcover,
class_values=[0, 1, 2, 3, 4], # forest, water, urban, crop, other
smooth_edge_size=1,
min_island_size=25,
connectivity=8,
fill_nan=False
)from multiclean import clean_array
# Model produces logits; convert to class predictions
np_pred = np_model_logits.argmax(axis=0) # shape: (H, W)
# Clean the segmentation
cleaned = clean_array(
np_pred,
smooth_edge_size=2,
min_island_size=100,
connectivity=4,
)See the notebooks folder for end-to-end examples:
- Land Use Example Notebook: land use classification cleaning
- Cloud Example Notebook: cloud/shadow classification cleaning
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
