Commit 498a41f

Add fiboa improve command #79 #21 (#114)
* Add fiboa improve command #79 #21
* Write custom schemas to fiboa metadata for use in improve/merge/etc. #113 and minor fixes
* Make geometries valid and explode to Polygons by default #119
* Explode polygons option for improve command
* Add minimal test
* Fix pick_schemas
1 parent a46e051 commit 498a41f

23 files changed, +316 -46 lines changed

CHANGELOG.md

+20 -1

@@ -9,10 +9,29 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 
 ### Added
 
+- Command `fiboa improve` with helpers to
+    - change the CRS
+    - change the GeoParquet version and compression
+    - fill missing perimeter/area values
+    - fix invalid geometries
+    - rename columns
 - Converter for Lithuania (EuroCrops)
-- Converter for Switzerland
 - Converter for Slovenia
 - Converter for Slovakia
+- Converter for Switzerland
+- `fiboa convert`: New parameter `--original-geometries` / `-og` to keep the original geometries
+
+### Changed
+
+- `fiboa convert`:
+    - Writes custom schemas to collection metadata
+    - Geometries are made valid using GeoPandas' `make_valid` method by default
+    - MultiPolygons are converted to Polygons by default
+- `fiboa validate` uses custom schemas for validation
+- `fiboa merge` keeps custom schemas when needed
+
+### Removed
+- `fiboa convert`: Removed the explicit parameter `explode_multipolygon` from the converter
 
 ### Fixed
 

README.md

+26 -4

@@ -8,19 +8,20 @@ A command-line interface (CLI) for working with fiboa.
 
 ## Getting Started
 
-In order to make working with fiboa easier we have developed command-line interface (CLI) tools such as
+In order to make working with fiboa easier we have developed command-line interface (CLI) tools such as
 inspection, validation and file format conversions.
 
 ### Installation
 
-You will need to have **Python 3.9** or any later version installed.
+You will need to have **Python 3.9** or any later version installed.
 
 Run `pip install fiboa-cli` in the CLI to install the validator.
 
 **Optional:** To install additional dependencies for specific [converters](#converter-for-existing-datasets),
 you can for example run: `pip install fiboa-cli[xyz]` with xyz being the converter name.
 
 **Note on versions:**
+
 - fiboa CLI >= 0.3.0 works with fiboa version > 0.2.0
 - fiboa CLI < 0.3.0 works with fiboa version = 0.1.0
 
@@ -44,6 +45,7 @@ fiboa CLI supports various commands to work with the files:
 - [Merge fiboa GeoParquet files](#merge-fiboa-geoparquet-files)
 - [Create JSON Schema from fiboa Schema](#create-json-schema-from-fiboa-schema)
 - [Validate a fiboa Schema](#validate-a-fiboa-schema)
+- [Improve a fiboa Parquet file](#improve-a-fiboa-parquet-file)
 - [Update an extension template with new names](#update-an-extension-template-with-new-names)
 - [Converter for existing datasets](#converter-for-existing-datasets)
 - [Development](#development)
@@ -121,19 +123,38 @@ To validate a fiboa Schema YAML file, you can for example run:
 
 Check `fiboa validate-schema --help` for more details.
 
+### Improve a fiboa Parquet file
+
+Various "improvements" can be applied to a fiboa GeoParquet file.
+The command allows you to
+
+- change the CRS (`--crs`)
+- change the GeoParquet version (`-gp1`) and compression (`-pc`)
+- add/fill missing perimeter/area values (`-sz`)
+- fix invalid geometries (`-g`)
+- rename columns (`-r`)
+
+Example:
+
+- `fiboa improve file.parquet -o file2.parquet -g -sz -r old=new -pc zstd`
+
+Check `fiboa improve --help` for more details.
+
 ### Update an extension template with new names
 
 Once you've created and git cloned a new extension, you can use the CLI
 to update all template placeholders with proper names.
 
 For example, if your extension is meant to have
-- the title "Timestamps Extension",
+
+- the title "Timestamps Extension",
 - the prefix `ts` (e.g. field `ts:created` or `ts:updated`),
 - is hosted at `https://github.io/fiboa/timestamps-extension`
   (organization: `fiboa`, repository `timestamps-extension`),
 - and you run fiboa in the folder of the extension.
 
 Then the following command could be used:
+
 - `fiboa rename-extension . -t Timestamps -p ts -s timestamps-extension -o fiboa`
 
 Check `fiboa rename-extension --help` for more details.
@@ -143,13 +164,14 @@ Check `fiboa rename-extension --help` for more details.
 The CLI ships various converters for existing datasets.
 
 To get a list of available converters/datasets with title, license, etc. run:
+
 - `fiboa converters`
 
 Use any of the IDs from the list to convert an existing dataset to fiboa:
 
 - `fiboa convert de_nrw`
 
-See [Implement a converter](#implement-a-converter) for details about how to
+See [Implement a converter](#implement-a-converter) for details about how to
 
 ## Development
 
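The `-sz` option fills in missing area and perimeter values, but the code that computes them is not part of this excerpt (fiboa CLI itself works on GeoPandas objects). As a rough illustration of the geometry involved, here is a minimal stdlib sketch. It assumes a single exterior ring without holes, coordinates in a projected CRS measured in meters, and fiboa's convention of area in hectares and perimeter in meters. `polygon_sizes` and `fill_missing_sizes` are hypothetical helpers, not fiboa CLI functions.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def polygon_sizes(ring: Sequence[Point]) -> tuple[float, float]:
    """Return (area in hectares, perimeter in meters) for one exterior ring.

    Assumes projected coordinates in meters and no holes. The ring is
    closed implicitly, so a repeated first/last point is allowed.
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1  # shoelace term
        perimeter += math.hypot(x2 - x1, y2 - y1)  # edge length
    area_m2 = abs(twice_area) / 2.0
    return area_m2 / 10_000.0, perimeter  # 1 ha = 10,000 m²


def fill_missing_sizes(rows: list[dict]) -> list[dict]:
    """Fill 'area' and 'perimeter' only where they are missing (None)."""
    for row in rows:
        area, perimeter = polygon_sizes(row["ring"])
        if row.get("area") is None:
            row["area"] = area
        if row.get("perimeter") is None:
            row["perimeter"] = perimeter
    return rows
```

For a 100 m by 100 m square, `polygon_sizes([(0, 0), (100, 0), (100, 100), (0, 100)])` returns `(1.0, 400.0)`. Only `None` values are filled, so values already present in the file are left untouched.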

fiboa_cli/__init__.py

+89 -9

@@ -6,16 +6,18 @@
 import click
 import pandas as pd
 
+from .const import COMPRESSION_METHODS, CORE_COLUMNS
 from .convert import convert as convert_
 from .convert import list_all_converter_ids, list_all_converters
 from .create_geojson import create_geojson as create_geojson_
 from .create_geoparquet import create_geoparquet as create_geoparquet_
 from .describe import describe as describe_
-from .merge import merge as merge_, DEFAULT_COLUMNS, DEFAULT_CRS
+from .improve import improve as improve_
+from .merge import merge as merge_, DEFAULT_CRS
 from .jsonschema import jsonschema as jsonschema_
 from .rename_extension import rename_extension as rename_extension_
 from .util import (check_ext_schema_for_cli, log, parse_converter_input_files,
-                   valid_file_for_cli, valid_file_for_cli_with_ext,
+                   parse_map, valid_file_for_cli, valid_file_for_cli_with_ext,
                    valid_files_folders_for_cli, valid_folder_for_cli)
 from .validate import validate as validate_
 from .validate_schema import validate_schema as validate_schema_
@@ -376,7 +378,7 @@ def jsonschema(schema, out, fiboa_version, id_):
 )
 @click.option(
     '--compression', '-pc',
-    type=click.Choice(["brotli", "gzip", "lz4", "snappy", "zstd", "none"]),
+    type=click.Choice(COMPRESSION_METHODS),
     help='Compression method for the Parquet file.',
     show_default=True,
     default="brotli"
@@ -385,7 +387,7 @@ def jsonschema(schema, out, fiboa_version, id_):
     '--geoparquet1', '-gp1',
     is_flag=True,
     type=click.BOOL,
-    help='Enforces generating a GeoParquet 1.0 file bounding box. Defaults to GeoParquet 1.1 with bounding box.',
+    help='Enforces generating a GeoParquet 1.0 file. Defaults to GeoParquet 1.1 with bounding box.',
     default=False
 )
 @click.option(
@@ -394,13 +396,20 @@ def jsonschema(schema, out, fiboa_version, id_):
     help='Url of mapping file. Some converters use additional sources with mapping data.',
     default=None
 )
-def convert(dataset, out, input, cache, source_coop, collection, compression, geoparquet1, mapping_file):
+@click.option(
+    '--original-geometries', '-og',
+    is_flag=True,
+    type=click.BOOL,
+    help='Keep the source geometries as provided, i.e. this option disables making geometries valid and converting them to Polygons.',
+    default=False
+)
+def convert(dataset, out, input, cache, source_coop, collection, compression, geoparquet1, mapping_file, original_geometries):
     """
     Converts existing field boundary datasets to fiboa.
     """
     log(f"fiboa CLI {__version__} - Convert '{dataset}'\n", "success")
     try:
-        convert_(dataset, out, input, cache, source_coop, collection, compression, geoparquet1, mapping_file)
+        convert_(dataset, out, input, cache, source_coop, collection, compression, geoparquet1, mapping_file, original_geometries)
     except Exception as e:
         log(e, "error")
         sys.exit(1)
@@ -518,7 +527,7 @@ def rename_extension(folder, title, slug, org = "fiboa", prefix = None):
     multiple=True,
     help='Additional column names to include.',
     show_default=True,
-    default=DEFAULT_COLUMNS,
+    default=CORE_COLUMNS,
 )
 @click.option(
     '--exclude', '-e',
@@ -536,7 +545,7 @@ def rename_extension(folder, title, slug, org = "fiboa", prefix = None):
 )
 @click.option(
     '--compression', '-pc',
-    type=click.Choice(["brotli", "gzip", "lz4", "snappy", "zstd", "none"]),
+    type=click.Choice(COMPRESSION_METHODS),
     help='Compression method for the Parquet file.',
     show_default=True,
     default="brotli"
@@ -545,7 +554,7 @@ def rename_extension(folder, title, slug, org = "fiboa", prefix = None):
     '--geoparquet1', '-gp1',
     is_flag=True,
     type=click.BOOL,
-    help='Enforces generating a GeoParquet 1.0 file bounding box. Defaults to GeoParquet 1.1 with bounding box.',
+    help='Enforces generating a GeoParquet 1.0 file. Defaults to GeoParquet 1.1 with bounding box.',
     default=False
 )
 def merge(datasets, out, crs, include, exclude, extension, compression, geoparquet1):
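This hunk makes `CORE_COLUMNS` (from `const.py`) the default for `merge --include`. `merge.py` itself is not part of this excerpt, so the following is only a hypothetical sketch of include/exclude column selection. It assumes that included columns are kept when they exist in the inputs, that excluded columns are always dropped, and that the input column order is preserved; `select_columns` is not a fiboa CLI function.

```python
from typing import Iterable

# Copied from fiboa_cli/const.py in this commit.
CORE_COLUMNS = [
    "id",
    "geometry",
    "area",
    "perimeter",
    "determination_datetime",
    "determination_method",
]


def select_columns(
    available: Iterable[str],
    include: Iterable[str] = tuple(CORE_COLUMNS),
    exclude: Iterable[str] = (),
) -> list[str]:
    """Hypothetical include/exclude logic for merging.

    Keeps every column from `include` that exists in `available`,
    drops anything listed in `exclude`, and preserves input order.
    """
    wanted = set(include) - set(exclude)
    return [column for column in available if column in wanted]
```

Sharing one constant between `const.py` and the CLI default means the core column list only has to be maintained in a single place.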
@@ -564,6 +573,76 @@ def merge(datasets, out, crs, include, exclude, extension, compression, geoparqu
         sys.exit(1)
 
 
+## IMPROVE (add area, perimeter, and fix geometries)
+@click.command()
+@click.argument('input', nargs=1, type=click.Path(exists=True))
+@click.option(
+    '--out', '-o',
+    type=click.Path(exists=False),
+    help='Path to write the GeoParquet file to. If not given, overwrites the input file.',
+    default=None
+)
+@click.option(
+    '--rename-column', '-r',
+    type=click.STRING,
+    callback=lambda ctx, param, value: parse_map(value),
+    multiple=True,
+    help='Renaming of columns. Provide the old name and the new name separated by an equal sign. Can be used multiple times.'
+)
+@click.option(
+    '--add-sizes', '-sz',
+    is_flag=True,
+    type=click.BOOL,
+    help='Computes missing sizes (area, perimeter)',
+    default=False
+)
+@click.option(
+    '--fix-geometries', '-g',
+    is_flag=True,
+    type=click.BOOL,
+    help='Tries to fix invalid geometries that are reported by the validator (uses GeoPandas\' make_valid method internally)',
+    default=False
+)
+@click.option(
+    '--explode-geometries', '-e',
+    is_flag=True,
+    type=click.BOOL,
+    help='Converts MultiPolygons to Polygons',
+    default=False
+)
+@click.option(
+    '--crs',
+    type=click.STRING,
+    help='Coordinate Reference System (CRS) to use for the GeoParquet file.',
+    show_default=True,
+    default=None,
+)
+@click.option(
+    '--compression', '-pc',
+    type=click.Choice(COMPRESSION_METHODS),
+    help='Compression method for the Parquet file.',
+    show_default=True,
+    default="brotli"
+)
+@click.option(
+    '--geoparquet1', '-gp1',
+    is_flag=True,
+    type=click.BOOL,
+    help='Enforces generating a GeoParquet 1.0 file. Defaults to GeoParquet 1.1 with bounding box.',
+    default=False
+)
+def improve(input, out, rename_column, add_sizes, fix_geometries, explode_geometries, crs, compression, geoparquet1):
+    """
+    "Improves" a fiboa GeoParquet file according to the given parameters.
+    """
+    log(f"fiboa CLI {__version__} - Improve datasets\n", "success")
+    try:
+        improve_(input, out, rename_column, add_sizes, fix_geometries, explode_geometries, crs, compression, geoparquet1)
+    except Exception as e:
+        log(e, "error")
+        sys.exit(1)
+
+
 cli.add_command(describe)
 cli.add_command(validate)
 cli.add_command(validate_schema)
@@ -574,6 +653,7 @@ def merge(datasets, out, crs, include, exclude, extension, compression, geoparqu
 cli.add_command(converters)
 cli.add_command(rename_extension)
 cli.add_command(merge)
+cli.add_command(improve)
 
 if __name__ == '__main__':
     cli()
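Because `--rename-column` is declared with `multiple=True`, click passes its callback a tuple of strings such as `("old=new", "a=b")`, which is then handed to `parse_map` from `util.py`. That helper's implementation is not shown in this diff, so the following is only a hypothetical stand-in. It assumes each entry is split at the first `=` and rejects entries without a name on both sides; `parse_rename_map` and `rename_columns` are illustrative names, not fiboa CLI functions.

```python
def parse_rename_map(values: tuple[str, ...]) -> dict[str, str]:
    """Hypothetical stand-in for parse_map: ("old=new", ...) -> {"old": "new", ...}."""
    mapping: dict[str, str] = {}
    for value in values:
        # Split at the first "=" only, so a new name may itself contain "=".
        old, sep, new = value.partition("=")
        old, new = old.strip(), new.strip()
        if not sep or not old or not new:
            raise ValueError(f"Expected OLD=NEW, got {value!r}")
        mapping[old] = new
    return mapping


def rename_columns(columns: list[str], mapping: dict[str, str]) -> list[str]:
    """Apply the mapping; columns not mentioned keep their names."""
    return [mapping.get(column, column) for column in columns]
```

With a pandas or GeoPandas DataFrame, the same mapping could be passed directly to `DataFrame.rename(columns=mapping)`.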

fiboa_cli/const.py

+11

@@ -10,3 +10,14 @@
 STAC_COLLECTION_SCHEMA = "http://schemas.stacspec.org/v{version}/collection-spec/json-schema/collection.json"
 GEOPARQUET_SCHEMA = "https://geoparquet.org/releases/v{version}/schema.json"
 STAC_TABLE_EXTENSION = "https://stac-extensions.github.io/table/v1.2.0/schema.json"
+
+COMPRESSION_METHODS = ["brotli", "gzip", "lz4", "snappy", "zstd", "none"]
+
+CORE_COLUMNS = [
+    "id",
+    "geometry",
+    "area",
+    "perimeter",
+    "determination_datetime",
+    "determination_method",
+]

fiboa_cli/convert.py

+3 -1

@@ -13,7 +13,8 @@ def convert(
     collection = False,
     compression = None,
     geoparquet1 = False,
-    mapping_file=None,
+    mapping_file = None,
+    original_geometries = False,
 ):
     if dataset in IGNORED_DATASET_FILES:
         raise Exception(f"'{dataset}' is not a converter")
@@ -37,6 +38,7 @@ def convert(
         compression = compression,
         geoparquet1 = geoparquet1,
         mapping_file = mapping_file,
+        original_geometries = original_geometries,
     )
 
 def list_all_converter_ids():

fiboa_cli/convert_utils.py

+5 -4

@@ -40,7 +40,7 @@ def convert(
         license = None,
         compression = None,
         geoparquet1 = False,
-        explode_multipolygon = False,
+        original_geometries = False,
         index_as_id = False,
         **kwargs):
     """
@@ -160,11 +160,12 @@ def convert(
         else:
             log(f"Column '{key}' not found in dataset, skipping migration", "warning")
 
-    # 4b. For geometry column, convert multipolygon type to polygon
-    if explode_multipolygon:
+    # 4b. For geometry column, fix geometries
+    if not original_geometries:
+        gdf.geometry = gdf.geometry.make_valid()
         gdf = gdf.explode()
 
-    if has_migration or has_col_migrations or has_col_filters or has_col_additions or explode_multipolygon:
+    if has_migration or has_col_migrations or has_col_filters or has_col_additions:
         log("GeoDataFrame after migrations and filters:")
         print(gdf.head())
 
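After this change, `convert` calls `make_valid()` and then `explode()` on the GeoDataFrame unless `original_geometries` is set. `make_valid` needs a geometry engine, but the explode step can be illustrated with plain data structures. The sketch below assumes GeoJSON-like feature dicts (the CLI itself operates on a GeoPandas GeoDataFrame) and turns each MultiPolygon into one Polygon feature per part; `explode_features` is a hypothetical helper.

```python
import copy


def explode_features(features: list[dict]) -> list[dict]:
    """Split MultiPolygon features into one Polygon feature per part.

    Features are GeoJSON-like dicts with "geometry" and "properties".
    Features of any other geometry type are passed through unchanged.
    """
    exploded = []
    for feature in features:
        geometry = feature["geometry"]
        if geometry["type"] != "MultiPolygon":
            exploded.append(feature)
            continue
        # A MultiPolygon's coordinates are a list of Polygon coordinate arrays.
        for polygon_coordinates in geometry["coordinates"]:
            exploded.append({
                "type": "Feature",
                # Each part gets its own copy of the attributes.
                "properties": copy.deepcopy(feature.get("properties", {})),
                "geometry": {"type": "Polygon", "coordinates": polygon_coordinates},
            })
    return exploded
```

Every part inherits the same properties, including any `id`, so in this sketch ids are no longer unique after exploding and may need to be regenerated.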

fiboa_cli/datasets/be_wa.py

-1

@@ -91,7 +91,6 @@ def file_migration(data, path, uri, layer):
         license=LICENSE,
         layer_filter=lambda layer, uri: layer == LAYER,
         file_migration=file_migration,
-        explode_multipolygon=True,
         index_as_id=True,
         **kwargs
     )

fiboa_cli/datasets/ch.py

-1

@@ -69,7 +69,6 @@ def convert(output_file, cache = None, **kwargs):
         column_migrations=COLUMN_MIGRATIONS,
         column_filters=COLUMN_FILTERS,
         providers=PROVIDERS,
-        explode_multipolygon=True,
         index_as_id=True,
         fid_as_index=True,
         **kwargs

fiboa_cli/datasets/ec_fr.py

-1

@@ -25,6 +25,5 @@ def convert(output_file, cache = None, **kwargs):
         column_filters=base.COLUMN_FILTERS,
         attribution=base.ATTRIBUTION,
         license=LICENSE,
-        explode_multipolygon=True,
         **kwargs
     )

fiboa_cli/datasets/es_cat.py

-2

@@ -35,7 +35,6 @@
 
 COLUMN_MIGRATIONS = {
     "campanya": lambda col: pd.to_datetime(col, format='%Y'),
-    "geometry": lambda col: col.make_valid(),
 }
 
 MISSING_SCHEMAS = {
@@ -62,6 +61,5 @@ def convert(output_file, cache = None, **kwargs):
         license=LICENSE,
         layer="CULTIUS_DUN2023",
         index_as_id=True,
-        explode_multipolygon=True,
         **kwargs
     )

fiboa_cli/datasets/fi.py

-2

@@ -33,8 +33,6 @@
 COLUMN_MIGRATIONS = {
     # Make year (1st january) from column "VUOSI"
     "VUOSI": lambda col: pd.to_datetime(col, format='%Y'),
-    # Todo: generate a generic solution for making geometries valid
-    "geometry": lambda col: col.make_valid()
 }
 
 def migrate(gdf):
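Before this commit, `es_cat.py` and `fi.py` repaired geometries through their own `COLUMN_MIGRATIONS` entries; the generic `make_valid` step in `convert_utils.py` now covers that, so only the year conversions remain. The sketch below shows how a column-to-function mapping like `COLUMN_MIGRATIONS` can be applied. It is a simplified illustration under stated assumptions: it uses a plain dict of lists instead of a GeoDataFrame and returns `datetime` objects instead of calling `pd.to_datetime`. `apply_column_migrations` is hypothetical; only the skip-with-warning message mirrors the log line visible in the `convert_utils.py` hunk.

```python
from datetime import datetime


def apply_column_migrations(table, migrations, warn=print):
    """Replace each column named in `migrations` with fn(column_values).

    `table` maps column names to lists of values. Columns that are not
    present are skipped with a warning, mirroring the converter's log line.
    """
    for key, fn in migrations.items():
        if key in table:
            table[key] = fn(table[key])
        else:
            warn(f"Column '{key}' not found in dataset, skipping migration")
    return table


# Rough stdlib stand-in for fi.py's "VUOSI" migration: year -> 1st of January.
COLUMN_MIGRATIONS = {
    "VUOSI": lambda col: [datetime(int(year), 1, 1) for year in col],
}
```

Keeping geometry repair out of the per-dataset mappings means each converter only describes its own attribute quirks, while the shared pipeline handles geometry validity consistently.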
