operations / transformations #79

Closed
7 of 8 tasks
cholmes opened this issue Jun 28, 2024 · 2 comments · Fixed by #114
@cholmes
Contributor

cholmes commented Jun 28, 2024

ToDo list (added by @m-mohr):

List of ideas:

  • Add area and perimeter values: fiboa improve example.parquet -s
  • Clean up geometries (applies only shapely.make_valid): fiboa improve example.parquet ... -g
  • generate stats on boundary quality
  • filter out columns: Can be done with fiboa merge, a single input dataset and the --include/--exclude flags (note: use -x and --crs if needed)
  • harmonize data to eurocrops hcat => Harmonize data to Eurocrops HCAT #116
  • reproject: fiboa improve example.parquet --crs EPSG:1234
  • Convert from GeoParquet 1.0 to 1.1 (fiboa improve example.parquet) and vice versa (fiboa improve example.parquet -gp1)
  • Change compression: fiboa improve example.parquet --compression brotli
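As a rough illustration of the area and perimeter item above, here is a minimal, dependency-free sketch using the shoelace formula. It is not how fiboa improve is implemented (this issue does not show that); the function name `ring_area_and_perimeter` and the sample field are hypothetical, and the sketch assumes a simple polygon without holes whose coordinates are in a projected CRS with metre units.

```python
import math


def ring_area_and_perimeter(ring):
    """Return (area, perimeter) of a simple polygon ring.

    `ring` is a sequence of (x, y) tuples in planar (projected) coordinates.
    The first vertex may or may not be repeated at the end.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring so the last edge is counted

    twice_area = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1            # shoelace term for this edge
        perimeter += math.hypot(x2 - x1, y2 - y1)  # edge length
    return abs(twice_area) / 2.0, perimeter


# Hypothetical 100 m x 50 m rectangular field
field = [(0, 0), (100, 0), (100, 50), (0, 50)]
area, perimeter = ring_area_and_perimeter(field)
print(area, perimeter)
```

For data in geographic coordinates (degrees), the values would be meaningless without reprojecting first, which is one reason the reproject item sits in the same list.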

There's been a good bit of desire to do 'extra' things in the converters, and we had a good discussion on the last call with some ideas on how to approach that, so I wanted to open an issue.

The original request was #21 (add area and perimeter). But there are also things like adding statistics as new columns, or filtering columns out.

I was thinking it could be ideal to keep the 'converters' very 'clean', so that they just translate from the source data to fiboa. Then there could be a fiboa transform command (or something like that) with a bunch of sub-commands. Ideally you could also use those sub-commands as part of the conversion process. Some of the initial ideas:

  • Add area and perimeter values (converter: Add option to calculate area and perimeter if missing #21)
  • Clean up geometries - automatically shift any overlapping ones. Or detect areas that are too big and remove pixels that are clearly wrong.
  • generate stats on boundary quality - size, regularity, inscribed circles
  • filter out columns
  • subset to certain geographic areas / make test & train datasets.
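To make the last idea in the list more concrete, here is a small sketch of subsetting by bounding box and then assigning features to train and test sets. Everything in it is an assumption for illustration: the function names `in_bbox` and `split_name`, the `id` and `bbox` keys, and the hash-based split are not taken from fiboa.

```python
import hashlib


def in_bbox(bbox, feature_bbox):
    """True if feature_bbox (minx, miny, maxx, maxy) intersects bbox."""
    minx, miny, maxx, maxy = bbox
    fminx, fminy, fmaxx, fmaxy = feature_bbox
    return fminx <= maxx and fmaxx >= minx and fminy <= maxy and fmaxy >= miny


def split_name(feature_id, test_fraction=0.2):
    """Assign an id to 'train' or 'test' deterministically via a hash."""
    digest = hashlib.sha256(str(feature_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


# Hypothetical features, each with an id and a bounding box
fields = [
    {"id": "a", "bbox": (0, 0, 1, 1)},
    {"id": "b", "bbox": (5, 5, 6, 6)},
    {"id": "c", "bbox": (0.5, 0.5, 2, 2)},
]
area_of_interest = (0, 0, 3, 3)

subset = [f for f in fields if in_bbox(area_of_interest, f["bbox"])]
splits = {f["id"]: split_name(f["id"]) for f in subset}
print(splits)
```

Hashing the id rather than drawing random numbers keeps the split stable across runs, so a field never moves between train and test when the dataset is regenerated.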

I'm sure there's lots more, but the basic idea is to have a set of utilities that help clean up and format data better, and harmonize it for various use cases. But every transformation is an 'opinion', to use the Varda way of thinking about it. So keep those as their own utilities, so people can choose which transformations to apply.

@m-mohr m-mohr added this to fiboa Jun 30, 2024
@m-mohr m-mohr moved this from Backlog to Todo in fiboa Jun 30, 2024
@github-project-automation github-project-automation bot moved this to Backlog in fiboa Jun 30, 2024
@cholmes
Contributor Author

cholmes commented Jul 8, 2024

Other ideas:

  • harmonize data to EuroCrops HCAT - add extra attributes and do the mapping from source data to the EuroCrops names (seems like it would need another mini ecosystem of converters for each country, though they are likely simpler than the full-fledged fiboa converters).
  • reproject - for example, if the source data is in a country-specific projection.

I've also been thinking about a 'merge' command for a while, and had been planning to make an issue for it. In a merge, collection-level metadata would shift to the row level. I was thinking that would be a full-featured command, where you could do things like reprojection and cleaning up boundaries. But it might make sense to push more to 'operations' and keep the merge pretty simple: it would just reject things that don't merge well, and the source data could be transformed beforehand to get ready for the merge.
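The 'simple merge' described above could look roughly like the following sketch: collection-level properties are copied onto every row, and any dataset whose CRS does not match is rejected instead of being reprojected. This is only an illustration of the idea, not the fiboa merge implementation; the function `merge_collections`, the dictionary layout, and the sample property names are all hypothetical.

```python
def merge_collections(collections):
    """Merge datasets, copying collection-level properties onto each row.

    Each collection is a dict with 'crs', 'properties' (collection-level
    metadata) and 'rows' (a list of dicts). Collections whose CRS differs
    from the first one are rejected rather than transformed.
    Returns (merged_rows, rejected_indices).
    """
    if not collections:
        return [], []

    target_crs = collections[0]["crs"]
    merged, rejected = [], []
    for index, collection in enumerate(collections):
        if collection["crs"] != target_crs:
            rejected.append(index)  # needs reprojection before it can merge
            continue
        for row in collection["rows"]:
            # Row-level values win over collection-level defaults.
            merged.append({**collection["properties"], **row})
    return merged, rejected


# Hypothetical input datasets
datasets = [
    {"crs": "EPSG:4326", "properties": {"source": "A"}, "rows": [{"id": 1}, {"id": 2}]},
    {"crs": "EPSG:25832", "properties": {"source": "B"}, "rows": [{"id": 3}]},
    {"crs": "EPSG:4326", "properties": {"source": "C"}, "rows": [{"id": 4, "source": "C2"}]},
]
rows, rejected = merge_collections(datasets)
print(rows, rejected)
```

Here the second dataset is rejected, which is the point at which a separate reproject operation would be run before retrying the merge.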

@m-mohr m-mohr self-assigned this Aug 24, 2024
@m-mohr m-mohr mentioned this issue Aug 26, 2024
@m-mohr m-mohr moved this from Todo to In Progress in fiboa Nov 13, 2024
m-mohr added a commit that referenced this issue Nov 13, 2024
m-mohr added a commit that referenced this issue Nov 13, 2024
@m-mohr
Contributor

m-mohr commented Nov 13, 2024

PR for most of the functionality: #114

@m-mohr m-mohr linked a pull request Nov 13, 2024 that will close this issue
@m-mohr m-mohr moved this from In Progress to Review in fiboa Nov 13, 2024
m-mohr added a commit that referenced this issue Nov 15, 2024
* Add fiboa improve command #79 #21

* Write custom schemas to fiboa metadata for use in improve/merge/etc. #113 and minor fixes

* Make geometries valid and explode to Polygons by default #119

* Explode polygons option for improve command

* Add minimal test

* Fix pick_schemas
@github-project-automation github-project-automation bot moved this from Review to Done in fiboa Nov 15, 2024