Skip to content

Conversation

@xylar
Copy link
Collaborator

@xylar xylar commented Sep 21, 2016

This merge removes code code for manually writing out
geojson files and replaces it with a call to json.dump.
This makes it easier to support new geometries (such as MultiPoint)
that weren't supported by the previous writer. It also means that
less file formatting must be handled by scripts before calling
write_all_features, since json.dump writes the full structure in one step.

Internally, some gymnastics were required to make sure that "properties" come
before "geometry" (which makes the files easier to manipulate). Also,
The formatting of the resulting geojson files, while similar, is not identical
to the that of the previous writer.

write_single_feature has been removed since it can be handled by
write_all_features in the only script that uses it.

All scripts have been updated to work with the new syntax.

@xylar
Copy link
Collaborator Author

xylar commented Sep 21, 2016

@pwolfram, In the process of some work I'm doing, I needed to be able to write out a MultiPoint object, which was not supported by the writer. I did some poking around and found that you can use json.dump to produce files that are nearly identical to the ones @douglasjacobsen's code produces, but with several advantages such as supporting more geometries.

Please take a look and test these scripts out. I have rebased #49 onto this so please merge this before that one (which I am still testing).

I haven't yet fully tested these changes, and they will affect init_step1 of MPAS-O global_ocean runs so we need to be careful that these changes do no harm.

I will let you know when I think the PR is ready to merge but I wanted to get your feedback and give you a chance to look over the code before earlier rather than later in that process.

Obviously, this is clean up so it's not a high priority. But I will be building other work (in addition to #49) off of this PR.

@xylar xylar force-pushed the simplify_feature_write_utils branch from 96e9df5 to aa5208d Compare September 21, 2016 17:51
@xylar xylar changed the title Update feture_write_utils to use json.dump Update feature_write_utils to use json.dump Sep 22, 2016
@xylar
Copy link
Collaborator Author

xylar commented Sep 22, 2016

@pwolfram, I was running into an issue where json.dump writes out coordinates with 16 digits of precision. This is a problem if neighboring features don't agree on where their common border is to that precision. This issue should be fixed with the commit I just pushed.

@xylar xylar force-pushed the simplify_feature_write_utils branch 3 times, most recently from 7659362 to 56b3ffc Compare September 22, 2016 18:31
@pwolfram
Copy link
Contributor

@xylar, do you need this within the next day or is early next week ok?

@xylar
Copy link
Collaborator Author

xylar commented Sep 22, 2016

No, please take your time. I'm using this as part of my efforts to solve #44, and I am still either debugging it or making small improvements. Also, I really need to test it on global_ocean so you really don't need to worry about it until I've done that.

Thanks for checking in.

@xylar
Copy link
Collaborator Author

xylar commented Sep 25, 2016

@pwolfram, I have tested all the scripts in the main directory and they work as I expected with these modifications. (I had trouble with the scripts for splitting at the prime meridian and antimeridian, but that is an issue for another time and unrelated to this PR. If I give them a feature that doesn't need splitting, they write it out correctly.)

I have tested nearly all of the global_ocean test cases in MPAS-O to make sure this merge does no harm to init_step1. I am confident that it does not.

Please give this a look when you have time. I would appreciate any feedback you have on coding style since this might be a good chance to address stylistic issues in the utils modules without needing a separate PR for it. But the primary focus should be on whether the changes I made 1) work correctly and 2) do not make the geojson files harder to work with (e.g. harder to read).

A good set of tests might be to merge in a few features and make sure you, too, can use each of the scripts to manipulate the features without running into trouble. If you have any existing workflow that makes use of this repo, you should almost certainly test that out with these changes.

Thanks for agreeing to review this.

@pwolfram
Copy link
Contributor

@xylar, I've started to tested this and am finding some potential bugs (or potentially just user-error that needs clarified in documentation). Note that this may not be a complete list because I'll test again after we resolve the hard-crash listed below.

  1. See Plotting error with moll projection for combined Atlantic and Southern Ocean #53 for a plotting but with moll.
  2. While we are doing this clean-up it seems like we want to also add the option -o to split_features.py to allow a particular output directory to be specified.
  3. Note, I'm not testing fix_features_at_prime_meridian.py or fix_features_at_antimeridian.py as indicated, although we need an issue for these scripts to make sure that this is eventually addressed.
  4. tag_features.py needs to allow the user to specify a new file for output for consistency with the other scripts and to support greater provenance capability (e.g., each step should produce a new file if desired without requiring the user to backup a feature file).
  5. Using these commands:
mkdir test_feature_collection
cd test_feature_collection/
ln -s ../ocean/region/North_Atlantic_Ocean/region.geojson NA_region.geojson
ln -s ../ocean/region/South_Atlantic_Ocean/region.geojson SA_region.geojson
ln -s ../ocean/region/Southern_Ocean/region.geojson SO_region.geojson
cd ..
./merge_features.py -d test_feature_collection/ -o AO_SO.geojson
./combine_features.py -f AO_SO.geojson -n AOSO -o AO_SO_combined.geojson

yields

┌─[pwolfram][shapiro][~/Documents/MPAS_pull_requests/simplify_feature_write_utils][16:25][±][ ✗]
└─▪ ./combine_features.py -f AO_SO.geojson -n AOSO -o AO_SO_combined.geojson
Traceback (most recent call last):
  File "./combine_features.py", line 107, in <module>
    write_all_features(features, out_file_name, indent=4)
  File "/Users/pwolfram/Documents/MPAS_pull_requests/simplify_feature_write_utils/utils/feature_write_utils.py", line 11, in write_all_features
    features['features'][index] = check_feature(features['features'][index])
  File "/Users/pwolfram/Documents/MPAS_pull_requests/simplify_feature_write_utils/utils/feature_write_utils.py", line 68, in check_feature
    outFeature = OrderedDict((('type',feature['type']),
KeyError: 'type'

Regarding the output style, e.g., where the standard approach with tabs represented by \▸ on the left and new approach on the right:
screenshot 2016-09-26 16 47 56

I think that it would be better to have lat/lon points on one line as opposed to four but there is definitely something to be said for using the standard json approach, for a variety of reasons. I can't imaging that most people will read/edit the geojson file by hand from the raw text, although they clearly can for either case. I would say both formats have their pros/cons but that using an established library brings the benefit of standardization with the broader community, which in my view is always a good thing.

Regarding coding style:

  1. Arguments to functions separated by commas should have a space, e.g., f(a, b) not f(a,b).
  2. A double-check this meets PEP8, e.g., via flake8 would be good. Although not all the output will be useful.
  3. Spurious whitespace should be avoided. This shows up via git diff. I also have a vim function / script I can share to help with this issue. This issue is prevalent throughout this repository.

@xylar, this should get us started but please let me know if there is something specific I should look at in the meantime before doing a more comprehensive review following mitigation of some of the issues discussed in this comment.

@xylar xylar force-pushed the simplify_feature_write_utils branch from 56b3ffc to 6414a47 Compare September 29, 2016 07:56
@xylar
Copy link
Collaborator Author

xylar commented Sep 29, 2016

While we are doing this clean-up it seems like we want to also add the option -o to split_features.py to allow a particular output directory to be specified.

We could do this but it's definitely beyond the scope of this PR, even if we're using this for some clean up. The intended use of split_features.py at the moment is to put features in exactly the directories they should be in for committing them to the repository. It wasn't intended for use in testing features. Nevertheless, I would agree that it would be useful for testing purposes to have that capability. maybe make an issue for this (as a feature request).

@xylar
Copy link
Collaborator Author

xylar commented Sep 29, 2016

tag_features.py needs to allow the user to specify a new file for output for consistency with the other scripts and to support greater provenance capability (e.g., each step should produce a new file if desired without requiring the user to backup a feature file).

I agree. I found the behavior of this script unexpected even though I wrote it. This should be included in this PR, so I'll get to it as soon as I can.

@xylar
Copy link
Collaborator Author

xylar commented Sep 29, 2016

Note, I'm not testing fix_features_at_prime_meridian.py or fix_features_at_antimeridian.py as indicated, although we need an issue for these scripts to make sure that this is eventually addressed.

I'll make an issue for this as soon as I can get to it.

@xylar
Copy link
Collaborator Author

xylar commented Sep 29, 2016

I believe I have fixed the type error in combine_features.py with my latest commit. I had also seen the same problem.

@xylar
Copy link
Collaborator Author

xylar commented Sep 29, 2016

I think that it would be better to have lat/lon points on one line as opposed to four but there is definitely something to be said for using the standard json approach, for a variety of reasons. I can't imagine that most people will read/edit the geojson file by hand from the raw text, although they clearly can for either case. I would say both formats have their pros/cons but that using an established library brings the benefit of standardization with the broader community, which in my view is always a good thing.

I don't think we can control the json output to the degree that we can put lat/lon points on the same line.

I find that I do edit features by hand myself, for example, in order to create simple masks from other masks or to see why there seem to be redundant or self-intersecting points in a given feature. So readability is definitely important. But I find either the old or the new format approximately equally readable and the new is much easier to maintain.

@xylar
Copy link
Collaborator Author

xylar commented Sep 29, 2016

Arguments to functions separated by commas should have a space, e.g., f(a, b) not f(a,b).

Agreed, I'll fix this wherever I find it in the util modules.

@xylar
Copy link
Collaborator Author

xylar commented Sep 29, 2016

A double-check this meets PEP8, e.g., via flake8 would be good. Although not all the output will be useful.

I'm finding flake8 to be way too pedantic for my taste. We can discuss this but I find the requirements for lines shorter than 79 characters, and whitespace around comments (e.g. folds) to make the code less, rather than more, readable. Butt I will update utils to use these conventions. I'm not going to touch this in the other scripts. That's just not the scope of this PR.

This makes it easier to support new geometries (such as MultiPoint)
that weren't supported by the previous writer.  It also means that
no file formatting must be handled by calling scripts, since
json.dump writes the full structure in one step.  For convenience,
write_all_features now takes a file name instead of a file pointer.

Some gymnastics were required to make sure that properties come
before geometry (which makes the files easier to manipulate.  Also,
The formatting, while similar, is not identical to the previous.

write_single_feature has been removed since it can be handled by
write_all_features in the only script that uses it.

All scripts have been updated to work with the new syntax.
@xylar xylar force-pushed the simplify_feature_write_utils branch from 6414a47 to 150ba10 Compare September 29, 2016 08:30
Previously, tag_features.py changed tags in place.  Now, it writes
out a new file, consistent with all other manipulation scripts.
@xylar
Copy link
Collaborator Author

xylar commented Sep 29, 2016

I think I've fixed tag_features.py to write out to a separate file, as requested.

@xylar
Copy link
Collaborator Author

xylar commented Sep 29, 2016

@pwolfram, I will make two issues for PEP8 formatting of the remaining scripts and for fixing the prime meridian and antimeridian scripts. Since neither of those issues will be addressed by this PR, I think you can continue your review and merge when you're ready.

@pwolfram
Copy link
Contributor

@xylar, now may be a good time to update the README, but for simplicity I'm include this as a new issue #57 that we can worry about later (obviously to be debated).

@pwolfram
Copy link
Contributor

By the way @xylar, excellent PR: +185 −333 for code. Less code is almost always better.

@pwolfram
Copy link
Contributor

@xylar, this is just a nit but can you please update the date / authors in all the scripts you edited? I'm ok with this being a separate commit and that probably would be better because we haven't been keeping close tabs on this like we should have been. There may be more edits and I'll let you know when I'm done reviewing.

@xylar
Copy link
Collaborator Author

xylar commented Sep 29, 2016

By the way @xylar, excellent PR: +185 −333 for code. Less code is almost always better.

Yep, that was the basic idea.

@xylar
Copy link
Collaborator Author

xylar commented Sep 29, 2016

Okay, I updated the authors and last-modified dates. I agree that is something we want to do a better job of keeping track of.

Let me know what else you'd like me to address.

@pwolfram
Copy link
Contributor

@xylar, does

└─▪ ./difference_features.py -h
usage: difference_features.py [-h] -f FILE1 -m FILE2 [-o PATH]

This script takes a file containing one or more feature definitions, that is
pointed to by the -f flag and a second masking feature definition, pointed
to with the -m flag.  The masking features are masked out of (i.e. removed
from) the original feature definitions.  The resulting features are placed
in (or appended to) the output file pointed to with the -o flag
(features.geojson by default).

Authors: Xylar Asay-Davis
Last Modified: 02/12/2016

optional arguments:
  -h, --help            show this help message and exit
  -f FILE1, --feature_file FILE1
                        Single feature file to be clipped
  -m FILE2, --mask_file FILE2
                        Single feature whose overlap with the first feature should be removed
  -o PATH, --output PATH
                        Output file, e.g., features.geojson.

indicate that the file pointed to by "-m" MUST only have a single feature definition? It seems like in general we would want this file to be able to have multiple feature definitions. If so, this is a new feature request that we don't need to do anything with right now and I can submit an issue.

@pwolfram
Copy link
Contributor

Testing capability above:

./merge_features.py -d iceshelves/ -o iceshelves.geojson
./merge_features.py -f iceshelves/region/Amery_1/region.geojson -o Amery_all.geojson
./merge_features.py -f iceshelves/region/Amery_2/region.geojson -o Amery_all.geojson
./merge_features.py -f iceshelves/region/Amery_3/region.geojson -o Amery_all.geojson
./difference_features.py -f iceshelves.geojson -m Amery_all.geojson 
./plot_features.py -f features.geojson 

appears to work so we need to edit the doc string for difference_features.py to change "Single feature" to "Feature file" or "Single feature file" to be more accurate.

@xylar
Copy link
Collaborator Author

xylar commented Sep 29, 2016

@pwolfram, I will fix this but these comments are getting way outside the scope of this PR. Let's do a separate PR to address docstrings, coding style, etc. that needs to be fixed but that weren't broken by this PR.

Docstring and help now state correctly that the features that
the script supports multiple features to clip and multiple features
as masks.
@xylar
Copy link
Collaborator Author

xylar commented Sep 29, 2016

@pwolfram, I've updated the docstring and help for difference_features.py.

@pwolfram
Copy link
Contributor

Thanks @xylar. I just want this to be as good as possible but you are right this is getting outside scope. I'm almost done testing... driver scripts work and as far as I can tell everything is working properly. One additional note, outside this PR: driver scripts use the old shutil.move(defaultFileName,outName) approach, e.g., driver_scripts/setup_ocean_land_coverage.py and could be cleaned up following all your hard work to clean up all the scripts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants