Support geospatial data for modeling electoral district boundaries #412
Comments
We'll need input from both data providers and publishers, but it's exciting to get the conversation going. Thanks for posting! Will we need to add geography attributes to What are the drawbacks and possible complexities with solely assigning spatial data to EDIT: I think I'm using "jurisdiction" to describe any map boundary within a VIP feed including precinct, locality, electoral district and state. |
If this is being proposed as a replacement, how could a consumer reliably tell if a street address falls within a polygon that's defined using lat/long? It seems like this would require the consumer to use an external service (e.g. external data set and geometry library) and could result in incorrect answers depending on the service, as opposed to being self-contained. |
As far as my initial proposal goes, I was thinking we simply add geospatial attributes to the
Defining geospatial attributes only for Precincts, so that Localities and States are effectively just collections of Precincts, is certainly another viable solution. As you mentioned, the advantage of this approach is that it fits well into the current spec. One downside, however, is that a publisher always has to provide Precinct-level geospatial attributes, even if the feed only contains county- or state-wide voting locations.
Given a voter's location (lat/long), determining their precinct is just a matter of performing a "point in polygon" evaluation (see https://en.wikipedia.org/wiki/Point_in_polygon). There are a number of common algorithms for doing this evaluation, and I presume a number of open source implementations of these as well. |
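As an illustration of the "point in polygon" evaluation mentioned above, here is a minimal sketch of the ray-casting (even-odd) algorithm. It treats longitude/latitude as planar x/y, which is an assumption that is reasonable for precinct-sized shapes but ignores map projections; the `precinct` square is made-up data, and points exactly on an edge are not handled specially.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting (even-odd) test: count how many polygon edges a
    horizontal ray starting at `point` and heading right would cross."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square precinct boundary, for illustration only.
precinct = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon((2.0, 2.0), precinct))
print(point_in_polygon((5.0, 2.0), precinct))
```

A production consumer would more likely rely on an established geometry library than on a hand-rolled test like this one.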
I believe we see in VIP feeds that precinct (and precinct splits) are generally smaller pieces of the larger electoral district they construct. Taking an exaggerated example of an |
Just FYI - the gerrymandering is what you are up against. In places like Pittsburgh and San Francisco they have it to another whole level - where odd/even houses on sides of streets vote differently. This is why the specifications for OASIS EML support that - amongst other nuances. Enjoy.
-------- Original Message --------
Subject: Re: [votinginfoproject/vip-specification] Support geospatial
data for modeling electoral district boundaries (#412)
From: Franklin Smith ***@***.***>
Date: Mon, March 29, 2021 11:57 am
To: votinginfoproject/vip-specification
***@***.***>
Cc: Subscribed ***@***.***>
To define the geographic boundary of a precinct, a publisher would need to provide geospatial attributes on an ElectoralDistrict element, and then reference that district from the corresponding Precinct.ElectoralDistrictIds field. I believe precincts (and precinct splits) are generally smaller pieces of the larger electoral district they construct. Taking an exaggerated example of an ElectoralDistrict for a State, that geography on its own would not be small enough to identify the geography of each Precinct in the State. Wouldn't we need the ElectoralDistrict to derive its geography from Precinct, rather than the other way around? Of course we could provide geographies for both data types as well.
|
I was thinking that if a feed only provides state-wide voting locations, for example, then the more granular precinct-level geography wouldn't be necessary. That way it would give publishers the flexibility to define geospatial boundaries at the precinct, locality or state level, whichever is most convenient. But on the other hand, if the more detailed precinct-level geographies are always readily available, then it may just be easier to collect geography at the precinct level only.
This is an interesting point. For the foreseeable future, the idea is that feed publishers would have the option of modeling voter locations either using street segments or geospatial data, whichever provides a better solution for each state's needs. In the fullness of time, however, the hope is that all states would migrate to the more precise geospatial approach. The point also gives rise to the need to be able to define a precinct with one or more shapes. It's possible that a precinct may require more than one polygon to fully encapsulate its jurisdiction. |
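To make the "one or more shapes" point concrete, here is a rough sketch in which a precinct is a list of polygons and a voter is inside the precinct if any one of them contains the voter's location. The `_in_ring` helper is a compact ray-casting test, the coordinates are invented, and none of the names come from the spec.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def _in_ring(point: Point, ring: Polygon) -> bool:
    """Compact even-odd ray-casting test for a single polygon."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def precinct_contains(point: Point, shapes: List[Polygon]) -> bool:
    """A precinct made of several disjoint shapes contains the point
    if at least one of its shapes does."""
    return any(_in_ring(point, shape) for shape in shapes)


# Hypothetical precinct made of two disconnected squares.
shapes = [
    [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    [(5.0, 5.0), (7.0, 5.0), (7.0, 7.0), (5.0, 7.0)],
]
print(precinct_contains((6.0, 6.0), shapes))
```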
This was only half of my comment (I said "external data set and geometry library"). If it's being proposed as a replacement, consumers would now also have to depend on an external service or data set in order to get the lat/long for a street address. In addition, an error in that external source could result in sending a voter to the wrong precinct. With explicit street segments, one doesn't have any external data dependencies and isn't limited to software languages with a point-in-polygon implementation. Thus, it seems like this would raise the bar for consuming and using the data, so I would suggest that polygons be permitted only in addition to the current approach. Yet another issue is that multi-level apartments may have a vertical component to their coordinates, so I'm not sure that lat/long would even be sufficient to differentiate apartment numbers located on different floors. You would need three-dimensional polyhedra in space (possibly disconnected?). In contrast, the explicit |
@cjerdonek Thanks for flagging both the usability and the 3-D nature of some precincts. On the flip side, I will say that in many cases the Also, in states where they've already moved to a Geo-based solution on the backend, the way to create So I agree that this will raise the bar in the ways that you point out. The question is to what extent the current system is failing clients and voters -- in silent and hard-to-debug ways, in our experience -- and what we can do about it. [1] All of these addresses are the same place: |
With street segments, when given an address, at least one has a list of the possible targets to match against. In particular, if someone is attempting to find a match and they don't find an exact match in the list, it won't be a silent failure. If street segments are taken away, though, and someone is given an address, the process of getting a lat/long back can be a black box process if using an external service. So someone won't necessarily be alerted if the address should really be improved in some way. I think providing geospatial data in addition to street segments is a good improvement. It would address the new voter issue you mention by providing a fallback, and it will also provide more info to match against in cases where the address provided by a voter has more than one candidate match if using the street address alone. |
Btw, I think the questions of what to do about states that are providing a segment-per-voter as well as how things can be improved otherwise would probably be better discussed in separate tickets. |
The feedback we've gotten from those states is "please let us provide the information in geospatial formats", so I imagine that issue would just get marked as a duplicate of this one. :)
What you're describing is a false negative. I'm discussing both those and false positives. I can walk you through, in practice, what it's like to try and match user-provided addresses to VIP-provided segments without using a geocoder. This is my experience from several years ago so it may be slightly outdated but I believe the numbers are directionally correct. If anyone here has any experience trying to use VIP files to set up a polling place lookup tool without relying on a geocoder, I'd love to hear it and learn about your precision and recall (and how you're detecting false positives).
Again, if anyone is aware of more recent experiences trying to use these files without an external geocoding service, I'd love to hear them. But in practice the system would fail on about a third of lookups, and of the ones where it succeeded, about one in 40 would match to the wrong segment. |
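Taking the two rough figures above at face value (about a third of lookups failing, and about one in 40 of the remainder matching the wrong segment), the end-to-end outcome can be worked out as follows. Both inputs are approximations from the comment, so the results are only indicative.

```python
from fractions import Fraction

# Approximate figures quoted in the comment above.
lookup_failure_rate = Fraction(1, 3)  # "about a third of lookups" fail
wrong_match_rate = Fraction(1, 40)    # "about one in 40" successes are wrong

answered = 1 - lookup_failure_rate
correct = answered * (1 - wrong_match_rate)  # answered and right
wrong = answered * wrong_match_rate          # answered but wrong (silent)

print(f"no answer: {float(lookup_failure_rate):.1%}")
print(f"correct:   {float(correct):.1%}")
print(f"wrong:     {float(wrong):.1%}")
```

Under those assumptions roughly 65% of all lookups return the right segment, and about 1 in 60 lookups returns a wrong answer with no indication of failure.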
All of that said, the issues you raised about 3-D modeling and odd exceptions are good ones. I could certainly see a case where the state provides geospatial data for the common case, but also a set of "override" street segments to handle these kinds of corner cases. So the serving end would check the user address against an override street segment first and return that if there's a match; barring that, it could fall back on the geospatial data. Also, I know that Ohio is one of the larger "point cloud" providers, so if anyone from Ohio, or any former Ohio election official, is lurking on this thread, I'd love to get their opinion about the challenges of using a geospatial-based feed. |
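The "override first, geospatial fallback" order described above could look roughly like this. Everything here is hypothetical: `normalize` is a crude stand-in for real address standardization, the override table is invented, and `box_lookup` substitutes a simple bounding box for a real polygon lookup.

```python
from typing import Callable, Dict, Optional, Tuple

Point = Tuple[float, float]


def normalize(address: str) -> str:
    """Crude normalization (uppercase, collapse whitespace); a stand-in
    for real address standardization."""
    return " ".join(address.upper().split())


def find_precinct(
    address: str,
    location: Point,
    overrides: Dict[str, str],
    geo_lookup: Callable[[Point], Optional[str]],
) -> Optional[str]:
    """Check override street segments first; fall back to geospatial data."""
    key = normalize(address)
    if key in overrides:
        return overrides[key]
    return geo_lookup(location)


# Hypothetical override for a corner case (e.g. a specific apartment).
overrides = {"12 ELM ST APT 3B": "precinct-override"}


def box_lookup(location: Point) -> Optional[str]:
    """Stand-in for a point-in-polygon lookup: one bounding box."""
    x, y = location
    return "precinct-geo" if 0 <= x <= 4 and 0 <= y <= 4 else None


print(find_precinct("12 elm st  apt 3b", (1.0, 1.0), overrides, box_lookup))
print(find_precinct("14 Elm St", (1.0, 1.0), overrides, box_lookup))
```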
I understand address standardization isn’t infallible, but it does allow EOs to directly specify the address to precinct mapping. Adding spatial data does not alleviate the need to standardize addresses, it just adds additional tasks to the pipeline: (Street Range) Input Address -> Standardize Address -> Lookup Range. Because it is unlikely for third party geocodes to be effectively validated by EOs, I would suggest requiring (requesting?) downstream consumers show the geocoded point to the voter for validation. This would make the geocoding operation less of a black box, and possibly provide a feedback loop for geocoding improvement. Another option would be for EOs to provide their own geocodes, which would be in addition to single point addresses. This of course would result in an even larger VIP feed, but would ensure that voters get trusted info. I would also mention that some states may not have access to election district layers, so street ranges will need to be supported for at least the near future. |
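For reference, the street-range pipeline above (Input Address -> Standardize Address -> Lookup Range) can be sketched as below. The `Segment` fields, the suffix table and the sample ranges are illustrative assumptions rather than the spec's actual schema, and real standardization is far more involved than this regex.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    street: str       # standardized street name
    start: int        # lowest house number in the range
    end: int          # highest house number in the range
    odd_even: str     # "odd", "even", or "both"
    precinct_id: str


# Tiny, hypothetical suffix table; real standardizers cover far more cases.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}


def standardize(address: str) -> Optional[Tuple[int, str]]:
    """Return (house_number, street) or None if the address cannot be parsed."""
    m = re.match(r"\s*(\d+)\s+(.+?)\s*$", address.upper())
    if not m:
        return None
    tokens = [SUFFIXES.get(t, t) for t in m.group(2).replace(".", "").split()]
    return int(m.group(1)), " ".join(tokens)


def lookup(address: str, segments: List[Segment]) -> Optional[str]:
    """Standardize the address, then find the range that covers it."""
    parsed = standardize(address)
    if parsed is None:
        return None
    number, street = parsed
    parity = "even" if number % 2 == 0 else "odd"
    for seg in segments:
        if (seg.street == street and seg.start <= number <= seg.end
                and seg.odd_even in ("both", parity)):
            return seg.precinct_id
    return None


# Hypothetical ranges where odd and even sides vote in different precincts.
segments = [
    Segment("MAIN ST", 100, 198, "even", "p1"),
    Segment("MAIN ST", 101, 199, "odd", "p2"),
]
print(lookup("123 Main Street", segments))
```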
Just to make it clear, we're not saying that VIP6 will remove the ability to specify street segments in the usual way. What we're proposing is an alternate way in addition to the existing method to specify a precinct or precinct split boundary. Election officials would have the following options to specify how voters are associated with polling places:
Any of these would be considered valid methods of saying "the person/people at this location are associated with this precinct". What geospatial data gives the jurisdictions which choose to use it are:
A strong +1 to showing the voter how the lookup tool interpreted their address, regardless of whether that's done via street segments or geospatial normalization. We've seen a number of issues where addresses in Queens, Hawaii, college campuses and dorms, tribal reservations, or other addresses outside the common "123 Main Street, Sometown, State, 12345" format get misinterpreted. Showing how the lookup tool parsed/modified the address is an excellent usability provision (although not something the spec can enforce). As for EO-provided geocodes, we've had mixed success with that. In many cases they're of decent quality, but in many other cases they disagree with our geocoding infrastructure. It requires a lot of effort to cross-check those. And I completely agree with your last point. We don't see street segments going away anytime in the near future. This is simply trying to find another tool for the toolbox, and address a workflow (EMSes using native geospatial layers) which some states have migrated to, and we expect more to migrate to before 2024. At this point VIP5 is six years old (!!!), so we should develop VIP6 with the mindset that it should meet publisher and client needs through at least the 2026 election, if not 2028. |
+1 to building VIP6 for the future. If states and localities are already building out their geo-spatial systems, we'll need a specification that can guide and support that work. Without that, we'll end up with a scenario where states and localities are building their spatial systems in unique ways. Better if we can lead on the format, rather than play catch up. |
I wanted to add a few more thoughts to the point of introducing a dependency on an external geocoding service. While it is theoretically possible to work with VIP 3.x/5.x feeds without using a geocoding service, based on my experience working with VIP feeds and the Civic Information API, I strongly doubt that this is actually achievable in a way that would provide a good user experience. As @JDziurlaj correctly pointed out above, both the street segment and geospatial based approaches require an "address standardization" step. Parsing, understanding and tokenizing a user-provided address is one of the primary utilities that a geocoding service provides, and a functionality that would be very difficult to replicate independent of a geocoder. It wouldn't be hard to get a decent level of coverage with a few heuristics like @jdmgoogle outlined above, but to do this in a reliable way that yields good coverage across the myriad of US addresses essentially amounts to building your own geocoding service. One possible path forward is that we formally acknowledge the silent, inescapable need to use a geocoding service in order to effectively consume and serve VIP feeds. The next version of the VIP specification could be a good opportunity to bring more transparency around this aspect of making voting information useful. I also want to consolidate the few key points of discussion so far in this thread to see where preferences lie, and to also welcome other ideas and feedback.
Thoughts? |
Providing some thoughts below. Thanks for synthesizing the conversation thus far.
This will be discussed with data providers in the coming months. I'll say that supporting spatial data on all jurisdictional records would provide some additional QA capability (since we can compare the 1 jurisdictional shape against the aggregate shape of the
Especially in the early days of v6, I could see a lot of value in collecting both types of data. It would provide some additional QA opportunity to cross-check a geocoded Somewhat separate, there have been a couple of reasons offered as to why a geocoding service is a necessary tool for implementations of VIP data. With that in mind, is there any reason to also add a spatial attribute to |
Are there any localities which represent their |
In the county I worked in, we would take the street segments (lines) from the GIS system, and transform them into tabular structures for the street directory within the VRDB. This is because the GIS system and VRDB were not integrated. |
Thanks for that context. Did the lines from the GIS system contain within their boundaries only the road surface itself, or the road surface and the parcels? |
The street centerlines and parcels were separate layers. We spent months reconciling the TIGER lines to our parcels, adjusting the high and low ranges on each side of the street for each segment. |
Definitely happy to defer to whichever option is easier for data providers. The main motivation for specifying a single county-level shape versus an aggregate of precinct-level ones is that it could make life easier for the data provider.
Where would the StreetSegment's coordinates come from? Is each data provider able to geocode these to a lat/lng? |
I think we've reached a good point in the discussion to consider concrete ideas for modeling geospatial data in a VIP feed. I'd like to start by proposing the following.
TL;DR
See the proposed change in jswiesner#1.
Goal
Model precinct boundaries with geospatial data.
General approach
The most straightforward way to model a precinct boundary in a VIP feed is to extend the ElectoralDistrict element to include a spatial boundary. This would allow the geospatial shape to be a property of the district itself, which would allow for future flexibility to specify the boundary of other types of ElectoralDistricts as well (i.e. Locality, State, special districts for contests). A Precinct element in a VIP feed can reference multiple ElectoralDistricts, so the boundary of a precinct would be represented by the composite of the boundaries for all of its electoral districts.
Specific approaches considered
In thinking about how to model geospatial data in a VIP feed, the following two questions are top of mind.
The table below summarizes the most reasonable approaches considered.
Proposed Approach
I propose we opt for the approach of externalizing shapefiles from the XML feed and referencing these files by name and checksum. While this approach is not the simplest of all the approaches, it offers important benefits that clearly outweigh the downsides. Of all the approaches considered, this approach provides the best solution to ensure that publishing VIP feeds remains as easy as possible for data providers, XML feeds remain manageable in size, shapefiles can easily be inspected for QA, and that feed consumers can seamlessly build optimization into ingestion infrastructure to avoid unnecessary reprocessing.
Schema changes
The schema changes for the proposed approach are staged in jswiesner#1. It would be great to get feedback on the proposed approach, as well as to hear any other proposals. |
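As a sketch of how a consumer might use the name-plus-checksum reference to skip unnecessary reprocessing: the proposal text here does not say which checksum algorithm is used, so SHA-256 is an assumption, and the file name, function names and in-memory "files" are all hypothetical.

```python
import hashlib
from typing import Dict


def file_checksum(data: bytes) -> str:
    """Hex digest of the file contents (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, declared_checksum: str) -> bool:
    """Does the external file match the checksum declared in the feed?"""
    return file_checksum(data) == declared_checksum


def needs_reprocessing(name: str, declared_checksum: str,
                       processed: Dict[str, str]) -> bool:
    """Reprocess only if this file name was never seen with this checksum."""
    return processed.get(name) != declared_checksum


# Hypothetical ingestion run, with in-memory bytes standing in for files.
processed: Dict[str, str] = {}
shapefile_bytes = b"example shapefile contents"
declared = file_checksum(shapefile_bytes)  # value the feed would carry

if verify(shapefile_bytes, declared) and needs_reprocessing(
        "precincts.zip", declared, processed):
    processed["precincts.zip"] = declared  # record after (re)processing

print(needs_reprocessing("precincts.zip", declared, processed))
```

On a later run with an unchanged file, `needs_reprocessing` returns `False`, so the expensive geometry ingestion can be skipped.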
This is implemented with the following: Amended to include a FeatureAttribute element with: And documented with:
Can we go ahead and close this issue? |
Fine with me. |
Closing this issue as a result of #412 (comment). |
Background
The VIP spec should allow for the use of geospatial data to define the boundaries of electoral districts.
StreetSegments currently provide the primary mechanism to associate the location of a registered voter with the polling location(s) they are eligible to vote at. The StreetSegment approach has proven difficult to use on both the producer and consumer side, but geospatial data can offer a scalable alternative. StreetSegments effectively amount to a point cloud of registered voters that index into polling locations. With geospatial data, on the other hand, a single polygon/shape can be used to encompass all registered voters within that district, replacing potentially thousands of StreetSegments.
Use case
A publisher of a VIP feed should be able to define the geographic boundary of an electoral district. For example, the publisher may have a polygon/shape represented by a series of lat/long points defining the boundary of the district. The publisher should then be able to define polling locations in the feed, and associate these locations with their corresponding electoral district(s). In most cases the electoral district would be a precinct, but could also be a precinct split, locality or state. If the producer provides a geographic boundary for the district, StreetSegments are no longer needed to map voters to their eligible polling locations - rather, if a voter's registered address is located within one of these geographic boundaries, then that voter is eligible to vote at any of the polling locations associated with that district.
One possible solution
To kick off the discussion on this topic, one possible solution would be to add element(s) to the ElectoralDistrict element to capture the spatial extent of the district. In the simplest form, this may just be an unbounded number of lat/long points. With this extension, you could think of an ElectoralDistrict object as a pair of OCD ID plus the geographic extent of its boundary.
This solution would plug in well to the Precinct element as-is, which already contains a reference to one or more ElectoralDistricts. In the simplest example, a Precinct element would refer to a single ElectoralDistrict. That ElectoralDistrict would contain info about the OCD ID for the precinct, as well as a spatial extent definition to define the boundary of the precinct. That's all the information that would need to be provided in the feed for consumers to determine voter eligibility for this polling location.
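One way to picture the data model described above, purely as an illustration: an ElectoralDistrict as an OCD ID plus a list of lat/long points, and a Precinct that refers to districts by ID. The Python class and field names, and the OCD ID value, are made up for this sketch and are not the spec's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

LatLng = Tuple[float, float]


@dataclass
class ElectoralDistrict:
    id: str
    ocd_id: str
    # Spatial extent in its simplest form: an unbounded list of lat/long points.
    boundary: List[LatLng] = field(default_factory=list)


@dataclass
class Precinct:
    id: str
    electoral_district_ids: List[str]


def precinct_boundaries(precinct: Precinct,
                        districts: Dict[str, ElectoralDistrict]) -> List[List[LatLng]]:
    """Collect the boundaries of every district the precinct references,
    skipping districts that carry no spatial extent."""
    return [districts[d].boundary
            for d in precinct.electoral_district_ids
            if districts[d].boundary]


# Hypothetical feed contents.
districts = {
    "ed1": ElectoralDistrict(
        id="ed1",
        ocd_id="ocd-division/country:us/state:xx/precinct:example",  # made up
        boundary=[(38.0, -77.0), (38.0, -76.9), (38.1, -76.9), (38.1, -77.0)],
    ),
}
precinct = Precinct(id="p1", electoral_district_ids=["ed1"])
print(len(precinct_boundaries(precinct, districts)))
```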
The Locality and State elements, however, don't presently contain a reference to an ElectoralDistrict, so we would likely need to add this for easier modeling of county- and state-wide polling locations.