Limit the number of types for electoral_district #30

Closed
jungshadow opened this issue Mar 4, 2015 · 17 comments

Comments

@jungshadow
Collaborator

The element electoral_district.type is currently a string, which allows any value. It could instead be an enumerated list of types (e.g. state senate, congressional, etc.). The data in state systems isn't standardized, so this could be difficult to structure on VIP's end, but it's easy to see the benefits of structuring the information. This issue is related to #29.

@jungshadow jungshadow added this to the Up for Debate milestone Mar 4, 2015
@pstenbjorn
Contributor

If the use of OCD-IDs is pursued, the enumerations for electoral_district.type might be inherited from the OCD-ID keys. For example, in the OCD-ID
ocd-division/country:us/state:ma/county:norfolk/school_district:cohasset
there are four district types identified: country, state, county, and school_district.
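
A minimal sketch of how the type keys could be read out of an OCD-ID, assuming every segment after the leading prefix has the form `type:value`. The function name `ocd_id_types` is hypothetical and not part of any spec or library.

```python
def ocd_id_types(ocd_id: str) -> list[str]:
    """Return the type keys of an OCD-ID, in order of appearance."""
    # Drop the first path segment (the division prefix); keep the rest.
    segments = ocd_id.split("/")[1:]
    # Each remaining segment is assumed to look like "type:value".
    return [segment.split(":", 1)[0] for segment in segments]


example = "ocd-division/country:us/state:ma/county:norfolk/school_district:cohasset"
print(ocd_id_types(example))
```

Only the keys are kept here; the values (us, ma, norfolk, cohasset) identify the specific division and would not belong in a type enumeration.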

As @jungshadow mentioned, each state may have its own localization of a type and we have discussed adding electoral_district.localization to accommodate this.

@jungshadow jungshadow self-assigned this Mar 12, 2015
@jungshadow jungshadow modified the milestones: Version 5.0, Up for Debate Mar 12, 2015
@jdmgoogle jdmgoogle assigned jdmgoogle and unassigned jungshadow Apr 15, 2015
@jdmgoogle
Contributor

In my opinion there are two ways to do this. The first way is to use a subset of the enums in the ReportingUnit type in 1622.2; this list includes:

City
Congressional
Council
County
Judicial
Local
Locality
Municipality
National
Other
School
Special
State
StateHouse
StateSenate
Township
Utility
Ward
Water

Option #2 is to use the set of types used in OCD ID keys, per @pstenbjorn's suggestion. There's obviously a lot of overlap between the two, but I'd like to poll people here for their recommendation.
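
To illustrate what an enumerated type buys over a free string, here is a rough Python sketch of the first option. The values are the ones listed in this comment; the names `DistrictType` and `is_valid_district_type` are hypothetical and only stand in for whatever the schema would define.

```python
from enum import Enum


class DistrictType(Enum):
    """Subset of the 1622.2 ReportingUnit types listed in the comment above."""
    CITY = "City"
    CONGRESSIONAL = "Congressional"
    COUNCIL = "Council"
    COUNTY = "County"
    JUDICIAL = "Judicial"
    LOCAL = "Local"
    LOCALITY = "Locality"
    MUNICIPALITY = "Municipality"
    NATIONAL = "National"
    OTHER = "Other"
    SCHOOL = "School"
    SPECIAL = "Special"
    STATE = "State"
    STATE_HOUSE = "StateHouse"
    STATE_SENATE = "StateSenate"
    TOWNSHIP = "Township"
    UTILITY = "Utility"
    WARD = "Ward"
    WATER = "Water"


def is_valid_district_type(raw: str) -> bool:
    """Return True only if raw exactly matches one of the enumerated values."""
    try:
        DistrictType(raw)
    except ValueError:
        return False
    return True


print(is_valid_district_type("StateSenate"))   # an enumerated value
print(is_valid_district_type("state senate"))  # a free-form string a feed might send today
```

With a plain string both inputs would be accepted, and every consumer would have to normalize them on its own.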

Cc @jungshadow @cjerdonek @nomadaisy @pkoms @demcg @jktomer @jen-tolentino

@cjerdonek
Contributor

For convenience, here is the current definition:

<xs:element name="ElectoralDistrict">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Name" type="xs:string" />
      <xs:element name="Type" type="xs:string" minOccurs="0" />
      <xs:element name="Number" type="xs:integer" minOccurs="0" />
      <xs:element name="ExternalIdentifierID" type="xs:IDREF" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID" use="required" />
  </xs:complexType>
</xs:element>

Is it the intent that there be a way to group districts together of a common type (e.g. all supervisorial districts)? And is this what the "Type" name is for (via string matching), or is it just for display?

@pstenbjorn
Contributor

For reference purposes, these are the implicit types in the OcdId model. @cjerdonek I know that currently there are strict enumerations for election type used for presentation logic on Google's side.

court_of_appeals
territory
country
district
state
ward
transit
municipio
school_district
parish
circuit_court
sldu
district_court
chancery_court
superior_court
supreme_court
place
executive_district
county
census_area
borough
sldl
cd
anc
region
sewer
commissioner_district
school_board_district
constable_district
council_district
precinct

@cjerdonek
Contributor

One reason I'm asking is if a reporting agency or feed has more than one group of districts of the same type, say two collections of "council districts" (one for each of two cities), it doesn't seem like the spec has a structured way to group the districts within each collection. If the type for both is "council_district" or "Council," then it doesn't seem like there is a way to associate all the district numbers together for each separate group. It looks like what is needed is some kind of district type object to reference and not just a string. The object could have the type enumeration (e.g. "council_district") as well as a display name (e.g. "Oakland City Council Districts" and "Berkeley City Council Districts").

@jdmgoogle
Contributor

Two different electoral districts can be of the same type. We (Google) just use the type to help us figure out (a) where contests should go on the ballot relative to each other, and (b) de-duping contests when we get data from multiple sources. We also get ballot data from non-VIP sources, and when we're trying to merge the two we use the district type to help bin them together.

I guess I'm not sure where you see the problem. Could you provide a concrete XML snippet that you feel represents reality but violates the spec?

Also, now that we've seen the list of types for each, what is everyone's vote on the matter?
(1) Use the 1622.2 ReportingUnitType enums; or
(2) Use the OCD-ID types?

@cjerdonek
Contributor

> I guess I'm not sure where you see the problem. Could you provide a concrete XML snippet

Here is one:

<!-- Start Oakland City Council districts -->
<ElectoralDistrict id="1">
  <name>District 1</name>
  <type>Council</type>
  <number>1</number>
</ElectoralDistrict>
<ElectoralDistrict id="2">
  <name>District 2</name>
  <type>Council</type>
  <number>2</number>
</ElectoralDistrict>
<!-- Start Berkeley City Council districts -->
<ElectoralDistrict id="3">
  <name>District 1</name>
  <type>Council</type>
  <number>1</number>
</ElectoralDistrict>
<ElectoralDistrict id="4">
  <name>District 2</name>
  <type>Council</type>
  <number>2</number>
</ElectoralDistrict>

Given the XML above (which uses all of the available elements), there is no way to, say, get all of the ElectoralDistrict instances that correspond to Oakland City Council districts (or Berkeley Council districts). You can only get all city council districts generically.
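
The limitation can be checked with a short sketch using Python's standard `xml.etree.ElementTree`. The `<Feed>` wrapper element is an assumption added only so that the four districts parse as one document; it is not taken from the spec.

```python
import xml.etree.ElementTree as ET

# The four districts from the snippet above, wrapped in a hypothetical
# <Feed> root element so the fragment parses as a single document.
FEED = """
<Feed>
  <ElectoralDistrict id="1"><name>District 1</name><type>Council</type><number>1</number></ElectoralDistrict>
  <ElectoralDistrict id="2"><name>District 2</name><type>Council</type><number>2</number></ElectoralDistrict>
  <ElectoralDistrict id="3"><name>District 1</name><type>Council</type><number>1</number></ElectoralDistrict>
  <ElectoralDistrict id="4"><name>District 2</name><type>Council</type><number>2</number></ElectoralDistrict>
</Feed>
"""

root = ET.fromstring(FEED.strip())
districts = root.findall("ElectoralDistrict")

# Filtering on the type string returns every council district in the feed.
council_ids = [d.get("id") for d in districts if d.findtext("type") == "Council"]

# Name, type, and number are identical across the two cities, so four
# districts collapse into only two distinct descriptions.
descriptions = {
    (d.findtext("name"), d.findtext("type"), d.findtext("number"))
    for d in districts
}

print(council_ids)
print(len(descriptions))
```

Nothing in the parsed elements says which city a district belongs to; that information lived only in the XML comments.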

It seems like each ElectoralDistrict object should point to an ElectoralDistrictType object, which would serve to group the numbered districts together. For example, for the above XML the types might look like:

<!-- Start Oakland City Council districts -->
<ElectoralDistrictType id="1">
  <name>Oakland City Council</name>
  <type>Council</type>
</ElectoralDistrictType>
<ElectoralDistrictType id="2">
  <name>Berkeley City Council</name>
  <type>Council</type>
</ElectoralDistrictType>

The enumeration you're discussing would then go on these objects, rather than being repeated on every one of the component districts.

@jdmgoogle
Contributor

I guess I'm still not seeing the problem. Precincts (and their splits) provide references to all the electoral districts in which they're located, so in the first example the precincts in Oakland would simply have references to electoral districts 1 and 2, whereas the precincts in Berkeley would have references to districts 3 and 4.

In order to get the ballot for a given precinct (or split) we iterate over all the electoral districts referenced by the precinct (split), bin them by type, and then sort them.
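
A rough sketch of the bin-and-sort step described here, not Google's actual implementation. The `BALLOT_ORDER` list, the dictionary shape of a district, and the function name `bin_and_sort` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical ordering of district types on a ballot; not from the spec.
BALLOT_ORDER = ["National", "State", "Congressional", "StateSenate",
                "StateHouse", "County", "City", "Council", "School"]


def bin_and_sort(districts):
    """Group districts by type, then order the groups by BALLOT_ORDER."""
    bins = defaultdict(list)
    for district in districts:
        bins[district["type"]].append(district)

    rank = {district_type: i for i, district_type in enumerate(BALLOT_ORDER)}
    # Types missing from BALLOT_ORDER sort last, alphabetically.
    ordered_types = sorted(bins, key=lambda t: (rank.get(t, len(BALLOT_ORDER)), t))
    # Within one type, sort districts by name for a stable display order.
    return [(t, sorted(bins[t], key=lambda d: d["name"])) for t in ordered_types]


# Districts referenced by one hypothetical precinct.
precinct_districts = [
    {"name": "District 2", "type": "Council"},
    {"name": "California", "type": "State"},
    {"name": "District 1", "type": "Council"},
]
for district_type, members in bin_and_sort(precinct_districts):
    print(district_type, [m["name"] for m in members])
```

The precinct may reference more than one district of the same type; they simply land in the same bin.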

@cjerdonek
Contributor

I see my point still isn't coming across. Districts within a given numbering sequence have an obvious sibling relationship with one another that isn't currently reflected in the spec. That relationship answers the question, "What am I a district of?" It lets you do things like list out the other districts in the family.

In the example I gave above, you only knew that the "parent" areas were Berkeley and Oakland because I told that to you. That information isn't currently encoded in the XML (unless you have "Council" hard-coded to mean "look at the City containing it").

If I gave you something like the below and asked which one is the "District 4" that corresponds to the "District 3," how would you be able to answer that? I don't think you'd be able to in general because you don't know the parent area.

<ElectoralDistrict id="1">
  <name>District 3</name>
  <type>district</type>
  <number>3</number>
</ElectoralDistrict>
<ElectoralDistrict id="100">
  <name>District 4</name>
  <type>district</type>
  <number>4</number>
</ElectoralDistrict>
<ElectoralDistrict id="200">
  <name>District 4</name>
  <type>district</type>
  <number>4</number>
</ElectoralDistrict>

@jdmgoogle
Contributor

> Districts within a given numbering sequence have an obvious sibling relationship with one another that isn't currently reflected in the spec

Ah, okay. I see what you're saying. However, this is Working As Intended. There is not and has never been any attempt to determine whether two districts are siblings of each other (or have any other relation to each other). If the name of a district is simply "District 3" (instead of "Oakland District 3" or "Berkeley District 3"), then we see that as a poorly constructed feed rather than a fundamental flaw in the XSD.

@cjerdonek
Contributor

Okay, so it's out of scope. That's fine (though I do think the information would be useful and more DRY). One more question. With the proposed enum, would it be a problem if a precinct is contained in two electoral districts with the same type, or do you insist that there be at most one of each type?

@jdmgoogle
Contributor

It probably would be useful in a way, but yes, it's out of scope for VIP (or at the very least VIP 5.0).

No, there's no problem if a precinct links to more than one electoral district of the same type.

@jdmgoogle
Contributor

Any comments, @pstenbjorn @jktomer or @jungshadow?

@pstenbjorn
Contributor

@jdmgoogle I think that our current use cases rely on other relationships of electoral districts to set their context. If we have two districts named exactly the same thing, the context of the relationships should define the scope of the district, so I don't see a logical problem. As far as display goes, we had discussed allowing the use of a local district type name alias that could address visualization challenges.

@jdmgoogle
Contributor

I'll let @jktomer weigh in on this, too, but here's my current thinking:

We should adopt the 1622.2 types, plus an "OtherType" string.

  • This will more easily enable compatibility with 1622.2;
  • We (Google) feel relatively confident in our ability to map between 1622.2, OCDID, and other types; and
  • If the data provider includes an OCDID as one of the ExternalIdentifiers, that makes things even easier.

Plus we really only use this (at the moment) to reconcile ballot data between a VIP feed and other data sources (e.g., the Ballot Information Project).

Hence, my vote is just to use the 1622.2 set of enumerations.

@jungshadow
Collaborator Author

I like the simplicity of the 1622.2 set of enumerations. The opportunity to use an OCDID on a particular piece of geography is available through ExternalIdentifier. I think starting from a basic list helps us move toward a generalized glossary of terms.

@jktomer

jktomer commented Apr 21, 2015

+1 for the 1622 set. The important thing, from our perspective, is that we can determine whether a given VIP district is equivalent to a given district we get from another source (say, BIP). That depends on two factors:

  • the reliable presence of this field in VIP feeds, and
  • an easy mapping between its values and the equivalent information we have about districts from our other sources.

The first condition is vastly improved by having a minimal set of choices, which speaks strongly in favor of using the 1622 set. The second requires that we have enough detail to adequately cover all the distinct districts we know about in any given election, and that the choices we have map semantically well enough to any other values we have that translation is easy. The 1622 set seems adequate on both counts to me.
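
As a sketch of the second condition, a translation table between another source's vocabulary and the 1622 values might look like the following. The table is a hypothetical, partial mapping from OCD-ID type keys, written for illustration only; it is not an agreed correspondence, and `district_type_for` is an invented name.

```python
# Hypothetical, partial mapping from OCD-ID type keys to 1622.2 types.
OCD_TO_1622 = {
    "country": "National",
    "state": "State",
    "county": "County",
    "cd": "Congressional",
    "sldu": "StateSenate",
    "sldl": "StateHouse",
    "school_district": "School",
    "council_district": "Council",
    "ward": "Ward",
}


def district_type_for(ocd_id: str) -> str:
    """Map the most specific division in an OCD-ID to a 1622.2 type."""
    # The last "type:value" segment names the most specific division.
    last_segment = ocd_id.rsplit("/", 1)[-1]
    type_key = last_segment.split(":", 1)[0]
    # Anything without a known counterpart falls back to "Other".
    return OCD_TO_1622.get(type_key, "Other")


print(district_type_for("ocd-division/country:us/state:ca/sldu:9"))
print(district_type_for("ocd-division/country:us/state:ca/transit:bart"))
```

The smaller the target set, the shorter such a table stays, which is the trade-off described above.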
