Create 2019 Indian Vidhan Sabha OCDIDs. #174

rahul-nath · 2019-10-03T22:19:54Z

In this PR I've included a script that I created to generate OCD IDs specifically for the Indian Vidhan Sabha elections of Maharashtra and Haryana. The OCD IDs generated are for constituencies of these states, which include their districts. ~~There were some decisions made regarding district names due to discrepancies between the districts in wikipedia pages for [Maharashtra districts](https://en.wikipedia.org/wiki/List_of_districts_of_Maharashtra#Districts) and [constituencies](https://en.wikipedia.org/wiki/List_of_constituencies_of_the_Maharashtra_Legislative_Assembly) and Haryana districts and [constituencies](https://en.wikipedia.org/wiki/List_of_constituencies_of_the_Haryana_Legislative_Assembly). The district pages were deferred to over the analogous columns in the constituency pages after some research. They are as follows:~~

- ~~Yamunanagar is used over Yamuna Nagar~~
- ~~Gondia is used over Gondiya~~
- ~~Gurugram is used over Gurgaon~~
- ~~Nuh is used over Mewat~~

~~where applicable.~~

UPDATE: https://affidavit.eci.gov.in was determined to be the source of truth for constituency and district names.

Additionally, no changes were made to the aliases file located in identifiers/countries-in as it's unclear if that was necessary.

rahul-nath · 2019-10-03T22:22:55Z

@jamesturk @jpmckinney @jdmgoogle this is necessary for the imminent Vidhan Sabha elections. I am unable to assign reviewers so please assign yourself.

rahul-nath · 2019-10-09T19:47:38Z

Just want to bump this to make sure it's been seen.

jpmckinney · 2019-10-09T23:37:11Z

Format looks good to me – I haven't checked against source of truth.

jdmgoogle

Thanks for putting this together. I just have some questions around the script, the naming structure, and the original source of truth.

jdmgoogle · 2019-10-14T14:24:49Z

scripts/create_ocd_ids.py

+  for parent in sorted(parent_set, key=lambda x: x.split(",")[-1]):
+    print(parent)
+
+for state_abbr, state  in contests.items():


Please sort the output so it's easier to read.

rahul-nath · 2019-10-17

Should the output be sorted by state name and then district name, or by state name and then constituency?
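One way to settle the ordering question is to sort on a tuple key, so that state, district, and constituency are compared in that order. The sketch below is illustrative only: the `rows` sample and the `sort_key` helper are hypothetical and are not taken from `create_ocd_ids.py`.

```python
# Hypothetical (state, district, constituency) rows; not the script's real input.
rows = [
    ("Maharashtra", "Sangli", "Khanapur"),
    ("Haryana", "Gurugram", "Sohna"),
    ("Maharashtra", "Pune", "Kasba Peth"),
    ("Haryana", "Gurugram", "Badshahpur"),
]


def sort_key(row):
    """Order by state, then district, then constituency, ignoring case."""
    state, district, constituency = row
    return (state.lower(), district.lower(), constituency.lower())


sorted_rows = sorted(rows, key=sort_key)
for state, district, constituency in sorted_rows:
    print(state, district, constituency, sep=", ")
```

Sorting on the full tuple means the district-versus-constituency choice only changes which element comes second in the key.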

jdmgoogle · 2019-10-14T14:28:12Z

scripts/create_ocd_ids.py

+  # format hardcoded OCD ID
+  global new_file
+  ocd_id = "ocd-division/country:{}/state:{}/district:{}/cd:{}"
+  rest = "state {} district {} {} constituency {}"


The naming here is a bit awkward. Maybe

${constituency} constituency, ${district} district, ${state}

E.g.,

Khanapur constituency, Sangli district, Maharashtra

rahul-nath · 2019-10-17

Will do. I think I'll use semicolons in place of the commas here, unless we should add new columns for constituency, district, and state. (Can we do that? It could potentially serve a future purpose.)
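On the comma question: if the file is written with Python's `csv` module, a field that contains commas is quoted automatically, so the suggested name format could probably be kept as is. A minimal sketch, assuming a two-column `id,name` layout like the script's `columns` list; the `format_name` helper and the sample ID are hypothetical.

```python
import csv
import io


def format_name(constituency, district, state):
    """Build the display name in the order suggested in the review."""
    return "{} constituency, {} district, {}".format(constituency, district, state)


name = format_name("Khanapur", "Sangli", "Maharashtra")

# Write to an in-memory buffer; the ID below is an illustrative placeholder.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "name"])
writer.writerow(["ocd-division/country:in/state:mh/district:sangli/cd:khanapur", name])
csv_text = buffer.getvalue()

# Reading the text back yields the original name, commas included.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(csv_text)
```

With the default quoting mode the writer wraps only the fields that need it, so the ID column stays unquoted.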

jdmgoogle · 2019-10-14T14:29:46Z

scripts/create_ocd_ids.py

+    "Kasba Peth": "Kasbapeth"
+}
+
+const_replacements = {


Why are these being replaced?

rahul-nath · 2019-10-17

The electoral districts in India have changed frequently in the last decade, so there's a lot of erroneous information out there. Some of it has made its way to Wikipedia, which is unfortunately the only place I'd found abbreviations. To reconcile the differences between the ultimate source of truth (https://affidavit.eci.gov.in) and the spreadsheet of abbreviations to districts, I use this dictionary. (Actually, this particular set of replacements is going to be taken out of this PR, as a more concrete source of constituencies has been found with all corresponding states and districts: https://electoralsearch.in)
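For context, a replacement dictionary of this kind is typically applied as a lookup that falls back to the original spelling. The sketch below is an assumption about how such a mapping might be used, not the script's actual code: `normalize_name` is a hypothetical helper, the dictionary name is made up, and the only entry shown is the `"Kasba Peth": "Kasbapeth"` pair visible in the diff.

```python
# Single entry copied from the diff; the dictionary name here is hypothetical.
replacements = {
    "Kasba Peth": "Kasbapeth",
}


def normalize_name(name, mapping):
    """Return the preferred spelling if one is listed, else the name unchanged."""
    cleaned = name.strip()
    return mapping.get(cleaned, cleaned)


print(normalize_name("Kasba Peth", replacements))
print(normalize_name("Khanapur", replacements))
```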

jdmgoogle · 2019-10-14T14:30:06Z

scripts/create_ocd_ids.py

+contests = {"hr": "Haryana", "mh": "Maharashtra"}
+columns = ["id", "name"]
+country = "in"
+election = "Vidhan Sabha"


OCD-IDs should be independent of any one election.

rahul-nath · 2019-10-17

Looking at the OCD-IDs for the Lok Sabha elections, it looks like the name of the election was included in the file containing the election (I was pattern matching). I'll take this out and generalize this script better.
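To illustrate the point about election independence: the identifier can be built from geography alone, with no election name among the inputs. A minimal sketch using the template string from the diff; the `build_ocd_id` and `slugify` helpers are hypothetical, and the slug rule (lowercase, spaces to underscores) is an assumption rather than the project's confirmed convention.

```python
OCD_TEMPLATE = "ocd-division/country:{}/state:{}/district:{}/cd:{}"


def slugify(value):
    """Assumed slug rule: trim, lowercase, and join words with underscores."""
    return "_".join(value.strip().lower().split())


def build_ocd_id(country, state_abbr, district, constituency):
    """Build an ID from geographic parts only; no election name is involved."""
    return OCD_TEMPLATE.format(
        country, state_abbr, slugify(district), slugify(constituency)
    )


print(build_ocd_id("in", "mh", "Sangli", "Khanapur"))
```

Because nothing election-specific enters the function, the same IDs can be reused for any later contest held in the same divisions.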

jdmgoogle · 2019-10-14T14:30:48Z

scripts/create_ocd_ids.py

+
+  for c_row in consts:
+    # source of truth on district names:
+    # https://affidavit.eci.gov.in/


What exactly is being pulled from there? What's the input CSV that this script is munging?

rahul-nath · 2019-10-17

I will include another script I made that fetches data and creates constituency CSVs for each state from the new source. District and constituency information is pulled from that website. I'll detail the expected format of the district abbreviation in a comment, but that information must be retrieved from elsewhere; in this case, the abbreviations were taken from Wikipedia and manually put into a spreadsheet, without the use of any provided script.

jdmgoogle · 2019-10-18

While that's useful to have, I'd prefer to split that out into a separate PR and have this one focus on only the OCD-IDs.

rahul-nath · 2019-10-18

Great. I split the PRs, with this one containing the OCD IDs and another that adds the scripts that generate them.

jdmgoogle

Once the names of the constituencies are updated we should be good to go. Thanks.

jdmgoogle

@jpmckinney

rahul-nath · 2019-10-22T14:03:43Z

> Format looks good to me – I haven't checked against source of truth.

Awesome, sounds good @jpmckinney. Let me know if there are any changes that need to be made to the additional OCD-IDs.

Create 2019 Indian Vidhan Sabha OCDIDs.

4db6f36

rahul-nath added 3 commits October 7, 2019 11:28

Update OCDIDs according to EC of India.

af1be76

Update create ocd id script.

35f2298

Rollback commits and add correct OCD IDs for Indian Vidhan Sabha elections.

2ad61c6

jpmckinney requested a review from jdmgoogle October 9, 2019 23:36

jpmckinney closed this Oct 9, 2019

jpmckinney reopened this Oct 9, 2019

jdmgoogle reviewed Oct 14, 2019

jdmgoogle reviewed Oct 18, 2019

Remove script and add Delhi, Bihar, and Jharkhand OCD-IDs

f07af3d

rahul-nath requested a review from jdmgoogle October 18, 2019 18:44

jdmgoogle approved these changes Oct 21, 2019

jpmckinney merged commit 6587ba6 into opencivicdata:master Oct 23, 2019
