Fix Mississauga scraper since new mayor has been elected #434
Branch updated: 9cadfea to 5cd9c90
Review comment on ca_on_mississauga/people.py (outdated diff):
```python
def councillor_data(self, url):
    page = self.lxmlize(url)

    name_district = page.xpath('//*[@id="com-main"]/div/div/div/h1/text()')[0]
    hyphen = name_district.find("Councillor")
    if hyphen == -1:
        hyphen = 9
```
What is the significance of the magic numbers 9, 27 and 8?
@jpmckinney For 9, it was the length of "Ward 5 –", because, unhelpfully, the council didn't include the word "Councillor" in Natalie's page title (https://www.mississauga.ca/council/city-council-members/ward-5-councillor-natalie-hart/). For 27, it was the length of "Councillor and Deputy Mayor" in Matt's title, as I was hitting errors if "and Deputy Mayor" was not removed along with "Councillor" (https://www.mississauga.ca/council/city-council-members/ward-8-councillor-matt-mahoney/). For 8, it is the length of "Mayor - ", to remove it from Carolyn's page title (https://www.mississauga.ca/council/city-council-members/mayor-carolyn-parrish/).

I wasn't sure if there were better ways to handle these one-off cases.
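The three numbers can be checked directly against the fragments quoted above. A quick sketch; note that "Ward 5 –" as quoted is only 8 characters, so this assumes the real prefix ends with a space after the en dash (that trailing space is an assumption, not confirmed from the page):

```python
# Title fragments as quoted in the comment above.
# ASSUMPTION: WARD_5_PREFIX has a trailing space; without it, len() is 8, not 9.
WARD_5_PREFIX = "Ward 5 – "  # en dash (U+2013), as quoted
DEPUTY_MAYOR_TITLE = "Councillor and Deputy Mayor"
MAYOR_PREFIX = "Mayor - "  # plain hyphen, as quoted

# len() on a Python str counts code points, so the en dash counts as one.
for fragment in (WARD_5_PREFIX, DEPUTY_MAYOR_TITLE, MAYOR_PREFIX):
    print(repr(fragment), len(fragment))
```

Spelling the fragments out like this documents where 9, 27 and 8 come from, but slicing by these lengths still assumes every title starts with exactly that fragment.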
I suggest doing things like `variable.replace("Councillor and Deputy Mayor", "").strip()`; that way, if there is no match, nothing special happens. Right now, if there's no match, the string is cut at an arbitrary index.
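A minimal sketch of the two behaviours described above, using made-up title strings (not copied from the live pages) and two hypothetical helpers, `clean_title()` and `slice_title()`:

```python
# Hypothetical titles for illustration only; the real <h1> text may differ.
MATCHING = "Councillor and Deputy Mayor Matt Mahoney"
NON_MATCHING = "Natalie Hart"


def clean_title(title: str) -> str:
    """Suggested approach: remove the label if present, then trim whitespace."""
    return title.replace("Councillor and Deputy Mayor", "").strip()


def slice_title(title: str) -> str:
    """Fixed-index approach: always drop the first 27 characters."""
    return title[27:].strip()


print(clean_title(MATCHING))      # Matt Mahoney
print(slice_title(MATCHING))      # Matt Mahoney
print(clean_title(NON_MATCHING))  # Natalie Hart (unchanged)
print(slice_title(NON_MATCHING))  # prints an empty line: the whole name is cut away
```

Both helpers agree when the label is present; only the `replace()` version leaves a title without the label intact.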
Branch updated: d9ebfb9 to 0922876
@jpmckinney I think this should be better now, and hopefully less strange.
ping @jpmckinney
This deals with the fact that there is a new mayor (https://www.mississauga.ca/council/city-council-members/) and that, for Ward 8, the page title includes "Deputy Mayor" as well as "Councillor".