Fix Mississauga scraper since new mayor has been elected #434

Merged on Nov 4, 2024 (2 commits)

Conversation

seamuslee001
Contributor

ping @jpmckinney

This accounts for the newly elected mayor (https://www.mississauga.ca/council/city-council-members/) and for the Ward 8 page, whose title now includes "Deputy Mayor" as well as "Councillor".


def councillor_data(self, url):
    page = self.lxmlize(url)

    name_district = page.xpath('//*[@id="com-main"]/div/div/div/h1/text()')[0]
    hyphen = name_district.find("Councillor")
    if hyphen == -1:
        hyphen = 9
Member

What is the significance of the magic numbers 9, 27 and 8?

Contributor Author

@jpmckinney 9 is the length of "Ward 5 –", because unhelpfully the council didn't include the word "Councillor" in Natalie's page title: https://www.mississauga.ca/council/city-council-members/ward-5-councillor-natalie-hart/ . 27 is the length of "Councillor and Deputy Mayor" in Matt's title; I was hitting errors if the "and Deputy Mayor" part was not removed: https://www.mississauga.ca/council/city-council-members/ward-8-councillor-matt-mahoney/ . 8 is the length of "Mayor - ", which needs to be removed from Carolyn's page title: https://www.mississauga.ca/council/city-council-members/mayor-carolyn-parrish/

I wasn't sure whether there was a better way to handle these one-off cases.

Member

I suggest doing things like variable.replace("Councillor and Deputy Mayor", "").strip(), that way if there is no match, nothing special happens. Right now, if there's no match, the string will be cut at a strange index.
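
A minimal sketch of that suggestion, for illustration only. The helper name `strip_titles`, the phrase list, and the sample headings are assumptions, not the code that was merged, and the real page titles may be formatted differently:

```python
# Hypothetical phrases to drop from a page heading. Longer phrases come first
# so that "Councillor and Deputy Mayor" is removed before the bare
# "Councillor" could match part of it.
TITLE_PHRASES = ("Councillor and Deputy Mayor", "Councillor", "Mayor - ")


def strip_titles(heading):
    """Remove known title phrases; a phrase that does not occur is a no-op."""
    for phrase in TITLE_PHRASES:
        heading = heading.replace(phrase, "")
    # Collapse doubled spaces left behind and trim both ends.
    return " ".join(heading.split())


# Sample headings are assumed, not copied from the live site.
print(strip_titles("Ward 8 Councillor and Deputy Mayor – Matt Mahoney"))
print(strip_titles("Ward 5 – Natalie Hart"))
print(strip_titles("Mayor - Carolyn Parrish"))
```

With a fixed index such as `heading[27:]`, a heading that lacks the expected phrase still loses its first 27 characters; with `replace`, it passes through unchanged.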

@seamuslee001
Contributor Author

@jpmckinney I think this should be better now and hopefully less strange

jpmckinney merged commit 3e2503d into opencivicdata:master on Nov 4, 2024
3 checks passed
seamuslee001 deleted the mississauga_fix_2 branch on November 4, 2024 at 21:07