[WhoScored] Unable to select English locale #440
Comments
Can you try to run the code in non-headless mode and check what happens in your browser window? Does it say that your IP is blocked or show a captcha?
import soccerdata as sd
ws = sd.WhoScored("ENG-Premier League", "2223", headless=False, no_cache=True)
leagues = ws.read_leagues()
If that's the case, it's a problem with the undetected-chromedriver library, not with soccerdata. You can test with:
import undetected_chromedriver as uc
driver = uc.Chrome(headless=False, use_subprocess=False)
driver.get('https://www.whoscored.com/')
You might find a solution in the issue tracker of the undetected-chromedriver library.
Oh, but now you have a different error. You got past the error in your first comment. Can you share the "tiers.json" file in "/Users/davidegualona/soccerdata/data/WhoScored"?
Yes, of course. Here it is:
You were right, the country names are in Italian in your "tiers.json" file. One option is to add the Italian names to the league mapping, for example:
{
  "ENG-Premier League": {
    "WhoScored": "Inghilterra - Premier League"
  }
}
You might experience more problems in other parts of the code though. Alternatively, you could try to set the default language of your browser to English or configure Selenium accordingly (see https://stackoverflow.com/questions/55150118/trouble-modifying-the-language-option-in-selenium-python-bindings). Let me know what works.
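The first option above can be sketched as a small helper that merges one localized name into a JSON mapping of the shape shown. This is only a sketch under assumptions: `add_league_alias` and its `config_path` argument are hypothetical names, and the location of the league configuration file is not stated in this thread, so check the soccerdata documentation for the correct path before using it.

```python
import json
from pathlib import Path


def add_league_alias(config_path, league_id, source, localized_name):
    """Merge one {league_id: {source: localized_name}} entry into a JSON file.

    Hypothetical helper: `config_path` must point at the league configuration
    file used by your soccerdata installation (the caller has to supply it).
    """
    path = Path(config_path)
    # Start from the existing mapping if the file is already there.
    if path.exists():
        mapping = json.loads(path.read_text(encoding="utf-8"))
    else:
        mapping = {}
    # Add or overwrite the name for this data source only.
    mapping.setdefault(league_id, {})[source] = localized_name
    # ensure_ascii=False keeps accented league names readable in the file.
    path.write_text(
        json.dumps(mapping, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return mapping
```

For example, `add_league_alias("path/to/league_config.json", "ENG-Premier League", "WhoScored", "Inghilterra - Premier League")` would produce the mapping shown above; the path here is a placeholder.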
I've tried the first one, but the code immediately runs into another problem, so I don't think it's viable. As for the other two: I've tried changing the language of Chrome and Safari, but that doesn't resolve it, because the results on the search page are already in Italian. As for the adjustment via your link, I don't think I have the skills to put together a working fix. I tried with some help from ChatGPT, but in an hour we couldn't find a solution. Apparently this:
driver = webdriver.Chrome(chrome_options=options)
needs to be this:
driver = webdriver.Chrome(options=options)
in order to apply the options, but I still don't know how to make that driver work with the scraping part. Nonetheless, ChatGPT made me try this:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
# Set up the WebDriver with language preference
options = webdriver.ChromeOptions()
options.add_argument("--lang=en-US")
driver = webdriver.Chrome(options=options)
# Navigate to the WhoScored page using Selenium
driver.get("https://www.whoscored.com/")  # Replace with the actual URL
# Extract the HTML content after the page has loaded
html_content = driver.page_source
# Continue with requests and BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')
I wanted to see whether this could work before merging it with the soccerdata part, and I found that even though I can make the page load in English, after a second WhoScored refreshes itself and loads in Italian.
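On the `chrome_options=` versus `options=` point: Selenium 4 expects `options=`. Below is a minimal sketch of asking Chrome for English, assuming Selenium 4 and a local Chrome install. `english_chrome_settings` and `build_english_driver` are hypothetical helper names, and how to hand such a driver to soccerdata is not covered here. As this thread suggests, WhoScored may still override the language based on IP location.

```python
def english_chrome_settings(lang="en-US"):
    """Return the Chrome flag and preference that request the given language."""
    short = lang.split("-")[0]
    return {
        # Command-line switch that sets the browser UI locale.
        "arguments": [f"--lang={lang}"],
        # Preference behind the Accept-Language header sent to websites.
        "prefs": {"intl.accept_languages": f"{lang},{short}"},
    }


def build_english_driver(lang="en-US"):
    """Create a Chrome driver with the settings above (needs selenium and Chrome)."""
    # Imported lazily so the settings helper works without selenium installed.
    from selenium import webdriver

    settings = english_chrome_settings(lang)
    options = webdriver.ChromeOptions()
    for argument in settings["arguments"]:
        options.add_argument(argument)
    options.add_experimental_option("prefs", settings["prefs"])
    # Selenium 4 takes `options=`; `chrome_options=` is the older keyword.
    return webdriver.Chrome(options=options)
```

Calling `build_english_driver()` and then `driver.get("https://www.whoscored.com/")` is one way to check whether the language preference is honored at all.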
You can also try to redirect to the English version by simulating a click on the language menu at the top left.
import soccerdata as sd
ws = sd.WhoScored("ENG-Premier League", "2223", headless=False, no_cache=True)
ws._driver.get("https://www.whoscored.com/")
ws._driver.execute_script("location = 'https://whoscored.com/'")
leagues = ws.read_leagues()
It does what it is supposed to do, but WhoScored still refreshes itself and loads in Italian.
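To confirm the revert programmatically rather than by watching the window, one could poll the page source for a localized string. A rough sketch, assuming the Italian page contains a string such as "Inghilterra" (taken from the `tiers.json` entry earlier in this thread); the helper names and polling parameters are hypothetical.

```python
import time


def first_localized_marker(page_source, markers=("Inghilterra",)):
    """Return the first marker found in the HTML, or None if none is present."""
    for marker in markers:
        if marker in page_source:
            return marker
    return None


def page_reverted(get_page_source, markers=("Inghilterra",), checks=5, delay=1.0):
    """Poll the page a few times and report whether a localized marker shows up.

    `get_page_source` is any zero-argument callable returning the current HTML.
    """
    for _ in range(checks):
        if first_localized_marker(get_page_source(), markers) is not None:
            return True
        time.sleep(delay)
    return False
```

With the driver from the snippet above, this could be called as `page_reverted(lambda: ws._driver.page_source)`.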
Is there any way in which you can switch to English when browsing the website manually?
There's a toggle where you can choose the language, but if I set it to EN it automatically switches back to IT.
Anyway, I've resolved it by subscribing to NordVPN; right now it seems worth the money given the effort.
Ok, great! If the locale is hard-coded based on IP location, I think the only possible fixes are indeed translating some parts of the implementation or using a VPN. #420 is not yet released. If you can't wait for the next release, you can install the latest build from test.pypi.
Sorry to bring this back, but I have not been able to scrape WhoScored for months.
To explain: since April I had everything working properly. Then I switched from an Intel MacBook Pro to an M2 Mac mini, reinstalled Tor via Homebrew, and set up Anaconda with a dedicated environment for soccerdata and its dependencies. Since then I have not been able to scrape a single file from WhoScored, while FBref scraping, at least, works flawlessly. I've tried everything recommended in previously opened issues about this problem; the only thing I have not tried so far is a VPN, because I'd rather not spend money right now to make it work. If anyone is able to help me get it working, feel free to contact me personally on Twitter: @gualanodavide.
Thanks a lot to anyone who can help.