[WhoScored] Unable to select English locale #440

Gibranium · 2023-12-14T15:38:39Z

Sorry to bring this back, but I have not been able to scrape WhoScored for months.
To explain in full: since April everything was working properly. Then I switched from an Intel MacBook Pro to a Mac mini M2, reinstalled Tor via Homebrew, and set up Anaconda with a dedicated environment containing only soccerdata and its dependencies. Since then I have not been able to scrape a single file from WhoScored, while FBref scraping, at least, works flawlessly. I've tried everything recommended in previously opened issues about this problem; the only thing I haven't tried so far is a VPN, because I'd rather not spend money right now to make it work. If anyone is able to help me get it working, feel free to contact me personally on Twitter: @gualanodavide.
Thanks a lot to anyone who can help.

probberechts · 2023-12-14T16:19:07Z

Can you try to run the code in non-headless mode and check what happens in your browser window? Does it say that your IP is blocked or show a captcha?

import soccerdata as sd
ws = sd.WhoScored("ENG-Premier League", "2223", headless=False, no_cache=True)
leagues = ws.read_leagues()

If that's the case it's a problem with the undetected-chromedriver library, not with soccerdata. You can test with:

import undetected_chromedriver as uc
driver = uc.Chrome(headless=False, use_subprocess=False)
driver.get('https://www.whoscored.com/')

You might find a solution in the issue tracker of the undetected-chromedriver library.

Gibranium · 2023-12-14T19:30:41Z

I did not have any problem in my browser window: it opened WhoScored and didn't ask for a captcha. But since I'm in Italy, I think it doesn't find the same names in the link as the ones the code requests, so it fails. Am I right, and how can I solve it?

probberechts · 2023-12-14T21:34:38Z

Oh, but now you have a different error. You got past the error in your first comment. Can you share the "tiers.json" file in "/Users/davidegualona/soccerdata/data/WhoScored"?

Gibranium · 2023-12-14T21:52:19Z

Yes, of course.

Here it is:

tiers.json

probberechts · 2023-12-14T22:32:32Z

You were right, the country names are in Italian in your "tiers.json" file. One option is to add the Italian names in the config/league_dict.json file (see https://soccerdata.readthedocs.io/en/latest/howto/custom-leagues.html). For example,

{
  "ENG-Premier League": {
    "WhoScored": "Inghilterra - Premier League"
  }
}

You might experience more problems in other parts of the code though.
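A minimal sketch of adding such an entry programmatically. It assumes the config file lives at `~/soccerdata/config/league_dict.json` (inferred from the data path mentioned above, so check your own setup), and `add_league_alias` is a hypothetical helper, not part of soccerdata:

```python
import json
from pathlib import Path


def add_league_alias(config_path, league_id, source, name):
    """Merge one source-specific league name into a league_dict.json file.

    Hypothetical helper for illustration; soccerdata itself only reads the file.
    """
    config_path = Path(config_path)
    # Start from the existing file, if any, so other entries are preserved.
    if config_path.exists():
        leagues = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        leagues = {}
    # Add or overwrite the name used by this data source for this league.
    leagues.setdefault(league_id, {})[source] = name
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(
        json.dumps(leagues, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return leagues
```

It could be called as `add_league_alias(Path.home() / "soccerdata" / "config" / "league_dict.json", "ENG-Premier League", "WhoScored", "Inghilterra - Premier League")`, where the path is the assumed location.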

Alternatively, you could try to set the default language of your browser to English or configure selenium accordingly (see https://stackoverflow.com/questions/55150118/trouble-modifying-the-language-option-in-selenium-python-bindings).

Let me know what works.

Gibranium · 2023-12-15T00:09:52Z

I've tried the first option, but the code immediately runs into another problem, so I don't think it's viable. As for the other two: I've tried changing the language of Chrome and Safari, but that doesn't resolve it, because the result on the search page is already in Italian. As for the adjustment via your link, I don't think I have the skills to put together a working fix. I tried with some help from ChatGPT, but in an hour we couldn't find a solution, because apparently this:

driver = webdriver.Chrome(chrome_options=options)

needs to be this:

driver = webdriver.Chrome(options=options)

in order to apply the options, but I still don't know how to make the driver work with the scraping part. Nonetheless, ChatGPT made me try this:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

# Set up the WebDriver with language preference
options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {'intl.accept_languages': 'en,en_US'})
driver = webdriver.Chrome(options=options)

# Navigate to the WhoScored page using Selenium
driver.get("https://www.whoscored.com/")  # Replace with the actual URL

# Extract the HTML content after the page has loaded
html_content = driver.page_source

# Continue with requests and BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

I tried this to see whether it could work before merging the adjustment into the soccerdata part, and I found that even though I can make it load in English, after a second WhoScored refreshes itself and loads in Italian.
So, either I am not good enough to pull this off or I need to get a NordVPN subscription, am I right?

probberechts · 2023-12-15T04:05:45Z

You can also try to redirect to the English version by simulating a click on the language menu at the top left.

import soccerdata as sd
ws = sd.WhoScored("ENG-Premier League", "2223", headless=False, no_cache=True)
ws._driver.get("https://www.whoscored.com/")
ws._driver.execute_script("location = 'https://whoscored.com/'")
leagues = ws.read_leagues()
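To check whether the redirect actually stuck, one could read the language the page declares once it has loaded. This is a rough sketch, assuming the site sets the `lang` attribute on its `<html>` element (not verified here); `is_english` and `page_is_english` are hypothetical helpers, not part of soccerdata or Selenium:

```python
from typing import Optional


def is_english(lang_tag: Optional[str]) -> bool:
    """Return True if a language tag such as 'en', 'en-GB' or 'en_US' denotes English."""
    if not lang_tag:
        return False
    # Keep only the primary subtag, e.g. 'en' from 'en-GB'.
    primary = lang_tag.replace("_", "-").split("-")[0].strip().lower()
    return primary == "en"


def page_is_english(driver) -> bool:
    """Read the <html lang="..."> attribute of the current page through the driver.

    `driver` is any object with Selenium's `execute_script` method.
    Assumption: the site fills in the lang attribute at all.
    """
    lang = driver.execute_script("return document.documentElement.lang")
    return is_english(lang)
```

Calling `page_is_english(ws._driver)` a few seconds after the redirect would show whether the page has switched back to another language.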

Gibranium · 2023-12-15T12:16:48Z

It does what it is supposed to do, but WhoScored nonetheless refreshes itself and loads in Italian.

probberechts · 2023-12-15T12:21:19Z

Is there any way in which you can switch to English when browsing the website manually?

Gibranium · 2023-12-15T12:23:02Z

There's a toggle where you can choose the language, but if I set it to EN it automatically switches back to IT.

Gibranium · 2023-12-15T12:56:34Z

Anyway, I've resolved it by subscribing to NordVPN; right now it seems worth the money given the effort.
I'd ask just one more thing (then you can close the issue if you need to): regarding [WhoScored] Ignore cached events file if empty #420, has the improvement already been added to soccerdata, or should we write the enhancement ourselves? In that case, where should I do it? Thank you very much for all the help.

probberechts · 2023-12-16T21:12:59Z

Ok, great! If the locale is hard-coded based on IP location I think the only possible fixes are indeed translating some parts of the implementation or using a VPN.
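As an illustration of the translation route, here is a small sketch that maps a localized "Country - League" name back to an English one. The mapping contains only the one country seen in this thread, and `translate_league_name` is a hypothetical helper, not something soccerdata provides:

```python
from typing import Dict, Optional

# Illustrative mapping only; a real fix would need every localized country name.
ITALIAN_TO_ENGLISH: Dict[str, str] = {"Inghilterra": "England"}


def translate_league_name(name: str, countries: Optional[Dict[str, str]] = None) -> str:
    """Translate the country part of a 'Country - League' name, if it is known."""
    if countries is None:
        countries = ITALIAN_TO_ENGLISH
    country, sep, league = name.partition(" - ")
    if not sep:
        # No 'Country - League' structure; leave the name untouched.
        return name
    return f"{countries.get(country, country)} - {league}"
```

For example, `translate_league_name("Inghilterra - Premier League")` yields `"England - Premier League"`; names with an unknown country pass through unchanged.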

#420 is not yet released. If you can't wait for the next release, you can install the latest build from test.pypi.

probberechts added the WhoScored label (Issue or pull request related to the WhoScored scraper) Jan 1, 2024

probberechts changed the title ~~Whoscored problem~~ [WhoScored] Unable to select English locale Jan 1, 2024

probberechts mentioned this issue Sep 5, 2024

Problem with Whoscored scraper #698

Closed
