GameScraper is a web application that scrapes the web for video game news and deals.
When I first started learning Python and was looking for something to apply it to, I had a hard time finding a project that interested me. Then, while installing some add-ons on Kodi, I noticed there was quite a lot of Python involved, and I also spotted the Beautiful Soup logo. It dawned on me that Beautiful Soup was probably being used to scrape information about the various movies and television shows, and that inspired me to learn about web scraping. So for my capstone project I wanted to combine my interest in data collection with my love of video games in a web application.
At this time, GameScraper collects data from PC Gamer, Steam, The Verge, and TechSpot to aggregate the latest news and deals. The search function collects data from GameSpot and the RAWG API. It also used the Bing Web Search API, but only during my two-week trial.
- Python
- Django
- BeautifulSoup
The images used on the Steam home page are small, a lot smaller than what I wanted to display in my app, and resizing them to make them bigger just made them blurry. So instead of extracting the image URL with Beautiful Soup, I extracted each game's Steam app id from the data-ds-appid attribute in the <a> tag and inserted it into a header image URL:
In this example, the app id is "275850", which is the game No Man's Sky.
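As a quick illustration, here's a minimal sketch of pulling that attribute out of a tag with Beautiful Soup. The HTML below is a simplified, made-up stand-in for one of Steam's specials-tab links, not the store's actual markup:

from bs4 import BeautifulSoup

# Simplified, hypothetical markup standing in for a Steam specials-tab link
html = '<a href="https://store.steampowered.com/app/275850/" data-ds-appid="275850">No Man\'s Sky</a>'

soup = BeautifulSoup(html, "html.parser")
result = soup.find("a")

# .get() returns the attribute value as a string
app_id = result.get("data-ds-appid")
print(app_id)  # "275850"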
Once I found a way to get app ids from the <a> tags, it was only a matter of figuring out the proper way to construct the image URL. The header images for games on Steam all follow the same pattern:
https://steamcdn-a.akamaihd.net/steam/apps/THE GAME'S APP ID GOES HERE/header.jpg?
So all I needed to do was format the URL string to include the app id:
app_id = result.get("data-ds-appid")
image = f"https://steamcdn-a.akamaihd.net/steam/apps/{app_id}/header.jpg?"
That worked fine for most games, but I ran into a problem: some games had more than one app id in the data-ds-appid attribute. For instance, The Witcher 3: Wild Hunt Game of the Year Edition had three app ids because it's made up of three separate apps (the base game and two DLC expansions), each with its own id:
After extraction, the ids all came back as one string: "292030,378649,378648"
So I refactored the code to split the string into a list and assign the first element of the list (which is the app id of the base game) to the app_id variable:
if "," in result.get("data-ds-appid"):
app_split = result.get("data-ds-appid").split(",")
app_id = app_split[0]
else:
app_id = result.get("data-ds-appid")
It's not a perfect solution, since I'd still like to get the image for the Game of the Year edition itself, but it's a good workaround for now because the extracted title still identifies the game as the Game of the Year edition.
And here's the complete code for the Steam deals scrape as it appears in the views.py file of the app:
# Steam deals scrape (the specials tab on the Steam home page)
try:
    steam_news_page = requests.get("https://store.steampowered.com/", headers={"User-Agent": "Defined"})
    soup = BeautifulSoup(steam_news_page.text, "html.parser")
    steam_results = soup.find("div", {"id": "tab_specials_content"}).select("a")
    steam_news = []
    for result in steam_results[:10]:
        # Some entries list several app ids ("base,dlc1,dlc2"); the first one is the base game
        if "," in result.get("data-ds-appid"):
            app_split = result.get("data-ds-appid").split(",")
            app_id = app_split[0]
        else:
            app_id = result.get("data-ds-appid")
        steam_news.append({
            'url': result.get("href"),
            'image': f"https://steamcdn-a.akamaihd.net/steam/apps/{app_id}/header.jpg",
            'title': result.find("div", {"class": "tab_item_name"}).text,
            'original_price': result.find("div", {"class": "discount_original_price"}).text,
            'price': result.find("div", {"class": "discount_final_price"}).text,
            'app_id': app_id
        })
except:
    steam_news = None
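For context, this snippet presumably sits inside a view function, and the resulting steam_news list is handed to the template through the context. Here's a minimal sketch of that shape; the view name, template name, and placeholder list are hypothetical, not taken from the actual project:

from django.shortcuts import render

def home(request):  # hypothetical view name
    # ...the Steam deals scrape above runs here and builds steam_news...
    steam_news = []  # placeholder for the scraped results
    return render(request, "home.html", {"steam_news": steam_news})  # hypothetical template name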
- Scrapers can and will break
- The search feature doesn't always return the expected results
- Move the scraper code from views.py into a separate file and import its functions into views.py (see the sketch after this list)
- Save scrape results to the database
- User login and the ability to comment on other users' favorites.
- Only the user who created the favorite can edit it.
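As a rough sketch of that first improvement, the Steam scrape could move into its own module and be imported into views.py. The module and function names below (scrapers.py, scrape_steam_deals) are hypothetical placeholders, not what the project currently uses:

# scrapers.py (hypothetical module)
import requests
from bs4 import BeautifulSoup

def scrape_steam_deals():
    """Return a list of deal dicts from Steam's specials tab, or None if the scrape fails."""
    try:
        page = requests.get("https://store.steampowered.com/", headers={"User-Agent": "Defined"})
        soup = BeautifulSoup(page.text, "html.parser")
        results = soup.find("div", {"id": "tab_specials_content"}).select("a")
        deals = []
        for result in results[:10]:
            # split(",")[0] handles both single and multiple app ids,
            # since split returns the whole string when there's no comma
            app_id = result.get("data-ds-appid").split(",")[0]
            deals.append({
                'url': result.get("href"),
                'image': f"https://steamcdn-a.akamaihd.net/steam/apps/{app_id}/header.jpg",
                'title': result.find("div", {"class": "tab_item_name"}).text,
                'original_price': result.find("div", {"class": "discount_original_price"}).text,
                'price': result.find("div", {"class": "discount_final_price"}).text,
                'app_id': app_id,
            })
        return deals
    except Exception:
        return None

# views.py would then just do:
# from .scrapers import scrape_steam_deals
# steam_news = scrape_steam_deals()

Keeping the request and parsing logic in one function would also make it easier to save results to the database later or to test the scraper on its own.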
- Activate virtual env
.\myvenv\Scripts\activate
- Start dev server
python .\manage.py runserver