This script is used to scrape the text of posts on a public facebook page and comments(all of them and all of their replies) on those posts and save them in mongodb database. It also stores the link to the profile of the person who made the comment. It uses selenium. This script does not use facebook's graph api.
- python 3.x
- Latest chrome wedriver, should be placed in the same folder as the script
- mongodb and selenium to be installed
Note: This script run on linux machine, with slight modification it could be run on windows machine also.
-
Open the
facebook_scraper.py
file in any text editor- Assign the name of the page to
page_name
variable (line 31). - Assign the URL to
url
variable (line 32). - Assign how many time do you want to scroll to
total_scrolls
variable (line 35). The value should be a non-negative integer. More the value more will be scroll. - Assign your facebook's email id to
email
variable (line 150). - Assign your password to
password
variable (line 151). - Now save the changes.
- Assign the name of the page to
-
Next put the latest chrome webdriver in the same folder as the script. I am attaching a webdriver which was latest durint the writing of this script.
-
Open terminal and
cd
into the directory containing the script. Than run the script by writingpython3 facebook_scraper.py
and wait for the script to complete it's work. -
suggestion - It is suggested to use VPN for scraping the data as facebook might block your ip if you scrawl the data. Though this script works fine even without VPN.
The output of this script is all the text of posts on a public facebook page and comments(all of them and all of their replies) on those posts stored in facebook mongodb database. With slight changes you could store them in a text or csv file.
Scraping data from facebook without permission from facebook or without using it's graph api is illegal. Kindly refer to https://www.facebook.com/robots.txt for more details.