I'm scraping websites to get the title, description, and other metadata, but it doesn't work on every site.
For example: https://www.youtube.com/watch?v=3AIZAGwMRg8
final List<Element> elements = document.head.getElementsByTagName('title');
elements is an empty list ([]).
Other sites, such as https://apple.com, work fine.
I'm also using:
final List<Element> metas = document.head.getElementsByTagName('meta');
On the YouTube page, I'm not seeing all of the meta tags either.
It won't work because that content is rendered by JavaScript, and this library does not run JavaScript.
Disable JavaScript before loading a page; whatever you still see is what can be scraped.
I installed a Chrome extension to do this (https://chrome.google.com/webstore/detail/toggle-javascript/cidlcjdalomndpeagkjpnefhljffbnlo), but you can also press F12 to open DevTools, press Ctrl + Shift + P to open the Command Menu, and type "javascript"; the option to disable it will show up.
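To illustrate why the title comes back empty, here is a minimal sketch using the Dart html package (assumed to be the parser used in the question). The two HTML strings and the titlesIn helper are hypothetical: the first page ships its <title> in the served HTML, while the second only sets the title from a script. The parser never executes that script, so no <title> element exists to be found.

```dart
import 'package:html/dom.dart';
import 'package:html/parser.dart' show parse;

// Hypothetical page whose <title> is present in the served HTML.
const serverRendered = '''
<html><head>
  <title>Static title</title>
  <meta name="description" content="Static description">
</head><body></body></html>
''';

// Hypothetical page that would only set its title at runtime via a script.
const clientRendered = '''
<html><head>
  <script>
    document.title = 'Injected title';
  </script>
</head><body></body></html>
''';

/// Returns the text of every <title> element inside <head>.
List<String> titlesIn(String html) {
  final Document document = parse(html);
  final List<Element> elements =
      document.head?.getElementsByTagName('title') ?? <Element>[];
  return elements.map((e) => e.text).toList();
}

void main() {
  print(titlesIn(serverRendered)); // the static title is found
  print(titlesIn(clientRendered)); // empty: the script is never run
}
```

This is the same situation as the YouTube link: the parser only sees the markup the server sends, not what a browser would build afterwards.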
If you NEED the JavaScript to run, I recommend rendering the page with a headless-browser library such as Puppeteer first and then parsing the rendered HTML.
YouTube also has an API you can use instead of scraping the site. Check whether it fits your needs.
If you set the User-Agent header to a bot's user agent when fetching the document, the site returns all of the tags.
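A minimal sketch of that approach in Dart, assuming the http and html packages. The Googlebot-style User-Agent string and the extractMetadata and fetchMetadata helpers are illustrative assumptions, not part of either package; which user agents a site serves full markup to is decided by the site and may change at any time.

```dart
import 'package:html/dom.dart';
import 'package:html/parser.dart' show parse;
import 'package:http/http.dart' as http;

// Assumption: an example crawler-style User-Agent string.
const botUserAgent =
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

/// Collects the <title> text and the key -> content pairs of every
/// <meta> tag in <head>.
Map<String, String> extractMetadata(String html) {
  final Document document = parse(html);
  final Element? head = document.head;
  final result = <String, String>{};
  if (head == null) return result;

  final titles = head.getElementsByTagName('title');
  if (titles.isNotEmpty) result['title'] = titles.first.text.trim();

  for (final Element meta in head.getElementsByTagName('meta')) {
    // Open Graph tags use `property`; most other tags use `name`.
    final key = meta.attributes['name'] ?? meta.attributes['property'];
    final content = meta.attributes['content'];
    if (key != null && content != null) result[key] = content;
  }
  return result;
}

/// Downloads [url] while sending [userAgent], then parses the result.
Future<Map<String, String>> fetchMetadata(String url,
    {String userAgent = botUserAgent}) async {
  final response =
      await http.get(Uri.parse(url), headers: {'User-Agent': userAgent});
  if (response.statusCode != 200) {
    throw Exception('HTTP ${response.statusCode} for $url');
  }
  return extractMetadata(response.body);
}

Future<void> main(List<String> args) async {
  // With no arguments, parse an offline sample; otherwise fetch args.first.
  if (args.isEmpty) {
    print(extractMetadata('<html><head><title>Demo</title>'
        '<meta name="description" content="Sample page"></head></html>'));
    return;
  }
  print(await fetchMetadata(args.first));
}
```

Pass a URL such as the YouTube link above as the first argument to fetch a live page. Keeping the parsing in extractMetadata separate from the network call means the same function can also be pointed at HTML produced by a headless browser.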