Given the examples, I struggle to understand how this could be used to extract the main article from a page. In this case, the sample would be the article content itself. It would be great if it could take several samples from other websites and then develop a generalized pattern for additional pages. I'm guessing this may be out of scope for this project.
# Assumes the AutoScraper library is installed (pip install autoscraper).
from autoscraper import AutoScraper

scraper = AutoScraper()

# Map each alias to sample text copied verbatim from the training page.
wanted_dict = {
    "title": ["Possible to to try to extract main article from a page?"],
    "meta": ["vzeazy"],
    "content": ['Given the examples, struggle to understand how this may be utilized to extract the main article from a page. In this case, the sample would be the article content itself. Would be great if it could use several samples from other websites and then develop a generalized pattern for additional pages. Guessing this my be out of scope for this project.'],
}

# Learn extraction rules from the training page and save them.
with open('sample/train.html', 'r', encoding='utf-8') as html_file:
    source_code = html_file.read()
result = scraper.build(html=source_code, wanted_dict=wanted_dict)
scraper.save('github')

# Apply the learned rules to another page that shares the same layout.
with open('sample/test.html', 'r', encoding='utf-8') as html_file:
    source_code = html_file.read()
result = scraper.get_result_exact(html=source_code)
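The snippet above learns rules tied to one page layout, so it will not carry over to unrelated websites. For the generalized case asked about in the question, a layout-independent heuristic is one possible direction. The following is only a rough sketch of that idea using the Python standard library, not part of this project: it picks the block container holding the most direct text. The names `MainTextFinder` and `extract_main_text`, and the tag sets, are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Tags whose text is assumed never to be article content (illustrative choice).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
# Tags treated as candidate containers for the main article (illustrative choice).
BLOCK_TAGS = {"article", "main", "section", "div", "td"}


class MainTextFinder(HTMLParser):
    """Collect text per block container and remember the longest one."""

    def __init__(self):
        super().__init__()
        self.stack = []      # one list of text chunks per open block container
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.best = ""       # longest container text seen so far

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.stack.append([])

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS and self.stack:
            text = " ".join(self.stack.pop())
            if len(text) > len(self.best):
                self.best = text

    def handle_data(self, data):
        chunk = " ".join(data.split())  # normalize whitespace
        # Text is credited to the innermost open container only.
        if chunk and not self.skip_depth and self.stack:
            self.stack[-1].append(chunk)


def extract_main_text(html):
    """Return the text of the container with the most direct text."""
    finder = MainTextFinder()
    finder.feed(html)
    finder.close()
    return finder.best


if __name__ == "__main__":
    sample = (
        "<body><nav><div>Home About</div></nav>"
        "<div><p>Short ad</p></div>"
        "<article><p>First paragraph.</p><p>Second paragraph.</p></article>"
        "</body>"
    )
    print(extract_main_text(sample))
```

This needs no training samples, but it is crude: it does not track which tag opened each container, so badly nested markup, or pages that split the article across many small containers, will likely give poor results.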