Summary
Issue: #1020
Checklist
All checks are run in GitHub Actions. You'll be able to see the results of the checks at the bottom of the pull request page after it's been opened, and you can click on any of the specific checks listed to see the output of each step and debug failures.
Questions
I used a third-party library (w3lib) to conveniently remove HTML tags. I don't know if that's okay, which is why I haven't added it to the requirements files yet. I could implement the functionality without the library, but the remove_tags method is much more convenient and could be useful for other spiders too.
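For comparison, here is a minimal sketch of what the no-dependency option could look like, using only the standard library's `html.parser`. The names `strip_tags` and `_TagStripper` are hypothetical and not part of this PR. It is not a drop-in replacement for w3lib's `remove_tags`, and its behavior may differ, for example in how entities are handled.

```python
from html.parser import HTMLParser


class _TagStripper(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_tags(html):
    """Return the fragment with all tags dropped and text content kept."""
    parser = _TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()
```

This keeps the requirements files unchanged, at the cost of maintaining a small helper that other spiders would have to import from a shared module.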
The test for the links is set to xfail because the expected result contains a Unicode character, while the actual result represents it with ASCII characters. Should I change the ASCII to Unicode, or keep the ASCII?
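One possible explanation, offered only as an assumption since the failing output is not shown here, is that the mismatch comes from how the character is written (an escape sequence versus a literal) instead of from the data itself. The sketch below uses a made-up title string to show that, in Python 3, both spellings are the same string and that JSON escaping does not change the decoded value.

```python
import json

# Illustrative value only; not taken from the spider's real output.
escaped = "Caf\u00e9 meeting"   # written with an ASCII escape sequence
literal = "Café meeting"        # written with the literal character

# Both source spellings produce the same Python string.
same_string = escaped == literal

# json.dumps escapes non-ASCII by default; ensure_ascii=False keeps it literal.
as_ascii_json = json.dumps({"title": literal})
as_unicode_json = json.dumps({"title": literal}, ensure_ascii=False)

# Decoding either form yields identical data.
decoded_equal = json.loads(as_ascii_json) == json.loads(as_unicode_json)
```

If the test still fails after comparing decoded strings like this, the two values are different characters and one side needs to be corrected.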
In the links field, there might be some links with the same href but different titles, like in this example. Some titles are more descriptive than others or contain more information, such as the Zoom password. Should I keep the duplicate hrefs or remove them?
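If the duplicates are removed, one option is to keep a single entry per href and prefer the longest title as a rough stand-in for "most descriptive". The sketch below assumes each link is a dict with `href` and `title` keys; the function name `dedupe_links` and the length heuristic are illustrative assumptions, not something this PR implements.

```python
def dedupe_links(links):
    """Keep one link per href, preferring the longest title (a heuristic)."""
    best = {}
    for link in links:
        href = link["href"]
        current = best.get(href)
        # Replace the stored entry only when the new title is longer.
        if current is None or len(link["title"]) > len(current["title"]):
            best[href] = link
    # Dicts preserve insertion order, so hrefs stay in first-seen order.
    return list(best.values())


# Made-up sample data for illustration.
sample = [
    {"href": "https://example.com/zoom", "title": "Zoom"},
    {"href": "https://example.com/agenda", "title": "Agenda"},
    {"href": "https://example.com/zoom", "title": "Zoom (password: 1234)"},
]
deduped = dedupe_links(sample)
```

Title length is only a proxy: a shorter title could still be the more useful one, so keeping the duplicates remains a reasonable choice if that information matters.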