Add function to traverse links using Breadth First Search #111

Merged
merged 8 commits into from
Aug 10, 2018

Conversation

KingAkeem
Member

Issue #102

Changes Proposed

  • Add functions for traversing links on webpage using Breadth First Search

Explanation of Changes

Two functions have been added. The first accepts the HTML of a webpage and an integer representing the depth at which to stop. It invokes the second, a traversal function that searches the links using the Breadth First Search algorithm.
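
For readers unfamiliar with the approach, a minimal sketch of depth-limited breadth-first link traversal might look like the following. This is not the code merged in this PR: the name bfs_depth, the get_links callback, and the in-memory graph are all hypothetical, standing in for fetching a page and extracting its links.

```python
from collections import deque


def bfs_depth(start_links, get_links, target, stop_depth):
    """Return the depth at which target is first seen, or None.

    get_links is a hypothetical callback mapping a link to the links
    found on that page; stop_depth bounds how deep the search goes.
    """
    if not target:
        return None
    queue = deque((link, 0) for link in start_links)
    seen = set(start_links)
    while queue:
        link, depth = queue.popleft()
        if link == target:
            return depth
        if depth >= stop_depth:
            continue  # do not expand pages at or beyond the depth limit
        for child in get_links(link):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return None


# Toy link graph standing in for real pages (assumption for illustration).
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
found = bfs_depth(["a"], lambda link: graph.get(link, []), "d", stop_depth=3)
```

Because the queue is processed in first-in, first-out order, every link at depth n is examined before any link at depth n + 1, so the first match is also the shallowest one.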

@PSNAppz
Member

PSNAppz commented Aug 8, 2018

@KingAkeem Awesome work 👏🏻.

PSNAppz added this to the TorBot v1.3 milestone Aug 8, 2018
Contributor

agrepravin left a comment

LGTM otherwise


toVisit = list()
for link in links:
    if targetLink == link and targetLink:
Contributor

I don't see the value of the "and" condition here. If targetLink == link holds, targetLink will always be set, right?

Member Author

I want to make sure targetLink is actually set. Since Python is a dynamic language, it's impossible to tell ahead of time what items a list may contain. If a None were to somehow get inserted, I don't want it to return a false positive.
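
To illustrate the concern, here is a small sketch (the helper name is_target and the sample list are hypothetical, not part of this PR). If the target is left as None and the list happens to contain a None, a bare equality check matches, while the extra truthiness check rejects it.

```python
def is_target(link, target_link):
    # Equality alone would match None == None; the truthiness check
    # rules out an unset (None or empty) target.
    return bool(target_link) and target_link == link


links = ["http://example.com", None]
target_link = None  # no target was supplied

naive = any(link == target_link for link in links)             # True
guarded = any(is_target(link, target_link) for link in links)  # False
```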

for link in links:
    if targetLink == link and targetLink:
        return depth
    resp = requests.get(link)
Contributor

What if this errors out?

Member Author

Good catch, I didn't think about that. I'm going to wrap the request in a try-except block and skip any errors. If a request fails, we can just assume the link isn't valid.
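
A rough sketch of that idea, under stated assumptions: fetch_all, its fetch and errors parameters, and fake_fetch are hypothetical names used so the example runs without network access. With the requests library, one might pass requests.get as fetch and requests.exceptions.RequestException as errors instead of catching every exception.

```python
def fetch_all(links, fetch, errors=(Exception,)):
    """Return {link: response} for the links that fetch() can retrieve.

    fetch and errors are stand-ins (assumptions), not the PR's code.
    """
    responses = {}
    for link in links:
        try:
            responses[link] = fetch(link)
        except errors:
            continue  # treat a failing link as invalid and move on
    return responses


def fake_fetch(link):
    # Hypothetical fetcher that fails for one link, for demonstration.
    if link == "http://bad.invalid":
        raise ConnectionError("unreachable")
    return "<html></html>"


result = fetch_all(["http://good.example", "http://bad.invalid"], fake_fetch)
```

Narrowing errors to the exceptions the fetcher is expected to raise keeps unrelated bugs, such as a typo inside the loop, from being silently swallowed.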

PSNAppz merged commit 5ca2d21 into DedSecInside:dev Aug 10, 2018
KingAkeem deleted the bfs_crawl branch August 10, 2018 12:09