TorBot Stable Version 1.2 #56
Conversation
Version 1.1.0_dev
…/torbot into info_feature_v0.0.1
Info_feature_v0.0.1
Fixed requirements.txt by adding missing "="
Fixed the error occurring when given a broken link with the -u flag, and other minor improvements.
Adding Contributors
Made the code PEP8-compliant and fixed some minor bugs
Updating my branch
Refactored socket code and put it inside of a function. Also took out the Controller module since it wasn't actually being used.
Removed code that wasn't being called and now using regular expressions to validate URLs
Instead of just checking if the string contains http or https, this uses two functions with regular expressions to match valid URLs. One function is specifically geared towards onion addresses and the other is for general URL validation.
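For illustration, a minimal sketch of the two-function approach (the function names and exact patterns here are assumptions, not the PR's actual code):

```python
import re

# Onion addresses: 16-char (v2) or 56-char (v3) base32 labels under .onion
ONION_PATTERN = re.compile(r'^(https?://)?[a-z2-7]{16,56}\.onion(/.*)?$')
# General URLs: scheme, host ending in a TLD, optional path
URL_PATTERN = re.compile(r'^https?://[\w.-]+\.[a-z]{2,}(/.*)?$', re.IGNORECASE)

def is_onion_url(url):
    return bool(ONION_PATTERN.match(url))

def is_valid_url(url):
    return bool(URL_PATTERN.match(url))
```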
Using the -s flag will now save the results in a JSON file within the current working directory.
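Roughly along these lines (a sketch; the filename and output structure are assumptions):

```python
import json
import os

def save_results(results, filename='torbot_results.json'):
    # Write the crawl results as JSON into the current working directory
    path = os.path.join(os.getcwd(), filename)
    with open(path, 'w') as f:
        json.dump(results, f, indent=2)
    return path
```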
Takes a list of URLs, asynchronously calls a HEAD request on each link in the list, and tests the status code for a 200 response. If the response is not 200 or takes longer than 8 seconds, the link is declared dead. Also switched from urllib.request to requests, not only for simplicity but for thread-safety as well.
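A minimal sketch of that check, with a thread pool standing in for the async machinery (the helper names are made up here):

```python
import requests
from concurrent.futures import ThreadPoolExecutor

def check_link(url):
    # A link counts as live only if it answers a HEAD request with 200
    # within 8 seconds; anything else is declared dead.
    try:
        resp = requests.head(url, timeout=8)
        return url, resp.status_code == 200
    except requests.RequestException:
        return url, False

def check_links(urls):
    # requests is thread-safe here, unlike urllib.request
    with ThreadPoolExecutor(max_workers=10) as pool:
        return dict(pool.map(check_link, urls))
```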
Was previously showing dead for any status code that isn't 200, so I use the raise_for_status function from requests, which only raises an error if the status code is an HTTP error status code such as 4xx or 5xx.
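That change would look something like this (a sketch, not the PR's exact code):

```python
import requests

def is_live(url):
    try:
        resp = requests.head(url, timeout=8)
        resp.raise_for_status()  # raises only for 4xx/5xx error codes
        return True              # 2xx and 3xx no longer count as dead
    except requests.RequestException:
        return False
```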
To use the -e flag, pass a URL name like www.url.com and it will try to establish an HTTPS connection, then HTTP if the secure connection fails. If both fail, an error message is printed. If the connection is successful, then we search for the onion domain name and the others that were passed with the flag.
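A sketch of that fallback logic, assuming requests is used (the function name is hypothetical):

```python
import requests

def try_connect(host):
    # Try HTTPS first, then fall back to HTTP if the secure connection fails
    for scheme in ('https://', 'http://'):
        try:
            return requests.get(scheme + host, timeout=8)
        except requests.RequestException:
            continue
    print('Unable to establish a connection to', host)
    return None
```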
- Should remove Python Stem Module from dependencies (README.md, line 88)
- Fix grammar error: remove the word "should" before "default setting" (README.md, line 105)
- Still using urllib.request; should be requests.
Fixed requirements.txt: I took out the stem module since it's not used anymore, and added the requests module as well.
@KingAkeem Thanks for pointing those out. I will fix them ASAP.
modules/pagereader.py
Outdated
@@ -1,17 +1,36 @@
import urllib.request
Needs to be switched to requests
modules/pagereader.py
Outdated
def readPage(site):
    headers = {'User-Agent':
               'TorBot - Onion crawler | www.github.com/DedSecInside/TorBot'}
    req = urllib.request.Request(site, None, headers)
Also needs to be switched to requests
modules/pagereader.py
Outdated
while (attempts_left):
    try:
        response = urllib.request.urlopen(req)
        page = BeautifulSoup(response.read(), 'html.parser')
Requests
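For reference, a requests-based readPage might look roughly like this (a sketch of the suggested switch, keeping the retry loop from the snippet above):

```python
import requests
from bs4 import BeautifulSoup

HEADERS = {'User-Agent':
           'TorBot - Onion crawler | www.github.com/DedSecInside/TorBot'}

def readPage(site, attempts_left=3):
    while attempts_left:
        try:
            response = requests.get(site, headers=HEADERS, timeout=8)
            return BeautifulSoup(response.text, 'html.parser')
        except requests.RequestException:
            attempts_left -= 1
    return None
```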
requirements.txt
Outdated
@@ -1,3 +1,4 @@
beautifulsoup4==4.6.0
PySocks==1.6.7
stem==1.5.4
Stem is no longer being used, and the requests requirement should be added at version 2.18.4.
This PR will be updated once #66 is merged. @KingAkeem
Fixed `-l` flag for finding live links.
@KingAkeem Please review this.
- sudo apt-get -y install python3-pip
- pip3 install bs4
- pip3 install -r requirements.txt
- cd tests
Since we are using pytest, whenever you go into the test suite you only need to run pytest to run all of the tests.
Fixed
README.md
Outdated
5. Save crawl info to JSON file.(Completed)
6. Crawl custom domains.(Completed)
7. Check if the link is live.(Not Started)
4. Built-in Updater.(Completed)
Built-in Updater should be number 8 instead of 4.
Everything looks fine.
Everything looks good.
@KingAkeem @agostinelli @agrepravin @leaen Thanks for the awesome work
#49