Use GoTor HTTP service #219

Merged
KingAkeem merged 22 commits into dev from use_gotor
Sep 21, 2021

Conversation

KingAkeem
Member

@KingAkeem KingAkeem commented Sep 8, 2021

Requires DedSecInside/gotor#19

Changes Proposed

  • Replace the Python crawling internals with a Golang HTTP service that performs the web crawling. This takes advantage of Golang's speed and built-in concurrency. The speed increase is significant.
  • Fix the -i/--info flag. An HTTP request made with the requests library returns a response object, so its text property must be used to get the HTML.
  • Links have a new status color: yellow now indicates that the link was redirected. (Maybe there should be a small indicator somewhere?)
  • Fix the depth argument, which previously did not work correctly, especially when creating trees.
  • Change the version to 2.0.0, since this is a major update and not backwards compatible (it requires a new server).
  • The Golang service currently has to be started by the user before the program can be used. This could be resolved either by hosting it on a public domain or by providing an rpm/executable that is easy to run.
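
To illustrate the -i/--info fix described above: requests.get returns a Response object, and the decoded HTML is available on its text attribute. The sketch below is not the code from src/modules/info.py. page_title and _TitleParser are hypothetical names, and the only assumption about the argument is that it exposes text the way a requests response does.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(response) -> str:
    """Return the page title from a requests-style response object.

    The parser needs a string, so pass response.text (the decoded body),
    not the response object itself.
    """
    parser = _TitleParser()
    parser.feed(response.text)
    return parser.title.strip()
```

With the requests library installed, this could be called as page_title(requests.get(url)).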

Metrics

  • Searching https://www.google.com at a depth of 2 (results are from the time command):
    use_gotor: [screenshot, 2021-09-09 08-58-59]
    dev: [screenshot, 2021-09-09 09-04-41]

How to run

  • Check out the required branch from DedSecInside/gotor#19 (Add HTTP server) and run go run main.go -server to start the GoTor service.
  • Use TorBot in the same manner as you did previously. All arguments should work as they did before (with a noticeable speed increase).
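
Once the service is running, the Python side only needs to make HTTP requests against it. The following is a minimal sketch of such a client, not the code in src/modules/api.py. The host, port 8081, the /tree path, and the link and depth query parameters are all assumptions made for illustration, as is the JSON response; check the GoTor repository for the actual interface.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tree_url(link: str, depth: int,
                   host: str = "localhost", port: int = 8081) -> str:
    """Build the request URL for a crawl tree.

    ASSUMPTION: the /tree endpoint, its query parameter names, and the
    default host and port are hypothetical, not taken from the PR.
    """
    query = urlencode({"link": link, "depth": depth})
    return f"http://{host}:{port}/tree?{query}"


def fetch_tree(link: str, depth: int, timeout: float = 30.0):
    """Ask the locally running service for a tree and parse it as JSON.

    ASSUMPTION: the service answers with a JSON document.
    """
    with urlopen(build_tree_url(link, depth), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping URL construction separate from the network call makes the client easy to check without a running server.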

Explanation of Changes

The main goal of this PR is to improve the performance of web-crawling operations such as building trees.
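
For readers unfamiliar with tree building, the sketch below shows what a depth-limited crawl tree looks like. It is a plain sequential Python illustration, not the concurrent Go implementation from this PR. build_tree and get_links are hypothetical names, and the convention that depth 0 means "root only" is an assumption.

```python
from typing import Callable, Dict, Iterable, Optional, Set


def build_tree(url: str, depth: int,
               get_links: Callable[[str], Iterable[str]],
               seen: Optional[Set[str]] = None) -> Dict:
    """Build a nested dict of links reachable from url, up to depth levels.

    get_links is any callable returning the links found on a page; in a
    real crawler it would fetch and parse the page.
    """
    if seen is None:
        seen = set()
    seen.add(url)
    node = {"url": url, "children": []}
    if depth <= 0:
        # Depth exhausted: keep the node but do not expand it further.
        return node
    for link in get_links(url):
        if link in seen:
            # Skip pages already placed in the tree to avoid cycles.
            continue
        node["children"].append(build_tree(link, depth - 1, get_links, seen))
    return node
```

Because the traversal is depth-first with a shared seen set, a page linked from several places appears only under the first parent that reaches it.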

Tasks left

  • Update README to reflect changes
  • Add documentation for new code
  • Remove Python unit tests
  • Add Go unit tests
  • Add service as git submodule

Review threads:
src/modules/info.py (resolved)
src/modules/api.py (resolved)
src/modules/info.py (outdated, resolved)
src/modules/info.py (outdated, resolved)
@PSNAppz PSNAppz self-requested a review September 19, 2021 15:58
Member

@PSNAppz PSNAppz left a comment

Tested and working good!

@KingAkeem KingAkeem merged commit 8dbc7e5 into dev Sep 21, 2021
@KingAkeem KingAkeem deleted the use_gotor branch September 21, 2021 15:03