Adding gitdumper module #578
…/dev 1.05.1 stable
…/dev Dev --> Stable
@holsick thanks for the PR! Before we get into writing tests, etc., I want to deliberate a little about git-dumper. We are trying to avoid third-party libraries that make their own web requests because it's difficult to control how they behave, e.g. it's necessary to pass through the global BBOT user-agent and proxy, and they don't obey the global rate limits we have set for HTTP. Also, git-dumper is synchronous, meaning that this function:

    def dump(self, url):
        return git_dumper.fetch_git(url, self.source_code_path, self.workers, self.retries, self.timeout, self.headers)

will block the event loop (i.e. the whole scan) until it completes. There is a way around this: we can submit it to the thread pool. This would take some work, but armed with ChatGPT it shouldn't be too bad. Let me know what you think.
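To illustrate the thread-pool workaround mentioned above, here is a minimal sketch using only the standard library's `asyncio` and `concurrent.futures`. It is not BBOT's own helper, which is not shown in this thread. `blocking_dump`, `heartbeat`, `dump_in_thread`, and the example URL are hypothetical names made up for this sketch; `blocking_dump` merely stands in for a synchronous call such as `git_dumper.fetch_git`.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_dump(url: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a synchronous call like git_dumper.fetch_git."""
    time.sleep(delay)  # simulates blocking network and disk work
    return f"dumped {url}"


async def heartbeat(ticks: list, interval: float = 0.05) -> None:
    """Records a timestamp per tick; it only advances while the event loop is free."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def dump_in_thread(url: str, executor: ThreadPoolExecutor) -> str:
    """Runs the blocking call in a worker thread so the event loop keeps running."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, blocking_dump, url)


async def main() -> tuple:
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    with ThreadPoolExecutor(max_workers=2) as executor:
        result = await dump_in_thread("http://example.com/.git/", executor)
    beat.cancel()  # stop the heartbeat once the dump has finished
    return result, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

If `blocking_dump` were called directly inside a coroutine, the heartbeat would stall for the whole duration of the dump. Note that offloading to a thread only addresses the blocking problem; it does nothing for the user-agent, proxy, and rate-limit concerns raised in the same comment.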
Hey @TheTechromancer! Everything you described makes total sense. I ran into those issues when trying to use the default git-dumper (blocking scan, unable to control HTTP requests, headers, etc.) so that is all good to know. In that case, I think I may go down the path of "asyncifying" git-dumper into something even smaller and more suitable for BBOT. Should be a fun project.
@holsick I am closing this for now. Please feel free to re-open -- maybe if you want to integrate your golang tool.
This module will run git-dumper against URLs with exposed .git directories, extracting the source code to a local folder.