UDdup - URLs Deduplication Tool

The tool gets a list of URLs, and removes "duplicate" pages in the sense of URL patterns that are probably repetitive and points to the same web template.

For example:

https://www.example.com/product/123
https://www.example.com/product/456
https://www.example.com/product/123?is_prod=false
https://www.example.com/product/222?is_debug=true

All the above are probably points to the same product "template". Therefore it should be enough to scan only some of these URLs by our various scanners.

The result of the above after UDdup should be:

https://www.example.com/product/123?is_prod=false
https://www.example.com/product/222?is_debug=true

Why do I need it?

Mostly for better (automated) reconnaissance process, with less noise (for both the tester and the target).

Examples

Take a look at demo.txt which is the raw URLs file which results in demo-results.txt.

Installation

With pip (Recommended)

pip install uddup

Manual (from code)

# Clone the repository.
git clone https://github.com/rotemreiss/uddup.git

# Install the Python requirements.
cd uddup
pip install -r requirements.txt

Usage

uddup -u demo.txt -o ./demo-result.txt

More Usage Options

uddup -h

Short Form	Long Form	Description
-h	--help	Show this help message and exit
-u	--urls	File with a list of urls
-o	--output	Save results to a file
-s	--silent	Print only the result URLs
-fp	--filter-path	Filter paths by a given Regex

Filter Paths by Regex

Allows filtering custom paths pattern. For example, if we would like to filter all paths that starts with /product we will need to run:

# Single Regex
uddup -u demo.txt -fp "^product"

Input:

https://www.example.com/
https://www.example.com/privacy-policy
https://www.example.com/product/1
https://www.example2.com/product/2
https://www.example3.com/product/4

Output:

https://www.example.com/
https://www.example.com/privacy-policy

Advanced Regex with multiple path filters

uddup -u demo.txt -fp "(^product)|(^category)"

Contributing

Feel free to fork the repository and submit pull-requests.

Support

Create new GitHub issue

Want to say thanks? :) Message me on Linkedin

License

MIT license

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
.github		.github
tests		tests
uddup		uddup
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE.md		LICENSE.md
README.md		README.md
demo-result.txt		demo-result.txt
demo.txt		demo.txt
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Repository files navigation

UDdup - URLs Deduplication Tool

Why do I need it?

Examples

Installation

With pip (Recommended)

Manual (from code)

Usage

More Usage Options

Filter Paths by Regex

Advanced Regex with multiple path filters

Contributing

Support

License

About

Uh oh!

Releases 4

Sponsor this project

Uh oh!

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

Uh oh!

License

rotemreiss/uddup

Folders and files

Latest commit

History

Repository files navigation

UDdup - URLs Deduplication Tool

Why do I need it?

Examples

Installation

With pip (Recommended)

Manual (from code)

Usage

More Usage Options

Filter Paths by Regex

Advanced Regex with multiple path filters

Contributing

Support

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 4

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages