You have a simple text file with IPv4 addresses, one address per line:
```
145.67.23.4
8.34.5.23
89.54.3.124
89.54.3.124
3.45.71.5
...
```
The file size is not limited and can reach tens or hundreds of gigabytes.
You should calculate the number of unique addresses in this file using as little memory and time as possible. There is a "naive" algorithm for solving this problem: read the file line by line and put each line into a HashSet (a sketch is shown below). Your implementation should ideally be more sophisticated and faster than this naive algorithm.
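For reference, the naive approach might look like this minimal sketch (the `NaiveCounter` class name is illustrative, not part of this repo):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

public class NaiveCounter {
    // Reads the file line by line and collects every address string in a HashSet.
    // Memory usage grows with the number of unique lines, so this does not scale
    // to files with billions of addresses.
    public static long count(Path file) throws IOException {
        Set<String> seen = new HashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                seen.add(line);
            }
        }
        return seen.size();
    }
}
```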
Before submitting the assignment, it is worth checking how your solution handles this file. Note: the file is about 20 GB compressed and unpacks to about 120 GB.
- Avoid creating String objects when parsing the file, to reduce garbage collection overhead.
- Use an efficient representation of an IPv4 address (a primitive int instead of a String).
- Use a bit set to store the IPs efficiently: 1 bit * 2^32 possible values = 512 MB (see the sketch after this list).
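A minimal sketch of how these ideas could fit together (the `IpBitSet` class and its methods are illustrative names, not the actual classes in this repo). Note that `java.util.BitSet` is indexed by `int` and cannot hold all 2^32 bits in a single instance, so the sketch uses a plain `long[]` instead:

```java
public final class IpBitSet {
    // 2^32 bits stored in 2^32 / 64 = 67,108,864 longs (~512 MB).
    private final long[] words = new long[1 << 26];

    // Marks the address and returns true if it was not seen before.
    public boolean add(int ip) {
        long unsigned = ip & 0xFFFFFFFFL;   // treat the int as an unsigned value
        int word = (int) (unsigned >>> 6);  // index of the 64-bit word
        long mask = 1L << (unsigned & 63);  // bit within that word
        boolean isNew = (words[word] & mask) == 0;
        words[word] |= mask;
        return isNew;
    }

    // Parses one dotted-quad address directly from a byte range,
    // without creating intermediate String objects.
    public static int parse(byte[] buf, int from, int to) {
        int ip = 0, octet = 0;
        for (int i = from; i < to; i++) {
            byte b = buf[i];
            if (b == '.') {
                ip = (ip << 8) | octet;
                octet = 0;
            } else {
                octet = octet * 10 + (b - '0');
            }
        }
        return (ip << 8) | octet;
    }
}
```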
I attempted to abstract the reading logic to return an `IntStream` (`IntStreamingCounter`). However, this approach was somewhat slower than reading the file within a while loop (`ByteBufferUniqueIPCounter`); a sketch of that loop is shown below.
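The while-loop variant has roughly the following shape (a sketch, not the actual `ByteBufferUniqueIPCounter` implementation; it reuses the hypothetical `IpBitSet` from the sketch above and assumes every line, including the last, ends with a newline):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ByteBufferLoop {
    // Reads the file in large chunks and parses addresses byte by byte,
    // counting an address whenever a full line has been consumed.
    public static long count(Path file, IpBitSet seen) throws IOException {
        long unique = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20); // 1 MB chunks
            // ip/octet live outside the loop so addresses split across
            // two buffer fills are still parsed correctly.
            int ip = 0, octet = 0;
            while (channel.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    byte b = buffer.get();
                    if (b == '\n') {
                        if (seen.add((ip << 8) | octet)) {
                            unique++;
                        }
                        ip = 0;
                        octet = 0;
                    } else if (b == '.') {
                        ip = (ip << 8) | octet;
                        octet = 0;
                    } else if (b != '\r') {
                        octet = octet * 10 + (b - '0');
                    }
                }
                buffer.clear();
            }
        }
        return unique;
    }
}
```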
Build and run:

```sh
mvn clean package
java -jar ./application/target/application-1.0.jar ./ip_addresses
```
- If the file content is guaranteed to be valid, eliminate exception checks to improve performance.
- The CPU is a bottleneck in the current implementation, so we can improve performance by parallelizing file processing using multiple threads.
- To fully benefit from multithreading, an efficient thread-safe version of `IntSet` is needed; see the sketch below.
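One possible shape for such a set is a CAS-based bit set over an `AtomicLongArray` (an illustrative sketch, not code from this repo). Each thread would parse its own chunk of the file and sum the `true` returns of `add` to get the unique count:

```java
import java.util.concurrent.atomic.AtomicLongArray;

public final class ConcurrentIpBitSet {
    // Same 512 MB layout as the single-threaded version, but every word
    // is updated with a CAS loop so multiple threads can add concurrently.
    private final AtomicLongArray words = new AtomicLongArray(1 << 26);

    // Returns true only for the thread that sets the bit first.
    public boolean add(int ip) {
        long unsigned = ip & 0xFFFFFFFFL;
        int word = (int) (unsigned >>> 6);
        long mask = 1L << (unsigned & 63);
        while (true) {
            long current = words.get(word);
            if ((current & mask) != 0) {
                return false; // already present
            }
            if (words.compareAndSet(word, current, current | mask)) {
                return true;  // this thread set the bit
            }
            // lost the race on this word to another writer; retry
        }
    }
}
```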