Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise and test IP2Country #1684

Open
wants to merge 16 commits into
base: dev
Choose a base branch
from
Open

Conversation

bemusementpark
Copy link

This PR moves geolite2_country_blocks_ipv4.csv out of assets and into source, and adds a gradle task ipToCode that generates some geolite2_country_blocks_ipv4.bin that is placed in generated assets. This file deduplicates IP address ranges, and stores IP addresses and country codes each as ints in binary, which should be close to optimal storage usage.

File size reduction appears to be about 12MB.

This PR removes the TreeMap and prefers primitive arrays, as the data is already sorted. This allows us to binary search over the array without additional overhead. Inserting ~100k entries into the TreeMap is an O ( n log n ) operation, as each insertion is O ( log n ) so we may as well avoid it. There's also good memory gains, not having an Entry a left, right and parent ref for every entry.

All up this improves the performance of this processing from about 3 seconds to about 1 second. The majority of that is from a database lookup in OnionApi#paths.

This PR adds a test IP2CountryTest.kt that checks some of the corner cases in the ip-country translation.

ipToCode task calls ipToCode.kts a kotlin script that reads the .csv and writes the .bin

What's next?

  • It'd make sense to inject IP2Country, and defer work until you open the PathActivity.
  • Optimise OnionApi#paths

@bemusementpark bemusementpark changed the base branch from master to dev October 8, 2024 05:36
@bemusementpark bemusementpark changed the title Optimise IP2Country Optimise and test IP2Country Oct 8, 2024
Copy link
Collaborator

@SessionHero01 SessionHero01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work!

Comment on lines 119 to 139
@OptIn(ExperimentalUnsignedTypes::class)
private fun UIntArray.fuzzyBinarySearch(target: UInt): Int? {
if (isEmpty()) return null

var low = 0
var high = size - 1

while (low <= high) {
val mid = (low + high) / 2
val midValue = this[mid]

when {
midValue == target -> return mid // Exact match found
midValue < target -> low = mid + 1 // Search in the right half
else -> high = mid - 1 // Search in the left half
}
}

// If no exact match, return the largest index with value <= target
return if (high >= 0) high else null
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use the Kotlin's UIntArray.binarySearch?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binarySearch is looking for an exact match or will return -1. This is searching for the "Price is Right" match. Best without going over... or the opposite. So we should get an answer for any given IP even if it wasn't explicitly provided in the list.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binarySearch doesn't simply return -1 though, it returns "where the item should have been put in order to maintain the order", in other words, it will give you info about the closest matched IP. Refer to the doc of binarySearch here:

the index of the element, if it is contained in the array within the specified range; otherwise, the inverted insertion point (-insertion point - 1). The insertion point is defined as the index at which the element should be inserted, so that the array (or the specified subrange of array) still remains sorted.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, amazing, thanks @SessionHero01, done.

Copy link
Collaborator

@ThomasSession ThomasSession left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! Looks like a great optimisation.
Might be worth checking the tests as something seem to have failed there.

@bemusementpark
Copy link
Author

bemusementpark commented Oct 9, 2024

Thanks @ThomasSession, There was some issue calling ipToCode.kts by command line. I moved the .kts to a .gradle.kts which looks safer, neater, more idiomatic calling it with apply. PTAL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants