-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use non cryptographic hash for unique ID instead of MD5 #169
Comments
Sounds good. Could this be configurable and controlled using the config? An idea would be: in the var QueuedIDGenerator QuidGenerator
type QuidGenerator func(clientID uint64) string
func defaultQueuedID(clientID uint64) string {
return fmt.Sprintf("%x", md5.Sum([]byte(string(time.Now().Unix())+string(clientID))))
}
// (add whatever other generator function here, eg murmur3)
// change queuedID to delegate to our desired function
func queuedID(clientID uint64) string {
return QueuedIDGenerator(clientID)
} and to the init() at the top of envelope.go, we add Then we can also modify api.go to add a convenience func that allows us set to whatever func we pass (but don't let it change it once the server is running) Then use the API call itself in server.go to set it to whatever value is in the config. How does that sound? |
oh, on second look - the value from queuedID() gets ignored in the Btw, would prefer to use a hasing function that doesn't introduce new external dependency. Which one to pick? https://golang.org/pkg/hash/#pkg-subdirectories |
I think, its wrong way. Need to hardcode faster hashing function and all. |
I agree, this not something anybody really wants to configure. We should just replace MD5 with xxHash and leave the current logic in place. Not introducing an external dependency is a fair reason not to do it though as there is not a non crypto hash function in the standard lib. |
And what is the problem of adding a new dependency? Anyway need to compile the executable file. |
@lord-alfred new dependencies can add technical debt. Also, since this package is made to be used as a library, it's better if the user of the library can decide on their own dependencies. (And maintain them) while keeping our dependencies as small as possible. @tegk please ignore my first comment. It appears that the queuedID function is actually somewhat redundant. The hashing has been delegated to some the backends, and you can see that the SQL processor sets its own queuedID value. So let's keep this issue open to fix this inconsistency. As for any performance issues, it would be great if we could start using pprof tool to generate some graphs/reports where we can identify any bottlenecks. Right now, indentifying without profiling is kind of like shooting in the dark. |
Here is a benchmark of hashing strings 3 to 254 characters long with md5 and xxhash. The change to xxHash would make sense in terms of latency (-84.8% less!) for sure.
The results also are confirmed when bench-marking the hash backed:
|
Not disagreeing with you there. Definitelly md5 is much slower.
Although, please note that in practice, it's only performed, in the extreme
case, say 1000 times a second. And that's only on a reasonably small
string. So it's probably not that much worry about.
…On Fri., 16 Aug. 2019, 01:32 Till Knuesting, ***@***.***> wrote:
``
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#169?email_source=notifications&email_token=AAE6MP4AGKJHCDZSB5CPUS3QEWAI5A5CNFSM4IKMCWGKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4MJ5WA#issuecomment-521707224>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAE6MP5ZGEZ5NR2V2GBNCG3QEWAI5ANCNFSM4IKMCWGA>
.
|
You can use xor too and go REALLY fast.. doesn't mean you should though. At least maybe warn users? |
This value is actually just a filler because ideally a Message ID should be
created by the backed.. Indeed, it could be changed to something simple
and use things like the system date, client id and so on, no need for md5
there.
As for performance wise, It would be really good to use pprof to analyze
this (and other parts) in the future.
…On Fri, 21 Feb 2020, 01:01 h1z1, ***@***.***> wrote:
You can use xor too and go REALLY fast.. doesn't mean you should though.
At least maybe warn users?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#169?email_source=notifications&email_token=AAE6MP3QBXLTEH2IF2CBXP3RD2SL5A5CNFSM4IKMCWGKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEMO4OTY#issuecomment-589154127>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAE6MP2M3FXKKPJGSZNSHTDRD2SL5ANCNFSM4IKMCWGA>
.
|
I would like to discuss to replace md5 with a non cryptographic hash function like xxhash or murmur3 as they are much faster.
For example we use md5 to generate an ID for the queued envelop.
If we agree on the usage of xxhash I can send a pull request.
The text was updated successfully, but these errors were encountered: