Would it be better to support concurrent insert? #18

Open
seaguest opened this issue Nov 2, 2018 · 5 comments

Comments

@seaguest

seaguest commented Nov 2, 2018

Hello,

If we have a huge amount of data to process, we may need to do a lot of inserts, but I didn't see any mutex in the code. Would it be better to support concurrent inserts, or is there a reason not to?

@seiflotfy
Owner

I'd have to put a mutex into the insertion call; it might as well be done by the caller... WDYT?

@seaguest
Author

I think it would be better to integrate the mutex inside the filter to make it a complete module that people can use without having to deal with the concurrent-write problem themselves.
Anyway, it's just a matter of choice.

@mholt
Contributor

mholt commented Jan 11, 2019

It'd be interesting to see what the performance tradeoff is; some applications may not need thread safety, and a mutex might slow them down? I dunno.

Anyway, here's my solution; it's very simple and elegant:

type concurrentCuckoo struct {
	*sync.Mutex
	*cuckoo.Filter
}

cf := concurrentCuckoo{
	Filter: cuckoo.NewFilter(1000000), // trailing commas are required in multi-line composite literals
	Mutex:  new(sync.Mutex),
}

cf.Lock()
cf.InsertUnique([]byte("hello!"))
cf.Unlock()

Side-note: It would be nice to reduce typing repetition in this package a bit. I'll submit another issue.

@mholt mholt mentioned this issue Jan 11, 2019
@mark-kubacki

FYI, a single mutex will become a congestion point on high-core-count (HCC) machines (say, 12 cores and up).

I'd leave the choice of guard to the data structure user, and not integrate it.
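For context on what "choice of guard" can mean for a caller: one common way to reduce contention on a single lock is to spread keys over several independently locked shards. Nobody in this thread proposed this, so treat the following as a rough sketch only. The names `shardedSet`, `shard`, and `newShardedSet` are hypothetical, and a plain Go map stands in for the per-shard filter so that the example runs without the cuckoo package.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// shard pairs one lock with one set. The map is a stand-in (assumption):
// in real use each shard would hold its own filter instance instead.
type shard struct {
	mu  sync.Mutex
	set map[string]struct{}
}

// shardedSet (hypothetical name) routes each key to exactly one shard,
// so goroutines working on different shards do not block each other.
type shardedSet struct {
	shards []*shard
}

func newShardedSet(n int) *shardedSet {
	if n < 1 {
		n = 1
	}
	s := &shardedSet{shards: make([]*shard, n)}
	for i := range s.shards {
		s.shards[i] = &shard{set: make(map[string]struct{})}
	}
	return s
}

// pick hashes the key with FNV-1a to choose a shard deterministically.
func (s *shardedSet) pick(data []byte) *shard {
	h := fnv.New32a()
	h.Write(data)
	return s.shards[h.Sum32()%uint32(len(s.shards))]
}

// InsertUnique adds the key and reports whether it was newly added.
func (s *shardedSet) InsertUnique(data []byte) bool {
	sh := s.pick(data)
	sh.mu.Lock()
	defer sh.mu.Unlock()
	if _, ok := sh.set[string(data)]; ok {
		return false
	}
	sh.set[string(data)] = struct{}{}
	return true
}

// Lookup reports whether the key has been inserted.
func (s *shardedSet) Lookup(data []byte) bool {
	sh := s.pick(data)
	sh.mu.Lock()
	defer sh.mu.Unlock()
	_, ok := sh.set[string(data)]
	return ok
}

func main() {
	s := newShardedSet(16)
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.InsertUnique([]byte(fmt.Sprintf("key-%d", i)))
		}(i)
	}
	wg.Wait()
	fmt.Println(s.Lookup([]byte("key-42")))
}
```

Whether this helps in practice depends on the workload and the shard count; with real filters, each shard would also need its own capacity budget.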

@cben

cben commented Dec 1, 2020

Is it sufficient to hold a mutex only on writes? Are reads safe while an insert is potentially shuffling things around?
Or does this need a read-write lock to allow concurrent reads when not doing writes?

(if leaving locking to callers, this is a point that's good to document)
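The read-write lock option raised here could look roughly like the sketch below. It assumes that lookups do not mutate the filter, which this thread does not confirm, so check that before relying on a shared read lock. The `membership` interface, `mapSet` stand-in, and `rwFilter` wrapper are hypothetical names introduced for illustration; `mapSet` exists only so the example runs without the cuckoo package.

```go
package main

import (
	"fmt"
	"sync"
)

// membership is a hypothetical interface with the two calls discussed here.
// A real filter could be plugged in if its method signatures match.
type membership interface {
	InsertUnique(data []byte) bool
	Lookup(data []byte) bool
}

// mapSet is a stand-in implementation (assumption), not a cuckoo filter.
type mapSet map[string]struct{}

func (m mapSet) InsertUnique(data []byte) bool {
	if _, ok := m[string(data)]; ok {
		return false
	}
	m[string(data)] = struct{}{}
	return true
}

func (m mapSet) Lookup(data []byte) bool {
	_, ok := m[string(data)]
	return ok
}

// rwFilter guards writes with an exclusive lock and reads with a shared lock,
// so lookups can run concurrently whenever no insert is in progress.
type rwFilter struct {
	mu sync.RWMutex
	f  membership
}

func newRWFilter(f membership) *rwFilter {
	return &rwFilter{f: f}
}

func (r *rwFilter) InsertUnique(data []byte) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.f.InsertUnique(data)
}

func (r *rwFilter) Lookup(data []byte) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.f.Lookup(data)
}

func main() {
	rf := newRWFilter(mapSet{})
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := []byte(fmt.Sprintf("key-%d", i))
			rf.InsertUnique(key)
			rf.Lookup(key)
		}(i)
	}
	wg.Wait()
	fmt.Println(rf.Lookup([]byte("key-3")))
}
```

Locking only on writes while leaving reads unguarded would be a data race under Go's memory model whenever a read overlaps an insert, so the shared read lock is the minimum guard if reads and writes can run at the same time.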
