add pop() to HashSet etc.? #27804
Comments
cc @gankro |
Pop doesn't make sense for our HashMaps -- it would be an O(m/n) expected linear search where If you want a good BTree* could cleanly provide such functionality (pop_min, pop_max) |
I understand the desire to implement only efficient operations, but in that case my question is really how to get elements out of the set at all (without turning it into an iterator). O(m/n) doesn't sound too bad for me, btw. |
Just saw a reimplementation of this: http://jneem.github.io/regex-dfa/src/regex_dfa/dfa.rs.html ( |
For BTreeSet it could be implemented in less than linear time; it's a much better alternative. |
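For reference, the pop_min / pop_max idea mentioned above is available on recent toolchains as `BTreeSet::pop_first` and `BTreeSet::pop_last` (stable since Rust 1.66). A minimal sketch follows; the helper name `drain_alternating` is made up for illustration:

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: empties the set by alternately popping the
/// smallest and the largest remaining element.
fn drain_alternating(mut set: BTreeSet<u32>) -> Vec<u32> {
    let mut out = Vec::new();
    loop {
        // pop_first removes and returns the minimum, or None if empty.
        match set.pop_first() {
            Some(min) => out.push(min),
            None => break,
        }
        // pop_last removes and returns the maximum, or None if empty.
        match set.pop_last() {
            Some(max) => out.push(max),
            None => break,
        }
    }
    out
}
```

Both calls work on the ordered tree directly, so neither needs a scan over the whole set.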
Having just written code that needs some equivalent of [Although it's a little off-topic, I have doubts about the performance and memory trade-offs of a linked hashmap. A more cache-friendly technique for ordered maps was first described -- as far as I know -- by Raymond Hettinger. It was then independently implemented (and possibly reinvented) in PHP (who have a nice description of it) and PyPy (who have a slightly shorter but still interesting description). Still, none of this helps if one has been passed a |
I need one for whatever set this is possible to implement on. |
@gankro that depends on the frequency of the operation. Even though this particular operation may be slow, if it is performed infrequently HashSet may still be the better option overall. (And if you have a datatype which does not/cannot be |
I second the pop on HashSet. I see how drain( range ) is not a good idea, but right now it's very painful, performance-wise, to iterate over elements and then remove them in another pass. |
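For the narrower case where the decision to remove depends only on the element itself, `HashSet::retain` already does the iterate-and-remove in a single pass. It only lends out `&T`, so it cannot hand back ownership of the removed values and is not a replacement for pop. A small sketch, with `split_off_evens` as a hypothetical helper name and the removed values copied out inside the closure:

```rust
use std::collections::HashSet;

/// Hypothetical helper: removes every even number from the set in one pass
/// and returns copies of the removed values.
fn split_off_evens(set: &mut HashSet<u32>) -> Vec<u32> {
    let mut removed = Vec::new();
    set.retain(|&x| {
        if x % 2 == 0 {
            // retain only gives a reference, so the value is copied out here.
            removed.push(x);
            false // drop this element from the set
        } else {
            true // keep this element
        }
    });
    removed
}
```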
I've wanted this a few times. If I wrote a PR for it, is it likely to get merged? |
It needs to be a constant time (average) method, doesn't it? In that case it requires implementation changes in the hashmap and deciding that is full of tradeoffs. I think that if it could have been added without such major changes, you could just post a PR and we'd have it stable soon. |
I was assuming it would be O(m/n), but maybe it would be better to have some kind of iterator which allows removing elements from the set, e.g. an iterator of |
The O(m/n) behaviour is a DoS footgun; I don't think it would be acceptable to ship. In the same vein, I don't think this is important enough to justify redesigning HashMap to make it efficient. |
How about shrinking the table when it is emptier than a given threshold? |
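The threshold idea can be approximated in user code today. The sketch below is only an illustration: the helper name `shrink_if_sparse` and the 1/4 occupancy threshold are assumptions, and whether this is enough to give amortized constant-time popping is exactly what the thread is debating.

```rust
use std::collections::HashSet;
use std::hash::{BuildHasher, Hash};

/// Hypothetical helper: shrinks the table once fewer than a quarter of its
/// slots are in use. The 1/4 threshold is an arbitrary choice for illustration.
fn shrink_if_sparse<T, S>(set: &mut HashSet<T, S>)
where
    T: Hash + Eq,
    S: BuildHasher,
{
    let len = set.len();
    if len * 4 < set.capacity() {
        // Leave some headroom so the next few inserts do not reallocate.
        set.shrink_to(len * 2);
    }
}
```

Calling this after a batch of removals keeps the ratio of capacity to length bounded, at the cost of an occasional rehash.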
I would be interested in seeing an implementation of this in a PR. I don't necessarily agree with #27804 (comment) and #27804 (comment) that the implementation needs to be constant amortized time. Let's look at some benchmarks once there is an implementation to see how bad it gets. |
Agree with the need of a Imagine the following situation: you have a pool (set) of elements to handle, and while handling each, you may find yourself handling other elements, and would thus wish to remove those other encountered elements from the pool.
# to_visit = set(...)
while to_visit:  # while non-empty:
    current_one = to_visit.pop()
    do_slow_stuff_with(current_one)
    if some_cond(current_one):
        another_one = some_fast_function(current_one)
        to_visit.discard(another_one)
I have implemented such a method the most efficiently I could (no
use std::hash::{
    Hash,
    BuildHasher,
};
use std::collections::HashSet;

/// Takes an arbitrary element from a `HashSet`, or None if empty
pub fn hashset_take_arbitrary<K, S> (
    set: &mut HashSet<K, S>,
) -> Option<K>
where
    K: Hash + Eq,
    S: BuildHasher,
{
    let key_ref = {
        if let Some(key_ref) = set.iter().next() {
            /* must hide the origin of this borrow ... */
            unsafe { &*(key_ref as *const _) }
        } else {
            return None
        }
    };
    /* ... so that we may be able to mutably borrow the set here
       despite key_ref existence */
    set.take(key_ref)
}
That way we can iterate as expected with
while let Some(current_one) = hashset_take_arbitrary(&mut to_visit) {
    /* ... let another_one = ... */
    (&mut to_visit).remove(&another_one);
}
|
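The `unsafe` cast above exists only to keep a reference into the set alive across the mutable call to `take`. If the element type is cheap to clone, the same worklist pattern can be sketched without `unsafe`. This is a sketch under that `Clone` assumption, not the commenter's method; `take_arbitrary_cloned` and `visit_all` are hypothetical names, and `current ^ 1` stands in for `some_fast_function`:

```rust
use std::collections::HashSet;
use std::hash::{BuildHasher, Hash};

/// Safe variant: clones the first key the iterator yields, then removes it.
/// Requires `K: Clone`, which the unsafe version does not.
pub fn take_arbitrary_cloned<K, S>(set: &mut HashSet<K, S>) -> Option<K>
where
    K: Hash + Eq + Clone,
    S: BuildHasher,
{
    // The shared borrow from iter() ends once the key has been cloned.
    let key = set.iter().next()?.clone();
    set.take(&key)
}

/// Worklist loop mirroring the Python snippet: handling an element also
/// discards its "partner" (here simply `current ^ 1`) from the pool.
fn visit_all(mut to_visit: HashSet<u32>) -> Vec<u32> {
    let mut visited = Vec::new();
    while let Some(current) = take_arbitrary_cloned(&mut to_visit) {
        visited.push(current);
        to_visit.remove(&(current ^ 1));
    }
    visited
}
```

The cost of finding the first occupied slot is unchanged, so the O(m/n) concern raised earlier in the thread applies to this variant as well.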
See also rust-lang/rfcs#1800. |
Yes please! Lots of worklist algorithms use a set, and repeatedly pop an arbitrary element. |
I kind of needed this for a BTreeSet, so I shamelessly re-appropriated your code, though with a BTree-specific twist. Imagine you have a large BTreeSet, and you want to pop the value immediately before or after the value you specify (or just whatever is there that is Eq or PartialEq to it), but you don't want to have to iterate through the entire set to get it. At least, I think this becomes O(log n) rather than O(n), but I'm not a computer scientist at all, so somebody correct me if I'm wrong.
use std::collections::BTreeSet;

/// Pops the element immediately before the specified value
pub fn pop_before<K: Ord>(set: &mut BTreeSet<K>, value: &K) -> Option<K>
{
    let key_ref = {
        if let Some(key_ref) = set.range(..value).next_back() {
            /* must hide the origin of this borrow ... */
            unsafe { &*(key_ref as *const _) }
        } else {
            return None;
        }
    };
    /* ... so that we may be able to mutably borrow the set here
       despite key_ref existence */
    set.take(key_ref)
}

/// Pops the element immediately after the specified value
pub fn pop_after<K: Ord>(set: &mut BTreeSet<K>, value: &K) -> Option<K>
{
    let key_ref = {
        if let Some(key_ref) = set.range(value..).next() {
            /* must hide the origin of this borrow ... */
            unsafe { &*(key_ref as *const _) }
        } else {
            return None;
        }
    };
    /* ... so that we may be able to mutably borrow the set here
       despite key_ref existence */
    set.take(key_ref)
}

/// Pops the element equal to the specified value
pub fn pop<K>(set: &mut BTreeSet<K>, value: &K) -> Option<K>
where
    K: Ord,
{
    let key_ref = {
        if let Some(key_ref) = set.range(..=value).next() {
            /* must hide the origin of this borrow ... */
            unsafe { &*(key_ref as *const _) }
        } else {
            return None;
        }
    };
    /* ... so that we may be able to mutably borrow the set here
       despite key_ref existence */
    set.take(key_ref)
}
|
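Two details of the range bounds above are worth noting. `range(value..)` has an inclusive start, so it yields `value` itself when present, and `range(..=value).next()` yields the smallest element not greater than `value` rather than one equal to it. Removing an element equal to `value` can be done directly with `set.take(value)`. A safe sketch of the strict before/after variants follows. It assumes `K: Clone`, and the `_cloned` names are hypothetical:

```rust
use std::collections::BTreeSet;
use std::ops::Bound::{Excluded, Unbounded};

/// Removes and returns the greatest element strictly less than `value`.
pub fn pop_before_cloned<K: Ord + Clone>(set: &mut BTreeSet<K>, value: &K) -> Option<K> {
    // `..value` excludes `value`; next_back() picks the largest match.
    let key = set.range::<K, _>(..value).next_back()?.clone();
    set.take(&key)
}

/// Removes and returns the smallest element strictly greater than `value`.
pub fn pop_after_cloned<K: Ord + Clone>(set: &mut BTreeSet<K>, value: &K) -> Option<K> {
    // An explicit Excluded bound skips `value` itself if it is in the set.
    let key = set
        .range::<K, _>((Excluded(value), Unbounded))
        .next()?
        .clone();
    set.take(&key)
}
```

Each call does one range lookup and one removal, both of which walk the tree by key rather than iterating over the whole set.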
This would still be useful in 2023! And as pointed out, shrinking the set in pop() would ensure constant amortised time, right? |
The problem is that you can't find an item in amortized constant time. |
If it's O(m/n) in capacity and length respectively, that looks like
constant time to me if m/n is bounded? Do you mean in case other operations
have removed many elements, rather than just `pop()`ing repeatedly?
On Wed, 8 May 2024, 12:55 Tobias Bucher wrote:
This would still be useful in 2023! And as pointed out, shrinking the set
in pop() would ensure constant amortised time, right?
The problem is that you can't *find* an item in amortized constant time.
|
In Python I've often found the set.pop() method useful, which removes and returns a single arbitrary element of the set (or raises if empty). In Rust this should return an Option<T>, of course.
I haven't found such a method on HashSet; have I just not looked hard enough?