Skip to content

add pop() to HashSet etc.? #27804

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
birkenfeld opened this issue Aug 13, 2015 · 23 comments
Open

add pop() to HashSet etc.? #27804

birkenfeld opened this issue Aug 13, 2015 · 23 comments
Labels
A-collections Area: `std::collections` C-feature-accepted Category: A feature request that has been accepted pending implementation. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@birkenfeld
Copy link
Contributor

In Python I've often found the set.pop() method useful which would remove and return a single arbitrary element of the set (or raise if empty). In Rust this should return an Option<T>, of course.

I haven't found such a method on HashSet, have I just not looked hard enough?

@alexcrichton
Copy link
Member

cc @gankro

@Gankra
Copy link
Contributor

Gankra commented Aug 13, 2015

Pop doesn't make sense for our HashMaps -- it would be an O(m/n) expected linear search where m is the capacity and n the len.

If you want a good pop you should use https://libraries.io/cargo/linked-hash-map

BTree* could cleanly provide such functionality (pop_min, pop_max)

@birkenfeld
Copy link
Contributor Author

I understand the desire to implement only efficient operations, but in that case my question is really how to get elements out of the set at all (without turning it into an iterator).

O(m/n) doesn't sound too bad for me, btw.

@birkenfeld
Copy link
Contributor Author

Just saw a reimplementation of this: http://jneem.github.io/regex-dfa/src/regex_dfa/dfa.rs.html (PopArbitrary at the top).

@bluss
Copy link
Member

bluss commented Aug 29, 2015

For the BTreeSet it could be implemented in less than linear time, it's a much better alternative.

@ltratt
Copy link
Contributor

ltratt commented Sep 12, 2015

Having just written code that needs some equivalent of pop on an arbitrary HashMap, I'd like to second this proposal. Since HashMap is used pervasively, sometimes that is simply what is one has to deal with, and using an alternative data-structure (e.g. BTreeMap) isn't possible. At the moment, the only way of doing this is via something like M.iter().next().unwrap().clone() followed by M.remove(...) which is ugly, inefficient, and obfuscatory. I imagine the same functionality is needed by HashSet and the like, though I see HashMap used much more often.

[Although it's a little off-topic, I have doubts about the performance and memory trade-offs of a linked hashmap. A more cache-friendly technique for ordered maps was first described -- as far as I know -- by Raymond Hettinger. It was then independently implemented (and possibly reinvented) in PHP (who have a nice description of it) and PyPy (who have a slightly shorter but still interesting description). Still, none of this helps if one has been passed a HashMap.]

@nagisa
Copy link
Member

nagisa commented May 10, 2016

I need one for whatever set this is possible to implement on.

@uzytkownik
Copy link

@gankro that depends on frequency of operation. Even though this particular operation may be slow if the operation is performed unfrequently HashSet may be still a better option overall.

(And if you have a datatype which does not/cannot be Clone you cannot roll this primitive on your own.)

@przygienda
Copy link

second the pop on the HashSet. I see how drain( range ) is not a good idea but it's very painful right now to iterate over elements and then remove them in another pass performance-wise.

@bluss
Copy link
Member

bluss commented Oct 31, 2016

@ltratt ordermap now exists as a Rust implementation of that idea. It's got an efficient .pop() for this reason.

@steveklabnik steveklabnik added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed A-libs labels Mar 24, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-feature-request Category: A feature request, i.e: not implemented / a PR. label Jul 22, 2017
@canndrew
Copy link
Contributor

I've wanted this a few times. If I wrote a PR for it is it likely to get merged?

@bluss
Copy link
Member

bluss commented Oct 24, 2017

It needs to be a constant time (average) method, doesn't it? In that case it requires implementation changes in the hashmap and deciding that is full of tradeoffs.

I think that if it could have been added without such major changes, you could just post a PR and we'd have it stable soon.

@canndrew
Copy link
Contributor

I was assuming it would be O(m/n), but maybe it would be better to have some kind of iterator which allows removing elements from the set. eg. an iterator of OccupiedEntry. That would be more versatile and would sometimes be more efficient to use.

@Gankra
Copy link
Contributor

Gankra commented Oct 26, 2017

The O(m/n) behaviour is a DOS footgun, I don't think it would be acceptable to ship. In the same vein I don't think this is important enough to possibly redesign hashmap to make efficient.

@ranma42
Copy link
Contributor

ranma42 commented Oct 26, 2017

How about shrinking the table when it is more empty then a given threshold?
This would ensure that O(m/n) is not that bad.
(AFAICT even shrinking only in pop should work from an amortised cost perspective)

@dtolnay dtolnay added C-feature-accepted Category: A feature request that has been accepted pending implementation. and removed C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Nov 18, 2017
@dtolnay
Copy link
Member

dtolnay commented Nov 18, 2017

I would be interested in seeing an implementation of this in a PR. I don't necessarily agree with #27804 (comment) and #27804 (comment) that the implementation needs to be constant amortized time. Let's look at some benchmarks once there is an implementation to see how bad it gets.

@danielhenrymantilla
Copy link
Contributor

danielhenrymantilla commented Jul 19, 2018

Agree with the need of a take_arbitrary method.

Imagine the following situation: you have a pool (set) of elements to handle, and while handling each, you may find yourself handling other elements, and would thus wish to remove those other encountered elements from the pool.
You cannot simply iterate over the pool, since the body will attempt to mutate the pool, you must then iterate manually.
Here is an example in Python pseudo-code:

# to_visit = set(...)
while to_visit:  # while non-empty:
    current_one = to_visit.pop()
    do_slow_stuff_with(current_one)
    if some_cond(current_one):
        another_one = some_fast_function(current_one)
        to_visit.discard(another_one)

I have implemented such a method the most efficiently I could (no Copy or Clone whatsoever), but sadly it has required the use of an unsafe block to hide the origin of a reference to the compiler, since it implies to mutably borrow the set while having a shared/borrowed value from that very set:

use std::hash::{
    Hash,
    BuildHasher,
};
use std::collections::HashSet;

/// Takes an arbitrary element from a `HashSet`, or None if empty
pub fn hashset_take_arbitrary<K, S> (
    set: &mut HashSet<K, S>,
) -> Option<K>
where
    K: Hash + Eq,
    S: BuildHasher,
{
    let key_ref = {
        if let Some(key_ref) = set.iter().next() {
            /* must hide the origin of this borrow ... */
            unsafe { &*(key_ref as *const _) }
        } else {
            return None
        }
    };
    /* ... so that we may be able to mutably borrow the set here
       despite key_ref existence */
    set.take(key_ref)
}

That way we can iterate as expected with

while let Some(current_one) = hashset_take_arbitrary(&mut to_visit) {
  /* ... let another_one = ... */
  (&mut to_visit).remove(&another_one);
}

@schulzch
Copy link
Contributor

See also rust-lang/rfcs#1800.

@vext01
Copy link
Contributor

vext01 commented Mar 20, 2019

Yes please! Lots of worklist algorithms use a set, and repeatedly pop an arbitrary element.

@1Dragoon
Copy link

Agree with the need of a take_arbitrary method.

I kind of needed this for a BTreeSet, so I shamelessly re-appropriated your code, though with a BTree specific twist. Imagine you have a large BTreeSet, and you want to pop the value immediately before or after (or just whatever is there that is Eq or PartialEq to the value you specify) but you don't want to have to iterate through the entire set to get it. At least, I think this becomes O(log n) rather than O(n), but I'm not a computer scientist at all, so somebody correct me if I'm wrong.

/// Pops the element immediately before the specified value
pub fn pop_before<K: Ord>(set: &mut BTreeSet<K>, value: &K) -> Option<K>
{
    let key_ref = {
        if let Some(key_ref) = set.range(..value).next_back() {
            /* must hide the origin of this borrow ... */
            unsafe { &*(key_ref as *const _) }
        } else {
            return None;
        }
    };
    /* ... so that we may be able to mutably borrow the set here
    despite key_ref existence */
    set.take(key_ref)
}

/// Pops the element immediately after the specified value
pub fn pop_after<K: Ord>(set: &mut BTreeSet<K>, value: &K) -> Option<K>
{
    let key_ref = {
        if let Some(key_ref) = set.range(value..).next() {
            /* must hide the origin of this borrow ... */
            unsafe { &*(key_ref as *const _) }
        } else {
            return None;
        }
    };
    /* ... so that we may be able to mutably borrow the set here
    despite key_ref existence */
    set.take(key_ref)
}

/// Pops the element equal to the specified value
pub fn pop<K>(set: &mut BTreeSet<K>, value: &K) -> Option<K>
where
    K: Ord,
{
    let key_ref = {
        if let Some(key_ref) = set.range(..=value).next() {
            /* must hide the origin of this borrow ... */
            unsafe { &*(key_ref as *const _) }
        } else {
            return None;
        }
    };
    /* ... so that we may be able to mutably borrow the set here
    despite key_ref existence */
    set.take(key_ref)
}

@fish-face
Copy link

This would still be useful in 2023! And as pointed out, shrinking the set in pop() would ensure constant amortised time, right?

@tbu-
Copy link
Contributor

tbu- commented May 8, 2024

This would still be useful in 2023! And as pointed out, shrinking the set in pop() would ensure constant amortised time, right?

The problem is that you can't find an item in amortized constant time.

@fish-face
Copy link

fish-face commented May 8, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: `std::collections` C-feature-accepted Category: A feature request that has been accepted pending implementation. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests