RFC: Add item recovery collection APIs #1194

apasel422 · 2015-07-08T16:20:17Z

Gankra · 2015-07-08T16:35:15Z

CC @cmr @eddyb @bluss @seppo0010 (can't remember all the people interested...)

pnkfelix · 2015-07-09T19:04:47Z

text/0000-collection-recovery.md

+
+# Alternatives
+
+Do nothing.


hold on, to be clear: "Do nothing" here means "Do nothing and let users write such caches via, e.g., HashMap<T, ()> " ... right?

I don't particularly mind adding the functionality described here to HashSet, but I'm also not sure it's strictly necessary, unless I have missed something with how HashMap<T, ()> would work.

Update: Ah, re-reading the RFC, I now see that our current HashMap API would not support that

It's not possible to use HashMap that way, because it doesn't provide any methods that return &K (or K) other than via its iterators.

blaenk · 2015-07-16T20:21:44Z

It'd be cool if you could add another copy of the code that demonstrates the problem but showing how to use your proposed APIs to resolve it.

apasel422 · 2015-07-16T21:13:34Z

@blaenk I'd actually like to use a more concrete motivating example, but I'll add the revised code as well.

apasel422 · 2015-07-17T11:34:45Z

@gankro Do you have any ideas for a better motivating example, like an algorithm that uses a set as a cache?

eddyb · 2015-07-17T11:44:55Z

@apasel422 that's the usecase in the compiler: sets of hundreds of thousands of elements, used for interning/caching, that have to be identity maps right now, wasting some memory space.
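As an illustration of the "identity map" workaround described above, here is a minimal sketch. The `IdentityInterner` type and its `intern` method are hypothetical names chosen for this example, not code from the compiler. Every string is stored twice (once as key, once as value) only so that the stored handle can be returned on a hit.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical interner that needs no set "recovery" method:
/// each string is stored as both the key and the value.
struct IdentityInterner {
    map: HashMap<Rc<str>, Rc<str>>,
}

impl IdentityInterner {
    fn new() -> Self {
        IdentityInterner { map: HashMap::new() }
    }

    /// Returns the shared handle for `s`, allocating only on a miss.
    fn intern(&mut self, s: &str) -> Rc<str> {
        // `get` can only hand back the value, hence the duplicated handle.
        if let Some(existing) = self.map.get(s) {
            return Rc::clone(existing);
        }
        let rc: Rc<str> = Rc::from(s);
        self.map.insert(Rc::clone(&rc), Rc::clone(&rc));
        rc
    }
}

fn main() {
    let mut interner = IdentityInterner::new();
    let a = interner.intern("hello");
    let b = interner.intern("hello");
    // Both calls return the same allocation.
    assert!(Rc::ptr_eq(&a, &b));
}
```

The extra value slot per entry is the wasted space the comment refers to.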

benaryorg · 2015-07-17T21:02:32Z

@eddyb Does that mean, if this RFC is implemented, the compiler can be tweaked to use less memory?

apasel422 · 2015-07-19T23:17:10Z

I've added a WIP implementation of this RFC for BTreeMap at https://github.com/apasel422/rust/tree/rfc-1194.

Gankra · 2015-07-20T02:15:23Z

@apasel422 It could be argued by metaphor to the current naming in #1195 that these methods could just be called get_eq, remove_eq, etc...

Gankra · 2015-07-20T02:17:12Z

This is a bit more dubious for HashMap; but not crazy.

apasel422 · 2015-07-20T02:23:37Z

@gankro I actually had that same thought a little while ago, but I don't think it fully translates, and would be even weirder for entries:

impl<'a, K, V> OccupiedEntry<'a, K, V> {
    fn get(&self) -> &V;
    fn get_eq(&self) -> (&K, &V); // what does eq have to do with this?
    ...
}

{key_value, key_value_mut, remove_key_value} are quite verbose, but have parallel set names {element, remove_element}. {kv, kv_mut, remove_kv} are more succinct, but don't have an obvious set version.

Gankra · 2015-07-20T02:34:57Z

Hmm... Entry does seem to mess things up. That said... is it a tragedy if it's a bit misaligned from everything else?

apasel422 · 2015-07-20T11:34:55Z

I think they should be consistent.

Here are some options that work for both maps and occupied entries (in addition to Map::replace, which I think is non-controversial):

{kv, kv_mut, remove_kv} with VacantEntry::insert_kv
{get_kv, get_kv_mut, remove_kv} with VacantEntry::insert_kv
{key_val, key_val_mut, remove_key_val} with VacantEntry::insert_key_val
{key_value, key_value_mut, remove_key_value} with VacantEntry::insert_key_value

And here are some options for sets (assuming that the changes in rust-lang/rust#27135 canonicalize "element" over "value" when referring to sets):

{elem, remove_elem}
{element, remove_element}

i30817 · 2015-07-23T15:02:51Z

One common optimization that can't be done in java because of set item recovery is to just store hashes instead of elements, for the case where identity-mapping is not desirable. Are you going to give up this special case?

apasel422 · 2015-07-23T15:07:42Z

@i30817 Rust's HashMap and HashSet already prevent that optimization: The keys (elements) must implement Eq in addition to Hash, and the types provide iterators over their contents. A hypothetical map or set providing that optimization could omit these operations, but would also have to omit iteration. In any case, just storing hashes seems more like a bloom filter than a hash table, due to the absence of a check on Eq.
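To make the trade-off concrete, here is a rough sketch of a set that stores only hashes. `HashOnlySet` and its methods are hypothetical and exist only for this illustration. Since no element is kept, there is nothing to compare with `Eq`, nothing to iterate over, and nothing to recover.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical set that keeps only the 64-bit hash of each element.
struct HashOnlySet {
    hashes: HashSet<u64>,
}

impl HashOnlySet {
    fn new() -> Self {
        HashOnlySet { hashes: HashSet::new() }
    }

    fn hash_of<T: Hash + ?Sized>(value: &T) -> u64 {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        hasher.finish()
    }

    /// Returns `true` if this hash was not present before.
    fn insert<T: Hash + ?Sized>(&mut self, value: &T) -> bool {
        self.hashes.insert(Self::hash_of(value))
    }

    /// Can report a false positive when two distinct values share a hash,
    /// because no stored element is available for an `Eq` check.
    fn maybe_contains<T: Hash + ?Sized>(&self, value: &T) -> bool {
        self.hashes.contains(&Self::hash_of(value))
    }
}

fn main() {
    let mut set = HashOnlySet::new();
    assert!(set.insert("alpha"));
    assert!(set.maybe_contains("alpha"));
}
```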

i30817 · 2015-07-23T15:10:28Z

Mmm makes sense. Still, it's a somewhat common optimization, maybe a bloom filter type could be added to the language.

apasel422 · 2015-07-23T15:14:23Z

@i30817 Off-topic, but https://crates.io/search?q=bloom%20filter

shepmaster · 2015-07-26T14:44:46Z

I'm definitely in favor of the ideas laid out here. I have the same motivating problem - a cache of strings. I've used some unsafe code to avoid double-allocating the strings, but I still have essentially a HashMap<&str, &str>.

bkoropoff · 2015-07-28T03:08:20Z

How would you feel about modifying OccupiedEntry to hold on to the key passed to entry so you can recover it?

apasel422 · 2015-07-28T11:03:34Z

@bkoropoff That could be done, but it has the problem that a new key is only present for OccupiedEntrys that are obtained through Map::entry. The theoretical {max_entry, min_entry, ...} provided by RFC #1195 will not have a new key to return.

I'm therefore more inclined to add that kind of key-recovery functionality as

pub enum Entry<'a, K: 'a, V: 'a> {
    Occupied(OccupiedEntry<'a, K, V>, K),
    Vacant(VacantEntry<'a, K, V>),
}

instead of

pub struct OccupiedEntry<'a, K: 'a, V: 'a> {
    new_key: K,
    // ...
}

impl<'a, K, V> OccupiedEntry<'a, K, V> {
    /// Returns the key that was used to acquire this entry.
    // This could return `Option<K>` in order to better model the `max_entry` situation
    pub fn into_new_key(self) -> K { self.new_key }
}

but that would not be a backwards-compatible change. Additionally, storing the new key in the struct itself has the benefit of allowing us to provide an additional OccupiedEntry::replace_key method that updates the new key and returns the old one, much as Map::replace would:

impl<'a, K, V> OccupiedEntry<'a, K, V> {
    /// Replaces the entry's key with the one that was used to acquire this entry, if any, and
    /// returns the old key.
    ///
    /// This method always returns `None` after the first call to it and for all entries
    /// acquired through `max_entry` etc.
    pub fn replace_key(&mut self) -> Option<K>;
}

This adds some complexity to the API surface and makes it harder to reason about what the behavior is. It's possible that we could add what you're proposing in a subsequent RFC instead.

Gankra · 2015-07-29T21:31:00Z

🔔 HERE YE HERE YE THIS RFC IS ENTERING ITS FINAL COMMENT PERIOD 🔔

aturon · 2015-08-07T16:19:10Z

Sorry to be late to this party (I also had to miss the libs meeting this week). I'm on board with the basic motivation here, and regret the stabilization of the bool-centric methods on sets. As @gankro said, the team decided to go forward with this RFC, modulo some bikeshedding.

That said, I feel like the RFC is proposing significantly more API expansion than is actually needed to solve the original problem -- in particular, I don't see why any changes to the entry API are needed. Could we instead take the following as a starting point (bikesheds painted in my favorite colors):

impl<T> Set<T> {
    // Like `contains`, but returns a reference to the element if the set contains it.
    fn get<Q: ?Sized>(&self, element: &Q) -> Option<&T>;

    // Like `remove`, but returns the element if the set contained it.
    fn take<Q: ?Sized>(&mut self, element: &Q) -> Option<T>;

    // Like `insert`, but replaces the element with the given one and returns the previous element
    // if the set contained it.
    fn replace(&mut self, element: T) -> Option<T>;
}

impl<K, V> Map<K, V> {
    // Like `get`, but additionally returns a reference to the entry's key.
    fn key_value<Q: ?Sized>(&self, key: &Q) -> Option<(&K, &V)>;

    // Like `get_mut`, but additionally returns a reference to the entry's key.
    fn key_value_mut<Q: ?Sized>(&mut self, key: &Q) -> Option<(&K, &mut V)>;

    // Like `remove`, but additionally returns the entry's key.
    fn remove_key_value<Q: ?Sized>(&mut self, key: &Q) -> Option<(K, V)>;

    // Like `insert`, but additionally replaces the key with the given one and returns the previous
    // key and value if the map contained it.
    fn replace(&mut self, key: K, value: V) -> Option<(K, V)>;
}

In particular, the fact that the entry APIs need an owned key to use (today, at least) seems to make the key-accessing functionality questionable. But maybe I'm missing something?
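For readers following along today: `HashSet` in current std does have `get`, `take`, and `replace` with these shapes. The sketch below shows what each one recovers. The `Tagged` type is a hypothetical element whose equality and hash ignore one field, which is the situation where getting the stored element back (rather than a `bool`) is observable.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical element: equality and hashing look only at `name`,
/// so two "equal" elements can still differ in `version`.
#[derive(Debug)]
struct Tagged {
    name: &'static str,
    version: u32,
}

impl PartialEq for Tagged {
    fn eq(&self, other: &Self) -> bool {
        self.name == other.name
    }
}

impl Eq for Tagged {}

impl Hash for Tagged {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.hash(state);
    }
}

fn main() {
    let mut set = HashSet::new();
    set.insert(Tagged { name: "a", version: 1 });

    let probe = Tagged { name: "a", version: 99 };

    // `get`: like `contains`, but yields the stored element.
    assert_eq!(set.get(&probe).map(|t| t.version), Some(1));

    // `replace`: like `insert`, but swaps the element and returns the old one.
    let old = set.replace(Tagged { name: "a", version: 2 });
    assert_eq!(old.map(|t| t.version), Some(1));

    // `take`: like `remove`, but returns the stored element.
    let taken = set.take(&probe);
    assert_eq!(taken.map(|t| t.version), Some(2));
    assert!(set.is_empty());
}
```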

apasel422 · 2015-08-07T21:52:59Z

@aturon We will presumably want the entry methods once #1195 is accepted, but they could be omitted for now. I think that both RFCs need to be considered together though, and it probably makes sense to avoid a proliferation of {get, get_mut, remove, and entry} methods when just {get, entry} suffice. We don't need to add take or key_value_mut on Map directly if we just add them to OccupiedEntry now, and this will benefit the queries added in #1195 as well. For example:

impl<K, V> Map<K, V> {
    fn get_pair<Q: ?Sized>(&self, key: &Q) -> Option<(&K, &V)>;
    fn get_entry<Q: ?Sized>(&mut self, key: &Q) -> Option<OccupiedEntry<K, V>>;
    fn replace(&mut self, key: K, val: V) -> Option<(K, V)>;

    fn get_max(&self) -> Option<(&K, &V)>;
    fn max_entry(&mut self) -> Option<OccupiedEntry<K, V>>;

    fn get_lt<Q: ?Sized>(&self, key: &Q) -> Option<(&K, &V)>;
    fn lt_entry<Q: ?Sized>(&mut self, key: &Q) -> Option<OccupiedEntry<K, V>>;

    // get_* and *_entry for le, ge, gt, min
}

impl<'a, K, V> OccupiedEntry<'a, K, V> {
    fn pair(&self) -> (&K, &V);
    fn pair_mut(&mut self) -> (&K, &mut V);
    fn into_pair_mut(self) -> (&'a K, &'a mut V);
    fn take(self) -> (K, V);
}
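The `get_entry`, `pair`, and `take` names above are proposals from this thread and do not exist in std under those names. As a rough point of comparison, current std offers `OccupiedEntry::key` and `OccupiedEntry::remove_entry`, which cover similar ground. The sketch below uses them. The helper `take_pair` is a hypothetical name, and the `key.to_string()` call shows the owned-key requirement of `entry` that was raised earlier in the thread.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Hypothetical helper: removes `key` and returns the stored key and value.
fn take_pair(map: &mut BTreeMap<String, u32>, key: &str) -> Option<(String, u32)> {
    // `entry` needs an owned key, so a lookup by `&str` has to allocate.
    match map.entry(key.to_string()) {
        Entry::Occupied(entry) => {
            // `key()` borrows the key held by the entry.
            debug_assert_eq!(entry.key().as_str(), key);
            // `remove_entry` consumes the entry and yields the stored (K, V).
            Some(entry.remove_entry())
        }
        Entry::Vacant(_) => None,
    }
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert("a".to_string(), 1);
    assert_eq!(take_pair(&mut map, "a"), Some(("a".to_string(), 1)));
    assert!(map.is_empty());
}
```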

alexcrichton · 2015-08-12T23:42:08Z

The libs team discussed this RFC today, and our conclusion was that it may be best to hone this down to what's precisely necessary to satisfy the motivation in the outset. To that end would it be possible to only include the set methods? Specifically:

impl<T> Set<T> {
    fn get<Q: ?Sized>(&self, element: &Q) -> Option<&T>;
    fn take<Q: ?Sized>(&mut self, element: &Q) -> Option<T>;
    fn replace(&mut self, element: T) -> Option<T>;
}

Gankra · 2015-08-12T23:45:02Z

Specifically, I believe the supporting map methods were also decided to just be doc(hidden) and unstable.

alexcrichton · 2015-08-12T23:55:25Z

I'd personally prefer the methods to be freestanding in the module so they're private to the outside world but public to the crate rather than having them in the inherent API at all.

Gankra · 2015-08-13T00:01:54Z

@alexcrichton How does that work? Privacy can only reach up, and not down or sideways. Maps and Sets are defined in sibling modules.

sfackler · 2015-08-13T00:23:31Z

They could be defined in a crate private trait. That's how I've gotten around visibility issues before.
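A minimal sketch of the crate-private trait idea, under stated assumptions: the `recover`, `map`, and `set` modules, the `Recover` trait, and the toy `Vec`-backed `Map` are all hypothetical stand-ins for illustration. The trait is `pub` inside a private module, so sibling modules can name it while code outside the crate cannot.

```rust
// Private module: items inside are reachable from sibling modules,
// but the module itself is not exported from the crate.
mod recover {
    pub trait Recover<Q: ?Sized> {
        type Key;
        fn get(&self, key: &Q) -> Option<&Self::Key>;
    }
}

mod map {
    use super::recover::Recover;

    /// Toy map that only tracks keys; a stand-in for a real map type.
    pub struct Map<K> {
        keys: Vec<K>,
    }

    impl<K: PartialEq> Map<K> {
        pub fn new() -> Self {
            Map { keys: Vec::new() }
        }

        pub fn insert(&mut self, key: K) {
            if !self.keys.contains(&key) {
                self.keys.push(key);
            }
        }
    }

    // Key recovery lives in the trait, not in Map's inherent API.
    impl<K: PartialEq> Recover<K> for Map<K> {
        type Key = K;

        fn get(&self, key: &K) -> Option<&K> {
            self.keys.iter().find(|k| *k == key)
        }
    }
}

mod set {
    use super::map::Map;
    use super::recover::Recover;

    pub struct Set<T> {
        map: Map<T>,
    }

    impl<T: PartialEq> Set<T> {
        pub fn new() -> Self {
            Set { map: Map::new() }
        }

        pub fn insert(&mut self, value: T) {
            self.map.insert(value);
        }

        /// The public set method forwards to the crate-private trait.
        pub fn get(&self, value: &T) -> Option<&T> {
            Recover::get(&self.map, value)
        }
    }
}

fn main() {
    let mut s = set::Set::new();
    s.insert(String::from("a"));
    assert!(s.get(&String::from("a")).is_some());
}
```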

Gankra · 2015-08-20T18:30:49Z

@apasel422 Can you amend the RFC to be minimal per aturon's request? I think we're good to go when that's done.

seppo0010 · 2015-08-20T18:43:42Z

I don't understand the motivation to allow item recovery from a Set but not key recovery from a Map.

I was actually expecting that feature to move items from one Map to another without cloning its keys.
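For context, current std covers this use case with `HashMap::remove_entry`, which returns the owned key together with the value. This is separate from what this RFC accepted. A small sketch follows, where `move_entry` is a hypothetical helper name.

```rust
use std::collections::HashMap;

/// Hypothetical helper: moves the entry for `key` from one map to another
/// without cloning the key. Returns `false` if `key` was not in `from`.
fn move_entry(
    from: &mut HashMap<String, u32>,
    to: &mut HashMap<String, u32>,
    key: &str,
) -> bool {
    match from.remove_entry(key) {
        Some((k, v)) => {
            // `k` is the String that was stored in `from`; it is moved, not cloned.
            to.insert(k, v);
            true
        }
        None => false,
    }
}

fn main() {
    let mut a = HashMap::new();
    let mut b = HashMap::new();
    a.insert("alpha".to_string(), 1);
    assert!(move_entry(&mut a, &mut b, "alpha"));
    assert_eq!(b.get("alpha"), Some(&1));
}
```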

SimonSapin · 2015-08-20T20:43:36Z

Same here, the use case that got me here was with a HashMap, not a set.

eddyb · 2015-08-20T21:35:10Z

@seppo0010 The usecase I hit that needed the feature for Set was sort of a cache.
One quick example would be HashSet<Rc<str>>, you want to be able to find an existing key with &str and clone the Rc handle.
To get closer to the compiler, HashSet<&'arena str> could be used as a cache, queried with a temporary &str, that would get copied on the arena if not found in the set.
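The `HashSet<Rc<str>>` example can be sketched with the set `get` method this RFC proposes (available on `HashSet` in current std). The `intern` function is a hypothetical helper written for this illustration. The lookup borrows a temporary `&str`, and an allocation happens only on a miss.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical helper: returns the cached handle for `s`,
/// allocating a new `Rc<str>` only when `s` is not yet in the cache.
fn intern(cache: &mut HashSet<Rc<str>>, s: &str) -> Rc<str> {
    // `get` recovers the stored element, so its handle can be cloned.
    if let Some(existing) = cache.get(s) {
        return Rc::clone(existing);
    }
    let rc: Rc<str> = Rc::from(s);
    cache.insert(Rc::clone(&rc));
    rc
}

fn main() {
    let mut cache = HashSet::new();
    let a = intern(&mut cache, "hello");
    let b = intern(&mut cache, "hello");
    // The second call reuses the first allocation.
    assert!(Rc::ptr_eq(&a, &b));
}
```

Compared with an identity map, the set stores each handle once.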

alexcrichton · 2015-08-27T00:01:44Z

@apasel422 ping about the RFC update, would love to merge!

apasel422 · 2015-08-27T13:26:42Z

@alexcrichton I haven't updated yet because it seems like there's still some dissent, based on the last few comments.

shepmaster · 2015-08-27T13:55:10Z

As another voice, only having it on sets would be acceptable for me. I am in the same boat as @eddyb — a cache.

apasel422 · 2015-08-27T14:01:58Z

@alexcrichton Updated.

alexcrichton · 2015-08-27T20:49:54Z

Ok, thanks @apasel422! The consensus of the libs team is that this is a great step forward for sets and we can continue to explore the problem space for maps as the needs arise, but it seems like the most pressing parts to work with are sets today.

And of course, thanks again for the RFC @apasel422!

rpjohnst · 2017-10-03T07:25:25Z

I find myself needing this for maps.

In my case, I am building a string interner using a HashMap<Box<str>, u32> where the u32s track the order of insertion. I use this to determine whether a string is in one of a few fixed sets of symbols by doing range checks on its corresponding u32.

The OrderMap crate provides this part of the API under the name get_pair, so that name now has some precedent.

Alternatively, it would be useful for the Entry API to provide a reference to the key as well as the value after an or_insert/or_insert_with-like operation. This might be the better option since it allows the lookup to be reused when inserting new entries.

What is the best route forward here? Should I write up a new RFC?
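A sketch of the interner described above, written against `HashMap::get_key_value`, which current std provides and which plays the role the comment assigns to `get_pair`. The `Interner` type is hypothetical. This version performs more than one lookup on a miss, which is the cost that an Entry-based API returning the key would avoid.

```rust
use std::collections::HashMap;

/// Hypothetical interner: ids record insertion order.
struct Interner {
    map: HashMap<Box<str>, u32>,
}

impl Interner {
    fn new() -> Self {
        Interner { map: HashMap::new() }
    }

    /// Returns the stored string and its id, inserting `s` if needed.
    fn intern(&mut self, s: &str) -> (&str, u32) {
        if !self.map.contains_key(s) {
            let id = self.map.len() as u32;
            self.map.insert(Box::from(s), id);
        }
        // `get_key_value` returns a reference to the key owned by the map.
        let (key, id) = self.map.get_key_value(s).expect("present after insert");
        (&**key, *id)
    }
}

fn main() {
    let mut interner = Interner::new();
    assert_eq!(interner.intern("if"), ("if", 0));
    assert_eq!(interner.intern("else"), ("else", 1));
    assert_eq!(interner.intern("if"), ("if", 0));
}
```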

fschutt · 2017-12-16T03:01:25Z

Well, I needed this for a function where I insert into a set, but then I immediately want an iterator to that last, inserted element in the set. Since a BTreeSet is just a BTreeMap internally, this should be possible.

The application is a scanline algorithm, where the set consists of ordered points. I need to insert a point into a scanline and then know where it has been inserted (the position), so that I can construct an iterator to the next and previous point in the (ordered) scanline.

So for now I've forked the std::collections::BTreeSet. Watching this for when it eventually gets merged.
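One workaround that stays within std's `BTreeSet` is to insert and then query the neighbours with `range`. This is not the insert-returning-an-iterator API asked for above, and it costs extra tree lookups. The sketch uses `i32` as a stand-in for a point type, and `insert_with_neighbors` is a hypothetical helper name.

```rust
use std::collections::BTreeSet;
use std::ops::Bound::{Excluded, Unbounded};

/// Hypothetical helper: inserts `p` and returns its predecessor and
/// successor in the ordered set, if any.
fn insert_with_neighbors(line: &mut BTreeSet<i32>, p: i32) -> (Option<i32>, Option<i32>) {
    line.insert(p);
    // Largest element strictly less than `p`.
    let prev = line.range(..p).next_back().copied();
    // Smallest element strictly greater than `p`.
    let next = line.range((Excluded(p), Unbounded)).next().copied();
    (prev, next)
}

fn main() {
    let mut line: BTreeSet<i32> = [1, 5, 9].into_iter().collect();
    assert_eq!(insert_with_neighbors(&mut line, 4), (Some(1), Some(5)));
}
```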

create collection recovery RFC

320ad8e

Gankra self-assigned this Jul 8, 2015

Gankra added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jul 8, 2015

pnkfelix reviewed Jul 9, 2015

Gankra mentioned this pull request Jul 11, 2015

Add hash_map::Entry.or_insert_with_key() method #1202

Closed

Gankra mentioned this pull request Jul 17, 2015

B+ Tree in BTreeMap rust-lang/rust#27090

Closed

apasel422 mentioned this pull request Jul 20, 2015

VacantEntry should provide an accessor for the key rust-lang/rust#18323

Closed

apasel422 added 2 commits July 20, 2015 16:24

s/item/element/ and move VacantEntry stuff into details section

e1a90c3

s/functionailty/functionality/

ac347d1

apasel422 mentioned this pull request Jul 23, 2015

Add HashMap::get_pair? rust-lang/rust#27237

Closed

update for libs team changes

5442de0

rename RFC

b8648e4

apasel422 mentioned this pull request Aug 27, 2015

implement RFC 1194 rust-lang/rust#28043

Merged

alexcrichton mentioned this pull request Aug 27, 2015

Set recovery methods rust-lang/rust#28050

Closed

alexcrichton merged commit b8648e4 into rust-lang:master Aug 27, 2015

apasel422 deleted the collection-recovery branch August 28, 2015 02:20

apasel422 mentioned this pull request Oct 16, 2015

RFC: get collection keys #1175

Closed

apasel422 mentioned this pull request Jan 18, 2016

HashSet should have a get function #691

Closed

apasel422 mentioned this pull request May 13, 2016

HashMap::extend_with() to handle collisions rust-lang/rust#33618

Closed

mbrubeck unassigned Gankra Apr 17, 2017

fschutt added a commit to fschutt/polyclip that referenced this pull request Dec 16, 2017

Added custom fork of std::collections::BTreeSet because of missing in…

e53890b

…sert-with-iterator-to-last-inserted function ( see rust-lang/rfcs#1194 )

Centril added the A-collections Proposals about collection APIs label Nov 23, 2018
