Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New lookup functions on BTreeMap/Set #49638

Open
parsonsmatt opened this issue Apr 4, 2018 · 18 comments
Open

New lookup functions on BTreeMap/Set #49638

parsonsmatt opened this issue Apr 4, 2018 · 18 comments
Labels
A-collections Area: `std::collection` C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@parsonsmatt
Copy link

parsonsmatt commented Apr 4, 2018

The current function get attempts to find an exact key match; if it fails, it returns None. I propose the addition of four variants:

  • get_lt finds the greatest (key, element) in the map/set that is less than the given key.
  • get_lte returns the lookup key and element in the map if present, otherwise returning the next smallest key and element.
  • get_gt finds the smallest (key, element) in the map/set that is greater than the given key.
  • get_gte looks up the key; if present, returns it and the element, if not, returns the next largest key and element.

The specific use case that prompted this:

I'm working on a toy Smalltalk implementation. One of the implementation methods is "given an object pointer, find the next object pointer that is an instance of of the class." Given a value instances: BTreeSet<Pointer>, the implementation is simply get_gt(obj_ptr).

@ExpHP
Copy link
Contributor

ExpHP commented Apr 4, 2018

Note that, if desperate, the API of BTreeSet is currently powerful enough for a user to implement these on their own:

use std::collections::{Bound, BTreeSet};
fn get_lt<'a, T: Ord>(set: &'a BTreeSet<T>, value: &T) -> Option<&'a T> {
    set.range((Bound::Unbounded, Bound::Excluded(value))).next_back()
}

fn get_le<'a, T: Ord>(set: &'a BTreeSet<T>, value: &T) -> Option<&'a T> {
    set.range((Bound::Unbounded, Bound::Included(value))).next_back()
}

fn get_gt<'a, T: Ord>(set: &'a BTreeSet<T>, value: &T) -> Option<&'a T> {
    set.range((Bound::Excluded(value), Bound::Unbounded)).next()
}

fn get_ge<'a, T: Ord>(set: &'a BTreeSet<T>, value: &T) -> Option<&'a T> {
    set.range((Bound::Included(value), Bound::Unbounded)).next()
}

fn main() {
    let set = vec![1, 4, 6].into_iter().collect::<BTreeSet<_>>();
    assert_eq!(get_lt(&set, &0), None);
    assert_eq!(get_lt(&set, &1), None);
    assert_eq!(get_lt(&set, &2), Some(&1));
    assert_eq!(get_lt(&set, &7), Some(&6));
    
    assert_eq!(get_le(&set, &0), None);
    assert_eq!(get_le(&set, &1), Some(&1));
    assert_eq!(get_le(&set, &2), Some(&1));
    assert_eq!(get_le(&set, &7), Some(&6));
    
    assert_eq!(get_gt(&set, &7), None);
    assert_eq!(get_gt(&set, &6), None);
    assert_eq!(get_gt(&set, &5), Some(&6));
    assert_eq!(get_gt(&set, &0), Some(&1));
    
    assert_eq!(get_ge(&set, &7), None);
    assert_eq!(get_ge(&set, &6), Some(&6));
    assert_eq!(get_ge(&set, &5), Some(&6));
    assert_eq!(get_ge(&set, &0), Some(&1));
}

(I only say this as a matter of fact; not meaning to imply that there is no point to adding the feature to std. My opinion on that is neutral)

@parsonsmatt
Copy link
Author

That's really cool! Thanks for sharing 😄 Does it have the same performance characteristics of a get?

@ExpHP
Copy link
Contributor

ExpHP commented Apr 5, 2018

In theory it should have the same performance characteristics, if the implementation of range is anything like I'd expect, but I confess I haven't tried it (and I am not invested enough in it to try and properly benchmark it; I'll leave that to somebody who has a real use case to test it on).

Even if it turns out that range isn't currently up to snuff optimization-wise, I would expect that it can still probably be fixed for this use case.

@pietroalbini pietroalbini added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. A-collections Area: `std::collection` labels Apr 5, 2018
@Amanieu
Copy link
Member

Amanieu commented Apr 6, 2018

Here's a better API:

/// Returns the highest element whose key is below the given bound.
fn upper_bound<Q>(&self, bound: Bound<&Q>) -> Option<&V>
    where
        K: Borrow<Q>,
        Q: Ord + ?Sized;

/// Returns the lowest element whose key is above the given bound.
fn lower_bound<Q>(&self, bound: Bound<&Q>) -> Option<&V>
    where
        K: Borrow<Q>,
        Q: Ord + ?Sized;

@parsonsmatt
Copy link
Author

That is a much improved API 😄

@mishra-prabhakar
Copy link

@rustbot claim

@rustbot rustbot self-assigned this Apr 25, 2020
@Alexendoo
Copy link
Member

Hi, are you still working on this issue @Prabhakar-17?

@mishra-prabhakar
Copy link

@Alexendoo sorry, didn't find time to work on this one. if someone else is willing to pick it up please go ahead otherwise I will have time next weekend.

@GodTamIt
Copy link

Looks like this still hasn't been implemented but was looking into similar functionality recently.

IIUC, given a key query K, its upper bound U and lower bound L, if K does not exist in the tree, you will always encounter the other bound (i.e. searching for U, you'll encounter L and vice versa) either in a non-leaf node in your search path or at the leaf node where your search terminates. If this is true, how do y'all feel about an API that returns something strictly less, equal, and strictly greater in one call?

// Could use a better name
pub struct Neighbors<'a, K: 'a, V: 'a> {
    pub lesser: Option<(&K, &V)>,
    pub equal: Option<(&K, &V)>,
    pub greater: Option<(&K, &V)>,
};

pub fn neighbors<Q>(&self, key: &Q) -> Neighbors<K, V>
    where
        K: Borrow<Q> + Ord,
        Q: Ord + ?Sized;

@ghost
Copy link

ghost commented Aug 22, 2021

@GodTamIt I'd love to have what you suggested in the problem I'm trying to solve. Otherwise, I'm forced to do 2 separate calls to get the lesser and the greater element respectively.

@GodTamIt
Copy link

Happy to work on an implementation if there isn't too much opposition to it here

@scottmcm
Copy link
Member

Hmm, a struct with three options makes me feel like its semantics are cluttered, though I don't see any obvious subsetting that would help.

I wonder if it could make sense to have some sort of cursor/finger API that would make it easy to get the next/previous elements from a position...

@GodTamIt
Copy link

GodTamIt commented Aug 24, 2021

I wonder if it could make sense to have some sort of cursor/finger API that would make it easy to get the next/previous elements from a position...

I thought about this kind of API but there were two main issues that would need to be sorted out:

  • If the cursor is intended to be able to seek backwards and forwards with no bounds, then it'd actually need to keep and manage state that indicates the path to the current leaf.
  • The semantics for the edge case where there is no element equal to the search key would need to be sorted out. Does it default to the element ordered strictly before or after? What happens when that fallback element does not exist (i.e. at the min or max elements)? Reference a sentinel node?

@Amanieu
Copy link
Member

Amanieu commented Aug 24, 2021

@ghost
Copy link

ghost commented Aug 31, 2021

I thought about this kind of API but there were two main issues that would need to be sorted out:

* If the cursor is intended to be able to seek backwards and forwards with no bounds, then it'd actually need to keep and manage state that indicates the path to the current leaf.

* The semantics for the edge case where there is no element equal to the search key would need to be sorted out. Does it default to the element ordered strictly before or after? What happens when that fallback element does not exist (i.e. at the min or max elements)? Reference a sentinel node?

Just tossing an idea: I wonder if the cursor API can be combined with the existing Entry API so that the Cursor will return something like the Entry enum. If the element with the search key doesn't exist, then a Vacant variant will be returned.

@Amanieu
Copy link
Member

Amanieu commented Aug 31, 2021

  • If the cursor is intended to be able to seek backwards and forwards with no bounds, then it'd actually need to keep and manage state that indicates the path to the current leaf.

That's not true: each node in the tree has a pointer to its parent, so you don't need to track this in the cursor itself.

  • The semantics for the edge case where there is no element equal to the search key would need to be sorted out. Does it default to the element ordered strictly before or after? What happens when that fallback element does not exist (i.e. at the min or max elements)? Reference a sentinel node?

I would recommend mirroring the API of intrusive_collections::RBTree, specifically these methods which all return cursors:

@dtolnay
Copy link
Member

dtolnay commented Jan 27, 2022

@rustbot release-assignment

@rustbot rustbot removed their assignment Jan 27, 2022
@terrorfisch
Copy link

For completeness: There was an ACP rust-lang/libs-team#141 and there is a nightly feature btree_cursors #107540.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections Area: `std::collection` C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests