Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added operator[] support. #4

Closed
wants to merge 1 commit into from
Closed

Added operator[] support. #4

wants to merge 1 commit into from

Conversation

ylow
Copy link
Contributor

@ylow ylow commented Sep 8, 2014

  • It is documented that find() and insert() are faster, but this exists
    mainly for convenience.

  • The performance of operator[] is basically just as fast as calling
    find/insert for queries and insertions. Update may be slower since updates
    are performed by

      // a unified update/insert will be nice. upsert is not doing
      // anything particularly different from this.
      // and this is currently biased towards inserting a new value
      // than updating an existing value
      while (!owner->insert(key, value) &&
             !owner->update(key, value));
    

    Note that this implementation here is not exactly STL compliant.
    To maintain performance and avoid hitting the hash table too many
    times, The reference object is lazy. In other words,

    • operator[] does not actually perform an insert. It returns a
      reference object pointing to the requested key.
    • On table[i] = val // reference::operator=(mapped_type)
      an insert / update is called, and the reference becomes eager.
    • On val = table[i] // operator mapped_type()
      an find / insert is called and the reference becomes eager.
    • On table[i](i.e. no operation performed), the destructor is called
      immediately (reference::~reference()) in which case
      insert(key, mapped_type()) is called.

    Thus, with normal usage, this should behave pretty much exactly like
    a regular reference. However, where issues might occur is when the
    lifetime of the reference exceeds its usual lifespan.

    auto i = table[i]

    in which case the lifetime of reference object is extended beyond
    what we would like it to be causing issues. To avoid this issue,
    the above is banned. By making the default constructor,
    copy constructor, and assignment operator of the reference object
    private, the above will cause a compilation error.

    Though that means that annoyingly,

    table[i] = table[j]

    will cause a compilation error.

  • Having a fused find/insert and a fused insert/update will help make
    the operator[] implementation be much nicer avoiding the silly lazy
    reference issue.

    i.e. what will be nice to have is:

    // returns mapped value if key exists, or if key does not exist,
    // inserts (key, value) and return value.
    // equivalent to:
    // \code
    // while(1) {
    // mapped_type qval;
    // if (find(key, qval)) return qval;
    // else if (insert(key, value)) return value;
    // }
    // \endcode
    mapped_type find_or_insert(key_type key, mapped_type value)

    and

    // Equivalent to:
    // \code
    // while (!owner->insert(key, value) &&
    // !owner->update(key, value));
    // \endcode
    //
    // or,
    //
    // \code
    // upsert(key, [](auto i){return i;}, value);
    // \endcode

    (though the upsert is not exactly faster than the while loop)
    mapped_type insert_or_update(key_type key, mapped_type value)

  • Changed the update functions to templatize around the Updater. This
    is generally faster since that permits lambda to be inlined rather type
    erased through the std::function.

 - It is documented that find() and insert() are faster, but this exists
 mainly for convenience.

 - The performance of operator[] is basically just as fast as calling
 find/insert for queries and insertions. Update may be slower since updates
 are performed by

        // a unified update/insert will be nice. upsert is not doing
        // anything particularly different from this.
        // and this is currently biased towards inserting a new value
        // than updating an existing value
        while (!owner->insert(key, value) &&
               !owner->update(key, value));

 Note that this implementation here is not exactly STL compliant.
 To maintain performance and avoid hitting the hash table too many
 times, The reference object is *lazy*. In other words,

  - operator[] does not actually perform an insert. It returns a
    reference object pointing to the requested key.
  - On table[i] = val // reference::operator=(mapped_type)
    an insert / update is called, and the reference becomes eager.
  - On val = table[i] // operator mapped_type()
    an find / insert is called and the reference becomes eager.
  - On table[i] (i.e. no operation performed), the destructor is called
    immediately (reference::~reference()) in which case
    insert(key, mapped_type()) is called.

 Thus, with normal usage, this should behave pretty much exactly like
 a regular reference. However, where issues might occur is when the
 lifetime of the reference exceeds its usual lifespan.

 auto i = table[i]

 in which case the lifetime of reference object is extended beyond
 what we would like it to be causing issues. To avoid this issue,
 the above is banned. By making the default constructor,
 copy constructor, and assignment operator of the reference object
 private, the above will cause a compilation error.

 Though that means that annoyingly,

 table[i] = table[j]

 will cause a compilation error.

 - Having a fused find/insert and a fused insert/update will help make
 the operator[] implementation be *much* nicer avoiding the silly lazy
 reference issue.

 i.e. what will be nice to have is:

   // returns mapped value if key exists, or if key does not exist,
   // inserts (key, value) and return value.
   // equivalent to:
   // \code
   //   while(1) {
   //     mapped_type qval;
   //     if (find(key, qval)) return qval;
   //     else if (insert(key, value)) return value;
   //   }
   // \endcode
   mapped_type find_or_insert(key_type key, mapped_type value)

  and

  // Equivalent to:
  // \code
  //  while (!owner->insert(key, value) &&
  //         !owner->update(key, value));
  // \endcode
  //
  // or,
  //
  // \code
  // upsert(key, [](auto i){return i;}, value);
  // \endcode

  (though the upsert is not exactly faster than the while loop)
  mapped_type insert_or_update(key_type key, mapped_type value)

 - Changed the update functions to templatize around the Updater. This
 is generally faster since that permits lambda to be inlined rather type
 erased through the std::function.
@apc999
Copy link
Member

apc999 commented Sep 9, 2014

I like this idea of having a wrapper class around the table entry to return
on reference operation (e.g., table["foo"]).
Originally, we didn't support this STL-like reference because we found it
hard to determine the lifetime (and how long to lock
the corresponding buckets) on rvalue reference to an entry. Now the wrapper
returned can do smart thing helping us ensure
thread safety.

Performance-wise, I think it is likely to be the same (hoping the compiler
will do the heavy-lifting since everything is in .hh).

In general, I think this is good, and it will improve the readability and
usability of libcuckoo.

Thoughts? Manu, Dave, Michael?

  • Bin

On Mon, Sep 8, 2014 at 4:53 PM, Yucheng Low [email protected]
wrote:

It is documented that find() and insert() are faster, but this exists
mainly for convenience.

The performance of operator[] is basically just as fast as calling
find/insert for queries and insertions. Update may be slower since
updates
are performed by

 // a unified update/insert will be nice. upsert is not doing
 // anything particularly different from this.
 // and this is currently biased towards inserting a new value
 // than updating an existing value
 while (!owner->insert(key, value) &&
        !owner->update(key, value));

Note that this implementation here is not exactly STL compliant.
To maintain performance and avoid hitting the hash table too many
times, The reference object is lazy. In other words,
- operator[] does not actually perform an insert. It returns a
reference object pointing to the requested key.
- On table[i] = val // reference::operator=(mapped_type) an insert
/ update is called, and the reference becomes eager.
- On val = table[i] // operator mapped_type() an find / insert is
called and the reference becomes eager.
- On tablei http://i.e.%20no%20operation%20performed, the
destructor is called immediately (reference::~reference()) in which case
insert(key, mapped_type()) is called.

Thus, with normal usage, this should behave pretty much exactly like
a regular reference. However, where issues might occur is when the
lifetime of the reference exceeds its usual lifespan.

auto i = table[i]

in which case the lifetime of reference object is extended beyond
what we would like it to be causing issues. To avoid this issue,
the above is banned. By making the default constructor,
copy constructor, and assignment operator of the reference object
private, the above will cause a compilation error.

Though that means that annoyingly,

table[i] = table[j]

will cause a compilation error.
-

Having a fused find/insert and a fused insert/update will help make
the operator[] implementation be much nicer avoiding the silly lazy
reference issue.

i.e. what will be nice to have is:

// returns mapped value if key exists, or if key does not exist,
// inserts (key, value) and return value.
// equivalent to:
// \code
// while(1) {
// mapped_type qval;
// if (find(key, qval)) return qval;
// else if (insert(key, value)) return value;
// }
// \endcode
mapped_type find_or_insert(key_type key, mapped_type value)

and

// Equivalent to:
// \code
// while (!owner->insert(key, value) &&
// !owner->update(key, value));
// \endcode
//
// or,
//
// \code
// upsert(key, http://auto%20i{return i;}, value);
// \endcode

(though the upsert is not exactly faster than the while loop)
mapped_type insert_or_update(key_type key, mapped_type value)
-

Changed the update functions to templatize around the Updater. This
is generally faster since that permits lambda to be inlined rather type
erased through the std::function.


You can merge this Pull Request by running

git pull https://github.com/ylow/libcuckoo master

Or view, comment on, or merge it at:

#4
Commit Summary

  • Added operator[] support.

File Changes

Patch Links:


Reply to this email directly or view it on GitHub
#4.

Computer Science Department
Carnegie Mellon University

@manugoyal
Copy link
Contributor

Yeah I think it would make a nice addition. Looking at it briefly, there
are a few things in the current implementation that could be improved upon,
but overall it would work well I think.

-Manu

On Mon, Sep 8, 2014 at 6:25 PM, Bin Fan [email protected] wrote:

I like this idea of having a wrapper class around the table entry to
return
on reference operation (e.g., table["foo"]).
Originally, we didn't support this STL-like reference because we found it
hard to determine the lifetime (and how long to lock
the corresponding buckets) on rvalue reference to an entry. Now the
wrapper
returned can do smart thing helping us ensure
thread safety.

Performance-wise, I think it is likely to be the same (hoping the compiler
will do the heavy-lifting since everything is in .hh).

In general, I think this is good, and it will improve the readability and
usability of libcuckoo.

Thoughts? Manu, Dave, Michael?

  • Bin

On Mon, Sep 8, 2014 at 4:53 PM, Yucheng Low [email protected]
wrote:

It is documented that find() and insert() are faster, but this exists

mainly for convenience.

The performance of operator[] is basically just as fast as calling
find/insert for queries and insertions. Update may be slower since
updates
are performed by

// a unified update/insert will be nice. upsert is not doing
// anything particularly different from this.
// and this is currently biased towards inserting a new value
// than updating an existing value
while (!owner->insert(key, value) &&
!owner->update(key, value));

Note that this implementation here is not exactly STL compliant.
To maintain performance and avoid hitting the hash table too many
times, The reference object is lazy. In other words,

  • operator[] does not actually perform an insert. It returns a
    reference object pointing to the requested key.
  • On table[i] = val // reference::operator=(mapped_type) an insert
    / update is called, and the reference becomes eager.
  • On val = table[i] // operator mapped_type() an find / insert is
    called and the reference becomes eager.
  • On tablei http://i.e.%20no%20operation%20performed, the
    destructor is called immediately (reference::~reference()) in which case
    insert(key, mapped_type()) is called.

Thus, with normal usage, this should behave pretty much exactly like
a regular reference. However, where issues might occur is when the
lifetime of the reference exceeds its usual lifespan.

auto i = table[i]

in which case the lifetime of reference object is extended beyond
what we would like it to be causing issues. To avoid this issue,
the above is banned. By making the default constructor,
copy constructor, and assignment operator of the reference object
private, the above will cause a compilation error.

Though that means that annoyingly,

table[i] = table[j]

will cause a compilation error.

Having a fused find/insert and a fused insert/update will help make
the operator[] implementation be much nicer avoiding the silly lazy
reference issue.

i.e. what will be nice to have is:

// returns mapped value if key exists, or if key does not exist,
// inserts (key, value) and return value.
// equivalent to:
// \code
// while(1) {
// mapped_type qval;
// if (find(key, qval)) return qval;
// else if (insert(key, value)) return value;
// }
// \endcode
mapped_type find_or_insert(key_type key, mapped_type value)

and

// Equivalent to:
// \code
// while (!owner->insert(key, value) &&
// !owner->update(key, value));
// \endcode
//
// or,
//
// \code
// upsert(key, http://auto%20i{return i;}, value);
// \endcode

(though the upsert is not exactly faster than the while loop)

mapped_type insert_or_update(key_type key, mapped_type value)

Changed the update functions to templatize around the Updater. This
is generally faster since that permits lambda to be inlined rather type
erased through the std::function.


You can merge this Pull Request by running

git pull https://github.com/ylow/libcuckoo master

Or view, comment on, or merge it at:

#4
Commit Summary

  • Added operator[] support.

File Changes

Patch Links:


Reply to this email directly or view it on GitHub
#4.

Computer Science Department
Carnegie Mellon University


Reply to this email directly or view it on GitHub
#4 (comment).

@manugoyal
Copy link
Contributor

Hey @ylow,

Sorry I took so long to look over the pull request. I decided to rework your implementation quite a bit, so rather than asking you to re-implement the changes I made on this pull request, I'm going to send a pull request of my changes to your forked repo and you can take a look at that. If we can agree on those changes, then we can merge into the efficient repository.

Thanks,
Manu

@ylow
Copy link
Contributor Author

ylow commented Sep 26, 2014

No worries. Please do go ahead. I hacked this together as a proof of concept. As mentioned in the comments, given a couple of additional primitives, it can be built quite a bit cleaner.

@manugoyal
Copy link
Contributor

I merged your changes and mine into the master branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants