
ARROW-2558: [Plasma] avoid walk through all the objects when a client disconnects #2015

Closed
zhijunfu wants to merge 3 commits into apache:master from zhijunfu:client_object_ids

@zhijunfu zhijunfu commented May 9, 2018

Currently Plasma stores a list of clients in each ObjectTableEntry to track which clients are using a given object. This serves two purposes:

  • To determine whether an object is in use.
  • To determine whether the client trying to abort an object is the one that created it.

A problem with the list-of-clients approach is that when a client disconnects, we need to walk through all the objects and remove the client pointer from each object's list.

Instead, we could add a reference count to ObjectTableEntry and store a list of object IDs in the client structure. This achieves both goals of the original approach, and when a client disconnects, we only need to walk through that client's object IDs and decrement the reference count of each corresponding ObjectTableEntry. There is no need to walk through all objects.
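The proposal above can be illustrated with a minimal sketch. This is not the actual Plasma code: `Store`, `GetOrCreate`, `AddToClientObjectIds`, `RemoveFromClientObjectIds`, `DisconnectClient`, and `InUse` are hypothetical names, and `ObjectID` is simplified to a string. It only shows how a per-object reference count plus a per-client set of object IDs makes disconnect cost proportional to the number of objects that client holds.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical, simplified stand-ins for the Plasma types discussed above.
using ObjectID = std::string;

struct ObjectTableEntry {
  ObjectID object_id;
  // Number of clients currently using this object.
  int ref_count = 0;
};

struct Client {
  // IDs of the objects this client is currently using.
  std::unordered_set<ObjectID> object_ids;
};

class Store {
 public:
  // Create an entry if it is missing and return a pointer to it.
  ObjectTableEntry* GetOrCreate(const ObjectID& id) {
    ObjectTableEntry& entry = objects_[id];
    entry.object_id = id;
    return &entry;
  }

  // Record that `client` uses `entry`; a repeated call is a no-op.
  void AddToClientObjectIds(ObjectTableEntry* entry, Client* client) {
    if (client->object_ids.insert(entry->object_id).second) {
      // Increase reference count.
      entry->ref_count++;
    }
  }

  // Drop the client's reference if it holds one. Returns true if dropped.
  bool RemoveFromClientObjectIds(ObjectTableEntry* entry, Client* client) {
    auto it = client->object_ids.find(entry->object_id);
    if (it == client->object_ids.end()) return false;
    client->object_ids.erase(it);
    // Decrease reference count.
    assert(entry->ref_count > 0);
    entry->ref_count--;
    return true;
  }

  // On disconnect, visit only the objects this client was using.
  void DisconnectClient(Client* client) {
    // Copy the IDs first, because releasing mutates client->object_ids.
    const std::unordered_set<ObjectID> ids = client->object_ids;
    for (const ObjectID& id : ids) {
      auto it = objects_.find(id);
      if (it != objects_.end()) RemoveFromClientObjectIds(&it->second, client);
    }
  }

  // An object is in use while at least one client references it.
  bool InUse(const ObjectID& id) const {
    auto it = objects_.find(id);
    return it != objects_.end() && it->second.ref_count > 0;
  }

 private:
  std::unordered_map<ObjectID, ObjectTableEntry> objects_;
};
```

In this sketch the "is the object in use" check becomes `ref_count > 0`, and the "does this client hold the object" check becomes a lookup in `client->object_ids`, so neither needs a per-object set of client pointers.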

#endif
/// Set of clients currently using this object.
std::unordered_set<Client*> clients;
/// Reference count.
Contributor

Better documentation is: Number of clients currently using this object.

Contributor Author

Yes, this is indeed better :)

/// Set of clients currently using this object.
std::unordered_set<Client*> clients;
/// Reference count.
int refcnt;
Contributor

Let's rename this to ref_count; it's good not to be too cryptic ;)

Contributor Author

Sure.

}
// Add the client pointer to the list of clients using this object.
entry->clients.insert(client);
// Increase refcnt.
Contributor

// Increase reference count.

auto it = client->object_ids.find(entry->object_id);
if (it != client->object_ids.end()) {
client->object_ids.erase(it);
// Decrease refcnt.
Contributor

// Decrease reference count.

@@ -533,19 +538,25 @@ void PlasmaStore::disconnect_client(int client_fd) {
// lists.
// TODO(swang): Avoid iteration through the object table.
Contributor

We can remove the TODO(swang) now.

Contributor Author

Sure :)

@zhijunfu
Contributor Author

Comments addressed. Thanks.

@pcmoritz
Contributor

Performance impact:

Without this PR:

[ 78.57%] ··· Running plasma.SimplePlasmaLatency.time_plasma_put              498.27ms
[ 85.71%] ··· Running plasma.SimplePlasmaLatency.time_plasma_putget           694.21ms
[ 92.86%] ··· Running plasma.SimplePlasmaThroughput.time_plasma_put_data            ok
[ 92.86%] ···· 
               ========== ==========
                 param1             
               ---------- ----------
                  1000     552.60μs 
                 100000    768.20μs 
                10000000    5.61ms  
               ========== ==========

With this PR:

[ 78.57%] ··· Running plasma.SimplePlasmaLatency.time_plasma_put              500.77ms
[ 85.71%] ··· Running plasma.SimplePlasmaLatency.time_plasma_putget           697.12ms
[ 92.86%] ··· Running plasma.SimplePlasmaThroughput.time_plasma_put_data            ok
[ 92.86%] ···· 
               ========== ==========
                 param1             
               ---------- ----------
                  1000     557.60μs 
                 100000    741.20μs 
                10000000    4.89ms  
               ========== ==========

So it looks like the performance is approximately the same.
