[DOCS] Move and expand documentation about connection pools, selectors
polyfractal committed Dec 16, 2014
1 parent d707bf8 commit 759f8c4
Showing 2 changed files with 116 additions and 70 deletions.
67 changes: 6 additions & 61 deletions docs/configuration.asciidoc
Almost every aspect of the client is configurable. The client is built around a container which holds all of the configurations used internally.

However, since the container also controls all object instantiation, it is possible for users to completely change the internal components used by the client. A user could, for example, write a custom ConnectionPool class and use that instead of the default connection pools that ship with the client.

=== Host Configuration

A common task is telling the client which nodes are in your cluster. By default, the client will connect to `localhost:9200`, which obviously doesn't work in many production environments.
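
For example, hosts can be provided as a simple array of `host:port` strings when constructing the client (the host values below are placeholders):

[source,php]
----
$params = array();
$params['hosts'] = array(
    '192.168.1.1:9200',        // IP + port
    'my-es-node.example.com'   // hostname; the default port 9200 is assumed
);
$client = new Elasticsearch\Client($params);
----
{zwsp} +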


This associative array holds all custom configurations that you may want to set. Often, you'll only need to configure the hosts, but if you need more advanced behavior, read on.

=== HTTP Basic Auth
HTTP basic authentication is a very common requirement. To enable Basic Auth in the client, simply provide the required credentials in the connection parameters:

[source,php]
----
// Illustrative credentials; the array-based 'auth' parameter shown here
// (user, password, auth type) is an assumed 1.x-style configuration
$params['connectionParams']['auth'] = array(
    'myUsername',
    'myPassword',
    'Basic'
);
$client = new Elasticsearch\Client($params);
----
{zwsp} +

After being initialized with authentication credentials, all outgoing requests will automatically include the HTTP auth headers.


=== Ignoring exceptions
The library attempts to throw exceptions for common problems. These exceptions
match the HTTP response code provided by Elasticsearch. For example, attempting to
GET a nonexistent document will throw a `MissingDocument404Exception`.

If you would rather check for a document without handling an exception, the `exists()` method returns a simple boolean instead of throwing (the parameter values below are placeholders):

[source,php]
----
$getParams = array(
    'index' => 'my_index',
    'type'  => 'my_type',
    'id'    => 'my_id'
);
$exists = $client->exists($getParams);
----
{zwsp} +

=== Configuring the Logger
By default, logging in the client is disabled for performance reasons. If you wish to enable logging, simply set the `logging` parameter to true:

[source,php]
$params['logging'] = true;
$client = new Elasticsearch\Client($params);
----
{zwsp} +

By default, the client uses a file-based logger provided by the https://github.com/Seldaek/monolog[Monolog] framework. Monolog provides a variety of loggers. For example, we can instruct the client to log to SysLog instead of a file:

[source,php]
----
use Monolog\Logger;
use Monolog\Handler\SyslogHandler;

// Assumption: a custom logger instance is passed via the 'logObject' parameter
$logger = new Logger('log');
$logger->pushHandler(new SyslogHandler('elasticsearch-php'));

$params['logging']   = true;
$params['logObject'] = $logger;
$client = new Elasticsearch\Client($params);
----
{zwsp} +

The client uses the generic https://github.com/php-fig/log[PSR\Log interface], which means that any PSR\Log compatible loggers will work just fine in the client.

Replacing the logger with another PSR\Log compatible logger is similar to the previous example of configuring a Monolog logger:

[source,php]
// Assumption: any PSR\Log-compatible logger instance is passed via the same
// 'logObject' parameter; MyPsrLogger is a hypothetical logger of your own
$logger = new \MyProject\Loggers\MyPsrLogger();

$params['logging']   = true;
$params['logObject'] = $logger;
$client = new Elasticsearch\Client($params);
----
{zwsp} +

=== Full list of configurations

119 changes: 110 additions & 9 deletions docs/connection-pool.asciidoc
The connection pool is an object inside the client that is responsible for maintaining the list of connections to the nodes in your cluster. Ideally, connections are simply either dead or alive.

However, in the real world, things are never so clear. Nodes are sometimes in a gray-zone of _"probably dead but not confirmed"_, _"timed-out but unclear why"_ or _"recently dead but now alive"_.

The connection pool's job is to manage this set of unruly connections and try to provide the best behavior to the client. There are several
connection pool implementations that you can choose from:

=== staticNoPingConnectionPool (default)

This connection pool maintains a static list of hosts, which are assumed to be alive when the client initializes. If
a node fails a request, it is marked as `dead` for 60 seconds and the next node is tried. After 60 seconds, the node
is revived and put back into rotation. Each additional failed request will cause the dead timeout to increase exponentially.

A successful request will reset the "failed ping timeout" counter.

=== staticConnectionPool

Identical to the `staticNoPingConnectionPool`, except it pings nodes before they are used to determine if they are alive.
This may be useful for long-running scripts, but tends to be additional overhead that is unnecessary for average PHP scripts.

=== sniffingConnectionPool

Unlike the two previous static connection pools, this one is dynamic. The user provides a seed list of hosts, which the
client uses to "sniff" and discover the rest of the cluster. It achieves this through the Cluster State API. As new
nodes are added or removed from the cluster, the client will update its pool of active connections.
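
For example, a seed list and the sniffing pool can be configured together (a representative sketch; the host values are placeholders, and the class path follows the convention shown in the next section):

[source,php]
----
// The seed hosts are only used to bootstrap the initial sniff of the cluster
$params['hosts'] = array('seed-host1:9200', 'seed-host2:9200');
$params['connectionPoolClass'] = '\Elasticsearch\ConnectionPool\SniffingConnectionPool';
$client = new Elasticsearch\Client($params);
----
{zwsp} +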

=== Which connection pool to choose? PHP and connection pooling

At first blush, the `sniffingConnectionPool` implementation seems superior. For many languages, it is. In PHP, the conversation is a bit more nuanced.

In reality, if your script only executes a few queries, the sniffing concept is often more overhead than it is worth.

For this reason, the default connection pool is currently the `staticNoPingConnectionPool`. You can, of course, change this default - but we strongly recommend you load test and verify that it does not negatively impact your performance.

=== Changing the Connection Pool

Changing the connection pool is very simple: instantiate the client with your chosen connection pool implementation:

[source,php]
----
$params['connectionPoolClass'] = '\Elasticsearch\ConnectionPool\SniffingConnectionPool';
$client = new Elasticsearch\Client($params);
----
{zwsp} +

== Selectors

The connection pool maintains the list of connections, and decides when nodes should transition from alive to dead (and
vice versa). It has no logic to choose connections, however. That job belongs to the Selector class.

The selector's job is to return a single connection from a provided array of connections. Like the Connection Pool, there
are several implementations to choose from:

=== RoundRobinSelector (Default)

This selector returns connections in a round-robin fashion. Node #1 is selected on the first request, Node #2 on
the second request, etc. This ensures an even load of traffic across your cluster. Round-robin'ing happens on a
per-request basis (i.e. sequential requests go to different nodes).

=== StickyRoundRobinSelector

This selector is "sticky", in that it prefers to reuse the same connection repeatedly. For example, Node #1 is chosen
on the first request. Node #1 will continue to be re-used for each subsequent request until that node fails. Upon failure,
the selector will round-robin to the next available node, then "stick" to that node.

This is an ideal strategy for many PHP scripts. Since PHP scripts are shared-nothing and tend to exit quickly, creating
new connections for each request is often a sub-optimal strategy and introduces a lot of overhead. Instead, it is typically
better to "stick" to a single connection for the duration of the script.

By default, this selector will randomize the hosts upon initialization, which will still guarantee an even distribution
of load across the cluster. It changes the round-robin dynamics from per-request to per-script.
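
To use the sticky selector, set `selectorClass` to its class path, just like the RandomSelector example shown later in this section (the path below follows the same naming convention as the other bundled selectors):

[source,php]
----
$params['selectorClass'] = '\Elasticsearch\ConnectionPool\Selectors\StickyRoundRobinSelector';
$client = new Elasticsearch\Client($params);
----
{zwsp} +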

=== RandomSelector

This selector simply returns a random node, regardless of state. It is generally useful only for testing.

=== Changing or replacing the Selector Class

Changing the selector is also very simple: instantiate the client with your chosen implementation:

[source,php]
----
$params['selectorClass'] = '\Elasticsearch\ConnectionPool\Selectors\RandomSelector';
$client = new Elasticsearch\Client($params);
----
{zwsp} +

The client will now query random nodes. It is sometimes useful to build a custom selector that serves your particular
cluster with custom business logic.

For example, we can build a new selector that only selects the first connection each time. This is obviously not a good
selector (!!!), but it demonstrates the concept well:

[source,php]
----
namespace MyProject\Selectors;

use Elasticsearch\Connections\ConnectionInterface;
use Elasticsearch\ConnectionPool\Selectors\SelectorInterface;

class FirstSelector implements SelectorInterface
{
    /**
     * Selects the first connection
     *
     * @param array $connections Array of Connection objects
     *
     * @return ConnectionInterface
     */
    public function select($connections)
    {
        return $connections[0];
    }
}
----
{zwsp} +

And now we can specify that when creating the client:

[source,php]
----
$params['selectorClass'] = '\MyProject\Selectors\FirstSelector';
$client = new Elasticsearch\Client($params);
----
{zwsp} +
