Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SearchResponseIterator is scrolling the first page twice #595

Closed
Kleinrich opened this issue Jun 30, 2017 · 3 comments · Fixed by #871
Closed

SearchResponseIterator is scrolling the first page twice #595

Kleinrich opened this issue Jun 30, 2017 · 3 comments · Fixed by #871

Comments

@Kleinrich
Copy link

Kleinrich commented Jun 30, 2017

SearchResponseIterator is scrolling the first page twice

Hello,

When you use the SearchResponseIterator in a foreach, the first page is returned twice.
The reason seems to be in SearchResponseIterator::next().
Why does not the first iteration make call to the scroll api?

if ($this->current_key !== 0) {
            $this->current_scrolled_response = $this->client->scroll(
                array(
                    'scroll_id' => $this->scroll_id,
                    'scroll'    => $this->scroll_ttl
                )
            );
            $this->scroll_id = $this->current_scrolled_response['_scroll_id'];
}

The iterator sequence is :
rewind() -> valid() -> current() -> key() -> next() -> valid() -> current() -> key() -> next() -> etc until valid returns false.
So at the first iteration, we have to move the cursor on the next page.
If I remove the if, it works well.

Code snippet of problem

Here is an how-to-reproduce example :

require_once __DIR__.'/vendor/autoload.php';

$aConfig = [
    'hosts' => ['localhost:9200'],
    'count' => [
        'index' => 'foo',
        'type' => 'bar',
        'body' => ['query' => ['match_all' => new \StdClass()]]
    ],
    'scroll' => [
        'scroll' => '1m',
        'size' => 100,
        'index' => 'foo',
        'type' => 'bar',
        'body' => ['query' => ['match_all' => new \StdClass()]],
    ],
];

$oClient = \Elasticsearch\ClientBuilder::create()
    ->setHosts($aConfig['hosts'])
    ->build();

$oIterator = new \Elasticsearch\Helper\Iterators\SearchResponseIterator($oClient, $aConfig['scroll']);
$iCount = 0;
foreach ($oIterator as $aPage) {
    $iCount += count($aPage['hits']['hits']);
}

printf('Scrolled %1$s documents%2$s', $iCount, PHP_EOL);
printf('There are %1$s documents%2$s', $oClient->count($aConfig['count'])['count'], PHP_EOL);

System details

Ubuntu 16.04.2 LTS
PHP 7.1.4 & PHP 5.6.30
ES 5.2.2
PHP-ES v5.2.0

@biopowder
Copy link

Same for me:
Apache/2.4.29 (Win32) OpenSSL/1.0.2n PHP/7.1.13

@polyfractal
Copy link
Contributor

Thanks for investigating! Want to open a PR to fix the bug? If not, I'll try to get to it later this week or early next.

I suspect this is due to how scrolling has changed over time (used to operate differently with scan/scroll). The helpers were contributed by community members and don't get as much testing (not a good excuse, just the truth).

@biopowder
Copy link

biopowder commented Sep 28, 2018

I don't remember how did I make WA in June, but it was working for me. And now it's still scrolling twice. (maybe I was changed some library versions or PHP... @polyfractal , did you manage to check this? P.S. what's mean PR?

safecat added a commit to safecat/elasticsearch-php that referenced this issue Mar 29, 2019
safecat added a commit to safecat/elasticsearch-php that referenced this issue Mar 29, 2019
@ezimuel ezimuel mentioned this issue Apr 18, 2019
ezimuel added a commit that referenced this issue Apr 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants