Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added Scroll Iterator #842

Merged
merged 2 commits into from
May 20, 2015
Merged

added Scroll Iterator #842

merged 2 commits into from
May 20, 2015

Conversation

webdevsHub
Copy link
Contributor

Implementation of Scroll Iterator requested by @hasumedic in #827

The Scroll Iterator executes every request fully isolated. That means search options manipulated during iterations are changed and reverted automatically.

$search->setOption(Search::OPTION_SEARCH_TYPE, Search::OPTION_SEARCH_TYPE_COUNT);

$path = $search->getPath();
foreach ($search->scroll() as $scrollId => $resultSet) {
    $search->getPath() === $path; // returns true
}
$search->getPath() === $path; // returns true

In addition I changed the ScanAndScroll Iterator to a child class of Scroll in order to profit from that isolation.

Finally I added a Search::scroll() function just like Search::scanAndScroll()

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling cfcd3a5 on webdevsHub:scroll into * on ruflin:master*.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 6bf4a0b on webdevsHub:scroll into * on ruflin:master*.

1 similar comment
@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 6bf4a0b on webdevsHub:scroll into * on ruflin:master*.

$this->_search->getQuery()->setSize($this->sizePerShard);
// reset state
$this->_nextScrollId = null;
$this->_options = array(null, null, null, null);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be only 3 null?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I just saw that there is a fourth param size for ScanAndScroll

@ruflin
Copy link
Owner

ruflin commented May 14, 2015

Does this break an BC or not?

@webdevsHub
Copy link
Contributor Author

Does this break an BC or not?

no

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 6bf4a0b on webdevsHub:scroll into * on ruflin:master*.

@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 6bf4a0b on webdevsHub:scroll into * on ruflin:master*.

1 similar comment
@coveralls
Copy link

Coverage Status

Changes Unknown when pulling 6bf4a0b on webdevsHub:scroll into * on ruflin:master*.

@webdevsHub
Copy link
Contributor Author

I would like hearing @hasumedic opinion on this before merge

@ruflin
Copy link
Owner

ruflin commented May 14, 2015

Ok, I will wait with merging until @hasumedic put his feedback in.

@hasumedic
Copy link

Looking good to me @webdevsHub!

*
* @var array
*/
protected $_options = array(null, null, null);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a VO instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a VO instead?

VO?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if we should be using Value Objects for these pieces of config. But I haven't seen them implemented anywhere else in the library, so I would assume that arrays are the conventions in this case. Isn't that the case @ruflin?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hasumedic Currently the convention is arrays. This is mainly related to the history of Elastica as it started very simple and grew together with elasticsearch. If we change to ArrayObjects, we should consider a change across the full library. @ewgRa brought up this issue and I just realised I never responded to it :-( #783 (comment) I suggest to continue this discussion in #783 an until then we use arrays.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being the case, this PR looks good and should go forward! 👍

@webdevsHub
Copy link
Contributor Author

I deleted protected $_lastScrollId because it is useless. This would break BC if someone extended ScanAndScroll and used that member. Should I mentioned it somewhere?

@ruflin
Copy link
Owner

ruflin commented May 18, 2015

Mention it in the CHANGELOG.md file

@ruflin ruflin merged commit 7a21b60 into ruflin:master May 20, 2015
@ruflin
Copy link
Owner

ruflin commented May 20, 2015

Merged. Thx.

@hasumedic
Copy link

@webdevsHub just a quickie.

How would this API work in a cross-request scenario? ie. First request loads the data and returns the scrollId by means of Scroll::key(). Second request sends the scrollId to fetch the next bulk of data. How would this scrollId be set into the Scroll object to query ES for the next set of results?

@ruflin
Copy link
Owner

ruflin commented May 20, 2015

@hasumedic I think @webdevsHub is on vacation and will only respond next week ...

@hasumedic
Copy link

@ruflin that's OK. We'll figure it out then :) Thanks for letting me know!

@hasumedic
Copy link

Thanks for replying @webdevsHub ! I guess that that's precisely what my question was.
This works fine in an iterable scenario, but it prevents others such as multiple requests to work.

What I was asking in my previous comment was in terms of web requests. So you make a first web request to process a first batch of (lets say) 1000 elements. This generates the scrollId and the first 1000 results. With this you generate a response.
Now you want to manually trigger the process for the next 1000 elements, so you make a new web request. In order to continue processing from the element 1001 to the 2000, you need the scrollId to be passed into the Scroll object and run Scroll:next(), but I don't see how this can work at the moment.

Does this make any sense?

@webdevsHub
Copy link
Contributor Author

As you can see here it is perfectly fine to use the Iterator without foreach and without setting something manually in the class.

@hasumedic
Copy link

I'll give it a go. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants