Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scan and scroll object for distributed search execution #617

Merged
merged 3 commits into from
May 25, 2014

Conversation

webdevsHub
Copy link
Contributor

Hi,
i wrote a new class for for the scan and scroll feature. You can loop over huge amounts of data with a simple foreach.

I never contributed to Elastica and i'm not sure, if you want complex operations like this in your library. But i decided to make a pull request to discuss it :D

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 4194195 on webdevsHub:ScanAndScroll into 58b0928 on ruflin:master.

@ruflin
Copy link
Owner

ruflin commented May 17, 2014

@webdevsHub Thank you for your contribution. I will take a closer look tomorrow.

@ruflin
Copy link
Owner

ruflin commented May 18, 2014

I'm not sure if we should make it part of the Search object or if it makes sense to add a new object for it. I'm tempted to say it should be in Search as the scroll and scan option is already supported by it but I'm open to let me convince from the opposite.

@webdevsHub What do you think?

@webdevsHub
Copy link
Contributor Author

My goal was: get rid of the scan-request. The scan request only make sense (as far as i know) in combination with 1 or more scroll requests. If you want the results directly, you send a normal _search request, else you have to perform the scan and scroll operation. The scan search type is used as a "initial scroll request" exclusively.

If you agree with that, the next logical step is to provide a feature which will hide all the meta-work from users (0: init the scroll aka scan, 1: set the new scroll-id, 2: perform the scroll, 3: repeat from 1 until no results returned) and let the users deal with the results only.

I think the best way to do this is the Iterator-Interface, because you can only use an Iterator as the loop-context in one direction. So it perfectly matches the scan and scroll behavior.

With this in mind it feels wrong to integrate the scan and scroll operation in the Search class: We can make the Search class itself to an Iterator. But this will change the behavior of the Search class a lot (from a nice way to define and configure your searches to a heavy, mutli-functioned allrounder).

A better way would be to provide a new function (something like Search::scanAndScroll). This function returns the ScanAndScroll-Iterator constructed with $this. Now the scan and scroll operation is integrated into the Search class but decoupled at the same time.

foreach($search->scanAndScroll() as $resultSet) {
  // ...
}

I think another problem is the override of some properties in the Search and Query objects. This might lead to an issue if the users want to reuse the Search and Query objects. The only solution i see is to clone the Search in the constructor of ScanAndScroll but this looks wrong to me too... Any ideas?

*
* @param Search $search
*/
function __construct(Search $search) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add public also the constructor

@ruflin
Copy link
Owner

ruflin commented May 20, 2014

I would say good argumentation 👍 Lets go this way. Can you also update the changes.txt file?

@ruflin
Copy link
Owner

ruflin commented May 24, 2014

@webdevsHub Any updates?

@webdevsHub
Copy link
Contributor Author

changed some comments and added the new method to Search

can you update the documentation on http://elastica.io ?

@ruflin ruflin merged commit 73cc3de into ruflin:master May 25, 2014
@ruflin
Copy link
Owner

ruflin commented May 25, 2014

Ok, merged in. I will update the documentation on elastica.io

@webdevsHub webdevsHub deleted the ScanAndScroll branch May 25, 2014 18:03
@ruflin
Copy link
Owner

ruflin commented May 25, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants