Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for specifying a pipeline when indexing documents #1455

Merged
merged 1 commit into from
Feb 23, 2018

Conversation

stof
Copy link
Contributor

@stof stof commented Feb 9, 2018

Closes #1453

@stof
Copy link
Contributor Author

stof commented Feb 9, 2018

@ruflin I added the setPipeline in the Document class. Should it be there or in the AbstractUpdateAction ? I don't know whether pipelines make sense for Script.

And I would like some guidance about the best way to cover this with tests.

@stof
Copy link
Contributor Author

stof commented Feb 13, 2018

@ruflin could you give me the necessary indications so that I can finish this PR ?

@ruflin
Copy link
Owner

ruflin commented Feb 13, 2018

@stof Sorry for the late reply, only was partially online the last days.

I think having it in Document as you made it make sense as the pipeline is set for each Document.

The testing here is a bit trickier as you have to create a pipeline first in Elasticsearch. I would probably do that through Elastica by using php-elasticsearch as ingest does not really exist yet in Elastica. Best go for a simple processor like rename inside the pipeline which then should also make it easy to test if the document was actually processed as expected.

Cleaning up the pipeline after the test would be great, but I'm also ok if it sticks around if this gets too complicated.

@stof
Copy link
Contributor Author

stof commented Feb 14, 2018

@ruflin actually, for bulk requests, this goes in the bulk params, not in each document. Writing the test made me realize this mistake in my PR (I'm already using the pipeline when indexing documents with the lower-level API, but I was not using the bulk API).

and the update API does not support pipelines yet (this is discussed in elastic/elasticsearch#17895)

So does it still make sense to have it in the Document object ?

@stof
Copy link
Contributor Author

stof commented Feb 14, 2018

Tests are written though.

They are currently leaking the rename pipeline, as ensuring that the cleanup is done even in case of test failure would involve wrapping the whole test in a try/finally or doing the cleanup in tearDown while most tests don't create it. But I use always the same pipeline name (overriding the pipeline defined by previous tests), so I won't leak multiple ones.

@stof
Copy link
Contributor Author

stof commented Feb 14, 2018

anyway, Type::addDocument does not take options outside the Document, so it still make sense to have pipeline in the Document IMO. But the question is how this integrates with addDocuments cleanly.

@ruflin
Copy link
Owner

ruflin commented Feb 15, 2018

@stof Thanks for the details and changes. I will probably only have time to have a detailed look and think about it beginning next week. It's definitively on my todo list ;-)

@ruflin
Copy link
Owner

ruflin commented Feb 18, 2018

@stof I'm positively surprised that using a pipeline with the bulk API already works as you showed in your tests. I wonder if that is enough for now or if we need a deeper integration?

For your PR I'm good to merge as is if your are ok with it and would add a Changelog. We can perhaps have a follow up discussion in an issue about the bulk part.

@stof
Copy link
Contributor Author

stof commented Feb 19, 2018

@ruflin it already works, because the client exposes the request params in its API, and #1427 makes it much more usable by allowing to pass them along.

@stof
Copy link
Contributor Author

stof commented Feb 19, 2018

@ruflin I'm OK with merging this PR as is (and I added the changelog)

@@ -2,14 +2,16 @@
All notable changes to this project will be documented in this file based on the [Keep a Changelog](http://keepachangelog.com/) Standard. This project adheres to [Semantic Versioning](http://semver.org/).


## [Unreleased](https://github.com/ruflin/Elastica/compare/6.0.0...master)
## [Unreleased](https://github.com/ruflin/Elastica/compare/6.0.1...master)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

@ruflin ruflin merged commit e505ade into ruflin:master Feb 23, 2018
@stof stof deleted the pipeline_support branch February 23, 2018 12:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants