Initial import does not work #38
Comments
Hi, That should not be an issue.
Thanks, On Tue, Nov 6, 2012 at 11:27 AM, egueidan [email protected] wrote:
Sure, the river setting is trivial:
The mongo log only shows connections opening (no errors) e.g.:
The ES log shows for each river (I create multiple ones) the following lines:
The 'mydb' index remains empty. If I update an element in mongo, it is correctly picked up. The interesting part is that if I wipe out ES and restart the river, the updated element shows up at restart (but only that updated element). I guess this is because the update operation is still in the oplog. It is important to note that the replSet was activated on mongo after the data was inserted (I just activated it to be able to use the river from now on). ...
if (time == null) {
logger.info("No known previous slurping time for this collection");
return null;
}
... With this modification (which I understand is not satisfactory for the purpose of filtered rivers), all my data is picked up. Thanks,
Hi, It is a requirement to have the replica set set up before starting to import data. None of the data imported before the replica set was created will be indexed in ES. Thanks, On Tue, Nov 6, 2012 at 12:26 PM, egueidan [email protected] wrote:
Ok, but that used to work (tested with 1.4.0)... Also, I might be wrong, but since the oplog is a capped collection, it won't contain all the operations that ever happened in mongo. This means that the first time you use the river, you have to query the slurped collection directly for all elements (which is what processFullCollection does, if I understand correctly). And that does not rely on the oplog.
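To illustrate the point about the capped oplog, here is a minimal sketch of a fixed-capacity log. The class `CappedOplogSketch` and its methods are hypothetical and are not part of the river or of MongoDB; a real oplog is capped by size in bytes rather than by entry count, so this only models the overwrite behaviour.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of a capped log; not MongoDB's or the river's code.
class CappedOplogSketch {
    private final int capacity;
    private final Deque<String> entries = new ArrayDeque<>();

    CappedOplogSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    // Append an operation; once the log is full, the oldest entry is
    // discarded, mimicking how a capped collection overwrites old documents.
    void record(String docId) {
        if (entries.size() == capacity) {
            entries.removeFirst();
        }
        entries.addLast(docId);
    }

    // What a consumer sees if it replays the log from its current start.
    List<String> replay() {
        return new ArrayList<>(entries);
    }
}
```

With a capacity of 3 and five recorded inserts, replaying the log yields only the last three documents, which is why an initial import that relies on the oplog alone can miss older data.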
Hi, Just tested:
So even if the new filter was there, the river would not find a document that was created before the replica set was initiated. The new filter implementation also makes sure that only data related to the monitored collection is returned (as opposed to before, where everything was returned to the river). Thanks,
I have exactly the same problem: processFullCollection is not called the first time the river is loaded from a new _river/mongodb/_meta. I built elasticsearch and the plugins from master, cloned locally. processFullCollection is only triggered when the oplogCursor method returns null, but that is never the case for an existing collection. Is there an external way to trigger a full collection process the first time elasticsearch is run? I also agree with egueidan: the oplog won't give you all the records that need to be indexed. It seems that a first run should upload ALL mongodb docs to the ES indexer. Did I miss anything? How can the search index be loaded from the mongodb river on initial setup? I need to implement search on top of an existing mongodb configuration already containing 100,000 documents. How can I index these documents? Please advise.
Hi, If a collection was created before the replica set, then the river will not be able to index its documents. The recommendation is:
[1] - http://docs.mongodb.org/manual/reference/mongodump/ Thanks,
Hi,
It looks like the river initial import does not work. My setup is the following:
ES 0.19.11
MongoDB 2.2.1
River 1.5.0
After a quick look at the code, a wild guess would be that this appeared when #31 was fixed.
In fact, the method Slurper#getIndexFilter does not return null anymore when there is no input timestamp. This means that the first slurper loop won't execute processFullCollection().
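The control flow described above can be sketched roughly as follows. This is a simplified model, not the river's actual source: the class `SlurperSketch`, the `steps` list, and the map-based filter are assumptions made for illustration, and only the names `getIndexFilter` and `processFullCollection` come from the plugin.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the slurper's first loop.
class SlurperSketch {
    // Records which steps ran so the control flow can be inspected.
    final List<String> steps = new ArrayList<>();

    // Stand-in for Slurper#getIndexFilter as it behaved before the change:
    // returns null when there is no known previous slurping time.
    Map<String, Object> getIndexFilter(Long lastTimestamp) {
        if (lastTimestamp == null) {
            return null;
        }
        Map<String, Object> filter = new HashMap<>();
        filter.put("ts", lastTimestamp); // placeholder for an oplog timestamp query
        return filter;
    }

    // A null filter is the signal to import the whole collection
    // before tailing the oplog.
    void firstLoop(Long lastTimestamp) {
        Map<String, Object> filter = getIndexFilter(lastTimestamp);
        if (filter == null) {
            steps.add("processFullCollection");
        }
        steps.add("tailOplog");
    }
}
```

If `getIndexFilter` always returns a non-null filter, the `processFullCollection` branch is never taken, which matches the behaviour reported here.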
Let me know, if you need more information.
Cheers,
Emmanuel