Upon restart, Fscrawler deletes and reindexes even though no new files are added. #1941
Comments
Could you run this again with one single document? With the trace mode on. |
I am pretty sure I originally started it with debug. Would I start again with --env LOG_LEVEL=trace? And do I just remove the files in the watch folder and put in a single one? Or I guess I could just change the folder in the yml file? |
Yeah. Look at https://fscrawler.readthedocs.io/en/latest/admin/logger.html for details. Alternatively, And yes, using a new dir would help. But you need to run with |
OK, I am using Docker, so I don't know whether --trace/--restart apply here. How do I restart the Docker container with different logging options? I think I have to recreate the container, no? |
I guess something like this:
|
serveracct@planck:/mnt/cloud/cases$ docker: Error response from daemon: Conflict. The container name "/fscrawler" is already in use by container "e3ed43fb9317fa65374564b70e5a1c79bfd5cbbae63de59b19d02ddcd6b0fe8b". You have to remove (or rename) that container to be able to reuse that name. |
I guess I could just start it again without naming it, and we'd just have another container? |
Maybe. I'm not that good with Docker 😅 |
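For the name conflict above, one option is to remove the old container and recreate it under the same name, rather than starting an unnamed second one. The sketch below is only a dry run: it prints the commands unless `RUN=1` is set. The image name `dadoonet/fscrawler`, the job name `job_name`, and the omitted volume mounts are assumptions, and `--env LOG_LEVEL=trace` is simply the option mentioned earlier in this thread, not verified here. Note that `--restart` is placed after the image name so that it is passed to FSCrawler; before the image name, Docker would read it as its own restart-policy flag.

```shell
#!/bin/sh
# Dry-run helper: prints each command instead of executing it.
# Set RUN=1 in the environment to actually run the commands.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

CONTAINER_NAME="fscrawler"     # name taken from the error message above
IMAGE="dadoonet/fscrawler"     # assumption: image used to create the container
JOB_NAME="job_name"            # hypothetical job name, replace with your own

# 1. Remove the existing container so its name can be reused.
#    This deletes the container only, not data in mounted volumes.
run docker rm -f "$CONTAINER_NAME"

# 2. Recreate it with trace logging. Volume mounts (-v ...) are omitted here
#    and must match the ones used for the original container.
#    Arguments after the image name go to FSCrawler, not to Docker.
run docker run -d --name "$CONTAINER_NAME" --env LOG_LEVEL=trace \
  "$IMAGE" fscrawler "$JOB_NAME" --restart
```

Running the script as-is only echoes the two commands, so they can be checked and adjusted before anything is removed.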
OK, if I change the folder and restart fscrawler, will it then delete my old documents from Elasticsearch? |
No. |
@ScottCov have you been able to reproduce? |
Describe the bug
I run fscrawler continuously. What I find is that if I stop it and then restart, it proceeds to delete and reindex the documents that are already indexed. Specifically, the number of indexed documents doesn't change, but it appears to be deleting them and then adding them again even though there are no new ones. To be clear, I just stopped the Docker container/Elasticsearch and restarted it.
Job Settings
Logs
Expected behavior
I wouldn't expect any need to reindex, as no new documents were added to the folder.
Versions:
FSCrawler 2.10-SNAPSHOT (Docker)
Attachment
If the bug is related to a given file, please share this file so we can use it to reproduce the problem and possibly in our integration tests.