KAFKA-6647 KafkaStreams.cleanUp creates .lock file in directory it tries to clean #4702
Whilst this may be a nice-to-have, I remain unconvinced that this will fix the issue.

public synchronized void clean() {
    try {
        cleanRemovedTasks(0, true);
    } catch (final Exception e) {
        // this is already logged within cleanRemovedTasks
        throw new StreamsException(e);
    }
    try {
        Utils.delete(globalStateDir().getAbsoluteFile());
    } catch (final IOException e) {
        log.error("{} Failed to delete global state directory due to an unexpected exception", logPrefix(), e);
        throw new StreamsException(e);
    }
}
@@ -138,7 +138,7 @@ synchronized boolean lock(final TaskId taskId) throws IOException {
         }

         try {
-            lockFile = new File(directoryForTask(taskId), LOCK_FILE_NAME);
+            lockFile = new File(stateDir, taskId + LOCK_FILE_NAME);
Why remove the directoryForTask method?
The location of the lock file is lifted one level up.
After the discussion on #4713, I think this idea should actually work. Nit: can we rename the lock to LOCK_FILE_NAME + " -" + taskId, though?
I can rename the lock if people think we can continue with this solution.
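To make the two lock-file locations discussed in this thread concrete, here is a minimal sketch. The class and method names are hypothetical, and the value ".lock" for LOCK_FILE_NAME is an assumption taken from the PR title; only the two `new File(...)` expressions mirror the diff above.

```java
import java.io.File;

// Hypothetical illustration only; this is not the actual StateDirectory code.
final class LockFileLocationSketch {
    // Assumed value, based on the ".lock" file named in the PR title.
    static final String LOCK_FILE_NAME = ".lock";

    // Before the change: the lock file lives inside the task directory.
    static File lockInsideTaskDir(final File stateDir, final String taskId) {
        // Stand-in for directoryForTask(taskId).
        final File taskDir = new File(stateDir, taskId);
        return new File(taskDir, LOCK_FILE_NAME);
    }

    // After the change: the lock file sits one level up, next to the task directory.
    static File lockBesideTaskDir(final File stateDir, final String taskId) {
        return new File(stateDir, taskId + LOCK_FILE_NAME);
    }

    public static void main(final String[] args) {
        final File stateDir = new File("state"); // placeholder path
        System.out.println(lockInsideTaskDir(stateDir, "0_1"));
        System.out.println(lockBesideTaskDir(stateDir, "0_1"));
    }
}
```

With the second layout, deleting the task directory no longer has to deal with a lock file held open inside it, which appears to be the motivation for lifting the lock one level up.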
Left one minor comment. Also, do we need to add a new test?
I left a comment on https://issues.apache.org/jira/browse/KAFKA-6647 ticket itself.
@@ -88,27 +89,6 @@ public void shouldCreateTaskStateDirectory() {
         assertTrue(taskDirectory.isDirectory());
     }

-    @Test
-    public void shouldLockTaskStateDirectory() throws IOException {
Why did you remove this test?
directory.lock(taskId) is at a different level than channel.tryLock(), rendering the test useless.
Guozhang has a different opinion on the approach.
Any update on this issue?
@gartho We are still pondering what the best fix for this issue is across different file systems. What's your use case, and is this a blocker for your app?
I'm coming across this issue during integration-level testing. I've decided to resolve it for now by clearing out the state store folder before I run my tests, so it's not blocking me at present.
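The workaround described above (clearing the state store folder before each test run) could look roughly like the following sketch. The class and method names are hypothetical, and the path passed in is whatever directory the test configures as its state directory; nothing here is Kafka API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical test helper; not part of Kafka Streams.
final class StateDirCleanupSketch {
    // Deletes the directory tree rooted at 'root', children before parents.
    static void deleteRecursively(final Path root) throws IOException {
        if (!Files.exists(root)) {
            return; // nothing to clean up
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order puts deeper paths first, so directories are empty when deleted.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (final IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

A test would call `deleteRecursively` on its state directory in a setup method, before the Streams instance is created.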
Any updates?
@gartho @philippkamps Sorry for being late on this PR. @tedyu I'd suggest following option 2) that I proposed in the JIRA ticket, i.e.
WDYT?
+1
@tedyu What is the status of this PR?
From @guozhangwang's comment on the JIRA, I thought he acknowledged that lifting the lock up one directory would solve the problem for the Windows environment.
I just lost the overview :) The last comment from @guozhangwang (#4702 (comment)) seems to indicate that he wants some updates to the PR. Or is this already contained and the PR is ready for reviewing? Maybe you can rebase the PR to resolve the merge conflict first? Thanks a lot!
Can we discuss how to achieve compatibility with the existing locking structure first? Thanks
Sure. However, I am not exactly sure what you mean. Can you elaborate?
I meant: after the move of the lock file, is there any compatibility issue we need to consider?
I didn't find the local repo for the original PR. I need to open a new one if the approach in the current PR is confirmed.
@tedyu Sorry for the late reply.
I don't think so. If an app is shut down, it should free all locks. On startup, it can just use the new locking strategy. Or do I miss anything? It would only be an issue if multiple threads (or different versions) operated in parallel -- but this case should be be possible on a single machine and thus we should be fine?
"but this case should be be possible on a single machine": I guess you meant 'should not be'.
Created #5650
@tedyu @mjsax My concern about compatibility is this: suppose we have multiple instances of Streams running on the same machine, say A and B, and A shuts down and upgrades to a new version. A will then try to create the lock file in a different directory (one layer up) while B is still holding locks in the old directory. Is it possible that both instances think they now own the task's directory and move on to modify the state (e.g. one still owning the task while the other moves in and cleans up the state)?
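The concern above can be illustrated with a small sketch: locks taken on two different files do not exclude each other, so an instance using the old location and one using the new location would both succeed. This is a simplification under stated assumptions: a single JVM stands in for instances A and B, the ".lock" file name is assumed from the PR title, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration; not the actual StateDirectory code.
final class MixedLockLocationSketch {
    // Returns true if a lock at the old location and a lock at the new
    // location can be held at the same time for the same task.
    static boolean bothLocksAcquired(final Path stateDir, final String taskId) throws IOException {
        final Path taskDir = Files.createDirectories(stateDir.resolve(taskId));
        final Path oldLockFile = taskDir.resolve(".lock");           // old layout (instance B)
        final Path newLockFile = stateDir.resolve(taskId + ".lock"); // new layout (instance A)
        try (FileChannel oldChannel = FileChannel.open(oldLockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel newChannel = FileChannel.open(newLockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            final FileLock oldLock = oldChannel.tryLock();
            final FileLock newLock = newChannel.tryLock();
            // Closing the channels releases any locks that were acquired.
            return oldLock != null && newLock != null;
        }
    }

    public static void main(final String[] args) throws IOException {
        final Path stateDir = Files.createTempDirectory("streams-state");
        System.out.println(bothLocksAcquired(stateDir, "0_1"));
    }
}
```

If both locks are granted, neither side learns that the other believes it owns the task, which is the mixed-version scenario the comment asks about. Real behavior across separate processes and file systems would still need dedicated testing.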
If you run two |
I just started up a small example app that merely created the In one case I used the same app-id, and both instances started up fine. It makes sense as the different instances will end up writing to different task-id directories. In the other case, I changed the app-id of the second instance and a new directory So I think we need more testing to cover the scenario @guozhangwang outlined above. But it wouldn't hurt to follow @mjsax suggestion to configure with different |
I think for a K8s deployment, the state directory should be customized for each streaming app.
@tedyu What is the status of this PR?
See the above comment from Guozhang.
Can we move the discussion to the new PR? Thanks.
Specify StandardOpenOption#DELETE_ON_CLOSE when creating the FileChannel.
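As a rough sketch of what the description above refers to, the following opens a lock file with `StandardOpenOption.DELETE_ON_CLOSE` and takes a lock on it. The class and method names are hypothetical and this is not the PR's actual change. The JDK documents the deletion as a best-effort attempt, so the sketch checks the result rather than assuming it.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of DELETE_ON_CLOSE; not the actual Kafka code.
final class DeleteOnCloseLockSketch {
    // Opens the lock file, locks it, then closes the channel.
    // Returns true if the lock was acquired and the file is gone afterwards.
    static boolean lockThenClose(final Path lockFile) throws IOException {
        try (FileChannel channel = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE,
                 StandardOpenOption.WRITE,
                 StandardOpenOption.DELETE_ON_CLOSE)) {
            final FileLock lock = channel.tryLock();
            if (lock == null) {
                return false; // another process holds the lock
            }
            // ... work that requires the lock would go here ...
        }
        // The channel is closed here, which also releases the lock.
        return Files.notExists(lockFile);
    }

    public static void main(final String[] args) throws IOException {
        final Path dir = Files.createTempDirectory("lock-sketch");
        System.out.println(lockThenClose(dir.resolve(".lock")));
    }
}
```

The point of the option is that no stale `.lock` file should be left behind in the directory that `cleanUp` later tries to delete.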