
Conversation

ouyangxiaochen commented Oct 24, 2018

What changes were proposed in this pull request?

The cleanup mechanism clears every eligible directory under SPARK_WORK_DIR. If another configured path points to the same location as SPARK_WORK_DIR, the directories belonging to that other setting will be deleted by mistake, for example when SPARK_LOCAL_DIRS and SPARK_WORK_DIR are set to the same path.

We should add another condition, requiring that the directory name starts with 'app-', when removing the expired app-* directories from SPARK_WORK_DIR.
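
For clarity, here is a minimal sketch of the intended check, assuming the cleanup walks the children of the work dir and deletes the stale ones; the names `workDir`, `runningAppIds`, and `isExpired` are illustrative placeholders, not the exact identifiers used in Worker.scala:

```scala
import java.io.File

// Minimal sketch (not the exact Worker.scala code): select stale application
// directories under the work dir. The proposed fix adds the startsWith("app-")
// guard so that only application directories are cleanup candidates.
def staleAppDirs(workDir: File, runningAppIds: Set[String], isExpired: File => Boolean): Seq[File] = {
  val children = Option(workDir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
  children.filter { dir =>
    dir.isDirectory &&
      dir.getName.startsWith("app-") &&       // proposed extra condition
      !runningAppIds.contains(dir.getName) && // do not touch running applications
      isExpired(dir)                          // older than the configured retention period
  }
}
```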

How was this patch tested?

manual test

ouyangxiaochen changed the title from "[SPARK-25818][CORE] WorkDirCleanup should only remove the directory at the beginning of t…" to "[SPARK-25818][CORE] WorkDirCleanup should only remove the directory at the beginning of the app" Oct 24, 2018
AmplabJenkins commented

Can one of the admins verify this patch?

jiangxb1987 (Contributor) commented Oct 24, 2018

IIUC, SPARK_WORK_DIR is not expected to be shared with any other usage. Does this happen in a real environment, and is it hard to work around the limitation?

ouyangxiaochen (Author) commented Oct 25, 2018

@jiangxb1987 Yes, it happened in our real environment.
The scenario is as follows:
Disk corruption happens in the production cluster, which is normal.
SPARK_LOCAL_DIRS = /data1/bigdata/spark/tmp. Disk data1 broke, so the maintenance engineer changed data1 to data2 (or another disk). Unfortunately, SPARK_WORK_DIR = /data2/bigdata/spark/tmp. When we then start a Thriftserver process, some temporary folders are created under the new path, which is now the same as SPARK_WORK_DIR. When the cleanup cycle is reached, the folders created by the Thriftserver are removed by WorkDirCleanup, so Beeline and JDBC queries fail.
There is also a very extreme situation where the user configures an operating system directory, which would cause a lot of trouble. So I think adding this condition could reduce some unnecessary risk.

srowen (Member) commented Oct 27, 2018

I don't think that's a reasonable usage scenario. That said, is there any harm in this change? Would it ever miss cleaning up something that it should?

dongjoon-hyun (Member) commented

@ouyangxiaochen Sorry, but the use case sounds like a misconfiguration.

ouyangxiaochen (Author) commented

As far as I know, when a Spark program is submitted to the cluster, a directory is created under SPARK_WORK_DIR. The directory name consists of an application prefix, a timestamp, and a five-digit serial number. WorkDirCleanup should only delete expired application directories. @srowen Could you tell me whether any other types of directories or files are created under SPARK_WORK_DIR, in addition to the application directories? Thanks!
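
For illustration only, a stricter name check could look like the sketch below; the pattern is an assumption based on the naming convention described above (app-<timestamp>-<serial>), not copied from the Spark source:

```scala
// Hypothetical sketch: match names of the form "app-<timestamp>-<serial>",
// e.g. "app-20181024120000-00001". The exact digit widths are assumptions.
val appDirPattern = """app-\d+-\d+""".r

def looksLikeAppDir(name: String): Boolean =
  appDirPattern.pattern.matcher(name).matches()
```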

ouyangxiaochen (Author) commented

@dongjoon-hyun Yes, you could put it that way. I want to solve this problem on the Spark side to reduce the risk from mistakes by operations and maintenance engineers. WorkDirCleanup should only be responsible for cleaning up the directories that it generates itself.

srowen (Member) commented Oct 29, 2018

I don't know what else goes in the work dir. It isn't valid to reuse it for anything else. Can you simply avoid using a work dir that is or has been used by something else?

The argument for making this change anyway is that the code should delete only what it writes. But I am not sure it can be something you rely on for correct behavior.

ouyangxiaochen (Author) commented

@srowen Your suggestion is very good, but sometimes the maintenance engineers have limited skills in this area. If they configure the operating system root directory as SPARK_WORK_DIR after disk damage, it will cause a catastrophic accident. So I think it is necessary to add this condition to avoid such production accidents.

srowen (Member) commented Nov 1, 2018

I just think that if you have engineers randomly writing and reading stuff in this dir, a bunch of other stuff goes wrong. This is not a problem that Spark can reasonably solve. Certainly, you have much bigger production problems if this level of discipline can't be enforced.

srowen mentioned this pull request Nov 10, 2018
asfgit closed this in a3ba3a8 Nov 11, 2018
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#21766
Closes apache#21679
Closes apache#21161
Closes apache#20846
Closes apache#19434
Closes apache#18080
Closes apache#17648
Closes apache#17169

Add:
Closes apache#22813
Closes apache#21994
Closes apache#22005
Closes apache#22463

Add:
Closes apache#15899

Add:
Closes apache#22539
Closes apache#21868
Closes apache#21514
Closes apache#21402
Closes apache#21322
Closes apache#21257
Closes apache#20163
Closes apache#19691
Closes apache#18697
Closes apache#18636
Closes apache#17176

Closes apache#23001 from wangyum/CloseStalePRs.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
