The HDFS file upload utility class may have the possibility of data loss. #97

Open
Minnull opened this issue Sep 14, 2022 · 5 comments
Labels
affects/master PR/issue: this bug affects master version. severity/trivial Severity of bug type/bug Type: something is unexpected


Minnull commented Sep 14, 2022

After analyzing the code path that uploads files to HDFS, I believe there is a risk of data loss.

Code location: https://github.com/vesoft-inc/nebula-exchange/blob/master/exchange-common/src/main/scala/com/vesoft/exchange/common/utils/HDFSUtils.scala
Method:

    def upload(localPath: String, remotePath: String, namenode: String = null): Unit = {
      try {
        val localFile = new File(localPath)
        if (!localFile.exists() || localFile.length() <= 0) {
          return
        }
      } catch {
        case e: Throwable =>
          LOG.warn("check for empty local file error, but you can ignore this check error. " +
                     "If there is empty sst file in your hdfs, please delete it manually",
                   e)
      }
      val system = getFileSystem(namenode)
      system.copyFromLocalFile(new Path(localPath), new Path(remotePath))
    }

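(1) The file existence check. Judging from the call sites, a file is only uploaded after it has been generated. If the file no longer exists at upload time, no exception is thrown and nothing is logged. The code effectively swallows the error, which looks like a data loss risk.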

    if (!localFile.exists() || localFile.length() <= 0) {
      return
    }

(2) The catch block seems to have the same problem: the exception is only logged as a warning and is never propagated.

    } catch {
      case e: Throwable =>
        LOG.warn("check for empty local file error, but you can ignore this check error. " +
                   "If there is empty sst file in your hdfs, please delete it manually",
                 e)
    }

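(3) In testing, I found that when the local file is deleted by another task running concurrently, a file of size 0 ends up uploaded to HDFS, which affects ingest.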
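One possible guard against the zero-size upload in (3) is to record the local file length before copying and compare it with the remote length afterwards. The sketch below is only an illustration and is not code from nebula-exchange: `UploadCheck`, `copyAndVerify`, and its three function parameters are hypothetical stand-ins. With Hadoop they could presumably be backed by `FileSystem.copyFromLocalFile`, `FileSystem.getFileStatus(path).getLen`, and `FileSystem.delete(path, false)`.

```scala
import java.io.IOException

object UploadCheck {

  /**
   * Runs the copy, then checks that the remote file has the length that was
   * recorded for the local file before the copy started.
   *
   * All three function parameters are hypothetical stand-ins for HDFS calls.
   * On a mismatch the remote file is removed and an IOException is thrown,
   * so the caller (for example a Spark task) fails and can be retried.
   */
  def copyAndVerify(expectedLength: Long,
                    copy: () => Unit,
                    remoteLength: () => Long,
                    deleteRemote: () => Unit): Unit = {
    copy()
    val actual = remoteLength()
    if (actual != expectedLength) {
      // Do not leave a truncated or empty file behind for ingest to pick up.
      deleteRemote()
      throw new IOException(
        s"uploaded $actual bytes but expected $expectedLength; remote file removed")
    }
  }
}
```

Failing the task here trades a retry for the guarantee that an empty SST file never stays in the target directory.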

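Proposed solution

In my opinion, these errors should all be thrown rather than swallowed. When the executor receives the exception it will kill the container and retry the task, which preserves data integrity. Would that be acceptable?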
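The proposal above could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's actual fix: `StrictUpload` and `requireNonEmptyFile` are hypothetical names, and the `copy` parameter stands in for the real `system.copyFromLocalFile(new Path(localPath), new Path(remotePath))` call so that the example runs without Hadoop on the classpath.

```scala
import java.io.{File, FileNotFoundException, IOException}

object StrictUpload {

  /** Fails loudly when the local file is missing or empty, instead of returning silently. */
  def requireNonEmptyFile(localPath: String): File = {
    val localFile = new File(localPath)
    if (!localFile.exists()) {
      throw new FileNotFoundException(s"local file $localPath does not exist")
    }
    if (localFile.length() <= 0) {
      throw new IOException(s"local file $localPath is empty")
    }
    localFile
  }

  /**
   * Validates first, then copies. Any exception from the check or from the
   * copy propagates to the caller, so the task fails and can be retried.
   * `copy` is a hypothetical stand-in for the HDFS copy call.
   */
  def upload(localPath: String, remotePath: String)(copy: (String, String) => Unit): Unit = {
    requireNonEmptyFile(localPath)
    copy(localPath, remotePath)
  }
}
```

Note that this changes behavior for callers that legitimately produce empty files; whether an empty file should be an error or a logged skip is a decision for the maintainers.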

Looking forward to a reply.

wey-gu (Contributor) commented Sep 15, 2022

Thanks for the analysis, @Minnull. Would you be interested in submitting a PR to fix it? :)

Minnull (Author) commented Sep 15, 2022

Thanks for the reply. I will do my best to help fix this issue.

@Sophie-Xie Sophie-Xie added the type/bug Type: something is unexpected label Nov 29, 2022
wey-gu (Contributor) commented Nov 29, 2022

@Minnull, do you have the bandwidth to open a PR to fix this? :)

Minnull (Author) commented Nov 29, 2022

> @Minnull, do you have the bandwidth to open a PR to fix this? :)

Sorry, I have been quite busy lately, but I am still following this.

wey-gu (Contributor) commented Nov 29, 2022

No rush at all, take your time :)

@HarrisChu HarrisChu added affects/none PR/issue: this bug affects none version. severity/none Severity of bug severity/trivial Severity of bug affects/master PR/issue: this bug affects master version. labels Dec 1, 2022
@github-actions github-actions bot removed severity/none Severity of bug affects/none PR/issue: this bug affects none version. labels Dec 7, 2022
@QingZ11 QingZ11 changed the title hdfs文件上传工具类,是否可能存在丢数据的问题 The HDFS file upload utility class may have the possibility of data loss. Sep 18, 2023
4 participants