Found an issue: if an application keeps repeatedly modifying parts of a large file, it eventually fails with EIO.

How to reproduce:

```
juicefs mount -d "mysql://juiceschema:Juicestest++--@(juice2db-m.db.sfcloud.local:3306)/juiceschema" /juicefs_test/
fio -direct=1 -ioengine=sync -runtime=3600 -group_reporting -bs=8k -filesize=1g -nrfiles=1 -thread -iodepth 16 -numjobs=1 -readwrite=randwrite -name=randwrite -directory=/juicefs_test
```

This is a single thread repeatedly rewriting part of one file. In our environment, the IO Error (EIO) shows up after about 10 minutes. The exact errors are:
```
2023/02/10 11:32:51.014410 juicefs[82643] <WARNING>: write inode:273634 error: input/output error [writer.go:208]
2023/02/10 11:32:51.014443 juicefs[82643] <ERROR>: write inode:273634 indx:0 input/output error [writer.go:212]
2023/02/10 11:32:51.058436 juicefs[82643] <ERROR>: error: Error 1406: Data too long for column 'slices' at row 1
// this error is raised by the stack below; the failing inode is 273634, same as above
goroutine 181792171 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/juicedata/juicefs/pkg/meta.errno({0x31e12c0, 0xc04c3287e0})
	/MyEBook/juicefs/pkg/meta/utils.go:104 +0xc5
github.com/juicedata/juicefs/pkg/meta.(*dbMeta).Write(0xc000acc0a0, {0xc09bfb1f78?, 0x3?}, 0x42ce2, 0x0, 0xb000, {0x1706873, 0x1000, 0x0, 0x1000})
	/MyEBook/juicefs/pkg/meta/sql.go:2037 +0x365
github.com/juicedata/juicefs/pkg/vfs.(*chunkWriter).commitThread(0xc0866521b0)
	/MyEBook/juicefs/pkg/vfs/writer.go:201 +0x1c4
created by github.com/juicedata/juicefs/pkg/vfs.(*fileWriter).writeChunk
	/MyEBook/juicefs/pkg/vfs/writer.go:270 +0x3d9
[utils.go:104]
2023/02/10 11:32:51.058499 juicefs[82643] <WARNING>: write inode:273634 error: input/output error [writer.go:208]
2023/02/10 11:32:51.058520 juicefs[82643] <ERROR>: write inode:273634 indx:0 input/output error [writer.go:212]
2023/02/10 11:32:51.106496 juicefs[82643] <ERROR>: error: Error 1406: Data too long for column 'slices' at row 1
goroutine 181792171 [running]:
runtime/debug.Stack()
	/usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/juicedata/juicefs/pkg/meta.errno({0x31e12c0, 0xc0646189f0})
	/MyEBook/juicefs/pkg/meta/utils.go:104 +0xc5
github.com/juicedata/juicefs/pkg/meta.(*dbMeta).Write(0xc000acc0a0, {0xc09bfb1f78?, 0x3?}, 0x42ce2, 0x0, 0xfe000, {0x1706879, 0x1000, 0x0, 0x1000})
	/MyEBook/juicefs/pkg/meta/sql.go:2037 +0x365
github.com/juicedata/juicefs/pkg/vfs.(*chunkWriter).commitThread(0xc0866521b0)
	/MyEBook/juicefs/pkg/vfs/writer.go:201 +0x1c4
created by github.com/juicedata/juicefs/pkg/vfs.(*fileWriter).writeChunk
	/MyEBook/juicefs/pkg/vfs/writer.go:270 +0x3d9
[utils.go:104]
2023/02/10 11:32:51.106556 juicefs[82643] <WARNING>: write inode:273634 error: input/output error [writer.go:208]
2023/02/10 11:32:51.106577 juicefs[82643] <ERROR>: write inode:273634 indx:0 input/output error [writer.go:212]
2023/02/10 11:32:51.150219 juicefs[82643] <ERROR>: error: Error 1406: Data too long for column 'slices' at row 1
```
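For context on the Error 1406 itself: every uncompacted write appends one slice record to the chunk row's `slices` column, so a steady stream of small overwrites to the same chunk grows that column without bound. A rough back-of-envelope sketch of the timeline (the 24-byte record size and 64 KiB column capacity are assumptions for illustration, not values taken from this issue):

```go
package main

import "fmt"

// Rough arithmetic for when the `slices` column overflows. Both
// constants are assumptions for illustration, not values from the issue.
func main() {
	const sliceBytes = 24       // assumed bytes per slice record
	const columnCap = 64 * 1024 // assumed capacity of the slices column
	maxSlices := columnCap / sliceBytes
	fmt.Println(maxSlices) // 2730 uncompacted writes before Error 1406
	// An 8 KiB randwrite workload hitting one chunk produces slices at
	// fio's IOPS rate, so minutes are enough if compaction never runs.
}
```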
Root cause analysis:

```go
func (m *dbMeta) compactChunk(inode Ino, indx uint32, force bool) {
	if !force {
		// avoid too many or duplicated compaction
		m.Lock()
		k := uint64(inode) + (uint64(indx) << 32)
		if len(m.compacting) > 10 || m.compacting[k] {
			// <-- the real problem is here: when more than 10 compact goroutines
			// are already running, some (ino, chunk_idx) pairs may never get to run
			m.Unlock()
			return
		}
	}
	...
```
Suggested improvement:

A first draft of the improved code:

```go
func (m *dbMeta) compactChunk(inode Ino, indx uint32, force bool) {
	cptKey := compactKey{inode, indx}
	if !force {
		m.Lock()
		running, cptKeyExisted := m.compactMap[cptKey]
		// avoid duplicated compaction
		if cptKeyExisted && running {
			m.Unlock()
			logger.Warnf("Quit compact for [%d, %d], already running", inode, indx)
			return
		}
		// avoid too many compactions
		if len(m.compactMap) > 3 {
			// cptKeyExisted : we need to run it till the end
			// !cptKeyExisted : leave it for another goroutine to start
			if !cptKeyExisted {
				m.addToCompactPending(cptKey) // roughly: m.compactPending = append(m.compactPending, cptKey)
				logger.Warnf("Add to pending queue for [%d, %d], len1: %d, len2: %d",
					inode, indx, len(m.compactMap), len(m.compactPending))
				m.Unlock()
				return
			}
		}
		m.compactMap[cptKey] = true
		m.Unlock()
		defer func() {
			m.Lock()
			//m.compactCnt--
			// compaction not done
			if _, ok := m.compactMap[cptKey]; ok {
				m.compactMap[cptKey] = false
			}
			m.Unlock()
		}()
	}
	logger.Warnf("Run compact for [%d, %d], compactMap len: %d", inode, indx, len(m.compactMap))
	var c = chunk{Inode: inode, Indx: indx}
	err := m.roTxn(func(s *xorm.Session) error {
		_, err := s.MustCols("indx").Get(&c)
		return err
	})
	if err != nil {
		logger.Warnf("return err1: %s", err.Error())
		return
	}
	ss := readSliceBuf(c.Slices)
	if ss == nil {
		logger.Errorf("Corrupt value for inode %d chunk indx %d", inode, indx)
		return
	}
	skipped := skipSome(ss)
	ss = ss[skipped:]
	pos, size, slices := compactChunk(ss)
	// compaction complete
	if len(ss) < 100 || size == 0 {
		m.Lock()
		delete(m.compactMap, cptKey)
		go m.runNextCompaction()
		m.Unlock()
		logger.Warnf("Deleted and end compacting for [%d, %d], compactMap: %v", inode, indx, m.compactMap)
		return
	}
	...
}
```
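Since the draft elides `addToCompactPending` and `runNextCompaction`, here is one possible shape for that hand-off (entirely hypothetical names and structure, just to show the intended FIFO behavior): rejected keys are queued instead of dropped, and finishing one compaction promotes the next queued key, so no chunk is starved indefinitely.

```go
package main

import "fmt"

// Hypothetical sketch of the pending-queue helpers the draft refers to.
// For brevity this sketch may queue a key twice; the draft above guards
// against that with the cptKeyExisted check.
type compactKey struct {
	inode uint64
	indx  uint32
}

type scheduler struct {
	compactMap     map[compactKey]bool
	compactPending []compactKey
	limit          int
}

// tryStart reports whether compaction for k may run now; otherwise the
// key is queued for a later finish call instead of being dropped.
func (s *scheduler) tryStart(k compactKey) bool {
	if s.compactMap[k] {
		return false // already running
	}
	if len(s.compactMap) > s.limit {
		s.compactPending = append(s.compactPending, k)
		return false
	}
	s.compactMap[k] = true
	return true
}

// finish marks k done and promotes the oldest pending key, if any.
func (s *scheduler) finish(k compactKey) (next compactKey, ok bool) {
	delete(s.compactMap, k)
	if len(s.compactPending) == 0 {
		return compactKey{}, false
	}
	next = s.compactPending[0]
	s.compactPending = s.compactPending[1:]
	s.compactMap[next] = true
	return next, true
}

func main() {
	s := &scheduler{compactMap: map[compactKey]bool{}, limit: 3}
	for i := uint64(1); i <= 5; i++ {
		s.tryStart(compactKey{inode: i})
	}
	fmt.Println(len(s.compactMap), len(s.compactPending)) // 4 1
	next, ok := s.finish(compactKey{inode: 1})
	fmt.Println(ok, next.inode) // true 5: the queued key is promoted
}
```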
A simpler approach is to trigger compaction more aggressively when there are many slices, so it is much less likely to be starved. @zp001paul could you try this PR? #4309
Retested today and the problem is fixed. That was a fast fix!
SandyXSD
Environment:

juicefs --version:
```
juicefs version 1.0.4+2023-04-06.f1c475d9
```
-- oss: swift
-- mysql 8.0
-- juicefs client: bare metal x86, oracle-linux 7.9

cat /etc/os-release:
```
NAME="Oracle Linux Server"
VERSION="7.9"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.9"
PRETTY_NAME="Oracle Linux Server 7.9"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:oracle:linux:7:9:server"
HOME_URL="https://linux.oracle.com/"
BUG_REPORT_URL="https://bugzilla.oracle.com/"
ORACLE_BUGZILLA_PRODUCT="Oracle Linux 7"
ORACLE_BUGZILLA_PRODUCT_VERSION=7.9
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=7.9
```

uname -a:
```
Linux zhangpudesktop 3.10.0-1160.el7.x86_64 #1 SMP Thu Oct 1 17:21:35 PDT 2020 x86_64 x86_64 x86_64 GNU/Linux
```