If an application keeps repeatedly modifying parts of a large file, it eventually results in EIO #4308

Closed
zp001paul opened this issue Jan 4, 2024 · 2 comments · Fixed by #4309
Labels: kind/bug (Something isn't working), missed (missed bug)

@zp001paul

Found a problem:
If an application keeps repeatedly modifying parts of a large file, it eventually results in EIO.

How to reproduce:
juicefs mount -d "mysql://juiceschema:Juicestest++--@(juice2db-m.db.sfcloud.local:3306)/juiceschema" /juicefs_test/
fio -direct=1 -ioengine=sync -runtime=3600 -group_reporting -bs=8k -filesize=1g -nrfiles=1 -thread -iodepth 16 -numjobs=1 -readwrite=randwrite -name=randwrite -directory=/juicefs_test
That is, a single thread repeatedly rewrites parts of one file. In our environment the IO error (EIO) shows up after roughly 10 minutes of running.
The exact errors:

2023/02/10 11:32:51.014410 juicefs[82643] <WARNING>: write inode:273634 error: input/output error [writer.go:208]
2023/02/10 11:32:51.014443 juicefs[82643] <ERROR>: write inode:273634 indx:0  input/output error [writer.go:212]
2023/02/10 11:32:51.058436 juicefs[82643] <ERROR>: error: Error 1406: Data too long for column 'slices' at row 1  // this error is reported by the stack below; the failing inode is 273634, the same as in the lines above
goroutine 181792171 [running]:
runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/juicedata/juicefs/pkg/meta.errno({0x31e12c0, 0xc04c3287e0})
        /MyEBook/juicefs/pkg/meta/utils.go:104 +0xc5
github.com/juicedata/juicefs/pkg/meta.(*dbMeta).Write(0xc000acc0a0, {0xc09bfb1f78?, 0x3?}, 0x42ce2, 0x0, 0xb000, {0x1706873, 0x1000, 0x0, 0x1000})
        /MyEBook/juicefs/pkg/meta/sql.go:2037 +0x365
github.com/juicedata/juicefs/pkg/vfs.(*chunkWriter).commitThread(0xc0866521b0)
        /MyEBook/juicefs/pkg/vfs/writer.go:201 +0x1c4
created by github.com/juicedata/juicefs/pkg/vfs.(*fileWriter).writeChunk
        /MyEBook/juicefs/pkg/vfs/writer.go:270 +0x3d9 [utils.go:104]
2023/02/10 11:32:51.058499 juicefs[82643] <WARNING>: write inode:273634 error: input/output error [writer.go:208]
2023/02/10 11:32:51.058520 juicefs[82643] <ERROR>: write inode:273634 indx:0  input/output error [writer.go:212]
2023/02/10 11:32:51.106496 juicefs[82643] <ERROR>: error: Error 1406: Data too long for column 'slices' at row 1
goroutine 181792171 [running]:
runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/juicedata/juicefs/pkg/meta.errno({0x31e12c0, 0xc0646189f0})
        /MyEBook/juicefs/pkg/meta/utils.go:104 +0xc5
github.com/juicedata/juicefs/pkg/meta.(*dbMeta).Write(0xc000acc0a0, {0xc09bfb1f78?, 0x3?}, 0x42ce2, 0x0, 0xfe000, {0x1706879, 0x1000, 0x0, 0x1000})
        /MyEBook/juicefs/pkg/meta/sql.go:2037 +0x365
github.com/juicedata/juicefs/pkg/vfs.(*chunkWriter).commitThread(0xc0866521b0)
        /MyEBook/juicefs/pkg/vfs/writer.go:201 +0x1c4
created by github.com/juicedata/juicefs/pkg/vfs.(*fileWriter).writeChunk
        /MyEBook/juicefs/pkg/vfs/writer.go:270 +0x3d9 [utils.go:104]
2023/02/10 11:32:51.106556 juicefs[82643] <WARNING>: write inode:273634 error: input/output error [writer.go:208]
2023/02/10 11:32:51.106577 juicefs[82643] <ERROR>: write inode:273634 indx:0  input/output error [writer.go:212]
2023/02/10 11:32:51.150219 juicefs[82643] <ERROR>: error: Error 1406: Data too long for column 'slices' at row 1
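
The "Data too long for column 'slices'" error suggests the serialized slice list of a single chunk outgrew its metadata column. A back-of-the-envelope check (a hedged sketch: the 24-byte slice record and the 64 KiB BLOB limit are assumptions based on reading the JuiceFS SQL schema, not stated in this issue):

package main

import "fmt"

func main() {
	const sliceRecordBytes = 24  // assumed: pos(4) + id(8) + size(4) + off(4) + len(4)
	const blobLimitBytes = 65535 // assumed: MySQL BLOB limit for the `slices` column
	// Every uncompacted write appends one slice record, so a single hot
	// chunk overflows after roughly this many writes:
	fmt.Println(blobLimitBytes / sliceRecordBytes) // ~2730
}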

Root cause analysis:

  1. The root cause is not that chunk compaction falls behind the application's write speed.
  2. The root cause is: when multiple chunks need compaction, some chunks never get compacted at all, so length(slices) eventually overflows.
    The problematic code:
func (m *dbMeta) compactChunk(inode Ino, indx uint32, force bool) {
	if !force {
		// avoid too many or duplicated compaction
		m.Lock()
		k := uint64(inode) + (uint64(indx) << 32)
		if len(m.compacting) > 10 || m.compacting[k] { // <----- the real problem: once more than 10 compact goroutines are running, further (ino, chunk_idx) pairs are silently dropped and never compacted
			m.Unlock()
			return
		}
		// ...
	}
	// ...
}
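
To see the starvation concretely, here is a minimal standalone sketch (hypothetical, simplified key handling) of the guard's behavior: once the map holds more than 10 entries, every further (inode, index) request is silently dropped rather than queued:

package main

import "fmt"

// compacting mirrors the guard in dbMeta.compactChunk (simplified).
var compacting = map[uint64]bool{}

func tryCompact(inode, indx uint64) bool {
	k := inode + (indx << 32)
	if len(compacting) > 10 || compacting[k] {
		return false // silently dropped: nothing remembers this key
	}
	compacting[k] = true
	return true
}

func main() {
	dropped := 0
	for indx := uint64(0); indx < 64; indx++ { // 64 dirty chunks of one inode
		if !tryCompact(1, indx) {
			dropped++
		}
	}
	// 11 slots get taken and, while they stay busy, the remaining 53
	// chunks keep accumulating slices until the column overflows.
	fmt.Println("dropped:", dropped) // dropped: 53
}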

Suggested improvements:

  1. When len(m.compacting) > 10, don't drop the (ino, c_indx) pair; store it in a slice instead.
  2. When a compaction completes (i.e. len(ss) < 2), take one (ino, c_indx) from that slice and keep going.
  3. Relax the completion criterion: change len(ss) < 2 to len(ss) < 100.

Preliminary patch:

func (m *dbMeta) compactChunk(inode Ino, indx uint32, force bool) {
	cptKey := compactKey{inode, indx}
	if !force {
		m.Lock()
		running, cptKeyExisted := m.compactMap[cptKey]
		// avoid duplicated compaction
		if cptKeyExisted && running {
			m.Unlock()
			logger.Warnf("Quit compact for [%d, %d], already running", inode, indx)
			return
		}
		// avoid too many compaction
		if len(m.compactMap) > 3 {
			// cptKeyExisted  : we need to run it till the end
			// !cptKeyExisted : leave it for another goroutine to start later
			if !cptKeyExisted {
				m.addToCompactPending(cptKey) // roughly: m.compactPending = append(m.compactPending, cptKey)
				logger.Warnf("Add to pending queue for [%d, %d], len1: %d, len2: %d",
					inode, indx, len(m.compactMap), len(m.compactPending))
				m.Unlock()
				return
			}
		}

		m.compactMap[cptKey] = true
		m.Unlock()

		defer func() {
			m.Lock()
			// compaction not finished yet; mark it as not running so it can resume later
			if _, ok := m.compactMap[cptKey]; ok {
				m.compactMap[cptKey] = false
			}
			m.Unlock()
		}()
	}

	logger.Warnf("Run compact for [%d, %d],compactMap len: %d", inode, indx, len(m.compactMap))

	var c = chunk{Inode: inode, Indx: indx}
	err := m.roTxn(func(s *xorm.Session) error {
		_, err := s.MustCols("indx").Get(&c)
		return err
	})
	if err != nil {
		logger.Warnf("return err1: %s", err.Error())
		return
	}

	ss := readSliceBuf(c.Slices)
	if ss == nil {
		logger.Errorf("Corrupt value for inode %d chunk indx %d", inode, indx)
		return
	}
	skipped := skipSome(ss)
	ss = ss[skipped:]
	pos, size, slices := compactChunk(ss)

	// compaction complete
	if len(ss) < 100 || size == 0 {
		m.Lock()
		delete(m.compactMap, cptKey)
		go m.runNextCompaction()
		m.Unlock()
		logger.Warnf("Deleted and End compacting for [%d, %d], compactMap: %v", inode, indx, m.compactMap)
		return
	}
...
}
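
The patch above calls two helpers that the snippet does not show. A minimal sketch of what they might look like (hypothetical bodies and field names, inferred only from the call sites above, not taken from the actual fix):

// addToCompactPending queues a (inode, indx) pair that could not start
// because too many compactions were already running. Caller holds m.Lock().
func (m *dbMeta) addToCompactPending(k compactKey) {
	m.compactPending = append(m.compactPending, k)
}

// runNextCompaction pops one pending pair and starts compacting it.
func (m *dbMeta) runNextCompaction() {
	m.Lock()
	if len(m.compactPending) == 0 {
		m.Unlock()
		return
	}
	k := m.compactPending[0]
	m.compactPending = m.compactPending[1:]
	m.Unlock()
	m.compactChunk(k.inode, k.indx, false)
}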

Environment:

  • JuiceFS version (use juicefs --version) or Hadoop Java SDK version: juicefs version 1.0.4+2023-04-06.f1c475d9
  • Cloud provider or hardware configuration running JuiceFS:
    -- oss: swift
    -- mysql 8.0
    -- juicefs client: bare metal x86, oracle-linux 7.9
  • OS (e.g cat /etc/os-release):
    NAME="Oracle Linux Server"
    VERSION="7.9"
    ID="ol"
    ID_LIKE="fedora"
    VARIANT="Server"
    VARIANT_ID="server"
    VERSION_ID="7.9"
    PRETTY_NAME="Oracle Linux Server 7.9"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:oracle:linux:7:9:server"
    HOME_URL="https://linux.oracle.com/"
    BUG_REPORT_URL="https://bugzilla.oracle.com/"

ORACLE_BUGZILLA_PRODUCT="Oracle Linux 7"
ORACLE_BUGZILLA_PRODUCT_VERSION=7.9
ORACLE_SUPPORT_PRODUCT="Oracle Linux"
ORACLE_SUPPORT_PRODUCT_VERSION=7.9

  • Kernel (e.g. uname -a): Linux zhangpudesktop 3.10.0-1160.el7.x86_64 #1 SMP Thu Oct 1 17:21:35 PDT 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Object storage (cloud provider and region, or self maintained): swift
  • Metadata engine info (version, cloud provider managed or self maintained): mysql 8.0
  • Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage): Gigabit, LAN
  • Others:
@SandyXSD
Contributor

SandyXSD commented Jan 4, 2024

A simpler approach is to trigger compaction more aggressively when there are many slices, so a chunk is much less likely to be starved.
@zp001paul could you try this PR? #4309
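
A hedged sketch of what "more aggressively" could mean (a hypothetical illustration, not the actual diff of #4309): bypass the concurrency cap once a chunk's slice count gets close to overflowing the column.

// maybeCompact is a hypothetical wrapper: for mildly fragmented chunks the
// cap may drop the request, but a chunk with very many slices is compacted
// with force=true, which skips the len(m.compacting) > 10 guard entirely.
const forceCompactThreshold = 1000 // assumed value

func (m *dbMeta) maybeCompact(inode Ino, indx uint32, nslices int) {
	m.compactChunk(inode, indx, nslices > forceCompactThreshold)
}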

@zp001paul
Author

Retested today and the problem is solved. That was a fast fix!

@zhoucheng361 added the missed (missed bug) label on Jan 6, 2025