Gitea keeps on filling /tmp with indexes #31792

Closed
jeroenlaylo opened this issue Aug 7, 2024 · 22 comments · Fixed by #32360

@jeroenlaylo

jeroenlaylo commented Aug 7, 2024

Description

We are running Gitea (1.22.1) on OpenBSD (7.5-current on amd64). For some reason, Gitea keeps writing index files to /tmp, despite having configured different paths in the Gitea config. Since this partition is 3GB, it fills up rather quickly.

What causes this behaviour? Is there a configuration option that we are missing? Is it possible to have Gitea use a custom path for these temporary indexes?

Our config:

APP_NAME = Code
RUN_USER = _gitea
RUN_MODE = prod
WORK_PATH = /usr/local/share/gitea

[server]
PROTOCOL = http
DOMAIN = [domain]
ROOT_URL = https://[domain]
HTTP_ADDR = 127.0.0.1
HTTP_PORT = 3000
START_SSH_SERVER = true
BUILTIN_SSH_SERVER_USER = git
SSH_PORT = 2222
SSH_LISTEN_PORT = %(SSH_PORT)s
SSH_SERVER_CIPHERS = [email protected],aes256-ctr,[email protected]
SSH_SERVER_KEY_EXCHANGES = curve25519-sha256
SSH_SERVER_MACS = [email protected]
SSH_SERVER_HOST_KEYS = /var/gitea/.ssh/gitea.ed
;, /var/gitea/.ssh/gitea.rsa
APP_DATA_PATH = /var/gitea/data
PPROF_DATA_PATH = /var/gitea/data/tmp/pprof
LFS_JWT_SECRET = [secret]
SSH_DOMAIN = [domain]
DISABLE_SSH = false
LFS_START_SERVER = true
OFFLINE_MODE = true

[database]
DB_TYPE = postgres
LOG_SQL = false
HOST = 127.0.0.1:5432
NAME = gitea
USER = gitea
PASSWD = [secret]
SCHEMA = 
SSL_MODE = disable
CHARSET = utf8

[security]
INSTALL_LOCK = true
SECRET_KEY = 
INTERNAL_TOKEN = [secret]
PASSWORD_HASH_ALGO = pbkdf2

[ssh.minimum_key_sizes]
ED25519 = 256
ECDSA = -1
RSA = 4096
DSA = -1

[camo]

[oauth2]
ENABLE = true
JWT_SIGNING_PRIVATE_KEY_FILE = /var/gitea/jwt/private.pem
JWT_SECRET = [secret]

[log]
ROOT_PATH = /var/log/gitea
MODE = file
LEVEL = Fatal

[git]

[service]
DISABLE_REGISTRATION = false
REQUIRE_SIGNIN_VIEW = false
DISABLE_USERS_PAGE = true
DEFAULT_KEEP_EMAIL_PRIVATE = true
DEFAULT_ALLOW_CREATE_ORGANIZATION = false
DEFAULT_USER_VISIBILITY = private
DEFAULT_ORG_VISIBILITY = private
ROOT = /var/gitea/gitea-repositories
SCRIPT_TYPE = sh
DEFAULT_PRIVATE = private
PREFERRED_LICENSES = BSD-2-Clause,ISC
REGISTER_EMAIL_CONFIRM = false
ENABLE_NOTIFY_MAIL = false
ALLOW_ONLY_EXTERNAL_REGISTRATION = false
ENABLE_CAPTCHA = false
DEFAULT_ENABLE_TIMETRACKING = true
NO_REPLY_ADDRESS = noreply.[domain]

[repository.local]
LOCAL_COPY_PATH = /var/gitea/tmp/local-repo

[repository.upload]
TEMP_PATH = /var/gitea/data/tmp/uploads
FILE_MAX_SIZE = 2048

[cache]
ADAPTER = redis
HOST = redis://127.0.0.1:6379/1?pool_size=100&idle_timeout=180s

[ui]
SHOW_USER_EMAIL = false
DEFAULT_THEME = gitea-dark
THEMES = gitea-dark
ED25519 = 256
ECDSA = 256
RSA = 2047
DSA = -1

[indexer]
ISSUE_INDEXER_TYPE = db
ISSUE_INDEXER_PATH = /var/gitea/indexers/issues.bleve
STARTUP_TIMEOUT = 30s
REPO_INDEXER_ENABLED = false
REPO_INDEXER_PATH = /var/gitea/indexers/repos.bleve
;ISSUE_INDEXER_QUEUE_DIR = /var/gitea/indexers/issues.queue
;REPO_INDEXER_PATH = /var/gitea/indexers/repos.bleve

[admin]
DISABLE_REGULAR_ORG_CREATION = true
ENABLE_OPENID_SIGNIN = false
ENABLE_OPENID_SIGNUP = false

[queue]
TYPE = redis
CONN_STR = redis://127.0.0.1:6379/0
MAX_WORKERS = 4
DATADIR = /var/gitea/queue

[storage]
STORAGE_TYPE = local

[mailer]
ENABLED = true
PROTOCOL = smtp+starttls
SMTP_ADDR = [smtp-server]
SMTP_PORT = 587
FROM = [sender]
SEND_AS_PLAIN_TEXT = true
USER = [user]
PASSWD = [secret]

[session]
PROVIDER = db
COOKIE_SECURE = true

[picture]
AVATAR_UPLOAD_PATH = /var/gitea/data/avatars
REPOSITORY_AVATAR_UPLOAD_PATH = /var/gitea/data/repo-avatars
DISABLE_GRAVATAR = true
ENABLE_FEDERATED_AVATAR = false

[attachment]
MAX_SIZE = 2048
PATH = /var/gitea/data/attachments

[time]
FORMAT = RFC1123Z

[other]
SHOW_FOOTER_VERSION = false
SHOW_FOOTER_TEMPLATE_LOAD_TIME = false
CHUNKED_UPLOAD_PATH = /var/gitea/data/tmp/package-upload

[repository]
ROOT = /var/code/repos

[lfs]
PATH = /var/code/lfs

[repository.pull-request]
DEFAULT_MERGE_STYLE = merge

[repository.signing]
DEFAULT_TRUST_MODEL = committer

[git.timeout]
DEFAULT = 720
MIGRATE = 30000
MIRROR = 72000
CLONE = 30000
PULL = 30000
GC = 60

Gitea Version

1.22.1

Can you reproduce the bug on the Gitea demo site?

Yes

Log Gist

No response

Screenshots

No response

Git Version

No response

Operating System

No response

How are you running Gitea?

We are using the OpenBSD package

Database

PostgreSQL

@lunny
Member

lunny commented Aug 8, 2024

The tmp directory is used in many operations. There is already an env variable, $TMPDIR, for this. You can set it when running Gitea.

@wULLSnpAXbWZGYDYyhWTKKspEQoaYxXyhoisqHf
Contributor

I personally don't recall this being the case in the past though (I have also been facing this issue on 1.22.0+dev-686-gee242a08e9), do we know when and what changed this?

@jeroenlaylo
Author

jeroenlaylo commented Aug 8, 2024

The tmp directory will be used in many operations. Maybe we need a TMP_DIR configuration rather than using the system's. I wouldn't say this is a bug but a proposal or enhancement.

IIRC, this hasn't always been the case. Some users contacted me to report that the Gitea instance was sluggish; upon investigation it turned out that this was due to a full /tmp partition. A rough estimate: it is likely a change introduced in 1.20.x, 1.21.x or 1.22.x.

The TMP_DIR setting/variable would be an ideal addition, yeah.

I personally don't recall this being the case in the past though (I have also been facing this issue on 1.22.0+dev-686-gee242a08e9), do we know when and what changed this?

Thank you kindly for your feedback. This takes some of the doubts away that I was having (whether it wasn't just PEBCAK). Do you experience this on OpenBSD - or on a different OS?

@wULLSnpAXbWZGYDYyhWTKKspEQoaYxXyhoisqHf
Contributor

IIRC, this hasn't always been the case. Some users contacted me to report that the Gitea instance was sluggish; upon investigation it turned out that this was due to a full /tmp partition. A rough estimate: it is likely a change introduced in 1.20.x, 1.21.x or 1.22.x.

yeah, I think I would have noticed, been using Gitea for some time now, too.

The TMP_DIR setting/variable would be an ideal addition, yeah.

I concur, making this configurable would be preferable.

I can imagine a decision to speed up the indexer was made at some point, which moved its files to /tmp, since it's mostly ramfs these days.

Thank you kindly for your feedback. This takes some of the doubts away that I was having (whether it wasn't just PEBCAK). Do you experience this on OpenBSD - or on a different OS?

and I am equally glad you opened this issue, because I was having these doubts myself, since I don't get to spend as much time tuning my instance as I used to.

I haven't mentioned this, sorry; I have been running Gitea on Arch (a Linux distro), but I'd expect this behaviour to be largely similar among OSs at least in that it uses system's temp folder.

@yp05327
Contributor

yp05327 commented Aug 22, 2024

I think TMP_DIR is necessary.
I ran into a similar problem when installing Python packages with pip.
By default, pip also uses /tmp as its temp directory; if it is too small, you may get errors during installation.
But pip honors env options like TMPDIR, so the issue is easy to fix: just point it at another directory.

@jeroenlaylo
Author

jeroenlaylo commented Aug 27, 2024

From diving somewhat further, it seems the problem is that cancel() is never called on erroring out - causing leftover temporary files to not be cleaned. Here is a quick patch:

Index: modules/git/repo_index.go
--- modules/git/repo_index.go.orig
+++ modules/git/repo_index.go
@@ -51,7 +51,7 @@ func (repo *Repository) readTreeToIndex(id ObjectID, i
 
 // ReadTreeToTemporaryIndex reads a treeish to a temporary index file
 func (repo *Repository) ReadTreeToTemporaryIndex(treeish string) (filename, tmpDir string, cancel context.CancelFunc, err error) {
-	tmpDir, err = os.MkdirTemp("", "index")
+	tmpDir, err = os.MkdirTemp("${LOCALSTATEDIR}/gitea/tmp", "index")
 	if err != nil {
 		return filename, tmpDir, cancel, err
 	}
@@ -63,9 +63,16 @@ func (repo *Repository) ReadTreeToTemporaryIndex(treei
 			log.Error("failed to remove tmp index file: %v", err)
 		}
 	}
+
+	// Defer the cancel function to ensure cleanup in case of an error
+	defer func() {
+		if err != nil {
+			cancel()
+		}
+	}()
+
 	err = repo.ReadTreeToIndex(treeish, filename)
 	if err != nil {
-		defer cancel()
 		return "", "", func() {}, err
 	}
 	return filename, tmpDir, cancel, err

This was only tested on OpenBSD and fixed the behaviour we were seeing. Setting the tmpDir to a custom path is something that fits our scenario better.

@lunny
Member

lunny commented Sep 21, 2024

I don't think the logic is right. The cancel should be invoked from outside if the err is nil.
It seems the logic is the same as before. It's weird the patch fixes it.

@pauldavisthefirst

We have this problem here (git.ardour.org). And it's not a "small" problem. Two days ago our server stopped functioning because nearly 10GB of the root partition was filled by gitea's "index[0-9]+" files.

@lunny
Member

lunny commented Oct 27, 2024

We have this problem here (git.ardour.org). And it's not a "small" problem. Two days ago our server stopped functioning because nearly 10GB of the root partition was filled by gitea's "index[0-9]+" files.

Do you think an optional temp directory could resolve your problem? If so, you can set a dedicated temp dir with the env variable $TMPDIR.

@pauldavisthefirst

We have a cron job that runs every night to remove these files, so for the most part, the problem is under control.

Periodically, however, and for reasons that are not obvious, gitea runs amok and generates far more of them than "normal".

Frankly, using 10GB of space on the server, no matter where it lives, is not really acceptable.

@lunny
Member

lunny commented Oct 27, 2024

We have a cron job that runs every night to remove these files, so for the most part, the problem is under control.

Periodically, however, and for reasons that are not obvious, gitea runs amok and generates far more of them than "normal".

Frankly, using 10GB of space on the server, no matter where it lives, is not really acceptable.

I think so. We are investigating the problem. Something may not be cleaned correctly.

@pauldavisthefirst

So for example, since my last comment, when I ran the "cleaner" by hand, gitea has generated another 11918 files.

@lunny
Member

lunny commented Oct 27, 2024

So for example, since my last comment, when I ran the "cleaner" by hand, gitea has generated another 11918 files.

What do the temp files/directories look like? Are they all index[0-9]+?

@pauldavisthefirst

Yes, every one of them.

And since my last remark, 19619 files ... :)

@lunny
Member

lunny commented Oct 28, 2024

Yes, every one of them.

And since my last remark, 19619 files ... :)

It's weird: those index directories should be cleaned up after use, unless Gitea exited abnormally.

@pauldavisthefirst

An additional 23k files since we last talked :)

@lunny
Member

lunny commented Oct 28, 2024

Do you have any error logs?

@pauldavisthefirst

Not sure where to look, and am headed to bed now. If you give me a pointer, I have a suspicion this situation will be continuing when I get up ...

@pauldavisthefirst

As an additional note, we've disabled indexing in the past to try to prevent this, and it is not working, at least in the sense that these files are being produced anyway.

@lunny
Member

lunny commented Oct 28, 2024

How did you disable indexing? And I think you mean directories, not files? From the code, I think those are directories. Does every directory have a .tmp-index file?

@pauldavisthefirst

Yes, sorry, directories. And yes, they all have .tmp-index.

Disabling indexing was done by putting this in app.ini:

[indexer]
REPO_INDEXER_ENABLED = false

@lunny
Member

lunny commented Oct 28, 2024

They are two different features. I will send a PR like what #31792 (comment) posted, but I really don't know how it works.

And before that, could you try to upgrade to v1.22.3?

lafriks pushed a commit that referenced this issue Oct 29, 2024
Try to fix #31792 

Credit to @jeroenlaylo
Copied from
#31792 (comment)

---------

Co-authored-by: wxiaoguang <[email protected]>
lunny added a commit that referenced this issue Nov 22, 2024
Backport #32360 

Try to fix #31792 

Credit to @jeroenlaylo
Copied from
#31792 (comment)

Co-authored-by: wxiaoguang <[email protected]>
@go-gitea go-gitea locked as resolved and limited conversation to collaborators Jan 27, 2025