[BUG] Memory corruption caught by hardened allocators #1585

Open
artemislena opened this issue Jan 18, 2025 · 5 comments

@artemislena

Describe the bug

When running with Graphene's hardened malloc or LLVM's scudo (and presumably other hardened malloc implementations too), Valkey crashes at startup with a segmentation fault. Even at the debug log level, the crash happens very early, before any log output is emitted.

To reproduce

On Linux, run Valkey with the LD_PRELOAD environment variable set to the hardened allocator's path, or use a global /etc/ld.so.preload file. hardened_malloc has some instructions to help with that. If you're on NixOS specifically, you can also use the environment.memoryAllocator.provider option. On other OSes, I'm not sure; it might be hard to do on macOS, and I have no idea how preloading works on any BSD.
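
For anyone trying the LD_PRELOAD route described above, here is a minimal sketch. The library path is a placeholder, and `run_with_preload` is a hypothetical helper written for illustration, not something shipped with Valkey or hardened_malloc. It refuses to run when the library file is missing, because the glibc loader only prints a warning for an unloadable preload and then starts the program with the default allocator, which could look like a failure to reproduce.

```shell
# Placeholder path (an assumption): point this at wherever the allocator
# library was built or installed on your system.
PRELOAD_LIB="${PRELOAD_LIB:-/path/to/libhardened_malloc.so}"

# Hypothetical helper: run a command with the allocator preloaded.
run_with_preload() {
    # Bail out early if the library is missing, so a silent fallback to the
    # default allocator is not mistaken for "works fine".
    if [ ! -f "$PRELOAD_LIB" ]; then
        echo "preload library not found: $PRELOAD_LIB" >&2
        return 1
    fi
    # The assignment applies only to this one command.
    LD_PRELOAD="$PRELOAD_LIB" "$@"
}

# Example usage (not executed here):
# run_with_preload valkey-server --loglevel debug
```

Setting the variable per command rather than exporting it keeps the allocator from being injected into every other process started from the same shell.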

Expected behavior

Valkey shouldn't segfault regardless of allocator.

Additional information

Judging by when we changed our configs, the bug was first observed sometime between October 27 and November 19, 2024. My guess is that it was introduced in 8.0.0 or 8.0.1 (nixpkgs updated from 7.2.7 to 8.0.1 on November 13, and I figure that's what caused it back then). The issue still persists in 8.0.2. This is not a bug in the allocator; rather, it is a bug in Valkey uncovered by more hardened allocators (one that less hardened allocators like jemalloc or the glibc one wouldn't catch), as evidenced by the fact that it happens with two different mallocs. This bug may result in a vulnerability, but I'm not reporting it as such because I don't know how to test for that sort of thing.

@ranshid
Member

ranshid commented Jan 19, 2025

@artemislena thank you for reporting this. However, I was unable to reproduce the issue.
I tried Valkey compiled with jemalloc and with libc malloc, in both cases running with Graphene's hardened malloc loaded into the process:

lsof -p `pidof valkey-server` | grep libhardened_malloc
valkey-se xxxxxx ubuntu  mem       REG   259,1    74520  800960 /home/ubuntu/hardened_malloc/out/libhardened_malloc.so

There were no issues on startup, and the server behaved as expected.
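
The lsof check above can also be done by reading /proc directly, which may be handy on minimal systems without lsof. This is a rough sketch, assuming Linux; `has_mapping` is a hypothetical helper name, and the example line assumes exactly one running valkey-server process.

```shell
# Succeeds if the process with the given PID has a memory mapping whose
# line in /proc/<pid>/maps contains the given string. Linux-only.
has_mapping() {
    pid=$1
    needle=$2
    # -F treats the needle as a fixed string, -q suppresses output.
    grep -qF -- "$needle" "/proc/$pid/maps"
}

# Example usage (not executed here):
# has_mapping "$(pidof valkey-server)" libhardened_malloc && echo "allocator is mapped"
```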

I was running on an AWS EC2 VM:

uname -a
Linux ip-xxx-xxx-xxx-xxx 6.8.0-1016-aws #17-Ubuntu SMP <some creation date> aarch64 aarch64 aarch64 GNU/Linux

OS details:

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04.1 LTS
Release:        24.04
Codename:       noble

Can you maybe provide steps to reproduce?

@artemislena
Author

uname -a
Linux […] 6.6.69-hardened1 #1-NixOS SMP PREEMPT_DYNAMIC […] x86_64 GNU/Linux
cat /etc/os-release 
ANSI_COLOR="0;38;2;126;186;228"
BUG_REPORT_URL="https://github.com/NixOS/nixpkgs/issues"
BUILD_ID="25.05.20250116.5df4362"
CPE_NAME="cpe:/o:nixos:nixos:25.05"
DEFAULT_HOSTNAME=nixos
DOCUMENTATION_URL="https://nixos.org/learn.html"
HOME_URL="https://nixos.org/"
ID=nixos
ID_LIKE=""
IMAGE_ID=""
IMAGE_VERSION=""
LOGO="nix-snowflake"
NAME=NixOS
PRETTY_NAME="NixOS 25.05 (Warbler)"
SUPPORT_URL="https://nixos.org/community.html"
VARIANT=""
VARIANT_ID=""
VENDOR_NAME=NixOS
VENDOR_URL="https://nixos.org/"
VERSION="25.05 (Warbler)"
VERSION_CODENAME=warbler
VERSION_ID="25.05"

T.: We're running w our NixOS server configs; hardening.nix n redis.nix (yeah, maybe we should rename that into valkey.nix eventually) should be the most relevant files here. ('f reproducing using these configs, remove or comment out "/etc/nix-ld.so.preload" from L37 in redis.nix so the file that'd preload the malloc won't be ignored n 'f ya need ptrace for debugging, delete or comment out L18 in hardening.nix) Besides the kernel (n distro), the main relevant thing here might be the architecture? It's entirely possible the bug only occurs on x86_64.

@artemislena
Author

artemislena commented Jan 20, 2025

T.: Update: Ok, it's definitely not related to the hardened kernel. Minimum for reproducing (tested in VM):

environment.memoryAllocator.provider = "graphene-hardened"; # Happens w any value other than the default ("libc", i.e. no preload file) n "jemalloc", i.e. "scudo", "mimalloc", "graphene-hardened-light" also find the issue
services.redis = {
  vmOverCommit = true;
  package = pkgs.valkey;
  servers."".enable = true;
};

+ minimal network n shell config that shouldn't affect Valkey whatsoever. No fancy sysctls or kernel params or whatever, still getting the error. So unless it's some difference between NixOS n Ubuntu in their default-ish configs, I'm betting the bug only happens on x86_64. Interestingly, further testing showed the issue'd also be found by hardened_malloc in the light config as well as when using mimalloc.
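
To compare several allocators outside of NixOS, something like the following sketch could run the same command once per preload library and label the result. The helper names and library paths are hypothetical, and the signal numbers assume Linux (SIGSEGV = 11, SIGABRT = 6). Status 124 is what GNU `timeout` returns when the command was still running at the deadline, which for a server would mean it did not crash at startup.

```shell
# Translate a shell exit status into a short label.
# A status above 128 means the process was killed by signal (status - 128).
describe_status() {
    case "$1" in
        0)   echo "ok" ;;
        124) echo "still running at timeout" ;;  # GNU timeout's status
        134) echo "killed by SIGABRT" ;;         # 128 + 6
        139) echo "killed by SIGSEGV" ;;         # 128 + 11
        *)   echo "exit status $1" ;;
    esac
}

# Run the same command once per allocator library and report the outcome.
# Usage: try_allocators "lib1.so lib2.so" command [args...]
# The list is split on whitespace, so paths containing spaces are not supported.
try_allocators() {
    libs=$1
    shift
    for lib in $libs; do
        LD_PRELOAD="$lib" "$@" >/dev/null 2>&1
        status=$?
        echo "$lib: $(describe_status "$status")"
    done
}

# Example usage (not executed here; paths are placeholders):
# try_allocators "/path/to/libhardened_malloc.so /path/to/libmimalloc.so" \
#     timeout 5 valkey-server
```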

@ranshid
Member

ranshid commented Jan 20, 2025

@artemislena, I'm still trying to get my hands on a NixOS VM. I did try to reproduce this on an x86_64 Ubuntu VM, but the issue did not reproduce.
Maybe you can help by attempting the same on an Ubuntu machine?

@artemislena
Author

T.: Huh, that's strange, then. Don't got any Ubuntu machine, sorry, n spinning up a VM w that's also not super trivial (while w NixOS we got a module that automatically generates VMs for us from Nix configs). Could maybe try later this week on our Mac (using UTM) but that'd also be aarch64 then.
