Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

valgrind issue in lz4, while using -a aggregation #431

Closed
phaag opened this issue Mar 5, 2023 · 9 comments
Closed

valgrind issue in lz4, while using -a aggregation #431

phaag opened this issue Mar 5, 2023 · 9 comments
Assignees
Labels
bug Something isn't working

Comments

@phaag
Copy link
Owner

phaag commented Mar 5, 2023

          Well, maybe this output from valgrind could hint you on something (I am not yet familiar with the code):
==355767== Use of uninitialised value of size 8
==355767==    at 0x4854F7B: LZ4_putPositionOnHash (lz4.c:505)
==355767==    by 0x4854F7B: LZ4_putPosition (lz4.c:513)
==355767==    by 0x4854F7B: LZ4_compress_generic (lz4.c:691)
==355767==    by 0x4854F7B: LZ4_compress_fast_extState (lz4.c:746)
==355767==    by 0x48562E1: LZ4_compress_fast (lz4.c:765)
==355767==    by 0x48562E1: LZ4_compress_default (lz4.c:776)
==355767==    by 0x4865D16: Compress_Block_LZ4 (nffile.c:251)
==355767==    by 0x4865D16: nfwrite (nffile.c:1243)
==355767==    by 0x4864EDE: nfwriter (nffile.c:1291)
==355767==    by 0x48E0EA6: start_thread (pthread_create.c:477)
==355767==    by 0x4AF1A2E: clone (clone.S:95)

There are many similar issues - and this only happens when I provide -a flag.

Line numbers may be a bit off though as I am experimenting with my zstd support patch (it does not affect aggregation behavior anyway), but at least you know where to look :)

Originally posted by @aldem in #427 (comment)

@aldem
Copy link

aldem commented Mar 5, 2023

Update: similar behavior is observed with LZO, so this really has nothing to do with compression code itself:

==362798== Conditional jump or move depends on uninitialised value(s)
==362798==    at 0x4853444: lzo1x_1_compress_core (minilzo.c:5153)
==362798==    by 0x4852B7A: lzo1x_1_compress (minilzo.c:5234)
==362798==    by 0x4865C51: Compress_Block_LZO (nffile.c:201)
==362798==    by 0x4865C51: nfwrite (nffile.c:1239)
==362798==    by 0x4864EDE: nfwriter (nffile.c:1291)
==362798==    by 0x48E0EA6: start_thread (pthread_create.c:477)
==362798==    by 0x4AF1A2E: clone (clone.S:95)

and BZ2:

==362946== Conditional jump or move depends on uninitialised value(s)
==362946==    at 0x491126D: ??? (in /usr/lib/x86_64-linux-gnu/libbz2.so.1.0.4)
==362946==    by 0x491186A: BZ2_bzCompress (in /usr/lib/x86_64-linux-gnu/libbz2.so.1.0.4)
==362946==    by 0x4865CCC: Compress_Block_BZ2 (nffile.c:304)
==362946==    by 0x4865CCC: nfwrite (nffile.c:1247)
==362946==    by 0x4864EDE: nfwriter (nffile.c:1291)
==362946==    by 0x48E0EA6: start_thread (pthread_create.c:477)
==362946==    by 0x4AF1A2E: clone (clone.S:95)

@phaag
Copy link
Owner Author

phaag commented Mar 5, 2023

Can you please specify the exact command line including valgrind.

@aldem
Copy link

aldem commented Mar 5, 2023

Something like this:

valgrind --tool=memcheck -- /opt/nfdump/bin/nfdump -r nfcapd.20230304233310 -w nfcapd.20230304233310.agg -a -z

@phaag
Copy link
Owner Author

phaag commented Mar 5, 2023

Thanks! - valgrind could at least be a bit more precise ... the output is a bit of limited value.

@aldem
Copy link

aldem commented Mar 5, 2023

Better this to track origin:

valgrind --tool=memcheck --track-origins=yes ...

Just found a reference:

==364794==  Uninitialised value was created by a stack allocation
==364794==    at 0x40EF40: AddFlowCache (nflowcache.c:968)

and this:

==364794== Syscall param write(buf) points to uninitialised byte(s)
==364794==    at 0x48EAFEF: __libc_write (write.c:26)
==364794==    by 0x48EAFEF: write (write.c:24)
==364794==    by 0x4865E17: nfwrite (nffile.c:1266)
==364794==    by 0x4864EDE: nfwriter (nffile.c:1291)
==364794==    by 0x48E0EA6: start_thread (pthread_create.c:477)
==364794==    by 0x4AF1A2E: clone (clone.S:95)
==364794==  Address 0x13afe044 is 4 bytes inside a block of size 10,485,760 alloc'd
==364794==    at 0x483877F: malloc (vg_replace_malloc.c:307)
==364794==    by 0x4864DA2: NewFile (nffile.c:567)
==364794==    by 0x48649D3: OpenNewFile (nffile.c:782)
==364794==    by 0x4044CE: main (nfdump.c:1106)
==364794==  Uninitialised value was created by a stack allocation
==364794==    at 0x40EF40: AddFlowCache (nflowcache.c:968)

Most likely there is some corruption introduced in either AddFlowCache or in arguments to AddFlowCache.

@aldem
Copy link

aldem commented Mar 5, 2023

Looks like there is really some out-of-bounds memory access, even without compression involved (only -a):

==378232== 28 errors in context 1 of 1:
==378232== Syscall param write(buf) points to uninitialised byte(s)
==378232==    at 0x48EEFEF: __libc_write (write.c:26)
==378232==    by 0x48EEFEF: write (write.c:24)
==378232==    by 0x4869A97: nfwrite (nffile.c:1265)
==378232==    by 0x4868B6E: nfwriter (nffile.c:1290)
==378232==    by 0x48E4EA6: start_thread (pthread_create.c:477)
==378232==    by 0x4AF5A2E: clone (clone.S:95)
==378232==  Address 0xf6fb0d2 is 146 bytes inside a block of size 10,485,760 alloc'd
==378232==    at 0x483877F: malloc (vg_replace_malloc.c:307)
==378232==    by 0x486897E: NewFile (nffile.c:567)
==378232==    by 0x4868663: OpenNewFile (nffile.c:782)
==378232==    by 0x4044CE: main (nfdump.c:1106)

I have to learn the code first to find it out, without understanding how it does things (supposed to do at least) it is hard - it could be anywhere but for sure somewhere in code related to aggregation.

@phaag
Copy link
Owner Author

phaag commented Mar 5, 2023

I will try to dive into that next week. I need a bit more time.

@phaag phaag added the bug Something isn't working label Mar 6, 2023
@phaag phaag self-assigned this Mar 6, 2023
@phaag
Copy link
Owner Author

phaag commented Mar 7, 2023

The patch fixes the uninitialised memory reference.

@phaag phaag closed this as completed Mar 7, 2023
@phaag
Copy link
Owner Author

phaag commented Mar 7, 2023

Just to add a comment oh this: It was not an out-of-bounds memory access. The outFlags variable, which is really seldom used, was not properly initialised when doing aggregation. Valgrind barks at the first reading event of that memory location. This access is no earlier than storing the data in the file, or before that - compressing the data. However, you can force this also by fmt printing the flows instead storing to disk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants