Skip to content
This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

DelayImagesFilterTest segfault #1843

Open
acachy opened this issue Dec 2, 2018 · 16 comments
Open

DelayImagesFilterTest segfault #1843

acachy opened this issue Dec 2, 2018 · 16 comments

Comments

@acachy
Copy link

acachy commented Dec 2, 2018

Debian 8 x64, gcc 8.2

I try to build PS from source.

./out/Release/pagespeed_automatic_test finishes with:
[ RUN ] DelayImagesFilterTest.DelayImagesAcrossDifferentFlushWindow
Segmentation fault

  • I tried root and normal user
  • When I use BUILD_TYPE=Debug, all tests is OK (everything other is 100% same)
@oschaaf
Copy link
Member

oschaaf commented Dec 2, 2018

That is remarkable, I have never seen that test fail before. Test source:

TEST_F(DelayImagesFilterTest, DelayImagesAcrossDifferentFlushWindow) {

@acachy could you post a backtrace from gdb?

@keesspoelstra just a hunch but could this be a Linux occurrence of the issue with adopting orphaned iterators in the html parsing flow?

@acachy
Copy link
Author

acachy commented Dec 2, 2018

[ RUN ] DelayImagesFilterTest.DelayImagesAcrossDifferentFlushWindow
[New Thread 0x7ffff63e3700 (LWP 5571)]
[New Thread 0x7ffff7be6700 (LWP 5572)]

Thread 2187 "rewrite-0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff63e3700 (LWP 5571)]
0x0000555555fd3167 in net_instaweb::MockLogRecord::LogImageRewriteActivity(char const*, std::string const&, net_instaweb::RewriterApplication_Status, bool, bool, bool, int, bool, bool, net_instaweb::ImageType, int) ()
(gdb) bt
#0 0x0000555555fd3167 in net_instaweb::MockLogRecord::LogImageRewriteActivity(char const*, std::string const&, net_instaweb::RewriterApplication_Status, bool, bool, bool, int, bool, bool, net_instaweb::ImageType, int) ()
#1 0x000055555656d880 in net_instaweb::ImageRewriteFilter::FinishRewriteImageUrl(net_instaweb::CachedResult const*, net_instaweb::ResourceContext const*, net_instaweb::HtmlElement*, net_instaweb::HtmlElement::Attribute*, int, net_instaweb::HtmlResourceSlot*, net_instaweb::InlineResult*) ()
#2 0x000055555656e085 in net_instaweb::ImageRewriteFilter::Context::Render() ()
#3 0x0000555556587e9e in net_instaweb::RewriteContext::Propagate(net_instaweb::RenderOp) ()
#4 0x0000555556595959 in net_instaweb::RewriteDriver::RewriteComplete(net_instaweb::RewriteContext*, net_instaweb::RenderOp) ()
#5 0x000055555658d297 in net_instaweb::RewriteContext::RewriteDoneImpl(net_instaweb::RewriteResult, int) ()
#6 0x00005555566fb717 in net_instaweb::Function::CallRun() ()
#7 0x0000555556711e30 in net_instaweb::QueuedWorkerPool::Run(net_instaweb::QueuedWorkerPool::Sequence*, net_instaweb::QueuedWorker*) ()
#8 0x00005555566fb717 in net_instaweb::Function::CallRun() ()
#9 0x000055555671540d in net_instaweb::Worker::WorkThread::Run() ()
#10 0x00005555564c6633 in net_instaweb::PthreadThreadImpl::InvokeRun(void*) ()
#11 0x00007ffff7e1dfa3 in start_thread (arg=) at pthread_create.c:486
#12 0x00007ffff7d4e84f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

@oschaaf
Copy link
Member

oschaaf commented Dec 2, 2018

@acachy is this happening on the master branch?

@acachy
Copy link
Author

acachy commented Dec 2, 2018

Yes

git rev-parse HEAD
20fbc55

@oschaaf
Copy link
Member

oschaaf commented Dec 2, 2018

Thanks. So I coincidentally ran the release process earlier today, which includes all tests on ubuntu 14.x and centos 6.x. Those passed fine. So I'm thinking that the OS/arch you used (Debian 8 x64, gcc 8.2) is a prerequisite to reproducing the problem.

@acachy
Copy link
Author

acachy commented Dec 2, 2018

A lot of fresh packages in use.

Any assumptions - what libs might be related and why Debug-version works OK?

@oschaaf
Copy link
Member

oschaaf commented Dec 2, 2018

@keesspoelstra would be best to assess if this line of thought is worth pursuing more;

the only thing I have right now is a hunch, based on vague recollections of a fairly large debugging effort on release builds a couple of years ago for the Microsoft windows port. I believe the problem didn't show in debug versions there either, and some of the stack frames involved do hit code paths that seem relevant, I think.

(This may very well be a red herring: I have never seen the problem on any *nix platform. But maybe the glibc/gcc version you used teased out the issue)

@acachy
Copy link
Author

acachy commented Dec 2, 2018

I've rebuilt it with gcc 4.9
No more error with DelayImagesAcrossDifferentFlushWindow

However, I've got multiple SIG36 and SIG40 cases like

[ RUN      ] ExpensiveOperationRpcHandlerTest.ImmediateDeny
[New Thread 0x7ffff53e1700 (LWP 5389)]
[New Thread 0x7ffff73e5700 (LWP 5390)]
[New Thread 0x7ffff7be6700 (LWP 5391)]

Thread 3893 "grpc_test_serve" received signal SIG40, Real-time event 40.
[Switching to Thread 0x7ffff7be6700 (LWP 5391)]
0x00007ffff7d4e9ae in __GI_epoll_pwait (epfd=7, events=0x7ffff7be55e0, maxevents=100, timeout=-1, set=0x7ffff7be6628) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:42
42      ../sysdeps/unix/sysv/linux/epoll_pwait.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7d4e9ae in __GI_epoll_pwait (epfd=7, events=0x7ffff7be55e0, maxevents=100, timeout=-1, set=0x7ffff7be6628) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:42
#1  0x00000000014c939a in pollset_work ()
#2  0x0000000001484e1a in cq_next ()
#3  0x000000000148539b in grpc_completion_queue_next ()
#4  0x0000000001460978 in grpc::CompletionQueue::AsyncNextInternal(void**, bool*, gpr_timespec) ()
#5  0x00000000014436c6 in net_instaweb::CentralControllerRpcServer::MainLoop(grpc::CompletionQueue*) ()
#6  0x00000000010b2f03 in net_instaweb::PthreadThreadImpl::InvokeRun(void*) ()
#7  0x00007ffff7e1dfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#8  0x00007ffff7d4e84f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95




[ RUN      ] ExpensiveOperationRpcHandlerTest.ImmediateDeny
[New Thread 0x7ffff53e1700 (LWP 9296)]
[New Thread 0x7ffff73e5700 (LWP 9297)]
[New Thread 0x7ffff7be6700 (LWP 9298)]

Thread 3893 "grpc_test_serve" received signal SIG40, Real-time event 40.
[Switching to Thread 0x7ffff7be6700 (LWP 9298)]
0x00007ffff7d4e9ae in __GI_epoll_pwait (epfd=7, events=0x7ffff7be55e0, maxevents=100, timeout=-1, set=0x7ffff7be6628) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:42
42      ../sysdeps/unix/sysv/linux/epoll_pwait.c: No such file or directory.
(gdb) bt
#0  0x00007ffff7d4e9ae in __GI_epoll_pwait (epfd=7, events=0x7ffff7be55e0, maxevents=100, timeout=-1, set=0x7ffff7be6628) at ../sysdeps/unix/sysv/linux/epoll_pwait.c:42
#1  0x00000000014c939a in pollset_work ()
#2  0x0000000001484e1a in cq_next ()
#3  0x000000000148539b in grpc_completion_queue_next ()
#4  0x0000000001460978 in grpc::CompletionQueue::AsyncNextInternal(void**, bool*, gpr_timespec) ()
#5  0x00000000014436c6 in net_instaweb::CentralControllerRpcServer::MainLoop(grpc::CompletionQueue*) ()
#6  0x00000000010b2f03 in net_instaweb::PthreadThreadImpl::InvokeRun(void*) ()
#7  0x00007ffff7e1dfa3 in start_thread (arg=<optimized out>) at pthread_create.c:486

@oschaaf
Copy link
Member

oschaaf commented Dec 2, 2018

I think grpc uses these signals, they probably are not a concern.
If you want gdb to ignore them:

handle SIG36 noprint nostop
handle SIG40 noprint nostop

@acachy
Copy link
Author

acachy commented Dec 2, 2018

So, looks like the problem is related exclusively to gcc 8.2

@oschaaf
Copy link
Member

oschaaf commented Dec 2, 2018

Thanks for helping narrowing that down. I will raise a thread on [email protected] to discuss tomorrow.
(You can subscribe over at https://lists.apache.org/[email protected] if you want to follow that)

@keesspoelstra
Copy link
Contributor

@acachy @oschaaf Compiled with gtest/gmock 1.8.0, and it runs fine, so it seems to be an issue with the gtest/gmock version.
Will push a PR later this week.
For 1.8.0 we need to move to the abseil googletest project instead of the separate gtest/gmock submodules.

@jmarantz
Copy link
Contributor

jmarantz commented Dec 4, 2018

Have you tried running with valgrind? valgrind works well on MPS unit & integration tests, and might be worth a run in case this is a real bug that is just not showing up depending on gmock/gtest behavior.

@oschaaf
Copy link
Member

oschaaf commented Dec 4, 2018

Well, I reproduced the problem and ran tests with valgrind; also caught it in GDB. I ended up suspecting static initialization trouble with gtest/gmock 1.7, showing up only in release builds on gcc 6.x (and the originally reported gcc version, 1.8). Skipping the particular problematic test reported above, other tests would fail too (with less complex code/frames in the backtrace)..

I think @keesspoelstra more or less proved that the problem is in gtest/gmock by making the problem disappear on a symptomatic OS by upgrading their versions on his dev system.

@keesspoelstra
Copy link
Contributor

keesspoelstra commented Dec 4, 2018

@jmarantz @oschaaf

Upgrading will be fun:
I originally did a symlink to the appropriate directories in googletest 1.8.0 , which compiled fine
After that I tried to change the gyps, which worked well for the pagespeed source tree, not so fine for the linked in chromium submodule, as it expects gtest and gmock to be at a certain location.
We cannot change the chromium sourcecode I guess.

I can make a PR with symlinks to a submodule, but what do we think about symlinks to a folder within a submodule in GIT?
We don't have that yet in the repo, I think?

@keesspoelstra
Copy link
Contributor

keesspoelstra commented Dec 4, 2018

Also interesting to see, this seems to be the fix for getting 1.7.0 to work with newer GCCs
c-ares/c-ares@0d46f76
google/googletest#705 (comment)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants