Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

AddressSanitizer find bugs in axmol engine #1257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
crazyhappygame opened this issue Jul 8, 2023 · 21 comments
Closed

AddressSanitizer find bugs in axmol engine #1257

crazyhappygame opened this issue Jul 8, 2023 · 21 comments
Labels
Milestone

Comments

@crazyhappygame
Copy link
Contributor

  • axmol version: 4e664e6
  • devices test on: Windows 10
  • developing environments
    • NDK version: r19c
    • Xcode version: 12.4
    • Visual Studio:
      • VS version: 2022 (17.6.3)
      • MSVC version: 1929, 1934
      • Windows SDK version: 10.0.22621.0
    • cmake version: 3.25.2
      Steps to Reproduce:
      Windows, Visual studio 2022
  1. Enable adress sanitizer for VS
    https://learn.microsoft.com/en-us/cpp/sanitizers/asan?view=msvc-170
    by adding on the top of to CMakeLists.txt
add_compile_options(/fsanitize=address)
  1. build and start cpp-test
  2. Press "Start AutoTest"
  3. After some time application crashed with

image

Sample errors:

  1. ActionsProgressTests
    Address Sanitizer Error: Use of out-of-scope stack memory
void ProgressTimer::updateColor()
{
    if (!_sprite)
        return;

    if (!_vertexData.empty())
    {
        const Color4B& sc = _sprite->getQuad().tl.colors;
        for (int i = 0; i < _vertexData.size(); ++i)
        {
            _vertexData[i].colors = sc;
        }
    }
}
  1. TextureCacheUnbindTest
    Address Sanitizer Error: Use of deallocated memory
        // release the asyncStruct
        delete asyncStruct;
        --_asyncRefCount;

Comments:
This kind of problems means that we are in undefined behavior zone and can not reason about program correctness.
This kind of issue could result in the problems seen in #1211

@crazyhappygame
Copy link
Contributor Author

FYI. @rh101 @halx99 @DelinWorks @kiranb47

Visual studio support only AddressSanitizer https://learn.microsoft.com/en-us/cpp/sanitizers/asan?view=msvc-170

Does any one of can build and check axmol on linux against all available sanitizers [address, memory, thread, undefined]

@rh101
Copy link
Contributor

rh101 commented Jul 9, 2023

@crazyhappygame The ActionsProgressTests issue seems like a false positive from ASAN. It can be boiled down to this code which throws this error:

const Color4B& sc = _sprite->getQuad().tl.colors;
_vertexData[i].colors = sc;

If we change it to this, an error is still flagged by ASAN:

const auto& tl = _sprite->getQuad().tl;
_vertexData[i].colors = tl.colors;

For some reason the const reference is flagged as being out of scope, which shouldn't be the case. Unless there is something I'm not seeing, none of the code above should result in no errors.

This version of it does not throw an error:

const auto& quad = _sprite->getQuad();
_vertexData[i].colors = quad.tl.colors;

@rh101
Copy link
Contributor

rh101 commented Jul 9, 2023

2. TextureCacheUnbindTest
Address Sanitizer Error: Use of deallocated memory

@crazyhappygame I couldn't reproduce this issue. Did you test it with the 32bit of 64bit build?

Once I did the change to fix issue 1 in ProgressTimer::updateColor(), the entire cpp_test auto test passed without any further ASAN exceptions (besides a few crashes that were fixed in #1259 that are unrelated to ASAN).

@crazyhappygame
Copy link
Contributor Author

@rh101 ASAN does not produce false positive.
Below line is a real bug (not a false positive)

const Color4B& sc = _sprite->getQuad().tl.colors;

Clarification:
sprite returns by value V3F_C4B_T2F_Quad

V3F_C4B_T2F_Quad getQuad() const { return _quad; }

V3F_C4B_T2F_Quad temporary value is destroyed immediately and we keep reference to const Color4B& sc that points to non existing part of V3F_C4B_T2F_Quad

Possible fixes:

  1. Store copy of Color4B
const Color4B sc = _sprite->getQuad().tl.colors;
  1. Extend lifetime of rvalue
const auto& quad = _sprite->getQuad();

and later use quad

_vertexData[i].colors = quad.tl.colors;

Please check "Reference Lifetime Extension" e.g.:
https://abseil.io/tips/107

Generally:
This is OK. Reference Lifetime Extension.

const auto& temp = _sprite->getQuad();

This is WRONG. Reference Lifetime Extension does not work for any sub expression.

const auto& temp1 = _sprite->getQuad().tl;
const auto& temp2 = _sprite->getQuad().tl.color;

@halx99
Copy link
Collaborator

halx99 commented Jul 9, 2023 via email

@crazyhappygame
Copy link
Contributor Author

  1. TextureCacheUnbindTest
    Address Sanitizer Error: Use of deallocated memory

This looks like some threading problem. Most probably there is some race. We should run tread sanitizer to increase chance to reproduce this problem. I reproduce this problem on win 64 bit build

@rh101
Copy link
Contributor

rh101 commented Jul 9, 2023

Clarification: sprite returns by value V3F_C4B_T2F_Quad

V3F_C4B_T2F_Quad getQuad() const { return _quad; }

V3F_C4B_T2F_Quad temporary value is destroyed immediately and we keep reference to const Color4B& sc that points to non existing part of V3F_C4B_T2F_Quad

You're absolutely right; I completely missed that the return value of getQuad isn't a reference.

@crazyhappygame
Copy link
Contributor Author

crazyhappygame commented Jul 9, 2023

Universal Reference is better fix it: auto&& xxx =

  1. @halx99 Universal reference will not work. (update: double checked Universal reference && ASAN report error)
  2. @rh101 getQuad returning by reference will work

@halx99
Copy link
Collaborator

halx99 commented Jul 9, 2023

auto&& quad = _sprite->getQuad();
temp._vertexData[i].colors = quad.tl.colors;

not work?

@crazyhappygame
Copy link
Contributor Author

I checked below:

auto&& t = helpArrow->getQuad().tl.colors;

That should work

auto&& quad = _sprite->getQuad();
_vertexData[i].colors = quad.tl.colors;

There should be no difference how Reference Lifetime Extension works for below cases

auto&& t1= _sprite->getQuad();
const auto& t2 = _sprite->getQuad();
const V3F_C4B_T2F_Quad& t2 = _sprite->getQuad();

@rh101
Copy link
Contributor

rh101 commented Jul 9, 2023

auto&& quad = _sprite->getQuad();
temp._vertexData[i].colors = quad.tl.colors;

Sorry, yes that does work, which is equivalent to:

const auto& quad = _sprite->getQuad();
_vertexData[i].colors = quad.tl.colors;

Which would be preferred?

1 - Change getQuad to return a const reference? So it would become:
const V3F_C4B_T2F_Quad& getQuad() const { return _quad; }

or

2 - Update it's usage in void ProgressTimer::updateColor() to:

auto&& quad = _sprite->getQuad();
_vertexData[i].colors = quad.tl.colors;

@crazyhappygame
Copy link
Contributor Author

crazyhappygame commented Jul 9, 2023

Both works for me.

Not sure how big can _vertexData vector and what compiler code is generated here but there is a chance that below could be the faster. There is no indirection in body loop quad.tl.colors

        const Color4B sc = _sprite->getQuad().tl.colors;
        for (int i = 0; i < _vertexData.size(); ++i)
        {
            _vertexData[i].colors = sc;
        }

And most probably changing to below will be some small performance and readability improvement

const Color4B sc = _sprite->getQuad().tl.colors;
for(auto& d: _vertexData){
   d.color = sc;
}

@rh101
Copy link
Contributor

rh101 commented Jul 9, 2023

Not sure how big can _vertexData vector and what compiler code is generated here but there is a chance that below could be the faster. There is no indirection in body loop quad.tl.colors

        const Color4B sc = _sprite->getQuad().tl.colors;
        for (int i = 0; i < _vertexData.size(); ++i)
        {
            _vertexData[i].colors = sc;
        }

That would be an extra copy for the const Color4B sc = _sprite->getQuad().tl.colors;

If getQuads returned a const &:
const V3F_C4B_T2F_Quad& getQuad() const { return _quad; }

then the old code in ProgressTimer::updateColor() can remain as it is, and it will work, plus it won't be doing an extra copy since it's just getting the reference to colors.

const auto& sc = _sprite->getQuad().tl.colors;
for (int i = 0; i < _vertexData.size(); ++i)
{
    _vertexData[i].colors = sc;
}

Also, the other places that use getQuads get a copy of it, like this:

V3F_C4B_T2F_Quad quad = _sprite->getQuad();

All they do is either calculations on the values or copy the contents somewhere else, so if we change getQuads to return a const & we can remove a single copy operation, and just have all usages do:

const V3F_C4B_T2F_Quad& quad = _sprite->getQuad();
or
const auto& quad = _sprite->getQuad(); etc. etc.

@crazyhappygame
Copy link
Contributor Author

I think this single copy does not matter here:

  1. we create V3F_C4B_T2F_Quad (that is much bigger then Color4B)
V3F_C4B_T2F_Quad getQuad() const { return _quad; }
  1. We copy Color4B every single loop iteration
_vertexData[i].colors = sc;

Most probably below would be the most performant and readable code

const V3F_C4B_T2F_Quad& getQuad() const { return _quad; }

and then

const Color4B& sc = _sprite->getQuad().tl.colors;
for(auto& d: _vertexData){
   d.color = sc;
}

@halx99
Copy link
Collaborator

halx99 commented Jul 9, 2023

const V3F_C4B_T2F_Quad& getQuad() const { is best

@rh101
Copy link
Contributor

rh101 commented Jul 9, 2023

const V3F_C4B_T2F_Quad& getQuad() const { is best

I can take care of this in a few minutes. It'll affect other functions too, such as

void updateQuad(V3F_C4B_T2F_Quad* quad, ssize_t index);
void insertQuad(V3F_C4B_T2F_Quad* quad, ssize_t index);

They will become:

void updateQuad(const V3F_C4B_T2F_Quad& quad, ssize_t index);
void insertQuad(const V3F_C4B_T2F_Quad& quad, ssize_t index);

Which is probably a good thing too, since inside those methods the quad pointer is being de-referenced, and there is never a null check being done prior to that.

I'll get a PR done soon unless someone else wants to handle this.

@crazyhappygame
Copy link
Contributor Author

@rh101 thank you for merging fix.
Anyone could check other sanitizer on Linux/MacOS platforms [address, memory, thread, undefined]. There is definitely problem in async Texture loading....

@rh101
Copy link
Contributor

rh101 commented Jul 11, 2023

Anyone could check other sanitizer on Linux/MacOS platforms [address, memory, thread, undefined]. There is definitely problem in async Texture loading....

All of the texture loading tests seem to run fine with both thread and undefined behavior sanitizers enabled on Mac OSX (M1 arm64). The problem I did run into though is that I can't do the automated test, since it throws a thread error that seems to be related to the automated test threading code:
image

This error occurs as soon as I click "Start AutoTest".

@rh101
Copy link
Contributor

rh101 commented Jul 11, 2023

If you go into the TextureCache (TextureCacheUnbindTest) tests, and you click the right arrow (next test) button very quickly, then you will definitely get crashes. This may have something to do with the scheduler, which still runs a scheduled call to TextureCache::addImageAsyncCallBack even though that test has been removed, so data may be invalid.

I could reproduce the issue constantly on MacOS, but on Windows 10 I couldn't get it to happen at all, possibly because the images load very quickly, so the callback occurs before the next test is loaded when the right arrow is clicked.

EDIT: These errors are related to the test implementation, and not TextureCache. When a test scene is deleted, it doesn't unbind the callbacks registered with the texture cache for the async image loading. The way I managed to get it working is to simply use this:

auto* cache = Director::getInstance()->getTextureCache();
cache->unbindAllImageAsync(); 

in the destructor for each test.

The only issue with the above is that the test code also needs to be refactored to move the test setup from the constructor to the onEnter() methods. The reason is that a new test scene is constructed before the previous test scene is released, so here is what would happen if the code is left in the constructors:

1 - NewTestScene constructor called, TextureCache::addImageAsync is registered for all images.
2 - CurrentTestScene destructor is called, which calls TextureCache::unbindAllImageAsync, and this unbinds all image async, including the ones created in NewTestScene. This is not the behavior we want.
3 - CurrentTestScene replaced by NewTestScene. Nothing displays since all async calls were unbound.

If we move the code to onEnter, then this is what happens:

1 - NewTestScene constructor called (no code in here)
2 - CurrentTestScene destructor called, which calls TextureCache::unbindAllImageAsync, and this unbinds all image async, belonging only to the CurrentTestScene
3 - CurrentTestScene becomes NewTestScene
4 - CurrentTestScene::onEnter() is called, which sets up the new async image loading
5 - CurrentTestScene correctly displays the test

Come to think of it, the entire test implementation is a little strange, since everything is done in the constructors, which is not ideal. The constructors shouldn't be used to set up the tests, along with the fact that virtual methods are being called from constructors, which end up being resolved statically, causing issues with overriden methods.

Anyhow, with the changes above, the TextureCache test no longer crashes.

@rh101
Copy link
Contributor

rh101 commented Jul 12, 2023

@crazyhappygame If you get a chance to, please try out the changes in #1267 to verify that it fixes the crash you were getting in TextureCacheUnbindTest.

@rh101
Copy link
Contributor

rh101 commented Aug 29, 2023

@crazyhappygame If you have verified that the issues raised have been addressed, then please close this issue.

@halx99 halx99 added this to the HelpDesk milestone Sep 20, 2023
@axmolengine axmolengine locked and limited conversation to collaborators Sep 27, 2023
@halx99 halx99 converted this issue into discussion #1362 Sep 27, 2023

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
Projects
None yet
Development

No branches or pull requests

3 participants