Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Freezing while drawing the initial fractal #177

Closed
cjmayo opened this issue Oct 5, 2020 · 13 comments
Closed

Freezing while drawing the initial fractal #177

cjmayo opened this issue Oct 5, 2020 · 13 comments

Comments

@cjmayo
Copy link
Collaborator

cjmayo commented Oct 5, 2020

47752e0 (#164) does stop segfaults but for me it appears to be often causing the application to freeze soon after start-up when the application is shown but before the initial fractal has completed drawing (often before there is anything visible of the fractal).

Tests pass fine.

This is with Python 3.8.6 and Python 3.8.5. I've only recently noticed, although possibly I just haven't run it much since first applying the patch. If I revert it this issue goes away.

@dragonmux
Copy link
Member

It's possible that in the tests we never have the GIL already but in this normal usage, we do when the cleanup code is run

@mindhells
Copy link
Member

I also had some trouble (deadlocks) with the GIL lock in #175. Actually I had to disable it for the "continuous mode", see https://github.com/fract4d/gnofract4d/pull/175/files#diff-046a50aa52e99568daf19d67df90afa7R136

Couldn't figure out the reason though.

@cjmayo
Copy link
Collaborator Author

cjmayo commented Oct 13, 2020

Would switching over to the new controller code avoid this?

@mindhells
Copy link
Member

Would switching over to the new controller code avoid this?

The new controller interface uses a different approach for the reference counting problem: pfo and site instances are owned by the controller instance, image and colormap are kept until replaced (by a new calculation process) or the instance is disposed.

I was working on switching gnofract4d to this new interface but I couldn't continue due to the lack of time: https://github.com/HyveInnovate/gnofract4d/commit/2fda447db3665f010f4da0ea1f5237fcdd47eeda
A controller instance could be created and kept until you need a new one (basically when you change the formula a new a new pfo to be created).

I'm not sure if this would solve the problem, but I like the idea of using python extension types because they help creating better interfaces in my opinion. In the PR #85 there's more info about this.

@cjmayo
Copy link
Collaborator Author

cjmayo commented Oct 14, 2020

Thanks for the pointers.

I was wondering if the problem was something to do with site owning the thread and being part of the cargs being deleted (but that is an uneducated guess, and it may well be the GIL being held somewhere completely different, there is a lot of GIL locking in PySite though...). The new interface seemed to separate site off.

Looks like it would help with the problems with testfdsite (#173) and worked around in fract4dgui.gtkfractal.Hidden.onData().

@cjmayo
Copy link
Collaborator Author

cjmayo commented Oct 19, 2020

I cloned HyveInnovate@2fda447 and rebased on master (no big deal). Then spent a while trying to understand the test failures (just the new API I think, more later), and finally decided to get on and install it.

My test for the segfault and freezing has simply been to start the program, wait for the initial fractal, close and repeat.
I realise the branch is still WIP but for all the times I have tried it I have had one segfault and one:

pure virtual method called
terminate called without an active exception
Aborted

and that is a lot fewer problems than with master.

There aren't many test failures, only two causes I think:

fract4d.fractal.T.drawpoint():

>       result = fract4dc.pf_calc(
            self.pfunc, self.params[0:4], self.maxiter, 0, 0, 0, repeats)
E       ValueError: Not a valid handle

Although gnofract4d --singlepoint works.

And in fract4d/tests/test_3d.py, multiple failures in Setup():

        self.fw = fract4dc.fw_create(
>           1, self.pfunc, self.cmap, self.im._img, self.f.site)
E       AttributeError: 'T' object has no attribute 'site'

What happens to those tests of fract4d/c/fract4dc/functions.cpp?

@mindhells
Copy link
Member

As I see it, switching Gnofract4d to use the new controller is an ambitious long term task.
When I first created #85 I did it just to show a possible approach on how to improve fract4dc interface, I also provided some code examples to transfer the knowledge I got from doing this.

What happens to those tests of fract4d/c/fract4dc/functions.cpp?

That's exactly the point and the idea behind all of this, and it's all explained in #110. But basically, in my opinion, we should move those tests to C++, because some (actually many of them) fract4dc interfaces exist just to do those tests in python.
For example, fract4dc.fw_create it's used only for the tests not by the actual application.

As a middle-term solution, if we want to use this new controller, we could expose the site entity from inside the controller so the test_3d tests have it. But I understand you are still having segfault problems with this new controller, aren't you? So we would need first to solve that.

@cjmayo
Copy link
Collaborator Author

cjmayo commented Oct 21, 2020

As I see it, switching Gnofract4d to use the new controller is an ambitious long term task.

OK, good to have found out a bit more.

As a middle-term solution, if we want to use this new controller, we could expose the site entity from inside the controller so the test_3d tests have it. But I understand you are still having segfault problems with this new controller, aren't you? So we would need first to solve that.

I'll see if I can reproduce it. (It is a lot less frequent than with the current code, but yes it would be preferable to find - could even be something unrelated from GTK...)

For me current master is the worst of all, freezing is the most frequent problem. No easy options.

@mindhells
Copy link
Member

Might have something to do with #154 as well (I was also suspicious about GKT)

I can reproduce the problem in my environment (python 3.8.6 - MacOS 10.15.7)
I'm trying to debug this with Xcode but, since it seems to be a race condition problem, haven't had any luck so far
If you have the same environment, and want to try, check this commit

I'll try also to check the sanitizers as I did here

I'll keep this posted if I find anything

@mindhells
Copy link
Member

Another interesting thing:

  • Gtkfractal->draw_image is called 4 times (2 for the actual image and 2 for the preview) when gnofract4d starts

That is something that is annoying me a lot while working on #175

Of course this is not the cause of the problem, but I think it's worth to check if it's making it worse

@cjmayo
Copy link
Collaborator Author

cjmayo commented Oct 22, 2020

Trying to find something in the Python I got curious about:

gnofract4d/gnofract4d

Lines 66 to 70 in 0930ad1

if mainWindow.f.thaw():
gi.require_version('GLib', '2.0')
from gi.repository import GLib
GLib.idle_add(mainWindow.on_fractal_change)

And in investigating that saw MainWindow.on_fractal_change() being called several times on startup, which is calling self.f.draw_image().

It's not the root cause but it does seem to be what is exposing the problem at startup. #182 appears to fix this issue for me, so far... (I still like the controller interface though).

@mindhells
Copy link
Member

I think I might have found the actual problem and, hopefully, the solution (PR coming soon).

With the information already provided in this issue I think it's pretty clear we have a deadlock here:

PyGILState_STATE gstate = PyGILState_Ensure();
delete args;
PyGILState_Release(gstate);

Basically what we are trying to do there is to acquire the interpreter lock before deleting the cargs object. That's ok.
The problem is we are trying to do that while at the same time holding the interpreter lock here


That's because pycalc is executed in the interpreter thread.
So when 2 calls to fract4dc.calc happen in a very short time interval we have them waiting for each other to release the interpreter lock and thus a deadlock.

Honestly I'm not 100% sure how GIL works and if this is really the problem but, following my guessing I came up with this solution that is working for me:

PyObject * pycalc([[maybe_unused]] PyObject *self, PyObject *args, PyObject *kwds)
    {
        calc_args *cargs = parse_calc_args(args, kwds);

        if (NULL == cargs)
        {
            return NULL;
        }

        if (cargs->asynchronous)
        {

            std::thread t([cargs](){
                auto &site = *cargs->site;
                site.interrupt();
                site.wait();
                site.start();
                site.set_thread(std::thread(calculation_thread, cargs));
            });
            t.detach();
        }
        else
        
      ...

        Py_INCREF(Py_None);

        return Py_None;
    }

Basically here I'm releasing the interpreter lock immediately instead of waiting for site->wait when fract4dc.calc is called, so the other thread can acquire it.

what do you think?

@cjmayo
Copy link
Collaborator Author

cjmayo commented Oct 27, 2020

Moving the fractal around by using the cursor keys (a lot) is another way I can get master to freeze - this change stops that happening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants