Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suspicion: QMS crashes if BRouter RD5 file is missing #624

Open
wthaem opened this issue Jul 29, 2023 · 27 comments
Open

Suspicion: QMS crashes if BRouter RD5 file is missing #624

wthaem opened this issue Jul 29, 2023 · 27 comments
Labels
bug Something isn't working

Comments

@wthaem
Copy link
Contributor

wthaem commented Jul 29, 2023

Describe the bug

Suspicion: QMS crashes when trying to add a new track with BRouter on-the-fly routing with endpoint not in a routing database file (rd5 file).

What have you done to circle down the problem?

See next point

To Reproduce

  1. Remove all BRouter rd5 files
  2. Start QMS
  3. Activate BRouter offline routing with on-the-fly routing
  4. Right click in map window and select Add track from context menu
  5. Move mouse to next routing point
  6. QMS crashes

Expected behavior

No crash

Screenshots

None

Attachments

N.A.

Tracebacks

None

Desktop

  • OS: Windows 10, BRouter 1.7.2
  • QMapShack Version: 1.17.0
  • Window Manager:

Additional context

This crash is different from the one caused by rapid mouse move while selecting a routing point located in an existing rd5 file. The likelihood of that one can be decreased by reducing the mouse speed in the OS mouse settings.

@wthaem wthaem added the bug Something isn't working label Jul 29, 2023
@mitxel-m
Copy link
Contributor

I can not reproduce it.

Without .rd5 files:
when I click ( or move) to add the next routing I get an error, but QMS does not crash.

With .rd5 files:
I get the same error when I try to add a routing point outside the area covered by the .rd5 files, but no crash.

error:
Mala respuesta del servidor: Error transferring http://127.0.0.1:17777/brouter?lonlats=-5.954731,36.291186%7C-5.934819,36.286689&profile=car-eco&alternativeidx=0&format=gpx - server replied: Bad Request

I am testing on:

  • QMapShack: 1.17.0
  • OS: Linux Mint 21 ( ubuntu 22.04)
  • Brouter: I have tried both 1.6.1 and latest 1.7.2

@wthaem
Copy link
Contributor Author

wthaem commented Jul 30, 2023

@mitxel-m: Thanks for testing.

In this case, one more of these old and unidentified mouse-related crashes happening only on Windows and making the use of BRouter very difficult.

@wthaem
Copy link
Contributor Author

wthaem commented Jul 30, 2023

Just one piece of information more: the crash happens only if on-the-fly routing is selected.

@kiozen
Copy link
Collaborator

kiozen commented Jul 31, 2023

Tried it myself on Linux. No crash. The error message gets displayed as expected. That's all

@wthaem
Copy link
Contributor Author

wthaem commented Jul 31, 2023

I run an AutoIt script with the help of which I can reproduce the crash.

Here are the steps (a sleep after each step is included but not shown here, routing data is available):

MouseMove(840, 615)   ; goto first routing point
MouseClick($MOUSE_CLICK_RIGHT, 840, 615) ; open context menu
Send("{DOWN}")
Send("{DOWN}")
Send("{DOWN}")              ; goto "add route"
Send("{ENTER}")               ; start adding a route  
MouseMove(1040, 615)   ; move mouse to next point
MouseClick($MOUSE_CLICK_LEFT, 1040, 615) ; clash after left click - can be reproduced

Using the mouse or the touchpad: same crash.

@kiozen
Copy link
Collaborator

kiozen commented Jul 31, 2023

Any chance to run it in MS's debugger to get a backtrace?

@wthaem
Copy link
Contributor Author

wthaem commented Jul 31, 2023

Maybe, this helps to understand the stable crash:

I took the parameters used when starting java from within QMS, called with them java from a commandline and got:

d:\GPS\BRouter>"C:\Program Files\Common Files\Oracle\Java\javapath\java.exe" -Xmx128M -Xms128M -Xmn8M -DmaxRunningTime=300 -cp brouter-1.7.2-all.jar btools.server.RouteServer segments4 profiles2 customprofiles 17777 1 127.0.0.1
BRouter 1.7.2 / 19072023
Exception in thread "main" java.net.BindException: Address already in use: bind
        at java.base/sun.nio.ch.Net.bind0(Native Method)
        at java.base/sun.nio.ch.Net.bind(Net.java:556)
        at java.base/sun.nio.ch.Net.bind(Net.java:545)
        at java.base/sun.nio.ch.NioSocketImpl.bind(NioSocketImpl.java:629)
        at java.base/java.net.ServerSocket.bind(ServerSocket.java:393)
        at java.base/java.net.ServerSocket.<init>(ServerSocket.java:275)
        at btools.server.RouteServer.main(RouteServer.java:302)

Java version:

D:\Scripts>"C:\Program Files\Common Files\Oracle\Java\javapath\java.exe" -version
java version "20" 2023-03-21
Java(TM) SE Runtime Environment (build 20+36-2344)
Java HotSpot(TM) 64-Bit Server VM (build 20+36-2344, mixed mode, sharing)

@kiozen: Backtrace possible, but not helpful: crash most likely happens not in QMS but in BRouter environment. Compare backtrace in issue #388 and lenghty discussion there.

@kiozen
Copy link
Collaborator

kiozen commented Jul 31, 2023

If that is the same issue and the backtrace from #388 is correct I would say the exception is not caught as expected. It should be handled in the catch block. On the other hand side in the catch block another exception is thrown. I couldn't see where it is caught.

To circle down the problem:

  • make sure the catch block is entered. (breakpoint or debug message)
  • disable the 2nd throw to check if that avoids crashing.

@wthaem
Copy link
Contributor Author

wthaem commented Jul 31, 2023

Some more debuging info from CRouterBRouter.cpp:

line 328: the else if (netErr != QNetworkReply::NoError) is the branch used with the following data:

netErr: QNetworkReply::ProtocolInvalidOperationError
reply->errorString(): "Error transferring http://127.0.0.1:17777/brouter?lonlats=12.494491,52.106414%7C13.211337,52.106414&profile=car-eco&alternativeidx=0&format=gpx - server replied: Bad Request"

The throw reply->errorString() in the next line 329 is caught in the catch branch (line 383) and can be followed up to line 388:

throw tr("Bad response from server: %1").arg(msg)

No idea where this exception is caught.

@wthaem
Copy link
Contributor Author

wthaem commented Aug 1, 2023

More debugging info:

  • CLineOpAddPoint::leftClick is entered
  • CLineOpAddPoint::leftClick is properly finished with canvas->slotTriggerCompleteUpdate(CCanvas::eRedrawMouse);

Wherefrom is this method called?

@wthaem
Copy link
Contributor Author

wthaem commented Aug 4, 2023

Debugging with VisualStudio:

  • ILineOp::slotTimeoutRouting() is called and ends normally
  • next step goes to Qt5Core.Dll and here it crashes

@kiozen
Copy link
Collaborator

kiozen commented Aug 9, 2023

Can you test the fix in the PR. The fix is handling the exception properly

@wthaem
Copy link
Contributor Author

wthaem commented Aug 9, 2023

A crash is still there but I don't understand yet what happens when trying to debug.

@wthaem
Copy link
Contributor Author

wthaem commented Aug 10, 2023

Call stack for crash:

 	Qt5Core.dll!00007fffceeb69ee()	Unbekannt
>	qmapshack.exe!CRouterBRouter::synchronousRequest(const QVector<QPointF> & points, const QList<IGisItem *> & nogos, QPolygonF & coords, double * costs) Zeile 399	C++
 	qmapshack.exe!CRouterBRouter::calcRoute(const QPointF & p1, const QPointF & p2, QPolygonF & coords, double * costs) Zeile 295	C++
 	qmapshack.exe!ILineOp::tryRouting(IGisLine::point_t & pt1, IGisLine::point_t & pt2) Zeile 322	C++
 	qmapshack.exe!ILineOp::finalizeOperation(int idx) Zeile 346	C++
 	qmapshack.exe!ILineOp::slotTimeoutRouting() Zeile 206	C++
 	[Externer Code]	
 	qmapshack.exe!main(int argc, char * * argv) Zeile 75	C++
 	[Externer Code]	

(On-the-fly routing, add new track, start and end points not on a road, crash when moving to endpoint, not yet clicked at endpoint)

@wthaem
Copy link
Contributor Author

wthaem commented Aug 10, 2023

@kiozen, @ntruchsess: More info:

Just before crashing the catch branch in CRouterBRouter::synchronousRequest starting at line 383 is called and here coords is cleared (line 384). Is this compatible with line 399 return coords.size();?

@kiozen
Copy link
Collaborator

kiozen commented Aug 10, 2023

There is an additional mutex unlock that might cause problems. Please update

git fetch
git reset --hard origin/QMS-624 

@ntruchsess
Copy link
Contributor

ntruchsess commented Aug 10, 2023

I don't think this 'additional' mutex unlock can cause issues as both calls to unlock are actually not 'additional' but mutually exclusive. Either the mutex is unlocked within the catch-block (then the method is left throwing the error-message) or the it is unlocked below the catch-block. But it is never unlocked twice.

If there are any doubts regarding the unlock of mutex I suggest to use QMutexUnlocker: https://doc.qt.io/qt-6/qmutexlocker.html

@ntruchsess
Copy link
Contributor

I don't think this 'additional' mutex unlock can cause issues as both calls to unlock are actually not 'additional' but mutually exclusive. Either the mutex is unlocked within the catch-block (then the method is left throwing the error-message) or the it is unlocked below the catch-block. But it is never unlocked twice.

If there are any doubts regarding the unlock of mutex I suggest to use QMutexUnlocker: https://doc.qt.io/qt-6/qmutexlocker.html

I just tried to go save and use qmutexlocker but it causes issues with responsitiveness as it blocks the UI-thread.

@wthaem
Copy link
Contributor Author

wthaem commented Aug 10, 2023

@kiozen, @ntruchsess: Updated QMS-624. Crash after described actions in QMS GUI but different call stack:

 	0000021b4ce9ca00()	Unbekannt
 	Qt5Core.dll!00007ff809082efb()	Unbekannt
 	Qt5Network.dll!00007ff810299696()	Unbekannt
 	Qt5Core.dll!00007ff80908950b()	Unbekannt
 	Qt5Widgets.dll!00007ff809b34990()	Unbekannt
 	Qt5Widgets.dll!00007ff809b33a13()	Unbekannt
 	Qt5Core.dll!00007ff809062aca()	Unbekannt
 	Qt5Core.dll!00007ff809064845()	Unbekannt
 	qwindows.dll!00007ff80c012dff()	Unbekannt
 	Qt5Core.dll!00007ff8090aba5a()	Unbekannt
 	qwindows.dll!00007ff80c012dd9()	Unbekannt
 	Qt5Core.dll!00007ff80905ef2c()	Unbekannt
 	Qt5Core.dll!00007ff809061a94()	Unbekannt
>	qmapshack.exe!main(int argc, char * * argv) Zeile 75	C++
 	[Externer Code]	

@kiozen
Copy link
Collaborator

kiozen commented Aug 11, 2023

This is getting more and more weird. Are you sure you have all libraries compiled/installed properly? It's always a problem to compile an app with version X of the library but at runtime it uses version Y.

Maybe we are hunting ghosts here.

@ntruchsess
Copy link
Contributor

@kiozen, @ntruchsess: More info:

Just before crashing the catch branch in CRouterBRouter::synchronousRequest starting at line 383 is called and here coords is cleared (line 384). Is this compatible with line 399 return coords.size();?

it is compatible. clear() removes all elements. size() then returns 0. I just noticed that it would be better to not return 0 in this case (0 is interpreted as 'valid response but no routing-points determined) but -1 (which means 'no reponse'). While this is completely unrelated to the crash you are experiencing it would result in slightly (if not negligible) less cpu-usage processing the routing result.

@wthaem
Copy link
Contributor Author

wthaem commented Aug 11, 2023

Are you sure you have all libraries compiled/installed properly? It's always a problem to compile an app with version X of the library but at runtime it uses version Y.

Is there a way to check this?

@kiozen
Copy link
Collaborator

kiozen commented Aug 14, 2023

Is there a way to check this?

Not sure if there is a better way in Windows. But you should make sure there is no Qt path in the $PATH environment variable. Or at least no QT path that you do not intend to use.

@ntruchsess
Copy link
Contributor

ntruchsess commented Aug 14, 2023

@kiozen I identified a potential race on accessing the field 'synchronous'. As delivery of signals is asynchronous slotRequestFinished might run after the field 'synchronous' has been set to false within the synchronous request processing. If this happens the asynchronous logic would run unlocking the mutex a second time.
See: #633

let's see whether this makes a difference.

@ntruchsess
Copy link
Contributor

ntruchsess commented Aug 15, 2023

ok, as it is still crashing I suppose the use of QMutex on Windows just doesn't work (or may 'just doesn't work on @wthaem 's machine').
As far as I remember I used the mutex to protect the field 'synchronous' (being accessed from within the local event-loop). Now that I have replaced the field by a reply-property (being delivered via qt signal) I decided to remove the mutex entirely: e0399d3

@wthaem please give it a try by re-checking out #633

@kiozen
Copy link
Collaborator

kiozen commented Aug 16, 2023

QMutex should work as expected on any platform. This is Qt.

@ntruchsess
Copy link
Contributor

ntruchsess commented Aug 16, 2023

QMutex should work as expected on any platform. This is Qt.

I agree, it should work the same on Windows (and I' pretty sure it's not faulty as Canvas+IDrawContext does use it and wouldn't work at all if locking/unlocking were prone to crash).
But it's likely the timing in respect to visibility of unsynchronized writes is not the same on all platforms and one might encounter a race on one platform or CPU while the same code runs totally fine on a different platform or CPU.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants