Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix json buffer guard #3743

Closed
wants to merge 1 commit into from
Closed

Conversation

willmmiles
Copy link
Member

GlobalBufferAsyncJsonResponse was still trying to init AsyncJsonResponse with the shared document, even when it failed to acquire the lock. This was still corrupting memory and causing crashes.

Possibly fixes #3690, #3685, and other 0.14.1 crash issues in any context where parallel requests are being made.

NOTE: The code handles a locked buffer by returing a "try again later" HTTP return code. This asks clients to delay by >=1 second, and can severly affect the apparent interactivity when collisions occur. I am working on lifting this restriction, but the tradeoff is extra heap usage, so it won't be safe to use without a web server queue.

GlobalBufferAsyncJsonResponse was still trying to init AsyncJsonResponse
with the shared document, even when it failed to acquire the lock.
This was still corrupting memory and causing crashes.

Possibly fixes wled#3690, wled#3685, and other 0.14.1 issues.
@blazoncek
Copy link
Collaborator

I'm trying to understand what's happening here so help me out, please.

GlobalBufferAsyncJsonResponse was still trying to init AsyncJsonResponse with the shared document, even when it failed to acquire the lock. This was still corrupting memory and causing crashes.

Why is this a problem or why and where did the corruptions happen? If lock is not acquired then AsyncJsonBuffer is immediately released (since owns_lock() returns false). Isn't that so? Perhaps a swap of delete and serveJsonError() needed? serveJsonError() will allocate new dynamic JSON buffer for its needs. There's also the fact that AsyncJsonBuffer in guard wrapper reuses global/static JSON buffer so no other corruption than in-memory JSON document can occur.

NOTE: The code handles a locked buffer by returing a "try again later" HTTP return code. This asks clients to delay by >=1 second, and can severly affect the apparent interactivity when collisions occur.

I think this is the intended behaviour as to release pressure on ESP.

@willmmiles
Copy link
Member Author

Why is this a problem or why and where did the corruptions happen? If lock is not acquired then AsyncJsonBuffer is immediately released (since owns_lock() returns false). Isn't that so? P

The problem was that the constructor of AsyncJsonResponse would still proceed even if the lock could not be acquired -- calling createNestedArray() or createNestedObject() on the global document and corrupting the contents.

I think this is the intended behaviour as to release pressure on ESP.

It is, but it's got a severe performance impact (a minimum of a 1 second delay!!). It's particularly problematic when there are burst requests as some integrations seem to do -- while they ultimately succeed, it happens very slowly. Better than crashing, but it still feels like a big regression from the old behavior. Still, this was the "cheaply available" fix without deeper surgery.

I do have a better fix in progress, but it depends on updates to the web server code to be even remotely memory efficient. WIP tests are pretty sweet though - give willmmiles/WLED:wm-dev a try if you want to see it all working together.

@blazoncek
Copy link
Collaborator

Thanks.

What about:

inline GlobalBufferAsyncJsonResponse(bool isArray) : JSONBufferGuard(17) {
  if (owns_lock()) AsyncJsonResponse(pDoc, isArray);
};

I see no apparent ill effect from that. (with my limited C++ knowledge)

@willmmiles
Copy link
Member Author

It's not possible to conditionally execute a base class constructor in C++. The code you've posted there is legal C++, but it won't do what you intend -- the line you're using to try to call the constructor is creating a new AsyncJsonResponse on the stack, then immediately destructing it.

https://godbolt.org/z/MeP31esvf for a demo.

@blazoncek
Copy link
Collaborator

Thanks for the explanation, but still. If default base class is constructed on stack then it is the one without parameters. Isn't that so?
If so, then default AsyncJsonResponse() is created which uses dynamic buffer.

AsyncJsonResponse(size_t maxJsonBufferSize = DYNAMIC_JSON_DOCUMENT_SIZE, bool isArray=false)

That, unfortunately, will try to allocate 16k dynamic buffer.

@willmmiles
Copy link
Member Author

You are correct! That's why the patch presented here is to error out before construction of the AsyncJsonResponse if the lock can't be acquired. (If the lock is sucessfully acquired, it's move-constructed in to the GlobalBufferAsyncJsonResponse instance.)

C++ doesn't permit 'allocated but not constructed' as an expressible state for any non-POD type. There isn't a way to prevent calling /some/ constructor except by not instantiating the object at all. For a class, by the time you are executing your constructor body, you are guaranteed that /some/ constructor has run for all base classes and member variables. In the case of AsyncJsonResponse, neither constructor is usable, so we don't have any choice except to defer instantiation until after the lock is acquired. (Or, well, give up on AsyncJsonResponse altogether ... but that's a whole 'nother can of worms.)

@blazoncek
Copy link
Collaborator

Great! Thank you, I learned something new.

blazoncek added a commit that referenced this pull request Feb 7, 2024
blazoncek added a commit that referenced this pull request Feb 11, 2024
softhack007 pushed a commit to MoonModules/WLED-MM that referenced this pull request Feb 12, 2024
@willmmiles
Copy link
Member Author

Superseded by 4f42a17 and 6dcd959.

@willmmiles willmmiles closed this Feb 15, 2024
@willmmiles willmmiles deleted the global-json-fix-v2 branch March 7, 2024 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

14.1 can't connect to router on 14.1 on ESP8266 Wemos D1 mini
2 participants