Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

newlib: add thread safe implementation #8619

Closed
wants to merge 4 commits into from

Conversation

vincent-d
Copy link
Member

@vincent-d vincent-d commented Feb 22, 2018

Contribution description

Inspired by #4529, but using rmutex, this implements __malloc_lock/__malloc_unlock and __env_lock/__env_unlock syscalls to make newlib trhead-safe.

I guess it should be improved to check whether _REENT_SMALL is defined, and maybe disabled by default...

Issues/PRs references

#4529, fixes #4488

@miri64
Copy link
Member

miri64 commented Feb 22, 2018

The main problem with #4529 sticks however (for later reference 78c4080 was the initial commit in this PR):

# 78c4080~1
   text	   data	    bss	    dec	    hex	filename
   3280	     36	   2732	   6048	   17a0	tests/minimal/bin/samr21-xpro/tests_minimal.elf
# 78c4080
   text	   data	    bss	    dec	    hex	filename
   3644	    136	   2988	   6768	   1a70	tests/minimal/bin/samr21-xpro/tests_minimal.elf

And that's just the minimal application (i.e. main+idle threads).

@jnohlgard
Copy link
Member

How about adding a thread flag that controls allocation of a thread local reent struct?
The allocation can be taken from the stack space in the same way as the tcb struct is allocated from the end of the stack space.
If the flag is not set, then impure_ptr will be directed to the global reent struct instead of the thread local one.

core/include/thread.h Outdated Show resolved Hide resolved
@kaspar030
Copy link
Contributor

Only +85 lines, nice!

@kaspar030
Copy link
Contributor

The main problem with #4529 sticks however

Yeah, this seems expensive. OTOH, currently, it is just plain luck that we're not seeing more problems due to not thread-safe newllib calls...

@vincent-d
Copy link
Member Author

The main problem with #4529 sticks however

This can not really be solved I guess... A struct _reent per thread is needed for newlib to work safe. And this struct is about 100 bytes wide, so the thread stack sizes need to be expanded by the same amount of bytes.
I can't really explain the flash size difference, though. But I guess more code is linked within newlib. Which I guess is the price to have thread safety...

@vincent-d
Copy link
Member Author

Fixed typo.

Just to add another argument: this code could still be disabled by default if there is no consensus, it's simply a module (MODULE_NEWLIB_THREAD_SAFE).

@vincent-d
Copy link
Member Author

Re-added (and hopefully improved) the Makefile.include tests from #4529 to check if _REENT_SMALL is defined and <sys/reent.h> exists.

@vincent-d vincent-d added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Feb 27, 2018
@jnohlgard
Copy link
Member

See OTAkeys#10 for an example of per-thread control of reent struct allocation

@vincent-d
Copy link
Member Author

How do we want to proceed here?
Shall we merge OTAkeys#10 ? I'd rather start with this one as-is (maybe disable the module by default, though) and then keep exploring @gebart lead.

@kaspar030 @miri64 what is your feeling here?

@tcschmidt
Copy link
Member

ping @kaspar030 @miri64

@jcarrano
Copy link
Contributor

@vincent-d Once this gets merged, should I modify the tlsf malloc to use __malloc_lock, __malloc_unlock, etc functions?
@vincent-d Checking if the correct defines are defined is something that should definitelky be done. Right now on my machine, for example, the incorrect header (the one for non-nano newlib) is being used and no error is being raised.
@vincent-d @miri64 Looking at the definition of struct _reent, it seems that a lot of space is taken up by variables used by wide character routines (see struct _misc_reent). The only way to fix that is to mess with newlib itself.

@vincent-d
Copy link
Member Author

should I modify the tlsf malloc to use __malloc_lock, __malloc_unlock, etc functions?

I don't think so, __malloc_lock and __malloc_unlock are internal to newlib allocator I think. So I guess tlsf should use it's own locking mechanism.

Right now on my machine, for example, the incorrect header (the one for non-nano newlib) is being used and no error is being raised.

I thought the checks in sys/newlib_thread_safe/Makefile.include would be enough. Could you elaborate a bit on this?

@jcarrano
Copy link
Contributor

jcarrano commented May 30, 2018

I don't think so, __malloc_lock and __malloc_unlock are internal to newlib allocator I think. So I guess tlsf should use it's own locking mechanism.

So if the entire malloc/free/etc family of functions is replaced, then I don't have to worry about newlib locks?

I thought the checks in sys/newlib_thread_safe/Makefile.include would be enough. Could you elaborate a bit on this?

Yes, they are good! What is not good is what is currently in the tree (it does not perform enough checks).

@vincent-d
Copy link
Member Author

Coming back on this, I really think we should have this feature.

So if the entire malloc/free/etc family of functions is replaced, then I don't have to worry about newlib locks?

They are also used internally from printf/scanf functions if I understand correctly.

@jcarrano
Copy link
Contributor

@vincent-d One thing I don't fully understand: right now, is it safe to call memory allocation functions from more than one thread? If the answer is no, then this feature is not optional, it's required.

@vincent-d
Copy link
Member Author

right now, is it safe to call memory allocation functions from more than one thread?

I believe it's not. One can easily reproduce printing issue with multiple threads, and I didn't need a lot pf research to find this kind of articles: http://www.nadler.com/embedded/newlibAndFreeRTOS.html among others.

@vincent-d
Copy link
Member Author

Can we try to reach a consensus here?

@kaspar030 @gebart @haukepetersen is there any reason to block this as-is?

Btw, rebased!

@jcarrano
Copy link
Contributor

@vincent-d I'd like this to be merged. Even more, I'd want it to be the default (but that could be done in another PR.)

@stale
Copy link

stale bot commented Aug 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Aug 10, 2019
@vincent-d vincent-d force-pushed the pr/newlib_thread_safe branch from 2a31cbf to 8b43d5b Compare August 21, 2019 08:56
@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Aug 21, 2019
@vincent-d
Copy link
Member Author

Rebased.

@jcarrano
Copy link
Contributor

@vincent-d Have you come across any platform where newlib-nano does not have reent support? Maybe we can devise some script that iterates through all platforms and checks this. Then we can say "only reent newlib" is supported and remove the ifdefs.

If we don't guarantee reentrancy then its value is greatly diminished as we cannot write portable code that relies on this.

@gschorcht
Copy link
Contributor

gschorcht commented Dec 31, 2019

@vincent-d I'm sorry to join the discussion so late. I didn't know about this PR before.

__env_lock and __malloc_lock are only two lock functions that would make newlib calls thread safe. There are a number of other newlib calls that use lock functions like __fp_lock, __fp_lock_all, __sfp_lock_acquire, __sinit_lock_acquire, __tz_lock.

For some platforms, e.g.. Cortex-M, RISC-V or ESP, all these lock functions are simply mapped by the newlib to the functions _lock_acquire and _lock_acquire_recursive which are dummy functions by default. But, they are either retargetable (Cortex-M, RISC-V) or declared as weak symbols (ESP) so that they can be overwritten by functions with real functionality. These _lock_acquire* functions are also used by a number of other newlib system functions, for example by the puts_r function.

[UPDATE] It seems that RIOT uses the newlib-nano for Cortex-M and RISC-V for which the statements above aren't true 😎

So IMHO, it would be a more general approach for platforms that use the newlib instead of the newlib-nano to implement the lock_acquire* functions instead of implementing all separate lock functions like __env_lock or __malloc_lock. Implementing _lock_acquire* would then cover any kind of locking for the newlib.

For the ESP platforms, I had already implemented these _lock_acquire* functions based on mutex or rmutex, see commit. Unfortunately, these _lock_acquire* functions don't work for test applications that call puts inside ISRs, see issue #12732. These tests would need to be changed so that they do not call puts directly from an ISR.

@stale
Copy link

stale bot commented Jul 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Jul 6, 2020
@miri64 miri64 removed the State: stale State: The issue / PR has no activity for >185 days label Jul 6, 2020
@stale
Copy link

stale bot commented Jan 9, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

@stale stale bot added the State: stale State: The issue / PR has no activity for >185 days label Jan 9, 2021
@tcschmidt
Copy link
Member

@kaspar030 @miri64 @gschorcht - any path to progress this?

@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Jan 10, 2021
@benpicco
Copy link
Contributor

Should be superseded by #15606

@benpicco benpicco closed this Jan 10, 2021
@fjmolinas
Copy link
Contributor

@maribu should this one have been closed with #15606? In the PR description you seemed to have mentioned that both were good options with different tradeoffs.

@maribu
Copy link
Member

maribu commented Mar 12, 2021

@maribu should this one have been closed with #15606? In the PR description you seemed to have mentioned that both were good options with different tradeoffs.

#15606 is of limited use, but also of limited cost. This might still be useful for some cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Making the newlib thread-safe
10 participants