Support for freestanding C++ #189

Open
nicolasnoble opened this issue Oct 23, 2024 · 6 comments

@nicolasnoble

I managed to get snitch to run properly on a freestanding C++ environment, namely the Sony PlayStation 1:

image

I encountered a few very minor issues, however, and I'm reporting them here to see if we can come up with a working set of configuration settings for this. Ideally, there would be a single SNITCH_FREESTANDING toggle.

  • Freestanding means no <cstring>, so no std::memmove and no std::memcpy. These could easily be defined away so the user can replace them at compilation time with macros indicating the replacement of their choice, such as __builtin_memmove and __builtin_memcpy when using gcc. These occur in snitch_append.cpp and snitch_string_utility.cpp.
  • Freestanding means no <cstdio>, so no snitch_console.cpp nor snitch_file.cpp. My opinion for these is to simply disable most of the implementation details in freestanding mode, and let the user provide alternative implementations for them, in the form of simple C-style externs, such as void snitch_freestanding_write_console(const char * data, size_t len), void * snitch_freestanding_open_for_write(const char * fname), void snitch_freestanding_write(void *, const char * data, size_t len), and void snitch_freestanding_close(void *), to simplify interoperability.
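As a rough illustration of the C-style extern idea, here is a minimal sketch for the console hook only. The hook name comes from the proposal above; the in-memory buffer, the `example` namespace and `example_usage` are made up for the illustration, and nothing here exists in snitch today.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hook, named after the proposal above: the library would only
// declare it, and the user of a freestanding build would define it.
extern "C" void snitch_freestanding_write_console(const char* data, std::size_t len);

// Example user-side definition. A fixed buffer stands in for a UART or debug
// TTY; characters that do not fit are silently dropped.
namespace example {
constexpr std::size_t capacity         = 64;
char                  buffer[capacity] = {};
std::size_t           used             = 0;
} // namespace example

extern "C" void snitch_freestanding_write_console(const char* data, std::size_t len) {
    for (std::size_t i = 0; i < len && example::used < example::capacity; ++i) {
        example::buffer[example::used++] = data[i];
    }
}

// Usage check: two writes end up back to back in the buffer.
void example_usage() {
    snitch_freestanding_write_console("ab", 2);
    snitch_freestanding_write_console("c", 1);
    assert(example::used == 3);
    assert(example::buffer[2] == 'c');
}
```

With this shape, forgetting to define the hook shows up as a link error rather than as missing output at run time.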

I can issue a pull request to address these, if that's an acceptable design.

@cschreib
Member

I'm currently traveling so my response will be brief, but I just wanted to say thank you for sharing this. This is cool!

I have no experience with freestanding environments, but what you propose makes sense in spirit.

For memcpy/memmove, could an alternative be to simply handwrite the copy algorithm if SNITCH_FREESTANDING (to be added) is set to 1? I assume the compiler can detect the pattern and replace it with the appropriate builtin (with optimisations enabled...). Would need testing. That would require less setup on the user side. Alternatively, is <algorithm> available?
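A hand-written copy could look something like the sketch below. `copy_bytes` is a hypothetical name, not an existing snitch function, and it only covers the memcpy-like case where the ranges do not overlap. Whether the optimiser turns the loop back into a builtin depends on the compiler and flags, and is not verified here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of snitch): byte-wise forward copy with
// memcpy-like semantics. The two ranges must not overlap.
constexpr void copy_bytes(char* dest, const char* src, std::size_t count) noexcept {
    for (std::size_t i = 0; i < count; ++i) {
        dest[i] = src[i];
    }
}

// Usage check: copy a string literal, terminator included.
void example_usage() {
    const char source[]             = "snitch";
    char       target[sizeof(source)] = {};
    copy_bytes(target, source, sizeof(source));
    assert(target[0] == 's');
    assert(target[5] == 'h');
    assert(target[6] == '\0');
}
```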

For <cstdio>, your proposal makes sense. Though, the mechanism we have been using so far is to use function_ref (essentially a function pointer that can't be null and also works with member functions) rather than an extern declaration. This is already done to customise the function to use for console output, in particular. Could be done for file output too.

Though in the freestanding case there is no suitable default, as far as I understand, so we'd have to initialise these refs to "unimplemented" functions (which terminate on call), and that is not very nice.

We could instead initialise the function refs to undefined extern declarations, which comes back to your proposal but with the added ability to also use member functions if needed. The downside is then that the functions must be defined, even when not used, which is also not very nice.

I'm reluctant to introduce macros other than 0/1 and numerical values, to keep things simple.

@cschreib cschreib added the enhancement New feature or request label Oct 24, 2024
@cschreib cschreib added this to the v1.x milestone Oct 24, 2024
@nicolasnoble
Author

> I'm currently traveling so my response will be brief, but I just wanted to say thank you for sharing this. This is cool!

Oh, please do enjoy your travel, I'm in no rush at all. Thank you for taking the time to reply and your detailed feedback :)

> I have no experience with freestanding environments, but what you propose makes sense in spirit.

If I'm being perfectly honest, I went for the fastest evaluation path, in a fail-fast manner, and I didn't spend much time looking at the implementation details to work out what the best solution could be. It turns out your library is exceedingly viable for freestanding environments with no FPU, RTC, or filesystem, and while a basic demo test creates a 385kB binary which, for a machine where you have a bit less than 2MB available, is a lot, it's still acceptable for only a test.

> For memcpy/memmove, could an alternative be to simply handwrite the copy algorithm if SNITCH_FREESTANDING (to be added) is set to 1? I assume the compiler can detect the pattern and replace it with the appropriate builtin (with optimisations enabled...). Would need testing. That would require less setup on the user side. Alternatively, is <algorithm> available?

<algorithm> is totally available, yes, and so is std::copy, which I guess is what you had in mind. I guess the main difference from std::memcpy and std::memmove is that std::copy can't be used to sidestep the UB caused by data conversion and reinterpretation, but while I only glanced at your code, it seems you're not relying on that at all, so it could be a perfectly viable change. I haven't looked deeply into why you need std::memmove, but the copy direction is only a small implementation detail on top of std::copy itself.
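To make the "copy direction" remark concrete, here is a minimal sketch of a memmove-like helper built only on <algorithm>. `move_bytes` is a hypothetical name, not something snitch provides. It relies on the documented preconditions: std::copy is valid when the destination starts before the source, std::copy_backward when it starts after it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical helper (not part of snitch): memmove-like copy that picks the
// copy direction so that overlapping ranges are handled correctly.
inline void move_bytes(char* dest, const char* src, std::size_t count) noexcept {
    // std::less gives a well-defined ordering even for unrelated pointers.
    if (std::less<const char*>{}(dest, src)) {
        std::copy(src, src + count, dest);
    } else if (dest != src) {
        std::copy_backward(src, src + count, dest + count);
    }
}

// Usage check: shift overlapping data right, then left.
void example_usage() {
    char text[] = "abcdef";
    move_bytes(text + 2, text, 4);
    assert(std::equal(text, text + 6, "ababcd"));
    move_bytes(text, text + 2, 4);
    assert(std::equal(text, text + 6, "abcdcd"));
}
```

Note that a hosted standard library may still implement these algorithms in terms of memmove for trivially copyable types, so this removes the <cstring> include but not necessarily the symbol.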

> For <cstdio>, your proposal makes sense. Though, the mechanism we have been using so far is to use function_ref (essentially a function pointer that can't be null and also works with member functions) rather than an extern declaration. This is already done to customise the function to use for console output, in particular. Could be done for file output too.
>
> Though in the freestanding case there is no suitable default, as far as I understand, so we'd have to initialise these refs to "unimplemented" functions (which terminate on call), and that is not very nice.

I see it now, yes. This works for me. I just haven't seen how it works just yet, but that's pretty much irrelevant: it's there, and it's usable. If you don't like the idea of having a default which crashes, these could also just be no-ops. They'd just... do nothing. I'm a bit more partial to an std::terminate myself, as displaying nothing feels like a more frustrating debugging experience for the user, in case they forgot to set the console output.
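A minimal sketch of the "default that terminates" option follows. It uses a plain function pointer as a stand-in for snitch's function_ref, and the names `print_function`, `unimplemented_print`, `console_print` and `counting_print` are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string_view>

// Plain function pointer standing in for snitch's function_ref (which,
// unlike a raw pointer, cannot be null and can bind member functions).
using print_function = void (*)(std::string_view) noexcept;

// Default used when no console is available: fail loudly instead of
// silently discarding output.
inline void unimplemented_print(std::string_view) noexcept {
    std::terminate();
}

// Hypothetical global hook, initialised to the terminating default.
inline print_function console_print = &unimplemented_print;

// Example replacement that just counts the characters it receives.
inline std::size_t printed_chars = 0;
inline void counting_print(std::string_view message) noexcept {
    printed_chars += message.size();
}

// Usage check: calling the hook before replacing it would terminate.
void example_usage() {
    assert(console_print == &unimplemented_print);
    console_print = &counting_print;
    console_print("hello");
    assert(printed_chars == 5);
}
```

Compared with a no-op default, a forgotten override is noticed on the first print attempt instead of producing a silent run.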

> We could instead initialise the function refs to undefined extern declarations, which comes back to your proposal but with the added ability to also use member functions if needed. The downside is then that the functions must be defined, even when not used, which is also not very nice.
>
> I'm reluctant to introduce macros other than 0/1 and numerical values, to keep things simple.

I hadn't dug deep enough to notice that you don't have any macros beyond 0/1 and numerical values, and I totally agree with you on that one.

Couple of remarks, then: we could have a new SNITCH_FREESTANDING toggle, which then cascades into two other toggles: SNITCH_WITH_CONSOLE and SNITCH_WITH_FILE, which would each bump out the current implementations for console and files into no-ops or std::terminate. In fact, I technically would only need a console output myself, as I don't intend on setting the -out option, nor using any generator. For now. Which simplifies the implementation details overall.
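The cascade could be expressed in a configuration header roughly as follows. This is only a sketch: the three toggles are the ones proposed in this thread and do not exist in snitch yet, SNITCH_FREESTANDING is hard-coded to 1 here purely so the effect is visible, and letting the freestanding toggle override the other two is an assumption.

```cpp
#include <cassert>

// Proposed toggles from this thread; none of them exist in snitch yet.
// A build system would normally define SNITCH_FREESTANDING; it is set here
// only so that the cascade below can be observed.
#define SNITCH_FREESTANDING 1

// Hosted defaults, unless the user already chose a value.
#if !defined(SNITCH_WITH_CONSOLE)
#    define SNITCH_WITH_CONSOLE 1
#endif
#if !defined(SNITCH_WITH_FILE)
#    define SNITCH_WITH_FILE 1
#endif

// Assumption: the freestanding toggle takes precedence over the others.
#if SNITCH_FREESTANDING
#    undef SNITCH_WITH_CONSOLE
#    undef SNITCH_WITH_FILE
#    define SNITCH_WITH_CONSOLE 0
#    define SNITCH_WITH_FILE 0
#endif

constexpr bool with_console = SNITCH_WITH_CONSOLE != 0;
constexpr bool with_file    = SNITCH_WITH_FILE != 0;

// Usage check: both default implementations are disabled.
void example_usage() {
    assert(!with_console);
    assert(!with_file);
}
```

All three macros stay plain 0/1 values, in line with the existing configuration style.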

Together with the switch to std::copy, I could quickly send out a pull request for this. I'm not 100% sure how we could proceed for the unit testing of all that however, but spinning up a freestanding environment takes a few minutes of gcc compilation time in a GitHub workflow, and I have scripts ready for the case of the PlayStation 1. It's just that running these tests can be a bit more challenging, at face value. And I haven't tried Linux-on-Linux freestanding gcc yet, and I'm not completely sure about how reliable that'd be as a test.

Also, notable: as I said, I went for the fastest path to trying things out, and I used the following toggles, different from your defaults. Not totally sure this is relevant, but it's something to consider:

  • SNITCH_WITH_EXCEPTIONS 0
  • SNITCH_WITH_TIMINGS 0
  • SNITCH_APPEND_TO_CHARS 0
  • SNITCH_WITH_ALL_REPORTERS 0
  • SNITCH_WITH_MULTITHREADING 0

@cschreib
Member

> It turns out your library is exceedingly viable for freestanding environments with no FPU, RTC, or filesystem, and while a basic demo test creates a 385kB binary which, for a machine where you have a bit less than 2MB available, is a lot, it's still acceptable for only a test.

That's great to hear! I haven't really made a concerted effort to reduce binary size, although it was a consideration when writing the float-to-string implementation. The state-of-the-art algorithms do have much bigger tables, which did not seem worth the cost to me.

When you say there is no FPU, does that mean that floating point calculations are still possible, but emulated in software? Does this have any impact on C++ code, other than performance considerations? I'm asking mostly to see if it would make sense to add a compile-time toggle to remove the float-to-string code and allow you to save more space.

Another question on binary size: have you tweaked any of the SNITCH_MAX_* options? I expect the default SNITCH_MAX_TEST_CASES of 5000 is probably a bit much for embedded platforms... Reducing this and others might help you shrink the binary footprint further.

Lastly, I'm dropping this here because I think it's very relevant: there was a recent blog article from the author of fmtlib, discussing how to reduce binary size. We don't use fmtlib nor std::format, but the article has a few cool tricks for investigating these kinds of issues that may be worth looking into.

It inspired me to try compiling snitch without linking to any C or C++ runtime to see what we actually depend on. Oddly, even with -fno-exceptions, much of the dependence on the C++ runtime was exception-related stuff in std::optional and std::variant. Apparently, even with -fno-exceptions, libstdc++ does let these classes throw... Maybe I need to compile against an actual freestanding STL.

> I haven't looked deeply into why you need std::memmove

The append() function arguments can alias each other:

snitch::small_string<128> s;
append(s, "abc");
append(s, s); // aliasing

I initially thought this could cause UB because of overlapping memory ranges, but actually append() always copies past the end of the first argument, which in principle points to unclaimed memory. So I think by construction the ranges can never overlap in normal use 🤔. But they can be made to, with a bit of unsanitary code:

snitch::small_string<128> s;
append(s, "abc");
std::string_view old_s = s;
s.clear();
append(s, old_s); // now the ranges actually overlap

Since the small_string never re-allocates, and since char is a trivial type, old_s actually points to valid memory. So with std::memmove this code doesn't cause UB today, but it would if we had used std::memcpy or std::copy. But it's asking for trouble... We don't rely on this behaviour in snitch, so perhaps std::copy would be fine.
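To illustrate why the plain self-append case never overlaps, here is a much simplified stand-in for small_string. `tiny_string` and this `append` are hypothetical and only mimic the "write past the current end" behaviour described above; they are not snitch's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Minimal stand-in for snitch::small_string (hypothetical, much simplified):
// fixed storage that never reallocates.
struct tiny_string {
    char        data[16] = {};
    std::size_t size     = 0;

    std::string_view view() const noexcept {
        return std::string_view(data, size);
    }
};

// Appends as many characters as fit and reports whether everything fit.
// New characters are written at data + size, i.e. past the current content.
inline bool append(tiny_string& s, std::string_view text) noexcept {
    const std::size_t available = sizeof(s.data) - s.size;
    const std::size_t count     = text.size() < available ? text.size() : available;
    for (std::size_t i = 0; i < count; ++i) {
        s.data[s.size + i] = text[i];
    }
    s.size += count;
    return count == text.size();
}

// Usage check: when appending a string to itself, the source is [0, 3) and
// the destination is [3, 6), so the ranges are disjoint.
void example_usage() {
    tiny_string s;
    append(s, "abc");
    append(s, s.view());
    assert(s.view() == "abcabc");
}
```

The clear-then-append case shown above is different: the destination then starts at offset 0 and coincides with the source, which is exactly where only memmove-like semantics are safe.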

The only reason I didn't use std::copy was that I was initially wary of including <algorithm>, which is a notably heavy header, and I feared for compilation time. But it's in a *.cpp file, and other files now include <algorithm> too, so I don't think that's a valid worry anymore.

> I see it now, yes. This works for me. I just haven't seen how it works just yet, but that's pretty much irrelevant: it's there, and it's usable.

Here's how that would work for the console output:

// Define your own printing function.
void my_console_print(std::string_view message) noexcept {
    // Do whatever you need to do to print the provided characters...
}

// Then in main():
int main(int argc, const char* argv[]) {
    // Replace the print function used for the command-line interface and debug messages:
    snitch::cli::console_print = &my_console_print;
    // Replace the print function used for test reporting:
    snitch::tests.print_callback = &my_console_print;
}

For files it's not possible today, but we could use a similar mechanism to override the open/write/close functions. The difficulty is managing the file state (e.g. FILE* pointer in C, or std::ofstream object in C++). We have already solved this for reporters; the idea is to use an inplace_any<max_file_object_size> (with a configurable max_file_object_size) to store this state. Then the three functions become (here implemented using std::ofstream):

void file_open(inplace_any<max_file_object_size>& storage, std::string_view path) {
    storage.emplace<std::ofstream>(path);
}

void file_write(inplace_any<max_file_object_size>& storage, std::string_view message) {
    storage.get<std::ofstream>() << message;
}

void file_close(inplace_any<max_file_object_size>& storage) {
    storage.reset();
}

(could be made a little nicer if we introduced a type-erased inplace_any_view so that we don't have to repeat the size in the template parameter...)
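For readers unfamiliar with the in-place storage idea, the following is a rough sketch of how the three hooks could keep their state without heap allocation. `tiny_inplace_any` is a heavily simplified, hypothetical stand-in for snitch's inplace_any (its real interface may differ), and `counting_file` replaces std::ofstream so that the example has no <cstdio> or <fstream> dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string_view>
#include <utility>

// Simplified stand-in for inplace_any (hypothetical API): stores one object
// of any type that fits in N bytes, without heap allocation.
template<std::size_t N>
class tiny_inplace_any {
    alignas(std::max_align_t) unsigned char storage[N];
    void (*destroy)(void*) noexcept = nullptr;

public:
    tiny_inplace_any() noexcept                          = default;
    tiny_inplace_any(const tiny_inplace_any&)            = delete;
    tiny_inplace_any& operator=(const tiny_inplace_any&) = delete;
    ~tiny_inplace_any() { reset(); }

    template<typename T, typename... Args>
    T& emplace(Args&&... args) {
        static_assert(sizeof(T) <= N, "object does not fit in the storage");
        reset();
        T* object = ::new (static_cast<void*>(storage)) T(std::forward<Args>(args)...);
        destroy   = [](void* p) noexcept { static_cast<T*>(p)->~T(); };
        return *object;
    }

    // The caller must name the type that was emplaced.
    template<typename T>
    T& get() noexcept { return *std::launder(reinterpret_cast<T*>(storage)); }

    bool has_value() const noexcept { return destroy != nullptr; }

    void reset() noexcept {
        if (destroy != nullptr) {
            destroy(storage);
            destroy = nullptr;
        }
    }
};

// Example "file": records the path and counts bytes instead of doing I/O.
struct counting_file {
    std::string_view path;
    std::size_t      bytes_written = 0;
    explicit counting_file(std::string_view p) noexcept : path(p) {}
};

constexpr std::size_t max_file_object_size = 32;
using file_storage = tiny_inplace_any<max_file_object_size>;

void file_open(file_storage& storage, std::string_view path) {
    storage.emplace<counting_file>(path);
}
void file_write(file_storage& storage, std::string_view message) {
    storage.get<counting_file>().bytes_written += message.size();
}
void file_close(file_storage& storage) {
    storage.reset();
}

// Usage check: open, write 13 characters, close.
void example_usage() {
    file_storage storage;
    file_open(storage, "report.xml");
    file_write(storage, "<testsuites/>");
    assert(storage.get<counting_file>().bytes_written == 13);
    file_close(storage);
    assert(!storage.has_value());
}
```

The static_assert is what makes the size configurable in practice: a user whose file object is larger gets a compile-time error pointing at the limit to raise.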

> I'm a bit more partial to an std::terminate myself, as displaying nothing feels like a more frustrating debugging experience for the user, in case they forgot to set the console output.

I agree!

> Couple of remarks, then: we could have a new SNITCH_FREESTANDING toggle, which then cascades into two other toggles: SNITCH_WITH_CONSOLE and SNITCH_WITH_FILE, which would each bump out the current implementations for console and files into no-ops or std::terminate.

As above, I think we'd go for std::terminate. Otherwise, perhaps some bike-shedding on names: rather than split consoles vs files, we could simply have a single SNITCH_WITH_STDIO. I think it's unlikely that you'd have console but not files, or vice versa.

I like the idea of adding SNITCH_FREESTANDING, as that could cascade into other defines (no exceptions, no threading, etc.). There are a few things to decide, such as what happens if someone sets SNITCH_FREESTANDING=ON and SNITCH_WITH_STDIO=ON (IMO: the freestanding toggle takes precedence, and turns stdio off).

> I'm not 100% sure how we could proceed for the unit testing of all that however, ...

I think spinning up a PlayStation 1 environment might be overkill. Perhaps a simpler approach would be to set SNITCH_FREESTANDING=ON, define a "do nothing" function for the console output, and configure snitch to not link to libc or libstdc++ (the fmtlib article shows how to do that). In principle this should compile and run. Though, as above, we may need to compile against a freestanding STL. It seems GCC has a -ffreestanding toggle that enables this, but this may require GCC 13 to work properly.

As to actually checking that the run is successful without having console output, we could simply check the return code of the test application (GitHub actions already do that automatically). That will be annoying to debug when the tests fail though... The nicer alternative would be to build our own "standard library", creating a shared library that only exposes wrappers around the <cstdio> functions that we need, and configure snitch to use that. That should mimic a freestanding environment (does it?) and still give us console output to debug problems when they occur.

@nicolasnoble
Author

> When you say there is no FPU, does that mean that floating point calculations are still possible, but emulated in software? Does this have any impact on C++ code, other than performance considerations? I'm asking mostly to see if it would make sense to add a compile-time toggle to remove the float-to-string code and allow you to save more space.

Yes, the compiler will have to emit a soft-float implementation, which is fairly bloated and slow. It should be treated the same way as exceptions or RTTI. However, I didn't really see any float usage straight off? Wasn't your conversion code working at compilation time? Or I didn't necessarily spend too much time investigating what I was looking at.

> Another question on binary size: have you tweaked any of the SNITCH_MAX_* options? I expect the default SNITCH_MAX_TEST_CASES of 5000 is probably a bit much for embedded platforms... Reducing this and others might help you shrink the binary footprint further.

Yes, this definitely helps. But the default values are very viable still, which is good for entry-level users.

> Here's how that would work for the console output:
>
> // Define your own printing function.
> void my_console_print(std::string_view message) noexcept {
>     // Do whatever you need to do to print the provided characters...
> }
>
> // Then in main():
> int main(int argc, const char* argv[]) {
>     // Replace the print function used for the command-line interface and debug messages:
>     snitch::cli::console_print = &my_console_print;
>     // Replace the print function used for test reporting:
>     snitch::tests.print_callback = &my_console_print;
> }

Right, I see now that you can define main manually; I've found the relevant documentation. It makes sense.

> Otherwise, perhaps some bike-shedding on names: rather than split consoles vs files, we could simply have a single SNITCH_WITH_STDIO. I think it's unlikely that you'd have console but not files, or vice versa.

Wellll... it's actually not that odd. Console usually would be a TTY interface through UART for instance, while files require an actual filesystem working in the backend, which isn't really a guarantee. The PS1 has a standard TTY system in its kernel, but no default writable filesystem (it's CD-ROM based). It's similar in other embedded environments too, not just the PS1.

> I think spinning up a PlayStation 1 environment might be overkill. Perhaps a simpler approach would be to set SNITCH_FREESTANDING=ON, define a "do nothing" function for the console output, and configure snitch to not link to libc or libstdc++ (the fmtlib article shows how to do that). In principle this should compile and run. Though, as above, we may need to compile against a freestanding STL. It seems GCC has a -ffreestanding toggle that enables this, but this may require GCC 13 to work properly.
>
> As to actually checking that the run is successful without having console output, we could simply check the return code of the test application (GitHub actions already do that automatically). That will be annoying to debug when the tests fail though... The nicer alternative would be to build our own "standard library", creating a shared library that only exposes wrappers around the <cstdio> functions that we need, and configure snitch to use that. That should mimic a freestanding environment (does it?) and still give us console output to debug problems when they occur.

So the problem is that you can't trust libstdc++ to honor the freestanding toggle. I was the one who filed the bugs against the gcc project which, not long ago, got freestanding mode in libstdc++ to build and work at all. The result is that when you compile gcc/g++ itself, you specify that you want to build a freestanding libstdc++. The resulting libstdc++ will be smaller and have fewer features.

Here's the gist: if you use the freestanding toggle, it'll affect the codegen, but it has little effect on the precompiled libstdc++ code itself. And a libstdc++ which has been built from source as freestanding will have missing headers, like cstdio, which won't get installed. In a normal gcc environment, you'd be able to compile code against cstdio for instance, even with the freestanding compiler toggle, simply because the header is present on the filesystem.

My point being you wouldn't safely be able to rely on the freestanding toggle alone to ensure the code properly compiles in a fully freestanding environment, and you could get false positive results.
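The mismatch can be observed from within the code itself. The sketch below compares the standard `__STDC_HOSTED__` macro with a `__has_include` probe; the variable names are illustrative. On a regular toolchain invoked with -ffreestanding, one would expect the first to report "not hosted" while the second still finds the header, which is the false-positive situation described above.

```cpp
#include <cassert>

// __STDC_HOSTED__ is predefined by the compiler: 1 for a hosted
// implementation, 0 when compiling in freestanding mode.
constexpr bool compiler_says_hosted = (__STDC_HOSTED__ == 1);

// __has_include only reports whether the header can be found on the include
// path, not whether the target environment can actually support it.
#if defined(__has_include)
#    if __has_include(<cstdio>)
constexpr bool cstdio_header_present = true;
#    else
constexpr bool cstdio_header_present = false;
#    endif
#else
constexpr bool cstdio_header_present = false; // cannot tell on this compiler
#endif

// True when the compiler is in freestanding mode but hosted-only headers are
// still reachable, so a successful compile proves little.
constexpr bool freestanding_check_unreliable =
    !compiler_says_hosted && cstdio_header_present;

// Usage check: the flags are consistent with the predefined macro.
void example_usage() {
    assert(compiler_says_hosted == (__STDC_HOSTED__ == 1));
    assert(!freestanding_check_unreliable || cstdio_header_present);
}
```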

@cschreib
Member

> However, I didn't really see any float usage straight off? Wasn't your conversion code working at compilation time? Or I didn't necessarily spend too much time investigating what I was looking at.

It's used to implement compile-time float-to-string serialization, yes, but also for run-time serialization if SNITCH_APPEND_TO_CHARS is 0 (which you do set). Though the tables used by the conversion functions make up only 1520 bytes, which is far less than most modern float-to-string algorithms (e.g. ryu's tables are about 100kB). And if you disable timings, and you don't use floats in your own test code, I don't think floats are used anywhere in the program, so this might all end up being compiled out anyway.

> Wellll... it's actually not that odd. Console usually would be a TTY interface through UART for instance, while files require an actual filesystem working in the backend, which isn't really a guarantee. The PS1 has a standard TTY system in its kernel, but no default writable filesystem (it's CD-ROM based). It's similar in other embedded environments too, not just the PS1.

Interesting! OK, makes sense to keep them separate then.

> My point being you wouldn't safely be able to rely on the freestanding toggle alone to ensure the code properly compiles in a fully freestanding environment, and you could get false positive results.

I understand, that's unfortunate. I think the simplest alternative that is left would be to still compile against a non-freestanding STL, but check that the link-time dependencies don't include anything that would not be available in a freestanding environment. That sounds doable.

@cschreib
Member

cschreib commented Nov 23, 2024

For example, I built snitch with the build options you selected above, using g++ 11 on an x86_64 Linux host with the regular STL, and linked the resulting libsnitch.a to a simple test application like so:

#include <snitch/snitch.hpp>

TEST_CASE("test") {
    CHECK(1 == 2);
}

And calling objdump -T test, we get:

test:     file format elf64-x86-64

DYNAMIC SYMBOL TABLE:
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.34) __libc_start_main
0000000000000000  w   D  *UND*	0000000000000000  Base        __gmon_start__
0000000000000000  w   D  *UND*	0000000000000000  Base        _ITM_deregisterTMCloneTable
0000000000000000  w   D  *UND*	0000000000000000  Base        _ITM_registerTMCloneTable
0000000000000000  w   DF *UND*	0000000000000000 (GLIBC_2.2.5) __cxa_finalize
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) strlen
0000000000000000      DF *UND*	0000000000000000 (CXXABI_1.3) __gxx_personality_v0
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.4)  __stack_chk_fail
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) abort
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) memcmp
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) memchr
0000000000000000      DF *UND*	0000000000000000 (GLIBCXX_3.4.20) _ZSt24__throw_out_of_range_fmtPKcz
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) memmove
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) memset
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) fwrite
0000000000000000      DF *UND*	0000000000000000 (GLIBCXX_3.4) _ZSt9terminatev
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) fopen
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) fclose
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) fflush
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.14) memcpy
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) __cxa_atexit
0000000000032b44  w   DF .text	0000000000000025  Base        _ZNSt11char_traitsIcE2eqERKcS2_
00000000000faa18 g    DO .bss	0000000000000008 (GLIBC_2.2.5) stdout

We can add a test that checks this output against an allow-list; any entry not in the allow-list would flag a test failure. My understanding is that we can keep all of these except:

  • fopen / fclose / fflush -> will be gone with SNITCH_WITH_FILE=off
  • stdout -> will be gone with SNITCH_WITH_CONSOLE=off
  • fwrite -> currently used both for writing to file and the console... do you have that one? Or will we need to move to printf() for the console?
  • memcpy / memmove / memchr / strlen / memcmp -> can be replaced by C++ algorithms (I worry that a non-freestanding C++ STL will use these under the hood though...) or hand-rolled
  • _ZSt24__throw_out_of_range_fmtPKcz -> not sure what is introducing that one, but probably std::optional or std::variant

In particular I assume these ones are OK, but let me know if not:

  • abort, not sure where that one comes from
  • _ZSt9terminatev (std::terminate()), used all over the place in snitch
  • _ZNSt11char_traitsIcE2eqERKcS2_ (std::char_traits<char>::eq(char const&, char const&)), probably from std::string_view
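The allow-list check itself could be sketched as below, operating on individual lines of `objdump -T` output. The helper names and the content of the allow-list are illustrative only (the final list is still being discussed here), and running objdump and feeding its output line by line is left out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Illustrative allow-list of undefined symbols; the real list is undecided.
constexpr std::array<std::string_view, 4> allowed_symbols = {
    "abort", "_ZSt9terminatev", "__cxa_atexit", "__stack_chk_fail"};

// Returns the last whitespace-separated token of a line, which is where
// objdump -T prints the symbol name.
constexpr std::string_view last_token(std::string_view line) noexcept {
    const std::size_t end = line.find_last_not_of(" \t");
    if (end == std::string_view::npos) {
        return {};
    }
    line                    = line.substr(0, end + 1);
    const std::size_t start = line.find_last_of(" \t");
    return start == std::string_view::npos ? line : line.substr(start + 1);
}

// A line passes if it is not an undefined (*UND*) symbol, or if the symbol
// it names is in the allow-list.
inline bool line_is_allowed(std::string_view line) noexcept {
    if (line.find("*UND*") == std::string_view::npos) {
        return true;
    }
    const std::string_view symbol = last_token(line);
    return std::find(allowed_symbols.begin(), allowed_symbols.end(), symbol) !=
           allowed_symbols.end();
}

// Usage check, with lines taken from the output above.
void example_usage() {
    assert(line_is_allowed(
        "0000000000000000      DF *UND*\t0000000000000000 (GLIBC_2.2.5) abort"));
    assert(!line_is_allowed(
        "0000000000000000      DF *UND*\t0000000000000000 (GLIBC_2.2.5) fopen"));
    assert(line_is_allowed(
        "0000000000032b44  w   DF .text\t0000000000000025  Base        "
        "_ZNSt11char_traitsIcE2eqERKcS2_"));
}
```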

Edit: Doing some more digging as to where some of these are coming from.

  • strlen is called from std::char_traits<char>::length(char const*), so likely for the std::string_view constructor taking a null-terminated C-string (that will be in the command-line parsing, but also in the append() function for const char*).
  • memcmp is called from std::char_traits<char>::compare(char const*, char const*, unsigned long), so std::string_view comparison operator, called all over the place.
  • memchr is called from std::char_traits<char>::find(char const*, unsigned long, char const&), so std::string_view::find() which is called in a few places.
  • memset is inserted by the compiler to initialize some struct managed by snitch.
  • abort is inserted by the compiler to replace all calls to throw in the STL (in std::__throw_bad_variant_access and std::__throw_bad_optional_access).
  • When I remove the explicit calls to memcpy and memmove in the snitch code, I still see memmove in the output. This is coming from the implementation of std::copy.

So:

> I worry that a non-freestanding C++ STL will use these under the hood though...

... that worry was justified. While we can solve some of them (strlen and memchr would be easily hand-rolled as they are not called often), others would be painful (memcmp would require getting rid of std::string_view comparisons), and others are simply impossible (memset, abort).
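For the two easy cases, hand-rolled versions could look like the sketch below. `string_length` and `find_char` are hypothetical names, not existing snitch functions. An optimising compiler may still recognise such loops and emit a call to the corresponding library function, so this would need to be checked against the symbol table.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical replacement for strlen: counts characters up to, but not
// including, the null terminator.
constexpr std::size_t string_length(const char* text) noexcept {
    std::size_t length = 0;
    while (text[length] != '\0') {
        ++length;
    }
    return length;
}

// Hypothetical replacement for memchr: returns the index of the first
// occurrence of value in [data, data + count), or count if it is absent.
constexpr std::size_t find_char(const char* data, std::size_t count, char value) noexcept {
    for (std::size_t i = 0; i < count; ++i) {
        if (data[i] == value) {
            return i;
        }
    }
    return count;
}

// Both helpers are usable at compile time.
static_assert(string_length("snitch") == 6);
static_assert(find_char("a,b", 3, ',') == 1);
static_assert(find_char("abc", 3, ',') == 3);

// Usage check at run time.
void example_usage() {
    const char* text = "key=value";
    assert(string_length(text) == 9);
    assert(find_char(text, string_length(text), '=') == 3);
}
```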

From the GCC docs, I read:

GCC requires the freestanding environment provide memcpy, memmove, memset and memcmp.

And it seems it will always insert calls to these functions to manage aggregate memory, so having a dependency on these symbols seems unavoidable. So while we can get rid of explicit calls to all these in snitch code, the output binary will still refer to these functions, and I don't know how we could differentiate explicit calls from calls emitted by the compiler.

Unless we choose to accept this as a caveat of the "is this going to compile in a freestanding environment?" check, it seems the only alternative left is to actually compile in an emulated freestanding environment.

FYI: I started a branch with some of this in: https://github.com/snitch-org/snitch/tree/freestanding. If you want to pick it up, either fork it and have at it, or I can invite you as maintainer and you can contribute directly to the branch.
