reader.lisp: First try at parsing C numeric literals #7

Merged

Conversation

akanouras
Contributor

Hello and thank you for your work!

This is a first, buggy attempt at parsing C numeric literals. I would love it if you could review and comment on it!

@y2q-actionman
Owner

Thank you very much. I hadn't thought of implementing C numeric literals until I saw your PR.
I have tested your code with my existing test code. I got an error, but I don't know the cause yet. I will let you know when I find the cause.

@akanouras
Contributor Author

Thank you for the fast reply!

Apparently I missed testing with single-digit literals, for starters, and didn't try your tests against the new code. I will push a new commit later today - sorry for the inconvenience.

What do you think about the general appearance of the code? Am I going about it the right way?
I haven't programmed FSMs yet, but I suspect it would take 1/5 of the code that way, and with fewer bugs as well. Another way to shorten the code would be to use cl-ppcre, but I avoided that, seeing as you're not using it.
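
For illustration, here is a minimal sketch of the prefix-to-radix step that any C integer literal parser needs. It assumes only the standard C prefixes (`0x`/`0X` for hexadecimal, a leading `0` for octal) plus the common `0b`/`0B` extension, and it ignores suffixes such as `u` or `L`. The function name `c-integer-literal-value` is hypothetical; this is not the code from this PR.

```lisp
;; Hypothetical helper, not part of with-c-syntax.
(defun c-integer-literal-value (text)
  "Return the integer denoted by TEXT, a C integer literal without a suffix."
  (let ((len (length text)))
    (cond
      ;; 0x / 0X prefix: hexadecimal
      ((and (> len 2)
            (char= (char text 0) #\0)
            (char-equal (char text 1) #\x))
       (parse-integer text :start 2 :radix 16))
      ;; 0b / 0B prefix: binary (a compiler extension, assumed here)
      ((and (> len 2)
            (char= (char text 0) #\0)
            (char-equal (char text 1) #\b))
       (parse-integer text :start 2 :radix 2))
      ;; Leading 0 followed by more digits: octal
      ((and (> len 1) (char= (char text 0) #\0))
       (parse-integer text :start 1 :radix 8))
      ;; Anything else, including a single "0": decimal
      (t (parse-integer text :radix 10)))))
```

Note that the single-digit literal `"0"` falls through to the decimal branch, which is the kind of edge case mentioned above.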

@akanouras
Contributor Author

Hi again,

Just letting you know I've pushed a new commit. The test results are now the same as before: "Success: 168 tests, 1265 checks."

@akanouras
Contributor Author

I should mention the motive for this (and future) pull requests - hopefully it's compatible with your goals for this project:

I need to implement quite a few network protocols whose authoritative definition is mostly in C header files. Toward that end I'm writing a SWIG XML parser in CL, as other CL tools I've tried don't match this usage pattern. SWIG does a bit of constant folding, but not enough. With this pull request, I've managed to import most #defines from a test C header file, and can move on to implementing enum and struct handling.

It is my hope to eventually create a native CL system that imports C header files without depending on external tools, with the help of with-c-syntax.

@y2q-actionman
Owner

I was surprised to see your purpose. My purpose was just for fun... For example, looking at Duff's Device C code in Lisp and laughing at it. I will cooperate with you if this code is useful for something. I will review your commits; give me some time.

But I still need to hear what you're going to do.
I think network protocols are very sensitive to C struct alignment and padding. My code does not handle them, and I don't intend to support them because C alignment rules are not meaningful in Lisp. To obtain them, you still need to use a C compiler.

For example, CFFI-grovel (https://common-lisp.net/project/cffi/manual/html_node/The-Groveller.html) exists for exactly this purpose. The current with-c-syntax cannot handle #define because it lacks a C preprocessor. CFFI-grovel also works well for getting #define values.

Also, SWIG itself has support for Common Lisp. I would like to know why these tools were not working for you.

@akanouras
Contributor Author

I knew you'd be surprised; I too got scared at the possibility of someone actually using it in the way presented in the README. :-)

Feel free to take your time reading the following, or not at all; I've already overcome my first block by writing this patch.

TL;DR: cl-autowrap is unfortunately not working for me; I need a lightweight and more native replacement I can hack on, and with-c-syntax seems like it could play a part in it.

Here is some more background to what I'm doing - please keep in mind while reading it that for now I only want to implement as much code as is needed to get to the point of actually implementing protocols - not create complete implementations of anything described.

I first tried using cl-autowrap, which didn't work for me for two reasons:

  1. Its invocation of c2ffi fails on my system, possibly because of an incompatibility with LLVM/Clang 11.0.1 on Ubuntu 21.04, and I don't have the inclination or LLVM chops to debug it.
  2. If I am not mistaken, it seems to work on preprocessed input and is missing most #defines in its output.

After realizing 2. above I didn't try its fork, claw, either.

There is a chance Vacietis has a usable parser, but I was too afraid to go near that, seeing how many failed before me.

Regarding SWIG, Common Lisp (along with a few more languages) support was dropped in version 4.0, and I'd prefer to implement the integration in CL, instead of a mix of C/C++ and SWIG templates.

I am aware of C struct alignment intricacies, and CFFI-Grovel is indeed the tool I intend to use for the next steps! However, as I need to generate (and maintain) grovel wrappers for many C header files from various C projects, some of which are moving targets, I have to automate the process to the extent it is possible.

Let's call the system I'm writing cffi-grovel-generator for this discussion.

So, my thought process is:

  1. The user runs "cffi-grovel-generator wrap ..." on the C header file(s) of interest
  2. cffi-grovel-generator calls SWIG to parse the C header file
  3. cffi-grovel-generator uses with-c-syntax to parse leftover snippets by SWIG if needed, or doesn't, passing them on to CFFI-Grovel. TBD.
  4. cffi-grovel-generator generates the cffi-grovel wrapper file
  5. The user augments the wrapper file to their liking
  6. The user runs "cffi-grovel-generator update ..." whenever the C header file(s) is/are updated
  7. The user runs "cffi-grovel-generator generate ..."
  8. cffi-grovel-generator calls CFFI-Grovel/GCC for each supported architecture (I need amd64/armhf/arm64/mips/mipsel/mips64el and possibly riscv64 soon)
  9. cffi-grovel-generator stores the resultant files in the end system's Git repo
  10. The resulting end system only has .lisp files and is able to be compiled/loaded on any supported architecture (and maybe even Mezzano OS at some point) without needing any external tools

While implementing the above, I may find out that the SWIG XML -> Grovel template process is enough for my purposes, in which case I'll have no use for with-c-syntax for the moment. If, OTOH, I find out that SWIG cannot work in an architecture-independent way, as I've read on Reddit, I'll expedite my efforts on getting with-c-syntax to work as a configurable C/CPP parser.

Practically speaking, the whole thing will have many similarities with cl-autowrap, but it will use SWIG XML/with-c-syntax instead of Clang as the "frontend", GCC instead of Clang as the "backend", and the guts will be written in CL instead of C++.

Wrt with-c-syntax, it may have been written for fun, but it's already parsing a decent subset of C nevertheless. ;-)

In every case, with-c-syntax will always be able to be used to write CL in C syntax for fun (or for scaring babies).

@y2q-actionman
Owner

I understand a bit now. I've been writing the cffi-grovel files by hand if required. I've only done this for small libraries, so hand-writing was sufficient, but I understand that it's hard when there are a lot of them.

I did some digging and found the following Reddit thread. Is the context the same?
https://www.reddit.com/r/lisp/comments/61efpo/tutorials_or_in_depth_examples_on_how_to_use/

I haven't yet figured out how with-c-syntax will be used, but if you think there is a possibility that this fun library can be used, by all means use it. I may also use the cffi-grovel-generator.
(I've left with-c-syntax untouched for a long time, but I'm getting ready to fix some of the known bugs that only I know about.)


> Regarding SWIG, Common Lisp (along with a few more languages) support was dropped in version 4.0

I didn't know that. That's sad.


> I'll expedite my efforts on getting with-c-syntax to work as a configurable C/CPP parser.

To put with-c-syntax into practice, there are two difficult points that I can think of right now.

First point. The current with-c-syntax only supports the C90 grammar. This is not a problem if you are only parsing header files as you say, because headers should only contain declarations. However, the header may contain static inline functions introduced in C99. In this case, inline is not recognized as a keyword by with-c-syntax, so I think it cannot be parsed.
To really make them readable, the grammar definition would need to be updated.

Second point. I think CPP is tough.
I once came up with the idea of including CPP in with-c-syntax. (The fact that there is a file named 'proprocessor.lisp' in the source is a remnant of that ambition.) So I did some research and found a CPP implementation called mcpp. The documentation that accompanies it complains about every existing CPP implementation, as well as the CPP standard itself. I found them to be chaotic and confusing. (It's interesting to read if you have time.)

Since Common Lisp already has defmacro and reader macros, I thought that implementing this complex CPP in with-c-syntax would be of little value. So I decided not to implement CPP at that time.

If we do implement CPP, I think we need to make some compromises.


> In every case, with-c-syntax will always be able to be used to write CL in C syntax for fun (or for scaring babies).

😁

@y2q-actionman y2q-actionman self-requested a review May 8, 2021 09:32
@@ -250,6 +250,71 @@ This is bound by '#{' read macro to the `*readtable*' at that time.")
(read-slash-comment stream char
#'read-single-or-equal-symbol))

(eval-when (:compile-toplevel)
Owner

I think +acceptable-numeric-characters+ may be referenced not only at compile time. I prefer to use (:compile-toplevel :load-toplevel :execute) here.

Suggested change
- (eval-when (:compile-toplevel)
+ (eval-when (:compile-toplevel :load-toplevel :execute)
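
As a rough sketch of why all three situations matter: a table that is consulted both during macroexpansion and by ordinary functions must exist at compile time and again when the compiled file (or the source) is loaded. The names `*example-digits*`, `example-digit-count`, and `example-digit-p` below are hypothetical and are not taken from this PR.

```lisp
;; Hypothetical names; a sketch, not the PR's code.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (defparameter *example-digits*
    '((2 . "01")
      (8 . "01234567"))))

;; Consulted at macroexpansion time, so the table must already exist
;; while the file is being compiled (:compile-toplevel).
(defmacro example-digit-count (base)
  (length (cdr (assoc base *example-digits*))))

;; Consulted at run time, so the table must also be defined when the
;; compiled file is loaded (:load-toplevel) or the source is evaluated
;; directly (:execute).
(defun example-digit-p (char base)
  (and (find char (cdr (assoc base *example-digits*))) t))
```

With only `(:compile-toplevel)`, the definition would be evaluated by the compiler but left out of the compiled file, so `example-digit-p` would find the variable unbound after loading into a fresh image.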

src/reader.lisp Outdated
Comment on lines 254 to 258
(defconstant +acceptable-numeric-characters+
'(( 2 . (#\0 #\1))
( 8 . (#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7))
(10 . (#\- #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\.))
(16 . (#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\A #\B #\C #\D #\E #\F #\a #\b #\c #\d #\e #\f)))))
Owner

I recommend using (alexandria:define-constant ... :test 'equal) to define a list constant, to suppress redefinition errors.

I use it here:
https://github.com/y2q-actionman/with-c-syntax/blob/master/src/with-c-syntax.lisp#L244-L281
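
To show the idea behind that recommendation without depending on Alexandria, here is a minimal standard-only sketch. The macro name `define-constant-sketch` and the constant `+example-octal-digits+` are hypothetical, and the macro only approximates what `alexandria:define-constant` does: it keeps the existing value when the new one is equivalent under the given test, so `defconstant` never sees a non-`eql` value on re-evaluation.

```lisp
;; Hypothetical macro; an approximation of the idea, not Alexandria's code.
(defmacro define-constant-sketch (name value &key (test ''eql))
  "Like DEFCONSTANT, but reuse the old value if it matches VALUE under TEST."
  `(defconstant ,name
     (let ((new ,value))
       (if (and (boundp ',name)
                (funcall ,test (symbol-value ',name) new))
           (symbol-value ',name)   ; same object as before, so EQL holds
           new))))

;; A freshly read list literal is EQUAL but not EQL to the previous one,
;; which is what makes a plain DEFCONSTANT complain when re-evaluated.
(define-constant-sketch +example-octal-digits+
  '(#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7)
  :test 'equal)
```

Evaluating the definition a second time (for example, compiling and then loading the file) reuses the first list instead of signaling a redefinition error.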

(16 . (#\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9 #\A #\B #\C #\D #\E #\F #\a #\b #\c #\d #\e #\f)))))

(defun read-bare-number (&optional (base 10) (stream *standard-input*) (c0 nil))
(let ((string (make-array '(0) :element-type 'character :adjustable t))
Owner

To make it work in my environment (SBCL 2.1.3 on MacOSX), :fill-pointer 0 is required.

Suggested change
- (let ((string (make-array '(0) :element-type 'character :adjustable t))
+ (let ((string (make-array '(0) :element-type 'character :adjustable t :fill-pointer 0))
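
A small sketch of the pattern in isolation: `vector-push-extend` appends at the fill pointer, so the buffer needs one in addition to being adjustable. The function name `collect-digits` is hypothetical and only accumulates decimal digits; it is not the PR's `read-bare-number`.

```lisp
;; Hypothetical helper; a sketch of the buffer pattern only.
(defun collect-digits (stream)
  "Read consecutive decimal digits from STREAM into a fresh string."
  (let ((buffer (make-array 0 :element-type 'character
                              :adjustable t
                              :fill-pointer 0)))  ; required by VECTOR-PUSH-EXTEND
    (loop for c = (peek-char nil stream nil nil)  ; NIL at end of file
          while (and c (digit-char-p c))
          do (vector-push-extend (read-char stream) buffer))
    buffer))

(with-input-from-string (s "123abc")
  (collect-digits s))   ; → "123"
```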

((member c acceptable-characters) (vector-push-extend (read-char stream) string))
(t nil))) ; Finish processing if any other character
(if floatp
(let ((*readtable* named-readtables::*standard-readtable*))
Owner

I think with-standard-io-syntax can be used here, because it resets *readtable* to the standard readtable and *read-base* to 10.

Suggested change
- (let ((*readtable* named-readtables::*standard-readtable*))
+ (with-standard-io-syntax
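
A brief sketch of the effect, using a hypothetical helper named `read-decimal-token`: inside `with-standard-io-syntax` the reader ignores whatever `*read-base*` or `*readtable*` the caller had bound. Since the macro also binds `*read-eval*` to `t`, the sketch rebinds it to `nil`, which is an extra precaution assumed here and not part of the suggestion.

```lisp
;; Hypothetical helper; a sketch, not the PR's code.
(defun read-decimal-token (text)
  "Read TEXT with standard reader settings, ignoring the caller's bindings."
  (with-standard-io-syntax
    (let ((*read-eval* nil))   ; disallow #. evaluation while reading
      (read-from-string text))))

;; The caller's *READ-BASE* of 16 does not leak into the read.
(let ((*read-base* 16))
  (read-decimal-token "10"))   ; → 10, not 16
```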

y2q-actionman
y2q-actionman previously approved these changes May 8, 2021
Owner

@y2q-actionman y2q-actionman left a comment

I confirmed that this commit fixes the existing tests. Thank you.

@y2q-actionman y2q-actionman dismissed their stale review May 8, 2021 12:19

Sorry, I pressed the wrong button.

@y2q-actionman y2q-actionman merged commit ef5473f into y2q-actionman:master May 24, 2021
@y2q-actionman
Owner

I finally added numeric literals. Thank you.
