Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace HsSrcExts backend with own Preprocessor backend #231

Closed
TravisCardwell opened this issue Oct 20, 2024 · 11 comments
Closed

Replace HsSrcExts backend with own Preprocessor backend #231

TravisCardwell opened this issue Oct 20, 2024 · 11 comments
Assignees

Comments

@TravisCardwell
Copy link
Collaborator

TravisCardwell commented Oct 20, 2024

We have decided to stop using the haskell-src-exts package. We have already started using our own AST representation, since haskell-src-exts is missing some needed features. We currently use haskell-src-exts to render the source code when in preprocessor mode, but we found that we have little control over how the code is formatted. For example, Pretty output does not support documentation, and while haskell-src-exts-sc could work for us, it would require implementing our own pretty-printer (fork of Pretty). (See #26 (comment) for details.) At that point, we may as well implement a pretty-printer for our own AST that supports documentation, as that would likely be much easier to maintain in the long run.

Note that we only have to pretty-print the code that we generate (a subset of Haskell syntax), not develop a pretty-printer that handles all cases. We do not have to parse Haskell code.

Here are some other options that we considered:

  • We could generate code with poor formatting and then fix the formatting with a code formatter. Code formatters tend to be pretty big dependencies, however. Adding support for a new release of GHC would depend on support by the formatter.
  • ghc-exactprint is an option. It uses GHC types, however, and supporting a range of GHC versions would require a lot of maintenance.

Here are some code formatters for reference/inspiration/discouragement:

Note that the maintained code formatters avoid haskell-src-exts due to various issues.

@TravisCardwell TravisCardwell self-assigned this Oct 20, 2024
@TravisCardwell
Copy link
Collaborator Author

It sounds like we will design our AST to use a type family to determine the types of annotations to different parts of the AST for different stages/passes of the translation, similar to the technique described in Trees that Grow.

One generally writes a formatter that supports all stages/passes, so that it can be used to format error messages in intermediate stages as well as the final code. I am not yet sure what is best in our case, however, as I do not think we have a concrete plan for the translation passes (documented) yet. When we got to this, here are some things to consider:

  • Should we support formatting the AST for all passes, or will it be sufficient to only format the final representation?
  • Will the representation of the initial pass be annotated with documentation that has already been translated to Haddock?

@TravisCardwell
Copy link
Collaborator Author

We need to decide what to do about formatting rules ("style").

We could provide options that allows users to tweak the style according to their tastes, but doing so would make the formatting implementation more complicated.

We could support different styles by name, which would allow users to select the style that they dislike least.

How important is this? Perhaps we do not need to worry about style options so much? Perhaps users who use/prefer existing formatters can simply run their preferred formatter on the generated code?

Based on what I know at this time, I think it might be preferable to create a single style with no options. That would be the easiest to maintain as we implement features during initial development. We can organize the code to allow for other styles (perhaps with options) that may be implemented in the future.

If we do this, we need to pick a style. We could render the same style that we are writing hs-bindgen in, which I think is Edsko's style. (Perhaps in HsBindgen.Backend.PP.Render.Edsko) I do not know formatting rules for some things for which I do not have examples for, however. Another option is to render a (similar) style with simplified rules that make it easier to automate. (Perhaps in HsBindgen.Backend.PP.Render.Simple)

@TravisCardwell
Copy link
Collaborator Author

Should we take character widths into account when formatting code?

In particular, the widths of characters determine how code is formatted when using a maximum line length. With Unicode text, the maximum line length is generally specified in columns/cells, not characters. For example, identifiers in Chinese may result in lines that are well over 80 columns even when there are fewer than 80 characters per line.

When comments are aligned alongside code, ignoring differences in character width often results in misalignment. We will not have to worry about this if we put all comments on separate lines.

Note that character width is not specified as a Unicode property, as fonts have leeway. (For example, the reference mark (U+203B) 「※」 is often problematic because it is displayed using one column in some fonts and two columns in other fonts. We will not run across this character, which is not valid in C or Haskell identifiers.) I usually reference Vim.

Perhaps we do not want to worry about such details now but may revisit them in the future. In that case, we can use a function that gets the width of text with an initial implementation that just returns the number of characters, which can later be changed. The function should be used consistently, avoiding counting characters anywhere else in the code.

@TravisCardwell
Copy link
Collaborator Author

I have been working on a pretty printer for a few days without communication, so I should probably post an update!

The pretty package implements a simple pretty-printing library, originally based on the famous paper. It defines a Doc type that represents a set of possible layouts, and rending a Doc selects the "best" layout according to the specified options.

This library does not seem to provide enough flexibility to format code very well. In particular, it does not provide a way to specify different possible layouts that code may be written, depending on how it fits within the specified line length. For example, consider a function call with three arguments (f x y z). If the actual length, taking account the name of the function and the actual arguments, can fit within the specified line length, then we should do that. If it is too long, however, we generally put each argument on separate lines, perhaps taking multiple lines when an argument expression is long. With this library, we have to choose one style for all cases, resulting in really long lines and/or line breaks at inappropriate locations.

I have been working on an alternative that tries to keep the simplicity of pretty but allows us to specify multiple layouts. Instead of representing a set of possible layouts, the data structure represents a decision tree for rendering. The full tree is of course prohibitively large, but it is created lazily, so only relevant decisions are evaluated.

Another issue that the alternative addresses is being able to format documentation. To format Haskell-style documentation, we need to know the current indentation and the maximum line length. The pretty library does not provide such context. Indentation is done by "nesting," so total indentation of each line is only known when rendering. The alternative threads this information (Reader-style) so that it is possible to take into account when formatting documentation.

If interested, here is an overview of the current design:

data Options = Options { ... }
defaultOptions :: Options

data Content = ... | ...
renderContent :: Options -> Content -> String

-- defined here so base is only dependency
newtype Reader ctx a = Reader { unReader :: ctx -> a }

data Context = Context { ... }
mkContext :: Options -> Context

newtype Doc = Doc { unDoc :: Reader Context Content }
... API for constructing and rendering 'Doc's

class Pretty a where
  pretty :: a -> Doc
  prettyPrec :: Int -> a -> Doc

render :: Pretty a => Options -> a -> String
render_ :: Pretty a => a -> String

The Doc and Pretty API build up a Content decision tree, and rendering is done by traversing that tree. Monadic operations are implemented within the library, so users can generally use an API like pretty. (Example: PP.pretty name <+> PP.char '=' <+> PP.pretty body)

I think that the idea is sound, but I have found that it is pretty difficult to debug, which could be frustrating as we change and add new things to our AST. I implemented some debug tracing that helps, though the huge amount of output can be difficult to grok. For example, I am currently debugging a strange issue where some code uses vcat ("vertical concatenation") but it renders with horizontal concatenation. Dumping the full decision tree shows VCatContent nodes, but tracing the decisions oddly shows no corresponding decisions!

In the call on Thursday, I mentioned that I hope to implement a "lightweight" solution like pretty instead of a more involved solution like ormolu. I have started to have second thoughts about this. If a more verbose solution is easier to debug/maintain, then we should go with that IMHO. Unless I find a silly bug in my code and resolve that issue very soon, it might be time to abort the pretty-style combinators and go fully monadic. An ormolu-style design instead uses a monadic write head, and the API functions directly render to the Builder according to (and modifying) the monadic context.

@TravisCardwell
Copy link
Collaborator Author

Based on discussion in the chat, it sounds like an initial version does not need to have very pretty output. An initial simple solution can suffice, and we can implement a better pretty-printer in the future if it is needed.

I implemented a wrapper around the pretty library that uses a CtxDoc type to thread a Context that keeps track of the current indentation and maximum line length. This enables the implementation of a renderedLines function that can be used to format documentation comments for the current indentation.

-- | Create a document with rendered lines
--
-- The function is passed the maximum width of each line, computed as the
-- maximum line length minus the current indentation.  It must return a list
-- of lines, using an empty string to represent a blank line.  None of the
-- strings may contain newline characters.
--
-- The lines are /automatically/ indented according to the current indentation.
renderedLines :: (Int -> [String]) -> CtxDoc

I implemented the Pretty instances to try to keep expressions on one line when possible but also avoid really long lines. It does not produce code as a human would write it, of course, and there is some inconsistent indentation due to how little control over indentation we have with the pretty library. Chains of infix operators led to really long lines, so I implemented a special case for that. Currently, the biggest issue is pattern matching constructors with many arguments. I may be able to improve on the current situation.

If we would like to have better indentation, however, we should either switch to a different underlying library or write our own.

Example 1

File: hs-bindgen/examples/simple_structs.h

module Demo where

data CS1 = CS1

instance Foreign.Storable.Storable CS1 where

  Foreign.Storable.sizeOf = \_ -> 8

  Foreign.Storable.alignment = \_ -> 4

  Foreign.Storable.peek =
    \x0 ->
      pure MkCS1 <*> Foreign.Storable.peekByteOff x0 0
        <*> Foreign.Storable.peekByteOff x0 32

  Foreign.Storable.poke =
    \x0 ->
      \x1 ->
        case x1 of
          MkCS1 cS1_a2 cS1_b3 ->
            Foreign.Storable.pokeByteOff x0 0 cS1_a2
              >> Foreign.Storable.pokeByteOff x0 32 cS1_b3

data CS2 = CS2

instance Foreign.Storable.Storable CS2 where

  Foreign.Storable.sizeOf = \_ -> 12

  Foreign.Storable.alignment = \_ -> 4

  Foreign.Storable.peek =
    \x0 ->
      pure MkCS2 <*> Foreign.Storable.peekByteOff x0 0
        <*> Foreign.Storable.peekByteOff x0 32
        <*> Foreign.Storable.peekByteOff x0 64

  Foreign.Storable.poke =
    \x0 ->
      \x1 ->
        case x1 of
          MkCS2 cS2_a2 cS2_b3 cS2_c4 ->
            Foreign.Storable.pokeByteOff x0 0 cS2_a2
              >> Foreign.Storable.pokeByteOff x0 32 cS2_b3
              >> Foreign.Storable.pokeByteOff x0 64 cS2_c4

data CX = CX

instance Foreign.Storable.Storable CX where

  Foreign.Storable.sizeOf = \_ -> 1

  Foreign.Storable.alignment = \_ -> 1

  Foreign.Storable.peek =
    \x0 ->
      pure MkCX <*> Foreign.Storable.peekByteOff x0 0

  Foreign.Storable.poke =
    \x0 ->
      \x1 ->
        case x1 of
          MkCX cX_a2 -> Foreign.Storable.pokeByteOff x0 0 cX_a2

data CS4 = CS4

instance Foreign.Storable.Storable CS4 where

  Foreign.Storable.sizeOf = \_ -> 8

  Foreign.Storable.alignment = \_ -> 4

  Foreign.Storable.peek =
    \x0 ->
      pure MkCS4 <*> Foreign.Storable.peekByteOff x0 0
        <*> Foreign.Storable.peekByteOff x0 32

  Foreign.Storable.poke =
    \x0 ->
      \x1 ->
        case x1 of
          MkCS4 cS4_b2 cS4_a3 ->
            Foreign.Storable.pokeByteOff x0 0 cS4_b2
              >> Foreign.Storable.pokeByteOff x0 32 cS4_a3
Example 2

File: hs-bindgen/examples/primitive_types.h

module Demo where

data CPrimitive = CPrimitive

instance Foreign.Storable.Storable CPrimitive where

  Foreign.Storable.sizeOf = \_ -> 176

  Foreign.Storable.alignment = \_ -> 16

  Foreign.Storable.peek =
    \x0 ->
      pure MkCPrimitive
        <*> Foreign.Storable.peekByteOff x0 0
        <*> Foreign.Storable.peekByteOff x0 8
        <*> Foreign.Storable.peekByteOff x0 16
        <*> Foreign.Storable.peekByteOff x0 32
        <*> Foreign.Storable.peekByteOff x0 48
        <*> Foreign.Storable.peekByteOff x0 64
        <*> Foreign.Storable.peekByteOff x0 80
        <*> Foreign.Storable.peekByteOff x0 96
        <*> Foreign.Storable.peekByteOff x0 112
        <*> Foreign.Storable.peekByteOff x0 128
        <*> Foreign.Storable.peekByteOff x0 160
        <*> Foreign.Storable.peekByteOff x0 192
        <*> Foreign.Storable.peekByteOff x0 224
        <*> Foreign.Storable.peekByteOff x0 256
        <*> Foreign.Storable.peekByteOff x0 320
        <*> Foreign.Storable.peekByteOff x0 384
        <*> Foreign.Storable.peekByteOff x0 448
        <*> Foreign.Storable.peekByteOff x0 512
        <*> Foreign.Storable.peekByteOff x0 576
        <*> Foreign.Storable.peekByteOff x0 640
        <*> Foreign.Storable.peekByteOff x0 704
        <*> Foreign.Storable.peekByteOff x0 768
        <*> Foreign.Storable.peekByteOff x0 832
        <*> Foreign.Storable.peekByteOff x0 896
        <*> Foreign.Storable.peekByteOff x0 960
        <*> Foreign.Storable.peekByteOff x0 1024
        <*> Foreign.Storable.peekByteOff x0 1088
        <*> Foreign.Storable.peekByteOff x0 1152
        <*> Foreign.Storable.peekByteOff x0 1280

  Foreign.Storable.poke =
    \x0 ->
      \x1 ->
        case x1 of
          MkCPrimitive cPrimitive_c2 cPrimitive_sc3 cPrimitive_uc4 cPrimitive_s5 cPrimitive_si6 cPrimitive_ss7 cPrimitive_ssi8 cPrimitive_us9 cPrimitive_usi10 cPrimitive_i11 cPrimitive_s212 cPrimitive_si213 cPrimitive_u14 cPrimitive_ui15 cPrimitive_l16 cPrimitive_li17 cPrimitive_sl18 cPrimitive_sli19 cPrimitive_ul20 cPrimitive_uli21 cPrimitive_ll22 cPrimitive_lli23 cPrimitive_sll24 cPrimitive_slli25 cPrimitive_ull26 cPrimitive_ulli27 cPrimitive_f28 cPrimitive_d29 cPrimitive_ld30 ->
            Foreign.Storable.pokeByteOff x0 0 cPrimitive_c2
              >> Foreign.Storable.pokeByteOff x0 8 cPrimitive_sc3
              >> Foreign.Storable.pokeByteOff x0 16 cPrimitive_uc4
              >> Foreign.Storable.pokeByteOff x0 32 cPrimitive_s5
              >> Foreign.Storable.pokeByteOff x0 48 cPrimitive_si6
              >> Foreign.Storable.pokeByteOff x0 64 cPrimitive_ss7
              >> Foreign.Storable.pokeByteOff x0 80 cPrimitive_ssi8
              >> Foreign.Storable.pokeByteOff x0 96 cPrimitive_us9
              >> Foreign.Storable.pokeByteOff x0 112 cPrimitive_usi10
              >> Foreign.Storable.pokeByteOff x0 128 cPrimitive_i11
              >> Foreign.Storable.pokeByteOff x0 160 cPrimitive_s212
              >> Foreign.Storable.pokeByteOff x0 192 cPrimitive_si213
              >> Foreign.Storable.pokeByteOff x0 224 cPrimitive_u14
              >> Foreign.Storable.pokeByteOff x0 256 cPrimitive_ui15
              >> Foreign.Storable.pokeByteOff x0 320 cPrimitive_l16
              >> Foreign.Storable.pokeByteOff x0 384 cPrimitive_li17
              >> Foreign.Storable.pokeByteOff x0 448 cPrimitive_sl18
              >> Foreign.Storable.pokeByteOff x0 512 cPrimitive_sli19
              >> Foreign.Storable.pokeByteOff x0 576 cPrimitive_ul20
              >> Foreign.Storable.pokeByteOff x0 640 cPrimitive_uli21
              >> Foreign.Storable.pokeByteOff x0 704 cPrimitive_ll22
              >> Foreign.Storable.pokeByteOff x0 768 cPrimitive_lli23
              >> Foreign.Storable.pokeByteOff x0 832 cPrimitive_sll24
              >> Foreign.Storable.pokeByteOff x0 896 cPrimitive_slli25
              >> Foreign.Storable.pokeByteOff x0 960 cPrimitive_ull26
              >> Foreign.Storable.pokeByteOff x0 1024 cPrimitive_ulli27
              >> Foreign.Storable.pokeByteOff x0 1088 cPrimitive_f28
              >> Foreign.Storable.pokeByteOff x0 1152 cPrimitive_d29
              >> Foreign.Storable.pokeByteOff x0 1280 cPrimitive_ld30

Please note that I have not pushed yet. There are some tasks that I need to finish before pushing, such as cleaning up my test code and determining (pretty) dependency bounds.

@TravisCardwell
Copy link
Collaborator Author

I improved the rendering of long case patterns to avoid really long lines.

Example
module Demo where

data CPrimitive = MkCPrimitive
  { cPrimitive_c :: Foreign.C.CChar
  , cPrimitive_sc :: Foreign.C.CSChar
  , cPrimitive_uc :: Foreign.C.CSChar
  , cPrimitive_s :: Foreign.C.CShort
  , cPrimitive_si :: Foreign.C.CShort
  , cPrimitive_ss :: Foreign.C.CShort
  , cPrimitive_ssi :: Foreign.C.CShort
  , cPrimitive_us :: Foreign.C.CUShort
  , cPrimitive_usi :: Foreign.C.CUShort
  , cPrimitive_i :: Foreign.C.CInt
  , cPrimitive_s2 :: Foreign.C.CInt
  , cPrimitive_si2 :: Foreign.C.CInt
  , cPrimitive_u :: Foreign.C.CUInt
  , cPrimitive_ui :: Foreign.C.CUInt
  , cPrimitive_l :: Foreign.C.CLong
  , cPrimitive_li :: Foreign.C.CLong
  , cPrimitive_sl :: Foreign.C.CLong
  , cPrimitive_sli :: Foreign.C.CLong
  , cPrimitive_ul :: Foreign.C.CULong
  , cPrimitive_uli :: Foreign.C.CULong
  , cPrimitive_ll :: Foreign.C.CLLong
  , cPrimitive_lli :: Foreign.C.CLLong
  , cPrimitive_sll :: Foreign.C.CLLong
  , cPrimitive_slli :: Foreign.C.CLLong
  , cPrimitive_ull :: Foreign.C.CULLong
  , cPrimitive_ulli :: Foreign.C.CULLong
  , cPrimitive_f :: Foreign.C.CFloat
  , cPrimitive_d :: Foreign.C.CDouble
  , cPrimitive_ld :: Foreign.C.CDouble
  }

instance Foreign.Storable CPrimitive where

  Foreign.sizeOf = \_ -> 176

  Foreign.alignment = \_ -> 16

  Foreign.peek =
    \x0 ->
      pure MkCPrimitive <*> Foreign.peekByteOff x0 0
        <*> Foreign.peekByteOff x0 8
        <*> Foreign.peekByteOff x0 16
        <*> Foreign.peekByteOff x0 32
        <*> Foreign.peekByteOff x0 48
        <*> Foreign.peekByteOff x0 64
        <*> Foreign.peekByteOff x0 80
        <*> Foreign.peekByteOff x0 96
        <*> Foreign.peekByteOff x0 112
        <*> Foreign.peekByteOff x0 128
        <*> Foreign.peekByteOff x0 160
        <*> Foreign.peekByteOff x0 192
        <*> Foreign.peekByteOff x0 224
        <*> Foreign.peekByteOff x0 256
        <*> Foreign.peekByteOff x0 320
        <*> Foreign.peekByteOff x0 384
        <*> Foreign.peekByteOff x0 448
        <*> Foreign.peekByteOff x0 512
        <*> Foreign.peekByteOff x0 576
        <*> Foreign.peekByteOff x0 640
        <*> Foreign.peekByteOff x0 704
        <*> Foreign.peekByteOff x0 768
        <*> Foreign.peekByteOff x0 832
        <*> Foreign.peekByteOff x0 896
        <*> Foreign.peekByteOff x0 960
        <*> Foreign.peekByteOff x0 1024
        <*> Foreign.peekByteOff x0 1088
        <*> Foreign.peekByteOff x0 1152
        <*> Foreign.peekByteOff x0 1280

  Foreign.poke =
    \x0 ->
      \x1 ->
        case x1 of
          MkCPrimitive
            cPrimitive_c2
            cPrimitive_sc3
            cPrimitive_uc4
            cPrimitive_s5
            cPrimitive_si6
            cPrimitive_ss7
            cPrimitive_ssi8
            cPrimitive_us9
            cPrimitive_usi10
            cPrimitive_i11
            cPrimitive_s212
            cPrimitive_si213
            cPrimitive_u14
            cPrimitive_ui15
            cPrimitive_l16
            cPrimitive_li17
            cPrimitive_sl18
            cPrimitive_sli19
            cPrimitive_ul20
            cPrimitive_uli21
            cPrimitive_ll22
            cPrimitive_lli23
            cPrimitive_sll24
            cPrimitive_slli25
            cPrimitive_ull26
            cPrimitive_ulli27
            cPrimitive_f28
            cPrimitive_d29
            cPrimitive_ld30 ->
              Foreign.pokeByteOff x0 0 cPrimitive_c2
                >> Foreign.pokeByteOff x0 8 cPrimitive_sc3
                >> Foreign.pokeByteOff x0 16 cPrimitive_uc4
                >> Foreign.pokeByteOff x0 32 cPrimitive_s5
                >> Foreign.pokeByteOff x0 48 cPrimitive_si6
                >> Foreign.pokeByteOff x0 64 cPrimitive_ss7
                >> Foreign.pokeByteOff x0 80 cPrimitive_ssi8
                >> Foreign.pokeByteOff x0 96 cPrimitive_us9
                >> Foreign.pokeByteOff x0 112 cPrimitive_usi10
                >> Foreign.pokeByteOff x0 128 cPrimitive_i11
                >> Foreign.pokeByteOff x0 160 cPrimitive_s212
                >> Foreign.pokeByteOff x0 192 cPrimitive_si213
                >> Foreign.pokeByteOff x0 224 cPrimitive_u14
                >> Foreign.pokeByteOff x0 256 cPrimitive_ui15
                >> Foreign.pokeByteOff x0 320 cPrimitive_l16
                >> Foreign.pokeByteOff x0 384 cPrimitive_li17
                >> Foreign.pokeByteOff x0 448 cPrimitive_sl18
                >> Foreign.pokeByteOff x0 512 cPrimitive_sli19
                >> Foreign.pokeByteOff x0 576 cPrimitive_ul20
                >> Foreign.pokeByteOff x0 640 cPrimitive_uli21
                >> Foreign.pokeByteOff x0 704 cPrimitive_ll22
                >> Foreign.pokeByteOff x0 768 cPrimitive_lli23
                >> Foreign.pokeByteOff x0 832 cPrimitive_sll24
                >> Foreign.pokeByteOff x0 896 cPrimitive_slli25
                >> Foreign.pokeByteOff x0 960 cPrimitive_ull26
                >> Foreign.pokeByteOff x0 1024 cPrimitive_ulli27
                >> Foreign.pokeByteOff x0 1088 cPrimitive_f28
                >> Foreign.pokeByteOff x0 1152 cPrimitive_d29
                >> Foreign.pokeByteOff x0 1280 cPrimitive_ld30

Since I do not think it is a priority, I have not implemented proper length calculation that handles Unicode elegantly. I just count characters (length) and included a TODO.

Note that I make use of unsnoc, which was added to Data.List in base-0.19.0.0. For now, I just copied the source to the "auxiliary functions" section at the bottom of the module, with a comment explaining why.

@TravisCardwell
Copy link
Collaborator Author

Thank you @phadej for pointing out precedence issues with the preprocessor pretty-printer.

The current pretty-printer implementation assumes that we create the AST with precedence in mind. Like using parentheses when writing code by hand, an EParens constructor could be used to output parentheses around expressions that need them. The Template Haskell backend translates our AST directly to the Template Haskell AST and could therefore ignore EParens constructors. When creating the AST, we would need to remember to use EParens constructors where needed, like when writing code by hand. I have learned, however, that we do not want to add an EParens constructor because of this added complexity to code generation.

Experimenting with this today, I changed the pretty-printer to aggressively parenthesize (infix) expressions. This should ensure that the code is parsed as desired, at the expense of readability. In the current state, long lines are back, and there are lots of parentheses.

Example
module Demo where

data CPrimitive = MkCPrimitive
  { cPrimitive_c :: Foreign.C.CChar
  , cPrimitive_sc :: Foreign.C.CSChar
  , cPrimitive_uc :: Foreign.C.CSChar
  , cPrimitive_s :: Foreign.C.CShort
  , cPrimitive_si :: Foreign.C.CShort
  , cPrimitive_ss :: Foreign.C.CShort
  , cPrimitive_ssi :: Foreign.C.CShort
  , cPrimitive_us :: Foreign.C.CUShort
  , cPrimitive_usi :: Foreign.C.CUShort
  , cPrimitive_i :: Foreign.C.CInt
  , cPrimitive_s2 :: Foreign.C.CInt
  , cPrimitive_si2 :: Foreign.C.CInt
  , cPrimitive_u :: Foreign.C.CUInt
  , cPrimitive_ui :: Foreign.C.CUInt
  , cPrimitive_l :: Foreign.C.CLong
  , cPrimitive_li :: Foreign.C.CLong
  , cPrimitive_sl :: Foreign.C.CLong
  , cPrimitive_sli :: Foreign.C.CLong
  , cPrimitive_ul :: Foreign.C.CULong
  , cPrimitive_uli :: Foreign.C.CULong
  , cPrimitive_ll :: Foreign.C.CLLong
  , cPrimitive_lli :: Foreign.C.CLLong
  , cPrimitive_sll :: Foreign.C.CLLong
  , cPrimitive_slli :: Foreign.C.CLLong
  , cPrimitive_ull :: Foreign.C.CULLong
  , cPrimitive_ulli :: Foreign.C.CULLong
  , cPrimitive_f :: Foreign.C.CFloat
  , cPrimitive_d :: Foreign.C.CDouble
  , cPrimitive_ld :: Foreign.C.CDouble
  }

instance Foreign.Storable CPrimitive where

  Foreign.sizeOf = \_ -> 176

  Foreign.alignment = \_ -> 16

  Foreign.peek =
    \x0 ->
      ((((((((((((((((((((((((((((pure MkCPrimitive <*> Foreign.peekByteOff x0 0) <*> Foreign.peekByteOff x0 8) <*> Foreign.peekByteOff x0 16) <*> Foreign.peekByteOff x0 32) <*> Foreign.peekByteOff x0 48) <*> Foreign.peekByteOff x0 64) <*> Foreign.peekByteOff x0 80) <*> Foreign.peekByteOff x0 96) <*> Foreign.peekByteOff x0 112) <*> Foreign.peekByteOff x0 128) <*> Foreign.peekByteOff x0 160) <*> Foreign.peekByteOff x0 192) <*> Foreign.peekByteOff x0 224) <*> Foreign.peekByteOff x0 256) <*> Foreign.peekByteOff x0 320) <*> Foreign.peekByteOff x0 384) <*> Foreign.peekByteOff x0 448) <*> Foreign.peekByteOff x0 512) <*> Foreign.peekByteOff x0 576) <*> Foreign.peekByteOff x0 640) <*> Foreign.peekByteOff x0 704) <*> Foreign.peekByteOff x0 768) <*> Foreign.peekByteOff x0 832) <*> Foreign.peekByteOff x0 896) <*> Foreign.peekByteOff x0 960) <*> Foreign.peekByteOff x0 1024) <*> Foreign.peekByteOff x0 1088) <*> Foreign.peekByteOff x0 1152) <*> Foreign.peekByteOff x0 1280

  Foreign.poke =
    \x0 ->
      \x1 ->
        case x1 of
          MkCPrimitive
            cPrimitive_c2
            cPrimitive_sc3
            cPrimitive_uc4
            cPrimitive_s5
            cPrimitive_si6
            cPrimitive_ss7
            cPrimitive_ssi8
            cPrimitive_us9
            cPrimitive_usi10
            cPrimitive_i11
            cPrimitive_s212
            cPrimitive_si213
            cPrimitive_u14
            cPrimitive_ui15
            cPrimitive_l16
            cPrimitive_li17
            cPrimitive_sl18
            cPrimitive_sli19
            cPrimitive_ul20
            cPrimitive_uli21
            cPrimitive_ll22
            cPrimitive_lli23
            cPrimitive_sll24
            cPrimitive_slli25
            cPrimitive_ull26
            cPrimitive_ulli27
            cPrimitive_f28
            cPrimitive_d29
            cPrimitive_ld30 ->
              Foreign.pokeByteOff x0 0 cPrimitive_c2 >> (Foreign.pokeByteOff x0 8 cPrimitive_sc3 >> (Foreign.pokeByteOff x0 16 cPrimitive_uc4 >> (Foreign.pokeByteOff x0 32 cPrimitive_s5 >> (Foreign.pokeByteOff x0 48 cPrimitive_si6 >> (Foreign.pokeByteOff x0 64 cPrimitive_ss7 >> (Foreign.pokeByteOff x0 80 cPrimitive_ssi8 >> (Foreign.pokeByteOff x0 96 cPrimitive_us9 >> (Foreign.pokeByteOff x0 112 cPrimitive_usi10 >> (Foreign.pokeByteOff x0 128 cPrimitive_i11 >> (Foreign.pokeByteOff x0 160 cPrimitive_s212 >> (Foreign.pokeByteOff x0 192 cPrimitive_si213 >> (Foreign.pokeByteOff x0 224 cPrimitive_u14 >> (Foreign.pokeByteOff x0 256 cPrimitive_ui15 >> (Foreign.pokeByteOff x0 320 cPrimitive_l16 >> (Foreign.pokeByteOff x0 384 cPrimitive_li17 >> (Foreign.pokeByteOff x0 448 cPrimitive_sl18 >> (Foreign.pokeByteOff x0 512 cPrimitive_sli19 >> (Foreign.pokeByteOff x0 576 cPrimitive_ul20 >> (Foreign.pokeByteOff x0 640 cPrimitive_uli21 >> (Foreign.pokeByteOff x0 704 cPrimitive_ll22 >> (Foreign.pokeByteOff x0 768 cPrimitive_lli23 >> (Foreign.pokeByteOff x0 832 cPrimitive_sll24 >> (Foreign.pokeByteOff x0 896 cPrimitive_slli25 >> (Foreign.pokeByteOff x0 960 cPrimitive_ull26 >> (Foreign.pokeByteOff x0 1024 cPrimitive_ulli27 >> (Foreign.pokeByteOff x0 1088 cPrimitive_f28 >> (Foreign.pokeByteOff x0 1152 cPrimitive_d29 >> Foreign.pokeByteOff x0 1280 cPrimitive_ld30)))))))))))))))))))))))))))

While ormolu is not the paragon of style, IMHO, I tried formatting the above example output with it. I was surprised that it does not change either of the long chains of infix operators, keeping the ~1,000 character lines.

Our functions/operators are represented using Global, so perhaps we could add fixity/precedence information (similar to how resolve specifies a Name for each Global) and use it to determine when parentheses are not necessary. I started looking into this, but it is complicated/tricky enough that I would really want to have tests to confirm correctness, especially for chains with operators of mixed fixity/precedence. Since we do not generate any code like that, perhaps this is not something that we should prioritize... I could write special-case handlers for the specific long chains that we create.

For now, I decided to go ahead and commit the aggressive version, since even ormolu does not clean such chains.

If we would like to produce readable code, then perhaps a better approach is to avoid long infix chains.

  • Applicative construction (pure MkCFoo <*> peekByteOff x0 0 <*> ...) is generally not good for maintenance because the constructor arguments are specified by order. When arguments are added/changed/deleted, it can be easy to make mistakes that lead to bugs. I generally write do (or applicative do) blocks where each argument is bound to the field name, and RecordWildCards is used to return the constructed value. In our case, use of applicative construction is not an issue because the code is generated and such changes/maintenance is therefore automated. Using a do block and RecordWildCards would make the code more readable, though.

  • We are sequencing IO actions using the >> operator. Using a do block would make the code more readable.

@TravisCardwell
Copy link
Collaborator Author

I went ahead and implemented imports in the new preprocessor, so that I can compile generated modules to test them. Note that I use NoImplicitPrelude and import Prelude qualified.

On my first test, I got errors like the following:

Demo.hs:43:3: error:
    Qualified name in binding position: Foreign.sizeOf
   |
43 |   Foreign.sizeOf = \_ -> 176
   |   ^^^^^^^^^^^^^^

We need to qualify such function names in general, but we cannot qualify them in instance definitions. I will fix this next.

@TravisCardwell
Copy link
Collaborator Author

With an improved version of QualifiedName, the above instance declaration error is now resolved. The generated code now compiles without error.

@TravisCardwell
Copy link
Collaborator Author

I replaced the HsSrcExts backend with the new PP backend. The HsSrcExts backend is deleted, and package haskell-src-exts is no longer a dependency. (Perhaps #158 can be closed when this change is merged?)

I tried the examples as follows:

$ cabal run hs-bindgen -- \
    preprocess \
      --input hs-bindgen/examples/primitive_types.h \
      --output Generated.hs
$ ghci Generated.hs

Here are the current results. The basics such as simple_structs.h and primitive_types.h compile without error.

  • hs-bindgen/examples/anonymous.h
    • CErrors: duplicate member 'c'
  • hs-bindgen/examples/comments.h OK
  • hs-bindgen/examples/enums.h
    • No instance for Storable Void arising from use of peekByteOff
  • hs-bindgen/examples/fixedwidth.h
    • Not in scope: type constructor or class 'CUint64T'
    • Not in scope: type constructor or class 'CUint32T'
  • hs-bindgen/examples/forward_declaration.h
    • UnrecognizedCursor: CXCursor_TypeRef
  • hs-bindgen/examples/headers.h
    • 'stddef.h' file not found
  • hs-bindgen/examples/invalid_standard_types.h
    • Unknown type name 'uint64_t'
  • hs-bindgen/examples/macro_functions.h
    • (Empty)
  • hs-bindgen/examples/macros.h
    • (Empty)
  • hs-bindgen/examples/nested_types.h
    • Not in scope: type constructor or class 'CStruct'0020foo'
      • Note: the '0020 is an escaped space.
  • hs-bindgen/examples/primitive_types.h OK
  • hs-bindgen/examples/simple_structs.h OK
  • hs-bindgen/examples/standard_types.h
    • 'stddef.h' file not found
  • hs-bindgen/examples/test-th-01.h OK
  • hs-bindgen/examples/typedef_vs_macro.h
    • Not in scope: type constructor or class 'CT1'
    • Not in scope: type constructor or class 'CT2'
    • Not in scope: type constructor or class 'CM1'
    • Not in scope: type constructor or class 'CM2'
  • hs-bindgen/examples/uses_utf8.h
    • No instance for Storable Void arising from use of peekByteOff

edsko added a commit that referenced this issue Nov 7, 2024
Replace `HsSrcExts` backend with `PP` backend (#231)
@edsko
Copy link
Collaborator

edsko commented Nov 7, 2024

Closed in #255.

@edsko edsko closed this as completed Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants