Fix dhall-to-nix encoding of symbols with special keys #2426

Profpatsch · 2022-06-19T02:06:07Z

Profpatsch · 2022-06-19T02:06:45Z

I have to say that I pretty much implemented this blindly, I don’t know if there’s any golden tests that have to be corrected now, or if I missed any symbols in the AST.

Profpatsch · 2022-06-19T02:07:52Z

I kept as closely to the GHC version of z-encoding as possible, most of the code is copied verbatim (except I switched the function to Text and added a check that skips the conversion to String if there is no characters that have to be escaped).

Profpatsch · 2022-06-19T02:49:50Z

Oh, another problem I found is that sum keys can be the same nearly arbitrary symbols in Dhall, but the nix representation of anonymous function keys can not:

The dhall

<Foo | Bar/Baz>.Bar/Baz

generates the nix code

{ Bar/Baz, Foo }:
  Bar/Baz

but nix function attribute arguments cannot be quoted, and cannot contain / for example. So we are gonna have to Z-Encode every union key in the generated nix.

Profpatsch · 2022-06-19T03:45:09Z

I added a second commit which should fix this problem.

Now only function arguments and let-bindings should be z-encoded anymore, because those can’t really be reduced any further. Maybe there could be an encoding via a record field, but that would impact performance.

Profpatsch · 2022-06-19T03:46:35Z

I haven’t found a way to make a let binding appear in the nix code though, maybe that is not even possible? They always normalize out in my examples and never appear in the beta-reduced output.

Profpatsch · 2022-06-19T04:19:35Z

And another one, we also need to quote field selectors.

dhall-nix/src/Dhall/Nix.hs

ocharles · 2022-06-20T07:13:57Z

I don't really use Dhall anymore, so don't have much input here. It looks generally fine, but I haven't given it a thorough review. I'm probably not the right person to be reviewing, so I'm untagging myself.

dhall-nix/src/Dhall/Nix.hs

Profpatsch · 2022-12-25T18:35:25Z

I totally forgot about this; @Gabriella439 I think I addressed your comments, PTAL

Gabriella439

Looks great! Just some tiny suggestions

Gabriella439 · 2022-12-27T17:08:45Z

dhall-nix/src/Dhall/Nix.hs

+    then StaticKey (VarName key)
+    else DynamicKey (Plain (DoubleQuoted [Plain key]))
+    where
+        simpleChar c = isAsciiLower c || isAsciiUpper c


Minor suggestion: I think this makes the intention more clear

Suggested change

simpleChar c = isAsciiLower c || isAsciiUpper c

simpleChar c = isAscii c && isLetter c

This would also require a matching change to the imports

Edit: Fixed to use && instead of ||

I … don’t think all ascii values are a valid nix symbol. It includes stuff like 0x04 (Ctrl+c); and I’d have to verify if they are allowed to start with a digit

For example

nix-repl> { 1 = "foo"; } error: syntax error, unexpected INT at «string»:1:3: 1| { 1 = "foo"; } | ^ 2|

Whoops, I meant to say isAscii c && isLetter c

But isAscii still includes all control characters for example

i.e.

-- | Selects the first 128 characters of the Unicode character set, -- corresponding to the ASCII character set. isAscii :: Char -> Bool isAscii c = c < '\x80'

Please read my responses more closely

Oh, using && … at least to me filtering out by boolean and is not more intuitive than set logic.

The main reason I suggested using isLetter is that it seems more direct than specifying that something is a letter by saying that it is uppercase or lowercase. What you've written reads to me like:

simpleChar c = (isAscii c && isUpper c) || (isAscii c && isLower c)

Anyway, coming back to this, apart from the aesthetics, isLetter needs support for unicode ICU (it’s not just a simple ord check):

isLetter :: Char -> Bool isLetter c = case generalCategory c of GeneralCategory UppercaseLetter -> True LowercaseLetter -> True TitlecaseLetter -> True ModifierLetter -> True OtherLetter -> True _ -> False generalCategory :: Char -> GeneralCategory generalCategory c = toEnum $ fromIntegral $ wgencat $ fromIntegral $ ord c foreign import ccall unsafe "u_gencat" wgencat :: Int -> Int

vs

isAsciiLower :: Char -> Bool isAsciiLower c = c >= 'a' && c <= 'z'

Gabriella439 · 2022-12-27T17:10:04Z

dhall-nix/src/Dhall/Nix.hs

+                  ( isAsciiLower c
+                  || isAsciiUpper c


Perhaps reuse the simpleChar function here

Hm, I’d have to think about whether they are semantically the same.

Indeed, thinking about it they are not the same, since we either double quote where necessary (using the simpleChar filter), or z-encode where necessary (internal symbols that cannot be quoted). In particular, z-encoded values should never land inside a struct for example.

I think this made me notice a place where I’d forgotten the encoding, namely Project.

What I mean is that right now simpleChar c is defined as isAsciiLower c || isAsciiUpper c, so it seems like you should be able to replace isAsciiLower c || isAsciiUpper c right here with simpleChar c

I still feel like you can reuse the simpleChar function here:

needsEncoding c = not (simpleChar c)

Gabriella439 · 2022-12-27T17:11:13Z

dhall-nix/src/Dhall/Nix.hs

+-- @
+--         Before          After
+--         --------------------------
+--         Trak            Trak
+--         foo-wib         foozmwib
+--         \>               zg
+--         \>1              zg1
+--         foo\#            foozh
+--         foo\#\#           foozhzh
+--         foo\#\#1          foozhzh1
+--         fooZ            fooZZ
+--         :+              ZCzp
+-- @


If you begin each line with > then you don't need to escape special characters like \#

This is the change I'm referring to:

Suggested change

-- @

-- Before After

-- --------------------------

-- Trak Trak

-- foo-wib foozmwib

-- \> zg

-- \>1 zg1

-- foo\# foozh

-- foo\#\# foozhzh

-- foo\#\#1 foozhzh1

-- fooZ fooZZ

-- :+ ZCzp

-- @

-- > Before After

-- > --------------------------

-- > Trak Trak

-- > foo-wib foozmwib

-- > > zg

-- > >1 zg1

-- > foo# foozh

-- > foo## foozhzh

-- > foo##1 foozhzh1

-- > fooZ fooZZ

-- > :+ ZCzp

Profpatsch · 2024-01-14T19:31:18Z

The fix unfortunately does not apply anymore. I don’t know if it was fixed on HEAD in the meantime

Gabriella439 · 2024-01-29T02:49:52Z

This is not fixed in HEAD

Symbols in nix can only consist of a very restricted amount of characters, whereas in dhall they can be basically anything. So let’s use an encoding scheme similar to what GHC uses to generate C symbols. Code slightly changed (some GHC-specific cases removed). I might have missed some cases of dhall symbols that are translated verbatim.

Symbols in nix can only consist of a very restricted amount of characters, whereas in dhall they can be basically anything. When you want to get a value of a union, before it was generated into ``` { Foo, Bar }: Foo ``` where `Foo` cannot be a complex symbol like `{ Foo/Baz, Bar }: Foo/Baz`, because nix does not allow it in anonymous record arguments. So now we generate it as ``` u: u."Foo/Baz" ``` which should always work and is equal (though it loses the information of what other fields are there in the nix code). Before I faultily encoded some of these symbols with a Z-encoding, but that was wrong, so it was undone.

Another one I missed, when you have a field selector, you want to quote it, in case it has some symbols nix does not know how to handle. `x.Foo/bar` will now be `x."Foo/bar"`, which is valid nix.

We copied the Z-encoding functions from GHC more or less verbatim, but we can rewrite it in terms of `Text.concatMap`, which should perform better. In this first step, we change all the helper functions from `Char -> String` to `Char -> Text`, and apply hlint warnings.

We can drop the extra `any needsEncoding` check, since it should be performant enough on its own when using `Text.concatMap` and simplifies the code a bit.

A small improvement in the generation logic. The hnix should really just do this for us.

Profpatsch · 2024-02-15T09:46:20Z

Rebased on master

Profpatsch · 2024-07-20T12:41:34Z

Is this intended to stay broken?

Profpatsch requested review from ocharles and Gabriella439 June 19, 2022 02:06

Profpatsch force-pushed the fix-dhall-to-nix-key-encoding branch from 93b0d1c to 82d5b93 Compare June 19, 2022 02:09

Profpatsch force-pushed the fix-dhall-to-nix-key-encoding branch from 82d5b93 to 791d9e8 Compare June 19, 2022 03:42

ocharles reviewed Jun 19, 2022

View reviewed changes

dhall-nix/src/Dhall/Nix.hs Outdated Show resolved Hide resolved

Profpatsch force-pushed the fix-dhall-to-nix-key-encoding branch from 8212381 to 8fbdaf7 Compare June 19, 2022 20:19

ocharles reviewed Jun 20, 2022

View reviewed changes

dhall-nix/src/Dhall/Nix.hs Outdated Show resolved Hide resolved

ocharles requested review from ocharles and removed request for ocharles June 20, 2022 07:14

Profpatsch force-pushed the fix-dhall-to-nix-key-encoding branch from 8fbdaf7 to 1313716 Compare June 20, 2022 11:53

Gabriella439 reviewed Jun 22, 2022

View reviewed changes

dhall-nix/src/Dhall/Nix.hs Outdated Show resolved Hide resolved

dhall-nix/src/Dhall/Nix.hs Outdated Show resolved Hide resolved

Profpatsch force-pushed the fix-dhall-to-nix-key-encoding branch from 1313716 to 2e26784 Compare December 25, 2022 18:28

Profpatsch force-pushed the fix-dhall-to-nix-key-encoding branch from 9aab03b to 5e3a407 Compare December 25, 2022 21:08

Gabriella439 approved these changes Dec 27, 2022

View reviewed changes

Profpatsch added 6 commits February 14, 2024 16:15

dhall-to-nix: Quote field selection symbols

d17ff96

Another one I missed, when you have a field selector, you want to quote it, in case it has some symbols nix does not know how to handle. `x.Foo/bar` will now be `x."Foo/bar"`, which is valid nix.

dhall-to-nix: Z-encode symbols without going through String

34a797e

We can drop the extra `any needsEncoding` check, since it should be performant enough on its own when using `Text.concatMap` and simplifies the code a bit.

dhall-to-nix: Only double-quote symbols if they have symbols

7a7e767

A small improvement in the generation logic. The hnix should really just do this for us.

Profpatsch force-pushed the fix-dhall-to-nix-key-encoding branch from 5e3a407 to 7a7e767 Compare February 15, 2024 09:46

Merge branch 'main' into fix-dhall-to-nix-key-encoding

53a0f34

Gabriella439 merged commit 4305dab into dhall-lang:main Aug 21, 2024
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix dhall-to-nix encoding of symbols with special keys #2426

Fix dhall-to-nix encoding of symbols with special keys #2426

Profpatsch commented Jun 19, 2022

Profpatsch commented Jun 19, 2022

Profpatsch commented Jun 19, 2022

Profpatsch commented Jun 19, 2022

Profpatsch commented Jun 19, 2022

Profpatsch commented Jun 19, 2022

Profpatsch commented Jun 19, 2022

ocharles commented Jun 20, 2022

Profpatsch commented Dec 25, 2022

Gabriella439 left a comment

Gabriella439 Dec 27, 2022 •

edited

Loading

Profpatsch Dec 27, 2022

Profpatsch Dec 27, 2022

Gabriella439 Dec 27, 2022

Profpatsch Dec 27, 2022

Profpatsch Dec 27, 2022

Gabriella439 Dec 27, 2022

Profpatsch Dec 28, 2022

Gabriella439 Dec 28, 2022 •

edited

Loading

Profpatsch Feb 15, 2024

Gabriella439 Dec 27, 2022

Profpatsch Dec 27, 2022

Profpatsch Dec 28, 2022

Gabriella439 Dec 28, 2022

Gabriella439 Mar 2, 2024

Gabriella439 Dec 27, 2022

Gabriella439 Mar 2, 2024

Profpatsch commented Jan 14, 2024

Gabriella439 commented Jan 29, 2024

Profpatsch commented Feb 15, 2024

Profpatsch commented Jul 20, 2024

	simpleChar c = isAsciiLower c \|\| isAsciiUpper c
	simpleChar c = isAscii c && isLetter c

--- @
---         Before          After
---         --------------------------
---         Trak            Trak
---         foo-wib         foozmwib
---         \>               zg
---         \>1              zg1
---         foo\#            foozh
---         foo\#\#           foozhzh
---         foo\#\#1          foozhzh1
---         fooZ            fooZZ
---         :+              ZCzp
--- @
+-- >        Before          After
+-- >        --------------------------
+-- >        Trak            Trak
+-- >        foo-wib         foozmwib
+-- >        >               zg
+-- >        >1              zg1
+-- >        foo#            foozh
+-- >        foo##           foozhzh
+-- >        foo##1          foozhzh1
+-- >        fooZ            fooZZ
+-- >        :+              ZCzp

Fix dhall-to-nix encoding of symbols with special keys #2426

Fix dhall-to-nix encoding of symbols with special keys #2426

Conversation

Profpatsch commented Jun 19, 2022

Profpatsch commented Jun 19, 2022

Profpatsch commented Jun 19, 2022

Profpatsch commented Jun 19, 2022

Profpatsch commented Jun 19, 2022

Profpatsch commented Jun 19, 2022

Profpatsch commented Jun 19, 2022

ocharles commented Jun 20, 2022

Profpatsch commented Dec 25, 2022

Gabriella439 left a comment

Choose a reason for hiding this comment

Gabriella439 Dec 27, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Gabriella439 Dec 28, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Profpatsch commented Jan 14, 2024

Gabriella439 commented Jan 29, 2024

Profpatsch commented Feb 15, 2024

Profpatsch commented Jul 20, 2024

Gabriella439 Dec 27, 2022 •

edited

Loading

Gabriella439 Dec 28, 2022 •

edited

Loading