Scim users get distorted between construct, post, get. by fisx · Pull Request #1754 · wireapp/wire-server

fisx · 2021-09-10T15:16:56Z

Case handling in scim is a bit of a mess.

One of our the bigger issues was this: we parse RichInfo data in scim from a json schema that contains the key/value pairs both as a json object, and for when an ordering of keys is required, an assoc assoc array. The parser constructs the union of the unordered and the ordered map. The problem is that jsonLower (a relatively new helper function that attempts to deal with the criminally insane idea that scim json needs to be case insensitive in its object attributes. jsonLower honours this requirement, but of course does not go through all the string values in richFieldType in the assoc list that correspond to the object attributes in the map. Fixed eg. here (calling the smart constructor that eliminates duplicates rather than the ADT constructor).

To make things more fun, there is this issue with lower-casing in haskell, which we fix here by running all lower-casers in sequence, and overall by using case-insensitive consistently over the various lower-casing functions in text or base.

This PR also introduces a function normalizeLikeStored that normalizes Scim users. It is used a lot in tests, but also to make sure that the spar responses are a bit more "normal".

I also add some new tests, and increase the entropy when creating scim data in /services/spar/test-integration/Util/Scim.hs (because I ran into collisions there).

Sorry, this isn't very coherent. Maybe the changes will make more sense than this attempt at summarizing them? You can skim through the commit history, but I don't recommend reading them in order for a review.

Checklist

The PR Title explains the impact of the change.
The PR description provides context as to why the change should occur and what the code contributes to that effect. This could also be a link to a JIRA ticket or a Github issue, if there is one.
changelog.d contains the following bits of information:
- A file with the changelog entry in one or more suitable sub-sections. The sub-sections are marked by directories inside changelog.d.

also: - no more Arbitrary instances for un-normalized types. - more coherent normalization. - fixes a couple of failing test cases.

I don't remember why I did this, but I think the reason has evaporated. Now it seems quite silly.

fisx · 2021-09-24T20:50:55Z

failure on concourse:

when service can be discovered and the URLs change:             FAIL
>         test/unit/Test/Brig/Calling.hs:169:
>         servers should get overwritten
>         expected: Discovered (SFTServers (SrvEntry {srvPriority = 0, srvWeight = 0, srvTarget = SrvTarget {srvTargetDomain = "sft4.foo.example.com.", srvTargetPort = 443}} :| [SrvEntry {srvPriority = 0, srvWeight = 0, srvTarget = SrvTarget {srvTargetDomain = "sft5.foo.example.com.", srvTargetPort = 443}}]))
>          but got: Discovered (SFTServers (SrvEntry {srvPriority = 0, srvWeight = 0, srvTarget = SrvTarget {srvTargetDomain = "sft1.foo.example.com.", srvTargetPort = 443}} :| [SrvEntry {srvPriority = 0, srvWeight = 0, srvTarget = SrvTarget {srvTargetDomain = "sft2.foo.example.com.", srvTargetPort = 443}},SrvEntry {srvPriority = 0, srvWeight = 0, srvTarget = SrvTarget {srvTargetDomain = "sft3.foo.example.com.", srvTargetPort = 443}}]))

failure when running tests locally:

  test-integration/Test/Spar/APISpec.hs:769:5:
  1) WireIdPAPIV2.Spar.API, DELETE /identity-providers/:idp, zuser has no team, responds with 'insufficient permissions'
       uncaught exception: ErrorCall
       Assertions failed:
        1: 202 =/= 403

       Response was:

       Response {responseStatus = Status {statusCode = 403, statusMessage = "Forbidden"}, responseVersion = HTTP/1.1, responseHeaders = [("Transfer-Encoding","chunked"),("Date","Fri, 24 Sep 2021 20:44:39 GMT"),("Server","Warp/3.3.13"),("Content-Encoding","gzip"),("Content-Type","application/json")], responseBody = Just "{\"code\":403,\"message\":\"The given phone number has been blacklisted due to suspected abuse or a complaint.\",\"label\":\"blacklisted-phone\"}", responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}
       CallStack (from HasCallStack):
         error, called at src/Bilge/Assert.hs:89:5 in bilge-0.22.0-2K7mJKDUfGc17BkrPwuOOW:Bilge.Assert
         <!!, called at src/Bilge/Assert.hs:107:19 in bilge-0.22.0-2K7mJKDUfGc17BkrPwuOOW:Bilge.Assert
         !!!, called at test-integration/Util/Core.hs:580:3 in main:Util.Core
         createRandomPhoneUser, called at test-integration/Test/Spar/APISpec.hs:684:26 in main:Test.Spar.APISpec
         testGetPutDelete, called at test-integration/Test/Spar/APISpec.hs:769:5 in main:Test.Spar.APISpec

I predict both are flakes and won't reproduce for quite a while, but I'm not entirely confident about the latter.

fisx · 2021-09-25T09:03:12Z

I saw it one more time, then couldn't reproduce it in 20 or so other local runs, or on concourse. Can't think of any connection between this PR and that test. Hm...

fisx · 2021-09-27T09:43:13Z

libs/wire-api/src/Wire/API/User/RichInfo.hs

+normalizeRichInfoAssocListInt = nubOrdOn nubber . filter ((/= mempty) . richFieldValue)
+  where
+    -- see also: https://github.com/basvandijk/case-insensitive/issues/31
+    nubber = Text.toLower . Text.toCaseFold . CI.foldedCase . richFieldType


we might get away with only this:

Suggested change

nubber = Text.toLower . Text.toCaseFold . CI.foldedCase . richFieldType

nubber = richFieldType

haven't tried.

Yeah, Text.toCaseFold should be exactly the same as CI.foldedCase, since CI wraps a Text here, if I'm not mistaken. And we shouldn't call both toLower and toCaseFold.

Nope, this will trigger that thing that I think is a bug in case-insensitive again: basvandijk/case-insensitive#31 (comment)

pcapriotti

Looks reasonably clean. I'm not super confident, but it looks like we are now normalising everything at every stage, so it's more likely to be correct (especially if we decide to just ignore locale-specific issues and uncommon unicode characters like Cherokee letters), although I suspect some normalisation steps are unnecessary. I left a few comments below.

pcapriotti · 2021-09-27T12:21:47Z

libs/hscim/src/Web/Scim/Schema/Schema.hs

  | CustomSchema Text
  deriving (Show, Eq)

+fakeEnumSchema :: [Schema]


Maybe adding a comment that this is just for testing is a good idea.

pcapriotti · 2021-09-27T12:24:13Z

libs/hscim/src/Web/Scim/Schema/Common.hs

 jsonLower (Object o) = Object . HM.fromList . fmap lowerPair . HM.toList $ o
  where
-    lowerPair (key, val) = (toLower key, jsonLower val)
+    lowerPair (key, val) = (CI.foldCase key, jsonLower val)


It's probably not ideal to use case-folded strings as JSON keys (Unicode recommends to use case-folding only for comparison). Why is this JSON normalisation still needed? I would guess that once all the case comparisons on the haskell side are done correctly, JSON could be generated just using the "original" strings. Am I thinking about this wrong?

The original idea of jsonLower was to run it in json parsers initially, so that the actual parser could rely on everything being lower-case. This was an easy way of working around the fact that json and therefore aeson is strictly case-sensitive.

So morally, this is just lower-casing Values that are about to be deconstructed. But I will double-check and add this to the haddocks.

I double-checked, and I was right:

git grep -Hn jsonLower libs/hscim/src/Web/Scim/Capabilities/MetaSchema.hs:84: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Capabilities/MetaSchema.hs:95: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Capabilities/MetaSchema.hs:114: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Class/Group.hs:63: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Class/Group.hs:76: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Schema/AuthenticationScheme.hs:70: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Schema/Common.hs:97:jsonLower :: Value -> Value libs/hscim/src/Web/Scim/Schema/Common.hs:98:jsonLower (Object o) = Object . HM.fromList . fmap lowerPair . HM.toList $ o libs/hscim/src/Web/Scim/Schema/Common.hs:100: lowerPair (key, val) = (CI.foldCase key, jsonLower val) libs/hscim/src/Web/Scim/Schema/Common.hs:101:jsonLower (Array x) = Array (jsonLower <$> x) libs/hscim/src/Web/Scim/Schema/Common.hs:102:jsonLower same@(String _) = same -- (only object attributes, not all texts in the value side of objects!) libs/hscim/src/Web/Scim/Schema/Common.hs:103:jsonLower same@(Number _) = same libs/hscim/src/Web/Scim/Schema/Common.hs:104:jsonLower same@(Bool _) = same libs/hscim/src/Web/Scim/Schema/Common.hs:105:jsonLower same@Null = same libs/hscim/src/Web/Scim/Schema/ListResponse.hs:62: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Schema/Meta.hs:70: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Schema/ResourceType.hs:59: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Schema/User/Address.hs:39: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Schema/User/Certificate.hs:32: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Schema/User/Email.hs:47: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Schema/User/IM.hs:32: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Schema/User/Name.hs:39: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Schema/User/Phone.hs:32: parseJSON = genericParseJSON parseOptions . jsonLower libs/hscim/src/Web/Scim/Schema/User/Photo.hs:32: parseJSON = genericParseJSON parseOptions . jsonLower libs/wire-api/test/unit/Test/Wire/API/User/RichInfo.hs:171: jsonroundtrip = unsafeParse . Scim.jsonLower . Aeson.toJSON

... aaand the haddocks already pretty much say this much. @pcapriotti if you can think of anything to add, please let me know (i can also do it in a separate PR).

pcapriotti · 2021-09-27T12:26:11Z

libs/wire-api/src/Wire/API/User/RichInfo.hs

+normalizeRichInfoAssocListInt = nubOrdOn nubber . filter ((/= mempty) . richFieldValue)
+  where
+    -- see also: https://github.com/basvandijk/case-insensitive/issues/31
+    nubber = Text.toLower . Text.toCaseFold . CI.foldedCase . richFieldType


Yeah, Text.toCaseFold should be exactly the same as CI.foldedCase, since CI wraps a Text here, if I'm not mistaken. And we shouldn't call both toLower and toCaseFold.

fisx · 2021-09-27T17:21:24Z

although I suspect some normalisation steps are unnecessary.

very possible! i wanted to mention that my priority was soundness, not efficiency, but forgot.

This was "needed" because hscim inconsistently used 'Text.toLower', 'CI.foldedCase' etc. throughout the code base, and since they are behaving slightly differently, we had to make sure here to catch them all. Since we have normalized that, we can simplify.

fisx · 2021-09-27T17:41:41Z

hm...

    RichInfo Examples
      Empty rich info:                                                                              OK
      Old RichInfoMapAndList:                                                                       OK
      case insensitive 'richinfo':                                                                  OK
      RichInfoMapAndList as only assoc list:                                                        OK
      RichInfoMapAndList Map:                                                                       OK
      Without Old RichInfoMapAndList:                                                               OK
      wrong version:                                                                                OK
      drop empty fields:                                                                            OK
    'toRichInfoAssocList', 'fromRichInfoAssocList'
      works (counter-example of earlier bug):                                                       OK
      works (property):                                                                             FAIL (0.17s)
        *** Failed! Falsified (after 37 tests and 10 shrinks):
        RichInfoAssocList {unRichInfoAssocList = [RichField {richFieldType = "\43889", richFieldValue = "`"}]}
        RichInfoAssocList {unRichInfoAssocList = [RichField {richFieldType = "\43889", richFieldValue = "`"},RichField {richFieldType = "\5025", richFieldValue = "`"}]} /= RichInfoAssocList {unRichInfoAssocList = [RichField {richFieldType = "\43889", richFieldValue = "`"}]}
        Use --quickcheck-replay=719615 to reproduce.

looks like this doesn't work out of the box?

This reverts commit f368bde.

fisx force-pushed the various-fixes-3 branch from 7e84d4a to 1f3c33a Compare September 10, 2021 18:56

fisx changed the title ~~Scim users get distorted between cosntruct, post, get.~~ Scim users get distorted between construct, post, get. Sep 10, 2021

fisx force-pushed the various-fixes-3 branch 2 times, most recently from d862722 to 7eb487a Compare September 20, 2021 14:53

fisx force-pushed the various-fixes-3 branch from 7eb487a to 33aeb1e Compare September 21, 2021 11:44

fisx changed the title ~~Scim users get distorted between construct, post, get.~~ Scim users get distorted between construct, post, get. [skip ci] Sep 21, 2021

fisx force-pushed the various-fixes-3 branch 2 times, most recently from ef5e267 to 9eb2355 Compare September 24, 2021 07:20

fisx added 11 commits September 24, 2021 21:06

More entropy in integration tests.

84808a9

Remove bogus name.

6b5e9cd

Normalize scim data before responding to POST /scim/v2/Users.

8d8f44e

Unit tests.

e482855

Smart constructors for RichInfoMapAndList, RichInfoAssocList.

c1c09d6

also: - no more Arbitrary instances for un-normalized types. - more coherent normalization. - fixes a couple of failing test cases.

The FIX!!

6fa007a

...

625b0f4

Not The REAL FIX!!! either

55a4a3a

The third fix at least makes one test pass...

dac9e3a

...

5bbc0cb

typo

a31a22d

fisx force-pushed the various-fixes-3 branch from 35f9988 to a31a22d Compare September 24, 2021 19:12

fisx marked this pull request as ready for review September 24, 2021 19:25

fisx changed the title ~~Scim users get distorted between construct, post, get. [skip ci]~~ Scim users get distorted between construct, post, get. Sep 24, 2021

revert wrong change caused by braindead pattern matching.

873f242

fisx requested a review from pcapriotti September 24, 2021 19:40

fisx added 3 commits September 24, 2021 21:42

changelog

da41aa1

Fix outdated test case.

a2f98b1

Don't brutally and point-lessly lower-case case-insensitive json.

300f6e4

I don't remember why I did this, but I think the reason has evaporated. Now it seems quite silly.

hi ci

cea2456

fisx commented Sep 27, 2021

View reviewed changes

pcapriotti approved these changes Sep 27, 2021

View reviewed changes

fisx added 2 commits September 27, 2021 19:33

Haddocks.

2e19959

Drop redundant lower-casing.

f368bde

This was "needed" because hscim inconsistently used 'Text.toLower', 'CI.foldedCase' etc. throughout the code base, and since they are behaving slightly differently, we had to make sure here to catch them all. Since we have normalized that, we can simplify.

fisx added 2 commits September 27, 2021 19:46

Revert "Drop redundant lower-casing."

314ed0b

This reverts commit f368bde.

hi ci

05789b9

fisx merged commit 0eea220 into develop Sep 27, 2021

fisx deleted the various-fixes-3 branch September 27, 2021 20:32

	nubber = Text.toLower . Text.toCaseFold . CI.foldedCase . richFieldType
	nubber = richFieldType

Conversation

fisx commented Sep 10, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist

Uh oh!

fisx commented Sep 24, 2021

Uh oh!

fisx commented Sep 25, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pcapriotti left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fisx commented Sep 27, 2021

Uh oh!

fisx commented Sep 27, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fisx commented Sep 10, 2021 •

edited

Loading