Scim users get distorted between construct, post, get. #1754
7e84d4a to 1f3c33a
d862722 to 7eb487a
7eb487a to 33aeb1e
ef5e267 to 9eb2355
also: - no more Arbitrary instances for un-normalized types. - more coherent normalization. - fixes a couple of failing test cases.
35f9988 to a31a22d
I don't remember why I did this, but I think the reason has evaporated. Now it seems quite silly.
failure on concourse: failure when running tests locally: I predict both are flakes and won't reproduce for quite a while, but I'm not entirely confident about the latter.
I saw it one more time, then couldn't reproduce it in 20 or so other local runs, or on concourse. Can't think of any connection between this PR and that test. Hm...
normalizeRichInfoAssocListInt = nubOrdOn nubber . filter ((/= mempty) . richFieldValue)
  where
    -- see also: https://github.com/basvandijk/case-insensitive/issues/31
    nubber = Text.toLower . Text.toCaseFold . CI.foldedCase . richFieldType
we might get away with only this:
- nubber = Text.toLower . Text.toCaseFold . CI.foldedCase . richFieldType
+ nubber = richFieldType
haven't tried.
Yeah, Text.toCaseFold should be exactly the same as CI.foldedCase, since CI wraps a Text here, if I'm not mistaken. And we shouldn't call both toLower and toCaseFold.
Nope, this will trigger that thing that I think is a bug in case-insensitive again: basvandijk/case-insensitive#31 (comment)
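For readers following this thread, here is a minimal sketch of why lower-casing and case folding are not interchangeable in the text package. It does not reproduce the linked case-insensitive issue; the names sample, lowered, folded and sameKey are made up for illustration and do not appear in the PR.

```haskell
import qualified Data.Text as T

-- Hypothetical sample value, for illustration only.
sample :: T.Text
sample = T.pack "Straße"

lowered, folded :: T.Text
lowered = T.toLower sample     -- "straße": lower-casing leaves ß as it is
folded  = T.toCaseFold sample  -- "strasse": full case folding expands ß to "ss"

-- Compare keys under a single normalization. Mixing toLower in one place
-- and toCaseFold in another makes equal-looking keys compare unequal.
sameKey :: T.Text -> T.Text -> Bool
sameKey a b = T.toCaseFold a == T.toCaseFold b
```

Under this sketch, sameKey (T.pack "STRASSE") sample holds, while comparing lowered against folded does not, which is the kind of mismatch that shows up when different parts of a code base pick different normalizers.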
pcapriotti left a comment
Looks reasonably clean. I'm not super confident, but it looks like we are now normalising everything at every stage, so it's more likely to be correct (especially if we decide to just ignore locale-specific issues and uncommon unicode characters like Cherokee letters), although I suspect some normalisation steps are unnecessary. I left a few comments below.
  | CustomSchema Text
  deriving (Show, Eq)

fakeEnumSchema :: [Schema]
Maybe adding a comment that this is just for testing is a good idea.
  jsonLower (Object o) = Object . HM.fromList . fmap lowerPair . HM.toList $ o
    where
-     lowerPair (key, val) = (toLower key, jsonLower val)
+     lowerPair (key, val) = (CI.foldCase key, jsonLower val)
It's probably not ideal to use case-folded strings as JSON keys (Unicode recommends using case folding only for comparison). Why is this JSON normalisation still needed? I would guess that once all the case comparisons on the Haskell side are done correctly, JSON could be generated just using the "original" strings. Am I thinking about this wrong?
The original idea of jsonLower was to run it in json parsers initially, so that the actual parser could rely on everything being lower-case. This was an easy way of working around the fact that json and therefore aeson is strictly case-sensitive.
So morally, this is just lower-casing Values that are about to be deconstructed. But I will double-check and add this to the haddocks.
I double-checked, and I was right:
git grep -Hn jsonLower
libs/hscim/src/Web/Scim/Capabilities/MetaSchema.hs:84: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Capabilities/MetaSchema.hs:95: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Capabilities/MetaSchema.hs:114: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Class/Group.hs:63: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Class/Group.hs:76: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/AuthenticationScheme.hs:70: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/Common.hs:97:jsonLower :: Value -> Value
libs/hscim/src/Web/Scim/Schema/Common.hs:98:jsonLower (Object o) = Object . HM.fromList . fmap lowerPair . HM.toList $ o
libs/hscim/src/Web/Scim/Schema/Common.hs:100: lowerPair (key, val) = (CI.foldCase key, jsonLower val)
libs/hscim/src/Web/Scim/Schema/Common.hs:101:jsonLower (Array x) = Array (jsonLower <$> x)
libs/hscim/src/Web/Scim/Schema/Common.hs:102:jsonLower same@(String _) = same -- (only object attributes, not all texts in the value side of objects!)
libs/hscim/src/Web/Scim/Schema/Common.hs:103:jsonLower same@(Number _) = same
libs/hscim/src/Web/Scim/Schema/Common.hs:104:jsonLower same@(Bool _) = same
libs/hscim/src/Web/Scim/Schema/Common.hs:105:jsonLower same@Null = same
libs/hscim/src/Web/Scim/Schema/ListResponse.hs:62: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/Meta.hs:70: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/ResourceType.hs:59: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/Address.hs:39: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/Certificate.hs:32: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/Email.hs:47: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/IM.hs:32: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/Name.hs:39: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/Phone.hs:32: parseJSON = genericParseJSON parseOptions . jsonLower
libs/hscim/src/Web/Scim/Schema/User/Photo.hs:32: parseJSON = genericParseJSON parseOptions . jsonLower
libs/wire-api/test/unit/Test/Wire/API/User/RichInfo.hs:171: jsonroundtrip = unsafeParse . Scim.jsonLower . Aeson.toJSON
... aaand the haddocks already pretty much say this much. @pcapriotti if you can think of anything to add, please let me know (i can also do it in a separate PR).
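To make the behaviour discussed in this thread concrete, here is a rough, dependency-free sketch of what jsonLower does. It is not the hscim code: the Value type below is a simplified stand-in for aeson's (association list instead of a hash map, String instead of Text), and map toLower stands in for the CI.foldCase call shown in the grep output.

```haskell
import Data.Char (toLower)

-- Simplified stand-in for aeson's Value (an assumption of this sketch).
data Value
  = Object [(String, Value)]
  | Array [Value]
  | String String
  | Number Double
  | Bool Bool
  | Null
  deriving (Show, Eq)

-- Lower-case object keys recursively, so a parser that runs afterwards can
-- match on lower-case attribute names only. Strings on the value side are
-- left untouched.
jsonLower :: Value -> Value
jsonLower (Object o) = Object [(map toLower key, jsonLower val) | (key, val) <- o]
jsonLower (Array xs) = Array (map jsonLower xs)
jsonLower other = other
```

With this sketch, an object with the key "UserName" and the value String "Alice" comes out with the key "username" and the value unchanged, which matches the "only object attributes" remark in the grep output.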
Yeah, Text.toCaseFold should be exactly the same as CI.foldedCase, since CI wraps a Text here, if I'm not mistaken. And we shouldn't call both toLower and toCaseFold.
very possible! i wanted to mention that my priority was soundness, not efficiency, but forgot.
This was "needed" because hscim inconsistently used 'Text.toLower', 'CI.foldedCase' etc. throughout the code base, and since they are behaving slightly differently, we had to make sure here to catch them all. Since we have normalized that, we can simplify.
hm... looks like this doesn't work out of the box?
This reverts commit f368bde.
Case handling in scim is a bit of a mess.
One of our bigger issues was this: we parse RichInfo data in scim from a json schema that contains the key/value pairs both as a json object and, for when an ordering of keys is required, as an assoc array. The parser constructs the union of the unordered and the ordered map. The problem is that jsonLower (a relatively new helper function that attempts to deal with the criminally insane idea that scim json needs to be case insensitive in its object attributes) honours this requirement, but of course does not go through all the string values in richFieldType in the assoc list that correspond to the object attributes in the map. Fixed eg. here (calling the smart constructor that eliminates duplicates rather than the ADT constructor).
To make things more fun, there is this issue with lower-casing in haskell, which we fix here by running all lower-casers in sequence, and overall by using case-insensitive consistently over the various lower-casing functions in text or base.
This PR also introduces a function normalizeLikeStored that normalizes Scim users. It is used a lot in tests, but also to make sure that the spar responses are a bit more "normal".
I also add some new tests, and increase the entropy when creating scim data in /services/spar/test-integration/Util/Scim.hs (because I ran into collisions there).
Sorry, this isn't very coherent. Maybe the changes will make more sense than this attempt at summarizing them? You can skim through the commit history, but I don't recommend reading them in order for a review.
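As an illustration of the duplicate problem described above, here is a minimal sketch of a smart constructor that normalizes an assoc list of rich-info fields. It is not the wire-api implementation: RichField is a simplified stand-in (String instead of Text, no CI wrapper), normKey and normalizeAssocList are hypothetical names, and nubBy replaces nubOrdOn so the sketch only needs base.

```haskell
import Data.Char (toLower)
import Data.Function (on)
import Data.List (nubBy)

-- Simplified stand-in for the rich-info field type (an assumption of this sketch).
data RichField = RichField
  { richFieldType :: String
  , richFieldValue :: String
  }
  deriving (Show, Eq)

-- Stand-in for the one case normalization applied to every key.
normKey :: String -> String
normKey = map toLower

-- Drop fields with empty values, then keep the first field for each key,
-- comparing keys case-insensitively.
normalizeAssocList :: [RichField] -> [RichField]
normalizeAssocList =
  nubBy ((==) `on` (normKey . richFieldType)) . filter (not . null . richFieldValue)
```

Going through a function like this instead of the raw constructor means that "Department" and "department" cannot both survive in the list, regardless of whether the keys came from the json object or from the assoc array.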
Checklist
changelog.d.