Introduce polysemy in Galley: part 0 by pcapriotti · Pull Request #1878 · wireapp/wire-server

pcapriotti · 2021-10-21T13:54:26Z

Background

This is the first in a series of PRs with the goal of converting our single Galley monad stack into a polysemy Sem monad. The plan is to do this incrementally in a way that is very unlikely to break anything.

Explanation of the changes

(Please read this before reviewing, then review commits individually.)

One of the main technical difficulties with using Sem instead of our mtl-based Galley monad is the lack of instances for ad-hoc type classes like MonadCatch and MonadUnliftIO. Although it might be possible to work around the first one with some tricks, the second is just not well-defined at all for "open" Sem monads (i.e. Sem r monads where r is only required to satisfy some Member constraints).

This PR is an initial step towards removing such problematic higher-order constraints, and it accomplishes the following:

It adds a type variable r to the Galley monad type. This will eventually be used to hold the Sem effects, but at the moment is just a phantom type variable.
It removes the constrained monad polymorphism in the Data module.
It introduces a placeholder Concurrency effect. This will eventually be used to implement some async primitives as effect methods, but for now it's just there so that we can keep track of which functions need MonadUnliftIO.
It replaces all uses of MonadUnliftIO with uses of some ad-hoc functions, and removes the MonadUnliftIO instance of Galley. These functions will later be implemented using the Concurrency effect.
It removes an unnecessary dependency on Data functions in integration tests.

Note that at this point we are only using polysemy for its Member constraint and the EffectRow kind. We are still not making use of Sem in any way, and we won't for some time.

Next steps

A follow-up PR will deal with MonadCatch and friends in a similar way, so we will never have to define dubious instances for Sem. A corresponding placeholder effect will be introduced.

Later on, we will start making use of error and input effects, still without actually having to use Sem.

Once most of the code is equipped with the correct constraints, we can start reimplementing Galley in terms of Sem, and perhaps even make it a type synonym.

After that, we will introduce more high level effects (db, brig-intra, gundeck-intra, etc...) and refactor accordingly.

Checklist

The PR Title explains the impact of the change.
The PR description provides context as to why the change should occur and what the code contributes to that effect. This could also be a link to a JIRA ticket or a Github issue, if there is one.
changelog.d contains the following bits of information:
- A file with the changelog entry in one or more suitable sub-sections. The sub-sections are marked by directories inside changelog.d.

This is step 0 in the process of converting galley to effects. We introduce a phantom type variable `r` in the `Galley` monad, which will later be used for the effect row.

mdimjasevic

This is great!

There is just one comment I left next to what seems like an accidental sed replacement.

mdimjasevic · 2021-10-21T14:03:59Z

services/galley/src/Galley/API/Error.hs

    \\n\
    \Rationale: there are two services in the backend that need to keep their status in sync: galley (teams),\n\
-    \and spar (SSO).  Galley keeps track of team features.  If galley creates an idp, the feature flag is\n\
+    \and spar (SSO).  Galley r keeps track of team features.  If galley creates an idp, the feature flag is\n\


I suppose this is an undesired consequence of using sed.

isovector · 2021-10-21T15:35:08Z

Overall this looks like a good plan!

I'm a bit concerned about this Concurrency effect, however. What actions does it support and what are its laws? I think it's really important to hammer this out sooner than later --- otherwise you're at risk of accidentally just pushing all of the "tricky IO stuff" into Concurrency and accidentally reinventing IO.

Is concurrency a necessary feature of galley, or is it merely a performance optimization? If the latter, you shouldn't need any Member constraints for it --- make some effects closer to the business logic and hide the concurrency inside of the interpreters!

isovector · 2021-10-21T19:56:03Z

To follow up on my previous comment, I spent some time going through galley. As far as I can tell, UnliftIO is used exclusively as a performance improvement, and there is no inherently parallel behavior. I'm just about to hop on a flight, so please excuse the rough notes:

Things that use unliftIO:

Galley.External.deliver: This thing could absolutely be an effect / interpreter on top of Output. No need to use async.
Galley.API.Teams.ensureUnboundUsers: validates, doesn't parse. Performance
optimization.
Galley.Data.TeamNotifications.add uses it, but I think it's a redundant constraint
Galley.Data to do two requests in parallel. The pattern is always async/wait.
This functionality doesn't need to be exposed at the application level.
Galley.Intra.Push.pushLocal uses it, but fires and forgets (forM_ and void).
Same for Galley.API.Update.notifyConversationMetadataUpdate
Same for Galley.API.LegalHold
Galley.API.Message.getRemoteClients concatenates the results of an async call.
Could be an effect.
Galley.API.Query.getRemoteConversationsWithFailures same usage. Still an implementation detail.
Galley.API.Util.ensureConnectedToLocals same

In short, I see nothing here that couldn't just run new threads in IO hoisted through polysemy interpreters. There are no cases which require arbitrary uses of async.

Before moving forward with polysemization, I'd suggest refactoring these uses out to just do liftIO instead. It'll make the upgrade path much smoother.

pcapriotti · 2021-10-22T06:20:13Z

Thanks @isovector for looking at this!

I'm a bit concerned about this Concurrency effect, however. What actions does it support and what are its laws? I think it's really important to hammer this out sooner than later --- otherwise you're at risk of accidentally just pushing all of the "tricky IO stuff" into Concurrency and accidentally reinventing IO.

The actions it supports are precisely those in the last section of Galley.App (i.e. async/wait, fork, mapConcurrently and similar).

Is concurrency a necessary feature of galley, or is it merely a performance optimization? If the latter, you shouldn't need any Member constraints for it --- make some effects closer to the business logic and hide the concurrency inside of the interpreters!

I agree that concurrency (parallelism, really) in Galley is just for performance, and that the Concurrency effect is not in the spirit of polysemy, but I don't quite see how to just rewrite it in terms of IO at this stage in the process. For example, the data functions use the MonadUnliftIO instance of Client (DB client monad) directly to parallelise queries, so we need to expose this requirement somehow to the other layers of the app. Eventually, this part will go away as we make DB access an effect, but I wouldn't know how to do it now.

Before moving forward with polysemization, I'd suggest refactoring these uses out to just do liftIO instead. It'll make the upgrade path much smoother.

It would, and it's a good idea in principle, I'm just not sure how to do this in practice.

I understand the overall principle on what makes a good effect, and why having an effect like Concurrency, or say DB, is not the intended way to use polysemy. However, I think they can make sense as temporary crutches while moving a codebase to effects. My plan to eventually split or regroup these "fake" effects, by introducing good ones and slowly replacing constraints.

For example, consider the Data functions. Right now some of them have a constraint for Concurrency. Later I might introduce a DB effect, and add a corresponding constraint. At that point, I can move the async stuff to the DB interpreter, and remove the Concurrency constraint from all the Data function. But DB is still bad, because it's too wide and hard to write interpreters for, so next we introduce more specific DB effects (say ConversationStore, TeamStore, etc...) and remove uses of plain DB. Once this process is applied throughout the code base, we can delete Concurrency and DB, and be left with a set of "good" effects.

So I would still go with the current PR as the first step. What do you think?

isovector · 2021-10-22T13:05:31Z

For example, the data functions use the MonadUnliftIO instance of Client (DB client monad) directly to parallelise queries, so we need to expose this requirement somehow to the other layers of the app.

I think this is the crux of my concern. In 61fe4de the old MonadClient code gets specialized into Galley, but IMO this is going the wrong direction. In monomorphizing this, we're increasing the surface area of the application code --- however, where polysemy really shines is in eliding details of this sort.

We took the other direction in Spar. Rather than futz with Data, we took its haddock section

wire-server/services/spar/src/Spar/Data.hs

Lines 53 to 63 in 61fe4de

    
           -- * IDPs 
        
           storeIdPConfig, 
        
           Replaced (..), 
        
           Replacing (..), 
        
           setReplacedBy, 
        
           clearReplacedBy, 
        
           GetIdPResult (..), 
        
           getIdPConfig, 
        
           getIdPIdByIssuerWithoutTeam, 
        
           getIdPIdByIssuerWithTeam, 
        
           getIdPConfigsByTeam,

and turned that into an effect:

wire-server/services/spar/src/Spar/Sem/IdP.hs

Lines 27 to 31 in 6291a90

    
           data IdP m a where 
        
             StoreConfig :: IP.IdP -> IdP m () 
        
             GetConfig :: SAML.IdPId -> IdP m (Maybe IP.IdP) 
        
             GetIdByIssuerWithoutTeam :: SAML.Issuer -> IdP m (GetIdPResult SAML.IdPId) 
        
             GetIdByIssuerWithTeam :: SAML.Issuer -> TeamId -> IdP m (Maybe SAML.IdPId)

which gets interpreted back directly into (MonadClient m, Embed m) via

wire-server/services/spar/src/Spar/Sem/IdP/Cassandra.hs

Lines 9 to 21 in 6291a90

    
           idPToCassandra :: 
        
             forall m r a. 
        
             (MonadClient m, Member (Embed m) r) => 
        
             Sem (IdP ': r) a -> 
        
             Sem r a 
        
           idPToCassandra = 
        
             interpret $ 
        
               embed @m . \case 
        
                 StoreConfig iw -> Data.storeIdPConfig iw 
        
                 GetConfig i -> Data.getIdPConfig i 
        
                 GetIdByIssuerWithoutTeam i -> Data.getIdPIdByIssuerWithoutTeam i 
        
                 GetIdByIssuerWithTeam i t -> Data.getIdPIdByIssuerWithTeam i t 
        
                 GetConfigsByTeam itlt -> Data.getIdPConfigsByTeam itlt

and finally has an interpretation of Embed Client into Final IO:

wire-server/services/spar/src/Spar/Sem/SAMLUserStore/Cassandra.hs

Lines 32 to 42 in 6291a90

    
           interpretClientToIO :: 
        
             Members '[Error SparError, Final IO] r => 
        
             ClientState -> 
        
             Sem (Embed Client ': r) a -> 
        
             Sem r a 
        
           interpretClientToIO ctx = interpret $ \case 
        
             Embed action -> withStrategicToFinal @IO $ do 
        
               action' <- liftS $ runClient ctx action 
        
               st <- getInitialStateS 
        
               handler' <- bindS $ throw @SparError . SAML.CustomError . SparCassandraError . cs . show @SomeException 
        
               pure $ action' `Catch.catch` \e -> handler' $ e <$ st

I'd suggest this direction is going to be a smoother move. Rather than introducing a bunch of temporary, "bad" effects, why not just implement the effects we want from the get go and save some engineering effort. I'd be more than happy to help hammer out any technical difficulties in moving this direction if you're interested in pairing!

pcapriotti · 2021-10-27T10:14:58Z

Closing in favour of #1881.

pcapriotti added 13 commits October 21, 2021 15:25

Add type variable to Galley monad

3a42167

This is step 0 in the process of converting galley to effects. We introduce a phantom type variable `r` in the `Galley` monad, which will later be used for the effect row.

Use API instead of DB access in 1-1 conv test

481c08f

Monomorphise Data functions

61fe4de

Avoid MonadUnliftIO in Bilge.RPC

630aad5

Introduce placeholder Concurrency effect

1068bd3

Add temporary async function for the Galley monad

333acf8

Add forkIO

a147c14

Add mapConcurrently

71ab476

Add wait

5785cf0

Add pooledMapConcurrentlyN and friends

930c247

Add concurrently

0aa4bc5

Remove MonadUnliftIO instance for Galley

604d61b

Add CHANGELOG entry

0cdcba2

mdimjasevic approved these changes Oct 21, 2021

View reviewed changes

pcapriotti mentioned this pull request Oct 26, 2021

Galley polysemy (1/5) - Introduce Sem and "access" effects #1881

Merged

4 tasks

pcapriotti closed this Oct 27, 2021

pcapriotti deleted the pcapriotti/galley-polysemy branch April 14, 2022 11:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Introduce polysemy in Galley: part 0#1878

Introduce polysemy in Galley: part 0#1878
pcapriotti wants to merge 13 commits intodevelopfrom
pcapriotti/galley-polysemy

pcapriotti commented Oct 21, 2021

Uh oh!

mdimjasevic left a comment

Uh oh!

mdimjasevic Oct 21, 2021

Uh oh!

isovector commented Oct 21, 2021

Uh oh!

isovector commented Oct 21, 2021

Uh oh!

pcapriotti commented Oct 22, 2021

Uh oh!

isovector commented Oct 22, 2021

Uh oh!

pcapriotti commented Oct 27, 2021 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Comments

Conversation

pcapriotti commented Oct 21, 2021

Background

Explanation of the changes

Next steps

Checklist

Uh oh!

mdimjasevic left a comment

Choose a reason for hiding this comment

Uh oh!

mdimjasevic Oct 21, 2021

Choose a reason for hiding this comment

Uh oh!

isovector commented Oct 21, 2021

Uh oh!

isovector commented Oct 21, 2021

Uh oh!

pcapriotti commented Oct 22, 2021

Uh oh!

isovector commented Oct 22, 2021

Uh oh!

pcapriotti commented Oct 27, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

pcapriotti commented Oct 27, 2021 •

edited

Loading