Module naming conventions for GHC base libraries #53

Merged
merged 21 commits into from
Oct 20, 2023
145 changes: 145 additions & 0 deletions proposals/0000-ghc-module-naming.rst
@@ -0,0 +1,145 @@
.. sectnum::

**Module naming conventions for GHC base libraries**

Background and motivation
===========================
The accepted `Proposal #51: GHC base libraries <https://github.com/haskellfoundation/tech-proposals/blob/main/proposals/accepted/051-ghc-base-libraries.rst>`_
defines the following libraries:

* ``base``: the foundational library on which the rest of the ecosystem is based.

  * Its API is carefully curated by the `Core Libraries Committee <https://github.com/haskell/core-libraries-committee>`_, and is kept rather stable.

* ``ghc-experimental``: the home of experimental extensions to GHC, usually ones proposed by the
  `GHC Steering Committee <https://github.com/ghc-proposals/ghc-proposals/>`_.

  * Functions and types in here are usually candidates for later transfer into ``base``. But not necessarily: if a collection of functions is not adopted widely enough, it may not be proposed for a move to ``base``.

  * It is user-facing (users are encouraged to depend on it), but its API is less stable than ``base``.

* ``ghc-prim``, ``ghc-internals`` (and perhaps others): define functions and data types used internally by GHC to support the API of ``base`` and ``ghc-experimental``.

  * These libraries come with no stability guarantees: they may change at short notice.

In addition we already have:

* ``ghc``: this library exposes GHC as a library, through the (currently ill-defined) GHC API.

All these libraries follow the Haskell Package Versioning Policy (PVP).

The question arises of *what module names should be used*. For example, suppose that all three exposed a module called ``Data.Tuple``. In principle that would be fine -- GHC allows you
to use the package name in the ``import`` statement, to disambiguate. But it's *extremely* confusing. This proposal articulates a set of conventions to
help us design module names.
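
As a minimal sketch of the disambiguation mentioned above (not part of the proposal itself), the ``PackageImports`` extension lets an ``import`` name the package that must supply the module. The example uses only ``base``, since the other libraries discussed here are not assumed to be installed:

.. code-block:: haskell

    {-# LANGUAGE PackageImports #-}
    module Main (main) where

    -- The string literal names the package that must provide Data.Tuple.
    import "base" Data.Tuple (swap)

    main :: IO ()
    main = print (swap (1 :: Int, "one"))

Every import of a clashing module name would need such an annotation, which is part of why distinct module names are the more readable option.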

The proposal
============

This proposal is split into four sub-proposals for easier discussion. Each sub-proposal builds on the
earlier ones -- they are increments, not alternatives.

The goals of this proposal are deliberately limited to establishing naming conventions. We do not propose
any changes to ``ghc`` or to ``cabal``.

Proposal 1
-----------

* Modules in ``base``, ``ghc-experimental``, ``ghc-prim``, ``ghc-internals`` etc should all have distinct names.

That principle leads immediately to the question: what should those names be? Hence proposal 2.

Proposal 2
-----------

* Modules in GHC's internal libraries (``ghc-prim``, ``ghc-internals`` etc) should be of form ``GHC.*``.
* Modules in ``ghc-experimental`` should be of form ``Experimental.*``.

I’d recommend making Experimental a suffix.

Rationale: Data.Tuple.Experimental is a companion/extension of Data.Tuple; some exports may move from one to the other. Many developers sort their imports alphabetically. Making this a suffix means all Data.Tuple-related imports are next to each other.

Ok (omitting explicit import lists and qualifiers):

import Control.Applicative
import Control.Arrow
import Data.Tuple
import Experimental.Control.Applicative
import Experimental.Foreign.C
import Foreign.C

Better:

import Control.Applicative
import Control.Applicative.Experimental
import Control.Arrow
import Data.Tuple
import Foreign.C
import Foreign.C.Experimental

Also, maybe people will use the idiom

import Data.List qualified as L
import Data.List.Experimental qualified as L

which is also nicer if both qualified as L are next to each other.

Contributor Author

Clearly this is a matter of taste. Personally I find it easier to think of Experimental.* as a complete sub-tree of modules, all experimental. And it's consistent with using a prefix GHC.* or GhcAPI.* elsewhere.

But it doesn't matter what I feel provided whatever we do

  • Satisfies the maximum number of users
  • Is carried through consistently (e.g. all modules in ghc-experimental end in .Experimental).

I would love to hear from others about prefix-vs-postfix.

Ah, below you write

  • This sort of naming is conventionally used to distinguish modules within a package, not between packages.

which is only partly true – the module namespace prefixes are not always package prefixes; quite a few packages have modules in various logical places (Data. and Control.). The module namespace groups things by concept (or at least tries to).

And due to ghc-experimental’s nature it’s expected that it defines things both in Data and Control and Foreign.

Contributor

I also think suffix is better. We've had very good luck not having package-distinct prefixes --- I am not sure why practice has gone better than theory! --- so I think it is OK to keep on doing that here.

Contributor

I'm also in favour of the .Experimental suffix. I imagine Data.List.Experimental could re-export the contents of Data.List and add some extra experimental goodies; it's then a very small delta to switch between import Data.List and import Data.List.Experimental, whereas import Experimental.Data.List would appear somewhere completely different if imports are sorted.

Contributor

I think the reasoning for using a suffix is convincing.

* Modules in ``base`` should not have either of these prefixes.

So, for example, we might have

* ``GHC.Tuple`` in ``ghc-internals``,
* ``Experimental.Tuple`` or ``Experimental.Data.Tuple`` in ``ghc-experimental``,
* ``Data.Tuple`` in ``base``.

Proposal 3
-----------

The current ``base`` API exposes many modules starting with ``GHC.*``, so the proposed conventions could only
apply to *new* modules.

* Over time, and only with the agreement and support of the Core Libraries Committee, we may remove some ``GHC.*`` modules
  from ``base``, especially ones that are barely used, or are manifestly "internal" (i.e. part of the implementation
  of other, more public functions).
  Of course there would be a significant deprecation cycle, to allow client libraries to adapt.

Proposal 3 only expresses a direction of travel. We will have to see what the CLC's attitude is,
and what the Haskell community thinks. Anything that disturbs the API of base needs to be considered
rather carefully.


Proposal 4
------------

* The public API of package ``ghc`` (GHC as a library) should have modules of form ``GhcAPI.*``.

Is there some particular reason to use a "merged prefix" (GhcAPI.*) instead of using the module structure to express the relationship (GHC.API.* - this is the API of GHC)? The latter feels more natural.

This is cosmetics, but I find it strange that suddenly Ghc in GhcAPI is cased like this.

I would find GHC.API. more natural, but I understand that you want to avoid having the public API inside the namespace for the rest of GHC.

There is the Language. prefix commonly used by most compiler-like-libraries (e.g. haskell-src-exts, ghc-parser, ghc-opts). We could reasonably join this namespace instead of inventing our own, maybe using Language.Haskell.GHC.* or, to avoid overly long names, Language.GHC.*.

Contributor

None of the alternatives for proposal 4 are really completely convincing. Moreover, I don't think we need to make a decision now, given that there is no immediate prospect of the GHC API redesign happening. If it does happen, whoever drives it forward will be best placed to decide on naming. So I think this proposal could simply establish the principle that it will have a distinguishable module naming convention, but not yet make a concrete choice.

Contributor

Yes I would prefer it only talk about libraries we've planned on making.


All of the modules in package ``ghc`` currently start with ``GHC.*``, which correctly signals that they are part of GHC's internals.
As part of the GHC API redesign (a HF project in its own right, currently stalled) it would be very helpful
to have modules with stable APIs, and a new prefix, such as ``GhcAPI.*``.
Contributor

I am not so sure about this. bifurcating GHC into "public API" vs "internals" is just one way to go about organizing its code (and I personally don't think it is a good one). Regardless of whether it is good or bad, the module naming I think should be independent of such matters of internal organization: modules in GHC should be named the same way whether they are more or less stable.

(If we really had a stable division to make, we would have a separate library per https://nikita-volkov.github.io/internal-convention-is-a-mistake/, but we are nowhere near that.)

"API" is also, I think, a bad bargain: the "application" and "programming" don't mean anything in particular, and "interface" is quite redundant when all module naming is for interface purposes.


GHC. I still think is the best name for the compiler proper. GHC stands for Glasgow Haskell Compiler --- it's in the name.

The counterparts to ghc-internals/ghc-prim are called things like libgcc (GCC) or libcompiler-rt (Clang). The lib prefix of C/C++ libraries is doing some work here; compiler-rt is rather generic, but I think the rt is good. We have "RTS" for just the C part, but it might almost be better to think of the runtime as something encompassing all of rts + ghc-prim + ghc-internals. None of this stuff is part of the compiler proper, but all of it is unstable "support code" propping up the code the user actually wrote. And functionality does in fact move between the various parts of it with some fluidity --- for example @dcoutts is working on moving some IO stuff back into C for the threaded runtime (like the unthreaded runtime does today) for more performance io_uring support.

So I don't yet have an alternative name I really like, but given the above, maybe something like GHR for "Glasgow Haskell Runtime" would make sense.

Collaborator

"bifurcating GHC into "public API" vs "internals" is just one way to go about organizing its code" -- I think/hope that the ghc api proposal is not this. Rather, it is to create a new set of modules with new functions that promise to be a stable (and usable) api, and which are not intended for internal use by ghc. That is to say, ghc itself does not dogfood the ghc api (although ghci may choose to do so) and instead the functions provided by the api are designed from the start for external consumption, perhaps glossing over certain tricky but rare bits at first, especially ones highly dependent on less stable datatypes. As such, having a distinct namespace of some sort for the API is very important regardless.

I can imagine such a package perhaps evolving over time to be a distinct package on top of ghc, but that seems orthogonal to this discussion. And, well, now that I think about it, since the API proposal currently is stalled, we probably need not worry about it at all in this discussion -- I just wanted to clarify what the idea is, last I understood it, for the sake of future discussion.

Contributor Author

"bifurcating GHC into "public API" vs "internals" is just one way to go about organizing its code" -- I think/hope that the ghc api proposal is not this. Rather, it is to create a new set of modules with new functions that promise to be a stable (and usable) api, and which are not intended for internal use by ghc. That is to say, ghc itself does not dogfood the ghc api (although ghci may choose to do so) and instead the functions provided by the api are designed from the start for external consumption, perhaps glossing over certain tricky but rare bits at first, especially ones highly dependent on less stable datatypes. As such, having a distinct namespace of some sort for the API is very important regardless.

Yes, @gbaz has it exactly right. GhcAPI.* is a shim around the internal modules GHC.*, one that has stronger stability guarantees. But those internal modules must remain available because we can't predict everything that a client of the ghc library may want to do. I hope that if someone finds they can only do something through GHC.* they will petition the GHC API working group (still in the womb) to add a suitable function to GhcAPI.*.

We must be able to reorganise and refactor GHC's internals without constraint. At the moment we simply don't know what clients of ghc are relying on, so we may well mess up their lives without ever knowing. If we had GhcAPI.* we'd know which modules we needed to take (much) more care with.

The details are not important; I'm just using this proposal as a way to establish, in principle, a namespace for a stable GHC API.

Contributor

@Ericson2314 Jul 26, 2023

I like ignoring it for now; I would like it if this proposal didn't mention that at all, and if/when we create that stable layer on top we worry about its module names then.

That brings us back to using GHC.* both for the compiler (internals) and for ghc-internals/ghc-prim. In another thread, this was brought up as a deficiency. This proposal doesn't yet talk about this problem, but shouldn't it?



Timescale
==========
The first release of GHC with ``ghc-experimental`` and ``ghc-internals`` will be GHC 9.10, which we expect to
release in early 2024. It would be good to establish naming conventions for modules well before this date.

Example lifecycle
===================

By way of example, consider the ``HasField`` class, which supports overloaded record fields.
It is currently defined in ``base:GHC.Records``, which is an odd module to have to import.
Moreover, more than one GHC proposal suggests changes to its design (e.g. see `GHC Proposal 158 <https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0158-record-set-field.rst>`_); it is not nearly as stable as most of ``base``.

If ``ghc-experimental`` had existed we would have put it in ``ghc-experimental:Experimental.Records``.
That would have made it clear that the design of overloaded records is still evolving.
Once the design becomes settled and stable, it could move to ``base``, perhaps in a module like ``Data.Records``.
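
For illustration only, this is roughly what client code using ``HasField`` looks like today, assuming a GHC recent enough that ``GHC.Records`` is exposed by ``base`` and the constraint is solved automatically for record fields. The ``Person`` type is hypothetical. Under the lifecycle sketched above, the main visible change for such a client would be the module named in the import line:

.. code-block:: haskell

    {-# LANGUAGE DataKinds #-}
    {-# LANGUAGE TypeApplications #-}
    module Main (main) where

    -- Today the class is imported from a GHC.* module exposed by base.
    import GHC.Records (HasField (..))

    -- Hypothetical record type, used only for this example.
    data Person = Person { name :: String, age :: Int }

    main :: IO ()
    main = putStrLn (getField @"name" (Person "Ada" 36))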

Other similar examples include

* The tuple proposal of `GHC Proposal 475 <https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0475-tuple-syntax.rst>`_
* The `DataToTag CLC proposal <https://github.com/haskell/core-libraries-committee/issues/104>`_ would have been easier to expose through ``ghc-experimental`` in the first instance.

I don't think this is a good example; the motivation for that proposal is not convincing unless the existing dataToTag# and getTag exposed from ghc-prim and base get improved types.

Contributor Author

OK. I have replaced this example with the exceptions proposal.


Alternatives
==============
* We could dispute Proposal 1: one could imagine deliberately naming modules in ``ghc-experimental`` with the
  same module name as their eventual expected (by someone) home in ``base``. The goal would be to reduce impact if and when
  the module moves from ``ghc-experimental`` to ``base``. For example, we might add ``Data.Tuple`` to ``ghc-experimental`` containing the new type constructors ``Tuple2``, ``Tuple3`` etc that are proposed in `GHC Proposal 475 <https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0475-tuple-syntax.rst>`_. However:

  * In the meantime there are two modules both called ``Data.Tuple``. This is bad. Which one does ``import Data.Tuple`` import? (Look at the Cabal file, perhaps?) How can I import both? (Package-qualified imports perhaps.) So it will really only help in the case of a brand-new module, not already in ``base``.
  * It loses the explicit cue, in the source code, given by ``import Experimental.Data.Tuple``.

* We could use ``GHC.*`` for modules in ``ghc-experimental``, and maybe ``GHC.Internals.*`` for modules in ``ghc-internals``. But

If we adopt the proposal of having a module warning for the internal modules, then perhaps we don't need a ghc-internals-specific module prefix and we could just use GHC. Not sure if that's a good idea.

Contributor

@Ericson2314 Aug 4, 2023

That is true, but the "broader runtime" (which this is) vs the compiler itself are very different things. I think GHC.Internals isn't even different enough (say we wanted to use GHC.Internals for extra unstable parts of GHC itself?) but at least it is something to differentiate.


  * There are two sorts of GHC-specific-ness to consider:

    * Modules that are part of GHC's implementation
    * Modules that support a GHC extension, blessed by the GHC Steering Committee

    It is worth distinguishing these: it's confusing if both start with ``GHC.``.

  * It would be a huge upheaval (with impact on users) to rename hundreds of modules in ``ghc-internals``.
Contributor

Is this really the case? At the moment ghc-internals doesn't yet exist, so nobody depends on it! And the whole point is that most users should not depend on it, they should depend on base instead!

So why couldn't we establish the convention that ghc-internals uses GHC.Internal.* as its preferred module prefix, at least for new modules? We could still move over existing modules from base without renaming them if that was practically easier, and then rename them later. Renaming modules in ghc-internals should be fairly cheap as users aren't supposed to import them anyway, so the only package affected should be base.

Contributor

If we land on GHC.Internal and Foo.Experimental, I feel like the package should be named ghc-internal without the s, for consistency

Contributor Author

Is this really the case? At the moment ghc-internals doesn't yet exist, so nobody depends on it! And the whole point is that most users should not depend on it, they should depend on base instead!

I anticipate that many of the modules in base will move to ghc-internals, leaving behind a shim module. But I suppose it is true that when we move, say GHC.Base into ghc-internals we could

  • Rename it to GHC.Internal.Base, in ghc-internals
  • Leave behind a shim module in base that
    • Is called GHC.Base
    • Imports GHC.Internal.Base and re-exports it all

Hmm. That's true. Moreover, because for a long time (possibly forever) we will have modules like GHC.Base in base (for back-compat reasons) it would be good if the module in ghc-internals had a different name.

Contributor

I think it's OK if deprecated modules have non-standard names. If anything it is good; it adds an extra signal saying what ought to be used.

Contributor

  • Leave behind a shim module in base that
    • Is called GHC.Base
    • Imports GHC.Internal.Base and re-exports it all

Yes. Ideally, GHC.Base will re-export individual declarations from GHC.Internal.Base using an explicit export list (not a module export). That way any new definitions added to GHC.Internal.Base won't accidentally leak into the base API.

So I think it's positively helpful to give the internal modules new names. Another example: we could add module-level DEPRECATED pragmas to the base shim modules prior to removing them.
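
A minimal sketch of the shim pattern discussed in this thread, under stated assumptions: ``Compat.Base`` is a hypothetical module name, and ``GHC.Base`` stands in for the internal module because ``GHC.Internal.Base`` is not assumed to exist. The explicit export list keeps new definitions in the underlying module from leaking through the shim, and the module-level ``DEPRECATED`` pragma (with a purely illustrative message) shows how a later removal could be signalled in advance:

.. code-block:: haskell

    {-# LANGUAGE NoImplicitPrelude #-}
    -- Hypothetical shim module: re-exports selected declarations only.
    module Compat.Base
      {-# DEPRECATED "Illustrative message: import the replacement module instead" #-}
      ( map
      , otherwise
      ) where

    -- Import exactly what is re-exported. Anything added to GHC.Base later
    -- stays hidden unless it is also added to the export list above.
    import GHC.Base (map, otherwise)

A whole-module re-export (``module GHC.Base`` in the export list) would be shorter, but it would silently widen the shim's API whenever the underlying module grows.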


* We could use ``GHC.Experimental.*`` for modules in ``ghc-experimental``. But that seems a bit backwards: ``GHC.Tuple`` (in ``ghc-internals``) would superficially appear more stable (less experimental) than ``GHC.Experimental.Tuple`` in ``ghc-experimental``; but the reverse is the case.

* We could use a suffix ``*.Internals`` or ``*.Experimental`` instead of a prefix. But

  * This sort of naming is often used to distinguish modules *within* a package, not *between* packages.
  * In the case of ``ghc-internals`` it would still suffer from the cost of renaming hundreds of modules.

* Concerning Proposal 4, we could instead use

  * ``GHC.API`` (but then the public namespace is inside the internal one)
  * ``GHCAPI``
  * ``GHCapi``
  * ``Language.Haskell.GHC`` or ``Language.GHC``