Commit 3a3d26e

Tooling for maintaining a GHC API
1 parent 1761cda commit 3a3d26e

1 file changed: +197 -0 lines changed

proposals/ghc-api-tooling.md

# Tooling for maintaining a GHC API

## Abstract

This proposal is to build tools to define and maintain a GHC API. Some
automation is necessary to monitor the needs of projects using GHC as a library,
and to make GHC developers aware when their changes affect these projects. With
this knowledge, the involved parts of GHC can be better defined and documented.

## Background

The Haskell Foundation started the GHC API stability initiative last year. This
is a project that aims to identify and mitigate how the GHC compiler affects the
maintenance costs of Haskell tools which use the GHC compiler as a library.

During an [outreach phase], the concern most often cited by tooling authors was
the lack of documentation that would effectively help them use and upgrade the
[ghc library].

[ghc library]: https://hackage.haskell.org/package/ghc
[outreach phase]: https://discourse.haskell.org/t/ghc-api-stability-update-3/11407

Documentation is important for using any library, and the compiler is documented
both in the code and [beyond][ghc commentary]. However, the general sentiment
is that documentation is still lacking. One possible cause is that the
documentation is hard to navigate and discover, for instance because relevant
cross references are missing. Another is that the documentation is written for
an audience with a shared context about GHC internals, which does not always
include the authors of Haskell tooling.

[ghc commentary]: https://gitlab.haskell.org/ghc/ghc-wiki-mirror/-/blob/master/commentary.md

When considering what to document better, GHC developers conjecture that not
all of the GHC implementation is currently used by users of the `ghc` library,
and so it would be necessary to identify which parts of the implementation need
to be documented for external use.

## Problem Statement

The problem this proposal aims to address is identifying the parts of the GHC
implementation that are used in other packages, and improving the documentation
of these parts so it is accessible to an audience not initially acquainted with
the GHC implementation.

If the project succeeds, good documentation will save tooling authors the cost
of discovering what the GHC implementation does by trial and error. In practice,
poor understanding of the GHC implementation translates into a long stream of
bugs to fix in downstream projects until each project finally gets the
understanding right. Additionally, the definition of a GHC API should reduce the
number of changes that Haskell tools need during upgrades of the API.

A solution should make it easy for GHC developers to know when they are about to
change parts of the GHC implementation that are used in other packages, and it
should offer tool authors the documentation they need to make effective use
of the GHC implementation in their projects. This documentation must allow a
newcomer to answer at least which features are offered by the GHC
implementation, how they are used, and what the involved types and functions
mean.

## Prior Art and Related Efforts

To the best of my knowledge, no project has tried before to improve the
documentation of the GHC implementation, though there have been efforts
to refactor the implementation itself to make it easier to maintain and
reuse. This author thinks that accessible documentation amplifies the
benefits of any code changes.

## Technical Content

In order to identify which parts of the GHC implementation are used by other
packages, GHC developers should have an index of the names from the `ghc`
library that are used in a selected set of packages, henceforth called the
indexing set. This index can be used to define the curated subset of the GHC
implementation that will be exposed to tooling authors under some designated
module hierarchy. In this document, we will refer to this curated subset as
the GHC API.

The size of the initial GHC API can be tuned by growing the indexing set
progressively, starting with the projects that are considered most relevant to
the community, and extending it as more resources become available.

The GHC API will indicate the features that need to be documented for external
use, and it will make it possible to flag the changes to GHC that affect it. GHC
developers would then have the opportunity to decide whether to make the changes
backward compatible or document the API changes for their users.

The following phases emerge from these considerations.

### Indexing Phase

This phase should produce a tool that can build the index of names from the
`ghc` library (and perhaps `ghc-lib-parser`) which are used in other packages.
It should be possible to configure which packages or units to include in the
indexing set.

In addition, a library should be provided that allows us to query the index.
It should be possible to answer the following queries:

* The list of names from the `ghc` library that are used by other packages. Note
  that the index should provide enough information to allow importing the name
  (e.g. whether it is a pattern synonym; or if it is the name of a data
  constructor or a field, it should be accompanied by the name of the data type).
* The modules from other units that are using a given name.
* The most commonly used names from the `ghc` library.

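
As an illustration of the three queries listed for the indexing phase, the
following is a minimal Haskell sketch of what an index and its query functions
could look like. It is not the design of any existing tool: the types
`NameKind`, `GhcName`, and `UsageIndex`, all function names, and the sample
entries (`GHC.Example`, `ToolA.Parse`, and so on) are hypothetical, and the
sketch assumes only the `base` and `containers` packages.

```haskell
import           Data.List       (sortOn)
import qualified Data.Map.Strict as Map
import           Data.Ord        (Down (..))
import qualified Data.Set        as Set

-- | What an importer needs to know about a name. Constructors and
-- fields carry the name of their parent data type.
data NameKind
  = Plain                   -- functions, types, classes
  | PatternSynonym
  | DataConstructor String  -- parent data type
  | FieldOf String          -- parent data type
  deriving (Eq, Ord, Show)

data GhcName = GhcName
  { definingModule :: String
  , nameText       :: String
  , nameKind       :: NameKind
  } deriving (Eq, Ord, Show)

-- | Hypothetical index shape: each name from the library maps to the
-- set of downstream modules that use it.
type UsageIndex = Map.Map GhcName (Set.Set String)

-- | Record that a downstream module uses a name.
recordUse :: String -> GhcName -> UsageIndex -> UsageIndex
recordUse userModule name =
  Map.insertWith Set.union name (Set.singleton userModule)

-- | Query 1: every name used by at least one indexed module.
usedNames :: UsageIndex -> [GhcName]
usedNames = Map.keys

-- | Query 2: the downstream modules that use a given name.
usersOf :: GhcName -> UsageIndex -> [String]
usersOf name = maybe [] Set.toList . Map.lookup name

-- | Query 3: the n names with the most distinct using modules.
mostUsed :: Int -> UsageIndex -> [(GhcName, Int)]
mostUsed n = take n . sortOn (Down . snd) . Map.toList . Map.map Set.size

-- | Render the import item for a name, using its kind.
importItem :: GhcName -> String
importItem (GhcName _ txt kind) = case kind of
  Plain              -> txt
  PatternSynonym     -> "pattern " ++ txt
  DataConstructor ty -> ty ++ "(" ++ txt ++ ")"
  FieldOf ty         -> ty ++ "(" ++ txt ++ ")"

-- Made-up entries for illustration; they do not describe real modules.
exampleFn, exampleCon :: GhcName
exampleFn  = GhcName "GHC.Example" "exampleFn" Plain
exampleCon = GhcName "GHC.Example" "ExampleCon" (DataConstructor "ExampleType")

sampleIndex :: UsageIndex
sampleIndex =
    recordUse "ToolA.Parse" exampleFn
  . recordUse "ToolB.Main" exampleFn
  . recordUse "ToolA.Parse" exampleCon
  $ Map.empty
```

Loaded in GHCi, `usersOf exampleFn sampleIndex` lists the two tool modules
recorded for that name, and `mostUsed 1 sampleIndex` ranks `exampleFn` first.
Keying the map by name rather than by using module makes the "who uses this
name" query a single lookup, at the cost of a full traversal for per-package
reports.
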
This phase could be based on the compiler plugin and the analysis script in
[this repo][indexing repo], or it could be based on other indexing solutions.

[indexing repo]: https://github.com/tweag/ghc-api-usage-stats

### API generation phase

This phase should produce a tool that generates or regenerates modules in the
GHC API from the index. If a module does not exist yet, it should be
created from some configurable template. If the module already exists, the tool
should edit the export list and import declarations while trying to preserve
the contents in the rest of the module file. Other generators sometimes
implement special comments to designate lines that should not be modified by
the generator.

The tool should probably allow us to specify rules to indicate a few things:

* which names should be exposed in which modules
* which modules should be used to bring some names into scope
* which names should be excluded from export despite appearing in the index.
  A file with a list of excluded names should be generated if using globbing
  or similar in the rules, so new excluded names are made visible when
  regenerating the API.

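
To make the rule-driven generation step more concrete, here is a small Haskell
sketch of how rules might assign indexed names to API modules and collect the
excluded ones. Everything in it is an assumption made for illustration: the
`Rule` type, the one-asterisk glob syntax, the first-match-wins ordering, the
extra `unplaced` bucket for names that no rule covers, and the module names
used in the usage note. It covers only the planning step, not the editing of
export lists in existing module files.

```haskell
import           Data.List       (isPrefixOf, isSuffixOf)
import qualified Data.Map.Strict as Map

-- | Fully qualified name as it would appear in the index.
type Name = String

-- | Hypothetical rule language. Patterns may start or end with one '*'.
data Rule
  = ExposeIn String String  -- pattern, API module that exports the matches
  | Exclude String          -- pattern for names to keep out of the API

rulePattern :: Rule -> String
rulePattern (ExposeIn pat _) = pat
rulePattern (Exclude pat)    = pat

-- | Minimal glob matching: trailing '*' is a prefix match, leading '*'
-- is a suffix match, anything else must match exactly.
matches :: String -> Name -> Bool
matches pat name
  | "*" `isSuffixOf` pat = init pat `isPrefixOf` name
  | "*" `isPrefixOf` pat = drop 1 pat `isSuffixOf` name
  | otherwise            = pat == name

data Plan = Plan
  { exports  :: Map.Map String [Name]  -- API module -> names to export
  , excluded :: [Name]                 -- matched an Exclude rule
  , unplaced :: [Name]                 -- matched no rule at all
  } deriving (Eq, Show)

-- | Apply the first matching rule to each name, preserving input order.
planApi :: [Rule] -> [Name] -> Plan
planApi rules = foldr place (Plan Map.empty [] [])
  where
    place name plan =
      case filter (\r -> matches (rulePattern r) name) rules of
        Exclude _ : _    -> plan { excluded = name : excluded plan }
        ExposeIn _ m : _ ->
          plan { exports = Map.insertWith (++) m [name] (exports plan) }
        []               -> plan { unplaced = name : unplaced plan }

-- | Contents of the generated file that lists excluded names.
excludedFile :: Plan -> String
excludedFile = unlines . excluded
```

With the rules `[Exclude "GHC.Example.Internal.*", ExposeIn "GHC.Example.*"
"GHC.API.Example"]`, a name such as `GHC.Example.Internal.helper` lands in the
excluded list because the exclusion is listed first, while `GHC.Example.parse`
is assigned to `GHC.API.Example`. Writing `excludedFile` to a tracked file
would let a regeneration show newly excluded names as an ordinary diff.
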
### Documentation review phase

In this phase, the code documentation of GHC needs to be reviewed, and procedures
need to be documented to keep it up to date.

For the review part, a team of a newcomer and an experienced contributor should
systematically review the documentation of each module in the exposed subset,
perhaps starting with the most commonly used definitions as indicated by the
index queries.

For the update procedures, it should be documented what the GHC API is, how to
update it, and when to update it. Newcomers should be invited to request
documentation improvements. Documentation improvements should be made fast and
easy to merge. Maybe most continuous integration (CI) jobs could be skipped for
documentation updates except for some linting.

Additionally, GHC's CI needs to check that the GHC API has not changed, so that
any modification is flagged. Tooling to do this already exists for other parts
of GHC, so this task should be mostly about configuration work.

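
The kind of check meant here can be pictured as comparing a committed snapshot
of the API against a freshly generated one. The Haskell sketch below is only an
illustration under stated assumptions and does not describe the tooling GHC
already has: the snapshot format (one `name :: type` line per exported name),
the `Change` type, and all function names are hypothetical.

```haskell
import qualified Data.Map.Strict as Map

-- | Hypothetical snapshot: exported name -> rendered type signature.
type Snapshot = Map.Map String String

data Change
  = Removed String
  | Added String
  | Retyped String String String  -- name, old signature, new signature
  deriving (Eq, Show)

-- | Parse lines of the assumed form "name :: type"; other lines are skipped.
parseSnapshot :: String -> Snapshot
parseSnapshot txt =
  Map.fromList [ (n, unwords ty) | (n : "::" : ty) <- map words (lines txt) ]

-- | Compare the committed snapshot with the current one.
diffApi :: Snapshot -> Snapshot -> [Change]
diffApi old new =
     [ Removed n | n <- Map.keys (Map.difference old new) ]
  ++ [ Added n   | n <- Map.keys (Map.difference new old) ]
  ++ [ Retyped n a b
     | (n, (a, b)) <- Map.toList (Map.intersectionWith (,) old new)
     , a /= b
     ]

-- | Additions are treated as compatible; removals and signature changes
-- are treated as breaking. This policy is an assumption of the sketch.
isBreaking :: Change -> Bool
isBreaking (Added _) = False
isBreaking _         = True
```

A CI job built along these lines could fail, or require an explicit snapshot
update in the same merge request, whenever `any isBreaking (diffApi old new)`
holds. A comparison of signatures like this one cannot see changes in behavior
that leave the types untouched.
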
### Risks and Limitations

The project could fail if the size of the GHC API exceeds the capacity of
the community to document it all. In such a case, the project should still be
helpful to identify the areas of the GHC implementation that still need
additional effort to better support their exposure.

Not all changes to the GHC API will be possible to detect automatically. In
particular, changes in behavior that don't modify types or the type signatures
of functions will go unnoticed. The proposal could be extended to try to detect
changes to the documentation of definitions that appear in the GHC API, but
there will still be nuances of behavior that are unlikely to be caught in
documentation either.

## Timeline

There are no specific deadlines for this project.

## Budget

The cost of this project involves the engineering time needed to perform
the identified phases. The following is a rough guess from the proposer,
but it needs to be refined with whoever is appointed to execute the project.

```
Indexing phase --- 40 hours
API generation phase --- 80 hours
Documentation review phase --- depends on the chosen indexing set
```

The actual money required also needs to be negotiated with the appointed
developers.

## Stakeholders

* GHC developers
* Tooling authors from the [outreach phase]
* Users of Haskell tools who need them to stay up to date

## Success

The project will be successful if the users of the `ghc` library have an
accurate understanding of what it will take to upgrade their projects to use a
newer version of the compiler by reading changelogs and the API documentation,
thus eliminating the trial and error costs.

The project will be successful too if accidental breakage of downstream tooling
is avoided thanks to the definition of a GHC API whose modifications are
flagged by GHC's CI.
