| 1 | +# Tooling for maintaining a GHC API |
| 2 | + |
| 3 | +## Abstract |
| 4 | + |
| 5 | +This proposal is to build tools to define and maintain a GHC API. Some |
| 6 | +automation is necessary to monitor the needs of projects using GHC as a library, |
| 7 | +and to make GHC developers aware when their changes affect these projects. With |
| 8 | +this knowledge, the involved parts of GHC can be better defined and documented. |
| 9 | + |
| 10 | +## Background |
| 11 | + |
| 12 | +The Haskell Foundation started the GHC API stability initiative last year. This |
| 13 | +is a project that aims to identify and mitigate how the GHC compiler affects the |
| 14 | +maintenance costs of Haskell tools which use the GHC compiler as a library. |
| 15 | + |
| 16 | +During an [outreach phase], the most cited concern by tooling authors was the lack |
| 17 | +of documentation that would effectively help them use and upgrade the |
| 18 | +[ghc library]. |
| 19 | + |
| 20 | +[ghc library]: https://hackage.haskell.org/package/ghc |
| 21 | +[outreach phase]: https://discourse.haskell.org/t/ghc-api-stability-update-3/11407 |
| 22 | + |
| 23 | +Documentation is essential to using any library, and the compiler is documented |
| 24 | +both in the code and [beyond][ghc commentary]. However, the general sentiment |
| 25 | +is that documentation is still lacking. First, this could be because the |
| 26 | +documentation is hard to navigate and discover, for instance when relevant |
| 27 | +cross references are missing. Second, it could be because the |
| 28 | +documentation is written for an audience with a shared context about GHC |
| 29 | +internals, which does not always include the authors of Haskell tooling. |
| 30 | + |
| 31 | +[ghc commentary]: https://gitlab.haskell.org/ghc/ghc-wiki-mirror/-/blob/master/commentary.md |
| 32 | + |
| 33 | +When considering what to document better, GHC developers conjecture that not |
| 34 | +all of the GHC implementation is currently used by users of the `ghc` library, |
| 35 | +and so it would be necessary to identify which parts of the implementation need |
| 36 | +to be documented for external use. |
| 37 | + |
| 38 | +## Problem Statement |
| 39 | + |
| 40 | +The problem this proposal aims to address is identifying the parts of the GHC |
| 41 | +implementation that are used in other packages, and improving the documentation |
| 42 | +of these parts so it is accessible to an audience not initially acquainted with |
| 43 | +the GHC implementation. |
| 44 | + |
| 45 | +If the project succeeds, good documentation will save tooling authors the cost |
| 46 | +of discovering what the GHC implementation does by trial and error. In practice, |
| 47 | +poor understanding of the GHC implementation translates into a long stream of |
| 48 | +bugs to fix in downstream projects until each project finally gets the |
| 49 | +understanding right. Additionally, the definition of a GHC API should reduce the |
| 50 | +number of changes that Haskell tools need during upgrades of the API. |
| 51 | + |
| 52 | +A solution should make it easy for GHC developers to know when they are about to |
| 53 | +change parts of the GHC implementation that are used in other packages, and it |
| 54 | +should offer tool authors the documentation they need to make effective use |
| 55 | +of the GHC implementation in their projects. This documentation must allow a |
| 56 | +newcomer to answer at least which features are offered by the GHC |
| 57 | +implementation, how they are used, and what the involved types and |
| 58 | +functions mean. |
| 59 | + |
| 60 | +## Prior Art and Related Efforts |
| 61 | + |
| 62 | +To the best of my knowledge, no project has previously tried to improve the |
| 63 | +documentation of the GHC implementation, though there have been efforts |
| 64 | +to refactor the implementation itself to make it easier to maintain and |
| 65 | +reuse. I think that accessible documentation amplifies the |
| 66 | +benefits of any code changes. |
| 67 | + |
| 68 | +## Technical Content |
| 69 | + |
| 70 | +In order to identify which parts of the GHC implementation are used by other |
| 71 | +packages, GHC developers should have an index of the names from the `ghc` |
| 72 | +library that are used in a selected set of packages, henceforth called the |
| 73 | +indexing set. This index can be used to define the curated subset of the GHC |
| 74 | +implementation that will be exposed to tooling authors under some designated |
| 75 | +module hierarchy. In this document, we will refer to this curated subset as |
| 76 | +the GHC API. |
| 77 | + |
| 78 | +The size of the initial GHC API can be tuned by growing the indexing set |
| 79 | +progressively, starting with the projects that are considered most relevant to |
| 80 | +the community, and expanding it as more resources become available. |
| 81 | + |
| 82 | +The GHC API will indicate the features that need to be documented for external |
| 83 | +use, and it will make it possible to flag the changes to GHC that affect it. GHC |
| 84 | +developers would then have the opportunity to decide whether to make the changes |
| 85 | +backward compatible or document the API changes for their users. |
| 86 | + |
| 87 | +The following phases emerge from these considerations. |
| 88 | + |
| 89 | +### Indexing Phase |
| 90 | + |
| 91 | +This phase should produce a tool that can build the index of names from the |
| 92 | +`ghc` library (and perhaps `ghc-lib-parser`) which are used in other packages. |
| 93 | +It should be possible to configure which packages or units to include in the |
| 94 | +indexing set. |
| 95 | + |
| 96 | +In addition, a library should be provided that allows us to query the index. |
| 97 | +The following queries should be possible to answer: |
| 98 | + |
| 99 | +* The list of names from the `ghc` library that are used by other packages. Note |
| 100 | + that the index should provide enough information to allow importing the name |
| 101 | + (e.g. whether it is a pattern synonym; or if it is the name of a data |
| 102 | + constructor or a field, it should be accompanied by the name of the data type). |
| 103 | +* The modules from other units that are using a given name |
| 104 | +* The most commonly used names from the `ghc` library |
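The three queries above can be sketched as a small Haskell library. This is only an illustration under assumptions: the types `ApiName`, `NameKind`, and `Index`, and the functions `usedNames`, `usersOf`, `mostUsed`, and `importItem` are hypothetical names that come neither from this proposal nor from the `ghc` library, and the index is assumed to fit in memory as a `Data.Map` from the `containers` package.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (sortOn)
import Data.Ord (Down (..))

-- What kind of entity a name refers to. This is the extra information
-- needed to write a correct import item for the name.
data NameKind
  = Function
  | TypeOrClass
  | PatternSynonym
  | DataConstructor String -- carries the name of the parent data type
  | RecordField String     -- carries the name of the parent data type
  deriving (Eq, Ord, Show)

-- A name defined in the indexed library (hypothetical representation).
data ApiName = ApiName
  { definingModule :: String
  , baseName :: String
  , nameKind :: NameKind
  } deriving (Eq, Ord, Show)

-- For every name, the downstream modules that use it.
type Index = Map.Map ApiName [String]

-- Query 1: all names used by other packages.
usedNames :: Index -> [ApiName]
usedNames = Map.keys

-- Query 2: the modules from other units that use a given name.
usersOf :: ApiName -> Index -> [String]
usersOf = Map.findWithDefault []

-- Query 3: the n most commonly used names, with their number of users.
mostUsed :: Int -> Index -> [(ApiName, Int)]
mostUsed n = take n . sortOn (Down . snd) . Map.toList . Map.map length

-- Render the import item that brings a name into scope.
-- Operator names would additionally need parentheses; that is not handled.
importItem :: ApiName -> String
importItem (ApiName _ n k) = case k of
  PatternSynonym -> "pattern " ++ n
  DataConstructor t -> t ++ "(" ++ n ++ ")"
  RecordField t -> t ++ "(" ++ n ++ ")"
  _ -> n
```

Keeping the parent type next to constructors and fields is what lets `importItem` produce an item such as `HsExpr(HsVar)` instead of a bare constructor name, which would not be a valid import.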
| 105 | + |
| 106 | +This phase could be based on the compiler plugin and the analysis script in |
| 107 | +[this repo][indexing repo], or it could be based on other indexing solutions. |
| 108 | + |
| 109 | +[indexing repo]: https://github.com/tweag/ghc-api-usage-stats |
| 110 | + |
| 111 | +### API Generation Phase |
| 112 | + |
| 113 | +This phase should produce a tool that generates or regenerates modules in the |
| 114 | +GHC API from the index. If a module does not exist yet, it should be |
| 115 | +created from some configurable template. If the module already exists, the tool |
| 116 | +should edit the export list and import declarations while trying to preserve |
| 117 | +the contents in the rest of the module file. Other generators sometimes |
| 118 | +implement special comments to designate lines that should not be modified by |
| 119 | +the generator. |
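One way to preserve the rest of a module while regenerating its export list is to confine the generator to a region delimited by marker comments. The following is a minimal sketch of that idea, not a design decided by this proposal: the marker texts and the functions `renderExports` and `regenerate` are hypothetical, and the module file is treated as a plain list of lines rather than being parsed.

```haskell
-- Hypothetical marker comments delimiting the generated region.
beginMarker, endMarker :: String
beginMarker = "-- BEGIN GENERATED EXPORTS"
endMarker = "-- END GENERATED EXPORTS"

-- Render the opening of an export list, one name per line.
-- The closing ") where" is expected to follow the end marker in the file.
renderExports :: [String] -> [String]
renderExports [] = ["  ("]
renderExports (n : ns) = ("  ( " ++ n) : map ("  , " ++) ns

-- Replace the lines between the markers and keep every other line as is.
-- Returns Nothing when a marker is missing, so that the caller can fall
-- back to creating the module from the template.
regenerate :: [String] -> [String] -> Maybe [String]
regenerate generated src =
  case break (== beginMarker) src of
    (before, _ : rest) ->
      case break (== endMarker) rest of
        (_, _ : after) ->
          Just (before ++ [beginMarker] ++ generated ++ [endMarker] ++ after)
        _ -> Nothing
    _ -> Nothing
```

A line-based approach like this is simple but fragile if someone edits the markers by hand. A real tool might prefer to parse the module header and rewrite only the export list and import declarations.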
| 120 | + |
| 121 | +The tool should probably allow us to specify rules to indicate a few things: |
| 122 | +* which names should be exposed in which modules |
| 123 | +* which modules should be used to bring some names into scope |
| 124 | +* which names should be excluded from export despite appearing in the index. |
| 125 | + If the rules use globbing or similar, a file listing the excluded names |
| 126 | + should be generated, so newly excluded names are made visible when |
| 127 | + regenerating the API. |
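To make the rules above more concrete, here is a minimal sketch of how they could be represented and applied. Everything in it is an assumption made for illustration: the `Rule` and `Decision` types, the functions `decide` and `applyRules`, the first-match-wins policy, and the use of prefix matching as a stand-in for globbing. Rules that select the module used to bring a name into scope are left out for brevity.

```haskell
import Data.List (isPrefixOf)
import qualified Data.Map.Strict as Map

-- Hypothetical rule language. Prefix matching stands in for globbing.
data Rule
  = ExposeIn String String -- name prefix, API module that exports matches
  | Exclude String         -- name prefix to keep out of the API

data Decision = Exposed String | Excluded | Unassigned
  deriving (Eq, Show)

-- The first matching rule wins.
decide :: [Rule] -> String -> Decision
decide [] _ = Unassigned
decide (r : rs) name = case r of
  ExposeIn p m | p `isPrefixOf` name -> Exposed m
  Exclude p | p `isPrefixOf` name -> Excluded
  _ -> decide rs name

-- Split the indexed names into per-module export lists and the list of
-- excluded names. The latter is what would be written to the generated
-- file of exclusions. Names matched by no rule appear in neither result.
applyRules :: [Rule] -> [String] -> (Map.Map String [String], [String])
applyRules rules names =
  ( Map.fromListWith (flip (++))
      [(m, [n]) | n <- names, Exposed m <- [decide rules n]]
  , [n | n <- names, decide rules n == Excluded]
  )
```

Returning the excluded names as data, rather than dropping them silently, is what makes it possible to review newly excluded names each time the API is regenerated.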
| 128 | + |
| 129 | +### Documentation Review Phase |
| 130 | + |
| 131 | +In this phase, the code documentation of GHC needs to be reviewed, and procedures |
| 132 | +need to be documented to keep it up to date. |
| 133 | + |
| 134 | +For the review part, a team of a newcomer and an experienced contributor should |
| 135 | +systematically review the documentation of each module in the exposed subset, |
| 136 | +perhaps starting with the most commonly used definitions as indicated by the |
| 137 | +index queries. |
| 138 | + |
| 139 | +For the update procedures, it should be documented what the GHC API is, how to |
| 140 | +update it, and when to update it. Newcomers should be invited to request |
| 141 | +documentation improvements. Documentation improvements should be fast and |
| 142 | +easy to merge. For instance, most continuous integration (CI) jobs could be |
| 143 | +skipped for documentation updates, except for some linting. |
| 144 | + |
| 145 | +Additionally, GHC's CI needs to flag any modification of the GHC API. |
| 146 | +Tooling to do this already exists for other parts of GHC, so this task should |
| 147 | +be mostly about configuration work. |
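The kind of check involved can be illustrated by comparing two snapshots of the API. The sketch below is not the existing GHC tooling mentioned above. It assumes a hypothetical snapshot format that maps each exported name to its rendered type signature, and the names `Snapshot`, `Change`, `diffApi`, and `isBreaking` are invented for this example.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical snapshot: exported name mapped to its rendered signature.
type Snapshot = Map.Map String String

data Change
  = Removed String
  | Added String
  | SignatureChanged String String String -- name, old and new signature
  deriving (Eq, Show)

-- List the differences between an old and a new snapshot.
diffApi :: Snapshot -> Snapshot -> [Change]
diffApi old new =
  [Removed n | n <- Map.keys (Map.difference old new)]
    ++ [Added n | n <- Map.keys (Map.difference new old)]
    ++ [ SignatureChanged n o w
       | (n, (o, w)) <- Map.toList (Map.intersectionWith (,) old new)
       , o /= w
       ]

-- Additions are treated as compatible; removals and signature changes
-- are the ones a CI job would flag for a decision by GHC developers.
isBreaking :: Change -> Bool
isBreaking (Added _) = False
isBreaking _ = True
```

A CI job could fail, or request an explicit acknowledgement, whenever `filter isBreaking (diffApi old new)` is not empty. Comparing rendered signatures as strings cannot see changes in behavior, which is the limitation discussed under Risks and Limitations.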
| 148 | + |
| 149 | +### Risks and Limitations |
| 150 | + |
| 151 | +The project could fail if the size of the GHC API exceeds the capacity of |
| 152 | +the community to document it all. In such a case, the project should still |
| 153 | +help identify the areas of the GHC implementation that need |
| 154 | +additional effort to better support their exposure. |
| 155 | + |
| 156 | +Not all changes to the GHC API can be detected automatically, in |
| 157 | +particular changes in behavior that don't modify types or the type signatures |
| 158 | +of functions. As a mitigation, the proposal could be extended to detect |
| 159 | +changes to the documentation of definitions that appear in the GHC API. Even |
| 160 | +so, some nuances of behavior will likely not be reflected in |
| 161 | +documentation either. |
| 162 | + |
| 163 | +## Timeline |
| 164 | + |
| 165 | +There are no specific deadlines for this project. |
| 166 | + |
| 167 | +## Budget |
| 168 | + |
| 169 | +The cost of this project involves the engineering time needed to perform |
| 170 | +the identified phases. The following is a rough guess from the proposer, |
| 171 | +but it needs to be refined with whoever is appointed to execute the project. |
| 172 | + |
| 173 | +``` |
| 174 | +Indexing phase --- 40 hours |
| 175 | +API generation phase --- 80 hours |
| 176 | +Documentation review phase --- depends on the chosen indexing set |
| 177 | +``` |
| 178 | + |
| 179 | +The actual money required also needs to be negotiated with the appointed |
| 180 | +developers. |
| 181 | + |
| 182 | +## Stakeholders |
| 183 | + |
| 184 | +* GHC developers |
| 185 | +* Tooling authors from the [outreach phase] |
| 186 | +* Users of Haskell tools who need them to stay up to date |
| 187 | + |
| 188 | +## Success |
| 189 | + |
| 190 | +The project will be successful if the users of the `ghc` library have an |
| 191 | +accurate understanding of what it will take to upgrade their projects to use a |
| 192 | +newer version of the compiler by reading changelogs and the API documentation, |
| 193 | +thus eliminating the trial and error costs. |
| 194 | + |
| 195 | +The project will also be successful if accidental breakage of downstream tooling |
| 196 | +is avoided thanks to the definition of a GHC API whose modifications are |
| 197 | +flagged by GHC's CI. |