A process to document a GHC API #66

Conversation
It is a bit unfair to the refactorings we've done with @Ericson2314, @doyougnu, and others. Documentation was a major reason why I personally started doing this. From our 2022 "Modularizing GHC" white paper:

In my experience, documenting the code in its current state is often difficult because you end up documenting accidental complexity that shouldn't be there in the first place. For an actual example, see https://docs.google.com/document/d/1mQEpV3fYz1pHi64KTnlv8gifh9ONQ-jytk5sIHqnV9U/edit?tab=t.0#heading=h.xp3xd558qgs7 which was an attempt last year to document the big picture of cabal and ghc interaction: it's already a mess (and that's without documenting backpack).

About the proposal itself: I fear that making some part of the accidentally complex code now dubbed "GHC API" more difficult to change will mean that the accidental complexity will stay forever. But it depends on the indexing phase, and it might also lead to fixing the code instead of ossifying it, so let's see.
I wonder whether building tooling is the first thing to do? It's a bit of a meta-thing. We want a house, but instead of building a house, we first spend time building tools to help us build a house. That can be the right thing to do. But there is a danger that you spend lots of time building tools, only to discover, when using them, that they aren't quite the right tools after all. I wonder if it would be better to spend that effort instead to:

Some thoughts about this
Thanks @hsyl20. Happy to include a reference to the anecdote. If there are more extensive discussions of documentation activities elsewhere, I'd like to link them too.

This one looks good to link too. 👍
Thanks. The module hierarchy introduction was discussed in https://gitlab.haskell.org/ghc/ghc/-/issues/13009 and ghc-proposals/ghc-proposals#57, and increased modularity in https://gitlab.haskell.org/ghc/ghc/-/issues/17957. This comment from @bgamari is particularly relevant to the current discussion.
Answering Simon,

The question from GHC developers that triggered the current direction of the proposal is: how do we know which functions need good documentation? If the tooling effort is excessive, a middle ground is to do only the indexing, which I think is simple enough, and build the GHC API modules manually. This would still help to prevent some accidental breakage, even if not all of the functions in the API are documented. But it looks to me like you prefer clients of the …
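To give a sense of scale for the indexing phase, the core of such an indexing plugin can be quite small. Below is a toy sketch (not the plugin linked in the proposal), written against roughly the GHC 9.4-era plugin API; the module and field names involved drift between GHC releases, which is itself part of the problem being discussed:

```haskell
-- A toy indexing plugin: for every module it compiles, print that
-- module's name and the names of the modules it imports. A real
-- indexer would filter for names coming from the ghc package and
-- write the results to a file instead of stdout.
module UsageIndex (plugin) where

import Control.Monad.IO.Class (liftIO)
import GHC (ideclName, moduleName, moduleNameString, ms_mod, unLoc)
import GHC.Plugins (Plugin (..), defaultPlugin, purePlugin)
import GHC.Tc.Types (TcGblEnv (..))

plugin :: Plugin
plugin = defaultPlugin
  { typeCheckResultAction = \_opts summary tcEnv -> do
      let this = moduleNameString (moduleName (ms_mod summary))
          imps = [ moduleNameString (unLoc (ideclName (unLoc i)))
                 | i <- tcg_rn_imports tcEnv ]
      liftIO (putStrLn (this ++ " imports: " ++ unwords imps))
      pure tcEnv
  , pluginRecompile = purePlugin
  }
```

Building a client package with `-fplugin UsageIndex` would then emit one line per compiled module, which is roughly the raw material the indexing phase asks for.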
In general I am in agreement with @simonpj, but it occurs to me that we have an example of a solution to one of the proposal's goals in GHC already:

Isn't this exactly the reason for the …? But in the short term that is not tractable, because the API is too large and knows too much about the compiler. So a test like this would hamper development speed too much, and is therefore more of a goal than a step on the path towards the goal. To migrate the code base to a state where a test like that is feasible, we would need to do more modularity work and take inspiration from the …. Imagine if for each level of the module hierarchy we had a module called …

and so on. The basic idea is to migrate the code base to a state where we can create a shim module, just as I did for the RTS flags here (in particular, see this comment). The corollary to …. This would make the API explicit and obvious in the file system, give GHC developers finer-grained control over the API, lay the foundation to begin to test the API for changes, and centralize the entry point to the API. Perhaps we could even split out the …
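To make the shim idea concrete, here is a minimal sketch. `GHC.Core.API` is a hypothetical module name (no such module exists in GHC today); the re-exported names are real ones from the current `ghc` library:

```haskell
-- Hypothetical shim module: the only thing it is allowed to contain
-- is re-exports, so the API surface is exactly its export list.
module GHC.Core.API
  ( -- * Core expressions
    CoreExpr
  , CoreBind
    -- * Operations on Core
  , exprType
  ) where

import GHC.Core (CoreBind, CoreExpr)
import GHC.Core.Utils (exprType)
```

With this shape, a "did the API change?" test reduces to diffing the export lists of the `*.API` modules, without constraining how the internals behind them are refactored.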
This sounds like a good place to start! Why not define the … directly? Of course, it could be helpful to have indexing information to know which parts of the existing …
I think a big overall issue is that the compiler internals are just quite messy and entangled. It is expert-level work to refactor these parts of the code base, as it can be very difficult to understand enough of the context to imagine a design which accounts for all the different situations. Refactoring needs to maintain existing (underspecified) behaviour most of the time. As well as being tricky, it is necessary to proceed in a very incremental fashion, since it is a live project with many people interacting with the code every day. There are quite well-specified abstraction boundaries in a few places (in particular, how GHC and Cabal interact via package databases). The memory usage behaviour of the compiler pipeline is also better specified than it used to be. I'm sure there are other examples. This is something we have worked on for quite a few years now, but it's slow progress.
The estimate for developing a plugin which simply reports which modules/names/types from the ghc package are used seems to me to be a bit on the high end. I agree with @adamgundry that introducing a designed, and not auto-generated, API under GHC.API would be more sensible. I believe auto-generating a GHC API based on used functions might be useful for an initial version, but it is unlikely to be useful in the long run. Ideally a GHC API exposes functionality from GHC but does not necessarily mirror the same structure. So the investment for such a tool sounds quite high compared to the benefit. As @mpickering and @hsyl20 allude to, simply going with the currently used functions as the definition of the GHC API is likely to result in an interface not meaningfully better than what we have today. I think ideally someone would go over the functionality currently used by these packages and for everything decide:

This would ensure that users of the ghc API can tell whether what they are using is:

However I can also see that the cost of doing this properly far exceeds the estimated effort required for the auto-generated GHC API.
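One lightweight way such a stable/unstable classification could be made machine-visible is GHC's existing WARNING pragma. A hedged sketch follows; the module and functions are invented for illustration, and this is not an existing GHC convention:

```haskell
module Example.Api
  ( stableEntry
  , unstableHelper
  ) where

-- | Classified as stable: documented and kept backwards-compatible.
stableEntry :: Int -> Int
stableEntry = (+ 1)

{-# WARNING unstableHelper
    "Unstable API: exposed for tooling, may change in any release" #-}
-- | Classified as unstable: still exported, but every use site gets a
-- compile-time warning, so breakage is opt-in rather than a surprise.
unstableHelper :: Int -> Int
unstableHelper = (* 2)
```

Under this convention, a client that compiles with no such warnings knows it depends only on the stable subset.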
Answering Andreas,

I'll add here that such a plugin already exists and is linked in the proposal. Additional effort is required to tweak its output for API generation, documentation, automated tests, optimizations, etc. I don't put much weight on the actual estimate; I think it is fair for it to be revised.

Your considerations are interesting. At this point, I feel the unstable part of an API, as you describe it, could make a real difference to users.
I'm somewhat uneasy about spending 100+ hours just to prepare tooling before even starting to deliver any practical value. I mean, if we were looking to fund 1000+ hours, it would make sense to spend 10% on tooling. But realistically, at this stage we are probably talking about 200 hours in total at best, and spending the better half of it on tooling for API autoselection feels underwhelming. Surely there are enough easily identifiable parts of the GHC API that writing even cursory documentation for them will fill the entire budget. (That's accepting the premise that documentation is the bottleneck.)
I had written a far-too-long comment which I canned; I'm glad that, since I wrote it, many other people have echoed my sentiments! Firstly, I strongly agree with @hsyl20

To quote the original proposal:

IMO, based on our experience, if we're blaming the docs here, we're merely "blaming the messenger". The problem is that the interfaces themselves are not modular, and given this unfortunate fact, documentation can't be terse and self-contained either. "Being written for an audience with a shared context about GHC internals" is fundamentally a problem with the code itself, which merely reappears as a documentation problem.

Secondly, I strongly agree with @simonpj and the other GHC devs that chimed in, and also @Bodigrim, that this is "too meta" and "too automated too soon". Frankly, it sounds like the group behind the proposal is at an impasse over what to do, and is hoping that the new tooling's output will provide a clear vision instead of humans doing so. I think that is doomed to fail.

I also agree with Simon et al.'s counterproposal that we should simply pick some small portion of the API to manually audit and refactor, in human/qualitative ways. I agree with @mpickering also that because things are currently quite tangled, we can't just stick another layer on top to do this. We need to actually untangle something to be able to make good interfaces that are possible to document well. And yes, that's hard! But it doesn't need to be fatally hard --- we just need to carefully pick where we begin so we don't spend all our time untangling, and instead have time left over to clean up the thing after it is separated from the rest.

To that end, I put forth my #56 as a counterproposal. Merely splitting out the AST and parser as separate packages was never supposed to be the final step. Rather, the idea is that once you have a "clean workbench" of a fully-separated component, you can then bring in all the stakeholders --- GHC and 3rd-party tools --- and have a productive discussion refactoring and documenting interfaces (with much less effort!) until everyone is happy. Once that is done, we should have the vision and consensus we lack today --- this is as important as the refactored interfaces themselves! The experience of making the AST and parser a nice-to-use component (at the level of docs and Haskell interfaces alike) will inform everyone involved about what we are aiming for in the rest of GHC, and how much effort might be involved to get there.

Only at that point should we pause, take a step back, and consider the sort of empirical analysis that this proposal proposes, because after the AST and parser cleanup we will have the shared qualitative vision to guide us. These empirics are theory-laden, so we need a good theory first.
Thanks to all for your thoughts so far. At this point I think most participants would agree to cut the API generation phase, and probably most of the indexing phase, if we only require enough indexing to inform what parts of the GHC library are of interest to some client libraries. Thus we are mostly left with the documentation review phase. Some people have proposed gradually collecting and documenting an API in specific modules or in a dedicated package. That would be easy to add to the proposal. There is also the question of which client packages to serve first, but I think almost any package considered interesting would do to evaluate the approach. Some comments seem to argue that helpful documentation is very difficult to write without refactoring GHC first (please correct me if I'm wrong). If this is what GHC developers think in general, then there is little point in pressing on with this proposal just yet. Agreement would be necessary to keep the produced documentation up to date as GHC evolves.

Thanks @Ericson2314. I do think your proposal is very useful, even if I don't consider it in principle a prerequisite to improving documentation.
It was my perception that this is exactly what this project should be helping out with... not exactly the refactoring, but figuring out what the constraints are, what possible self-contained parts of the API should exist, etc.
My hope is that this project will lead to a fruitful dialogue:
This human design conversation is what I'd love to see. It starts from a concrete need (capability X) and a draft, albeit unsatisfactory, API, and works from there. I'm agreeing with @hasufell here.
Here's a possible rendering of the responsibilities. There is a project developer who does the work, is mentored by a GHC developer, and is possibly funded by the Haskell Foundation. The project developer might be a GHC developer herself, if there is someone available.
It is looking to me now like such a process could happen with @Ericson2314's #56, if some tool is selected to drive the dialogue. Alternatively, perhaps smaller projects are within reach for small tools like om-plugin-imports or print-api. Beware that a small tool does not necessarily mean a small project: I'm thinking of the hi-file-parser case.
@shayne-fletcher coauthored that proposal, with the idea that HLint could be used as such a tool to evaluate the work. I also solicited these reviews, #56 (comment) and #56 (comment), from tool authors. It's now been some time, but I'd hope that these people, and some of the developer tools they work on, would still want to be involved.
As an HLS developer and a new GHC contributor, I completely agree with @simonpj. In fact, Simon's approach is already happening naturally with HieAst. Here's a concrete example from my own experience: This is exactly how GHC.API is meant to evolve --- through an iterative feedback loop where tooling needs drive improvements. I believe this process will benefit more tooling users and continuously refine GHC.API over time. In the meantime, we are doing an upgrade of HLS to GHC 9.12.2 (haskell/haskell-language-server#4517). We have roughly 30 plugins in HLS besides the core ghcide. This should be a perfect opportunity to identify some parts of GHC that should be put into …
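For readers who have not touched the HieAst machinery mentioned above: `.hie` files are a machine-readable index that GHC already produces with `-fwrite-ide-info`, and reading one back is short. A rough sketch, assuming roughly the GHC 9.4 API (`initNameCache` had a different signature before 9.4):

```haskell
module Main where

import GHC.Iface.Ext.Binary (hie_file_result, readHieFile)
import GHC.Iface.Ext.Types (hie_module)
import GHC.Types.Name.Cache (initNameCache)
import GHC.Unit.Module (moduleName, moduleNameString)

-- Read a .hie file produced by -fwrite-ide-info and print which
-- module it describes. Real consumers such as HLS walk the hie_asts
-- payload for symbols, types, and source spans.
main :: IO ()
main = do
  cache  <- initNameCache 'z' []
  result <- readHieFile cache "Example.hie"
  putStrLn (moduleNameString (moduleName (hie_module (hie_file_result result))))
```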
👋 I have reoriented the proposal with the bits I collected from the discussion. Most notably:
As before, the proposal leaves it to the Haskell Foundation and GHC developers to decide which tools to support first. I didn't include the ideas about how to organize the code or how to identify the API modules. But if there is consensus to do it one way or another, I'm happy to accept amendments.
Some thoughts on this topic:
… and then drive this direction with one or more client libraries, using the GHC API as far as it can go within the allocated time. I'm not sure there is a way to draw a line to be reached.
My initial view is that this proposal has too much in it. It states it wants to document the API, but then it has a process of refactoring as well. I think that is part of what the goal is -- but it shouldn't be in this proposal. The part I like most is discovering what is already used by others. I think it is more reasonable to survey how the GHC API is used (portion by portion, as this proposal suggests, which is good), and then to document as a snapshot (i.e. not tracking changes over time) what has been discovered -- i.e. which calls are invoked, and with what arguments. A work product can then also be an example layer on top with a streamlined interface, which satisfies current needs. This should not be released as a library -- but just remain a "cookbook" taken at a point in time. This work product (the documentation in "cookbook" form) can in turn be used by the GHC team, stability team, etc. to inform how they propose to evolve the API over time, and perhaps how to refactor the internals more broadly. However, it is better not to try to mandate that process within this proposal.
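For a sense of what a cookbook entry might contain, here is a hedged sketch of the classic recipe --- loading and typechecking one module with the in-process `ghc` library. It is written against roughly GHC 9.4; `guessTarget`, for instance, gained a `Maybe UnitId` argument in 9.4, exactly the kind of drift a dated snapshot would record:

```haskell
module Main where

import GHC
  ( LoadHowMuch (LoadAllTargets)
  , getSessionDynFlags
  , guessTarget
  , load
  , runGhc
  , setSessionDynFlags
  , setTargets
  )
import GHC.Paths (libdir)  -- from the ghc-paths package

-- Cookbook recipe: compile Example.hs with the in-process GHC API.
main :: IO ()
main = runGhc (Just libdir) $ do
  dflags <- getSessionDynFlags
  _ <- setSessionDynFlags dflags
  target <- guessTarget "Example.hs" Nothing Nothing
  setTargets [target]
  _ <- load LoadAllTargets
  pure ()
```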
Thanks @gbaz.
If the final deliverable is a cookbook, a follow-up proposal to use it to act on GHC will need to be assembled promptly, or the value of the cookbook will diminish as GHC and the tools evolve. If there is such a commitment from stakeholders, it looks to me like the cookbook approach could be effective at improving documentation.