A PR for reactive streams support #151

bbakerman · 2024-05-17T11:14:26Z

This PR is a long-running branch with work to allow "reactive" publishers to complete work progressively as results arrive.

A normal BatchLoader gathers ALL the value futures (given a set of keys) and completes them in one go.

The use of reactive Publisher / Subscribers means that keys can complete progressively as each result arrives.

This may mean that processing happens sooner, depending on whether further sub-processing occurs.

**Note**: This commit, as-is, is not (yet) intended for merge. It is created to provide a proof-of-concept and gauge interest, as polishing/testing this requires a non-trivial amount of effort.

Motivation
==========

The current DataLoader mechanism completes the corresponding `CompletableFuture` for a given key when the corresponding value is returned. DataLoader's `BatchLoader`, however, assumes that the underlying batch function can only return all of its requested items at once (as an example, a SQL database query).

In practice, the batch function may be a service that can return items progressively using a subscription-like architecture. Some examples include:

- Project Reactor's [Subscriber](https://www.reactive-streams.org/reactive-streams-1.0.4-javadoc/org/reactivestreams/Subscriber.html).
- gRPC's [StreamObserver](https://grpc.github.io/grpc-java/javadoc/io/grpc/stub/StreamObserver.html).
- RxJava's [Flowable](https://reactivex.io/RxJava/3.x/javadoc/io/reactivex/rxjava3/core/Flowable.html).

Streaming results in this fashion offers several advantages:

- Certain values may be returned earlier than others (for example, the batch function may have cached values it can return early).
- Memory load is lessened on the batch function (which may be an external service), as it does not need to keep hold of the retrieved values before it can send them out at once.
- We avoid the need to stream individual error values, since an `onError` function can terminate the stream early.

Proposal
========

We provide two new `BatchLoader`s and support for them in `java-dataloader`:

- `ObserverBatchLoader`, with a load function that accepts:
  - a list of keys.
  - a `BatchObserver` intended as a delegate for publisher-like structures found in Project Reactor and RxJava. This obviates the need to depend on external libraries.
- `MappedObserverBatchLoader`, similar to `ObserverBatchLoader` but with an `onNext` that accepts a key _and_ value (to allow for early termination of streams without needing to process `null`s).
- `*WithContext` variants for the above.

The key value-add is that the implementation of `BatchObserver` (provided to the load functions) will immediately complete the queued future for a given key when `onNext` is called with a value. This means that if we have a batch function that can deliver values progressively, we can continue evaluating the query as the values arrive. As an arbitrary example, let's have a batch function that serves both the reporter and project fields on a Jira issue:

```graphql
query {
  issue {
    project {
      issueTypes { ... }
    }
    reporter { ... }
  }
}
```

If the batch function can return a `project` immediately but is delayed in returning the `reporter`, then our batch loader can return `project` and start evaluating the `issueTypes` immediately while we load the `reporter` in parallel. This would provide a more performant query evaluation.

As mentioned above, this is not in a state to be merged - this is intended to gauge whether this is something the maintainers would be interested in owning. Should this be the case, the author is willing to test/polish this pull request so that it may be merged.
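The progressive completion described above can be illustrated with a minimal sketch. It is not the code in this PR: it uses the JDK's `java.util.concurrent.Flow` interfaces as a stand-in for `org.reactivestreams`, and `ProgressiveSubscriber` and `ProgressiveDemo` are hypothetical names chosen for illustration. It assumes values arrive in the same order as the requested keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;

// Hypothetical sketch: completes one queued future per onNext call, in key order.
final class ProgressiveSubscriber<V> implements Flow.Subscriber<V> {
    private final List<CompletableFuture<V>> queuedFutures;
    private int nextIndex = 0;

    ProgressiveSubscriber(List<CompletableFuture<V>> queuedFutures) {
        this.queuedFutures = queuedFutures;
    }

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        // Ask for exactly as many values as there are queued futures.
        subscription.request(queuedFutures.size());
    }

    @Override
    public void onNext(V value) {
        // Complete the next pending future as soon as its value arrives.
        if (nextIndex < queuedFutures.size()) {
            queuedFutures.get(nextIndex++).complete(value);
        }
    }

    @Override
    public void onError(Throwable error) {
        // Has no effect on futures that are already complete.
        queuedFutures.forEach(future -> future.completeExceptionally(error));
    }

    @Override
    public void onComplete() {
        // Nothing to do in this sketch.
    }
}

final class ProgressiveDemo {
    static List<String> run() {
        List<String> events = new ArrayList<>();
        CompletableFuture<String> project = new CompletableFuture<>();
        CompletableFuture<String> reporter = new CompletableFuture<>();
        // Dependent work runs as soon as each individual future completes.
        project.thenAccept(value -> events.add("resolved " + value));
        reporter.thenAccept(value -> events.add("resolved " + value));

        ProgressiveSubscriber<String> subscriber =
                new ProgressiveSubscriber<>(List.of(project, reporter));
        subscriber.onSubscribe(new Flow.Subscription() {
            @Override public void request(long n) { }
            @Override public void cancel() { }
        });

        subscriber.onNext("project");
        events.add("reporter still pending: " + !reporter.isDone());
        subscriber.onNext("reporter");
        subscriber.onComplete();
        return events;
    }
}
```

In this sketch the work chained onto `project` runs before `reporter` has been delivered, which mirrors the `issueTypes` example above.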

AlexandreCarlton and others added 2 commits May 12, 2024 17:56

reactive streams support branch

95540ff

bbakerman changed the title ~~A PR for reactive streams support~~ DO NOT MERGE - YET - A PR for reactive streams support May 17, 2024

bbakerman added 2 commits May 17, 2024 21:35

Merge remote-tracking branch 'origin/master' into reactive-streams-br…

2cdba8a

…anch # Conflicts: # build.gradle

reactive streams support branch - merged master

1d78255

dondonz added this to the Next release 3.4.0 milestone May 17, 2024

AlexandreCarlton and others added 24 commits May 18, 2024 12:22

Merge remote-tracking branch 'origin/master' into observer-batch-load…

2032e33

…er-proof-of-concept * origin/master: Bump to Java 11

Eliminate *BatchObserver in favour of Publisher

6b5a732

`reactive-streams` has become the de-facto standard for reactive frameworks; we thus use this as a base to allow seamless interop (rather than prompt an extra adapter layer).

Use internal Assertions over Java's raw assert

68d7f54

This gives us more workable exceptions.

Remove handling of Throwable passed into onNext

a3132b7

Passing an exception into `onNext` is not typically done in reactive-land - we would instead call `onError(Throwable)`. We can thus avoid handling this case.
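A minimal sketch of the convention this commit relies on: values go through `onNext`, and a failure goes through `onError`, which fails every future that has not yet been completed. The JDK `Flow` interfaces stand in for `org.reactivestreams`, and `FailFastSubscriber` and `FailFastDemo` are hypothetical names, not classes from this PR.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;

// Hypothetical sketch: onError terminates the stream and fails the remaining futures.
final class FailFastSubscriber<V> implements Flow.Subscriber<V> {
    private final List<CompletableFuture<V>> queuedFutures;
    private int nextIndex = 0;

    FailFastSubscriber(List<CompletableFuture<V>> queuedFutures) {
        this.queuedFutures = queuedFutures;
    }

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        subscription.request(queuedFutures.size());
    }

    @Override
    public void onNext(V value) {
        // Only plain values are expected here; errors never travel through onNext.
        if (nextIndex < queuedFutures.size()) {
            queuedFutures.get(nextIndex++).complete(value);
        }
    }

    @Override
    public void onError(Throwable error) {
        // Fail every future that has not received a value yet.
        while (nextIndex < queuedFutures.size()) {
            queuedFutures.get(nextIndex++).completeExceptionally(error);
        }
    }

    @Override
    public void onComplete() {
        // Nothing to do in this sketch.
    }
}

final class FailFastDemo {
    static List<CompletableFuture<String>> run() {
        List<CompletableFuture<String>> futures = List.of(
                new CompletableFuture<String>(),
                new CompletableFuture<String>(),
                new CompletableFuture<String>());
        FailFastSubscriber<String> subscriber = new FailFastSubscriber<>(futures);
        subscriber.onSubscribe(new Flow.Subscription() {
            @Override public void request(long n) { }
            @Override public void cancel() { }
        });
        subscriber.onNext("a");
        subscriber.onError(new IllegalStateException("backend unavailable"));
        return futures;
    }
}
```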

Expose new*DataLoader methods for *PublisherBatchLoader

fbeffae

This is keeping in line with the other methods found in `DataLoaderFactory`.

Rename '*PublisherBatchLoader' to 'BatchPublisher'

0d0b2f8

This keeps in line with the original suggestion (because yours truly couldn't read, apparently). We also purge any remaining mention of 'observer', which was the first swing at this code.

Ensure DataLoaderSubscriber is only called by one thread

14002f6

Multiple threads may call `onNext` - we thus (lazily) chuck a `synchronized` to ensure correctness at the cost of speed. In future, we should examine how we should manage this concurrency better.
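The trade-off can be sketched as follows, assuming a subscriber that tracks its position with a plain `int`. Marking `onNext` as `synchronized` makes the read-increment-complete step atomic, so concurrent callers never skip or reuse an index. `SynchronizedSubscriber` and `SynchronizedDemo` are hypothetical names, and the JDK `Flow` interfaces stand in for `org.reactivestreams`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Flow;

// Hypothetical sketch: every signal method takes the same lock.
final class SynchronizedSubscriber<V> implements Flow.Subscriber<V> {
    private final List<CompletableFuture<V>> queuedFutures;
    private int nextIndex = 0; // guarded by "this"

    SynchronizedSubscriber(List<CompletableFuture<V>> queuedFutures) {
        this.queuedFutures = queuedFutures;
    }

    @Override
    public synchronized void onSubscribe(Flow.Subscription subscription) {
        subscription.request(queuedFutures.size());
    }

    @Override
    public synchronized void onNext(V value) {
        if (nextIndex < queuedFutures.size()) {
            queuedFutures.get(nextIndex++).complete(value);
        }
    }

    @Override
    public synchronized void onError(Throwable error) {
        while (nextIndex < queuedFutures.size()) {
            queuedFutures.get(nextIndex++).completeExceptionally(error);
        }
    }

    @Override
    public synchronized void onComplete() {
        // Nothing to do in this sketch.
    }
}

final class SynchronizedDemo {
    // Calls onNext from several threads and reports how many futures were completed.
    static long completedCount(int threads, int perThread) {
        List<CompletableFuture<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < threads * perThread; i++) {
            futures.add(new CompletableFuture<>());
        }
        SynchronizedSubscriber<Integer> subscriber = new SynchronizedSubscriber<>(futures);

        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    subscriber.onNext(i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return futures.stream().filter(CompletableFuture::isDone).count();
    }
}
```

Every call contends for a single lock, which is the cost in speed that the commit message mentions.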

Document Subscriber#onNext invocation order

0f303a8

Merge branch 'reactive-streams-branch' into observer-batch-loader-pro…

ce115fd

…of-of-concept

Merge pull request #148 from AlexandreCarlton/observer-batch-loader-p…

288be41

…roof-of-concept Add a proof-of-concept for "Observer-like" batch loading

Merge remote-tracking branch 'origin/master' into reactive-streams-br…

e16fa65

…anch # Conflicts: # build.gradle

reactive streams support branch - getting it compiling

a93112a

Making the Subscribers use a common base class

74567fe

Making the Subscribers use a common base class- synchronized on each …

4396624

…method

Making the Subscribers use a common base class- now with failing test…

8a64483

… case

Making the Subscribers use a common base class- fail the overall CF o…

3e8ac9c

…nError

Inline BatchPublisher tests into DataLoaderTest

eb2b40c

We now have the same coverage but with less code. Note that: - this is currently failing on 'duplicate keys when caching disabled'. - we still need to add tests that only make sense for the Publisher variants (e.g. half-completed keys).

Fix MappedBatchPublisher loaders to work without cache

651e561

If we did not cache the futures, then the MappedBatchPublisher DataLoader would not work as we were only completing the last future for a given key.
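One way to picture the fix, as a sketch rather than the PR's actual implementation: with caching disabled, the same key can be queued more than once, so each key has to map to every future waiting on it instead of only the most recent one. `MappedCompletionSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: track ALL futures per key so duplicate keys are all completed.
final class MappedCompletionSketch {

    // Groups the queued futures by key; keys.get(i) corresponds to futures.get(i).
    static <K, V> Map<K, List<CompletableFuture<V>>> groupByKey(
            List<K> keys, List<CompletableFuture<V>> futures) {
        Map<K, List<CompletableFuture<V>>> byKey = new HashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            byKey.computeIfAbsent(keys.get(i), key -> new ArrayList<>()).add(futures.get(i));
        }
        return byKey;
    }

    // Completes every future waiting on the emitted key.
    static <K, V> void onNext(Map<K, List<CompletableFuture<V>>> byKey, Map.Entry<K, V> entry) {
        for (CompletableFuture<V> future : byKey.getOrDefault(entry.getKey(), List.of())) {
            future.complete(entry.getValue());
        }
    }

    static List<CompletableFuture<String>> demo() {
        // Key "a" is requested twice, as can happen when caching is disabled.
        List<String> keys = List.of("a", "b", "a");
        List<CompletableFuture<String>> futures = List.of(
                new CompletableFuture<String>(),
                new CompletableFuture<String>(),
                new CompletableFuture<String>());
        Map<String, List<CompletableFuture<String>>> byKey = groupByKey(keys, futures);
        onNext(byKey, Map.entry("a", "A"));
        return futures;
    }
}
```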

Merge pull request #155 from AlexandreCarlton/migrate-publisher-tests

8295396

Migrate publisher tests

Making the Subscribers use a common base class - merged in main branch

6d3c4eb

Merge pull request #154 from graphql-java/reactive-streams-common-pub…

3fddb8b

…lisher-impl Making the Subscribers use a common base class

More tests for Publishers

034c68f

bbakerman and others added 13 commits May 23, 2024 10:32

Merge remote-tracking branch 'origin/master' into reactive-streams-br…

b09ac60

…anch-extra-tests-for-reactive

Merge remote-tracking branch 'origin/master' into reactive-streams-br…

5d826b8

…anch

Now the builds pass - broken out the fixtures

8b344db

Merge pull request #158 from graphql-java/reactive-streams-branch-ext…

e9bfc2b

…ra-tests-for-reactive More tests for Publishers on reactive branch

This moves the reactive code out into its own package because DataLo…

91d3036

…aderHelper is way too big

renamed classes inline with their counterparts

e98621b

made them non public and created a static factory support class

6523015

reorged method placement

170ccf8

Merge pull request #159 from graphql-java/reactive-streams-branch-mov…

77fd0dd

…e-reactive-classes-out-of-dataloader-helper Reactive streams branch move reactive classes out of dataloader helper

Added javadoc to publisher interfaces

4b9356e

Have MappedBatchPublisher take in a Set<K> keys

3c3cc99

This is more symmetric with `MappedBatchLoader` and preserves efficiency; we do not need to emit a `Map.Entry` for duplicate keys (given the strong intention that this will be used to create a `Map`).
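The effect of passing a `Set<K>` can be sketched like this. The load function below is a hypothetical stand-in whose shape only loosely resembles a mapped batch publisher, and it uses the JDK `Flow` interfaces rather than `org.reactivestreams`. Because the keys are de-duplicated before the call, one `Map.Entry` is emitted per distinct key.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Flow;

// Hypothetical sketch: a mapped load function that receives de-duplicated keys.
final class SetKeysSketch {

    // Stand-in batch function: emits one key/value entry per distinct key.
    static void loadUpperCase(Set<String> keys,
                              Flow.Subscriber<Map.Entry<String, String>> subscriber) {
        subscriber.onSubscribe(new Flow.Subscription() {
            @Override public void request(long n) { }
            @Override public void cancel() { }
        });
        for (String key : keys) {
            subscriber.onNext(Map.entry(key, key.toUpperCase()));
        }
        subscriber.onComplete();
    }

    static List<Map.Entry<String, String>> demo() {
        // "a" was requested twice; the set keeps a single copy in request order.
        List<String> requested = List.of("a", "b", "a");
        Set<String> uniqueKeys = new LinkedHashSet<>(requested);

        List<Map.Entry<String, String>> emitted = new ArrayList<>();
        loadUpperCase(uniqueKeys, new Flow.Subscriber<Map.Entry<String, String>>() {
            @Override public void onSubscribe(Flow.Subscription subscription) { }
            @Override public void onNext(Map.Entry<String, String> entry) { emitted.add(entry); }
            @Override public void onError(Throwable error) { }
            @Override public void onComplete() { }
        });
        return emitted;
    }
}
```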

Add README sections for *BatchPublisher

2e82858

Merge pull request #160 from AlexandreCarlton/add-documentation-for-p…

c3e6ee5

…ublishers Have MappedBatchPublisher take in a Set<K> keys (and add README sections)

bbakerman changed the title ~~DO NOT MERGE - YET - A PR for reactive streams support~~ A PR for reactive streams support May 27, 2024

bbakerman merged commit d44070a into master May 27, 2024
1 check passed
