Replace univocity-parsers with FastCSV #4606

vdmitrienko · 2025-06-01T18:53:48Z

Overview

#4339

I hereby agree to the terms of the JUnit Contributor License Agreement.

Definition of Done

There are no TODOs left in the code
Method preconditions are checked and documented in the method's Javadoc
Coding conventions (e.g. for logging) have been followed
Change is covered by automated tests including corner cases, errors, and exception handling
Public API has Javadoc and @API annotations
Change is documented in Release Notes

marcphilipp

This looks very promising! 👍

gradle/libs.versions.toml

junit-jupiter-params/junit-jupiter-params.gradle.kts

...ooling-support-tests/src/test/java/platform/tooling/support/tests/ModularUserGuideTests.java

platform-tooling-support-tests/platform-tooling-support-tests.gradle.kts

marcphilipp · 2025-06-02T06:48:51Z

documentation/src/docs/asciidoc/release-notes/release-notes-6.0.0-M1.adoc

+* The `CsvFileSource.lineSeparator()` parameter is deprecated because line separators
+  are now detected automatically during CSV parsing. This setting is no longer required
+  and will be ignored.


Does auto-detection work in all cases? What happens if \n is used in a cell like in the following example with 4 columns?

a;b;\n c;d\r\n e;f;g;h\r\n

(assuming \n and \r are replaced with the corresponding character)

Does auto-detection work in all cases?

The auto-detection treats each of \r, \n, and \r\n as a line separator. For example, given the following input:

a;b\r c;d\n e;f\r\n g;h

The result is:

[["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]]

In contrast, univocity-parsers (when configured with \n as the line separator) produces different results:

[["a", "b\rc", "d"], ["e", "f"], ["g", "h"]]
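To make the auto-detection rule concrete, here is a minimal sketch of a scanner that treats each of \r, \n, and \r\n as a single line separator. It is not FastCSV's implementation: the class name `AnyNewlineSplitter` and the method `parse` are hypothetical, quoting is ignored entirely, and empty fields come back as empty strings rather than null.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not FastCSV code. Quoted fields are not supported. */
final class AnyNewlineSplitter {

    /** Splits input into records on \r, \n, or \r\n and into fields on fieldSeparator. */
    static List<List<String>> parse(String input, char fieldSeparator) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == fieldSeparator) {
                // End of field: store it and start the next one.
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\r' || c == '\n') {
                // Treat \r\n as one separator by skipping the \n that follows a \r.
                if (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
                // End of record: store the last field and the record itself.
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else {
                field.append(c);
            }
        }
        // Flush the final record when the input does not end with a line separator.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Prints [[a, b], [c, d], [e, f], [g, h]]
        System.out.println(parse("a;b\rc;d\ne;f\r\ng;h", ';'));
    }
}
```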

What happens if \n is used in a cell like in the following example with 4 columns?

In this case, FastCSV and univocity-parsers produce the same result unless univocity-parsers is configured with \r as the line separator.

FastCSV:

[["a", "b", null], ["c", "d"], ["e", "f", "g", "h"]]

univocity-parsers:

// .lineSeparator("\n") - same as FastCSV
[["a", "b", null], ["c", "d"], ["e", "f", "g", "h"]]

// .lineSeparator("\r\n") - same as FastCSV
[["a", "b", null], ["c", "d"], ["e", "f", "g", "h"]]

// .lineSeparator("\r") - differs from FastCSV
[["a", "b", "c", "d"], ["e", "f", "g", "h"], [null]]

I’m afraid this breaks compatibility if someone uses a character sequence as a line delimiter that is not a newline.

So, considering 3 possible scenarios, all of them imply a breaking change 😞

User explicitly relies on \r\n as the line separator:
\r - causes an unexpected line break;
\n - causes an unexpected line break;

User explicitly relies on \r as the line separator:
\n - causes an unexpected line break;
\r\n - no change, since \r is already interpreted as a line break;

User explicitly relies on \n as the line separator:
\r - causes an unexpected line break;
\r\n - no change, since \n is already interpreted as a line break;
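The scenarios above can be illustrated by splitting the same input once with a fixed line separator and once with any-newline detection. This is only a sketch with plain string splitting: `LineSeparatorComparison`, `splitFixed`, and `splitAuto` are hypothetical names, and neither method reflects how univocity-parsers or FastCSV is actually implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch comparing a fixed line separator with any-newline detection. */
final class LineSeparatorComparison {

    /** Splits only on the explicitly configured separator. */
    static List<String> splitFixed(String input, String lineSeparator) {
        return Arrays.asList(input.split(Pattern.quote(lineSeparator), -1));
    }

    /** Splits on \r\n, \r, or \n; \r\n is listed first so it counts as one separator. */
    static List<String> splitAuto(String input) {
        return Arrays.asList(input.split("\\r\\n|\\r|\\n", -1));
    }

    public static void main(String[] args) {
        // Scenario: the user relies on \r\n, but the data also contains a lone \r.
        String input = "a\rb\r\nc";
        System.out.println(splitFixed(input, "\r\n").size()); // 2 records
        System.out.println(splitAuto(input).size());          // 3 records
    }
}
```

The lone \r stays inside the first record under the fixed separator but starts a new record under auto-detection, which is the unexpected line break described in the list.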

@osiegmar, would it be possible to add support for a lineSeparator() parameter in FastCSV?

Potentially, yes. Of course, this wouldn’t be a valid CSV file at all. Is this really a desired feature or just a lack of specification/documentation and a good chance to change that with the new major version of JUnit?

Is there a (good) reason why someone would separate text records with anything other than a newline sequence?

Is there known usage of this?

I think dropping this in the new major version makes sense. I'm not aware of any cases other than using the same line separator on different operating systems. IIRC I initially introduced it because univocity-parsers would use the system line separator (and only that) by default.

I think dropping this in the new major version makes sense.

Great 👍
I'll make sure to clarify that in the release notes. Adding a few tests wouldn't hurt either.

marcphilipp · 2025-06-02T06:51:43Z

junit-jupiter-params/src/main/java/org/junit/jupiter/params/provider/CsvArgumentsProvider.java

 			}
+			return String.join("\n", csvSource.value());


Does FastCSV provide an API for line-by-line reading so we don't have to create a string first? It's probably not a big deal since it comes from literals in an annotation.
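For context, a minimal sketch of what the joined string looks like. The class `JoinCsvLines` and the sample values are hypothetical stand-ins for the array returned by `csvSource.value()`; only the `String.join("\n", ...)` call comes from the diff.

```java
/** Hypothetical sketch of joining annotation values into one CSV document. */
final class JoinCsvLines {

    /** Joins individual CSV lines with \n, mirroring the String.join call in the diff. */
    static String join(String... lines) {
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        // Made-up values standing in for csvSource.value().
        String csv = join("apple, 1", "banana, 2");
        System.out.println(csv.lines().count()); // 2
    }
}
```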

With osiegmar/FastCSV@1077389 there is one now. @vdmitrienko You may want to give it a try if it simplifies things for you.

Thanks, @osiegmar. This works well with individual strings, but it doesn't support headers. I think adding an overload that accepts an array (or varargs) of strings could simplify this use case:

build(final CsvCallbackHandler<T> callbackHandler, final String... data)

Regarding the validation of empty records, having a setting for that could be quite handy. It would also be great if the exception message included the index of the empty record. That said, we could also handle this on our side 🙂
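Handling the empty-record check on the JUnit side could look roughly like the sketch below. The class `EmptyRecordCheck`, the method `requireNoBlankRecords`, and the message wording are all hypothetical; the actual provider code and exception type in the PR may differ.

```java
import java.util.List;

/** Hypothetical sketch of validating records before they are parsed. */
final class EmptyRecordCheck {

    /** Throws if any record is blank, reporting the zero-based index of the first one. */
    static void requireNoBlankRecords(List<String> records) {
        for (int i = 0; i < records.size(); i++) {
            if (records.get(i).isBlank()) {
                throw new IllegalArgumentException(
                        "CSV record at index " + i + " must not be blank");
            }
        }
    }

    public static void main(String[] args) {
        try {
            requireNoBlankRecords(List.of("a, b", "", "c, d"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // CSV record at index 1 must not be blank
        }
    }
}
```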

vdmitrienko added 15 commits June 1, 2025 20:46

Remove univocity-parsers license

6287630

Do not create a shadow jar from com.univocity

7d98380

Rework arguments providers to use FastCSV

aabcd60

test: Update expected root cause exceptions

fd33e6f

test: Update expected message on empty CSV

1dc063f

test: Cover additional cases for empty values

b8efe07

Move "either value or textBlock" validation to getData(CsvSource)

f62814c

Deprecate CsvFileSource.lineSeparator as it's now detected automatically

e768e7a

test: Remove CsvFileSource.lineSeparator() usages

72ae172

CsvReaderFactory: set "since" to 6.0

c96e787

Preserve the original validation order

3d3cadf

Formatting

b88e12d

ModularUserGuideTests: require de.siegmar.fastcsv module

cc7fd8c

platform-tooling-support-tests: add FastCSV dependency

ff2691c

Add release notes

cd3cc0e

vdmitrienko mentioned this pull request Jun 1, 2025

Replace univocity-parsers with FastCSV #4339

Open

1 task

marcphilipp reviewed Jun 2, 2025

test: use CsvParseException import instead of a fully qualified name

4eaba09

rolnico mentioned this pull request Jun 3, 2025

Replace univocity-parsers with FastCSV powsybl/powsybl-core#3463

Open

vdmitrienko added 5 commits June 3, 2025 19:20

Use condition() instead of creating PreconditionViolationException

7661c96

Respect alphabetical order in libs.versions.toml

59effbb

Shadow FastCSV

bfe368c

Remove the no longer used extraJavaModuleInfo plugin

b913ee3

Updates according to the recent changes in FastCSV snapshot

61350e4

vdmitrienko requested a review from osiegmar June 5, 2025 20:16
