-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Replace univocity-parsers with FastCSV #4606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Replace univocity-parsers with FastCSV #4606
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very promising! 👍
...ooling-support-tests/src/test/java/platform/tooling/support/tests/ModularUserGuideTests.java
Outdated
Show resolved
Hide resolved
platform-tooling-support-tests/platform-tooling-support-tests.gradle.kts
Outdated
Show resolved
Hide resolved
* The `CsvFileSource.lineSeparator()` parameter is deprecated because line separators | ||
are now detected automatically during CSV parsing. This setting is no longer required | ||
and will be ignored. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does auto-detection work in all cases? What happens if \n
is used in a cell like in the following example with 4 columns?
a;b;\n
c;d\r\n
e;f;g;h\r\n
(assuming \n
and \r
are replaced with the corresponding character)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does auto-detection work in all cases?
The auto-detection treats each of \r
, \n
, and \r\n
as a line separator. For example, given the following input:
a;b\r
c;d\n
e;f\r\n
g;h
The result is:
[["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]]
In contrast, univocity-parsers (when configured with \n
as the line separator) produces different results:
["a", "b\rc", "d"], ["e", "f"], ["g", "h"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if \n is used in a cell like in the following example with 4 columns?
In this case, the results from FastCSV and univocity-parsers are mostly similar.
FastCSV:
[["a", "b", null], ["c", "d"], ["e", "f", "g", "h"]]
univocity-parsers:
// .lineSeparator("\n") - same as FastCSV
[["a", "b", null], ["c", "d"], ["e", "f", "g", "h"]]
// .lineSeparator("\r\n") - same as FastCSV
[["a", "b", null], ["c", "d"], ["e", "f", "g", "h"]]
// .lineSeparator("\r") - differs from FastCSV
[["a", "b", "c", "d"], ["e", "f", "g", "h"], [null]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m afraid this breaks compatibility if someone uses a character sequence as a line delimiter that is not a newline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, considering 3 possible scenarios, all of them imply a breaking change 😞
-
User explicitly relies on
\r\n
as the line separator:
\r
- causes an unexpected line break;
\n
- causes an unexpected line break; -
User explicitly relies on
\r
as the line separator:
\n
- causes an unexpected line break;
\r\n
- no change, since\r
is already interpreted as a line break; -
User explicitly relies on
\n
as the line separator:
\r
- causes an unexpected line break;
\r\n
- no change, since\n
is already interpreted as a line break;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@osiegmar, would it be possible to add support for a lineSeparator()
parameter in FastCSV?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially, yes. Of course, this wouldn’t be a valid CSV file at all. Is this really a desired feature or just lack of specification/documentation and a good chance to change that with the new major version of JUnit?
Is there a (good) reason, someone separates text records by anything that is not a newline sequence?
Is there known usage of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think dropping this in the new major version makes sense. I'm not aware of any cases other than using the same line separator on different operating systems. IIRC I initially introduced it because univocity-parsers would use the system line separator (and only that) be default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think dropping this in the new major version makes sense.
Great 👍
I'll make sure to clarify that in the release notes. Adding a few tests wouldn't hurt either.
} | ||
return String.join("\n", csvSource.value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does FastCSV provide an API for line-by-line reading so we don't have to create a string first? It's probably not a big deal since it comes from literals in an annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With osiegmar/FastCSV@1077389 there is one now. @vdmitrienko You may want to give it a try if it simplify things for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @osiegmar. This works well with individual strings, but it doesn't support headers. I think adding an overload that accepts an array (or varargs) of strings could simplify this use case:
build(final CsvCallbackHandler<T> callbackHandler, final String... data)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the validation of empty records, having a setting for that could be quite handy. It would also be great if the exception message included the index of the empty record. That said, we could also handle this on our side 🙂
Overview
#4339
I hereby agree to the terms of the JUnit Contributor License Agreement.
Definition of Done
@API
annotations