Commit 9d569b5

diffWords now takes an optional intlSegmenter option
kpdecker/jsdiff#539
1 parent 91b6060 commit 9d569b5

3 files changed, 14 additions, 1 deletion

types/diff/diff-tests.ts

Lines changed: 3 additions & 0 deletions
@@ -16,6 +16,9 @@ Diff.diffChars(one, other, {
 Diff.diffChars(one, other, (value) => {
     value; // $ExpectType Change[]
 });
+Diff.diffWords('吾輩は猫である。名前はまだ無い。', '吾輩は猫である。名前はたぬき。', {
+    intlSegmenter: new Intl.Segmenter('ja-JP', { granularity: 'word' }),
+});
 // $ExpectType Change[]
 Diff.diffLines(
     'line\nold value\nline',
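
The new test only checks that `diffWords` accepts an `intlSegmenter` option. To show why a segmenter matters for this input, the minimal sketch below (not part of the commit) runs `Intl.Segmenter` directly on the same strings: the opening line of Sōseki's "I Am a Cat" ("I am a cat. As yet I have no name.") and a variant ending "My name is Tanuki." The text contains no spaces, so there are no whitespace boundaries for a simple split to use. It assumes a runtime that provides `Intl.Segmenter` (for example a recent Node.js built with full ICU) and the `es2022.intl` lib for type-checking; `segmentWords` and `wordLikeSegments` are hypothetical helper names. Exact boundaries come from the engine's ICU dictionaries, so no expected output is shown.

```typescript
// Assumes a runtime with Intl.Segmenter (e.g. a recent Node.js with full ICU)
// and the "es2022.intl" lib for type-checking. Helper names are hypothetical.

/** Every segment, including spaces and punctuation, so the pieces rejoin to `text`. */
function segmentWords(text: string, segmenter: Intl.Segmenter): string[] {
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

/** Only the segments the segmenter marks as word-like (punctuation such as 。 is dropped). */
function wordLikeSegments(text: string, segmenter: Intl.Segmenter): string[] {
  return Array.from(segmenter.segment(text))
    .filter((s) => s.isWordLike === true)
    .map((s) => s.segment);
}

const ja = new Intl.Segmenter('ja-JP', { granularity: 'word' });
const oldText = '吾輩は猫である。名前はまだ無い。';
const newText = '吾輩は猫である。名前はたぬき。';

// Boundaries come from the engine's ICU data, so they may vary between runtimes.
console.log(segmentWords(oldText, ja));
console.log(wordLikeSegments(newText, ja));
```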

types/diff/index.d.ts

Lines changed: 9 additions & 0 deletions
@@ -32,6 +32,15 @@ export interface WordsOptions extends BaseOptions {
      * `true` to ignore leading and trailing whitespace. This is the same as `diffWords()`.
      */
     ignoreWhitespace?: boolean | undefined;
+
+    /**
+     * An optional [`Intl.Segmenter`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter) object (which must have a `granularity` of `'word'`) for `diffWords` to use to split the text into words.
+     *
+     * By default, `diffWords` does not use an `Intl.Segmenter`, just some regexes for splitting text into words. This will tend to give worse results than `Intl.Segmenter` would, but ensures the results are consistent across environments; `Intl.Segmenter` behaviour is only loosely specced and the implementations in browsers could in principle change dramatically in future. If you want to use `diffWords` with an `Intl.Segmenter` but ensure it behaves the same whatever environment you run it in, use an `Intl.Segmenter` polyfill instead of the JavaScript engine's native `Intl.Segmenter` implementation.
+     *
+     * Using an `Intl.Segmenter` should allow better word-level diffing of non-English text than the default behaviour. For instance, `Intl.Segmenter`s can generally identify via built-in dictionaries which sequences of adjacent Chinese characters form words, allowing word-level diffing of Chinese. By specifying a language when instantiating the segmenter (e.g. `new Intl.Segmenter('sv', {granularity: 'word'})`) you can also support language-specific rules, like treating Swedish's colon separated contractions (like *k:a* for *kyrka*) as single words; by default this would be seen as two words separated by a colon.
+     */
+    intlSegmenter?: Intl.Segmenter | undefined;
 }
 
 export interface LinesOptions extends BaseOptions {
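
The doc comment states the contract (a segmenter with `'word'` granularity, with a regex-based split as the default) but not how the resulting tokens feed a diff. The following is a hypothetical sketch, not jsdiff's implementation: `tokenize`, `diffTokens`, and `diffWordsSketch` are made-up names, the fallback regex is a simplified stand-in for jsdiff's own regexes, and the diff uses a plain longest-common-subsequence table rather than the Myers O(ND) algorithm jsdiff is based on. The `Change` interface here only loosely mirrors the `value`/`added`/`removed` fields of the typings.

```typescript
// Hypothetical sketch, not jsdiff's code: jsdiff has its own tokenizer and a
// Myers-based diff. This uses a simple regex fallback and an LCS table instead.

interface Change {
  value: string;
  added?: boolean;
  removed?: boolean;
}

type ChangeKind = 'same' | 'added' | 'removed';

/** Split text into tokens with the segmenter if given, else with a simple regex. */
function tokenize(text: string, segmenter?: Intl.Segmenter): string[] {
  if (segmenter === undefined) {
    // Fallback: runs of ASCII word chars, runs of whitespace, or any single other char.
    return text.match(/\w+|\s+|[^\w\s]/gu) ?? [];
  }
  if (segmenter.resolvedOptions().granularity !== 'word') {
    throw new TypeError("intlSegmenter must have granularity 'word'");
  }
  // Keep every segment (words, spaces, punctuation) so the tokens rejoin to `text`.
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

/** Token-level diff using a longest-common-subsequence table. */
function diffTokens(a: string[], b: string[]): Change[] {
  const n = a.length;
  const m = b.length;
  // lcs[i][j] = length of the LCS of a[i..] and b[j..].
  const lcs: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const changes: Change[] = [];
  const kindOf = (c: Change): ChangeKind => (c.added ? 'added' : c.removed ? 'removed' : 'same');
  // Append a token, merging it into the previous change when the kind matches.
  const push = (value: string, kind: ChangeKind): void => {
    const last = changes[changes.length - 1];
    if (last !== undefined && kindOf(last) === kind) {
      last.value += value;
    } else if (kind === 'added') {
      changes.push({ value, added: true });
    } else if (kind === 'removed') {
      changes.push({ value, removed: true });
    } else {
      changes.push({ value });
    }
  };

  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      push(a[i], 'same');
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      push(a[i], 'removed');
      i++;
    } else {
      push(b[j], 'added');
      j++;
    }
  }
  while (i < n) push(a[i++], 'removed');
  while (j < m) push(b[j++], 'added');
  return changes;
}

/** Hypothetical analogue of diffWords(oldStr, newStr, { intlSegmenter }). */
function diffWordsSketch(oldStr: string, newStr: string, segmenter?: Intl.Segmenter): Change[] {
  return diffTokens(tokenize(oldStr, segmenter), tokenize(newStr, segmenter));
}

// → [{ value: 'the ' }, { value: 'cat', removed: true }, { value: 'dog', added: true }, { value: ' sat' }]
console.log(diffWordsSketch('the cat sat', 'the dog sat'));
```

Passing `new Intl.Segmenter('ja-JP', { granularity: 'word' })` as the third argument makes the sketch compare dictionary-based Japanese words, whereas the regex fallback degrades to one token per Japanese character. The sketch rejects other granularities to mirror the "must have a `granularity` of `'word'`" requirement; whether jsdiff itself enforces this is not shown in the commit.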

types/diff/tsconfig.json

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,8 @@
     "compilerOptions": {
         "module": "node16",
         "lib": [
-            "es6"
+            "es6",
+            "es2022.intl"
         ],
         "noImplicitAny": true,
         "noImplicitThis": true,

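Adding `es2022.intl` to `lib` is what lets the compiler resolve the `Intl.Segmenter` type referenced in `index.d.ts` (the `es6` lib alone does not declare it); it does not make the constructor exist at runtime. A minimal sketch, assuming calling code that wants to pass a segmenter only where the runtime provides one (the doc comment also suggests a polyfill when consistent behaviour matters); `createWordSegmenter` is a hypothetical helper, not part of `diff` or its typings.

```typescript
// The "es2022.intl" lib supplies the Intl.Segmenter type; whether the
// constructor exists depends on the runtime, so check before using it.

/**
 * Hypothetical helper: returns a word-granularity segmenter for `locale`,
 * or undefined when the runtime has no Intl.Segmenter.
 */
function createWordSegmenter(locale?: string): Intl.Segmenter | undefined {
  if (typeof Intl === 'undefined' || typeof Intl.Segmenter !== 'function') {
    return undefined;
  }
  return new Intl.Segmenter(locale, { granularity: 'word' });
}

// Only set intlSegmenter when a segmenter is available; leaving it out
// means diffWords would fall back to its default regex-based splitting.
const segmenter = createWordSegmenter('sv');
const options: { intlSegmenter?: Intl.Segmenter } = segmenter ? { intlSegmenter: segmenter } : {};
console.log(options.intlSegmenter?.resolvedOptions());
```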