[BUG]: nlp-sentencize wrongly splits sentences containing quotation marks #3017

@Pupix

Description

As the title says.

Here are some quick examples:

console.log(sentencize('I said "Look out" right before he banged his head'));
> [ 'I said "Look out" right before he banged his head' ] // This is correct

console.log(sentencize('I said "Look out!" right before he banged his head'));
> ['I said "Look out!"', 'right before he banged his head'] // This should be one sentence

From looking at the code, it seems to be doing exactly what it's told to do, but the result doesn't seem quite right.
The rule is: if the current token is a suffix (i.e. a closing ") and the previous token is one of the punctuation marks . ! or ?, split the sentence there.
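To make the behaviour concrete, here is a rough, hypothetical reimplementation of that rule. It is not the library's source: `tokenize`, `sentencizeSketch` and the `lowercaseContinues` option are made-up names for illustration, and the tokenizer ignores abbreviations and single quotes. With the option off it reproduces the reported split; with it on, a closing quote after . ! ? only ends the sentence when the next word does not start with a lowercase letter, which is one possible heuristic for a fix.

```javascript
// Hypothetical, simplified sketch of the splitting rule described in the issue.
// It is NOT the library's source; names and the option are illustrative only.
const TERMINATORS = new Set([".", "!", "?"]);

// Split text into words, terminal punctuation and double quotes, keeping each
// token's end offset so sentences can be sliced from the original string.
function tokenize(text) {
  return [...text.matchAll(/[.!?]|"|[^\s.!?"]+/g)].map((m) => ({
    value: m[0],
    end: m.index + m[0].length,
  }));
}

function sentencizeSketch(text, { lowercaseContinues = false } = {}) {
  const tokens = tokenize(text);
  const sentences = [];
  let sentenceStart = 0;
  let insideQuote = false;

  // Close the current sentence right after `token`.
  const splitAfter = (token) => {
    const sentence = text.slice(sentenceStart, token.end).trim();
    if (sentence) sentences.push(sentence);
    sentenceStart = token.end;
  };

  tokens.forEach((token, i) => {
    const prev = tokens[i - 1];
    const next = tokens[i + 1];

    if (token.value === '"') {
      // Quotes alternate: opening, closing, opening, ...
      insideQuote = !insideQuote;
      const isClosing = !insideQuote;
      if (isClosing && prev !== undefined && TERMINATORS.has(prev.value)) {
        // Reported rule: a closing quote after . ! ? ends the sentence.
        // Hypothetical tweak (opt-in): keep going if the next word is lowercase.
        const continues =
          lowercaseContinues && next !== undefined && /^[a-z]/.test(next.value);
        if (!continues) splitAfter(token);
      }
    } else if (TERMINATORS.has(token.value) && !insideQuote) {
      // Ordinary sentence end outside any quotation.
      splitAfter(token);
    }
  });

  // Whatever follows the last split is the final sentence.
  const rest = text.slice(sentenceStart).trim();
  if (rest) sentences.push(rest);
  return sentences;
}

const input = 'I said "Look out!" right before he banged his head';
console.log(sentencizeSketch(input)); // two sentences, as reported
console.log(sentencizeSketch(input, { lowercaseContinues: true })); // one sentence
```

The lowercase check is only a heuristic: a quoted exclamation followed by a capitalized word (e.g. 'I said "Look out!" John ducked') would still be split.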

Related Issues

#3013

Questions

No.

Demo

No response

Reproduction

console.log(sentencize('I said "Look out!" right before he banged his head'));
> ['I said "Look out!"', 'right before he banged his head']

Expected Results

['I said "Look out!" right before he banged his head']

Actual Results

['I said "Look out!"', 'right before he banged his head']

Version

0.2.2

Environments

Node.js

Browser Version

No response

Node.js / npm Version

v22.9.0

Platform

Windows 11

Checklist

  • Read and understood the Code of Conduct.
  • Searched for existing issues and pull requests.

Metadata

Assignees

No one assigned

Labels

Bug (Something isn't working)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
