Skip to content

fix: ensure nlp/sentencize handles punctuation in quotation marks properly #5381

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 23, 2025

Conversation

Planeshifter
Copy link
Member

Description

What is the purpose of this pull request?

This pull request:

  • updates nlp/sentencize to better handle punctuation marks inside of quotation marks and ensures we do not end sentences prematurely.

Related Issues

Does this pull request have any related issues?

This pull request:

Questions

Any questions for reviewers of this pull request?

No.

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

No.

Checklist

Please ensure the following tasks are completed before submitting this pull request.


@stdlib-js/reviewers

…roperly

---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
  - task: lint_filenames
    status: passed
  - task: lint_editorconfig
    status: passed
  - task: lint_markdown
    status: na
  - task: lint_package_json
    status: na
  - task: lint_repl_help
    status: na
  - task: lint_javascript_src
    status: passed
  - task: lint_javascript_cli
    status: na
  - task: lint_javascript_examples
    status: na
  - task: lint_javascript_tests
    status: passed
  - task: lint_javascript_benchmarks
    status: na
  - task: lint_python
    status: na
  - task: lint_r
    status: na
  - task: lint_c_src
    status: na
  - task: lint_c_examples
    status: na
  - task: lint_c_benchmarks
    status: na
  - task: lint_c_tests_fixtures
    status: na
  - task: lint_shell
    status: na
  - task: lint_typescript_declarations
    status: na
  - task: lint_typescript_tests
    status: na
  - task: lint_license_headers
    status: passed
---
@stdlib-bot stdlib-bot added the Needs Review A pull request which needs code review. label Feb 22, 2025
@Planeshifter Planeshifter requested a review from kgryte February 22, 2025 22:02
@stdlib-bot
Copy link
Contributor

stdlib-bot commented Feb 22, 2025

Coverage Report

Package Statements Branches Functions Lines
nlp/sentencize $\color{green}203/203$
$\color{green}+100.00\%$
$\color{green}35/35$
$\color{green}+100.00\%$
$\color{green}2/2$
$\color{green}+100.00\%$
$\color{green}203/203$
$\color{green}+100.00\%$

The above coverage report was generated for the changes in this PR.

---
type: pre_commit_static_analysis_report
description: Results of running static analysis checks when committing changes.
report:
  - task: lint_filenames
    status: passed
  - task: lint_editorconfig
    status: passed
  - task: lint_markdown
    status: na
  - task: lint_package_json
    status: na
  - task: lint_repl_help
    status: na
  - task: lint_javascript_src
    status: passed
  - task: lint_javascript_cli
    status: na
  - task: lint_javascript_examples
    status: na
  - task: lint_javascript_tests
    status: na
  - task: lint_javascript_benchmarks
    status: na
  - task: lint_python
    status: na
  - task: lint_r
    status: na
  - task: lint_c_src
    status: na
  - task: lint_c_examples
    status: na
  - task: lint_c_benchmarks
    status: na
  - task: lint_c_tests_fixtures
    status: na
  - task: lint_shell
    status: na
  - task: lint_typescript_declarations
    status: na
  - task: lint_typescript_tests
    status: na
  - task: lint_license_headers
    status: passed
---

---
type: pre_push_report
description: Results of running various checks prior to pushing changes.
report:
  - task: run_javascript_examples
    status: na
  - task: run_c_examples
    status: na
  - task: run_cpp_examples
    status: na
  - task: run_javascript_readme_examples
    status: na
  - task: run_c_benchmarks
    status: na
  - task: run_cpp_benchmarks
    status: na
  - task: run_fortran_benchmarks
    status: na
  - task: run_javascript_benchmarks
    status: na
  - task: run_julia_benchmarks
    status: na
  - task: run_python_benchmarks
    status: na
  - task: run_r_benchmarks
    status: na
  - task: run_javascript_tests
    status: na
---
Copy link
Member

@kgryte kgryte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Long-term, this is likely to be a game of whack-a-mole, and we're going to need to bite the bullet and actually implement the Unicode segmentation algorithm.

@kgryte kgryte removed the Needs Review A pull request which needs code review. label Feb 23, 2025
@kgryte kgryte merged commit eb500b7 into develop Feb 23, 2025
11 checks passed
@kgryte kgryte deleted the philipp/fix-sentencize-quotation-marks branch February 23, 2025 00:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: nlp-sentencize wrongly breaks sentences in quotation marks
3 participants