Skip to content

Fix escaping issue in filterExpression for RedisVectorStore file name #3202

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

kmw10693
Copy link

Issues#2945

Problem

When using SearchRequest.Builder#filterExpression(String) in a Redis-backed vector store, expressions that include file names such as 'java编程思想.pdf' or Windows-style paths like 'C:\\docs\\file.txt' cause parsing errors due to unescaped special characters (e.g., . or \).

These characters are treated as operators or escape tokens by the internal parser, leading to failures when the input is passed directly to FilterExpressionTextParser.

Changes Introduced

Introduced an internal escapeTextExpression(String) method in SearchRequest.Builder to automatically escape:

  • Backslashes (\) → \\
  • Dots (.) → \.
  • Applied this escaping inside filterExpression(String) before parsing via FilterExpressionTextParser.
  • This escaping is applied only within single-quoted string literals to preserve the overall filter expression syntax.

@markpollack
Copy link
Member

Thanks for the PR, I am afraid to merge this due to unintended sdie effects right before the release. Anything with Regex is always a difficult thing to deal with and make sure it is correct.

@markpollack markpollack added this to the 1.0.x milestone May 16, 2025
@kmw10693
Copy link
Author

@markpollack

Hi, thank you for the thoughtful feedback — I completely understand your concern about introducing regex-related changes right before a release. It's true that regex-related logic can be tricky and prone to side effects if not carefully verified.

To help mitigate this, I’ve added comprehensive test coverage focused on exactly the kind of edge cases that could cause parsing issues. Specifically, the tests ensure that string literals containing dots (.) and backslashes () are properly escaped only within quoted strings, and not elsewhere in the expression.

Here are the cases covered:

Dot and backslash escaping inside file names and paths:

file_name == 'clean.code.pdf' && path == 'C:\\docs\\files'

Multiple independent literals with dots:

file_name == 'a.b.c' && author == 'me.you'

Various file extensions:

'summary.epub', 'lecture_notes.md', 'slides.pptx'

List-style IN clause handling:

file_name IN ['a.pdf', 'b.txt', 'final.report.docx']

Each test verifies that escaping is applied consistently and correctly. I'm happy to improve test coverage further if that’s preferable.

Thanks again for the review 🙏

Matcher matcher = pattern.matcher(expression);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
String content = matcher.group(1).replace("\\", "\\\\").replace(".", "\\.");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static Map<String, String> escapeText = {
  // from, to
  "\\", "\\\\",
  ".", "\\."
}

nit. how about extracting escape target texts to variable?
maybe we need more escape text pattern in the future.

@kmw10693
Copy link
Author

@markpollack
I've replaced the Regex with a simpler and more maintainable approach using a constant-based escape map and StringBuilder. This avoids potential side effects while making it easier to extend or audit in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants