The `RelevancyEvaluator` is key for validating RAG flows. This pull request improves it by making the `PromptTemplate` configurable, improving the format of the default template, introducing a `Builder`, and extending the documentation with more details on how to use it.
I also added unit tests. The `RelevancyEvaluator` is used in many integration tests across the project to test the `QuestionAnswerAdvisor` and `RetrievalAugmentationAdvisor`, which also help assess the evaluator itself.
Signed-off-by: Thomas Vitale <[email protected]>
File changed: spring-ai-docs/src/main/antora/modules/ROOT/pages/api/testing.adoc (64 additions, 36 deletions)
One method to evaluate the response is to use the AI model itself for evaluation.

The Spring AI interface for evaluating responses is `Evaluator`, defined as:

[source,java]
----
@FunctionalInterface
public interface Evaluator {

    EvaluationResponse evaluate(EvaluationRequest evaluationRequest);

}
----

The input to the evaluation is the `EvaluationRequest`, which carries:

* `userText`: The raw input from the user as a `String`.
* `dataList`: Contextual data, such as from Retrieval Augmented Generation, appended to the raw input.
* `responseContent`: The AI model's response content as a `String`.
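For illustration, here is a minimal, hypothetical sketch of assembling an `EvaluationRequest` by hand, assuming the three-argument constructor mirroring the fields above and contextual data as a `List<Document>` (in some Spring AI versions the element type is `Content`, which `Document` implements):

[source,java]
----
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.evaluation.EvaluationRequest;

// The user's question
String userText = "What is the capital of Denmark?";
// Contextual data, e.g. documents retrieved from a vector store
List<Document> dataList = List.of(new Document("Copenhagen is the capital of Denmark."));
// The AI model's answer to be evaluated
String responseContent = "The capital of Denmark is Copenhagen.";

EvaluationRequest evaluationRequest = new EvaluationRequest(userText, dataList, responseContent);
----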
== Relevancy Evaluator

The `RelevancyEvaluator` is an implementation of the `Evaluator` interface, designed to assess the relevance of AI-generated responses against provided context. This evaluator helps determine the quality of a RAG flow by checking whether the AI model's response is relevant to the user's input with respect to the retrieved context.

The evaluation is based on the user input, the AI model's response, and the context information. It uses a prompt template to ask the AI model whether the response is relevant to the user input and context.

This is the default prompt template used by the `RelevancyEvaluator`:

[source,text]
----
Your task is to evaluate if the response for the query
is in line with the context information provided.

You have two options to answer. Either YES or NO.

Answer YES, if the response for the query
is in line with context information otherwise NO.

Query:
{query}

Response:
{response}

Context:
{context}

Answer:
----

NOTE: You can customize the prompt template by providing your own `PromptTemplate` object via the `.promptTemplate()` builder method. See xref:_custom_template[Custom Template] for details.
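For instance, here is a minimal sketch of creating the evaluator through its new builder, assuming a `ChatModel` is available and that the builder exposes a `chatClientBuilder(...)` method like the other Spring AI evaluators:

[source,java]
----
// Sketch (assumed API): a RelevancyEvaluator with the default prompt template.
// chatModel is the model acting as the judge; it need not be the model under test.
RelevancyEvaluator evaluator = RelevancyEvaluator.builder()
    .chatClientBuilder(ChatClient.builder(chatModel))
    .build();
----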
Here is an example of using the `RelevancyEvaluator` in an integration test, validating the result of a RAG flow using the `RetrievalAugmentationAdvisor` (the `chatModel` and `pgVectorStore` fields are assumed to be provided by the surrounding test setup):

[source,java]
----
@Test
void evaluateRelevancy() {
    String question = "Where does the adventure of Anacletus and Birba take place?";

    RetrievalAugmentationAdvisor ragAdvisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(VectorStoreDocumentRetriever.builder()
            .vectorStore(pgVectorStore)
            .build())
        .build();

    ChatResponse chatResponse = ChatClient.builder(chatModel).build()
        .prompt(question)
        .advisors(ragAdvisor)
        .call()
        .chatResponse();

    EvaluationRequest evaluationRequest = new EvaluationRequest(
        // The original user question
        question,
        // The context retrieved by the RAG flow
        chatResponse.getMetadata().get(RetrievalAugmentationAdvisor.DOCUMENT_CONTEXT),
        // The AI model's response
        chatResponse.getResult().getOutput().getText());

    RelevancyEvaluator evaluator = RelevancyEvaluator.builder()
        .chatClientBuilder(ChatClient.builder(chatModel))
        .build();

    EvaluationResponse evaluationResponse = evaluator.evaluate(evaluationRequest);

    assertThat(evaluationResponse.isPass()).isTrue();
}
----
You can find several integration tests in the Spring AI project that use the `RelevancyEvaluator` to test the functionality of the `QuestionAnswerAdvisor` (see https://github.com/spring-projects/spring-ai/blob/main/spring-ai-integration-tests/src/test/java/org/springframework/ai/integration/tests/client/advisor/QuestionAnswerAdvisorIT.java[tests]) and `RetrievalAugmentationAdvisor` (see https://github.com/spring-projects/spring-ai/blob/main/spring-ai-integration-tests/src/test/java/org/springframework/ai/integration/tests/client/advisor/RetrievalAugmentationAdvisorIT.java[tests]).

=== Custom Template

The `RelevancyEvaluator` uses a default template to prompt the AI model for evaluation. You can customize this behavior by providing your own `PromptTemplate` object via the `.promptTemplate()` builder method.

The custom `PromptTemplate` can use any `TemplateRenderer` implementation (by default, it uses `StTemplateRenderer` based on the https://www.stringtemplate.org/[StringTemplate] engine). The important requirement is that the template must contain the following placeholders:

* a `query` placeholder to receive the user question.
* a `response` placeholder to receive the AI model's response.
* a `context` placeholder to receive the context information.
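For example, here is a hypothetical sketch of a custom template wired into the evaluator via the builder. The template text is illustrative, not the project's default, and it relies on the default `StTemplateRenderer` to resolve the `{query}`, `{response}`, and `{context}` placeholders:

[source,java]
----
// Sketch (assumed API): a custom PromptTemplate for the RelevancyEvaluator.
PromptTemplate customTemplate = PromptTemplate.builder()
    .template("""
        Judge whether the response answers the query consistently
        with the given context. Reply with YES or NO only.

        Query:
        {query}

        Response:
        {response}

        Context:
        {context}

        Answer:
        """)
    .build();

RelevancyEvaluator evaluator = RelevancyEvaluator.builder()
    .chatClientBuilder(ChatClient.builder(chatModel))
    .promptTemplate(customTemplate)
    .build();
----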