Possible error in evaluator pass / fail condition #393

ShayakSarkar · 2025-05-24T15:24:18Z

Possible error in the evaluator pass / fail condition. It seems the result should be pass when score >= threshold.

prmerger-automator · 2025-05-24T15:24:38Z

@ShayakSarkar : Thanks for your contribution! The author(s) and reviewer(s) have been notified to review your proposed change.

learn-build-service-prod · 2025-05-24T15:27:48Z

Learn Build status updates of commit a1d121b:

✅ Validation status: passed

File	Status	Preview URL	Details
articles/ai-foundry/concepts/evaluation-evaluators/general-purpose-evaluators.md	✅Succeeded

For more details, please refer to the build report.

For any questions, please:

Try searching the learn.microsoft.com contributor guides
Post your question in the Learn support channel

learn-build-service-prod · 2025-05-24T15:40:42Z

Learn Build status updates of commit 20f4108:

✅ Validation status: passed

File	Status	Preview URL	Details
articles/ai-foundry/concepts/evaluation-evaluators/general-purpose-evaluators.md	✅Succeeded


ShayakSarkar · 2025-05-24T15:40:57Z

The examples appear to show pass in scenarios where score >= threshold.

In the following example:

relevance = 3 and relevance_threshold = 3. Score >= threshold (pass).
coherence = 2 and coherence_threshold = 3. Score < threshold (fail).
{
  "f1_score": 0.631578947368421,
  "f1_result": "pass",
  "f1_threshold": 3,
  "similarity": 4.0,
  "gpt_similarity": 4.0,
  "similarity_result": "pass",
  "similarity_threshold": 3,
  "fluency": 3.0,
  "gpt_fluency": 3.0,
  "fluency_reason": "The input Data should get a Score of 3 because it clearly conveys an idea with correct grammar and adequate vocabulary, but it lacks complexity and variety in sentence structure.",
  "fluency_result": "pass",
  "fluency_threshold": 3,
  "relevance": 3.0,
  "gpt_relevance": 3.0,
  "relevance_reason": "The RESPONSE does not fully answer the QUERY because it fails to explicitly state that Marie Curie was born in Warsaw, which is the key detail needed for a complete understanding. Instead, it only negates Paris, which does not fully address the question.",
  "relevance_result": "pass",
  "relevance_threshold": 3,
  "coherence": 2.0,
  "gpt_coherence": 2.0,
  "coherence_reason": "The RESPONSE provides some relevant information but lacks a clear and logical structure, making it difficult to follow. It does not directly answer the question in a coherent manner, which is why it falls into the \"Poorly Coherent Response\" category.",
  "coherence_result": "fail",
  "coherence_threshold": 3,
  "groundedness": 3.0,
  "gpt_groundedness": 3.0,
  "groundedness_reason": "The response attempts to answer the query about Marie Curie's birthplace but includes incorrect information by stating she was not born in Paris, which is irrelevant. It does provide the correct birthplace (Warsaw), but the misleading nature of the response affects its overall groundedness. Therefore, it deserves a score of 3.",
  "groundedness_result": "pass",
  "groundedness_threshold": 3
}
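
The rule described in this comment can be sketched in a few lines of Python. This is only an illustration of the proposed condition, not the evaluator's actual implementation: the helper names `expected_result` and `find_mismatches` are hypothetical, and the `<metric>`, `<metric>_result`, `<metric>_threshold` key pattern is assumed from the two metrics discussed above.

```python
def expected_result(score: float, threshold: float) -> str:
    """Hypothetical helper: 'pass' when score >= threshold, otherwise 'fail'."""
    return "pass" if score >= threshold else "fail"


def find_mismatches(output: dict, metrics: list[str]) -> list[str]:
    """Return the metrics whose reported result disagrees with the rule.

    Assumes each metric uses the keys <metric>, <metric>_result and
    <metric>_threshold, as in the sample output quoted in this thread.
    """
    mismatches = []
    for metric in metrics:
        expected = expected_result(output[metric], output[f"{metric}_threshold"])
        if output[f"{metric}_result"] != expected:
            mismatches.append(metric)
    return mismatches


# Values copied from the sample output for the two metrics named in the comment.
sample = {
    "relevance": 3.0,
    "relevance_result": "pass",
    "relevance_threshold": 3,
    "coherence": 2.0,
    "coherence_result": "fail",
    "coherence_threshold": 3,
}

mismatches = find_mismatches(sample, ["relevance", "coherence"])
print(mismatches)  # prints []
```

Both metrics agree with the `score >= threshold` reading: relevance sits exactly on the threshold and passes, while coherence falls below it and fails.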

learn-build-service-prod · 2025-05-24T17:49:28Z

Learn Build status updates of commit 35b5264:

✅ Validation status: passed

File	Status	Preview URL	Details
articles/ai-foundry/concepts/evaluation-evaluators/agent-evaluators.md	✅Succeeded
articles/ai-foundry/concepts/evaluation-evaluators/general-purpose-evaluators.md	✅Succeeded
articles/ai-foundry/concepts/evaluation-evaluators/rag-evaluators.md	✅Succeeded
articles/ai-foundry/concepts/evaluation-evaluators/textual-similarity-evaluators.md	✅Succeeded


v-regandowner · 2025-05-26T14:36:31Z

@lgayhardt - Can you review the proposed changes?

IMPORTANT: When the changes are ready for publication, adding a #sign-off comment is the best way to signal that the PR is ready for the review team to merge.

#label:"aq-pr-triaged"
@MicrosoftDocs/public-repo-pr-review-team

Possible error in evaluator pass / fail condition

a1d121b

Possible error in evaluator pass / fail condition. Seems like pass should be for the condition that score >= threshold

prmerger-automator bot added the do-not-merge label May 24, 2025

prmerger-automator bot assigned lgayhardt May 24, 2025

prmerger-automator bot requested review from lgayhardt and changliu2 May 24, 2025 15:24

prmerger-automator bot added azure-ai-foundry/svc Change sent to author labels May 24, 2025

Removed other possible errors

20f4108

Other possible errors

35b5264

prmerger-automator bot added the aq-pr-triaged C+L Pull Request Review Team label label May 26, 2025
