
Possible error in evaluator pass / fail condition #393


Open
ShayakSarkar wants to merge 3 commits into base: main


ShayakSarkar

Possible error in the evaluator pass/fail condition. It seems the result should be pass when score >= threshold.

Contributor

@ShayakSarkar : Thanks for your contribution! The author(s) and reviewer(s) have been notified to review your proposed change.

Contributor

Learn Build status updates of commit a1d121b:

✅ Validation status: passed

| File | Status | Preview URL | Details |
| --- | --- | --- | --- |
| articles/ai-foundry/concepts/evaluation-evaluators/general-purpose-evaluators.md | ✅ Succeeded | | |

For more details, please refer to the build report.

For any questions, please:

Contributor

Learn Build status updates of commit 20f4108:

✅ Validation status: passed

| File | Status | Preview URL | Details |
| --- | --- | --- | --- |
| articles/ai-foundry/concepts/evaluation-evaluators/general-purpose-evaluators.md | ✅ Succeeded | | |

For more details, please refer to the build report.

For any questions, please:

@ShayakSarkar
Author

The examples in the docs show pass in scenarios where score >= threshold.

In the following example:

  • relevance = 3 and relevance_threshold = 3. Score >= threshold (pass).
  • coherence = 2 and coherence_threshold = 3. Score < threshold (fail).
    {
    "f1_score": 0.631578947368421,
    "f1_result": "pass",
    "f1_threshold": 3,
    "similarity": 4.0,
    "gpt_similarity": 4.0,
    "similarity_result": "pass",
    "similarity_threshold": 3,
    "fluency": 3.0,
    "gpt_fluency": 3.0,
    "fluency_reason": "The input Data should get a Score of 3 because it clearly conveys an idea with correct grammar and adequate vocabulary, but it lacks complexity and variety in sentence structure.",
    "fluency_result": "pass",
    "fluency_threshold": 3,
    "relevance": 3.0,
    "gpt_relevance": 3.0,
    "relevance_reason": "The RESPONSE does not fully answer the QUERY because it fails to explicitly state that Marie Curie was born in Warsaw, which is the key detail needed for a complete understanding. Instead, it only negates Paris, which does not fully address the question.",
    "relevance_result": "pass",
    "relevance_threshold": 3,
    "coherence": 2.0,
    "gpt_coherence": 2.0,
    "coherence_reason": "The RESPONSE provides some relevant information but lacks a clear and logical structure, making it difficult to follow. It does not directly answer the question in a coherent manner, which is why it falls into the \"Poorly Coherent Response\" category.",
    "coherence_result": "fail",
    "coherence_threshold": 3,
    "groundedness": 3.0,
    "gpt_groundedness": 3.0,
    "groundedness_reason": "The response attempts to answer the query about Marie Curie's birthplace but includes incorrect information by stating she was not born in Paris, which is irrelevant. It does provide the correct birthplace (Warsaw), but the misleading nature of the response affects its overall groundedness. Therefore, it deserves a score of 3.",
    "groundedness_result": "pass",
    "groundedness_threshold": 3
    }
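
To make the proposed condition concrete, here is a minimal sketch of the rule as I read it from the sample output above. `result_for` is a hypothetical helper written for this comment, not a function from the evaluation SDK, and the values are copied from the JSON. The F1 entry is left out because its score is on a 0 to 1 scale while the listed threshold is 3.

```python
def result_for(score: float, threshold: float) -> str:
    """Hypothetical helper: 'pass' when score >= threshold, otherwise 'fail'."""
    return "pass" if score >= threshold else "fail"


# Scores, thresholds, and results copied from the sample output above.
sample = {
    "similarity": 4.0, "similarity_threshold": 3, "similarity_result": "pass",
    "fluency": 3.0, "fluency_threshold": 3, "fluency_result": "pass",
    "relevance": 3.0, "relevance_threshold": 3, "relevance_result": "pass",
    "coherence": 2.0, "coherence_threshold": 3, "coherence_result": "fail",
    "groundedness": 3.0, "groundedness_threshold": 3, "groundedness_result": "pass",
}

metrics = ("similarity", "fluency", "relevance", "coherence", "groundedness")
for metric in metrics:
    computed = result_for(sample[metric], sample[f"{metric}_threshold"])
    # The documented result should agree with score >= threshold.
    assert computed == sample[f"{metric}_result"], metric
    print(f"{metric}: {computed}")
```

A score equal to the threshold (relevance, fluency, groundedness) is reported as pass in the sample, which is why the comparison is `>=` rather than `>`.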

Contributor

Learn Build status updates of commit 35b5264:

✅ Validation status: passed

| File | Status | Preview URL | Details |
| --- | --- | --- | --- |
| articles/ai-foundry/concepts/evaluation-evaluators/agent-evaluators.md | ✅ Succeeded | | |
| articles/ai-foundry/concepts/evaluation-evaluators/general-purpose-evaluators.md | ✅ Succeeded | | |
| articles/ai-foundry/concepts/evaluation-evaluators/rag-evaluators.md | ✅ Succeeded | | |
| articles/ai-foundry/concepts/evaluation-evaluators/textual-similarity-evaluators.md | ✅ Succeeded | | |

For more details, please refer to the build report.

For any questions, please:

@v-regandowner
Contributor

@lgayhardt - Can you review the proposed changes?

IMPORTANT: When the changes are ready for publication, adding a #sign-off comment is the best way to signal that the PR is ready for the review team to merge.

#label:"aq-pr-triaged"
@MicrosoftDocs/public-repo-pr-review-team

prmerger-automator bot added the aq-pr-triaged (C+L Pull Request Review Team label) label on May 26, 2025
3 participants