
Track accurate PR cycle time by identifying first ready-for-review time #649


Closed

Conversation


@Ayyanaruto Ayyanaruto commented Apr 25, 2025

#634
This PR enhances GitHub analytics by incorporating pull request timeline events like ready_for_review and reviewed to improve the accuracy of PR cycle time metrics.

Key Enhancements:

  • Timeline Event Support: Introduced methods to fetch and process timeline events from the GitHub API with pagination and event filtering (a minimal sketch follows below).
  • Data Modeling: Added structured models and type definitions for ready_for_review and reviewed events to ensure type safety and consistency.
  • Metrics Updates: Refined get_pr_performance and get_rework_cycles to base cycle time and review analysis on actual timeline events instead of PR creation time.
  • ETL Integration: Extended the ETL pipeline to process and include ready_for_review events in PR event analytics.

Continuation of PR #636
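
A minimal sketch of the fetch-and-filter flow described above, not the PR's actual implementation: it assumes GitHub's documented issue-timeline endpoint with page-based pagination, plus the preview Accept header and 15-second timeout that the review notes further down call out. All names are illustrative.

import requests

def fetch_pr_timeline(org: str, repo: str, pr_number: int, token: str) -> list:
    # GitHub serves PR timeline events via the issues timeline endpoint
    url = f"https://api.github.com/repos/{org}/{repo}/issues/{pr_number}/timeline"
    headers = {
        "Authorization": f"Bearer {token}",
        # the timeline API historically required this preview media type
        "Accept": "application/vnd.github.mockingbird-preview+json",
    }
    events, page = [], 1
    while True:
        resp = requests.get(
            url, headers=headers, params={"page": page, "per_page": 100}, timeout=15
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        events.extend(batch)
        page += 1
    # keep only the event types this PR analyzes
    return [e for e in events if e.get("event") in {"ready_for_review", "reviewed"}]

Cycle time then starts from the earliest ready_for_review event rather than PR creation time, falling back to creation time when no such event exists.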

Summary by CodeRabbit

  • New Features

    • Added support for fetching and processing GitHub pull request timeline events, including "reviewed" and "ready_for_review" types.
    • Introduced models and data classes to represent detailed timeline events and users.
    • Enhanced event filtering and sorting utilities for timeline data.
    • Added a new API method to retrieve PR timeline events with pagination.
    • Updated existing handlers to process timeline events and extract relevant timestamps.
  • Bug Fixes

    • Improved filtering of events to exclude bot-generated actions.
  • Refactor

    • Replaced direct review data retrieval with timeline-based event extraction.
    • Centralized event creation logic for consistency.
    • Modularized timeline event filtering and extraction utilities.
  • Tests

    • Added tests for timeline event models, filtering utilities, and event extraction correctness.
    • Updated existing tests to verify new timeline event processing logic.


coderabbitai bot commented Apr 25, 2025

Walkthrough

This update adds timeline event models, methods for fetching and processing PR timeline events, and refactors related filtering and event extraction logic. It enhances the backend to handle detailed GitHub PR timeline data, including review and "ready for review" events, for improved incident and PR analytics.

Changes

Files and change summary:

  • backend/analytics_server/mhq/exapi/github.py: Added methods to fetch, paginate, and process GitHub PR timeline events, including filtering and conversion to typed models. Introduced get_pr_timeline and supporting private methods for event handling and error management.
  • backend/analytics_server/mhq/exapi/models/timeline.py: Introduced dataclasses for GitHubUser, GitHubReviewEvent, GitHubReadyForReviewEvent, and the GitHubTimeline type alias. Handles timestamp parsing and models event data.
  • backend/analytics_server/mhq/exapi/types/timeline.py: Added TypedDict definitions for GitHub timeline events and users, including review and ready-for-review event structures, and a union type for timeline items.
  • backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py: Refactored internal logic to filter and process only review and ready-for-review events when calculating PR performance and rework cycles. Adjusted cycle time calculation to use the first ready-for-review event.
  • backend/analytics_server/mhq/service/code/sync/etl_github_handler.py: Refactored to fetch and process PR timeline events instead of just reviews. Updated event extraction and conversion logic, added a helper for standardized event creation, and updated method signatures to handle new event types.
  • backend/analytics_server/mhq/service/code/sync/utils/timeline.py: Added TimelineEventUtils class with static methods for filtering, extracting, and sorting timeline events by type or class. Provides utilities for review and ready-for-review event handling.
  • backend/analytics_server/mhq/store/models/code/enums.py: Added READY_FOR_REVIEW member to the PullRequestEventType enum.
  • backend/analytics_server/tests/factories/models/code.py: Updated the get_pull_request_event factory to accept an optional actor_username parameter for more flexible event creation.
  • backend/analytics_server/tests/factories/models/exapi/github.py: Added a GitHubUser dataclass, updated GithubPullRequestReview to use it, and modified the factory to construct reviews with user objects.
  • backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py: Added multiple tests for PR performance metrics involving ready-for-review and review events, including cycle time calculations and event ordering.
  • backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py: Updated tests to match new event data structures, method signatures, and field names, including use of actor_username and updated event data extraction.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant GithubApiService
    participant GitHub
    participant TimelineEventUtils

    Client->>GithubApiService: get_pr_timeline(org, repo, pr_number)
    GithubApiService->>GitHub: GET /repos/{org}/{repo}/pulls/{pr_number}/timeline (paginated)
    GitHub-->>GithubApiService: Timeline events (JSON)
    GithubApiService->>GithubApiService: _fetch_all_timeline_events()
    GithubApiService->>GithubApiService: _process_timeline_events()
    GithubApiService->>TimelineEventUtils: Filter/sort events by type
    GithubApiService-->>Client: List[GitHubTimeline]

Poem

In the warren of code, new timelines appear,
With reviews and "ready" events, all crystal clear.
Rabbits hop through events, sorting with glee,
From GitHub to models, as neat as can be.
Now metrics are sharper, with cycles in view—
A carrot for progress, and analytics too!
🥕✨




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (8)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (1)

58-66: Redundant in-place sort after already receiving a sorted list

TimelineEventUtils.get_sorted_events_by_type() returns an already sorted list.
Calling review_events.sort(...) immediately afterwards performs an unnecessary O(n log n) sort on the same data.

-        review_events.sort(key=lambda x: x.created_at)

You can safely remove the above line to save a few CPU cycles and keep the code intent clear.

backend/analytics_server/mhq/service/code/sync/utils/timeline.py (1)

30-42: Type-var mismatch in filter_by_instance_type return annotation

events is List[E], event_class is Type[T], but the function returns
List[T]. When filtering, the resulting list actually contains items of
the same runtime type as the input (E), not necessarily T.
This confuses static type-checkers (e.g., mypy/pyright).

-    def filter_by_instance_type(events: List[E], event_class: Type[T]) -> List[T]:
+    def filter_by_instance_type(events: List[E], event_class: Type[T]) -> List[E]:

Change the return type to List[E] (or drop the type vars and use Any) or
replace the parameters with a single Type[T] / List[T] pairing.
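
One conventional way to keep the annotations honest, sketched below, is to drop the second type variable so that the isinstance narrowing alone drives the return type (illustrative, not the PR's code):

from typing import List, Sequence, Type, TypeVar

T = TypeVar("T")

def filter_by_instance_type(events: Sequence[object], event_class: Type[T]) -> List[T]:
    # isinstance() narrows each matching element to T, so List[T] is sound
    # here without a second, unconstrained type variable in the signature
    return [event for event in events if isinstance(event, event_class)]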

backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (1)

215-233: idempotency_key type mismatch may hide bugs

get_pull_request_event ultimately stores idempotency_key as a string,
but the test passes the raw event.id (likely an int). Because Python
silently accepts both, the discrepancy can slip into production where DB
schemas often enforce varchar.

-            idempotency_key=event1.id,
+            idempotency_key=str(event1.id),

Apply the same change for event2.id and the old-event block below to keep
test data faithful to production types.

backend/analytics_server/mhq/exapi/github.py (1)

221-223: Avoid re-allocating the supported-event set on every call

SUPPORTED_EVENTS is re-created each iteration; move it to a module/class level constant to save allocations and ease discoverability.

-        SUPPORTED_EVENTS = {"ready_for_review", "reviewed"}
-        return timeline_event.get("event") in SUPPORTED_EVENTS
+        return timeline_event.get("event") in _SUPPORTED_TIMELINE_EVENTS

(define _SUPPORTED_TIMELINE_EVENTS once at the top of the file).

backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (1)

308-324: Minor: don’t silently drop draft / pending reviews

continue # Skip incomplete reviews filters on the presence of
review.submitted_at, thereby omitting reviews in “PENDING” state.
If the intention is to exclude only draft/pending entries, also
check review.state != "PENDING" so that future GitHub changes do not
invalidate the logic.

backend/analytics_server/mhq/exapi/models/timeline.py (3)

29-42: Consider using more specific types for Dict fields and enum values.

Two improvements to consider:

  1. The _links field is typed as plain Dict without specifying key and value types. Use Dict[str, Any] or a more specific type if the structure is known.
  2. The state field should use a Literal type to restrict values to valid GitHub review states ("APPROVED", "CHANGES_REQUESTED", "COMMENTED", etc.)
- from typing import Optional, Dict, Literal, Union
+ from typing import Optional, Dict, Literal, Union, Any

@dataclass
class GitHubReviewEvent:
    id: int
    node_id: str
    user: GitHubUser
    body: str
    commit_id: str
    submitted_at: Union[str, datetime]
-   state: str
+   state: Literal["APPROVED", "CHANGES_REQUESTED", "COMMENTED", "DISMISSED"]
    html_url: str
    pull_request_url: str
    author_association: str
-   _links: Dict
+   _links: Dict[str, Any]
    event: Literal["reviewed"]

44-49: Defensively handle potential timestamp parsing errors.

The timestamp conversion looks good, but consider adding error handling for malformed timestamps to prevent unexpected exceptions at runtime.

 def __post_init__(self):
     if isinstance(self.submitted_at, str) and self.submitted_at:
         # Convert ISO format with Z (UTC) to datetime
-        self.submitted_at = datetime.fromisoformat(
-            self.submitted_at.replace("Z", "+00:00")
-        )
+        try:
+            self.submitted_at = datetime.fromisoformat(
+                self.submitted_at.replace("Z", "+00:00")
+            )
+        except ValueError as e:
+            # Log the error or handle invalid timestamp format
+            print(f"Warning: Invalid timestamp format {self.submitted_at}: {e}")

44-69: Extract duplicate timestamp conversion logic to a utility function.

Both event classes have nearly identical timestamp conversion logic in their __post_init__ methods. Consider extracting this to a utility function to avoid code duplication.

+ def _convert_iso_timestamp(timestamp: Optional[Union[str, datetime]]) -> Optional[Union[str, datetime]]:
+     """Convert ISO 8601 timestamp string to datetime object."""
+     if not isinstance(timestamp, str) or not timestamp:
+         return timestamp
+     try:
+         return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
+     except ValueError as e:
+         # Log the error or handle invalid timestamp format
+         print(f"Warning: Invalid timestamp format {timestamp}: {e}")
+         return timestamp

@dataclass
class GitHubReviewEvent:
    # ... fields ...

    def __post_init__(self):
-       if isinstance(self.submitted_at, str) and self.submitted_at:
-           # Convert ISO format with Z (UTC) to datetime
-           self.submitted_at = datetime.fromisoformat(
-               self.submitted_at.replace("Z", "+00:00")
-           )
+       self.submitted_at = _convert_iso_timestamp(self.submitted_at)

@dataclass
class GitHubReadyForReviewEvent:
    # ... fields ...

    def __post_init__(self):
-       if isinstance(self.created_at, str) and self.created_at:
-           # Convert ISO format with Z (UTC) to datetime
-           self.created_at = datetime.fromisoformat(
-               self.created_at.replace("Z", "+00:00")
-           )
+       self.created_at = _convert_iso_timestamp(self.created_at)
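
For instance, assuming the _convert_iso_timestamp helper sketched above is in scope, a Z-suffixed ISO timestamp becomes a timezone-aware datetime:

from datetime import datetime, timezone

# hypothetical usage of the helper suggested above
parsed = _convert_iso_timestamp("2025-04-25T10:30:00Z")
assert parsed == datetime(2025, 4, 25, 10, 30, tzinfo=timezone.utc)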
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8a545c0 and 4ef489c.

📒 Files selected for processing (11)
  • backend/analytics_server/mhq/exapi/github.py (3 hunks)
  • backend/analytics_server/mhq/exapi/models/timeline.py (1 hunks)
  • backend/analytics_server/mhq/exapi/types/timeline.py (1 hunks)
  • backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (6 hunks)
  • backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (4 hunks)
  • backend/analytics_server/mhq/service/code/sync/utils/timeline.py (1 hunks)
  • backend/analytics_server/mhq/store/models/code/enums.py (1 hunks)
  • backend/analytics_server/tests/factories/models/code.py (2 hunks)
  • backend/analytics_server/tests/factories/models/exapi/github.py (2 hunks)
  • backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py (2 hunks)
  • backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
backend/analytics_server/mhq/exapi/github.py (2)
backend/analytics_server/mhq/exapi/types/timeline.py (2)
  • GitHubReadyForReviewEventDict (53-54)
  • GitHubReviewEventDict (26-38)
backend/analytics_server/mhq/exapi/models/timeline.py (3)
  • GitHubReadyForReviewEvent (53-69)
  • GitHubReviewEvent (30-49)
  • GitHubUser (7-26)
backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (2)
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (1)
  • GithubETLHandler (43-420)
backend/analytics_server/tests/factories/models/code.py (1)
  • get_pull_request_event (79-108)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: All file linting
🔇 Additional comments (14)
backend/analytics_server/mhq/store/models/code/enums.py (1)

32-32: Well-structured enum addition

Adding the READY_FOR_REVIEW enum value aligns perfectly with the PR objective of tracking accurate PR cycle times. This allows the system to explicitly model and process "ready for review" timeline events from GitHub.
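
A minimal sketch of the addition (other members and the value strings are assumptions, not taken from the repository):

from enum import Enum

class PullRequestEventType(Enum):
    # existing members elided for brevity
    REVIEW = "REVIEW"
    READY_FOR_REVIEW = "READY_FOR_REVIEW"  # new member added by this PR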

backend/analytics_server/tests/factories/models/code.py (1)

89-89: Good improvement to the test factory

Adding the optional actor_username parameter enhances test flexibility by allowing explicit actor specification independent of reviewer, which aligns with the PR's focus on timeline events.

Also applies to: 107-107

backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py (5)

4-8: Well-structured imports

Good job explicitly importing the required enums - this makes dependencies clear and the code more maintainable.


509-528: Comprehensive test for ready-for-review cycle time calculation

This test properly verifies that the cycle time is calculated from the ready-for-review event timestamp rather than PR creation time, which is a key improvement in this PR.


531-543: Good fallback test case

Excellent test for the fallback behavior when no ready-for-review event exists. This ensures backward compatibility with PRs that don't have explicit ready-for-review events.


546-574: Thorough test for multiple ready-for-review events

This test correctly verifies that the earliest ready-for-review event is used when multiple events exist. The test intentionally passes events in reverse chronological order, which is a good practice to ensure the implementation is sorting correctly.
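
The behavior under test reduces to a small, self-contained example (the created_at field name follows the PR's models; everything else is hypothetical):

from dataclasses import dataclass
from datetime import datetime

@dataclass
class ReadyForReviewEvent:
    created_at: datetime

events = [
    ReadyForReviewEvent(datetime(2025, 4, 25, 12, 0)),  # later event listed first
    ReadyForReviewEvent(datetime(2025, 4, 25, 9, 0)),
]
# the earliest event wins regardless of input order, which is exactly what
# the reverse-chronological test above verifies
first_ready = min(events, key=lambda e: e.created_at)
assert first_ready.created_at.hour == 9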


577-616: Comprehensive integration test for all performance metrics

This test effectively validates all performance metrics (cycle time, first review time, rework time, merge time) when both ready-for-review and review events are present. It ensures that all timing calculations work together as expected.

backend/analytics_server/tests/factories/models/exapi/github.py (4)

32-35: Good data modeling with GitHubUser dataclass

Adding this dataclass improves type safety and structure, which aligns with the PR objective of enhancing data modeling for timeline events.


41-41: Well-structured model update

Changing from a simple string to a structured GitHubUser object improves type safety and readability.


49-49: Consistent property access update

Correctly updated to access the login via the user object structure.


59-60: Clear implementation with helpful comment

Good job adding an explanatory comment and updating the implementation to use the new GitHubUser class.

backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (1)

357-374: Preserve timestamp fields in data or document their removal

Stripping submitted_at / created_at from the stored data dict means the
raw event no longer contains its own timestamp, forcing consumers to look at the
enclosing PullRequestEvent.created_at. That is fine, but please add a comment
or keep the fields if downstream analytics expect the original GitHub field
names.

backend/analytics_server/mhq/exapi/models/timeline.py (2)

64-69: 🛠️ Refactor suggestion

Update timestamp conversion logic to handle None value.

With the suggested change to make created_at optional, the conversion logic should be updated to handle None values.

def __post_init__(self):
-   if isinstance(self.created_at, str) and self.created_at:
+   if isinstance(self.created_at, str) and self.created_at and self.created_at is not None:
        # Convert ISO format with Z (UTC) to datetime
+       try:
            self.created_at = datetime.fromisoformat(
                self.created_at.replace("Z", "+00:00")
            )
+       except ValueError as e:
+           # Log the error or handle invalid timestamp format
+           print(f"Warning: Invalid timestamp format {self.created_at}: {e}")

Likely an incorrect or invalid review comment.


7-26:

❓ Verification inconclusive

Verify the completeness of the GitHubUser fields.

The GitHubUser class looks well-structured with comprehensive fields. The user_view_type field is not a standard GitHub API user object field in my experience. Verify if this is a custom field or a newer addition to the GitHub API.


🏁 Script executed:

#!/bin/bash
# Check GitHub API documentation or examples to verify user_view_type field

# First, let's check if this field appears in GitHub's API documentation
echo "Searching for user_view_type in GitHub API docs"
curl -s "https://docs.github.com/en/rest/users/users?apiVersion=2022-11-28" | grep -i "user_view_type"

# Let's also check if it exists in actual user data from GitHub API 
echo "Fetching a sample user to check fields"
curl -s https://api.github.com/users/octocat | jq 'keys' | grep -i "user_view_type"

Length of output: 65872


GitHubUser.user_view_type is an official field—confirm your endpoint usage

We’ve verified that user_view_type appears in GitHub’s REST API spec (it’s part of the “Simple User” object returned by GET /users – see https://docs.github.com/rest/reference/users#list-users). However, neither GET /users/{username} nor GET /user includes this field in their actual responses.

• If you consume list-users responses (where user_view_type is present), it’s safe to keep.
• If you model full user endpoints, remove or mark user_view_type optional to avoid parsing errors.

File: backend/analytics_server/mhq/exapi/models/timeline.py (lines 7–26)
Suggested diff:

 class GitHubUser:
     login: str
     id: int
     node_id: str
     avatar_url: str
     gravatar_id: str
     url: str
     html_url: str
     followers_url: str
     following_url: str
     gists_url: str
     starred_url: str
     subscriptions_url: str
     organizations_url: str
     repos_url: str
     events_url: str
     received_events_url: str
     type: str
-    user_view_type: str
+    user_view_type: Optional[str]  # only returned by List-Users endpoints
     site_admin: bool


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/analytics_server/mhq/exapi/types/timeline.py (1)

1-13: Consider handling potential missing fields in GitHubUserDict

The base user dictionary defines all fields as required, but GitHub API responses might not consistently include all these fields.

-from typing import TypedDict, Dict, Union, Literal
+from typing import TypedDict, Dict, Union, Literal
+try:
+    # py≥3.11
+    from typing import NotRequired
+except ImportError:  # py≤3.10
+    from typing_extensions import NotRequired

 class GitHubUserDict(TypedDict):
     """GitHub user information as a dictionary."""

     # Required fields
     login: str
     id: int
     node_id: str
-    type: str
+    type: NotRequired[str]  # Sometimes might be missing
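
Under that change, omitting the optional key type-checks cleanly. A short sketch (on Python 3.10 and earlier this requires the typing_extensions package):

from typing import TypedDict

try:
    from typing import NotRequired  # Python 3.11+
except ImportError:
    from typing_extensions import NotRequired

class UserDict(TypedDict):
    login: str              # required
    id: int                 # required
    type: NotRequired[str]  # may be absent from some API responses

user: UserDict = {"login": "octocat", "id": 1}  # OK: "type" omitted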
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ef489c and 99455f1.

📒 Files selected for processing (5)
  • backend/analytics_server/mhq/exapi/github.py (3 hunks)
  • backend/analytics_server/mhq/exapi/models/timeline.py (1 hunks)
  • backend/analytics_server/mhq/exapi/types/timeline.py (1 hunks)
  • backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (6 hunks)
  • backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py
  • backend/analytics_server/mhq/exapi/models/timeline.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: All file linting
🔇 Additional comments (12)
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (5)

178-181: Good fix: Now using correct owner login for timeline fetch!

This correctly addresses the previous issue where pr.user.login (the PR author) was incorrectly used instead of the repository owner, which would have failed with 404 errors for organization repositories.


182-187: Neat implementation using TimelineEventUtils

Good separation of concerns using the utility class to extract specific event types from the timeline. I like how you're only keeping the earliest ready-for-review event which accurately reflects when the PR first became reviewable.


299-325: Clear processing of review events

Good implementation of review event processing with the appropriate skip condition for incomplete reviews.


327-342: Well-structured ready-for-review event handling

The conditional processing of the ready-for-review event is clean and follows the same pattern as review events.


347-375: Good extraction of common event creation logic

The helper method effectively standardizes PullRequestEvent creation, improving maintainability and reducing duplicate code.

backend/analytics_server/mhq/exapi/github.py (4)

128-155: Well-structured PR timeline fetch method

Good implementation with proper error handling and clear documentation.


195-203: Great improvements to API request handling

This implementation properly addresses previous concerns by:

  1. Adding the required "mockingbird-preview" Accept header for the preview API
  2. Including a 15-second timeout to prevent hanging

284-294: Properly handling possible extra fields in GitHub user data

Good use of dataclasses.fields to dynamically filter the input dictionary to only include fields present in the GitHubUser dataclass, preventing TypeError exceptions.
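
The pattern being praised reduces to a self-contained sketch (the real GitHubUser has many more fields; this stand-in is illustrative):

import dataclasses

@dataclasses.dataclass
class GitHubUser:
    login: str
    id: int

def build_user(raw: dict) -> GitHubUser:
    # keep only keys that are declared dataclass fields, so unexpected
    # extra keys in the API payload don't raise TypeError on construction
    allowed = {f.name for f in dataclasses.fields(GitHubUser)}
    return GitHubUser(**{k: v for k, v in raw.items() if k in allowed})

user = build_user({"login": "octocat", "id": 1, "unexpected_field": "x"})
assert user.login == "octocat"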


330-340: Consistent field filtering in both event creation methods

The same approach is consistently used for both review events and ready-for-review events.

backend/analytics_server/mhq/exapi/types/timeline.py (3)

35-37: Comprehensive user type definition

Good job combining the required and optional fields for GitHub users into a complete type definition.


40-52: Well-defined review event structure

Clear definition of required fields with appropriate types. Using Literal["reviewed"] is a good way to restrict the event type.


96-106: Clear ready-for-review event definition and union type

Good implementation of the specific event type by extending the base event and using a union type to represent different timeline items.


@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 9

🔭 Outside diff range comments (1)
backend/analytics_server/mhq/service/external_integrations_service.py (1)

137-146: 💡 Verification agent

🧩 Analysis chain

Consider adding GitHub custom domain retrieval logic.

The get_external_integrations_service function retrieves custom domain for GitLab but not for GitHub. Given that the PR adds GitHub custom domain support throughout the codebase, this might be an oversight.

Should we add a _get_custom_github_domain() function similar to the GitLab implementation? This would ensure consistency with the custom domain support being added for GitHub across other parts of the codebase.


🏁 Script executed:

#!/bin/bash
# Check if GitHub custom domain retrieval exists elsewhere in the codebase
rg -A 5 "custom.*domain.*github|github.*custom.*domain" --type py

# Check how GitHub integrations store custom domain in provider_meta
rg -A 10 "UserIdentityProvider\.GITHUB.*provider_meta|provider_meta.*custom_domain.*[Gg]ithub" --type py

Length of output: 4885


Add GitHub custom-domain logic to get_external_integrations_service

In backend/analytics_server/mhq/service/external_integrations_service.py, the helper only handles GitLab. To support GitHub custom domains (the rest of the code already uses get_custom_github_domain), extend the conditional:

• File: backend/analytics_server/mhq/service/external_integrations_service.py
Lines ~137–146

-    custom_domain_name = None
-    if user_identity_provider == UserIdentityProvider.GITLAB:
-        custom_domain_name = _get_custom_gitlab_domain()
+    custom_domain_name = None
+    if user_identity_provider == UserIdentityProvider.GITLAB:
+        custom_domain_name = _get_custom_gitlab_domain()
+    elif user_identity_provider == UserIdentityProvider.GITHUB:
+        # pull in the existing GitHub helper
+        from mhq.utils.github import get_custom_github_domain
+        custom_domain_name = get_custom_github_domain()
 
     return ExternalIntegrationsService(
         org_id,
         user_identity_provider,
         _get_access_token(),
         custom_domain_name,
     )

– Don’t forget to add the import at the top (or inline as shown) so that get_custom_github_domain is available.

♻️ Duplicate comments (5)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (2)

73-75: ⚠️ Potential issue

Inconsistent state access pattern needs alignment

The code still has inconsistent state access:

  • Line 73 uses x.data["state"]
  • Lines 149 and 155 use review_event.state

Unless PullRequestEvent has a .state property (which doesn't appear to be the case), the latter will raise AttributeError at runtime.

Apply this diff to fix the inconsistency:

-                review_event.state != PullRequestEventState.APPROVED.value
+                review_event.data["state"] != PullRequestEventState.APPROVED.value
-            if review_event.state == PullRequestEventState.APPROVED.value:
+            if review_event.data["state"] == PullRequestEventState.APPROVED.value:

Also applies to: 147-156


120-122: ⚠️ Potential issue

Incorrect calculation of approving reviews count

The calculation len(review_events) - len(blocking_reviews) is incorrect because:

  • It includes "COMMENTED" reviews that are neither approving nor blocking
  • It will break if new review states are introduced

Apply this diff to fix the calculation:

-            approving_reviews=len(review_events) - len(blocking_reviews),
+            approving_reviews=len(approved_reviews),
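
A sketch of the state-explicit counting, assuming (per the comment above) that each event stores its review state in event.data["state"]:

from types import SimpleNamespace

APPROVED = "APPROVED"
CHANGES_REQUESTED = "CHANGES_REQUESTED"

def count_review_outcomes(review_events):
    # count approving and blocking reviews explicitly by state;
    # COMMENTED (and any future states) land in neither bucket
    approved = sum(1 for e in review_events if e.data["state"] == APPROVED)
    blocking = sum(1 for e in review_events if e.data["state"] == CHANGES_REQUESTED)
    return approved, blocking

events = [SimpleNamespace(data={"state": s})
          for s in (APPROVED, "COMMENTED", CHANGES_REQUESTED)]
assert count_review_outcomes(events) == (1, 1)  # len-minus-len would report 2 approvals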
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (1)

179-182: Great fix for the organization login issue!

Using pr.base.repo.owner.login correctly handles both user and organization repositories, addressing the 404 errors mentioned in past reviews.

backend/analytics_server/mhq/exapi/github.py (2)

201-209: Excellent implementation of past review suggestions

The timeline fetching correctly implements both the Accept header for the preview API and a timeout to prevent hanging, as suggested in previous reviews.


290-300: ⚠️ Potential issue

Add missing user_view_type default as mentioned in past review

The dataclass field filtering is implemented correctly, but the user_view_type default is still missing, which could cause TypeError if the field is absent from the API response.

In _create_ready_for_review_event:

            actor=(
                GitHubUser(
                    **{
                        k: v
-                        for k, v in typed_dict["actor"].items()
+                        for k, v in {"user_view_type": "", **typed_dict["actor"]}.items()
                        if k in {f.name for f in dataclasses.fields(GitHubUser)}
                    },
                )
                if typed_dict["actor"]
                else None
            ),

Merging the default into the dict before filtering supplies user_view_type exactly once, avoiding a duplicate keyword argument when the API does return the field.

Apply the same fix in _create_review_event for the user field.

Also applies to: 336-346

🧹 Nitpick comments (22)
backend/analytics_server/mhq/service/settings/setting_type_validator.py (1)

6-27: Consider refactoring for better maintainability.

While the current implementation is correct, the repetitive if-statement pattern creates maintenance overhead. Each new setting type requires manual addition here.

Consider refactoring to a more scalable approach:

def settings_type_validator(setting_type: str):
-    if setting_type == SettingType.INCIDENT_SETTING.value:
-        return SettingType.INCIDENT_SETTING
-
-    if setting_type == SettingType.EXCLUDED_PRS_SETTING.value:
-        return SettingType.EXCLUDED_PRS_SETTING
-
-    if setting_type == SettingType.INCIDENT_TYPES_SETTING.value:
-        return SettingType.INCIDENT_TYPES_SETTING
-
-    if setting_type == SettingType.INCIDENT_SOURCES_SETTING.value:
-        return SettingType.INCIDENT_SOURCES_SETTING
-
-    if setting_type == SettingType.DEFAULT_SYNC_DAYS_SETTING.value:
-        return SettingType.DEFAULT_SYNC_DAYS_SETTING
-
-    if setting_type == SettingType.INCIDENT_PRS_SETTING.value:
-        return SettingType.INCIDENT_PRS_SETTING
-
-    # ADD NEW VALIDATOR HERE
+    # Create a mapping from string values to enum types
+    setting_type_mapping = {setting_enum.value: setting_enum for setting_enum in SettingType}
+    
+    if setting_type in setting_type_mapping:
+        return setting_type_mapping[setting_type]

    raise BadRequest(f"Invalid Setting Type: {setting_type}")

This approach automatically includes any new SettingType enum values without requiring manual updates to this validator.

backend/analytics_server/mhq/utils/github.py (1)

69-76: Fix typo and improve return statement.

There's a typo in the log message and the return statement should be explicit.

Apply this diff to fix these issues:

     if not github_domain:
         LOG.warn(
-            f"Custom domain not found for intergration for org {org_id} and provider {UserIdentityProvider.GITHUB.value}"
+            f"Custom domain not found for integration for org {org_id} and provider {UserIdentityProvider.GITHUB.value}"
         )
         # return nothing when custom domain is not found
         # this is to prevent the default domain from being used
-        return
+        return None
 
     return github_domain
web-server/src/utils/domainCheck.ts (1)

1-5: Consider improving the domain validation regex pattern.

The current regex has a few limitations:

  1. The pattern ([-.][A-Za-z0-9]+)* can match invalid domain formats like domain-.com or sub.-domain.com
  2. Underscores are not allowed but are valid in subdomains
  3. Port validation doesn't check the valid range (0-65535)

Consider this improved pattern:

 export const checkDomainWithRegex = (domain: string) => {
   const regex =
-    /^(https?:\/\/)[A-Za-z0-9]+([-.][A-Za-z0-9]+)*\.[A-Za-z]{2,}(:[0-9]{1,5})?(\/\S*)?$/;
+    /^(https?:\/\/)([A-Za-z0-9]([A-Za-z0-9-_]*[A-Za-z0-9])?\.)*[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?\.[A-Za-z]{2,}(:[0-9]{1,5})?(\/\S*)?$/;
   return regex.test(domain);
 };

Alternatively, consider using the URL constructor with try-catch for more robust validation:

export const checkDomainWithRegex = (domain: string) => {
  try {
    const url = new URL(domain);
    return ['http:', 'https:'].includes(url.protocol);
  } catch {
    return false;
  }
};
backend/analytics_server/mhq/utils/string.py (1)

9-13: Consider compiling the regex pattern once for better performance.

The regex pattern is compiled on every function call. For better performance, especially if this function is called frequently, consider compiling the pattern once at module level.

+BOT_NAME_PATTERN = re.compile(
+    r"(?i)(\b[\w@-]*[-_\[\]@ ]+bot[-_\d\[\]]*\b|\[bot\]|_bot_|_bot$|^bot_)"
+)
+
+
 def is_bot_name(name: str) -> bool:
-    pattern = re.compile(
-        r"(?i)(\b[\w@-]*[-_\[\]@ ]+bot[-_\d\[\]]*\b|\[bot\]|_bot_|_bot$|^bot_)"
-    )
-    return bool(pattern.search(name))
+    return bool(BOT_NAME_PATTERN.search(name))
backend/analytics_server/mhq/service/incidents/incident_filter.py (1)

112-125: LGTM! The filtering logic correctly handles revert PR exclusion.

The implementation properly checks for the setting existence and type before filtering out revert PRs from the incident types.

Consider extracting the filtering logic into a helper method for better readability:

+    def __filter_revert_prs_if_needed(self, incident_types: List[str]) -> List[str]:
+        if SettingType.INCIDENT_PRS_SETTING not in self.setting_types:
+            return incident_types
+            
+        incident_prs_setting = self.setting_type_to_settings_map.get(
+            SettingType.INCIDENT_PRS_SETTING
+        )
+        
+        if (
+            isinstance(incident_prs_setting, IncidentPRsSetting)
+            and not incident_prs_setting.include_revert_prs
+        ):
+            return [
+                incident_type
+                for incident_type in incident_types
+                if incident_type != IncidentType.REVERT_PR
+            ]
+        
+        return incident_types
+
     def __incident_type_setting(self) -> List[str]:
         setting: Optional[IncidentTypesSetting] = self.setting_type_to_settings_map.get(
             SettingType.INCIDENT_TYPES_SETTING
         )
         if not setting:
             return []
         incident_types = []
         if setting and isinstance(setting, IncidentTypesSetting):
             incident_types = setting.incident_types

-        if SettingType.INCIDENT_PRS_SETTING in self.setting_types:
-            incident_prs_setting: Optional[IncidentPRsSetting] = (
-                self.setting_type_to_settings_map.get(SettingType.INCIDENT_PRS_SETTING)
-            )
-            if (
-                isinstance(incident_prs_setting, IncidentPRsSetting)
-                and not incident_prs_setting.include_revert_prs
-            ):
-                incident_types = [
-                    incident_type
-                    for incident_type in incident_types
-                    if incident_type != IncidentType.REVERT_PR
-                ]
-
-        return incident_types
+        return self.__filter_revert_prs_if_needed(incident_types)
web-server/src/utils/__tests__/domainCheck.test.ts (1)

19-39: Consider clarifying the invalid port test case and adding more edge cases.

The test case at line 28 ('https://example:toolongtsadasds') appears to be testing a non-numeric port rather than a "too long" port. Consider clarifying this:

-    'https://example:toolongtsadasds',
+    'https://example:abc123', // non-numeric port

Also, consider adding these test cases to improve coverage:

  • Valid URLs with paths: 'https://example.com/path/to/resource'
  • Valid URLs with query parameters: 'https://example.com?key=value'
  • Valid URLs with fragments: 'https://example.com#section'
  • Invalid URLs with multiple consecutive dots: 'https://example..com'
  • International domains: 'https://例え.jp'
backend/analytics_server/mhq/service/settings/models.py (1)

3-3: Well-structured models with proper type annotations!

The new IncidentPRFilter and IncidentPRsSetting models follow best practices with TypedDict and Literal types for field constraints.

Consider adding a docstring to clarify the purpose of the filter fields:

 class IncidentPRFilter(TypedDict):
+    """Filter for incident PRs based on title or head branch patterns."""
     field: Literal["title", "head_branch"]
     value: str

Also applies to: 49-58

backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (1)

369-395: Well-structured bot filter tests with good coverage.

The tests effectively cover all scenarios for filtering bot events. However, there's an inconsistency in the test data structure - line 390 uses "actor" while lines 372 and 382 use "user". Consider standardizing this for consistency.

-    bot_event.data = {"actor": {"type": "Bot", "login": "dependabot"}}
+    bot_event.data = {"user": {"type": "Bot", "login": "dependabot"}}
backend/analytics_server/mhq/service/incidents/models/adapter.py (1)

13-22: Consider the design implications of hardcoded values.

Several fields are hardcoded which may limit the flexibility of this adapter:

  1. incident_type is always REVERT_PR - is this adapter only for revert PRs?
  2. assignees only includes the resolution PR author - what about other assignees or reviewers?
  3. meta is empty - consider including relevant PR metadata

If this is specifically for revert PR incidents, consider renaming the function to be more specific (e.g., adaptRevertPRIncident). Otherwise, make the incident type configurable or derive it from the PR data.

backend/analytics_server/mhq/store/models/code/filter.py (1)

16-16: Improve type hints for better documentation and type safety.

The Dict type hint is too generic and doesn't document the expected structure.

Consider using a more specific type:

-    incident_pr_filters: Optional[List[Dict]] = None
+    incident_pr_filters: Optional[List[Dict[str, str]]] = None  # List of {"field": str, "value": str}

Or better yet, create a TypedDict:

from typing import TypedDict

class IncidentPRFilter(TypedDict):
    field: str
    value: str

# Then use:
incident_pr_filters: Optional[List[IncidentPRFilter]] = None
backend/analytics_server/tests/service/Incidents/test_incident_types_setting.py (2)

41-56: Good edge case testing with minor inconsistency.

The test properly handles empty incident types. However, there's an inconsistency in the entity_id value.

For consistency with other tests, consider using "team_id" instead of "dummy_id":

-        entity_id="dummy_id",
+        entity_id="team_id",

79-79: Consider shortening the test function name.

The function name is quite long and could be more concise while remaining descriptive.

Consider renaming to:

def test_get_incident_types_with_both_settings_includes_revert_prs():
web-server/src/utils/auth.ts (1)

36-36: Fix typo in comment

-    // if customDomain is provded, the host will be customDomain/api/v3
+    // if customDomain is provided, the host will be customDomain/api/v3
backend/analytics_server/tests/exapi/test_github.py (1)

1-41: Well-structured tests for domain handling

The tests effectively cover the main scenarios:

  • Default domain behavior
  • Empty string handling
  • Custom domain with /api/v3 suffix

Consider adding edge case tests:

@patch("mhq.exapi.github.Github", new=DummyGithub)
def test_custom_domain_with_trailing_slash(self):
    token = "deadpool"
    custom_domain = "https://github.sujai.com/"
    service = GithubApiService(access_token=token, domain=custom_domain)
    expected = "https://github.sujai.com/api/v3"
    self.assertEqual(service.base_url, expected)

@patch("mhq.exapi.github.Github", new=DummyGithub)
def test_custom_domain_without_protocol(self):
    token = "deadpool"
    custom_domain = "github.sujai.com"
    service = GithubApiService(access_token=token, domain=custom_domain)
    expected = "github.sujai.com/api/v3"
    self.assertEqual(service.base_url, expected)
web-server/pages/api/internal/[org_id]/utils.ts (2)

309-320: Fix typo in error message.

-    console.error('Error occured while getting custom domain from database:', error);
+    console.error('Error occurred while getting custom domain from database:', error);

334-337: Consider normalizing the GraphQL URL for consistency.

The REST API URL uses normalizeSlashes but the GraphQL URL doesn't. This could lead to malformed URLs if the custom domain has trailing slashes.

 export const getGitHubGraphQLUrl = async (): Promise<string> => {
   const customDomain = await getGitHubCustomDomain();
-  return customDomain ? `${customDomain}/api/graphql` : `${DEFAULT_GH_URL}/graphql`;
+  const url = customDomain ? `${customDomain}/api/graphql` : `${DEFAULT_GH_URL}/graphql`;
+  return normalizeSlashes(url);
 };
web-server/src/content/Dashboards/ConfigureGithubModalBody.tsx (1)

178-211: Well-designed UI for custom domain input.

The implementation provides excellent user experience with:

  • Dynamic label switching between "Custom Domain" and "(Optional)"
  • Contextual helper text with example
  • Proper error state handling
  • Consistent Material-UI patterns

Consider adding autoComplete="off" to prevent browser autocomplete suggestions for custom domains.

backend/analytics_server/mhq/api/incidents.py (1)

2-2: Consider using standard Optional naming convention.

The alias typeOptional is unconventional. If there's a naming conflict with another Optional in the codebase, consider resolving it differently (e.g., using qualified imports). If no conflict exists, use the standard Optional for better readability.

backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (1)

431-431: Use != operator for better readability

-            if not (pr_event_data.get("type") == "Bot"):
+            if pr_event_data.get("type") != "Bot":
🧰 Tools
🪛 Ruff (0.11.9)

431-431: Use pr_event_data.get("type") != "Bot" instead of not pr_event_data.get("type") == "Bot"

Replace with != operator

(SIM201)

backend/analytics_server/tests/utils/string/test_is_bot_name.py (3)

31-34: Add more comprehensive special character testing.

The current tests cover @ and [bot] patterns, but the regex supports additional special characters.

Consider adding more special character test cases to improve coverage:

 def test_special_characters():
     assert is_bot_name("test@bot")
     assert is_bot_name("[bot]")
+    assert is_bot_name("_bot_")  # Tests the specific _bot_ pattern
+    assert is_bot_name("test[bot]suffix")  # Tests [bot] within a name

36-43: Strong negative test coverage but missing some edge cases.

The negative tests effectively prevent false positives for words containing "bot" as a substring. However, some additional edge cases could be tested.

Consider adding these additional negative test cases:

 def test_negative_cases():
     assert not is_bot_name("robotics")
     assert not is_bot_name("lobotomy")
     assert not is_bot_name("botany")
     assert not is_bot_name("about")
     assert not is_bot_name("robotic")
     assert not is_bot_name("bots")
+    assert not is_bot_name("bot")  # Single "bot" without delimiters
+    assert not is_bot_name("sabotage")  # "bot" in middle without delimiters
+    assert not is_bot_name("robot")  # "bot" at end without delimiters

45-49: Add missing test case for whitespace-only strings.

The edge cases cover empty string, single space, and numeric strings, but there's a missing test case that could be important.

Consider adding a test for strings with only whitespace characters:

 def test_edge_cases():
     assert not is_bot_name("")
     assert not is_bot_name(" ")
     assert not is_bot_name("12345")
+    assert not is_bot_name("   ")  # Multiple spaces
+    assert not is_bot_name("\t")   # Tab character
+    assert not is_bot_name("\n")   # Newline character
🛑 Comments failed to post (9)
backend/analytics_server/mhq/utils/github.py (1)

58-66: ⚠️ Potential issue

Add error handling for empty integrations list.

The code accesses integrations[0] without checking if the list is empty, which could cause an IndexError if no GitHub integrations exist for the organization.

Apply this diff to add proper error handling:

 def get_custom_github_domain(org_id: str) -> Optional[str]:
     core_repo_service = CoreRepoService()
     integrations = core_repo_service.get_org_integrations_for_names(
         org_id, [UserIdentityProvider.GITHUB.value]
     )
 
+    if not integrations:
+        LOG.warn(
+            f"No GitHub integrations found for org {org_id}"
+        )
+        return None
+
     github_domain = (
         integrations[0].provider_meta.get("custom_domain")
         if integrations[0].provider_meta
         else None
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

def get_custom_github_domain(org_id: str) -> Optional[str]:
    core_repo_service = CoreRepoService()
    integrations = core_repo_service.get_org_integrations_for_names(
        org_id, [UserIdentityProvider.GITHUB.value]
    )

    if not integrations:
        LOG.warn(
            f"No GitHub integrations found for org {org_id}"
        )
        return None

    github_domain = (
        integrations[0].provider_meta.get("custom_domain")
        if integrations[0].provider_meta
        else None
    )

    return github_domain
🤖 Prompt for AI Agents
In backend/analytics_server/mhq/utils/github.py between lines 58 and 66, the
code accesses the first element of the integrations list without verifying if
the list is empty, risking an IndexError. Add a check to ensure the integrations
list is not empty before accessing integrations[0], and handle the case
appropriately, such as setting github_domain to None or raising a descriptive
exception.
backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py (1)

4-8: ⚠️ Potential issue

Remove unused import PullRequestEventType

The import is not used anywhere in the file.

from mhq.store.models.code import (
    PullRequestState,
    PullRequestEventState,
-    PullRequestEventType,
)
📝 Committable suggestion


from mhq.store.models.code import (
    PullRequestState,
    PullRequestEventState,
)
🧰 Tools
🪛 Ruff (0.11.9)

7-7: mhq.store.models.code.PullRequestEventType imported but unused

Remove unused import: mhq.store.models.code.PullRequestEventType

(F401)

🪛 GitHub Actions: Linting-Precommit

[error] 4-4: flake8 linting failed with exit code 1.


[error] 4-4: F401 'mhq.store.models.code.PullRequestEventType' imported but unused.


[error] 4-4: Process completed with exit code 1.

🤖 Prompt for AI Agents
In backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py
between lines 4 and 8, the import PullRequestEventType is not used in the file.
Remove the unused import to clean up the code and avoid unnecessary
dependencies.
web-server/pages/api/internal/[org_id]/__tests__/github.test.ts (1)

36-48: ⚠️ Potential issue

Fix typo in error message

      expect(consoleSpy).toHaveBeenCalledWith(
-        'Error occured while getting custom domain from database:',
+        'Error occurred while getting custom domain from database:',
        expect.any(Error)
      );
📝 Committable suggestion


    it('returns null on db error and logs error', async () => {
      const consoleSpy = jest.spyOn(console, 'error').mockImplementation();
      (db as jest.Mock).mockImplementation(() => {
        throw new Error('DB failure');
      });

      const domain = await githubUtils.getGitHubCustomDomain();
      expect(domain).toBeNull();
      expect(consoleSpy).toHaveBeenCalledWith(
        'Error occurred while getting custom domain from database:',
        expect.any(Error)
      );
    });
🤖 Prompt for AI Agents
In web-server/pages/api/internal/[org_id]/__tests__/github.test.ts around lines
36 to 48, the test expects a specific error message with a typo "Error occured"
which should be corrected to "Error occurred" to match proper spelling. Update
the console.error call in the test to reflect the correct spelling of "occurred"
in the error message string.
backend/analytics_server/mhq/service/incidents/models/adapter.py (1)

6-22: 🛠️ Refactor suggestion

⚠️ Potential issue

Add validation and error handling for robust incident adaptation.

The function lacks validation and error handling which could lead to runtime errors:

  1. No validation that the two PRs are related
  2. Direct int() casting on line 12 could fail if pr.number is None or non-numeric
  3. No null checks on PR attributes that might be optional
  4. Hardcoded status as RESOLVED - consider parameterizing or validating this assumption

Consider adding validation:

 def adaptIncidentPR(incident_pr: PullRequest, resolution_pr: PullRequest) -> Incident:
+    # Validate inputs
+    if not incident_pr or not resolution_pr:
+        raise ValueError("Both incident_pr and resolution_pr are required")
+    
+    # Ensure PR number can be converted to int
+    try:
+        incident_number = int(incident_pr.number)
+    except (TypeError, ValueError):
+        raise ValueError(f"Invalid PR number: {incident_pr.number}")
+    
     return Incident(
         id=incident_pr.id,
         provider=incident_pr.provider,
         key=str(incident_pr.id),
         title=incident_pr.title,
-        incident_number=int(incident_pr.number),
+        incident_number=incident_number,
         status=IncidentStatus.RESOLVED.value,
         creation_date=incident_pr.state_changed_at,
         acknowledged_date=resolution_pr.created_at,
         resolved_date=resolution_pr.state_changed_at,
         assigned_to=resolution_pr.author,
         assignees=[resolution_pr.author],
         url=incident_pr.url,
         meta={},
         incident_type=IncidentType.REVERT_PR,
     )
📝 Committable suggestion


def adaptIncidentPR(incident_pr: PullRequest, resolution_pr: PullRequest) -> Incident:
    # Validate inputs
    if not incident_pr or not resolution_pr:
        raise ValueError("Both incident_pr and resolution_pr are required")

    # Ensure PR number can be converted to int
    try:
        incident_number = int(incident_pr.number)
    except (TypeError, ValueError):
        raise ValueError(f"Invalid PR number: {incident_pr.number}")

    return Incident(
        id=incident_pr.id,
        provider=incident_pr.provider,
        key=str(incident_pr.id),
        title=incident_pr.title,
        incident_number=incident_number,
        status=IncidentStatus.RESOLVED.value,
        creation_date=incident_pr.state_changed_at,
        acknowledged_date=resolution_pr.created_at,
        resolved_date=resolution_pr.state_changed_at,
        assigned_to=resolution_pr.author,
        assignees=[resolution_pr.author],
        url=incident_pr.url,
        meta={},
        incident_type=IncidentType.REVERT_PR,
    )
🤖 Prompt for AI Agents
In backend/analytics_server/mhq/service/incidents/models/adapter.py between
lines 6 and 22, the adaptIncidentPR function lacks validation and error
handling, which could cause runtime errors if incident_pr or resolution_pr
attributes are missing or invalid. Add validation to ensure the PRs are related
and check that incident_pr.number is not None and is numeric before casting to
int. Also, add null checks for optional attributes like state_changed_at,
created_at, and author to prevent potential exceptions. Consider parameterizing
the incident status instead of hardcoding it as RESOLVED to improve flexibility.
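
For reviewers who want to sanity-check the suggested guards in isolation, here is a minimal, self-contained sketch of the same pattern; FakePR and validated_incident_number are hypothetical names for illustration, not part of the codebase:

from dataclasses import dataclass

@dataclass
class FakePR:
    # Hypothetical stand-in for PullRequest; only the field used below.
    number: object

def validated_incident_number(incident_pr, resolution_pr) -> int:
    # Reject missing inputs up front instead of failing deep inside Incident()
    if not incident_pr or not resolution_pr:
        raise ValueError("Both incident_pr and resolution_pr are required")
    # Surface a clear error instead of a bare TypeError/ValueError from int()
    try:
        return int(incident_pr.number)
    except (TypeError, ValueError):
        raise ValueError(f"Invalid PR number: {incident_pr.number}")

assert validated_incident_number(FakePR("42"), FakePR(7)) == 42
try:
    validated_incident_number(FakePR(None), FakePR(7))
except ValueError as exc:
    print(exc)  # Invalid PR number: None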
backend/analytics_server/mhq/store/models/code/filter.py (1)

89-95: 🛠️ Refactor suggestion

⚠️ Potential issue

Add validation for filter structure and field access security.

The dynamic attribute access using getattr with user-provided field names poses security and reliability risks:

  1. No validation that filter dictionaries contain required "field" and "value" keys
  2. Arbitrary field access could expose unintended attributes
  3. Invalid field names will cause AttributeError

Add validation and use an allowlist approach:

 def _incident_pr_filters_query():
     if not self.incident_pr_filters:
         return None
+    
+    # Define allowed fields for incident PR filtering
+    ALLOWED_FIELDS = {"title", "body", "base_branch", "head_branch"}
+    
+    filters = []
+    for filter_dict in self.incident_pr_filters:
+        # Validate filter structure
+        if not isinstance(filter_dict, dict) or "field" not in filter_dict or "value" not in filter_dict:
+            raise ValueError("Each filter must be a dict with 'field' and 'value' keys")
+        
+        field = filter_dict["field"]
+        if field not in ALLOWED_FIELDS:
+            raise ValueError(f"Field '{field}' is not allowed for filtering")
+        
+        # Validate the field exists on PullRequest model
+        if not hasattr(PullRequest, field):
+            raise ValueError(f"Field '{field}' does not exist on PullRequest model")
+        
+        filters.append(getattr(PullRequest, field).op("~")(filter_dict["value"]))
+    
-    return or_(
-        getattr(PullRequest, filter["field"]).op("~")(filter["value"])
-        for filter in self.incident_pr_filters
-    )
+    return or_(*filters) if filters else None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

        def _incident_pr_filters_query():
            if not self.incident_pr_filters:
                return None

            # Define allowed fields for incident PR filtering
            ALLOWED_FIELDS = {"title", "body", "base_branch", "head_branch"}

            filters = []
            for filter_dict in self.incident_pr_filters:
                # Validate filter structure
                if not isinstance(filter_dict, dict) or "field" not in filter_dict or "value" not in filter_dict:
                    raise ValueError("Each filter must be a dict with 'field' and 'value' keys")

                field = filter_dict["field"]
                if field not in ALLOWED_FIELDS:
                    raise ValueError(f"Field '{field}' is not allowed for filtering")

                # Validate the field exists on PullRequest model
                if not hasattr(PullRequest, field):
                    raise ValueError(f"Field '{field}' does not exist on PullRequest model")

                filters.append(getattr(PullRequest, field).op("~")(filter_dict["value"]))

            return or_(*filters) if filters else None
🤖 Prompt for AI Agents
In backend/analytics_server/mhq/store/models/code/filter.py around lines 89 to
95, the function _incident_pr_filters_query dynamically accesses attributes of
PullRequest using user-provided filter data without validation, risking errors
and security issues. Validate that each filter dictionary contains "field" and
"value" keys, and restrict attribute access to a predefined allowlist of safe
fields. Implement these checks before constructing the or_() expression to
ensure robustness and security.
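
The allowlist pattern is easy to exercise on its own. Below is a self-contained sketch assuming the filter shape and ALLOWED_FIELDS from the suggestion above; the SQLAlchemy or_() wiring is omitted so the snippet runs without dependencies:

ALLOWED_FIELDS = {"title", "body", "base_branch", "head_branch"}

def validate_pr_filters(pr_filters):
    # Returns validated (field, value) pairs; the real query would wrap each
    # pair as getattr(PullRequest, field).op("~")(value) inside or_().
    validated = []
    for filter_dict in pr_filters or []:
        if (
            not isinstance(filter_dict, dict)
            or "field" not in filter_dict
            or "value" not in filter_dict
        ):
            raise ValueError("Each filter must be a dict with 'field' and 'value' keys")
        field = filter_dict["field"]
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"Field '{field}' is not allowed for filtering")
        validated.append((field, filter_dict["value"]))
    return validated

print(validate_pr_filters([{"field": "title", "value": "revert"}]))  # [('title', 'revert')]
try:
    validate_pr_filters([{"field": "id", "value": "1"}])
except ValueError as exc:
    print(exc)  # Field 'id' is not allowed for filtering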
web-server/src/content/Dashboards/ConfigureGithubModalBody.tsx (1)

122-128: ⚠️ Potential issue

Fix the incorrect element ID reference.

The function references 'gitlab-custom-domain' but this is a GitHub configuration component. The ID should be 'github-custom-domain' to match the actual element ID defined at line 183.

Apply this fix:

-      document.getElementById('gitlab-custom-domain')?.focus();
+      document.getElementById('github-custom-domain')?.focus();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

  const isDomainInputFocus = useBoolState(false);

  const focusDomainInput = useCallback(() => {
    if (!customDomain.value)
      document.getElementById('github-custom-domain')?.focus();
    else handleSubmission();
  }, [customDomain.value, handleSubmission]);
🤖 Prompt for AI Agents
In web-server/src/content/Dashboards/ConfigureGithubModalBody.tsx around lines
122 to 128, the focusDomainInput function references an incorrect element ID
'gitlab-custom-domain' which is inconsistent with the GitHub context. Change the
ID in document.getElementById to 'github-custom-domain' to correctly target the
input element for GitHub custom domain configuration.
backend/analytics_server/mhq/service/incidents/incidents.py (3)

142-144: ⚠️ Potential issue

Add error handling for missing PR attributes

The getattr call could raise AttributeError if the field doesn't exist on the PR object.

                incident_pr_number = self._extract_pr_number_from_regex(
-                    getattr(pr, filter["field"]), filter["value"]
+                    getattr(pr, filter["field"], ""), filter["value"]
                )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

                incident_pr_number = self._extract_pr_number_from_regex(
                    getattr(pr, filter["field"], ""), filter["value"]
                )
🤖 Prompt for AI Agents
In backend/analytics_server/mhq/service/incidents/incidents.py between lines 142
and 144, the code uses getattr to access a PR attribute but does not handle the
case where the attribute might be missing, which could raise an AttributeError.
Pass a default value to getattr (for example, an empty string, as in the
suggestion above), or wrap the call in a try-except that logs a warning, to
prevent runtime errors.
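
A quick, self-contained illustration of the difference the default argument makes (Dummy is a hypothetical stand-in):

class Dummy:
    # Hypothetical stand-in for a PR object with only a title set.
    title = "Revert #123"

pr = Dummy()
print(getattr(pr, "title", ""))  # Revert #123
print(getattr(pr, "body", ""))   # '' -- the default, instead of an exception
# getattr(pr, "body") with no default would raise AttributeError here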

141-141: 🛠️ Refactor suggestion

Avoid shadowing the built-in filter function

Using filter as a variable name shadows Python's built-in function, which could lead to confusion.

-            for filter in incident_prs_setting.filters:
+            for pr_filter_config in incident_prs_setting.filters:
                incident_pr_number = self._extract_pr_number_from_regex(
-                    getattr(pr, filter["field"]), filter["value"]
+                    getattr(pr, pr_filter_config["field"]), pr_filter_config["value"]
                )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

            for pr_filter_config in incident_prs_setting.filters:
                incident_pr_number = self._extract_pr_number_from_regex(
                    getattr(pr, pr_filter_config["field"]), pr_filter_config["value"]
                )
🤖 Prompt for AI Agents
In backend/analytics_server/mhq/service/incidents/incidents.py at line 141, the
variable name `filter` shadows Python's built-in `filter` function, which can
cause confusion or errors. Rename the variable to a more descriptive name, such
as `pr_filter_config` (as used in the diff above), to avoid shadowing the
built-in function.
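
A short, self-contained demonstration of the pitfall (the names here are illustrative only):

filters = [{"field": "title", "value": "revert"}]
for filter in filters:  # rebinds the name 'filter' in this scope
    pass

print(callable(filter))  # False -- 'filter' is now the last dict, not the built-in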

93-169: ⚠️ Potential issue

Fix potential time boundary issue in resolution PRs interval

The resolution PRs interval uses time_now() as the upper bound, which may include PRs merged after the requested interval's end time. This could lead to incorrect incident associations.

-        resolution_prs_interval = Interval(
-            from_time=interval.from_time, to_time=time_now()
-        )
+        resolution_prs_interval = Interval(
+            from_time=interval.from_time, to_time=interval.to_time
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    def get_team_pr_incidents(
        self, team_id: str, interval: Interval, pr_filter: PRFilter
    ) -> List[Incident]:

        settings = self._settings_service.get_settings(
            setting_type=SettingType.INCIDENT_PRS_SETTING,
            entity_type=EntityType.TEAM,
            entity_id=team_id,
        )

        if not settings:
            return []

        incident_prs_setting: IncidentPRsSetting = settings.specific_settings

        if not incident_prs_setting.filters:
            return []

        team_repo_ids = list(
            tr.org_repo_id
            for tr in self._code_repo_service.get_active_team_repos_by_team_id(team_id)
        )
        resolution_prs_interval = Interval(
            from_time=interval.from_time, to_time=interval.to_time
        )
        resolution_prs_filter: PRFilter = apply_pr_filter(
            pr_filter=asdict(pr_filter),
            entity_type=EntityType.TEAM,
            entity_id=team_id,
            setting_types=[
                SettingType.EXCLUDED_PRS_SETTING,
                SettingType.INCIDENT_PRS_SETTING,
            ],
        )

        resolution_prs = self._code_repo_service.get_prs_merged_in_interval(
            repo_ids=team_repo_ids,
            interval=resolution_prs_interval,
            pr_filter=resolution_prs_filter,
        )
        ...
🤖 Prompt for AI Agents
In backend/analytics_server/mhq/service/incidents/incidents.py between lines 93
and 169, the resolution_prs_interval is created with from_time as
interval.from_time and to_time as time_now(), which may include PRs merged after
the original interval's end time. To fix this, replace time_now() with
interval.to_time to ensure the interval accurately reflects the requested time
window and prevents including PRs merged outside the specified interval.
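
The boundary difference is easy to see with a toy Interval; the class below is a hypothetical stand-in that only mirrors the from_time/to_time fields used above:

from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Interval:
    from_time: datetime
    to_time: datetime

    def contains(self, ts: datetime) -> bool:
        return self.from_time <= ts <= self.to_time

requested = Interval(datetime(2025, 4, 1), datetime(2025, 4, 30))
late_merge = requested.to_time + timedelta(days=3)

# An upper bound of time_now() would admit late_merge; bounding by the
# requested interval keeps resolution PRs inside the caller's window.
print(requested.contains(late_merge))             # False
print(requested.contains(datetime(2025, 4, 15)))  # True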
