Skip to content

Fix: handle subprocess failing after startup #134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

winterfx
Copy link
Contributor

@winterfx winterfx commented Apr 10, 2025

this fixes an issue where the child process is created successfully, but immediately crashes.The client did not detect the failure.

Summary by CodeRabbit

  • Bug Fixes
    • Improved handling of subprocess startup failures, ensuring errors are properly reported if the subprocess exits early or fails to start.
  • Tests
    • Added a new test to verify correct error handling when a subprocess exits immediately on startup.
  • Chores
    • Enhanced test utilities to simulate immediate subprocess failure for more robust testing.

- Fix resource cleanup in error cases
- Improve error handling in waitUntilReadyOrExit
- Add proper pipe cleanup on command start failure
Copy link
Contributor

coderabbitai bot commented Apr 10, 2025

Walkthrough

The changes introduce enhanced error handling for subprocess exits in the Stdio transport layer, including tracking exit errors and refining readiness checks. Tests are updated to simulate and verify immediate subprocess exit scenarios. The mock server is modified to support environment-driven early termination for more robust testing of startup failure conditions.

Changes

Files/Paths Change Summary
client/transport/stdio.go Added atomic exit error tracking to Stdio struct, introduced helpers for safe channel closing and readiness/exit waiting, updated Start and Close methods to use new logic, and applied minor structural cleanups.
client/transport/stdio_test.go Added a subtest to simulate and assert behavior when the subprocess exits immediately upon start, using a new environment variable.
testdata/mockstdio_server.go Modified main function to exit early with an error if a specific environment variable is set, enabling simulation of startup failures.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 87dcc19 and 6f28020.

📒 Files selected for processing (1)
  • client/transport/stdio_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • client/transport/stdio_test.go
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@winterfx winterfx requested a review from cherrot April 14, 2025 08:16
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
client/transport/stdio.go (3)

18-21: Timeout constants are baked‑in – consider configurability

readyTimeout (5 s) and readyCheckTimeout (1 s) are reasonable defaults but
may be too aggressive/lenient for slower environments (CI, ARM boards, etc.).
Exposing them as optional parameters of NewStdio (or via functional options)
would let callers tune startup behaviour without touching the code.


41-41: Channel name could better convey intent

The field processExitErr is a channel, not the error itself. Renaming it to
processExitCh (or similar) clarifies its signalling nature and avoids mental
overhead when reading processExitErr <- cmd.Wait().

Also applies to: 58-61


98-101: Close the sender goroutine on context cancellation

If the parent ctx passed to Start() is cancelled, cmd.Wait() will return
but the goroutine remains blocked sending on processExitErr if the buffer is
full. Adding a select with a parent‑context <-ctx.Done() avoids a
potential goroutine leak:

-go func() {
-    c.processExitErr <- cmd.Wait()
-}()
+go func() {
+    err := cmd.Wait()
+    select {
+    case c.processExitErr <- err:
+    case <-ctx.Done():
+    }
+}()
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89ea80c and fa5863c.

📒 Files selected for processing (1)
  • client/transport/stdio.go (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
client/transport/stdio.go (2)
client/transport/interface.go (1)
  • JSONRPCResponse (36-45)
testdata/mockstdio_server.go (1)
  • JSONRPCResponse (18-26)

@winterfx
Copy link
Contributor Author

Hi @ezynda3, could you please help me review the PR? I look forward to your reply. Thanks.

@ezynda3
Copy link
Contributor

ezynda3 commented May 1, 2025

Hi @ezynda3, could you please help me review the PR? I look forward to your reply. Thanks.

Looks like there are some merge conflicts. Can you have a look?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
client/transport/stdio_test.go (1)

428-428: Uncomment or remove the commented defer statement.

The defer statement for cleaning up the temporary file is commented out, which could leave temporary files behind after test execution.

-//defer os.Remove(mockServerPath)
+defer os.Remove(mockServerPath)
client/transport/stdio.go (1)

178-180: Misleading comment about multiple notification handlers.

The comment suggests multiple handlers can be registered, but the implementation in SetNotificationHandler only supports a single handler (it overwrites any previous handler).

-// OnNotification registers a handler function to be called when notifications are received.
-// Multiple handlers can be registered and will be called in the order they were added.
+// SetNotificationHandler registers a handler function to be called when notifications are received.
+// Only one handler can be active at a time; setting a new handler replaces any previous handler.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cddc723 and af8bec5.

📒 Files selected for processing (6)
  • client/stdio_test.go (1 hunks)
  • client/transport/sse_test.go (0 hunks)
  • client/transport/stdio.go (4 hunks)
  • client/transport/stdio_test.go (4 hunks)
  • server/sse.go (0 hunks)
  • testdata/mockstdio_server.go (1 hunks)
💤 Files with no reviewable changes (2)
  • client/transport/sse_test.go
  • server/sse.go
✅ Files skipped from review due to trivial changes (1)
  • client/stdio_test.go
🧰 Additional context used
🧬 Code Graph Analysis (1)
client/transport/stdio_test.go (1)
client/transport/stdio.go (1)
  • NewStdio (63-80)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: test
🔇 Additional comments (9)
testdata/mockstdio_server.go (1)

29-32: Good implementation of the immediate failure simulation.

This addition enables testing scenarios where a child process starts but immediately fails, which is exactly what the PR aims to fix. The error message is clear and the exit code (1) appropriately signals a failure.

client/transport/stdio_test.go (1)

410-439: Effective test for immediate subprocess failure.

This test case directly addresses the PR objective by testing what happens when a subprocess starts successfully but exits immediately before normal operation begins. The test properly sets up the test environment with the MOCK_FAIL_IMMEDIATELY=1 flag and correctly verifies that an error is returned from Start().

client/transport/stdio.go (7)

19-22: Good extraction of timeout constants.

Extracting these timeout values as constants improves readability and makes them easier to modify if needed.


42-43: Effective solution for tracking subprocess exit.

Adding these fields is a good approach to track subprocess exit and propagate errors without risk of deadlocks. The processExited channel signals subprocess termination, while the atomic exitErr safely stores the error value.


74-77: Proper initialization of the new channel.

Initializing processExited in the constructor ensures it's always available for signaling process termination.


124-130: Well-implemented subprocess exit monitoring.

This goroutine effectively addresses the PR objective by monitoring the subprocess for early termination. It safely stores any error in the atomic value and signals termination through channel closure, allowing other parts of the code to react to subprocess failure.


132-141: Improved readiness signaling with error detection.

This change correctly separates process startup from readiness signaling, allowing for proper error detection when a process exits early. The approach ensures that early subprocess failures are properly detected and reported.


145-159: Good implementation of readiness and early-exit detection.

This function effectively handles three scenarios:

  1. Process exits before signaling readiness
  2. Process signals readiness but exits immediately after
  3. Process fails to signal readiness within timeout

The short 1-second check after readiness signal is particularly important for detecting failures that happen right after startup.


171-174: Fixed potential deadlock in Close method.

This change resolves the deadlock issue mentioned in the previous review by waiting on channel closure rather than directly calling cmd.Wait(). It also properly propagates any exit error that was stored in the atomic value.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a5c8b3 and c65164d.

📒 Files selected for processing (1)
  • client/transport/stdio.go (4 hunks)
🔇 Additional comments (9)
client/transport/stdio.go (9)

7-8: Good additions to the import list.

Adding the necessary imports for error handling, atomic operations, and time manipulation is essential for the new functionality. These imports properly support the changes to handle subprocess failures.

Also applies to: 13-14


19-22: Good use of constants for timeout values.

Defining timeout constants at package level improves readability and maintainability. These constants support the new readiness and exit detection mechanism.


42-43: Good addition of atomic field for error storage.

The exitErr field using atomic.Value is a thread-safe way to store and retrieve subprocess exit errors across goroutines.


85-91: Good implementation of the readiness check.

The updated Start method properly separates concerns by using a goroutine to read responses and a dedicated function to wait for readiness or detect early process exit.


130-136: Great improvement to handle subprocess failures.

This goroutine properly waits for the subprocess to exit, stores any error using atomic operations, and signals completion by closing the done channel. This directly addresses the PR's goal of handling subprocess failures after startup.


140-147: Good implementation of safe channel closure.

The tryCloseDone function prevents panics by safely closing the channel only if it hasn't been closed yet. This is important when multiple goroutines might attempt to close the same channel.


148-162: Good implementation of timeout handling.

The waitUntilReadyOrExit function effectively handles multiple scenarios:

  1. Process exits before signaling readiness
  2. Process signals readiness but exits immediately after
  3. Process signals readiness and continues running
  4. Process doesn't signal readiness within the timeout period

This comprehensive approach ensures proper error handling in all cases.


167-169: Good update to Close method.

Using tryCloseDone in the Close method ensures proper cleanup even if the channel was already closed by the exit goroutine.


176-178: Great improvement to error handling.

Checking the stored exit error using atomic operations is thread-safe and properly propagates subprocess errors to the caller. This fixes the issue where subprocess failures weren't being detected.

@winterfx
Copy link
Contributor Author

winterfx commented May 6, 2025

Hi @ezynda3 I have resolved the conflict and added a test case.

@pottekkat pottekkat added type: bug Something isn't working as expected area: sdk SDK improvements unrelated to MCP specification labels May 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: sdk SDK improvements unrelated to MCP specification type: bug Something isn't working as expected
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants