compiletest: Fix flaky Android gdb test runs #38883

Merged
merged 1 commit into from
Jan 8, 2017

alexcrichton
Member

Local testing showed that I was able to reproduce an error where debuginfo tests
on Android would fail with "connection reset by peer". Further investigation
showed that the gdb tests on Android involve a bit of process management:

  • First an adb forward command is run to forward the host's port 5039 to the
    same port in the emulator.
  • Next an adb shell command is run to execute the gdbserver executable
    inside the emulator. This gdb server will listen on port 5039 for
    remote gdb debugging sessions.
  • Finally, we run gdb on the host (not in the emulator) and then connect to
    this gdb server to send it commands.
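
The three steps above can be sketched as command lines. This is only an illustration, not the code in compiletest: the helper names, the `remote_exe` path, and the exact argument layout are assumptions, and the real test harness may pass additional environment variables or flags.

```rust
use std::process::Command;

/// Port shared by the host and the emulator (taken from the description).
const GDB_PORT: u16 = 5039;

/// Step 1: forward the host's TCP port to the same port in the emulator.
fn adb_forward(port: u16) -> Command {
    let mut cmd = Command::new("adb");
    cmd.arg("forward")
        .arg(format!("tcp:{}", port))
        .arg(format!("tcp:{}", port));
    cmd
}

/// Step 2: start gdbserver inside the emulator, listening on `port`.
/// `remote_exe` is a hypothetical path to the test binary on the device.
fn adb_gdbserver(port: u16, remote_exe: &str) -> Command {
    let mut cmd = Command::new("adb");
    cmd.arg("shell")
        .arg("gdbserver")
        .arg(format!(":{}", port))
        .arg(remote_exe);
    cmd
}

/// Step 3: the gdb command the host-side gdb uses to reach the forwarded port.
fn gdb_connect_command(port: u16) -> String {
    format!("target remote :{}", port)
}

/// Render a command line as text, for logging or inspection.
fn render(cmd: &Command) -> String {
    let mut parts = vec![cmd.get_program().to_string_lossy().into_owned()];
    parts.extend(cmd.get_args().map(|a| a.to_string_lossy().into_owned()));
    parts.join(" ")
}
```

Calling `render(&adb_forward(GDB_PORT))` gives the textual command line without running anything, which keeps the sketch usable on a machine that has no emulator attached.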

The problem occurred when the host's gdb failed to connect to the remote
gdbserver running inside the emulator. The previous readiness check was that
after adb shell executed we'd sleep for a second and then attempt to make a TCP
connection to port 5039. If that succeeded we'd run gdb, and on failure we'd
sleep again.
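
For reference, a sleep-then-probe loop of the kind described above might look roughly like this. It is a sketch and not the removed compiletest code: the function name, the attempt limit, and the return value are assumptions added so the loop can be exercised on its own.

```rust
use std::net::TcpStream;
use std::thread;
use std::time::Duration;

/// Sleep, then try to open a TCP connection; repeat until a connection
/// succeeds or `max_attempts` is used up. Returns the attempt number that
/// succeeded, or `None` if every attempt failed.
fn wait_by_tcp_probe(addr: &str, delay: Duration, max_attempts: u32) -> Option<u32> {
    for attempt in 1..=max_attempts {
        thread::sleep(delay);
        if TcpStream::connect(addr).is_ok() {
            return Some(attempt);
        }
    }
    None
}
```

A call such as `wait_by_tcp_probe("127.0.0.1:5039", Duration::from_secs(1), 10)` only tells the caller that something accepted the connection on that port, not which process did.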

It turns out, however, that as soon as we've executed adb forward all TCP
connections to 5039 will succeed. This means that we would only ever sleep for
at most one second, and if this wasn't enough time we'd just fail later because
we would assume that gdbserver had started when it may not have done so yet.
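
The effect can be reproduced with a toy model. The sketch below is not how adb is implemented; `spawn_forwarder` and `probe` are hypothetical names, and the forwarder here never relays any data. It only shows that a connect to a forwarding port can succeed while the backend behind it is absent, with the failure surfacing later.

```rust
use std::io::Read;
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Toy stand-in for a forwarded port: listens on a free local port, accepts
/// one client, tries the backend once, and then closes the client.
fn spawn_forwarder(backend_addr: String) -> std::io::Result<(String, thread::JoinHandle<()>)> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?.to_string();
    let handle = thread::spawn(move || {
        if let Ok((client, _)) = listener.accept() {
            // A real forwarder would relay bytes; this toy only pokes the backend.
            let _backend = TcpStream::connect(backend_addr.as_str());
            drop(client); // no relay exists, so the client is cut off here
        }
    });
    Ok((addr, handle))
}

/// Returns (connect succeeded, peer then closed the connection without data).
fn probe(addr: &str) -> (bool, bool) {
    match TcpStream::connect(addr) {
        Ok(mut stream) => {
            let mut buf = [0u8; 1];
            let closed = match stream.read(&mut buf) {
                Ok(0) | Err(_) => true, // end of stream or a reset
                Ok(_) => false,
            };
            (true, closed)
        }
        Err(_) => (false, false),
    }
}
```

With a backend address that nothing listens on, `probe` reports a successful connect followed by a closed connection, which is the misleading signal the old check relied on.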

This commit fixes these issues by removing the TCP connection used to test
whether gdbserver is ready to go. Instead we read the stdout of the process and
wait for it to print that it's listening, at which point we start running gdb.
Locally, at least, I was unable to reproduce the failure after these changes.
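
A minimal sketch of waiting on stdout, assuming gdbserver announces readiness with a line that starts with a known marker such as "Listening on port 5039" (the exact text is an assumption here, and `wait_for_marker` is a hypothetical helper rather than the function added by this commit):

```rust
use std::io::{self, BufRead};

/// Read lines until one starts with `marker`. Returns `Ok(true)` when the
/// marker is seen and `Ok(false)` if the stream ends before that happens.
fn wait_for_marker<R: BufRead>(mut reader: R, marker: &str) -> io::Result<bool> {
    let mut line = String::new();
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            // End of stream: the process exited without announcing readiness.
            return Ok(false);
        }
        if line.starts_with(marker) {
            return Ok(true);
        }
    }
}
```

With a child process spawned using `Stdio::piped()` for stdout, the reader would be `BufReader::new(child.stdout.take().unwrap())`. Returning `Ok(false)` on end of stream lets the caller report an early exit instead of blocking forever.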

Closes #38710

@rust-highfive
Contributor

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

@brson
Contributor

brson commented Jan 6, 2017

@bors r+ p=1

@bors
Collaborator

bors commented Jan 6, 2017

📌 Commit 9ced901 has been approved by brson

@bors
Collaborator

bors commented Jan 7, 2017

⌛ Testing commit 9ced901 with merge 4ecca05...

@bors
Collaborator

bors commented Jan 7, 2017

💔 Test failed - status-travis

@alexcrichton
Member Author

alexcrichton commented Jan 7, 2017 via email

@bors
Collaborator

bors commented Jan 7, 2017

⌛ Testing commit 9ced901 with merge af674f2...

@bors
Collaborator

bors commented Jan 7, 2017

💔 Test failed - status-appveyor

@alexcrichton
Member Author

alexcrichton commented Jan 8, 2017 via email

@bors
Collaborator

bors commented Jan 8, 2017

⌛ Testing commit 9ced901 with merge 5219dad...

bors added a commit that referenced this pull request Jan 8, 2017
compiletest: Fix flaky Android gdb test runs

@bors
Collaborator

bors commented Jan 8, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: brson
Pushing 5219dad to master...

@bors bors merged commit 9ced901 into rust-lang:master Jan 8, 2017
@alexcrichton alexcrichton deleted the android-flaky branch January 13, 2017 23:26
Successfully merging this pull request may close these issues.

Spurious Android testing failure: connection reset by peer