Skip to content

Fix condvar.wait(distant future) return immediately on OSX #42604

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 17, 2017

Conversation

stepancheg
Copy link
Contributor

Fixes issue #37440: pthread_cond_timedwait on macOS Sierra seems
to overflow ts_sec parameter and returns immediately. To work
around this problem patch rounds timeout down to year 3000.

Patch also fixes overflow when converting u64 to time_t.

@nagisa
Copy link
Member

nagisa commented Jun 12, 2017

Is year 3000 special in that sense that this is a maximum OS X will support?

@stepancheg
Copy link
Contributor Author

@nagisa OSX pthread_cond_timedwait breaks near year 3170843 on my notebook. But I'm not sure about behavior on other versions of OSX or iPhone, so I decided it is safer to round down to year 3000. I hope in a couple of centuries Apple will fix this issue, and this rounding down could be removed from rust std source.

@stepancheg
Copy link
Contributor Author

r? @alexcrichton

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an evidence of where the overflow is? The apparent source code for pthread_cond_timedwait shells out to _pthread_cond_wait which doesn't look like it has any overflows?

Using dtruss indicates that psynch_cvwait is failing with error 316. Headers don't seem to indicate what that is and related StackOverflow questions seem to also be similarly confused.

It seems that we don't know why this is failing and we're just blindly applying a workaround?

let (g, timeout_res) = c.wait_timeout(g, Duration::from_millis(u32::MAX as u64)).unwrap();
let (g, timeout_res) = c.wait_timeout(g, Duration::from_millis(u64::MAX)).unwrap();
// spurious wakeups mean this isn't necessarily true
// assert!(notified.load(Ordering::SeqCst));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can't have this assert, why have the Arc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So next person going to patch this part of the library could execute test with this line uncommented. I believe that prevous comment "spurious wakeups mean this isn't necessarily true" several lines above kept in the source for the same reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be removed in that case? It's surely easier to test this outside the standard library than inside, so this seems like it's just an extra diff for not a lot of value.

@stepancheg
Copy link
Contributor Author

Do you have an evidence of where the overflow is? The apparent source code for pthread_cond_timedwait shells out to _pthread_cond_wait which doesn't look like it has any overflows?

I believe, overflow is somewhere inside kernel. I couldn't found sources of latest kernel, but there's a source of 10.7.5.

It has code like this:

nanoseconds_to_absolutetime((uint64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec,  &abstime );

which does not explain this particular issue, but shows that authors of xnu don't care much about integer overflows.

Using dtruss indicates that psynch_cvwait is failing with error 316.

Or maybe it is not overflow, but an indication of parameter error. Whatever.

It seems that we don't know why this is failing and we're just blindly applying a workaround?

That's fine with me.

BTW, now I think that it would probably be slightly better to wait with explicitly infinite timeout if duration is more than 1000 years, instead of rounding down to year 3000, but this version is OK too.

@stepancheg
Copy link
Contributor Author

BTW, now I think that it would probably be slightly better to wait with explicitly infinite timeout if duration is more than 1000 years, instead of rounding down to year 3000, but this version is OK too.

No, rounding down is legal, because cond.wait is allowed to wake up spuriously.

@stepancheg
Copy link
Contributor Author

Going to update patch now to round duration down to 1000 years instead of rounding down deadline to year 3000.

@stepancheg
Copy link
Contributor Author

Done.

@arielb1 arielb1 added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 13, 2017
@@ -116,7 +173,7 @@ impl Condvar {
(sys_now.tv_usec * 1000) as libc::c_long;
let extra = (nsec / 1_000_000_000) as libc::time_t;
let nsec = nsec % 1_000_000_000;
let seconds = dur.as_secs() as libc::time_t;
let seconds = wrapping_cast_to_time_t(dur.as_secs());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should no longer be necessary due to the clamp above, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if time_t is more than 32 bit (as in all modern OSes), this cast is not needed.

However, I'd keep this cast, because

  • it is not performance critical
  • makes code easier to read (reader does not need to check that there's no overflow)
  • makes OSX/Linux versions more like to each other

// println!("$");
// }
// }
// ```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty long comment to work with, could the program be extracted to something like a gist? Could the example program be C and not Rust?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewritten in C and moved to gist.

let max_dur = Duration::from_secs(1000 * 365 * 86400);

if dur > max_dur {
// Apparently `tv_sec` overflows somewhere around
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we 100% sure that this is an overflow somewhere? The fact that the kernel returns an error seems to indicate that this comment is not correct. I think we can basically just say that OSX is buggy with super long durations, so we clamp which is also allowable per the API of wait_timeout because of spurious wakeups.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comment to use diffent words.

let (g, timeout_res) = c.wait_timeout(g, Duration::from_millis(u32::MAX as u64)).unwrap();
let (g, timeout_res) = c.wait_timeout(g, Duration::from_millis(u64::MAX)).unwrap();
// spurious wakeups mean this isn't necessarily true
// assert!(notified.load(Ordering::SeqCst));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be removed in that case? It's surely easier to test this outside the standard library than inside, so this seems like it's just an extra diff for not a lot of value.

Fixes issue rust-lang#37440: `pthread_cond_timedwait` on macOS Sierra seems
to overflow `ts_sec` parameter and returns immediately. To work
around this problem patch rounds timeout down to approximately 1000
years.

Patch also fixes overflow when converting `u64` to `time_t`.
@stepancheg
Copy link
Contributor Author

Can this be removed in that case? It's surely easier to test this outside the standard library than inside, so this seems like it's just an extra diff for not a lot of value.

I've rewritten tests with loop until wake up is no spurious.

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Collaborator

bors commented Jun 16, 2017

📌 Commit 0c26b59 has been approved by alexcrichton

@bors
Copy link
Collaborator

bors commented Jun 17, 2017

⌛ Testing commit 0c26b59 with merge 1169a1f...

bors added a commit that referenced this pull request Jun 17, 2017
Fix condvar.wait(distant future) return immediately on OSX

Fixes issue #37440: `pthread_cond_timedwait` on macOS Sierra seems
to overflow `ts_sec` parameter and returns immediately. To work
around this problem patch rounds timeout down to year 3000.

Patch also fixes overflow when converting `u64` to `time_t`.
@bors
Copy link
Collaborator

bors commented Jun 17, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 1169a1f to master...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants