Skip to content

TSCBasic: change the behaviour of exec on Windows #283

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 24, 2022

Conversation

compnerd
Copy link
Member

This simultaneously makes exec less like exec and more like exec
on POSIX platforms. Replace the use of the ucrt _execve in favour of
spelling out the implementation inline with alterations. There are
multiple reasons that this needs to be done.

The concept of exec is impossible to map to the process management
structure on Windows (just as fork is). Fundamentally, exec is a
replacement of the process image which will retain the process id and
parts of the libc state (e.g. non-cloexec fds in their current
state). However, the process model on Windows does not have the ability
to do such an operation. Each process is immutable.

_execve is a wrapper for _spawnl with _P_OVERLAY - it simply will
create a new process and terminate the existing one. This implicitly
breaks the façade - the PID is not inherited - GetCurrentProcessId()
would return a different value (which would require to be passed from
the parent to the child as the parent state will be demolished and there
is no lineage that is preserved). Additionally, the new process will
only inherit HANDLEs which have marked bInheritable as TRUE at
construction time via CreateFileW.

More importantly, when the _execve is used, it firstly inherit the
ASCII traits which will further limit the use of this already
less-than-useful portability utility. It will limit the file paths even
more than the unicode variant, which is already limited by the Win32
subsystem and requires explicit escape via use of NT style paths to
work around the Win32 path limitations. Secondly, and more user
visible, is the fact that the implementation does not properly hand off
the console. The new process is launched in the background and the
current process is unceremoniously terminated, restoring control to the
command interpreter (cmd.com). The order in which this occurs is
unspecified and uncontrollable (i.e. the new process may start before or
after the termination). More problematically, this results in two
processes with access to the console stdin/stdout/stderr handles, which
now creates a problem of who acquires the input. Most often, this is
manifested as read by the command interpreter rather than the
application, followed by the application rather than the command
interpreter.

We have effectively re-implemented _execve in place here, with a few
exceptions:

  • we do not explicitly enumerate the inheritable handles and pass them
    via an undocumented handoff to ucrt (if for no other reason than we do
    not have a good solution to accessing the FD table)
  • we do not explicitly create the environment block, the normal process
    inheritance rules apply to the environment.
  • we do not pass in the first parameter to CreateProcessW, which would
    influence how the process is created (requires that the program suffix
    is a well-known suffix - .exe, .com, etc). Note that this will
    prevent the execution of a batch file as that requires that
    lpApplicationName is explicitly set to cmd.exe and that the first
    parameter of the argument string is /c rather than the executable
    path.
  • we use the unicode variant of the operations to allow us to access the
    filesystem properly
  • we create a job object to monitor the subsequent process hierarchy
    with a silent breakaway, kill-on-close Job Object to ensure that the
    subprocesses of the "exec"-ed image are treated as part of the same
    process tree
  • we now reliably create the process (suspended) and assign it to the
    job object, and then wait for the process termination before the
    process exit, preventing the problem of the interrupted execution.

While this has limitations in the precise emulation of exec as defined
by POSIX, it is sufficient to allow execution of subprocesses as
desired. This has user-visible differences, e.g. PID and file
descriptor states are lost. It has been opined by many others that
fork and exec are a mistake, and it may be a better approach to
replace the exec call with a invoke_tool operation which more
precisely matches the usecase and retains the behavioural differences
from exec.

Resolves SR-13806!

This simultaneously makes `exec` less like `exec` and more like `exec`
on POSIX platforms.  Replace the use of the ucrt `_execve` in favour of
spelling out the implementation inline with alterations.  There are
multiple reasons that this needs to be done.

The concept of `exec` is impossible to map to the process management
structure on Windows (just as `fork` is).  Fundamentally, `exec` is a
replacement of the process image which will retain the process id and
parts of the libc state (e.g.  non-`cloexec` fds in their current
state).  However, the process model on Windows does not have the ability
to do such an operation.  Each process is immutable.

`_execve` is a wrapper for `_spawnl` with `_P_OVERLAY` - it simply will
create a new process and terminate the existing one.  This implicitly
breaks the façade - the PID is not inherited - `GetCurrentProcessId()`
would return a different value (which would require to be passed from
the parent to the child as the parent state will be demolished and there
is no lineage that is preserved).  Additionally, the new process will
only inherit `HANDLE`s which have marked `bInheritable` as `TRUE` at
construction time via `CreateFileW`.

More importantly, when the `_execve` is used, it firstly inherit the
ASCII traits which will further limit the use of this already
less-than-useful portability utility.  It will limit the file paths even
more than the unicode variant, which is already limited by the Win32
subsystem and requires explicit escape via use of NT style paths to
work around the Win32 path limitations.  Secondly, and more user
visible, is the fact that the implementation does not properly hand off
the console.  The new process is launched in the background and the
current process is unceremoniously terminated, restoring control to the
command interpreter (cmd.com).  The order in which this occurs is
unspecified and uncontrollable (i.e. the new process may start before or
after the termination).  More problematically, this results in two
processes with access to the console stdin/stdout/stderr handles, which
now creates a problem of who acquires the input.  Most often, this is
manifested as read by the command interpreter rather than the
application, followed by the application rather than the command
interpreter.

We have effectively re-implemented `_execve` in place here, with a few
exceptions:
- we do not explicitly enumerate the inheritable handles and pass them
  via an undocumented handoff to ucrt (if for no other reason than we do
  not have a good solution to accessing the FD table)
- we do not explicitly create the environment block, the normal process
  inheritance rules apply to the environment.
- we do not pass in the first parameter to `CreateProcessW`, which would
  influence how the process is created (requires that the program suffix
  is a well-known suffix - .exe, .com, etc).  Note that this will
  prevent the execution of a batch file as that requires that
  `lpApplicationName` is explicitly set to `cmd.exe` and that the first
  parameter of the argument string is `/c` rather than the executable
  path.
- we use the unicode variant of the operations to allow us to access the
  filesystem properly
- we create a job object to monitor the subsequent process hierarchy
  with a silent breakaway, kill-on-close Job Object to ensure that the
  subprocesses of the "exec"-ed image are treated as part of the same
  process tree
- we now reliably create the process (suspended) and assign it to the
  job object, and then wait for the process termination before the
  process exit, preventing the problem of the interrupted execution.

While this has limitations in the precise emulation of `exec` as defined
by POSIX, it is sufficient to allow execution of subprocesses as
desired.  This has user-visible differences, e.g. PID and file
descriptor states are lost.  It has been opined by many others that
`fork` and `exec` are a mistake, and it may be a better approach to
replace the `exec` call with a `invoke_tool` operation which more
precisely matches the usecase and retains the behavioural differences
from `exec`.

Resolves SR-13806!
@compnerd
Copy link
Member Author

@swift-ci please test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants