Skip to content

Verify the ExecSession pid before killing it. #25906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 8, 2025

Conversation

jankaluza
Copy link
Member

When container is being removed, podman iterates
through its exec sessions and checks whether exec
session pid is still alive.

The problem is that the pid can be reused for other processes, so that it may not belong to exec session.
In this scenario podman may kill another process

This commit prevents it by doing following changes:

  • Adds the PIDData string to ExecSession struct. This string is used to store additional context for a PID to later verify that the PID killed by the podman is really the one started by it.
  • Adds Container.generatePIDData(pid) which generates the PIDData string using the unix.NameToHandleAt if available. If not available, it fallbacks to start-time read from /proc/pid/stat file.
  • Adds Container.isSessionAlive(pid, pidData) to use this additional PIDData context to verify whether the pid is created by podman or not.

Fixes: #25104

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none labels Apr 17, 2025
@jankaluza jankaluza force-pushed the 25104-pidfs branch 2 times, most recently from cd78208 to 2ad99b0 Compare April 17, 2025 09:19
@@ -295,6 +299,10 @@ func (c *Container) ExecStart(sessionID string) error {

// Update and save session to reflect PID/running
session.PID = pid
session.PIDData, err = c.generatePIDData(pid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit paranoid that if the process is already finished at this point, and someone else gets this PID number, the PIDData will contain data from someone else's process. I'd say the chances would be small, and maybe there's no need to address this.

Copy link
Member Author

@jankaluza jankaluza Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can happen in case the process exits right after we start it and the pid is also recycled within the same time. It is quite unlikely, but in theory it can for sure happen.

We could probably fix that by generating the pidhandle in readConmonPipeData while the pipe is still open (if that's event possible, not sure if it's not closed right after we receive the data). But not sure if that's worth it.

@jankaluza jankaluza force-pushed the 25104-pidfs branch 3 times, most recently from a32af9c to 1127797 Compare April 22, 2025 10:54
@jankaluza
Copy link
Member Author

I'm learning how to do some code for different systems. I will tell you here once the PR is ready to review again :-).

@jankaluza jankaluza force-pushed the 25104-pidfs branch 2 times, most recently from 12f2835 to 067f849 Compare April 22, 2025 11:10
Copy link

[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.

@jankaluza jankaluza force-pushed the 25104-pidfs branch 2 times, most recently from 4caf6df to b768015 Compare April 22, 2025 12:52
@jankaluza
Copy link
Member Author

So what's left here is to deduplicate the pidhandle_linux.go and pidhandle.go, fix the unit-tests and check why the sys tests failed.

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some implementation detail comments, not a proper review

@jankaluza jankaluza force-pushed the 25104-pidfs branch 5 times, most recently from 51b3f52 to d24fda0 Compare April 23, 2025 11:57
@jankaluza
Copy link
Member Author

I still have to rewrite the unit-tests to match the new code and I have a feeling it sometimes fails with "Invalid argument" for podman exec, so I will also try to debug this more. I hope I will find out the reason while rewriting the tests, let's see :-).

@jankaluza jankaluza force-pushed the 25104-pidfs branch 3 times, most recently from 95b3d5b to fb0d2c2 Compare April 24, 2025 09:39
@jankaluza jankaluza marked this pull request as ready for review April 24, 2025 10:22
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 24, 2025
@jankaluza
Copy link
Member Author

As a follow-up, it would be nice to also move waitPidStop to PIDHandle and used pidfs there.

@@ -48,7 +48,8 @@ function Local-Unit {
Build-Ginkgo
$skippackages="hack,internal\domain\infra\abi,internal\domain\infra\tunnel,libpod\lock\shm,pkg\api\handlers\libpod,pkg\api\handlers\utils,pkg\bindings,"
$skippackages+="pkg\domain\infra\abi,pkg\emulation,pkg\machine\apple,pkg\machine\applehv,pkg\machine\e2e,pkg\machine\libkrun,"
$skippackages+="pkg\machine\provider,pkg\machine\proxyenv,pkg\machine\qemu,pkg\specgen\generate,pkg\systemd,test\e2e,test\utils,cmd\rootlessport"
$skippackages+="pkg\machine\provider,pkg\machine\proxyenv,pkg\machine\qemu,pkg\specgen\generate,pkg\systemd,test\e2e,test\utils,cmd\rootlessport,"
$skippackages+="pkg\pidhandle"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the unit tests don't run on windows then they should have !windows build tags instead of skipping them here like this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although your test file is already called linux_test.go so it should already be only called on linux so I don't quite follow why this would be needed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed, because otherwise the tests complain on windows that there is no built .go file in the pidhandle package. It seems there is some check for empty packages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@l0rd Do you know how we can avoid this? IMO it is quite annoying to have all packages listed here. IF someone adds windows test to any of the dirs they likely don't remember to remove it from the list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't know how to do that. For some reason, it fails to compile the package because "build constraints exclude all Go files". The only way I found to avoid adding the package in the skip list is to add a pidhandle_windows.go that only has the package def:

//go:build windows

package pidhandle

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I think including empty .go files is also not ideal. If someone adds new Windows test, that person also most likely tries to run this test on Windows and finds out that the package is not compiled there at all. This should hopefully lead to line removal from skippackages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using //go:build !remote? The remote is a remote client. Am I right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//go:build !remote or //go:build !windows have the same effect (on windows) and don't fix the problem.

@jankaluza jankaluza force-pushed the 25104-pidfs branch 6 times, most recently from 60b35dd to 16d020a Compare April 29, 2025 10:50
@jankaluza
Copy link
Member Author

@Luap99 , btw, this one is ready for another review, but no rush :-).

@jankaluza jankaluza force-pushed the 25104-pidfs branch 3 times, most recently from d4581c2 to 50e2c41 Compare May 5, 2025 12:18
When container is being removed, podman iterates
through its exec sessions and checks whether exec
session pid is still alive.

The problem is that the pid can be reused for other processes,
so that it may not belong to exec session.
In this scenario podman may kill another process

This commit prevents it by doing following changes:

- Adds the PIDData string to ExecSession struct. This string
  is used to store additional context for a PID to later verify
  that the PID killed by the podman is really the one started by
  it.
- Adds new package called pidhandle which implements the methods
  generating the PIDData, and killing the PID with the PIDData
  ensuring the right PID is killed by verifying the metadata.

The new code uses pidfd_open and name_to_handle_at when available.
It fallbacks to process start-time get using the gopsutil package.

Fixes: containers#25104

Signed-off-by: Jan Kaluza <[email protected]>
Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return newFileHandle(255, []byte("test")), 1, nil
}

h, err := NewPIDHandleFromString(os.Getpid(), nameToHandlePrefix+"254 74657374")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you are going to push a new version, could you add a test for the empty string?

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2025
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 8, 2025
Copy link
Contributor

openshift-ci bot commented May 8, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, jankaluza, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 58b2eae into containers:main May 8, 2025
72 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Podman may kill another processes instead of exec sessions
5 participants