Restoration Fails with Open UDP Socket. Is There a Way to Ignore error and Proceed with Restoration? #2464

Open
wjstk16 opened this issue Aug 12, 2024 · 3 comments

wjstk16 commented Aug 12, 2024

Hi all,

I am attempting to checkpoint and restore a container with an open UDP socket. Unlike TCP, UDP is a connectionless communication protocol and doesn't maintain a stateful connection. However, restoration might require retransmission due to data buffered in the kernel.
My goal is for the UDP container to continue communication seamlessly after restoration, even if retransmissions occur. Unfortunately, I encounter the following error, which causes the restoration to fail.

I used Podman to create a UDP server container and a UDP client container, and the checkpoint/restore process works fine when performed on the same host machine. However, when I transfer the checkpointed tar file to a remote host and attempt to restore, I encounter the error mentioned below.

Here is the log output:

(00.108314) mnt: Switching to new ns to clean ghosts
(00.109452) net: Unlock network
(00.109496) Running network-unlock scripts
(00.109509)     RPC
(00.156697) pie: 1: seccomp: Restoring mode 1 flags 0x1 on tid 1 filter 0
(00.160377) pie: 1: seccomp: Restored mode 2 on tid 1
(00.160563) pie: 1: restoring lsm profile (current) changeprofile containers-default-0.44.4
(00.160707) pie: 1: Error (criu/pie/restorer.c:192): can't write lsm profile -2
(00.179980) pie: 1: Error (criu/pie/restorer.c:2168): BUG at criu/pie/restorer.c:2168
(00.180092) Error (compel/src/lib/infect.c:1612): Task 4148077 is in unexpected state: b7f
(00.180171) Error (compel/src/lib/infect.c:1618): Task stopped with 11: Segmentation fault
(00.180201) Error (criu/cr-restore.c:2469): Can't stop all tasks on rt_sigreturn
(00.180212) Error (criu/cr-restore.c:2530): Killing processes because of failure on restore.
The Network was unlocked so some data or a connection may have been lost.
(00.181450) Error (criu/mount.c:3689): mnt: Can't remove the directory /tmp/.criu.mntns.bVhQ14: No such file or directory
(00.181473) Error (criu/cr-restore.c:2557): Restoring FAILED.

I want to restore a process with an open UDP socket, even if the restoration is not as complete as it is for TCP (that is, even if retransmission is necessary). Is there a way to ignore these errors and proceed with the restoration, similar to the --tcp-close option?

Attachments:
criu.log

Any assistance would be greatly appreciated.
Thanks.

@adrianreber
Member

The error you see has nothing to do with UDP. You did not provide much information about the systems you are using, but it looks like Ubuntu with AppArmor enabled. During restore, CRIU tries to restore the AppArmor profile, and that step fails:

(00.160563) pie: 1: restoring lsm profile (current) changeprofile containers-default-0.44.4

I have never tested Podman with AppArmor, so I do not know if that works. I know it works in combination with SELinux, so you could retry it on Fedora/CentOS/RHEL. Or try to disable AppArmor.
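The AppArmor failure can be read directly from two lines of the log above. The following is a minimal sketch, not part of CRIU: it assumes the trailing `-2` in "can't write lsm profile -2" is a negated errno value, as is conventional for raw syscall return codes, and the helper name `explain_lsm_failure` is hypothetical.

```python
import errno
import os
import re

# The two relevant lines, copied from the log posted in this issue.
LOG = """\
(00.160563) pie: 1: restoring lsm profile (current) changeprofile containers-default-0.44.4
(00.160707) pie: 1: Error (criu/pie/restorer.c:192): can't write lsm profile -2
"""


def explain_lsm_failure(log_text):
    """Return (profile, errno_name, errno_message), or None if not found.

    Assumption: the number after "can't write lsm profile" is a negated
    errno value.
    """
    profile = re.search(
        r"restoring lsm profile \(\w+\) changeprofile (\S+)", log_text
    )
    code = re.search(r"can't write lsm profile (-?\d+)", log_text)
    if not (profile and code):
        return None
    err = abs(int(code.group(1)))
    return (
        profile.group(1),
        errno.errorcode.get(err, "UNKNOWN"),
        os.strerror(err),
    )


result = explain_lsm_failure(LOG)
print(result)
```

Under that assumption, errno 2 is `ENOENT`, which would be consistent with the named profile not being known to the kernel on the restoring host.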

@avagin
Member

avagin commented Aug 12, 2024

@wjstk16 Before disabling AppArmor, check that the containers-default-0.44.4 profile exists on the remote machine. I suspect it is not installed there, and that this is the cause of the failure.
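One way to perform that check on the remote machine is sketched below. It assumes the kernel exposes loaded AppArmor profiles in `/sys/kernel/security/apparmor/profiles`, one `name (mode)` entry per line, and that the script runs with enough privileges to read that file (typically root). The function names are hypothetical helpers, not part of Podman or CRIU.

```python
from pathlib import Path

# Assumed location of the kernel's list of loaded AppArmor profiles.
PROFILES_FILE = Path("/sys/kernel/security/apparmor/profiles")


def parse_loaded_profiles(text):
    """Parse lines of the form 'name (mode)' into a set of profile names."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last " (" so names containing spaces survive.
        name, _, _mode = line.rpartition(" (")
        names.add(name or line)
    return names


def profile_is_loaded(profile, profiles_file=PROFILES_FILE):
    """Return True/False, or None if the profile list cannot be read
    (AppArmor not enabled, securityfs not mounted, or not running as root)."""
    try:
        text = profiles_file.read_text()
    except OSError:
        return None
    return profile in parse_loaded_profiles(text)


if __name__ == "__main__":
    # Profile name taken from the log in this issue.
    print(profile_is_loaded("containers-default-0.44.4"))
```

A result of `False` on the remote host would support the explanation above; `None` means the check itself could not run and says nothing about the profile.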

A friendly reminder that this issue had no activity for 30 days.
