Friday, November 28, 2014

An elegant way to control a bunch of ssh connections

Let's say you want to establish an ssh tunnel and put it into the background:

$ ssh -fNL 12345:localhost:54321 user@remote

When you are done with the tunnel, you could stop it with:

$ pkill -f "ssh -fNL 12345:localhost:54321 user@remote"

but it doesn't look nice, right?

Things get a lot messier when you need a bunch of ssh sessions. For example, let's say you have a client program and a server program, and you want to use ssh to tunnel the connection from host1 to host2. So you've written a simple script to manage everything:

#!/usr/bin/env bash

# Start server program on the remote host (host2:54321)
ssh -ttf user@host2 "server --port 54321"
# -tt forces tty allocation, so if you kill ssh, the server will die too

# Create a tunnel from localhost:12345 (host1) to localhost:54321 (host2)
ssh -fNL 12345:localhost:54321 user@host2

# Start client
client --port 12345

Note: ssh will ask for a password every time it is called, but you are using keys anyway, right? :)

If everything goes right, you will be left with at least the ssh tunnel running in the background. But for error handling (especially if there are more commands in your script), you will need to somehow keep track of which ssh processes are running in the background. For example, the server might start successfully and then the tunnel might fail, leaving the server process running. Or the client might fail, leaving both the server and the tunnel running. As we decided, hunting them down with ps or pkill doesn't look nice. So, what should we do?

It turns out there is an elegant way to control a bunch of ssh sessions: ssh multiplexing, which lets several ssh sessions share one TCP connection. It not only speeds up establishing new ssh sessions, since they reuse the existing TCP connection, but also lets us easily control all of them at once. So, let's use it in our previous example.


#!/usr/bin/env bash

# Tells our script to exit if some command fails
set -e

# Execute command "ssh -S ~/sock.sock -O exit user@host2" when script exits
trap 'ssh -S ~/sock.sock -O exit user@host2 &> /dev/null' EXIT
# -O exit -- tells the master ssh process behind the control socket ~/sock.sock to exit,
#            which also stops every ssh session sharing its tcp connection
#            (read below to get the full picture of what is going on).
#            ssh wants a destination argument even with -O, so we repeat the host.

# Start server program on the remote host (host2:54321) with -M and -S options:
ssh -S ~/sock.sock -M -ttf user@host2 "server --port 54321"
# -S ~/sock.sock -- tells ssh to create a control socket that will be used for connection sharing.
#                   Note that we use ~/sock.sock, so other users can't access it due to home dir perms.
# -M -- puts this ssh client into "master" mode: it owns the tcp connection
#       and accepts new sessions through the control socket

# Create a tunnel from localhost:12345 (host1) to localhost:54321 (host2) with -S ~/sock.sock option:
ssh -S ~/sock.sock -fNL 12345:localhost:54321 user@host2
# -S ~/sock.sock -- tells ssh to use the socket we created before, so the
#                   new ssh session goes over the same tcp connection

# Start client
client --port 12345


So now, when the script exits (whether something went wrong or everything went okay; see "set -e"), all our ssh sessions will be stopped by the nice "ssh -S ~/sock.sock -O exit user@host2" command in the trap. You don't even need to worry about deleting ~/sock.sock later; ssh will handle it for you.
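For error handling in the middle of a longer script, it can also help to ask the master whether it is still alive. ssh's "-O check" control command does that: it exits with status 0 only if a master process is listening on the control socket. Below is a minimal sketch; the helper name master_alive and the SOCK and HOST variables are made up for illustration and simply mirror the script above.

```shell
#!/usr/bin/env bash

# Hypothetical settings mirroring the script above; adjust to your setup.
SOCK=~/sock.sock
HOST=user@host2

# Hypothetical helper: succeeds only if a master ssh process is listening
# on $SOCK. "-O check" asks the master whether it is running; ssh exits
# with 0 if it is and non-zero otherwise (e.g. when the socket is missing).
master_alive() {
    ssh -S "$SOCK" -O check "$HOST" > /dev/null 2>&1
}

if master_alive; then
    echo "master connection is up"
else
    echo "master connection is gone"
fi
```

With "set -e" in the main script, a line like "master_alive || exit 1" just before starting the client would stop the script early, and the EXIT trap would still clean up whatever is left.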

Saturday, June 14, 2014

How to set PID using ns_last_pid

So there is this cool project called CRIU (Checkpoint/Restore In Userspace), and I was wondering how it gives a restored process a specific PID. I always thought it was not possible to choose a PID without some kind of kernel hacking.

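I did some investigation and here is what I figured out. There is a file, /proc/sys/kernel/ns_last_pid, which contains the last PID assigned by the kernel in the current PID namespace. When the kernel needs to assign a new PID, it starts at last_pid+1 and takes the first free one. So if you write N-1 to ns_last_pid, the next process created in that namespace gets PID N, provided N is free and nobody else creates a process in between.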

ns_last_pid was added by the CRIU guys and has been available since kernel 3.3. Note that it requires the CONFIG_CHECKPOINT_RESTORE option, which is enabled by default in most popular distros (e.g. Ubuntu and Fedora), but not all. The most notable example is Arch Linux, which had it enabled for some time but then suddenly disabled it around 3.11 (it is still disabled in 3.14.6-1).
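If you just want to watch this mechanism without changing anything, reading ns_last_pid is enough: only writing to it needs root. Here is a small read-only sketch, assuming a kernel built with CONFIG_CHECKPOINT_RESTORE (otherwise the file does not exist). It reads the last assigned PID and then starts one throwaway process to see which PID it gets; the variable names are just for illustration.

```shell
#!/usr/bin/env bash

f=/proc/sys/kernel/ns_last_pid

if [ -r "$f" ]; then
    last=$(cat "$f")   # last PID handed out in this PID namespace (no root needed)
    sleep 0 &          # create one throwaway background process...
    child=$!           # ...and note the PID it got
    wait "$child"
    echo "ns_last_pid was $last, the next child got $child"
    # Typically child is last + 1; it differs if another process in the same
    # PID namespace was created in between, or if PIDs wrapped at pid_max.
else
    echo "$f not found: kernel built without CONFIG_CHECKPOINT_RESTORE?" >&2
fi
```

The value is read by the cat process itself, which is the most recently created process at that moment, so "last" is effectively cat's own PID and the background sleep gets the next free one.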

Here is some C code to set PID for a forked child.

Update: I've also added this example to criu wiki https://criu.org/Pid_restore.

BEWARE! This program requires root. I don't take any responsibility for what this code might do to your system.

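#include <sys/file.h>   /* flock() */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid() */
#include <fcntl.h>      /* open() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>     /* fork(), write(), close() */

int main(int argc, char *argv[])
{
    int fd;
    pid_t pid, new_pid;
    char buf[32];

    if (argc != 2) {
        fprintf(stderr, "Usage: %s <pid>\n", argv[0]);
        return 1;
    }

    printf("Opening ns_last_pid...\n");
    fd = open("/proc/sys/kernel/ns_last_pid", O_RDWR);
    if (fd < 0) {
        perror("Can't open ns_last_pid");
        return 1;
    }
    printf("Done\n");

    /* The lock only serializes processes that take the same lock;
     * it can't stop unrelated processes from forking in between. */
    printf("Locking ns_last_pid...\n");
    if (flock(fd, LOCK_EX)) {
        perror("Can't lock ns_last_pid");
        close(fd);
        return 1;
    }
    printf("Done\n");

    pid = atoi(argv[1]);
    snprintf(buf, sizeof(buf), "%d", (int)(pid - 1));

    /* The kernel hands out the first free PID after the one stored here */
    printf("Writing pid-1 to ns_last_pid...\n");
    if (write(fd, buf, strlen(buf)) != (ssize_t)strlen(buf)) {
        perror("Can't write to ns_last_pid");
        flock(fd, LOCK_UN);
        close(fd);
        return 1;
    }
    printf("Done\n");

    printf("Forking...\n");
    fflush(stdout); /* don't let the child inherit unflushed output */
    new_pid = fork();
    if (new_pid < 0) {
        perror("fork");
    } else if (new_pid == 0) {
        printf("I'm child!\n");
        exit(0);
    } else {
        waitpid(new_pid, NULL, 0); /* reap the child */
        if (new_pid == pid)
            printf("I'm parent. My child got right pid!\n");
        else
            printf("pid does not match expected one\n");
    }
    printf("Done\n");

    printf("Cleaning up...");
    if (flock(fd, LOCK_UN))
        printf("Can't unlock ");

    close(fd);

    printf("Done\n");

    return 0;
}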