Tuesday, April 7, 2015

CRIU as a debug tool and replacement for google coredumper

I'm currently working on a CRIU images -> core dump conversion for CRIT (CRiu Image Tool). While looking for info on generating core dumps manually, I found an interesting yet outdated (OMG, last changed in 2007!) project called google coredumper [1] that allows generating core dumps whenever you want to, which looks like a cool thing for debugging. CRIU is able to dump a process at any point too, and it provides a lot more info about the process state, because the process can be literally fully restored from the images. So I thought that coredumper users (if there are any today) could use CRIU for their purposes. That said, converting CRIU images to a core dump looks like a complete waste of data, so I have another thought: somehow integrate CRIU images into gdb, so that it could ask criu to restore the process with --leave-stopped and then attach to it for debugging. It may also be a good thing to be able to save the state of the task being debugged by detaching and calling criu dump.

[1] https://code.google.com/p/google-coredumper/

Wednesday, January 14, 2015

python google protobuf and optional field with empty repeated inside (or the "has_field = true" analog in python)

So I was searching for something to represent protobufs in a human-readable format. After lots of googling I found that there is a magic built-in module called text_format, which does just what I need: it converts protobufs to and from a human-readable format that looks quite similar to json. It is not valid json, as the types json supports don't match those of protobufs, and the syntax is slightly different. Pb text is fine for reading, but very few tools support it. For example, if you need some kind of xpath analog to search inside protobufs, you will be disappointed, as there is no such thing freely available (though on some forums google developers mentioned that they have one, but they can't or don't want to share it). So, I decided to try to convert pb to json.
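As a quick illustration of the text_format round trip described above, here is a minimal sketch. It assumes the protobuf Python package is installed and, to avoid compiling a .proto file, borrows the bundled FileDescriptorProto message purely as a stand-in; the field values are made up.

```python
from google.protobuf import descriptor_pb2, text_format

# Stand-in message: any generated protobuf class would do here.
msg = descriptor_pb2.FileDescriptorProto(name="demo.proto", package="demo")

# Protobuf -> human-readable text (one `field: value` pair per line).
text = text_format.MessageToString(msg)
print(text)

# Text -> protobuf: Parse() fills the message passed as second argument.
parsed = text_format.Parse(text, descriptor_pb2.FileDescriptorProto())
print(parsed == msg)
```

The printed text uses the `name: "demo.proto"` style, which is why it resembles json but cannot be fed to json tools.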

There are a bunch of not-so-popular pb<->json converters out there but, as it turned out, they all have the same bug related to handling an optional field with an empty repeated field inside. Here is what I mean:

message Bar {
    repeated int32 baz = 1;
}

message Foo {
    optional Bar bar = 1;
}

Even if you have baz containing 0 entries, it is still there, so bar should be present too.

Those pb<->json converters do convert pb to json appropriately, so Foo foo looks like:

    "bar" : {}

But when converting back, they just miss it. The repeated baz is represented by a python list, so if you have no entries in baz (baz == []) and you assign foo.bar = [], protobuf will think that you didn't set foo.bar at all. So, if you do a pb->json->pb->json conversion, you will see that "bar" is gone from the output, which indicates that protobuf just dropped your optional field (which should be set) with the empty repeated field inside.

In C, you have a has_* field to mark that the field is present, so it is pretty straightforward.
But in python there wasn't such a field to set, and a brief look into the pb methods didn't reveal anything appropriate. After a bit of digging into the text_format sources, though, I found a method called SetInParent() that does the same thing the has_* field does in C. So if you call foo.bar.SetInParent(), it will set the has_bar field, and after a pb->json->pb->json conversion you will see:

    "bar" : {}

Which is correct.
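To see the effect of SetInParent() in isolation, here is a small sketch. It assumes the protobuf Python package is installed. Since Foo and Bar above would need to be compiled first, it uses the bundled FileDescriptorProto as a stand-in: its optional `options` sub-message can be left completely empty, so it plays the role of `bar` here.

```python
from google.protobuf import descriptor_pb2

# Stand-in for Foo: `options` is an optional sub-message, like `bar`.
foo = descriptor_pb2.FileDescriptorProto()

# Merely reading foo.options does not mark it as present.
before = foo.HasField("options")

# Explicitly mark the (still empty) sub-message as set.
foo.options.SetInParent()
after = foo.HasField("options")

# Presence survives serialization, which is what a converter needs.
copy = descriptor_pb2.FileDescriptorProto()
copy.ParseFromString(foo.SerializeToString())

print(before, after, copy.HasField("options"))
```

A converter can apply the same call whenever the json side holds an empty object for a message-typed field.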

Friday, November 28, 2014

An elegant way to control a bunch of ssh connections

Let's say you want to establish an ssh tunnel and put it into the background:

$ ssh -fNL 12345:localhost:54321 user@remote

When you are done using the tunnel, you could use:

$ pkill -f "ssh -fNL 12345:localhost:54321 user@remote"

but it doesn't look nice, right?

Things get a lot messier when you need a bunch of ssh sessions. For example, let's say you have a client program and a server program, and you want to use ssh to tunnel the connection from host1 to host2. So you've created a simple script to manage everything:


# Start server program on the remote host (host2:54321)
ssh -ttf user@host2 "server --port 54321"
# -tt forces tty allocation, so if you kill ssh, the server will die too

# Create a tunnel from localhost:12345 (host1) to localhost:54321 (host2)
ssh -fNL 12345:localhost:54321 user@host2

# Start client
client --port 12345

Note: ssh will ask the user for a password every time it is called, but you are using keys anyway, right? =)

If everything goes right, you will be left with, at least, the ssh tunnel running in the background. But for error handling (especially if there are more commands in your script), you will need to somehow check which ssh processes you have running in the background. For example, it might happen that the tunnel fails after the server has launched successfully, and you are left with the server process still running. Or the client might fail, and you will have both the server and the tunnel running. As we decided, hunting for them with pkill doesn't look nice. So, what should we do?

It turns out there is an elegant way to control a bunch of ssh sessions. SSH can make several sessions share one tcp connection by using ssh multiplexing. This not only speeds up establishing new ssh sessions by reusing the existing tcp connection, but also lets us easily control all of them at once. So, let's use it in our previous example.


# Tell our script to exit if some command fails
set -e

# Execute command "ssh -S ~/sock.sock -O exit user@host2" when script exits
trap 'ssh -S ~/sock.sock -O exit user@host2 &> /dev/null' EXIT
# -O exit -- tells ssh to stop all ssh sessions that were using the same tcp connection
#            through the control socket ~/sock.sock (read below to get the full picture).
#            ssh still expects a destination argument here, even though the socket is given.

# Start server program on the remote host (host2:54321) with -M and -S options:
ssh -S ~/sock.sock -M -ttf user@host2 "server --port 54321"
# -S ~/sock.sock -- tells ssh to create a control socket that will be used for connection sharing.
#                   Note that we use ~/sock.sock, so other users can't access it due to home dir perms.
# -M             -- tells ssh to put this ssh client into "master" mode

# Create a tunnel from localhost:12345 (host1) to localhost:54321 (host2) with -S ~/sock.sock option:
ssh -S ~/sock.sock -fNL 12345:localhost:54321 user@host2
# -S ~/sock.sock -- tells ssh to use the socket that we created before, so the new ssh session
#                   reuses the same tcp connection

# Start client
client --port 12345

So now, whenever the script exits (whether something went wrong, see "set -e", or everything went okay), all our ssh sessions will be stopped with the nice command "ssh -S ~/sock.sock -O exit user@host2" (ssh expects a destination argument even with -O). You don't even need to worry about deleting ~/sock.sock later; ssh will handle it for you.
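For scripts that are not written in shell, the same "always send -O exit" pattern can be sketched in Python with a context manager. This is only an illustration under assumptions: the helper names ssh_argv and ssh_master are made up for this post, and the master is started with -M -f -N separately from the server command.

```python
import contextlib
import subprocess


def ssh_argv(sock, host, options=(), command=None):
    """Build an ssh command line that goes through control socket `sock`."""
    argv = ["ssh", "-S", sock, *options, host]
    if command is not None:
        argv.append(command)
    return argv


@contextlib.contextmanager
def ssh_master(sock, host, run=subprocess.check_call):
    """Start a master connection; always send "-O exit" when the block ends."""
    # -M: master mode, -f: go to background, -N: no remote command
    run(ssh_argv(sock, host, ["-M", "-f", "-N"]))
    try:
        # Hand back a helper that opens further sessions over the same socket.
        yield lambda options=(), command=None: run(
            ssh_argv(sock, host, options, command))
    finally:
        # Plays the role of the shell `trap ... EXIT`; errors are ignored.
        with contextlib.suppress(subprocess.CalledProcessError):
            run(ssh_argv(sock, host, ["-O", "exit"]))
```

Inside `with ssh_master(sock, "user@host2") as ssh:` a call such as `ssh(["-fNL", "12345:localhost:54321"])` would open the tunnel, and leaving the block, normally or through an exception, tears everything down. The `run` parameter exists only so the command lines can be inspected without actually invoking ssh.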

Saturday, June 14, 2014

How to set PID using ns_last_pid

There is a cool project called CRIU (Checkpoint/Restore In Userspace). I was wondering how it gets a certain PID when restoring a process, since I thought one can't easily set a PID.

I did some code investigation and here is what I figured out. There is a file, /proc/sys/kernel/ns_last_pid, which contains the last pid that was assigned by the kernel. So, when the kernel needs to assign a new one, it looks into ns_last_pid, gets last_pid and assigns last_pid+1 (provided that pid is free).
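The arithmetic is simple enough to sketch in a few lines of Python. The function names below are made up for illustration, and the prediction only holds if nothing else forks in between and last_pid+1 is free (wrap-around at pid_max is ignored).

```python
NS_LAST_PID = "/proc/sys/kernel/ns_last_pid"


def parse_last_pid(text):
    """Turn the file content (a decimal number plus newline) into an int."""
    return int(text.strip())


def read_last_pid(path=NS_LAST_PID):
    """Read the last pid the kernel assigned in this pid namespace."""
    with open(path) as f:
        return parse_last_pid(f.read())


def expected_next_pid(last_pid):
    """The pid the next fork() should get, assuming it is free."""
    return last_pid + 1


def value_to_write(wanted_pid):
    """What ns_last_pid must contain so that the next fork() gets wanted_pid."""
    return str(wanted_pid - 1)
```

Reading the file needs no special privileges; writing it is the part that requires root.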

ns_last_pid was added by the CRIU guys and is available since the 3.3 kernel. It requires CONFIG_CHECKPOINT_RESTORE to be set. Most probably it will work with your kernel by default (tested on Ubuntu). Btw, I don't know why, but in the Arch kernel this option has not been set since ~3.11 until now (3.14.6-1).

Here is some C code to set the PID for a forked child.

Update: I've also added this example to criu wiki https://criu.org/Pid_restore.

BEWARE! This program requires root. I don't take any responsibility for what this code might do to your system.

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/file.h>   /* flock() */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>     /* write(), fork(), close() */

int main(int argc, char *argv[])
{
    int fd, pid, new_pid;
    char buf[32];

    /* The wanted pid is the only argument */
    if (argc != 2)
        return 1;

    printf("Opening ns_last_pid...\n");
    fd = open("/proc/sys/kernel/ns_last_pid", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("Can't open ns_last_pid");
        return 1;
    }

    /* Hold the lock so that cooperating tools don't touch ns_last_pid before fork() */
    printf("Locking ns_last_pid...\n");
    if (flock(fd, LOCK_EX)) {
        close(fd);
        printf("Can't lock ns_last_pid\n");
        return 1;
    }

    pid = atoi(argv[1]);
    snprintf(buf, sizeof(buf), "%d", pid - 1);

    /* The kernel hands out last_pid+1, so store pid-1 */
    printf("Writing pid-1 to ns_last_pid...\n");
    if (write(fd, buf, strlen(buf)) != (ssize_t)strlen(buf)) {
        printf("Can't write to ns_last_pid\n");
        return 1;
    }

    new_pid = fork();
    if (new_pid == 0) {
        printf("I'm child!\n");
        exit(0);
    } else if (new_pid == pid) {
        printf("I'm parent. My child got right pid!\n");
    } else {
        printf("pid does not match expected one\n");
    }

    printf("Cleaning up...\n");
    if (flock(fd, LOCK_UN)) {
        printf("Can't unlock\n");
    }

    close(fd);

    return 0;
}