Friday, May 5, 2017

How to package and distribute a standalone Python project on Windows

UPDATE: thanks to lots of contributed hooks, I would choose PyInstaller these days

I recently had a need to distribute an app written in Python as a standalone package on Windows machines. The usual way to do it is to use py2exe (or similar) to generate a standalone exe from your Python script and then use Inno Setup or NSIS to write an installer for it. As it turned out, py2exe, Nuitka and PyInstaller could not track all the dependencies of the project automatically (e.g. some conditional imports), and even after I explicitly supplied hooks and helpers to fix that, the resulting exe still exited silently. So to solve that, I approached it from the other end and started looking into ways to distribute the app with everything it needs included. And, luckily, there are plenty of projects that supply a portable Python for Windows.

Among them, Anaconda and WinPython looked like the most promising ones. And both of them supply mini versions, called Miniconda and WinPythonZero, that provide just the most essential packages and tools for your project. For no particular reason, I chose Miniconda.
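
As a side note for those trying PyInstaller today (see the update above): missed conditional imports can usually be fixed with a custom hook file. Here is a minimal sketch; hook-mymodule.py and the mymodule names are made up for illustration:

# hook-mymodule.py -- picked up via PyInstaller's --additional-hooks-dir option
# Tell PyInstaller about imports it cannot discover statically,
# e.g. modules loaded conditionally or via __import__().
hiddenimports = ['mymodule._win32']

# Also bundle non-code files that the module loads at runtime.
from PyInstaller.utils.hooks import collect_data_files
datas = collect_data_files('mymodule')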

Preparing your project

Installing Miniconda couldn't be easier: you just download the installer and install it into your project directory. After that, go there and simply call

python -m pip install yourproject

If your project is not available on PyPI, you might want to build an sdist from it and then call pip to install it:

python -m pip install -r requirements.txt
python setup.py sdist
python -m pip install dist\myproject-0.1.tar.gz

And that's all! Now you have a fully functional portable Python which you can use for your project.

Creating an installer

I chose Inno Setup for my installer. There are plenty of tutorials on how to write an installer with it, so I'm not going to talk about it here. Inno Setup also provides a GUI wizard which you can use to generate an installer for your project, so it really couldn't be easier. Just make sure to specify the directory with your project and the proper flags:

Source: "{#MyAppDir}\*"; DestDir: "{app}"; Flags: ignoreversion recursesubdirs createallsubdirs

You might also want your installer to modify PATH; modpath.iss is useful for that:

[Tasks]
Name: modifypath; Description: Adds dvc's application directory to environmental path

Another feature that you might want your installer to handle is adding CreateSymLink permissions. To do that, take a look at this brilliant post: Adding Permission for creating Symlink using PowerShell. Just copy that script into script.iss and call it in the [Tasks] section of your installer:

[Tasks]
Name: modifypath; Description: Adds dvc's application directory to environmental path;
Name: addsymlinkpermissions; Description: Add permission for creating symbolic links;

Now just call:

iscc setup.iss

and there you have it: a yourproject.exe installer that you can distribute to users who don't even know what Python is.

Tuesday, April 7, 2015

CRIU as a debug tool and replacement for google coredumper

I'm currently working on criu images -> core dump conversion for CRIT (CRiu Image Tool), and while looking for some info on manually generating core dumps, I found an interesting yet outdated (OMG, last changed in 2007!) project called google coredumper [1] that allows generating core dumps whenever you want to, which looks like a cool thing for debugging. CRIU is able to dump a process at any point too, while providing a lot more info about the process state, because the process can literally be fully restored from the images, so I thought that coredumper users (if there are any today) could use CRIU for their purposes. Then again, criu images -> core dump conversion throws away a lot of data, so I have another thought: somehow integrate criu images into gdb, so that gdb could ask criu to restore the process with --leave-stopped and then attach to the process for debugging. It might also be good to be able to save the state of a task that is being debugged by detaching and calling criu dump.

[1] https://code.google.com/p/google-coredumper/
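
To make the gdb idea a bit more concrete, here is a rough sketch of that workflow driven from Python. The imgs directory and the pidfile name are made up, criu and gdb are assumed to be on PATH (and criu normally needs root), and pidfile path handling may differ between criu versions; treat it as a sketch of the idea, not a working integration:

import subprocess

# Restore the process from criu images, detach criu from it (-d) and
# leave the restored tasks stopped instead of resuming them.
subprocess.check_call(["criu", "restore", "-D", "imgs", "-d",
                       "--leave-stopped", "--pidfile", "pidfile"])

# Read the PID of the restored process and attach gdb to it for debugging.
with open("pidfile") as f:
    pid = f.read().strip()
subprocess.check_call(["gdb", "-p", pid])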

Wednesday, January 14, 2015

Python google protobuf and an optional field with an empty repeated field inside (or the "has_field = true" analog in Python)

So I was searching for something to represent protobufs in a human-readable format. After lots of googling, I found that there is a magic built-in module called text_format, which does just what I need: it converts protobufs to/from a human-readable format that looks quite similar to json. It is not valid json, as the types json supports don't match protobuf's, and it has a slightly different format. Pb text is fine for reading, but very few tools support it. For example, if you need some kind of xpath analog to search inside protobufs, you will be disappointed, as there is no such thing freely available (though on some forums google developers mentioned that they have one, but they can't or don't want to share it). So I decided to try to convert pb to json.
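
For illustration, here is roughly how text_format is used with the Foo/Bar messages defined below; foo_pb2 is a hypothetical module generated by protoc from them:

from google.protobuf import text_format
import foo_pb2  # hypothetical: generated by protoc from the messages below

foo = foo_pb2.Foo()
foo.bar.baz.append(42)

# Serialize to the human-readable text format...
text = text_format.MessageToString(foo)
print(text)  # bar {
             #   baz: 42
             # }

# ...and merge it back into a fresh message.
parsed = foo_pb2.Foo()
text_format.Merge(text, parsed)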

There are a bunch of not-so-popular pb<->json converters out there but, as it turned out, they all have the same bug related to handling an optional field with an empty repeated field inside. Here is what I mean:
message Bar {
  repeated int32 baz = 1;
}

message Foo {
  optional Bar bar = 1;
}
Even if baz contains 0 entries, bar is still set, so it should be present after the conversion too.

Those pb<->json converters do convert pb to json appropriately, so Foo foo looks like:
{
  "bar" : {}
}
But when converting back, they just miss it: repeated baz is represented by a Python list, so when there are no entries in baz (baz == []) the converter never actually touches foo.bar, and protobuf thinks that you didn't set foo.bar at all. So, if you do the conversion pb->json->pb->json, you will see:
{
}
Which means that protobuf just dropped your optional field (that should be set) with an empty repeated field inside.

In C, you have a has_* field to mark that the field is present, so it is pretty straightforward.
But in Python there is no such field to set, and a brief look into the pb methods didn't reveal anything appropriate. After a bit of digging into the text_format sources, I found a method called SetInParent() that does the same thing the has_* field does in C. So if you do foo.bar.SetInParent(), it will set the has_bar flag, and after the pb->json->pb->json conversion you will see:
{
  "bar" : {}
}
Which is correct.
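
Here is a minimal sketch of the fix put together; foo_pb2 is again the hypothetical protoc-generated module for the messages above:

import foo_pb2

foo = foo_pb2.Foo()
foo.bar.SetInParent()         # mark bar as set, even though bar.baz stays empty
assert foo.HasField("bar")    # presence survives: the has_bar analog is now true

# A converter rebuilding the message from {"bar": {}} should do the same
# instead of silently skipping the empty object:
rebuilt = foo_pb2.Foo()
rebuilt.bar.SetInParent()
assert rebuilt.HasField("bar")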

Friday, November 28, 2014

An elegant way to control a bunch of ssh connections

Let's say you want to establish an ssh tunnel and put it into the background:

$ ssh -fNL 12345:localhost:54321 user@remote

When you are done using the tunnel, you could use:

$ pkill -f "ssh -fNL 12345:localhost:54321 user@remote"

but it doesn't look nice, right?

Things get a lot messier when you need to have a bunch of ssh sessions. For example, let's say you have some client program and some server program, and you want to use ssh to tunnel the connection from host1 to host2. So you've created a simple script to manage everything:

#!/usr/bin/bash

# Start server program on the remote host(host2:54321)
ssh -ttf user@host2 "server --port 54321"
# -tt is to force tty allocation, so that if you kill ssh, the server will die too

# Create a tunnel from localhost:12345 (host1) to localhost:54321 (host2)
ssh -fNL 12345:localhost:54321 user@host2

# Start client
client --port 12345

Note: ssh will ask the user for a password every time it is called, but you are using keys anyway, right? :)

If everything goes right, you will be left with, at least, the ssh tunnel running in the background. But for error handling (especially if there are more commands in your script), you will need to somehow check which ssh processes you have running in the background. For example, it might happen that after successfully launching the server, the tunnel fails and you are left with the server process running. Or the client might fail, and you will have both the server and the tunnel running. As we decided, using ps here doesn't look nice. So, what should we do?

It turns out, there is an elegant way to control a bunch of ssh sessions. SSH allows sessions to share one tcp connection by using ssh multiplexing. It not only speeds up establishing new ssh sessions by reusing the existing tcp connection, but also allows us to easily control all the ssh sessions. So, let's use it in our previous example.


#!/usr/bin/bash

# Tells our script to exit if some command fails
set -e

# Execute command "ssh -S ~/sock.sock -O exit" when script exits
trap 'ssh -S ~/sock.sock -O exit &> /dev/null' EXIT
# -O exit -- tells ssh to stop all the ssh sessions that share the same tcp connection through the same
#            control socket ~/sock.sock (read below to get the full picture of what is going on)

# Start server program on the remote host(host2:54321) with -M and -S options:
ssh -S ~/sock.sock -M -ttf user@host2 "server --port 54321"
# -S ~/sock.sock -- tells ssh to create a control socket that will be used for connection sharing.
#                   Note that we use ~/sock.sock, so other users can't access it due to home dir perms.
# -M -- puts this ssh client into "master" mode for connection sharing

# Create a tunnel from localhost:12345 (host1) to localhost:54321 (host2) with the -S ~/sock.sock option:
ssh -S ~/sock.sock -fNL 12345:localhost:54321 user@host2
# -S ~/sock.sock -- tells ssh to use the control socket that we created before, so the new ssh session
#                   reuses the same tcp connection

# Start client
client --port 12345


So now, if the script exits (whether something goes wrong or everything went okay, see "set -e"), all our ssh sessions will be stopped with the nice command "ssh -S ~/sock.sock -O exit". You don't even need to worry about deleting ~/sock.sock later; ssh will handle it for you.

Saturday, June 14, 2014

How to set PID using ns_last_pid

So there is this cool project called CRIU (Checkpoint/Restore In Userspace), and I was wondering how it gets a certain PID when restoring a process. I always thought that it was not possible to set a PID without some kind of kernel hacking.

I did some investigation and here is what I figured out. There is a file /proc/sys/kernel/ns_last_pid, which contains the last PID that was assigned by the kernel. So, when the kernel needs to assign a new one, it looks into ns_last_pid, gets last_pid, and assigns last_pid+1.

ns_last_pid was added by the CRIU guys and has been available since kernel 3.3. Note that it requires the CONFIG_CHECKPOINT_RESTORE option to be set, which is enabled by default in the most popular distros (e.g. Ubuntu and Fedora), but not all. The most notable example is Arch Linux, which had it set for some time but then suddenly disabled it in ~3.11 (still disabled in 3.14.6-1).

Here is some C code to set the PID for a forked child.

Update: I've also added this example to the criu wiki: https://criu.org/Pid_restore.

BEWARE! This program requires root. I don't take any responsibility for what this code might do to your system.

#include <sys/stat.h>
#include <sys/file.h>   /* flock() */
#include <fcntl.h>
#include <unistd.h>     /* write(), close(), fork() */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int fd, pid;
    char buf[32];

    if (argc != 2) {
        printf("Usage: %s <pid>\n", argv[0]);
        return 1;
    }

    printf("Opening ns_last_pid...\n");
    fd = open("/proc/sys/kernel/ns_last_pid", O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("Can't open ns_last_pid");
        return 1;
    }
    printf("Done\n");

    printf("Locking ns_last_pid...\n");
    if (flock(fd, LOCK_EX)) {
        close(fd);
        printf("Can't lock ns_last_pid\n");
        return 1;
    }
    printf("Done\n");

    pid = atoi(argv[1]);
    snprintf(buf, sizeof(buf), "%d", pid - 1);

    printf("Writing pid-1 to ns_last_pid...\n");
    if (write(fd, buf, strlen(buf)) != strlen(buf)) {
        printf("Can't write to buf\n");
        return 1;
    }
    printf("Done\n");

    printf("Forking...\n");
    int new_pid;
    new_pid = fork();
    if (new_pid == 0) {
        printf("I'm child!\n");
        exit(0);
    } else if (new_pid == pid) {
        printf("I'm parent. My child got right pid!\n");
    } else {
        printf("pid does not match expected one\n");
    }
    printf("Done\n");

    printf("Cleaning up...");
    if (flock(fd, LOCK_UN)) {
        printf("Can't unlock");
    }

    close(fd);

    printf("Done\n");

    return 0;
}