Use tmux to debug distributed Python programs
Debugging distributed programs is always hard. Not only is concurrency notoriously tricky, but we also lack tools, or don't know which tools exist, for debugging distributed programs. However, I found that tmux can manage multiple windows in one terminal, which makes it possible to control many nodes without a GUI.
Usage of tmux
Here is my tmux cheat sheet. For more details, see https://gist.github.com/henrik/1967800.
Create Session and Window
# Shell Commands
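The same session and window commands can also be issued from a script, which is handy when a launcher needs to open windows on its own. The following is a minimal Python sketch, assuming tmux is on PATH; the helper names (`tmux`, `new_session`, `new_window`) are hypothetical, while `new-session`, `new-window` and their `-d`, `-s`, `-t`, `-n` options are standard tmux.

```python
import subprocess


def tmux(*args, run=False):
    """Build (and optionally execute) a tmux command as an argv list."""
    argv = ["tmux", *args]
    if run:
        subprocess.run(argv, check=True)  # requires tmux on PATH
    return argv


def new_session(name, run=False):
    # -d creates the session detached; attach later with `tmux attach -t NAME`
    return tmux("new-session", "-d", "-s", name, run=run)


def new_window(session, window_name, command=None, run=False):
    # "SESSION:" (empty window part) lets tmux pick the next unused window index
    args = ["new-window", "-t", f"{session}:", "-n", window_name]
    if command is not None:
        args.append(command)  # shell command to run inside the new window
    return tmux(*args, run=run)
```

For example, `new_session("dgl", run=True)` followed by `new_window("dgl", "worker0", "python train.py", run=True)` opens one named window per worker; without `run=True` the helpers only return the argv lists, which makes them easy to inspect.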
Window / Pane Conversion
Note: After pressing Ctrl-b and then :, you can press Tab to autocomplete commands or use the arrow keys to browse the command history.
Ctrl-b :join-pane -s :2  # Move window 2 into a new split pane
In commands such as join-pane, -t represents the target and -s represents the source.
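To illustrate the source/target convention, here is a small Python sketch that builds the join-pane command above together with its inverse, break-pane. The helper names are hypothetical; join-pane's `-s`, `-t`, `-h` and break-pane's `-s` are standard tmux options.

```python
def join_pane(src_window, dst_window=None, horizontal=False):
    """Move window SRC into a split pane of DST (default: the current window)."""
    argv = ["tmux", "join-pane", "-s", f":{src_window}"]  # -s: source
    if dst_window is not None:
        argv += ["-t", f":{dst_window}"]  # -t: target
    if horizontal:
        argv.append("-h")  # split side by side instead of one above the other
    return argv


def break_pane(src_pane=None):
    """Turn a pane (default: the current one) back into its own window."""
    argv = ["tmux", "break-pane"]
    if src_pane is not None:
        argv += ["-s", src_pane]
    return argv
```

`join_pane(2)` reproduces the interactive command shown above, while `break_pane()` undoes it by moving the current pane back into a separate window.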
Example
RPC via SSH
This is a launcher that spawns several processes on remote machines (source: DGL library).
def execute_remote(cmd, ip, port, thread_list):
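A launcher function with this signature typically wraps the command in `ssh` and runs it on a background thread so that many hosts can be started at once. The following is a minimal sketch under that assumption, not DGL's exact code; the `runner` parameter is a hypothetical hook added so the sketch can be exercised without a real SSH connection.

```python
import shlex
import subprocess
import threading


def execute_remote(cmd, ip, port, thread_list, runner=None):
    """Run `cmd` on host `ip` over SSH in a background thread (sketch)."""
    # Quote the remote command so the remote shell receives it as one string
    ssh_cmd = f"ssh -o StrictHostKeyChecking=no -p {port} {ip} {shlex.quote(cmd)}"
    # Hypothetical hook: by default, execute the ssh command through the local shell
    run = runner or (lambda c: subprocess.check_call(c, shell=True))
    thread = threading.Thread(target=run, args=(ssh_cmd,), daemon=True)
    thread.start()
    thread_list.append(thread)  # the caller joins these threads later
    return ssh_cmd
```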
To debug the program, we need to create a session on the login node first.
login$ tmux new -s dgl
Then modify the launcher's source code so that each newly spawned process runs in its own tmux window:
- Put tmux neww at the beginning of the command.
- Put ;bash -i at the end, so that the window does not close after the program exits.
def execute_remote(cmd, ip, port, thread_list):
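The two modifications can be sketched as a small command builder. It assumes the launcher itself runs inside the tmux session on the login node (so `tmux neww` opens the new window there) and that bash is available on the remote host; `build_tmux_ssh_command` is a hypothetical name, not part of DGL.

```python
import shlex


def build_tmux_ssh_command(cmd, ip, port):
    """Wrap a remote command so it runs in its own tmux window (sketch)."""
    # `; bash -i` leaves an interactive shell running, so the window stays open
    remote = f"{cmd}; bash -i"
    ssh_cmd = f"ssh -o StrictHostKeyChecking=no -p {port} {ip} {shlex.quote(remote)}"
    # `tmux neww` runs on the login node and opens one window per remote process
    return f"tmux neww {shlex.quote(ssh_cmd)}"
```

Quoting the whole ssh command keeps the `;` from being interpreted by the login node's shell, so both the program and `bash -i` run on the remote host.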
Finally, run the modified launcher directly on the login node from inside this tmux session. Several new windows are then created, one per remote process, and are listed in the status bar at the bottom of tmux.
login$ python launch.py ...
RPC via MPI
As in the previous section, add the same prefix or suffix to the command.
tmux new -s mpi
With a Debugger
Debugging distributed programs becomes much easier when each remote process, shown in its own window, is attached to its own debugger.
PDB
PDB is part of Python's standard library and is easy to use. In particular, inserting a single explicit call is enough to make the program stop and enter interactive debugging mode. For example, try running the following code.
import pdb
Your Python program then pauses and an interactive prompt similar to gdb's appears.
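With many processes, it is often convenient to stop only some of them. A minimal sketch, assuming the launcher exports each process's rank in an environment variable; the variable name `RANK` and the helper `maybe_break` are hypothetical, while `pdb.Pdb().set_trace(frame)` is standard library.

```python
import os
import pdb
import sys


def maybe_break(ranks=("0",), rank_env="RANK"):
    """Enter PDB in the caller's frame, but only on the selected ranks."""
    rank = os.environ.get(rank_env)  # hypothetical rank variable set by the launcher
    if rank not in ranks:
        return False
    # Start the debugger in the caller's frame rather than inside this helper
    pdb.Pdb().set_trace(sys._getframe(1))
    return True
```

Calling `maybe_break()` at the point of interest pauses only rank 0, while the other windows keep running (or block at the next collective operation).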
PuDB
PuDB is essentially PDB with a TUI (text-based user interface), and it is used in much the same way as PDB. Unlike PDB, however, it is not built in and has to be installed first.
conda install pudb  # Install by conda
import pudb
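Instead of hard-coding which debugger to import, Python 3.7+ lets the built-in `breakpoint()` call whatever hook is named in the `PYTHONBREAKPOINT` environment variable (PEP 553). A sketch, assuming the remote command is run by a POSIX shell; `with_breakpoint_hook` is a hypothetical helper that prefixes the command so every `breakpoint()` opens PuDB.

```python
import shlex


def with_breakpoint_hook(cmd, hook="pudb.set_trace"):
    """Prefix a shell command so breakpoint() calls the given debugger hook."""
    # The default sys.breakpointhook reads PYTHONBREAKPOINT on every call;
    # the value "0" turns breakpoint() into a no-op.
    return f"PYTHONBREAKPOINT={shlex.quote(hook)} {cmd}"
```

With this, the same source file can use PDB, PuDB, or no debugger at all just by changing the launcher's command.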
However, the TUI relies heavily on a pseudo-terminal (pty) and cannot work correctly without one. By default, SSH does not allocate a pseudo-terminal when it is used to launch a remote program rather than an interactive shell, so the launcher needs one more modification:
- Pass the -t option to SSH to force pseudo-terminal allocation.
def execute_remote(cmd, ip, port, thread_list):
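Combined with the tmux changes from the SSH section, the resulting command might be built as in this minimal sketch. It assumes the launcher runs inside the tmux session on the login node and that bash exists on the remote host; `build_pudb_ssh_command` is a hypothetical name.

```python
import shlex


def build_pudb_ssh_command(cmd, ip, port):
    """Open a tmux window that runs `cmd` over SSH with a remote pseudo-terminal."""
    remote = f"{cmd}; bash -i"  # keep the window open after the program exits
    # -t forces pseudo-terminal allocation so PuDB's TUI can draw correctly
    ssh_cmd = (f"ssh -t -o StrictHostKeyChecking=no -p {port} {ip} "
               f"{shlex.quote(remote)}")
    return f"tmux neww {shlex.quote(ssh_cmd)}"
```

Because `tmux neww` gives ssh a real local terminal (the new pane), `-t` is enough here; ssh only needs `-tt` when its own stdin is not a terminal.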