Chapter 4. Working with Remote Machines
Most data-processing tasks in bioinformatics require more computing power than we have on our workstations, which means we must work with large servers or computing clusters. For some bioinformatics projects, itâs likely youâll work predominantly over a network connection with remote machines. Unsurprisingly, working with remote machines can be quite frustrating for beginners and can hobble the productivity of experienced bioinformaticians. In this chapter, weâll learn how to make working with remote machines as effortless as possible so you can focus your time and efforts on the project itself.
Connecting to Remote Machines with SSH
There are many ways to connect to another machine over a network, but by far the most common is through the secure shell (SSH). We use SSH because itâs encrypted (which makes it secure to send passwords, edit private files, etc.), and because itâs on every Unix system. How your server, SSH, and your user account are configured is something you or your system administrator determines; this chapter wonât cover these system administration topics. The material covered in this section should help you answer common SSH questions a sysadmin may ask (e.g., âDo you have an SSH public key?â). Youâll also learn all of the basics youâll need as a bioinformatician to SSH into remote machines.
To initialize an SSH connection to a host (in this case, biocluster.myuniversity.edu), we use
the ssh
command:
$
ssh
biocluster.myuniversity.edu
Password:
Last login: Sun Aug 11 11:57:59 2013 from fisher.myisp.com
wsobchak@biocluster.myuniversity.edu$
When connecting to a remote host with SSH, youâll be prompted for your remote user accountâs password.
After logging in with your password, youâre granted a shell prompt on the remote host. This allows you to execute commands on the remote host just as youâd execute them locally.
SSH also works with IP addressesâfor example, you could connect to a machine with ssh
192.169.237.42
. If your server uses a different port than the default (port
22), or your username on the remote machine is different from your local
username, youâll need to specify these details when connecting:
$
ssh -p50453
cdarwin@biocluster.myuniversity.edu
Here, weâve specified the port with the flag -p
and the username by using
the syntax user@domain
. If youâre unable to connect to a host, using
ssh -v
(-v
for verbose) can help you spot the issue. You can
increase the verbosity by using -vv
or -vvv
; see man ssh
for more
details.
Storing Your Frequent SSH Hosts
Bioinformaticians are constantly having to SSH to servers, and typing out IP
addresses or long domain names can become quite tedious. Itâs also burdensome
to remember and type out additional details like the remote serverâs port or
your remote username. The developers behind SSH created a clever alternative:
the SSH config file. SSH config files store details about hosts you frequently
connect to. This file is easy to create, and hosts stored in this file work not
only with ssh
, but also with two programs weâll learn about in Chapter 6: scp
and rsync
.
To create a file, just create and edit the file at ~/.ssh/config. Each entry takes the following form:
Host bio_serv HostName 192.168.237.42 User cdarwin Port 50453
You wonât need to specify Port
and User
unless these differ from the remote
hostâs defaults. With this file saved, you can SSH into 192.168.236.42 using
the alias ssh bio_serv
rather than typing out ssh -p 50453
cdarwin@192.169.237.42
.
If youâre working with many remote machine connections in many terminal tabs, itâs
sometimes useful to be make sure youâre working on the host you think you are.
You can always access the hostname with the command hostname
:
$
hostnamebiocluster.myuniversity.edu
Similarly, if you maintain multiple accounts on a server (e.g., a user account
for analysis and a more powerful administration account for sysadmin tasks), it
can be useful to check which account youâre using. The command whoami
returns
your username:
$
whoamicdarwin
This is especially useful if you do occasionally log in with an administrator account with more privilegesâthe potential risks associated with making a mistake on an account with administrator privileges are much higher, so you should always be away when youâre on this account (and minimize this time as much as possible).
Quick Authentication with SSH Keys
SSH requires that you type your password for the account on the remote machine.
However, entering a password each time you login can get tedious, and not
always safe (e.g., keyboard input could be monitored). A safer, easier
alternative is to use an SSH public key. Public key cryptography is a
fascinating technology, but the details are outside the scope of this book. To
use SSH keys to log in into remote machines without passwords, we first need to
generate a public/private key pair. We do this with the command ssh-keygen
.
Itâs very important that you note the difference between your public and
private keys: you can distribute your public key to other servers, but your
private key must be kept safe and secure and never shared.
Letâs generate an SSH key pair using ssh-keygen
:
$
ssh-keygen -b 2048Generating public/private rsa key pair.
Enter file in which to save the key (/Users/username/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/username/.ssh/id_rsa.
Your public key has been saved in /Users/username/.ssh/id_rsa.pub.
The key fingerprint is:
e1:1e:3d:01:e1:a3:ed:2b:6b:fe:c1:8e:73:7f:1f:f0
The key's randomart image is:
+--[ RSA 2048]----+
|.o... ... |
| . . o |
| . * |
| . o + |
| . S . |
| o . E |
| + . |
|oo+.. . . |
|+=oo... o. |
+-----------------+
This creates a private key at ~/.ssh/id_rsa and a public key at
~/.ssh/id_rsa.pub. ssh-keygen
gives you the option to use an empty password,
but itâs generally recommended that you use a real password. If youâre
wondering, the random art ssh-keygen
creates is a way of validating your keys
(there are more details about this in man ssh
if youâre curious).
To use password-less authentication using SSH keys, first SSH to your remote
host and log in with your password. Change directories to ~/.ssh, and
append the contents of your public key file (id_rsa.pub, not your private
key!) to ~/.ssh/authorized_keys (note that the ~ may be expanded to
/home/username or /Users/username depending on the remote operating
system). You can append this file by copying your public key from your local
system, and pasting it to the ~/.ssh/authorized_keys file on the remote
system. Some systems have an ssh-copy-id
command that automatically does this
for you.
Again, be sure youâre using your public key, and not the private key. If your private key ever is accidentally distributed, this compromises the security of the machines youâve set up key-based authentication on. The ~/.ssh/id_rsa private key has read/write permissions only for the creator, and these restrictive permissions should be kept this way.
After youâve added your public key to the remote host, try logging in a few
times. Youâll notice that you keep getting prompted for your SSH keyâs
password. If youâre scratching your head wondering how this saves time, thereâs
one more trick to know: ssh-agent
. The ssh-agent
program runs in the
background on your local machine, and manages your SSH key(s). ssh-agent
allows you to use your keys without entering their passwords each timeâexactly what we want when we frequently connect to servers. SSH agent is
usually already running on Unix-based systems, but if not, you can use eval
to start it. Then, to tell ssh-agent
ssh-agent
about our key, we use
ssh-add
:
$
ssh-addEnter passphrase for /Users/username/.ssh/id_rsa:
Identity added: /Users/username/.ssh/id_rsa
Now, the background ssh-agent
process manages our key for us, and we wonât
have to enter our password each time we connect to a remote machine. I once
calculated that I connect to different machines about 16 times a day, and it
takes me about two seconds to enter my password on average (accounting for
typing mistakes). If we were to assume I didnât work on weekends, this works
out to about 8,320 seconds, or 2.3 hours a year of just SSH connections. After
10 years, this translates to nearly an entire day wasted on just connecting to
machines. Learning these tricks may take an hour or so, but over the course of
a career, this really saves time.
Maintaining Long-Running Jobs with nohup and tmux
In Chapter 3, we briefly discussed how processes (whether running in the
foreground or background) will be terminated when we close our terminal window.
Processes are also terminated if we disconnect from our servers or if our
network connection temporarily drops out. This behavior is intentionalâyour
program will receive the hangup signal (referred to more technically as
SIGHUP
), which will in almost all cases cause your application to exit
immediately. Because weâre perpetually working with remote machines in our daily
bioinformatics work, we need a way to prevent hangups from stopping
long-running applications. Leaving your local terminalâs connection to a remote
machine open while a program runs is a fragile solutionâeven the most
reliable networks can have short outage blips. Weâll look at two preferable
solutions: nohup
and Tmux. If you use a cluster, there are better
ways to deal with hangups (e.g., submitting batch jobs to your clusterâs
software), but these depend on your specific cluster configuration. In this
case, consult your system administrator.
nohup
nohup
is simple command that executes a command and catches hangup signals
sent from the terminal. Because the nohup
command is catching and ignoring
these hangup signals, the program youâre running wonât be interrupted. Running
a command with nohup
is as easy as adding nohup
before your command:
$
nohup
program1
>
output.txt
&
[1] 10900
We run the command with all options and arguments as we would normally, but by adding
nohup
this program will not be interrupted if your terminal were to close or the remote connection were to drop. Additionally, itâs a good idea to redirect standard output and standard error just as we did in Chapter 3 so you can check output later.nohup
returns the process ID number (or PID), which is how you can monitor or terminate this process if you need to (covered in âKilling Processesâ). Because we lose access to this process when we run it throughnohup
, our only way of terminating it is by referring to it by its process ID.
Working with Remote Machines Through Tmux
An alternative to nohup
is to use a terminal multiplexer. In addition to
solving the hangup problem, using a terminal multiplexer will greatly increase
your productivity when working over a remote connection. Weâll use a terminal
multiplexer called Tmux, but a popular alternative is GNU Screen. Tmux and
Screen have similar functionality, but Tmux is more actively developed and has
some additional nice features.
Tmux (and terminal multiplexers in general) allow you to create a session containing multiple windows, each capable of running their own processes. Tmuxâs sessions are persistent, meaning that all windows and their processes can easily be restored by reattaching the session.
When run on a remote machine, Tmux allows you to maintain a persistent session that wonât be lost if your connection drops or you close your terminal window to go home (or even quit your terminal program). Rather, all of Tmuxâs sessions can be reattached to whatever terminal youâre currently onâjust SSH back into the remote host and reattach the Tmux session. All windows will be undisturbed and all processes still running.
Installing and Configuring Tmux
Tmux is available through most package/port managers. On OS X, Tmux can be
installed through Homebrew and on Ubuntu itâs available through apt-get
.
After installing Tmux, I strongly suggest you go to this chapterâs directory on
GitHub and download the .tmux.conf file to your home directory. Just as your
shell loads configurations from ~/.profile or ~/.bashrc, Tmux will
load its configurations from ~/.tmux.conf. The minimal settings in
.tmux.conf make it easier to learn Tmux by giving you a useful display bar at
the bottom and changing some of Tmuxâs key bindings to those that are more
common among Tmux users.
Creating, Detaching, and Attaching Tmux Sessions
Tmux allows you to have multiple sessions, and within each session have multiple windows. Each Tmux session is a separate environment. Normally, I use a session for each different project Iâm working on; for example, I might have a session for maize SNP calling, one for developing a new tool, and another for writing R code to analyze some Drosophila gene expression data. Within each of these sessions, Iâd have multiple windows. For example, in my maize SNP calling project, I might have three windows open: one for interacting with the shell, one with a project notebook open in my text editor, and another with a Unix manual page open. Note that all of these windows are within Tmux; your terminal programâs concept of tabs and windows is entirely different from Tmuxâs. Unlike Tmux, your terminal cannot maintain persistent sessions.
Letâs create a new Tmux session. To make our examples a bit easier, weâre going to do this on our local machine. However, to manage sessions on a remote host, weâd need to start Tmux on that remote host (this is often confusing for beginners). Running Tmux on a remote host is no different; we just SSH in to our host and start Tmux there. Suppose we wanted to create a Tmux session corresponding to our earlier Maize SNP calling example:
$
tmux new-session -s maize-snps
Tmux uses subcommands; the new-session
subcommand just shown creates new sessions. The
-s
option simply gives this session a name so itâs easier to identify
later. If youâre following along and youâve correctly placed the .tmux.conf
file in your home directory, your Tmux session should look like
Figure 4-1.
Tmux looks just like a normal shell prompt except for the status bar it has added at the bottom of the screen (weâll talk more about this in a bit). When Tmux is open, we interact with Tmux through keyboard shortcuts. These shortcuts are all based on first pressing Control and a, and then adding a specific key after (releasing Control-a first). By default, Tmux uses Control-b rather than Control-a, but this is a change weâve made in our .tmux.conf to follow the configuration preferred by most Tmux users.
The most useful feature of Tmux (and terminal multiplexers in general) is the
ability to detach and reattach sessions without losing our work. Letâs see how
this works in Tmux. Letâs first enter something in our blank shell so we can
recognize this session later: echo "hello, tmux"
. To detach a session, we
use Control-a, followed by d (for detach). After entering this, you should
see Tmux close and be returned to your regular shell prompt.
After detaching, we can see that Tmux has kept our session alive by calling
tmux
with the list-sessions
subcommand:
$ tmux list-sessions maize-snps: 1 windows (created Mon Feb 10 00:06:00 2014) [180x41]
Now, letâs reattach our session. We reattach sessions with the
attach-session
subcommand, but the shorter attach
also works:
$
tmux attach
Note that because we only have one session running (our maize-snps
session)
we donât have to specify which session to attach. Had there been more than one
session running, all session names would have been listed when we executed
list-sessions
and we could reattach a particular session using -t
<session-name>
. With only one Tmux session running, tmux attach
is
equivalent to tmux attach-session -t maize-snps
.
Managing remote sessions with Tmux is no different than managing sessions locally as we did earlier. The only difference is that we create our sessions on the remote host after we connect with SSH. Closing our SSH connection (either intentionally or unintentionally due to a network drop) will cause Tmux to detach any active sessions.
Working with Tmux Windows
Each Tmux session can also contain multiple windows. This is especially handy
when working on remote machines. With Tmuxâs windows, a single SSH connection
to a remote host can support multiple activities in different windows. Tmux
also allows you to create multiple panes within a window that allow you to
split your windows into parts, but to save space Iâll let the reader learn this
functionality on their own. Consult the Tmux manual page (e.g., with man tmux
)
or read one of the many excellent Tmux tutorials on the Web.
Like other Tmux key sequences, we create and switch windows using Control-a
and then another key. To create a window, we use Control-a c, and we use
Control-a n and Control-a p to go to the next and previous windows, respectively.
Table 4-1 lists the most commonly used Tmux key sequences. See man tmux
for a complete list, or press Control-a ? from within a Tmux session.
Key sequence | Action |
---|---|
Control-a d |
Detach |
Control-a c |
Create new window |
Control-a n |
Go to next window |
Control-a p |
Go to previous window |
Control-a & |
Kill current window ( |
Control-a , |
Rename current window |
Control-a ? |
List all key sequences |
Table 4-2 lists the most commonly used Tmux subcommands.
Subcommand | Action |
---|---|
|
List all sessions. |
|
Create a new session named âsession-nameâ. |
|
Attach a session named âsession-nameâ. |
|
Attach a session named âsession-nameâ, detaching it first. |
If you use Emacs as your text editor, youâll quickly notice that the key binding Control-a may get in the way. To enter a literal Control-a (as used to go to the beginning of the line in Emacs or the Bash shell), use Control-a a.
Get Bioinformatics Data Skills now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.