Process Substitution – Subshells that Rock

If you are new to the world of *nix operating systems, you might still be wrapping your head around the concept of pipes. But if you spend any time at the command line, it won’t be long before you are throwing commands together with more pipes than a high-rise plumbing job. This is because the ability to easily piece together the input and output of literally thousands of command-line tools is extremely useful.

And as you work with pipes some more, it won’t be long before you run up against a more complicated problem, specifically, “how do I pipe between sets of commands?” The answer is to group the sets of commands within subshells. The first pipe-and-subshell solution I remember using was a quick (and very common) way to tar-copy a directory, like so:

(cd /the/source && tar cf - .) | (cd /the/target && tar xf -)

This is a nice, reliable, cross-platform way of copying an entire directory from one path to another. It works because the parentheses spawn two subshells that each run their own “cd && tar” command, and the pipe connects them. But what if you wanted to make a second copy of /the/source in one command? Or maybe you want to copy this directory to three targets all in one shot? The answer is Process Substitution.

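Process substitution is available in shells such as bash, zsh, and ksh. It works by running a command in a subshell and replacing the ‘<()’ or ‘>()’ expression with a file name (typically a /dev/fd path or a named pipe) that is connected to that command’s STDOUT or STDIN, respectively. Any other process can then read from or write to that name as if it were a regular file. So in my example of tar-copy to multiple destinations, you could do it this way: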

(cd /the/source && tar cf - .) \
  | tee >(cd /the/target1 && tar xf -) \
  | (cd /the/target2 && tar xf -)

Breaking it down:

(cd /the/source && tar cf - .)
This starts a subshell with a tar of /the/source on STDOUT.
tee >(cd /the/target1 && tar xf -)
The “tee” command duplicates STDIN to STDOUT and the named file. In this case, the named file is a process substitution, the bit in the ‘>()’, which runs a tar extracting the stream into /the/target1.
(cd /the/target2 && tar xf -)
This is a regular subshell that extracts the tar stream duplicated by “tee” into /the/target2.
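
To see why “tee” can treat the ‘>()’ as a named file, it helps to print what bash actually puts on the command line. Here is a minimal sketch, assuming bash on a system that exposes /dev/fd (the exact paths vary, and some systems use named pipes instead). The “true” command and the echoed text are just placeholders of mine:

```bash
#!/usr/bin/env bash
# Each substitution is replaced by a path before the command runs.
# `true` is a placeholder; we only care about the paths bash hands to echo.
paths=$(echo <(true) >(true))
echo "$paths"    # two paths, often of the form /dev/fd/NN

# Since the substitution is just a path, any command that accepts
# a file name can read the subshell's output through it.
greeting=$(cat <(echo "hello from a subshell"))
echo "$greeting"
```

The first line of output shows that a command such as “tee” never sees the subshell itself, only an ordinary-looking path.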

Of course you could expand this to as many targets as you wanted by chaining more “tee” commands together. By the way, I have actually used the above command to make two backups of a single drive at one time, so it’s not a completely contrived example! Here are some other examples of when I have found process substitution handy.
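
Chaining is not the only option: “tee” accepts more than one file name, so a single “tee” can feed several process substitutions. A small sketch with made-up consumers (the awk one-liners are only stand-ins for real work):

```bash
#!/usr/bin/env bash
# One tee, two process substitutions, plus tee's normal copy on STDOUT.
# The awk commands are illustrative stand-ins for real consumers.
# Wrapping the pipeline in $( ) collects every branch's output, because
# the substituted commands inherit the same STDOUT as tee.
report=$(seq 1 5 \
  | tee >(awk '{ s += $1 } END { print "sum=" s }') \
        >(awk 'END { print "lines=" NR }'))
echo "$report"
```

Two things are worth knowing here. The order in which the branches print is not guaranteed. Also, bash does not generally wait for a ‘>()’ command to finish before moving on to the next command; the command substitution in this sketch sidesteps that because it keeps reading until every branch has closed its output.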

Erasing multiple hard drives
I wanted to erase a bunch of identical hard drives using /dev/urandom at the same time without having to start multiple reads of the urandom device. This was the command I used:

cat /dev/urandom \
  | tee >(dd of=/dev/sda) \
  | tee >(dd of=/dev/sdb) \
  | tee >(dd of=/dev/sdc) \
  | dd of=/dev/sdd
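
Before pointing anything like this at real devices, it can be reassuring to rehearse the fan-out on throwaway data. This sketch is mine rather than part of the original command: it swaps the “dd” writers for “cksum” and reads a fixed 1 MiB from /dev/urandom, so nothing is written to disk. Using one “tee” with several targets instead of a chain is also my choice here.

```bash
#!/usr/bin/env bash
# Rehearsal only: cksum stands in for each `dd of=/dev/sdX` writer.
# A single read of /dev/urandom is fanned out to three consumers.
sums=$(head -c 1048576 /dev/urandom \
  | tee >(cksum) >(cksum) >(cksum) > /dev/null)
echo "$sums"

# Every branch saw the same bytes if only one distinct checksum line remains.
distinct=$(printf '%s\n' "$sums" | sort -u | wc -l | tr -d ' ')
echo "distinct checksums: $distinct"
```

If the three checksum lines match, each consumer received an identical copy of the one random stream, which is the property the drive-erasing command relies on.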

Diff’ing OS X StickiesDatabase
OS X has a nifty little program called Stickies that is really handy for jotting down little notes. I found myself relying on these so much that I wrote a pair of scripts to sync the sticky database between my two primary OS X machines. However, it is periodically useful to compare two versions of the database to make sure the latest changes are in the right file. Here is how I do it:

diff -urN \
  <(strings StickiesDatabase.test | sort) \
  <(strings StickiesDatabase | sort)
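
The same pattern works for any pair of files that need a little preprocessing before they can be compared. A self-contained sketch, with two made-up text files standing in for the databases:

```bash
#!/usr/bin/env bash
# Two small, hypothetical files whose lines are in different orders.
work=$(mktemp -d)
printf 'banana\napple\ncherry\n' > "$work/old.txt"
printf 'cherry\napple\ndate\n'   > "$work/new.txt"

# Sort each file on the fly and compare the sorted streams.
# diff exits with status 1 when the inputs differ, so `|| true`
# keeps that from being treated as a failure.
changes=$(diff <(sort "$work/old.txt") <(sort "$work/new.txt") || true)
echo "$changes"

rm -rf "$work"
```

No sorted temporary files are needed, because “diff” reads both sorted streams through the paths the shell substitutes.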

Restoring multiple MySQL slaves
I wanted to restore two MySQL replication slaves at the same time, without copying the dump file to the slaves and without copying the dump file more than once from the backup server:

ssh root@backup 'cat /d1/backup/mysql1/forslave.sql.gz' \
  | tee >(ssh root@slave1 'gzip -d -c | mysql') \
  | ssh root@slave2 'gzip -d -c | mysql'

Multiple stats from a logresolvemerge
I needed to combine the Apache log files from six hosts and then run separate stats collection on the combined stream, but I didn’t want to bother with storing the combined file as an intermediate step.

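# Loop over the file names found under hostA; a glob avoids parsing ls output.
for path in /logs/hostA/*; do
  file=${path##*/}
  # Note: as written, each pass overwrites the /results/typeN files.
  logresolvemerge.pl /logs/*/"$file" \
    | tee >( stats_script type1 > /results/type1 ) \
    | tee >( stats_script type2 > /results/type2 ) \
    | stats_script type3 > /results/type3
done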

Filtering packets by hostname for mongosniff
The mongosniff command allows you to decode a mongodb connection in real time, but the command does *not* allow you to filter by hostname. This means that when you run mongosniff on the primary of a replica set, you are inundated with replica traffic and cannot easily separate the data from a specific client. While mongosniff does allow you to read from a file, this would require two steps: one to capture packets and then a second step to decode them. With process substitution you can make mongosniff work in real time by reading from a tcpdump subprocess. For example:

mongosniff --source FILE <(sudo tcpdump -n -w- -U host foo.bar.com) 27017

It always feels satisfying to use process substitution. What problems have you solved with it?
