... thousands of other wikis. MediaWiki allows users to upload images for display. Most users follow a manual process via web forms: click Choose File to bring up a file dialog, navigate to an image file and select it, add a descriptive comment in the form, and click Upload. Wiki administrators use a more automated method: a script that reads a whole directory and uploads its images. Each image file (say, bald_eagle.jpg) is paired with a text file (bald_eagle.txt) containing a descriptive comment about the image.

Imagine that you’re faced with a huge directory containing only JPEG and TXT files. You want to confirm that every image file has a matching text file and vice versa. Here’s a smaller version of that directory:

$ ls
bald_eagle.jpg  blue_jay.jpg  cardinal.txt  robin.jpg  wren.jpg
bald_eagle.txt  cardinal.jpg  oriole.txt    robin.txt  wren.txt

Let’s develop two different solutions to identify any unmatched files. For the first solution, create two lists, one for the JPEG files and one for the text files, and use cut to strip off their file extensions .txt and .jpg:

$ ls *.jpg | cut -d. -f1
bald_eagle
blue_jay
cardinal
robin
wren
$ ls *.txt | cut -d. -f1
bald_eagle
cardinal
oriole
robin
wren
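
One caveat with this step: cut -d. -f1 keeps only the text before the first dot, so a name such as bald.eagle.jpg (a hypothetical filename, not present in the directory above) would be shortened to bald. A minimal sketch of an alternative, assuming bash, uses parameter expansion to remove only the final extension. The function name strip_ext is made up for illustration:

```shell
#!/usr/bin/env bash
# strip_ext is a hypothetical helper, not part of the original solution.
# It prints each argument with only its last extension removed.
strip_ext() {
  local f
  for f in "$@"; do
    printf '%s\n' "${f%.*}"   # delete the shortest trailing ".something"
  done
}

strip_ext bald_eagle.jpg bald.eagle.jpg
# prints:
# bald_eagle
# bald.eagle
```

For the sample directory, where no basename contains a dot, both approaches produce the same lists.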

Then compare the lists with diff using process substitution:

$ diff <(ls *.jpg | cut -d. -f1) <(ls *.txt | cut -d. -f1)
2d1
< blue_jay
3a3
> oriole

You could stop here, because the output indicates that the first list has an extra blue_jay (implying blue_jay.jpg) and the second list has an extra oriole (implying ...
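
The diff output above shows which basenames are unmatched but leaves it to the reader to work out which file is missing. As a rough sketch (a variation on the first solution, and not necessarily the second solution the text goes on to develop), the same process-substitution idea can be paired with comm to name the missing counterpart directly. This assumes bash; the names find_unmatched and names are hypothetical helpers invented for this example:

```shell
#!/usr/bin/env bash
# names (hypothetical helper): print each argument without its last extension.
names() {
  local f
  for f in "$@"; do
    printf '%s\n' "${f%.*}"
  done
}

# find_unmatched (hypothetical helper): report files whose partner is missing.
# Usage: find_unmatched [directory]   (defaults to the current directory)
find_unmatched() {
  local dir=${1:-.}
  (
    cd "$dir" || exit 1
    shopt -s nullglob   # a pattern with no matches expands to nothing
    # comm -3 hides lines common to both sorted lists. Lines found only in
    # the first list are printed as-is; lines found only in the second list
    # are printed with a leading tab.
    comm -3 <(names *.jpg | sort) <(names *.txt | sort) |
    while IFS= read -r line; do
      case $line in
        $'\t'*) echo "missing image file: ${line#$'\t'}.jpg" ;;
        *)      echo "missing text file: ${line}.txt" ;;
      esac
    done
  )
}
```

Run in the sample directory, find_unmatched should report blue_jay.txt as a missing text file and oriole.jpg as a missing image file, matching what the diff output implies.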
