... thousands of other wikis. MediaWiki allows users to upload images for display. Most users follow a manual process via web forms: click Choose File to bring up a file dialog, navigate to an image file and select it, add a descriptive comment in the form, and click Upload. Wiki administrators use a more automated method: a script that reads a whole directory and uploads its images. Each image file (say, bald_eagle.jpg) is paired with a text file (bald_eagle.txt) containing a descriptive comment about the image.
Imagine that you’re faced with a huge directory containing only JPEG and TXT files. You want to confirm that every image file has a matching text file and vice versa. Here’s a smaller version of that directory:
$ ls
bald_eagle.jpg  blue_jay.jpg  cardinal.txt  robin.jpg  wren.jpg
bald_eagle.txt  cardinal.jpg  oriole.txt    robin.txt  wren.txt
Let’s develop two different solutions to identify any unmatched files. For the first solution, create two lists, one for the JPEG files and one for the text files, and use cut to strip off their file extensions .txt and .jpg:
$ ls *.jpg | cut -d. -f1
bald_eagle
blue_jay
cardinal
robin
wren
$ ls *.txt | cut -d. -f1
bald_eagle
cardinal
oriole
robin
wren
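One caveat about the cut approach: -d. -f1 keeps only the text before the first dot, so a name with an extra dot in it (say, a hypothetical great.horned_owl.jpg) would be shortened to great. The sample directory has no such names, so this doesn't affect the example. As a minimal sketch, assuming that case matters in your own directory, a small helper (strip_ext is an illustrative name, not something from the text) could remove only the final extension using shell parameter expansion:

```shell
# strip_ext: hypothetical helper, not part of the original example.
# Prints each argument with only its final extension removed,
# so great.horned_owl.jpg becomes great.horned_owl.
strip_ext() {
  for f in "$@"; do
    # ${f%.*} deletes the shortest trailing match of ".*",
    # that is, the last dot and everything after it.
    printf '%s\n' "${f%.*}"
  done
}

strip_ext bald_eagle.jpg great.horned_owl.jpg
```

Calling strip_ext *.jpg would then stand in for ls *.jpg | cut -d. -f1 in the commands above.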
Then compare the lists with diff using process substitution:
$ diff <(ls *.jpg | cut -d. -f1) <(ls *.txt | cut -d. -f1)
2d1
< blue_jay
3a3
> oriole
You could stop here, because the output indicates that the first list has an extra blue_jay (implying blue_jay.jpg) and the second list has an extra oriole (implying ...
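To try the diff comparison yourself without a real image collection, the sample directory can be rebuilt from empty files. The following is a sketch under a few assumptions: the shell is bash (process substitution is not available in plain sh), and the scratch directory and the variable names demo_dir and unmatched are illustrative rather than taken from the text.

```shell
#!/usr/bin/env bash
# Sketch: recreate the sample directory with empty stand-in files
# and rerun the comparison from the text.
demo_dir=$(mktemp -d)          # scratch directory; delete it when done
cd "$demo_dir" || exit 1
touch bald_eagle.jpg bald_eagle.txt blue_jay.jpg cardinal.jpg cardinal.txt \
      oriole.txt robin.jpg robin.txt wren.jpg wren.txt

# diff exits with status 1 when its inputs differ, which is the
# interesting case here, so "|| true" keeps that from looking like an error.
unmatched=$(diff <(ls *.jpg | cut -d. -f1) <(ls *.txt | cut -d. -f1)) || true
printf '%s\n' "$unmatched"
```

Lines beginning with < are names that appear only in the JPEG list, and lines beginning with > are names that appear only in the text-file list.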