Program: urlify
This program puts HTML links around URLs in files. It doesn’t work on all possible URLs, but does hit the most common ones. It tries hard to avoid including end-of-sentence punctuation in the marked-up URL.
It is a typical Perl filter, so it can be used by feeding it input:
% gunzip -c ~/mail/archive.gz | urlify > archive.urlified
or by supplying files on the command line:
% urlify ~/mail/*.inbox > ~/allmail.urlified
The program is shown in Example 6-13.
Example 6-13. urlify
#!/usr/bin/perl
# urlify - wrap HTML links around URL-like constructs

$urls = '(http|telnet|gopher|file|wais|ftp)';
$ltrs = '\w';
$gunk = '/#~:.?+=&%@!\-';
$punc = '.:?\-';
$any  = "${ltrs}${gunk}${punc}";

while (<>) {
    s{
      \b                    # start at word boundary
      (                     # begin $1  {
       $urls     :          # need resource and a colon
       [$any] +?            # followed by one or more
                            #  of any valid character, but
                            #  be conservative and take only
                            #  what you need to....
      )                     # end   $1  }
      (?=                   # look-ahead non-consumptive assertion
       [$punc]*             # either 0 or more punctuation
       [^$any]              #   followed by a non-url char
       |                    # or else
       $                    #   then end of the string
      )
    }{<A HREF="$1">$1</A>}igox;
    print;
}
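To see how the minimal match and the look-ahead keep end-of-sentence punctuation out of the link, the following sketch wraps the same substitution in a helper and runs it on two sample lines. The helper name `urlify_line` and the sample sentences are assumptions made for illustration; they are not part of the original program.

```perl
#!/usr/bin/perl
# Sketch: apply the urlify substitution to individual strings.
use strict;
use warnings;

# Same character classes as in Example 6-13.
my $urls = '(http|telnet|gopher|file|wais|ftp)';
my $ltrs = '\w';
my $gunk = '/#~:.?+=&%@!\-';
my $punc = '.:?\-';
my $any  = "${ltrs}${gunk}${punc}";

# urlify_line is a hypothetical helper: it returns a copy of its
# argument with URL-like text wrapped in an HTML anchor.
sub urlify_line {
    my ($line) = @_;
    $line =~ s{
        \b                  # start at word boundary
        (                   # begin $1
          $urls :           # scheme and a colon
          [$any]+?          # as few URL characters as possible
        )                   # end $1
        (?=                 # look ahead without consuming
          [$punc]* [^$any]  # optional punctuation, then a non-URL char
          |                 # or
          $                 # end of the string
        )
    }{<A HREF="$1">$1</A>}igox;
    return $line;
}

# The period that ends the sentence stays outside the link.
print urlify_line("See http://www.perl.com/CPAN/. It is big.\n");
# prints: See <A HREF="http://www.perl.com/CPAN/">http://www.perl.com/CPAN/</A>. It is big.

# So does a trailing question mark.
print urlify_line("Is it at ftp://ftp.example.org/pub?\n");
# prints: Is it at <A HREF="ftp://ftp.example.org/pub">ftp://ftp.example.org/pub</A>?
```

Because `[$any]+?` is minimal, the match grows one character at a time and stops at the first point where the look-ahead succeeds. A period or question mark in the middle of a URL is followed by more URL characters, so the look-ahead fails there and the match keeps growing past it.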