O'Reilly Hacks
Spidering Hacks
By Morbus Iff, Tara Calishain
October 2003

HACK #41
Downloading MP3s from a Playlist
Automatically save the MP3 files that make up an M3U playlist

Most MP3 players support an .m3u file: a plain text file that contains the MP3 filenames and locations that should be played, in a specific order (or randomized from within the player). These M3U files are slightly different from .pls files, which are typically used for streaming radio. A sample M3U is shown here:

VA-01-Lord_of_the_Rings_OST-The_Prophecy-GREY.mp3
VA-02-Lord_of_the_Rings_OST-Concerning_Hobbits-GREY.mp3
VA-03-Lord_of_the_Rings_OST-The_Shadow_Of_The_Past-GREY.mp3
VA-04-Lord_of_the_Rings_OST-The_Treason_Of_Isengard-GREY.mp3
VA-05-Lord_of_the_Rings_OST-The_Black_Rider-GREY.mp3
VA-06-Lord_of_the_Rings_OST-At_The_Sign_Of_The_Prancing-GREY.mp3
VA-07-Lord_of_the_Rings_OST-A_Knife_In_The_Dark-GREY.mp3

This isn't too exciting. However, M3U files can also be filled with MP3 URLs, which is often the case when a user has put her collection online and is streaming it personally or to anyone who'll listen. M3U files have also become the default for listing software, such as Andromeda (http://www.turnstyle.com/andromeda/) or Apache::MP3 (http://www.apachemp3.com/). Figure 1 shows an Andromeda listing from DaylightStation.com.

Figure 1. Andromeda listing from DaylightStation.com

Here's a sample M3U with URLs:

http://example.com/awesome_album/track1.mp3
http://example.com/awesome_album/track2.mp3
http://example.com/awesome_album/track3.mp3
...
http://example.com/awesome_album/track6.mp3
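
Reading a playlist like this takes only a few lines. The following is a minimal sketch, not the hack's own code: the m3u_entries routine and the sample playlist text are made up for illustration, and it assumes that lines starting with "#" (comments and extended-M3U directives such as #EXTM3U) and blank lines can be ignored.

```perl
use strict;
use warnings;

# Return the playable entries of an M3U playlist, given an open filehandle.
# Lines starting with "#" and blank lines are ignored; a trailing LF or
# CRLF is stripped so playlists saved on Windows work too.
sub m3u_entries {
    my ($fh) = @_;
    my @entries;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;          # strip the line ending (LF or CRLF)
        next if $line =~ /^\s*$/;      # skip blank lines
        next if $line =~ /^#/;         # skip comments and #EXT... directives
        push @entries, $line;
    }
    return @entries;
}

# Hypothetical playlist text, read through an in-memory filehandle.
my $playlist = "#EXTM3U\r\n"
             . "http://example.com/awesome_album/track1.mp3\r\n"
             . "\r\n"
             . "http://example.com/awesome_album/track2.mp3\n";
open(my $fh, '<', \$playlist) or die "Cannot open in-memory playlist: $!";
my @urls = m3u_entries($fh);
close($fh);

print "$_\n" for @urls;
```

Run as is, the sketch prints the two track URLs and nothing else.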

This hack will read M3U files and smartly download the MP3 URLs referenced within—smartly in the sense that it will create prettier filenames, as opposed to heavily URL-encoded values like my%20greatest%20hits; understand parent folders, which are often the names of the albums; and take care not to download tracks it already has, without skipping tracks that it might have but which are incomplete. It will also give progress readouts as it's downloading.
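
To make the "prettier filenames" and "parent folders" ideas concrete, here is a small sketch of how a track URL might be turned into a local path. It is an illustration under assumptions, not the hack's own code: the unescape and local_path_for routines and the sample URL are invented for this example, and the regex-based unescape is a core-only stand-in for the uri_unescape function from URI::Escape.

```perl
use strict;
use warnings;
use File::Spec::Functions qw(catdir catfile);

# Decode %XX escapes (hypothetical helper; a core-only stand-in for
# URI::Escape's uri_unescape).
sub unescape {
    my ($text) = @_;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Map a track URL to (album directory, file name, local path) under $base.
# The second-to-last URL segment is assumed to be the album name.
sub local_path_for {
    my ($base, $url) = @_;
    my @parts = map { unescape($_) } split m{/}, $url;
    my ($album, $file) = @parts[-2, -1];
    return ($album, $file, catfile(catdir($base, $album), $file));
}

my ($album, $file, $path) = local_path_for(
    'mp3s', 'http://example.com/my%20greatest%20hits/01%20-%20Intro.mp3');

print "album: $album\n";
print "file:  $file\n";
print "path:  $path\n";
```

The path is built with File::Spec::Functions so that the directory separator matches the operating system the script runs on.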

A script like this has a major advantage over a utility like wget. Because it preserves the album directory found in each URL, the script keeps the downloaded files well organized, whereas wget's input-file option (-i) would simply download the listed URLs one by one.

Running the Hack

To set the script going, invoke it on the command line with a local M3U file:

% perl leechm3u.pl m3ufile.m3u

This is all fine and dandy, but where are we to find these magical M3U files floating around on the Net? There's the rub: they're not premade; you make them yourself. Utilities like Andromeda and Apache::MP3, mentioned earlier in this hack, create an automated list of a user's MP3s. To listen to them, you pick and choose what you want via checkboxes, click the Play button, and whoop, get served an M3U file. Download that M3U playlist to disk instead of letting it queue up in your player, and you now have a local copy to pass to leechm3u.pl. You can even pass more than one M3U file at a time to the script.

The next issue is finding sites that run these listing utilities. Oftentimes, if you know the name of the utility, you can find a unique string that will identify it in a Google search. Take Andromeda, for instance. Anytime someone installs it, all the generated files have "Powered by Andromeda" in the footer. By doing a search for that quoted term, you can find a number of entries. Then, it just becomes a matter of queuing up what you want. Since more and more people are releasing their own MP3s this way, you're sure to get a decent number of matches for music you've never heard of, but should.

Here is a collected list of some "software signatures" for your perusal:

Andromeda (http://www.turnstyle.com/andromeda/)

"powered by andromeda" or "search andromeda" "play all"

Apache::MP3 (http://www.apachemp3.com/)

"apache::mp3 was written"

Edna (http://edna.sourceforge.net/)

"powered by edna"

GNUMP3d (http://gnump3d.sourceforge.net/)

intitle:"GNUMP3d" subdirectories

The following signatures aren't applicable to the script in this hack (because they generate either ugly M3U files or something entirely different), but aspiring hackers could probably modify this hack to work with them:

Ampache (http://www.ampache.org)

"welcome to ampache v"

Dynamic MP3 Lister (http://freshmeat.net/projects/dmp3lister/)

"dynamic mp3 lister - listing mp3s in"

The Code

Save the following code as a script called leechm3u.pl:

#!/usr/bin/perl -w
#
# LeechM3U - save mp3s listed in an .m3u file, smartly.
# Part of the Leecharoo suite - for all those hard to leech places.
# http://disobey.com/d/code/ or contact morbus@disobey.com.
#
# This code is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
#

use strict; $|++;
my $VERSION = "1.0";
use File::Spec::Functions;

# make sure we have the modules we need, else die peacefully.
eval("use LWP 5.6.9;"); die "[err] LWP 5.6.9 or greater required.\n" if $@;
eval("use URI::Escape;"); die "[err] URI::Escape is not installed.\n" if $@;

my $dir = "mp3s";  # save downloads to...?
mkdir $dir;        # make sure that dir exists.
my $mp3_data;      # final holder of our MP3.
my $total_size;    # total size of the MP3.

# loop through each M3U file.
foreach my $file (@ARGV) {

    # open the passed M3U file or move onto the next.
    open(URLS, "<$file") or do {
        print "[err] Could not open $file: $!\n";
        next;
    };

    # for each line.
    while (<URLS>) {
        next if /^#/;       # skip if it's a comment.
        s/\r?\n$//;         # remove trailing newline (LF or CRLF).
        next unless /\S/;   # skip blank lines.
        my $url = $_;       # more semantic, yes?

        # split the URL into parts, defined by the "/" delimiter
        # in the URL. we'll use this to determine the name of
        # the file, as well as its parent directory. in most
        # cases, the parent directory is the album name.
        my @parts = split(/\//, $url);

        # properly encoded URLs are percent-encoded, with %20
        # (hexadecimal 20) representing a space, etc. without
        # conversion, our files would be named like that.
        foreach (@parts) { $_ = uri_unescape($_); }

        # take the second-to-last part, which is the parent
        # directory of our file. we're assuming an album name.
        my $album_dir = $parts[$#parts-1];

        # create an OS-specific path to our album and file.
        my $album_path = catdir($dir, $album_dir);
        my $file_name = $parts[$#parts]; # prettier.
        my $file_path = catfile($album_path, $file_name);
        mkdir $album_path; # to prepare for dumping.

        # get the size of the MP3 for our progress bar.
        # some sites block Perl User-Agents, so we fakir.
        print "Downloading \"$file_path\"...\n";
        my $ua = LWP::UserAgent->new(agent =>
            'Mozilla/4.76 [en] (Win98; U)');
        # servers that send no Content-Length leave the size at 0.
        $total_size = $ua->head($url)->headers->content_length || 0;

        # only download the file if it hasn't been before.
        if (-e $file_path and (stat($file_path))[7] == $total_size) {
           print " Skipping - this file has already been downloaded.\n";
           next;
        }

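        # download the file with a callback for progress.
        my $response = $ua->get($url, ':content_cb' => \&callback);
        unless ($response->is_success and defined $mp3_data) {
            print " [err] Download failed: ", $response->status_line, "\n";
            $mp3_data = undef; next;
        }

        # with the data downloaded into $mp3_data with our
        # callback, save that information to our $file_path.
        # (note: bad grammar so word wrapping won't happen)
        open (MP3, ">$file_path") or die "[err] Can't save: $!\n";
        binmode MP3; # MP3s are binary data; this matters on Windows.
        print MP3 $mp3_data; close(MP3); $mp3_data = undef;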
    }

    # next file!
    close(URLS);
}

# per chunk.
sub callback {
   my ($data, $response, $protocol) = @_;
   $mp3_data .= $data; # append to existing data.
   print progress_bar( length($mp3_data), $total_size, 25, '=' );
}

# wget-style. routine by tachyon
# at http://tachyon.perlmonk.org/
sub progress_bar {
    my ( $got, $total, $width, $char ) = @_;
    $width ||= 25; $char ||= '=';
    # without a known total, we can only report the bytes received.
    return sprintf "Got %s bytes\r", $got unless $total;
    my $num_width = length $total;
    sprintf "|%-${width}s| Got %${num_width}s bytes of %s (%.2f%%)\r",
        $char x (($width-1)*$got/$total). '>',
        $got, $total, 100*$got/$total;
}
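
The decision to skip a track can also be pulled out of the listing and looked at on its own. The sketch below is one possible way to write it, under assumptions: the already_downloaded routine is a hypothetical name, a temporary five-byte file stands in for a saved MP3, and an unknown remote size is treated as "not yet downloaded".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# True (1) when the local file exists and its size equals the size the
# server reported, so the download looks complete. An unknown remote size
# (undef or 0) never counts as complete.
sub already_downloaded {
    my ($path, $remote_size) = @_;
    return 0 unless $remote_size;
    return 0 unless -e $path;
    my $local_size = (stat($path))[7];   # size in bytes, as in the hack
    return $local_size == $remote_size ? 1 : 0;
}

# A temporary 5-byte file stands in for a previously saved MP3.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "12345";
close($fh);

print already_downloaded($path, 5)  ? "skip\n" : "fetch\n";  # sizes match
print already_downloaded($path, 10) ? "skip\n" : "fetch\n";  # partial file
```

Comparing sizes is what lets the script re-fetch a track whose earlier download was cut short, while leaving complete files alone.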

