Credit: Gisle Aas
urllib.urlopen returns a file-like object, and you can call read on it:
    from urllib import urlopen
    doc = urlopen("http://www.python.org").read()
    print doc
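The snippet above is Python 2 code, where urlopen lives in the urllib module. For readers on Python 3, where the function is found in urllib.request and read returns bytes rather than a string, the same idea might be sketched as follows. The helper name fetch_text and its fallback_encoding parameter are assumptions made for this illustration and are not part of the original recipe.

```python
from urllib.request import urlopen


def fetch_text(url, fallback_encoding="utf-8"):
    """Return the whole document at url as one decoded string.

    fetch_text and fallback_encoding are hypothetical names used
    only in this sketch.
    """
    with urlopen(url) as doc:
        raw = doc.read()  # in Python 3 this is a bytes object
        # doc.headers is the same object that doc.info() returns.
        # Use the charset declared in the headers, if there is one.
        encoding = doc.headers.get_content_charset() or fallback_encoding
    return raw.decode(encoding)


# Example use (requires network access):
#     print(fetch_text("http://www.python.org"))
```

Decoding is the main difference from the Python 2 version: the bytes must be turned into text explicitly, and the UTF-8 fallback used here is only a guess for servers that declare no charset.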
Once you obtain a file-like object from urlopen, you can read it all at once into one big string by calling its read method, as I do in this recipe. Alternatively, you can read it as a list of lines by calling its readlines method or, for special purposes, just get one line at a time by calling its readline method in a loop. In addition to these file-like operations, the object that urlopen returns offers a few other useful features. For example, the following snippet gives you the headers of the document:
    doc = urlopen("http://www.python.org")
    print doc.info()
such as the Content-Type header (text/html in this case) that defines the MIME type of the document. doc.info returns a mimetools.Message instance, so you can access it in various ways without printing it or otherwise transforming it into a string. For example, doc.info().getheader('Content-Type') returns the 'text/html' string. The maintype attribute of the mimetools.Message object is the 'text' string, subtype is the 'html' string, and type is also the 'text/html' string. If you need to perform sophisticated analysis and processing, all the tools you need are right there. At the same time, if your needs are simpler, you can meet them in very simple ways, as this recipe shows.
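The discussion above relies on Python 2's mimetools.Message. As a hedged sketch of the Python 3 counterparts: the headers object returned by urllib.request.urlopen is an email.message.Message (or a subclass of it), and its get_content_type, get_content_maintype, and get_content_subtype methods play the roles of the type, maintype, and subtype attributes. The second function shows the readline loop mentioned earlier. The helper names content_type_parts and iter_lines are assumptions made for this illustration.

```python
from urllib.request import urlopen


def content_type_parts(url):
    """Return (type, maintype, subtype) for the document at url.

    content_type_parts is a hypothetical helper name for this sketch.
    """
    with urlopen(url) as doc:
        headers = doc.headers  # the same object that doc.info() returns
        return (
            headers.get_content_type(),      # e.g. 'text/html'
            headers.get_content_maintype(),  # e.g. 'text'
            headers.get_content_subtype(),   # e.g. 'html'
        )


def iter_lines(url):
    """Yield the document one line at a time, as bytes, via readline.

    iter_lines is a hypothetical helper name for this sketch.
    """
    with urlopen(url) as doc:
        while True:
            line = doc.readline()
            if not line:  # an empty bytes object signals end of file
                break
            yield line


# Example use (requires network access):
#     print(content_type_parts("http://www.python.org"))
```

Individual headers can also be looked up by name on the same object, for example doc.headers["Content-Type"], which returns the full header value including any charset parameter.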