Although we could get started with an extended discussion of specific social networking APIs, schemaless design, or many other things, let’s instead dive right into some introductory examples that illustrate how simple it can be to collect and analyze some social web data. This chapter is a drive-by tutorial that aims to motivate you and get you thinking about some of the issues that the rest of the book revisits in greater detail. We’ll start off by getting our development environment ready and then quickly move on to collecting and analyzing some Twitter data.
The example code in this book is written in Python, so if you already have a recent version of Python and easy_install on your system, you obviously know your way around and should probably skip the remainder of this section. If you don’t already have Python installed, the bad news is that you’re probably not already a Python hacker. But don’t worry, because you will be soon; Python has a way of doing that to people because it is easy to pick up and learn as you go along. Users of all platforms can find instructions for downloading and installing Python at http://www.python.org/download/, but it is highly recommended that Windows users install ActivePython, which automatically adds Python to your path at the Windows Command Prompt (henceforth referred to as a “terminal”) and comes with easy_install, which we’ll discuss in just a moment. The examples in this book were authored in and tested against the latest Python 2.7 branch, but they should also work fine with other relatively up-to-date versions of Python. At the time this book was written, Python Version 2 was still the status quo in the Python community, and it is recommended that you stick with it unless you are confident that all of the dependencies you’ll need have been ported to Version 3 and you are willing to debug any idiosyncrasies involved in the switch.
Once Python is installed, you should be able to type python in a terminal to spawn an interpreter. Try following along with Example 1-1.
Example 1-1. Your very first Python interpreter session
>>> print "Hello World"
Hello World
>>> #this is a comment
...
>>> for i in range(0,10): # a loop
...     print i, # the comma suppresses line breaks
...
0 1 2 3 4 5 6 7 8 9
>>> numbers = [ i for i in range(0,10) ] # a list comprehension
>>> print numbers
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> if 10 in numbers: # conditional logic
...     print True
... else:
...     print False
...
False
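Because the examples were tested against the Python 2.7 branch, it can be handy to confirm which version your interpreter actually reports. What follows is only a minimal sketch: the function name describe_python and the wording of its messages are made up for illustration, and the code is written so that it should run under either Python 2.7 or Python 3.

```python
import sys

def describe_python(version_info=None):
    """Return a short note about an interpreter version.

    describe_python is a hypothetical helper, not part of any library.
    With no argument it inspects the interpreter that is running it.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    if (major, minor) == (2, 7):
        note = "the branch the examples were tested against"
    elif major == 2:
        note = "Python 2, but not the 2.7 branch the examples were tested against"
    else:
        note = "not Python 2; statements such as 'print i,' are a syntax error here"
    return "Python %d.%d: %s" % (major, minor, note)

# Report on whichever interpreter is running this snippet
print(describe_python())
```

Passing a tuple such as (2, 7, 0) instead of relying on the default lets you see the message for a version other than the one you are running.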
One other tool you’ll want to have on hand is easy_install,[5] which is similar to a package manager on Linux systems; it allows you to effortlessly install Python packages instead of downloading, building, and installing them from source. You can download the latest version of easy_install from http://pypi.python.org/pypi/setuptools, where there are specific instructions for each platform. Generally speaking, *nix users will want to run sudo easy_install so that modules are written to Python’s global installation directories. It is assumed that Windows users have taken the advice to use ActivePython, which automatically includes easy_install as part of its installation.
Note
Windows users might also benefit from reviewing the blog post “Installing easy_install…could be easier”, which discusses some common problems related to compiling C code that you may encounter when running easy_install.
Once you have properly configured easy_install, you should be able to run the following command to install NetworkX, a package we’ll use throughout the book for building and analyzing graphs, and observe similar output:
$ easy_install networkx
Searching for networkx
...truncated output...
Finished processing dependencies for networkx
With NetworkX installed, you might think that you could just import it from the interpreter and get right to work, but occasionally some packages might surprise you. For example, suppose this were to happen:
>>> import networkx
Traceback (most recent call last):
... truncated output ...
ImportError: No module named numpy
Whenever an ImportError like this one happens, it usually means there’s a missing package. In this illustration, the module we installed, networkx, has an unsatisfied dependency called numpy, a highly optimized collection of tools for scientific computing. Usually, another invocation of easy_install fixes the problem, and this situation is no different. Just close your interpreter and install the dependency by typing easy_install numpy in the terminal:
$ easy_install numpy
Searching for numpy
...truncated output...
Finished processing dependencies for numpy
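If you would rather find out about missing dependencies before they interrupt an interpreter session, one option is to attempt each import up front and collect the failures. The snippet below is a minimal sketch under that assumption: missing_modules is a hypothetical helper name, and no_such_module_for_demo is a deliberately made-up module name used only to trigger the failure case. It relies on the standard library's importlib.import_module and should run under Python 2.7 or Python 3.

```python
import importlib

def missing_modules(names):
    """Return the module names from `names` that cannot be imported.

    missing_modules is a hypothetical helper for illustration only.
    """
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The same exception shown in the traceback above
            missing.append(name)
    return missing

# 'json' ships with Python; the second name is a made-up placeholder
print(missing_modules(["json", "no_such_module_for_demo"]))
```

Each name the function returns is a candidate for another easy_install invocation. Calling it with ["networkx", "numpy"] would check the two packages discussed in this section.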
Now that numpy is installed, you should be able to open up a new interpreter, import networkx, and use it to build up graphs. Example 1-2 demonstrates.
Example 1-2. Using NetworkX to create a graph of nodes and edges
>>> import networkx
>>> g = networkx.Graph()
>>> g.add_edge(1,2)
>>> g.add_node("spam")
>>> print g.nodes()
[1, 2, 'spam']
>>> print g.edges()
[(1, 2)]
At this point, you have some of your core Python development tools installed and are ready to move on to some more interesting tasks. If most of the content in this section has been a learning experience for you, it would be worthwhile to review the official Python tutorial online before proceeding further.
[5] Although the examples in this book use the well-known easy_install, the Python community has slowly been gravitating toward pip, another package installation tool you should be aware of and that generally “just works” with any package that can be easy_install’d. If you have git tooling already installed, pip is also handy for installing directly from GitHub repositories for packages that aren't available through PyPI, as illustrated in Exploring the Graph API one connection at a time.