
Word Associations with Lexical Freenet
There will come a time when you want a little
more than simple word definitions, synonyms, or etymologies. Lexical
Freenet takes you beyond these simple results, providing associative
data, or "paths," from your word to
others.
Lexical Freenet (http://www.lexfn.com) allows you to search
for word relationships like puns, rhymes, concepts, relevant people,
antonyms, and so much more. For example, a simple search for the word
disease returns a long listing of word
paths, each associated with other words by
different types of connecting arrows: a triggers arrow links disease
to both aids and cancer; a comprises arrow links it to symptoms;
and bio triggers arrows link it to such relevant persons as
janet elaine adkins, james parkinson, alois alzheimer, and so on.
This is but a small sampling of the available and verbose output.
Combined with "Super Word Lookup", a command-line interface to
Lexical Freenet would bring immense lookup capabilities to writers,
librarians, and researchers. This hack shows you how to create
such an interface, with the ability to choose which relationships
you'd like to see and to turn the visual connections into text.
Running the Hack
As the usage message in the code shows, the hack has several switches
that let you decide which kinds of word relationships you want. In this
case, we'll use -x to run a search for everything related to
disease:
% perl lexfn.pl -x disease
disease triggers aids
disease triggers cancer
disease triggers patients
disease triggers virus
disease triggers doctor
...
disease is more general than blood disorder
disease is more general than boutonneuse fever
disease is more general than cat scratch disease
...
disease rhymes with breeze
disease rhymes with briese
disease rhymes with cheese
disease rhymes with crees
...
Or perhaps a person's name is more to your liking:
% perl lexfn.pl -bdonT "lee harvey oswald"
lee harvey oswald was born in 1939
lee harvey oswald died in 1963
lee harvey oswald has the nationality american
lee harvey oswald has the occupation assassin
lee harvey oswald triggers 1956-1959
lee harvey oswald triggers 1959
lee harvey oswald triggers 1962
lee harvey oswald triggers attempted
lee harvey oswald triggers become
lee harvey oswald triggers book
lee harvey oswald triggers citizen
lee harvey oswald triggers communist
...
—Richard Rose
The Code
Save the following code as lexfn.pl:
#!/usr/bin/perl -w
#
# Hack to query and report from www.lexfn.com
#
# This code is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.
#
# by rik - ora@rikrose.net
#
######################
# support stage #
######################
use strict;
use Getopt::Std qw(getopts);
use LWP::Simple qw(get);
use URI::Escape qw(uri_escape uri_unescape);
use HTML::TokeParser;
sub usage ( ) { print "
usage: lexfn [options] word1 [word2]
options available:
 -s Synonymous     -a Antonym        -b Birth Year
 -t Triggers       -r Rhymes         -d Death Year
 -g Generalizes    -l Sounds like    -T Bio Triggers
 -S Specialises    -A Anagram of     -k Also Known As
 -c Comprises      -o Occupation of
 -p Part of        -n Nationality
 or -x for all
word1 is mandatory, but some searches require word2\n\n"
}
######################
# parse stage #
######################
# grab arguments, and put them into %args hash, leaving nonarguments
# in @ARGV for us to process later (where word1 and word2 would be)
# if we don't have at least one argument, we die with our usage.
my %args; getopts('stgScparlAonbdTkx', \%args);
if (@ARGV > 2 || @ARGV == 0) { usage( ); exit 0; }
# turn both our words into queries.
$ARGV[0] =~ s/ /\+/g; $ARGV[1] ||= "";
if ($ARGV[1]) { $ARGV[1] =~ s/ /\+/g; }
# begin our URL construction with the keywords.
my $URL = "http://www.lexfn.com/l/lexfn-cuff.cgi?sWord=$ARGV[0]".
"&tWord=$ARGV[1]&query=show&maxReach=2";
# now, let's figure out our command-line arguments. each
# argument is associated with a relevant search at LexFN,
# so we'll first create a mapping to and fro.
my %keynames = (
s => 'ASYN', t => 'ATRG', g => 'AGEN', S => 'ASPC', c => 'ACOM',
p => 'APAR', a => 'AANT', r => 'ARHY', l => 'ASIM', A => 'AANA',
o => 'ABOX', n => 'ABNX', b => 'ABBX', d => 'ABDX', T => 'ABTR',
k => 'ABAK'
);
# if we want all matches (-x), add every flag
# to our arguments hash, in preparation for our URL.
if (defined($args{'x'}) && $args{'x'} == 1) {
    foreach my $arg (qw/s t g S c p a r l A o n b d T k/) {
        $args{$arg} = 1; # in preparation for URL.
    }
    delete $args{'x'}; # x means nothing to LexFN.
}
# build the URL from the flags we want.
foreach my $arg (keys %args) { $URL .= '&' . $keynames{$arg} . '=on'; }
######################
# request stage #
######################
# and download it all for parsing.
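# LWP::Simple's get returns undef on failure and does not set $!,
# so report the URL instead.
my $content = get($URL) or die "Could not retrieve $URL\n";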
######################
# extract stage #
######################
# with the data sucked down, pass it off to the parser.
my $stream = HTML::TokeParser->new( \$content ) or die $!;
# skip the form on the page, then it's the first <b>
# after the form that we start extracting data from
my $tag = $stream->get_tag("/form");
while ($tag = $stream->get_tag("b")) {
    print $stream->get_trimmed_text("/b") . " ";
    $tag = $stream->get_tag("img");
    print $tag->[1]{alt} . " ";
    $tag = $stream->get_tag("a");
    print $stream->get_trimmed_text("/a") . "\n";
}
exit 0;
The code is split into four basic stages:
- Support code: includes and any subroutines you will need
- The parsing stage: where we work out what the user actually wants
  and build a URL to perform the request
- The request stage: where we retrieve the results
- The extract stage: where we recover the data
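The parsing stage is the part most worth seeing in isolation. What follows is a minimal sketch, not the hack's own code: parse_flags is a hypothetical helper that wraps the same getopts call and the same %keynames table from lexfn.pl, so that the flag-to-parameter mapping can be tried without touching the network.

```perl
use strict;
use warnings;
use Getopt::Std qw(getopts);

# Flag-to-parameter mapping, copied from lexfn.pl.
my %keynames = (
    s => 'ASYN', t => 'ATRG', g => 'AGEN', S => 'ASPC', c => 'ACOM',
    p => 'APAR', a => 'AANT', r => 'ARHY', l => 'ASIM', A => 'AANA',
    o => 'ABOX', n => 'ABNX', b => 'ABBX', d => 'ABDX', T => 'ABTR',
    k => 'ABAK',
);

# parse_flags is a hypothetical helper: it takes a command line as a
# list and returns (sorted LexFN parameter names, remaining words).
sub parse_flags {
    local @ARGV = @_;    # getopts always works on @ARGV
    my %args;
    getopts('stgScparlAonbdTkx', \%args) or die "bad options\n";

    # -x expands to every relationship; any other flag selects itself.
    my @flags  = $args{x} ? keys %keynames : grep { $_ ne 'x' } keys %args;
    my @params = sort map { $keynames{$_} } @flags;
    return (\@params, [@ARGV]);
}

my ($params, $words) = parse_flags('-bdonT', 'lee harvey oswald');
print "params: @$params\n";    # params: ABBX ABDX ABNX ABOX ABTR
print "words:  @$words\n";     # words:  lee harvey oswald
```

Sorting the parameter names is a choice made here only so the output is predictable; the order of keys in a Perl hash is not.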
In this case, the Lexical Freenet site is basic enough that the
request is a single URL. A typical Lexical Freenet URL looks something
like this:
http://www.lexfn.com/l/lexfn-cuff.cgi?fromresub=on&ASYN=on&ATRG=on&AGEN=on&ASPC=on&ACOM=on&APAR=on&AANT=on&ARHY=on&ASIM=on&AANA=on&ABOX=on&ABNX=on&ABBX=on&ABDX=on&ABTR=on&ABAK=on&sWord=lee+harvey+oswald&tWord=disobey&query=SHOW
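To make the shape of that URL concrete, here is a small sketch of how such a request could be assembled. The helper names encode_word and build_url are hypothetical, and the hand-rolled encoding is only there to keep the sketch free of dependencies and assumes plain ASCII input; lexfn.pl itself only swaps spaces for plus signs, and the uri_escape function it already imports from URI::Escape would do the encoding job equally well.

```perl
use strict;
use warnings;

# Base address as used in lexfn.pl.
my $BASE = 'http://www.lexfn.com/l/lexfn-cuff.cgi';

# Hypothetical helper: percent-encode everything except unreserved
# characters, then turn the encoded spaces into '+' for the query string.
# Assumes ASCII input.
sub encode_word {
    my ($word) = @_;
    $word =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    $word =~ s/%20/+/g;
    return $word;
}

# Hypothetical helper: assemble the request from two words and a list
# of LexFN parameter names such as 'ATRG' or 'ARHY'.
sub build_url {
    my ($word1, $word2, @params) = @_;
    my $url = "$BASE?sWord=" . encode_word($word1)
            . '&tWord=' . encode_word($word2 // '')
            . '&query=show&maxReach=2';
    $url .= "&$_=on" for @params;
    return $url;
}

print build_url('lee harvey oswald', '', 'ABBX', 'ABDX'), "\n";
# http://www.lexfn.com/l/lexfn-cuff.cgi?sWord=lee+harvey+oswald&tWord=&query=show&maxReach=2&ABBX=on&ABDX=on
```

Encoding the words matters once a search term contains a character such as & or =, which would otherwise be read as part of the query string.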
The data we want sits in a standard, repetitive chunk of HTML in the
search results: the source word in bold, an image whose alt text names
the relationship, and a link containing the related word. That lets us
use the simple HTML::TokeParser module to step from tag to tag,
querying attributes and retrieving the surrounding text. As you can
tell from the previous code, this is not too difficult.
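The extraction loop can be tried offline against a hand-written fragment. The sketch below makes several assumptions: the HTML is made up to mimic the bold, image, link pattern that lexfn.pl expects (the real result pages may differ), the image file names are invented, and extract_paths is a hypothetical helper that returns the paths rather than printing them. Unlike the hack's loop, it also stops quietly if an expected tag is missing.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Made-up markup that mimics the pattern lexfn.pl expects; the real
# LexFN result pages may differ.
my $html = <<'HTML';
<form action="lexfn-cuff.cgi"><input name="sWord"></form>
<b>disease</b> <img src="trg.gif" alt="triggers"> <a href="?sWord=aids">aids</a><br>
<b>disease</b> <img src="rhy.gif" alt="rhymes with"> <a href="?sWord=cheese">cheese</a><br>
HTML

# extract_paths is a hypothetical helper returning one string per path.
sub extract_paths {
    my ($content) = @_;
    my $stream = HTML::TokeParser->new(\$content)
        or die "could not create parser\n";
    my @paths;

    $stream->get_tag('/form');    # skip the search form
    while ($stream->get_tag('b')) {
        my $source = $stream->get_trimmed_text('/b');
        my $img    = $stream->get_tag('img') or last;
        my $link   = $stream->get_tag('a')   or last;
        my $target = $stream->get_trimmed_text('/a');
        # the relationship name lives in the image's alt attribute
        my $alt = defined $img->[1]{alt} ? $img->[1]{alt} : '?';
        push @paths, "$source $alt $target";
    }
    return @paths;
}

print "$_\n" for extract_paths($html);
# disease triggers aids
# disease rhymes with cheese
```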
© 2007 O'Reilly Media, Inc.