Parsing HTML with Perl

By A. Sinan Unur
February 6, 2014

The need to extract interesting bits of an HTML document comes up often enough that, by now, we have all seen many ways of doing it wrong and some ways of doing it right, for some values of “right”. One …

Augmenting Unstructured Data

By Jesse Anderson
July 17, 2013

Our world is filled with unstructured data. By some estimates, it accounts for as much as 80% of all data. Unstructured data is data that isn’t in a specific format. It isn’t separated by a delimiter that you could split on and …

Four short links: 2 August 2010

By Nat Torkington
August 2, 2010

Hidden Features of Google (StackExchange) -- rather than Google's list of search features, here are the features that real (sophisticated) users find useful. My new favourite: the ~ operator for approximate searching. (via Hacker News)
Natural Language Parsing for the Web -- JSON API to the Stanford Natural Language Parser. I wonder why the API to the library isn't...

Syntax coloring utility

By Kyle Dent
April 19, 2010

A syntax coloring utility that works with several programming languages.

Jotting on parsers for SGML-family document languages: SGML, HTML, XML #2 - Stateless semicoroutines may be convenient

By Rick Jelliffe
September 5, 2009

Here is Melvin Conway's foundation point from his 1963 paper defining coroutines: "That property of the design which makes it amenable to many segment configurations is its separability."

