Blogs

Tags > mapreduce

NoSQL Choices: To Misfit or Cargo Cult?

By Eric Redmond
July 29, 2013

Retreading old topics can be a powerful source of epiphany, sometimes more so than simple extra-box thinking. I was a computer science student; of course I knew statistics. But my recent years as a NoSQL (or better stated: distributed systems) …

Why Choose a Graph Database

By Michael Hunger
July 23, 2013

By this time, chances are very likely that you’ve heard of NoSQL, and of graph databases like Neo4j. NoSQL databases address important challenges that we face today, in terms of data size and data complexity. They offer a valuable solution …

Get Hadoop, Hive, and HBase Up and Running in Less Than 15 Minutes

By Mark Grover
July 19, 2013

If you have delved into Apache Hadoop and related projects, you know that installing and configuring Hadoop is hard. Often, a minor mistake during installation or configuration with messy tarballs will lurk for a long time until some otherwise innocuous …

Moving from Batch to Continuous Computing at Yahoo!

By Ben Lorica
June 29, 2013

My favorite session at the recent Hadoop Summit was a keynote by Bruno Fernandez-Ruiz, Senior Fellow & VP Platforms at Yahoo! He gave a nice overview of their analytic and data processing stack, and shared some interesting factoids about the …

Four short links: 18 January 2012

By Nat Torkington
January 18, 2012

Many Core Processors -- not the first time I've heard nondeterministic computing discussed as a solution to some of our parallel-programming travails. Can't imagine what a pleasure it is to debug. Pinterest Cloned -- it's not the pilfering of the idea that offends my sensibilities, it's the blatant clone of every aspect of the UI. I never thought much...

Four short links: 6 December 2011

By Nat Torkington
December 6, 2011

How to Dispel Your Illusions (NY Review of Books) -- Freeman Dyson writing about Daniel Kahneman's latest book. Only by understanding our cognitive illusions can we hope to transcend them. Appify-UI (github) -- Create the simplest possible Mac OS X apps. Uses HTML5 for the UI. Supports scripting with anything and everything. (via Hacker News) Translation Memory (Etsy) --...

Strata Week: Simplifying MapReduce through Java

By Audrey Watters
October 13, 2011

Cloudera's Crunch hopes to make MapReduce easier, Datafiniti launches a search engine for data, and the University of Oxford releases an Android app for monitoring CERN data.

Strata Week: MapReduce gets its arms around a million songs

By Audrey Watters
September 8, 2011

This week's data stories include a guide to using MapReduce to process the Million Song Dataset, a story about how GPS data can help reconstruct lost memories (and accidents), and evidence that emergency crowdsourcing goes back further than many realize.

Four short links: 25 July 2011

By Nat Torkington
July 25, 2011

Anonymity in Bitcoin -- TL;DR: Bitcoin is not inherently anonymous. It may be possible to conduct transactions in such a way so as to obscure your identity, but, in many cases, users and their transactions can be identified. We have performed an analysis of anonymity in the Bitcoin system and published our results in a preprint on arXiv. (via...

Four short links: 23 June 2011

By Nat Torkington
June 23, 2011

The Wisdom of Communities -- Luke Wroblewski's notes from Derek Powazek's talk at Event Apart. Wisdom of Crowds theory shows that, in aggregate, crowds are smarter than any single individual in the crowd. See this online in most emailed features, bit torrent, etc. Wise crowds are built on a few key characteristics: diversity (of opinion), independence (of other ideas),...

Hadoop: What it is, how it works, and what it can do

By James Turner
January 12, 2011

Hadoop gets a lot of buzz in database circles, but some folks are still hazy about what it is and how it works. In this interview, Cloudera CEO and Strata speaker Mike Olson discusses Hadoop's background and its current utility.

Big data faster: A conversation with Bradford Stephens

By David Sims
January 6, 2011

Bradford Stephens, founder of Drawn to Scale, discusses big data systems that work in "user time."

Strata Week: Grabbing a slice

By Julie Steele
September 23, 2010

In this edition of Strata Week: The 2,000,000,000,000,000th digit of pi is calculated with an assist from Hadoop and MapReduce; a new technique uses iPads to extrude light paintings across a long exposure shot; Historypin links historical photos to Google Street View shots; and this is the last week for Strata Conference proposal submissions.

The SMAQ stack for big data

By Edd Dumbill
September 22, 2010

We're at the beginning of a revolution in data-driven products and services, driven by a software stack that enables big data processing on commodity hardware. Learn about the SMAQ stack, and where today's big data tools fit in.

Strata Week: The challenge of real-time analytics

By Edd Dumbill
September 16, 2010

In the latest edition of Strata Week: Google's introduction of a new search-indexing system highlights an important limitation of MapReduce and Hadoop. Can MapReduce adapt to real-time needs or will others follow Google in creating new architectures for real-time analytics?

Pipelining and Real-time Analytics with MapReduce Online

By Ben Lorica
October 20, 2009

Most of the news related to the real-time web these days centers around the adoption of decentralized, push-oriented protocols (pubsubhubbub, rsscloud) designed to reduce latency in web publishing. Less discussed are the analytic tools that are capable of crunching through data in real-time. As more of the web moves towards these types of publishing tools, data-driven organizations will demand...

[AWS:ElasticMapReduce] Google-sized Parallel Computing on a You-sized Budget

By M. David Peterson
April 2, 2009

At http://aws.amazon.com/elasticmapreduce/ you'll find an interesting new entry into Amazon's utility-based web service offerings: Elastic MapReduce.

Big Data: Technologies and Techniques for Large-Scale Data

By Ben Lorica
March 23, 2009

Our belief that proficiency in managing and analyzing large amounts of data distinguishes market-leading companies led to a recent report designed to help users understand the different large-scale data management techniques. Our report on Big Data Technologies was the result of interviews with over thirty experts, including research scientists, (open-source) hackers, vendors, data analysts, and entrepreneurs. Rather than endorse...

