Four short links: 9 April 2014

By Nat Torkington
April 9, 2014

Jasper Project — an open source platform for developing always-on, voice-controlled applications. Shouting is the new swiping—I eagerly await Gartner touting the Internet-of-things-that-misunderstand-you. DeepBeliefSDK — deep neural network library for iOS. (via Pete Warden) Microsoft Spectrum Observatory — crowdsourcing spectrum …

Formulating Elixir

By Simon St. Laurent
March 28, 2014

I was delighted to sit down with Jose Valim, the creator of Elixir, earlier this month. He and Dave Thomas had just given a brave keynote exploring the barriers that keep people from taking advantage of Erlang’s many superpowers, challenging …

Four short links: 24 March 2014

By Nat Torkington
March 24, 2014

The Parable of Google Flu (PDF) — We explore two issues that contributed to [Google Flu Trends]’s mistakes—big data hubris and algorithm dynamics—and offer lessons for moving forward in the big data age. Overtrained and underfed? Duktape — a lightweight …

New Web Security Course Teaches Web Application Protection from Hackers

By Michael de Libero
March 18, 2014

Become a More Secure Programmer by Learning How to Find and Fix Security Bugs

It seems like every few months we hear about a new data breach where millions of credit card numbers or passwords get into the hands of the bad guys. Why does this happen so often? It isn’t because the bad guys …

Crowdsourcing feature discovery

By Ben Lorica
March 15, 2014

Data scientists were among the earliest and most enthusiastic users of crowdsourcing services. Lukas Biewald noted in a recent talk that one of the reasons he started CrowdFlower was that as a data scientist he got frustrated with having to …

Four short links: 14 March 2014

By Nat Torkington
March 14, 2014

The Facebook experiment has failed. Let’s go back — Facebook gets worse the more you use it. The innovation within Facebook happens within a framework that’s taken as given. This essay questions that frame, well. Meet the People Making New …

An Invitation to Practical Machine Learning

By Ellen Friedman
March 3, 2014

Does it make sense for me to have a car? If so, which one is the best choice for my needs: a gasoline, hybrid, or electric?  And should I buy or lease? In order to make an effective decision, I …

Bridging the gap between research and implementation

By Ben Lorica
February 15, 2014

One of the most popular offerings at Strata Santa Clara was Hardcore Data Science day. Over the next few weeks we hope to profile some of the speakers who presented, and make the video of the talks available as a …

Four short links: 6 February 2014

By Nat Torkington
February 5, 2014

What Machines Can’t Do (NY Times) — In the 1950s, the bureaucracy was the computer. People were organized into technocratic systems in order to perform routinized information processing. But now the computer is the computer. The role of the human …

Business analysts want access to advanced analytics

By Ben Lorica
January 29, 2014

I talk with many new companies who build tools for business analysts and other non-technical users. These new tools streamline and simplify important data tasks including interactive analysis (e.g., pivot tables and cohort analysis), interactive visual analysis (as popularized by …

Four short links: 28 January 2014

By Nat Torkington
January 28, 2014

Intel On-Device Voice Recognition (Quartz) — interesting because the tension between client-side and server-side functionality is still alive and well. Features migrate from core to edge and back again as cycles, data, algorithms, and responsiveness expectations change. Meet Microsoft’s Personal …

Four short links: 22 January 2014

By Nat Torkington
January 22, 2014

How a Math Genius Hacked OkCupid to Find True Love (Wired) — if he doesn’t end up working for OkCupid, productising this as a new service, something is wrong with the world. Humin: The App That Uses Context to …

The democratization of medical science

By Julie Steele
January 13, 2014

Vinod Khosla has stirred up some controversy in the healthcare community over the last several years by suggesting that computers might be able to provide better care than doctors. This includes remarks he made at Strata Rx in 2012, including that, …

Four short links: 10 January 2014

By Nat Torkington
January 10, 2014

Software in 2014 (Tim Bray) — a good state of the world, much of which I agree with. Client-side: Things are bad. You have to build everything three times: Web, iOS, Android. We’re talent-starved, this is egregious waste, and it’s …

Four short links: 9 January 2014

By Nat Torkington
January 9, 2014

Artificial Labour and Ubiquitous Interactive Machine Learning (Greg Borenstein) — in which design fiction, actual machine learning, legal discovery, and comics meet. One of the major themes to emerge in the 2H2K project is something we’ve taken to calling “artificial …

Four short links: 3 January 2014

By Nat Torkington
January 2, 2014

Commotion — open source mesh networks. WriteLaTeX — online collaborative LaTeX editor. No, really. This exists. In 2014. Distributed Systems — free book for download, goal is to bring together the ideas behind many of the more recent distributed systems …

Four short links: 30 December 2013

By Nat Torkington
December 30, 2013

tooldiag — a collection of methods for statistical pattern recognition. Implemented in C. Hacking MicroSD Cards (Bunnie Huang) — In my explorations of the electronics markets in China, I’ve seen shop keepers burning firmware on cards that “expand” the capacity …

Six reasons why I recommend scikit-learn

By Ben Lorica
December 28, 2013

I use a variety of tools for advanced analytics; most recently I’ve been using Spark (and MLlib), R, scikit-learn, and GraphLab. When I need to get something done quickly, I’ve been turning to scikit-learn for my first-pass analysis. For …

Four short links: 27 December 2013

By Nat Torkington
December 27, 2013

Intel XDK — If you can write code in HTML5, CSS3 and JavaScript*, you can use the Intel® XDK to build an HTML5 web app or a hybrid app for all of the major app stores. It’s a .exe. What …

Four short links: 26 December 2013

By Nat Torkington
December 26, 2013

Nest Protect Teardown (Sparkfun) — initial teardown of another piece of domestic industrial Internet. Logs — The distributed log can be seen as the data structure which models the problem of consensus. Not kidding when he calls it “real-time data’s …

From Data Scientists to Marketers

By O'Reilly Strata
December 18, 2013

By Leland Wilkinson

Big Data may seem like a familiar concept to those working in IT, but for most executives it’s difficult to imagine just how much Big Data impacts business on a daily basis. Most companies already collect customer …

Declare and It Happens

By Simon St. Laurent
December 18, 2013

Last week, I wrote about the need to make programming, at least much programming, more accessible. I was thinking in terms of business processes, so spreadsheets and flow-based programming sprang to mind. Today, though, Jeremy Keith reminds me that on …

Four short links: 10 December 2013

By Nat Torkington
December 10, 2013

ArangoDB — open-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient sql-like query language or JavaScript extensions. Google’s Seven Robotics Companies (IEEE) — The seven companies are capable of creating …

Four short links: 6 December 2013

By Nat Torkington
December 6, 2013

Society of Mind — Marvin Minsky’s book now Creative-Commons licensed. Collaboration, Stars, and the Changing Organization of Science: Evidence from Evolutionary Biology — The concentration of research output is declining at the department level but increasing at the individual level. …

Four short links: 3 December 2013

By Nat Torkington
December 3, 2013

SAMOA — Yahoo!’s distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms. (via Introducing SAMOA) madlib — an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine-learning …

Four short links: 2 December 2013

By Nat Torkington
December 2, 2013

CalTech Machine Learning Video Library — a pile of video introductions to different machine learning concepts. Awesome Pokemon Hack — each inventory item has a number associated with it, they are kept at a particular memory location, and there’s a …

Four short links: 26 November 2013

By Nat Torkington
November 26, 2013

The Death and Life of Great Internet Cities — “The sense that you were given some space on the Internet, and allowed to do anything you wanted to in that space, it’s completely gone from these new social sites,” said …

Day-Long Immersions and Deep Dives at Strata Santa Clara 2014

By Ben Lorica
November 16, 2013

As the Program Development Director for Strata Santa Clara 2014, I am pleased to announce that the tutorial session descriptions are now live. We’re pleased to offer several day-long immersions including the popular Data Driven Business Day and Hardcore Data …

Four short links: 15 November 2013

By Nat Torkington
November 15, 2013

Google Wins Book Scanning Case (GigaOM) — will probably be appealed, though many authors will fear it’s good money after bad tilting at the fair use windmill. IBM Watson To Be A Platform (IBM) — press release indicates you’ll …

Four short links: 14 November 2013

By Nat Torkington
November 14, 2013

TPPA Trades Away Internet Freedoms (EFF) — commentary on the wikileaked text of the trade agreement. Deep Learning 101 — introduction to the machine learning trend of choice. Large Scale Rapid Prototyping Robots — an informal list of large rapid …

Simplifying interactive, realtime, and advanced analytics

By Ben Lorica
November 3, 2013

Here are a few observations based on conversations I had during the just-concluded Strata NYC conference.

Interactive query analysis on Hadoop remains a hot area

A recent O’Reilly survey confirmed SQL is an important skill for data scientists. A …

Four short links: 31 October 2013

By Nat Torkington
October 31, 2013

Insect-Inspired Collision-Resistant Robot — clever hack to make it stable despite bouncing off things. The Battle for Power on the Internet (Bruce Schneier) — the state of cyberspace. [M]ost of the time, a new technology benefits the nimble first. [...] …

Four short links: 28 October 2013

By Nat Torkington
October 28, 2013

A Cyber Attack Against Israel Shut Down a Road — The hackers targeted the Tunnels’ camera system which put the roadway into an immediate lockdown mode, shutting it down for twenty minutes. The next day the attackers managed to break …

Four short links: 24 October 2013

By Nat Torkington
October 24, 2013

Visually Programming Arduino — good for little minds. Rapid Hardware Iteration at Scale (Forbes) — It’s part of the unique way that Xiaomi operates, closely analyzing the user feedback it gets on its smartphones and following the suggestions it likes …

Deep Learning oral traditions

By Ben Lorica
October 20, 2013

This past week I had the good fortune of attending two great talks on Deep Learning, given by Googlers Ilya Sutskever and Jeff Dean. Much of the excitement surrounding Deep Learning stems from impressive results in a variety of perception …

Four short links: 16 October 2013

By Nat Torkington
October 14, 2013

Scientific Data Has Become So Complex, We Have to Invent New Math to Deal With It (Jennifer Ouellette) — Yale University mathematician Ronald Coifman says that what is really needed is the big data equivalent of a Newtonian revolution, on …

Four short links: 14 October 2013

By Nat Torkington
October 14, 2013

An Interactive Machine Learning System for Recognizing Hand Gestures (Greg Borenstein) — a mixed-initiative interactive machine learning system for recognizing hand gestures. It attempts to give the user visibility into the classifier’s prediction confidence and control of the conditions under …

Stream Mining essentials

By Ben Lorica
October 13, 2013

A series of open source, distributed stream processing frameworks have become essential components in many big data technology stacks. Apache Storm remains the most popular, but promising new tools like Spark Streaming and Apache Samza are going to have their …

Semi-automatic method for grading a million homework assignments

By Ben Lorica
October 6, 2013

One of the hardest things about teaching a large class is grading exams and homework assignments. In my teaching days a “large class” was only in the few hundreds (still a challenge for the TAs and instructor). But in the …

Four short links: 4 October 2013

By Nat Torkington
October 4, 2013

Case and Molly, a Game Inspired by Neuromancer (Greg Borenstein) — On reading Neuromancer today, this dynamic feels all too familiar. We constantly navigate the tension between the physical and the digital in a state of continuous partial attention. We …

Four short links: 2 October 2013

By Nat Torkington
October 2, 2013

Instant Translator Glasses (ZDNet) — character recognition to do instant translating, and a UI that turns any flat surface into a touch-screen via a finger-ring sensor. — diagramming … In The Cloud! Airmail — Mac gmail client with offline …

Gaining access to the best machine-learning methods

By Ben Lorica
September 29, 2013

For companies in the early stages of grappling with big data, the analytic lifecycle (model building, deployment, maintenance) can be daunting. In earlier posts I highlighted some new tools that simplify aspects of the analytic lifecycle, including the early phases …

Four short links: 30 September 2013

By Nat Torkington
September 24, 2013

Steve Yegge on GROK (YouTube) — The Grok Project is an internal Google initiative to simplify the navigation and querying of very large program source repositories. We have designed and implemented a language-neutral, canonical representation for source code and compiler …

Four short links: 16 September 2013

By Nat Torkington
September 16, 2013

UAV Offers of Assistance in Colorado Rebuffed by FEMA — we were told by FEMA that anyone flying drones would be arrested. [...] Civil Air Patrol and private aircraft were authorized to fly over the small town tucked into the …

Four short links: 10 September 2013

By Nat Torkington
September 8, 2013

Sparkey — Spotify’s open-sourced simple constant key/value storage library, for read-heavy systems with infrequent large bulk inserts. The Truth of Fact, The Truth of Feeling (Ted Chiang) — story about what happens when lifelogs become searchable. Now with Remem, finding …

Data analysis tools target non-experts

By Ben Lorica
August 25, 2013

A new set of tools makes it easier to do a variety of data analysis tasks. Some require no programming, while other tools make it easier to combine code, visuals, and text in the same workflow. They enable users who …

Four short links: 23 August 2013

By Nat Torkington
August 23, 2013

Bradley Manning and the Two Americas (Quinn Norton) — The first America built the Internet, but the second America moved onto it. And they both think they own the place now. The best explanation you’ll find for wtf is going …

Four short links: 20 August 2013

By Nat Torkington
August 20, 2013

— attempt to crowdsource rankings for tutorials for important products, so you’re not picking your way through Google search results littered with tutorials written by incompetent illiterates for past versions of the software. BBC Forum — American social psychologist …

So, You Want to Run a Young Coders Class?

By Katie Cunningham
August 16, 2013

Ever since PyCon 2013, the interest in the Young Coders class has been intensifying. Practically every Python conference since then has asked about doing one, and several have run their own. Classes outside of conferences have sprung up, as well, …

One-click analysis: Detecting and visualizing insights automatically

By Andy Oram
August 13, 2013

The importance of visualizing data is universally recognized. But, usually the data is passive input to some visualization tool and the users have to specify the precise graph they want to visualize. BeyondCore simplifies this process by automatically evaluating millions …

