Tags > big data

Four short links: 17 June 2013

By Nat Torkington
June 17, 2013

Weekend Reads on Deep Learning (Alex Dong) — an article and two videos unpacking “deep learning” such as multilayer neural networks. The Internet of Actual Things — “I have 10 reliable activations remaining,” your bulb will report via some ridiculous …

Four short links: 12 June 2013

By Nat Torkington
June 12, 2013

geogit — opengeo project exploring the use of distributed management of spatial data. [...] adapts [git's] core concepts to handle versioning of geospatial data. Shapefiles, PostGIS or SpatiaLite data stored in a change-tracking repository, with all the fun git features …

Four short links: 11 June 2013

By Nat Torkington
June 11, 2013

For Example — amazing discussion of 3D visualization techniques, full of examples using the D3.js library and bl.ocks.org example gist system. Gorgeous and informative. Anti-Gravity 3D Printer — uses strands to sculpt on any surface. (via Slashdot) How 3D Printing …

Four short links: 7 June 2013

By Nat Torkington
June 7, 2013

Accumulo — NSA’s BigTable implementation, released as an Apache project. How the Robots Lost (Business Week) — the decline of high-frequency trading profits (basically, markets worked and imbalances in speed and knowledge have been corrected). Notable for the regulators getting …

Big data vs. big reality

By Mike Barlow
June 5, 2013

This post originally appeared on Cumulus Partners. It’s republished with permission. Quentin Hardy’s recent post in the Bits blog of The New York Times touched on the gap between representation and reality that is a core element of practically every …

Patients matter most, but technology matters a lot

By Andy Oram
June 4, 2013

Computing practices that used to be relegated to experimental outposts are now taking up residence at the center of the health care field. From natural language processing to machine learning to predictive modeling, you see people promising at the health …

Four short links: 4 June 2013

By Nat Torkington
June 4, 2013

WeevilScout — browser app that turns your browser into a worker for distributed computation tasks. See the poster (PDF). (via Ben Lorica) sregex (Github) — A non-backtracking regex engine library for large data streams. See also slide notes from a …

Big data vs. big reality

By Mike Barlow
June 3, 2013

This post originally appeared on Cumulus Partners. It’s republished with permission. Quentin Hardy’s recent post in the Bits blog of The New York Times touched on the gap between representation and reality that is a core element of practically every …

Understanding skepticism

By Mike Loukides
May 31, 2013

I’d like to correct the impression, given by Derrick Harris on GigaOm, that I’m part of a backlash against “big data.” I’m not skeptical about data or the power of data, but you don’t have to look very far or …

Strata Week: Can your passwords stand up to a cracker?

By Jenn Webb
May 31, 2013

Companies, developers need to do more to increase password security: Google urged users this week to take more care in creating passwords. In a post on the Google Blog, Google Software Engineer Diana Smetters offered some guidelines, including using a …

How signals, geometry, and topology are influencing data science

By Ben Lorica
May 24, 2013

I’ve been noticing unlikely areas of mathematics pop up in data analysis. While signal processing is a natural fit, topology, differential and algebraic geometry aren’t exactly areas you associate with data science. But upon further reflection perhaps it shouldn’t be so …

Looking ahead to a world of data-dominated decisions

By Andy Oram
May 21, 2013

Measuring a world-shaking trend with feet planted in every area of human endeavor cannot be achieved in a popular book of 200 pages, but one has to start somewhere. I am happy to recommend the adept efforts of Viktor Mayer-Schönberger …

Improving options for unlocking your graph data

By Ben Lorica
May 19, 2013

The popular open source project GraphLab received a major boost early this week when a new company composed of its founding developers raised funding to develop analytic tools for graph data sets. GraphLab Inc. will continue to use the open …

Google I/O, Big Data Adolescence, Visualization, and the Future of Open Source

By Adam Flaherty
May 17, 2013

Google I/O: O’Reilly Editor Rachel Roumeliotis reports from the conference floor. Big Data, Cool Kids: Fumbling toward the adolescence of big data tools. Code as Art: Interactive Data Visualization for the Web author Scott Murray on becoming a code artist. …

Six disruptive possibilities from big data

By Jeff Needham
May 15, 2013

My new book, Disruptive Possibilities: How Big Data Changes Everything, is derived directly from my experience as a performance and platform architect in the old enterprise world and the new, Internet-scale world. I pre-date the Hadoop crew at Yahoo!, but …

Big data, cool kids

By Edd Dumbill
May 13, 2013

The big data world is a confusing place. We’re no longer in a market dominated mostly by relational databases, and the alternatives have multiplied in a baby boom of diversity. These child prodigies of the data scene show great promise …

Genomics and Privacy at the Crossroads

By James Turner
May 13, 2013

Two weeks ago, I had the privilege to attend the 2013 Genomes, Environments and Traits conference in Boston, as a participant in Harvard Medical School’s Personal Genome Project. Several hundred of us attended the conference, eager to learn what new breakthroughs might …

Four short links: 10 May 2013

By Nat Torkington
May 10, 2013

The Remixing Dilemma — summary of research on remixed projects, finding that (1) Projects with moderate amounts of code are remixed more often than either very simple or very complex projects. (2) Projects by more prominent creators are more generative. …

Genomics and Privacy at the Crossroads

By James Turner
May 9, 2013

Two weeks ago, I had the privilege to attend the 2013 Genomes, Environments and Traits conference in Boston, as a participant in Harvard Medical School’s Personal Genome Project. Several hundred of us attended the conference, eager to learn what new breakthroughs might …

Steering the ship that is data science

By Q Ethan McCallum
May 7, 2013

Mike Loukides recently recapped a conversation we’d had about leading indicators for data science efforts in an organization. We also pondered where the role of data scientist is headed and realized we could treat software development as a prototype case. …

Tachyon: An open source, distributed, fault-tolerant, in-memory file system

By Ben Lorica
April 28, 2013

In earlier posts I’ve written about how Spark and Shark run much faster than Hadoop and Hive by caching data sets in-memory. But suppose one wants to share datasets across jobs/frameworks, while retaining speed gains garnered by being in-memory? An …

Four short links: 23 April 2013

By Nat Torkington
April 23, 2013

Drawscript — Processing for Illustrator. (via BERG London) Archive Team Warrior — a virtual archiving appliance. You can run it to help with the ArchiveTeam archiving efforts. It will download sites and upload them to our archive. (via Ed Vielmetti) …

Four short links: 19 April 2013

By Nat Torkington
April 19, 2013

Bruce Sterling on Disruption — If more computation, and more networking, was going to make the world prosperous, we’d be living in a prosperous world. And we’re not. Obviously we’re living in a Depression. Slow first 25% but then it …

Four short links: 18 April 2013

By Nat Torkington
April 18, 2013

The Well Deserved Fortune of Satoshi Nakamoto — I can’t assure with 100% certainty that the all the black dots are owned by Satoshi, but almost all are owned by a single entity, and that entity began mining right from …

Four short links: 16 April 2013

By Nat Torkington
April 16, 2013

Triage — iPhone app to quickly triage your email in your downtime. See also the backstory. Awesome UI. Webcam Pulse Detector — I was wondering how long it would take someone to do the Eulerian video magnification in real code. …

Single server systems can tackle big data

By Ben Lorica
April 13, 2013

About a year ago a blog post from SAP posited that when it comes to analytics, most companies are in the multi-terabyte range: data sizes that are well within the scope of distributed in-memory solutions like Spark, SAP HANA, ScaleOut Software, …

Four short links: 12 April 2013

By Nat Torkington
April 12, 2013

Wikileaks ProjectK Code (Github) — open-sourced map and graph modules behind the Wikileaks code serving Kissinger-era cables. (via Journalism++) Plan Your Digital Afterlife With Inactive Account Manager — you can choose to have your data deleted — after three, six, …

Predictive analytics and data sharing raise civil liberties concerns

By Alex Howard
April 11, 2013

Last winter, around the same time there was a huge row in Congress over the Cyber Intelligence Sharing and Protection Act (CISPA), U.S. Attorney General Holder quietly signed off on expanded rules on government data sharing. The rules allowed the National …

Four short links: 3 April 2013

By Nat Torkington
April 3, 2013

Cap’n Proto — open source faster protocol buffers (binary data interchange format and RPC system). Saddle — a high performance data manipulation library for Scala. Vega — a visualization grammar, a declarative format for creating, saving and sharing visualization designs. …

Four short links: 1 April 2013

By Nat Torkington
April 1, 2013

MLDemos — an open-source visualization tool for machine learning algorithms created to help studying and understanding how several algorithms function and how their parameters affect and modify the results in problems of classification, regression, clustering, dimensionality reduction, dynamical systems and …

Four short links: 29 March 2013

By Nat Torkington
March 29, 2013

Titan 0.3 Out — graph database now has full-text, geo, and numeric-range index backends. Mozilla Security Community Do a Reddit AMA — if you wanted a list of sharp web security people to follow on Twitter, you could do a …

The coming of the industrial internet

By Jon Bruner
March 27, 2013

Download this free report (PDF, Mobi, EPUB) The big machines that define modern life — cars, airplanes, furnaces, and so forth — have become exquisitely efficient, safe, and responsive over the last century through constant mechanical refinement. But mechanical refinement has …

Four short links: 25 March 2013

By Nat Torkington
March 25, 2013

Analytics for Learning — Since doing good learning analytics is hard, we often do easy learning analytics and pretend that they are good instead. But pretending doesn’t make it so. (via Dan Meyer) Reproducible Research — a list of links …

Four short links: 20 March 2013

By Nat Torkington
March 20, 2013

Digital Music Consumption on the Internet: Evidence from Clickstream Data (Scribd) — The goal of this paper is to analyze the behavior of digital music consumers on the Internet. Using clickstream data on a panel of more than 16,000 European …

Four short links: 15 March 2013

By Nat Torkington
March 15, 2013

Consumer Heterogeneity and Paid Search Effectiveness: A Large Scale Field Experiment (PDF) — We find that new and infrequent users are positively influenced by ads but that existing loyal users whose purchasing behavior is not influenced by paid search account …

Four short links: 11 March 2013

By Nat Torkington
March 11, 2013

Adventures in the Ransom Trade — between insurance, protection, and ransoms, Sean Gourley describes it as “one of the more interesting grey markets.” (via Sean Gourley) About High School Computer Science Teachers (Selena Deckelmann) — Selena gets an education in …

Strata Week: Data brokers know more about us than we know

By Jenn Webb
March 8, 2013

The lowdown on data brokers, and the use of sensor data in the workplace: ProPublica’s Lois Beckett takes a look this week at data brokers. She says that though Congress is making moves to make such companies give consumers more …

Untangling algorithmic illusions from reality in big data

By Alex Howard
March 6, 2013

Microsoft principal researcher Kate Crawford (@katecrawford) gave a strong talk at last week’s Strata Conference in Santa Clara, Calif. about the limits of big data. She pointed out potential biases in data collection, questioned who may be excluded from it, …

Untangling algorithmic illusions from reality in big data

By Alex Howard
March 4, 2013

Microsoft principal researcher Kate Crawford (@katecrawford) gave a strong talk at last week’s Strata Conference in Santa Clara, Calif. about the limits of big data. She pointed out potential biases in data collection, questioned who may be excluded from it, …

Four short links: 4 March 2013

By Nat Torkington
March 4, 2013

Life Inside the Aaron Swartz Investigation — do hard things and risk failure. What else are we on this earth for? crossfilter — open source (Apache 2) JavaScript library for exploring large multivariate datasets in the browser. Crossfilter supports extremely …

Data Science Tools: Fast, easy to use, and scalable

By Ben Lorica
March 3, 2013

Here are a few observations based on conversations I had during the just-concluded Strata Santa Clara conference. Spark is attracting attention: I’ve written numerous times about components of the Berkeley Data Analytics Stack (Spark, Shark, MLbase). Two Spark-related sessions …

Four short links: 27 February 2013

By Nat Torkington
February 27, 2013

Open Source Cancer Informatics Software (NCIP) — we have tackled the main recommendation that came out of our June meeting with open-source thought leaders: Keep it simple. Make barriers to entry as low as possible, and reuse available resources. Specifically, …

On reading Mike Barlow’s “Real-Time Big Data Analytics: Emerging Architecture”

By Ann Spencer
February 26, 2013

During a break in between offsite meetings that Edd and I were attending the other day, he asked me, “Did you read the Barlow piece?” “Umm, no,” I replied sheepishly. Insert a sidelong glance from Edd that said much without …

Four short links: 26 Feb 2013

By Nat Torkington
February 26, 2013

School of Data — free online courses around data science and visualization. libshorttext — classify and analyse short-text of things like titles, questions, sentences, and short messages. MIT-style open source license, Python and C++ source. Letterboxd — a site for …

Big data is dead, long live big data: Thoughts heading to Strata

By Mike Loukides
February 25, 2013

A recent VentureBeat article argues that “Big Data” is dead. It’s been killed by marketers. That’s an understandable frustration (and a little ironic to read about it in that particular venue). As I said sarcastically the other day, “Put your …

Strata Week: The data divide is growing

By Jenn Webb
February 22, 2013

Data mining opens new doors for discrimination, marginalization: In a post at Scientific American, Michael Fertik took a look at how Internet data collection practices are beginning to create an unequal — even discriminatory — online environment. Fertik writes: “For …

BigData Top 100 Initiative

By O'Reilly Strata
February 20, 2013

By Milind Bhandarkar, Chaitan Baru, Raghunath Nambiar, Meikel Poess, and Dr. Tilmann Rabl Big data systems are characterized by their flexibility in processing diverse data genres, such as transaction logs, connection graphs, and natural language text, with algorithms characterized by …

Four short links: 20 February 2013

By Nat Torkington
February 20, 2013

The Network of Global Control (PLoS One) — We find that transnational corporations form a giant bow-tie structure and that a large portion of control flows to a small tightly-knit core of financial institutions. [...] From an empirical point of …

Four short links: 18 February 2013

By Nat Torkington
February 18, 2013

crowy — open source social media aggregator. Raytheon makes Social Media Tracking Software (Guardian) — the technology was shared with US government and industry as part of a joint research and development effort, in 2010, to help build a national …

