Tags > bigdata
December 19, 2011
Marc Goodman, consultant and cyber crime expert, explains how criminals and terrorists can put data, automation, and scalability to effective use.
Top Stories: November 28-December 2, 2011 - Info overload vs. consumption, how big data is shaping business, and why we need the "paperless book."
By Mac Slocum
December 2, 2011
This week on O'Reilly: Author Clay Johnson explained why information consumption, not overload, is what needs to be managed. Also, Alistair Croll looked at the relationship between business intelligence and big data, and Todd Sattersten made a case for the paperless book.
A young entrepreneur's perspective on Angolan innovation - Angolan entrepreneur Nyanga Tyitapeka on mobile commerce and data's potential.
By Suzanne Axtell
December 1, 2011
Infonauta founder Nyanga Tyitapeka says Angola is on the cusp of a technology explosion. Mobile and data are overcoming low levels of literacy to change the lives of everyday Angolans.
Strata Week: The social graph that isn't - Pinboard founder questions the social graph, Cloudera and Kaggle raise money for big data.
By Audrey Watters
November 10, 2011
In this week's data news, Pinboard founder Maciej Ceglowski challenges the notion of a "social graph," Cloudera and Kaggle raise money for big data, and the Supreme Court looks at GPS and privacy issues.
Strata Week: Cloudera founder has a new data product - Odiago hints at the future of Hadoop-based services, Hortonworks shows off its products, and big data comes to edu material.
By Audrey Watters
November 3, 2011
Cloudera's founder launches Odiago, a new data startup. Elsewhere, Hortonworks reveals its suite of Hadoop products and services, and Knewton and Pearson bring big data to education content.
Four short links: 31 October 2011 - Solitude and Leadership, Data Repository, Copyright History, and Open Source Audio
By Nat Torkington
October 31, 2011
Solitude and Leadership -- an amazing essay on the value of managing one's information diet. Far more than yet another Carr/Morozov "the Internet is making us dumb!!" hate on short-form content, this is an eloquent exposition of the need for long-form thoughts. I find for myself that my first thought is never my best thought. My first thought is...
Four short links: 19 October 2011 - Ubiquitous Multitouch, Bitcoin Bust, vim Text Concepts, and Storage Troubles
By Nat Torkington
October 19, 2011
OmniTouch: Wearable Interaction Everywhere -- compact projector + kinect equivalents in shoulder-mounted multitouch glory. (via Slashdot) Price of Bitcoin Still Dropping -- currency is a confidence game, and there's no confidence in Bitcoins since the massive Mt Gox exchange hack. vim Text Objects -- I'm an emacs user, so this is like reading Herodotus. "On the far side of...
Four short links: 14 October 2011 - Relativity in Short Words, Set Math, Design Inspiration, and Internet of Things
By Nat Torkington
October 14, 2011
Theory of Relativity in Words of Four Letters or Less -- this does just what it says, and well too. I like it, as you may too. At the end, you may even know more than you do now. Effective Set Reconciliation Without Prior Context (PDF) -- paper on using Bloom filters to do set union (deduplication) efficiently. Useful...
Four short links: 13 October 2011 - Memorable Indexes, Mobile Sensors, Augmented Reality Toys, and Collaborative Editing
By Nat Torkington
October 13, 2011
Memorable Indexes (Futility Closet) -- Carroll's index also includes entries for "Boots for horizontal weather," "Horizontal rain, boots for," "Rain, horizontal, boots for," and "Weather, horizontal, boots for". They're silly and whimsical, but the underlying problem of making multiple accessible entrypoints into a single corpus of content is with us today and only compounded by the vast growth of...
Top Stories: October 3-7, 2011 - Why Oracle's big data move matters, inside PhoneGap, and data drives NYC's quest to become a premier digital city.
By Mac Slocum
October 7, 2011
This week on O'Reilly: Edd Dumbill explained why Oracle's Big Data Appliance is both a validation and a sign of battles to come, we dug into PhoneGap's cross-platform app capabilities, and we surveyed New York City's data and open government efforts.
Top Stories: September 19-23, 2011 - True data over big data, community building through data, and the choreography of digital design.
By Mac Slocum
September 23, 2011
This week on O'Reilly: Alistair Croll explained why true data is more important than big data, we looked at how BuzzData is building community around datasets, and Liza Daly explained the connection between digital content and choreography.
By Alistair Croll
September 20, 2011
Open data and transparency aren't enough: we need True Data, not Big Data, as well as regulators and lawmakers willing to act on it.
By Mac Slocum
September 1, 2011
O'Reilly has released "Big Data Now," a free anthology that taps into the data themes and coverage featured on Radar over the last year.
Why the finance world should care about big data and data science - Roger Magoulas on data's potential to improve finance systems and create new businesses.
By Mac Slocum
August 31, 2011
O'Reilly director of market research Roger Magoulas discusses the intersection of big data and finance, and the opportunities this pairing creates for financial experts.
Strata Week: Green pigs and data - Rovio mines data to improve Angry Birds, HP bets on big data, Daily Dot parses the social web for stories.
By Audrey Watters
August 25, 2011
Rovio, the company behind Angry Birds, is using data and analytics to keep bird-launching gamers plugged in. Also, HP's acquisition of Autonomy reveals its data intentions, and the Daily Dot finds stories with an assist from data journalism.
Four short links: 12 August 2011 - Learning Adventure, Python Data Analysis, Lanyrd Technology, and New Sensor
By Nat Torkington
August 12, 2011
Hippocampus Text Adventure -- written as an exercise in learning Python, you explore the hippocampus. It's simple, but I like the idea of educational text adventures. (Well, educational in that you learn about more than the axe-throwing behaviour of the cave-dwelling dwarf) Pandas -- BSD-licensed Python data analysis library. Building Lanyrd -- Simon Willison's talk (with slides) about the...
T-Mobile challenges churn with data - T-Mobile's architecture helps it put data to use across the business.
By Brett Sheppard
August 10, 2011
Mobile service provider T-Mobile uses a federated architecture and virtual data zones to empower innovations in regional marketing, churn management and customer care.
There's no such thing as big data - Even if you have petabytes of data, you still need to know how to ask the right questions to apply it.
By Alistair Croll
August 9, 2011
Having a lot of data is not the same as using it well. Today's big companies are losing to small upstarts simply because those firms ask better questions. To compete, large enterprises need to learn how to harvest the data they have on customers, markets, competitors, and products.
Four short links: 3 August 2011 - Library Licensing, Mac Graphics, Coal Computing, and Human Augmentation
By Nat Torkington
August 3, 2011
Just Say No To Freegal -- an interesting view from the inside, speaking out against a music licensing system called Freegal which is selling to libraries. Libraries typically buy one copy of something, and then lend it out to multiple users sequentially, in order to get a good return on investment. Participating in a product like Freegal means that...
By Nat Torkington
July 28, 2011
23andMe Disproves Its Own Business Model -- a hostile article talking about how there's little predictive power in genetics for diabetes and Parkinson's so what's the point of buying a 23andMe subscription? The wider issue is that, as we've known for a while, mapping out your genome only helps with a few clearcut conditions. For most medical things that...
Four short links: 25 July 2011 - Minecraft Emergent Behaviour, Algorithmic 3D Printing, Automated MapReduce Optimization, and Multi-Device Preview
By Nat Torkington
July 25, 2011
Anonymity in Bitcoin -- TL;DR: Bitcoin is not inherently anonymous. It may be possible to conduct transactions in such a way so as to obscure your identity, but, in many cases, users and their transactions can be identified. We have performed an analysis of anonymity in the Bitcoin system and published our results in a preprint on arXiv. (via...
Four short links: 22 July 2011 - Data Businesses, Multitouch Charting, 3D-Printing Glass, and Synthetic Biology
By Nat Torkington
July 22, 2011
Competitive Advantage Through Data -- the applications and business models for erecting barriers around proprietary data assets. Sees data businesses in these four categories: contributory data sourcing, offering cleaner data, data generated from service you offer, and viz/ux. The author does not yet appear to be considering when open or communal data is better than proprietary data, and how...
July 21, 2011
July 20, 2011
Random Khan Exercises -- elegant hack to ensure repeatability for a user but difference across users. Note that they need these features of exercises so that they can perform meaningful statistical analyses on the results. Float, the Netflix of Reading (Wired) -- an interesting Instapaper variant with a stab at an advertising business model. I would like to stab...
Four short links: 14 July 2011 - Microchip Archaeology, OSM Map Library, Feedback Loops for Public Expenditure, and Mind-reading Big Data
By Nat Torkington
July 14, 2011
Digging into Technology's Past -- stories of the amazing work behind the visual 6502 project and how they reconstructed and simulated the legendary 6502 chip. To analyze and then preserve the 6502, James treated it like the site of an excavation. First, he needed to expose the actual chip by removing its packaging of essentially “billiard-ball plastic.” He eroded...
Four short links: 12 July 2011 - Rare Visualization, Google+ Tech, Scala+Erlang, and In-Database Analytics
By Nat Torkington
July 12, 2011
Slopegraphs -- a nifty Tufte visualization which conveys rank, value, and delta over time. Includes pointers to how to make them, and guidelines for when and how they work. (via Avi Bryant) Ask Me Anything: A Technical Lead on the Google+ Team -- lots of juicy details about technology and dev process. A couple nifty tricks we do: we...
Data journalism, data tools, and the newsroom stack - The 2011 Knight News Challenge winners illustrate data's ascendance in media and government.
By Alex Howard
July 5, 2011
The MIT Civic Media conference and 2011 Knight News Challenge winners made it clear that data journalism and data tools will play key roles in the future of media and open government.
Get started with Hadoop: From evaluation to your first production cluster - Best practices for evaluating Hadoop and setting up an initial cluster.
By Brett Sheppard
June 27, 2011
Focusing on the Hadoop Distributed File System (HDFS) and MapReduce, this in-depth piece offers tips for organizations that are looking to evaluate Hadoop and deploy an initial cluster.
June 24, 2011
Radar's top stories: June 13-17, 2011 - Big data and the semantic web, choosing the right license for data, 3 great ideas you should steal
By Mac Slocum
June 17, 2011
This week on Radar: We looked at the links between big data and the semantic web, the thought process behind OpenStreetMap's move to the Open Database License was revealed, and we highlighted three ideas you should lift from HubSpot.
By Edd Dumbill
June 14, 2011
The big data and semantic web worlds seem to be disjunct. Yet big data is poised to light the fire beneath the long-held dreams of the semantic web, and the semantic web will enable data scientists to describe, organize and reason about their results.
Four short links: 13 June 2011 - Remote Fingerprint Scans, Playdough Circuits, Update-Sync, and Tweet Failage
By Nat Torkington
June 13, 2011
AIRPrint -- prototype box scans a fingerprint from six feet away. (via Greg Linden) Squishy Circuits -- teaching electronic circuits with conductive and insulating playdough. (via Hacker News) GraphLab -- alternative take on Map-Reduce, called Update-Sync, where tasks run on connected sets of nodes rather than on one node at a time. Tower Bridge Closed -- the @towerbridge account...
Four short links: 3 June 2011 - Distributed Drug Money, Science Game, Beautiful Machine Learning, and Stream Event Processing
By Nat Torkington
June 3, 2011
Silk Road (Gawker) -- Tor-delivered "web" site that is like an eBay for drugs, currency is Bitcoins. Jeff Garzik, a member of the Bitcoin core development team, says in an email that bitcoin is not as anonymous as the denizens of Silk Road would like to believe. He explains that because all Bitcoin transactions are recorded in a public...
Four short links: 19 May 2011 - Internet Access Rights, Statistical Peace, Vintage Jobs, and Errata Etymology
By Nat Torkington
May 19, 2011
Right to Access the Internet -- a survey of different countries' rights to access the Internet. Peace Through Statistics -- three ex-Yugoslavian statisticians nominated for Nobel Peace Prize. In war-torn and impoverished countries, statistics provides a welcome arena in which science runs independent of ethnicity and religion. With so few resources, many countries are graduating few, if...
By Nat Torkington
May 13, 2011
Mathematical Intimidation: Driven by the Data (PDF) -- excellent article from Notices of the American Mathematical Society about the flaws in "value-added modelling", the latest fad whereby data about students' results in different classes are analysed to identify the effect of each teacher. People recognize that tests are an imperfect measure of educational success, but when sophisticated mathematics is...
Four short links: 11 May 2011 - API Explorer, Random Sampling, Open Cultural Collections, and Video Lectures
By Nat Torkington
May 11, 2011
webshell -- command-line tool for debugging/exploring APIs, open sourced (Apache v2) and written in node.js. (via Sean Coates) sample -- command-line filter for random sampling of input. Useful when you've got heaps of data and want to run your algorithms on a random sample of it. (via Scott Vokes) Yale Offers Open Access To PD Materials in Collections --...
Big data: Global good or zero-sum arms race? - It remains to be seen if big data will catalyze exponential growth.
By Jim Stogdill
April 12, 2011
Will a big data revolution dramatically change lives, or will it instead yield a middle class feel-good machine that's irrelevant to the working poor?
Four short links: 1 April 2011 - Murky Future for Transparency, Browser Awesome, Future Realized, and Data Bias
By Nat Torkington
April 1, 2011
Transparency Sites to Close -- the US government's open data efforts will close in a few months as a result of the cuts in funding. Browser Wars, Plural (Alex Russell) -- nice rundown of demos of what modern browsers are capable of. Brief Descriptions of Potential Home Information Services (image) -- lovely 1971 piece of futurology, which you can...
Outliers and coexistence are the new normal for big data - Analysis of complete data sets and integration of new tools are leading to revenue growth and new business models.
By Brett Sheppard
March 31, 2011
To benefit from advanced analytics and to study huge data sets in their entirety, many enterprise architectures are evolving into coexistence environments that combine legacy and new systems.
Four short links: 24 March 2011 - Digital Subscriptions, Graph Database, Data Science, and High Speed Compression
By Nat Torkington
March 24, 2011
Digital Subscription Prices -- the NY Times in context. Aie. Trinity -- Microsoft Research graph database. (via Hacker News) Data Science Toolkit -- prepackaged EC2 image of most useful data tools. (via Pete Warden) Snappy -- Google's open sourced compression library, as used in BigTable and MapReduce. Emphasis is on speed, with resulting lack of quality in filesize (20-100%...
By Nat Torkington
March 17, 2011
The Open Data Manual -- a HOWTO for organisations wanting to open up data. This report discusses legal, social and technical aspects of open data. The manual can be used by anyone but is especially designed for those seeking to open up data. It discusses the why, what and how of open data — why to go open, what...
Four short links: 14 March 2011 - Future Retrospective, Political Entrepreneurs, Library DRM, and In-Database Analytics
By Nat Torkington
March 14, 2011
A History of the Future in 100 Objects (Kickstarter) -- blog+podcast+video+book project, to have future historians tell the story of our century in 100 objects. The BBC show that inspired it was brilliant, and I rather suspect this will be too. It's a clever way to tell a story of the future (his hardest problem will be creating a...
Four short links: 24 February 2011 - Network Snooping, Traffic Growth, Data Munging, and Open Interop
By Nat Torkington
February 24, 2011
Charles -- a debugging proxy that lets a developer view all HTTP and SSL traffic between their machine and the Internet. (via Andy Baio's excellent "How I Indexed The Daily") The Rise and Rise of Mobile Broadband -- the Blackberry is now the standard measure of traffic, apparently. The outcome is simple - Cisco estimates that global mobile data...
Mind-blowing, world-changing technology by the numbers - Facts and humor in this video illustrate the reach and impact of new technology.
By Jonathan Reichental, Ph.D.
February 16, 2011
This is a golden age of technology. Almost anyone with modest technology such as an internet connection or a mobile phone can have an impact on the world. This video is just a small slice of the staggering numbers and impact of technology that we witness today.
By Nat Torkington
February 7, 2011
Need faster machine learning? Take a set-oriented approach - How a days-long data process was completed in minutes.
By Roger Magoulas
January 28, 2011
We recently faced the type of big data challenge we expect to become increasingly common: scaling up the performance of a machine learning classifier for a large set of unstructured data. In this post, we explain how a set-oriented approach led to huge performance gains.
Will data warehousing survive the advent of big data? - Analysis: How big data and traditional data warehousing can coexist.
By Barry Devlin
January 27, 2011
Data warehousing — and information management as a whole — must evolve in a radically new direction if we are to manage big data properly and solve the key issue of finding implicit meaning in data.
Four short links: 25 January 2011 - Scalable Scraping, iPad Tactility, Emotional Failbots, and Asking Good Questions
By Nat Torkington
January 25, 2011
node.io -- distributed node.js-based scraper system. Joystick-It -- adhesive joystick for the iPad. Compare the Fling analogue joystick. Tactile accessories for the iPad—hot new product category or futile attempt to make a stripped-down demi-computer into an aftermarket pimped-out hackomatic? (via Aza Raskin on Twitter) Programmed for Love (Chronicle of Higher Education) -- Sherry Turkle sees the danger in social...
Four short links: 7 January 2011 - User Experience, Big Data Case Study, DimDim Acquired, Secret to the Web
By Nat Torkington
January 7, 2011
Users Can Self-Report Problems -- users self-report 50% of the problems that professional usability testing uncovers, and they find problems that usability testers don't. (The other way to look at this is: self-reporting only finds half the actual problems in a site) The Learning Behind Gmail Priority Inbox (PDF) -- challenges faced by Gmail Priority Inbox and how they...
Everyone loves a science fair - Get your submission in for the Strata Conference Science Fair by January 14.
By Alistair Croll
January 4, 2011
Strata's science fair will showcase the creative edges of big data. If you have an interesting tool or technology to show -- the more beta, the better -- let us know.