Four short links: 17 June 2013. By Nat Torkington, June 17, 2013. Weekend Reads on Deep Learning (Alex Dong) — an article and two videos unpacking “deep learning” such as multilayer neural networks. The Internet of Actual Things — “I have 10 reliable activations remaining,” your bulb will report via some ridiculous …

Four short links: 12 June 2013. By Nat Torkington, June 12, 2013. geogit — opengeo project exploring the use of distributed management of spatial data. [...] adapts [git's] core concepts to handle versioning of geospatial data. Shapefiles, PostGIS or SpatiaLite data stored in a change-tracking repository, with all the fun git features …

Four short links: 11 June 2013. By Nat Torkington, June 11, 2013. For Example — amazing discussion of 3D visualization techniques, full of examples using the D3.js library and bl.ocks.org example gist system. Gorgeous and informative. Anti-Gravity 3D Printer — uses strands to sculpt on any surface. (via Slashdot) How 3D Printing …

Four short links: 7 June 2013. By Nat Torkington, June 7, 2013. Accumulo — NSA’s BigTable implementation, released as an Apache project. How the Robots Lost (Business Week) — the decline of high-frequency trading profits (basically, markets worked and imbalances in speed and knowledge have been corrected). Notable for the regulators getting …

Big data vs. big reality. By Mike Barlow, June 5, 2013. This post originally appeared on Cumulus Partners. It’s republished with permission. Quentin Hardy’s recent post in the Bits blog of The New York Times touched on the gap between representation and reality that is a core element of practically every …

Patients matter most, but technology matters a lot. By Andy Oram, June 4, 2013. Computing practices that used to be relegated to experimental outposts are now taking up residence at the center of the health care field.
From natural language processing to machine learning to predictive modeling, you see people promising at the health …

Four short links: 4 June 2013. By Nat Torkington, June 4, 2013. WeevilScout — browser app that turns your browser into a worker for distributed computation tasks. See the poster (PDF). (via Ben Lorica) sregex (Github) — A non-backtracking regex engine library for large data streams. See also slide notes from a …

Big data vs. big reality. By Mike Barlow, June 3, 2013. This post originally appeared on Cumulus Partners. It’s republished with permission. Quentin Hardy’s recent post in the Bits blog of The New York Times touched on the gap between representation and reality that is a core element of practically every …

Understanding skepticism. By Mike Loukides, May 31, 2013. I’d like to correct the impression, given by Derrick Harris on GigaOm, that I’m part of a backlash against “big data.” I’m not skeptical about data or the power of data, but you don’t have to look very far or …

Strata Week: Can your passwords stand up to a cracker? By Jenn Webb, May 31, 2013. Companies, developers need to do more to increase password security. Google urged users this week to take more care in creating passwords. In a post on the Google Blog, Google Software Engineer Diana Smetters offered some guidelines, including using a …

How signals, geometry, and topology are influencing data science. By Ben Lorica, May 24, 2013. I’ve been noticing unlikely areas of mathematics pop up in data analysis. While signal processing is a natural fit, topology, differential and algebraic geometry aren’t exactly areas you associate with data science. But upon further reflection perhaps it shouldn’t be so …

Looking ahead to a world of data-dominated decisions. By Andy Oram, May 21, 2013. Measuring a world-shaking trend with feet planted in every area of human endeavor cannot be achieved in a popular book of 200 pages, but one has to start somewhere.
I am happy to recommend the adept efforts of Viktor Mayer-Schönberger …

Improving options for unlocking your graph data. By Ben Lorica, May 19, 2013. The popular open source project GraphLab received a major boost early this week when a new company comprised of its founding developers raised funding to develop analytic tools for graph data sets. GraphLab Inc. will continue to use the open …

Google I/O, Big Data Adolescence, Visualization, and the Future of Open Source. By Adam Flaherty, May 17, 2013. Google I/O: O’Reilly Editor Rachel Roumeliotis reports from the conference floor. Big Data, Cool Kids: Fumbling toward the adolescence of big data tools. Code as Art: Interactive Data Visualization for the Web author Scott Murray on becoming a code artist. …

Six disruptive possibilities from big data. By Jeff Needham, May 15, 2013. My new book, Disruptive Possibilities: How Big Data Changes Everything, is derived directly from my experience as a performance and platform architect in the old enterprise world and the new, Internet-scale world. I pre-date the Hadoop crew at Yahoo!, but …

Big data, cool kids. By Edd Dumbill, May 13, 2013. The big data world is a confusing place. We’re no longer in a market dominated mostly by relational databases, and the alternatives have multiplied in a baby boom of diversity. These child prodigies of the data scene show great promise …

Genomics and Privacy at the Crossroads. By James Turner, May 13, 2013. Two weeks ago, I had the privilege to attend the 2013 Genomes, Environments and Traits conference in Boston, as a participant of Harvard Medical School’s Personal Genome Project. Several hundred of us attended the conference, eager to learn what new breakthroughs might …

Four short links: 10 May 2013. By Nat Torkington, May 10, 2013. The Remixing Dilemma — summary of research on remixed projects, finding that (1) Projects with moderate amounts of code are remixed more often than either very simple or very complex projects.
(2) Projects by more prominent creators are more generative. …

Genomics and Privacy at the Crossroads. By James Turner, May 9, 2013. Two weeks ago, I had the privilege to attend the 2013 Genomes, Environments and Traits conference in Boston, as a participant of Harvard Medical School’s Personal Genome Project. Several hundred of us attended the conference, eager to learn what new breakthroughs might …

Steering the ship that is data science. By Q Ethan McCallum, May 7, 2013. Mike Loukides recently recapped a conversation we’d had about leading indicators for data science efforts in an organization. We also pondered where the role of data scientist is headed and realized we could treat software development as a prototype case. …

Tachyon: An open source, distributed, fault-tolerant, in-memory file system. By Ben Lorica, April 28, 2013. In earlier posts I’ve written about how Spark and Shark run much faster than Hadoop and Hive by caching data sets in-memory. But suppose one wants to share datasets across jobs/frameworks, while retaining speed gains garnered by being in-memory? An …

Four short links: 23 April 2013. By Nat Torkington, April 23, 2013. Drawscript — Processing for Illustrator. (via BERG London) Archive Team Warrior — a virtual archiving appliance. You can run it to help with the ArchiveTeam archiving efforts. It will download sites and upload them to our archive. (via Ed Vielmetti) …

Four short links: 19 April 2013. By Nat Torkington, April 19, 2013. Bruce Sterling on Disruption — If more computation, and more networking, was going to make the world prosperous, we’d be living in a prosperous world. And we’re not. Obviously we’re living in a Depression.
Slow first 25% but then it …

Four short links: 18 April 2013. By Nat Torkington, April 18, 2013. The Well Deserved Fortune of Satoshi Nakamoto — I can’t assure with 100% certainty that the all the black dots are owned by Satoshi, but almost all are owned by a single entity, and that entity began mining right from …

Four short links: 16 April 2013. By Nat Torkington, April 16, 2013. Triage — iPhone app to quickly triage your email in your downtime. See also the backstory. Awesome UI. Webcam Pulse Detector — I was wondering how long it would take someone to do the Eulerian video magnification in real code. …

Single server systems can tackle big data. By Ben Lorica, April 13, 2013. About a year ago a blog post from SAP posited that when it comes to analytics, most companies are in the multi-terabyte range: data sizes that are well within the scope of distributed in-memory solutions like Spark, SAP HANA, ScaleOut Software, …

Four short links: 12 April 2013. By Nat Torkington, April 12, 2013. Wikileaks ProjectK Code (Github) — open-sourced map and graph modules behind the Wikileaks code serving Kissinger-era cables. (via Journalism++) Plan Your Digital Afterlife With Inactive Account Manager — you can choose to have your data deleted — after three, six, …

Predictive analytics and data sharing raise civil liberties concerns. By Alex Howard, April 11, 2013. Last winter, around the same time there was a huge row in Congress over the Cyber Intelligence Sharing and Protection Act (CISPA), U.S. Attorney General Holder quietly signed off on expanded rules on government data sharing. The rules allowed the National …

Four short links: 3 April 2013. By Nat Torkington, April 3, 2013. Capn Proto — open source faster protocol buffers (binary data interchange format and RPC system). Saddle — a high performance data manipulation library for Scala. Vega — a visualization grammar, a declarative format for creating, saving and sharing visualization designs.
…

Four short links: 1 April 2013. By Nat Torkington, April 1, 2013. MLDemos — an open-source visualization tool for machine learning algorithms created to help studying and understanding how several algorithms function and how their parameters affect and modify the results in problems of classification, regression, clustering, dimensionality reduction, dynamical systems and …

Four short links: 29 March 2013. By Nat Torkington, March 29, 2013. Titan 0.3 Out — graph database now has full-text, geo, and numeric-range index backends. Mozilla Security Community Do a Reddit AMA — if you wanted a list of sharp web security people to follow on Twitter, you could do a …

The coming of the industrial internet. By Jon Bruner, March 27, 2013. Download this free report (PDF, Mobi, EPUB). The big machines that define modern life — cars, airplanes, furnaces, and so forth — have become exquisitely efficient, safe, and responsive over the last century through constant mechanical refinement. But mechanical refinement has …

Four short links: 25 March 2013. By Nat Torkington, March 25, 2013. Analytics for Learning — Since doing good learning analytics is hard, we often do easy learning analytics and pretend that they are good instead. But pretending doesn’t make it so. (via Dan Meyer) Reproducible Research — a list of links …

Four short links: 20 March 2013. By Nat Torkington, March 20, 2013. Digital Music Consumption on the Internet: Evidence from Clickstream Data (Scribd) — The goal of this paper is to analyze the behavior of digital music consumers on the Internet.
Using clickstream data on a panel of more than 16,000 European …

Four short links: 15 March 2013. By Nat Torkington, March 15, 2013. Consumer Heterogeneity and Paid Search Effectiveness: A Large Scale Field Experiment (PDF) — We find that new and infrequent users are positively influenced by ads but that existing loyal users whose purchasing behavior is not influenced by paid search account …

Four short links: 11 March 2013. By Nat Torkington, March 11, 2013. Adventures in the Ransom Trade — between insurance, protection, and ransoms, Sean Gourley describes it as “one of the more interesting grey markets.” (via Sean Gourley) About High School Computer Science Teachers (Selena Deckelmann) — Selena gets an education in …

Strata Week: Data brokers know more about us than we know. By Jenn Webb, March 8, 2013. The lowdown on data brokers, and the use of sensor data in the workplace. ProPublica’s Lois Beckett takes a look this week at data brokers. She says that though Congress is making moves to make such companies give consumers more …

Untangling algorithmic illusions from reality in big data. By Alex Howard, March 6, 2013. Microsoft principal researcher Kate Crawford (@katecrawford) gave a strong talk at last week’s Strata Conference in Santa Clara, Calif. about the limits of big data. She pointed out potential biases in data collection, questioned who may be excluded from it, …

Untangling algorithmic illusions from reality in big data. By Alex Howard, March 4, 2013. Microsoft principal researcher Kate Crawford (@katecrawford) gave a strong talk at last week’s Strata Conference in Santa Clara, Calif. about the limits of big data. She pointed out potential biases in data collection, questioned who may be excluded from it, …

Four short links: 4 March 2013. By Nat Torkington, March 4, 2013. Life Inside the Aaron Swartz Investigation — do hard things and risk failure. What else are we on this earth for?
crossfilter — open source (Apache 2) JavaScript library for exploring large multivariate datasets in the browser. Crossfilter supports extremely …

Data Science Tools: Fast, easy to use, and scalable. By Ben Lorica, March 3, 2013. Here are a few observations based on conversations I had during the just-concluded Strata Santa Clara conference. Spark is attracting attention. I’ve written numerous times about components of the Berkeley Data Analytics Stack (Spark, Shark, MLbase). Two Spark-related sessions …

Four short links: 27 February 2013. By Nat Torkington, February 27, 2013. Open Source Cancer Informatics Software (NCIP) — we have tackled the main recommendation that came out of our June meeting with open-source thought leaders: Keep it simple. Make barriers to entry as low as possible, and reuse available resources. Specifically, …

On reading Mike Barlow’s “Real-Time Big Data Analytics: Emerging Architecture”. By Ann Spencer, February 26, 2013. During a break in between offsite meetings that Edd and I were attending the other day, he asked me, “did you read the Barlow piece?” “Umm, no.” I replied sheepishly. Insert a sidelong glance from Edd that said much without …

Four short links: 26 Feb 2013. By Nat Torkington, February 26, 2013. School of Data — free online courses around data science and visualization. libshorttext — classify and analyse short-text of things like titles, questions, sentences, and short messages. MIT-style open source license, Python and C++ source. Letterboxd — a site for …

Big data is dead, long live big data: Thoughts heading to Strata. By Mike Loukides, February 25, 2013. A recent VentureBeat article argues that “Big Data” is dead. It’s been killed by marketers. That’s an understandable frustration (and a little ironic to read about it in that particular venue).
As I said sarcastically the other day, “Put your …

Strata Week: The data divide is growing. By Jenn Webb, February 22, 2013. Data mining opens new doors for discrimination, marginalization. In a post at Scientific American, Michael Fertik took a look at how Internet data collection practices are beginning to create an unequal — even discriminatory — online environment. Fertik writes: “For …

BigData Top 100 Initiative. By O'Reilly Strata, February 20, 2013. By Milind Bhandarkar, Chaitan Baru, Raghunath Nambiar, Meikel Poess, and Dr. Tilmann Rabl. Big data systems are characterized by their flexibility in processing diverse data genres, such as transaction logs, connection graphs, and natural language text, with algorithms characterized by …

Four short links: 20 February 2013. By Nat Torkington, February 20, 2013. The Network of Global Control (PLoS One) — We find that transnational corporations form a giant bow-tie structure and that a large portion of control flows to a small tightly-knit core of financial institutions. [...] From an empirical point of …

Four short links: 18 February 2013. By Nat Torkington, February 18, 2013. crowy — open source social media aggregator. Raytheon makes Social Media Tracking Software (Guardian) — the technology was shared with US government and industry as part of a joint research and development effort, in 2010, to help build a national …