Distributed deep learning on Spark
Alexander Ulanov offers an overview of tools and frameworks that have been proposed for performing deep learning on Spark.
Evan Sparks describes the principles behind KeystoneML and introduces its programming model by way of example pipelines in NLP and image classification.
How Spark will fit into—and change—the current ecosystem of distributed computing tools.
Using Python and other tools for natural language processing, sentiment analysis, and data wrangling.
Crunching CERN’s colossal data with scalable analytics
Learn the basics of machine learning and deep learning using TensorFlow.
Using Apache Beam to become data-driven, even before you have big data.
A single, multitenant platform built with open source technologies, based on an understanding of basic common needs.
Kappa architecture and Bayesian models yield quick, accurate analytics in cloud monitoring systems.
Radu Gheorghe demonstrates how to create, retrieve, update, and delete documents in Elasticsearch. He also covers special Elasticsearch fields, like _type, _source, and _version, and the relationship between Elasticsearch shards and Lucene indices.
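The create/retrieve/update/delete cycle the blurb describes maps directly onto Elasticsearch's REST API. As a minimal sketch, the helpers below build the four request shapes against a hypothetical local cluster (the names, the `localhost:9200` URL, and the older type-in-URL path style that matches the `_type` field mentioned above are illustrative assumptions, not code from the talk):

```python
import json

ES = "http://localhost:9200"  # hypothetical local cluster

def index_doc(index, doc_type, doc_id, body):
    # Create or replace a document: PUT /{index}/{type}/{id}
    return ("PUT", f"{ES}/{index}/{doc_type}/{doc_id}", json.dumps(body))

def get_doc(index, doc_type, doc_id):
    # Retrieve: GET /{index}/{type}/{id}
    # The response envelope carries _type, _source, and _version fields.
    return ("GET", f"{ES}/{index}/{doc_type}/{doc_id}", None)

def update_doc(index, doc_type, doc_id, partial):
    # Partial update: POST /{index}/{type}/{id}/_update, body wrapped in "doc"
    return ("POST", f"{ES}/{index}/{doc_type}/{doc_id}/_update",
            json.dumps({"doc": partial}))

def delete_doc(index, doc_type, doc_id):
    # Delete: DELETE /{index}/{type}/{id}
    return ("DELETE", f"{ES}/{index}/{doc_type}/{doc_id}", None)

method, url, body = index_doc("library", "book", "1",
                              {"title": "Lucene in Action"})
```

In practice each tuple would be sent with an HTTP client; updating a document increments its `_version`, which is how Elasticsearch detects concurrent-modification conflicts.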
Bill Loconzolo reveals the lessons learned from building the Intuit Analytics Cloud.
Michael Armbrust and Tathagata Das explain what's new in Spark 2.0, demonstrating how stream processing is now more accessible via the Spark SQL and DataFrame APIs.
Natalino Busa presents the Coral system, a solution for streaming anomaly detection.
Alex Robbins takes an in-depth look at the Python API for Apache Spark. In this segment, he explores RDDs, the central abstraction in Spark and essential knowledge for anyone working in the system.
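The key idea behind RDDs is that transformations (`map`, `filter`) are lazy and only run when an action (`collect`, `reduce`) is called. The toy class below illustrates that semantics in plain Python; it is a deliberately simplified stand-in, not the real PySpark API, and the `MiniRDD` name is invented for this sketch:

```python
from functools import reduce

class MiniRDD:
    """Toy stand-in for an RDD: transformations are queued lazily,
    and nothing executes until an action is invoked."""

    def __init__(self, data, ops=()):
        self._data = data
        self._ops = ops  # queued transformations

    def map(self, f):
        # Lazy: returns a new MiniRDD, does not touch the data yet
        return MiniRDD(self._data, self._ops + (("map", f),))

    def filter(self, f):
        return MiniRDD(self._data, self._ops + (("filter", f),))

    def _materialize(self):
        out = list(self._data)
        for kind, f in self._ops:
            out = [f(x) for x in out] if kind == "map" \
                else [x for x in out if f(x)]
        return out

    def collect(self):   # action: runs the queued pipeline
        return self._materialize()

    def reduce(self, f):  # action
        return reduce(f, self._materialize())

rdd = MiniRDD(range(10))
evens_squared = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
evens_squared.collect()  # → [0, 4, 16, 36, 64]
```

In real PySpark the same pipeline would read almost identically (`sc.parallelize(range(10)).filter(...).map(...).collect()`), with the crucial difference that the data is partitioned across a cluster and each transformation runs in parallel on the partitions.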
Visualizations that show comparisons, connections, and conclusions offer analytical clarity.
The O’Reilly Podcast: Nikolaus Bates-Haus on tools and techniques for addressing data variety and augmentation at scale.
Jonathan Whitmore demonstrates how to install a pivot table extension and showcases its features by examining a dataset of restaurant scores.
Sean Owen and Yann Delacourt cover Spark's architecture, deployment strategies, and use cases, as well as Spark's impact on data science, analytics, and machine learning.
How QoS enables business-critical and low-priority applications to coexist in a single cluster.
With scikit-learn, you can deploy machine learning models in just a few lines of code. Andreas Mueller summarizes the classification, regression, and clustering algorithms in this powerful machine learning library.
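The "few lines of code" claim holds up in practice: scikit-learn's estimator API is load data, split, `fit`, `score`. A minimal classification sketch (the dataset and model choice here are illustrative, not taken from the talk):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset and hold out a test split
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Train a classifier and evaluate on the held-out data
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
score = clf.score(X_te, y_te)  # accuracy on the test split
```

Swapping in a regression or clustering algorithm changes only the estimator line; the `fit`/`predict`/`score` interface stays the same, which is what makes the library easy to summarize across all three families of algorithms.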
Pete Warden walks through popular open source tools from the academic world and shows you, step by step, how to process images with them.
Apache Hadoop co-founders Doug Cutting and Mike Cafarella explore the future of Hadoop.
Eric Frenkiel explains how a trinity of real-time technologies—Kafka, Spark, MemSQL—is enabling Uber and others to power their companies with predictive apps and analytics.
Companies are differentiating themselves by acting on data in real time. But what does “real time” really mean? Jack Norris discusses the challenges of coordinating data flows, analysis, and integration at scale to shape business as it happens.