Strata + Hadoop World 2016 - London, United Kingdom: Video Compilation

Video description

The sold-out Strata + Hadoop World London 2016 is a tour through the giant city of data, led by expert guides who know just where to go. There is a lot to see in this video compilation, which shows you every bit: 211 speakers, 108 sessions, 20 keynotes, and 14 tutorials. Start your trip with a long-form tutorial exploring data territory such as an 8-hour deep dive into all phases of managing Hadoop clusters; an 8-hour excursion through the hardcore data science world of data management, machine learning, natural language processing, crowd-sourcing, and algorithm design; an 8-hour Spark camp on all things Apache; or 3½-hour tours on D3 data visualizations, artificial intelligence, optimizing workflow in R, and more. Want something shorter? Try a mind-blowing conference session (30-40 minutes each) on topics ranging from H2O and TensorFlow to e-commerce A/B testing, predictive analysis, and natural language processing. Looking for something different? How about streaming analytics at 300 billion events per day with Kafka, Samza, and Druid, or using Spark and Hadoop in high-speed trading environments? It’s a travelogue of data wonders with something for everyone.

  • Gain front-row access to all 211 speakers, 108 sessions, 20 keynotes, and 14 tutorials
  • Download the videos or view them through O'Reilly's HD player
  • Hear from big data experts at Intel, deepsense.io, IBM, Google, Teradata, and more
  • Watch Cloudera’s Doug Cutting and Tom White predict the future of Apache Hadoop
  • Learn about Spark, Kafka Streams, Kudu, Kappa, Drill, Heron, Flink, Eagle, and NiFi
  • Be inspired by data innovations in cancer research, epilepsy monitoring, and minefield clearing
  • Explore Scotland's Data Lab, the Danish Agency for Digitisation, and the ethics of data processing
  • Hear about big data use at LinkedIn, Intuit, Uber, Etsy, HPE, Docker, Facebook, and Microsoft

Table of contents

  1. Keynotes
    1. Modern data strategy and CERN - Mike Olson (Cloudera) and Manuel Martin Marquez (CERN)
    2. The Internet of Things: It’s the (sensor) data, stupid - Martin Willcox (Teradata International)
    3. Data relativism and the rise of context services - Joe Hellerstein (UC Berkeley)
    4. Saving whales with deep learning - Piotr Niedzwiedz (deepsense.io)
    5. Data wants to be shareable - Mona Vernon (Thomson Reuters Labs)
    6. Analytics innovation in cancer research - Gilad Olswang (Intel)
    7. The future of (artificial) intelligence - Stuart Russell (UC Berkeley)
    8. The curious case of the data scientist - David Selby (IBM)
    9. Drawing insights from imperfection: A year of Dear Data - Stefanie Posavec (NA)
    10. Big data at Google: Solving problems at scale - Jordan Tigani (Google)
    11. The other half of big data - Tricia Wang (Constellate Data)
    12. Bringing big data and design to policy making - Cat Drew (UK Policy Lab and Government Data Science Partnership)
    13. Machine learning for human rights advocacy: Big benefits, serious consequences - Megan Price (Human Rights Data Analysis Group)
  2. Data innovations
    1. A hands-on introduction to Apache Kafka - Ian Wrigley (Confluent) - Part 1
    2. A hands-on introduction to Apache Kafka - Ian Wrigley (Confluent) - Part 2
    3. AI for business: A hands-on introduction to what machine learning can do - Marc Warner (ASI) - Part 1
    4. AI for business: A hands-on introduction to what machine learning can do - Marc Warner (ASI) - Part 2
    5. Experiments in The Data Lab: Creating a national hub for data science in Scotland - Brian Hills (The Data Lab)
    6. The innards of H2O - Cliff Click (0xdata)
    7. TensorFlow: Machine learning for everyone - Sherry Moore (Google)
    8. The future of column-oriented data processing with Arrow and Parquet - Julien Le Dem (Dremio)
    9. 90% of the world's trade is transported by sea, but what data do we have about ship activity worldwide? - Tal Guttman (Windward)
    10. The evolution of massive-scale data processing - Tyler Akidau (Google)
    11. Streaming analytics at 300 billion events per day with Kafka, Samza, and Druid - Xavier Léauté (Metamarkets)
    12. Triggers in Apache Beam (incubating): User-controlled balance of completeness, latency, and cost in streaming big data pipelines - Kenneth Knowles (Google)
    13. Introducing Kafka Streams, Apache Kafka's new stream processing library - Neha Narkhede (Confluent)
  3. Data science advanced analytics
    1. R and reproducible reporting for big data - Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions), and Richard Pugh (Mango Solutions) - Part 1
    2. R and reproducible reporting for big data - Aimee Gott (Mango Solutions), Mark Sellors (Mango Solutions), and Richard Pugh (Mango Solutions) - Part 2
    3. Deep learning and natural language processing with Spark - Andy Petrella (Data Fellas) and Melanie Warrick (Skymind)
    4. Semantic natural language understanding with Spark Streaming, UIMA, and machine-learned ontologies - David Talby (Atigeo) and Claudiu Branzan (Atigeo)
    5. Sightseeing, venues, and friends: Predictive analytics with Spark ML and Cassandra - Natalino Busa (Teradata)
    6. Introduction to generalized low-rank models and missing values - Jo-fai Chow (H2O.ai)
    7. Petascale genomics - Tom White (Cloudera)
    8. Panel: The future of intelligence - Marc Warner (ASI), Stuart Russell (UC Berkeley), and Jaan Tallinn (CSER)
    9. The polyglot data scientist - Jeroen Janssens (Tilburg University)
    10. Beyond guide dogs: How advances in deep learning can empower the blind community - Anirudh Koul (Microsoft) and Saqib Shaikh (Microsoft)
    11. Predicting out-of-sample performance of a large cohort of trading algorithms with machine learning - Thomas Wiecki (Quantopian)
    12. Scala: The unpredicted lingua franca for data science - Andy Petrella (Data Fellas) and Dean Wampler (Lightbend)
    13. Land mine or Coke can: Machine learning from GPR data - Dirk Gorissen (Skycap | World Bank)
    14. Data modeling for data science: Simplify your workload with complex types - Marcel Kornacker (Cloudera)
    15. Applications of natural language understanding: Tools and technologies - Alyona Medelyan (Entopix)
  4. Data-driven business
    1. Developing a modern enterprise data strategy - Scott Kurth (Silicon Valley Data Science) and John Akred (Silicon Valley Data Science) - Part 1
    2. Developing a modern enterprise data strategy - Scott Kurth (Silicon Valley Data Science) and John Akred (Silicon Valley Data Science) - Part 2
    3. The Bag of Little Bootstraps: A/B experimenting with big data made small - Emily Sommer (Etsy)
    4. Beyond the hunch: Communicating uncertainty for effective data-driven business - Abigail Lebrecht (uSwitch)
    5. What’s next for music services? The answer is in the data - Paul Shannon (7digital Group Plc) and Alan Hannaway (7digital)
    6. Intuit, Uber, and Etsy: Scaling innovation with A/B testing - Lucian Lita (Intuit), Mita Mahadevan (Intuit Inc.), Shalin Mantri (Uber), and Gabrielle Gianelli (Etsy)
    7. How AI revolutionizes business strategy - Kenneth Cukier (The Economist)
    8. The best university in the world - Duncan Ross (TES Global) and Francine Bennett (Mastodon C)
    9. 20 percent blissful, 80 percent ignorance - Phil Harvey (DataShaka)
    10. Data gravity and complex systems - Dave McCrory (Basho Technologies)
    11. Analytics: A first-class architectural concern in a SaaS platform - Calum Murray (Intuit)
    12. Situational awareness: On the importance of mapping - Simon Wardley (Leading Edge Forum (CSC))
    13. Data-driven businesses: Disrupting business models with big data - Carme Artigas (Synergic Partners)
    14. Building better cross-team communication - Ellen Friedman (Independent)
    15. What Esperanto can teach us about collaboration in the big data environment - Anne Sophie Roessler (Dataiku)
    16. What should I eat: The road map to better food and smarter nutrition science - Taryn Fixel (ingredient1)
    17. Your TOS is not informed consent: Ethical experimentation for the Web - Rachel Shadoan (Akashic Labs)
    18. How to ask good questions - Farrah Bostic (The Difference Engine)
    19. Every business is a data business - Mona Vernon (Thomson Reuters Labs)
    20. Data scientists everywhere - Kim Nilsson (Pivigo)
    21. Harnessing big data to transform the energy sector - Erik Nygard (Limejump Ltd)
    22. Data science as catalyst of Autodesk's business model transformation - Laurent Gaubert (Autodesk)
    23. My AlgorithmicMe knows me better than Google or my mum - Majken Sander (BusinessAnalyst.dk)
    24. Otto’s little army of real-time bots: How online retailers can defend shopping carts and retarget customers in real time - Rupert Steffner (Otto GmbH Co. KG)
    25. My AlgorithmicMe: The "Who is. . .?" of the future - Majken Sander (BusinessAnalyst.dk) and Joerg Blumtritt (Datarella)
    26. Demonstrating the art of the possible with Spark and Hadoop - Joy Spohn (IBM) and Adrian Houselander (IBM)
  5. Enterprise adoption
    1. Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 1
    2. Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 2
    3. Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 3
    4. Apache Hadoop operations for production systems - Jayesh Seshadri (Cloudera), Justin Hancock (Cloudera), Mark Samson (Cloudera), and Wellington Chevreuil (Cloudera) - Part 4
    5. Architecting a data platform - John Akred (Silicon Valley Data Science) and Stephen O'Sullivan (Silicon Valley Data Science) - Part 1
    6. Architecting a data platform - John Akred (Silicon Valley Data Science) and Stephen O'Sullivan (Silicon Valley Data Science) - Part 2
    7. Big SQL: The future of in-cluster analytics and enterprise adoption - Moderated by: Surya Mukherjee (Ovum) - Panelists: Lloyd Tabb (Looker Data Science), Nick Amabile (FullStack Analytics), Rex Gibson (Knewton), dp Suresh (Yahoo!)
    8. BI on Hadoop: What are your options? - Tomer Shiran (Dremio)
  6. Hadoop internals development
    1. Hadoop application architectures: Fraud detection - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent), and Ted Malaska (Cloudera) - Part 1
    2. Hadoop application architectures: Fraud detection - Jonathan Seidman (Cloudera), Mark Grover (Cloudera), Gwen Shapira (Confluent), and Ted Malaska (Cloudera) - Part 2
    3. The next 10 years of Apache Hadoop - Doug Cutting (Cloudera), Tom White (Cloudera), and Ben Lorica (O'Reilly Media)
    4. Hadoop's storage gap: Resolving transactional access/analytic performance trade-offs with Apache Kudu (incubating) - Todd Lipcon (Cloudera, Inc.)
    5. Building real-time BI systems with HDFS and Kudu - Ruhollah Farchtchi (Zoomdata)
    6. Why is my Hadoop job slow? - Bikas Saha (Hortonworks Inc)
    7. Scaling out to 10 clusters, 1,000 users, and 10,000 flows: The Dali experience at LinkedIn - Carl Steinbach (LinkedIn)
    8. Floating elephants: Developing data wrangling systems on Docker - Chad Metcalf (Docker) and Seshadri Mahalingam (Trifacta)
  7. Data 101
    1. Developing data scientists: Breaking the skills cap - Yuelin Li (ASI)
    2. The business case for Spark, Kafka, and friends - John Akred (Silicon Valley Data Science)
    3. What is AI? - Melanie Warrick (Skymind)
  8. Hardcore data science
    1. Mobile advertising: The preclick experience - Mounia Lalmas (Yahoo)
    2. Analytics for large-scale time series and event data - Ira Cohen (Anodot)
    3. Recent trends in recommender systems - Danny Bickson (1972)
    4. Visual data analysis for intelligent machines - Francesca Odone (University of Genova)
    5. Deep learning for web-scale text - Piotr Mirowski (Google DeepMind)
    6. Detecting anomalies in the real world - Alessandra Staglianò (The ASI)
    7. Recent advances in deep learning research - Olivier Grisel (Inria scikit-learn)
    8. Hardcore data science in practice - Mikio Braun (Zalando SE)
    9. Data science++: Improving data science by adding domain understanding - Matthew Smith (Microsoft Research)
    10. A methodology for taxonomy generation and maintenance from large collections of textual data - Roxana Danger (reed.co.uk)
    11. A functional data integration pipeline using Scala - Johannes Bauer (Cambridge Analytica)
  9. IoT real-time
    1. An Introduction to time series with Team Apache - Patrick McFadin (DataStax) - Part 1
    2. An Introduction to time series with Team Apache - Patrick McFadin (DataStax) - Part 2
    3. What does your smart car know about you? - Charles Givre (Booz | Allen | Hamilton)
    4. When it absolutely, positively has to be there: Reliability guarantees in Kafka - Gwen Shapira (Confluent) and Jeff Holoman (Cloudera)
    5. Real-time epilepsy monitoring with smart clothing: A case study in time series, open source technology, and connected devices - Eric Kramer (Dataiku)
    6. Industrial big data and sensor time series data: Different but not difficult - Gopal GopalKrishnan (OSIsoft, LLC.) and Hoa Tram (OSIsoft)
    7. High-performance data flow with a GUI—and guts - Simon Elliston Ball (Hortonworks)
    8. Watermarks: Time and progress in streaming dataflow and beyond - Slava Chernyak (Google Inc.)
    9. Putting Kafka into overdrive - Gwen Shapira (Confluent) and Todd Palino (LinkedIn)
    10. Stream analytics in the enterprise: A look at Intel’s internal IoT implementation - Moty Fania (Intel)
    11. Legacy or Kafka? What an ideal messaging system should bring to Hadoop - Jim Scott (MapR Technologies, Inc.)
    12. Making sense of exactly-once semantics - Flavio Junqueira (Confluent)
    13. Processing billions of events in real time with Heron - Karthik Ramasamy (Twitter)
    14. Data privacy in the age of the Internet of Things - Alasdair Allan (Babilim Light Industries)
    15. Kappa architecture in the telecom industry - Ignacio Manuel Mulas Viela (Ericsson) and Nicolas Seyvet (Ericsson AB)
  10. Spark beyond
    1. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera), and Krishna Sankar (Volvo Cars) - Part 1
    2. Building machine-learning apps with Spark: MLlib, ML Pipelines, and GraphX - Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera), and Krishna Sankar (Volvo Cars) - Part 2
    3. Spark 2.0: What’s next? - Tathagata Das (Databricks)
    4. Anomaly detection in telecom with Spark - Ted Dunning (MapR Technologies)
    5. Beyond shuffling: Tips and tricks for scaling Spark jobs - Holden Karau (IBM)
    6. Securing Apache Spark on production Hadoop clusters - Kostas Sakellis (Cloudera)
    7. The future of streaming in Spark: Structured streaming - Tathagata Das (Databricks)
    8. Introduction to Apache Spark for Java and Scala developers - Ted Malaska (Cloudera)
    9. Breaking Spark: Top five mistakes to avoid when using Apache Spark in production - Neelesh Srinivas Salian (Cloudera)
  11. Visualization user experience
    1. Introduction to visualizations using D3 - Brian Suda ((optional.is)) - Part 1
    2. Introduction to visualizations using D3 - Brian Suda ((optional.is)) - Part 2
    3. Good city life - Daniele Quercia (Bell Labs)
    4. Pixels and place: What online experiences can borrow from offline spaces and vice versa - Kate O'Neill (KO Insights)
    5. Opportunities for hardware acceleration in big data analytics - Kanu Gulati (Zetta Venture Partners)
    6. The rise of the GPU: GPUs will change how you look at big data - Todd Mostak (MapD)
  12. Sponsored
    1. Which whale is it anyway? Face recognition for right whales using deep learning - Robert Bogucki (deepsense.io) and Maciej Klimek (deepsense.io)
    2. Realizing the value of combining the IoT and big data analytics - Frank Saeuberlich (Teradata) and Eliano Marques (Think Big Analytics)
    3. Federated analytics innovation in cancer research - Gilad Olswang (Intel)
    4. Best practices to extract value from Hadoop with predictive analytics - Zoltan Prekopcsak (RapidMiner)
    5. Building a modern data architecture - Ben Sharma (Zaloni)
    6. High-frequency decisioning, from big data to fast data - Tugdual Grall (MapR Technologies)
    7. Avoid big data becoming a big problem - Raghunath Nambiar (Cisco)
    8. Operating batch in the data-driven enterprise - Joe Goldberg (BMC Software Inc.)
    9. Developing a successful big data strategy - Seb Darrington (EMC)
    10. Business transformation and outcomes through big data - Louise Matthews (Hortonworks)
    11. The business bottom line of data lakes: Real-life experiences - Franz Aman (Informatica)
  13. Security
    1. Simplifying Hadoop with RecordService, a secure and unified data access path for compute frameworks - Alex Leblang (Cloudera)
    2. Best practices and solutions to manage and govern a multinational big data platform - Clara Fletcher (Accenture)
    3. HopsWorks: Multitenant Hadoop as a service - Jim Dowling (Swedish ICT - SICS)
  14. Hadoop use cases
    1. Improving the customer experience with big data wrangling on Hadoop - Dan Jermyn (Royal Bank of Scotland) and Connor Carreras (Trifacta)
    2. Simple, fast, and flexible risk aggregation in Hadoop - Deenar Toraskar (Think Reactive)
    3. Risk data aggregation and risk reporting for financial services - Ben Sharma (Zaloni)
    4. The future is now: Leveraging Hadoop for real-time, predictive insights - Steven Noels (NGDATA)
    5. Year 2025: Big data as enabler of fully automated vehicles - Dr. Thomas Beer (Continental) and Felix Werkmeister (Continental)
    6. Analyzing dynamic JSON with Apache Drill - Tomer Shiran (Dremio)
  15. Law, ethics, governance
    1. Denmark is data driven - Mads Hjorth (Danish Agency for Digitisation)
    2. Using data for evil IV: The journey home - Duncan Ross (TES Global) and Francine Bennett (Mastodon C)
    3. Protecting individual privacy in a data-driven world - Jason McFall (Privitar)
    4. Don't build a data swamp: Hadoop governance case studies for financial services - Mark Donsky (Cloudera) and Chang She (Cloudera)

Product information

  • Title: Strata + Hadoop World 2016 - London, United Kingdom: Video Compilation
  • Author(s): O'Reilly Media, Inc.
  • Release date: June 2016
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491944639