KeystoneML: Optimized large-scale machine-learning pipelines on Apache Spark

Date: This event took place live on May 17, 2016
Presented by: Evan Sparks
Duration: Approximately 60 minutes
Questions? Please send email to
Moderated by: Ben Lorica

Description:
KeystoneML is an open source software framework developed by the AMPLab for building large-scale machine-learning pipelines that run on Apache Spark. Evan Sparks describes the principles behind KeystoneML and introduces its programming model by way of example pipelines in NLP and image classification. Using these examples, Evan outlines the optimizations that KeystoneML makes to increase training throughput while preserving correctness and presents end-to-end results that demonstrate the scalability of the system to hundreds of nodes.

After this webcast, you'll have learned:
About Evan Sparks
Evan Sparks is a PhD student in computer science at UC Berkeley working in the AMPLab. Evan's research focuses on the design and implementation of distributed systems for large-scale data analysis and machine learning. Prior to Berkeley, he spent several years tackling large-scale data problems as a quantitative financial analyst at MDT Advisers and a product engineer at Recorded Future. He holds a bachelor's degree from Dartmouth College and a master's degree in computer science from UC Berkeley.
Twitter: @evanrsparks

About Ben Lorica
Ben Lorica is the Chief Data Scientist and Director of Content Strategy for Data at O'Reilly Media, Inc. He has applied Business Intelligence, Data Mining, Machine Learning and Statistical Analysis in a variety of settings including Direct Marketing, Consumer and Market Research, Targeted Advertising, Text Mining, and Financial Engineering. His background includes stints with an investment management company, internet startups, and financial services.