Frank Kane

Frank Kane spent nine years at Amazon and IMDb developing and managing the technology that delivers product recommendations to hundreds of millions of customers. Frank holds 17 patents in the fields of distributed computing, data mining, and machine learning. He now runs Sundog Software, a software company focused on virtual reality technology and on Big Data analysis training. He is the author of multiple titles on Spark, MapReduce, Spark Streaming, and Python.

Content

Enabling reliable, secure collaboration on data science and machine learning projects

August 21, 2018

A conversation with Paul Taylor, chief architect for Watson Data and AI and an IBM Fellow.

Big data’s biggest secret: Hyperparameter tuning

August 23, 2017

The toughest part of machine learning with Spark isn't what you think it is.

How do I integrate Logstash with Amazon’s Elasticsearch Service (ES)?

June 27, 2017

Learn the somewhat quirky process for integrating Logstash with the Amazon Elasticsearch Service.

How do I connect to Kibana from Amazon’s Elasticsearch Service (ES)?

Explore techniques that grant Kibana access to specific IP addresses or proxy servers, protect your ES cluster, and block entry by unauthorized users.

How do I configure access policies within Amazon’s Elasticsearch Service (ES)?

Learn to configure the access policies crucial to working successfully with the Amazon Elasticsearch Service.

How do I configure Apache Spark on an Amazon Elastic MapReduce (EMR) cluster?

June 9, 2017

Learn how to manage Apache Spark configuration overrides for an AWS Elastic MapReduce cluster to save time and money.

How do I package a Spark Scala script with SBT for use on an Amazon Elastic MapReduce (EMR) cluster?

Learn how to create, structure, and compile your Scala script into a JAR file with SBT so that it can run on a distributed Spark cluster.

How do I run an Apache Spark script on an Amazon Elastic MapReduce (EMR) cluster?

Learn how to use steps in the EMR console to schedule and run Spark scripts stored in Amazon S3, on both new and existing clusters.

How do I connect to my Amazon Elastic MapReduce (EMR) cluster with SSH?

June 7, 2017

Learn how to use SSH to connect to the master node of your Elastic MapReduce (EMR) cluster.

How do I connect to the web user interfaces (UIs) on my Hadoop cluster using Amazon’s Elastic MapReduce (EMR) service?

Learn how to set up an SSH tunnel and web proxy to use tools like Hue, Zeppelin, and ResourceManager.

How can I run Hive queries on my Amazon Elastic MapReduce (EMR) cluster?

Learn three different ways of running Hive queries on your EMR cluster: by script from a terminal, through the Hue web interface, or as steps in the EMR console.