Data – O’Reilly

How do use cases benefit from real-time processing?

By Jesse Anderson

Learn how real-time data processing benefits certain use cases.

What does dysfunction look like on a data team?

By Jesse Anderson

Learn to identify problems that may indicate data team dysfunction.

Take the 2018 Data Science Salary Survey

As a data professional, you are invited to share your valuable insights. Help us understand the demographics, work environments, tools, and compensation of practitioners in our growing field. All responses are reported in aggregate to ensure your anonymity. The survey takes approximately 5-10 minutes to complete.

JupyterHub on Google Cloud

By Andrew Odewahn

A step-by-step tutorial on how to install and run JupyterHub on Google Cloud.

How do I integrate Logstash with Amazon’s Elasticsearch Service (ES)?

By Frank Kane

Learn the somewhat quirky process for integrating Logstash with the Amazon Elasticsearch Service.

How do I configure access policies within Amazon’s Elasticsearch Service (ES)?

By Frank Kane

Learn to configure the access policies crucial to working successfully with the Amazon Elasticsearch Service.

How do I connect to Kibana from Amazon’s Elasticsearch Service (ES)?

By Frank Kane

Explore techniques that allow specific IP addresses or proxy servers to access Kibana, protect your ES cluster, and block entry by unauthorized users.

Probabilistic programming from scratch

By Mike Lee Williams

Working with uncertainty in real-world data.

How do I run an Apache Spark script on an Amazon Elastic MapReduce (EMR) cluster?

By Frank Kane

Learn how to use steps in the EMR console to schedule and run Spark scripts stored in Amazon S3, on both new and existing clusters.

How do I package a Spark Scala script with SBT for use on an Amazon Elastic MapReduce (EMR) cluster?

By Frank Kane

Learn how to create and structure your Scala script, then use SBT to compile it into a JAR file that runs on a distributed Spark cluster.

How do I configure Apache Spark on an Amazon Elastic MapReduce (EMR) cluster?

By Frank Kane

Learn how to manage Apache Spark configuration overrides for an AWS Elastic MapReduce cluster to save time and money.

How do I connect to my Amazon Elastic MapReduce (EMR) cluster with SSH?

By Frank Kane

Learn how to use SSH to connect to the master node of your Elastic MapReduce (EMR) cluster.

How can I run Hive queries on my Amazon Elastic MapReduce (EMR) cluster?

By Frank Kane

Learn three different ways of running Hive queries on your EMR cluster: by script via terminal, the Hue web interface, or steps in the EMR console.

How do I connect to the web user interfaces (UIs) on my Hadoop cluster using Amazon’s Elastic MapReduce (EMR) service?

By Frank Kane

Learn how to set up an SSH tunnel and web proxy to use tools like Hue, Zeppelin, and ResourceManager.

Accelerate analytics and AI innovations with Intel

By Ziya Ma

Ziya Ma outlines the challenges for applying machine learning and deep learning at scale and shares solutions that Intel has enabled for customers and partners.

Machine learning is a moonshot for us all

By Darren Strange

Darren Strange asks: What part will we each play in what is sure to be one of the most exciting times in computer science?

Another one bytes the dust

By Paul Brook

Using the music industry as an example, Paul Brook shows how modern information points bring new data that changes the way an organization makes decisions.