How do use cases benefit from real-time processing?
Learn how real-time data processing benefits certain use cases.
Our take on the ideas, information, and tools that make data work.
Learn to identify problems that may indicate data team dysfunction.
As a data professional, you are invited to share your valuable insights. Help us gain insight into the demographics, work environments, tools, and compensation of practitioners in our growing field. All responses are reported in aggregate to assure your anonymity. The survey will require approximately 5-10 minutes to complete.
A step-by-step tutorial on how to install and run JupyterHub on gcloud.
Learn the somewhat quirky process for integrating Logstash with the Amazon Elasticsearch Service.
Learn to configure the access policies crucial to working successfully with the Amazon Elasticsearch Service.
Explore techniques that allow specific IP address/proxy server access to Kibana, protect your ES cluster, and block entry by unauthorized users.
Working with uncertainty in real-world data.
Learn how to use steps in the EMR console to schedule and run Spark scripts stored in Amazon S3, on both new and existing clusters.
Learn how to create, structure, and compile your Scala script to a JAR file, and use SBT to run on a distributed Spark cluster.
Learn how to manage Apache Spark configuration overrides for an AWS Elastic MapReduce cluster to save time and money.
Learn how to use SSH to connect to the master node of your Elastic MapReduce (EMR) cluster.
Learn three different ways of running Hive queries on your EMR cluster: by script via terminal, the Hue web interface, or steps in the EMR console.
Learn how to set up an SSH tunnel and web proxy to use tools like Hue, Zeppelin, and ResourceManager.
Ziya Ma outlines the challenges for applying machine learning and deep learning at scale and shares solutions that Intel has enabled for customers and partners.
Darren Strange asks: What part will we each play in what is sure to be one of the most exciting times in computer science?
Using the music industry as an example, Paul Brook shows how modern information points bring new data that changes the way an organization will make decisions.
This excerpt from Jake VanderPlas' Python Data Science Handbook
Learn how to use the gensim Python library to determine the similarity between two or more documents.
Learn how to use Impala’s SQL analytics layer to create a Kudu table.
Learn how hash partitioning affects performance and stability in Kudu.
Learn how to pair two top-tier open source technologies to create scalable data engineering pipelines.
The O’Reilly Podcast: Transforming batch storage into streaming data.
Dinesh Nirmal discusses how your data can help you build the right cognitive systems to engage with your customers.