NoSQL Databases and Elastic Stack Primer
Published by Pearson
NoSQL databases (DBs) have gained much attention as the volume of data generated every minute of every day has grown. Because much of this data is not immediately suitable for storage in relational databases, it makes sense to find another way. This is where NoSQL (and consequently platforms such as Elasticsearch) comes into play. In this course, we learn to use the elastic stack for NoSQL data storage and retrieval; more specifically, we cover how to use it to aggregate log event data in real time. The elastic stack consists of four powerful tools: Elasticsearch, Logstash, Kibana and Beats.
Elasticsearch is a NoSQL database and a distributed search and analytics engine built on Apache Lucene. It is easy to install and use, offers powerful search capabilities, and has become popular with the large open-source community thanks to features such as full-text search, horizontal scalability and a RESTful API. Logstash is a log shipping and filtering service (a transportation pipeline) used to populate Elasticsearch with data. Kibana is a web interface that connects users to the Elasticsearch database and provides visualizations, dashboards and search options. Beats is a family of lightweight data shippers (such as Filebeat) that collect data at its source and forward it to Logstash or Elasticsearch.
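To make Logstash's "filtering" role concrete, here is a minimal Python sketch (not Logstash itself, which would normally do this with its grok filter) that turns one Apache access-log line in the Common Log Format into the kind of structured JSON document that ends up in Elasticsearch. The regular expression, the field names and the sample line are illustrative assumptions, not a Logstash configuration.

```python
import json
import re
from typing import Optional

# Illustrative regex for the Apache Common Log Format:
# host ident authuser [timestamp] "METHOD path HTTP/x.y" status bytes
COMMON_LOG_RE = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def parse_access_line(line: str) -> Optional[dict]:
    """Turn one access-log line into a flat dict, or None if it does not match."""
    match = COMMON_LOG_RE.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Convert numeric fields so they can be aggregated once indexed.
    event["response"] = int(event["response"])
    event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
    return event


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(json.dumps(parse_access_line(sample), indent=2))
```

Lines that do not match return None rather than raising, which mirrors how a pipeline would typically tag and skip malformed events instead of stopping.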
In this course, you will learn the elastic stack from the ground up. We will go through several features of each component of the elastic stack and explain the terminology. We will demonstrate live how to install and configure it correctly. We will also learn how to install useful plugins, add documents and execute queries to retrieve data. In addition, we will cover how to communicate with Elasticsearch programmatically (using programming languages such as Python, Java and R).
What you’ll learn and how you can apply it
- Develop an understanding of what NoSQL databases are
- Learn what the elastic stack is and develop an understanding of its components
- Learn how to correctly install and configure all components of the elastic stack and ensure they can communicate successfully
- Develop an understanding of Elasticsearch’s terminology, indexing and how to create/delete indices
- Learn how to use Logstash, Kibana and Beats with Elasticsearch
- Learn how to add new documents, retrieve documents (i.e. run queries), delete and/or update documents
- Learn how to communicate with Elasticsearch programmatically (using programming languages such as Python, Java and R)
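The document operations listed above map onto a handful of REST endpoints. The following Python sketch only builds (it does not send) the corresponding HTTP requests using the standard library. It assumes an Elasticsearch 7.x or later node at http://localhost:9200 with security disabled; the `books` index, the document ID and its fields are hypothetical, and `es_request` is a helper of our own, not part of any Elasticsearch client.

```python
import json
import urllib.request
from typing import Optional

# Assumption: a local, unsecured Elasticsearch 7.x/8.x node.
BASE_URL = "http://localhost:9200"


def es_request(method: str, path: str,
               body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request against the Elasticsearch REST API."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


# Hypothetical index and document used for illustration only.
INDEX, DOC_ID = "books", "1"

crud = {
    # Create (or overwrite) a document with a known ID.
    "create": es_request("PUT", f"/{INDEX}/_doc/{DOC_ID}",
                         {"title": "My first document", "views": 1}),
    # Retrieve a single document by its ID.
    "read": es_request("GET", f"/{INDEX}/_doc/{DOC_ID}"),
    # Run a full-text query written in the query DSL.
    "search": es_request("POST", f"/{INDEX}/_search",
                         {"query": {"match": {"title": "first"}}}),
    # Partially update a document: only the fields under "doc" change.
    "update": es_request("POST", f"/{INDEX}/_update/{DOC_ID}",
                         {"doc": {"views": 2}}),
    # Delete the document.
    "delete": es_request("DELETE", f"/{INDEX}/_doc/{DOC_ID}"),
}

for name, req in crud.items():
    print(f"{name:>6}: {req.get_method()} {req.full_url}")
```

To run one of these against a live node, pass it to `urllib.request.urlopen(req)` and decode the JSON response. Elasticsearch 8.x enables security (HTTPS and authentication) by default, so the base URL and headers would need adjusting there.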
This live event is for you because...
- You are familiar with relational databases and how to store, retrieve, update and delete data, but you want to extend your skills to state-of-the-art data storage and retrieval
- You would like to learn what NoSQL is, why it is useful and in which scenarios it fits best (i.e. you need to store data and must decide how it is stored)
- You would like to become a competent user of the elastic stack (which is becoming more popular by the day)
- You would like to learn how to correctly install and configure the elastic stack on different operating systems
- You would like to learn how to use Elasticsearch programmatically (using programming languages such as Python and R)
Prerequisites
- Familiarity with relational database management systems such as MySQL, MS SQL Server and others
- Familiarity with the JSON file format (JavaScript Object Notation)
- Familiarity with communicating with RESTful APIs
Course Set-up
- Any operating system is fine
- Speedy internet connection
- Java 1.8 or later installed on your operating system (with JAVA_HOME set up correctly)
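One quick way to confirm the JAVA_HOME requirement is a small script such as the following Python sketch. It assumes the standard JDK layout, in which the launcher lives in JAVA_HOME/bin; the function name `java_from_java_home` is our own. It only checks that the executable exists.

```python
import os
import platform
from pathlib import Path
from typing import Mapping, Optional


def java_from_java_home(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the java executable inside JAVA_HOME, or None if it cannot be found."""
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return None  # JAVA_HOME is unset or empty
    # On Windows the launcher is java.exe; elsewhere it is simply java.
    exe_name = "java.exe" if platform.system() == "Windows" else "java"
    candidate = Path(java_home) / "bin" / exe_name
    return candidate if candidate.is_file() else None


java = java_from_java_home()
print(java if java else "JAVA_HOME is not set or does not contain bin/java")
```

If a path is printed, running that executable with `-version` shows whether the installed Java is 1.8 or later.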
Recommended Preparation
- Book: Elasticsearch: The Definitive Guide. By: Clinton Gormley and Zachary Tong. https://www.oreilly.com/library/view/elasticsearch-the-definitive/9781449358532/
- Video: Amazon Web Services AWS LiveLessons 2nd Edition. By: Richard Jones. https://www.oreilly.com/library/view/amazon-web-services/9780135581247/
Recommended Follow-up
- Video: Supercharging Elasticsearch for extended Knowledge Graph use cases. By: Giovanni Tummarello. https://www.oreilly.com/library/view/supercharging-elasticsearch-for/0636920371977/
- Video: Learning Path: Advanced Architecture for Big Data Applications. By: O'Reilly Media, Inc. https://www.oreilly.com/library/view/learning-path-advanced/9781491978665/
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Part 1: Introduction and Elastic Stack Installation and Configuration (50 minutes)
- Introduction and overview of the elastic stack
- Why use the elastic stack (learn many features of its components)
- Understanding the data flow in the elastic stack
- Installing the elastic stack on a cloud instance (on AWS)
- Configuring Elasticsearch, Logstash, Kibana and Beats to work and communicate correctly
- Q&A
Break (10 minutes)
Part 2: Understanding Elasticsearch and Performing CRUD operations on it (50 minutes)
- A deeper look into Elasticsearch and how it works
- Understanding Elasticsearch’s Terminology
- Some useful Elasticsearch plugins
- CRUD operations on Elasticsearch
  - Creating Documents in Elasticsearch
  - Retrieving Documents from Elasticsearch
  - Updating Documents in Elasticsearch
  - Deleting Documents from Elasticsearch
- Communicating with Elasticsearch programmatically (using Python and R)
- Q&A
Break (10 minutes)
Part 3: Adding Documents and Logs to the Elastic Stack (50 minutes)
- Installing and configuring nginx to work as a reverse proxy so Kibana can be accessed over the internet
- Using Logstash to collect static Apache logs and analyzing them using Kibana
- Using Logstash to collect a static CSV file and analyzing its data using Kibana
- Collecting real-time web logs, configuring Beats to upload them to Elasticsearch and analyzing them using Kibana
- Monitoring the performance of the Elastic Stack
Q&A (10 minutes)
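Part 3 loads a static CSV file through Logstash. To show the shape of data that ultimately reaches Elasticsearch, here is a minimal Python sketch that converts CSV rows into the newline-delimited JSON body accepted by the `_bulk` endpoint (one action line followed by one document line per row). This is not a Logstash configuration; the index name, column names and sample rows are hypothetical.

```python
import csv
import io
import json
from typing import Iterable


def csv_to_bulk(rows: Iterable[dict], index: str) -> str:
    """Build an NDJSON body for POST /_bulk: an action line, then the document, per row."""
    lines = []
    for row in rows:
        # CSV values arrive as strings; convert whole numbers so that
        # Elasticsearch's dynamic mapping treats them as numeric fields.
        doc = {k: int(v) if v.isdigit() else v for k, v in row.items()}
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical CSV content standing in for a static file on disk.
SAMPLE_CSV = """city,visits
London,120
Oxford,45
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
body = csv_to_bulk(reader, index="site-visits")
print(body, end="")
```

The resulting string would be sent as the body of `POST /_bulk` with the header `Content-Type: application/x-ndjson`. Batching many documents into one request like this is what makes bulk loading much faster than indexing rows one at a time.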
Course wrap-up
Your Instructor
Noureddin Sadawi
Dr. Noureddin Sadawi is a consultant in machine/deep learning and data science. He has several years’ experience in various areas involving data manipulation and analysis. He received his PhD from the University of Birmingham, United Kingdom. He is the winner of two international scientific software development contests, at TREC2011 and CLEF2012.
Noureddin is an avid scientific software researcher and developer with a passion for learning and teaching new technologies. He is also an experienced data analyst, and over the last few years he has used Python as his preferred programming language. He has been involved in several projects spanning a variety of fields such as bioinformatics, textual/image/video data analysis, drug discovery, omics data analysis and computer network security. He has taught at multiple universities in the UK and has worked as a software engineer in different roles. He is the founder of SoftLight LTD (https://www.softlight.tech/), a London-based company that specialises in data science and machine/deep learning. Recently, he joined the University of Oxford as a part-time lecturer.