SQLite for Data Scientists

Intermediate to advanced

This live event utilizes Jupyter Notebook technology

After years of NoSQL hype, data scientists are starting to love SQL again. Modern hardware can now easily and economically support the “traditional” relation approach without needing complex distributed systems. SQLite—a C-language library that implements a lightweight, full-featured SQL database engine—is one of the most popular databases in the industry and is used in everything from web applications, web browsers, and every Android smartphone to airplanes.

Join expert Florents Tselai to learn how SQLite can help data scientists iterate the data process as quickly as possible. You’ll explore some of the more advanced features of SQL and SQLite that will help you empower and streamline your workflow as you discover how to seamlessly integrate SQLite into a data pipeline and use it to produce reports simply and efficiently, analyze unstructured and semistructured data, and store machine learning models. Along the way, you’ll also get hands-on with features like triggers—all integrated into Python-based data science scenarios.

What you’ll learn and how you can apply it

By the end of this live online course, you’ll understand:

How standard database concepts such as triggers fit into the modern data science practice
Why SQL is still relevant
How SQL can be more than enough for data cleaning and transformation
How SQL and Python can work nicely together
How to use SQLite to produce advanced reports simply and efficiently
How to integrate SQLite seamlessly into your data pipeline
How to use SQLite to analyse unstructured data (e.g., text) and semistructured data (e.g., JSON documents) without having to use complex systems like ElasticSearch or NLTK
How to use SQLite to store machine learning models

And you’ll be able to:

Replace multiple pandas one-liners with a few lines of SQL code
Replace time-consuming Python-based data cleaning with SQL triggers
Use SQLite to generate reports
Avoid using complex and expensive big data systems and still achieve better results
Use Python for true machine learning and experimentation, instead of “boring” data cleaning
Reduce the financial cost of your machine learning models and analyses

This live event is for you because...

You think SQL could solve many of your problems.
You’re tired of having to book meetings with database administrators, data engineers, business analysts, and project managers to produce a useful report.
Your ETL processes have become too complex, and you want to simplify them.
You’re interested in automating report generation but don’t know how to go about it.
You want to reduce the amount of time you spend writing ETL processes in Python.

Prerequisites

Familiarity with SQL concepts (e.g., simple queries, joins, and group-by clauses) and Python syntax
A basic understanding of the Python data stack (e.g., tools like pandas and scikit-learn) (useful but not required)

Recommended follow-up:

Read Python for Data Analysis, second edition (book)

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Understanding SQLite (20 minutes)

Presentation: How SQLite is different from pandas and other databases
Demo: Install SQLite and load a dataset
Q&A

SQL with SQLite: Part 1 (30 minutes)

Presentation: Time-related queries with SQLite; views; common-table expressions (CTEs)
Demo: Run SQL on the SQLite CLI; run SQLite queries via Python/pandas
Jupyter Notebook exercise: Write an SQL view and CTE

Break (5 minutes)

SQL with SQLite: Part 2 (30 minutes)

Presentation: Execution plan; database indices
Jupyter Notebook exercise: Create an index to speed up execution time
Demo: Storing Machine Learning Models
Q&A

Querying Unstructured JSON Data (30 minutes)

Presentation: Unstructured data; JSON and document stores; schemaless data and indices on expressions
Demo: Query JSON with SQLite
Jupyter notebook exercises: Query JSON data, create views, and fetch results into a pandas DataFrame; create indices to speed up JSON queries
Q&A

Break (5 minutes)

Implementing Full-Text Search (FTS5) (30 minutes)

Presentation: How full-text search (FTS) works
Demo: Run FTS queries with SQLite
Jupyter Notebook exercise: Run FTS queries from Python
Q&A

Advanced SQL for data scientists (20 minutes)

Presentation: ETL data with triggers
Presentation: Window functions
Jupyter Notebook exercise: Write a trigger function to clean input data
Q&A

Wrap-up and Q&A (10 minutes)

Your Instructor

Florents Tselai
Florents Tselai has research, engineering, and leadership experience across the entire data supply chain, from modeling and applying machine learning algorithms to engineering and managing production deployments of big data pipelines. He’s collaborated with partners from industries as varied as retail, advertising, energy, insurance, pharmaceuticals, and telecoms, mainly in the areas of predictive analytics, market analysis, and marketing budget allocation. Companies Florents has worked for include Deutsche Telekom, BASF, GfK, and XING.

linkedin search