Time Series Indexing

Book description

Build and use the most popular time series index available today with Python to search and join time series at the subsequence level. Purchase of the print or Kindle book includes a free PDF eBook.

Key Features

  • Learn how to implement algorithms and techniques from research papers
  • Get to grips with building time series indexes using iSAX
  • Leverage iSAX to solve real-world time series problems

Book Description

Time series are everywhere, ranging from financial data and system metrics to weather stations and medical records. Being able to access, search, and compare time series data quickly is essential, and this comprehensive guide enables you to do just that by helping you explore SAX representation and the most effective time series index, iSAX.

The book begins by teaching you how to implement the SAX representation and the iSAX index in Python, along with the required theory sourced from academic research papers. The chapters are filled with figures and plots to help you follow the presented topics and understand key concepts easily. The book balances theory and practice so that you can work with time series and develop time series indexes successfully. Additionally, the presented code can be ported to other modern programming languages, such as Swift, Java, C, C++, Ruby, Kotlin, Go, Rust, and JavaScript.

By the end of this book, you'll have learned how to use the SAX representation and iSAX to index and analyze time series data efficiently, and you'll be equipped to develop your own time series indexes.

What you will learn

  • Find out how to develop your own Python packages and write simple Python tests
  • Understand what a time series index is and why it is useful
  • Gain a theoretical and practical understanding of operating and creating time series indexes
  • Discover how to use SAX representation and the iSAX index
  • Find out how to search and compare time series
  • Utilize iSAX visualizations to aid in the interpretation of complex or large time series

Who this book is for

This book is for practitioners, university students working with time series, researchers, and anyone looking to learn more about time series. Basic knowledge of UNIX, Linux, and Python and an understanding of basic programming concepts are needed to grasp the topics in this book. This book will also be handy for people who want to learn how to read research papers, learn from them, and implement their algorithms.

Table of contents

  1. Time Series Indexing
  2. Contributors
  3. About the author
  4. About the reviewer
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Share your thoughts
    9. Download a free PDF copy of this book
  6. Chapter 1: An Introduction to Time Series and the Required Python Knowledge
    1. Technical requirements
    2. Understanding time series
      1. Time series are everywhere
      2. Essential definitions
      3. Time series data mining
      4. Comparing time series
      5. The Euclidean distance
      6. The Chebyshev distance
    3. What is an index and why do we need indexing?
    4. The Python knowledge that we are going to need
      1. Timing Python code
      2. An introduction to Anaconda
      3. The required Python packages
      4. Setting up our environment
      5. Printing package versions
      6. Creating sample data
      7. Publicly available time series data
      8. How time series are processed
    5. Reading time series from disk
      1. Is all data numeric?
      2. Do all lines have the same amount of data?
      3. Creating subsequences
    6. Visualizing time series
    7. Working with the Matrix Profile
    8. Exploring the MPdist distance
    9. Summary
    10. Resources and useful links
    11. Exercises
  7. Chapter 2: Implementing SAX
    1. Technical requirements
    2. The required theory
      1. Why do we need SAX?
      2. Normalization
      3. Visualizing normalized time series
    3. An introduction to SAX
      1. The cardinality parameter
      2. The segments parameter
      3. How to manually find the SAX representation of a subsequence
      4. How can we divide 10 data points into 3 segments?
      5. Reducing the cardinality of a SAX representation
    4. Developing a Python package
      1. The basics of Python packages
      2. The SAX Python package
    5. Working with the SAX package
      1. Computing the SAX representations of the subsequences of a time series
    6. Counting the SAX representations of a time series
    7. The tsfresh Python package
    8. Creating a histogram of a time series
    9. Calculating the percentiles of a time series
    10. Summary
    11. Useful links
    12. Exercises
  8. Chapter 3: iSAX – The Required Theory
    1. Technical requirements
    2. Background information
      1. Trees and binary trees
    3. Understanding how iSAX works
      1. The cardinality parameter
      2. The segments parameter
      3. The threshold parameter
      4. Computing the normalized mean values
      5. How big can an iSAX index get?
      6. What happens when there is no space left for adding more subsequences to an iSAX index?
    4. How iSAX is constructed
      1. How iSAX is searched
      2. Promotion strategy
      3. Splitting nodes
    5. Manually constructing an iSAX index
    6. Updating the counting.py utility
    7. Summary
    8. Useful links
    9. Exercises
  9. Chapter 4: iSAX – The Implementation
    1. Technical requirements
    2. A quick look at the iSAX Python package
      1. The class for storing subsequences
      2. The class for iSAX nodes
      3. The class for entire iSAX indexes
    3. Explaining the missing parts
    4. Exploring the remaining files
      1. The tools.py file
      2. The variables.py file
      3. The sax.py file
    5. Using the iSAX Python package
      1. Reading the iSAX parameters
      2. How to process subsequences to create an iSAX index
      3. Creating our first iSAX index
      4. Counting the subsequences of an iSAX index
      5. How long does it take to create an iSAX index?
      6. Dealing with iSAX overflows
    6. Summary
    7. Useful links
    8. Exercises
  10. Chapter 5: Joining and Comparing iSAX Indexes
    1. Technical requirements
    2. How the sliding window size affects the iSAX construction speed
    3. Checking the search speed of iSAX indexes
    4. Joining iSAX indexes
    5. Implementing the joining of iSAX indexes
    6. Explaining the Python code
    7. Using the Python code
      1. We have a long list of Euclidean distances, so what?
      2. Saving the output
      3. Finding iSAX nodes without a match
    8. Writing Python tests
      1. What are we going to test?
      2. Comparing the number of subsequences
      3. Checking the number of node splits
      4. All Euclidean distances are 0
      5. Running the tests
    9. Summary
    10. Useful links
    11. Exercises
  11. Chapter 6: Visualizing iSAX Indexes
    1. Technical requirements
    2. Storing an iSAX index in JSON format
      1. Downloading the JavaScript code locally
      2. Running the code locally
    3. Visualizing an iSAX index
      1. Visualizing iSAX as a tree
    4. Trying something radical
    5. More iSAX index visualizations
    6. Using icicle plots
    7. Visualizing iSAX as a Collapsible Tree
    8. Summary
    9. Useful links
    10. Exercises
  12. Chapter 7: Using iSAX to Approximate MPdist
    1. Technical requirements
    2. Understanding the Matrix Profile
      1. What does the Matrix Profile compute?
      2. Manually computing the exact Matrix Profile
    3. Computing the Matrix Profile using iSAX
      1. What happens if there is not a valid match?
      2. Calculating the error
      3. Approximate Matrix Profile implementation
      4. Comparing the accuracy of two different parameter sets
    4. Understanding MPdist
      1. How to compute MPdist
      2. Manually computing MPdist
    5. Calculating MPdist using iSAX
    6. Implementing the MPdist calculation in Python
      1. Using the approximate Matrix Profile way
      2. Using the join of two iSAX indexes
    7. Using the Python code
      1. Comparing the accuracy and the speed of the methods
    8. Summary
    9. Useful links
    10. Exercises
  13. Chapter 8: Conclusions and Next Steps
    1. Concluding all that we have learned so far
    2. Other variations of iSAX
    3. Interesting research papers on time series
    4. Interesting research papers on databases
    5. Useful books
      1. Useful books on databases
      2. Building a strong computer science background
      3. Books on UNIX and Linux
      4. Books on the Python programming language
    6. Summary
    7. Useful links
    8. Exercises
  14. Index
    1. Why subscribe?
  15. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share your thoughts
    3. Download a free PDF copy of this book

Product information

  • Title: Time Series Indexing
  • Author(s): Mihalis Tsoukalos
  • Release date: June 2023
  • Publisher(s): Packt Publishing
  • ISBN: 9781838821951