Chapter 11. Time-Windowed Features for Real-Time Machine Learning

In Chapter 8, we briefly explored incorporating time-windowed features, such as the moving average of taxi-out delay at the originating airport, as an input to the model. We found that the time-windowed features reduced the model error. However, it was unclear how clients (who know only about the flight they are on) would be able to provide the correct value. Because of that, we decided to drop the time-windowed features. In this chapter, we will address that shortcoming by implementing a real-time, streaming machine learning pipeline that uses Cloud Dataflow and Vertex AI.

All of the code snippets in this chapter are available in the folder 11_realtime of the GitHub repository. See the README.md file in that directory for instructions on how to carry out the steps described in this chapter.

Time Averages

What time-windowed aggregate features did we want to use, but couldn’t? Flight arrival times are scheduled based on the average taxi-out time at the departure airport at that specific hour. The machine learning model will learn this average quite easily because we show it the entire dataset and tell it the name of the origin airport. For example, at peak hours at New York’s JFK airport, taxi-out times on the order of an hour are quite common, so airlines take that into account when publishing their flight schedules. It is only when the taxi-out time exceeds the average that we ought to be worried. ...
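To make the idea of a time-windowed average concrete, here is a minimal pure-Python sketch of a per-airport moving average of taxi-out time. It is not the chapter's Cloud Dataflow pipeline. The class name SlidingWindowAverage, the one-hour window, the use of seconds for timestamps and minutes for taxi-out times, and the sample values are all assumptions made for illustration. The sketch also assumes that events for a given airport arrive in timestamp order, which a real streaming system cannot take for granted.

```python
from collections import defaultdict, deque


class SlidingWindowAverage:
    """Per-key moving average over the most recent `window_sec` seconds.

    Hypothetical helper for illustration; assumes events for each key
    are added in non-decreasing timestamp order.
    """

    def __init__(self, window_sec=3600):
        self.window_sec = window_sec
        self._events = defaultdict(deque)  # key -> deque of (timestamp, value)
        self._sums = defaultdict(float)    # key -> running sum of values in window

    def _evict(self, key, now):
        # Drop events that are at least `window_sec` older than `now`.
        events = self._events[key]
        while events and events[0][0] <= now - self.window_sec:
            _, old_value = events.popleft()
            self._sums[key] -= old_value

    def add(self, key, timestamp, value):
        # Record a new observation, then expire anything outside the window.
        self._events[key].append((timestamp, value))
        self._sums[key] += value
        self._evict(key, timestamp)

    def average(self, key, now):
        # Return the windowed mean as of `now`, or None if the window is empty.
        self._evict(key, now)
        events = self._events[key]
        if not events:
            return None
        return self._sums[key] / len(events)


# Made-up taxi-out times (minutes) at JFK, with timestamps in seconds.
taxi_out = SlidingWindowAverage(window_sec=3600)
taxi_out.add("JFK", 0, 50.0)
taxi_out.add("JFK", 1800, 70.0)
taxi_out.add("JFK", 4000, 60.0)  # the observation at t=0 falls out of the window

avg_jfk = taxi_out.average("JFK", 4000)

# The signal of interest is how far one flight's taxi-out time
# exceeds the recent average at its origin airport.
excess = 75.0 - avg_jfk
```

The running sum keeps each update cheap, but the state lives in one process. Keeping such state correct across many machines and late-arriving events is the kind of work that a streaming pipeline has to take on.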
