Chapter 4. Frequency and the Fast Fourier Transform
If you want to find the secrets of the universe, think in terms of energy, frequency and vibration.
Nikola Tesla
This chapter was written in collaboration with SW’s father, PW van der Walt.
This chapter will depart slightly from the format of the rest of the book. In particular, you may find the code in the chapter quite modest. Instead, we want to illustrate an elegant algorithm, the Fast Fourier Transform (FFT), that is endlessly useful, is implemented in SciPy, and works, of course, on NumPy arrays.
Introducing Frequency
We’ll start by setting up some plotting styles and importing the usual suspects:
# Make plots appear inline, set custom plotting style
%
matplotlib
inline
import
matplotlib.pyplot
as
plt
plt
.
style
.
use
(
'style/elegant.mplstyle'
)
import
numpy
as
np
The discrete1 Fourier transform (DFT) is a mathematical technique used to convert temporal or spatial data into frequency domain data. Frequency is a familiar concept, due to its colloquial occurrence in the English language: the lowest notes your headphones can rumble out are around 20 Hz, whereas middle C on a piano lies around 261.6 Hz; Hertz, or oscillations per second, in this case literally refers to the number of times per second at which the membrane inside the headphone moves to-and-fro. That, in turn, creates compressed pulses of air which, upon arrival at your eardrum, induces a vibration at the same frequency. So, if you take a simple periodic function, sin(10 × 2πt), you can view it as a wave:
f
=
10
# Frequency, in cycles per second, or Hertz
f_s
=
100
# Sampling rate, or number of measurements per second
t
=
np
.
linspace
(
0
,
2
,
2
*
f_s
,
endpoint
=
False
)
x
=
np
.
sin
(
f
*
2
*
np
.
pi
*
t
)
fig
,
ax
=
plt
.
subplots
()
ax
.
plot
(
t
,
x
)
ax
.
set_xlabel
(
'Time [s]'
)
ax
.
set_ylabel
(
'Signal amplitude'
);
Or you can equivalently think of it as a repeating signal of frequency 10 Hz (it repeats once every 1/10 seconds—a length of time we call its period). Although we naturally associate frequency with time, it can equally well be applied to space. For example, a photo of a textile patterns exhibits high spatial frequency, whereas the sky or other smooth objects have low spatial frequency.
Let us now examine our sinusoid through application of the DFT:
from
scipy
import
fftpack
X
=
fftpack
.
fft
(
x
)
freqs
=
fftpack
.
fftfreq
(
len
(
x
))
*
f_s
fig
,
ax
=
plt
.
subplots
()
ax
.
stem
(
freqs
,
np
.
abs
(
X
))
ax
.
set_xlabel
(
'Frequency in Hertz [Hz]'
)
ax
.
set_ylabel
(
'Frequency Domain (Spectrum) Magnitude'
)
ax
.
set_xlim
(
-
f_s
/
2
,
f_s
/
2
)
ax
.
set_ylim
(
-
5
,
110
)
(-5, 110)
We see that the output of the FFT is a 1D array of the same shape as the input, containing complex values. All values are zero, except for two entries. Traditionally, we visualize the magnitude of the result as a stem plot, in which the height of each stem corresponds to the underlying value.
(We explain why you see positive and negative frequencies later on in “Discrete Fourier Transforms”. You may also refer to that section for a more in-depth overview of the underlying mathematics.)
The Fourier transform takes us from the time to the frequency domain, and this turns out to have a massive number of applications. The fast Fourier transform (FFT) is an algorithm for computing the DFT; it achieves its high speed by storing and reusing results of computations as it progresses.
In this chapter, we examine a few applications of the DFT to demonstrate that the FFT can be applied to multidimensional data (not just 1D measurements) to achieve a variety of goals.
Illustration: A Birdsong Spectrogram
Let’s start with one of the most common applications, converting a sound signal (consisting of variations of air pressure over time) to a spectrogram. You might have seen spectrograms on your music player’s equalizer view, or even on an old-school stereo (Figure 4-1).
Listen to this snippet of the nightingale birdsong (released under CC BY 4.0):
from
IPython.display
import
Audio
Audio
(
'data/nightingale.wav'
)
If you are reading the paper version of this book, you’ll have to use your imagination! It goes something like this: chee-chee-woorrrr-hee-hee cheet-wheet-hoorrr-chirrr-whi-wheo-wheo-wheo-wheo-wheo-wheo.
Since we realize that not everyone is fluent in bird-speak, perhaps it’s best if we visualize the measurements—better known as “the signal”—instead.
We load the audio file, which gives us the sampling rate (number of measurements per second) as well as audio data as an (N, 2)
array—two columns because this is a stereo recording.
from
scipy.io
import
wavfile
rate
,
audio
=
wavfile
.
read
(
'data/nightingale.wav'
)
We convert to mono by averaging the left and right channels.
audio
=
np
.
mean
(
audio
,
axis
=
1
)
Then, we calculate the length of the snippet and plot the audio (Figure 4-2).
N
=
audio
.
shape
[
0
]
L
=
N
/
rate
(
f
'Audio length: {L:.2f} seconds'
)
f
,
ax
=
plt
.
subplots
()
ax
.
plot
(
np
.
arange
(
N
)
/
rate
,
audio
)
ax
.
set_xlabel
(
'Time [s]'
)
ax
.
set_ylabel
(
'Amplitude [unknown]'
);
Audio length: 7.67 seconds
Well, that’s not very satisfying, is it? If I sent this voltage to a speaker, I might hear a bird chirping, but I can’t very well imagine how it would sound in my head. Is there a better way of seeing what is going on?
There is, and it is called the discrete Fourier transform, or DFT, where discrete refers to the recording consisting of time-spaced sound measurements, in contrast to a continual recording as, for example, on magnetic tape (remember cassettes?). The DFT is often computed using the FFT algorithm, a name informally used to refer to the DFT itself. The DFT tells us which frequencies or “notes” to expect in our signal.
Of course, a bird sings many notes throughout the song, so we’d also like to know when each note occurs. The Fourier transform takes a signal in the time domain (i.e., a set of measurements over time) and turns it into a spectrum—a set of frequencies with corresponding (complex2) values. The spectrum does not contain any information about time!3
So, to find both the frequencies and the time at which they were sung, we’ll need to be somewhat clever. Our strategy is as follows: take the audio signal, split it into small, overlapping slices, and apply the Fourier transform to each (a technique known as the short time Fourier transform).
We’ll split the signal into slices of 1,024 samples—that’s about 0.02 seconds of audio. The reason we chose 1,024 and not 1,000 we’ll explain in a second when we examine performance. The slices will overlap by 100 samples as shown here:
Start by chopping up the signal into slices of 1024 samples, each slice overlapping the previous by 100 samples. The resulting slices
object contains one slice per row.
from
skimage
import
util
M
=
1024
slices
=
util
.
view_as_windows
(
audio
,
window_shape
=
(
M
,),
step
=
100
)
(
f
'Audio shape: {audio.shape}, Sliced audio shape: {slices.shape}'
)
Audio shape: (338081,), Sliced audio shape: (3371, 1024)
Generate a windowing function (see “Windowing” for a discussion of the underlying assumptions and interpretations of each) and multiply it with the signal:
win
=
np
.
hanning
(
M
+
1
)[:
-
1
]
slices
=
slices
*
win
It’s more convenient to have one slice per column, so we take the transpose:
slices
=
slices
.
T
(
'Shape of `slices`:'
,
slices
.
shape
)
Shape of `slices`: (1024, 3371)
For each slice, calculate the DFT, which returns both positive and negative frequencies (more on that in “Frequencies and Their Ordering”), so we slice out the positive M2 frequencies for now.
spectrum
=
np
.
fft
.
fft
(
slices
,
axis
=
0
)[:
M
//
2
+
1
:
-
1
]
spectrum
=
np
.
abs
(
spectrum
)
(As a quick aside, you’ll note that we use scipy.fftpack.fft
and np.fft
interchangeably. NumPy provides basic FFT functionality, which SciPy extends further, but both include an fft
function, based on the Fortran FFTPACK.)
The spectrum can contain both very large and very small values. Taking the log compresses the range significantly.
Here we do a log plot of the ratio of the signal divided by the maximum signal (shown in Figure 4-3). The specific unit used for the ratio is the decibel, 20log10 (amplitude ratio).
f
,
ax
=
plt
.
subplots
(
figsize
=
(
4.8
,
2.4
))
S
=
np
.
abs
(
spectrum
)
S
=
20
*
np
.
log10
(
S
/
np
.
max
(
S
))
ax
.
imshow
(
S
,
origin
=
'lower'
,
cmap
=
'viridis'
,
extent
=
(
0
,
L
,
0
,
rate
/
2
/
1000
))
ax
.
axis
(
'tight'
)
ax
.
set_ylabel
(
'Frequency [kHz]'
)
ax
.
set_xlabel
(
'Time [s]'
);
Much better! We can now see that the frequencies vary over time, and the spectrogram corresponds to the way the audio sounds. See if you can match our earlier description: chee-chee-woorrrr-hee-hee cheet-wheet-hoorrr-chirrr-whi-wheo-wheo-wheo-wheo-wheo-wheo. (I didn’t transcribe the 3 to 5 second mark—that’s another bird.)
SciPy already includes an implementation of this procedure as scipy.signal.spectrogram
(Figure 4-4), which can be invoked as follows:
from
scipy
import
signal
freqs
,
times
,
Sx
=
signal
.
spectrogram
(
audio
,
fs
=
rate
,
window
=
'hanning'
,
nperseg
=
1024
,
noverlap
=
M
-
100
,
detrend
=
False
,
scaling
=
'spectrum'
)
f
,
ax
=
plt
.
subplots
(
figsize
=
(
4.8
,
2.4
))
ax
.
pcolormesh
(
times
,
freqs
/
1000
,
10
*
np
.
log10
(
Sx
),
cmap
=
'viridis'
)
ax
.
set_ylabel
(
'Frequency [kHz]'
)
ax
.
set_xlabel
(
'Time [s]'
);
The only differences between the manual spectrogram that we created versus the SciPy’s built-in function are that SciPy returns the spectrum magnitude squared (which turns measured voltage into measured energy), and multiplies it by some normalization factors.4
History
Tracing the exact origins of the Fourier transform is tricky. Some related procedures go as far back as Babylonian times, but it was the hot topics of calculating asteroid orbits and solving the heat (flow) equation that led to several breakthroughs in the early 1800s. Whom exactly among Clairaut, Lagrange, Euler, Gauss, and D’Alembert we should thank is not exactly clear, but Gauss was the first to describe the fast Fourier transform (an algorithm for computing the DFT, popularized by Cooley and Tukey in 1965). Joseph Fourier, after whom the transform is named, first claimed that arbitrary periodic5 functions can be expressed as a sum of trigonometric functions.
Implementation
The DFT functionality in SciPy lives in the scipy.fftpack
module. Among other things, it provides the following DFT-related functionality:
fft
,fft2
,fftn
Compute the DFT using the FFT algorithm in 1, 2, or
n
dimensions.ifft
,ifft2
,ifftn
Compute the inverse of the DFT.
dct
,idct
,dst
,idst
Compute the cosine and sine transforms, and their inverses.
fftshift
,ifftshift
Shift the zero-frequency component to the center of the spectrum and back, respectively (more about that soon).
fftfreq
Return the DFT sample frequencies.
rfft
Compute the DFT of a real sequence, exploiting the symmetry of the resulting spectrum for increased performance. Also used by
fft
internally when applicable.
This list is complemented by the following functions in NumPy:
np.hanning
,np.hamming
,np.bartlett
,np.blackman
,np.kaiser
Tapered windowing functions.
The DFT is also used to perform fast convolutions of large inputs by scipy.signal.fftconvolve
.
SciPy wraps the Fortran FFTPACK library—it is not the fastest out there, but unlike packages such as FFTW, it has a permissive free software license.
Choosing the Length of the DFT
A naive calculation of the DFT takes (N2) operations.6 How come? Well, you have N (complex) sinusoids of different frequencies (2πf × 0, 2πf × 1; 2πf × 3, ..., 2πf × (N – 1)), and you want to see how strongly your signal corresponds to each. Starting with the first, you take the dot-product with the signal (which, in itself, entails N multiplication operations). Repeating this operation N times, once for each sinusoid, then gives N2 operations.
Now, contrast that with the FFT, which is (N log N) in the ideal case due to the clever reuse of calculations—a great improvement! However, the classical Cooley-Tukey algorithm implemented in FFTPACK (and used by SciPy) recursively breaks up the transform into smaller (prime-sized) pieces and only shows this improvement for “smooth” input lengths (an input length is considered smooth when its largest prime factor is small, as shown in Figure 4-5). For large prime-sized pieces, the Bluestein or Rader algorithms can be used in conjunction with the Cooley-Tukey algorithm, but this optimization is not implemented in FFTPACK.7
Let us illustrate:
import
time
from
scipy
import
fftpack
from
sympy
import
factorint
K
=
1000
lengths
=
range
(
250
,
260
)
# Calculate the smoothness for all input lengths
smoothness
=
[
max
(
factorint
(
i
)
.
keys
())
for
i
in
lengths
]
exec_times
=
[]
for
i
in
lengths
:
z
=
np
.
random
.
random
(
i
)
# For each input length i, execute the FFT K times
# and store the execution time
times
=
[]
for
k
in
range
(
K
):
tic
=
time
.
monotonic
()
fftpack
.
fft
(
z
)
toc
=
time
.
monotonic
()
times
.
append
(
toc
-
tic
)
# For each input length, remember the *minimum* execution time
exec_times
.
append
(
min
(
times
))
f
,
(
ax0
,
ax1
)
=
plt
.
subplots
(
2
,
1
,
sharex
=
True
)
ax0
.
stem
(
lengths
,
np
.
array
(
exec_times
)
*
10
**
6
)
ax0
.
set_ylabel
(
'Execution time (µs)'
)
ax1
.
stem
(
lengths
,
smoothness
)
ax1
.
set_ylabel
(
'Smoothness of input length
\n
(lower is better)'
)
ax1
.
set_xlabel
(
'Length of input'
);
The intuition is that, for smooth numbers, the FFT can be broken up into many small pieces. After performing the FFT on the first piece, we can reuse those results in subsequent computations. This explains why we chose a length of 1,024 for our audio slices earlier—it has a smoothness of only 2, resulting in the optimal “radix-2 Cooley-Tukey” algorithm, which computes the FFT using only (N/2)log2N = 5,120 complex multiplications, instead of N2 = 1,048,576. Choosing N = 2m always ensures a maximally smooth N (and, thus, the fastest FFT).
More DFT Concepts
Next, we present a couple of common concepts worth knowing before operating heavy Fourier transform machinery, whereafter we tackle another real-world problem: analyzing target detection in radar data.
Frequencies and Their Ordering
For historical reasons, most implementations return an array where frequencies vary from low to high to low (see “Discrete Fourier Transforms” for further explanation of frequencies). For example, when we do the real Fourier transform of a signal of all ones, an input that has no variation and therefore only has the slowest, constant Fourier component (also known as the “DC,” or Direct Current, component—just electronics jargon for “mean of the signal”), appearing as the first entry:
from
scipy
import
fftpack
N
=
10
fftpack
.
fft
(
np
.
ones
(
N
))
# The first component is np.mean(x) * N
array([ 10.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.-0.j, 0.-0.j, 0.-0.j, 0.-0.j])
When we try the FFT on a rapidly changing signal, we see a high-frequency component appear:
z
=
np
.
ones
(
10
)
z
[::
2
]
=
-
1
(
f
'Applying FFT to {z}'
)
fftpack
.
fft
(
z
)
Applying FFT to [-1. 1. -1. 1. -1. 1. -1. 1. -1. 1.]
array([ 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, 0.+0.j, -10.+0.j, 0.-0.j, 0.-0.j, 0.-0.j, 0.-0.j])
Note that the FFT returns a complex spectrum that, in the case of real inputs, is conjugate symmetrical (i.e., symmetric in the real part and antisymmetric in the imaginary part):
x
=
np
.
array
([
1
,
5
,
12
,
7
,
3
,
0
,
4
,
3
,
2
,
8
])
X
=
fftpack
.
fft
(
x
)
np
.
set_printoptions
(
precision
=
2
)
(
"Real part: "
,
X
.
real
)
(
"Imaginary part:"
,
X
.
imag
)
np
.
set_printoptions
()
Real part: [ 45. 7.09 -12.24 -4.09 -7.76 -1. -7.76 -4.09 -12.24 7.09] Imaginary part: [ 0. -10.96 -1.62 12.03 6.88 0. -6.88 -12.03 1.62 10.96]
(And, again, recall that the first component is np.mean(x) * N
.)
The fftfreq
function tells us which frequencies we are looking at specifically:
fftpack
.
fftfreq
(
10
)
array([ 0. , 0.1, 0.2, 0.3, 0.4, -0.5, -0.4, -0.3, -0.2, -0.1])
The result tells us that our maximum component occurred at a frequency of 0.5 cycles per sample. This agrees with the input, where a minus-one-plus-one cycle repeated every second sample.
Sometimes, it is convenient to view the spectrum organized slightly differently, from high-negative to low-to-high-positive (for now, we won’t dive too deeply into the concept of negative frequency, other than to say a real-world sine wave is produced by a combination of positive and negative frequencies). We reshuffle the spectrum using the fftshift
function.
Let’s examine the frequency components in a noisy image (Figure 4-7). Note that, while a static image has no time-varying component, its values do vary across space. The DFT applies equally to either case.
First, load and display the image:
from
skimage
import
io
image
=
io
.
imread
(
'images/moonlanding.png'
)
M
,
N
=
image
.
shape
f
,
ax
=
plt
.
subplots
(
figsize
=
(
4.8
,
4.8
))
ax
.
imshow
(
image
)
((
M
,
N
),
image
.
dtype
)
(474, 630) uint8
Do not adjust your monitor! The image you are seeing is real, although clearly distorted by either the measurement or transmission equipment.
To examine the spectrum of the image, we use fftn
(instead of fft
) to compute the DFT, since it has more than one dimension. The 2D FFT is equivalent to taking the 1D FFT across rows and then across columns, or vice versa.
F
=
fftpack
.
fftn
(
image
)
F_magnitude
=
np
.
abs
(
F
)
F_magnitude
=
fftpack
.
fftshift
(
F_magnitude
)
Again, we take the log of the spectrum to compress the range of values, before displaying:
f
,
ax
=
plt
.
subplots
(
figsize
=
(
4.8
,
4.8
))
ax
.
imshow
(
np
.
log
(
1
+
F_magnitude
),
cmap
=
'viridis'
,
extent
=
(
-
N
//
2
,
N
//
2
,
-
M
//
2
,
M
//
2
))
ax
.
set_title
(
'Spectrum magnitude'
);
Note the high values around the origin (middle) of the spectrum—these coefficients describe the low frequencies or smooth parts of the image, a vague canvas of the photo. Higher-frequency components, spread throughout the spectrum, fill in the edges and detail. Peaks around higher frequencies correspond to the periodic noise.
From the photo, we can see that the noise (measurement artifacts) is highly periodic, so we hope to remove it by zeroing out the corresponding parts of the spectrum (Figure 4-8).
The image with those peaks suppressed indeed looks quite different!
# Set block around center of spectrum to zero
K
=
40
F_magnitude
[
M
//
2
-
K
:
M
//
2
+
K
,
N
//
2
-
K
:
N
//
2
+
K
]
=
0
# Find all peaks higher than the 98th percentile
peaks
=
F_magnitude
<
np
.
percentile
(
F_magnitude
,
98
)
# Shift the peaks back to align with the original spectrum
peaks
=
fftpack
.
ifftshift
(
peaks
)
# Make a copy of the original (complex) spectrum
F_dim
=
F
.
copy
()
# Set those peak coefficients to zero
F_dim
=
F_dim
*
peaks
.
astype
(
int
)
# Do the inverse Fourier transform to get back to an image.
# Since we started with a real image, we only look at the real part of
# the output.
image_filtered
=
np
.
real
(
fftpack
.
ifft2
(
F_dim
))
f
,
(
ax0
,
ax1
)
=
plt
.
subplots
(
2
,
1
,
figsize
=
(
4.8
,
7
))
ax0
.
imshow
(
np
.
log10
(
1
+
np
.
abs
(
F_dim
)),
cmap
=
'viridis'
)
ax0
.
set_title
(
'Spectrum after suppression'
)
ax1
.
imshow
(
image_filtered
)
ax1
.
set_title
(
'Reconstructed image'
);
Windowing
If we examine the Fourier transform of a rectangular pulse, we see significant side lobes in the spectrum:
x
=
np
.
zeros
(
500
)
x
[
100
:
150
]
=
1
X
=
fftpack
.
fft
(
x
)
f
,
(
ax0
,
ax1
)
=
plt
.
subplots
(
2
,
1
,
sharex
=
True
)
ax0
.
plot
(
x
)
ax0
.
set_ylim
(
-
0.1
,
1.1
)
ax1
.
plot
(
fftpack
.
fftshift
(
np
.
abs
(
X
)))
ax1
.
set_ylim
(
-
5
,
55
);
In theory, you would need a combination of infinitely many sinusoids (frequencies) to represent any abrupt transition; the coefficients would typically have the same side lobe structure as seen here for the pulse.
Importantly, the DFT assumes that the input signal is periodic. If the signal is not, the assumption is simply that, right at the end of the signal, it jumps back to its beginning value. Consider the function, x(t), shown here:
We measure the signal for only a short time, labeled Teff. The Fourier transform assumes that x(8) = x(0), and that the signal is continued as the dashed, rather than the solid, line. This introduces a big jump at the edge, with the expected oscillation in the spectrum:
t
=
np
.
linspace
(
0
,
1
,
500
)
x
=
np
.
sin
(
49
*
np
.
pi
*
t
)
X
=
fftpack
.
fft
(
x
)
f
,
(
ax0
,
ax1
)
=
plt
.
subplots
(
2
,
1
)
ax0
.
plot
(
x
)
ax0
.
set_ylim
(
-
1.1
,
1.1
)
ax1
.
plot
(
fftpack
.
fftfreq
(
len
(
t
)),
np
.
abs
(
X
))
ax1
.
set_ylim
(
0
,
190
);
Instead of the expected two lines, the peaks are spread out in the spectrum.
We can counter this effect by a process called windowing. The original function is multiplied with a window function such as the Kaiser window K(N, β). Here we visualize it for β ranging from 0 to 100:
f
,
ax
=
plt
.
subplots
()
N
=
10
beta_max
=
5
colormap
=
plt
.
cm
.
plasma
norm
=
plt
.
Normalize
(
vmin
=
0
,
vmax
=
beta_max
)
lines
=
[
ax
.
plot
(
np
.
kaiser
(
100
,
beta
),
color
=
colormap
(
norm
(
beta
)))
for
beta
in
np
.
linspace
(
0
,
beta_max
,
N
)
]
sm
=
plt
.
cm
.
ScalarMappable
(
cmap
=
colormap
,
norm
=
norm
)
sm
.
_A
=
[]
plt
.
colorbar
(
sm
)
.
set_label
(
r
'Kaiser $\beta$'
);
By changing the parameter β, we can change the shape of the window from rectangular (β = 0, no windowing) to a window that produces signals that smoothly increase from zero and decrease to zero at the endpoints of the sampled interval, producing very low side lobes (β typically between 5 and 10).9
Applying the Kaiser window here, we see that the side lobes have been drastically reduced, at the cost of a slight widening in the main lobe.
The effect of windowing our previous example is noticeable:
win
=
np
.
kaiser
(
len
(
t
),
5
)
X_win
=
fftpack
.
fft
(
x
*
win
)
plt
.
plot
(
fftpack
.
fftfreq
(
len
(
t
)),
np
.
abs
(
X_win
))
plt
.
ylim
(
0
,
190
);
Real-World Application: Analyzing Radar Data
Linearly modulated FMCW (Frequency-Modulated Continuous-Wave) radars make extensive use of the FFT algorithm for signal processing and provide examples of various applications of the FFT. We will use actual data from an FMCW radar to demonstrate one such application: target detection.
Roughly, an FMCW radar works like this (see “A Simple FMCW Radar System” and Figure 4-9 for more detail):
-
A signal with changing frequency is generated. This signal is transmitted by an antenna, after which it travels outward, away from the radar. When it hits an object, part of the signal is reflected back to the radar, where it is received, multiplied by a copy of the transmitted signal, and sampled, turning it into numbers that are packed into an array. Our challenge is to interpret those numbers to form meaningful results.
-
The preceding multiplication step is important. From school, recall the trigonometric identity:
-
Thus, if we multiply the received signal by the transmitted signal, we expect two frequency components to appear in the spectrum: one that is the difference in frequencies between the received and transmitted signal, and one that is the sum of their frequencies.
-
We are particularly interested in the first, since that gives us some indication of how long it took the signal to reflect back to the radar (in other words, how far away the object is from us!). We discard the other by applying a low-pass filter to the signal (i.e., a filter that discards any high frequencies).
To summarize, we should note that:
- The data that reaches the computer consists of N samples sampled (from the multiplied, filtered signal) at a sample frequency of fs.
- The amplitude of the returned signal varies depending on the strength of the reflection (i.e., is a property of the target object and the distance between the target and the radar).
- The frequency measured is an indication of the distance of the target object from the radar.
To start our analysis of radar data, we’ll generate some synthetic signals, after which we’ll turn our focus to the output of an actual radar.
Recall that the radar is increasing its frequency as it transmits at a rate of S Hz/s. After a certain amount of time, t, has passed, the frequency will now be tS higher (Figure 4-10). In that same time span, the radar signal has traveled d = t/v meters, where v is the speed of the transmitted wave through air (roughly the same as the speed of light, 3 × 108 m/s).
Combining the above observations, we can calculate the amount of time it would take the signal to travel to, bounce off, and return from a target that is distance R away:
pi
=
np
.
pi
# Radar parameters
fs
=
78125
# Sampling frequency in Hz, i.e., we sample 78125
# times per second
ts
=
1
/
fs
# Sampling time, i.e., one sample is taken each
# ts seconds
Teff
=
2048.0
*
ts
# Total sampling time for 2048 samples
# (AKA effective sweep duration) in seconds.
Beff
=
100e6
# Range of transmit signal frequency during the time the
# radar samples, known as the "effective bandwidth"
# (given in Hz)
S
=
Beff
/
Teff
# Frequency sweep rate in Hz/s
# Specification of targets. We made these targets up, imagining they
# are objects seen by the radar with the specified range and size.
R
=
np
.
array
([
100
,
137
,
154
,
159
,
180
])
# Ranges (in meter)
M
=
np
.
array
([
0.33
,
0.2
,
0.9
,
0.02
,
0.1
])
# Target size
P
=
np
.
array
([
0
,
pi
/
2
,
pi
/
3
,
pi
/
5
,
pi
/
6
])
# Randomly chosen phase offsets
t
=
np
.
arange
(
2048
)
*
ts
# Sample times
fd
=
2
*
S
*
R
/
3E8
# Frequency differences for these targets
# Generate five targets
signals
=
np
.
cos
(
2
*
pi
*
fd
*
t
[:,
np
.
newaxis
]
+
P
)
# Save the signal associated with the first target as an example for
# later inspection
v_single
=
signals
[:,
0
]
# Weigh the signals, according to target size and sum, to generate
# the combined signal seen by the radar.
v_sim
=
np
.
sum
(
M
*
signals
,
axis
=
1
)
## The above code is equivalent to:
#
# v0 = np.cos(2 * pi * fd[0] * t)
# v1 = np.cos(2 * pi * fd[1] * t + pi / 2)
# v2 = np.cos(2 * pi * fd[2] * t + pi / 3)
# v3 = np.cos(2 * pi * fd[3] * t + pi / 5)
# v4 = np.cos(2 * pi * fd[4] * t + pi / 6)
#
## Blend them together
# v_single = v0
# v_sim = (0.33 * v0) + (0.2 * v1) + (0.9 * v2) + (0.02 * v3) + (0.1 * v4)
Here, we generated a synthetic signal, vsingle, received when looking at a single target (see Figure 4-11). By counting the number of cycles seen in a given time period, we can compute the frequency of the signal and thus the distance to the target.
A real radar will rarely receive only a single echo, though. The simulated signal vsim shows what a radar signal will look like with five targets at different ranges (including two close to one another at 154 and 159 meters), and vactual(t) shows the output signal obtained with an actual radar. When we add multiple echoes together, the result makes little intuitive sense (Figure 4-11); until, that is, we look at it more carefully through the lens of the DFT.
The real-world radar data is read from a NumPy-format .npz file (a lightweight, cross-platform, and cross-version compatible storage format). These files can be saved with the np.savez
or np.savez_compressed
functions. Note that SciPy’s io
submodule can also easily read other formats, such as MATLAB and NetCDF files.
data
=
np
.
load
(
'data/radar_scan_0.npz'
)
# Load variable 'scan' from 'radar_scan_0.npz'
scan
=
data
[
'scan'
]
# The dataset contains multiple measurements, each taken with the
# radar pointing in a different direction. Here we take one such as
# measurement, at a specified azimuth (left-right position) and elevation
# (up-down position). The measurement has shape (2048,).
v_actual
=
scan
[
'samples'
][
5
,
14
,
:]
# The signal amplitude ranges from -2.5V to +2.5V. The 14-bit
# analogue-to-digital converter in the radar gives out integers
# between -8192 to 8192. We convert back to voltage by multiplying by
# $(2.5 / 8192)$.
v_actual
=
v_actual
*
(
2.5
/
8192
)
Since .npz files can store multiple variables, we have to select the one we want: data['scan']
. That returns a structured NumPy array with the following fields:
time
Unsigned 64-bit (8 byte) integer (
np.uint64
)size
Unsigned 32-bit (4 byte) integer (
np.uint32
)position
-
az
32-bit float (
np.float32
)el
32-bit float (
np.float32
)region_type
Unsigned 8-bit (1 byte) integer (
np.uint8
)region_ID
Unsigned 16-bit (2 byte) integer (
np.uint16
)gain
Unsigned 8-bit (1 byte) integer (
np.uin8
)samples
2,048 unsigned 16-bit (2 byte) integers (
np.uint16
)
While it is true that NumPy arrays are homogeneous (i.e., all the elements inside are the same), it does not mean that those elements cannot be compound elements, as is the case here.
An individual field is accessed using dictionary syntax:
azimuths
=
scan
[
'position'
][
'az'
]
# Get all azimuth measurements
To summarize what we’ve seen so far: the shown measurements (vsim and vactual) are the sum of sinusoidal signals reflected by each of several objects. We need to determine each of the constituent components of these composite radar signals. The FFT is the tool that will do this for us.
Signal Properties in the Frequency Domain
First, we take the FFTs of our three signals (synthetic single target, synthetic multi-target, and real) and then display the positive frequency components (i.e., components 0 to N/2; see Figure 4-12). These are called the range traces in radar terminology.
fig
,
axes
=
plt
.
subplots
(
3
,
1
,
sharex
=
True
,
figsize
=
(
4.8
,
2.4
))
# Take FFTs of our signals. Note the convention to name FFTs with a
# capital letter.
V_single
=
np
.
fft
.
fft
(
v_single
)
V_sim
=
np
.
fft
.
fft
(
v_sim
)
V_actual
=
np
.
fft
.
fft
(
v_actual
)
N
=
len
(
V_single
)
with
plt
.
style
.
context
(
'style/thinner.mplstyle'
):
axes
[
0
]
.
plot
(
np
.
abs
(
V_single
[:
N
//
2
]))
axes
[
0
]
.
set_ylabel
(
"$|V_\mathrm{single}|$"
)
axes
[
0
]
.
set_xlim
(
0
,
N
//
2
)
axes
[
0
]
.
set_ylim
(
0
,
1100
)
axes
[
1
]
.
plot
(
np
.
abs
(
V_sim
[:
N
//
2
]))
axes
[
1
]
.
set_ylabel
(
"$|V_\mathrm{sim} |$"
)
axes
[
1
]
.
set_ylim
(
0
,
1000
)
axes
[
2
]
.
plot
(
np
.
abs
(
V_actual
[:
N
//
2
]))
axes
[
2
]
.
set_ylim
(
0
,
750
)
axes
[
2
]
.
set_ylabel
(
"$|V_\mathrm{actual}|$"
)
axes
[
2
]
.
set_xlabel
(
"FFT component $n$"
)
for
ax
in
axes
:
ax
.
grid
()
Suddenly, the information makes sense!
The plot for |V0| clearly shows a target at component 67, and for |Vsim| shows the targets that produced the signal that was uninterpretable in the time domain. The real radar signal, |Vactual|, shows a large number of targets between component 400 and 500 with a large peak in component 443. This happens to be an echo return from a radar illuminating the high wall of an open-cast mine.
To get useful information from the plot, we must determine the range! Again, we use the formula:
In radar terminology, each DFT component is known as a range bin.
This equation also defines the range resolution of the radar: targets will only be distinguishable if they are separated by more than two range bins, for example:
This is a fundamental property of all types of radar.
This result is quite satisfying—but the dynamic range is so large that we could very easily miss some peaks. Let’s take the log as before with the spectrogram:
c
=
3e8
# Approximately the speed of light and of
# electromagnetic waves in air
fig
,
(
ax0
,
ax1
,
ax2
)
=
plt
.
subplots
(
3
,
1
)
def
dB
(
y
):
"Calculate the log ratio of y / max(y) in decibel."
y
=
np
.
abs
(
y
)
y
/=
y
.
max
()
return
20
*
np
.
log10
(
y
)
def
log_plot_normalized
(
x
,
y
,
ylabel
,
ax
):
ax
.
plot
(
x
,
dB
(
y
))
ax
.
set_ylabel
(
ylabel
)
ax
.
grid
()
rng
=
np
.
arange
(
N
//
2
)
*
c
/
2
/
Beff
with
plt
.
style
.
context
(
'style/thinner.mplstyle'
):
log_plot_normalized
(
rng
,
V_single
[:
N
//
2
],
"$|V_0|$ [dB]"
,
ax0
)
log_plot_normalized
(
rng
,
V_sim
[:
N
//
2
],
"$|V_5|$ [dB]"
,
ax1
)
log_plot_normalized
(
rng
,
V_actual
[:
N
//
2
],
"$|V_{\mathrm{actual}}|$ [dB]"
,
ax2
)
ax0
.
set_xlim
(
0
,
300
)
# Change x limits for these plots so that
ax1
.
set_xlim
(
0
,
300
)
# we are better able to see the shape of the peaks.
ax2
.
set_xlim
(
0
,
len
(
V_actual
)
//
2
)
ax2
.
set_xlabel
(
'range'
)
The observable dynamic range is much improved in these plots. For instance, in the real radar signal the noise floor of the radar has become visible (i.e., the level where electronic noise in the system starts to limit the radar’s ability to detect a target).
Windowing, Applied
We’re getting there, but in the spectrum of the simulated signal, we still cannot distinguish the peaks at 154 and 159 meters. Who knows what we’re missing in the real-world signal! To sharpen the peaks, we’ll return to our toolbox and make use of windowing.
Here are the signals used thus far in this example, windowed with a Kaiser window with β = 6.1:
f
,
axes
=
plt
.
subplots
(
3
,
1
,
sharex
=
True
,
figsize
=
(
4.8
,
2.8
))
t_ms
=
t
*
1000
# Sample times in milli-second
w
=
np
.
kaiser
(
N
,
6.1
)
# Kaiser window with beta = 6.1
for
n
,
(
signal
,
label
)
in
enumerate
([(
v_single
,
r
'$v_0 [V]$'
),
(
v_sim
,
r
'$v_5 [V]$'
),
(
v_actual
,
r
'$v_{\mathrm{actual}} [V]$'
)]):
with
plt
.
style
.
context
(
'style/thinner.mplstyle'
):
axes
[
n
]
.
plot
(
t_ms
,
w
*
signal
)
axes
[
n
]
.
set_ylabel
(
label
)
axes
[
n
]
.
grid
()
axes
[
2
]
.
set_xlim
(
0
,
t_ms
[
-
1
])
axes
[
2
]
.
set_xlabel
(
'Time [ms]'
);
And the corresponding FFTs, or “range traces,” in radar terms:
V_single_win
=
np
.
fft
.
fft
(
w
*
v_single
)
V_sim_win
=
np
.
fft
.
fft
(
w
*
v_sim
)
V_actual_win
=
np
.
fft
.
fft
(
w
*
v_actual
)
fig
,
(
ax0
,
ax1
,
ax2
)
=
plt
.
subplots
(
3
,
1
)
with
plt
.
style
.
context
(
'style/thinner.mplstyle'
):
log_plot_normalized
(
rng
,
V_single_win
[:
N
//
2
],
r
"$|V_{0,\mathrm{win}}|$ [dB]"
,
ax0
)
log_plot_normalized
(
rng
,
V_sim_win
[:
N
//
2
],
r
"$|V_{5,\mathrm{win}}|$ [dB]"
,
ax1
)
log_plot_normalized
(
rng
,
V_actual_win
[:
N
//
2
],
r
"$|V_\mathrm{actual,win}|$ [dB]"
,
ax2
)
ax0
.
set_xlim
(
0
,
300
)
# Change x limits for these plots so that
ax1
.
set_xlim
(
0
,
300
)
# we are better able to see the shape of the peaks.
ax1
.
annotate
(
"New, previously unseen!"
,
(
160
,
-
35
),
xytext
=
(
10
,
15
),
textcoords
=
"offset points"
,
color
=
'red'
,
size
=
'x-small'
,
arrowprops
=
dict
(
width
=
0.5
,
headwidth
=
3
,
headlength
=
4
,
fc
=
'k'
,
shrink
=
0.1
));
Compare these with the earlier range traces. There is a dramatic lowering in side-lobe level, but at a price: the peaks have changed in shape, widening and becoming less peaky, thus lowering the radar resolution—that is, the ability of the radar to distinguish between two closely spaced targets. The choice of window is a compromise between side-lobe level and resolution. Even so, referring to the trace for Vsim, windowing has dramatically increased our ability to distinguish the small target from its large neighbor.
In the real radar data range trace, windowing has also reduced the side lobes. This is most visible in the depth of the notch between the two groups of targets.
Radar Images
Knowing how to analyze a single trace, we can expand to looking at radar images.
The data is produced by a radar with a parabolic reflector antenna. It produces a highly directive round pencil beam with a two-degree spreading angle between half-power points. When directed with normal incidence at a plane, the radar will illuminate a spot of about 2m in diameter at a distance of 60m. Outside this spot, the power drops off quite rapidly, but strong echoes from outside the spot will nevertheless still be visible.
By varying the pencil beam’s azimuth (left-right position) and elevation (up-down position), we can sweep it across the target area of interest. When reflections are picked up, we can calculate the distance to the reflector (the object hit by the radar signal). Together with the current pencil beam azimuth and elevation, this defines the reflector’s position in 3D.
A rock slope consists of thousands of reflectors. A range bin can be thought of as a large sphere with the radar at its center that intersects the slope along a ragged line. The scatterers on this line will produce reflections in this range bin. The wavelength of the radar (distance the transmitted wave travels in one oscillation second) is about 30mm. The reflections from scatterers separated by odd multiples of a quarter wavelength in range, about 7.5mm, will tend to interfere destructively, while those from scatterers separated by multiples of a half wavelength will tend to interfere constructively at the radar. The reflections combine to produce apparent spots of strong reflections. This specific radar moves its antenna in order to scan small regions consisting of 20 degrees azimuth and 30 degrees elevation bins scanned in steps of 0.5 degrees.
We will now draw some contour plots of the resulting radar data. Refer to Figure 4-13 to see how the different slices are taken. A first slice at fixed range shows the strength of echoes against elevation and azimuth. Another two slices at fixed elevation and azimuth, respectively, show the slope (see Figures 4-13 and 4-14). The stepped construction of the high wall in an opencast mine is visible in the azimuth plane.
data
=
np
.
load
(
'data/radar_scan_1.npz'
)
scan
=
data
[
'scan'
]
# The signal amplitude ranges from -2.5V to +2.5V. The 14-bit
# analogue-to-digital converter in the radar gives out integers
# between -8192 to 8192. We convert back to voltage by multiplying by
# $(2.5 / 8192)$.
v
=
scan
[
'samples'
]
*
2.5
/
8192
win
=
np
.
hanning
(
N
+
1
)[:
-
1
]
# Take FFT for each measurement
V
=
np
.
fft
.
fft
(
v
*
win
,
axis
=
2
)[::
-
1
,
:,
:
N
//
2
]
contours
=
np
.
arange
(
-
40
,
1
,
2
)
# ignore MPL layout warnings
import
warnings
warnings
.
filterwarnings
(
'ignore'
,
'.*Axes.*compatible.*tight_layout.*'
)
f
,
axes
=
plt
.
subplots
(
2
,
2
,
figsize
=
(
4.8
,
4.8
),
tight_layout
=
True
)
labels
=
(
'Range'
,
'Azimuth'
,
'Elevation'
)
def
plot_slice
(
ax
,
radar_slice
,
title
,
xlabel
,
ylabel
):
ax
.
contourf
(
dB
(
radar_slice
),
contours
,
cmap
=
'magma_r'
)
ax
.
set_title
(
title
)
ax
.
set_xlabel
(
xlabel
)
ax
.
set_ylabel
(
ylabel
)
ax
.
set_facecolor
(
plt
.
cm
.
magma_r
(
-
40
))
with
plt
.
style
.
context
(
'style/thinner.mplstyle'
):
plot_slice
(
axes
[
0
,
0
],
V
[:,
:,
250
],
'Range=250'
,
'Azimuth'
,
'Elevation'
)
plot_slice
(
axes
[
0
,
1
],
V
[:,
3
,
:],
'Azimuth=3'
,
'Range'
,
'Elevation'
)
plot_slice
(
axes
[
1
,
0
],
V
[
6
,
:,
:]
.
T
,
'Elevation=6'
,
'Azimuth'
,
'Range'
)
axes
[
1
,
1
]
.
axis
(
'off'
)
3D visualization
We can also visualize the volume in three dimensions (Figure 4-15).
We first compute the argmax (the index of the maximum value) in the range direction. This should give an indication of the range at which the radar beam hit the rock slope. Each argmax index is converted to a 3D (elevation-azimuth-range) coordinate:
r
=
np
.
argmax
(
V
,
axis
=
2
)
el
,
az
=
np
.
meshgrid
(
*
[
np
.
arange
(
s
)
for
s
in
r
.
shape
],
indexing
=
'ij'
)
axis_labels
=
[
'Elevation'
,
'Azimuth'
,
'Range'
]
coords
=
np
.
column_stack
((
el
.
flat
,
az
.
flat
,
r
.
flat
))
Taking these coordinates, we project them onto the azimuth-elevation plane (by dropping the range coordinate) and perform a Delaunay tesselation. The tesselation returns a set of indices into our coordinates that define triangles (or simplices). While the triangles are strictly speaking defined on the projected coordinates, we use our original coordinates for the reconstruction, thereby adding back the range component:
from
scipy
import
spatial
d
=
spatial
.
Delaunay
(
coords
[:,
:
2
])
simplexes
=
coords
[
d
.
vertices
]
For display purposes, we swap the range axis to be the first:
coords
=
np
.
roll
(
coords
,
shift
=-
1
,
axis
=
1
)
axis_labels
=
np
.
roll
(
axis_labels
,
shift
=-
1
)
Now, Matplotlib’s trisurf
can be used to visualize the result:
# This import initializes Matplotlib's 3D machinery
from
mpl_toolkits.mplot3d
import
Axes3D
# Set up the 3D axis
f
,
ax
=
plt
.
subplots
(
1
,
1
,
figsize
=
(
4.8
,
4.8
),
subplot_kw
=
dict
(
projection
=
'3d'
))
with
plt
.
style
.
context
(
'style/thinner.mplstyle'
):
ax
.
plot_trisurf
(
*
coords
.
T
,
triangles
=
d
.
vertices
,
cmap
=
'magma_r'
)
ax
.
set_xlabel
(
axis_labels
[
0
])
ax
.
set_ylabel
(
axis_labels
[
1
])
ax
.
set_zlabel
(
axis_labels
[
2
],
labelpad
=-
3
)
ax
.
set_xticks
([
0
,
5
,
10
,
15
])
# Adjust the camera position to match our diagram above
ax
.
view_init
(
azim
=-
50
);
Further Applications of the FFT
The preceding examples show just one of the uses of the FFT in radar. There are many others, such as movement (Doppler) measurement and target recognition. The FFT is pervasive, and is seen everywhere from MRI to statistics. With the basic techniques that this chapter outlines in hand, you should be well equipped to use it!
Further Reading
- Athanasios Papoulis, The Fourier Integral and Its Applications (New York: McGraw-Hill, 1960).
- Ronald A. Bracewell, The Fourier Transform and Its Applications (New York: McGraw-Hill, 1986).
On radar signal processing:
- Mark A. Richards, James A. Scheer, and William A. Holm, eds., Principles of Modern Radar: Basic Principles (Raleigh, NC: SciTech, 2010).
- Mark A. Richards, Fundamentals of Radar Signal Processing (New York: McGraw-Hill, 2014).
Exercise: Image Convolution
The FFT is often used to speed up image convolution (convolution is the application of a sliding filter). Convolve an image with np.ones((5, 5))
, using a) NumPy’s np.convolve
and b) np.fft.fft2
. Confirm that the results are identical.
Hints:
- The convolution of
x
andy
is equivalent toifft2(X * Y)
, whereX
andY
are the FFTs of x and y, respectively. - In order to multiply
X
andY
, they have to be the same size. Usenp.pad
to extendx
andy
with zeros (toward the right and bottom) before taking their FFT. - You may see some edge effects. You can remove these by increasing the padding size, so that both
x
andy
have dimensionsshape(x) + shape(y) - 1
.
Check out “Solution: Image Convolution”.
1 The DFT operates on sampled data, in contrast to the standard Fourier transform, which is defined for continuous functions.
2 The Fourier transform essentially tells us how to combine a set of sinusoids of varying frequency to form the input signal. The spectrum consists of complex numbers—one for each sinusoid. A complex number encodes two things: a magnitude and an angle. The magnitude is the strength of the sinusoid in the signal, and the angle is how much it is shifted in time. At this point, we only care about the magnitude, which we calculate using np.abs
.
3 For more on techniques for calculating both (approximate) frequencies and time of occurrence, read up on wavelet analysis.
4 SciPy goes to some effort to preserve the energy in the spectrum. Therefore, when taking only half the components (for N even), it multiplies the remaining components, apart from the first and last components, by two (those two components are “shared” by the two halves of the spectrum). It also normalizes the window by dividing it by its sum.
5 The period can, in fact, also be infinite! The general continuous Fourier transform provides for this possibility. DFTs are generally defined over a finite interval, and this is implicitly the period of the time domain function that is transformed. In other words, if you do the inverse DFT, you always get a periodic signal out.
6 In computer science, the computational cost of an algorithm is often expressed in “Big O” notation. The notation gives us an indication of how an algorithm’s execution time scales with an increasing number of elements. If an algorithm is (N), its execution time increases linearly with the number of input elements (e.g., searching for a given value in an unsorted list is (N). Bubble sort is an example of an O(N2) algorithm; the exact number of operations performed may, hypothetically, be , meaning that the computational cost grows quadratically with the number of input elements.
7 While ideally we don’t want to reimplement existing algorithms, sometimes it becomes necessary in order to obtain the best execution speeds possible, and tools like Cython, which compiles Python to C, and Numba, which does just-in-time compilation of Python code, make life a lot easier (and faster!). If you are able to use GPL-licensed software, you may consider using PyFFTW for faster FFTs.
8 We leave it as an exercise for the reader to picture the situation for N odd. In this chapter, all examples use even-order DFTs.
9 The classical windowing functions include Hann, Hamming, and Blackman. They differ in their side lobe levels and in the broadening of the main lobe (in the Fourier domain). A modern and flexible window function that is close to optimal for most applications is the Kaiser window—a good approximation to the optimal prolate spheroid window, which concentrates the most energy into the main lobe. We can tune the Kaiser window to suit the particular application, as illustrated in the main text, by adjusting the parameter β.
Get Elegant SciPy now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.