Appendix A. Advanced NumPy

In this appendix, I will go deeper into the NumPy library for array computing. This will include more internal details about the ndarray type and more advanced array manipulations and algorithms.

This appendix contains miscellaneous topics and does not necessarily need to be read linearly. Throughout the chapters, I will generate random data for many examples that will use the default random number generator in the numpy.random module:

In [11]: rng = np.random.default_rng(seed=12345)

A.1 ndarray Object Internals

The NumPy ndarray provides a way to interpret a block of homogeneously typed data (either contiguous or strided) as a multidimensional array object. The data type, or dtype, determines how the data is interpreted as being floating point, integer, Boolean, or any of the other types we’ve been looking at.

Part of what makes ndarray flexible is that every array object is a strided view on a block of data. You might wonder, for example, how the array view arr[::2, ::-1] does not copy any data. The reason is that the ndarray is more than just a chunk of memory and a data type; it also has striding information that enables the array to move through memory with varying step sizes. More precisely, the ndarray internally consists of the following:

  • A pointer to data—that is, a block of data in RAM or in a memory-mapped file

  • The data type or dtype describing fixed-size value cells in the array

  • A tuple indicating the array’s shape

  • A tuple of strides—integers indicating ...

Get Python for Data Analysis, 3rd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.