CUDA Fortran for Scientists and Engineers, 2nd Edition

Book description

CUDA Fortran for Scientists and Engineers: Best Practices for Efficient CUDA Fortran Programming shows how high-performance application developers can leverage the power of GPUs using Fortran, the familiar language of scientific computing and supercomputer performance benchmarking. The authors presume no prior parallel computing experience, and cover the basics along with best practices for efficient GPU computing using CUDA Fortran. To help readers add CUDA Fortran to existing Fortran codes, they explain how to understand the target GPU architecture, identify computationally intensive parts of the code, and modify the code to manage the data and parallelism and optimize performance, all in Fortran, without having to rewrite in another language. Each concept is illustrated with actual examples, so you can immediately evaluate and compare the performance of your own code. This second edition provides much-needed updates on how to program GPUs efficiently in CUDA Fortran. It can be used either as a tutorial on GPU programming in CUDA Fortran or as a reference text.

  • Presents optimization strategies for current hardware, including Hopper-generation GPUs
  • Includes discussions of new language and hardware features, including managed memory, tensor cores, shuffle instructions, and new multi-GPU paradigms
  • Offers resources and strategies for porting large codes to GPUs, covering both language features and library use

Table of contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Dedication
  6. Preface to the Second Edition
  7. Preface to the First Edition
    1. References
  8. Acknowledgments
  9. Part 1: CUDA Fortran programming
    1. Chapter 1: Introduction
      1. Abstract
      2. 1.1. A brief history of GPU computing
      3. 1.2. Parallel computation
      4. 1.3. Basic concepts
      5. 1.4. Determining CUDA hardware features and limits
      6. 1.5. Error handling
      7. 1.6. Compiling CUDA Fortran code
      8. 1.7. CUDA Driver, Toolkit, and compatibility
    2. Chapter 2: Correctness, accuracy, and debugging
      1. Abstract
      2. 2.1. Assessing correctness of results
      3. 2.2. Debugging
    3. Chapter 3: Performance measurement and metrics
      1. Abstract
      2. 3.1. Measuring execution time
      3. 3.2. Instruction, bandwidth, and latency bound kernels
      4. 3.3. Memory bandwidth
    4. Chapter 4: Synchronization
      1. Abstract
      2. 4.1. Synchronization of kernel execution and data transfers
      3. 4.2. Synchronization of kernel threads on the device
    5. Chapter 5: Optimization
      1. Abstract
      2. 5.1. Transfers between host and device
      3. 5.2. Device memory
      4. 5.3. Execution configuration
      5. 5.4. Instruction optimization
    6. Chapter 6: Porting tips and techniques
      1. Abstract
      2. 6.1. CUF kernels
      3. 6.2. Conditional inclusion of code
      4. 6.3. Renaming variables
      5. 6.4. Minimizing memory footprint for work arrays
      6. 6.5. Array compaction
      7. References
    7. Chapter 7: Interfacing with CUDA C code and CUDA libraries
      1. Abstract
      2. 7.1. Calling user-written CUDA C code
      3. 7.2. cuBLAS
      4. 7.3. cuSPARSE
      5. 7.4. cuSOLVER
      6. 7.5. cuTENSOR
      7. 7.6. Thrust
    8. Chapter 8: Multi-GPU programming
      1. Abstract
      2. 8.1. CUDA multi-GPU features
      3. 8.2. Multi-GPU programming with MPI
      4. References
  10. Part 2: Case studies
    1. Chapter 9: Monte Carlo method
      1. Abstract
      2. 9.1. CURAND
      3. 9.2. Computing π with CUF kernels
      4. 9.3. Computing π with reduction kernels
      5. 9.4. Accuracy of summation
      6. 9.5. Option pricing
      7. References
    2. Chapter 10: Finite difference method
      1. Abstract
      2. 10.1. Nine-point 1D finite difference stencil
      3. 10.2. 2D Laplace equation
      4. References
    3. Chapter 11: Applications of the fast Fourier transform
      1. Abstract
      2. 11.1. CUFFT
      3. 11.2. Spectral derivatives
      4. 11.3. Convolution
      5. 11.4. Poisson solver
      6. References
    4. Chapter 12: Ray tracing
      1. Abstract
      2. 12.1. Generating an image file
      3. 12.2. Vectors in CUDA Fortran
      4. 12.3. Rays, a simple camera, and background
      5. 12.4. Adding a sphere
      6. 12.5. Surface normals and multiple objects
      7. 12.6. Antialiasing
      8. 12.7. Material types
      9. 12.8. Positionable camera
      10. 12.9. Defocus blur
      11. 12.10. Where next?
      12. 12.11. Triangles
      13. 12.12. Lights
      14. 12.13. Textures
      15. References
  11. Part 3: Appendices
    1. Appendix A: System and environment management
      1. A.1. Environment variables
      2. A.2. nvidia-smi – System Management Interface
    2. References
    3. Index

Product information

  • Title: CUDA Fortran for Scientists and Engineers, 2nd Edition
  • Author(s): Gregory Ruetsch, Massimiliano Fatica
  • Release date: July 2024
  • Publisher(s): Morgan Kaufmann
  • ISBN: 9780443219764