Book description
GPU Computing Gems Emerald Edition offers practical techniques in parallel computing using graphics processing units (GPUs) to enhance scientific research. The first volume in Morgan Kaufmann's Applications of GPU Computing Series, this book presents the latest insights and research in computer vision, electronic design automation, and emerging data-intensive applications. It also covers life sciences, medical imaging, ray tracing and rendering, scientific simulation, signal and audio processing, statistical modeling, and video and image processing.
This book is intended to help those who face the challenge of programming systems to use GPUs effectively to meet efficiency and performance goals. It gives developers a window into diverse application areas and the opportunity to gain insights from others' algorithm work that they can apply to their own projects. Readers will learn from leading researchers in parallel programming, who have gathered their solutions and experience in one volume under the guidance of expert area editors. Each chapter is written to be accessible to researchers from other domains, allowing knowledge to cross-pollinate across the GPU spectrum. Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely adopted massively parallel programming solution. The insights, ideas, and practical hands-on skills in the book can be put to use immediately.
Computer programmers, software engineers, hardware engineers, and computer science students will find this volume a helpful resource. For the source code discussed throughout the book, the editors invite readers to visit the following website:
Table of contents
- Cover Image
- Table of Contents
- Front Matter
- Copyright
- Editors, Reviewers, and Authors
- Introduction
- Introduction
- Chapter 1. GPU-Accelerated Computation and Interactive Display of Molecular Orbitals
- 1.1. Introduction, Problem Statement, and Context
- 1.2. Core Method
- 1.3. Algorithms, Implementations, and Evaluations
- 1.4. Final Evaluation
- 1.5. Future Directions
- Chapter 2. Large-Scale Chemical Informatics on GPUs
- 2.1. Introduction, Problem Statement, and Context
- 2.2. Core Methods
- 2.3. Gaussian Shape Overlay: Parallelization and Arithmetic Optimization
- 2.4. LINGO: Algorithmic Transformation and Memory Optimization
- 2.5. Final Evaluation
- 2.6. Future Directions
- Chapter 3. Dynamical Quadrature Grids
- 3.1. Introduction
- 3.2. Core Method
- 3.3. Implementation
- 3.4. Performance Improvement
- 3.5. Future Work
- Chapter 4. Fast Molecular Electrostatics Algorithms on GPUs
- 4.1. Introduction, Problem Statement, and Context
- 4.2. Core Method
- 4.3. Algorithms, Implementations, and Evaluations
- 4.4. Final Evaluation
- 4.5. Future Directions
- Chapter 5. Quantum Chemistry
- 5.1. Problem Statement
- 5.2. Core Technology and Algorithm
- 5.3. The Key Insight on the Implementation—the Choice of Building Blocks
- 5.4. Final Evaluation and Benefits
- 5.5. Conclusions and Future Directions
- Chapter 6. An Efficient CUDA Implementation of the Tree-Based Barnes-Hut n-Body Algorithm
- 6.1. Introduction, Problem Statement, and Context
- 6.2. Core Methods
- 6.3. Algorithms and Implementations
- 6.4. Evaluation and Validation of Results, Total Benefits, and Limitations
- 6.5. Future Directions
- Chapter 7. Leveraging the Untapped Computation Power of GPUs
- 7.1. Background and Problem Statement
- 7.2. Flux Calculation and Aggregation
- 7.3. The GRASSY Platform
- 7.4. Initial Testing
- 7.5. Impact and Future Directions
- Chapter 8. Black Hole Simulations with CUDA
- 8.1. Introduction
- 8.2. The Post-Newtonian Approximation
- 8.3. Numerical Algorithm
- 8.4. GPU Implementation
- 8.5. Performance Results
- 8.6. GPU Supercomputing Clusters
- 8.7. Statistical Results for Black Hole Inspirals
- 8.8. Conclusion
- Chapter 9. Treecode and Fast Multipole Method for N-Body Simulation with CUDA
- 9.1. Introduction
- 9.2. Fast N-Body Simulation
- 9.3. CUDA Implementation of the Fast N-Body Algorithms
- 9.4. Improvements of Performance
- 9.5. Detailed Description of the GPU Kernels
- 9.6. Overview of Advanced Techniques
- 9.7. Conclusions
- Chapter 10. Wavelet-Based Density Functional Theory Calculation on Massively Parallel Hybrid Architectures
- 10.1. Introduction, Problem Statement, and Context
- 10.2. Core Method
- 10.3. Algorithms, Implementations, and Evaluations
- 10.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
- 10.5. Conclusions and Future Directions
- Introduction
- Chapter 11. Accurate Scanning of Sequence Databases with the Smith-Waterman Algorithm
- 11.1. Introduction, Problem Statement, and Context
- 11.2. Core Method
- 11.3. CUDA Implementation of the SW Algorithm for Identification of Homologous Proteins
- 11.4. Discussion
- 11.5. Final Evaluation
- Chapter 12. Massive Parallel Computing to Accelerate Genome-Matching
- 12.1. Introduction, Problem Statement, and Context
- 12.2. Core Methods
- 12.3. Algorithms, Implementations, and Evaluations
- 12.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
- 12.5. Future Directions
- Chapter 13. GPU-Supercomputer Acceleration of Pattern Matching
- 13.1. Introduction, Problem Statement, and Context
- 13.2. Core Method
- 13.3. Algorithms, Implementations, and Evaluations
- 13.4. Final Evaluation
- 13.5. Future Direction
- Chapter 14. GPU Accelerated RNA Folding Algorithm
- 14.1. Problem Statement
- 14.2. Core Method
- 14.3. Algorithms, Implementations, and Evaluations
- 14.4. Final Evaluation
- 14.5. Future Directions
- Chapter 15. Temporal Data Mining for Neuroscience
- 15.1. Introduction
- 15.2. Core Methodology
- 15.3. GPU Parallelization: Algorithms and Implementations
- 15.4. Experimental Results
- 15.5. Discussion
- Introduction
- Chapter 16. Parallelization Techniques for Random Number Generators
- 16.1. Introduction
- 16.2. L'Ecuyer's Multiple Recursive Generator MRG32k3a
- 16.3. Sobol Generator
- 16.4. Mersenne Twister MT19937
- 16.5. Performance Benchmarks
- Chapter 17. Monte Carlo Photon Transport on the GPU
- 17.1. Physics of Photon Transport
- 17.2. Photon Transport on the GPU
- 17.3. The Complete System
- 17.4. Results and Evaluation
- 17.5. Future Directions
- Chapter 18. High-Performance Iterated Function Systems
- 18.1. Problem Statement and Mathematical Background
- 18.2. Core Technology
- 18.3. Implementation
- 18.4. Final Evaluation
- 18.5. Conclusion
- Introduction
- Chapter 19. Large-Scale Machine Learning
- 19.1. Introduction
- 19.2. Core Technology
- 19.3. GPU Algorithm and Implementation
- 19.4. Improvements of Performance
- 19.5. Conclusions and Future Work
- Chapter 20. Multiclass Support Vector Machine
- 20.1. Introduction, Problem Statement, and Context
- 20.2. Core Method
- 20.3. Algorithms, Implementations, and Evaluations
- 20.4. Final Evaluation
- 20.5. Future Direction
- Chapter 21. Template-Driven Agent-Based Modeling and Simulation with CUDA
- 21.1. Introduction, Problem Statement, and Context
- 21.2. Final Evaluation and Validation of Results
- 21.3. Conclusions, Benefits and Limitations, and Future Work
- Chapter 22. GPU-Accelerated Ant Colony Optimization
- 22.1. Introduction, Problem Statement, and Context
- 22.2. Core Method
- 22.3. Algorithms, Implementations, and Evaluations
- 22.4. Final Evaluation
- 22.5. Future Direction
- Introduction
- Chapter 23. High-Performance Gate-Level Simulation with GP-GPUs
- 23.1. Introduction
- 23.2. Simulator Overview
- 23.3. Compilation and Simulation
- 23.4. Experimental Results
- 23.5. Future Directions
- Chapter 24. GPU-Based Parallel Computing for Fast Circuit Optimization
- 24.1. Introduction, Problem Statement, and Context
- 24.2. Core Method
- 24.3. Algorithms, Implementations, and Evaluations
- 24.4. Final Evaluation
- 24.5. Future Direction
- Introduction
- Chapter 25. Lattice Boltzmann Lighting Models
- 25.1. Introduction, Problem Statement, and Context
- 25.2. Core Methods
- 25.3. Algorithms, Implementation, and Evaluation
- 25.4. Final Evaluation
- 25.5. Future Directions
- 25.6. Derivation of the Diffusion Equation
- Chapter 26. Path Regeneration for Random Walks
- 26.1. Introduction
- 26.2. Path Tracing as Case Study
- 26.3. Random Walks in Path Tracing
- 26.4. Implementation Details
- 26.5. Results
- 26.6. Discussion
- Chapter 27. From Sparse Mocap to Highly Detailed Facial Animation
- 27.1. System Overview
- 27.2. Background
- 27.3. Core Technology and Algorithms
- 27.4. Future Directions
- Chapter 28. A Programmable Graphics Pipeline in CUDA for Order-Independent Transparency
- 28.1. Introduction, Problem Statement, and Context
- 28.2. Core Method
- 28.3. Algorithms, Implementations, and Evaluations
- 28.4. Final Evaluation
- 28.5. Future Direction
- Introduction
- Chapter 29. Fast Graph Cuts for Computer Vision
- 29.1. Introduction, Problem Statement, and Context
- 29.2. Core Method
- 29.3. Algorithms, Implementations, and Evaluations
- 29.4. Final Evaluation and Validation of Results
- 29.5. Multilabel Graph Cuts
- Chapter 30. Visual Saliency Model on Multi-GPU
- 30.1. Introduction
- 30.2. Visual Saliency Model
- 30.3. GPU Implementation
- 30.4. Results
- 30.5. Conclusion
- Chapter 31. Real-Time Stereo on GPGPU Using Progressive Multiresolution Adaptive Windows
- 31.1. Introduction, Problem Statement, and Context
- 31.2. Core Method
- Chapter 32. Real-Time Speed-Limit-Sign Recognition on an Embedded System Using a GPU
- 32.1. Introduction
- 32.2. Methods
- 32.3. Implementation
- 32.4. Results and Discussion
- 32.5. Conclusion and Future Work
- Chapter 33. Haar Classifiers for Object Detection with CUDA
- 33.1. Introduction
- 33.2. Viola-Jones Object Detection Retrospective
- 33.3. Object Detection Pipeline with NVIDIA CUDA
- 33.4. Benchmarking and Implementation Details
- 33.5. Future Direction
- 33.6. Conclusion
- Introduction
- Chapter 34. Experiences on Image and Video Processing with CUDA and OpenCL
- 34.1. Introduction, Problem Statement, and Background
- 34.2. Core Technology or Algorithm
- 34.3. Key Insights from Implementation and Evaluation
- 34.4. Final Evaluation
- Chapter 35. Connected Component Labeling in CUDA
- 35.1. Introduction
- 35.2. Core Algorithm
- 35.3. CUDA Algorithm and Implementation
- 35.4. Final Evaluation and Results
- Chapter 36. Image De-Mosaicing
- 36.1. Introduction, Problem Statement, and Context
- 36.2. Core Method
- 36.3. Algorithms, Implementations, and Evaluations
- 36.4. Final Evaluation
- Introduction
- Chapter 37. Efficient Automatic Speech Recognition on the GPU
- 37.1. Introduction, Problem Statement, and Context
- 37.2. Core Methods
- 37.3. Algorithms, Implementations, and Evaluations
- 37.4. Conclusion and Future Directions
- Chapter 38. Parallel LDPC Decoding
- 38.1. Introduction, Problem Statement, and Context
- 38.2. Core Technology
- 38.3. Algorithms, Implementations, and Evaluations
- 38.4. Final Evaluation
- 38.5. Future Directions
- Chapter 39. Large-Scale Fast Fourier Transform
- 39.1. Introduction
- 39.2. Memory Hierarchy of GPU Clusters
- 39.3. Large-Scale Fast Fourier Transform
- 39.4. Algebraic Manipulation of Array Dimensions
- 39.5. Performance Results
- 39.6. Conclusion and Future Work
- Introduction
- Chapter 40. GPU Acceleration of Iterative Digital Breast Tomosynthesis
- 40.1. Introduction
- 40.2. Digital Breast Tomosynthesis
- 40.3. Accelerating Iterative DBT using GPUs
- 40.4. Conclusions
- Chapter 41. Parallelization of Katsevich CT Image Reconstruction Algorithm on Generic Multi-Core Processors and GPGPU
- 41.1. Introduction, Problem, and Context
- 41.2. Core Methods
- 41.3. Algorithms, Implementations, and Evaluations
- 41.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
- 41.5. Related Work
- 41.6. Future Directions
- 41.7. Summary
- Chapter 42. 3-D Tomographic Image Reconstruction from Randomly Ordered Lines with CUDA
- 42.1. Introduction
- 42.2. Core Methods
- 42.3. Implementation
- 42.4. Evaluation and Validation of Results, Total Benefits, and Limitations
- 42.5. Future Directions
- Chapter 43. Using GPUs to Learn Effective Parameter Settings for GPU-Accelerated Iterative CT Reconstruction Algorithms
- 43.1. Introduction, Problem Statement, and Context
- 43.2. Core Method(s)
- 43.3. Algorithms, Implementations, and Evaluations
- 43.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
- 43.5. Future Directions
- Chapter 44. Using GPUs to Accelerate Advanced MRI Reconstruction with Field Inhomogeneity Compensation
- 44.1. Introduction
- 44.2. Core Method: Advanced Image Reconstruction Toolbox for MRI
- 44.3. MRI Reconstruction Algorithms and Implementation on GPUs
- 44.4. Final Results and Evaluation
- 44.5. Conclusion and Future Directions
- Chapter 45. ℓ1 Minimization in ℓ1-SPIRiT Compressed Sensing MRI Reconstruction
- 45.1. Introduction, Problem Statement, and Context
- 45.2. Core Methods (High Level Description)
- 45.3. Algorithms, Implementations, and Evaluations (Detailed Description)
- 45.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
- 45.5. Discussion and Conclusion
- Chapter 46. Medical Image Processing Using GPU-Accelerated ITK Image Filters
- 46.1. Introduction
- 46.2. Core Methods
- 46.3. Implementation
- 46.4. Results
- 46.5. Future Directions
- 46.6. Acknowledgments
- Chapter 47. Deformable Volumetric Registration Using B-Splines
- 47.1. Introduction
- 47.2. An Overview of B-Spline Registration
- 47.3. Implementation Details
- 47.4. Results
- 47.5. Conclusions
- Chapter 48. Multiscale Unbiased Diffeomorphic Atlas Construction on Multi-GPUs
- 48.1. Introduction, Problem Statement, and Context
- 48.2. Core Methods
- 48.3. Algorithms, Implementations, and Evaluations
- 48.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
- 48.5. Future Directions
- Chapter 49. GPU-Accelerated Brain Connectivity Reconstruction and Visualization in Large-Scale Electron Micrographs
- 49.1. Introduction
- 49.2. Core Methods
- 49.3. Implementation
- 49.4. Results
- 49.5. Future Directions
- Chapter 50. Fast Simulation of Radiographic Images Using a Monte Carlo X-Ray Transport Algorithm Implemented in CUDA
- 50.1. Introduction, Problem Statement, and Context
- 50.2. Core Methods
- 50.3. Algorithms, Implementations, and Evaluations
- 50.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
- 50.5. Future Directions
- Index
Product information
- Title: GPU Computing Gems Emerald Edition
- Author(s):
- Release date: January 2011
- Publisher(s): Morgan Kaufmann
- ISBN: 9780123849892