Book description
Intel Xeon Phi Processor High Performance Programming is an all-in-one source of information for programming the second-generation Intel Xeon Phi product family, also called Knights Landing. The authors provide timely, Knights Landing-specific details, programming advice, and real-world examples. They distill their years of Xeon Phi programming experience, coupled with insights from many expert customers, Intel Field Engineers, Application Engineers, and Technical Consulting Engineers, to create this authoritative book on the essentials of programming for Intel Xeon Phi products.
Intel® Xeon Phi™ Processor High-Performance Programming is useful even before you ever program a system with an Intel Xeon Phi processor. To help ensure that your applications run at maximum efficiency, the authors emphasize key techniques for programming any modern parallel computing system whether based on Intel Xeon processors, Intel Xeon Phi processors, or other high-performance microprocessors. Applying these techniques will generally increase your program performance on any system and prepare you better for Intel Xeon Phi processors.
- A practical guide to the essentials for programming Intel Xeon Phi processors
- Definitive coverage of the Knights Landing architecture
- Presents best practices for portable, high-performance computing and a familiar and proven threads and vectors programming model
- Includes real-world code examples that highlight the unique aspects of this new highly parallel, high-performance computational product
- Covers use of MCDRAM, AVX-512, Intel® Omni-Path fabric, many cores (up to 72), and many threads (4 per core)
- Covers software developer tools, libraries and programming models
- Covers using Knights Landing as a processor and a coprocessor
Table of contents
- Cover image
- Title page
- Table of Contents
- Copyright
- Acknowledgments
- Foreword
- Preface
- Section I: Knights Landing
- Introduction
- Chapter 1: Introduction
  - Abstract
  - Introduction to Many-Core Programming
  - Trend: More Parallelism
  - Why Intel® Xeon Phi™ Processors Are Needed
  - Processors Versus Coprocessor
  - Measuring Readiness for Highly Parallel Execution
  - What About GPUs?
  - Enjoy the Lack of Porting Needed but Still Tune!
  - Transformation for Performance
  - Hyper-Threading Versus Multithreading
  - Programming Models
  - Why We Could Skip To Section II Now
  - For More Information
- Chapter 2: Knights Landing overview
- Chapter 3: Programming MCDRAM and Cluster modes
  - Abstract
  - Programming for Cluster Modes
  - Programming for Memory Modes
  - Query Memory Mode and MCDRAM Available
  - SNC Performance Implications of Allocation and Threading
  - How to Not Hard Code the NUMA Node Numbers
  - Approaches to Determining What to Put in MCDRAM
  - Why Rebooting Is Required to Change Modes
  - BIOS
  - Summary
  - For More Information
- Chapter 4: Knights Landing architecture
- Chapter 5: Intel Omni-Path Fabric
- Chapter 6: μarch optimization advice
- Section II: Parallel Programming
- Introduction
- Chapter 7: Programming overview for Knights Landing
- Chapter 8: Tasks and threads
- Chapter 9: Vectorization
  - Abstract
  - Why Vectorize?
  - How to Vectorize
  - Three Approaches to Achieving Vectorization
  - Six-Step Vectorization Methodology
  - Streaming Through Caches: Data Layout, Alignment, Prefetching, and so on
  - Compiler Tips
  - Compiler Options
  - Compiler Directives
  - Use Array Sections to Encourage Vectorization
  - Look at What the Compiler Created: Assembly Code Inspection
  - Numerical Result Variations With Vectorization
  - Summary
  - For More Information
- Chapter 10: Vectorization advisor
  - Abstract
  - Getting Started With Intel Advisor for Knights Landing
  - Enabling and Improving AVX-512 Code With the Survey Report
  - Memory Access Pattern Report
  - AVX-512 Gather/Scatter Profiler
  - Mask Utilization and FLOPs Profiler
  - Advisor Roofline Report
  - Explore AVX-512 Code Characteristics Without AVX-512 Hardware
  - Example: Analysis of a Computational Chemistry Code
  - Summary
  - For More Information
- Chapter 11: Vectorization with SDLT
- Chapter 12: Vectorization with AVX-512 intrinsics
- Chapter 13: Performance libraries
  - Abstract
  - Intel Performance Library Overview
  - Intel Math Kernel Library Overview
  - Intel Data Analytics Library Overview
  - Together: MKL and DAAL
  - Intel Integrated Performance Primitives Library Overview
  - Intel Performance Libraries and Intel Compilers
  - Native (Direct) Library Usage
  - Offloading to Knights Landing While Using a Library
  - Precision Choices and Variations
  - Performance Tip for Faster Dynamic Libraries
  - For More Information
- Chapter 14: Profiling and timing
- Chapter 15: MPI
- Chapter 16: PGAS programming models
- Chapter 17: Software-defined visualization
- Chapter 18: Offload to Knights Landing
- Chapter 19: Power analysis
  - Abstract
  - Power Demand Gates Exascale
  - Power 101
  - Hardware-Based Power Analysis Techniques
  - Software-Based Knights Landing Power Analyzer
  - ManyCore Platform Software Package Power Tools
  - Running Average Power Limit
  - Performance Profiling on Knights Landing
  - Intel Remote Management Module
  - Summary
  - For More Information
- Section III: Pearls
- Introduction
- Chapter 20: Optimizing classical molecular dynamics in LAMMPS
  - Abstract
  - Acknowledgment
  - Molecular Dynamics
  - LAMMPS
  - Knights Landing Processors
  - LAMMPS Optimizations
  - Data Alignment
  - Data Types and Layout
  - Vectorization
  - Neighbor List
  - Long-Range Electrostatics
  - MPI and OpenMP Parallelization
  - Performance Results
  - System, Build, and Run Configurations
  - Workloads
  - Organic Photovoltaic Molecules
  - Hydrocarbon Mixtures
  - Rhodopsin Protein in Solvated Lipid Bilayer
  - Coarse-Grain Liquid Crystal Simulation
  - Coarse-Grain Water Simulation
  - Summary
  - For More Information
- Chapter 21: High performance seismic simulations
  - Abstract
  - High-Order Seismic Simulations
  - Numerical Background
  - Application Characteristics
  - Intel Architecture as Compute Engine
  - Highly-efficient Small Matrix Kernels
  - Sparse Matrix Kernel Generation and Sparse/Dense Kernel Selection
  - Dense Matrix Kernel Generation: AVX2
  - Dense Matrix Kernel Generation: AVX-512
  - Kernel Performance Benchmarking
  - Incorporating Knights Landing’s Different Memory Subsystems
  - Performance Evaluation
  - Mount Merapi
  - 1992 Landers
  - Summary and Take-Aways
  - For More Information
- Chapter 22: Weather research and forecasting (WRF)
  - Abstract
  - WRF Overview
  - WRF Execution Profile: Relatively Flat
  - History of WRF on Intel Many-Core (Intel Xeon Phi Product Line)
  - Our Early Experiences With WRF on Knights Landing
  - Compiling WRF for Intel Xeon and Intel Xeon Phi Systems
  - WRF CONUS12km Benchmark Performance
  - MCDRAM Bandwidth
  - Vectorization: Boost of AVX-512 Over AVX2
  - Core Scaling
  - Summary
  - For More Information
- Chapter 23: N-Body simulation
  - Abstract
  - Parallel Programming for Noncomputer Scientists
  - Step-by-Step Improvements
  - N-Body Simulation
  - Optimization
  - Initial Implementation (Optimization Step 0)
  - Thread Parallelism (Optimization Step 1)
  - Scalar Performance Tuning (Optimization Step 2)
  - Vectorization with SOA (Optimization Step 3)
  - Memory Traffic (Optimization Step 4)
  - Impact of MCDRAM on Performance
  - Summary
  - For More Information
- Chapter 24: Machine learning
- Chapter 25: Trinity workloads
- Chapter 26: Quantum chromodynamics
- Contributors
- Glossary
- Index
Product information
- Title: Intel Xeon Phi Processor High Performance Programming, 2nd Edition
- Author(s):
- Release date: May 2016
- Publisher(s): Morgan Kaufmann
- ISBN: 9780128091951