Book description
The increasing complexity of programming environments provides a number of opportunities for assembly language programmers. 32/64-Bit 80x86 Assembly Language Architecture attempts to break through that complexity by providing a step-by-step understanding of programming Intel and AMD 80x86 processors in assembly language. The book explains 32-bit and 64-bit 80x86 assembly language programming, including the SIMD (single instruction, multiple data) instruction supersets that bring the 80x86 processor into the realm of the supercomputer. It also gives insight into the FPU (floating-point unit) built into every Pentium processor and offers strategies for optimizing code.
Table of contents
- Copyright
- Preface
- 1. Introduction
- 2. Coding Standards
- 3. Processor Differential Insight
- Processor Overview
- History
- The 64-Bit Processor
- 80x86 Registers
- CPU Status Registers (EFLAGS/64-Bit RFLAGS)
- NOP — No Operation
- Floating-Point 101
- Processor Data Type Encoding
- EMMS — Enter/Leave MMX State
- FEMMS — Enter/Leave MMX State
- Destination/Source Orientations
- Big/Little-Endian
- Alignment Quickie
- (Un)aligned Memory Access
- System Level Functionality
- Indirect Memory Addressing
- Translation Table
- String Instructions
- Special (Non-Temporal) Memory Instructions
- Exercises
- 4. Bit Mangling
- 5. Bit Wrangling
- 6. Data Conversion
- Data Interlacing, Exchanging, Unpacking, and Merging
- Byte Swapping
- Data Interlacing
- PUNPCKLBW — Parallel Extend Lower from Byte
- PUNPCKHBW — Parallel Extend Upper from Byte
- PUNPCKLWD — Parallel Extend Lower from 16-Bit
- PUNPCKHWD — Parallel Extend Upper from 16-Bit
- PUNPCKLDQ — Parallel Extend Lower from 32-Bit
- PUNPCKHDQ — Parallel Extend Upper from 32-Bit
- MOVSS — Move Scalar (SPFP)
- MOVQ2DQ — Move Scalar (1×32-Bit) MMX to XMM
- MOVDQ2Q — Move Scalar (1×32-bit) XMM to MMX
- MOVLPS — Move Low Packed (2×SPFP)
- MOVHPS — Move High Packed (2×SPFP)
- MOVLHPS — Move Low to High Packed (2×SPFP)
- MOVHLPS — Move High to Low Packed (2×SPFP)
- MOVSD — Move Scalar (1×DPFP)
- MOVLPD — Move Low Packed (1×DPFP)
- MOVHPD — Move High Packed (1×DPFP)
- PUNPCKLQDQ — Parallel Copy Lower (2×64-Bit)
- PUNPCKHQDQ — Parallel Copy Upper (2×64-Bit)
- Swizzle, Shuffle, and Splat
- PINSRW — Shuffle (1×16-Bit) to (4×16-Bit)
- PSHUFW — Shuffle Packed Words (4×16-Bit)
- PSHUFLW — Shuffle Packed Low Words (4×16-Bit)
- PSHUFHW — Shuffle Packed High Words (4×16-Bit)
- PSHUFD — Shuffle Packed Double Words (4×32-Bit)
- SHUFPS — Shuffle Packed SPFP Values (4×SPFP)
- MOVSLDUP — Splat Packed Even SPFP to (4×SPFP)
- MOVSHDUP — Splat Packed Odd SPFP to (4×SPFP)
- MOVDDUP — Splat Lower DPFP to Packed (2×DPFP)
- SHUFPD — Shuffle Packed DPFP (2×64-Bit)
- Data Bit Expansion
- CBW — Convert Signed AL (Byte) to AX (Word)
- CWDE — Convert Signed AX (Word) to EAX (DWord)
- CDQE — Convert Signed EAX (DWord) to RAX (QWord)
- MOVSX/MOVSXD — Move with Sign Extension
- MOVZX — Move with Zero Extension
- CWD — Convert Signed AX (Word) to DX:AX
- CDQ — Convert Signed EAX (DWord) to EDX:EAX
- CQO — Convert Signed RAX (QWord) to RDX:RAX
- PEXTRW — Extract (4×16-bit) into Integer to (1×16)
- Data Bit Reduction (with Saturation)
- Data Conversion (Integer : Float, Float : Integer, Float : Float)
- PI2FW — Convert Packed Even int16 to SPFP
- CVTDQ2PS — Convert Packed int32 to SPFP
- CVTPS2DQ — Convert Packed SPFP to int32
- CVTPI2PS — Convert Lo Packed int32 to SPFP
- CVTPS2PI — Convert Lo Packed SPFP to int32
- CVTSI2SS — Convert Scalar int32 to SPFP
- CVTDQ2PD — Convert Even Packed int32 to DPFP
- CVTPD2DQ — Convert Packed DPFP to Even int32
- CVTPD2PS — Convert Packed DPFP to Lo SPFP
- CVTPS2PD — Convert Lo Packed SPFP to DPFP
- CVTPD2PI — Convert Packed DPFP to int32
- CVTPI2PD — Convert Packed int32 to DPFP
- CVTSS2SI — Convert Scalar SPFP to int32/64
- CVTSD2SI — Convert Scalar DPFP to Int
- CVTSI2SD — Convert Scalar Int to DPFP
- CVTSD2SS — Convert Scalar DPFP to SPFP
- CVTSS2SD — Convert Scalar SPFP to DPFP
- Exercises
- 7. Integer Math
- General Integer Math
- Packed Addition and Subtraction
- Vector Addition and Subtraction (Fixed Point)
- Averages
- Sum of Absolute Differences
- Integer Multiplication
- Packed Integer Multiplication
- Integer Division
- Exercises
- 8. Floating-Point Anyone?
- The Floating-Point Number
- Loading/Storing Numbers and the FPU Stack
- General Math Instructions
- FCHS — FPU Two's Complement ST(0) = – ST(0)
- FABS — FPU Absolute Value ST(0) = |ST(0)|
- FADD/FADDP/FIADD — FPU Addition D = ST(0) + A
- FSUB/FSUBP/FISUB — FPU Subtraction D = ST(0) – A
- FSUBR/FSUBRP/FISUBR — FPU Reverse Subtraction D = A – ST(0)
- FMUL/FMULP/FIMUL — FPU Multiplication D = ST(0) × A
- FDIV/FDIVP/FIDIV — FPU Division D = Dst ÷ Src
- FDIVR/FDIVRP/FIDIVR — FPU Reverse Division D = Src ÷ Dst
- FPREM — FPU Partial Remainder
- FPREM1 — FPU Partial Remainder
- FRNDINT — FPU Round to Integer
- Advanced Math Instructions
- Floating-Point Comparison
- FPU BCD (Binary-Coded Decimal)
- FPU Trigonometry
- FPU System Instructions
- FINIT/FNINIT — FPU Init
- FCLEX/FNCLEX — FPU Clear Exceptions
- FFREE — FPU Free FP Register
- FSAVE/FNSAVE — FPU Save X87 FPU, MMX, SSE, SSE2
- FRSTOR — FPU Restore x87 State
- FXSAVE — FPU Save x87 FPU, MMX, SSE, SSE2, SSE3
- FXRSTOR — FPU Restore x87 FPU, MMX, SSE, SSE2, SSE3
- FSTENV/FNSTENV — FPU Store x87 Environment
- FLDENV — FPU Load x87 Environment
- FSTCW/FNSTCW — FPU Store x87 Control Word
- FLDCW — FPU Load x87 Control Word
- FSTSW/FNSTSW — FPU Store x87 Status Word
- Validating (Invalid) Floating-Point
- Exercises
- 9. Comparison
- TEST — Logical Compare A ∧ B
- Indexed Bit Testing
- SETcc — Set Byte on Condition
- Comparing Operands and Setting EFLAGS
- CMP — Packed Comparison
- Extract Packed Sign Masks
- SCAS/SCASB/SCASW/SCASD/SCASQ — Scan String
- CMOVcc — Conditional Move
- CMPXCHG — Compare and Exchange
- Boolean Operations upon Floating-Point Numbers
- ANDPS — Logical AND of Packed SPFP D = A ∧ B
- ANDPD — Logical AND of Packed DPFP
- Pseudo Vec — (XMM) FABS — FP Absolute A = | A |
- Pseudo Vec — (3DNow!) FABS — FP Absolute A = | A |
- ORPS — Logical OR of Packed SPFP D = A ∨ B
- ORPD — Logical OR of Packed DPFP
- XORPS — Logical XOR of Packed SPFP D = A ⊕ B
- XORPD — Logical XOR of Packed DPFP
- Pseudo Vec — FCHS — FP Change Sign A = – A
- ANDNPS — Logical ANDC of Packed SPFP D = A ∧ (¬B)
- ANDNPD — Logical ANDC of Packed DPFP
- Min — Minimum
- Max — Maximum
- 10. Branching
- Jump Unconditionally
- Jump Conditionally
- Branch Prediction
- PAUSE — (Spin Loop Hint)
- Pancake Memory LIFO Queue
- Stack
- CALL Procedure (Function)
- Calling Conventions (Stack Argument Methods)
- Interrupt Handling
- 11. Branchless
- 12. Floating-Point Vector Addition and Subtraction
- Floating-Point Vector Addition and Subtraction
- Vector Scalar Addition and Subtraction
- Special — FP Vector Addition and Subtraction
- Exercises
- 13. FP Vector Multiplication and Division
- Floating-Point Multiplication
- Vector Scalar Multiplication
- Vector Floating-Point Division
- Exercises
- 14. Floating-Point Deux
- SQRT — Square Root
- 1×SPFP Scalar Square Root
- 4×SPFP Square Root
- 1×DPFP Scalar Square Root
- 2×DPFP Square Root
- 1×SPFP Scalar Reciprocal Square Root (15-Bit)
- Pseudo Vec
- Pseudo Vec (x86)
- SPFP Square Root (2 Stage) (24-Bit)
- vmp_FSqrt (3DNow!) Standard Float 24-Bit Precision
- Pseudo Vec
- Pseudo Vec (x86)
- Graphics 101 — Vector Magnitude (aka 3D Pythagorean Theorem)
- Pseudo Vec
- Pseudo Vec (x86)
- Vector Normalize
- 15. Binary-Coded Decimal (BCD)
- 16. What CPUID?
- 17. PC I/O
- 18. System
- System "Lite"
- System Timing Instructions
- Cache Manipulation
- System Instructions
- ARPL — Adjust Requested Privilege Level
- BOUND — Check Array Index For Bounding Error
- CLTS — Clear Task Switch Flag
- HLT — Halt Processor
- UD2 — Undefined Instruction
- INVLPG — Invalidate TLB
- LAR — Load Access Rights
- LOCK — Assert Lock # Signal Prefix
- LSL — Load Segment Limit
- MOV — Move To/From Control Registers
- MOV — Move To/From Debug Registers
- STMXCSR — Save MXCSR Register State
- LDMXCSR — Load MXCSR Register State
- SGDT/SIDT — Save Global/Interrupt Descriptor Table
- LGDT/LIDT — Load Global/Interrupt Descriptor Table
- SLDT — Save Local Descriptor Table
- LLDT — Load Local Descriptor Table
- SMSW — Save Machine Status Word
- LMSW — Load Machine Status Word
- STR — Save Task Register
- LTR — Load Task Register
- RDMSR — Read from Model Specific Register
- WRMSR — Write to Model Specific Register
- SWAPGS — Swap GS Base Register
- SYSCALL — 64-Bit Fast System Call
- SYSRET — Fast Return from 64-Bit Fast System Call
- SYSENTER — Fast System Call
- SYSEXIT — Fast Return from Fast System Call
- RSM — Resume from System Management Mode
- VERR/VERW — Verify Segment for Reading
- LDS/LES/LFS/LGS/LSS — Load Far Pointer
- Hyperthreading Instructions
- 19. Gfx 'R' Asm
- 20. MASM vs. NASM vs. TASM vs. WASM
- 21. Debugging Functions
- 22. Epilogue
- A. Data Structure Definitions
- B. Mnemonics
- C. Reg/Mem Mapping
- Glossary
- References
Product information
- Title: 32/64-Bit 80x86 Assembly Language Architecture
- Author(s):
- Release date: August 2005
- Publisher(s): Jones & Bartlett Learning
- ISBN: 9781449612702