Errata

Errata for C++ AMP

This errata list records errors found after the product was released, together with their corrections. If an error was corrected in a later version or reprint, the date of the correction is shown in the column titled "Date Corrected".

The following errata were submitted by our customers and approved as valid errors by the author or editor.


Version	Location	Description	Submitted By	Date submitted
Printed, PDF, ePub, Mobi,	Page 8 1st code paragraph	This line contains two mistakes: "bool bSSEInstructions = (CpuInfo[3] >> 24 && 0x1)" First, the integer array is named CPUInfo, not CpuInfo. Second, the logical AND operator "&&" with the constant 0x1 (true) has no effect on the result of the expression. I guess it should be a bitwise AND operation instead. Note from the Author or Editor: Should read: bool bSSEInstructions = (CPUInfo[3] >> 24 & 0x1);	Matias Dons Dollerup	Oct 09, 2012
Printed, PDF, ePub, Mobi,	Page 10 Last paragraph on page	Originally submitted on: http://blogs.msdn.com/b/nativeconcurrency/archive/2012/10/11/c-amp-book-now-available.aspx#10361456 Jo Blow 20 Oct 2012 8:34 PM Just started reading the book. I'd like to point out that parallelization of delayed recurrence relationships is actually possible, contrary to what it says on p10. One can partition the array into pieces. If some pieces depend on other pieces it is still possible, in some cases, to parallelize them. The example gives a[k] = a[k-1] + b[k]. We can assume that when k-1 is outside the partition it has a value of zero. This will then throw off each value in the array by some constant. We can add it back in after the fact quite easily. Essentially it is a boundary value problem. The problem is that we potentially have to loop back over the entire array (possibly multiple times), and this may defeat the speedup in the first place. It will depend on the specific case. (There are other "tricks" that could potentially be used too... the point here is only that it is possible.) Ade: This should read: For example, this loop is not parallelizable in its current form:	Ade Miller	Oct 22, 2012
PDF, ePub, Mobi, , Other Digital Version	Page 12 End of second paragraph, which is below the code snippet.	The author refers readers to Chapter 2 for a description of lambdas in C++, while the description is actually on page 53, in Chapter 3. Note from the Author or Editor: This section should read: If you are not familiar with lambdas, see the "Lambdas in C++11" section in Chapter 3, "C++ AMP Fundamentals," for an overview.	Fernando Montenegro	Dec 05, 2012
PDF, ePub, Mobi, , Other Digital Version	Page 36 Last line of for_each: acc = r * s	I was puzzled by the "acc = r * s;" line in the single-CPU code on page 36, in this function: void NBodySimpleInteractionEngine::BodyBodyInteraction(const ParticleCpu* const pParticlesIn, ParticleCpu& particleOut, int numParticles) const { float_3 pos(particleOut.pos); float_3 vel(particleOut.vel); float_3 acc(0.0f); std::for_each(pParticlesIn, pParticlesIn + numParticles, [=, &acc](const ParticleCpu& p) { const float_3 r = p.pos - pos; float distSqr = SqrLength(r) + m_softeningSquared; float invDist = 1.0f / sqrt(distSqr); float invDistCube = invDist * invDist * invDist; float s = m_particleMass * invDistCube; acc = r * s; }); vel += acc * m_deltaTime; vel *= m_dampingFactor; pos += vel * m_deltaTime; particleOut.pos = pos; particleOut.vel = vel; } The final value of acc depends ONLY on the last call of the lambda, but it should be the sum of the accelerations caused by each point. In fact, the AMP version, whose comment reads "Calculate the acceleration (force * mass) change for a pair of particles", contains the code I expected (acc += r * s;): void BodyBodyInteraction(float_3& acc, const float_3 particlePosition, const float_3 otherParticlePosition, float softeningSquared, float particleMass) restrict(amp) { float_3 r = otherParticlePosition - particlePosition; float distSqr = SqrLength(r) + softeningSquared; float invDist = concurrency::fast_math::rsqrt(distSqr); float invDistCube = invDist * invDist * invDist; float s = particleMass * invDistCube; acc += r * s; } So it looks like a minor bug to be fixed, and Amit agrees. http://social.msdn.microsoft.com/Forums/en-US/parallelcppnative/thread/657296d8-0322-4a7e-b453-c6c12f4a5553 Note from the Author or Editor: The line: acc = r * s; Should read: acc += r * s;	Andrew Webb	Nov 27, 2012
PDF, ePub, Mobi, , Other Digital Version	Page 51 code example	array_view doesn't have a member called "grid". This should be "extent". Thus, the fourth line should be: parallel_for_each(av.extent, [=](index<1> idx) restrict(amp) Note from the Author or Editor: In the PDF this is on page 52. parallel_for_each(av.grid, [=](index<1> idx) restrict(amp) Should read parallel_for_each(av.extent, [=](index<1> idx) restrict(amp)	Edd Porter	Dec 19, 2012
Printed, PDF, ePub, Mobi,	Page 74 source code, loop on i and loop on k, bottom quarter of page	"i += TS" and "k < TS" should almost certainly be "i += TileSize" and "k < TileSize" Note from the Author or Editor: Should read: for (int i = 0; i < W; i += TileSize) { tile_static float sA[TileSize][TileSize]; tile_static float sB[TileSize][TileSize]; sA[row][col] = a(tidx.global[0], col + i); sB[row][col] = b(row + i, tidx.global[1]); for (int k = 0; k < TileSize; k++) sum += sA[row][k] * sB[k][col]; }	Anonymous	Oct 17, 2012
Printed, PDF, ePub, Mobi,	Page 149 1st para and diagram	Reader feedback (Mark Delaney): I have one additional confusion. Not sure if it is my confusion or a mistake in the book. I am replying by email to include graphic content. On page 149, printed book, I am very confused by the figures. Ade: For updated content see Errata at http://ampbook.codeplex.com/	Ade Miller	Nov 04, 2012
Printed, PDF, ePub, Mobi, , Other Digital Version	Page 198 1st paragraph	At the end of the paragraph, the ante-penultimate phrase states "The emulated accelerators, WARP and REF, have warp sizes of 1 and 4, respectively". In the current C++ AMP implementation both devices use a warp size of 4. Thank you. Note from the Author or Editor: This is not actually incorrect text, however I would reword as follows: The emulated accelerators, WARP and REF, have warp sizes of 1 and 4, respectively. These numbers may change in the future so you should not rely on this when implementing applications that will run on a wide range of hardware platforms.	Alex Voicu	Feb 18, 2013
Printed, PDF, ePub, Mobi,	Page 296 Time-Out Detection and Recovery	Currently the TDR feature is not supported correctly in the NVIDIA and AMD drivers. This is tracked in an issue on CodePlex: http://ampbook.codeplex.com/workitem/33361 While the code and text in the book are correct, they will not work correctly with the current drivers: no accelerator_view_removed exception is thrown.	Ade Miller	Nov 14, 2012
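
The page 8 entry above turns on the difference between logical AND (&&) and bitwise AND (&). The following is a minimal sketch of that difference, not the book's code: the function names and the mask parameter are hypothetical and chosen only for illustration. Note that >> binds more tightly than &, so the corrected expression CPUInfo[3] >> 24 & 0x1 already means (CPUInfo[3] >> 24) & 0x1.

```cpp
#include <cstdint>

// Hypothetical helper: isolates bit 24 of a register value with bitwise AND.
// Only bit 24 influences the result.
inline bool bit24_bitwise(std::uint32_t reg, std::uint32_t mask = 0x1u) {
    return ((reg >> 24) & mask) != 0;
}

// Hypothetical helper mirroring the form flagged in the erratum: logical AND.
// With a nonzero mask this is equivalent to (reg >> 24) != 0, so it is true
// whenever ANY of bits 24..31 is set, not just bit 24.
inline bool bit24_logical(std::uint32_t reg, std::uint32_t mask = 0x1u) {
    return (reg >> 24) && mask;
}
```

For a value with bit 25 set and bit 24 clear (0x02000000), the bitwise form reports false while the logical form reports true, which is why the correction matters.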
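
The partitioning idea described in the page 10 entry above can be sketched as follows. This is an illustration of the reader's suggestion under stated assumptions, not code from the book: the function names are hypothetical, a[-1] is taken to be zero, and the loops are written sequentially for clarity. Phases 1 and 3 touch each partition independently, so they are the parts that could run in parallel; phase 2 is the extra sequential pass the reader warns may eat into any speedup.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sequential reference for a[k] = a[k-1] + b[k], assuming a[-1] == 0.
std::vector<int> scan_sequential(const std::vector<int>& b) {
    std::vector<int> a(b.size());
    int running = 0;
    for (std::size_t k = 0; k < b.size(); ++k) {
        running += b[k];
        a[k] = running;
    }
    return a;
}

// Partitioned version of the same recurrence (hypothetical name).
std::vector<int> scan_partitioned(const std::vector<int>& b, std::size_t chunk) {
    const std::size_t n = b.size();
    std::vector<int> a(n);
    if (n == 0) return a;
    if (chunk == 0) chunk = n;  // degenerate input: use a single partition
    const std::size_t chunks = (n + chunk - 1) / chunk;

    // Phase 1: scan each partition as if the value just before it were zero.
    // Iterations over c are independent of one another.
    for (std::size_t c = 0; c < chunks; ++c) {
        const std::size_t begin = c * chunk;
        const std::size_t end = std::min(begin + chunk, n);
        int running = 0;
        for (std::size_t k = begin; k < end; ++k) {
            running += b[k];
            a[k] = running;
        }
    }

    // Phase 2: sequential pass over partition totals to find the constant
    // by which each partition is "thrown off".
    std::vector<int> offset(chunks, 0);
    for (std::size_t c = 1; c < chunks; ++c) {
        offset[c] = offset[c - 1] + a[c * chunk - 1];  // local total of partition c-1
    }

    // Phase 3: add the constant back in. Again independent per partition.
    for (std::size_t c = 1; c < chunks; ++c) {
        const std::size_t begin = c * chunk;
        const std::size_t end = std::min(begin + chunk, n);
        for (std::size_t k = begin; k < end; ++k) a[k] += offset[c];
    }
    return a;
}
```

Both functions should agree for any partition size; whether the partitioned form is actually faster depends on the specific case, as the entry notes.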
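
The page 36 entry above comes down to assignment versus accumulation inside a std::for_each lambda. Here is a reduced sketch of that one point, using plain float values in place of the book's float_3 and hypothetical function names; it is not the n-body code itself.

```cpp
#include <algorithm>
#include <vector>

// Mirrors the erroneous form: each call overwrites acc, so only the last
// contribution survives.
float total_overwrite(const std::vector<float>& contributions) {
    float acc = 0.0f;
    std::for_each(contributions.begin(), contributions.end(),
                  [&acc](float c) { acc = c; });
    return acc;
}

// Mirrors the corrected form: each call adds to acc, so the result is the
// sum of all contributions.
float total_accumulate(const std::vector<float>& contributions) {
    float acc = 0.0f;
    std::for_each(contributions.begin(), contributions.end(),
                  [&acc](float c) { acc += c; });
    return acc;
}
```

With contributions 1, 2 and 4, the first function yields 4 (the last value only) and the second yields 7.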