For the vision task at hand, recovering the geometry of a scene, we will use the pinhole camera model, a considerable simplification of how images are actually acquired in modern digital cameras. The pinhole model describes how objects in the world are transformed into pixels in the camera image. The following diagram illustrates this process:
Camera images have a local 2D coordinate frame measured in pixels, while the locations of 3D objects in the world are described in arbitrary units of length, such as millimeters, meters, or inches. To reconcile these two coordinate frames, the pinhole camera model offers two transforms: ...
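
To make the world-to-pixel mapping concrete, here is a minimal sketch of one common textbook formulation of the pinhole model. It is not necessarily the exact formulation used in this text. The function name `project_point` and the parameters `fx`, `fy` (focal lengths in pixels) and `cx`, `cy` (principal point in pixels) are assumptions introduced for illustration, and the 3D point is assumed to be already expressed in the camera's coordinate frame.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point in the camera frame to 2D pixel coordinates.

    point_cam: (X, Y, Z) in any unit of length (mm, m, inches, ...).
    fx, fy:    hypothetical focal lengths, expressed in pixels.
    cx, cy:    hypothetical principal point, expressed in pixels.
    """
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")

    # Perspective division: the length unit cancels out here, which is
    # why the world can be measured in arbitrary units.
    x = X / Z
    y = Y / Z

    # Scale by the focal length and shift by the principal point to
    # land in the image's pixel coordinate frame.
    u = fx * x + cx
    v = fy * y + cy
    return u, v


# A point half a unit to the right, a quarter unit up (negative Y),
# and two units in front of the camera.
u, v = project_point((0.5, -0.25, 2.0), fx=800.0, fy=800.0, cx=320.0, cy=240.0)
print(u, v)  # 520.0 140.0
```

Because of the division by `Z`, scaling the whole scene by a constant factor (for example, switching from meters to millimeters) leaves the resulting pixel coordinates unchanged in this sketch.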