### CS231A Course Notes 3: Epipolar Geometry

mpi代写 | Algorithm代写 | 代做mining – 这是一个Algorithm面向对象设计的practice, 考察Algorithm的理解, 涵盖了mpi/Algorithm/mining等程序代做方面

##### Kenji Hata and Silvio Savarese

#### 1 Introduction

Previously, we have seen how to compute the intrinsic and extrinsic param- eters of a camera using one or more views using a typical camera calibration procedure or single view metrology. This process culminated in deriving properties about the 3D world from one image. However, in general, it is not possible to recover the entire structure of the 3D world from just one image. This is due to the intrinsic ambiguity of the 3D to the 2D mapping: some information is simply lost.

Figure 1: A single picture such as this picture of a man holding up the Leaning Tower of Pisa can result in ambiguous scenarios. Multiple views of the same scene help us resolve these potential ambiguities.

For example, in Figure 1, we may be initially fooled to believe that the man is holding up the Leaning Tower of Pisa. Only by careful inspection can we tell that this is not the case and merely an illusion based on the projection of different depths onto the image plane. However, if we were able to view this scene from a completely different angle, this illusion immediately disappears and we would instantly figure out the correct scene layout.

The focus of these lecture notes is to show how having knowledge of geom- etry when multiple cameras are present can be extremely helpful. Specifically, we will first focus on defining the geometry involved in two viewpoints and then present how this geometry can aid in further understanding the world around us.

#### 2 Epipolar Geometry

Figure 2: The general setup of epipolar geometry. The gray region is the epipolar plane. The orange line is the baseline, while the two blue lines are the epipolar lines.

Often in multiple view geometry, there are interesting relationships be- tween the multiple cameras, a 3D point, and that points projections in each of the cameras image plane. The geometry that relates the cameras, points in 3D, and the corresponding observations is referred to as the epipolar geometryof a stereo pair. As illustrated in Figure 2, the standard epipolar geometry setup involves two cameras observing the same 3D pointP, whose projection in each of the image planes is located atpandp respectively. The camera centers are located atO 1 andO 2 , and the line between them is referred to as the baseline. We call the plane defined by the two camera centers andP the epipolar plane. The locations of where the baseline intersects the two image

Figure 3: An example of epipolar lines and their corresponding points drawn on an image pair.

planes are known as the theepipoleseande. Finally, the lines defined by the intersection of the epipolar plane and the two image planes are known as theepipolar lines. The epipolar lines have the property that they intersect the baseline at the respective epipoles in the image plane.

Figure 4: When the two image planes are parallel, then the epipoleseand eare located at infinity. Notice that the epipolar lines are parallel to theu axis of each image plane.

An interesting case of epipolar geometry is shown in Figure 4, which occurs when the image planes are parallel to each other. When the image planes are parallel to each other, then the epipoleseandewill be located at infinity since the baseline joining the centersO 1 ,O 2 is parallel to the image planes. Another important byproduct of this case is that the epipolar lines are parallel to an axis of each image plane. This case is especially useful

and will be covered in greater detail in the subsequent section on image rectification. In real world situations, however, we are not given the exact location of the 3D locationP, but can determine its projection in one of the image planes p. We also should be able to know the cameras locations, orientations, and camera matrices. What can we do with this knowledge? With the knowledge of camera locationsO 1 ,O 2 and the image pointp, we can define the epipolar plane. With this epipolar plane, we can then determine the epipolar lines^1. By definition,Ps projection into the second imagepmust be located on the epipolar line of the second image. Thus, a basic understanding of epipolar geometry allows us to create a strong constraint between image pairs without knowing the 3D structure of the scene.

Figure 5: The setup for deter mining the essential and fundamental matrices, which help map points and epipolar lines across views.

We will now try to develop seamless ways to map points and epipolar lines across views. If we take the setup given in the original epipolar geometry framework (Figure 5), then we shall further defineMandMto be the camera projection matrices that map 3D points into their respective 2D image plane locations. Let us assume that the world reference system is associated to the first camera with the second camera offset first by a rotationRand then by a translationT. This specifies the camera projection matrices to be:

```
M=K
```

###### [

###### I 0

###### ]

###### M=K

###### [

###### RT RTT

###### ]

###### (1)

(^1) This means that epipolar lines can be determined by just knowing the camera centers O 1 , O 2 and a point in one of the imagesp

#### 3 The Essential Matrix

In the simplest case, let us assume that we havecanonical cameras, in whichK=K=I. This reduces Equation 1 to

```
M=
```

###### [

###### I 0

###### ]

###### M=

###### [

###### RT RTT

###### ]

###### (2)

Furthermore, this means that the location ofpin the first cameras ref- erence system isRp+T. Since the vectorsRp+TandTlie in the epipolar plane, then if we take the cross product ofT(Rp+T) =T(Rp), we will get a vector normal to the epipolar plane. This also means thatp, which lies in the epipolar plane is normal toT(Rp), giving us the constraint that their dot product is zero:

```
pT[T(Rp)] = 0 (3)
```

From linear algebra, we can introduce a different and compact expression for the cross product: we can represent the cross product between any two vectorsaandbas a matrix-vector multiplication:

```
ab=
```

```
0 az ay
az 0 ax
ay ax 0
```

```
bx
by
bz
```

```
= [a]b (4)
```

Combining this expression with Equation 3, we can convert the cross product term into matrix multiplication, giving

```
pT[T](Rp) = 0
pT[T]Rp= 0
```

###### (5)

The matrixE= [T]Ris known as theEssential Matrix, creating a com- pact expression for the epipolar constraint:

```
pTEp= 0 (6)
```

The Essential matrix is a 33 matrix that contains 5 degrees of freedom. It has rank 2 and is singular. The Essential matrix is useful for computing the epipolar lines associated withpandp. For instance,`=ETpgives the epipolar line in the image plane of camera 2. Similarly`

=Epgives the epipolar line in the image plane of camera 1. Other interesting properties of the essential matrix is that its dot product with the epipoles equate to zero: ETe = Ee = 0. Because for any pointx(other thane) in the image of camera 1, the corresponding epipolar line in the image of camera 2,l =ETx, contains the epipolee. ThusesatisfieseT(ETx) = (eTET)x= 0 for all thex, soEe= 0. Similarly ETe= 0.

#### 4 The Fundamental Matrix

Although we derived a relationship betweenpandpwhen we have canoni- cal cameras, we should be able to find a more general expression when the cameras are no longer canonical. Recall that gives us the projection matrices:

###### M=K

###### [

###### I 0

###### ]

###### M=K

###### [

###### RT RTT

###### ]

###### (7)

First, we must definepc=K^1 pandpc=K^1 pto be the projections of Pto the corresponding camera images if the cameras were canonical. Recall that in the canonical case:

```
pTc[T]Rpc= 0 (8)
```

By substituting in the values ofpcandpc, we get

```
pTKT[T]RK^1 p= 0 (9)
```

The matrixF =KT[T]RK^1 is known as theFundamental Matrix, which acts similar to the Essential matrix from the previous section but also encodes information about the camera matricesK,Kand the relative translation T and rotation Rbetween the cameras. Therefore, it is also useful in computing the epipolar lines associated withpandp, even when the camera matricesK,Kand the transformationR,Tare unknown. Similar to the Essential matrix, we can compute the epipolar lines`=FTpand`

=Fp from just the Fundamental matrix and the corresponding points. One main difference between the Fundamental matrix and the Essential matrix is that the Fundamental matrix contains 7 degrees of freedom, compared to the Essential matrixs 5 degrees of freedom. But how is the Fundamental matrix useful? Like the Essential matrix, if we know the Fundamental matrix, then simply knowing a point in an image gives us an easy constraint (the epipolar line) of the corresponding point in the other image. Therefore, without knowing the actual position ofPin 3D space, or any of the extrinsic or intrinsic characteristics of the cameras, we can establish a relationship between anypandp.

##### 4.1 The Eight-Point Algorithm

Still, the assumption that we can have the Fundamental matrix, which is defined by a matrix product of the camera parameters, seems rather large. However, it is possible to estimate the Fundamental matrix given two images of the same scene and without knowing the extrinsic or intrinsic parameters of the camera. The method we discuss for doing so is known as theEight-Point

Figure 6: Corresponding points are drawn in the same color on each of the respective images.

Algorithm, which was proposed by Longuet-Higgins in 1981 and extended by Hartley in 1995. As the title suggests, the Eight-Point Algorithm assumes that a set of at least 8 pairs of corresponding points between two images is available. Each correspondencepi= (ui,vi,1) andpi= (ui,vi,1) gives us the epipo- lar constraintpTiFpi= 0. We can reformulate the constraint as follows:

###### [

```
uiui viui ui uivi vivi vi ui vi 1
```

###### ]

###### F 11

###### F 12

###### F 13

###### F 21

###### F 22

###### F 23

###### F 31

###### F 32

###### F 33

###### = 0 (10)

Since this constraint is a scalar equation, it only constrains one degree of freedom. Since we can only know the Fundamental matrix up to scale, we require eight of these constraints to determine the Fundamental matrix:

```
u 1 u 1 v 1 u 1 u 1 u 1 v 1 v 1 v 1 v 1 u 1 v 1 1
u 2 u 2 v 2 u 2 u 2 u 2 v 2 v 2 v 2 v 2 u 2 v 2 1
u 3 u 3 v 3 u 3 u 3 u 3 v 3 v 3 v 3 v 3 u 3 v 3 1
u 4 u 4 v 4 u 4 u 4 u 4 v 4 v 4 v 4 v 4 u 4 v 4 1
u 5 u 5 v 5 u 5 u 5 u 5 v 5 v 5 v 5 v 5 u 5 v 5 1
u 6 u 6 v 6 u 6 u 6 u 6 v 6 v 6 v 6 v 6 u 6 v 6 1
u 7 u 7 v 7 u 7 u 7 u 7 v 7 v 7 v 7 v 7 u 7 v 7 1
u 8 u 8 v 8 u 8 u 8 u 8 v 8 v 8 v 8 v 8 u 8 v 8 1
```

###### F 11

###### F 12

###### F 13

###### F 21

###### F 22

###### F 23

###### F 31

###### F 32

###### F 33

###### = 0 (11)

This can be compactly written as

```
Wf= 0 (12)
```

whereW is anN9 matrix derived fromN 8 correspondences andf is the values of the Fundamental matrix we desire. In practice, it often is better to use more than eight correspondences and create a largerWmatrix because it reduces the effects of noisy measurements. The solution to this system of homogeneous equations can be found in the least-squares sense by Singular Value Decomposition (SVD), asW is rank- deficient. SVD will give us a estimate of the Fundamental matrixF, which may have full rank. However, we know that the true Fundamental matrix has rank 2. Therefore, we should look for a solution that is the best rank- approximation ofF. To do so, we solve the following optimization problem:

```
minimize
F
```

###### FFF

```
subject to detF= 0
```

###### (13)

This problem is solved again by SVD, whereF = UVT, then the best rank-2 approximation is found by

###### F=U

###### 1 0 0

###### 0 2 0

###### 0 0 0

###### VT (14)

##### 4.2 The Normalized Eight-Point Algorithm

In practice, the standard least-squares approach to the Eight-Point Algo- rithm is not precise. Often, the distance between a pointpiand its corre- sponding epipolar line`i= Fpwill be very large, usually on the scale of 10+ pixels. To reduce this error, we can consider a modified version of the Eight-Point Algorithm called theNormalized Eight-Point Algorithm.

The main problem of the standard Eight-Point Algorithm stems from the fact thatWis ill-conditioned for SVD. For SVD to work properly,Wshould have one singular value equal to (or near) zero, with the other singular values being nonzero. However, the correspondencespi= (ui,vi,1) will often have extremely large values in the first and second coordinates due to the pixel range of a modern camera (i.e. pi= (1832, 1023 ,1)). If the image points used to constructW are in a relatively small region of the image, then each of the vectors forpiandpiwill generally be very similar. Consequently, the constructedW matrix will have one very large singular value, with the rest relatively small. To solve this problem, we will normalize the points in the image before constructingW. This means wepre-conditionWby applying both a trans- lation and scaling on the image coordinates such that two requirements are satisfied. First, the origin of the new coordinate system should be located at the centroid of the image points (translation). Second, the mean square distance of the transformed image points from the origin should be 2 pix- els (scaling). We can compactly represent this process by a transformation matricesT,Tthat translate by the centroid and scale by the scaling factor

###### (

###### 2 N

```
Ni=1||xi x||^2
```

###### )^1 /^2

, for each respective image. Afterwards, we normalize the coordinates:

```
qi=Tpi qi=Tpi (15)
```

Using the new, normalized coordinates, we can compute the newFq using the regular least-squares Eight Point Algorithm. However, the matrixFqis the fundamental matrix for the normalized coordinates. For it to be usable on regular coordinate space, we need to de-normalize it, giving

```
F=TTFqT (16)
```

Ultimately, this new Fundamental matrixFgives good results in real-world applications.

#### 5 Image Rectification

Recall that an interesting case for epipolar geometry occurs when two images are parallel to each other. Let us first compute the Essential matrixEin the case of parallel image planes. We can assume that the two cameras have the

sameKand that there is no relative rotation between the cameras (R=I). In this case, let us assume that there is only a translation along thexaxis, givingT= (Tx, 0 ,0). This gives

###### E= [T]R=

###### 0 0 0

```
0 0 Tx
0 Tx 0
```

###### (17)

OnceEis known, we can find the directions of the epipolar lines associ- ated with points in the image planes. Let us compute the direction of the epipolar line`associated with pointp:

```
`=Ep=
```

###### 0 0 0

```
0 0 Tx
0 Tx 0
```

```
u
v
1
```

###### =

###### 0

```
Tx
Txv
```

###### (18)

We can see that the direction of`is horizontal, as is the direction of`

, which is computed in a similar manner.

Figure 7: The process of image rectification involves computing two homo- graphies that we can apply to a pair of images to make them parallel.

If we use the epipolar constraintpTEp= 0, then we arrive at the fact thatv=v, demonstrating thatpandpshare the samev-coordinate. Con- sequently, there exists a very straightforward relationship between the cor- responding points. Therefore,rectification, or the process of making any two given images parallel, becomes useful when discerning the relationships between corresponding points in images.

Figure 8: The rectification problem setup: we compute two homographies that we can apply to the image planes to make the resulting planes parallel.

Rectifying a pair of images does not require knowledge of the two camera matricesK,Kor the relative transformationR,Tbetween them. Instead, we can use the Fundamental matrix estimated by the Normalized Eight Point Algorithm. Upon getting the Fundamental matrix, we can compute the epipolar lines`iand`

ifor each correspondencepiandpi. From the set of epipolar lines, we can then estimate the epipoleseandeof each image. This is because we know that the epipole lies in the intersection of all the epipolar lines. In the real world, due to noisy measurements, all the epipolar lines will not intersect in a single point. Therefore, computing the epipole can be found by minimizing the least squared error of fitting a point to all the epipolar lines. Recall that each epipolar line can be represented as a vector`such that all points on the line (represented in homogeneous coordinates) are in the set{x|`

Tx= 0}. If we define each epipolar line as

`i=

###### [

```
`i, 1 `i, 2 `i, 3
```

###### ]T

, then we can we formulate a linear system of equations and solve using SVD to find the epipolee:

```
```

###### `T 1

###### ..

###### .

```
`Tn
```

```
e= 0 (19)
```

After finding the epipoleseande, we will most likely notice that they are not points at infinity along the horizontal axis. If they were, then, by definition, the images would already be parallel. Thus, we gain some insight into how to make the images parallel: can we find a homography to map an epipoleeto infinity along the horizontal axis? Specifically, this means that we want to find a pair of homographiesH 1 ,H 2 that we can apply to the images to map the epipoles to infinity. Let us start by finding a homographyH 2 that

maps the second epipoleeto a point on the horizontal axis at infinity (f, 0 ,0). Since there are many possible choices for this homography, we should try to choose something reasonable. One condition that leads to good results in practice is to insist that the homography acts like a transformation that applies a translation and rotation on points near the center of the image. The first step in achieving such a transformation is to translate the second image such that the center is at (0, 0 ,1) in homogeneous coordinates. We can do so by applying the translation matrix

###### T=

```
1 0 width 2
0 1 height 2
0 0 1
```

###### (20)

After applying the translation, we apply a rotation to place the epipole on the horizontal axis at some point (f, 0 ,1). If the translated epipoleTe is located at homogeneous coordinates (e 1 ,e 2 ,1), then the rotation applied is

###### R=

```
e
```

```
1
e 12 +e 22
```

```
e^2
e 12 +e 22 0
e
```

```
2
e 12 +e 22
e
```

```
1
e 12 +e 22
```

###### 0

###### 0 0 1

###### (21)

where= 1 ife 1 0 and=1 otherwise. After applying this rotation, notice that given any point at (f, 0 ,1), bringing it to a point at infinity on the horizontal axis (f, 0 ,0) only requires applying a transformation

###### G=

###### 1 0 0

###### 0 1 0

```
^1 f 0 1
```

###### (22)

After applying this transformation, we finally have an epipole at infinity, so we can translate back to the regular image space. Thus, the homographyH 2 that we apply on the second image to rectify it is

```
H 2 =T^1 GRT (23)
```

Now that a validH 2 is found, we need to find a matching homographyH 1 for the first image. We do so by finding a transformationH 1 that minimizes the sum of square distances between the corresponding points of the images

```
arg min
H 1
```

```
i
```

```
H 1 piH 2 pi^2 (24)
```

Although the derivation^2 is outside the scope of this class, we can actually prove that the matchingH 1 is of the form:

```
H 1 =HAH 2 M (25)
```

whereF= [e]Mand

###### HA=

```
a 1 a 2 a 3
0 1 0
0 0 1
```

###### (26)

with (a 1 ,a 2 ,a 3 ) composing the elements of a certain vectorathat will be computed later. First, we need to know whatM is. An interesting property of any 3 3 skew-symmetric matrixAisA=A^3 up to scale. Because any cross product matrix [e]is skew-symmetric and that we can only know the Fundamental matrixFup to scale, then

```
F= [e]M= [e][e][e]M= [e][e]F (27)
```

By grouping the right terms, we can find that

```
M= [e]F (28)
```

Notice that if the columns ofMwere added by any scalar multiple ofe, then theF= [e]M still holds up to scale. Therefore, the more general case of definingM is M= [e]F+evT (29)

for some vectorv. In practice, definingM by settingvT =

###### [

###### 1 1 1

###### ]

works very well. To finally solve forH 1 , we need to compute theavalues ofHA. Recall that we want to find aH 1 ,H 2 to minimize the problem posed in Equation 24. Since we already know the value ofH 2 andM, then we can substitute pi=H 2 mpi and pi=H 2 piand the minimization problem becomes

```
arg min
HA
```

```
i
```

```
HApipi^2 (30)
```

In particular, if we let pi= (xi,yi,1) and pi= (xi,yi,1), then the mini- mization problem can be replaced by:

```
arg min
a
```

```
i
```

```
(a 1 xi+a 2 yi+a 3 xi)^2 + (yiyi)^2 (31)
```

(^2) If you are interested in the details, please see Chapter 11 of Hartley & Zissermans textbookMultiple View Geometry

Since yiyiis a constant value, the minimization problem further reduces to arg min a

```
i
```

```
(a 1 xi+a 2 yi+a 3 xi)^2 (32)
```

Ultimately, this breaks down into solving a least-squares problemWa=b forawhere

###### W=

```
x 1 y 1 1
..
.
xn yn 1
```

```
b=
```

```
x 1
..
.
xn
```

###### (33)

After computinga, we can computeHAand finallyH 1. Thus, we gen- erated the homographiesH 1 ,H 2 to rectify any image pair given a few corre- spondences.