RESULTS: Stereo Tracking Algorithm
Sea floor imagery is a rich source of data for the marine scientific community since it provides high-resolution information at high rates. Marine scientists such as biologists, geologists or archaeologists have great interest in obtaining high-resolution, accurate maps of the underwater structures related to their studies.

The stereo tracker algorithm is capable of constructing accurate 3D maps using the information coming from two cameras built as a fixed stereo rig. The system constructs the map during the survey (on-line process) using feature-based registration techniques (e.g. SURF and SIFT) extended to a stereo framework. The algorithm runs a final bundle adjustment optimization (off-line), refining the structure and the camera motion by minimizing the reprojection error of the 3D estimates onto the cameras.

Before executing any survey, the stereo system must be calibrated in order to obtain the individual intrinsic camera parameters and the extrinsic parameters relating both cameras. In the calibration process the non-linear image distortion parameters (radial and tangential) are estimated, which enables us to remove these distortions afterwards. Furthermore, after calibration, each stereo image pair obtained by the stereo rig can be rectified. The rectification process transforms the image pairs into new pairs that can be considered to have been obtained by a fronto-parallel stereo system. With this type of system, corresponding points in the two images always lie on the same scan-line. This introduces a constraint that reduces the correspondence search from 2D to 1D.
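As an illustration of the distortion model estimated during calibration, a common parametrization uses two radial coefficients (k1, k2) and two tangential coefficients (p1, p2) applied to normalized image coordinates. The exact model used by the stereo tracker is not spelled out here, so this is a generic sketch:

```python
import numpy as np

def distort(xn, yn, k1, k2, p1, p2):
    """Apply radial (k1, k2) and tangential (p1, p2) distortion to
    ideal normalized image coordinates (xn, yn), returning the
    distorted coordinates as observed by the real lens."""
    r2 = xn * xn + yn * yn
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = xn * radial + 2.0 * p1 * xn * yn + p2 * (r2 + 2.0 * xn * xn)
    yd = yn * radial + p1 * (r2 + 2.0 * yn * yn) + 2.0 * p2 * xn * yn
    return xd, yd

# A negative k1 (barrel distortion) pulls off-center points inwards:
xd, yd = distort(0.5, 0.0, -0.2, 0.0, 0.0, 0.0)  # r2 = 0.25, radial = 0.95
```

Undistortion inverts this mapping (typically iteratively), after which the rectifying homographies computed from the calibration can be applied to obtain the fronto-parallel pairs.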

Figure 1 shows, on the left, two image pairs taken at time t and time t+1. On the right, the result after correcting the non-linear distortions and rectifying is shown.
Fig 1. The left column shows the images acquired at times t and t+1, and the right column shows the same images after the non-linear corrections and the stereo rectification process.

To achieve the reconstruction, the Stereo Tracker Algorithm executes the following actions at each step:

  • Feature Detection
  • Feature Matching
  • Triangulation
  • 3D Registration and ego-motion estimation

Figure 2 depicts the features detected using SURF in the quadruplet of rectified images gathered by the stereo rig at time t and time t+1. The best features, evenly spread within the images, are selected using a non-maximal suppression algorithm.
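The exact non-maximal suppression variant used by the tracker is not detailed; a common greedy scheme that keeps the strongest responses while enforcing a minimum spacing could be sketched as:

```python
import numpy as np

def spread_features(pts, scores, radius, max_feats):
    """Greedy non-maximal suppression: visit features strongest-first
    and keep one only if it lies at least `radius` pixels away from
    every feature kept so far. Returns the kept indices."""
    order = np.argsort(-scores)  # strongest response first
    kept = []
    for i in order:
        if all(np.hypot(*(pts[i] - pts[j])) >= radius for j in kept):
            kept.append(int(i))
        if len(kept) == max_feats:
            break
    return kept

pts = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
scores = np.array([3.0, 2.0, 1.0])
spread_features(pts, scores, radius=5.0, max_feats=10)  # -> [0, 2]
```

The spacing radius trades feature density against spatial coverage; enforcing an even spread makes the subsequent motion estimation better conditioned than clustering all features on a few textured patches.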
Fig 2. Features detected using the SURF algorithm.
Once the features are detected, SURF descriptors around them are computed. The feature descriptors are matched in image pairs: a) left-right at time t; b) left-right at time t+1; c) left-left at times t and t+1; and d) right-right at times t and t+1.
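Each of these four matching steps is a nearest-neighbour search in descriptor space. A minimal brute-force matcher (illustrative only; descriptor matching in practice usually also applies a ratio test or mutual-consistency check) might look like:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, max_dist=0.5):
    """Brute-force nearest-neighbour matching between two descriptor
    sets (one descriptor per row). Returns (index_a, index_b) pairs
    whose Euclidean distance is below max_dist."""
    # Pairwise distance matrix, shape (len(desc_a), len(desc_b)).
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nn = d.argmin(axis=1)  # best match in b for each descriptor in a
    return [(i, int(nn[i])) for i in range(len(desc_a))
            if d[i, nn[i]] <= max_dist]

desc_a = np.array([[0.0, 0.0], [1.0, 1.0]])
desc_b = np.array([[1.0, 1.0], [0.0, 0.0]])
match_descriptors(desc_a, desc_b)  # -> [(0, 1), (1, 0)]
```

Running this matcher over the four view pairs produces the raw correspondences that the epipolar, disparity and quadruplet constraints then filter, as described next.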

Figure 3 shows, on the left, all the matches found and, on the right, the matches that remain after applying the epipolar, disparity and quadruplet constraints. The epipolar constraint only allows a correspondence at (x+Δx, y) in the right image for a feature at (x, y) in the left one. The disparity constraint bounds Δx to a limited range around x. Finally, the quadruplet constraint filters out quadruplet matches that do not form a closed set; that is, if we select a matched feature in the left image at time t and follow the matching links as edges in a graph, after four steps we must end up at the initial feature. In other words, the resulting shape when linking a set of 4 matched features must be a parallelogram.
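The quadruplet (cycle-closure) check can be sketched as follows, assuming each matching step is stored as a dictionary mapping feature indices in one view to feature indices in the other (a hypothetical data layout, used only for illustration):

```python
def quadruplet_ok(f_lt, m_lr_t, m_rr, m_lr_t1, m_ll):
    """Follow the match links Lt -> Rt -> Rt+1 -> Lt+1 -> Lt and
    report whether the cycle closes on the starting feature f_lt.
    m_lr_t:  left->right matches at time t
    m_rr:    right(t)->right(t+1) matches
    m_lr_t1: left->right matches at time t+1
    m_ll:    left(t)->left(t+1) matches"""
    try:
        f_rt = m_lr_t[f_lt]          # stereo match at time t
        f_rt1 = m_rr[f_rt]           # temporal match, right camera
        # Invert the stereo matching at t+1 to step Rt+1 -> Lt+1.
        f_lt1 = {v: k for k, v in m_lr_t1.items()}[f_rt1]
        # Invert the temporal left matching to step Lt+1 -> Lt.
        return {v: k for k, v in m_ll.items()}[f_lt1] == f_lt
    except KeyError:
        return False  # some link is missing: the quadruplet is open

# A consistent quadruplet closes the loop:
quadruplet_ok(0, {0: 10}, {10: 11}, {1: 11}, {0: 1})  # -> True
```

Only features passing this closure test survive to the triangulation stage, which removes most outliers before any geometric estimation is attempted.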
Fig 3. The left image shows all the correspondences obtained after the matching process.
The right image shows the matches filtered using the epipolar, disparity and quadruplet constraints.
Using the pairwise matches at time t we triangulate the 3D position of every matched pair of points. The same process is applied to the matched features at time t+1. After this step we obtain two sets of 3D points and the correspondences between them, since we have computed the time t and t+1 matches. Using this information, a robust 3D registration technique (RANSAC over the absolute orientation algorithm) is executed to obtain the stereo rig motion (rotation and translation) between time t and time t+1.
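The core of the absolute orientation step admits a closed-form solution; sketched here via the SVD-based Kabsch/Umeyama formulation, assuming the two 3D point sets are already in correspondence (the specific closed-form variant used by the tracker is not stated):

```python
import numpy as np

def absolute_orientation(P, Q):
    """Least-squares rigid motion (R, t) such that Q ~ R @ p + t for
    corresponding rows of the Nx3 point sets P and Q."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```

In the tracker this estimator sits inside a RANSAC loop: minimal subsets of correspondences are sampled, a candidate motion is computed, and the motion with the largest inlier set (measured in 3D distance) is kept and refined over all inliers.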

The incrementally estimated trajectory drifts over time; that is, as more measurements are incorporated, more error accumulates. We improve the solution, slightly reducing this drift, by executing an off-line optimization (bundle adjustment) step. To incorporate more information into the optimization, the 2D features are tracked across consecutive frames, providing two or more 2D projections for each 3D point. We use an optimized sparse bundle adjustment implementation focused on solving structure-from-motion problems that can account for multiple 2D projections of a single 3D point across frames.
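The quantity bundle adjustment minimizes is the reprojection error. For a single observation it can be sketched as the following residual, assuming a plain pinhole model with intrinsics K (the report does not spell out the camera model, so this is an illustrative assumption):

```python
import numpy as np

def reprojection_residual(X, R, t, K, uv_obs):
    """Residual minimized by bundle adjustment for one observation:
    the difference between the observed pixel uv_obs and the
    projection of the 3D point X through a camera with pose (R, t)
    and intrinsic matrix K."""
    x_cam = R @ X + t        # world point into camera frame
    uvw = K @ x_cam          # homogeneous pixel coordinates
    return uv_obs - uvw[:2] / uvw[2]

K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
# A point on the optical axis projects onto the principal point,
# so the residual against an observation at (50, 50) is zero:
reprojection_residual(np.array([0.0, 0.0, 2.0]),
                      np.eye(3), np.zeros(3), K,
                      np.array([50.0, 50.0]))
```

Stacking one such residual per (camera, point) observation yields the sparse non-linear least-squares problem that the sparse bundle adjustment implementation solves over all camera poses and 3D points jointly.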

Figure 4 shows a video of a short (50 frames) estimated camera trajectory and the corresponding 3D structure, in red lines and points, respectively. The green lines and the bluish points correspond to the refined trajectory and structure, respectively, after the optimization step.
Fig 4. This video shows a small (50 frames) trajectory. The red trajectory is the incrementally estimated one, while the green one is the result after the bundle adjustment. The red points represent the on-line estimated 3D structure, while the bluish points represent the final alignment after the bundle adjustment step.
Finally, we create a 3D mesh structure using the estimated 3D points and project the images as textures over it to obtain a full 3D map of the surveyed area. Figure 5 shows a video of a full 2000-image survey.
Fig 5. This video shows the final 3D map obtained by applying the gathered images as texture over a 3D mesh computed from the estimated 3D structure.
Ongoing work addresses the drift problem by incorporating SLAM techniques that help close loops in the trajectory. Loops detected in the trajectory allow us to reduce the uncertainty in the estimates and, therefore, correct the drift. We are also working on a match propagation step using the triangle constraint in joint-view triangulations (see Fig 6).
Fig 6. This picture shows the joint-view triangulation using the Delaunay criterion. Independent Delaunay triangulations are computed in all of the views. Individual triangles are matched across the views using the feature correspondences. Only those triangles successfully matched in all views are kept. Then, new matches can be propagated within the areas of these triangles.