RESULTS: Stereo Tracking Algorithm  
Seafloor imagery is a rich source of data for the marine scientific community since it provides high-resolution information at high rates. Marine scientists such as biologists, geologists or archaeologists have great interest in obtaining high-resolution, accurate maps of the underwater structures related to their studies. The stereo tracker algorithm is capable of constructing accurate 3D maps using the information coming from two cameras mounted as a fixed stereo rig. The system constructs the map during the survey (online process) using feature-based registration techniques (i.e. SURF and SIFT) extended to a stereo framework. The algorithm then runs a final bundle adjustment optimization (offline) that refines the structure and the camera motion by minimizing the reprojection error of the 3D estimates onto the cameras.

Before executing any survey, the stereo system must be calibrated in order to obtain the intrinsic parameters of each camera and the extrinsic parameters relating both cameras. In the calibration process the nonlinear image distortion parameters (radial and tangential) are estimated, which enables us to remove these distortions afterwards. Furthermore, after calibration, each image pair obtained by the stereo rig can be rectified. The rectification process transforms the image pairs into new pairs that can be considered as obtained by a fronto-parallel stereo system. In such a system, corresponding points in the two images always lie on the same scanline. This introduces a constraint that reduces the correspondence search from 2D to 1D. Figure 1 shows two image pairs on the left, taken at time t and time t+1. On the right, the result after correcting the nonlinear distortions and rectifying is shown.
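The text does not specify the exact distortion model used; a minimal sketch, assuming the standard Brown-Conrady model with two radial coefficients (k1, k2) and two tangential coefficients (p1, p2), illustrates the kind of nonlinear distortion that calibration estimates and that is later removed:

```python
import numpy as np

def distort(xy, k1, k2, p1, p2):
    """Apply the Brown-Conrady radial (k1, k2) and tangential (p1, p2)
    distortion model to normalized image coordinates of shape (N, 2)."""
    x, y = xy[:, 0], xy[:, 1]
    r2 = x**2 + y**2
    radial = 1 + k1 * r2 + k2 * r2**2
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x**2)
    y_d = y * radial + p1 * (r2 + 2 * y**2) + 2 * p2 * x * y
    return np.stack([x_d, y_d], axis=1)

# With all coefficients zero the points are unchanged.
pts = np.array([[0.1, -0.2], [0.3, 0.4]])
undistorted = distort(pts, 0, 0, 0, 0)
```

Removing the distortion amounts to inverting this mapping (usually done iteratively); the coefficient names here are the hypothetical conventional ones, not values from the actual calibration.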
 
To achieve the reconstruction, the Stereo Tracker Algorithm executes the following actions at each step:
 
Figure 2 depicts the features detected using SURF in the quadruplet of rectified images gathered by the stereo rig at time t and time t+1. The best features, evenly spread within the images, are selected using a non-maximal suppression algorithm.
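The non-maximal suppression variant is not detailed in the text; a minimal sketch, assuming a simple greedy scheme that keeps the strongest features subject to a minimum spacing, shows how an even spread can be obtained:

```python
import numpy as np

def nonmax_suppress(keypoints, responses, min_dist):
    """Greedy non-maximal suppression: keep the strongest keypoints,
    discarding any that fall within min_dist of an already kept one.
    keypoints: (N, 2) array of (x, y); responses: (N,) detector strengths."""
    order = np.argsort(-responses)  # strongest first
    kept = []
    for i in order:
        if all(np.linalg.norm(keypoints[i] - keypoints[j]) >= min_dist
               for j in kept):
            kept.append(i)
    return kept

kps = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
resp = np.array([0.9, 0.5, 0.8])
selected = nonmax_suppress(kps, resp, min_dist=5.0)  # -> [0, 2]
```

The second keypoint is dropped because it lies too close to a stronger one, leaving features spread across the image.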
 
Once the features are detected, SURF descriptors around them are computed. The feature descriptors are matched in image pairs: a) left-right at time t; b) left-right at time t+1; c) left-left at times t and t+1; and d) right-right at times t and t+1. Figure 3 shows, on the left, all the matches found and, on the right, the matches that remain after applying the epipolar, disparity and quadruplet constraints. The epipolar constraint only allows a feature at (x, y) in the left image to match a point at (x+Δx, y) in the right image. The disparity constraint bounds Δx to a close range centered on x. Finally, the quadruplet constraint filters out quadruplet matches that do not form a closed set: if we select a matched feature in the left image at time t and follow the matching links as edges in a graph, after four steps we must end up at the initial feature. In other words, the resulting shape when linking a set of 4 matched features must be a parallelogram.
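The quadruplet (cycle) check above can be sketched as follows, assuming each pairwise matching is stored as a dictionary from feature index to feature index (a hypothetical representation, not the actual data structure used):

```python
def quadruplet_consistent(i, lr_t, rr, lr_t1, ll):
    """Check the quadruplet constraint for feature i in the left image
    at time t. The four pairwise match dictionaries are:
      lr_t  : left(t)   -> right(t)
      rr    : right(t)  -> right(t+1)
      lr_t1 : left(t+1) -> right(t+1)  (inverted below)
      ll    : left(t)   -> left(t+1)
    Following the matching links for four steps must lead back to i."""
    r_t = lr_t.get(i)
    if r_t is None:
        return False
    r_t1 = rr.get(r_t)
    if r_t1 is None:
        return False
    # Invert left(t+1) -> right(t+1) to walk from right(t+1) to left(t+1).
    rl_t1 = {v: k for k, v in lr_t1.items()}
    l_t1 = rl_t1.get(r_t1)
    if l_t1 is None:
        return False
    # The cycle closes iff the left-left match agrees with the detour.
    return ll.get(i) == l_t1
```

Any quadruplet whose four links do not close into a cycle is rejected as a mismatch.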
 
Using the pairwise matches at time t we triangulate the 3D position of every matched pair of points. The same process is applied to the matched features at time t+1. After this step we obtain two sets of 3D points and the correspondences between them, since we have computed the time t and t+1 matches. Using this information, a robust 3D registration technique, RANSAC run over an absolute orientation algorithm, is executed to obtain the stereo rig motion (rotation and translation) between time t and time t+1. The incrementally estimated trajectory drifts over time: the more measurements are incorporated, the more error is accumulated. We improve the solution by slightly reducing this drift with an offline optimization (bundle adjustment) step. To incorporate more information into the optimization, the 2D features are tracked within consecutive frames, providing two or more 2D projections for each 3D point. We use an optimized sparse bundle adjustment implementation focused on solving structure-from-motion problems, which can account for multiple 2D projections of a single 3D point across frames. Figure 4 shows a video of a short 50-frame estimated camera trajectory and the corresponding 3D structure, in red lines and points, respectively. The green lines and the bluish points correspond to the refined trajectory and structure, respectively, after the optimization step.
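The absolute orientation step (the model fitted inside RANSAC) has a well-known closed-form solution; a minimal sketch, assuming the standard SVD-based Horn/Kabsch formulation, recovers the rig motion (R, t) from the two corresponding 3D point sets:

```python
import numpy as np

def absolute_orientation(P, Q):
    """Closed-form least-squares rigid motion with Q_i ≈ R @ P_i + t,
    for corresponding 3D point sets P, Q of shape (N, 3)
    (Horn/Kabsch method via SVD of the cross-covariance)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Correct the sign so R is a rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t

# Recover a known rotation about z and a translation from noiseless points.
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
P = np.random.default_rng(0).normal(size=(20, 3))
Q = P @ R_true.T + t_true
R, t = absolute_orientation(P, Q)
```

In the robust version, RANSAC repeatedly fits this model to minimal subsets of the correspondences and keeps the motion supported by the largest inlier set.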
 
Finally, we create a 3D mesh structure using the estimated 3D points and project the images as textures onto it to obtain a full 3D map of the surveyed area. Figure 5 shows a video of a full 2000-image survey.
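The meshing method is not specified; one common option for roughly planar seafloor scenes, sketched here as an assumption rather than the actual pipeline, is a 2D Delaunay triangulation of the points' horizontal coordinates, producing a height-field mesh onto which the image textures can be projected:

```python
import numpy as np
from scipy.spatial import Delaunay

# Hypothetical estimated 3D points (x, y, z); z is the seafloor height.
pts3d = np.array([[0.0, 0.0, 0.10],
                  [1.0, 0.0, 0.20],
                  [0.0, 1.0, 0.15],
                  [1.0, 1.0, 0.30]])

tri = Delaunay(pts3d[:, :2])   # triangulate over the (x, y) plane
mesh_faces = tri.simplices      # (M, 3) vertex indices, one row per triangle
```

Each row of `mesh_faces` indexes three 3D points forming a textured triangle of the map.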
 
Ongoing work addresses the drift problem by incorporating SLAM techniques that help close loops in the trajectory. Loops detected in the trajectory allow reducing the uncertainty of the estimates and, therefore, correcting the drift. We are also working on a match propagation step using the triangle constraint in joint-view triangulations (see Fig. 6).
