Estimation of Projection Matrices from a Sparse Set of Feature Points for 3D Tree Reconstruction from Multiple Images

Received May 24, 2017. Revised Aug 18, 2017. Accepted Oct 18, 2017.

3D reconstruction of trees is an important task for tree analysis, and the most affordable way to capture real objects is with a camera. Although methods for 3D reconstruction of trees from multiple photographs already exist, they mostly handle only free-standing trees captured at narrow angles. Dense feature detection and matching is in most cases only the first step of the reconstruction, and it requires a large set of features and high similarity between individual pictures. However, capturing trees in an orchard is in most cases possible only at wider angles between the individual pictures and with overlapping branches from other trees, which prevents reliable feature matching. We introduce a new approach for estimating projection matrices to produce 3D point clouds of trees from multiple photographs. By manually relating a small number of points on the images to reference objects, we substitute for the missing dense set of features. We assign a projection matrix to each image and minimize the projection error between the images and the reference objects using simulated annealing. Thereby, we produce correct projection matrices for the further steps of 3D reconstruction. Our approach is tested on a simple application for 3D reconstruction of trees that produces a 3D point cloud. We analyze the convergence rates of the optimization and show that the proposed approach can produce feasible projection matrices from a sufficiently large set of feature points. In the future, this approach will be part of a complete system for tree reconstruction and analysis.


Introduction
Reliable measurement of branches is important for successful examination of trees. However, manual measurement and examination of trees inside orchards is a tedious task, which is prone to errors and requires an extensive workforce. It is preferable to capture real trees and examine the digitized trees in the office. Therefore, numerous algorithms for automatic 3D reconstruction of trees have been developed. Many of them require special devices for capturing trees, e.g., terrestrial LIDAR [1] and Kinect [2], which are not widely deployed. On the other hand, cameras (e.g., on mobile phones) are the most affordable and widely adopted means of capturing real objects.
Pipelines for 3D reconstruction from images in most cases [3] consist of several steps: detecting feature points on a sequence of images, matching feature points between image pairs, camera calibration (i.e., pose estimation), and triangulation [4]. The biggest issue with this approach is its heavy dependence on detecting and matching feature points. Most commonly, SIFT [5] or SURF [6] are used for this task, but they require distinct features on the images. When reconstructing trees inside orchards, we cannot easily produce enough distinct feature points because the trees might be captured at wider angles, have repeating patterns, and look quite similar to each other. Overlapping trees in the orchards further aggravate this problem. 3D tree reconstruction inside orchards is therefore a problem that cannot be easily solved with the standard 3D reconstruction pipeline.
Although numerous approaches for 3D reconstruction of trees from photographs have been developed, many of them are designed mostly for convincing visualization or animation of the reconstructed 3D models and therefore perform a coarser reconstruction [7], [8], [9], [10]. Other, more exact approaches [11], [12] have specific requirements, such as a clear background, no overlapping trees, or smaller recording angles. Because we cannot meet these requirements, we must use an alternative approach for 3D reconstruction. In our case, an approach similar to voxel coloring [13] is used, which performs 3D reconstruction in the inverse direction to the usual triangulation [3] and thus does not require matched feature points. On the other hand, it requires correct camera projection matrices in advance.
Evolutionary algorithms are a useful tool for camera calibration when calculating intrinsic parameters [14] or even extrinsic parameters, by considering the relation between a real 3D object with known coordinates and its occurrence on the images [15]. However, considering solely the relation between a 3D object and its occurrence on the images is not enough when the 3D object covers only part of the image. In this case, the projective error on the other parts of the image increases, which prevents correct 3D reconstruction. In this work, we introduce a new approach for estimating camera matrices which considers, in addition to a few correspondences between points on a known object and its occurrences on multiple images, also correspondences between points on multiple images. We use simulated annealing [16] for calculating the projection matrices because it requires only a few parameters and has been quite successful in various applications (e.g., automated tree pruning [17]). We show that considering direct relations between images lowers the projective error and thus enables 3D reconstruction.

Overview of the method
Automatic feature extraction and matching on our photographs is not reliable; therefore, we provide alternative data for estimating the matrices of the images. What we can easily provide is a set $P$ of a few corresponding points between neighboring images and a reference object. Thus, before recording, we place a unit cube inside the scene. After taking the pictures, we manually match each vertex of the cube with its occurrence on all pictures. For each vertex $p$ of the cube, we thus define its 3D coordinates $u_p$ inside the scene and its 2D position $v_{p,i}$ on each picture $i$. Additional data are provided by manually relating a few tips of branches between the images. Therefore, we also provide $v_{p,i}$ for branch tips and thus match each point $p$ on a branch tip between different pictures.
Our approach for estimating projection matrices is designed to use input data which can be easily provided. To find the 3×4 camera projection matrix $M_i$ of each image $i$ from a set of images $I$, a two-step approach is used. In the first step, we calculate an initial estimation of the projection matrices, where the relations between a known object in 3D space and its projection in the images are considered. Here, a rough approximation of the metric reconstruction is obtained. In the second step, we refine the matrices from the previous step by also considering the correspondences between points (tips of branches) on multiple images.
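To make the input data concrete, the following minimal sketch shows how a 3×4 projection matrix maps a 3D cube vertex to a pixel position. The function name `project`, the matrix `M_example` (focal length 800 px, principal point (320, 240), camera 5 units from the origin), and the vertex ordering are illustrative assumptions, not values from the paper.

```python
from itertools import product


def project(M, u):
    """Project 3D point u = (x, y, z) with a 3x4 matrix M; return pixel (px, py)."""
    h = (u[0], u[1], u[2], 1.0)  # homogeneous coordinates of u
    x, y, w = (sum(M[r][c] * h[c] for c in range(4)) for r in range(3))
    if abs(w) < 1e-12:
        raise ValueError("point projects to infinity")
    return (x / w, y / w)


# Hypothetical projection matrix K [I | t] with t = (0, 0, 5).
M_example = [
    [800.0, 0.0, 320.0, 1600.0],
    [0.0, 800.0, 240.0, 1200.0],
    [0.0, 0.0, 1.0, 5.0],
]

# The eight vertices u_p of the unit cube placed in the scene.
cube_vertices = [tuple(float(k) for k in v) for v in product((0, 1), repeat=3)]

# v_{p,i} for one image: in practice these are clicked manually, here they
# are synthesised from the hypothetical matrix for illustration.
observations = {u: project(M_example, u) for u in cube_vertices}
```

In this layout each vertex carries one known 3D position and one 2D position per image, whereas a branch tip would carry only the 2D positions.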

Initial projection matrix estimation
The objective of the first step is to find, for each image $i$, a coarse projection matrix for metric reconstruction. With simulated annealing [16], we calculate the projection matrix $M_i$ of picture $i$ by minimizing, over the vertices $p \in P$, the distance between the projection by $M_i$ of the 3D point $u_p$ and its 2D position $v_{p,i}$ on image $i$. The criteria (energy) function of matrix $M_i$ is:
$$E(M_i) = \frac{1}{|P|} \sum_{p \in P} g(M_i, u_p, v_{p,i}),$$
where function $g(M, u, v)$ calculates the distance (in pixels) between point $v$ and the projection by $M$ of point $u$:
$$g(M, u, v) = \left\| \, v - \left( \frac{(M\tilde{u})_1}{(M\tilde{u})_3}, \; \frac{(M\tilde{u})_2}{(M\tilde{u})_3} \right) \right\|, \qquad \tilde{u} = (u_x, u_y, u_z, 1)^T.$$
The initial state of simulated annealing for the first image is a random matrix whose values lie in the range of a typical projection matrix; each element is generated from a random number $r_i \in [0, 1]$ drawn from the uniform distribution.
A new candidate projection matrix $M_i'$ is calculated from $M_i$. In each iteration of simulated annealing, one random element at row $r$ and column $c$ of matrix $M_i$ is perturbed according to parameter $F$ and the current temperature $T$:
$$M'_{i,r,c} = M_{i,r,c} + F \, T \, M_{i,r,c} \, (2\,\mathrm{rand} - 1),$$
where $\mathrm{rand} \in [0, 1]$ is a uniform random number. The other elements are perturbed with probability $C$. If the candidate $M_i'$ is a better solution than $M_i$, it is accepted as the new state ($M_i$); otherwise, it is accepted according to the Metropolis criterion [18]. The best estimated projection matrix of image $i-1$ is used as the initial state for image $i$ in order to speed up the estimation of subsequent matrices.
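The first step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the averaging in `energy`, the relative form of the perturbation step, the geometric cooling factor `cooling`, the iteration count, and all function names are assumptions made for this example.

```python
import math
import random


def g(M, u, v):
    """Pixel distance between v and the projection of 3D point u by 3x4 matrix M."""
    h = (u[0], u[1], u[2], 1.0)
    x, y, w = (sum(M[r][c] * h[c] for c in range(4)) for r in range(3))
    if abs(w) < 1e-12:
        return float("inf")  # point projects to infinity
    return math.hypot(x / w - v[0], y / w - v[1])


def energy(M, pairs):
    """Average reprojection error over (u, v) pairs (assumed normalisation)."""
    return sum(g(M, u, v) for u, v in pairs) / len(pairs)


def perturb(M, F, T, C, rng):
    """Perturb one random element; every other element with probability C."""
    r0, c0 = rng.randrange(3), rng.randrange(4)
    cand = [row[:] for row in M]
    for r in range(3):
        for c in range(4):
            if (r, c) == (r0, c0) or rng.random() < C:
                # Assumed relative step, scaled by F and the temperature T.
                cand[r][c] += F * T * M[r][c] * (2.0 * rng.random() - 1.0)
    return cand


def anneal(M0, pairs, T0=1.0, F=0.5, C=0.0, iters=5000, cooling=0.999, seed=0):
    """Return the best matrix found and its energy."""
    rng = random.Random(seed)
    cur, e_cur = [row[:] for row in M0], energy(M0, pairs)
    best, e_best = cur, e_cur
    T = T0
    for _ in range(iters):
        cand = perturb(cur, F, T, C, rng)
        e_cand = energy(cand, pairs)
        # Metropolis criterion: always accept improvements, and accept a
        # worse candidate with probability exp(-(e_cand - e_cur) / T).
        if e_cand < e_cur or rng.random() < math.exp(-(e_cand - e_cur) / T):
            cur, e_cur = cand, e_cand
            if e_cur < e_best:
                best, e_best = cur, e_cur
        T *= cooling  # geometric cooling schedule (assumption)
    return best, e_best
```

With a relative step, elements that are exactly zero never change, so a practical version would start from a matrix without zero entries or add a small absolute term. Chaining the images amounts to passing the `best` matrix of image i−1 as `M0` for image i.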

Final estimation of matrices
In this step, we refine the matrices from the first step by also considering the correspondences between points on multiple images. With simulated annealing from the previous step, we produce a set of projection matrices $M$. The initial state of simulated annealing is thus the set of matrices from the first step, $M = \{M_1, \ldots, M_{|I|}\}$. The criteria function minimized in the second step evaluates the whole set $M$ and uses a function $t(p, M)$, which calculates the 3D position of point $p$ from its correspondences $v_p$ by averaging the triangulated positions between image pairs [19]. Unlike the first step, each iteration of simulated annealing changes one random element of a randomly chosen matrix. The other elements are perturbed with probability $C$.
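The role of t(p, M) in the second step can be illustrated with the sketch below. It is one possible reading rather than the authors' formulation: the linear least-squares triangulation of each pair, the averaging over all pairs, and the overall form of `criteria` (average pixel error over all images, with the known coordinates used for cube vertices and triangulated positions for branch tips) are assumptions, as are all names.

```python
import math
from itertools import combinations


def project(M, u):
    """Project 3D point u with 3x4 matrix M (no guard against w == 0)."""
    h = (u[0], u[1], u[2], 1.0)
    x, y, w = (sum(M[r][c] * h[c] for c in range(4)) for r in range(3))
    return (x / w, y / w)


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate_pair(Ma, va, Mb, vb):
    """Linear least-squares triangulation from two views; None if degenerate."""
    rows = []
    for M, (x, y) in ((Ma, va), (Mb, vb)):
        rows.append([x * M[2][c] - M[0][c] for c in range(4)])
        rows.append([y * M[2][c] - M[1][c] for c in range(4)])
    # Normal equations N X = rhs of the 4x3 system, solved by Cramer's rule.
    N = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(-r[i] * r[3] for r in rows) for i in range(3)]
    d = det3(N)
    if abs(d) < 1e-12:
        return None
    return tuple(
        det3([[rhs[i] if j == k else N[i][j] for j in range(3)] for i in range(3)]) / d
        for k in range(3)
    )


def t(obs, matrices):
    """Average the pairwise triangulations; obs maps image index -> (x, y)."""
    pts = [triangulate_pair(matrices[a], obs[a], matrices[b], obs[b])
           for a, b in combinations(sorted(obs), 2)]
    pts = [p for p in pts if p is not None]
    if not pts:
        raise ValueError("no valid image pair for triangulation")
    return tuple(sum(p[k] for p in pts) / len(pts) for k in range(3))


def criteria(matrices, known, tips):
    """Assumed second-step energy: mean pixel error over all observations.

    known: list of (u, obs) for cube vertices; tips: list of obs for branch tips.
    """
    total, count = 0.0, 0
    for u, obs in known + [(None, o) for o in tips]:
        X = u if u is not None else t(obs, matrices)
        for i, v in obs.items():
            px, py = project(matrices[i], X)
            total += math.hypot(px - v[0], py - v[1])
            count += 1
    return total / count
```

Because the branch-tip positions are recomputed from the current matrices, changing any single matrix affects the error on all images that see the same tip, which is what couples the images in this step.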

Results and discussion
Our approach for estimating projection matrices was tested on a set of 8 orchard images (Fig. 1), which were obtained synthetically from EduApple [21]. For each test, we executed simulated annealing 10 times. First, we estimated the coarse projection matrix of the first image, using the following simulated annealing parameters: T0=1, F=0.5, and C=0. Fig. 2 displays the convergence curves of the median, highest, and lowest errors. The results indicate successful estimation, as the best solution had an error (i.e., average distance in pixels) smaller than 10 pixels. The best solution of each estimation of $M_i$ was used as the initial state for each subsequent image (Fig. 3). Here, the parameters of simulated annealing were T0=1, F=0.1, and C=0.5. For the first image, we again started from the previous estimation and lowered the error even further. The errors of the other images were also significantly lowered from the initial estimation, although the initial errors were not as high because the projection matrices of consecutive images are already similar to the final estimation.
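Median, best, and worst convergence curves over repeated runs can be computed per iteration as in the following sketch. The function name `summarize_runs` and the input layout (one error history per run, all of equal length) are assumptions made for illustration.

```python
import statistics


def summarize_runs(runs):
    """Return (median, best, worst) error curves from per-run error histories."""
    median, best, worst = [], [], []
    for errors in zip(*runs):  # errors of all runs at the same iteration
        median.append(statistics.median(errors))
        best.append(min(errors))
        worst.append(max(errors))
    return median, best, worst
```

For the tests described here, `runs` would hold 10 histories, one per execution of simulated annealing.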

Conclusions
In this paper, we presented an alternative approach for estimating projection matrices and showed that the two-step approach was able to correctly estimate the projection matrices and enable the further steps of 3D reconstruction. Simulated annealing supported steady convergence of our optimization problem. The initial estimation of the projection matrices lowered the complexity of the problem for the second step and thus increased the estimation speed of the final projection matrices. Finally, we successfully produced a 3D point cloud of a simple tree from multiple images taken inside an orchard. The 3D structure was similar to the input images and thus appropriate for subsequent analyses.
In the future, we plan to develop a complete framework for tree reconstruction. Matching corresponding branches manually is tedious; therefore, we want to automate this task to a greater extent. Simulated annealing in its current version requires a large number of criteria function evaluations. Therefore, we plan to increase the convergence speed of the presented approach by tuning the input parameters. Simulated annealing is also only one of the algorithms for global optimization. We plan to evaluate other algorithms for global optimization, e.g., DE [22] or jDE [23], or even multi-criteria optimization algorithms [24]. Finally, we plan to use our approach for further growth analysis.

Figure 1. A sequence of images used in 3D reconstruction.

Figure 3. Convergence curves of initial estimation of all projection matrices (median/best/worst estimations).

Fig. 4 shows the convergence rates of the second step of matrix estimation. Here, we used the following parameters: F=0.1, T0=0.01, and C=0.05. A lower initial temperature was used because the matrices were in most cases already close to the optimal solution. The smaller variation between the best and worst estimations confirms this. The final error was just a few pixels, which enables the further steps of 3D reconstruction. We used the resulting matrices in an initial version of a 3D reconstruction pipeline, which works similarly to voxel coloring [13]. Fig. 5 shows the result of a successful 3D reconstruction, where the produced tree is similar to the tree at the cube (Fig. 1).

Figure 4. Convergence curves of final optimization of all eight matrices (median/best/worst estimations).

Figure 5. Resulting 3D structure of the reconstructed tree.