Mathematical models for metric feature extraction from an RGB-D sensor

RGB-D cameras have been applied in several fields of science. This popularization is due to the emergence of technologies such as the Intel® RealSense™ D400 series. However, despite the demand from potential users, few studies address the characterization of these sensors for object measurement. Our study sought to estimate models for calculating the area of, and the length between, targets or points within RGB and depth images. An experiment was set up with a white cardboard fixed on a flat surface with colored pins. We measured the distance between the camera and the cardboard by averaging the distance over the pixels belonging to the target area. The Akaike (AIC) and Bayesian (BIC) information criteria, associated with R², were used to select the best models. Polynomial and power regression models reached the highest coefficients of determination and the smallest AIC and BIC values.


Introduction
Even though RGB-D sensors were extensively employed in the past decade, mainly to promote human interaction through video game consoles, their use for object detection and metric measurement in productive activities is still in its early stages. Beyond entertainment, these sensors have also been used to solve complex object recognition problems, including human activity tracking (Wang et al., 2014; Basso et al., 2013; Nguyen et al., 2014). Nowadays, low-cost RGB-D sensors and stable 3D vision technologies have been applied to detect animal and agricultural targets, bringing promising outcomes to activities that are currently time-consuming, labor-intensive, and expensive. Studies related to livestock production demonstrated a high correlation (R² > 0.9) between 3D camera data and manual measurements (Hu et al., 2021; Pezzuolo et al., 2018; Condotta et al., 2018; Kongso, 2014). Most RGB-D information is analyzed through regression or machine learning models to estimate metric measurements, surface area, and volume.
Focusing on this prominent market, Intel® launched the RealSense™ D400 series of RGB-D cameras, providing colored image streams and depth maps. These cameras rely on an infrared light source, which improves data quality and allows more reliable outdoor applications, such as tropical environments with high luminous variation (Condotta et al., 2020). Despite its potential, there is still little information about its performance and about methodological standards for its use, which will be crucial for scientific applications. The approach implemented by Vit and Shani (2018) used Euclidean distance in the 3D plane to measure the size of artificial objects, while Hu et al. (2021) extracted length and width from RGB-D images of non-restrained pigs using primary pixel counts.
This work aimed to develop and validate mathematical models to perform area estimation and metric measurements using the depth and color images from an RGB-D camera.

Materials and Methods
The study was carried out at the SIGEO Laboratory in Embrapa Agrosilvopastoral, Sinop, Mato Grosso State, Brazil. To obtain the relation between the target pixel area and depth in the RGB-D images, an experiment was set up: a white cardboard was fixed to a flat surface with colored pins (Fig. 1), perpendicular to the camera. The cardboard had an area of 0.33 m² and was the planar target for further considerations. Pins were attached to form a spatial grid supporting horizontal, vertical, and diagonal measurements.
An Intel® RealSense™ D435i camera captured images of the cardboard and its surroundings in the lab, illuminated by fluorescent light of approximately 5 watts. The D435i model of the RealSense technology was chosen because of its larger field of view (FOV) and superior global shutter performance compared to the D415 models; these characteristics deal accordingly with blind spots. The camera was then placed at 17 increasing distances, from 0.9 m to 2.5 m, in steps of 0.1 m. Although this camera has a measuring range of up to 10 m, a maximum distance of 2.5 m was chosen because this range had satisfactory sensitivity for agricultural applications, similar to that observed by Condotta et al. (2020). Both RGB and depth images were acquired simultaneously at each distance. The cardboard surface length was measured; however, only the diagonal was considered for distance measurements among predetermined dots in the depth images. In this case, the distance between the camera and the cardboard was calculated through the depth image without performing stream alignment. An example of the images captured by the RGB and depth sensors is shown in Figure 1.
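The camera-to-cardboard distance was obtained by averaging depth over the target pixels. A minimal sketch of that computation with NumPy is shown below; the depth frame and rectangular mask are synthetic stand-ins, since a real frame would come from the D435i's depth stream (e.g., via Intel's pyrealsense2 library):

```python
import numpy as np

# Hypothetical 480x640 depth frame in millimetres, as the D435i delivers it.
depth_mm = np.full((480, 640), 2500, dtype=np.uint16)   # background at 2.5 m
depth_mm[100:300, 200:500] = 1500                        # cardboard at 1.5 m

# Boolean mask of the pixels belonging to the target area (here: a rectangle).
mask = np.zeros_like(depth_mm, dtype=bool)
mask[100:300, 200:500] = True

# Ignore zero-depth pixels (invalid returns) before averaging.
valid = mask & (depth_mm > 0)
mean_distance_m = depth_mm[valid].mean() / 1000.0
print(mean_distance_m)  # 1.5
```

In a live pipeline, the same masked average would be taken over each captured depth frame without aligning it to the color stream, as described above.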

Mathematical modeling and accuracy assessment
To verify the Intel® RealSense™ D435i camera's suitability for metric measurements, and to find the best mathematical model correlating digital and manual measurements of distance and area, four distinct mathematical models were compared: linear, logarithmic, polynomial, and power. Statistical analysis was performed to assess the accuracy of the models, where the best-fit model should combine robustness and computational simplicity. This study employed three statistical measures: the coefficient of determination (R²), the Akaike Information Criterion (AIC) (Akaike, 1974), and the Bayesian Information Criterion (BIC) (Stone, 1979). AIC and BIC compare candidate models; they are based on an in-sample fit to estimate a model's log-likelihood and its complexity for predicting future values, as shown by equations 4 and 5, respectively. The models achieving the lowest AIC and BIC values are therefore considered the best models. The coefficient of determination (R²) is a well-known measure that quantifies the proportion of variance explained by a statistical model through the measured (Am) and estimated (Ae) pixel areas or distances:

R² = 1 − Σ(Am − Ae)² / Σ(Am − Ām)² (1)

where Ām is the mean of the measured values.
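The three selection metrics can be computed directly from the residuals of each candidate fit. The sketch below assumes the common Gaussian least-squares forms AIC = n·ln(RSS/n) + 2k and BIC = n·ln(RSS/n) + k·ln(n), where k counts fitted parameters; the data are synthetic, not the experiment's:

```python
import numpy as np

def fit_metrics(y_obs, y_pred, k):
    """R^2, AIC and BIC for a least-squares fit with k parameters
    (Gaussian log-likelihood forms of AIC and BIC)."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    n = y_obs.size
    rss = np.sum((y_obs - y_pred) ** 2)              # residual sum of squares
    tss = np.sum((y_obs - y_obs.mean()) ** 2)        # total sum of squares
    r2 = 1.0 - rss / tss
    aic = n * np.log(rss / n) + 2 * k
    bic = n * np.log(rss / n) + k * np.log(n)
    return r2, aic, bic

# Toy comparison: slightly noisy quadratic data fitted by degree-1 and degree-2
# polynomials; the quadratic should win on all three criteria.
x = np.linspace(0.9, 2.5, 17)
y = 0.5 * x**2 + 0.1 * x + np.array([0.01, -0.02] * 8 + [0.0])

results = {}
for deg in (1, 2):
    coef = np.polyfit(x, y, deg)
    results[deg] = fit_metrics(y, np.polyval(coef, x), k=deg + 1)
```

Following the paper's criterion, the preferred model is the one pairing the highest R² with the lowest AIC and BIC.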

Model validation
For model validation purposes, the best-fit equations for area and length were applied to compare measured and predicted length, width, and area of three corrugated paper boxes of known size.
The objects were positioned perpendicularly to the camera and sampled individually at three distances: 1.01 m, 1.54 m, and 2 m.
The difference between the actual boxes' length and width and the same measurements captured by the sensor was computed using the depth and RGB image dimensions. The goal of the validation step was to evaluate the RGB-D sensor's capability to register sizes through the best-fit equations. Subsequently, a paired t-test was applied to assess significant differences in box surface area and side measurements.
The individual area of each box was evaluated for each image acquired at increasing distances between the camera and the studied scenario, and the differences between the actual areas and those acquired by the sensor were compared. Area validation was applied only to depth images, for the three box sizes over the different distances.
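The paired t-test of the validation step can be sketched as follows. All measurement values below are hypothetical, and the statistic is computed by hand and compared against the two-sided critical value t(0.975, df = 8) ≈ 2.306:

```python
import math

# Hypothetical paired measurements (cm): manual box lengths vs. sensor estimates.
manual = [20.0, 30.0, 40.0, 20.0, 30.0, 40.0, 20.0, 30.0, 40.0]
sensor = [20.4, 29.1, 41.2, 19.5, 30.8, 39.2, 20.9, 29.6, 40.7]

d = [m - s for m, s in zip(manual, sensor)]            # paired differences
n = len(d)
mean_d = sum(d) / n
var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)    # sample variance
t_stat = mean_d / math.sqrt(var_d / n)

# Two-sided critical value for alpha = 0.05 and df = n - 1 = 8 is about 2.306;
# |t| below it means no statistically significant paired difference.
significant = abs(t_stat) > 2.306
print(significant)  # False
```

In practice the same test is more conveniently run with `scipy.stats.ttest_rel`, which returns the statistic and p-value directly.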

Models Development
From 0.9 m to 2.5 m, the cardboard area ratio (m²/pixel) was recorded and translated from pixels to metric units for each distance. Figures 2 and 3 show the trend lines and best equations adjusted by the linear, logarithmic, polynomial, and power models. Equations 6 and 7 were selected because they obtained the highest coefficient of determination combined with model simplicity for calculating area in depth and RGB images. As can be observed from these figures, for both depth and RGB images, the power and polynomial models fit the experimental data best (R² = 1, p-value < 0.001).
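The four candidate fits can all be obtained with ordinary least squares after suitable variable transformations. The calibration data below are synthetic, generated from an assumed inverse-square pixel-size law (per-pixel area growing as x²), not the paper's measurements:

```python
import numpy as np

# Synthetic calibration data: camera-to-target distance (m) vs. area per pixel.
x = np.linspace(0.9, 2.5, 17)       # 17 distances, 0.9 m to 2.5 m
y = 2.1e-6 * x**2                   # m^2 per pixel (assumed pinhole-like law)

models = {
    "linear":      np.polyfit(x, y, 1),                   # y = b*x + a
    "polynomial":  np.polyfit(x, y, 2),                   # y = c*x^2 + b*x + a
    "logarithmic": np.polyfit(np.log(x), y, 1),           # y = b*ln(x) + a
    "power":       np.polyfit(np.log(x), np.log(y), 1),   # ln y = b*ln(x) + ln a
}

# The power fit recovers the exponent of the generating law.
b, ln_a = models["power"]
print(round(b, 3))  # 2.0
```

Fitting the power model in log-log space keeps it linear in its parameters, which is part of why it is computationally cheap at prediction time.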

(6)
where: Am = area in square meters; Ap = area of one pixel, assuming Ap = 1; x = distance from the sensor to the object, varying from 0.8 m to 2.5 m.
(7)
where: Am = area in square meters; x = distance from the camera to the object; Ap = area in pixels, which should be replaced by 1 and multiplied by the number of pixels associated with a specific distance from the sensor (Ap = 1 in eq. 7), especially when the target has an irregular surface.
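In practice, eq. 6 or 7 gives the metric area of one pixel as a function of distance, and the target's area is that per-pixel area times the number of segmented pixels. A sketch with purely illustrative coefficients (a and b below are hypothetical, not the fitted values of eqs. 6 and 7):

```python
# Illustrative power-model coefficients for area_per_pixel(x) = a * x**b;
# the actual values come from eq. 6 (depth) or eq. 7 (RGB) in the text.
a, b = 2.1e-6, 2.0

def target_area_m2(pixel_count, distance_m):
    """Metric area of a segmented target: per-pixel area times pixel count."""
    return pixel_count * a * distance_m ** b

# A 0.33 m^2 cardboard seen at 1.5 m covers ~0.33 / (a * 1.5**2) pixels,
# so applying the model to that pixel count should recover 0.33 m^2.
pixels = 0.33 / (a * 1.5 ** 2)
print(round(target_area_m2(pixels, 1.5), 2))  # 0.33
```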
Up to 2 meters, there is significant variation in length in both types of images, as observed in Figures 4 and 5. In this case, the sensor recorded segments of supposedly equal length with slightly different lengths, and the difference increases as the sensor moves away from the target. The errors of vertical and horizontal measurements are likely different, and this difference increases with distance (Intel RealSense, 2019). The study of Carfagni et al. (2017) corroborates this finding and describes it as systematic errors in the Intel SR300 sensor. Inhomogeneous distance errors cause both length and area measurement inconsistencies. These errors are mainly due to the sensor's limitations in perceiving the target as planar and to depth offset. Choo (2015) and Condotta et al. (2020) reported similar behavior with a Microsoft Kinect® sensor. The stochastic nature of the error is mainly affected by the depth of the object in the scene. Furthermore, radial errors cause little distortion, which explains the greater variation in depth images compared to colored ones.
where: Dcm = distance in centimeters between cardboard reference points; Dp = distance in pixels; X = distance from the camera to the object.
where: Dcm = distance in centimeters between cardboard reference points; Dp = distance in pixels; X = distance from the camera to the object, using data from the associated colored image.
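Analogously, the length equations convert a pixel distance Dp to centimeters given the camera distance X. Under a pinhole model the per-pixel size grows roughly linearly with X; the slope k below is a hypothetical calibration constant, not the fitted coefficient from the text:

```python
# Hypothetical calibration slope: cm per pixel, per metre of camera distance.
k = 0.155

def length_cm(dp_pixels, x_metres):
    """Metric length between two reference points from their pixel distance Dp
    and the camera-to-target distance X (linear pinhole approximation)."""
    return dp_pixels * k * x_metres

print(round(length_cm(100, 1.5), 2))  # 23.25
```

A separate slope would be calibrated for the depth and the RGB streams, mirroring the two fitted equations.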
Similarly, from Figures 4 and 5 it can be inferred that polynomial and power models obtained a high coefficient of determination (R 2 > 0.98).
Applying the criterion that the optimal model is the one with the minimum AIC and BIC values linked to the highest R², the polynomial approach was the best for both the area and length models, from either depth or color images, as shown in Table 1.

Models Validation
A paired t-test comparing box length and width measurements from RGB and depth images resulted in no statistically significant difference (p > 0.05) for any box size or distance from the sensor. The mean difference in length estimates suggests an increment of 1.62 cm for RGB images compared to depth images. Also, the mean difference in width estimates showed a decrement of 1.3 cm for RGB images compared to depth images. This result confirms the satisfactory performance of the equations generated in this study for estimating linear measurements (e.g., length and width) from both image formats.
For validation purposes, we applied the power regression models (eqs. 6, 7, 8, and 9) to the three box sizes to estimate area (cm²), width, and length (cm). The highest measurement errors (digitally estimated value minus manually measured value) were found in depth images compared to RGB images; real box dimensions were both over- and underestimated at all observed distances (Fig. 6a). In particular, at a 1.5 m distance from the camera sensor, errors from depth images were closest to zero, except for the length measurement of the large box. For RGB images, in general, measurements at the closest distance (1 m) had the lowest errors compared to farther distances (2 m). Also, estimated values were underestimated at closer distances and overestimated at farther distances for the small and medium boxes.
Considering the boxes' width, Figure 6b shows that depth and RGB images of the small and medium boxes present a smaller difference than the larger box at all three distances. Generally, those results agree with the results of Khoshelham (2012) using a Microsoft Kinect® sensor. However, when we calculate the proportional error, taking the actual box area as a unit, we found that, in general, in RGB images the estimated areas of the three types of boxes were lower than the actual areas up to 1.5 m, varying from 0.64% to 6.7%. On the other hand, at 2 meters from the camera, the area values were greater, varying from 3.2% to 7.4%. Nevertheless, the area estimated from depth images was generally greater than the actual boxes' area at each distance. The only exception was the large box at 1.5 m, where the measured area was underestimated by 10%.

Figure 7. Area error (cm²) between measured paper box areas in RGB and depth images obtained at different distances with equation 6.

Conclusion
The RGB-D sensor is becoming essential to diverse industrial and agricultural applications that need to measure the area of, and the distance to, specific targets. This paper presents best-fit models for area and length using the Intel® RealSense™ D435i sensor in RGB and depth images. We estimated mathematical models from RGB and depth images recorded at incremental distances. This study suggested that the polynomial and power regression models obtained the combination of a high coefficient of determination (R² > 0.9) and minimum AIC and BIC values. However, considering model simplicity, the power regression model has fewer computational demands, which can make it the best option for on-the-fly automatic applications. These models are also recommended in the Condotta et al. (2020) study. The validation step showed that objects with large areas, regardless of their distance from the RGB-D sensor, produced larger errors than the actual areas. Up to 1.5 m, the area in both RGB and depth images was underestimated, and at 2 meters from the sensor it was overestimated. The areas of medium and small targets had smaller errors than large objects, which reached 64 cm² of error. The main limitations of this experimental setup are: (i) the small scale of the target area (0.33 m²), which can be challenging for much larger targets; and (ii) the restricted range of the sensor (0.8-2.5 m). We plan to tackle these constraints, and targets with other complex morphological features, in future studies.