
Digital Twins: Precision of 3D Gaussian Splatting

Introduction

The evolution of 3D representation techniques has enabled significant advancements in digital recreation. While methods like Neural Radiance Fields (NeRF) have attracted attention, this analysis emphasizes 3D Gaussian Splatting (3DGS) due to its growing relevance, high portability, and performance, making it ideal for integration into multiple platforms such as mobile devices or web applications. Additionally, the accuracy of the LiDAR technology available in consumer-grade devices like the iPhone will be evaluated across various scenarios and compared to results achieved through 3DGS.

These advancements directly impact the development of Digital Twins, which rely on precise and up-to-date virtual representations to optimize monitoring and predictive maintenance processes. By integrating real-time data and the versatility of 3DGS, new opportunities emerge to drive innovation in sectors such as construction, manufacturing, and smart city management.

This article focuses on comparing the accuracy of 3D reconstruction using two primary approaches: 3D Gaussian Splatting (3DGS) and LiDAR systems implemented in mobile devices like the iPhone. Performance will be evaluated across different scales and environments as follows:

  • Medium-scale environment (100m² office): Accuracy of representation and spatial consistency will be evaluated.
  • Large object (4.7m modular furniture piece): The ability to reconstruct a highly complex shape will be assessed.
  • Large-scale environment (5000m², captured with 360° cameras): Full coverage of an expansive space will be evaluated, with 360° cameras simplifying data capture.

The aim is to identify the strengths and limitations of each method, providing key insights for future applications in digital twins and real-time monitoring.

Devices Used

To measure the accuracy of the technologies described, we must first establish a reference framework by defining error metrics.

For this purpose, we will use the Leica RTC360 LiDAR, a portable, high-speed 3D laser scanner designed to generate colored point clouds with an accuracy ranging between 1 and 2 millimeters and a range of up to 130 meters. This makes it an ideal tool for applications requiring high-fidelity 3D documentation, as in our case.

Regarding the devices used for data capture, we employed a variety of tools to cover different environments:

  • Insta360 Pro2: A professional 360-degree camera designed for immersive content capture, capable of recording 8K video in both 2D and 3D with exceptional quality. In this study, it is primarily used for its full-coverage capability, enabling rapid and efficient recording of large spaces while streamlining the filming process.

  • iPhone 12 Pro: This Apple device was the first iPhone model equipped with LiDAR technology. While its primary function is to enhance the device’s spatial positioning in challenging environments, it has proven highly effective in generating 3D point clouds representing scanned spaces. We will leverage this feature to capture objects tested in this study.

  • GoPro HERO13 Black: A quintessential action camera, valued for its compact size and high video quality. Three of these cameras will be mounted on a 1.80-meter-long tripod, positioned at three different heights, allowing us to achieve comprehensive coverage in less time.

What is 3D Gaussian Splatting and How is it Generated

3D Gaussian Splatting (3DGS) is a 3D rendering technique that uses “Gaussian splats” (points with position, size, and color attributes) to represent and render three-dimensional scenes. Unlike traditional polygon-based methods, this approach enables resource-efficient, highly detailed 3D representations that adapt to complex environments. Its applications show promise in fields like architectural visualization, visual effects, and simulations.
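To make these attributes concrete, here is a minimal illustrative sketch (in Python, not the actual 3DGS data structure) of the parameters a typical implementation stores and optimizes per Gaussian; the field names are hypothetical.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    """Illustrative per-splat parameters; real scenes contain millions of these."""
    position: np.ndarray   # (3,) center of the Gaussian in world space
    scale: np.ndarray      # (3,) per-axis extent (the anisotropic "size")
    rotation: np.ndarray   # (4,) quaternion orienting the covariance
    opacity: float         # alpha value used when blending splats into a pixel
    color: np.ndarray      # (3,) RGB, or spherical-harmonic coefficients for view dependence
```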

The workflow begins by generating an initial set of 3D Gaussians from a sparse point cloud produced through Structure from Motion (SfM). For this step, COLMAP (open-source software) is primarily used, though notable alternatives include Epic’s Reality Capture (free) and Agisoft’s Metashape (paid license).

This process aligns with classical photogrammetry methods, resulting in a point cloud containing all extracted features and the corresponding camera pose for each input image.

Illustration 1: COLMAP's Incremental Structure-from-Motion Pipeline
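For reference, a minimal sketch of this SfM step driven from Python through COLMAP's command-line interface; the paths are illustrative, and the commented-out vocabulary-tree matcher is the scalable alternative used later for the large 360° dataset.

```python
import subprocess

images = "dataset/images"      # input photos or extracted video frames (illustrative paths)
db = "dataset/database.db"
sparse = "dataset/sparse"

# 1. Detect and describe local features in every image.
subprocess.run(["colmap", "feature_extractor",
                "--database_path", db, "--image_path", images], check=True)

# 2. Match features between image pairs. Exhaustive matching is fine for small sets;
#    for thousands of images, a vocabulary-tree matcher scales much better
#    (the tree file is downloaded separately from the COLMAP website).
subprocess.run(["colmap", "exhaustive_matcher", "--database_path", db], check=True)
# subprocess.run(["colmap", "vocab_tree_matcher", "--database_path", db,
#                 "--VocabTreeMatching.vocab_tree_path",
#                 "vocab_tree_flickr100K_words256K.bin"], check=True)

# 3. Incremental mapping: recover camera poses and a sparse point cloud.
subprocess.run(["colmap", "mapper", "--database_path", db,
                "--image_path", images, "--output_path", sparse], check=True)
```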

Following this, an adaptive method generates test views using the initial Gaussians and compares them to the actual photos to calculate errors. It then iteratively adjusts the position, size, and color of each Gaussian to improve alignment, distributing them uniformly. Through repeated refinement, the Gaussians become finely tuned, enabling instant photorealistic exploration of the scene.

Illustration 2: Adaptive Training Process of Gaussians
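To illustrate the spirit of this loop, the following toy example fits a handful of 2D Gaussians to a single target image with PyTorch. It is a simplified analogy (no adaptive densification, no 3D projection, plain L1 loss), not the actual CUDA rasterizer used by 3DGS.

```python
import torch

# Toy 2D analogy: fit N isotropic Gaussians to a single 64x64 target image.
H, W, N = 64, 64, 64
target = torch.rand(H, W)                      # stand-in for a reference photo

pos = torch.rand(N, 2, requires_grad=True)     # learnable centers in [0, 1] image coords
log_scale = torch.full((N,), -3.0, requires_grad=True)   # learnable (log) sizes
color = torch.rand(N, requires_grad=True)      # learnable intensities

ys, xs = torch.meshgrid(torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij")
grid = torch.stack([xs, ys], dim=-1)           # (H, W, 2) pixel coordinates

opt = torch.optim.Adam([pos, log_scale, color], lr=1e-2)
for step in range(500):
    # "Render": evaluate every Gaussian at every pixel and accumulate.
    d2 = ((grid[None] - pos[:, None, None]) ** 2).sum(-1)             # (N, H, W)
    sigma2 = torch.exp(log_scale)[:, None, None] ** 2
    render = (color[:, None, None] * torch.exp(-d2 / (2 * sigma2))).sum(0).clamp(0, 1)
    # Compare against the reference view and refine the Gaussians by gradient descent.
    loss = torch.nn.functional.l1_loss(render, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```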

If the preceding steps are successful, the result (composed of several million Gaussian splats) will resemble something like the following example:

Illustration 3: Result of 3DGS with 7 million Gaussians

Precision Study

 

In this precision study, we will evaluate the following environments:

  • Medium-scale environment (100m² office)
  • Large object (4.7m office furniture piece)
  • Large-scale environment using a 360° camera (5000m² facility)

To measure the accuracy of 3DGS across various scenarios in reconstructing real-world spaces, we will use data captured by the Leica RTC360 LiDAR as a reference benchmark. This provides a high-density, high-accuracy point cloud with an error margin of 1 to 2 millimeters.

We will use CloudCompare, an open-source point cloud processing software, to compare and quantify errors:

  • Alignment: The 3DGS reconstruction (or iPhone LiDAR-derived point cloud) will be aligned with the reference point cloud obtained by Leica.
  • Error Measurement: The “Cloud/Cloud distance” function will calculate per-point error distances. This assigns a color gradient to each point based on its deviation from the reference cloud, while also generating mean error and standard deviation metrics.
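The same two steps can also be reproduced programmatically; below is a minimal sketch using Open3D rather than CloudCompare's GUI (an assumption for illustration), with hypothetical file names and a rough pre-alignment assumed before ICP refinement.

```python
import numpy as np
import open3d as o3d

# Reference (Leica) and evaluated (3DGS or iPhone LiDAR) clouds; paths are illustrative.
reference = o3d.io.read_point_cloud("leica_reference.ply")
evaluated = o3d.io.read_point_cloud("gaussian_splatting.ply")

# 1. Alignment: refine a rough initial pose with point-to-point ICP.
icp = o3d.pipelines.registration.registration_icp(
    evaluated, reference, max_correspondence_distance=0.05,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
evaluated.transform(icp.transformation)

# 2. Error measurement: per-point nearest-neighbor distance to the reference cloud.
distances = np.asarray(evaluated.compute_point_cloud_distance(reference))
print(f"Mean error: {distances.mean() * 100:.2f} cm")
print(f"Standard deviation: {distances.std() * 100:.2f} cm")
```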

Medium-Scale Environment (100m² Office)

The first scenario is a workshop office module in the Research department located in Seville, Spain, covering approximately 100 square meters.

Leica LiDAR Reference (Ground Truth)

A reference scan of the office was performed using the Leica RTC360 LiDAR:

3D Gaussian Splatting

The 3DGS capture was performed using the three-GoPro rig mentioned earlier. With the cameras positioned at different heights (approximately 80 cm apart from each other), this setup offers the significant advantage of capturing the environment more efficiently, as a single pass covers the vertical range typically required for accurate 3D reconstruction. It is essential to avoid occlusion "shadows", areas that no camera ever sees, and working at several heights minimizes these blind spots.

The training process utilized a total of 529 images selected from the three recorded videos (one from each GoPro).

Below is the result:

After generating the 3DGS, we proceeded to measure the error:

 

This image shows a gradient where blue indicates points with an error of less than 1 centimeter, while those exceeding 10 cm are marked in red.

The overall results were:

  • Mean error: 3.49 cm
  • Standard deviation: 5.6 cm

Key considerations:

  • 3DGS tends to produce diffuse areas on surfaces lacking definition, because no distinctive image features can be extracted from them and accurately positioned. As a result, areas like floors, walls, or blinds may display points displaced from their true location, appearing as red points:
  • Additionally, certain elements can shift slightly (like some chairs), or even suddenly appear (for example, the Leica LiDAR itself):

iPhone LiDAR

A scan of the office was also captured using the iPhone 12 Pro’s LiDAR, providing immediate results. Specifically, the Scaniverse app was used, allowing export to a PLY file for point cloud comparison in CloudCompare. The error analysis is shown below:

The visualization range parameters are identical to the previous case.

The overall results were:

  • Mean error: 7.01 cm
  • Standard deviation: 23.2 cm

The first noticeable aspect is the reduced definition compared to 3DGS, caused by the low resolution and limited range of the iPhone 12 Pro's LiDAR sensor, whose effective range is a sphere of roughly 2 meters around the device. When scanning larger spaces, this limitation leads to error accumulation and proportion drift; in fact, moving around with the phone and returning to the same point can produce a different scan and alter previously captured results.

Large Object (4.7-Meter Modular Furniture)

The next object of study is a large item, in this case a modular furniture unit within the same office environment as the previous scenario. The goal is to evaluate the accuracy of mobile devices when dimensions are calibrated automatically via printed markers.

Leica LiDAR Reference (Ground Truth)

For this case, a portion of the point cloud from the previous scenario was isolated to focus solely on the furniture region. The modular furniture measures 4.78 meters in width and 0.75 meters in height.

3D Gaussian Splatting

For the 3DGS capture, the iPhone 12 Pro was used. A 1-minute 32-second video was recorded, consisting of three passes at different heights, from which the 236 best images were selected.

A new element in this case was the placement of printed markers throughout the scene, specifically AprilTags, which enable automatic scale calibration for the 3D reconstruction. Instead of COLMAP, Reality Capture was used for image alignment and pose generation due to its built-in support for this feature:
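Reality Capture performs this marker-based scaling internally; purely to illustrate the underlying idea, the sketch below detects AprilTags with the pupil_apriltags package (an assumption, not a tool used in the study) and derives a uniform scale factor from a hypothetical known distance between two tags.

```python
import cv2
import numpy as np
from pupil_apriltags import Detector

# Step 1: locate the printed tags in an input photo (path and tag family are illustrative).
# Triangulating these detections across views gives each tag a 3D position in the
# unscaled reconstruction; that triangulation step is omitted here.
gray = cv2.imread("furniture_view_012.jpg", cv2.IMREAD_GRAYSCALE)
detections = Detector(families="tag36h11").detect(gray)
print({d.tag_id: d.center for d in detections})   # pixel coordinates per detected tag

# Step 2: with two tags at a known real-world separation (hypothetical 2.40 m),
# a single uniform scale factor brings the whole reconstruction to metric units.
tag_a_recon = np.array([0.12, 0.03, 1.10])        # reconstructed, unscaled positions
tag_b_recon = np.array([1.95, 0.05, 1.08])
true_distance_m = 2.40
scale = true_distance_m / np.linalg.norm(tag_b_recon - tag_a_recon)
print(f"Uniform scale factor for the point cloud: {scale:.4f}")
```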

With these adjustments, the result is as follows:

 

After generating the 3DGS, we proceeded to measure the error:

The dimensions derived from the markers performed exceptionally well, with the furniture’s length measured at 4.78 meters, nearly identical to the reference measurement.

Overall results:

  • Mean error: 1.32 cm
  • Standard deviation: 2.5 cm

As in the previous case, high-contrast elements (edges, objects, etc.) were resolved with high precision, with errors mostly below 1 centimeter.

iPhone LiDAR

The iPhone 12 Pro’s LiDAR produced inaccurate results. This case highlights the device’s inability to resolve positional drift during large-motion captures. To rule out app-specific issues, two applications were used to scan the modular furniture:

  • Scaniverse
  • Polycam

Significant variability was observed in measurements:

  • Scaniverse: 5.08 meters in length (30 cm overestimation).
  • Polycam: 4.59 meters in length (19 cm underestimation).

Large-Scale Environment (5000m² Facility)

This represents a significant leap in the scale of the area targeted for reconstruction. In this case, the project involves reconstructing a section of the building housing the Research offices in Seville:

Precision testing in such extensive environments calls for a different approach: the previous capture methods are not suitable, as covering every angle with conventional techniques would require excessive time and effort.

To address this, the decision was made to use 360° cameras. These devices capture images or videos with a 360° spherical field of view, covering every angle around them. Specifically, we use the Insta360 Pro2, a professional-grade camera known for its high image quality (8K resolution).

Leica LiDAR Reference (Ground Truth)

We again leverage a complete point cloud of the environment as a reference:

3D Gaussian Splatting

360° cameras typically produce equirectangular images, which are flat projections of spherical imagery, usually in a 2:1 aspect ratio, as shown below:

This is an immediate challenge, as both COLMAP and 3DGS training methods are designed for traditional input images. These assume a Pinhole camera model with perspective projection (straight lines in the real world appear straight in images) and standard intrinsic parameters (focal length, optical center, radial/tangential distortions).
In contrast, equirectangular 360° images use a spherical projection:

  • Each pixel encodes an azimuth-elevation direction rather than a perspective ray through a pinhole.
  • Straight lines in the real world appear curved depending on latitude, and scale varies significantly from the poles to the equator.

To work around this, each equirectangular image is divided into six faces (like the sides of a cube) using traditional perspective projection. However, only four of these faces are useful for 3DGS training, as the top and bottom faces are discarded.
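A minimal sketch of this projection step using the py360convert package (one possible tool, assumed here for illustration); the field of view is a parameter, covering both the 90° cube faces described above and the overlapping 120° views used in the workflow below.

```python
import cv2
import numpy as np
import py360convert

# Load one equirectangular frame (2:1 aspect ratio); the path is illustrative.
equi = cv2.imread("frame_0001_equirect.jpg")

# Render four horizontal perspective views, one every 90° of yaw. With fov_deg=90 these
# are exactly the four side faces of a cube; fov_deg=120 adds overlap between views.
fov_deg = 120
for i, yaw in enumerate((0, 90, 180, 270)):
    view = py360convert.e2p(equi, fov_deg=fov_deg, u_deg=yaw, v_deg=0, out_hw=(1080, 1080))
    cv2.imwrite(f"frame_0001_view{i}.jpg", view.astype(np.uint8))
# The top and bottom views (v_deg = +90 / -90) are discarded, as noted above.
```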

After extensive testing, we developed the following workflow, which yielded optimal results:

  • Frame selection from the 360° video
    We split the 360° video into 1-second windows, and within each window we pick the best frame. A frame is considered "high quality" if it is sharp (no motion blur) and well-defined. To quantify sharpness, we compute the Laplacian variance of each frame (see the sketch after this list). The outcome of this step is a list of equirectangular images:
  • Conversion to planar views
    From each equirectangular image, we generate four 120° FoV planar images (following the conversion sketch above) with sufficient overlap to enable more accurate alignment in COLMAP:
  • COLMAP processing
    We feed the roughly 2,000 individual images generated above into COLMAP. Due to the data volume, we manually tweak COLMAP's default settings, most notably by using a larger visual vocabulary (a Vocabulary Tree with 256k visual words). The result is a set of camera poses correctly registered in space:
  • 3D Gaussian Splatting training
    Finally, we train a 3D Gaussian Splatting (3DGS) model on the registered images and camera poses.
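The frame-selection step referenced above can be sketched as follows with OpenCV; the window length, paths, and output naming are illustrative.

```python
import cv2

def sharpness(frame):
    """Variance of the Laplacian: low values indicate motion blur or lack of detail."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def select_best_frames(video_path, window_seconds=1.0):
    """Keep the sharpest frame from every window of the given duration."""
    cap = cv2.VideoCapture(video_path)
    window = max(1, int(round(cap.get(cv2.CAP_PROP_FPS) * window_seconds)))
    selected, best, best_score, index = [], None, -1.0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        score = sharpness(frame)
        if score > best_score:
            best, best_score = frame, score
        index += 1
        if index % window == 0:          # the current 1-second window is complete
            selected.append(best)
            best, best_score = None, -1.0
    if best is not None:                 # keep the best frame of a trailing partial window
        selected.append(best)
    cap.release()
    return selected

# Hypothetical input: one of the equirectangular videos exported from the Insta360 Pro2.
for i, frame in enumerate(select_best_frames("insta360_walkthrough.mp4")):
    cv2.imwrite(f"equirect_{i:04d}.jpg", frame)
```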

The output looks like this:

We then measure the reconstruction error:

Now, red indicates errors greater than 50 cm, while blue indicates errors below 3 cm.

Overall results:

  • Mean Error: 7.82 cm
  • Standard Deviation: 11.49 cm

A few observations:

  • Most surfaces and high‑contrast points achieve good accuracy:
  • Many Gaussians appear in areas where no geometry should exist, such as the building's interior glass façades. This artifact arises because 3DGS can only represent reflective surfaces, like large glass windows, by placing Gaussians in those regions:

3DGS capture limitations in large scenes

When reconstructing expansive environments with 3DGS, bear in mind:

  • Typically, capturing data at 2 or 3 different heights is necessary for accurate reconstruction. However, 360° videos are usually captured at a single height, which may result in poorer outcomes.
  • Large-scale outdoor environments, like the one in this example, require special attention under certain conditions:
    • Fixed exposure limitations: Ideally, a fixed camera exposure is preferred, but in highly sunny environments with shaded areas, the dynamic range of the camera may fail to resolve extreme contrast, resulting in heavily shadowed or overexposed regions:
    • Temporal inconsistencies: The duration of the capture (9 minutes in this scenario) allows clouds and cast shadows to shift significantly, leading to reconstruction artifacts in affected areas:

Future Improvements

3DGRUT

During the development of this project, 3DGRUT, a hybrid technique combining 3D Gaussian Ray Tracing (3DGRT) and 3D Gaussian Unscented Transform (3DGUT), was introduced. This methodology enables training directly on images from non-conventional cameras, such as fisheye or 360° lenses, without requiring prior rectification.

While 360° image-based reconstructions hold promise, 3DGRUT currently has limitations:

  • It relies on specialized hardware with ray-tracing cores.
  • Its design is tailored for desktop environments, restricting deployment on web platforms.

Spherical Alignment via Metashape

COLMAP lacks native support for spherical images. As discussed earlier, the typical workaround involves splitting equirectangular images into overlapping planar views. However, this approach risks misalignment between slices from the same panorama, leading to inconsistent pose estimations:

In contrast, Agisoft Metashape natively supports spherical imagery. By configuring the camera type as “spherical” during calibration, equirectangular panoramas can be processed without splitting, ensuring more precise and consistent alignment.
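A minimal sketch of that configuration using Metashape's Python scripting API (available with a Professional license); the photo paths are illustrative and default matching settings are assumed.

```python
import Metashape  # Agisoft Metashape Professional scripting API

doc = Metashape.Document()
chunk = doc.addChunk()

# Add the equirectangular panoramas directly, without slicing them into planar views.
chunk.addPhotos(["pano_0001.jpg", "pano_0002.jpg", "pano_0003.jpg"])  # illustrative paths

# Declare every camera as spherical so the equirectangular projection is modeled natively.
for sensor in chunk.sensors:
    sensor.type = Metashape.Sensor.Type.Spherical

# Feature matching and camera alignment on the panoramas themselves.
chunk.matchPhotos()
chunk.alignCameras()
doc.save("spherical_alignment.psx")
```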

Currently, at Plain Concepts Research, we are working on using the camera poses obtained with Metashape (for each sliced equirectangular frame) to generate the data needed to integrate them into COLMAP workflows. This hybrid strategy aims to enhance reconstruction quality in pipelines built on techniques like 3D Gaussian Splatting (3DGS) by combining the strengths of both platforms.

Conclusions

3D Gaussian Splatting (3DGS) is a 3D representation technique that relies on Structure from Motion (SfM) data, typically generated using tools like COLMAP. It serves as a rendering layer capable of high-fidelity, real-time scene visualization.

The study demonstrates that 3DGS achieves acceptable accuracy across diverse environments, from small-scale scenes to larger spaces. This is attributed to its ability to preserve the structural integrity and proportional consistency of scene elements throughout the reconstruction. However, the accuracy of camera poses remains critical for optimal results.

Regarding iPhone LiDAR, it delivers acceptable quality for reconstructing small-scale objects. However, for elements beyond the sensor's immediate range (requiring displacement during capture), significant drift becomes evident, degrading reconstruction reliability.

Useful Links

Software tools used in this study:

  • COLMAP
  • Reality Capture
  • Agisoft Metashape
  • CloudCompare
  • Scaniverse
  • Polycam

Author
David Ávila
Senior Staff Researcher