Gaussian Splatting: rapid 3D with AI tools

by Nathan Saucier, Leiden University, The Netherlands.

Thanks to recently released AI tools, it is now possible to create highly realistic 3D environments using only simple photos or videos of a given location. One of the most promising of these tools, Gaussian Splatting, offers a new means of quickly generating 3D content of a kind which used to take days or even weeks to produce, lowering the technical bar to creating immersive 3D experiences.

Background

At the Leiden Learning and Innovation Centre (LLInC), we have been working for a number of years with 360° video and VR, demonstrating their pedagogical applications. 360° video is an appealing medium because it is easy to create and view and offers a high degree of photorealism, yet it has one flaw: the user is “stuck” in one location, constrained to looking around the inside of a video sphere. Because of this limitation, many media producers have opted for VR, owing to its interactivity and full freedom of movement. VR, however, most often looks like a video game, and photorealism falls by the wayside.

This has posed a dilemma for many producers, forcing a choice between two avenues of expression for a given project. But this dilemma may soon be resolved, thanks to the promising new technology of Gaussian Splatting – a novel machine learning technique which allows us to make navigable 3D scenes that look real and are easy to create.

First published at SIGGRAPH 2023 – a leading conference centred upon computer graphics and interactive techniques – Gaussian Splatting lets us construct 3D environments using simple photos or videos. The images are first processed to determine the geometry of the scene: the program calculates where in the space each picture was taken and produces a sparse point cloud. No 3D mesh is built. Instead, each point becomes the starting position of a 3D Gaussian, a soft, semi-transparent blob with its own size, orientation, colour and opacity (drawn on screen, these are the “splats”). Next, the AI training begins. In an iterative process, the model renders the scene from the viewpoint of each photo, compares the result with the real image and adjusts the Gaussians accordingly, refining them many thousands of times and adding or removing them where needed. Because neighbouring Gaussians overlap and blend together, the result looks like a smooth, continuous scene with a high degree of photorealism.
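The blending of overlapping Gaussians can be illustrated with a minimal Python sketch. It is a deliberate simplification and not the published implementation: it assumes round (isotropic) 2D footprints that have already been projected onto the image, and a single fixed colour per splat, whereas the actual technique uses stretched, oriented 3D Gaussians whose colour can vary with viewing direction. The function names and the dictionary layout are hypothetical, chosen for illustration only.

```python
import math


def gaussian_weight(px, py, cx, cy, sigma):
    """Falloff of a round 2D Gaussian footprint, evaluated at pixel (px, py).

    Returns 1.0 at the splat centre (cx, cy) and fades towards 0 with distance.
    """
    d2 = (px - cx) ** 2 + (py - cy) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


def composite_pixel(px, py, splats):
    """Blend all splats covering one pixel, nearest first.

    Each splat is a dict with the (assumed) keys:
    depth, cx, cy, sigma, opacity, colour (an (r, g, b) tuple in 0..1).
    """
    colour = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through so far

    for s in sorted(splats, key=lambda s: s["depth"]):
        # Effective opacity of this splat at this pixel.
        alpha = s["opacity"] * gaussian_weight(px, py, s["cx"], s["cy"], s["sigma"])
        for i in range(3):
            colour[i] += transmittance * alpha * s["colour"][i]
        # Whatever this splat absorbed is hidden from the splats behind it.
        transmittance *= 1.0 - alpha

    return tuple(colour)


if __name__ == "__main__":
    splats = [
        {"depth": 2.0, "cx": 0.0, "cy": 0.0, "sigma": 1.0,
         "opacity": 1.0, "colour": (0.0, 0.0, 1.0)},  # opaque blue, further away
        {"depth": 1.0, "cx": 0.0, "cy": 0.0, "sigma": 1.0,
         "opacity": 0.5, "colour": (1.0, 0.0, 0.0)},  # half-transparent red, in front
    ]
    print(composite_pixel(0.0, 0.0, splats))
```

In this example the half-transparent red splat in front lets half of the blue splat behind it show through, so the pixel ends up an even mix of the two. During training, it is the parameters feeding this blend (position, size, opacity, colour) that are nudged until the rendered pixels match the photos.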

3D Gaussians Under Construction

Compared to similar technologies which came before it, Gaussian Splatting trains much faster to reach its final result. Its output is also more faithful to the source photographs. These qualities matter in any media project, especially one as large in scope as many educational projects. Consider, for instance, an archaeology project that seeks to capture in 3D the environments of many sites around the world. In this case, the speed and accuracy of Gaussian Splatting provide a clear advantage over previous scanning technologies.

Building a Scene

In order to train any model, we need some data. For our Gaussian Splats, this means video. Capturing video to create a 3D model is not a difficult process but it should be a comprehensive one. Walking around a given scene, the camera operator must strive for as much coverage as possible. This means not only shooting in all relevant directions, but also at different distances from objects and at several different heights. One option is to deploy a 360° camera, enabling an omnidirectional shoot in a fairly small amount of time. The spherical video may then be exported as several “framed” videos or stills – flat perspectives from within the 360° sphere.
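To give a sense of what a “framed” perspective involves, here is a minimal Python sketch of the underlying geometry: it finds where a given viewing direction lands in a flat (equirectangular) 360° frame. The conventions are assumptions made for illustration, namely that the forward direction sits at the centre of the frame, yaw turns to the right and pitch tilts upwards; real cameras and export tools may use different conventions, and the function names are hypothetical.

```python
import math


def yaw_pitch_to_direction(yaw_deg, pitch_deg):
    """Unit viewing direction for a yaw (turn right) and pitch (tilt up) in degrees.

    Assumed axes: +z is forward, +x is right, +y is up.
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    x = math.cos(pitch) * math.sin(yaw)
    y = math.sin(pitch)
    z = math.cos(pitch) * math.cos(yaw)
    return (x, y, z)


def direction_to_equirect_pixel(direction, width, height):
    """Pixel (u, v) in an equirectangular frame that a direction points at.

    u runs left to right across the full 360 degrees of longitude;
    v runs top to bottom across the 180 degrees of latitude.
    """
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    y = max(-1.0, min(1.0, y))  # guard asin against rounding error

    lon = math.atan2(x, z)  # -pi..pi, 0 is straight ahead
    lat = math.asin(y)      # -pi/2..pi/2, positive is up

    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return (u, v)


if __name__ == "__main__":
    # Centres of four framed views, a quarter turn apart, in a 4096 x 2048 frame.
    for yaw in (-90, 0, 90):
        centre = direction_to_equirect_pixel(yaw_pitch_to_direction(yaw, 0), 4096, 2048)
        print(yaw, centre)
```

A full export repeats this lookup for every pixel of each flat view, which is why a single 360° recording can yield many overlapping perspectives for training.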

Framed Perspectives from our 360 Camera

Once the video is ready, we can train our model using either a cloud service or a local machine. At LLInC, we have (thus far) opted for the latter, using the free Windows program Postshot to train on our own Nvidia hardware. We chose this route because local training affords much more granular control and yields better results than the cloud options. Our splats can be monitored in real time and their parameters can even be adjusted on the fly without restarting the entire process. The fact that the cloud options are currently commercial software was another significant consideration. That said, the cloud route is currently more user-friendly, since splats can be produced entirely on a smartphone and hosted on the web. From our research thus far, Luma.ai and Polycam are the apparent leaders in cloud-based Gaussian Splatting, although new commercial options are appearing all the time.

Postshot Training Underway

Unity Scene with Gaussian Splat

Training times depend on the complexity and size of the scene, as well as the power of the computer used to train the model. The alleyway near our LLInC office in The Hague required about 8 hours of training. The result was rather impressive and we were easily able to export the final 3D model for use in Unity – our game development software of choice. After making the scene navigable in Unity, we exported it to a VR headset for further exploration.

Use in Education

This technology has many promising applications within the field of immersive education. At LLInC, we are already working on a pilot project with the Law Faculty in which elements of an urban scene can be modified to test the public’s response to planning interventions. But the sky is the limit here – any application in which a 3D environment is desirable is now much easier to create than before. As Gaussian Splatting becomes widespread, the tools to deploy it will likewise become easier to use.

Behind the scenes, AI models are changing production pipelines in numerous fields. Media production for education is no exception. LLInC’s Data and Media team will be keeping tabs on the aftermath of this year’s SIGGRAPH conference, where radiance fields (the group to which Gaussian Splatting belongs) made up four distinct categories for the first time ever. Stay tuned!
