GeoAI in 3D with PyTorch3D
Introducing a PyTorch3D fork to support workflows on 3D meshes with multiple textures and with vertices in real-world coordinates.
This article may be for you if you are interested in mesh segmentation, data engineering for 3D data structures, or learning about our pull request (PR) to add these features to PyTorch3D.
TL;DR
The AI Prototypes Team at Esri is sharing a few feature enhancements as a series of PRs to the PyTorch3D API. These features enable input/output (I/O) for meshes in obj format with multiple textures and with vertex coordinates that represent real-world geometries. For GeoAI work, they support tasks across the mesh segmentation pipeline, such as creating training data, extracting features, and applying labels to mesh faces during inference. As an open-source contribution, we hope these features help the community advance the state of 3D mesh segmentation.
The Code
We split the feature enhancements into three branches in the Esri/PyTorch3D fork, as each contains orthogonal changes relative to the current main branch. Of the three, we recommend Branch 3, multitexture-obj-high-precision, as the primary branch to consider integrating into PyTorch3D main, since it contains all the changes in Branches 1 and 2 plus support for vertices in floating point 64. However, each branch can stand on its own.
- The Fork: github.com/Esri/pytorch3d
- Branch 1 (PR#1572): multitexture-obj-io-support
- Branch 2 (PR#1573): multitexture-obj-point-sampler
- Branch 3 (PR#1574): multitexture-obj-high-precision
- Demo Notebook: github.com/Esri/pytorch3d/…/tutorials/…
Introduction
At Esri, I am engaged in research and development as a data scientist and product engineer on the AI Prototypes Team. One of my key tasks has been to develop a data engineering pipeline that allows my teammates to manipulate meshes for a wide variety of GeoAI workflows. In one workflow, for example, we apply point segmentation models to point clouds sampled from mesh faces. In others, we operate directly on the mesh. In general, the goal is a fast, accurate, end-to-end workflow for training models that help us segment meshes of cityscapes into semantic classes such as building, tree, or any other class we choose to focus on.
Issue
Of the many open-source libraries available for working with 3D data structures, which are the best fit? Although this is a question we evaluate frequently, we found PyTorch3D to be a natural and obvious fit, since we like working with PyTorch and libraries such as PyTorch Lightning. The PyTorch3D introduction says all you need to know about why: “PyTorch3D provides efficient, reusable components for 3D Computer Vision research with PyTorch”.
It is true: PyTorch3D helps us do amazing things in applied deep learning for 3D. Despite its many great features, however, I found that the current open-source library lacks a few key capabilities. For example, PyTorch3D does not fully support meshes that reference multiple texture files. In addition, when using PyTorch3D to sample a point cloud from a mesh, it is difficult to link each point to the mesh face it was sampled from.
Further, when working with meshes that can span multiple city blocks, or multiple cities across a region, the ability to process all available texture files and to link points to faces are critical requirements.
Solution
At first, we built helper functions around PyTorch3D to meet our needs for dealing with multiple textures. Later, we started patching PyTorch3D with custom functions to produce tensors that link sampled points to their origin faces. However, our team quickly realized that custom patches would not scale unless the solutions were implemented within PyTorch3D itself, so that is exactly what we did.
This section provides a detailed summary of the feature enhancements.
High-level Summary of Feature Enhancements
- multitexture-obj-io-support: This branch establishes multitexture support for obj meshes by modifying pytorch3d.io.obj_io. Currently, PyTorch3D does not fully support reading and writing obj meshes with multiple texture files; in such cases, only the first of many textures is read into memory. For meshes that contain varying textures spread across multiple files, texture sampling may then produce undesirable results, e.g., vegetation textures sampled onto building faces (Figures 2a and 2b). Specifically, we created pytorch3d.io.obj_io.subset_obj and modified pytorch3d.io.obj_io.save_obj to implement these features. In addition, a new utility module, pytorch3d.utils.obj_utils, consolidates multiple helper functions used to support obj subsetting and validation operations. This branch and the following branches are updated to support recent changes to the API, which supports I/O for face vert normals as of PyTorch3D release 0.7.4. Addresses multiple existing issues, including #694, #1017, and #1277.
- multitexture-obj-point-sampler: This branch includes all changes in multitexture-obj-io-support and adds support for sampling points directly from meshes in obj format. It introduces a new function, pytorch3d.ops.sample_points_from_obj, that leverages core functions that already exist in pytorch3d.ops.sample_points_from_meshes. Sampling points directly from an obj with many large texture files can be advantageous over using a Meshes data structure, since the Meshes structure concatenates textures in memory. There are three key features to highlight. First, this branch allows both sample_points_from_meshes and sample_points_from_obj to return a mappers tensor that links each sampled point to the face it was sampled from. Second, it allows one to force sample_points_from_obj to return at least one point from every face, regardless of face area, with sample_all_faces=True. Third, it allows a user to specify the density of the desired point cloud with min_sampling_factor rather than a fixed number of points. Linked to issue #1571.
- multitexture-obj-high-precision: Adds support to pytorch3d.io.obj_io and pytorch3d.ops.sample_points_from_obj for reading obj vertices with floating point precision 64, i.e., double tensors. Although PyTorch3D currently allows one to control the number of decimal places on output, if the vertex coordinates of a mesh in an obj are based on real-world coordinates, the vertex values can lose significant numeric precision. In practice, this means that vertex coordinates of mesh faces could be offset by up to a meter or more. Linked to issue #1570.
Example of mesh vertices read from an obj file with and without the multitexture-obj-high-precision branch.
In the following code snippet, we show two tensors read from the same obj file. The first tensor shows each of the y-vertex values (middle column) rounded to the nearest 0.5. The second tensor shows the same vertex values at or closer to their actual positions, without rounding. For reference, the source objs have a spatial reference of ETRS 1989 UTM Zone 32 (for more, see Spatial References at developers.arcgis.com).
# example obj verts sampled without high-precision fork
tensor([
[692202.1875, 5336173.5000, 512.4358],
[692202.8125, 5336174.0000, 512.1414],
[692202.1250, 5336174.0000, 512.5216],
...,
[692324.2500, 5336135.5000, 510.2307],
[692349.1875, 5336142.0000, 526.5537],
[692352.4375, 5336169.5000, 522.2985]]
)
# example obj verts sampled with high-precision fork
tensor([
[692202.1617, 5336173.5623, 512.4358],
[692202.7912, 5336174.0104, 512.1414],
[692202.1285, 5336174.0917, 512.5216],
...,
[692324.2332, 5336135.5843, 510.2307],
[692349.1590, 5336142.1059, 526.5537],
[692352.4089, 5336169.6091, 522.2984]],
dtype=torch.float64
)
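The rounding in the first tensor is exactly what single precision predicts at these magnitudes. A quick, self-contained check in plain Python (no PyTorch3D needed; the sample values are taken from the high-precision tensor above):

```python
import struct

def round_trip_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest representable float32."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Values from the high-precision tensor above.
easting = 692202.1617    # ~6.9e5 m: float32 spacing here is 2**-4 = 0.0625 m
northing = 5336173.5623  # ~5.3e6 m: float32 spacing here is 2**-1 = 0.5 m

print(round_trip_float32(easting))   # 692202.1875 -- snapped to a 1/16 m grid
print(round_trip_float32(northing))  # 5336173.5 -- snapped to a 0.5 m grid
```

The round-tripped values match the low-precision tensor exactly, which is why float64 I/O matters for UTM-scale coordinates.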
Detailed Summary of Feature Enhancements
1. pytorch3d.ops.sample_points_from_obj() is a new function that allows a user to sample at least one point from every face, with a new auto-sampling feature that determines the number of points to sample. Although new, sample_points_from_obj repackages existing PyTorch3D functionality from pytorch3d.ops.sample_points_from_meshes(). Importantly, its enhancements allow sampling of all faces governed by a minimum sampling factor, and a point-to-face index mappers tensor links each point to its origin face.
2. pytorch3d.ops.sample_points_from_meshes() is modified to enable sample_points_from_obj() by grouping key capabilities into helper functions that can be leveraged by both. Further, sample_points_from_meshes is modified slightly to optionally return point-to-face index mappers, which allow a user to recover the face for each point; however, modifications to sample_points_from_meshes are kept minimal and do not provide the new features found in sample_points_from_obj.
3. pytorch3d.io.obj_io.subset_obj()
is a new function that allows a user to subset an obj mesh based on selected face indices. For example, if a workflow predicts a per-face classification, this function can be used to subset the mesh for only those faces.
4. pytorch3d.io.obj_io.save_obj()
and pytorch3d.io.obj_io.load_obj_as_meshes()
provide integrated multi-texture obj support, allowing users to read and process all available textures; PyTorch3D previously read only the first texture in an input obj file with multiple textures, which can lead to undesirable texture sampling and output.
5. pytorch3d.utils.obj_utils
provides common utilities for use in both pytorch3d.ops
and pytorch3d.io.obj_io
such as consolidated obj validation (_validate_obj) and the core implementation for subsetting obj data.
6. multitexture-obj-high-precision: this branch includes all features from multitexture-obj-io-support and multitexture-obj-point-sampler and introduces the option to load obj vertices with high_precision set to True, meaning that full precision for real-world coordinates and geometries is supported. Additional utility functions are modified to support dtype expectations for floating point 32 and 64 throughout the code base, including Transform3d and the Meshes structure class. This is a notable feature because the default floating point 32 can result in a significant loss of precision for each vertex in the mesh. We recommend this high-precision branch over the other two due to its support for real-world coordinates.
Example Code
For quick reference, this section contains code snippets that demonstrate a few key changes to the API.
Install and Import Libraries
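The fork is installed from source rather than from PyPI. A plausible setup is sketched below; the shell commands are an assumption based on the repository and branch names above, so check the repo's install notes for build prerequisites (e.g., a C++/CUDA toolchain).

```python
# From a shell (assumed commands; the fork is not published on PyPI):
#   git clone https://github.com/Esri/pytorch3d.git
#   cd pytorch3d
#   git checkout multitexture-obj-high-precision   # recommended branch
#   pip install -e .

import importlib.util

def fork_available() -> bool:
    """Report whether pytorch3d is importable in the current environment."""
    return importlib.util.find_spec("pytorch3d") is not None

print("pytorch3d importable:", fork_available())
```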
Loading an obj with high-precision
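A minimal sketch of loading with full precision. The high_precision flag is the one described above; its exact signature lives in the fork (upstream load_obj does not accept it), and the import is deferred so the snippet can be read without the fork installed.

```python
def load_obj_high_precision(obj_path: str):
    """Load an obj keeping float64 (real-world) vertex coordinates."""
    # Deferred import: requires the Esri/pytorch3d fork,
    # branch multitexture-obj-high-precision (PR#1574).
    from pytorch3d.io import load_obj

    verts, faces, aux = load_obj(
        obj_path,
        load_textures=True,
        high_precision=True,  # fork-only flag: verts return as torch.float64
    )
    return verts, faces, aux
```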
Sampling Points from an obj
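A sketch of sampling directly from an obj file. sample_points_from_obj and both keyword arguments come from the branch description above; the full signature and return layout are defined in PR#1573, so treat this call shape as an assumption.

```python
def sample_points_with_mappers(obj_path: str,
                               min_sampling_factor: int = 10,
                               sample_all_faces: bool = True):
    """Sample a point cloud straight from an obj, keeping point-to-face links."""
    # Deferred import: sample_points_from_obj exists only in the fork
    # (branch multitexture-obj-point-sampler, PR#1573).
    from pytorch3d.ops import sample_points_from_obj

    # min_sampling_factor sets point density instead of a fixed point count;
    # sample_all_faces=True guarantees at least one point per face, however
    # small its area. Among the returns is a `mappers` tensor linking each
    # sampled point back to the index of the face it came from.
    return sample_points_from_obj(
        obj_path,
        min_sampling_factor=min_sampling_factor,
        sample_all_faces=sample_all_faces,
    )
```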
Subsetting an obj
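A sketch of subsetting. subset_obj is the new function from PR#1572; the argument layout below (loaded obj data plus a tensor of face indices) is an assumption for illustration, so check the PR for the exact signature.

```python
def subset_by_faces(obj_path: str, face_idxs):
    """Reduce an obj to selected faces, e.g. faces predicted as 'building'."""
    # Deferred imports: subset_obj is new in the fork (PR#1572).
    import torch
    from pytorch3d.io import load_obj
    from pytorch3d.io.obj_io import subset_obj

    verts, faces, aux = load_obj(obj_path, load_textures=True)
    idx = torch.as_tensor(face_idxs, dtype=torch.int64)

    # Assumed call shape: restrict verts/faces/textures to the chosen faces,
    # so per-face predictions can be exported as their own mesh.
    return subset_obj(verts, faces, aux, idx)
```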
Saving an obj
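A sketch of saving with all textures preserved. The keyword names below are upstream save_obj arguments; the fork (PR#1572) extends the function so that the multiple texture images carried in aux are written out rather than just the first. Whether it takes them via texture_map or another argument is an assumption here.

```python
def save_multitexture_obj(out_path: str, verts, faces, aux):
    """Write an obj (with .mtl and texture references) keeping all textures."""
    # Deferred import: multi-texture save support comes from the fork (PR#1572);
    # upstream save_obj writes at most one texture image.
    from pytorch3d.io import save_obj

    save_obj(
        out_path,
        verts=verts,                     # float64 verts preserved by the fork
        faces=faces.verts_idx,           # face-vertex indices from load_obj
        verts_uvs=aux.verts_uvs,         # per-vertex uv coordinates
        faces_uvs=faces.textures_idx,    # per-face uv indices
        texture_map=aux.texture_images,  # assumed: fork accepts all textures here
    )
```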
Next Steps
With this PyTorch3D fork and other workflows, our team is actively developing methods that apply a variety of deep learning workflows to segment both meshes and point clouds. A concrete application of this fork is preparing data for multi-class classification tasks as provided by the Hessigheim (H3D) Benchmark — a benchmark that we hope to test soon. If you are not familiar with H3D, I encourage you to consider participating. For more, see the Hessigheim Benchmark.
Conclusion
Support for multi-textured meshes with high-precision vertices in obj format is a critical capability for applied deep learning on scenes projected into real-world spatial coordinates. For the AI Prototypes Team at Esri, we modified PyTorch3D to provide those capabilities and are open sourcing those modifications.
With our PyTorch3D fork, one can apply point cloud segmentation to mesh segmentation problems or manipulate multi-textured meshes and labels in obj data structures. These are key features for all interested users, and they may be best sustained as part of the main PyTorch3D library over the long run.
About the Author
Justin Chae is the author of the PyTorch3D fork described in this article. He is a Consultant in Esri’s Emerging Technology Markets Team and Product Engineer in Esri’s AI Prototypes Team where he is engaged in project management and artificial intelligence research and development. GeoAI is just one of many areas that Esri specializes in — if you are interested in learning more, please feel free to reach out to jchae@esri.com or your main point of contact at Esri.
Contributors
- Team Lead: Dmitry Kudinov, Sr. Principal Data Scientist, dkudinov@esri.com
- Franziska Lippoldt, Data Scientist, flippoldt@esri.com
- Hakeem Frank, Sr. Data Scientist, hfrank@esri.com
- Caleb Buffa, Data Scientist, cbuffa@esri.com
Acknowledgements
Special thanks to the Institute for Photogrammetry, the team behind the H3D Benchmark, and our wider team of collaborators and supporters.