GeoAI in 3D with PyTorch3D
Introducing a PyTorch3D fork to support workflows on 3D meshes with multiple textures and with vertices in real-world coordinates.
This article may be for you if you are interested in mesh segmentation, data engineering for 3D data structures, or learning about our pull request (PR) to add these features to PyTorch3D.
TL;DR
The AI Prototypes Team at Esri is sharing a few feature enhancements as a series of PRs to the PyTorch3D API. These features enable input/output (I/O) for meshes in obj format with multiple textures and with vertex coordinates that represent real-world geometries. For GeoAI work, they support tasks across the mesh segmentation pipeline, such as creating training data, extracting features, and applying labels to mesh faces during inference. As an open-source contribution, we hope these features help the community advance the state of 3D mesh segmentation.
The Code
We split the feature enhancements into three branches in the Esri/PyTorch3D fork, as each contains orthogonal changes relative to the current main branch. Of the three, we recommend Branch 3, multitexture-obj-high-precision, as the primary branch to consider integrating into PyTorch3D main, since it contains all the changes in Branches 1 and 2 plus support for vertices in floating point 64. However, each branch can stand on its own.
- The Fork: github.com/Esri/pytorch3d
- Branch 1 (PR#1572): multitexture-obj-io-support
- Branch 2 (PR#1573): multitexture-obj-point-sampler
- Branch 3 (PR#1574): multitexture-obj-high-precision
- Demo Notebook: github.com/Esri/pytorch3d/…/tutorials/…
Introduction
At Esri, I am engaged in research and development as a data scientist and product engineer on the AI Prototypes Team. One of my key tasks has been to develop a data engineering pipeline that allows my teammates to manipulate meshes for a wide variety of GeoAI workflows. In one workflow, for example, we apply point segmentation models to point clouds sampled from mesh faces. In others, we operate directly on the mesh. In general, the goal is a fast, accurate, end-to-end workflow for training models that help us segment meshes of cityscapes into semantic classes such as building, tree, or any other class we choose to focus on.
Issue
Of the many open-source libraries available for working with 3D data structures, which are the best fit? Although this is a question we evaluate frequently, we found PyTorch3D to be a natural and obvious fit, since we like working with PyTorch and libraries such as PyTorch Lightning. The PyTorch3D introduction says all you need to know about why: “PyTorch3D provides efficient, reusable components for 3D Computer Vision research with PyTorch”.
It is true: PyTorch3D helps us do amazing things in applied deep learning for 3D. Despite its many great features, however, I found that the current open-source library lacks a few key capabilities. For example, PyTorch3D does not fully support meshes that reference multiple texture files. In addition, when using PyTorch3D to sample a point cloud from a mesh, it is difficult to link each point to the mesh face it was sampled from.
Further, when working with meshes that can span multiple city blocks, or multiple cities across a region, the ability to process all available texture files and to link points to faces are critical requirements.
Solution
At first, we built helper functions around PyTorch3D to meet our needs for dealing with multiple textures. Later, we started patching PyTorch3D with custom functions to produce tensors that link sampled points to their origin faces. However, our team quickly realized that custom patches would not scale unless the solutions were implemented within PyTorch3D itself, so that is exactly what we did.
This section provides a detailed summary of the feature enhancements.
High-level Summary of Feature Enhancements
- multitexture-obj-io-support: This branch establishes multitexture support for obj meshes by modifying pytorch3d.io.obj_io. Currently, PyTorch3D does not fully support reading and writing obj meshes with multiple texture files; in such cases, only the first of many textures is read into memory. For meshes that contain varying textures spread across multiple files, texture sampling may then produce undesirable results, e.g., vegetation textures sampled onto building faces (Figures 2a and 2b). Specifically, we created pytorch3d.io.obj_io.subset_obj and modified pytorch3d.io.obj_io.save_obj to implement these features. In addition, a new utility module, pytorch3d.utils.obj_utils, consolidates multiple helper functions used to support obj subsetting and validation operations. This branch and the following branches are updated to support recent changes to the API, which supports I/O for face vert normals as of PyTorch3D release 0.7.4. Addresses multiple existing issues, including #694, #1017, and #1277.
- multitexture-obj-point-sampler: This branch includes all changes in multitexture-obj-io-support and adds support for sampling points directly from meshes in obj format. It introduces a new function, pytorch3d.ops.sample_points_from_obj, that leverages core functions that already exist in pytorch3d.ops.sample_points_from_meshes. Sampling points directly from an obj with many large texture files can be advantageous over using a Meshes data structure, since the Meshes structure concatenates textures in memory. There are three key features to highlight. First, this branch allows both sample_points_from_meshes and sample_points_from_obj to return a mappers tensor that links each sampled point to the face it was sampled from. Second, it allows one to force sample_points_from_obj to return at least one point from every face, regardless of face area, with sample_all_faces=True. Third, it allows a user to specify the density of the desired point cloud with min_sampling_factor rather than a fixed number of points. Linked to issue #1571.
- multitexture-obj-high-precision: Adds support to pytorch3d.io.obj_io and pytorch3d.ops.sample_points_from_obj for reading obj vertices with floating point precision 64, i.e., double tensors. Although PyTorch3D currently allows one to control the number of decimal places on output, if the vertex coordinates of a mesh in an obj are based on real-world coordinates, the vertex values can lose significant numeric precision. In practice, this means that vertex coordinates of mesh faces could be offset by up to a meter or more. Linked to issue #1570.
Example of mesh vertices read from an obj file with and without the multitexture-obj-high-precision branch.
In the following code snippet, we show two tensors read from the same obj file. The first tensor shows each of the y-vertex values (middle column) rounded to the nearest 0.5. The second tensor shows the same vertex values at or closer to their actual positions, without rounding. For reference, the source objs have a spatial reference of ETRS 1989 UTM Zone 32 (for more, see Spatial References at developers.arcgis.com).
# example obj verts sampled without high-precision fork
tensor([
[692202.1875, 5336173.5000, 512.4358],
[692202.8125, 5336174.0000, 512.1414],
[692202.1250, 5336174.0000, 512.5216],
...,
[692324.2500, 5336135.5000, 510.2307],
[692349.1875, 5336142.0000, 526.5537],
[692352.4375, 5336169.5000, 522.2985]]
)
# example obj verts sampled with high-precision fork
tensor([
[692202.1617, 5336173.5623, 512.4358],
[692202.7912, 5336174.0104, 512.1414],
[692202.1285, 5336174.0917, 512.5216],
...,
[692324.2332, 5336135.5843, 510.2307],
[692349.1590, 5336142.1059, 526.5537],
[692352.4089, 5336169.6091, 522.2984]],
dtype=torch.float64
)
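The rounding in the first tensor is exactly what single precision predicts at these magnitudes. A quick, self-contained check in plain Python (no PyTorch3D needed; the sample values are taken from the high-precision tensor above):

```python
import struct

def round_trip_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest representable float32."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Values from the high-precision tensor above.
easting = 692202.1617    # ~6.9e5 m: float32 spacing here is 2**-4 = 0.0625 m
northing = 5336173.5623  # ~5.3e6 m: float32 spacing here is 2**-1 = 0.5 m

print(round_trip_float32(easting))   # 692202.1875 -- snapped to a 1/16 m grid
print(round_trip_float32(northing))  # 5336173.5 -- snapped to a 0.5 m grid
```

The round-tripped values match the low-precision tensor exactly, which is why float64 I/O matters for UTM-scale coordinates.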
Detailed Summary of Feature Enhancements
1. pytorch3d.ops.sample_points_from_obj() is a new function that allows a user to sample at least one point from every face, with a new auto-sampling feature that determines the number of points to sample. Although new, sample_points_from_obj repackages existing PyTorch3D functionality from pytorch3d.ops.sample_points_from_meshes(). Importantly, its enhancements allow sampling of all faces governed by a minimum sampling factor, and a point-to-face index mappers tensor links each point to its origin face.
2. pytorch3d.ops.sample_points_from_meshes() is modified to enable sample_points_from_obj() by grouping key capabilities into helper functions that can be leveraged by both. Further, sample_points_from_meshes is modified slightly to optionally return point-to-face index mappers, which allow a user to recover the face for each point; however, modifications to sample_points_from_meshes are kept minimal and do not provide the new features found in sample_points_from_obj.
3. pytorch3d.io.obj_io.subset_obj()
is a new function that allows a user to subset an obj mesh based on selected face indices. For example, if a workflow predicts a per-face classification, this function can be used to subset the mesh for only those faces.
4. pytorch3d.io.obj_io.save_obj()
and pytorch3d.io.obj_io.load_obj_as_meshes()
provide integrated multi-texture obj support, allowing users to read and process all available textures; PyTorch3D previously read only the first texture in an input obj file with multiple textures, which can lead to undesirable texture sampling and output.
5. pytorch3d.utils.obj_utils
provides common utilities for use in both pytorch3d.ops
and pytorch3d.io.obj_io
such as consolidated obj validation (_validate_obj) and the core implementation for subsetting obj data.
6. multitexture-obj-high-precision: this branch includes all features from multitexture-obj-io-support and multitexture-obj-point-sampler and introduces the option to load obj vertices with high_precision set to True, meaning that full precision for real-world coordinates and geometries is supported. Additional utility functions are modified to support dtype expectations for floating point 32 and 64 throughout the code base, including Transform3d and the Meshes structure class. This is a notable feature because the default floating point 32 can result in a significant loss of precision for each vertex in the mesh. We recommend this high-precision branch over the other two due to its support for real-world coordinates.
Example Code
For quick reference, this section contains code snippets that demonstrate a few key changes to the API.
Install and Import Libraries
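The fork is installed from source rather than from PyPI. A plausible setup is sketched below; the shell commands are an assumption based on the repository and branch names above, so check the repo's install notes for build prerequisites (e.g., a C++/CUDA toolchain).

```python
# From a shell (assumed commands; the fork is not published on PyPI):
#   git clone https://github.com/Esri/pytorch3d.git
#   cd pytorch3d
#   git checkout multitexture-obj-high-precision   # recommended branch
#   pip install -e .

import importlib.util

def fork_available() -> bool:
    """Report whether pytorch3d is importable in the current environment."""
    return importlib.util.find_spec("pytorch3d") is not None

print("pytorch3d importable:", fork_available())
```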
Loading an obj with high-precision
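A minimal sketch of loading with full precision. The high_precision flag is the one described above; its exact signature lives in the fork (upstream load_obj does not accept it), and the import is deferred so the snippet can be read without the fork installed.

```python
def load_obj_high_precision(obj_path: str):
    """Load an obj keeping float64 (real-world) vertex coordinates."""
    # Deferred import: requires the Esri/pytorch3d fork,
    # branch multitexture-obj-high-precision (PR#1574).
    from pytorch3d.io import load_obj

    verts, faces, aux = load_obj(
        obj_path,
        load_textures=True,
        high_precision=True,  # fork-only flag: verts return as torch.float64
    )
    return verts, faces, aux
```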
Sampling Points from an obj
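A sketch of sampling directly from an obj file. sample_points_from_obj and both keyword arguments come from the branch description above; the full signature and return layout are defined in PR#1573, so treat this call shape as an assumption.

```python
def sample_points_with_mappers(obj_path: str,
                               min_sampling_factor: int = 10,
                               sample_all_faces: bool = True):
    """Sample a point cloud straight from an obj, keeping point-to-face links."""
    # Deferred import: sample_points_from_obj exists only in the fork
    # (branch multitexture-obj-point-sampler, PR#1573).
    from pytorch3d.ops import sample_points_from_obj

    # min_sampling_factor sets point density instead of a fixed point count;
    # sample_all_faces=True guarantees at least one point per face, however
    # small its area. Among the returns is a `mappers` tensor linking each
    # sampled point back to the index of the face it came from.
    return sample_points_from_obj(
        obj_path,
        min_sampling_factor=min_sampling_factor,
        sample_all_faces=sample_all_faces,
    )
```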
Subsetting an obj
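A sketch of subsetting. subset_obj is the new function from PR#1572; the argument layout below (loaded obj data plus a tensor of face indices) is an assumption for illustration, so check the PR for the exact signature.

```python
def subset_by_faces(obj_path: str, face_idxs):
    """Reduce an obj to selected faces, e.g. faces predicted as 'building'."""
    # Deferred imports: subset_obj is new in the fork (PR#1572).
    import torch
    from pytorch3d.io import load_obj
    from pytorch3d.io.obj_io import subset_obj

    verts, faces, aux = load_obj(obj_path, load_textures=True)
    idx = torch.as_tensor(face_idxs, dtype=torch.int64)

    # Assumed call shape: restrict verts/faces/textures to the chosen faces,
    # so per-face predictions can be exported as their own mesh.
    return subset_obj(verts, faces, aux, idx)
```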
Saving an obj
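A sketch of saving with all textures preserved. The keyword names below are upstream save_obj arguments; the fork (PR#1572) extends the function so that the multiple texture images carried in aux are written out rather than just the first. Whether it takes them via texture_map or another argument is an assumption here.

```python
def save_multitexture_obj(out_path: str, verts, faces, aux):
    """Write an obj (with .mtl and texture references) keeping all textures."""
    # Deferred import: multi-texture save support comes from the fork (PR#1572);
    # upstream save_obj writes at most one texture image.
    from pytorch3d.io import save_obj

    save_obj(
        out_path,
        verts=verts,                     # float64 verts preserved by the fork
        faces=faces.verts_idx,           # face-vertex indices from load_obj
        verts_uvs=aux.verts_uvs,         # per-vertex uv coordinates
        faces_uvs=faces.textures_idx,    # per-face uv indices
        texture_map=aux.texture_images,  # assumed: fork accepts all textures here
    )
```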
Next Steps
With this PyTorch3D fork and other workflows, our team is actively developing methods that apply a variety of deep learning workflows to segment both meshes and point clouds. A concrete application of this fork is preparing data for multi-class classification tasks as provided by the Hessigheim (H3D) Benchmark — a benchmark that we hope to test soon. If you are not familiar with H3D, I encourage you to consider participating. For more, see the Hessigheim Benchmark.
Conclusion
Support for multi-textured meshes with high-precision vertices in obj format is a critical capability for applied deep learning on scenes projected into real-world spatial coordinates. For the AI Prototypes Team at Esri, we modified PyTorch3D to provide those capabilities and are open sourcing those modifications.
With our PyTorch3D fork, one can apply point cloud segmentation to mesh segmentation problems or manipulate multi-textured meshes and labels in obj data structures. These are key features for all interested users, and they may be best sustained as part of the main PyTorch3D library over the long run.
About the Author
Justin Chae is the author of the PyTorch3D fork described in this article. He is a Consultant in Esri’s Emerging Technology Markets Team and Product Engineer in Esri’s AI Prototypes Team where he is engaged in project management and artificial intelligence research and development. GeoAI is just one of many areas that Esri specializes in — if you are interested in learning more, please feel free to reach out to jchae@esri.com or your main point of contact at Esri.
Contributors
- Team Lead: Dmitry Kudinov, Sr. Principal Data Scientist, dkudinov@esri.com
- Franziska Lippoldt, Data Scientist, flippoldt@esri.com
- Hakeem Frank, Sr. Data Scientist, hfrank@esri.com
- Caleb Buffa, Data Scientist, cbuffa@esri.com
Acknowledgements
Special thanks to the Institute for Photogrammetry, the team behind the H3D Benchmark, and our wider team of collaborators and supporters.