Open-sourcing 3D avatar reconstruction from a single image
This README focuses on how to run the code; for more detailed information, please read the report.
The Phorhum paper from Google showed astonishing results in reconstructing 3D avatars from a single image. This work tries to implement their proposed architecture with TensorFlow in an open-source fashion. It features a dataset creator, the reimplemented network architecture, a trained model, and a point cloud viewer. While the results are far from those Google showed, this could be used as a starting point to build upon. This was part of a two-month research internship at the Human-Computer Interaction department of the University of Tübingen.
To predict the surface of a human from a single image, two datasets are needed: an image dataset containing a human, and a signed distance field (SDF) point cloud dataset with color and normal information. We built the datasets from scratch since there aren't any ready-to-use datasets available. We use the Microsoft Rocketbox avatar dataset as a starting point. The image dataset was constructed by rendering the avatar models from the avatar dataset within environments that are lit with HDRs. For the SDF dataset we collected 1 million points per avatar, split into 500k near points sampled on the mesh and 500k far points sampled around the mesh and within the unit sphere.
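The near/far sampling split described above can be sketched as follows. This is a hedged illustration only, not the project's actual exporter code: an analytic sphere stands in for an avatar mesh so the signed distance is computable in plain NumPy, and all names and parameters are made up for the example.

```python
import numpy as np

def sample_sdf_points(n_near=500, n_far=500, noise=0.01, seed=0):
    """Sketch of the near/far SDF sampling split. An analytic sphere of
    radius 0.5 stands in for the avatar mesh (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Near points: on the surface, jittered slightly along the normal.
    dirs = rng.normal(size=(n_near, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    near = dirs * (0.5 + rng.normal(scale=noise, size=(n_near, 1)))
    # Far points: sampled uniformly inside the unit sphere around the shape.
    far = rng.normal(size=(n_far, 3))
    far /= np.linalg.norm(far, axis=1, keepdims=True)
    far *= rng.uniform(size=(n_far, 1)) ** (1 / 3)
    pts = np.concatenate([near, far])
    sdf = np.linalg.norm(pts, axis=1) - 0.5  # analytic signed distance
    return pts, sdf
```

For a real mesh, the signed distance would come from a mesh-distance library instead of the analytic formula used here.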
```
.
├── dataset
│   ├── exporter                    # Tools for dataset creation
│   │   ├── ... .py                 # Python scripts that create the dataset
│   │   └── ... .sh                 # Corresponding bash scripts that call the Python scripts
│   │
│   ├── loader                      # Tools for dataset loading
│   │   ├── imageDatasetLoader.py   # Image loader with plotting
│   │   └── avatarDatasetLoader.py  # Avatar OBJ and SDF loader with plotting
│   │
│   └── datasets
│       ├── hdrs                    # HDR dataset
│       ├── sdfs                    # SDF dataset
│       ├── images                  # Image dataset
│       ├── avatars                 # Avatar dataset
│       └── environments            # Environment dataset
```
A step-by-step guide to create all the needed datasets (HDRs, SDFs, images, avatars, and environments).
The final SDF and image datasets are needed to train the model. Contact me if you want my dataset (it was too big to upload to GitHub).
- Download the Rocketbox avatar dataset.
- Create the avatar OBJ dataset with the modified Mesh2 library by running `createOBJDataset.sh`
- Find the SDF dataset of near and far points for each avatar in the SDF dataset directory
- Download HDRs from Poly Haven and copy them into the HDR dataset directory
- Find environments of 3D photogrammetry-scanned scenes on Sketchfab and download them
- Preprocess the environments by creating a Blender scene for each environment with the center of the floor at (0, 0, 0)
- Store the Blender scenes in `environments`, with a subdirectory for each scene variation
- (Optional) Change the avatar and camera augmentation settings in the dataset creation scripts
- Run the avatar image dataset creation script
- Find the avatar image dataset in the image dataset directory
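The camera augmentation mentioned in the steps above could look like the following sketch. All parameter names and ranges are assumptions for illustration, not the project's actual settings: a camera position is sampled on a sphere around the avatar and turned into a camera-to-world matrix.

```python
import numpy as np

def random_camera_pose(rng, radius=(2.0, 3.5), elevation=(-10.0, 30.0)):
    """Hypothetical camera augmentation: sample a camera on a sphere
    around the avatar (at the origin) and return a 4x4 camera-to-world
    matrix. Ranges are illustrative, not the project's real settings."""
    az = rng.uniform(0.0, 2.0 * np.pi)
    el = np.radians(rng.uniform(*elevation))
    r = rng.uniform(*radius)
    eye = r * np.array([np.cos(el) * np.cos(az),
                        np.cos(el) * np.sin(az),
                        np.sin(el)])
    # Build a look-at frame pointing at the origin, world up = +z.
    forward = -eye / np.linalg.norm(eye)
    right = np.cross(forward, [0.0, 0.0, 1.0])
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, 0], pose[:3, 1], pose[:3, 2] = right, up, -forward
    pose[:3, 3] = eye
    return pose
```

Sampling a fresh pose per rendered image is one way to get the camera variation the image dataset needs.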
To infer the surface and its color from a single image, we use an end-to-end learnable neural network model that is inspired by Phorhum.
Given the time and computational constraints of the project, we could not reproduce the full model and implemented only subsets of it.
Below you find our implementation with modifications such as the surface projection loss.
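The surface projection idea can be illustrated with a minimal NumPy sketch: a query point is moved onto the predicted zero level set using the SDF value and its (unit) gradient. Here an analytic sphere SDF stands in for the learned geometry network; the function names are illustrative and this is not the loss implementation from `loss.py`.

```python
import numpy as np

def sphere_sdf(p, r=0.5):
    """Analytic sphere SDF standing in for the learned geometry network."""
    return np.linalg.norm(p, axis=-1) - r

def sphere_sdf_grad(p):
    """Gradient of the sphere SDF: the unit vector away from the center."""
    return p / np.linalg.norm(p, axis=-1, keepdims=True)

def project_to_surface(p):
    """Project points onto the zero level set:
    x' = x - f(x) * n(x), with n the unit SDF gradient."""
    return p - sphere_sdf(p)[..., None] * sphere_sdf_grad(p)
```

A surface projection loss can then supervise the network at these projected points, e.g. by comparing their predicted distance or color against ground truth; for a learned SDF the gradient would come from automatic differentiation and be normalized first.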
In the report we propose an attention lookup, which you can find in the `previous` directory.
```
.
├── network
│   ├── customLayer                 # Custom 3rd-party Keras layers
│   ├── previous                    # Previous network implementations (including attention)
│   ├── tests                       # Tests for losses and custom layers
│   ├── featureExtractorNetwork.py  # Implementation of the feature extractor network G
│   ├── geomertyNetwork.py          # Implementation of the geometry network f
│   ├── loss.py                     # Custom losses (including surface projection)
│   └── network.py                  # End-to-end network with training and inference
```
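How the two networks listed above fit together can be shown with a shape-only sketch: the feature extractor G maps the image to a feature vector, and the geometry network f maps a 3D point plus that feature to a signed distance and a color. The dense random layers below are stand-ins for illustration, not the real architectures (the real G produces pixel-aligned features).

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_extractor(image, feat_dim=256):
    """Stand-in for G: collapse the image to one global feature vector.
    (The real network produces pixel-aligned features.)"""
    flat = image.reshape(-1)
    w = rng.normal(scale=0.01, size=(flat.size, feat_dim))
    return flat @ w

def geometry_network(points, z):
    """Stand-in for f: maps (3D point, feature) -> (sdf, rgb color)."""
    n = points.shape[0]
    inp = np.concatenate([points, np.tile(z, (n, 1))], axis=1)
    w = rng.normal(scale=0.01, size=(inp.shape[1], 4))
    out = inp @ w
    return out[:, 0], out[:, 1:]  # sdf: (n,), color: (n, 3)

image = rng.uniform(size=(8, 8, 3))   # tiny stand-in image
points = rng.normal(size=(5, 3))      # query points
z = feature_extractor(image)
sdf, color = geometry_network(points, z)
```

The end-to-end model in `network.py` wires these two pieces together and trains them jointly.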
Although the training results are far from the results Google reports, the network does learn some kind of 3D avatar structure. Sadly, color and detailed geometry cannot be reconstructed. Examining the results more closely suggests an issue within the feature extractor network: the network is not able to infer color and geometry information from the images.
```
.
├── train
│   ├── logs         # TensorBoard logs
│   │
│   ├── models       # Previously trained models
│   │   ├── f        # Models for feature extractor network
│   │   └── g        # Models for geometry network
│   │
│   ├── train.ipynb  # Jupyter notebook to configure and start training
│   └── train.py     # Python script to configure and start training
```
The network can be trained by executing either `train.ipynb` or `train.py`.
We trained the network on a machine with 45 GiB of RAM, 8 CPUs, and an A6000 GPU with 48 GiB of VRAM for roughly 2 hours (about 6,200 steps).
For visualization purposes, a custom real-time 3D viewer was built that renders millions of points efficiently and helps the developer identify prediction errors. A client-server architecture was chosen, with a Flask server interacting directly with the React Three Fiber client.
```
.
├── viewer
│   ├── react   # React Three Fiber client
│   └── app.py  # Flask server
```
- Choose the correct model in the Flask server (`app.py`)
- Start the Flask server with `flask run` in the `viewer` directory
- Run the React Three Fiber client by calling `yarn dev` in the `viewer/react` directory
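A server endpoint for this setup could look like the following sketch. The route name and payload layout are assumptions for illustration, not the actual `app.py`: the idea is to serve a point cloud as flat position and color arrays that a Three.js buffer geometry on the client can consume directly.

```python
import numpy as np
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/points")
def points():
    """Hypothetical endpoint: serve a predicted point cloud with
    per-point color for the React Three Fiber client to render."""
    rng = np.random.default_rng(0)          # random stand-in prediction
    pts = rng.normal(size=(100, 3))
    rgb = rng.uniform(size=(100, 3))
    return jsonify({
        "positions": pts.ravel().tolist(),  # flat [x0, y0, z0, x1, ...]
        "colors": rgb.ravel().tolist(),
    })
```

Flat arrays keep the JSON payload simple and map directly onto typed-array buffer attributes on the client side.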
This project was part of a research internship at the Human-Computer Interaction department of the University of Tübingen. Big thanks to Efe Bozkir for his help and mentorship throughout the project, and to Thiemo Alldieck and his colleagues for their amazing work on Phorhum.