When you take a photo, the scene is compressed into two dimensions. All that beautiful 3Dness is lost when it's projected onto your camera's sensor.
Photo reconstruction is the process of recovering a 3D scene and camera parameters from a set of ordinary photographs. For example, from 4 photographs of my wife and dog working in bed, I was able to reconstruct the scene below using two software packages called OpenMVG and PMVS2.
Built with OpenMVG and PMVS2.
I wanted to try to use these same techniques to track my own movements. The idea is to record images with a head-mounted camera and run them through OpenMVG, a library and software pipeline for performing photo reconstruction.
The result should be something like a GPS path, but much more accurate, though less global. When everything goes well, the photo reconstruction can tell us not just which side of the street the camera was on, but which part of the sidewalk, how high the camera was, and even which direction it was looking.
Road Less Traveled
For this experiment I used Pivothead glasses, or as I call them, "nerd glasses." The Pivothead can be set to take time-lapse photographs at a rate of one per second. Below is an image from the glasses.
After completing a walk around the block I downloaded the hundreds of images that I had captured and began the process of reconstructing the scene and the camera positions.
The first step is to generate a list of all of the images with extracted metadata.
openMVG_main_CreateList -i ./ -o ./out/matches -d ~/openMVG/src/software/SfM/cameraSensorWidth/cameraGenerated.txt
Camera "Pivothead Video Recorder" model "Pivothead 8000" doesn't exist in the database
In this step, if the camera model is common and the image metadata has been preserved, the focal length is filled in automatically. In many of my trials the focal length was missing, so I simply guessed. In this case I set the focal length to 1000px.
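If you know your lens and sensor, you can do better than a flat guess. A minimal sketch of the standard conversion from a physical focal length to a focal length in pixels; the 4.0 mm lens and 6.17 mm sensor width below are made-up illustration values, not Pivothead specs:

```python
def focal_length_px(image_width_px, focal_mm, sensor_width_mm):
    # Scale the physical focal length by the sensor's
    # pixels-per-millimetre to get a focal length in pixel units.
    return image_width_px * focal_mm / sensor_width_mm

# Hypothetical numbers: a 4.0 mm lens on a 6.17 mm-wide sensor,
# producing a 3264 px-wide image.
print(round(focal_length_px(3264, 4.0, 6.17)))  # → 2116
```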
IMG_2469.JPG;3264;2448;1000;Pivothead Video Recorder;Pivothead 8000
When we get to the reconstruction step, focal length is one of the many parameters that OpenMVG will attempt to optimize. If the reconstruction looks good, you can probably assume that OpenMVG has arrived at the true focal length. In my case I was pretty far off; the true focal length was closer to 2600 pixels.
Describing Points in Our Image and Matching Them
The next step is to generate feature descriptors for each image. Currently OpenMVG ships with a command line program that uses the SIFT feature descriptor.
openMVG_main_computeMatches -i ./ -o ./out/matches
It is worth noting that while OpenMVG is permissively licensed under BSD, SIFT is not: it is patented and must be licensed for commercial use. It is possible to use other feature descriptors, and work has already been done to support binary features, but getting that running will take a bit more programming.
The products of this step are *.feat and *.desc files for each of the input images. The feat files contain the location, orientation, and scale of each feature in plain text. The desc files contain the binary data of the 128-dimensional feature descriptors.
Now that we have descriptions of the interesting points in our images, we need to identify matching points between images. Matching is done by finding pairs of features that are near each other in 128-dimensional space, then filtering them with a check that ensures they make sense geometrically.
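The nearest-neighbour part of that matching can be sketched in a few lines of NumPy. This is a brute-force version of Lowe's ratio test — keep a match only when the best candidate is clearly closer than the second best — not OpenMVG's actual implementation, which uses faster search and an additional geometric filter:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    # Brute-force nearest-neighbour matching with a ratio test:
    # a match is kept only if the best distance is noticeably
    # smaller than the second-best distance.
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches
```

The same code works for 128-dimensional SIFT descriptors; the descriptors just become longer rows.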
Generating a Point Cloud and Cameras
Our final step is to run the "Incremental Structure from Motion" command.
openMVG_main_IncrementalSfM -i ./ -m ./out/matches -o ./out
It outputs a PLY file containing a sparse point cloud and the camera positions. Here I've overlaid the point cloud and cameras onto an image from Google Maps. The green dots are the camera positions and the black dots are points in the scene that have been reconstructed. We can clearly see the outline of the garage and the building.
I have a bunch of ideas for future experiments. I would like to try a wider-angle lens like the one found on a GoPro. I'd also like to try to speed up the reconstruction process by being more selective in the feature-matching phase, only comparing images that are near each other in the sequence.
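That sequential-matching idea is simple to express: instead of comparing all n*(n-1)/2 image pairs, compare each frame only with the next few frames in capture order. A sketch, where the window size of 5 is an arbitrary choice:

```python
def sequential_pairs(n_images, window=5):
    # Restrict matching to images captured close together in time,
    # i.e. pair image i only with images i+1 .. i+window.
    pairs = []
    for i in range(n_images):
        for j in range(i + 1, min(i + 1 + window, n_images)):
            pairs.append((i, j))
    return pairs

# For 4 images and a window of 2:
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
```

For a walk of a few hundred frames this cuts the number of pairwise comparisons from tens of thousands down to a few hundred.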