Vehicle Pose Estimation from an Image Using Geometry

Overview

3D pose estimation is a computer vision problem that aims to recover the 3D orientation of an object from a 2D image, and it can be challenging to solve. Its solutions are mostly used in the autonomous driving industry to detect the orientation of cars in street scenes.

The car industry is mostly focused on autonomous driving problems, and most of the publicly available datasets contain information about rotation around only one axis. Usually this is the rotation around the Y-axis (yaw angle), while the rotations around the X-axis (roll angle) and Z-axis (pitch angle) are either ignored or always equal to zero. This approximation is sufficient for autonomous driving, but other applications in the car industry need the orientation around more than one axis; one interesting example is parking occupancy. There, the cameras that detect cars and the lines marking the parking spaces can be mounted at different heights, or even look down from a bird's-eye perspective, and these setups require all three rotation angles to be known. There is therefore a need for an analytical solution that describes the full 3D orientation of an object, specifically a vehicle.

Solution

Our proposal uses the Euler angle representation and two coordinate systems (one attached to the camera and one attached to the detected object) to define the rotation angle around each axis in 3D space.
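To make the idea of Euler angles and two coordinate systems concrete, here is a minimal sketch in plain Python that builds a rotation matrix from three per-axis angles and uses it to express a vector from the object's coordinate system in the camera's coordinate system. The function names (euler_to_matrix, object_to_camera) and the composition order (rotate about X first, then Y, then Z) are assumptions made for illustration only. The order in which the rotations are applied is a convention, and it may differ from the one used in our method.

```python
import math


def rotation_x(angle):
    """Rotation matrix for a rotation of `angle` radians about the X-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]


def rotation_y(angle):
    """Rotation matrix for a rotation of `angle` radians about the Y-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def rotation_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the Z-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_to_matrix(angle_x, angle_y, angle_z):
    """Compose the three per-axis rotations into one matrix.

    Assumed convention (hypothetical): R = Rz * Ry * Rx, i.e. the rotation
    about X is applied first, then Y, then Z.
    """
    return matmul(rotation_z(angle_z),
                  matmul(rotation_y(angle_y), rotation_x(angle_x)))


def object_to_camera(rotation, vector):
    """Express a vector given in the object frame in the camera frame."""
    return [sum(rotation[i][k] * vector[k] for k in range(3))
            for i in range(3)]


# Example: a vehicle rotated by 90 degrees about the Y-axis only (pure yaw).
R = euler_to_matrix(0.0, math.pi / 2, 0.0)
direction = object_to_camera(R, [0.0, 0.0, 1.0])
print([round(v, 6) for v in direction])
```

In this example the object's Z direction ends up along the camera's X direction, which is the yaw-only case covered by most public datasets. Non-zero angles around the X- and Z-axes extend the same matrix to the full 3D orientation.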

Our pose estimation neural network estimates the dimensions and orientation of an object (a vehicle) in 3D space from an image.