Robust and Computationally Efficient Scene Perception

Despite the strengths of convolutional neural networks (CNNs) for object recognition, these discriminative techniques have several shortcomings that leave them vulnerable to exploitation by adversaries: they require extremely large training sets, their decision making is a "black box," and they cannot recover from incorrect inferences. Moreover, CNNs tend to overfit the training data because of their high non-linearity and large parameter counts. Overfitting leaves a CNN vulnerable to adversarial attack (e.g., via small image perturbations) and can lead to poor predictions in unfamiliar scenarios. In particular, a robot operating in the real world faces complex and changing environments that often have not been captured by its training data. Finally, the computational, financial, and environmental cost of training these discriminative models can be immense, as they often require weeks or even months of training.
In contrast, generative probabilistic inference techniques such as Monte-Carlo sampling are inherently explainable, general, and resilient because they generate, evaluate, and maintain a distribution over many hypotheses representing possible decisions. Unfortunately, this robustness comes at the cost of computational efficiency. Our work on hybrid discriminative-generative approaches offers a promising avenue for robust perception and action. Such methods combine deep-learning inference with sampling-based probabilistic inference, along with the ability to represent actual and counterfactual experiments, to achieve robust and adaptive scene understanding. This hybrid approach allows intelligent systems to reason about, interact with, and manipulate objects in complex (and even adversarial) environments. Our experiments have shown up to a 40% improvement in pose estimation accuracy over end-to-end neural network approaches, with especially robust performance in dark or occluded environments. See our video for a more detailed demonstration of our approach.
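For illustration, here is a minimal sketch of this generate-evaluate-maintain loop in Python. The CNN proposal, the render-and-compare likelihood, the 3-DoF pose parameterization, and all numeric values are hypothetical placeholders for exposition, not the implementation from our papers:

```python
# Minimal sketch of a hybrid discriminative-generative pose estimator.
# The detector, renderer, and constants below are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

def cnn_pose_proposal(observation):
    """Placeholder for a discriminative front end (e.g., a CNN detector)
    returning a coarse (x, y, yaw) pose guess."""
    return np.array([0.5, 0.2, 0.1])

def render_and_compare(pose, observation):
    """Placeholder generative likelihood: render the object at `pose` and
    score agreement with the observation. A synthetic quadratic score
    stands in for the real renderer here."""
    true_pose = np.array([0.55, 0.18, 0.05])  # unknown in practice
    return np.exp(-np.sum((pose - true_pose) ** 2) / 0.01)

def hybrid_pose_estimate(observation, n_hypotheses=500, n_iters=20, noise=0.05):
    # Generate: seed the hypothesis set around the discriminative proposal.
    seed = cnn_pose_proposal(observation)
    hypotheses = seed + noise * rng.standard_normal((n_hypotheses, 3))
    for _ in range(n_iters):
        # Evaluate: score every hypothesis with the generative model.
        weights = np.array([render_and_compare(h, observation) for h in hypotheses])
        weights /= weights.sum()
        # Maintain: importance-resample and perturb with shrinking noise.
        idx = rng.choice(n_hypotheses, size=n_hypotheses, p=weights)
        noise *= 0.9
        hypotheses = hypotheses[idx] + noise * rng.standard_normal((n_hypotheses, 3))
    # Report the most likely hypothesis; the full distribution is also available.
    weights = np.array([render_and_compare(h, observation) for h in hypotheses])
    return hypotheses[np.argmax(weights)]

print(hybrid_pose_estimate(observation=None))
```

Because the full hypothesis distribution is carried through every iteration, the estimator can explain its decision and recover when the initial discriminative proposal is wrong, which is where end-to-end networks tend to fail.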

While neural network inference can be completed within a second on modern general-purpose graphics processing units (GPUs), the iterative process of Monte-Carlo sampling does not map well to GPU acceleration, making the algorithm less amenable to the energy and real-time constraints of mobile applications. In particular, the run time and energy consumption are determined by the range of sampling, the number of iterations, and the computational complexity of the likelihood function. To address this challenge, we have developed novel hardware-accelerated implementations of Monte-Carlo sampling on FPGA fabrics. The main benefits of using an FPGA are that all the resources are configured near one another on the same fabric and that the data processing is pipelined across the steps of the algorithm. This results in less time spent on data transfers, less need to store intermediate results between steps, and more opportunity for parallel execution. With our FPGA implementation, we achieve real-time performance without sacrificing accuracy and with significantly reduced energy consumption. In particular, our design runs 30% faster than a high-end GPU implementation with only 2% of the energy consumption, and 95% faster than a low-power GPU implementation while dissipating approximately the same amount of power but consuming only 4% of the energy. We find this work very promising and plan to continue investigating hardware acceleration of other generative algorithms to combine with discriminative (i.e., neural network) techniques.
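As a rough illustration of why pipelining helps, the sketch below compares a sequential schedule, where each hypothesis passes through every step of an iteration before the next hypothesis starts, against a pipelined schedule, where the steps are laid out side by side on the fabric and the slowest stage sets the throughput. The stage names and latencies are invented for illustration and are not measurements from our design:

```python
# Back-of-the-envelope latency model for one Monte-Carlo sampling iteration.
# Stage latencies are illustrative numbers, not measured values.

STAGES_US = {            # per-hypothesis latency of each stage (microseconds)
    "sample_pose": 1.0,  # draw a perturbed hypothesis
    "render": 8.0,       # render the hypothesis into a synthetic view
    "compare": 4.0,      # evaluate the likelihood against the observation
    "resample": 0.5,     # weight update and resampling bookkeeping
}

def sequential_latency_us(n_hypotheses: int) -> float:
    """Each hypothesis runs through all stages before the next one starts,
    with intermediate results stored between stages."""
    return n_hypotheses * sum(STAGES_US.values())

def pipelined_latency_us(n_hypotheses: int) -> float:
    """Stages run concurrently on the fabric: after the pipeline fills,
    one hypothesis completes every bottleneck-stage interval."""
    fill = sum(STAGES_US.values())        # latency of the first hypothesis
    bottleneck = max(STAGES_US.values())  # initiation interval thereafter
    return fill + (n_hypotheses - 1) * bottleneck

n = 625  # hypotheses per iteration (illustrative)
print(f"sequential: {sequential_latency_us(n):8.1f} us/iteration")
print(f"pipelined : {pipelined_latency_us(n):8.1f} us/iteration")
```

The same structure also explains the energy savings: hypotheses stream from stage to stage on chip, so far less time and power go to moving and storing intermediate data.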
Here are some of my publications related to this topic:
- Robust Object Estimation using Generative-Discriminative Inference for Secure Robotics Applications. ICCAD 2018. PDF
- GRIP: Generative Robust Inference and Perception for Semantic Robot Manipulation in Adversarial Environments. IROS 2019. PDF
- Hardware Acceleration of Monte-Carlo Sampling for Energy Efficient Robust Robot Manipulation. FPL 2020. PDF
- Hardware Acceleration of Robot Scene Perception Algorithms. ICCAD 2020. PDF