Russell Stewart (Stanford)

OverFeat using GoogLeNet and LSTM decoder for
occlusions. Attention with rezooming is added to the original
\alpha = 0.2
author = {Russell Stewart and Mykhaylo Andriluka and Andrew Y. Ng},
title = {End-to-End People Detection in Crowded Scenes},
booktitle = CVPR,
year = {2016}

Object detection and orientation estimation results. Results for object detection are given in terms of average precision (AP) and results for joint object detection and orientation estimation are provided in terms of average orientation similarity (AOS).

Benchmark        Easy     Moderate  Hard
Car (Detection)  88.36 %  76.65 %   66.56 %
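
To make the AP figures above more concrete, here is a minimal sketch of interpolated average precision computed from a precision/recall curve. It assumes the common 11-point scheme (recall levels 0, 0.1, ..., 1.0); the exact number of recall points used by the benchmark's evaluation code is not stated in this entry, so `num_points` is left as a parameter. The function name and inputs are hypothetical and are not taken from the official evaluation kit. AOS can be sketched the same way by passing orientation similarity values in place of precision.

```python
def interpolated_ap(recalls, precisions, num_points=11):
    """Average of the interpolated precision at evenly spaced recall levels.

    recalls, precisions: equal-length sequences describing points on a
    precision/recall curve (hypothetical inputs for illustration).
    num_points: number of recall levels; 11 is an assumption, not a value
    confirmed by this benchmark entry.
    """
    if len(recalls) != len(precisions):
        raise ValueError("recalls and precisions must have the same length")

    thresholds = [i / (num_points - 1) for i in range(num_points)]
    total = 0.0
    for t in thresholds:
        # Interpolated precision: the best precision reached at any
        # recall >= t, or 0 if the curve never reaches that recall.
        candidates = [p for r, p in zip(recalls, precisions) if r >= t - 1e-9]
        total += max(candidates, default=0.0)
    return total / num_points


if __name__ == "__main__":
    # Toy curve: precision 1.0 up to recall 0.5, then 0.5 at recall 1.0.
    ap = interpolated_ap([0.5, 1.0], [1.0, 0.5])
    print(f"AP = {100 * ap:.2f} %")
```

Taking the maximum precision at or beyond each recall level smooths out the sawtooth shape of the raw curve, which is why the reported AP is usually slightly higher than a plain area-under-curve estimate.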
