Self-training on Hard Examples for Unsupervised Domain Adaptation
We address the unsupervised adaptation of an existing object detector to a new target domain. We assume that a large number of unlabeled videos from this domain are readily available. We automatically obtain labels on the target data by taking high-confidence detections from the existing detector and augmenting them with hard (misclassified) examples mined by exploiting temporal cues with a tracker. These automatically obtained labels are then used to re-train the original model. We propose a modified knowledge-distillation loss and investigate several ways of assigning soft-labels to the training examples from the target domain.
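To make the pseudo-labeling step concrete, below is a minimal sketch of the idea, assuming a generic per-frame detector and tracker; the `detect` and `tracker.update` interfaces and the score/overlap thresholds are illustrative placeholders, not the released implementation.

```python
# Sketch of self-training label generation: keep high-confidence
# detections as pseudo-labels and recover hard examples via tracking.
# detect(frame) -> [(box, score)] and the tracker interface are
# hypothetical placeholders; the thresholds are illustrative.

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter + 1e-9)

def pseudo_label_video(frames, detect, tracker, high_conf=0.8):
    labels = []  # one list of pseudo-labeled boxes per frame
    for frame in frames:
        dets = detect(frame)
        confident = [box for box, score in dets if score >= high_conf]
        # The tracker propagates confident detections through frames
        # where the detector's score dropped (blur, occlusion, small
        # objects), recovering otherwise-missed hard examples.
        tracked = tracker.update(frame, confident)
        hard = [box for box in tracked
                if all(iou(box, b) <= 0.5 for b in confident)]
        labels.append(confident + hard)
    return labels
```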
Our approach is empirically evaluated on challenging face and pedestrian detection tasks: a face detector trained on WIDER-Face, which consists of high-quality images crawled from the web, is adapted to a large-scale surveillance data set; a pedestrian detector trained on clear, daytime images from the BDD-100K driving data set is adapted to all other scenarios, such as rainy, foggy, and night-time conditions. Our results demonstrate the usefulness of incorporating hard examples obtained from tracking, show the advantage of using soft-labels via a distillation loss over hard-labels, and establish promising performance for a simple method of unsupervised domain adaptation of object detectors, with minimal dependence on hyper-parameters.
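For the re-training step, the following is a minimal PyTorch sketch of a distillation-style loss with soft targets; the interpolation weight `lam` and the exact form of the soft target are assumptions for illustration rather than the paper's precise formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(logits, hard_label, teacher_probs, lam=0.5):
    """Cross-entropy against a mix of the hard pseudo-label and the
    soft scores of the original (teacher) detector.

    logits: (N, C) adapted detector's class scores on target boxes
    hard_label: (N,) pseudo-labels from thresholded detections/tracks
    teacher_probs: (N, C) softmax scores of the pre-adaptation detector
    lam: hard/soft interpolation weight (an assumed hyper-parameter)
    """
    one_hot = F.one_hot(hard_label, logits.size(1)).float()
    soft_target = lam * one_hot + (1.0 - lam) * teacher_probs
    log_probs = F.log_softmax(logits, dim=1)
    # Cross-entropy with a soft target: -sum_c target_c * log p_c
    return -(soft_target * log_probs).sum(dim=1).mean()
```

Varying how the soft target is built per example, for instance leaning more on the teacher's scores for tracker-mined boxes whose labels are least reliable, is one way to realize the "several ways of assigning soft-labels" mentioned above; the fixed `lam` here is only the simplest choice.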
Automatic adaptation of object detectors to new domains using self-training,
Aruni RoyChowdhury, Prithvijit Chakrabarty, Ashish Singh, SouYoung Jin, Huaizu Jiang, Liangliang Cao and Erik Learned-Miller
Computer Vision and Pattern Recognition (CVPR), 2019
NEW: Codebase, models, and pseudo-labeled data for reproducing the CVPR 2019 experiments are released.
This research is based in part upon work supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA) under contract number 2014-14071600010, and in part on research sponsored by the Air Force Research Laboratory and DARPA under agreement number FA8750-18-2-0126. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, the Air Force Research Laboratory, DARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The experiments were performed using high-performance computing equipment obtained under a grant from the Collaborative R&D Fund managed by the Massachusetts Tech Collaborative and GPUs donated by NVIDIA.