Vidit Jain, Erik Learned-Miller, Andrew McCallum.
In Proceedings of the International Conference on Computer Vision (ICCV),
2007, Rio de Janeiro, Brazil.
(Jointly model people's identity, face appearance in an image, and surrounding
text in the image captions with an LDA-style topic model. Improved co-reference,
that is, the identification of coherent sets of person "mentions", by using
both text and image features.)