Overview
Input to the VIDI system consists of still images of a North American downtown area. Training data consists of images with "detectable" sign areas masked.
Details
The images were captured in uncompressed format at a resolution of 2048x1536. They are also available here at half size (1024x768), produced with bicubic interpolation for the sign images and nearest-neighbor interpolation for the masks.
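The choice of nearest-neighbor interpolation for the masks can be illustrated with a small pure-Python sketch. It is not the tool used to build the dataset: the function names are hypothetical, and a 2x2 block average stands in for a smoothing interpolator such as bicubic. The point is only that a smoothing filter turns a binary mask into fractional values, while nearest neighbor keeps every label at 0 or 1.

```python
# Sketch only: pure-Python stand-ins, not the tools used to build the dataset.

def half_size_nearest(mask):
    """Halve a 2-D mask by keeping every second pixel (nearest neighbor)."""
    return [row[::2] for row in mask[::2]]


def half_size_average(image):
    """Halve a 2-D image by averaging each 2x2 block.

    Assumption: this box filter stands in for a smoothing interpolator
    such as bicubic; it is not bicubic interpolation itself.
    """
    height, width = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


# A 4x4 binary mask whose sign region straddles the 2x2 block boundaries.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

nearest = half_size_nearest(mask)   # [[0, 0], [0, 1]]: still binary
averaged = half_size_average(mask)  # every entry is 0.25: no longer binary
```

Smoothing suits the photographs, where intermediate intensities are meaningful, but a mask value of 0.25 has no interpretation as a sign or background label.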
Labeled Training Data
Signs were manually masked according to the following criteria:
1. Any text must be legible at a 25% magnification of the image (i.e., quarter-size).
2. The mask of any sign meeting criterion (1) extends to the proper border of the sign if that border is within a "reasonable" distance of the text (e.g., within at most one extra region over which texture statistics would be computed). Otherwise, the mask extends only to just outside the relevant area (e.g., text or logo).
Criterion (1) ensures that texture statistics may be reliably measured for any area that is expected to be detected. Criterion (2) allows border cues to be utilized when the circumstances reasonably permit. (An exception might be letters fastened on a brick wall.)
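As a rough illustration of criterion (2), the following sketch picks a mask rectangle from a text bounding box and a sign-border bounding box. The masks were drawn by hand, so this is not how the dataset was produced. The `Box` type, the `region_size` and `margin` parameters, and the use of the largest text-to-border gap as the "distance" are all assumptions, since the page gives no numeric values.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def mask_extent(text: Box, border: Box, region_size: int, margin: int = 2) -> Box:
    """Choose the mask rectangle for a sign that already meets criterion (1).

    region_size and margin are hypothetical parameters: the page gives no
    numeric values for the "reasonable" distance or for "just outside".
    """
    # Largest gap between the text box and the sign border on any side.
    gap = max(
        text.left - border.left,
        text.top - border.top,
        border.right - text.right,
        border.bottom - text.bottom,
    )
    if gap <= region_size:
        # Border is close enough: mask the whole sign (criterion 2).
        return border
    # Otherwise mask only the text or logo, padded by a small margin.
    return Box(text.left - margin, text.top - margin,
               text.right + margin, text.bottom + margin)


text = Box(100, 100, 200, 140)
near = mask_extent(text, Box(90, 90, 210, 150), region_size=32)
far = mask_extent(text, Box(0, 0, 600, 400), region_size=32)
```

Here `near` is the full sign border, while `far` is the text box padded by the margin, which corresponds to the case of letters fastened on a large wall.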
Images
The entire query database consists of 309 images, which are available in bzipped tar format:
- Full Size (2048x1536)
- Half Size (1024x768)
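A bzipped tar archive can be inspected and unpacked with Python's standard `tarfile` module. The sketch below is a minimal example. The function names are hypothetical, and no assumption is made about the actual archive names or their internal layout.

```python
import tarfile
from pathlib import Path


def list_images(archive_path):
    """Return the names of regular files stored in a bzip2-compressed tar."""
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def extract_all(archive_path, destination):
    """Unpack the archive into destination, creating it if needed."""
    Path(destination).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        archive.extractall(destination)
```

Given a downloaded archive, `list_images(path)` returns its file names without unpacking anything, which is a cheap way to confirm that a download is complete before calling `extract_all(path, "some_directory")`.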
Acknowledgement
Creation of this database was made possible by NSF grant number IIS-0100851. If you use it in a publication, please cite the following paper:
J. Weinman, A. Hanson, A. McCallum. Sign Detection in Natural Images with Conditional Random Fields. In IEEE International Workshop on Machine Learning for Signal Processing, pp. 549-558. Sept. 2004.
Contact
Comments? Questions? Suggestions? Contact Jerod Weinman at
Thumbnails
Thumbnails of all the images follow. Click an image to see a slightly larger version in a new window. (Tip: since the same window is used to view all images, leaving it open can make viewing easier.) Right-click on "Full" or "Half" to save individual images.