SESYD (Systems Evaluation SYnthetic Documents) is a database of synthetic documents with ground truth, produced using the 3gT system. The database targets two main research problems in the document image analysis field: (i) symbol recognition and spotting in line-drawing images (floorplans and electrical diagrams) and (ii) character segmentation and recognition in geographical maps. It is composed of eleven collections for performance evaluation, containing 284k images, 190k symbols and 284k characters (k for thousand). Published in 2010, SESYD is today a key database in the document image analysis field, with around one hundred citations in research papers.

| collection | datasets | images | objects | sizes | models | thumb |
|---|---|---|---|---|---|---|
| symbol bags | 16 | 1600 | symbols | 15046 | 25-150 | bag of symbols |
| floorplans | 10 | 1000 | symbols | 28065 | 16 | floorplan |
| diagrams | 10 | 1000 | symbols | 14100 | 21 | diagram |
| queries | 6 | 6000 | symbols | 6000 | 16-21 | symbol query |
| isrc2011 | 27 | 33650 | symbols | 37748 | 16-150 | international symbol recognition contest 2011 |
| lowres | 30 | 3000 | symbols | 65530 | 16-21 | low resolution floorplan |
| sketches | 24 | 24000 | symbols | 24000 | 17-150 | symbol sketch |
| character ones | 42 | 210000 | characters | 210000 | 62x30 | image of character |
| segment characters | 9 | 2250 | characters | 18129 | 52 | image of text line |
| word bags | 6 | 600 | characters | 24340 | 52 | bag of words |
| text/graphics | 6 | 300 | characters | 31380 | 124 | geographical map with text |
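
The totals quoted in the description can be cross-checked against the per-collection figures listed above. The following is a minimal sketch, not part of the SESYD distribution: it assumes that the "sizes" column gives the number of objects (symbols or characters) in each collection, and the names `COLLECTIONS` and `totals` are hypothetical, introduced only for illustration.

```python
# Rows copied from the table above:
# (collection, datasets, images, object type, object count).
# Assumption: the "sizes" column is the number of objects per collection.
COLLECTIONS = [
    ("symbol bags", 16, 1600, "symbols", 15046),
    ("floorplans", 10, 1000, "symbols", 28065),
    ("diagrams", 10, 1000, "symbols", 14100),
    ("queries", 6, 6000, "symbols", 6000),
    ("isrc2011", 27, 33650, "symbols", 37748),
    ("lowres", 30, 3000, "symbols", 65530),
    ("sketches", 24, 24000, "symbols", 24000),
    ("character ones", 42, 210000, "characters", 210000),
    ("segment characters", 9, 2250, "characters", 18129),
    ("word bags", 6, 600, "characters", 24340),
    ("text/graphics", 6, 300, "characters", 31380),
]


def totals(rows):
    """Return (total image count, {object type: total object count})."""
    images = sum(n_images for _, _, n_images, _, _ in rows)
    by_type = {}
    for _, _, _, kind, count in rows:
        by_type[kind] = by_type.get(kind, 0) + count
    return images, by_type


images, objects = totals(COLLECTIONS)
print(len(COLLECTIONS), "collections")
print(images, "images")
for kind, count in objects.items():
    print(count, kind)
```

Under that reading of the columns, the eleven rows sum to 283400 images, 190489 symbols and 283849 characters, which is close to the rounded figures given in the description.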

Please cite the following paper [1] if you use this database.

  1. M. Delalandre, E. Valveny, T. Pridmore and D. Karatzas. Generation of Synthetic Documents for Performance Evaluation of Symbol Recognition & Spotting Systems. International Journal on Document Analysis and Recognition (IJDAR), 13(3):187-207, 2010.