SESYD
SESYD (Systems Evaluation SYnthetic Documents) is a database of synthetic documents with ground truth, produced using the 3gT system. It targets two main research problems in the document image analysis field: (i) symbol recognition and spotting in line drawing images (floorplans and electrical diagrams), and (ii) character segmentation and recognition in geographical maps. The database is composed of eleven collections for performance evaluation, containing 284k images, 190k symbols and 284k characters (k for thousand). Published in 2010, SESYD is today a key database in the document image analysis field, cited in around one hundred research papers.
collection | datasets | images | objects | sizes | models
symbol bags | 16 | 1600 | symbols | 15046 | 25-150
floorplans | 10 | 1000 | symbols | 28065 | 16
diagrams | 10 | 1000 | symbols | 14100 | 21
queries | 6 | 6000 | symbols | 6000 | 16-21
isrc2011 | 27 | 33650 | symbols | 37748 | 16-150
lowres | 30 | 3000 | symbols | 65530 | 16-21
sketches | 24 | 24000 | symbols | 24000 | 17-150
character ones | 42 | 210000 | characters | 210000 | 62x30
segment characters | 9 | 2250 | characters | 18129 | 52
word bags | 6 | 600 | characters | 24340 | 52
text/graphics | 6 | 300 | characters | 31380 | 124
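The per-collection figures can be cross-checked against the totals quoted in the introduction. The sketch below is only an illustration. It assumes that, in each row of the table, the second number is the image count and the number after the object type is the object count. The names `ROWS` and `totals` are hypothetical and not part of SESYD or 3gT.

```python
# Rows transcribed from the table: (collection, images, object type, object count).
# Assumption: the second number in a row is the image count and the number
# following the object type is the count of ground-truthed objects.
ROWS = [
    ("symbol bags", 1600, "symbols", 15046),
    ("floorplans", 1000, "symbols", 28065),
    ("diagrams", 1000, "symbols", 14100),
    ("queries", 6000, "symbols", 6000),
    ("isrc2011", 33650, "symbols", 37748),
    ("lowres", 3000, "symbols", 65530),
    ("sketches", 24000, "symbols", 24000),
    ("character ones", 210000, "characters", 210000),
    ("segment characters", 2250, "characters", 18129),
    ("word bags", 600, "characters", 24340),
    ("text/graphics", 300, "characters", 31380),
]


def totals(rows):
    """Return (total images, {object type: total object count})."""
    images = sum(n_images for _, n_images, _, _ in rows)
    objects = {}
    for _, _, kind, count in rows:
        objects[kind] = objects.get(kind, 0) + count
    return images, objects


images, objects = totals(ROWS)
print(len(ROWS), "collections")
print(images, "images")
for kind, count in objects.items():
    print(count, kind)
```

Under these assumptions the sums come to 283,400 images, 190,489 symbols and 283,849 characters across the eleven collections, which is close to the rounded 284k, 190k and 284k figures given in the description.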
Please cite the following paper [1] if you use this database.
- [1] M. Delalandre, E. Valveny, T. Pridmore and D. Karatzas. Generation of Synthetic Documents for Performance Evaluation of Symbol Recognition & Spotting Systems. International Journal on Document Analysis and Recognition (IJDAR), 13(3):187-207, 2010.