A Dataset for Analyzing Complex Document Layouts

The Evaluation Server can be found on EvalAI

In this short article I want to give a summary of our publication that I presented at the GCPR 2022.

Dataset

The here presented dataset focuses on complex layouts of historical documents. It can be used to train models for document layout analysis similar to PubLayNet or DocBank. As an annotation format we used the COCO-Format and provide manual annotated polygons and bounding boxes, which means the dataset can be used to train Object Detection and Instance Segmentation models. The 52,000 annotated instances are spread across 19 classes. For more details please refer to our publication.

We trained and tested^[1] the dataset with two different models:

Task - Model	bbox - Mask R-CNN	bbox - VSR	mask - Mask R-CNN	mask - VSR
mAP	73.18	75.90	64.43	65.80
mAP₅₀	85.12	87.80	80.31	82.60

Our code for VSR is available at github. For Mask R-CNN we only attach the weights and config in our auxiliary materials.

Notes

^[1] Since these data are evaluated on repeated labels, the model can never achieve perfect results. That is also why we decided to release another test dataset.