Graph Neural Networks for Document Understanding
Document layout analysis remains a foundational task in computer vision and natural-language processing. Traditional CNN-based pipelines treat pages as flat pixel grids and miss the relational structure between visual regions.
2. Methodology
We model each page as a graph in which nodes represent visual regions detected by a backbone CNN and edges encode spatial proximity and reading-order relationships. A two-layer Graph Attention Network then propagates context between connected regions.
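The graph construction and attention step described above can be illustrated with a minimal sketch. The text does not specify how edges are chosen or how the network is parameterized, so everything concrete here is an assumption made for illustration: the names `Region`, `build_page_graph`, `neighbor_lists`, and `gat_layer` are hypothetical, spatial proximity is approximated by k-nearest neighbors on box centers, reading order by a top-to-bottom, left-to-right sort, and the weights `W` and `a` are fixed placeholders rather than trained parameters.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Region:
    """A detected region. The (x0, y0, x1, y1) box format is an assumption."""
    label: str
    box: Tuple[float, float, float, float]

    @property
    def center(self):
        x0, y0, x1, y1 = self.box
        return ((x0 + x1) / 2, (y0 + y1) / 2)


def build_page_graph(regions, k=1):
    """Return a set of directed edges (i, j, kind) over region indices."""
    edges = set()
    # Spatial proximity (assumed rule): link each node to its k nearest centers.
    for i, r in enumerate(regions):
        dists = sorted(
            (math.dist(r.center, other.center), j)
            for j, other in enumerate(regions) if j != i
        )
        for _, j in dists[:k]:
            edges.add((i, j, "spatial"))
    # Reading order (assumed heuristic): top-to-bottom, then left-to-right.
    order = sorted(range(len(regions)),
                   key=lambda i: (regions[i].box[1], regions[i].box[0]))
    for a, b in zip(order, order[1:]):
        edges.add((a, b, "reading_order"))
    return edges


def neighbor_lists(n, edges):
    """Undirected neighbor lists with self-loops, ignoring the edge kind."""
    nbrs = {i: {i} for i in range(n)}
    for i, j, _ in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    return {i: sorted(s) for i, s in nbrs.items()}


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, nbrs, W, a):
    """Single-head graph attention layer without an output non-linearity.

    W is an (out x in) weight matrix, a is an attention vector of length 2*out.
    """
    z = [[sum(w * x for w, x in zip(row, h)) for row in W] for h in features]
    out = []
    for i, zi in enumerate(z):
        js = nbrs[i]
        # Unnormalized score e_ij = LeakyReLU(a . [z_i || z_j]).
        scores = [leaky_relu(sum(p * q for p, q in zip(a, zi + z[j]))) for j in js]
        # Softmax over the neighborhood (shifted by the max for stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        out.append([sum(al * z[j][d] for al, j in zip(alphas, js))
                    for d in range(len(zi))])
    return out


# Toy page with four regions (coordinates are made up for illustration).
regions = [
    Region("header", (50, 20, 550, 60)),
    Region("paragraph", (50, 80, 550, 300)),
    Region("table", (50, 320, 300, 500)),
    Region("figure", (320, 320, 550, 500)),
]
edges = build_page_graph(regions, k=1)
nbrs = neighbor_lists(len(regions), edges)

# Placeholder node features and untrained parameters (assumptions).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.1, -0.2, 0.3, 0.05]

# Two stacked layers; the non-linearity between layers is omitted for brevity.
h1 = gat_layer(features, nbrs, W, a)
h2 = gat_layer(h1, nbrs, W, a)
```

In this sketch both edge types are merged into a single neighborhood before attention is computed. A fuller implementation might keep separate attention parameters per edge type, but the text does not say which variant is used.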
| Method | F1 | mAP |
|---|---|---|
| CNN baseline | 0.891 | 0.842 |
| DOCUGRAPH (ours) | 0.974 | 0.928 |
[Figure 1: Pipeline overview]
Results on the PubLayNet benchmark show consistent improvement across all region categories, with the largest gains on tables and figures.
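To put the aggregate numbers from the comparison table in perspective, the short calculation below derives absolute and relative gains over the CNN baseline. It uses only the four values reported in the table. The per-category breakdown is not given, so it is not reproduced here, and the variable names are illustrative.

```python
# Values copied from the comparison table (CNN baseline vs. DOCUGRAPH).
baseline = {"F1": 0.891, "mAP": 0.842}
docugraph = {"F1": 0.974, "mAP": 0.928}

gains = {}
for metric, base in baseline.items():
    absolute = docugraph[metric] - base
    relative_pct = 100 * absolute / base
    gains[metric] = (round(absolute, 3), round(relative_pct, 1))

for metric, (absolute, relative_pct) in gains.items():
    print(f"{metric}: +{absolute:.3f} absolute, +{relative_pct:.1f}% relative")
```

This gives an absolute gain of 0.083 in F1 and 0.086 in mAP, or roughly 9.3% and 10.2% relative to the baseline.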