Enhancing OCR for researchers and institutions using Graph Neural Networks. Transforming complex documents into structured digital intelligence.
DOCUGRAPH combines computer vision, graph neural networks, and OCR to recover the logical structure of any document — not just its text.
Understands complex document layouts beyond linear text — including multi-column research papers, forms and reports.
Uses graph neural networks to learn the relational structure between visual regions for context-aware layout understanding.
Detects tables, headers, figures and column flow accurately, preserving document hierarchy for downstream processing.
Handles structured multilingual documents through Tesseract integration and language-agnostic graph representations.
Traditional systems process text linearly and struggle with complex layouts. DOCUGRAPH goes further by modeling document structure and content continuity.
When a paragraph is cut off in one column and continues in the next, traditional OCR systems fail — scrambling text and losing context.
DOCUGRAPH's solution: Our Graph Neural Network detects column boundaries and reconstructs paragraphs that continue from one column into the next.
Result: Your multi-column documents become seamlessly readable, not jumbled.
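To make the column problem concrete, here is a minimal sketch of column-aware reading order. It is a plain geometric heuristic written for illustration, not DOCUGRAPH's graph-based model; the `Region` type, the `reading_order` function, the `min_gap` threshold, and the sample coordinates are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A text block with a bounding box in page coordinates (hypothetical type)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def reading_order(regions, min_gap=10.0):
    """Order regions column by column instead of strictly top to bottom.

    A new column starts whenever a region begins at least `min_gap`
    units to the right of everything collected in the current column.
    """
    columns = []
    column_right = None
    for region in sorted(regions, key=lambda r: r.x0):
        if column_right is None or region.x0 - column_right >= min_gap:
            columns.append([region])
            column_right = region.x1
        else:
            columns[-1].append(region)
            column_right = max(column_right, region.x1)

    ordered = []
    for column in columns:  # columns are already left to right
        ordered.extend(sorted(column, key=lambda r: r.y0))
    return ordered


# Sample two-column page: the second paragraph spills into the right column.
page = [
    Region("Intro paragraph.", 50, 100, 290, 200),
    Region("and continues in the next column.", 310, 100, 550, 180),
    Region("A paragraph that starts here", 50, 220, 290, 700),
    Region("Final paragraph.", 310, 200, 550, 300),
]

naive = [r.text for r in sorted(page, key=lambda r: (r.y0, r.x0))]
column_aware = [r.text for r in reading_order(page)]
```

Sorting only by vertical position interleaves the two columns, while the column-aware order keeps the split paragraph together. A fixed gap threshold breaks down on irregular layouts, which is the kind of case a learned model is meant to handle.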
Documents like news articles and office files contain enclosed sections, sidebars, and callout boxes that should stay separate from main content.
DOCUGRAPH's solution: Our system detects enclosed sections and isolates them from the main content.
Result: Clean, organized output where each content stream is properly identified and separated.
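One simple way to picture this separation is a containment check: any text region that falls inside a detected frame goes to that frame's stream, and everything else stays in the main flow. The sketch below assumes the frames have already been detected; the `contains` and `split_streams` helpers, the tolerance value, and the sample boxes are hypothetical and do not describe DOCUGRAPH's actual detector.

```python
def contains(outer, inner, tol=2.0):
    """Return True if box `inner` lies inside box `outer`, within a small tolerance.

    Boxes are (x0, y0, x1, y1) tuples in page coordinates.
    """
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    return (
        ix0 >= ox0 - tol
        and iy0 >= oy0 - tol
        and ix1 <= ox1 + tol
        and iy1 <= oy1 + tol
    )


def split_streams(regions, frames):
    """Assign each (text, box) region to the first frame that encloses it.

    Regions that no frame encloses are collected under the "main" stream.
    """
    streams = {"main": []}
    for name in frames:
        streams[name] = []
    for text, box in regions:
        owner = next(
            (name for name, frame in frames.items() if contains(frame, box)),
            "main",
        )
        streams[owner].append(text)
    return streams


# Sample page with one sidebar frame on the right-hand side.
frames = {"sidebar": (400, 100, 560, 400)}
regions = [
    ("Main story, first paragraph.", (50, 100, 380, 250)),
    ("Did you know? A boxed fact.", (410, 110, 550, 200)),
    ("Main story, second paragraph.", (50, 260, 380, 420)),
]

streams = split_streams(regions, frames)
```

Here the boxed fact ends up in the `sidebar` stream and both story paragraphs stay in `main`, so the callout no longer interrupts the article text.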
A six-stage pipeline that converts raw documents into machine-readable, hierarchically organized output.
PDF · JPG · PNG
Denoise · deskew
Nodes & edges
Region classification
Tesseract pass
JSON · DOCX · PDF
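The "nodes & edges" stage can be sketched as follows, assuming each detected region becomes a node and each node is linked to its nearest neighbours by centre distance. The k-nearest-neighbour rule, the `build_region_graph` name, and the sample boxes are illustrative assumptions; the project may construct its edges differently.

```python
import math


def build_region_graph(boxes, k=1):
    """Turn region boxes into graph nodes (centres) and undirected edges.

    Each node is connected to its `k` nearest neighbours, measured between
    box centres. Edges are returned as sorted (i, j) index pairs with i < j.
    """
    centers = [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes]
    edges = set()
    for i, center in enumerate(centers):
        others = [j for j in range(len(centers)) if j != i]
        others.sort(key=lambda j: math.dist(center, centers[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return centers, sorted(edges)


# Sample page: a header, two side-by-side paragraphs, and a figure.
boxes = [
    (50, 40, 550, 80),    # 0: header spanning the page width
    (50, 100, 290, 400),  # 1: left-column paragraph
    (310, 100, 550, 400), # 2: right-column paragraph
    (50, 420, 290, 700),  # 3: figure below the left paragraph
]

centers, edges = build_region_graph(boxes, k=1)
```

With `k=1` the header links to both paragraphs and the figure links to the paragraph above it, giving a small graph that a later stage could classify region by region.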
DOCUGRAPH is an undergraduate research project investigating whether Graph Neural Networks can outperform traditional CNN-only pipelines for document layout analysis.
The study introduces a hybrid pipeline that represents document pages as graphs of visual regions and uses GNNs to classify those regions into headers, paragraphs, tables, and figures — improving downstream OCR accuracy and document understanding.
The model is trained and evaluated on the PubLayNet benchmark and compared against CNN baselines such as Faster R-CNN.
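For readers new to graph neural networks, the core operation is message passing: each region updates its features from its neighbours before it is classified. The sketch below shows one generic mean-aggregation layer in plain Python. It is not the project's architecture; the `message_passing_layer` name, the toy features, and the identity weights are assumptions chosen to keep the arithmetic easy to follow.

```python
def message_passing_layer(features, edges, weights):
    """Apply one mean-aggregation graph layer followed by ReLU.

    features: one feature vector (list of floats) per node
    edges:    undirected (i, j) index pairs
    weights:  matrix of shape [in_dim][out_dim]
    """
    n = len(features)
    # Each node aggregates over itself (self-loop) and its neighbours.
    groups = {i: [i] for i in range(n)}
    for i, j in edges:
        groups[i].append(j)
        groups[j].append(i)

    in_dim = len(weights)
    out_dim = len(weights[0])
    output = []
    for i in range(n):
        group = groups[i]
        mean = [sum(features[j][d] for j in group) / len(group) for d in range(in_dim)]
        row = [
            max(0.0, sum(mean[d] * weights[d][o] for d in range(in_dim)))
            for o in range(out_dim)
        ]
        output.append(row)
    return output


# Toy graph: three regions in a chain, two features each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 2)]
identity = [[1.0, 0.0], [0.0, 1.0]]  # untrained placeholder weights

updated = message_passing_layer(features, edges, identity)
```

After one layer, each region's vector is a blend of its own features and those of its neighbours, which is how context from adjacent regions can inform the label a region receives. In a trained model the weights would be learned and a final layer would map each vector to a class such as header, paragraph, table, or figure.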
Advancing accessible AI infrastructure for document digitization and research workflows.
An undergraduate research team building the future of document layout analysis.
Questions about the research, collaboration, or thesis defense? Reach out below.
Whether you're a fellow researcher, an institution exploring document intelligence, or a panel reviewer — drop us a note and we'll get back within a few days.