KUDoc is the Document AI research group at Korea University, turning unstructured documents into structured evidence for reliable reasoning and real-world AI.
Layout-aware parsing, table and figure extraction, and structural analysis of complex documents, including PDFs, scanned images, and rich-format files.
Retrieval-augmented generation pipelines that fuse text, layout, and visual signals for accurate, grounded answers from large document corpora.
Prompting strategies, fine-tuning, and agent frameworks that enable large language models to reason faithfully over long, structured documents.
Named entity recognition, relation extraction, and key-value pair detection from semi-structured and unstructured business documents.
Constructing rigorous evaluation suites and leaderboards for Document AI tasks, with a focus on Korean-language and multilingual settings.
Extending Document AI capabilities to Korean and other low-resource languages, addressing the unique challenges of mixed-script documents.
Multimodal Chunking for Real-World Industrial Documents
Developed a novel multimodal chunking methodology for real-world industrial data. The method chunks documents by converting their layout information into a hierarchical structure, which enables more accurate and grounded retrieval-augmented generation over complex, long-form industrial documents.
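As a rough illustration of the idea of chunking along a layout-derived hierarchy, the sketch below builds a heading tree from a flat list of layout blocks and emits one chunk per section, tagged with its heading path. It is not the project's actual method: the Block and Node types, the "heading"/"body" labels, and the build_tree and chunk_tree helpers are all hypothetical names introduced here, and it assumes an upstream parser has already assigned heading levels.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One layout element from a hypothetical parser, in reading order."""
    kind: str        # assumed labels: "heading" or "body"
    text: str
    level: int = 0   # heading depth (1 = top level); unused for body blocks


@dataclass
class Node:
    """A section in the reconstructed hierarchy."""
    title: str
    level: int
    body: list = field(default_factory=list)
    children: list = field(default_factory=list)


def build_tree(blocks):
    """Turn a flat, reading-order block list into a heading hierarchy."""
    root = Node(title="", level=0)
    stack = [root]  # path from the root to the currently open section
    for block in blocks:
        if block.kind == "heading":
            # Close sections at the same or a deeper level, never the root.
            while len(stack) > 1 and stack[-1].level >= block.level:
                stack.pop()
            node = Node(title=block.text, level=block.level)
            stack[-1].children.append(node)
            stack.append(node)
        else:
            # Body text belongs to the innermost open section.
            stack[-1].body.append(block.text)
    return root


def chunk_tree(node, path=()):
    """Yield one chunk per section that has body text, with its heading path."""
    here = path + (node.title,) if node.title else path
    if node.body:
        yield {"path": " > ".join(here), "text": "\n".join(node.body)}
    for child in node.children:
        yield from chunk_tree(child, here)


# Illustrative input only; not taken from any real document.
blocks = [
    Block("heading", "Pump Manual", 1),
    Block("body", "Overview text."),
    Block("heading", "Safety", 2),
    Block("body", "Wear gloves."),
    Block("heading", "Maintenance", 2),
    Block("body", "Check seals monthly."),
]
chunks = list(chunk_tree(build_tree(blocks)))
```

Keeping the heading path with each chunk means a retriever can match on section context as well as on the body text, which is one plausible reason a hierarchy helps grounding in long documents.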
Development of an LLM-Based Multilingual Consultation Service
Synapse is a multilingual consultation system powered by large language models, designed to deliver accurate, context-aware guidance across language barriers. The project focuses on robust multilingual understanding, domain-specific instruction tuning, and safe deployment of LLMs in real-world advisory contexts.
Development of AI-based Thought Structuring Learning Tutoring Technology for Improving Literacy Competency
Developed an evaluation dataset and framework for structured summarization, built on textbook document parsing and designed to score and diagnose learners' structured learning outputs. This work establishes a core foundation for AI-driven tutoring systems in the educational domain.
Large Language Model — NC AI Consortium (13 Organizations)
VAETKI is a large language model developed by the NC AI consortium, a collaborative initiative led by NC AI with participation from 13 organizations including WBL and ETRI. Designed with scalability and efficiency as primary goals, VAETKI adopts a Mixture-of-Experts (MoE) architecture to balance performance against computational cost. Key features include support for tool-agent tasks in non-thinking mode, human preference alignment for accurate instruction following, and multilingual support for English, Korean, Chinese, and Japanese.
KUDoc is anchored at Korea University and works in close research collaboration with institutions across academia and industry in Korea, Canada, and Singapore.
Get Involved with KUDoc
We are always looking for collaborators! If you're interested in joining our efforts, drop us a line at: