Formal, searchable, and testable knowledge for AI agents.
My work asks how messy domain knowledge becomes something an agent, prover, evaluator, or real user can reliably use. The domains range from Lean and coding theory to RAG products, hydrology, and medical images, but the motion is the same: substrate → interface → evidence → artifact.
Four threads, one repeated pattern.
Each axis below links a question to the papers and systems that carry it. This is the page that should explain the research program before anyone sees a certificate or a trophy.
Make mathematical knowledge formal, searchable, and reusable by theorem-proving agents.
This is the spine of the portfolio: coding theory, self-dual codes, Lean/Mathlib, closed-agent math solving, and domain-specific memory for mathematical agents.
questions
- • What should a mathematical domain look like before an agent can use it?
- • How do Lean artifacts, proof trajectories, and MCP interfaces become reusable research memory?
papers
Treat benchmarks as inspectable processes, not static files.
EntropyMath, EntropyMaG, and EntropyMaLean connect generated problems to lineage, solver traces, verifier contracts, and revision-ready evidence.
questions
- • What does a generated math benchmark need to expose before it can be trusted?
- • Can verification and lineage become first-class evaluation artifacts?
Use real products to test whether a knowledge substrate actually helps people.
SOGAMBOT, MindBuddhi, regulatory RAG, and institutional assistants keep the agent-readable-memory idea grounded in messy users, documents, and product constraints.
questions
- • Can source-grounded answers preserve the texture of a domain?
- • What breaks when retrieval leaves a notebook and becomes a user-facing system?
papers
artifacts
Use forecasting, medical images, and education tools to keep the research honest.
Hydrology, whole-slide images, and education systems create pressure that clean benchmarks often hide: missing data, clinical validation, event timing, and operational evaluation.
questions
- • What evaluation choices matter outside the benchmark?
- • How does domain friction reshape the model pipeline?
papers
Selected publications
Performance Improvement of LLMs for Regulatory Document Understanding based on Modified RAG Approach
Jae-Hyun Baek, Jon-Lark Kim
JKIIS
A modified retrieval augmented generation framework for regulatory documents. This is one of the applied roots of my current interest in turning domain documents into agent-readable memory.

Water Level Forecasting in the Gamcheon River, Korea, Using TimeGPT
Jon-Lark Kim, Jae-Hyun Baek, Keon-Hwi Kim, Tae Hyo Baek, Chang-Lae Jang
International Journal of Fuzzy Logic and Intelligent Systems
A study of foundation-model forecasting for river water levels. We compare TimeGPT with classical and linear baselines under rolling-origin evaluation on the Gamcheon River setting.
Submitted, accepted, and under-review papers
Proposal of an LLM-Lean Approach and Architecture for Automated Mathematical Problem Solving
Youngjik Lee, Jon-Lark Kim, Woo-Seok Jeong, Beom-Soo Kim, Jae-Hyun Baek
2026 KIIS Spring Conference — Best Paper Award
A closed-agent architecture for mathematical problem solving that compares a Single solver structure with a Dual structure where a falsifier participates in the reasoning loop, emphasizing token/compute efficiency and final solving performance.

Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures
Sangsoo Lee, Jae-Hyun Baek, Jon-Lark Kim
Accepted manuscript
A hybrid pipeline combining visual retrieval and LLM reasoning for diagram-heavy mathematical problems, especially cases where OCR-only methods miss the actual structure of the figure.

SolEvolve: LLM-driven Evolutionary Discovery of Algorithms
Jae-Hyun Baek
Master's thesis / manuscript track
An autonomous search system where LLM-guided operators, SAT-seeded search, and verification loops rediscover strong coding-theoretic constructions.

A whole-slide-image pipeline for small object detection in pathology. It belongs to the applied side of my work: turning difficult domain data into usable AI systems.
A coding-theoretic construction of Sudoku-type puzzle structures from perfect codes and symmetry principles.

EntropyMath: An Evolutionary Benchmark Generation System for Evaluating High-Difficulty Mathematical Reasoning
Jae-Hyun Baek et al.
ICML AI for Math Workshop submission
A benchmark-generation system that treats difficult mathematical reasoning data as an evolutionary, auditable process with lineage, solver traces, and validation gates.

A Case Study on Alignment Faking in LLMs
Jae-Hyun Baek, Jon-Lark Kim
ISIS 2025 — Best Presentation Award
A case-study presentation on alignment faking: when externally aligned behavior may diverge from internal reasoning under evaluation or monitoring pressure.

MekaNet: A Deep Learning Framework for Megakaryocyte Detection and Myeloproliferative Neoplasm Classification with Enhanced Feature Engineering
Jae-Hyun Baek et al.
Submitted medical AI manuscript
A medical-image manuscript on megakaryocyte detection and myeloproliferative neoplasm classification with enhanced feature-engineering support.

MekaNet: A Tiling-Enhanced Semi-Supervised Detection Framework for Megakaryocytes
Jae-Hyun Baek et al.
BMC / medical AI manuscript
A BMC-style medical-image manuscript on a tiling-enhanced semi-supervised framework for detecting megakaryocytes in pathology imagery.

Automated Cellularity Assessment in Bone Marrow Using Deep Learning-Based Segmentation
Jae-Hyun Baek et al.
Submitted medical AI manuscript
A bone-marrow cellularity assessment manuscript using deep learning-based segmentation for clinical image-analysis support.

Shortest Self-Orthogonal Embeddings of Selected Binary Codes
Jae-Hyun Baek et al.
Submitted coding-theory manuscript
A coding-theory manuscript on shortest self-orthogonal embeddings of selected binary codes, connected to the formal-math backbone of the portfolio.
A Novel Sudoku from Perfect Codes over the Ring of Integers Modulo n
Jae-Hyun Baek et al.
Submitted coding-theory manuscript
A manuscript on constructing Sudoku-type games from perfect codes over the ring of integers modulo n.
The systems behind the papers.
The public site should make artifacts easier to inspect than awards. These systems are the working surfaces where the research program becomes code, data, demos, and deployed tools.
CodingTheoryLib / CodingTheoryLib-MCP
Formal coding-theory knowledge shaped into a reusable substrate for theorem-proving agents.
A formalized coding-theory substrate: Lean definitions, proof work, graph memory, and MCP-style access for agents.

EntropyMath / EntropyMaG / EntropyMaLean
Math benchmarks treated as auditable processes rather than static downloads.
A family of systems for evolving math problems with lineage, solver traces, and verification hooks instead of treating benchmarks as frozen files.

SolEvolve
An agentic search loop for proposing, mutating, testing, and preserving mathematical candidates.
An LLM-guided evolutionary discovery system for algorithmic and mathematical search, developed around coding-theory constructions.

SOGAMBOT
A deployed university assistant built from messy institutional documents and retrieval workflows.
A university RAG chatbot that turns scattered Sogang documents and administrative knowledge into a searchable assistant.
MindBuddhi
A source-grounded Buddhist/meditation assistant that turns texts into counseling-oriented interactions.
A Buddhist and meditation knowledge assistant built around source-grounded answers, counseling flows, and shareable artifacts.

Water-level forecasting pipeline
A real-world forecasting pipeline where event timing and hydrological context matter as much as aggregate error.
A hydrological forecasting workflow around TimeGPT, classical baselines, rainfall features, and rolling-origin evaluation.