research program

Formal, searchable, and testable knowledge for AI agents.

My work asks how messy domain knowledge becomes something an agent, prover, evaluator, or real user can reliably use. The domains range from Lean and coding theory to RAG products, hydrology, and medical images, but the motion is the same: substrate → interface → evidence → artifact.

research map

Four threads, one repeated pattern.

Each axis below links a question to the papers and systems that carry it. This is the page that should explain the research program before anyone sees a certificate or a trophy.

Formalized Mathematical AI

Make mathematical knowledge formal, searchable, and reusable by theorem-proving agents.

This is the spine of the portfolio: coding theory, self-dual codes, Lean/Mathlib, closed-agent math solving, and domain-specific memory for mathematical agents.

questions

  • What should a mathematical domain look like before an agent can use it?
  • How do Lean artifacts, proof trajectories, and MCP interfaces become reusable research memory?
LeanCodingTheoryLibAI4MathMCPproof trajectories
Auditable Benchmark Evolution

Treat benchmarks as inspectable processes, not static files.

EntropyMath, EntropyMaG, and EntropyMaLean connect generated problems to lineage, solver traces, verifier contracts, and revision-ready evidence.

questions

  • What does a generated math benchmark need to expose before it can be trusted?
  • Can verification and lineage become first-class evaluation artifacts?
EntropyMathbenchmark lineageverificationsolver traces
Deployed Knowledge Systems

Use real products to test whether a knowledge substrate actually helps people.

SOGAMBOT, MindBuddhi, regulatory RAG, and institutional assistants keep the agent-readable-memory idea grounded in messy users, documents, and product constraints.

questions

  • Can source-grounded answers preserve the texture of a domain?
  • What breaks when retrieval leaves a notebook and becomes a user-facing system?
RAGsource groundingproduct UXdomain memory
Applied AI Methods under Real Constraints

Use forecasting, medical images, and education tools to keep the research honest.

Hydrology, whole-slide images, and education systems create pressure that clean benchmarks often hide: missing data, clinical validation, event timing, and operational evaluation.

questions

  • What evaluation choices matter outside the benchmark?
  • How does domain friction reshape the model pipeline?
forecastingmedical AIeducation AIevaluation

Selected publications

Performance Improvement of LLMs for Regulatory Document Understanding based on Modified RAG Approach
Published
2025
RAG / Regulatory AI
TL;DR: A regulatory-document RAG system that foreshadows my current focus on agent-readable domain memory.

A modified retrieval augmented generation framework for regulatory documents. This is one of the applied roots of my current interest in turning domain documents into agent-readable memory.

RAGregulatory documentsLLM evaluationdomain memory
Water Level Forecasting in the Gamcheon River, Korea, Using TimeGPT
Published
2026
Time-series / Hydrology AI

Water Level Forecasting in the Gamcheon River, Korea, Using TimeGPT

Jon-Lark Kim, Jae-Hyun Baek, Keon-Hwi Kim, Tae Hyo Baek, Chang-Lae Jang

International Journal of Fuzzy Logic and Intelligent Systems

TL;DR: A real-world forecasting study that tests foundation-model time-series methods against hydrological constraints and evaluation choices.

A study of foundation-model forecasting for river water levels. We compare TimeGPT with classical and linear baselines under rolling-origin evaluation on the Gamcheon River setting.

TimeGPTwater-level predictionfoundation modelsenvironmental AI
Status labels separate published/accepted work from submitted or under-review manuscripts. Each detail page explains the problem, key idea, role, evidence, and related artifacts.

Submitted, accepted, and under-review papers

This section is an inventory, not a selective shelf: it includes the accepted works, workshop/conference submissions, submitted manuscripts, and under-review manuscripts I have evidence for in the local research archive.
Proposal of an LLM-Lean Approach and Architecture for Automated Mathematical Problem Solving
Accepted
2026
AI4Math / Lean agents

Proposal of an LLM-Lean Approach and Architecture for Automated Mathematical Problem Solving

Youngjik Lee, Jon-Lark Kim, Woo-Seok Jeong, Beom-Soo Kim, Jae-Hyun Baek

2026 KIIS Spring Conference — Best Paper Award

TL;DR: A Lean-facing multi-agent math-solving architecture where a falsifier makes the solver loop more inspectable and efficient.

A closed-agent architecture for mathematical problem solving that compares a Single solver structure with a Dual structure where a falsifier participates in the reasoning loop, emphasizing token/compute efficiency and final solving performance.

LLM-Leanclosed agentsfalsifierAI4Mathbest paper
Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures
Accepted
2025
VLM / Mathematical reasoning

Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures

Sangsoo Lee, Jae-Hyun Baek, Jon-Lark Kim

Accepted manuscript

TL;DR: A multimodal math-solving pipeline for figure-heavy problems where text-only reasoning misses the visual structure.

A hybrid pipeline combining visual retrieval and LLM reasoning for diagram-heavy mathematical problems, especially cases where OCR-only methods miss the actual structure of the figure.

VLMColPaliMathVisionmultimodal reasoning
SolEvolve: LLM-driven Evolutionary Discovery of Algorithms
Submitted
2025
Evolutionary discovery / Coding theory

SolEvolve: LLM-driven Evolutionary Discovery of Algorithms

Jae-Hyun Baek

Master's thesis / manuscript track

TL;DR: An LLM-guided evolutionary search system for discovering and testing coding-theoretic constructions.

An autonomous search system where LLM-guided operators, SAT-seeded search, and verification loops rediscover strong coding-theoretic constructions.

evolutionary searchSATcoding theoryalgorithm discovery
MekaNet: WSI-based Tiny Object Detection
Under Review
2025
Medical AI / Computer vision

MekaNet: WSI-based Tiny Object Detection

Jae-Hyun Baek et al.

Medical image analysis manuscript

TL;DR: A pathology WSI pipeline for tiny-object detection and clinical-data evaluation under real imaging constraints.

A whole-slide-image pipeline for small object detection in pathology. It belongs to the applied side of my work: turning difficult domain data into usable AI systems.

medical AIWSItiny object detectioncomputer vision
Coding theory / Combinatorial games
Submitted
2025
Coding theory / Combinatorial games

Symmetric Sudoku-Type Games from Perfect Codes

Jae-Hyun Baek et al.

Submitted manuscript

TL;DR: A combinatorial-game manuscript that connects coding theory, symmetry, and puzzle construction.

A coding-theoretic construction of Sudoku-type puzzle structures from perfect codes and symmetry principles.

coding theoryperfect codesgamescombinatorics
EntropyMath: An Evolutionary Benchmark Generation System for Evaluating High-Difficulty Mathematical Reasoning
Submitted
2026
AI4Math / Benchmark evaluation
TL;DR: The submitted EntropyMath workshop paper: benchmarks as auditable evolutionary processes rather than static problem sets.

A benchmark-generation system that treats difficult mathematical reasoning data as an evolutionary, auditable process with lineage, solver traces, and validation gates.

EntropyMathAI4Mathbenchmark generationlineageverification
A Case Study on Alignment Faking in LLMs
Accepted
2025
LLM safety / Alignment evaluation

A Case Study on Alignment Faking in LLMs

Jae-Hyun Baek, Jon-Lark Kim

ISIS 2025 — Best Presentation Award

TL;DR: An accepted ISIS 2025 alignment-safety work that connects behavioral evaluation to formalizable conditions for apparent compliance.

A case-study presentation on alignment faking: when externally aligned behavior may diverge from internal reasoning under evaluation or monitoring pressure.

alignment fakingLLM safetyformal methodsevaluation
MekaNet: A Deep Learning Framework for Megakaryocyte Detection and Myeloproliferative Neoplasm Classification with Enhanced Feature Engineering
Submitted
2025
Medical AI / Pathology
TL;DR: A submitted MekaNet manuscript connecting megakaryocyte detection, MPN classification, and clinically shaped feature engineering.

A medical-image manuscript on megakaryocyte detection and myeloproliferative neoplasm classification with enhanced feature-engineering support.

MekaNetmegakaryocyteMPNpathologyfeature engineering
MekaNet: A Tiling-Enhanced Semi-Supervised Detection Framework for Megakaryocytes
Submitted
2025
Medical AI / Pathology
TL;DR: A submitted MekaNet manuscript focused on tiling and semi-supervised detection for megakaryocytes.

A BMC-style medical-image manuscript on a tiling-enhanced semi-supervised framework for detecting megakaryocytes in pathology imagery.

MekaNettilingsemi-supervised learningmegakaryocyteWSI
Automated Cellularity Assessment in Bone Marrow Using Deep Learning-Based Segmentation
Submitted
2025
Medical AI / Segmentation
TL;DR: A submitted medical-AI manuscript on bone-marrow cellularity assessment using segmentation.

A bone-marrow cellularity assessment manuscript using deep learning-based segmentation for clinical image-analysis support.

cellularitybone marrowsegmentationmedical AI
Shortest Self-Orthogonal Embeddings of Selected Binary Codes
Submitted
2025
Coding theory / Formal mathematics

Shortest Self-Orthogonal Embeddings of Selected Binary Codes

Jae-Hyun Baek et al.

Submitted coding-theory manuscript

TL;DR: A submitted coding-theory manuscript on self-orthogonal embeddings, adjacent to the Lean/CodingTheoryLib line.

A coding-theory manuscript on shortest self-orthogonal embeddings of selected binary codes, connected to the formal-math backbone of the portfolio.

coding theorybinary codesself-orthogonal embeddingsformalization
Coding theory / Combinatorial games
Submitted
2025
Coding theory / Combinatorial games

A Novel Sudoku from Perfect Codes over the Ring of Integers Modulo n

Jae-Hyun Baek et al.

Submitted coding-theory manuscript

TL;DR: A submitted manuscript connecting perfect codes over Z/nZ to Sudoku-style combinatorial game construction.

A manuscript on constructing Sudoku-type games from perfect codes over the ring of integers modulo n.

perfect codesZ/nZSudokucombinatoricscoding theory
systems and artifacts

The systems behind the papers.

The public site should make artifacts easier to inspect than awards. These systems are the working surfaces where the research program becomes code, data, demos, and deployed tools.

CodingTheoryLib / CodingTheoryLib-MCP
proof substrate
active

CodingTheoryLib / CodingTheoryLib-MCP

Formal coding-theory knowledge shaped into a reusable substrate for theorem-proving agents.

A formalized coding-theory substrate: Lean definitions, proof work, graph memory, and MCP-style access for agents.

The goal is not only to prove isolated theorems. I want a working memory for a mathematical domain, something a prover agent can actually search and reuse.
LeanMathlibcoding theoryMCPformal memory
EntropyMath / EntropyMaG / EntropyMaLean
benchmark
active

EntropyMath / EntropyMaG / EntropyMaLean

Math benchmarks treated as auditable processes rather than static downloads.

A family of systems for evolving math problems with lineage, solver traces, and verification hooks instead of treating benchmarks as frozen files.

I keep coming back to the same question: can a benchmark be something you audit, not just something you download?
AI4MathevaluationLeanbenchmark evolutionprovenance
SolEvolve
research system
research

SolEvolve

An agentic search loop for proposing, mutating, testing, and preserving mathematical candidates.

An LLM-guided evolutionary discovery system for algorithmic and mathematical search, developed around coding-theory constructions.

This is the part of my work where the agent is allowed to be a messy researcher: propose, mutate, test, fail, and try again.
evolutionary searchSATcoding theoryLLM agents
SOGAMBOT
deployed system
deployed

SOGAMBOT

A deployed university assistant built from messy institutional documents and retrieval workflows.

A university RAG chatbot that turns scattered Sogang documents and administrative knowledge into a searchable assistant.

This is the product version of the same pattern: take institutional knowledge that nobody can easily search, then make it usable.
RAGuniversity chatbotretrievaldeployment
MindBuddhi
deployed system
prototype

MindBuddhi

A source-grounded Buddhist/meditation assistant that turns texts into counseling-oriented interactions.

A Buddhist and meditation knowledge assistant built around source-grounded answers, counseling flows, and shareable artifacts.

The hard part is not only the chatbot. It is preserving the texture of the source material while still making the experience simple enough to use.
RAGBuddhist textscounseling UXsource grounding
Water-level forecasting pipeline
forecasting
research

Water-level forecasting pipeline

A real-world forecasting pipeline where event timing and hydrological context matter as much as aggregate error.

A hydrological forecasting workflow around TimeGPT, classical baselines, rainfall features, and rolling-origin evaluation.

This is applied AI with real-world friction: missing values, stations, flood events, and evaluation choices that matter outside the notebook.
TimeGPThydrologyforecastingenvironmental AI
View all artifacts