research program

Formal, searchable, and testable knowledge for AI agents.

My work asks how messy domain knowledge becomes something an agent, prover, evaluator, or real user can reliably use. The domains range from Lean and coding theory to RAG products, hydrology, and medical images, but the motion is the same: substrate → interface → evidence → artifact.

Browse artifacts GitHub ResearchGate

research map

Four threads, one repeated pattern.

Each axis below links a question to the papers and systems that carry it. This is the page that should explain the research program before anyone sees a certificate or a trophy.

Formalized Mathematical AI

Make mathematical knowledge formal, searchable, and reusable by theorem-proving agents.

This is the spine of the portfolio: coding theory, self-dual codes, Lean/Mathlib, closed-agent math solving, and domain-specific memory for mathematical agents.

questions

• What should a mathematical domain look like before an agent can use it?
• How do Lean artifacts, proof trajectories, and MCP interfaces become reusable research memory?

papers

Proposal of an LLM-Lean Approach and Architecture for Automated Mathematical Problem Solving SolEvolve: LLM-driven Evolutionary Discovery of Algorithms Symmetric Sudoku-Type Games from Perfect Codes

artifacts

CodingTheoryLib / CodingTheoryLib-MCP SolEvolve

LeanCodingTheoryLibAI4MathMCPproof trajectories

Auditable Benchmark Evolution

Treat benchmarks as inspectable processes, not static files.

EntropyMath, EntropyMaG, and EntropyMaLean connect generated problems to lineage, solver traces, verifier contracts, and revision-ready evidence.

questions

• What does a generated math benchmark need to expose before it can be trusted?
• Can verification and lineage become first-class evaluation artifacts?

papers

EntropyMath: An Evolutionary Benchmark Generation System for Evaluating High-Difficulty Mathematical Reasoning

artifacts

EntropyMath / EntropyMaG / EntropyMaLean

EntropyMathbenchmark lineageverificationsolver traces

Deployed Knowledge Systems

Use real products to test whether a knowledge substrate actually helps people.

SOGAMBOT, MindBuddhi, regulatory RAG, and institutional assistants keep the agent-readable-memory idea grounded in messy users, documents, and product constraints.

questions

• Can source-grounded answers preserve the texture of a domain?
• What breaks when retrieval leaves a notebook and becomes a user-facing system?

papers

Performance Improvement of LLMs for Regulatory Document Understanding based on Modified RAG Approach

artifacts

SOGAMBOT MindBuddhi

RAGsource groundingproduct UXdomain memory

Applied AI Methods under Real Constraints

Use forecasting, medical images, and education tools to keep the research honest.

Hydrology, whole-slide images, and education systems create pressure that clean benchmarks often hide: missing data, clinical validation, event timing, and operational evaluation.

questions

• What evaluation choices matter outside the benchmark?
• How does domain friction reshape the model pipeline?

papers

Water Level Forecasting in the Gamcheon River, Korea, Using TimeGPT Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures MekaNet: WSI-based Tiny Object Detection

artifacts

Water-level forecasting pipeline MekaNet / medical AI pipelines

forecastingmedical AIeducation AIevaluation

Selected publications

Performance Improvement of LLMs for Regulatory Document Understanding based on Modified RAG Approach

Published

2025

RAG / Regulatory AI

Performance Improvement of LLMs for Regulatory Document Understanding based on Modified RAG Approach

Jae-Hyun Baek, Jon-Lark Kim

JKIIS

TL;DR: A regulatory-document RAG system that foreshadows my current focus on agent-readable domain memory.

A modified retrieval augmented generation framework for regulatory documents. This is one of the applied roots of my current interest in turning domain documents into agent-readable memory.

RAGregulatory documentsLLM evaluationdomain memory

Detail page ResearchGate

Water Level Forecasting in the Gamcheon River, Korea, Using TimeGPT

Published

2026

Time-series / Hydrology AI

Water Level Forecasting in the Gamcheon River, Korea, Using TimeGPT

Jon-Lark Kim, Jae-Hyun Baek, Keon-Hwi Kim, Tae Hyo Baek, Chang-Lae Jang

International Journal of Fuzzy Logic and Intelligent Systems

TL;DR: A real-world forecasting study that tests foundation-model time-series methods against hydrological constraints and evaluation choices.

A study of foundation-model forecasting for river water levels. We compare TimeGPT with classical and linear baselines under rolling-origin evaluation on the Gamcheon River setting.

TimeGPTwater-level predictionfoundation modelsenvironmental AI

Detail page DOI ResearchGate

Status labels separate published/accepted work from submitted or under-review manuscripts. Each detail page explains the problem, key idea, role, evidence, and related artifacts.

Submitted, accepted, and under-review papers

This section is an inventory, not a selective shelf: it includes the accepted works, workshop/conference submissions, submitted manuscripts, and under-review manuscripts I have evidence for in the local research archive.

Proposal of an LLM-Lean Approach and Architecture for Automated Mathematical Problem Solving

Accepted

2026

AI4Math / Lean agents

Proposal of an LLM-Lean Approach and Architecture for Automated Mathematical Problem Solving

Youngjik Lee, Jon-Lark Kim, Woo-Seok Jeong, Beom-Soo Kim, Jae-Hyun Baek

2026 KIIS Spring Conference — Best Paper Award

TL;DR: A Lean-facing multi-agent math-solving architecture where a falsifier makes the solver loop more inspectable and efficient.

A closed-agent architecture for mathematical problem solving that compares a Single solver structure with a Dual structure where a falsifier participates in the reasoning loop, emphasizing token/compute efficiency and final solving performance.

LLM-Leanclosed agentsfalsifierAI4Mathbest paper

Detail page Source

Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures

Accepted

2025

VLM / Mathematical reasoning

Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures

Sangsoo Lee, Jae-Hyun Baek, Jon-Lark Kim

Accepted manuscript

TL;DR: A multimodal math-solving pipeline for figure-heavy problems where text-only reasoning misses the visual structure.

A hybrid pipeline combining visual retrieval and LLM reasoning for diagram-heavy mathematical problems, especially cases where OCR-only methods miss the actual structure of the figure.

VLMColPaliMathVisionmultimodal reasoning

Detail page ResearchGate

SolEvolve: LLM-driven Evolutionary Discovery of Algorithms

Submitted

2025

Evolutionary discovery / Coding theory

SolEvolve: LLM-driven Evolutionary Discovery of Algorithms

Jae-Hyun Baek

Master's thesis / manuscript track

TL;DR: An LLM-guided evolutionary search system for discovering and testing coding-theoretic constructions.

An autonomous search system where LLM-guided operators, SAT-seeded search, and verification loops rediscover strong coding-theoretic constructions.

evolutionary searchSATcoding theoryalgorithm discovery

Detail page

MekaNet: WSI-based Tiny Object Detection

Under Review

2025

Medical AI / Computer vision

MekaNet: WSI-based Tiny Object Detection

Jae-Hyun Baek et al.

Medical image analysis manuscript

TL;DR: A pathology WSI pipeline for tiny-object detection and clinical-data evaluation under real imaging constraints.

A whole-slide-image pipeline for small object detection in pathology. It belongs to the applied side of my work: turning difficult domain data into usable AI systems.

medical AIWSItiny object detectioncomputer vision

Detail page

Coding theory / Combinatorial games

Submitted

2025

Coding theory / Combinatorial games

Symmetric Sudoku-Type Games from Perfect Codes

Jae-Hyun Baek et al.

Submitted manuscript

TL;DR: A combinatorial-game manuscript that connects coding theory, symmetry, and puzzle construction.

A coding-theoretic construction of Sudoku-type puzzle structures from perfect codes and symmetry principles.

coding theoryperfect codesgamescombinatorics

Detail page

EntropyMath: An Evolutionary Benchmark Generation System for Evaluating High-Difficulty Mathematical Reasoning

Submitted

2026

AI4Math / Benchmark evaluation

EntropyMath: An Evolutionary Benchmark Generation System for Evaluating High-Difficulty Mathematical Reasoning

Jae-Hyun Baek et al.

ICML AI for Math Workshop submission

TL;DR: The submitted EntropyMath workshop paper: benchmarks as auditable evolutionary processes rather than static problem sets.

A benchmark-generation system that treats difficult mathematical reasoning data as an evolutionary, auditable process with lineage, solver traces, and validation gates.

EntropyMathAI4Mathbenchmark generationlineageverification

Detail page Project Code

A Case Study on Alignment Faking in LLMs

Accepted

2025

LLM safety / Alignment evaluation

A Case Study on Alignment Faking in LLMs

Jae-Hyun Baek, Jon-Lark Kim

ISIS 2025 — Best Presentation Award

TL;DR: An accepted ISIS 2025 alignment-safety work that connects behavioral evaluation to formalizable conditions for apparent compliance.

A case-study presentation on alignment faking: when externally aligned behavior may diverge from internal reasoning under evaluation or monitoring pressure.

alignment fakingLLM safetyformal methodsevaluation

Detail page

MekaNet: A Deep Learning Framework for Megakaryocyte Detection and Myeloproliferative Neoplasm Classification with Enhanced Feature Engineering

Submitted

2025

Medical AI / Pathology

MekaNet: A Deep Learning Framework for Megakaryocyte Detection and Myeloproliferative Neoplasm Classification with Enhanced Feature Engineering

Jae-Hyun Baek et al.

Submitted medical AI manuscript

TL;DR: A submitted MekaNet manuscript connecting megakaryocyte detection, MPN classification, and clinically shaped feature engineering.

A medical-image manuscript on megakaryocyte detection and myeloproliferative neoplasm classification with enhanced feature-engineering support.

MekaNetmegakaryocyteMPNpathologyfeature engineering

Detail page

MekaNet: A Tiling-Enhanced Semi-Supervised Detection Framework for Megakaryocytes

Submitted

2025

Medical AI / Pathology

MekaNet: A Tiling-Enhanced Semi-Supervised Detection Framework for Megakaryocytes

Jae-Hyun Baek et al.

BMC / medical AI manuscript

TL;DR: A submitted MekaNet manuscript focused on tiling and semi-supervised detection for megakaryocytes.

A BMC-style medical-image manuscript on a tiling-enhanced semi-supervised framework for detecting megakaryocytes in pathology imagery.

MekaNettilingsemi-supervised learningmegakaryocyteWSI

Detail page

Automated Cellularity Assessment in Bone Marrow Using Deep Learning-Based Segmentation

Submitted

2025

Medical AI / Segmentation

Automated Cellularity Assessment in Bone Marrow Using Deep Learning-Based Segmentation

Jae-Hyun Baek et al.

Submitted medical AI manuscript

TL;DR: A submitted medical-AI manuscript on bone-marrow cellularity assessment using segmentation.

A bone-marrow cellularity assessment manuscript using deep learning-based segmentation for clinical image-analysis support.

cellularitybone marrowsegmentationmedical AI

Detail page

Shortest Self-Orthogonal Embeddings of Selected Binary Codes

Submitted

2025

Coding theory / Formal mathematics

Shortest Self-Orthogonal Embeddings of Selected Binary Codes

Jae-Hyun Baek et al.

Submitted coding-theory manuscript

TL;DR: A submitted coding-theory manuscript on self-orthogonal embeddings, adjacent to the Lean/CodingTheoryLib line.

A coding-theory manuscript on shortest self-orthogonal embeddings of selected binary codes, connected to the formal-math backbone of the portfolio.

coding theorybinary codesself-orthogonal embeddingsformalization

Detail page

Coding theory / Combinatorial games

Submitted

2025

Coding theory / Combinatorial games

A Novel Sudoku from Perfect Codes over the Ring of Integers Modulo n

Jae-Hyun Baek et al.

Submitted coding-theory manuscript

TL;DR: A submitted manuscript connecting perfect codes over Z/nZ to Sudoku-style combinatorial game construction.

A manuscript on constructing Sudoku-type games from perfect codes over the ring of integers modulo n.

perfect codesZ/nZSudokucombinatoricscoding theory

Detail page

systems and artifacts

The systems behind the papers.

The public site should make artifacts easier to inspect than awards. These systems are the working surfaces where the research program becomes code, data, demos, and deployed tools.

proof substrate

active

CodingTheoryLib / CodingTheoryLib-MCP

Formal coding-theory knowledge shaped into a reusable substrate for theorem-proving agents.

A formalized coding-theory substrate: Lean definitions, proof work, graph memory, and MCP-style access for agents.

The goal is not only to prove isolated theorems. I want a working memory for a mathematical domain, something a prover agent can actually search and reuse.

LeanMathlibcoding theoryMCPformal memory

Detail page Code

EntropyMath / EntropyMaG / EntropyMaLean

benchmark

active

EntropyMath / EntropyMaG / EntropyMaLean

Math benchmarks treated as auditable processes rather than static downloads.

A family of systems for evolving math problems with lineage, solver traces, and verification hooks instead of treating benchmarks as frozen files.

I keep coming back to the same question: can a benchmark be something you audit, not just something you download?

AI4MathevaluationLeanbenchmark evolutionprovenance

Detail page Demo Code

research system

research

SolEvolve

An agentic search loop for proposing, mutating, testing, and preserving mathematical candidates.

An LLM-guided evolutionary discovery system for algorithmic and mathematical search, developed around coding-theory constructions.

This is the part of my work where the agent is allowed to be a messy researcher: propose, mutate, test, fail, and try again.

evolutionary searchSATcoding theoryLLM agents

Detail page Code

deployed system

deployed

SOGAMBOT

A deployed university assistant built from messy institutional documents and retrieval workflows.

A university RAG chatbot that turns scattered Sogang documents and administrative knowledge into a searchable assistant.

This is the product version of the same pattern: take institutional knowledge that nobody can easily search, then make it usable.

RAGuniversity chatbotretrievaldeployment

Detail page Demo

deployed system

prototype

MindBuddhi

A source-grounded Buddhist/meditation assistant that turns texts into counseling-oriented interactions.

A Buddhist and meditation knowledge assistant built around source-grounded answers, counseling flows, and shareable artifacts.

The hard part is not only the chatbot. It is preserving the texture of the source material while still making the experience simple enough to use.

RAGBuddhist textscounseling UXsource grounding

Detail page

forecasting

research

Water-level forecasting pipeline

A real-world forecasting pipeline where event timing and hydrological context matter as much as aggregate error.

A hydrological forecasting workflow around TimeGPT, classical baselines, rainfall features, and rolling-origin evaluation.

This is applied AI with real-world friction: missing values, stations, flood events, and evaluation choices that matter outside the notebook.

TimeGPThydrologyforecastingenvironmental AI

Detail page

View all artifacts