Formalized Mathematical AI · Auditable Evaluation · Deployed Knowledge Systems

I turn domain knowledge into systems that agents can use.

I build AI systems that turn domain knowledge into formal, searchable, and testable artifacts: Lean libraries for coding theory, auditable math benchmarks, RAG products, forecasting pipelines, and research memory for agents.

the working pattern

1

domain knowledge

papers, logs, textbooks, PDFs, datasets

2

structured substrate

Lean, graphs, vector memory, MCP, schemas

3

agent interface

retrieval, prover, evaluator, product workflow

4

artifact

paper, benchmark, deployed system, public tool

workshop, not just CV

The prototype usually starts ugly.

A notebook. A pile of PDFs. A Lean theorem that will not compile. A dataset with missing values. A product log nobody wants to clean. If the idea survives that first mess, I try to turn it into something searchable, testable, and shareable.

Formalized mathematical AI

Coding theory, self-dual codes, Lean/Mathlib, and MCP-style memory for mathematical agents.

Auditable evaluation

EntropyMath and EntropyMaLean treat benchmarks as traceable processes, not frozen files.

Agent-readable knowledge

I keep asking what shape a domain must take before an agent, prover, or evaluator can use it.

Deployed AI systems

RAG chatbots, forecasting pipelines, medical AI tools, and education products keep the work grounded.

selected current focus

What I am sharpening now.

I keep a wide workshop, but the public portfolio should make the center of gravity clear: formalized mathematical AI, auditable evaluation, and applied systems that prove the substrate works.

How I choose work
1
core thesis

Formalized Mathematical AI

CodingTheoryLib, Lean, MCP-style memory, and the 2026 KIIS LLM-Lean award thread are the spine of my long-term research direction.

target artifact

A reusable coding-theory and math-solving substrate that theorem-proving agents can search, inspect, falsify, and extend.

Leancoding theoryMCPclosed agentsfalsifier
2
active paper track

Auditable benchmark evolution

EntropyMath and EntropyMaLean connect generated math problems to solver traces, verifier contracts, and revision-ready evidence.

target artifact

Benchmark lineages that can be audited rather than treated as static download files.

EntropyMathAI4Mathevaluationverification
3
product/research bridge

Applied AI systems with real users

SOGAMBOT, MindBuddhi, water-level forecasting, and medical AI keep the research grounded in messy real-world data and users.

target artifact

Deployed assistants, forecasting workflows, and domain-specific pipelines with source-grounded outputs.

RAGforecastingmedical AIdeployment
accepted and published work

Papers are artifacts too.

I want the paper, the code, the benchmark, and the deployed system to point at the same underlying substrate whenever possible.

All research
Proposal of an LLM-Lean Approach and Architecture for Automated Mathematical Problem Solving
Accepted
2026
AI4Math / Lean agents

Proposal of an LLM-Lean Approach and Architecture for Automated Mathematical Problem Solving

Youngjik Lee, Jon-Lark Kim, Woo-Seok Jeong, Beom-Soo Kim, Jae-Hyun Baek

2026 KIIS Spring Conference — Best Paper Award

TL;DR: A Lean-facing multi-agent math-solving architecture where a falsifier makes the solver loop more inspectable and efficient.

A closed-agent architecture for mathematical problem solving that compares a Single solver structure with a Dual structure where a falsifier participates in the reasoning loop, emphasizing token/compute efficiency and final solving performance.

LLM-Leanclosed agentsfalsifierAI4Mathbest paper
Performance Improvement of LLMs for Regulatory Document Understanding based on Modified RAG Approach
Published
2025
RAG / Regulatory AI
TL;DR: A regulatory-document RAG system that foreshadows my current focus on agent-readable domain memory.

A modified retrieval augmented generation framework for regulatory documents. This is one of the applied roots of my current interest in turning domain documents into agent-readable memory.

RAGregulatory documentsLLM evaluationdomain memory
Water Level Forecasting in the Gamcheon River, Korea, Using TimeGPT
Published
2026
Time-series / Hydrology AI

Water Level Forecasting in the Gamcheon River, Korea, Using TimeGPT

Jon-Lark Kim, Jae-Hyun Baek, Keon-Hwi Kim, Tae Hyo Baek, Chang-Lae Jang

International Journal of Fuzzy Logic and Intelligent Systems

TL;DR: A real-world forecasting study that tests foundation-model time-series methods against hydrological constraints and evaluation choices.

A study of foundation-model forecasting for river water levels. We compare TimeGPT with classical and linear baselines under rolling-origin evaluation on the Gamcheon River setting.

TimeGPTwater-level predictionfoundation modelsenvironmental AI
Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures
Accepted
2025
VLM / Mathematical reasoning

Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures

Sangsoo Lee, Jae-Hyun Baek, Jon-Lark Kim

Accepted manuscript

TL;DR: A multimodal math-solving pipeline for figure-heavy problems where text-only reasoning misses the visual structure.

A hybrid pipeline combining visual retrieval and LLM reasoning for diagram-heavy mathematical problems, especially cases where OCR-only methods miss the actual structure of the figure.

VLMColPaliMathVisionmultimodal reasoning
deployed research systems

A map of working artifacts.

The surface changes from theorem proving to medical AI to RAG products. The maker habit is the same: find the hidden structure, make it computable, and ship a usable artifact.

CodingTheoryLib / CodingTheoryLib-MCP
proof substrate
active

CodingTheoryLib / CodingTheoryLib-MCP

Formal coding-theory knowledge shaped into a reusable substrate for theorem-proving agents.

A formalized coding-theory substrate: Lean definitions, proof work, graph memory, and MCP-style access for agents.

The goal is not only to prove isolated theorems. I want a working memory for a mathematical domain, something a prover agent can actually search and reuse.
LeanMathlibcoding theoryMCPformal memory
EntropyMath / EntropyMaG / EntropyMaLean
benchmark
active

EntropyMath / EntropyMaG / EntropyMaLean

Math benchmarks treated as auditable processes rather than static downloads.

A family of systems for evolving math problems with lineage, solver traces, and verification hooks instead of treating benchmarks as frozen files.

I keep coming back to the same question: can a benchmark be something you audit, not just something you download?
AI4MathevaluationLeanbenchmark evolutionprovenance
SolEvolve
research system
research

SolEvolve

An agentic search loop for proposing, mutating, testing, and preserving mathematical candidates.

An LLM-guided evolutionary discovery system for algorithmic and mathematical search, developed around coding-theory constructions.

This is the part of my work where the agent is allowed to be a messy researcher: propose, mutate, test, fail, and try again.
evolutionary searchSATcoding theoryLLM agents
SOGAMBOT
deployed system
deployed

SOGAMBOT

A deployed university assistant built from messy institutional documents and retrieval workflows.

A university RAG chatbot that turns scattered Sogang documents and administrative knowledge into a searchable assistant.

This is the product version of the same pattern: take institutional knowledge that nobody can easily search, then make it usable.
RAGuniversity chatbotretrievaldeployment
MindBuddhi
deployed system
prototype

MindBuddhi

A source-grounded Buddhist/meditation assistant that turns texts into counseling-oriented interactions.

A Buddhist and meditation knowledge assistant built around source-grounded answers, counseling flows, and shareable artifacts.

The hard part is not only the chatbot. It is preserving the texture of the source material while still making the experience simple enough to use.
RAGBuddhist textscounseling UXsource grounding
Water-level forecasting pipeline
forecasting
research

Water-level forecasting pipeline

A real-world forecasting pipeline where event timing and hydrological context matter as much as aggregate error.

A hydrological forecasting workflow around TimeGPT, classical baselines, rainfall features, and rolling-origin evaluation.

This is applied AI with real-world friction: missing values, stations, flood events, and evaluation choices that matter outside the notebook.
TimeGPThydrologyforecastingenvironmental AI