Formalized Mathematical AI · Auditable Evaluation · Deployed Knowledge Systems

I turn domain knowledge into systems that agents can use.

I build AI systems that turn domain knowledge into formal, searchable, and testable artifacts: Lean libraries for coding theory, auditable math benchmarks, RAG products, forecasting pipelines, and research memory for agents.

View research map GitHub Start a conversation

the working pattern

domain knowledge

papers, logs, textbooks, PDFs, datasets

structured substrate

Lean, graphs, vector memory, MCP, schemas

agent interface

retrieval, prover, evaluator, product workflow

artifact

paper, benchmark, deployed system, public tool

workshop, not just CV

The prototype usually starts ugly.

A notebook. A pile of PDFs. A Lean theorem that will not compile. A dataset with missing values. A product log nobody wants to clean. If the idea survives that first mess, I try to turn it into something searchable, testable, and shareable.

Formalized mathematical AI

Coding theory, self-dual codes, Lean/Mathlib, and MCP-style memory for mathematical agents.

Auditable evaluation

EntropyMath and EntropyMaLean treat benchmarks as traceable processes, not frozen files.

Agent-readable knowledge

I keep asking what shape a domain must take before an agent, prover, or evaluator can use it.

Deployed AI systems

RAG chatbots, forecasting pipelines, medical AI tools, and education products keep the work grounded.

selected current focus

What I am sharpening now.

I keep a wide workshop, but the public portfolio should make the center of gravity clear: formalized mathematical AI, auditable evaluation, and applied systems that prove the substrate works.

How I choose work

core thesis

Formalized Mathematical AI

CodingTheoryLib, Lean, MCP-style memory, and the 2026 KIIS LLM-Lean award thread are the spine of my long-term research direction.

target artifact

A reusable coding-theory and math-solving substrate that theorem-proving agents can search, inspect, falsify, and extend.

Leancoding theoryMCPclosed agentsfalsifier

active paper track

Auditable benchmark evolution

EntropyMath and EntropyMaLean connect generated math problems to solver traces, verifier contracts, and revision-ready evidence.

target artifact

Benchmark lineages that can be audited rather than treated as static download files.

EntropyMathAI4Mathevaluationverification

product/research bridge

Applied AI systems with real users

SOGAMBOT, MindBuddhi, water-level forecasting, and medical AI keep the research grounded in messy real-world data and users.

target artifact

Deployed assistants, forecasting workflows, and domain-specific pipelines with source-grounded outputs.

RAGforecastingmedical AIdeployment

accepted and published work

Papers are artifacts too.

I want the paper, the code, the benchmark, and the deployed system to point at the same underlying substrate whenever possible.

All research

Proposal of an LLM-Lean Approach and Architecture for Automated Mathematical Problem Solving

Accepted

2026

AI4Math / Lean agents

Proposal of an LLM-Lean Approach and Architecture for Automated Mathematical Problem Solving

Youngjik Lee, Jon-Lark Kim, Woo-Seok Jeong, Beom-Soo Kim, Jae-Hyun Baek

2026 KIIS Spring Conference — Best Paper Award

TL;DR: A Lean-facing multi-agent math-solving architecture where a falsifier makes the solver loop more inspectable and efficient.

A closed-agent architecture for mathematical problem solving that compares a Single solver structure with a Dual structure where a falsifier participates in the reasoning loop, emphasizing token/compute efficiency and final solving performance.

LLM-Leanclosed agentsfalsifierAI4Mathbest paper

Detail page Source

Performance Improvement of LLMs for Regulatory Document Understanding based on Modified RAG Approach

Published

2025

RAG / Regulatory AI

Performance Improvement of LLMs for Regulatory Document Understanding based on Modified RAG Approach

Jae-Hyun Baek, Jon-Lark Kim

JKIIS

TL;DR: A regulatory-document RAG system that foreshadows my current focus on agent-readable domain memory.

A modified retrieval augmented generation framework for regulatory documents. This is one of the applied roots of my current interest in turning domain documents into agent-readable memory.

RAGregulatory documentsLLM evaluationdomain memory

Detail page ResearchGate

Water Level Forecasting in the Gamcheon River, Korea, Using TimeGPT

Published

2026

Time-series / Hydrology AI

Water Level Forecasting in the Gamcheon River, Korea, Using TimeGPT

Jon-Lark Kim, Jae-Hyun Baek, Keon-Hwi Kim, Tae Hyo Baek, Chang-Lae Jang

International Journal of Fuzzy Logic and Intelligent Systems

TL;DR: A real-world forecasting study that tests foundation-model time-series methods against hydrological constraints and evaluation choices.

A study of foundation-model forecasting for river water levels. We compare TimeGPT with classical and linear baselines under rolling-origin evaluation on the Gamcheon River setting.

TimeGPTwater-level predictionfoundation modelsenvironmental AI

Detail page DOI ResearchGate

Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures

Accepted

2025

VLM / Mathematical reasoning

Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures

Sangsoo Lee, Jae-Hyun Baek, Jon-Lark Kim

Accepted manuscript

TL;DR: A multimodal math-solving pipeline for figure-heavy problems where text-only reasoning misses the visual structure.

A hybrid pipeline combining visual retrieval and LLM reasoning for diagram-heavy mathematical problems, especially cases where OCR-only methods miss the actual structure of the figure.

VLMColPaliMathVisionmultimodal reasoning

Detail page ResearchGate

deployed research systems

A map of working artifacts.

The surface changes from theorem proving to medical AI to RAG products. The maker habit is the same: find the hidden structure, make it computable, and ship a usable artifact.

proof substrate

active

CodingTheoryLib / CodingTheoryLib-MCP

Formal coding-theory knowledge shaped into a reusable substrate for theorem-proving agents.

A formalized coding-theory substrate: Lean definitions, proof work, graph memory, and MCP-style access for agents.

The goal is not only to prove isolated theorems. I want a working memory for a mathematical domain, something a prover agent can actually search and reuse.

LeanMathlibcoding theoryMCPformal memory

Detail page Code

EntropyMath / EntropyMaG / EntropyMaLean

benchmark

active

EntropyMath / EntropyMaG / EntropyMaLean

Math benchmarks treated as auditable processes rather than static downloads.

A family of systems for evolving math problems with lineage, solver traces, and verification hooks instead of treating benchmarks as frozen files.

I keep coming back to the same question: can a benchmark be something you audit, not just something you download?

AI4MathevaluationLeanbenchmark evolutionprovenance

Detail page Demo Code

research system

research

SolEvolve

An agentic search loop for proposing, mutating, testing, and preserving mathematical candidates.

An LLM-guided evolutionary discovery system for algorithmic and mathematical search, developed around coding-theory constructions.

This is the part of my work where the agent is allowed to be a messy researcher: propose, mutate, test, fail, and try again.

evolutionary searchSATcoding theoryLLM agents

Detail page Code

deployed system

deployed

SOGAMBOT

A deployed university assistant built from messy institutional documents and retrieval workflows.

A university RAG chatbot that turns scattered Sogang documents and administrative knowledge into a searchable assistant.

This is the product version of the same pattern: take institutional knowledge that nobody can easily search, then make it usable.

RAGuniversity chatbotretrievaldeployment

Detail page Demo

deployed system

prototype

MindBuddhi

A source-grounded Buddhist/meditation assistant that turns texts into counseling-oriented interactions.

A Buddhist and meditation knowledge assistant built around source-grounded answers, counseling flows, and shareable artifacts.

The hard part is not only the chatbot. It is preserving the texture of the source material while still making the experience simple enough to use.

RAGBuddhist textscounseling UXsource grounding

Detail page

forecasting

research

Water-level forecasting pipeline

A real-world forecasting pipeline where event timing and hydrological context matter as much as aggregate error.

A hydrological forecasting workflow around TimeGPT, classical baselines, rainfall features, and rolling-origin evaluation.

This is applied AI with real-world friction: missing values, stations, flood events, and evaluation choices that matter outside the notebook.

TimeGPThydrologyforecastingenvironmental AI

Detail page