Submitted
2026
AI4Math / Benchmark evaluation

EntropyMath: An Evolutionary Benchmark Generation System for Evaluating High-Difficulty Mathematical Reasoning

The submitted EntropyMath workshop paper: benchmarks as auditable evolutionary processes rather than static problem sets.

A benchmark-generation system that treats the construction of high-difficulty mathematical reasoning data as an evolutionary, auditable process, tracked through problem lineage, solver traces, and validation gates.


problem

Generated math benchmarks can be useful but difficult to trust when provenance, solver dependence, and validation contracts are hidden.

key idea

Make benchmark generation inspectable through evolutionary lineages, solver traces, and explicit validation/evidence artifacts.
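To make "inspectable through lineages" concrete, here is a minimal sketch of an audit record. It assumes each generated problem is stored with a parent pointer, the mutation that produced it, its raw solver traces, and its validation-gate outcomes, and that any problem can be traced back to its seed. The class `ProblemRecord`, its fields, the helper `lineage`, and the mutation names are hypothetical illustrations, not EntropyMath's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProblemRecord:
    # Hypothetical audit record for one generated problem.
    problem_id: str
    statement: str
    parent_id: Optional[str] = None  # None marks a seed problem
    mutation: Optional[str] = None  # operation that derived this problem from its parent
    solver_traces: List[str] = field(default_factory=list)  # raw solver outputs kept as evidence
    gate_results: Dict[str, bool] = field(default_factory=dict)  # validation gate name -> outcome


def lineage(store: Dict[str, ProblemRecord], problem_id: str) -> List[str]:
    """Walk parent links from a problem back to its seed, returning the ids in order."""
    chain: List[str] = []
    current: Optional[str] = problem_id
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in lineage at {current}")
        chain.append(current)
        current = store[current].parent_id
    return chain


store = {
    "p0": ProblemRecord("p0", "seed problem"),
    "p1": ProblemRecord("p1", "variant of p0", parent_id="p0", mutation="add_constraint"),
    "p2": ProblemRecord("p2", "variant of p1", parent_id="p1", mutation="generalize"),
}
print(lineage(store, "p2"))  # ['p2', 'p1', 'p0']
```

Keeping the parent pointer and mutation on every record is what lets a reviewer reconstruct how a hard problem was reached, rather than seeing only the final statement.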

my role

Lead system builder and paper framer.

methods

  • Evolutionary problem generation
  • Benchmark lineage tracking
  • Solver-trace analysis
  • Validation-gate design
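One way these four methods could fit together in a single generation step is sketched below. This is an illustration under stated assumptions, not the EntropyMath implementation. Each parent is mutated, the child is linked to its parent for lineage, a solver is run several times to record traces, and the child survives only if it passes every validation gate. Difficulty is estimated here as the solve rate (fraction of attempts matching the reference answer). This proxy is a swapped-in assumption, the 0.8 threshold is arbitrary, and `Candidate`, `evolve_step`, the gate functions, `toy_mutate`, and `toy_solve` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Candidate:
    # Hypothetical candidate problem; parent_id links it to the problem it was derived from.
    problem_id: str
    statement: str
    answer: int
    parent_id: Optional[str] = None
    traces: List[int] = field(default_factory=list)  # answers returned by each solver attempt


def solve_rate(c: Candidate) -> float:
    """Fraction of recorded solver attempts that match the reference answer."""
    if not c.traces:
        return 0.0
    return sum(t == c.answer for t in c.traces) / len(c.traces)


# Hypothetical validation gates: each returns True if the candidate may survive.
def has_statement(c: Candidate) -> bool:
    return bool(c.statement.strip())


def not_trivial(c: Candidate) -> bool:
    return solve_rate(c) < 0.8  # arbitrary threshold: reject near-always-solved problems


def not_unsolved(c: Candidate) -> bool:
    return solve_rate(c) > 0.0  # weak evidence that the reference answer is reachable


GATES: List[Callable[[Candidate], bool]] = [has_statement, not_trivial, not_unsolved]


def evolve_step(
    parents: List[Candidate],
    mutate: Callable[[Candidate], Candidate],
    solve: Callable[[Candidate, int], int],
    attempts: int = 8,
) -> List[Candidate]:
    """Mutate each parent, record solver traces, and keep children that pass every gate."""
    survivors = []
    for parent in parents:
        child = mutate(parent)
        child.parent_id = parent.problem_id  # lineage link
        child.traces = [solve(child, i) for i in range(attempts)]
        if all(gate(child) for gate in GATES):
            survivors.append(child)
    return survivors


# Toy stand-ins for a real mutation operator and solver.
def toy_mutate(p: Candidate) -> Candidate:
    return Candidate(p.problem_id + "'", f"3 * ({p.statement})", p.answer * 3)


def toy_solve(c: Candidate, attempt: int) -> int:
    # Always right on small answers; right only on even-numbered attempts otherwise.
    if c.answer < 10 or attempt % 2 == 0:
        return c.answer
    return c.answer + 1


parents = [Candidate("a", "1 + 1", 2), Candidate("b", "2 + 3", 5)]
kept = evolve_step(parents, toy_mutate, toy_solve)
print([(c.problem_id, c.parent_id, solve_rate(c)) for c in kept])  # [("b'", 'b', 0.5)]
```

In the toy run, the child of `a` is always solved and fails the `not_trivial` gate, while the child of `b` is solved on half its attempts and survives with its lineage and traces attached. Keeping gates as a plain list of predicates makes each rejection reason individually auditable.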

evidence / results

  • Submitted to the ICML AI for Math Workshop track
  • Established a revision agenda focused on scope, contribution framing, and the scale of supporting evidence

why this belongs in the portfolio

  • Defines the auditable-benchmark axis of the portfolio
  • Connects EntropyMath, EntropyMaG, and EntropyMaLean into one research program

authors

Jae-Hyun Baek et al.

venue / status

ICML AI for Math Workshop submission

Currently serves as the public anchor for the benchmark-evolution research line.

tags

EntropyMath, AI4Math, benchmark generation, lineage, verification

artifact links

  • Project
  • Code