Submitted
2026
AI4Math / Benchmark evaluation
EntropyMath: An Evolutionary Benchmark Generation System for Evaluating High-Difficulty Mathematical Reasoning
The submitted EntropyMath workshop paper: benchmarks as auditable evolutionary processes rather than static problem sets.
A benchmark-generation system that treats difficult mathematical reasoning data as an evolutionary, auditable process with lineage, solver traces, and validation gates.

problem
Generated math benchmarks can be useful, but they are difficult to trust when their provenance, solver dependence, and validation contracts are hidden.
key idea
Make benchmark generation inspectable through evolutionary lineages, solver traces, and explicit validation/evidence artifacts.
my role
Led system development and paper framing.
methods
- Evolutionary problem generation
- Benchmark lineage tracking
- Solver-trace analysis
- Validation-gate design
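As an illustration only, the methods listed above might fit together roughly as follows. This is a minimal sketch, not the EntropyMath implementation: every name in it (`ProblemRecord`, `SolverTrace`, `lineage`, `validate`, and the two example gates) is hypothetical, and the gate rules are assumptions chosen to keep the example small.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class SolverTrace:
    """One solver attempt on a problem (hypothetical schema)."""
    solver: str
    answer: str
    correct: bool


@dataclass
class ProblemRecord:
    """A generated problem plus the evidence attached to it (hypothetical schema)."""
    statement: str
    answer: str
    parent_id: Optional[str] = None  # None marks a seed problem
    traces: list[SolverTrace] = field(default_factory=list)

    @property
    def problem_id(self) -> str:
        # Content-derived ID, so a record cannot silently change its ancestry.
        payload = f"{self.parent_id}|{self.statement}|{self.answer}"
        return hashlib.sha256(payload.encode()).hexdigest()[:12]


def lineage(record: ProblemRecord, store: dict[str, ProblemRecord]) -> list[str]:
    """Return IDs from the record back to its seed by following parent links."""
    chain = [record.problem_id]
    parent = record.parent_id
    while parent is not None:
        chain.append(parent)
        parent = store[parent].parent_id
    return chain


Gate = Callable[[ProblemRecord], bool]


def has_answer(record: ProblemRecord) -> bool:
    """Assumed gate: the reference answer must be non-empty."""
    return bool(record.answer.strip())


def not_trivial(record: ProblemRecord) -> bool:
    """Assumed gate: at least one trace exists and not every solver succeeded."""
    return bool(record.traces) and not all(t.correct for t in record.traces)


def validate(record: ProblemRecord, gates: list[Gate]) -> dict[str, bool]:
    """Run every gate and keep the per-gate outcome as an auditable artifact."""
    return {gate.__name__: gate(record) for gate in gates}
```

A mutated problem would be stored with `parent_id` set to its parent's `problem_id`; calling `lineage` then recovers its ancestry, and `validate` returns a per-gate report that can be kept alongside the solver traces instead of a single pass/fail flag.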
evidence / results
- Submitted to the ICML AI for Math Workshop track
- Drafted the current revision agenda, covering scope, contribution framing, and the scale of supporting evidence
why this belongs in the portfolio
- Defines the auditable-benchmark axis of the portfolio
- Connects EntropyMath, EntropyMaG, and EntropyMaLean into one research program