EntropyMath / EntropyMaG / EntropyMaLean
Math benchmarks treated as auditable processes rather than static downloads.
A family of systems that evolve math problems while recording lineage, solver traces, and verification hooks for every generated item.

problem
Generated math benchmarks can look impressive while hiding provenance, contamination risk, solver dependence, and weak validation contracts.
why it matters
AI4Math evaluation needs artifacts that can be inspected, traced, and revised, especially when generated data becomes part of the benchmark supply chain.
maker note
I keep coming back to the same question: can a benchmark be something you audit, not just something you download?
what I built
- Multi-generation math problem evolution pipeline
- Quality gates around verification, lineage, and release metadata
- Evaluation artifacts for ICML AI4Math workshop submission/revision
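
As a rough illustration of how lineage and a quality gate could fit together, here is a minimal Python sketch. It is not the EntropyMath implementation: `ProblemRecord`, `evolve`, `passes_gate`, and `lineage` are hypothetical names, and the `verify` callable is a stand-in for whatever solver or proof check the real pipeline uses.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional


@dataclass(frozen=True)
class ProblemRecord:
    """One problem in an evolution run (hypothetical schema)."""
    statement: str
    answer: str
    generation: int
    parent_id: Optional[str] = None  # None only for generation-0 seeds

    @property
    def record_id(self) -> str:
        # The ID is derived from content plus parent, so editing any field
        # yields a different ID and the change shows up in the lineage.
        payload = f"{self.parent_id}|{self.generation}|{self.statement}|{self.answer}"
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def evolve(parent: ProblemRecord, statement: str, answer: str) -> ProblemRecord:
    """Create a child problem that points back to its parent."""
    return ProblemRecord(statement, answer, parent.generation + 1, parent.record_id)


def passes_gate(
    record: ProblemRecord,
    by_id: Mapping[str, ProblemRecord],
    verify: Callable[[ProblemRecord], bool],
) -> bool:
    """Release gate: lineage must resolve and verification must succeed."""
    if record.generation == 0:
        if record.parent_id is not None:
            return False  # a seed must not claim a parent
    elif record.parent_id not in by_id:
        return False  # an evolved problem needs a known parent
    return verify(record)


def lineage(record: ProblemRecord, by_id: Mapping[str, ProblemRecord]) -> List[ProblemRecord]:
    """Return the chain from the original seed down to `record`."""
    chain = [record]
    while chain[-1].parent_id is not None:
        chain.append(by_id[chain[-1].parent_id])
    return list(reversed(chain))


if __name__ == "__main__":
    seed = ProblemRecord("Compute 2 + 3.", "5", 0)
    child = evolve(seed, "Compute 2 + 3 * 4.", "14")
    store = {seed.record_id: seed}
    # Toy verifier (assumption): only checks that the answer is a numeral.
    if passes_gate(child, store, lambda r: r.answer.isdigit()):
        store[child.record_id] = child
    for r in lineage(child, store):
        print(r.generation, r.record_id, r.statement)
```

The design choice worth noting is that a record is admitted to the store only after it passes the gate, so every released item has a resolvable chain back to a seed.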
evidence
- Public EntropyMath artifact
- EntropyMaG/EntropyMaLean project family
- Revision-ready evidence and benchmark lineage framing
current status
Paper under active revision after workshop review; the framing is shifting toward auditable benchmark evolution.
next step
Sharpen the contribution around a narrow Lean-gated validity contract and stronger evidence scale.
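
To make "Lean-gated" concrete, here is a toy sketch of what such a contract might look like for a single item. The problem and the theorem name are invented for illustration, and the actual EntropyMaLean contract may differ: the idea is only that an item is released if its claimed answer type-checks as a Lean statement.

```lean
-- Hypothetical generated problem: "Compute 17 * 3 + 4." with claimed answer 55.
-- Under a Lean gate, the item would be released only if this theorem type-checks.
theorem claimed_answer_checks : (17 * 3 + 4 : Nat) = 55 := by decide
```

A wrong claimed answer (say 54) would fail to compile, which is what makes the gate narrow and auditable: the check is the proof term, not a solver's opinion.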
related publications
- Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures
  A multimodal math-solving pipeline for figure-heavy problems where text-only reasoning misses the visual structure.
- SolEvolve: LLM-driven Evolutionary Discovery of Algorithms
  An LLM-guided evolutionary search system for discovering and testing coding-theoretic constructions.