benchmark
active

EntropyMath / EntropyMaG / EntropyMaLean

Math benchmarks treated as auditable processes rather than static downloads.

A family of systems that evolve math problems while recording lineage, solver traces, and verification hooks, rather than treating benchmarks as frozen files.


problem

Generated math benchmarks can look impressive while hiding provenance, contamination risk, solver dependence, and weak validation contracts.

why it matters

AI4Math evaluation needs artifacts that can be inspected, traced, and revised, especially when generated data becomes part of the benchmark supply chain.

maker note

I keep coming back to the same question: can a benchmark be something you audit, not just something you download?

what I built

  • Multi-generation math problem evolution pipeline
  • Quality gates around verification, lineage, and release metadata
  • Evaluation artifacts for an ICML AI4Math workshop submission and its revision
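To make the quality gates concrete, here is a minimal sketch, assuming one in-memory record per generated problem. `ProblemRecord`, `gate_failures`, `REQUIRED_META`, and every field name are hypothetical illustrations, not the EntropyMath schema; the three checks simply mirror the verification, lineage, and release-metadata gates listed above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical record shape; field names are illustrative assumptions,
# not the actual EntropyMath schema.
@dataclass
class ProblemRecord:
    problem_id: str
    generation: int            # 0 = seed problem, n = evolved n times
    parent_id: Optional[str]   # id of the problem this one was evolved from
    statement: str
    answer: str
    solver_traces: list = field(default_factory=list)
    verified: bool = False     # set by whatever verifier checked the answer
    release_meta: dict = field(default_factory=dict)

# Hypothetical set of metadata keys a release must carry.
REQUIRED_META = ("license", "source", "created_at")

def gate_failures(rec: ProblemRecord, pool: dict) -> list:
    """Return the names of the gates this record fails (empty list = passes)."""
    failures = []
    # Verification gate: the claimed answer must have been checked.
    if not rec.verified:
        failures.append("verification")
    # Lineage gate: seeds have no parent; every later generation must point at
    # a parent in the pool that sits exactly one generation earlier.
    if rec.generation == 0:
        if rec.parent_id is not None:
            failures.append("lineage")
    else:
        parent = pool.get(rec.parent_id)
        if parent is None or parent.generation != rec.generation - 1:
            failures.append("lineage")
    # Release-metadata gate: every required key must be present and non-empty.
    if any(not rec.release_meta.get(k) for k in REQUIRED_META):
        failures.append("release_metadata")
    return failures

# Usage with made-up illustrative values.
meta = {"license": "CC-BY-4.0", "source": "seed-set", "created_at": "2025-01-01"}
seed = ProblemRecord("p0", 0, None, "Compute 2+2.", "4", verified=True, release_meta=meta)
child = ProblemRecord("p1", 1, "p0", "Compute 2+2+2.", "6", verified=False, release_meta=meta)
pool = {r.problem_id: r for r in (seed, child)}
print(gate_failures(seed, pool))   # []
print(gate_failures(child, pool))  # ['verification']
```

Returning the list of failed gates, rather than a single boolean, keeps the reason for each rejection inspectable, which fits the goal of a benchmark you can audit.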

evidence

  • Public EntropyMath artifact
  • EntropyMaG/EntropyMaLean project family
  • Revision-ready evidence and benchmark lineage framing

current status

Active paper revision after workshop review; the work is being reframed around auditable benchmark evolution.

next step

Sharpen the contribution around a narrow Lean-gated validity contract and evidence at a larger scale.
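One plausible reading of a "Lean-gated" contract is that a generated item is released only if its claimed answer, stated as a Lean theorem, is accepted by the checker. The toy problem, `sumFirst`, and `problem_answer` below are hypothetical illustrations written in core Lean 4, not items from the benchmark.

```lean
-- Hypothetical generated problem: "What is the sum of the integers 0 through 10?"
-- Claimed answer: 55.
def sumFirst (n : Nat) : Nat := (List.range (n + 1)).foldl (· + ·) 0

-- The gate accepts the item only if this theorem type-checks.
theorem problem_answer : sumFirst 10 = 55 := by decide
```

If the proof fails to check, the item is rejected instead of released. This keeps the contract narrow: it certifies the stated answer, not the problem's difficulty or novelty.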