Accepted
2025
VLM / Mathematical reasoning

Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures

A multimodal math-solving pipeline for figure-heavy problems where text-only reasoning misses the visual structure.

A hybrid pipeline combining visual retrieval and LLM reasoning for diagram-heavy mathematical problems, especially cases where OCR-only methods miss the actual structure of the figure.

problem

Many mathematical problems rely on diagrams and spatial relations that text-only OCR pipelines fail to preserve.

key idea

Use multimodal retrieval and LLM reasoning as a hybrid pipeline so figure information remains part of the solving context.

my role

Co-author; connected multimodal retrieval to mathematical reasoning evaluation.

methods

  • Vision-language retrieval
  • Multimodal math reasoning
  • Figure-aware prompt construction
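
The three methods above can be read as a retrieve-then-prompt loop. The sketch below is only an illustration of that idea, not the paper's implementation: the `FigureEntry` type, the toy two-dimensional embeddings, and the prompt wording are all hypothetical. It also swaps in plain single-vector cosine similarity for brevity, whereas a ColPali-style retriever scores multi-vector page embeddings with late interaction.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class FigureEntry:
    """Hypothetical index record: one figure with a precomputed embedding."""
    figure_id: str
    description: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: list[float], index: list[FigureEntry], k: int = 2) -> list[FigureEntry]:
    """Return the k figures most similar to the query embedding."""
    ranked = sorted(index, key=lambda e: cosine(query, e.embedding), reverse=True)
    return ranked[:k]


def build_prompt(problem_text: str, figures: list[FigureEntry]) -> str:
    """Keep retrieved figure information inside the solving context (assumed wording)."""
    lines = ["Problem:", problem_text, "", "Relevant figures:"]
    for fig in figures:
        lines.append(f"- [{fig.figure_id}] {fig.description}")
    lines += ["", "Solve step by step, using the figure information above."]
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy index with made-up embeddings; a real system would embed figure images.
    index = [
        FigureEntry("fig_triangle", "Right triangle with legs 3 and 4", [0.9, 0.1]),
        FigureEntry("fig_circle", "Circle of radius 5 with a chord", [0.1, 0.9]),
        FigureEntry("fig_graph", "Parabola crossing the x-axis twice", [0.5, 0.5]),
    ]
    top = retrieve([1.0, 0.0], index, k=2)
    print(build_prompt("Find the hypotenuse of the triangle shown.", top))
```

The resulting prompt string would then be passed to whichever LLM or VLM performs the reasoning step; that call is left out here because the source does not specify the model or API.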

evidence / results

  • Accepted manuscript
  • Complements the formal-math line of work by addressing visual mathematical structure

why this belongs in the portfolio

  • Bridges visual problem understanding and math-solver evaluation
  • Adds a multimodal route into the broader AI4Math portfolio

authors

Sangsoo Lee, Jae-Hyun Baek, Jon-Lark Kim

venue / status

Accepted manuscript; public metadata currently referenced through ResearchGate.

tags

VLM, ColPali, MathVision, multimodal reasoning

artifact links

ResearchGate