Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures

A multimodal math-solving pipeline for figure-heavy problems where text-only reasoning misses the visual structure.

A hybrid pipeline combining visual retrieval and LLM reasoning for diagram-heavy mathematical problems, especially cases where OCR-only methods miss the actual structure of the figure.

problem

Many mathematical problems rely on diagrams and spatial relations that text-only OCR pipelines fail to preserve.

key idea

Use multimodal retrieval and LLM reasoning as a hybrid pipeline so figure information remains part of the solving context.

my role

Co-author; connected multimodal retrieval to mathematical reasoning evaluation.

methods

• Vision-language retrieval
• Multimodal math reasoning
• Figure-aware prompt construction
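
The methods above can be illustrated with a minimal sketch. It is not the manuscript's implementation: the `FigureEntry` record, the toy three-dimensional embeddings, the figure descriptions, and the prompt layout are all hypothetical, chosen only to show how a retrieved figure might stay in the solving context. A real system would presumably get embeddings from a vision-language encoder and pass the prompt to an LLM; both steps are omitted here.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FigureEntry:
    """Hypothetical index record: a figure embedding plus a text description."""
    figure_id: str
    embedding: List[float]
    description: str


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: List[float], index: List[FigureEntry], k: int = 1) -> List[FigureEntry]:
    """Return the k index entries most similar to the query embedding."""
    ranked = sorted(index, key=lambda e: cosine(query_embedding, e.embedding), reverse=True)
    return ranked[:k]


def build_prompt(problem_text: str, figures: List[FigureEntry]) -> str:
    """Assemble a prompt that places retrieved figure descriptions next to the problem text."""
    lines = ["Problem:", problem_text, "", "Figure context:"]
    for fig in figures:
        lines.append(f"- [{fig.figure_id}] {fig.description}")
    lines.append("")
    lines.append("Solve step by step, using the figure context where relevant.")
    return "\n".join(lines)


# Toy data: embeddings are made up for illustration, not produced by a real encoder.
index = [
    FigureEntry("fig-triangle", [1.0, 0.0, 0.0], "Right triangle ABC with legs 3 and 4."),
    FigureEntry("fig-circle", [0.0, 1.0, 0.0], "Circle with center O and radius 5."),
]
query = [0.9, 0.1, 0.0]  # stands in for the embedding of the problem's own figure

top = retrieve(query, index, k=1)
prompt = build_prompt("Find the length of the hypotenuse AC.", top)
print(prompt)
```

The point of the sketch is the ordering of steps: retrieval happens first, and its output is written into the prompt, so the reasoning step sees the figure's structure rather than only OCR text.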

evidence / results

• Accepted manuscript
• Complements the portfolio's formal-math work by addressing visual mathematical structure

why this belongs in the portfolio

• Bridges visual problem understanding and math-solver evaluation
• Adds a multimodal route into the broader AI4Math portfolio

authors

Sangsoo Lee, Jae-Hyun Baek, Jon-Lark Kim

venue / status

Accepted manuscript; public metadata currently referenced through ResearchGate.