Hybrid Multimodal GenAI for Solving Math Problems Containing Various Figures
A hybrid pipeline that combines visual retrieval with LLM reasoning for diagram-heavy math problems, especially cases where text-only or OCR-only methods miss the actual structure of the figure.

problem
Many mathematical problems rely on diagrams and spatial relations that text-only OCR pipelines fail to preserve.
key idea
Use multimodal retrieval and LLM reasoning as a hybrid pipeline so figure information remains part of the solving context.
my role
Co-author; connected multimodal retrieval to mathematical reasoning evaluation.
methods
- Vision-language retrieval
- Multimodal math reasoning
- Figure-aware prompt construction
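
As a rough illustration of how these three pieces could fit together, the sketch below retrieves the figure descriptions closest to a query embedding and folds them into the prompt sent to a solver. It is a minimal sketch under stated assumptions, not the method from the manuscript: `FigureRecord`, `retrieve`, `build_prompt`, the toy three-dimensional embeddings, and the prompt wording are all hypothetical, and a real system would obtain embeddings from a vision-language encoder.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class FigureRecord:
    """Hypothetical index entry: a figure description and its embedding."""
    figure_id: str
    description: str
    embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: List[float], index: List[FigureRecord], k: int = 2) -> List[FigureRecord]:
    """Return the k records most similar to the query embedding."""
    ranked = sorted(index, key=lambda r: cosine(query, r.embedding), reverse=True)
    return ranked[:k]


def build_prompt(problem_text: str, figures: List[FigureRecord]) -> str:
    """Assemble a prompt that keeps figure information in the solving context."""
    lines = ["Problem:", problem_text, "", "Figure context:"]
    for fig in figures:
        lines.append(f"- [{fig.figure_id}] {fig.description}")
    lines += ["", "Solve step by step, using the figure context where relevant."]
    return "\n".join(lines)


# Toy data (assumed for illustration only).
index = [
    FigureRecord("fig-triangle", "Right triangle ABC with legs 3 and 4", [1.0, 0.0, 0.0]),
    FigureRecord("fig-circle", "Circle with radius 5 and a chord", [0.0, 1.0, 0.0]),
    FigureRecord("fig-graph", "Parabola opening upward", [0.0, 0.0, 1.0]),
]
query_embedding = [0.9, 0.1, 0.0]  # stands in for an encoded problem image

top = retrieve(query_embedding, index, k=2)
print(build_prompt("Find the length of the hypotenuse.", top))
```

The design point is only that retrieved figure descriptions travel with the problem text, so the reasoning step is not limited to what OCR extracted.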
evidence / results
- Accepted manuscript
- Complements the formal math line by addressing visual mathematical structure
why this belongs in the portfolio
- Bridges visual problem understanding and math-solver evaluation
- Adds a multimodal route into the broader AI4Math portfolio
authors
Sangsoo Lee, Jae-Hyun Baek, Jon-Lark Kim
venue / status
Accepted manuscript; public metadata currently referenced through ResearchGate.
artifact links
ResearchGate