What inputs does Extract Image Captions support?
This tool currently accepts PDF, Word, PowerPoint, and HTML files. After upload, Kitlot passes the file to Docling for parsing and conversion.
Extract picture-related blocks and captions from documents.
One file at a time. Keep uploads under 20MB.
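Checking the file before upload can save a failed request. The sketch below pre-validates a single file against the formats and size limit stated on this page. It rests on two assumptions that this page does not confirm. First, "Word" and "PowerPoint" are mapped to `.docx` and `.pptx`, and HTML to `.html`/`.htm`. Second, "20MB" is read as 20 × 1024 × 1024 bytes. The names `check_upload`, `ALLOWED_EXTENSIONS`, and `MAX_UPLOAD_BYTES` are hypothetical helpers and are not part of Kitlot.

```python
import os

# Hypothetical mapping from the formats named on this page (PDF, Word,
# PowerPoint, HTML) to extensions; Kitlot's exact accepted list is assumed.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".html", ".htm"}

# Assumption: "under 20MB" is interpreted as strictly below 20 MiB.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Compare extensions case-insensitively, so "REPORT.PDF" passes.
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    # "Under 20MB" means the size must be strictly below the limit.
    if size_bytes >= MAX_UPLOAD_BYTES:
        problems.append(
            f"file is {size_bytes} bytes; limit is under {MAX_UPLOAD_BYTES}"
        )
    return problems
```

For example, `check_upload("report.pdf", 1_000_000)` returns an empty list. `check_upload("slides.key", 1_000)` returns `["unsupported extension: .key"]`. The function reports every problem at once rather than stopping at the first one, so a user can fix both the format and the size in a single pass.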
Execution type: API
Credit cost: Free
Billing unit: 1 document
Supported locales: English, Chinese
Last updated: Apr 14, 2026
The primary output is structured JSON. It contains the detected image blocks, their visual structure, and related metadata.
1. Open Extract Image Captions and choose one supported document.
2. Kitlot sends the file through the local Docling conversion flow.
3. After processing, review, copy, or download the structured JSON result.
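A downloaded result can be post-processed to pair each picture with its caption text. The sketch below assumes the JSON follows Docling's `DoclingDocument` export layout. In that layout, each entry in `pictures` lists its captions as references such as `{"$ref": "#/texts/5"}`, and each reference points into the top-level `texts` array. This page does not confirm that Kitlot preserves that layout, so treat the field names as assumptions. The helper names `resolve_ref`, `picture_captions`, and `captions_from_file` are hypothetical.

```python
import json


def resolve_ref(doc: dict, ref: str):
    """Follow a JSON-pointer-style reference such as "#/texts/5" inside doc."""
    node = doc
    # Drop the leading "#/" and walk each path segment.
    for part in ref.lstrip("#/").split("/"):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def picture_captions(doc: dict) -> list[dict]:
    """Pair each picture block (assumed "pictures" key) with its caption texts."""
    results = []
    for picture in doc.get("pictures", []):
        # Assumed shape: "captions" is a list of {"$ref": "#/texts/N"} objects.
        captions = [
            resolve_ref(doc, c["$ref"])["text"] for c in picture.get("captions", [])
        ]
        results.append({"picture": picture.get("self_ref"), "captions": captions})
    return results


def captions_from_file(path: str) -> list[dict]:
    """Load a downloaded JSON result and extract picture/caption pairs."""
    with open(path, encoding="utf-8") as f:
        return picture_captions(json.load(f))
```

Resolving references generically, rather than hard-coding `texts`, keeps the helper usable if a caption reference points elsewhere in the document. Pictures without captions are kept with an empty list, so uncaptioned images still appear in the output.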