Extract Image Captions

Extract picture-related blocks and captions from documents.

Productivity, API, Static billing, Free

Local Docling, Structured output

Upload one file at a time, and keep each upload under 20 MB.

Accepted formats: PDF, Word, PowerPoint, HTML

Primary output: structured JSON
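The upload limits above can be checked locally before sending a file. The following is a minimal sketch, not part of Kitlot: the function name `check_upload`, the extension list, and the reading of "20MB" as 20 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: "20MB" is read as 20 * 1024 * 1024 bytes.
MAX_BYTES = 20 * 1024 * 1024
# Assumption: these extensions stand in for "PDF, Word, PowerPoint, HTML".
ACCEPTED_SUFFIXES = {".pdf", ".docx", ".pptx", ".html", ".htm"}


def check_upload(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    p = Path(path)
    if not p.is_file():
        return [f"{path}: not a file"]

    problems = []
    # Compare the extension case-insensitively against the assumed list.
    if p.suffix.lower() not in ACCEPTED_SUFFIXES:
        problems.append(f"{p.name}: unsupported extension {p.suffix!r}")
    # The page asks for uploads *under* the limit, so the limit itself fails.
    if p.stat().st_size >= MAX_BYTES:
        problems.append(f"{p.name}: {p.stat().st_size} bytes is not under the limit")
    return problems
```

Calling `check_upload("report.pdf")` on a small PDF would return an empty list; any returned strings describe why the file would likely be rejected under these assumed rules.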

About this tool

Productivity, API, Static billing, Public

Execution type: API

Credit cost: Free

Billing unit: 1 document

Supported locales: English, Chinese

Last updated: Apr 14, 2026

This tool currently accepts PDF, Word, PowerPoint, and HTML files. After upload, Kitlot passes the file through Docling for parsing and conversion.

The primary output is structured JSON containing the detected image blocks, their visual structure, and related metadata.
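Once downloaded, the JSON result can be post-processed with a few lines of code. The sketch below assumes a layout modeled on Docling's document JSON export, where a top-level `pictures` list holds `captions` entries that point into a `texts` list through `"$ref": "#/texts/N"` pointers. The exact shape of Kitlot's result is not documented here, so the key names and the helper `extract_captions` are hypothetical and may need adjusting.

```python
import json


def extract_captions(result: dict) -> list[dict]:
    """Pair each picture with its caption text (assumed Docling-style layout)."""
    texts = result.get("texts", [])
    prefix = "#/texts/"
    rows = []
    for index, picture in enumerate(result.get("pictures", [])):
        parts = []
        for ref in picture.get("captions", []):
            pointer = ref.get("$ref", "")
            # Resolve pointers such as "#/texts/3" into the texts list.
            tail = pointer[len(prefix):]
            if pointer.startswith(prefix) and tail.isdigit():
                position = int(tail)
                if position < len(texts):
                    parts.append(texts[position].get("text", ""))
        rows.append({"picture": index, "caption": " ".join(parts)})
    return rows


# Hypothetical sample shaped like the assumed layout, not real tool output.
sample = json.loads("""
{
  "texts": [
    {"text": "Introduction"},
    {"text": "Figure 1: System overview."}
  ],
  "pictures": [
    {"captions": [{"$ref": "#/texts/1"}]},
    {"captions": []}
  ]
}
""")

captions = extract_captions(sample)
```

Pictures without a caption are kept with an empty string, so the row count still matches the number of detected image blocks.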

Open Extract Image Captions and choose one supported document.

Kitlot sends the file through the local Docling conversion flow.

After processing, review, copy, or download the structured JSON result.