OCR (Optical Character Recognition)

Extract text and structured content from images and PDF documents. Powered by Mistral OCR, this endpoint can understand complex document elements including tables, mathematical expressions, and multi-column layouts.

Endpoint

POST /proxy/v1/ocr

Request Body

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| document | object | Yes | The document to process (see Document Types below) |
| model | string | No | Model to use. Default: mistral-ocr-latest |
| id | string | No | Optional request identifier |
| pages | array<integer> | No | Specific pages to process (0-indexed). Example: [0, 1, 2] |
| include_image_base64 | boolean | No | Include extracted images as base64 in response |
| image_limit | integer | No | Maximum number of images to extract |
| image_min_size | integer | No | Minimum height/width of images to extract |
| table_format | string | No | Table output format: "markdown" or "html" |
| extract_header | boolean | No | Extract document headers separately. Default: false |
| extract_footer | boolean | No | Extract document footers separately. Default: false |
| document_annotation_format | object | No | Response format for document annotation (see Response Formats) |
| bbox_annotation_format | object | No | Response format for bounding box annotation (see Response Formats) |
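
As a rough illustration of how the optional parameters combine, the sketch below assembles a request body in Python and leaves out anything the caller did not set, so the server-side defaults listed in the table apply. The helper name build_ocr_payload is hypothetical and not part of the API; only the parameter names and the allowed table_format values come from the table.

```python
from typing import Any, Optional


def build_ocr_payload(
    document: dict[str, Any],
    pages: Optional[list[int]] = None,
    table_format: Optional[str] = None,
    include_image_base64: Optional[bool] = None,
) -> dict[str, Any]:
    """Assemble an OCR request body, omitting options that were not set.

    Hypothetical helper: only the parameter names are taken from the docs.
    """
    if table_format is not None and table_format not in ("markdown", "html"):
        raise ValueError('table_format must be "markdown" or "html"')
    if pages is not None and any(page < 0 for page in pages):
        raise ValueError("pages are 0-indexed and cannot be negative")

    payload: dict[str, Any] = {"document": document}
    optional = {
        "pages": pages,
        "table_format": table_format,
        "include_image_base64": include_image_base64,
    }
    # Send only what the caller set, so documented defaults stay in effect.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_ocr_payload(
    {"type": "document_url", "document_url": "https://example.com/document.pdf"},
    pages=[0, 1, 2],  # first three pages, 0-indexed
    table_format="html",
)
```

The remaining parameters (image_limit, extract_header, and so on) could be added to the same optional dictionary in the same way.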

Document Types

Image URL

json
{
  "document": {
    "type": "image_url",
    "image_url": "https://example.com/image.png"
  }
}

Document URL (PDF)

json
{
  "document": {
    "type": "document_url",
    "document_url": "https://example.com/document.pdf"
  }
}

File ID (previously uploaded)

json
{
  "document": {
    "type": "file",
    "file_id": "your_file_id_here"
  }
}

TIP

To process local files, first upload them to a publicly accessible location (e.g., cloud storage), then pass the resulting URL as document_url or image_url.
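
The three document shapes above can be produced by small helpers like the ones sketched here. Choosing between image_url and document_url by file extension is an assumption made for illustration, not documented behaviour, and the helper names and the IMAGE_EXTENSIONS set are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: these extensions mark images; anything else is sent as a document.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".avif", ".webp")


def document_from_url(url: str) -> dict[str, str]:
    """Build an image_url or document_url object based on the URL path."""
    # Inspect only the path so query strings do not hide the extension.
    path = urlparse(url).path.lower()
    if path.endswith(IMAGE_EXTENSIONS):
        return {"type": "image_url", "image_url": url}
    return {"type": "document_url", "document_url": url}


def document_from_file_id(file_id: str) -> dict[str, str]:
    """Build a document object for a previously uploaded file."""
    return {"type": "file", "file_id": file_id}


image_doc = document_from_url("https://example.com/image.png")
pdf_doc = document_from_url("https://example.com/document.pdf?download=1")
file_doc = document_from_file_id("your_file_id_here")
```

A URL without a recognisable extension falls through to document_url here; callers who know the content type in advance may prefer to build the object directly.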

Response Formats

Use these for document_annotation_format or bbox_annotation_format:

Text (default)

json
{ "type": "text" }

JSON Object

json
{ "type": "json_object" }

JSON Schema

json
{
  "type": "json_schema",
  "json_schema": { "your": "schema" }
}
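
To make the JSON Schema option more concrete, the sketch below attaches a schema to document_annotation_format. The source only shows a placeholder for the json_schema value, so the invoice schema here is invented for illustration, and passing the schema directly as that value is an assumption; the upstream provider may expect additional wrapper fields.

```python
import json
from typing import Any


def json_schema_format(schema: dict[str, Any]) -> dict[str, Any]:
    """Wrap a schema in the response-format shape shown above.

    Assumption: the schema is passed as-is under "json_schema".
    """
    return {"type": "json_schema", "json_schema": schema}


# Hypothetical schema describing the fields we would like extracted.
invoice_schema = {
    "type": "object",
    "properties": {
        "date": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["date", "total"],
}

request_body = {
    "document": {"type": "image_url", "image_url": "https://example.com/receipt.png"},
    "document_annotation_format": json_schema_format(invoice_schema),
}

print(json.dumps(request_body, indent=2))
```

The same helper could be used for bbox_annotation_format, since both parameters accept the formats listed in this section.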

Example Request

bash
curl https://ai.hackclub.com/proxy/v1/ocr \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document": {
      "type": "image_url",
      "image_url": "https://example.com/receipt.png"
    },
    "table_format": "markdown"
  }'
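
For readers calling the endpoint from Python rather than curl, a roughly equivalent request can be built with the standard library alone. This is a minimal sketch: the function names and the 60-second timeout are assumptions, and run_ocr is defined but not called, so nothing is sent until you invoke it with a real key.

```python
import json
import urllib.request

OCR_URL = "https://ai.hackclub.com/proxy/v1/ocr"


def build_ocr_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST request with the same headers as the curl example."""
    return urllib.request.Request(
        OCR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_ocr(api_key: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (timeout is an assumed value)."""
    with urllib.request.urlopen(build_ocr_request(api_key, payload), timeout=timeout) as response:
        return json.load(response)


example_request = build_ocr_request(
    "YOUR_API_KEY",
    {
        "document": {"type": "image_url", "image_url": "https://example.com/receipt.png"},
        "table_format": "markdown",
    },
)
```

Separating request construction from sending keeps the payload and headers easy to inspect or log before any network call is made.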

Example Response

json
{
  "model": "mistral-ocr-latest",
  "pages": [
    {
      "index": 0,
      "markdown": "# Invoice\n\nDate: 2024-01-15\n\n| Item | Quantity | Price |\n|------|----------|-------|\n| Widget | 5 | $10.00 |\n| Gadget | 2 | $25.00 |\n\n**Total: $100.00**",
      "images": [],
      "dimensions": {
        "width": 800,
        "height": 1200
      }
    }
  ],
  "document_annotation": null,
  "usage_info": {
    "pages_processed": 1,
    "doc_size_bytes": 102400
  }
}

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| model | string | The model used for OCR |
| pages | array | List of OCR results per page |
| pages[].index | integer | Page index (0-based) |
| pages[].markdown | string | Extracted content in Markdown format |
| pages[].images | array | Extracted images with bounding boxes |
| pages[].dimensions | object | Page dimensions (width, height) |
| document_annotation | string or null | Formatted response if annotation format was specified |
| usage_info | object | Usage information for the request |
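
A common next step is to stitch the per-page Markdown into one string. The sketch below does that using only the fields listed above; the function name combine_pages and the blank-line separator are illustrative choices, and the sample data is a shortened stand-in rather than real API output.

```python
from typing import Any


def combine_pages(response: dict[str, Any], separator: str = "\n\n") -> str:
    """Join the Markdown of every page, ordered by the 0-based page index."""
    pages = sorted(response.get("pages", []), key=lambda page: page["index"])
    return separator.join(page["markdown"] for page in pages)


# Shortened stand-in for a two-page response, deliberately out of order.
sample_response = {
    "model": "mistral-ocr-latest",
    "pages": [
        {"index": 1, "markdown": "Second page", "images": []},
        {"index": 0, "markdown": "# Invoice", "images": []},
    ],
    "document_annotation": None,
    "usage_info": {"pages_processed": 2},
}

text = combine_pages(sample_response)
```

Sorting by index is a defensive choice in case pages were requested or returned out of order.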

Features

  • Text Extraction: Preserves document structure including headers, paragraphs, and lists
  • Table Recognition: Outputs tables in Markdown or HTML format
  • Math Support: Handles mathematical expressions and LaTeX formatting
  • Multi-language: Supports thousands of scripts and languages
  • Image Extraction: Optionally extract embedded images with bounding boxes
  • Structured Output: Use JSON schema for structured data extraction

Supported Formats

Images

  • PNG, JPEG/JPG, AVIF, WebP, and more

Documents

  • PDF, PPTX, DOCX, and more

Limitations

  • Maximum file size: 50 MB
  • Maximum pages: 1000 per request
  • Character formatting (bold, italic, underline) is not preserved
  • Footnotes and superscript text are preserved
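
The size and page limits can be checked on the client before sending a request, as in the sketch below. Two assumptions are made here: "50 MB" is read as 50,000,000 bytes (the stricter interpretation), and the 1000-page limit is taken to count pages processed per request, so a longer document might be handled in batches through the pages parameter. The text does not confirm either point, and the helper names are hypothetical.

```python
MAX_FILE_BYTES = 50 * 1000 * 1000  # assumption: decimal megabytes
MAX_PAGES_PER_REQUEST = 1000


def within_size_limit(size_bytes: int) -> bool:
    """Return True if a file of this size fits under the documented maximum."""
    return size_bytes <= MAX_FILE_BYTES


def page_batches(total_pages: int, batch_size: int = MAX_PAGES_PER_REQUEST) -> list[list[int]]:
    """Split 0-indexed page numbers into lists no longer than batch_size."""
    return [
        list(range(start, min(start + batch_size, total_pages)))
        for start in range(0, total_pages, batch_size)
    ]


batches = page_batches(2500)  # three batches: 1000, 1000, and 500 pages
```

Each batch could then be passed as the pages value of a separate request.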