Help & Documentation — OCR Converter

A browser-based OCR tool powered by the Tesseract.js engine. It extracts text from images and PDFs in 9 languages and exports the result in 5 formats (TXT, DOCX, PDF, HTML, JSON). All processing happens locally in your browser, so your files stay private.

1. Upload an image or PDF using the drop zone
2. Select the document language
3. Choose an output format (TXT, DOCX, PDF, HTML, JSON)
4. Click "Convert"
5. Wait for processing (the progress bar shows status)
6. Copy or download the result

Input: PNG, JPEG, BMP, WebP, GIF, TIFF, PDF
Output: TXT (plain text), DOCX (Word), PDF (searchable), HTML (web page), JSON (structured data with confidence)
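To illustrate how the text-based output formats differ, here is a minimal sketch that turns one recognition result into TXT, HTML, or JSON. The `OcrResult` shape, the 0-100 confidence scale, and the function names are assumptions made for illustration, not the tool's actual data model. DOCX and PDF are left out because they need dedicated libraries.

```typescript
// Hypothetical shape of a recognition result (an assumption, not the tool's real model).
interface OcrResult {
  text: string;
  confidence: number; // assumed to be on a 0-100 scale
}

type TextFormat = "txt" | "html" | "json";

// Escape characters that have special meaning in HTML.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize a result into one of the text-based output formats.
function formatResult(result: OcrResult, format: TextFormat): string {
  switch (format) {
    case "txt":
      // Plain text: return the recognized text unchanged.
      return result.text;
    case "html": {
      // Web page fragment: one <p> element per blank-line-separated paragraph.
      const paragraphs = result.text
        .split(/\n\s*\n/)
        .map((p) => p.trim())
        .filter((p) => p.length > 0)
        .map((p) => `<p>${escapeHtml(p)}</p>`);
      return paragraphs.join("\n");
    }
    case "json":
      // Structured data: the text together with its confidence score.
      return JSON.stringify(
        { text: result.text, confidence: result.confidence },
        null,
        2
      );
  }
}
```

For example, `formatResult({ text: "Hello", confidence: 91 }, "html")` yields `<p>Hello</p>`, while the `"json"` format keeps the confidence value next to the text.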

Languages: English, Spanish, French, German, Hindi, Marathi, Chinese Simplified, Japanese, Arabic
Language data is downloaded on first use (about 2-15 MB) and cached in your browser.
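Tesseract identifies each language by a short code that names its trained-data file. The sketch below maps the nine supported languages to the standard Tesseract codes. The constant and function names are hypothetical, and joining several codes with `+` is the usual Tesseract convention; whether this tool allows more than one language at a time is not stated here.

```typescript
// Standard Tesseract language codes for the nine supported languages.
const LANGUAGE_CODES = {
  English: "eng",
  Spanish: "spa",
  French: "fra",
  German: "deu",
  Hindi: "hin",
  Marathi: "mar",
  "Chinese Simplified": "chi_sim",
  Japanese: "jpn",
  Arabic: "ara",
} as const;

type LanguageName = keyof typeof LANGUAGE_CODES;

// Build a Tesseract language string such as "eng" or "hin+chi_sim".
function toTesseractLangs(names: LanguageName[]): string {
  if (names.length === 0) {
    throw new Error("Select at least one language");
  }
  return names.map((name) => LANGUAGE_CODES[name]).join("+");
}
```

Keeping the mapping in one typed object means an unsupported language name is rejected at compile time rather than failing during the language-data download.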

• Use high-resolution images (300 DPI or higher)
• Ensure good contrast between text and background
• Avoid skewed or rotated text
• Crop to the text area for faster processing
• Choose the correct language for best accuracy
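The 300 DPI guideline above can be checked with simple arithmetic when the physical size of the scanned page is known: effective DPI is pixel width divided by physical width in inches. The following is a rough sketch with hypothetical function names; it assumes you know the page width, which the tool itself does not ask for.

```typescript
// Effective resolution: pixels per inch along the page width.
function effectiveDpi(widthPx: number, physicalWidthInches: number): number {
  if (widthPx <= 0 || physicalWidthInches <= 0) {
    throw new RangeError("Dimensions must be positive");
  }
  return widthPx / physicalWidthInches;
}

// True when the scan meets the recommended minimum (300 DPI by default).
function meetsRecommendedDpi(
  widthPx: number,
  physicalWidthInches: number,
  minDpi: number = 300
): boolean {
  return effectiveDpi(widthPx, physicalWidthInches) >= minDpi;
}
```

A US Letter page (8.5 inches wide) scanned to 2550 pixels across works out to exactly 300 DPI, so it passes the check. The same page at 1275 pixels is only 150 DPI and would likely give lower accuracy.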

Low accuracy: Try a higher resolution image with better contrast. Ensure the correct language is selected.
Slow processing: First use downloads language data. Subsequent uses are faster.
PDF not working: Ensure it's a scanned/image PDF, not a text-based PDF (text PDFs don't need OCR).
