
DocForge

Convert documents, spreadsheets & images between 17 formats with 67 conversion paths


Table of Contents

  1. Overview
  2. Key Features
  3. Processing Libraries
  4. 67 Conversion Paths
  5. Technical Deep Dive
  6. How to Use
  7. Frequently Asked Questions
  8. Privacy & Security

Overview

DocForge is a universal document converter that transforms files between 17 formats across three categories: documents, spreadsheets, and images. With 67 distinct conversion paths powered by 10 specialized JavaScript libraries, it handles everything from DOCX-to-PDF document conversion to CSV-to-JSON data transformation to PNG-to-WebP image optimization -- all without any file ever leaving your browser.

The converter supports six document formats (DOCX, PDF, TXT, HTML, RTF, Markdown), six spreadsheet formats (XLSX, XLS, CSV, TSV, JSON, XML), and five image formats plus SVG input (PNG, JPG, WebP, BMP, GIF, SVG). Each format connects to multiple output targets through carefully engineered conversion pipelines that preserve content fidelity while adapting to the constraints of each target format.

DocForge includes a smart preview system that renders your file content before conversion -- DOCX files display as formatted HTML via mammoth.js, spreadsheets render as interactive tables via PapaParse and SheetJS, images appear inline, and PDFs show extracted text. The preview data is cached in memory so the conversion step does not need to re-parse the original file, making the entire process faster and more efficient.

A standout technical achievement is the custom MinZip class for DOCX generation. Rather than depending on a heavy ZIP library like JSZip, DocForge implements its own minimal ZIP builder that constructs valid OOXML packages with CRC-32 checksums, proper content type declarations, relationship files, and Word document XML -- creating standards-compliant .docx files from scratch using pure JavaScript.

Key Features

17 Supported Formats

Documents: DOCX, PDF, TXT, HTML, RTF, Markdown. Spreadsheets: XLSX, XLS, CSV, TSV, JSON, XML. Images: PNG, JPG, WebP, BMP, GIF, plus SVG as an input format. Every format is processed entirely in JavaScript.

67 Conversion Paths

14 document paths (DOCX↔PDF↔TXT↔HTML↔MD↔RTF), 24 spreadsheet paths (XLSX↔CSV↔TSV↔JSON↔XML), 20 image paths (PNG↔JPG↔WebP↔BMP↔GIF, SVG→raster), and 9 cross-category paths including spreadsheet-to-PDF and image-to-PDF.

10 Processing Libraries

mammoth.js 1.6.0, pdf.js 3.11.174, pdf-lib 1.17.1, jsPDF 2.5.1, SheetJS (XLSX) 0.18.5, PapaParse 5.4.1, marked 12.0.0, Turndown 7.1.3, html2canvas 1.4.1, and FileSaver 2.0.5 -- each handling specific format conversions.

Smart File Preview

See your content before converting: DOCX renders as formatted HTML, CSV/TSV/XLSX display as sortable tables (first 15 rows), images show inline (max 280px), PDFs extract text from the first 5 pages, and text formats show the first 5000 characters.

Custom DOCX Generation

The MinZip class builds valid OOXML .docx packages without any external ZIP library. It generates CRC-32 checksums, writes uncompressed store entries, and creates proper [Content_Types].xml, _rels/.rels, and word/document.xml structures.
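The CRC-32 checksum mentioned above is the standard reflected CRC used by ZIP (polynomial 0xEDB88320). As an illustration only, here is a table-driven version in Python. DocForge itself runs in browser JavaScript, and the function names below are hypothetical, not taken from MinZip.

```python
import zlib


def make_crc_table():
    """Precompute the 256-entry lookup table for the reflected CRC-32 polynomial."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            # Shift right; XOR with the polynomial whenever the low bit was set
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = make_crc_table()


def crc32(data: bytes) -> int:
    """CRC-32 as stored in ZIP local headers and central directory entries."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


# Cross-check against Python's built-in implementation
assert crc32(b"word/document.xml") == zlib.crc32(b"word/document.xml")
print(hex(crc32(b"123456789")))  # 0xcbf43926, the standard CRC-32 check value
```

A lookup table trades 1 KB of memory for processing one byte per step instead of one bit, which is why most ZIP writers use this form.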

PDF Rendering Engine

pdf.js 3.11.174 extracts text for preview and renders pages at 2x scale for high-quality image export. Supports up to 20 pages per document. Combined with jsPDF 2.5.1 for generating PDF output from any source format.

Image Interchange Layer

The Canvas API serves as a universal image interchange format -- any image loads onto a canvas element, then exports to PNG, JPG (92% quality, white background), WebP, BMP, or GIF. SVG rasterization defaults to 800×600 pixels.

Auto-Detection & Download

Input format is detected automatically from the file extension. The output dropdown shows only compatible target formats. Converted files download instantly with the correct file extension and MIME type via FileSaver.js.
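Extension-based detection and the filtered dropdown can be sketched as a lookup table. This is a hypothetical Python model, not DocForge's code. The table lists only targets this article names explicitly (the PNG and SVG examples and the DOCX outputs), so the real dropdown may offer more, and treating .jpeg as jpg is an assumption.

```python
from pathlib import PurePosixPath

# Partial, illustrative table: only targets named explicitly in this article
OUTPUT_OPTIONS = {
    "png": ["jpg", "webp", "bmp", "gif", "pdf"],
    "svg": ["png", "jpg", "webp", "bmp", "gif", "pdf"],
    "docx": ["pdf", "txt", "md", "rtf"],
}


def detect_format(filename):
    """Infer the input format from the file extension, case-insensitively."""
    ext = PurePosixPath(filename).suffix.lower().lstrip(".")
    return "jpg" if ext == "jpeg" else ext  # treating .jpeg as jpg is an assumption


def output_options(filename):
    """Targets to show in the dropdown; unknown inputs get an empty list."""
    return OUTPUT_OPTIONS.get(detect_format(filename), [])


def output_name(filename, target):
    """Keep the base name and swap in the target extension."""
    return f"{PurePosixPath(filename).stem}.{target}"


print(detect_format("Photo.PNG"), output_options("Photo.PNG"))
print(output_name("report.final.docx", "pdf"))  # report.final.pdf
```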

Processing Libraries

DocForge relies on 10 battle-tested open-source JavaScript libraries, each responsible for specific format operations. All libraries execute entirely in the browser -- no server-side processing occurs at any point.

  1. mammoth.js 1.6.0 -- Parses DOCX files by reading the underlying XML structure inside the ZIP container. Preserves headings, bold, italic, and paragraph formatting. Converts DOCX content to clean HTML that serves as an intermediate representation for further conversions to PDF, TXT, Markdown, and RTF. The HTML output is displayed directly in the preview panel.
  2. pdf.js 3.11.174 -- Mozilla's PDF rendering engine, used for two purposes: text extraction (reading text content from each page for preview and TXT/HTML/MD conversion) and page rendering (drawing each page onto a canvas at 2x resolution for high-quality image export). Processes up to 20 pages per document to prevent memory issues.
  3. pdf-lib 1.17.1 -- Handles PDF document manipulation at the binary level. Used for operations that require modifying or constructing PDF structures programmatically, complementing jsPDF's generation capabilities and pdf.js's reading capabilities.
  4. jsPDF 2.5.1 -- Generates PDF documents from text content, HTML tables, and images. Uses A4 page size with Helvetica font at 10pt and 15mm margins. For spreadsheet-to-PDF conversion, it creates landscape-oriented pages with auto-calculated column widths (maximum 40mm per column), bold headers, and automatic multi-page support for large datasets.
  5. SheetJS (XLSX) 0.18.5 -- Reads and writes Excel formats including XLSX and legacy XLS. Provides sheet_to_json for converting spreadsheets to JSON objects, sheet_to_csv for CSV/TSV output, and aoa_to_sheet for building worksheets from arrays of arrays. Processes the first sheet only; formula cells are read as the computed values saved in the file, since SheetJS does not recalculate formulas itself.
  6. PapaParse 5.4.1 -- RFC 4180-compliant CSV and TSV parser with header-row support, automatic delimiter detection, and proper handling of quoted fields containing commas, newlines, and escaped quotes. Converts CSV/TSV data into JavaScript objects that feed into the preview table (first 15 rows) and downstream format conversions.
  7. marked 12.0.0 -- Converts Markdown syntax to HTML. Used when Markdown files are the input format -- the generated HTML then serves as an intermediate for conversion to PDF, DOCX, TXT, and RTF. Supports standard Markdown features including headings, emphasis, links, code blocks, and lists.
  8. Turndown 7.1.3 -- Performs the reverse of marked: converts HTML to Markdown. Configured to use ATX-style headings (# instead of underlines) and fenced code blocks (triple backticks). Used when any HTML-producing format (DOCX, PDF, HTML) needs to output as Markdown.
  9. html2canvas 1.4.1 -- Renders HTML elements to canvas by parsing the DOM and recreating the layout on a canvas element. Used as part of the HTML-to-image conversion pipeline and for generating visual representations of document content for image-based output formats.
  10. FileSaver 2.0.5 -- Provides a cross-browser saveAs() function that triggers file downloads from Blob objects. Handles MIME type assignment and filename extension management. Works consistently across Chrome, Firefox, Safari, and Edge without browser-specific workarounds.

67 Conversion Paths

Every conversion in DocForge follows a specific pipeline through one or more intermediate representations. The 67 paths are organized into five categories based on the source and target format types:

Document Conversions (14 paths)

Document-to-document conversions use HTML as the primary interchange format. DOCX is parsed to HTML via mammoth.js, PDF text is extracted via pdf.js, RTF control sequences are stripped via regex, and Markdown is rendered via marked. From the HTML intermediate, Turndown generates Markdown, jsPDF generates PDF, and text extraction produces TXT. The full matrix covers DOCX↔PDF↔TXT↔HTML↔MD↔RTF with 14 distinct directional paths.
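For the TXT target, the article describes stripping tags from the HTML intermediate. A minimal Python sketch of that step uses the standard library's html.parser; treating the listed block-level tags as line breaks is an assumption about how paragraph structure should survive, not DocForge's exact rule set.

```python
from html.parser import HTMLParser

# Tags treated as line breaks (assumption; extend as needed)
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect text nodes, inserting newlines around block-level elements."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True (default) decodes &amp; and friends
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    lines = (line.strip() for line in "".join(extractor.parts).splitlines())
    return "\n".join(line for line in lines if line)  # drop blank lines


print(html_to_text("<h1>Title</h1><p>Hello <b>world</b> &amp; more</p>"))
```

Inline tags such as <b> simply disappear, which matches the article's point that TXT output keeps the words but not the styling.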

Spreadsheet Conversions (24 paths)

Spreadsheet conversions center on SheetJS and PapaParse. XLSX and XLS files are read by SheetJS into a workbook object (first sheet extracted). CSV and TSV are parsed by PapaParse into row arrays. JSON is parsed natively with JSON.parse. XML is processed by a recursive tree builder that handles attributes (prefixed with @), text nodes (as #text), and auto-arrays for duplicate sibling elements. From any parsed state, SheetJS writes XLSX, row arrays are serialized to CSV/TSV, JSON.stringify writes JSON, and a custom XML builder writes XML. The 24 paths cover XLSX↔CSV↔TSV↔JSON↔XML, with XLS as an additional input-only format.
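The "one parsed state, many writers" idea can be illustrated with Python's csv and json modules standing in for PapaParse, SheetJS, and JSON.stringify. This is a sketch under those substitutions, not DocForge's pipeline: CSV is parsed once into a list of row dictionaries, and TSV, CSV, and JSON are all written from that same structure.

```python
import csv
import io
import json


def parse_delimited(text, delimiter=","):
    """Parse CSV/TSV text (with a header row) into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def write_delimited(rows, delimiter):
    """Serialize row dicts to CSV (comma) or TSV (tab), quoting only when needed."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), delimiter=delimiter,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def write_json(rows):
    return json.dumps(rows, indent=2)


# One parsed state feeds every writer
rows = parse_delimited('name,city\n"Doe, Jane",Mumbai\n')
print(write_delimited(rows, "\t"))
print(write_json(rows))
```

Note that "Doe, Jane" needs quotes in CSV but not in TSV, because the comma is only special when it is the delimiter.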

Image Conversions (20 paths)

All image conversions flow through the HTML Canvas API as the universal interchange layer. Source images (PNG, JPG, WebP, BMP, GIF) are drawn onto a canvas element. SVG is rasterized at a default resolution of 800×600 pixels with a white background. From the canvas, toDataURL() or toBlob() exports to the target format. JPG and BMP outputs use a white background fill (since they do not support transparency), and JPEG quality is set to 92%. The 20 paths cover PNG↔JPG↔WebP↔BMP↔GIF bidirectionally, plus SVG→PNG/JPG/WebP/BMP/GIF (SVG is input-only).
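The white fill matters because the JPG and BMP outputs are treated as opaque, so every semi-transparent pixel has to be blended with some background; painting white first makes that background white. Per channel the blend is out = alpha * color + (1 - alpha) * 255. A small Python sketch of the arithmetic, assuming straight (non-premultiplied) 8-bit RGBA values:

```python
def flatten_onto_white(rgba_pixels):
    """Composite straight-alpha RGBA pixels over an opaque white background.

    Each channel becomes alpha * color + (1 - alpha) * 255, which is what
    painting a white rectangle before drawing the image achieves on a canvas.
    """
    flattened = []
    for r, g, b, a in rgba_pixels:
        alpha = a / 255
        flattened.append(tuple(round(c * alpha + 255 * (1 - alpha)) for c in (r, g, b)))
    return flattened


pixels = [(0, 0, 0, 0), (255, 0, 0, 255), (0, 0, 0, 128)]
print(flatten_onto_white(pixels))  # [(255, 255, 255), (255, 0, 0), (127, 127, 127)]
```

A fully transparent pixel becomes pure white, an opaque pixel is unchanged, and a half-transparent black pixel lands on mid gray.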

Cross-Category Conversions (9 paths)

These paths bridge format categories. Spreadsheet→PDF uses jsPDF with landscape orientation, auto-calculated column widths (capped at 40mm), bold headers, and multi-page support for large datasets. XML→CSV flattens the recursive tree structure into tabular rows. Image→PDF embeds the image into a jsPDF document. These 9 paths enable workflows like converting an Excel report to a PDF for distribution or embedding a chart image into a PDF document.
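The column sizing and pagination rules reduce to plain arithmetic. Only the A4 landscape page, the 15 mm margins, and the 40 mm cap come from the article. The glyph width, padding, row height, and the choice to repeat the header row on every page are hypothetical values chosen for illustration, not jsPDF defaults.

```python
import math

# From the article: A4 landscape, 15 mm margins, 40 mm column cap
PAGE_HEIGHT_MM = 210  # A4 landscape is 297 x 210 mm
MARGIN_MM = 15
MAX_COL_MM = 40
# Hypothetical tuning values (not from the article)
MM_PER_CHAR = 2.0     # rough average glyph width for 10 pt Helvetica
PADDING_MM = 4        # left + right cell padding
ROW_HEIGHT_MM = 7


def column_widths(header, rows):
    """Size each column to its longest cell, capped at MAX_COL_MM."""
    widths = []
    for i, title in enumerate(header):
        lengths = [len(str(title))] + [len(str(row[i])) for row in rows if i < len(row)]
        widths.append(min(MAX_COL_MM, max(lengths) * MM_PER_CHAR + PADDING_MM))
    return widths


def page_count(n_rows):
    """Pages needed if every page repeats the bold header row (assumption)."""
    usable_height = PAGE_HEIGHT_MM - 2 * MARGIN_MM          # 180 mm
    rows_per_page = int(usable_height // ROW_HEIGHT_MM) - 1  # 25 slots minus the header
    return max(1, math.ceil(n_rows / rows_per_page))


header = ["id", "description"]
rows = [["1", "x" * 50], ["2", "short"]]
print(column_widths(header, rows))  # [8.0, 40]
print(page_count(100))              # 5
```

Capping each column keeps one long text field from pushing every other column off the page; text beyond the cap would then need truncation or wrapping.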

Technical Deep Dive

Custom MinZip DOCX Builder

One of DocForge's most distinctive features is its custom MinZip class that generates valid .docx files without relying on JSZip or any other external ZIP library. A DOCX file is actually a ZIP archive containing XML files in the Office Open XML (OOXML) format. MinZip constructs this archive byte-by-byte:

The builder creates three required XML files: [Content_Types].xml (declares MIME types for each file in the archive), _rels/.rels (defines relationships between document parts), and word/document.xml (contains the actual document content with paragraph and run elements). Each file entry is stored uncompressed using the ZIP store method, with CRC-32 checksums computed for data integrity verification. The resulting archive includes proper local file headers, a central directory, and an end-of-central-directory record -- producing a standards-compliant ZIP file that Microsoft Word, LibreOffice, and Google Docs can all open correctly.
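To make the archive layout concrete, here is a Python sketch of a store-only ZIP writer that packs the three OOXML parts named above. It mirrors the structure described (local file headers, central directory, end-of-central-directory record, CRC-32 per entry) but it is not MinZip's source: the function names, the fixed 1980-01-01 timestamp, and the one-run-per-paragraph document.xml are assumptions.

```python
import io
import struct
import zipfile
import zlib
from xml.sax.saxutils import escape

DOS_DATE = 0x21  # 1980-01-01 in MS-DOS date format (fixed timestamp, an assumption)


def build_stored_zip(files):
    """Pack (name, bytes) pairs into a ZIP using the store method (no compression)."""
    local_parts, central_parts, offset = [], [], 0
    for name, data in files:
        name_bytes = name.encode("utf-8")
        crc, size = zlib.crc32(data), len(data)
        # Local file header: signature, version needed, flags, method 0 = store,
        # time, date, CRC-32, compressed size, uncompressed size, name len, extra len
        local = struct.pack("<IHHHHHIIIHH", 0x04034B50, 20, 0, 0, 0, DOS_DATE,
                            crc, size, size, len(name_bytes), 0)
        local_parts.append(local + name_bytes + data)
        # Central directory entry repeats the metadata and records the header offset
        central = struct.pack("<IHHHHHHIIIHHHHHII", 0x02014B50, 20, 20, 0, 0, 0,
                              DOS_DATE, crc, size, size, len(name_bytes),
                              0, 0, 0, 0, 0, offset)
        central_parts.append(central + name_bytes)
        offset += len(local) + len(name_bytes) + size
    central_dir = b"".join(central_parts)
    # End-of-central-directory record: entry counts, directory size, directory offset
    eocd = struct.pack("<IHHHHIIH", 0x06054B50, 0, 0, len(files), len(files),
                       len(central_dir), offset, 0)
    return b"".join(local_parts) + central_dir + eocd


XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
CONTENT_TYPES = XML_DECL + (
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)
RELS = XML_DECL + (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Target="word/document.xml" Type="http://schemas.'
    'openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>'
    '</Relationships>'
)


def document_xml(paragraphs):
    """One <w:p> with a single text run per paragraph; text is XML-escaped."""
    body = "".join(
        f'<w:p><w:r><w:t xml:space="preserve">{escape(text)}</w:t></w:r></w:p>'
        for text in paragraphs
    )
    return (XML_DECL + '<w:document xmlns:w="http://schemas.openxmlformats.org/'
            f'wordprocessingml/2006/main"><w:body>{body}</w:body></w:document>')


def build_docx(paragraphs):
    parts = [
        ("[Content_Types].xml", CONTENT_TYPES),
        ("_rels/.rels", RELS),
        ("word/document.xml", document_xml(paragraphs)),
    ]
    return build_stored_zip([(name, content.encode("utf-8")) for name, content in parts])


# Round-trip through Python's zipfile, which verifies every CRC-32
docx = build_docx(["Hello, DocForge", "Tom & Jerry"])
with zipfile.ZipFile(io.BytesIO(docx)) as archive:
    assert archive.testzip() is None
    print(archive.namelist())  # ['[Content_Types].xml', '_rels/.rels', 'word/document.xml']
```

The store method keeps the writer tiny: with no compression, the compressed and uncompressed sizes are identical and the file bytes are copied verbatim after each header.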

SVG Rasterization Pipeline

SVG files require special handling because they are vector-based and resolution-independent. DocForge rasterizes SVGs by creating an <img> element with the SVG data as its source, then drawing that image onto an HTML canvas at a default resolution of 800×600 pixels. A white background rectangle is painted first (necessary for JPG and BMP output which do not support transparency), then the SVG content is drawn on top. The canvas is then exported to the target raster format using the standard toBlob() or toDataURL() methods.

PDF Rendering at 2x Scale

When converting PDF to image formats, pdf.js renders each page at 2x the natural resolution (double the viewport dimensions). This produces crisp, high-quality images suitable for printing or detailed viewing. Each page is rendered to its own canvas element, exported as an individual image file, and named with the page number appended (e.g., document_page1.png, document_page2.png). A maximum of 20 pages is enforced to prevent excessive memory consumption in the browser.
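The page cap, naming scheme, and 2x scaling combine into a plan that could be computed before any rendering starts. In this hypothetical Python helper, the page sizes stand in for what pdf.js reports via page.getViewport({ scale: 1 }); rounding down to whole pixels is an assumption.

```python
import math

MAX_PAGES = 20  # page cap stated in the article
SCALE = 2       # render scale stated in the article


def plan_page_images(base_name, page_sizes, ext="png"):
    """Return (filename, pixel_width, pixel_height) for each page to export.

    page_sizes holds (width, height) at scale 1, such as the dimensions of a
    pdf.js viewport; pages beyond MAX_PAGES are skipped.
    """
    plan = []
    for number, (width, height) in enumerate(page_sizes[:MAX_PAGES], start=1):
        plan.append((
            f"{base_name}_page{number}.{ext}",
            math.floor(width * SCALE),   # canvas sizes must be whole pixels
            math.floor(height * SCALE),
        ))
    return plan


a4_points = (595.28, 841.89)  # A4 page size in PDF points
plan = plan_page_images("document", [a4_points] * 25)
print(len(plan), plan[0])  # 20 ('document_page1.png', 1190, 1683)
```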

RTF Processing

RTF (Rich Text Format) input is processed by stripping RTF control sequences using regular expressions. The regex pipeline removes \par paragraph markers (converting them to line breaks), decodes hexadecimal character escapes in the format \'XX, strips remaining backslash control sequences, and removes curly braces that delimit RTF groups. The result is plain text that can then be converted to any other supported format.
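The four regex steps can be sketched in Python as follows. This is an illustration of the approach described, not DocForge's exact patterns; decoding \'XX escapes as Windows-1252 is an assumption. Like any regex-only approach, it ignores font and color tables, so their contents (font names, for example) can leak into the output.

```python
import re

PAR = re.compile(r"\\par(?![a-zA-Z]) ?")          # \par but not \pard, plus its delimiter space
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")   # \'e9 style 8-bit character escapes
CONTROL_WORD = re.compile(r"\\[a-zA-Z]+-?\d* ?")  # \b, \fs24, \li-360 and similar


def decode_hex(match):
    # Assumes the common \ansicpg1252 code page; other code pages would need mapping
    return bytes([int(match.group(1), 16)]).decode("cp1252", errors="replace")


def rtf_to_text(rtf):
    text = PAR.sub("\n", rtf)                      # paragraph markers -> line breaks
    text = HEX_ESCAPE.sub(decode_hex, text)        # decode \'XX before generic stripping
    text = CONTROL_WORD.sub("", text)              # drop remaining control words
    text = text.replace("{", "").replace("}", "")  # remove group delimiters
    return text.strip()


print(repr(rtf_to_text(r"{\rtf1\ansi Hello\par Caf\'e9 world\par}")))  # 'Hello\nCafé world'
```

Order matters: hex escapes must be decoded before the generic control-word pattern runs, and \par must be converted first so it becomes a line break rather than being stripped.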

XML↔JSON Recursive Tree Builder

The XML-to-JSON converter uses a recursive tree-walking algorithm that preserves the full structure of the XML document. Element attributes are prefixed with @ (e.g., @id, @class) to distinguish them from child elements. Text content within elements is stored under the #text key. When multiple sibling elements share the same tag name, they are automatically collected into a JSON array. This approach handles arbitrarily nested XML documents and produces clean, predictable JSON output.
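The tree-walking rules above map onto a short recursive function. This Python sketch uses xml.etree.ElementTree in place of the browser's DOM and follows the conventions described (@ for attributes, #text for text, arrays for repeated siblings). It ignores mixed-content tail text and namespaces, which the article does not cover.

```python
import json
import xml.etree.ElementTree as ET


def element_to_obj(elem):
    """Convert one element: @-prefixed attributes, #text, and child elements."""
    obj = {}
    for name, value in elem.attrib.items():
        obj["@" + name] = value
    text = (elem.text or "").strip()
    if text:
        obj["#text"] = text
    for child in elem:
        value = element_to_obj(child)
        if child.tag not in obj:
            obj[child.tag] = value
        elif isinstance(obj[child.tag], list):
            obj[child.tag].append(value)  # third or later sibling with this tag
        else:
            obj[child.tag] = [obj[child.tag], value]  # second sibling: promote to array
    return obj


def xml_to_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_obj(root)}, indent=2)


print(xml_to_json('<users><user id="5"><name>Ann</name></user>'
                  '<user id="6"><name>Bo</name></user></users>'))
```

One consequence of auto-arrays is worth knowing: a single <user> yields an object while two yield an array, so consumers of the JSON should accept both shapes.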

Nested JSON Flattening for CSV

When converting JSON to CSV, nested object structures are flattened using dot notation. A JSON object like {"user": {"address": {"city": "Mumbai"}}} becomes a CSV column header user.address.city with the value Mumbai. This allows complex hierarchical JSON data to be represented in the flat tabular structure that CSV requires, with each unique path through the object tree becoming its own column.
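Dot-notation flattening and the "one column per unique path" rule can be sketched in Python with the standard csv module. The article does not say how arrays are handled, so this sketch simply writes them unchanged; the function names are hypothetical.

```python
import csv
import io


def flatten(obj, prefix=""):
    """Flatten nested dicts into dot-notation keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value  # lists and scalars are kept as-is in this sketch
    return flat


def json_rows_to_csv(rows):
    """Write a list of (possibly nested) JSON objects as CSV.

    Every unique dot path becomes a column, in first-seen order; rows that
    lack a path get an empty cell.
    """
    flat_rows = [flatten(row) for row in rows]
    headers = []
    for row in flat_rows:
        for key in row:
            if key not in headers:
                headers.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=headers)
    writer.writeheader()
    writer.writerows(flat_rows)
    return buffer.getvalue()


records = [
    {"user": {"name": "Ann", "address": {"city": "Mumbai"}}},
    {"user": {"name": "Bo"}, "active": True},
]
print(json_rows_to_csv(records))
```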

Preview System Architecture

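The preview system displays file content immediately after upload, before any conversion takes place. Each format has a dedicated preview handler: DOCX files render as formatted HTML via mammoth.js, CSV, TSV, and XLSX files display as tables (first 15 rows), images display inline (up to 280px), PDFs show text extracted from the first 5 pages, and text formats show the first 5000 characters.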

Critically, the parsed data from the preview step is cached in a parsedData variable. When the user clicks "Convert & Download," the conversion engine uses this cached data instead of re-reading and re-parsing the original file. This eliminates redundant processing and makes conversions near-instantaneous for files that have already been previewed.
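The caching pattern can be modeled in a few lines. This Python class is hypothetical (DocForge keeps a JavaScript variable named parsedData); it shows the preview parsing the full file once, displaying only the first 15 rows, and the conversion step reusing the cached result.

```python
class PreviewSession:
    """Hypothetical model of the parsedData cache; not DocForge's code."""

    PREVIEW_ROWS = 15  # preview limit for tables stated in the article

    def __init__(self, parse):
        self._parse = parse      # format-specific parser, e.g. CSV text -> rows
        self.parsed_data = None  # stands in for the parsedData variable

    def preview(self, raw):
        """Parse once, cache the full result, and return only the preview slice."""
        self.parsed_data = self._parse(raw)
        return self.parsed_data[: self.PREVIEW_ROWS]

    def convert(self, raw, write):
        """Reuse the cached parse; only parse if no preview has happened yet."""
        if self.parsed_data is None:
            self.parsed_data = self._parse(raw)
        return write(self.parsed_data)

    def remove_file(self):
        """Mirror the "Remove file" button: drop the cached data."""
        self.parsed_data = None


parse_calls = []


def parse_lines(raw):
    parse_calls.append(raw)  # record each parse so reuse is observable
    return raw.splitlines()


session = PreviewSession(parse_lines)
raw = "\n".join(f"row {i}" for i in range(100))
shown = session.preview(raw)
output = session.convert(raw, lambda rows: "\n".join(rows).upper())
print(len(shown), len(parse_calls))  # 15 1
```

The key design choice is caching the full parse rather than the preview slice: the preview shows 15 rows, but the conversion needs all of them.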

How to Use

  1. Open DocForge -- Navigate to the converter page in your browser. No installation, signup, or plugin is required. The tool loads all 10 JavaScript libraries and is ready to convert immediately.
  2. Upload Your File -- Drag and drop a file onto the upload zone, or click the zone to browse your device. DocForge accepts any of the 17 supported formats (DOCX, PDF, TXT, HTML, RTF, MD, XLSX, XLS, CSV, TSV, JSON, XML, PNG, JPG, WebP, BMP, GIF), plus SVG as an input-only format.
  3. Review the Preview -- Your file content is displayed automatically. Spreadsheets appear as formatted tables (first 15 rows), DOCX files render with their original formatting, images display inline, and text-based formats show their raw content. Use this preview to confirm you have uploaded the correct file.
  4. Select the Output Format -- Choose your desired target format from the output dropdown menu. The dropdown only shows formats that are compatible with your input file -- for example, uploading a PNG will show JPG, WebP, BMP, GIF, and PDF as options.
  5. Click Convert & Download -- Press the convert button to begin processing. The conversion happens instantly in your browser using the cached preview data. A status bar appears showing either a green success message with the output file size, or a red error message with diagnostic details.
  6. Receive Your File -- The converted file downloads automatically to your default download folder with the correct file extension and MIME type. For PDF-to-image conversions, multiple files may download (one per page), each named with its page number.
  7. Convert Another File -- Click the "Remove file" button to clear the current file and start over. The cached preview data is released from memory. You can immediately upload a new file for conversion.

Frequently Asked Questions

What formats does DocForge support?
DocForge supports 17 file formats across three categories. Documents: DOCX, PDF, TXT, HTML, RTF, and Markdown (MD). Spreadsheets: XLSX, XLS, CSV, TSV, JSON, and XML. Images: PNG, JPG, WebP, BMP, and GIF as both input and output, plus SVG as an input-only format (SVG can be converted to any raster image format or PDF, but other formats cannot be converted to SVG since vector conversion requires manual tracing).
How does DOCX conversion work?
DOCX files are parsed by mammoth.js 1.6.0, which reads the XML content inside the DOCX ZIP container and converts it to clean HTML. This HTML preserves structural formatting: headings (H1-H6), bold text, italic text, and paragraph breaks. From this HTML intermediate representation, DocForge can generate PDF output via jsPDF, extract plain text by stripping HTML tags for TXT output, convert to Markdown via Turndown 7.1.3 (using ATX headings and fenced code blocks), or wrap the content in RTF control sequences. The reverse direction -- converting other formats to DOCX -- uses the custom MinZip builder to create a valid OOXML package.
Can DocForge handle multi-sheet Excel files?
DocForge processes the first sheet of XLSX and XLS workbooks. When you upload a multi-sheet Excel file, SheetJS reads the entire workbook but only the first sheet is used for preview and conversion. There is currently no option to select a specific sheet or convert all sheets simultaneously. If you need to convert data from a different sheet, open the file in Excel or Google Sheets first, move the desired sheet to the first position, save, and then upload to DocForge. Formula cells are converted using the results stored in the file; SheetJS reads these saved values rather than recalculating the formulas.
Does DocForge preserve document formatting?
DocForge preserves basic structural formatting: headings, bold, italic, and paragraph structure. Complex formatting elements like tables, images embedded in documents, custom fonts, page headers/footers, columns, and advanced layout features are simplified during conversion. This is inherent to the conversion process -- different formats have different capabilities, and a lossless conversion between fundamentally different format types (like DOCX to Markdown) is not possible. For best results with complex documents, convert to PDF which preserves visual layout, or to HTML which retains the most structural information.
How does PDF-to-image conversion work?
PDF-to-image conversion uses Mozilla's pdf.js 3.11.174 to render each page of the PDF onto an HTML canvas element at 2x the natural resolution. This double-resolution rendering produces crisp, high-quality images suitable for printing and detailed viewing. Each page becomes a separate image file -- for example, a 5-page PDF converted to PNG produces five files named document_page1.png through document_page5.png. A maximum of 20 pages is enforced to prevent excessive browser memory usage. The canvas output is exported via toBlob() with format-appropriate settings (92% quality for JPEG, white background for JPG/BMP).
What is the MinZip DOCX builder?
MinZip is a custom JavaScript class built specifically for DocForge that creates valid .docx files without depending on JSZip or any other external ZIP library. A DOCX file is actually a ZIP archive containing XML files following the Office Open XML (OOXML) standard. MinZip constructs this archive at the byte level: it computes CRC-32 checksums for data integrity, writes file entries using the uncompressed store method, creates proper local file headers and a central directory, and generates the three required XML files -- [Content_Types].xml, _rels/.rels, and word/document.xml. The resulting files open correctly in Microsoft Word, LibreOffice Writer, and Google Docs.
Can I convert XML to CSV?
Yes. DocForge includes a recursive XML tree builder that converts hierarchical XML into flat tabular data suitable for CSV. Element attributes are mapped to columns prefixed with @ (for example, <user id="5"> produces a column named @id). Text content within elements is stored under a #text column. When multiple sibling elements share the same tag name, they are automatically collected into arrays. The flattened data is then written as CSV with proper RFC 4180 quoting and escaping via PapaParse.
How does SVG conversion work?
SVG is supported as an input-only format because it is a vector format that requires fundamentally different rendering than raster images. To convert SVG to a raster format, DocForge creates an <img> element with the SVG data, draws it onto an HTML canvas at a default resolution of 800×600 pixels, and then exports the canvas to the target format (PNG, JPG, WebP, BMP, GIF, or PDF). A white background is painted first, which is necessary for formats like JPG and BMP that do not support transparency. The resulting raster image captures the SVG content at the specified resolution.
What is the file size limit?
DocForge has a practical limit of approximately 50MB, though this depends on your device's available memory and browser capabilities. Since all processing happens in-browser using JavaScript, very large files can cause the browser tab to slow down or run out of memory. Documents and spreadsheets are generally fine up to 50MB. For images, the limit is lower because the canvas element must hold the full uncompressed pixel data in memory. PDFs are limited to 20 pages for image conversion to prevent memory exhaustion. If you encounter issues, try closing other browser tabs to free up memory.
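The lower limit for images follows from simple arithmetic: a canvas holds width × height × 4 bytes (one byte each for red, green, blue, and alpha), no matter how well the file on disk was compressed. A small Python helper makes the numbers concrete; the 8000×6000 photo is a hypothetical example, and the A4 figure applies the 2x PDF rendering scale described in the Technical Deep Dive.

```python
BYTES_PER_PIXEL = 4  # canvas pixel data is RGBA, one byte per channel


def canvas_bytes(width, height):
    """Uncompressed size of a canvas pixel buffer."""
    return width * height * BYTES_PER_PIXEL


def to_mib(n_bytes):
    return n_bytes / (1024 * 1024)


photo = canvas_bytes(8000, 6000)     # a 48-megapixel image
pdf_page = canvas_bytes(1190, 1683)  # one A4 page rendered at 2x
print(photo, round(to_mib(photo), 1))        # 192000000 183.1
print(pdf_page, round(to_mib(pdf_page), 1))  # 8011080 7.6
```

At roughly 7.6 MiB per rendered A4 page, the 20-page cap bounds PDF-to-image memory to about 150 MiB even if every page's canvas were alive at once.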
Is my data safe with DocForge?
Yes, completely. All 10 processing libraries (mammoth.js, pdf.js, pdf-lib, jsPDF, SheetJS, PapaParse, marked, Turndown, html2canvas, and FileSaver) execute entirely within your browser's JavaScript engine. No file data is ever transmitted to any server. There are no API calls, no cloud processing, no temporary uploads, and no analytics that track file content. Your documents, spreadsheets, and images are processed using in-memory JavaScript operations only. The preview cache is stored in a local JavaScript variable and is released when you remove the file or close the tab.

Privacy & Security

Your Data Never Leaves Your Device

All 10 processing libraries run entirely in your browser. Documents, spreadsheets, and images are processed using in-memory JavaScript operations -- no file is ever uploaded to any server. Preview data is cached in a temporary variable and cleared when you remove the file. Your data remains exclusively on your device.

DocForge makes zero network requests during file processing. mammoth.js parses DOCX locally, pdf.js renders PDFs locally, SheetJS reads Excel files locally, PapaParse processes CSV/TSV locally, and the Canvas API handles all image conversions locally. The only network activity is loading the page itself and its libraries -- once loaded, the converter works completely offline. There is no authentication, no user accounts, no cookies tracking your conversions, and no telemetry reporting what formats you convert. Every byte of your data stays on your machine from upload through preview to download.

Ready to try DocForge? It's free, private, and runs entirely in your browser.


Milan Salvi

Founder, Leena Software Solutions

Milan is the founder of ZeroDataUpload and Leena Software Solutions, building privacy-first browser tools that process everything client-side.

Last Updated: March 26, 2026