← All Tools ZeroDataUpload Home

DataForge

Convert spreadsheets between XLSX, CSV, TSV, JSON & XML — with multi-sheet support, live preview, and smart type detection

Launch DataForge →
DataForge

Table of Contents

  1. Overview
  2. Key Features
  3. Format Deep Dive
  4. How to Use
  5. Frequently Asked Questions
  6. Privacy & Security

Overview

DataForge is a browser-based spreadsheet and data converter that supports six formats — CSV, TSV, XLSX, XLS, JSON, and XML — with 30 conversion paths between them (every format converts to every other, plus XLS→XLSX upgrade). It is powered by two industry-standard JavaScript libraries: SheetJS 0.18.5 (862KB) for reading and writing Excel binary formats, and PapaParse 5.4.1 (20KB) for RFC 4180-compliant CSV and TSV parsing and generation. Every conversion runs entirely in your browser with zero data uploads.

What sets DataForge apart from typical online converters is its depth of intelligence. When you upload a JSON file, it does not naively assume a flat array — it analyzes the structure, detecting whether the data is an array of objects, a nested object with array-valued keys, a single object that should be wrapped, or even an array of primitives that needs a synthetic column. When you upload an XML file, it does not blindly flatten everything — it counts child tag name frequencies to identify the "record tag," extracts attributes as @attribute columns, and handles flat XML documents as single-row datasets. This smart detection means DataForge handles messy, real-world data files correctly without manual configuration.

The converter also includes multi-sheet Excel support (reads ALL sheets with a dropdown selector and independent caching), a live data preview system with both table and raw views, a "First row is header" toggle for data that lacks column headers, and auto-column-width sizing for XLSX output that produces properly formatted spreadsheets. The entire state management layer caches parsed data after upload, so switching between preview modes and output formats never requires re-parsing the original file.

Key Features

6 Formats, 30 Paths

CSV, TSV, XLSX, XLS, JSON, and XML — every format converts to every other. That is 30 distinct conversion paths, including the XLS→XLSX upgrade path that modernizes legacy Excel files to the current Open XML standard. Upload any supported format and export to any other in a single click.

SheetJS & PapaParse

SheetJS 0.18.5 (862KB) handles all Excel binary read/write operations — parsing XLSX (OOXML ZIP) and XLS (OLE2 binary), multi-sheet workbook navigation, and worksheet generation with column-width metadata. PapaParse 5.4.1 (20KB) delivers RFC 4180-compliant CSV/TSV parsing with quoted field support, escaped quotes, and embedded newline handling. Two powerful, battle-tested libraries working together.

Multi-Sheet Excel

When you upload an XLSX or XLS file with multiple sheets, DataForge reads ALL sheets via XLSX.read() and stores each one independently in an allSheets cache with its own headers and data arrays. A dropdown selector lets you switch between sheets instantly without re-reading the file. The selected sheet drives both the preview and the conversion output.

Live Data Preview

Two preview modes: a Table view showing up to 100 rows with a sticky header, striped rows (alternating backgrounds via nth-child(even)), row numbers in JetBrains Mono 11px, and ellipsis overflow for long cells (max-width 240px); and a Raw view displaying the first 20,000 characters in a monospace pre-wrap block. A "First row is header" checkbox toggles header interpretation in real time.

Smart JSON Detection

Handles four JSON structures automatically: (1) Direct arrays of objects are used as-is for tabular conversion. (2) Objects with array-valued keys — the first array found becomes the dataset. (3) Single objects are wrapped in an array to produce a single-row table. (4) Arrays of primitives (strings, numbers) generate a single "value" column. Type inference converts string "30" to number 30, "true"/"false" to booleans, and empty values to null.

XML Hierarchical Flattening

DOMParser analyzes the XML tree and counts child tag name frequencies. The most commonly repeated tag is identified as the "record tag" — each instance becomes a row. Attributes are extracted as @attribute columns, child element text becomes cell values, and #text content is captured separately. Flat XML documents (no repeating children) are handled as a single row derived from root attributes and direct child elements.

Auto-Column-Width XLSX

When generating XLSX output, DataForge calculates the optimal column width for every column by measuring the maximum content length across all rows (including the header), adding 2 characters of padding, and capping at 50 characters. These widths are applied via the SheetJS !cols worksheet property, producing Excel files with properly sized columns that do not require manual resizing after opening.

Privacy-First Processing

All parsing and generation runs entirely in-browser via SheetJS and PapaParse. No data is uploaded to any server, no temporary files are created remotely, and no server-side processing occurs at any stage. Your spreadsheet data — whether it contains financial records, customer lists, or employee information — remains exclusively on your device from upload to download.

Format Deep Dive

CSV — Comma-Separated Values

Parsing: CSV input is parsed by PapaParse 5.4.1 using RFC 4180 compliance mode. RFC 4180 is the formal specification for CSV published by the IETF, and it defines precise rules for how fields should be delimited, quoted, and escaped. PapaParse handles all RFC 4180 edge cases: fields containing commas are wrapped in double quotes ("New York, NY"), literal double-quote characters within a field are escaped by doubling them ("He said ""hello""" represents the value He said "hello"), and embedded newlines within quoted fields are preserved rather than treated as row separators. The parser uses a comma as the delimiter and enables the skipEmptyLines: true option to discard blank rows that commonly appear at the end of exported spreadsheets. CRLF (\r\n) and LF (\n) line endings are both handled transparently.

Output generation: CSV output is produced by Papa.unparse() with the delimiter set to comma. PapaParse automatically applies RFC 4180 escaping rules to the output: any cell value containing a comma, double quote, or newline is wrapped in double quotes, and internal double quotes are doubled. This ensures that the generated CSV file can be opened correctly by Microsoft Excel, Google Sheets, LibreOffice Calc, and any other CSV-compliant reader without data corruption or column misalignment.

TSV — Tab-Separated Values

Parsing: TSV is parsed using the same PapaParse engine as CSV, with the delimiter changed to the tab character (\t). The same RFC 4180-style quoting and escaping rules apply: fields containing tabs or newlines are quoted, and embedded double quotes are doubled. The skipEmptyLines: true option is enabled identically to CSV parsing. TSV is functionally identical to CSV in DataForge's processing pipeline — the only difference is the delimiter character.

Output generation: TSV output uses Papa.unparse() with the delimiter option set to \t. Fields containing tabs, newlines, or double quotes are automatically quoted. The resulting file uses the .tsv extension and opens correctly in spreadsheet applications and text editors that recognize tab-delimited data.

XLSX — Office Open XML Spreadsheet

Parsing: XLSX files are parsed by SheetJS via XLSX.read(buffer, {type: 'array'}), where the file is first read into an ArrayBuffer using the FileReader API. XLSX is a ZIP-based format defined by the OOXML (Office Open XML) standard — the ZIP container holds multiple XML files describing worksheets, shared strings, styles, and metadata. SheetJS decompresses the archive and reconstructs the workbook structure, populating a SheetNames array (ordered list of sheet names) and a Sheets object (mapping sheet names to worksheet data). For each sheet, XLSX.utils.sheet_to_json() is called with header: 1 (returns an array of arrays rather than an array of objects) and defval: '' (fills missing cells with empty strings instead of undefined). The result is a clean 2D array where the first row contains headers and subsequent rows contain data. For multi-sheet workbooks, ALL sheets are parsed and stored in the allSheets cache as {sheetName: {headers, data}} objects. A dropdown selector is populated with all sheet names, defaulting to the first sheet.

Output generation: XLSX output is built using XLSX.utils.aoa_to_sheet(), which converts the 2D array of arrays (headers + data rows) into a SheetJS worksheet object. Auto-column-width is then calculated: for each column, DataForge measures the string length of the header and every data cell, takes the maximum, adds 2 characters of padding, and caps the result at 50 characters. These widths are written to the worksheet's !cols property as an array of {wch: width} objects. A new workbook is created via XLSX.utils.book_new(), the worksheet is appended with XLSX.utils.book_append_sheet(wb, ws, "Sheet1"), and the final file is generated by XLSX.write(wb, {bookType: 'xlsx', type: 'array'}), which produces an ArrayBuffer that is then wrapped in a Blob for download.

XLS — Excel Binary Format (Legacy)

Parsing: XLS files use the older OLE2 (Object Linking and Embedding) compound document format, predating the ZIP-based OOXML standard introduced with Excel 2007. SheetJS provides backward-compatible parsing through the same XLSX.read() API — it automatically detects the OLE2 container format and decodes the binary structures within. The API surface is identical to XLSX: sheet_to_json with header: 1 and defval: '' produces the same 2D array output. Multi-sheet XLS workbooks are handled identically, with all sheets loaded into the allSheets cache. This means users can convert legacy XLS files without knowing or caring about the underlying binary format differences.

XLS→XLSX upgrade: One of the 30 conversion paths is the XLS-to-XLSX upgrade, which reads the legacy OLE2 binary format and writes it back as modern OOXML. This is valuable for organizations modernizing archived spreadsheets: the new XLSX file is smaller (ZIP compression), more interoperable (supported by all modern applications), and preserves all cell data from the original. The upgrade uses the same aoa_to_sheetbook_newbook_append_sheetwrite pipeline as all other XLSX output generation.

JSON — JavaScript Object Notation

Parsing (Smart Detection): JSON parsing is where DataForge's intelligence is most visible. After JSON.parse() deserializes the raw text, DataForge analyzes the resulting JavaScript value to determine its structure and the best strategy for tabular conversion:

Nested objects within rows are not recursively flattened — instead, they are serialized back to JSON strings via JSON.stringify() and stored as cell values. This preserves the nested data while keeping the tabular structure manageable.

Output generation (Type Detection): When converting tabular data to JSON, DataForge performs per-cell type inference to produce semantically correct JSON values rather than treating everything as strings. The detection pipeline applies the following checks in order: (1) isNaN() check — if a string value like "30" passes numeric parsing, it is output as the number 30. (2) Boolean detection — the string "true" becomes the boolean true, and "false" becomes false (case-sensitive). (3) Null detection — empty strings and undefined values become null. (4) Objects — values that are already objects are serialized via JSON.stringify(). The final output is formatted with JSON.stringify(array, null, 2) for human-readable indentation with 2-space nesting.

XML — Extensible Markup Language

Parsing (Hierarchical Flattening): XML parsing uses the browser's native DOMParser to construct a DOM tree from the raw XML text. The flattening algorithm then applies a sophisticated strategy to convert the hierarchical tree into flat tabular rows:

  1. Tag frequency counting: The algorithm iterates through all direct children of the root element and counts how many times each tag name appears. For example, in <data><user>...</user><user>...</user><meta>...</meta></data>, the tag user appears twice and meta appears once.
  2. Record tag identification: The tag with the highest frequency is designated as the "record tag." Each instance of this tag becomes one row in the output table. In the example above, user is the record tag, producing two rows.
  3. Attribute extraction: XML attributes on each record element are extracted as columns prefixed with @. For example, <user id="5" role="admin"> produces columns @id and @role with values 5 and admin.
  4. Child element extraction: Direct child elements of each record become columns. For <user><name>Alice</name><age>30</age></user>, the columns are name and age with values Alice and 30.
  5. Text content: Direct text content within a record element (not wrapped in a child element) is captured as a #text column.
  6. Flat XML handling: If no tag appears more than once (all children are unique), the document is treated as a flat structure. The root element's attributes and all direct child elements are collected into a single row. This handles configuration files and settings documents where the XML is essentially a key-value store rather than a list of records.

Output generation (Safe XML Construction): When generating XML from tabular data, DataForge applies two safety functions. The safeTag() function sanitizes column names for use as XML element tag names: non-alphanumeric characters (except underscore, hyphen, and period) are removed, and if the resulting string does not start with a letter or underscore, it is prefixed or replaced with the default tag name "field". The escXml() function escapes the five XML-reserved characters in cell values: & becomes &amp;, < becomes &lt;, > becomes &gt;, " becomes &quot;, and ' becomes &apos;. The output structure is <records><record><tag>value</tag></record></records>, where each row becomes a <record> element and each column becomes a child element with the sanitized column name as the tag and the escaped cell value as the text content.

How to Use

  1. Open DataForge — Navigate to the converter page in your browser. No installation, signup, or plugin is required. SheetJS (862KB) and PapaParse (20KB) load automatically and the tool is ready to accept files immediately.
  2. Upload your data file — Drag and drop a CSV, TSV, XLSX, XLS, JSON, or XML file onto the upload zone, or click to browse your device. The input format is auto-detected from the file extension (.txt files are treated as comma-delimited CSV).
  3. Review the live preview — A table preview appears immediately showing your data with row numbers, sticky headers, and striped rows (up to 100 rows displayed). Switch to Raw view to see the first 20,000 characters of the parsed output in a monospace code block. Use this to confirm your data was parsed correctly.
  4. For multi-sheet Excel files — If your XLSX or XLS file contains multiple sheets, a dropdown selector appears above the preview. Select the sheet you want to convert. Each sheet is cached independently, so switching between sheets is instant with no re-parsing.
  5. Toggle "First row is header" — If your data file does not have column headers in the first row, uncheck this option. DataForge will generate synthetic headers (Col 1, Col 2, Col 3, etc.) and treat the original first row as data rather than as column names. The preview updates in real time when you toggle this setting.
  6. Select your output format — Choose the desired target format from the output dropdown: CSV, TSV, XLSX, JSON, or XML. All 30 conversion paths are available regardless of your input format.
  7. Click "Convert & Download" — Processing is instant because DataForge uses the cached parsed data rather than re-reading the original file. The converted output downloads automatically to your default download folder with the correct file extension and MIME type. Check the status bar for a green success confirmation and the output file size.

Frequently Asked Questions

What formats does DataForge support?
DataForge supports six data formats: CSV (Comma-Separated Values), TSV (Tab-Separated Values), XLSX (Office Open XML Spreadsheet), XLS (Legacy Excel Binary), JSON (JavaScript Object Notation), and XML (Extensible Markup Language). Every format can be converted to every other format, yielding 30 distinct conversion paths. This includes the XLS→XLSX upgrade path, which modernizes legacy Excel files from the older OLE2 binary format to the current ZIP-based Open XML standard without data loss.
How does CSV parsing work?
CSV files are parsed by PapaParse 5.4.1, which implements full RFC 4180 compliance. RFC 4180 is the IETF specification for CSV that defines how fields are delimited and how special characters are handled. PapaParse correctly processes: quoted fields where the field value is wrapped in double quotes (allowing commas within values), escaped double quotes where a literal " character inside a quoted field is represented as "", and embedded newlines where a line break within a quoted field is treated as part of the value rather than a row separator. The parser uses comma as the delimiter for CSV (tab for TSV), enables skipEmptyLines: true to discard trailing blank rows, and handles both CRLF and LF line endings transparently.
Can DataForge handle multi-sheet Excel files?
Yes — multi-sheet support is one of DataForge's core features. When you upload an XLSX or XLS file, SheetJS reads the entire workbook and populates the SheetNames array and Sheets object. DataForge then iterates through ALL sheets and caches each one independently in an allSheets object, storing separate headers and data arrays for each sheet. A dropdown selector appears in the interface, populated with all sheet names, defaulting to the first sheet. When you select a different sheet, the handleSheetChange() function retrieves the cached headers and data for that sheet, updates the internal state (parsedHeaders and parsedData), and re-renders the preview — all without re-reading or re-parsing the original file. The conversion output always uses the currently selected sheet.
What is "smart JSON detection"?
Smart JSON detection is DataForge's ability to automatically identify the structure of a JSON file and extract tabular data from it without any user configuration. After parsing the JSON, the tool checks four cases in order: (1) If the top-level value is an array of objects, it is used directly — all unique keys become column headers. (2) If it is an object whose values include arrays, the first array-valued key is extracted as the dataset. This handles API response envelopes like {"results": [...], "pagination": {...}}. (3) If it is a single flat object, it is wrapped in an array to produce a one-row table. (4) If it is an array of primitives (numbers, strings, booleans), a single "value" column is created. Nested objects within any of these structures are serialized to JSON strings via JSON.stringify() and stored as cell values.
How does XML flattening work?
DataForge uses the browser's native DOMParser to build a DOM tree from the XML text, then applies a frequency-based flattening algorithm. It counts how many times each child tag name appears under the root element. The most frequently occurring tag is designated the "record tag" — each instance becomes one row in the output. For each record element, attributes are extracted as @attribute-prefixed columns, child elements become columns named after their tag, and direct text content becomes a #text column. If no tag appears more than once (a flat XML document), the root's attributes and all direct children are collected into a single row. This handles both list-style XML (like RSS feeds and data exports) and flat configuration-style XML without any user intervention.
What is auto-column-width?
When DataForge generates XLSX output, it automatically calculates optimal column widths so the resulting Excel file looks properly formatted without manual column resizing. For each column, it measures the string length of the header value and every data cell in that column, takes the maximum length, adds 2 characters of padding to prevent text from touching cell borders, and caps the width at 50 characters to prevent extremely wide columns from making the spreadsheet unusable. These widths are applied via the SheetJS !cols worksheet property as an array of {wch: width} objects, where wch specifies the column width in character units. The result is a professional-looking spreadsheet that opens with readable column sizes in Excel, Google Sheets, and LibreOffice Calc.
Does JSON output detect types?
Yes. When converting tabular data (from CSV, TSV, XLSX, XLS, or XML) to JSON, DataForge performs per-cell type inference rather than outputting everything as strings. The detection pipeline checks each cell value in order: (1) Numeric detection via isNaN() — a string value like "30" is converted to the number 30, and "3.14" becomes 3.14. (2) Boolean detection — the exact strings "true" and "false" (case-sensitive) are converted to their boolean equivalents. (3) Null detection — empty strings and undefined values become null in the JSON output. (4) Object passthrough — values that are already objects are serialized via JSON.stringify(). The final JSON output uses JSON.stringify(array, null, 2) for human-readable formatting with 2-space indentation.
What about the "First row is header" toggle?
The "First row is header" checkbox controls how DataForge interprets the first row of your uploaded data. When checked (the default), the first row is treated as column headers — its values become the header names displayed in the preview table and used as keys in JSON output or tag names in XML output. When unchecked, DataForge generates synthetic headers (Col 1, Col 2, Col 3, etc.) and the original first row is treated as data, shifting all content down by one row. Toggling this checkbox immediately re-renders the live preview so you can see the effect before converting. This is essential for data files exported from systems that do not include header rows, such as some database dump utilities and legacy mainframe exports.
What is the file size limit?
DataForge enforces a 50MB hard limit on uploaded files. In practice, the effective limit is approximately 20MB for comfortable browser-based processing. Larger files consume significant browser memory because SheetJS and PapaParse must hold the entire parsed dataset in JavaScript memory simultaneously. For XLSX files, the memory footprint is roughly 3-5x the file size due to ZIP decompression and DOM construction. If you experience slow processing or browser tab crashes with large files, try splitting the data into smaller chunks (separate sheets or files) and converting them individually.
Is my spreadsheet data safe?
Absolutely. All processing happens entirely in your browser using SheetJS and PapaParse — two JavaScript libraries that execute locally without any network communication. No cell data, column headers, sheet names, file names, or any other content is ever transmitted to any server. There is no backend, no API endpoint, no temporary cloud storage, and no analytics on file content. DataForge does not even know what you are converting. Your data exists only in your browser's memory during the active session and is released when you close the tab or upload a new file. This makes DataForge safe for financial data, customer records, employee information, medical data, and any other sensitive content.

Privacy & Security

Your Spreadsheet Data Never Leaves Your Device

Spreadsheets often contain some of the most sensitive data in any organization — financial records, customer lists, employee information, salary data, business metrics, and strategic projections. DataForge processes everything locally using SheetJS for Excel binary operations and PapaParse for delimited text parsing. No cell data, column headers, sheet names, or file content is ever transmitted to any server. There is no backend API, no cloud processing pipeline, no temporary storage, and no telemetry on file content. The two processing libraries (SheetJS and PapaParse) are loaded once when the page opens and execute entirely within your browser's JavaScript sandbox. Your business data remains exclusively on your device — from the moment you drag a file onto the upload zone to the moment the converted output downloads to your computer.

Ready to try DataForge? It's free, private, and runs entirely in your browser.

Launch DataForge →

Related

Milan Salvi

Milan Salvi

Founder, Leena Software Solutions

Milan is the founder of ZeroDataUpload and Leena Software Solutions, building privacy-first browser tools that process everything client-side. View all articles ยท About the author.

Last Updated: March 26, 2026