Simple XLS to CSV Converter: Preserve Data & Formatting
Converting Excel spreadsheets (XLS) to CSV is a common task for data exchange, analysis, and import into systems that don’t support Excel’s proprietary formats. While CSV files are simpler and more universally compatible, converting from XLS without losing data or formatting requires attention to several details. This article explains what XLS and CSV are, why conversion matters, common pitfalls, and practical steps to convert while preserving data integrity and as much formatting as CSV allows.
What are XLS and CSV?
- XLS: A binary spreadsheet format created by Microsoft Excel (pre-2007) that can contain multiple worksheets, rich cell formatting, formulas, charts, images, and metadata.
- CSV: Comma-Separated Values — a plain-text table format where each line represents a row and columns are separated by commas (or other delimiters). CSV stores raw cell values only; it has no native support for multiple sheets, formulas, cell colors, fonts, or embedded objects.
Why convert XLS to CSV?
- Interoperability: CSV is widely supported by databases, programming languages, and import tools.
- Simplicity: Plain-text files are easier to version-control and inspect.
- Performance: CSV files are often smaller and faster to parse in automated workflows.
- Integration: Many web applications and data pipelines accept CSV but not XLS.
Key challenges in preserving data and formatting
Because CSV inherently lacks most Excel formatting features, “preserving formatting” means different things depending on goals:
- Preserve raw values: Ensure numbers, dates, and text are exported accurately.
- Preserve delimiters and special characters: Handle commas, newlines, and quotes inside cells correctly.
- Preserve numeric precision: Avoid rounding or scientific notation when not desired.
- Preserve date/time formats: Convert Excel’s serial date numbers to consistent, human-readable strings.
- Preserve formulas’ results: Export computed values rather than formula strings, unless explicitly needed.
- Preserve sheet structure: Decide which worksheet to export or export multiple sheets to separate CSV files.
- Preserve encoding: Use UTF-8 (or required encoding) to retain non‑ASCII characters.
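Excel stores dates as serial day counts, which is why an unconverted date can show up in a CSV as something like 44927. Below is a minimal sketch of the conversion to ISO 8601. It assumes you have the raw serial number (for example, from xlrd) and, for the 1900 date system, a date on or after 1900-03-01. The function name `excel_serial_to_iso` is hypothetical. xlrd also provides its own helper, `xlrd.xldate.xldate_as_datetime`.

```python
from datetime import datetime, time, timedelta

# Day zero for each Excel date system, keyed like xlrd's "datemode":
# 0 = 1900 system (Windows default), 1 = 1904 system (older Mac workbooks).
EPOCHS = {0: datetime(1899, 12, 30), 1: datetime(1904, 1, 1)}


def excel_serial_to_iso(serial: float, datemode: int = 0) -> str:
    """Convert an Excel date serial to an ISO 8601 string.

    Assumes 1900-system serials of 61 or more (1900-03-01 onward); smaller
    values are one day off because Excel counts a non-existent 1900-02-29.
    """
    # Work in whole seconds so float noise such as .99999999 does not leak into the time.
    moment = EPOCHS[datemode] + timedelta(seconds=round(serial * 86400))
    if moment.time() == time(0):
        return moment.strftime("%Y-%m-%d")
    return moment.strftime("%Y-%m-%d %H:%M:%S")
```

A whole serial becomes a date (44927 → 2023-01-01), and a fractional part adds a time of day. In pandas, you can convert a whole column of 1900-system serials with `pd.to_datetime(col, unit="D", origin="1899-12-30")`.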
Best practices for accurate XLS→CSV conversion
- Choose the right delimiter
  - Use commas for standard CSV. Use tabs (TSV) or semicolons if the data contains many commas.
- Force text qualifiers
  - Wrap fields containing delimiters, quotes, or newlines in double quotes, and escape inner quotes by doubling them.
- Handle dates explicitly
  - Convert Excel serials to ISO 8601 (YYYY-MM-DD or YYYY-MM-DD HH:MM:SS) to avoid locale-dependent ambiguity.
- Control numeric formatting
  - Format numbers with the desired number of decimal places before export; export as text if precision must be exact.
- Evaluate formulas
  - Ensure the converter outputs calculated values, not formula expressions, unless formulas are required downstream.
- Export multiple sheets thoughtfully
  - Save each worksheet to its own CSV file and name files clearly (e.g., sheetname.csv).
- Use UTF-8 encoding
  - Prefer UTF-8. Add a byte order mark (BOM) only if the target system needs one. Excel, for example, often relies on the BOM to recognize a CSV file as UTF-8.
- Test with edge cases
  - Validate with cells containing commas, quotes, newlines, leading zeros (IDs), very large numbers, and non-Latin text.
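The delimiter, quoting, and encoding rules above map directly onto Python's standard `csv` module. The sketch below uses hypothetical function names. It writes rows with minimal quoting and can optionally add a UTF-8 BOM through the `utf-8-sig` codec. It then reads the file back, so you can round-trip edge-case rows and compare them with the originals.

```python
import csv
from pathlib import Path


def write_csv(path: Path, rows: list[list[str]], delimiter: str = ",", bom: bool = False) -> None:
    """Write rows, quoting only fields that contain the delimiter, a quote, or a line break."""
    # utf-8-sig prepends a byte order mark; plain utf-8 does not.
    encoding = "utf-8-sig" if bom else "utf-8"
    # newline="" hands line-ending control to the csv module, so newlines
    # embedded in quoted fields are written unchanged.
    with open(path, "w", encoding=encoding, newline="") as f:
        # QUOTE_MINIMAL quotes only fields that need it; the default
        # doublequote=True escapes an inner quote by doubling it.
        writer = csv.writer(f, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL)
        writer.writerows(rows)


def read_csv(path: Path, delimiter: str = ",") -> list[list[str]]:
    """Read the file back; utf-8-sig also accepts files without a BOM."""
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(csv.reader(f, delimiter=delimiter))
```

Passing `delimiter="\t"` produces TSV instead. By default, `csv.writer` ends each row with `\r\n`, which is the line break RFC 4180 specifies for CSV.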
Methods to convert XLS to CSV
- Microsoft Excel
  - Open the XLS file and choose “Save As” → CSV (Comma delimited) or CSV UTF-8. Excel exports only the active sheet and writes the calculated values instead of the formulas.
- LibreOffice / OpenOffice
  - Use “Save As” and select Text CSV (.csv). You can choose the character set, field delimiter, and text delimiter.
- Command-line tools
  - ssconvert (Gnumeric), xlsx2csv, or Python scripts using pandas let you automate conversions and reproduce them exactly. Note that xlsx2csv and openpyxl read only the newer .xlsx format. Legacy .xls files need a reader such as xlrd, which pandas uses for .xls input.
- Online converters
  - Quick and accessible, but avoid them for sensitive data due to privacy concerns.
- Custom scripts
  - Python: pandas.read_excel(…) then df.to_csv(…, index=False, encoding='utf-8', date_format='…'). This approach is ideal for batch jobs and precise control.
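Example Python snippet:

```python
import pandas as pd

# Reading legacy .xls files requires the xlrd package; dtype=str keeps every cell as text.
df = pd.read_excel("input.xls", sheet_name="Sheet1", dtype=str)
# Optional: convert dates or numbers here
# lineterminator replaced line_terminator in pandas 1.5 (the old name was removed in 2.0).
df.to_csv("output.csv", index=False, encoding="utf-8", lineterminator="\n")
```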
Handling common conversion pitfalls
- Leading zeros (IDs, ZIP codes)
  - Read columns as text (dtype=str) or prefix values with a single quote in Excel so the zeros are kept. If the zeros come only from a custom number format (such as 00000), the cell stores a plain number, and you must re-add the zeros during conversion.
- Large integers and scientific notation
  - Export as text or format cells to avoid Excel switching to scientific notation.
- Multiline cells
  - Ensure the CSV writer preserves newlines within quoted fields.
- Commas and quotes inside cells
  - Use proper quoting and escaping: a cell containing He said, "Hello" is written as "He said, ""Hello""" in the CSV file.
- Locale-dependent decimals
  - Normalize to a dot (.) decimal separator if the target system requires it.
- Multiple worksheets
  - Save each sheet separately; include the sheet name in filenames or combine the files into an archive.
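When cells arrive as numbers rather than text, you can apply the numeric fixes above in a post-processing step. This sketch assumes values come in as Python floats (which is how xlrd returns numeric cells) and that you know the intended width of each ID column. The helper names are hypothetical. Excel keeps only 15 significant digits, so a longer ID stored as a number has already been altered inside the workbook. Only IDs stored as text survive intact.

```python
from decimal import Decimal


def number_to_text(value: float) -> str:
    """Render a numeric cell without scientific notation or a trailing '.0'."""
    value = float(value)
    if value.is_integer():
        return str(int(value))
    # repr() gives the shortest string that round-trips the float;
    # Decimal's "f" format then expands any exponent into plain digits.
    return format(Decimal(repr(value)), "f")


def pad_id(value: float, width: int) -> str:
    """Restore leading zeros that existed only as a display format (e.g. ZIP 00501)."""
    return number_to_text(value).zfill(width)


def normalize_decimal(text: str, decimal_sep: str = ",", group_sep: str = ".") -> str:
    """Turn a locale-formatted number such as '1.234,56' into '1234.56'."""
    return text.replace(group_sep, "").replace(decimal_sep, ".")
```

For example, `number_to_text(1e16)` yields `10000000000000000` rather than `1e+16`, and `pad_id(501.0, 5)` yields `00501`.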
Example conversion workflows
- Quick manual (single sheet)
  - Open the XLS in Excel → File → Save As → CSV UTF-8 → Confirm.
- Batch automated (many files)
  - Use a Python script that loops over the files, reads each sheet, applies formatting rules, and writes the CSVs to a destination folder.
- Preserve special handling (dates, IDs)
  - Preprocess in Excel or in a script: set column types, format cells, or convert values explicitly before export.
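The batch workflow can be sketched with pandas. Passing `sheet_name=None` makes `read_excel` return a dict that maps each sheet name to a DataFrame. Several details are assumptions of this sketch rather than fixed conventions: the helper names, the `*.xls` glob pattern, and the `<workbook>_<sheet>.csv` naming scheme. As with any .xls input, reading requires xlrd.

```python
import re
from pathlib import Path

import pandas as pd


def safe_name(sheet_name: str) -> str:
    """Turn a sheet name into a filename-friendly token (spaces and path characters -> '_')."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", sheet_name).strip("_")
    return cleaned or "sheet"


def write_sheets(sheets: dict[str, pd.DataFrame], out_dir: Path, stem: str) -> list[Path]:
    """Write each DataFrame to <stem>_<sheet>.csv in out_dir and return the paths."""
    written = []
    for sheet_name, df in sheets.items():
        target = out_dir / f"{stem}_{safe_name(sheet_name)}.csv"
        df.to_csv(target, index=False, encoding="utf-8", lineterminator="\n")
        written.append(target)
    return written


def convert_folder(src_dir: Path, out_dir: Path, pattern: str = "*.xls") -> list[Path]:
    """Convert every workbook matching pattern; each sheet becomes its own CSV."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for workbook in sorted(src_dir.glob(pattern)):
        # sheet_name=None loads all sheets as {name: DataFrame}; dtype=str keeps cells as text.
        sheets = pd.read_excel(workbook, sheet_name=None, dtype=str)
        written.extend(write_sheets(sheets, out_dir, workbook.stem))
    return written
```

A call such as `convert_folder(Path("incoming"), Path("csv_out"))` processes a whole folder. Two sheets whose names sanitize to the same token would overwrite each other, so check for that if your workbooks use similar sheet names.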
When CSV isn’t enough
If your workflow needs formulas, multiple sheets, cell-level formatting (colors, fonts), or embedded objects, consider:
- XLSX (modern Excel XML format)
- ODS (OpenDocument Spreadsheet)
- JSON, Parquet, or database formats for structured data transfers
Checklist before conversion
- [ ] Identify required worksheet(s)
- [ ] Confirm desired encoding (UTF-8 recommended)
- [ ] Decide on delimiter and quoting rules
- [ ] Set column data types (text for IDs, formatted dates)
- [ ] Confirm formulas should be values
- [ ] Test with representative sample rows
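For the last checklist item, a small post-export check can catch the most common breakages. One is a file that is not valid UTF-8. Another is a record whose field count differs from the header's, which often means a delimiter or newline went unquoted. A sketch using only the standard library (`check_csv` is a hypothetical name):

```python
import csv
from pathlib import Path


def check_csv(path: Path, delimiter: str = ",") -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        # newline="" lets csv.reader handle line breaks inside quoted fields.
        with open(path, encoding="utf-8-sig", newline="") as f:
            rows = list(csv.reader(f, delimiter=delimiter))
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc.reason}"]
    if not rows:
        return ["file is empty"]
    width = len(rows[0])
    problems = []
    # Records are counted from 1 (the header), not by physical line,
    # because a quoted field may span several lines.
    for record_no, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"record {record_no}: {len(row)} fields, expected {width}")
    return problems
```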
Converting XLS to CSV can be straightforward, but preserving the data you care about requires explicit handling of types, encoding, and edge cases. With the right tools and checks, you can create reliable CSV exports that keep values intact and ready for downstream use.