Automatically Clean CSV Supplier Imports: How Retailers Save Hours Per Week

CSV files from suppliers are deceptively simple. In practice, every supplier does it differently — and the cumulative cost of manual cleaning adds up to hours every week. Here is how to automate it.

Productbay Team · March 17, 2026 · 8 min read
Key takeaways
  • CSV files from different suppliers have different encodings, delimiters, column names, and data formats — manual cleaning is unavoidable without automation.
  • The most common problems: encoding errors (UTF-8 vs. Latin-1), inconsistent column naming, missing required fields, duplicate SKUs, wrong decimal separators, and embedded HTML in text fields.
  • Automated cleaning rules applied at import time eliminate the same manual work every week — configured once, applied forever.
  • Productbay normalizes, validates, and deduplicates CSV data from every supplier before it reaches your catalog.

Why CSV Imports Are Harder Than They Look

CSV stands for “Comma-Separated Values” — but in practice, supplier CSV files use semicolons, tabs, and pipe characters as separators. The format is simple; the reality is not. Every supplier’s ERP system generates CSV exports according to its own logic, and the result is a format that is theoretically standardized but practically inconsistent across every data source.

For a retailer receiving data from 5 or 10 suppliers, this means 5 or 10 different cleaning routines to run every time a new file arrives. Multiply that by weekly update cycles and you have a significant, recurring time sink — one that produces no business value, only keeps the lights on.

The Six Most Common CSV Problems from Suppliers

1. Encoding Errors: UTF-8 vs. Latin-1

This is the most frustrating CSV problem for DACH retailers because it is invisible until it breaks things. German umlauts — ä, ö, ü, ß — are encoded differently in UTF-8 and Latin-1 (also called ISO-8859-1). When a file exported in Latin-1 is read as UTF-8 by your import system, the umlauts become garbled: “Möbel” becomes “Möbel”, “Größe” becomes “Größe”.

The root cause is that older ERP systems — particularly older versions of SAP, Navision, and German accounting software — default to Latin-1 encoding. Modern systems use UTF-8. When a supplier upgrades their system, the encoding may change without notice, breaking an import process that had worked fine for years.

Automated solution: Encoding detection at import time. Productbay automatically identifies whether a file is UTF-8, Latin-1, or another encoding and converts it correctly — no manual pre-processing required.
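
As a rough illustration of what encoding detection can look like (a minimal sketch, not Productbay's actual implementation), the Python function below tries strict UTF-8 first and falls back to Latin-1 only when the bytes are not valid UTF-8. The function name decode_supplier_bytes is hypothetical, and the sketch assumes the file is either UTF-8 or Latin-1; other encodings would need a dedicated detection library.

```python
from typing import Tuple


def decode_supplier_bytes(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used) for the raw bytes of a supplier file."""
    # A UTF-8 byte order mark is an unambiguous signal; "utf-8-sig" strips it.
    if raw.startswith(b"\xef\xbb\xbf"):
        return raw.decode("utf-8-sig"), "utf-8-sig"
    try:
        # Strict decoding raises on Latin-1 umlaut bytes such as 0xF6 ("ö").
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this fallback never raises.
        return raw.decode("latin-1"), "latin-1"
```

The order matters: almost any byte sequence decodes "successfully" as Latin-1, so Latin-1 only works as a last resort, never as the first guess.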

2. Inconsistent Column Names Across Suppliers

You have an internal product attribute schema. Your suppliers do not know or care about it. Supplier A calls it “Artikelnummer”, supplier B calls it “Art_Nr”, supplier C calls it “SKU”, supplier D calls it “ProductID”. All four mean the same thing — your internal “sku” field.

In a manual workflow, you handle this translation in Excel every time. In an automated workflow, you define the mapping once per supplier: Supplier A’s “Artikelnummer” maps to “sku”. From that point on, every import from Supplier A uses this mapping automatically.
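
A saved per-supplier mapping can be as simple as a dictionary. The following Python sketch assumes semicolon-delimited files; the supplier keys and the "Bezeichnung" and "Title" columns are made up for illustration, and only the "sku" mappings come from the example above.

```python
import csv
import io

# Hypothetical saved mappings: supplier header -> internal field name.
COLUMN_MAPPINGS = {
    "supplier_a": {"Artikelnummer": "sku", "Bezeichnung": "name"},
    "supplier_b": {"Art_Nr": "sku", "Title": "name"},
}


def map_columns(csv_text, supplier, delimiter=";"):
    """Rename supplier columns to internal field names; drop unmapped columns."""
    mapping = COLUMN_MAPPINGS[supplier]
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = []
    for row in reader:
        rows.append(
            {mapping[col]: value for col, value in row.items() if col in mapping}
        )
    return rows
```

Dropping unmapped columns is a design choice in this sketch; a real pipeline might instead keep them in a separate "unmapped" bucket so nothing is lost silently.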

3. Missing Required Fields

Supplier CSV files frequently omit fields that your shop requires — EAN codes, product categories, weight, or country of origin. The supplier simply does not collect that data, or includes it in a different field, or splits it across multiple columns.

Automated handling: Configurable import rules that identify missing required fields and either fill them from derived data (e.g., calculate EAN check digit if missing), apply default values where appropriate, or flag the product for manual completion before it can be published.
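
To make the "derive or flag" idea concrete, here is a small Python sketch. It assumes a 12-digit value in the EAN field is an EAN-13 with its check digit missing, which is an assumption you would want to confirm per supplier. The REQUIRED_FIELDS tuple and both function names are hypothetical.

```python
import re

# Hypothetical list of fields the shop requires before publishing.
REQUIRED_FIELDS = ("sku", "ean", "weight")


def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit from the first 12 digits."""
    if not re.fullmatch(r"[0-9]{12}", first12):
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def complete_or_flag(product: dict) -> dict:
    """Fill in a derivable EAN check digit and list fields that are still missing."""
    product = dict(product)  # work on a copy, leave the input untouched
    ean = product.get("ean") or ""
    if re.fullmatch(r"[0-9]{12}", ean):
        product["ean"] = ean + str(ean13_check_digit(ean))
    product["missing"] = [f for f in REQUIRED_FIELDS if not product.get(f)]
    return product
```

A product with a non-empty "missing" list would then be held back for manual completion instead of being published.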

4. Duplicate SKUs

Supplier catalogs often contain duplicate entries, with the same SKU appearing multiple times with slightly different data. This happens when suppliers list several product variants under one SKU, when they include discontinued products alongside active ones, or simply due to data errors in their own system.

Without duplicate detection at import time, you end up with conflicting product records in your catalog — different prices, different descriptions, different stock levels for the same SKU. Automated deduplication identifies these conflicts and handles them according to configurable merge rules before they reach your live catalog.
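
One possible merge rule is "the later row wins, but empty values never overwrite existing data". The Python sketch below implements that rule and reports which fields conflicted, so they can be reviewed. The rule and the function name deduplicate are illustrative assumptions, not a description of Productbay's merge behavior.

```python
def deduplicate(rows, key="sku"):
    """Merge rows that share a key; return (merged_rows, conflicts_by_key)."""
    merged = {}
    conflicts = {}
    for row in rows:
        sku = row[key]
        if sku not in merged:
            merged[sku] = dict(row)
            continue
        current = merged[sku]
        for field, value in row.items():
            if value in ("", None):
                continue  # empty values never overwrite existing data
            if current.get(field) not in ("", None, value):
                # Two different non-empty values for one field: record it.
                conflicts.setdefault(sku, set()).add(field)
            current[field] = value  # the later row wins
    return list(merged.values()), conflicts
```

Returning the conflicts separately keeps the decision visible: a price conflict might be acceptable to resolve automatically, while a conflicting EAN probably deserves a human look.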

5. Wrong Decimal Separators

Germany uses a comma as the decimal separator (1,99 €), while many international systems use a period (1.99). When a supplier’s system writes decimal commas into a file that also uses commas as field delimiters, and does not quote the price field, the result is broken data: a price of “1,99” is split into two fields, “1” and “99”, instead of one.

This is a particularly damaging error because it often passes initial format checks but produces wrong prices in your catalog. Automated normalization detects and corrects decimal separator inconsistencies before they cause pricing errors.
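
Once the price arrives as a single field, normalizing it is mostly a question of deciding which character is the decimal mark. The Python sketch below uses a simple heuristic: if both a comma and a period appear, the one that appears last is the decimal mark; a lone comma is treated as a decimal comma. That heuristic is an assumption and misreads values like “1,234” meant as one thousand two hundred thirty-four, so a per-supplier setting is safer where available. The function name parse_price is hypothetical.

```python
from decimal import Decimal, InvalidOperation


def parse_price(raw: str) -> Decimal:
    """Parse a price string with either German or international separators."""
    # Remove the currency sign, regular spaces, and non-breaking spaces.
    text = raw.replace("€", "").replace("\u00a0", "").replace(" ", "")
    if "," in text and "." in text:
        if text.rfind(",") > text.rfind("."):
            # German style, e.g. "1.299,50": periods group thousands.
            text = text.replace(".", "").replace(",", ".")
        else:
            # International style, e.g. "1,299.50": commas group thousands.
            text = text.replace(",", "")
    elif "," in text:
        # Assumption: a lone comma is a decimal comma, e.g. "1,99".
        text = text.replace(",", ".")
    try:
        return Decimal(text)
    except InvalidOperation:
        raise ValueError(f"unrecognized price: {raw!r}")
```

Using Decimal rather than float avoids introducing rounding errors into prices during the conversion itself.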

6. Embedded HTML and Formatting in Text Fields

Product descriptions from suppliers often contain HTML tags (<p>, <br>, <strong>, &nbsp;), special characters (&amp;, &uuml;), or formatting artifacts from word processors. If you publish these directly to your shop, the raw HTML appears as visible text — or breaks the layout.

Automated cleaning strips HTML tags, converts HTML entities to proper characters, and normalizes whitespace before product descriptions reach your catalog.
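
The three cleaning steps named above can be sketched with Python's standard library alone. This is a minimal example that assumes descriptions should end up as plain text on a single line; the class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, discarding tags and decoding HTML entities."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; and &uuml;.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Treat paragraph and line breaks as word boundaries.
        if tag in ("p", "br"):
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def clean_description(raw: str) -> str:
    """Strip tags, decode entities, and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    text = "".join(parser.parts).replace("\u00a0", " ")  # &nbsp; -> space
    return re.sub(r"\s+", " ", text).strip()
```

Using a parser instead of a regular expression for tag removal is deliberate: supplier HTML is often malformed, and a parser copes better with unclosed tags and attributes that contain angle brackets.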

How Productbay Automates CSV Cleaning

Productbay’s import pipeline applies cleaning rules at every stage automatically. The workflow:

  1. Encoding detection: File encoding is identified and converted to UTF-8 before any processing begins
  2. Delimiter detection: The file’s delimiter (comma, semicolon, tab) is auto-detected — no configuration needed
  3. Column mapping: Supplier columns are mapped to your internal schema using saved per-supplier mappings
  4. Data normalization: Decimal separators, date formats, unit conversions, and value normalization applied
  5. Text cleaning: HTML tags stripped, entities decoded, whitespace normalized
  6. Duplicate detection: SKU and EAN deduplication with configurable merge behavior
  7. Validation: Required fields, format checks, and custom rules — non-passing products queued for review

All of this happens automatically, every time. The rules you configure for a supplier on the first import continue to apply on every subsequent import from that supplier — without any manual intervention.
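
Delimiter detection, step 2 in the list above, is the one stage not illustrated elsewhere in this article. Python's csv.Sniffer offers a reasonable starting point. The sketch below is one way to use it, with a simple fallback for samples the sniffer cannot classify; the function name detect_delimiter and the fallback rule are assumptions for illustration.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the field delimiter from the first lines of a CSV file."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Fallback: pick the candidate that occurs most often in the header.
        lines = sample.splitlines()
        header = lines[0] if lines else ""
        return max(candidates, key=header.count)
```

Passing only the first few kilobytes of the file as the sample is usually enough, and it keeps detection fast for large catalogs.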

Practical Tips for Working with Supplier CSV Files

  • Always request UTF-8 encoding from suppliers if you have influence over their export settings — it eliminates the single most common source of import errors.
  • Ask for consistent column headers in every export. Even if the data format changes, consistent headers let your mapping rules apply correctly.
  • Never open supplier CSV files in Excel before importing — Excel often converts EAN codes (which look like large numbers) to scientific notation, permanently corrupting the data.
  • Test your import pipeline with edge cases: empty rows, products with only partial data, and price fields with different formats are common failure points in manual processes.
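
Two of the tips above are easy to check in code. The Python sketch below reads a CSV with every value kept as a string, so EAN codes are never converted to numbers, and skips rows that contain only empty fields while keeping rows with partial data. The function name read_rows and the sample data are hypothetical.

```python
import csv
import io


def read_rows(csv_text, delimiter=";"):
    """Read rows as dicts of strings, skipping rows that are entirely empty."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = []
    for row in reader:
        # Short rows are padded with None, so only inspect string values.
        values = [v for v in row.values() if isinstance(v, str)]
        if not any(v.strip() for v in values):
            continue  # a row like ";;" carries no data
        rows.append(row)
    return rows


sample = "sku;ean;price\nA-1;4006381333931;1,99\n;;\n\nB-2;;\n"
rows = read_rows(sample)
```

Here the EAN stays the exact string “4006381333931”, the empty rows are dropped, and the partial row for B-2 survives so that validation can flag its missing fields.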

The Business Case for Automation

Manual CSV cleaning for a retailer with 5 suppliers and weekly update cycles typically takes 2–4 hours per supplier per week. That is 10–20 hours per week of pure overhead — no value created, just data moving from one format to another.

With automated PIM import pipelines, those 10–20 hours shrink to a one-time configuration per supplier plus the occasional review of products that validation flags. The same cleaning logic that your team applies manually every week runs automatically and consistently, every time, without the risk of human error, without the time cost, and without the frustration of discovering encoding errors at midnight before a product launch.

The goal is not to clean CSV files faster. The goal is to never clean CSV files manually again. Automation converts a recurring time cost into a one-time configuration.


Automate your CSV imports

Productbay cleans, normalizes, and validates CSV imports from every supplier automatically. Book a free demo.
