CSV files from suppliers are deceptively simple. In practice, every supplier does it differently — and the cumulative cost of manual cleaning adds up to hours every week. Here is how to automate it.
CSV stands for “Comma-Separated Values” — but in practice, supplier CSV files use semicolons, tabs, and pipe characters as separators. The format is simple; the reality is not. Every supplier’s ERP system generates CSV exports according to its own logic, and the result is a format that is theoretically standardized but practically inconsistent across every data source.
For a retailer receiving data from 5 or 10 suppliers, this means 5 or 10 different cleaning routines to run every time a new file arrives. Multiply that by weekly update cycles and you have a significant, recurring time sink — one that produces no business value, only keeps the lights on.
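As a rough illustration of what a delimiter check can look like, the sketch below parses a sample with each candidate separator and keeps the one that yields the most columns at a consistent width. The function name `detect_delimiter` and the candidate list are assumptions made for this example, not part of any particular product.

```python
import csv
import io

# Separators commonly seen in supplier exports (assumed candidate list).
CANDIDATES = [";", ",", "\t", "|"]


def detect_delimiter(sample: str) -> str:
    """Return the candidate that splits every row into the same number of columns.

    If several candidates qualify, the one producing the most columns wins.
    Falls back to "," when no candidate produces more than one column.
    """
    best, best_cols = ",", 1
    for delim in CANDIDATES:
        rows = [r for r in csv.reader(io.StringIO(sample), delimiter=delim) if r]
        widths = {len(r) for r in rows}
        if len(widths) == 1:  # every row has the same column count
            cols = widths.pop()
            if cols > best_cols:
                best, best_cols = delim, cols
    return best


sample = "sku;name;price\nA1;Stuhl;19,99\nA2;Tisch;49,50\n"
print(repr(detect_delimiter(sample)))
```

Because the sample is parsed with the `csv` module rather than split on raw characters, quoted fields that contain a separator do not distort the column count.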
This is the most frustrating CSV problem for DACH retailers because it is invisible until it breaks things. German umlauts — ä, ö, ü, ß — are encoded differently in UTF-8 and Latin-1 (also called ISO-8859-1). When a file exported in Latin-1 is read as UTF-8 by your import system, the umlauts become garbled: “Möbel” becomes “Möbel”, “Größe” becomes “Größe”.
The root cause is that older ERP systems — particularly older versions of SAP, Navision, and German accounting software — default to Latin-1 encoding. Modern systems use UTF-8. When a supplier upgrades their system, the encoding may change without notice, breaking an import process that had worked fine for years.
Automated solution: Encoding detection at import time. Productbay automatically identifies whether a file is UTF-8, Latin-1, or another encoding and converts it correctly — no manual pre-processing required.
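One simple way to approximate encoding detection is to try strict UTF-8 first and fall back to a single-byte encoding only when that fails. The sketch below is a generic heuristic under that assumption, not a description of how Productbay does it; `decode_supplier_bytes` is a hypothetical helper name.

```python
def decode_supplier_bytes(raw: bytes) -> tuple[str, str]:
    """Decode raw file bytes and report which encoding was used.

    Order matters: UTF-8 is strict, so text in a single-byte encoding that
    contains umlauts almost never passes as valid UTF-8. "utf-8-sig" also
    strips a leading byte order mark if one is present.
    """
    for encoding in ("utf-8-sig", "cp1252"):
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so this last step cannot fail.
    return raw.decode("latin-1"), "latin-1"


text, used = decode_supplier_bytes("Möbel".encode("latin-1"))
print(text, used)
```

Windows-1252 is tried before plain Latin-1 because it is a superset for the printable range and is what many Windows-based exports actually produce. A statistical detector would be more robust for encodings outside this short list.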
You have an internal product attribute schema. Your suppliers do not know or care about it. Supplier A calls it “Artikelnummer”, supplier B calls it “Art_Nr”, supplier C calls it “SKU”, supplier D calls it “ProductID”. All four mean the same thing — your internal “sku” field.
In a manual workflow, you handle this translation in Excel every time. In an automated workflow, you define the mapping once per supplier: Supplier A’s “Artikelnummer” maps to “sku”. From that point on, every import from Supplier A uses this mapping automatically.
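A per-supplier mapping of this kind can be as small as a dictionary lookup. In the sketch below, the supplier keys and the `map_row` helper are illustrative assumptions; the column names are the ones from the example above.

```python
# One mapping per supplier, defined once (supplier keys are hypothetical).
SUPPLIER_MAPPINGS = {
    "supplier_a": {"Artikelnummer": "sku"},
    "supplier_b": {"Art_Nr": "sku"},
    "supplier_c": {"SKU": "sku"},
    "supplier_d": {"ProductID": "sku"},
}


def map_row(supplier: str, row: dict) -> dict:
    """Rename supplier-specific columns to internal field names.

    Columns without a mapping entry are passed through unchanged.
    """
    mapping = SUPPLIER_MAPPINGS[supplier]
    return {mapping.get(column, column): value for column, value in row.items()}


print(map_row("supplier_a", {"Artikelnummer": "A-100", "Preis": "19,99"}))
```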
Supplier CSV files frequently omit fields that your shop requires — EAN codes, product categories, weight, or country of origin. The supplier simply does not collect that data, or includes it in a different field, or splits it across multiple columns.
Automated handling: Configurable import rules that identify missing required fields and either fill them from derived data (e.g., calculate EAN check digit if missing), apply default values where appropriate, or flag the product for manual completion before it can be published.
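The three rule types (derive, default, flag) might be sketched as follows. The EAN-13 check digit uses the standard alternating 1/3 weighting; the field names, the `complete_product` helper, and the choice of required fields are assumptions for illustration only.

```python
def ean13_check_digit(first12: str) -> str:
    """Compute the 13th digit of an EAN-13 from its first 12 digits."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def complete_product(product, required=("sku", "ean", "category"), defaults=None):
    """Apply defaults, derive what can be derived, and list what is still missing."""
    result = dict(defaults or {})
    # Non-empty supplier values override the defaults.
    result.update({k: v for k, v in product.items() if v not in ("", None)})
    ean = result.get("ean", "")
    if len(ean) == 12 and ean.isascii() and ean.isdigit():
        result["ean"] = ean + ean13_check_digit(ean)  # derive the check digit
    missing = [name for name in required if not result.get(name)]
    return result, missing


product, missing = complete_product(
    {"sku": "A1", "ean": "400638133393", "category": ""},
    defaults={"weight_unit": "kg"},
)
print(product, missing)
```

A product with a non-empty `missing` list would be held back for manual completion rather than published.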
Supplier catalogs often contain duplicate entries — the same SKU appearing multiple times with slightly different data. This happens when suppliers have multiple product variants in one row, when they include discontinued products alongside active ones, or simply due to data errors in their own system.
Without duplicate detection at import time, you end up with conflicting product records in your catalog — different prices, different descriptions, different stock levels for the same SKU. Automated deduplication identifies these conflicts and handles them according to configurable merge rules before they reach your live catalog.
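A minimal sketch of deduplication with one possible merge rule: empty fields are filled from later rows, and when two rows disagree the later row wins while the conflict is recorded for review. The rule itself and the `deduplicate` name are assumptions; real merge rules would be configurable per field.

```python
def deduplicate(rows, key="sku"):
    """Merge rows that share the same key and report conflicting fields.

    Returns (merged_rows, conflicts), where conflicts maps each key value
    to the set of field names on which duplicate rows disagreed.
    """
    merged = {}
    conflicts = {}
    for row in rows:
        sku = row[key]
        if sku not in merged:
            merged[sku] = dict(row)
            continue
        current = merged[sku]
        for field_name, value in row.items():
            if value in ("", None):
                continue  # an empty value never overwrites anything
            if current.get(field_name) in ("", None):
                current[field_name] = value  # fill a gap
            elif current[field_name] != value:
                conflicts.setdefault(sku, set()).add(field_name)
                current[field_name] = value  # assumed rule: later row wins
    return list(merged.values()), conflicts


rows = [
    {"sku": "A1", "price": "19.99", "ean": ""},
    {"sku": "A1", "price": "21.99", "ean": "4006381333931"},
    {"sku": "B2", "price": "5.00", "ean": ""},
]
products, conflicts = deduplicate(rows)
print(products)
print(conflicts)
```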
Germany uses a comma as the decimal separator (1,99 €), while many international systems use a period (1.99). When a supplier’s system exports a comma-delimited CSV and also writes prices with a decimal comma, without quoting the field, the two uses collide: a price of “1,99” is read as two fields, “1” and “99”, instead of one.
This is a particularly damaging error because it often passes initial format checks but produces wrong prices in your catalog. Automated normalization detects and corrects decimal separator inconsistencies before they cause pricing errors.
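Once a price sits in its own field, normalizing the separator can be sketched as below. The heuristic is an assumption made for this example: if both separators occur, the last one is the decimal mark; a lone comma is treated as a decimal comma, which fits German data but would misread a US-style “1,299”. The `parse_price` name is hypothetical.

```python
import re
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Parse a price written with either a decimal comma or a decimal point."""
    # Drop currency symbols, spaces, and anything else that is not numeric.
    cleaned = re.sub(r"[^0-9,.\-]", "", text)
    if "," in cleaned and "." in cleaned:
        if cleaned.rfind(",") > cleaned.rfind("."):
            # German style: 1.299,50 -> 1299.50
            cleaned = cleaned.replace(".", "").replace(",", ".")
        else:
            # International style: 1,299.50 -> 1299.50
            cleaned = cleaned.replace(",", "")
    elif "," in cleaned:
        # Assumption: a lone comma is a decimal comma.
        cleaned = cleaned.replace(",", ".")
    return Decimal(cleaned)


for raw in ("1,99 €", "1.299,50", "1,299.50", "1.99"):
    print(raw, "->", parse_price(raw))
```

This does not repair a price that was already split into two fields by an unquoted comma delimiter. That case has to be caught earlier, for example by noticing that a row has more columns than the header. Input that cannot be interpreted raises `decimal.InvalidOperation` instead of silently producing a wrong price.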
Product descriptions from suppliers often contain HTML tags (<p>, <br>, <strong>), HTML entities (&amp;, &uuml;), or formatting artifacts from word processors. If you publish these directly to your shop, the raw markup appears as visible text or breaks the layout.
Automated cleaning strips HTML tags, converts HTML entities to proper characters, and normalizes whitespace before product descriptions reach your catalog.
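The three steps (strip tags, decode entities, normalize whitespace) can be sketched with the Python standard library alone. The class and function names are assumptions for this example; the parser keeps text content, drops all tags, and treats `<p>` and `<br>` as word boundaries.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content and discards all tags."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; and &uuml;.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "br"):
            self.parts.append(" ")  # keep words on either side separated

    def handle_data(self, data):
        self.parts.append(data)


def clean_description(raw: str) -> str:
    """Strip HTML tags, decode entities, and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flushes any text still buffered by the parser
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()


print(clean_description("<p>Gr&ouml;&szlig;e: <strong>M &amp; L</strong></p><br>  Neu"))
```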
Productbay’s import pipeline applies cleaning rules at every stage automatically.
All of this happens automatically, every time. The rules you configure for a supplier on the first import continue to apply on every subsequent import from that supplier — without any manual intervention.
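The idea of configuring a supplier once and reusing that configuration on every import can be illustrated with a small profile object. This is a generic sketch, not Productbay’s actual data model; `SupplierProfile`, `import_file`, and all field names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SupplierProfile:
    """Everything that is decided once, on the first import from a supplier."""
    name: str
    encoding: str = "utf-8"
    delimiter: str = ";"
    column_map: dict = field(default_factory=dict)


def import_file(raw: bytes, profile: SupplierProfile) -> list:
    """Decode, parse, and rename columns using the stored supplier profile."""
    text = raw.decode(profile.encoding)
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=profile.delimiter)
    return [
        {
            profile.column_map.get(column, column): (value or "").strip()
            for column, value in row.items()
            if column is not None  # skip surplus cells beyond the header
        }
        for row in reader
    ]


profile = SupplierProfile(
    name="supplier_a",
    encoding="cp1252",
    delimiter=";",
    column_map={"Artikelnummer": "sku", "Preis": "price"},
)
raw = "Artikelnummer;Preis\nA-100;19,99\n".encode("cp1252")
print(import_file(raw, profile))
```

Every later file from the same supplier goes through `import_file` with the same profile, so nothing has to be reconfigured between imports.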
Manual CSV cleaning for a retailer with 5 suppliers and weekly update cycles typically takes 2–4 hours per supplier per week. That is 10–20 hours per week of pure overhead — no value created, just data moving from one format to another.
With automated PIM import pipelines, those 10–20 hours become zero. The same cleaning logic that your team applies manually every week runs automatically, correctly, every time — without the risk of human error, without the time cost, and without the frustration of discovering encoding errors at midnight before a product launch.
The goal is not to clean CSV files faster. The goal is to never clean CSV files manually again. Automation converts a recurring time cost into a one-time configuration.
Productbay cleans, normalizes, and validates CSV imports from every supplier automatically. Book a free demo.