Why do CSV files from different suppliers look so different?

There is no enforced standard for CSV exports. Each supplier or ERP system generates CSV files according to its own column naming, encoding, delimiter, and data format choices. One supplier might use semicolons as separators, another commas, a third tabs. One exports UTF-8, another Latin-1. This variability is inherent to the CSV format and cannot be solved by asking suppliers to 'fix' their exports — it must be handled on the import side.

What is the difference between UTF-8 and Latin-1 encoding, and why does it matter?

UTF-8 and Latin-1 are character encoding standards. German umlauts (ä, ö, ü, ß) are represented differently in each. When a CSV encoded in Latin-1 is read as UTF-8, umlauts become garbled characters (e.g., ä becomes Ã¤). This is one of the most common CSV import errors for DACH retailers, and it must be auto-detected and corrected at import time.

How does Productbay handle duplicate SKUs in supplier CSV imports?

Productbay detects duplicate SKUs (and EANs) during import validation. When duplicates are found, you can configure the merge behavior: update existing records, skip duplicates, or flag them for manual review. The system gives you full visibility into which products are affected before any data is written to your catalog.

Can I automate CSV cleaning rules without writing code?

Yes. Productbay's import configuration is entirely no-code. You define cleaning rules — strip HTML tags, normalize decimal separators, trim whitespace, map column names — through a visual interface. Rules are saved per supplier and applied automatically to every future import. No developer required.

Automatically Clean CSV Supplier Imports

Why CSV Imports Are Harder Than They Look

CSV stands for “Comma-Separated Values”, but in practice supplier CSV files also use semicolons, tabs, and pipe characters as separators. The format is simple; the reality is not. Every supplier’s ERP system generates CSV exports according to its own logic, so a format that looks standardized on paper is inconsistent across every data source.

For a retailer receiving data from 5 or 10 suppliers, this means 5 or 10 different cleaning routines to run every time a new file arrives. Multiply that by weekly update cycles and you have a significant, recurring time sink that creates no business value and only keeps existing operations running.

The Six Most Common CSV Problems from Suppliers

1. Encoding Errors: UTF-8 vs. Latin-1

This is the most frustrating CSV problem for DACH retailers because it is invisible until it breaks things. German umlauts — ä, ö, ü, ß — are encoded differently in UTF-8 and Latin-1 (also called ISO-8859-1). When a file exported in Latin-1 is read as UTF-8 by your import system, the umlauts become garbled: “Möbel” becomes “MÃ¶bel”, “Größe” becomes “GrÃ¶ÃŸe”.

The root cause is that older ERP systems — particularly older versions of SAP, Navision, and German accounting software — default to Latin-1 encoding. Modern systems use UTF-8. When a supplier upgrades their system, the encoding may change without notice, breaking an import process that had worked fine for years.

Automated solution: Encoding detection at import time. Productbay automatically identifies whether a file is UTF-8, Latin-1, or another encoding and converts it correctly — no manual pre-processing required.
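
How Productbay detects encodings internally is not described here, so the following Python sketch only illustrates the general idea under a simplifying assumption: the file is either UTF-8 or Latin-1. The function name decode_supplier_bytes is hypothetical. It relies on the fact that Latin-1 text containing umlauts is almost never valid UTF-8, so a strict UTF-8 decode doubles as a detection step.

```python
def decode_supplier_bytes(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding_used) for the raw bytes of a supplier file.

    Assumption for this sketch: the file is either UTF-8 or Latin-1.
    """
    # A UTF-8 byte order mark is unambiguous; "utf-8-sig" also strips it.
    if raw.startswith(b"\xef\xbb\xbf"):
        return raw.decode("utf-8-sig"), "utf-8-sig"
    try:
        # Strict decoding fails on Latin-1 umlauts, because a lone byte
        # such as 0xF6 ("ö" in Latin-1) is not valid UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises.
        return raw.decode("latin-1"), "latin-1"


latin1_file = "Möbel;Größe\n".encode("latin-1")
utf8_file = "Möbel;Größe\n".encode("utf-8")

print(decode_supplier_bytes(latin1_file))  # ('Möbel;Größe\n', 'latin-1')
print(decode_supplier_bytes(utf8_file))    # ('Möbel;Größe\n', 'utf-8')

# Reading UTF-8 bytes as Windows-1252 reproduces the familiar garbling.
print(utf8_file.decode("cp1252"))          # MÃ¶bel;GrÃ¶ÃŸe
```

A real pipeline would have to consider more encodings than these two; the fallback to Latin-1 here is a deliberate simplification.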

2. Inconsistent Column Names Across Suppliers

You have an internal product attribute schema. Your suppliers do not know or care about it. Supplier A calls it “Artikelnummer”, supplier B calls it “Art_Nr”, supplier C calls it “SKU”, supplier D calls it “ProductID”. All four mean the same thing — your internal “sku” field.

In a manual workflow, you handle this translation in Excel every time. In an automated workflow, you define the mapping once per supplier: Supplier A’s “Artikelnummer” maps to “sku”. From that point on, every import from Supplier A uses this mapping automatically.
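
As a rough illustration of a saved per-supplier mapping, the Python sketch below renames supplier headers to internal field names. The supplier keys, the “Bezeichnung” and “Titel” headers, and the internal “name” field are made up for the example; only “Artikelnummer”, “Art_Nr”, and “sku” come from the text above.

```python
import csv
import io

# Hypothetical per-supplier mappings: supplier header -> internal field name.
COLUMN_MAPPINGS = {
    "supplier_a": {"Artikelnummer": "sku", "Bezeichnung": "name"},
    "supplier_b": {"Art_Nr": "sku", "Titel": "name"},
}


def map_rows(csv_text: str, supplier: str, delimiter: str = ";") -> list[dict]:
    """Rename supplier columns to internal names; unmapped columns are dropped."""
    mapping = COLUMN_MAPPINGS[supplier]
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [
        {mapping[col]: value for col, value in row.items() if col in mapping}
        for row in reader
    ]


rows = map_rows("Artikelnummer;Bezeichnung;Lager\nA-100;Stuhl;7\n", "supplier_a")
print(rows)  # [{'sku': 'A-100', 'name': 'Stuhl'}]
```

Dropping unmapped columns is one possible design choice; a stricter variant could instead report unknown headers so that a changed export is noticed.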

3. Missing Required Fields

Supplier CSV files frequently omit fields that your shop requires — EAN codes, product categories, weight, or country of origin. The supplier simply does not collect that data, or includes it in a different field, or splits it across multiple columns.

Automated handling: Configurable import rules identify missing required fields and then fill them from derived data (e.g., calculate the EAN check digit if it is missing), apply default values where appropriate, or flag the product for manual completion before it can be published.
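
The check-digit example can be made concrete with a short Python sketch. The EAN-13 check digit is computed from the first 12 digits with alternating weights 1 and 3. The REQUIRED_FIELDS tuple and the complete_product helper are hypothetical, and the sketch assumes that a 12-digit value is an EAN-13 without its check digit, which is not always true (UPC-A codes are also 12 digits long).

```python
REQUIRED_FIELDS = ("sku", "ean", "weight")  # hypothetical internal schema


def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit for the first 12 digits."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def complete_product(product: dict) -> tuple[dict, list[str]]:
    """Fill derivable values and return (product, missing_required_fields)."""
    fixed = dict(product)
    ean = fixed.get("ean", "")
    # Assumption: a 12-digit value is an EAN-13 that lost its check digit.
    if len(ean) == 12 and ean.isascii() and ean.isdigit():
        fixed["ean"] = ean + str(ean13_check_digit(ean))
    missing = [field for field in REQUIRED_FIELDS if not fixed.get(field)]
    return fixed, missing


print(complete_product({"sku": "A-100", "ean": "400638133393", "weight": ""}))
# ({'sku': 'A-100', 'ean': '4006381333931', 'weight': ''}, ['weight'])
```

Products with a non-empty list of missing fields would be the ones flagged for manual completion.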

4. Duplicate SKUs

Supplier catalogs often contain duplicate entries: the same SKU appearing multiple times with slightly different data. This happens when suppliers list several product variants under one SKU, when they include discontinued products alongside active ones, or simply due to data errors in their own system.

Without duplicate detection at import time, you end up with conflicting product records in your catalog — different prices, different descriptions, different stock levels for the same SKU. Automated deduplication identifies these conflicts and handles them according to configurable merge rules before they reach your live catalog.
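
A minimal Python sketch of duplicate detection follows. It groups rows by SKU and reports the SKUs whose rows disagree. The function names and the “last occurrence wins” rule are illustrative assumptions, not Productbay's actual merge behavior.

```python
from collections import defaultdict


def find_sku_conflicts(rows: list[dict]) -> dict[str, list[dict]]:
    """Return SKUs that occur more than once with differing data."""
    by_sku = defaultdict(list)
    for row in rows:
        by_sku[row["sku"]].append(row)
    return {
        sku: group
        for sku, group in by_sku.items()
        if len(group) > 1 and any(r != group[0] for r in group[1:])
    }


def merge_keep_last(rows: list[dict]) -> list[dict]:
    """One possible merge rule: the last occurrence of a SKU wins."""
    merged = {}
    for row in rows:
        merged[row["sku"]] = row
    return list(merged.values())


rows = [
    {"sku": "A-100", "price": "19.99"},
    {"sku": "B-200", "price": "5.00"},
    {"sku": "A-100", "price": "21.99"},
]
print(sorted(find_sku_conflicts(rows)))  # ['A-100']
print(merge_keep_last(rows))
# [{'sku': 'A-100', 'price': '21.99'}, {'sku': 'B-200', 'price': '5.00'}]
```

Running the conflict check before any merge keeps the affected products visible, which matters when the chosen rule is to flag duplicates for manual review instead of resolving them automatically.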

5. Wrong Decimal Separators

Germany uses a comma as the decimal separator (1,99 €), while many international systems use a period (1.99). When a supplier’s export uses commas both as field delimiters and as decimal separators and does not quote the price field, the data breaks: a price of “1,99” is read as two fields, “1” and “99”, instead of one.

This is a particularly damaging error because it often passes initial format checks but produces wrong prices in your catalog. Automated normalization detects and corrects decimal separator inconsistencies before they cause pricing errors.
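
The Python sketch below shows both halves of the problem: how an unquoted decimal comma splits a field, and one way to normalize prices once the field is intact. The parse_price helper is hypothetical and uses a simple heuristic, treating the rightmost separator as the decimal mark. Values with a single separator followed by three digits (such as “1.299”) remain ambiguous and would need a per-supplier rule.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# An unquoted decimal comma splits the price; a quoted one survives.
print(next(csv.reader(io.StringIO('A-100,1,99'))))    # ['A-100', '1', '99']
print(next(csv.reader(io.StringIO('A-100,"1,99"'))))  # ['A-100', '1,99']


def parse_price(raw: str) -> Decimal:
    """Parse '1,99', '1.99', '1.299,00' or '1,299.00' into a Decimal."""
    text = raw.strip().replace("€", "").replace(" ", "")
    last_comma, last_dot = text.rfind(","), text.rfind(".")
    if last_comma > last_dot:
        # Comma is the rightmost separator: treat it as the decimal mark
        # and any dots as thousands separators.
        text = text.replace(".", "").replace(",", ".")
    else:
        # Dot is the decimal mark (or there is no separator at all);
        # any commas are thousands separators.
        text = text.replace(",", "")
    try:
        return Decimal(text)
    except InvalidOperation:
        raise ValueError(f"not a price: {raw!r}") from None


print(parse_price("1,99 €"))    # 1.99
print(parse_price("1.299,00"))  # 1299.00
print(parse_price("1,299.00"))  # 1299.00
```

Decimal is used instead of float so that prices are not subject to binary rounding.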

6. Embedded HTML and Formatting in Text Fields

Product descriptions from suppliers often contain HTML tags (<p>, <br>, <strong>), special characters written as HTML entities (&nbsp;, &amp;, &uuml;), or formatting artifacts from word processors. If you publish these directly to your shop, the raw HTML appears as visible text or breaks the layout.

Automated cleaning strips HTML tags, converts HTML entities to proper characters, and normalizes whitespace before product descriptions reach your catalog.
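
The three cleaning steps can be sketched with Python's standard library. This is a minimal example, assuming reasonably well-formed markup; the clean_description name is hypothetical, and a regular expression is not a full HTML parser, so badly broken markup would need something more robust such as html.parser.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
WHITESPACE_RE = re.compile(r"\s+")


def clean_description(raw: str) -> str:
    """Strip tags, decode entities, and collapse whitespace."""
    # Replace tags with a space so "<p>A</p><p>B</p>" does not become "AB".
    without_tags = TAG_RE.sub(" ", raw)
    # Decode entities after tag removal: &uuml; -> ü, &amp; -> &,
    # &nbsp; -> a no-break space (U+00A0).
    decoded = html.unescape(without_tags)
    # \s also matches the no-break space, so it collapses like any other
    # whitespace character.
    return WHITESPACE_RE.sub(" ", decoded).strip()


raw = "<p>Stuhl&nbsp;f&uuml;r <strong>K&uuml;che &amp; Bad</strong></p><br>"
print(clean_description(raw))  # Stuhl für Küche & Bad
```

Entities are decoded only after the tags are removed, so escaped text such as “&lt;b&gt;” ends up as literal characters in the description and is not mistaken for markup.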

How Productbay Automates CSV Cleaning

Productbay’s import pipeline applies cleaning rules at every stage automatically. The workflow:

1. Encoding detection: File encoding is identified and converted to UTF-8 before any processing begins.
2. Delimiter detection: The file’s delimiter (comma, semicolon, tab) is auto-detected, with no configuration needed.
3. Column mapping: Supplier columns are mapped to your internal schema using saved per-supplier mappings.
4. Data normalization: Decimal separators, date formats, units, and other values are converted to a consistent format.
5. Text cleaning: HTML tags are stripped, entities decoded, and whitespace normalized.
6. Duplicate detection: SKUs and EANs are deduplicated with configurable merge behavior.
7. Validation: Required fields, format checks, and custom rules are applied; products that fail are queued for review.
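
The delimiter-detection step in the list above can be approximated with csv.Sniffer from Python's standard library. This is only a sketch of the technique, not Productbay's implementation; the detect_delimiter name, the candidate set, and the fallback rule are assumptions.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the field delimiter from the first lines of a file."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # The sniffer could not decide; fall back to the candidate that
        # occurs most often in the header line.
        header = sample.splitlines()[0] if sample else ""
        return max(candidates, key=header.count)


sample = "sku;name;price\nA-100;Stuhl;19,99\nB-200;Tisch;49,50\n"
print(repr(detect_delimiter(sample)))
print(repr(detect_delimiter("sku\tname\nA-100\tStuhl\n")))
```

The sniffer favors characters that occur equally often on every line, which is why the decimal commas in the first sample (absent from the header row) should not be mistaken for the delimiter.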

All of this happens automatically, every time. The rules you configure for a supplier on the first import continue to apply on every subsequent import from that supplier — without any manual intervention.

Practical Tips for Working with Supplier CSV Files

Always request UTF-8 encoding from suppliers if you have influence over their export settings — it eliminates the single most common source of import errors.
Ask for consistent column headers in every export. Even if the data format changes, consistent headers let your mapping rules apply correctly.
Never open and re-save supplier CSV files in Excel before importing. Excel often converts EAN codes (which look like large numbers) to scientific notation, and saving the file makes that corruption permanent.
Test your import pipeline with edge cases: empty rows, products with only partial data, and price fields in different formats are common failure points.

The Business Case for Automation

Manual CSV cleaning for a retailer with 5 suppliers and weekly update cycles typically takes 2–4 hours per supplier per week. That is 10–20 hours per week of pure overhead — no value created, just data moving from one format to another.

With automated PIM import pipelines, most of those 10–20 hours disappear; what remains is reviewing the products that validation flags. The same cleaning logic that your team applies manually every week runs automatically and consistently every time, without the risk of human error, without the time cost, and without the frustration of discovering encoding errors at midnight before a product launch.

The goal is not to clean CSV files faster. The goal is to never clean CSV files manually again. Automation converts a recurring time cost into a one-time configuration.
