Harvest
AI and crawler-based product enrichment that gathers product information, images, and safety data sheets from supplier sites, scores content quality, and transforms the results directly into InRiver PIM format. Built to clear an enrichment backlog of more than 55,000 items.
The Challenge
A growing backlog and a shrinking team
Product data retrieval was reactive and slow. Enriching a catalog item (pulling accurate descriptions, imagery, and safety data sheets and mapping them into the PIM) was painstaking manual work, often outsourced.
Meanwhile, the Data Quality team had shrunk and the enrichment backlog had grown to over 55,000 items. Manual curation could not keep pace, and the gap was a drag on commerce: incomplete listings convert worse and create downstream support load.
The Process
Pairing AI with a crawler
Harvest automates the enrichment pipeline end to end: gather, score, and transform. Built on Bunzl Forge, it combines an Apify crawler for acquisition with AI for understanding and mapping.
Crawl supplier sites
An Apify crawler automatically gathers product content, imagery, and safety data sheets from supplier sites, replacing manual lookups.
Score content quality
AI evaluates the gathered content for quality and completeness, so only trustworthy data flows downstream.
Transform into InRiver
Validated content is mapped directly into the InRiver PIM schema, collapsing curation time and reducing dependence on outsourced data work.
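The score and transform steps can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `CrawledProduct` fields, the weights, the threshold, and the PIM field names in `FIELD_MAP` are hypothetical and not taken from Harvest or from the InRiver schema. A simple rule-based completeness check stands in for the AI quality scoring, which is not described in detail here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CrawledProduct:
    """Hypothetical shape of one item returned by the crawl step."""
    sku: str
    description: str = ""
    image_urls: list = field(default_factory=list)
    sds_url: str = ""  # link to the safety data sheet, if one was found


# Hypothetical weights (percent) for each piece of content; they sum to 100.
WEIGHTS = {"description": 40, "image_urls": 30, "sds_url": 30}

# Hypothetical mapping from crawl fields to PIM field names.
FIELD_MAP = {
    "sku": "ItemNumber",
    "description": "ItemDescription",
    "image_urls": "ItemImages",
    "sds_url": "ItemSafetyDataSheet",
}


def completeness_score(product: CrawledProduct) -> int:
    """Return a 0-100 score: the summed weight of every non-empty field."""
    return sum(w for name, w in WEIGHTS.items() if getattr(product, name))


def to_pim_record(product: CrawledProduct, threshold: int = 70) -> Optional[dict]:
    """Map a product to a PIM-style dict, or None if it scores below threshold."""
    if completeness_score(product) < threshold:
        return None  # hold back incomplete content for review
    return {pim_name: getattr(product, src) for src, pim_name in FIELD_MAP.items()}


full = CrawledProduct(
    sku="A-1",
    description="Nitrile gloves, box of 100",
    image_urls=["https://example.com/a-1.jpg"],
    sds_url="https://example.com/a-1-sds.pdf",
)
partial = CrawledProduct(sku="B-2", description="Floor cleaner")

print(completeness_score(full), to_pim_record(full) is not None)
print(completeness_score(partial), to_pim_record(partial))
```

In this sketch an item with only a description scores below the threshold and is held back rather than mapped, which mirrors the idea that only validated content flows downstream.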
The Solution
Crawl → score → InRiver, automatically
Bunzl Harvest pairs AI with an Apify crawler to gather product information, images, and safety data sheets from supplier sites, score content quality, and transform it directly into InRiver format. Built on Bunzl Forge, it turns a reactive, outsourced process into an automated pipeline built to clear a 55,000+ item enrichment backlog.
The Results
Clearing the backlog
Harvest was built to clear a 55,000+ item enrichment backlog by automating acquisition, quality scoring, and PIM mapping, cutting curation time and reducing reliance on outsourced data work.