June 20, 2025
Extract Product Info from Webpage Screenshots using Dumpling AI and GPT-4o
Introduction
Manually collecting product details from web pages is exhausting, error-prone, and simply not scalable. This workflow turns that entire process into an automated pipeline, from scraping the web page and extracting the screenshot, to parsing the visible content with AI and saving structured product data directly into Google Sheets.
Whether you’re in eCommerce, running competitor analysis, or need fast product data for reports, this solution saves time, reduces errors, and centralizes your insights, all triggered by a simple URL entry.
Step-by-Step Breakdown
1. Trigger on New URL in Google Sheet
- Type: Google Sheets Trigger (Watch Rows)
- Purpose: Monitors a designated column in a Google Sheet. When a new URL is added to the sheet, the automation starts.
- How it works:
- Watches for new rows
- Triggers on every new product URL entered by your team
- Keeps input simple,just paste the product page URL
2. Take Full-Page Screenshot using Dumpling AI
- Type: HTTP Request
- Purpose: Captures a full-page screenshot of the product page using Dumpling AI.
- Configuration:
- API call is made to Dumpling AI with the target URL and fullPage: true
- Returns a screenshot URL that represents a visual capture of the webpage
- This step ensures we’re working with an exact replica of the content that users see
3. Extract Visible Text from Screenshot using Dumpling AI
- Type: HTTP Request
- Purpose: Uses OCR and AI analysis to extract all readable text from the screenshot.
- What it captures:
- Titles, descriptions, prices, reviews, product features
- Maintains the layout order for better context
- Ensures no important visual content is missed
4. Download the Screenshot File
- Type: HTTP Request (Binary download)
- Purpose: Downloads the screenshot locally into the workflow
- Why this is important:
- Allows for local storage, image uploads, or archival
- Useful for visual audits or attaching screenshots in reports
5. Save Screenshot to Google Drive
- Type: Google Drive Node
- Purpose: Uploads the screenshot to a specific Drive folder
- Setup:
- Path to the designated Drive folder is set in the node
- Ensures all captured screenshots are archived securely for future reference
6. Update Google Sheet with Screenshot Link
- Type: Google Sheets (Update Row)
- Purpose: Adds the screenshot link into the same row of the original product URL
- Why it matters:
- Connects raw input (URL) with the output (screenshot + extracted data)
- Keeps everything traceable in a single sheet
7. Use GPT-4o to Extract Product Info
- Type: OpenAI via Langchain
- Purpose: Sends the visible text to GPT-4o with a structured prompt to extract product data
- Prompt Example:
- “From the text, identify product names, pricing, rating counts, purchasing options, deals, and output as a JSON with an array of products.”
- What it returns:
- Structured JSON data that includes all the relevant product information extracted intelligently from the messy raw text
8. Split Products into Individual Records
- Type: Item Lists (Split Out)
- Purpose: Processes each product found in the JSON into its own record
- Why it’s critical:
- Allows multiple products on one page to be treated as unique rows in your sheet
- Perfect for bulk listings or category pages
9. Append Each Product to a Google Sheet
- Type: Google Sheets (Append Row)
- Purpose: Adds each product’s details into a final, structured output sheet
- Fields Written:
- Product Name
- Price
- Ratings
- Number of Purchases
- Special Deals
- Purchase Options (new/used/offers)
- Why this matters:
- Clean, organized data for analysis, syncing to databases, or direct use in reports
Conclusion
This workflow turns messy screenshots into structured product data with zero manual input. You just paste a URL, everything else happens automatically: visual capture, text extraction, product recognition, and recordkeeping.
It’s powerful for teams tracking competitors, managing product catalogs, or monitoring eCommerce trends. This isn’t just automation, it’s real-time insight, directly inside your spreadsheets.
You can easily extend this by syncing to Airtable, sending reports to Slack, or filtering deals by category.
Download the blueprint used in this blog post
Click here to access the blueprint.