Commit graph

17 commits

Author SHA1 Message Date
clucraft
31732b814f Fix Newegg scraper picking up bundle savings instead of product price
- Add helper to detect savings/combo/bundle containers
- Prioritize JSON-LD data as primary price source (most reliable)
- Skip price elements inside savings containers
- Add minimum price threshold to filter out discount amounts

Fixes issue where $189.99 bundle savings was extracted instead of
actual $675.59 product price for items like AMD Ryzen 9 9950X3D.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 04:25:12 -05:00
clucraft
d98138fe7c Add AI-powered price extraction fallback
- Add AI extraction service supporting Anthropic (Claude) and OpenAI
- Add AI settings UI in Settings page with provider selection
- Add database migration for AI settings columns
- Integrate AI fallback into scraper when standard methods fail
- Add API endpoints for AI settings and test extraction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 21:49:55 -05:00
clucraft
a8a2562cee Extract stock status from JSON-LD availability field
Now extracts availability/stock status from JSON-LD structured data
(e.g., "https://schema.org/OutOfStock"). This fixes stock detection
for sites like Ubiquiti Store that provide availability in JSON-LD.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 21:28:39 -05:00
clucraft
f188ad4ff1 Support priceSpecification in JSON-LD price extraction
Some sites (like Ubiquiti Store) use the priceSpecification nested
format in their JSON-LD structured data instead of direct price
property. Now checks offer.priceSpecification.price as fallback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 21:22:49 -05:00
clucraft
c23cc8353a Remove B&H Photo scraper (Cloudflare protection too strong)
B&H Photo Video uses aggressive Cloudflare protection that blocks
headless browsers even with stealth plugins. Removing the site-specific
scraper for now. The Puppeteer fallback remains in place for other
sites with less aggressive protection.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 21:17:14 -05:00
clucraft
58ad638641 Add human-like behavior to browser scraping 2026-01-21 21:13:09 -05:00
clucraft
9af18969f3 Add puppeteer-extra stealth plugin for Cloudflare bypass
The headless browser was being detected by Cloudflare and stuck on the
"Just a moment..." challenge page. Added puppeteer-extra with the stealth
plugin which patches browser fingerprinting to avoid bot detection. Also
added logic to wait for Cloudflare challenges to complete.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 21:07:51 -05:00
clucraft
be1b2d9b6c Add temporary debug logging to B&H scraper 2026-01-21 21:03:01 -05:00
clucraft
c96861fefb Add Puppeteer fallback for Cloudflare-protected sites
When HTTP requests are blocked with 403 (e.g., B&H Photo's Cloudflare
protection), the scraper now automatically retries using a headless
Chrome browser via Puppeteer. Also updated Dockerfile to include
Chromium dependencies for container deployment.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 20:55:49 -05:00
clucraft
7c8ab0721b Add B&H Photo Video scraper support
- Parse JSON-LD structured data for price, name, image, availability
- Add fallback HTML selectors using data-selenium attributes
- Detect stock status from add-to-cart and notify buttons

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 20:37:53 -05:00
clucraft
e0adae87f6 Fix Walmart scraper with improved price and stock detection
- Parse Walmart's __NEXT_DATA__ JSON for accurate product data
- Extract price, name, image, and availability from embedded JSON
- Add fallback HTML selectors if JSON parsing fails
- Make stock status detection more conservative
- Avoid false "out of stock" from unrelated page text
- Only mark out of stock when explicitly indicated

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 20:33:24 -05:00
clucraft
09b7e66758 Improve Newegg scraper with better price and stock detection
- Add multiple price selectors for robustness
- Combine dollar/cents from price-current element
- Add JSON-LD fallback for price extraction
- Add explicit stock status detection for Newegg
- Prevents false out-of-stock detection from generic detector

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 21:54:04 -05:00
clucraft
8c5d20707d Add out-of-stock detection and display
- Add stock_status column to products table (in_stock/out_of_stock/unknown)
- Detect out-of-stock status on Amazon by checking:
  - #availability text for "currently unavailable"
  - #outOfStock element presence
  - Missing "Add to Cart" button
- Add generic stock status detection for other sites
- Allow adding out-of-stock products (they just won't have a price)
- Update background scheduler to track stock status changes
- Display stock status badge in product list and detail pages
- Dim out-of-stock products in the dashboard
- Show "Currently Unavailable" badge instead of price when out of stock

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 20:54:12 -05:00
clucraft
bf111e13d8 Fix Amazon scraper picking up coupon prices instead of product price
- Add detection for coupon/savings containers and skip prices within them
- Check parent elements for coupon-related IDs, classes, and text
- Add minimum price threshold of $2 (coupons are typically $1-5)
- Add fallback to parse Amazon's whole/fraction price format directly
- Increase findMostLikelyPrice threshold from $0.99 to $5

This fixes the issue where $1 coupon savings were being scraped
instead of the actual $25.99 product price.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 20:46:17 -05:00
clucraft
93dbb5cc7c Improve price scraping with site-specific extractors
Added dedicated scrapers for major retailers:
- Amazon (all regions)
- Walmart
- Best Buy
- Target
- eBay
- Newegg
- Home Depot
- Costco
- AliExpress

Improvements:
- Site-specific selectors tried first
- Skip "original/was" prices in generic scraper
- Better browser headers to avoid blocks
- Prefer lowPrice for price ranges in JSON-LD
- Increased timeout to 20s

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 19:24:40 -05:00
clucraft
93b6338e99 Fix TypeScript errors in scraper
- Fix cheerio import to use named exports
- Add proper interfaces for JSON-LD data
- Fix type annotations for CheerioAPI

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 14:09:10 -05:00
clucraft
10660e5626 Initial commit: PriceGhost price tracking application
Full-stack application for tracking product prices:
- Backend: Node.js + Express + TypeScript
- Frontend: React + Vite + TypeScript
- Database: PostgreSQL
- Price scraping with Cheerio
- JWT authentication
- Background price checking with node-cron
- Price history charts with Recharts
- Docker support with docker-compose

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-20 13:58:13 -05:00