feat: enhance SurfSense with new skills, blog section, and improve SEO metadata

- Added multiple new skills to skills-lock.json from the repository `aaron-he-zhu/seo-geo-claude-skills`.
- Introduced `fuzzy-search` dependency in package.json for improved search functionality.
- Updated pnpm-lock.yaml to include the new `fuzzy-search` package.
- Enhanced SEO metadata across various pages, including canonical links and descriptions for better search visibility.
- Improved layout and structure of several components, including the homepage and changelog, to enhance user experience.
This commit is contained in:
DESKTOP-RTLN3BA\$punk 2026-04-11 23:38:12 -07:00
parent 61b3f0d7e3
commit 7ea840dbb2
120 changed files with 25729 additions and 352 deletions


@@ -0,0 +1,705 @@
# HTTP Status Codes for SEO
SEO-relevant HTTP status codes, their implications, and how to diagnose and fix issues.
## Status Code Categories
- **2xx**: Success - Request succeeded
- **3xx**: Redirection - Further action needed
- **4xx**: Client Error - Problem with the request
- **5xx**: Server Error - Server failed to fulfill request
---
## 2xx Success Codes
### 200 OK
**What it means**: Request succeeded, content returned normally.
**SEO impact**: Positive - page is accessible and indexable.
**When to use**: Standard response for all working pages.
**When it's a problem**: When different URLs return 200 for the same content (consolidate with a 301 redirect).
---
### 204 No Content
**What it means**: Request succeeded but no content to return.
**SEO impact**: Neutral - rarely used for pages meant to be indexed.
**Common use**: API responses, AJAX requests.
---
## 3xx Redirection Codes
### 301 Moved Permanently
**What it means**: Resource permanently moved to new URL. All link equity transfers.
**SEO impact**: Positive when used correctly - passes 90-99% of link equity.
**When to use**:
- Permanently changing URL structure
- Consolidating duplicate content
- Moving to new domain
- Changing HTTP to HTTPS
- Changing www to non-www (or vice versa)
**Example header**:
```
HTTP/1.1 301 Moved Permanently
Location: https://example.com/new-page
```
**Common mistakes**:
- Using 302 instead of 301 for permanent changes
- Creating redirect chains (A→B→C)
- Redirecting to irrelevant pages
- Not redirecting HTTP to HTTPS
**How to implement**:
- **.htaccess** (Apache): `Redirect 301 /old-page /new-page`
- **nginx**: `rewrite ^/old-page$ /new-page permanent;`
- **Server-side**: Set Location header with 301 status
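The server-side option can be sketched as a minimal WSGI app (the redirect mapping and URLs here are hypothetical examples, not part of any framework):

```python
def redirect_301(environ, start_response):
    """Minimal WSGI app: permanently redirect old paths to new URLs."""
    # Hypothetical mapping; a real app would load this from config.
    redirects = {"/old-page": "https://example.com/new-page"}
    target = redirects.get(environ.get("PATH_INFO", ""))
    if target:
        # Set the Location header with a 301 status, as described above.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]
```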
---
### 302 Found (Temporary Redirect)
**What it means**: Resource temporarily at different URL. Original URL should still be used.
**SEO impact**: Neutral to negative - does NOT pass full link equity. Search engines keep indexing original URL.
**When to use**:
- A/B testing
- Temporary promotions
- Maintenance redirects
- Geolocation redirects (sometimes)
**When NOT to use**: Permanent URL changes (use 301).
**Warning**: Google may treat long-standing 302s as 301s, but better to be explicit.
---
### 303 See Other
**What it means**: Response can be found at another URI using GET.
**SEO impact**: Minimal - rarely used for SEO purposes.
**Common use**: After form submissions, redirect to results page.
---
### 307 Temporary Redirect
**What it means**: Temporary redirect that preserves request method (POST stays POST).
**SEO impact**: Similar to 302 - temporary, doesn't pass full link equity.
**Difference from 302**: Guarantees request method won't change (more precise than 302).
**When to use**: Temporary redirects where HTTP method preservation matters.
---
### 308 Permanent Redirect
**What it means**: Permanent redirect that preserves request method.
**SEO impact**: Similar to 301 - passes link equity.
**Difference from 301**: Guarantees request method won't change (POST stays POST).
**When to use**: Permanent redirects where method preservation matters (rare for SEO).
---
### Redirect Chain Issues
**Problem**: Multiple redirects before reaching final destination.
**Example chain**:
```
http://example.com/page
→ https://example.com/page (redirect 1)
→ https://www.example.com/page (redirect 2)
→ https://www.example.com/new-page (redirect 3)
```
**SEO impact**:
- Slows page load (each redirect = new HTTP request)
- Dilutes link equity with each hop
- Wastes crawl budget
- Poor user experience
**How to fix**: Redirect directly from original URL to final destination.
**Fixed version**:
```
http://example.com/page
→ https://www.example.com/new-page (single redirect)
```
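When redirects live in a source-to-target map, collapsing chains like this can be automated; a sketch (the dict format is an assumption, not a server standard):

```python
def flatten_redirects(redirects):
    """Rewrite a {source: target} redirect map so every source points
    directly at its final destination, eliminating chains like A->B->C."""
    flat = {}
    for src in redirects:
        seen = {src}
        dest = redirects[src]
        while dest in redirects:          # follow the chain
            if dest in seen:              # guard against redirect loops
                raise ValueError(f"redirect loop involving {dest!r}")
            seen.add(dest)
            dest = redirects[dest]
        flat[src] = dest
    return flat
```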
---
### Redirect Loops
**Problem**: Redirects create infinite loop.
**Example**:
```
/page-a → /page-b
/page-b → /page-a
```
**SEO impact**: Severe - page completely inaccessible.
**Symptoms**:
- Browser shows "Too many redirects" error
- Page never loads
- Search Console shows crawl errors
**How to diagnose**:
1. Use redirect checker tool
2. Check .htaccess or nginx config for conflicting rules
3. Review server-side redirect logic
**How to fix**:
1. Identify conflicting redirect rules
2. Remove or correct the loop
3. Test thoroughly
4. Request recrawl in Search Console
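Step 1 (identifying the loop) can be automated if the redirect rules are available as a source-to-target map; a sketch:

```python
def find_redirect_loop(redirects, start):
    """Follow a {source: target} redirect map from `start`; return the
    list of URLs forming a loop, or None if the chain terminates."""
    path, url = [], start
    while url in redirects:
        if url in path:                   # revisited a URL: loop found
            return path[path.index(url):]
        path.append(url)
        url = redirects[url]
    return None
```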
---
## 4xx Client Error Codes
### 404 Not Found
**What it means**: Requested resource doesn't exist.
**SEO impact**: Neutral to negative depending on context.
**When 404s are OK**:
- Legitimately deleted pages with no equivalent
- Never-existed URLs from typos
- Temporary content that expired (old promotions)
- Intentionally removed low-quality content
**When 404s are problems**:
- Pages that should exist are returning 404
- Previously working pages now broken
- Important pages missing from navigation
- High-traffic pages deleted without redirect
**How to fix**:
1. **If content moved**: Set up 301 redirect to new location
2. **If content deleted**: Either keep 404 or redirect to relevant category
3. **If never existed**: Leave as 404
4. **If important**: Restore the page
**Monitoring 404s**:
- Check Search Console → Coverage → Not found (404)
- Review referrer data to see what's linking to 404s
- Fix high-value 404s first (most traffic/backlinks)
**Soft 404s** (BAD):
- Page returns 200 but shows "not found" message
- Search engines may keep page indexed
- Creates duplicate content issues
- Fix: Return proper 404 status code
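A rough heuristic for flagging soft 404s during a crawl (the phrase list is illustrative; production detectors typically compare pages against a known-404 template instead):

```python
def looks_like_soft_404(status_code, body_text):
    """Heuristic soft-404 check: a page that answers 200 OK but whose
    body reads like an error page."""
    # Illustrative phrases only; tune for the site being audited.
    phrases = ("not found", "page doesn't exist", "no longer available")
    lowered = body_text.lower()
    return status_code == 200 and any(p in lowered for p in phrases)
```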
---
### 410 Gone
**What it means**: Resource permanently deleted, never coming back.
**SEO impact**: Stronger signal than 404 - tells search engines the content is not coming back.
**When to use**:
- Discontinued products
- Expired promotions
- Permanently removed content
- Outdated information
**Difference from 404**:
- 404: "Not found" (might exist at another URL)
- 410: "Gone forever" (don't look for it)
**When to use 410 vs 301**:
- Use 410: No equivalent replacement exists
- Use 301: Relevant alternative exists
**How search engines respond**:
- Faster de-indexing than 404
- Stop crawling sooner
- Better for crawl budget
---
### 403 Forbidden
**What it means**: Server understood request but refuses to authorize it.
**SEO impact**: Negative - page inaccessible and won't be indexed.
**Common causes**:
- Permission restrictions
- IP blocking
- .htaccess restrictions
- File permissions (chmod)
- Authentication required
**When it's intentional**:
- Admin areas
- Member-only content
- Geographic restrictions
**When it's a problem**:
- Public pages returning 403
- Search engine bots blocked
- Accidental permission changes
**How to diagnose**:
1. Check .htaccess for IP restrictions
2. Verify file permissions (should be 644 for files, 755 for directories)
3. Check server-level access rules
4. Test with different IPs/user-agents
**How to fix**:
1. Adjust file permissions: `chmod 644 filename`
2. Remove blocking rules from .htaccess
3. Whitelist search engine bots
4. Review server firewall rules
---
### 401 Unauthorized
**What it means**: Authentication required but not provided or failed.
**SEO impact**: Negative - page won't be indexed.
**Common causes**:
- Password-protected pages
- HTTP Basic Authentication
- Expired sessions
- Missing credentials
**When it's intentional**: Member areas, staging sites, admin panels.
**How to handle for SEO**:
- Don't password-protect pages you want indexed
- Use separate staging domain with 401
- For members-only content, show teaser with meta robots noindex
---
### 429 Too Many Requests
**What it means**: User/bot sent too many requests in a given timeframe (rate limiting).
**SEO impact**: Negative if search engines can't crawl.
**Common causes**:
- Aggressive crawling
- DDoS protection triggered
- API rate limits
- Server throttling
**How to handle**:
1. Check Googlebot isn't being rate-limited (use Search Console)
2. Whitelist verified search engine bots
3. Configure rate limits appropriately
4. Monitor crawl rate in Search Console
---
## 5xx Server Error Codes
### 500 Internal Server Error
**What it means**: Generic server error, something went wrong.
**SEO impact**: Very negative if persistent - prevents indexing and ranking.
**Common causes**:
- PHP/code errors
- Database connection issues
- .htaccess syntax errors
- Resource limits exceeded
- Plugin/theme conflicts (WordPress)
**How to diagnose**:
1. Check server error logs
2. Review recent code/config changes
3. Test locally or on staging
4. Disable plugins one by one (if CMS)
5. Check .htaccess syntax
**How to fix**:
1. Review error logs for specific error
2. Roll back recent changes
3. Fix code errors
4. Increase resource limits if needed
5. Test thoroughly before re-deploying
**Monitoring**: Set up alerts for 500 errors (sudden spike = problem).
---
### 502 Bad Gateway
**What it means**: Server received invalid response from upstream server.
**SEO impact**: Negative if persistent - prevents crawling/indexing.
**Common causes**:
- Proxy/load balancer issues
- Upstream server down
- Timeout issues
- Firewall blocking
**Common scenarios**:
- CDN can't reach origin server
- Application server crashed
- Database server unresponsive
**How to fix**:
1. Check upstream server status
2. Verify firewall rules
3. Check timeout settings
4. Restart proxy/load balancer if needed
5. Review CDN configuration
---
### 503 Service Unavailable
**What it means**: Server temporarily unable to handle request.
**SEO impact**: Neutral if truly temporary with Retry-After header. Negative if prolonged.
**Common causes**:
- Maintenance mode
- Server overload
- Database down
- Resource exhaustion
**Proper use for maintenance**:
```
HTTP/1.1 503 Service Unavailable
Retry-After: 3600
```
**Best practices for maintenance**:
1. Use 503 (not 404 or 500)
2. Include Retry-After header
3. Keep maintenance brief (<24 hours)
4. Schedule during low-traffic times
5. Inform users with clear message
**How search engines handle 503**:
- Short-term (hours): Will retry, no ranking impact
- Long-term (days+): May drop rankings, de-index pages
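The maintenance response above can be sketched as a minimal WSGI app (the 3600-second retry window is an assumed value; adjust to the actual maintenance plan):

```python
def maintenance_app(environ, start_response):
    """WSGI sketch of a proper maintenance response: 503 plus a
    Retry-After header so crawlers know the outage is temporary."""
    start_response("503 Service Unavailable", [
        ("Retry-After", "3600"),          # retry in one hour (assumed)
        ("Content-Type", "text/plain"),
    ])
    return [b"Down for maintenance; back within the hour."]
```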
---
### 504 Gateway Timeout
**What it means**: Server didn't receive a timely response from upstream server.
**SEO impact**: Negative - prevents crawling.
**Common causes**:
- Slow database queries
- External API timeouts
- Insufficient server resources
- Network issues
**How to fix**:
1. Optimize slow queries
2. Increase timeout limits
3. Add caching
4. Scale server resources
5. Review external dependencies
---
## Status Code Decision Flowchart
### Content Moved Permanently?
→ YES: Use **301 redirect**
→ NO: Continue
### Content Moved Temporarily?
→ YES: Use **302 redirect**
→ NO: Continue
### Content Deleted with No Replacement?
→ YES: Use **404** (or **410** if permanently gone)
→ NO: Continue
### Content Exists at This URL?
→ YES: Use **200 OK**
→ NO: Use **404**
### Need Authentication?
→ YES: Use **401**
→ NO: Continue
### Access Forbidden?
→ YES: Use **403**
→ NO: Continue
### Server Error?
→ YES: Use **500**, **502**, **503**, or **504** depending on cause
→ NO: Use **200 OK**
---
## Diagnosing Status Code Issues
### Tools
**Browser DevTools**:
1. Open DevTools (F12)
2. Go to Network tab
3. Reload page
4. Check status code in first request
**cURL command**:
```bash
curl -I https://example.com/page
```
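When checking many URLs, the response head that `curl -I` prints can be parsed programmatically; a minimal sketch:

```python
def parse_head(raw):
    """Parse the head of an HTTP response (e.g. `curl -I` output) into
    (status_code, headers_dict) with lowercased header names."""
    lines = raw.strip().splitlines()
    status_code = int(lines[0].split()[1])   # "HTTP/1.1 301 Moved..." -> 301
    headers = {}
    for line in lines[1:]:
        if ":" in line:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
    return status_code, headers
```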
**Online checkers**:
- httpstatus.io
- redirect-checker.org
- websiteplanet.com/webtools/redirects/
**Google Search Console**:
- Coverage report → Error/Excluded sections
- URL Inspection tool → Check specific URLs
---
### Common Diagnostic Scenarios
### "Page Won't Index"
**Check**:
1. Status code (should be 200)
2. Redirects (shouldn't redirect away)
3. 4xx/5xx errors
4. robots.txt blocking
5. noindex meta tag
### "Page Disappeared from Results"
**Check**:
1. Returns 404/410/5xx
2. Redirecting elsewhere (301/302)
3. Changed to 403/401
4. Server timing out (504)
### "Traffic Dropped After Migration"
**Check**:
1. Old URLs return 404 (should be 301)
2. Redirect chains (should be direct)
3. Redirect loops
4. Wrong redirect type (302 vs 301)
5. Incorrect redirect targets
---
## Status Codes and Crawl Budget
### Impact on Crawl Budget
**Efficient (minimal impact)**:
- 200 OK
- 301 redirects (if minimal chains)
- 410 Gone (removes from crawl queue)
**Moderate impact**:
- 302 redirects (search engine may keep checking)
- 404 errors (search engines periodically recheck)
- Redirect chains (multiple requests per URL)
**High impact (wasteful)**:
- 5xx errors (search engines retry frequently)
- Redirect loops (waste crawl budget)
- Soft 404s (search engine confused, keeps crawling)
- 429 rate limiting (prevents efficient crawling)
---
## SEO Status Code Best Practices
### For Migrations
- [ ] Use 301 redirects for all permanently moved pages
- [ ] Redirect directly to final destination (no chains)
- [ ] Test all redirects before launching
- [ ] Keep redirects in place for at least 1 year
- [ ] Monitor 404 errors in Search Console post-launch
- [ ] Map 1:1 where possible (old URL → equivalent new URL)
### For Deleted Content
- [ ] Use 301 if relevant replacement exists
- [ ] Use 404 if no replacement and might return
- [ ] Use 410 if permanently gone, never returning
- [ ] Don't redirect to irrelevant pages (creates soft 404)
- [ ] Create custom 404 page with search and navigation
### For Maintenance
- [ ] Use 503 with Retry-After header
- [ ] Keep maintenance window brief (<24 hours)
- [ ] Create user-friendly maintenance page
- [ ] Inform users of expected downtime
- [ ] Monitor Search Console for crawl issues
### For Performance
- [ ] Minimize redirect chains
- [ ] Fix redirect loops immediately
- [ ] Monitor 5xx errors closely
- [ ] Set up alerts for sudden status code changes
- [ ] Optimize to reduce 504 timeouts
---
## Status Code Monitoring
### Key Metrics to Track
**In Search Console**:
- Crawl errors by type
- Server errors (5xx) trend
- Not found (404) trend
- Redirect errors
**In analytics**:
- 404 page views
- Entry pages with high exit rate (might be errors)
- Sudden traffic drops (could indicate status code issues)
**Server logs**:
- Status code distribution
- 5xx error frequency
- Unusual patterns
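The status code distribution can be tallied from access logs with a short script; this sketch assumes common/combined log format, where the status code follows the quoted request:

```python
from collections import Counter

def status_distribution(log_lines):
    """Count HTTP status codes in common/combined-format access log
    lines. Assumes the status code is the first field after the
    closing quote of the request line."""
    counts = Counter()
    for line in log_lines:
        try:
            after_request = line.split('" ', 1)[1]
            counts[int(after_request.split()[0])] += 1
        except (IndexError, ValueError):
            continue                      # skip malformed lines
    return counts
```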
### Setting Up Alerts
**Alert on**:
- Sudden increase in 5xx errors
- Increase in 404 errors
- New redirect chains
- Crawl error spikes in Search Console
**Tools**:
- Google Search Console email alerts
- Server monitoring (UptimeRobot, Pingdom)
- Log analysis tools
- Custom scripts for log monitoring
---
## Quick Reference Table
| Code | Name | SEO Impact | Use When | Passes Link Equity? |
|------|------|------------|----------|---------------------|
| 200 | OK | ✅ Positive | Page works normally | N/A (original URL) |
| 301 | Moved Permanently | ✅ Positive | Permanent URL change | ✅ Yes (~90-99%) |
| 302 | Found | ⚠️ Neutral | Temporary redirect | ❌ No |
| 307 | Temporary Redirect | ⚠️ Neutral | Temporary (method preserved) | ❌ No |
| 308 | Permanent Redirect | ✅ Positive | Permanent (method preserved) | ✅ Yes |
| 404 | Not Found | ⚠️ Neutral | Content doesn't exist | N/A |
| 410 | Gone | ⚠️ Neutral | Permanent deletion | N/A |
| 403 | Forbidden | ❌ Negative | Access denied | N/A |
| 401 | Unauthorized | ❌ Negative | Auth required | N/A |
| 500 | Internal Server Error | ❌ Negative | Server error | N/A |
| 502 | Bad Gateway | ❌ Negative | Upstream error | N/A |
| 503 | Service Unavailable | ⚠️ Neutral | Temporary downtime | N/A |
| 504 | Gateway Timeout | ❌ Negative | Timeout error | N/A |
---
## Status Code Testing Checklist
Before launching site changes:
- [ ] Test all redirects return correct status codes
- [ ] Verify no redirect chains exist
- [ ] Check no redirect loops present
- [ ] Confirm important pages return 200
- [ ] Ensure deleted pages return 404/410 (not 200)
- [ ] Verify 301s point to correct destinations
- [ ] Test with multiple user-agents
- [ ] Check status codes in Search Console
- [ ] Monitor server logs for unusual patterns
- [ ] Set up alerts for error spikes
---
## Technical SEO Severity Framework
### Issue Classification
| Severity | Impact Description | Examples | Response Time |
|----------|-------------------|---------|---------------|
| **Critical** | Prevents indexation or causes site-wide issues | Robots.txt blocking site, noindex on key pages, site-wide 500 errors | Same day |
| **High** | Significantly impacts rankings or user experience | Slow page speed, missing hreflang, duplicate content, redirect chains | Within 1 week |
| **Medium** | Affects specific pages or has moderate impact | Missing schema, suboptimal canonicals, thin content pages | Within 1 month |
| **Low** | Minor optimization opportunities | Image compression, minor CLS issues, non-essential schema missing | Next quarter |
### Technical Debt Prioritization Matrix
| Factor | Weight | Assessment |
|--------|--------|-----------|
| Pages affected | 30% | Site-wide > Section > Single page |
| Revenue impact | 25% | Revenue pages > Blog > Utility pages |
| Fix difficulty | 20% | Config change < Template change < Code rewrite |
| Competitive impact | 15% | Competitors passing you > parity > you ahead |
| Crawl budget waste | 10% | High waste > Moderate > Minimal |
## Core Web Vitals Optimization Quick Reference
### LCP (Largest Contentful Paint) Optimization
| Root Cause | Detection | Fix |
|-----------|-----------|-----|
| Large hero image | PageSpeed Insights | Serve WebP, resize to container, add loading="lazy" |
| Render-blocking CSS/JS | DevTools Coverage | Defer non-critical, inline critical CSS |
| Slow server response | TTFB >800ms | CDN, server-side caching, upgrade hosting |
| Third-party scripts | DevTools Network | Defer/async, use facade pattern |
### CLS (Cumulative Layout Shift) Optimization
| Root Cause | Detection | Fix |
|-----------|-----------|-----|
| Images without dimensions | DevTools | Add explicit width/height attributes |
| Ads/embeds without reserved space | Visual inspection | Set min-height on containers |
| Web fonts causing FOUT | DevTools | font-display: swap + preload fonts |
| Dynamic content injection | Visual inspection | Reserve space with CSS |
### INP (Interaction to Next Paint) Optimization
| Root Cause | Detection | Fix |
|-----------|-----------|-----|
| Long JavaScript tasks | DevTools Performance | Break into smaller tasks, use requestIdleCallback |
| Heavy event handlers | DevTools | Debounce/throttle, use passive listeners |
| Main thread blocking | DevTools | Web workers for heavy computation |


@@ -0,0 +1,717 @@
# Robots.txt Reference Guide
Complete reference for creating, testing, and troubleshooting robots.txt files.
## Syntax Guide
### Basic Structure
```
User-agent: [bot name]
Disallow: [path to block]
Allow: [path to allow]
Sitemap: [sitemap URL]
Crawl-delay: [seconds]
```
---
## Core Directives
### User-agent
Specifies which bot the rules apply to.
**Syntax**: `User-agent: [bot-name]`
**Common user-agents**:
```
User-agent: * # All bots
User-agent: Googlebot # Google's crawler
User-agent: Bingbot # Bing's crawler
User-agent: GPTBot # OpenAI's crawler
User-agent: CCBot # Common Crawl bot
User-agent: anthropic-ai # Anthropic's crawler
User-agent: PerplexityBot # Perplexity AI crawler
User-agent: ClaudeBot # Claude's web crawler
```
**Multiple user-agents**: Group rules by leaving no blank lines between user-agent declarations.
```
User-agent: Googlebot
User-agent: Bingbot
Disallow: /admin/
```
---
### Disallow
Blocks bots from crawling specified paths.
**Syntax**: `Disallow: [path]`
**Examples**:
```
Disallow: / # Block entire site
Disallow: /admin/ # Block admin directory
Disallow: /private # Block private directory (and subdirectories)
Disallow: /*.pdf$ # Block all PDF files
Disallow: /*? # Block all URLs with parameters
Disallow: # Allow everything (empty disallow)
```
**Path matching**:
- `/` at end = block directory and all subdirectories
- Without `/` at end = block all paths starting with string
- `*` = wildcard, matches any sequence
- `$` = end of URL
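These matching rules can be expressed as a small matcher. This is a sketch of the Google-documented semantics (prefix match, `*` wildcard, `$` anchor), not a full robots.txt parser:

```python
import re

def robots_match(pattern, path):
    """Does a robots.txt rule pattern match a URL path?
    Prefix match by default; `*` matches any sequence; `$` anchors
    the end of the URL. Paths are matched case-sensitively."""
    if pattern.endswith("$"):
        regex = ".*".join(map(re.escape, pattern[:-1].split("*"))) + "$"
    else:
        regex = ".*".join(map(re.escape, pattern.split("*")))
    return re.match(regex, path) is not None
```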
---
### Allow
Explicitly allows crawling (overrides Disallow).
**Syntax**: `Allow: [path]`
**Common use**: Allow specific subdirectories within blocked parent.
```
User-agent: *
Disallow: /admin/
Allow: /admin/public/
```
**Note**: Allow is not standard but supported by Google, Bing, and most major crawlers.
---
### Sitemap
Specifies location of XML sitemap.
**Syntax**: `Sitemap: [absolute URL]`
**Examples**:
```
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap_index.xml
Sitemap: https://example.com/blog/sitemap.xml
```
**Best practices**:
- Use absolute URLs (not relative)
- Can include multiple Sitemap directives
- Place at end of file
- Submit same sitemap(s) to Google Search Console
---
### Crawl-delay
Adds delay between requests (seconds).
**Syntax**: `Crawl-delay: [seconds]`
**Example**:
```
User-agent: *
Crawl-delay: 10
```
**Warning**: Not supported by Googlebot (use Search Console rate limiting instead). Supported by Bing, Yandex, and others.
---
## Common Configurations
### 1. Allow All Bots (Default)
```
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
```
Use when you want all bots to crawl entire site.
---
### 2. Block All Bots
```
User-agent: *
Disallow: /
```
Use for development/staging sites or private content.
---
### 3. Block Specific Directories
```
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /temp/
Disallow: /cgi-bin/
Sitemap: https://example.com/sitemap.xml
```
Standard configuration blocking admin and utility directories.
---
### 4. Block All AI Crawlers
```
# Block OpenAI
User-agent: GPTBot
Disallow: /
# Block Anthropic
User-agent: anthropic-ai
User-agent: ClaudeBot
Disallow: /
# Block Common Crawl
User-agent: CCBot
Disallow: /
# Block Perplexity
User-agent: PerplexityBot
Disallow: /
# Block Google-Extended (Bard training)
User-agent: Google-Extended
Disallow: /
# Allow search engines
User-agent: Googlebot
Disallow:
User-agent: Bingbot
Disallow:
Sitemap: https://example.com/sitemap.xml
```
Use when you want search indexing but not AI training.
---
### 5. Allow Search Engines, Block Everything Else
```
# Block all by default
User-agent: *
Disallow: /
# Allow Google
User-agent: Googlebot
Disallow:
# Allow Bing
User-agent: Bingbot
Disallow:
# Allow DuckDuckGo
User-agent: DuckDuckBot
Disallow:
Sitemap: https://example.com/sitemap.xml
```
---
### 6. Block URL Parameters
```
User-agent: *
Disallow: /*? # Block all URLs with parameters
Allow: /? # Allow homepage with parameters
Sitemap: https://example.com/sitemap.xml
```
Prevents duplicate content from parameter variations.
---
### 7. Block File Types
```
User-agent: *
Disallow: /*.pdf$
Disallow: /*.doc$
Disallow: /*.xls$
Disallow: /*.zip$
Sitemap: https://example.com/sitemap.xml
```
---
### 8. E-commerce Configuration
```
User-agent: *
# Block search/filter pages
Disallow: /*?q=
Disallow: /*?sort=
Disallow: /*?filter=
# Block account pages
Disallow: /account/
Disallow: /cart/
Disallow: /checkout/
# Block admin
Disallow: /admin/
# Allow product pages
Allow: /products/
Sitemap: https://example.com/sitemap.xml
```
---
### 9. WordPress Configuration
```
User-agent: *
# WordPress core
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
# WordPress directories
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
# Allow uploads
Allow: /wp-content/uploads/
# Block parameter pages
Disallow: /?s=
Disallow: /feed/
Disallow: /trackback/
Sitemap: https://example.com/sitemap_index.xml
```
---
### 10. Shopify Configuration
```
User-agent: *
# Block admin and account
Disallow: /admin
Disallow: /account
Disallow: /cart
Disallow: /checkout
# Block search
Disallow: /search
# Block collections with filters
Disallow: /collections/*+*
Disallow: /collections/*?*
Sitemap: https://example.com/sitemap.xml
```
---
## Platform-Specific Templates
### Wix
```
User-agent: *
Disallow: /_api/
Disallow: /_partials/
Sitemap: https://example.com/sitemap.xml
```
### Squarespace
```
User-agent: *
Disallow: /config/
Disallow: /search
Sitemap: https://example.com/sitemap.xml
```
### Webflow
```
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
```
### Drupal
```
User-agent: *
Disallow: /admin/
Disallow: /user/
Disallow: /node/add/
Disallow: /?q=
Sitemap: https://example.com/sitemap.xml
```
---
## Testing and Validation
### Google Search Console Robots.txt Tester
1. Go to: Search Console → Settings → robots.txt
2. View current robots.txt
3. Test specific URLs
4. See which user-agents are affected
### Manual Testing
Test URL pattern: `https://example.com/robots.txt`
Check file is:
- Accessible (returns 200 status)
- Plain text format
- UTF-8 encoded
- Located at root domain
- No more than 500KB (Google limit)
### Common Testing Scenarios
Test these URLs in tester:
- Homepage: `/`
- Product page: `/products/example`
- Admin page: `/admin/`
- Parameter page: `/search?q=test`
- File: `/document.pdf`
---
## Common Mistakes and Fixes
### Mistake 1: Blocking CSS/JS Files
**Wrong**:
```
User-agent: *
Disallow: /css/
Disallow: /js/
```
**Why it's wrong**: Google needs CSS/JS to render pages properly.
**Fix**:
```
User-agent: *
Allow: /css/
Allow: /js/
```
---
### Mistake 2: Using Relative URLs for Sitemap
**Wrong**:
```
Sitemap: /sitemap.xml
```
**Fix**:
```
Sitemap: https://example.com/sitemap.xml
```
---
### Mistake 3: Spaces in Directives
**Wrong**:
```
User-agent : Googlebot
Disallow : /admin/
```
**Fix** (no spaces before colons):
```
User-agent: Googlebot
Disallow: /admin/
```
---
### Mistake 4: Forgetting Trailing Slash
**Intention**: Block /admin directory
**Wrong**:
```
Disallow: /admin
```
**Result**: Also blocks /admin-panel, /administrator, etc.
**Fix**:
```
Disallow: /admin/
```
---
### Mistake 5: Blocking Entire Site Accidentally
**Wrong**:
```
User-agent: *
Disallow: /
Allow: /blog/
```
**Why it's wrong**: Many bots don't support the Allow directive.
**Fix**: Use noindex meta tags for pages you don't want indexed, not robots.txt.
---
### Mistake 6: Not Blocking Development Environments
**Wrong**: No robots.txt on staging.example.com
**Result**: Staging site gets indexed.
**Fix**:
```
User-agent: *
Disallow: /
```
On all non-production environments.
---
### Mistake 7: Case Sensitivity Errors
**Note**: Directives are case-insensitive, but paths are case-sensitive.
**Example**:
```
Disallow: /Admin/ # Blocks /Admin/ but not /admin/
```
**Fix**: Block both if needed:
```
Disallow: /admin/
Disallow: /Admin/
```
---
## Advanced Patterns
### Wildcard Examples
```
# Block all PDFs
Disallow: /*.pdf$
# Block all URLs with parameters
Disallow: /*?
# Block all URLs ending in .php
Disallow: /*.php$
# Block all admin paths regardless of location
Disallow: /*/admin/
```
### Multiple Sitemaps
```
Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-posts.xml
Sitemap: https://example.com/sitemap-products.xml
```
### Bot-Specific Rules
```
# Aggressive bot - slow it down
User-agent: BadBot
Crawl-delay: 60
Disallow: /
# Good bots - full access
User-agent: Googlebot
User-agent: Bingbot
Disallow:
# Default for others
User-agent: *
Crawl-delay: 10
Disallow: /admin/
```
---
## Robots.txt vs Meta Robots vs X-Robots-Tag
### When to use each:
**Robots.txt**:
- Block crawling of entire directories
- Reduce crawl budget waste
- Block parameter variations
- Does NOT prevent indexing if page is linked from elsewhere
**Meta robots tag**:
- Prevent specific pages from being indexed
- Control snippet display
- Control following links
- Example: `<meta name="robots" content="noindex,follow">`
**X-Robots-Tag HTTP header**:
- Control non-HTML files (PDFs, images)
- Server-level control
- Example: `X-Robots-Tag: noindex`
**Important**: If you don't want a page indexed, use noindex (meta tag or header), NOT robots.txt.
---
## Monitoring and Maintenance
### Regular Checks
**Monthly**:
- [ ] Verify robots.txt is accessible
- [ ] Check Search Console for blocked URLs
- [ ] Review crawl stats for blocked resources
**Quarterly**:
- [ ] Audit blocked paths - still relevant?
- [ ] Check for new admin/private sections to block
- [ ] Review AI crawler landscape (new bots?)
**After site changes**:
- [ ] Update robots.txt if URL structure changed
- [ ] Test new sections (should they be blocked?)
- [ ] Verify sitemaps still referenced
### Search Console Monitoring
Check these reports:
- **Coverage** → Excluded by robots.txt
- **Settings** → Crawl stats
- **URL Inspection** → Test specific URLs
---
## Robots.txt Checklist
Before deploying:
- [ ] File is named exactly `robots.txt` (lowercase)
- [ ] Located at root domain (`example.com/robots.txt`)
- [ ] Plain text format (not HTML or PDF)
- [ ] UTF-8 encoding
- [ ] No HTML tags in file
- [ ] All paths start with `/`
- [ ] Sitemap URLs are absolute
- [ ] No spaces before colons
- [ ] Tested in Search Console robots.txt tester
- [ ] Not blocking important CSS/JS/images
- [ ] Not blocking content you want indexed
- [ ] Trailing slashes used correctly for directories
- [ ] Wildcard patterns tested
- [ ] File size under 500KB
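Several of these checks can be automated. A sketch covering a few of them (spaces before colons, relative sitemap URLs, paths not starting with `/`, file size), not a complete validator:

```python
def lint_robots_txt(text):
    """Flag common robots.txt mistakes from the checklist above."""
    problems = []
    if len(text.encode("utf-8")) > 500 * 1024:
        problems.append("file larger than 500KB")
    for n, line in enumerate(text.splitlines(), 1):
        line = line.split("#", 1)[0].rstrip()     # drop comments
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {n}: missing colon")
            continue
        name, _, value = line.partition(":")
        if name != name.rstrip():
            problems.append(f"line {n}: space before colon")
        name, value = name.strip().lower(), value.strip()
        if name == "sitemap" and not value.startswith(("http://", "https://")):
            problems.append(f"line {n}: sitemap URL should be absolute")
        if name in ("allow", "disallow") and value and not value.startswith("/"):
            problems.append(f"line {n}: path should start with '/'")
    return problems
```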
---
## Emergency Fixes
### Accidentally Blocked Entire Site
**Symptom**: All pages blocked in Search Console
**Fix**:
1. Edit robots.txt to:
```
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
```
2. Test in Search Console
3. Request urgent recrawl for key pages
4. Monitor Coverage report for recovery
**Recovery time**: 1-7 days
---
### Blocked CSS/JS Files
**Symptom**: "Blocked by robots.txt" in Mobile-Friendly Test
**Fix**:
1. Add Allow directives:
```
User-agent: *
Allow: /css/
Allow: /js/
Allow: /wp-content/uploads/
```
2. Test in robots.txt tester
3. Request re-render in URL Inspection tool
---
### Staging Site Indexed
**Symptom**: staging.example.com appears in search results
**Fix**:
1. Add to staging robots.txt:
```
User-agent: *
Disallow: /
```
2. Add noindex meta tag to all staging pages
3. Remove staging URLs in Search Console (Removals tool)
---
## Resources and Tools
**Testing**:
- Google Search Console robots.txt tester
- Bing Webmaster Tools robots.txt analyzer
- Technical SEO browser extensions
**Validation**:
- https://www.google.com/webmasters/tools/robots-testing-tool
- https://en.ryte.com/free-tools/robots-txt/
- https://technicalseo.com/tools/robots-txt/
**Documentation**:
- Google: https://developers.google.com/search/docs/crawling-indexing/robots/intro
- Bing: https://www.bing.com/webmasters/help/robots-txt-validation
- Robots.txt spec: https://www.robotstxt.org/

# Technical SEO Checker — Worked Example & Checklist
Referenced from [SKILL.md](https://github.com/aaron-he-zhu/seo-geo-claude-skills/blob/main/optimize/technical-seo-checker/SKILL.md).
---
## Worked Example
**User**: "Check the technical SEO of cloudhosting.com"
**Output**:
```markdown
# Technical SEO Audit Report
**Domain**: cloudhosting.com
**Audit Date**: 2024-09-15
**Pages Analyzed**: 312
## Crawlability Analysis
### Robots.txt Review
**URL**: cloudhosting.com/robots.txt
**Status**: Found
| Check | Status | Notes |
|-------|--------|-------|
| File exists | ✅ | 200 response |
| Valid syntax | ⚠️ | Wildcard pattern `Disallow: /*?` too aggressive — blocks faceted pages |
| Sitemap declared | ❌ | No Sitemap directive in robots.txt |
| Important pages blocked | ⚠️ | /pricing/ blocked by `Disallow: /pricing` rule |
| Assets blocked | ✅ | CSS/JS accessible |
**Issues Found**:
- Sitemap URL not declared in robots.txt
- `/pricing/` inadvertently blocked — high-value commercial page
### XML Sitemap Review
**Sitemap URL**: cloudhosting.com/sitemap.xml
**Status**: Found (not referenced in robots.txt)
| Check | Status | Notes |
|-------|--------|-------|
| Sitemap exists | ✅ | Valid XML, 287 URLs |
| Only indexable URLs | ❌ | 23 noindex URLs included |
| Includes lastmod | ⚠️ | All dates set to 2023-01-01 — not accurate |
**Crawlability Score**: 5/10
## Performance Analysis
### Core Web Vitals
| Metric | Mobile | Desktop | Target | Status |
|--------|--------|---------|--------|--------|
| LCP (Largest Contentful Paint) | 4.8s | 2.1s | <2.5s | ❌ Mobile / ✅ Desktop |
| FID (First Input Delay) | 45ms | 12ms | <100ms | ✅ / ✅ |
| CLS (Cumulative Layout Shift) | 0.24 | 0.08 | <0.1 | ❌ Mobile / ✅ Desktop |
| INP (Interaction to Next Paint) | 380ms | 140ms | <200ms | ❌ Mobile / ✅ Desktop |
### Additional Performance Metrics
| Metric | Value | Status |
|--------|-------|--------|
| Time to First Byte (TTFB) | 1,240ms | ❌ |
| Page Size | 3.8MB | ❌ |
| Requests | 94 | ⚠️ |
**LCP Issues**:
- Uncompressed hero image (2.4MB PNG): Convert to WebP, est. save 1.9MB
- No CDN detected: TTFB 1,240ms from origin server
**CLS Issues**:
- Ad banner at top of page injects without reserved height (0.18 shift contribution)
**Performance Score**: 3/10
## Security Analysis
### HTTPS Status
| Check | Status | Notes |
|-------|--------|-------|
| SSL certificate valid | ✅ | Expires: 2025-03-22 |
| HTTPS enforced | ⚠️ | http://cloudhosting.com returns 200 instead of 301 redirect |
| Mixed content | ❌ | 7 images loaded over HTTP on /features/ page |
| HSTS enabled | ❌ | Header not present |
**Security Score**: 5/10
## Structured Data Analysis
### Schema Markup Found
| Schema Type | Pages | Valid | Errors |
|-------------|-------|-------|--------|
| Organization | 1 (homepage) | ✅ | None |
| Article | 0 | — | Missing on 48 blog posts |
| Product | 0 | — | Missing on 5 plan pages |
| FAQ | 0 | — | Missing on 12 pages with FAQ content |
**Structured Data Score**: 3/10
## Overall Technical Health: 42/100
```
Score Breakdown:
█████░░░░░ Crawlability: 5/10
██████░░░░ Indexability: 6/10
███░░░░░░░ Performance: 3/10
██████░░░░ Mobile: 6/10
█████░░░░░ Security: 5/10
██████░░░░ URL Structure: 6/10
███░░░░░░░ Structured Data: 3/10
```
## Priority Issues
### 🔴 Critical (Fix Immediately)
1. **Mobile LCP 4.8s (target <2.5s)** — Compress hero image to WebP (est. save 1.9MB) and implement a CDN to reduce TTFB from 1,240ms to <400ms.
### 🟡 Important (Fix Soon)
2. **HTTP not redirecting to HTTPS** — Add 301 redirect from http:// to https:// and enable HSTS header. 7 mixed-content images on /features/ need URL updates.
### 🟢 Minor (Optimize)
3. **No Article/FAQ schema on blog posts** — Add Article schema to 48 blog posts and FAQ schema to 12 FAQ pages for rich result eligibility.
```
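The minor fix above calls for Article and FAQ schema. A minimal sketch of an FAQPage JSON-LD generator (property names follow schema.org; the question/answer pair is a placeholder, not real site content):

```python
# Build a schema.org FAQPage JSON-LD script tag from Q/A pairs.
import json

def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in pairs
        ],
    }
    return '<script type="application/ld+json">%s</script>' % json.dumps(data)

snippet = faq_jsonld([("What is shared hosting?", "Multiple sites share one server.")])
```

Validate the output in Google's Rich Results Test before rollout.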
---
## Technical SEO Checklist
```markdown
### Crawlability
- [ ] robots.txt is valid and not blocking important content
- [ ] XML sitemap exists and is submitted to Search Console
- [ ] No crawl errors in Search Console
- [ ] No redirect chains or loops
### Indexability
- [ ] Important pages are indexable
- [ ] Canonical tags are correct
- [ ] No duplicate content issues
- [ ] Pagination is handled correctly
### Performance
- [ ] Core Web Vitals pass
- [ ] Page speed under 3 seconds
- [ ] Images are optimized
- [ ] JS/CSS are minified
### Mobile
- [ ] Mobile-friendly test passes
- [ ] Viewport is configured
- [ ] Touch elements are properly sized
### Security
- [ ] HTTPS is enforced
- [ ] SSL certificate is valid
- [ ] No mixed content
- [ ] Security headers present
### Structure
- [ ] URLs are clean and descriptive
- [ ] Site architecture is logical
- [ ] Internal linking is strong
```

# Technical SEO Checker — Output Templates
Detailed output templates for technical-seo-checker steps 3-9. Referenced from [SKILL.md](https://github.com/aaron-he-zhu/seo-geo-claude-skills/blob/main/optimize/technical-seo-checker/SKILL.md).
---
## Step 3: Audit Site Speed & Core Web Vitals
```markdown
## Performance Analysis
### Core Web Vitals
| Metric | Mobile | Desktop | Target | Status |
|--------|--------|---------|--------|--------|
| LCP (Largest Contentful Paint) | [X]s | [X]s | <2.5s | ✅/⚠️/❌ |
| FID (First Input Delay) | [X]ms | [X]ms | <100ms | ✅/⚠️/❌ |
| CLS (Cumulative Layout Shift) | [X] | [X] | <0.1 | ✅/⚠️/❌ |
| INP (Interaction to Next Paint) | [X]ms | [X]ms | <200ms | ✅/⚠️/❌ |
### Additional Performance Metrics
| Metric | Value | Status |
|--------|-------|--------|
| Time to First Byte (TTFB) | [X]ms | ✅/⚠️/❌ |
| First Contentful Paint (FCP) | [X]s | ✅/⚠️/❌ |
| Speed Index | [X] | ✅/⚠️/❌ |
| Total Blocking Time | [X]ms | ✅/⚠️/❌ |
| Page Size | [X]MB | ✅/⚠️/❌ |
| Requests | [X] | ✅/⚠️/❌ |
### Performance Issues
**LCP Issues**:
- [Issue]: [Impact] - [Solution]
- [Issue]: [Impact] - [Solution]
**CLS Issues**:
- [Issue]: [Impact] - [Solution]
**Resource Loading**:
| Resource Type | Count | Size | Issues |
|---------------|-------|------|--------|
| Images | [X] | [X]MB | [notes] |
| JavaScript | [X] | [X]MB | [notes] |
| CSS | [X] | [X]KB | [notes] |
| Fonts | [X] | [X]KB | [notes] |
### Optimization Recommendations
**High Impact**:
1. [Recommendation] - Est. improvement: [X]s
2. [Recommendation] - Est. improvement: [X]s
**Medium Impact**:
1. [Recommendation]
2. [Recommendation]
**Performance Score**: [X]/10
```
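The pass targets in the template can be encoded as a small classifier. The "good" bounds below are the targets quoted above; the needs-improvement upper bounds (4.0s / 300ms / 0.25 / 500ms) follow Google's commonly published bands and should be treated as this sketch's assumption:

```python
# Classify Core Web Vitals field values into good / needs improvement / poor.
THRESHOLDS = {
    # metric: (good_max, needs_improvement_max)
    "LCP": (2.5, 4.0),   # seconds
    "FID": (100, 300),   # milliseconds
    "CLS": (0.1, 0.25),  # unitless layout-shift score
    "INP": (200, 500),   # milliseconds
}

def rate(metric: str, value: float) -> str:
    good, ni = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= ni:
        return "needs improvement"
    return "poor"
```

For example, the worked example's mobile LCP of 4.8s rates "poor", while its mobile CLS of 0.24 rates "needs improvement".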
---
## Step 4: Audit Mobile-Friendliness
```markdown
## Mobile Optimization Analysis
### Mobile-Friendly Test
| Check | Status | Notes |
|-------|--------|-------|
| Mobile-friendly overall | ✅/❌ | [notes] |
| Viewport configured | ✅/❌ | [viewport tag] |
| Text readable | ✅/⚠️/❌ | Font size: [X]px |
| Tap targets sized | ✅/⚠️/❌ | [notes] |
| Content fits viewport | ✅/❌ | [notes] |
| No horizontal scroll | ✅/❌ | [notes] |
### Responsive Design Check
| Element | Desktop | Mobile | Issues |
|---------|---------|--------|--------|
| Navigation | [status] | [status] | [notes] |
| Images | [status] | [status] | [notes] |
| Forms | [status] | [status] | [notes] |
| Tables | [status] | [status] | [notes] |
| Videos | [status] | [status] | [notes] |
### Mobile-First Indexing
| Check | Status | Notes |
|-------|--------|-------|
| Mobile version has all content | ✅/⚠️/❌ | [notes] |
| Mobile has same structured data | ✅/⚠️/❌ | [notes] |
| Mobile has same meta tags | ✅/⚠️/❌ | [notes] |
| Mobile images have alt text | ✅/⚠️/❌ | [notes] |
**Mobile Score**: [X]/10
```
---
## Step 5: Audit Security & HTTPS
```markdown
## Security Analysis
### HTTPS Status
| Check | Status | Notes |
|-------|--------|-------|
| SSL certificate valid | ✅/❌ | Expires: [date] |
| HTTPS enforced | ✅/❌ | [redirects properly?] |
| Mixed content | ✅/⚠️/❌ | [X] issues |
| HSTS enabled | ✅/⚠️ | [notes] |
| Certificate chain | ✅/⚠️/❌ | [notes] |
### Security Headers
| Header | Present | Value | Recommended |
|--------|---------|-------|-------------|
| Content-Security-Policy | ✅/❌ | [value] | [recommendation] |
| X-Frame-Options | ✅/❌ | [value] | DENY or SAMEORIGIN |
| X-Content-Type-Options | ✅/❌ | [value] | nosniff |
| X-XSS-Protection | ✅/❌ | [value] | 1; mode=block |
| Referrer-Policy | ✅/❌ | [value] | [recommendation] |
**Security Score**: [X]/10
```
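The security-header table lends itself to automation. A sketch that flags missing or weak headers given a response-headers dict from any HTTP client (the recommended values mirror the table; header names compare case-insensitively):

```python
# Flag missing or weak security headers against recommended values.
RECOMMENDED = {
    "strict-transport-security": None,  # None: any value counts as present
    "content-security-policy": None,
    "x-frame-options": {"DENY", "SAMEORIGIN"},
    "x-content-type-options": {"nosniff"},
}

def audit_headers(headers: dict[str, str]) -> list[str]:
    lower = {k.lower(): v.strip() for k, v in headers.items()}
    findings = []
    for name, allowed in RECOMMENDED.items():
        if name not in lower:
            findings.append(f"missing: {name}")
        elif allowed is not None and lower[name] not in allowed:
            findings.append(f"weak value: {name}={lower[name]}")
    return findings
```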
---
## Step 6: Audit URL Structure
```markdown
## URL Structure Analysis
### URL Pattern Review
| Check | Status | Notes |
|-------|--------|-------|
| HTTPS URLs | ✅/⚠️/❌ | [X]% HTTPS |
| Lowercase URLs | ✅/⚠️/❌ | [notes] |
| No special characters | ✅/⚠️/❌ | [notes] |
| Readable/descriptive | ✅/⚠️/❌ | [notes] |
| Appropriate length | ✅/⚠️/❌ | Avg: [X] chars |
| Keywords in URLs | ✅/⚠️/❌ | [notes] |
| Consistent structure | ✅/⚠️/❌ | [notes] |
### URL Issues Found
| Issue Type | Count | Examples |
|------------|-------|----------|
| Dynamic parameters | [X] | [URLs] |
| Session IDs in URLs | [X] | [URLs] |
| Uppercase characters | [X] | [URLs] |
| Special characters | [X] | [URLs] |
| Very long URLs (>100) | [X] | [URLs] |
### Redirect Analysis
| Check | Status | Notes |
|-------|--------|-------|
| Redirect chains | [X] found | [max chain length] |
| Redirect loops | [X] found | [URLs] |
| 302 → 301 needed | [X] found | [URLs] |
| Broken redirects | [X] found | [URLs] |
**URL Score**: [X]/10
```
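Redirect chains and loops can be detected from a crawl export. A sketch that walks a source-to-target redirect map (URLs are illustrative):

```python
# Trace a redirect chain; status is "ok", "loop", or "too_long".
def trace(redirects: dict[str, str], start: str, limit: int = 10):
    path, url = [start], start
    while url in redirects:
        url = redirects[url]
        if url in path:
            return path + [url], "loop"
        path.append(url)
        if len(path) > limit:
            return path, "too_long"
    return path, "ok"

chains = {"/old": "/older", "/older": "/oldest", "/oldest": "/final"}
path, status = trace(chains, "/old")  # a 3-hop chain worth collapsing to one 301
```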
---
## Step 7: Audit Structured Data
> **CORE-EEAT alignment**: Schema markup quality maps to O05 (Schema Markup) in the CORE-EEAT benchmark. See [content-quality-auditor](https://github.com/aaron-he-zhu/seo-geo-claude-skills/blob/main/cross-cutting/content-quality-auditor/SKILL.md) for full content quality audit.
```markdown
## Structured Data Analysis
### Schema Markup Found
| Schema Type | Pages | Valid | Errors |
|-------------|-------|-------|--------|
| [Type 1] | [X] | ✅/❌ | [errors] |
| [Type 2] | [X] | ✅/❌ | [errors] |
### Validation Results
**Errors**:
- [Error 1]: [affected pages] - [solution]
- [Error 2]: [affected pages] - [solution]
**Warnings**:
- [Warning 1]: [notes]
### Missing Schema Opportunities
| Page Type | Current Schema | Recommended |
|-----------|----------------|-------------|
| Blog posts | [current] | Article + FAQ |
| Products | [current] | Product + Review |
| Homepage | [current] | Organization |
**Structured Data Score**: [X]/10
```
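Validation of parsed JSON-LD objects can be partly scripted. A sketch that checks for per-type required properties; the required sets below are assumptions of this sketch and should be checked against Google's current rich-result documentation for each type:

```python
# Report properties missing from a parsed JSON-LD object for its @type.
REQUIRED = {
    "Article": {"headline", "datePublished", "author"},
    "Product": {"name", "offers"},
    "FAQPage": {"mainEntity"},
}

def missing_props(obj: dict) -> set[str]:
    required = REQUIRED.get(obj.get("@type"), set())
    return required - obj.keys()
```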
---
## Step 8: Audit International SEO (if applicable)
```markdown
## International SEO Analysis
### Hreflang Implementation
| Check | Status | Notes |
|-------|--------|-------|
| Hreflang tags present | ✅/❌ | [notes] |
| Self-referencing | ✅/⚠️/❌ | [notes] |
| Return tags present | ✅/⚠️/❌ | [notes] |
| Valid language codes | ✅/⚠️/❌ | [notes] |
| x-default tag | ✅/⚠️ | [notes] |
### Language/Region Targeting
| Language | URL | Hreflang | Status |
|----------|-----|----------|--------|
| [en-US] | [URL] | [tag] | ✅/⚠️/❌ |
| [es-ES] | [URL] | [tag] | ✅/⚠️/❌ |
**International Score**: [X]/10
```
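The return-tag and self-reference checks above can be automated. A sketch that takes, for each crawled URL, the hreflang annotations found on that page and verifies reciprocity:

```python
# Validate hreflang reciprocity: each page references itself, and every
# referenced alternate annotates back (the "return tag").
def hreflang_errors(pages: dict[str, dict[str, str]]) -> list[str]:
    errors = []
    for url, tags in pages.items():
        if url not in tags.values():
            errors.append(f"{url}: missing self-reference")
        for lang, target in tags.items():
            target_tags = pages.get(target)
            if target_tags is None:
                errors.append(f"{url}: {target} not crawled")
            elif url not in target_tags.values():
                errors.append(f"{url} -> {target}: no return tag")
    return errors
```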
---
## Step 9: Generate Technical Audit Summary
```markdown
# Technical SEO Audit Report
**Domain**: [domain]
**Audit Date**: [date]
**Pages Analyzed**: [X]
## Overall Technical Health: [X]/100
```
Score Breakdown:
████████░░ Crawlability: 8/10
███████░░░ Indexability: 7/10
█████░░░░░ Performance: 5/10
████████░░ Mobile: 8/10
█████████░ Security: 9/10
██████░░░░ URL Structure: 6/10
█████░░░░░ Structured Data: 5/10
```
## Critical Issues (Fix Immediately)
1. **[Issue]**: [Impact]
- Affected: [pages/scope]
- Solution: [specific fix]
- Priority: 🔴 Critical
2. **[Issue]**: [Impact]
- Affected: [pages/scope]
- Solution: [specific fix]
- Priority: 🔴 Critical
## High Priority Issues
1. **[Issue]**: [Solution]
2. **[Issue]**: [Solution]
## Medium Priority Issues
1. **[Issue]**: [Solution]
2. **[Issue]**: [Solution]
## Quick Wins
These can be fixed quickly for immediate improvement:
1. [Quick fix 1]
2. [Quick fix 2]
3. [Quick fix 3]
## Implementation Roadmap
### Week 1: Critical Fixes
- [ ] [Task 1]
- [ ] [Task 2]
### Week 2-3: High Priority
- [ ] [Task 1]
- [ ] [Task 2]
### Week 4+: Optimization
- [ ] [Task 1]
- [ ] [Task 2]
## Monitoring Recommendations
Set up alerts for:
- Core Web Vitals drops
- Crawl error spikes
- Index coverage changes
- Security issues
```