From c0e1584935701a35acae4dabb62ca82a2121bb72 Mon Sep 17 00:00:00 2001
From: Arjun <6592213+arkml@users.noreply.github.com>
Date: Thu, 26 Mar 2026 16:17:45 +0530
Subject: [PATCH] higher precision

---
 .../core/src/knowledge/labeling_agent.ts      | 99 +++++++++++++++----
 .../packages/core/src/knowledge/tag_system.ts | 19 ++--
 2 files changed, 92 insertions(+), 26 deletions(-)

diff --git a/apps/x/packages/core/src/knowledge/labeling_agent.ts b/apps/x/packages/core/src/knowledge/labeling_agent.ts
index 03941940..c9367f39 100644
--- a/apps/x/packages/core/src/knowledge/labeling_agent.ts
+++ b/apps/x/packages/core/src/knowledge/labeling_agent.ts
@@ -57,38 +57,101 @@ ${renderTagSystemForEmails()}
 # Instructions
 
 1. For each email file provided in the message, read its content carefully.
-2. Classify the email using the taxonomy above. Think like a startup founder triaging their inbox:
+2. Classify the email using the taxonomy above. Think like a **YC startup founder** triaging their inbox — your time is your scarcest resource:
    - **Relationship**: Who is this from? An investor, customer, team member, vendor, candidate, etc.?
    - **Topic**: What is this about? Legal, finance, hiring, fundraising, security, infrastructure, etc.?
    - **Email Type**: Is this a warm intro or a followup on an existing conversation?
-   - **Noise**: Is this a newsletter, cold outreach, promotion, automated notification, digest, receipt, or other low-signal email? If so, label it with the appropriate noise tag — this will skip note creation.
-   - **Action**: Does this need a response (action-required), is it time-sensitive (urgent), or are you waiting on them (waiting)?
-3. Be accurate and conservative — only apply labels that clearly fit.
-4. Use \`workspace-edit\` to prepend YAML frontmatter to the file. The oldString should be the first line of the file (the \`# Subject\` heading), and the newString should be the frontmatter followed by that same first line.
-5. Always include \`processed: true\` and \`labeled_at\` with the current ISO timestamp.
-6. If the email already has frontmatter (starts with \`---\`), skip it.
+   - **Filter (Noise)**: Is this email noise? **Apply ALL applicable filter tags.** If even one noise tag is present the email is skipped — noise overrides everything. Common noise:
+     - Cold outreach / unsolicited service pitches / "YC exclusive" deals / freelancers offering free work
+     - Newsletters, industry reports, webinar invitations, product tips from vendors
+     - Promotions, marketing, event invitations you did not register for, startup program upsells
+     - Automated notifications (email verifications, recording uploads, platform policy changes, expired OTPs)
+     - Transactional confirmations (salary disbursements, tax payments, GST filings, TDS workings, invoice-sharing threads)
+     - Spam and spam moderation digests
+   - **Action**: Does this need a response (\`action-required\`), is it time-sensitive (\`urgent\`), or are you waiting on them (\`waiting\`)? Use \`""\` if none apply. **Do NOT use \`fyi\` as an action value** — it is not a valid action tag.
+3. **Apply noise tags aggressively.** Noise tags can and should coexist with relationship and topic tags. A salary confirmation from your finance team should have BOTH \`relationship: ['team']\` AND \`filter: ['receipt']\`. The noise tag determines whether a note is created — it overrides relationship and topic signals.
+4. Be accurate — only apply labels that clearly fit. But when an email IS noise, always add the noise tag even when other tags are present.
+5. Use \`workspace-edit\` to prepend YAML frontmatter to the file. The oldString should be the first line of the file (the \`# Subject\` heading), and the newString should be the frontmatter followed by that same first line.
+6. Always include \`processed: true\` and \`labeled_at\` with the current ISO timestamp.
+7. If the email already has frontmatter (starts with \`---\`), skip it.
+
+# The Founder Signal Test
+
+Before finalizing labels, ask: **"Would a busy YC founder want a note about this in their knowledge system?"**
+
+**YES — create a note** if the email:
+- Requires a decision or response from the founder
+- Updates an active business relationship (customer deal, investor conversation, partner integration)
+- Contains information that will be referenced later (pricing, terms, deadlines, compliance requirements)
+- Has action items for the team (e.g. standup notes, meeting notes with to-dos)
+- Presents a genuine opportunity worth evaluating (accelerator, partnership, relevant hire)
+- Flags a risk that needs attention (security vulnerability, legal issue, compliance blocker)
+- Is from a vendor you are actively engaged with on an ongoing process (e.g. your compliance assessor following up after a call you participated in)
+
+**NO — skip it** if the email:
+- Confirms a transaction that already happened with no open decision (payment received, tax filed, salary disbursed, invoice shared)
+- Is a system-generated notification with no decision needed (email verification, recording uploaded, policy update, expired OTP)
+- Is unsolicited outreach from someone you have never engaged with — regardless of how personalized it sounds
+- Is a newsletter, industry report, webinar invitation, or product tips email
+- Is marketing or promotional content, including from vendors you use
+- Is a spam digest or Google Groups moderation report
+- Is routine operational correspondence where the transaction is complete and no follow-up remains
 
 # Cold Outreach Detection (Critical for Precision)
 
 Many emails disguise themselves as real relationships. Before assigning \`vendor\`, \`candidate\`, \`partner\`, or \`followup\`, apply these tests:
 
 **It's \`cold-outreach\` (noise), NOT a real relationship, if:**
-- The sender is pitching their own product or service to you (design agencies, compliance firms, lead gen tools, dev shops, etc.) — even if they reference your company by name or mention a prior call YOU didn't initiate.
+- The sender is pitching their own product or service — design agencies, compliance firms, content/copy writers, dev shops, freelancers, trademark services, company closure/winding-down services, hiring platforms, etc. — even if they reference your company by name, your YC batch, or offer something "free" or "exclusive for YC founders."
 - The thread consists entirely of the same sender following up on their own unanswered messages. A real followup requires prior two-way engagement.
-- A student, job-seeker, or founder cold-emails asking for your time, feedback, or mentorship without a warm intro or a specific open role they're applying to. These are NOT \`candidate\` — they are \`cold-outreach\`.
+- A student, job-seeker, freelancer, or founder cold-emails asking for your time, feedback, or offering free work/trials. These are NOT \`candidate\` — they are \`cold-outreach\`.
 - Someone invites you to an event you didn't sign up for, especially if the email has marketing formatting (tracking links, unsubscribe footers, HTML banners). This is \`promotion\`, not \`event\`.
 
-**It IS a real relationship if:**
+**It IS a real relationship (not noise) if:**
 - You (the inbox owner) are a participant in the thread (you sent a reply, or someone on your team did).
 - The sender is from a company you are already paying, or they are providing a service under contract (e.g., your law firm, your accountant, your cloud provider support).
 - The sender was introduced to you by someone you know (warm intro present in the thread).
-- The sender references a specific ongoing deal, contract, or project with concrete details (not generic "I noticed your company...").
+- The sender references a specific ongoing engagement with concrete details — e.g., they are your assigned compliance assessor for an audit you initiated, or they are following up after a call you participated in. This is NOT the same as a generic "I noticed your company uses X" pitch.
 
 **Key heuristic:** If every message in the thread is FROM the same external person and the inbox owner never replied, it's almost certainly cold outreach — regardless of how personalized it sounds. Label it \`cold-outreach\`.
 
-**Noise array must only contain tags from the Noise category.** Do not put topic or relationship tags (like \`event\`) into the noise array. If an email is an event promotion, use \`promotion\` in noise — not \`event\`.
+# Routine Operations & Finance (Often Missed as Noise)
 
-**Spam digests are spam.** If the sender is \`noreply-spamdigest\` (Google Groups spam moderation reports), label it as \`spam\` — Google already flagged these as spam. Do not try to evaluate the held messages inside.
+These emails involve real relationships (team, vendor) and real topics (finance) but are **noise** because the transaction is complete and no decision remains. They MUST get a filter tag even though they also have relationship/topic tags:
+
+- **Salary/payroll confirmations**: "Total salary disbursement is INR X, transfer initiated" → \`filter: ['receipt']\`
+- **Tax payment acknowledgements**: Income tax challan confirmations, TDS workings sent for processing → \`filter: ['receipt']\`
+- **GST/compliance filing confirmations**: GSTR1 ARN generated, GST OTPs (expired or used) → \`filter: ['receipt']\`
+- **Recurring invoice sharing**: Monthly cloud/SaaS invoices shared between team and finance dept → \`filter: ['receipt']\`
+- **Payment transfer confirmations**: "Transfer initiated", "Payment confirmed" → \`filter: ['receipt']\`
+
+# Automated Notifications (Often Missed as Noise)
+
+System-generated messages that require no decision:
+
+- **Email verifications**: "Confirm your email address on Slack" → \`filter: ['notification']\`
+- **Meeting recordings**: "Your meeting recording is ready in Google Drive" → \`filter: ['notification']\`
+- **Platform policy updates**: "Billing permissions are changing starting next month" → \`filter: ['notification']\`
+- **Expired OTPs**: One-time passwords for completed actions → \`filter: ['notification']\`
+
+# Newsletter & Promotion Detection (Often Missed as Noise)
+
+These are noise even from a vendor you recognize or a platform you use:
+
+- **Industry reports**: "Report: $1.2T in combined enterprise AI value" → \`filter: ['newsletter']\`
+- **Webinar/workshop invitations**: "Register for our knowledge sessions", "5 Slots Left. Pitch Tomorrow." → \`filter: ['promotion']\`
+- **Product tips and tutorials**: "Discover more with your free account" → \`filter: ['newsletter']\`
+- **Startup program marketing**: "Reminder - Register for AI Architecture sessions" → \`filter: ['promotion']\`
+
+**Exception:** If a tool your team actively uses is expiring and you need to make an upgrade/cancellation decision, that is NOT noise — it requires action.
+
+# Spam Digests Are Always Spam
+
+If the sender is \`noreply-spamdigest\` (Google Groups spam moderation reports), label it \`filter: ['spam']\`. Google already flagged these as spam. Do not evaluate the held messages inside — the digest itself is noise.
+
+# Filter array must only contain tags from the Noise category
+
+Do not put topic or relationship tags into the filter array. If an email is an event promotion, use \`promotion\` in filter — not \`event\`.
 
 # Frontmatter Format
 
@@ -101,7 +164,7 @@ labels:
     - fundraising
     - finance
   type: intro
-  noise:
+  filter:
     - []
   action: action-required
 processed: true
@@ -113,11 +176,13 @@ labeled_at: "2026-02-28T12:00:00Z"
 
 - Every label category must be present in the frontmatter, even if empty (use \`[]\` for empty arrays).
 - \`type\` and \`action\` are single values (strings), not arrays. Use empty string \`""\` if not applicable.
-- \`relationship\`, \`topics\`, and \`noise\` are arrays.
+- \`relationship\`, \`topics\`, and \`filter\` are arrays.
+- The \`action\` field only accepts: \`action-required\`, \`urgent\`, \`waiting\`, or \`""\`. Never use \`fyi\` as an action value.
 - Use the exact label values from the taxonomy — do not invent new ones.
 - The \`labeled_at\` timestamp should be the current time in ISO 8601 format.
 - Process all files in the batch. Do not skip any unless they already have frontmatter.
-- **Noise labels are skip signals.** If an email is clearly a newsletter, cold outreach, promotion, digest, receipt, notification, or other noise — label it as such. These emails will NOT create notes.
-- **When in doubt between noise and a real relationship/topic, ask:** "Would a busy startup founder want a note about this in their system?" If no, it's noise.
+- **Noise labels are skip signals.** If an email is clearly a newsletter, cold outreach, promotion, digest, receipt, notification, or other noise — label it in the \`filter\` array. These emails will NOT create notes.
+- **Noise tags coexist with other tags.** An email from your team about salary (\`relationship: ['team']\`, \`topics: ['finance']\`) that is just a payroll confirmation should ALSO have \`filter: ['receipt']\`. The noise tag overrides — it ensures the email is skipped even when relationship/topic tags are present.
+- **When in doubt, ask:** "Does this email change any decision, require any follow-up, or update a relationship I need to track?" If no, it's noise — add the appropriate filter tag.
 `;
 }
diff --git a/apps/x/packages/core/src/knowledge/tag_system.ts b/apps/x/packages/core/src/knowledge/tag_system.ts
index fedb069e..8fdd70f0 100644
--- a/apps/x/packages/core/src/knowledge/tag_system.ts
+++ b/apps/x/packages/core/src/knowledge/tag_system.ts
@@ -70,22 +70,23 @@ const DEFAULT_TAG_DEFINITIONS: TagDefinition[] = [
     { tag: 'followup', type: 'email-type', applicability: 'both', noteEffect: 'create', description: 'Following up on a previous two-way conversation (both parties have engaged). A cold sender bumping their own unanswered email is NOT a followup — it is cold-outreach.', example: 'Following up on our call last week. Have you had a chance to review the proposal?' },
 
     // ── Noise — all skip signals in one place ─────────────────────────────
+    // NOTE: Noise tags override relationship/topic tags. An email can have
+    // relationship: team AND filter: receipt — the noise tag wins and skips note creation.
     { tag: 'spam', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Junk and unwanted email, including Google Groups spam moderation digests (from noreply-spamdigest)', example: 'Congratulations! You\'ve won $1,000,000...' },
-    { tag: 'promotion', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Marketing offers, sales pitches, and product launches', example: '50% off all items this weekend only!' },
-    { tag: 'cold-outreach', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Unsolicited contact from someone you don\'t know — includes sales pitches disguised as follow-ups, strangers asking for your time/feedback, and service providers you never engaged with', example: 'Hi, I noticed your company is growing fast. I\'d love to show you how we can help with...' },
-    { tag: 'newsletter', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Newsletters, digests, and subscription emails', example: 'This week in AI: The latest developments in agent frameworks...' },
-    { tag: 'notification', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Automated alerts and system notifications with no action needed', example: 'Your password was changed successfully. If this wasn\'t you, contact support.' },
+    { tag: 'promotion', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Marketing offers, sales pitches, product launches, event invitations you did not register for, startup program upsells, vendor upgrade campaigns, and webinar/workshop invitations from companies', example: 'Register Now! Experts talk live: AI, Marketplace, Architecture & GTM Sessions Coming Up' },
+    { tag: 'cold-outreach', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Unsolicited contact from someone you have no prior engagement with — includes design agencies, compliance firms, content/copy writers, dev shops, freelancers offering free work, trademark services, company closure services, hiring platforms, and anyone pitching a service with "exclusive YC deal" or referencing your YC batch. Even if they mention your company by name or offer something free.', example: 'Ramnique, $2000 worth YC Design deal every month — we work with 230+ YC founders' },
+    { tag: 'newsletter', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Newsletters, industry reports, subscription emails, product tips/tutorials from vendors, and research digests — even from platforms you actively use', example: 'Report: $1.2T in combined enterprise AI value — but what\'s actually built to last?' },
+    { tag: 'notification', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Automated system messages requiring no decision: email verifications, meeting recording uploads, platform policy/permission changes, billing console updates, password resets, and expired OTPs', example: 'Meeting records: your recording has been uploaded to Google Drive.' },
     { tag: 'digest', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Community digests, forum roundups, and aggregated updates', example: 'YC Bookface Weekly: 12 new posts this week...' },
-    { tag: 'product-update', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Product changelogs, feature announcements, and vendor marketing', example: 'Introducing our new AI-powered dashboard...' },
-    { tag: 'receipt', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Transactional receipts, invoices, and billing confirmations with no follow-up needed', example: 'Payment of $49.99 received. Thank you!' },
+    { tag: 'product-update', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Product changelogs, feature announcements, and vendor marketing disguised as tips', example: 'Discover more with your Upstash free account — popular use cases inside' },
+    { tag: 'receipt', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Completed transaction confirmations with no decision remaining: payment receipts, salary/payroll disbursements, tax payment acknowledgements (challans), GST/VAT filing confirmations (GSTR1 ARNs), TDS workings, recurring invoice-sharing threads, and transfer-initiated confirmations', example: 'Challan payment under section 200 for TAN BLXXXXXX4B has been successfully paid.' },
     { tag: 'social', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Social media notifications', example: 'John Smith commented on your post.' },
-    { tag: 'forums', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Mailing lists and group discussions', example: 'Re: [dev-list] Question about API design' },
+    { tag: 'forums', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Mailing lists, group discussions, and Google Groups moderation digests that are not spam digests', example: 'Re: [dev-list] Question about API design' },
     { tag: 'scheduling', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Calendar invites, meeting reminders, and scheduling confirmations', example: 'Reminder: Team standup in 15 minutes.' },
-    { tag: 'fyi', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Informational only, no action needed', example: 'Just wanted to let you know the deal closed. Thanks for your help!' },
     { tag: 'travel', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Flights, hotels, trips, and travel logistics', example: 'Your flight to Tokyo on March 15 is confirmed. Confirmation #ABC123.' },
     { tag: 'shopping', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Purchases, orders, and returns', example: 'Your order #12345 has shipped. Track it here.' },
     { tag: 'health', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Medical, wellness, and health-related matters', example: 'Your appointment with Dr. Smith is confirmed for Monday at 2pm.' },
-    { tag: 'learning', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Courses, webinars, and education marketing', example: 'Welcome to the Advanced Python course. Here\'s your access link.' },
+    { tag: 'learning', type: 'noise', applicability: 'email', noteEffect: 'skip', description: 'Courses, webinars, workshops, knowledge sessions, and education marketing — even from platforms you are enrolled in', example: 'Welcome to the Advanced Python course. Here\'s your access link.' },
 
     // ── Action — urgency signals (all create) ─────────────────────────────
     { tag: 'action-required', type: 'action', applicability: 'both', noteEffect: 'create', description: 'Needs a response or action from you', example: 'Can you send me the pricing by Friday?' },
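Steps 5 to 7 of the revised instructions describe the `workspace-edit` call as a string replacement. The first line of the file (the `# Subject` heading) is replaced by the frontmatter followed by that same line. Files that already start with `---` are left alone.

The TypeScript below is a minimal sketch of how that oldString/newString pair could be built. `EmailLabels`, `WorkspaceEdit`, `scalar`, `yamlList`, and `buildFrontmatterEdit` are hypothetical names, not part of this patch. Rendering empty arrays inline as `[]` is an assumption based on the rule "use `[]` for empty arrays".

```typescript
// Hypothetical label shape; keys follow the frontmatter format in the prompt.
interface EmailLabels {
  relationship: string[];
  topics: string[];
  type: string; // single value, "" when not applicable
  filter: string[];
  action: string; // single value, "" when not applicable
}

// Hypothetical argument pair for a string-replacement edit tool.
interface WorkspaceEdit {
  oldString: string;
  newString: string;
}

// Quote the empty string so YAML keeps it as "" instead of null.
function scalar(value: string): string {
  return value === "" ? '""' : value;
}

// Render a string array nested under `labels:` as inline [] when empty, else a block list.
function yamlList(key: string, values: string[]): string {
  if (values.length === 0) return `  ${key}: []`;
  return [`  ${key}:`, ...values.map((v) => `    - ${v}`)].join("\n");
}

// Returns null when the file already has frontmatter (the "skip it" rule).
function buildFrontmatterEdit(
  content: string,
  labels: EmailLabels,
  now: Date,
): WorkspaceEdit | null {
  if (content.startsWith("---")) return null;
  const firstLine = content.split("\n", 1)[0] ?? ""; // the "# Subject" heading
  const frontmatter = [
    "---",
    "labels:",
    yamlList("relationship", labels.relationship),
    yamlList("topics", labels.topics),
    `  type: ${scalar(labels.type)}`,
    yamlList("filter", labels.filter),
    `  action: ${scalar(labels.action)}`,
    "processed: true",
    `labeled_at: "${now.toISOString()}"`,
    "---",
  ].join("\n");
  // Replacing the first line with "frontmatter + first line" prepends the block.
  return { oldString: firstLine, newString: `${frontmatter}\n${firstLine}` };
}

// Sample input (made up for illustration).
const edit = buildFrontmatterEdit(
  "# Term sheet follow-up\n\nHi team, ...",
  { relationship: ["investor"], topics: ["fundraising"], type: "intro", filter: [], action: "action-required" },
  new Date("2026-02-28T12:00:00Z"),
);
console.log(edit?.newString);
```

Because the replacement targets only the first line, the rest of the email body is never rewritten. `toISOString()` includes milliseconds, which is still valid ISO 8601.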
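The **Key heuristic** in the cold-outreach section is mechanical enough to check in code before the model is consulted. The sketch below makes two assumptions:

- Each message exposes only a sender address (`ThreadMessage` is a hypothetical type).
- "You or someone on your team" can be approximated by the inbox owner's email domain.

It deliberately ignores the warm-intro and existing-contract exceptions, which still need the model's judgment.

```typescript
// Hypothetical message shape: only the sender address is used.
interface ThreadMessage {
  from: string;
}

// Lower-cased domain part of an email address ("" if there is no "@").
function domainOf(address: string): string {
  const at = address.lastIndexOf("@");
  return at === -1 ? "" : address.slice(at + 1).toLowerCase();
}

// Key heuristic: one external sender wrote every message and nobody on the
// owner's side (approximated by ownerDomain) ever replied.
function looksLikeColdOutreach(thread: ThreadMessage[], ownerDomain: string): boolean {
  if (thread.length === 0) return false;
  const senders = new Set(thread.map((m) => m.from.trim().toLowerCase()));
  const ownerSideReplied = thread.some(
    (m) => domainOf(m.from.trim()) === ownerDomain.toLowerCase(),
  );
  return senders.size === 1 && !ownerSideReplied;
}

// Sample thread (made up): the same vendor bumping its own unanswered pitch.
const bumps: ThreadMessage[] = [
  { from: "sales@agency.example" },
  { from: "Sales@agency.example" },
];
console.log(looksLikeColdOutreach(bumps, "yourstartup.example"));
```

A `true` result is only a strong hint, matching the prompt's "almost certainly" wording. A thread with several external senders, or any reply from the owner's domain, falls through to the normal classification.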
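This patch adds several frontmatter rules:

- `relationship`, `topics`, and `filter` are arrays.
- `type` and `action` are strings.
- `action` is limited to `action-required`, `urgent`, `waiting`, or `""`.
- `filter` may hold only Noise-category tags.

These rules could also be checked after the agent writes its output. The following is a hedged sketch: `TagDef` keeps only the `tag` and `type` fields seen in `DEFAULT_TAG_DEFINITIONS`, and `validateLabels` is a hypothetical helper, not an existing function in the codebase.

```typescript
// Only the TagDefinition fields this check needs (hypothetical subset).
interface TagDef {
  tag: string;
  type: string; // e.g. 'noise', 'action', 'email-type'
}

// Raw labels as parsed from frontmatter, before any type checks.
interface RawLabels {
  relationship?: unknown;
  topics?: unknown;
  type?: unknown;
  filter?: unknown;
  action?: unknown;
}

const VALID_ACTIONS = new Set(["action-required", "urgent", "waiting", ""]);

// Returns a list of rule violations; an empty list means the labels pass.
function validateLabels(labels: RawLabels, defs: TagDef[]): string[] {
  const errors: string[] = [];
  const noiseTags = new Set(defs.filter((d) => d.type === "noise").map((d) => d.tag));

  for (const key of ["relationship", "topics", "filter"] as const) {
    if (!Array.isArray(labels[key])) errors.push(`${key} must be an array (use [] when empty)`);
  }
  for (const key of ["type", "action"] as const) {
    if (typeof labels[key] !== "string") errors.push(`${key} must be a string (use "" when empty)`);
  }
  // `fyi` is not in the allowed set, so it is reported like any other invalid value.
  if (typeof labels.action === "string" && !VALID_ACTIONS.has(labels.action)) {
    errors.push(`invalid action: ${JSON.stringify(labels.action)}`);
  }
  // Topic or relationship tags (e.g. `event`) must not appear in filter.
  if (Array.isArray(labels.filter)) {
    for (const tag of labels.filter) {
      if (!noiseTags.has(String(tag))) errors.push(`filter has non-noise tag: ${String(tag)}`);
    }
  }
  return errors;
}

// Sample definitions; `event` is given a non-noise type for illustration.
const sampleDefs: TagDef[] = [
  { tag: "receipt", type: "noise" },
  { tag: "promotion", type: "noise" },
  { tag: "event", type: "topic" },
];
console.log(
  validateLabels({ relationship: [], topics: [], type: "", filter: ["event"], action: "fyi" }, sampleDefs),
);
```

Deriving the noise set from the definitions, rather than hard-coding it, keeps the check in step with `tag_system.ts` when tags such as `fyi` are removed.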
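The `NOTE` comment added to `tag_system.ts` states that a noise tag wins over relationship and topic tags. The decision code itself is not part of this patch. Assuming the pipeline creates a note by default and consults each tag's `noteEffect`, the override reduces to one check over the `filter` array. `shouldCreateNote` and the trimmed `TagDefinition` shape below are illustrative only.

```typescript
type NoteEffect = "create" | "skip";

// Trimmed, hypothetical subset of the TagDefinition fields in tag_system.ts.
interface TagDefinition {
  tag: string;
  type: string;
  noteEffect: NoteEffect;
}

interface AppliedLabels {
  relationship: string[];
  topics: string[];
  filter: string[];
}

// Noise overrides relationship/topic tags: a single skip-effect tag in
// `filter` suppresses the note. Tags missing from `defs` are ignored.
function shouldCreateNote(labels: AppliedLabels, defs: TagDefinition[]): boolean {
  const effectByTag = new Map(defs.map((d) => [d.tag, d.noteEffect] as const));
  return !labels.filter.some((tag) => effectByTag.get(tag) === "skip");
}

// Sample definitions; the relationship/topic entries are illustrative.
const defs: TagDefinition[] = [
  { tag: "team", type: "relationship", noteEffect: "create" },
  { tag: "finance", type: "topic", noteEffect: "create" },
  { tag: "receipt", type: "noise", noteEffect: "skip" },
];

// A payroll confirmation from the team: the receipt tag wins, so no note.
console.log(
  shouldCreateNote({ relationship: ["team"], topics: ["finance"], filter: ["receipt"] }, defs),
);
```

Unknown tags in `filter` are ignored here rather than treated as noise. A stricter variant could reject them outright, in line with the "use the exact label values" rule.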