# Harvest Pipeline Architecture
**Full System Overview for Claude Review**
*Generated: 2026-05-08*

---

## Executive Summary

The Harvest Pipeline is an automated lead-generation and outreach system for ABS/Xerox sales. It pulls daily Connecticut business filings, scores each one on how likely the business is to need a copier or printer, assigns an outreach persona (a "hat"), and pushes the leads to Zoho CRM for automated email sequences.

**Current Status:**
- ✅ Harvest (daily CT SoS sweep) - RUNNING
- ✅ Scoring system (PLLC + NAICS) - WORKING
- ✅ Haberdasher (hat assignment) - COMPLETE
- ✅ Rollback gatekeeper - BUILT
- ✅ Zoho push script - READY
- ⚠️ Zoho Automation - **NOT YET BUILT**
- ⚠️ Email sequences - **NOT YET BUILT**

**Backlog:** 151 leads ready to push (as of 2026-05-08)

---

## System Architecture

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           HARVEST PIPELINE v1.0                              │
└─────────────────────────────────────────────────────────────────────────────┘

CT Secretary of State API
         │
         ▼
    ┌─────────┐
    │ HARVEST │  Daily sweep at 9am EDT (13:00 UTC)
    │   .py   │  Keeps PLLCs + leads scoring 90+
    └────┬────┘
         │
         ▼
    cumulative-backlog.json
    (151 leads, scored & hat-assigned)
         │
         ▼
    ┌─────────────┐
    │ HABERDASHER │  Routes leads to hats based on NAICS/industry
    │    .py      │  hat_1=Legal, hat_4=General, hat_5=Healthcare, hat_6=Education
    └─────┬───────┘
          │
          ▼
    ┌─────────────┐
    │  ROLLBACK   │  Gatekeeper - logs every push
    │    .py      │  Tracks Zoho IDs, enables rollback
    └─────┬───────┘
          │
          ▼
    ┌─────────────────┐
    │  ZOHO CRM PUSH  │  Creates leads with:
    │  backlog-to-    │  - Hat_Assignment field
    │  zoho.py        │  - Score, Industry, Description
    └────────┬────────┘
             │
             ▼
    ┌─────────────────────────────────────┐
    │  ZOHO AUTOMATION (TO BE BUILT)       │
    │  - Trigger: New lead with Hat_Assign │
    │  - Route to hat-specific sequence    │
    │  - Day 1: Initial outreach           │
    │  - Day 4: Follow-up                  │
    │  - Day 9: Final nudge                │
    └─────────────────────────────────────┘
```

---

## Component Details

### 1. Harvest (`harvest.py`)

**Purpose:** Daily sweep of CT Secretary of State business filings.

**Schedule:** Daily at 13:00 UTC via cron, which is 9:00 AM EDT (8:00 AM EST once daylight saving time ends)

**What it does:**
1. Pulls recent business filings from the CT SoS Open Data API
2. Scores each filing using the `gardener.json` config
3. Keeps only:
   - PLLC filings (professional LLCs: doctors, lawyers, accountants)
   - Other high-score leads (score ≥ 90 based on NAICS codes)
4. Appends the kept leads to `cumulative-backlog.json`

**Output files:**
- `memory/territory-scan/cumulative-backlog.json` - Master lead list
- `memory/territory-scan/harvest-YYYY-MM-DD.log` - Daily run log

**Key scoring logic (from gardener.json v3.4):**

| Signal | Score or Bonus | Notes |
|------|------------|-------|
| PLLC in name | 92 | Primary detection |
| Professional credentials (MD, DDS, Esq, CPA) | 82 | Secondary detection |
| NAICS tier_1 (legal, medical, dental, insurance) | 30-40 | Industry bonus |
| NAICS tier_2 (accounting, real estate, consulting) | 20-30 | Medium value |
| Recency bonus (0-2 days old) | +5 | Fresh filings |
| Location bonus (known CT city) | +3 | Local targeting |
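
The table can be read as a small scoring function. The sketch below is one plausible reading, not the logic in `harvest.py`: it assumes a single base score (PLLC takes precedence over credentials), additive bonuses, and a cap of 100. `score_lead`, `KNOWN_CITIES`, and the `naics_bonus` parameter are hypothetical names, the business names are made up, and the NAICS bonus is passed in by the caller because the table gives only ranges. The real weights live in `gardener.json`.

```python
import re

# Values copied from the table above; how they combine is an assumption.
PLLC_BASE = 92
CREDENTIAL_BASE = 82
RECENCY_BONUS = 5    # filing is 0-2 days old
LOCATION_BONUS = 3   # city is a known CT city
KNOWN_CITIES = {"hartford", "new haven", "stamford"}  # hypothetical sample

CREDENTIAL_RE = re.compile(r"\b(MD|DDS|Esq|CPA)\b", re.IGNORECASE)


def score_lead(name: str, city: str, age_days: int, naics_bonus: int = 0) -> int:
    """Score one filing under the assumptions stated in the lead-in."""
    if re.search(r"\bPLLC\b", name, re.IGNORECASE):
        score = PLLC_BASE
    elif CREDENTIAL_RE.search(name):
        score = CREDENTIAL_BASE
    else:
        score = 0
    score += naics_bonus
    if 0 <= age_days <= 2:
        score += RECENCY_BONUS
    if city.strip().lower() in KNOWN_CITIES:
        score += LOCATION_BONUS
    return min(score, 100)  # assumed cap, matching the 0-100 Score field


print(score_lead("Shoreline Dental PLLC", "New Haven", age_days=1))  # 100
print(score_lead("Jane Doe, MD", "Hartford", age_days=0))            # 90
```

Under this reading, a fresh PLLC filing in a known city reaches 100, and a credentialed non-PLLC filing needs both bonuses (or a NAICS bonus) to clear the 90 threshold.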

**Recent results:**
- May 7: 188 scanned → 21 PLLCs + 1 high scorer = 22 saved
- May 8: 162 scanned → 4 PLLCs + 1 high scorer = 5 saved

---

### 2. Haberdasher (`haberdasher.py`)

**Purpose:** Assign outreach personas (hats) to leads based on industry.

**Hat routing table:**

| Hat | Name | Target Industry | Tone |
|-----|------|-----------------|------|
| hat_1 | Peer-to-Peer | Law firms, attorneys | Professional peer, "fellow business owner" |
| hat_4 | Straight Shooter | General business, retail, services | Direct, no-nonsense, value-focused |
| hat_5 | Advisor | Healthcare, medical, therapy | Consultative, patient-care angle |
| hat_6 | Community Member | Education, nonprofit, churches | Community-minded, mission-aligned |

**Routing logic:**
```python
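def assign_hat(name: str, naics: str) -> str:
    """Return the hat ID for a lead based on its name and NAICS description."""
    # Lowercase first so "Law", "LAW", and "law" all match.
    name, naics = name.lower(), naics.lower()
    # Note: substring matching means "law" also matches "lawn".
    if "law" in naics or "legal" in naics or "attorney" in name:
        return "hat_1"  # Peer-to-Peer
    if any(k in naics for k in ("medical", "health", "dental", "therapy", "clinic", "physician")):
        return "hat_5"  # Advisor
    if any(k in naics for k in ("school", "education", "church", "nonprofit")):
        return "hat_6"  # Community Member
    return "hat_4"  # Default: Straight Shooter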
```

**Current backlog distribution:**
- hat_1 (Peer): 2 leads
- hat_4 (Straight Shooter): 103 leads
- hat_5 (Advisor): 41 leads
- hat_6 (Community Member): 5 leads

---

### 3. Rollback Gatekeeper (`rollback.py`)

**Purpose:** Safe push with full audit trail and rollback capability.

**Architecture:**
- Every push is registered BEFORE it happens (push_id)
- Zoho response is logged AFTER (zoho_id)
- Session tracking groups pushes into batches
- Rollback can target: single push, entire session, or all pending

**Key data structures:**

**push-log.json schema:**
```json
{
  "pushes": [
    {
      "push_id": "push_20260508_123456_abc123",
      "session_id": "session_20260508_123456",
      "timestamp": "2026-05-08T12:34:56Z",
      "lead_data": { "name": "...", "score": 100, "hat_assignment": "hat_5" },
      "zoho_id": "123456789",
      "status": "pushed",
      "rollback_available": true
    }
  ],
  "sessions": [
    {
      "session_id": "session_20260508_123456",
      "started_at": "2026-05-08T12:34:56Z",
      "ended_at": "2026-05-08T12:45:00Z",
      "push_count": 151,
      "status": "active | completed | rolled_back"
    }
  ]
}
```
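
The register-before, log-after flow can be sketched against this schema as follows. This is an illustration of the pattern, not the code in `rollback.py`: the function names are hypothetical, and the `"pending"` status for a push that has been registered but not yet confirmed is an assumption (the schema example only shows `"pushed"`).

```python
import uuid
from datetime import datetime, timezone


def register_push(log: dict, session_id: str, lead: dict) -> str:
    """Record the intent to push BEFORE calling Zoho; returns the push_id."""
    now = datetime.now(timezone.utc)
    push_id = f"push_{now:%Y%m%d_%H%M%S}_{uuid.uuid4().hex[:6]}"
    log["pushes"].append({
        "push_id": push_id,
        "session_id": session_id,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "lead_data": lead,
        "zoho_id": None,            # unknown until Zoho responds
        "status": "pending",        # assumed status name
        "rollback_available": False,
    })
    return push_id


def record_result(log: dict, push_id: str, zoho_id: str) -> None:
    """Attach the Zoho record ID AFTER a successful push."""
    for entry in log["pushes"]:
        if entry["push_id"] == push_id:
            entry.update(zoho_id=zoho_id, status="pushed", rollback_available=True)
            return
    raise KeyError(push_id)


def rollback_targets(log: dict, session_id: str) -> list[str]:
    """Zoho IDs that a session-level rollback would need to delete."""
    return [e["zoho_id"] for e in log["pushes"]
            if e["session_id"] == session_id and e["rollback_available"]]


log = {"pushes": [], "sessions": []}
pid = register_push(log, "session_20260508_123456",
                    {"name": "Example PLLC", "score": 100, "hat_assignment": "hat_5"})
record_result(log, pid, "123456789")
print(rollback_targets(log, "session_20260508_123456"))  # ['123456789']
```

Registering first means a crash between the Zoho call and the log write still leaves a trace: an entry with no `zoho_id` flags a push whose outcome needs checking.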

**CLI commands:**
```bash
python3 rollback.py status [session_id]    # View session status
python3 rollback.py sessions              # List recent sessions
python3 rollback.py rollback [session_id] # Roll back entire session
python3 rollback.py rollback-push [id]    # Roll back single push
python3 rollback.py rollback-all         # Nuclear option
```
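
For reference, a command layout like this maps naturally onto `argparse` subcommands. How `rollback.py` actually parses its arguments is not shown here, so treat the following as a hypothetical sketch that only mirrors the five commands listed above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per CLI command listed above."""
    parser = argparse.ArgumentParser(prog="rollback.py")
    sub = parser.add_subparsers(dest="command", required=True)

    status = sub.add_parser("status", help="View session status")
    status.add_argument("session_id", nargs="?")  # optional, as in [session_id]

    sub.add_parser("sessions", help="List recent sessions")

    rollback = sub.add_parser("rollback", help="Roll back entire session")
    rollback.add_argument("session_id", nargs="?")

    single = sub.add_parser("rollback-push", help="Roll back single push")
    single.add_argument("push_id", nargs="?")

    sub.add_parser("rollback-all", help="Roll back everything")
    return parser


args = build_parser().parse_args(["rollback-push", "push_20260508_123456_abc123"])
print(args.command, args.push_id)
```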

---

### 4. Zoho CRM Push (`backlog-to-zoho.py`)

**Purpose:** Push scored, hat-assigned leads to Zoho CRM.

**Field mapping:**

| Backlog Field | Zoho Field | Notes |
|---------------|------------|-------|
| name | Last_Name | Business name (all leads are B2B) |
| name | Company | Same as Last_Name |
| city / billingcity | City | Location |
| state / billingstate | State | Defaults to CT |
| street / billingstreet | Street | Address if available |
| zip / billingzip | Zip_Code | Postal code |
| email / business_email | Email | Contact email |
| phone | Phone | Contact phone |
| website | Website | Business website |
| naics_code / naics | Industry | NAICS description |
| score | Score | Lead score (0-100) |
| hat_assignment | **Hat_Assignment** | **Custom field - ROUTING KEY** |
| hat_name | Description | Included in description text |
| filing_date | Description | Included in description text |
| - | Lead_Source | "Mark's Automated Outreach" |

**Fields required in Zoho:**
- `Hat_Assignment` (Pick List: hat_1, hat_4, hat_5, hat_6) ← **CREATED ✓**
- `Score` (Number) ← Standard field
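
The field-mapping table above translates into a single function. This is a minimal sketch, not the code in `backlog-to-zoho.py`: `to_zoho_lead` and its `first` helper are hypothetical names, and the wording of the `Description` text is invented, since the table only says that `hat_name` and `filing_date` are included in it.

```python
def to_zoho_lead(lead: dict) -> dict:
    """Map one backlog record to a Zoho lead dict using the table above."""

    def first(*keys, default=None):
        # Return the first non-empty value among alternative backlog keys.
        for key in keys:
            if lead.get(key):
                return lead[key]
        return default

    record = {
        "Last_Name": lead["name"],
        "Company": lead["name"],
        "City": first("city", "billingcity"),
        "State": first("state", "billingstate", default="CT"),
        "Street": first("street", "billingstreet"),
        "Zip_Code": first("zip", "billingzip"),
        "Email": first("email", "business_email"),
        "Phone": lead.get("phone"),
        "Website": lead.get("website"),
        "Industry": first("naics_code", "naics"),
        "Score": lead.get("score"),
        "Hat_Assignment": lead.get("hat_assignment"),
        # Hypothetical description format.
        "Description": f"Hat: {lead.get('hat_name')} | Filed: {lead.get('filing_date')}",
        "Lead_Source": "Mark's Automated Outreach",
    }
    # Drop fields with no value so the payload carries no explicit nulls.
    return {k: v for k, v in record.items() if v is not None}


example = to_zoho_lead({
    "name": "Example Therapy PLLC", "billingcity": "Hartford", "score": 100,
    "hat_assignment": "hat_5", "hat_name": "Advisor", "filing_date": "2026-05-07",
})
print(example["City"], example["State"], example["Hat_Assignment"])
```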

---

### 5. Zoho Automation (TO BE BUILT)

**Purpose:** Auto-fire email sequences when new leads land in CRM.

**Trigger condition:**
- New Lead created
- Hat_Assignment field is not empty
- Lead_Source = "Mark's Automated Outreach"

**Workflow design:**

```
┌────────────────────────────────────────────────────────────────┐
│  ZOHO AUTOMATION WORKFLOW                                       │
└────────────────────────────────────────────────────────────────┘

TRIGGER: New Lead created
         │
         ▼
    ┌─────────────────────┐
    │ Check Hat_Assignment│
    └──────────┬──────────┘
               │
    ┌──────────┼──────────┬──────────┐
    ▼          ▼          ▼          ▼
 hat_1      hat_4      hat_5      hat_6
 (Peer)   (Straight)  (Advisor)  (Community)
    │          │          │          │
    ▼          ▼          ▼          ▼
  Email      Email      Email      Email
 Template   Template   Template   Template
  #1_legal  #4_general #5_health #6_mission
    │          │          │          │
    └──────────┼──────────┴──────────┘
               │
               ▼
        ┌─────────────┐
        │ Wait 3 days │
        └──────┬──────┘
               │
               ▼
        ┌─────────────┐
        │ Send Email  │
        │   Day 4     │
        │ (Follow-up) │
        └──────┬──────┘
               │
               ▼
        ┌─────────────┐
        │ Wait 5 days │
        └──────┬──────┘
               │
               ▼
        ┌─────────────┐
        │ Send Email  │
        │   Day 9     │
        │ (Final try) │
        └─────────────┘
```
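
The trigger check and the Day 1/4/9 timing can be prototyped outside Zoho to sanity-check dates before building the workflow. The sketch below assumes Day 1 is the day the lead is created, so the follow-ups land 3 and 8 days later. `plan_sequence` and the `hat_N_dayM` template keys are hypothetical; Zoho itself would express this as workflow conditions and scheduled actions.

```python
from datetime import date, timedelta

TRIGGER_SOURCE = "Mark's Automated Outreach"
# Day N is sent N-1 days after the lead is created (Day 1 = creation day).
DAY_OFFSETS = {1: 0, 4: 3, 9: 8}
HATS = {"hat_1", "hat_4", "hat_5", "hat_6"}


def plan_sequence(lead: dict, created: date) -> list[tuple[date, str]]:
    """Return (send_date, template_key) pairs, or [] if the trigger is not met."""
    hat = lead.get("Hat_Assignment")
    if hat not in HATS or lead.get("Lead_Source") != TRIGGER_SOURCE:
        return []
    return [(created + timedelta(days=offset), f"{hat}_day{day}")
            for day, offset in DAY_OFFSETS.items()]


lead = {"Hat_Assignment": "hat_5", "Lead_Source": TRIGGER_SOURCE}
for send_date, template in plan_sequence(lead, date(2026, 5, 8)):
    print(send_date.isoformat(), template)
# 2026-05-08 hat_5_day1
# 2026-05-11 hat_5_day4
# 2026-05-16 hat_5_day9
```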

**Email templates needed:**

| Hat | Day 1 Template | Day 4 Template | Day 9 Template |
|-----|-----------------|-----------------|-----------------|
| hat_1 | Peer professional intro | Case study (law firm saved $X) | "Last touch before I close your file" |
| hat_4 | Direct value prop | ROI calculator offer | "Should I update your contact info?" |
| hat_5 | Patient-care angle | Equipment reliability pitch | "When you're ready to upgrade..." |
| hat_6 | Community mission alignment | Budget-conscious options | "No pressure, here if needed" |

---

## Email Template Structure

Each hat has a distinct voice. Templates should include:

**Variables (Zoho merge fields):**
- `${Last_Name}` → Business name
- `${City}` → Location
- `${Industry}` → NAICS/industry
- `${First_Name}` → (Usually empty for B2B)

**Day 1 structure:**
1. Greeting: Hi, I noticed [Business] is setting up in [City]
2. Value hook: One sentence on why copier/print matters
3. Soft ask: "Would it make sense to connect?"
4. Sign-off: First name, phone
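
Drafts can be previewed locally before they are loaded into Zoho. Python's `string.Template` happens to use the same `${name}` placeholder style, so the sketch below renders a Day 1 draft with sample values. The email wording, the `Sender_*` fields, the sample lead, and the `render` helper are all hypothetical, and this is not how Zoho itself resolves merge fields.

```python
from string import Template

# Hypothetical Day 1 body following the four-part structure above.
DAY1 = Template(
    "Hi, I noticed ${Last_Name} is setting up in ${City}.\n"
    "Reliable printing matters from day one in ${Industry}.\n"
    "Would it make sense to connect?\n"
    "${Sender_First_Name}, ${Sender_Phone}"
)


def render(template: Template, fields: dict) -> str:
    """Fill merge fields; raise if any supplied field is blank."""
    blanks = [k for k, v in fields.items() if not str(v).strip()]
    if blanks:
        raise ValueError(f"blank merge fields: {blanks}")
    return template.substitute(fields)  # KeyError if a field is absent


fields = {
    "Last_Name": "Example Therapy PLLC",
    "City": "Hartford",
    "Industry": "Healthcare",
    "Sender_First_Name": "Alex",
    "Sender_Phone": "555-0100",
}
print(render(DAY1, fields))
```

Failing loudly on blank fields matters here because `${First_Name}` is usually empty for B2B leads, and a greeting rendered with a blank name is worse than no greeting.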

**Day 4 structure:**
1. Reference: "Following up on my note from earlier this week"
2. Additional value: Case study, testimonial, or specific benefit
3. Same soft ask

**Day 9 structure:**
1. Acknowledgment: "I know you're busy"
2. Low-pressure: "No need to respond, just wanted to leave the door open"
3. Contact info

---

## Integration Order (Critical)

**The automation MUST be built BEFORE pushing data.**

Reason: Zoho Automation fires on "new lead created" events. If we push 151 leads before the automation exists, they won't get the Day 1 email. We'd have to manually trigger each one.

**Correct sequence:**
1. ✅ Create Hat_Assignment field in Zoho → **DONE**
2. ⬜ Create email templates in Zoho (Day 1, 4, 9 for each hat)
3. ⬜ Build Zoho Automation workflow with hat-based routing
4. ⬜ Test automation with 1-2 manual lead creates
5. ⬜ Confirm automation fires and sends correct emails
6. ⬜ THEN: Push backlog via `backlog-to-zoho.py`

---

## Files Reference

| File | Location | Purpose |
|------|----------|---------|
| harvest.py | scripts/ | Daily CT SoS sweep |
| gardener.json | scripts/ | Scoring config (v3.4) |
| haberdasher.py | scripts/ | Hat assignment logic |
| rollback.py | scripts/ | Gatekeeper + rollback |
| backlog-to-zoho.py | scripts/ | Zoho push with tracking |
| cumulative-backlog.json | memory/territory-scan/ | Master lead list |
| push-log.json | scripts/ | Rollback audit log |
| city-county-cache.json | memory/territory-scan/ | County enrichment |

---

## Cron Jobs

| Schedule | Command | Purpose |
|----------|---------|---------|
| 0 13 * * * | python3 harvest.py | Daily CT SoS sweep (9am EDT) |
| 0 14 * * * | python3 lead-heartbeat.py | Emergence detection (10am EDT) |
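
Both schedules are fixed in UTC, so the local run time shifts by an hour when daylight saving time ends. The check below, which assumes the cron host's clock is set to UTC, converts the 13:00 UTC run to Eastern time for a summer and a winter date. `local_run_time` is a hypothetical helper.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def local_run_time(year: int, month: int, day: int, utc_hour: int = 13) -> str:
    """Eastern wall-clock time of a job scheduled at utc_hour:00 UTC."""
    run = datetime(year, month, day, utc_hour, tzinfo=timezone.utc)
    return run.astimezone(EASTERN).strftime("%H:%M %Z")


print(local_run_time(2026, 5, 8))  # 09:00 EDT
print(local_run_time(2026, 1, 8))  # 08:00 EST
```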

---

## API Credentials (Existing)

| Service | Purpose | Status |
|---------|---------|--------|
| CT SoS Open Data | Business filings | ✅ No auth needed |
| Zoho CRM | Lead management | ✅ OAuth2 configured |
| Brave Search | Emergence detection | ✅ API key in TOOLS.md |
| N8N | Webhook for lead vetting | ✅ Running |

---

## What Needs to Be Built

### Immediate (before push):
1. **Zoho Automation workflow** - hat-based routing
2. **Email templates** - Day 1, 4, 9 for hats 1, 4, 5, 6 (12 total)

### Near-term:
3. **Open tracking** - Track email opens/clicks
4. **Bounce handling** - Remove bad emails from future sends
5. **Reply detection** - Pause sequence if lead responds

### Future:
6. **MA/RI expansion** - Pull filings from MA and RI SoS
7. **Phone enrichment** - Auto-fill missing phone numbers
8. **Website scraping** - Extract contact info from lead websites

---

## Questions for Claude

1. **Automation design:** Should Day 4/9 emails be part of the same Zoho workflow, or separate workflows triggered by delays?

2. **Email templates:** Do we need all 12 templates upfront, or can we start with Day 1 only and add follow-ups later?

3. **Unsubscribe:** Should we include unsubscribe links in B2B outreach? (Legal requirement vs. practical reality)

4. **A/B testing:** Should hat_4 (largest group, 103 leads) be split into A/B test of two different Day 1 approaches?

5. **Phone calls:** Should high-score leads (95+) get a phone call instead of just email? How to integrate that into automation?

---

## Summary

The Harvest Pipeline is roughly 80% complete: harvesting, scoring, hat assignment, the rollback gatekeeper, and the push script are done. The remaining work is the Zoho Automation layer and its 12 email templates, which turn data pushes into actual outreach. Once both exist, pushing the 151 backlogged leads will trigger automatic Day 1 emails, and the system becomes a self-sustaining outreach engine.

**Next step:** Build the Zoho Automation workflow with hat-based routing before pushing any data.