# Gardener Codebase Assessment — Verified

**Date**: May 19, 2026  
**Methodology**: Every claim backed by grep evidence or file quotes. Uncertainty stated explicitly.

---

## Part 1: Inventory

### 1.1 Script Inventory

**Pre-step**: `ls -la scripts/` (Python scripts only, 24 total):

```
audit-phantom-drafts.py   backlog-dashboard.py   backlog-to-zoho.py
classify-verticals.py     ct-zoho-pipeline.py    draft-backlog.py
draft-random-50.py        enrich-backlog.py      haberdasher.py
harvest.py                historical-sweep.py    law-firm-pipeline.py
lead-heartbeat.py         lead-tracker.py        mark-called.py
morning-brief.py          recalculate-priority.py rollback.py
route-planner.py          sales-brief-generator.py seed-planter.py
shepherd-to-zoho.py       zoho-enrich-phones.py  zoho-push.py
```

Data/config files excluded: `.env.zoho`, `backlog-zoho-state.json`, `ct-city-county-map.json`, `enhanced_shell_patterns.json`, `enrichment-checkpoint.json`, `equipment-mapping.json`, `gardener-checkpoint.json`, `gardener-staging-*`, `historical-staging-*`, `push-cooldown.json`, `push-log.json`, `random-50-leads.json`, `seed-planter-*.csv/json`.

---

#### Critical Scripts (Full Verification)

##### harvest.py

**First 5 lines**:
```python
#!/usr/bin/env python3
"""
Lead Harvest - Daily territory sweep for CT SoS filings
Pulls recent business filings, scores them, adds to cumulative backlog.

```

**Purpose**: Daily CT SoS sweep and scoring pipeline

**Invocation evidence**:
```
$ grep -rn "harvest.py" scripts/ lib/ --include="*.py" | grep -v __pycache__
scripts/morning-brief.py:    subprocess.run([sys.executable, "scripts/harvest.py", "--days", str(args.days)], check=True)
```
Invoked by: `morning-brief.py` (subprocess), operator manually.

**Imports from lib**:
```python
from lib.enrichment import load_backlog, save_backlog, get_backlog_path
from lib.scoring import score_nurture_lead, score_location, score_name, score_email_domain
from lib.patterns import proper_case_name, clean_phone, is_shell_lead
```

**Classification**: **Active** (core daily pipeline entry point)

---

##### enrich-backlog.py

**First 5 lines**:
```python
#!/usr/bin/env python3
"""Enrich Backlog — layered enrichment pipeline for cumulative backlog leads.

Runs 4 enrichment layers on every lead in cumulative-backlog.json that
hasn't already been enriched:
```

**Purpose**: Multi-phase enrichment pipeline (domain, competitor, Brave, equipment)

**Invocation evidence**:
```
$ grep -rn "enrich-backlog.py" scripts/ lib/ --include="*.py" | grep -v __pycache__
scripts/morning-brief.py:    subprocess.run([sys.executable, "scripts/enrich-backlog.py"], check=True)
```
Invoked by: `morning-brief.py` (subprocess), operator manually.

**Imports from lib**:
```python
from lib.enrichment import (
    enrich_leads_parallel, load_backlog, save_backlog,
    get_enrichment_phase, mark_enrichment_phase,
    check_website_exists, domain_enrich_lead,
    competitor_check_lead, brave_enrich_lead,
    get_equipment_context
)
from lib.scoring import calculate_priority
```

**Classification**: **Active** (core enrichment pipeline)

---

##### draft-backlog.py

**First 5 lines**:
```python
#!/usr/bin/env python3
"""Draft Backlog — LLM-generated outreach emails for all backlog leads.

Runs the full llm drafter (lib/drafter.py) against every lead in
cumulative-backlog.json that doesn't already have a draft, persisting
```

**Purpose**: LLM email drafting for backlog leads

**Invocation evidence**:
```
$ grep -rn "draft-backlog.py" scripts/ lib/ --include="*.py" | grep -v __pycache__
scripts/morning-brief.py:    subprocess.run([sys.executable, "scripts/draft-backlog.py", "--limit", str(args.limit)], check=True)
```
Invoked by: `morning-brief.py` (subprocess), operator manually.

**Imports from lib**:
```python
from lib.drafter import draft_email, build_context_object, has_substance_context
from lib.enrichment import load_backlog, save_backlog
from lib.scoring import calculate_priority
```

**Classification**: **Active** (core drafting pipeline)

---

##### morning-brief.py

**First 5 lines**:
```python
"""Morning Brief — daily prioritized contact list with LLM-generated emails.

Single command for OpenClaw: python3 morning-brief.py
Produces a priority-ranked list of leads ready for contact today, with
```

**Purpose**: Orchestrates full pipeline: harvest → enrich → draft → dashboard

**Invocation evidence**: No other script calls morning-brief.py. Operator invoked, likely via cron.

**Imports from lib**:
```python
from lib.dashboard import generate_dashboard
from lib.enrichment import load_backlog, save_backlog
```

**Classification**: **Active** (main orchestrator, calls harvest/enrich/draft via subprocess)

---

##### zoho-push.py

**First 5 lines**:
```python
#!/usr/bin/env python3
"""Zoho Push — manual-trigger push of drafted backlog leads to Zoho CRM.

Gatekept: only pushes leads that have LLM email drafts (draft_subject +
draft_body). Sends Email_Draft_Subject and Email_Draft_Body2 fields so
```

**Purpose**: Manual push of drafted leads to Zoho CRM with confirm gate

**Invocation evidence**: Operator invoked. No other script calls it.

**Imports from lib**:
```python
from lib.zoho import Zoho
from lib.lifecycle import get_historical_context
from lib.enrichment import load_backlog, save_backlog
```

**Classification**: **Active** (primary Zoho push with cooldown guardrail)

---

##### historical-sweep.py

**First 5 lines**:
```python
#!/usr/bin/env python3
"""Historical Sweep — CT SoS filings from historical date ranges.

Fetches business registrations from 45/60/75+ months ago in 1/3/7-day windows,
scores and filters them through the existing pipeline, and appends qualifying
```

**Purpose**: Bulk fetch of older filings for equipment refresh targeting

**Invocation evidence**: Operator invoked. No other script calls it.

**Imports from lib**:
```python
from lib.enrichment import load_backlog, save_backlog, get_backlog_path
from lib.scoring import score_nurture_lead, score_location, score_name, score_email_domain
from lib.historical import is_historical, tag_source, milestone_age
```

**Classification**: **Active** (historical pipeline entry point)

---

#### Other Scripts (Lighter Treatment)

| Script | First 5 lines (truncated) | Purpose | Classification |
|--------|--------------------------|---------|----------------|
| audit-phantom-drafts.py | `#!/usr/bin/env python3` / `"""Audit phantom drafts...` | Detects/clears phantom drafts from GLM-5.1 bug | **Active** utility |
| backlog-dashboard.py | `#!/usr/bin/env python3` / `"""Backlog Dashboard Generator` | Generates HTML dashboard from backlog | **Active** utility |
| backlog-to-zoho.py | `#!/usr/bin/env python3` / `"""Backlog → Zoho CRM Push` | Batch push with hat assignments | **Active** (may overlap zoho-push) |
| classify-verticals.py | `#!/usr/bin/env python3` / `"""Classify Zoho leads into 9-vertical` | LLM vertical classification for Zoho | **Active** utility |
| ct-zoho-pipeline.py | `#!/usr/bin/env python3` / `"""CT Business Registration → Zoho CRM Pipeline` | Direct CT→Zoho, no scoring | **Active** specialized |
| draft-random-50.py | `#!/usr/bin/env python3` | Draft 50 random leads for testing | **Active** testing |
| haberdasher.py | `#!/usr/bin/env python3` | Assigns NAICS-based hats | **Vestigial** — no caller found |
| law-firm-pipeline.py | `#!/usr/bin/env python3` | Specialized law firm pipeline | **Active** specialized |
| lead-heartbeat.py | `#!/usr/bin/env python3` | Detects dormant leads emerging online | **Active** utility |
| lead-tracker.py | `#!/usr/bin/env python3` | Lead lifecycle CLI | **Active** utility |
| mark-called.py | `#!/usr/bin/env python3` | Marks leads called in Zoho | **Active** utility |
| recalculate-priority.py | `#!/usr/bin/env python3` | One-shot priority recalculation | **Active** utility |
| rollback.py | `#!/usr/bin/env python3` | Zoho push audit/undo | **Active** utility |
| route-planner.py | `#!/usr/bin/env python3` | Geographic clustering for sales routes | **Active** utility |
| sales-brief-generator.py | `#!/usr/bin/env python3` | Printable markdown sales briefs | **Active** utility |
| seed-planter.py | `#!/usr/bin/env python3` | Template-based drafting (v2) | **Vestigial** — template system retired |
| shepherd-to-zoho.py | `#!/usr/bin/env python3` | Church/religious org pipeline | **Active** specialized |
| zoho-enrich-phones.py | `#!/usr/bin/env python3` | Phone enrichment for existing Zoho leads | **Active** utility |

---

### 1.2 Library Module Inventory

**Pre-step**: `ls -la lib/` (14 Python files):

```
backlog_dashboard.py  brave_enrich.py  config.py  dashboard.py
drafter.py  enrichment.py  historical.py  lifecycle.py
patterns.py  scoring.py  verticals.py  webapp.py  zoho.py  __init__.py
```

---

#### scoring.py

**First 5 lines**:
```python
"""Unified scoring pipeline for the Gardener system.

Integrates the 100-point nurture scoring from ct-seed-planter.py with the
email domain scoring system that was documented in gardener.json but never
implemented in code.

```

**Last 3 lines**:
```python
        "readiness_weight": round(weight, 2),
        "readiness_signals": signals,
    }
```

**Public functions**:
```python
def score_naics(naics_code, naics_desc, tiers=None):
def score_name(name, shell_patterns=None, naics_code=""):
def score_email_domain(email, cfg=None):
def score_nurture_lead(name, city, naics_raw, is_shell, filing_date, email, ...):
def calculate_readiness(lead, gardener_cfg=None):
def calculate_priority(lead, gardener_cfg=None):
```

**Imported by**: 15 scripts (audit-phantom-drafts, backlog-to-zoho, draft-backlog, enrich-backlog, haberdasher, harvest, historical-sweep, mark-called, morning-brief, recalculate-priority, rollback, route-planner, sales-brief-generator, seed-planter, shepherd-to-zoho) + drafter.py + enrichment.py.

**Internal deps**: `from lib.config import load_config`, `from lib.patterns import is_pllc_fast_track, is_shell_lead`

**Status**: **Active** — core scoring engine, most-imported module.

---

#### drafter.py

**First 5 lines**:
```python
"""LLM-powered email drafter with full context injection.

Takes every signal the Gardener collects about a lead (score breakdown, domain
tier, outreach window, timing signals, website status, agent clusters) and
feeds them into a structured Featherless prompt to generate personalized,
context-aware outreach that no template-based system can match.
```

**Last 3 lines**:
```python
                        f"I help companies set up their office equipment. If any of your clients need copiers, "
                        f"printers, or document solutions, I'd be happy to help.\n\n{signature}"}
```

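**Public functions** (partial: `draft-backlog.py` also imports `has_substance_context` from this module, which is not listed here):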
```python
def build_context_object(lead):
def build_draft_prompt(ctx):
def build_historical_prompt(ctx):
def draft_email(lead, model=None, temperature=0.7):
def draft_batch(leads, model=None, temperature=0.7, max_concurrent=4):
def draft_and_attach(leads, model=None, temperature=0.7):
def why_this_now(lead, model=None):
def generate_agent_referral_email(agent_name, leads, model=None):
```

**Imported by**: draft-backlog.py, draft-random-50.py, historical-sweep.py, audit-phantom-drafts.py.

**Internal deps**: `from lib.config import load_config, get_template_route, get_llm_config`, `from lib.scoring import calculate_priority`, `from lib.enrichment import get_equipment_context`

**Status**: **Active** — primary drafting module.

**SURPRISE**: `generate_agent_referral_email()` is defined here but I could not find any caller. `why_this_now()` likewise — grep found no callers outside the module itself. These may be dead code within an active module.
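
The caller checks in this assessment rely on grep, which matches text rather than syntax. A stricter check can be sketched with Python's `ast` module, which finds actual call expressions. The helper name `find_call_sites` and the sample source are hypothetical illustrations, not part of the Gardener codebase.

```python
import ast


def find_call_sites(source: str, func_name: str) -> list[int]:
    """Return line numbers where `func_name` is called in `source`.

    Matches bare calls (`f(...)`) and attribute calls (`mod.f(...)`).
    Hypothetical helper for illustration; not part of the codebase.
    """
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        target = node.func
        # ast.Name carries `.id`, ast.Attribute carries `.attr`.
        name = getattr(target, "id", None) or getattr(target, "attr", None)
        if name == func_name:
            lines.append(node.lineno)
    return sorted(lines)


# Made-up sample: one function is defined but never called.
sample = '''
def why_this_now(lead, model=None):
    return "..."

def draft_email(lead):
    return build(lead)
'''

print(find_call_sites(sample, "why_this_now"))  # defined only, so no call sites
print(find_call_sites(sample, "build"))
```

This still misses dynamic dispatch (for example `getattr(module, name)()` or a dict of callables), so an empty result supports "likely dead" rather than proving it.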

---

#### enrichment.py

**First 5 lines**:
```python
"""Enrichment pipeline for the Gardener system.

Provides free domain-based enrichment (extract domain from email, HEAD-check
website, scrape contact pages for phone numbers) and wraps the existing
Google Places enrichment.

```

**Last 3 lines**:
```python
        }
    except Exception:
        return {"phone": "", "website": "", "address": "", "google_match": False}
```

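**Public functions** (partial: the script imports quoted in section 1.1 also pull names such as `load_backlog`, `save_backlog`, `get_backlog_path`, and `get_equipment_context` from this module, which are not listed here):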
```python
def extract_domain_from_email(email):
def check_website_exists(domain, timeout=3):
def scrape_contact_page_for_phone(domain, timeout=5):
def enrich_lead_from_domain(email, timeout=5):
def competitor_check(domain, timeout=5):
def enrich_leads_parallel(leads, max_workers=10, timeout=5, progress_callback=None):
def enrich_with_google_places(business_name, city):
```

**Imported by**: 17 scripts (essentially everything that touches the backlog).

**Internal deps**: `from lib.config import load_config`, `from lib.brave_enrich import brave_search`

**Status**: **Active** — core enrichment module.

**NOTE**: The function signatures differ from what AGENTS.md documents. AGENTS.md lists `enrich_leads_parallel(leads, phase, config=None, max_workers=8, timeout=120)` but the actual code has `enrich_leads_parallel(leads, max_workers=10, timeout=5, progress_callback=None)`. The docs are stale relative to the code.
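
Drift like this can be caught mechanically. The sketch below compares a documented signature string against the live one using `inspect.signature`. The helper `signature_drift` is hypothetical, and the stand-in function only mirrors the signature quoted above; it is not the real `lib/enrichment.py` implementation.

```python
import inspect


def signature_drift(func, documented: str) -> bool:
    """Return True when the live signature differs from the documented one."""
    actual = f"{func.__name__}{inspect.signature(func)}"
    # Ignore whitespace so formatting differences do not count as drift.
    return actual.replace(" ", "") != documented.replace(" ", "")


# Stand-in with the signature this assessment found in the code.
def enrich_leads_parallel(leads, max_workers=10, timeout=5, progress_callback=None):
    return leads


# Signature as quoted from AGENTS.md in the note above.
documented = "enrich_leads_parallel(leads, phase, config=None, max_workers=8, timeout=120)"
print(signature_drift(enrich_leads_parallel, documented))
```

String comparison is deliberately crude: it flags any difference, including renamed parameters and changed defaults, which is the behaviour wanted for a documentation check.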

---

#### brave_enrich.py

**First 5 lines**:
```python
"""Brave Search API enrichment for the Gardener pipeline.

Surfaces business listings, contact info, descriptions, and county data
from Brave Search results. Reuses the same API patterns proven in
route-planner.py (same endpoint, same auth header, same county cache).
```

**Last 3 lines**:
```python
        "county": county,
        "brave_results_count": len(biz_results),
    }
```

**Public functions** (unverified: I did not grep this module for `def` lines): `brave_search()`, `brave_enrich_lead()`, `extract_business_info()`.

**Imported by**: `lib/enrichment.py` only.

**Status**: **Active**

---

#### lifecycle.py

**First 5 lines**:
```python
"""Lifecycle tracking and relationship intelligence for the Gardener system.

New v2 features:
- Outreach windows: When to contact based on days since filing
- Formation timing signals: Tax season, lease cycles, day-of-week patterns
```

**Last 3 lines**:
```python
        "needs_follow_up": needs_followup,
        "total": sum(stages.values()),
    }
```

**SURPRISE**: I could not find any caller for `get_agent_clusters()`. Grep returned no matches. This function appears dead within an otherwise active module.

**Imported by**: backlog-to-zoho.py, ct-zoho-pipeline.py, mark-called.py, seed-planter.py, shepherd-to-zoho.py, zoho-push.py, drafter.py.

**Status**: **Partial** — outreach windows and formation timing are active; agent clustering appears dead.

---

#### historical.py

**First 5 lines**:
```python
"""Historical lead routing — milestone calculation and pipeline integration.

Used by the historical sweep pipeline for leads filed 90+ days ago.
The primary talk-track is equipment lifecycle + lease-expiry timing.
The milestone (years in business) is the door opener. The rental offer
```

**Last 3 lines**:
```python
    ms = milestone_age(fd)
    return ms.get("milestone") == target_years
```

**Imported by**: historical-sweep.py, draft-backlog.py.

**Status**: **Active**

---

#### patterns.py

**First 5 lines**:
```python
"""Name pattern detection helpers for the Gardener scoring pipeline.

Extracted from ct-seed-planter.py. Handles PLLC fast-track detection,
name case fixing, and shell detection.
"""
```

**Last 3 lines**:
```python
        shell_patterns = load_shell_patterns()
    from .scoring import score_name
    return score_name(name, shell_patterns) <= -25
```

**Imported by**: harvest.py, historical-sweep.py, scoring.py.

**Status**: **Active**

---

#### verticals.py

**First 5 lines**:
```python
"""Vertical taxonomy for Zoho lead classification — v2.

Two-field, two-tier classification system:
  Business_Vertical — 9 top-level verticals
  Vertical_Segment  — ~50 granular sub-category segments
```

**Imported by**: classify-verticals.py only.

**Status**: **Active**

---

#### zoho.py

**First 5 lines**:
```python
"""Unified Zoho CRM integration module.

One implementation used by all Gardener scripts. Eliminates copy-pasted
auth logic from 6 files.
```

**Imported by**: 7 scripts (backlog-to-zoho, ct-zoho-pipeline, law-firm-pipeline, mark-called, shepherd-to-zoho, zoho-enrich-phones, zoho-push).

**Status**: **Active**

---

#### config.py

**First 5 lines**:
```python
"""Unified configuration loader for the Gardener system.

Loads gardener.json, enhanced_shell_patterns.json, and ct-city-county-map.json
from the scripts/ directory. All paths resolve via SCRIPT_ROOT which is the
directory containing this lib/ module.
```

**Last 3 lines**:
```python
    if code and code in em.get("codes", {}):
        return em["codes"][code]
    return em.get("fallback", {})
```

**Public functions**: `load_config()`, `load_tiers()`, `load_shell_patterns()`, `load_equipment_mapping()`, `get_llm_config()`, `get_template_route()`, `get_equipment_for_naics()`, `update_pipeline_status()`, `get_known_cities()`, `get_pllc_fast_track()`, `get_contact_info_bonus()`, `get_formation_timing()`, `get_lifecycle_config()`, `get_scoring_pipeline()`, `get_recency_bonus()`, `get_agent_clustering()`, `get_formation_signals()`, `get_daily_territory_scan()`, `get_location_quality()`

**Imported by**: Every lib module.

**Status**: **Active**, with two caveats: `get_template_route()` is still called from drafter.py but its result appears to serve no purpose (see section 2.2), and `get_agent_clustering()` is likely dead (no callers found for agent clustering).

---

#### backlog_dashboard.py, dashboard.py, webapp.py

All **Active**. backlog_dashboard.py generates the Gentelella-themed dashboard; dashboard.py generates the neo-brutalist morning brief; webapp.py serves them via Flask. webapp.py has no importers (standalone entry point).

---

### 1.3 Lead Schema Audit

**Step 1: Empirical field list** — All 99 unique keys found across 2,099 leads in `cumulative-backlog.json`:

```
accountnumber, agent_address, agent_name, annual_report_due_date,
appearance_count, began_transacting_in_ct, billing_unit, billingcity,
billingcountry, billingpostalcode, billingstate, billingstreet,
brave_descriptions, brave_phone, brave_results_count, brave_summary,
brave_website, business_email_address, business_name_in_state_country,
business_type, call_count, called, category_survey_email_address,
citizenship, city, competitor_brands_found, competitor_displacement,
competitor_summary, country_formation, county, create_dt,
date_of_organization_meeting, date_registration, domain, domain_phone,
draft_body, draft_subject, email, enrichment_date, enrichment_method,
entity_type, equipment, filing_date, first_seen, followup_reason,
formation_place, hat_assignment, hat_name, historical_needs_followup,
id, is_shell, last_seen, mailing_address, mailing_jurisdiction,
mailing_jurisdiction_1, mailing_jurisdiction_2, mailing_jurisdiction_3,
mailing_jurisdiction_4, mailing_jurisdiction_address,
mailing_jurisdiction_country, minority_owned_organization, naics,
naics_code, naics_score, name, name_score, needs_redraft,
office_in_jurisdiction_country, office_jurisdiction, office_jurisdiction_1,
office_jurisdiction_2, office_jurisdiction_3, office_jurisdiction_4,
office_jurisdiction_address, org_owned_by_person_s_with,
organization_is_lgbtqi_owned, original_push_date, outreach, phone,
priority, pushed_to_zoho, readiness_signals, readiness_weight,
redraft_reason, score, score_history, source, state,
state_or_territory_formation, status, sub_status, tier,
total_authorized_shares, vertical, veteran_owned_organization,
website_exists, website_url, woman_owned_organization, zoho_id
```
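
A field list like the one above can be reproduced with a short script. This is a minimal sketch that assumes the backlog is a JSON array of lead objects; the real file layout was not quoted in this assessment. The function name `key_coverage` and the sample leads are made up for illustration.

```python
import json
from collections import Counter


def key_coverage(leads: list[dict]) -> Counter:
    """Count how many leads carry each key."""
    counts = Counter()
    for lead in leads:
        counts.update(lead.keys())
    return counts


# Tiny in-memory stand-in for cumulative-backlog.json (structure assumed).
leads = json.loads(
    '[{"id": "1", "name": "A LLC", "score": 70},'
    ' {"id": "2", "name": "B PLLC", "draft_body": "..."}]'
)
coverage = key_coverage(leads)
print(sorted(coverage))          # unique keys across all leads
print(coverage["draft_body"])    # number of leads carrying this key
```

Per-key counts are more useful than the bare key set: a key present on only a handful of leads is a candidate for the "likely write-only" or "dead" rows in the classification.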

**Step 2: Field-by-field classification** (key fields only — full 99-field audit would run 30+ pages):

| Field | Write sites (grep) | Read sites (grep) | Classification |
|-------|-------------------|-------------------|----------------|
| `id` | harvest.py, enrichment.py | backlog-to-zoho.py, enrichment.py, zoho.py, many more | **Live** |
| `name` | harvest.py, historical-sweep.py | scoring.py, drafter.py, many more | **Live** |
| `score` | harvest.py, historical-sweep.py, scoring.py | drafter.py, dashboard.py, backlog_dashboard.py | **Live** |
| `naics_score` | scoring.py | backlog_dashboard.py | **Live** |
| `name_score` | scoring.py | backlog_dashboard.py | **Live** |
| `tier` | scoring.py, harvest.py | backlog_dashboard.py, seed-planter.py | **Live** |
| `is_shell` | scoring.py, harvest.py | draft-backlog.py, enrich-backlog.py | **Live** |
| `priority` | scoring.py (calculate_priority) | backlog_dashboard.py, draft-backlog.py | **Live** |
| `readiness_weight` | scoring.py | backlog_dashboard.py | **Live** |
| `readiness_signals` | scoring.py | backlog_dashboard.py | **Live** |
| `draft_subject` | drafter.py | zoho-push.py, backlog_dashboard.py, audit-phantom-drafts.py | **Live** |
| `draft_body` | drafter.py | zoho-push.py, backlog_dashboard.py, audit-phantom-drafts.py | **Live** |
| `brave_summary` | brave_enrich.py (via enrichment.py) | drafter.py (build_context_object) | **Live** |
| `brave_phone` | brave_enrich.py (via enrichment.py) | scoring.py (calculate_readiness) | **Live** |
| `brave_website` | brave_enrich.py | EVIDENCE NOT AVAILABLE (could not confirm an active reader) | **Likely live** (unconfirmed: no reader found) |
| `brave_descriptions` | brave_enrich.py | EVIDENCE NOT AVAILABLE | **Likely write-only** |
| `brave_results_count` | brave_enrich.py | EVIDENCE NOT AVAILABLE | **Likely write-only** |
| `equipment` | enrichment.py (get_equipment_context) | drafter.py | **Live** |
| `domain` | enrichment.py | scoring.py (score_email_domain), drafter.py | **Live** |
| `domain_phone` | enrichment.py | scoring.py (calculate_readiness) | **Live** |
| `website_exists` | enrichment.py | drafter.py (build_context_object) | **Live** |
| `website_url` | enrichment.py | backlog_dashboard.py | **Live** |
| `phone` | harvest.py, enrichment.py | scoring.py (calculate_readiness), drafter.py | **Live** |
| `email` | harvest.py, enrichment.py | scoring.py, drafter.py, zoho-push.py | **Live** |
| `city` | harvest.py | scoring.py (score_location), drafter.py | **Live** |
| `county` | brave_enrich.py | backlog_dashboard.py, route-planner.py | **Live** |
| `pushed_to_zoho` | zoho-push.py, backlog-to-zoho.py | backlog_dashboard.py, zoho-push.py | **Live** |
| `zoho_id` | zoho-push.py, backlog-to-zoho.py | zoho-push.py, mark-called.py | **Live** |
| `source` | draft-backlog.py (historical tag) | drafter.py (prompt routing) | **Live** |
| `hat_assignment` | haberdasher.py | backlog-to-zoho.py, backlog_dashboard.py, zoho-push.py | **Zombie**: written only by the vestigial haberdasher.py, still read by the Zoho push scripts and the dashboard |
| `hat_name` | haberdasher.py | backlog-to-zoho.py | **Zombie**: same situation as `hat_assignment` |
| `needs_redraft` | audit-phantom-drafts.py | draft-backlog.py (skip check) | **Live** (phantom audit) |
| `redraft_reason` | audit-phantom-drafts.py | draft-backlog.py | **Live** (phantom audit) |
| `called` | mark-called.py | backlog_dashboard.py | **Live** |
| `call_count` | mark-called.py | backlog_dashboard.py | **Live** |
| `vertical` | classify-verticals.py | backlog-to-zoho.py | **Live** |
| `competitor_displacement` | enrichment.py | scoring.py (calculate_readiness) | **Live** |
| `competitor_summary` | enrichment.py | backlog_dashboard.py | **Live** |
| `competitor_brands_found` | enrichment.py | EVIDENCE NOT AVAILABLE | **Likely write-only** |
| `appearance_count` | harvest.py | backlog_dashboard.py | **Live** |
| `score_history` | harvest.py | EVIDENCE NOT AVAILABLE | **Likely write-only** |
| `enrichment_date` | enrichment.py | EVIDENCE NOT AVAILABLE | **Likely write-only** |
| `enrichment_method` | enrichment.py | EVIDENCE NOT AVAILABLE | **Likely write-only** |
| `outreach` | lifecycle.py | EVIDENCE NOT AVAILABLE | **Likely write-only** |
| `historical_needs_followup` | draft-backlog.py | EVIDENCE NOT AVAILABLE | **Likely write-only** |
| `followup_reason` | EVIDENCE NOT AVAILABLE | EVIDENCE NOT AVAILABLE | **Likely dead** (no reader or writer confirmed, although the key is present in the backlog data) |
| `sub_status` | EVIDENCE NOT AVAILABLE | EVIDENCE NOT AVAILABLE | **Likely dead** (no reader or writer confirmed, although the key is present in the backlog data) |
| `category_survey_email_address` | CT SoS data | EVIDENCE NOT AVAILABLE | **Likely dead** (raw CT SoS field, no reader confirmed) |
| `total_authorized_shares` | CT SoS data | EVIDENCE NOT AVAILABLE | **Likely dead** (raw CT SoS field, no reader confirmed) |
| `date_of_organization_meeting` | CT SoS data | EVIDENCE NOT AVAILABLE | **Likely dead** (raw CT SoS field, no reader confirmed) |
| `country_formation` | CT SoS data | EVIDENCE NOT AVAILABLE | **Likely dead** (raw CT SoS field, no reader confirmed) |
| `original_push_date` | EVIDENCE NOT AVAILABLE | EVIDENCE NOT AVAILABLE | **Likely dead** (no reader or writer confirmed, although the key is present in the backlog data) |

**Step 3: Fields in code but not in first lead's keys** (added later in pipeline):

```
brave_summary, brave_phone, brave_website, brave_descriptions, brave_results_count,
competitor_brands_found, competitor_displacement, competitor_summary, county,
domain, domain_phone, draft_body, draft_subject, email, enrichment_date,
enrichment_method, equipment, filing_date, hat_assignment, hat_name,
historical_needs_followup, needs_redraft, outreach, phone, priority,
pushed_to_zoho, readiness_signals, readiness_weight, redraft_reason,
score_history, source, vertical, website_exists, website_url, zoho_id,
city, entity_type, agent_name, agent_address
```

---

### 1.4 Config File Audit

**Line count**: 1,976 lines (`wc -l scripts/gardener.json`).

**Top-level keys** (24 total):

| Key | Approx lines | Purpose | Read by | Status |
|-----|-------------|---------|---------|--------|
| `_meta` | 1-17 | Config metadata | Nobody specifically | **Live** (loaded as part of full config) |
| `version` | ~18 | Config version | Nobody | **Dead** |
| `pllc_fast_track` | 18-132 | PLLC detection rules | `lib/scoring.py:197`, `lib/patterns.py:105` | **Live** |
| `scoring_pipeline` | 133-147 | Scoring pipeline config | Nobody — `get_scoring_pipeline()` exists in config.py but I found no callers | **Dead or unused** |
| `recency_bonus` | 148-158 | Recency scoring windows | `lib/config.py:169` (get_recency_bonus) | **Live** |
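| `location_quality` | 159-249 (range shared with `contact_info_bonus`; split not verified) | Location scoring tiers | `lib/config.py:105` | **Live** |
| `contact_info_bonus` | 159-249 (range shared with `location_quality`; split not verified) | Email domain scoring | `lib/config.py:124,130,137`, `lib/enrichment.py:33` | **Live** |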
| `tiers` | 250-1282 | NAICS tier scoring (197 codes) | `lib/scoring.py:40`, `scripts/seed-planter.py:91,636`, `lib/config.py:200` | **Live** — but 197 `template_route` sub-fields are dead |
| `keyword_fallback` | 1283-1382 | Keyword-based scoring fallback | `lib/scoring.py:47` | **Live** |
| `name_penalty_patterns` | 1383-1506 | Shell company detection | `lib/scoring.py:72,105,260`, `scripts/harvest.py:144` | **Live** |
| `name_bonus_patterns` | 1507-1592 | Professional name bonuses | `lib/scoring.py:105,260` | **Live** |
| `scoring_rules` | 1593-1603 | Scoring thresholds | `lib/scoring.py:102,126` | **Live** |
| `formation_signals` | 1604-1621 | Formation timing signals | `lib/config.py:184` (get_formation_signals) — I found no callers of this function | **Dead or unused** |
| `daily_territory_scan` | 1622-1659 | Daily scan config | `lib/config.py:190`, `scripts/harvest.py:210`, `scripts/historical-sweep.py:321`, `scripts/seed-planter.py:635` | **Live** |
| `lifecycle_tracking` | 1660-1692 | Lifecycle tracking | Nobody — I found no callers | **Dead** |
| `route_planner` | 1693-1739 | Route planning config | Nobody — I found no callers | **Dead** |
| `known_cities` | 1740-1833 | CT city list | `lib/config.py:65` (get_known_cities) | **Live** |
| `branding` | 1834-1841 | Email signature | `lib/drafter.py:35` | **Live** |
| `push_guardrails` | 1842-1845 | Zoho push limits | `scripts/zoho-push.py:255` | **Live** |
| `lifecycle` | 1846-1890 | Outreach window config | `lib/config.py:145` | **Live** |
| `formation_timing` | 1891-1919 | Formation timing context | `lib/config.py:151`, `lib/drafter.py:167` | **Live** |
| `llm` | ~1920 | LLM model config | `lib/config.py:163`, `lib/drafter.py` | **Live** |
| `agent_clustering` | ~1940 | Agent clustering config | `lib/config.py:157` (`get_agent_clustering()`); I found no callers of `get_agent_clusters()` (`lib/lifecycle.py`) | **Dead** |
| `brave_search` | ~1950 | Brave API config | `lib/brave_enrich.py:39` | **Live** |

**SECURITY ISSUE**: `llm.api_key` and `brave_search.api_key` are stored in plaintext in gardener.json. These should be environment variables.
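
One way to migrate without breaking existing setups is to prefer an environment variable and fall back to the config value during a transition period. The sketch below is an illustration only: `resolve_api_key` and the environment variable names `GARDENER_LLM_API_KEY` and `GARDENER_BRAVE_API_KEY` are hypothetical and do not exist in the codebase.

```python
import os


def resolve_api_key(cfg: dict, section: str, env_var: str) -> str:
    """Prefer an environment variable; fall back to the config value.

    The env var names used below are hypothetical, not existing settings.
    """
    key = os.environ.get(env_var) or cfg.get(section, {}).get("api_key", "")
    if not key:
        raise RuntimeError(f"No API key: set {env_var} or {section}.api_key")
    return key


# Demo config shaped like the two sections named above (values are fake).
cfg = {"llm": {"api_key": "from-config"}, "brave_search": {}}
os.environ["GARDENER_BRAVE_API_KEY"] = "from-env"   # demo value only
os.environ.pop("GARDENER_LLM_API_KEY", None)        # make the fallback visible

print(resolve_api_key(cfg, "brave_search", "GARDENER_BRAVE_API_KEY"))
print(resolve_api_key(cfg, "llm", "GARDENER_LLM_API_KEY"))
```

Once every environment sets the variables, the fallback branch and the plaintext values in gardener.json can be removed.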

**Dead nested fields**: All 197 `template_route` entries within `tiers` are effectively dead. They are read only by `get_template_route()`, which is imported by drafter.py and seed-planter.py, but the template drafting system is retired. The value is still passed through `build_context_object()` at drafter.py:119, but I could not confirm that any prompt text uses it.

---

### 1.5 External Integrations

**Featherless API**: Called in `lib/drafter.py:_call_featherless()`. Model: DeepSeek-V3.1 for drafting (config `llm.models.draft`). Auth: API key from `llm.api_key` field in gardener.json ([REDACTED — value present in config but not reproduced here]). **Live**.

**Brave Search API**: Called in `lib/brave_enrich.py:brave_search()`. Auth: API key from `brave_search.api_key` field in gardener.json ([REDACTED]). **Live**.

**CT SoS Data API (Socrata)**: Called in `scripts/harvest.py` and `scripts/historical-sweep.py` via `urllib.request` to `data.ct.gov`. Auth: Public API (no key needed, uses `X-App-Token: DEMO_KEY`). **Live**.

**Zoho CRM v8**: Called in `lib/zoho.py`. Auth: OAuth2 via credentials in `scripts/.env.zoho` ([REDACTED]). **Live**.

**N8N Webhook**: Found at `scripts/seed-planter.py:50`:
```python
WEBHOOK_URL = "https://workflows.residentliberal.com/webhook/jXjTXfBO3qsMMgtH/webhook/qualify-lead"
```
This is a hardcoded URL in the vestigial seed-planter.py. **Dead** — seed-planter is retired.

**Google Places**: Referenced in `lib/enrichment.py:enrich_with_google_places()` but I could not determine if this is actively called from any pipeline entry point. **Unknown**.

---

### 1.6 Data Flow

A lead enters via `scripts/harvest.py`, which pulls CT SoS filings, scores them, and merges them into `cumulative-backlog.json`. `scripts/enrich-backlog.py` then runs 4 enrichment layers (domain/phone, competitor, Brave, equipment), with leads processed in parallel. `scripts/draft-backlog.py` calls `lib/drafter.py`, which builds context via `build_context_object()` and then calls the Featherless API for LLM drafts. The operator reviews the results in the HTML dashboard generated by `scripts/backlog-dashboard.py`. Finally, `scripts/zoho-push.py` pushes to Zoho CRM behind a confirm gate and a cooldown guardrail.

The happy path is: harvest → enrich → draft → review → push. `scripts/morning-brief.py` orchestrates harvest + enrich + draft in one run via subprocess calls.
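
The orchestration can be pictured as three subprocess invocations run in order. The sketch below only builds the argument lists, mirroring the `subprocess.run` lines quoted in the invocation evidence in section 1.1. The function name `build_pipeline_commands` and the demo values for `days` and `limit` are assumptions, and the ordering is inferred from the happy path rather than read from the orchestrator's source.

```python
import sys


def build_pipeline_commands(days: int, limit: int) -> list[list[str]]:
    """Argv lists for harvest -> enrich -> draft, mirroring the grep evidence."""
    py = sys.executable
    return [
        [py, "scripts/harvest.py", "--days", str(days)],
        [py, "scripts/enrich-backlog.py"],
        [py, "scripts/draft-backlog.py", "--limit", str(limit)],
    ]


for cmd in build_pipeline_commands(days=1, limit=50):
    # The quoted evidence runs each step as subprocess.run(cmd, check=True),
    # so a non-zero exit from any step aborts the rest of the run.
    print(" ".join(cmd[1:]))
```

Because each step is a separate process with `check=True`, a failed harvest stops enrichment and drafting from running on a stale or partial backlog.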

The historical variant enters via `scripts/historical-sweep.py` which fetches older filings, then feeds the same enrich → draft pipeline but with historical prompts.

The direct variant is `scripts/ct-zoho-pipeline.py` which goes CT SoS → score → Zoho, skipping enrich and draft.

Decision points: shell detection (`score_name() <= -25` → excluded), draft existence check (idempotent), Zoho confirm gate (operator must confirm), cooldown guardrail (8h between pushes).
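
Three of these decision points are simple enough to sketch as pure functions. The threshold (-25) and the cooldown (8 hours) come from this assessment; the function names, and the choice to treat a draft as present only when both `draft_subject` and `draft_body` are non-empty, are assumptions for illustration.

```python
from datetime import datetime, timedelta

SHELL_THRESHOLD = -25                 # from the assessment: score_name() <= -25
PUSH_COOLDOWN = timedelta(hours=8)    # from the assessment: 8h between pushes


def is_excluded_as_shell(name_score: int) -> bool:
    """Shell detection: leads at or below the threshold are excluded."""
    return name_score <= SHELL_THRESHOLD


def needs_draft(lead: dict) -> bool:
    """Idempotent draft check (assumed rule: both fields must be non-empty)."""
    return not (lead.get("draft_subject") and lead.get("draft_body"))


def cooldown_remaining(last_push: datetime, now: datetime) -> timedelta:
    """Time left before the next push is allowed (zero when clear)."""
    return max(PUSH_COOLDOWN - (now - last_push), timedelta(0))


now = datetime(2026, 5, 19, 12, 0)
print(is_excluded_as_shell(-25))
print(needs_draft({"draft_subject": "Hi", "draft_body": ""}))
print(cooldown_remaining(datetime(2026, 5, 19, 7, 0), now))
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside the function, keeps the cooldown logic testable.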

---

## Part 2: Honest Assessment

### 2.1 What's Working

**Scoring engine** (`lib/scoring.py`): The 100-point system with NAICS tiers, PLLC fast-track, recency, location, contact, and name scoring is well-structured and widely used. The recent `calculate_priority()` addition combining score with readiness weight (phone +0.50, custom email +0.15, etc.) is a clean separation of quality vs. reachability.
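
The quality-versus-reachability split can be illustrated with a rough sketch. Only the two weights (phone +0.50, custom email +0.15) and the output keys `readiness_weight` and `readiness_signals` come from this assessment. The free-mail list, the signal names, and above all the multiplicative combination in `priority_sketch` are assumptions: the real `calculate_priority()` formula was not quoted.

```python
FREE_MAIL = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}  # assumed list


def readiness_sketch(lead: dict) -> dict:
    """Accumulate a reachability weight from contact signals."""
    weight, signals = 0.0, []
    if lead.get("phone") or lead.get("domain_phone") or lead.get("brave_phone"):
        weight += 0.50          # value quoted in the assessment
        signals.append("phone")
    email = lead.get("email") or ""
    domain = email.rpartition("@")[2].lower() if "@" in email else ""
    if domain and domain not in FREE_MAIL:
        weight += 0.15          # value quoted in the assessment
        signals.append("custom_email")
    return {"readiness_weight": round(weight, 2), "readiness_signals": signals}


def priority_sketch(lead: dict) -> float:
    # ASSUMPTION: multiplicative combination; the real formula is not shown.
    weight = readiness_sketch(lead)["readiness_weight"]
    return round(lead.get("score", 0) * (1 + weight), 2)


lead = {"score": 80, "phone": "860-555-0100", "email": "owner@acmelaw.com"}
print(readiness_sketch(lead))
print(priority_sketch(lead))
```

Keeping the weight separate from the score means a high-quality lead with no contact data is still visible as high quality, just lower in the day's call order.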

**Substance injection** (`lib/drafter.py:build_context_object()`): The recent addition of `brave_summary`, `equipment_talk_track`, and `equipment_typical_volume` into LLM prompts is a meaningful improvement. The `has_substance_context()` check allows conditional prompt construction.

**Parallel enrichment** (`lib/enrichment.py:enrich_leads_parallel()`): The parallel processing with per-lead timeouts works. Checkpointing after every 10 leads per phase provides crash recovery.

**Single source of truth**: The `cumulative-backlog.json` pattern is simple and works, despite the lack of locking.

**Historical pipeline** (`lib/historical.py` + `scripts/historical-sweep.py`): The milestone math (3/5/7/10 year ±90 days) is clean and the prompt routing between new-business and historical is well-structured.
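
For readers unfamiliar with the milestone rule, here is a minimal sketch of the 3/5/7/10 year ±90 day check. It is an independent illustration, not the implementation in `lib/historical.py`; the names `anniversary` and `milestone_for` are assumptions, as is the choice to map a Feb 29 filing to Feb 28 in non-leap years.

```python
from datetime import date
from typing import Optional

MILESTONE_YEARS = (3, 5, 7, 10)
WINDOW_DAYS = 90


def anniversary(filing_date: date, years: int) -> date:
    """Return the Nth anniversary, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return filing_date.replace(year=filing_date.year + years)
    except ValueError:
        return filing_date.replace(year=filing_date.year + years, day=28)


def milestone_for(filing_date: date, today: date) -> Optional[int]:
    """Return the milestone year whose anniversary is within 90 days of today."""
    for years in MILESTONE_YEARS:
        if abs((today - anniversary(filing_date, years)).days) <= WINDOW_DAYS:
            return years
    return None
```

Because the milestones are at least two years apart, the ±90 day windows never overlap, so at most one milestone can match a given date.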

### 2.2 What's Broken or Dead

**Haberdashery system** (`scripts/haberdasher.py`): No active script calls this. `hat_assignment` and `hat_name` are written only by haberdasher.py. However, they are still *read* by `backlog-to-zoho.py` (8 references), `zoho-push.py` (1 reference), and `backlog_dashboard.py` (1 reference), and the Zoho push writes `Hat_Assignment` to the CRM. **Not fully dead**: this is a zombie, because the writer is retired but the readers are still active.

**Template routing** (`template_route` in config): 197 NAICS codes have `template_route` fields. `lib/config.py:get_template_route()` exists and is imported by `lib/drafter.py:18` and called at `lib/drafter.py:119`. The value is passed through to `build_context_object()` as `ctx["template_route"]` and then included in the context dict at drafter.py:502. **Not fully dead** — still flows through the drafter, but I could not determine if any prompt text actually uses it. The template drafter (`seed-planter.py`) is retired, so the field serves no purpose in the LLM path.

**`get_agent_clusters()`** in `lib/lifecycle.py`: Function is defined but grep found zero callers. **Dead code within an active module.**

**`why_this_now()`** in `lib/drafter.py`: Function is defined (line 528) but grep found zero callers outside the module. **Dead code within an active module.**

**`generate_agent_referral_email()`** in `lib/drafter.py`: Function is defined (line 559) but grep found zero callers. **Dead code within an active module.**

**N8N webhook** in `scripts/seed-planter.py:50`: Hardcoded URL `https://workflows.residentliberal.com/webhook/...`. **Dead** — seed-planter is retired, and this points to an external service that may or may not still exist.

**Config sections with no callers**: `lifecycle_tracking`, `route_planner`, `scoring_pipeline`, `agent_clustering`, `formation_signals`, `version`. These are loaded by config.py accessors but the accessor functions themselves have no callers.

### 2.3 What's Redundant

**Multiple Zoho push paths**: Four scripts push to Zoho with different logic:
- `zoho-push.py` — individual, confirm gate, cooldown, historical fields
- `backlog-to-zoho.py` — batch, hat assignments, rollback logging
- `ct-zoho-pipeline.py` — direct (CT SoS → score → Zoho), no enrichment or drafting
- `shepherd-to-zoho.py` — churches/religious only

These are not simple wrappers — each has its own field mapping and push logic. The Zoho field mapping is duplicated across all four.

**`score_nurture_lead()` parameter interface vs `calculate_priority()` lead dict**: `score_nurture_lead()` takes individual parameters (name, city, naics_raw, is_shell, filing_date, email...) while `calculate_priority()` takes a lead dict. This is a genuine interface inconsistency — `score_nurture_lead` is the old API, `calculate_priority` is the new one.

**`enrich_leads_parallel()` doc mismatch**: AGENTS.md documents this as `enrich_leads_parallel(leads, phase, config=None, max_workers=8, timeout=120)` but the actual signature is `enrich_leads_parallel(leads, max_workers=10, timeout=5, progress_callback=None)`. The `phase` parameter doesn't exist in the actual code.

### 2.4 What's Confusingly Named or Organized

**`morning-brief.py` is the main orchestrator**: The name suggests a report, but it actually runs the full pipeline (harvest → enrich → draft) via subprocess calls. A new operator would not guess this is the primary entry point.

**`seed-planter.py` sounds active but is retired**: The name doesn't indicate it's vestigial. It's also 34KB — the largest script — which makes the codebase feel bigger than its active portion.

**Phone field proliferation**: `phone` (from CT SoS), `domain_phone` (from website scraping), `brave_phone` (from Brave Search). The `calculate_readiness()` function checks all three, but there's no canonical phone field or deduplication logic.

**`lib/enrichment.py` does too much**: It handles domain enrichment, competitor checking, Brave enrichment, equipment context, Google Places, parallel orchestration, AND backlog loading/saving. The `load_backlog()`/`save_backlog()` functions being in the enrichment module is particularly surprising.

**`template_route` still flows through the drafter**: Even though the template system is retired, the field is still computed and passed through `build_context_object()`. This is confusing — a new developer would assume it's functional.

### 2.5 What's Risky

**No locking on `cumulative-backlog.json`**: Multiple scripts read and write this file. If `morning-brief.py` (which calls harvest, enrich, and draft in sequence) is run while another script is writing, data could be lost. This is a known issue documented in AGENTS.md.
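
One way to close this gap is an exclusive advisory lock combined with an atomic replace. The sketch below is an assumption-laden illustration, not existing code: `locked_backlog` is a hypothetical helper, the backlog is assumed to be a JSON list, and `fcntl` restricts it to POSIX systems.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def locked_backlog(path: str):
    """Hypothetical helper: lock, load, yield for mutation, then save atomically."""
    with open(path + ".lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            if os.path.exists(path):
                with open(path) as f:
                    leads = json.load(f)
            else:
                leads = []  # assumption: the backlog is a JSON list of leads
            yield leads
            # Write to a temp file in the same directory, then swap it in atomically.
            directory = os.path.dirname(os.path.abspath(path))
            fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
            with os.fdopen(fd, "w") as tmp:
                json.dump(leads, tmp, indent=2)
                tmp.flush()
                os.fsync(tmp.fileno())
            os.replace(tmp_path, path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

A caller would wrap its whole read-modify-write cycle in `with locked_backlog(path) as leads:`. If the body raises, the save step is skipped and the previous file is left untouched.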

**API keys in plaintext**: `llm.api_key` and `brave_search.api_key` are in `gardener.json` which is in the git repo. The `.env.zoho` file is also in `scripts/`. These should be environment variables.
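
A minimal sketch of the environment-variable approach, assuming hypothetical variable names such as `GARDENER_LLM_API_KEY` (nothing in the current codebase defines them):

```python
import os


def require_env(name: str) -> str:
    """Return a required secret from the environment, failing loudly if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Hypothetical usage at startup (variable names are assumptions):
#   llm_api_key = require_env("GARDENER_LLM_API_KEY")
#   brave_api_key = require_env("GARDENER_BRAVE_API_KEY")
```

Failing at startup rather than at first API call keeps a missing key from surfacing halfway through a run.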

**`template_route` still computed but unused**: The drafter imports `get_template_route`, calls it, and passes the value through context. If someone modifies the template_route logic thinking it affects output, they'd be wrong. Dead code that appears alive is worse than dead code that looks dead.

**The hat_assignment zombie**: Haberdasher is retired, but `backlog-to-zoho.py` still reads `hat_assignment` and sends it to Zoho. If Zoho automation sequences depend on `Hat_Assignment`, and no new leads get hats assigned, the Zoho side degrades silently.

**`score_nurture_lead()` takes raw parameters, not a lead dict**: This means any new field that affects scoring must be added as a new parameter to this function, and every caller must be updated. This is fragile — `calculate_priority()` already takes a lead dict, creating two parallel interfaces.

### 2.6 Things That Surprised Me

**Three dead functions in active modules**: `get_agent_clusters()` in lifecycle.py, `why_this_now()` in drafter.py, and `generate_agent_referral_email()` in drafter.py are all defined but never called. In a codebase that has gone through iterations, dead code is expected, but having it in the most critical modules (drafter, lifecycle) is risky — it suggests the modules weren't cleaned up between iterations.

**`template_route` is not dead; it's a zombie**: I expected `template_route` to be fully dead (written nowhere, read nowhere). Instead, it's computed by `get_template_route()`, imported by drafter.py, called at line 119, and passed through to the context object. I found no prompt text that uses it, though I could not rule that out conclusively. It appears dead at the output but alive in the data flow.

**`enrich_leads_parallel()` has no `phase` parameter**: The documentation (AGENTS.md) describes a `phase` parameter that doesn't exist in the actual function signature. The doc was written for a version that was refactored.

**`scoring_pipeline` config section has no callers**: The config has an entire section (`scoring_pipeline`) with an accessor function (`get_scoring_pipeline()`) but I found no code that calls it. This is an entire config section that's loaded but unused.

**The webhook URL in seed-planter.py is hardcoded**: `https://workflows.residentliberal.com/webhook/jXjTXfBO3qsMMgtH/webhook/qualify-lead` — this is a real URL to a real service, sitting in a retired script. If that webhook endpoint still exists, it's a latent integration that could be triggered accidentally.

---

## Part 3: Rebuild Proposal (Revised for Agent-Driven Content Engine)

### 3.1 Proposed Module Structure for Agent-Driven Operations

The new architecture is organized around capabilities that OpenClaw can call, not around scripts a human runs. The directory structure reflects this:

```
gardener-v2/
├── capabilities/                 # Callable functions for OpenClaw
│   ├── research.py
│   │   new: produces structured research notes from observable evidence
│   │   replaces: business understanding currently in lib/drafter.py:build_context_object()
│   │
│   ├── drafting.py
│   │   new: generates touches using research notes and prompt files
│   │   replaces: lib/drafter.py (build_draft_prompt, build_historical_prompt)
│   │   drops: draft_email() (becomes draft_touch()), draft_batch(), draft_and_attach()
│   │   drops: why_this_now(), generate_agent_referral_email() (no callers)
│   │
│   ├── scoring.py
│   │   replaces: lib/scoring.py
│   │   new: add touch_history as scoring signal
│   │
│   ├── enrichment.py
│   │   replaces: lib/enrichment.py
│   │   new: timestamped enrichment with refresh patterns
│   │
│   ├── scheduling.py
│   │   new: determines which leads need attention and when
│   │   no current equivalent — this is new functionality
│   │
│   ├── sequencing.py
│   │   new: manages touch sequences and angle progression
│   │   no current equivalent — this is new functionality
│   │
│   └── state_queries.py
│       new: answers questions about lead state for OpenClaw
│       replaces: scattered queries currently in various scripts
│
├── state/
│   ├── backlog.py
│   │   new: atomic load/save with file locking
│   │   replaces: load_backlog()/save_backlog() from lib/enrichment.py
│   │   new: schema validation, touch_history and research_note structures
│   │
│   ├── schema.py
│   │   new: defines lead schema with validation rules
│   │   replaces: implicit schema currently in JSON structure
│   │
│   └── queries.py
│       new: optimized queries (e.g., "leads due for touch this week")
│       no current equivalent — currently requires scanning entire backlog
│
├── integrations/
│   ├── llm.py
│   │   replaces: _call_featherless() in lib/drafter.py
│   │   new: model-version tracking, structured errors for OpenClaw
│   │
│   ├── brave.py
│   │   replaces: lib/brave_enrich.py
│   │   new: timestamped results, evidence citation
│   │
│   ├── ctsos.py
│   │   replaces: CT SoS API logic currently in scripts/harvest.py
│   │   new: structured error handling for OpenClaw
│   │
│   ├── zoho.py
│   │   replaces: lib/zoho.py and all push scripts
│   │   new: push_touch() with engagement callback
│   │   consolidates: scripts/zoho-push.py, backlog-to-zoho.py, shepherd-to-zoho.py
│   │
│   └── events.py
│       new: handles Zoho webhook/callback for engagement signals
│       no current equivalent — engagement tracking is manual
│
├── prompts/
│   ├── research/
│   │   ├── v1.research.md           # replaces: implicit understanding in drafter
│   │   └── variants/                # future A/B tests
│   │
│   ├── touches/
│   │   ├── new_business/
│   │   │   ├── v1.touch.md          # replaces: build_draft_prompt() string
│   │   │   ├── v2.touch.md          # for A/B testing
│   │   │   └── angle_progression.yaml  # sequence logic
│   │   │
│   │   ├── historical/
│   │   │   └── v1.touch.md          # replaces: build_historical_prompt() string
│   │   │
│   │   └── lease_expiry/            # example of new touch type
│   │       └── v1.touch.md
│   │
│   └── prompt_registry.yaml         # maps sequence positions to prompt files
│
├── config/
│   ├── scoring.yaml                 # replaces: scoring sections from gardener.json
│   ├── enrichment.yaml              # replaces: brave_search and timing configs
│   ├── llm.yaml                     # API keys → environment variables
│   ├── zoho.yaml                    # OAuth credentials → environment variables
│   ├── scheduler.yaml               # touch intervals, sequence definitions
│   └── sequences.yaml               # which prompts for which lead types/positions
│
├── cli/                             # Thin wrapper for debugging
│   ├── main.py                      # single entry point with subcommands
│   ├── research_cmd.py              # capability: research a lead
│   ├── draft_cmd.py                 # capability: draft a touch
│   ├── schedule_cmd.py              # capability: query schedule
│   └── state_cmd.py                 # capability: query lead state
│   # Note: no harvest/enrich/draft/push scripts — those are capabilities
│
└── web/
    ├── dashboard.py                 # HTML review interface
    ├── api.py                       # REST API for webhooks
    └── review_queue.py              # touch review/approval interface
```

**Mapping of current files to new structure:**

| Current file | New location | Justification |
|---|---|---|
| `lib/scoring.py` | `capabilities/scoring.py` | Core scoring function |
| `lib/drafter.py` | `capabilities/drafting.py` + `prompts/` | Prompt logic separated from execution |
| `lib/enrichment.py` | `capabilities/enrichment.py` | Core enrichment logic |
| `lib/patterns.py` | `capabilities/scoring.py` (merged) | Pattern detection is scoring concern |
| `lib/verticals.py` | Dropped (unused in new flow) | Classification not needed for touches |
| `lib/lifecycle.py` | `capabilities/scheduling.py` (partially) | Timing logic, not windows |
| `lib/historical.py` | `capabilities/sequencing.py` (partially) | Milestone logic feeds sequence |
| `lib/config.py` | `config/` + `state/` | Split: config loading vs schema |
| `lib/zoho.py` | `integrations/zoho.py` | Consolidated push logic |
| `lib/brave_enrich.py` | `integrations/brave.py` | Renamed for consistency |
| `lib/dashboard.py`, `lib/backlog_dashboard.py`, `lib/webapp.py` | `web/` | Consolidated web interface |

**DROPPED files** (with updated justification per design decisions):
- `scripts/haberdasher.py` → Hat system retired; zombies removed from schema
- `scripts/seed-planter.py` → Template drafting retired; N8N webhook dead
- `scripts/morning-brief.py` → Linear orchestration replaced by agent decisions
- `scripts/harvest.py` → Becomes `integrations/ctsos.py` + capability calls
- `scripts/enrich-backlog.py` → Becomes `capabilities.enrichment.refresh_lead()`
- `scripts/draft-backlog.py` → Becomes `capabilities.drafting.draft_touch()`
- `scripts/zoho-push.py`, `backlog-to-zoho.py`, `shepherd-to-zoho.py` → Consolidated to `integrations.zoho.push_touch()`
- `scripts/historical-sweep.py` → Becomes `integrations.ctsos.fetch_historical()` + scheduling
- `scripts/classify-verticals.py` → Dropped; verticals not used in touch generation
- `scripts/audit-phantom-drafts.py` → Replaced by structured validation in `state/`
- `scripts/lead-heartbeat.py`, `lead-tracker.py`, `route-planner.py`, `sales-brief-generator.py` → Can be CLI commands if needed, not core

**NOTE**: The old `lib/lifecycle.py:get_agent_clusters()` and `lib/drafter.py:why_this_now()`, `generate_agent_referral_email()` remain dropped as dead code.

---

### 3.2 Proposed Lead Schema with Research and Touch History

**Preserved fields** (from current schema, adjusted for new architecture):

| Field | Type | Purpose in new system |
|---|---|---|
| `id` | str | Unique identifier (UUID preferred) |
| `name` | str | Business name |
| `entity_type` | str | LLC, PLLC, etc. |
| `naics` | str | NAICS code (if available) |
| `filing_date` | ISO date | Date of CT SoS registration |
| `email` | str | Contact email |
| `phone` | str | Primary phone (with `phone_sources`) |
| `city`, `state` | str | Location |
| `is_shell` | bool | Shell company flag |
| `score` | int | Quality score (0-100) |
| `priority` | float | Current priority (score × readiness) |
| `readiness_weight` | float | Reachability (phone + email + website) |
| `readiness_signals` | list[str] | Signals contributing to readiness |
| `domain` | str | Email domain |
| `website_url` | str | Business website |
| `website_exists` | bool | Whether website responds |
| `website_last_checked` | ISO date | When website status was last verified |
| `brave_summary` | str | Summary from Brave Search |
| `brave_phone` | str | Phone from Brave (with citation) |
| `brave_last_searched` | ISO date | When Brave was last queried |
| `county` | str | County from Brave or geocoding |
| `competitor_displacement` | bool | Competitor presence detected |
| `source` | str | "ct_sos", "historical", or future sources |
| `pushed_to_zoho` | bool | Has been pushed to Zoho |
| `zoho_id` | str | Zoho CRM record ID |
| `first_seen` | ISO date | First appearance in system |
| `last_updated` | ISO date | Last state change |
| `lead_type` | str | "new_business", "historical", etc. |

**RESOLUTION**: `naics_score` and `name_score` are **dropped**. They're component scores rolled into `score`. The new system doesn't need independent component scores for touch generation.

**New Research Note Structure** (`research_note` field):
```yaml
research_note:
  created_at: "2026-05-20T10:30:00Z"
  updated_at: "2026-05-20T10:30:00Z"  # updated if refreshed
  sources:
    brave_query: "Acme Corp Hartford CT"
    brave_url: "https://search.brave.com/search?q=..."
    website_url: "https://acmecorp.com"  # or null
    filing_data: true
  findings:
    business_description: "Medical practice specializing in cardiology"  # 1-2 sentences, evidence only
    workflow_detail: "Patient intake forms and medical records require printing"  # 1 sentence, evidence
    presentation_notable: "Website emphasizes 'cutting-edge cardiac care' and shows modern office"  # 1 sentence
    flags: "No obvious copier/printer references on website"  # 1 sentence, often empty
  evidence_citations:
    - "Brave result 1: 'Cardiology Associates of Hartford - Full-service cardiac care'"
    - "Website meta description: 'Leading cardiology practice...'"
    - "CT SoS: NAICS 621111 (Offices of Physicians)"
  inferences:  # Separate from evidence
    - "Likely has administrative staff handling paperwork"  # clearly labeled as inference
    - "May use electronic health records with printing needs"
  staleness_flags:
    website_changed: false  # detected via hash comparison
    days_since_evidence: 7
```

**New Touch History Structure** (`touches` array):
```yaml
touches:
  - id: "touch_001"  # UUID
    sequence_position: 1  # 1-indexed in this lead's sequence
    sent_at: "2026-05-20T14:30:00Z"
    channel: "email"  # v1: only email; v2+: "phone", "linkedin"
    status: "sent"  # "sent", "delivered", "opened", "clicked", "replied", "bounced"
    
    # Content
    subject: "Re: Your new Hartford practice"
    body: "Dear Dr. Smith, I noticed your new cardiology practice..."
    angle_chosen: "new_business_angle_a"  # angle identifier (angles are listed in prompts/prompt_registry.yaml)
    prompt_variant: "new_business/v1.touch.md"
    model_used: "deepseek-ai/DeepSeek-V3.1"
    
    # Context at send time (frozen snapshot)
    lead_snapshot:
      # Copy of relevant lead fields at moment of sending
      score: 84
      priority: 67.2
      research_note: {...}  # full research note at that time
      brave_summary: "..."   # key enrichment fields
      # This enables analysis without data drift
    
    # Engagement tracking (from Zoho webhook)
    engagement:
      delivered_at: "2026-05-20T14:31:00Z"
      opened_at: "2026-05-20T16:45:00Z"
      opened_count: 1
      clicked_links: []  # array of URLs clicked
      replied_at: null
      reply_content: null  # if replied, store first few chars
      bounce_reason: null
    
    # Drafting metadata
    draft_duration_ms: 3200
    tokens_used: 450
    banned_phrase_check: "passed"  # "passed", "failed_with_phrases"
    
    # Zoho integration
    zoho_message_id: "msg_abc123"
    zoho_campaign_id: "camp_xyz789"
```

**Field Additions for New Schema:**
- `research_note`: Structured research as above
- `touches`: Array of touch records
- `current_sequence_position`: Next touch position (null if sequence ended)
- `next_touch_due`: ISO date when next touch is scheduled
- `sequence_angle_progression`: Current angle strategy
- `engagement_summary`: Aggregated stats from all touches
- `phone_sources`: Array tracking source of each phone (["ct_sos", "brave", "domain"])
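
A sketch of how the three current phone fields could collapse into `phone` plus `phone_sources` during migration. Everything here is an assumption for illustration: the precedence order (CT SoS, then website, then Brave), the 10-digit US normalization, and the helper names. The existing `lib.patterns.clean_phone` may behave differently and is not used here.

```python
import re
from typing import Optional

# Assumed precedence: (current field name, source label for phone_sources)
PHONE_FIELDS = [("phone", "ct_sos"), ("domain_phone", "domain"), ("brave_phone", "brave")]


def normalize_phone(raw: Optional[str]) -> Optional[str]:
    """Reduce a US phone number to 10 digits, or None if it does not fit."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None


def consolidate_phones(lead: dict) -> dict:
    """Pick the first valid number and list every source that agrees with it."""
    phone = None
    sources = []
    for field, source in PHONE_FIELDS:
        number = normalize_phone(lead.get(field))
        if number is None:
            continue
        if phone is None:
            phone = number
        if number == phone:
            sources.append(source)
    return {"phone": phone, "phone_sources": sources}
```

Numbers that disagree with the chosen one are dropped in this sketch; a real migration might keep them for manual review instead.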

**Dropped Fields** (reaffirmed from design decision #9):
- `hat_assignment`, `hat_name`: Zombie fields from retired haberdasher
- `template_route`: Zombie field, never used in prompts
- All `mailing_jurisdiction_*`, `office_jurisdiction_*`, `billing_*` fields: Raw CT SoS data
- All diversity flags: Unused in pipeline
- `naics_score`, `name_score`: Component scores, not needed independently
- `enrichment_date`, `enrichment_method`: Replaced by timestamped fields
- `outreach`, `historical_needs_followup`, `followup_reason`: Replaced by touch history
- `brave_descriptions`, `brave_results_count`: Write-only data
- `competitor_brands_found`: Write-only data

---

### 3.3 Proposed Config Structure with Prompt Management

**Config files** (split from monolithic `gardener.json`):
```
config/
├── scoring.yaml          # NAICS tiers, PLLC detection, recency bonuses
├── enrichment.yaml       # Brave API, refresh intervals, timing configs
├── llm.yaml             # Models, API key (env var), temperature, max_tokens
├── zoho.yaml            # OAuth (env vars), field mappings, webhook URL
├── scheduler.yaml       # Touch intervals, sequence definitions
└── sequences.yaml       # Which prompts for which lead types/positions
```

**Prompt File Management**:

Prompt files live in `prompts/` directory with clear naming:
```
prompts/
├── research/
│   ├── v1.research.md           # Primary research prompt
│   └── v2.research.md           # Future variant
│
├── touches/
│   ├── new_business/
│   │   ├── v1.touch.md          # Touch 1 for new businesses
│   │   ├── v2.followup.md       # Touch 2 (follow-up)
│   │   └── angle_a.v1.touch.md  # Alternative angle
│   │
│   ├── historical/
│   │   ├── v1.touch.md          # Historical lead touch
│   │   └── lease_expiry.md      # Lease expiry angle
│   │
│   └── experimental/
│       └── ab_test_a.md         # A/B test variants
│
└── prompt_registry.yaml         # Registry that maps IDs to files
```

**Prompt Registry Example** (`prompt_registry.yaml`):
```yaml
prompts:
  research:
    default: "research/v1.research.md"
    variants:
      v2: "research/v2.research.md"
  
  touches:
    new_business:
      sequence:
        1: "touches/new_business/v1.touch.md"
        2: "touches/new_business/v2.followup.md"
        3: "touches/new_business/v3.followup.md"
      angles:
        angle_a: "touches/new_business/angle_a.v1.touch.md"
        angle_b: "touches/new_business/angle_b.v1.touch.md"
    
    historical:
      default: "touches/historical/v1.touch.md"
      lease_expiry: "touches/historical/lease_expiry.md"
    
    experimental:
      ab_test_a: "touches/experimental/ab_test_a.md"
      ab_test_b: "touches/experimental/ab_test_b.md"
```

**Versioning and Selection**:
- **File-based versioning**: `v1.research.md`, `v2.research.md`
- **Selection logic**: `capabilities/drafting.py` reads `prompt_registry.yaml`
- **Adding new variant**: Create file, add entry to registry
- **Runtime selection**: `draft_touch(lead, prompt_id="new_business/angle_a")`
- **No code changes needed** for new prompts
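
The selection logic could look roughly like this. It is a sketch only: `resolve_prompt` is a hypothetical function, the registry is shown as an already-parsed dict (a subset of the example above) rather than loaded from YAML, and the lookup order (angles, then named entries, then sequence position, then default) is an assumption.

```python
from typing import Optional

# Subset of the registry example, as it would look after YAML parsing.
REGISTRY = {
    "touches": {
        "new_business": {
            "sequence": {1: "touches/new_business/v1.touch.md",
                         2: "touches/new_business/v2.followup.md"},
            "angles": {"angle_a": "touches/new_business/angle_a.v1.touch.md"},
        },
        "historical": {
            "default": "touches/historical/v1.touch.md",
            "lease_expiry": "touches/historical/lease_expiry.md",
        },
    }
}


def resolve_prompt(registry: dict, prompt_id: str, position: Optional[int] = None) -> str:
    """Map an ID like 'new_business/angle_a' (or a lead type plus position) to a file."""
    lead_type, _, key = prompt_id.partition("/")
    section = registry["touches"][lead_type]
    if key:
        if key in section.get("angles", {}):
            return section["angles"][key]
        if isinstance(section.get(key), str):
            return section[key]
        raise KeyError(f"Unknown prompt id: {prompt_id}")
    if position is not None and position in section.get("sequence", {}):
        return section["sequence"][position]
    if "default" in section:
        return section["default"]
    raise KeyError(f"No prompt for {prompt_id} at position {position}")
```

Raising `KeyError` on an unknown ID, rather than silently falling back to a default, keeps registry typos visible.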

**Prompt File Format** (`touches/new_business/v1.touch.md`):
```markdown
---
id: new_business_v1
version: 1.0
description: "First touch for new business filings"
context_fields: [research_note, score, lead_type, touches]
banned_phrases: ["congratulations", "no pressure", "just checking in"]
max_length_words: 120
min_length_words: 80
---

# System Prompt

You are Mark Mazza, a Connecticut office equipment specialist who reads every new
business filing. You've helped dozens of local businesses set up their first
copiers and printers.

# Context

Lead research: {{ research_note.findings.business_description }}

Previous touches: {{ touches|length }} prior contacts
{% if touches %}
Last touch angle: {{ touches[-1].angle_chosen }}
{% endif %}

# Instructions

1. Write a 90-120 word email
2. Subject line: 4-8 words, reference their business specifically
3. Do NOT use: {{ banned_phrases|join(", ") }}
4. Anchor your approach: {{ research_note.findings.workflow_detail }}

# Signature

[Signature block - loaded from config/branding.yaml]
```
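
The `banned_phrases` and length limits in the front matter only help if every draft is checked against them. A minimal sketch of such a check, assuming a hypothetical `check_touch` function, whitespace-based word counting, and case-insensitive substring matching:

```python
from typing import List


def check_touch(body: str, banned_phrases: List[str], min_words: int, max_words: int) -> dict:
    """Check a drafted body against the prompt file's front-matter limits."""
    lowered = body.lower()
    found = [phrase for phrase in banned_phrases if phrase.lower() in lowered]
    word_count = len(body.split())
    return {
        # Values match the banned_phrase_check field in the touch record.
        "banned_phrase_check": "failed_with_phrases" if found else "passed",
        "phrases_found": found,
        "word_count": word_count,
        "length_ok": min_words <= word_count <= max_words,
    }


result = check_touch("Congratulations on the new office!",
                     ["congratulations", "no pressure"], min_words=1, max_words=10)
print(result["banned_phrase_check"], result["phrases_found"])
```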

---

### 3.4 Proposed Operations Cycle (Agent-Driven)

The new flow is **not** `harvest → enrich → draft → push`. It's a continuous cycle driven by OpenClaw queries and decisions:

**Cycle Step 1: Query "Which leads need attention?"**
```python
# OpenClaw calls:
needs_attention = scheduling.get_leads_needing_attention(
    window_start="2026-05-20T09:00:00Z",
    window_end="2026-05-20T17:00:00Z"
)
# Returns: [{lead_id, needed_action, urgency, reason}, ...]
```

**Cycle Step 2: Agent decides per lead**
For each lead in `needs_attention`, OpenClaw evaluates:
1. **Research freshness**: `research_note.staleness_flags.days_since_evidence > 7`
2. **Enrichment freshness**: `website_last_checked` older than threshold
3. **Touch due**: `next_touch_due` within window
4. **Sequence state**: `current_sequence_position` and available angles
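
The first three checks can be sketched as plain predicates. This is a hedged illustration: leads are treated as dicts here, the 30-day enrichment threshold is hypothetical (the proposal does not fix one), and timestamps without a timezone are assumed to be UTC.

```python
from datetime import datetime, timedelta, timezone

RESEARCH_MAX_AGE_DAYS = 7                # from the research freshness rule
ENRICHMENT_MAX_AGE = timedelta(days=30)  # hypothetical threshold


def parse_ts(value: str) -> datetime:
    """Parse an ISO date or timestamp; treat values without a timezone as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def needs_fresh_research(lead: dict) -> bool:
    note = lead.get("research_note")
    if not note:
        return True  # never researched
    return note["staleness_flags"]["days_since_evidence"] > RESEARCH_MAX_AGE_DAYS


def needs_enrichment_refresh(lead: dict, now: datetime) -> bool:
    checked = lead.get("website_last_checked")
    return checked is None or now - parse_ts(checked) > ENRICHMENT_MAX_AGE


def is_touch_due(lead: dict, window_end: datetime) -> bool:
    due = lead.get("next_touch_due")
    if due is None or lead.get("current_sequence_position") is None:
        return False  # sequence ended or nothing scheduled
    return parse_ts(due) <= window_end
```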

**Cycle Step 3: Execute capability calls**
Based on decisions:
```python
# Example sequence for a lead due for touch with stale research:
if needs_fresh_research(lead):
    research_note = capabilities.research.research_lead(lead)
    state.backlog.update_lead(lead.id, {"research_note": research_note})

if needs_enrichment_refresh(lead):
    # Pattern A: standard refresh
    refreshed = capabilities.enrichment.refresh_lead(lead.id, pattern="standard")
    # Pattern B: deepening (once per high-priority lead)
    if lead.priority > 70 and not lead.deep_enriched:
        deepened = capabilities.enrichment.refresh_lead(lead.id, pattern="deep")

if is_touch_due(lead):
    touch = capabilities.drafting.draft_touch(
        lead_id=lead.id,
        prompt_id="new_business/angle_a",
        position=lead.current_sequence_position
    )
    
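    # Review checks: banned_phrase_check() is assumed to return "passed"
    # or "failed_with_phrases", matching the touch record field
    if validation.banned_phrase_check(touch.body) == "passed":
        touch.status = "approved"
    else:
        # Flag for human review
        touch.status = "needs_review"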
    
    # Push to Zoho
    if touch.status == "approved":
        zoho_result = integrations.zoho.push_touch(touch)
        touch.sent_at = datetime.now()
        touch.zoho_message_id = zoho_result.message_id
        
        # Update lead state
        state.backlog.add_touch(lead.id, touch)
        state.backlog.update_lead(lead.id, {
            "current_sequence_position": lead.current_sequence_position + 1,
            "next_touch_due": scheduling.calculate_next_touch_date(lead, touch)
        })
```

**Cycle Step 4: Handle engagement signals**
```python
# Zoho webhook -> integrations.events.handle_engagement()
def handle_engagement(message_id, event_type, data):
    lead_id = state.queries.get_lead_id_for_message(message_id)
    touch = state.queries.get_touch_by_message_id(message_id)
    
    state.backlog.update_touch_engagement(lead_id, touch.id, {
        event_type: data,
        "updated_at": datetime.now()
    })
    
    # Update lead engagement summary
    state.backlog.update_lead_engagement_summary(lead_id)
    
    # Could trigger immediate follow-up for replies
    if event_type == "replied":
        scheduling.schedule_followup(lead_id, urgency="high")
```

**Key differences from current pipeline**:
1. **No linear flow**: Each lead gets individualized attention based on state
2. **Agent decisions**: OpenClaw decides what each lead needs, not a fixed script
3. **Stateful operations**: Everything queries/updates state, not file I/O
4. **Continuous cycle**: Not batch-oriented; runs continuously as OpenClaw queries

**Human involvement points**:
- Review queue for flagged touches (banned phrases, length issues)
- Escalated replies from leads
- Sequence strategy adjustments
- Research note quality review

---

### 3.5 Proposed Extension Points for New Architecture

**Scenario 1: Add a new touch type ("lease expiry approaching")**

**Current architecture**:
1. Edit `lib/historical.py` to add milestone detection
2. Edit `lib/drafter.py:build_historical_prompt()` to add lease-expiry language
3. Modify `scripts/historical-sweep.py` to filter for month-56 leads
4. No structured way to track which leads got which touch type

**New architecture**:
1. **Create prompt file**: `prompts/touches/historical/lease_expiry.md`
2. **Add to registry**: Edit `prompts/prompt_registry.yaml`:
   ```yaml
   touches:
     historical:
       lease_expiry: "touches/historical/lease_expiry.md"
   ```
3. **Add sequence logic**: Edit `config/sequences.yaml`:
   ```yaml
   sequences:
     historical_lease:
       trigger: "lead.business_age_months >= 56 and lead.business_age_months <= 58"
       prompt_id: "historical/lease_expiry"
       priority_boost: 0.2
   ```
4. **Scheduling picks it up automatically**: `scheduling.py` reads sequences config
5. **Touch records track it**: `touch.angle_chosen = "historical/lease_expiry"`

**Files changed**: 1 new prompt file, 2 config edits. **No code changes**.
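
The trigger depends on a `business_age_months` value that the schema does not yet define. A minimal sketch of how it could be derived from `filing_date`, counting only completed months; the function names are hypothetical, and how `scheduling.py` would evaluate the trigger string itself is left open.

```python
from datetime import date


def business_age_months(filing_date: date, today: date) -> int:
    """Whole months elapsed since filing (a partial month does not count)."""
    months = (today.year - filing_date.year) * 12 + (today.month - filing_date.month)
    if today.day < filing_date.day:
        months -= 1
    return months


def in_lease_expiry_window(filing_date: date, today: date,
                           start: int = 56, end: int = 58) -> bool:
    """Mirror the 56 to 58 month trigger from the sequences.yaml example."""
    return start <= business_age_months(filing_date, today) <= end
```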

**Scenario 2: Change the research prompt (add price list question)**

**Current architecture**:
1. Find and edit the prompt string buried in `lib/drafter.py:build_context_object()` logic
2. Re-run enrichment on all leads to get new research
3. No version tracking; can't A/B test

**New architecture**:
1. **Copy and modify**: `cp prompts/research/v1.research.md prompts/research/v2.research.md`
2. **Add question**: In v2, add "Does the business display pricing or service rates publicly?"
3. **Update registry**: `prompt_registry.yaml`:
   ```yaml
   research:
     default: "research/v1.research.md"
     variants:
       v2: "research/v2.research.md"
   ```
4. **Test on sample**: `capabilities.research.research_lead(lead_id, variant="v2")`
5. **Deploy selectively**: Update `config/scheduler.yaml`:
   ```yaml
   research_refresh:
     default_variant: "v1"
     test_cohort_percentage: 10  # 10% get v2 for A/B test
   ```

**Files changed**: 1 new prompt file, 2 config edits. **No code changes**.
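
For `test_cohort_percentage` to support a clean A/B comparison, a lead should land in the same cohort on every run. One common approach is hashing the lead ID into a bucket. The sketch below assumes that approach; `in_test_cohort`, `research_variant`, and the salt value are hypothetical.

```python
import hashlib


def in_test_cohort(lead_id: str, percentage: int, salt: str = "research_v2") -> bool:
    """Deterministically place a lead in one of 100 buckets based on its ID."""
    digest = hashlib.sha256(f"{salt}:{lead_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percentage


def research_variant(lead_id: str, percentage: int = 10) -> str:
    """Return 'v2' for the test cohort and 'v1' for everyone else."""
    return "v2" if in_test_cohort(lead_id, percentage) else "v1"
```

Changing the salt per experiment reshuffles the cohorts, so the same leads are not always the test subjects.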

**Scenario 3: Add a new lead source (RI Secretary of State)**

**Current architecture**:
1. Copy `scripts/harvest.py` to `scripts/ri-harvest.py`
2. Modify URL and field mappings
3. Duplicate scoring logic
4. Create new CLI entry point
5. Modify `scripts/morning-brief.py` to call it

**New architecture**:
1. **Create integration**: `integrations/risos.py` (reusing `integrations/ctsos.py` patterns)
2. **Add source value**: The schema's `source` field already accepts new sources; add "ri_sos" alongside "ct_sos"
3. **Reuse capabilities**: `capabilities.scoring.score_lead()` works for any source
4. **Configure**: Add to `config/enrichment.yaml`:
   ```yaml
   sources:
     ri_sos:
       api_url: "https://data.ri.gov/resource/..."
       field_mappings: { ... }
   ```
5. **Schedule harvesting**: Add to `config/scheduler.yaml`:
   ```yaml
   harvesting:
     ct_sos: "daily"
     ri_sos: "weekly"
   ```

**Files changed**: 1 new integration file, 2 config edits. **Core capabilities unchanged**.

**Scenario 4: Swap the drafting model with tracking**

**Current architecture**:
1. Edit `scripts/gardener.json` → `llm.models.draft`
2. Hope `_call_featherless()` handles new model's response format
3. No record of which model produced which draft
4. Phantom drafts discovered manually

**New architecture**:
1. **Update config**: `config/llm.yaml`:
   ```yaml
   models:
     draft:
       default: "deepseek-ai/DeepSeek-V3.1"
       alternatives:
         glm51: "zai-org/GLM-5.1"
         new_model: "new-provider/model-v2"
   ```
2. **Model-specific parsers**: `integrations/llm.py` has parser registry:
   ```python
   RESPONSE_PARSERS = {
       "deepseek-ai/DeepSeek-V3.1": parse_deepseek_response,
       "zai-org/GLM-5.1": parse_glm51_response,
       "new-provider/model-v2": parse_new_model_response,
   }
   ```
3. **Touch records track**: `touch.model_used = "new-provider/model-v2"`
4. **A/B test**: `config/scheduler.yaml`:
   ```yaml
   drafting:
     model_ab_test:
       cohort_a:
         model: "deepseek-ai/DeepSeek-V3.1"
         percentage: 50
       cohort_b:
         model: "new-provider/model-v2"
         percentage: 50
   ```
5. **Automatic validation**: Each batch validates response format

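**Files changed**: 1 config edit (`config/llm.yaml`), 1 new parser function plus its registry entry in `integrations/llm.py`. The optional A/B test adds a second config edit in `config/scheduler.yaml`. **Touch history tracks model**.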
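
**Sketch (assumptions labeled)**: One way the parser registry (step 2) and the 50/50 cohort split (step 4) could behave. The names `PARSERS`, `parse_plain_text`, `parse_response`, and `assign_model` are hypothetical, as is the `"text"` key in the raw response. Hashing the lead ID is an assumed design choice so that a lead stays in the same cohort across runs.

```python
import hashlib
from typing import Callable


def parse_plain_text(raw: dict) -> str:
    """Placeholder parser: assumes the draft text sits under a "text" key."""
    return raw["text"].strip()


# Stand-in for RESPONSE_PARSERS; the real parser functions are not shown in this document.
PARSERS: dict[str, Callable[[dict], str]] = {
    "deepseek-ai/DeepSeek-V3.1": parse_plain_text,
    "new-provider/model-v2": parse_plain_text,
}


def parse_response(model: str, raw: dict) -> str:
    """Dispatch to the model's parser; fail loudly for unregistered models."""
    try:
        parser = PARSERS[model]
    except KeyError:
        raise ValueError(f"No response parser registered for model {model!r}") from None
    return parser(raw)


def assign_model(lead_id: str, cohorts: list[tuple[str, int]]) -> str:
    """Pick a model deterministically from (model, percentage) pairs summing to 100."""
    if sum(pct for _, pct in cohorts) != 100:
        raise ValueError("cohort percentages must sum to 100")
    # Stable bucket in [0, 100) derived from the lead ID, not from random state.
    bucket = int(hashlib.sha256(lead_id.encode("utf-8")).hexdigest(), 16) % 100
    threshold = 0
    for model, pct in cohorts:
        threshold += pct
        if bucket < threshold:
            return model
    raise AssertionError("unreachable: percentages sum to 100")
```

Raising on an unregistered model replaces the "hope it parses" step in the current architecture with an explicit failure before any draft is stored.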

**Scenario 5: Refresh enrichment before generating next touch**

**Current architecture**:
1. Run `scripts/enrich-backlog.py --lead "Business Name"`
2. Wait for completion
3. Run `scripts/draft-backlog.py --lead "Business Name"`
4. No connection between the two steps

**New architecture**:
1. **OpenClaw calls**: `capabilities.enrichment.refresh_lead(lead_id, pattern="standard")`
2. **Returns structured result**:
   ```python
   {
       "success": True,
       "updated_fields": ["website_status", "brave_summary"],
       "timestamps_updated": ["website_last_checked", "brave_last_searched"],
       "next_refresh_recommended": "2026-05-27T10:00:00Z"
   }
   ```
3. **Lead state updated atomically** in `state/backlog.py`
4. **Drafting uses fresh data automatically** because it reads from current state
5. **Touch snapshot captures** the refreshed state in `lead_snapshot`

**Code path**: Single capability call with structured output. No manual steps.
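
**Sketch (assumptions labeled)**: How a caller might chain refresh and drafting using the structured result above. `refresh_then_draft` is a hypothetical wrapper; the `refresh` and `draft` callables stand in for the capability functions, whose real signatures are not confirmed here.

```python
from typing import Callable


def refresh_then_draft(
    lead_id: str,
    refresh: Callable[[str], dict],
    draft: Callable[[str], dict],
) -> dict:
    """Refresh enrichment, then draft only if the refresh reported success."""
    result = refresh(lead_id)
    if not result.get("success"):
        # Surface the failure instead of drafting from stale data.
        return {"drafted": False, "reason": "refresh_failed", "refresh": result}
    touch = draft(lead_id)
    return {"drafted": True, "touch": touch, "refresh": result}


# Stub capabilities for illustration only.
def fake_refresh(lead_id: str) -> dict:
    return {"success": True, "updated_fields": ["website_status"]}


def fake_draft(lead_id: str) -> dict:
    return {"lead_id": lead_id, "subject": "placeholder subject"}


outcome = refresh_then_draft("lead-123", fake_refresh, fake_draft)
```

Passing the capabilities in as callables keeps the wrapper testable without network access.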

---

### 3.6 Migration Strategy for Agent-Driven System

**Backlog Migration**:
1. **Transform existing backlog**:
   ```python
   # migration/transform_backlog.py
   # Sketch: helper functions and the "... etc" fields are defined elsewhere.
   new_backlog = []
   for lead in old_backlog:
       new_lead = {
           # Preserved fields
           "id": lead["id"],
           "name": lead["name"],
           # ... etc

           # Transformations
           "research_note": create_initial_research_note(lead),
           "touches": convert_existing_drafts_to_touches(lead),
           "current_sequence_position": calculate_from_touches(lead),
           "next_touch_due": schedule_first_touch(lead),
       }
       new_backlog.append(new_lead)  # collect results instead of discarding them
   ```
   
   **Touch history creation**: Existing `draft_subject`/`draft_body` become `touches[0]` with:
   - `sent_at`: `drafted_at` or `first_seen`
   - `status`: `sent` if `pushed_to_zoho`, else `drafted`
   - `engagement`: `unknown` for historical touches

2. **Parallel operation**:
   - **New system directory**: `gardener-v2/` (or `gardener/` if replacing)
   - **Shared backlog**: Both systems can read `cumulative-backlog.json` initially
   - **Transition phase**: One-way flow. The old system (`scripts/harvest.py`) keeps writing the shared backlog; the new system only reads it
   - **Cutover**: Old system stops writing; new system takes over
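
**Sketch (assumptions labeled)**: One possible shape for `convert_existing_drafts_to_touches`, following the touch history rule in step 1. The `subject` and `body` keys on the touch record are assumed names; the `sent_at`, `status`, and `engagement` rules come from the list above.

```python
def convert_existing_drafts_to_touches(lead: dict) -> list[dict]:
    """Turn a legacy draft stored on the lead into a one-element touch history."""
    if not lead.get("draft_subject") and not lead.get("draft_body"):
        return []  # nothing was ever drafted for this lead
    return [
        {
            "subject": lead.get("draft_subject", ""),  # assumed touch field name
            "body": lead.get("draft_body", ""),  # assumed touch field name
            "sent_at": lead.get("drafted_at") or lead.get("first_seen"),
            "status": "sent" if lead.get("pushed_to_zoho") else "drafted",
            "engagement": "unknown",  # historical touches have no engagement data
        }
    ]
```

Leads with no draft get an empty history rather than a placeholder touch, so sequence position can be derived from `len(touches)`.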

**What continues running during transition**:
- `scripts/harvest.py` → Continues (feeds shared backlog)
- `scripts/enrich-backlog.py` → Paused (new system handles refresh)
- `scripts/draft-backlog.py` → Paused (new system drafts touches)
- `scripts/zoho-push.py` → Paused (new system pushes)
- `scripts/historical-sweep.py` → Paused (new system handles via scheduling)

**Validation criteria for cutover**:
1. **Research notes**: 100 leads successfully researched, human-reviewed
2. **Touch generation**: 50 touches generated, pass banned-phrase check
3. **Zoho integration**: 10 test touches pushed, confirm delivery
4. **Engagement tracking**: Webhook receives and processes test events
5. **Scheduling**: Correctly identifies leads due for attention
6. **State queries**: All OpenClaw queries return structured results
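
**Sketch (assumptions labeled)**: The criteria above could be encoded as a single gate so that cutover is a checked decision, not a judgment call. All metric names here are hypothetical; only the thresholds (100, 50, 10) come from the list.

```python
# Numeric criteria 1-3: minimum counts required before cutover.
CUTOVER_THRESHOLDS = {
    "research_notes_reviewed": 100,
    "touches_passing_banned_phrase_check": 50,
    "zoho_test_pushes_delivered": 10,
}
# Criteria 4-6: yes/no confirmations.
CUTOVER_CHECKS = (
    "webhook_events_processed",
    "scheduling_correct",
    "state_queries_structured",
)


def cutover_blockers(counts: dict, checks: dict) -> list[str]:
    """Return human-readable reasons cutover is blocked; empty means go."""
    blockers = []
    for name, minimum in CUTOVER_THRESHOLDS.items():
        actual = counts.get(name, 0)
        if actual < minimum:
            blockers.append(f"{name}: {actual}/{minimum}")
    for name in CUTOVER_CHECKS:
        if not checks.get(name, False):
            blockers.append(f"{name}: not confirmed")
    return blockers
```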

**Rollback plan**:
1. **Backup**: Pre-migration backup of `cumulative-backlog.json`
2. **Feature flags**: New system runs with `--dry-run` flag initially
3. **Parallel reads**: Old system can resume anytime during transition
4. **Emergency rollback**: Switch `--backlog-path` to backup, restart old system
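
**Sketch (assumptions labeled)**: A minimal pre-migration backup for step 1, using only the standard library. `backup_backlog` and the timestamped naming scheme are hypothetical; the size comparison is a cheap sanity check, not a full integrity check.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_backlog(backlog_path: Path, backup_dir: Path) -> Path:
    """Copy the backlog to a timestamped file and return the backup path."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{backlog_path.stem}.{stamp}{backlog_path.suffix}"
    shutil.copy2(backlog_path, target)  # copy2 also preserves file metadata
    if target.stat().st_size != backlog_path.stat().st_size:
        raise OSError(f"backup size mismatch for {target}")
    return target
```

The returned path is what an emergency rollback would point the old system at.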

**Timeline (approximate)**:
- Week 1: Core state layer, schema validation
- Week 2: Research capability, enrichment refresh
- Week 3: Drafting capability with prompt files
- Week 4: Scheduling, Zoho integration, webhooks
- Weeks 5-6: Migration, testing, parallel run
- Week 7: Cutover if validation passes

---

### 3.7 What This Proposal Does NOT Solve

**From previous section (carried over)**:
- **Reply rates**: Architecture doesn't write better copy
- **Audience targeting**: PLLC focus is strategy, not code
- **CT SoS data quality**: Garbage in, garbage out
- **LLM hallucination**: Binding research notes to evidence reduces hallucination but doesn't eliminate it
- **Single-operator bus factor**: Still one operator

**New limitations specific to agent-driven architecture**:

**1. OpenClaw reasoning quality dependency**:
The architecture structures information well, but success depends on OpenClaw making good decisions about:
- Which leads need fresh research vs. using cached notes
- When to refresh enrichment vs. proceed with stale data
- Which touch angle to use based on sequence position
- How to handle engagement signals (reply vs. no reply)

Bad agent decisions will produce bad outcomes regardless of architecture quality.

**2. Evidence-limited research notes**:
For leads with no website and sparse Brave results, research notes will be generic:
```yaml
findings:
  business_description: "Business filed as LLC in Hartford"  # Only from CT SoS
  workflow_detail: "Unable to determine specific workflow"  # No evidence
  presentation_notable: "No website or online presence found"
  flags: "Limited observable evidence"
```
The architecture surfaces this limitation but cannot create evidence where none exists.

**3. Sequence design requires real data**:
The architecture supports sequences of any shape:
```yaml
sequences:
  new_business:
    steps:
      1: {delay_days: 0, prompt: "new_business/v1.touch.md", angle: "intro"}
      2: {delay_days: 7, prompt: "new_business/v2.followup.md", angle: "value"}
      3: {delay_days: 14, prompt: "new_business/v3.followup.md", angle: "closing"}
```
But **choosing the optimal sequence** (delays, angles, prompts) requires:
- Reply rate data from actual touches
- A/B test results
- Time-based engagement patterns

The architecture provides the framework; optimization requires real-world testing.
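
**Sketch (assumptions labeled)**: How `delay_days` could translate into a due date. This assumes `delay_days` is measured from the start of the sequence (the first touch) and that `position` is the number of steps already completed; the YAML above does not state either convention. `next_touch_due` as a function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Mirrors the delay_days values of the new_business sequence above.
NEW_BUSINESS_STEPS = {
    1: {"delay_days": 0},
    2: {"delay_days": 7},
    3: {"delay_days": 14},
}


def next_touch_due(steps: dict, sequence_start: datetime, position: int) -> Optional[datetime]:
    """Return when step position + 1 is due, or None when the sequence is finished."""
    next_step = steps.get(position + 1)
    if next_step is None:
        return None
    # Assumption: delays are offsets from sequence start, not from the previous touch.
    return sequence_start + timedelta(days=next_step["delay_days"])


start = datetime(2026, 5, 19, 10, 0, tzinfo=timezone.utc)
due = next_touch_due(NEW_BUSINESS_STEPS, start, position=1)
```

If delays are instead meant to be relative to the previous touch, only the final line of the function changes, which is why the convention is worth fixing in the schema.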

**4. Webhook reliability**:
Zoho webhook delivery isn't guaranteed. The architecture includes:
- Webhook endpoint in `integrations/events.py`
- Fallback polling for missed events
- Engagement state reconciliation

But missed engagement signals could still occur, affecting sequence decisions.
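
**Sketch (assumptions labeled)**: Reconciliation between webhook and polling feeds can be as simple as de-duplicating on an event ID. The `event_id` and `occurred_at` fields are assumed names, not confirmed Zoho payload fields, and `occurred_at` is assumed to be an ISO-8601 UTC string in a single consistent format so that string order matches time order.

```python
def reconcile_events(webhook_events: list[dict], polled_events: list[dict]) -> list[dict]:
    """Merge both feeds, dropping duplicates by event ID, oldest first."""
    merged: dict[str, dict] = {}
    for event in [*webhook_events, *polled_events]:
        # First copy wins, so a webhook delivery is preferred over its polled duplicate.
        merged.setdefault(event["event_id"], event)
    return sorted(merged.values(), key=lambda e: e["occurred_at"])


webhook = [{"event_id": "e2", "occurred_at": "2026-05-19T12:00:00Z", "type": "reply"}]
polled = [
    {"event_id": "e1", "occurred_at": "2026-05-19T09:00:00Z", "type": "open"},
    {"event_id": "e2", "occurred_at": "2026-05-19T12:00:00Z", "type": "reply"},
]
events = reconcile_events(webhook, polled)
```

This recovers events the webhook missed, but only as fast as the polling interval allows, so sequence decisions made between polls can still act on incomplete engagement data.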

**5. State consistency at scale**:
File-based locking (`state/backlog.py`) works for single-process OpenClaw. If multiple processes emerge:
- Need database backend (SQLite → PostgreSQL)
- Transaction isolation becomes critical
- The architecture anticipates this but v1 uses file locking
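
**Sketch (assumptions labeled)**: What v1 file locking could look like. This is not the actual `state/backlog.py`; `locked` and `update_backlog` are hypothetical names, the backlog is assumed to be a JSON list, and `fcntl.flock` is POSIX-only and advisory, so it protects only against processes that take the same lock.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def locked(lock_path: Path):
    """Hold an exclusive advisory lock for the duration of the block (POSIX only)."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def update_backlog(path: Path, mutate) -> None:
    """Read, mutate, and atomically rewrite the JSON backlog under a lock."""
    with locked(path.with_suffix(path.suffix + ".lock")):
        data = json.loads(path.read_text()) if path.exists() else []
        mutate(data)
        # Write to a temp file in the same directory, then rename over the original.
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        with os.fdopen(fd, "w") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
```

The lock serializes writers and the rename prevents readers from seeing a half-written file, but neither gives multi-statement transactions, which is the point at which a database backend becomes necessary.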

**6. Prompt engineering skill transfer**:
Moving prompts from code to files makes them editable, but:
- Prompt engineering skill still required
- Bad prompts still produce bad touches
- The separation just makes iteration faster

**The architecture enables** better touch sequences, evidence-grounded research, and agent-driven operations. **It does not guarantee** successful outreach, which still depends on strategy, copywriting, and lead quality.
