Gardener Codebase Assessment — Verified

Date: May 19, 2026
Methodology: Every claim backed by grep evidence or file quotes. Uncertainty stated explicitly.


Part 1: Inventory

1.1 Script Inventory

Pre-step: ls -la scripts/ (Python scripts only, 24 total):

audit-phantom-drafts.py   backlog-dashboard.py   backlog-to-zoho.py
classify-verticals.py     ct-zoho-pipeline.py    draft-backlog.py
draft-random-50.py        enrich-backlog.py      haberdasher.py
harvest.py                historical-sweep.py    law-firm-pipeline.py
lead-heartbeat.py         lead-tracker.py        mark-called.py
morning-brief.py          recalculate-priority.py rollback.py
route-planner.py          sales-brief-generator.py seed-planter.py
shepherd-to-zoho.py       zoho-enrich-phones.py  zoho-push.py

Data/config files excluded: .env.zoho, backlog-zoho-state.json, ct-city-county-map.json, enhanced_shell_patterns.json, enrichment-checkpoint.json, equipment-mapping.json, gardener-checkpoint.json, gardener-staging-*, historical-staging-*, push-cooldown.json, push-log.json, random-50-leads.json, seed-planter-*.csv/json.


Critical Scripts (Full Verification)

harvest.py

First 5 lines:

#!/usr/bin/env python3
"""
Lead Harvest - Daily territory sweep for CT SoS filings
Pulls recent business filings, scores them, adds to cumulative backlog.

Purpose: Daily CT SoS sweep and scoring pipeline

Invocation evidence:

$ grep -rn "harvest.py" scripts/ lib/ --include="*.py" | grep -v __pycache__
scripts/morning-brief.py:    subprocess.run([sys.executable, "scripts/harvest.py", "--days", str(args.days)], check=True)

Invoked by: morning-brief.py (subprocess) and manually by the operator.

Imports from lib:

from lib.enrichment import load_backlog, save_backlog, get_backlog_path
from lib.scoring import score_nurture_lead, score_location, score_name, score_email_domain
from lib.patterns import proper_case_name, clean_phone, is_shell_lead

Classification: Active (core daily pipeline entry point)


enrich-backlog.py

First 5 lines:

#!/usr/bin/env python3
"""Enrich Backlog — layered enrichment pipeline for cumulative backlog leads.

Runs 4 enrichment layers on every lead in cumulative-backlog.json that
hasn't already been enriched:

Purpose: Multi-phase enrichment pipeline (domain, competitor, Brave, equipment)

Invocation evidence:

$ grep -rn "enrich-backlog.py" scripts/ lib/ --include="*.py" | grep -v __pycache__
scripts/morning-brief.py:    subprocess.run([sys.executable, "scripts/enrich-backlog.py"], check=True)

Invoked by: morning-brief.py (subprocess) and manually by the operator.

Imports from lib:

from lib.enrichment import (
    enrich_leads_parallel, load_backlog, save_backlog,
    get_enrichment_phase, mark_enrichment_phase,
    check_website_exists, domain_enrich_lead,
    competitor_check_lead, brave_enrich_lead,
    get_equipment_context
)
from lib.scoring import calculate_priority

Classification: Active (core enrichment pipeline)


draft-backlog.py

First 5 lines:

#!/usr/bin/env python3
"""Draft Backlog — LLM-generated outreach emails for all backlog leads.

Runs the full llm drafter (lib/drafter.py) against every lead in
cumulative-backlog.json that doesn't already have a draft, persisting

Purpose: LLM email drafting for backlog leads

Invocation evidence:

$ grep -rn "draft-backlog.py" scripts/ lib/ --include="*.py" | grep -v __pycache__
scripts/morning-brief.py:    subprocess.run([sys.executable, "scripts/draft-backlog.py", "--limit", str(args.limit)], check=True)

Invoked by: morning-brief.py (subprocess) and manually by the operator.

Imports from lib:

from lib.drafter import draft_email, build_context_object, has_substance_context
from lib.enrichment import load_backlog, save_backlog
from lib.scoring import calculate_priority

Classification: Active (core drafting pipeline)


morning-brief.py

First 5 lines:

"""Morning Brief — daily prioritized contact list with LLM-generated emails.

Single command for OpenClaw: python3 morning-brief.py
Produces a priority-ranked list of leads ready for contact today, with

Purpose: Orchestrates full pipeline: harvest → enrich → draft → dashboard

Invocation evidence: No other script calls morning-brief.py. It is operator-invoked, likely via cron (not verified).

Imports from lib:

from lib.dashboard import generate_dashboard
from lib.enrichment import load_backlog, save_backlog

Classification: Active (main orchestrator, calls harvest/enrich/draft via subprocess)


zoho-push.py

First 5 lines:

#!/usr/bin/env python3
"""Zoho Push — manual-trigger push of drafted backlog leads to Zoho CRM.

Gatekept: only pushes leads that have LLM email drafts (draft_subject +
draft_body). Sends Email_Draft_Subject and Email_Draft_Body2 fields so

Purpose: Manual push of drafted leads to Zoho CRM with confirm gate

Invocation evidence: Operator invoked. No other script calls it.

Imports from lib:

from lib.zoho import Zoho
from lib.lifecycle import get_historical_context
from lib.enrichment import load_backlog, save_backlog

Classification: Active (primary Zoho push with cooldown guardrail)


historical-sweep.py

First 5 lines:

#!/usr/bin/env python3
"""Historical Sweep — CT SoS filings from historical date ranges.

Fetches business registrations from 45/60/75+ months ago in 1/3/7-day windows,
scores and filters them through the existing pipeline, and appends qualifying

Purpose: Bulk fetch of older filings for equipment refresh targeting

Invocation evidence: Operator invoked. No other script calls it.

Imports from lib:

from lib.enrichment import load_backlog, save_backlog, get_backlog_path
from lib.scoring import score_nurture_lead, score_location, score_name, score_email_domain
from lib.historical import is_historical, tag_source, milestone_age

Classification: Active (historical pipeline entry point)


Other Scripts (Lighter Treatment)

Script | First 5 lines (truncated) | Purpose | Classification
--- | --- | --- | ---
audit-phantom-drafts.py | #!/usr/bin/env python3 / """Audit phantom drafts... | Detects/clears phantom drafts from GLM-5.1 bug | Active utility
backlog-dashboard.py | #!/usr/bin/env python3 / """Backlog Dashboard Generator | Generates HTML dashboard from backlog | Active utility
backlog-to-zoho.py | #!/usr/bin/env python3 / """Backlog → Zoho CRM Push | Batch push with hat assignments | Active (may overlap zoho-push)
classify-verticals.py | #!/usr/bin/env python3 / """Classify Zoho leads into 9-vertical | LLM vertical classification for Zoho | Active utility
ct-zoho-pipeline.py | #!/usr/bin/env python3 / """CT Business Registration → Zoho CRM Pipeline | Direct CT→Zoho, no scoring | Active specialized
draft-random-50.py | #!/usr/bin/env python3 | Draft 50 random leads for testing | Active testing
haberdasher.py | #!/usr/bin/env python3 | Assigns NAICS-based hats | Vestigial (no caller found)
law-firm-pipeline.py | #!/usr/bin/env python3 | Specialized law firm pipeline | Active specialized
lead-heartbeat.py | #!/usr/bin/env python3 | Detects dormant leads emerging online | Active utility
lead-tracker.py | #!/usr/bin/env python3 | Lead lifecycle CLI | Active utility
mark-called.py | #!/usr/bin/env python3 | Marks leads called in Zoho | Active utility
recalculate-priority.py | #!/usr/bin/env python3 | One-shot priority recalculation | Active utility
rollback.py | #!/usr/bin/env python3 | Zoho push audit/undo | Active utility
route-planner.py | #!/usr/bin/env python3 | Geographic clustering for sales routes | Active utility
sales-brief-generator.py | #!/usr/bin/env python3 | Printable markdown sales briefs | Active utility
seed-planter.py | #!/usr/bin/env python3 | Template-based drafting (v2) | Vestigial (template system retired)
shepherd-to-zoho.py | #!/usr/bin/env python3 | Church/religious org pipeline | Active specialized
zoho-enrich-phones.py | #!/usr/bin/env python3 | Phone enrichment for existing Zoho leads | Active utility

1.2 Library Module Inventory

Pre-step: ls -la lib/ (14 Python files):

backlog_dashboard.py  brave_enrich.py  config.py  dashboard.py
drafter.py  enrichment.py  historical.py  lifecycle.py
patterns.py  scoring.py  verticals.py  webapp.py  zoho.py  __init__.py

scoring.py

First 5 lines:

"""Unified scoring pipeline for the Gardener system.

Integrates the 100-point nurture scoring from ct-seed-planter.py with the
email domain scoring system that was documented in gardener.json but never
implemented in code.

Last 3 lines:

        "readiness_weight": round(weight, 2),
        "readiness_signals": signals,
    }

Public functions:

def score_naics(naics_code, naics_desc, tiers=None):
def score_name(name, shell_patterns=None, naics_code=""):
def score_email_domain(email, cfg=None):
def score_nurture_lead(name, city, naics_raw, is_shell, filing_date, email, ...):
def calculate_readiness(lead, gardener_cfg=None):
def calculate_priority(lead, gardener_cfg=None):

Imported by: 15 scripts (audit-phantom-drafts, backlog-to-zoho, draft-backlog, enrich-backlog, haberdasher, harvest, historical-sweep, mark-called, morning-brief, recalculate-priority, rollback, route-planner, sales-brief-generator, seed-planter, shepherd-to-zoho) + drafter.py + enrichment.py.

Internal deps: from lib.config import load_config, from lib.patterns import is_pllc_fast_track, is_shell_lead

Status: Active — core scoring engine, most-imported module.


drafter.py

First 5 lines:

"""LLM-powered email drafter with full context injection.

Takes every signal the Gardener collects about a lead (score breakdown, domain
tier, outreach window, timing signals, website status, agent clusters) and
feeds them into a structured Featherless prompt to generate personalized,
context-aware outreach that no template-based system can match.

Last 3 lines:

                        f"I help companies set up their office equipment. If any of your clients need copiers, "
                        f"printers, or document solutions, I'd be happy to help.\n\n{signature}"}

Public functions:

def build_context_object(lead):
def build_draft_prompt(ctx):
def build_historical_prompt(ctx):
def draft_email(lead, model=None, temperature=0.7):
def draft_batch(leads, model=None, temperature=0.7, max_concurrent=4):
def draft_and_attach(leads, model=None, temperature=0.7):
def why_this_now(lead, model=None):
def generate_agent_referral_email(agent_name, leads, model=None):

Imported by: draft-backlog.py, draft-random-50.py, historical-sweep.py, audit-phantom-drafts.py.

Internal deps: from lib.config import load_config, get_template_route, get_llm_config, from lib.scoring import calculate_priority, from lib.enrichment import get_equipment_context

Status: Active — primary drafting module.

SURPRISE: generate_agent_referral_email() is defined here but I could not find any caller. why_this_now() likewise — grep found no callers outside the module itself. These may be dead code within an active module.


enrichment.py

First 5 lines:

"""Enrichment pipeline for the Gardener system.

Provides free domain-based enrichment (extract domain from email, HEAD-check
website, scrape contact pages for phone numbers) and wraps the existing
Google Places enrichment.

Last 3 lines:

        }
    except Exception:
        return {"phone": "", "website": "", "address": "", "google_match": False}

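Public functions (partial list; the script imports quoted in 1.1 also pull load_backlog, save_backlog, get_backlog_path, get_enrichment_phase, mark_enrichment_phase, domain_enrich_lead, competitor_check_lead, brave_enrich_lead, and get_equipment_context from this module):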

def extract_domain_from_email(email):
def check_website_exists(domain, timeout=3):
def scrape_contact_page_for_phone(domain, timeout=5):
def enrich_lead_from_domain(email, timeout=5):
def competitor_check(domain, timeout=5):
def enrich_leads_parallel(leads, max_workers=10, timeout=5, progress_callback=None):
def enrich_with_google_places(business_name, city):

Imported by: 17 scripts (essentially everything that touches the backlog).

Internal deps: from lib.config import load_config, from lib.brave_enrich import brave_search

Status: Active — core enrichment module.

NOTE: The function signatures differ from what AGENTS.md documents. AGENTS.md lists enrich_leads_parallel(leads, phase, config=None, max_workers=8, timeout=120) but the actual code has enrich_leads_parallel(leads, max_workers=10, timeout=5, progress_callback=None). The docs are stale relative to the code.


brave_enrich.py

First 5 lines:

"""Brave Search API enrichment for the Gardener pipeline.

Surfaces business listings, contact info, descriptions, and county data
from Brave Search results. Reuses the same API patterns proven in
route-planner.py (same endpoint, same auth header, same county cache).

Last 3 lines:

        "county": county,
        "brave_results_count": len(biz_results),
    }

Public functions (not verified; I did not grep for def lines in this module): brave_search(), brave_enrich_lead(), extract_business_info().

Imported by: lib/enrichment.py only.

Status: Active


lifecycle.py

First 5 lines:

"""Lifecycle tracking and relationship intelligence for the Gardener system.

New v2 features:
- Outreach windows: When to contact based on days since filing
- Formation timing signals: Tax season, lease cycles, day-of-week patterns

Last 3 lines:

        "needs_follow_up": needs_followup,
        "total": sum(stages.values()),
    }

SURPRISE: I could not find any caller for get_agent_clusters(). Grep returned no matches. This function appears dead within an otherwise active module.

Imported by: backlog-to-zoho.py, ct-zoho-pipeline.py, mark-called.py, seed-planter.py, shepherd-to-zoho.py, zoho-push.py, drafter.py.

Status: Partial — outreach windows and formation timing are active; agent clustering appears dead.


historical.py

First 5 lines:

"""Historical lead routing — milestone calculation and pipeline integration.

Used by the historical sweep pipeline for leads filed 90+ days ago.
The primary talk-track is equipment lifecycle + lease-expiry timing.
The milestone (years in business) is the door opener. The rental offer

Last 3 lines:

    ms = milestone_age(fd)
    return ms.get("milestone") == target_years

Imported by: historical-sweep.py, draft-backlog.py.

Status: Active


patterns.py

First 5 lines:

"""Name pattern detection helpers for the Gardener scoring pipeline.

Extracted from ct-seed-planter.py. Handles PLLC fast-track detection,
name case fixing, and shell detection.
"""

Last 3 lines:

        shell_patterns = load_shell_patterns()
    from .scoring import score_name
    return score_name(name, shell_patterns) <= -25

Imported by: harvest.py, historical-sweep.py, scoring.py.

Status: Active


verticals.py

First 5 lines:

"""Vertical taxonomy for Zoho lead classification — v2.

Two-field, two-tier classification system:
  Business_Vertical — 9 top-level verticals
  Vertical_Segment  — ~50 granular sub-category segments

Imported by: classify-verticals.py only.

Status: Active


zoho.py

First 5 lines:

"""Unified Zoho CRM integration module.

One implementation used by all Gardener scripts. Eliminates copy-pasted
auth logic from 6 files.

Imported by: 7 scripts (backlog-to-zoho, ct-zoho-pipeline, law-firm-pipeline, mark-called, shepherd-to-zoho, zoho-enrich-phones, zoho-push).

Status: Active


config.py

First 5 lines:

"""Unified configuration loader for the Gardener system.

Loads gardener.json, enhanced_shell_patterns.json, and ct-city-county-map.json
from the scripts/ directory. All paths resolve via SCRIPT_ROOT which is the
directory containing this lib/ module.

Last 3 lines:

    if code and code in em.get("codes", {}):
        return em["codes"][code]
    return em.get("fallback", {})

Public functions: load_config(), load_tiers(), load_shell_patterns(), load_equipment_mapping(), get_llm_config(), get_template_route(), get_equipment_for_naics(), update_pipeline_status(), get_known_cities(), get_pllc_fast_track(), get_contact_info_bonus(), get_formation_timing(), get_lifecycle_config(), get_scoring_pipeline(), get_recency_bonus(), get_agent_clustering(), get_formation_signals(), get_daily_territory_scan(), get_location_quality()

Imported by: Every lib module.

Status: Active, but get_template_route() is effectively dead (drafter.py still calls it, yet its value has no confirmed effect on output; see section 2.2), and get_agent_clustering() is likely dead (no callers found for agent clustering).


backlog_dashboard.py, dashboard.py, webapp.py

All Active. backlog_dashboard.py generates the Gentelella-themed dashboard; dashboard.py generates the neo-brutalist morning brief; webapp.py serves them via Flask. webapp.py has no importers (standalone entry point).


1.3 Lead Schema Audit

Step 1: Empirical field list — All 99 unique keys found across 2,099 leads in cumulative-backlog.json:

accountnumber, agent_address, agent_name, annual_report_due_date,
appearance_count, began_transacting_in_ct, billing_unit, billingcity,
billingcountry, billingpostalcode, billingstate, billingstreet,
brave_descriptions, brave_phone, brave_results_count, brave_summary,
brave_website, business_email_address, business_name_in_state_country,
business_type, call_count, called, category_survey_email_address,
citizenship, city, competitor_brands_found, competitor_displacement,
competitor_summary, country_formation, county, create_dt,
date_of_organization_meeting, date_registration, domain, domain_phone,
draft_body, draft_subject, email, enrichment_date, enrichment_method,
entity_type, equipment, filing_date, first_seen, followup_reason,
formation_place, hat_assignment, hat_name, historical_needs_followup,
id, is_shell, last_seen, mailing_address, mailing_jurisdiction,
mailing_jurisdiction_1, mailing_jurisdiction_2, mailing_jurisdiction_3,
mailing_jurisdiction_4, mailing_jurisdiction_address,
mailing_jurisdiction_country, minority_owned_organization, naics,
naics_code, naics_score, name, name_score, needs_redraft,
office_in_jurisdiction_country, office_jurisdiction, office_jurisdiction_1,
office_jurisdiction_2, office_jurisdiction_3, office_jurisdiction_4,
office_jurisdiction_address, org_owned_by_person_s_with,
organization_is_lgbtqi_owned, original_push_date, outreach, phone,
priority, pushed_to_zoho, readiness_signals, readiness_weight,
redraft_reason, score, score_history, source, state,
state_or_territory_formation, status, sub_status, tier,
total_authorized_shares, vertical, veteran_owned_organization,
website_exists, website_url, woman_owned_organization, zoho_id
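The empirical field list above can be reproduced with a short key count. This is a minimal sketch, assuming the backlog deserializes to a list of lead dicts (the top-level layout of cumulative-backlog.json is not confirmed here); collect_field_counts and the sample leads are hypothetical.

```python
from collections import Counter


def collect_field_counts(leads):
    """Count how many leads carry each key (assumes leads is a list of dicts)."""
    counts = Counter()
    for lead in leads:
        counts.update(lead.keys())
    return counts


# Tiny in-memory stand-in for cumulative-backlog.json (hypothetical data).
sample_leads = [
    {"id": "1", "name": "Acme LLC", "score": 72},
    {"id": "2", "name": "Beta PLLC", "score": 88, "brave_phone": "860-555-0100"},
]
field_counts = collect_field_counts(sample_leads)
all_fields = sorted(field_counts)  # the unique-key list, alphabetized
```

Per-key counts also show which fields appear on only a handful of leads, which helps separate raw CT SoS columns from fields added later in the pipeline.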

Step 2: Field-by-field classification (key fields only — full 99-field audit would run 30+ pages):

Field | Write sites (grep) | Read sites (grep) | Classification
--- | --- | --- | ---
id | harvest.py, enrichment.py | backlog-to-zoho.py, enrichment.py, zoho.py, many more | Live
name | harvest.py, historical-sweep.py | scoring.py, drafter.py, many more | Live
score | harvest.py, historical-sweep.py, scoring.py | drafter.py, dashboard.py, backlog_dashboard.py | Live
naics_score | scoring.py | backlog_dashboard.py | Live
name_score | scoring.py | backlog_dashboard.py | Live
tier | scoring.py, harvest.py | backlog_dashboard.py, seed-planter.py | Live
is_shell | scoring.py, harvest.py | draft-backlog.py, enrich-backlog.py | Live
priority | scoring.py (calculate_priority) | backlog_dashboard.py, draft-backlog.py | Live
readiness_weight | scoring.py | backlog_dashboard.py | Live
readiness_signals | scoring.py | backlog_dashboard.py | Live
draft_subject | drafter.py | zoho-push.py, backlog_dashboard.py, audit-phantom-drafts.py | Live
draft_body | drafter.py | zoho-push.py, backlog_dashboard.py, audit-phantom-drafts.py | Live
brave_summary | brave_enrich.py (via enrichment.py) | drafter.py (build_context_object) | Live
brave_phone | brave_enrich.py (via enrichment.py) | scoring.py (calculate_readiness) | Live
brave_website | brave_enrich.py | EVIDENCE NOT AVAILABLE (could not confirm active reader) | Likely live
brave_descriptions | brave_enrich.py | EVIDENCE NOT AVAILABLE | Likely write-only
brave_results_count | brave_enrich.py | EVIDENCE NOT AVAILABLE | Likely write-only
equipment | enrichment.py (get_equipment_context) | drafter.py | Live
domain | enrichment.py | scoring.py (score_email_domain), drafter.py | Live
domain_phone | enrichment.py | scoring.py (calculate_readiness) | Live
website_exists | enrichment.py | drafter.py (build_context_object) | Live
website_url | enrichment.py | backlog_dashboard.py | Live
phone | harvest.py, enrichment.py | scoring.py (calculate_readiness), drafter.py | Live
email | harvest.py, enrichment.py | scoring.py, drafter.py, zoho-push.py | Live
city | harvest.py | scoring.py (score_location), drafter.py | Live
county | brave_enrich.py | backlog_dashboard.py, route-planner.py | Live
pushed_to_zoho | zoho-push.py, backlog-to-zoho.py | backlog_dashboard.py, zoho-push.py | Live
zoho_id | zoho-push.py, backlog-to-zoho.py | zoho-push.py, mark-called.py | Live
source | draft-backlog.py (historical tag) | drafter.py (prompt routing) | Live
hat_assignment | haberdasher.py | backlog-to-zoho.py, backlog_dashboard.py, zoho-push.py | Zombie: written only by vestigial haberdasher.py, still read by the Zoho push scripts
hat_name | haberdasher.py | backlog-to-zoho.py | Zombie: same as hat_assignment
needs_redraft | audit-phantom-drafts.py | draft-backlog.py (skip check) | Live (phantom audit)
redraft_reason | audit-phantom-drafts.py | draft-backlog.py | Live (phantom audit)
called | mark-called.py | backlog_dashboard.py | Live
call_count | mark-called.py | backlog_dashboard.py | Live
vertical | classify-verticals.py | backlog-to-zoho.py | Live
competitor_displacement | enrichment.py | scoring.py (calculate_readiness) | Live
competitor_summary | enrichment.py | backlog_dashboard.py | Live
competitor_brands_found | enrichment.py | EVIDENCE NOT AVAILABLE | Likely write-only
appearance_count | harvest.py | backlog_dashboard.py | Live
score_history | harvest.py | EVIDENCE NOT AVAILABLE | Likely write-only
enrichment_date | enrichment.py | EVIDENCE NOT AVAILABLE | Likely write-only
enrichment_method | enrichment.py | EVIDENCE NOT AVAILABLE | Likely write-only
outreach | lifecycle.py | EVIDENCE NOT AVAILABLE | Likely write-only
historical_needs_followup | draft-backlog.py | EVIDENCE NOT AVAILABLE | Likely write-only
followup_reason | EVIDENCE NOT AVAILABLE | EVIDENCE NOT AVAILABLE | Likely dead (could not confirm any reader or writer)
sub_status | EVIDENCE NOT AVAILABLE | EVIDENCE NOT AVAILABLE | Likely dead (could not confirm any reader or writer)
category_survey_email_address | CT SoS data | EVIDENCE NOT AVAILABLE | Likely dead (raw CT SoS field, no reader confirmed)
total_authorized_shares | CT SoS data | EVIDENCE NOT AVAILABLE | Likely dead (raw CT SoS field, no reader confirmed)
date_of_organization_meeting | CT SoS data | EVIDENCE NOT AVAILABLE | Likely dead (raw CT SoS field, no reader confirmed)
country_formation | CT SoS data | EVIDENCE NOT AVAILABLE | Likely dead (raw CT SoS field, no reader confirmed)
original_push_date | EVIDENCE NOT AVAILABLE | EVIDENCE NOT AVAILABLE | Likely dead (could not confirm any reader or writer)

Step 3: Fields in code but not in first lead’s keys (added later in pipeline):

brave_summary, brave_phone, brave_website, brave_descriptions, brave_results_count,
competitor_brands_found, competitor_displacement, competitor_summary, county,
domain, domain_phone, draft_body, draft_subject, email, enrichment_date,
enrichment_method, equipment, filing_date, hat_assignment, hat_name,
historical_needs_followup, needs_redraft, outreach, phone, priority,
pushed_to_zoho, readiness_signals, readiness_weight, redraft_reason,
score_history, source, vertical, website_exists, website_url, zoho_id,
city, entity_type, agent_name, agent_address

1.4 Config File Audit

Line count: 1,976 lines (wc -l scripts/gardener.json).

Top-level keys (24 total):

Key | Approx lines | Purpose | Read by | Status
--- | --- | --- | --- | ---
_meta | 1-17 | Config metadata | Nobody specifically | Live (loaded as part of full config)
version | ~18 | Config version | Nobody | Dead
pllc_fast_track | 18-132 | PLLC detection rules | lib/scoring.py:197, lib/patterns.py:105 | Live
scoring_pipeline | 133-147 | Scoring pipeline config | Nobody; get_scoring_pipeline() exists in config.py but I found no callers | Dead or unused
recency_bonus | 148-158 | Recency scoring windows | lib/config.py:169 (get_recency_bonus) | Live
location_quality | 159-249 | Location scoring tiers | lib/config.py:105 | Live
contact_info_bonus | 159-249 | Email domain scoring | lib/config.py:124,130,137, lib/enrichment.py:33 | Live
tiers | 250-1282 | NAICS tier scoring (197 codes) | lib/scoring.py:40, scripts/seed-planter.py:91,636, lib/config.py:200 | Live, but 197 template_route sub-fields are dead
keyword_fallback | 1283-1382 | Keyword-based scoring fallback | lib/scoring.py:47 | Live
name_penalty_patterns | 1383-1506 | Shell company detection | lib/scoring.py:72,105,260, scripts/harvest.py:144 | Live
name_bonus_patterns | 1507-1592 | Professional name bonuses | lib/scoring.py:105,260 | Live
scoring_rules | 1593-1603 | Scoring thresholds | lib/scoring.py:102,126 | Live
formation_signals | 1604-1621 | Formation timing signals | lib/config.py:184 (get_formation_signals); I found no callers of this function | Dead or unused
daily_territory_scan | 1622-1659 | Daily scan config | lib/config.py:190, scripts/harvest.py:210, scripts/historical-sweep.py:321, scripts/seed-planter.py:635 | Live
lifecycle_tracking | 1660-1692 | Lifecycle tracking | Nobody; I found no callers | Dead
route_planner | 1693-1739 | Route planning config | Nobody; I found no callers | Dead
known_cities | 1740-1833 | CT city list | lib/config.py:65 (get_known_cities) | Live
branding | 1834-1841 | Email signature | lib/drafter.py:35 | Live
push_guardrails | 1842-1845 | Zoho push limits | scripts/zoho-push.py:255 | Live
lifecycle | 1846-1890 | Outreach window config | lib/config.py:145 | Live
formation_timing | 1891-1919 | Formation timing context | lib/config.py:151, lib/drafter.py:167 | Live
llm | ~1920 | LLM model config | lib/config.py:163, lib/drafter.py | Live
agent_clustering | ~1940 | Agent clustering config | lib/config.py:157 (get_agent_clustering); I found no callers of get_agent_clusters | Dead
brave_search | ~1950 | Brave API config | lib/brave_enrich.py:39 | Live

SECURITY ISSUE: llm.api_key and brave_search.api_key are stored in plaintext in gardener.json. These should be environment variables.

Dead nested fields: All 197 template_route entries within tiers are effectively dead. They are read only by get_template_route(), which is imported by drafter.py and seed-planter.py, and the template drafting system is retired. The value is still passed through build_context_object() at drafter.py:119, but I could not confirm that any prompt text uses it (see section 2.2).


1.5 External Integrations

Featherless API: Called in lib/drafter.py:_call_featherless(). Model: DeepSeek-V3.1 for drafting (config llm.models.draft). Auth: API key from llm.api_key field in gardener.json ([REDACTED — value present in config but not reproduced here]). Live.

Brave Search API: Called in lib/brave_enrich.py:brave_search(). Auth: API key from brave_search.api_key field in gardener.json ([REDACTED]). Live.

CT SoS Data API (Socrata): Called in scripts/harvest.py and scripts/historical-sweep.py via urllib.request to data.ct.gov. Auth: Public API (no key needed, uses X-App-Token: DEMO_KEY). Live.

Zoho CRM v8: Called in lib/zoho.py. Auth: OAuth2 via credentials in scripts/.env.zoho ([REDACTED]). Live.

N8N Webhook: Found at scripts/seed-planter.py:50:

WEBHOOK_URL = "https://workflows.residentliberal.com/webhook/jXjTXfBO3qsMMgtH/webhook/qualify-lead"

This is a hardcoded URL in the vestigial seed-planter.py. Dead — seed-planter is retired.

Google Places: Referenced in lib/enrichment.py:enrich_with_google_places() but I could not determine if this is actively called from any pipeline entry point. Unknown.


1.6 Data Flow

A lead enters via scripts/harvest.py pulling CT SoS filings. Harvest scores and merges into cumulative-backlog.json. scripts/enrich-backlog.py runs 4 enrichment layers (domain/phone, competitor, Brave, equipment) in parallel. scripts/draft-backlog.py calls lib/drafter.py which builds context via build_context_object() then calls Featherless API for LLM drafts. The operator reviews via scripts/backlog-dashboard.py HTML. scripts/zoho-push.py pushes with a confirm gate and cooldown guardrail.

The happy path is: harvest → enrich → draft → review → push. scripts/morning-brief.py orchestrates harvest + enrich + draft in one run via subprocess calls.
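The orchestration can be sketched from the three subprocess.run lines quoted in section 1.1. This is an illustration only, not the actual morning-brief.py: build_pipeline_commands and run_pipeline are hypothetical names, and only the command lists themselves come from the grep evidence.

```python
import sys


def build_pipeline_commands(days, limit, python=sys.executable):
    """Return the three stage commands in pipeline order (harvest, enrich, draft)."""
    return [
        [python, "scripts/harvest.py", "--days", str(days)],
        [python, "scripts/enrich-backlog.py"],
        [python, "scripts/draft-backlog.py", "--limit", str(limit)],
    ]


def run_pipeline(days, limit, runner):
    """Run each stage in order; an exception from `runner` aborts later stages."""
    for cmd in build_pipeline_commands(days, limit):
        runner(cmd)
```

Passing `lambda cmd: subprocess.run(cmd, check=True)` as the runner mirrors the quoted calls: check=True raises on a non-zero exit, so a failed harvest stops enrichment and drafting from running against a stale backlog.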

The historical variant enters via scripts/historical-sweep.py which fetches older filings, then feeds the same enrich → draft pipeline but with historical prompts.

The direct variant is scripts/ct-zoho-pipeline.py which goes CT SoS → score → Zoho, skipping enrich and draft.

Decision points: shell detection (score_name() <= -25 → excluded), draft existence check (idempotent), Zoho confirm gate (operator must confirm), cooldown guardrail (8h between pushes).
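Three of these decision points reduce to small predicates. The sketch below is hypothetical: the -25 threshold and the 8-hour cooldown come from this assessment, but the function names are invented, and the draft check assumes that "has a draft" means both draft_subject and draft_body are non-empty (the same pair the zoho-push.py docstring names).

```python
from datetime import datetime, timedelta

SHELL_THRESHOLD = -25               # from the is_shell_lead() excerpt: score_name(...) <= -25
PUSH_COOLDOWN = timedelta(hours=8)  # "8h between pushes"


def is_shell(name_score):
    """A lead is excluded as a shell when its name score is at or below the threshold."""
    return name_score <= SHELL_THRESHOLD


def needs_draft(lead):
    """Idempotent draft check (assumed rule): skip leads that already have both fields."""
    return not (lead.get("draft_subject") and lead.get("draft_body"))


def cooldown_elapsed(last_push, now):
    """True if no push is recorded or at least 8 hours have passed since the last one."""
    return last_push is None or now - last_push >= PUSH_COOLDOWN
```

How push-cooldown.json actually stores the last push time was not examined, so cooldown_elapsed takes datetime objects directly.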


Part 2: Honest Assessment

2.1 What’s Working

Scoring engine (lib/scoring.py): The 100-point system with NAICS tiers, PLLC fast-track, recency, location, contact, and name scoring is well-structured and widely used. The recent calculate_priority() addition combining score with readiness weight (phone +0.50, custom email +0.15, etc.) is a clean separation of quality vs. reachability.

Substance injection (lib/drafter.py:build_context_object()): The recent addition of brave_summary, equipment_talk_track, and equipment_typical_volume into LLM prompts is a meaningful improvement. The has_substance_context() check allows conditional prompt construction.

Parallel enrichment (lib/enrichment.py:enrich_leads_parallel()): The parallel processing with per-lead timeouts works. Checkpointing after every 10 leads per phase provides crash recovery.

Single source of truth: The cumulative-backlog.json pattern is simple and works, despite the lack of locking.

Historical pipeline (lib/historical.py + scripts/historical-sweep.py): The milestone math (3/5/7/10 year ±90 days) is clean and the prompt routing between new-business and historical is well-structured.
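For reference, the milestone window can be illustrated as follows. This is not the code in lib/historical.py (the real milestone_age() returns a dict with a "milestone" key, per the excerpt in 1.2); nearest_milestone is a hypothetical stand-in that assumes "±90 days" means within 90 days either side of the filing anniversary.

```python
from datetime import date

MILESTONE_YEARS = (3, 5, 7, 10)
WINDOW_DAYS = 90


def nearest_milestone(filing_date, today):
    """Return the milestone year whose anniversary is within 90 days of today, else None."""
    for years in MILESTONE_YEARS:
        try:
            anniversary = filing_date.replace(year=filing_date.year + years)
        except ValueError:
            # Feb 29 filing with a non-leap anniversary year: fall back to Feb 28.
            anniversary = filing_date.replace(year=filing_date.year + years, day=28)
        if abs((today - anniversary).days) <= WINDOW_DAYS:
            return years
    return None
```

Under these assumptions, a business filed on 2021-05-19 matches the 5-year milestone on 2026-05-19 and matches nothing on 2026-09-01, which is 105 days past the anniversary.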

2.2 What’s Broken or Dead

Haberdashery system (scripts/haberdasher.py): No active script calls this, and hat_assignment and hat_name are written only by haberdasher.py. The fields are still read, however: by backlog-to-zoho.py (8 references), zoho-push.py (1 reference), and backlog_dashboard.py (1 reference), and the Zoho push writes Hat_Assignment to the CRM. The system is therefore a zombie rather than fully dead: the writer is retired but the readers are still active.

Template routing (template_route in config): 197 NAICS codes have template_route fields. lib/config.py:get_template_route() is imported at lib/drafter.py:18 and called at lib/drafter.py:119, and the value is carried in the context dict as ctx["template_route"] (drafter.py:502). It is therefore not fully dead: it still flows through the drafter, but I could not determine whether any prompt text actually uses it. Because the template drafter (seed-planter.py) is retired, the field most likely serves no purpose in the LLM path.

get_agent_clusters() in lib/lifecycle.py: Function is defined but grep found zero callers. Dead code within an active module.

why_this_now() in lib/drafter.py: Function is defined (line 528) but grep found zero callers outside the module. Dead code within an active module.

generate_agent_referral_email() in lib/drafter.py: Function is defined (line 559) but grep found zero callers. Dead code within an active module.

N8N webhook in scripts/seed-planter.py:50: Hardcoded URL https://workflows.residentliberal.com/webhook/.... Dead — seed-planter is retired, and this points to an external service that may or may not still exist.

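Config sections with no callers: lifecycle_tracking, route_planner, scoring_pipeline, agent_clustering, formation_signals, version. Per the table in 1.4, scoring_pipeline, agent_clustering, and formation_signals have config.py accessors, but I found no callers of those accessors; lifecycle_tracking, route_planner, and version have no reader at all.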
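The "defined but never called" findings in this section came from grep. A syntax-aware cross-check could look like the sketch below, which is a hypothetical helper rather than part of the codebase. It treats any function name that never appears as a name or attribute reference as uncalled, so an import alone does not count as a use.

```python
import ast


def find_uncalled_functions(sources):
    """sources: {filename: source_text}. Return {function_name: defining_file}
    for functions that are never referenced anywhere in the given sources."""
    defined = {}
    referenced = set()
    for filename, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defined.setdefault(node.name, filename)
            elif isinstance(node, ast.Name):
                referenced.add(node.id)      # bare use, e.g. draft_email(...)
            elif isinstance(node, ast.Attribute):
                referenced.add(node.attr)    # qualified use, e.g. drafter.draft_email
    return {name: f for name, f in defined.items() if name not in referenced}
```

Like grep, this only gives candidates: it misses dynamic dispatch (getattr, names built from strings), and a function referenced only from its own body still counts as referenced.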

2.3 What’s Redundant

Multiple Zoho push paths: Four scripts push to Zoho with different logic:

- zoho-push.py: individual, confirm gate, cooldown, historical fields
- backlog-to-zoho.py: batch, hat assignments, rollback logging
- ct-zoho-pipeline.py: direct, no scoring/enrichment
- shepherd-to-zoho.py: churches/religious only

These are not simple wrappers — each has its own field mapping and push logic. The Zoho field mapping is duplicated across all four.
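One way to remove the duplication is a single mapping table shared by all push scripts. The sketch below is hypothetical: the four Zoho field names are the only ones that appear in this assessment, and their pairing with local lead keys is an assumption, not the mapping in any of the four scripts.

```python
# Local lead key -> Zoho field name. The pairing is assumed for illustration.
ZOHO_FIELD_MAP = {
    "draft_subject": "Email_Draft_Subject",
    "draft_body": "Email_Draft_Body2",
    "hat_assignment": "Hat_Assignment",
    "vertical": "Business_Vertical",
}


def to_zoho_record(lead, field_map=ZOHO_FIELD_MAP):
    """Build a Zoho payload dict, skipping keys that are missing or empty on the lead."""
    return {zoho: lead[local] for local, zoho in field_map.items() if lead.get(local)}
```

Each script could then pass its own extra fields on top of the shared base instead of repeating the whole mapping.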

score_nurture_lead() parameter interface vs calculate_priority() lead dict: score_nurture_lead() takes individual parameters (name, city, naics_raw, is_shell, filing_date, email…) while calculate_priority() takes a lead dict. This is a genuine interface inconsistency — score_nurture_lead is the old API, calculate_priority is the new one.
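A thin adapter is one way to bridge the two interfaces. This is a sketch under stated assumptions: score_lead_dict is a hypothetical name, only the six leading parameters shown in section 1.2 are handled (the real signature continues past them), and reading naics_raw from the lead's naics key is a guess.

```python
def score_lead_dict(lead, scorer):
    """Call a positional-style scorer with fields unpacked from a lead dict.

    `scorer` stands in for score_nurture_lead; it is passed in so the sketch
    does not depend on the real module.
    """
    return scorer(
        lead.get("name", ""),
        lead.get("city", ""),
        lead.get("naics", ""),           # assumed source for naics_raw
        bool(lead.get("is_shell", False)),
        lead.get("filing_date", ""),
        lead.get("email", ""),
    )
```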

enrich_leads_parallel() doc mismatch: AGENTS.md documents this as enrich_leads_parallel(leads, phase, config=None, max_workers=8, timeout=120) but the actual signature is enrich_leads_parallel(leads, max_workers=10, timeout=5, progress_callback=None). The phase parameter doesn’t exist in the actual code.
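Drift like this can be caught mechanically with inspect.signature. In the sketch below, signature_matches is a hypothetical helper and the enrich_leads_parallel defined here is a stand-in that only reproduces the reported signature, not the real function.

```python
import inspect


def signature_matches(func, documented_params):
    """Compare a function's real parameter names against a documented list."""
    actual = list(inspect.signature(func).parameters)
    return actual == list(documented_params), actual


# Stand-in carrying the signature reported for the real function.
def enrich_leads_parallel(leads, max_workers=10, timeout=5, progress_callback=None):
    return leads


ok, actual = signature_matches(
    enrich_leads_parallel,
    ["leads", "phase", "config", "max_workers", "timeout"],  # as listed in AGENTS.md
)
```

Here ok is False and actual lists the four real parameter names. This compares names only; default values would need a separate check against each Parameter.default.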

2.4 What’s Confusingly Named or Organized

morning-brief.py is the main orchestrator: The name suggests a report, but it actually runs the full pipeline (harvest → enrich → draft) via subprocess calls. A new operator would not guess this is the primary entry point.

seed-planter.py sounds active but is retired: The name doesn’t indicate it’s vestigial. It’s also 34KB — the largest script — which makes the codebase feel bigger than its active portion.

Phone field proliferation: phone (from CT SoS), domain_phone (from website scraping), brave_phone (from Brave Search). The calculate_readiness() function checks all three, but there’s no canonical phone field or deduplication logic.
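One way to give the three fields a canonical form is sketched below. The precedence order (filing, then website, then Brave) and the 10-digit US normalization are assumptions made for illustration, and normalize_phone() is a stand-in rather than the existing clean_phone() from lib/patterns.py, whose behavior was not examined here.

```python
import re

# Assumed precedence: official filing first, then website scrape, then Brave Search.
PHONE_FIELDS = ("phone", "domain_phone", "brave_phone")


def normalize_phone(raw):
    """Reduce a US phone number to 10 digits, or return None if it does not fit."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None


def resolve_phones(lead):
    """Return (canonical_phone, phone_sources) for a lead dict.

    phone_sources lists each distinct normalized number with the fields it came from.
    """
    by_number = {}
    for field in PHONE_FIELDS:
        number = normalize_phone(lead.get(field))
        if number:
            by_number.setdefault(number, []).append(field)
    canonical = next(iter(by_number), None)  # first number seen in precedence order
    sources = [{"number": n, "fields": f} for n, f in by_number.items()]
    return canonical, sources
```

Two fields that hold the same number in different formats collapse into one entry, so a consumer such as calculate_readiness() could count distinct numbers rather than populated fields.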

lib/enrichment.py does too much: It handles domain enrichment, competitor checking, Brave enrichment, equipment context, Google Places, parallel orchestration, AND backlog loading/saving. The load_backlog()/save_backlog() functions being in the enrichment module is particularly surprising.

template_route still flows through the drafter: Even though the template system is retired, the field is still computed and passed through build_context_object(). This is confusing — a new developer would assume it’s functional.

2.5 What’s Risky

No locking on cumulative-backlog.json: Multiple scripts read and write this file. If morning-brief.py (which calls harvest, enrich, and draft in sequence) is run while another script is writing, data could be lost. This is a known issue documented in AGENTS.md.

API keys in plaintext: llm.api_key and brave_search.api_key are stored in gardener.json, which is tracked in the git repo. The .env.zoho file also lives in scripts/. These secrets should come from environment variables instead.
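A minimal sketch of moving the two keys out of the tracked file is shown below. The environment variable names are hypothetical, and inject_secrets() is an illustrative helper, not part of lib/config.py.

```python
import copy
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def load_secret(env_name):
    """Read a secret from the environment, failing loudly if it is unset or empty."""
    value = os.environ.get(env_name, "").strip()
    if not value:
        raise MissingSecretError(f"{env_name} is not set")
    return value


def inject_secrets(config, mapping):
    """Return a copy of config with secrets filled in from environment variables.

    mapping has the form {(section, key): env_var_name}.
    """
    merged = copy.deepcopy(config)
    for (section, key), env_name in mapping.items():
        merged.setdefault(section, {})[key] = load_secret(env_name)
    return merged


# Hypothetical variable names; the config paths are the two named above.
SECRET_ENV_VARS = {
    ("llm", "api_key"): "GARDENER_LLM_API_KEY",
    ("brave_search", "api_key"): "GARDENER_BRAVE_API_KEY",
}
```

Failing at load time when a variable is missing keeps a misconfigured run from reaching the LLM or Brave calls with an empty key.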

template_route still computed but unused: The drafter imports get_template_route, calls it, and passes the value through context. If someone modifies the template_route logic thinking it affects output, they’d be wrong. Dead code that appears alive is worse than dead code that looks dead.

The hat_assignment zombie: Haberdasher is retired, but backlog-to-zoho.py still reads hat_assignment and sends it to Zoho. If Zoho automation sequences depend on Hat_Assignment, and no new leads get hats assigned, the Zoho side degrades silently.

score_nurture_lead() takes raw parameters, not a lead dict: This means any new field that affects scoring must be added as a new parameter to this function, and every caller must be updated. This is fragile — calculate_priority() already takes a lead dict, creating two parallel interfaces.

2.6 Things That Surprised Me

Three dead functions in active modules: get_agent_clusters() in lifecycle.py, why_this_now() in drafter.py, and generate_agent_referral_email() in drafter.py are all defined but never called. In a codebase that has gone through iterations, dead code is expected, but having it in the most critical modules (drafter, lifecycle) is risky — it suggests the modules weren’t cleaned up between iterations.
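The grep-based check used for these findings could be complemented by a syntax-aware pass. The sketch below uses Python's ast module to list functions that are defined but never referenced by name anywhere in a set of modules. It is an approximation under stated assumptions: it cannot see dynamic dispatch (getattr, string lookups), it treats same-named functions in different modules as one, and uncalled_functions() is a hypothetical helper.

```python
import ast


def defined_functions(source):
    """Names of all functions defined anywhere in a module's source."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def referenced_names(source):
    """Every bare name and attribute name used in a module's source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
    return names


def uncalled_functions(modules):
    """modules maps path -> source. Return {function_name: defining_path}
    for functions that no module references by name."""
    defined = {}
    referenced = set()
    for path, source in modules.items():
        for name in defined_functions(source):
            defined.setdefault(name, path)
        referenced |= referenced_names(source)
    return {name: path for name, path in defined.items() if name not in referenced}
```

An import statement alone does not count as a reference here, so a function that is imported but never used is still reported.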

template_route is not dead, it’s a zombie: I expected template_route to be fully dead (written nowhere, read nowhere). Instead, it’s computed by get_template_route(), which drafter.py imports and calls at line 119, and the result is passed through to the context object. But I found no prompt text that uses it. It appears dead at the output but alive in the data flow.

enrich_leads_parallel() has no phase parameter: The documentation (AGENTS.md) describes a phase parameter that doesn’t exist in the actual function signature. The doc was written for a version that was refactored.

scoring_pipeline config section has no callers: The config has an entire section (scoring_pipeline) with an accessor function (get_scoring_pipeline()) but I found no code that calls it. This is an entire config section that’s loaded but unused.

The webhook URL in seed-planter.py is hardcoded: https://workflows.residentliberal.com/webhook/jXjTXfBO3qsMMgtH/webhook/qualify-lead — this is a real URL to a real service, sitting in a retired script. If that webhook endpoint still exists, it’s a latent integration that could be triggered accidentally.


Part 3: Rebuild Proposal

3.1 Proposed Module Structure

gardener/
├── core/
│   ├── scoring.py
│   │   replaces: lib/scoring.py (entire module)
│   │   replaces: scripts/recalculate-priority.py (logic → CLI command)
│   │   drops: get_priority_score() (unused wrapper)
│   │
│   ├── enrichment.py
│   │   replaces: lib/enrichment.py (domain/phone/competitor/brave/equipment functions)
│   │   replaces: scripts/enrich-backlog.py (orchestration → CLI command)
│   │   splits out: load_backlog()/save_backlog() → storage/backlog.py
│   │
│   ├── drafting.py
│   │   replaces: lib/drafter.py (build_context_object, build_draft_prompt, build_historical_prompt, draft_email, draft_batch)
│   │   drops: why_this_now() (zero callers)
│   │   drops: generate_agent_referral_email() (zero callers)
│   │   drops: template_route from context object (zombie field)
│   │
│   ├── classification.py
│   │   replaces: lib/verticals.py (classify_lead, classify_all)
│   │   replaces: scripts/classify-verticals.py (logic → CLI command)
│   │
│   ├── lifecycle.py
│   │   replaces: lib/lifecycle.py (get_outreach_window, get_formation_timing_context, get_historical_context)
│   │   drops: get_agent_clusters() (zero callers)
│   │   drops: get_industry_nurture_content() (template system retired)
│   │
│   └── patterns.py
│       replaces: lib/patterns.py (proper_case_name, is_pllc_fast_track, is_shell_lead)
│
├── pipelines/
│   ├── harvest.py
│   │   replaces: scripts/harvest.py (CT SoS fetch + score + merge)
│   │   replaces: scripts/morning-brief.py (orchestration → pipeline entry point)
│   │
│   ├── historical.py
│   │   replaces: scripts/historical-sweep.py (entry point)
│   │   replaces: lib/historical.py (milestone_age, is_historical, tag_source, snipe_match)
│   │
│   └── direct.py
│       replaces: scripts/ct-zoho-pipeline.py (CT → Zoho bypass)
│
├── integrations/
│   ├── ctsos.py
│   │   replaces: CT SoS API logic currently in scripts/harvest.py and scripts/historical-sweep.py
│   │
│   ├── brave.py
│   │   replaces: lib/brave_enrich.py (brave_search, brave_enrich_lead, extract_business_info)
│   │
│   ├── llm.py
│   │   replaces: _call_featherless() currently in lib/drafter.py
│   │   replaces: llm config loading from lib/config.py
│   │
│   └── zoho.py
│       replaces: lib/zoho.py (Zoho class, clean_business_name)
│       replaces: push logic from scripts/zoho-push.py, backlog-to-zoho.py, shepherd-to-zoho.py
│       consolidates: 4 push scripts → 1 with mode flags
│
├── storage/
│   ├── backlog.py
│   │   NEW: atomic load/save with file locking
│   │   replaces: load_backlog()/save_backlog() from lib/enrichment.py
│   │   replaces: get_backlog_path() from lib/enrichment.py
│   │
│   ├── config.py
│   │   replaces: lib/config.py (load_config, all get_* functions)
│   │   drops: get_template_route() (zombie)
│   │   drops: get_agent_clustering() (zero callers)
│   │   drops: get_scoring_pipeline() (zero callers)
│   │   drops: get_formation_signals() (zero callers)
│   │
│   └── metrics.py
│       NEW: pipeline metrics collection (currently spread across print statements)
│
├── cli/
│   ├── main.py              # entry point with subcommands
│   ├── harvest_cmd.py       # replaces: scripts/harvest.py CLI
│   ├── enrich_cmd.py        # replaces: scripts/enrich-backlog.py CLI
│   ├── draft_cmd.py         # replaces: scripts/draft-backlog.py, draft-random-50.py
│   ├── push_cmd.py          # replaces: scripts/zoho-push.py, backlog-to-zoho.py, shepherd-to-zoho.py
│   ├── classify_cmd.py      # replaces: scripts/classify-verticals.py
│   ├── audit_cmd.py         # replaces: scripts/audit-phantom-drafts.py
│   ├── dashboard_cmd.py     # replaces: scripts/backlog-dashboard.py
│   └── util_cmd.py          # replaces: scripts/recalculate-priority.py, mark-called.py, rollback.py, etc.
│
├── web/
│   ├── dashboard.py         # replaces: lib/dashboard.py (morning brief HTML)
│   ├── backlog_dashboard.py # replaces: lib/backlog_dashboard.py (Gentelella dashboard)
│   └── app.py               # replaces: lib/webapp.py (Flask server)
│
└── prompts/                  # NEW: prompt templates as files, not code strings
    ├── new_business.py       # replaces: build_draft_prompt() string in lib/drafter.py
    ├── historical.py         # replaces: build_historical_prompt() string in lib/drafter.py
    └── variants/             # NEW: A/B test variants

DROPPED files (with justification):

| Dropped file | Justification |
|---|---|
| scripts/haberdasher.py | Hat assignment system retired; hat_assignment still read by Zoho push but should be removed from Zoho mapping |
| scripts/seed-planter.py | Template drafting retired; contains dead N8N webhook URL |
| scripts/lead-heartbeat.py | Signs-of-life monitoring; can be a CLI subcommand |
| scripts/lead-tracker.py | Lifecycle tracking; can be a CLI subcommand |
| scripts/route-planner.py | Geographic clustering; can be a CLI subcommand |
| scripts/sales-brief-generator.py | Brief generation; can be a CLI subcommand |
| scripts/law-firm-pipeline.py | Specialized; use filters in generic pipeline |
| scripts/shepherd-to-zoho.py | Specialized; use filters in generic push |

3.2 Proposed Lead Schema

Preserved fields (with rename mapping where applicable):

| Current name | New name | Type | Written by | Read by |
|---|---|---|---|---|
| id | id | str | harvest, historical-sweep | everywhere |
| name | name | str | harvest | everywhere |
| business_type | entity_type | str | harvest | scoring, dashboard |
| naics_code | naics | str | harvest | scoring, drafter, Zoho |
| filing_date | filing_date | str | harvest | scoring, drafter, historical |
| email | email | str | harvest, enrichment | scoring, drafter, Zoho |
| phone | phone | str | harvest, enrichment | scoring, drafter, Zoho |
| city | city | str | harvest | scoring, drafter, Zoho |
| state | state | str | harvest | Zoho |
| is_shell | is_shell | bool | scoring | draft-backlog, enrich-backlog |
| score | score | int | scoring | drafter, dashboard, Zoho |
| priority | priority | float | scoring | dashboard, draft-backlog |
| readiness_weight | readiness_weight | float | scoring | dashboard |
| readiness_signals | readiness_signals | list | scoring | dashboard |
| tier | tier | str | scoring | dashboard, Zoho |
| domain | domain | str | enrichment | scoring, drafter |
| domain_phone | domain_phone | str | enrichment | scoring |
| website_url | website_url | str | enrichment | dashboard |
| website_exists | website_exists | bool | enrichment | scoring, drafter |
| brave_summary | brave_summary | str | enrichment | drafter |
| brave_phone | brave_phone | str | enrichment | scoring |
| county | county | str | enrichment | dashboard, route-planner |
| competitor_displacement | competitor_displacement | bool | enrichment | scoring |
| competitor_summary | competitor_summary | str | enrichment | dashboard |
| equipment | equipment | dict | enrichment | drafter |
| draft_subject | draft_subject | str | drafter | Zoho, dashboard |
| draft_body | draft_body | str | drafter | Zoho, dashboard |
| source | source | str | draft-backlog | drafter |
| pushed_to_zoho | pushed_to_zoho | bool | zoho-push | dashboard, zoho-push |
| zoho_id | zoho_id | str | zoho-push | zoho-push, mark-called |
| called | called | bool | mark-called | dashboard |
| call_count | call_count | int | mark-called | dashboard |
| vertical | vertical | str | classify-verticals | Zoho |
| first_seen | first_seen | str | harvest | dashboard |
| last_seen | last_seen | str | harvest | dashboard |
| appearance_count | appearance_count | int | harvest | dashboard |
| needs_redraft | needs_redraft | bool | audit-phantom-drafts | draft-backlog |
| citizenship | citizenship | str | harvest | scoring |

New fields:

| New name | Type | Justification |
|---|---|---|
| brave_phone in context | | Already exists but NOT passed to drafter (confirmed bug) |
| phone_sources | list | Track where each phone number came from |

Dropped fields (each was classified as dead/write-only/zombie in section 1.3):

| Dropped field | Justification |
|---|---|
| hat_assignment | Written by retired haberdasher; still read by Zoho push but should be removed from mapping |
| hat_name | Same as hat_assignment |
| template_route | Zombie: still computed but not used in any prompt |
| score_history | Write-only: written by harvest, never read |
| enrichment_date | Write-only: never read by any pipeline step |
| enrichment_method | Write-only: never read |
| outreach | Write-only: lifecycle.py writes but no reader found |
| historical_needs_followup | Write-only: never read |
| followup_reason | Dead: no writer or reader found |
| sub_status | Dead: no writer or reader found |
| category_survey_email_address | Dead: raw CT SoS field, never used |
| total_authorized_shares | Dead: raw CT SoS field, never used |
| date_of_organization_meeting | Dead: raw CT SoS field, never used |
| country_formation | Dead: raw CT SoS field, never used |
| original_push_date | Dead: no writer or reader found |
| brave_descriptions | Write-only: written by brave_enrich, never read |
| brave_results_count | Write-only: written by brave_enrich, never read |
| competitor_brands_found | Write-only: written by enrichment, never read |
| All mailing_jurisdiction_* fields (6 fields) | Raw CT SoS data, never used in any pipeline logic |
| All office_jurisdiction_* fields (5 fields) | Raw CT SoS data, never used in any pipeline logic |
| All billing_* fields (5 fields) | Raw CT SoS data, never used in any pipeline logic |
| All diversity flags (5 fields) | Raw CT SoS data, never used in any pipeline logic |
| annual_report_due_date | Raw CT SoS data, never used |
| began_transacting_in_ct | Raw CT SoS data, never used |
| business_name_in_state_country | Redundant with name |
| formation_place | Redundant with state |
| state_or_territory_formation | Redundant with state |
| accountnumber | CT SoS internal, never used |
| naics_score | Component of score, not used independently |
| name_score | Component of score, not used independently |

3.3 Proposed Config Structure

Split gardener.json (1,976 lines) into focused files:

config/
├── scoring.yaml          # replaces: pllc_fast_track, tiers, keyword_fallback,
│                         #   name_penalty_patterns, name_bonus_patterns,
│                         #   scoring_rules, recency_bonus, location_quality,
│                         #   contact_info_bonus, known_cities
├── enrichment.yaml       # replaces: brave_search section
├── llm.yaml              # replaces: llm section
│                         #   API keys → environment variables
├── zoho.yaml             # replaces: push_guardrails
│                         #   OAuth credentials → environment variables
├── pipeline.yaml         # replaces: daily_territory_scan, lifecycle, formation_timing
├── branding.yaml         # replaces: branding section
└── drops:                # removed entirely
    _meta                 #   (unnecessary in code)
    version               #   (zero callers)
    scoring_pipeline      #   (zero callers)
    agent_clustering      #   (zero callers)
    lifecycle_tracking    #   (zero callers)
    route_planner         #   (zero callers)
    formation_signals     #   (zero callers)
    template_route (197)  #   (zombie — computed but unused)

Pain points in current config:
1. 1,976 lines is too large to reason about
2. API keys in plaintext in a git-tracked file
3. 6 sections have zero callers (dead config), and _meta is unnecessary
4. 197 template_route entries are zombie data
5. No schema validation: typos in config fail silently
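The silent-typo problem can be addressed with a small check at load time. This is a minimal sketch, assuming validation happens on the top-level section names only; KNOWN_SECTIONS is an illustrative subset drawn from the sections listed above, and unknown_sections() is a hypothetical helper.

```python
import difflib

# Illustrative subset of top-level sections named in the proposed layout above.
KNOWN_SECTIONS = {
    "pllc_fast_track",
    "tiers",
    "keyword_fallback",
    "scoring_rules",
    "recency_bonus",
    "brave_search",
    "llm",
    "push_guardrails",
    "branding",
}


def unknown_sections(config, known=KNOWN_SECTIONS):
    """Map each unrecognized top-level key to its closest known name (or None)."""
    problems = {}
    for key in config:
        if key not in known:
            close = difflib.get_close_matches(key, sorted(known), n=1)
            problems[key] = close[0] if close else None
    return problems
```

A loader could refuse to start when the returned dict is non-empty, turning a silently ignored misspelling into an error that names the likely intended section.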


3.4 Proposed Pipeline Flow

A lead enters via pipelines/harvest.py which calls integrations/ctsos.py to fetch CT SoS filings. Each lead is scored via core/scoring.py and saved to the backlog via storage/backlog.py (which provides atomic load/save with file locking).

Enrichment runs via core/enrichment.py which calls integrations/brave.py for Brave data and does domain/phone/competitor checks in parallel. Results are saved atomically.

Drafting runs via core/drafting.py which builds context from enrichment data, selects a prompt from prompts/, and calls integrations/llm.py for the Featherless API. Drafts are saved atomically.

The operator reviews via web/dashboard.py HTML output. Push to Zoho goes through integrations/zoho.py with a confirm gate.

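Concurrency: storage/backlog.py uses file locking (fcntl or filelock) for all reads and writes. Provided each script holds the lock across its full read-modify-write cycle, this removes the concurrent-write data-loss risk described in 2.5.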
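A minimal sketch of what the locking in storage/backlog.py might look like follows, using fcntl (Unix only) plus a temp-file swap via os.replace. It assumes the backlog is a JSON list of lead dicts, which was not verified here, and locked_backlog() and the ".lock" sidecar file are hypothetical names.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def locked_backlog(path):
    """Hold an exclusive lock for a full read-modify-write cycle on the backlog.

    Yields the loaded list of leads; on a clean exit the list is written back
    atomically. If the caller raises, nothing is written.
    """
    lock_path = path + ".lock"
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            leads = []
            if os.path.exists(path):
                with open(path) as f:
                    leads = json.load(f)
            yield leads
            # Write to a temp file in the same directory, then swap it in atomically.
            fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
            with os.fdopen(fd, "w") as tmp:
                json.dump(leads, tmp, indent=2)
                tmp.flush()
                os.fsync(tmp.fileno())
            os.replace(tmp_path, path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Usage would be `with locked_backlog(path) as leads: ...`, mutating the list in place. Locking a sidecar file rather than the backlog itself keeps the lock valid across the os.replace swap, which replaces the backlog's inode.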

Failure modes: Each pipeline step saves progress after each lead (existing checkpoint pattern). If a step crashes, re-running it skips already-completed leads.
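The skip-on-rerun behavior can be sketched as follows. The checkpoint format (a JSON list of completed lead ids) is an assumption; the format of the existing checkpoint files was not examined, and run_step() is a hypothetical name.

```python
import json
import os


def run_step(leads, process, checkpoint_path):
    """Apply process(lead) to each lead, skipping ids already in the checkpoint.

    The checkpoint is rewritten after every successful lead, so a crash loses
    at most the lead that was in flight.
    """
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = set(json.load(f))
    for lead in leads:
        if lead["id"] in done:
            continue
        process(lead)  # may raise; the checkpoint still reflects earlier leads
        done.add(lead["id"])
        with open(checkpoint_path, "w") as f:
            json.dump(sorted(done), f)
    return done
```

Note that this only protects against repeating completed work. A lead whose processing has side effects (for example an LLM call that was billed before the crash) may still be processed twice.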

Logging: storage/metrics.py collects structured metrics (leads processed, time per lead, errors) replacing the current print-statement approach.


3.5 Proposed Extension Points

Scenario 1: A/B test two prompt variants for new-business outreach

Current architecture:
1. Edit the prompt string inside lib/drafter.py:build_draft_prompt() (line ~186)
2. Run draft
3. Manually compare results
4. No structured way to track which prompt produced which draft
5. To revert, edit the string back

Proposed architecture:
1. Create prompts/variants/new_business_v2.py with the variant prompt
2. Add variant name to config/pipeline.yaml under prompt_variants
3. Run gardener draft --variant new_business_v2 --limit 50
4. Each draft gets a prompt_variant field in the backlog
5. Compare results: gardener audit --compare-variants new_business new_business_v2
6. The prompts/ directory becomes the single source of truth for all prompt text, with no more prompt strings buried in Python code

Files changed: 1 new file (prompts/variants/new_business_v2.py), 1 config edit, 0 core code changes.

Current architecture: 1 core code edit (risky), 0 new files, no tracking.
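The variant mechanism in the proposed architecture could be sketched as below. The prompt wording is placeholder text, the registry dict is a hypothetical stand-in for loading modules from prompts/ and prompts/variants/, and call_llm is passed in so the sketch does not depend on the real LLM integration.

```python
def new_business(ctx):
    """Placeholder baseline prompt; the real text lives in build_draft_prompt()."""
    return f"Write a short intro email to {ctx['name']} in {ctx['city']}."


def new_business_v2(ctx):
    """Placeholder variant prompt."""
    return f"Write a two-sentence note to {ctx['name']}, a new business in {ctx['city']}."


# Hypothetical registry; in the proposed layout each entry would be its own file.
PROMPT_VARIANTS = {
    "new_business": new_business,
    "new_business_v2": new_business_v2,
}


def draft_with_variant(lead, variant, call_llm):
    """Build the prompt for the chosen variant and stamp its name on the draft."""
    if variant not in PROMPT_VARIANTS:
        raise KeyError(f"unknown prompt variant: {variant}")
    prompt = PROMPT_VARIANTS[variant](lead)
    return {**lead, "draft_body": call_llm(prompt), "prompt_variant": variant}
```

Because the variant name travels with the draft, a later comparison only needs to group backlog entries by prompt_variant.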

Scenario 2: Swap drafting model from DeepSeek-V3.1 to a new model

Current architecture:
1. Edit llm.models.draft in scripts/gardener.json
2. Verify _call_featherless() in lib/drafter.py handles the new model’s response format (the GLM-5.1 bug showed this is not guaranteed)
3. No model-version tracking in the backlog, so you can’t tell which model produced which draft
4. If the new model produces phantom drafts, you discover it manually

Proposed architecture:
1. Edit models.draft in config/llm.yaml
2. Each draft gets a model field in the backlog
3. integrations/llm.py has model-specific response parsers (one per model, not one parser that guesses)
4. gardener draft --model new-model-name --limit 5 for testing
5. audit-phantom-drafts equivalent runs automatically after each batch

Files changed: 1 config edit, potentially 1 new parser in integrations/llm.py.

Current architecture: 1 config edit, 1 potential parser fix in lib/drafter.py (all models share one parser), no tracking.
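The per-model parser idea could look roughly like this. The model names and both response shapes are hypothetical (the actual Featherless response formats were not inspected), and an empty result is treated as a phantom draft, which is an assumption about what that term covers.

```python
def parse_chat_content(response):
    """Parser for a chat-completions-shaped response (assumed shape)."""
    return (response["choices"][0]["message"].get("content") or "").strip()


def parse_output_field(response):
    """Parser for a hypothetical model that returns text under an 'output' key."""
    return (response.get("output") or "").strip()


# Hypothetical model names; one explicit parser per model, no guessing.
PARSERS = {
    "model-a": parse_chat_content,
    "model-b": parse_output_field,
}


def extract_draft(model, response):
    """Parse a response with the parser registered for this model.

    Raises KeyError for an unregistered model and ValueError for an empty draft,
    so neither case can slip into the backlog silently.
    """
    if model not in PARSERS:
        raise KeyError(f"no parser registered for model: {model}")
    text = PARSERS[model](response)
    if not text:
        raise ValueError(f"{model} returned an empty draft")
    return {"draft_body": text, "model": model}
```

Swapping models then means registering one parser and changing one config value, and an unregistered model fails at the first draft rather than producing empty output.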

Scenario 3: Introduce “renewal leads” from a new data source

Current architecture:
1. Add new fields to backlog JSON (no schema enforcement)
2. Create a new script (copy of harvest.py) for the new data source
3. Add scoring logic to lib/scoring.py (edit existing functions)
4. Add prompt variant by editing build_draft_prompt() in lib/drafter.py
5. Add new CLI entry point
6. Update morning-brief.py to call the new script
7. No way to distinguish renewal leads from new-business leads in the backlog without a convention

Proposed architecture:
1. Add lead_type field to schema (values: “new_business”, “historical”, “renewal”)
2. Create pipelines/renewal.py (new entry point, reuses core/ modules)
3. Add renewal scoring rules to config/scoring.yaml
4. Create prompts/renewal.py (new prompt template)
5. The pipeline automatically routes based on lead_type
6. storage/backlog.py validates schema on save

Files changed: 1 new pipeline file, 1 new prompt file, 1 config addition, 1 schema addition.

Current architecture: 1 new script (copy-paste of harvest.py), edits to 3 existing files (scoring.py, drafter.py, morning-brief.py), no schema enforcement.
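Routing on lead_type with validation before use could be sketched as follows. REQUIRED_FIELDS is a deliberately tiny slice of the schema in 3.2, and validate_lead() and route_prompt() are hypothetical names.

```python
LEAD_TYPES = ("new_business", "historical", "renewal")

# Prompt modules per lead type, following the proposed prompts/ layout.
PROMPT_FOR_TYPE = {
    "new_business": "prompts/new_business.py",
    "historical": "prompts/historical.py",
    "renewal": "prompts/renewal.py",
}

# A small slice of the schema for illustration; the full field list is in 3.2.
REQUIRED_FIELDS = {"id": str, "name": str, "lead_type": str}


def validate_lead(lead):
    """Return a list of schema problems for one lead (empty list means valid)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in lead:
            problems.append(f"missing field: {field}")
        elif not isinstance(lead[field], expected):
            problems.append(f"{field} must be {expected.__name__}")
    if isinstance(lead.get("lead_type"), str) and lead["lead_type"] not in LEAD_TYPES:
        problems.append(f"unknown lead_type: {lead['lead_type']}")
    return problems


def route_prompt(lead):
    """Pick the prompt module for a lead, refusing leads that fail validation."""
    problems = validate_lead(lead)
    if problems:
        raise ValueError("; ".join(problems))
    return PROMPT_FOR_TYPE[lead["lead_type"]]
```

The same validate_lead() could run inside the save path of storage/backlog.py, so that a lead with a misspelled type is rejected at write time instead of being skipped at draft time.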

Scenario 4: Add LinkedIn presence as an enrichment signal

Current architecture:
1. Add linkedin_url field to backlog JSON (no schema enforcement, no validation)
2. Add LinkedIn scraping function to lib/enrichment.py (already 327+ lines)
3. Add linkedin_url to build_context_object() in lib/drafter.py
4. Add linkedin signal to calculate_readiness() in lib/scoring.py
5. Update calculate_priority() weight calculation
6. No way to know which leads have LinkedIn data without scanning the backlog

Proposed architecture:
1. Add linkedin_url and linkedin_signal to schema in storage/backlog.py (validated)
2. Create integrations/linkedin.py (new integration module)
3. Add enrichment step to core/enrichment.py (calls the new integration)
4. Add signal to core/scoring.py with weight in config/scoring.yaml
5. Drafter automatically includes it via build_context_object() which iterates all enrichment signals
6. New signal is tracked in readiness_signals list

Files changed: 1 new integration file, 1 enrichment step addition, 1 config addition, 1 schema addition.

Current architecture: edits to 3 existing files (enrichment.py, drafter.py, scoring.py), no schema enforcement, no separation of concerns.
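The "iterates all enrichment signals" behavior could be sketched as below. This is a sketch of the proposed version of build_context_object(), not the existing function in lib/drafter.py; the signal list is an illustrative selection of fields from 3.2 plus the new linkedin_url.

```python
# Illustrative signal list; in the proposal this would come from the schema.
ENRICHMENT_SIGNALS = (
    "brave_summary",
    "brave_phone",
    "equipment",
    "website_exists",
    "linkedin_url",
)


def build_context_object(lead, signals=ENRICHMENT_SIGNALS):
    """Build the drafter context, copying every populated enrichment signal.

    Empty strings, empty containers, and None are skipped; False is kept,
    since "no website" is itself a usable signal.
    """
    ctx = {"name": lead.get("name"), "city": lead.get("city")}
    for field in signals:
        value = lead.get(field)
        if value not in (None, "", [], {}):
            ctx[field] = value
    return ctx
```

With this shape, adding a signal means adding one name to the list, and a field such as brave_phone cannot be collected but left out of the context by omission.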


3.6 Migration Strategy

Backlog: The existing cumulative-backlog.json migrates via a transformation script that drops dead fields, renames inconsistent fields, and adds missing schema fields with defaults. The script produces a new backlog file and a diff report. Old file is archived, not deleted.
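The per-lead part of that transformation might look like the sketch below. The rename map and the dropped names come from 3.2, but DROP_EXACT is only a subset of the dropped-fields list, and migrate_lead() is a hypothetical name; the diff report and file handling are left out.

```python
import copy
import fnmatch

# Renames from the preserved-fields table in 3.2.
RENAMES = {"business_type": "entity_type", "naics_code": "naics"}

# Subset of the dropped-fields table, for illustration.
DROP_EXACT = {"hat_assignment", "hat_name", "template_route", "score_history"}
DROP_PATTERNS = ("mailing_jurisdiction_*", "office_jurisdiction_*", "billing_*")

# New schema fields with their defaults.
DEFAULTS = {"phone_sources": []}


def migrate_lead(lead):
    """Return (new_lead, dropped_field_names) without modifying the input."""
    new_lead, dropped = {}, []
    for key, value in lead.items():
        if key in DROP_EXACT or any(fnmatch.fnmatchcase(key, p) for p in DROP_PATTERNS):
            dropped.append(key)
            continue
        new_lead[RENAMES.get(key, key)] = copy.deepcopy(value)
    for key, default in DEFAULTS.items():
        new_lead.setdefault(key, copy.deepcopy(default))
    return new_lead, sorted(dropped)
```

Running migrate_lead() on an already migrated lead returns it unchanged with an empty dropped list, which is the property the migration script needs in order to be safely re-run.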

Coexistence: The new codebase lives in gardener/ alongside gardener-fork/. A --backlog-path flag lets either codebase point at either backlog file. During migration, both codebases read the same file. Once the new codebase is validated, gardener-fork/ is archived.

Rollback: The migration script is idempotent, and the migration is reversible because the old backlog file is never modified in place: a copy is made before transformation. If the new codebase produces bad drafts, switch the --backlog-path flag back to the old file and run the old codebase.

Timeline: Not specified here — this is a sketch, not a plan. The operator and Claude will refine it together.


3.7 What This Proposal Does NOT Solve

Reply rates: A cleaner codebase does not produce better cold emails. The substance injection improvements (brave_summary, equipment_talk_track) were the right move, but architecture doesn’t fix copy.

Audience targeting: Whether brand-new PLLCs are the right audience for copier outreach is a strategy question, not an architecture question. No refactor changes this.

CT SoS data quality: The pipeline trusts NAICS codes from CT SoS, which are often wrong. Garbage in, garbage out.

LLM hallucination: The drafter trusts brave_summary content without validation. A cleaner codebase still passes unvalidated enrichment data to the LLM.

Zoho automation decay: If Zoho automation sequences depend on Hat_Assignment and no new leads get hats, the Zoho side degrades. This proposal drops hat_assignment but doesn’t fix the Zoho automation.

Single-operator bus factor: This codebase has one operator. A cleaner architecture doesn’t create a second operator.

The phone coverage gap: brave_phone is collected but NOT passed to build_context_object() in lib/drafter.py. This is a confirmed bug that exists regardless of architecture.