Files
B.Tech-Project-III/thirdeye/docs/meet_extension.md
2026-04-05 00:43:23 +05:30

85 KiB
Raw Permalink Blame History

ThirdEye — Meet Extension Milestones (14→16)

Prerequisite: Milestones 013 must be COMPLETE and PASSING. This feature layers on top of the existing working system. Same rule: Do NOT skip milestones. Do NOT skip tests. Every test must PASS before moving to the next milestone.


WHAT THIS ADDS

A Google Meet Chrome extension that:

  1. Captures live meeting audio and transcribes it in-browser (Web Speech API — free, no API key)
  2. Sends transcript chunks every 30 seconds to your existing ThirdEye FastAPI backend
  3. A new backend agent extracts meeting signals (decisions, action items, blockers, risks) and stores them in ChromaDB alongside your existing chat/doc/link signals
  4. New Telegram commands: /meetsum, /meetask, /meetmatch — summarize meetings, query them, and surface connections to prior team discussions

The integration is seamless: Meet transcripts become first-class signals in your existing knowledge graph. When you /meetmatch, ThirdEye finds where your meeting decisions align with or contradict signals from your Telegram groups.


PRE-WORK: Dependencies & Config Updates

Step 0.1 — No new pip packages needed

The transcription is done in the browser via the Web Speech API (Chrome built-in, free, no key). The backend receives plain text. Your existing FastAPI, ChromaDB, and provider router handle everything else.

Optionally, for better transcription quality if Web Speech API is unavailable, add Groq's Whisper (free):

# Optional only — only needed if you want server-side Whisper fallback
# Groq already supports Whisper via your existing GROQ_API_KEY
pip install python-multipart  # for file upload endpoint (if using audio fallback)

Add to thirdeye/requirements.txt (optional Whisper fallback only):

python-multipart==0.0.20

Step 0.2 — Add new env vars

Append to thirdeye/.env:

# Google Meet Extension
MEET_INGEST_SECRET=thirdeye_meet_secret_change_me   # Shared secret between extension and backend
MEET_DEFAULT_GROUP_ID=meet_sessions                  # ChromaDB group for all meet signals
ENABLE_MEET_INGESTION=true
MEET_CROSS_REF_GROUPS=acme_dev,acme_product          # Comma-separated group IDs to cross-reference

Step 0.3 — Update config.py

Add these lines at the bottom of thirdeye/backend/config.py:

# Google Meet Extension
MEET_INGEST_SECRET = os.getenv("MEET_INGEST_SECRET", "thirdeye_meet_secret_change_me")
MEET_DEFAULT_GROUP_ID = os.getenv("MEET_DEFAULT_GROUP_ID", "meet_sessions")
ENABLE_MEET_INGESTION = os.getenv("ENABLE_MEET_INGESTION", "true").lower() == "true"
MEET_CROSS_REF_GROUPS = [
    g.strip() for g in os.getenv("MEET_CROSS_REF_GROUPS", "").split(",") if g.strip()
]

MILESTONE 14: Google Meet Chrome Extension (120%)

Goal: A working Chrome extension that joins any Google Meet, captures live audio via the Web Speech API, buffers the transcript, and POSTs chunks every 30 seconds to the ThirdEye backend at /api/meet/ingest. A small popup shows recording status and meeting ID.

Step 14.1 — Create the extension folder

mkdir -p thirdeye/meet_extension

Step 14.2 — Create manifest.json

Create file: thirdeye/meet_extension/manifest.json

{
  "manifest_version": 3,
  "name": "ThirdEye Meet Recorder",
  "version": "1.0.0",
  "description": "Captures Google Meet transcripts and sends them to your ThirdEye knowledge base.",
  "permissions": [
    "activeTab",
    "storage",
    "scripting"
  ],
  "host_permissions": [
    "https://meet.google.com/*"
  ],
  "background": {
    "service_worker": "background.js"
  },
  "content_scripts": [
    {
      "matches": ["https://meet.google.com/*"],
      "js": ["content.js"],
      "run_at": "document_idle"
    }
  ],
  "action": {
    "default_popup": "popup.html",
    "default_title": "ThirdEye Meet Recorder"
  },
  "icons": {
    "16": "icon16.png",
    "48": "icon48.png",
    "128": "icon128.png"
  }
}

Step 14.3 — Create content.js (the core recorder)

Create file: thirdeye/meet_extension/content.js

/**
 * ThirdEye Meet Recorder — Content Script
 * Injected into meet.google.com pages.
 * Uses Web Speech API (Chrome built-in) for live transcription.
 * Buffers transcript and POSTs chunks to ThirdEye backend every CHUNK_INTERVAL_MS.
 */

const CHUNK_INTERVAL_MS = 30000; // Send a chunk every 30 seconds
const MAX_BUFFER_CHARS = 8000;   // ~2000 tokens — safe for LLM processing

let recognition = null;
let isRecording = false;
let transcriptBuffer = "";
let meetingId = null;
let backendUrl = "http://localhost:8000";
let ingestSecret = "thirdeye_meet_secret_change_me";
let groupId = "meet_sessions";
let chunkTimer = null;
let chunkCount = 0;

// --- Helpers ---

function getMeetingId() {
  // Extract meeting code from URL: meet.google.com/abc-defg-hij
  const match = window.location.pathname.match(/\/([a-z]{3}-[a-z]{4}-[a-z]{3})/i);
  if (match) return match[1];
  // Fallback: use timestamp-based ID
  return `meet_${Date.now()}`;
}

function getParticipantName() {
  // Try to find the user's own name from the Meet DOM
  const nameEl = document.querySelector('[data-self-name]');
  if (nameEl) return nameEl.getAttribute('data-self-name');
  const profileEl = document.querySelector('img[data-iml]');
  if (profileEl && profileEl.alt) return profileEl.alt;
  return "Unknown";
}

// --- Transport ---

async function sendChunkToBackend(text, isFinal = false) {
  if (!text || text.trim().length < 10) return; // Don't send near-empty chunks

  const payload = {
    meeting_id: meetingId,
    group_id: groupId,
    chunk_index: chunkCount++,
    text: text.trim(),
    speaker: getParticipantName(),
    timestamp: new Date().toISOString(),
    is_final: isFinal,
  };

  try {
    const res = await fetch(`${backendUrl}/api/meet/ingest`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-ThirdEye-Secret": ingestSecret,
      },
      body: JSON.stringify(payload),
    });

    if (!res.ok) {
      console.warn(`[ThirdEye] Backend rejected chunk: ${res.status}`);
    } else {
      console.log(`[ThirdEye] Chunk ${payload.chunk_index} sent (${text.length} chars)`);
      chrome.runtime.sendMessage({
        type: "CHUNK_SENT",
        chunkIndex: payload.chunk_index,
        charCount: text.length,
        meetingId: meetingId,
      });
    }
  } catch (err) {
    console.warn(`[ThirdEye] Failed to send chunk: ${err.message}`);
    // Buffer is NOT cleared on failure — next flush will include this text
    transcriptBuffer = text + "\n" + transcriptBuffer;
  }
}

// --- Periodic flush ---

function flushBuffer() {
  if (transcriptBuffer.trim().length > 0) {
    sendChunkToBackend(transcriptBuffer);
    transcriptBuffer = "";
  }
}

function startChunkTimer() {
  if (chunkTimer) clearInterval(chunkTimer);
  chunkTimer = setInterval(flushBuffer, CHUNK_INTERVAL_MS);
}

function stopChunkTimer() {
  if (chunkTimer) {
    clearInterval(chunkTimer);
    chunkTimer = null;
  }
}

// --- Web Speech API ---

function initSpeechRecognition() {
  const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

  if (!SpeechRecognition) {
    console.error("[ThirdEye] Web Speech API not available in this browser.");
    chrome.runtime.sendMessage({ type: "ERROR", message: "Web Speech API not supported." });
    return null;
  }

  const rec = new SpeechRecognition();
  rec.continuous = true;       // Don't stop after first pause
  rec.interimResults = true;   // Get partial results while speaking
  rec.lang = "en-US";          // Change if needed
  rec.maxAlternatives = 1;

  rec.onstart = () => {
    console.log("[ThirdEye] Speech recognition started.");
    chrome.runtime.sendMessage({ type: "STATUS", status: "recording", meetingId });
  };

  rec.onresult = (event) => {
    let newText = "";
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      if (result.isFinal) {
        newText += result[0].transcript + " ";
      }
      // We only accumulate FINAL results to avoid duplicates from interim results
    }

    if (newText.trim()) {
      transcriptBuffer += newText;
      // Guard: if buffer is getting huge, flush early
      if (transcriptBuffer.length > MAX_BUFFER_CHARS) {
        flushBuffer();
      }
    }
  };

  rec.onerror = (event) => {
    console.warn(`[ThirdEye] Speech recognition error: ${event.error}`);
    if (event.error === "not-allowed") {
      chrome.runtime.sendMessage({
        type: "ERROR",
        message: "Microphone permission denied. Allow mic access in Chrome settings.",
      });
    } else if (event.error !== "no-speech") {
      // Restart on non-fatal errors
      setTimeout(() => { if (isRecording) rec.start(); }, 1000);
    }
  };

  rec.onend = () => {
    console.log("[ThirdEye] Speech recognition ended.");
    if (isRecording) {
      // Auto-restart: Chrome's Web Speech API stops after ~60s of silence
      setTimeout(() => { if (isRecording) rec.start(); }, 250);
    }
  };

  return rec;
}

// --- Public controls (called from popup via chrome.tabs.sendMessage) ---

async function startRecording(config = {}) {
  if (isRecording) return;

  // Load config from storage or use provided values
  const stored = await chrome.storage.sync.get(["backendUrl", "ingestSecret", "groupId"]);
  backendUrl = config.backendUrl || stored.backendUrl || "http://localhost:8000";
  ingestSecret = config.ingestSecret || stored.ingestSecret || "thirdeye_meet_secret_change_me";
  groupId = config.groupId || stored.groupId || "meet_sessions";

  meetingId = getMeetingId();
  chunkCount = 0;
  transcriptBuffer = "";
  isRecording = true;

  recognition = initSpeechRecognition();
  if (!recognition) {
    isRecording = false;
    return;
  }

  recognition.start();
  startChunkTimer();

  // Notify backend that a new meeting has started
  try {
    await fetch(`${backendUrl}/api/meet/start`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-ThirdEye-Secret": ingestSecret,
      },
      body: JSON.stringify({
        meeting_id: meetingId,
        group_id: groupId,
        started_at: new Date().toISOString(),
        speaker: getParticipantName(),
      }),
    });
  } catch (err) {
    console.warn("[ThirdEye] Could not notify backend of meeting start:", err.message);
  }
}

async function stopRecording() {
  if (!isRecording) return;

  isRecording = false;
  stopChunkTimer();

  if (recognition) {
    recognition.stop();
    recognition = null;
  }

  // Send the final buffered chunk marked as final
  await sendChunkToBackend(transcriptBuffer, true);
  transcriptBuffer = "";

  chrome.runtime.sendMessage({ type: "STATUS", status: "stopped", meetingId });
  console.log("[ThirdEye] Recording stopped.");
}

// --- Message listener (from popup) ---

chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
  if (msg.type === "START_RECORDING") {
    startRecording(msg.config || {}).then(() => sendResponse({ ok: true }));
    return true; // async
  }
  if (msg.type === "STOP_RECORDING") {
    stopRecording().then(() => sendResponse({ ok: true }));
    return true;
  }
  if (msg.type === "GET_STATUS") {
    sendResponse({
      isRecording,
      meetingId,
      bufferLength: transcriptBuffer.length,
      chunkCount,
    });
    return true;
  }
});

console.log("[ThirdEye] Content script loaded on", window.location.href);

Step 14.4 — Create popup.html

Create file: thirdeye/meet_extension/popup.html

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>ThirdEye Meet</title>
  <style>
    * { box-sizing: border-box; margin: 0; padding: 0; }
    body {
      width: 320px;
      font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", sans-serif;
      font-size: 13px;
      background: #0f172a;
      color: #e2e8f0;
    }
    .header {
      background: #1e293b;
      padding: 12px 16px;
      display: flex;
      align-items: center;
      gap: 8px;
      border-bottom: 1px solid #334155;
    }
    .logo { font-weight: 700; font-size: 15px; color: #38bdf8; }
    .badge {
      font-size: 10px;
      background: #0ea5e9;
      color: white;
      padding: 2px 6px;
      border-radius: 9999px;
      font-weight: 600;
    }
    .section { padding: 12px 16px; border-bottom: 1px solid #1e293b; }
    .status-row {
      display: flex;
      align-items: center;
      justify-content: space-between;
      margin-bottom: 8px;
    }
    .status-dot {
      width: 10px; height: 10px;
      border-radius: 50%;
      display: inline-block;
      margin-right: 6px;
    }
    .status-dot.idle { background: #475569; }
    .status-dot.recording { background: #ef4444; animation: pulse 1.2s infinite; }
    @keyframes pulse {
      0%, 100% { opacity: 1; } 50% { opacity: 0.4; }
    }
    .meeting-id {
      font-size: 11px;
      color: #94a3b8;
      font-family: monospace;
      margin-top: 4px;
    }
    .stat { font-size: 11px; color: #64748b; }
    .btn {
      width: 100%;
      padding: 9px;
      border: none;
      border-radius: 6px;
      font-size: 13px;
      font-weight: 600;
      cursor: pointer;
      transition: opacity 0.15s;
    }
    .btn:hover { opacity: 0.85; }
    .btn-start { background: #0ea5e9; color: white; }
    .btn-stop { background: #ef4444; color: white; }
    .btn-disabled { background: #334155; color: #64748b; cursor: not-allowed; }
    label { display: block; font-size: 11px; color: #94a3b8; margin-bottom: 3px; }
    input {
      width: 100%;
      background: #1e293b;
      border: 1px solid #334155;
      color: #e2e8f0;
      border-radius: 5px;
      padding: 6px 8px;
      font-size: 12px;
      margin-bottom: 8px;
    }
    input:focus { outline: none; border-color: #0ea5e9; }
    .save-btn {
      background: #1e293b;
      border: 1px solid #334155;
      color: #94a3b8;
      border-radius: 5px;
      padding: 5px 10px;
      font-size: 11px;
      cursor: pointer;
      width: 100%;
    }
    .save-btn:hover { border-color: #0ea5e9; color: #0ea5e9; }
    .not-meet {
      padding: 24px 16px;
      text-align: center;
      color: #64748b;
      font-size: 12px;
    }
    .not-meet strong { display: block; color: #94a3b8; margin-bottom: 6px; }
  </style>
</head>
<body>
  <div class="header">
    <span class="logo">👁 ThirdEye</span>
    <span class="badge">Meet</span>
  </div>

  <div id="not-meet" class="not-meet" style="display:none;">
    <strong>Not on Google Meet</strong>
    Open a meeting at meet.google.com to start recording.
  </div>

  <div id="meet-ui" style="display:none;">
    <div class="section">
      <div class="status-row">
        <div>
          <span class="status-dot idle" id="status-dot"></span>
          <span id="status-label">Idle</span>
        </div>
        <span class="stat" id="chunks-label">0 chunks sent</span>
      </div>
      <div class="meeting-id" id="meeting-id-label">Meeting ID: —</div>
    </div>

    <div class="section">
      <button class="btn btn-start" id="main-btn">▶ Start Recording</button>
    </div>

    <div class="section">
      <label>Backend URL</label>
      <input type="text" id="backend-url" placeholder="http://localhost:8000" />
      <label>Group ID</label>
      <input type="text" id="group-id" placeholder="meet_sessions" />
      <label>Ingest Secret</label>
      <input type="password" id="ingest-secret" placeholder="thirdeye_meet_secret_change_me" />
      <button class="save-btn" id="save-btn">Save Settings</button>
    </div>
  </div>

  <script src="popup.js"></script>
</body>
</html>

Step 14.5 — Create popup.js

Create file: thirdeye/meet_extension/popup.js

let currentTab = null;
let isRecording = false;

async function getActiveTab() {
  const tabs = await chrome.tabs.query({ active: true, currentWindow: true });
  return tabs[0] || null;
}

function isMeetTab(tab) {
  return tab && tab.url && tab.url.includes("meet.google.com");
}

async function loadSettings() {
  const stored = await chrome.storage.sync.get(["backendUrl", "ingestSecret", "groupId"]);
  document.getElementById("backend-url").value = stored.backendUrl || "http://localhost:8000";
  document.getElementById("ingest-secret").value = stored.ingestSecret || "";
  document.getElementById("group-id").value = stored.groupId || "meet_sessions";
}

async function saveSettings() {
  const backendUrl = document.getElementById("backend-url").value.trim();
  const ingestSecret = document.getElementById("ingest-secret").value.trim();
  const groupId = document.getElementById("group-id").value.trim();
  await chrome.storage.sync.set({ backendUrl, ingestSecret, groupId });
  const btn = document.getElementById("save-btn");
  btn.textContent = "✓ Saved";
  setTimeout(() => { btn.textContent = "Save Settings"; }, 1500);
}

function setRecordingUI(recording, meetingId, chunks) {
  isRecording = recording;
  const dot = document.getElementById("status-dot");
  const label = document.getElementById("status-label");
  const btn = document.getElementById("main-btn");
  const meetLabel = document.getElementById("meeting-id-label");
  const chunksLabel = document.getElementById("chunks-label");

  dot.className = "status-dot " + (recording ? "recording" : "idle");
  label.textContent = recording ? "Recording…" : "Idle";
  btn.className = "btn " + (recording ? "btn-stop" : "btn-start");
  btn.textContent = recording ? "■ Stop Recording" : "▶ Start Recording";
  meetLabel.textContent = meetingId ? `Meeting ID: ${meetingId}` : "Meeting ID: —";
  chunksLabel.textContent = `${chunks || 0} chunks sent`;
}

async function getTabStatus() {
  if (!currentTab) return;
  try {
    const status = await chrome.tabs.sendMessage(currentTab.id, { type: "GET_STATUS" });
    setRecordingUI(status.isRecording, status.meetingId, status.chunkCount);
  } catch {
    setRecordingUI(false, null, 0);
  }
}

async function handleMainBtn() {
  if (!currentTab) return;
  const stored = await chrome.storage.sync.get(["backendUrl", "ingestSecret", "groupId"]);

  if (!isRecording) {
    await chrome.tabs.sendMessage(currentTab.id, {
      type: "START_RECORDING",
      config: {
        backendUrl: stored.backendUrl || "http://localhost:8000",
        ingestSecret: stored.ingestSecret || "thirdeye_meet_secret_change_me",
        groupId: stored.groupId || "meet_sessions",
      },
    });
  } else {
    await chrome.tabs.sendMessage(currentTab.id, { type: "STOP_RECORDING" });
  }

  // Poll for updated status
  setTimeout(getTabStatus, 500);
}

// Listen for status updates from content script
chrome.runtime.onMessage.addListener((msg) => {
  if (msg.type === "STATUS") {
    setRecordingUI(msg.status === "recording", msg.meetingId, null);
  }
  if (msg.type === "CHUNK_SENT") {
    document.getElementById("chunks-label").textContent = `${msg.chunkIndex + 1} chunks sent`;
    document.getElementById("meeting-id-label").textContent = `Meeting ID: ${msg.meetingId}`;
  }
});

// Init
document.addEventListener("DOMContentLoaded", async () => {
  currentTab = await getActiveTab();

  if (!isMeetTab(currentTab)) {
    document.getElementById("not-meet").style.display = "block";
    document.getElementById("meet-ui").style.display = "none";
    return;
  }

  document.getElementById("meet-ui").style.display = "block";
  document.getElementById("not-meet").style.display = "none";

  await loadSettings();
  await getTabStatus();

  document.getElementById("main-btn").addEventListener("click", handleMainBtn);
  document.getElementById("save-btn").addEventListener("click", saveSettings);

  // Poll status every 5s while popup is open
  setInterval(getTabStatus, 5000);
});

Step 14.6 — Create background.js

Create file: thirdeye/meet_extension/background.js

/**
 * ThirdEye Meet — Service Worker (Manifest V3 background)
 * Minimal: just keeps the extension alive and relays messages.
 */
chrome.runtime.onInstalled.addListener(() => {
  console.log("[ThirdEye] Meet extension installed.");
});

Step 14.7 — Create placeholder icons

# Create simple placeholder icons (replace with real icons before production)
# These are 1x1 pixel transparent PNGs encoded as base64

python3 -c "
import base64, os

# Minimal 1x1 white PNG
PNG_1x1 = base64.b64decode(
    'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8/5+hHgAHggJ/PchI6QAAAABJRU5ErkJggg=='
)

for size in [16, 48, 128]:
    with open(f'meet_extension/icon{size}.png', 'wb') as f:
        f.write(PNG_1x1)
    print(f'  Created icon{size}.png (placeholder)')
"

Note: Replace icons with proper 16×16, 48×48, and 128×128 images before final demo. Use any PNG with a lens/eye motif.

Step 14.8 — Add the /api/meet/ingest and /api/meet/start endpoints to FastAPI

Add to thirdeye/backend/api/routes.py (append after existing routes):

# ─────────────────────────────────────────────
#  Google Meet Ingestion Endpoints
# ─────────────────────────────────────────────

from pydantic import BaseModel
from backend.config import MEET_INGEST_SECRET, ENABLE_MEET_INGESTION

class MeetStartPayload(BaseModel):
    meeting_id: str
    group_id: str = "meet_sessions"
    started_at: str
    speaker: str = "Unknown"

class MeetChunkPayload(BaseModel):
    meeting_id: str
    group_id: str = "meet_sessions"
    chunk_index: int
    text: str
    speaker: str = "Unknown"
    timestamp: str
    is_final: bool = False

def _verify_meet_secret(request: Request):
    secret = request.headers.get("X-ThirdEye-Secret", "")
    if secret != MEET_INGEST_SECRET:
        from fastapi import HTTPException
        raise HTTPException(status_code=403, detail="Invalid Meet ingest secret")

@app.post("/api/meet/start")
async def meet_start(payload: MeetStartPayload, request: Request):
    """Called by extension when a new meeting begins."""
    _verify_meet_secret(request)
    if not ENABLE_MEET_INGESTION:
        return {"ok": False, "reason": "Meet ingestion disabled"}

    # Store a meeting-started signal immediately
    from backend.db.chroma import store_signals
    from datetime import datetime
    import uuid

    signal = {
        "id": str(uuid.uuid4()),
        "type": "meet_started",
        "summary": f"Meeting {payload.meeting_id} started by {payload.speaker}",
        "raw_quote": "",
        "severity": "low",
        "status": "active",
        "sentiment": "neutral",
        "urgency": "none",
        "entities": [f"@{payload.speaker}", f"#{payload.meeting_id}"],
        "keywords": ["meeting", "started", payload.meeting_id],
        "timestamp": payload.started_at,
        "group_id": payload.group_id,
        "lens": "meet",
        "meeting_id": payload.meeting_id,
    }
    store_signals(payload.group_id, [signal])
    return {"ok": True, "meeting_id": payload.meeting_id}


@app.post("/api/meet/ingest")
async def meet_ingest(payload: MeetChunkPayload, request: Request, background_tasks: BackgroundTasks):
    """Called by extension every 30s with a transcript chunk."""
    _verify_meet_secret(request)
    if not ENABLE_MEET_INGESTION:
        return {"ok": False, "reason": "Meet ingestion disabled"}

    if len(payload.text.strip()) < 10:
        return {"ok": True, "skipped": True, "reason": "chunk too short"}

    # Process asynchronously so the extension gets a fast response
    from backend.agents.meet_ingestor import process_meet_chunk
    background_tasks.add_task(
        process_meet_chunk,
        payload.meeting_id,
        payload.group_id,
        payload.chunk_index,
        payload.text,
        payload.speaker,
        payload.timestamp,
        payload.is_final,
    )

    return {
        "ok": True,
        "meeting_id": payload.meeting_id,
        "chunk_index": payload.chunk_index,
        "queued": True,
    }


@app.get("/api/meet/meetings")
async def list_meetings():
    """List all recorded meetings with their signal counts."""
    from backend.db.chroma import get_all_signals
    from backend.config import MEET_DEFAULT_GROUP_ID
    all_signals = get_all_signals(MEET_DEFAULT_GROUP_ID)
    meetings = {}
    for sig in all_signals:
        mid = sig.get("meeting_id") or sig.get("metadata", {}).get("meeting_id", "unknown")
        if mid not in meetings:
            meetings[mid] = {"meeting_id": mid, "signal_count": 0, "types": {}}
        meetings[mid]["signal_count"] += 1
        t = sig.get("type", "unknown")
        meetings[mid]["types"][t] = meetings[mid]["types"].get(t, 0) + 1
    return {"meetings": list(meetings.values())}


@app.get("/api/meet/meetings/{meeting_id}/signals")
async def get_meeting_signals(meeting_id: str):
    """Get all signals for a specific meeting."""
    from backend.db.chroma import query_signals
    from backend.config import MEET_DEFAULT_GROUP_ID
    results = query_signals(MEET_DEFAULT_GROUP_ID, meeting_id, n_results=50)
    return {"meeting_id": meeting_id, "signals": results}

Important: If you're importing Request and BackgroundTasks from FastAPI, make sure these are already imported in routes.py. Add them to the imports block at the top if needed:

from fastapi import BackgroundTasks, Request

Step 14.9 — Load the extension in Chrome

1. Open Chrome → chrome://extensions/
2. Enable "Developer mode" (top-right toggle)
3. Click "Load unpacked"
4. Select the thirdeye/meet_extension/ folder
5. The ThirdEye Meet extension icon should appear in your toolbar
6. Open any Google Meet → click the extension icon

TEST MILESTONE 14

Create file: thirdeye/scripts/test_m14.py

"""
Test Milestone 14: Meet extension backend endpoints.
Tests the /api/meet/start and /api/meet/ingest endpoints directly (no Chrome needed).
"""
import asyncio
import os
import sys
import json
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))


async def test_manifest_valid():
    """Test extension manifest.json is valid."""
    import json
    manifest_path = os.path.join(
        os.path.dirname(__file__), '..', 'meet_extension', 'manifest.json'
    )
    assert os.path.exists(manifest_path), "manifest.json not found at meet_extension/manifest.json"
    with open(manifest_path) as f:
        manifest = json.load(f)

    assert manifest.get("manifest_version") == 3, "Must be Manifest V3"
    assert "meet.google.com" in str(manifest.get("host_permissions", [])), \
        "Missing meet.google.com host permission"
    assert manifest.get("background", {}).get("service_worker"), "Missing service_worker"
    assert manifest.get("action", {}).get("default_popup"), "Missing popup"
    print("  ✅ manifest.json is valid MV3")


async def test_extension_files_exist():
    """Test all required extension files exist."""
    base = os.path.join(os.path.dirname(__file__), '..', 'meet_extension')
    required = ["manifest.json", "content.js", "popup.html", "popup.js", "background.js"]
    for filename in required:
        path = os.path.join(base, filename)
        assert os.path.exists(path), f"Missing extension file: {filename}"
        print(f"  ✅ {filename} exists")


async def test_meet_start_endpoint():
    """Test /api/meet/start returns 200 with correct secret."""
    import httpx
    from backend.config import MEET_INGEST_SECRET

    async with httpx.AsyncClient(base_url="http://localhost:8000") as client:
        # Test with correct secret
        resp = await client.post(
            "/api/meet/start",
            json={
                "meeting_id": "test-meet-m14",
                "group_id": "test_meet_m14",
                "started_at": "2026-03-21T10:00:00Z",
                "speaker": "Test User",
            },
            headers={"X-ThirdEye-Secret": MEET_INGEST_SECRET},
        )
        assert resp.status_code == 200, f"Expected 200, got {resp.status_code}: {resp.text}"
        data = resp.json()
        assert data.get("ok") is True
        print(f"  ✅ /api/meet/start returned ok=True for meeting test-meet-m14")

        # Test with wrong secret → should get 403
        resp_bad = await client.post(
            "/api/meet/start",
            json={"meeting_id": "fake", "group_id": "fake", "started_at": "2026-03-21T10:00:00Z"},
            headers={"X-ThirdEye-Secret": "wrong_secret"},
        )
        assert resp_bad.status_code == 403, f"Expected 403 for bad secret, got {resp_bad.status_code}"
        print("  ✅ Bad secret correctly rejected with 403")


async def test_meet_ingest_endpoint():
    """Test /api/meet/ingest accepts a transcript chunk and queues processing."""
    import httpx
    from backend.config import MEET_INGEST_SECRET

    async with httpx.AsyncClient(base_url="http://localhost:8000") as client:
        resp = await client.post(
            "/api/meet/ingest",
            json={
                "meeting_id": "test-meet-m14",
                "group_id": "test_meet_m14",
                "chunk_index": 0,
                "text": "We decided to go with PostgreSQL for the primary database. "
                        "Alex will set up the schema by Thursday. "
                        "The migration scripts need to be reviewed before deployment.",
                "speaker": "Test User",
                "timestamp": "2026-03-21T10:01:00Z",
                "is_final": False,
            },
            headers={"X-ThirdEye-Secret": MEET_INGEST_SECRET},
            timeout=10.0,
        )
        assert resp.status_code == 200, f"Expected 200, got {resp.status_code}: {resp.text}"
        data = resp.json()
        assert data.get("ok") is True
        assert data.get("queued") is True
        print(f"  ✅ /api/meet/ingest chunk accepted and queued")

    # Wait briefly for background task to process
    await asyncio.sleep(5)

    # Verify the meeting appears in /api/meet/meetings
    async with httpx.AsyncClient(base_url="http://localhost:8000") as client:
        resp = await client.get("/api/meet/meetings")
        assert resp.status_code == 200
        data = resp.json()
        meetings = data.get("meetings", [])
        ids = [m["meeting_id"] for m in meetings]
        assert "test-meet-m14" in ids, f"Meeting test-meet-m14 not found in {ids}"
        print(f"  ✅ Meeting test-meet-m14 visible in /api/meet/meetings")


async def test_meet_skip_short_chunk():
    """Test that very short chunks are gracefully skipped."""
    import httpx
    from backend.config import MEET_INGEST_SECRET

    async with httpx.AsyncClient(base_url="http://localhost:8000") as client:
        resp = await client.post(
            "/api/meet/ingest",
            json={
                "meeting_id": "test-meet-m14",
                "group_id": "test_meet_m14",
                "chunk_index": 99,
                "text": "Uh",   # Too short
                "speaker": "Test User",
                "timestamp": "2026-03-21T10:02:00Z",
                "is_final": False,
            },
            headers={"X-ThirdEye-Secret": MEET_INGEST_SECRET},
        )
        assert resp.status_code == 200
        data = resp.json()
        assert data.get("skipped") is True, "Expected short chunk to be skipped"
        print(f"  ✅ Short chunk correctly skipped")


async def main():
    print("Running Milestone 14 tests...\n")
    print("NOTE: The FastAPI server must be running: python run_api.py\n")

    await test_manifest_valid()
    await test_extension_files_exist()

    try:
        await test_meet_start_endpoint()
        await test_meet_ingest_endpoint()
        await test_meet_skip_short_chunk()
        print("\n🎉 MILESTONE 14 PASSED — Extension files valid, backend endpoints working")
    except Exception as e:
        print(f"\n💥 MILESTONE 14 FAILED: {e}")
        print("   Make sure: python run_api.py is running before running this test")
        raise


asyncio.run(main())

Run: cd thirdeye && python run_api.py & # in one terminal Then: cd thirdeye && python scripts/test_m14.py

Expected output:

  ✅ manifest.json is valid MV3
  ✅ content.js exists
  ✅ popup.html exists
  ✅ popup.js exists
  ✅ background.js exists
  ✅ /api/meet/start returned ok=True for meeting test-meet-m14
  ✅ Bad secret correctly rejected with 403
  ✅ /api/meet/ingest chunk accepted and queued
  ✅ Meeting test-meet-m14 visible in /api/meet/meetings
  ✅ Short chunk correctly skipped

🎉 MILESTONE 14 PASSED — Extension files valid, backend endpoints working

MILESTONE 15: Meet Transcript Processing Agent (125%)

Goal: A new meet_ingestor.py agent that processes raw transcript chunks — extracting meeting-specific signals (decisions, action items, blockers, open questions, risks) — and stores them in ChromaDB. These signals are queryable alongside all existing chat/document/link signals. A final-chunk handler produces a full meeting summary signal.

Step 15.1 — Create the Meet Ingestor agent

Create file: thirdeye/backend/agents/meet_ingestor.py

"""
Meet Ingestor Agent
Processes raw Google Meet transcript chunks and extracts structured signals.

Signal types produced:
  meet_decision    — A decision made during the meeting
  meet_action_item — A task assigned to someone
  meet_blocker     — A blocker or dependency raised
  meet_risk        — A risk or concern identified
  meet_open_q      — An unresolved question left open
  meet_summary     — Full meeting summary (emitted on is_final=True)
  meet_chunk_raw   — Raw transcript chunk (always stored, for full-text search)
"""
import asyncio
import json
import logging
import uuid
from datetime import datetime

from backend.providers import call_llm
from backend.db.chroma import store_signals

logger = logging.getLogger("thirdeye.agents.meet_ingestor")


# ─── Extraction prompt ───────────────────────────────────────────────────────

EXTRACTION_SYSTEM_PROMPT = """You are an expert meeting analyst. You receive raw transcript chunks from a Google Meet recording and extract structured signals.

Extract ONLY signals that are clearly present. Do NOT hallucinate or infer beyond what is stated.

Return ONLY a valid JSON object with this exact structure:
{
  "decisions": [
    {"text": "...", "owner": "@name or null", "confidence": "high|medium|low"}
  ],
  "action_items": [
    {"text": "...", "owner": "@name or null", "due": "date string or null", "confidence": "high|medium|low"}
  ],
  "blockers": [
    {"text": "...", "blocking_what": "...", "confidence": "high|medium|low"}
  ],
  "risks": [
    {"text": "...", "severity": "high|medium|low", "confidence": "high|medium|low"}
  ],
  "open_questions": [
    {"text": "...", "confidence": "high|medium|low"}
  ]
}

Rules:
- If a category has nothing, use an empty array []
- owner must start with @ if it's a person's name (e.g. "@Alex")
- text must be a clear, standalone sentence — not a fragment
- Only include confidence "high" if the signal is unambiguous
- Do NOT reproduce filler words, pleasantries, or off-topic banter
- Return JSON only — no markdown, no preamble, no explanation"""


SUMMARY_SYSTEM_PROMPT = """You are a meeting intelligence expert. Given a full meeting transcript (possibly from multiple chunks), write a concise but complete meeting summary.

Structure your summary as:
1. One-sentence overview (what was the meeting about)
2. Key decisions made (bullet points, max 5)
3. Action items assigned (who does what by when)
4. Blockers or risks raised
5. Open questions still unresolved

Keep the summary under 400 words. Be specific. Use names when available. Do NOT use filler phrases like "the team discussed" — just state what was decided/agreed/assigned."""


# ─── Signal builder ─────────────────────────────────────────────────────────

def _build_signal(
    signal_type: str,
    summary: str,
    raw_quote: str,
    severity: str,
    entities: list[str],
    keywords: list[str],
    timestamp: str,
    group_id: str,
    meeting_id: str,
    urgency: str = "none",
    status: str = "open",
) -> dict:
    return {
        "id": str(uuid.uuid4()),
        "type": signal_type,
        "summary": summary,
        "raw_quote": raw_quote[:500] if raw_quote else "",
        "severity": severity,
        "status": status,
        "sentiment": "neutral",
        "urgency": urgency,
        "entities": entities,
        "keywords": keywords,
        "timestamp": timestamp,
        "group_id": group_id,
        "lens": "meet",
        "meeting_id": meeting_id,
    }


def _extract_entities(text: str, owner: str = None) -> list[str]:
    """Extract entity strings from text (names starting with @)."""
    import re
    entities = re.findall(r"@[\w]+", text)
    if owner and owner.startswith("@"):
        entities.append(owner)
    return list(set(entities))


def _extract_keywords(text: str) -> list[str]:
    """Simple keyword extraction: lowercase meaningful words."""
    stopwords = {"the", "a", "an", "is", "are", "was", "were", "will", "to", "of",
                 "in", "on", "at", "for", "by", "with", "this", "that", "and", "or",
                 "but", "we", "i", "it", "be", "do", "have", "has", "had", "not"}
    words = text.lower().split()
    keywords = [w.strip(".,!?;:\"'") for w in words if len(w) > 3 and w not in stopwords]
    return list(dict.fromkeys(keywords))[:10]  # deduplicate, keep first 10


# ─── Main processing function ────────────────────────────────────────────────

async def process_meet_chunk(
    meeting_id: str,
    group_id: str,
    chunk_index: int,
    text: str,
    speaker: str,
    timestamp: str,
    is_final: bool,
):
    """
    Full pipeline for a transcript chunk:
    1. Always store raw chunk for full-text search
    2. Extract structured signals via LLM
    3. If is_final, generate a full meeting summary
    """
    logger.info(f"Processing meet chunk {chunk_index} for meeting {meeting_id} ({len(text)} chars)")
    signals_to_store = []

    # 1. Always store the raw chunk (enables full-text similarity search later)
    raw_signal = _build_signal(
        signal_type="meet_chunk_raw",
        summary=f"[{meeting_id}] Chunk {chunk_index}: {text[:120]}...",
        raw_quote=text,
        severity="low",
        entities=[f"@{speaker}"] if speaker and speaker != "Unknown" else [],
        keywords=_extract_keywords(text),
        timestamp=timestamp,
        group_id=group_id,
        meeting_id=meeting_id,
    )
    signals_to_store.append(raw_signal)

    # 2. Extract structured signals via LLM
    try:
        result = await call_llm(
            task_type="fast_large",
            messages=[
                {"role": "system", "content": EXTRACTION_SYSTEM_PROMPT},
                {"role": "user", "content": f"Transcript chunk from speaker '{speaker}':\n\n{text}"},
            ],
            temperature=0.1,
            max_tokens=1500,
            response_format={"type": "json_object"},
        )

        raw_json = result["content"].strip()
        # Strip markdown code fences if present
        if raw_json.startswith("```"):
            raw_json = raw_json.split("```")[1]
            if raw_json.startswith("json"):
                raw_json = raw_json[4:]
        extracted = json.loads(raw_json)

    except Exception as e:
        logger.warning(f"Meet extraction LLM failed for chunk {chunk_index}: {e}")
        extracted = {}

    # Decisions
    for item in extracted.get("decisions", []):
        if item.get("confidence") in ("high", "medium"):
            signals_to_store.append(_build_signal(
                signal_type="meet_decision",
                summary=item["text"],
                raw_quote=item["text"],
                severity="medium",
                entities=_extract_entities(item["text"], item.get("owner")),
                keywords=_extract_keywords(item["text"]),
                timestamp=timestamp,
                group_id=group_id,
                meeting_id=meeting_id,
                status="decided",
            ))

    # Action items
    for item in extracted.get("action_items", []):
        if item.get("confidence") in ("high", "medium"):
            due_str = f" Due: {item['due']}." if item.get("due") else ""
            signals_to_store.append(_build_signal(
                signal_type="meet_action_item",
                summary=f"{item['text']}{due_str}",
                raw_quote=item["text"],
                severity="medium",
                entities=_extract_entities(item["text"], item.get("owner")),
                keywords=_extract_keywords(item["text"]),
                timestamp=timestamp,
                group_id=group_id,
                meeting_id=meeting_id,
                urgency="medium" if item.get("due") else "low",
                status="open",
            ))

    # Blockers
    for item in extracted.get("blockers", []):
        if item.get("confidence") in ("high", "medium"):
            signals_to_store.append(_build_signal(
                signal_type="meet_blocker",
                summary=item["text"],
                raw_quote=item["text"],
                severity="high",
                entities=_extract_entities(item["text"]),
                keywords=_extract_keywords(item["text"]),
                timestamp=timestamp,
                group_id=group_id,
                meeting_id=meeting_id,
                urgency="high",
                status="open",
            ))

    # Risks
    for item in extracted.get("risks", []):
        signals_to_store.append(_build_signal(
            signal_type="meet_risk",
            summary=item["text"],
            raw_quote=item["text"],
            severity=item.get("severity", "medium"),
            entities=_extract_entities(item["text"]),
            keywords=_extract_keywords(item["text"]),
            timestamp=timestamp,
            group_id=group_id,
            meeting_id=meeting_id,
            urgency="medium",
            status="open",
        ))

    # Open questions
    for item in extracted.get("open_questions", []):
        if item.get("confidence") in ("high", "medium"):
            signals_to_store.append(_build_signal(
                signal_type="meet_open_q",
                summary=item["text"],
                raw_quote=item["text"],
                severity="low",
                entities=_extract_entities(item["text"]),
                keywords=_extract_keywords(item["text"]),
                timestamp=timestamp,
                group_id=group_id,
                meeting_id=meeting_id,
                status="open",
            ))

    # 3. If this is the final chunk, generate a meeting summary
    if is_final:
        summary_signal = await _generate_meeting_summary(
            meeting_id, group_id, text, speaker, timestamp
        )
        if summary_signal:
            signals_to_store.append(summary_signal)

    # Store everything
    if signals_to_store:
        store_signals(group_id, signals_to_store)
        logger.info(
            f"Stored {len(signals_to_store)} signals for meeting {meeting_id} chunk {chunk_index}"
        )

    return signals_to_store


async def _generate_meeting_summary(
    meeting_id: str,
    group_id: str,
    final_chunk_text: str,
    speaker: str,
    timestamp: str,
) -> dict | None:
    """
    Pull all raw chunks for this meeting from ChromaDB and generate a summary.
    Falls back to summarizing just the final chunk if retrieval fails.
    """
    from backend.db.chroma import query_signals

    try:
        # Get all raw chunks for this meeting
        raw_chunks = query_signals(
            group_id,
            meeting_id,
            n_results=50,
            signal_type="meet_chunk_raw",
        )
        full_transcript = "\n\n".join(
            [s.get("raw_quote") or s.get("summary", "") for s in raw_chunks]
        )
        if not full_transcript.strip():
            full_transcript = final_chunk_text
    except Exception:
        full_transcript = final_chunk_text

    try:
        result = await call_llm(
            task_type="fast_large",
            messages=[
                {"role": "system", "content": SUMMARY_SYSTEM_PROMPT},
                {
                    "role": "user",
                    "content": f"Meeting ID: {meeting_id}\n\nFull transcript:\n\n{full_transcript[:6000]}",
                },
            ],
            temperature=0.3,
            max_tokens=600,
        )
        summary_text = result["content"].strip()
    except Exception as e:
        logger.warning(f"Meeting summary generation failed: {e}")
        return None

    return _build_signal(
        signal_type="meet_summary",
        summary=summary_text,
        raw_quote=full_transcript[:500],
        severity="medium",
        entities=[f"@{speaker}"] if speaker and speaker != "Unknown" else [],
        keywords=_extract_keywords(summary_text),
        timestamp=timestamp,
        group_id=group_id,
        meeting_id=meeting_id,
        status="completed",
    )

TEST MILESTONE 15

Create file: thirdeye/scripts/test_m15.py

"""
Test Milestone 15: Meet transcript processing agent.
Tests signal extraction from transcript text WITHOUT needing the extension or Chrome.
"""
import asyncio
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))


# Sample meeting transcript (realistic, multi-topic)
SAMPLE_TRANSCRIPT_1 = """
Alex: Alright, let's get started. So the main thing today is the database migration.
Sam: Yeah, we've been going back and forth but I think we should just commit to PostgreSQL. 
     It has better support for our JSON query patterns and the team already knows it.
Alex: Agreed, let's make that the decision. We go with PostgreSQL.
Priya: I can set up the initial schema. I'll have it ready by Thursday.
Sam: Great. One thing though — the legacy MySQL tables still have some data we need to migrate.
     I have no idea how long that's going to take. That's a real risk.
Alex: Who's owning the migration scripts?
Priya: I'll do it, but I'll need the final schema signed off before I start. That's a blocker for me.
Sam: When do we need the migration done by?
Alex: End of sprint, so March 28th.
Sam: Can we do that? I'm not sure.
Alex: We'll try. Priya, can you at least start the schema this week?
Priya: Yes, schema by Thursday, migration scripts next week if all goes well.
"""

SAMPLE_TRANSCRIPT_2 = """
Lisa: Moving on — the client dashboard is still blocked waiting on design specs.
Alex: Yeah that's been two weeks now. We literally cannot start without those specs.
Lisa: I know, I'll follow up with design today. That's on me.
Sam: Also, the checkout endpoint is still hitting intermittent timeouts. 
     Third time this sprint. We need to actually fix this, not just restart pods.
Alex: Agreed, that needs an owner. Sam can you pick that up?
Sam: Yeah I'll investigate this week. I'll add a ticket.
Lisa: Any risks before we close?
Priya: The OAuth integration is touching a lot of the auth layer. 
       If something breaks there, it could affect all our users at once. High risk.
Alex: Good call. Let's make sure we do that in a feature branch and have a rollback plan.
"""


async def test_signal_extraction_chunk_1():
    """Test extraction from a decision-heavy transcript."""
    from backend.agents.meet_ingestor import process_meet_chunk
    from backend.db.chroma import query_signals
    import chromadb
    from backend.config import CHROMA_DB_PATH

    group_id = "test_meet_m15_a"
    meeting_id = "sprint-planning-m15"

    print("Testing signal extraction from transcript chunk 1 (decisions + action items)...")
    signals = await process_meet_chunk(
        meeting_id=meeting_id,
        group_id=group_id,
        chunk_index=0,
        text=SAMPLE_TRANSCRIPT_1.strip(),
        speaker="Alex",
        timestamp="2026-03-21T10:00:00Z",
        is_final=False,
    )

    assert len(signals) > 0, "Expected at least some signals to be extracted"
    print(f"  ✅ {len(signals)} total signals produced")

    types = [s["type"] for s in signals]
    print(f"  Types found: {set(types)}")

    # Must have at least a raw chunk
    assert "meet_chunk_raw" in types, "Expected raw chunk signal"
    print("  ✅ Raw chunk stored (enables full-text search)")

    # Should have extracted decisions (PostgreSQL decision is clear)
    decisions = [s for s in signals if s["type"] == "meet_decision"]
    assert len(decisions) > 0, "Expected at least one decision (PostgreSQL decision is explicit)"
    print(f"  ✅ {len(decisions)} decision signal(s) extracted")
    print(f"     First decision: {decisions[0]['summary'][:100]}")

    # Should have extracted action items (Priya - schema by Thursday)
    actions = [s for s in signals if s["type"] == "meet_action_item"]
    assert len(actions) > 0, "Expected at least one action item (Priya - schema by Thursday)"
    print(f"  ✅ {len(actions)} action item(s) extracted")
    print(f"     First action: {actions[0]['summary'][:100]}")

    # Verify signals are in ChromaDB
    results = query_signals(group_id, "database decision PostgreSQL")
    assert len(results) > 0, "Expected signals to be queryable from ChromaDB"
    print(f"  ✅ Signals queryable from ChromaDB ({len(results)} results for 'database decision PostgreSQL')")

    # Cleanup
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    try:
        client.delete_collection(f"ll_{group_id}")
    except Exception:
        pass


async def test_signal_extraction_chunk_2():
    """Test extraction from a blocker + risk heavy transcript."""
    from backend.agents.meet_ingestor import process_meet_chunk
    import chromadb
    from backend.config import CHROMA_DB_PATH

    group_id = "test_meet_m15_b"
    meeting_id = "standup-m15"

    print("\nTesting signal extraction from transcript chunk 2 (blockers + risks)...")
    signals = await process_meet_chunk(
        meeting_id=meeting_id,
        group_id=group_id,
        chunk_index=0,
        text=SAMPLE_TRANSCRIPT_2.strip(),
        speaker="Lisa",
        timestamp="2026-03-21T10:30:00Z",
        is_final=False,
    )

    types = [s["type"] for s in signals]
    print(f"  Types found: {set(types)}")

    blockers = [s for s in signals if s["type"] == "meet_blocker"]
    risks = [s for s in signals if s["type"] == "meet_risk"]

    assert len(blockers) > 0, "Expected at least one blocker (dashboard blocked on design specs)"
    print(f"  ✅ {len(blockers)} blocker(s) extracted")
    print(f"     First blocker: {blockers[0]['summary'][:100]}")

    assert len(risks) > 0, "Expected at least one risk (OAuth touching auth layer)"
    print(f"  ✅ {len(risks)} risk(s) extracted")
    print(f"     First risk: {risks[0]['summary'][:100]}")

    # Cleanup
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    try:
        client.delete_collection(f"ll_{group_id}")
    except Exception:
        pass


async def test_final_chunk_generates_summary():
    """Test that is_final=True triggers a summary signal generation."""
    from backend.agents.meet_ingestor import process_meet_chunk
    import chromadb
    from backend.config import CHROMA_DB_PATH

    group_id = "test_meet_m15_c"
    meeting_id = "full-meeting-m15"

    print("\nTesting final chunk triggers meeting summary...")

    # First chunk
    await process_meet_chunk(
        meeting_id=meeting_id,
        group_id=group_id,
        chunk_index=0,
        text=SAMPLE_TRANSCRIPT_1.strip(),
        speaker="Alex",
        timestamp="2026-03-21T10:00:00Z",
        is_final=False,
    )

    # Final chunk
    signals = await process_meet_chunk(
        meeting_id=meeting_id,
        group_id=group_id,
        chunk_index=1,
        text=SAMPLE_TRANSCRIPT_2.strip(),
        speaker="Lisa",
        timestamp="2026-03-21T10:30:00Z",
        is_final=True,
    )

    types = [s["type"] for s in signals]
    assert "meet_summary" in types, "Expected a meet_summary signal on is_final=True"
    summary_sig = next(s for s in signals if s["type"] == "meet_summary")
    assert len(summary_sig["summary"]) > 50, "Summary should be at least 50 chars"
    print(f"  ✅ Meeting summary generated ({len(summary_sig['summary'])} chars)")
    print(f"     Preview: {summary_sig['summary'][:150]}...")

    # Cleanup
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    try:
        client.delete_collection(f"ll_{group_id}")
    except Exception:
        pass


async def test_signals_coexist_with_chat_signals():
    """Test that meet signals are queryable alongside existing chat signals."""
    from backend.agents.meet_ingestor import process_meet_chunk
    from backend.pipeline import process_message_batch, query_knowledge, set_lens
    import chromadb
    from backend.config import CHROMA_DB_PATH

    group_id = "test_meet_m15_d"
    meeting_id = "integration-test-m15"
    set_lens(group_id, "dev")

    print("\nTesting meet signals + chat signals coexist...")

    # Add chat signals
    chat_messages = [
        {"sender": "Alex", "text": "The team agreed in a previous meeting we'd use Redis for caching.", "timestamp": "2026-03-20T09:00:00Z"},
        {"sender": "Priya", "text": "The timeout bug on checkout is still unresolved from last sprint.", "timestamp": "2026-03-20T09:05:00Z"},
    ]
    await process_message_batch(group_id, chat_messages)
    print("  ✅ Chat signals stored")

    # Add meet signals
    await process_meet_chunk(
        meeting_id=meeting_id,
        group_id=group_id,
        chunk_index=0,
        text="We decided in today's meeting to switch from Redis to Memcached for the caching layer. Sam will update the config by Friday.",
        speaker="Alex",
        timestamp="2026-03-21T10:00:00Z",
        is_final=False,
    )
    print("  ✅ Meet signals stored")

    # Query across both
    answer = await query_knowledge(group_id, "What did we decide about caching?")
    assert len(answer) > 20, "Expected a substantive answer about caching"
    print(f"  ✅ Query across chat + meet: {answer[:120]}...")

    # Cleanup
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    try:
        client.delete_collection(f"ll_{group_id}")
    except Exception:
        pass


async def main():
    print("Running Milestone 15 tests...\n")
    await test_signal_extraction_chunk_1()
    await test_signal_extraction_chunk_2()
    await test_final_chunk_generates_summary()
    await test_signals_coexist_with_chat_signals()
    print("\n🎉 MILESTONE 15 PASSED — Meet transcript agent extracting and storing signals correctly")


asyncio.run(main())

Run: cd thirdeye && python scripts/test_m15.py

Expected output: All checks. Decisions, action items, blockers, and risks extracted correctly. Final chunk generates a summary. Meet signals coexist and are queryable alongside chat signals.

🎉 MILESTONE 15 PASSED — Meet transcript agent extracting and storing signals correctly

MILESTONE 16: Telegram Meet Commands + Cross-Reference (130%)

Goal: Three new Telegram commands that query your meeting knowledge and surface connections to prior Telegram group discussions:

  • /meetsum [meeting_id] — summarizes a meeting (or the most recent one)
  • /meetask [meeting_id] [question] — asks a specific question about a meeting
  • /meetmatch — finds where your meeting signals align with OR contradict signals from your monitored Telegram groups (the big wow moment)

Step 16.1 — Create the meet cross-reference agent

Create file: thirdeye/backend/agents/meet_cross_ref.py

"""
Meet Cross-Reference Agent
Finds connections between meeting signals and existing Telegram group signals.
Surfaces: confirmations (meeting agrees with chat), contradictions (meeting contradicts chat),
and blind spots (meeting discusses something chat groups don't know about).
"""
import logging
from backend.providers import call_llm
from backend.db.chroma import query_signals, get_all_signals
from backend.config import MEET_CROSS_REF_GROUPS, MEET_DEFAULT_GROUP_ID

logger = logging.getLogger("thirdeye.agents.meet_cross_ref")

CROSS_REF_SYSTEM_PROMPT = """You are an expert at finding connections between meeting discussions and team chat history.

You will receive:
1. MEETING SIGNALS — decisions, action items, blockers, risks from a recent Google Meet
2. CHAT SIGNALS — existing signals from team Telegram groups

Find meaningful connections across three categories:

CONFIRMATIONS: Meeting agrees with or reinforces something from chat history
CONTRADICTIONS: Meeting decision conflicts with what was said/decided in chat
BLIND SPOTS: Important things from the meeting that the chat teams don't seem to know about

Return ONLY a valid JSON object:
{
  "confirmations": [
    {"meeting_signal": "...", "chat_signal": "...", "group": "...", "significance": "high|medium|low"}
  ],
  "contradictions": [
    {"meeting_signal": "...", "chat_signal": "...", "group": "...", "impact": "...", "significance": "high|medium|low"}
  ],
  "blind_spots": [
    {"meeting_signal": "...", "teams_unaware": ["group1", "group2"], "recommendation": "..."}
  ]
}

Rules:
- Only include HIGH confidence matches — do not stretch for weak connections
- Keep each signal description concise (1 sentence max)
- significance "high" = this matters for team alignment; "medium" = worth noting; "low" = minor
- If a category has nothing meaningful, use an empty array []
- Return JSON only"""


async def find_cross_references(
    meeting_id: str,
    group_id: str = None,
    cross_ref_group_ids: list[str] = None,
) -> dict:
    """
    Compare meeting signals against chat group signals.
  
    Args:
        meeting_id: The meeting to analyze
        group_id: ChromaDB group where meet signals are stored (defaults to MEET_DEFAULT_GROUP_ID)
        cross_ref_group_ids: Groups to compare against (defaults to MEET_CROSS_REF_GROUPS from config)
  
    Returns:
        Dict with confirmations, contradictions, blind_spots lists
    """
    group_id = group_id or MEET_DEFAULT_GROUP_ID
    cross_ref_group_ids = cross_ref_group_ids or MEET_CROSS_REF_GROUPS

    if not cross_ref_group_ids:
        return {
            "confirmations": [],
            "contradictions": [],
            "blind_spots": [],
            "error": "No cross-reference groups configured. Set MEET_CROSS_REF_GROUPS in .env",
        }

    # 1. Get meeting signals (decisions, actions, blockers, risks — NOT raw chunks)
    meet_signals = query_signals(group_id, meeting_id, n_results=30)
    structured_meet = [
        s for s in meet_signals
        if s.get("type") in ("meet_decision", "meet_action_item", "meet_blocker", "meet_risk", "meet_open_q")
    ]

    if not structured_meet:
        return {
            "confirmations": [],
            "contradictions": [],
            "blind_spots": [],
            "error": f"No structured signals found for meeting {meeting_id}. Has it been processed yet?",
        }

    # 2. Get signals from each cross-reference group
    chat_context_parts = []
    for gid in cross_ref_group_ids:
        try:
            all_sig = get_all_signals(gid)
            if all_sig:
                formatted = "\n".join([
                    f"  [{s.get('type', '?')}] {s.get('summary', '')[:120]}"
                    for s in all_sig[:20]  # Cap at 20 per group to stay within token limits
                ])
                chat_context_parts.append(f"Group '{gid}':\n{formatted}")
        except Exception as e:
            logger.warning(f"Could not load signals for group {gid}: {e}")

    if not chat_context_parts:
        return {
            "confirmations": [],
            "contradictions": [],
            "blind_spots": [],
            "error": "Could not load any signals from cross-reference groups.",
        }

    # 3. Format inputs for LLM
    meet_text = "\n".join([
        f"  [{s['type']}] {s['summary'][:150]}" for s in structured_meet
    ])
    chat_text = "\n\n".join(chat_context_parts)

    prompt = f"""MEETING SIGNALS (from meeting: {meeting_id}):
{meet_text}

CHAT SIGNALS (from monitored Telegram groups):
{chat_text}"""

    try:
        import json
        result = await call_llm(
            task_type="reasoning",
            messages=[
                {"role": "system", "content": CROSS_REF_SYSTEM_PROMPT},
                {"role": "user", "content": prompt},
            ],
            temperature=0.2,
            max_tokens=1500,
            response_format={"type": "json_object"},
        )
        raw = result["content"].strip()
        if raw.startswith("```"):
            raw = raw.split("```")[1]
            if raw.startswith("json"):
                raw = raw[4:]
        return json.loads(raw)

    except Exception as e:
        logger.error(f"Cross-reference LLM call failed: {e}")
        return {
            "confirmations": [],
            "contradictions": [],
            "blind_spots": [],
            "error": str(e),
        }


def format_cross_ref_for_telegram(analysis: dict, meeting_id: str) -> str:
    """Format cross-reference results as a Telegram message."""
    parts = [f"🔗 *Meet ↔ Chat Cross-Reference*\nMeeting: `{meeting_id}`\n"]

    if analysis.get("error"):
        return f"⚠️ Cross-reference failed: {analysis['error']}"

    confirmations = analysis.get("confirmations", [])
    contradictions = analysis.get("contradictions", [])
    blind_spots = analysis.get("blind_spots", [])

    if not confirmations and not contradictions and not blind_spots:
        return f"🔗 *Meet ↔ Chat Cross-Reference*\nMeeting `{meeting_id}`: No significant connections found between this meeting and your chat groups."

    if confirmations:
        parts.append(f"✅ *Confirmations* ({len(confirmations)})")
        for c in confirmations[:3]:  # Cap at 3 for readability
            sig = "🔴" if c.get("significance") == "high" else "🟡"
            parts.append(f"{sig} Meeting: _{c['meeting_signal'][:100]}_")
            parts.append(f"   Matches [{c.get('group', '?')}]: _{c['chat_signal'][:100]}_\n")

    if contradictions:
        parts.append(f"⚡ *Contradictions* ({len(contradictions)}) — ACTION NEEDED")
        for c in contradictions[:3]:
            parts.append(f"🔴 Meeting decided: _{c['meeting_signal'][:100]}_")
            parts.append(f"   BUT [{c.get('group', '?')}] says: _{c['chat_signal'][:100]}_")
            if c.get("impact"):
                parts.append(f"   Impact: {c['impact'][:100]}\n")

    if blind_spots:
        parts.append(f"🔦 *Blind Spots* ({len(blind_spots)}) — Teams may not know")
        for b in blind_spots[:3]:
            parts.append(f"🟠 {b['meeting_signal'][:120]}")
            if b.get("recommendation"):
                parts.append(f"   → {b['recommendation'][:100]}\n")

    return "\n".join(parts)

Step 16.2 — Add new Telegram commands to bot.py

Add to thirdeye/backend/bot/commands.py (or wherever your command handlers live):

# ─────────────────────────────────────────────
#  Meet Commands — add to existing commands.py
# ─────────────────────────────────────────────

async def cmd_meetsum(update, context):
    """
    /meetsum [meeting_id]
    Summarizes a meeting. Uses the most recent meeting if no ID given.
    """
    from backend.db.chroma import query_signals
    from backend.config import MEET_DEFAULT_GROUP_ID

    args = context.args or []
    group_id = MEET_DEFAULT_GROUP_ID

    if args:
        meeting_id = args[0]
    else:
        # Find the most recent meeting ID from meet_summary signals
        all_summaries = query_signals(group_id, "meeting summary", n_results=5, signal_type="meet_summary")
        if not all_summaries:
            await update.message.reply_text(
                "📭 No meeting summaries found yet. Record a meeting with the ThirdEye Meet extension first."
            )
            return
        # The most recent signal (by position in results, or pick first)
        meeting_id = all_summaries[0].get("meeting_id") or all_summaries[0].get("metadata", {}).get("meeting_id")

    if not meeting_id:
        await update.message.reply_text("Usage: /meetsum [meeting_id]")
        return

    # Fetch summary signal
    signals = query_signals(group_id, meeting_id, n_results=20)
    summary_signal = next(
        (s for s in signals if s.get("type") == "meet_summary"), None
    )

    if summary_signal:
        await update.message.reply_text(
            f"📋 *Meeting Summary*\nID: `{meeting_id}`\n\n{summary_signal['summary']}",
            parse_mode="Markdown",
        )
        return

    # No summary yet — gather structured signals and summarize on-the-fly
    structured = [
        s for s in signals
        if s.get("type") in ("meet_decision", "meet_action_item", "meet_blocker", "meet_risk")
    ]

    if not structured:
        await update.message.reply_text(
            f"📭 No signals found for meeting `{meeting_id}`. "
            "Make sure the meeting has been recorded and processed.",
            parse_mode="Markdown",
        )
        return

    lines = [f"📋 *Meeting Signals* for `{meeting_id}`\n"]
    type_labels = {
        "meet_decision": "✅ Decision",
        "meet_action_item": "📌 Action",
        "meet_blocker": "🚧 Blocker",
        "meet_risk": "⚠️ Risk",
    }
    for s in structured[:10]:
        label = type_labels.get(s["type"], s["type"])
        lines.append(f"{label}: {s['summary'][:120]}")

    await update.message.reply_text("\n".join(lines), parse_mode="Markdown")


async def cmd_meetask(update, context):
    """
    /meetask [meeting_id] [question]
    Asks a question about a specific meeting's knowledge.
    """
    from backend.pipeline import query_knowledge
    from backend.config import MEET_DEFAULT_GROUP_ID

    args = context.args or []
    if len(args) < 2:
        await update.message.reply_text(
            "Usage: /meetask [meeting_id] [your question]\n"
            "Example: /meetask abc-defg-hij What did we decide about the database?"
        )
        return

    meeting_id = args[0]
    question = " ".join(args[1:])
    group_id = MEET_DEFAULT_GROUP_ID

    await update.message.reply_text(f"🔍 Searching meeting `{meeting_id}`...", parse_mode="Markdown")

    # Augment the question with meeting ID context so the RAG retrieval is scoped
    scoped_question = f"From meeting {meeting_id}: {question}"
    answer = await query_knowledge(group_id, scoped_question)

    await update.message.reply_text(
        f"🎙️ *Meet Q&A* — `{meeting_id}`\n\nQ: _{question}_\n\nA: {answer}",
        parse_mode="Markdown",
    )


async def cmd_meetmatch(update, context):
    """
    /meetmatch [meeting_id]
    Finds where a meeting's decisions/blockers match or conflict with your team chat signals.
    """
    from backend.agents.meet_cross_ref import find_cross_references, format_cross_ref_for_telegram
    from backend.db.chroma import query_signals
    from backend.config import MEET_DEFAULT_GROUP_ID

    args = context.args or []
    group_id = MEET_DEFAULT_GROUP_ID

    if args:
        meeting_id = args[0]
    else:
        # Find most recent meeting
        all_summaries = query_signals(group_id, "meeting", n_results=5, signal_type="meet_summary")
        if not all_summaries:
            await update.message.reply_text(
                "📭 No meetings found. Record a meeting first with the ThirdEye Meet extension."
            )
            return
        meeting_id = all_summaries[0].get("meeting_id", "unknown")

    await update.message.reply_text(
        f"🔗 Comparing meeting `{meeting_id}` against your team chats...\n"
        "_This may take a moment._",
        parse_mode="Markdown",
    )

    analysis = await find_cross_references(meeting_id, group_id=group_id)
    formatted = format_cross_ref_for_telegram(analysis, meeting_id)
    await update.message.reply_text(formatted, parse_mode="Markdown")

Step 16.3 — Register the new commands in bot.py

In thirdeye/backend/bot/bot.py, add the new handlers inside your existing application setup (alongside where /ask, /digest, etc. are registered):

# Add these lines where other CommandHandlers are registered:
from backend.bot.commands import cmd_meetsum, cmd_meetask, cmd_meetmatch

application.add_handler(CommandHandler("meetsum", cmd_meetsum))
application.add_handler(CommandHandler("meetask", cmd_meetask))
application.add_handler(CommandHandler("meetmatch", cmd_meetmatch))

Also update the /start message to include the new commands:

# In your /start handler, add to the commands list:
"🎙️ /meetsum [id] — Summarize a meeting\n"
"🔍 /meetask [id] [q] — Ask about a meeting\n"
"🔗 /meetmatch [id] — Match meeting to team chats\n"
In BotFather → /setcommands → paste:
meetsum - Summarize a Google Meet recording
meetask - Ask a question about a meeting
meetmatch - Find connections between meeting and team chats

TEST MILESTONE 16

Create file: thirdeye/scripts/test_m16.py

"""
Test Milestone 16: Meet Telegram commands and cross-reference agent.
Tests command logic and cross-reference analysis without needing a live Telegram bot.
"""
import asyncio
import os
import sys
import json
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))


# ─── Seed data ────────────────────────────────────────────────────────────────

async def seed_meet_signals(meeting_id: str, group_id: str):
    """Seed a meeting with realistic signals."""
    from backend.agents.meet_ingestor import process_meet_chunk

    transcript = """
    Alex: We decided to go with PostgreSQL. Final decision, no more debate.
    Priya: I'll set up the schema by Thursday and own the migration.
    Sam: The checkout timeout bug is still blocking us. Third time this sprint.
    Lisa: Dashboard is still waiting on design specs. That's a two-week blocker.
    Alex: The OAuth refactor is risky — touches everything. High risk if we rush it.
    Sam: Should we delay the deploy to next week? Nobody answered.
    """

    await process_meet_chunk(
        meeting_id=meeting_id,
        group_id=group_id,
        chunk_index=0,
        text=transcript.strip(),
        speaker="Alex",
        timestamp="2026-03-21T10:00:00Z",
        is_final=True,  # triggers summary generation
    )


async def seed_chat_signals(chat_group_id: str):
    """Seed Telegram chat signals to cross-reference against."""
    from backend.pipeline import process_message_batch, set_lens

    set_lens(chat_group_id, "dev")
    messages = [
        {"sender": "Alex", "text": "We should switch to MySQL instead of PostgreSQL. It's simpler.", "timestamp": "2026-03-15T09:00:00Z"},
        {"sender": "Sam", "text": "The checkout timeout is back again. Still not fixed.", "timestamp": "2026-03-18T10:00:00Z"},
        {"sender": "Priya", "text": "OAuth integration is going into the main branch directly, no feature branch.", "timestamp": "2026-03-19T11:00:00Z"},
    ]
    await process_message_batch(chat_group_id, messages)


# ─── Tests ────────────────────────────────────────────────────────────────────

async def test_meetsum_logic():
    """Test that meetsum can find and return a meeting summary."""
    from backend.db.chroma import query_signals
    import chromadb
    from backend.config import CHROMA_DB_PATH

    meeting_id = "test-meet-m16-sum"
    group_id = "test_meet_m16_sum"

    print("Testing /meetsum logic...")
    await seed_meet_signals(meeting_id, group_id)

    # Simulate what /meetsum does
    signals = query_signals(group_id, meeting_id, n_results=20)
    assert len(signals) > 0, "Expected signals for the seeded meeting"
    print(f"  ✅ {len(signals)} signals found for meeting {meeting_id}")

    types = {s["type"] for s in signals}
    print(f"  Signal types: {types}")

    has_summary = "meet_summary" in types
    has_structured = any(t in types for t in ("meet_decision", "meet_action_item", "meet_blocker", "meet_risk"))
    assert has_summary or has_structured, "Expected summary or structured signals"
    print(f"  ✅ Has summary: {has_summary} | Has structured signals: {has_structured}")

    # Cleanup
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    try:
        client.delete_collection(f"ll_{group_id}")
    except Exception:
        pass


async def test_meetask_logic():
    """Test that meetask can answer questions about a meeting."""
    from backend.pipeline import query_knowledge
    import chromadb
    from backend.config import CHROMA_DB_PATH

    meeting_id = "test-meet-m16-ask"
    group_id = "test_meet_m16_ask"

    print("\nTesting /meetask logic...")
    await seed_meet_signals(meeting_id, group_id)

    # Test 1: Question about decisions
    answer = await query_knowledge(group_id, f"From meeting {meeting_id}: What database did we decide on?")
    assert len(answer) > 10, "Expected a substantive answer"
    postgres_mentioned = "postgres" in answer.lower() or "database" in answer.lower() or "sql" in answer.lower()
    assert postgres_mentioned, f"Expected PostgreSQL to be mentioned. Got: {answer[:200]}"
    print(f"  ✅ Decision query answered: {answer[:120]}...")

    # Test 2: Question about action items
    answer2 = await query_knowledge(group_id, f"From meeting {meeting_id}: Who is setting up the schema?")
    assert len(answer2) > 10, "Expected a substantive answer about schema ownership"
    print(f"  ✅ Action item query answered: {answer2[:120]}...")

    # Cleanup
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    try:
        client.delete_collection(f"ll_{group_id}")
    except Exception:
        pass


async def test_meetmatch_cross_reference():
    """Test cross-reference finds contradictions and confirmations."""
    from backend.agents.meet_cross_ref import find_cross_references, format_cross_ref_for_telegram
    import chromadb
    from backend.config import CHROMA_DB_PATH

    meeting_id = "test-meet-m16-match"
    meet_group = "test_meet_m16_match"
    chat_group = "test_chat_m16_match"

    print("\nTesting /meetmatch cross-reference...")
    await seed_meet_signals(meeting_id, meet_group)
    await seed_chat_signals(chat_group)

    # Run cross-reference
    analysis = await find_cross_references(
        meeting_id=meeting_id,
        group_id=meet_group,
        cross_ref_group_ids=[chat_group],
    )

    if analysis.get("error"):
        print(f"  ⚠️ Cross-reference returned error: {analysis['error']}")
        # This is OK if the groups are empty — test passes but notes the condition
    else:
        contradictions = analysis.get("contradictions", [])
        confirmations = analysis.get("confirmations", [])
        blind_spots = analysis.get("blind_spots", [])

        print(f"  Found: {len(contradictions)} contradiction(s), {len(confirmations)} confirmation(s), {len(blind_spots)} blind spot(s)")

        # We seeded a clear contradiction: meeting says PostgreSQL, chat says MySQL
        # And a confirmation: checkout timeout mentioned in both
        # At least one of these categories should have results
        total = len(contradictions) + len(confirmations) + len(blind_spots)
        assert total > 0, "Expected at least one cross-reference finding (contradiction: PostgreSQL vs MySQL)"
        print(f"  ✅ {total} total cross-reference findings")

        if contradictions:
            print(f"  ✅ Contradiction found: {contradictions[0]['meeting_signal'][:80]}")
        if confirmations:
            print(f"  ✅ Confirmation found: {confirmations[0]['meeting_signal'][:80]}")

    # Test formatter
    formatted = format_cross_ref_for_telegram(analysis, meeting_id)
    assert len(formatted) > 20, "Expected a non-empty formatted message"
    assert meeting_id in formatted, "Expected meeting ID in formatted output"
    print(f"  ✅ Telegram formatter produced {len(formatted)} char message")
    print(f"  Preview: {formatted[:200]}...")

    # Cleanup
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    for gid in [meet_group, chat_group]:
        try:
            client.delete_collection(f"ll_{gid}")
        except Exception:
            pass


async def test_command_handlers_importable():
    """Test that all three command handlers can be imported without errors."""
    try:
        from backend.bot.commands import cmd_meetsum, cmd_meetask, cmd_meetmatch
        print("\n  ✅ cmd_meetsum importable")
        print("  ✅ cmd_meetask importable")
        print("  ✅ cmd_meetmatch importable")
    except ImportError as e:
        print(f"\n  ❌ Command import failed: {e}")
        raise


async def main():
    print("Running Milestone 16 tests...\n")
    await test_meetsum_logic()
    await test_meetask_logic()
    await test_meetmatch_cross_reference()
    await test_command_handlers_importable()
    print("\n🎉 MILESTONE 16 PASSED — Meet commands working, cross-reference finding connections")


asyncio.run(main())

Run: cd thirdeye && python scripts/test_m16.py

Expected output:

  ✅ Signals found for meeting test-meet-m16-sum
  ✅ Has summary: True | Has structured signals: True
  ✅ Decision query answered: Based on the meeting, the team decided to go with PostgreSQL...
  ✅ Action item query answered: Priya is responsible for setting up the schema...
  ✅ Contradiction found: We decided to go with PostgreSQL...
  ✅ Telegram formatter produced 450 char message

🎉 MILESTONE 16 PASSED — Meet commands working, cross-reference finding connections

MILESTONE SUMMARY (Updated)

# Milestone What You Have %
0 Scaffolding Folders, deps, env vars, all API keys 0%
1 Provider Router Multi-provider LLM calls with fallback 10%
2 ChromaDB + Embeddings Store and retrieve signals with vector search 20%
3 Core Agents Signal Extractor + Classifier + Context Detector 30%
4 Full Pipeline Messages → Extract → Classify → Store → Query 45%
5 Intelligence Layer Pattern detection + Cross-group analysis 60%
6 Telegram Bot Live bot processing group messages 70%
7 FastAPI + Dashboard API REST API serving all data 85%
8 Unified Runner Bot + API running together 90%
9 Demo Data 3 groups seeded with realistic data 95%
10 Polish & Demo Ready README, rehearsed demo, everything working 100%
11 Document & PDF Ingestion PDFs/DOCX/TXT shared in groups → chunked → stored in RAG 105%
12 Tavily Web Search Query Agent searches web when KB is empty or question is external 110%
13 Link Fetch & Ingestion URLs in messages → fetched → summarized → stored as signals 115%
14 Meet Chrome Extension Extension captures Meet audio → transcribes → POSTs chunks to backend 120%
15 Meet Signal Processing Transcript chunks → decisions/actions/blockers/risks → ChromaDB 125%
16 Meet Telegram Commands /meetsum, /meetask, /meetmatch with cross-reference to chat signals 130%

FILE CHANGE SUMMARY

New Files Created

thirdeye/meet_extension/manifest.json         # Milestone 14 — Chrome extension manifest
thirdeye/meet_extension/content.js            # Milestone 14 — Audio capture + transcription
thirdeye/meet_extension/popup.html            # Milestone 14 — Extension popup UI
thirdeye/meet_extension/popup.js              # Milestone 14 — Popup logic
thirdeye/meet_extension/background.js         # Milestone 14 — Service worker
thirdeye/backend/agents/meet_ingestor.py      # Milestone 15 — Transcript signal extraction
thirdeye/backend/agents/meet_cross_ref.py     # Milestone 16 — Chat ↔ meet cross-reference
thirdeye/scripts/test_m14.py                  # Milestone 14 test
thirdeye/scripts/test_m15.py                  # Milestone 15 test
thirdeye/scripts/test_m16.py                  # Milestone 16 test

Existing Files Modified

thirdeye/.env                                 # Pre-work: MEET_INGEST_SECRET + new vars
thirdeye/backend/config.py                    # Pre-work: new meet config vars
thirdeye/backend/api/routes.py                # M14: /api/meet/start, /api/meet/ingest, /api/meet/meetings
thirdeye/backend/bot/commands.py              # M16: cmd_meetsum, cmd_meetask, cmd_meetmatch
thirdeye/backend/bot/bot.py                   # M16: register 3 new CommandHandlers

Updated Repo Structure (additions only)

thirdeye/
├── meet_extension/                # NEW — Chrome extension
│   ├── manifest.json
│   ├── content.js                 # Core: audio capture + Web Speech API
│   ├── popup.html
│   ├── popup.js
│   ├── background.js
│   ├── icon16.png
│   ├── icon48.png
│   └── icon128.png
│
├── backend/
│   ├── agents/
│   │   ├── meet_ingestor.py       # NEW — transcript → signals
│   │   └── meet_cross_ref.py      # NEW — meet ↔ chat cross-reference
│   ├── api/
│   │   └── routes.py              # MODIFIED — 3 new /api/meet/* endpoints
│   └── bot/
│       ├── commands.py            # MODIFIED — 3 new meet commands
│       └── bot.py                 # MODIFIED — register meet handlers
│
└── scripts/
    ├── test_m14.py                # NEW — extension + endpoint tests
    ├── test_m15.py                # NEW — transcript processing tests
    └── test_m16.py                # NEW — command + cross-reference tests

UPDATED COMMANDS REFERENCE

EXISTING (unchanged):
/start        — Welcome message
/ask [q]      — Query the knowledge base (chat + docs + links)
/search [q]   — Explicit web search via Tavily
/digest       — Intelligence summary
/lens [mode]  — Set/check detection lens
/alerts       — View active warnings

NEW — Google Meet:
/meetsum [id]        — Summarize a meeting (omit ID for most recent)
/meetask [id] [q]    — Ask a question about a specific meeting
/meetmatch [id]      — Find connections between meeting and your team chat signals

PASSIVE (no command needed):
• Text messages      → batched → signal extraction (existing)
• Document drops     → downloaded → chunked → stored (M11)
• URLs in messages   → fetched → summarized → stored (M13)
• Meet Extension     → audio → transcribed → chunked → stored every 30s (M14/M15)

HOW THE FULL MEET FLOW WORKS (End-to-End)

1. Open Google Meet → click ThirdEye extension icon → "▶ Start Recording"

2. Web Speech API transcribes your meeting audio in-browser (no audio leaves the browser)

3. Every 30 seconds, the extension POSTs the transcript text to:
   POST /api/meet/ingest  {meeting_id, text, speaker, chunk_index, ...}

4. Backend queues background processing → meet_ingestor.py extracts signals:
   • meet_decision    → "Team decided to use PostgreSQL"
   • meet_action_item → "Priya to set up schema by Thursday"
   • meet_blocker     → "Dashboard blocked on design specs"
   • meet_risk        → "OAuth refactor is high risk"
   • meet_chunk_raw   → full text stored for similarity search

5. When recording stops, the final chunk triggers meet_summary generation
   (pulls all chunks from ChromaDB, summarizes the full meeting)

6. In Telegram:
   /meetsum abc-defg-hij         → Get the meeting summary
   /meetask abc-defg-hij [q]     → Ask anything about the meeting
   /meetmatch                    → 🔥 Find where the meeting
                                    agrees or conflicts with
                                    your team chat signals

THE WOW MOMENT FOR /meetmatch

"Your sprint planning decided to use PostgreSQL. But your dev team's Telegram chat from last week shows Alex proposed MySQL and nobody objected. And your product team doesn't know about the OAuth risk your tech lead raised in the meeting. ThirdEye connected the dots."

This is Cross-Group Intelligence extended to meetings — the same killer feature that makes ThirdEye unique, now spanning across your live calls AND your async chats.


Every milestone has a test. Every test must pass. No skipping.