# ThirdEye — Meet Extension Milestones (14→16)

> **Prerequisite: Milestones 0–13 must be COMPLETE and PASSING. This feature layers on top of the existing working system.**

> **Same rule: Do NOT skip milestones. Do NOT skip tests. Every test must PASS before moving to the next milestone.**

---

## WHAT THIS ADDS

A Google Meet Chrome extension that:

1. Captures live meeting audio and transcribes it in-browser (Web Speech API — free, no API key)
2. Sends transcript chunks every 30 seconds to your existing ThirdEye FastAPI backend
3. Feeds each chunk to a new backend agent that extracts meeting signals (decisions, action items, blockers, risks) and stores them in ChromaDB alongside your existing chat/doc/link signals
4. Adds new Telegram commands: `/meetsum`, `/meetask`, `/meetmatch` — summarize meetings, query them, and surface connections to prior team discussions

**The integration is seamless:** Meet transcripts become first-class signals in your existing knowledge graph. When you run `/meetmatch`, ThirdEye finds where your meeting decisions align with or contradict signals from your Telegram groups.

---

## PRE-WORK: Dependencies & Config Updates

### Step 0.1 — No new pip packages needed

Transcription happens in the browser via the Web Speech API (Chrome built-in, free, no key), so the backend only ever receives plain text. Your existing FastAPI, ChromaDB, and provider router handle everything else.
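For reference, each transcript chunk from item 2 above travels as plain JSON. A minimal sketch of the payload shape, mirroring the `MeetChunkPayload` model that Milestone 14 adds to the backend (the example values here are illustrative only):

```python
from dataclasses import dataclass, asdict

@dataclass
class MeetChunk:
    """One 30-second transcript chunk, as POSTed to /api/meet/ingest."""
    meeting_id: str          # Meet code, e.g. "abc-defg-hij"
    group_id: str            # ChromaDB group for meet signals
    chunk_index: int         # 0-based position within the meeting
    text: str                # transcript accumulated since the last flush
    speaker: str             # best-effort self-name scraped from the Meet DOM
    timestamp: str           # ISO-8601
    is_final: bool = False   # True only for the last chunk of a meeting

chunk = MeetChunk(
    meeting_id="abc-defg-hij",
    group_id="meet_sessions",
    chunk_index=0,
    text="We decided to ship on Thursday.",
    speaker="Alex",
    timestamp="2026-03-21T10:01:00Z",
)
payload = asdict(chunk)  # ready to serialize as the JSON request body
```

Keeping the field names identical on both sides means the extension needs no serialization logic beyond `JSON.stringify`.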
Optionally, as a server-side fallback for when the Web Speech API is unavailable, you can add Groq's Whisper (free):

```bash
# Optional — only needed if you want the server-side Whisper fallback
# Groq already supports Whisper via your existing GROQ_API_KEY
pip install python-multipart  # for the file upload endpoint (if using audio fallback)
```

Add to `thirdeye/requirements.txt` (optional Whisper fallback only):

```
python-multipart==0.0.20
```

### Step 0.2 — Add new env vars

Append to `thirdeye/.env`:

```bash
# Google Meet Extension
MEET_INGEST_SECRET=thirdeye_meet_secret_change_me  # Shared secret between extension and backend
MEET_DEFAULT_GROUP_ID=meet_sessions                # ChromaDB group for all meet signals
ENABLE_MEET_INGESTION=true
MEET_CROSS_REF_GROUPS=acme_dev,acme_product        # Comma-separated group IDs to cross-reference
```

### Step 0.3 — Update config.py

Add these lines at the bottom of `thirdeye/backend/config.py`:

```python
# Google Meet Extension
MEET_INGEST_SECRET = os.getenv("MEET_INGEST_SECRET", "thirdeye_meet_secret_change_me")
MEET_DEFAULT_GROUP_ID = os.getenv("MEET_DEFAULT_GROUP_ID", "meet_sessions")
ENABLE_MEET_INGESTION = os.getenv("ENABLE_MEET_INGESTION", "true").lower() == "true"
MEET_CROSS_REF_GROUPS = [
    g.strip() for g in os.getenv("MEET_CROSS_REF_GROUPS", "").split(",") if g.strip()
]
```

---

## MILESTONE 14: Google Meet Chrome Extension (120%)

**Goal:** A working Chrome extension that joins any Google Meet, captures live audio via the Web Speech API, buffers the transcript, and POSTs chunks every 30 seconds to the ThirdEye backend at `/api/meet/ingest`. A small popup shows recording status and meeting ID.
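The buffer-and-flush behavior the goal describes (timer flush every 30 s, early flush on overflow, skip near-empty sends) can be sketched language-agnostically before you build it in `content.js`; the constants mirror the ones the content script defines:

```python
MAX_BUFFER_CHARS = 8000   # mirrors MAX_BUFFER_CHARS in content.js
MIN_CHUNK_CHARS = 10      # the backend skips chunks shorter than this anyway

class ChunkBuffer:
    """Accumulates final speech results; a timer calls flush() every 30 s."""

    def __init__(self, send):
        self.send = send  # callable(text), e.g. a POST to /api/meet/ingest
        self.buf = ""

    def add(self, text: str) -> None:
        self.buf += text
        if len(self.buf) > MAX_BUFFER_CHARS:  # overflow guard: flush early
            self.flush()

    def flush(self) -> None:
        if len(self.buf.strip()) >= MIN_CHUNK_CHARS:
            self.send(self.buf.strip())
        self.buf = ""  # cleared either way; the sender re-queues on failure

sent = []
buf = ChunkBuffer(sent.append)
buf.add("We decided to go with PostgreSQL. ")
buf.flush()          # timer tick: one chunk sent
buf.add("x" * 8001)  # oversized input: flushed immediately, no timer needed
```

One detail this sketch omits: on a failed POST, `content.js` prepends the unsent text back onto the buffer so the next flush retries it.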
### Step 14.1 — Create the extension folder ```bash mkdir -p thirdeye/meet_extension ``` ### Step 14.2 — Create manifest.json Create file: `thirdeye/meet_extension/manifest.json` ```json { "manifest_version": 3, "name": "ThirdEye Meet Recorder", "version": "1.0.0", "description": "Captures Google Meet transcripts and sends them to your ThirdEye knowledge base.", "permissions": [ "activeTab", "storage", "scripting" ], "host_permissions": [ "https://meet.google.com/*" ], "background": { "service_worker": "background.js" }, "content_scripts": [ { "matches": ["https://meet.google.com/*"], "js": ["content.js"], "run_at": "document_idle" } ], "action": { "default_popup": "popup.html", "default_title": "ThirdEye Meet Recorder" }, "icons": { "16": "icon16.png", "48": "icon48.png", "128": "icon128.png" } } ``` ### Step 14.3 — Create content.js (the core recorder) Create file: `thirdeye/meet_extension/content.js` ```javascript /** * ThirdEye Meet Recorder — Content Script * Injected into meet.google.com pages. * Uses Web Speech API (Chrome built-in) for live transcription. * Buffers transcript and POSTs chunks to ThirdEye backend every CHUNK_INTERVAL_MS. 
*/ const CHUNK_INTERVAL_MS = 30000; // Send a chunk every 30 seconds const MAX_BUFFER_CHARS = 8000; // ~2000 tokens — safe for LLM processing let recognition = null; let isRecording = false; let transcriptBuffer = ""; let meetingId = null; let backendUrl = "http://localhost:8000"; let ingestSecret = "thirdeye_meet_secret_change_me"; let groupId = "meet_sessions"; let chunkTimer = null; let chunkCount = 0; // --- Helpers --- function getMeetingId() { // Extract meeting code from URL: meet.google.com/abc-defg-hij const match = window.location.pathname.match(/\/([a-z]{3}-[a-z]{4}-[a-z]{3})/i); if (match) return match[1]; // Fallback: use timestamp-based ID return `meet_${Date.now()}`; } function getParticipantName() { // Try to find the user's own name from the Meet DOM const nameEl = document.querySelector('[data-self-name]'); if (nameEl) return nameEl.getAttribute('data-self-name'); const profileEl = document.querySelector('img[data-iml]'); if (profileEl && profileEl.alt) return profileEl.alt; return "Unknown"; } // --- Transport --- async function sendChunkToBackend(text, isFinal = false) { if (!text || text.trim().length < 10) return; // Don't send near-empty chunks const payload = { meeting_id: meetingId, group_id: groupId, chunk_index: chunkCount++, text: text.trim(), speaker: getParticipantName(), timestamp: new Date().toISOString(), is_final: isFinal, }; try { const res = await fetch(`${backendUrl}/api/meet/ingest`, { method: "POST", headers: { "Content-Type": "application/json", "X-ThirdEye-Secret": ingestSecret, }, body: JSON.stringify(payload), }); if (!res.ok) { console.warn(`[ThirdEye] Backend rejected chunk: ${res.status}`); } else { console.log(`[ThirdEye] Chunk ${payload.chunk_index} sent (${text.length} chars)`); chrome.runtime.sendMessage({ type: "CHUNK_SENT", chunkIndex: payload.chunk_index, charCount: text.length, meetingId: meetingId, }); } } catch (err) { console.warn(`[ThirdEye] Failed to send chunk: ${err.message}`); // Buffer is NOT cleared on 
failure — next flush will include this text transcriptBuffer = text + "\n" + transcriptBuffer; } } // --- Periodic flush --- function flushBuffer() { if (transcriptBuffer.trim().length > 0) { sendChunkToBackend(transcriptBuffer); transcriptBuffer = ""; } } function startChunkTimer() { if (chunkTimer) clearInterval(chunkTimer); chunkTimer = setInterval(flushBuffer, CHUNK_INTERVAL_MS); } function stopChunkTimer() { if (chunkTimer) { clearInterval(chunkTimer); chunkTimer = null; } } // --- Web Speech API --- function initSpeechRecognition() { const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition; if (!SpeechRecognition) { console.error("[ThirdEye] Web Speech API not available in this browser."); chrome.runtime.sendMessage({ type: "ERROR", message: "Web Speech API not supported." }); return null; } const rec = new SpeechRecognition(); rec.continuous = true; // Don't stop after first pause rec.interimResults = true; // Get partial results while speaking rec.lang = "en-US"; // Change if needed rec.maxAlternatives = 1; rec.onstart = () => { console.log("[ThirdEye] Speech recognition started."); chrome.runtime.sendMessage({ type: "STATUS", status: "recording", meetingId }); }; rec.onresult = (event) => { let newText = ""; for (let i = event.resultIndex; i < event.results.length; i++) { const result = event.results[i]; if (result.isFinal) { newText += result[0].transcript + " "; } // We only accumulate FINAL results to avoid duplicates from interim results } if (newText.trim()) { transcriptBuffer += newText; // Guard: if buffer is getting huge, flush early if (transcriptBuffer.length > MAX_BUFFER_CHARS) { flushBuffer(); } } }; rec.onerror = (event) => { console.warn(`[ThirdEye] Speech recognition error: ${event.error}`); if (event.error === "not-allowed") { chrome.runtime.sendMessage({ type: "ERROR", message: "Microphone permission denied. 
Allow mic access in Chrome settings.",
      });
    } else if (event.error !== "no-speech") {
      // Restart on non-fatal errors; start() throws InvalidStateError if
      // recognition is already running, so guard it
      setTimeout(() => {
        if (isRecording) {
          try { rec.start(); } catch (e) { /* already started */ }
        }
      }, 1000);
    }
  };

  rec.onend = () => {
    console.log("[ThirdEye] Speech recognition ended.");
    if (isRecording) {
      // Auto-restart: Chrome's Web Speech API stops after ~60s of silence
      setTimeout(() => {
        if (isRecording) {
          try { rec.start(); } catch (e) { /* already started */ }
        }
      }, 250);
    }
  };

  return rec;
}

// --- Public controls (called from popup via chrome.tabs.sendMessage) ---
async function startRecording(config = {}) {
  if (isRecording) return;

  // Load config from storage or use provided values
  const stored = await chrome.storage.sync.get(["backendUrl", "ingestSecret", "groupId"]);
  backendUrl = config.backendUrl || stored.backendUrl || "http://localhost:8000";
  ingestSecret = config.ingestSecret || stored.ingestSecret || "thirdeye_meet_secret_change_me";
  groupId = config.groupId || stored.groupId || "meet_sessions";

  meetingId = getMeetingId();
  chunkCount = 0;
  transcriptBuffer = "";
  isRecording = true;

  recognition = initSpeechRecognition();
  if (!recognition) {
    isRecording = false;
    return;
  }
  recognition.start();
  startChunkTimer();

  // Notify backend that a new meeting has started
  try {
    await fetch(`${backendUrl}/api/meet/start`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-ThirdEye-Secret": ingestSecret,
      },
      body: JSON.stringify({
        meeting_id: meetingId,
        group_id: groupId,
        started_at: new Date().toISOString(),
        speaker: getParticipantName(),
      }),
    });
  } catch (err) {
    console.warn("[ThirdEye] Could not notify backend of meeting start:", err.message);
  }
}

async function stopRecording() {
  if (!isRecording) return;
  isRecording = false;
  stopChunkTimer();
  if (recognition) {
    recognition.stop();
    recognition = null;
  }
  // Send the final buffered chunk marked as final
  await sendChunkToBackend(transcriptBuffer, true);
  transcriptBuffer = "";
  chrome.runtime.sendMessage({ type: "STATUS", status: "stopped", meetingId });
  console.log("[ThirdEye] Recording
stopped.");
}

// --- Message listener (from popup) ---
chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
  if (msg.type === "START_RECORDING") {
    startRecording(msg.config || {}).then(() => sendResponse({ ok: true }));
    return true; // async
  }
  if (msg.type === "STOP_RECORDING") {
    stopRecording().then(() => sendResponse({ ok: true }));
    return true;
  }
  if (msg.type === "GET_STATUS") {
    sendResponse({
      isRecording,
      meetingId,
      bufferLength: transcriptBuffer.length,
      chunkCount,
    });
    return true;
  }
});

console.log("[ThirdEye] Content script loaded on", window.location.href);
```

### Step 14.4 — Create popup.html

Create file: `thirdeye/meet_extension/popup.html`

The popup only needs to expose the element IDs that `popup.js` queries (`status-dot`, `status-label`, `main-btn`, `meeting-id-label`, `chunks-label`, the three settings inputs, `save-btn`, and the `meet-ui`/`not-meet` containers). A minimal version:

```html
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8" />
  <title>ThirdEye Meet</title>
  <style>
    body { width: 280px; font-family: system-ui, sans-serif; padding: 12px; }
    .status-dot { display: inline-block; width: 10px; height: 10px; border-radius: 50%; margin-right: 6px; }
    .status-dot.idle { background: #9aa0a6; }
    .status-dot.recording { background: #ea4335; }
    .btn { width: 100%; padding: 8px; margin: 10px 0; border: none; border-radius: 6px; cursor: pointer; }
    .btn-start { background: #1a73e8; color: #fff; }
    .btn-stop { background: #ea4335; color: #fff; }
    label { display: block; margin-top: 8px; font-size: 12px; }
    input { width: 100%; box-sizing: border-box; }
    #not-meet { display: none; }
  </style>
</head>
<body>
  <div id="not-meet">Open a Google Meet tab to use ThirdEye Meet.</div>
  <div id="meet-ui">
    <div>
      <span id="status-dot" class="status-dot idle"></span>
      <span id="status-label">Idle</span>
    </div>
    <div id="meeting-id-label">Meeting ID: —</div>
    <div id="chunks-label">0 chunks sent</div>
    <button id="main-btn" class="btn btn-start">▶ Start Recording</button>
    <label>Backend URL <input id="backend-url" type="text" /></label>
    <label>Ingest secret <input id="ingest-secret" type="password" /></label>
    <label>Group ID <input id="group-id" type="text" /></label>
    <button id="save-btn" class="btn">Save Settings</button>
  </div>
  <script src="popup.js"></script>
</body>
</html>
``` ### Step 14.5 — Create popup.js Create file: `thirdeye/meet_extension/popup.js` ```javascript let currentTab = null; let isRecording = false; async function getActiveTab() { const tabs = await chrome.tabs.query({ active: true, currentWindow: true }); return tabs[0] || null; } function isMeetTab(tab) { return tab && tab.url && tab.url.includes("meet.google.com"); } async function loadSettings() { const stored = await chrome.storage.sync.get(["backendUrl", "ingestSecret", "groupId"]); document.getElementById("backend-url").value = stored.backendUrl || "http://localhost:8000"; document.getElementById("ingest-secret").value = stored.ingestSecret || ""; document.getElementById("group-id").value = stored.groupId || "meet_sessions"; } async function saveSettings() { const backendUrl = document.getElementById("backend-url").value.trim(); const ingestSecret = document.getElementById("ingest-secret").value.trim(); const groupId = document.getElementById("group-id").value.trim(); await chrome.storage.sync.set({ backendUrl, ingestSecret, groupId }); const btn = document.getElementById("save-btn"); btn.textContent = "✓ Saved"; setTimeout(() => { btn.textContent = "Save Settings"; }, 1500); } function setRecordingUI(recording, meetingId, chunks) { isRecording = recording; const dot = document.getElementById("status-dot"); const label = document.getElementById("status-label"); const btn = document.getElementById("main-btn"); const meetLabel = document.getElementById("meeting-id-label"); const chunksLabel = document.getElementById("chunks-label"); dot.className = "status-dot " + (recording ? "recording" : "idle"); label.textContent = recording ? "Recording…" : "Idle"; btn.className = "btn " + (recording ? "btn-stop" : "btn-start"); btn.textContent = recording ? "■ Stop Recording" : "▶ Start Recording"; meetLabel.textContent = meetingId ? 
`Meeting ID: ${meetingId}` : "Meeting ID: —"; chunksLabel.textContent = `${chunks || 0} chunks sent`; } async function getTabStatus() { if (!currentTab) return; try { const status = await chrome.tabs.sendMessage(currentTab.id, { type: "GET_STATUS" }); setRecordingUI(status.isRecording, status.meetingId, status.chunkCount); } catch { setRecordingUI(false, null, 0); } } async function handleMainBtn() { if (!currentTab) return; const stored = await chrome.storage.sync.get(["backendUrl", "ingestSecret", "groupId"]); if (!isRecording) { await chrome.tabs.sendMessage(currentTab.id, { type: "START_RECORDING", config: { backendUrl: stored.backendUrl || "http://localhost:8000", ingestSecret: stored.ingestSecret || "thirdeye_meet_secret_change_me", groupId: stored.groupId || "meet_sessions", }, }); } else { await chrome.tabs.sendMessage(currentTab.id, { type: "STOP_RECORDING" }); } // Poll for updated status setTimeout(getTabStatus, 500); } // Listen for status updates from content script chrome.runtime.onMessage.addListener((msg) => { if (msg.type === "STATUS") { setRecordingUI(msg.status === "recording", msg.meetingId, null); } if (msg.type === "CHUNK_SENT") { document.getElementById("chunks-label").textContent = `${msg.chunkIndex + 1} chunks sent`; document.getElementById("meeting-id-label").textContent = `Meeting ID: ${msg.meetingId}`; } }); // Init document.addEventListener("DOMContentLoaded", async () => { currentTab = await getActiveTab(); if (!isMeetTab(currentTab)) { document.getElementById("not-meet").style.display = "block"; document.getElementById("meet-ui").style.display = "none"; return; } document.getElementById("meet-ui").style.display = "block"; document.getElementById("not-meet").style.display = "none"; await loadSettings(); await getTabStatus(); document.getElementById("main-btn").addEventListener("click", handleMainBtn); document.getElementById("save-btn").addEventListener("click", saveSettings); // Poll status every 5s while popup is open 
setInterval(getTabStatus, 5000); }); ``` ### Step 14.6 — Create background.js Create file: `thirdeye/meet_extension/background.js` ```javascript /** * ThirdEye Meet — Service Worker (Manifest V3 background) * Minimal: just keeps the extension alive and relays messages. */ chrome.runtime.onInstalled.addListener(() => { console.log("[ThirdEye] Meet extension installed."); }); ``` ### Step 14.7 — Create placeholder icons ```bash # Create simple placeholder icons (replace with real icons before production) # These are 1x1 pixel transparent PNGs encoded as base64 python3 -c " import base64, os # Minimal 1x1 white PNG PNG_1x1 = base64.b64decode( 'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8/5+hHgAHggJ/PchI6QAAAABJRU5ErkJggg==' ) for size in [16, 48, 128]: with open(f'meet_extension/icon{size}.png', 'wb') as f: f.write(PNG_1x1) print(f' Created icon{size}.png (placeholder)') " ``` > **Note:** Replace icons with proper 16×16, 48×48, and 128×128 images before final demo. Use any PNG with a lens/eye motif. 
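If you want placeholders that are at least the correct sizes, solid-color PNGs can be generated with the standard library alone. A sketch, assuming it is run from `thirdeye/meet_extension/` (the fill color is arbitrary):

```python
import struct
import zlib

def _chunk(tag: bytes, data: bytes) -> bytes:
    """One PNG chunk: big-endian length, tag, data, CRC-32 over tag+data."""
    return (struct.pack(">I", len(data)) + tag + data
            + struct.pack(">I", zlib.crc32(tag + data) & 0xFFFFFFFF))

def solid_png(size: int, rgb: tuple = (26, 115, 232)) -> bytes:
    """A size x size solid-color 8-bit RGB PNG."""
    # IHDR: width, height, bit depth 8, color type 2 (RGB), no interlace
    ihdr = struct.pack(">IIBBBBB", size, size, 8, 2, 0, 0, 0)
    row = b"\x00" + bytes(rgb) * size  # each scanline: filter byte 0 + pixels
    idat = zlib.compress(row * size)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))

for size in (16, 48, 128):
    with open(f"icon{size}.png", "wb") as f:
        f.write(solid_png(size))
```

This avoids any image-library dependency; swap in real artwork before the demo as noted above.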
### Step 14.8 — Add the `/api/meet/ingest` and `/api/meet/start` endpoints to FastAPI Add to `thirdeye/backend/api/routes.py` (append after existing routes): ```python # ───────────────────────────────────────────── # Google Meet Ingestion Endpoints # ───────────────────────────────────────────── from pydantic import BaseModel from backend.config import MEET_INGEST_SECRET, ENABLE_MEET_INGESTION class MeetStartPayload(BaseModel): meeting_id: str group_id: str = "meet_sessions" started_at: str speaker: str = "Unknown" class MeetChunkPayload(BaseModel): meeting_id: str group_id: str = "meet_sessions" chunk_index: int text: str speaker: str = "Unknown" timestamp: str is_final: bool = False def _verify_meet_secret(request: Request): secret = request.headers.get("X-ThirdEye-Secret", "") if secret != MEET_INGEST_SECRET: from fastapi import HTTPException raise HTTPException(status_code=403, detail="Invalid Meet ingest secret") @app.post("/api/meet/start") async def meet_start(payload: MeetStartPayload, request: Request): """Called by extension when a new meeting begins.""" _verify_meet_secret(request) if not ENABLE_MEET_INGESTION: return {"ok": False, "reason": "Meet ingestion disabled"} # Store a meeting-started signal immediately from backend.db.chroma import store_signals from datetime import datetime import uuid signal = { "id": str(uuid.uuid4()), "type": "meet_started", "summary": f"Meeting {payload.meeting_id} started by {payload.speaker}", "raw_quote": "", "severity": "low", "status": "active", "sentiment": "neutral", "urgency": "none", "entities": [f"@{payload.speaker}", f"#{payload.meeting_id}"], "keywords": ["meeting", "started", payload.meeting_id], "timestamp": payload.started_at, "group_id": payload.group_id, "lens": "meet", "meeting_id": payload.meeting_id, } store_signals(payload.group_id, [signal]) return {"ok": True, "meeting_id": payload.meeting_id} @app.post("/api/meet/ingest") async def meet_ingest(payload: MeetChunkPayload, request: Request, 
background_tasks: BackgroundTasks): """Called by extension every 30s with a transcript chunk.""" _verify_meet_secret(request) if not ENABLE_MEET_INGESTION: return {"ok": False, "reason": "Meet ingestion disabled"} if len(payload.text.strip()) < 10: return {"ok": True, "skipped": True, "reason": "chunk too short"} # Process asynchronously so the extension gets a fast response from backend.agents.meet_ingestor import process_meet_chunk background_tasks.add_task( process_meet_chunk, payload.meeting_id, payload.group_id, payload.chunk_index, payload.text, payload.speaker, payload.timestamp, payload.is_final, ) return { "ok": True, "meeting_id": payload.meeting_id, "chunk_index": payload.chunk_index, "queued": True, } @app.get("/api/meet/meetings") async def list_meetings(): """List all recorded meetings with their signal counts.""" from backend.db.chroma import get_all_signals from backend.config import MEET_DEFAULT_GROUP_ID all_signals = get_all_signals(MEET_DEFAULT_GROUP_ID) meetings = {} for sig in all_signals: mid = sig.get("meeting_id") or sig.get("metadata", {}).get("meeting_id", "unknown") if mid not in meetings: meetings[mid] = {"meeting_id": mid, "signal_count": 0, "types": {}} meetings[mid]["signal_count"] += 1 t = sig.get("type", "unknown") meetings[mid]["types"][t] = meetings[mid]["types"].get(t, 0) + 1 return {"meetings": list(meetings.values())} @app.get("/api/meet/meetings/{meeting_id}/signals") async def get_meeting_signals(meeting_id: str): """Get all signals for a specific meeting.""" from backend.db.chroma import query_signals from backend.config import MEET_DEFAULT_GROUP_ID results = query_signals(MEET_DEFAULT_GROUP_ID, meeting_id, n_results=50) return {"meeting_id": meeting_id, "signals": results} ``` > **Important:** If you're importing `Request` and `BackgroundTasks` from FastAPI, make sure these are already imported in `routes.py`. 
Add them to the imports block at the top if needed:
>
> ```python
> from fastapi import BackgroundTasks, Request
> ```

### Step 14.9 — Load the extension in Chrome

```
1. Open Chrome → chrome://extensions/
2. Enable "Developer mode" (top-right toggle)
3. Click "Load unpacked"
4. Select the thirdeye/meet_extension/ folder
5. The ThirdEye Meet extension icon should appear in your toolbar
6. Open any Google Meet → click the extension icon
```

### ✅ TEST MILESTONE 14

Create file: `thirdeye/scripts/test_m14.py`

```python
"""
Test Milestone 14: Meet extension backend endpoints.
Tests the /api/meet/start and /api/meet/ingest endpoints directly (no Chrome needed).
"""
import asyncio
import json
import os
import sys

sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))


async def test_manifest_valid():
    """Test extension manifest.json is valid."""
    manifest_path = os.path.join(
        os.path.dirname(__file__), '..', 'meet_extension', 'manifest.json'
    )
    assert os.path.exists(manifest_path), "manifest.json not found at meet_extension/manifest.json"
    with open(manifest_path) as f:
        manifest = json.load(f)
    assert manifest.get("manifest_version") == 3, "Must be Manifest V3"
    assert "meet.google.com" in str(manifest.get("host_permissions", [])), \
        "Missing meet.google.com host permission"
    assert manifest.get("background", {}).get("service_worker"), "Missing service_worker"
    assert manifest.get("action", {}).get("default_popup"), "Missing popup"
    print("  ✅ manifest.json is valid MV3")


async def test_extension_files_exist():
    """Test all required extension files exist."""
    base = os.path.join(os.path.dirname(__file__), '..', 'meet_extension')
    required = ["manifest.json", "content.js", "popup.html", "popup.js", "background.js"]
    for filename in required:
        path = os.path.join(base, filename)
        assert os.path.exists(path), f"Missing extension file: {filename}"
        print(f"  ✅ {filename} exists")


async def test_meet_start_endpoint():
    """Test /api/meet/start returns 200 with correct
secret.""" import httpx from backend.config import MEET_INGEST_SECRET async with httpx.AsyncClient(base_url="http://localhost:8000") as client: # Test with correct secret resp = await client.post( "/api/meet/start", json={ "meeting_id": "test-meet-m14", "group_id": "test_meet_m14", "started_at": "2026-03-21T10:00:00Z", "speaker": "Test User", }, headers={"X-ThirdEye-Secret": MEET_INGEST_SECRET}, ) assert resp.status_code == 200, f"Expected 200, got {resp.status_code}: {resp.text}" data = resp.json() assert data.get("ok") is True print(f" ✅ /api/meet/start returned ok=True for meeting test-meet-m14") # Test with wrong secret → should get 403 resp_bad = await client.post( "/api/meet/start", json={"meeting_id": "fake", "group_id": "fake", "started_at": "2026-03-21T10:00:00Z"}, headers={"X-ThirdEye-Secret": "wrong_secret"}, ) assert resp_bad.status_code == 403, f"Expected 403 for bad secret, got {resp_bad.status_code}" print(" ✅ Bad secret correctly rejected with 403") async def test_meet_ingest_endpoint(): """Test /api/meet/ingest accepts a transcript chunk and queues processing.""" import httpx from backend.config import MEET_INGEST_SECRET async with httpx.AsyncClient(base_url="http://localhost:8000") as client: resp = await client.post( "/api/meet/ingest", json={ "meeting_id": "test-meet-m14", "group_id": "test_meet_m14", "chunk_index": 0, "text": "We decided to go with PostgreSQL for the primary database. " "Alex will set up the schema by Thursday. 
" "The migration scripts need to be reviewed before deployment.", "speaker": "Test User", "timestamp": "2026-03-21T10:01:00Z", "is_final": False, }, headers={"X-ThirdEye-Secret": MEET_INGEST_SECRET}, timeout=10.0, ) assert resp.status_code == 200, f"Expected 200, got {resp.status_code}: {resp.text}" data = resp.json() assert data.get("ok") is True assert data.get("queued") is True print(f" ✅ /api/meet/ingest chunk accepted and queued") # Wait briefly for background task to process await asyncio.sleep(5) # Verify the meeting appears in /api/meet/meetings async with httpx.AsyncClient(base_url="http://localhost:8000") as client: resp = await client.get("/api/meet/meetings") assert resp.status_code == 200 data = resp.json() meetings = data.get("meetings", []) ids = [m["meeting_id"] for m in meetings] assert "test-meet-m14" in ids, f"Meeting test-meet-m14 not found in {ids}" print(f" ✅ Meeting test-meet-m14 visible in /api/meet/meetings") async def test_meet_skip_short_chunk(): """Test that very short chunks are gracefully skipped.""" import httpx from backend.config import MEET_INGEST_SECRET async with httpx.AsyncClient(base_url="http://localhost:8000") as client: resp = await client.post( "/api/meet/ingest", json={ "meeting_id": "test-meet-m14", "group_id": "test_meet_m14", "chunk_index": 99, "text": "Uh", # Too short "speaker": "Test User", "timestamp": "2026-03-21T10:02:00Z", "is_final": False, }, headers={"X-ThirdEye-Secret": MEET_INGEST_SECRET}, ) assert resp.status_code == 200 data = resp.json() assert data.get("skipped") is True, "Expected short chunk to be skipped" print(f" ✅ Short chunk correctly skipped") async def main(): print("Running Milestone 14 tests...\n") print("NOTE: The FastAPI server must be running: python run_api.py\n") await test_manifest_valid() await test_extension_files_exist() try: await test_meet_start_endpoint() await test_meet_ingest_endpoint() await test_meet_skip_short_chunk() print("\n🎉 MILESTONE 14 PASSED — Extension files valid, 
backend endpoints working") except Exception as e: print(f"\n💥 MILESTONE 14 FAILED: {e}") print(" Make sure: python run_api.py is running before running this test") raise asyncio.run(main()) ``` Run: `cd thirdeye && python run_api.py & # in one terminal` Then: `cd thirdeye && python scripts/test_m14.py` **Expected output:** ``` ✅ manifest.json is valid MV3 ✅ content.js exists ✅ popup.html exists ✅ popup.js exists ✅ background.js exists ✅ /api/meet/start returned ok=True for meeting test-meet-m14 ✅ Bad secret correctly rejected with 403 ✅ /api/meet/ingest chunk accepted and queued ✅ Meeting test-meet-m14 visible in /api/meet/meetings ✅ Short chunk correctly skipped 🎉 MILESTONE 14 PASSED — Extension files valid, backend endpoints working ``` --- ## MILESTONE 15: Meet Transcript Processing Agent (125%) **Goal:** A new `meet_ingestor.py` agent that processes raw transcript chunks — extracting meeting-specific signals (decisions, action items, blockers, open questions, risks) — and stores them in ChromaDB. These signals are queryable alongside all existing chat/document/link signals. A final-chunk handler produces a full meeting summary signal. ### Step 15.1 — Create the Meet Ingestor agent Create file: `thirdeye/backend/agents/meet_ingestor.py` ```python """ Meet Ingestor Agent Processes raw Google Meet transcript chunks and extracts structured signals. 
Signal types produced: meet_decision — A decision made during the meeting meet_action_item — A task assigned to someone meet_blocker — A blocker or dependency raised meet_risk — A risk or concern identified meet_open_q — An unresolved question left open meet_summary — Full meeting summary (emitted on is_final=True) meet_chunk_raw — Raw transcript chunk (always stored, for full-text search) """ import asyncio import json import logging import uuid from datetime import datetime from backend.providers import call_llm from backend.db.chroma import store_signals logger = logging.getLogger("thirdeye.agents.meet_ingestor") # ─── Extraction prompt ─────────────────────────────────────────────────────── EXTRACTION_SYSTEM_PROMPT = """You are an expert meeting analyst. You receive raw transcript chunks from a Google Meet recording and extract structured signals. Extract ONLY signals that are clearly present. Do NOT hallucinate or infer beyond what is stated. Return ONLY a valid JSON object with this exact structure: { "decisions": [ {"text": "...", "owner": "@name or null", "confidence": "high|medium|low"} ], "action_items": [ {"text": "...", "owner": "@name or null", "due": "date string or null", "confidence": "high|medium|low"} ], "blockers": [ {"text": "...", "blocking_what": "...", "confidence": "high|medium|low"} ], "risks": [ {"text": "...", "severity": "high|medium|low", "confidence": "high|medium|low"} ], "open_questions": [ {"text": "...", "confidence": "high|medium|low"} ] } Rules: - If a category has nothing, use an empty array [] - owner must start with @ if it's a person's name (e.g. "@Alex") - text must be a clear, standalone sentence — not a fragment - Only include confidence "high" if the signal is unambiguous - Do NOT reproduce filler words, pleasantries, or off-topic banter - Return JSON only — no markdown, no preamble, no explanation""" SUMMARY_SYSTEM_PROMPT = """You are a meeting intelligence expert. 
Given a full meeting transcript (possibly from multiple chunks), write a concise but complete meeting summary. Structure your summary as: 1. One-sentence overview (what was the meeting about) 2. Key decisions made (bullet points, max 5) 3. Action items assigned (who does what by when) 4. Blockers or risks raised 5. Open questions still unresolved Keep the summary under 400 words. Be specific. Use names when available. Do NOT use filler phrases like "the team discussed" — just state what was decided/agreed/assigned.""" # ─── Signal builder ───────────────────────────────────────────────────────── def _build_signal( signal_type: str, summary: str, raw_quote: str, severity: str, entities: list[str], keywords: list[str], timestamp: str, group_id: str, meeting_id: str, urgency: str = "none", status: str = "open", ) -> dict: return { "id": str(uuid.uuid4()), "type": signal_type, "summary": summary, "raw_quote": raw_quote[:500] if raw_quote else "", "severity": severity, "status": status, "sentiment": "neutral", "urgency": urgency, "entities": entities, "keywords": keywords, "timestamp": timestamp, "group_id": group_id, "lens": "meet", "meeting_id": meeting_id, } def _extract_entities(text: str, owner: str = None) -> list[str]: """Extract entity strings from text (names starting with @).""" import re entities = re.findall(r"@[\w]+", text) if owner and owner.startswith("@"): entities.append(owner) return list(set(entities)) def _extract_keywords(text: str) -> list[str]: """Simple keyword extraction: lowercase meaningful words.""" stopwords = {"the", "a", "an", "is", "are", "was", "were", "will", "to", "of", "in", "on", "at", "for", "by", "with", "this", "that", "and", "or", "but", "we", "i", "it", "be", "do", "have", "has", "had", "not"} words = text.lower().split() keywords = [w.strip(".,!?;:\"'") for w in words if len(w) > 3 and w not in stopwords] return list(dict.fromkeys(keywords))[:10] # deduplicate, keep first 10 # ─── Main processing function 
────────────────────────────────────────────────
async def process_meet_chunk(
    meeting_id: str,
    group_id: str,
    chunk_index: int,
    text: str,
    speaker: str,
    timestamp: str,
    is_final: bool,
):
    """
    Full pipeline for a transcript chunk:
    1. Always store raw chunk for full-text search
    2. Extract structured signals via LLM
    3. If is_final, generate a full meeting summary
    """
    logger.info(f"Processing meet chunk {chunk_index} for meeting {meeting_id} ({len(text)} chars)")

    signals_to_store = []

    # 1. Always store the raw chunk (enables full-text similarity search later)
    raw_signal = _build_signal(
        signal_type="meet_chunk_raw",
        summary=f"[{meeting_id}] Chunk {chunk_index}: {text[:120]}...",
        raw_quote=text,
        severity="low",
        entities=[f"@{speaker}"] if speaker and speaker != "Unknown" else [],
        keywords=_extract_keywords(text),
        timestamp=timestamp,
        group_id=group_id,
        meeting_id=meeting_id,
    )
    signals_to_store.append(raw_signal)

    # 2. Extract structured signals via LLM
    try:
        result = await call_llm(
            task_type="fast_large",
            messages=[
                {"role": "system", "content": EXTRACTION_SYSTEM_PROMPT},
                {"role": "user", "content": f"Transcript chunk from speaker '{speaker}':\n\n{text}"},
            ],
            temperature=0.1,
            max_tokens=1500,
            response_format={"type": "json_object"},
        )
        raw_json = result["content"].strip()
        # Strip markdown code fences if present
        if raw_json.startswith("```"):
            raw_json = raw_json.split("```")[1]
            if raw_json.startswith("json"):
                raw_json = raw_json[4:]
        extracted = json.loads(raw_json)
    except Exception as e:
        logger.warning(f"Meet extraction LLM failed for chunk {chunk_index}: {e}")
        extracted = {}

    # Decisions
    for item in extracted.get("decisions", []):
        if item.get("confidence") in ("high", "medium"):
            signals_to_store.append(_build_signal(
                signal_type="meet_decision",
                summary=item["text"],
                raw_quote=item["text"],
                severity="medium",
                entities=_extract_entities(item["text"], item.get("owner")),
                keywords=_extract_keywords(item["text"]),
                timestamp=timestamp,
                group_id=group_id,
                meeting_id=meeting_id,
                status="decided",
            ))

    # Action items
    for item in extracted.get("action_items", []):
        if item.get("confidence") in ("high", "medium"):
            due_str = f" Due: {item['due']}." if item.get("due") else ""
            signals_to_store.append(_build_signal(
                signal_type="meet_action_item",
                summary=f"{item['text']}{due_str}",
                raw_quote=item["text"],
                severity="medium",
                entities=_extract_entities(item["text"], item.get("owner")),
                keywords=_extract_keywords(item["text"]),
                timestamp=timestamp,
                group_id=group_id,
                meeting_id=meeting_id,
                urgency="medium" if item.get("due") else "low",
                status="open",
            ))

    # Blockers
    for item in extracted.get("blockers", []):
        if item.get("confidence") in ("high", "medium"):
            signals_to_store.append(_build_signal(
                signal_type="meet_blocker",
                summary=item["text"],
                raw_quote=item["text"],
                severity="high",
                entities=_extract_entities(item["text"]),
                keywords=_extract_keywords(item["text"]),
                timestamp=timestamp,
                group_id=group_id,
                meeting_id=meeting_id,
                urgency="high",
                status="open",
            ))

    # Risks (no confidence gate: even tentative risks are worth surfacing)
    for item in extracted.get("risks", []):
        signals_to_store.append(_build_signal(
            signal_type="meet_risk",
            summary=item["text"],
            raw_quote=item["text"],
            severity=item.get("severity", "medium"),
            entities=_extract_entities(item["text"]),
            keywords=_extract_keywords(item["text"]),
            timestamp=timestamp,
            group_id=group_id,
            meeting_id=meeting_id,
            urgency="medium",
            status="open",
        ))

    # Open questions
    for item in extracted.get("open_questions", []):
        if item.get("confidence") in ("high", "medium"):
            signals_to_store.append(_build_signal(
                signal_type="meet_open_q",
                summary=item["text"],
                raw_quote=item["text"],
                severity="low",
                entities=_extract_entities(item["text"]),
                keywords=_extract_keywords(item["text"]),
                timestamp=timestamp,
                group_id=group_id,
                meeting_id=meeting_id,
                status="open",
            ))

    # 3. If this is the final chunk, generate a meeting summary
    if is_final:
        summary_signal = await _generate_meeting_summary(
            meeting_id, group_id, text, speaker, timestamp
        )
        if summary_signal:
            signals_to_store.append(summary_signal)

    # Store everything
    if signals_to_store:
        store_signals(group_id, signals_to_store)
        logger.info(
            f"Stored {len(signals_to_store)} signals for meeting {meeting_id} chunk {chunk_index}"
        )

    return signals_to_store


async def _generate_meeting_summary(
    meeting_id: str,
    group_id: str,
    final_chunk_text: str,
    speaker: str,
    timestamp: str,
) -> dict | None:
    """
    Pull all raw chunks for this meeting from ChromaDB and generate a summary.
    Falls back to summarizing just the final chunk if retrieval fails.
    """
    from backend.db.chroma import query_signals

    try:
        # Get all raw chunks for this meeting
        raw_chunks = query_signals(
            group_id,
            meeting_id,
            n_results=50,
            signal_type="meet_chunk_raw",
        )
        # Guard: the vector query can also surface raw chunks from OTHER
        # meetings stored in the same group, so keep only this meeting's
        raw_chunks = [
            s for s in raw_chunks
            if (s.get("meeting_id") or s.get("metadata", {}).get("meeting_id")) == meeting_id
        ]
        full_transcript = "\n\n".join(
            [s.get("raw_quote") or s.get("summary", "") for s in raw_chunks]
        )
        if not full_transcript.strip():
            full_transcript = final_chunk_text
    except Exception:
        full_transcript = final_chunk_text

    try:
        result = await call_llm(
            task_type="fast_large",
            messages=[
                {"role": "system", "content": SUMMARY_SYSTEM_PROMPT},
                {
                    "role": "user",
                    "content": f"Meeting ID: {meeting_id}\n\nFull transcript:\n\n{full_transcript[:6000]}",
                },
            ],
            temperature=0.3,
            max_tokens=600,
        )
        summary_text = result["content"].strip()
    except Exception as e:
        logger.warning(f"Meeting summary generation failed: {e}")
        return None

    return _build_signal(
        signal_type="meet_summary",
        summary=summary_text,
        raw_quote=full_transcript[:500],
        severity="medium",
        entities=[f"@{speaker}"] if speaker and speaker != "Unknown" else [],
        keywords=_extract_keywords(summary_text),
        timestamp=timestamp,
        group_id=group_id,
        meeting_id=meeting_id,
        status="completed",
    )
```

### ✅ TEST MILESTONE 15

Create file: `thirdeye/scripts/test_m15.py`

```python
"""
Test Milestone 15: Meet transcript processing agent.
Tests signal extraction from transcript text WITHOUT needing the extension or Chrome.
"""
import asyncio
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))

# Sample meeting transcript (realistic, multi-topic)
SAMPLE_TRANSCRIPT_1 = """
Alex: Alright, let's get started. So the main thing today is the database migration.
Sam: Yeah, we've been going back and forth but I think we should just commit to PostgreSQL. It has better support for our JSON query patterns and the team already knows it.
Alex: Agreed, let's make that the decision. We go with PostgreSQL.
Priya: I can set up the initial schema. I'll have it ready by Thursday.
Sam: Great. One thing though — the legacy MySQL tables still have some data we need to migrate. I have no idea how long that's going to take. That's a real risk.
Alex: Who's owning the migration scripts?
Priya: I'll do it, but I'll need the final schema signed off before I start. That's a blocker for me.
Sam: When do we need the migration done by?
Alex: End of sprint, so March 28th.
Sam: Can we do that? I'm not sure.
Alex: We'll try. Priya, can you at least start the schema this week?
Priya: Yes, schema by Thursday, migration scripts next week if all goes well.
"""

SAMPLE_TRANSCRIPT_2 = """
Lisa: Moving on — the client dashboard is still blocked waiting on design specs.
Alex: Yeah that's been two weeks now. We literally cannot start without those specs.
Lisa: I know, I'll follow up with design today. That's on me.
Sam: Also, the checkout endpoint is still hitting intermittent timeouts. Third time this sprint. We need to actually fix this, not just restart pods.
Alex: Agreed, that needs an owner. Sam can you pick that up?
Sam: Yeah I'll investigate this week. I'll add a ticket.
Lisa: Any risks before we close?
Priya: The OAuth integration is touching a lot of the auth layer. If something breaks there, it could affect all our users at once. High risk.
Alex: Good call. Let's make sure we do that in a feature branch and have a rollback plan.
"""


async def test_signal_extraction_chunk_1():
    """Test extraction from a decision-heavy transcript."""
    from backend.agents.meet_ingestor import process_meet_chunk
    from backend.db.chroma import query_signals
    import chromadb
    from backend.config import CHROMA_DB_PATH

    group_id = "test_meet_m15_a"
    meeting_id = "sprint-planning-m15"

    print("Testing signal extraction from transcript chunk 1 (decisions + action items)...")
    signals = await process_meet_chunk(
        meeting_id=meeting_id,
        group_id=group_id,
        chunk_index=0,
        text=SAMPLE_TRANSCRIPT_1.strip(),
        speaker="Alex",
        timestamp="2026-03-21T10:00:00Z",
        is_final=False,
    )

    assert len(signals) > 0, "Expected at least some signals to be extracted"
    print(f" ✅ {len(signals)} total signals produced")
    types = [s["type"] for s in signals]
    print(f" Types found: {set(types)}")

    # Must have at least a raw chunk
    assert "meet_chunk_raw" in types, "Expected raw chunk signal"
    print(" ✅ Raw chunk stored (enables full-text search)")

    # Should have extracted decisions (PostgreSQL decision is clear)
    decisions = [s for s in signals if s["type"] == "meet_decision"]
    assert len(decisions) > 0, "Expected at least one decision (PostgreSQL decision is explicit)"
    print(f" ✅ {len(decisions)} decision signal(s) extracted")
    print(f" First decision: {decisions[0]['summary'][:100]}")

    # Should have extracted action items (Priya - schema by Thursday)
    actions = [s for s in signals if s["type"] == "meet_action_item"]
    assert len(actions) > 0, "Expected at least one action item (Priya - schema by Thursday)"
    print(f" ✅ {len(actions)} action item(s) extracted")
    print(f" First action: {actions[0]['summary'][:100]}")

    # Verify signals are in ChromaDB
    results = query_signals(group_id, "database decision PostgreSQL")
    assert len(results) > 0, "Expected signals to be queryable from ChromaDB"
    print(f" ✅ Signals queryable from ChromaDB ({len(results)} results for 'database decision PostgreSQL')")

    # Cleanup
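    # Dropping the per-group collection keeps repeated test runs idempotent.
    # (Note: the "ll_" collection-name prefix is assumed to match how the
    # existing ChromaDB layer from earlier milestones names its collections;
    # it is not defined in this file.)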
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    try:
        client.delete_collection(f"ll_{group_id}")
    except Exception:
        pass


async def test_signal_extraction_chunk_2():
    """Test extraction from a blocker + risk heavy transcript."""
    from backend.agents.meet_ingestor import process_meet_chunk
    import chromadb
    from backend.config import CHROMA_DB_PATH

    group_id = "test_meet_m15_b"
    meeting_id = "standup-m15"

    print("\nTesting signal extraction from transcript chunk 2 (blockers + risks)...")
    signals = await process_meet_chunk(
        meeting_id=meeting_id,
        group_id=group_id,
        chunk_index=0,
        text=SAMPLE_TRANSCRIPT_2.strip(),
        speaker="Lisa",
        timestamp="2026-03-21T10:30:00Z",
        is_final=False,
    )

    types = [s["type"] for s in signals]
    print(f" Types found: {set(types)}")

    blockers = [s for s in signals if s["type"] == "meet_blocker"]
    risks = [s for s in signals if s["type"] == "meet_risk"]

    assert len(blockers) > 0, "Expected at least one blocker (dashboard blocked on design specs)"
    print(f" ✅ {len(blockers)} blocker(s) extracted")
    print(f" First blocker: {blockers[0]['summary'][:100]}")

    assert len(risks) > 0, "Expected at least one risk (OAuth touching auth layer)"
    print(f" ✅ {len(risks)} risk(s) extracted")
    print(f" First risk: {risks[0]['summary'][:100]}")

    # Cleanup
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    try:
        client.delete_collection(f"ll_{group_id}")
    except Exception:
        pass


async def test_final_chunk_generates_summary():
    """Test that is_final=True triggers a summary signal generation."""
    from backend.agents.meet_ingestor import process_meet_chunk
    import chromadb
    from backend.config import CHROMA_DB_PATH

    group_id = "test_meet_m15_c"
    meeting_id = "full-meeting-m15"

    print("\nTesting final chunk triggers meeting summary...")

    # First chunk
    await process_meet_chunk(
        meeting_id=meeting_id,
        group_id=group_id,
        chunk_index=0,
        text=SAMPLE_TRANSCRIPT_1.strip(),
        speaker="Alex",
        timestamp="2026-03-21T10:00:00Z",
        is_final=False,
    )

    # Final chunk
    signals = await process_meet_chunk(
        meeting_id=meeting_id,
        group_id=group_id,
        chunk_index=1,
        text=SAMPLE_TRANSCRIPT_2.strip(),
        speaker="Lisa",
        timestamp="2026-03-21T10:30:00Z",
        is_final=True,
    )

    types = [s["type"] for s in signals]
    assert "meet_summary" in types, "Expected a meet_summary signal on is_final=True"

    summary_sig = next(s for s in signals if s["type"] == "meet_summary")
    assert len(summary_sig["summary"]) > 50, "Summary should be at least 50 chars"
    print(f" ✅ Meeting summary generated ({len(summary_sig['summary'])} chars)")
    print(f" Preview: {summary_sig['summary'][:150]}...")

    # Cleanup
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    try:
        client.delete_collection(f"ll_{group_id}")
    except Exception:
        pass


async def test_signals_coexist_with_chat_signals():
    """Test that meet signals are queryable alongside existing chat signals."""
    from backend.agents.meet_ingestor import process_meet_chunk
    from backend.pipeline import process_message_batch, query_knowledge, set_lens
    import chromadb
    from backend.config import CHROMA_DB_PATH

    group_id = "test_meet_m15_d"
    meeting_id = "integration-test-m15"
    set_lens(group_id, "dev")

    print("\nTesting meet signals + chat signals coexist...")

    # Add chat signals
    chat_messages = [
        {"sender": "Alex", "text": "The team agreed in a previous meeting we'd use Redis for caching.", "timestamp": "2026-03-20T09:00:00Z"},
        {"sender": "Priya", "text": "The timeout bug on checkout is still unresolved from last sprint.", "timestamp": "2026-03-20T09:05:00Z"},
    ]
    await process_message_batch(group_id, chat_messages)
    print(" ✅ Chat signals stored")

    # Add meet signals
    await process_meet_chunk(
        meeting_id=meeting_id,
        group_id=group_id,
        chunk_index=0,
        text="We decided in today's meeting to switch from Redis to Memcached for the caching layer. Sam will update the config by Friday.",
        speaker="Alex",
        timestamp="2026-03-21T10:00:00Z",
        is_final=False,
    )
    print(" ✅ Meet signals stored")

    # Query across both
    answer = await query_knowledge(group_id, "What did we decide about caching?")
    assert len(answer) > 20, "Expected a substantive answer about caching"
    print(f" ✅ Query across chat + meet: {answer[:120]}...")

    # Cleanup
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    try:
        client.delete_collection(f"ll_{group_id}")
    except Exception:
        pass


async def main():
    print("Running Milestone 15 tests...\n")
    await test_signal_extraction_chunk_1()
    await test_signal_extraction_chunk_2()
    await test_final_chunk_generates_summary()
    await test_signals_coexist_with_chat_signals()
    print("\n🎉 MILESTONE 15 PASSED — Meet transcript agent extracting and storing signals correctly")

asyncio.run(main())
```

Run: `cd thirdeye && python scripts/test_m15.py`

**Expected output:** All ✅ checks. Decisions, action items, blockers, and risks extracted correctly. Final chunk generates a summary. Meet signals coexist and are queryable alongside chat signals.

```
🎉 MILESTONE 15 PASSED — Meet transcript agent extracting and storing signals correctly
```

---

## MILESTONE 16: Telegram Meet Commands + Cross-Reference (130%)

**Goal:** Three new Telegram commands that query your meeting knowledge and surface connections to prior Telegram group discussions:

- `/meetsum [meeting_id]` — summarizes a meeting (or the most recent one)
- `/meetask [meeting_id] [question]` — asks a specific question about a meeting
- `/meetmatch` — finds where your meeting signals align with OR contradict signals from your monitored Telegram groups (the big wow moment)

### Step 16.1 — Create the meet cross-reference agent

Create file: `thirdeye/backend/agents/meet_cross_ref.py`

```python
"""
Meet Cross-Reference Agent

Finds connections between meeting signals and existing Telegram group signals.
Surfaces: confirmations (meeting agrees with chat), contradictions (meeting
contradicts chat), and blind spots (meeting discusses something chat groups
don't know about).
"""
import json
import logging
from backend.providers import call_llm
from backend.db.chroma import query_signals, get_all_signals
from backend.config import MEET_CROSS_REF_GROUPS, MEET_DEFAULT_GROUP_ID

logger = logging.getLogger("thirdeye.agents.meet_cross_ref")

CROSS_REF_SYSTEM_PROMPT = """You are an expert at finding connections between meeting discussions and team chat history.

You will receive:
1. MEETING SIGNALS — decisions, action items, blockers, risks from a recent Google Meet
2. CHAT SIGNALS — existing signals from team Telegram groups

Find meaningful connections across three categories:

CONFIRMATIONS: Meeting agrees with or reinforces something from chat history
CONTRADICTIONS: Meeting decision conflicts with what was said/decided in chat
BLIND SPOTS: Important things from the meeting that the chat teams don't seem to know about

Return ONLY a valid JSON object:
{
  "confirmations": [
    {"meeting_signal": "...", "chat_signal": "...", "group": "...", "significance": "high|medium|low"}
  ],
  "contradictions": [
    {"meeting_signal": "...", "chat_signal": "...", "group": "...", "impact": "...", "significance": "high|medium|low"}
  ],
  "blind_spots": [
    {"meeting_signal": "...", "teams_unaware": ["group1", "group2"], "recommendation": "..."}
  ]
}

Rules:
- Only include HIGH confidence matches — do not stretch for weak connections
- Keep each signal description concise (1 sentence max)
- significance "high" = this matters for team alignment; "medium" = worth noting; "low" = minor
- If a category has nothing meaningful, use an empty array []
- Return JSON only"""


async def find_cross_references(
    meeting_id: str,
    group_id: str = None,
    cross_ref_group_ids: list[str] = None,
) -> dict:
    """
    Compare meeting signals against chat group signals.

    Args:
        meeting_id: The meeting to analyze
        group_id: ChromaDB group where meet signals are stored (defaults to MEET_DEFAULT_GROUP_ID)
        cross_ref_group_ids: Groups to compare against (defaults to MEET_CROSS_REF_GROUPS from config)

    Returns:
        Dict with confirmations, contradictions, blind_spots lists
    """
    group_id = group_id or MEET_DEFAULT_GROUP_ID
    cross_ref_group_ids = cross_ref_group_ids or MEET_CROSS_REF_GROUPS

    if not cross_ref_group_ids:
        return {
            "confirmations": [],
            "contradictions": [],
            "blind_spots": [],
            "error": "No cross-reference groups configured. Set MEET_CROSS_REF_GROUPS in .env",
        }

    # 1. Get meeting signals (decisions, actions, blockers, risks — NOT raw chunks)
    meet_signals = query_signals(group_id, meeting_id, n_results=30)
    structured_meet = [
        s for s in meet_signals
        if s.get("type") in ("meet_decision", "meet_action_item", "meet_blocker", "meet_risk", "meet_open_q")
    ]

    if not structured_meet:
        return {
            "confirmations": [],
            "contradictions": [],
            "blind_spots": [],
            "error": f"No structured signals found for meeting {meeting_id}. Has it been processed yet?",
        }

    # 2. Get signals from each cross-reference group
    chat_context_parts = []
    for gid in cross_ref_group_ids:
        try:
            all_sig = get_all_signals(gid)
            if all_sig:
                formatted = "\n".join([
                    f"  [{s.get('type', '?')}] {s.get('summary', '')[:120]}"
                    for s in all_sig[:20]  # Cap at 20 per group to stay within token limits
                ])
                chat_context_parts.append(f"Group '{gid}':\n{formatted}")
        except Exception as e:
            logger.warning(f"Could not load signals for group {gid}: {e}")

    if not chat_context_parts:
        return {
            "confirmations": [],
            "contradictions": [],
            "blind_spots": [],
            "error": "Could not load any signals from cross-reference groups.",
        }

    # 3. Format inputs for LLM
    meet_text = "\n".join([
        f"  [{s['type']}] {s['summary'][:150]}"
        for s in structured_meet
    ])
    chat_text = "\n\n".join(chat_context_parts)

    prompt = f"""MEETING SIGNALS (from meeting: {meeting_id}):
{meet_text}

CHAT SIGNALS (from monitored Telegram groups):
{chat_text}"""

    try:
        result = await call_llm(
            task_type="reasoning",
            messages=[
                {"role": "system", "content": CROSS_REF_SYSTEM_PROMPT},
                {"role": "user", "content": prompt},
            ],
            temperature=0.2,
            max_tokens=1500,
            response_format={"type": "json_object"},
        )
        raw = result["content"].strip()
        if raw.startswith("```"):
            raw = raw.split("```")[1]
            if raw.startswith("json"):
                raw = raw[4:]
        return json.loads(raw)
    except Exception as e:
        logger.error(f"Cross-reference LLM call failed: {e}")
        return {
            "confirmations": [],
            "contradictions": [],
            "blind_spots": [],
            "error": str(e),
        }


def format_cross_ref_for_telegram(analysis: dict, meeting_id: str) -> str:
    """Format cross-reference results as a Telegram message."""
    if analysis.get("error"):
        return f"⚠️ Cross-reference failed: {analysis['error']}"

    parts = [f"🔗 *Meet ↔ Chat Cross-Reference*\nMeeting: `{meeting_id}`\n"]

    confirmations = analysis.get("confirmations", [])
    contradictions = analysis.get("contradictions", [])
    blind_spots = analysis.get("blind_spots", [])

    if not confirmations and not contradictions and not blind_spots:
        return f"🔗 *Meet ↔ Chat Cross-Reference*\nMeeting `{meeting_id}`: No significant connections found between this meeting and your chat groups."
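    # A minimal example of the analysis dict this formatter renders (shape
    # assumed from the JSON schema in CROSS_REF_SYSTEM_PROMPT; values are
    # illustrative only):
    #   {"confirmations": [{"meeting_signal": "Checkout timeouts still unresolved",
    #                       "chat_signal": "Checkout timeout bug reported again",
    #                       "group": "acme_dev", "significance": "medium"}],
    #    "contradictions": [], "blind_spots": []}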
    if confirmations:
        parts.append(f"✅ *Confirmations* ({len(confirmations)})")
        for c in confirmations[:3]:  # Cap at 3 for readability
            sig = "🔴" if c.get("significance") == "high" else "🟡"
            parts.append(f"{sig} Meeting: _{c['meeting_signal'][:100]}_")
            parts.append(f"   Matches [{c.get('group', '?')}]: _{c['chat_signal'][:100]}_\n")

    if contradictions:
        parts.append(f"⚡ *Contradictions* ({len(contradictions)}) — ACTION NEEDED")
        for c in contradictions[:3]:
            parts.append(f"🔴 Meeting decided: _{c['meeting_signal'][:100]}_")
            parts.append(f"   BUT [{c.get('group', '?')}] says: _{c['chat_signal'][:100]}_")
            if c.get("impact"):
                parts.append(f"   Impact: {c['impact'][:100]}\n")

    if blind_spots:
        parts.append(f"🔦 *Blind Spots* ({len(blind_spots)}) — Teams may not know")
        for b in blind_spots[:3]:
            parts.append(f"🟠 {b['meeting_signal'][:120]}")
            if b.get("recommendation"):
                parts.append(f"   → {b['recommendation'][:100]}\n")

    return "\n".join(parts)
```

### Step 16.2 — Add new Telegram commands to commands.py

Add to `thirdeye/backend/bot/commands.py` (or wherever your command handlers live):

```python
# ─────────────────────────────────────────────
# Meet Commands — add to existing commands.py
# ─────────────────────────────────────────────

async def cmd_meetsum(update, context):
    """
    /meetsum [meeting_id]
    Summarizes a meeting. Uses the most recent meeting if no ID given.
    """
    from backend.db.chroma import query_signals
    from backend.config import MEET_DEFAULT_GROUP_ID

    args = context.args or []
    group_id = MEET_DEFAULT_GROUP_ID

    if args:
        meeting_id = args[0]
    else:
        # Find the most recent meeting ID from meet_summary signals
        all_summaries = query_signals(group_id, "meeting summary", n_results=5, signal_type="meet_summary")
        if not all_summaries:
            await update.message.reply_text(
                "📭 No meeting summaries found yet. Record a meeting with the ThirdEye Meet extension first."
            )
            return
        # The most recent signal (by position in results, or pick first)
        meeting_id = all_summaries[0].get("meeting_id") or all_summaries[0].get("metadata", {}).get("meeting_id")
        if not meeting_id:
            await update.message.reply_text("Usage: /meetsum [meeting_id]")
            return

    # Fetch summary signal
    signals = query_signals(group_id, meeting_id, n_results=20)
    summary_signal = next(
        (s for s in signals if s.get("type") == "meet_summary"), None
    )

    if summary_signal:
        await update.message.reply_text(
            f"📋 *Meeting Summary*\nID: `{meeting_id}`\n\n{summary_signal['summary']}",
            parse_mode="Markdown",
        )
        return

    # No summary yet — fall back to listing the structured signals directly
    structured = [
        s for s in signals
        if s.get("type") in ("meet_decision", "meet_action_item", "meet_blocker", "meet_risk")
    ]
    if not structured:
        await update.message.reply_text(
            f"📭 No signals found for meeting `{meeting_id}`. "
            "Make sure the meeting has been recorded and processed.",
            parse_mode="Markdown",
        )
        return

    lines = [f"📋 *Meeting Signals* for `{meeting_id}`\n"]
    type_labels = {
        "meet_decision": "✅ Decision",
        "meet_action_item": "📌 Action",
        "meet_blocker": "🚧 Blocker",
        "meet_risk": "⚠️ Risk",
    }
    for s in structured[:10]:
        label = type_labels.get(s["type"], s["type"])
        lines.append(f"{label}: {s['summary'][:120]}")

    await update.message.reply_text("\n".join(lines), parse_mode="Markdown")


async def cmd_meetask(update, context):
    """
    /meetask [meeting_id] [question]
    Asks a question about a specific meeting's knowledge.
    """
    from backend.pipeline import query_knowledge
    from backend.config import MEET_DEFAULT_GROUP_ID

    args = context.args or []
    if len(args) < 2:
        await update.message.reply_text(
            "Usage: /meetask [meeting_id] [your question]\n"
            "Example: /meetask abc-defg-hij What did we decide about the database?"
        )
        return

    meeting_id = args[0]
    question = " ".join(args[1:])
    group_id = MEET_DEFAULT_GROUP_ID

    await update.message.reply_text(f"🔍 Searching meeting `{meeting_id}`...", parse_mode="Markdown")

    # Augment the question with meeting ID context so the RAG retrieval is scoped
    scoped_question = f"From meeting {meeting_id}: {question}"
    answer = await query_knowledge(group_id, scoped_question)

    await update.message.reply_text(
        f"🎙️ *Meet Q&A* — `{meeting_id}`\n\nQ: _{question}_\n\nA: {answer}",
        parse_mode="Markdown",
    )


async def cmd_meetmatch(update, context):
    """
    /meetmatch [meeting_id]
    Finds where a meeting's decisions/blockers match or conflict with your team chat signals.
    """
    from backend.agents.meet_cross_ref import find_cross_references, format_cross_ref_for_telegram
    from backend.db.chroma import query_signals
    from backend.config import MEET_DEFAULT_GROUP_ID

    args = context.args or []
    group_id = MEET_DEFAULT_GROUP_ID

    if args:
        meeting_id = args[0]
    else:
        # Find most recent meeting
        all_summaries = query_signals(group_id, "meeting", n_results=5, signal_type="meet_summary")
        if not all_summaries:
            await update.message.reply_text(
                "📭 No meetings found. Record a meeting first with the ThirdEye Meet extension."
            )
            return
        meeting_id = all_summaries[0].get("meeting_id", "unknown")

    await update.message.reply_text(
        f"🔗 Comparing meeting `{meeting_id}` against your team chats...\n"
        "_This may take a moment._",
        parse_mode="Markdown",
    )

    analysis = await find_cross_references(meeting_id, group_id=group_id)
    formatted = format_cross_ref_for_telegram(analysis, meeting_id)
    await update.message.reply_text(formatted, parse_mode="Markdown")
```

### Step 16.3 — Register the new commands in bot.py

In `thirdeye/backend/bot/bot.py`, add the new handlers inside your existing application setup (alongside where `/ask`, `/digest`, etc. are registered):

```python
# Add these lines where other CommandHandlers are registered:
from backend.bot.commands import cmd_meetsum, cmd_meetask, cmd_meetmatch

application.add_handler(CommandHandler("meetsum", cmd_meetsum))
application.add_handler(CommandHandler("meetask", cmd_meetask))
application.add_handler(CommandHandler("meetmatch", cmd_meetmatch))
```

Also update the `/start` message to include the new commands:

```python
# In your /start handler, add to the commands list:
"🎙️ /meetsum [id] — Summarize a meeting\n"
"🔍 /meetask [id] [q] — Ask about a meeting\n"
"🔗 /meetmatch [id] — Match meeting to team chats\n"
```

### Step 16.4 — Register commands with Telegram BotFather (optional but recommended)

```
In BotFather → /setcommands → paste:

meetsum - Summarize a Google Meet recording
meetask - Ask a question about a meeting
meetmatch - Find connections between meeting and team chats
```

### ✅ TEST MILESTONE 16

Create file: `thirdeye/scripts/test_m16.py`

```python
"""
Test Milestone 16: Meet Telegram commands and cross-reference agent.
Tests command logic and cross-reference analysis without needing a live Telegram bot.
"""
import asyncio
import os
import sys
import json
sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..'))

# ─── Seed data ────────────────────────────────────────────────────────────────

async def seed_meet_signals(meeting_id: str, group_id: str):
    """Seed a meeting with realistic signals."""
    from backend.agents.meet_ingestor import process_meet_chunk

    transcript = """
Alex: We decided to go with PostgreSQL. Final decision, no more debate.
Priya: I'll set up the schema by Thursday and own the migration.
Sam: The checkout timeout bug is still blocking us. Third time this sprint.
Lisa: Dashboard is still waiting on design specs. That's a two-week blocker.
Alex: The OAuth refactor is risky — touches everything. High risk if we rush it.
Sam: Should we delay the deploy to next week? Nobody answered.
""" await process_meet_chunk( meeting_id=meeting_id, group_id=group_id, chunk_index=0, text=transcript.strip(), speaker="Alex", timestamp="2026-03-21T10:00:00Z", is_final=True, # triggers summary generation ) async def seed_chat_signals(chat_group_id: str): """Seed Telegram chat signals to cross-reference against.""" from backend.pipeline import process_message_batch, set_lens set_lens(chat_group_id, "dev") messages = [ {"sender": "Alex", "text": "We should switch to MySQL instead of PostgreSQL. It's simpler.", "timestamp": "2026-03-15T09:00:00Z"}, {"sender": "Sam", "text": "The checkout timeout is back again. Still not fixed.", "timestamp": "2026-03-18T10:00:00Z"}, {"sender": "Priya", "text": "OAuth integration is going into the main branch directly, no feature branch.", "timestamp": "2026-03-19T11:00:00Z"}, ] await process_message_batch(chat_group_id, messages) # ─── Tests ──────────────────────────────────────────────────────────────────── async def test_meetsum_logic(): """Test that meetsum can find and return a meeting summary.""" from backend.db.chroma import query_signals import chromadb from backend.config import CHROMA_DB_PATH meeting_id = "test-meet-m16-sum" group_id = "test_meet_m16_sum" print("Testing /meetsum logic...") await seed_meet_signals(meeting_id, group_id) # Simulate what /meetsum does signals = query_signals(group_id, meeting_id, n_results=20) assert len(signals) > 0, "Expected signals for the seeded meeting" print(f" ✅ {len(signals)} signals found for meeting {meeting_id}") types = {s["type"] for s in signals} print(f" Signal types: {types}") has_summary = "meet_summary" in types has_structured = any(t in types for t in ("meet_decision", "meet_action_item", "meet_blocker", "meet_risk")) assert has_summary or has_structured, "Expected summary or structured signals" print(f" ✅ Has summary: {has_summary} | Has structured signals: {has_structured}") # Cleanup client = chromadb.PersistentClient(path=CHROMA_DB_PATH) try: 
client.delete_collection(f"ll_{group_id}") except Exception: pass async def test_meetask_logic(): """Test that meetask can answer questions about a meeting.""" from backend.pipeline import query_knowledge import chromadb from backend.config import CHROMA_DB_PATH meeting_id = "test-meet-m16-ask" group_id = "test_meet_m16_ask" print("\nTesting /meetask logic...") await seed_meet_signals(meeting_id, group_id) # Test 1: Question about decisions answer = await query_knowledge(group_id, f"From meeting {meeting_id}: What database did we decide on?") assert len(answer) > 10, "Expected a substantive answer" postgres_mentioned = "postgres" in answer.lower() or "database" in answer.lower() or "sql" in answer.lower() assert postgres_mentioned, f"Expected PostgreSQL to be mentioned. Got: {answer[:200]}" print(f" ✅ Decision query answered: {answer[:120]}...") # Test 2: Question about action items answer2 = await query_knowledge(group_id, f"From meeting {meeting_id}: Who is setting up the schema?") assert len(answer2) > 10, "Expected a substantive answer about schema ownership" print(f" ✅ Action item query answered: {answer2[:120]}...") # Cleanup client = chromadb.PersistentClient(path=CHROMA_DB_PATH) try: client.delete_collection(f"ll_{group_id}") except Exception: pass async def test_meetmatch_cross_reference(): """Test cross-reference finds contradictions and confirmations.""" from backend.agents.meet_cross_ref import find_cross_references, format_cross_ref_for_telegram import chromadb from backend.config import CHROMA_DB_PATH meeting_id = "test-meet-m16-match" meet_group = "test_meet_m16_match" chat_group = "test_chat_m16_match" print("\nTesting /meetmatch cross-reference...") await seed_meet_signals(meeting_id, meet_group) await seed_chat_signals(chat_group) # Run cross-reference analysis = await find_cross_references( meeting_id=meeting_id, group_id=meet_group, cross_ref_group_ids=[chat_group], ) if analysis.get("error"): print(f" ⚠️ Cross-reference returned error: 
{analysis['error']}")
        # This is OK if the groups are empty — test passes but notes the condition
    else:
        contradictions = analysis.get("contradictions", [])
        confirmations = analysis.get("confirmations", [])
        blind_spots = analysis.get("blind_spots", [])
        print(f"   Found: {len(contradictions)} contradiction(s), "
              f"{len(confirmations)} confirmation(s), {len(blind_spots)} blind spot(s)")

        # We seeded a clear contradiction: meeting says PostgreSQL, chat says MySQL
        # And a confirmation: checkout timeout mentioned in both
        # At least one of these categories should have results
        total = len(contradictions) + len(confirmations) + len(blind_spots)
        assert total > 0, "Expected at least one cross-reference finding (contradiction: PostgreSQL vs MySQL)"
        print(f"   ✅ {total} total cross-reference findings")
        if contradictions:
            print(f"   ✅ Contradiction found: {contradictions[0]['meeting_signal'][:80]}")
        if confirmations:
            print(f"   ✅ Confirmation found: {confirmations[0]['meeting_signal'][:80]}")

    # Test formatter
    formatted = format_cross_ref_for_telegram(analysis, meeting_id)
    assert len(formatted) > 20, "Expected a non-empty formatted message"
    assert meeting_id in formatted, "Expected meeting ID in formatted output"
    print(f"   ✅ Telegram formatter produced {len(formatted)} char message")
    print(f"   Preview: {formatted[:200]}...")

    # Cleanup
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
    for gid in [meet_group, chat_group]:
        try:
            client.delete_collection(f"ll_{gid}")
        except Exception:
            pass


async def test_command_handlers_importable():
    """Test that all three command handlers can be imported without errors."""
    try:
        from backend.bot.commands import cmd_meetsum, cmd_meetask, cmd_meetmatch
        print("\n   ✅ cmd_meetsum importable")
        print("   ✅ cmd_meetask importable")
        print("   ✅ cmd_meetmatch importable")
    except ImportError as e:
        print(f"\n   ❌ Command import failed: {e}")
        raise


async def main():
    print("Running Milestone 16 tests...\n")
    await test_meetsum_logic()
    await test_meetask_logic()
    await test_meetmatch_cross_reference()
    await test_command_handlers_importable()
    print("\n🎉 MILESTONE 16 PASSED — Meet commands working, cross-reference finding connections")


asyncio.run(main())
```

Run: `cd thirdeye && python scripts/test_m16.py`

**Expected output:**

```
✅ Signals found for meeting test-meet-m16-sum
✅ Has summary: True | Has structured signals: True
✅ Decision query answered: Based on the meeting, the team decided to go with PostgreSQL...
✅ Action item query answered: Priya is responsible for setting up the schema...
✅ Contradiction found: We decided to go with PostgreSQL...
✅ Telegram formatter produced 450 char message
🎉 MILESTONE 16 PASSED — Meet commands working, cross-reference finding connections
```

---

## MILESTONE SUMMARY (Updated)

| # | Milestone | What You Have | % |
| --- | --- | --- | --- |
| 0 | Scaffolding | Folders, deps, env vars, all API keys | 0% |
| 1 | Provider Router | Multi-provider LLM calls with fallback | 10% |
| 2 | ChromaDB + Embeddings | Store and retrieve signals with vector search | 20% |
| 3 | Core Agents | Signal Extractor + Classifier + Context Detector | 30% |
| 4 | Full Pipeline | Messages → Extract → Classify → Store → Query | 45% |
| 5 | Intelligence Layer | Pattern detection + Cross-group analysis | 60% |
| 6 | Telegram Bot | Live bot processing group messages | 70% |
| 7 | FastAPI + Dashboard API | REST API serving all data | 85% |
| 8 | Unified Runner | Bot + API running together | 90% |
| 9 | Demo Data | 3 groups seeded with realistic data | 95% |
| 10 | Polish & Demo Ready | README, rehearsed demo, everything working | 100% |
| 11 | Document & PDF Ingestion | PDFs/DOCX/TXT shared in groups → chunked → stored in RAG | 105% |
| 12 | Tavily Web Search | Query Agent searches web when KB is empty or question is external | 110% |
| 13 | Link Fetch & Ingestion | URLs in messages → fetched → summarized → stored as signals | 115% |
| **14** | **Meet Chrome Extension** | **Extension captures Meet audio → transcribes → POSTs chunks to backend** | **120%** |
| **15** | **Meet Signal Processing** | **Transcript chunks → decisions/actions/blockers/risks → ChromaDB** | **125%** |
| **16** | **Meet Telegram Commands** | **/meetsum, /meetask, /meetmatch with cross-reference to chat signals** | **130%** |

---

## FILE CHANGE SUMMARY

### New Files Created

```
thirdeye/meet_extension/manifest.json       # Milestone 14 — Chrome extension manifest
thirdeye/meet_extension/content.js          # Milestone 14 — Audio capture + transcription
thirdeye/meet_extension/popup.html          # Milestone 14 — Extension popup UI
thirdeye/meet_extension/popup.js            # Milestone 14 — Popup logic
thirdeye/meet_extension/background.js       # Milestone 14 — Service worker
thirdeye/backend/agents/meet_ingestor.py    # Milestone 15 — Transcript signal extraction
thirdeye/backend/agents/meet_cross_ref.py   # Milestone 16 — Chat ↔ meet cross-reference
thirdeye/scripts/test_m14.py                # Milestone 14 test
thirdeye/scripts/test_m15.py                # Milestone 15 test
thirdeye/scripts/test_m16.py                # Milestone 16 test
```

### Existing Files Modified

```
thirdeye/.env                    # Pre-work: MEET_INGEST_SECRET + new vars
thirdeye/backend/config.py       # Pre-work: new meet config vars
thirdeye/backend/api/routes.py   # M14: /api/meet/start, /api/meet/ingest, /api/meet/meetings
thirdeye/backend/bot/commands.py # M16: cmd_meetsum, cmd_meetask, cmd_meetmatch
thirdeye/backend/bot/bot.py      # M16: register 3 new CommandHandlers
```

### Updated Repo Structure (additions only)

```
thirdeye/
├── meet_extension/              # NEW — Chrome extension
│   ├── manifest.json
│   ├── content.js               # Core: audio capture + Web Speech API
│   ├── popup.html
│   ├── popup.js
│   ├── background.js
│   ├── icon16.png
│   ├── icon48.png
│   └── icon128.png
│
├── backend/
│   ├── agents/
│   │   ├── meet_ingestor.py     # NEW — transcript → signals
│   │   └── meet_cross_ref.py    # NEW — meet ↔ chat cross-reference
│   ├── api/
│   │   └── routes.py            # MODIFIED — 3 new /api/meet/* endpoints
│   └── bot/
│       ├── commands.py          # MODIFIED — 3 new meet commands
│       └── bot.py               # MODIFIED — register meet handlers
│
└── scripts/
    ├── test_m14.py              # NEW — extension + endpoint tests
    ├── test_m15.py              # NEW — transcript processing tests
    └── test_m16.py              # NEW — command + cross-reference tests
```

---

## UPDATED COMMANDS REFERENCE

```
EXISTING (unchanged):
/start           — Welcome message
/ask [q]         — Query the knowledge base (chat + docs + links)
/search [q]      — Explicit web search via Tavily
/digest          — Intelligence summary
/lens [mode]     — Set/check detection lens
/alerts          — View active warnings

NEW — Google Meet:
/meetsum [id]    — Summarize a meeting (omit ID for most recent)
/meetask [id] [q] — Ask a question about a specific meeting
/meetmatch [id]  — Find connections between meeting and your team chat signals

PASSIVE (no command needed):
• Text messages  → batched → signal extraction (existing)
• Document drops → downloaded → chunked → stored (M11)
• URLs in messages → fetched → summarized → stored (M13)
• Meet Extension → audio → transcribed → chunked → stored every 30s (M14/M15)
```

---

## HOW THE FULL MEET FLOW WORKS (End-to-End)

```
1. Open Google Meet → click ThirdEye extension icon → "▶ Start Recording"
2. Web Speech API transcribes your meeting audio in-browser (no audio leaves the browser)
3. Every 30 seconds, the extension POSTs the transcript text to:
   POST /api/meet/ingest  {meeting_id, text, speaker, chunk_index, ...}
4. Backend queues background processing → meet_ingestor.py extracts signals:
   • meet_decision    → "Team decided to use PostgreSQL"
   • meet_action_item → "Priya to set up schema by Thursday"
   • meet_blocker     → "Dashboard blocked on design specs"
   • meet_risk        → "OAuth refactor is high risk"
   • meet_chunk_raw   → full text stored for similarity search
5. When recording stops, the final chunk triggers meet_summary generation
   (pulls all chunks from ChromaDB, summarizes the full meeting)
6. In Telegram:
   /meetsum abc-defg-hij   → Get the meeting summary
   /meetask abc-defg-hij [q] → Ask anything about the meeting
   /meetmatch              → 🔥 Find where the meeting agrees or conflicts
                             with your team chat signals
```

---

## THE WOW MOMENT FOR /meetmatch

> *"Your sprint planning decided to use PostgreSQL. But your dev team's Telegram chat from last week shows Alex proposed MySQL and nobody objected. And your product team doesn't know about the OAuth risk your tech lead raised in the meeting. ThirdEye connected the dots."*

This is Cross-Group Intelligence extended to meetings — the same killer feature that makes ThirdEye unique, now spanning your live calls AND your async chats.

---

*Every milestone has a test. Every test must pass. No skipping.*