init

2026-04-21 05:31:48 +00:00 · 2026-04-05 00:43:23 +05:30
commit 8be37d3e92
425 changed files with 101853 additions and 0 deletions
--- a/thirdeye/docs/BATCHING_CONFIG.md
+++ b/thirdeye/docs/BATCHING_CONFIG.md
@@ -0,0 +1,204 @@
+# Message Batching Configuration
+
+ThirdEye batches messages before processing to improve efficiency and reduce API costs. Here's how to configure it for your needs.
+
+---
+
+## How Batching Works
+
+Messages are processed when either condition is met:
+1. **Buffer reaches BATCH_SIZE** messages (processes immediately)
+2. **Timer reaches BATCH_TIMEOUT_SECONDS** (processes whatever is in buffer)
+
+This prevents processing every single message individually while ensuring messages don't wait forever.
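+
+The two triggers above can be sketched as a small buffer class. This is a minimal illustration, not ThirdEye's actual implementation: the `MessageBuffer` name, its methods, and the injectable `clock` parameter are all hypothetical, and it assumes the timer starts when the first message lands in an empty buffer.
+
+```python
+import time
+from typing import Callable, List, Optional
+
+
+class MessageBuffer:
+    """Hypothetical buffer: flushes on size or on timeout, whichever comes first."""
+
+    def __init__(self, batch_size: int = 5, timeout_seconds: float = 60.0,
+                 clock: Callable[[], float] = time.monotonic) -> None:
+        self.batch_size = batch_size
+        self.timeout_seconds = timeout_seconds
+        self._clock = clock
+        self._messages: List[str] = []
+        self._first_at: Optional[float] = None  # arrival time of the oldest buffered message
+
+    def add(self, message: str) -> Optional[List[str]]:
+        """Buffer a message; return a batch if the size trigger fired."""
+        if not self._messages:
+            self._first_at = self._clock()  # assumption: first message starts the timer
+        self._messages.append(message)
+        if len(self._messages) >= self.batch_size:
+            return self.flush()
+        return None
+
+    def poll(self) -> Optional[List[str]]:
+        """Call periodically; return a batch if the timeout trigger fired."""
+        if self._messages and self._clock() - self._first_at >= self.timeout_seconds:
+            return self.flush()
+        return None
+
+    def flush(self) -> List[str]:
+        """Return and clear whatever is buffered."""
+        batch, self._messages, self._first_at = self._messages, [], None
+        return batch
+```
+
+Passing a fake `clock` makes the timeout path testable without sleeping, which is why the sketch takes it as a parameter.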
+
+---
+
+## Configuration Variables
+
+Edit `.env`:
+
+```bash
+BATCH_SIZE=5                  # Process after N messages
+BATCH_TIMEOUT_SECONDS=60      # Or wait this many seconds
+```
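+
+As a rough sketch of how these two variables might be read and validated in Python: the `load_batch_settings` helper is hypothetical, it assumes the `.env` values have already been loaded into the process environment, and the fallback values simply mirror the example above.
+
+```python
+import os
+
+
+def load_batch_settings(env=os.environ):
+    """Read batching settings from a mapping; fallbacks mirror the example values."""
+    batch_size = int(env.get("BATCH_SIZE", "5"))
+    timeout = int(env.get("BATCH_TIMEOUT_SECONDS", "60"))
+    if batch_size < 1:
+        raise ValueError("BATCH_SIZE must be at least 1")
+    if timeout < 0:
+        raise ValueError("BATCH_TIMEOUT_SECONDS must not be negative")
+    return batch_size, timeout
+```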
+
+### Recommended Settings
+
+**For Production (efficiency priority):**
+```bash
+BATCH_SIZE=5
+BATCH_TIMEOUT_SECONDS=60
+```
+- Batches 5 messages together (reduces API calls)
+- Waits up to 60 seconds for more messages
+- Good for active groups with frequent messages
+
+**For Testing/Demo (speed priority):**
+```bash
+BATCH_SIZE=3
+BATCH_TIMEOUT_SECONDS=15
+```
+- Processes after just 3 messages
+- Only waits 15 seconds max
+- Better for testing and demos where you want quick feedback
+
+**For Low-Volume Groups:**
+```bash
+BATCH_SIZE=3
+BATCH_TIMEOUT_SECONDS=30
+```
+- Smaller batch size (triggers faster)
+- Moderate timeout (30 seconds)
+
+**For High-Volume Groups:**
+```bash
+BATCH_SIZE=10
+BATCH_TIMEOUT_SECONDS=90
+```
+- Larger batches (more efficient)
+- Longer timeout (less frequent processing)
+
+---
+
+## Manual Flush Command
+
+If you don't want to wait for the timer, use:
+
+```
+/flush
+```
+
+This immediately processes all buffered messages for the current group.
+
+**When to use /flush:**
+- During testing/demos (want instant results)
+- After important messages (need to query them right away)
+- When buffer has accumulated but timer hasn't triggered yet
+
+---
+
+## Example Scenarios
+
+### Scenario 1: Active Group Chat (10 messages in 30 seconds)
+
+With `BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60`:
+- First 5 messages → processed as soon as the 5th arrives
+- Next 5 messages → processed as soon as the 10th arrives
+- Total: 2 batch operations
+
+### Scenario 2: Slow Group Chat (1 message, then silence)
+
+With `BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60`:
+- 1 message arrives → starts 60s timer
+- 60 seconds pass → timer triggers → processes the 1 message
+- Total: 1 batch operation after 60s delay
+
+**Use `/flush` to skip the wait!**
+
+### Scenario 3: Demo with Testing Settings
+
+With `BATCH_SIZE=3, BATCH_TIMEOUT_SECONDS=15`:
+- Send test message
+- Wait 15 seconds OR send 2 more messages
+- Processed automatically
+- Can query with `/ask` immediately after
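+
+The scenarios above can be replayed with a small simulation. This is only a sketch of the batching rules as described in this document; `simulate_batches` is a hypothetical helper, and it assumes the timer starts with the first message in an empty buffer and resets after every flush.
+
+```python
+from typing import List, Tuple
+
+
+def simulate_batches(arrivals: List[float], batch_size: int,
+                     timeout: float) -> List[Tuple[float, int]]:
+    """Return (flush_time, message_count) per batch, given sorted arrival times."""
+    batches: List[Tuple[float, int]] = []
+    count = 0
+    first_at = 0.0
+    for t in arrivals:
+        # A pending timer fires before this arrival if the timeout already elapsed.
+        if count and t >= first_at + timeout:
+            batches.append((first_at + timeout, count))
+            count = 0
+        if count == 0:
+            first_at = t  # first message in an empty buffer starts the timer
+        count += 1
+        if count >= batch_size:
+            batches.append((t, count))  # size trigger: flush immediately
+            count = 0
+    if count:
+        batches.append((first_at + timeout, count))  # leftovers wait for the timer
+    return batches
+
+
+# Scenario 1: 10 messages spread over 30 seconds -> two size-triggered batches.
+print(simulate_batches([i * 3.0 for i in range(10)], batch_size=5, timeout=60))
+# Scenario 2: a single message, then silence -> one timer-triggered batch.
+print(simulate_batches([0.0], batch_size=5, timeout=60))
+```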
+
+---
+
+## Cost vs Speed Trade-offs
+
+### Processing every message individually (BATCH_SIZE=1, BATCH_TIMEOUT_SECONDS=0):
+- ✅ Instant results
+- ❌ More API calls (expensive)
+- ❌ More LLM tokens used
+- ❌ Lower overall throughput (more network overhead)
+
+### Batching messages (BATCH_SIZE=5+, BATCH_TIMEOUT_SECONDS=60):
+- ✅ Fewer API calls (cheaper)
+- ✅ More context for LLM (better signal extraction)
+- ✅ More efficient
+- ❌ Slight delay before messages are searchable
+
+---
+
+## Current Status Check
+
+`/flush` doubles as a status check. It reports an empty buffer, or processes whatever is waiting:
+
+```
+/flush
+```
+
+If buffer is empty:
+```
+✅ No messages in buffer (already processed)
+```
+
+If messages are waiting:
+```
+⚡ Processing 3 buffered message(s)...
+✅ Processed 3 messages → 5 signals extracted.
+```
+
+---
+
+## Debugging
+
+If messages aren't being found by `/ask`:
+
+1. **Check if they're in the buffer:**
+   ```
+   /flush
+   ```
+
+2. **Check if signals were extracted:**
+   ```
+   /digest
+   ```
+   Shows total signal count - this should increase after processing
+
+3. **Check backend logs:**
+   Look for:
+   ```
+   INFO: Processed batch: N signals from [group_name]
+   ```
+   or
+   ```
+   INFO: Timer flush: N signals from [group_name]
+   ```
+
+4. **Verify processing happened:**
+   After sending a message, wait `BATCH_TIMEOUT_SECONDS` seconds, then check `/digest` again
+
+---
+
+## Recommended: Use Faster Settings for Testing
+
+To get faster feedback while testing, update your `.env`:
+
+```bash
+BATCH_SIZE=2
+BATCH_TIMEOUT_SECONDS=10
+```
+
+Then restart the bot:
+```bash
+# Ctrl+C to stop
+python -m backend.bot.bot
+```
+
+Now messages will be processed within 10 seconds (or after just 2 messages).
+
+---
+
+## Quick Reference
+
+| Setting | Production | Testing | Demo |
+|---------|-----------|---------|------|
+| BATCH_SIZE | 5 | 2 | 3 |
+| BATCH_TIMEOUT_SECONDS | 60 | 10 | 15 |
+| Processing delay | ~60s max | ~10s max | ~15s max |
+| API efficiency | High | Medium | Medium |
+
+Use `/flush` anytime you want instant processing!
--- a/thirdeye/docs/TESTING_WITH_MEET_TELEGRAM.md
+++ b/thirdeye/docs/TESTING_WITH_MEET_TELEGRAM.md
@@ -0,0 +1,510 @@
+# Testing ThirdEye with Actual Google Meet & Telegram
+
+This guide walks you through testing the full Meet + Telegram integration end-to-end.
+
+---
+
+## Prerequisites
+
+1. **Chrome browser** with Developer mode enabled
+2. **Google account** for Google Meet
+3. **Telegram account** and bot token (already configured)
+4. **ThirdEye backend + bot** both running
+
+---
+
+## Step 1: Start Your Services
+
+Open **two terminals** and run both services:
+
+### Terminal 1: FastAPI Backend
+```bash
+cd c:\Projects\binary_hack
+python run_api.py
+```
+
+You should see:
+```
+INFO:     Application startup complete.
+INFO:     Uvicorn running on http://127.0.0.1:8000
+```
+
+### Terminal 2: Telegram Bot
+```bash
+cd c:\Projects\binary_hack
+python -m backend.bot.bot
+```
+
+You should see:
+```
+INFO:thirdeye.bot:ThirdEye bot starting...
+```
+
+**Keep both running throughout the test.**
+
+---
+
+## Step 2: Verify Services Are Running
+
+### Check API Health
+```bash
+curl http://localhost:8000/health
+```
+
+Expected response:
+```json
+{"status":"ok","service":"thirdeye"}
+```
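+
+The same check can be scripted. The sketch below uses only the Python standard library; `is_healthy` and `check_health` are hypothetical helper names, and the expected payload is taken from the response shown above.
+
+```python
+import json
+import urllib.request
+
+
+def is_healthy(body: str) -> bool:
+    """True if the body matches the expected /health payload."""
+    try:
+        data = json.loads(body)
+    except json.JSONDecodeError:
+        return False
+    return (isinstance(data, dict)
+            and data.get("status") == "ok"
+            and data.get("service") == "thirdeye")
+
+
+def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
+    """Fetch /health and validate it; connection or HTTP errors count as unhealthy."""
+    try:
+        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
+            return resp.status == 200 and is_healthy(resp.read().decode("utf-8"))
+    except OSError:  # URLError and HTTPError are subclasses of OSError
+        return False
+```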
+
+### Check Telegram Bot
+Send `/start` to your bot in Telegram. You should see the welcome message with all available commands including:
+- `/meetsum [id]` — Summarize a meeting
+- `/meetask [id] [q]` — Ask about a meeting
+- `/meetmatch [id]` — Match meeting to team chats
+
+---
+
+## Step 3: Load the Chrome Extension
+
+1. Open Chrome and navigate to `chrome://extensions/`
+2. Enable **Developer mode** (toggle in top-right corner)
+3. Click **Load unpacked**
+4. Navigate to `c:\Projects\binary_hack\meet_extension\`
+5. Click **Select Folder**
+
+You should see "ThirdEye Meet Recorder" in your extensions list.
+
+---
+
+## Step 4: Configure the Extension
+
+1. Click the ThirdEye extension icon in Chrome toolbar
+2. In the popup, scroll to the bottom section
+3. Enter these settings:
+   - **Backend URL**: `http://localhost:8000`
+   - **Group ID**: `meet_sessions`
+   - **Ingest Secret**: `thirdeye_meet_secret_change_me`
+4. Click **Save Settings**
+
+You should see "✓ Saved" confirmation.
+
+---
+
+## Step 5: Start a Test Google Meet
+
+### 5.1 Create or Join a Meeting
+
+Go to https://meet.google.com and either:
+- Create a new meeting (click "New meeting" → "Start an instant meeting")
+- Join an existing meeting using a code
+
+### 5.2 Start Recording
+
+1. Once in the meeting, click the ThirdEye extension icon
+2. You should see the popup UI with "▶ Start Recording" button
+3. Click **Start Recording**
+4. Chrome will prompt for microphone access - click **Allow**
+
+### 5.3 Verify Recording Started
+
+The extension popup should show:
+- A pulsing **red dot** next to "Recording..."
+- The **Meeting ID** (e.g., `abc-defg-hij`)
+- **"0 chunks sent"** (this will increment)
+
+### 5.4 Speak Some Test Content
+
+Say things like:
+- "We decided to use PostgreSQL for the database"
+- "Alex will complete the migration by Friday"
+- "The API timeout is blocking us from deploying"
+- "Should we move the deadline? I'm not sure."
+
+The extension will capture this and transcribe it using Chrome's Web Speech API.
+
+---
+
+## Step 6: Check Backend is Receiving Data
+
+### 6.1 Watch FastAPI Logs
+
+In Terminal 1 (where `run_api.py` is running), you should see:
+```
+INFO:     127.0.0.1:xxxxx - "POST /api/meet/start HTTP/1.1" 200 OK
+INFO:     127.0.0.1:xxxxx - "POST /api/meet/ingest HTTP/1.1" 200 OK
+INFO:thirdeye.agents.meet_ingestor:Processing meet chunk 0 for meeting abc-defg-hij (245 chars)
+INFO:thirdeye.agents.meet_ingestor:Stored 5 signals for meeting abc-defg-hij chunk 0
+```
+
+### 6.2 Check API Endpoint
+
+While recording or after:
+```bash
+curl http://localhost:8000/api/meet/meetings
+```
+
+Expected response:
+```json
+{
+  "meetings": [
+    {
+      "meeting_id": "abc-defg-hij",
+      "signal_count": 8,
+      "types": {
+        "meet_chunk_raw": 2,
+        "meet_decision": 2,
+        "meet_action_item": 2,
+        "meet_blocker": 1,
+        "meet_started": 1
+      }
+    }
+  ]
+}
+```
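+
+In the sample response, `signal_count` (8) equals the sum of the per-type counts. Assuming that relationship always holds, which this document does not state explicitly, a quick sanity check could look like this. `check_meeting_counts` is a hypothetical helper.
+
+```python
+import json
+
+
+def check_meeting_counts(payload: str) -> dict:
+    """Map meeting_id -> True if signal_count equals the sum of the per-type counts."""
+    data = json.loads(payload)
+    return {
+        m["meeting_id"]: m["signal_count"] == sum(m["types"].values())
+        for m in data.get("meetings", [])
+    }
+```
+
+Pass it the body returned by `curl http://localhost:8000/api/meet/meetings`; a `False` entry would point at a meeting whose counts need a closer look.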
+
+### 6.3 Run Diagnostic Script
+
+```bash
+python scripts/debug_meet.py
+```
+
+This shows all meet signals in ChromaDB and which meetings have been recorded.
+
+---
+
+## Step 7: Stop Recording & Generate Summary
+
+1. Click the extension icon
+2. Click **■ Stop Recording**
+3. This sends the final chunk with `is_final` set to `true`, which triggers full meeting summary generation
+
+Watch the backend logs - you should see:
+```
+INFO:thirdeye.agents.meet_ingestor:Processing meet chunk N for meeting abc-defg-hij (final)
+```
+
+---
+
+## Step 8: Query the Meeting in Telegram
+
+Now go to your Telegram bot chat:
+
+### 8.1 Get Meeting Summary
+```
+/meetsum
+```
+
+or with specific meeting ID:
+```
+/meetsum abc-defg-hij
+```
+
+Expected response:
+```
+📋 Meeting Summary
+ID: abc-defg-hij
+
+Overview: Sprint planning meeting to finalize database choice and assign action items.
+
+Key Decisions:
+• PostgreSQL selected for primary database
+• ...
+
+Action Items:
+• Alex: Complete migration by Friday
+• ...
+
+Blockers:
+• API timeout blocking deployment
+• ...
+```
+
+### 8.2 Ask Questions About the Meeting
+```
+/meetask abc-defg-hij What did we decide about the database?
+```
+
+Expected response:
+```
+🎙️ Meet Q&A — abc-defg-hij
+
+Q: What did we decide about the database?
+
+A: The team decided to use PostgreSQL for the primary database...
+```
+
+### 8.3 Cross-Reference with Chat History
+
+First, make sure you have some Telegram chat messages in a monitored group. Then:
+
+```
+/meetmatch abc-defg-hij
+```
+
+Expected response:
+```
+🔗 Meet ↔ Chat Cross-Reference
+Meeting: abc-defg-hij
+
+⚡ Contradictions (1) — ACTION NEEDED
+🔴 Meeting decided: We will use PostgreSQL for the database
+   BUT [your_dev_group] says: We should use MySQL, it's simpler
+   Impact: Database choice inconsistency may cause confusion
+
+✅ Confirmations (1)
+🟡 Meeting: API timeout blocking deployment
+   Matches [your_dev_group]: API timeout still happening
+
+🔦 Blind Spots (1) — Teams may not know
+🟠 OAuth refactor marked as high risk in meeting discussion
+   → Consider notifying product team about deployment risks
+```
+
+---
+
+## Step 9: Debugging Common Issues
+
+### Issue 1: Extension Shows "Not on Google Meet"
+
+**Cause**: You're not on a `meet.google.com` URL
+
+**Fix**: Navigate to https://meet.google.com and join/start a meeting
+
+### Issue 2: Extension Recording But No Chunks Sent
+
+**Symptoms**: 
+- Popup shows "Recording..." with red dot
+- But "chunks sent" stays at 0
+
+**Debugging**:
+1. Open Chrome DevTools (F12) on the Meet page
+2. Go to Console tab
+3. Look for `[ThirdEye]` messages
+4. You should see: `[ThirdEye] Content script loaded` and `[ThirdEye] Speech recognition started`
+
+**Common causes**:
+- Microphone permission denied - check Chrome settings
+- Backend URL incorrect in extension settings - verify it's `http://localhost:8000`
+- Backend not running - check Terminal 1
+
+### Issue 3: Backend Returning 403 Forbidden
+
+**Cause**: Secret mismatch
+
+**Fix**: 
+1. Check your `.env` file: `MEET_INGEST_SECRET=thirdeye_meet_secret_change_me`
+2. Check extension settings (click icon → scroll down) - secret must match exactly
+3. Save settings and try recording again
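+
+To illustrate why the secret must match exactly, here is a sketch of how a backend could compare the header value against the configured secret. How ThirdEye actually performs this check is not shown in this document; `secret_matches` is a hypothetical helper built on the standard library's `hmac.compare_digest`.
+
+```python
+import hmac
+from typing import Optional
+
+
+def secret_matches(header_value: Optional[str], expected: str) -> bool:
+    """Constant-time comparison; a missing header never matches."""
+    if header_value is None:
+        return False
+    return hmac.compare_digest(header_value.encode("utf-8"), expected.encode("utf-8"))
+```
+
+Under this kind of check, a stray trailing space or different letter case in the extension settings is enough to produce a 403.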
+
+### Issue 4: Telegram Commands Say "No meetings found"
+
+**Debugging steps**:
+
+1. Run the diagnostic script:
+```bash
+python scripts/debug_meet.py
+```
+
+2. Check if signals exist but in wrong group:
+```python
+from backend.db.chroma import get_group_ids, get_all_signals
+
+for group in get_group_ids():
+    signals = get_all_signals(group)
+    meet_sigs = [s for s in signals if s.get("metadata", {}).get("lens") == "meet"]
+    if meet_sigs:
+        print(f"{group}: {len(meet_sigs)} meet signals")
+```
+
+3. Check if meeting_id is stored in metadata:
+```python
+from backend.db.chroma import get_all_signals
+
+signals = get_all_signals("meet_sessions")
+for s in signals[:5]:
+    meta = s.get("metadata", {})
+    print(f"Type: {meta.get('type')}, MeetingID: {meta.get('meeting_id')}")
+```
+
+### Issue 5: /meetmatch Returns No Connections
+
+**Cause**: No cross-reference groups configured or no signals in those groups
+
+**Fix**:
+1. Check `.env`: `MEET_CROSS_REF_GROUPS=your_telegram_group_id`
+2. Make sure you have actual Telegram chat signals in those groups
+3. Send some test messages in your Telegram group, then wait up to `BATCH_TIMEOUT_SECONDS` for processing (or run `/flush`)
+4. Verify with `/digest` in Telegram that signals exist
+
+---
+
+## Step 10: Full Integration Test Scenario
+
+Here's a complete test scenario to verify everything works:
+
+### Day 1: Seed Telegram Chat History
+
+In your Telegram group, send these messages:
+```
+Alex: I think we should use MySQL for the new database
+Sam: The checkout endpoint is timing out again
+Priya: Let's just push OAuth changes directly to main
+```
+
+Wait 60 seconds for the bot to process these into signals.
+
+### Day 2: Record a Google Meet
+
+1. Start recording with extension
+2. During meeting, say (or have participants say):
+   - "We've decided - PostgreSQL is the database we're using"
+   - "The checkout timeout is blocking the entire deploy"
+   - "OAuth refactor is risky, we need a feature branch first"
+3. Stop recording after 2-3 minutes
+
+### Day 3: Query and Cross-Reference
+
+In Telegram:
+```
+/meetsum
+```
+Should show summary with decisions, actions, blockers.
+
+```
+/meetask abc-defg-hij What database did we choose?
+```
+Should answer: PostgreSQL
+
+```
+/meetmatch
+```
+Should show:
+- **Contradiction**: PostgreSQL (meet) vs MySQL (chat)
+- **Confirmation**: Checkout timeout mentioned in both
+- **Contradiction**: Feature branch (meet) vs direct push (chat)
+
+---
+
+## API Endpoints Reference
+
+For debugging, you can also hit the API directly:
+
+### List All Meetings
+```bash
+curl http://localhost:8000/api/meet/meetings
+```
+
+### Get Signals for Specific Meeting
+```bash
+curl http://localhost:8000/api/meet/meetings/abc-defg-hij/signals
+```
+
+### Manual Ingest Test (without extension)
+```bash
+curl -X POST http://localhost:8000/api/meet/ingest \
+  -H "Content-Type: application/json" \
+  -H "X-ThirdEye-Secret: thirdeye_meet_secret_change_me" \
+  -d '{
+    "meeting_id": "manual-test",
+    "group_id": "meet_sessions",
+    "chunk_index": 0,
+    "text": "We decided to use PostgreSQL. Alex will set up the schema by Friday.",
+    "speaker": "Test User",
+    "timestamp": "2026-03-21T10:00:00Z",
+    "is_final": false
+  }'
+```
+
+Then check:
+```bash
+curl http://localhost:8000/api/meet/meetings
+```
+
+You should see `manual-test` in the list.
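+
+The manual ingest request can also be built from Python. The sketch below mirrors the curl payload above using only the standard library; `build_ingest_request` is a hypothetical helper, and the field values are the same illustrative ones used in the curl example.
+
+```python
+import json
+import urllib.request
+
+
+def build_ingest_request(base_url: str, secret: str, text: str) -> urllib.request.Request:
+    """Build (but do not send) a POST request equivalent to the curl example."""
+    payload = {
+        "meeting_id": "manual-test",
+        "group_id": "meet_sessions",
+        "chunk_index": 0,
+        "text": text,
+        "speaker": "Test User",
+        "timestamp": "2026-03-21T10:00:00Z",
+        "is_final": False,
+    }
+    return urllib.request.Request(
+        f"{base_url}/api/meet/ingest",
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json", "X-ThirdEye-Secret": secret},
+        method="POST",
+    )
+```
+
+To send it, pass the returned object to `urllib.request.urlopen` while the API is running.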
+
+---
+
+## Troubleshooting Checklist
+
+Before asking for help, verify:
+
+- [ ] Both services running (API on :8000, bot polling Telegram)
+- [ ] Extension loaded in Chrome (check `chrome://extensions/`)
+- [ ] Extension settings saved with correct backend URL and secret
+- [ ] You're on `meet.google.com` (not Hangouts)
+- [ ] Microphone permission granted in Chrome
+- [ ] Extension popup shows "Recording..." with red pulsing dot
+- [ ] Backend logs show "Processing meet chunk..." messages
+- [ ] `http://localhost:8000/api/meet/meetings` returns your meeting
+- [ ] Telegram bot responds to `/start` command
+- [ ] `.env` has `MEET_INGEST_SECRET` that matches extension
+
+---
+
+## Quick Smoke Test (No Meeting Required)
+
+To test the backend without recording a real meeting:
+
+```bash
+# Terminal 1
+python run_api.py
+
+# Terminal 2 (new window)
+python scripts/test_m14.py
+```
+
+This simulates the extension and verifies:
+- API endpoints work
+- Secret validation works
+- Signals are stored correctly
+- Meetings are queryable
+
+If this passes, your backend is working. If Chrome extension still doesn't work, the issue is in the browser/extension setup.
+
+---
+
+## Expected Timeline
+
+- **Chunk sent every**: 30 seconds while recording
+- **Signal extraction**: 2-5 seconds after chunk received
+- **Meeting summary**: Generated when you stop recording (takes 5-10 seconds)
+- **Telegram query**: Instant (ChromaDB is fast)
+- **Cross-reference**: 5-15 seconds (LLM analysis)
+
+---
+
+## Success Indicators
+
+You'll know it's working when:
+
+1. ✅ Extension popup shows incrementing "chunks sent" counter
+2. ✅ Backend logs show "Stored N signals for meeting..."
+3. ✅ `/api/meet/meetings` returns your meeting ID
+4. ✅ `/meetsum` in Telegram shows your meeting summary
+5. ✅ `/meetmatch` finds connections to your chat history
+
+---
+
+## Document Upload Best Practices
+
+When uploading documents to Telegram:
+- The bot will respond immediately with "📄 Processing..."
+- Processing happens in the background (won't block other commands)
+- You'll get a follow-up message when processing completes
+- Large files (PDFs, DOCX) may take 30-60 seconds
+- Maximum file size: 10 MB
+
+If document processing fails, check:
+- File format is supported (.pdf, .docx, .txt, .md, .csv, .json, .log)
+- File is under 10 MB
+- Backend API keys are configured (GROQ, CEREBRAS, etc.)
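+
+The first two checks can be run locally before uploading. This is a sketch only: `precheck_document` is a hypothetical helper, the extension list is copied from the bullet above, and it assumes "10 MB" means 10 × 1024 × 1024 bytes.
+
+```python
+import os
+
+SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv", ".json", ".log"}
+MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" interpreted as 10 MiB
+
+
+def precheck_document(filename: str, size_bytes: int) -> list:
+    """Return a list of problems; an empty list means the file passes both checks."""
+    problems = []
+    ext = os.path.splitext(filename)[1].lower()
+    if ext not in SUPPORTED_EXTENSIONS:
+        problems.append(f"unsupported format: {ext or '(none)'}")
+    if size_bytes > MAX_BYTES:
+        problems.append("file exceeds 10 MB")
+    return problems
+```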
+
+---
+
+## Contact & Support
+
+If you're stuck:
+1. Run `python scripts/debug_meet.py` and share output
+2. Check backend logs (Terminal 1) for errors
+3. Check browser console (F12 on Meet page) for `[ThirdEye]` errors
+4. Share the output of `curl http://localhost:8000/api/meet/meetings`
--- a/thirdeye/docs/Voice_milestones.md
+++ b/thirdeye/docs/Voice_milestones.md
--- a/thirdeye/docs/additional.md
+++ b/thirdeye/docs/additional.md
--- a/thirdeye/docs/implementation.md
+++ b/thirdeye/docs/implementation.md
--- a/thirdeye/docs/jira_milestones.md
+++ b/thirdeye/docs/jira_milestones.md
--- a/thirdeye/docs/meet_extension.md
+++ b/thirdeye/docs/meet_extension.md
--- a/thirdeye/docs/sot.md
+++ b/thirdeye/docs/sot.md