This commit is contained in:
2026-04-05 00:43:23 +05:30
commit 8be37d3e92
425 changed files with 101853 additions and 0 deletions

@@ -0,0 +1,204 @@
# Message Batching Configuration
ThirdEye batches messages before processing to improve efficiency and reduce API costs. Here's how to configure it for your needs.
---
## How Batching Works
Messages are processed when either condition is met:
1. **Buffer reaches BATCH_SIZE** messages (processes immediately)
2. **Timer reaches BATCH_TIMEOUT_SECONDS** (processes whatever is in buffer)
This prevents processing every single message individually while ensuring messages don't wait forever.
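The two triggers can be sketched as a small buffer class. This is an illustration only, not ThirdEye's actual implementation: the `MessageBatcher` name, its methods, and the injectable `clock` are hypothetical, and it assumes the timer starts when the first message enters an empty buffer.

```python
import time
from typing import Callable, List, Optional


class MessageBatcher:
    """Hypothetical buffer that flushes on size or on age."""

    def __init__(self, batch_size: int = 5, timeout_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.batch_size = batch_size
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._buffer: List[str] = []
        self._first_arrival: Optional[float] = None

    def add(self, message: str) -> Optional[List[str]]:
        """Buffer a message; return the batch if the size limit was reached."""
        if not self._buffer:
            # Assumption: the timer starts with the first buffered message
            self._first_arrival = self._clock()
        self._buffer.append(message)
        if len(self._buffer) >= self.batch_size:
            return self.flush()
        return None

    def poll(self) -> Optional[List[str]]:
        """Call periodically; return the batch if the timeout has elapsed."""
        if self._buffer and self._clock() - self._first_arrival >= self.timeout_seconds:
            return self.flush()
        return None

    def flush(self) -> List[str]:
        """Return and clear whatever is buffered (the manual flush path)."""
        batch, self._buffer = self._buffer, []
        self._first_arrival = None
        return batch
```

Passing a fake `clock` makes the timeout path easy to exercise in tests without actually waiting.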
---
## Configuration Variables
Edit `.env`:
```bash
BATCH_SIZE=5 # Process after N messages
BATCH_TIMEOUT_SECONDS=60 # Or wait this many seconds
```
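A minimal sketch of how these two variables might be read on the Python side. The `load_batch_config` helper is hypothetical, and using 5 and 60 as fallbacks is an assumption based on the values shown above. It also assumes that whatever loads `.env` has already stripped the inline comments from the values.

```python
import os
from typing import Mapping, Tuple


def load_batch_config(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Return (batch_size, timeout_seconds) from the environment."""
    # Assumption: fall back to the values shown in the example .env
    batch_size = int(env.get("BATCH_SIZE", "5"))
    timeout_seconds = int(env.get("BATCH_TIMEOUT_SECONDS", "60"))
    if batch_size < 1:
        raise ValueError("BATCH_SIZE must be at least 1")
    if timeout_seconds < 0:
        raise ValueError("BATCH_TIMEOUT_SECONDS must not be negative")
    return batch_size, timeout_seconds
```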
### Recommended Settings
**For Production (efficiency priority):**
```bash
BATCH_SIZE=5
BATCH_TIMEOUT_SECONDS=60
```
- Batches 5 messages together (reduces API calls)
- Waits up to 60 seconds for more messages
- Good for active groups with frequent messages
**For Testing/Demo (speed priority):**
```bash
BATCH_SIZE=3
BATCH_TIMEOUT_SECONDS=15
```
- Processes after just 3 messages
- Only waits 15 seconds max
- Better for testing and demos where you want quick feedback
**For Low-Volume Groups:**
```bash
BATCH_SIZE=3
BATCH_TIMEOUT_SECONDS=30
```
- Smaller batch size (triggers faster)
- Moderate timeout (30 seconds)
**For High-Volume Groups:**
```bash
BATCH_SIZE=10
BATCH_TIMEOUT_SECONDS=90
```
- Larger batches (more efficient)
- Longer timeout (less frequent processing)
---
## Manual Flush Command
If you don't want to wait for the timer, use:
```
/flush
```
This immediately processes all buffered messages for the current group.
**When to use /flush:**
- During testing/demos (want instant results)
- After important messages (need to query them right away)
- When buffer has accumulated but timer hasn't triggered yet
---
## Example Scenarios
### Scenario 1: Active Group Chat (10 messages in 30 seconds)
With `BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60`:
- First 5 messages → processed immediately
- Next 5 messages → processed immediately
- Total: 2 batch operations
### Scenario 2: Slow Group Chat (1 message, then silence)
With `BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60`:
- 1 message arrives → starts 60s timer
- 60 seconds pass → timer triggers → processes the 1 message
- Total: 1 batch operation after 60s delay
**Use `/flush` to skip the wait!**
### Scenario 3: Demo with Testing Settings
With `BATCH_SIZE=3, BATCH_TIMEOUT_SECONDS=15`:
- Send test message
- Wait 15 seconds OR send 2 more messages
- Processed automatically
- Can query with `/ask` immediately after
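The scenarios above can be checked with a small simulation. This is a sketch under one assumption: the timer starts when the first message enters an empty buffer. `simulate_batches` is a hypothetical helper, not part of ThirdEye.

```python
from typing import List, Tuple


def simulate_batches(arrivals: List[float], batch_size: int,
                     timeout: float) -> List[Tuple[float, int, str]]:
    """Return (flush_time, message_count, trigger) for each batch."""
    batches: List[Tuple[float, int, str]] = []
    count = 0
    deadline = 0.0
    for t in sorted(arrivals):
        # If the timer expired before this message arrived, flush first
        if count and t >= deadline:
            batches.append((deadline, count, "timeout"))
            count = 0
        if count == 0:
            deadline = t + timeout  # first message in an empty buffer starts the timer
        count += 1
        if count >= batch_size:
            batches.append((t, count, "size"))
            count = 0
    if count:
        # Leftover messages wait for the timer
        batches.append((deadline, count, "timeout"))
    return batches


# Scenario 1: 10 messages spread over 30 seconds
print(simulate_batches([i * 3.0 for i in range(10)], batch_size=5, timeout=60))
# Scenario 2: a single message, then silence
print(simulate_batches([0.0], batch_size=5, timeout=60))
```

Scenario 1 yields two size-triggered batches and Scenario 2 yields one timeout-triggered batch at the 60-second mark, matching the counts listed above.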
---
## Cost vs Speed Trade-offs
### Processing every message individually (BATCH_SIZE=1, BATCH_TIMEOUT_SECONDS=0):
- ✅ Instant results
- ❌ More API calls (expensive)
- ❌ More LLM tokens used
- ❌ Lower overall throughput (every message pays its own network overhead)
### Batching messages (BATCH_SIZE=5+, BATCH_TIMEOUT_SECONDS=60):
- ✅ Fewer API calls (cheaper)
- ✅ More context for LLM (better signal extraction)
- ✅ More efficient
- ❌ Slight delay before messages are searchable
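To put rough numbers on the trade-off, the sketch below computes the fewest batch operations needed for a given message count. It assumes one processing call per batch and that every batch fills completely. Timer-triggered partial batches would add to this, so treat it as a lower bound. `min_batch_operations` is a hypothetical helper.

```python
import math


def min_batch_operations(message_count: int, batch_size: int) -> int:
    """Fewest batch operations if every batch fills up before the timer fires."""
    return math.ceil(message_count / batch_size)


for size in (1, 5, 10):
    print(f"BATCH_SIZE={size}: at least {min_batch_operations(100, size)} operations for 100 messages")
```

For 100 messages, that is 100 operations at `BATCH_SIZE=1`, 20 at `BATCH_SIZE=5`, and 10 at `BATCH_SIZE=10`.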
---
## Current Status Check
To see whether messages are waiting in the buffer, run `/flush`. Note that this also processes anything it finds:
```
/flush
```
If buffer is empty:
```
✅ No messages in buffer (already processed)
```
If messages are waiting:
```
⚡ Processing 3 buffered message(s)...
✅ Processed 3 messages → 5 signals extracted.
```
---
## Debugging
If messages aren't being found by `/ask`:
1. **Check if they're in the buffer:**
```
/flush
```
2. **Check if signals were extracted:**
```
/digest
```
This shows the total signal count, which should increase after processing.
3. **Check backend logs:**
Look for:
```
INFO: Processed batch: N signals from [group_name]
```
or
```
INFO: Timer flush: N signals from [group_name]
```
4. **Verify processing happened:**
After sending a message, wait BATCH_TIMEOUT_SECONDS, then check `/digest` again
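When the logs are long, a short script can pull out the two log lines from step 3. This sketch assumes the messages appear exactly as quoted above. The `parse_flush_logs` helper and the sample group name are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches the two message formats quoted in step 3, regardless of the log prefix
LOG_PATTERN = re.compile(r"(Processed batch|Timer flush): (\d+) signals from (.+)")


def parse_flush_logs(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (trigger, signal_count, group_name) for each matching log line."""
    results = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            trigger = "batch" if match.group(1) == "Processed batch" else "timer"
            results.append((trigger, int(match.group(2)), match.group(3).strip()))
    return results


sample = [
    "INFO: Processed batch: 5 signals from dev_team",
    "INFO: Timer flush: 1 signals from dev_team",
    "INFO: something unrelated",
]
print(parse_flush_logs(sample))
```

If the result is empty after you have sent messages and waited for the timeout, the batch was probably never processed.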
---
## Recommended: Use Faster Settings for Testing
For faster feedback while testing, update your `.env`:
```bash
BATCH_SIZE=2
BATCH_TIMEOUT_SECONDS=10
```
Then restart the bot:
```bash
# Ctrl+C to stop
python -m backend.bot.bot
```
Now messages will be processed within 10 seconds (or after just 2 messages).
---
## Quick Reference
| Setting | Production | Testing | Demo |
|---------|-----------|---------|------|
| BATCH_SIZE | 5 | 2 | 3 |
| BATCH_TIMEOUT_SECONDS | 60 | 10 | 15 |
| Processing delay | ~60s max | ~10s max | ~15s max |
| API efficiency | High | Medium | Medium |
Use `/flush` anytime you want instant processing!

@@ -0,0 +1,510 @@
# Testing ThirdEye with Actual Google Meet & Telegram
This guide walks you through testing the full Meet + Telegram integration end-to-end.
---
## Prerequisites
1. **Chrome browser** with Developer mode enabled
2. **Google account** for Google Meet
3. **Telegram account** and bot token (already configured)
4. **ThirdEye backend + bot** both running
---
## Step 1: Start Your Services
Open **two terminals** and run both services:
### Terminal 1: FastAPI Backend
```bash
cd c:\Projects\binary_hack
python run_api.py
```
You should see:
```
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000
```
### Terminal 2: Telegram Bot
```bash
cd c:\Projects\binary_hack
python -m backend.bot.bot
```
You should see:
```
INFO:thirdeye.bot:ThirdEye bot starting...
```
**Keep both running throughout the test.**
---
## Step 2: Verify Services Are Running
### Check API Health
```bash
curl http://localhost:8000/health
```
Expected response:
```json
{"status":"ok","service":"thirdeye"}
```
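If you prefer to script this check, the sketch below does the same thing with Python's standard library. The helper names are hypothetical. The URL and the expected payload are the ones shown above.

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """True if the body matches the expected /health payload."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return (isinstance(payload, dict)
            and payload.get("status") == "ok"
            and payload.get("service") == "thirdeye")


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Fetch /health and validate it; returns False if the API is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8"))
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False
```

Calling `check_health()` while Terminal 1 is running should return `True`.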
### Check Telegram Bot
Send `/start` to your bot in Telegram. You should see the welcome message with all available commands including:
- `/meetsum [id]` — Summarize a meeting
- `/meetask [id] [q]` — Ask about a meeting
- `/meetmatch [id]` — Match meeting to team chats
---
## Step 3: Load the Chrome Extension
1. Open Chrome and navigate to `chrome://extensions/`
2. Enable **Developer mode** (toggle in top-right corner)
3. Click **Load unpacked**
4. Navigate to `c:\Projects\binary_hack\meet_extension\`
5. Click **Select Folder**
You should see "ThirdEye Meet Recorder" in your extensions list.
---
## Step 4: Configure the Extension
1. Click the ThirdEye extension icon in Chrome toolbar
2. In the popup, scroll to the bottom section
3. Enter these settings:
   - **Backend URL**: `http://localhost:8000`
   - **Group ID**: `meet_sessions`
   - **Ingest Secret**: `thirdeye_meet_secret_change_me`
4. Click **Save Settings**
You should see "✓ Saved" confirmation.
---
## Step 5: Start a Test Google Meet
### 5.1 Create or Join a Meeting
Go to https://meet.google.com and either:
- Create a new meeting (click "New meeting" → "Start an instant meeting")
- Join an existing meeting using a code
### 5.2 Start Recording
1. Once in the meeting, click the ThirdEye extension icon
2. You should see the popup UI with "▶ Start Recording" button
3. Click **Start Recording**
4. Chrome will prompt for microphone access - click **Allow**
### 5.3 Verify Recording Started
The extension popup should show:
- A pulsing **red dot** next to "Recording..."
- The **Meeting ID** (e.g., `abc-defg-hij`)
- **"0 chunks sent"** (this will increment)
### 5.4 Speak Some Test Content
Say things like:
- "We decided to use PostgreSQL for the database"
- "Alex will complete the migration by Friday"
- "The API timeout is blocking us from deploying"
- "Should we move the deadline? I'm not sure."
The extension will capture this and transcribe it using Chrome's Web Speech API.
---
## Step 6: Check Backend is Receiving Data
### 6.1 Watch FastAPI Logs
In Terminal 1 (where `run_api.py` is running), you should see:
```
INFO: 127.0.0.1:xxxxx - "POST /api/meet/start HTTP/1.1" 200 OK
INFO: 127.0.0.1:xxxxx - "POST /api/meet/ingest HTTP/1.1" 200 OK
INFO:thirdeye.agents.meet_ingestor:Processing meet chunk 0 for meeting abc-defg-hij (245 chars)
INFO:thirdeye.agents.meet_ingestor:Stored 5 signals for meeting abc-defg-hij chunk 0
```
### 6.2 Check API Endpoint
While recording or after:
```bash
curl http://localhost:8000/api/meet/meetings
```
Expected response:
```json
{
  "meetings": [
    {
      "meeting_id": "abc-defg-hij",
      "signal_count": 8,
      "types": {
        "meet_chunk_raw": 2,
        "meet_decision": 2,
        "meet_action_item": 2,
        "meet_blocker": 1,
        "meet_started": 1
      }
    }
  ]
}
```
```
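In the example response, the per-type counts add up to `signal_count` (2 + 2 + 2 + 1 + 1 = 8). The sketch below checks this for every meeting in a response body. It assumes this relationship always holds, which the example suggests but does not guarantee. `inconsistent_meetings` is a hypothetical helper.

```python
import json
from typing import Dict, List


def inconsistent_meetings(response_body: str) -> List[str]:
    """Return meeting IDs whose per-type counts do not add up to signal_count."""
    data = json.loads(response_body)
    bad = []
    for meeting in data.get("meetings", []):
        types: Dict[str, int] = meeting.get("types", {})
        # Assumption: signal_count is the sum of all per-type counts
        if sum(types.values()) != meeting.get("signal_count"):
            bad.append(meeting["meeting_id"])
    return bad


example = """{"meetings": [{"meeting_id": "abc-defg-hij", "signal_count": 8,
  "types": {"meet_chunk_raw": 2, "meet_decision": 2, "meet_action_item": 2,
            "meet_blocker": 1, "meet_started": 1}}]}"""
print(inconsistent_meetings(example))
```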
### 6.3 Run Diagnostic Script
```bash
python scripts/debug_meet.py
```
This shows all meet signals in ChromaDB and which meetings have been recorded.
---
## Step 7: Stop Recording & Generate Summary
1. Click the extension icon
2. Click **■ Stop Recording**
3. This sends the final chunk with `is_final=True`, which triggers full meeting summary generation
Watch the backend logs - you should see:
```
INFO:thirdeye.agents.meet_ingestor:Processing meet chunk N for meeting abc-defg-hij (final)
```
---
## Step 8: Query the Meeting in Telegram
Now go to your Telegram bot chat:
### 8.1 Get Meeting Summary
```
/meetsum
```
or with specific meeting ID:
```
/meetsum abc-defg-hij
```
Expected response:
```
📋 Meeting Summary
ID: abc-defg-hij
Overview: Sprint planning meeting to finalize database choice and assign action items.
Key Decisions:
• PostgreSQL selected for primary database
• ...
Action Items:
• Alex: Complete migration by Friday
• ...
Blockers:
• API timeout blocking deployment
• ...
```
### 8.2 Ask Questions About the Meeting
```
/meetask abc-defg-hij What did we decide about the database?
```
Expected response:
```
🎙️ Meet Q&A — abc-defg-hij
Q: What did we decide about the database?
A: The team decided to use PostgreSQL for the primary database...
```
### 8.3 Cross-Reference with Chat History
First, make sure you have some Telegram chat messages in a monitored group. Then:
```
/meetmatch abc-defg-hij
```
Expected response:
```
🔗 Meet ↔ Chat Cross-Reference
Meeting: abc-defg-hij
⚡ Contradictions (1) — ACTION NEEDED
🔴 Meeting decided: We will use PostgreSQL for the database
BUT [your_dev_group] says: We should use MySQL, it's simpler
Impact: Database choice inconsistency may cause confusion
✅ Confirmations (1)
🟡 Meeting: API timeout blocking deployment
Matches [your_dev_group]: API timeout still happening
🔦 Blind Spots (1) — Teams may not know
🟠 OAuth refactor marked as high risk in meeting discussion
→ Consider notifying product team about deployment risks
```
---
## Step 9: Debugging Common Issues
### Issue 1: Extension Shows "Not on Google Meet"
**Cause**: You're not on a `meet.google.com` URL
**Fix**: Navigate to https://meet.google.com and join/start a meeting
### Issue 2: Extension Recording But No Chunks Sent
**Symptoms**:
- Popup shows "Recording..." with red dot
- But "chunks sent" stays at 0
**Debugging**:
1. Open Chrome DevTools (F12) on the Meet page
2. Go to Console tab
3. Look for `[ThirdEye]` messages
4. You should see: `[ThirdEye] Content script loaded` and `[ThirdEye] Speech recognition started`
**Common causes**:
- Microphone permission denied - check Chrome settings
- Backend URL incorrect in extension settings - verify it's `http://localhost:8000`
- Backend not running - check Terminal 1
### Issue 3: Backend Returning 403 Forbidden
**Cause**: Secret mismatch
**Fix**:
1. Check your `.env` file: `MEET_INGEST_SECRET=thirdeye_meet_secret_change_me`
2. Check extension settings (click icon → scroll down) - secret must match exactly
3. Save settings and try recording again
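A secret check of this kind is typically an exact, case-sensitive comparison, so a stray space or a changed letter case is enough to cause a 403. The sketch below is not the backend's actual code. It is a hypothetical helper for comparing the value in `.env` with the one pasted into the extension and explaining how they differ.

```python
import hmac


def secrets_match(configured: str, received: str) -> bool:
    """Constant-time exact comparison; any difference fails."""
    return hmac.compare_digest(configured.encode("utf-8"), received.encode("utf-8"))


def describe_mismatch(configured: str, received: str) -> str:
    """Explain the most common reasons two secrets differ."""
    if secrets_match(configured, received):
        return "match"
    if configured.strip() == received.strip():
        return "differs only by leading/trailing whitespace"
    if configured.lower() == received.lower():
        return "differs only by letter case"
    return "different values"


print(describe_mismatch("thirdeye_meet_secret_change_me", "thirdeye_meet_secret_change_me "))
```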
### Issue 4: Telegram Commands Say "No meetings found"
**Debugging steps**:
1. Run the diagnostic script:
```bash
python scripts/debug_meet.py
```
2. Check if signals exist but in wrong group:
```python
from backend.db.chroma import get_group_ids, get_all_signals

# Count meet signals in every group to spot signals stored under the wrong group ID
for group in get_group_ids():
    signals = get_all_signals(group)
    meet_sigs = [s for s in signals if s.get("metadata", {}).get("lens") == "meet"]
    if meet_sigs:
        print(f"{group}: {len(meet_sigs)} meet signals")
```
3. Check if meeting_id is stored in metadata:
```python
from backend.db.chroma import get_all_signals

# Inspect the first few signals to confirm meeting_id is present in the metadata
signals = get_all_signals("meet_sessions")
for s in signals[:5]:
    meta = s.get("metadata", {})
    print(f"Type: {meta.get('type')}, MeetingID: {meta.get('meeting_id')}")
```
### Issue 5: /meetmatch Returns No Connections
**Cause**: No cross-reference groups configured or no signals in those groups
**Fix**:
1. Check `.env`: `MEET_CROSS_REF_GROUPS=your_telegram_group_id`
2. Make sure you have actual Telegram chat signals in those groups
3. Send some test messages in your Telegram group and wait for them to be processed (up to `BATCH_TIMEOUT_SECONDS`, or send `/flush` to process them immediately)
4. Verify with `/digest` in Telegram that signals exist
---
## Step 10: Full Integration Test Scenario
Here's a complete test scenario to verify everything works:
### Day 1: Seed Telegram Chat History
In your Telegram group, send these messages:
```
Alex: I think we should use MySQL for the new database
Sam: The checkout endpoint is timing out again
Priya: Let's just push OAuth changes directly to main
```
Wait 60 seconds for the bot to process these into signals.
### Day 2: Record a Google Meet
1. Start recording with extension
2. During meeting, say (or have participants say):
- "We've decided - PostgreSQL is the database we're using"
- "The checkout timeout is blocking the entire deploy"
- "OAuth refactor is risky, we need a feature branch first"
3. Stop recording after 2-3 minutes
### Day 3: Query and Cross-Reference
In Telegram:
```
/meetsum
```
Should show summary with decisions, actions, blockers.
```
/meetask abc-defg-hij What database did we choose?
```
Should answer: PostgreSQL
```
/meetmatch
```
Should show:
- **Contradiction**: PostgreSQL (meet) vs MySQL (chat)
- **Confirmation**: Checkout timeout mentioned in both
- **Contradiction**: Feature branch (meet) vs direct push (chat)
---
## API Endpoints Reference
For debugging, you can also hit the API directly:
### List All Meetings
```bash
curl http://localhost:8000/api/meet/meetings
```
### Get Signals for Specific Meeting
```bash
curl http://localhost:8000/api/meet/meetings/abc-defg-hij/signals
```
### Manual Ingest Test (without extension)
```bash
curl -X POST http://localhost:8000/api/meet/ingest \
  -H "Content-Type: application/json" \
  -H "X-ThirdEye-Secret: thirdeye_meet_secret_change_me" \
  -d '{
    "meeting_id": "manual-test",
    "group_id": "meet_sessions",
    "chunk_index": 0,
    "text": "We decided to use PostgreSQL. Alex will set up the schema by Friday.",
    "speaker": "Test User",
    "timestamp": "2026-03-21T10:00:00Z",
    "is_final": false
  }'
```
Then check:
```bash
curl http://localhost:8000/api/meet/meetings
```
You should see `manual-test` in the list.
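The same manual ingest can be scripted in Python using only the standard library. This sketch mirrors the curl call above: the endpoint, header name, and payload fields are taken from it, while the `build_ingest_request` and `send` helpers are hypothetical.

```python
import json
import urllib.request


def build_ingest_request(text: str, meeting_id: str = "manual-test",
                         chunk_index: int = 0, is_final: bool = False,
                         base_url: str = "http://localhost:8000",
                         secret: str = "thirdeye_meet_secret_change_me") -> urllib.request.Request:
    """Build the POST request for /api/meet/ingest without sending it."""
    payload = {
        "meeting_id": meeting_id,
        "group_id": "meet_sessions",
        "chunk_index": chunk_index,
        "text": text,
        "speaker": "Test User",
        "timestamp": "2026-03-21T10:00:00Z",
        "is_final": is_final,
    }
    return urllib.request.Request(
        f"{base_url}/api/meet/ingest",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-ThirdEye-Secret": secret},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the response body (raises on HTTP errors)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the backend running, `send(build_ingest_request("We decided to use PostgreSQL."))` posts one chunk. A wrong secret should surface as an `urllib.error.HTTPError` with status 403.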
---
## Troubleshooting Checklist
Before asking for help, verify:
- [ ] Both services running (API on :8000, bot polling Telegram)
- [ ] Extension loaded in Chrome (check `chrome://extensions/`)
- [ ] Extension settings saved with correct backend URL and secret
- [ ] You're on `meet.google.com` (not Hangouts)
- [ ] Microphone permission granted in Chrome
- [ ] Extension popup shows "Recording..." with red pulsing dot
- [ ] Backend logs show "Processing meet chunk..." messages
- [ ] `http://localhost:8000/api/meet/meetings` returns your meeting
- [ ] Telegram bot responds to `/start` command
- [ ] `.env` has `MEET_INGEST_SECRET` that matches extension
---
## Quick Smoke Test (No Meeting Required)
To test the backend without recording a real meeting:
```bash
# Terminal 1
python run_api.py
# Terminal 2 (new window)
python scripts/test_m14.py
```
This simulates the extension and verifies:
- API endpoints work
- Secret validation works
- Signals are stored correctly
- Meetings are queryable
If this passes, your backend is working. If Chrome extension still doesn't work, the issue is in the browser/extension setup.
---
## Expected Timeline
- **Chunk sent every**: 30 seconds while recording
- **Signal extraction**: 2-5 seconds after chunk received
- **Meeting summary**: Generated when you stop recording (takes 5-10 seconds)
- **Telegram query**: Instant (ChromaDB is fast)
- **Cross-reference**: 5-15 seconds (LLM analysis)
---
## Success Indicators
You'll know it's working when:
1. ✅ Extension popup shows incrementing "chunks sent" counter
2. ✅ Backend logs show "Stored N signals for meeting..."
3. ✅ `/api/meet/meetings` returns your meeting ID
4. ✅ `/meetsum` in Telegram shows your meeting summary
5. ✅ `/meetmatch` finds connections to your chat history
---
## Document Upload Best Practices
When uploading documents to Telegram:
- The bot will respond immediately with "📄 Processing..."
- Processing happens in the background (won't block other commands)
- You'll get a follow-up message when processing completes
- Large files (PDFs, DOCX) may take 30-60 seconds
- Maximum file size: 10 MB
If document processing fails, check:
- File format is supported (.pdf, .docx, .txt, .md, .csv, .json, .log)
- File is under 10 MB
- Backend API keys are configured (GROQ, CEREBRAS, etc.)
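The first two checks can be run locally before uploading. This is a sketch, not the bot's actual validation: `upload_problem` is a hypothetical helper, and treating "10 MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md", ".csv", ".json", ".log"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" means 10 MiB


def upload_problem(filename: str, size_bytes: int) -> Optional[str]:
    """Return a reason the upload would likely be rejected, or None if it looks fine."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        return f"unsupported format: {suffix or '(none)'}"
    if size_bytes > MAX_BYTES:
        return "file is larger than 10 MB"
    return None


print(upload_problem("notes.PDF", 2_000_000))
print(upload_problem("archive.zip", 1_000))
```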
---
## Contact & Support
If you're stuck:
1. Run `python scripts/debug_meet.py` and share output
2. Check backend logs (Terminal 1) for errors
3. Check browser console (F12 on Meet page) for `[ThirdEye]` errors
4. Share the output of `curl http://localhost:8000/api/meet/meetings`
