mirror of
https://github.com/arkorty/B.Tech-Project-III.git
synced 2026-04-19 12:41:48 +00:00
thirdeye/docs/BATCHING_CONFIG.md
# Message Batching Configuration

ThirdEye batches messages before processing to improve efficiency and reduce API costs. Here's how to configure it for your needs.

---

## How Batching Works

Messages are processed when either condition is met:

1. **Buffer reaches BATCH_SIZE** messages (processes immediately)
2. **Timer reaches BATCH_TIMEOUT_SECONDS** (processes whatever is in the buffer)

This avoids processing every single message individually while ensuring that no message waits indefinitely.
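The two flush conditions can be sketched in a few lines of Python. This is a minimal illustration of the size-or-timeout pattern, not ThirdEye's actual implementation; names like `MessageBatcher` and `process_batch` are made up for the example:

```python
import time

# Minimal sketch of the size-or-timeout batching pattern.
# All names here are illustrative, not ThirdEye's real internals.
class MessageBatcher:
    def __init__(self, batch_size=5, timeout_seconds=60, process_batch=print):
        self.batch_size = batch_size
        self.timeout_seconds = timeout_seconds
        self.process_batch = process_batch
        self.buffer = []
        self.first_message_at = None  # when the current buffer started filling

    def add(self, message):
        if not self.buffer:
            self.first_message_at = time.monotonic()
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:  # condition 1: buffer full
            self.flush()

    def tick(self):
        """Call periodically; flushes if the timeout elapsed (condition 2)."""
        if self.buffer and time.monotonic() - self.first_message_at >= self.timeout_seconds:
            self.flush()

    def flush(self):
        if self.buffer:
            self.process_batch(self.buffer)
            self.buffer = []
            self.first_message_at = None
```

A real bot would call `tick()` from a background timer task; it is left to the caller here so the two trigger conditions stay visible.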
---

## Configuration Variables

Edit `.env`:

```bash
BATCH_SIZE=5              # Process after N messages
BATCH_TIMEOUT_SECONDS=60  # Or wait at most this many seconds
```
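For reference, settings like these are typically read from the environment with defaults as a fallback. A hedged sketch; the helper name and fallback behavior are assumptions, not ThirdEye's confirmed loading code:

```python
import os

# Hypothetical helper: read batching settings from the environment,
# falling back to the defaults shown above if a variable is unset.
def load_batch_config():
    return {
        "batch_size": int(os.environ.get("BATCH_SIZE", 5)),
        "batch_timeout_seconds": int(os.environ.get("BATCH_TIMEOUT_SECONDS", 60)),
    }
```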
### Recommended Settings

**For Production (efficiency priority):**

```bash
BATCH_SIZE=5
BATCH_TIMEOUT_SECONDS=60
```

- Batches 5 messages together (reduces API calls)
- Waits up to 60 seconds for more messages
- Good for active groups with frequent messages

**For Testing/Demo (speed priority):**

```bash
BATCH_SIZE=3
BATCH_TIMEOUT_SECONDS=15
```

- Processes after just 3 messages
- Waits only 15 seconds at most
- Better for testing and demos where you want quick feedback

**For Low-Volume Groups:**

```bash
BATCH_SIZE=3
BATCH_TIMEOUT_SECONDS=30
```

- Smaller batch size (triggers faster)
- Moderate timeout (30 seconds)

**For High-Volume Groups:**

```bash
BATCH_SIZE=10
BATCH_TIMEOUT_SECONDS=90
```

- Larger batches (more efficient)
- Longer timeout (less frequent processing)
---

## Manual Flush Command

If you don't want to wait for the timer, use:

```
/flush
```

This immediately processes all buffered messages for the current group.

**When to use /flush:**

- During testing/demos (when you want instant results)
- After important messages (when you need to query them right away)
- When the buffer has accumulated messages but the timer hasn't triggered yet
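Conceptually, `/flush` short-circuits both batching conditions and drains the buffer at once. A rough sketch with a stand-in buffer store and handler name (not ThirdEye's real internals); the reply strings mirror the ones shown under "Current Status Check" below:

```python
# Illustrative sketch of a /flush command handler: it drains the group's
# pending-message buffer immediately instead of waiting for size or timeout.
buffers = {}  # group_id -> list of pending messages (stand-in store)

def handle_flush(group_id, process_batch):
    pending = buffers.get(group_id, [])
    if not pending:
        return "✅ No messages in buffer (already processed)"
    buffers[group_id] = []           # clear before processing
    process_batch(pending)           # same path a size/timeout flush would take
    return f"⚡ Processing {len(pending)} buffered message(s)..."
```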
---

## Example Scenarios

### Scenario 1: Active Group Chat (10 messages in 30 seconds)

With `BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60`:

- First 5 messages → processed immediately
- Next 5 messages → processed immediately
- Total: 2 batch operations

### Scenario 2: Slow Group Chat (1 message, then silence)

With `BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60`:

- 1 message arrives → starts the 60s timer
- 60 seconds pass → timer triggers → processes the 1 message
- Total: 1 batch operation after a 60s delay

**Use `/flush` to skip the wait!**

### Scenario 3: Demo with Testing Settings

With `BATCH_SIZE=3, BATCH_TIMEOUT_SECONDS=15`:

- Send a test message
- Wait 15 seconds OR send 2 more messages
- The batch is processed automatically
- You can query with `/ask` immediately after
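These scenarios can be checked with a tiny count-only simulation (an illustrative helper, assuming size-triggered flushes happen instantly and ignoring wall-clock time):

```python
# Count how many batch operations a burst of messages produces, plus how
# many messages are left waiting for the timer to flush them.
def count_batches(num_messages, batch_size):
    buffered, batches = 0, 0
    for _ in range(num_messages):
        buffered += 1
        if buffered >= batch_size:  # size trigger fires immediately
            batches += 1
            buffered = 0
    return batches, buffered  # (completed batches, messages awaiting timer)
```

For Scenario 1, `count_batches(10, 5)` gives 2 batch operations with nothing left over; for Scenario 2, `count_batches(1, 5)` gives 0 batches with 1 message waiting on the timer.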
---

## Cost vs Speed Trade-offs

### Processing every message individually (`BATCH_SIZE=1, BATCH_TIMEOUT_SECONDS=0`)

- ✅ Instant results
- ❌ More API calls (expensive)
- ❌ More LLM tokens used
- ❌ Slower overall (more network overhead)

### Batching messages (`BATCH_SIZE=5+, BATCH_TIMEOUT_SECONDS=60`)

- ✅ Fewer API calls (cheaper)
- ✅ More context for the LLM (better signal extraction)
- ✅ More efficient
- ❌ Slight delay before messages are searchable
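Assuming one API call per processed batch, the savings are easy to estimate (illustrative helper, not part of the codebase):

```python
import math

# Back-of-envelope: API calls needed for a burst of messages, assuming
# one call per batch and that size (not timeout) triggers every flush.
def api_calls(num_messages, batch_size):
    return math.ceil(num_messages / batch_size)

# For a burst of 100 messages: api_calls(100, 1) -> 100 calls unbatched,
# api_calls(100, 5) -> 20 calls with BATCH_SIZE=5, a 5x reduction.
```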
---

## Current Status Check

To see whether messages are waiting in the buffer, run:

```
/flush
```

If the buffer is empty:

```
✅ No messages in buffer (already processed)
```

If messages are waiting:

```
⚡ Processing 3 buffered message(s)...
✅ Processed 3 messages → 5 signals extracted.
```

---

## Debugging

If messages aren't being found by `/ask`:

1. **Check if they're in the buffer:**

   ```
   /flush
   ```

2. **Check if signals were extracted:**

   ```
   /digest
   ```

   This shows the total signal count, which should increase after processing.

3. **Check the backend logs.** Look for:

   ```
   INFO: Processed batch: N signals from [group_name]
   ```

   or

   ```
   INFO: Timer flush: N signals from [group_name]
   ```

4. **Verify processing happened:**

   After sending a message, wait `BATCH_TIMEOUT_SECONDS`, then check `/digest` again.

---

## Recommended: Use Faster Settings for Testing

For now, update your `.env`:

```bash
BATCH_SIZE=2
BATCH_TIMEOUT_SECONDS=10
```

Then restart the bot:

```bash
# Ctrl+C to stop the running bot, then:
python -m backend.bot.bot
```

Now messages will be processed within 10 seconds (or after just 2 messages).

---

## Quick Reference

| Setting | Production | Testing | Demo |
|---------|-----------|---------|------|
| BATCH_SIZE | 5 | 2 | 3 |
| BATCH_TIMEOUT_SECONDS | 60 | 10 | 15 |
| Processing delay | ~60s max | ~10s max | ~15s max |
| API efficiency | High | Medium | Medium |

Use `/flush` anytime you want instant processing!