Mirror of https://github.com/arkorty/B.Tech-Project-III.git, synced 2026-04-19 12:41:48 +00:00
# Message Batching Configuration
ThirdEye batches messages before processing to improve efficiency and reduce API costs. Here's how to configure it for your needs.

---

## How Batching Works

Messages are processed when either condition is met:

1. **Buffer reaches BATCH_SIZE** messages (processes immediately)
2. **Timer reaches BATCH_TIMEOUT_SECONDS** (processes whatever is in the buffer)

This prevents processing every single message individually while ensuring no message waits forever.
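The two trigger conditions can be sketched in Python roughly like this. This is a minimal illustration, not ThirdEye's actual implementation; the class and callback names are made up:

```python
import threading

class BatchBuffer:
    """Sketch of size-or-timeout batching: flush when the buffer is full,
    or when a timer started at the first buffered message expires."""

    def __init__(self, batch_size, timeout_seconds, process):
        self.batch_size = batch_size
        self.timeout_seconds = timeout_seconds
        self.process = process          # callback that receives a list of messages
        self.buffer = []
        self.lock = threading.Lock()
        self.timer = None

    def add(self, message):
        with self.lock:
            self.buffer.append(message)
            if len(self.buffer) >= self.batch_size:
                self._flush_locked()    # condition 1: buffer reached BATCH_SIZE
            elif self.timer is None:
                # condition 2: start the timeout clock on the first message
                self.timer = threading.Timer(self.timeout_seconds, self.flush)
                self.timer.start()

    def flush(self):
        with self.lock:
            self._flush_locked()

    def _flush_locked(self):
        if self.timer is not None:
            self.timer.cancel()
            self.timer = None
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.process(batch)
```

A full buffer flushes immediately and cancels the pending timer, so each batch is triggered by exactly one of the two conditions.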
---

## Configuration Variables

Edit `.env`:

```bash
BATCH_SIZE=5              # Process after N messages
BATCH_TIMEOUT_SECONDS=60  # Or wait this many seconds
```
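A sketch of how these values might be loaded on the Python side, assuming plain environment variables (the names match `.env`; the defaults shown here are illustrative):

```python
import os

# Fall back to the documented defaults when the variables are unset.
BATCH_SIZE = int(os.getenv("BATCH_SIZE", "5"))
BATCH_TIMEOUT_SECONDS = int(os.getenv("BATCH_TIMEOUT_SECONDS", "60"))
```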
### Recommended Settings

**For Production (efficiency priority):**

```bash
BATCH_SIZE=5
BATCH_TIMEOUT_SECONDS=60
```

- Batches 5 messages together (reduces API calls)
- Waits up to 60 seconds for more messages
- Good for active groups with frequent messages

**For Testing/Demo (speed priority):**

```bash
BATCH_SIZE=3
BATCH_TIMEOUT_SECONDS=15
```

- Processes after just 3 messages
- Waits only 15 seconds at most
- Better for testing and demos where you want quick feedback

**For Low-Volume Groups:**

```bash
BATCH_SIZE=3
BATCH_TIMEOUT_SECONDS=30
```

- Smaller batch size (triggers faster)
- Moderate timeout (30 seconds)

**For High-Volume Groups:**

```bash
BATCH_SIZE=10
BATCH_TIMEOUT_SECONDS=90
```

- Larger batches (more efficient)
- Longer timeout (less frequent processing)

---

## Manual Flush Command

If you don't want to wait for the timer, use:

```
/flush
```

This immediately processes all buffered messages for the current group.

**When to use /flush:**

- During testing/demos (when you want instant results)
- After important messages (so you can query them right away)
- When the buffer has accumulated but the timer hasn't triggered yet
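Conceptually, `/flush` just drains the buffer on demand instead of waiting for either trigger. A minimal sketch (a hypothetical helper, not the bot's actual handler):

```python
def flush_buffer(buffer, process):
    """Drain all buffered messages immediately, mimicking /flush."""
    if not buffer:
        return "No messages in buffer (already processed)"
    count = len(buffer)
    batch, buffer[:] = list(buffer), []  # clear in place so callers see the drain
    process(batch)
    return f"Processed {count} buffered message(s)."
```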
---

## Example Scenarios

### Scenario 1: Active Group Chat (10 messages in 30 seconds)

With `BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60`:

- First 5 messages → processed immediately
- Next 5 messages → processed immediately
- Total: 2 batch operations
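The arithmetic above can be checked with a tiny simulation (illustrative only; it ignores the timer, which never fires during a burst this fast):

```python
def simulate(messages, batch_size):
    """Count size-triggered batch operations for a burst of messages
    that all arrive within the timeout window."""
    batches, buffer = 0, []
    for m in messages:
        buffer.append(m)
        if len(buffer) >= batch_size:
            batches += 1      # buffer full -> one batch operation
            buffer.clear()
    return batches, len(buffer)  # (batches processed, messages left waiting)

print(simulate(range(10), 5))  # prints (2, 0): two batches, nothing left over
```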
### Scenario 2: Slow Group Chat (1 message, then silence)

With `BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60`:

- 1 message arrives → starts the 60s timer
- 60 seconds pass → timer triggers → processes the 1 message
- Total: 1 batch operation after a 60s delay

**Use `/flush` to skip the wait!**

### Scenario 3: Demo with Testing Settings

With `BATCH_SIZE=3, BATCH_TIMEOUT_SECONDS=15`:

- Send a test message
- Wait 15 seconds OR send 2 more messages
- Processed automatically
- Can be queried with `/ask` immediately after

---
## Cost vs Speed Trade-offs

### Processing every message individually (BATCH_SIZE=1, BATCH_TIMEOUT_SECONDS=0):

- ✅ Instant results
- ❌ More API calls (expensive)
- ❌ More LLM tokens used
- ❌ Slower overall (more network overhead)

### Batching messages (BATCH_SIZE=5+, BATCH_TIMEOUT_SECONDS=60):

- ✅ Fewer API calls (cheaper)
- ✅ More context for the LLM (better signal extraction)
- ✅ More efficient
- ❌ Slight delay before messages are searchable

---
## Current Status Check

To see whether messages are waiting in the buffer:

```
/flush
```

If the buffer is empty:

```
✅ No messages in buffer (already processed)
```

If messages are waiting:

```
⚡ Processing 3 buffered message(s)...
✅ Processed 3 messages → 5 signals extracted.
```

---
## Debugging

If messages aren't being found by `/ask`:

1. **Check if they're in the buffer:**

   ```
   /flush
   ```

2. **Check if signals were extracted:**

   ```
   /digest
   ```

   Shows the total signal count, which should increase after processing.

3. **Check backend logs.** Look for:

   ```
   INFO: Processed batch: N signals from [group_name]
   ```

   or

   ```
   INFO: Timer flush: N signals from [group_name]
   ```

4. **Verify processing happened:**

   After sending a message, wait `BATCH_TIMEOUT_SECONDS`, then check `/digest` again.

---
## Recommended: Use Faster Settings for Testing

For now, update your `.env`:

```bash
BATCH_SIZE=2
BATCH_TIMEOUT_SECONDS=10
```

Then restart the bot:

```bash
# Ctrl+C to stop
python -m backend.bot.bot
```

Now messages will be processed within 10 seconds (or after just 2 messages).

---
## Quick Reference

| Setting | Production | Testing | Demo |
|---------|------------|---------|------|
| BATCH_SIZE | 5 | 2 | 3 |
| BATCH_TIMEOUT_SECONDS | 60 | 10 | 15 |
| Processing delay | ~60s max | ~10s max | ~15s max |
| API efficiency | High | Medium | Medium |

Use `/flush` anytime you want instant processing!