mirror of
https://github.com/arkorty/B.Tech-Project-III.git
synced 2026-04-19 12:41:48 +00:00
thirdeye/docs/BATCHING_CONFIG.md
# Message Batching Configuration

ThirdEye batches messages before processing to improve efficiency and reduce API costs. Here's how to configure it for your needs.

---

## How Batching Works

Messages are processed when either condition is met:

1. **Buffer reaches BATCH_SIZE** messages (processes immediately)
2. **Timer reaches BATCH_TIMEOUT_SECONDS** (processes whatever is in the buffer)

This avoids processing every single message individually while ensuring that no message waits indefinitely.
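The two flush conditions can be sketched in a few lines of Python. This is a minimal illustration of the size-or-timeout pattern, not ThirdEye's actual implementation; names like `MessageBatcher` and `process_batch` are made up for the example:

```python
import time

# Minimal sketch of the size-or-timeout batching pattern.
# All names here are illustrative, not ThirdEye's real internals.
class MessageBatcher:
    def __init__(self, batch_size=5, timeout_seconds=60, process_batch=print):
        self.batch_size = batch_size
        self.timeout_seconds = timeout_seconds
        self.process_batch = process_batch
        self.buffer = []
        self.first_message_at = None  # when the current buffer started filling

    def add(self, message):
        if not self.buffer:
            self.first_message_at = time.monotonic()
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:  # condition 1: buffer full
            self.flush()

    def tick(self):
        """Call periodically; flushes if the timeout elapsed (condition 2)."""
        if self.buffer and time.monotonic() - self.first_message_at >= self.timeout_seconds:
            self.flush()

    def flush(self):
        if self.buffer:
            self.process_batch(self.buffer)
            self.buffer = []
            self.first_message_at = None
```

A real bot would call `tick()` from a background timer task; it is left to the caller here so the two trigger conditions stay visible.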
---

## Configuration Variables

Edit `.env`:

```bash
BATCH_SIZE=5              # Process after N messages
BATCH_TIMEOUT_SECONDS=60  # Or wait at most this many seconds
```
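For reference, settings like these are typically read from the environment with defaults as a fallback. A hedged sketch; the helper name and fallback behavior are assumptions, not ThirdEye's confirmed loading code:

```python
import os

# Hypothetical helper: read batching settings from the environment,
# falling back to the defaults shown above if a variable is unset.
def load_batch_config():
    return {
        "batch_size": int(os.environ.get("BATCH_SIZE", 5)),
        "batch_timeout_seconds": int(os.environ.get("BATCH_TIMEOUT_SECONDS", 60)),
    }
```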
### Recommended Settings

**For Production (efficiency priority):**

```bash
BATCH_SIZE=5
BATCH_TIMEOUT_SECONDS=60
```

- Batches 5 messages together (reduces API calls)
- Waits up to 60 seconds for more messages
- Good for active groups with frequent messages

**For Testing/Demo (speed priority):**

```bash
BATCH_SIZE=3
BATCH_TIMEOUT_SECONDS=15
```

- Processes after just 3 messages
- Waits only 15 seconds at most
- Better for testing and demos where you want quick feedback

**For Low-Volume Groups:**

```bash
BATCH_SIZE=3
BATCH_TIMEOUT_SECONDS=30
```

- Smaller batch size (triggers faster)
- Moderate timeout (30 seconds)

**For High-Volume Groups:**

```bash
BATCH_SIZE=10
BATCH_TIMEOUT_SECONDS=90
```

- Larger batches (more efficient)
- Longer timeout (less frequent processing)
---

## Manual Flush Command

If you don't want to wait for the timer, use:

```
/flush
```

This immediately processes all buffered messages for the current group.

**When to use /flush:**

- During testing/demos (when you want instant results)
- After important messages (when you need to query them right away)
- When the buffer has accumulated messages but the timer hasn't triggered yet
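Conceptually, `/flush` short-circuits both batching conditions and drains the buffer at once. A rough sketch with a stand-in buffer store and handler name (not ThirdEye's real internals); the reply strings mirror the ones shown under "Current Status Check" below:

```python
# Illustrative sketch of a /flush command handler: it drains the group's
# pending-message buffer immediately instead of waiting for size or timeout.
buffers = {}  # group_id -> list of pending messages (stand-in store)

def handle_flush(group_id, process_batch):
    pending = buffers.get(group_id, [])
    if not pending:
        return "✅ No messages in buffer (already processed)"
    buffers[group_id] = []           # clear before processing
    process_batch(pending)           # same path a size/timeout flush would take
    return f"⚡ Processing {len(pending)} buffered message(s)..."
```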
---

## Example Scenarios

### Scenario 1: Active Group Chat (10 messages in 30 seconds)

With `BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60`:

- First 5 messages → processed immediately
- Next 5 messages → processed immediately
- Total: 2 batch operations

### Scenario 2: Slow Group Chat (1 message, then silence)

With `BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60`:

- 1 message arrives → starts the 60s timer
- 60 seconds pass → timer triggers → processes the 1 message
- Total: 1 batch operation after a 60s delay

**Use `/flush` to skip the wait!**

### Scenario 3: Demo with Testing Settings

With `BATCH_SIZE=3, BATCH_TIMEOUT_SECONDS=15`:

- Send a test message
- Wait 15 seconds OR send 2 more messages
- The batch is processed automatically
- You can query with `/ask` immediately after
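These scenarios can be checked with a tiny count-only simulation (an illustrative helper, assuming size-triggered flushes happen instantly and ignoring wall-clock time):

```python
# Count how many batch operations a burst of messages produces, plus how
# many messages are left waiting for the timer to flush them.
def count_batches(num_messages, batch_size):
    buffered, batches = 0, 0
    for _ in range(num_messages):
        buffered += 1
        if buffered >= batch_size:  # size trigger fires immediately
            batches += 1
            buffered = 0
    return batches, buffered  # (completed batches, messages awaiting timer)
```

For Scenario 1, `count_batches(10, 5)` gives 2 batch operations with nothing left over; for Scenario 2, `count_batches(1, 5)` gives 0 batches with 1 message waiting on the timer.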
---

## Cost vs Speed Trade-offs

### Processing every message individually (`BATCH_SIZE=1, BATCH_TIMEOUT_SECONDS=0`)

- ✅ Instant results
- ❌ More API calls (expensive)
- ❌ More LLM tokens used
- ❌ Slower overall (more network overhead)

### Batching messages (`BATCH_SIZE=5+, BATCH_TIMEOUT_SECONDS=60`)

- ✅ Fewer API calls (cheaper)
- ✅ More context for the LLM (better signal extraction)
- ✅ More efficient
- ❌ Slight delay before messages are searchable
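Assuming one API call per processed batch, the savings are easy to estimate (illustrative helper, not part of the codebase):

```python
import math

# Back-of-envelope: API calls needed for a burst of messages, assuming
# one call per batch and that size (not timeout) triggers every flush.
def api_calls(num_messages, batch_size):
    return math.ceil(num_messages / batch_size)

# For a burst of 100 messages: api_calls(100, 1) -> 100 calls unbatched,
# api_calls(100, 5) -> 20 calls with BATCH_SIZE=5, a 5x reduction.
```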
---

## Current Status Check

To see whether messages are waiting in the buffer, run:

```
/flush
```

If the buffer is empty:

```
✅ No messages in buffer (already processed)
```

If messages are waiting:

```
⚡ Processing 3 buffered message(s)...
✅ Processed 3 messages → 5 signals extracted.
```

---

## Debugging

If messages aren't being found by `/ask`:

1. **Check if they're in the buffer:**

   ```
   /flush
   ```

2. **Check if signals were extracted:**

   ```
   /digest
   ```

   This shows the total signal count, which should increase after processing.

3. **Check the backend logs.** Look for:

   ```
   INFO: Processed batch: N signals from [group_name]
   ```

   or

   ```
   INFO: Timer flush: N signals from [group_name]
   ```

4. **Verify processing happened:**

   After sending a message, wait `BATCH_TIMEOUT_SECONDS`, then check `/digest` again.

---

## Recommended: Use Faster Settings for Testing

For now, update your `.env`:

```bash
BATCH_SIZE=2
BATCH_TIMEOUT_SECONDS=10
```

Then restart the bot:

```bash
# Ctrl+C to stop the running bot, then:
python -m backend.bot.bot
```

Now messages will be processed within 10 seconds (or after just 2 messages).

---

## Quick Reference

| Setting | Production | Testing | Demo |
|---------|-----------|---------|------|
| BATCH_SIZE | 5 | 2 | 3 |
| BATCH_TIMEOUT_SECONDS | 60 | 10 | 15 |
| Processing delay | ~60s max | ~10s max | ~15s max |
| API efficiency | High | Medium | Medium |

Use `/flush` anytime you want instant processing!