Mirror of https://github.com/arkorty/B.Tech-Project-III.git, synced 2026-04-19 12:41:48 +00:00
# Message Batching Configuration
ThirdEye batches messages before processing to improve efficiency and reduce API costs. Here's how to configure it for your needs.

---

## How Batching Works

Messages are processed when either condition is met:

1. **Buffer reaches BATCH_SIZE** messages (processes immediately)
2. **Timer reaches BATCH_TIMEOUT_SECONDS** (processes whatever is in the buffer)

This prevents processing every single message individually while ensuring no message waits forever.
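The two trigger conditions can be sketched in Python roughly like this. This is a minimal illustration, not ThirdEye's actual implementation; the class and callback names are made up:

```python
import threading

class BatchBuffer:
    """Sketch of size-or-timeout batching: flush when the buffer is full,
    or when a timer started at the first buffered message expires."""

    def __init__(self, batch_size, timeout_seconds, process):
        self.batch_size = batch_size
        self.timeout_seconds = timeout_seconds
        self.process = process          # callback that receives a list of messages
        self.buffer = []
        self.lock = threading.Lock()
        self.timer = None

    def add(self, message):
        with self.lock:
            self.buffer.append(message)
            if len(self.buffer) >= self.batch_size:
                self._flush_locked()    # condition 1: buffer reached BATCH_SIZE
            elif self.timer is None:
                # condition 2: start the timeout clock on the first message
                self.timer = threading.Timer(self.timeout_seconds, self.flush)
                self.timer.start()

    def flush(self):
        with self.lock:
            self._flush_locked()

    def _flush_locked(self):
        if self.timer is not None:
            self.timer.cancel()
            self.timer = None
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.process(batch)
```

A full buffer flushes immediately and cancels the pending timer, so each batch is triggered by exactly one of the two conditions.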
---

## Configuration Variables

Edit `.env`:

```bash
BATCH_SIZE=5              # Process after N messages
BATCH_TIMEOUT_SECONDS=60  # Or wait this many seconds
```
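A sketch of how these values might be loaded on the Python side, assuming plain environment variables (the names match `.env`; the defaults shown here are illustrative):

```python
import os

# Fall back to the documented defaults when the variables are unset.
BATCH_SIZE = int(os.getenv("BATCH_SIZE", "5"))
BATCH_TIMEOUT_SECONDS = int(os.getenv("BATCH_TIMEOUT_SECONDS", "60"))
```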
### Recommended Settings

**For Production (efficiency priority):**

```bash
BATCH_SIZE=5
BATCH_TIMEOUT_SECONDS=60
```

- Batches 5 messages together (reduces API calls)
- Waits up to 60 seconds for more messages
- Good for active groups with frequent messages

**For Testing/Demo (speed priority):**

```bash
BATCH_SIZE=3
BATCH_TIMEOUT_SECONDS=15
```

- Processes after just 3 messages
- Waits only 15 seconds at most
- Better for testing and demos where you want quick feedback

**For Low-Volume Groups:**

```bash
BATCH_SIZE=3
BATCH_TIMEOUT_SECONDS=30
```

- Smaller batch size (triggers faster)
- Moderate timeout (30 seconds)

**For High-Volume Groups:**

```bash
BATCH_SIZE=10
BATCH_TIMEOUT_SECONDS=90
```

- Larger batches (more efficient)
- Longer timeout (less frequent processing)

---

## Manual Flush Command

If you don't want to wait for the timer, use:

```
/flush
```

This immediately processes all buffered messages for the current group.

**When to use /flush:**

- During testing/demos (when you want instant results)
- After important messages (so you can query them right away)
- When the buffer has accumulated but the timer hasn't triggered yet
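Conceptually, `/flush` just drains the buffer on demand instead of waiting for either trigger. A minimal sketch (a hypothetical helper, not the bot's actual handler):

```python
def flush_buffer(buffer, process):
    """Drain all buffered messages immediately, mimicking /flush."""
    if not buffer:
        return "No messages in buffer (already processed)"
    count = len(buffer)
    batch, buffer[:] = list(buffer), []  # clear in place so callers see the drain
    process(batch)
    return f"Processed {count} buffered message(s)."
```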
---

## Example Scenarios

### Scenario 1: Active Group Chat (10 messages in 30 seconds)

With `BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60`:

- First 5 messages → processed immediately
- Next 5 messages → processed immediately
- Total: 2 batch operations
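The arithmetic above can be checked with a tiny simulation (illustrative only; it ignores the timer, which never fires during a burst this fast):

```python
def simulate(messages, batch_size):
    """Count size-triggered batch operations for a burst of messages
    that all arrive within the timeout window."""
    batches, buffer = 0, []
    for m in messages:
        buffer.append(m)
        if len(buffer) >= batch_size:
            batches += 1      # buffer full -> one batch operation
            buffer.clear()
    return batches, len(buffer)  # (batches processed, messages left waiting)

print(simulate(range(10), 5))  # prints (2, 0): two batches, nothing left over
```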
### Scenario 2: Slow Group Chat (1 message, then silence)

With `BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60`:

- 1 message arrives → starts the 60s timer
- 60 seconds pass → timer triggers → processes the 1 message
- Total: 1 batch operation after a 60s delay

**Use `/flush` to skip the wait!**

### Scenario 3: Demo with Testing Settings

With `BATCH_SIZE=3, BATCH_TIMEOUT_SECONDS=15`:

- Send a test message
- Wait 15 seconds OR send 2 more messages
- Processed automatically
- Can be queried with `/ask` immediately after

---
## Cost vs Speed Trade-offs

### Processing every message individually (BATCH_SIZE=1, BATCH_TIMEOUT_SECONDS=0):

- ✅ Instant results
- ❌ More API calls (expensive)
- ❌ More LLM tokens used
- ❌ Slower overall (more network overhead)

### Batching messages (BATCH_SIZE=5+, BATCH_TIMEOUT_SECONDS=60):

- ✅ Fewer API calls (cheaper)
- ✅ More context for the LLM (better signal extraction)
- ✅ More efficient
- ❌ Slight delay before messages are searchable

---
## Current Status Check

To see whether messages are waiting in the buffer:

```
/flush
```

If the buffer is empty:

```
✅ No messages in buffer (already processed)
```

If messages are waiting:

```
⚡ Processing 3 buffered message(s)...
✅ Processed 3 messages → 5 signals extracted.
```

---
## Debugging

If messages aren't being found by `/ask`:

1. **Check if they're in the buffer:**

   ```
   /flush
   ```

2. **Check if signals were extracted:**

   ```
   /digest
   ```

   Shows the total signal count, which should increase after processing.

3. **Check backend logs.** Look for:

   ```
   INFO: Processed batch: N signals from [group_name]
   ```

   or

   ```
   INFO: Timer flush: N signals from [group_name]
   ```

4. **Verify processing happened:**

   After sending a message, wait `BATCH_TIMEOUT_SECONDS`, then check `/digest` again.

---
## Recommended: Use Faster Settings for Testing

For now, update your `.env`:

```bash
BATCH_SIZE=2
BATCH_TIMEOUT_SECONDS=10
```

Then restart the bot:

```bash
# Ctrl+C to stop
python -m backend.bot.bot
```

Now messages will be processed within 10 seconds (or after just 2 messages).

---
## Quick Reference

| Setting | Production | Testing | Demo |
|---------|------------|---------|------|
| BATCH_SIZE | 5 | 2 | 3 |
| BATCH_TIMEOUT_SECONDS | 60 | 10 | 15 |
| Processing delay | ~60s max | ~10s max | ~15s max |
| API efficiency | High | Medium | Medium |

Use `/flush` anytime you want instant processing!