
Message Batching Configuration

ThirdEye batches messages before processing to improve efficiency and reduce API costs. Here's how to configure it for your needs.


How Batching Works

Messages are processed when either condition is met:

  1. Buffer reaches BATCH_SIZE messages (processes immediately)
  2. BATCH_TIMEOUT_SECONDS elapse since the first buffered message (processes whatever is in the buffer)

This prevents processing every single message individually while ensuring messages don't wait forever.
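The dual-trigger logic can be sketched roughly as follows. This is a minimal illustration under assumed names (MessageBuffer, add, poll are hypothetical, not ThirdEye's actual classes); the real bot presumably drives the timeout from its event loop rather than a poll call:

```python
import time

class MessageBuffer:
    """Minimal sketch of the dual-trigger batching logic (hypothetical
    names; not ThirdEye's actual implementation)."""

    def __init__(self, batch_size=5, timeout_seconds=60):
        self.batch_size = batch_size
        self.timeout_seconds = timeout_seconds
        self.buffer = []
        self.first_message_at = None  # timer starts on first buffered message

    def add(self, message, now=None):
        """Buffer a message; return a batch if the size trigger fires."""
        now = time.monotonic() if now is None else now
        if not self.buffer:
            self.first_message_at = now
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            return self.flush()  # condition 1: buffer is full
        return None

    def poll(self, now=None):
        """Call periodically; flushes if the timeout elapsed (condition 2)."""
        now = time.monotonic() if now is None else now
        if self.buffer and now - self.first_message_at >= self.timeout_seconds:
            return self.flush()
        return None

    def flush(self):
        """Return and clear everything buffered (what /flush does)."""
        batch, self.buffer = self.buffer, []
        self.first_message_at = None
        return batch
```

With the testing settings (BATCH_SIZE=3), the third add call returns the batch immediately; a lone message only comes back from poll once the timeout has elapsed.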


Configuration Variables

Edit .env:

BATCH_SIZE=5                  # Process after N messages
BATCH_TIMEOUT_SECONDS=60      # Or wait this many seconds
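In code, these settings would typically be read from the environment with the production values as fallbacks. A small sketch (load_batching_config is a hypothetical helper, not necessarily how ThirdEye loads its config):

```python
import os

def load_batching_config():
    """Hypothetical helper: read the batching settings from the
    environment, falling back to the production defaults."""
    return (
        int(os.getenv("BATCH_SIZE", "5")),
        int(os.getenv("BATCH_TIMEOUT_SECONDS", "60")),
    )
```

Note the int() conversion: environment variables always arrive as strings.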

For Production (efficiency priority):

BATCH_SIZE=5
BATCH_TIMEOUT_SECONDS=60
  • Batches 5 messages together (reduces API calls)
  • Waits up to 60 seconds for more messages
  • Good for active groups with frequent messages

For Testing/Demo (speed priority):

BATCH_SIZE=3
BATCH_TIMEOUT_SECONDS=15
  • Processes after just 3 messages
  • Only waits 15 seconds max
  • Better for testing and demos where you want quick feedback

For Low-Volume Groups:

BATCH_SIZE=3
BATCH_TIMEOUT_SECONDS=30
  • Smaller batch size (triggers faster)
  • Moderate timeout (30 seconds)

For High-Volume Groups:

BATCH_SIZE=10
BATCH_TIMEOUT_SECONDS=90
  • Larger batches (more efficient)
  • Longer timeout (less frequent processing)

Manual Flush Command

If you don't want to wait for the timer, use:

/flush

This immediately processes all buffered messages for the current group.

When to use /flush:

  • During testing/demos (want instant results)
  • After important messages (need to query them right away)
  • When buffer has accumulated but timer hasn't triggered yet

Example Scenarios

Scenario 1: Active Group Chat (10 messages in 30 seconds)

With BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60:

  • First 5 messages → processed immediately
  • Next 5 messages → processed immediately
  • Total: 2 batch operations

Scenario 2: Slow Group Chat (1 message, then silence)

With BATCH_SIZE=5, BATCH_TIMEOUT_SECONDS=60:

  • 1 message arrives → starts 60s timer
  • 60 seconds pass → timer triggers → processes the 1 message
  • Total: 1 batch operation after 60s delay

Use /flush to skip the wait!

Scenario 3: Demo with Testing Settings

With BATCH_SIZE=3, BATCH_TIMEOUT_SECONDS=15:

  • Send test message
  • Wait 15 seconds OR send 2 more messages
  • Processed automatically
  • Can query with /ask immediately after
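The batch counts in the scenarios above follow a simple rule, assuming all messages arrive within one timeout window: full batches fire on size, and any remainder fires when the timer expires. A one-liner to check the arithmetic (batch_count is a hypothetical helper for illustration only):

```python
import math

def batch_count(n_messages, batch_size):
    """Hypothetical helper: number of batch operations when all
    messages arrive within one timeout window."""
    return math.ceil(n_messages / batch_size)

# Scenario 1: batch_count(10, 5) -> 2 batch operations
# Scenario 2: batch_count(1, 5)  -> 1 batch operation (after the timeout)
```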

Cost vs Speed Trade-offs

Processing every message individually (BATCH_SIZE=1, BATCH_TIMEOUT_SECONDS=0):

  • Instant results
  • More API calls (expensive)
  • More LLM tokens used
  • Slower overall (more network overhead)

Batching messages (BATCH_SIZE=5+, BATCH_TIMEOUT_SECONDS=60):

  • Fewer API calls (cheaper)
  • More context for LLM (better signal extraction)
  • More efficient
  • Slight delay before messages are searchable

Current Status Check

To see if messages are waiting in the buffer:

/flush

If buffer is empty:

✅ No messages in buffer (already processed)

If messages are waiting:

⚡ Processing 3 buffered message(s)...
✅ Processed 3 messages → 5 signals extracted.
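The branch between those two replies can be sketched as follows (flush_reply is a hypothetical name; ThirdEye's actual handler may differ, and the real one also runs signal extraction before reporting):

```python
def flush_reply(buffer):
    """Hypothetical sketch of the /flush reply logic, mirroring the
    messages shown above."""
    if not buffer:
        return "✅ No messages in buffer (already processed)"
    count = len(buffer)
    buffer.clear()  # drain so the batch is processed exactly once
    return f"⚡ Processing {count} buffered message(s)..."
```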

Debugging

If messages aren't being found by /ask:

  1. Check if they're in the buffer:

    /flush
    
  2. Check if signals were extracted:

    /digest
    

    Shows the total signal count, which should increase after processing

  3. Check backend logs: Look for:

    INFO: Processed batch: N signals from [group_name]
    

    or

    INFO: Timer flush: N signals from [group_name]
    
  4. Verify processing happened: After sending a message, wait BATCH_TIMEOUT_SECONDS, then check /digest again


For quick feedback while debugging, update your .env:

BATCH_SIZE=2
BATCH_TIMEOUT_SECONDS=10

Then restart the bot:

# Ctrl+C to stop
python -m backend.bot.bot

Now messages will be processed within 10 seconds (or after just 2 messages).


Quick Reference

Setting                  Production   Testing    Demo
BATCH_SIZE               5            2          3
BATCH_TIMEOUT_SECONDS    60           10         15
Processing delay         ~60s max     ~10s max   ~15s max
API efficiency           High         Medium     Medium

Use /flush anytime you want instant processing!