# ThirdEye 👁️

> **"Your team already has the answers. They're just trapped in chat."**

ThirdEye is a **multi-agent conversation intelligence engine** that passively monitors Telegram group chats, extracts structured knowledge in real-time, builds a living knowledge graph, detects patterns humans miss, and catches cross-team blind spots before they become crises.

It is not a chatbot. Not a summarizer. Not a search tool. It is a **conversation intelligence engine** — the first of its kind to work across multiple teams simultaneously and detect what falls through the cracks *between* them.

---

## Table of Contents

- [The Problem](#the-problem)
- [What ThirdEye Does](#what-thirdeye-does)
- [Key Features](#key-features)
- [Architecture Overview](#architecture-overview)
- [The Adaptive Lens System](#the-adaptive-lens-system)
- [Cross-Group Intelligence](#cross-group-intelligence)
- [Agent Pipeline](#agent-pipeline)
- [LLM Provider Stack](#llm-provider-stack)
- [Integrations](#integrations)
  - [Telegram Bot](#telegram-bot)
  - [Google Meet Extension](#google-meet-chrome-extension)
  - [Jira Integration](#jira-integration)
  - [Voice Message Intelligence](#voice-message-intelligence)
  - [Document & Link Ingestion](#document--link-ingestion)
- [Knowledge Base Dashboard](#knowledge-base-dashboard)
- [Tech Stack](#tech-stack)
- [Project Structure](#project-structure)
- [Setup & Installation](#setup--installation)
- [Running the System](#running-the-system)
- [Environment Variables](#environment-variables)
- [API Reference](#api-reference)
- [Data Models](#data-models)
- [Seeding Demo Data](#seeding-demo-data)

---

## The Problem

Every day, your team makes hundreds of decisions, discovers bugs, promises clients things, and solves problems — all in group chat. And every day, that intelligence vanishes as messages scroll past.

**The Knowledge Graveyard Problem:**
- Teams generate enormous value in group chats: decisions, recommendations, warnings, technical knowledge, client commitments
- In practice, most of this knowledge is lost within days as messages scroll past
- When someone asks *"why did we choose Redis?"* or *"what did we promise the client?"* — nobody remembers, and nobody can find it
- When Team A is blocked by Team B, but Team B doesn't know — nobody connects the dots until it's a crisis

**Who feels this pain:**
- Engineering teams — decisions lost, tech debt invisible, knowledge silos forming
- Product teams — feature requests scattered, roadmap drift unnoticed, priority conflicts buried
- Client-facing teams — promises untracked, scope creep uncaught, sentiment shifts missed
- Any team of 3+ people communicating in chat

---

## What ThirdEye Does

```
Telegram Groups ──► Signal Extraction ──► ChromaDB Knowledge Graph
                          │                        │
                    8 AI Agents               Pattern Detection
                          │                        │
                    Cross-Group ◄────── Blind Spot Detection
                    Intelligence                   │
                          │                    Dashboard
                    Proactive Alerts          + Telegram /ask
```

### Core Capabilities

| Capability | Description |
|---|---|
| **Passive Intelligence Extraction** | Silently monitors group messages and extracts structured signals — no commands, no tagging, no forms |
| **Adaptive Context Detection** | Auto-detects whether a group is a dev team, product team, client channel, or community, then activates the right extraction ontology |
| **Living Knowledge Graph** | Builds a persistent, queryable knowledge base in ChromaDB that grows smarter over time |
| **Natural Language Querying** | Any group member can ask questions and get community-sourced answers with source attribution |
| **Pattern Detection** | Identifies trends: "Tech debt mentions up 300% this sprint." "Only @raj discusses payment — bus factor = 1." |
| **Proactive Alerts** | Sends Telegram alerts for concerning patterns: overdue promises, sentiment shifts, knowledge silos, recurring bugs |
| **Cross-Group Intelligence ⭐** | When deployed in multiple groups simultaneously, detects blind spots *between* teams: blocked handoffs, conflicting decisions, information silos, promise-reality gaps |

---

## Key Features

### Signal Types Extracted

**DevLens (Engineering Teams)**
- `architecture_decision` — Technology choices with rationale, alternatives considered, decision owner
- `tech_debt` — Shortcuts, hardcoded values, "will fix later" patterns
- `knowledge_silo_evidence` — Only one person discusses a critical domain (bus factor = 1)
- `recurring_bug` — Same issue mentioned repeatedly across conversations
- `stack_decision` — Technology/framework choices (proposed or decided)
- `deployment_risk` — Risky deployment practices caught in chat
- `workaround` — Temporary fixes being applied repeatedly

**ProductLens (Product Teams)**
- `feature_request` — Features users or team members are asking for, with frequency
- `user_pain_point` — User difficulties, complaints, confusion with evidence
- `roadmap_drift` — Discussion of topics not on the current plan
- `priority_conflict` — Team members disagreeing on what's most important
- `metric_mention` — Specific numbers, conversion rates, performance data
- `user_quote` — Direct quotes from customers with sentiment
- `competitor_intel` — Mentions of competitor actions or features

**ClientLens (Agency / Client Channels)**
- `promise` — Commitments made with deadlines (explicit or implicit)
- `scope_creep` — Additional requests introduced casually without formal change
- `sentiment_signal` — Tone changes: growing frustration, formality shifts, positive praise
- `unanswered_request` — Client questions that haven't received responses (with time elapsed)
- `escalation_risk` — Mentions of involving management, expressing deadline concerns
- `client_decision` — Decisions made by the client

**MeetLens (Google Meet)**
- `meet_decision` — Decisions made during meeting
- `meet_action_item` — Actions committed to with owner
- `meet_blocker` — Blockers raised in meeting
- `meet_risk` — Risks identified during discussion
- `meet_summary` — AI-generated meeting summary
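
The signal lists above can be read as a small registry keyed by lens. The sketch below is illustrative only: the signal type names are copied from this README, while `LENS_SIGNAL_TYPES`, `allowed_signal_types`, and `is_valid_signal` are hypothetical names, not the actual contents of `db/models.py`.

```python
# Signal type names come from the lists above; the registry shape is an assumption.
LENS_SIGNAL_TYPES: dict[str, tuple[str, ...]] = {
    "dev": (
        "architecture_decision", "tech_debt", "knowledge_silo_evidence",
        "recurring_bug", "stack_decision", "deployment_risk", "workaround",
    ),
    "product": (
        "feature_request", "user_pain_point", "roadmap_drift",
        "priority_conflict", "metric_mention", "user_quote", "competitor_intel",
    ),
    "client": (
        "promise", "scope_creep", "sentiment_signal",
        "unanswered_request", "escalation_risk", "client_decision",
    ),
    "meet": (
        "meet_decision", "meet_action_item", "meet_blocker",
        "meet_risk", "meet_summary",
    ),
}


def allowed_signal_types(active_lenses: list[str]) -> list[str]:
    """Return the signal types allowed for a group, in lens order, without duplicates."""
    allowed: list[str] = []
    for lens in active_lenses:
        for signal_type in LENS_SIGNAL_TYPES[lens]:
            if signal_type not in allowed:
                allowed.append(signal_type)
    return allowed


def is_valid_signal(signal_type: str, active_lenses: list[str]) -> bool:
    """Check an extracted signal type against the group's active lenses."""
    return signal_type in allowed_signal_types(active_lenses)
```

A check like this could be used to discard signal types that the LLM returns outside the active ontology, for example a `promise` extracted from a group that only has the dev lens enabled.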

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│                          INPUT SOURCES                              │
├──────────────┬──────────────┬────────────────┬──────────────────────┤
│  Telegram    │  Voice Notes │  Documents     │   Google Meet        │
│  Messages    │  (OGG/MP4)   │  (PDF/DOCX/TXT)│   (Chrome Extension) │
└──────┬───────┴──────┬───────┴───────┬────────┴──────┬──────────────┘
       │              │               │               │
       ▼              ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     PROCESSING PIPELINE                             │
│                                                                     │
│  1. Context Detector ──► Lens Assignment (dev/product/client/meet)  │
│  2. Signal Extractor ──► Structured signal extraction (lens-aware)  │
│  3. Classifier Agent ──► Sentiment, urgency, keyword tagging        │
│  4. Knowledge Architect ► Conflict resolution, entity linking       │
│                                                                     │
└─────────────────────────────────────────────────┬───────────────────┘
                                                  │
                                                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     CHROMADB KNOWLEDGE GRAPH                        │
│              (Per-group vector collections, Cohere embeddings)      │
└───────────────────────────────────┬─────────────────────────────────┘
                                    │
              ┌─────────────────────┼──────────────────────┐
              ▼                     ▼                      ▼
┌─────────────────────┐  ┌──────────────────────┐  ┌──────────────────┐
│  Pattern Detector   │  │  Cross-Group Analyst  │  │   Query Agent    │
│  (within a group)   │  │  (between all groups) │  │  (RAG pipeline)  │
└──────────┬──────────┘  └──────────┬───────────┘  └────────┬─────────┘
           │                        │                        │
           ▼                        ▼                        ▼
┌──────────────────────────────────────────────────────────────────────┐
│                         OUTPUT LAYER                                 │
├───────────────┬───────────────────┬──────────────┬──────────────────┤
│  Telegram     │  Next.js          │  Jira        │  Proactive       │
│  Bot Replies  │  Dashboard        │  Tickets     │  Alerts          │
│  (/ask, etc.) │  (Knowledge Graph,│  (Auto-raise │  (Patterns,      │
│               │   Timeline, Maps) │  + Manual)   │   Cross-group)   │
└───────────────┴───────────────────┴──────────────┴──────────────────┘
```

---

## The Adaptive Lens System

When ThirdEye joins a group, the **Context Detector Agent** reads the first 50–100 messages and auto-classifies the group into one of four lenses (dev, product, client, community). Users can override the result at any time with `/lens set dev`, or combine lenses with `/lens set dev+product`.

```
                 ┌─────────────────────────────────────────┐
                 │          Incoming Group Messages         │
                 └────────────────────┬────────────────────┘
                                      │
                                      ▼
                      ┌───────────────────────────┐
                      │   Context Detector Agent   │
                      │   (SambaNova 405B)         │
                      │   Reads first 50-100 msgs  │
                      └───────────────────────────┘
                            │     │     │     │
              ┌─────────────┘     │     │     └─────────────────┐
              ▼                   ▼     ▼                       ▼
       ┌──────────┐        ┌──────────┐ ┌──────────┐    ┌──────────────┐
       │ DevLens  │        │ Product  │ │  Client  │    │  Community   │
       │ (dev)    │        │ Lens     │ │  Lens    │    │  Lens        │
       │          │        │(product) │ │ (client) │    │ (community)  │
       │ Tech debt│        │ Features │ │ Promises │    │ Recommend-   │
       │ Arch dec │        │ Pain pts │ │ Sentiment│    │ ations,      │
       │ Bugs     │        │ Roadmap  │ │ Scope    │    │ Events,      │
       │ Silos    │        │ Metrics  │ │ Escalat. │    │ Local info   │
       └──────────┘        └──────────┘ └──────────┘    └──────────────┘
```

---

## Cross-Group Intelligence

This is the **killer feature**. When ThirdEye monitors 2+ groups, the Cross-Group Analyst Agent activates and surfaces gaps that no single team can see from inside its own chat:

### What It Detects

**Blocked Handoffs**
```
Dev Team:     "Still waiting for design specs" (mentioned 3× this week)
Design Team:  No mention of this deliverable anywhere
→ ALERT: "Likely handoff gap between dev and design teams."
```

**Conflicting Decisions**
```
Product Team: Decided "mobile-first approach" on Mar 15
Dev Team:     Discussing "API-first priority" on Mar 17
→ ALERT: "Potential misalignment between product and dev priorities."
```

**Information Silos**
```
Client Channel: "New regulatory requirement mentioned" on Mar 12
Dev + Product:  No mention of this requirement in either group
→ ALERT: "Critical client information not flowing to internal teams."
```

**Promise-Reality Gaps**
```
Product Team: "We'll have the dashboard ready by Friday for the client"
Dev Team:     "Dashboard completely blocked. No design specs. Week 2 of waiting."
→ ALERT: "Promise to client may not be deliverable. Timeline risk."
```

**Duplicated Effort**
```
Dev Team:     Building an internal analytics dashboard
Design Team:  Simultaneously prototyping a similar dashboard
→ ALERT: "Possible duplicated effort across teams."
```
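
To make the "blocked handoff" idea concrete, here is a deliberately simplified sketch. The real Cross-Group Analyst is described above as an LLM agent working over ChromaDB signals; this stand-in only counts topic mentions, and `find_one_sided_topics` plus its input shape are assumptions made for illustration.

```python
from collections import Counter


def find_one_sided_topics(
    mentions_by_group: dict[str, list[str]],
    min_mentions: int = 3,
) -> list[tuple[str, str, int]]:
    """Return (topic, group, count) for topics one group keeps raising
    while every other group never mentions them."""
    counts = {group: Counter(topics) for group, topics in mentions_by_group.items()}
    findings = []
    for group, counter in counts.items():
        for topic, n in counter.items():
            if n < min_mentions:
                continue
            others = [g for g in counts if g != group]
            # Counter returns 0 for topics a group never mentioned.
            if others and all(counts[g][topic] == 0 for g in others):
                findings.append((topic, group, n))
    return sorted(findings)


example = {
    "dev": ["design specs", "design specs", "design specs", "redis"],
    "design": ["color palette"],
    "product": ["redis"],
}
print(find_one_sided_topics(example))
# [('design specs', 'dev', 3)]
```

A finding such as `('design specs', 'dev', 3)` corresponds to the first alert above: the dev group has raised the topic three times and no other group has mentioned it.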

---

## Agent Pipeline

ThirdEye uses **8 specialized AI agents**, each mapped to the optimal LLM for its task:

```
Message arrives from ANY Telegram group
         │
         ▼
┌─────────────────────────────┐
│  1. Context Detector Agent  │  Auto-classifies group type (one-time + periodic)
│     (SambaNova 405B)        │  Activates appropriate lens ontology
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  2. Signal Extractor Agent  │  Processes batched messages (every 5-10 msgs)
│     (Groq Llama 3.3 70B)   │  Extracts domain-specific signals per active lens
└─────────────┬───────────────┘
              │
         ┌────┴────┐
         ▼         ▼
┌──────────────┐  ┌──────────────────────┐
│  3. Classif. │  │  4. Entity           │
│     Agent    │  │     Resolver Agent   │
│  (Groq 8B)   │  │  (Groq 70B)          │
│  Tags type,  │  │  Resolves "this bug",│
│  sentiment,  │  │  "same issue" refs   │
│  urgency     │  │  across messages     │
└──────┬───────┘  └────────┬─────────────┘
       └────────┬───────────┘
                ▼
┌─────────────────────────────┐
│  5. Knowledge Architect     │  Builds + maintains ChromaDB knowledge graph
│     Agent                   │  Resolves conflicts, tracks temporal changes
│  (SambaNova 405B /          │  Links entities across conversations
│   OpenRouter Nemotron 120B) │
└─────────────┬───────────────┘
              │
         ┌────┴────┐
         ▼         ▼
┌──────────────┐  ┌──────────────────┐
│  6. Pattern  │  │  7. Cross-Group   │  ⭐ THE KILLER AGENT
│     Detector │  │     Analyst Agent │  Compares intelligence across
│     Agent    │  │  (SambaNova 405B) │  ALL deployed groups
│  (OpenRouter │  │                   │  Detects blind spots, conflicts,
│   Nemotron)  │  │                   │  information silos
└──────┬───────┘  └────────┬──────────┘
       └────────┬───────────┘
                ▼
┌─────────────────────────────┐
│  8. Query Agent             │  Handles /ask queries via agentic RAG
│  (Groq 70B — fast response) │  over ChromaDB + web search fallback
└─────────────────────────────┘
```

---

## LLM Provider Stack

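ThirdEye routes LLM calls across **5 free providers** with automatic fallback, so API costs stay at zero. Cohere also appears in the table below, but as the embeddings and rerank provider rather than as part of the fallback chain: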

```
Groq (fastest) → Cerebras (fast) → SambaNova (405B) → OpenRouter (27 models) → Google AI Studio
```

| Provider | Key Models | Daily Capacity | Cost |
|---|---|---|---|
| **Groq** | Llama 3.3 70B, Llama 3.1 8B, **Whisper v3** | ~6,000 RPD | Free |
| **Cerebras** | Llama 3.3 70B, Llama 3.1 8B | ~2,000 RPD | Free |
| **SambaNova** | **Llama 3.1 405B**, Llama 3.3 70B | ~1,000 RPD | Free |
| **OpenRouter** | Nemotron 120B, GPT-OSS 120B, 25+ models | ~5,400 RPD | Free |
| **Cohere** | embed-english-v3.0 (embeddings), Rerank 3.5 | 1,000/month | Free |
| **Google AI Studio** | Gemini 2.5 Flash/Pro | ~50 RPD | Free |

**Combined daily capacity: ~10,000–15,000 requests for $0**
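
As a quick sanity check on that figure, the approximate daily limits of the five LLM providers in the table can be summed directly. Cohere is left out because its quota is monthly and covers embeddings, not chat completions.

```python
# Approximate requests per day, taken from the table above.
DAILY_REQUESTS = {
    "Groq": 6_000,
    "Cerebras": 2_000,
    "SambaNova": 1_000,
    "OpenRouter": 5_400,
    "Google AI Studio": 50,
}

total = sum(DAILY_REQUESTS.values())
print(total)  # 14450
```

The total sits near the top of the quoted 10,000–15,000 range; actual usable capacity will be lower whenever a provider throttles or changes its free tier.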

### Agent → Model Mapping

| Agent | Task Type | Primary | Fallback |
|---|---|---|---|
| Context Detector | Complex reasoning | SambaNova / Llama 3.1 405B | OpenRouter / Nemotron 120B |
| Signal Extractor | Structured extraction | Groq / Llama 3.3 70B | Cerebras / Llama 3.3 70B |
| Classifier | Fast classification | Groq / Llama 3.1 8B | Cerebras / Llama 3.1 8B |
| Pattern Detector | Trend analysis | OpenRouter / Nemotron 120B | SambaNova / 405B |
| Cross-Group Analyst | Cross-referencing | SambaNova / Llama 3.1 405B | OpenRouter / GPT-OSS 120B |
| Query Agent | Fast Q&A | Groq / Llama 3.3 70B | Cerebras / Llama 3.3 70B |
| Voice Transcriber | Audio-to-text | Groq / Whisper Large v3 | — |
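
The primary/fallback columns above boil down to "try providers in order until one answers". A minimal sketch of that pattern follows; it is not the code in `providers.py` (which this README says is built on the openai SDK). `call_with_fallback` and `AllProvidersFailed` are hypothetical names, and each provider is modelled as a plain callable so the example runs without API keys.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has errored."""


def call_with_fallback(
    providers: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
) -> tuple[str, str]:
    """Try each (name, call) pair in order; return (provider_name, response)."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # rate limit, timeout, bad response, ...
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers standing in for real API clients.
def rate_limited(prompt: str) -> str:
    raise RuntimeError("429 rate limit")


def healthy(prompt: str) -> str:
    return f"echo: {prompt}"


print(call_with_fallback([("groq", rate_limited), ("cerebras", healthy)], "hi"))
# ('cerebras', 'echo: hi')
```

Returning the provider name alongside the response makes it easy to log which tier actually served a request, which helps when tuning the chain order.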

---

## Integrations

### Telegram Bot

The bot is the primary input interface. It operates **passively** — no need to tag it or use commands for intelligence extraction to happen.

#### Commands

```
/start           — Welcome message and command reference
/ask [question]  — Query the knowledge base with natural language
/search [query]  — Explicit web search via Tavily
/digest          — Get an intelligence summary of the group
/lens [mode]     — Set or check detection mode (dev/product/client/community)
/flush           — Force-process the current message buffer immediately
/debug           — Show group config and signal count

/jira            — Show Jira integration status
/jirastatus      — Get latest status of a ticket (e.g. /jirastatus ENG-42)
/jirasearch      — Search Jira tickets by keyword
/jiraraised      — List all tickets raised by ThirdEye for this group
/jirawatch       — Watch a signal and auto-raise if it recurs

/voicelog        — List all voice note decisions extracted from this group
/meetsum         — Summarize the most recent Google Meet meeting
/meetask [q]     — Ask a question about meeting content
/meetmatch       — Find connections between meeting content and Telegram discussions
```

#### Passive Capabilities (no command needed)
- **Text messages** → batched every 5–10 messages → signal extraction pipeline
- **Documents** (PDF, DOCX, TXT, MD, CSV, JSON, LOG up to 10MB) → auto-ingested into knowledge graph
- **Voice notes** (OGG) and **video notes** (MP4) → transcribed by Groq Whisper → signals extracted
- **URLs in messages** → fetched → summarized → stored as `link_knowledge` signals

---

### Google Meet Chrome Extension

A Chrome extension that captures live Google Meet transcripts and feeds them into ThirdEye in real-time.

**How it works:**
1. Extension uses the browser's **Web Speech API** (free, no API key) for real-time transcription
2. Every 30 seconds, a transcript chunk is sent to the ThirdEye FastAPI backend
3. The backend's `MeetIngestor` agent extracts meeting signals (decisions, action items, blockers, risks)
4. All signals are stored in ChromaDB alongside your Telegram/document/voice signals
5. After the meeting, `/meetsum` and `/meetask` in Telegram give instant access to meeting intelligence
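
The 30-second chunking in step 2 happens inside the extension, but the grouping logic is easy to show in Python. This is a sketch under assumptions: `chunk_transcript` is a hypothetical helper, and the input format (seconds since the meeting started, paired with recognised text) is invented for the example.

```python
def chunk_transcript(
    segments: list[tuple[float, str]],
    window_seconds: float = 30.0,
) -> list[str]:
    """Group (offset_seconds, text) segments into fixed time windows.

    Windows with no speech are skipped, so the result only contains
    chunks worth sending to the backend.
    """
    windows: dict[int, list[str]] = {}
    for offset, text in segments:
        index = int(offset // window_seconds)
        windows.setdefault(index, []).append(text)
    return [" ".join(windows[i]) for i in sorted(windows)]


segments = [
    (0.0, "Let's start with the release."),
    (12.5, "We ship on Friday."),
    (31.0, "QA is still blocked."),
    (95.0, "Action item for Raj."),
]
print(chunk_transcript(segments))
# ["Let's start with the release. We ship on Friday.", 'QA is still blocked.', 'Action item for Raj.']
```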

**Signal types extracted from meetings:**
- `meet_decision` — Formal decisions made during the call
- `meet_action_item` — "X will do Y by Z" commitments
- `meet_blocker` — Blockers raised during discussion
- `meet_risk` — Risks identified in conversation
- `meet_summary` — Full AI-generated meeting summary

**Cross-referencing:** `/meetmatch` connects meeting content to your Telegram history — surfacing prior discussions, related signals, and any contradictions between what was decided in the meeting vs. what's in the group chat.

---

### Jira Integration

ThirdEye connects to Atlassian Jira Cloud to convert extracted signals into actionable tickets.

#### How Signals Become Tickets

```
Signal in ChromaDB ──► Jira Agent ──► LLM generates well-formed ticket
                                               │
                              ┌────────────────┼────────────────────┐
                              ▼                ▼                    ▼
                         Issue Type       Priority           Description
                         (Bug/Task/       (based on         (summary +
                          Story)          severity)          raw quote +
                                                             entity context)
                                               │
                                               ▼
                                      Jira REST API v3
                                               │
                                               ▼
                              ┌────────────────┴────────────────┐
                              ▼                                  ▼
                     Ticket created                   `jira_raised` tracking
                     in Atlassian                     signal stored in
                     Cloud                            ChromaDB (prevents
                                                      duplicates)
```
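
For reference, this is roughly what a create-issue request body looks like for Jira Cloud REST API v3, which expects the description in Atlassian Document Format. The sketch only builds the payload and performs no network call. `build_jira_issue_payload` is a hypothetical helper, not the code in `jira_client.py`; the `ENG` and `Task` defaults mirror `JIRA_DEFAULT_PROJECT` and `JIRA_DEFAULT_ISSUE_TYPE` from the environment section.

```python
def build_jira_issue_payload(
    summary: str,
    description: str,
    raw_quote: str,
    project_key: str = "ENG",
    issue_type: str = "Task",
    priority: str = "High",
) -> dict:
    """Build the JSON body for POST /rest/api/3/issue."""

    def paragraph(text: str) -> dict:
        # One Atlassian Document Format paragraph node.
        return {"type": "paragraph", "content": [{"type": "text", "text": text}]}

    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": issue_type},
            # The priority field must be enabled for the project's issue screen.
            "priority": {"name": priority},
            # Truncate defensively; Jira rejects summaries longer than 255 characters.
            "summary": summary[:255],
            "description": {
                "type": "doc",
                "version": 1,
                "content": [
                    paragraph(description),
                    paragraph(f'Raw quote: "{raw_quote}"'),
                ],
            },
        }
    }
```

Keeping payload construction separate from the HTTP call makes the LLM-generated ticket easy to inspect or unit-test before anything is sent to Jira.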

#### Commands
```bash
/ask raise jira on [topic]               # Semantic search → bulk raise
/ask assign jira ticket for [X] to [Name] # Raise + assign in one command
/jirastatus ENG-42                        # Live ticket status from Jira
/jiraraised                               # List all tickets raised by ThirdEye
```

#### Auto-Raise Mode
When `JIRA_AUTO_RAISE=true`, ThirdEye automatically raises Jira tickets when:
- A signal with severity ≥ `JIRA_AUTO_RAISE_SEVERITY` is extracted
- A recurring bug pattern is detected (3+ occurrences)
- A critical blocker is identified in a meeting
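
The auto-raise rules can be expressed as a single predicate. The sketch below is an assumption-laden illustration: the severity scale (`low` < `medium` < `high` < `critical`), the signal dictionary shape, and the name `should_auto_raise` are not taken from the codebase.

```python
# Assumed severity scale, lowest to highest.
SEVERITY_ORDER = ("low", "medium", "high", "critical")


def should_auto_raise(
    signal: dict,
    *,
    enabled: bool,
    min_severity: str = "high",
    occurrences: int = 1,
) -> bool:
    """Decide whether a signal should become a Jira ticket automatically.

    enabled      -> value of JIRA_AUTO_RAISE
    min_severity -> value of JIRA_AUTO_RAISE_SEVERITY
    occurrences  -> how many times this pattern has been seen so far
    """
    if not enabled:
        return False
    # Recurring bug pattern: 3+ occurrences, regardless of severity.
    if signal.get("type") == "recurring_bug" and occurrences >= 3:
        return True
    severity = signal.get("severity")
    if severity not in SEVERITY_ORDER:
        return False
    # Severity threshold. A critical meeting blocker is covered here as long
    # as it is stored with severity "critical" (assumption).
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(min_severity)
```

With the default threshold of `high`, a `medium` tech-debt signal stays in the knowledge base only, while a `high` or `critical` one is raised.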

---

### Voice Message Intelligence

**The differentiator:** Most chat intelligence products are text-only. ThirdEye also processes voice and video notes, so decisions made out loud are captured like any other message.

**How it works:**
```
Voice/Video Note Received
         │
         ▼
Download from Telegram (OGG/MP4)
         │
         ▼
Groq Whisper Large v3 Transcription
(7,200 audio-seconds/hour free tier, i.e. ~240 voice notes/hour at ~30 s each)
         │
         ▼
Transcript → Signal Extraction Pipeline
(same as text messages)
         │
         ▼
Signals stored with full voice attribution:
- speaker: @username
- voice_duration: seconds
- voice_language: auto-detected
- voice_file_id: Telegram file reference
         │
         ▼
/voicelog shows all voice-sourced decisions
/ask cites voice notes: "Based on what @Raj said in a voice note on March 14th..."
```

**`/voicelog` examples:**
```
/voicelog                    → All voice note signals
/voicelog decisions          → Only decisions from voice
/voicelog @raj               → All voice notes from Raj
/voicelog api                → Voice notes mentioning "api"
```
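
The four `/voicelog` forms above map naturally onto a small argument parser. This is a sketch only; `parse_voicelog_filter` and the returned dictionary shape are hypothetical and not the implementation in `bot/commands.py`.

```python
def parse_voicelog_filter(args: str) -> dict[str, str]:
    """Turn the text after /voicelog into a filter description."""
    arg = args.strip()
    if not arg:
        return {"mode": "all"}
    if arg.startswith("@"):
        # "@raj" -> filter by speaker, without the leading "@"
        return {"mode": "speaker", "value": arg[1:].lower()}
    if arg.lower() == "decisions":
        return {"mode": "type", "value": "decision"}
    # Anything else is treated as a keyword search.
    return {"mode": "keyword", "value": arg.lower()}
```

Checking the `@` prefix before the keyword fallback matters: otherwise `/voicelog @raj` would be searched as the literal text "@raj" instead of being matched against the speaker attribution.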

---

### Document & Link Ingestion

**Document Types Supported:** PDF, DOCX, TXT, MD, CSV, JSON, LOG (up to 10MB)

**Processing Pipeline:**
```
File shared in Telegram group
         │
         ▼
Bot downloads to temp directory
         │
         ▼
Text extraction (PyPDF2 / python-docx / plain text)
         │
         ▼
Intelligent chunking (1,500 char chunks, 200 char overlap,
paragraph boundaries respected)
         │
         ▼
Each chunk stored as `document_knowledge` signal in ChromaDB
with metadata: filename, page number, shared_by, timestamp
         │
         ▼
Queryable alongside chat signals via /ask
```
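
The chunking step can be sketched as follows. This is one possible reading of "1,500 char chunks, 200 char overlap, paragraph boundaries respected", not the code in `document_ingestor.py`; `chunk_text` is a hypothetical name, and treating a blank line (`\n\n`) as the paragraph boundary is an assumption.

```python
def chunk_text(text: str, chunk_size: int = 1500, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks, preferring paragraph breaks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks: list[str] = []
    start, n = 0, len(text)
    while start < n:
        end = min(start + chunk_size, n)
        if end < n:
            # Prefer the last blank line inside the window. Searching only
            # past start + overlap guarantees the next start moves forward.
            boundary = text.rfind("\n\n", start + overlap + 1, end)
            if boundary != -1:
                end = boundary
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= n:
            break
        # Step back so consecutive chunks share `overlap` characters.
        start = end - overlap
    return chunks
```

The overlap means a sentence that straddles a chunk border still appears whole in at least one chunk, which tends to help retrieval when `/ask` searches the stored `document_knowledge` signals.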

**URL Processing:**
When a URL is detected in any message, ThirdEye:
1. Checks against skip-list (images, downloads, social embeds, Telegram links)
2. Fetches the page content with BeautifulSoup parsing
3. Summarizes with an LLM (2-4 sentence summary)
4. Stores as `link_knowledge` signal with source attribution
5. Runs all of the above in the background, so the main message pipeline is not delayed
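
Step 1 (URL detection plus skip-list) can be sketched with the standard library alone. The regular expression, the extension list, and the host list below are illustrative assumptions and cover only part of the skip-list described above (social embeds are not handled); `extract_fetchable_urls` is a hypothetical name, not the function in `link_fetcher.py`.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")
# Illustrative skip-list: images, downloads, and Telegram links.
SKIP_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".zip", ".exe", ".dmg", ".mp4")
SKIP_HOSTS = ("t.me", "telegram.me")


def extract_fetchable_urls(message: str) -> list[str]:
    """Return unique URLs from a message that are worth fetching."""
    urls: list[str] = []
    for raw in URL_PATTERN.findall(message):
        url = raw.rstrip(".,;:!?")  # drop trailing sentence punctuation
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if host in SKIP_HOSTS or host.endswith(tuple("." + h for h in SKIP_HOSTS)):
            continue
        if parsed.path.lower().endswith(SKIP_EXTENSIONS):
            continue
        if url not in urls:
            urls.append(url)
    return urls


message = (
    "See https://example.com/docs, the logo is at "
    "https://example.com/logo.png and the group is https://t.me/somegroup"
)
print(extract_fetchable_urls(message))
# ['https://example.com/docs']
```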

---

## Knowledge Base Dashboard

A Next.js dashboard (located in `dashboard/`) that provides a full visual intelligence console.

### Pages

| Page | Description |
|---|---|
| **Knowledge Base** | Interactive network graph showing groups, signal clusters, and real-time data flow. Split-panel with Knowledge Browser for topic/date exploration |
| **Timeline** | Cross-group signal timeline with full filter support (severity, lens, type, date range) |
| **Patterns** | Pattern detection results with severity badges and recommendations |
| **Cross-Group** | Cross-group intelligence insights — the blind spot detector |
| **Jira** | Jira ticket manager — view, create, and track tickets raised by ThirdEye |
| **Meetings** | Google Meet session list with per-meeting signal breakdown and transcript viewer |

### Knowledge Base Network Map

The central visualization shows your intelligence network as a live graph:
- **Core Node** (violet diamond) — the ThirdEye Neural Engine
- **Group Nodes** (colored circles) — each monitored Telegram group, color-coded by lens type
- **Signal Satellites** — top 3 signal types per group orbiting each group node
- **Data Flow Lines** — animated neon traveling packets showing live data flow from groups to the core engine
- **Cross-edges** — connections between adjacent groups when cross-group intelligence is detected

### Knowledge Browser

A structured topic/date reference panel built alongside the graph:
- **Group selector** — browse any monitored group
- **Date range filter** — scope the view to specific time periods
- **AI-clustered topic chips** — topics derived from keyword frequency across all signals
- **Day-by-day timeline** — vertical scroll through days, each signal expandable to show raw quote, entities, and keywords
- **Footer stats** — total signals · topics count · days span

---

## Tech Stack

### Backend
| Component | Technology |
|---|---|
| Bot Framework | python-telegram-bot 21.9 |
| API Server | FastAPI 0.115 + Uvicorn |
| Vector Database | ChromaDB 1.0 (persistent, local) |
| Embeddings | Cohere embed-english-v3.0 (primary) + sentence-transformers all-MiniLM-L6-v2 (fallback) |
| LLM Routing | Custom multi-provider router (openai SDK) |
| Audio Transcription | Groq Whisper Large v3 |
| Web Search | Tavily (1,000 searches/month free) |
| HTTP Client | httpx (async) |
| HTML Parsing | BeautifulSoup4 |
| Document Parsing | PyPDF2 + python-docx |
| Data Validation | Pydantic v2 |

### Frontend
| Component | Technology |
|---|---|
| Framework | Next.js 15 (App Router) |
| UI | Tailwind CSS |
| Graph Visualization | Custom SVG with CSS animations |
| State | React hooks (no external state library) |
| API Client | Native fetch (proxied via Next.js rewrites) |

---

## Project Structure

```
binary_hack/
│
├── backend/                          # Python backend
│   ├── api/
│   │   └── routes.py                 # FastAPI endpoints
│   │
│   ├── agents/
│   │   ├── signal_extractor.py       # Core signal extraction (lens-aware)
│   │   ├── classifier.py             # Sentiment, urgency, keyword tagging
│   │   ├── context_detector.py       # Auto-detect group type from messages
│   │   ├── query_agent.py            # RAG pipeline context formatter
│   │   ├── pattern_detector.py       # Detect recurring patterns in signals
│   │   ├── cross_group_analyst.py    # Cross-group blind spot detection
│   │   ├── jira_agent.py             # Jira ticket generation + bulk raise
│   │   ├── document_ingestor.py      # PDF/DOCX/TXT extraction + chunking
│   │   ├── voice_handler.py          # Voice download + pipeline orchestration
│   │   ├── voice_transcriber.py      # Groq Whisper transcription
│   │   ├── meet_ingestor.py          # Google Meet transcript chunk processing
│   │   ├── meet_cross_ref.py         # Meeting ↔ chat signal cross-referencing
│   │   ├── web_search.py             # Tavily web search integration
│   │   ├── link_fetcher.py           # URL extraction, fetch, summarize
│   │   └── json_utils.py             # Robust JSON extraction from LLM outputs
│   │
│   ├── bot/
│   │   ├── bot.py                    # Telegram bot entry point + all handlers
│   │   └── commands.py               # /voicelog command
│   │
│   ├── db/
│   │   ├── chroma.py                 # ChromaDB CRUD operations
│   │   ├── embeddings.py             # Cohere + fallback embedding client
│   │   └── models.py                 # Pydantic models: Signal, Pattern, etc.
│   │
│   ├── integrations/
│   │   └── jira_client.py            # Atlassian REST API v3 client
│   │
│   ├── pipeline.py                   # Core orchestration: batch → signals → store → query
│   ├── providers.py                  # Multi-provider LLM router with fallback
│   └── config.py                     # All env vars via python-dotenv
│
├── dashboard/                        # Next.js frontend
│   ├── app/
│   │   ├── knowledge-base/           # Network graph + Knowledge Browser
│   │   │   ├── page.tsx
│   │   │   ├── NetworkMap.tsx        # SVG network visualization with animated edges
│   │   │   ├── KnowledgeBrowser.tsx  # Topic/date knowledge reference panel
│   │   │   ├── RightPanelTabs.tsx    # Tab switcher: Knowledge ↔ Query
│   │   │   ├── EntityPanel.tsx       # Deep query panel
│   │   │   ├── FloatingControls.tsx  # Graph controls HUD
│   │   │   ├── SystemTickerKnowledge.tsx
│   │   │   └── knowledge.css
│   │   │
│   │   ├── timeline/                 # Cross-group signal timeline
│   │   ├── jira/                     # Jira ticket management
│   │   ├── meetings/                 # Google Meet session viewer
│   │   ├── components/               # Shared UI components
│   │   └── lib/
│   │       └── api.ts                # Typed API client for all endpoints
│   │
│   └── next.config.ts                # Proxies /api → localhost:8000
│
├── chroma_db/                        # ChromaDB persistent storage (git-ignored)
├── docs/                             # Architecture docs and implementation guides
│   ├── sot.md                        # Single Source of Truth (full system design)
│   ├── implementation.md             # Milestone-by-milestone build guide
│   ├── additional.md                 # Milestones 11-13 (docs, web search, links)
│   ├── Voice_milestones.md           # Milestones 20-22 (voice intelligence)
│   ├── jira_milestones.md            # Milestones 17-19 (Jira integration)
│   ├── meet_extension.md             # Milestones 14-16 (Google Meet)
│   └── BATCHING_CONFIG.md            # Message batching configuration guide
│
├── .env                              # Environment variables (not in git)
├── requirements.txt                  # Python dependencies
└── README.md                         # This file
```

---

## Setup & Installation

### Prerequisites

- Python 3.11+
- Node.js 18+
- A Telegram account + bot token (from [@BotFather](https://t.me/BotFather))
- Free API keys (see below — all free, no credit card)

### Step 1 — Clone & install Python dependencies

```bash
git clone <repo-url>
cd binary_hack
pip install -r requirements.txt
```

### Step 2 — Install dashboard dependencies

```bash
cd dashboard
npm install
```

### Step 3 — Get API keys (all free)

Open each URL in its own browser tab and create a key for each service:

| Service | URL | What you need |
|---|---|---|
| **Groq** | https://console.groq.com | API Key → used for Llama 3.3 70B + Whisper |
| **Cerebras** | https://cloud.cerebras.ai | API Key → fallback LLM provider |
| **SambaNova** | https://cloud.sambanova.ai | API Key → Llama 3.1 405B (complex reasoning) |
| **OpenRouter** | https://openrouter.ai | API Key → 27 free models, ultimate fallback |
| **Cohere** | https://dashboard.cohere.com | API Key → embeddings for vector search |
| **Telegram Bot** | [@BotFather](https://t.me/BotFather) | `/newbot` → bot token |
| **Tavily** | https://tavily.com | API Key → web search (1,000/month free) |
| **Google AI Studio** | https://aistudio.google.com | API Key → Gemini (last resort fallback) |

### Step 4 — Configure environment

Copy `.env.example` to `.env` and fill in your keys:

```bash
cp .env.example .env
# Edit .env with your keys
```

### Step 5 — Disable Telegram privacy mode (CRITICAL)

By default, Telegram bots run in privacy mode and do not receive ordinary group messages. Disable it so the bot can read them:
1. Open Telegram → Search **@BotFather**
2. Send `/setprivacy`
3. Select your bot
4. Choose **Disable**

### Step 6 — Seed demo data (optional)

To populate ChromaDB with realistic demo data across 3 simulated groups (Dev team, Product team, Client channel):

```bash
python scripts/seed_demo.py
```

This seeds ~30 signals across 3 groups and runs cross-group analysis to demonstrate the blind spot detection.

---

## Running the System

### Option A — Run everything together (recommended)

```bash
python run_all.py
```

This starts the Telegram bot and the FastAPI server together. The Next.js dashboard is not part of this script; start it with `npm run dev` as shown in Option B.

### Option B — Run services separately

```bash
# Terminal 1: FastAPI backend
python run_api.py
# Runs at http://localhost:8000

# Terminal 2: Telegram bot
python run_bot.py

# Terminal 3: Next.js dashboard
cd dashboard
npm run dev
# Runs at http://localhost:3000
```

### Verify the system is running

```bash
curl http://localhost:8000/health
# → {"status":"ok","service":"thirdeye"}

curl http://localhost:8000/api/groups
# → {"groups": [...]}
```

---

## Environment Variables

```bash
# ── Telegram ──────────────────────────────────────────────────────
TELEGRAM_BOT_TOKEN=          # From @BotFather

# ── LLM Providers (all free, no credit card) ──────────────────────
GROQ_API_KEY=                # console.groq.com — Llama 3.3 70B + Whisper
GROQ_API_KEY_2=              # Optional: second key for rate limit rotation
GROQ_API_KEY_3=              # Optional: third key for rate limit rotation
CEREBRAS_API_KEY=            # cloud.cerebras.ai — fast inference
SAMBANOVA_API_KEY=           # cloud.sambanova.ai — 405B reasoning
OPENROUTER_API_KEY=          # openrouter.ai — 27 free models
GEMINI_API_KEY=              # aistudio.google.com — last resort fallback

# ── Embeddings ────────────────────────────────────────────────────
COHERE_API_KEY=              # dashboard.cohere.com — vector embeddings

# ── Storage ───────────────────────────────────────────────────────
CHROMA_DB_PATH=./chroma_db   # Where ChromaDB persists to disk

# ── Message Batching ──────────────────────────────────────────────
BATCH_SIZE=5                 # Messages to buffer before processing
BATCH_TIMEOUT_SECONDS=60     # Flush buffer after this many idle seconds

# ── Web Search ────────────────────────────────────────────────────
TAVILY_API_KEY=              # tavily.com — web search fallback (1,000/month free)

# ── Google Meet Extension ─────────────────────────────────────────
MEET_INGEST_SECRET=          # Shared secret for extension authentication
MEET_DEFAULT_GROUP_ID=meet_sessions
ENABLE_MEET_INGESTION=true
MEET_CROSS_REF_GROUPS=       # Comma-separated group IDs to cross-reference

# ── Jira Integration ──────────────────────────────────────────────
JIRA_BASE_URL=               # https://your-org.atlassian.net
JIRA_EMAIL=                  # Your Atlassian email
JIRA_API_TOKEN=              # Atlassian API token (not your password)
JIRA_DEFAULT_PROJECT=ENG     # Default project key
JIRA_DEFAULT_ISSUE_TYPE=Task
ENABLE_JIRA=true
JIRA_AUTO_RAISE=false        # Auto-raise critical signals as Jira tickets
JIRA_AUTO_RAISE_SEVERITY=high

# ── Voice Message Intelligence ────────────────────────────────────
ENABLE_VOICE_TRANSCRIPTION=true
VOICE_MAX_DURATION_SECONDS=300   # Skip voice notes longer than 5 minutes
VOICE_MIN_DURATION_SECONDS=2     # Skip sub-2-second recordings
VOICE_LANGUAGE=                  # Leave empty for Whisper auto-detect
VOICE_STORE_TRANSCRIPT=true      # Store raw transcript in ChromaDB
```
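
To make the two batching settings concrete: `BATCH_SIZE` and `BATCH_TIMEOUT_SECONDS` describe a buffer that flushes either when it is full or when the chat has gone quiet. The class below is an illustrative sketch of that behaviour, not ThirdEye's actual implementation; the class name, method names, and injectable clock are all assumptions.

```python
import time


class MessageBuffer:
    """Buffers messages and flushes on size or idle timeout (illustrative only)."""

    def __init__(self, batch_size=5, timeout_seconds=60, clock=time.monotonic):
        self.batch_size = batch_size
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.messages = []
        self.last_message_at = None

    def add(self, message):
        """Add a message; return the full batch if the size limit was reached."""
        self.messages.append(message)
        self.last_message_at = self.clock()
        if len(self.messages) >= self.batch_size:
            return self.flush()
        return None

    def poll(self):
        """Return the pending batch if the buffer has been idle long enough."""
        if not self.messages:
            return None
        if self.clock() - self.last_message_at >= self.timeout_seconds:
            return self.flush()
        return None

    def flush(self):
        """Hand back everything buffered so far and start a new batch."""
        batch, self.messages = self.messages, []
        return batch
```

Under this reading, a larger `BATCH_SIZE` means fewer LLM calls per conversation but a longer delay before signals appear, while `BATCH_TIMEOUT_SECONDS` bounds that delay in quiet groups.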

---

## API Reference

All endpoints are served by FastAPI at `http://localhost:8000`. The Next.js dashboard proxies `/api/*` to this backend.

### Groups & Signals

```
GET  /api/groups
     → List all monitored groups with signal counts and lens

GET  /api/groups/{group_id}/signals
     ?signal_type=&severity=&lens=&date_from=&date_to=
     → Get signals with filters

POST /api/groups/{group_id}/query
     Body: {"question": "..."}
     → Natural language RAG query over the group's knowledge base

GET  /api/groups/{group_id}/patterns
     → Run pattern detection and return results

GET  /api/cross-group/insights
     → Run cross-group analysis across all monitored groups

GET  /api/signals/timeline
     ?group_id=&severity=&lens=&signal_type=&date_from=&date_to=&limit=200
     → Cross-group signal timeline, newest first

GET  /health
     → {"status": "ok", "service": "thirdeye"}
```
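
The endpoints above can be called from any HTTP client. The sketch below uses only the Python standard library. The helper names are illustrative, and the `acme_dev` group ID assumes the demo data has been seeded.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def signals_url(group_id: str, **filters) -> str:
    """Build the signals URL, skipping filters that are unset."""
    params = {k: v for k, v in filters.items() if v not in (None, "")}
    url = f"{BASE_URL}/api/groups/{urllib.parse.quote(group_id, safe='')}/signals"
    return f"{url}?{urllib.parse.urlencode(params)}" if params else url


def query_request(group_id: str, question: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a natural language query."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/groups/{urllib.parse.quote(group_id, safe='')}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a URL string or Request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Usage (requires the API server to be running):
#   send(signals_url("acme_dev", severity="high"))
#   send(query_request("acme_dev", "What is blocking the dashboard?"))
```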

### Knowledge Browser

```
GET  /api/knowledge/browse/{group_id}
     ?date_from=&date_to=&topic=
     → Topics (AI-clustered from keywords) + day-by-day timeline
     → Response: {group_id, group_name, total_signals, date_range, topics[], timeline[]}
```

### Google Meet

```
POST /api/meet/start          → Called by Chrome extension when meeting begins
POST /api/meet/ingest         → Called every 30s with transcript chunk
GET  /api/meet/meetings       → List all recorded meetings
GET  /api/meet/meetings/{id}  → Meeting detail with signal breakdown
GET  /api/meet/meetings/{id}/signals   → All signals for a meeting
GET  /api/meet/meetings/{id}/transcript → Raw transcript chunks
```

### Jira

```
GET  /api/jira/tickets        ?group_id=&date_from=&date_to=&live=
POST /api/jira/raise          Body: {signal_id, group_id, project_key, force}
POST /api/jira/create         Body: {summary, description, project_key, priority, ...}
GET  /api/jira/tickets/{key}/status   → Live status from Jira
GET  /api/jira/users/search   ?q=name  → Assignee search
GET  /api/jira/config         → Check Jira configuration
```
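
As an example of the request bodies listed above, the following sketch builds a `POST /api/jira/raise` call. The field types, the default values, and the helper name are assumptions; the response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request


def raise_signal_request(signal_id, group_id, project_key="ENG", force=False,
                         base_url="http://localhost:8000"):
    """Build (but do not send) the POST /api/jira/raise request."""
    payload = {
        "signal_id": signal_id,
        "group_id": group_id,
        "project_key": project_key,
        "force": force,  # Assumed to be a boolean flag.
    }
    return urllib.request.Request(
        f"{base_url}/api/jira/raise",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (requires the API server and a configured Jira integration):
#   urllib.request.urlopen(raise_signal_request("<signal id>", "acme_dev"))
```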

---

## Data Models

### Signal

The core unit of extracted intelligence:

```python
from pydantic import BaseModel


class Signal(BaseModel):
    id: str                   # UUID
    group_id: str             # Telegram group ID
    lens: str                 # dev | product | client | community | meet
    type: str                 # architecture_decision | tech_debt | feature_request | ...
    summary: str              # One-sentence description
    entities: list[str]       # [@people, technologies, concepts]
    severity: str             # low | medium | high | critical
    status: str               # proposed | decided | implemented | unresolved
    sentiment: str            # positive | neutral | negative | urgent
    urgency: str              # none | low | medium | high | critical
    raw_quote: str            # Verbatim message text that triggered this signal
    keywords: list[str]       # 3-5 searchable keywords
    timestamp: str            # ISO 8601
    # Source attribution (speaker and voice_duration apply to voice-sourced signals only)
    source: str               # voice | text | document | link | meet
    speaker: str              # @username
    voice_duration: int       # seconds
    # Jira tracking
    jira_key: str             # e.g. ENG-42
    jira_url: str
```
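
Because `severity` is an ordered scale, a threshold such as `JIRA_AUTO_RAISE_SEVERITY=high` can be read as "this level or above". The sketch below shows that interpretation on plain dictionaries shaped like `Signal`. The function names are illustrative, and the "at or above" reading is an assumption rather than documented behaviour.

```python
# Ordering taken from the comment on Signal.severity.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def meets_threshold(severity: str, threshold: str = "high") -> bool:
    """True if severity is at or above the threshold on the ordered scale."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)


def signals_to_raise(signals: list[dict], threshold: str = "high") -> list[dict]:
    """Select signals at or above the threshold that have no Jira ticket yet."""
    return [
        s for s in signals
        if meets_threshold(s["severity"], threshold) and not s.get("jira_key")
    ]
```

Checking `jira_key` keeps a signal that already has a ticket from being selected twice.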

### Pattern

```python
class Pattern(BaseModel):
    id: str
    group_id: str
    type: str                 # frequency_spike | knowledge_silo | recurring_issue | sentiment_trend | stale_item
    description: str
    severity: str             # info | warning | critical
    evidence_signal_ids: list[str]
    recommendation: str
    detected_at: str
    is_active: bool
```

### CrossGroupInsight

```python
class CrossGroupInsight(BaseModel):
    id: str
    type: str                 # blocked_handoff | conflicting_decision | information_silo | promise_reality_gap | duplicated_effort
    description: str
    group_a: dict             # {name, group_id, evidence}
    group_b: dict
    severity: str             # warning | critical
    recommendation: str
    detected_at: str
    is_resolved: bool
```
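
A consumer of these records will typically want the open, most severe insights first. The following sketch groups unresolved insights by `type` and puts critical ones ahead of warnings. It operates on plain dictionaries shaped like `CrossGroupInsight`, and the function name and ranking table are illustrative assumptions.

```python
from collections import defaultdict

# Lower rank sorts first; values come from the severity comment above.
SEVERITY_RANK = {"critical": 0, "warning": 1}


def triage(insights: list[dict]) -> dict[str, list[dict]]:
    """Group unresolved insights by type, critical ones first within each group."""
    grouped = defaultdict(list)
    open_insights = [i for i in insights if not i.get("is_resolved")]
    # sorted() is stable, so insights of equal severity keep their input order.
    for insight in sorted(open_insights, key=lambda i: SEVERITY_RANK.get(i["severity"], 99)):
        grouped[insight["type"]].append(insight)
    return dict(grouped)
```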

---

## Seeding Demo Data

Run the demo seeder to populate 3 realistic groups with interconnected signals that demonstrate cross-group intelligence:

```bash
python scripts/seed_demo.py
```

This seeds:
- **Acme Dev Team** — 13 messages → signals including recurring timeout bug, knowledge silo (only Raj knows payment), blocked dashboard waiting for design specs
- **Acme Product Team** — 10 messages → priority conflict (mobile vs API), promise to client about Friday dashboard demo, conversion metric drop
- **Acme ↔ ClientCo** — 9 messages → client escalation risk, scope creep (PDF export + dark mode), unacknowledged deadline pressure

**The cross-group analysis will surface:**
- Dev team blocked on design specs → Product team never mentions this
- Product team promised Friday demo → Dev team says "blocked, week 2 of waiting"
- Client escalation risk not visible to dev or product teams

After seeding, verify the results with the FastAPI server running:

```bash
curl http://localhost:8000/api/groups
curl http://localhost:8000/api/cross-group/insights
curl http://localhost:8000/api/knowledge/browse/acme_dev
```

---

## Competitive Positioning

```
                    REACTIVE ◄──────────────────────► PROACTIVE
                         │                               │
               Slack AI  │                               │  ★ ThirdEye
              (summaries │                               │  (persistent intelligence,
               on demand)│                               │   pattern detection,
                         │                               │   cross-group analysis,
                  Prism  │                               │   proactive alerts,
            (product     │                               │   voice + docs + Meet)
             signals)    │                               │
                         │                               │
         SINGLE GROUP ◄──┼───────────────────────────────┼──► CROSS-GROUP
                         │                               │
              Runbear    │                               │
              (Q&A only) │                               │
```

**ThirdEye occupies the only empty quadrant: Proactive + Cross-Group.**

To our knowledge, no other product on the market today passively builds intelligence across multiple team chats simultaneously and detects the blind spots between them, which makes this a new category.

---

*Built at $0 in API costs. Every provider used has a free tier. No credit card required for any dependency.*