init

2026-04-19 12:41:48 +00:00 · 2026-04-05 00:43:23 +05:30
commit 8be37d3e92
425 changed files with 101853 additions and 0 deletions
--- a/thirdeye/README.md
+++ b/thirdeye/README.md
@@ -0,0 +1,996 @@
+# ThirdEye 👁️
+
+> **"Your team already has the answers. They're just trapped in chat."**
+
+ThirdEye is a **multi-agent conversation intelligence engine** that passively monitors Telegram group chats, extracts structured knowledge in real-time, builds a living knowledge graph, detects patterns humans miss, and catches cross-team blind spots before they become crises.
+
+It is not a chatbot. Not a summarizer. Not a search tool. It is a **conversation intelligence engine** — the first of its kind to work across multiple teams simultaneously and detect what falls through the cracks *between* them.
+
+---
+
+## Table of Contents
+
+- [The Problem](#the-problem)
+- [What ThirdEye Does](#what-thirdeye-does)
+- [Key Features](#key-features)
+- [Architecture Overview](#architecture-overview)
+- [The Adaptive Lens System](#the-adaptive-lens-system)
+- [Cross-Group Intelligence](#cross-group-intelligence)
+- [Agent Pipeline](#agent-pipeline)
+- [LLM Provider Stack](#llm-provider-stack)
+- [Integrations](#integrations)
+  - [Telegram Bot](#telegram-bot)
+  - [Google Meet Extension](#google-meet-chrome-extension)
+  - [Jira Integration](#jira-integration)
+  - [Voice Message Intelligence](#voice-message-intelligence)
+  - [Document & Link Ingestion](#document--link-ingestion)
+- [Knowledge Base Dashboard](#knowledge-base-dashboard)
+- [Tech Stack](#tech-stack)
+- [Project Structure](#project-structure)
+- [Setup & Installation](#setup--installation)
+- [Running the System](#running-the-system)
+- [Environment Variables](#environment-variables)
+- [API Reference](#api-reference)
+- [Data Models](#data-models)
+- [Seeding Demo Data](#seeding-demo-data)
+
+---
+
+## The Problem
+
+Every day, your team makes hundreds of decisions, discovers bugs, promises clients things, and solves problems — all in group chat. And every day, that intelligence vanishes as messages scroll past.
+
+**The Knowledge Graveyard Problem:**
+- Teams generate enormous value in group chats: decisions, recommendations, warnings, technical knowledge, client commitments
+- Most of this knowledge is effectively lost within days as messages scroll out of view
+- When someone asks *"why did we choose Redis?"* or *"what did we promise the client?"* — nobody remembers, and nobody can find it
+- When Team A is blocked by Team B, but Team B doesn't know — nobody connects the dots until it's a crisis
+
+**Who feels this pain:**
+- Engineering teams — decisions lost, tech debt invisible, knowledge silos forming
+- Product teams — feature requests scattered, roadmap drift unnoticed, priority conflicts buried
+- Client-facing teams — promises untracked, scope creep uncaught, sentiment shifts missed
+- Any team of 3+ people communicating in chat
+
+---
+
+## What ThirdEye Does
+
+```
+Telegram Groups ──► Signal Extraction ──► ChromaDB Knowledge Graph
+                          │                        │
+                    8 AI Agents               Pattern Detection
+                          │                        │
+                    Cross-Group ◄────── Blind Spot Detection
+                    Intelligence                   │
+                          │                    Dashboard
+                    Proactive Alerts          + Telegram /ask
+```
+
+### Core Capabilities
+
+| Capability | Description |
+|---|---|
+| **Passive Intelligence Extraction** | Silently monitors group messages and extracts structured signals — no commands, no tagging, no forms |
+| **Adaptive Context Detection** | Auto-detects whether a group is a dev team, product team, client channel, or community, then activates the right extraction ontology |
+| **Living Knowledge Graph** | Builds a persistent, queryable knowledge base in ChromaDB that grows smarter over time |
+| **Natural Language Querying** | Any group member can ask questions and get community-sourced answers with source attribution |
+| **Pattern Detection** | Identifies trends: "Tech debt mentions up 300% this sprint." "Only @raj discusses payment — bus factor = 1." |
+| **Proactive Alerts** | Sends Telegram alerts for concerning patterns: overdue promises, sentiment shifts, knowledge silos, recurring bugs |
+| **Cross-Group Intelligence ⭐** | When deployed in multiple groups simultaneously, detects blind spots *between* teams: blocked handoffs, conflicting decisions, information silos, promise-reality gaps |
+
+---
+
+## Key Features
+
+### Signal Types Extracted
+
+**DevLens (Engineering Teams)**
+- `architecture_decision` — Technology choices with rationale, alternatives considered, decision owner
+- `tech_debt` — Shortcuts, hardcoded values, "will fix later" patterns
+- `knowledge_silo_evidence` — Only one person discusses a critical domain (bus factor = 1)
+- `recurring_bug` — Same issue mentioned repeatedly across conversations
+- `stack_decision` — Technology/framework choices (proposed or decided)
+- `deployment_risk` — Risky deployment practices caught in chat
+- `workaround` — Temporary fixes being applied repeatedly
+
+**ProductLens (Product Teams)**
+- `feature_request` — Features users or team members are asking for, with frequency
+- `user_pain_point` — User difficulties, complaints, confusion with evidence
+- `roadmap_drift` — Discussion of topics not on the current plan
+- `priority_conflict` — Team members disagreeing on what's most important
+- `metric_mention` — Specific numbers, conversion rates, performance data
+- `user_quote` — Direct quotes from customers with sentiment
+- `competitor_intel` — Mentions of competitor actions or features
+
+**ClientLens (Agency / Client Channels)**
+- `promise` — Commitments made with deadlines (explicit or implicit)
+- `scope_creep` — Additional requests introduced casually without formal change
+- `sentiment_signal` — Tone changes: growing frustration, formality shifts, positive praise
+- `unanswered_request` — Client questions that haven't received responses (with time elapsed)
+- `escalation_risk` — Mentions of involving management, expressing deadline concerns
+- `client_decision` — Decisions made by the client
+
+**MeetLens (Google Meet)**
+- `meet_decision` — Decisions made during meeting
+- `meet_action_item` — Actions committed to with owner
+- `meet_blocker` — Blockers raised in meeting
+- `meet_risk` — Risks identified during discussion
+- `meet_summary` — AI-generated meeting summary
+
+---
+
+## Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                          INPUT SOURCES                              │
+├──────────────┬──────────────┬────────────────┬──────────────────────┤
+│  Telegram    │  Voice Notes │  Documents     │   Google Meet        │
+│  Messages    │  (OGG/MP4)   │  (PDF/DOCX/TXT)│   (Chrome Extension) │
+└──────┬───────┴──────┬───────┴───────┬────────┴──────┬──────────────┘
+       │              │               │               │
+       ▼              ▼               ▼               ▼
+┌─────────────────────────────────────────────────────────────────────┐
+│                     PROCESSING PIPELINE                             │
+│                                                                     │
+│  1. Context Detector ──► Lens Assignment (dev/product/client/meet)  │
+│  2. Signal Extractor ──► Structured signal extraction (lens-aware)  │
+│  3. Classifier Agent ──► Sentiment, urgency, keyword tagging        │
+│  4. Knowledge Architect ► Conflict resolution, entity linking       │
+│                                                                     │
+└─────────────────────────────────────────────────┬───────────────────┘
+                                                  │
+                                                  ▼
+┌─────────────────────────────────────────────────────────────────────┐
+│                     CHROMADB KNOWLEDGE GRAPH                        │
+│              (Per-group vector collections, Cohere embeddings)      │
+└───────────────────────────────────┬─────────────────────────────────┘
+                                    │
+              ┌─────────────────────┼──────────────────────┐
+              ▼                     ▼                      ▼
+┌─────────────────────┐  ┌──────────────────────┐  ┌──────────────────┐
+│  Pattern Detector   │  │  Cross-Group Analyst  │  │   Query Agent    │
+│  (within a group)   │  │  (between all groups) │  │  (RAG pipeline)  │
+└──────────┬──────────┘  └──────────┬───────────┘  └────────┬─────────┘
+           │                        │                        │
+           ▼                        ▼                        ▼
+┌──────────────────────────────────────────────────────────────────────┐
+│                         OUTPUT LAYER                                 │
+├───────────────┬───────────────────┬──────────────┬──────────────────┤
+│  Telegram     │  Next.js          │  Jira        │  Proactive       │
+│  Bot Replies  │  Dashboard        │  Tickets     │  Alerts          │
+│  (/ask, etc.) │  (Knowledge Graph,│  (Auto-raise │  (Patterns,      │
+│               │   Timeline, Maps) │  + Manual)   │   Cross-group)   │
+└───────────────┴───────────────────┴──────────────┴──────────────────┘
+```
+
+---
+
+## The Adaptive Lens System
+
+When ThirdEye joins a group, the **Context Detector Agent** reads the first 50–100 messages and classifies the group into one of four modes (dev, product, client, or community). Users can override the result at any time with `/lens set dev`, or combine lenses with `/lens set dev+product`.
+
+```
+                 ┌─────────────────────────────────────────┐
+                 │          Incoming Group Messages         │
+                 └────────────────────┬────────────────────┘
+                                      │
+                                      ▼
+                      ┌───────────────────────────┐
+                      │   Context Detector Agent   │
+                      │   (SambaNova 405B)         │
+                      │   Reads first 50-100 msgs  │
+                      └───────────────────────────┘
+                            │     │     │     │
+              ┌─────────────┘     │     │     └─────────────────┐
+              ▼                   ▼     ▼                       ▼
+       ┌──────────┐        ┌──────────┐ ┌──────────┐    ┌──────────────┐
+       │ DevLens  │        │ Product  │ │  Client  │    │  Community   │
+       │ (dev)    │        │ Lens     │ │  Lens    │    │  Lens        │
+       │          │        │(product) │ │ (client) │    │ (community)  │
+       │ Tech debt│        │ Features │ │ Promises │    │ Recommend-   │
+       │ Arch dec │        │ Pain pts │ │ Sentiment│    │ ations,      │
+       │ Bugs     │        │ Roadmap  │ │ Scope    │    │ Events,      │
+       │ Silos    │        │ Metrics  │ │ Escalat. │    │ Local info   │
+       └──────────┘        └──────────┘ └──────────┘    └──────────────┘
+```
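+
+A minimal sketch of how the `/lens set` override could be parsed, assuming the four mode names shown above. `parse_lens_command` and `VALID_LENSES` are hypothetical names used for illustration, not the bot's actual handler:
+
+```python
+# Assumed set of mode names, taken from the diagram above.
+VALID_LENSES = ("dev", "product", "client", "community")
+
+
+def parse_lens_command(text: str) -> list[str]:
+    """Parse '/lens set dev+product' into ['dev', 'product'].
+
+    Raises ValueError for malformed commands or unknown lens names.
+    """
+    parts = text.strip().split()
+    if len(parts) != 3 or parts[0] != "/lens" or parts[1] != "set":
+        raise ValueError("usage: /lens set <mode>[+<mode>...]")
+    lenses: list[str] = []
+    for name in parts[2].lower().split("+"):
+        if name not in VALID_LENSES:
+            raise ValueError(f"unknown lens: {name!r}")
+        if name not in lenses:  # ignore duplicates such as dev+dev
+            lenses.append(name)
+    return lenses
+
+
+print(parse_lens_command("/lens set dev+product"))  # ['dev', 'product']
+```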
+
+---
+
+## Cross-Group Intelligence
+
+This is the **killer feature**. When ThirdEye monitors 2+ groups, the Cross-Group Analyst Agent activates and finds things no human would catch:
+
+### What It Detects
+
+**Blocked Handoffs**
+```
+Dev Team:     "Still waiting for design specs" (mentioned 3× this week)
+Design Team:  No mention of this deliverable anywhere
+→ ALERT: "Likely handoff gap between dev and design teams."
+```
+
+**Conflicting Decisions**
+```
+Product Team: Decided "mobile-first approach" on Mar 15
+Dev Team:     Discussing "API-first priority" on Mar 17
+→ ALERT: "Potential misalignment between product and dev priorities."
+```
+
+**Information Silos**
+```
+Client Channel: "New regulatory requirement mentioned" on Mar 12
+Dev + Product:  No mention of this requirement in either group
+→ ALERT: "Critical client information not flowing to internal teams."
+```
+
+**Promise-Reality Gaps**
+```
+Product Team: "We'll have the dashboard ready by Friday for the client"
+Dev Team:     "Dashboard completely blocked. No design specs. Week 2 of waiting."
+→ ALERT: "Promise to client may not be deliverable. Timeline risk."
+```
+
+**Duplicated Effort**
+```
+Dev Team:     Building an internal analytics dashboard
+Design Team:  Simultaneously prototyping a similar dashboard
+→ ALERT: "Possible duplicated effort across teams."
+```
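+
+The blocked-handoff case can be illustrated with a deliberately simplified heuristic. The Cross-Group Analyst itself is LLM-driven, so this is only a sketch of the underlying idea: a topic that one group raises repeatedly while the other group never mentions it. `Mention`, `find_handoff_gaps`, and the topic labels are hypothetical:
+
+```python
+from collections import Counter
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Mention:
+    group: str  # e.g. "dev", "design"
+    topic: str  # normalized topic label, e.g. "design specs"
+
+
+def find_handoff_gaps(mentions: list[Mention], waiting_group: str,
+                      owning_group: str, min_mentions: int = 3) -> list[str]:
+    """Topics raised repeatedly by waiting_group but never by owning_group."""
+    waiting = Counter(m.topic for m in mentions if m.group == waiting_group)
+    owned = {m.topic for m in mentions if m.group == owning_group}
+    return sorted(topic for topic, count in waiting.items()
+                  if count >= min_mentions and topic not in owned)
+
+
+mentions = [
+    Mention("dev", "design specs"),
+    Mention("dev", "design specs"),
+    Mention("dev", "design specs"),
+    Mention("dev", "api rate limits"),
+    Mention("design", "brand refresh"),
+]
+print(find_handoff_gaps(mentions, waiting_group="dev", owning_group="design"))
+# ['design specs']
+```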
+
+---
+
+## Agent Pipeline
+
+ThirdEye uses **8 specialized AI agents**, each mapped to the optimal LLM for its task:
+
+```
+Message arrives from ANY Telegram group
+         │
+         ▼
+┌─────────────────────────────┐
+│  1. Context Detector Agent  │  Auto-classifies group type (one-time + periodic)
+│     (SambaNova 405B)        │  Activates appropriate lens ontology
+└─────────────┬───────────────┘
+              │
+              ▼
+┌─────────────────────────────┐
+│  2. Signal Extractor Agent  │  Processes batched messages (every 5-10 msgs)
+│     (Groq Llama 3.3 70B)    │  Extracts domain-specific signals per active lens
+└─────────────┬───────────────┘
+              │
+         ┌────┴────┐
+         ▼         ▼
+┌──────────────┐  ┌──────────────────────┐
+│  3. Classif. │  │  4. Entity           │
+│     Agent    │  │     Resolver Agent   │
+│  (Groq 8B)   │  │  (Groq 70B)          │
+│  Tags type,  │  │  Resolves "this bug",│
+│  sentiment,  │  │  "same issue" refs   │
+│  urgency     │  │  across messages     │
+└──────┬───────┘  └────────┬─────────────┘
+       └────────┬───────────┘
+                ▼
+┌─────────────────────────────┐
+│  5. Knowledge Architect     │  Builds + maintains ChromaDB knowledge graph
+│     Agent                   │  Resolves conflicts, tracks temporal changes
+│  (SambaNova 405B /          │  Links entities across conversations
+│   OpenRouter Nemotron 120B) │
+└─────────────┬───────────────┘
+              │
+         ┌────┴────┐
+         ▼         ▼
+┌──────────────┐  ┌──────────────────┐
+│  6. Pattern  │  │  7. Cross-Group   │  ⭐ THE KILLER AGENT
+│     Detector │  │     Analyst Agent │  Compares intelligence across
+│     Agent    │  │  (SambaNova 405B) │  ALL deployed groups
+│  (OpenRouter │  │                   │  Detects blind spots, conflicts,
+│   Nemotron)  │  │                   │  information silos
+└──────┬───────┘  └────────┬──────────┘
+       └────────┬───────────┘
+                ▼
+┌─────────────────────────────┐
+│  8. Query Agent             │  Handles /ask queries via agentic RAG
+│  (Groq 70B — fast response) │  over ChromaDB + web search fallback
+└─────────────────────────────┘
+```
+
+---
+
+## LLM Provider Stack
+
+ThirdEye uses **5 free LLM providers** with automatic fallback routing — zero API costs:
+
+```
+Groq (fastest) → Cerebras (fast) → SambaNova (405B) → OpenRouter (27 models) → Google AI Studio
+```
+
+| Provider | Key Models | Free-Tier Capacity | Cost |
+|---|---|---|---|
+| **Groq** | Llama 3.3 70B, Llama 3.1 8B, **Whisper Large v3** | ~6,000 RPD | Free |
+| **Cerebras** | Llama 3.3 70B, Llama 3.1 8B | ~2,000 RPD | Free |
+| **SambaNova** | **Llama 3.1 405B**, Llama 3.3 70B | ~1,000 RPD | Free |
+| **OpenRouter** | Nemotron 120B, GPT-OSS 120B, 25+ models | ~5,400 RPD | Free |
+| **Cohere** | embed-english-v3.0 (embeddings), Rerank 3.5 | 1,000/month | Free |
+| **Google AI Studio** | Gemini 2.5 Flash/Pro | ~50 RPD | Free |
+
+**Combined daily capacity across the five LLM providers: ~10,000–15,000 requests for $0** (RPD = requests per day)
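+
+A minimal sketch of the fallback order above, assuming each provider is wrapped in a plain callable. `ProviderError`, `route_with_fallback`, and the stub providers are hypothetical names for illustration; the actual router lives in `backend/providers.py` and may work differently:
+
+```python
+from typing import Callable, Sequence
+
+
+class ProviderError(Exception):
+    """Raised by a provider wrapper on rate limits, timeouts, or API errors."""
+
+
+# A provider is modeled as (name, callable taking a prompt and returning text).
+Provider = tuple[str, Callable[[str], str]]
+
+
+def route_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
+    """Try providers in order; return (provider_name, completion) from the first success."""
+    errors: list[str] = []
+    for name, call in providers:
+        try:
+            return name, call(prompt)
+        except ProviderError as exc:
+            errors.append(f"{name}: {exc}")  # remember why this provider was skipped
+    raise ProviderError("all providers failed: " + "; ".join(errors))
+
+
+# Stub providers standing in for real API clients.
+def groq_stub(prompt: str) -> str:
+    raise ProviderError("rate limited")
+
+
+def cerebras_stub(prompt: str) -> str:
+    return f"echo: {prompt}"
+
+
+chain = [("groq", groq_stub), ("cerebras", cerebras_stub)]
+used, answer = route_with_fallback("classify this message", chain)
+print(used, answer)  # cerebras echo: classify this message
+```
+
+Catching only `ProviderError` keeps programming errors visible instead of silently falling through to the next provider.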
+
+### Agent → Model Mapping
+
+| Agent | Task Type | Primary | Fallback |
+|---|---|---|---|
+| Context Detector | Complex reasoning | SambaNova / Llama 3.1 405B | OpenRouter / Nemotron 120B |
+| Signal Extractor | Structured extraction | Groq / Llama 3.3 70B | Cerebras / Llama 3.3 70B |
+| Classifier | Fast classification | Groq / Llama 3.1 8B | Cerebras / Llama 3.1 8B |
+| Pattern Detector | Trend analysis | OpenRouter / Nemotron 120B | SambaNova / 405B |
+| Cross-Group Analyst | Cross-referencing | SambaNova / Llama 3.1 405B | OpenRouter / GPT-OSS 120B |
+| Query Agent | Fast Q&A | Groq / Llama 3.3 70B | Cerebras / Llama 3.3 70B |
+| Voice Transcriber | Audio-to-text | Groq / Whisper Large v3 | — |
+
+---
+
+## Integrations
+
+### Telegram Bot
+
+The bot is the primary input interface. It operates **passively** — no need to tag it or use commands for intelligence extraction to happen.
+
+#### Commands
+
+```
+/start           — Welcome message and command reference
+/ask [question]  — Query the knowledge base with natural language
+/search [query]  — Explicit web search via Tavily
+/digest          — Get an intelligence summary of the group
+/lens [mode]     — Set or check detection mode (dev/product/client/community)
+/flush           — Force-process the current message buffer immediately
+/debug           — Show group config and signal count
+
+/jira            — Show Jira integration status
+/jirastatus      — Get latest status of a ticket (e.g. /jirastatus ENG-42)
+/jirasearch      — Search Jira tickets by keyword
+/jiraraised      — List all tickets raised by ThirdEye for this group
+/jirawatch       — Watch a signal and auto-raise if it recurs
+
+/voicelog        — List all voice note decisions extracted from this group
+/meetsum         — Summarize the most recent Google Meet meeting
+/meetask [q]     — Ask a question about meeting content
+/meetmatch       — Find connections between meeting content and Telegram discussions
+```
+
+#### Passive Capabilities (no command needed)
+- **Text messages** → batched every 5–10 messages → signal extraction pipeline
+- **Documents** (PDF, DOCX, TXT, MD, CSV, JSON, LOG up to 10MB) → auto-ingested into knowledge graph
+- **Voice notes** (OGG) and **video notes** (MP4) → transcribed by Groq Whisper → signals extracted
+- **URLs in messages** → fetched → summarized → stored as `link_knowledge` signals
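+
+The text-message batching could be sketched as a per-group buffer that flushes on size or on idle time. `MessageBuffer` and its methods are hypothetical; the defaults mirror the `BATCH_SIZE` and `BATCH_TIMEOUT_SECONDS` settings listed under Environment Variables, and the injectable clock is an assumption made so the behavior is easy to test:
+
+```python
+import time
+from typing import Callable, Optional
+
+
+class MessageBuffer:
+    """Buffers messages per group and flushes on size or idle timeout."""
+
+    def __init__(self, batch_size: int = 5, timeout_seconds: float = 60.0,
+                 clock: Callable[[], float] = time.monotonic) -> None:
+        self.batch_size = batch_size
+        self.timeout_seconds = timeout_seconds
+        self._clock = clock
+        self._messages: dict[int, list[str]] = {}
+        self._last_seen: dict[int, float] = {}
+
+    def add(self, group_id: int, text: str) -> Optional[list[str]]:
+        """Add a message; return the full batch once batch_size is reached."""
+        self._messages.setdefault(group_id, []).append(text)
+        self._last_seen[group_id] = self._clock()
+        if len(self._messages[group_id]) >= self.batch_size:
+            return self.flush(group_id)
+        return None
+
+    def flush(self, group_id: int) -> list[str]:
+        """Return and clear whatever is buffered (what /flush would trigger)."""
+        self._last_seen.pop(group_id, None)
+        return self._messages.pop(group_id, [])
+
+    def flush_idle(self) -> dict[int, list[str]]:
+        """Flush every group whose last message is older than the timeout."""
+        now = self._clock()
+        idle = [g for g, seen in self._last_seen.items()
+                if now - seen >= self.timeout_seconds]
+        return {g: self.flush(g) for g in idle}
+```
+
+A background task would call `flush_idle()` periodically so that quiet groups still get their partial batches processed.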
+
+---
+
+### Google Meet Chrome Extension
+
+A Chrome extension that captures live Google Meet transcripts and feeds them into ThirdEye in real-time.
+
+**How it works:**
+1. Extension uses the browser's **Web Speech API** (free, no API key) for real-time transcription
+2. Every 30 seconds, a transcript chunk is sent to the ThirdEye FastAPI backend
+3. The backend's `MeetIngestor` agent extracts meeting signals (decisions, action items, blockers, risks)
+4. All signals are stored in ChromaDB alongside your Telegram/document/voice signals
+5. After the meeting, `/meetsum` and `/meetask` in Telegram give instant access to meeting intelligence
+
+**Signal types extracted from meetings:**
+- `meet_decision` — Formal decisions made during the call
+- `meet_action_item` — "X will do Y by Z" commitments
+- `meet_blocker` — Blockers raised during discussion
+- `meet_risk` — Risks identified in conversation
+- `meet_summary` — Full AI-generated meeting summary
+
+**Cross-referencing:** `/meetmatch` connects meeting content to your Telegram history — surfacing prior discussions, related signals, and any contradictions between what was decided in the meeting vs. what's in the group chat.
+
+---
+
+### Jira Integration
+
+ThirdEye connects to Atlassian Jira Cloud to convert extracted signals into actionable tickets.
+
+#### How Signals Become Tickets
+
+```
+Signal in ChromaDB ──► Jira Agent ──► LLM generates well-formed ticket
+                                               │
+                              ┌────────────────┼────────────────────┐
+                              ▼                ▼                    ▼
+                         Issue Type       Priority           Description
+                         (Bug/Task/       (based on         (summary +
+                          Story)          severity)          raw quote +
+                                                             entity context)
+                                               │
+                                               ▼
+                                      Jira REST API v3
+                                               │
+                                               ▼
+                              ┌────────────────┴────────────────┐
+                              ▼                                  ▼
+                     Ticket created                   `jira_raised` tracking
+                     in Atlassian                     signal stored in
+                     Cloud                            ChromaDB (prevents
+                                                      duplicates)
+```
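+
+As a rough illustration of the final step in the flow above, the request body for creating an issue through Jira REST API v3 (`POST /rest/api/3/issue`) might be assembled like this. `build_issue_payload` is a hypothetical helper, not the code in `jira_client.py`, and the default issue type and priority names are assumptions that depend on how the Jira project is configured:
+
+```python
+def build_issue_payload(project_key: str, summary: str, description: str,
+                        issue_type: str = "Task", priority: str = "Medium") -> dict:
+    """Build the JSON body for POST /rest/api/3/issue.
+
+    REST API v3 expects the description in Atlassian Document Format (ADF),
+    so the plain-text description is wrapped in a single paragraph node.
+    """
+    return {
+        "fields": {
+            "project": {"key": project_key},
+            "summary": summary,
+            "issuetype": {"name": issue_type},   # e.g. Bug / Task / Story
+            "priority": {"name": priority},      # assumed priority scheme
+            "description": {
+                "type": "doc",
+                "version": 1,
+                "content": [
+                    {"type": "paragraph",
+                     "content": [{"type": "text", "text": description}]}
+                ],
+            },
+        }
+    }
+
+
+payload = build_issue_payload(
+    "ENG", "Login fails after token refresh",
+    "Raised from a recurring_bug signal.", issue_type="Bug", priority="High",
+)
+print(payload["fields"]["issuetype"])  # {'name': 'Bug'}
+```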
+
+#### Commands
+```
+/ask raise jira on [topic]               # Semantic search → bulk raise
+/ask assign jira ticket for [X] to [Name] # Raise + assign in one command
+/jirastatus ENG-42                        # Live ticket status from Jira
+/jiraraised                               # List all tickets raised by ThirdEye
+```
+
+#### Auto-Raise Mode
+When `JIRA_AUTO_RAISE=true`, ThirdEye automatically raises Jira tickets when:
+- A signal with severity ≥ `JIRA_AUTO_RAISE_SEVERITY` is extracted
+- A recurring bug pattern is detected (3+ occurrences)
+- A critical blocker is identified in a meeting
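+
+The auto-raise conditions could be expressed as a small predicate. This is a sketch under stated assumptions: the severity scale in `SEVERITY_RANK`, the `Signal` fields, and `should_auto_raise` are hypothetical, since the README does not specify the actual severity values:
+
+```python
+from dataclasses import dataclass
+
+# Assumed severity ordering; the real scale is not specified in this README.
+SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}
+
+
+@dataclass
+class Signal:
+    type: str             # e.g. "recurring_bug", "meet_blocker"
+    severity: str         # one of SEVERITY_RANK's keys
+    occurrences: int = 1  # how many times the pattern has been seen
+
+
+def should_auto_raise(signal: Signal, enabled: bool, min_severity: str) -> bool:
+    """Return True when a signal meets one of the auto-raise conditions."""
+    if not enabled:  # JIRA_AUTO_RAISE=false disables everything
+        return False
+    # Condition 1: severity at or above JIRA_AUTO_RAISE_SEVERITY.
+    # A critical meeting blocker is also caught here, because "critical"
+    # is the top of the assumed scale.
+    if SEVERITY_RANK[signal.severity] >= SEVERITY_RANK[min_severity]:
+        return True
+    # Condition 2: a recurring bug seen three or more times.
+    return signal.type == "recurring_bug" and signal.occurrences >= 3
+```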
+
+---
+
+### Voice Message Intelligence
+
+**The differentiator:** Most chat intelligence products are text-only. ThirdEye also processes voice and video notes.
+
+**How it works:**
+```
+Voice/Video Note Received
+         │
+         ▼
+Download from Telegram (OGG/MP4)
+         │
+         ▼
+Groq Whisper Large v3 Transcription
+(7,200 sec/hour free tier, i.e. ~240 voice notes/hour at ~30 sec each)
+         │
+         ▼
+Transcript → Signal Extraction Pipeline
+(same as text messages)
+         │
+         ▼
+Signals stored with full voice attribution:
+- speaker: @username
+- voice_duration: seconds
+- voice_language: auto-detected
+- voice_file_id: Telegram file reference
+         │
+         ▼
+/voicelog shows all voice-sourced decisions
+/ask cites voice notes: "Based on what @Raj said in a voice note on March 14th..."
+```
+
+**`/voicelog` examples:**
+```
+/voicelog                    → All voice note signals
+/voicelog decisions          → Only decisions from voice
+/voicelog @raj               → All voice notes from Raj
+/voicelog api                → Voice notes mentioning "api"
+```
+
+---
+
+### Document & Link Ingestion
+
+**Document Types Supported:** PDF, DOCX, TXT, MD, CSV, JSON, LOG (up to 10MB)
+
+**Processing Pipeline:**
+```
+File shared in Telegram group
+         │
+         ▼
+Bot downloads to temp directory
+         │
+         ▼
+Text extraction (PyPDF2 / python-docx / plain text)
+         │
+         ▼
+Intelligent chunking (1,500 char chunks, 200 char overlap,
+paragraph boundaries respected)
+         │
+         ▼
+Each chunk stored as `document_knowledge` signal in ChromaDB
+with metadata: filename, page number, shared_by, timestamp
+         │
+         ▼
+Queryable alongside chat signals via /ask
+```
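+
+A minimal sketch of the chunking step, using the 1,500-character size and 200-character overlap from the pipeline above. `chunk_text` is a hypothetical function, and the strategy (cut at the last paragraph break inside each window, otherwise hard-split) is one plausible reading of "paragraph boundaries respected", not necessarily what `document_ingestor.py` does. Metadata such as filename and page number is left out:
+
+```python
+def chunk_text(text: str, chunk_size: int = 1500, overlap: int = 200) -> list[str]:
+    """Split text into chunks of at most chunk_size characters.
+
+    Each chunk ends at the last paragraph break inside its window when one
+    exists; the next chunk starts `overlap` characters before that cut.
+    """
+    if not 0 <= overlap < chunk_size:
+        raise ValueError("overlap must be non-negative and smaller than chunk_size")
+    chunks: list[str] = []
+    start = 0
+    while start < len(text):
+        end = min(start + chunk_size, len(text))
+        if end < len(text):
+            # Search past the overlap region so the next start always advances.
+            cut = text.rfind("\n\n", start + overlap + 1, end)
+            if cut != -1:
+                end = cut
+        chunk = text[start:end].strip()
+        if chunk:
+            chunks.append(chunk)
+        if end >= len(text):
+            break
+        start = end - overlap
+    return chunks
+
+
+sample = "A" * 1000 + "\n\n" + "B" * 1000 + "\n\n" + "C" * 300
+for piece in chunk_text(sample):
+    print(len(piece))
+```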
+
+**URL Processing:**
+When a URL is detected in any message, ThirdEye:
+1. Checks against skip-list (images, downloads, social embeds, Telegram links)
+2. Fetches the page content with BeautifulSoup parsing
+3. Summarizes with an LLM (2-4 sentence summary)
+4. Stores as `link_knowledge` signal with source attribution
+5. All happens in the background — no delay to the main message pipeline
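+
+The skip-list check in step 1 might look roughly like the following. The specific extensions and hosts are assumptions, since the README names the categories but not the entries, and social embeds are omitted for brevity. `extract_fetchable_urls` is a hypothetical helper:
+
+```python
+import re
+from urllib.parse import urlparse
+
+URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")
+
+# Assumed skip rules for images, downloads, and Telegram links.
+SKIP_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".zip", ".exe", ".dmg")
+SKIP_HOSTS = ("t.me", "telegram.me")
+
+
+def extract_fetchable_urls(message: str) -> list[str]:
+    """Return the URLs in a message that pass the skip-list."""
+    urls: list[str] = []
+    for raw in URL_PATTERN.findall(message):
+        url = raw.rstrip(".,;:!?)")  # drop trailing sentence punctuation
+        parsed = urlparse(url)
+        host = (parsed.hostname or "").lower()
+        # Skip listed hosts and any of their subdomains.
+        if host in SKIP_HOSTS or host.endswith(tuple("." + h for h in SKIP_HOSTS)):
+            continue
+        if parsed.path.lower().endswith(SKIP_EXTENSIONS):
+            continue
+        urls.append(url)
+    return urls
+
+
+msg = ("Spec: https://example.com/docs/api. "
+       "Logo https://example.com/logo.png and https://t.me/somegroup")
+print(extract_fetchable_urls(msg))  # ['https://example.com/docs/api']
+```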
+
+---
+
+## Knowledge Base Dashboard
+
+A Next.js dashboard (located in `dashboard/`) that provides a full visual intelligence console.
+
+### Pages
+
+| Page | Description |
+|---|---|
+| **Knowledge Base** | Interactive network graph showing groups, signal clusters, and real-time data flow. Split-panel with Knowledge Browser for topic/date exploration |
+| **Timeline** | Cross-group signal timeline with full filter support (severity, lens, type, date range) |
+| **Patterns** | Pattern detection results with severity badges and recommendations |
+| **Cross-Group** | Cross-group intelligence insights — the blind spot detector |
+| **Jira** | Jira ticket manager — view, create, and track tickets raised by ThirdEye |
+| **Meetings** | Google Meet session list with per-meeting signal breakdown and transcript viewer |
+
+### Knowledge Base Network Map
+
+The central visualization shows your intelligence network as a live graph:
+- **Core Node** (violet diamond) — the ThirdEye Neural Engine
+- **Group Nodes** (colored circles) — each monitored Telegram group, color-coded by lens type
+- **Signal Satellites** — top 3 signal types per group orbiting each group node
+- **Data Flow Lines** — animated neon traveling packets showing live data flow from groups to the core engine
+- **Cross-edges** — connections between adjacent groups when cross-group intelligence is detected
+
+### Knowledge Browser
+
+A structured topic/date reference panel built alongside the graph:
+- **Group selector** — browse any monitored group
+- **Date range filter** — scope the view to specific time periods
+- **AI-clustered topic chips** — topics derived from keyword frequency across all signals
+- **Day-by-day timeline** — vertical scroll through days, each signal expandable to show raw quote, entities, and keywords
+- **Footer stats** — total signals · topics count · days span
+
+---
+
+## Tech Stack
+
+### Backend
+| Component | Technology |
+|---|---|
+| Bot Framework | python-telegram-bot 21.9 |
+| API Server | FastAPI 0.115 + Uvicorn |
+| Vector Database | ChromaDB 1.0 (persistent, local) |
+| Embeddings | Cohere embed-english-v3.0 (primary) + sentence-transformers all-MiniLM-L6-v2 (fallback) |
+| LLM Routing | Custom multi-provider router (openai SDK) |
+| Audio Transcription | Groq Whisper Large v3 |
+| Web Search | Tavily (1,000 searches/month free) |
+| HTTP Client | httpx (async) |
+| HTML Parsing | BeautifulSoup4 |
+| Document Parsing | PyPDF2 + python-docx |
+| Data Validation | Pydantic v2 |
+
+### Frontend
+| Component | Technology |
+|---|---|
+| Framework | Next.js 15 (App Router) |
+| UI | Tailwind CSS |
+| Graph Visualization | Custom SVG with CSS animations |
+| State | React hooks (no external state library) |
+| API Client | Native fetch (proxied via Next.js rewrites) |
+
+---
+
+## Project Structure
+
+```
+binary_hack/
+│
+├── backend/                          # Python backend
+│   ├── api/
+│   │   └── routes.py                 # FastAPI endpoints
+│   │
+│   ├── agents/
+│   │   ├── signal_extractor.py       # Core signal extraction (lens-aware)
+│   │   ├── classifier.py             # Sentiment, urgency, keyword tagging
+│   │   ├── context_detector.py       # Auto-detect group type from messages
+│   │   ├── query_agent.py            # RAG pipeline context formatter
+│   │   ├── pattern_detector.py       # Detect recurring patterns in signals
+│   │   ├── cross_group_analyst.py    # Cross-group blind spot detection
+│   │   ├── jira_agent.py             # Jira ticket generation + bulk raise
+│   │   ├── document_ingestor.py      # PDF/DOCX/TXT extraction + chunking
+│   │   ├── voice_handler.py          # Voice download + pipeline orchestration
+│   │   ├── voice_transcriber.py      # Groq Whisper transcription
+│   │   ├── meet_ingestor.py          # Google Meet transcript chunk processing
+│   │   ├── meet_cross_ref.py         # Meeting ↔ chat signal cross-referencing
+│   │   ├── web_search.py             # Tavily web search integration
+│   │   ├── link_fetcher.py           # URL extraction, fetch, summarize
+│   │   └── json_utils.py             # Robust JSON extraction from LLM outputs
+│   │
+│   ├── bot/
+│   │   ├── bot.py                    # Telegram bot entry point + all handlers
+│   │   └── commands.py               # /voicelog command
+│   │
+│   ├── db/
+│   │   ├── chroma.py                 # ChromaDB CRUD operations
+│   │   ├── embeddings.py             # Cohere + fallback embedding client
+│   │   └── models.py                 # Pydantic models: Signal, Pattern, etc.
+│   │
+│   ├── integrations/
+│   │   └── jira_client.py            # Atlassian REST API v3 client
+│   │
+│   ├── pipeline.py                   # Core orchestration: batch → signals → store → query
+│   ├── providers.py                  # Multi-provider LLM router with fallback
+│   └── config.py                     # All env vars via python-dotenv
+│
+├── dashboard/                        # Next.js frontend
+│   ├── app/
+│   │   ├── knowledge-base/           # Network graph + Knowledge Browser
+│   │   │   ├── page.tsx
+│   │   │   ├── NetworkMap.tsx        # SVG network visualization with animated edges
+│   │   │   ├── KnowledgeBrowser.tsx  # Topic/date knowledge reference panel
+│   │   │   ├── RightPanelTabs.tsx    # Tab switcher: Knowledge ↔ Query
+│   │   │   ├── EntityPanel.tsx       # Deep query panel
+│   │   │   ├── FloatingControls.tsx  # Graph controls HUD
+│   │   │   ├── SystemTickerKnowledge.tsx
+│   │   │   └── knowledge.css
+│   │   │
+│   │   ├── timeline/                 # Cross-group signal timeline
+│   │   ├── jira/                     # Jira ticket management
+│   │   ├── meetings/                 # Google Meet session viewer
+│   │   ├── components/               # Shared UI components
+│   │   └── lib/
+│   │       └── api.ts                # Typed API client for all endpoints
+│   │
+│   └── next.config.ts                # Proxies /api → localhost:8000
+│
+├── chroma_db/                        # ChromaDB persistent storage (git-ignored)
+├── docs/                             # Architecture docs and implementation guides
+│   ├── sot.md                        # Single Source of Truth (full system design)
+│   ├── implementation.md             # Milestone-by-milestone build guide
+│   ├── additional.md                 # Milestones 11-13 (docs, web search, links)
+│   ├── Voice_milestones.md           # Milestones 20-22 (voice intelligence)
+│   ├── jira_milestones.md            # Milestones 17-19 (Jira integration)
+│   ├── meet_extension.md             # Milestones 14-16 (Google Meet)
+│   └── BATCHING_CONFIG.md            # Message batching configuration guide
+│
+├── .env                              # Environment variables (not in git)
+├── requirements.txt                  # Python dependencies
+└── README.md                         # This file
+```
+
+---
+
+## Setup & Installation
+
+### Prerequisites
+
+- Python 3.11+
+- Node.js 18+
+- A Telegram account + bot token (from [@BotFather](https://t.me/BotFather))
+- Free API keys (see below — all free, no credit card)
+
+### Step 1 — Clone & install Python dependencies
+
+```bash
+git clone <repo-url>
+cd binary_hack
+pip install -r requirements.txt
+```
+
+### Step 2 — Install dashboard dependencies
+
+```bash
+cd dashboard
+npm install
+```
+
+### Step 3 — Get API keys (all free)
+
+Open each URL in its own browser tab:
+
+| Service | URL | What you need |
+|---|---|---|
+| **Groq** | https://console.groq.com | API Key → used for Llama 3.3 70B + Whisper |
+| **Cerebras** | https://cloud.cerebras.ai | API Key → fallback LLM provider |
+| **SambaNova** | https://cloud.sambanova.ai | API Key → Llama 3.1 405B (complex reasoning) |
+| **OpenRouter** | https://openrouter.ai | API Key → 27 free models, broad fallback pool |
+| **Cohere** | https://dashboard.cohere.com | API Key → embeddings for vector search |
+| **Telegram Bot** | [@BotFather](https://t.me/BotFather) | `/newbot` → bot token |
+| **Tavily** | https://tavily.com | API Key → web search (1,000/month free) |
+| **Google AI Studio** | https://aistudio.google.com | API Key → Gemini (last resort fallback) |
+
+### Step 4 — Configure environment
+
+Copy `.env.example` to `.env` and fill in your keys:
+
+```bash
+cp .env.example .env
+# Edit .env with your keys
+```
+
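+### Step 5 — Disable Telegram privacy mode (CRITICAL)
+
+By default, privacy mode is on, and a bot in a group only receives commands and messages addressed to it. Disable it so the bot can read all group messages: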
+1. Open Telegram → Search **@BotFather**
+2. Send `/setprivacy`
+3. Select your bot
+4. Choose **Disable**
+
+### Step 6 — Seed demo data (optional)
+
+To populate ChromaDB with realistic demo data across 3 simulated groups (Dev team, Product team, Client channel):
+
+```bash
+python scripts/seed_demo.py
+```
+
+This seeds ~30 signals across 3 groups and runs cross-group analysis to demonstrate the blind spot detection.
+
+---
+
+## Running the System
+
+### Option A — Run everything together (recommended)
+
+```bash
+python run_all.py
+```
+
+This starts the Telegram bot and the FastAPI server together. If the dashboard does not come up on port 3000, start it separately as shown in Option B.
+
+### Option B — Run services separately
+
+```bash
+# Terminal 1: FastAPI backend
+python run_api.py
+# Runs at http://localhost:8000
+
+# Terminal 2: Telegram bot
+python run_bot.py
+
+# Terminal 3: Next.js dashboard
+cd dashboard
+npm run dev
+# Runs at http://localhost:3000
+```
+
+### Verify the system is running
+
+```bash
+curl http://localhost:8000/health
+# → {"status":"ok","service":"thirdeye"}
+
+curl http://localhost:8000/api/groups
+# → {"groups": [...]}
+```
+
+---
+
+## Environment Variables
+
+```bash
+# ── Telegram ──────────────────────────────────────────────────────
+TELEGRAM_BOT_TOKEN=          # From @BotFather
+
+# ── LLM Providers (all free, no credit card) ──────────────────────
+GROQ_API_KEY=                # console.groq.com — Llama 3.3 70B + Whisper
+GROQ_API_KEY_2=              # Optional: second key for rate limit rotation
+GROQ_API_KEY_3=              # Optional: third key for rate limit rotation
+CEREBRAS_API_KEY=            # cloud.cerebras.ai — fast inference
+SAMBANOVA_API_KEY=           # cloud.sambanova.ai — 405B reasoning
+OPENROUTER_API_KEY=          # openrouter.ai — 27 free models
+GEMINI_API_KEY=              # aistudio.google.com — last resort fallback
+
+# ── Embeddings ────────────────────────────────────────────────────
+COHERE_API_KEY=              # dashboard.cohere.com — vector embeddings
+
+# ── Storage ───────────────────────────────────────────────────────
+CHROMA_DB_PATH=./chroma_db   # Where ChromaDB persists to disk
+
+# ── Message Batching ──────────────────────────────────────────────
+BATCH_SIZE=5                 # Messages to buffer before processing
+BATCH_TIMEOUT_SECONDS=60     # Flush buffer after this many idle seconds
+
+# ── Web Search ────────────────────────────────────────────────────
+TAVILY_API_KEY=              # tavily.com — web search fallback (1,000/month free)
+
+# ── Google Meet Extension ─────────────────────────────────────────
+MEET_INGEST_SECRET=          # Shared secret for extension authentication
+MEET_DEFAULT_GROUP_ID=meet_sessions
+ENABLE_MEET_INGESTION=true
+MEET_CROSS_REF_GROUPS=       # Comma-separated group IDs to cross-reference
+
+# ── Jira Integration ──────────────────────────────────────────────
+JIRA_BASE_URL=               # https://your-org.atlassian.net
+JIRA_EMAIL=                  # Your Atlassian email
+JIRA_API_TOKEN=              # Atlassian API token (not your password)
+JIRA_DEFAULT_PROJECT=ENG     # Default project key
+JIRA_DEFAULT_ISSUE_TYPE=Task
+ENABLE_JIRA=true
+JIRA_AUTO_RAISE=false        # Auto-raise critical signals as Jira tickets
+JIRA_AUTO_RAISE_SEVERITY=high
+
+# ── Voice Message Intelligence ────────────────────────────────────
+ENABLE_VOICE_TRANSCRIPTION=true
+VOICE_MAX_DURATION_SECONDS=300   # Skip voice notes longer than 5 minutes
+VOICE_MIN_DURATION_SECONDS=2     # Skip sub-2-second recordings
+VOICE_LANGUAGE=                  # Leave empty for Whisper auto-detect
+VOICE_STORE_TRANSCRIPT=true      # Store raw transcript in ChromaDB
+```
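+
+The two batching settings work together: a batch is processed once `BATCH_SIZE` messages are buffered, or once the buffer has been idle for `BATCH_TIMEOUT_SECONDS`. The sketch below illustrates that flush logic. The `MessageBuffer` class and its method names are hypothetical and not taken from the ThirdEye codebase, and the clock is injected only to make the behavior easy to test.
+
+```python
+import time
+from typing import Callable, Optional
+
+
+class MessageBuffer:
+    """Hypothetical per-group buffer: flushes on size or after an idle timeout."""
+
+    def __init__(
+        self,
+        batch_size: int = 5,            # mirrors BATCH_SIZE
+        timeout_seconds: float = 60.0,  # mirrors BATCH_TIMEOUT_SECONDS
+        clock: Callable[[], float] = time.monotonic,
+    ) -> None:
+        self.batch_size = batch_size
+        self.timeout_seconds = timeout_seconds
+        self._clock = clock
+        self._messages: list[str] = []
+        self._last_added: Optional[float] = None
+
+    def add(self, message: str) -> Optional[list[str]]:
+        """Buffer a message; return the full batch once batch_size is reached."""
+        self._messages.append(message)
+        self._last_added = self._clock()
+        if len(self._messages) >= self.batch_size:
+            return self._drain()
+        return None
+
+    def flush_if_idle(self) -> Optional[list[str]]:
+        """Return a partial batch if nothing has arrived for timeout_seconds."""
+        if not self._messages or self._last_added is None:
+            return None
+        if self._clock() - self._last_added >= self.timeout_seconds:
+            return self._drain()
+        return None
+
+    def _drain(self) -> list[str]:
+        # Hand back the buffered messages and reset the buffer state.
+        batch, self._messages = self._messages, []
+        self._last_added = None
+        return batch
+```
+
+In a real service the constructor arguments would presumably be read from the environment, and `flush_if_idle()` would be called from a periodic timer so that quiet groups still get processed.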
+
+---
+
+## API Reference
+
+All endpoints are served by FastAPI at `http://localhost:8000`. The Next.js dashboard proxies `/api/*` to this backend.
+
+### Groups & Signals
+
+```
+GET  /api/groups
+     → List all monitored groups with signal counts and lens
+
+GET  /api/groups/{group_id}/signals
+     ?signal_type=&severity=&lens=&date_from=&date_to=
+     → Get signals with filters
+
+POST /api/groups/{group_id}/query
+     Body: {"question": "..."}
+     → Natural language RAG query over the group's knowledge base
+
+GET  /api/groups/{group_id}/patterns
+     → Run pattern detection and return results
+
+GET  /api/cross-group/insights
+     → Run cross-group analysis across all monitored groups
+
+GET  /api/signals/timeline
+     ?group_id=&severity=&lens=&signal_type=&date_from=&date_to=&limit=200
+     → Cross-group signal timeline, newest first
+
+GET  /health
+     → {"status": "ok", "service": "thirdeye"}
+```
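+
+As a rough illustration of how the endpoints above could be called from a script, the sketch below builds a filtered signals request and a RAG query request with Python's standard library. Only the paths and parameter names come from the listing above. The helper names, the example group ID `acme_dev` (borrowed from the demo seed data), and the choice to omit empty filters are assumptions.
+
+```python
+import json
+from typing import Any
+from urllib.parse import quote, urlencode
+from urllib.request import Request, urlopen
+
+BASE_URL = "http://localhost:8000"  # FastAPI backend address from this README
+
+
+def signals_request(group_id: str, **filters: str) -> Request:
+    """Build GET /api/groups/{group_id}/signals, dropping empty filters."""
+    query = urlencode({k: v for k, v in filters.items() if v})
+    url = f"{BASE_URL}/api/groups/{quote(group_id, safe='')}/signals"
+    return Request(f"{url}?{query}" if query else url, method="GET")
+
+
+def query_request(group_id: str, question: str) -> Request:
+    """Build POST /api/groups/{group_id}/query with a JSON body."""
+    body = json.dumps({"question": question}).encode("utf-8")
+    return Request(
+        f"{BASE_URL}/api/groups/{quote(group_id, safe='')}/query",
+        data=body,
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+
+def send(request: Request) -> Any:
+    """Send a request and decode the JSON response (needs a running server)."""
+    with urlopen(request, timeout=30) as response:
+        return json.load(response)
+```
+
+With the API running, `send(query_request("acme_dev", "What is blocking the dashboard?"))` would return the decoded response. The response shape for that endpoint is not documented in this section.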
+
+### Knowledge Browser
+
+```
+GET  /api/knowledge/browse/{group_id}
+     ?date_from=&date_to=&topic=
+     → Topics (AI-clustered from keywords) + day-by-day timeline
+     → Response: {group_id, group_name, total_signals, date_range, topics[], timeline[]}
+```
+
+### Google Meet
+
+```
+POST /api/meet/start          → Called by Chrome extension when meeting begins
+POST /api/meet/ingest         → Called every 30s with transcript chunk
+GET  /api/meet/meetings       → List all recorded meetings
+GET  /api/meet/meetings/{id}  → Meeting detail with signal breakdown
+GET  /api/meet/meetings/{id}/signals   → All signals for a meeting
+GET  /api/meet/meetings/{id}/transcript → Raw transcript chunks
+```
+
+### Jira
+
+```
+GET  /api/jira/tickets        ?group_id=&date_from=&date_to=&live=
+POST /api/jira/raise          Body: {signal_id, group_id, project_key, force}
+POST /api/jira/create         Body: {summary, description, project_key, priority, ...}
+GET  /api/jira/tickets/{key}/status   → Live status from Jira
+GET  /api/jira/users/search   ?q=name  → Assignee search
+GET  /api/jira/config         → Check Jira configuration
+```
+
+---
+
+## Data Models
+
+### Signal
+
+The core unit of extracted intelligence:
+
+```python
+from pydantic import BaseModel  # assumes the models are Pydantic models
+
+
+class Signal(BaseModel):
+    id: str                   # UUID
+    group_id: str             # Telegram group ID
+    lens: str                 # dev | product | client | community | meet
+    type: str                 # architecture_decision | tech_debt | feature_request | ...
+    summary: str              # One-sentence description
+    entities: list[str]       # [@people, technologies, concepts]
+    severity: str             # low | medium | high | critical
+    status: str               # proposed | decided | implemented | unresolved
+    sentiment: str            # positive | neutral | negative | urgent
+    urgency: str              # none | low | medium | high | critical
+    raw_quote: str            # Verbatim message text that triggered this signal
+    keywords: list[str]       # 3-5 searchable keywords
+    timestamp: str            # ISO 8601
+    # Origin of the signal (applies to every signal)
+    source: str               # voice | text | document | link | meet
+    # Voice attribution (voice-sourced signals only)
+    speaker: str              # @username
+    voice_duration: int       # seconds
+    # Jira tracking
+    jira_key: str             # e.g. ENG-42
+    jira_url: str
+```
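+
+Because `severity` is stored as a plain string, a comparison such as the `JIRA_AUTO_RAISE_SEVERITY` threshold needs an explicit ordering. The sketch below shows one way to express that. It assumes the order `low < medium < high < critical` and a threshold meaning "this level or above". The helper names and that interpretation are assumptions, not documented ThirdEye behavior.
+
+```python
+# Assumed ranking, lowest first (taken from the severity values listed above).
+SEVERITY_ORDER = ["low", "medium", "high", "critical"]
+
+
+def meets_threshold(severity: str, threshold: str = "high") -> bool:
+    """Return True if severity is at or above threshold (assumed semantics)."""
+    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)
+
+
+def signals_to_raise(signals: list[dict], threshold: str = "high") -> list[dict]:
+    """Keep signals that have no Jira ticket yet and meet the threshold."""
+    return [
+        s for s in signals
+        if not s.get("jira_key") and meets_threshold(s["severity"], threshold)
+    ]
+```
+
+Signals are shown as plain dictionaries here to keep the example free of dependencies. The same check would apply to `Signal` instances by reading the attributes instead.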
+
+### Pattern
+
+```python
+class Pattern(BaseModel):
+    id: str
+    group_id: str
+    type: str                 # frequency_spike | knowledge_silo | recurring_issue | sentiment_trend | stale_item
+    description: str
+    severity: str             # info | warning | critical
+    evidence_signal_ids: list[str]
+    recommendation: str
+    detected_at: str
+    is_active: bool
+```
+
+### CrossGroupInsight
+
+```python
+class CrossGroupInsight(BaseModel):
+    id: str
+    type: str                 # blocked_handoff | conflicting_decision | information_silo | promise_reality_gap | duplicated_effort
+    description: str
+    group_a: dict             # {name, group_id, evidence}
+    group_b: dict
+    severity: str             # warning | critical
+    recommendation: str
+    detected_at: str
+    is_resolved: bool
+```
+
+---
+
+## Seeding Demo Data
+
+Run the demo seeder to populate 3 realistic groups with interconnected signals that demonstrate cross-group intelligence:
+
+```bash
+python scripts/seed_demo.py
+```
+
+This seeds:
+- **Acme Dev Team** — 13 messages → signals including recurring timeout bug, knowledge silo (only Raj knows payment), blocked dashboard waiting for design specs
+- **Acme Product Team** — 10 messages → priority conflict (mobile vs API), promise to client about Friday dashboard demo, conversion metric drop
+- **Acme ↔ ClientCo** — 9 messages → client escalation risk, scope creep (PDF export + dark mode), unacknowledged deadline pressure
+
+**The cross-group analysis will surface:**
+- Dev team blocked on design specs → Product team never mentions this
+- Product team promised Friday demo → Dev team says "blocked, week 2 of waiting"
+- Client escalation risk not visible to dev or product teams
+
+After seeding, verify the results with the API server running:
+
+```bash
+curl http://localhost:8000/api/groups
+curl http://localhost:8000/api/cross-group/insights
+curl http://localhost:8000/api/knowledge/browse/acme_dev
+```
+
+---
+
+## Competitive Positioning
+
+```
+                    REACTIVE ◄──────────────────────► PROACTIVE
+                         │                               │
+               Slack AI  │                               │  ★ ThirdEye
+              (summaries │                               │  (persistent intelligence,
+               on demand)│                               │   pattern detection,
+                         │                               │   cross-group analysis,
+                  Prism  │                               │   proactive alerts,
+            (product     │                               │   voice + docs + Meet)
+             signals)    │                               │
+                         │                               │
+         SINGLE GROUP ◄──┼───────────────────────────────┼──► CROSS-GROUP
+                         │                               │
+              Runbear    │                               │
+              (Q&A only) │                               │
+```
+
+**ThirdEye occupies the only empty quadrant: Proactive + Cross-Group.**
+
+To our knowledge, no other product on the market today passively builds intelligence across multiple team chats simultaneously and detects blind spots between them. That is a new category.
+
+---
+
+*Built at $0 in API costs. Every provider used has a free tier. No credit card required for any dependency.*