ThirdEye 👁️

"Your team already has the answers. They're just trapped in chat."

ThirdEye is a multi-agent conversation intelligence engine that passively monitors Telegram group chats, extracts structured knowledge in real-time, builds a living knowledge graph, detects patterns humans miss, and catches cross-team blind spots before they become crises.

It is not a chatbot. Not a summarizer. Not a search tool. It is a conversation intelligence engine — the first of its kind to work across multiple teams simultaneously and detect what falls through the cracks between them.



The Problem

Every day, your team makes hundreds of decisions, discovers bugs, promises clients things, and solves problems — all in group chat. And every day, that intelligence vanishes as messages scroll past.

The Knowledge Graveyard Problem:

  • Teams generate enormous value in group chats: decisions, recommendations, warnings, technical knowledge, client commitments
  • 100% of this knowledge is lost within days as messages scroll past
  • When someone asks "why did we choose Redis?" or "what did we promise the client?" — nobody remembers, and nobody can find it
  • When Team A is blocked by Team B, but Team B doesn't know — nobody connects the dots until it's a crisis

Who feels this pain:

  • Engineering teams — decisions lost, tech debt invisible, knowledge silos forming
  • Product teams — feature requests scattered, roadmap drift unnoticed, priority conflicts buried
  • Client-facing teams — promises untracked, scope creep uncaught, sentiment shifts missed
  • Any team of 3+ people communicating in chat

What ThirdEye Does

Telegram Groups ──► Signal Extraction ──► ChromaDB Knowledge Graph
                          │                        │
                    8 AI Agents               Pattern Detection
                          │                        │
                    Cross-Group ◄────── Blind Spot Detection
                    Intelligence                   │
                          │                    Dashboard
                    Proactive Alerts          + Telegram /ask

Core Capabilities

| Capability | Description |
|---|---|
| Passive Intelligence Extraction | Silently monitors group messages and extracts structured signals — no commands, no tagging, no forms |
| Adaptive Context Detection | Auto-detects whether a group is a dev team, product team, client channel, or community, then activates the right extraction ontology |
| Living Knowledge Graph | Builds a persistent, queryable knowledge base in ChromaDB that grows smarter over time |
| Natural Language Querying | Any group member can ask questions and get community-sourced answers with source attribution |
| Pattern Detection | Identifies trends: "Tech debt mentions up 300% this sprint." "Only @raj discusses payment — bus factor = 1." |
| Proactive Alerts | Sends Telegram alerts for concerning patterns: overdue promises, sentiment shifts, knowledge silos, recurring bugs |
| Cross-Group Intelligence | When deployed in multiple groups simultaneously, detects blind spots between teams: blocked handoffs, conflicting decisions, information silos, promise-reality gaps |

Key Features

Signal Types Extracted

DevLens (Engineering Teams)

  • architecture_decision — Technology choices with rationale, alternatives considered, decision owner
  • tech_debt — Shortcuts, hardcoded values, "will fix later" patterns
  • knowledge_silo_evidence — Only one person discusses a critical domain (bus factor = 1)
  • recurring_bug — Same issue mentioned repeatedly across conversations
  • stack_decision — Technology/framework choices (proposed or decided)
  • deployment_risk — Risky deployment practices caught in chat
  • workaround — Temporary fixes being applied repeatedly

ProductLens (Product Teams)

  • feature_request — Features users or team members are asking for, with frequency
  • user_pain_point — User difficulties, complaints, confusion with evidence
  • roadmap_drift — Discussion of topics not on the current plan
  • priority_conflict — Team members disagreeing on what's most important
  • metric_mention — Specific numbers, conversion rates, performance data
  • user_quote — Direct quotes from customers with sentiment
  • competitor_intel — Mentions of competitor actions or features

ClientLens (Agency / Client Channels)

  • promise — Commitments made with deadlines (explicit or implicit)
  • scope_creep — Additional requests introduced casually without formal change
  • sentiment_signal — Tone changes: growing frustration, formality shifts, positive praise
  • unanswered_request — Client questions that haven't received responses (with time elapsed)
  • escalation_risk — Mentions of involving management, expressing deadline concerns
  • client_decision — Decisions made by the client

MeetLens (Google Meet)

  • meet_decision — Decisions made during meeting
  • meet_action_item — Actions committed to with owner
  • meet_blocker — Blockers raised in meeting
  • meet_risk — Risks identified during discussion
  • meet_summary — AI-generated meeting summary
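
The lens-to-signal-type lists above can be read as a small lookup table. The following is a minimal sketch of that idea, not the project's actual models: the `Signal` dataclass, its fields, and `is_valid_for_lens` are hypothetical names chosen for illustration (the real Pydantic models live in `backend/db/models.py` and may differ).

```python
from dataclasses import dataclass

# Signal types per lens, copied from the lists above.
LENS_SIGNAL_TYPES = {
    "dev": {
        "architecture_decision", "tech_debt", "knowledge_silo_evidence",
        "recurring_bug", "stack_decision", "deployment_risk", "workaround",
    },
    "product": {
        "feature_request", "user_pain_point", "roadmap_drift",
        "priority_conflict", "metric_mention", "user_quote", "competitor_intel",
    },
    "client": {
        "promise", "scope_creep", "sentiment_signal",
        "unanswered_request", "escalation_risk", "client_decision",
    },
    "meet": {
        "meet_decision", "meet_action_item", "meet_blocker",
        "meet_risk", "meet_summary",
    },
}


@dataclass
class Signal:
    """Hypothetical, simplified signal record used only in this sketch."""
    type: str
    lens: str
    summary: str
    raw_quote: str = ""


def is_valid_for_lens(signal: Signal) -> bool:
    """Return True when the signal type belongs to the ontology of its lens."""
    return signal.type in LENS_SIGNAL_TYPES.get(signal.lens, set())
```

A check like this could be used to discard LLM output that names a signal type outside the active lens.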

Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                          INPUT SOURCES                              │
├──────────────┬──────────────┬────────────────┬──────────────────────┤
│  Telegram    │  Voice Notes │  Documents     │   Google Meet        │
│  Messages    │  (OGG/MP4)   │  (PDF/DOCX/TXT)│   (Chrome Extension) │
└──────┬───────┴──────┬───────┴───────┬────────┴──────┬──────────────┘
       │              │               │               │
       ▼              ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     PROCESSING PIPELINE                             │
│                                                                     │
│  1. Context Detector ──► Lens Assignment (dev/product/client/meet)  │
│  2. Signal Extractor ──► Structured signal extraction (lens-aware)  │
│  3. Classifier Agent ──► Sentiment, urgency, keyword tagging        │
│  4. Knowledge Architect ► Conflict resolution, entity linking       │
│                                                                     │
└─────────────────────────────────────────────────┬───────────────────┘
                                                  │
                                                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     CHROMADB KNOWLEDGE GRAPH                        │
│              (Per-group vector collections, Cohere embeddings)      │
└───────────────────────────────────┬─────────────────────────────────┘
                                    │
              ┌─────────────────────┼──────────────────────┐
              ▼                     ▼                      ▼
┌─────────────────────┐  ┌──────────────────────┐  ┌──────────────────┐
│  Pattern Detector   │  │  Cross-Group Analyst  │  │   Query Agent    │
│  (within a group)   │  │  (between all groups) │  │  (RAG pipeline)  │
└──────────┬──────────┘  └──────────┬───────────┘  └────────┬─────────┘
           │                        │                        │
           ▼                        ▼                        ▼
┌──────────────────────────────────────────────────────────────────────┐
│                         OUTPUT LAYER                                 │
├───────────────┬───────────────────┬──────────────┬──────────────────┤
│  Telegram     │  Next.js          │  Jira        │  Proactive       │
│  Bot Replies  │  Dashboard        │  Tickets     │  Alerts          │
│  (/ask, etc.) │  (Knowledge Graph,│  (Auto-raise │  (Patterns,      │
│               │   Timeline, Maps) │  + Manual)   │   Cross-group)   │
└───────────────┴───────────────────┴──────────────┴──────────────────┘

The Adaptive Lens System

When ThirdEye joins a group, the Context Detector Agent reads the first 50-100 messages and auto-classifies the group into one of four modes. Users can override the result at any time with /lens set dev, or combine lenses with /lens set dev+product.

                 ┌─────────────────────────────────────────┐
                 │          Incoming Group Messages         │
                 └────────────────────┬────────────────────┘
                                      │
                                      ▼
                      ┌───────────────────────────┐
                      │   Context Detector Agent   │
                      │   (SambaNova 405B)         │
                      │   Reads first 50-100 msgs  │
                      └───────────────────────────┘
                            │     │     │     │
              ┌─────────────┘     │     │     └─────────────────┐
              ▼                   ▼     ▼                       ▼
       ┌──────────┐        ┌──────────┐ ┌──────────┐    ┌──────────────┐
       │ DevLens  │        │ Product  │ │  Client  │    │  Community   │
       │ (dev)    │        │ Lens     │ │  Lens    │    │  Lens        │
       │          │        │(product) │ │ (client) │    │ (community)  │
       │ Tech debt│        │ Features │ │ Promises │    │ Recommend-   │
       │ Arch dec │        │ Pain pts │ │ Sentiment│    │ ations,      │
       │ Bugs     │        │ Roadmap  │ │ Scope    │    │ Events,      │
       │ Silos    │        │ Metrics  │ │ Escalat. │    │ Local info   │
       └──────────┘        └──────────┘ └──────────┘    └──────────────┘
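
The manual override syntax (`/lens set dev`, `/lens set dev+product`) can be parsed with a few lines of string handling. This is a sketch under assumptions: `parse_lens_command` is a hypothetical helper, and treating a bare `/lens` as a status check is inferred from the command description rather than taken from the bot's code.

```python
VALID_LENSES = {"dev", "product", "client", "community"}


def parse_lens_command(text: str) -> list[str]:
    """Parse '/lens set dev+product' into ['dev', 'product'].

    A bare '/lens' returns an empty list, meaning "report the current mode".
    """
    parts = text.strip().split()
    if not parts or parts[0] != "/lens":
        raise ValueError("not a /lens command")
    if len(parts) == 1:
        return []
    if len(parts) != 3 or parts[1] != "set":
        raise ValueError("usage: /lens set <mode>[+<mode>...]")
    lenses = parts[2].lower().split("+")
    unknown = [name for name in lenses if name not in VALID_LENSES]
    if unknown:
        raise ValueError(f"unknown lens: {', '.join(unknown)}")
    return lenses
```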

Cross-Group Intelligence

This is the killer feature. When ThirdEye monitors 2+ groups, the Cross-Group Analyst Agent activates and finds things no human would catch:

What It Detects

Blocked Handoffs

Dev Team:     "Still waiting for design specs" (mentioned 3× this week)
Design Team:  No mention of this deliverable anywhere
→ ALERT: "Likely handoff gap between dev and design teams."

Conflicting Decisions

Product Team: Decided "mobile-first approach" on Mar 15
Dev Team:     Discussing "API-first priority" on Mar 17
→ ALERT: "Potential misalignment between product and dev priorities."

Information Silos

Client Channel: "New regulatory requirement mentioned" on Mar 12
Dev + Product:  No mention of this requirement in either group
→ ALERT: "Critical client information not flowing to internal teams."

Promise-Reality Gaps

Product Team: "We'll have the dashboard ready by Friday for the client"
Dev Team:     "Dashboard completely blocked. No design specs. Week 2 of waiting."
→ ALERT: "Promise to client may not be deliverable. Timeline risk."

Duplicated Effort

Dev Team:     Building an internal analytics dashboard
Design Team:  Simultaneously prototyping a similar dashboard
→ ALERT: "Possible duplicated effort across teams."
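
The blocked-handoff case above comes down to "one group keeps mentioning a topic that another group never mentions". ThirdEye delegates this comparison to an LLM agent; the sketch below swaps in a plain counting heuristic purely to illustrate the shape of the check. The function name, the input format, and the threshold of 3 mentions are assumptions.

```python
from collections import Counter


def find_silent_topics(
    mentions_by_group: dict[str, list[str]],
    min_mentions: int = 3,
) -> list[tuple[str, str, str]]:
    """Return (topic, asking_group, silent_group) triples.

    A triple is reported when asking_group mentions a topic at least
    min_mentions times and silent_group never mentions it.
    """
    counts = {group: Counter(topics) for group, topics in mentions_by_group.items()}
    gaps = []
    for asking, counter in counts.items():
        for topic, n in counter.items():
            if n < min_mentions:
                continue
            for other, other_counter in counts.items():
                if other != asking and other_counter[topic] == 0:
                    gaps.append((topic, asking, other))
    return gaps


mentions = {
    "dev": ["design specs", "design specs", "design specs", "redis"],
    "design": ["logo"],
}
print(find_silent_topics(mentions))
```

Exact string matching is the weak point of this heuristic ("design specs" vs "the specs from design"), which is one reason an embedding search or an LLM comparison is a better fit in practice.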

Agent Pipeline

ThirdEye uses 8 specialized AI agents, each mapped to the optimal LLM for its task:

Message arrives from ANY Telegram group
         │
         ▼
┌─────────────────────────────┐
│  1. Context Detector Agent  │  Auto-classifies group type (one-time + periodic)
│     (SambaNova 405B)        │  Activates appropriate lens ontology
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  2. Signal Extractor Agent  │  Processes batched messages (every 5-10 msgs)
│     (Groq Llama 3.3 70B)   │  Extracts domain-specific signals per active lens
└─────────────┬───────────────┘
              │
         ┌────┴────┐
         ▼         ▼
┌──────────────┐  ┌──────────────────────┐
│  3. Classif. │  │  4. Entity           │
│     Agent    │  │     Resolver Agent   │
│  (Groq 8B)   │  │  (Groq 70B)          │
│  Tags type,  │  │  Resolves "this bug",│
│  sentiment,  │  │  "same issue" refs   │
│  urgency     │  │  across messages     │
└──────┬───────┘  └────────┬─────────────┘
       └────────┬───────────┘
                ▼
┌─────────────────────────────┐
│  5. Knowledge Architect     │  Builds + maintains ChromaDB knowledge graph
│     Agent                   │  Resolves conflicts, tracks temporal changes
│  (SambaNova 405B /          │  Links entities across conversations
│   OpenRouter Nemotron 120B) │
└─────────────┬───────────────┘
              │
         ┌────┴────┐
         ▼         ▼
┌──────────────┐  ┌──────────────────┐
│  6. Pattern  │  │  7. Cross-Group   │  ⭐ THE KILLER AGENT
│     Detector │  │     Analyst Agent │  Compares intelligence across
│     Agent    │  │  (SambaNova 405B) │  ALL deployed groups
│  (OpenRouter │  │                   │  Detects blind spots, conflicts,
│   Nemotron)  │  │                   │  information silos
└──────┬───────┘  └────────┬──────────┘
       └────────┬───────────┘
                ▼
┌─────────────────────────────┐
│  8. Query Agent             │  Handles /ask queries via agentic RAG
│  (Groq 70B — fast response) │  over ChromaDB + web search fallback
└─────────────────────────────┘

LLM Provider Stack

ThirdEye uses 5 free LLM providers with automatic fallback routing — zero API costs:

Groq (fastest) → Cerebras (fast) → SambaNova (405B) → OpenRouter (27 models) → Google AI Studio

| Provider | Key Models | Daily Capacity | Cost |
|---|---|---|---|
| Groq | Llama 3.3 70B, Llama 3.1 8B, Whisper v3 | ~6,000 RPD | Free |
| Cerebras | Llama 3.3 70B, Llama 3.1 8B | ~2,000 RPD | Free |
| SambaNova | Llama 3.1 405B, Llama 3.3 70B | ~1,000 RPD | Free |
| OpenRouter | Nemotron 120B, GPT-OSS 120B, 25+ models | ~5,400 RPD | Free |
| Cohere | embed-english-v3.0 (embeddings), Rerank 3.5 | 1,000/month | Free |
| Google AI Studio | Gemini 2.5 Flash/Pro | ~50 RPD | Free |

Combined daily capacity: ~10,000-15,000 requests for $0

Agent → Model Mapping

| Agent | Task Type | Primary | Fallback |
|---|---|---|---|
| Context Detector | Complex reasoning | SambaNova / Llama 3.1 405B | OpenRouter / Nemotron 120B |
| Signal Extractor | Structured extraction | Groq / Llama 3.3 70B | Cerebras / Llama 3.3 70B |
| Classifier | Fast classification | Groq / Llama 3.1 8B | Cerebras / Llama 3.1 8B |
| Pattern Detector | Trend analysis | OpenRouter / Nemotron 120B | SambaNova / 405B |
| Cross-Group Analyst | Cross-referencing | SambaNova / Llama 3.1 405B | OpenRouter / GPT-OSS 120B |
| Query Agent | Fast Q&A | Groq / Llama 3.3 70B | Cerebras / Llama 3.3 70B |
| Voice Transcriber | Audio-to-text | Groq / Whisper Large v3 | |
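
Fallback routing of this kind can be sketched as "try each provider in order and return the first success". The code below is an illustration only: `complete_with_fallback` and `AllProvidersFailed` are hypothetical names, and each provider is modelled as a plain callable so the sketch runs without API keys. The real router in `backend/providers.py` may be structured differently.

```python
from typing import Callable


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain raised an error."""


def complete_with_fallback(
    prompt: str,
    providers: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try providers in order; return (provider_name, completion)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # rate limit, timeout, bad response, ...
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers standing in for real API clients.
def rate_limited(prompt: str) -> str:
    raise RuntimeError("429 rate limited")


def echo(prompt: str) -> str:
    return f"answer to: {prompt}"


chain = [("groq", rate_limited), ("cerebras", echo)]
print(complete_with_fallback("why Redis?", chain))
```

Keeping the chain as data (a list of name and callable pairs) makes it easy to give each agent its own primary and fallback order, as in the mapping table above.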

Integrations

Telegram Bot

The bot is the primary input interface. It operates passively — no need to tag it or use commands for intelligence extraction to happen.

Commands

/start           — Welcome message and command reference
/ask [question]  — Query the knowledge base with natural language
/search [query]  — Explicit web search via Tavily
/digest          — Get an intelligence summary of the group
/lens [mode]     — Set or check detection mode (dev/product/client/community)
/flush           — Force-process the current message buffer immediately
/debug           — Show group config and signal count

/jira            — Show Jira integration status
/jirastatus      — Get latest status of a ticket (e.g. /jirastatus ENG-42)
/jirasearch      — Search Jira tickets by keyword
/jiraraised      — List all tickets raised by ThirdEye for this group
/jirawatch       — Watch a signal and auto-raise if it recurs

/voicelog        — List all voice note decisions extracted from this group
/meetsum         — Summarize the most recent Google Meet meeting
/meetask [q]     — Ask a question about meeting content
/meetmatch       — Find connections between meeting content and Telegram discussions

Passive Capabilities (no command needed)

  • Text messages → batched every 5-10 messages → signal extraction pipeline
  • Documents (PDF, DOCX, TXT, MD, CSV, JSON, LOG up to 10MB) → auto-ingested into knowledge graph
  • Voice notes (OGG) and video notes (MP4) → transcribed by Groq Whisper → signal extracted
  • URLs in messages → fetched → summarized → stored as link_knowledge signals
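
Message batching can be pictured as a small buffer that flushes either when it is full or when the group goes quiet. The sketch below assumes the semantics suggested by the `BATCH_SIZE` and `BATCH_TIMEOUT_SECONDS` settings; the `MessageBuffer` class and its method names are hypothetical, and the clock is injectable so the idle timeout can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class MessageBuffer:
    """Collects messages and releases them as a batch."""

    def __init__(
        self,
        batch_size: int = 5,
        timeout_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.batch_size = batch_size
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._messages: list[str] = []
        self._last_message_at = clock()

    def add(self, message: str) -> Optional[list[str]]:
        """Buffer a message; return a full batch once batch_size is reached."""
        self._messages.append(message)
        self._last_message_at = self._clock()
        if len(self._messages) >= self.batch_size:
            return self.flush()
        return None

    def flush_if_idle(self) -> Optional[list[str]]:
        """Return the pending batch if no message arrived within the timeout."""
        idle = self._clock() - self._last_message_at
        if self._messages and idle >= self.timeout_seconds:
            return self.flush()
        return None

    def flush(self) -> list[str]:
        """Return all buffered messages and empty the buffer (like /flush)."""
        batch, self._messages = self._messages, []
        return batch
```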

Google Meet Chrome Extension

A Chrome extension that captures live Google Meet transcripts and feeds them into ThirdEye in real-time.

How it works:

  1. Extension uses the browser's Web Speech API (free, no API key) for real-time transcription
  2. Every 30 seconds, a transcript chunk is sent to the ThirdEye FastAPI backend
  3. The backend's MeetIngestor agent extracts meeting signals (decisions, action items, blockers, risks)
  4. All signals are stored in ChromaDB alongside your Telegram/document/voice signals
  5. After the meeting, /meetsum and /meetask in Telegram give instant access to meeting intelligence

Signal types extracted from meetings:

  • meet_decision — Formal decisions made during the call
  • meet_action_item — "X will do Y by Z" commitments
  • meet_blocker — Blockers raised during discussion
  • meet_risk — Risks identified in conversation
  • meet_summary — Full AI-generated meeting summary

Cross-referencing: /meetmatch connects meeting content to your Telegram history — surfacing prior discussions, related signals, and any contradictions between what was decided in the meeting vs. what's in the group chat.


Jira Integration

ThirdEye connects to Atlassian Jira Cloud to convert extracted signals into actionable tickets.

How Signals Become Tickets

Signal in ChromaDB ──► Jira Agent ──► LLM generates well-formed ticket
                                               │
                              ┌────────────────┼────────────────────┐
                              ▼                ▼                    ▼
                         Issue Type       Priority           Description
                         (Bug/Task/       (based on         (summary +
                          Story)          severity)          raw quote +
                                                             entity context)
                                               │
                                               ▼
                                      Jira REST API v3
                                               │
                                               ▼
                              ┌────────────────┴────────────────┐
                              ▼                                  ▼
                     Ticket created                   `jira_raised` tracking
                     in Atlassian                     signal stored in
                     Cloud                            ChromaDB (prevents
                                                      duplicates)
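
To make the last step concrete: Jira REST API v3 creates an issue from a JSON body with a `fields` object, and it expects the description in Atlassian Document Format rather than plain text. The sketch below builds such a body. The function name, the severity-to-priority mapping, and the description layout are assumptions, and priority names such as "Highest" are Jira defaults that a given site may have renamed.

```python
# Assumed mapping from ThirdEye severity to default Jira priority names.
SEVERITY_TO_PRIORITY = {
    "low": "Low",
    "medium": "Medium",
    "high": "High",
    "critical": "Highest",
}


def _paragraph(text: str) -> dict:
    """One Atlassian Document Format paragraph containing plain text."""
    return {"type": "paragraph", "content": [{"type": "text", "text": text}]}


def build_issue_payload(
    summary: str,
    raw_quote: str,
    severity: str,
    project_key: str = "ENG",
    issue_type: str = "Task",
) -> dict:
    """Build the JSON body for POST /rest/api/3/issue."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": summary[:255],  # keep the one-line summary short
            "issuetype": {"name": issue_type},
            "priority": {"name": SEVERITY_TO_PRIORITY.get(severity, "Medium")},
            "description": {
                "type": "doc",
                "version": 1,
                "content": [
                    _paragraph(summary),
                    _paragraph(f"Source quote: {raw_quote}"),
                ],
            },
        }
    }
```

The defaults mirror `JIRA_DEFAULT_PROJECT=ENG` and `JIRA_DEFAULT_ISSUE_TYPE=Task` from the environment section.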

Commands

/ask raise jira on [topic]               # Semantic search → bulk raise
/ask assign jira ticket for [X] to [Name] # Raise + assign in one command
/jirastatus ENG-42                        # Live ticket status from Jira
/jiraraised                               # List all tickets raised by ThirdEye

Auto-Raise Mode

When JIRA_AUTO_RAISE=true, ThirdEye automatically raises Jira tickets when:

  • A signal with severity ≥ JIRA_AUTO_RAISE_SEVERITY is extracted
  • A recurring bug pattern is detected (3+ occurrences)
  • A critical blocker is identified in a meeting
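
These rules can be expressed as a single predicate. This is a sketch under stated assumptions: the severity scale (`low` < `medium` < `high` < `critical`) and the function signature are hypothetical, and the source does not specify how severities are ranked.

```python
# Assumed severity scale, lowest to highest.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def should_auto_raise(
    signal_type: str,
    severity: str,
    occurrences: int = 1,
    threshold: str = "high",   # JIRA_AUTO_RAISE_SEVERITY
    enabled: bool = True,      # JIRA_AUTO_RAISE
) -> bool:
    """Decide whether a signal should become a Jira ticket automatically."""
    if not enabled:
        return False
    rank = SEVERITY_ORDER.index
    # Rule 1: severity at or above the configured threshold.
    # With this scale, a critical meeting blocker always passes here too.
    if rank(severity) >= rank(threshold):
        return True
    # Rule 2: a recurring bug seen three or more times.
    return signal_type == "recurring_bug" and occurrences >= 3
```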

Voice Message Intelligence

The differentiator: Most chat intelligence products are text-only. ThirdEye is the first to process voice messages.

How it works:

Voice/Video Note Received
         │
         ▼
Download from Telegram (OGG/MP4)
         │
         ▼
Groq Whisper Large v3 Transcription
(7,200 sec/hour free tier — ~240 voice notes/hour)
         │
         ▼
Transcript → Signal Extraction Pipeline
(same as text messages)
         │
         ▼
Signals stored with full voice attribution:
- speaker: @username
- voice_duration: seconds
- voice_language: auto-detected
- voice_file_id: Telegram file reference
         │
         ▼
/voicelog shows all voice-sourced decisions
/ask cites voice notes: "Based on what @Raj said in a voice note on March 14th..."

/voicelog examples:

/voicelog                    → All voice note signals
/voicelog decisions          → Only decisions from voice
/voicelog @raj               → All voice notes from Raj
/voicelog api                → Voice notes mentioning "api"

Document Types Supported: PDF, DOCX, TXT, MD, CSV, JSON, LOG (up to 10MB)

Processing Pipeline:

File shared in Telegram group
         │
         ▼
Bot downloads to temp directory
         │
         ▼
Text extraction (PyPDF2 / python-docx / plain text)
         │
         ▼
Intelligent chunking (1,500 char chunks, 200 char overlap,
paragraph boundaries respected)
         │
         ▼
Each chunk stored as `document_knowledge` signal in ChromaDB
with metadata: filename, page number, shared_by, timestamp
         │
         ▼
Queryable alongside chat signals via /ask
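
The chunking step can be sketched as follows: pack whole paragraphs into a chunk until the size limit would be exceeded, then start the next chunk with the tail of the previous one. This is one plausible reading of "1,500 char chunks, 200 char overlap, paragraph boundaries respected", not the code in `document_ingestor.py`; `chunk_text` is a hypothetical name, and paragraphs are assumed to be separated by blank lines.

```python
def chunk_text(text: str, chunk_size: int = 1500, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= chunk_size:
            current = candidate  # paragraph still fits in the current chunk
            continue
        if current:
            chunks.append(current)
            tail = current[-overlap:] if overlap else ""
            current = f"{tail}\n\n{para}" if tail else para
        else:
            current = para
        # A single oversized paragraph is hard-split, keeping the overlap.
        while len(current) > chunk_size:
            chunks.append(current[:chunk_size])
            current = current[chunk_size - overlap:]
    if current:
        chunks.append(current)
    return chunks
```

The overlap means a sentence that straddles a chunk boundary is still retrievable from at least one chunk, at the cost of storing some text twice.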

URL Processing: When a URL is detected in any message, ThirdEye:

  1. Checks against skip-list (images, downloads, social embeds, Telegram links)
  2. Fetches the page content with BeautifulSoup parsing
  3. Summarizes with an LLM (2-4 sentence summary)
  4. Stores as link_knowledge signal with source attribution
  5. All happens in the background — no delay to the main message pipeline
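
Step 1, the skip-list check, can be sketched with the standard library alone. The regular expression, the extension list, and the host list below are illustrative assumptions, as is the function name; the actual skip-list in `link_fetcher.py` is not shown in this README.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s<>\"']+")
# Assumed skip-list entries for images, downloads, and Telegram links.
SKIP_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".zip", ".exe", ".mp4")
SKIP_HOSTS = {"t.me", "telegram.me"}


def extract_fetchable_urls(message: str) -> list[str]:
    """Return URLs from a message that are worth fetching and summarizing."""
    urls = []
    for match in URL_RE.findall(message):
        url = match.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if host in SKIP_HOSTS:
            continue
        if parsed.path.lower().endswith(SKIP_EXTENSIONS):
            continue
        urls.append(url)
    return urls


msg = "see https://example.com/post, pic https://example.com/a.png and https://t.me/joinchat"
print(extract_fetchable_urls(msg))
```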

Knowledge Base Dashboard

A Next.js dashboard (located in dashboard/) that provides a full visual intelligence console.

Pages

| Page | Description |
|---|---|
| Knowledge Base | Interactive network graph showing groups, signal clusters, and real-time data flow. Split-panel with Knowledge Browser for topic/date exploration |
| Timeline | Cross-group signal timeline with full filter support (severity, lens, type, date range) |
| Patterns | Pattern detection results with severity badges and recommendations |
| Cross-Group | Cross-group intelligence insights — the blind spot detector |
| Jira | Jira ticket manager — view, create, and track tickets raised by ThirdEye |
| Meetings | Google Meet session list with per-meeting signal breakdown and transcript viewer |

Knowledge Base Network Map

The central visualization shows your intelligence network as a live graph:

  • Core Node (violet diamond) — the ThirdEye Neural Engine
  • Group Nodes (colored circles) — each monitored Telegram group, color-coded by lens type
  • Signal Satellites — top 3 signal types per group orbiting each group node
  • Data Flow Lines — animated neon traveling packets showing live data flow from groups to the core engine
  • Cross-edges — connections between adjacent groups when cross-group intelligence is detected

Knowledge Browser

A structured topic/date reference panel built alongside the graph:

  • Group selector — browse any monitored group
  • Date range filter — scope the view to specific time periods
  • AI-clustered topic chips — topics derived from keyword frequency across all signals
  • Day-by-day timeline — vertical scroll through days, each signal expandable to show raw quote, entities, and keywords
  • Footer stats — total signals · topics count · days span

Tech Stack

Backend

| Component | Technology |
|---|---|
| Bot Framework | python-telegram-bot 21.9 |
| API Server | FastAPI 0.115 + Uvicorn |
| Vector Database | ChromaDB 1.0 (persistent, local) |
| Embeddings | Cohere embed-english-v3.0 (primary) + sentence-transformers all-MiniLM-L6-v2 (fallback) |
| LLM Routing | Custom multi-provider router (openai SDK) |
| Audio Transcription | Groq Whisper Large v3 |
| Web Search | Tavily (1,000 searches/month free) |
| HTTP Client | httpx (async) |
| HTML Parsing | BeautifulSoup4 |
| Document Parsing | PyPDF2 + python-docx |
| Data Validation | Pydantic v2 |

Frontend

| Component | Technology |
|---|---|
| Framework | Next.js 15 (App Router) |
| UI | Tailwind CSS |
| Graph Visualization | Custom SVG with CSS animations |
| State | React hooks (no external state library) |
| API Client | Native fetch (proxied via Next.js rewrites) |

Project Structure

binary_hack/
│
├── backend/                          # Python backend
│   ├── api/
│   │   └── routes.py                 # FastAPI endpoints
│   │
│   ├── agents/
│   │   ├── signal_extractor.py       # Core signal extraction (lens-aware)
│   │   ├── classifier.py             # Sentiment, urgency, keyword tagging
│   │   ├── context_detector.py       # Auto-detect group type from messages
│   │   ├── query_agent.py            # RAG pipeline context formatter
│   │   ├── pattern_detector.py       # Detect recurring patterns in signals
│   │   ├── cross_group_analyst.py    # Cross-group blind spot detection
│   │   ├── jira_agent.py             # Jira ticket generation + bulk raise
│   │   ├── document_ingestor.py      # PDF/DOCX/TXT extraction + chunking
│   │   ├── voice_handler.py          # Voice download + pipeline orchestration
│   │   ├── voice_transcriber.py      # Groq Whisper transcription
│   │   ├── meet_ingestor.py          # Google Meet transcript chunk processing
│   │   ├── meet_cross_ref.py         # Meeting ↔ chat signal cross-referencing
│   │   ├── web_search.py             # Tavily web search integration
│   │   ├── link_fetcher.py           # URL extraction, fetch, summarize
│   │   └── json_utils.py             # Robust JSON extraction from LLM outputs
│   │
│   ├── bot/
│   │   ├── bot.py                    # Telegram bot entry point + all handlers
│   │   └── commands.py               # /voicelog command
│   │
│   ├── db/
│   │   ├── chroma.py                 # ChromaDB CRUD operations
│   │   ├── embeddings.py             # Cohere + fallback embedding client
│   │   └── models.py                 # Pydantic models: Signal, Pattern, etc.
│   │
│   ├── integrations/
│   │   └── jira_client.py            # Atlassian REST API v3 client
│   │
│   ├── pipeline.py                   # Core orchestration: batch → signals → store → query
│   ├── providers.py                  # Multi-provider LLM router with fallback
│   └── config.py                     # All env vars via python-dotenv
│
├── dashboard/                        # Next.js frontend
│   ├── app/
│   │   ├── knowledge-base/           # Network graph + Knowledge Browser
│   │   │   ├── page.tsx
│   │   │   ├── NetworkMap.tsx        # SVG network visualization with animated edges
│   │   │   ├── KnowledgeBrowser.tsx  # Topic/date knowledge reference panel
│   │   │   ├── RightPanelTabs.tsx    # Tab switcher: Knowledge ↔ Query
│   │   │   ├── EntityPanel.tsx       # Deep query panel
│   │   │   ├── FloatingControls.tsx  # Graph controls HUD
│   │   │   ├── SystemTickerKnowledge.tsx
│   │   │   └── knowledge.css
│   │   │
│   │   ├── timeline/                 # Cross-group signal timeline
│   │   ├── jira/                     # Jira ticket management
│   │   ├── meetings/                 # Google Meet session viewer
│   │   ├── components/               # Shared UI components
│   │   └── lib/
│   │       └── api.ts                # Typed API client for all endpoints
│   │
│   └── next.config.ts                # Proxies /api → localhost:8000
│
├── chroma_db/                        # ChromaDB persistent storage (git-ignored)
├── docs/                             # Architecture docs and implementation guides
│   ├── sot.md                        # Single Source of Truth (full system design)
│   ├── implementation.md             # Milestone-by-milestone build guide
│   ├── additional.md                 # Milestones 11-13 (docs, web search, links)
│   ├── Voice_milestones.md           # Milestones 20-22 (voice intelligence)
│   ├── jira_milestones.md            # Milestones 17-19 (Jira integration)
│   ├── meet_extension.md             # Milestones 14-16 (Google Meet)
│   └── BATCHING_CONFIG.md            # Message batching configuration guide
│
├── .env                              # Environment variables (not in git)
├── requirements.txt                  # Python dependencies
└── README.md                         # This file

Setup & Installation

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • A Telegram account + bot token (from @BotFather)
  • Free API keys (see below — all free, no credit card)

Step 1 — Clone & install Python dependencies

git clone <repo-url>
cd binary_hack
pip install -r requirements.txt

Step 2 — Install dashboard dependencies

cd dashboard
npm install

Step 3 — Get API keys (all free)

Open each URL simultaneously in browser tabs:

| Service | URL | What you need |
|---|---|---|
| Groq | https://console.groq.com | API Key → used for Llama 3.3 70B + Whisper |
| Cerebras | https://cloud.cerebras.ai | API Key → fallback LLM provider |
| SambaNova | https://cloud.sambanova.ai | API Key → Llama 3.1 405B (complex reasoning) |
| OpenRouter | https://openrouter.ai | API Key → 27 free models, ultimate fallback |
| Cohere | https://dashboard.cohere.com | API Key → embeddings for vector search |
| Telegram Bot | @BotFather | /newbot → bot token |
| Tavily | https://tavily.com | API Key → web search (1,000/month free) |
| Google AI Studio | https://aistudio.google.com | API Key → Gemini (last resort fallback) |

Step 4 — Configure environment

Copy .env.example to .env and fill in your keys:

cp .env.example .env
# Edit .env with your keys

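Step 5 — Disable Telegram privacy mode (CRITICAL)

With privacy mode on (the default), a bot in a group only receives commands and messages addressed to it. Turn it off so the bot can read all group messages: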

  1. Open Telegram → Search @BotFather
  2. Send /setprivacy
  3. Select your bot
  4. Choose Disable

Step 6 — Seed demo data (optional)

To populate ChromaDB with realistic demo data across 3 simulated groups (Dev team, Product team, Client channel):

python scripts/seed_demo.py

This seeds ~30 signals across 3 groups and runs cross-group analysis to demonstrate the blind spot detection.


Running the System

Option A — Run the bot and API together

python run_all.py

This starts both the Telegram bot and the FastAPI server simultaneously.

Option B — Run services separately

# Terminal 1: FastAPI backend
python run_api.py
# Runs at http://localhost:8000

# Terminal 2: Telegram bot
python run_bot.py

# Terminal 3: Next.js dashboard
cd dashboard
npm run dev
# Runs at http://localhost:3000

Verify the system is running

curl http://localhost:8000/health
# → {"status":"ok","service":"thirdeye"}

curl http://localhost:8000/api/groups
# → {"groups": [...]}

Environment Variables

# ── Telegram ──────────────────────────────────────────────────────
TELEGRAM_BOT_TOKEN=          # From @BotFather

# ── LLM Providers (all free, no credit card) ──────────────────────
GROQ_API_KEY=                # console.groq.com — Llama 3.3 70B + Whisper
GROQ_API_KEY_2=              # Optional: second key for rate limit rotation
GROQ_API_KEY_3=              # Optional: third key for rate limit rotation
CEREBRAS_API_KEY=            # cloud.cerebras.ai — fast inference
SAMBANOVA_API_KEY=           # cloud.sambanova.ai — 405B reasoning
OPENROUTER_API_KEY=          # openrouter.ai — 27 free models
GEMINI_API_KEY=              # aistudio.google.com — last resort fallback

# ── Embeddings ────────────────────────────────────────────────────
COHERE_API_KEY=              # dashboard.cohere.com — vector embeddings

# ── Storage ───────────────────────────────────────────────────────
CHROMA_DB_PATH=./chroma_db   # Where ChromaDB persists to disk

# ── Message Batching ──────────────────────────────────────────────
BATCH_SIZE=5                 # Messages to buffer before processing
BATCH_TIMEOUT_SECONDS=60     # Flush buffer after this many idle seconds

# ── Web Search ────────────────────────────────────────────────────
TAVILY_API_KEY=              # tavily.com — web search fallback (1,000/month free)

# ── Google Meet Extension ─────────────────────────────────────────
MEET_INGEST_SECRET=          # Shared secret for extension authentication
MEET_DEFAULT_GROUP_ID=meet_sessions
ENABLE_MEET_INGESTION=true
MEET_CROSS_REF_GROUPS=       # Comma-separated group IDs to cross-reference

# ── Jira Integration ──────────────────────────────────────────────
JIRA_BASE_URL=               # https://your-org.atlassian.net
JIRA_EMAIL=                  # Your Atlassian email
JIRA_API_TOKEN=              # Atlassian API token (not your password)
JIRA_DEFAULT_PROJECT=ENG     # Default project key
JIRA_DEFAULT_ISSUE_TYPE=Task
ENABLE_JIRA=true
JIRA_AUTO_RAISE=false        # Auto-raise critical signals as Jira tickets
JIRA_AUTO_RAISE_SEVERITY=high

# ── Voice Message Intelligence ────────────────────────────────────
ENABLE_VOICE_TRANSCRIPTION=true
VOICE_MAX_DURATION_SECONDS=300   # Skip voice notes longer than 5 minutes
VOICE_MIN_DURATION_SECONDS=2     # Skip sub-2-second recordings
VOICE_LANGUAGE=                  # Leave empty for Whisper auto-detect
VOICE_STORE_TRANSCRIPT=true      # Store raw transcript in ChromaDB
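As a rough illustration of how these settings might be consumed, the sketch below reads a few of them from the process environment and shows one way the two batching values could interact. It assumes the variables are already loaded into `os.environ`. The names `env_bool`, `groq_keys`, `MessageBuffer`, and `buffer_from_env` are hypothetical and are not taken from ThirdEye's code, and the "flush when full or idle" rule is an interpretation of the comments above.

```python
import os
import time
from dataclasses import dataclass, field
from typing import Optional


def env_bool(name: str, default: bool = False) -> bool:
    """Parse a true/false style variable such as ENABLE_JIRA."""
    raw = os.environ.get(name, "").strip().lower()
    if not raw:
        return default
    return raw in {"1", "true", "yes", "on"}


def groq_keys() -> list[str]:
    """Collect the primary Groq key plus any optional rotation keys that are set."""
    names = ("GROQ_API_KEY", "GROQ_API_KEY_2", "GROQ_API_KEY_3")
    return [os.environ[n].strip() for n in names if os.environ.get(n, "").strip()]


@dataclass
class MessageBuffer:
    """Hypothetical buffer: ready to flush when full or idle for too long."""
    batch_size: int
    timeout_seconds: float
    messages: list[str] = field(default_factory=list)
    last_added: float = 0.0

    def add(self, text: str, now: Optional[float] = None) -> None:
        self.messages.append(text)
        self.last_added = time.monotonic() if now is None else now

    def should_flush(self, now: Optional[float] = None) -> bool:
        if not self.messages:
            return False
        current = time.monotonic() if now is None else now
        is_full = len(self.messages) >= self.batch_size
        is_idle = current - self.last_added >= self.timeout_seconds
        return is_full or is_idle

    def flush(self) -> list[str]:
        """Return the buffered messages and start a fresh batch."""
        batch, self.messages = self.messages, []
        return batch


def buffer_from_env() -> MessageBuffer:
    """Build a buffer from BATCH_SIZE and BATCH_TIMEOUT_SECONDS (defaults as above)."""
    return MessageBuffer(
        batch_size=int(os.environ.get("BATCH_SIZE", "5")),
        timeout_seconds=float(os.environ.get("BATCH_TIMEOUT_SECONDS", "60")),
    )
```

Passing `now` explicitly keeps the buffer easy to exercise without real waiting. In a running bot the default `time.monotonic()` clock would be used instead.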

API Reference

All endpoints are served by FastAPI at http://localhost:8000. The Next.js dashboard proxies /api/* to this backend.

Groups & Signals

GET  /api/groups
     → List all monitored groups with signal counts and lens

GET  /api/groups/{group_id}/signals
     ?signal_type=&severity=&lens=&date_from=&date_to=
     → Get signals with filters

POST /api/groups/{group_id}/query
     Body: {"question": "..."}
     → Natural language RAG query over the group's knowledge base

GET  /api/groups/{group_id}/patterns
     → Run pattern detection and return results

GET  /api/cross-group/insights
     → Run cross-group analysis across all monitored groups

GET  /api/signals/timeline
     ?group_id=&severity=&lens=&signal_type=&date_from=&date_to=&limit=200
     → Cross-group signal timeline, newest first

GET  /health
     → {"status": "ok", "service": "thirdeye"}
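The filter parameters listed above are all optional, so a client typically wants to send only the ones that are set. The following is a minimal sketch of building such URLs with the standard library. `build_url`, `signals_url`, and `timeline_url` are illustrative helper names, not part of ThirdEye, and the accepted date format is not specified here, so none is assumed.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # backend address stated in this section

# Filter names copied from the endpoint listing above.
SIGNAL_FILTERS = ("signal_type", "severity", "lens", "date_from", "date_to")
TIMELINE_FILTERS = ("group_id", "severity", "lens", "signal_type",
                    "date_from", "date_to", "limit")


def build_url(path: str, allowed: tuple, filters: dict) -> str:
    """Append only the filters that are both documented and non-empty."""
    unknown = set(filters) - set(allowed)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    params = {k: filters[k] for k in allowed if filters.get(k) not in (None, "")}
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


def signals_url(group_id: str, **filters) -> str:
    """URL for GET /api/groups/{group_id}/signals."""
    return build_url(f"/api/groups/{quote(group_id, safe='')}/signals",
                     SIGNAL_FILTERS, filters)


def timeline_url(**filters) -> str:
    """URL for GET /api/signals/timeline."""
    return build_url("/api/signals/timeline", TIMELINE_FILTERS, filters)


print(signals_url("acme_dev", severity="high", lens="dev"))
print(timeline_url(limit=50))
```

Rejecting unknown filter names on the client side is a design choice of this sketch. How the backend treats unrecognized parameters is not documented here.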

Knowledge Browser

GET  /api/knowledge/browse/{group_id}
     ?date_from=&date_to=&topic=
     → Topics (AI-clustered from keywords) + day-by-day timeline
     → Response: {group_id, group_name, total_signals, date_range, topics[], timeline[]}

Google Meet

POST /api/meet/start          → Called by Chrome extension when meeting begins
POST /api/meet/ingest         → Called every 30s with transcript chunk
GET  /api/meet/meetings       → List all recorded meetings
GET  /api/meet/meetings/{id}  → Meeting detail with signal breakdown
GET  /api/meet/meetings/{id}/signals   → All signals for a meeting
GET  /api/meet/meetings/{id}/transcript → Raw transcript chunks

Jira

GET  /api/jira/tickets        ?group_id=&date_from=&date_to=&live=
POST /api/jira/raise          Body: {signal_id, group_id, project_key, force}
POST /api/jira/create         Body: {summary, description, project_key, priority, ...}
GET  /api/jira/tickets/{key}/status   → Live status from Jira
GET  /api/jira/users/search   ?q=name  → Assignee search
GET  /api/jira/config         → Check Jira configuration
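To make the `POST /api/jira/raise` body concrete, here is a hedged sketch that prepares the request with the standard library without sending it. The helper name `jira_raise_request` is hypothetical. Falling back to `JIRA_DEFAULT_PROJECT` on the client side is an assumption of this sketch, and the exact meaning of `force` is not documented in this section, so it is passed through unchanged.

```python
import json
import os
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address stated in this section


def jira_raise_request(signal_id: str, group_id: str,
                       project_key: str = "",
                       force: bool = False) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/jira/raise request."""
    payload = {
        "signal_id": signal_id,
        "group_id": group_id,
        # Assumption: use the configured default project when none is given.
        "project_key": project_key or os.environ.get("JIRA_DEFAULT_PROJECT", "ENG"),
        "force": force,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/jira/raise",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the backend running, the prepared request can be sent with `urllib.request.urlopen(req)`. The response shape is not described here, so the sketch stops at building the request.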

Data Models

Signal

The core unit of extracted intelligence:

from pydantic import BaseModel

class Signal(BaseModel):
    id: str                   # UUID
    group_id: str             # Telegram group ID
    lens: str                 # dev | product | client | community | meet
    type: str                 # architecture_decision | tech_debt | feature_request | ...
    summary: str              # One-sentence description
    entities: list[str]       # [@people, technologies, concepts]
    severity: str             # low | medium | high | critical
    status: str               # proposed | decided | implemented | unresolved
    sentiment: str            # positive | neutral | negative | urgent
    urgency: str              # none | low | medium | high | critical
    raw_quote: str            # Verbatim message text that triggered this signal
    keywords: list[str]       # 3-5 searchable keywords
    timestamp: str            # ISO 8601
    # Source attribution (speaker and voice_duration apply to voice-sourced signals only)
    source: str               # voice | text | document | link | meet
    speaker: str              # @username
    voice_duration: int       # seconds
    # Jira tracking
    jira_key: str             # e.g. ENG-42
    jira_url: str
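Because `severity` takes one of four ordered values, a common operation is selecting signals at or above a threshold, for example when deciding what a setting like `JIRA_AUTO_RAISE_SEVERITY=high` should cover. The sketch below works on plain dicts carrying the documented `severity` field. The ranking follows the order in the comment above, treating the threshold as inclusive is an assumption, and the sample signals are made up for illustration.

```python
# Ranking assumed from the order listed in the Signal model: low < medium < high < critical.
SEVERITY_ORDER = ("low", "medium", "high", "critical")


def meets_threshold(severity: str, threshold: str) -> bool:
    """True when severity ranks at or above threshold (inclusive by assumption)."""
    # tuple.index raises ValueError for a value outside the documented set.
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)


def at_or_above(signals: list[dict], threshold: str = "high") -> list[dict]:
    """Keep only the signals whose severity meets the threshold."""
    return [s for s in signals if meets_threshold(s["severity"], threshold)]


# Illustrative data only, not real ThirdEye output.
sample = [
    {"id": "a", "severity": "low"},
    {"id": "b", "severity": "high"},
    {"id": "c", "severity": "critical"},
    {"id": "d", "severity": "medium"},
]
print([s["id"] for s in at_or_above(sample, "high")])
```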

Pattern

class Pattern(BaseModel):
    id: str
    group_id: str
    type: str                 # frequency_spike | knowledge_silo | recurring_issue | sentiment_trend | stale_item
    description: str
    severity: str             # info | warning | critical
    evidence_signal_ids: list[str]
    recommendation: str
    detected_at: str
    is_active: bool

CrossGroupInsight

class CrossGroupInsight(BaseModel):
    id: str
    type: str                 # blocked_handoff | conflicting_decision | information_silo | promise_reality_gap | duplicated_effort
    description: str
    group_a: dict             # {name, group_id, evidence}
    group_b: dict
    severity: str             # warning | critical
    recommendation: str
    detected_at: str
    is_resolved: bool

Seeding Demo Data

Run the demo seeder to populate 3 realistic groups with interconnected signals that demonstrate cross-group intelligence:

python scripts/seed_demo.py

This seeds:

  • Acme Dev Team — 13 messages → signals including recurring timeout bug, knowledge silo (only Raj knows payment), blocked dashboard waiting for design specs
  • Acme Product Team — 10 messages → priority conflict (mobile vs API), promise to client about Friday dashboard demo, conversion metric drop
  • Acme ↔ ClientCo — 9 messages → client escalation risk, scope creep (PDF export + dark mode), unacknowledged deadline pressure

The cross-group analysis will surface:

  • Dev team blocked on design specs → Product team never mentions this
  • Product team promised Friday demo → Dev team says "blocked, week 2 of waiting"
  • Client escalation risk not visible to dev or product teams

After seeding, verify:

curl http://localhost:8000/api/groups
curl http://localhost:8000/api/cross-group/insights
curl http://localhost:8000/api/knowledge/browse/acme_dev

Competitive Positioning

                    REACTIVE ◄──────────────────────► PROACTIVE
                         │                               │
               Slack AI  │                               │  ★ ThirdEye
              (summaries │                               │  (persistent intelligence,
               on demand)│                               │   pattern detection,
                         │                               │   cross-group analysis,
                  Prism  │                               │   proactive alerts,
            (product     │                               │   voice + docs + Meet)
             signals)    │                               │
                         │                               │
         SINGLE GROUP ◄──┼───────────────────────────────┼──► CROSS-GROUP
                         │                               │
              Runbear    │                               │
              (Q&A only) │                               │

ThirdEye occupies the only empty quadrant: Proactive + Cross-Group.

To our knowledge, no other product on the market today passively builds intelligence across multiple team chats simultaneously and detects blind spots between them. That is a new category.


Built at $0 in API costs. Every provider used has a free tier. No credit card required for any dependency.