arkorty/B.Tech-Project-III

Mirror of https://github.com/arkorty/B.Tech-Project-III.git, synced 2026-04-19 12:41:48 +00:00. Latest commit: 8be37d3e92 "init" by Arkaprabha Chakraborty, 2026-04-05 00:43:23 +05:30.


ThirdEye 👁️

"Your team already has the answers. They're just trapped in chat."

ThirdEye is a multi-agent conversation intelligence engine that passively monitors Telegram group chats, extracts structured knowledge in real-time, builds a living knowledge graph, detects patterns humans miss, and catches cross-team blind spots before they become crises.

It is not a chatbot. Not a summarizer. Not a search tool. It is a conversation intelligence engine, built to work across multiple teams simultaneously and to detect what falls through the cracks between them.

The Problem
What ThirdEye Does
Key Features
Architecture Overview
The Adaptive Lens System
Cross-Group Intelligence
Agent Pipeline
LLM Provider Stack
Integrations
Knowledge Base Dashboard
Tech Stack
Project Structure
Setup & Installation
Running the System
Environment Variables
API Reference
Data Models
Seeding Demo Data

The Problem

Every day, your team makes hundreds of decisions, discovers bugs, promises clients things, and solves problems — all in group chat. And every day, that intelligence vanishes as messages scroll past.

The Knowledge Graveyard Problem:

Teams generate enormous value in group chats: decisions, recommendations, warnings, technical knowledge, client commitments
Most of this knowledge is effectively lost within days as messages scroll past
When someone asks "why did we choose Redis?" or "what did we promise the client?" — nobody remembers, and nobody can find it
When Team A is blocked by Team B, but Team B doesn't know — nobody connects the dots until it's a crisis

Who feels this pain:

Engineering teams — decisions lost, tech debt invisible, knowledge silos forming
Product teams — feature requests scattered, roadmap drift unnoticed, priority conflicts buried
Client-facing teams — promises untracked, scope creep uncaught, sentiment shifts missed
Any team of 3+ people communicating in chat

What ThirdEye Does

Telegram Groups ──► Signal Extraction ──► ChromaDB Knowledge Graph
                          │                        │
                    8 AI Agents               Pattern Detection
                          │                        │
                    Cross-Group ◄────── Blind Spot Detection
                    Intelligence                   │
                          │                    Dashboard
                    Proactive Alerts          + Telegram /ask

Core Capabilities

Capability	Description
Passive Intelligence Extraction	Silently monitors group messages and extracts structured signals — no commands, no tagging, no forms
Adaptive Context Detection	Auto-detects whether a group is a dev team, product team, client channel, or community, then activates the right extraction ontology
Living Knowledge Graph	Builds a persistent, queryable knowledge base in ChromaDB that expands as new signals are extracted
Natural Language Querying	Any group member can ask questions and get community-sourced answers with source attribution
Pattern Detection	Identifies trends: "Tech debt mentions up 300% this sprint." "Only @raj discusses payment — bus factor = 1."
Proactive Alerts	Sends Telegram alerts for concerning patterns: overdue promises, sentiment shifts, knowledge silos, recurring bugs
Cross-Group Intelligence ⭐	When deployed in multiple groups simultaneously, detects blind spots between teams: blocked handoffs, conflicting decisions, information silos, promise-reality gaps

Key Features

Signal Types Extracted

DevLens (Engineering Teams)

architecture_decision — Technology choices with rationale, alternatives considered, decision owner
tech_debt — Shortcuts, hardcoded values, "will fix later" patterns
knowledge_silo_evidence — Only one person discusses a critical domain (bus factor = 1)
recurring_bug — Same issue mentioned repeatedly across conversations
stack_decision — Technology/framework choices (proposed or decided)
deployment_risk — Risky deployment practices caught in chat
workaround — Temporary fixes being applied repeatedly

ProductLens (Product Teams)

feature_request — Features users or team members are asking for, with frequency
user_pain_point — User difficulties, complaints, confusion with evidence
roadmap_drift — Discussion of topics not on the current plan
priority_conflict — Team members disagreeing on what's most important
metric_mention — Specific numbers, conversion rates, performance data
user_quote — Direct quotes from customers with sentiment
competitor_intel — Mentions of competitor actions or features

ClientLens (Agency / Client Channels)

promise — Commitments made with deadlines (explicit or implicit)
scope_creep — Additional requests introduced casually without formal change
sentiment_signal — Tone changes: growing frustration, formality shifts, positive praise
unanswered_request — Client questions that haven't received responses (with time elapsed)
escalation_risk — Mentions of involving management, expressing deadline concerns
client_decision — Decisions made by the client

MeetLens (Google Meet)

meet_decision — Decisions made during the meeting
meet_action_item — Actions committed to with owner
meet_blocker — Blockers raised in meeting
meet_risk — Risks identified during discussion
meet_summary — AI-generated meeting summary

Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                          INPUT SOURCES                              │
├──────────────┬──────────────┬────────────────┬───────────────────── ┤
│  Telegram    │  Voice Notes │  Documents     │   Google Meet        │
│  Messages    │  (OGG/MP4)   │  (PDF/DOCX/TXT)│   (Chrome Extension) │
└──────┬───────┴──────┬───────┴───────┬────────┴──────┬──────────────┘
       │              │               │               │
       ▼              ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     PROCESSING PIPELINE                             │
│                                                                     │
│  1. Context Detector ──► Lens Assignment (dev/product/client/meet)  │
│  2. Signal Extractor ──► Structured signal extraction (lens-aware)  │
│  3. Classifier Agent ──► Sentiment, urgency, keyword tagging        │
│  4. Knowledge Architect ► Conflict resolution, entity linking       │
│                                                                     │
└─────────────────────────────────────────────────┬───────────────────┘
                                                  │
                                                  ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     CHROMADB KNOWLEDGE GRAPH                        │
│              (Per-group vector collections, Cohere embeddings)      │
└───────────────────────────────────┬─────────────────────────────────┘
                                    │
              ┌─────────────────────┼──────────────────────┐
              ▼                     ▼                      ▼
┌─────────────────────┐  ┌──────────────────────┐  ┌──────────────────┐
│  Pattern Detector   │  │  Cross-Group Analyst  │  │   Query Agent    │
│  (within a group)   │  │  (between all groups) │  │  (RAG pipeline)  │
└──────────┬──────────┘  └──────────┬───────────┘  └────────┬─────────┘
           │                        │                        │
           ▼                        ▼                        ▼
┌──────────────────────────────────────────────────────────────────────┐
│                         OUTPUT LAYER                                 │
├───────────────┬───────────────────┬──────────────┬───────────────── ┤
│  Telegram     │  Next.js          │  Jira        │  Proactive       │
│  Bot Replies  │  Dashboard        │  Tickets     │  Alerts          │
│  (/ask, etc.) │  (Knowledge Graph,│  (Auto-raise │  (Patterns,      │
│               │   Timeline, Maps) │  + Manual)   │   Cross-group)   │
└───────────────┴───────────────────┴──────────────┴──────────────────┘

The Adaptive Lens System

When ThirdEye joins a group, the Context Detector Agent reads the first 50–100 messages and classifies the group into one of four lenses: dev, product, client, or community. Users can override the result at any time with /lens set dev, or combine lenses with /lens set dev+product.

                 ┌─────────────────────────────────────────┐
                 │          Incoming Group Messages         │
                 └────────────────────┬────────────────────┘
                                      │
                                      ▼
                      ┌───────────────────────────┐
                      │   Context Detector Agent   │
                      │   (SambaNova 405B)         │
                      │   Reads first 50-100 msgs  │
                      └───────────────────────────┘
                            │     │     │     │
              ┌─────────────┘     │     │     └─────────────────┐
              ▼                   ▼     ▼                       ▼
       ┌──────────┐        ┌──────────┐ ┌──────────┐    ┌──────────────┐
       │ DevLens  │        │ Product  │ │  Client  │    │  Community   │
       │ (dev)    │        │ Lens     │ │  Lens    │    │  Lens        │
       │          │        │(product) │ │ (client) │    │ (community)  │
       │ Tech debt│        │ Features │ │ Promises │    │ Recommend-   │
       │ Arch dec │        │ Pain pts │ │ Sentiment│    │ ations,      │
       │ Bugs     │        │ Roadmap  │ │ Scope    │    │ Events,      │
       │ Silos    │        │ Metrics  │ │ Escalat. │    │ Local info   │
       └──────────┘        └──────────┘ └──────────┘    └──────────────┘
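The override syntax described above can be parsed with a few lines of code. This is a minimal sketch assuming the documented `/lens set <lens>[+<lens>...]` form; the function name `parse_lens_command` is hypothetical and not taken from ThirdEye's bot code.

```python
# Hypothetical parser for the documented "/lens set <lens>[+<lens>...]" syntax.
VALID_LENSES = ("dev", "product", "client", "community")


def parse_lens_command(text: str) -> list[str]:
    """Return the requested lenses, e.g. '/lens set dev+product' -> ['dev', 'product']."""
    parts = text.split()
    # In groups Telegram may append the bot username: "/lens@ThirdEyeBot set dev"
    command = parts[0].split("@", 1)[0] if parts else ""
    if command != "/lens" or len(parts) != 3 or parts[1] != "set":
        raise ValueError("usage: /lens set <lens>[+<lens>...]")
    lenses: list[str] = []
    for name in parts[2].lower().split("+"):
        if name not in VALID_LENSES:
            raise ValueError(f"unknown lens: {name!r}")
        if name not in lenses:  # keep the user's order, drop duplicates
            lenses.append(name)
    return lenses
```

Combined lenses come back as an ordered list, so the extractor can run each lens's ontology in turn.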

Cross-Group Intelligence

This is the headline feature. When ThirdEye monitors two or more groups, the Cross-Group Analyst Agent activates and looks for gaps that cannot be seen from inside any single group:

What It Detects

Blocked Handoffs

Dev Team:     "Still waiting for design specs" (mentioned 3× this week)
Design Team:  No mention of this deliverable anywhere
→ ALERT: "Likely handoff gap between dev and design teams."

Conflicting Decisions

Product Team: Decided "mobile-first approach" on Mar 15
Dev Team:     Discussing "API-first priority" on Mar 17
→ ALERT: "Potential misalignment between product and dev priorities."

Information Silos

Client Channel: "New regulatory requirement mentioned" on Mar 12
Dev + Product:  No mention of this requirement in either group
→ ALERT: "Critical client information not flowing to internal teams."

Promise-Reality Gaps

Product Team: "We'll have the dashboard ready by Friday for the client"
Dev Team:     "Dashboard completely blocked. No design specs. Week 2 of waiting."
→ ALERT: "Promise to client may not be deliverable. Timeline risk."

Duplicated Effort

Dev Team:     Building an internal analytics dashboard
Design Team:  Simultaneously prototyping a similar dashboard
→ ALERT: "Possible duplicated effort across teams."
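To make the "Blocked Handoffs" rule concrete, here is a deliberately simple keyword heuristic. It is an illustration only: the Cross-Group Analyst described here is LLM-based (SambaNova 405B), and the function name, the "waiting for" phrase match, and the 3-mention threshold are assumptions chosen to mirror the example above.

```python
from collections import Counter


def find_handoff_gaps(
    groups: dict[str, list[str]], topics: list[str], min_mentions: int = 3
) -> list[str]:
    """Flag topics a group keeps 'waiting for' that no other group mentions at all.

    Illustrative keyword heuristic (hypothetical), not ThirdEye's LLM analyst.
    """
    alerts = []
    for group, messages in groups.items():
        waits: Counter[str] = Counter()
        for msg in messages:
            text = msg.lower()
            if "waiting for" in text:
                waits.update(t for t in topics if t.lower() in text)
        others = [g for g in groups if g != group]
        for topic, count in waits.items():
            mentioned_elsewhere = any(
                topic.lower() in m.lower() for g in others for m in groups[g]
            )
            if count >= min_mentions and others and not mentioned_elsewhere:
                alerts.append(
                    f"Likely handoff gap: {group} waiting on '{topic}' "
                    f"({count}x), no mention in {', '.join(others)}"
                )
    return alerts
```

A keyword match like this misses paraphrases ("still blocked on the mockups"), which is one reason to hand the comparison to an LLM instead.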

Agent Pipeline

ThirdEye uses 8 specialized AI agents, each routed to an LLM suited to its task (small, fast models for high-volume classification; larger models for complex reasoning):

Message arrives from ANY Telegram group
         │
         ▼
┌─────────────────────────────┐
│  1. Context Detector Agent  │  Auto-classifies group type (one-time + periodic)
│     (SambaNova 405B)        │  Activates appropriate lens ontology
└─────────────┬───────────────┘
              │
              ▼
┌─────────────────────────────┐
│  2. Signal Extractor Agent  │  Processes batched messages (every 5-10 msgs)
│     (Groq Llama 3.3 70B)   │  Extracts domain-specific signals per active lens
└─────────────┬───────────────┘
              │
         ┌────┴────┐
         ▼         ▼
┌──────────────┐  ┌──────────────────────┐
│  3. Classif. │  │  4. Entity           │
│     Agent    │  │     Resolver Agent   │
│  (Groq 8B)   │  │  (Groq 70B)          │
│  Tags type,  │  │  Resolves "this bug",│
│  sentiment,  │  │  "same issue" refs   │
│  urgency     │  │  across messages     │
└──────┬───────┘  └────────┬─────────────┘
       └────────┬───────────┘
                ▼
┌─────────────────────────────┐
│  5. Knowledge Architect     │  Builds + maintains ChromaDB knowledge graph
│     Agent                   │  Resolves conflicts, tracks temporal changes
│  (SambaNova 405B /          │  Links entities across conversations
│   OpenRouter Nemotron 120B) │
└─────────────┬───────────────┘
              │
         ┌────┴────┐
         ▼         ▼
┌──────────────┐  ┌──────────────────┐
│  6. Pattern  │  │  7. Cross-Group   │  ⭐ THE KILLER AGENT
│     Detector │  │     Analyst Agent │  Compares intelligence across
│     Agent    │  │  (SambaNova 405B) │  ALL deployed groups
│  (OpenRouter │  │                   │  Detects blind spots, conflicts,
│   Nemotron)  │  │                   │  information silos
└──────┬───────┘  └────────┬──────────┘
       └────────┬───────────┘
                ▼
┌─────────────────────────────┐
│  8. Query Agent             │  Handles /ask queries via agentic RAG
│  (Groq 70B — fast response) │  over ChromaDB + web search fallback
└─────────────────────────────┘

LLM Provider Stack

ThirdEye uses 5 free LLM providers with automatic fallback routing — zero API costs:

Groq (fastest) → Cerebras (fast) → SambaNova (405B) → OpenRouter (27 models) → Google AI Studio

Provider	Key Models	Daily Capacity	Cost
Groq	Llama 3.3 70B, Llama 3.1 8B, Whisper v3	~6,000 RPD	Free
Cerebras	Llama 3.3 70B, Llama 3.1 8B	~2,000 RPD	Free
SambaNova	Llama 3.1 405B, Llama 3.3 70B	~1,000 RPD	Free
OpenRouter	Nemotron 120B, GPT-OSS 120B, 25+ models	~5,400 RPD	Free
Cohere	embed-english-v3.0 (embeddings), Rerank 3.5	1,000/month	Free
Google AI Studio	Gemini 2.5 Flash/Pro	~50 RPD	Free

Combined daily capacity: ~10,000–15,000 requests for $0
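The fallback chain above amounts to "try providers in order until one succeeds". A minimal sketch, assuming each provider is wrapped as a plain callable; the `(name, call)` pairs and `AllProvidersFailed` are hypothetical stand-ins for the real SDK calls in `providers.py`.

```python
from typing import Callable, Sequence

# Each provider is (name, call); `call` stands in for a real SDK request (hypothetical).
Provider = tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def complete_with_fallback(providers: Sequence[Provider], prompt: str) -> tuple[str, str]:
    """Return (provider_name, completion) from the first provider that succeeds."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # rate limit, timeout, server error, ...
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors) or "no providers configured")
```

Returning the provider name alongside the completion makes it easy to log which tier actually served each request.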

Agent → Model Mapping

Agent	Task Type	Primary	Fallback
Context Detector	Complex reasoning	SambaNova / Llama 3.1 405B	OpenRouter / Nemotron 120B
Signal Extractor	Structured extraction	Groq / Llama 3.3 70B	Cerebras / Llama 3.3 70B
Classifier	Fast classification	Groq / Llama 3.1 8B	Cerebras / Llama 3.1 8B
Pattern Detector	Trend analysis	OpenRouter / Nemotron 120B	SambaNova / 405B
Cross-Group Analyst	Cross-referencing	SambaNova / Llama 3.1 405B	OpenRouter / GPT-OSS 120B
Query Agent	Fast Q&A	Groq / Llama 3.3 70B	Cerebras / Llama 3.3 70B
Voice Transcriber	Audio-to-text	Groq / Whisper Large v3	—
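One way to combine the mapping table with the global chain is to try each agent's primary, then its listed fallback, then the remaining providers. This ordering rule and the short provider labels are assumptions for illustration; the table does not say how ThirdEye continues after the listed fallback.

```python
from typing import Optional

# Illustrative labels for the providers in the table above (hypothetical identifiers).
GLOBAL_CHAIN = ["groq", "cerebras", "sambanova", "openrouter", "google"]

# agent -> (primary provider, listed fallback or None)
AGENT_ROUTES: dict[str, tuple[str, Optional[str]]] = {
    "context_detector": ("sambanova", "openrouter"),
    "signal_extractor": ("groq", "cerebras"),
    "classifier": ("groq", "cerebras"),
    "pattern_detector": ("openrouter", "sambanova"),
    "cross_group_analyst": ("sambanova", "openrouter"),
    "query_agent": ("groq", "cerebras"),
    "voice_transcriber": ("groq", None),  # Whisper: no fallback listed
}


def provider_order(agent: str) -> list[str]:
    """Primary, then listed fallback, then the rest of the global chain (assumed rule)."""
    primary, fallback = AGENT_ROUTES[agent]
    if fallback is None:
        return [primary]  # audio transcription cannot fall through to text LLMs
    order = [primary, fallback]
    order += [p for p in GLOBAL_CHAIN if p not in order]
    return order
```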

Integrations

Telegram Bot

The bot is the primary input interface. It operates passively — no need to tag it or use commands for intelligence extraction to happen.

Commands

/start           — Welcome message and command reference
/ask [question]  — Query the knowledge base with natural language
/search [query]  — Explicit web search via Tavily
/digest          — Get an intelligence summary of the group
/lens [mode]     — Set or check detection mode (dev/product/client/community)
/flush           — Force-process the current message buffer immediately
/debug           — Show group config and signal count

/jira            — Show Jira integration status
/jirastatus      — Get latest status of a ticket (e.g. /jirastatus ENG-42)
/jirasearch      — Search Jira tickets by keyword
/jiraraised      — List all tickets raised by ThirdEye for this group
/jirawatch       — Watch a signal and auto-raise if it recurs

/voicelog        — List all voice note decisions extracted from this group
/meetsum         — Summarize the most recent Google Meet meeting
/meetask [q]     — Ask a question about meeting content
/meetmatch       — Find connections between meeting content and Telegram discussions
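python-telegram-bot's `CommandHandler` already splits arguments into `context.args`; the sketch below shows the equivalent parsing and validation for `/jirastatus` in plain Python. The issue-key pattern is an assumption (Jira project key formats are configurable per site), and both function names are hypothetical.

```python
import re

# Assumed key format: project key (uppercase letter + letters/digits/_) + "-" + number.
JIRA_KEY = re.compile(r"[A-Z][A-Z0-9_]+-[1-9][0-9]*")


def parse_command(text: str) -> tuple[str, list[str]]:
    """Split '/cmd@BotName arg1 arg2' into ('cmd', ['arg1', 'arg2'])."""
    parts = text.strip().split()
    if not parts or not parts[0].startswith("/"):
        raise ValueError("not a bot command")
    name = parts[0][1:].split("@", 1)[0].lower()
    return name, parts[1:]


def jirastatus_key(text: str) -> str:
    """Extract and validate the issue key from '/jirastatus ENG-42' (hypothetical helper)."""
    name, args = parse_command(text)
    if name != "jirastatus" or len(args) != 1:
        raise ValueError("usage: /jirastatus <ISSUE-KEY>")
    key = args[0].upper()
    if not JIRA_KEY.fullmatch(key):
        raise ValueError(f"not a Jira issue key: {args[0]!r}")
    return key
```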

Passive Capabilities (no command needed)

Text messages → batched every 5–10 messages → signal extraction pipeline
Documents (PDF, DOCX, TXT, MD, CSV, JSON, LOG up to 10MB) → auto-ingested into knowledge graph
Voice notes (OGG) and video notes (MP4) → transcribed by Groq Whisper → signal extracted
URLs in messages → fetched → summarized → stored as link_knowledge signals
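The "batched every 5–10 messages" behavior is governed by `BATCH_SIZE` and `BATCH_TIMEOUT_SECONDS` (see Environment Variables). A minimal per-group batcher that flushes on size or idleness might look like this; the class and method names are hypothetical, and the injectable `clock` exists only to make the logic testable.

```python
import time
from typing import Callable, Optional


class MessageBatcher:
    """Per-group buffer that flushes on size (BATCH_SIZE) or idleness (BATCH_TIMEOUT_SECONDS)."""

    def __init__(self, batch_size: int = 5, timeout_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.batch_size = batch_size
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self._buffers: dict[str, list[str]] = {}
        self._last_message_at: dict[str, float] = {}

    def add(self, group_id: str, text: str) -> Optional[list[str]]:
        """Buffer a message; return the full batch once it reaches batch_size."""
        buffer = self._buffers.setdefault(group_id, [])
        buffer.append(text)
        self._last_message_at[group_id] = self.clock()
        return self.flush(group_id) if len(buffer) >= self.batch_size else None

    def flush(self, group_id: str) -> list[str]:
        """Empty a group's buffer immediately (what /flush would trigger)."""
        self._last_message_at.pop(group_id, None)
        return self._buffers.pop(group_id, [])

    def idle_groups(self) -> list[str]:
        """Groups whose buffer has been idle for at least timeout_seconds."""
        now = self.clock()
        return [g for g, t in self._last_message_at.items()
                if now - t >= self.timeout_seconds]
```

A background task would call `idle_groups()` periodically and send each returned group's `flush()` result to signal extraction.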

Google Meet Chrome Extension

A Chrome extension that captures live Google Meet transcripts and feeds them into ThirdEye in real-time.

How it works:

Extension uses the browser's Web Speech API (free, no API key) for real-time transcription
Every 30 seconds, a transcript chunk is sent to the ThirdEye FastAPI backend
The backend's MeetIngestor agent extracts meeting signals (decisions, action items, blockers, risks)
All signals are stored in ChromaDB alongside your Telegram/document/voice signals
After the meeting, /meetsum and /meetask in Telegram give instant access to meeting intelligence
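Because the extension posts chunks over HTTP, the backend needs to reject requests that lack `MEET_INGEST_SECRET`. The sketch below shows a constant-time secret check plus basic payload validation. The header name `x-meet-secret` and the fields `meeting_id`, `chunk_index` and `text` are hypothetical; check the extension and `backend/api/routes.py` for the real contract.

```python
import hmac
import json

# Hypothetical header and field names, for illustration only.
SECRET_HEADER = "x-meet-secret"
REQUIRED_FIELDS = ("meeting_id", "chunk_index", "text")


def verify_meet_chunk(headers: dict[str, str], body: bytes, secret: str) -> dict:
    """Authenticate a transcript chunk with the shared secret, then validate its JSON."""
    normalized = {k.lower(): v for k, v in headers.items()}  # HTTP headers are case-insensitive
    provided = normalized.get(SECRET_HEADER, "")
    # compare_digest avoids leaking the secret through timing differences
    if not secret or not hmac.compare_digest(provided.encode(), secret.encode()):
        raise PermissionError("invalid or missing ingest secret")
    payload = json.loads(body)
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return payload
```

An empty configured secret is treated as "reject everything" rather than "accept everything", which fails safe if the variable is left unset.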

Signal types extracted from meetings:

meet_decision — Formal decisions made during the call
meet_action_item — "X will do Y by Z" commitments
meet_blocker — Blockers raised during discussion
meet_risk — Risks identified in conversation
meet_summary — Full AI-generated meeting summary

Cross-referencing: /meetmatch connects meeting content to your Telegram history — surfacing prior discussions, related signals, and any contradictions between what was decided in the meeting vs. what's in the group chat.

Jira Integration

ThirdEye connects to Atlassian Jira Cloud to convert extracted signals into actionable tickets.

How Signals Become Tickets

Signal in ChromaDB ──► Jira Agent ──► LLM generates well-formed ticket
                                               │
                              ┌────────────────┼────────────────────┐
                              ▼                ▼                    ▼
                         Issue Type       Priority           Description
                         (Bug/Task/       (based on         (summary +
                          Story)          severity)          raw quote +
                                                             entity context)
                                               │
                                               ▼
                                      Jira REST API v3
                                               │
                                               ▼
                              ┌────────────────┴────────────────┐
                              ▼                                  ▼
                     Ticket created                   `jira_raised` tracking
                     in Atlassian                     signal stored in
                     Cloud                            ChromaDB (prevents
                                                      duplicates)
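The ticket-building step can be sketched as a function that turns a signal into the JSON body for Jira Cloud's `POST /rest/api/3/issue`. In API v3 the `description` field must be an Atlassian Document Format (ADF) document rather than plain text. The signal dict shape, the severity-to-priority and type mappings, and the in-memory `raised_ids` set (standing in for the `jira_raised` tracking signal) are all assumptions.

```python
from typing import Callable, Optional

# Assumed mappings: priority names depend on the Jira project's priority scheme.
SEVERITY_TO_PRIORITY = {"critical": "Highest", "high": "High", "medium": "Medium", "low": "Low"}
ISSUE_TYPE_BY_SIGNAL = {"recurring_bug": "Bug", "feature_request": "Story"}


def _paragraph(text: str) -> dict:
    """One Atlassian Document Format (ADF) paragraph node."""
    return {"type": "paragraph", "content": [{"type": "text", "text": text}]}


def build_issue_payload(signal: dict, project_key: str = "ENG", default_type: str = "Task") -> dict:
    """Body for POST /rest/api/3/issue built from a (hypothetical) signal dict."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": signal["summary"][:255],  # Jira caps summaries at 255 characters
            "issuetype": {"name": ISSUE_TYPE_BY_SIGNAL.get(signal["type"], default_type)},
            "priority": {"name": SEVERITY_TO_PRIORITY.get(signal.get("severity", ""), "Medium")},
            "description": {  # API v3 expects an ADF document here
                "type": "doc",
                "version": 1,
                "content": [_paragraph(signal["summary"]),
                            _paragraph(f"Quote: {signal['raw_quote']}")],
            },
        }
    }


def raise_once(signal: dict, raised_ids: set[str],
               create_issue: Callable[[dict], str]) -> Optional[str]:
    """Create a ticket unless this signal was already raised (stand-in for `jira_raised`)."""
    if signal["id"] in raised_ids:
        return None
    issue_key = create_issue(build_issue_payload(signal))
    raised_ids.add(signal["id"])
    return issue_key
```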

Commands

/ask raise jira on [topic]               # Semantic search → bulk raise
/ask assign jira ticket for [X] to [Name] # Raise + assign in one command
/jirastatus ENG-42                        # Live ticket status from Jira
/jiraraised                               # List all tickets raised by ThirdEye

Auto-Raise Mode

When JIRA_AUTO_RAISE=true, ThirdEye automatically raises Jira tickets when:

A signal with severity ≥ JIRA_AUTO_RAISE_SEVERITY is extracted
A recurring bug pattern is detected (3+ occurrences)
A critical blocker is identified in a meeting
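The three auto-raise conditions reduce to a small predicate. This sketch assumes a four-level severity scale (`low` < `medium` < `high` < `critical`); the scale and the function name are hypothetical. On this scale a critical meeting blocker always passes the severity rule, so it needs no separate check.

```python
# Assumed severity scale; ThirdEye's actual levels may differ.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def should_auto_raise(signal: dict, threshold: str = "high",
                      recurrence_count: int = 0, enabled: bool = False) -> bool:
    """Mirror the documented JIRA_AUTO_RAISE rules for one extracted signal."""
    if not enabled:  # JIRA_AUTO_RAISE=false
        return False
    rank = SEVERITY_RANK.get(signal.get("severity", "low"), 0)
    if rank >= SEVERITY_RANK[threshold]:  # also covers critical meeting blockers
        return True
    return signal.get("type") == "recurring_bug" and recurrence_count >= 3
```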

Voice Message Intelligence

The differentiator: many chat intelligence products are text-only. ThirdEye also extracts signals from voice messages.

How it works:

Voice/Video Note Received
         │
         ▼
Download from Telegram (OGG/MP4)
         │
         ▼
Groq Whisper Large v3 Transcription
(7,200 audio-seconds/hour on the free tier, i.e. ~240 thirty-second voice notes/hour)
         │
         ▼
Transcript → Signal Extraction Pipeline
(same as text messages)
         │
         ▼
Signals stored with full voice attribution:
- speaker: @username
- voice_duration: seconds
- voice_language: auto-detected
- voice_file_id: Telegram file reference
         │
         ▼
/voicelog shows all voice-sourced decisions
/ask cites voice notes: "Based on what @Raj said in a voice note on March 14th..."
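The gating and attribution steps above can be sketched as three small helpers. The duration defaults mirror `VOICE_MIN_DURATION_SECONDS` and `VOICE_MAX_DURATION_SECONDS` from the environment config, and 7,200 is the free-tier audio budget quoted above; the function names themselves are hypothetical.

```python
WHISPER_FREE_SECONDS_PER_HOUR = 7_200  # free-tier audio budget quoted above


def should_transcribe(duration_s: int, min_s: int = 2, max_s: int = 300) -> bool:
    """Skip sub-2-second clips and notes longer than 5 minutes (env defaults)."""
    return min_s <= duration_s <= max_s


def fits_hourly_budget(seconds_used: int, duration_s: int,
                       budget: int = WHISPER_FREE_SECONDS_PER_HOUR) -> bool:
    """True if transcribing this note keeps the hour's audio usage within budget."""
    return seconds_used + duration_s <= budget


def voice_metadata(username: str, duration_s: int, language: str, file_id: str) -> dict:
    """Attribution fields stored alongside each voice-derived signal (hypothetical helper)."""
    return {
        "speaker": "@" + username.lstrip("@"),
        "voice_duration": duration_s,
        "voice_language": language,
        "voice_file_id": file_id,
    }
```

At an average of 30 seconds per note, the 7,200-second budget works out to 7,200 / 30 = 240 notes per hour.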

/voicelog examples:

/voicelog                    → All voice note signals
/voicelog decisions          → Only decisions from voice
/voicelog @raj               → All voice notes from Raj
/voicelog api                → Voice notes mentioning "api"
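The four `/voicelog` forms map naturally onto one filter each: `@name` selects a speaker, a known word like `decisions` selects a signal type, and anything else is a keyword. A minimal sketch; only `decisions` is documented as a type filter, and the filter keys and signal dict shape are hypothetical.

```python
# Hypothetical: only "decisions" is documented as a type filter; extend as needed.
TYPE_FILTERS = {"decisions": "decision"}


def parse_voicelog_args(args: list[str]) -> dict[str, str]:
    """Map /voicelog arguments to one filter: speaker, signal type, or keyword."""
    if not args:
        return {}
    first = args[0].lower()
    if first.startswith("@"):
        return {"speaker": first}
    if first in TYPE_FILTERS:
        return {"type_contains": TYPE_FILTERS[first]}
    return {"keyword": " ".join(args).lower()}


def filter_voice_signals(signals: list[dict], criteria: dict[str, str]) -> list[dict]:
    """Apply the parsed filter to voice-sourced signal dicts."""
    def keep(s: dict) -> bool:
        if "speaker" in criteria:
            return s["speaker"].lower() == criteria["speaker"]
        if "type_contains" in criteria:
            return criteria["type_contains"] in s["type"]
        if "keyword" in criteria:
            return criteria["keyword"] in s["text"].lower()
        return True  # no arguments: all voice signals
    return [s for s in signals if keep(s)]
```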

Document & Link Ingestion

Document Types Supported: PDF, DOCX, TXT, MD, CSV, JSON, LOG (up to 10MB)

Processing Pipeline:

File shared in Telegram group
         │
         ▼
Bot downloads to temp directory
         │
         ▼
Text extraction (PyPDF2 / python-docx / plain text)
         │
         ▼
Intelligent chunking (1,500 char chunks, 200 char overlap,
paragraph boundaries respected)
         │
         ▼
Each chunk stored as `document_knowledge` signal in ChromaDB
with metadata: filename, page number, shared_by, timestamp
         │
         ▼
Queryable alongside chat signals via /ask
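The chunking step (1,500-character chunks, 200-character overlap, paragraph boundaries respected) can be sketched as follows. This is one plausible reading of that rule, not the code in `document_ingestor.py`: whole paragraphs are packed greedily, the last 200 characters of each chunk are carried into the next, and only paragraphs longer than 1,500 characters are split mid-text.

```python
import re


def chunk_text(text: str, max_chars: int = 1500, overlap: int = 200) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars, carrying `overlap` chars forward."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    pieces: list[str] = []
    step = max_chars - overlap
    for para in paragraphs:
        if len(para) <= max_chars:
            pieces.append(para)
        else:  # oversized paragraph: hard-split into overlapping windows
            pieces.extend(para[i:i + max_chars] for i in range(0, len(para) - overlap, step))
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current}\n\n{piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate
            continue
        chunks.append(current)
        tail = current[-overlap:] if overlap else ""
        current = f"{tail}\n\n{piece}" if tail else piece
        if len(current) > max_chars:  # no room for the overlap alongside this piece
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Every chunk stays within the size limit; when a piece is too large to share a chunk with the overlap tail, the overlap is dropped rather than the limit exceeded.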

URL Processing: When a URL is detected in any message, ThirdEye:

Checks against skip-list (images, downloads, social embeds, Telegram links)
Fetches the page content with BeautifulSoup parsing
Summarizes with an LLM (2-4 sentence summary)
Stores as link_knowledge signal with source attribution
All happens in the background — no delay to the main message pipeline
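The first step, finding URLs and applying the skip-list, can be sketched with the standard library. The hosts and extensions below are hypothetical examples of the four categories named above (images, downloads, social embeds, Telegram links), not ThirdEye's actual list.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")
# Hypothetical skip-list entries standing in for the categories named above.
SKIP_HOSTS = {"t.me", "telegram.me", "telegram.org",                 # Telegram links
              "twitter.com", "x.com", "instagram.com", "tiktok.com"}  # social embeds
SKIP_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg",  # images
                   ".zip", ".exe", ".dmg", ".apk", ".mp4")            # downloads


def fetchable_urls(text: str) -> list[str]:
    """Return unique URLs worth fetching and summarizing, in order of appearance."""
    urls: list[str] = []
    for raw in URL_PATTERN.findall(text):
        url = raw.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        parsed = urlparse(url)
        host = (parsed.hostname or "").removeprefix("www.")
        if host in SKIP_HOSTS or parsed.path.lower().endswith(SKIP_EXTENSIONS):
            continue
        if url not in urls:
            urls.append(url)
    return urls
```

The surviving URLs would then be fetched and summarized off the main path, for example in a background task, so message handling is not delayed.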

Knowledge Base Dashboard

A Next.js dashboard (located in dashboard/) that provides a full visual intelligence console.

Pages

Page	Description
Knowledge Base	Interactive network graph showing groups, signal clusters, and real-time data flow. Split-panel with Knowledge Browser for topic/date exploration
Timeline	Cross-group signal timeline with full filter support (severity, lens, type, date range)
Patterns	Pattern detection results with severity badges and recommendations
Cross-Group	Cross-group intelligence insights — the blind spot detector
Jira	Jira ticket manager — view, create, and track tickets raised by ThirdEye
Meetings	Google Meet session list with per-meeting signal breakdown and transcript viewer

Knowledge Base Network Map

The central visualization shows your intelligence network as a live graph:

Core Node (violet diamond) — the ThirdEye Neural Engine
Group Nodes (colored circles) — each monitored Telegram group, color-coded by lens type
Signal Satellites — top 3 signal types per group orbiting each group node
Data Flow Lines — animated neon traveling packets showing live data flow from groups to the core engine
Cross-edges — connections between adjacent groups when cross-group intelligence is detected
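The "top 3 signal types per group" that become satellites reduce to a per-group count. The dashboard itself is TypeScript; this sketch shows the same aggregation in Python, assuming each signal dict carries `group_id` and `type` keys.

```python
from collections import Counter


def signal_satellites(signals: list[dict], k: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Top-k signal types (with counts) for each group, i.e. the satellite nodes."""
    per_group: dict[str, Counter] = {}
    for s in signals:
        per_group.setdefault(s["group_id"], Counter())[s["type"]] += 1
    return {group: counts.most_common(k) for group, counts in per_group.items()}
```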

Knowledge Browser

A structured topic/date reference panel built alongside the graph:

Group selector — browse any monitored group
Date range filter — scope the view to specific time periods
AI-clustered topic chips — topics derived from keyword frequency across all signals
Day-by-day timeline — vertical scroll through days, each signal expandable to show raw quote, entities, and keywords
Footer stats — total signals · topics count · days span
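The day-by-day timeline and footer stats can be computed in one pass. This sketch assumes ISO-8601 `timestamp` strings and a `keywords` list on each signal, and counts one topic per distinct keyword (case-insensitive); the actual topic clustering may differ. "Days span" is taken to be inclusive of both the first and last day.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Optional


def browse_by_day(signals: list[dict], date_from: Optional[date] = None,
                  date_to: Optional[date] = None) -> tuple[dict[date, list[dict]], dict[str, int]]:
    """Group signals by calendar day within [date_from, date_to] and compute footer stats."""
    days: dict[date, list[dict]] = defaultdict(list)
    for s in signals:
        day = datetime.fromisoformat(s["timestamp"]).date()
        if (date_from and day < date_from) or (date_to and day > date_to):
            continue
        days[day].append(s)
    ordered = dict(sorted(days.items()))
    topics = {kw.lower() for sigs in ordered.values() for s in sigs for kw in s.get("keywords", [])}
    stats = {
        "total_signals": sum(len(v) for v in ordered.values()),
        "topics": len(topics),  # assumption: one topic per distinct keyword
        "days_span": (max(ordered) - min(ordered)).days + 1 if ordered else 0,
    }
    return ordered, stats
```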

Tech Stack

Backend

Component	Technology
Bot Framework	python-telegram-bot 21.9
API Server	FastAPI 0.115 + Uvicorn
Vector Database	ChromaDB 1.0 (persistent, local)
Embeddings	Cohere embed-english-v3.0 (primary) + sentence-transformers all-MiniLM-L6-v2 (fallback)
LLM Routing	Custom multi-provider router (openai SDK)
Audio Transcription	Groq Whisper Large v3
Web Search	Tavily (1,000 searches/month free)
HTTP Client	httpx (async)
HTML Parsing	BeautifulSoup4
Document Parsing	PyPDF2 + python-docx
Data Validation	Pydantic v2

Frontend

Component	Technology
Framework	Next.js 15 (App Router)
UI	Tailwind CSS
Graph Visualization	Custom SVG with CSS animations
State	React hooks (no external state library)
API Client	Native fetch (proxied via Next.js rewrites)

Project Structure

binary_hack/
│
├── backend/                          # Python backend
│   ├── api/
│   │   └── routes.py                 # FastAPI endpoints
│   │
│   ├── agents/
│   │   ├── signal_extractor.py       # Core signal extraction (lens-aware)
│   │   ├── classifier.py             # Sentiment, urgency, keyword tagging
│   │   ├── context_detector.py       # Auto-detect group type from messages
│   │   ├── query_agent.py            # RAG pipeline context formatter
│   │   ├── pattern_detector.py       # Detect recurring patterns in signals
│   │   ├── cross_group_analyst.py    # Cross-group blind spot detection
│   │   ├── jira_agent.py             # Jira ticket generation + bulk raise
│   │   ├── document_ingestor.py      # PDF/DOCX/TXT extraction + chunking
│   │   ├── voice_handler.py          # Voice download + pipeline orchestration
│   │   ├── voice_transcriber.py      # Groq Whisper transcription
│   │   ├── meet_ingestor.py          # Google Meet transcript chunk processing
│   │   ├── meet_cross_ref.py         # Meeting ↔ chat signal cross-referencing
│   │   ├── web_search.py             # Tavily web search integration
│   │   ├── link_fetcher.py           # URL extraction, fetch, summarize
│   │   └── json_utils.py             # Robust JSON extraction from LLM outputs
│   │
│   ├── bot/
│   │   ├── bot.py                    # Telegram bot entry point + all handlers
│   │   └── commands.py               # /voicelog command
│   │
│   ├── db/
│   │   ├── chroma.py                 # ChromaDB CRUD operations
│   │   ├── embeddings.py             # Cohere + fallback embedding client
│   │   └── models.py                 # Pydantic models: Signal, Pattern, etc.
│   │
│   ├── integrations/
│   │   └── jira_client.py            # Atlassian REST API v3 client
│   │
│   ├── pipeline.py                   # Core orchestration: batch → signals → store → query
│   ├── providers.py                  # Multi-provider LLM router with fallback
│   └── config.py                     # All env vars via python-dotenv
│
├── dashboard/                        # Next.js frontend
│   ├── app/
│   │   ├── knowledge-base/           # Network graph + Knowledge Browser
│   │   │   ├── page.tsx
│   │   │   ├── NetworkMap.tsx        # SVG network visualization with animated edges
│   │   │   ├── KnowledgeBrowser.tsx  # Topic/date knowledge reference panel
│   │   │   ├── RightPanelTabs.tsx    # Tab switcher: Knowledge ↔ Query
│   │   │   ├── EntityPanel.tsx       # Deep query panel
│   │   │   ├── FloatingControls.tsx  # Graph controls HUD
│   │   │   ├── SystemTickerKnowledge.tsx
│   │   │   └── knowledge.css
│   │   │
│   │   ├── timeline/                 # Cross-group signal timeline
│   │   ├── jira/                     # Jira ticket management
│   │   ├── meetings/                 # Google Meet session viewer
│   │   ├── components/               # Shared UI components
│   │   └── lib/
│   │       └── api.ts                # Typed API client for all endpoints
│   │
│   └── next.config.ts                # Proxies /api → localhost:8000
│
├── chroma_db/                        # ChromaDB persistent storage (git-ignored)
├── docs/                             # Architecture docs and implementation guides
│   ├── sot.md                        # Single Source of Truth (full system design)
│   ├── implementation.md             # Milestone-by-milestone build guide
│   ├── additional.md                 # Milestones 11-13 (docs, web search, links)
│   ├── Voice_milestones.md           # Milestones 20-22 (voice intelligence)
│   ├── jira_milestones.md            # Milestones 17-19 (Jira integration)
│   ├── meet_extension.md             # Milestones 14-16 (Google Meet)
│   └── BATCHING_CONFIG.md            # Message batching configuration guide
│
├── .env                              # Environment variables (not in git)
├── requirements.txt                  # Python dependencies
└── README.md                         # This file

Setup & Installation

Prerequisites

Python 3.11+
Node.js 18+
A Telegram account + bot token (from @BotFather)
API keys for the free-tier services listed in Step 3 (no credit card required)

Step 1 — Clone & install Python dependencies

git clone <repo-url>
cd binary_hack
pip install -r requirements.txt

Step 2 — Install dashboard dependencies

cd dashboard
npm install
cd ..   # back to the project root for the remaining steps

Step 3 — Get API keys (all free)

Create an API key for each service below:

Service	URL	What you need
Groq	https://console.groq.com	API Key → used for Llama 3.3 70B + Whisper
Cerebras	https://cloud.cerebras.ai	API Key → fallback LLM provider
SambaNova	https://cloud.sambanova.ai	API Key → Llama 3.1 405B (complex reasoning)
OpenRouter	https://openrouter.ai	API Key → 27 free models, broad fallback pool
Cohere	https://dashboard.cohere.com	API Key → embeddings for vector search
Telegram Bot	@BotFather	`/newbot` → bot token
Tavily	https://tavily.com	API Key → web search (1,000/month free)
Google AI Studio	https://aistudio.google.com	API Key → Gemini (last resort fallback)

Step 4 — Configure environment

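Create a .env file in the project root and fill in your keys (the full list is under Environment Variables). If the repository provides a .env.example template, start from it:

cp .env.example .env
# Edit .env with your keys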

Step 5 — Disable Telegram privacy mode (CRITICAL)

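By default, Telegram's privacy mode lets a bot in a group see only commands and replies directed at it. To let ThirdEye read every group message: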

Open Telegram → Search @BotFather
Send /setprivacy
Select your bot
Choose Disable (if the bot is already in a group, remove it and add it back so the new setting applies)

Step 6 — Seed demo data (optional)

To populate ChromaDB with realistic demo data across 3 simulated groups (Dev team, Product team, Client channel):

python scripts/seed_demo.py

This seeds ~30 signals across 3 groups and runs cross-group analysis to demonstrate the blind spot detection.

Running the System

Option A — Run everything together (recommended)

python run_all.py

This starts both the Telegram bot and the FastAPI server simultaneously.

Option B — Run services separately

# Terminal 1: FastAPI backend
python run_api.py
# Runs at http://localhost:8000

# Terminal 2: Telegram bot
python run_bot.py

# Terminal 3: Next.js dashboard
cd dashboard
npm run dev
# Runs at http://localhost:3000

Verify the system is running

curl http://localhost:8000/health
# → {"status":"ok","service":"thirdeye"}

curl http://localhost:8000/api/groups
# → {"groups": [...]}

Environment Variables

# ── Telegram ──────────────────────────────────────────────────────
TELEGRAM_BOT_TOKEN=          # From @BotFather

# ── LLM Providers (all free, no credit card) ──────────────────────
GROQ_API_KEY=                # console.groq.com — Llama 3.3 70B + Whisper
GROQ_API_KEY_2=              # Optional: second key for rate limit rotation
GROQ_API_KEY_3=              # Optional: third key for rate limit rotation
CEREBRAS_API_KEY=            # cloud.cerebras.ai — fast inference
SAMBANOVA_API_KEY=           # cloud.sambanova.ai — 405B reasoning
OPENROUTER_API_KEY=          # openrouter.ai — 27 free models
GEMINI_API_KEY=              # aistudio.google.com — last resort fallback

# ── Embeddings ────────────────────────────────────────────────────
COHERE_API_KEY=              # dashboard.cohere.com — vector embeddings

# ── Storage ───────────────────────────────────────────────────────
CHROMA_DB_PATH=./chroma_db   # Where ChromaDB persists to disk

# ── Message Batching ──────────────────────────────────────────────
BATCH_SIZE=5                 # Messages to buffer before processing
BATCH_TIMEOUT_SECONDS=60     # Flush buffer after this many idle seconds

# ── Web Search ────────────────────────────────────────────────────
TAVILY_API_KEY=              # tavily.com — web search fallback (1,000/month free)

# ── Google Meet Extension ─────────────────────────────────────────
MEET_INGEST_SECRET=          # Shared secret for extension authentication
MEET_DEFAULT_GROUP_ID=meet_sessions
ENABLE_MEET_INGESTION=true
MEET_CROSS_REF_GROUPS=       # Comma-separated group IDs to cross-reference

# ── Jira Integration ──────────────────────────────────────────────
JIRA_BASE_URL=               # https://your-org.atlassian.net
JIRA_EMAIL=                  # Your Atlassian email
JIRA_API_TOKEN=              # Atlassian API token (not your password)
JIRA_DEFAULT_PROJECT=ENG     # Default project key
JIRA_DEFAULT_ISSUE_TYPE=Task
ENABLE_JIRA=true
JIRA_AUTO_RAISE=false        # Auto-raise critical signals as Jira tickets
JIRA_AUTO_RAISE_SEVERITY=high

# ── Voice Message Intelligence ────────────────────────────────────
ENABLE_VOICE_TRANSCRIPTION=true
VOICE_MAX_DURATION_SECONDS=300   # Skip voice notes longer than 5 minutes
VOICE_MIN_DURATION_SECONDS=2     # Skip sub-2-second recordings
VOICE_LANGUAGE=                  # Leave empty for Whisper auto-detect
VOICE_STORE_TRANSCRIPT=true      # Store raw transcript in ChromaDB
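Every variable above arrives as a plain string, so whatever reads it has to coerce types. The sketch below does this with the standard library. It also shows how the two batching variables could drive a buffer that flushes at BATCH_SIZE messages or after BATCH_TIMEOUT_SECONDS of silence. Settings, load_settings, MessageBatcher, the accepted boolean spellings and the inclusive voice-duration bounds are all assumptions for illustration. They are not ThirdEye's actual config module.

```python
import os
import time
from dataclasses import dataclass, field


def env_bool(env, name, default=False):
    """Interpret common truthy spellings; unset or empty falls back to default."""
    raw = env.get(name, "").strip().lower()
    if not raw:
        return default
    return raw in {"true", "1", "yes", "on"}


def env_list(env, name):
    """Split a comma-separated variable such as MEET_CROSS_REF_GROUPS, dropping blanks."""
    return [item.strip() for item in env.get(name, "").split(",") if item.strip()]


@dataclass
class Settings:
    batch_size: int = 5
    batch_timeout_seconds: float = 60.0
    groq_keys: list = field(default_factory=list)
    meet_cross_ref_groups: list = field(default_factory=list)
    enable_voice: bool = True
    voice_min_seconds: int = 2
    voice_max_seconds: int = 300

    def should_transcribe(self, duration_seconds):
        """Apply the voice duration bounds (treated as inclusive here)."""
        return (self.enable_voice
                and self.voice_min_seconds <= duration_seconds <= self.voice_max_seconds)


def load_settings(env=None):
    """Build Settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    groq_names = ["GROQ_API_KEY", "GROQ_API_KEY_2", "GROQ_API_KEY_3"]
    return Settings(
        batch_size=int(env.get("BATCH_SIZE") or 5),
        batch_timeout_seconds=float(env.get("BATCH_TIMEOUT_SECONDS") or 60),
        groq_keys=[env[n] for n in groq_names if env.get(n)],  # unset keys are skipped
        meet_cross_ref_groups=env_list(env, "MEET_CROSS_REF_GROUPS"),
        enable_voice=env_bool(env, "ENABLE_VOICE_TRANSCRIPTION", default=True),
        voice_min_seconds=int(env.get("VOICE_MIN_DURATION_SECONDS") or 2),
        voice_max_seconds=int(env.get("VOICE_MAX_DURATION_SECONDS") or 300),
    )


class MessageBatcher:
    """Buffer messages; flush at `size`, or once idle for `idle_timeout` seconds."""

    def __init__(self, size, idle_timeout, clock=time.monotonic):
        self.size = size
        self.idle_timeout = idle_timeout
        self.clock = clock  # injectable so the timing can be tested
        self.buffer = []
        self.last_message_at = None

    def add(self, message):
        """Buffer a message; return the full batch when `size` is reached, else None."""
        self.buffer.append(message)
        self.last_message_at = self.clock()
        if len(self.buffer) >= self.size:
            return self._flush()
        return None

    def poll(self):
        """Call periodically; return the pending batch once the chat has gone quiet."""
        if self.buffer and self.clock() - self.last_message_at >= self.idle_timeout:
            return self._flush()
        return None

    def _flush(self):
        batch, self.buffer = self.buffer, []
        return batch
```

A typical wiring would be cfg = load_settings() followed by MessageBatcher(cfg.batch_size, cfg.batch_timeout_seconds). The clock is injected so that the idle timeout can be tested without sleeping.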

API Reference

All endpoints are served by FastAPI at http://localhost:8000. The Next.js dashboard proxies /api/* to this backend.

Groups & Signals

GET  /api/groups
     → List all monitored groups with their signal counts and lens

GET  /api/groups/{group_id}/signals
     ?signal_type=&severity=&lens=&date_from=&date_to=
     → Get signals with filters

POST /api/groups/{group_id}/query
     Body: {"question": "..."}
     → Natural language RAG query over the group's knowledge base

GET  /api/groups/{group_id}/patterns
     → Run pattern detection and return results

GET  /api/cross-group/insights
     → Run cross-group analysis across all monitored groups

GET  /api/signals/timeline
     ?group_id=&severity=&lens=&signal_type=&date_from=&date_to=&limit=200
     → Cross-group signal timeline, newest first

GET  /health
     → {"status": "ok", "service": "thirdeye"}
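The filter parameters above are ordinary query-string arguments, so any filter you don't need can be left out. Below is a stdlib-only sketch of a small client. The helper names are hypothetical. The sketch assumes the endpoints accept and return JSON exactly as listed, including the {"question": "..."} body for the RAG query.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default local FastAPI address


def timeline_url(base_url=BASE_URL, **filters):
    """Build a /api/signals/timeline URL, dropping filters that are None or empty."""
    params = {k: v for k, v in filters.items() if v not in (None, "")}
    query = urllib.parse.urlencode(params)
    return f"{base_url}/api/signals/timeline" + (f"?{query}" if query else "")


def build_query_request(group_id, question, base_url=BASE_URL):
    """Prepare (but do not send) POST /api/groups/{group_id}/query."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/groups/{urllib.parse.quote(group_id, safe='')}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30.0):
    """Send a prepared request (or a plain URL string) and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, send(timeline_url(group_id="acme_dev", severity="high", limit=50)) fetches the newest high-severity signals for one group.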

Knowledge Browser

GET  /api/knowledge/browse/{group_id}
     ?date_from=&date_to=&topic=
     → Topics (AI-clustered from keywords) + day-by-day timeline
     → Response: {group_id, group_name, total_signals, date_range, topics[], timeline[]}

Google Meet

POST /api/meet/start          → Called by Chrome extension when meeting begins
POST /api/meet/ingest         → Called every 30s with transcript chunk
GET  /api/meet/meetings       → List all recorded meetings
GET  /api/meet/meetings/{id}  → Meeting detail with signal breakdown
GET  /api/meet/meetings/{id}/signals   → All signals for a meeting
GET  /api/meet/meetings/{id}/transcript → Raw transcript chunks
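The extension and the backend share MEET_INGEST_SECRET. Below is a minimal sketch of how a receiving endpoint might verify it. This README does not document how the extension transmits the secret (header or body field), so the function just takes the provided value. hmac.compare_digest avoids leaking the secret through timing differences. Rejecting everything when no secret is configured is a design choice of this sketch, not confirmed ThirdEye behaviour.

```python
import hmac
import os


def meet_secret_ok(provided, expected=None):
    """Constant-time check of a Meet extension request against MEET_INGEST_SECRET.

    Fails closed: if the server has no secret configured, every request is rejected.
    """
    if expected is None:
        expected = os.environ.get("MEET_INGEST_SECRET", "")
    if not expected or not provided:
        return False
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```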

Jira

GET  /api/jira/tickets        ?group_id=&date_from=&date_to=&live=
POST /api/jira/raise          Body: {signal_id, group_id, project_key, force}
POST /api/jira/create         Body: {summary, description, project_key, priority, ...}
GET  /api/jira/tickets/{key}/status   → Live status from Jira
GET  /api/jira/users/search   ?q=name  → Assignee search
GET  /api/jira/config         → Check Jira configuration
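JIRA_AUTO_RAISE_SEVERITY (default high) sets the bar for automatic ticket creation. The sketch below assumes it acts as a minimum on the Signal severity scale (low < medium < high < critical). It also assumes that signals already carrying a jira_key are skipped. Under those assumptions, the decision and the matching POST /api/jira/raise body could look like this. The ordering, the skip rule and the helper names are all assumptions. force is passed through unchanged because its exact semantics are not documented here.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def meets_threshold(severity, threshold="high"):
    """True if severity is at or above threshold (unknown values raise ValueError)."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(threshold)


def should_auto_raise(signal, enabled, threshold="high"):
    """Decide whether a signal dict should become a Jira ticket automatically."""
    if not enabled or signal.get("jira_key"):
        return False  # auto-raise is off, or a ticket already exists
    return meets_threshold(signal["severity"], threshold)


def raise_request_body(signal, project_key="ENG", force=False):
    """Body for POST /api/jira/raise, using the fields listed above."""
    return {
        "signal_id": signal["id"],
        "group_id": signal["group_id"],
        "project_key": project_key,
        "force": force,
    }
```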

Data Models

Signal

The core unit of extracted intelligence:

from pydantic import BaseModel

class Signal(BaseModel):
    id: str                   # UUID
    group_id: str             # Telegram group ID
    lens: str                 # dev | product | client | community | meet
    type: str                 # architecture_decision | tech_debt | feature_request | ...
    summary: str              # One-sentence description
    entities: list[str]       # [@people, technologies, concepts]
    severity: str             # low | medium | high | critical
    status: str               # proposed | decided | implemented | unresolved
    sentiment: str            # positive | neutral | negative | urgent
    urgency: str              # none | low | medium | high | critical
    raw_quote: str            # Verbatim message text that triggered this signal
    keywords: list[str]       # 3-5 searchable keywords
    timestamp: str            # ISO 8601
    source: str               # voice | text | document | link | meet
    # Voice attribution (voice-sourced signals only)
    speaker: str              # @username
    voice_duration: int       # seconds
    # Jira tracking
    jira_key: str             # e.g. ENG-42
    jira_url: str
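The comments above list the allowed values for the enumerated fields. Below is a stdlib-only sketch that checks a signal dict against those vocabularies before it is stored. The value sets are copied from the comments. The function name, and treating the source, voice and Jira fields as optional, are assumptions. type is left unchecked because its list is open-ended (...). In Pydantic itself, the same constraint is usually expressed with typing.Literal field types.

```python
ALLOWED = {
    "lens": {"dev", "product", "client", "community", "meet"},
    "severity": {"low", "medium", "high", "critical"},
    "status": {"proposed", "decided", "implemented", "unresolved"},
    "sentiment": {"positive", "neutral", "negative", "urgent"},
    "urgency": {"none", "low", "medium", "high", "critical"},
    "source": {"voice", "text", "document", "link", "meet"},
}
# Fields every signal is expected to carry; source/speaker/voice_duration/jira_* are optional.
REQUIRED = {
    "id", "group_id", "lens", "type", "summary", "entities", "severity",
    "status", "sentiment", "urgency", "raw_quote", "keywords", "timestamp",
}


def signal_errors(signal):
    """Return a list of problems; an empty list means the dict looks valid."""
    errors = [f"missing field: {name}" for name in sorted(REQUIRED - set(signal))]
    for name, allowed in ALLOWED.items():
        if name in signal and signal[name] not in allowed:
            errors.append(f"{name}={signal[name]!r} not in {sorted(allowed)}")
    return errors
```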

Pattern

class Pattern(BaseModel):
    id: str
    group_id: str
    type: str                 # frequency_spike | knowledge_silo | recurring_issue | sentiment_trend | stale_item
    description: str
    severity: str             # info | warning | critical
    evidence_signal_ids: list[str]
    recommendation: str
    detected_at: str
    is_active: bool
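Of the pattern types, stale_item is the easiest to illustrate without an LLM: it covers signals that are still proposed or unresolved after some age. The sketch below produces a Pattern-shaped dict. Several details are hypothetical rather than ThirdEye's actual rules: the 7-day threshold, the choice of open statuses, the severity mapping and the recommendation text.

```python
import uuid
from datetime import datetime, timedelta, timezone

OPEN_STATUSES = {"proposed", "unresolved"}


def parse_ts(value):
    """Parse an ISO 8601 timestamp; treat a trailing Z or a naive value as UTC."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def detect_stale_items(signals, group_id, now=None, max_age=timedelta(days=7)):
    """Return a stale_item pattern dict, or None if nothing open is older than max_age."""
    now = now or datetime.now(timezone.utc)
    stale = [
        s for s in signals
        if s["status"] in OPEN_STATUSES and now - parse_ts(s["timestamp"]) > max_age
    ]
    if not stale:
        return None
    # Hypothetical escalation: any stale high/critical signal makes the pattern critical.
    escalate = any(s["severity"] in {"high", "critical"} for s in stale)
    return {
        "id": str(uuid.uuid4()),
        "group_id": group_id,
        "type": "stale_item",
        "description": f"{len(stale)} open item(s) older than {max_age.days} days",
        "severity": "critical" if escalate else "warning",
        "evidence_signal_ids": [s["id"] for s in stale],
        "recommendation": "Assign an owner or close these items explicitly.",
        "detected_at": now.isoformat(),
        "is_active": True,
    }
```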

CrossGroupInsight

class CrossGroupInsight(BaseModel):
    id: str
    type: str                 # blocked_handoff | conflicting_decision | information_silo | promise_reality_gap | duplicated_effort
    description: str
    group_a: dict             # {name, group_id, evidence}
    group_b: dict
    severity: str             # warning | critical
    recommendation: str
    detected_at: str
    is_resolved: bool
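Consider a dev team blocked on a feature while a product team is promising a demo of that same feature. That is the kind of pair a blocked_handoff insight links. Candidate pairs can be narrowed cheaply before any LLM judgment. Take signals from two different groups that share keywords, and keep the pair only if at least one side is still unresolved. This pre-filter is an assumption for illustration, not the documented ThirdEye pipeline. Its output fills only the group_id and evidence parts of group_a/group_b; name is omitted because signals don't carry it.

```python
from itertools import combinations


def candidate_pairs(signals_by_group, min_shared=1):
    """Yield (group_a, group_b, shared_keywords) for cross-group signals sharing keywords.

    signals_by_group maps group_id -> list of signal dicts with 'id', 'keywords', 'status'.
    Only pairs where at least one side is still unresolved are kept.
    """
    for (gid_a, sigs_a), (gid_b, sigs_b) in combinations(sorted(signals_by_group.items()), 2):
        for a in sigs_a:
            for b in sigs_b:
                shared = {k.lower() for k in a["keywords"]} & {k.lower() for k in b["keywords"]}
                if len(shared) >= min_shared and "unresolved" in (a["status"], b["status"]):
                    yield (
                        {"group_id": gid_a, "evidence": a["id"]},
                        {"group_id": gid_b, "evidence": b["id"]},
                        sorted(shared),
                    )
```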

Seeding Demo Data

Run the demo seeder to populate 3 realistic groups with interconnected signals that demonstrate cross-group intelligence:

python scripts/seed_demo.py

This seeds:

Acme Dev Team — 13 messages → signals including recurring timeout bug, knowledge silo (only Raj knows payment), blocked dashboard waiting for design specs
Acme Product Team — 10 messages → priority conflict (mobile vs API), promise to client about Friday dashboard demo, conversion metric drop
Acme ↔ ClientCo — 9 messages → client escalation risk, scope creep (PDF export + dark mode), unacknowledged deadline pressure

The cross-group analysis will surface:

Dev team blocked on design specs → Product team never mentions this
Product team promised Friday demo → Dev team says "blocked, week 2 of waiting"
Client escalation risk not visible to dev or product teams

After seeding, verify:

curl http://localhost:8000/api/groups
curl http://localhost:8000/api/cross-group/insights
curl http://localhost:8000/api/knowledge/browse/acme_dev

Competitive Positioning

                    REACTIVE ◄──────────────────────► PROACTIVE
                         │                               │
               Slack AI  │                               │  ★ ThirdEye
              (summaries │                               │  (persistent intelligence,
               on demand)│                               │   pattern detection,
                         │                               │   cross-group analysis,
                  Prism  │                               │   proactive alerts,
            (product     │                               │   voice + docs + Meet)
             signals)    │                               │
                         │                               │
         SINGLE GROUP ◄──┼───────────────────────────────┼──► CROSS-GROUP
                         │                               │
              Runbear    │                               │
              (Q&A only) │                               │

ThirdEye occupies the only empty quadrant: Proactive + Cross-Group.

No other product on the market today passively builds intelligence across multiple team chats simultaneously and detects blind spots between them. That is a new category.

Built at $0 in API costs. Every provider used has a free tier. No credit card required for any dependency.
