👁️ AuraVision: Advanced Assistive Vision System Documentation

AuraVision (formerly SmartAV) is a state-of-the-art, real-time assistive technology system designed to empower visually impaired and blind users. It integrates advanced Computer Vision (CV), Large Language Models (LLM), and Retrieval-Augmented Generation (RAG) to provide contextual, spoken environmental awareness.

This document serves as the developer and systems engineering reference, covering the core architecture, pipelines, file structures, API catalog, database models, security rules, and DevOps instructions.

🏗️ 1. Technical Architecture & Data Pipelines

AuraVision is split into three main layers: Perception (CV Thread), Reasoning/Storage (AI Loop), and Interaction (Web/TTS).

graph TD
    %% perception
    A[Camera Feed - camera.py] -->|cv2.VideoCapture| B[Orchestrator Thread - engine.py]
    B -->|multiprocessing.Queue| C[Inference Worker Process - engine.py]
    
    %% models
    C -->|YOLOE-26N-seg| D[Segmenter / Object Locator]
    C -->|MTCNN / InceptionResnetV1| E[Face Identifier]
    
    %% feedback loop
    D -->|Detections Queue| B
    E -->|Face Coordinates| B
    
    %% logging / sync
    B -->|Detections / Guidance| F[Data Logger - data_logger.py]
    F -->|JSONL Write| G[(detections.jsonl)]
    F -->|Background Sync| H[(Google Firestore)]
    
    %% AI / RAG
    B -->|Scene JSON + Frame| I[Scene Reasoner - reasoner.py]
    I -->|Multimodal Input| J[Alice LLM Service - llm_service.py]
    J -->|Query| K[RAG Service - rag_service.py]
    K -->|Google GenAI Embeddings| L[(ChromaDB Vector Store)]
    J -->|Gemini-3-Flash| M[Guidance Text Response]
    
    %% Interaction
    M -->|Typewriter Sync| N[Web Frontend SPA]
    M -->|Audio Feedback - audio.py| O[Zero-Latency TTS / gTTS]
    H -->|Sync Profile / Devices / Faces| N

1.1 The Inference Worker Process (GIL Avoidance)

To keep the pipeline real-time, CPU-heavy tensor operations (YOLO and PyTorch face embeddings) must not stall the main process under Python's Global Interpreter Lock (GIL). Inference is therefore isolated in a separate OS process:

Lifecycle: Managed via multiprocessing.Process. Spawned on server startup, killed on shutdown.
IPC Channel: Uses three thread-safe multiprocessing Queues:
- _input_queue (maxsize=1): Accepts the latest raw camera frame. If the queue is still full, the incoming frame is dropped so that latency does not accumulate.
- _output_queue (maxsize=1): Delivers parsed detection arrays back to the orchestrator.
- _command_queue: Used to signal live vocab reloads (dynamic search) or known face encoding refreshes to the worker process on-the-fly.
Worker Loop (inference_worker):
- Continually polls the _command_queue for instructions.
- Captures frames from _input_queue.
- Runs YOLO segmentations and FaceNet embeddings comparison.
- Writes a heartbeat timestamp to /tmp/worker_heartbeat.log every 5 seconds for diagnostic monitoring.
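
The drop-when-full behaviour of `_input_queue` and the non-blocking poll of `_command_queue` can be sketched as below. This is a minimal illustration, not the code in engine.py: `offer_latest` and `drain_commands` are hypothetical helper names, and any queue object works, since `multiprocessing.Queue` raises the same `queue.Full` and `queue.Empty` exceptions as `queue.Queue`.

```python
import queue


def offer_latest(q, frame) -> bool:
    """Try to enqueue a frame without blocking; drop it if the queue is full."""
    try:
        q.put_nowait(frame)
        return True
    except queue.Full:
        # The worker has not consumed the previous frame yet, so skip this one.
        return False


def drain_commands(q) -> list:
    """Collect every pending command without blocking the worker loop."""
    commands = []
    while True:
        try:
            commands.append(q.get_nowait())
        except queue.Empty:
            return commands
```

With a `maxsize=1` queue, a second `offer_latest` call returns `False` until the worker has taken the first frame, so the orchestrator never waits on inference.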

1.2 The Orchestrator Thread (`detection_loop`)

The orchestrator runs as a daemon thread in the main FastAPI application:

Pulls frames from the background camera thread.
Feeds the inference worker.
Coordinates dynamic visual search expiry timers.
Saves crop snapshots of detected objects to src/static/img/snapshots/ (PDPA compliant: skips generic person labels to protect privacy).
Triggers async LLM reasoning via SceneReasoner if a scene shift is detected.
Estimates walkable path clearance dynamically.

1.3 Scene Change Similarity Optimization

To minimize token consumption and voice congestion, the orchestrator evaluates whether the frame has changed before invoking the LLM:

Resizes both the previous LLM frame and the current frame to a tiny 64x64 grayscale resolution.
Computes the average pixel difference using cv2.absdiff().mean().
If the difference is below 15.0, the camera is deemed static, and the LLM API call is skipped.
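
The threshold check can be illustrated with a pure-Python stand-in for `cv2.absdiff().mean()`. The sketch assumes both thumbnails have already been reduced to 64x64 grayscale and flattened into sequences of 0-255 integers; `mean_abs_diff` and `scene_changed` are hypothetical names, and only the 15.0 threshold comes from the text above.

```python
from typing import Sequence

STATIC_THRESHOLD = 15.0  # below this mean difference the scene counts as static


def mean_abs_diff(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Average absolute per-pixel difference between two equal-sized thumbnails."""
    if len(prev) != len(curr) or len(prev) == 0:
        raise ValueError("thumbnails must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)


def scene_changed(prev: Sequence[int], curr: Sequence[int],
                  threshold: float = STATIC_THRESHOLD) -> bool:
    """Return True when the LLM call should go ahead, False when it can be skipped."""
    return mean_abs_diff(prev, curr) >= threshold
```

For example, a uniform brightness shift of 10 levels across all 4096 pixels yields a mean difference of 10.0, which stays under the threshold and skips the LLM call.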

📁 2. Workspace Directory Structure

smart-assistive-system/
├── config.py                 # System configuration, constants, and thresholds
├── main.py                   # Desktop interface entrypoint (OpenCV visualization mode)
├── run_web.py                # Web Dashboard server entrypoint
├── pyproject.toml            # Package configuration and dependencies
├── reset_data.py             # Diagnostic script to purge local logs and databases
├── firestore.rules           # Security rules for Cloud Firestore
├── storage.rules             # Security rules for Firebase Storage
├── cache/                    # Local storage cache (e.g. downloaded faces metadata)
├── chroma_db/                # Persistent vector database folders
├── tests/                    # Unit testing suite (pytest)
└── src/
    ├── __init__.py
    ├── web_server.py         # Main FastAPI initialization & event lifecycle hooks
    ├── auth.py               # Session auth, clock skew handling, user registration
    ├── camera.py             # OpenCV camera thread capture logic
    ├── detector.py           # YOLO segmenter, dynamic class binder, and face matcher
    ├── face_recognizer.py    # InceptionResnetV1 embedding generation via facenet-pytorch
    ├── reasoner.py           # Cooldown controllers, approach logic, and scene change check
    ├── llm_service.py        # Gemini client interface, context build, and RAG connector
    ├── rag_service.py        # LlamaIndex semantic query router and temporal bounds parser
    ├── vector_store.py       # ChromaDB wrapper with multi-user isolation filters
    ├── data_logger.py        # Thread-safe JSONL writing and Firestore async logs syncer
    ├── label_utils.py        # Label normalization, synonym mappings, and room categorizer
    ├── security.py           # Rate limiting configuration and security helpers
    ├── settings_db.py        # Settings getter/setter connected to Firestore preferences
    ├── templates/            # HTML/Jinja page views for the single page application
    └── static/               # Frontend asset directories
        ├── css/              # Tailwind overrides and theme stylesheet configs
        └── js/               # Frontend Javascript SPA logic
            ├── app.js        # Global script hook
            ├── core/         # Core frameworks (app-core.js typewriter, settings, etc.)
            └── modules/      # Page-specific views controllers (dashboard, timeline, etc.)

⚙️ 3. Core Components Reference

3.1 `src/camera.py` (Background Video Capture)

Design Pattern: Runs a background capture loop thread (update) grabbing raw frames from camera CAMERA_ID at 5ms intervals.
Synchronization: Implements a threading.Condition variable. The server video feed route calls wait_for_frame(timeout=1.0) which blocks until a new frame is grabbed, reducing CPU usage compared to loop-polling.
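
A minimal sketch of this condition-variable pattern follows. `FrameBuffer`, `publish`, and the `last_seq` counter are assumptions made for illustration; the source only states that `wait_for_frame(timeout=1.0)` blocks until a new frame is grabbed.

```python
import threading


class FrameBuffer:
    """Holds the latest frame and wakes waiting consumers when a new one arrives."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._seq = 0  # incremented on every published frame

    def publish(self, frame) -> None:
        """Called by the capture thread after each successful grab."""
        with self._cond:
            self._frame = frame
            self._seq += 1
            self._cond.notify_all()

    def wait_for_frame(self, last_seq: int = 0, timeout: float = 1.0):
        """Block until a frame newer than last_seq exists, or the timeout expires."""
        with self._cond:
            arrived = self._cond.wait_for(lambda: self._seq > last_seq, timeout=timeout)
            if not arrived:
                return None, last_seq
            return self._frame, self._seq
```

The consumer sleeps inside `wait_for` instead of spinning, which is where the CPU saving over loop-polling comes from.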

3.2 `src/detector.py` (Object & Segment Detection)

Model: Loads open-vocabulary model yoloe-26N-seg (fallback to yolo26n.pt if unavailable).
Segment Metric Extraction (_extract_mask_metrics):
- Resizes the binary mask to the original frame dimensions.
- Calculates mask_area_ratio (percentage of total pixels covered by the mask).
- Calculates path_coverage by slicing the mask relative to PATH_BAND_LEFT and PATH_BAND_RIGHT to determine how much of the center path is blocked.
- Extracts the largest external contour and uses cv2.approxPolyDP to simplify the polygon to coordinates for web dashboard visualization.
Face Match Linker:
- If a person is detected by YOLO, the frame is processed by FaceRecognizer.
- Evaluates intersection coordinates and IoU (Intersection over Area) between the Face bounding box and YOLO person bounding box.
- If IoU > 0.5 (or center coordinate is inside the box and IoU > 0.3), the label person is overridden with the familiar identity name.
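
The match rule can be sketched as follows, under two stated assumptions: the ratio is the standard intersection over union (the text also glosses it as "Intersection over Area", so the real denominator in detector.py may differ), and "center coordinate" means the center of the face box lying inside the person box. `iou` and `face_matches_person` are hypothetical names.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Standard intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def face_matches_person(face: Box, person: Box) -> bool:
    """Apply the two thresholds quoted above (0.5, or 0.3 with the center inside)."""
    overlap = iou(face, person)
    cx, cy = (face[0] + face[2]) / 2, (face[1] + face[3]) / 2
    center_inside = person[0] <= cx <= person[2] and person[1] <= cy <= person[3]
    return overlap > 0.5 or (center_inside and overlap > 0.3)
```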

3.3 `src/face_recognizer.py` (Familiar Face Identifier)

Framework: Uses facenet_pytorch. Instantiates MTCNN for localized face boundary detection and InceptionResnetV1 (pretrained on vggface2) for feature mapping.
Euclidean Embedding Distance:
- Downloads user's registered faces from Firebase Storage and extracts known embeddings.
- Compares face embeddings in real-time camera frames against known face vectors using the L2 norm (np.linalg.norm).
- If the L2 Euclidean distance is below 0.85, it confirms a match.
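
The distance check can be illustrated with the standard library alone. This sketch uses `math.dist` in place of `np.linalg.norm` and assumes the nearest known embedding is the one tested against the 0.85 threshold; `best_match` is a hypothetical name, and the three-element vectors in the usage note are toy stand-ins for real embeddings.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

MATCH_THRESHOLD = 0.85  # from the text: a smaller L2 distance confirms a match


def best_match(probe: Sequence[float],
               known: Dict[str, Sequence[float]],
               threshold: float = MATCH_THRESHOLD) -> Optional[Tuple[str, float]]:
    """Return (name, distance) of the nearest known face, or None if none is close enough."""
    best_name, best_dist = None, math.inf
    for name, embedding in known.items():
        dist = math.dist(probe, embedding)  # Euclidean (L2) distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_name is not None and best_dist < threshold:
        return best_name, best_dist
    return None
```

Calling `best_match([0.1, 0.0, 0.0], {"Mom": [0.0, 0.0, 0.0]})` returns a match for "Mom", while a probe far from every known vector returns `None` and the label stays `person`.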

3.4 `src/audio.py` (Speech Synthesizer Daemon)

Design: Spawns a background thread queue worker (worker) processing speech text sequentially.
OS English Engine (Zero-Latency):
- macOS: Uses the shell say command via subprocess.Popen.
- Windows: Runs a PowerShell synthesizer script: (New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak(...).
Non-English TTS (Burmese/Myanmar, Thai, Japanese, Chinese):
- Uses gTTS (Google Text-To-Speech) via web requests. Saves the audio locally as a temporary MP3 file, initializes the pygame.mixer to play it, and unloads/deletes the file post-playback.
Interruption: Calling interrupt() immediately terminates the running subprocess, stops pygame music playback, and clears the queue backlog.
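
The sequential worker and the backlog-clearing half of `interrupt()` can be sketched as below. `SpeechQueue` and its injected `speak` callable are assumptions for illustration; the real module additionally terminates the running subprocess and stops pygame playback, which this sketch leaves out.

```python
import queue
import threading
from typing import Callable


class SpeechQueue:
    """Speaks queued text one item at a time on a background daemon thread."""

    def __init__(self, speak: Callable[[str], None]):
        self._speak = speak
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._worker, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def say(self, text: str) -> None:
        self._queue.put(text)

    def interrupt(self) -> int:
        """Discard everything still waiting and return how many items were dropped."""
        dropped = 0
        while True:
            try:
                self._queue.get_nowait()
            except queue.Empty:
                return dropped
            self._queue.task_done()
            dropped += 1

    def wait(self) -> None:
        """Block until every queued item has been spoken or dropped."""
        self._queue.join()

    def _worker(self) -> None:
        while True:
            text = self._queue.get()
            try:
                self._speak(text)
            finally:
                self._queue.task_done()
```

Injecting `speak` keeps the queue logic independent of the platform engine, so the same worker could wrap `say`, PowerShell, or gTTS playback.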

3.5 `src/llm_service.py` (Alice Reasoning Engine)

Model: Uses Gemini 3 Flash (gemini-3-flash-preview via Google GenAI SDK).
Multimodal Feed: Accepts the current camera frame (converted to PIL RGB image), names of detected objects, and the list of user registered faces.
Context Loading: Fetches the user settings from Firestore. If configured, it appends active online haptic/wearable devices, time information (Morning/Evening/Night guidelines), and user object preferences into the LLM system prompt.
Burmese/Foreign script guidelines: If TARGET_LANGUAGE is non-English, the system instructs Gemini to output the response using the destination script characters exclusively. This prevents the TTS engine from spelling out English abbreviations letter-by-letter.
Semantic Function Calling: Bundles Gemini functions for starting dynamic object search (search_for_object) and ending search (stop_search).

3.6 `src/rag_service.py` (LlamaIndex Context Retrieval)

Core: Powered by LlamaIndex VectorStoreIndex with Google's text-embedding-004 embedding model.
Logical Collections:
- system_docs: Stores system features and support guides for user support questions.
- detection_memory: Stores spatial/object logs.
- guidance_memory: Stores historical guidance texts.
Search Intent Router:
- Identifies query intent: support (queries system_docs), hazard_memory (queries detection_memory filtered by is_dangerous=True), object_location (queries object detections, adds fallback query for exact keyword match), temporal_memory (queries both detections and guidance), or general_memory.
Scoring & Reranking:
- Applies parsed temporal bounds (date filters).
- Computes a recency boost: max(0, 3.0 - (age_hours / 24.0)), decaying over 72 hours.
- Computes a keyword match boost: adds +2.0 to the score when the query tokens overlap the target object label.
- Adds a further +4.0 on an exact match: the is_dangerous flag for hazard queries, or the target label for object searches.
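
The boosts above can be worked through with a short sketch. It assumes the boosts are simply added to a base similarity score, which the source does not state explicitly; `recency_boost` and `rerank_score` are hypothetical names.

```python
def recency_boost(age_hours: float) -> float:
    """Linear decay from 3.0 (brand new) to 0.0 at 72 hours and beyond."""
    return max(0.0, 3.0 - age_hours / 24.0)


def rerank_score(base: float, age_hours: float,
                 keyword_match: bool, exact_match: bool) -> float:
    """Combine the base similarity with the recency, keyword, and exact-match boosts."""
    score = base + recency_boost(age_hours)
    if keyword_match:
        score += 2.0  # query tokens overlap the object label
    if exact_match:
        score += 4.0  # exact hazard flag or target label match
    return score
```

A one-day-old record gets a recency boost of 2.0, and anything older than three days gets none, so old detections can still surface through the keyword and exact-match terms.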

3.7 `src/vector_store.py` (ChromaDB Wrapper)

Multi-User logical isolation: When calling query(), the wrapper dynamically builds a ChromaDB where query. It wraps conditions using the $and logical operator, forcing a strict {"user_id": user_id} filter to prevent cross-tenant data leaks.
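
A sketch of how such a filter could be assembled is shown below. `build_where` is a hypothetical helper, not the wrapper's actual method. A lone condition is returned bare on the assumption that ChromaDB expects `$and` to hold at least two expressions.

```python
from typing import Any, Dict, Optional


def build_where(user_id: str, extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a where filter that always pins results to one user."""
    conditions = [{"user_id": user_id}]
    for key, value in (extra or {}).items():
        if key == "user_id":
            continue  # never let a caller override the tenant filter
        conditions.append({key: value})
    if len(conditions) == 1:
        return conditions[0]
    return {"$and": conditions}
```

For a hazard query, `build_where("user_id_123", {"is_dangerous": True})` produces `{"$and": [{"user_id": "user_id_123"}, {"is_dangerous": True}]}`.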

3.8 `src/label_utils.py` (Normalization & Room Categorization)

Synonyms: Standardizes inputs (e.g. "automobile" -> "car", "sofa" -> "couch") via config.LABEL_SYNONYMS.
Room Inference Rules: Maps labels to rooms:
- Refrigerator, microwave, cups -> Kitchen
- Couch, TV, remote -> Living Room
- Bed, personal faces -> Bedroom
- Toilet -> Bathroom
- Cars, curbs, bollards -> Outside/Door
Identity Presence: Maps all familiar/registered faces and the generic person label under Bedroom to signify personal presence.
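
The normalization and room rules can be sketched as a pair of lookups. The synonym table holds only the two pairs quoted above, and the singular label spellings in `ROOM_RULES` are assumptions; the real `config.LABEL_SYNONYMS` and rule set are likely larger.

```python
from typing import Collection, Optional

LABEL_SYNONYMS = {"automobile": "car", "sofa": "couch"}  # only the pairs quoted above

ROOM_RULES = {
    "Kitchen": {"refrigerator", "microwave", "cup"},
    "Living Room": {"couch", "tv", "remote"},
    "Bedroom": {"bed", "person"},
    "Bathroom": {"toilet"},
    "Outside/Door": {"car", "curb", "bollard"},
}


def normalize_label(label: str) -> str:
    """Lower-case the label and collapse known synonyms."""
    key = label.strip().lower()
    return LABEL_SYNONYMS.get(key, key)


def infer_room(label: str, familiar_names: Collection[str] = frozenset()) -> Optional[str]:
    """Map a label to a room; registered face names count as Bedroom presence."""
    if label.strip() in familiar_names:
        return "Bedroom"
    normalized = normalize_label(label)
    for room, labels in ROOM_RULES.items():
        if normalized in labels:
            return room
    return None
```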

🔒 4. Security, Compliance, & Privacy

Local Edge Inference: AuraVision runs YOLOE segmentation, MTCNN face detection, and FaceNet embedding calculations entirely local-device side. No video frames are streamed to cloud servers.
PDPA / GDPR Compliance: Camera snapshots (crops of bounding boxes) are saved locally under src/static/img/snapshots/ to render in the user timeline. To ensure compliance, crop saving is bypassed if the label is a generic person. Crops are only saved for recognized familiar names (which users register explicitly in the Identity section) or non-human objects.
Database Security (Firestore / Storage Rules): See firestore.rules and storage.rules. Access is strictly restricted to authenticated users:
```
match /users/{userId}/{document=**} {
    allow read, write: if request.auth != null && request.auth.uid == userId;
}
```
Clock Skew Retry: Firebase auth tokens evaluated on startup can throw "Token used too early" if the device system time lags slightly behind Google servers. src/auth.py catches this error, waits 3 seconds, and retries token validation.
XSS Protection: Frontend typewriter responses pass through DOMPurify.sanitize() prior to rendering markdown text inside document elements.
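
The clock skew retry described above can be sketched as a single retry around an injected verification callable. `TokenUsedTooEarly` is a stand-in exception and `verify_with_skew_retry` a hypothetical name; how src/auth.py actually detects the error and how many times it retries are not stated in this document.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TokenUsedTooEarly(Exception):
    """Stand-in for the "Token used too early" failure."""


def verify_with_skew_retry(verify: Callable[[], T],
                           delay_sec: float = 3.0,
                           sleep: Callable[[float], None] = time.sleep) -> T:
    """Run verify(); on a clock-skew failure, wait and try once more."""
    try:
        return verify()
    except TokenUsedTooEarly:
        # The local clock lags the issuer slightly, so give it time to catch up.
        sleep(delay_sec)
        return verify()
```

Passing `sleep` as a parameter lets tests substitute a recorder so the 3-second wait does not slow the suite.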

📊 5. Database Schema & Data Models

5.1 Cloud Storage Schema (Google Firestore)

`/users/{user_id}`

{
  "email": "user@example.com",
  "name": "TayZa",
  "avatar_url": "https://lh3.googleusercontent.com/...",
  "settings": {
    "show_overlays": true
  }
}

`/users/{user_id}/settings/preferences`

{
  "voice": {
    "type": "female",
    "rate": 1.2,
    "volume": 80,
    "language": "english"
  },
  "ai_params": {
    "hazards": true,
    "people": true,
    "daily_objects": false,
    "ai_mode": "advanced",
    "sensitivity": 0.5
  },
  "general": {
    "theme": "dark",
    "performance": "balanced"
  },
  "navigation": {
    "voice_enabled": true,
    "announce_distance": true
  },
  "last_sync": "2026-05-30T01:15:30.123456"
}

`/users/{user_id}/logs/{log_id}`

{
  "timestamp": "2026-05-30T01:15:30",
  "type": "detection",
  "label": "knife",
  "metadata": {
    "box": [100.5, 200.2, 150.3, 300.9],
    "confidence": 0.89,
    "distance": "near",
    "position": "center",
    "is_dangerous": true,
    "path_coverage": 0.35,
    "mask_area_ratio": 0.18,
    "mask_contour": [[100, 200], [150, 200], [150, 300], [100, 300]]
  }
}

`/users/{user_id}/faces/{face_id}`

{
  "name": "Mom",
  "relationship": "Mother",
  "phone_number": "+123456789",
  "is_emergency": true,
  "notes": "Spends time in Kitchen",
  "group": "Family",
  "file_path": "https://storage.googleapis.com/...",
  "storage_path": "faces/user_id/mom_a1b2c3d4.jpg",
  "created_at": "2026-05-30T01:15:30.123456Z"
}

`/users/{user_id}/devices/{device_id}`

{
  "name": "Smart cane companion",
  "status": "online",
  "battery": 92
}

`/users/{user_id}/saved_destinations/{dest_id}`

{
  "name": "Central Hospital",
  "place_id": "ChIJ...",
  "address": "123 Health Ave",
  "lat": 16.8206,
  "lng": 96.1317,
  "category": "hospital",
  "created_at": "2026-05-30T01:15:30Z",
  "updated_at": "2026-05-30T01:15:30Z"
}

`/users/{user_id}/activities/{activity_id}`

{
  "started_at": "2026-05-30T00:00:00Z",
  "ended_at": "2026-05-30T00:45:00Z",
  "duration_sec": 2700,
  "distance_m": 1250.0,
  "avg_speed_mps": 0.46,
  "max_speed_mps": 1.2,
  "paused_sec": 120,
  "raw_point_count": 520,
  "encoded_point_count": 92,
  "polyline": "_p~iFzseuU...",
  "start_lat": 16.8206,
  "start_lng": 96.1317,
  "end_lat": 16.8250,
  "end_lng": 96.1350,
  "preview_status": "ready",
  "preview_storage_path": "activity_previews/user_id/activity_id.png",
  "preview_updated_at": "2026-05-30T00:46:00Z",
  "created_at": "2026-05-30T00:45:00Z",
  "updated_at": "2026-05-30T00:45:00Z"
}

5.2 Local Storage Formats

File: `detections.jsonl`

An append-only JSON Lines file (one JSON object per line) that serves as the local fallback store:

{"timestamp": "2026-05-30T01:15:30", "type": "detection", "label": "cup", "user_id": "user_id_123", "metadata": {"box": [50.0, 60.0, 90.0, 110.0], "confidence": 0.72, "distance": "far", "position": "left", "is_dangerous": false, "path_coverage": 0.0}}
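
A thread-safe append to a file of this shape could look like the sketch below. `append_event` and `read_events` are hypothetical names, and the single module-level lock is an assumption about how data_logger.py serializes writers.

```python
import json
import threading
from pathlib import Path

_write_lock = threading.Lock()  # one writer at a time keeps lines from interleaving


def append_event(path: Path, event: dict) -> None:
    """Serialize one event and append it as a single line."""
    line = json.dumps(event, ensure_ascii=False)
    with _write_lock:
        with path.open("a", encoding="utf-8") as fh:
            fh.write(line + "\n")


def read_events(path: Path) -> list:
    """Load every non-empty line back into a list of dicts."""
    with path.open("r", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each record is a self-contained line, a crash mid-write can corrupt at most the final line, and earlier events remain readable.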

ChromaDB Collections (Local Vector Memory)

system_docs: Support document chunks indexed with metadata {"memory_type": "system_doc", "user_id": "system"}.

detection_memory: Detections indexed with metadata:

{
  "memory_type": "detection",
  "label": "chair",
  "normalized_label": "chair",
  "room": "Living Room",
  "is_dangerous": false,
  "user_id": "user_id_123"
}

guidance_memory: Historical responses from Alice: {"memory_type": "guidance", "user_id": "user_id_123"}.
vision_events: Legacy system vector database for backward compatibility.

📡 6. Complete API Catalog

| Method | Endpoint | Tags | Description |
|---|---|---|---|
| GET | /login | auth | Renders the login HTML view. Redirects authenticated sessions to dashboard. |
| GET | /signup | auth | Renders the sign-up HTML view. |
| POST | /auth/verify | auth | Accepts idToken payload, decodes Firebase JWT, provisions Firestore user document, and sets user session. |
| GET | /auth/logout | auth | Removes user_id from session and redirects to root login. |
| GET | / | views | Root view. Redirects to /login or serves SPA page. |
| GET | /timeline | views | Renders SPA page routed to the timeline sub-section. |
| GET | /analytic | views | Renders SPA page routed to analytics sub-section. |
| GET | /settings | views | Renders SPA page routed to settings sub-section. |
| GET | /identity | views | Renders SPA page routed to identity management sub-section. |
| GET | /video_feed | feeds | Serves MJPEG video stream coordinates mapped with overlay boxes, segment masks, and path clearance indicator bar. |
| GET | /api/status | status | Returns system active flags, current detections array, FPS statistics, latest LLM dialogue, and cached logs list. |
| WS | /ws/status | status | WebSocket connection providing state pushes at 10 FPS (100ms ticks). |
| GET | /api/timeline/events | timeline | Retrieves merged logs list from local JSONL and Cloud Firestore. |
| GET | /api/timeline/locations | timeline | Returns percentages of time spent in each room (Kitchen, Living Room, etc.) over past N hours. |
| GET | /api/timeline/heatmap | timeline | Groups locations counts into hourly slots for ApexCharts heatmap representation. |
| GET | /api/timeline/insights | timeline | Generates a conversational summary of historical activities utilizing the LLM. |
| GET | /api/dashboard/stats | dashboard | Generates complete statistical datasets (trends, category groups, radar coordinates, proximity counts) for the analytics dashboard charts. |
| GET | /api/dashboard/live_stats | dashboard | Retrieves live stats collected over the immediate past 60 seconds. |
| GET | /api/settings | settings | Gets current user settings. Syncs language codes configuration variables. |
| POST | /api/settings | settings | Updates preferences document in Firestore and changes global configuration flags. |
| POST | /api/settings/sync | settings | Performs a manual backup sync pushing past 50 local JSONL lines to Firestore. |
| POST | /api/settings/overlays | settings | Updates variable flag controlling overlays rendering on MJPEG camera feed. |
| GET | /api/navigation/destinations | navigation | Lists user's saved locations. |
| POST | /api/navigation/destinations | navigation | Saves a location coordinates entry (maximum 20). |
| PUT | /api/navigation/destinations/{id} | navigation | Updates saved location label/category in Firestore. |
| DELETE | /api/navigation/destinations/{id} | navigation | Removes saved location. |
| POST | /api/navigation/activities | navigation | Saves a recorded navigation session. Requests static map thumbnail generation and stores preview image path. |
| GET | /api/navigation/activities | navigation | Lists navigation activities logs. |
| GET | /api/navigation/activities/{id}/preview | navigation | Retrieves the Static Map preview PNG from Firebase Storage. |
| DELETE | /api/navigation/activities/{id} | navigation | Removes navigation activity and deletes preview PNG from Storage. |
| GET | /api/user/me | user | Returns profile fields of current session. |
| GET | /api/faces | faces | Fetches registered known faces details. |
| POST | /api/faces | faces | Uploads face file, adds document to Firestore faces collections, and triggers reloading encodings. |
| DELETE | /api/faces/{id} | faces | Deletes face database document and Storage file, and reloads encodings. |
| POST | /api/faces/capture | faces | Captures frame from current running camera, uploads it to storage, and registers face encoding. |
| GET | /api/faces/{id}/speak | faces | Speaks the matched face name and relationship via the TTS system. |
| POST | /api/system/state | system | Starts/stops the camera feed thread and detection loop. |
| POST | /api/faces/mode | system | Toggles camera state for registration page context. |
| POST | /api/ask | system | Answers conversational user questions using RAG (detections context). |
| POST | /api/support/ask | system | Answers system help and feature questions using system docs. |
| POST | /api/audio/state | system | Mutes/unmutes TTS synthesis operations. |
| POST | /api/search/start | system | Launches a dynamic visual search target class. |
| POST | /api/search/stop | system | Ends current search query and resets vocabs list. |
| GET | /api/search/status | system | Returns active status and expiration seconds remaining. |
| GET | /api/devices | system | Lists registered haptic/vibration devices. |
| POST | /api/devices | system | Pairs a new smart device. |
| DELETE | /api/devices/{id} | system | Deletes a paired smart device. |
| GET | /api/devices/pairing-token | system | Generates a quick QR pairing token. |
| POST | /api/devices/quick-pair | system | Pairs device via QR pairing token. |
| POST | /api/settings/delete-data | system | Standardized endpoint to wipe logs and memory indices across cloud and local storage. |

🌐 7. Frontend Architecture (SPA)

The web dashboard is structured as a Single-Page Application (SPA) utilizing vanilla JavaScript and custom stylesheets:

7.1 View Controller Routing (`router.js`)

Dynamically toggles class .hidden / .flex on container structures matching: #view-dashboard, #view-timeline, #view-settings, #view-analytic, #view-identities.
Controls the active navigation state on the sidebar anchors.
Triggers initialization hooks on tab load: initTimeline(), initAnalyticDashboard(), loadIdentities().
Listens for window popstate events so that the browser back button keeps working.

7.2 Typewriter Speech Synchronizer (`AuraVisionApp.utils.runTypewriter`)

Objective: Reveal chatbot messages in sync with browser SpeechSynthesis audio playback.
Mechanism:
1. Utilizes standard browser speechSynthesis API.
2. Subscribes to the onboundary callback, listening to event type word.
3. Uses event.charIndex to compute the character position.
4. Truncates text nodes and selectively updates text content up to the matching word index.
5. Sets each parent element's display style to visible as the reveal reaches it.
6. Automatically scrolls active containers to keep typewriter visible.
7. Falls back to a time-based character reveal loop if the system SpeechSynthesis engine does not support boundary event ticks.

7.3 Frontend Modules Structure

activity.js: Manages recording navigation walks, monitoring GPS locations, computing current speeds, drawing path overlays on maps, and saving activity sessions.
analytics.js: Fetches historical data counts and renders dashboard widgets using ApexCharts (safety trends, locations heatmaps, proximity charts).
dashboard.js: Configures the main status indicators (FPS, CPU/Process loops statuses, active detections lists, interactive voice chat widgets).
identity.js: Handles file uploads, camera snapshot triggers, naming profiles, and speaking names.
navigation.js: Renders active navigation routes, fetches Google directions, and coordinates haptic guidance step updates.
settings.js: Manages theme changes, AI parameters sliders (focus zones, sensitivity), and database synchronization triggers.
timeline.js: Implements chronological log views, filters events list by severity/location, and handles insight summaries.

🛠️ 8. DevOps & Verification Procedures

8.1 Setup and Dependency Installation

# Create local virtualenv
python3 -m venv .venv
source .venv/bin/activate

# Upgrade pip
python -m pip install --upgrade pip

# Install in editable mode (forces LlamaIndex + Ultralytics resolution)
python -m pip install -e .

If the uv tool is installed in the shell path:

uv sync

8.2 Execution Command Reference

Web Server (FastAPI Dashboard)

.venv/bin/python run_web.py

Desktop Mode (Local OpenCV Interface)

.venv/bin/python main.py

Direct Production Server Command

.venv/bin/python -m uvicorn src.web_server:app --host 0.0.0.0 --port 8080 --reload

8.3 System Verification & Pytest Suite

AuraVision includes a comprehensive unit testing suite using pytest.

# Run full suite
.venv/bin/python -m pytest -q

[!WARNING] Running the full suite may report a failure in tests/test_assistant_policy.py due to a legacy unresolved file dependency (src.assistant_policy). This is unrelated to modern RAG operations.

Targeted RAG & Core Verification Tests

Run targeted tests to verify detection pipeline and RAG memory logic:

.venv/bin/python -m pytest tests/test_vector_store.py tests/test_rag_service.py tests/test_llm_service.py tests/test_data_logger.py tests/test_reasoner.py -q

Expected outcome: 21 passed.

8.4 Maintenance Scripts

Data Purging: To perform a clean wipe of local logs, cache data, and ChromaDB directories, execute:
```
.venv/bin/python reset_data.py
```
