AuraVision: Advanced Assistive Vision System Documentation
AuraVision (formerly SmartAV) is a state-of-the-art, real-time assistive technology system designed to empower visually impaired and blind users. It integrates advanced Computer Vision (CV), Large Language Models (LLM), and Retrieval-Augmented Generation (RAG) to provide contextual, spoken environmental awareness.
This document serves as the developer and systems engineering reference, covering the core architecture, pipelines, file structures, API catalog, database models, security rules, and DevOps instructions.
1. Technical Architecture & Data Pipelines
AuraVision is split into three main layers: Perception (CV Thread), Reasoning/Storage (AI Loop), and Interaction (Web/TTS).
graph TD
%% perception
A[Camera Feed - camera.py] -->|cv2.VideoCapture| B[Orchestrator Thread - engine.py]
B -->|multiprocessing.Queue| C[Inference Worker Process - engine.py]
%% models
C -->|YOLOE-26N-seg| D[Segmenter / Object Locator]
C -->|MTCNN / InceptionResnetV1| E[Face Identifier]
%% feedback loop
D -->|Detections Queue| B
E -->|Face Coordinates| B
%% logging / sync
B -->|Detections / Guidance| F[Data Logger - data_logger.py]
F -->|JSONL Write| G[(detections.jsonl)]
F -->|Background Sync| H[(Google Firestore)]
%% AI / RAG
B -->|Scene JSON + Frame| I[Scene Reasoner - reasoner.py]
I -->|Multimodal Input| J[Alice LLM Service - llm_service.py]
J -->|Query| K[RAG Service - rag_service.py]
K -->|Google GenAI Embeddings| L[(ChromaDB Vector Store)]
J -->|Gemini-3-Flash| M[Guidance Text Response]
%% Interaction
M -->|Typewriter Sync| N[Web Frontend SPA]
M -->|Audio Feedback - audio.py| O[Zero-Latency TTS / gTTS]
H -->|Sync Profile / Devices / Faces| N
1.1 The Inference Worker Process (GIL Avoidance)
Inference runs in a separate OS process. This sustains real-time performance and keeps CPU-heavy tensor operations (YOLO segmentation and PyTorch face embeddings) from competing with the web server and orchestrator for Python's Global Interpreter Lock (GIL):
- Lifecycle: Managed via `multiprocessing.Process`. The process is spawned on server startup and killed on shutdown.
- IPC Channel: Uses three `multiprocessing.Queue` instances, which are safe to share across processes:
  - `_input_queue` (maxsize=1): Accepts the latest raw camera frame. Incoming frames are dropped while the queue is full, so frames never pile up and add latency.
  - `_output_queue` (maxsize=1): Delivers parsed detection arrays back to the orchestrator.
  - `_command_queue`: Signals live vocabulary reloads (dynamic search) or known-face encoding refreshes to the worker process on the fly.
- Worker Loop (`inference_worker`):
  - Continually polls `_command_queue` for instructions.
  - Takes frames from `_input_queue`.
  - Runs YOLO segmentation and FaceNet embedding comparison.
  - Writes a heartbeat timestamp file to `/tmp/worker_heartbeat.log` every 5 seconds for diagnostic monitoring.
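The following is a minimal sketch of the worker loop and the non-blocking frame hand-off described above. It assumes the caller creates the queues, for example with `multiprocessing.Queue(maxsize=1)`. The `infer` callable, the `"stop"` command string, the `max_frames` parameter (useful for tests), and the `worker_is_alive` staleness check are hypothetical names chosen for illustration. None of them are taken from `engine.py`. The loop only relies on `put_nowait`, `get_nowait` and `get(timeout=...)`, which `queue.Queue` and `multiprocessing.Queue` both provide.

```python
import queue
import time
from pathlib import Path
from typing import Any, Callable, Optional

# Path and interval come from the docs; everything else here is illustrative.
HEARTBEAT_PATH = Path("/tmp/worker_heartbeat.log")
HEARTBEAT_INTERVAL_S = 5.0


def submit_frame(input_queue: Any, frame: Any) -> bool:
    """Offer a frame without blocking; drop it if the worker is still busy."""
    try:
        input_queue.put_nowait(frame)
        return True
    except queue.Full:
        return False


def inference_worker(
    input_queue: Any,
    output_queue: Any,
    command_queue: Any,
    infer: Callable[[Any], Any],
    heartbeat_path: Path = HEARTBEAT_PATH,
    max_frames: Optional[int] = None,
) -> None:
    """Hypothetical worker loop: handle commands, run `infer`, publish results."""
    last_beat = 0.0
    processed = 0
    while max_frames is None or processed < max_frames:
        # 1. Heartbeat for external monitoring, independent of frame traffic.
        now = time.time()
        if now - last_beat >= HEARTBEAT_INTERVAL_S:
            heartbeat_path.write_text(f"{now:.3f}\n")
            last_beat = now

        # 2. Take one control message (e.g. a vocab reload) without blocking.
        try:
            command = command_queue.get_nowait()
        except queue.Empty:
            command = None
        if command == "stop":
            break
        # A real worker would dispatch reload/refresh commands here.

        # 3. A short blocking wait keeps the loop responsive to new commands.
        try:
            frame = input_queue.get(timeout=0.1)
        except queue.Empty:
            continue

        # 4. Run inference; drop the result if the orchestrator hasn't read the last one.
        detections = infer(frame)
        try:
            output_queue.put_nowait(detections)
        except queue.Full:
            pass
        processed += 1


def worker_is_alive(heartbeat_path: Path = HEARTBEAT_PATH, max_age_s: float = 15.0) -> bool:
    """Treat the worker as healthy if its heartbeat was written recently."""
    try:
        stamp = float(heartbeat_path.read_text().strip())
    except (OSError, ValueError):
        return False
    return time.time() - stamp <= max_age_s
```

In a real deployment, `inference_worker` would be the `target` of a `multiprocessing.Process`. Under the `spawn` start method, `infer` then has to be a module-level function so it can be pickled. Loading the model inside the worker avoids sending large objects across the process boundary.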
1.2 The Orchestrator Thread (detection_loop)
The orchestrator runs as a daemon thread in the main FastAPI application:
- Pulls frames from the background camera thread.
- Feeds the inference worker.
- Coordinates dynamic visual search expiry timers.
- Saves crop snapshots of detected objects to `src/static/img/snapshots/` (PDPA compliant: generic `person` labels are skipped to protect privacy).
- Triggers asynchronous LLM reasoning via `SceneReasoner` when a scene shift is detected.
- Estimates walkable path clearance dynamically.
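The snapshot privacy rule amounts to a small guard in front of the crop. This sketch assumes boxes are `[x1, y1, x2, y2]` pixel coordinates, which matches the `box` field in the log schema, and that frames are NumPy arrays. The function name is hypothetical. Writing the crop to `src/static/img/snapshots/` (for example with `cv2.imwrite`) is not shown.

```python
from typing import Optional, Sequence

import numpy as np

GENERIC_PERSON_LABEL = "person"


def crop_for_snapshot(frame: np.ndarray, box: Sequence[float], label: str) -> Optional[np.ndarray]:
    """Return the crop to persist, or None when the label must not be saved."""
    if label.strip().lower() == GENERIC_PERSON_LABEL:
        return None  # privacy rule: never store crops of unidentified people
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    # Clamp to the frame so boxes that spill off-screen neither raise nor wrap around.
    x1, x2 = max(0, min(x1, w)), max(0, min(x2, w))
    y1, y2 = max(0, min(y1, h)), max(0, min(y2, h))
    if x2 <= x1 or y2 <= y1:
        return None  # box lies entirely outside the frame
    return frame[y1:y2, x1:x2].copy()
```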
1.3 Scene Change Similarity Optimization
To minimize token consumption and voice congestion, the orchestrator evaluates whether the frame has changed before invoking the LLM:
- Resizes both the previous LLM frame and the current frame to a tiny 64x64 grayscale image.
- Computes the mean absolute pixel difference using `cv2.absdiff().mean()`.
- If the difference is below 15.0, the camera is deemed static and the LLM API call is skipped.
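Here is a dependency-light sketch of the scene-change gate. It replaces OpenCV with NumPy so it runs without `cv2`. The grayscale conversion uses the same BT.601 weights as `cv2.cvtColor(..., cv2.COLOR_BGR2GRAY)`. Downscaling, however, uses nearest-neighbour sampling instead of the bilinear interpolation that `cv2.resize` uses by default, so the difference values can deviate slightly from the production check. The function names are illustrative.

```python
from typing import Optional

import numpy as np

SCENE_SIZE = 64
STATIC_THRESHOLD = 15.0


def to_tiny_gray(frame_bgr: np.ndarray, size: int = SCENE_SIZE) -> np.ndarray:
    """Approximate cv2 grayscale conversion + downscale to size x size."""
    frame = frame_bgr.astype(np.float32)
    if frame.ndim == 3:
        # BT.601 luma weights applied in OpenCV's BGR channel order.
        frame = frame[..., 0] * 0.114 + frame[..., 1] * 0.587 + frame[..., 2] * 0.299
    h, w = frame.shape
    rows = np.linspace(0, h - 1, size).round().astype(int)
    cols = np.linspace(0, w - 1, size).round().astype(int)
    return frame[np.ix_(rows, cols)]  # nearest-neighbour sampling


def scene_difference(prev_bgr: np.ndarray, curr_bgr: np.ndarray) -> float:
    """Mean absolute difference of the tiny frames (cf. cv2.absdiff(a, b).mean())."""
    return float(np.abs(to_tiny_gray(prev_bgr) - to_tiny_gray(curr_bgr)).mean())


def scene_changed(prev_bgr: Optional[np.ndarray], curr_bgr: np.ndarray,
                  threshold: float = STATIC_THRESHOLD) -> bool:
    """True when the LLM should be called; below the threshold the scene is static."""
    if prev_bgr is None:
        return True  # no previous LLM frame yet
    return scene_difference(prev_bgr, curr_bgr) >= threshold
```

Comparing 64x64 thumbnails makes the check cheap, at a cost of roughly 4,096 pixel operations per frame. It also ignores fine-grained sensor noise, which would otherwise trigger needless LLM calls.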
2. Workspace Directory Structure
smart-assistive-system/
├── config.py          # System configuration, constants, and thresholds
├── main.py            # Desktop interface entrypoint (OpenCV visualization mode)
├── run_web.py         # Web dashboard server entrypoint
├── pyproject.toml     # Package configuration and dependencies
├── reset_data.py      # Diagnostic script to purge local logs and databases
├── firestore.rules    # Security rules for Cloud Firestore
├── storage.rules      # Security rules for Firebase Storage
├── cache/             # Local storage cache (e.g. downloaded faces metadata)
├── chroma_db/         # Persistent vector database folders
├── tests/             # Unit testing suite (pytest)
└── src/
    ├── __init__.py
    ├── web_server.py       # Main FastAPI initialization & event lifecycle hooks
    ├── auth.py             # Session auth, clock skew handling, user registration
    ├── camera.py           # OpenCV camera thread capture logic
    ├── detector.py         # YOLO segmenter, dynamic class binder, and face matcher
    ├── face_recognizer.py  # InceptionResnetV1 embedding generation via facenet-pytorch
    ├── reasoner.py         # Cooldown controllers, approach logic, and scene change check
    ├── llm_service.py      # Gemini client interface, context build, and RAG connector
    ├── rag_service.py      # LlamaIndex semantic query router and temporal bounds parser
    ├── vector_store.py     # ChromaDB wrapper with multi-user isolation filters
    ├── data_logger.py      # Thread-safe JSONL writing and Firestore async log syncer
    ├── label_utils.py      # Label normalization, synonym mappings, and room categorizer
    ├── security.py         # Rate limiting configuration and security helpers
    ├── settings_db.py      # Settings getter/setter connected to Firestore preferences
    ├── templates/          # HTML/Jinja page views for the single-page application
    └── static/             # Frontend asset directories
        ├── css/            # Tailwind overrides and theme stylesheet configs
        └── js/             # Frontend JavaScript SPA logic
            ├── app.js      # Global script hook
            ├── core/       # Core frameworks (app-core.js typewriter, settings, etc.)
            └── modules/    # Page-specific view controllers (dashboard, timeline, etc.)
3. Core Components Reference
3.1 src/camera.py (Background Video Capture)
- Design Pattern: Runs a background capture loop thread (`update`) that grabs raw frames from camera `CAMERA_ID` at 5 ms intervals.
- Synchronization: Implements a `threading.Condition` variable. The server's video feed route calls `wait_for_frame(timeout=1.0)`, which blocks until a new frame is grabbed. This uses less CPU than busy polling in a loop.
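The capture/consumer hand-off can be sketched with `threading.Condition`. This is a minimal illustration, not the `camera.py` implementation. The `FrameBuffer` class, the `publish` method, and the sequence counter passed to `wait_for_frame` are all assumptions. The counter lets each consumer wait for a frame newer than the one it last received.

```python
import threading
from typing import Any, Optional, Tuple


class FrameBuffer:
    """Latest-frame holder shared by one capture thread and many consumers."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._frame: Any = None
        self._seq = 0  # incremented on every published frame

    def publish(self, frame: Any) -> None:
        """Called by the capture thread after each successful grab."""
        with self._cond:
            self._frame = frame
            self._seq += 1
            self._cond.notify_all()  # wake every waiting consumer

    def wait_for_frame(self, last_seq: int = 0, timeout: float = 1.0) -> Tuple[int, Optional[Any]]:
        """Block until a frame newer than `last_seq` exists, or time out with None."""
        with self._cond:
            ready = self._cond.wait_for(lambda: self._seq > last_seq, timeout=timeout)
            if not ready:
                return last_seq, None
            return self._seq, self._frame
```

An MJPEG route would loop on `seq, frame = buffer.wait_for_frame(seq)`. The thread sleeps inside `wait_for` until `publish` notifies it, so it never spins.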
3.2 src/detector.py (Object & Segment Detection)
- Model: Loads the open-vocabulary model `yoloe-26N-seg` and falls back to `yolo26n.pt` if it is unavailable.
- Segment Metric Extraction (`_extract_mask_metrics`):
  - Resizes the binary mask to the original frame dimensions.
  - Calculates `mask_area_ratio`, the fraction of all frame pixels covered by the mask.
  - Calculates `path_coverage` by slicing the mask between `PATH_BAND_LEFT` and `PATH_BAND_RIGHT`, which shows how much of the central walking path is blocked.
  - Extracts the largest external contour and uses `cv2.approxPolyDP` to simplify it into polygon coordinates for the web dashboard visualization.
- Face Match Linker:
  - If YOLO detects a `person`, the frame is processed by `FaceRecognizer`.
  - Compares the face bounding box with the YOLO person bounding box using intersection coordinates and an overlap score. The score is labeled IoU, but it is computed as Intersection over Area rather than Intersection over Union.
  - If the overlap exceeds 0.5, or the face center lies inside the person box and the overlap exceeds 0.3, the `person` label is replaced with the familiar identity's name.
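The mask metrics and face-to-person linking rules can be sketched with NumPy. The sketch makes four assumptions:

- Masks are already resized to frame resolution.
- `PATH_BAND_LEFT` and `PATH_BAND_RIGHT` are fractions of the frame width; the values below are placeholders.
- `path_coverage` is the fraction of pixels inside the band that the mask covers.
- The overlap score is the intersection area divided by the face-box area.

Contour simplification with `cv2.approxPolyDP` is omitted.

```python
from typing import Sequence, Tuple

import numpy as np

# Placeholder band edges (fractions of frame width); the real values live in config.
PATH_BAND_LEFT = 0.25
PATH_BAND_RIGHT = 0.75


def mask_metrics(mask: np.ndarray, band_left: float = PATH_BAND_LEFT,
                 band_right: float = PATH_BAND_RIGHT) -> Tuple[float, float]:
    """Return (mask_area_ratio, path_coverage) for a frame-sized binary mask."""
    mask = mask.astype(bool)
    _, w = mask.shape
    area_ratio = float(mask.mean())  # fraction of all pixels covered
    band = mask[:, int(w * band_left):int(w * band_right)]
    path_coverage = float(band.mean()) if band.size else 0.0
    return area_ratio, path_coverage


def intersection_over_area(face: Sequence[float], person: Sequence[float]) -> float:
    """Intersection area divided by the face box area; boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(face[0], person[0]), max(face[1], person[1])
    ix2, iy2 = min(face[2], person[2]), min(face[3], person[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    face_area = max(0.0, face[2] - face[0]) * max(0.0, face[3] - face[1])
    return inter / face_area if face_area > 0 else 0.0


def link_face(person_box: Sequence[float], face_box: Sequence[float],
              face_name: str, label: str = "person") -> str:
    """Apply the documented thresholds to decide whether the face names this person."""
    overlap = intersection_over_area(face_box, person_box)
    cx = (face_box[0] + face_box[2]) / 2
    cy = (face_box[1] + face_box[3]) / 2
    center_inside = person_box[0] <= cx <= person_box[2] and person_box[1] <= cy <= person_box[3]
    if overlap > 0.5 or (center_inside and overlap > 0.3):
        return face_name
    return label
```

Dividing by the face area rather than the union matters here. A face box is much smaller than the surrounding person box, so a true IoU would stay far below 0.5 even for a perfectly contained face.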
3.3 src/face_recognizer.py (Familiar Face Identifier)
- Framework: Uses `facenet_pytorch`. It instantiates `MTCNN` for face localization and `InceptionResnetV1` (pretrained on `vggface2`) for feature embedding.
- Euclidean Embedding Distance:
  - Downloads the user's registered faces from Firebase Storage and extracts the known embeddings.
  - Compares face embeddings from live camera frames against the known face vectors using the L2 norm (`np.linalg.norm`).
  - A match is confirmed if the L2 (Euclidean) distance is below 0.85.
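The following sketch shows the matching step. It assumes embeddings are NumPy-compatible vectors and that known faces are held as parallel lists of embeddings and names. For reference, facenet-pytorch's `InceptionResnetV1` returns 512-dimensional, L2-normalized embeddings in non-classify mode. Picking the single nearest neighbour and the function name are illustrative choices.

```python
from typing import Optional, Sequence

import numpy as np

MATCH_THRESHOLD = 0.85  # documented L2 distance cutoff


def match_face(embedding: Sequence[float], known_embeddings: Sequence[Sequence[float]],
               known_names: Sequence[str], threshold: float = MATCH_THRESHOLD) -> Optional[str]:
    """Return the closest registered name, or None if no distance is under the threshold."""
    if len(known_names) == 0:
        return None
    known = np.asarray(known_embeddings, dtype=np.float32)  # shape (N, D)
    query = np.asarray(embedding, dtype=np.float32)          # shape (D,)
    dists = np.linalg.norm(known - query, axis=1)             # L2 distance to each known face
    best = int(np.argmin(dists))
    return known_names[best] if dists[best] < threshold else None
```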
3.4 src/audio.py (Speech Synthesizer Daemon)
- Design: Spawns a background worker thread (`worker`) that processes queued speech text sequentially.
- OS English Engine (Zero-Latency):
  - macOS: Uses the shell `say` command via `subprocess.Popen`.
  - Windows: Runs a PowerShell synthesizer script: `(New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak(...)`.
- Foreign TTS (Burmese/Myanmar, Thai, Japanese, Chinese):
  - Uses `gTTS` (Google Text-to-Speech) via web requests. The audio is saved locally as a temporary MP3 file and played through `pygame.mixer`. The file is unloaded and deleted after playback.
- Interruption: Calling `interrupt()` immediately terminates the running subprocess, stops pygame music playback, and clears the queued backlog.
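Platform dispatch and interruption could look like the sketch below. The command shapes follow the engines named above. The `Add-Type -AssemblyName System.Speech` prefix is needed because Windows PowerShell does not load that assembly by default. The function names and the `powershell -NoProfile -Command` invocation are assumptions, not the actual code in `audio.py`.

```python
import queue
import subprocess
import sys
from typing import List, Optional


def build_tts_command(text: str, platform: str = sys.platform) -> Optional[List[str]]:
    """Return an argv list for the OS speech engine, or None if there is none."""
    if platform == "darwin":
        return ["say", text]
    if platform.startswith("win"):
        escaped = text.replace("'", "''")  # escape quotes for a PowerShell single-quoted string
        script = (
            "Add-Type -AssemblyName System.Speech; "
            f"(New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak('{escaped}')"
        )
        return ["powershell", "-NoProfile", "-Command", script]
    return None  # other platforms would fall back to gTTS


def interrupt(current: Optional[subprocess.Popen], backlog: "queue.Queue[str]") -> int:
    """Stop the utterance in progress and drop queued text; returns items dropped."""
    if current is not None and current.poll() is None:
        current.terminate()
    # pygame.mixer.music.stop() would also be called here for gTTS playback.
    dropped = 0
    while True:
        try:
            backlog.get_nowait()
        except queue.Empty:
            return dropped
        dropped += 1
```

Passing an argv list to `subprocess.Popen` (no `shell=True`) keeps user text from being interpreted by the shell on macOS. On Windows, the quote escaping prevents the text from breaking out of the PowerShell string literal.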
3.5 src/llm_service.py (Alice Reasoning Engine)
- Model: Uses Gemini 3 Flash (`gemini-3-flash-preview` via the Google GenAI SDK).
- Multimodal Feed: Accepts the current camera frame (converted to a PIL RGB image), the names of detected objects, and the list of the user's registered faces.
- Context Loading: Fetches the user settings from Firestore. If configured, it adds three items to the LLM system prompt: active online haptic/wearable devices, time-of-day information (Morning/Evening/Night guidelines), and the user's object preferences.
- Burmese/Foreign Script Guidelines: If `TARGET_LANGUAGE` is not English, the system instructs Gemini to write the response exclusively in the target language's script. This prevents the TTS engine from spelling out English abbreviations letter by letter.
- Semantic Function Calling: Registers Gemini functions for starting a dynamic object search (`search_for_object`) and ending it (`stop_search`).
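Here is a sketch of how the two search functions might be declared and dispatched. The declarations follow the name/description/JSON-schema-style `parameters` shape used by Gemini function calling. However, the `label` parameter, the descriptions, and the `dispatch_function_call` helper are hypothetical. Wrapping the declarations for the Google GenAI SDK is not shown.

```python
from typing import Any, Callable, Dict, List

# Hypothetical declarations; parameter names and descriptions are illustrative.
SEARCH_FUNCTIONS: List[Dict[str, Any]] = [
    {
        "name": "search_for_object",
        "description": "Start a dynamic visual search for an object the user asked about.",
        "parameters": {
            "type": "object",
            "properties": {
                "label": {"type": "string", "description": "Object to look for, e.g. 'keys'."},
            },
            "required": ["label"],
        },
    },
    {
        "name": "stop_search",
        "description": "Stop the active visual search.",
        "parameters": {"type": "object", "properties": {}},
    },
]


def dispatch_function_call(
    name: str,
    args: Dict[str, Any],
    start_search: Callable[[str], Any],
    stop_search: Callable[[], Any],
) -> Any:
    """Route a model-issued function call to the matching local handler."""
    if name == "search_for_object":
        label = str(args.get("label", "")).strip().lower()
        if not label:
            raise ValueError("search_for_object requires a non-empty label")
        return start_search(label)
    if name == "stop_search":
        return stop_search()
    raise ValueError(f"unknown function: {name}")
```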
3.6 src/rag_service.py (LlamaIndex Context Retrieval)
- Core: Powered by a LlamaIndex `VectorStoreIndex` with Google's `text-embedding-004` embedding model.
- Logical Collections:
  - `system_docs`: Stores system features and support guides for user support questions.
  - `detection_memory`: Stores spatial/object logs.
  - `guidance_memory`: Stores historical guidance texts.
- Search Intent Router: Classifies each query into one of these intents:
  - `support`: queries `system_docs`.
  - `hazard_memory`: queries `detection_memory` filtered by `is_dangerous=True`.
  - `object_location`: queries object detections and adds a fallback query for exact keyword matches.
  - `temporal_memory`: queries both detections and guidance.
  - `general_memory`: all other queries.
- Scoring & Reranking:
  - Applies parsed temporal bounds (date filters).
  - Computes a recency boost of `max(0, 3.0 - (age_hours / 24.0))`, which decays linearly from 3.0 to zero over 72 hours.
  - Computes a keyword match boost: adds +2.0 to the score when the target object label matches through token overlap.
  - Adds +4.0 for an exact match on the `is_dangerous` flag (hazard queries) or on the target label (object searches).
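The documented boosts can be combined as shown below. Only the boost constants and the 72-hour decay come from the text above. Three details are assumptions: the boosts are added to the retriever's similarity score, tokens are split on whitespace, and the function names are illustrative.

```python
from typing import Iterable, Optional


def recency_boost(age_hours: float) -> float:
    """Linear decay from +3.0 for a fresh memory to 0 at 72 hours."""
    return max(0.0, 3.0 - age_hours / 24.0)


def rerank_score(
    base_score: float,
    age_hours: float,
    doc_label: str,
    query_tokens: Iterable[str],
    intent: str,
    target_label: Optional[str] = None,
    is_dangerous: bool = False,
) -> float:
    """Add the documented boosts to a retrieved node's base score (sketch)."""
    score = base_score + recency_boost(age_hours)
    label_tokens = set(doc_label.lower().split())
    if label_tokens & {t.lower() for t in query_tokens}:
        score += 2.0  # keyword overlap boost
    if intent == "hazard_memory" and is_dangerous:
        score += 4.0  # exact hazard-flag match
    elif intent == "object_location" and target_label and doc_label.lower() == target_label.lower():
        score += 4.0  # exact target-label match
    return score
```

For example, a "cup" detection logged 24 hours ago, with base score 1.0 and the query "where is my cup", scores 1.0 + 2.0 + 2.0 + 4.0 = 9.0.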
3.7 src/vector_store.py (ChromaDB Wrapper)
- Multi-User Logical Isolation: When `query()` is called, the wrapper dynamically builds a ChromaDB `where` filter. It combines conditions with the `$and` logical operator and always includes a strict `{"user_id": user_id}` filter to prevent cross-tenant data leaks.
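Below is a sketch of the tenant-scoped filter builder. It assumes extra filters arrive as a flat dict of metadata equality conditions. When there is nothing to combine, it returns the bare `user_id` condition, because ChromaDB validates that `$and` contains at least two clauses. The function name is hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_where(user_id: str, filters: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a ChromaDB `where` filter that is always scoped to one user."""
    if not user_id:
        raise ValueError("user_id is required for tenant isolation")
    clauses: List[Dict[str, Any]] = [{"user_id": user_id}]
    for key, value in (filters or {}).items():
        if key == "user_id":
            continue  # callers may never override the tenant filter
        clauses.append({key: value})  # one equality condition per clause
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}
```

The result would be passed as `where=` to a collection's `query()` call, for example `collection.query(query_texts=[...], where=build_where(uid, {"is_dangerous": True}))`.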
3.8 src/label_utils.py (Normalization & Room Categorization)
- Synonyms: Standardizes inputs (e.g. "automobile" -> "car", "sofa" -> "couch") via `config.LABEL_SYNONYMS`.
- Room Inference Rules: Maps labels to rooms:
  - Refrigerator, microwave, cups -> Kitchen
  - Couch, TV, remote -> Living Room
  - Bed, personal faces -> Bedroom
  - Toilet -> Bathroom
  - Cars, curbs, bollards -> Outside/Door
- Identity Presence: Maps all familiar/registered faces and the generic `person` label to Bedroom to signify personal presence.
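The synonym and room rules reduce to two lookups. This sketch copies the mappings listed above. The dictionary below is an illustrative subset of `config.LABEL_SYNONYMS`, the lowercase COCO-style label names are assumptions, and so is the naive plural stripping (for example "cups" to "cup").

```python
from typing import Iterable, Optional

# Illustrative subset; the real mapping lives in config.LABEL_SYNONYMS.
LABEL_SYNONYMS = {"automobile": "car", "sofa": "couch"}

ROOM_RULES = {
    "Kitchen": {"refrigerator", "microwave", "cup"},
    "Living Room": {"couch", "tv", "remote"},
    "Bedroom": {"bed", "person"},
    "Bathroom": {"toilet"},
    "Outside/Door": {"car", "curb", "bollard"},
}
ALL_ROOM_LABELS = set().union(*ROOM_RULES.values())


def normalize_label(label: str) -> str:
    """Lowercase, apply synonyms, and strip a trailing plural 's' for known labels."""
    key = label.strip().lower()
    key = LABEL_SYNONYMS.get(key, key)
    if key.endswith("s") and key[:-1] in ALL_ROOM_LABELS:
        key = key[:-1]
    return key


def infer_room(label: str, familiar_names: Iterable[str] = ()) -> Optional[str]:
    """Map a detection label (or registered face name) to a room, if any rule applies."""
    if label in set(familiar_names):
        return "Bedroom"  # registered identities count as personal presence
    key = normalize_label(label)
    for room, labels in ROOM_RULES.items():
        if key in labels:
            return room
    return None
```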
4. Security, Compliance, & Privacy
- Local Edge Inference: AuraVision runs YOLOE segmentation, MTCNN face detection, and FaceNet embedding calculations entirely on the local device. No continuous video stream is sent to cloud servers. Individual frames leave the device only when scene reasoning sends the current frame to Gemini (see 3.5).
- PDPA / GDPR Compliance: Camera snapshots (crops of bounding boxes) are saved locally under `src/static/img/snapshots/` and shown in the user timeline. To comply with these regulations, crops are not saved when the label is a generic `person`. Crops are saved only for recognized familiar names, which users register explicitly in the Identity section, or for non-human objects.
- Database Security (Firestore / Storage Rules): See `firestore.rules` and `storage.rules`. Only authenticated users can access data, and only their own:
  `match /users/{userId}/{document=**} { allow read, write: if request.auth != null && request.auth.uid == userId; }`
- Clock Skew Retry: Firebase auth tokens verified at startup can be rejected with "Token used too early" if the device's system clock lags slightly behind Google's servers. `src/auth.py` catches this error, waits 3 seconds, and retries token validation.
- XSS Protection: Frontend typewriter responses pass through `DOMPurify.sanitize()` before the markdown text is rendered into document elements.
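The clock-skew handling amounts to a single delayed retry. In this sketch, `verify` would be something like `firebase_admin.auth.verify_id_token`. Two details are assumptions: the error is detected by matching its message text, and `sleep` is injectable so tests run fast. Neither is necessarily how `src/auth.py` implements it.

```python
import time
from typing import Any, Callable

# Text of the rejection seen when the token's issue time is ahead of the local clock.
CLOCK_SKEW_MARKER = "Token used too early"


def verify_with_skew_retry(
    verify: Callable[[str], Any],
    token: str,
    delay_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Verify a token, retrying once after `delay_s` on a clock-skew rejection."""
    try:
        return verify(token)
    except Exception as exc:  # the verifier's concrete exception type may vary
        if CLOCK_SKEW_MARKER not in str(exc):
            raise  # unrelated failures (expired, revoked, malformed) propagate
        sleep(delay_s)
        return verify(token)  # a second failure also propagates
```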
5. Database Schema & Data Models
5.1 Cloud Database Schema (Google Firestore)
/users/{user_id}
{
"email": "user@example.com",
"name": "TayZa",
"avatar_url": "https://lh3.googleusercontent.com/...",
"settings": {
"show_overlays": true
}
}
/users/{user_id}/settings/preferences
{
"voice": {
"type": "female",
"rate": 1.2,
"volume": 80,
"language": "english"
},
"ai_params": {
"hazards": true,
"people": true,
"daily_objects": false,
"ai_mode": "advanced",
"sensitivity": 0.5
},
"general": {
"theme": "dark",
"performance": "balanced"
},
"navigation": {
"voice_enabled": true,
"announce_distance": true
},
"last_sync": "2026-05-30T01:15:30.123456"
}
/users/{user_id}/logs/{log_id}
{
"timestamp": "2026-05-30T01:15:30",
"type": "detection",
"label": "knife",
"metadata": {
"box": [100.5, 200.2, 150.3, 300.9],
"confidence": 0.89,
"distance": "near",
"position": "center",
"is_dangerous": true,
"path_coverage": 0.35,
"mask_area_ratio": 0.18,
"mask_contour": [[100, 200], [150, 200], [150, 300], [100, 300]]
}
}
/users/{user_id}/faces/{face_id}
{
"name": "Mom",
"relationship": "Mother",
"phone_number": "+123456789",
"is_emergency": true,
"notes": "Spends time in Kitchen",
"group": "Family",
"file_path": "https://storage.googleapis.com/...",
"storage_path": "faces/user_id/mom_a1b2c3d4.jpg",
"created_at": "2026-05-30T01:15:30.123456Z"
}
/users/{user_id}/devices/{device_id}
{
"name": "Smart cane companion",
"status": "online",
"battery": 92
}
/users/{user_id}/saved_destinations/{dest_id}
{
"name": "Central Hospital",
"place_id": "ChIJ...",
"address": "123 Health Ave",
"lat": 16.8206,
"lng": 96.1317,
"category": "hospital",
"created_at": "2026-05-30T01:15:30Z",
"updated_at": "2026-05-30T01:15:30Z"
}
/users/{user_id}/activities/{activity_id}
{
"started_at": "2026-05-30T00:00:00Z",
"ended_at": "2026-05-30T00:45:00Z",
"duration_sec": 2700,
"distance_m": 1250.0,
"avg_speed_mps": 0.46,
"max_speed_mps": 1.2,
"paused_sec": 120,
"raw_point_count": 520,
"encoded_point_count": 92,
"polyline": "_p~iFzseuU...",
"start_lat": 16.8206,
"start_lng": 96.1317,
"end_lat": 16.8250,
"end_lng": 96.1350,
"preview_status": "ready",
"preview_storage_path": "activity_previews/user_id/activity_id.png",
"preview_updated_at": "2026-05-30T00:46:00Z",
"created_at": "2026-05-30T00:45:00Z",
"updated_at": "2026-05-30T00:45:00Z"
}
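The sample `polyline` value looks like a string in Google's Encoded Polyline Algorithm Format. Assuming that format at the standard 5-decimal precision, the encoder below would produce such a string. It is a reference implementation of the public algorithm, not code taken from AuraVision.

```python
from typing import Iterable, List, Tuple


def _encode_value(delta: int) -> str:
    """Encode one signed integer delta as printable 5-bit chunks."""
    value = ~(delta << 1) if delta < 0 else delta << 1  # zig-zag style sign folding
    chunks: List[str] = []
    while value >= 0x20:
        chunks.append(chr((0x20 | (value & 0x1F)) + 63))  # continuation bit set
        value >>= 5
    chunks.append(chr(value + 63))
    return "".join(chunks)


def encode_polyline(points: Iterable[Tuple[float, float]], precision: int = 5) -> str:
    """Encode (lat, lng) pairs as deltas from the previous point."""
    factor = 10 ** precision
    out: List[str] = []
    prev_lat = prev_lng = 0
    for lat, lng in points:
        ilat, ilng = round(lat * factor), round(lng * factor)
        out.append(_encode_value(ilat - prev_lat))
        out.append(_encode_value(ilng - prev_lng))
        prev_lat, prev_lng = ilat, ilng
    return "".join(out)
```

Storing deltas rather than absolute coordinates keeps most characters small. That is why a walk of several hundred points fits comfortably in a single Firestore string field.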
5.2 Local Storage Formats
File: detections.jsonl
An append-only JSON Lines file (one JSON object per line) that serves as a fast local fallback store:
{"timestamp": "2026-05-30T01:15:30", "type": "detection", "label": "cup", "user_id": "user_id_123", "metadata": {"box": [50.0, 60.0, 90.0, 110.0], "confidence": 0.72, "distance": "far", "position": "left", "is_dangerous": false, "path_coverage": 0.0}}
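Here is a minimal thread-safe JSONL appender in the spirit of `data_logger.py`. A single lock serializes writers, so lines from concurrent threads never interleave. The class and method names are hypothetical. The `tail` helper shows how a bounded sync, such as the 50-line manual sync endpoint, could read the most recent records.

```python
import json
import threading
from pathlib import Path
from typing import Any, Dict, List, Union


class JsonlLogger:
    """Append-only JSON Lines writer safe to share between threads."""

    def __init__(self, path: Union[str, Path]) -> None:
        self._path = Path(path)
        self._lock = threading.Lock()

    def append(self, record: Dict[str, Any]) -> None:
        line = json.dumps(record, ensure_ascii=False)
        with self._lock:  # one writer at a time, so lines never interleave
            with self._path.open("a", encoding="utf-8") as fh:
                fh.write(line + "\n")

    def tail(self, n: int) -> List[Dict[str, Any]]:
        """Return the last `n` records (oldest first)."""
        if n <= 0:
            return []
        with self._lock:
            if not self._path.exists():
                return []
            lines = self._path.read_text(encoding="utf-8").splitlines()
        return [json.loads(line) for line in lines[-n:] if line.strip()]
```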
ChromaDB Collections (Local Vector Memory)
- `system_docs`: Support document chunks indexed with metadata `{"memory_type": "system_doc", "user_id": "system"}`.
- `detection_memory`: Detections indexed with metadata `{ "memory_type": "detection", "label": "chair", "normalized_label": "chair", "room": "Living Room", "is_dangerous": false, "user_id": "user_id_123" }`.
- `guidance_memory`: Historical responses from Alice, indexed with `{"memory_type": "guidance", "user_id": "user_id_123"}`.
- `vision_events`: Legacy vector collection kept for backward compatibility.
6. Complete API Catalog
| Method | Endpoint | Tags | Description |
|---|---|---|---|
| GET | /login | auth | Renders the login HTML view. Redirects authenticated sessions to dashboard. |
| GET | /signup | auth | Renders the sign-up HTML view. |
| POST | /auth/verify | auth | Accepts idToken payload, decodes Firebase JWT, provisions Firestore user document, and sets user session. |
| GET | /auth/logout | auth | Removes user_id from session and redirects to root login. |
| GET | / | views | Root view. Redirects to /login or serves the SPA page. |
| GET | /timeline | views | Renders SPA page routed to the timeline sub-section. |
| GET | /analytic | views | Renders SPA page routed to analytics sub-section. |
| GET | /settings | views | Renders SPA page routed to settings sub-section. |
| GET | /identity | views | Renders SPA page routed to identity management sub-section. |
| GET | /video_feed | feeds | Serves the MJPEG video stream with overlay boxes, segment masks, and a path clearance indicator bar drawn on the frames. |
| GET | /api/status | status | Returns system active flags, current detections array, FPS statistics, latest LLM dialogue, and cached logs list. |
| WS | /ws/status | status | WebSocket connection providing state pushes at 10 FPS (100ms ticks). |
| GET | /api/timeline/events | timeline | Retrieves merged logs list from local JSONL and Cloud Firestore. |
| GET | /api/timeline/locations | timeline | Returns the percentage of time spent in each room (Kitchen, Living Room, etc.) over the past $N$ hours. |
| GET | /api/timeline/heatmap | timeline | Groups location counts into hourly slots for the ApexCharts heatmap. |
| GET | /api/timeline/insights | timeline | Generates a conversational summary of historical activities utilizing the LLM. |
| GET | /api/dashboard/stats | dashboard | Generates complete statistical datasets (trends, category groups, radar coordinates, proximity counts) for the analytics dashboard charts. |
| GET | /api/dashboard/live_stats | dashboard | Retrieves live stats collected over the immediate past 60 seconds. |
| GET | /api/settings | settings | Gets the current user settings. Syncs language code configuration variables. |
| POST | /api/settings | settings | Updates the preferences document in Firestore and changes global configuration flags. |
| POST | /api/settings/sync | settings | Performs a manual backup sync, pushing the last 50 local JSONL lines to Firestore. |
| POST | /api/settings/overlays | settings | Updates variable flag controlling overlays rendering on MJPEG camera feed. |
| GET | /api/navigation/destinations | navigation | Lists user's saved locations. |
| POST | /api/navigation/destinations | navigation | Saves a location entry with coordinates (maximum 20). |
| PUT | /api/navigation/destinations/{id}| navigation | Updates saved location label/category in Firestore. |
| DELETE| /api/navigation/destinations/{id}| navigation | Removes saved location. |
| POST | /api/navigation/activities | navigation | Saves a recorded navigation session. Requests static map thumbnail generation and stores preview image path. |
| GET | /api/navigation/activities | navigation | Lists logged navigation activities. |
| GET | /api/navigation/activities/{id}/preview | navigation | Retrieves the Static Map preview PNG from Firebase Storage. |
| DELETE| /api/navigation/activities/{id} | navigation | Removes navigation activity and deletes preview PNG from Storage. |
| GET | /api/user/me | user | Returns profile fields of current session. |
| GET | /api/faces | faces | Fetches details of the registered known faces. |
| POST | /api/faces | faces | Uploads a face file, adds a document to the Firestore faces collection, and triggers an encoding reload. |
| DELETE| /api/faces/{id} | faces | Deletes the face document and its Storage file, then reloads encodings. |
| POST | /api/faces/capture | faces | Captures a frame from the running camera, uploads it to Storage, and registers its face encoding. |
| GET | /api/faces/{id}/speak | faces | Speaks the matched face name and relationship via the TTS system. |
| POST | /api/system/state | system | Starts/stops the camera feed thread and detection loop. |
| POST | /api/faces/mode | system | Toggles camera state for registration page context. |
| POST | /api/ask | system | Answers conversational user questions using RAG (detections context). |
| POST | /api/support/ask | system | Answers system help and feature questions using system docs. |
| POST | /api/audio/state | system | Mutes/unmutes TTS synthesis operations. |
| POST | /api/search/start | system | Launches a dynamic visual search for a target class. |
| POST | /api/search/stop | system | Ends the current search query and resets the vocabulary list. |
| GET | /api/search/status | system | Returns the search's active status and the seconds remaining before expiry. |
| GET | /api/devices | system | Lists registered haptic/vibration devices. |
| POST | /api/devices | system | Pairs a new smart device. |
| DELETE| /api/devices/{id} | system | Deletes a paired smart device. |
| GET | /api/devices/pairing-token| system | Generates a quick QR pairing token. |
| POST | /api/devices/quick-pair | system | Pairs device via QR pairing token. |
| POST | /api/settings/delete-data | system | Standardized endpoint to wipe logs and memory indices across cloud and local storage. |
7. Frontend Architecture (SPA)
The web dashboard is structured as a Single-Page Application (SPA) utilizing vanilla JavaScript and custom stylesheets:
7.1 View Controller Routing (router.js)
- Dynamically toggles the `.hidden`/`.flex` classes on the matching container elements: `#view-dashboard`, `#view-timeline`, `#view-settings`, `#view-analytic`, `#view-identities`.
- Controls the active navigation state of the sidebar anchors.
- Triggers initialization hooks when a tab loads: `initTimeline()`, `initAnalyticDashboard()`, `loadIdentities()`.
- Listens to browser `popstate` events so the back button keeps working.
7.2 Typewriter Speech Synchronizer (AuraVisionApp.utils.runTypewriter)
- Objective: Reveal chatbot messages in sync with browser SpeechSynthesis audio playback.
- Mechanism:
  - Uses the standard browser `speechSynthesis` API.
  - Subscribes to the `onboundary` callback and listens for events of type `word`.
  - Uses `event.charIndex` to compute the character position.
  - Truncates text nodes and selectively updates text content up to the matching word index.
  - Cascades elements' display styles to visible as parent levels are parsed.
  - Automatically scrolls active containers to keep the typewriter text in view.
  - Falls back to a time-based character reveal loop if the system speech engine does not fire boundary events.
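The reveal logic runs as JavaScript in the browser, but its index arithmetic does not depend on the language, so it is sketched here in Python to match the other examples. The sketch assumes:

- `charIndex` points at the first character of the word being spoken.
- The reveal extends to the end of that word.
- The fallback reveals a fixed number of characters per second; the rate below is a placeholder.

```python
import re

_WORD = re.compile(r"\S*")
DEFAULT_CHARS_PER_SECOND = 15.0  # placeholder fallback rate


def reveal_prefix(text: str, char_index: int) -> str:
    """Text to show when a `word` boundary event reports `char_index`."""
    start = min(max(char_index, 0), len(text))
    end = _WORD.match(text, start).end()  # extend through the word being spoken
    return text[:end]


def fallback_reveal_length(text: str, elapsed_s: float,
                           chars_per_second: float = DEFAULT_CHARS_PER_SECOND) -> int:
    """Characters to show after `elapsed_s` when no boundary events arrive."""
    return min(len(text), max(0, int(elapsed_s * chars_per_second)))
```

Revealing through the end of the current word, rather than stopping at `charIndex`, keeps the visible text slightly ahead of the audio instead of lagging behind it.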
7.3 Frontend Modules Structure
- `activity.js`: Records navigation walks: monitors GPS location, computes current speed, draws path overlays on maps, and saves activity sessions.
- `analytics.js`: Fetches historical data counts and renders dashboard widgets using ApexCharts (safety trends, location heatmaps, proximity charts).
- `dashboard.js`: Configures the main status indicators (FPS, CPU/process loop statuses, active detection lists, interactive voice chat widgets).
- `identity.js`: Handles file uploads, camera snapshot triggers, profile naming, and speaking names aloud.
- `navigation.js`: Renders active navigation routes, fetches Google directions, and coordinates haptic guidance step updates.
- `settings.js`: Manages theme changes, AI parameter sliders (focus zones, sensitivity), and database synchronization triggers.
- `timeline.js`: Implements chronological log views, filters events by severity/location, and handles insight summaries.
8. DevOps & Verification Procedures
8.1 Setup and Dependency Installation
# Create local virtualenv
python3 -m venv .venv
source .venv/bin/activate
# Upgrade pip
python -m pip install --upgrade pip
# Install in editable mode (forces LlamaIndex + Ultralytics resolution)
python -m pip install -e .
If the uv tool is on your PATH:
uv sync
8.2 Execution Command Reference
Web Server (FastAPI Dashboard)
.venv/bin/python run_web.py
Desktop Mode (Local OpenCV Interface)
.venv/bin/python main.py
Direct Uvicorn Server Command (development: `--reload` restarts the server on code changes and should be omitted in production)
.venv/bin/python -m uvicorn src.web_server:app --host 0.0.0.0 --port 8080 --reload
8.3 System Verification & Pytest Suite
AuraVision includes a unit test suite built on pytest.
# Run full suite
.venv/bin/python -m pytest -q
> [!WARNING]
> Running the full suite may report a failure in `tests/test_assistant_policy.py` due to a legacy unresolved module dependency (`src.assistant_policy`). This is unrelated to the current RAG pipeline.
Targeted RAG & Core Verification Tests
Run targeted tests to verify detection pipeline and RAG memory logic:
.venv/bin/python -m pytest tests/test_vector_store.py tests/test_rag_service.py tests/test_llm_service.py tests/test_data_logger.py tests/test_reasoner.py -q
Expected outcome: 21 passed.
8.4 Maintenance Scripts
- Data Purging: To perform a clean wipe of local logs, cache data, and ChromaDB directories, execute:
.venv/bin/python reset_data.py