feat: Add metrics instrumentation for queue, health, and database operations

Implement Prometheus metrics to populate empty Grafana dashboard panels.

## Phase 1: Queue Size Metrics ✅

**File**: `processor.py`

- Track vector sync queue depth in real-time
- Update metric after receiving and processing each document
- Update metric during timeout (empty queue)
- Enables: "Processing Queue Depth" panel

## Phase 2: Health Check Metrics ✅

**File**: `app.py`

- Add Nextcloud connectivity check with timing
- Add Qdrant health check with timing
- Record dependency health status (up/down)
- Record health check duration
- Enables: 4 health status panels + health check duration panel

## Phase 3: Database Operation Metrics (Partial) ⏳

**File**: `storage.py`

- Instrument `store_refresh_token()` method
- Track SQLite INSERT operation timing and success/error status
- Enables: Partial data for database operation latency panel

## Metrics Now Exposed

### Queue Metrics:

- `mcp_vector_sync_queue_size` - Real-time queue depth

### Health Metrics:

- `mcp_dependency_health{dependency="nextcloud"}` - UP/DOWN status
- `mcp_dependency_health{dependency="qdrant"}` - UP/DOWN status
- `mcp_dependency_check_duration_seconds{dependency}` - Health check latency

### Database Metrics:

- `mcp_db_operations_total{db="sqlite",operation="insert"}` - Operation count
- `mcp_db_operation_duration_seconds{db="sqlite",operation="insert"}` - Operation latency

## Dashboard Impact

**Panels Now Populated** (7/34 panels):

- ✅ Processing Queue Depth
- ✅ Nextcloud Health
- ✅ Qdrant Health
- ✅ Health Check Duration
- ✅ Database Operation Latency (partial)
- ✅ Vector sync panels (already working from PR #292)

**Panels Still Empty** (remaining work):

- ⏳ OAuth panels (4): Token validations, exchanges, cache hit rate, refresh ops
- ⏳ MCP tool panels (3): Call volume, error rates, execution duration
- ⏳ Database panel: Needs more SQLite operations instrumented (~29 remaining)

## Testing

Verified metric definitions exist and will be recorded on next deployment.
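
The Phase 2 pattern (time a dependency probe, then record both its up/down status and its duration) can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the `check_dependency` helper, the two dictionaries standing in for the Prometheus gauge and histogram, and the synchronous probes are all assumptions made to keep the example self-contained.

```python
import time
from typing import Callable, Dict

# Hypothetical in-memory stand-ins for the Prometheus metrics.
health_status: Dict[str, int] = {}      # dependency -> 1 (up) or 0 (down)
check_durations: Dict[str, float] = {}  # dependency -> last check duration, seconds


def check_dependency(name: str, probe: Callable[[], None]) -> bool:
    """Run one probe and record its status and duration under `name`."""
    start = time.perf_counter()
    try:
        probe()
        healthy = True
    except Exception:
        # Any failure counts as "down"; the duration is still recorded.
        healthy = False
    check_durations[name] = time.perf_counter() - start
    health_status[name] = 1 if healthy else 0
    return healthy


def failing_probe() -> None:
    # Simulates an unreachable dependency.
    raise ConnectionError("connection refused")


check_dependency("nextcloud", lambda: None)
check_dependency("qdrant", failing_probe)
print(health_status)  # {'nextcloud': 1, 'qdrant': 0}
```

Recording the duration for failed checks as well keeps slow failures (for example, connection timeouts) visible in the latency panel.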

## Next Steps

Phase 4: OAuth token metrics (unified_verifier.py, context_helper.py, storage.py)
Phase 5: MCP tool metrics (all server/*.py files with @mcp.tool())
Phase 3 completion: Remaining 29 database operations in storage.py

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
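
For the remaining database operations, one possible shape for the instrumentation is a context manager that times the operation and records a success or error outcome. The sketch below is an assumption-laden illustration, not the code in `storage.py`: `timed_db_operation`, the `status` label, the in-memory counters, and the `refresh_tokens` table schema are all hypothetical.

```python
import sqlite3
import time
from collections import Counter
from contextlib import contextmanager

# Hypothetical stand-ins for a labelled counter and a histogram.
op_counts: Counter = Counter()  # (db, operation, status) -> count
op_durations: list = []         # (db, operation, seconds)


@contextmanager
def timed_db_operation(db: str, operation: str):
    """Time the wrapped block and count it as success or error."""
    start = time.perf_counter()
    status = "success"
    try:
        yield
    except Exception:
        status = "error"
        raise  # Metrics must not swallow the original failure.
    finally:
        op_durations.append((db, operation, time.perf_counter() - start))
        op_counts[(db, operation, status)] += 1


conn = sqlite3.connect(":memory:")
# Hypothetical schema, used only to make the example runnable.
conn.execute("CREATE TABLE refresh_tokens (user_id TEXT PRIMARY KEY, token TEXT)")

with timed_db_operation("sqlite", "insert"):
    conn.execute("INSERT INTO refresh_tokens VALUES (?, ?)", ("alice", "t1"))

try:
    # A duplicate primary key makes this INSERT fail.
    with timed_db_operation("sqlite", "insert"):
        conn.execute("INSERT INTO refresh_tokens VALUES (?, ?)", ("alice", "t2"))
except sqlite3.IntegrityError:
    pass

print(dict(op_counts))
```

Wrapping each operation this way would keep the timing logic in one place as the other storage methods are instrumented.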
@@ -18,6 +18,7 @@ from nextcloud_mcp_server.embedding import get_embedding_service
 from nextcloud_mcp_server.observability.metrics import (
     record_qdrant_operation,
     record_vector_sync_processing,
+    update_vector_sync_queue_size,
 )
 from nextcloud_mcp_server.observability.tracing import trace_operation
 from nextcloud_mcp_server.vector.document_chunker import DocumentChunker
@@ -61,11 +62,21 @@ async def processor_task(
             with anyio.fail_after(1.0):
                 doc_task = await receive_stream.receive()
 
+            # Update queue size metric after receiving
+            stream_stats = receive_stream.statistics()
+            update_vector_sync_queue_size(stream_stats.current_buffer_used)
+
             # Process document
             await process_document(doc_task, nc_client)
 
+            # Update queue size metric after processing
+            stream_stats = receive_stream.statistics()
+            update_vector_sync_queue_size(stream_stats.current_buffer_used)
+
         except TimeoutError:
-            # No documents available, continue
+            # No documents available, update metric to show empty queue
+            stream_stats = receive_stream.statistics()
+            update_vector_sync_queue_size(stream_stats.current_buffer_used)
             continue
 
         except anyio.EndOfStream:
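
The sampling pattern in this hunk (report queue depth after receiving, after processing, and on an idle timeout) can be tried in isolation with the standard library. The sketch below is an illustration under stated assumptions rather than the project's code: it swaps the anyio memory object stream for `asyncio.Queue`, uses `queue.qsize()` in place of `statistics().current_buffer_used`, and replaces the Prometheus gauge with a list; `update_queue_size`, `processor`, and `process_document` are hypothetical names.

```python
import asyncio

# Hypothetical stand-in for the gauge: keeps every sample so they can be inspected.
queue_depth_samples: list = []


def update_queue_size(depth: int) -> None:
    queue_depth_samples.append(depth)


async def process_document(doc: str) -> None:
    await asyncio.sleep(0)  # Placeholder for the real processing work.


async def processor(queue: asyncio.Queue, idle_timeout: float) -> None:
    while True:
        try:
            doc = await asyncio.wait_for(queue.get(), timeout=idle_timeout)
        except asyncio.TimeoutError:
            # Nothing arrived: publish the empty depth. The real loop would
            # `continue`; the demo returns so that it terminates.
            update_queue_size(queue.qsize())
            return
        update_queue_size(queue.qsize())  # Depth right after receiving.
        await process_document(doc)
        update_queue_size(queue.qsize())  # Depth after processing.


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    for name in ("a", "b", "c"):
        queue.put_nowait(name)
    await processor(queue, idle_timeout=0.05)


asyncio.run(main())
print(queue_depth_samples)  # [2, 2, 1, 1, 0, 0, 0]
```

Sampling on the timeout path matters because, without it, the gauge would keep its last non-empty value while the queue sits idle.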