The Aris Brain provides comprehensive monitoring capabilities for tracking system health, equipment status, and historical performance.

Health Checks

Detailed Health Status

Get comprehensive system health:
curl http://aris.local/health
{
  "status": "ok",
  "version": "1.0.0",
  "uptime_seconds": 86400,
  "timestamp": "2024-01-15T10:30:00Z",
  "checks": {
    "mqtt": {
      "status": "ok",
      "connected": true
    },
    "cloud": {
      "status": "ok",
      "state": "connected",
      "connected": true
    },
    "memory": {
      "heap_used_mb": 45,
      "heap_total_mb": 64,
      "rss_mb": 120
    },
    "state_file": {
      "status": "ok",
      "last_modified": "2024-01-15T10:29:00Z"
    }
  }
}

Status Meanings

Status     Meaning
ok         Everything is working normally
degraded   Cloud is disconnected but local control works
unhealthy  MQTT is disconnected; the system may not function
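
The status table above can be turned into a small triage helper. This is a minimal sketch, not part of the Brain's API: the name `summarize_health`, the exit-code mapping, and the failing `cloud` values in the sample are assumptions made for illustration. Only the `status` and `checks` fields come from the response shown earlier.

```python
# Hypothetical helper: map a /health payload to an exit code and failing checks.
# The 0/1/2 mapping follows the common Nagios-style convention; it is an
# assumption, not something the Brain defines.
EXIT_CODES = {"ok": 0, "degraded": 1, "unhealthy": 2}


def summarize_health(health: dict) -> tuple[int, list[str]]:
    """Return (exit_code, names of checks whose status is not 'ok')."""
    status = health.get("status", "unhealthy")
    # Unknown statuses are treated as critical so they are never ignored.
    exit_code = EXIT_CODES.get(status, 2)
    failing = [
        name
        for name, check in health.get("checks", {}).items()
        # Some checks (such as "memory") carry no "status" field; skip those.
        if "status" in check and check["status"] != "ok"
    ]
    return exit_code, failing


# Illustrative payload; the failing "cloud" values are invented for the example.
sample = {
    "status": "degraded",
    "checks": {
        "mqtt": {"status": "ok", "connected": True},
        "cloud": {"status": "degraded", "state": "disconnected", "connected": False},
        "memory": {"heap_used_mb": 45, "heap_total_mb": 64, "rss_mb": 120},
    },
}
print(summarize_health(sample))
```

Returning an exit code makes the helper usable from cron jobs or shell wrappers that only look at the process status.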

Readiness Probe

For container orchestration (Kubernetes, Docker):
curl http://aris.local/ready
Returns 200 if ready, 503 if not.
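
Outside an orchestrator, a deployment script can poll the same endpoint before sending commands. The sketch below assumes nothing beyond the 200/503 behavior described above; the function names, retry count, and delay are illustrative choices.

```python
import time
import urllib.error
import urllib.request


def probe_ready(url: str = "http://aris.local/ready", timeout: float = 5.0):
    """Return the HTTP status code of the readiness endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx responses, so a 503 arrives here.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return None


def wait_until_ready(probe=probe_ready, attempts: int = 30, delay_s: float = 2.0) -> bool:
    """Poll `probe` until it returns 200 or `attempts` is exhausted."""
    for attempt in range(attempts):
        if probe() == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Passing the probe as a parameter keeps the retry logic testable without a running Brain.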

Event Log

The Brain maintains an audit log of all significant events.

Querying Events

# Recent events (default: last 100)
curl "http://aris.local/api/events" \
  -H "Authorization: Bearer YOUR_TOKEN"

# Filter by type
curl "http://aris.local/api/events?type=fault_raised" \
  -H "Authorization: Bearer YOUR_TOKEN"

# Filter by severity
curl "http://aris.local/api/events?severity=critical" \
  -H "Authorization: Bearer YOUR_TOKEN"

# Filter by time range
curl "http://aris.local/api/events?from=2024-01-14T00:00:00Z&to=2024-01-15T00:00:00Z" \
  -H "Authorization: Bearer YOUR_TOKEN"

# Pagination
curl "http://aris.local/api/events?limit=50&offset=100" \
  -H "Authorization: Bearer YOUR_TOKEN"
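
The query parameters above can also be assembled programmatically. This is a sketch under one assumption: that the filters may be combined in a single request, which the curl examples do not show explicitly. The helper name `events_url` is hypothetical.

```python
from urllib.parse import urlencode

BRAIN_URL = "http://aris.local"  # same base URL as the curl examples


def events_url(**filters) -> str:
    """Build an /api/events URL, dropping filters that are None.

    `from` is a reserved word in Python, so the start of the time range is
    passed as `from_` and renamed before encoding.
    """
    params = {}
    for key, value in filters.items():
        if value is None:
            continue
        params[key.rstrip("_")] = value
    query = urlencode(params)
    return f"{BRAIN_URL}/api/events" + (f"?{query}" if query else "")


print(events_url(type="fault_raised", severity="critical", limit=50, offset=100))
print(events_url(from_="2024-01-14T00:00:00Z", to="2024-01-15T00:00:00Z"))
```

`urlencode` percent-encodes the colons in the timestamps, so the resulting URL is safe to pass to any HTTP client together with the `Authorization` header.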

Event Types

Type             Description
fault_raised     A fault was detected
fault_cleared    A fault was resolved
mode_change      System or zone mode changed
setpoint_change  Temperature setpoint was adjusted
device_online    Device came online
device_offline   Device went offline
command_sent     Command was sent to a device
system_startup   Brain started
system_shutdown  Brain stopped

Event Severities

Severity  Description
info      Normal operation
warning   Potential issue, system still functioning
critical  Serious issue requiring attention

Exporting Events

Download events as CSV:
curl "http://aris.local/api/events/export?from=2024-01-01T00:00:00Z" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -o events.csv
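
Once exported, the file can be summarized with the standard library. The column layout of the export is not documented here, so the `severity` column name and the sample rows below are assumptions; adjust them to match the header of the real file.

```python
import csv
import io
from collections import Counter


def count_by_severity(csv_text: str, column: str = "severity") -> Counter:
    """Count rows per value of `column` in CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Rows without the column (or with an empty value) are skipped.
    return Counter(row[column] for row in reader if row.get(column))


# Hypothetical export content, for illustration only.
sample_csv = (
    "timestamp,type,severity\n"
    "2024-01-14T08:00:00Z,device_offline,warning\n"
    "2024-01-14T08:05:00Z,fault_raised,critical\n"
    "2024-01-14T08:20:00Z,device_online,info\n"
    "2024-01-14T09:00:00Z,fault_raised,critical\n"
)
print(count_by_severity(sample_csv))
```

For a file on disk, read it with `open("events.csv", newline="").read()` and pass the text to the same function.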

Equipment Monitoring

Heat Pump Status

curl http://aris.local/api/heatpumps \
  -H "Authorization: Bearer YOUR_TOKEN"
Key metrics to watch:
  • compressorSpeedPercent - Current compressor load
  • copInstant - Real-time efficiency (higher is better)
  • defrostActive - Defrost cycle in progress
  • activeFaults - Any active fault codes
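
The metrics above can be checked automatically. The sketch below assumes each heat pump object carries the listed fields plus the `hpId` key used in the monitoring script later on this page. The `min_cop` threshold and the sample fault code are illustrative, not recommended values.

```python
def heat_pump_warnings(heat_pumps: list[dict], min_cop: float = 2.0) -> list[str]:
    """Return human-readable warnings for a list of heat pump status objects."""
    warnings = []
    for hp in heat_pumps:
        hp_id = hp.get("hpId", "unknown")
        if hp.get("activeFaults"):
            warnings.append(f"{hp_id}: active faults {hp['activeFaults']}")
        cop = hp.get("copInstant")
        running = hp.get("compressorSpeedPercent", 0) > 0
        # Assumption: COP readings taken at standstill or during defrost would
        # cause false alarms, so they are skipped.
        if running and not hp.get("defrostActive") and cop is not None and cop < min_cop:
            warnings.append(f"{hp_id}: low COP {cop}")
    return warnings


# Invented sample data; "E12" is a placeholder fault code.
sample_pumps = [
    {"hpId": "hp_01", "compressorSpeedPercent": 60, "copInstant": 1.5,
     "defrostActive": False, "activeFaults": []},
    {"hpId": "hp_02", "compressorSpeedPercent": 40, "copInstant": 1.2,
     "defrostActive": True, "activeFaults": ["E12"]},
]
for line in heat_pump_warnings(sample_pumps):
    print(line)
```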

FCU Status

curl http://aris.local/api/fcus/status/all \
  -H "Authorization: Bearer YOUR_TOKEN"
{
  "status": [
    {
      "fcuId": "fcu_01",
      "friendlyName": "Living Room",
      "online": true,
      "lastSeen": "2024-01-15T10:30:00Z"
    },
    {
      "fcuId": "fcu_02",
      "friendlyName": "Bedroom",
      "online": false,
      "lastSeen": "2024-01-15T09:15:00Z"
    }
  ]
}
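
The `lastSeen` timestamp makes it possible to report how long an offline unit has been silent. The following sketch works on the response shape shown above; the function names are illustrative.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2024-01-15T10:30:00Z'."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def offline_minutes(payload: dict, now: Optional[datetime] = None) -> dict:
    """Map each offline FCU id to the minutes elapsed since it was last seen."""
    now = now or datetime.now(timezone.utc)
    result = {}
    for fcu in payload.get("status", []):
        if not fcu.get("online"):
            age = now - parse_utc(fcu["lastSeen"])
            result[fcu["fcuId"]] = age.total_seconds() / 60
    return result


payload = {
    "status": [
        {"fcuId": "fcu_01", "friendlyName": "Living Room", "online": True,
         "lastSeen": "2024-01-15T10:30:00Z"},
        {"fcuId": "fcu_02", "friendlyName": "Bedroom", "online": False,
         "lastSeen": "2024-01-15T09:15:00Z"},
    ]
}
print(offline_minutes(payload, now=parse_utc("2024-01-15T10:30:00Z")))
```

With the sample response and a reference time of 10:30 UTC, the Bedroom unit has been silent for 75 minutes.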

HCU Status

curl http://aris.local/api/hcu \
  -H "Authorization: Bearer YOUR_TOKEN"
Key metrics:
  • mode - Current operating mode (heat/cool/dhw_charge/idle)
  • supplyTempC / returnTempC - Water temperatures
  • thermalPowerKw - Heat output
  • copInstant - System efficiency
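
One simple plausibility check on these values is the temperature difference between supply and return water. The sketch assumes the endpoint returns a flat object with the fields listed above; the sign rule (supply warmer than return when heating, colder when cooling) is a general hydronic expectation, not a documented Brain behavior.

```python
from typing import Optional

# Expected sign of (supply - return) per mode: heating modes deliver warmer
# water than they get back, cooling delivers colder water.
EXPECTED_SIGN = {"heat": 1, "dhw_charge": 1, "cool": -1}


def delta_t(hcu: dict) -> Optional[float]:
    """Return supply minus return temperature in °C, or None if data is missing."""
    supply = hcu.get("supplyTempC")
    ret = hcu.get("returnTempC")
    if supply is None or ret is None:
        return None
    return round(supply - ret, 2)


def delta_t_plausible(hcu: dict) -> bool:
    """Check that the sign of the temperature difference matches the mode."""
    delta = delta_t(hcu)
    sign = EXPECTED_SIGN.get(hcu.get("mode"))
    if delta is None or sign is None:
        return True  # nothing to check (idle, unknown mode, or missing data)
    return delta * sign >= 0


# Invented reading for illustration.
reading = {"mode": "heat", "supplyTempC": 40.0, "returnTempC": 35.0}
print(delta_t(reading), delta_t_plausible(reading))
```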

DHW Tank

curl http://aris.local/api/dhw \
  -H "Authorization: Bearer YOUR_TOKEN"
Key metrics:
  • stateOfChargePercent - How much hot water is available
  • effectiveCapacityLiters - Usable hot water volume
  • heatingActive - Whether tank is being heated
  • isSanitizing - Legionella sanitization in progress
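
A low tank that is not being reheated is usually the condition worth alerting on. The sketch below assumes the endpoint returns a flat object with the fields listed above; the 30% threshold is an arbitrary example value.

```python
def dhw_alerts(dhw: dict, min_soc: float = 30.0) -> list[str]:
    """Return alert messages for a DHW tank status object."""
    alerts = []
    soc = dhw.get("stateOfChargePercent")
    # A low charge is only alarming if the tank is not already recovering.
    if soc is not None and soc < min_soc and not dhw.get("heatingActive"):
        alerts.append(f"DHW charge {soc}% is below {min_soc}% and the tank is not heating")
    if dhw.get("isSanitizing"):
        alerts.append("Legionella sanitization in progress")
    return alerts


# Invented reading for illustration.
tank = {"stateOfChargePercent": 20, "heatingActive": False, "isSanitizing": False}
print(dhw_alerts(tank))
```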

Time-Series Metrics

The Brain stores detailed metrics in VictoriaMetrics for historical analysis.

Check Metrics Health

curl http://aris.local/api/metrics/health \
  -H "Authorization: Bearer YOUR_TOKEN"

Query Metrics

# Current value
curl "http://aris.local/api/metrics/query?query=aris_zone_temp_c" \
  -H "Authorization: Bearer YOUR_TOKEN"

# Historical range
curl "http://aris.local/api/metrics/query_range?query=aris_zone_temp_c&start=2024-01-14T00:00:00Z&end=2024-01-15T00:00:00Z&step=5m" \
  -H "Authorization: Bearer YOUR_TOKEN"
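
The range query returns one series per label combination. The sketch below assumes the Brain passes through the Prometheus-compatible JSON format that VictoriaMetrics itself produces (`status`, `data.result`, and `[timestamp, "value"]` pairs). If the Brain wraps or reshapes the response, the parsing needs to change. The sample payload is invented.

```python
from statistics import mean


def series_averages(payload: dict, label: str = "zone_id") -> dict:
    """Average each series in a Prometheus-style range-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    averages = {}
    for series in payload["data"]["result"]:
        key = series["metric"].get(label, "unknown")
        # Each sample is [unix_timestamp, "value-as-string"].
        values = [float(value) for _, value in series["values"]]
        averages[key] = mean(values)
    return averages


# Hypothetical response for aris_zone_temp_c with a 5-minute step.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {
                "metric": {"__name__": "aris_zone_temp_c", "zone_id": "living_room"},
                "values": [[1705190400, "20.5"], [1705190700, "21.5"]],
            }
        ],
    },
}
print(series_averages(sample_response))
```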

Available Metrics

Metric                      Labels   Description
aris_zone_temp_c            zone_id  Zone temperature
aris_zone_humidity_percent  zone_id  Zone humidity
aris_zone_heat_setpoint_c   zone_id  Heating setpoint
aris_zone_cool_setpoint_c   zone_id  Cooling setpoint
aris_fcu_fan_percent        fcu_id   FCU fan speed
aris_hcu_supply_temp_c      hcu_id   HCU supply temperature
aris_hcu_cop                hcu_id   System efficiency (COP; higher is better)
aris_hp_power_kw            hp_id    Heat pump power
aris_dhw_soc_percent        dhw_id   DHW state of charge
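
The labels in the table can narrow a query to a single zone or device using the standard `metric{label="value"}` selector syntax of PromQL and MetricsQL. The sketch assumes the `query` parameter is forwarded to VictoriaMetrics unchanged; the helper name and the `living_room` zone id are hypothetical.

```python
from urllib.parse import urlencode


def selector(metric: str, **labels: str) -> str:
    """Build a PromQL/MetricsQL selector such as metric{label="value"}."""
    if not labels:
        return metric
    matchers = ",".join(
        # Escape backslashes and double quotes inside label values.
        '{}="{}"'.format(name, value.replace("\\", "\\\\").replace('"', '\\"'))
        for name, value in sorted(labels.items())
    )
    return metric + "{" + matchers + "}"


query = selector("aris_zone_temp_c", zone_id="living_room")
print(query)
# The selector contains braces and quotes, so it must be URL-encoded.
print("http://aris.local/api/metrics/query?" + urlencode({"query": query}))
```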

Monitoring Script Example

import requests
import time

BRAIN_URL = "http://aris.local"
TOKEN = "your-api-token"

def check_health():
    """Check system health and alert on issues"""
    # A timeout keeps the monitoring loop from hanging if the Brain is unreachable.
    response = requests.get(f"{BRAIN_URL}/health", timeout=10)
    health = response.json()

    if health["status"] == "unhealthy":
        print("CRITICAL: System unhealthy - MQTT disconnected")
        return False

    if health["status"] == "degraded":
        print("WARNING: System degraded - Cloud disconnected")

    return True

def check_faults():
    """Check for active faults"""
    headers = {"Authorization": f"Bearer {TOKEN}"}

    # Check heat pump faults
    response = requests.get(f"{BRAIN_URL}/api/heatpumps", headers=headers, timeout=10)
    if response.status_code == 200:
        for hp in response.json().get("heatPumps", []):
            if hp.get("activeFaults"):
                print(f"FAULT: Heat pump {hp['hpId']}: {hp['activeFaults']}")

    # Check FCU faults
    response = requests.get(f"{BRAIN_URL}/api/fcus", headers=headers, timeout=10)
    if response.status_code == 200:
        for fcu in response.json().get("fcus", []):
            if fcu.get("activeFaults"):
                print(f"FAULT: FCU {fcu['fcuId']}: {fcu['activeFaults']}")

def check_offline_devices():
    """Check for offline FCUs"""
    headers = {"Authorization": f"Bearer {TOKEN}"}
    response = requests.get(f"{BRAIN_URL}/api/fcus/status/all", headers=headers, timeout=10)

    if response.status_code == 200:
        for fcu in response.json().get("status", []):
            if not fcu.get("online"):
                print(f"WARNING: FCU offline: {fcu['friendlyName']}")

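if __name__ == "__main__":
    while True:
        print(f"\n--- Health Check {time.strftime('%Y-%m-%d %H:%M:%S')} ---")
        try:
            check_health()
            check_faults()
            check_offline_devices()
        except requests.RequestException as exc:
            # Keep the loop alive when the Brain is temporarily unreachable.
            print(f"CRITICAL: Could not reach the Brain: {exc}")
        time.sleep(60)  # Check every minute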

Alerting Integration

For production alerting, integrate with:
  • Prometheus/Alertmanager - Use the /prometheus/* endpoints
  • Custom webhooks - Poll events API and send to Slack/Discord/PagerDuty
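
The webhook approach can be sketched as follows. Slack incoming webhooks accept a JSON body with a `text` field. Everything about the event objects (the `id`, `type`, `severity`, and `message` keys) is an assumption, since the events response format is not shown on this page.

```python
import json
import urllib.request


def slack_payload(event: dict) -> dict:
    """Format one event as a Slack incoming-webhook message."""
    text = f"[{event.get('severity', 'info').upper()}] {event.get('type', 'event')}"
    if event.get("message"):
        text += f": {event['message']}"
    return {"text": text}


def new_events(events: list, seen_ids: set) -> list:
    """Return events not seen in earlier polls and remember their ids.

    Assumes every event carries a unique "id"; events without one would be
    treated as duplicates of each other.
    """
    fresh = [event for event in events if event.get("id") not in seen_ids]
    seen_ids.update(event.get("id") for event in fresh)
    return fresh


def post_to_webhook(webhook_url: str, payload: dict) -> int:
    """POST a JSON payload to a webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A polling loop would fetch `/api/events?severity=critical`, pass the result through `new_events`, and call `post_to_webhook` for each remaining event, so repeated polls do not resend the same alert.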