Metrics Plugin Configuration
Overview
The Metrics plugin collects, formats, and publishes system or application metrics from the Alemca Agent to the platform.
It reconciles two opposing requirements: exhaustiveness (lose no data) and network frugality (never saturate the link).
Function | Description | Opinion |
---|---|---|
Collection | Periodic execution of commands, file reads, or embedded values according to the DataList. | A collect_time of 5 minutes covers 95 % of situations; going below that is rarely justified. |
Caching | On-disk buffer (`metrics.cache`) with a size cap before sending. | Unless the device has a tiny flash disk, keep 10 MB: lost data costs much more than storage. |
Publishing | Batched send to the `data_metrics_ex` exchange over AMQP/TLS. | Do not “optimize” by switching to raw TCP; you would just rebuild something worse. |
Jitter | Pseudo-random offset added to send_time to avoid synchronized bursts. | Use 25 % of send_time: enough to de-phase traffic, small enough to keep control. |
Auto-profile | Dynamic selection of a Mode (Linux, OpenWRT, Windows, Legato…) that injects a default DataList. | Remove default entries only if you measure a real space gain, not “just because”. |
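The Caching row describes a size-capped buffer, and the max_cache_size option mentions an LRU purge. Below is a minimal in-memory sketch of that policy. It assumes that 1 MB means 1,048,576 bytes and that, for an append-only metrics buffer, "least recently used" amounts to evicting the oldest records first. The class name `BoundedMetricCache` is hypothetical; the agent's real cache lives on disk in `metrics.cache`.

```python
from collections import deque


class BoundedMetricCache:
    """Hypothetical in-memory stand-in for metrics.cache with a size cap."""

    def __init__(self, max_size_mb: float) -> None:
        # Assumption: 1 MB = 1024 * 1024 bytes.
        self.max_bytes = int(max_size_mb * 1024 * 1024)
        self.records = deque()
        self.size = 0

    def add(self, record: bytes) -> None:
        """Append a serialized metric, then evict the oldest records until under the cap."""
        self.records.append(record)
        self.size += len(record)
        while self.size > self.max_bytes and self.records:
            self.size -= len(self.records.popleft())

    def drain(self) -> list:
        """Return every cached record as one batch to publish and empty the cache."""
        batch = list(self.records)
        self.records.clear()
        self.size = 0
        return batch
```

Evicting oldest-first means a long outage sacrifices the earliest samples while the most recent state of the device survives until the next upload.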
Typical Use Cases
- Basic monitoring: free RAM, CPU load, disk space, and uptime.
- Cellular telemetry: RSSI / RSRP / RSRQ from an LTE modem.
- Static inventory: agent version + SIM ICCID sent once (`once: true`), then purged from the cache.
Configuration Details
/etc/alemca/config.yaml

```yaml
metrics:
  enabled: true # Fully enable/disable the plugin
  cache_path: "/tmp/alemca" # Cache directory
  max_cache_size: 10 # Maximum cache size in MB
  collect_time: "5m" # Collection interval
  send_time: "1h" # Sending interval
  jitter: "5m" # Random offset (5m is about 8 % of send_time; the 25 % guideline gives 15m)
  mode: 0 # Operating mode (Linux, OpenWRT, etc.); auto-detected
  data_list: # Declarative list of metrics
    - name: "ram" # Free memory
      file_path: "/proc/meminfo"
      regex: "MemFree:\\s+(\\d+)"
      regex_group: 1
      unit: "kB"
      type: "int"
    - name: "uptime" # Time since last boot
      command: "cat /proc/uptime"
      regex: "(\\d+)"
      unit: "s"
      type: "int"
    - name: "version_agent" # Sent only once
      value: "{{AGENT_VERSION}}"
      type: "string"
      once: true
```
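The relationship between collect_time, send_time and jitter is simple arithmetic. The sketch below assumes durations are written as an integer followed by one of `s`, `m`, `h` or `d` (the agent may accept richer formats); `parse_duration` and `batch_stats` are hypothetical helpers. With the values above, each upload carries 12 collections, and a 5m jitter is about 8 % of send_time, whereas the 25 % guideline would be 900 s (15m).

```python
import re

# Seconds per unit suffix; the accepted suffixes are an assumption of this sketch.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a simple duration such as "5m" or "1h" into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def batch_stats(collect_time: str, send_time: str, jitter: str) -> dict:
    """Derive batch size and jitter ratio from the three timing options."""
    collect = parse_duration(collect_time)
    send = parse_duration(send_time)
    jit = parse_duration(jitter)
    return {
        "collections_per_send": send // collect,  # samples buffered per upload
        "jitter_ratio": jit / send,  # fraction of send_time used as jitter
        "jitter_for_25_percent": send // 4,  # seconds matching the 25 % guideline
    }


stats = batch_stats("5m", "1h", "5m")
```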
Section Breakdown
Main
- enabled – Turns the Metrics plugin on or off.
- cache_path – Directory where `metrics.cache` and `metrics.tmp` are stored before upload.
- max_cache_size – Maximum total cache size (in MB) before the oldest entries are purged (LRU).
- collect_time – Interval between two collections.
- send_time – Interval between two uploads to AMQP.
- jitter – Pseudo-random offset added to `send_time`.
- queue_name – AMQP queue where data are sent.
- headers – Key/value pairs injected into AMQP message headers.
- tags – JSON keys added to every metric packet.
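Adding a pseudo-random offset to each upload spreads out a fleet of agents that booted together. This sketch assumes the offset is drawn uniformly from [0, jitter] and applied to each nominal slot independently; that is an assumption, and the agent might instead add it to every interval, letting offsets accumulate. `upload_schedule` is a hypothetical helper, and the example uses send_time = 1h and jitter = 15m (the 25 % guideline).

```python
import random


def upload_schedule(start: float, send_time: float, jitter: float, count: int,
                    rng: random.Random) -> list:
    """Return `count` upload timestamps in seconds.

    Slot i is due at start + i * send_time; an offset drawn uniformly from
    [0, jitter] is added to each slot independently, so offsets do not
    accumulate from one upload to the next.
    """
    schedule = []
    for i in range(1, count + 1):
        schedule.append(start + i * send_time + rng.uniform(0.0, jitter))
    return schedule


# Two agents started at the same instant, each with its own random generator.
agent_a = upload_schedule(0.0, 3600.0, 900.0, 3, random.Random(1))
agent_b = upload_schedule(0.0, 3600.0, 900.0, 3, random.Random(2))
```

Anchoring offsets to nominal slots keeps the long-run upload rate at exactly one per send_time while still de-phasing agents.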
mode
The mode field is internal; it derives from the detected OS:
Code | Platform | Notes |
---|---|---|
0 | Generic Linux | Standard fallback. |
1 | OpenWRT | Adds modem metrics via `gsmctl`. |
2 | Windows | Uses PowerShell and `wmic`. |
3 | Legato | Extracts data via `cm` commands. |
999 | Unknown | No pre-filled DataList. |
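Mode detection could look roughly like the sketch below. The markers it checks are assumptions, not the agent's documented logic: the `/etc/openwrt_release` file for OpenWRT and the presence of the Legato `cm` tool for Legato. `detect_mode` and `detect_current_mode` are hypothetical names; the pure function takes the OS facts as arguments so it can be tested off-device.

```python
import os
import platform
import shutil

# Mode codes from the table above.
MODES = {0: "Generic Linux", 1: "OpenWRT", 2: "Windows", 3: "Legato", 999: "Unknown"}


def detect_mode(system: str, existing_files: set, available_commands: set) -> int:
    """Pick a mode code from OS facts; the markers checked are assumptions."""
    if system == "Windows":
        return 2
    if system == "Linux":
        if "/etc/openwrt_release" in existing_files:
            return 1
        if "cm" in available_commands:
            return 3
        return 0  # standard Linux fallback
    return 999  # unknown platform: no pre-filled DataList


def detect_current_mode() -> int:
    """Gather the facts on the running host and delegate to detect_mode."""
    files = {p for p in ("/etc/openwrt_release",) if os.path.exists(p)}
    commands = {c for c in ("cm",) if shutil.which(c)}
    return detect_mode(platform.system(), files, commands)
```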
data_list
Each entry describes how to obtain one metric:
Key | Meaning |
---|---|
name | Unique metric name. |
file_path | Path to a file to read entirely. |
command | Shell command to execute. |
at_command / at_device | Send an AT command on `at_device` and wait for the reply. |
value | Literal value (no I/O). |
regex / regex_group | Regular-expression extraction; regex_group = capture group to return. |
unit | Unit (kB, %, s, dBm…). |
type | `int`, `float`, `string`, etc. |
math | Arithmetic expression to apply (e.g. / 10 ). |
once | true = one-shot collection, then the entry is removed. |
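To make the keys concrete, here is a hypothetical evaluator for data_list entries covering value, file_path, command, regex / regex_group, math, type and once (AT commands are left out). Its assumptions are labeled in the code as well: regex_group defaults to the first capture group when omitted (as in the uptime entry of the sample config), math is an operator and an operand separated by a space, and a regex miss yields None. Note that the YAML string `"MemFree:\\s+(\\d+)"` parses to the same pattern as the Python raw string `r"MemFree:\s+(\d+)"`.

```python
import operator
import re
import subprocess

# Hypothetical evaluator; at_command / at_device are not covered.
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
_CASTS = {"int": int, "float": float, "string": str}


def read_source(entry: dict) -> str:
    """Return the raw text for an entry: literal value, whole file, or command stdout."""
    if "value" in entry:
        return str(entry["value"])
    if "file_path" in entry:
        with open(entry["file_path"], encoding="utf-8") as fh:
            return fh.read()
    if "command" in entry:
        result = subprocess.run(entry["command"], shell=True, capture_output=True,
                                text=True, check=True)
        return result.stdout
    raise ValueError(f"{entry.get('name')!r}: no supported source in this sketch")


def evaluate_entry(entry: dict):
    """Extract, optionally transform, and cast one metric; None if the regex misses."""
    raw = read_source(entry)
    if "regex" in entry:
        match = re.search(entry["regex"], raw)
        if match is None:
            return None
        # Assumed default: first capture group if the pattern has one, else the whole match.
        raw = match.group(entry.get("regex_group", 1 if match.re.groups else 0))
    cast = _CASTS[entry.get("type", "string")]
    if "math" in entry:
        op, operand = entry["math"].split()  # assumed form, e.g. "/ 10"
        return cast(_OPS[op](float(raw), float(operand)))
    return cast(raw.strip())


def collect(data_list: list) -> tuple:
    """Evaluate every entry, then drop one-shot (once: true) entries from the list."""
    metrics = {entry["name"]: evaluate_entry(entry) for entry in data_list}
    remaining = [entry for entry in data_list if not entry.get("once", False)]
    return metrics, remaining
```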
Default Fields
OS / Mode | version_agent | version_device | ram | load | disk_space | data_interfaces_in / out | uptime | modem_temperature | modem_cell_id | modem_rssi / rsrp / rsrq / sinr | modem_iccid |
---|---|---|---|---|---|---|---|---|---|---|---|
0 – Generic Linux | Yes | No | Yes | Yes | Yes | Yes | Yes | No | No | No | No |
1 – OpenWRT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Optional | Optional | Optional | Optional |
2 – Windows | Yes | No | Yes | No | Yes | Yes | Yes | No | No | No | No |
3 – Legato | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | No | Yes |
Optional: the `modem_*` metrics on OpenWRT are collected only if the `gsmctl` binary is present on the device.
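The Default Fields table can be read as a lookup keyed by mode, with the OpenWRT modem metrics gated on `gsmctl`. The sketch assumes the combined column headers expand to individual names (`data_interfaces_in`, `data_interfaces_out`, `modem_rsrp`, and so on); `default_fields` is a hypothetical helper.

```python
# Fields marked "Yes" for every mode in the table.
COMMON = ["version_agent", "ram", "disk_space",
          "data_interfaces_in", "data_interfaces_out", "uptime"]
# Assumed expansion of the modem columns.
MODEM = ["modem_temperature", "modem_cell_id", "modem_rssi", "modem_rsrp",
         "modem_rsrq", "modem_sinr", "modem_iccid"]


def default_fields(mode: int, has_gsmctl: bool = False) -> list:
    """Return the default metric names injected for a mode code."""
    if mode == 0:  # Generic Linux
        return COMMON + ["load"]
    if mode == 1:  # OpenWRT: modem metrics only when gsmctl is present
        fields = COMMON + ["version_device", "load"]
        return fields + MODEM if has_gsmctl else fields
    if mode == 2:  # Windows: no load, no modem metrics
        return list(COMMON)
    if mode == 3:  # Legato
        return COMMON + ["version_device", "load", "modem_temperature", "modem_iccid"]
    return []  # 999 / unknown: no pre-filled DataList
```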
Information on Reported Metrics
Metric Definitions
- Cell ID – Numeric identifier of a cell in a mobile network.
- RSRP (Reference Signal Received Power) – Strength of the LTE reference signal received. Higher (less negative) is better.
- RSRQ (Reference Signal Received Quality) – Quality of the LTE reference signal, accounting for noise and interference. Higher is better.
- RSSI (Received Signal Strength Indicator) – Total received signal power, including noise.
- SINR (Signal-to-Interference plus Noise Ratio) – Ratio of useful signal to interference and noise. Higher is better.
Level | RSRP (dBm) | RSRQ (dB) | RSSI (dBm) | SINR (dB) |
---|---|---|---|---|
Excellent | ≥ -80 | ≥ -9 | ≥ -65 | ≥ 20 |
Good | -80 to -90 | -9 to -12 | -65 to -75 | 13 to 20 |
Average | -90 to -100 | -12 to -15 | -75 to -85 | 0 to 13 |
Poor | -100 to -110 | -15 to -18 | -85 to -95 | -5 to 0 |
Critical | ≤ -110 | ≤ -18 | ≤ -95 | ≤ -5 |
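The thresholds table can be applied by testing each level's lower bound from best to worst. The table's ranges share their end points (for example -80 appears under both Excellent and Good, and -110 under both Poor and Critical), so this sketch assumes a boundary value belongs to the better level. `classify` and `THRESHOLDS` are hypothetical names.

```python
# (level, inclusive lower bound) per metric, best level first, from the table above.
THRESHOLDS = {
    "rsrp": [("Excellent", -80), ("Good", -90), ("Average", -100), ("Poor", -110)],
    "rsrq": [("Excellent", -9), ("Good", -12), ("Average", -15), ("Poor", -18)],
    "rssi": [("Excellent", -65), ("Good", -75), ("Average", -85), ("Poor", -95)],
    "sinr": [("Excellent", 20), ("Good", 13), ("Average", 0), ("Poor", -5)],
}


def classify(metric: str, value: float) -> str:
    """Return the first level whose lower bound the value reaches, else Critical."""
    for level, lower_bound in THRESHOLDS[metric]:
        if value >= lower_bound:
            return level
    return "Critical"
```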
- Temperature – Device or system temperature in °C. High temperatures can impair hardware and cause failures.
- Data Interfaces Default Route In – Bytes entering the interface used for the default route.
- Data Interfaces Default Route Out – Bytes leaving the interface used for the default route.
- Load – System load (CPU, RAM, etc.); higher values mean the system is busier.
- Ping – Reachability flag: 1 (true) = reachable, 0 (false) = unreachable.
- RAM – Memory currently in use (bytes).
- Uptime – Continuous run time (seconds).
- Disk Space – Percentage of storage used.
- Version Agent – Version of the Alemca agent; indicates if updates are needed.
- Version Device – Device firmware version; older firmware may need updating.
- Alert Timeout – Connectivity flag:
  - true – Device did not respond within the timeout and is offline.
  - false – Device is reachable.
- Alert Uptime – Uptime flag:
  - true – Device is offline or below the minimum uptime threshold.
  - false – Device uptime is within expectations.
- Alert Signals (RSRP, RSRQ, RSSI, SINR) – Signal-quality alert level:
  - 4 – Excellent
  - 3 – Good
  - 2 – Average
  - 1 – Poor
  - 0 – Critical
  - -1 – Metric not found (signal unavailable)
- Alert Version Agent – Agent version flag:
  - true – Agent version is below the threshold or unavailable.
  - false – Agent version is acceptable.
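The alert fields can be derived from the raw metrics roughly as follows. This is a sketch under stated assumptions: a missing signal is passed as None and maps to -1, an offline device reports no uptime (None), versions are dotted integers compared after zero-padding ("1.2" equals "1.2.0"), and an unparseable version counts as unavailable. The minimum uptime and minimum version are hypothetical parameters; the document does not give their values.

```python
from typing import Optional

# Signal level -> alert code, as listed for Alert Signals above.
LEVEL_CODES = {"Excellent": 4, "Good": 3, "Average": 2, "Poor": 1, "Critical": 0}


def signal_alert(level: Optional[str]) -> int:
    """Map a signal level to its alert code; None (metric not found) gives -1."""
    return -1 if level is None else LEVEL_CODES[level]


def _version_tuple(text: str) -> tuple:
    return tuple(int(part) for part in text.split("."))


def alert_version_agent(version: Optional[str], minimum: str) -> bool:
    """True when the agent version is missing, unparseable, or below `minimum`."""
    if not version:
        return True
    try:
        current, floor = _version_tuple(version), _version_tuple(minimum)
    except ValueError:
        return True  # assumption: unparseable is treated as unavailable
    width = max(len(current), len(floor))
    current += (0,) * (width - len(current))  # pad so "1.2" == "1.2.0"
    floor += (0,) * (width - len(floor))
    return current < floor


def alert_uptime(uptime_s: Optional[int], minimum_s: int) -> bool:
    """True when the device is offline (no uptime) or has not run long enough."""
    return uptime_s is None or uptime_s < minimum_s
```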