Initial
parent
3fc0100dcf
commit
9fe5418c62
100
README.md
@@ -1,2 +1,100 @@
# prtg-smart-health-check
# PRTG SMART Monitoring Script (`prtg-smart-health-check.v1.sh`)

This Bash script monitors the S.M.A.R.T. values of a physical hard disk or SSD (SATA, SAS, or NVMe) and returns them as PRTG-compatible XML. It is integrated through an **SSH Script Advanced** sensor and automatically detects whether the device is a classic (SATA/SAS) or an NVMe drive.

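The automatic detection described above can be sketched roughly as follows. This is a minimal illustration, not the script itself: `detect_type` is a hypothetical helper name, and the two sample strings are made-up stand-ins for real `smartctl` output. The only assumption taken from the script is that NVMe output contains the text `NVMe Version`.

```shell
#!/usr/bin/env bash
# Sketch: classify a device from smartctl text output.
# detect_type is a hypothetical helper name used only for this example.
detect_type() {
    # $1: full text output of `smartctl -x <device>`
    if printf '%s\n' "$1" | grep -q "NVMe Version"; then
        echo "nvme"
    else
        echo "sata"
    fi
}

# Abbreviated, made-up sample lines (not captured from a real drive).
nvme_sample='Model Number: Example NVMe
NVMe Version: 1.4'
sata_sample='Device Model: Example SATA
SATA Version is: SATA 3.3'

detect_type "$nvme_sample"
detect_type "$sata_sample"
```

Anything that does not identify itself as NVMe falls through to the classic branch, so SAS drives are treated like SATA drives in this sketch.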
## 🔧 Prerequisites

- A Linux system (e.g. Debian/Ubuntu/Proxmox)
- The `smartmontools` package must be installed:
  ```bash
  sudo apt install smartmontools
  ```
- The script must be run with `bash`
- PRTG needs SSH access with a user that is allowed to run the script

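Before setting up the sensor, the prerequisites listed above can be verified on the target host with a few lines of shell. The helper name `check_cmd` is hypothetical and exists only for this sketch; it simply reports whether a command is available on the `PATH`.

```shell
#!/usr/bin/env bash
# Sketch: report whether the tools the script relies on are available.
# check_cmd is a hypothetical helper name used only for this example.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Tools named in the prerequisites and installation steps.
for tool in bash smartctl sudo; do
    check_cmd "$tool"
done
```

A `missing: smartctl` line would indicate that `smartmontools` still has to be installed.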
## 📁 Installation

1. Create the target directory for the script (this is where PRTG's SSH Script Advanced sensor looks for scripts):
   ```bash
   sudo mkdir -p /var/prtg/scriptsxml/
   ```

2. Copy the script into this directory and make it executable:
   ```bash
   sudo cp prtg-smart-health-check.v1.sh /var/prtg/scriptsxml/
   sudo chmod +x /var/prtg/scriptsxml/prtg-smart-health-check.v1.sh
   ```

3. Allow the PRTG SSH user to run `smartctl` through `sudo` without a password:
   ```bash
   echo "prtguser ALL=(ALL) NOPASSWD: /usr/sbin/smartctl" | sudo tee /etc/sudoers.d/prtg-smart
   ```
   > **Note**: Replace `prtguser` with the actual user that PRTG uses for SSH.

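A slightly more defensive variant of the sudoers step is sketched below: the rule is staged in a temporary file first, so that it can be syntax-checked before it is installed. `sudoers_rule` is a hypothetical helper name, and the commented-out follow-up commands are an assumption about how one might finish the step; they are not part of the original instructions.

```shell
#!/usr/bin/env bash
# Sketch: build the sudoers rule for a given SSH user and stage it in a temp file.
# sudoers_rule is a hypothetical helper; the smartctl path follows the README.
sudoers_rule() {
    printf '%s ALL=(ALL) NOPASSWD: /usr/sbin/smartctl\n' "$1"
}

staged=$(mktemp)
sudoers_rule "prtguser" > "$staged"
chmod 0440 "$staged"
cat "$staged"

# Assumed follow-up on the target host (needs root, so left commented out):
#   sudo visudo -cf "$staged"     # syntax check before installing
#   sudo install -m 0440 -o root -g root "$staged" /etc/sudoers.d/prtg-smart
```

Checking the file with `visudo -cf` before installing it avoids locking `sudo` with a malformed rule.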
## 🧪 Test Run

Before adding the sensor, the script can be tested manually:
```bash
sudo /var/prtg/scriptsxml/prtg-smart-health-check.v1.sh /dev/nvme0n1
```

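To make the manual test easier to read, the XML output can be reduced to its channel names and checked for an error flag. The sketch below uses a shortened, made-up sample instead of real script output; `list_channels` and `has_error` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: summarize PRTG XML read from stdin.
# list_channels and has_error are hypothetical helpers for this example.
list_channels() {
    grep -o '<channel>[^<]*</channel>' | sed -e 's/<channel>//' -e 's/<\/channel>//'
}

has_error() {
    grep -q '<error>1</error>'
}

# Shortened sample in the shape the script emits (values are made up).
sample='<prtg>
<result><channel>Power-On Hours</channel><value>1200</value><unit>Hours</unit></result>
<result><channel>Reallocated Sectors</channel><value>0</value><unit>Count</unit></result>
<text>Example drive</text></prtg>'

printf '%s\n' "$sample" | list_channels
if printf '%s\n' "$sample" | has_error; then
    echo "sensor would report an error"
else
    echo "no error element found"
fi
```

With the real script, the same helpers could be fed from a pipe, for example `sudo /var/prtg/scriptsxml/prtg-smart-health-check.v1.sh /dev/sda | list_channels`.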
## ⚙️ PRTG Integration

1. Open the PRTG web interface
2. Select the target device (the Linux host)
3. Click **"Add Sensor" → "SSH Script Advanced"**
4. Select the script `prtg-smart-health-check.v1.sh` from the dropdown
5. Enter the physical device as the **Parameter**, e.g.:
   ```
   /dev/sda
   ```
   or
   ```
   /dev/nvme0n1
   ```

6. Leave the **"Mutex Name"** field empty
7. Save, and the sensor is ready ✅

> ❗ **Important**:
> The user that PRTG connects with over SSH must be allowed to read the specified device (e.g. `/dev/nvme0n1`). This usually works only through `sudo` with `smartctl`, which is why the `sudoers` file is required.

## 📊 What Is Monitored?

**SATA/SAS:**
- Temperature
- Power-on hours
- Reallocated sectors
- Pending sectors
- Uncorrectable sectors
- CRC errors, reallocation events, and reported uncorrectable errors

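For SATA drives these values come from the raw-value column of the SMART attribute table. The sketch below shows one way to read that column. `get_raw` is a hypothetical helper, the two sample tables are illustrative rather than captured from a real drive, and the column positions are an assumption based on the two table layouts `smartctl` can print (the older layout with a hex `FLAG` column, and the `brief` layout).

```shell
#!/usr/bin/env bash
# Sketch: read the raw value of one SMART attribute from an attribute table on stdin.
# get_raw is a hypothetical helper; $1 is the numeric attribute ID.
get_raw() {
    # Old layout: hex flag in column 3, raw value in column 10.
    # Brief layout: symbolic flags in column 3, raw value in column 8.
    awk -v id="$1" '$1 == id { print ($3 ~ /^0x/ ? $10 : $8); exit }'
}

# Illustrative sample tables (values are made up).
sample_old='ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       21345
194 Temperature_Celsius     0x0022   064   052   000    Old_age   Always       -       36'

sample_brief='ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  9 Power_On_Hours          -O--CK   095   095   000    -    21345
194 Temperature_Celsius     -O---K   064   052   000    -    36'

printf '%s\n' "$sample_old"   | get_raw 9
printf '%s\n' "$sample_brief" | get_raw 194
```

Stopping at the first match keeps later log tables in the output from overriding the attribute row.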
**NVMe:**
- Temperature
- Wear level (percentage used)
- Data written/read (TB)
- Unsafe shutdowns
- Error log entries
- Power cycles
- Wear-level history directly in the sensor message

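The NVMe values are read from `Label: value` lines, and the data volume can also be derived from the raw unit counter, because one NVMe data unit corresponds to 1000 blocks of 512 bytes. The following sketch illustrates both steps. `nvme_field` and `units_to_tb` are hypothetical helper names, and the sample lines are made up.

```shell
#!/usr/bin/env bash
# Sketch: parse "Label: value" lines and convert NVMe data units to terabytes.
# nvme_field and units_to_tb are hypothetical helpers for this example.
nvme_field() {
    # $1: field label; stdin: smartctl text output. Prints the trimmed value.
    grep -E "^$1:" | head -n1 | cut -d: -f2- | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}

units_to_tb() {
    # One data unit = 1000 * 512 bytes = 512000 bytes; 1 TB = 10^12 bytes.
    LC_ALL=C awk -v u="$1" 'BEGIN { printf "%.2f\n", u * 512000 / 1e12 }'
}

# Illustrative sample lines (values are made up).
sample='Percentage Used:                    9%
Data Units Written:                 12,345,678 [6.32 TB]
Power On Hours:                     4,321'

raw=$(printf '%s\n' "$sample" | nvme_field "Data Units Written")
# Keep the counter only and drop thousands separators ("," or ".").
units=$(printf '%s\n' "$raw" | awk '{print $1}' | tr -d '.,')
units_to_tb "$units"
```

Computing the figure from the counter avoids depending on whether `smartctl` happens to print the bracketed size in GB or TB.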
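## 📁 Caching for the Wear-Level History

The script stores the most recent wear-level value, together with the date it was recorded, in a cache file named after the device (every `/` in the device path becomes `_`). For `/dev/nvme0n1` this is:
```
/var/prtg/scriptsxml/.smart_wear__dev_nvme0n1.cache
```
The file is rewritten only when the value changes, and the PRTG SSH user needs write access to it. The sensor message then reads, for example (the script emits German text; "seit" means "since"):
```
Wear: 9 % – +1 % seit 2025-03-27
```
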
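The caching logic can be illustrated with a small standalone sketch. It follows the `level:date` file format described above, but `cache_name` and `wear_trend` are hypothetical helpers, the messages are in English for readability, and a temporary directory stands in for `/var/prtg/scriptsxml/`.

```shell
#!/usr/bin/env bash
# Sketch of the wear-level cache: one line "level:date", rewritten only on change.
# cache_name and wear_trend are hypothetical helpers for this example.
cache_name() {
    # Replace every "/" in the device path with "_".
    printf '.smart_wear_%s.cache\n' "$(printf '%s' "$1" | tr '/' '_')"
}

wear_trend() {
    # $1: cache file, $2: current wear level in percent, $3: today's date
    cache_file=$1; current=$2; today=$3
    if [ -f "$cache_file" ]; then
        IFS=: read -r last_level last_date < "$cache_file"
        diff=$(( current - last_level ))
        if [ "$diff" -ne 0 ]; then
            echo "$current:$today" > "$cache_file"
            printf '%+d %% since %s\n' "$diff" "$last_date"
        else
            printf 'unchanged since %s\n' "$last_date"
        fi
    else
        echo "$current:$today" > "$cache_file"
        echo "first data point"
    fi
}

tmpdir=$(mktemp -d)
cache="$tmpdir/$(cache_name /dev/nvme0n1)"
wear_trend "$cache" 8 2025-03-27
wear_trend "$cache" 8 2025-03-28
wear_trend "$cache" 9 2025-04-10
rm -rf "$tmpdir"
```

Because the file is left untouched while the value stays the same, the stored date always marks the last change rather than the last run.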

---

🧠 Ideal for admins who want to spot wear early and see NVMe SMART data cleanly in their monitoring.

Author: Patrick Asmus ([www.techniverse.net](https://www.techniverse.net))
BIN
assets/screenshot_1.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 19 KiB |
BIN
assets/screenshot_2.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 163 KiB |
135
prtg-smart-health-check.v1.sh
Normal file
@@ -0,0 +1,135 @@
#!/usr/bin/env bash
# Description: Monitors the S.M.A.R.T. values of a physical device and outputs them as XML for PRTG
# Parameter: /dev/sda
# Author: Patrick Asmus
# Web: https://www.techniverse.net
# Version: 1.0
# Date: 28.03.2025
# Modification: Initial
#####################################################

DEVICE="$1"

# No device given ("Kein Gerät angegeben")
if [ -z "$DEVICE" ]; then
    echo "<prtg><error>1</error><text>Kein Gerät angegeben</text></prtg>"
    exit 1
fi

# smartctl is not installed ("smartctl nicht installiert")
if ! command -v smartctl &>/dev/null; then
    echo "<prtg><error>1</error><text>smartctl nicht installiert</text></prtg>"
    exit 1
fi

# The parameter is not a block device ("Gerät ... nicht gefunden")
if [ ! -b "$DEVICE" ]; then
    echo "<prtg><error>1</error><text>Gerät $DEVICE nicht gefunden</text></prtg>"
    exit 1
fi

SMART_OUTPUT=$(sudo smartctl -x "$DEVICE")

# Extract vendor information
MODEL=$(echo "$SMART_OUTPUT" | awk -F: '/Model Number|Device Model/ {print $2}' | xargs)
SERIAL=$(echo "$SMART_OUTPUT" | awk -F: '/Serial Number/ {print $2}' | xargs)
# Anchored so that NVMe lines such as "PCI Vendor/Subsystem ID" do not match
VENDOR=$(echo "$SMART_OUTPUT" | awk -F: '/^Vendor:/ {print $2}' | xargs)
# Fall back to the first word of the model name
[ -z "$VENDOR" ] && VENDOR=$(echo "$MODEL" | awk '{print $1}')
MODEL=${MODEL:-Unbekanntes Modell}
SERIAL=${SERIAL:-Keine Seriennummer}
VENDOR=${VENDOR:-Unbekannter Hersteller}

# Detect device type
if echo "$SMART_OUTPUT" | grep -q "NVMe Version"; then
    TYPE="nvme"
else
    TYPE="sata"
fi

XML="<prtg>"

if [ "$TYPE" = "sata" ]; then
    get_value() {
        # "smartctl -x" prints the attribute table in brief format (raw value in
        # column 8); the older layout with a hex FLAG column has it in column 10.
        # Stop at the first match so later log tables cannot override the value.
        echo "$SMART_OUTPUT" | awk -v id="$1" '$1 == id {print ($3 ~ /^0x/ ? $10 : $8); exit}'
    }

    TEMP=$(get_value 194)
    HOURS=$(get_value 9)
    REALLOC=$(get_value 5)
    REALLOC_EVENT=$(get_value 196)
    PENDING=$(get_value 197)
    UNCORRECTABLE=$(get_value 198)
    CRC_ERROR=$(get_value 199)
    REPORTED_UNCORRECT=$(get_value 187)

    # Default missing values to 0 (indirect expansion instead of eval)
    for var in TEMP HOURS REALLOC REALLOC_EVENT PENDING UNCORRECTABLE CRC_ERROR REPORTED_UNCORRECT; do
        [ -n "${!var}" ] || printf -v "$var" '%s' 0
    done

    XML+="
<result><channel>Temperature (°C)</channel><value>$TEMP</value><unit>Temperature</unit><limitmode>1</limitmode><limitmaxwarning>45</limitmaxwarning><limitmaxerror>55</limitmaxerror></result>
<result><channel>Power-On Hours</channel><value>$HOURS</value><unit>Hours</unit></result>
<result><channel>Reallocated Sectors</channel><value>$REALLOC</value><unit>Count</unit><limitmode>1</limitmode><limitmaxerror>10</limitmaxerror></result>
<result><channel>Reallocated Events</channel><value>$REALLOC_EVENT</value><unit>Count</unit></result>
<result><channel>Pending Sectors</channel><value>$PENDING</value><unit>Count</unit><limitmode>1</limitmode><limitmaxerror>1</limitmaxerror></result>
<result><channel>Offline Uncorrectable</channel><value>$UNCORRECTABLE</value><unit>Count</unit><limitmode>1</limitmode><limitmaxerror>1</limitmaxerror></result>
<result><channel>Reported Uncorrect</channel><value>$REPORTED_UNCORRECT</value><unit>Count</unit></result>
<result><channel>CRC Error Count</channel><value>$CRC_ERROR</value><unit>Count</unit></result>
<text>$VENDOR | $MODEL | SN: $SERIAL | $DEVICE</text>"

elif [ "$TYPE" = "nvme" ]; then
    # Print the trimmed value of the first "Label: value" line; $1 is an extended regex
    get_nvme_value() {
        echo "$SMART_OUTPUT" | grep -E "^$1:" | head -n1 | awk -F: '{gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2}'
    }

    # Same, but with thousands separators removed ("." or ",", depending on locale)
    get_nvme_count() {
        get_nvme_value "$1" | tr -d '.,'
    }

    TEMP=$(get_nvme_value "Temperature" | awk '{print $1}')
    HOURS=$(get_nvme_count "Power On Hours")
    PERCENT_USED=$(get_nvme_value "Percentage Used" | sed 's/%//')
    MEDIA_ERRORS=$(get_nvme_count "Media and Data Integrity Errors")
    ERROR_LOGS=$(get_nvme_count "Error Information Log Entries")
    UNSAFE_SHUTDOWNS=$(get_nvme_count "Unsafe Shutdowns")
    # smartctl may pad "Warning" with extra spaces, so allow one or more
    WARN_TEMP_TIME=$(get_nvme_count "Warning +Comp\. Temperature Time")
    CRIT_TEMP_TIME=$(get_nvme_count "Critical +Comp\. Temperature Time")
    POWER_CYCLES=$(get_nvme_count "Power Cycles")
    # Only matches when smartctl prints the size in TB; otherwise the value defaults to 0
    WRITTEN_TB=$(get_nvme_value "Data Units Written" | grep -o '\[[0-9.]* TB\]' | tr -d '[]TB ')
    READ_TB=$(get_nvme_value "Data Units Read" | grep -o '\[[0-9.]* TB\]' | tr -d '[]TB ')

    # Default missing values to 0 first, because the arithmetic below needs numbers
    for var in TEMP HOURS PERCENT_USED MEDIA_ERRORS ERROR_LOGS UNSAFE_SHUTDOWNS WARN_TEMP_TIME CRIT_TEMP_TIME POWER_CYCLES WRITTEN_TB READ_TB; do
        [ -n "${!var}" ] || printf -v "$var" '%s' 0
    done

    # === Track wear-level history ===
    CACHE_FILE="/var/prtg/scriptsxml/.smart_wear_${DEVICE//\//_}.cache"
    NOW_DATE=$(date '+%Y-%m-%d')
    if [ -f "$CACHE_FILE" ]; then
        LAST_LEVEL=$(awk -F: '{print $1}' "$CACHE_FILE")
        LAST_DATE=$(awk -F: '{print $2}' "$CACHE_FILE")
        DIFF=$((PERCENT_USED - ${LAST_LEVEL:-0}))

        if [ "$DIFF" -ne 0 ]; then
            # %+d prints an explicit sign, so a lower value shows as "-1" rather than "+-1"
            TREND=$(printf ' – %+d %% seit %s' "$DIFF" "$LAST_DATE")
            echo "$PERCENT_USED:$NOW_DATE" > "$CACHE_FILE"
        else
            TREND=" – unverändert seit $LAST_DATE"
        fi
    else
        # First data point ("erster Messpunkt")
        echo "$PERCENT_USED:$NOW_DATE" > "$CACHE_FILE"
        TREND=" – erster Messpunkt"
    fi

    XML+="
<result><channel>Temperature (°C)</channel><value>$TEMP</value><unit>Temperature</unit><limitmode>1</limitmode><limitmaxwarning>65</limitmaxwarning><limitmaxerror>80</limitmaxerror></result>
<result><channel>Power-On Hours</channel><value>$HOURS</value><unit>Hours</unit></result>
<result><channel>Wear Level (Percentage Used)</channel><value>$PERCENT_USED</value><unit>Percent</unit><limitmode>1</limitmode><limitmaxwarning>70</limitmaxwarning><limitmaxerror>90</limitmaxerror></result>
<result><channel>Power Cycles</channel><value>$POWER_CYCLES</value><unit>Count</unit></result>
<result><channel>Media/Data Errors</channel><value>$MEDIA_ERRORS</value><unit>Count</unit><limitmode>1</limitmode><limitmaxerror>1</limitmaxerror></result>
<result><channel>SMART Error Logs</channel><value>$ERROR_LOGS</value><unit>Count</unit></result>
<result><channel>Unsafe Shutdowns</channel><value>$UNSAFE_SHUTDOWNS</value><unit>Count</unit></result>
<result><channel>Warning Temp Time</channel><value>$WARN_TEMP_TIME</value><unit>Custom</unit><customunit>min</customunit></result>
<result><channel>Critical Temp Time</channel><value>$CRIT_TEMP_TIME</value><unit>Custom</unit><customunit>min</customunit><limitmode>1</limitmode><limitmaxerror>10</limitmaxerror></result>
<result><channel>Data Written (TB)</channel><value>$WRITTEN_TB</value><unit>Custom</unit><float>1</float></result>
<result><channel>Data Read (TB)</channel><value>$READ_TB</value><unit>Custom</unit><float>1</float></result>
<text>$VENDOR | $MODEL | SN: $SERIAL | $DEVICE | Wear: $PERCENT_USED %$TREND</text>"
else
    XML+="<error>1</error><text>Unbekannter Gerätetyp</text>"
fi

XML+="</prtg>"
echo "$XML"