Difference between revisions of «S.M.A.R.T SSD en ESXi»
Current revision as of 11:11, 28 Oct 2020
I have been replacing mechanical hard disks with SSDs on practically every host I administer. The problem I have run into is detecting a possible failure in time, because an SSD's useful life depends on how many times its cells can be rewritten: around 1,000 times for TLC SSDs or 100,000 for SLC.
For that reason I find it worthwhile to keep track of these cell-write figures. Unfortunately there is no option to monitor these values from VMware Host Client, but we can use the smartmontools tool, which gives access to the S.M.A.R.T monitoring data.
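The value to keep in mind is TBW (Total Bytes Written). The manufacturer's TBW rating describes how many bytes can be written to the device as a whole before the warranty expires. The amount written so far can be read from S.M.A.R.T in the Total_LBAs_Written field, which is normally reported in LBAs (logical blocks) rather than bytes, so it has to be converted before comparing it with the rating.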
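As a rough sketch of how a Total_LBAs_Written reading relates to a TBW rating, the shell functions below convert a raw LBA count to decimal gigabytes and express it as a percentage of the rated endurance. The function names, the default 512-byte LBA size and the figures in the usage note are assumptions for illustration only; some vendors report this attribute in other units, so the drive's documentation should be checked first.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of smartmontools).

# lbas_to_gb LBAS [LBA_BYTES]
# Converts a raw Total_LBAs_Written count to decimal gigabytes.
# Assumes 512-byte LBAs unless a second argument is given.
lbas_to_gb() {
    echo $(( $1 * ${2:-512} / 1000000000 ))
}

# tbw_used_percent LBAS RATED_TBW_TB [LBA_BYTES]
# Prints the written volume as a whole-number percentage of the
# rated TBW, where the rating is given in terabytes.
tbw_used_percent() {
    written_gb=$(lbas_to_gb "$1" "${3:-512}")
    echo $(( written_gb * 100 / ($2 * 1000) ))
}
```

For example, `tbw_used_percent 58593750000 150` prints 20: 58,593,750,000 LBAs of 512 bytes are 30 TB, which is 20% of a hypothetical 150 TBW rating.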
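Install smartctl
smartmontools: https://www.smartmontools.org/wiki/Download
Download: https://drive.google.com/file/d/1tkRCVHZhlCMtXOKVCZDcJcQ5WSrCaHIT/view?usp=sharing
Copy the vib file to /tmp/
Set the host acceptance level to CommunitySupported
esxcli software acceptance set --level=CommunitySupported
Install
esxcli software vib install -v /tmp/smartctl-6.6-4321.x86_64.vib
Once installed, smartctl is invoked like this:
/opt/smartmontools/smartctl -d [Device Type] --all /dev/disks/[DISK]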
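The installation steps can be wrapped in a small function so that a missing file or a failed step stops the sequence. This is only a sketch: the function name `install_smartctl_vib` is hypothetical, and the absolute-path check reflects my understanding that esxcli wants a full path to the VIB.

```shell
#!/bin/sh
# install_smartctl_vib VIB_PATH
# Hypothetical wrapper around the two esxcli commands used in this section.
install_smartctl_vib() {
    vib=$1
    # Assumption: esxcli needs an absolute path to the VIB file.
    case $vib in
        /*) ;;
        *) echo "VIB path must be absolute: $vib" >&2; return 1 ;;
    esac
    [ -f "$vib" ] || { echo "VIB not found: $vib" >&2; return 1; }
    # Stop at the first failing step.
    esxcli software acceptance set --level=CommunitySupported || return 1
    esxcli software vib install -v "$vib" || return 1
    # Warn if the binary is not where this article expects it.
    [ -x /opt/smartmontools/smartctl ] ||
        echo "warning: /opt/smartmontools/smartctl not found" >&2
}
```

On the host it would be called as `install_smartctl_vib /tmp/smartctl-6.6-4321.x86_64.vib`.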
List disks
esxcli storage core device list
naa.50014ee25619eff0
   Display Name: Local ATA Disk (naa.50014ee25619eff0)
   Has Settable Display Name: true
   Size: 476940
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.50014ee25619eff0
   Vendor: ATA
   Model: WDC WD5000AAJS-0
   Revision: 12.0
   SCSI Level: 5
   Is Pseudo: false
   Status: on
   Is RDM Capable: false
   Is Local: true
   Is Removable: false
   Is SSD: false
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: unknown
   Attached Filters:
   VAAI Status: unsupported
   Other UIDs: vml.0100000000202020202057442d574341533833323637313936574443205744
   Is Shared Clusterwide: false
   Is Local SAS Device: false
   Is SAS: false
   Is USB: false
   Is Boot USB Device: false
   Is Boot Device: false
   Device Max Queue Depth: 1
   No of outstanding IOs with competing worlds: 1
   Drive Type: unknown
   RAID Level: unknown
   Number of Physical Drives: unknown
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false
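On a host with several devices the full listing gets long. A minimal sketch for condensing it to one line per device follows. The helper name `list_disks` is hypothetical, and it assumes the layout esxcli normally prints, with the device identifier unindented and its fields indented underneath.

```shell
#!/bin/sh
# list_disks: reads `esxcli storage core device list` output on stdin and
# prints one "DEVICE IS_SSD" pair per device.
list_disks() {
    awk '
        /^[^ \t]/        { dev = $1 }        # unindented line: device identifier
        /^[ \t]+Is SSD:/ { print dev, $NF }  # indented "Is SSD: true|false" line
    '
}
```

Typical use on the host would be `esxcli storage core device list | list_disks`. The device identifier in the first column is what goes after /dev/disks/ in the smartctl command.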
/opt/smartmontools/smartctl -d sat --all /dev/disks/naa.50014ee25619eff0
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       42
  3 Spin_Up_Time            0x0003   182   179   021    Pre-fail  Always       -       5858
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3343
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   037   037   000    Old_age   Always       -       46459
 10 Spin_Retry_Count        0x0012   100   100   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   000    Old_age   Always       -       2451
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       1489
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3344
194 Temperature_Celsius     0x0022   122   093   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0
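To pull a single figure out of this attribute table, the output can be filtered with awk. The helper name `smart_raw_value` is hypothetical; the sketch assumes the ten-column layout printed by smartctl, with the attribute name in column 2 and RAW_VALUE in column 10.

```shell
#!/bin/sh
# smart_raw_value ATTRIBUTE_NAME
# Reads smartctl attribute-table output on stdin and prints the RAW_VALUE
# (10th column) of the row whose ATTRIBUTE_NAME matches exactly.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}
```

On the host this would be used as `/opt/smartmontools/smartctl -d sat --all /dev/disks/naa.50014ee25619eff0 | smart_raw_value Power_On_Hours`, which for the table in this section gives 46459. On an SSD that exposes it, the same call with `Total_LBAs_Written` returns the write counter.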
esxcli storage
esxcli storage core device smart get -d naa.50014ee25619eff0
Health Status                 OK   N/A  N/A
Media Wearout Indicator       N/A  N/A  N/A
Write Error Count             0    51   N/A
Read Error Count              42   51   N/A
Power-on Hours                37   0    37
Power Cycle Count             98   0    98
Reallocated Sector Count      0    140  N/A
Raw Read Error Rate           42   51   N/A
Drive Temperature             28   0    N/A
Driver Rated Max Temperature  N/A  N/A  N/A
Write Sectors TOT Count       N/A  N/A  N/A
Read Sectors TOT Count        N/A  N/A  N/A
Initial Bad Block Count       N/A  N/A  N/A
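For a quick scripted check, the Health Status row of this output can be turned into an exit code. This is a sketch under assumptions: the function name `smart_health` is hypothetical, and it treats anything other than the literal value OK as unhealthy.

```shell
#!/bin/sh
# smart_health: reads `esxcli storage core device smart get` output on stdin,
# prints the Health Status value ("unknown" if the row is missing), and
# returns non-zero unless that value is exactly "OK".
smart_health() {
    status=$(awk '$1 == "Health" && $2 == "Status" { print $3 }')
    echo "${status:-unknown}"
    [ "$status" = "OK" ]
}
```

On the host it could be chained as `esxcli storage core device smart get -d naa.50014ee25619eff0 | smart_health || echo "check this disk"`.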
esxcli storage core device stats get -d naa.50014ee25619eff0
Device: naa.50014ee25619eff0
   Successful Commands: 9265
   Blocks Read: 119780
   Blocks Written: 531
   Read Operations: 1720
   Write Operations: 531
   Reserve Operations: 73
   Reservation Conflicts: 0
   Failed Commands: 1
   Failed Blocks Read: 0
   Failed Blocks Written: 0
   Failed Read Operations: 0
   Failed Write Operations: 0
   Failed Reserve Operations: 0
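Individual counters from the stats output can be read in the same way. The helper below is a hypothetical sketch (the name `device_stat` is not an esxcli feature) that assumes the "Key: value" layout shown above and matches the key exactly, so that "Blocks Written" is not confused with "Failed Blocks Written".

```shell
#!/bin/sh
# device_stat KEY
# Reads `esxcli storage core device stats get` output on stdin and prints
# the value of the "KEY: value" line whose key matches KEY exactly.
device_stat() {
    awk -F': *' -v key="$1" '
        { k = $1; sub(/^[ \t]+/, "", k) }  # strip leading indentation from the key
        k == key { print $2 }
    '
}
```

For instance, `esxcli storage core device stats get -d naa.50014ee25619eff0 | device_stat "Failed Commands"` would print 1 for the output in this section.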