Some sites were unavailable for about 1 hour

Unfortunately, today we had our first outage in 1.5 years:



5 sites were unavailable for about 1 hour.

The cause: the server's power supply failed and the server went offline. The data center's intervention report:

This operation was closed at 2018-11-18 17:53:32 CET (UTC +01:00)

Here are the details of this operation: 
Power Suply replacement
Date 2018-11-18 17:21:24 CET (UTC +01:00), baptiste L made Power Suply replacement:
 Diagnosis:
HS power

Actions:
Replacing the power supply.
Replacing the  cpu cooling.
Server restart.

result:
Boot OK. Server on login screen. Ping OK, services started.


The kernel log showed:

[ 446.030044] CPU1: Package temperature/speed normal
[ 446.030045] CPU7: Package temperature/speed normal
[ 446.030045] CPU5: Package temperature/speed normal
[ 446.030046] CPU3: Package temperature/speed normal
[ 446.030047] CPU2: Package temperature/speed normal
[ 446.030067] CPU0: Package temperature/speed normal
[ 446.030068] CPU4: Package temperature/speed normal
[ 446.030542] CPU6: Package temperature/speed normal
[ 600.327422] mce: [Hardware Error]: Machine check events logged
[ 746.173504] CPU2: Core temperature above threshold, cpu clock throttled (total events = 1767)
[ 746.173506] CPU1: Package temperature above threshold, cpu clock throttled (total events = 3871)
[ 746.173507] CPU6: Core temperature above threshold, cpu clock throttled (total events = 1767)
[ 746.173508] CPU5: Package temperature above threshold, cpu clock throttled (total events = 3871)
[ 746.173510] CPU6: Package temperature above threshold, cpu clock throttled (total events = 3871)
[ 746.173530] CPU0: Package temperature above threshold, cpu clock throttled (total events = 3871)
[ 746.173531] CPU4: Package temperature above threshold, cpu clock throttled (total events = 3871)
[ 746.173533] CPU3: Package temperature above threshold, cpu clock throttled (total events = 3871)
[ 746.173534] CPU7: Package temperature above threshold, cpu clock throttled (total events = 3871)
[ 746.174152] CPU2: Package temperature above threshold, cpu clock throttled (total events = 3871)
[ 746.174466] CPU2: Core temperature/speed normal
[ 746.174467] CPU6: Core temperature/speed normal
[ 746.174468] CPU1: Package temperature/speed normal
[ 746.174468] CPU5: Package temperature/speed normal
[ 746.174469] CPU6: Package temperature/speed normal
[ 746.174495] CPU4: Package temperature/speed normal
[ 746.174496] CPU0: Package temperature/speed normal
[ 746.174498] CPU3: Package temperature/speed normal
[ 746.174499] CPU7: Package temperature/speed normal
[ 746.174984] CPU2: Package temperature/speed normal
[ 750.396648] mce: [Hardware Error]: Machine check events logged
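A log like the one above can be summarized automatically. What follows is only a minimal sketch, assuming the lines keep exactly the format shown here. The function name `summarize` and the regular expressions are illustrative and not part of any existing tool. It reports the highest "total events" counter seen for core and package throttling and counts the machine check (mce) lines.

```python
import re

# A few lines copied from the kernel log above, used as sample input.
SAMPLE = """\
[ 446.030044] CPU1: Package temperature/speed normal
[ 600.327422] mce: [Hardware Error]: Machine check events logged
[ 746.173504] CPU2: Core temperature above threshold, cpu clock throttled (total events = 1767)
[ 746.173506] CPU1: Package temperature above threshold, cpu clock throttled (total events = 3871)
[ 746.174466] CPU2: Core temperature/speed normal
[ 750.396648] mce: [Hardware Error]: Machine check events logged
"""

# Assumed line format: "[ <uptime seconds>] CPU<n>: <Core|Package> temperature above threshold, ..."
THROTTLE_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+CPU(?P<cpu>\d+): "
    r"(?P<scope>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<total>\d+)\)"
)
MCE_RE = re.compile(r"^\[\s*\d+\.\d+\]\s+mce: \[Hardware Error\]")


def summarize(log_text):
    """Return ({scope: highest total-events counter}, number of mce lines)."""
    max_events = {}
    mce_count = 0
    for line in log_text.splitlines():
        match = THROTTLE_RE.match(line)
        if match:
            scope = match.group("scope")
            total = int(match.group("total"))
            max_events[scope] = max(max_events.get(scope, 0), total)
        elif MCE_RE.match(line):
            mce_count += 1
    return max_events, mce_count


if __name__ == "__main__":
    events, mce = summarize(SAMPLE)
    print("max throttle counters:", events)
    print("machine check lines:", mce)
```

A non-zero package counter together with mce lines, as in this incident, is a reasonable signal to look at cooling and power hardware.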


Estimated downtime: 30-40 minutes.
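As a rough cross-check, the two timestamps in the intervention report can be subtracted. This is a sketch under the assumption that both timestamps are in the same time zone (CET, as the report states) and that they bracket the intervention only; the sites may have been down before the technician started.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps taken from the intervention report (both CET).
started = datetime.strptime("2018-11-18 17:21:24", FMT)
closed = datetime.strptime("2018-11-18 17:53:32", FMT)

elapsed = closed - started
minutes, seconds = divmod(int(elapsed.total_seconds()), 60)
print(f"Intervention window: {minutes} min {seconds} s")
```

The result falls inside the 30-40 minute estimate.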
