How much “uptime” can your system manage?
03-09-10 status update: After almost 27 days of solid uptime, the server unexpectedly crashed. I’ve been reviewing the logs, but have been unable to find anything that points to the cause of the crash. So far, the only suspicious thing I’ve found is the avahi-daemon: it logged an error right about the time the server stopped responding.
The network shares (mounted via AFP with Netatalk) suddenly disappeared on my Mac. This led me to believe that the Mac might have lost its network connection, but that wasn’t the case. Pinging the server and attempts to ssh into it were unsuccessful; both timed out. All vital signs on the server seemed okay (the power was on, the fans were spinning, the hard drives were spinning, etc.), but there was no response from the keyboard, just a black screen on my monitor, as if the screensaver was on.
Unplugging and replugging the keyboard didn’t help; there was still no way to get anything to show up on the screen, and trying another keyboard didn’t do the trick either. The NIC LEDs were on, but lit solid rather than blinking, which was unusual. There was no visible hard drive activity… Not really sure what happened.
After 15 to 20 minutes of troubleshooting, and before powering the server off, I realized that the system may have simply locked up or crashed, and it seems that this is what happened.
A few minutes after a hard shutdown, everything seemed normal again. I took advantage of the downtime to install the latest kernel, which had been sitting in my “pending” updates for quite a few days.
So, the race against time starts again.
Current server uptime? 1:12
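For anyone who wants to keep score on their own box, here is a rough Python sketch of how that number can be read. It assumes a Linux system, where `/proc/uptime` holds two numbers (seconds since boot, then idle seconds). The helper names `parse_proc_uptime` and `format_uptime` are my own, and the output only approximates what the `uptime` command prints.

```python
def parse_proc_uptime(text):
    """Return seconds since boot from the contents of /proc/uptime.

    The file holds two numbers: uptime seconds and idle seconds.
    Only the first one is needed here.
    """
    return float(text.split()[0])


def format_uptime(seconds):
    """Format seconds as 'H:MM', or 'N days, H:MM' past one day.

    This is a simplified imitation of the uptime command's style,
    not an exact reproduction of its output.
    """
    total_minutes = int(seconds) // 60
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    clock = f"{hours}:{minutes:02d}"
    if days:
        plural = "s" if days != 1 else ""
        return f"{days} day{plural}, {clock}"
    return clock


if __name__ == "__main__":
    # Assumes Linux; /proc/uptime does not exist on other systems.
    with open("/proc/uptime") as f:
        print(format_uptime(parse_proc_uptime(f.read())))
```

Run shortly after a reboot, it should print something along the lines of the 1:12 above (one hour, twelve minutes).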