How do you investigate a system crash when there are no records/logs?

by xperator   Last Updated September 12, 2019 01:02 AM

TL;DR

  1. How does one actually investigate a system crash when the logs don't show anything?
  2. How do I prepare for future crashes? Is it possible to have more aggressive/accurate logging, in case the system panics or freezes so abruptly that it doesn't even have time to log?

A few weeks ago I got 3 VPS machines (KVM) from a provider, and 2 of them crashed after a week (at random, different times). They all had 512 MB of RAM (with 512 MB of swap space).

One of them was actually shut down and had an "offline" label in the provider's admin panel. The other was frozen: the panel showed "Online", but I couldn't SSH into it or access it through the web console.

Neither of them was running any CPU- or memory-intensive tasks. One was just an OpenVPN server (with 2-3 users) and the other just nginx + PHP serving a static site. Both had about 200-300 MB of available memory at all times, and CPU usage was below 10%.

I had Netdata monitoring installed, so I had a history of almost everything. I looked at every single chart and graph from right before the crashes. There was no spike or sudden increase in CPU/Memory/Disk/Network/Process/Firewall usage.

I looked through every single log file under /var/log/ and read the entries from before the crash line by line. I also used journalctl. There were no errors, no warnings, no out-of-memory messages, no killed processes, just normal events.

Both the servers that crashed had a syslog that looked like this:

[Screenshot: syslog excerpt around the crash]

As you can see, ufw is just blocking random spammers right before the crash, and then the log simply stops. The boot you see at 20:41:02 is the hard/forced reboot we did after the crash, just to get the system back online.
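
The pattern described above (normal entries, then silence, then a boot line) can be located automatically. The following is only a minimal sketch, assuming the traditional syslog timestamp prefix (`Mon DD HH:MM:SS`) and an English/C locale for month names; the function name `largest_log_gap` and the sample lines are hypothetical and are not taken from the screenshot.

```python
from datetime import datetime

def largest_log_gap(lines, year):
    """Return (gap, last_line_before, first_line_after) for the longest
    silence between consecutive timestamped lines, or None if fewer than
    two lines carry a parsable timestamp."""
    best = None
    prev_time, prev_line = None, None
    for line in lines:
        try:
            # Traditional syslog lines carry no year, so one is supplied.
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading syslog timestamp
        if prev_time is not None:
            gap = stamp - prev_time
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = stamp, line
    return best

# Hypothetical sample lines, for illustration only.
sample = [
    "Sep 11 18:00:01 vps kernel: sample line A",
    "Sep 11 18:00:05 vps kernel: sample line B",
    "Sep 11 20:41:02 vps kernel: sample boot line",
]
gap, before, after = largest_log_gap(sample, 2019)
print("silent for", gap)
print("last line before the gap:", before)
```

The last line before the longest gap is the closest available marker for the time of the freeze, which can then be compared against the monitoring charts.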

When I asked the provider, they said everything looked OK on their side and that my servers crashed because 512 MB of RAM was too low and I had to upgrade.

Also, there are 2 things that I randomly read on the internet, and I thought I'd ask here whether they are an actual thing.

  • "Micro RAM spikes, for example rotating ram tables to disk, etc"
  • a parameter called journal_data_writeback that, if enabled, might cause the system to miss writing logs to disk during a crash.
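
One possible way to approach the "more aggressive logging" idea, given the worry about writes being lost in a crash, is a small heartbeat that records memory figures and forces each record to disk with `os.fsync` before continuing. This is only a sketch under assumptions: the function names and the log path are hypothetical, and it assumes a Linux kernel whose /proc/meminfo exposes `MemAvailable` and `SwapFree`. It cannot say why a machine froze; it only narrows down when, and what memory looked like just before.

```python
import os
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field_name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values

def append_heartbeat(log_path, meminfo_text, now=None):
    """Append one timestamped record and force it to disk before returning."""
    info = parse_meminfo(meminfo_text)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    record = (
        f"{stamp} MemAvailable={info.get('MemAvailable')}kB "
        f"SwapFree={info.get('SwapFree')}kB\n"
    )
    with open(log_path, "a") as handle:
        handle.write(record)
        handle.flush()             # push Python's buffer to the OS
        os.fsync(handle.fileno())  # ask the OS to commit it to disk
    return record

# Example loop (hypothetical path and interval), left commented out:
# while True:
#     with open("/proc/meminfo") as f:
#         append_heartbeat("/var/log/heartbeat.log", f.read())
#     time.sleep(5)
```

After a crash, the final record in the heartbeat file gives the last moment the system was still able to write, along with the memory figures at that time.
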
Tags : server crash log

