Security Lessons from Nature – Status monitoring
I weigh between 150 and 155 pounds. What’s interesting is that, under ideal conditions, my weight stays reliably within that range. I weigh myself regularly, and I have noticed that if my weight ever drops below 150, I get sick within a day. The same happens if it stays above 155 for more than a couple of days. Similarly, my body temperature has a normal range, and any significant deviation from it typically bodes ill(ness).
The human body (indeed, the body of any mammal) has many such metrics. In addition to weight and temp, there is a typical resting heart rate, a normal EKG, a characteristic bone density, and typical levels of vitamins, minerals and hormones. These can be measured in many ways, but the measurements generally fall into two categories. Some can be taken at a surface level (weight and temp); others require special equipment, a tolerance for invasive procedures and a significant amount of time. The more time you invest, the better the data you get, but because these deeper tests are costly and intrusive, they are generally only done when a problem is suspected.
The same applies to IT systems. Some metrics are easy to determine, and a change in them can indicate a problem. And just as with weight and temperature, some can be gathered easily, gathering others can impact the system’s performance, and some can only be gathered while the system is down.
Just as we generally don’t send people in for a full body scan on a regular basis, we aren’t in the habit of shutting down servers for a day each week to perform precautionary forensic analysis on them. Instead, we check surface-level data: disk, CPU and RAM usage, and network connection statistics. If one of these indicates a problem, then and only then do we dig deeper and run scans that might impact system performance.
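The check-the-surface-first approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real monitoring tool: the metric names, the `BASELINE` ranges and the `deep_scan` placeholder are all hypothetical, and only disk usage is measured live (via the standard library’s `shutil.disk_usage`). In practice the other readings would come from a monitoring agent or a library such as psutil, and the ranges would be derived from watching the particular server under normal load.

```python
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Normal operating band for one metric (inclusive on both ends)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Hypothetical baseline for one server; real bands must come from observing
# that specific machine under normal load.
BASELINE = {
    "cpu_percent": Range(5.0, 70.0),
    "ram_percent": Range(20.0, 80.0),
    "disk_percent": Range(10.0, 85.0),
    "established_connections": Range(10, 400),
}


def out_of_range(readings: dict, baseline: dict) -> list:
    """Names of metrics whose reading lies outside their baseline band.
    Metrics without a baseline entry are ignored."""
    return sorted(
        name for name, value in readings.items()
        if name in baseline and not baseline[name].contains(value)
    )


def deep_scan(metric: str) -> None:
    # Hypothetical placeholder for an expensive, possibly disruptive step,
    # such as a full malware scan, a memory capture or forensic imaging.
    print(f"escalating: deep scan triggered by {metric}")


def check(readings: dict, baseline: dict = BASELINE) -> list:
    """Cheap surface check first; escalate only for flagged metrics."""
    flagged = out_of_range(readings, baseline)
    for metric in flagged:
        deep_scan(metric)
    return flagged


def disk_percent(path: str = "/") -> float:
    """Percentage of the filesystem at `path` that is in use (stdlib only)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total if usage.total else 0.0


if __name__ == "__main__":
    sample = {
        "cpu_percent": 93.0,             # made-up value
        "ram_percent": 41.0,             # made-up value
        "disk_percent": disk_percent(),  # measured live
        "established_connections": 120,  # made-up value
    }
    print("flagged:", check(sample))
```

The expensive step runs only for metrics that fall outside their normal band, mirroring the idea of ordering a full scan only after a surface symptom appears.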
The key, just as with my own weight and temperature, is to monitor system performance metrics regularly. Otherwise, you only catch problems after they’ve already impacted the system. Just as it’s easiest to deal with a cold before it really sets in, it’s easier to deal with an attack that is identified at the beginning of the process.
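A fixed range works for metrics with a well-known normal band, but regular monitoring pays off most when the baseline is learned from the system itself. The sketch below is again hypothetical Python with made-up parameter values: `BaselineMonitor` keeps a rolling window of recent normal samples, treats a reading more than `threshold` standard deviations from the window’s mean as abnormal, and only alerts after `patience` consecutive abnormal readings, much like a weight that has to stay above 155 for a couple of days before it means anything. A z-score test is just one simple choice of detector; it assumes the metric is roughly stable when healthy.

```python
from collections import deque
from statistics import mean, stdev


class BaselineMonitor:
    """Learn a metric's normal behavior from recent samples and alert only on
    deviations that persist, rather than on a single noisy reading."""

    def __init__(self, window: int = 60, threshold: float = 3.0, patience: int = 3):
        self.history = deque(maxlen=window)  # recent samples judged normal
        self.threshold = threshold           # std devs from the mean counted as abnormal
        self.patience = patience             # consecutive abnormal samples before alerting
        self.streak = 0                      # current run of abnormal samples

    def observe(self, value: float) -> bool:
        """Record one sample and return True if an alert should fire."""
        if len(self.history) < 2:
            # Not enough data for a standard deviation yet; accept as baseline.
            self.history.append(value)
            return False
        mu = mean(self.history)
        sigma = stdev(self.history) or 1e-9  # avoid dividing by zero on a flat baseline
        if abs(value - mu) / sigma > self.threshold:
            self.streak += 1
        else:
            self.streak = 0
            self.history.append(value)  # only normal samples update the baseline
        return self.streak >= self.patience


if __name__ == "__main__":
    cpu = BaselineMonitor(window=30, threshold=3.0, patience=3)
    # Made-up CPU readings (percent), one per minute: steady, then a sustained jump.
    samples = [20, 22, 19, 21, 20, 23, 18, 21, 22, 20, 85, 88, 90, 91]
    for minute, reading in enumerate(samples):
        if cpu.observe(reading):
            print(f"minute {minute}: sustained anomaly ({reading}% CPU), investigate")
```

With these sample readings, the first two abnormal minutes (85% and 88%) stay quiet and the alert fires from the third consecutive abnormal reading onward. Because abnormal samples never enter the window, a legitimate, permanent shift in load will keep alerting until someone resets or retrains the baseline, which is arguably the right default for a security monitor.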