Uptime

These links lead to white papers I've written about my experience designing, installing, testing, and maintaining highly-available systems.


A Few Thoughts on Uptime

A Few Thoughts on Uptime: I pull together my own experience, insights into how the human brain works, and the Normal Accidents model.

A Few Thoughts on Uptime References: The books that informed A Few Thoughts on Uptime.

One Page Summary of A Few Thoughts on Uptime: An effort to squeeze the entire paper onto a single page.

Tools for Managing Outages: Tips and techniques for handling planned downtime.


General

Configure HA Servers in Data Centers: An internal guide I wrote to help sysadmins configure their hosts to take advantage of the highly-available power and Ethernet we provide. It focuses on NetApp.

Long Road to High-Availability: The psychological, operational, and process-related potholes I have encountered as we have gradually increased uptime.

Testing Highly-Available Hardware: Tips I've developed for validating that highly-available systems can survive the loss of a component.

Testing the Transport Side of Highly-Available Hosts: The article I wrote on this topic for ;login: magazine.


Prepared by:
Stuart Kendrick

Last modified: 27-November-2011