A Linux uptime survey

by Winfried Trümper

Thanks to everybody who took part in the uptime survey of Linux machines. I am currently developing a WWW form for easier, standardized data acquisition. Please submit future uptime reports only through this form; it will be announced separately in comp.os.linux.announce.

Only machines that are not rebooted for fun (e.g. to play games under proprietary OSes on the same machine) and that are not used to test new development kernels were taken into account. Submissions from people with the attitude "if something hangs, I press alt-ctrl-del" were included, though. Here are the results as of September 3rd, 1997:


	Number of machines analysed: 62  (submitted by 50 people)
	Machines with high uptime:   32  (reaching uptimes >= 100 days)
              of them running 1.2:   14  (= all 1.2 machines)
These numbers are quite impressive: over 50% of the surveyed Linux machines (32 of 62) ran for at least 100 days without an error that brought the system down. All machines are used in production environments and are heavily stressed in various ways (networking, number-crunching, graphics). A sample of 62 machines is not a sound statistical basis, though.

All machines running 1.2 reach a high uptime by default; IMHO this shows the constant quality of Linux. This could be misleading for people new to Linux, so please note that stock Linux 1.2 has a number of security bugs and shouldn't be used in new installations. Support for Linux 1.2 will vanish in the foreseeable future. Patches for 1.2.13 are still available, though.

	Highest uptime:    1 year (followed by 341,311,295,286,270 ... days)
	Average uptime:   45 days (between clustered reboots)
	Average time
        of observation:  218 days
Yes, we waited for the machine to complete this amazing uptime of 1 year. :-) I'm sure there are machines with even higher uptimes out there, but at least in this survey it takes first place. The machine in question is running 1.2.13 and acts as a POP server for a very large company (not in the computer business).
	Overall reboots:          625 (all machines counted together)
	Reboots on the same day:  324
	Days with reboots:        301 (= clustered reboots)
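The three figures above are tied together by simple arithmetic: reboots that fall on a day already counted are merged into one cluster. The following is a minimal Python sketch of that counting, assuming each reboot is recorded as a (machine, day) pair. That record format, the function names and the sample data are hypothetical and are not the survey's actual data or tooling. The last lines apply the same arithmetic to the survey totals, including the sum of all uptime days (13547) reported in this survey.

```python
from datetime import date


def cluster_reboots(reboots):
    """Return (all reboots, days with reboots, reboots on the same day).

    `reboots` is an iterable of (machine, day) pairs. This record format
    is an assumption made for illustration only.
    """
    reboots = list(reboots)
    days_with_reboots = len(set(reboots))        # one cluster per machine and day
    same_day = len(reboots) - days_with_reboots  # repeats inside a cluster
    return len(reboots), days_with_reboots, same_day


def average_uptime(total_uptime_days, days_with_reboots):
    """Average uptime between clustered reboots, in days."""
    return total_uptime_days / days_with_reboots


# Made-up sample: three reboots during one evening of upgrades count once.
sample = [
    ("alpha", date(1997, 3, 1)),
    ("alpha", date(1997, 3, 1)),
    ("alpha", date(1997, 3, 1)),
    ("alpha", date(1997, 6, 20)),
    ("beta", date(1997, 4, 2)),
]
print(cluster_reboots(sample))  # (5, 3, 2)

# The survey totals: 625 reboots, 324 of them on an already counted day.
print(625 - 324)                          # 301
print(round(average_uptime(13547, 301)))  # 45
```

Counting clusters instead of single reboots keeps one evening of repeated test boots from being treated as several independent outages.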
Jonathan Larmour wrote about the frequency of reboots:
	It may seem like a lot, but if you actually look closely at it, the
	majority of reboots are in a cluster. This is normally when I play
	around with upgrades out of hours, to reduce noticed
	downtime. These sometimes don't work quite right, so I need to
	reboot again.
Therefore the average uptime was computed as the sum of all uptime days (13547) divided by the number of days with reboots (301), which gives 45 days. I believe this method is fair because only a minority of participants had high availability (minimizing outage time) in mind. Instead, they rebooted Linux frequently after hardware upgrades or distribution updates to check the modified setup. Additional comments from Larry Doolittle:
	A plot of reboot frequency vs. time out of the box might be
	interesting in a larger context, though.  It does take a finite
	amount of time fiddling with new equipment to get things "right".
And in fact, in many cases there is a period of frequent reboots, about 3 days long, after installation or hardware upgrades. The reasons for downtimes were:
	Overall reboots:		625	100%
	Unknown reason:			422	67%
	For known reasons:	100%	203

	Upgrades		 51%	104
		Hardware	   19%	   38
		Software	   11%	   23
		Kernel		   21%	   43
	Failures		 38%	 78
		Power		   20%	   41
		Hardware	    2%	    4
		Software	   11%	   22
		Kernel		    5%	   11
	Moving machine		  5%	 11
	Other			  5%	 10
A small graphic summarizing the numbers.


Fig 1: Reasons for downtimes of Linux-based systems
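The percentages in the table are shares of the 203 reboots with a known reason. As a rough cross-check, the short Python sketch below recomputes the top-level shares from the counts in the table. The dictionary layout and function names are assumptions for illustration; only the counts come from the survey.

```python
# Counts of reboots with a known reason, copied from the table above.
known = {
    "Upgrades": {"Hardware": 38, "Software": 23, "Kernel": 43},
    "Failures": {"Power": 41, "Hardware": 4, "Software": 22, "Kernel": 11},
    "Moving machine": 11,
    "Other": 10,
}


def category_total(value):
    """Sum the sub-reasons of a category, or return a plain count as is."""
    return sum(value.values()) if isinstance(value, dict) else value


def shares(table):
    """Return the overall total and each category's share in whole percent."""
    total = sum(category_total(v) for v in table.values())
    percent = {
        name: round(100 * category_total(v) / total)
        for name, v in table.items()
    }
    return total, percent


total_known, percent = shares(known)
print(total_known)  # 203
print(percent)      # {'Upgrades': 51, 'Failures': 38, 'Moving machine': 5, 'Other': 5}
```

Because each share is rounded separately, the printed percentages add up to 99 rather than 100.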

Conclusion: never hook your computer to the same circuit as your coffee machine (downtimes around 9am), your desk lamp, your refrigerator, etc. A UPS (uninterruptible power supply) may help with short power outages (e.g. in rural areas). Secure the main power switch against your baby and your knee (oops!), though.

Software failures that required (?) a reboot were mainly caused by the file servers nfsd and netatalk. In my experience, it is sufficient to unload and reload the kernel appletalk module to get netatalk working again. "nfsd" can be killed with killall -KILL nfsd and restarted without rebooting the whole system (hint).



© Winfried Trümper 1997. All rights reserved. You may distribute copies of the documents freely, as long as you give proper credit.