How many errors are OK?

Opinion by Nick Barron

Intel's latest processor has triggered a debate about the reliability of hardware. Where do you stand?

In the security business we're only too used to worrying about software problems. What we tend to see far less discussion of is hardware vulnerabilities. Recently, however, Intel's new Core 2 Duo processor has been the subject of some high-profile criticism.

Leading the charge is Theo de Raadt. As a major player in the OpenSSH and OpenBSD projects, de Raadt's credentials are impressive, so it's difficult to brush his comments under the carpet. In a recent posting to the OpenBSD mailing list, he slated Intel's new processor range, expressing concern over a number of memory management and related bugs in the Intel errata list.

This has provoked a furious debate over whether there's anything to worry about. Errors in processors are certainly not a new thing; a quick review of Intel's website shows a hefty set of errata for each processor in the Pentium range. Given the extreme complexity of modern processors, I'm surprised there aren't more.

Linux guru Linus Torvalds had a much less extreme reaction, describing the key problems with memory handling as insignificant. He suggested that Intel's main problem was poor documentation of the new memory management system.

Unlike normal software bugs, a CPU exploit could affect all operating systems. Given the almost ubiquitous nature of Intel's CPUs, this would be a serious matter, although the exploit would have to be customised for each OS to provide a sophisticated hack. A platform-independent denial-of-service may be feasible, but a universal buffer overflow exploit won't be.

Modern CPUs run "microcode" and can be patched by BIOS updates or modules in the OS. One of de Raadt's concerns, which I think is valid, is that manufacturers such as Intel tend to ignore the open-source community when issuing the necessary technical details for such updates. Ironically, in this case the open-source "vulnerability window" may be noticeably worse than that of commercial operating systems, a reversal of the usual situation.

At the moment the jury is still out. As yet there are no exploits or proof-of-concept code to demonstrate the vulnerabilities proposed by de Raadt.

On a more pragmatic note, the impression that hardware is error-free is a naive one. There have been efforts to produce formally proven and reliable CPUs in the past, but they have failed to come up with that elusive perfect option. Consider, for example, the development and subsequent demise of the Viper processor, hailed as the solution to everyone's safety and security needs back in the 1980s.

Although Claude Shannon elegantly proved in the 1940s that a reliable communications channel could be produced from an unreliable one by sacrificing bandwidth, it has not been possible to transfer this approach to the computing world. The key problem is that in communications systems information is not lost, but in computing it is. For example, from the sum of two numbers you cannot generally reverse the process to reveal the original numbers. This means Shannon's work cannot be simply transferred to computers. There is some work on reversible computing underway in the quantum computing world, but it will be a while before it reaches the shopfloor, if it ever does.
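To make the bandwidth trade-off concrete, here is a minimal sketch in Python of the crudest possible scheme: send every bit several times and take a majority vote at the far end. This is a simple repetition code, not Shannon's own construction (which does far better for the bandwidth spent), and the function names, the 10% flip probability and the message length are all assumptions chosen purely for illustration.

```python
import random

def encode(bits, n=3):
    """Repeat each bit n times (n should be odd so a majority always exists)."""
    return [b for b in bits for _ in range(n)]

def noisy_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

def decode(received, n=3):
    """Recover each original bit by majority vote over its group of n copies."""
    return [1 if sum(received[i:i + n]) > n // 2 else 0
            for i in range(0, len(received), n)]

def error_rate(n, flip_prob=0.1, length=10_000, seed=1):
    """Fraction of message bits decoded wrongly when each is sent n times."""
    rng = random.Random(seed)
    message = [rng.randint(0, 1) for _ in range(length)]
    received = noisy_channel(encode(message, n), flip_prob, rng)
    decoded = decode(received, n)
    return sum(a != b for a, b in zip(message, decoded)) / length

if __name__ == "__main__":
    # More repeats cost more bandwidth but leave fewer residual errors.
    for n in (1, 3, 5, 7):
        print(f"{n} copies per bit: residual error rate {error_rate(n):.4f}")

    # The computing problem: a sum does not tell you what was added.
    print(2 + 5 == 3 + 4)
```

The last line is the catch described above: the receiver of a repeated bit can vote on what was sent, but the recipient of a sum has no way of telling which pair of numbers produced it.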

So while the error rate can be reduced, it cannot be removed completely. The question then is how much error is acceptable, and the problem becomes a more familiar one of risk management. General purpose processors are always going to involve trade-offs on performance and commercial benefits versus testing and reliability. Although there are less complex and (debatably) "more secure" options around, off-the-shelf processors will get far more testing simply due to the size of the market. Even with the error rates in Intel's previous devices, they are considered reliable enough for many safety-critical applications.

Perhaps the saving grace for hardware reliability is the unreliability of software and its users. Whatever the risk of a hardware error breaching your security, I'm willing to bet the chances of errant software or users breaking it first are several orders of magnitude higher.

