Re: [PATCH 0/2] Generic hardware error reporting support

From: Linus Torvalds
Date: Friday, November 19, 2010 - 7:15 pm

On Fri, Nov 19, 2010 at 6:04 PM, huang ying
<huang.ying.caritas@gmail.com> wrote:

Bah. Many machine checks _were_ software errors. They were things like
the BIOS not clearing some old pending state etc.

The confusion came not from printk, but simply from ambiguous errors.
When is a machine check hardware-related? It's not at all always
obvious.

Sometimes machine checks are from uninitialized hardware state, where
_software_ hasn't initialized it. Is it a hardware bug? No.


Sure. That doesn't change the fact that finding the data in your
/var/log/messages and your regular logging tools is still a lot more
useful than having some random tool that is specialized and that most
IT people won't know about. And that won't be good at doing network
reporting etc etc.

The thing is, hardware errors aren't that special. Sure, hardware
people always think so. But to anybody else, a hardware error is "just
another source of issues".

Anybody who thinks that hardware errors are special and need a
special interface is missing that point totally.

And I really do understand why people inside Intel would miss that
point. To YOU guys the hardware errors you report are magical and
special. But that's always true. To _everybody_, the errors _they_
report are special. Like snowflakes, we're all unique. And we're all
the same.


And by "we", who do you mean exactly? The fact is, "we" covers a lot
of ground, and I don't think your statement is in the least true.

Yes, IT people want to know. When they start seeing hardware errors,
they'll start replacing the machine as soon as they can. Whether that
replacement is then "in five minutes" or "four months from now" is up
to their management, their replacement policy, and based on how
critical that machine is.

IT HAS NOTHING WHAT-SO-EVER TO DO WITH HOW OFTEN THE ERRORS HAPPEN.

And yes, Intel can do guidelines, but when you say there should be
some "enforced policy" by some tool, you're simply just wrong.

                  Linus
--

Messages in current thread:
[PATCH 0/2] Generic hardware error reporting support, Huang Ying, (Fri Nov 19, 1:10 am)
[PATCH 2/2] Hardware error record persistent support, Huang Ying, (Fri Nov 19, 1:10 am)
Re: [PATCH 0/2] Generic hardware error reporting support, Peter Zijlstra, (Fri Nov 19, 4:22 am)
Re: [PATCH 0/2] Generic hardware error reporting support, Peter Zijlstra, (Fri Nov 19, 5:02 am)
Re: [PATCH 0/2] Generic hardware error reporting support, Peter Zijlstra, (Fri Nov 19, 5:55 am)
Re: [PATCH 0/2] Generic hardware error reporting support, Peter Zijlstra, (Fri Nov 19, 6:18 am)
Re: [PATCH 0/2] Generic hardware error reporting support, Peter Zijlstra, (Fri Nov 19, 6:37 am)
Re: [PATCH 2/2] Hardware error record persistent support, Linus Torvalds, (Fri Nov 19, 8:52 am)
Re: [PATCH 0/2] Generic hardware error reporting support, Linus Torvalds, (Fri Nov 19, 8:56 am)
Re: [PATCH 2/2] Hardware error record persistent support, Andrew Morton, (Fri Nov 19, 1:01 pm)
Re: [PATCH 0/2] Generic hardware error reporting support, Linus Torvalds, (Fri Nov 19, 7:15 pm)
Re: [PATCH 1/2] Generic hardware error reporting mechanism, Borislav Petkov, (Sat Nov 20, 2:00 am)
Re: [PATCH 0/2] Generic hardware error reporting support, Linus Torvalds, (Sat Nov 20, 4:57 pm)
Re: [PATCH 0/2] Generic hardware error reporting support, Elias Gabriel Amaral ..., (Sat Nov 20, 5:50 pm)
Re: [PATCH 0/2] Generic hardware error reporting support, Linus Torvalds, (Sat Nov 20, 5:50 pm)
Re: [PATCH 0/2] Generic hardware error reporting support, Mauro Carvalho Chehab, (Tue Nov 30, 8:09 am)