Re: [PATCH 1/2] Generic hardware error reporting mechanism

From: huang ying
Date: Friday, November 19, 2010 - 7:52 pm

On Fri, Nov 19, 2010 at 9:56 PM,  <boris@alien8.de> wrote:

You mean "struct trace_entry"? They are quite different in design. The
record format in the patch can incorporate multiple sections into one
record, which is meaningful for hardware error reporting. And we do
not need the fancy
"/sys/kernel/debug/tracing/events/<xxx>/<xxx>/format"; the user space
error daemon only consumes the error records it recognizes and blindly
logs all other records.
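
To make the "multiple sections in one record" idea concrete, here is a minimal user-space sketch of how a daemon might walk such a record, handling the section types it recognizes and passing the rest through untouched. The layout (struct rec_hdr, struct sec_hdr), the SEC_TYPE_MEM constant and walk_record() are all hypothetical names invented for illustration; this is not the record format defined in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration only, NOT the patch's format. */
struct rec_hdr { uint32_t total_len; uint32_t nr_sections; };
struct sec_hdr { uint32_t type; uint32_t len; };   /* len = payload bytes */

enum { SEC_TYPE_MEM = 1 };   /* the only type this sketch "recognizes" */

struct walk_result { unsigned recognized; unsigned passthrough; };

/*
 * Walk all sections of one record. Recognized sections are counted as
 * consumed; unknown ones are counted as "log blindly". Returns 0 on
 * success, -1 if the record is truncated or malformed.
 */
int walk_record(const unsigned char *buf, size_t buf_len,
                struct walk_result *res)
{
    struct rec_hdr rh;
    size_t off = sizeof(rh);

    assert(buf != NULL && res != NULL);
    res->recognized = 0;
    res->passthrough = 0;

    if (buf_len < sizeof(rh))
        return -1;
    memcpy(&rh, buf, sizeof(rh));          /* avoid unaligned access */
    if (rh.total_len < sizeof(rh) || rh.total_len > buf_len)
        return -1;

    for (uint32_t i = 0; i < rh.nr_sections; i++) {
        struct sec_hdr sh;

        if (rh.total_len - off < sizeof(sh))
            return -1;
        memcpy(&sh, buf + off, sizeof(sh));
        off += sizeof(sh);
        if (rh.total_len - off < sh.len)
            return -1;

        if (sh.type == SEC_TYPE_MEM)
            res->recognized++;             /* daemon would decode this */
        else
            res->passthrough++;            /* daemon would log raw bytes */
        off += sh.len;
    }
    return 0;
}
```

Because each section carries its own type and length, a reader can skip sections it does not understand without any external format description, which is the property the reply relies on.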


There is no APEI specific code in this patch.


Because every device may report hardware errors, but not every device
will. So only a pointer is added to "struct device", and the
corresponding data structure is created only when needed.
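
The pointer-plus-lazy-allocation pattern described here can be sketched in user-space C as follows. struct device_sketch, struct herr_state, dev_herr_get() and dev_herr_put() are hypothetical stand-ins, not the names used in the patch, and the sketch ignores the locking a real kernel implementation would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-device error state; allocated only on first use. */
struct herr_state {
    unsigned long nr_reported;
};

/* Stand-in for "struct device": it pays only one pointer per device. */
struct device_sketch {
    const char *name;
    struct herr_state *herr;   /* NULL until the device reports an error */
};

/* Return the device's error state, creating it on first use. */
struct herr_state *dev_herr_get(struct device_sketch *dev)
{
    assert(dev != NULL);
    if (dev->herr == NULL)
        dev->herr = calloc(1, sizeof(*dev->herr));   /* may return NULL */
    return dev->herr;
}

/* Release the error state, e.g. when the device goes away. */
void dev_herr_put(struct device_sketch *dev)
{
    assert(dev != NULL);
    free(dev->herr);
    dev->herr = NULL;
}
```

Devices that never report an error never trigger the allocation, so the cost for them stays at a single NULL pointer.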


In general all hardware errors should be reported and logged.


Some summary hardware error information can be put into printk. An error
daemon is needed because we need not only to log the error but also to
do predictive recovery. If you really have no daemon, cat can be used to
log the errors. I don't fully understand what you mean: do you want to
enforce policies without an error daemon?


We can use another device file to inject errors, for example
/dev/error/error_inject. Just write the needed information to this
file. The format can be the same as the error record format defined
above, because it is highly extensible.


These are policies and will be implemented in the user space error
daemon. Some emergency error recovery actions will be done in the kernel.


The point is being lockless, not the memory allocator. The lockless
memory allocator is not specific to hardware error reporting; it can be
used by other parts of the kernel too.


Non-critical errors can be reported from the NMI handler too. So we need
lockless data structures for them.
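
As an illustration of why a lockless structure suits a context that cannot take locks, here is a minimal user-space sketch of a lock-free push list using C11 atomics. It is an assumption-laden stand-in, not the lockless allocator or data structure from the patch; err_node, err_list, err_list_push() and err_list_take_all() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical error node; a real record would carry more data. */
struct err_node {
    struct err_node *next;
    int severity;
};

struct err_list {
    _Atomic(struct err_node *) head;
};

/*
 * Producer side: push one node with a compare-and-swap loop. It never
 * blocks on a lock, so it cannot deadlock against code it interrupted.
 */
void err_list_push(struct err_list *list, struct err_node *node)
{
    struct err_node *old;

    assert(list != NULL && node != NULL);
    old = atomic_load_explicit(&list->head, memory_order_relaxed);
    do {
        node->next = old;   /* on CAS failure, old is refreshed */
    } while (!atomic_compare_exchange_weak_explicit(
                 &list->head, &old, node,
                 memory_order_release, memory_order_relaxed));
}

/*
 * Consumer side: detach the whole list in one atomic exchange. The
 * result is in LIFO order (most recently pushed node first).
 */
struct err_node *err_list_take_all(struct err_list *list)
{
    assert(list != NULL);
    return atomic_exchange_explicit(&list->head,
                                    (struct err_node *)NULL,
                                    memory_order_acquire);
}
```

Restricting the consumer to "take everything" rather than popping single nodes sidesteps the ABA problem that a general lock-free pop would have to handle.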


No, it is not specific to hw errors. It can be used by other parts of
the kernel.


You think drivers/herror is not a good name? We can rename it to
"drivers/ras" if that is the consensus.

Best Regards,
Huang Ying
--
