Linux: Reliability, Availability, and Serviceability

Submitted by Jeremy
on August 3, 2007 - 11:49am

A recent patch posted to the lkml aimed to make it possible to use both kdb and kdump at the same time, and instead led to an interesting discussion about RAS (Reliability, Availability, and Serviceability) tools. Vivek Goyal compared the two main philosophies, "so basically there are two kind of users. One who believes that despite the kernel [having] crashed something meaningful can be done," versus, "exec on panic, which thinks that once [the] kernel is crashed nothing meaningful can be done". When the discussion focused on kdb, Keith Owens noted:

"The problem above applies to all the RAS tools, not just kdb. My stance is that _all_ the RAS tools (kdb, kgdb, nlkd, netdump, lkcd, crash, kdump etc.) should be using a common interface that safely puts the entire system in a stopped state and saves the state of each cpu. Then each tool can do what it likes, instead of every RAS tool doing its own thing and they all conflict with each other, which is why this thread started."

Andrew Morton summarized the current state of affairs: "lots of different groups, little commonality in their desired funtionality, little interest in sharing infrastructure or concepts." Keith had previously posted his common-interface patches only to lesser-trafficked mailing lists, so Andrew asked him to resend them to the right lists and people, with at least one client of the infrastructure wired up and working, for a full review: "much of the onus is upon the various RAS tool developers to demonstrate why it is unsuitable for their use and, hopefully, to explain how it can be fixed for them."


From:	Takenori Nagano [email blocked]
To: 	kexec, [email blocked]
Subject: [patch] add kdump_after_notifier
Date:	Thu, 19 Jul 2007 21:15:12 +0900

Hi,

In latest kernel, we can't use panic_notifier_list if kdump is enabled.
panic_notifier_list is very useful function for debug, failover, etc...

So this patch adds a control file /proc/sys/kernel/dump_after_notifier
and resolves a problem users can not use both kdump and panic_notifier_list
at the same time.

kdump_after_notifier = 0
 -> panic()
    -> crash_kexec(NULL)

kdump_after_notifier = 1
 -> panic()
    -> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
    -> crash_kexec(NULL)


Signed-off-by: Takenori Nagano [email blocked]
Signed-off-by: Kazuto Miyoshi [email blocked]
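The two orderings in the patch description can be modelled outside the kernel. The sketch below is a userspace illustration only: model_panic(), model_panic_notifiers() and model_crash_kexec() are hypothetical stand-ins that record the order of steps instead of performing them, and the kdump_after_notifier variable plays the role of the proposed sysctl (the patch text names the file both dump_after_notifier and kdump_after_notifier).

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical userspace model of the panic() ordering proposed by the
 * kdump_after_notifier patch. Nothing here is kernel code: each step
 * only appends its name to a trace string.
 */

static char trace[128];

/* Append one step name to the trace, comma separated. */
static void record(const char *step)
{
    assert(strlen(trace) + strlen(step) + 2 <= sizeof(trace));
    if (trace[0] != '\0')
        strcat(trace, ",");
    strcat(trace, step);
}

/* Stands in for atomic_notifier_call_chain(&panic_notifier_list, 0, buf). */
static void model_panic_notifiers(void) { record("notifiers"); }

/* Stands in for crash_kexec(NULL). With a crash kernel loaded the real
 * call does not return, so nothing after it would run. */
static void model_crash_kexec(void) { record("crash_kexec"); }

/* Stands in for the value written to the proposed /proc/sys/kernel file. */
static int kdump_after_notifier;

/* Return the sequence of steps the modelled panic() would take. */
static const char *model_panic(void)
{
    trace[0] = '\0';
    if (kdump_after_notifier)
        model_panic_notifiers();
    model_crash_kexec();
    return trace;
}
```

With the flag at 0 the trace is just "crash_kexec"; at 1 it becomes "notifiers,crash_kexec". Because the real crash_kexec() does not return once a crash kernel is loaded, everything the notifier chain does in the second case runs on a kernel that has already failed.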


From: Bernhard Walle [email blocked]
To: Takenori Nagano [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Thu, 26 Jul 2007 16:07:02 +0200

* Takenori Nagano [email blocked] [2007-07-19 14:15]:
>
> In latest kernel, we can't use panic_notifier_list if kdump is enabled.
> panic_notifier_list is very useful function for debug, failover, etc...
>
> So this patch adds a control file /proc/sys/kernel/dump_after_notifier
> and resolves a problem users can not use both kdump and panic_notifier_list
> at the same time.
>
> kdump_after_notifier = 0
>  -> panic()
>     -> crash_kexec(NULL)
>
> kdump_after_notifier = 1
>  -> panic()
>     -> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>     -> crash_kexec(NULL)

What's problematic about this patch? I also would like to see that
feature.

Thanks,
Bernhard

From: Vivek Goyal [email blocked]
Cc: Bernhard Walle [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Thu, 26 Jul 2007 21:02:40 +0530

On Thu, Jul 26, 2007 at 04:07:02PM +0200, Bernhard Walle wrote:
> * Takenori Nagano [email blocked] [2007-07-19 14:15]:
> >
> > In latest kernel, we can't use panic_notifier_list if kdump is enabled.
> > panic_notifier_list is very useful function for debug, failover, etc...
> >
> > So this patch adds a control file /proc/sys/kernel/dump_after_notifier
> > and resolves a problem users can not use both kdump and panic_notifier_list
> > at the same time.
> >
> > kdump_after_notifier = 0
> >  -> panic()
> >     -> crash_kexec(NULL)
> >
> > kdump_after_notifier = 1
> >  -> panic()
> >     -> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> >     -> crash_kexec(NULL)
>
> What's problematic about this patch? I also would like to see that
> feature.

I would like to see the code which will get executed after panic and
before crash_kexec(). This potentially makes crash dump feature unreliable
in the sense one can now register on panic_notifier_list and try to
do whole lot of things and might get stuck there. After the system
has crashed, one is not supposed to do a whole lot.

Thanks
Vivek

From: Bernhard Walle [email blocked]
To: Vivek Goyal [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Thu, 26 Jul 2007 17:34:40 +0200

* Vivek Goyal [email blocked] [2007-07-26 17:32]:
> On Thu, Jul 26, 2007 at 04:07:02PM +0200, Bernhard Walle wrote:
> > * Takenori Nagano [email blocked] [2007-07-19 14:15]:
> > >
> > > In latest kernel, we can't use panic_notifier_list if kdump is enabled.
> > > panic_notifier_list is very useful function for debug, failover, etc...
> > >
> > > So this patch adds a control file /proc/sys/kernel/dump_after_notifier
> > > and resolves a problem users can not use both kdump and panic_notifier_list
> > > at the same time.
> > >
> > > kdump_after_notifier = 0
> > >  -> panic()
> > >     -> crash_kexec(NULL)
> > >
> > > kdump_after_notifier = 1
> > >  -> panic()
> > >     -> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> > >     -> crash_kexec(NULL)
> >
> > What's problematic about this patch? I also would like to see that
> > feature.
>
> I would like to see the code which will get executed after panic and
> before crash_kexec(). This potentially makes crash dump feature unreliable
> in the sense one can now register on panic_notifier_list and try to
> do whole lot of things and might get stuck there. After the system
> has crashed, one is not supposed to do a whole lot.

Of course, but that's why the patch doesn't change this by default but
gives the user the choice.

Thanks,
Bernhard

From: Vivek Goyal [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Thu, 26 Jul 2007 21:14:15 +0530

On Thu, Jul 26, 2007 at 05:34:40PM +0200, Bernhard Walle wrote:
> * Vivek Goyal [email blocked] [2007-07-26 17:32]:
> > On Thu, Jul 26, 2007 at 04:07:02PM +0200, Bernhard Walle wrote:
> > > * Takenori Nagano [email blocked] [2007-07-19 14:15]:
> > > >
> > > > In latest kernel, we can't use panic_notifier_list if kdump is enabled.
> > > > panic_notifier_list is very useful function for debug, failover, etc...
> > > >
> > > > So this patch adds a control file /proc/sys/kernel/dump_after_notifier
> > > > and resolves a problem users can not use both kdump and panic_notifier_list
> > > > at the same time.
> > > >
> > > > kdump_after_notifier = 0
> > > >  -> panic()
> > > >     -> crash_kexec(NULL)
> > > >
> > > > kdump_after_notifier = 1
> > > >  -> panic()
> > > >     -> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> > > >     -> crash_kexec(NULL)
> > >
> > > What's problematic about this patch? I also would like to see that
> > > feature.
> >
> > I would like to see the code which will get executed after panic and
> > before crash_kexec(). This potentially makes crash dump feature unreliable
> > in the sense one can now register on panic_notifier_list and try to
> > do whole lot of things and might get stuck there. After the system
> > has crashed, one is not supposed to do a whole lot.
>
> Of course, but that's why the patch doesn't change this by default but
> gives the user the choice.
>

I am skeptical that how many users will really know that whether to set
this option as 1 or 0. Telling them setting it zero is more reliable as
compared to 1 is kind of vague.

What value will distro set it to by default?

Can we be more specific in terms of functionality and code that exactly
what we are trying to do after panic?

Thanks
Vivek

From: Bernhard Walle [email blocked]
To: Vivek Goyal [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Thu, 26 Jul 2007 17:47:18 +0200

* Vivek Goyal [email blocked] [2007-07-26 17:44]:
> >
> > Of course, but that's why the patch doesn't change this by default but
> > gives the user the choice.
> >
>
> What value will distro set it to by default?

0.

> Can we be more specific in terms of functionality and code that exactly
> what we are trying to do after panic?

Well, KDB, but now everybody answers with “not mainline -- doesn't
count”.

Thanks,
Bernhard

From: Vivek Goyal [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Thu, 26 Jul 2007 21:24:45 +0530

On Thu, Jul 26, 2007 at 05:47:18PM +0200, Bernhard Walle wrote:
> * Vivek Goyal [email blocked] [2007-07-26 17:44]:
> > >
> > > Of course, but that's why the patch doesn't change this by default but
> > > gives the user the choice.
> > >
> >
> > What value will distro set it to by default?
>
> 0.
>
> > Can we be more specific in terms of functionality and code that exactly
> > what we are trying to do after panic?
>
> Well, KDB, but now everybody answers with “not mainline -- doesn't
> count”.
>

That's true. Its not mainline. We had similar discussion in the past
also. I think we should allow only audited code to be run after panic().
Leaving it open to modules or unaudited code makes this solution
something like LKCD where whole lot of code used to run after the crash,
hence was unreliable.

If KDB goes mainline, then I think it is not a bad idea to call debugger
first (if it is enabled) and then one can trigger crash dump from inside
the debugger.

Thanks
Vivek

From: Takenori Nagano [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Fri, 27 Jul 2007 08:28:48 +0900

Hi Vivek,

Vivek Goyal wrote:
> On Thu, Jul 26, 2007 at 05:47:18PM +0200, Bernhard Walle wrote:
>> * Vivek Goyal [email blocked] [2007-07-26 17:44]:
>>>> Of course, but that's why the patch doesn't change this by default but
>>>> gives the user the choice.
>>>>
>>> What value will distro set it to by default?
>> 0.
>>
>>> Can we be more specific in terms of functionality and code that exactly
>>> what we are trying to do after panic?
>> Well, KDB, but now everybody answers with “not mainline -- doesn't
>> count”.
>>
>
> That's true. Its not mainline. We had similar discussion in the past
> also. I think we should allow only audited code to be run after panic().
> Leaving it open to modules or unaudited code makes this solution
> something like LKCD where whole lot of code used to run after the crash,
> hence was unreliable.

It is *not* KDB specific problem. Please grep in mainline kernel. You can find
some function using panic_notifier_list. (IPMI, softdog, heartbeat, etc...)
My patch gives a chance to use kdump for panic_notifier user. It is good for
kdump too, because kdump user goes to increase. :-)

Bernhard's idea (kdump uses panic_notifier) is very good for me. But it isn't
good for kdump user, because they want to take a dump ASAP when panicked.

Vivek, please think about this problem again.
If there is a developer who has the opinion on this problem, please give us
your opinion.

Thanks.

From: Vivek Goyal [email blocked]
To: Takenori Nagano [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Mon, 30 Jul 2007 14:46:24 +0530

On Fri, Jul 27, 2007 at 08:28:48AM +0900, Takenori Nagano wrote:
> Hi Vivek,
>
> Vivek Goyal wrote:
> > On Thu, Jul 26, 2007 at 05:47:18PM +0200, Bernhard Walle wrote:
> >> * Vivek Goyal [email blocked] [2007-07-26 17:44]:
> >>>> Of course, but that's why the patch doesn't change this by default but
> >>>> gives the user the choice.
> >>>>
> >>> What value will distro set it to by default?
> >> 0.
> >>
> >>> Can we be more specific in terms of functionality and code that exactly
> >>> what we are trying to do after panic?
> >> Well, KDB, but now everybody answers with “not mainline -- doesn't
> >> count”.
> >>
> >
> > That's true. Its not mainline. We had similar discussion in the past
> > also. I think we should allow only audited code to be run after panic().
> > Leaving it open to modules or unaudited code makes this solution
> > something like LKCD where whole lot of code used to run after the crash,
> > hence was unreliable.
>
> It is *not* KDB specific problem. Please grep in mainline kernel. You can find
> some function using panic_notifier_list. (IPMI, softdog, heartbeat, etc...)
> My patch gives a chance to use kdump for panic_notifier user. It is good for
> kdump too, because kdump user goes to increase. :-)

I grepped for couple of items.

Heartbeat functionality of for stopping responding to service processsor
in case of panic so that service processor knows that system has crashed
and it needs to reboot the machine. But if somebody has configured and
enabled kdump then we don't want service processor to stop responding
to heartbeat otherwise service processor will reboot the machine and we
will not be able to capture the dump.

In case of detecting softlockup, panic notifier is used so that in case
of panic we don't want to flag other threads are not being scheduled and
it is a softlockup. In case of kdump this condition is not valid.
Immediately after kdump we will boot to next OS and previous kernel's
context is wiped off.

Can you please be specific what exactly is the problem you are facing? In
what situation is this call creating the problem?

>
> Bernhard's idea (kdump uses panic_notifier) is very good for me. But it isn't
> good for kdump user, because they want to take a dump ASAP when panicked.
>

This one is better than registering kdump as one of the users of a
panic_notifier() list.

I think if there are any crash specific actions, they should be taken care
in next kernel while it is booting.

If something is really very time critical, and has to be done immediately
after panic (I am not sure how can one ensure that given the fact any number
of users can register on panic_notifier_list and you are not sure about your
order in the list and when one will get the control), then probably that
piece of code should be in kernel and called before crash_kexec().

What is that specific piece of action which you can't do in second kernel?

Eric, do you have any thoughts on this. I think these guys are referring
to failover problem where immediately after panic() they want to send
message to other node.

Thanks
Vivek

From: [email blocked] (Eric W. Biederman)
Subject: Re: [patch] add kdump_after_notifier
Date: Mon, 30 Jul 2007 07:42:59 -0600

Vivek Goyal [email blocked] writes:

>> Bernhard's idea (kdump uses panic_notifier) is very good for me. But it isn't
>> good for kdump user, because they want to take a dump ASAP when panicked.
>>
>
> This one is better than registering kdump as one of the users of a
> panic_notifier() list.
>
> I think if there are any crash specific actions, they should be taken care
> in next kernel while it is booting.
>
> If something is really very time critical, and has to be done immediately
> after panic (I am not sure how can one ensure that given the fact any number
> of users can register on panic_notifier_list and you are not sure about your
> order in the list and when one will get the control), then probably that
> piece of code should be in kernel and called before crash_kexec().
>
> What is that specific piece of action which you can't do in second kernel?
>
> Eric, do you have any thoughts on this. I think these guys are referring
> to failover problem where immediately after panic() they want to send
> message to other node.

My thoughts are roughly the same as they were last time this was suggested.
I think adding a notifier to the kexec on panic path is a bad idea.
This functionality sounds wrong, because it makes it hard to ensure
reliability of the kexec on panic code path. We are still doing to
much on it as it stands. The working assumption on that code path
needs to be the kernel is broken. Anything else is just asking for
trouble.

Currently we do have a hook in place for code to be called. It is called
the purgatory section of /sbin/kexec. And it's user space so you can
do whatever you want there. Or you can wait until the second kernel
gets more fully booted.

If we really need to do something in the kernel we can patch the kernel
to make a function call from crash_kexec. We don't need any notifiers
to do this.

A further problem with notifiers is they mess up the state we would
like to debug. Which again makes them a problem.

So at least until a specific case is made for a specific piece of code
to get in I am totally opposed to the idea.

Eric

From: Takenori Nagano [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Tue, 31 Jul 2007 14:55:44 +0900

Eric W. Biederman wrote:
>
> My thoughts are roughly the same as they were last time this was suggested.
> I think adding a notifier to the kexec on panic path is a bad idea.
> This functionality sounds wrong, because it makes it hard to ensure
> reliability of the kexec on panic code path. We are still doing to
> much on it as it stands. The working assumption on that code path
> needs to be the kernel is broken. Anything else is just asking for
> trouble.
>
> Currently we do have a hook in place for code to be called. It is called
> the purgatory section of /sbin/kexec. And it's user space so you can
> do whatever you want there. Or you can wait until the second kernel
> gets more fully booted.
>
> If we really need to do something in the kernel we can patch the kernel
> to make a function call from crash_kexec. We don't need any notifiers
> to do this.
>
> A further problem with notifiers is they mess up the state we would
> like to debug. Which again makes them a problem.
>
>
> So at least until a specific case is made for a specific piece of code
> to get in I am totally opposed to the idea.

Hi all,

IMHO, most users don't use kdump, kdump users are only kernel developers and
enterprise users. I think enterprise users want the notifier function, because
they use some driver and software (hardware monitering driver, clustering
software, heartbeat driver, etc...) to raise their system availability.

Some popular distributers added the dump function to their own kernel. We can
use panic_notifier on LKCD (http://lkcd.sourceforge.net/), and diskdump
(http://sourceforge.net/projects/lkdump) provides own notifier function
disk_dump_notifier.

Now, kdump was merged mainline kernel. Then some distributers chose kdump.
I think kdump is greater than other dump function, but kdump has no notifier
function. This is a large problem for enterprise users.

Solutions
1: my patch
2: Bernhard's idea
3: add kdump_notifier_list

I think my patch is better than other solutions, because it has only very few
impact. Vivek, Eric, how do you think?

Thanks.

From: [email blocked] (Eric W. Biederman)
To: Takenori Nagano [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Tue, 31 Jul 2007 00:53:54 -0600

Takenori Nagano [email blocked] writes:

>
> Hi all,
>
> IMHO, most users don't use kdump, kdump users are only kernel developers and
> enterprise users.

Not at all. So far the only kdump related bug report I have seen has
been from fedora Core.

> think enterprise users want the notifier function, because
> they use some driver and software (hardware monitering driver, clustering
> software, heartbeat driver, etc...) to raise their system availability.

Which users want this? Specifics are needed here not hand waving.

In particular why can't the use the existing hooks that are already in
place.

> Some popular distributers added the dump function to their own kernel. We can
> use panic_notifier on LKCD (http://lkcd.sourceforge.net/), and diskdump
> (http://sourceforge.net/projects/lkdump) provides own notifier function
> disk_dump_notifier.
>
> Now, kdump was merged mainline kernel. Then some distributers chose kdump.
> I think kdump is greater than other dump function, but kdump has no notifier
> function. This is a large problem for enterprise users.

Why? If this is a large problem we should have people that are willing
to have patches with users of this notifier.

> Solutions
> 1: my patch
> 2: Bernhard's idea
> 3: add kdump_notifier_list

I think you are solving a non-problem. And the more I get hand waving
the more I think this.

> I think my patch is better than other solutions, because it has only very few
> impact. Vivek, Eric, how do you think?

No. The problem with your patch is that it doesn't have a code
impact. We need to see who is using this and why.

Because you are trying to hide what is going on your code has a
tremendous maintenance and review burden. I think any hook has a
tremendous maintenance and review burden. Especially since the people
who want this absolutely refuse to publish their code.

If it is some proprietary solution that needs this and can not withstand
a code review it is absolutely the wrong thing to have on this path.

The answer is no, and it isn't even worth talking about until the code
for some real users shows up.

Adding a notifier violates a fundamental assumption of the code path.
The assumption is that the entire kernel is broken, and you want me to
follow a broken pointer to broken code?

We already have so much code on that code path it is almost impossible
to test and review thoroughly and you want to add more crap?

My apologies about my tone but I'm very annoyed at the direction of all
of this conversation, please don't try and avoid showing the users.
Please let's make it upfront.

Eric

From: Takenori Nagano [email blocked]
To: "Eric W. Biederman" [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Wed, 01 Aug 2007 18:26:13 +0900

Eric W. Biederman wrote:
> Takenori Nagano [email blocked] writes:
>> Hi all,
>>
>> IMHO, most users don't use kdump, kdump users are only kernel developers and
>> enterprise users.
>
> Not at all. So far the only kdump related bug report I have seen has
> been from fedora Core.

Sorry, I thought general users push reset button when the machine is panicked. :-(

> No. The problem with your patch is that it doesn't have a code
> impact. We need to see who is using this and why.

My motivation is very simple. I want to use both kdb and kdump, but I think it
is too weak to satisfy kexec guys. Then I brought up the example enterprise
software. But it isn't a lie. I know some drivers which use panic_notifier.
IMHO, they use only major distribution, and they has the workaround or they
don't notice this problem yet. I think they will be in trouble if all
distributions choose only kdump.

BTW, I use kdb and lkcd now, but I want to use kdb and kdump. I sent a patch to
kdb community but it was rejected. kdb maintainer Keith Owens said,
> Both KDB and crash_kexec should be using the panic_notifier_chain, with
> KDB having a higher priority than crash_exec. The whole point of
> notifier chains is to handle cases like this, so we should not be
> adding more code to the panic routine.
>
> The real problem here is the way that the crash_exec code is hard coded
> into various places instead of using notifier chains. The same issue
> exists in arch/ia64/kernel/mca.c because of bad coding practices from
> kexec.

Then I gave up to merge my patch to kdb, and I tried to send another patch to
kexec community. I can understand his opinion, but it is very difficult to
modify that kdump is called from panic_notifier. Because it has a reason why
kdump don't use panic_notifier. So, I made this patch.

Please do something about this problem.

Thanks,

From: [email blocked] (Eric W. Biederman)
To: Takenori Nagano [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Wed, 01 Aug 2007 04:00:48 -0600

Takenori Nagano [email blocked] writes:

>> No. The problem with your patch is that it doesn't have a code
>> impact. We need to see who is using this and why.
>
> My motivation is very simple. I want to use both kdb and kdump, but I think it
> is too weak to satisfy kexec guys. Then I brought up the example enterprise
> software. But it isn't a lie. I know some drivers which use panic_notifier.
> IMHO, they use only major distribution, and they has the workaround or they
> don't notice this problem yet. I think they will be in trouble if all
> distributions choose only kdump.

Possibly.

> BTW, I use kdb and lkcd now, but I want to use kdb and kdump. I sent a patch to
> kdb community but it was rejected. kdb maintainer Keith Owens said,
>> Both KDB and crash_kexec should be using the panic_notifier_chain, with
>> KDB having a higher priority than crash_exec. The whole point of
>> notifier chains is to handle cases like this, so we should not be
>> adding more code to the panic routine.
>>
>> The real problem here is the way that the crash_exec code is hard coded
>> into various places instead of using notifier chains. The same issue
>> exists in arch/ia64/kernel/mca.c because of bad coding practices from
>> kexec.

I respectfully disagree with his opinion, as using notifier chains
assumes more of the kernel works.

Although following it's argument to it's logical conclusion we should
call crash_kexec as the very first thing inside of panic. Given how
much state something like bust_spinlocks messes up that might not be
a bad idea. It does make adding an alternative debug mechanism in
there difficult.

Does anyone know if this also affects kgdb?

> Then I gave up to merge my patch to kdb, and I tried to send another patch to
> kexec community. I can understand his opinion, but it is very difficult to
> modify that kdump is called from panic_notifier. Because it has a reason why
> kdump don't use panic_notifier. So, I made this patch.
>
> Please do something about this problem.

Hmm. Tricky. These appear to be two code bases with a completely different
philosophy on what errors are being avoided.

The kexec on panic assumption is that the kernel is broken and we better not
touch it something horrible has gone wrong. And this is the reason why
kexec on panic is replacing lkcd. Because the strong assumption results
in more errors getting captured with less likely hood of messing up your
system.

The kdb assumption appears to be that the kernel is mostly ok, and that there
are just some specific thing that is wrong.

The easiest way I can think to resolve this is for kdb to simply set a
break point at the entry point of panic() when it initializes. Then it
wouldn't even need to be on the panic_list. That approach would probably
even give better debug information because you would not have the effects
of bust_spinlocks to undo.

Is there some reason why kdb doesn't want to hook panic with a some kind of
break point?

Eric

From: Vivek Goyal [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Thu, 2 Aug 2007 16:58:52 +0530

On Wed, Aug 01, 2007 at 04:00:48AM -0600, Eric W. Biederman wrote:

> Hmm. Tricky. These appear to be two code bases with a completely different
> philosophy on what errors are being avoided.
>
> The kexec on panic assumption is that the kernel is broken and we better not
> touch it something horrible has gone wrong. And this is the reason why
> kexec on panic is replacing lkcd. Because the strong assumption results
> in more errors getting captured with less likely hood of messing up your
> system.
>
> The kdb assumption appears to be that the kernel is mostly ok, and that there
> are just some specific thing that is wrong.
>

Thinking more about it. So basically there are two kind of users. One who
believe that despite the kernel has crashed something meaningful can
be done. In fact kernel also thinks so. That's why we have created
panic_notifier_list and even exported it to modules and now we have some
users. These users most of the time do non-disruptive activities and
can co-exist.

OTOH, we have kexec on panic, which thinks that once kernel is crashed
nothing meaningful can be done and it is disruptive and can't co-exist
with other users.

Some thoughts on possible solutions for this problem.

- Stop exporting panic_notifier_list list to modules. Audit the in kernel
  users of panic_notifier_list. Let crash_kexec() run once all other users
  of panic_notifier_list have been executed. This has fall side of breaking
  down external modules using panic_notifier_list and at the same time
  there is no gurantee that audited code will not run into the issues.

- Continue with existing policy. If kdump is configured, panic_notifier_list
  notifications will not be invoked. Any post panic action should be executed
  in second kernel. There might be 1-2 odd cases like in kernel debugger
  which still needs to be invoked in first kernel. These users should
  explicitly put hooks in panic() routine and refrain from using
  panic_notifier list.

  One thing to keep in mind, doing things in second kernel might not be easy
  as we have lost all the config data of the first kernel. For example,
  if one wants to send a kernel crash event over network to a system
  management software, he might have to pack in lot of software in
  second kernel's initrd.

- Let the user decide if he wants to run panic_notifier_list after the
  crash or not with the help of a /proc option as suggested by the
  Takenori's patch. Fall side is, on what basis an enterprise user will
  take a decision whether he wants to run the notifiers or not. My gut
  feeling is that distro will end up setting this parameter as 1 by default,
  which would mean first run panic notifiers and then run crash_kexec().

- Make crash_kexec() a user of panic_notifier_list and let it run after all
  the callback handlers have run. This will invariably reduce the reliability
  of kdump.

Personally I believe that second solution should bring us best of both
the worlds. Making sure post panic actions can be done more reliably at
the same time making sure reliability of kdump is not compromised.

Keith, do you see a value in second solution and would there be any
reason why kdb hook can not be explicitly placed in panic(). There will
not be many users like kdb. Rest of the users should end up performing
post panic actions in second kernel.

Solutoin 3, can prove to be a stop gap solution but I think this will
make situation confusing for customers at the same time everybody will
try to take short route of performing post panic operations in first kernel.

Thanks
Vivek

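Vivek's second option (keep the existing policy, let the rare in-kernel debugger use an explicit hook in panic(), and leave panic_notifier_list for kernels without kdump) can be sketched the same way. This is a hypothetical userspace model, not kernel code: panic_debugger_hook stands for an explicit, audited hook of the kind he suggests for kdb, crash_kernel_loaded stands for a crash kernel having been loaded for panic (for example with kexec -p), and the notifier chain is reached only when no crash kernel is loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model of "option 2": an explicit debugger hook
 * runs first, then crash_kexec(); panic notifiers only run when no crash
 * kernel is loaded. All names are illustrative stand-ins.
 */

static char trace[128];

/* Append one step name to the trace, comma separated. */
static void record(const char *step)
{
    assert(strlen(trace) + strlen(step) + 2 <= sizeof(trace));
    if (trace[0] != '\0')
        strcat(trace, ",");
    strcat(trace, step);
}

/* Explicit hook an in-kernel debugger would set (hypothetical). */
static void (*panic_debugger_hook)(void);

/* Whether a crash kernel has been loaded for use on panic. */
static bool crash_kernel_loaded;

/* Example debugger entry point that only records that it ran. */
static void model_kdb(void) { record("kdb"); }

/* Return the sequence of steps the modelled panic() would take. */
static const char *model_panic(void)
{
    trace[0] = '\0';
    if (panic_debugger_hook != NULL)
        panic_debugger_hook();        /* audited, explicit call */
    if (crash_kernel_loaded) {
        record("crash_kexec");
        return trace;                 /* the real crash_kexec() never returns */
    }
    record("notifiers");              /* only reached when kdump is not set up */
    return trace;
}
```

The design point is that the set of code running before the dump is fixed at compile time and reviewable, rather than whatever modules happened to register on a list.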
From: Keith Owens [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Fri, 03 Aug 2007 14:05:47 +1000

Vivek Goyal (on Thu, 2 Aug 2007 16:58:52 +0530) wrote:
>
>Thinking more about it. So basically there are two kind of users. One who
>believe that despite the kernel has crashed something meaningful can
>be done. In fact kernel also thinks so. That's why we have created
>panic_notifier_list and even exported it to modules and now we have some
>users. These users most of the time do non-disruptive activities and
>can co-exist.
>
>OTOH, we have kexec on panic, which thinks that once kernel is crashed
>nothing meaningful can be done and it is disruptive and can't co-exist
>with other users.
>
>Some thoughts on possible solutions for this problem.
>
>- Stop exporting panic_notifier_list list to modules. Audit the in kernel
>  users of panic_notifier_list. Let crash_kexec() run once all other users
>  of panic_notifier_list have been executed. This has fall side of breaking
>  down external modules using panic_notifier_list and at the same time
>  there is no gurantee that audited code will not run into the issues.
>
>- Continue with existing policy. If kdump is configured, panic_notifier_list
>  notifications will not be invoked. Any post panic action should be executed
>  in second kernel. There might be 1-2 odd cases like in kernel debugger
>  which still needs to be invoked in first kernel. These users should
>  explicitly put hooks in panic() routine and refrain from using
>  panic_notifier list.
>
>  One thing to keep in mind, doing things in second kernel might not be easy
>  as we have lost all the config data of the first kernel. For example,
>  if one wants to send a kernel crash event over network to a system
>  management software, he might have to pack in lot of software in
>  second kernel's initrd.
>
>- Let the user decide if he wants to run panic_notifier_list after the
>  crash or not with the help of a /proc option as suggested by the
>  Takenori's patch. Fall side is, on what basis an enterprise user will
>  take a decision whether he wants to run the notifiers or not. My gut
>  feeling is that distro will end up setting this parameter as 1 by default,
>  which would mean first run panic notifiers and then run crash_kexec().
>
>- Make crash_kexec() a user of panic_notifier_list and let it run after all
>  the callback handlers have run. This will invariably reduce the reliability
>  of kdump.
>
>Personally I believe that second solution should bring us best of both
>the worlds. Making sure post panic actions can be done more reliably at
>the same time making sure reliability of kdump is not compromised.
>
>Keith, do you see a value in second solution and would there be any
>reason why kdb hook can not be explicitly placed in panic(). There will
>not be many users like kdb. Rest of the users should end up performing
>post panic actions in second kernel.
>
>Solutoin 3, can prove to be a stop gap solution but I think this will
>make situation confusing for customers at the same time everybody will
>try to take short route of performing post panic operations in first kernel.
>
>Thanks
>Vivek

Do not concentrate on kdb alone. The problem above applies to all the
RAS tools, not just kdb. My stance is that _all_ the RAS tools (kdb,
kgdb, nlkd, netdump, lkcd, crash, kdump etc.) should be using a common
interface that safely puts the entire system in a stopped state and
saves the state of each cpu. Then each tool can do what it likes,
instead of every RAS tool doing its own thing and they all conflict
with each other, which is why this thread started.

It is not the kernel's job to decide which RAS tool runs first, second
etc., it is the user's decision to set that policy. Different sites
will want different orders, some will say "go straight to kdump",
other sites will want to invoke a debugger first. Sites must be able
to define that policy, but we hard code the policy into the kernel.

I proposed and wrote most of this common interface against 2.6.19-rc5.
See http://marc.info/?l=linux-arch&w=2&r=1&s=crash_stop&q=b, look for
crash_stop. The crash_stop interface stops all the cpus, saves the
system state in a common format then runs an ordered list of RAS
tools. The order that the RAS tools are run depends on the priority
value that each tool passes to register_die_notifier. Currently each
RAS tool hard codes its priority but it is trivial to change the tools
to make that priority a parameter, passing the policy decision back to
the user, not the kernel.

Despite having written the code and put it up for comments, the only
feedback I got was from Vivek saying "So I think crash dump will be a
little special case". kdump is a special case whose priority is hard
wired into the kernel, so of course people are going to argue about
the coexistence of kdump with the other RAS tools. Unless the kdump
developers agree to some flexibility, this thread will not be resolved
to anybody's satisfaction. Use a common interface with no special
cases and let the user decide which tools to run and in which order.

The main objection raised against crash_stop is that it will not work
if the kernel stack has overflowed. That problem is also solvable, I
raised an RFC inside SGI that would detect stack overflow and still
let the cpu continue. Again, no interest. I will copy that proposal to
the list as a separate thread.

I have pretty well given up on RAS code in the Linux kernel. Everybody
has different ideas, there is no overall plan and little interest from
Linus in getting RAS tools into the kernel. We are just thrashing.

From: Andrew Morton [email blocked]
To: Keith Owens [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Thu, 2 Aug 2007 23:25:02 -0700

On Fri, 03 Aug 2007 14:05:47 +1000 Keith Owens [email blocked] wrote:

> I have pretty well given up on RAS code in the Linux kernel. Everybody
> has different ideas, there is no overall plan and little interest from
> Linus in getting RAS tools into the kernel. We are just thrashing.

Lots of different groups, little commonality in their desired functionality,
little interest in sharing infrastructure or concepts. Sometimes people
need a bit of motivational help.

In this case that motivation would come from the understanding that all the
RAS tools would be *required* to use such infrastructure if it was merged.
Going off and open-coding your own stuff would henceforth not be acceptable.
If it turns out that it really was unsuitable for a particular group's RAS
feature, and we merged it anyway, well, that mismatch is that group's
fault.

It was a sizeable mistake to send those patches to a few obscure mailing
lists - this is the first I've heard of it, for example.

So. Please, send it all again, copy the correct lists and people, make sure
that at least one client of the infrastructure is wired up and working (ideally,
all such in-kernel clients should be wired up) and let's take a look at it.

Much of the onus is upon the various RAS tool developers to demonstrate why it
is unsuitable for their use and, hopefully, to explain how it can be fixed for
them.
From: Keith Owens [email blocked]
To: Andrew Morton [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Fri, 03 Aug 2007 16:34:04 +1000

Andrew Morton (on Thu, 2 Aug 2007 23:25:02 -0700) wrote:
>On Fri, 03 Aug 2007 14:05:47 +1000 Keith Owens [email blocked] wrote:

Switching to [email blocked], I just resigned from SGI.

>> I have pretty well given up on RAS code in the Linux kernel. Everybody
>> has different ideas, there is no overall plan and little interest from
>> Linus in getting RAS tools into the kernel. We are just thrashing.
>
>Lots of different groups, little commonality in their desired functionality,
>little interest in sharing infrastructure or concepts. Sometimes people
>need a bit of motivational help.
>
>In this case that motivation would come from the understanding that all the
>RAS tools would be *required* to use such infrastructure if it was merged.
>Going off and open-coding your own stuff would henceforth not be acceptable.
>If it turns out that it really was unsuitable for a particular group's RAS
>feature, and we merged it anyway, well, that mismatch is that group's
>fault.
>
>It was a sizeable mistake to send those patches to a few obscure mailing
>lists - this is the first I've heard of it, for example.

linux-arch is obscure?? Where else do you send patches that affect
multiple architectures?

>So. Please, send it all again, copy the correct lists and people, make sure
>that at least one client of the infrastructure is wired up and working (ideally,
>all such in-kernel clients should be wired up) and let's take a look at it.

Already tried that. The only RAS tool that is currently in the kernel is
kexec/kdump and they insist on doing things their own way. That makes
it impossible to put a common RAS structure in place, because kexec
will not use it.

Sorry to keep beating on this drum, but kexec insist that their code
must have priority and that they do not trust the rest of the kernel.
Until that changes, there is no point in discussing how to make kexec
coexist with other RAS tools. If kexec change their mind then we can
look at using a common RAS interface, otherwise it is a waste of time
and I have better things to do with my life.
From: Andrew Morton [email blocked]
To: Keith Owens [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Fri, 3 Aug 2007 00:37:43 -0700

On Fri, 03 Aug 2007 16:34:04 +1000 Keith Owens [email blocked] wrote:

> Andrew Morton (on Thu, 2 Aug 2007 23:25:02 -0700) wrote:
> >On Fri, 03 Aug 2007 14:05:47 +1000 Keith Owens [email blocked] wrote:
>
> Switching to [email blocked], I just resigned from SGI.
>
> >> I have pretty well given up on RAS code in the Linux kernel. Everybody
> >> has different ideas, there is no overall plan and little interest from
> >> Linus in getting RAS tools into the kernel. We are just thrashing.
> >
> >Lots of different groups, little commonality in their desired functionality,
> >little interest in sharing infrastructure or concepts. Sometimes people
> >need a bit of motivational help.
> >
> >In this case that motivation would come from the understanding that all the
> >RAS tools would be *required* to use such infrastructure if it was merged.
> >Going off and open-coding your own stuff would henceforth not be acceptable.
> >If it turns out that it really was unsuitable for a particular group's RAS
> >feature, and we merged it anyway, well, that mismatch is that group's
> >fault.
> >
> >It was a sizeable mistake to send those patches to a few obscure mailing
> >lists - this is the first I've heard of it, for example.
>
> linux-arch is obscure??

Exceedingly. It's a way of contacting arch maintainers, that's all. It
isn't really a place to discuss new infrastructural concepts which affect
multiple features.

> Where else do you send patches that affect
> multiple architectures?

This should have gone to linux-kernel.

> >So. Please, send it all again, copy the correct lists and people, make sure
> >that at least one client of the infrastructure is wired up and working (ideally,
> >all such in-kernel clients should be wired up) and let's take a look at it.
>
> Already tried that. The only RAS tool that is currently in the kernel is
> kexec/kdump and they insist on doing things their own way. That makes
> it impossible to put a common RAS structure in place, because kexec
> will not use it.

eh, write the patch for them, let's look at how much impact it is likely
to have.

> Sorry to keep beating on this drum, but kexec insist that their code
> must have priority and that they do not trust the rest of the kernel.
> Until that changes, there is no point in discussing how to make kexec
> coexist with other RAS tools. If kexec change their mind then we can
> look at using a common RAS interface, otherwise it is a waste of time
> and I have better things to do with my life.

I saw one email from Vivek expressing on-general-principle concerns. It
was hardly thorough or irreconcilable-looking. Let's drag this thing into
the daylight and poke at it a bit.
From: Eric W. Biederman [email blocked]
Subject: Re: [patch] add kdump_after_notifier
Date: Fri, 03 Aug 2007 01:10:44 -0600

Andrew Morton [email blocked] writes:

>
> Much of the onus is upon the various RAS tool developers to demonstrate why it
> is unsuitable for their use and, hopefully, to explain how it can be fixed for
> them.

My current take on the situation.

There are 4 different cases we care about.
- Trivial in kernel message failure reports. (Oops, backtraces and the like)
- Crash dumps.
- Debuggers.
- kernel Probes.

The in kernel failure messages seem to be doing a good job and are
reasonably simple to maintain.

For crash dumping we have sufficient infrastructure in the kernel now
in the kexec on panic work, and it is simpler and more reliable than the
previous attempts. Although those kernel code paths could be made simpler
yet and probably should be.

Only when it comes to debuggers does it seem we don't have something
we can generally settle on and agree on.

All I know is that any set of code that wants to be common
infrastructure that makes the assumption that the kernel is mostly not
broken is not interesting for use when things are fully automated.
Because it fails to work in real world failure cases. Those things
only work in the artificial testing environments of developers.

Right now I have seen so little to seriously address these real world
concerns in suggestions or patches for some kind of infrastructure that
I'm tired of discussing it. I admit I haven't seen or heard of those
patches either but even their description sounds non-interesting.

Eric

How can the system work if the kernel has crashed?

Fred Flinta
on August 3, 2007 - 2:15pm

I am a noob.

But I don't see how someone can expect the machine to do something meaningful if the kernel has crashed. If the kernel has crashed, then the system is dead.

However, I do think that if a subsystem crashes, it can be restarted and the system can heal itself.

In Linux there is a

malefic
on August 3, 2007 - 2:58pm

In Linux there is a mechanism called kexec. In short: an emergency kernel image is loaded into a reserved memory area at boot; then, when a crash occurs, the system can jump into that kernel, which can perform whatever operation it needs on the crashed kernel, such as saving its memory image (kdump exposes it as /proc/vmcore) to disk.

KDB is also functional

Anonymous
on August 3, 2007 - 4:40pm

I have little experience with kexec. I do have a lot of experience with kdb on
large systems. KDB can often get information out of a kernel in the rare cases that
LKCD fails. The tools that work with LKCD dumps are very nice especially since they
are extensible. The problems that give LKCD heartburn tend to be failed hardware.
It has been great for kernel misfires of all kinds.
I look forward to working with kexec in the future. There will be a little friction
and I will be less productive for a while as I will be working with unfamiliar tools.
Saving the state of a broken system on a broken system is tough. I would think that
you would *have* to lose state by booting a new kernel to drop the dump. I expect to
find that the kexec/kdump people are doing something clever to avoid that.

One caveat is that I work on large SSI systems, so the odds are good that there is an
unmunched cpu to handle the drop into kdb. The experience of kdb/LKCD may be different
for people with 1-4 cpus.

Linux after Apple

Linux Test Guy
on August 4, 2007 - 1:57pm

Your post is making me nervous, I must admit. I plan to switch to Linux from Apple and am still testing whether it is the right thing for me...
Regards, Sandro

Well I've had similar

bezborav@drupal.org
on March 15, 2008 - 7:32am

Well, I had similar thoughts when I was still deciding what to do.
I'm glad I went with Linux in the end; it worked out just fine, and still does. It took me a while to adopt it properly, though. But I guess it's everyone's call to decide for themselves. Cheers
