A recent patch posted to the lkml aimed to make it possible to use both kdb and kdump at the same time, and instead led to an interesting discussion about RAS (Reliability, Availability, and Serviceability) tools. Vivek Goyal compared the two main philosophies, "so basically there are two kind of users. One who believes that despite the kernel [having] crashed something meaningful can be done," versus, "exec on panic, which thinks that once [the] kernel is crashed nothing meaningful can be done". When the discussion focused on kdb, Keith Owens noted:
"The problem above applies to all the RAS tools, not just kdb. My stance is that _all_ the RAS tools (kdb, kgdb, nlkd, netdump, lkcd, crash, kdump etc.) should be using a common interface that safely puts the entire system in a stopped state and saves the state of each cpu. Then each tool can do what it likes, instead of every RAS tool doing its own thing and they all conflict with each other, which is why this thread started."
Andrew Morton summarized the current state of affairs, "lots of different groups, little commonality in their desired funtionality, little interest in sharing infrastructure or concepts." In response to an earlier patch Keith posted to a lesser-trafficked mailing list, Andrew suggested it be resubmitted in a working form for a full review, "much of the onus is upon the various RAS tool developers to demonstrate why it is unsuitable for their use and, hopefully, to explain how it can be fixed for them."
Offering a potential alternative to the existing suspend and restore implementations in the Linux Kernel, Ying Huang posted a patch utilizing kexec, "kexec based hibernation has some potential advantages over uswsusp and suspend2. " He listed two such potential advantages, "the hibernation image size can exceed half of memory size easily," and, "the hibernation image can be written to and read from almost anywhere, such as a USB disk [or] NFS." He described the feature implemented by his patch as "jumping from a kexeced kernel to the original kernel", allowing someone to first boot from one kernel, then to kexec another crashdump kernel in reserved memory and run from it for a while, and finally to "jump back" to the original kernel.
Andrew Morton replied to the idea very positively, "this sounds awesome. Am I correct in expecting that ultimately the existing hibernation implementation just goes away and we reuse (and hence strengthen) the existing kexec (and kdump?) infrastructure? And that we get hibernation support almost for free on all kexec (and relocatable-kernel?) capable architectures? And that all the management of hibernation and resume happens in userspace?" He went on to ask, "how close do you think all this is to being a viable thing?" Ying replied, "the kexec jump is the first step, maybe the simplest step. There are many other issues to be resolved, at least the following ones," going on to list a series of steps that still have to be implemented before kexec based hibernation would be a viable option.