"There is a tension here between generality of support infrastructure, maintainability of the infrastructure, simplicity of the infrastructure and reliability of the infrastructure," began Eric Biederman, discussing the need for a common RAS infrastructure for dealing with kernel crashes and what would be involved in getting such tools merged into the mainline kernel. He continued, "the historical linux perspective is that anything that compromises the maintainability or the reliability of the kernel without the tools is unacceptable. There is also a historical perspective that using the single stepping mode of a debugger to diagnose problems frequently leads to symptoms being fixed and not the actual problems being fixed."
Eric compared the kexec on panic code and the kdb code, "on the kexec on panic path the philosophy is that the kernel is broken and as little as possible should be relied upon." He contrasted this with kdb, "from what I can tell the philosophy of the kdb code is that the kernel is mostly ok except for one or two little bugs so it is reasonable to rely on lots of kernel infrastructure." He then suggested that this difference, along with its lower maintenance overhead, is why kexec on panic was merged into the mainline kernel, "I will note that in some sense it is a harder approach to implement as it emphasizes the challenge of drivers that work starting from a random hardware state, and because it draws a clear line between the broken kernel and the recover kernel. But those things are exactly what encourage things to work well." As for the next step forward in RAS development, Eric noted, "if someone who is suggesting an implementation can absorb and understand the requirements of the different groups and come up with solutions that meet the requirements of the different projects I think progress can be made. That as far as I know takes talent."
Ying Huang continues to work on his kexec-based hibernation patches. The patches currently support only the i386 architecture, and Ying notes, "the setup of hibernation/restore is fairly complex. I will continue working on simplifying." In response to the latest round of kexec-based hibernation patches posted to the Linux Kernel mailing list, someone asked how performance would compare to other hibernation solutions. Ying suggested that it may be slower in general, but that with not-yet-implemented optimizations it should approach the performance of the existing implementation:
"In general, for kexec based hibernation, what increases hibernation/wakeup time: One extra Linux boot is needed to hibernate and wakeup. What decreases hibernation/wakeup time: Most hibernation/wakeup work is done in full functional user space program, so it is possible to do some optimization, such as parallel compression.
"So, I think the kexec based hibernation may be slower than original implementation in general. In this prototype implementation, the hibernation/wakeup time is much longer than original hibernation/wakeup implementation. But it has much to be optimized and I think it can approach the speed of the original implementation after optimization."
A recent patch posted to the lkml aimed to make it possible to use both kdb and kdump at the same time, and instead led to an interesting discussion about RAS (Reliability, Availability, and Serviceability) tools. Vivek Goyal compared the two main philosophies, "so basically there are two kind of users. One who believes that despite the kernel [having] crashed something meaningful can be done," versus "kexec on panic, which thinks that once [the] kernel is crashed nothing meaningful can be done." When the discussion turned to kdb, Keith Owens noted:
"The problem above applies to all the RAS tools, not just kdb. My stance is that _all_ the RAS tools (kdb, kgdb, nlkd, netdump, lkcd, crash, kdump etc.) should be using a common interface that safely puts the entire system in a stopped state and saves the state of each cpu. Then each tool can do what it likes, instead of every RAS tool doing its own thing and they all conflict with each other, which is why this thread started."
Andrew Morton summarized the current state of affairs, "lots of different groups, little commonality in their desired functionality, little interest in sharing infrastructure or concepts." Responding to an earlier patch that Keith had posted to a less-trafficked mailing list, Andrew suggested that it be resubmitted in working form for a full review, "much of the onus is upon the various RAS tool developers to demonstrate why it is unsuitable for their use and, hopefully, to explain how it can be fixed for them."
Ying Huang posted a new version of his kexec-based hibernation patches, noting two changes: "1) the kexec jump implementation is put into the kexec/kdump framework instead of software suspend framework. The device and CPU state save/restore code of software suspend is called when needed; and 2) the same code path is used for both kexec a new kernel and jump back to original kernel." Andrew Morton said that he was still interested but did not intend to merge the patches right away, "I like the idea but I think I'll let people chat about it a bit more before looking at merging the patches, OK?" TuxOnIce maintainer Nigel Cunningham expressed strong reservations:
"Please wait until you see a complete implementation that actually works. I'm sitting here quietly, following (and now breaking) the 'If you can't say anything positive, don't say anything at all' line because I think that the more into the implementation details people get, the uglier this is going to show itself to be. I'm perfectly willing to be proven wrong, but haven't seen anything so far that's even begun to convince me otherwise."
Offering a potential alternative to the existing suspend and restore implementations in the Linux Kernel, Ying Huang posted a patch utilizing kexec, "kexec based hibernation has some potential advantages over uswsusp and suspend2." He listed two such potential advantages, "the hibernation image size can exceed half of memory size easily," and, "the hibernation image can be written to and read from almost anywhere, such as a USB disk [or] NFS." He described the feature implemented by his patch as "jumping from a kexeced kernel to the original kernel": one first boots a kernel, then kexecs a second (crashdump) kernel in reserved memory and runs from it for a while, and finally "jumps back" to the original kernel.
Andrew Morton replied to the idea very positively, "this sounds awesome. Am I correct in expecting that ultimately the existing hibernation implementation just goes away and we reuse (and hence strengthen) the existing kexec (and kdump?) infrastructure? And that we get hibernation support almost for free on all kexec (and relocatable-kernel?) capable architectures? And that all the management of hibernation and resume happens in userspace?" He went on to ask, "how close do you think all this is to being a viable thing?" Ying replied, "the kexec jump is the first step, maybe the simplest step. There are many other issues to be resolved, at least the following ones," going on to list a series of steps that still have to be implemented before kexec based hibernation would be a viable option.
With the release of 2.6.9-mm1, Andrew Morton offered a quick status update on a number of patches in his -mm tree that are 2.6-mainline hopefuls. For example, regarding the much debated reiser4 filesystem, Andrew said that he is still "not sure, really. The namespace extensions were disabled, although all the code for that is still present. Linus's filesystem criterion used to be 'once lots of people are using it, preferably when vendors are shipping it'. That's a bit of a chicken and egg thing though. Needs more discussion". And as for Ingo Molnar's preemption and low-latency fixups, Andrew offered, "I haven't really thought about it and haven't looked at the patches yet. Hopefully 2.6.10 material."
Other projects specifically mentioned include the sysfs backing store, the ext3 reservations code, the ext3 resize code, kexec and crashdump, perfctr, cachefs, cpusets, and the md updates.