Jeff Garzik posted a series of nine patchs to the lkml titled to "remove [the] 'irq' argument from all irq handlers", explaining, "the overwhelming majority of drivers do not ever bother with the 'irq' argument that is passed to each driver's irq handler. Of the minority of drivers that do use the arg, the majority of those have the irq number stored in their private-info structure somewhere." He noted that he had no intention to push the patches upstream anytime soon.
Feedback was entirely positive, with Thomas Gleixner suggesting, "Full ACK. We should do this right at the edge of -rc1. And let's do this right now in .24 and not drag it out for no good reason." Ingo Molnar concurred, "full ACK on the concept from me too. Please go ahead! :)" Eric Biederman noted that there was still work to be done, "the practical question is how do we make this change without breaking the drivers that use their irq argument." Jeff agreed, explaining why the code won't be pushed upstream during -rc1, "I am finding a ton of bugs in each get_irqfunc_irq() driver, so I would rather patiently sift through them, and push fixes and cleanups upstream. Once that effort is done, everything should be in the 'trivial' pile and not have the logic that you are worried about (and thus there would be no need to add an additional branch to the irq handling path)."
"There is a tension here between generality of support infrastructure, maintainability of the infrastructure, simplicity of the infrastructure and reliability of the infrastructure," began Eric Biederman, discussing the need for a common RAS infrastructure for dealing with kernel crashes and what would be involved in getting such tools merged into the mainline kernel. He continued, "the historical linux perspective is that anything that compromises the maintainability or the reliability of the kernel without the tools is unacceptable. There is also a historical perspective that using the single stepping mode of a debugger to diagnose problems frequently leads to symptoms being fixed and not the actual problems being fixed."
Eric compared the kexec on panic code and the kdb code, "on the kexec on panic path the philosophy is that the kernel is broken and as little as possible should be relied upon." He contrasted this to kdb, "from what I can tell the philosophy of the kdb code is that the kernel is mostly ok except for one or two little bugs so it is reasonable to rely on lots of kernel infrastructure." He then suggested that it was because of this difference and reduced maintenance overhead that kexec on panic was merged into the mainline kernel, "I will note that in some sense it is a harder approach to implement as it emphasizes the challenge of drivers that work starting from a random hardware state, and because it draws a clear line between the broken kernel and the recover kernel. But those things are exactly what encourage things to work well." As for what is the next step forward in RAS development, Eric noted, "if someone who is suggesting an implementation can absorb and understand the requirements of the different groups and come up with solutions that meet the requirements of the different projects I think progress can be made. That as far as I know takes talent."