Keith Owens

Linux: Reliability, Availability, and Serviceability

Submitted by Jeremy
on August 3, 2007 - 11:49am
Linux news

A recent patch posted to the lkml aimed to make it possible to use both kdb and kdump at the same time, but instead led to an interesting discussion about RAS (Reliability, Availability, and Serviceability) tools. Vivek Goyal compared the two main philosophies: "so basically there are two kind of users. One who believes that despite the kernel [having] crashed something meaningful can be done," versus, "exec on panic, which thinks that once [the] kernel is crashed nothing meaningful can be done". When the discussion focused on kdb, Keith Owens noted:

"The problem above applies to all the RAS tools, not just kdb. My stance is that _all_ the RAS tools (kdb, kgdb, nlkd, netdump, lkcd, crash, kdump etc.) should be using a common interface that safely puts the entire system in a stopped state and saves the state of each cpu. Then each tool can do what it likes, instead of every RAS tool doing its own thing and they all conflict with each other, which is why this thread started."

Andrew Morton summarized the current state of affairs: "lots of different groups, little commonality in their desired functionality, little interest in sharing infrastructure or concepts." In response to an earlier patch that Keith had posted to a lesser-trafficked mailing list, Andrew suggested it be resubmitted in a working form for a full review, adding, "much of the onus is upon the various RAS tool developers to demonstrate why it is unsuitable for their use and, hopefully, to explain how it can be fixed for them."

Linux: dumpfs, Common RAS Output API

Submitted by Jeremy
on July 23, 2004 - 5:41am
Linux news

Keith Owens [interview] announced the availability of dumpfs, "a common API for all the RAS code that wants to save data during a kernel failure and to extract that RAS data on the next boot." RAS stands for "Reliability, Availability & Serviceability". The proposed API, currently at version 0.01, is a proof of concept and a work in progress. Keith explains:

"dumpfs-v0.01 handles mounting the dumpfs partitions, including reliable sharing with swap partitions and clearing the dumpfs partitions. I am working on the code that reads and writes dumpfs data from kernel space, it is incomplete and has not been tested yet. After dumpfs_kernel is working, dumpfs_user is trivial. The code is proof of concept, some sections of the API (including polled I/O and data compression) are not supported yet, and some of the code is ugly."

Read on for a short FAQ about the proposed common dump API, as well as the current complete documentation detailing what it is and how it works.

Feature: Debugging With The New Linux Module Loader

Submitted by Jeremy
on November 24, 2002 - 4:02pm
Linux feature article

Rusty Russell's new module loader was recently merged into Linus' 2.5 kernel tree [story]. This new implementation aims to clean up and reduce the amount of code required in the kernel and in user space to load a kernel module. Additionally, it removes the requirement that the kernel and the user space modutils code be kept in sync.

Linux: kdb vs. kgdb

Submitted by Jeremy
on March 30, 2002 - 11:47am
Linux news

Jeremy Jackson asked "which kernel debugger is 'best'?" on the Linux Kernel Mailing List.