Recently, a lot of work has gone into the 2.5 development kernel to facilitate better debugging. Starting with the 2.5.39 kernel, an infrastructure is in place for tracking down a wide range of atomicity/sleep bugs.
For example, a task in the kernel cannot sleep if it is atomic (by definition). By atomic we mean a number of things: the task holds a spinlock, holds the BKL, or has explicitly disabled preemption. Further, interrupt handlers are not schedulable so they too are atomic. In other words, atomic code is code that must not be scheduled away.
Obviously it is illegal to sleep while atomic. If you hold a lock and reschedule, it is possible for the newly scheduled task to attempt to acquire the same lock and - boom - deadlock. Note even uniprocessor systems are potential victims here: if the lock is protecting a critical region, the rescheduled task can enter the critical region and mangle the unprotected data. Finally, since interrupt handlers do not have a process context, it is imminent doom if we try to reschedule as one. In short, a wide array of locking and sleeping bugs can be caught if we have the infrastructure to (a) tell if we are atomic and (b) perform that check in key places.
Well, we have both (a) and (b) these days. The former came about via two 2.5 changes. The first, kernel preemption, introduced a general atomicity counter, preempt_count, and infrastructure for tracking atomicity via the count. The second, the global IRQ rewrite, removed the global IRQ lock and folded the IRQ counter and bottom half counter into the preempt_count. Together, these made the preempt_count an accurate per-task answer to "just how atomic (or not) are we?".
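To make the idea of one counter holding several kinds of atomicity concrete, here is a minimal user-space sketch. It is not kernel code: the model_ names, the struct, and the exact bit layout (8 bits each for lock/preempt depth, bottom-half depth, and hardirq depth) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative layout, an assumption rather than a copy of any kernel header:
 * bits 0-7 lock/preempt-disable depth, bits 8-15 bottom-half depth,
 * bits 16-23 hardirq depth. */
#define MODEL_PREEMPT_SHIFT 0
#define MODEL_SOFTIRQ_SHIFT 8
#define MODEL_HARDIRQ_SHIFT 16
#define MODEL_FIELD_MASK    0xffu

/* Hypothetical stand-in for the per-task counter. */
struct model_task {
    unsigned int preempt_count;
};

/* Current nesting depth of one field of the counter. */
static unsigned int model_depth(const struct model_task *t, unsigned int shift)
{
    return (t->preempt_count >> shift) & MODEL_FIELD_MASK;
}

/* Enter one level of the given kind of atomic context. */
static void model_enter(struct model_task *t, unsigned int shift)
{
    assert(model_depth(t, shift) < MODEL_FIELD_MASK); /* no overflow into the next field */
    t->preempt_count += 1u << shift;
}

/* Leave one level; an unbalanced leave is a bug in the caller. */
static void model_leave(struct model_task *t, unsigned int shift)
{
    assert(model_depth(t, shift) > 0);
    t->preempt_count -= 1u << shift;
}

/* Any nonzero field means the task must not be scheduled away. */
static bool model_in_atomic(const struct model_task *t)
{
    return t->preempt_count != 0;
}

/* Bottom-half or hardirq depth is nonzero. */
static bool model_in_interrupt(const struct model_task *t)
{
    return (t->preempt_count >> MODEL_SOFTIRQ_SHIFT) != 0;
}
```

The point of the packing is that a single comparison against zero answers "are we atomic?", while the individual fields still say why.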
The latter came about recently, primarily with the new debugging checks in do_exit() and schedule() and, more recently still, the might_sleep() function for warning about potentially unsafe sleeping. Ultimately, we can place debugging checks in places where we know the kernel should or should not be atomic. If our assumptions are wrong, we can print some debugging info and track down the offender.
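The shape of such a check can be sketched in user space. This is a simplified model, not the kernel's implementation: every model_ name is hypothetical, the counter is a plain global instead of a per-task field, and the real check also dumps a stack trace.

```c
#include <assert.h>
#include <stdio.h>

static int model_preempt_count; /* stand-in for the per-task atomicity counter */
static int model_warnings;      /* how many times the check has fired */

/* Taking a lock makes the caller atomic; releasing it undoes that. */
static void model_spin_lock(void)
{
    model_preempt_count++;
}

static void model_spin_unlock(void)
{
    assert(model_preempt_count > 0); /* unlock without lock is a caller bug */
    model_preempt_count--;
}

/* The check itself: complain, with the call site, if we are atomic. */
static void model_might_sleep_at(const char *file, int line)
{
    if (model_preempt_count != 0) {
        model_warnings++;
        fprintf(stderr,
                "Sleeping function called from illegal context at %s:%d\n",
                file, line);
    }
}

/* Capture the call site automatically, as a macro must. */
#define model_might_sleep() model_might_sleep_at(__FILE__, __LINE__)

/* Hypothetical function that may block, so it announces that up front. */
static void model_blocking_alloc(void)
{
    model_might_sleep();
    /* ... the work that could sleep would go here ... */
}
```

Calling model_blocking_alloc() between model_spin_lock() and model_spin_unlock() triggers the warning; calling it outside the locked region does not. Note that the check fires even when the function would not actually have slept on that call, which is what makes it good at catching latent bugs.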
One more infinitely useful change occurred: the merging of kallsyms. This patch allows in-kernel symbol resolving. This eliminates the need for the user-space ksymoops to be run on oopses and allows kernel stack traces to contain fully resolved symbols (e.g. "request_irq()" instead of "c0108500").
To fully benefit from these checks, you need to enable three configure options:
CONFIG_PREEMPT=y
CONFIG_DEBUG_KERNEL=y
CONFIG_KALLSYMS=y
The first enables the preemptive kernel which makes full use of the atomicity counter. Without kernel preemption enabled, only interrupts and bottom halves are counted in the counter -- you lose the most valuable statistic, the lock count. The second configure option turns on the might_sleep() check. Without it, you miss a lot of the most useful checking. Finally, the last option nullifies the need to run ksymoops on your stack traces before reporting them. They are printed pretty and perfect - ready to post.
While you are at it, you may want to enable some of the other debugging options. Two of the most useful are:
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_SPINLOCK=y
The first enables slab cache debugging which, among other things, provides memory poisoning on free. Due to some recent changes, however, it does have some impact on performance. The second option enables spinlock debugging (SMP kernels only). This checks for improper lock usage like unlock-before-lock, double lock, double unlock, and use of an uninitialized lock.
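For readers unfamiliar with poisoning on free, the idea can be sketched in a few lines of user-space C. This is only a model under stated assumptions: the model_ names are hypothetical and the poison byte is picked for illustration, not quoted from the slab allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative poison byte; the value is an assumption for this sketch. */
#define MODEL_POISON_FREE 0x6b

/* On free, overwrite the whole object with a recognizable pattern. */
static void model_poison_on_free(unsigned char *obj, size_t size)
{
    assert(obj != NULL);
    memset(obj, MODEL_POISON_FREE, size);
}

/* Before reusing the object, verify nobody wrote to it while it was free. */
static bool model_poison_intact(const unsigned char *obj, size_t size)
{
    assert(obj != NULL);
    for (size_t i = 0; i < size; i++) {
        if (obj[i] != MODEL_POISON_FREE)
            return false; /* a write-after-free clobbered the pattern */
    }
    return true;
}
```

A write through a stale pointer breaks the pattern and is caught by the next check, and a read through a stale pointer returns the obvious poison value instead of plausible-looking old data.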
So compile it up and boot into it. Play around. 2.5 is fast, eh?
Anyhow, you will no doubt see some debugging messages and stack traces during boot. The Linux Kernel Mailing List (lkml) has most likely already seen these. I suggest you search the archives before posting. If you see errors during regular use, those are of the greatest importance.
There are three main debugging messages that pertain to atomicity issues; they are:
"scheduling while atomic" - schedule() was called while the current task was atomic. This is always a bug and almost always a problem. For example, the current task could hold a lock, which opens the door to both deadlock and races. Or we may be trying to schedule from an interrupt handler, in which case the system will die.
"Sleeping function called from illegal context at file.c:line" - this is the might_sleep() check. A function that sleeps, like kmalloc(), down(), or __alloc_pages(), was called while atomic. Bad news.
"note: task[pid] exited with preempt_count n" - a task exited while atomic. Starting with 2.5.39, this should never occur and is a bug. Earlier 2.5 revisions reported this if the BKL was held. Since the BKL is released on schedule(), it is often OK to exit while holding it.
Each is followed by a stack trace. Report the message, the stack trace (decoded via ksymoops if not using kallsyms), and any other pertinent data (e.g. "I can reproduce it by doing _____") to the lkml or the maintainer of the code that is clearly doing the Wrong Thing.
Happy hacking.
(c)2002 Robert Love
Good job
I just wanted to say how much I appreciate your effort
in writing this informative article. The fact that no one
made any comments doesn't mean it wasn't read and
valued!
Duncan.
Re: Good job
I was a bit saddened by the lack of discussion on the article but I am very glad to hear you enjoyed it. What really matters is that it encouraged at least someone to try out 2.5 and help us develop a better kernel.
Thank you for your words.
- Robert Love
I tried it
The problem was, I had no issues ;-)