Ralf Baechle posted the Linux/MIPS architecture merge plans for the upcoming 2.6.24 kernel. The diffstat for all changes showed, "435 files changed, 14274 insertions(+), 10196 deletions(-)", about which Ralf noted, "the number of patch lines and files is inflated by two large whitespace cleanup patches." He continued:
"The biggest actual changes are the support for tickless kernels on MIPS and the rewrite for many of the timer devices previously used as clocksources and clockevents. Various cleanups, including some moving of code and support for 32-bit Broadcom BCM47XX processors, the return of support for LASAT which isn't quite as unused as previously thought."
When a Linux user reported a persistently high load average on an idle server and traced the problem to a specific patch, labeled "user of the jiffies rounding code", Andrew Morton replied, "this is unexpected. High load average is due to either a task chewing a lot of CPU time or a task stuck in uninterruptible sleep." Linus Torvalds disagreed, explaining:
"We saw high loadaverages with the timer bogosity with 'gettimeofday()' and 'select()' not agreeing, so they would do things like 'date = time(..); select(.. , timeout = );' and when 'date' wasn't taking the jiffies offset into account, and thus mixing these kinds of different time sources, the select ended up returning immediately because they effectively used different clocks, and suddenly we had some applications chewing up 30% CPU time, because they were in a loop that *tried* to sleep."
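A toy simulation makes the failure Linus describes concrete. Everything in it is hypothetical: `fine_ms` stands in for the fine-grained clock that `select()` sleeps against, `coarse_seconds()` stands in for a `time()` value that lags it because it ignores a pending offset, and charging 1 ms per zero-timeout call is an assumption that keeps the loop finite. The application takes its target second from the coarse clock but computes the timeout from the fine clock, so the timeout comes out as zero or negative and the "sleep" spins.

```c
/* Hypothetical model only; all times are simulated milliseconds. */
static long fine_ms;   /* simulated fine-grained clock (like gettimeofday()) */
static long lag_ms;    /* how far the coarse clock lags behind it */

/* Simulated time(): whole seconds, lagging the fine clock by lag_ms.
 * Assumes fine_ms >= lag_ms so the division never sees a negative value. */
static long coarse_seconds(void) { return (fine_ms - lag_ms) / 1000; }

/* Simulated select() timeout: sleeps on the fine clock; a timeout <= 0
 * returns at once, and we charge 1 ms per such call so time advances. */
static void fake_select(long timeout_ms)
{
    fine_ms += timeout_ms > 0 ? timeout_ms : 1;
}

/* "Sleep until the next second": the target comes from the coarse clock,
 * the timeout from the fine clock. Returns how many select() calls it took;
 * 1 means the loop really slept, larger values mean it was spinning. */
static int wait_for_next_second(long start_ms, long lag)
{
    fine_ms = start_ms;
    lag_ms = lag;
    long target = coarse_seconds() + 1;
    int calls = 0;
    while (coarse_seconds() < target) {
        fake_select(target * 1000 - fine_ms);  /* mixes the two clocks */
        calls++;
    }
    return calls;
}
```

With no lag, `wait_for_next_second(5000, 0)` sleeps once; with a 300 ms lag, `wait_for_next_second(5000, 300)` makes 300 back-to-back calls that return immediately, which is the "loop that *tried* to sleep" in miniature.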
Linus offered what he described as an "idiotic patch" that stops the load average from being calculated at exactly 5-second intervals, so that the calculation cannot fall into step with something else that wakes up every 5 seconds. He noted, "the load average is not calculated every tick, because that's not just expensive, but we also want to have some time-based decay." Arjan van de Ven pointed out that this shouldn't help, "I mean, the load gets only updated in actual timer interrupts... and on a tickless system there's very few of those around..... and usually at places round_jiffies() already put a timer on." Linus agreed with this reasoning, suggesting, "maybe Anders' problem stems partly from the fact that he really is using the tweaks to make that tickless theory more true than it tends to be on most systems?" Arjan noted that a lot of work had already gone into making tickless kernels wake up less often, "we fixed a TON of stuff over the last months.. standard desktops (F8 / next Ubuntu) will be around 10 wakeups/sec, in a lab environment you can get below 2 ;)"
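The "time-based decay" Linus mentions is an exponential moving average updated once per sampling interval. The sketch below uses fixed-point constants that appear to match the 2.6-era kernel's `CALC_LOAD` scheme (11 fractional bits, `EXP_1 = 1884` for the 1-minute average with 5-second samples); the periodic task, the 1000-ticks-per-second rate and the sampling schedules are hypothetical. Following Andrew's remark, a sample counts tasks that are runnable (or in uninterruptible sleep); here only one hypothetical task exists. It shows why sampling in lockstep with a 5-second waker can report a load near 1.0 on a nearly idle machine, while a period one tick longer drifts out of phase.

```c
#include <stdbool.h>

/* Fixed-point load average, modeled on the 2.6-era kernel scheme. */
#define FSHIFT  11                  /* bits of fractional precision */
#define FIXED_1 (1UL << FSHIFT)     /* 1.0 in fixed point (2048) */
#define EXP_1   1884UL              /* 2048 / e^(5s/60s): 1-minute decay */

/* One decay step: new = old * e + n * (1 - e), all in fixed point.
 * n_active is the number of active tasks already scaled by FIXED_1. */
static unsigned long calc_load(unsigned long load, unsigned long exp,
                               unsigned long n_active)
{
    load *= exp;
    load += n_active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

/* Hypothetical workload: one task wakes every 5000 ticks (5 s at an assumed
 * 1000 ticks/s) and is runnable for 10 ticks; otherwise the box is idle. */
static bool task_runnable_at(unsigned long tick)
{
    return tick % 5000 < 10;
}

/* Sample the task every `period` ticks, `samples` times, and return the
 * resulting 1-minute load average in fixed point (2048 == 1.00). */
static unsigned long simulate_load(unsigned long period, int samples)
{
    unsigned long load = 0;
    for (int k = 0; k < samples; k++) {
        unsigned long n =
            task_runnable_at((unsigned long)k * period) ? FIXED_1 : 0;
        load = calc_load(load, EXP_1, n);
    }
    return load;
}
```

Sampling every 5000 ticks catches the task on every sample and drives the average close to 1.00 (at least 2000/2048), even though the task is runnable only 0.2% of the time; sampling every 5001 ticks catches it only on the first few samples, after which the average decays toward zero.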
"It took me quite a while to realize the real root cause of the VAIO - and probably many other machines - suspend/resume regressions, which were unearthed by the dyntick / clockevents patches," Thomas Gleixner explained, regarding two patches that fix the suspend problems Andrew Morton experienced with his VAIO laptop. He continued, "we disable a lot of ACPI/BIOS functionality during suspend, but we keep the lower idle C-states functionality active across suspend/resume. It seems that this causes trouble with certain BIOSes, but I assume that the problem is more widespread and just not surfacing due to the various scenarios in which a machine goes into suspend/resume." Thomas concluded, "I really hope that these two patches finally set an end to the 'jinxed VAIO heisenbug series', which started when we removed the periodic tick with the clockevents/dyntick patches."
Linus Torvalds expressed some concerns, "the patches look fine, but I somehow have this slight feeling that you gave up a bit too soon on the '*why* does this happen?' question." He agreed that at that point there was a problem with ACPI, but cautioned that this could be triggered by another bug, "in particular, I also suspect that this may not really fix the problem - maybe it just makes the window sufficiently small that it no longer triggers. Because we don't necessarily understand what the real background for the problem is, I'm not sure we can say that it is solved." Linus concluded, "but hey, I think I'll apply the patches as-is. I'd just feel even better if we actually understood *why* doing the CPU Cx states is not something we can do around the suspend code!"
"Intel's Open Source Technology Center is pleased to announce the LessWatts.org project, an open source project for saving power on Linux," began an email posted to the lkml by Arjan van de Ven. The announcement continued:
"LessWatts.org is a place to bring users, developers and distribution makers together around power reduction for linux machines, from mobile to desktop to server to datacenter. LessWatts.org is about a system-level approach to power savings, from the lowest level device drivers in the kernel to the most advanced desktop applications. LessWatts.org is about things you can do to reduce power usage. LessWatts.org is about longer battery life, a lower airconditioning bill, about reducing the impact of computers on the environment."
The announcement went on to note, "at this time of launching the LessWatts.org project, the technology development projects are those that Intel has started, is involved in or has just started working on, such as PowerTOP, Tickless Idle, Graphics and various link power management techniques. We'd like to invite all developers and projects that focus on power saving to join the LessWatts.org effort and community."
Included in Andrew Morton's potential 2.6.23 merge list [story] was a series of patches to make the x86-64 architecture tickless. Andi Kleen, the x86-64 maintainer, replied, "I'm sceptical about the dynticks code. It just rips out the x86-64 timing code completely, which needs a lot more review and testing. Probably not .23." Linus Torvalds agreed, "we are *not* going to do another 'rip everything out, and replace it with new code' again. Over my dead body. We're going to do this thing gradually, or not at all." He went on to explain, "the patch-set itself actually looks fine, as far as I'm concerned. But it does seem to have that 'enable everything in one go' problem. I'd much rather see one time source at a time being converted, and enabled then and there, so that when people report problems and do a bisection, if it was HPET that broke, you get the commit that changed HPET."
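Linus's argument rests on how bisection works: a binary search over an ordered series of commits for the first one that misbehaves. The sketch below models this with a hypothetical four-patch series in which each commit converts one time source, and a made-up predicate `hpet_breaks()` standing in for "does a kernel built up to this commit misbehave?". The commit names and the predicate are illustrative assumptions, not the actual patch series.

```c
#include <stdbool.h>

/* Hypothetical patch series: each commit converts one time source. */
static const char *const series[] = {
    "convert PIT to clockevents",
    "convert HPET to clockevents",
    "convert TSC to clocksource",
    "enable NO_HZ",
};
#define SERIES_LEN ((int)(sizeof series / sizeof series[0]))

/* Made-up test result: a kernel built from commits 0..i misbehaves once the
 * HPET conversion (commit 1) is included. */
static bool hpet_breaks(int i) { return i >= 1; }

/* Binary search for the first bad commit, assuming every commit before it
 * is good, every commit from it onward is bad, and the last one is bad. */
static int bisect_first_bad(int n, bool (*is_bad)(int))
{
    int lo = 0, hi = n - 1;   /* the first bad commit lies in [lo, hi] */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (is_bad(mid))
            hi = mid;         /* culprit is mid or earlier */
        else
            lo = mid + 1;     /* culprit is after mid */
    }
    return lo;
}
```

Here `bisect_first_bad(SERIES_LEN, hpet_breaks)` reports index 1, the HPET commit, after two tests. Had the same changes arrived as a single commit, the search could only ever point at that one commit, which is the outcome Linus wanted to avoid.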
In response to the pains caused by the original dyntick merge in 2.6.21, Ingo Molnar acknowledged, "we had 12 -hrt/dynticks merge related regressions between 2.6.21-rc1 and -final, and 4 after final." He went on to point out, "it's all pretty quiet today on the dynticks regressions front. (there are no open regressions in either the upstream i386 code or in the devel patches we are aware of)." As to the source of the bugs, he explained, "the majority of the above bugs were in the infrastructure code. (the worst was the generic resume/suspend one fixed in [a later kernel]) And sadly, a fair number of the infrastructure bugs we introduced during the frantic clockevents/dynticks rewrites/redesigns we did between .20 and .21. That was a royally stupid mistake for us to do - instead of patiently waiting for the bugs to be shaken out we destabilized the infrastructure. (it was a 'let's make this thing so nice that it's impossible to reject' instinctive gut reaction.)" Linus replied, "one thing I'll happily talk about is that while 2.6.21 was painful, you and Thomas in particular were both very responsible about the thing, so no, I'm not at all complaining or worried about it in that sense! I just really _really_ wish we could have two fairly stable releases in a row. I think 2.6.22 has the potential to be a pretty good setup, and I'd really like to avoid having another 2.6.21 immediately afterwards."
"With all the tickless [story] and other goodies going into the kernel in the last few months, there is a lot of hope that this helps Linux reduce power consumption," Arjan van de Ven began on the lkml, "and the good news is that it does... once you fix some bugs and fix a bunch of userspace applications." He referred to a promising graph, generated with the recently introduced PowerTOP utility [story], that measures power consumption before and after applying a series of related bug fixes.
The tests began with a Lenovo T61 laptop running the stock 32-bit Fedora 7 kernel, which includes tickless support. This was compared against the stock 2.6.22-rc4 kernel with a series of improvements, including a fix for the Ondemand CPUFREQ governor, the new CPUIDLE infrastructure, the Active Link Power Management patch, disabling the laptop's TV-out capability, and using a command-line utility to properly reduce the laptop's backlight. Arjan summarized, "with kernel fixes and features, the power consumption of this laptop went from 21.06 Watts to 18.25 Watts; with 2 additional userspace fixes the power consumption ended up at 15.5 Watts."
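The reported figures translate into the following reductions. This is simple arithmetic on the numbers above; the last line additionally assumes the same battery capacity and that the measurements reflect average draw, neither of which the report states.

```latex
\begin{aligned}
\text{kernel fixes:}      &\quad 21.06\,\mathrm{W} - 18.25\,\mathrm{W} = 2.81\,\mathrm{W}
                           &&\quad 2.81 / 21.06 \approx 13.3\% \\
\text{userspace fixes:}   &\quad 18.25\,\mathrm{W} - 15.5\,\mathrm{W} = 2.75\,\mathrm{W}
                           &&\quad 2.75 / 18.25 \approx 15.1\% \\
\text{total:}             &\quad 21.06\,\mathrm{W} - 15.5\,\mathrm{W} = 5.56\,\mathrm{W}
                           &&\quad 5.56 / 21.06 \approx 26.4\% \\
\text{runtime ratio:}     &\quad t_{\text{after}} / t_{\text{before}} = 21.06 / 15.5 \approx 1.36
\end{aligned}
```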
Linux creator Linus Torvalds announced the release of the 2.6.21 kernel, "if the goal for 2.6.20 was to be a stable release (and it was), the goal for 2.6.21 is to have just survived the big timer-related changes and some of the other surprises (just as an example: we were apparently unlucky enough to hit what looks like a previously unknown hardware errata in one of the ethernet drivers that got updated etc)." Regarding the dynticks code, which was merged in -rc1 [story], he said, "the big change during 2.6.21 is all the timer changes to support a tickless system (and even with ticks, more varied time sources). Thanks (when it no longer broke for lots of people ;) go to Thomas Gleixner and Ingo Molnar and a cadre of testers and coders." He went on to note, "of course, the timer stuff was just the most painful and core part (and thus the one that I remember most): there's a lot of changes all over. The appended changelog is just for the fixes since -rc7, so that doesn't look very impressive, the full changes since 2.6.20 are obviously a *lot* bigger (and you're better off reading the individual -rc changelogs)." Linus finished with a running joke about the many debates centered around current CPU scheduler efforts [story], quipping, "we now return you to your regular scheduler discussions".
Linus Torvalds announced the first release candidate for the upcoming 2.6.21 kernel, ending the two-week merge window [story], "there's a lot of changes, as is usual for an -rc1 thing, but at least so far it would seem that 2.6.20 has been a good base, and I don't think we have anything *really* scary here." Linus noted that the tickless kernel patch [story] was finally merged into the mainline kernel, "the most interesting core change may be the dyntick/nohz one, where timer ticks will only happen when needed. It's been brewing for a _loong_ time, but it's in the standard kernel now as an option." Thomas Gleixner explained a year ago how this could result in cooler CPUs and power savings, "the tickless kernel feature (CONFIG_NO_HZ) enables 'on-demand' timer interrupts: if there is no timer to be expired for say 1.5 seconds when the system goes idle, then the system will stay totally idle for 1.5 seconds."
As for the rest of the changes, Linus added, "there's a ton of architecture updates (arm, mips, powerpc, x86, you name it), ACPI updates, and lots of driver work. And just a lot of cleanups." Release candidate kernels can be downloaded from your nearest kernel.org mirror. You can browse through all the changes using the gitweb interface. Kernel Newbies maintains a useful summary of all the changes going into the latest version of the Linux kernel.
Avi Kivity suggested that combining KVM, the Kernel-based Virtual Machine [story], with the dyntick patch [story] could improve overall KVM performance. He noted that it would likely help both the host, by "avoiding expensive vmexits due to useless timer interrupts," and the guest, by "reducing the load on the host when the guest is idling (currently an idle guest consumes a few percent cpu)". Ingo Molnar [interview] pointed out that KVM with his -rt kernel already works with dynticks enabled on both the host and the guest, "using the dynticks code from the -rt kernel makes the overhead of an idle guest go down by a factor of 10-15". Ingo added that he hopes the dyntick patch will be ready to be merged into the upcoming mainline 2.6.21 kernel.
Rik van Riel [interview] noted that there were other ways to reduce the load of the guest when it's idling, "you do not need dynticks for this actually. Simple no-tick-on-idle like Xen has works well enough." Ingo explained, "s390 (and more recently Xen too) uses a next_timer_interrupt() based method to stop the guest tick - which works in terms of reducing guest load, but it doesn't stop the host-side interrupt. The highest quality approach is to have dynticks on both the host and the guest, and this also gives high-resolution timers and a modernized time/timer-events subsystem for both the host and the guest."
Thomas Gleixner and Ingo Molnar [interview] posted an update of their high-res timers kernel patches for the 2.6.17 kernel, "upon which we based a tickless kernel (dyntick) implementation and a 'dynamic HZ' feature as well". The patch currently works for x86, with ports to x86_64, PPC and ARM in the works. Thomas explains, "the high-res timers feature (CONFIG_HIGH_RES_TIMERS) enables POSIX timers and nanosleep() to be as accurate as the hardware allows (around 1usec on typical hardware). This feature is transparent - if enabled it just makes these timers much more accurate than the current HZ resolution." He goes on to describe the tickless kernel:
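The difference between HZ resolution and high-resolution timers can be illustrated in two parts. `tick_rounded_ns()` is a simplified, hypothetical model of a tick-based timer, where a sleep can only end on a tick boundary and is therefore rounded up to whole ticks; real kernels may add further slack that this ignores. `measure_nanosleep_ns()` uses the standard POSIX `nanosleep()` and `clock_gettime(CLOCK_MONOTONIC, ...)` calls to measure what a sleep actually costs on the running machine. The helper names are our own.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Simplified model of low-resolution (tick-based) timers: the requested
 * sleep is rounded up to a whole number of ticks at the given HZ. */
static long long tick_rounded_ns(long long request_ns, long hz)
{
    long long tick_ns = 1000000000LL / hz;
    long long ticks = (request_ns + tick_ns - 1) / tick_ns;  /* round up */
    return ticks * tick_ns;
}

/* Measure how long nanosleep() really takes, in nanoseconds.
 * request_ns must be below one second (it goes into tv_nsec). */
static long long measure_nanosleep_ns(long request_ns)
{
    struct timespec req = { 0, request_ns }, rem, t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;  /* interrupted by a signal: sleep for the remainder */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
           + (t1.tv_nsec - t0.tv_nsec);
}
```

Under the tick model, a 1 ms request at HZ=250 becomes a 4 ms sleep. On a kernel with high-resolution timers enabled, measuring a 1 ms `nanosleep()` would typically show only a small overshoot, though the exact figure depends on the hardware and system load.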
"The tickless kernel feature (CONFIG_NO_HZ) enables 'on-demand' timer interrupts: if there is no timer to be expired for say 1.5 seconds when the system goes idle, then the system will stay totally idle for 1.5 seconds. This should bring cooler CPUs and power savings: on our (x86) testboxes we have measured the effective IRQ rate to go from HZ to 1-2 timer interrupts per second.
"This feature is implemented by driving 'low res timer wheel' processing via special per-CPU high-res timers, which timers are reprogrammed to the next-low-res-timer-expires interval. This tickless-kernel design is SMP-safe in a natural way and has been developed on SMP systems from the beginning."
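The on-demand behavior Thomas describes can be modeled by counting timer interrupts over an idle window. This is a simplified sketch, not the kernel's implementation: the tick rate (HZ = 1000), the list of timer expiries and the function names are hypothetical, and a real tickless kernel still wakes up for other interrupt sources.

```c
#include <stddef.h>

/* Periodic tick: one timer interrupt per jiffy, whether or not work is due. */
static long count_irqs_periodic(long window_ticks) { return window_ticks; }

/* Tickless idle (simplified): program a one-shot event for the earliest
 * pending expiry; when it fires, run every timer due at that moment, then
 * reprogram for the next one. `expiries` must be sorted in ascending order. */
static long count_irqs_tickless(const long *expiries, size_t n,
                                long window_ticks)
{
    long irqs = 0;
    size_t i = 0;
    while (i < n && expiries[i] < window_ticks) {
        long now = expiries[i];             /* the one-shot event fires here */
        irqs++;
        while (i < n && expiries[i] <= now) /* expire everything due now */
            i++;
    }
    return irqs;
}
```

For example, with timers due at ticks 300, 300 and 1500 in a two-second window, the periodic model takes 2000 interrupts while the tickless model takes 2, in line with the 1-2 timer interrupts per second Thomas reports for idle test machines.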