Following the release of the 2.6.21 kernel [story] Andrew Morton [interview] posted a list of patches in his -mm kernel, summarizing for each his plans as to whether or not they will be pushed upstream for inclusion in the upcoming 2.6.22 kernel. He noted, "the overall stability in recent -mm's was not sufficiently high and we ran out of time to find all the bugs. I shouldn't have merged all those patches last week - they contained an exceptional amount of garbage. This all means that more bugs than usual will probably leak into mainline, and we'll have to fix them there." He went on to add, "I've been ducking most non-bugfix patches recently. I have ~200 feature and cleanup patches queued for later consideration, so people who sent those will be hearing from me eventually."
Theodore Ts'o posted an update on the ext4 filesystem [story], "I've respun the ext4 development patchset, with Amit's updated fallocate patches. I've added Dave's patch to add ia64 support to the fallocate system call, but *not* the XFS fallocate support patches. (Probably better for them to live in an xfs tree, where they can more easily tested and updated.) Yes, we haven't reached complete closure on the fallocate system call calling convention, but it's enough for us to get more testing in -mm." Jeff Garzik noted that none of this development was happening in the kernel as originally planned, "why isn't this stuff going upstream rapidly? AFAICT nothing much at all has happened upstream besides a mass renaming? The whole point of having ext4 in the kernel is to do development upstream, in the public view, getting new stuff in ASAP (even if that means changing or pulling some stuff later)."
Theodore acknowledged, "in general, yes, ext4 development has been a little slow; part of the problem is that we have a lot of people, but a number of folks are new and their patches need review before they are ready for upstream acceptance, and a number of other folks who should be doing the review have been overloaded with multiple other projects and have been time-sharing." He went on to note, "but we also get flamed when the patches don't meet various criteria, up to and including breaking on ia64. We are in the process of setting up automated testing to help address that problem, but it's a taken a little while to get that going. I'm also trying to schedule more time so I can do the needed review of the patches so they meet basic upstream standards so we *can* push them. If other folks would like to help with the review process, that would be more than welcome. But yes, we will try to get more of the patches pushed sooner rather than later."
Linux creator Linus Torvalds announced the release of the 2.6.21 kernel, "if the goal for 2.6.20 was to be a stable release (and it was), the goal for 2.6.21 is to have just survived the big timer-related changes and some of the other surprises (just as an example: we were apparently unlucky enough to hit what looks like a previously unknown hardware errata in one of the ethernet drivers that got updated etc)." Regarding the dynticks code which was merged in -rc1 [story] he said, "the big change during 2.6.21 is all the timer changes to support a tickless system (and even with ticks, more varied time sources). Thanks (when it no longer broke for lots of people ;) go to Thomas Gleixner and Ingo Molnar and a cadre of testers and coders." He went on to note, "of course, the timer stuff was just the most painful and core part (and thus the one that I remember most): there's a lot of changes all over. The appended changelog is just for the fixes since -rc7, so that doesn't look very impressive, the full changes since 2.6.20 are obviously a *lot* bigger (and you're better off reading the individual -rc changelogs)." Linus finished with a running joke about the many debates centered around current CPU scheduler efforts [story], quipping, "we now return you to your regular scheduler discussions".
The future of Reiser4 was raised on the lkml, with the filesystem's creator, Hans Reiser [interview], awaiting his May 7th trial [story]. Concerns that the filesystem wasn't being maintained were laid to rest when Andrew Morton [interview] stated, "the namesys engineers continue to maintain reiser4 and I continue to receive patches for it." He further added, "the namesys guys are responsive and play well with others." As to why the filesystem hasn't yet been merged into the 2.6 kernel, Andrew explained, "to get it unstuck we'd need a general push, get people looking at and testing the code, get the vendors to have a serious think about it, etc. We could do that - it'd require that the namesys people (and I) start making threatening noises about merging it, I guess." He then made joking reference to the recent debate regarding the new CPU schedulers [story], "or we could move all the reiser4 code into kernel/sched.c - that seems to get people fired up."
Namesys developer and author of the Reiser4 encryption and compression plugins, Edward Shishkin, offered some updates. Replying to some comments about the need to remove plugins from the Reiser4 code he explained, "the popular opinion that plugins make more sense in the VFS is a great delusion, as plugins are entities related to reiser4 disk layouts." In an earlier thread it had been suggested that the plugins were misnamed and would be better called an internal abstraction layer [story]. Edward went on to note, "currently there are two namesys employees working [on Reiser4] mostly on enthusiasm." He linked to a wiki page listing known issues with the code needing to be fixed before it's likely to be merged into the 2.6 kernel, "the main issues here are xattrs and support for blocksize != pagesize. I think that adding xattrs will take ~1 month of full-time working. Not sure about blocksize support." When it was noted that other filesystems have already been merged without support for either of these features, Edward said that they'd lower their priority and finish up with the other remaining issues left on the old todo list and resume the merge discussion at that time.
Avi Kivity is the lead developer and maintainer of the Kernel-based Virtual Machine project, better known as kvm. The project was started in mid-2006, and has been part of the Linux kernel since the 2.6.20 release in February of 2007. kvm is a full virtualization system for x86-based Linux hosts, allowing users to run isolated x86 guest operating systems in virtual machines.
A recent discussion on the lkml examined the possibility of a Linux implementation of Sun's ZFS. It was pointed out that the file system is released under the GPL-incompatible CDDL, and that Sun has filed numerous patents to prevent ZFS from being reverse engineered. Max Yudin pointed out, "according to Jeff Bonwick's blog Sun issued 56 patents on ZFS, but I have no idea what they patented. Sorry, binary compatible ZFS reimplementation with GPL license might not be legal." David Litwin noted that he had been told by a ZFS developer to talk to Linux developers to see about getting non-GPL'd code included with the kernel. Theodore Ts'o replied, "that was totally useless answer from the ZFS developers. What he should have told you is to contact Sun management, since they are the only ones who can decide whether or not to release ZFS under a GPL license, and more importantly, to give a patent license for any patents they may have filed in the course of developing ZFS."
Alan Cox [interview] suggested, "the real test of whether Sun were serious about ZFS being anywhere but Solaris is what they do to license it - they've patented everything they can, and made the code available only under licenses incompatible with other OS products. Their intent is quite clear, and quite sad. Compare it to what the old Sun company did with NFS, which is now a standard used everywhere." Theodore Ts'o added, "given that Sun has reportedly filed a huge number of patents covering ZFS and has refused to make them available for anything other than Solaris --- and there are senior Sun programmers who have on record stated that one of the reasons why Sun picked the CDDL was precisely because it was incompatible with GPL and Sun fears Linux ---- I wouldn't bet on Sun being willing to making a patent license available to a hypothetical alternate implementation of the ZFS format for Linux." He went on to note, "of course, this is all open source. If someone wants to work on reimplementing ZFS from scratch, either in userspace or in the kernel, certainly the Linux community won't stop them. Given the patent issues Linus might not feel comfortable including it in the mainline sources without a promise from Sun that they won't sue the pants off of him and The Linux Foundation, but again, that's Sun's decision, and no one else can help you there."
Ingo Molnar [interview] released a new patchset titled the "Modular Scheduler Core and Completely Fair Scheduler". He explained, "this project is a complete rewrite of the Linux task scheduler. My goal is to address various feature requests and to fix deficiencies in the vanilla scheduler that were suggested/found in the past few years, both for desktop scheduling and for server scheduling workloads." The patchset introduces Scheduling Classes, "an extensible hierarchy of scheduler modules. These modules encapsulate scheduling policy details and are handled by the scheduler core without the core code assuming about them too much." It also includes sched_fair.c with an implementation of the CFS desktop scheduler, "a replacement for the vanilla scheduler's SCHED_OTHER interactivity code," about which Ingo noted, "I'd like to give credit to Con Kolivas [interview] for the general approach here: he has proven via RSDL/SD that 'fair scheduling' is possible and that it results in better desktop scheduling. Kudos Con!"
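The idea of scheduling classes can be illustrated with a small toy model. The sketch below is not taken from the patchset: the class names, the `enqueue`/`pick_next` methods and the `pick_next_task` helper are all hypothetical, chosen only to show a core that walks an ordered list of policy modules without knowing how any of them chooses a task.

```python
# Toy model of "scheduling classes": the core only knows that each class
# can be asked for its next task. All names here are illustrative
# assumptions, not the kernel's actual interfaces.

class FifoClass:
    """Policy module that hands out tasks in arrival order."""

    def __init__(self):
        self.queue = []

    def enqueue(self, task):
        self.queue.append(task)

    def pick_next(self):
        return self.queue.pop(0) if self.queue else None


class FairClass:
    """Policy module that hands out the task with the least CPU time so far."""

    def __init__(self):
        self.runtime_ns = {}

    def enqueue(self, task, runtime_ns=0):
        self.runtime_ns[task] = runtime_ns

    def pick_next(self):
        if not self.runtime_ns:
            return None
        task = min(self.runtime_ns, key=self.runtime_ns.get)
        del self.runtime_ns[task]
        return task


def pick_next_task(classes):
    """Core logic: ask each class in priority order, take the first answer."""
    for sched_class in classes:
        task = sched_class.pick_next()
        if task is not None:
            return task
    return None


realtime = FifoClass()
fair = FairClass()
fair.enqueue("shell", runtime_ns=5_000_000)
fair.enqueue("editor", runtime_ns=3_000_000)
realtime.enqueue("audio")

classes = [realtime, fair]  # earlier entries take precedence
order = [pick_next_task(classes) for _ in range(4)]
print(order)
```

Adding a new policy in this model means writing another class with the same two methods and placing it in the list; the core function does not change.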
Regarding the actual implementation, Ingo explained, "CFS's design is quite radical: it does not use runqueues, it uses a time-ordered rbtree to build a 'timeline' of future task execution, and thus has no 'array switch' artifacts (by which both the vanilla scheduler and RSDL/SD are affected). CFS uses nanosecond granularity accounting and does not rely on any jiffies or other HZ detail. Thus the CFS scheduler has no notion of 'timeslices' and has no heuristics whatsoever. There is only one central tunable, /proc/sys/kernel/sched_granularity_ns, which can be used to tune the scheduler from 'desktop' (low latencies) to 'server' (good batching) workloads." He went on to note, "due to its design, the CFS scheduler is not prone to any of the 'attacks' that exist today against the heuristics of the stock scheduler".
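To make the "timeline" idea more concrete, here is a minimal simulation, offered as a sketch under stated assumptions rather than a description of the CFS code. It keeps tasks ordered by the CPU time they have received, in nanoseconds, and always runs the one that has received the least. Python's `heapq` stands in for the time-ordered rbtree, and the `granularity_ns` parameter is only loosely modeled on the sched_granularity_ns tunable; how the real scheduler uses that value is not described in the announcement.

```python
import heapq


def simulate(task_names, granularity_ns, total_ns):
    """Toy fair scheduler: always run the task that has had the least CPU time.

    The heap is a stand-in for a time-ordered tree; entries are
    (runtime_ns, sequence, name), so the smallest runtime is popped first
    and the sequence number breaks ties deterministically.
    """
    timeline = [(0, seq, name) for seq, name in enumerate(task_names)]
    heapq.heapify(timeline)
    received = {name: 0 for name in task_names}
    elapsed = 0
    while elapsed < total_ns:
        runtime, seq, name = heapq.heappop(timeline)
        # Run the chosen task for at most one granularity period.
        ran_ns = min(granularity_ns, total_ns - elapsed)
        received[name] += ran_ns
        elapsed += ran_ns
        # Reinsert the task further along the timeline.
        heapq.heappush(timeline, (runtime + ran_ns, seq, name))
    return received


shares = simulate(
    ["editor", "compiler", "player"],
    granularity_ns=2_000_000,   # 2 ms, an arbitrary illustrative value
    total_ns=60_000_000,        # 60 ms of simulated CPU time
)
print(shares)
```

In this model a smaller `granularity_ns` means tasks are switched more often, and a larger one lets each task run longer before the next is picked, which mirrors the "desktop" versus "server" trade-off mentioned in the quote.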
Linus Torvalds announced the first release candidate for the upcoming 2.6.21 kernel, ending the two-week merge window [story], "there's a lot of changes, as is usual for an -rc1 thing, but at least so far it would seem that 2.6.20 has been a good base, and I don't think we have anything *really* scary here." Linus noted that the tickless kernel patch [story] was finally merged into the mainline kernel, "the most interesting core change may be the dyntick/nohz one, where timer ticks will only happen when needed. It's been brewing for a _loong_ time, but it's in the standard kernel now as an option." Thomas Gleixner explained a year ago how this could result in cooler CPUs and power savings, "the tickless kernel feature (CONFIG_NO_HZ) enables 'on-demand' timer interrupts: if there is no timer to be expired for say 1.5 seconds when the system goes idle, then the system will stay totally idle for 1.5 seconds."
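The effect Thomas describes can be shown with simple arithmetic. The following sketch compares how many timer interrupts an idle CPU would take with a fixed periodic tick against waking only when a timer is due. The function names and the HZ values are assumptions made for illustration; nothing here reflects the actual dyntick implementation.

```python
NSEC_PER_SEC = 1_000_000_000


def periodic_wakeups(idle_ns, hz):
    """Interrupts taken while idle if the tick fires hz times per second."""
    tick_ns = NSEC_PER_SEC // hz
    return idle_ns // tick_ns


def on_demand_wakeups(idle_ns, timer_expiries_ns):
    """Interrupts taken while idle if the CPU wakes only for pending timers.

    timer_expiries_ns holds expiry times measured from the start of idle.
    """
    return sum(1 for expiry in timer_expiries_ns if 0 < expiry <= idle_ns)


idle_ns = 1_500_000_000           # the 1.5 second idle period from the quote
pending_timers = [1_500_000_000]  # a single timer due when the period ends

ticks_hz1000 = periodic_wakeups(idle_ns, hz=1000)
ticks_hz250 = periodic_wakeups(idle_ns, hz=250)
ticks_tickless = on_demand_wakeups(idle_ns, pending_timers)
print(ticks_hz1000, ticks_hz250, ticks_tickless)
```

Under these assumptions the periodic configurations interrupt the idle CPU hundreds of times or more during the 1.5 seconds, while the on-demand model wakes it once.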
As for the rest of the changes, Linus added, "there's a ton of architecture updates (arm, mips, powerpc, x86, you name it), ACPI updates, and lots of driver work. And just a lot of cleanups." Release candidate kernels can be downloaded from your nearest kernel.org mirror. You can browse through all the changes using the gitweb interface. Kernel Newbies maintains a useful summary of all the changes going into the latest version of the Linux kernel.
Ingo Molnar [interview] posted a second version of his syslets subsystem patch set, which offers asynchronous system call support [story]. He noted that the effort is a work in progress, and that there are still outstanding issues to be fixed, "the biggest conceptual change in v2 is the ability of cachemiss threads to be turned into user threads. This fixes signal handling, makes them ptrace-eable, etc," going on to list numerous fixes since the first release. He noted that prior to releasing a third version of the patch set he will add support for multiple completion rings, add logic to share the 'spare thread' between the rings to further reduce startup costs, and remove reliance on mlock().
Linus Torvalds commented, "I'm still not a huge fan of the user space interface, but at least the core code looks quite clean. No objections on that front." He referred to earlier comments in which he had reacted strongly to the syslets userland interface saying, "I dislike it intensely, because it's so _close_ to being usable. But the programming interface looks absolutely horrid for any 'casual' use, and while the loops etc look like fun, I think they are likely to be less than useful in practice. Yeah, you can do the 'setup and teardown' just once, but it ends up being 'once per user', and it ends up being a lot of stuff to do for somebody who wants to just do some simple async stuff." He later noted that he was in particular concerned with the "register" functionality, which Ingo then simplified.
Rik van Riel [interview] posted some thoughts on the page replacement requirements of the Linux VM, noting that the same kinds of bugs have been getting fixed and reintroduced over the past few years, "this has convinced me that it is time to take a look at the actual requirements of a page replacement mechanism, so we can try to fix things without reintroducing other bugs. Understanding what is going on should also help us deal better with really large memory systems." He added his thoughts from this email to the linux-mm wiki, which he plans to update as new requirements surface.
The initial requirements shortlist included seven items: "1) must select good pages for eviction; must not submit too much I/O at once. Submitting too much I/O at once can kill latency and even lead to deadlocks when bounce buffers (highmem) are involved. Note that submitting sequential I/O is a good thing; 2) must be able to efficiently evict the pages on which pageout I/O completed; 3) must be able to deal with multiple memory zones efficiently; 4) must always have some pages ready to evict. Scanning 32GB of 'recently referenced' memory is not an option when memory gets tight; 5) must be able to process pages in batches, to reduce SMP lock contention; 6) a bad decision should have bounded consequences. The VM needs to be resilient against its own heuristics going bad; 7) low overhead of execution." He continued with a more in-depth discussion of the various requirements.
Following the release of the 2.6.20 kernel [story] Andrew Morton [interview] posted a list of patches in his -mm kernel, summarizing for each his plans as to whether or not they will be pushed upstream for inclusion in the upcoming 2.6.21 kernel. Andrew commented, "I'm getting fed up of holding onto hundreds of patches against subsystem trees, sending them over and over again and seeing nothing happen. I sent 242 patches out to subsystem maintainers on Monday and look at what's still here." In response to some confusion as to what happens to these patches, he went on to explain, "once a subsystem has a subsystem tree (git or quilt) I basically never merge anything which belongs to that tree. It's always originator->mm->subsystemtree->Linus".
Linux creator Linus Torvalds announced the release of the 2.6.20 kernel, summarizing, "a lot of stuff. All over. And KVM." He further noted, "I tried rather hard to make 2.6.20 largely a 'stabilization release'. Unlike a lot of kernels lately, there aren't really any big fundamental changes to some core infrastructure area, and while we always have bugs, I really am hoping that we fixed many more than we introduced." His announcement started with a news parody, "in a widely anticipated move, Linux 'headcase' Torvalds today announced the immediate availability of the most advanced Linux kernel to date, version 2.6.20." Linus continued:
"As ICD head analyst Walter Dickweed put it: "Releasing a new kernel on Superbowl Sunday means that the important 'pasty white nerd' constituency finally has something to do while the rest of the country sits comatose in front of their 65" plasma screens."
"Walter was immediately attacked for his racist and insensitive remarks by Geeks without Borders representative Marilyn vos Savant, who pointed out that not all of their members are either pasty nor white. "Some of them even shower!" she added, claiming that the constant stereotyping hurts nerds' standing in society.
"Geeks outside the US were just confused about the whole issue, and were heard wondering what the big hoopla was all about. Some of the more culturally aware of them were heard snickering about balls that weren't even round."
Jens Axboe has been involved with Linux since 1993. 30 years old, he lives in Copenhagen, Denmark, and works as a Linux Kernel developer for Oracle. His block layer rewrite launched the 2.5 kernel development branch, a layer he continues to maintain and improve. Interested in most anything dealing with IO, he has introduced several new IO schedulers to the kernel, including the default CFQ, or Complete Fair Queuing scheduler.
In this interview, Jens talks about how he got interested in Linux, how he became the maintainer of the block layer and other block devices, and what's involved in being a maintainer. He describes his work on IO schedulers, offering an in-depth look at the design and current status of the CFQ scheduler, including a peek at what's in store for the future. He conveys his excitement about the new splice IO model, explaining how it came about and how it works. And he discusses the current 2.6 kernel development process, the impact of git, and why the GPL is important to him.
"The Linux kernel community is offering all companies free Linux driver development," Greg Kroah-Hartman posted in an open offer on the lkml, for all types of devices "from USB toys to PCI video devices to high-speed networking cards." He explains, "all that is needed is some kind of specification that describes how your device works, or the email address of an engineer that is willing to answer questions every once in a while. A few sample devices might be good to have so that debugging doesn't have to be done by email, but if necessary, that can be done." He added, "if your company is worried about NDA issues surrounding your device's specifications, we have arranged a program with OSDL/TLF's Tech Board to provide the legal framework where a company can interact with a member of the kernel community in order to properly assure that all needed NDA requirements are fulfilled." Greg suggests that companies participating can allow their developers to focus on drivers for other operating systems, "and you can add 'supported on Linux' to your product's marketing material." He further explains:
"You will receive a complete and working Linux driver that is added to the main Linux kernel source tree. The driver will be written by some of the members of the Linux kernel developer community (over 1500 strong and growing). This driver will then be automatically included in all Linux distributions, including the 'enterprise' ones. It will be automatically kept up to date and working through all Linux kernel API changes. This driver will work with all of the different CPU types supported by Linux (for the CPUs that support the bus types that your device works on), the largest number of CPU types supported by any operating system ever before in the history of computing."