Hans Reiser [interview] described a recently posted patch as, "it revises the existing reiser4 code to do a good job for writes that are larger than 4k at a time by assiduously adhering to the principle that things that need to be done once per write should be done once per write, not once per 4k." He went on to explain, "this code empirically proves that the generic code design which passes 4k at a time to the underlying FS can be improved. Performance results show that the new code consumes 40% less CPU when doing 'dd bs=1MB .....'" Referring to
generic_file_write(), he further noted that currently when writing 64MB of data, "it may go to the kernel as a 64MB write, but VFS sends it to the FS as 64MB/4k separate 4k writes." It was acknowledged that this could also be accomplished in a non-generic way; however, earlier feedback had suggested that such improvements should be made available to all.
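The arithmetic behind that remark can be sketched in C. This is only an illustrative cost model, not reiser4 or VFS code: the names `cost_of_write`, `struct write_cost`, and `PAGE_CHUNK`, and the notion of a fixed setup cost per call, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* The 4k unit that, per the quote above, VFS hands to the filesystem. */
#define PAGE_CHUNK 4096u

/* Hypothetical cost model: each call into the filesystem pays a fixed
 * amount of setup work, regardless of how many bytes it carries. */
struct write_cost {
    unsigned long calls;       /* number of calls into the filesystem */
    unsigned long setup_units; /* total fixed per-call work performed */
};

/* Cost of writing `len` bytes when the data is passed down in pieces
 * of at most `chunk` bytes each. */
struct write_cost cost_of_write(size_t len, size_t chunk,
                                unsigned long setup_per_call)
{
    struct write_cost c = {0, 0};

    assert(chunk > 0);
    /* Round up so that a trailing partial chunk still counts as a call. */
    c.calls = (unsigned long)((len + chunk - 1) / chunk);
    c.setup_units = c.calls * setup_per_call;
    return c;
}
```

Under this model, a 64MB write split into 4k pieces makes 16384 calls into the filesystem, while passing it down whole makes one, so any work that only needs doing once per write is repeated 16384 times in the first case.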
Andrew Morton [interview] responded to the proposed changes, saying, "there's nothing which leaps out and says 'wrong' in this. But there's nothing which leaps out and says 'right', either. It seems somewhat arbitrary, that's all." He pointed out that reiser4 was currently the only filesystem to benefit from the changes, "to be able to say 'yes, we want this' I think we'd need to understand which other filesystems would benefit from exploiting it, and with what results?" In the resulting discussion, it was determined that both FUSE [story] and XFS [story] would benefit from these changes, prompting Hans to ask, "Is it enough?" Andrew agreed, "Spose so. Let's see what the diff looks like?"
Andrew Morton [interview] posted an overview of patches in -mm, discussing what is destined for inclusion in the upcoming 2.6.18 Linux kernel. He noted, "there is an unusually large amount of difficult material here." Patch sets that were discussed include a cleanup of kernel headers, klibc, various subsystem cleanups, the ACX1xx wireless driver, swsusp cleanups, per-task statistic metrics, a clocksource management infrastructure, smpnice, swap prefetching [story], priority-inheriting futexes, a revamp of /proc/pid, ecryptfs, utsname virtualization [story], readahead, reiser4 improvements, a statistics infrastructure, and lock validation code.
Following up on a couple of features discussed earlier on KernelTrap, both swap-prefetching and utsname virtualization were briefly discussed. In regards to swap-prefetching Andrew noted, "I remain skeptical, but I have a lot of RAM. Multiple people have sung its praises. I guess I'll re-review and tentatively plan on sending them along or 2.6.18. Opinions are sought." As for utsname virtualization, "this doesn't seem very pointful as a standalone thing. That's a general problem with infrastructural work for a very large new feature. So probably I'll continue to babysit these patches, unless someone can identify a decent reason why mainline needs this work. I don't want to carry an ever-growing stream of OS-virtualisation groundwork patches for ever and ever so if we're going to do this thing... faster, please."
Andrew Morton [interview] offered a list of patches in his mm tree, summarizing for each his plans as to whether or not they will be pushed to Linus for inclusion in the upcoming 2.6.17 kernel. Comments on the patches range from the simple "will merge" to pushing them to others for review. One of the more entertaining comments followed a set of 33 patches where Andrew noted, "This is Oleg's romp through the core kernel. There's a ton of material here. I'll probably send it all to Linus and ask him to review it. (aka blame-shifting)." Later in the thread he explained, "it's just a whole lot of code in areas which are tricky and in which few people work and in which reviewing resources are slight."
One set of patches that was refused with the comment, "still don't have a compelling argument for this, IMO" was Con Kolivas [interview]'s swap prefetching efforts [story]. The feature was discussed in a couple of follow-up threads. In response to some concerns raised by Jens Axboe, Con explained the implementation a little further, "If the system is idle it doesn't cost anything to bring those pages in (laptop mode disables any prefetching if you're thinking about power consumption on laptops). And if the system wants the ram that has been filled with prefetched pages wrongly, the prefetched pages are at the tail end of the inactive LRU list with a copy on backing store so if they're not accessed they'll be the first thing dropped in preference to anything else, without any I/O."
Four months ago a debate on the lkml suggested that support for GCC 2.95 would be around for a long time [story], but a more recent thread suggests otherwise. 2.6 maintainer Andrew Morton put together a small patch to remove support for 2.95, and discussion continued to explore which versions of GCC 3.x should be supported. Andrew explained:
"2.95.x is basically buggered at present. There's one scsi driver which doesn't compile due to weird __VA_ARGS__ tricks and the rather useful scsi/sd.c is currently getting an ICE. None of the new SAS code compiles, due to extensive use of anonymous unions. The V4L guys are very good at exploiting the gcc-2.95.x macro expansion bug (_why_ does each driver need to implement its own debug macros?) and various people keep on sneaking in anonymous unions.
"It's time to give up on it and just drink more coffee or play more tetris or something, I'm afraid."
Following the piratical release of 2.6.14-rc2, a brief discussion looked at the advantages of using git to grab the latest version of the kernel code. A small break in service as the master.kernel.org server was situated in its new home [story] caused the 2.6.14-rc2 patch to not show up right away, and led to people pointing out the advantages of using git. When the ketchup script [story] was proposed as an alternative, it was illustrated how git can keep you up to date with the kernel down to a patch-by-patch level, or with a specific checkpoint. Linus further explained how git can be used to first establish that a bug was introduced between, for example, rc1-git3 and rc1-git4, and then to use "git-bisect" to further isolate the problem to a specific change.
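The idea behind bisecting can be sketched as a binary search over an ordered list of changes. The following C is not git's implementation; it is a minimal illustration that assumes a linear history in which the bug, once introduced, stays present. The names `first_bad_revision`, `is_bad_fn`, and `bad_from` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test callback: returns nonzero if the bug is present
 * when the tree is built at revision index `rev`. */
typedef int (*is_bad_fn)(size_t rev, void *ctx);

/* Example predicate for demonstration: the bug is present in every
 * revision at or after the index stored in *ctx. */
int bad_from(size_t rev, void *ctx)
{
    return rev >= *(const size_t *)ctx;
}

/* Given a known-good revision `good` and a later known-bad revision
 * `bad`, return the first revision at which the bug appears.
 * If `tests_run` is non-NULL, it is incremented once per build tested. */
size_t first_bad_revision(size_t good, size_t bad, is_bad_fn is_bad,
                          void *ctx, unsigned *tests_run)
{
    assert(good < bad);
    while (bad - good > 1) {
        /* Test the midpoint and keep the half that still contains
         * the transition from good to bad. */
        size_t mid = good + (bad - good) / 2;

        if (tests_run)
            (*tests_run)++;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}
```

Because each test halves the remaining range, isolating one change among 100 candidates takes at most seven builds rather than up to a hundred.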
As for -rc2, Linus noted, "not a whole lot o' excitement, ye scurvy dogs, but it has t' ALSA, LSM, audit and watchdog merges that be missed from -rc1, and a merge series with Andrew. But on t' whole pretty reasonable - you can see t' details in the shortlog (appended)." Evidently Monday the 19th of September was International Talk Like A Pirate Day.
Andrew Morton [interview] provided an update on the current development status of the Linux kernel. As of his announcement, the latest development release is 2.6.13-git5, with 2.6.14 expected around October 7th. At this time, Andrew is tracking 144 bugs, though he notes, "I haven't culled these yet - some may be fixed." Indeed, a number of replies indicated that several of the listed bugs have been fixed.
As for what will likely be merged in the next couple of weeks and be part of the upcoming 2.6.14 release, Andrew listed several filesystems including relayfs [story], v9fs [story], and FUSE [story]. Regarding the latter he noted that he was, "fed up with arguing - any remaining problems can be fixed up in-tree if anyone can think of how to fix them." As for the much-anticipated Reiser4, Andrew summarized, "Stuck. Last time we discussed this I asked the reiser4 team to develop and negotiate a bullet-point list of things to be addressed. Once that's agreed to, implement it and then we can merge it. None of that has happened and as far as I know, all the review feedback which was provided was lost."
In the debate following Andrew Morton [interview] posting his plans for 2.6.13 [story], the existence of a plugin layer in Reiser4 was discussed. Jeff Garzik put it bluntly, "the plugin stuff is crap. This is not a filesystem but a filesystem new layer. IMO considered in that light, it duplicates functionality elsewhere." Andrew Morton went on to explain, "I think the concern here is that this is implemented at the wrong level. In Linux, a filesystem is some dumb thing which implements address_space_operations, filesystem_operations, etc."
Hans Reiser noted, "please remember that this is per file, per item, per node, per attribute, per disk format, per bitmap, per super block, etc., abstracting, not per filesystem abstracting." He explained a couple of advantages of plugins: they make it much easier for developers to change the disk format, and they allow for easy code reuse. He added, "the use of plugins forced all the programmers to think about reusability at every layer of design. V3 of reiserfs is way too hard to work on and modify. If you ask one of the team to code something for V3 instead of V4, they quietly groan at the thought. It is just so much easier to do in V4."
Andrew Morton replied, "advanced features such as those which you describe are implemented on top of the filesystem, not within it. reiser4 turns it all upside down. Now, some of the features which you envision are not amenable to above-the-fs implementations. But some will be, and that's where we should implement those." The lengthy discussion continued, an interesting read for Reiser4 supporters and detractors alike.
Jesper Juhl submitted a small patch to bring the kernel/module.c source file closer in line with the kernel's CodingStyle document. Specifically, he quoted from the CodingStyle document, "don't put multiple statements on a single line unless you have something to hide," which goes on to give an example of how such statements can cause confusion:
if (condition) do_this;
  do_something_everytime;
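To make the pitfall concrete, here is a small sketch in C that spells out the two readings of that line as separate functions. The function names `as_compiled` and `as_reader_might_assume` and the counters are hypothetical, added only for illustration; `do_this` and `do_something_everytime` are given trivial bodies so that the effect can be observed.

```c
#include <assert.h>

/* Counters so the effect of each reading can be observed. */
int this_calls;
int everytime_calls;

void do_this(void)
{
    this_calls++;
}

void do_something_everytime(void)
{
    everytime_calls++;
}

/* What the compiler actually sees in the CodingStyle example: only
 * do_this() is guarded by the condition. The second statement runs
 * on every call, however it is laid out in the source. */
void as_compiled(int condition)
{
    if (condition)
        do_this();
    do_something_everytime();
}

/* What a hurried reader might assume from the layout: both statements
 * belong to the if. Getting this behavior requires braces. */
void as_reader_might_assume(int condition)
{
    if (condition) {
        do_this();
        do_something_everytime();
    }
}
```

Calling `as_compiled(0)` still increments `everytime_calls`, while `as_reader_might_assume(0)` leaves both counters alone. Putting each statement on its own line, as Jesper's patch does, makes that difference visible at a glance.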
2.6 maintainer Andrew Morton [interview] quickly replied, "there are about 88 squillion of these in the kernel. I think it would be a mistake for me to start taking such patches, sorry." David Miller countered, "I disagree. Putting statements on the same line as the if statement hides bugs and makes the code harder to read." Andrew replied, "we all know that, but this means that we spend the next two years fielding an ongoing dribble of trivial patches which distract from real work."
A short discussion followed in which Andrew agreed, "well I suppose I could live with a few REALLY REALLY BIG patches to do this to lots of files, but if it's the old death-by-1000-cuts, I'm gonna call uncle this time." Jesper agreed to begin working on the large patches, to which Andrew repeated, "OK, a few 100k-400k patches would suit." A similar discussion was raised here.
At the July 2004 kernel summit, it was decided that there was no need to fork a 2.7 kernel [forum] to introduce new functionality into the Linux kernel. Instead, the decision was made that it was possible for Andrew Morton [interview] and Linus Torvalds to continue working together to first merge things into Andrew's -mm tree, and then after testing the changes to merge them into Linus' mainline tree [story]. This of course led to discussion, with some confusion as to how the 2.6 kernel [forum] could be considered stable while new features were still being merged in [story]. During another short discussion nine months after this decision, Rik van Riel [interview] offered some insight into why the new development model works:
"Things get merged one change at a time, and stabilised one change at a time. This is a big change from the even/odd numbered kernel series, where sometimes a bug crops up without anybody knowing exactly what change introduced it. The current development model seems to go much smoother than anything I've seen before."
Greg KH announced the first maintenance release of the 2.6.11 kernel [story], 2.6.11.1. Quickly acting on the recent lengthy discussion regarding kernel release numbering [story] [story], Greg and Chris Wright have begun to maintain this branch. With each 2.6.x release, they will maintain 2.6.x.y releases available from your nearest kernel.org mirror. This first maintenance release includes three simple patches, not including the makefile change, addressing a problem with keyboards on Dell machines, and raid6 compilation on the ppc architecture. Andrew Morton [interview] noted that he has additional fixes appropriate for this tree that will likely lead to a 2.6.11.2 release in the relatively near future.
Greg went on to highlight the requirements for patches to be able to be merged into this new tree: they must be no bigger than 100 lines, they must fix only one thing, they must fix real bugs that are confirmed to be affecting people, and they must fix a build error, an oops, a hang, or a real security issue. Patches explicitly not allowed include things to fix "theoretical race conditions" without an exploit, or "trivial" fixes like spelling changes or whitespace cleanups. Greg described the effort's mantra as "release early and often".
Linux creator Linus Torvalds started a lengthy discussion on the lkml regarding release numbering for the Linux kernel. Some have complained about kernel stability with the new development model discussed back in mid-2004 [story] in which active development occurs in the "stable" 2.6 kernel. In his recent email, Linus explained, "the problem with major development trees like 2.4.x vs 2.5.x was that the release cycles were too long, and that people hated the back- and forward-porting. That said, it did serve a purpose - people kind of knew where they stood, even though we always ended up having to have big changes in the stable tree too, just to keep up with a changing landscape." His new proposal involves still using an even and odd numbering scheme, but on a smaller level. Thus, the upcoming 2.6.12 would be "stable" in that it should only contain bugfixes over 2.6.11. Then 2.6.13 would be more development oriented, including some larger changes. These larger changes would again stabilize in 2.6.14, and so on. He adds, "we'd still do the -rcX candidates as we go along in either case, so as a user you wouldn't even _need_ to know, but the numbering would be a rough guide to intentions. Ie I'd expect that distributions would always try to base their stuff off a 2.6.<even> release."
The lengthy discussion that followed was a collection of mixed reactions. Some liked the proposal, but others were confused as to what it was supposed to solve. Essentially the idea seems to be to get more people to test the kernel, as only with more testers can bugs be found. The current strategy of using a series of "-rc" kernels [story] is confusing to many as in most projects this indicates a "release candidate", or something thought to be stable, whereas with the Linux kernel an -rc release is frequently where the active development takes place. As the common user has come to realize this, the -rc kernels have gotten less testing. Linus says, "that's the whole point here, at least to me. I want to have people test things out, but it doesn't matter how many -rc kernels I'd do, it just won't happen. It's not a 'real release'." Andrew Morton [story]'s -mm tree is intended to weed out obvious errors with big changes before merging patches upstream in the mainline kernel, but again as it frequently proves less stable it tends to get less testing.
In response to whether or not he had any objections to merging FUSE [story] into the mainline kernel, Andrew Morton [interview] offered some insight into what new features were slated for the upcoming 2.6.12 kernel. Andrew began, "I was planning on sending FUSE onto Linus in a week or two," going on to add "that and cpusets are the notable features which are 2.6.12 candidates."
Andrew then referred to several other patches currently in his -mm patchset [story], discussing their likelihood of being merged into the mainline kernel. He described crashdump [story] as seeming "permanently not-quite-ready". He noted that perfctr "works fine", but that it was "similar-to-but-different-from" the IA64 perfmon subsystem, and "might not be suitable for ppc64". Both nfsacl [thread] and the device-mapper multipath [thread] patches were deemed "OK for 2.6.12". Regarding cachefs, Andrew noted it "is a bit stuck because it's a ton of complex code and afs is the only user of it. Wiring it up to NFS would help." Finally, regarding whether or not he planned to merge reiser4 [story], he said this was "less clear" going on to add "once all the review comments have been addressed and we start seeing a bit of vendor pull for it, maybe."
A lengthy and interesting thread was started on the lkml by Chris Wright looking to define a centralized place to report security issues in the Linux Kernel. Chris offered his services in getting things set up, addressing his email to Linus Torvalds, Andrew Morton [interview], Alan Cox [interview] and Marcelo Tosatti [interview]. He explained that he wanted to centralize the information "to help track it, make sure things don't fall through the cracks, and make sure of timely fix and disclosure". The resulting discussion was joined by numerous members of the kernel hacking community, exposing a wide range of opinions.
Linus agreed that it sounded like a good idea, but qualified this by adding, "the _only_ requirement that I have is that there be no stupid embargo on the list. Any list with a time limit (vendor-sec) I will not have anything to do with." An embargo in this case is the time period from when a security problem is first reported to when a fix can be made public. Marcelo pointed out that a certain amount of time is necessary, "for the vendors to catch up", explaining that "it is a simple matter of synchronization". Linus again stressed his dislike for the vendor-sec mailing list, suggesting that the length of the embargo period is often more about politics than anything else. He then added, "but in the absense of politics, I'd _happily_ have a self-imposed embargo that is limited to some reasonable timeframe (and "reasonable" is definitely counted in days, not weeks. And absolutely _not_ in months, like apparently sometimes happens on vendor-sec)." In a followup comment he clarified, "btw, the only thing I care about is the embargo on the _fix_", noting that he was comfortable if there was a need to delay publishing an explanation of the security hole so long as the fix itself was quickly released.
At the July 2004 kernel summit, it was decided that the current 2.6 development process with teamwork between Andrew Morton [interview] and Linux creator Linus Torvalds was proving quite effective. The process involves using Andrew's test -mm tree [forum] as a staging area for patches prior to going into Linus' mainline tree [forum]. The system has allowed for continued evolution and new features in the 2.6 stable kernel; however, it has also led to a fair amount of discussion and debate [story]. Much of the concern is that with new features constantly being introduced, true stabilization may not be possible.
One theory presented on the lkml was that the process has changed because, "these days nobody wants to be a stable-release maintainer anymore. It's boring." 2.2 maintainer Alan Cox [story] disagreed, "that depends what kind of an engineer you are. Just as there are people who love standards body work and compliance testing/debugging there are people who care about stable trees." When asked if he was willing to maintain a stable 2.6.x kernel, Alan replied, "I'll do it if Linus wants". That is, while 2.6.10 is being developed, the suggestion is to continue to stabilize 2.6.9, releasing 2.6.9.1, 2.6.9.2, etc. And when 2.6.10 is released, to then focus on stabilizing it. Alan already maintains a 2.6-ac patchset [forum] which includes a growing number of bugfixes. However, he notes that it is not intended to be all-inclusive, "the goal of -ac is to contain the stuff I personally consider important. A lot of the smaller bugfixes individually are fine but a 'complete set of bugfixes' turns into a large change set and then needs an entire validation and release cycle of its own."
With the release of 2.6.9-mm1, Andrew Morton [interview] offered a quick status update on a number of patches in his -mm tree [forum] that are 2.6-mainline hopefuls. For example, regarding the much debated reiser4 filesystem [story], Andrew said that he is still "not sure, really. The namespace extensions were disabled, although all the code for that is still present. Linus's filesystem criterion used to be 'once lots of people are using it, preferably when vendors are shipping it'. That's a bit of a chicken and egg thing though. Needs more discussion". And as for Ingo Molnar [interview]'s preemption and low-latency fixups [forum] Andrew offered, "I haven't really thought about it and haven't looked at the patches yet. Hopefully 2.6.10 material."
Other projects specifically mentioned include the sysfs backing store, the ext3 reservations code, the ext3 resize code, kexec and crashdump [story], perfctr, cachefs, cpusets, and the md updates. Read on for Andrew's comments and the complete -mm1 changelog.