"sched_yield() is not - and should not be - about 'recalculating the position in the scheduler queue' like you do now in CFS," Linus Torvalds stated in a discussion with Completely Fair Scheduler author Ingo Molnar, pointing to the man pages to back up his argument that sched_yield should instead move a thread to the end of its queue, adding, "quite frankly, the current CFS behaviour simply looks buggy. It should simply not move it to the 'right place' in the rbtree. It should move it *last*."
Ingo described how it worked with the pre-2.6.23 scheduler, "the O(1) implementation of yield() was pretty arbitrary: it did not move it last on the same priority level - it only did it within the active array. So expired tasks (such as CPU hogs) would come _after_ a yield()-ing task." He went on to compare this to the new process scheduler, "so the yield() implementation was so much tied to the data structures of the O(1) scheduler that it was impossible to fully emulate it in CFS. In CFS we dont have a per-nice-level rbtree, so we cannot move it dead last within the same priority group - but we can move it dead last in the whole tree. (then they'd be put even after nice +19 tasks.) People might complain about _that_." He also noted that moving a yielding task last would change the behavior of some desktop applications that call sched_yield(), "there will be lots of regression reports about lost interactivity during load."
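The "move it last" semantics Linus argues for can be pictured with a toy model. The sketch below is not kernel code: the `toy_queue` structure and `toy_yield_to_tail()` helper are hypothetical names invented for illustration, and a flat array stands in for whatever structure a real scheduler uses (per-priority arrays in the O(1) scheduler, an rbtree in CFS).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy run queue: task IDs in the order they would run. */
struct toy_queue {
    int tasks[8];
    size_t len;
};

/*
 * Model of "yield moves the caller last": take the task at index idx
 * out of the queue and re-append it at the tail, keeping the relative
 * order of every other task unchanged.
 */
static void toy_yield_to_tail(struct toy_queue *q, size_t idx)
{
    assert(idx < q->len);

    int yielding = q->tasks[idx];

    /* Shift the tasks behind the yielding one forward by one slot. */
    memmove(&q->tasks[idx], &q->tasks[idx + 1],
            (q->len - idx - 1) * sizeof q->tasks[0]);

    /* The yielding task now runs after everything else in this queue. */
    q->tasks[q->len - 1] = yielding;
}
```

With a queue holding tasks 10, 20, 30, a yield by task 10 leaves the order 20, 30, 10. Ingo's objection is about scope: in the O(1) scheduler the "queue" was only the active array of one priority level, whereas CFS has a single tree, so "last" there would mean behind every runnable task, including nice +19 ones.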
Having recently returned from the Linux kernel summit, Ingo Molnar and Peter Zijlstra sent out some performance updates to the Completely Fair Scheduler:
"Our main focus has been on simplifications and performance - and as part of that we've also picked up some ideas from Roman Zippel's 'Really Fair Scheduler' patch as well and integrated them into CFS. We'd like to ask people go give these patches a good workout, especially with an eye on any interactivity regressions."
He noted that some of the changes included removing features that had proved unnecessary, "while keeping the things that worked out fine, like sleeper fairness." Ingo posted some results from the lmbench benchmark noting around a 16% speedup on both the 32-bit and 64-bit x86 architectures. He added, "we are now a bit faster than the O(1) scheduler was under v2.6.22 - even on 32-bit. The main speedup comes from the avoidance of divisions (or shifts) in the wakeup and context-switch fastpaths."
Jörn Engel announced LogFS, "a scalable flash filesystem." The project's home page notes that LogFS aims to be the successor of JFFS2, "the two main problems of JFFS2 are memory consumption and mount time. Unlike most filesystems, there is no tree structure of any sorts on the medium, so the complete medium needs to be scanned at mount time and a tree structure kept in-memory while the filesystem is mounted. With bigger devices, both mount time and memory consumption increase linearly. JFFS2 has recently gained summary support, which helps reduce mount time by a constant factor. Linear scalability remains. YAFFS also appears to be better by a constant factor, yet still scales linearly."
In contrast, Jörn described his new filesystem, "LogFS has an on-medium tree, fairly similar to Ext2 in structure, so mount times are O(1). In absolute terms, the OLPC system has mount times of ~3.3s for JFFS2 and ~60ms for LogFS." Regarding its stability, he noted, "LogFS works and survives my testcases. It has fairly good chances of not eating your data during regular operation. There are still two known bugs that will eat data if the filesystem is uncleanly unmounted. Also still missing is wear leveling." Thomas Gleixner reviewed the code and offered the following summary, suggesting the code has a ways to go before it replaces JFFS2, "the code is far from being useful on real world hardware. The error handling via BUG() is just making it useless. Also please fix the coding style and other issues from the seperate review. Some useful comments would make a functional review way easier."
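The scaling difference between the two mount strategies can be sketched as a simple cost model. This is only an illustration under stated assumptions: the function names are hypothetical, cost is counted in block reads, and the constant of two reads for the tree case (one anchor, one root) is an assumption rather than a figure from LogFS.

```c
#include <stddef.h>

/*
 * No on-medium index (the JFFS2-style case described above): every
 * block has to be read once to rebuild the tree in memory, so the
 * number of reads grows linearly with the size of the medium.
 */
static size_t scan_mount_reads(size_t n_blocks)
{
    size_t reads = 0;

    for (size_t i = 0; i < n_blocks; i++)
        reads++;            /* one read per block on the medium */

    return reads;
}

/*
 * On-medium tree (the LogFS-style case): mounting only needs to find
 * the root of the tree, so the cost is constant.
 */
static size_t tree_mount_reads(size_t n_blocks)
{
    (void)n_blocks;         /* mount cost does not depend on medium size */
    return 2;               /* assumed: one anchor read plus one root read */
}
```

Doubling the medium doubles the result of `scan_mount_reads()` and leaves `tree_mount_reads()` unchanged. A constant-factor improvement, such as the summary support mentioned for JFFS2, would scale the first function down without changing its linear shape.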
Anyone who's been following Linux kernel development for the past several months has heard about one exciting feature after another being merged into the still unreleased 2.6 kernel. New features that noticeably affect user experience include Robert Love's [interview] preemptible kernel work [story], Ingo Molnar's [interview] O(1) Scheduler [story], Rik van Riel's [interview] reverse mapping VM [story], Nick Piggin's [interview] Anticipatory I/O scheduler [story], and much, much more...
Having some spare time a few nights ago, I decided to give the latest kernel, 2.6.0-test4, a trial run on my aging 550MHz PIII desktop computer, and the result was nothing short of spectacular. As the final 2.6.0 release approaches, it is important that an increasing number of users (aka testers) give this kernel a try, especially as currently it's still a sexy task for developers to track down kernel bugs and stabilize their work. Once work starts on the 2.7 development tree, inevitably much talent will again be focusing on new features.
The purpose of this document is to provide some helpful tips to readers who currently compile their own 2.4 kernels but haven't yet made the leap to 2.6. This is still a development kernel, so you may run into problems, but overall stability and performance are quite impressive, and I can't recommend enough that you try it today.
Ingo Molnar has been contributing to Linux kernel development since 1995 with an impressive list of accomplishments. Most recently his O(1) scheduler was merged into the 2.5 development kernel, as well as much work to enhance the handling of threads. Other highly visible contributions include software-RAID support and the in-kernel Tux web and FTP servers.
In this interview, Ingo explores how he started working on the Linux kernel noting, "it might sound a bit strange but i installed my first Linux box for the sole purpose of looking at the kernel source." He goes on to explain the concepts behind his new O(1) scheduler, and to describe many of his other kernel efforts. This interview was conducted over several months, and covers a lot of interesting ground...
Con Kolivas, a practicing doctor in Australia, has written a benchmarking tool called ConTest which has proven to be tremendously useful to kernel developers, having been designed to compare the performance of different versions of the Linux kernel. He was kind enough to speak with us, explaining how he got started on this project, what makes his benchmark unique, and how to interpret the resulting output. Comparing the 2.5 development kernel to the 2.4 stable kernel, Con says, "a good 2.5 kernel (and that's not all of them) feels faster than 2.4 in most ways and this bodes well for 2.6." The interesting results from his frequent benchmarks back up this statement.
Con also describes his high performance patchset for the 2.4 stable kernel, currently at version 2.4.19-ck9. This patchset adds a number of performance boosting patches ideal for a desktop environment, such as the O(1) scheduler, kernel preemption, low latency and compressed caching. Read on for the full interview...