"Ok, I've been slacking on the -stable front for a bit here, and didn't realize how far behind I've gotten. Everyone has been sending patches in, which is great, but now we are facing a HUGE 114 patch release," began Greg Kroah-Hartman. He continued:
"As there's no real way that everyone can review all of these patches, I've decided to split them up into 6 different categories, and will be sending patches out in these categories for review. If people can just glance over the ones in the areas they care about, I would really appreciate it."
The stable review resulted in six stable 2.6.23.y releases. The first, 2.6.23.9, contained bug fixes for the core kernel code. 2.6.23.10 contained bug fixes for architecture-specific issues. 2.6.23.11 contained bug fixes for the core networking and wireless code. 2.6.23.12 contained bug fixes for networking drivers. 2.6.23.13 contained bug fixes for non-networking drivers. 2.6.23.14 contained file system bug fixes. These releases were followed by 2.6.23.15, containing a couple of security fixes.
Linux creator Linus Torvalds announced the third release candidate for the upcoming 2.6.24 kernel summarizing, "hmmm.. Lots of small fixes, some cleanups, and a few things like the cris updates that aren't really either, but which won't affect any normal user, and will hopefully make it easier to sync up in the future. Network driver fixes, some IDE and infiniband updates, some late cpufreq updates, and a hwmon update." He continued:
"On the architecture side, in addition to the afore-mentioned cris updates, there are some sh, arm, powerpc and mips updates, and also one final x86 unification cleanup (and I really mean it - the rest can wait until after 2.6.24, but with this one the x86 configuration really is fairly merged, and both i386 and x86_64 are really just special cases of the 'x86' architecture in the configurator)."
Miklos Szeredi posted a request for comments titled "fuse writable mmap design". He explained, "writable shared memory mappings for fuse are something I've been trying to implement forever. Now hopefully I've got it all worked out, it survives indefinitely with bash-shared-mapping and fsx-linux. And I'd like to solicit comments about the approach." He went on to describe the patch:
"fuse_writepage() allocates a new temporary page with GFP_NOFS|__GFP_HIGHMEM. It copies the contents of the original page, and queues a WRITE request to the userspace filesystem using this temp page. From the VM's point of view, the writeback is finished instantly: the page is removed from the radix trees, and the PageDirty and PageWriteback flags are cleared. The per-bdi writeback count is not decremented until the writeback truly completes. [...] On dirtying the page, fuse waits for a previous write to finish before proceeding. This makes sure, there can only be one temporary page used at a time for one cached page."
"Ceph is a distributed network file system designed to provide excellent performance, reliability, and scalability with POSIX semantics. I periodically see frustration on this list with the lack of a scalable GPL distributed file system with sufficiently robust replication and failure recovery to run on commodity hardware, and would like to think that--with a little love--Ceph could fill that gap," announced Sage Weil on the Linux Kernel mailing list. Originally developed as the subject of his PhD thesis, he went on to list the features of the new filesystem, including POSIX semantics, scalability from a few nodes to thousands of nodes, support for petabytes of data, a highly available design with no signle points of failure, n-way replication of data across multiple nodes, automatic data rebalancing as nodes are added and removed, and a Fuse-based client. He noted that a lightweight kernel client is in progress, as is flexible snapshoting, quotas, and improved security. Sage compared Ceph to other similar filesystems:
"In contrast to cluster filesystems like GFS, OCFS2, and GPFS that rely on symmetric access by all clients to shared block devices, Ceph separates data and metadata management into independent server clusters, similar to Lustre. Unlike Lustre, however, metadata and storage nodes run entirely in userspace and require no special kernel support. Storage nodes utilize either a raw block device or large image file to store data objects, or can utilize an existing file system (XFS, etc.) for local object storage (currently with weakened safety semantics). File data is striped across storage nodes in large chunks to distribute workload and facilitate high throughputs. When storage nodes fail, data is re-replicated in a distributed fashion by the storage nodes themselves (with some coordination from a cluster monitor), making the system extremely efficient and scalable."
"This is the listing of the open bugs that are relatively new, around 2.6.22 and up. They are vaguely classified by specific area," Natalie Protasevich said, posting a current list of bugs each linking to an appropriate bugzilla.kernel.org entry. Andrew Morton reviewed the list, noting "no response from developers" in response to many of the bugs. David Miller pointed out that in some cases this wasn't true, referring to 46 bug fixes queued in his networking tree and another 10 already pushed upstream, "when someone like me is bug fixing full time, I take massive offense to the impression you're trying to give especially when it's directed at the networking. So turn it down a notch Andrew." Andrew wasn't convinced, "first we need to work out whether we have a problem. If we do this, then we can then have a think about what to do about it. I tried to convince the 2006 KS attendees that we have a problem and I resoundingly failed. People seemed to think that we're doing OK." He continued:
"This is not a minor matter. If the kernel _is_ slowly deteriorating then this won't become readily apparent until it has been happening for a number of years. By that stage there will be so much work to do to get us back to an acceptable level that it will take a huge effort. And it will take a long time after that for the kerel to get its reputation back. So it is important that we catch deterioration *early* if it is happening."
Ingo Molnar sent a merge request to Linus Torvalds for the latest CFS fixes. CFS, the Completely Fair Scheduler, was merged into the mainline Linux kernel in July of 2007 and first shipped in the 2.6.23 kernel, released in October of 2007. The scheduler appears to be stabilizing quickly, as evidenced by the small assortment of fixes in this latest push. Ingo summarized the changes:
"There are two cross-subsystem groups of fixes: three commits that resolve a KVM build fix on !SMP - acked by Avi to go in via the scheduler git tree because it changes a central include file. The other one is a powerpc CPU time accounting regression fix from Paul Mackerras.
"The remaining 14 commits: one crash fix (only triggerable via the new control-groups filesystem), a delay-accounting regression fix, two performance regression fixes, a latency fix, two small scheduling-behavior regression fixes and seven cleanups."
"For the last release, I stated that I thought the 22.214.171.124 release would be the last one in the 2.6.22.y series. Since then, I've received a number of other patches that would be nice to have in the .22.y tree," explained Greg KH. He continued:
"So, for a while, I'll keep the 2.6.22.y tree open, doing new releases every once in a while as they accumulate. I do this, for no other than the selfish reason that I use it every day on my openSuSE 10.3 boxes as that is the kernel base that release is on :)"
Greg KH and Chris Wright have been maintaining a -stable 2.6.x.y patchset for the 2.6.x and 2.6.(x-1) kernels since March of 2005. Willy Tarreau, maintainer of the stable 2.4 kernel, has also maintained patches against the 2.6.20 branch since August of 2007, though he noted that he'll switch to maintaining the stable 2.6.22 branch once Greg finishes with it. Adrian Bunk also continues to maintain a -stable 2.6.16 branch of the Linux kernel.
"This document is intended to specify the security goal that AppArmor is intended to achieve, so that users can evaluate whether AppArmor will meet their needs, and kernel developers can evaluate whether AppArmor is living up to its claims. This document is *not* a general purpose explanation of how AppArmor works, nor is it an explanation for why one might want to use AppArmor rather than some other system," began Crispin Cowan, following Arjan van de Ven's earlier suggestion to document security module intent. Crispin continued:
"AppArmor is intended to protect systems from attackers exploiting vulnerabilities in applications that the system hosts. The threat is that an attacker can cause a vulnerable application to do something unexpected and undesirable. AppArmor addresses this threat by confining the application to access only the resources it needs to access to execute properly, effectively imposing 'least privilege' execution on the application.
"Applications have access to a number of resources including files, interprocess communication, networking, capabilities, and execution of other applications. The purpose of least privilege is to bound the damage that a malicious user or code can do by removing access to all resources that the application does not need for its intended function. For instance, a policy for a web server might grant read only access to most web documents, preventing an attacker who can corrupt the web server from defacing the web pages."
"I'm pleased to announce [the] 7'th and final release of the distributed storage subsystem (DST)," Evgeniy Polyakov stated, completing the TODO list on the project's web page. He titled the release, "squizzed black-out of the dancing back-aching hippo", noting, "it clearly shows my condition". New features in this release include checksum support, extended auto-configuration for detecting and auto-enabling checksums if supported by the remote host, new sysfs files for marking a given node as clean (in-sync) or dirty (not-in-sync), and numerous bug fixes.
Evgeniy released the first version of his distributed storage subsystem in July of 2007. In September he explained that this was the first step in a larger distributed filesystem project he's planning. In late October, Andrew Morton noted that the work looked ready to be merged into his -mm kernel.
"I'm pleased to announce another release of Squashfs. This is the 22nd release in just over five years. Squashfs 3.3 has lots of nice improvements, both to the filesystem itself (bigger blocks and sparse files), but also to the Squashfs-tools Mksquashfs and Unsquashfs," stated Phillip Lougher about the latest release of the compressed read-only Linux filesystem. He noted that he still needed to fix filesystem endianness, then he was going to focus on getting Squashfs into the mainline kernel. New features found in this latest release include:
"1. Maximum block size has been increased to 1Mbyte, and the default block size has been increased to 128 Kbytes. This improves compression.
"2. Sparse files are now supported. Sparse files are files which have large areas of unallocated data commonly called holes. These files are now detected by Squashfs and stored more efficiently. This improves compression and read performance for sparse files."
"Yeah, don't remind me - it's late," began Linus Torvalds, announcing the second 2.6.24 release candidate, "there was nothing in particular holding this thing up, I just basically just forgot to cut a -rc2 release last week." He went on to list some of the changes:
"There's not a lot of hugely exciting stuff here. Some arch updates: MIPS, arm, blackfin, x86, sparc, sh, s390.. Also various driver updates: libata, IDE, networking, DVB.. And some more fallout from the scatter-gather changes. Some scheduler cleanups, and also fixing the CPU usage statistics that got scrogged at some point."
Linus noted that while there were no major changes, the shortlog was still too large to post to the list. He suggested running "git shortlog v2.6.24-rc1.." to see all changes since the last release candidate, "but quite frankly, it's no Leo Tolstoy. If you have trouble falling asleep, you might try to print it out and take it to bed with you: it's not going to be more than just a couple of pages ('use 2nup and save a tree'), but I dare you to actually get to the end. Snooze city."
I am pleased to announce that KernelTrap has partnered with Specialty Job Markets to offer a unique Linux kernel job board for our readers. It is completely free to submit your resume, which will then be personally reviewed and matched with current and future employment opportunities. If you're an employer, it's also free to post jobs. Jobs and resumes that are posted to our boards are individually reviewed and matched by a professional recruiter, not a computer program, offering quality results with a personal touch. The contact information you provide is kept confidential and is only visible to our dedicated recruiter.
By using our job board, you are not only finding yourself a good job or a good employee, you're also helping to support KernelTrap.org. Each time our recruiter successfully matches a candidate with a job, the employer pays a fee for this service, and KernelTrap.org receives a percentage which allows us to focus on improving these web pages. With every single resume and job manually screened by a human recruiter, we are able to keep our job board focused on kernel development jobs and free of spam. Read on for full details, or skip ahead and submit your resume today!
An earlier discussion about GCC compiler misoptimizations led Linus Torvalds to note, "I'm very ambivalent about gcc." He explained that on one hand he feels it's a great compiler with many great developers, but being an old project, "it has accumulated cruft over time, and cleaning things up is often almost impossible." He added that while compiler bugs can be frustrating, his real concern with the project lies in how some of its developers interpret the language definition, "and seem to think that it's more important to read the language spec like a lawyer than it is to solve actual user problems."
Andrew Haley noted that there is an active group of developers trying to improve GCC, requesting, "give us a chance." Returning to the original compiler misoptimization that started the whole discussion, he noted that a fix was being committed to all open GCC branches, "we're back-porting the patch to all open branches. However, this patch only affects one particular case where gcc introduces a data race; we're sure there are others not fixed." Andrew also noted that they were actively continuing to audit the code to find and remove similar optimization bugs.
"The problem with swap over network is the generic swap problem: needing memory to free memory. Normally this is solved using mempools, as can be seen in the BIO layer," explained Peter Zijlstra. "Swap over network has the problem that the network subsystem does not use fixed sized allocations, but heavily relies on kmalloc(). This makes mempools unusable."
The first fifteen patches set up a generic framework for reserving memory. Patches 16-23 actually put the framework to use on the network stack. Peter noted, "a network write back completion [involves] receiving packets, which when there is no memory, is rather hard. And even when there is memory there is no guarantee that the required packet comes in in the window that that memory buys us." He went on to explain, "the solution to this problem is found in the fact that network is to be assumed lossy. Even now, when there is no memory to receive packets the network card will have to discard packets. What we do is move this into the network stack." Patches 24-26 set up an infrastructure for swapping to a filesystem instead of a block device, which is then utilized by the final patches, "finally, convert NFS to make use of the new network and vm infrastructure to provide swap over NFS." When the usefulness of these patches was questioned, Peter noted, "There is a large corporate demand for this, which is why I'm doing this. The typical usage scenarios are: 1) cluster/blades, where having local disks is a cost issue (maintenance of failures, heat, etc) 2) virtualisation, where dumping the storage on a networked storage unit makes for trivial migration and what not.."
"The following patches add a new testing facility for suspend and hibernation," noted Rafael J. Wysocki. He continued, "the first patch adds the possibility to test the suspend (STD) core code without actually suspending, which is useful for tracking problems with drivers etc. The second one modifies the hibernation core so that it can use the same facility (it's a bit more powerful than the existing hibernation test modes, since they really can't test the ACPI global methods)."
The testing facility introduces a new /sys/power/pm_test_level attribute, accepting a number from 1 to 5, with each value simulating a different level of the suspend or hibernation code. Rafael explained: "5 - test the freezing of processes; 4 - test the freezing of processes and suspending of devices; 3 - test the freezing of processes, suspending of devices and platform global control methods; 2 - test the freezing of processes, suspending of devices, platform global control methods and the disabling of nonboot CPUs; 1 - test the freezing of processes, suspending of devices, platform global control methods, the disabling of nonboot CPUs and suspending of platform/system devices". He added, "if a suspend is started by normal means, the suspend core will perform its normal operations up to the point indicated by the test level. Next, it will wait for 5 seconds and carry out the resume operations needed to transition the system back to the fully functional state." Rafael noted that setting pm_test_level to 0 disables the testing facility.
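Assuming the interface works as described, a test cycle from the shell might look like the following; the write to /sys/power/state is the usual way to start a suspend, and this of course requires root and a kernel with the patches applied:

```shell
# Exercise the suspend path down through device suspend only (level 4);
# the core waits 5 seconds and resumes. Writing 0 turns testing off.
echo 4 > /sys/power/pm_test_level
echo mem > /sys/power/state
echo 0 > /sys/power/pm_test_level
```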