We're using a lot of different kernel configurations on our servers running Gentoo Linux and Linux VServer. Kernels' configs depend on hardware and functions of every server. Every time new kernel is released and/or we need to enable/disable some features for all servers' kernels we have to update .config's manually.
"Or, we could just do the ugliest patch ever, namely
-#define pcibus_to_node(node) (-1) +#define pcibus_to_node(node) ((int)(long)(node),-1)
Wow. It's so ugly it's almost wraps around and comes out the other side and looks pretty."
"I bow down before you. I thought I had done some rather horrible things with gcc built-ins and macros, but I hereby hand over my crown to you. As my daughter would say: that patch fell out of the ugly tree, and hit every branch on the way down. Very impressive."
"'Too small' and 'too insignificant' is not a patch attribute that is in my vocabulary - by definition the best patches are very small and very insignificant (they just happen to end up doing something cool a 1000 steps later ;-) 99% of our problems come from 'too large' and 'too intrusive' patches."
"The __deprecated marker is quite useful in highlighting the remnants of old APIs that want removing. However, it is quite normal for one or more years to pass, before the (usually ancient, bitrotten) code in question is either updated or deleted," explained Jeff Garzik, posting a small patch to make it possible to silence "warning: 'foo' is deprecated" type messages.
Andrew Morton wasn't impressed, suggesting, "Sigh. Can't we just fix the dud code? Or mark it BROKEN and see what happens?" Linus Torvalds added, "I think removing __deprecated is the better option. Quite frankly, some people add '__deprecated' *way* too eagerly." Jeff agreed that __deprecated is over used, "__deprecated has spread to just about every API that people don't consider fresh and up-to-date." He then added ,"like I noted in the patch description, rewriting grotty ISA/MCA/etc. probe code is a thankless, boring task that few are crazy enough to attempt :) As you can see from the patch flood recently I /have/ been working through the dud code, but it will still take years. The changes required for each are on average ~200 LOC changed, if not more."
Jeff Garzik posted a series of nine patchs to the lkml titled to "remove [the] 'irq' argument from all irq handlers", explaining, "the overwhelming majority of drivers do not ever bother with the 'irq' argument that is passed to each driver's irq handler. Of the minority of drivers that do use the arg, the majority of those have the irq number stored in their private-info structure somewhere." He noted that he had no intention to push the patches upstream anytime soon.
Feedback was entirely positive, with Thomas Gleixner suggesting, "Full ACK. We should do this right at the edge of -rc1. And let's do this right now in .24 and not drag it out for no good reason." Ingo Molnar concurred, "full ACK on the concept from me too. Please go ahead! :)" Eric Biederman noted that there was still work to be done, "the practical question is how do we make this change without breaking the drivers that use their irq argument." Jeff agreed, explaining why the code won't be pushed upstream during -rc1, "I am finding a ton of bugs in each get_irqfunc_irq() driver, so I would rather patiently sift through them, and push fixes and cleanups upstream. Once that effort is done, everything should be in the 'trivial' pile and not have the logic that you are worried about (and thus there would be no need to add an additional branch to the irq handling path)."
"We have seen ramdisk based install systems, where some pages of mapped libraries and programs were suddendly zeroed under memory pressure. This should not happen, as the ramdisk avoids freeing its pages by keeping them dirty all the time," Christian Borntraeger began, explaining the need for his small patch to the ramdisk driver. He continued, "it turns out that there is a case, where the VM makes a ramdisk page clean, without telling the ramdisk driver. On memory pressure shrink_zone runs and it starts to run shrink_active_list. There is a check for buffer_heads_over_limit, and if true, pagevec_strip is called. pagevec_strip calls try_to_release_page. If the mapping has no releasepage callback, try_to_free_buffers is called. try_to_free_buffers has now a special logic for some file systems to make a dirty page clean, if all buffers are clean. Thats what happened in our test case."
He provided two methods for duplicating the reported problem, "you have to make buffer_heads_over_limit true" This is done by either lowering max_buffer_heads or having a system with lots of high memory. "The solution is to provide a noop-releasepage callback for the ramdisk driver. This avoids try_to_free_buffers for ramdisk pages."
"The following patch makes it possible to give kernel messages a selectable color which helps to distinguish it from other noise, such as boot messages," explained Jan Engelhardt, along with a 43 line patch to the char driver. As justification for the patch he offered, "NetBSD has it, OpenBSD has it, FreeBSD to some extent, so I think Linux should too." He also noted that an earlier version of the small patch had been previously posted back in April.
Ingo Molnar responded favorably, "looks really good to me!" He went on to suggest, "feature request: would be interesting to have a color table (defined in the .config) dependent on message loglevel. That way KERN_CRIT messages could be red, KERN_INFO ones white, etc."
Greg KH posted three emails titled State of the Linux Driver Core Subsystem, State of the Linux USB Subsystem, and State of the Linux PCI Subsystem, noting that for each there were no known regressions then going on to list which patches were bound for the upcoming 2.6.24 kernel. Greg pointed out that there were a large number of open bugs against the USB subsystem, "yeah, there are way too many there, I've been really slack in trying to work through them. If anyone wants to help out, feel free :)" He continued:
"Note, there are over 100 patches in the USB queue, so I might have missed a few things in reviewing them by hand right now. If I failed to describe your patch that is already in the queue, and you feel it is important for everyone to know about, please feel free to add to the above list. I did not purposefully mean to exclude anything, merely try to summarize things"
"Here's a new version of my credentials patch. It's still very basic, with only Ext3, (V)FAT, NFS, AFS, SELinux and keyrings compiled in on an x86_64 arch kernel," stated David Howells. He described the patch as, "introduce a copy on write credentials record (struct cred). The fsuid, fsgid, supplementary groups list move into it (DAC security). The session, process and thread keyrings are reflected in it, but don't primarily reside there as they aren't per-thread and occasionally need to be instantiated or replaced by other threads or processes."
Casey Schaufler asked, "what I don't really understand is what value is gained by this exercise. Are the savings sufficiently significant to justify the effort?" Trond Myklebust explained, "it is not about savings, but about new functionality. Basically, the existence of reference-counted credentials will allow AFS and NFS to cache that information and use it for deferred writes etc." David added, "and also make it easier for cachefiles and hopefully NFSd to override the active security. There's a comment somewhere in, I think, the SunRPC code in the Linux kernel bemoaning the lack of this very feature:-)"
Randy Dunlap sent a patch to the Linux kernel mailing list described as adding "info about various email clients and their applicability in being used to send Linux kernel patches." The first revision generated quite a bit of discussion, quickly resulting in a second version, and eventually a third version that Andrew Morton referred to as "soon to be merged". In addition to some general suggestions about emailing patches, it also offers some specific configuration suggestions for a number of popular email clients. It begins:
"Patches for the Linux kernel are submitted via email, preferably as inline text in the body of the email. Some maintainers accept attachments, but then the attachments should have content-type 'text/plain'. However, attachments are generally frowned upon because it makes quoting portions of the patch more difficult in the patch review process.
"Email clients that are used for Linux kernel patches should send the patch text untouched. For example, they should not modify or delete tabs or spaces, even at the beginning or end of lines."
"This patch implements a new version of RCU which allows its read-side critical sections to be preempted," Paul McKenney said in a posting to the Linux Kernel mailing list. He described the RCU code contained in his 9 part patchset, "it uses a set of counter pairs to keep track of the read-side critical sections and flips them when all tasks exit [the] read-side critical section." He continued:
"This patch was developed as a part of the -rt kernel development and meant to provide better latencies when read-side critical sections of RCU don't disable preemption. As a consequence of keeping track of RCU readers, the readers have a slight overhead (optimizations in the paper). This implementation co-exists with the 'classic' RCU implementations and can be switched to at compiler."
Ingo Molnar responded very favorably to the patch, "cool! Now after 2 years of development and testing I think this is one of the most mature patchsets on lkml - so i'd like to see this designated for potential upstream inclusion. I.e. everyone who can see some bug, please holler now."
"The cfs core has been enhanced since quite sometime now to understand task-groups and [to] provide fairness to such task-groups," began Srivatsa Vaddagiri, "what was needed was an interface for the administrator to define task-groups and specify group 'importance' in terms of its cpu share. The patch below adds such an interface."
Srivatsa requested that his patch be merged into Andrew Morton's -mm tree to receive more testing, "note that the load balancer needs more work, esp to deal with cases like 2-groups on 4-cpus, one group has 3 tasks and other having 4 tasks. We are working on some ideas, but nothing to share in the form of a patch yet. I felt sending this patch out would help folks start testing the feature and also improve it."
"The objective of this patchset is to keep the system in a state where actions such as page reclaim or memory compaction will reduce external fragmentation in the system," Mel Gorman described his set of thirteen patches labeled "reduce external fragmentation by grouping pages by mobility v30". He explained, "it works by grouping pages of similar mobility together in PAGEBLOCK_NR_PAGES areas." He defined four mobility types as: "UNMOVABLE - Pages that cannot be trivially reclaimed or moved; MOVABLE - Pages that can be moved using the page migration mechanism; RECLAIMABLE - Pages that the kernel can often directly reclaim such as those used for inode caches; RESERVE - The areas where min_free_kbyte-related pages should be stored". Mel added:
"This grouping clearly requires additional work in the page allocator. kernbench shows effectively no performance difference varying between -0.2% and +1% on a variety of test machines. Success rates for huge page allocation are dramatically increased. For example, on a ppc64 machine, the vanilla kernel was only able to allocate 1% of memory as a hugepage and this was due to a single hugepage reserved as min_free_kbytes. With these patches applied, 40% was allocatable as superpages."
Ingo Molnar announced that the real time patchset [story] that he and Thomas Gleixner maintain is now available as a series of 374 broken out patches, "from now on (as of 2.6.22.1-rt2) it will be part of every upstream -rt release and it is available from the -rt download site". Regarding the patches, he notes that it's responsible for, "698 files changed, 27920 insertions(+), 9603 deletions(-)", going on to note, "which is impressive as we moved a huge chunk of -rt into mainline already ;-) The series file is attached below.". Ingo explains:
"the purpose of this finegrained splitup is to foster (even ;-) quicker upstream integration of various -rt features, and to see the full -rt tree integrated upstream. We also hope that this split-up queue helps various vendors standardize their (currently quite splintered) real-time implementations to the upstream -rt patchset. The queue is not (yet) bisectable at every point, and many of the splits are thematic, to allow the simpler future handling of updates."