"This may count as one of the biggest -rc releases ever. It's humongous. Usually the compressed -rc1 diffs are in the 3-5MB range, with occasional smaller ones, and the occasional ones that top 6M, but this one is *eleven* megs," Linus Torvalds announced the first release candidate of the upcoming 2.6.24 kernel. He summarized some of the changes, "in short, we just had an unusually large amount of not just x86 merges, but also tons of new drivers (wireless networking stands out, but is by no means the only thing - we've got dvb, regular wired network, mmc etc all joining in), and a fair amount or architecture stuff, filesystems, networking etc too." He added:
"In other words, I don't even know where to start. The big noticeable thing is the x86 merge, and I think we all fervently hope that it won't cause any issues. So far it's been pretty smooth sailing. Knock wood. Less smooth has the scatter-gather changes to the block layer been, but they are hopefully all in reasonable shape by now too. And the VM changes? I honestly hope nobody even notices. Same goes for some of the VFS layer changes that affected basically every filesystem (although in mostly very straightforward ways)."
"I think the SG stuff looks ok now, but I think we have a lot of 'fix up the rough edges' to go!" Linus Torvalds noted regarding some of the fallout from the recent merge of Jens Axboe's SG chaining patchset. During one of the many discussions, Jens explained:
"It's all about the end goal - having maintainable and resilient code. And I think the sg code will be better once we get past the next day or so, and it'll be more robust. That is what matters to me, not the simplicity of the patch itself."
Boaz Harrosh commented, "thanks Jens for doing all this. The performance gain is substantial and we will all enjoy it." Jens replied, "my pleasure, I just wish it could have been a little less painful. But in a day or two, it should all be behind us and we can move forward with making good use of it."
Jens Axboe detailed the changes in his linux-2.6-block.git tree that he plans to merge into the upcoming 2.6.24 kernel. Among the changes were the updates needed to enable SG chaining, which is used for large IO commands: "the goal of sg chaining is to allow support for very large sgtables, without requiring that they be allocated from one contiguous piece of memory." Andrew Morton asked for more information, "presumably sg chaining means more overhead on the IO submission paths? If so, has this been quantified?"
Jens explained that there is no overhead for existing logic that doesn't use sg chaining, "just cleanups to drivers to use for_each_sg() and so on." He continued:
"For actually using the sg chaining, there's some overhead of course. Say we support 256 entries without chaining, or 1mb with 4kb pages. A request with 1000 entries would require 4 trips to the allocator to allocate the chainable lists and 4 trips when freeing that list again. We don't loop the sg list on setup or freeing, just jump to the correct locations. So even for chaining, the cost isn't that big. It enables us to support much larger IO commands and potentially speed up some devices quite a lot, so CPU cost is less of a concern. And for small sglists, there isn't a noticeable overhead."
Jens Axboe posted a series of ten patches that add support for large IO commands. He began by defining the problem:
"Some people complain that Linux doesn't support really large IO commands. The main reason why we do not support infinitely sized IO is that we need to allocate a scatterlist to fill these elements into for dma mapping. The Linux scatterlist is an array of scatterlist elements, so we need to allocate a contiguous piece of memory to hold them all. On i386, we can at most fit 256 scatterlist elements into a page, and on x86-64 we are stuck with 128. So that puts us somewhere between 512kb and 1024kb for a single IO."
Jens went on to explain his solution, "to get around that limitation, this patchset introduces an sg chaining concept. The way it works is that the last element of an sg table can point to a new sgtable, thus extending the size of the total IO scatterlist greatly." Regarding the current status he noted, "it works for me, but you can't enable large commands on anything but i386 right now. I still need to go over the x86-64 iommu bits to enable it there as well."
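The chaining idea can be sketched in plain userspace C. This is not the kernel's scatterlist implementation: `struct sg_entry`, `sg_entry_next()` and `sg_total_len()` are hypothetical stand-ins, used only to show how a link in the last slot of one table lets a walk continue into the next table.

```c
#include <stddef.h>

/* Simplified stand-in for a scatterlist element; not the kernel's struct. */
struct sg_entry {
    void *buf;                   /* data segment (unused in a chain slot) */
    size_t len;                  /* segment length in bytes */
    int is_chain;                /* nonzero: this slot links to another table */
    struct sg_entry *next_table; /* valid only when is_chain is set */
};

/* Step to the following data entry, hopping across a chain link if needed. */
struct sg_entry *sg_entry_next(struct sg_entry *sg)
{
    sg++;
    if (sg->is_chain)
        sg = sg->next_table;
    return sg;
}

/* Sum the lengths of nents data entries, starting at sg. */
size_t sg_total_len(struct sg_entry *sg, size_t nents)
{
    size_t total = 0;

    while (nents--) {
        total += sg->len;
        if (nents) /* never step past the final data entry */
            sg = sg_entry_next(sg);
    }
    return total;
}
```

Because the tables are no longer one contiguous array, code that indexes entries directly as `sg[i]` would walk into the link slot. Going through a stepping helper instead, as in this sketch, is presumably why drivers needed converting to an iterator such as for_each_sg().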