Mel Gorman posted the seventh version of his Memory Compaction patches asking, "are there any further obstacles to merging?" The patches, first posted in May of 2007, provide a mechanism for moving GFP_MOVABLE pages into a smaller number of pageblocks, reducing externally fragmented memory. Mel explains that 'compaction' is another method of defragmenting memory, "for example, lumpy reclaim is a form of defragmentation as was slub 'defragmentation' (really a form of targeted reclaim). Hence, this is called 'compaction' to distinguish it from other forms of defragmentation."
The core compaction patch explains that memory is compacted in a zone by relocating movable pages towards the end of the zone:
"A single compaction run involves a migration scanner and a free scanner. Both scanners operate on pageblock-sized areas in the zone. The migration scanner starts at the bottom of the zone and searches for all movable pages within each area, isolating them onto a private list called migratelist. The free scanner starts at the top of the zone and searches for suitable areas and consumes the free pages within making them available for the migration scanner. The pages isolated for migration are then migrated to the newly isolated free pages."
"We have seen ramdisk based install systems, where some pages of mapped libraries and programs were suddendly zeroed under memory pressure. This should not happen, as the ramdisk avoids freeing its pages by keeping them dirty all the time," Christian Borntraeger began, explaining the need for his small patch to the ramdisk driver. He continued, "it turns out that there is a case, where the VM makes a ramdisk page clean, without telling the ramdisk driver. On memory pressure shrink_zone runs and it starts to run shrink_active_list. There is a check for buffer_heads_over_limit, and if true, pagevec_strip is called. pagevec_strip calls try_to_release_page. If the mapping has no releasepage callback, try_to_free_buffers is called. try_to_free_buffers has now a special logic for some file systems to make a dirty page clean, if all buffers are clean. Thats what happened in our test case."
He provided two methods for reproducing the reported problem: "you have to make buffer_heads_over_limit true," which can be done either by lowering max_buffer_heads or by using a system with lots of high memory. He then described the fix: "The solution is to provide a noop-releasepage callback for the ramdisk driver. This avoids try_to_free_buffers for ramdisk pages."
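The effect of the no-op callback can be sketched with a small model of the dispatch described above. This is an illustration only: the Python functions borrow the kernel function names but are not the kernel code, the `Page` class is hypothetical, and the callback's return value (refusing the release) is an assumption.

```python
class Page:
    """Hypothetical stand-in for a page with attached buffers."""
    def __init__(self, dirty, buffers_dirty):
        self.dirty = dirty
        self.buffers_dirty = list(buffers_dirty)


def try_to_free_buffers(page):
    # Models the special case from the quote: if all buffers are clean,
    # the dirty page itself is made clean.
    if not any(page.buffers_dirty):
        page.dirty = False
        return True
    return False


def try_to_release_page(page, releasepage=None):
    # A mapping with a releasepage callback handles the request itself;
    # otherwise the generic buffer path is taken.
    if releasepage is not None:
        return releasepage(page)
    return try_to_free_buffers(page)


def ramdisk_releasepage(page):
    # No-op callback (assumed to refuse the release): the page is untouched.
    return False
```

Without a callback, a dirty ramdisk page whose buffers are all clean is marked clean and becomes a candidate for freeing. With the no-op callback in place, `try_to_free_buffers` is never reached and the page stays dirty.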
"The current VM can get itself into trouble fairly easily on systems with a small ZONE_HIGHMEM, which is common on i686 computers with 1GB of memory," Rik van Riel said explaining a small patch to cmscan.c. He continued, "on one side, page_alloc() will allocate down to zone->pages_low, while on the other side, kswapd() and balance_pgdat() will try to free memory from every zone, until every zone has more free pages than zone->pages_high." He noted that highmem could be filled up with "page tables, ramfs, vmalloc allocations and other unswappable things quite easily and without many bad side effects, since we still have a huge ZONE_NORMAL to do future allocations from. However, as long as the number of free pages in the highmem zone is below zone->pages_high, kswapd will continue swapping things out from ZONE_NORMAL, too! Sami Farin managed to get his system into a stage where kswapd had freed about 700MB of low memory and was still 'going strong'." He described his patch:
"The attached patch will make kswapd stop paging out data from zones when there is more than enough memory free. We do go above zone->pages_high in order to keep pressure between zones equal in normal circumstances, but the patch should prevent the kind of excesses that made Sami's computer totally unusable."
Andrew Morton posted an overview of patches in -mm, discussing what is destined for inclusion in the upcoming 2.6.18 Linux kernel. He noted, "there is an unusually large amount of difficult material here." Patch sets that were discussed include a cleanup of kernel headers, klibc, various subsystem cleanups, the ACX1xx wireless driver, swsusp cleanups, per-task statistic metrics, a clocksource management infrastructure, smpnice, swap prefetching, priority-inheriting futexes, a revamp of /proc/pid, ecryptfs, utsname virtualization, readahead, reiser4 improvements, a statistics infrastructure, and lock validation code.
Two features covered earlier on KernelTrap, swap prefetching and utsname virtualization, were briefly discussed. Regarding swap prefetching, Andrew noted, "I remain skeptical, but I have a lot of RAM. Multiple people have sung its praises. I guess I'll re-review and tentatively plan on sending them along for 2.6.18. Opinions are sought." As for utsname virtualization, "this doesn't seem very pointful as a standalone thing. That's a general problem with infrastructural work for a very large new feature. So probably I'll continue to babysit these patches, unless someone can identify a decent reason why mainline needs this work. I don't want to carry an ever-growing stream of OS-virtualisation groundwork patches for ever and ever so if we're going to do this thing... faster, please."
As RAM increasingly becomes a commodity, prices drop and computer users are able to buy more of it. 32-bit architectures face certain limitations in accessing these growing amounts of RAM. To better understand the problem and the various solutions, we begin with an overview of Linux memory management. With an understanding of how basic memory management works, we are better able to define the problem, and finally to review the various solutions.
This article was written by examining the Linux 2.6 kernel source code for the x86 architecture types.