Linux: VM Choices; vm-33 & rmap-12i

Submitted by Jeremy
on April 9, 2002 - 7:15pm

When upgrading to the latest kernel, it's generally preferable to apply one of the two major VM patches. One of the patches is maintained by Andrea Arcangeli, the author of the current standard 2.4 kernel VM. In a recent email, Andrea refers to his vm-33 patch, recommending that you "never use a 2.4 kernel without first applying this vm patch". Andrew Morton also recently broke this patch into smaller pieces to allow for easier inclusion into the mainline kernel.

The other available patch is Rik van Riel's rmap VM. [Earlier story] Regarding this effort, Rik says, "This is an attempt at making a more robust and flexible VM subsystem, while cleaning up a lot of code at the same time." Today he released rmap-12i, now based on Marcelo's main 2.4 kernel tree. The -rmap VM is currently included in Alan Cox's -ac patches.

Emails from both Andrea and Rik follow.


From: Andrea Arcangeli
Subject: vm-33, strongly recommended [Re: [2.4.17/18pre] VM and swap - it's really unusable]
Date: 	Wed, 10 Apr 2002 01:36:09 +0200

I recommend everybody to never use a 2.4 kernel without first applying
this vm patch:

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.19pre5/vm-33.gz

It applies cleanly to both 2.4.19pre5 and 2.4.19pre6. Andrew splitted it
into orthogonal pieces for easy merging from Marcelo's side (modulo
-rest that is important too but that it's still quite monolithic, but
it's pointless to invest further effort at this time until we are
certain Marcelo will do its job and eventually merge it in mainline):

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.19pre5/

So far a first part of those patches is been merged into mainline into
pre5 (not any previous kernel, if you've some problem reproducible with
pre4 pre3 pre2 and pre1 or any previous kernel that's not related to the
async flushing changes, I seen a bogus report floating around to Marcelo
about pre1 pointing to the vm changes, it can't be the vm changes if
it's pre[1234]).

This VM is under heavy stressing for weeks on my SMP highmem machine
with a real life DBMS workload in a real life setup with huge VM
pressure with mem=1024m and 1.2G of shm pushed in swap constantly by the
kernel, performance of the workload is now very good and exactly
reproducible and constant, so I recommend it for all production systems
(both lowmem desktops and highend servers).

Alternatively you can use the whole -aa patchkit, to get all the other
critical highend features like pte-highmem, highio etc...

I haven't bugreports pending on the vm patch.

Thanks,

Andrea



From: Richard Gooch
Subject: Re: vm-33, strongly recommended [Re: [2.4.17/18pre] VM and swap - it's really unusable]
Date: 	Tue, 9 Apr 2002 18:07:50 -0600

Andrea Arcangeli writes:
> I recommend everybody to never use a 2.4 kernel without first applying
> this vm patch:
[...]

The way you write this makes it sound that the unpatched kernel is
very dangerous. Is this actually true? Or do you really just mean "the
patched kernel has better handling under extreme loads"?

				Regards,

					Richard....



From: Andrea Arcangeli
Subject: Re: vm-33, strongly recommended [Re: [2.4.17/18pre] VM and swap - it's really unusable]
Date: 	Wed, 10 Apr 2002 02:30:06 +0200

The unpatched kernel isn't dangerous in the sense it won't destroy data,
it won't corrupt memory and finally it won't deadlock on smp locks, but
it can theoretically deadlock with oom and it has various other runtime
issues starting from highmem balancing, too much swapping, lru list
balancing, related-bhs in highmem, numa broken with += min etc... so
IMHO it is better to _always_ use the patched kernel that takes care of
all problems that I know of at the moment, plus it has further
optimizations. OTOH for lots of workloads mainline is just fine, the
deadlocks never trigger and the runtime behaviour is ok, but unless you
are certain you don't need the vm-33.gz patch, I recommend to apply it.

Andrea

From: Rik van Riel
Subject: [PATCH] rmap 12i
Date: 	Tue, 9 Apr 2002 17:39:23 -0300 (BRT)

NOTE: this version is based on marcelo's bitkeeper tree, old version info has mostly been lost. This is ok because merging with Linus and Marcelo is done in functional chunks anyway and not in historical chunks.

The ninth maintenance release of the 12th version of the reverse mapping based VM is now available. This is an attempt at making a more robust and flexible VM subsystem, while cleaning up a lot of code at the same time. The patch is available from:

	http://surriel.com/patches/2.4/2.4.19p6-rmap-12i
and	http://linuxvm.bkbits.net/

My big TODO items for a next release are:
- O(1) page launder - currently functional but slow, needs to be tuned
- pte-highmem
- fine grained locking for SMP and NUMA (William Lee Irwin)

rmap 12i:
- slab cleanup (Christoph Hellwig)
- remove references to compiler.h from mm/* (me)
- move rmap to marcelo's bk tree (me)
- minor cleanups (me)
rmap 12h:
- hopefully fix OOM detection algorithm (me)
- drop pte quicklist in anticipation of pte-highmem (me)
- replace andrea's highmem emulation by ingo's one (me)
- improve rss limit checking (Nick Piggin)
rmap 12g:
- port to armv architecture (David Woodhouse)
- NUMA fix to zone_table initialisation (Samuel Ortiz)
- remove init_page_count (David Miller)
rmap 12f:
- for_each_pgdat macro (William Lee Irwin)
- put back EXPORT(__find_get_page) for modular rd (me)
- make bdflush and kswapd actually start queued disk IO (me)
rmap 12e:
- RSS limit fix, the limit can be 0 for some reason (me)
- clean up for_each_zone define to not need pgdata_t (William Lee Irwin)
- fix i810_dma bug introduced with page->wait removal (William Lee Irwin)
rmap 12d:
- fix compiler warning in rmap.c (Roger Larsson)
- read latency improvement (read-latency2) (Andrew Morton)
rmap 12c:
- fix small balancing bug in page_launder_zone (Nick Piggin)
- wakeup_kswapd / wakeup_memwaiters code fix (Arjan van de Ven)
- improve RSS limit enforcement (me)
rmap 12b:
- highmem emulation (for debugging purposes) (Andrea Arcangeli)
- ulimit RSS enforcement when memory gets tight (me)
- sparc64 page->virtual quickfix (Greg Procunier)
rmap 12a:
- fix the compile warning in buffer.c (me)
- fix divide-by-zero on highmem initialisation DOH! (me)
- remove the pgd quicklist (suspicious ...) (DaveM, me)
rmap 12:
- keep some extra free memory on large machines (Arjan van de Ven, me)
- higher-order allocation bugfix (Adrian Drzewiecki)
- nr_free_buffer_pages() returns inactive + free mem (me)
- pages from unused objects directly to inactive_clean (me)
- use fast pte quicklists on non-pae machines (Andrea Arcangeli)
- remove sleep_on from wakeup_kswapd (Arjan van de Ven)
- page waitqueue cleanup (Christoph Hellwig)
rmap 11c:
- oom_kill race locking fix (Andres Salomon)
- elevator improvement (Andrew Morton)
- dirty buffer writeout speedup (hopefully ;)) (me)
- small documentation updates (me)
- page_launder() never does synchronous IO, kswapd
and the processes calling it sleep on higher level (me)
- deadlock fix in touch_page() (me)
rmap 11b:
- added low latency reschedule points in vmscan.c (me)
- make i810_dma.c include mm_inline.h too (William Lee Irwin)
- wake up kswapd sleeper tasks on OOM kill so the
killed task can continue on its way out (me)
- tune page allocation sleep point a little (me)
rmap 11a:
- don't let refill_inactive() progress count for OOM (me)
- after an OOM kill, wait 5 seconds for the next kill (me)
- agpgart_be fix for hashed waitqueues (William Lee Irwin)
rmap 11:
- fix stupid logic inversion bug in wakeup_kswapd() (Andrew Morton)
- fix it again in the morning (me)
- add #ifdef BROKEN_PPC_PTE_ALLOC_ONE to rmap.h, it
seems PPC calls pte_alloc() before mem_map[] init (me)
- disable the debugging code in rmap.c ... the code
is working and people are running benchmarks (me)
- let the slab cache shrink functions return a value
to help prevent early OOM killing (Ed Tomlinson)
- also, don't call the OOM code if we have enough
free pages (me)
- move the call to lru_cache_del into __free_pages_ok (Ben LaHaise)
- replace the per-page waitqueue with a hashed
waitqueue, reduces size of struct page from 64
bytes to 52 bytes (48 bytes on non-highmem machines) (William Lee Irwin)
rmap 10:
- fix the livelock for real (yeah right), turned out
to be a stupid bug in page_launder_zone() (me)
- to make sure the VM subsystem doesn't monopolise
the CPU, let kswapd and some apps sleep a bit under
heavy stress situations (me)
- let __GFP_HIGH allocations dig a little bit deeper
into the free page pool, the SCSI layer seems fragile (me)
rmap 9:
- improve comments all over the place (Michael Cohen)
- don't panic if page_remove_rmap() cannot find the
rmap in question, it's possible that the memory was
PG_reserved and belonging to a driver, but the driver
exited and cleared the PG_reserved bit (me)
- fix the VM livelock by replacing > by >= in a few
critical places in the pageout code (me)
- treat the reclaiming of an inactive_clean page like
allocating a new page, calling try_to_free_pages()
and/or fixup_freespace() if required (me)
- when low on memory, don't make things worse by
doing swapin_readahead (me)
rmap 8:
- add ANY_ZONE to the balancing functions to improve
kswapd's balancing a bit (me)
- regularize some of the maximum loop bounds in
vmscan.c for cosmetic purposes (William Lee Irwin)
- move page_address() to architecture-independent
code, now the removal of page->virtual is portable (William Lee Irwin)
- speed up free_area_init_core() by doing a single
pass over the pages and not using atomic ops (William Lee Irwin)
- documented the buddy allocator in page_alloc.c (William Lee Irwin)
rmap 7:
- clean up and document vmscan.c (me)
- reduce size of page struct, part one (William Lee Irwin)
- add rmap.h for other archs (untested, not for ARM) (me)
rmap 6:
- make the active and inactive_dirty list per zone,
this is finally possible because we can free pages
based on their physical address (William Lee Irwin)
- cleaned up William's code a bit (me)
- turn some defines into inlines and move those to
mm_inline.h (the includes are a mess ...) (me)
- improve the VM balancing a bit (me)
- add back inactive_target to /proc/meminfo (me)
rmap 5:
- fixed recursive buglet, introduced by directly
editing the patch for making rmap 4 ;))) (me)
rmap 4:
- look at the referenced bits in page tables (me)
rmap 3:
- forgot one FASTCALL definition (me)
rmap 2:
- teach try_to_unmap_one() about mremap() (me)
- don't assign swap space to pages with buffers (me)
- make the rmap.c functions FASTCALL / inline (me)
rmap 1:
- fix the swap leak in rmap 0 (Dave McCracken)
rmap 0:
- port of reverse mapping VM to 2.4.16 (me)

Rik
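
The rmap 11 entry above mentions replacing the per-page waitqueue with a hashed waitqueue, which shrinks struct page. The general idea can be sketched as a toy model in Python rather than kernel C: instead of every page carrying its own wait queue, pages are hashed into a small shared table of queues. Everything here is an assumption made for illustration (the names `HashedWaitTable` and `bucket_for`, the table size, and the multiplicative hash); it is not taken from the rmap patch itself.

```python
# Toy model of a hashed wait queue table (illustrative only, not kernel code).
from collections import defaultdict

HASH_BITS = 3                 # assumed: 2**3 = 8 shared queues
NUM_BUCKETS = 1 << HASH_BITS


def bucket_for(page_index: int) -> int:
    """Map a page to one of the shared wait queues."""
    # 32-bit multiplicative hash, keeping the top HASH_BITS bits.
    # The specific hash is an assumption for this sketch.
    return ((page_index * 2654435761) & 0xFFFFFFFF) >> (32 - HASH_BITS)


class HashedWaitTable:
    def __init__(self):
        # One waiter list per bucket instead of one per page.
        self.buckets = defaultdict(list)

    def wait_on(self, page_index, task):
        """Queue `task` as waiting on `page_index`."""
        self.buckets[bucket_for(page_index)].append((page_index, task))

    def wake(self, page_index):
        """Remove and return the tasks waiting on `page_index`.

        Waiters for other pages that happen to share the bucket stay queued.
        """
        b = bucket_for(page_index)
        woken = [t for p, t in self.buckets[b] if p == page_index]
        self.buckets[b] = [(p, t) for p, t in self.buckets[b] if p != page_index]
        return woken
```

The trade-off in this sketch is that unrelated pages can land in the same bucket, so each waiter has to remember which page it is waiting for; in exchange, the per-page structure no longer needs to hold a queue of its own.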

What VM patches do major distributions use?

gncuster
on
April 9, 2002 - 10:26pm

I know that Redhat uses the rmap VM, but what do other major distributions use?

suse

jcopenha
on
April 10, 2002 - 7:27am

surprise.. but I believe that SuSE uses the -aa vm ... (aa works for SuSE)

mandrake, debian, and conectiva

gncuster
on
April 10, 2002 - 11:02am

Well I did some digging of my own. Mandrake uses the aa vm; from kernel-2.4.spec:

- andrea vm_25 (from 2.4.18-rc2aa2).

Debian it appears uses a stock kernel.

Conectiva uses the rmap (Rik works for conectiva)

interesting.

Alan Cox works for Red Hat

Anonymous
on
April 10, 2002 - 1:36pm

Basically the distributions use whatever their celebrity hacker says is best.

;)

Or...

Cabal
on
April 10, 2002 - 2:06pm

It's more likely that distributions use whichever is suited to their purpose. Mandrake and SuSE, more often used as desktop distributions, patch for the -aa VM (faster, considered less robust), whereas RedHat, aimed at the server market, runs a patched -rmap kernel. Debian, sitting in the middle ground, uses a vanilla kernel. Probably not, but it sounded nice to me.

Debian kernel

Anonymous
on
April 11, 2002 - 2:05am

Debian does use an almost vanilla kernel, but there are always a few patches (like the removal of a double free in zlib).

Celebrity Hackers

gncuster
on
April 10, 2002 - 2:43pm

Who is the celeb hacker at mandrake?

Celebrity Hackers

Anonymous
on
April 10, 2002 - 3:17pm

jgarzik

Re: mandrake, debian, and conectiva

nimrod
on
April 10, 2002 - 4:02pm

>Debian it appears uses a stock kernel.

true; but with debian's painfully slow release cycle, the latest release (potato) is more than a year old; It does use a vanilla kernel (i don't know which kernel woody (the next release, supposed to be out by may) will be using, though).

BUT, potato uses the 2.2 kernel, which came before any of the VM splits; so Debian's kernel is not relevant to this discussion (i think ;D), since (afaik) there was only 1 VM for the 2.2.x kernels.

well, woody will be using 2.2

Anonymous
on
April 11, 2002 - 12:43am

well, woody will be using 2.2 STILL. because they said they don't feel 2.4 is stable yet. maybe 8 months ago it wasn't.. but now it is stable and boots machines that 2.2 will never boot ever (dual athlons for instance i guess).

http://www.debian.org/releases/testing/i386/release-notes/ch-whats-new.e...

Debian GNU/Linux 3.0 for the Intel x86 architecture ships with kernel version 2.2.20.

The 2.2 kernel series has been updated and developed extensively introducing several valuable changes both in the kernel and in other programs based on kernel features, along with a whole slew of new hardware drivers and bug fixes for existing drivers.

A 2.4 kernel is also included in this release for optional installation by users. Although the 2.4 branch is considered by the kernel developers to be a stable kernel branch, the Debian GNU/Linux release team judged it not to have reached sufficient maturity for inclusion as the default kernel in this release.

2.2 kernel is only the default in woody

Anonymous
on
April 16, 2002 - 6:59am

About a year ago i tried to install debian on my Athlon with an HPT370A raid controller, at the time there was no support at all in debian to install using a 2.4 kernel unless you made your own boot disks.... something i was not able to do.... i just did an install of debian woody using a bootable woody ISO that allows you to choose from the basic 2.2 the ide-pci and the bf2.4 kernels, after choosing the bf2.4 kernel the install went amazingly well. so while debian may not have chosen 2.4 as the default kernel the 2.4 kernel is in there for sure.

Not just i386

Anonymous
on
July 20, 2002 - 9:33am

Don't forget: for the kernel to be considered "stable" by Debian, it must be stable on *all* of Debian's supported architectures, not just i386.

not exactly

Anonymous
on
July 21, 2002 - 8:49am

kernel 2.4 is the default kernel on some architectures. gcc 3/3.1 is the default compiler, on some architectures. And so on.

As I recall, the Debian problem was pretty much this VM issue. Stock kernels suck, and Debian uses only stock kernels with minor patches (for whatever reason).

Maybe once 2.4 actually stabilizes (what, 20 releases after 2.4.0? ick)

Diversity & standards

Anonymous
on
April 10, 2002 - 11:35am

Hi,
I'm not a kernel hacker, so don't expect anything technical.
This comment is posted so as to be helpful: I don't want to start nothing. Pardon me in advance for unintended offenses.

The idea is simple: if possible, I ask both developers -- AA and Rik -- to keep an eye on a future unforking. That is, prepare things for a future occasion when we all acknowledge that one of these VMs has proved itself better, while the other still shows some desirable characteristic, albeit not favoured by the majority.

Nothing is 100% perfect, and an eventual VM winner will need to incorporate some feature from the loser one.

The same happens, IMHO, with toolkits (GTK, FLTK, QT) etc. and with Gnome/KDE.

Nothing ever wins really, but we gain insight from every different approach. Whenever possible, it is helpful to consolidate the ideas and formalize a standard -- even in the worst scenario, when the standard says "use this VM in some cases, and the other VM in other occasions."

Thanks for your time.

More diversity (multiple standards)

Anonymous
on
July 20, 2002 - 2:49am

I'm no kernel guru either, but I dispute your assessment of the GUI toolkits and desktops. There's a good reason that GTK, FLTK, and QT haven't been sucked into one Grand Unified Toolkit: each has a very different design and different uses.

Qt has features that make it very good for desktop GUIs, but at the same time it depends on C++ and can be very memory hungry. GTK has interfaces for many languages, but it's still primarily a desktop GUI. FLTK is designed to be smaller than the other two, so it's finding a home in embedded systems.

Even something like PicoGUI that tries to scale to many different applications and languages wouldn't be applicable in all situations, mainly because of API compatibility.

In a kernel, the phenomenon isn't as severe as in GUI toolkits, since all the VMs must conform to the same interface standard. But, the same arguments of scalability probably apply.

On another note...

Cabal
on
April 10, 2002 - 10:48pm

Shouldn't the title be vm-33? ;-)
