"Any time the OOM killer fires, something's wrong with the system, and it's more productive to deal with that than to wish for a more accurate OOM killer."
Has there been any research into alternative approaches, such as throttling processes that are allocating memory too quickly, or inducing paging too often?
As has been said, the OOM killer isn't there to fix a broken system; it's there so that a broken system remains recoverable/debuggable. Currently, by the time the OOM killer kicks in, the average desktop user will have lost all patience and already flicked the reset switch, because one process was allowed to push everything else completely out of RAM.
And those other processes were pushed out because they were using their memory only rarely compared to the broken application. So one could say that Linux currently lets a broken application starve out everything else, by not giving anything else a chance to work anymore.
But in that pathological scenario, the reverse makes much more sense -- penalizing the memory hog, not the interactive processes that use resources sparingly.
For instance, if an application starts allocating memory and swapping at an insane rate, it makes little sense to swap out/discard X11's pages as a result, because (a) the amount of recoverable memory is probably small; (b) X11 is absolutely critical in allowing the user to terminate the misbehaving process in the first place.
The alternative is to let the misbehaving process run into the swap as much as it wants, while still keeping the working sets of well-behaving processes in RAM.
Anyway, this is just an idea, I'm not sure how applicable it is in real life.
Any time the OOM killer fires, something's wrong with the system, and ... the system will probably take minutes to respond to each key press, which is exactly why the OOM killer was created in the first place - to kill the largest memory hog and free that memory so someone can begin figuring out what went wrong.
I'd recommend monitoring/logging/profiling/tracing the memory hog *before* it transforms the system into molasses.
The problem is that the OOM killer doesn't actually help fix my system once it's gone to molasses. I mean, if my choices are (a) system so slow because Firefox sucks weiner that I can't use my actually important apps or (b) OOM killer kicks in because Firefox sucks weiner and my actually important apps get killed, then I'm going to choose (c) just reboot the computer and be ashamed that my roommate's XP system never, ever has this kind of stupid-ass problem.
Quite simply, Linux shouldn't ever get slow as molasses, and the OOM killer shouldn't even exist. No single user-space app should be able to effectively kill my desktop by way of swapping everything out to disk.
I don't know what sort of system you have, but in all my linux usage, I think I've seen the OOM killer in action once - there was a bad java app that wanted to just suck up all the memory as fast as possible. But in day to day linux usage, I just don't see the OOM killer, ever - and I use linux all day every day at work, and at home.
Sure, I've seen firefox go catatonic - but I haven't seen it trigger the OOM killer yet. I restart firefox and life goes on.
BTW, expee "never ever" has memory problems? ROTFLMAO! Good thing I wasn't drinking coffee or the keyboard would have been sprayed. I've heard too many expee horror stories to believe that hype!
Actually, it is possible to get your XP system running as slow as molasses--it's simply a matter of exhausting a critical resource. Critical resources are generally CPU time, memory, and/or I/O bandwidth. The OOM killer is only focused on memory. The OOM killer is not intended to fix your system when you are out of memory. Rather, it is intended to free up enough memory so you can interact with the system to see what's going on.
You're very misinformed that XP systems don't have that problem. Memory isn't magical, it doesn't grow on trees. Completely exhaust the memory on _any_ system and it will "get slow as molasses", grind to a halt or crash.
I see this all too frequently when Visual Studio sucks up 600MB on this machine with 1GB of RAM. It starts paging so badly that it can take a good 20 seconds just to get IE to come to focus after being minimized. The biggest difference with XP is that it will (under default configuration) keep autoextending your paging (swap) file for you. That maybe removes any need for a true OOM killer, but it doesn't exactly fix the problem either. Drive XP hard enough and it will grind so hard it appears locked up.
So really you should fix your system. Add more memory, throw in more swap, set ulimits on known memory hogs (a rough sketch of the ulimit approach follows below). Or just disable the OOM killer:
echo 0 > /proc/sys/vm/oom-kill
or, to make it persistent, in /etc/sysctl.conf:
vm.oom-kill = 0
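For the ulimit suggestion, a minimal sketch of what that could look like (the wrapper name, the choice of Firefox, and the 512MiB figure are all just illustrative, not anything your distro ships):

#!/bin/sh
# hypothetical wrapper, e.g. saved as ~/bin/firefox-limited
# cap the virtual address space of this shell and everything it execs;
# ulimit -v takes KiB, so 524288 KiB = 512 MiB
ulimit -v 524288
exec firefox "$@"

The same cap can be applied per user via /etc/security/limits.conf, but a wrapper keeps it scoped to the one known hog.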
I don't think you read what I said. My roommate's system never, ever has this problem. I don't care if it's possible for XP to run out of memory in theory, because it never fucking happens on his machine, and he runs just as much crap (including Firefox) as I do. My roommate has been using XP with no virus scanner or other bloat for longer than this particular install of Ubuntu has been here (1.5 years), and he's not even once had a system crash, a locked up desktop, a virus or other malware, or any problems other than some trouble getting our printer working (and surprise, Linux can't print to it correctly either, it always cuts off the top 1" of every page, due to hpijs driver bugs that nobody's fixed in a year). If you aren't a jackass moron who installs random crap off the web and you use hardware with stable drivers, XP works faster and more stably than any Linux desktop I have ever had. I don't and won't use it, but I'll tell you right now I get _real_ jealous sometimes thinking of all the unstable crap I have to put up with on Linux.
Well, I have used more than four or five XP systems where an external USB drive doing continuous I/O for more than a couple of minutes will kill any and all network connections. At least three of those had completely different hardware (CPU, chipset, USB host chip, network controller etc.) - the only common factor was the OS. I guess everyone's experience varies. I do find it interesting that most people mention Firefox as the main culprit for their problems - next thing we know Firefox will have a --timedemo option, right next to a --showfps one...
Have you ever thought about installing XP? Unless you develop for Linux, it's definitely worth it, even if you do have to pay some cash for it. Heck, even if I did develop for Linux, I would consider running VMWare under XP with a few different Linux VMs. Linux works perfectly in my house, but only in a server/router role. I have one desktop that has Linux installed on it, but it is a second PC in my bedroom that I just use to screw around on -- and it is great for that.
Then configure your system accordingly and don't allow these apps to grow beyond the memory limits you set.
Awesome idea, too bad that isn't even remotely possible. Let's assume I've already set my user process limits to 512MiB on this machine, which has 2GiB of RAM. Awesome, now Firefox can consume at most a quarter of my machine's memory, right? Well, no, actually, that's not true at all. Because, see, with the great architecture we call X, a process's pixmap data isn't actually owned by the process, it's owned by the X process. So, after browsing for long enough on photo gallery sites or other media-heavy sites (like, say, the ones I work on for a living), Firefox is still using less than that 512MiB limit but X is now chewing up gigabytes. Subtract the 512MiB of that which is really my graphics card's memory and you still have a memory problem. One which eventually drives the machine into swap-death.
Sure, I could set a limit on X, but then when that limit is reached you either end up with X dying (which is for all intents and purposes no different than just rebooting the machine, since all of your apps and working data go bye-bye) or with X no longer able to allocate more pixmap memory, which means all of your other apps still become effectively dead, since most interesting apps do a lot of pixmap allocation, even for simple things like text glyphs.
So, until there is some way for X to tell the kernel that some amount of its internal memory should count toward a process's memory limits, there is actually no feasible way to limit the total amount of memory a process causes your system to consume. Add in the pixmap leaks in Firefox and even the occasional pixmap leak in X, and it's only a matter of time before your system runs out of memory if you don't restart Firefox regularly.
Sure, I could set a limit on X, but then when that limit is reached you either end up with X dying [...] or with X no longer able to allocate more pixmap memory,
X should never die because of low resources. If it does, that's a very serious bug, and one that you should immediately report to X.Org.
As you justly note, when it's out of memory, X will reject new allocations by returning a BadAlloc error to client applications. Ideally, the X server itself should be able to enforce per-client resource limits. Most of the work needed to do that was done by Mark Vojkovich a few years ago (have a look at the XRes extension); all that's left is just a little bit of hacking.
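Worth noting in passing: the measurement half of this already exists. The xrestop utility uses the XRes extension to ask the server how much pixmap memory it is holding on behalf of each connected client, so you can at least watch X's usage grow because of one particular application:

# per-client accounting of server-side resources via the XRes extension;
# the per-client pixmap memory column shows what the server holds for each client
xrestop

Enforcing a limit based on those numbers is the part that still needs the bit of hacking mentioned above.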
See the bottom of this page:
Since 2.1.27 [...] proc file /proc/sys/vm/overcommit_memory [...]
Since 2.5.30 the values are:
0 (default): as before: guess about how much overcommitment is reasonable;
1: never refuse any malloc();
2: be precise about the overcommit - never commit a virtual address space larger than swap space plus a fraction overcommit_ratio of the physical memory.
Here /proc/sys/vm/overcommit_ratio (by default 50) is another user-settable parameter. [...] (See also Documentation/vm/overcommit-accounting.)
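To make the quoted knobs concrete, switching to the strict mode is just a couple of writes (assuming a kernel recent enough to have them, per the versions above):

echo 2 > /proc/sys/vm/overcommit_memory
echo 50 > /proc/sys/vm/overcommit_ratio   # 50 is already the default

or, persistently, in /etc/sysctl.conf:

vm.overcommit_memory = 2
vm.overcommit_ratio = 50

In mode 2 an allocation that would push committed address space past swap plus 50% of RAM simply fails with ENOMEM, instead of being overcommitted and leaving the OOM killer to pick a victim later.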
Thanks for the link, it's really interesting.
I thought everyone had given the OOM killer up as a bad idea.