On Thu, Jul 03, 2008 at 10:19:57AM +0900, KAMEZAWA Hiroyuki wrote:

I see the following text in the netlink man page:

"However, reliable transmissions from kernel to user are impossible in any case. The kernel can't send a netlink message if the socket buffer is full: the message will be dropped and the kernel and the userspace process will no longer have the same view of kernel state. It is up to the application to detect when this happens (via the ENOBUFS error returned by recvmsg(2)) and resynchronize."

So at the end of the day, it looks like the unreliability comes from the fact that if we cannot allocate memory at that moment, we discard the packet.

Are there alternatives to dropping packets?

- Let the sender cache the packet and retry later. So maybe the netlink layer can return an error if the packet cannot be queued, and the connector can cache the event and try sending it later. (Hopefully the memory situation will have improved by then, because of OOM or because some process exited or something else...)

This looks like a band-aid for temporary congestion kinds of problems. It will not help if the consumer is inherently slow and events are generated faster than it can read them.

This can probably be one possible enhancement to the connector, but at the end of the day, any kind of user space daemon will have to accept the fact that packets can be dropped, leading to lost events. It has to detect that situation (using ENOBUFS) and then let the admin know about it (logging). I am not sure what the admin is supposed to do after that.

I am CCing Thomas Graf. He might have a better idea of the netlink limitations and whether there is a way to overcome them.

I am not sure if the proc connector currently allows filtering of various events like fork, exec, exit etc. From a quick look, it does not. But that can probably be worked out. Even then, it will just help reduce the number of messages queued for user space on that socket; it will not take away the fact that messages can be dropped under memory pressure.
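To make the ENOBUFS detection above a bit more concrete, here is a rough sketch of what the receive loop of such a user space daemon could look like. It is in Python only for brevity, and it is not taken from any existing daemon. The names drain_events, handle_event and resync are made up for illustration, and sock is assumed to be any object with a recv() method (for example an AF_NETLINK socket bound to the connector).

```python
import errno


def drain_events(sock, handle_event, resync, bufsize=65536):
    """Read events from sock and report how many overruns were seen.

    Hypothetical helper, for illustration only:
      handle_event(data) is called for every message received.
      resync() is called whenever the kernel signals lost messages.
    """
    overruns = 0
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            # Non-blocking socket with nothing queued right now.
            break
        except OSError as exc:
            if exc.errno != errno.ENOBUFS:
                raise  # Some other failure; not ours to hide.
            # The kernel dropped at least one message for this socket.
            # Our view of kernel state is stale, so log/resynchronize.
            overruns += 1
            resync()
            continue
        if not data:
            # Empty read: treated as end of stream in this sketch.
            break
        handle_event(data)
    return overruns
```

What resync() should actually do is the open question. For a daemon that classifies tasks into cgroups, one option might be to rescan /proc and reclassify everything, but that is an assumption on my part and not something the connector provides.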
As of today it should happen, because a newly exec'ed process will run in the same cgroup as its parent. But that is probably what we need to avoid. For example, if an admin has created three cgroups "database", "browser" and "others", and a user launches "firefox" from a shell (assuming the shell is originally running in the "others" cgroup), then any memory allocation for firefox should come from the "browser" cgroup and not from "others".

I am assuming that this will be a requirement for enterprise class systems. It would be good to know the experiences of people who are already doing some kind of workload management.

Thanks
Vivek
