> On Wed, Apr 7, 2010 at 4:14 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > On Wed, Apr 07, 2010 at 12:06:07PM +0800, Minchan Kim wrote:
> >> On Wed, Apr 7, 2010 at 11:54 AM, Taras Glek <tglek@mozilla.com> wrote:
> >> > On 04/06/2010 07:24 PM, Wu Fengguang wrote:
> >> >>
> >> >> Hi Taras,
> >> >>
> >> >> On Tue, Apr 06, 2010 at 05:51:35PM +0800, Johannes Weiner wrote:
> >> >>
> >> >>>
> >> >>> On Mon, Apr 05, 2010 at 03:43:02PM -0700, Taras Glek wrote:
> >> >>>
> >> >>>>
> >> >>>> Hello,
> >> >>>> I am working on improving Mozilla startup times. It turns out that page
> >> >>>> faults(caused by lack of cooperation between user/kernelspace) are the
> >> >>>> main cause of slow startup. I need some insights from someone who
> >> >>>> understands linux vm behavior.
> >> >>>>
> >> >>
> >> >> How about improving Fedora (and other distros) to preload Mozilla (and
> >> >> other apps the user ran during the previous boot) with fadvise() at
> >> >> boot time? This sounds like the most reasonable option.
> >> >>
> >> >
> >> > That's a slightly different use case. I'd rather have all large apps
> >> > start up as efficiently as possible without any hacks. Though until we
> >> > get there, we'll be using all of the hacks we can.
> >> >>
> >> >> As for the kernel readahead, I have a patchset to increase default
> >> >> mmap read-around size from 128kb to 512kb (except for small memory
> >> >> systems). This should help your case as well.
> >> >>
> >> >
> >> > Yes. Is the current readahead really doing read-around (i.e. does it read
> >> > pages before the one being faulted)? From what I've seen, having the
> >> > dynamic linker read binary sections backwards causes faults.
> >> >
> >> > http://sourceware.org/bugzilla/show_bug.cgi?id=11447
> >> >>
> >> >>
> >> >>>>
> >> >>>> Current Situation:
> >> >>>> The dynamic linker mmap()s executable and data sections of our
> >> >>>> executable but it doesn't call madvise().
> >> >>>> By default, page faults trigger 131072-byte (128 KB) reads. To make
> >> >>>> matters worse, the compile-time linker + gcc lay out code in a manner
> >> >>>> that does not correspond to how the resulting executable will be
> >> >>>> executed (i.e. the layout is basically random). This means that during
> >> >>>> startup, 15-40 MB binaries are read in basically random fashion. Even
> >> >>>> if one orders the binary optimally, throughput is still suboptimal due
> >> >>>> to the puny readahead.
> >> >>>>
> >> >>>> IO Hints:
> >> >>>> Fortunately, when one specifies madvise(WILLNEED), page faults trigger
> >> >>>> 2 MB reads, and a binary that tends to take 110 page faults (i.e. the
> >> >>>> program stops execution and waits for disk) can be reduced down to 6.
> >> >>>> This has the potential to double the startup speed of large apps
> >> >>>> without any clear downsides.
> >> >>>>
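
For reference, a minimal sketch of the kind of hint described above. It
assumes a plain read-only mapping of a whole file; the helper name
map_with_willneed() is hypothetical and is not taken from glibc or from any
patch mentioned in this thread. The hint is advisory, so how much the kernel
actually reads ahead is up to the kernel.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE             /* exposes madvise() in glibc headers */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): map a file read-only and tell
 * the kernel the whole mapping is expected to be needed soon.
 * Returns 0 on success, -1 on any failure. */
int map_with_willneed(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    void *addr = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (addr == MAP_FAILED)
        return -1;

    /* Advisory hint: the kernel may start reading the range ahead of
     * the page faults that would otherwise trigger the I/O. */
    int rc = madvise(addr, len, MADV_WILLNEED);

    munmap(addr, len);              /* a real loader would keep the mapping */
    return rc == 0 ? 0 : -1;
}
```

A loader would issue the madvise() call right after mmap() and keep the
mapping; the munmap() here only keeps the sketch free of leaks.
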
> >> >>>> Suse ships their glibc with a dynamic linker patch to fadvise()
> >> >>>> dynamic libraries (not sure why they switched from doing madvise
> >> >>>> before).
> >> >>>>
> >> >>
> >> >> This is interesting. I wonder how SuSE implements the policy.
> >> >> Do you have the patch or some strace output that demonstrates the
> >> >> fadvise() call?
> >> >>
> >> >
> >> > glibc-2.3.90-ld.so-madvise.diff in
> >> >
> >> > http://www.rpmseek.com/rpm/glibc-2.4-31.12.3.src.html?hl=com&cba=0:G:0:3732595:0:15:0:
> >> >
> >> > As I recall, they just fadvise the file descriptor before accessing it.
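
A minimal sketch of "fadvise the file descriptor before accessing it",
assuming the hint covers the whole file. This is not the SuSE patch; the
helper name open_with_willneed() and its hint_err parameter are hypothetical.
A failed hint is treated as non-fatal here, since the file can still be used
without it.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE             /* exposes posix_fadvise() in glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): open a file and hint that all
 * of it will be needed soon. Returns the open descriptor, or -1 if the
 * file cannot be opened. If hint_err is non-NULL it receives the result
 * of posix_fadvise(): 0 on success, otherwise an error number. */
int open_with_willneed(const char *path, int *hint_err)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* offset 0 with len 0 means "from the start to the end of the file".
     * posix_fadvise() returns the error number directly instead of
     * setting errno. */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
    if (hint_err != NULL)
        *hint_err = err;

    return fd;                      /* usable even if the hint failed */
}
```

The caller would then mmap() or read() from the returned descriptor as
usual and close() it when done.
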
> >> >>
> >> >>
> >> >>>>
> >> >>>> I filed a glibc bug about this at
> >> >>>> http://sourceware.org/bugzilla/show_bug.cgi?id=11431 . Uli commented
> >> >>>> with his concern about wasting memory resources. What is the impact of
> >> >>>> madvise(WILLNEED) or the fadvise equivalent on systems under memory
> >> >>>> pressure? Does the kernel simply start ignoring these hints?
> >> >>>>
> >> >>>
> >> >>> It will throttle based on memory pressure. In idle situations it will
> >> >>> eat your file cache, however, to satisfy the request.
> >> >>>
> >> >>> Now, the file cache should be much bigger than the amount of unneeded
> >> >>> pages you prefault with the hint over the whole library, so I guess the
> >> >>> benefit of prefaulting the right pages outweighs the downside of evicting
> >> >>> some cache for unused library pages.
> >> >>>
> >> >>> Still, it's a workaround for deficits in the demand-paging/readahead
> >> >>> heuristics and thus a bit ugly, I feel. Maybe Wu can help.
> >> >>>
> >> >>
> >> >> Program page faults are inherently random, so the straightforward
> >> >> solution would be to increase the mmap read-around size (for desktops
> >> >> with reasonably large memory), rather than to improve program layout
> >> >> or readahead heuristics :)
> >> >>
> >> >
> >> > Program page faults may exhibit random behavior once they've started.
> >> >
> >> > During startup, the page-in pattern of over-engineered OO applications is very
> >> > predictable. Programs are laid out based on compilation units, which have no
> >> > relation to how they are executed. Another problem is that any large old
> >> > application will have lots of code that is either rarely executed or
> >> > completely dead. Random sprinkling of live code among mostly unneeded code
> >> > is a problem.
> >> > I'm able to reduce startup pagefaults by 2.5x and mem usage by a few MB with
> >> > proper binary layout. Even if one lays out a program wrongly, the worst-case
> >> > pagein pattern will be pretty similar to what it is by default.
> >> >
> >> > But yes, I completely agree that it would be awesome to increase the
> >> > readahead size proportionally to available memory. It's a little silly to be
> >> > reading tens of megabytes in 128kb increments :) You rock for trying to
> >> > modernize this.
> >>
> >> Hi, Wu and Taras.
> >>
> >> I have been watching this thread.
> >> That's because I have experience reducing application startup latency
> >> on embedded systems.
> >>
> >> I think increasing the readahead size is sometimes not good on embedded.
> >> Many embedded systems use NAND for storage and a compressed file system.
> >> With NAND, as you know, the random-read penalty is much smaller than on
> >> an HDD.
> >> With a compressed file system, the heavier the compression,
> >> the slower the startup (big block reads plus decompression).
> >> We had to disable readahead for code pages by hacking the kernel.
> >> That makes the application slower as time goes by,
> >> but at that time we thought latency was more important than performance
> >> for our application.
> >>
> >> Of course, it depends on which file system and
> >> compression ratio we use.
> >> So I think increasing the readahead size might not always be good.
> >>
> >> Please consider embedded systems too when you plan to tweak
> >> readahead. :)
> >
> > Minchan, glad to know that you have experience with embedded Linux.
> >
> > While increasing the general readahead size from 128kb to 512kb, I
> > also added a limit for mmap read-around: if system memory size is less
> > than X MB, then limit read-around size to X KB. For example, do only
> > 128KB read-around for a 128MB embedded box, and 32KB ra for 32MB box.
> >
> > Do you think it is a reasonable safeguard? Patch attached.
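
The limit described above can be sketched as a small function. This is an
illustration of the stated rule only, not the attached patch: the name
readaround_kb() and the DEFAULT_READAROUND_KB constant are hypothetical, and
512 is the proposed default quoted in this thread.

```c
#include <assert.h>

/* Proposed default mmap read-around size from the thread, in KB. */
#define DEFAULT_READAROUND_KB 512UL

/* Hypothetical helper (illustration only): on a box with X MB of memory,
 * cap read-around at X KB, and never exceed the default. */
unsigned long readaround_kb(unsigned long mem_mb)
{
    assert(mem_mb > 0);

    return mem_mb < DEFAULT_READAROUND_KB ? mem_mb : DEFAULT_READAROUND_KB;
}
```

With this rule a 128MB box gets 128KB of read-around, a 32MB box gets 32KB,
and anything with 512MB or more gets the full 512KB.
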
>
> Thanks for the reply, Wu.
>
> I haven't looked at your attachment.
> That's because memory size is not the issue in my case.