Feature: How To Upgrade To The -wli Kernel

Submitted by Jeremy
on December 13, 2003 - 1:10pm

William Lee Irwin III [interview], from here on referred to simply as 'wli', has been maintaining a patchset against the 2.5 development kernel for some time. His announcement for 2.6.0-test11-wli-2 [story] caught my attention, so I decided to give it a try. Scroll down to the end of this article for a step-by-step guide walking you through how to apply the -wli patchset and compile your new kernel.

Curious to know more about wli's efforts, I dropped him an email with a few questions. His in-depth replies, included within, are quite insightful and informative. He explains the history behind this patchset, provides an overview of some of the improvements it contains, evaluates its stability, and talks a little about where he's going with it. Regarding the patchset, he explains, "one of the primary goals is to improve performance", adding, "there is a secondary goal of improving resource scalability and another of improving resource accounting."

On a cautionary note, some drivers and possibly some filesystems may have problems with a reduced kernel stack, so the 4K_STACK configuration option may be best left disabled, though read wli's comments within to determine if this affects you. Additionally, wli explains that the -wli patchset is incompatible with smbfs and ncpfs due to the removal of d_validate(), another change explained within. Finally, wli warns against using his patchset with binary-only graphics drivers, commenting that they seem "utterly unable to cope with the changes I've made". None of these warnings applied to my personal desktop server, which booted the -wli-2 kernel without problems. I'm happily testing it now as I write this article.


Jeremy Andrews: Is the focus of your -wli patchset to improve overall performance?

William Lee Irwin III: One of the primary goals is to improve performance, yes. I would say there is a secondary goal of improving resource scalability and another of improving resource accounting.

Jeremy Andrews: Are the changes best felt on big NUMA systems, or on smaller desktop boxes?

wli: It was originally developed as a set of patches to improve the performance on SDET (http://spec.org/sdm91/) on i386 NUMA systems.

At any rate, SDET is a multiuser simulation implemented as a set of shell scripts, so it stands to reason that it should improve the performance of shell scripts and the like on small systems as well as large. A number of the "design decisions" I made, if they can be called that, centered on the patches being useful on more systems than the ones I carried out the benchmark on. For instance, highpmd, which was done in order to allow middle-level pagetable nodes to reside in node-local memory, also has applications to 32-bit RISC machines, which have much stricter ZONE_NORMAL limits than i386, causing much more serious resource scalability issues with middle-level pagetables than i386 has.

Since that work was completed or at least halted, it's also served as something resembling a showcase for my work. I've added some low-impact things unrelated to that original effort like major/minor fault count accounting, wchan reporting improvements, and the like that are generally uninteresting with respect to performance.
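
As an aside for readers curious about fault counts: mainline kernels already export per-process minor and major fault counters in /proc/<pid>/stat, as documented in proc(5). The sketch below pulls those two fields out of a single stat line. The function name fault_counts is made up for illustration, and it is an assumption that -wli's accounting is reported through these same fields.

```shell
# Hypothetical helper: print the minor/major fault counters found in one
# /proc/<pid>/stat line read from stdin (field layout as in proc(5)).
fault_counts() {
    read -r stat_line || return 1
    # The command name is parenthesised and may contain spaces, so drop
    # everything up to the last ") " before splitting into fields.
    rest=${stat_line##*) }
    set -- $rest
    # After the command name: $1=state ... $8=minflt $9=cminflt $10=majflt
    echo "minflt=$8 majflt=${10}"
}
```

On a running system it could be used as `fault_counts < /proc/$$/stat` to inspect the current shell.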

JA: What are some of the more significant patches, especially for desktop users?

wli: One of the unfortunate aspects of the SDET benchmark is that it makes an assumption about ps(1) not having significant amounts of kernel involvement as this is the case on many other operating systems. My response to procps' rather heavy stressing of the kernel on Linux was to make adjustments to the kernel's mechanisms for retrieving /proc/ information. To do this, I forward ported Ben LaHaise's O(1) proc_pid_statm() from RHAS to 2.6, as well as a patch from Manfred Spraul for a faster proc_pid_readdir(), which I later replaced because it was more expensive in the case of smaller numbers of tasks than mainline. Others have done things like modifying the benchmark to remove the ps(1) component or replacing procps with libraries that parse /dev/kmem. So basically, top(1) and ps(1) should be much faster:

top - 18:20:07 up  1:17,  9 users,  load average: 1.37, 1.07, 1.96
Tasks: 20611 total,   1 running, 20609 sleeping,   1 stopped,   0 zombie
Cpu(s):   0.7% user,   0.7% system,   0.0% nice,  98.7% idle,   0.0% IO-wait
Mem:  32655240k total,  2658452k used, 29996788k free,      524k buffers
Swap:        0k total,        0k used,        0k free,     9096k cached   

  PID USER      PR  NI  VIRT  RES S %CPU %MEM    TIME+  #C nFLT Command
13735 wli       25   0 12312  11m R 17.6  0.0   1:27.75 15    0 top
10596 wli       16   0  3928 2816 S  1.0  0.0   0:29.99 14    0 slabtop
20969 wli       16   0  3928 2808 S  0.5  0.0   0:01.06  3    0 slabtop
  498 wli       17   0  2052  992 S  0.1  0.0   0:02.78  4    0 profloop
10584 wli       16   0  6556 2128 S  0.1  0.0   0:02.80  2    0 sshd
  470 wli       16   0  6556 2128 S  0.1  0.0   0:00.19  9    0 sshd
  484 wli       15   0  6556 2128 S  0.1  0.0   0:00.89  5    0 sshd
13744 wli       16   0  6556 2128 S  0.1  0.0   0:00.17  6    0 sshd
13779 wli       17   0  2732 1700 S  0.1  0.0   0:00.16  4    0 zsh

and analogous cpu cost reductions hold for smaller machines, though some of the locking advantages of always-ready statistics might not be observable on UP.

Another very significant overhead as observed in the benchmarking was pte_chain manipulation. It's really a consequence of i386's poor MMU architecture and Linux' adoption of the data structures the i386 MMU uses as translation tables as a standardized data structure instead of a procedural interface sane architectures can use to avoid the space, time, TLB, and cache overhead of the structures, and that i386 itself could use to insulate other architectures from the complexity of how these overheads need to be mitigated via sharing and so on. At any rate, the pagetable structures themselves have extremely poor internal fragmentation properties and very high cache footprints, and this was aggravated a great deal on i386 NUMA machines by manipulating lowmem-allocated data structures (i.e. those stuck on node 0) with similar cache and fragmentation properties themselves during the repetitive pagetable setup and teardown in SDET. This kind of overhead can be triggered by stressing pagetable setup and teardown more heavily on smaller systems, for instance, by compiling programs with many different source files, or making numerous connections to forking servers.

So there were two steps to addressing this. The first was to cache prezeroed pagetable pages, which took some effort, since they could have come from highmem. 2.4 did this, though it didn't have to deal with non-addressable pagetable memory, and so linked the nodes through their own memory instead of through other preexisting accounting structures as I've done. The second was to forward port a patch of Hugh Dickins' called "anobjrmap", which sets up data structures requiring far fewer updates than pte_chains, and that have much smaller memory footprints as well. I should probably mention that anobjrmap itself was done as an extension to and correction of the "partial objrmap" patch, which used a hybrid scheme instead of establishing structures for anonymous mappings analogous to those for file-backed mappings, but had serious issues handling memory allocation failures, and also had the ugliness of handling anonymous and file-backed memory differently.

JA: How stable should it be? That is, is there any potential for data-corrupting bugs? Also, are there any known incompatibilities?

wli: -wli has been largely in maintenance and bugfix mode since 2.5.74, and I've dropped a number of riskier patches, so it should be relatively stable. Not very many new things have been added; only the major/minor fault accounting and wchan accounting are truly new. The O(lg(n)) proc_pid_readdir() _code_ is also new, but it just replaces another similar patch by Manfred Spraul that does the same thing another way.

The two largest incompatibilities are smbfs and ncpfs. I removed d_validate() after hearing from some crash dump hackers about how kern_addr_valid() is utter nonsense on most architectures and noticing that d_validate() used it. I figured out that d_validate() takes some arbitrary address, checks kern_addr_valid() (which is total garbage), and then treats it as a dentry. I was too disgusted to let it stand, and so neither smbfs nor ncpfs will compile in -wli and depend on CONFIG_BROKEN. ncpfs should be rather rare, and smbfs has a replacement, cifs, which should do as well as mainline in -wli. It's something of a nonessential change, but in all truth, d_validate() scared me enough I wanted it gone.

Another thing to notice is that optional 4KB stacks may be problematic with some drivers or fs's that perform large stack allocations. This is inherent, so the safest option is to leave stacks at 8KB. I personally don't need any of the problematic code and just use 4KB stacks. There are reports that some PCMCIA drivers trigger problems with 4KB stacks.

JA: Do you know of any specific drivers or file systems that perform large stack allocations? Conversely, if not using PCMCIA drivers, with what filesystems should 4KB stacks be safe?

wli: I don't remember which driver it was, but there was a PCMCIA wireless card reported to get stack overflows in -wli, probably at some point early during the -test cycle. The PCMCIA stack is very fragile, so I didn't dare carry out the needed rearrangements to reduce stack usage for that case, and would have had a hard time doing it anyway.

I also just happen to know that filesystem codepaths can be involved in deep stack usage, especially when called indirectly from a normal context through the VM for allocations, so even though I've never heard a bugreport of that kind, I'd still say it's a risk.

I also need to add a very strong warning against using binary-only graphics drivers in combination with -wli. I've had numerous reports of nvidia's binary-only drivers being utterly unable to cope with the changes I've made regardless of attempts to update to the glue layer.

JA: Are you hoping to merge some/all of these patches into 2.6?

wli: I'm particularly interested in merging the pte caching bits into 2.6, since those have low core impact and address a clear regression vs. 2.4. Many of the other patches are less compelling as far as risk/benefit due to high core impacts, though in general, I'm perfectly willing and ready to send in things deemed mergeable. It seems that the major/minor fault accounting may go somewhere, as akpm has expressed interest in it. Anobjrmap appears to be accumulating some popularity, so that may be considered later if there's enough demand for it and general core team consensus, though I don't feel comfortable pushing it very hard during a stable release since it carries out some sweeping core VM changes.

JA: How long do you intend to keep updating -wli?

wli: Essentially indefinitely. Among other things, it also serves as a showcase for my work, whether it be original work or forward porting relatively complex patches, so I'll continue adding things that are easy enough to keep around to it.


How To Install And Compile The -wli Kernel

Step 0: Make a backup of important data.
If you're going to be running a development kernel I highly recommend that you have a current backup of any important data.

Step 1: Upgrade to the latest 2.6 kernel
wli's patches apply against the latest 2.6 kernel source, so you'll need to download the latest 2.6 kernel source code first. For help on this, please refer to my earlier story about upgrading from 2.4 [story], and on using patches to upgrade 2.6 [story].

I was personally running 2.6.0-test10-mm1, so to save on time and bandwidth I copied this source tree and then upgraded it to 2.6.0-test11 with patches. First I copied the source tree (using links), then I removed the -mm1 patch, and finally I installed the -test11 patch:

   # pwd
   /usr/src
   # cp -rl linux-2.6.0-test10-mm1 linux-2.6.0-test11-wli-2
   # cd linux-2.6.0-test11-wli-2
   # bunzip2 -dc ../2.6.0-test10-mm1.bz2 | patch -R -p1
   # bunzip2 -dc ../patch-2.6.0-test11.bz2 | patch -p1
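
The hard-linked copy is safe to patch because GNU patch, by default, writes each patched file out as a new file and renames it into place, which breaks the link instead of changing the data shared with the original tree. The following is a small sketch of that behaviour in a scratch directory. The function name demo_linked_copy and the file names are invented for illustration, and it assumes GNU cp, diff and patch are installed.

```shell
# Hypothetical demo: patching a hard-linked copy leaves the original
# tree untouched. All names here are illustrative only.
demo_linked_copy() {
    work=$(mktemp -d) || return 1
    mkdir "$work/orig"
    printf 'old line\n' > "$work/orig/file.c"

    # Hard-link copy: both trees share the same file data for now.
    cp -rl "$work/orig" "$work/copy"

    # Build a small unified diff that changes file.c.
    # diff exits 1 when the files differ, which is expected here.
    printf 'new line\n' > "$work/file.c.new"
    diff -u "$work/orig/file.c" "$work/file.c.new" > "$work/change.diff" || true

    # Apply the diff to the file inside the copy only.
    (cd "$work/copy" && patch -s file.c < "$work/change.diff")

    echo "orig: $(cat "$work/orig/file.c")"
    echo "copy: $(cat "$work/copy/file.c")"
    rm -rf "$work"
}
```

Running `demo_linked_copy` prints the contents of both files after patching, so you can see whether the original was affected. Note that editing a linked file in place with a tool that does not rename (some editors, for example) would change both trees.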

Step 2: Obtain wli's latest patch.
At the time of this writing, wli's latest patch is 2.6.0-test11-wli-2. This can be found from your nearest kernel.org mirror by navigating to "/pub/linux/kernel/people/wli/kernels/2.6.0-test11/".

You can find your nearest mirror at this link: http://kernel.org/mirrors/.

It's recommended that you also download the signature file to verify the patch's validity. Find full details on how this is done here.
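
A signature check might look roughly like the sketch below. It assumes the detached signature is named after the patch with a '.sign' suffix and that the relevant kernel.org signing key has already been imported into your GnuPG keyring. The function name verify_patch is hypothetical.

```shell
# Hypothetical helper: verify a downloaded patch against its detached
# signature. Assumes the signature file is named "<patch>.sign".
verify_patch() {
    patch_file=$1
    sig_file="$patch_file.sign"

    if [ ! -f "$patch_file" ] || [ ! -f "$sig_file" ]; then
        echo "verify_patch: need both $patch_file and $sig_file" >&2
        return 2
    fi

    # gpg exits non-zero if the signature is bad or the key is unknown.
    gpg --verify "$sig_file" "$patch_file"
}
```

For example: `verify_patch patch-2.6.0-test11-wli-2.bz2 && echo "signature OK"`.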

Step 3: Apply the patch.
Applying the patch to upgrade your source to the -wli-2 tree is quite simple.

Here's what I did to patch my kernel:

   # pwd
   /usr/src
   # cd linux-2.6.0-test11-wli-2
   # bzip2 -dc ../patch-2.6.0-test11-wli-2.bz2 | patch -p1

It's the last line that does the actual patching, taken straight out of the README that's in the top level of your Linux kernel source tree. If you're using a *.gz version of the patch, simply replace 'bzip2' with 'gzip' in that command.

I also applied two additional patches against -wli-2 that wli posted to the lkml after -wli-2 was announced:

   # pwd
   /usr/src/linux-2.6.0-test11-wli-2
   # cat ../wli-2.patch | patch -p1
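
Before applying a patch for real, it can be worth checking that it applies cleanly. GNU patch offers a '--dry-run' option for this. The sketch below wraps both passes in a hypothetical apply_checked function; it assumes a bzip2-compressed patch and that it is run from the top of the source tree, as in the commands above.

```shell
# Hypothetical helper: test a bzip2-compressed patch with --dry-run and
# only apply it if the dry run succeeds. Run from the top of the tree.
apply_checked() {
    patch_file=$1

    # First pass: report problems without modifying any files.
    if ! bzip2 -dc "$patch_file" | patch -p1 --dry-run >/dev/null; then
        echo "apply_checked: $patch_file does not apply cleanly" >&2
        return 1
    fi

    # Second pass: apply the patch for real.
    bzip2 -dc "$patch_file" | patch -p1
}
```

For example: `cd linux-2.6.0-test11-wli-2 && apply_checked ../patch-2.6.0-test11-wli-2.bz2`.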

Step 4: Clean up stale .o files and dependencies.
Now that your kernel source tree is patched to the latest -wli code, be sure to remove any stale object files and dependencies. This is done with 'make mrproper', as follows:

   # pwd
   /usr/src/linux-2.6.0-test11-wli-2
   # make mrproper

Note: If you didn't save your old source tree, be sure to save a copy of your '.config' file before running 'make mrproper'! It can be useful to store the latest copy in '/usr/src'.
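
One way to script that backup is sketched below. The function name save_config and the 'config-<tree name>' naming scheme are my own choices for illustration, not anything the kernel build system requires.

```shell
# Hypothetical helper: copy .config out of a source tree before running
# 'make mrproper', naming the copy after the tree it came from.
save_config() {
    src_tree=$1     # e.g. /usr/src/linux-2.6.0-test11-wli-2
    dest_dir=$2     # e.g. /usr/src

    if [ ! -f "$src_tree/.config" ]; then
        echo "save_config: no .config in $src_tree" >&2
        return 1
    fi

    # Name the copy after the tree so several configs can coexist.
    cp "$src_tree/.config" "$dest_dir/config-$(basename "$src_tree")"
}
```

For example: `save_config /usr/src/linux-2.6.0-test11-wli-2 /usr/src && make mrproper`.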

Some readers have pointed out that this step should no longer be required thanks to the new build system found in the 2.6 kernel. My reply is two-fold. First, it's not going to hurt anything. And second, the README included with the 2.6 kernel (linked above) still recommends this step and thus so do I.

Step 5: Configure your new kernel.
This step is made much simpler if you have an already compiled 2.6.0-test kernel. I used my old '.config' configuration file and the text based 'make oldconfig' method as follows:

   # pwd
   /usr/src/linux-2.6.0-test11-wli-2
   # cp ../linux-2.6.0-test10-mm1/.config .
   # make oldconfig

Almost all the options will zoom by, automatically answered based on your existing .config file. You'll only be asked about new options. For example, when I upgraded from 2.6.0-test10-mm1, I saw the following three new options:

Use smaller 4k per-task stacks (4K_STACK) [N/y/?] (NEW) ?

This option will shrink the kernel's per-task stack from 8k to
4k.  This will greatly increase your chance of overflowing it.
But, if you use the per-cpu interrupt stacks as well, your chances
go way down.  Also try the CONFIG_X86_STACK_CHECK overflow
detection.  It is much more reliable than the currently in-kernel
version.

  Detect stack overflows (X86_STACK_CHECK) [N/y/?] (NEW) ?

Say Y here to have the kernel attempt to detect when the per-task
kernel stack overflows.  This is much more robust checking than
the above overflow check, which will only occasionally detect
an overflow.  The level of guarantee here is much greater.

Some older versions of gcc don't handle the -p option correctly.  
Kernprof is affected by the same problem, which is described here:
http://oss.sgi.com/projects/kernprof/faq.html#Q9

Basically, if you get oopses in __free_pages_ok during boot when
you have this turned on, you need to fix gcc.  The Redhat 2.96 
version and gcc-3.x seem to work.  

If not debugging a stack overflow problem, say N
Say Y here if you are hacking the kernel to trim stack usage
on 4KB stacks and are unafraid of frequent panics. If youre
using 8KB stacks, this is less interesting, but could point
out unusual broken codepaths.

Top-down vma allocation (MMAP_TOPDOWN) [N/y/?] (NEW) ?

Say Y here to have the kernel change its vma allocation policy
to allocate vma's from the top of the address space down, and
to shove the stack low so as to conserve virtualspace. This is
risky because various apps, including a number of versions of
ld.so, depend on the kernel's bottom-up behavior.
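
After 'make oldconfig' finishes, you may want to double-check how these three options ended up in your .config. The sketch below assumes the usual convention that a symbol such as 4K_STACK is stored as CONFIG_4K_STACK, and that disabled options appear as '# CONFIG_... is not set'. The function name show_wli_options is hypothetical.

```shell
# Hypothetical helper: show how the three new options are set in a
# kernel configuration file (defaults to ./.config).
show_wli_options() {
    config_file=${1:-.config}
    for opt in 4K_STACK X86_STACK_CHECK MMAP_TOPDOWN; do
        # Matches both "CONFIG_FOO=y" and "# CONFIG_FOO is not set".
        grep -E "^(# )?CONFIG_${opt}[= ]" "$config_file" \
            || echo "CONFIG_${opt}: not present"
    done
}
```

Run it from the top of the source tree as `show_wli_options`, or pass the path to a saved configuration file.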

Step 6: Build your new kernel.
To build a new kernel on x86, all you need to type is 'make'. If you've chosen to compile any modules, you'll also need to install them by typing 'make modules_install'. Or, you can string these two commands together: 'make && make modules_install'.

If you're curious about what other 'make' options there are when building your kernel, type 'make help'.

Step 7: Install your new kernel.
Now that you've built your kernel, you need to move it into place. You'll want to put the new kernel image (arch/i386/boot/bzImage on x86) and your new System.map into /boot. Some prefer to use 'make install' for this, but I prefer to do it manually so I have complete control over what happens. For example:

    # pwd
    /usr/src/linux-2.6.0-test11-wli-2
    # mv arch/i386/boot/bzImage /boot/bzImage-2.6.0-test11-wli-2 
    # mv System.map /boot/System.map-2.6.0-test11-wli-2
    # cd /boot
    # rm System.map
    # ln -s System.map-2.6.0-test11-wli-2 System.map 

Note that when typing 'rm System.map', I'm only removing a symbolic link, not an actual file.
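
If you'd rather not rely on remembering that, the link update can be guarded so that a real System.map file is never deleted by accident. This is only a sketch; the function name update_system_map_link is invented for illustration.

```shell
# Hypothetical helper: point <boot_dir>/System.map at the map for the
# given kernel version, refusing to replace anything but a symlink.
update_system_map_link() {
    boot_dir=$1     # e.g. /boot
    version=$2      # e.g. 2.6.0-test11-wli-2
    link="$boot_dir/System.map"

    # Bail out if System.map exists and is a real file, not a symlink.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "update_system_map_link: $link is not a symlink" >&2
        return 1
    fi

    # -f replaces an existing link; the target is relative to boot_dir.
    ln -sf "System.map-$version" "$link"
}
```

For example: `update_system_map_link /boot 2.6.0-test11-wli-2`.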

Having copied your new kernel into place, you now need to configure your boot loader. You're probably using grub [manual] or lilo [howto]; refer to the appropriate documentation if you're unsure how your boot loader works. My new grub entry looks like:

title 2.6.0-test11-wli-2
        root (hd0,0)
        kernel /boot/bzImage-2.6.0-test11-wli-2 ro root=/dev/hda1

Step 8: It's still not too late...
It's still not too late to back up any important data on your hard drive...

Step 9: Try your new kernel.
You're now ready to reboot your computer and try out your brand new 2.6-wli kernel. Go give it a try and then come back and post your reactions.

Endorsement

Con Kolivas
on
December 13, 2003 - 1:14pm

If people are looking for a performance patchset that gives them that edge they're looking for on the desktop as well as big b0xen; this is it. Give this patchset a try and watch it closely.

Binary graphics drivers?

Anonymous
on
December 13, 2003 - 9:00pm

I'm running 2.6.0-test11-wli-2 for some minutes now, with a GeForce 2 MX graphics card and the "nvidia" binary driver.

Switching virtuals from/to xfree has become faster =)
At least with the nvidia binary driver there seem to be no problems.

Has anyone tried it with a binary driver of another graphics card?

nvidia doesn't work for me

Anonymous
on
December 13, 2003 - 9:07pm

the "nvidia" driver does not seem to work for me in test11-wli-2, with or without its built in agpgart. I get a complete, and very nasty hardlock as soon as X attempts to leave console. Using a GeForce2 GTS on Debian/unstable.

What's your secret? :P

nvidia's driver has been rath

wli
on
December 14, 2003 - 12:42am

nvidia's driver has been rather unhappy with changes to the fields and size of struct page as well as changes to pagetable handling API's, both of which I've altered. I can only guess, but it would appear they use inline functions accessing struct page or otherwise access fields of struct page, and probably also walk pagetables in the driver. I'm surprised it appears to work for anyone at all and strongly advise against using it in combination with -wli even if it does appear to work.

test11-wli-2 and Nvidia

Anonymous
on
December 13, 2003 - 9:04pm

Recently compiled test11-wli-2. Seems like nVidia's binary driver does not work (tried it with nvAGP and my kernel's AGP GART). He mentioned it breaking binary drivers, so I'm not really surprised or disappointed.

I haven't performed any benchmarks yet, but so far things in the console and in X (despite using a lower performance driver) seem to be somewhat quicker; however it could just be the placebo effect.

fixed it - just had to disable 4K_STACK (nt)

Anonymous
on
December 14, 2003 - 12:37am

nt

It's sad, but these things ar

wli
on
December 14, 2003 - 12:54am

It's sad, but these things are hard to quantify. I've tried using things like xwit to try to test it, but the timings end up dominated by process creation and destruction as opposed to the actual graphics operation. It would be rather helpful if there were a hardware test kit to enable a host system to feed simulated user input from keyboard and mouse to a target and instrumentation options in the X server to time various operations, but that is an extremely tall order on both ends.

One unusual property of -wli observed by hrandoz in earlier releases was pipe-based context switch timings being accelerated for non-obvious reasons (potentially cache effects?). X uses AF_UNIX and TCP sockets, and so likely has some dependence on context switch efficiency. This could be at least partially quantified with hackbench, which uses AF_UNIX socketpairs in its default mode, but I don't have any useful comparative results with it. Also, the connection is rather tenuous, so it won't be a truly sufficient explanation for observed improvements in X11 performance.

ps

inojte
on
December 15, 2003 - 12:11am

The order of processes as reported by ps seems backwards, higher pids are reported first. Nothing wrong, it's just kinda odd to look at. And kinda cool : )

That's unusual; can you check

wli
on
December 15, 2003 - 1:25am

That's unusual; can you check that ls -1U /proc comes out in numerically ascending order?

I've not seen or heard of a problem of this kind.
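
A check along these lines can be automated instead of eyeballed. The following is a minimal sketch, assuming GNU ls for the unsorted '-U' listing; the function name pids_ascending is made up for illustration.

```shell
# Hypothetical helper: read directory names on stdin and succeed only if
# the purely numeric ones (the per-process entries) are in ascending order.
pids_ascending() {
    # Strip a trailing "/" (as added by 'ls -F'), keep numeric names in
    # their original order, then let sort verify that order.
    sed 's|/$||' | grep -E '^[0-9]+$' | sort -n -c 2>/dev/null
}
```

For example: `ls -1U /proc | pids_ascending && echo "ascending"`.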

sure can, here you go,..

inojte
on
December 15, 2003 - 2:26am

uptime
loadavg
self@
2/
3/
4/

Yup, numerically ascending. Using procps version 2.0.16 on slackware 9.
'ps axf' triggers this, while 'ps ax' displays as per normal.

Need anything else?

I'm not sure what, if anythin

wli
on
December 15, 2003 - 3:51am

I'm not sure what, if anything, I can do about it. AFAICT it's userspace. procps 3.x doesn't appear to have this kind of an issue.

didn't think so

inojte
on
December 15, 2003 - 8:38am

Yea I didn't think there was much for you to do. No reason to give acahalan a reason to gloat over 2.x yet though, I just installed 3.1.14 on this system and I see the same behavior.

alan@temujin:~$ /usr/bin/ps -V
procps version 3.1.14

alan@temujin:~$ /usr/bin/ps axf
[sample of output]
5 ? SWN 0:00 [ksoftirqd/1]
4 ? SW 0:00 [migration/1]
3 ? SWN 0:00 [ksoftirqd/0]
2 ? SW 0:00 [migration/0]
[/sample]

alan@temujin:~$ ps ax
[sample]
2 ? SW 0:00 [migration/0]
3 ? SWN 0:00 [ksoftirqd/0]
4 ? SW 0:00 [migration/1]
5 ? SWN 0:00 [ksoftirqd/1]
[/sample]

using glibc 3.2.3. linuxthreads-0.10, basic slackware 9 setup, etc.

I'll poke around on some of my other systems and send a note to acahalan and riel later on today.

This version of procps is mor

wli
on
December 15, 2003 - 8:53am

This version of procps is more actively maintained, so I'd expect something to be done about it, though probably by acahalan and/or riel.

I should also note this matches the procps version on my system, which does not behave this way (equivalent kernel version wrt. /proc/ behavior).

how procps-3.1.x works

Anonymous
on
December 16, 2003 - 7:02pm

By default, process order is as given by
the readdir() function. Sorting is avoided
because it eats memory. Also, when an
all-too-common kernel bug causes ps to get
stuck, you'd like to at least get partial
output before the hang.

The f option, for forest output, causes ps
to sort processes. (by ppid if you must know)
Maybe you just didn't ever notice before.
In that case, since sorting is required already,
the start_time may be used as a tie-breaker.

(Albert Cahalan, w/o an account here)

BTW, procps-2.x.xx sorts processes by default
and then discards duplicates on the assumption
that they might be old-style LinuxThreads tasks.
The bug reports are numerous, obviously.

I've rewritten the readdir()

wli
on
December 17, 2003 - 12:59am

I've rewritten the readdir() functions here. The algorithm effectively guarantees no duplicates and perfect sorting, plus O(lg(n)) seeks into the tasklist. The user sees the perfect sorting and no duplicates and so on when doing ls /proc/ and using some ps(1) options, but using another (I think "ps ax" vs. "ps axf") ps(1) appears to sort in reverse where it doesn't in other kernels, and our user seems to be able to reproduce the behavior on procps-3.1.14.

can i see ur code

nit (not verified)
on
September 15, 2005 - 3:34am

can i see ur code .... what happens when files are simultaneously added to the same directory while u are reading it

Interesting, but,..

inojte
on
December 17, 2003 - 2:35am

Still doesn't explain the fact that _all_ processes are affected. Look at the example I posted, they're kernel threads, they don't use userspace thread libs. The reverse ps listing occurs regardless of procps version, and the kernelspace processes shouldn't have ppids,.. right?

One more thing, I don't think it should matter much, but I have vmware modules inserted into my kernel. Somewhat tainting the scenario. I have another machine with a similar setup at work, save that it is using the crux distribution. No problems there. This machine at home seems to be oddly unique somehow.

They have ppid's of 1 except

wli
on
December 17, 2003 - 3:06am

They have ppid's of 1 except for init, which has a ppid of 0.
I suspect vmware and other binary-only modules would affect this kind of thing in a catastrophic way if it were to affect it at all. Probably the best way to deal with this if you're to do so yourself is to rebuild procps and ps(1) with debugging symbols and point gdb(1) at it until some sort of explanation of what internal decisions it's making wrong in your case surfaces.

This may have simple explanations, for instance, the sorting routine being called may not be stable (technical sense, nothing to do with crashing or not, but rather preserving the order of things that are identical by the sorting criteria), and so reverses or otherwise permutes the order of processes with identical ppid's such as kernel threads. This would be especially likely if an unusual libc version is involved.

You may also want to doublecheck how it was invoked if you have both versions installed simultaneously to make sure that path or LD_LIBRARY_PATH issues aren't making the old procps actually get invoked.

I think we may be stuck asking you to point gdb(1) at it at this point. The kernel should have little or nothing to do with this issue unless it produces grossly incorrect results, which AFAICT it isn't. A more rigorous way to verify ppid's is to compare ps -fade vs. ps axf and send full output to both me and acahalan to make sure the ppid values are as they should be. After that, it would be helpful if you could find out what's in ps' memory with gdb(1) before and after it attempts to sort the things, and what sort routine it's calling. I have to confess I'm personally having a tough time following the code flow well enough to find the sorting routine within procps that would be called for ps axf and audit it.

plain old qsort

Anonymous
on
December 17, 2003 - 12:57pm

ps uses qsort, called from fancy_spew in display.c
(fancy_spew is for forest and sorted output, while
simple_spew is for the memory-saving default)

The function passed to qsort calls simple
comparison functions until it gets a non-zero
result or runs out of comparison functions.
The comparison functions are in a linked list.

For forest output (--forest, f, or -H) the list
of sort functions starts with the ppid. After
that comes any user-specified sort functions,
or sorting by start_time if the user didn't
specify anything.

(Albert Cahalan)

qsort is an unstable sort

Anonymous
on
December 17, 2003 - 5:30pm

At least on my system, with plain 2.6.0-test11,
most of the built-in kernel tasks have the
exact same ppid and start_time. So their sort
order is undefined. The qsort man page says:

"If two members compare as equal, their order
in the sorted array is undefined."

I suspect that some kernel config option adds
or removes one of the built-in kernel tasks.
This may cause the unstable qsort function to
produce different output than it normally would.

The "ps axf" or "ps -eH" output is only sorted
by start_time because that is almost free. The
behavior isn't documented, promised, required...

I suppose that, when both ppid and start_time
are the same, the pid could be used.
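
The tie-breaking idea discussed in this thread can be illustrated with sort(1) rather than procps's own C comparator, which is not reproduced here. The sketch assumes one process per line in a made-up "ppid start_time pid" format; adding pid as the last key makes the order fully determined even when ppid and start_time are identical.

```shell
# Hypothetical illustration: order "ppid start_time pid" lines by ppid,
# then start_time, then pid, so entries that tie on the first two keys
# still come out in a well-defined order.
sort_forest_order() {
    sort -k1,1n -k2,2n -k3,3n
}
```

For example, `printf '1 0 5\n1 0 2\n0 0 1\n' | sort_forest_order` lists the entry with ppid 0 first, followed by the two ppid 1 entries in ascending pid order.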
