A recent discussion on the Linux Kernel mailing list noted that threaded 64-bit applications suffer a drastic slowdown in pthread_create performance when stack utilization goes above 4GB. Ingo Molnar offered an explanation of the problem, "unfortunately MAP_32BIT use in 64-bit apps for stacks was apparently created without foresight about what would happen in the MM when thread stacks exhaust 4GB. The problem is that MAP_32BIT is used both as a performance hack for 64-bit apps and as an ABI compat mechanism for 32-bit apps. So we cannot just start disregarding MAP_32BIT in the kernel - we'd break 32-bit compat apps and/or compat 32-bit libraries." The original report noted that once the shared stack goes above 4GB in size, thread creation can take as long as 10 milliseconds, a slowdown described as "quite unacceptable".
Ingo created a patch introducing a new MAP_STACK flag for glibc to be used instead of MAP_32BIT and avoid imposing the 32-bit performance limitation on threaded 64-bit applications. He noted, "glibc can switch to this new flag straight away - it will be ignored by the kernel." The new flag was quickly merged upstream, and changes were planned for glibc.