Announcing the third version of his syslets subsystem patches, Ingo Molnar noted that he has made many fundamental changes to the code, including the introduction of threadlets: "'threadlets' are basically the user-space equivalent of syslets: small functions of execution that the kernel attempts to execute without scheduling. If the threadlet blocks, the kernel creates a real thread from it, and execution continues in that thread. The 'head' context (the context that never blocks) returns to the original function that called the threadlet." Because threadlets are moved into a separate thread context only if they block, Ingo refers to them as 'optional threads'. He also describes them as 'on-demand parallelism': "user-space does not have to worry about setting up, sizing and feeding a thread pool - the kernel will execute the workload in a single-threaded manner as long as it makes sense, but once the context blocks, a parallel context is created. So parallelism inside applications is utilized in a natural way."
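The 'optional thread' behavior described above can be pictured with a small user-space model. This is only a sketch under loud assumptions: it is not the threadlet API from the patches, the kernel is not involved, and the names `run_threadlet`, `read_item`, and `WOULD_BLOCK` are hypothetical. A Python generator stands in for the threadlet and yields a marker at the point where it would block; the runner finishes the call inline when that never happens, and otherwise hands the rest of the function to a new thread while the caller continues.

```python
import threading

WOULD_BLOCK = object()  # hypothetical marker a step yields when it would block
log = []                # (key, thread id) pairs recording where each call finished


def run_threadlet(gen_fn, *args):
    """Run gen_fn inline; if it signals a block, finish it in a new thread.

    Returns (value, None) when the function completed inline, or
    (None, thread) when the remainder was moved to a separate thread.
    """
    gen = gen_fn(*args)
    try:
        next(gen)  # runs until the first WOULD_BLOCK, or to completion
    except StopIteration as done:
        return done.value, None  # never blocked: no thread was created

    def finish():
        # Resume the suspended function and drive it to the end.
        for _ in gen:
            pass

    worker = threading.Thread(target=finish)
    worker.start()
    return None, worker  # the caller (the 'head' context) continues immediately


def read_item(key, cached):
    """Toy workload: a cache miss is modeled as a point where it would block."""
    if not cached:
        yield WOULD_BLOCK
    log.append((key, threading.get_ident()))
    return key.upper()


caller_id = threading.get_ident()
inline_value, inline_worker = run_threadlet(read_item, "a", True)
moved_value, moved_worker = run_threadlet(read_item, "b", False)
moved_worker.join()
```

In this model the cached call completes in the caller's own thread and creates no worker, while the uncached call returns to the caller at once and completes in a separate thread. That mirrors the idea that no thread pool has to be sized in advance, although in the real design it is the kernel that detects the block.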
Ingo goes on to note that the syslet code and API have been significantly enhanced in this latest release: "the v3 code is ABI-incompatible with v2, due to these fundamental changes." He adds, "syslets (small, kernel-side, scripted 'syscall plugins') are still supported - they are (much...) harder to program than threadlets but they allow the highest performance. Core infrastructure libraries like glibc/libaio are expected to use syslets. Jens Axboe's FIO tool already includes support for v2 syslets, and the following patch updates FIO to the v3 API".
Ingo Molnar posted a second version of his syslets subsystem patch set, which offers asynchronous system call support. He noted that the effort is a work in progress and that there are still outstanding issues to be fixed: "the biggest conceptual change in v2 is the ability of cachemiss threads to be turned into user threads. This fixes signal handling, makes them ptrace-eable, etc," going on to list numerous fixes since the first release. He noted that before releasing a third version of the patch set he will add support for multiple completion rings, add logic to share the 'spare thread' between the rings to further reduce startup costs, and remove reliance on mlock().
Linus Torvalds commented, "I'm still not a huge fan of the user space interface, but at least the core code looks quite clean. No objections on that front." He referred to earlier comments in which he had reacted strongly to the syslets userland interface, saying, "I dislike it intensely, because it's so _close_ to being usable. But the programming interface looks absolutely horrid for any 'casual' use, and while the loops etc look like fun, I think they are likely to be less than useful in practice. Yeah, you can do the 'setup and teardown' just once, but it ends up being 'once per user', and it ends up being a lot of stuff to do for somebody who wants to just do some simple async stuff." He later noted that he was particularly concerned with the "register" functionality, which Ingo then simplified.
Ingo Molnar posted a set of 11 patches introducing "the first release of the 'Syslet' kernel feature and kernel subsystem, which provides generic asynchronous system call support". Ingo explains:
"Syslets are small, simple, lightweight programs (consisting of system-calls, 'atoms') that the kernel can execute autonomously (and, not the least, asynchronously), without having to exit back into user-space. Syslets can be freely constructed and submitted by any unprivileged user-space context - and they have access to all the resources (and only those resources) that the original context has access to."
Ingo goes on in his email to explain in greater detail how syslets work, then adds, "as it might be obvious to some of you, the syslet subsystem takes many ideas and experience from my Tux in-kernel webserver :) The syslet code originates from a heavy rewrite of the Tux-atom and the Tux-cachemiss infrastructure." He also offered some benchmark results, showing a 33.9% speedup comparing uncached synchronous IO to syslets, and a 19.2% speedup comparing cached synchronous IO to syslets: "so syslets, in this particular workload, are a nice speedup /both/ in the uncached and in the cached case. (note that i used only a single disk, so the level of parallelism in the hardware is quite limited.)"
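The syslet idea from Ingo's announcement, a small program made of system-call 'atoms' that is submitted as one unit, can be pictured with the following user-space sketch. It is an illustration only and makes several assumptions: the `Atom` class, the `run_chain` function, and the shared `slots` dictionary are hypothetical names and not the syslet ABI, and the chain here runs synchronously in ordinary Python rather than asynchronously inside the kernel. What it shows is the shape of the idea: the caller describes a whole sequence of calls up front, and each atom can consume the result of an earlier one without control returning to the caller in between.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable


@dataclass
class Atom:
    """One step of a chain: a single system call and where to store its result."""
    call: Callable[[dict], object]  # receives the shared slots, makes one call
    result_slot: str                # key under which the return value is saved


def run_chain(atoms, slots):
    """Execute every atom in order; stop at the first failing call."""
    for atom in atoms:
        try:
            slots[atom.result_slot] = atom.call(slots)
        except OSError as err:
            slots["error"] = err  # record the failure and abandon the chain
            break
    return slots


# Prepare a small file so the chain has something to read.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello syslet")
    path = tmp.name

# open -> read -> close, described up front and submitted as one unit.
chain = [
    Atom(lambda s: os.open(path, os.O_RDONLY), "fd"),
    Atom(lambda s: os.read(s["fd"], 64), "data"),
    Atom(lambda s: os.close(s["fd"]), "closed"),
]
slots = run_chain(chain, {})
os.unlink(path)
```

After the chain runs, `slots["data"]` holds the bytes read from the file. In this model the saving would come from crossing into the executor once for three calls instead of three times; the real subsystem additionally lets the chain proceed asynchronously when one of its calls blocks.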