Pawel Dawidek has been working on GEOM Gate since Aug. of last year [story], adding incremental features and making requests for testing as required [story]. Earlier this month the code was finally committed to -current. Dawidek describes the purpose and features of this code in his announcement to the freebsd-current@ mailing list:
"GEOM Gate itself (as a GEOM class) is a bridge between GEOM and userland applications that want to handle I/O requests in userland.
Such an application is able to create a ggate provider by ioctl(2) mechanism and receive I/O requests directed to this newly created provider."
Built on GEOM [story], FreeBSD's stackable volume management subsystem, the Gate class offers many promising new features, including the export of raw disk devices across network connections. Three related utilities were added as well: ggatel(8), the GEOM Gate local control utility, which also serves as a reference for GEOM development; ggatec(8), the GEOM Gate network client and control utility; and ggated(8), the GEOM Gate network daemon. Dawidek also demonstrates exporting disk devices across a network with speedier results than NFS.
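To make the idea of servicing I/O requests in userland more concrete, here is a minimal sketch of what handling one such request might look like. It does not use the real GEOM Gate ioctl(2) interface: the `toy_request` structure, the `TOY_READ`/`TOY_WRITE` commands, and `toy_handle()` are hypothetical names invented for illustration, and the backing store is simply a memory buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request types; these are NOT the real GEOM Gate definitions. */
enum toy_cmd { TOY_READ, TOY_WRITE };

struct toy_request {
    enum toy_cmd cmd;     /* operation requested for the provider */
    size_t offset;        /* byte offset into the provider */
    size_t length;        /* number of bytes to transfer */
    unsigned char *data;  /* buffer to fill (read) or consume (write) */
};

/*
 * Service one request against an in-memory backing store.
 * Returns 0 on success or an errno value on failure, which a real
 * userland handler would report back to the kernel side.
 */
static int
toy_handle(unsigned char *store, size_t store_size, struct toy_request *req)
{
    /* Reject requests that fall outside the backing store. */
    if (req->offset > store_size || req->length > store_size - req->offset)
        return EINVAL;

    switch (req->cmd) {
    case TOY_READ:
        memcpy(req->data, store + req->offset, req->length);
        return 0;
    case TOY_WRITE:
        memcpy(store + req->offset, req->data, req->length);
        return 0;
    default:
        return EOPNOTSUPP;
    }
}
```

A real handler would sit in a loop, receiving each request from the kernel, calling something like `toy_handle()` on it, and passing the result back. Because the handler is ordinary userland code, the backing store could just as well be a file or a network connection.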
FreeBSD's prison subsystem, jail(8), is a facility that can be used to constrain potentially dangerous applications as well as to create "virtual system images" - multiple instances of a running FreeBSD installation implemented as flexibly configured sandboxes. Commits in April have added a couple of overdue features.
In late April, Christian S.J. Peron submitted a patch that allows raw sockets to be created inside jails by the prison root. His initial post to freebsd-hackers@ summarizes:
"Although RAW sockets can be used when specifying the source
address of packets (defeating one of the aspects of the jail)
some people may find it useful to use utilities like ping(8)
or traceroute(8) from inside jails."
Additionally, this patch enforces the use of prison IP addresses only and was subsequently merged with Pawel Dawidek's code which allows the use of multiple IPs from within a jail:
"Looks very neat! I've merge your patch to my jail work (pjd_jail perforce branch) and changed it to be usable with my multiple ips stuff. I haven't reviewed nor tested it yet."
The patch was committed as of April 26, so anyone with spare cycles may want to try out the new features in -current. Read on for the full thread.
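For readers who want to check whether a given environment permits raw sockets, the following is a small sketch of a probe. It only uses the standard socket(2) call that utilities like ping(8) rely on; the function name `can_open_raw_icmp()` is made up for this example, and the exact errno reported on refusal is an assumption that may differ between systems and configurations.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to create the kind of raw ICMP socket that ping(8) needs.
 * Returns 1 if the socket could be created, 0 otherwise. If err is
 * non-NULL it receives 0 on success or the errno value from socket(2).
 */
static int
can_open_raw_icmp(int *err)
{
    int s = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);

    if (s == -1) {
        if (err != NULL)
            *err = errno;
        return 0;
    }
    close(s);
    if (err != NULL)
        *err = 0;
    return 1;
}
```

Run as root outside a jail, the probe would normally succeed. Inside an environment that refuses raw sockets, it fails and the errno value (commonly EPERM) indicates why.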
Kqueue(2), the scalable facility for event notification in the FreeBSD kernel, has experienced various performance and stability issues over the last several months. This has largely been fallout from the ongoing introduction of fine-grained locking into kernel subsystems and kqueue's reliance on the Giant lock, further complicated by an abundance of entangled callback routines.
There has recently been some discussion on the freebsd-arch list concerning kqueue’s shortcomings and potential areas for redesign. In mid-April, Brian Fundakowski Feldman submitted controversial proof-of-concept code that relieves kqueue of some of its less utilized features and adds an internal global lock into the kqueue subsystem. He has reported positive results:
"I believe I have come up with a good solution to the kqueue woes in 5.X, and I'd like to get some feedback on work that so far is letting me (on uniprocessor, at least) run make -j8 buildworld, with USE_KQUEUE in make(1), with no ill effect :) The locking thus far is one global kqueue lock, and I firmly believe we should use MUTEX_PROFILING to determine if we should lock it down any further at this point."
For a brief introduction to kqueue, its present issues, and details on testing this code, please read on.
A recent patch submission from Bosko Milekic features on-demand memory allocation for mbufs and flexible sysctl tuning options - no more fiddling with NMBCLUSTERS and recompiling. The code does not appear to be committed yet and could probably stand some testing by the willing user population:
"This basically gets rid of the existing mbuf allocator and replaces
it with some routines which get hooked on top of UMA, the existing
general-purpose SMP-friendly allocator in -CURRENT, after adding
some extensions to UMA which hopefully make allocations faster than
they would be without them."
Those interested in testing this patch should take note of the outstanding issues with netstat and the nsp device described in the thread. Read on for more.
The FreeBSD development team has released its January-February status report. The team releases these reports on a semi-regular basis to document work in progress on various areas of the operating system, both in the kernel and in user space. This edition outlines progress on a number of interesting kernel-side projects.
The status report also sets a tentative late April release date for 4.10-RELEASE and mid-summer for 5.3-RELEASE.
Andre Oppermann has posted a laundry list of planned modifications to the FreeBSD network stack on a recent freebsd-current thread. Some of the proposed work has already begun and will proceed into the summer. Among other things, the work will improve TCP performance (including the addition of send buffer autosizing and a revision of tcp_reass()), optimize the IPv4 routing table structure, add multi-path and policy routing options, and reimplement IPFW to use the packet filter API pfil(9).
Several suggestions, as well as flames, pop up through the course of discussion, but many agree these are welcome changes to a stack which is highly regarded for its stability and efficiency. The issue of including Selective ACK into the BSD stack (an off and on discussion that goes well back into '96) also crops up and splits off into a separate thread 'Who wants SACK?'.
Selected portions of the original thread follow, and links to the entirety of both the first and second thread are included at the end.
TrustedBSD is a branch of the FreeBSD codebase that serves as its own unique platform as well as a source for merging access-control-model-specific changes back into the FreeBSD 5 branch. Within the Linux space, the Linux Security Modules project aims to accomplish a very similar goal for the Linux kernel: incorporating fine-grained access rights into the service-providing components (e.g. interprocess communication, filesystems, memory management) of their respective platforms. This involves major changes to kernel subsystems and integration of various new components including extended file attributes, access control lists, and capabilities.
Xu Hao kicks off a trustedbsd-discuss@ thread asking about the status of capabilities code in TrustedBSD, as well as the differences between LSM and the MAC Framework. Robert Watson expands on a reply from Chris Wright, summarizing a set of eight essential differences and closing with this paragraph:
“So the primary difference really lies in the strength of semantics – it might be accurate to say LSM is more a "set of hooks" than a framework in that sense, but it's clear their functionality is pretty similar. The MAC Framework has been focused more on supporting traditional labeled MAC policies, such as Biba, MLS, compartments, and TE, and so attempts to provide more of the common infrastructure for those pieces.”
GEOM is one of the many interesting new directions on the path to a stable FreeBSD 5 architecture, as it completely revamps the disk storage layer and provides a broad disk volume management interface for many purposes. Since the completion of the GEOM infrastructure in 2002 [story], there has been much underlying work to convert FreeBSD disk subsystems (e.g. ATAPI and SCSI) to use GEOM. This has caused breakage and overlapping functionality in a few areas, in particular with Vinum, the reigning RAID volume management implementation on the FreeBSD platform. In an early January freebsd-arch@ thread, Greg Lehey had some interesting words on Vinum bit-rot and atrophy under the growing GEOM disk framework:
"Vinum and GEOM overlap significantly in their features, and they do some things not only differently but in an incompatible manner. The development of GEOM has resulted in Vinum features atrophying and
rotting. For example, at present, it's not possible to put swap on a
Vinum volume, due to a change in the swapon() code which requires
Greg Lehey, one of the long-time members of the FreeBSD Project (he has had commit access to FreeBSD since Sun Aug 30 02:14:49 1998 UTC), has resigned from the FreeBSD Core Team. He explains that the reason is lack of time, and that he will continue to contribute to FreeBSD as a regular committer. His diary explains:
Gradually people have noticed that I have left the FreeBSD core team, and I've had a surprising number of supportive mail messages. It's a little silly that I can only tell the internal lists, which leaves a lot of people out of the loop, and as one person pointed out, Slashdot will have a field day with the cryptic statement on the FreeBSD web site. To make it clear: I resigned because of overwork, not because “FreeBSD is dying”.
One of Greg's best-known contributions is the Vinum Volume Manager.
Brian Feldman introduced new versions of the resolver and getaddrinfo DNS functions that are mostly* reentrant**.
The getaddrinfo(3) function is defined for protocol-independent nodename-to-address translation. It performs the functionality of gethostbyname(3) and getservbyname(3), but in a more sophisticated manner.
* mostly - because it still needs to be polished and tested to be fully reentrant.
** reentrant function - a function that can be used at the same time from multiple threads, so long as they do it with different data.
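As a rough illustration of the getaddrinfo(3) interface described above, the sketch below translates a numeric IPv4 address and port into a socket address. The helper name `resolve_numeric()` is invented for this example, and it has nothing to do with the internals of Brian Feldman's changes. It only shows the calling convention, in which results are returned through caller-owned storage rather than a shared static buffer, the property that makes a reentrant implementation possible.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Translate a numeric IPv4 address and a port string into a socket address.
 * Returns 0 on success or a getaddrinfo(3) error code.
 * AI_NUMERICHOST keeps the host lookup local, so no DNS query is issued.
 */
static int
resolve_numeric(const char *host, const char *port, struct sockaddr_in *out)
{
    struct addrinfo hints, *res;
    int error;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 only, to match *out */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;

    error = getaddrinfo(host, port, &hints, &res);
    if (error != 0)
        return error;

    /* The result list belongs to this caller; copy it and release it. */
    memcpy(out, res->ai_addr, sizeof(*out));
    freeaddrinfo(res);
    return 0;
}
```

Calling `resolve_numeric("127.0.0.1", "80", &sa)` fills `sa` with the loopback address and port 80. A failing lookup returns a nonzero code that gai_strerror(3) can turn into a message.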
"Roland van Laar has a new, significant wi-fi patch for FreeBSD 5.1 and higher. The patch blocks clients with an empty or "ANY" ssid and disables ssid broadcasting. SSID (Service Set ID) is used to identify wireless clients to a wireless / wired gateway. Wireless devices from the same manufacturer generally ship with the same default SSID. A beacon is a type of packet/frame that contains the SSID of a network. It is used to sync clocks on client devices and to make it easy for new network clients to see what networks are available. Preventing others from using your ssid is a means (although not foolproof!) of securing your wireless network."
"The plan is to leave ULE as the default until we get to 5.3 at which point we will decide whether or not it is production quality. The most [untested] workload that I know of is on massive multiuser systems with lots of interactive tasks. If anyone has such a system, I would love to hear of feedback while running ULE. For anyone else, if your workload is either improved or hindered, I'd appreciate a mail with the a description of your workload, your hardware, behavior with ULE, and behavior with 4BSD."
ULE was merged into FreeBSD 5.1 [story] as an "experimental" process scheduler designed to bring many benefits to SMP servers. The original design is actually based on Ingo Molnar's O(1) scheduler which was merged into the Linux kernel [story] during 2.5 development [forum]. When asked about how FreeBSD interactivity with ULE would compare to Linux interactivity, Con Kolivas [interview] suggested that, "it is prone to all the same issues as the vanilla 2.5 scheduler", issues that were addressed by Con during 2.5 development [forum].
Following the recent debate regarding FreeBSD's switch to dynamic linking for most binaries [story], a new discussion looked at how dynamic linking can be optimized. The focus was on how other systems, such as Apple's Darwin, handle the same issue. Terry Lambert discussed how Darwin's prebinding avoids most of the usual runtime penalty associated with shared libraries. Robert Watson further explained the benefits of precaching, saying, "shared regions are managed by privileged processes, such as prebinding daemons, and are used to hold prebound versions of libraries. My understanding is that they are always mapped into processes at the same address, so a prebound version of the library can be used across many applications. In addition, the shared region uses one set of PTEs for all processes it is mapped into, as well as other VM machinery, so it's very low cost to maintain."
During the conversation, it was also pointed out that Compaq/HP Tru64 Unix does a similar thing which they refer to as "Quickstart". Peter Jeremy explains, "To use it, shared libraries are linked to load at mutually exclusive addresses and applications are linked assuming the preferred .so load addresses. At run-time, the rtld verifies that it can map every shared library at its preferred address and that each shared library is the same one the application was linked against (using checksums in the COFF headers). If all this is true, all the relocations are correct and execution starts immediately. If any checks fail, rtld falls back to the traditional check-every-symbol-and-relocation approach."
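The decision Peter Jeremy describes can be sketched as a simple check over the libraries an application was linked against. This is only a toy model of the logic in the quote, not Tru64's actual rtld: the `toy_lib` structure, its fields, and `toy_choose_start()` are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one shared library at link time and at run time. */
struct toy_lib {
    uintptr_t preferred_addr;  /* load address assumed when the app was linked */
    uintptr_t actual_addr;     /* address the loader was able to map it at */
    uint32_t linked_checksum;  /* checksum recorded when the app was linked */
    uint32_t found_checksum;   /* checksum of the library found at run time */
};

enum toy_start { TOY_FAST_START, TOY_FULL_RELOCATION };

/*
 * Decide whether the relocations computed at link time can be trusted.
 * Every library must sit at its preferred address and match the checksum
 * the application was linked against; a single mismatch forces the
 * traditional symbol-by-symbol relocation pass.
 */
static enum toy_start
toy_choose_start(const struct toy_lib *libs, size_t nlibs)
{
    for (size_t i = 0; i < nlibs; i++) {
        if (libs[i].actual_addr != libs[i].preferred_addr)
            return TOY_FULL_RELOCATION;
        if (libs[i].found_checksum != libs[i].linked_checksum)
            return TOY_FULL_RELOCATION;
    }
    return TOY_FAST_START;
}
```

The point of the design is that the common case costs only a handful of comparisons per library, while any change to the installed libraries degrades gracefully to the slower but always-correct path.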
The FreeBSD -current mailing list has been flooded lately by several threads debating whether the recent introduction of dynamic linking for /bin and /sbin [story] was a good idea. During the lengthy exchange a comment was made suggesting that the performance hit would be noticed by a "large percentage of people", causing M Warner Losh to reply, "I'll bet a larger percentage of the people are ignoring this thread than reading it since it has been so devoid of concrete numbers." Through the course of the discussion, a few numbers did surface, ranging anywhere from a 10% slowdown for certain applications, such as /bin/sh, all the way up to a 40% slowdown. Warner acknowledged, "So things are a little bad, but it isn't the end of the world, especially for a 5.2-beta that's going out."
The primary advantage of dynamic linking is evidently proper support for the latest incarnation of nsswitch. Former FreeBSD developer Matt Dillon [interview], now working on DragonFlyBSD, suggested that it would be better to instead re-implement the name service switch using an IPC model, thereby removing the requirement for dynamically linked binaries. All said, the current plan is to go forward with things as they are, releasing 5.2 with the recently introduced dynamic linking and addressing any performance issues during the next development cycle.
Read on for a few brief samples from three lengthy threads on the subject. Those interested in reading more can scroll to the bottom of this article and find links to each of the lengthy threads.
Gordon Tetlow recently announced, "I just committed a patch to change /bin and /sbin from statically to dynamically linked." This change follows the NetBSD project, which made the same switch in September of last year [story]. FreeBSD -current now has a /rescue directory that contains a small number of statically linked "rescue" binaries. Gordon goes on to note, "If you don't like the idea of using a dynamically linked /bin and /sbin, now is the time to define NO_DYNAMICROOT in your make.conf."
The two reasons for this change are a significant reduction in the size of the /bin and /sbin directories (from 33 MB to 4 MB on i386) and proper support for FreeBSD's new name service switch (NSS) implementation. For a complete understanding of the latest filesystem structure, see the hier(7) man page. Read on for Gordon's announcement and some of the resulting discussion.