Greetings all,

On Wed, 2008-01-30 at 19:56 +0900, FUJITA Tomonori wrote:

The PyX storage engine supports a scatterlist linked list algorithm that maps any sector count + sector size combination down to contiguous struct scatterlist arrays across (potentially) multiple Linux storage subsystems from a single CDB received on an initiator port. This design was a consequence of a requirement to run said engine on Linux v2.2 and v2.4 across non cache coherent systems (MIPS R5900-EE) using a single contiguous memory block mapped into struct buffer_head for PATA access, and struct scsi_cmnd access on USB storage. Note that this was before struct bio and struct scsi_request existed.

The PyX storage engine as it exists at Linux-iSCSI.org today can be thought of as a hybrid OSD processing engine, as it maps storage object memory across a number of tasks from a received command CDB. The ability to pass in pre-allocated memory from an RDMA capable adapter, as well as memory allocated internally (i.e. traditional iSCSI without open_iscsi's struct sk_buff rx zero-copy), is inherent in the design of the storage engine.

The missing Bidi support can be attributed to the lack of broader support for (and hence user interest in) Bidi, but I am really glad to see this getting into the SCSI ML and STGT, and it is certainly of interest in the long term. Another feature that is missing in the current engine is see in Linux as well. This is pretty easy to add in iSCSI with an AHS and in the engine and storage subsystems.

The 60k lines of code also include functionality (the SE mirroring comes to mind) that I do not plan to push towards mainline, along with other legacy bits so we can build on earlier v2.6 embedded platforms.
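To make the sector count + sector size mapping mentioned above a bit more concrete, here is a minimal userspace sketch. It is not the PyX/LIO-SE code and does not use the kernel's struct scatterlist API; struct seg, struct seg_array, map_sectors() and the two size constants are hypothetical names and values picked purely for illustration. It only shows the general idea of splitting a transfer into fixed-size segments held in small contiguous arrays that are chained together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SEG_BYTES      4096u  /* assumed size of one segment (one page) */
#define SEGS_PER_ARRAY 4u     /* assumed entries per contiguous array */

/* Hypothetical stand-in for one scatterlist entry. */
struct seg {
    size_t offset;  /* byte offset of this segment within the transfer */
    size_t length;  /* number of bytes this segment covers */
};

/* One contiguous array of segments, chained to the next array. */
struct seg_array {
    struct seg segs[SEGS_PER_ARRAY];
    unsigned int nents;      /* entries in use in segs[] */
    struct seg_array *next;  /* next chained array, or NULL */
};

/* Release every array in the chain. */
void free_seg_arrays(struct seg_array *head)
{
    while (head) {
        struct seg_array *next = head->next;
        free(head);
        head = next;
    }
}

/* Sum the bytes described by the whole chain. */
size_t count_bytes(const struct seg_array *head)
{
    size_t total = 0;

    for (; head; head = head->next)
        for (unsigned int i = 0; i < head->nents; i++)
            total += head->segs[i].length;
    return total;
}

/*
 * Map sectors * sector_size bytes onto chained segment arrays.
 * Returns NULL for a zero-length transfer or on allocation failure.
 * Overflow of sectors * sector_size is not checked in this sketch.
 */
struct seg_array *map_sectors(size_t sectors, size_t sector_size)
{
    size_t remaining, offset = 0;
    struct seg_array *head = NULL, **tail = &head;

    assert(sector_size != 0);
    remaining = sectors * sector_size;

    while (remaining > 0) {
        struct seg_array *arr = calloc(1, sizeof(*arr));

        if (!arr) {
            free_seg_arrays(head);
            return NULL;
        }
        /* Fill this array until it is full or the transfer is mapped. */
        while (remaining > 0 && arr->nents < SEGS_PER_ARRAY) {
            size_t len = remaining < SEG_BYTES ? remaining : SEG_BYTES;

            arr->segs[arr->nents].offset = offset;
            arr->segs[arr->nents].length = len;
            arr->nents++;
            offset += len;
            remaining -= len;
        }
        /* Chain the array onto the end of the list. */
        *tail = arr;
        tail = &arr->next;
    }
    return head;
}
```

With these assumed constants, map_sectors(40, 512) describes 20480 bytes as five 4096 byte segments: four in the first array and one in a second, chained array.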
The existing target mode LIO-SE provides a linked list scatterlist mapping algorithm that is similar to what Jens and Rusty have been working on, and is under 14k lines including the switch(cdb[0]) + function pointer assignment to a per-CDB structure, which is called potentially out-of-order in the RX side context of the CmdSN state machine in RFC-3720.

The current SE is also lacking the very SCSI specific task management state machines that not a whole lot of iSCSI implementations implement properly, and which seem to be of minimal interest to users and of moderate interest to vendors. Getting this implemented generically in SCSI, as opposed to a transport specific mechanism, would benefit the Linux SCSI target engine.

The pSCSI (struct scsi_cmnd), iBlock (struct bio) and FILE (struct file) plugins together are a grand total of 3.5k lines using the v2.9 LIO-SE interface. Assuming we have a single preferred data and control path for underlying physical and virtual block devices, this could also get smaller.

A quick check of the code puts the traditional kernel level iSCSI state machine at roughly 16k lines, which is pretty good for the complete state machine. Also, having iSER and traditional iSCSI share MC/S and ERL=2 common code will be of interest, as well as the iSCSI login state machines, which are identical minus the extra iSER specific keys and the requirement to transition from byte stream mode to RDMA accelerated mode. Since this particular code is located in a non-data path critical section, the kernel vs. user discussion is a wash.

If we are talking about data path, yes, the relevance of DD tests in kernel designs is suspect :p. For those IB testers who are interested, perhaps having a look with disktest from the Linux Test Project would give a better comparison between the two implementations on an RDMA capable fabric like IB for best case performance.
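For anyone not familiar with the switch(cdb[0]) + function pointer pattern mentioned above, a rough userspace sketch follows. It is not the LIO-SE code: struct se_cmd_sketch, setup_cdb() and the handler names are hypothetical and exist only for this example. The opcode values and the READ(10)/WRITE(10) field layout (big-endian LBA in bytes 2-5, transfer length in bytes 7-8) are the standard SCSI ones. The point is that the opcode is decoded once when the CDB is received, and the stored handler can be invoked later, whenever command ordering permits.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard SCSI opcodes. */
#define TEST_UNIT_READY 0x00
#define INQUIRY         0x12
#define READ_10         0x28
#define WRITE_10        0x2a

/* Illustrative return codes so a caller can see which handler ran. */
enum { RAN_NON_DATA = 1, RAN_READ = 2, RAN_WRITE = 3 };

struct se_cmd_sketch;
typedef int (*cdb_handler_t)(struct se_cmd_sketch *cmd);

/* Hypothetical per-CDB structure filled in when the CDB is received. */
struct se_cmd_sketch {
    const uint8_t *cdb;     /* raw CDB bytes */
    uint32_t lba;           /* starting LBA (READ_10/WRITE_10 only) */
    uint32_t sectors;       /* transfer length in blocks */
    cdb_handler_t execute;  /* handler chosen from cdb[0] */
};

int do_non_data(struct se_cmd_sketch *cmd) { (void)cmd; return RAN_NON_DATA; }
int do_read(struct se_cmd_sketch *cmd)     { (void)cmd; return RAN_READ; }
int do_write(struct se_cmd_sketch *cmd)    { (void)cmd; return RAN_WRITE; }

/* Decode the big-endian LBA and transfer length of a 10-byte CDB. */
static void parse_rw10(struct se_cmd_sketch *cmd)
{
    const uint8_t *c = cmd->cdb;

    cmd->lba = ((uint32_t)c[2] << 24) | ((uint32_t)c[3] << 16) |
               ((uint32_t)c[4] << 8) | (uint32_t)c[5];
    cmd->sectors = ((uint32_t)c[7] << 8) | (uint32_t)c[8];
}

/*
 * Assign the handler based on the opcode; nothing is executed here.
 * Returns 0 on success, -1 for an opcode this sketch does not handle.
 */
int setup_cdb(struct se_cmd_sketch *cmd, const uint8_t *cdb)
{
    cmd->cdb = cdb;
    cmd->lba = 0;
    cmd->sectors = 0;
    cmd->execute = NULL;

    switch (cdb[0]) {
    case TEST_UNIT_READY:
    case INQUIRY:
        cmd->execute = do_non_data;
        break;
    case READ_10:
        parse_rw10(cmd);
        cmd->execute = do_read;
        break;
    case WRITE_10:
        parse_rw10(cmd);
        cmd->execute = do_write;
        break;
    default:
        return -1;
    }
    return 0;
}
```

A caller would run setup_cdb() from the receive path and call cmd->execute(cmd) at some later point, which is what allows setup and execution to happen in different orders.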
I think everyone is interested in seeing just how much data path overhead exists between userspace and kernel space in typical and heavy workloads, and whether this overhead can be minimized to make userspace a better option for some of this very complex code.

--nab
