I came across a peculiar situation, which I posted and discussed here:
http://www.unix.com/high-level-programming/81568-semaphore-access-speed....
That discussion did not produce any results, so I want to try it here. In short, I am measuring how fast the system can create a process that simply accesses a semaphore and dies.
Results are:
modern h/w under SCO OpenServer 5 : 1800 times per sec
same modern h/w under Fedora 2.6.9 : 500 times per sec
old PIII under modern linux (have no info) : 900 times per sec.
I am trying to find the bottleneck on Fedora: why does the same h/w do this about 3 times faster under the old OS?
The thread at unix.com has all the info, including my source code, results of strace etc... I don't know the rules here: should I post my code, or does a reference to unix.com suffice?
Any info/suggestions would be appreciated.
migurus
2.6.9 vs. modern
do you know anything about the differences in the implementation between 2.6.9 and current versions? 2.6.9 is ancient and heavily patched for correctness, not performance.
with the code posted by Otheus i get
Duron/1200, 2.6.26: either 555555.56 or 833333.33 semop/s [0,0]
Core 2 Duo/2200, 2.6.24: either 1250000.00 or 1666666.67 semop/s [0,0]
which just scales with the cpu (and clearly shows the weakness of the time measuring method, better to use a high precision timer or measure iterations until time() changes).
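To illustrate the high precision timer suggestion, here is a minimal sketch (not the code Otheus posted) that times semop() with clock_gettime(CLOCK_MONOTONIC) instead of time(). The function name semop_rate and the union sem_arg are made up for this example, and it assumes Linux with SysV IPC available.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <time.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical name: Linux leaves the semctl() argument union to the caller. */
union sem_arg { int val; struct semid_ds *buf; unsigned short *array; };

/* Returns semop() calls per second, or -1.0 on any error. */
double semop_rate(long iterations)
{
    assert(iterations > 0);

    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1.0;

    union sem_arg arg;
    arg.val = 0;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        semctl(semid, 0, IPC_RMID);
        return -1.0;
    }

    struct sembuf up = { 0, 1, 0 };    /* increment semaphore 0 */
    struct sembuf down = { 0, -1, 0 }; /* decrement it again, never blocks here */
    struct timespec t0, t1;
    int failed = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations && !failed; i++) {
        if (semop(semid, &up, 1) == -1 || semop(semid, &down, 1) == -1)
            failed = 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    semctl(semid, 0, IPC_RMID); /* always remove the semaphore set */
    if (failed)
        return -1.0;

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    /* two semop() calls per loop iteration */
    return secs > 0.0 ? (2.0 * (double)iterations) / secs : -1.0;
}
```

Calling semop_rate(1000000) from a main() and printing the result should give a figure that no longer jumps between two coarse values, since the elapsed time is not rounded to whole seconds.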
the shell script method is just weird, as you can see you measure lots of things (implementation of /bin/sh (bash is slow compared to e.g. ash), mmap(), fork(), open(), ...) and the overhead is more than 99% (compare 555555 it/s with 1800).
2.6.9 (and the libc that is included in the old distro) is not optimized for modern hardware. e.g. the traditional instruction used for linux syscalls, "int 0x80", got very slow on intel cpus since the pentium 4, which are optimized for throughput of simple instructions; software interrupts are fairly complex. the dedicated fast system call instructions ("sysenter", or "syscall" in 64-bit mode) are simpler and much faster, and are the preferred way to do syscalls. SCO afaik uses a '386 call gate for syscalls and may not be affected in the same way.
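One way to check how much the syscall entry path matters on a given box is to time a trivial syscall in a loop. This is only a sketch under assumptions: raw_syscall_rate is a made-up name, it is Linux-specific, and it cannot tell you which entry instruction your libc actually uses, only what the round trip costs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Returns trivial syscalls (getppid) per second, or -1.0 if timing failed. */
double raw_syscall_rate(long iterations)
{
    assert(iterations > 0);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getppid); /* does almost no work inside the kernel */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)iterations / secs : -1.0;
}
```

If this number is already in the millions per second, the entry instruction alone is unlikely to explain a rate of a few hundred process creations per second.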
Good answer. Not good enough.
Please see: http://www.unix.com/high-level-programming/81568-semaphore-access-speed-...
Though the 2.6.18 kernel is indeed faster, my benchmarks indicate there is far more than this. If anyone can provide similar benchmarks of a XEON machine running 2.6.18, you can add to this discussion.
what am i to see
what am i to see on the linked page? there are some comments from the discussion the OP refers to, and they are talking about strace, semaphore related kernel parameters, and the fact that SCO has one syscall that multiplexes the semaphore functions with an extra argument, while linux has two syscalls using the multiplexing of the normal call path. how is this going to enlighten me? what is special about 2.6.18 that you need exactly this version tested, and why don't you publish your own results but wait for others? why do you say 'not good enough'? for whom? why? what is better?
btw., both of you describe the machine with the speed difference just as XEON. xeon is just a general name for intel cpus like pentium, since the pentium II the server versions of the cpu with more cache and cores are called xeon. which architecture are you talking about and at what speed? have you benchmarked other syscalls? are there differences in the code paths (assembly code, handling inside kernel)? what about SCO on a system similar to the other systems, not only the ominous XEON?
Clarification
The discussion on the unix.com forum grew to 6 pages/30 posts, so I guess it was not a very good idea to point to it, but let me clarify what I am trying to achieve.
I am investigating the possibility of migrating an application from SCO OpenServer 5.0.7 to Linux (Fedora 2.6.9; gcc ver.3.4.3; ldd ver. 2.3.5). Both boxes are ML350 dual core Xeon 3.2 GHz. The app in question is very simple: listen on a tcp port, receive a request, spawn a child. The child reads semaphores, sends a response through the socket and exits. So, I need to make sure that spawning a process + checking semaphores runs adequately under Linux. As I found, the old SCO executes it 3 times faster than Linux. I want to find out what the bottleneck is.
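To measure just the "spawn child + read semaphore + exit" part without any shell in the way, something like the following could be used. It is a sketch, not the code from the unix.com thread: spawn_and_read_rate is a made-up name, the children run one at a time (the real app may overlap them), and it assumes Linux with SysV IPC available.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>

/* Returns fork + semctl(GETVAL) + exit cycles per second, or -1.0 on error. */
double spawn_and_read_rate(int children)
{
    assert(children > 0);

    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1.0;

    struct timespec t0, t1;
    int failed = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < children && !failed; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: read the semaphore value once and die. */
            int value = semctl(semid, 0, GETVAL);
            _exit(value == -1 ? 1 : 0);
        }
        int status = 0;
        if (pid == -1 || waitpid(pid, &status, 0) == -1)
            failed = 1;
        else if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed = 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    semctl(semid, 0, IPC_RMID); /* always remove the semaphore set */
    if (failed)
        return -1.0;

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)children / secs : -1.0;
}
```

Running this with a few thousand children on both Linux boxes would show whether fork() and the semaphore read are slow by themselves, or whether the difference comes from something else in the test setup.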
So, please don't get distracted by trace or time measurement issues and such that were raised in the course of the original discussion.
I don't quite understand some of the questions in your post.
As for tests on other systems, all I have available right now is an old PII 455 MHz box under SCO OSR 5.0.5. Its results are 60% of the new h/w under SCO OSR 5.0.7, but still roughly twice as fast as the new h/w under Linux.
Among the previous suggestions I see one that I can investigate: bash has more overhead than the bourne sh on SCO. So on Linux I tried to run my test under ksh (I guess the closest to sh) and got exactly the same results. So, the shell difference is not something that can make it 3 times slower.
Thanks, and looking forward to any suggestions.
subject
ok, so you posted under subject "semaphore access speed" but the numbers were not a really crude benchmark for said semaphore access speed, but a simplification of your complete application that uses shell scripts. you could have said so, that would have had you wondering less why everyone talked about semaphore access speed not shell scripts... this is the first time you tell people you are not actually interested in what they thought you asked.
under many systems there is only one syscall entry in the kernel, but of course there are many syscalls. so the syscall entry behaves like a http://en.wikipedia.org/wiki/Multiplexer that multiplexes the syscalls -- combining all syscalls into a single entry point. there is a hidden function argument to determine which syscall you actually meant (register AH under DOS, register %eax under linux, ...). as you can see in the strace output, linux uses this to implement two different syscalls semctl and semget, while truss shows that SCO implements a single syscall for semaphores semsys and uses _another_ function argument for selecting which one: semsys(1, ...) or semsys(0, ...). this is what was linked to and it has nothing to do with your question so I wondered why it was linked.
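The "hidden argument" can be made visible on linux with the generic syscall() wrapper, where the syscall number is passed explicitly as the first argument. A small sketch (raw_getpid is a made-up name, and this is Linux-specific):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Enters the kernel through the single syscall entry point and selects
 * getpid by number, instead of going through the getpid() wrapper. */
long raw_getpid(void)
{
    long pid = syscall(SYS_getpid);
    assert(pid > 0); /* getpid cannot fail */
    return pid;
}
```

raw_getpid() returns the same value as getpid(); the only difference is that the selector the libc wrapper normally loads into a register for you is spelled out here.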
the linux you benchmarked is "Fedora 2.6.9", what you actually mean is some RHEL with linux (kernel) 2.6.9? but all you want is your application not being slow? and people posted results for 2.6.24 that seem to behave normally? why don't you draw the conclusion and just install some modern distro yourself and re-test that? to find out, if this is version specific or a general problem. if you don't want a full install, you can simply boot a live CD; these days you can even install software inside a booted live CD ('write' to the CD) that remains until reboot, so you can install a C compiler or other packages if they are missing but needed for your benchmark. only disk access times are much worse on the CD, I don't know if your shell script is affected by this.
Semops and Linux
strcmp(),
Thanks for helping out on this. To respond to your objections: (1) we used a gettimeofday() timing implementation and got very similar results; (2) SEP vs. INT 0x80 is a non-issue since the CPU running SCO does not support the "sysenter" instruction; (3) at any rate, the sysenter instruction can speed things up *at most* by 50% when no other work is being done; (4) static vs. dynamic compilation is almost a non-issue, as demonstrated by benchmarks.
On the Unix.com forum, I've posted benchmarks run across different architectures, all under Linux.
-Otheus
attachments
sorry, i am not willing to get through a registration process on that site just to read an attachment