Re: Kernel Development & Objective-C

From: Ben Crowhurst
Date: Thursday, November 29, 2007 - 5:14 am

Has Objective-C ever been considered for kernel development?

regards,
BPC

-

From: KOSAKI Motohiro
Date: Friday, November 30, 2007 - 3:09 am

Why not Haskell or Erlang instead? :-D



-

From: Xavier Bestel
Date: Friday, November 30, 2007 - 3:20 am

I heard of a bash compiler. That would enable development time
rationalization and maximize the collaborative convergence of a
community-oriented synergy.



-

From: Jan Engelhardt
Date: Friday, November 30, 2007 - 3:54 am

Fortran90 it has to be.
-

From: David Newall
Date: Friday, November 30, 2007 - 7:21 am

It used to be written in BCPL; or was that Multics?
-

From: Alan Cox
Date: Friday, November 30, 2007 - 4:40 pm

> BCPL was typeless, as was the successor B (between Bell Labs and GE we 

B isn't quite typeless. It has minimal inbuilt support for concepts like
strings (although you can of course multiply a string by an array
pointer ;))

It also had some elegances that C lost, notably 

	case 1..5:

the ability to do non-zero-biased arrays

	x[40];
	x-=10;

and the ability to reassign function names.

	printk = wombat;

as well as stuff like free(function);

Alan (who learned B before C, and is still waiting for P)
-

From: Arnaldo Carvalho de Melo
Date: Friday, November 30, 2007 - 5:05 pm

Hey, the language we use, gcC has this too 8-)

[acme@doppio net-2.6.25]$ find . -name "*.c" | xargs grep 'case.\+\.\.' | wc -l
400
[acme@doppio net-2.6.25]$ find . -name "*.c" | xargs grep 'case.\+\.\.' | head
./kernel/signal.c:      default: /* this is just in case for now ... */
./kernel/audit.c:       case AUDIT_FIRST_USER_MSG ...  AUDIT_LAST_USER_MSG:
./kernel/audit.c:       case AUDIT_FIRST_USER_MSG2 ...  AUDIT_LAST_USER_MSG2:
./kernel/audit.c:       case AUDIT_FIRST_USER_MSG ...  AUDIT_LAST_USER_MSG:
./kernel/audit.c:       case AUDIT_FIRST_USER_MSG2 ...  AUDIT_LAST_USER_MSG2:
./kernel/timer.c:        * well, in that case 2.2.x was broken anyways...
./arch/frv/kernel/traps.c:      case TBR_TT_TRAP2 ... TBR_TT_TRAP126:
./arch/frv/kernel/ptrace.c:             case 0 ... PT__END - 1:
./arch/frv/kernel/ptrace.c:             case 0 ... PT__END-1:
./arch/frv/kernel/gdb-stub.c:                   case GDB_REG_GR(1) ...  GDB_REG_GR(63):
[acme@doppio net-2.6.25]$

- Arnaldo
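
For anyone who has not met the construct being grepped for above: `case LOW ... HIGH:` is a GNU C extension accepted by gcc (and clang), not part of ISO C at the time of this thread. A minimal sketch follows; the function name `classify` and its return values are made up for illustration and do not come from the kernel tree.

```c
/* Hypothetical helper: classify a small integer using GNU C case
 * ranges. Requires gcc or clang with GNU extensions enabled. */
static int classify(int n)
{
	switch (n) {
	case 1 ... 5:	/* spaces around "..." keep "1." from lexing as a float */
		return 0;	/* low */
	case 6 ... 10:
		return 1;	/* high */
	default:
		return -1;	/* out of range */
	}
}
```

Both bounds are inclusive, so `classify(5)` and `classify(6)` land in different branches.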
-

From: Bill Davidsen
Date: Saturday, December 1, 2007 - 11:27 am

Well, original C allowed you to do what you wanted with pointers (I used 
to teach that back when K&R was "the" C manual). Now people complain about 
having pointers outside the array, which is a crock in practice, as long 

I had forgotten that, the function name was actually a variable with the 
entry point, say so in section 3.11. And as I recall the code, arrays 
were the same thing, a length ten vector was actually the vector and 
variable with the address of the start. I was more familiar with the B 
stuff, I wrote both the interpreter and the code generator+library for 

I had the BCPL book still on the reference shelf in the office, along 
with goodies like the four candidates to be Ada, and a TRAC manual. I 
too expected the next language to be "P".

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
--

From: Alan Cox
Date: Saturday, December 1, 2007 - 11:18 am

Actually the standards had good reasons to bar this use, because many
runtime environments used segmentation and unsigned segment offsets. On a

B on Honeywell L66, so that may well have been a relative of your code
generator ?

--

From: Bill Davidsen
Date: Sunday, December 2, 2007 - 6:23 pm

Probably the Bell Labs one. I did an optimizer on the Pcode which caught 
jumps to jumps, then had separate 8080 and L66 code generators into GMAP 
on the GE and the CP/M assembler or the Intel (ISIS) assembler for 8080. 
There was also an 8085 code generator using the "ten undocumented 
instructions" from the Dr Dobbs article. GE actually had a contract with 
Intel to provide CPUs with those instructions, and we used them in the 
Terminet(r) printers.

Those were the days ;-)

-- 
Bill Davidsen <davidsen@tmr.com>
  "Woe unto the statesman who makes war without a reason that will still
  be valid when the war is over..." Otto von Bismarck


--

From: J.A.
Date: Friday, November 30, 2007 - 3:52 pm

Flash

http://www.lagmonster.info/humor/windowsrg.html

--
J.A. Magallon <jamagallon()ono!com>     \               Software is like sex:
                                         \         It's better when it's free
Mandriva Linux release 2008.1 (Cooker) for i586
Linux 2.6.23-jam03 (gcc 4.2.2 (4.2.2-1mdv2008.1)) SMP Sat Nov
09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
-

From: Loïc Grenié
Date: Friday, November 30, 2007 - 3:29 am

No, it has not. No language that looks even remotely like an OO language
  has ever been considered for (Linux) kernel development, nor for
  most, if not all, other operating system kernels.

    Various problems occur in an object oriented language. One of them
  is garbage collection: it provokes asynchronous delays and, during
  an interrupt or a system call for a real time task, the kernel cannot
  wait. Another is memory overhead: all the magic that OO languages
  provide takes space in memory, and the Linux kernel is used in embedded
  systems with very tight memory requirements.

    Lots of people will think of better reasons why ObjC is not used...

        Loïc Grenié
-

From: Ben Crowhurst
Date: Friday, November 30, 2007 - 4:16 am

But are embedded systems not rapidly moving on. Turning to stare at the 
Which I'm looking forward to hear :)

Thank you for your appropriate response.

--

Regards
BPC


-

From: Karol Swietlicki
Date: Friday, November 30, 2007 - 4:36 am

Here are a few reasons off the top of my head:
1. Adding extra unneeded complexity. Debugging would be harder.
2. Not many people can code ObjC when compared to the number of C coders.
3. If it ain't broken... why fix it? The kernel works, right? Good.

There is a great explanation somewhere out there of why C++ is not a
great choice for the Linux kernel; I'm not sure who wrote it. Some
things going against C++ will also go against ObjC. I cannot find it,
but it is out there somewhere.

I'm a newbie and I might be wrong, but the above is what I believe to be true.

Karol Swietlicki
-

From: Lennart Sorensen
Date: Friday, November 30, 2007 - 7:37 am

Some embedded systems run on batteries, so the less RAM they have to
power the better, and the fewer CPU cycles they have to spend executing
code the less power they consume.  An ADSL modem on your desk doesn't
have any of those worries; it just has to work, and if doubling the RAM
cuts the development problems by a lot, then that might be a
worthwhile trade-off.

--
Len Sorensen
-

From: Rogelio M. Serrano Jr.
Date: Saturday, December 8, 2007 - 1:54 am

I have tried it in a toy kernel, OSKit style. The code reuse is very
high, especially with string ops and driver interfaces. It's also very easy
to do unit testing with. My main problem was the quality of the compiler
optimization. It's just not good enough. I think if the compiler can do
the right kind of optimizations correctly then a low overhead OO
language like Objective-C can be used in a kernel.

On the other hand it's the automated testing part that really matters for
me. Imagine adding features to Linux week after week without ever
getting a serious panic or two. And then getting a big performance boost

It's all about optimizations.

-- 
Democracy is about two wolves and a sheep deciding what to eat for dinner.


--

From: J.A.
Date: Friday, November 30, 2007 - 4:19 pm

Well, I really would like to learn some things here, could we
keep this off-topic thread alive just a bit, please ?
(I know, I'm going to gain a troll's fame because I can't avoid this

I think BeOS was C++ and OS X is C + Objective-C (and runs on an iPhone).
Original Mac OS (from 6 to 9) was Pascal (and a Mac SE was very near
to embedded hardware :) ).

I do not advocate to rewrite Linux in C++, but don't say a kernel written

C++ (and, from what I read in another answer, Objective-C too) has no garbage
collection. It does not do anything you did not tell it to do. It just allows
you to change this

	struct buffer *x;
	x = kmalloc(...);
	x->sz = 128;
	x->buff = kmalloc(...);
	...
	kfree(x->buff);
	kfree(x);

to

	struct buffer *x;
	x = new buffer(128); (that itself allocates x->buff,
	                      because _you_ programmed it,
	                      so you poor programmer don't forget)
	...
	delete x;            (that also was programmed to deallocate
	                      x->buff itself, so you have one less

A vtable in C++ takes exactly the same space as the function
table pointer present in every driver nowadays... and probably
the virtual method call that C++ does itself with

	thing->do_something(with,this)

like
	push thing
	push with
	push this
	call THING_vtable+indexof(do_something) // constants at compile time

is much more efficient than what gcc can manage to do with

	thing->do_something(with,this,thing)

	push with
	push this
	push thing
	get thing+offsetof(do_something) // not constant at compile time
	dereference it
	call it

(that is, get a generic field on a structure and use it as jump address)

In short, the kernel is object oriented, implements OO programming by
hand, but the compiler lacks the knowledge that it is object oriented

People usually complain about RTTI or exceptions, but benefits versus
memory space should be seriously considered (sure there is something
in current drivers to ask 'are ...
From: Nicholas Miell
Date: Friday, November 30, 2007 - 4:53 pm

        struct test;
        struct testVtbl
        {
        	int (*fn1)(struct test *t, int x, int y);
        	int (*fn2)(struct test *t, int x, int y);
        };
        struct test
        {
        	struct testVtbl *vtbl;
        	int x, y;
        };
        void testCall(struct test *t, int x, int y)
        {
        	t->vtbl->fn1(t, x, y);
        	t->vtbl->fn2(t, x, y);
        }

and

        struct test
        {
        	virtual int fn1(int x, int y);
        	virtual int fn2(int x, int y);
        
        	int x, y;
        };
        
        void testCall(struct test *t, int x, int y)
        {
        	t->fn1(x, y);
        	t->fn2(x, y);
        }
        
generate instruction-for-instruction identical code.

-- 
Nicholas Miell <nmiell@comcast.net>

-

From: Al Viro
Date: Friday, November 30, 2007 - 5:31 pm

This is not what vtables are.  Think for a minute - all codepaths arriving
at that point in your code will pick the address to call from the same
location.  Either the contents of that location is constant (in which case
you could bloody well call it directly in the first place) *or* it has to
somehow be reassigned back and forth, according to the value of this.  The
former is dumb, the latter - outright insane.

The contents of vtables are constant.  The whole point of that thing is
to deal with the situations where we _can't_ tell which derived class
this ->do_something() is from; if we could tell which vtable it is at
compile time, we wouldn't need to bother at all.

It's a tradeoff - we pay the extra memory access (fetch vtable pointer, then 
fetch method from vtable) for not having to store a slew of method pointers
in each instance of base class.  But the extra memory access is very much
there.  It can be further optimized away if you have several method calls
for the same object next to each other (then vtable can be picked once),
but it's still done at runtime.
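
The tradeoff described above can be sketched in plain C. This is an illustrative layout only, with hypothetical names (`shape`, `shape_ops`, `fat_shape`); it is not what any particular C++ compiler emits. Layout A stores one pointer to a shared constant table per instance, layout B stores every method pointer in every instance.

```c
#include <stddef.h>

struct shape;

/* One table per "class", shared by every instance and never modified. */
struct shape_ops {
	int (*area)(const struct shape *s);
	int (*perimeter)(const struct shape *s);
};

/* Layout A: each instance carries a single pointer to the shared table. */
struct shape {
	const struct shape_ops *ops;
	int w, h;
};

/* Layout B: each instance carries every method pointer itself. */
struct fat_shape {
	int (*area)(const struct fat_shape *s);
	int (*perimeter)(const struct fat_shape *s);
	int w, h;
};

static int rect_area(const struct shape *s)      { return s->w * s->h; }
static int rect_perimeter(const struct shape *s) { return 2 * (s->w + s->h); }

static const struct shape_ops rect_ops = {
	.area = rect_area,
	.perimeter = rect_perimeter,
};

/* Two dependent loads before the call: s->ops, then ops->area. */
static int shape_area(const struct shape *s)
{
	return s->ops->area(s);
}
```

With two methods the per-instance saving is one pointer; it grows with the number of methods, while the cost of the extra load stays the same.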
-

From: Al Viro
Date: Friday, November 30, 2007 - 5:34 pm

s/this/thing/, of course
-

From: J.A.
Date: Friday, November 30, 2007 - 6:09 pm

Yup, my mistake (that's why I said I would learn something). I was thinking
of non-virtual methods. For virtual ones you have to fetch the vtable

-

From: Avi Kivity
Date: Saturday, December 1, 2007 - 12:55 pm

True. C++ vtables have no performance advantage over C ->ops->function() 
calls. But they have no disadvantage either and they do offer many 
syntactic advantages (such as automatically casting the object type to 
the *correct* derived class).
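
For comparison, this is roughly what the manual version of that cast looks like in C with an `->ops->function()` table. The `container_of` macro below is a simplified form of the kernel idiom (the kernel's own version adds type checking); `device`, `ramdisk`, and `ramdisk_read` are hypothetical names used only for this sketch.

```c
#include <stddef.h>

/* Simplified container_of(): recover the enclosing structure from a
 * pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct device;

struct device_ops {
	int (*read)(struct device *dev);
};

struct device {			/* plays the role of the base class */
	const struct device_ops *ops;
};

struct ramdisk {		/* hypothetical "derived class" */
	int last_value;
	struct device dev;	/* base embedded as a member */
};

static int ramdisk_read(struct device *dev)
{
	/* The downcast a C++ compiler would do implicitly via "this". */
	struct ramdisk *rd = container_of(dev, struct ramdisk, dev);

	return rd->last_value;
}

static const struct device_ops ramdisk_ops = { .read = ramdisk_read };
```

Nothing stops `ramdisk_read` from being installed on a `device` that is not embedded in a `ramdisk`; that is the kind of mistake the typed C++ version rules out.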

--

From: Lennart Sorensen
Date: Tuesday, December 4, 2007 - 10:54 am

Well I am pretty sure the microkernel of OS X is in C, and certainly
the BSD layer is as well.  So the only ObjC part would be the NeXTSTEP
framework and other parts of the Mac GUI and other Mac APIs they
provide, which all at some point probably end up calling down into the C

But kmalloc is implemented by the kernel.  Who implements 'new'?

--
Len Sorensen
--

From: J.A.
Date: Tuesday, December 4, 2007 - 2:24 pm

Help yourself... as kmalloc() is a replacement for userspace glibc's
malloc, you can write your replacements for functions/operators in
libstdc++ (operators are just cosmetic, as many other features in C++)
In fact, for someone who dared to write a kernel C++ framework, the
very first function he has to write could be something like:

void* operator new(size_t sz)
{
	return kmalloc(sz,GFP_KERNEL);
}

And could write alternatives like

operator new(size_t sz,int flags) -> x = new(GFP_ATOMIC) X;

operator new(size_t sz,MemPool& pl) -> x = new(pool) X;

If you are curious, this page http://www.osdev.org/wiki/C_PlusPlus
has some clues about what you should implement to get rid of
libstdc++.
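
The same pairing of allocation and cleanup can be approximated in plain C with a create/destroy pair. The sketch below uses `malloc()`/`free()` as user-space stand-ins for `kmalloc()`/`kfree()` so that it compiles outside the kernel; the `sz` and `buff` fields follow the `struct buffer` example earlier in the thread, while `buffer_new` and `buffer_free` are hypothetical names.

```c
#include <stdlib.h>

struct buffer {
	size_t sz;
	char *buff;
};

/* Plays the role of "new buffer(128)": allocates the object and its
 * payload together, or nothing at all on failure. */
static struct buffer *buffer_new(size_t sz)
{
	struct buffer *x = malloc(sizeof(*x));	/* kmalloc() stand-in */

	if (!x)
		return NULL;
	x->buff = malloc(sz);
	if (!x->buff) {
		free(x);
		return NULL;
	}
	x->sz = sz;
	return x;
}

/* Plays the role of "delete x": frees the payload, then the object. */
static void buffer_free(struct buffer *x)
{
	if (!x)
		return;
	free(x->buff);
	free(x);
}
```

The difference from the C++ form is that nothing forces callers to go through these two functions; the language will still let them allocate or free the pieces by hand.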

--

From: Matti Aarnio
Date: Friday, November 30, 2007 - 4:37 am

To my recall:  Never.

Some limited subset of C++ was tried, but was soon abandoned.

Overall the kernel data structures are done in an object-ish manner,
although there are no strong type mechanisms being used.

Could the kernel be written in a limited subset[*] of ObjC ?  Very likely.
Would it be worth the job ?   Radical decrease in number of available
programmers...

*) Subset as enforcing the rule of not even indirectly using dynamic
   memory allocation, when operating in interrupt state.

      /Matti Aarnio
-

From: Kyle Moffett
Date: Friday, November 30, 2007 - 8:26 am

Objective-C is actually a pretty minimal wrapper around C; it was  
originally implemented as a C preprocessor.  It generally does not  
have any kind of memory management, garbage collection, or anything  
else (although typically a "runtime" will provide those features).   
There are no first-class exceptions, so there would be nothing to  
worry about there (the exceptions used in GUI programs are built  
around the setjmp/longjmp primitives).  Objective-C is also almost  
completely backwards-compatible with C, much more so than C++ ever  
was.  As far as the runtime goes the kernel would be expected to  
write its own, the same way that it implements "kmalloc()" as part of  
a "C runtime".  Since the runtime itself never does any implicit  
memory allocation, I think it would conceivably even be relatively  
safe for kernel usage.

With that said, there is a significant performance penalty as all  
Objective-C method calls are looked up symbolically at runtime for  
every single call.  For GUI programs where large chunks of the code  
are event-loops and not performance-sensitive that provides a huge  
amount of extra flexibility.  In the kernel though, there are many  
codepaths where *every* *single* instruction counts; that could be a  
serious performance hit.

Cheers,
Kyle Moffett

-

From: H. Peter Anvin
Date: Friday, November 30, 2007 - 11:40 am

GACK!

At least C++ has vtables.

	-hpa

-

From: Kyle Moffett
Date: Friday, November 30, 2007 - 12:35 pm

In a tight loop there is a way to do a single symbolic lookup and  
just call directly through a function pointer, but typically it isn't  
necessary for GUI programs and the like.  The flexibility of being  
able to dynamically add new methods to an existing class (at least  
for desktop user interfaces) significantly outweighs the performance  
cost.  Any performance-sensitive code is typically written in  
straight C anyways.
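
The "look up once, then call through a function pointer" trick can be illustrated in plain C. This is a deliberately naive model with a linear string search standing in for the symbolic lookup; real Objective-C runtimes are considerably smarter about caching, and every name here (`method_entry`, `lookup`, `sum_dynamic`, `sum_cached`) is invented for the sketch.

```c
#include <stddef.h>
#include <string.h>

typedef int (*method_fn)(int arg);

struct method_entry {
	const char *selector;	/* method name, compared at run time */
	method_fn imp;		/* implementation */
};

static int twice(int x)  { return 2 * x; }
static int square(int x) { return x * x; }

/* A toy "class": a name-to-function table searched on every send. */
static const struct method_entry methods[] = {
	{ "twice", twice },
	{ "square", square },
};

static method_fn lookup(const char *selector)
{
	for (size_t i = 0; i < sizeof(methods) / sizeof(methods[0]); i++)
		if (strcmp(methods[i].selector, selector) == 0)
			return methods[i].imp;
	return NULL;
}

/* Slow path: one symbolic lookup per call. */
static int sum_dynamic(int n)
{
	int total = 0;

	for (int i = 0; i < n; i++)
		total += lookup("square")(i);
	return total;
}

/* Fast path: look up once, then call through the pointer. */
static int sum_cached(int n)
{
	method_fn imp = lookup("square");
	int total = 0;

	for (int i = 0; i < n; i++)
		total += imp(i);
	return total;
}
```

Both functions compute the same sum; the cached version gives up the ability to notice a method being replaced while the loop runs, which is the flexibility being traded away.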

Cheers,
Kyle Moffett

-

From: Avi Kivity
Date: Saturday, December 1, 2007 - 1:03 pm

Write *those* *codepaths* in *C* or *assembly*. But only after you 
manage to measure a difference compared to the object-oriented systems 
language.

[I really doubt there are that many of these; syscall 
entry/dispatch/exit, interrupt dispatch, context switch, what else?]
--

From: Andi Kleen
Date: Sunday, December 2, 2007 - 12:01 pm

Networking, block IO, page fault, ... But only the fast paths in these 
cases. A lot of the kernel is slow path code and could probably
be written even in an interpreted language without much trouble.

-Andi
--

From: Avi Kivity
Date: Sunday, December 2, 2007 - 10:12 pm

Even these (with the exception of the page fault path) are hardly "we 
care about a single instruction" material suggested above.  Even with a 
million packets per second per core (does such a setup actually exist?)  
you have a few thousand cycles per packet.  For block you'd need around 
5,000 disks per core to reach such rates.
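
The per-packet budget is just the clock rate divided by the event rate. A tiny sketch, with a hypothetical helper name and an assumed 2 GHz clock for the million-packets case (the message does not state one):

```c
#include <stdint.h>

/* Cycles available per event on one core: clock rate / event rate. */
static uint64_t cycles_per_event(uint64_t clock_hz, uint64_t events_per_sec)
{
	return clock_hz / events_per_sec;
}
```

At an assumed 2 GHz and one million packets per second this gives 2000 cycles per packet. The same division yields the other figures quoted in this thread: 400 cycles at 4 GHz and 10M packets/s, and 2857 cycles at 1.8 GHz and 630k packets/s.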

The real benefits aren't in keeping close to the metal, but in high 
level optimizations.  Ironically, these are easier when the code is a 
little more abstracted.  You can add quite a lot of instructions if it 
allows you not to do some of the I/O at all.


--

From: Andi Kleen
Date: Monday, December 3, 2007 - 2:50 am

With 10Gbit/s ethernet working you start to care about every cycle.
Similar with highend routing or in some latency sensitive network
applications (e.g. in HPC). Another simple noticeable case is Unix
sockets and your X server communication. 

And there are some special cases where block IO is also pretty critical.
A popular one is TPC-* benchmarking, but there are also others and it 
looks likely in the future that this will become more critical

While that's partly true -- cache misses are good for a lot of cycles --
it is not the whole truth and at some point raw code efficiency matters
too.

For example there are some CPUs that are relatively slow at indirect
function calls and there are actually cases where this can be measured.

-Andi

--

From: Avi Kivity
Date: Monday, December 3, 2007 - 4:46 am

If you have 10M packets/sec no amount of cycle-saving will help you.  
You need high level optimizations like TSO.  I'm not saying we should 

True.  And here, the hardware can cut hundreds of cycles by avoiding the 

Your reflexes are *much* better than mine if you can measure half a 
nanosecond on X.

Here, it's scheduling that matters, avoiding large transfers, and 
avoiding ping-pongs, not some cycles on the unix domain socket.  You 
already paid 150 cycles or so by issuing the syscall and thousands for 

And again the key is batching, improving cpu affinity, and caching, not 

That is true.  But any self-respecting systems language will let you 
choose between direct and indirect calls.

If adding an indirect call allows you to avoid even 1% of I/O, you save 
much more than you lose, so again the high level optimizations win.

Nanooptimizations are fun (I do them myself, I admit) but that's not 
where performance as measured by the end user lies.

-- 
error compiling committee.c: too many arguments to function

--

From: Andi Kleen
Date: Monday, December 3, 2007 - 4:50 am

A lot of applications don't and the user space networking schemes

That's not about mouse/keyboard input, but about all X protocol communication
between X clients and X server. The key is not large copies here 

That's not the whole story no. Batching etc are needed, but the

It depends. Often high level (and then caching) optimizations are better 
bang for the buck, but completely disregarding the fast path work is a bad 
thing too. As an example see Christoph's recent work on the slub fastpath
which makes a quite measurable difference on benchmarks.


-Andi

--

From: Willy Tarreau
Date: Monday, December 3, 2007 - 2:13 pm

Huh? At 4 GHz, you have 400 cycles to process each packet. If you need to
route those packets, those cycles may just be what you need to lookup a
forwarding table and perform a few MMIO on an accelerated chip which will
take care of the transfer. But you need those cycles. If you start to waste

It just depends how many times a second it happens. For instance, consider
this trivial loop (fct is an array of two functions which just return 1 or 2):

        i = 0;
        for (j = 0; j < (1 << 28); j++) {
                k = (j >> 8) & 1;
                i += fct[k]();
        }

It takes 1.6 seconds to execute on my athlon-xp 1.5 GHz. If, instead of
changing the function once every 256 calls, you change it to every call :

        i = 0;
        for (j = 0; j < (1 << 28); j++) {
                k = (j >> 0) & 1;
                i += fct[k]();
        }

Then it takes 4.3 seconds, which is almost 3 times slower. The number
of calls per function remains the same (128M calls each), it's just the
branch prediction which is wrong every time. The very few nanoseconds added
at each call are enough to slow down a program from 1.6 to 4.3 seconds while
it executes the exact same code (it may even save one shift). If you have
such stupid code, say, to compute the color or alpha of each pixel in an
image, you will certainly notice the difference.
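
For anyone who wants to try the experiment, the loop can be completed into something compilable along these lines. Only `fct` and the loop body come from the message; `one`, `two`, `run_calls`, and the `shift` parameter are assumptions added here so that both variants share one function.

```c
typedef int (*fn_t)(void);

static int one(void) { return 1; }
static int two(void) { return 2; }

/* The two-entry table the message calls "fct". */
static fn_t fct[2] = { one, two };

/* Runs the loop from the message. "shift" selects how often the call
 * target changes: 8 switches every 256 iterations, 0 switches on
 * every iteration. */
static long run_calls(unsigned long iterations, unsigned int shift)
{
	long i = 0;

	for (unsigned long j = 0; j < iterations; j++) {
		unsigned int k = (j >> shift) & 1;

		i += fct[k]();
	}
	return i;
}
```

Calling `run_calls(1UL << 28, 8)` and `run_calls(1UL << 28, 0)` under a timer reproduces the two cases. The 1.6 s and 4.3 s figures are specific to the Athlon XP mentioned above; other CPUs, and compilers that manage to see through the table, may show a very different gap or none at all.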

And such poorly efficient code may happen very often when you blindly rely

You are forgetting something very important : once you start stacking
functions to perform the dirty work for you, you end up with so much
abstraction that even new stupid code cannot be written at all without
relying on them, and it's where the problem takes its roots, because
when you need to write a fast function and you notice that you cannot
touch a variable without passing through a slow pinhole, your fast
function will remain slow whatever you do, and the worst of all is that
you will think that it is normally fast and that it cannot be written

Every ...
From: J.A.
Date: Monday, December 3, 2007 - 2:39 pm

On Mon, 3 Dec 2007 22:13:53 +0100, Willy Tarreau <w@1wt.eu> wrote:


But don't forget that OOP is just another way to organize your code,
and let the language/compiler do some error-prone things you shouldn't
be doing yourself, like filling in a vtable pointer.

And of course everything depends on what language you choose and how
you use it.
You could write an equally efficient kernel in languages like C++,
using C++ abstractions as a high level organization, where
the fast paths could be coded the right way; we are not talking about
C# or Java, where even a sum is a call to an overloaded method.
It's the difference between doing school-book pushes and pops on lists,
and suddenly inventing the splice operator...

--

From: Alan Cox
Date: Monday, December 3, 2007 - 2:57 pm

It's very very hard to generate good C++ code because of the numerous ways
objects get temporarily created, and the weak aliasing rules (as with C).

There are reasons that Fortran lives on (and no I'm not suggesting one
should rewrite the kernel in Fortran ;)) and the fact it's not really got
pointer aliasing or "address of" operators and all the resulting
optimisation problems is one of the big ones.

Alan
--

From: J.A.
Date: Tuesday, December 4, 2007 - 2:47 pm

That is what I like about C++: with good placement of high level features
like consts and & (references) one can gain fine control over what
gets copied or not.
Try to write a Vector class that does ops with SSE without storing
temporaries on the stack. It's a good example of how one can get low
level control, and gcc is pretty good at simplifying things like u=v+2*w
and not putting anything on the stack, all in xmm registers.

The advantage is you only have to be careful one time, when you write


--

From: Diego Calleja
Date: Tuesday, December 4, 2007 - 3:20 pm

But... if there's some way Linux can get "language improvements", it is with
new C standards, gcc extensions, etc. It'd be nice if people tried to add
(useful) C extensions to gcc, instead of proposing some random language :)
--

From: Avi Kivity
Date: Tuesday, December 4, 2007 - 2:07 pm

I really doubt Linux spends 400 cycles routing a packet.  Look what an 
skbuff looks like.

A flood ping to localhost on a 2GHz system takes 8 microseconds, that's 
16,000 cycles.  Sure it involves userspace, but you're about two orders 
of magnitude off.  And the localhost interface is nicely cached in L1 

This happens very often in HPC, and when it does, it is often worthwhile 
to invest in manual optimizations or even assembly coding.  
Unfortunately it is very rare in the kernel (memcmp, raid xor, what 
else?).  Loops with high iteration counts are very rare, so any 
attention you give to the loop body is not amortized over a large number 

Using an indirect call where a direct call is sufficient will also 
reduce the compiler's optimization opportunities.  However, I don't see 
anyone recommending it in the context of systems programming.

It is not true that the number of indirect calls necessarily increases 
if you use a language other than C.


I don't understand.  Can you give an example?

There are two cases where abstraction hurts performance: the first is 
where the mechanisms used to achieve the abstraction (functions instead 
of direct access to variables, function pointers instead of duplicating 
the caller) introduce performance overhead.  I don't think C has any 
advantage here -- actually a disadvantage as it lacks templates and is 
forced to use function pointers for nontrivial cases.  Usually the 
abstraction penalty is nil with modern compilers.
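
The point about templates can be made concrete with the two usual C workarounds. This is only a sketch with hypothetical names (`max_generic`, `cmp_int`, `DEFINE_MAX`, `max_int`): a generic routine that compares through a function pointer, and a macro that stamps out a typed copy with a direct comparison, which is roughly what a C++ template would do for you. Both assume at least one element.

```c
#include <stddef.h>

/* Generic version: one body for all types, but every comparison is an
 * indirect call unless the compiler can see through the pointer. */
static const void *max_generic(const void *base, size_t n, size_t size,
			       int (*cmp)(const void *, const void *))
{
	const char *best = base;

	for (size_t i = 1; i < n; i++) {
		const char *cur = (const char *)base + i * size;

		if (cmp(cur, best) > 0)
			best = cur;
	}
	return best;
}

static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/* Macro "template": generates a typed function with a direct compare. */
#define DEFINE_MAX(name, type)					\
static type name(const type *v, size_t n)			\
{								\
	type best = v[0];					\
								\
	for (size_t i = 1; i < n; i++)				\
		if (v[i] > best)				\
			best = v[i];				\
	return best;						\
}

DEFINE_MAX(max_int, int)
```

The macro form avoids the indirect call but loses type checking on the macro arguments and is harder to debug, which is part of why the function-pointer form is so common in C.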

The second case is where too much abstraction clouds the programmer's 

A 100 byte program will print "hello world" on a UART and stop.  A 
modern program will load a vector description of a font, scale it to the 
desired size, render it using anti aliasing and sub-pixel positioning, 
lay it out according to the language rules of wherever you live, and 
place it on a multi-megabyte frame buffer.  Yes it needs hundreds of 

That is true, that is why we see a lot more microoptimizations than 
algorithmic ...
From: Willy Tarreau
Date: Tuesday, December 4, 2007 - 3:43 pm

Hi Avi,


That's not what I wrote. I just wrote about doing forwarding table lookup
and MMIO so that dedicated hardware NICs can process the recv/send to the
correct ends. If you just need to scan a list of DMAed packets, look at
their destination IP address, lookup that IP in a table to find the output
NIC and destination MAC address, link them into an output list and wake
the output NIC up, there's nothing which requires more than 400 cycles
here. I never said that it was a requirement to pass through the existing

I don't see where you see a userspace (or I don't understand your test).
In the traffic generation I often do from user space, I can send 630k raw
Ethernet packets per second on a 1.8 GHz Opteron with PCI-e
NICs. That's 2857 cycles per packet, including the (small amount of)


Well, in my example above, everything in the path of the send() syscall down
to the bare metal NIC is under high pressure in a fast loop. 30 cycles
already represent 1% of the performance! In fact, to modulate speed, I


Yes, the most common examples found today involve applications reading
data from databases. For instance, let's say that one function in your
program must count the number of unique people with the name starting
with an "A". It is very common to see "low-level" primitives to abstract
the database for portability purposes. One such primitive will
generally consist of retrieving a list of people with their names,
age and sex in one well-formatted 3-column array. Many lazy people will
not see any problem in calling this one from the function described
above. Basically, what they would do is :

 count_people_with_name_starting_with_a()
    -> array[name,age,sex] = get_list_of_people()
         -> while read_one_people_entry() {
               alloc(one_line_of_3_columns)
               read then parse the 3 fields
               format_them_appropriately
            }
    -> create a new array "name2" by duplicating the "name" column
    -> name3 = ...
From: Avi Kivity
Date: Wednesday, December 5, 2007 - 10:05 am

If you're writing a single-purpose program then there is justification 
to micro-optimize it to death.  Write it in VHDL, even.  But that 



Having an interface to send multiple packets in one syscall would cut 

Your optimized version is wrong.  It counts duplicated names, while you 
stated you needed unique names.  Otherwise the sort_unique step is 
completely redundant.

Databases are good examples of where the abstraction helps.  If you had 
hundreds of millions of records in your example, you'd connect to a 
database, present it with an ASCII string describing what you want, upon 
which it would parse it, compile it into an internal language against 
the schema, optimize that and then execute it.  Despite all that 
abstraction it would win against your example because it would implement 
the inner loop as

    open index (by name)
    seek to 'A'
    while (current starts with 'A')
        ++count (taking care of the uniqueness requirement if needed)
    close index

Thus it would never see people whose name begins with 'W'.  If the 
database had a materialized view feature, and this particular query was 
deemed important enough, it would optimize it to

    open materialized view
    read count
    close materialized view
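
The index-based inner loop can be imitated in a few lines of C, using a sorted in-memory array as a stand-in for the database index. This is a sketch under that assumption, with hypothetical function names; it handles the uniqueness requirement by relying on duplicates being adjacent in sorted order.

```c
#include <stddef.h>
#include <string.h>

/* Binary search: index of the first name that is >= key. */
static size_t lower_bound(const char *const *names, size_t n, const char *key)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(names[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Counts distinct names starting with "prefix" in a sorted array,
 * touching only the matching run of entries. */
static size_t count_unique_prefix(const char *const *names, size_t n,
				  const char *prefix)
{
	size_t plen = strlen(prefix);
	size_t count = 0;

	for (size_t i = lower_bound(names, n, prefix);
	     i < n && strncmp(names[i], prefix, plen) == 0; i++) {
		/* Sorted input puts duplicates next to each other. */
		if (count == 0 || strcmp(names[i], names[i - 1]) != 0)
			count++;
	}
	return count;
}
```

The seek costs O(log n) comparisons and the loop stops at the first name that no longer matches, so entries beginning with 'W' are never read.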

The database does all this while allowing concurrent reads and writes 
and keeping your data in case someone trips on the power cord.  You 

If the abstraction is badly written, and further you cannot change it, 
then of course it hurts.  But if the abstraction is well written, or if 
it can be fixed, then all is well.  The problem here is not that 
abstractions exist, but that you persist in using a broken API instead 

That's life.  The fact is that users demand features, and programmers 
cater to them.  If you can find a way to provide all those features 
without the bloat, more power to you.  The abstractions here are not the 
cause of the bloat, they are the tool used to provide the features while 

You don't need ...
From: Gilboa Davara
Date: Monday, December 3, 2007 - 5:35 am

Intel's newest dual 10GbE NIC can easily (?) throw ~14M packets per
second. (theoretical peak at 1514bytes/frame)
Granted, installing such a device on a single CPU/single core machine is
absurd - but even on an 8 core machine (2 x Xeon 53xx/54xx / AMD
Barcelona) it can still generate ~1M packets/s per core.

Now, assuming you're doing low-level (passive) filtering of some sort
(frame/packet routing, traffic interception and/or packet analysis),
using hardware assistance (TSO, complete TCP offloading, etc) is off the
table, and each and every cycle within netif_receive_skb (and friends)
-counts-.

I don't suggest that the kernel should be (re)designed for such (niche)
applications but on the other hand, if it works...

- Gilboa

--

From: Gilboa Davara
Date: Monday, December 3, 2007 - 5:44 am

Sigh... Sorry. Please ignore the broken math on my part.
Make that 1.8M frames/second per card and ~100K packets/second per core.

- Gilboa


--

From: Casey Schaufler
Date: Monday, December 3, 2007 - 9:28 am

I was involved in a 10GbE project like you're describing not too
long ago. Only the driver, and only a tight, lean, special purpose
driver at that, was able to deal with line rate volumes. This was
in a real appliance, where faster CPUs were not an option. In fact,
no hardware changes were possible due to the issues with squeezing
in the 10GbE NICs. This project would have been impossible without
the speed and deterministic behavior of the kernel C environment.


Casey Schaufler
casey@schaufler-ca.com
--

From: Lennart Sorensen
Date: Tuesday, December 4, 2007 - 10:50 am

10GbE can't do 14M packets per second if the packets are 1514 bytes.  At
10M packets per second you have less than 1000 bits per packet, which is
far from 1514bytes.

10Gbps gives you at most 1.25GBps, which at 1514 bytes per packet works
out to 825627 packets per second.  You could reach ~14M packets per
second with only the smallest packet size, which is rather unusual for
high throughput traffic, since you waste almost all the bytes on
overhead in that case.  But you do want to be able to handle at least a
million or two packets per second to do 10GbE.
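
The arithmetic above is easy to check with a short sketch. The function
name and the overhead figures are assumptions added for illustration: the
first call reproduces the plain 1.25 GB/s divided by 1514 bytes, while the
other two add the usual Ethernet per-frame wire overhead (4-byte FCS,
8-byte preamble, 12-byte inter-frame gap), which the figures above do not
account for.

```python
TEN_GBE_BPS = 10_000_000_000  # 10 Gbit/s line rate


def frames_per_second(link_bps, frame_bytes, overhead_bytes=0):
    """Whole frames per second that fit on a link of link_bps bits/s.

    overhead_bytes is per-frame wire overhead not counted in frame_bytes.
    """
    bits_per_frame = (frame_bytes + overhead_bytes) * 8
    return link_bps // bits_per_frame


# Figure from the message: 1514-byte packets, no extra overhead counted.
print(frames_per_second(TEN_GBE_BPS, 1514))       # 825627

# Assumption: add FCS (4) + preamble (8) + inter-frame gap (12) per frame.
print(frames_per_second(TEN_GBE_BPS, 1514, 24))   # 812743

# Assumption: 64-byte minimum frame (FCS included) + preamble and gap (20).
print(frames_per_second(TEN_GBE_BPS, 64, 20))     # 14880952
```

The last line is where the ~14M figure comes from: it only holds for
minimum-size frames.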

--
Len Sorensen
--

From: Gilboa Davara
Date: Wednesday, December 5, 2007 - 3:31 am

... I corrected my math in the second email. [1] 

Nevertheless, a VOIP network (e.g. G729 and friends) can generate the
maximum number of frames allowed on 10GbE which is, AFAIR, just
below 15M -per- port. (~29M on a dual port card)

While I doubt that any non-NPU based NIC can handle such a load, on
mixed networks we're already seeing well above 1M frames per port.

- Gilboa
[1] http://lkml.org/lkml/2007/12/3/69


--

From: Avi Kivity
Date: Saturday, December 1, 2007 - 12:59 pm

C also requires a (very minimal) runtime. And I don't see how having a 
runtime disqualifies a language from being usable in a kernel; the 
runtime is just one more library, either supplied by the compiler or by 

Object orientation in C leaves much to be desired; see the huge number 
of void pointers and container_of()s in the kernel.
--

From: Jörn
Date: Sunday, December 2, 2007 - 12:44 pm

While true, this isn't such a bad problem.  A language really sucks when
it tries to disallow something useful.  Back in university I was forced
to write system software in Pascal.  Simple pointer arithmetic became a
5-line piece of code.

Imo the main advantage of C is simply that it doesn't get in the way.

Jörn

-- 
But this is not to say that the main benefit of Linux and other GPL
software is lower-cost. Control is the main benefit--cost is secondary.
-- Bruce Perens
--

From: Lennart Sorensen
Date: Monday, December 3, 2007 - 9:53 am

Well the majority of C syntax requires no runtime library.  There are
some system call like things that you often want that need a library
(like malloc and such), but those aren't really part of C itself.  Of
course without malloc and printf and file i/o calls the program would
probably be a bit boring.  I have written some small C programs without
a runtime, where the few things I needed were implemented in assembly

As a programming language, C leaves much to be desired.

--
Len Sorensen
--

From: Chris Snook
Date: Friday, November 30, 2007 - 8:00 am

No.  Kernel programming requires what is essentially assembly language with a 
lot of syntactic sugar, which C provides.  Higher-level languages abstract away 
too much detail to be suitable for the sort of bit-perfect control you need when 
you're directly controlling bare metal.  You can still use object-oriented 
programming techniques in C, and we do this all the time in the kernel, but we 
do so with more fine-grained explicit control than a language like Objective-C 
would give us.  More to the point, if we tried to use Objective-C, we'd find 
ourselves needing to fall back to C-style explicitness so often that it wouldn't 
be worth the trouble.

In other news, I hear Hurd boots again!

	-- Chris
-

From: David Newall
Date: Saturday, December 1, 2007 - 2:50 am

I somewhat disagree.  Kernel programming requires and deserves the same 
care, rigor and eye to details as all other serious systems.  Whilst 
performance is always a consideration, high-level languages give a 
reward in ease of expression and improved reliability, such that a 
notional performance cost is easily justified.  Occasionally, precise 
bit-diddling or tight timing requirements might necessitate use of 
assembly; even so, a lot of bit-diddling can be expressed in high-level 
languages.

Kernel programming might require a scintilla of assembly language, but 
the very vast majority of it should be written in a high-level language.

There's an old joke that claims, "real programmers can write FORTRAN in 
any language."  It's true.  Object orientation is a style of 
programming, not a language, and while certain languages have intrinsic 
support for this style, objects, methods, properties and inheritance can 
probably be written in any language.  It's an issue of putting in 
care and eye to detail.

Linux could be written in Objective-C, it could be written in Pascal, 
but it is written in plain C, with a smattering of assembler.  Does it 
need to be more complicated than that?
--
