Re: [BUG on PREEMPT_RT, 2.6.23.1-rt5] in rt-mutex code and signals

Previous thread: [PATCH 4/4] proc: fix PDE refcounting by Alexey Dobriyan on Friday, November 16, 2007 - 8:12 am. (2 messages)

Next thread: [PATCH] Driver core: fix race in __device_release_driver by Alan Stern on Friday, November 16, 2007 - 9:57 am. (6 messages)
From: Steven Rostedt
Date: Friday, November 16, 2007 - 1:20 pm

The above sounds more like you need a completion. What's used to wake up
the caller of down_interruptible?

Can you post some code to see what you are doing. That would make it much
easier to analyze.

-

From: Remy Bohmer
Date: Friday, November 16, 2007 - 4:02 pm

Hello Steven,

Funny, I first started with using completion structures, but that did
not work either. I get similar OOPses on all these kind of locking
mechanisms, as long as I use the _interruptible() type. I tried every
work-around I can think of, but none worked :-((
Even if I block on an ordinary rt-mutex in the same routine, wait a
A call to up() is used from inside an interrupt(thread) context, but
this is not relevant for the problem, because only blocking on a
semaphore with down_interruptible() and waking the thread by CTRL-C is
enough to get this Oops.

I saw that the code is trying to wake 'other waiters', while there is
only 1 thread waiting on the semaphore at most. I feel that the root
cause of this problem has to be searched there.

I believe that executing any PI code on semaphores is a strange
behavior anyway, because a semaphore is never 'owned' by a thread, and
it is always another thread that wakes the thread that blocks on a
semaphore, and because the waker is unknown, the PI code will always
boost the prio of the wrong thread.

Strange is also, that I get different behavior on ARM if I use
sema_init(&sema, 1) versus sema_init(&sema,0). The latter seems to
crash less, it will not crash until the first up(); while the first
will crash even without any up().

Attached I have put a sample driver I just hacked together a few
minutes ago. It is NOT the driver that has generates the oops in the
previous mail, but I have stripped a scull-driver down that much that
it will be much easier to talk about, and to keep us focussed on the
part of the code that is causing this.
Besides: I tested this driver on X86 with 2.6.23.1-rt5 and I get the
also OOPSes although slightly different than on ARM. See the attached
dummy.txt file.


Beware: The up(&sema) is NOT required to get this OOPS, I get it even
without any up(&sema) !

I hope you can look at the attached driver source and help me with this...


Kind Regards,

Remy Bohmer



From: Steven Rostedt
Date: Friday, November 16, 2007 - 4:37 pm

The taker of a mutex must also be the one that releases it.  I don't see
how you could use a mutex for this. It really requires some kind of

Exactly why it should be a completion or compat semaphores. The reason we
did PI on semaphores is only because they were used as mutexes before Ingo
pushed to actually get a mutex primative into the kernel. Since then,
we've been trying to remove all semaphores with either a mutex or

        down_interruptible(&dummy);
        printk("We will block now, and if you press CTRL-C from here, we get an OOPS.\n");
        down_interruptible(&dummy);


This double down is actually illegal with rt semaphores. Because we treat
semaphores as mutexes unless they are declared as compat_semaphores. In
which case we don't do PI.

Seems that you need to work out how to use a completion for your code. And
if that doesn't work, then use a compat_semaphore. But beware, that the
compat_semaphore can cause unbounded latencies. But then again, so can
completions.


-- Steve

-

From: Remy Bohmer
Date: Saturday, November 17, 2007 - 4:44 am

I tried several ways of working around the bug, even tried
implementing it with kernel threads and protecting global data with
mutexes. Therefor I know that I have the same problem with mutexes. I
just created a simple example that showed the problem quickly, this
does not mean that this is the only case that does not work.

BTW: I am hacking around in the PREEMPT-RT kernel for years now, I
know the history very well, and I know what I am doing... Please, do
not find holes in the example I quickly hacked together, please focus
on the OOPS message, and help me figuring out what is causing this. I
can give you other examples of code that shows the same problem. But,
basically, every call that blocks with _inturruptible() on a rt-mutex
beneath the surface in the context of a user space process (and

Okay, sounds fair. But: the current implementation does counting the
number of up's and down's, suggesting that it really behaves like a
semaphore. It only does some special things during the transition of
the counter from 0->1 and from 1->0. If this counting is illegal use
of the mutex mechanism, it should report (compile) errors if:
* sema_init is used on 'struct semaphore' -> init_MUTEX() must be used instead.
* sema_init should only be used on 'struct compat_semaphore' types.
* calls to up() and down() in a row should report a BUG message,
* if up() is called from a different thread than the down() it should
report a BUG message. Further, the counting up's and down's are not
allowed on struct semphore types, so it should be removed from the
code.
* PI should only take place if it is for 100% sure that the 'struct
semaphore' is used as a mutex. And this is only the case when it is
initialised with init_MUTEX().

So, because all these items are not there, I doubt it is really true
that it is illegal to use 'struct semaphore' types as counting
semaphores across multiple threads. BESIDES: Everything works fine
UNTIL a signal is generated during a block on the semaphore. I ...
From: Steven Rostedt
Date: Saturday, November 17, 2007 - 7:08 am

Please don't think that I just took your example to find a hole in it.

Actually, the reason I pointed that out was not because of "Oh, this
example is broken", but because of the description of the problem you
are trying to solve. To me, it needs to be done with a completion or


Well, we can't determine that with code ;-)  Remember, there are still
drivers out in the world that use semaphores as mutexes. So the PI

Right, because a non-RT semaphore _is_ a compat_semaphore. I'm saying if
you see the bug in your driver with the compat_semaphore then lets debug

Not really.  There are things that the preempt-rt kernels require. One, is
that things that need to keep semaphores instead of using them as mutexes,
they should be converted to compat_semaphores.  Perhaps now that we have
mutexes, we can remove the PI on semaphores, and out-of-tree drivers will

Yeah, that code is more of a hack to convert counting semaphores into
mutexes. But semaphores still need to have owners, and they should not

OK, I wont be able to work on this this weekend, but I'll try to get to it
on Monday.  A better example to show the bug you are looking for is simply
create a mutex and create a thread that grabs that mutex and goes to
sleep. Have your driver read grab that mutex with
mutex_lock_interruptible. And if the signal code is broken with this, then
you definitely got a point that the inerruptible code is broken.

This will keep the semantics clean and not obfuscate it with the semaphore
code.

I'll write up that example on Monday if you don't have the time.

Note, that the unloading of the module should wake up the thread that
grabbed the mutex so it can release it.

-- Steve
-

From: Steven Rostedt
Date: Saturday, November 17, 2007 - 8:06 am

Oh, I missed your point here. I was up late last night so I'm blaming that
for my _not_so_thorough_ reading ;-)

You are saying that we should change the semaphore to a mutex only if it
was initialized by init_MUTEX(). I see your point.

Actually, if something is initialized by init_MUTEX it really needs to be
a mutex :-)

Right now the compiler determines that something is a mutex or a
semaphore. If we make it a run time option, that will add more complexity
to the code and probably make it less efficient. The more complex part
is something we can deal with. The less efficient part we can not.

find .  -name "*.c" ! -type d  | xargs grep  "init_MUTEX" | wc -l
100

Heh, nice number.

Well, I guess I could start sending patches to mainline to convert
semaphores to mutexes. I'm sure that will probably annoy akpm, but I'll do
it one at a time, with lots of thought behind each change.

BTW, drivers breaking with the RT patch is not always considered a
regression.

For example, this is fine in vanilla kernel:

  local_irq_save(flags);
  [...]
  spin_lock(&lock);

Where as in the RT patch, that would break. And it would require either
changing the local_irq_save to a local_irq_save_nort, or find a way to do
the spin_lock_irqsave(&lock, flags) instead.

-- Steve


-

From: Remy Bohmer
Date: Saturday, November 17, 2007 - 8:36 am

Nope, not yet... I will do that on Monday also. (On ARM I have as less
as debug options enabled per default, because it eats too much

Very strange, although there is now a real mutex in non-RT as in RT,
all semaphores are still converted to some
badly-implemented-recursive-but-not-recursive-callable-mutexes,
because the assumption is made that semaphores are intentionally
always used in the wrong way?!!
So, the code that is written nicely must be adapted to prevent
adaption of bad code?

I would expect a more logical solution like the introduction of raw_
types for the exceptions, just like what is done with spinlocks and so

Up to this semaphore implementation the standard was always that a
driver written for a non-RT kernel should compile and work properly
without any adaption on a RT-kernel, until there are some specific
realtime requirements.
When it comes to semaphores this is thus completely not true.



Or there should be a real recursive mutex implementation. But the need
for recursive mutexes is usually a sign for bad locking, and should




Unloading of a module is not possible as long as there is still a
handle open to it. So, it should be safe without waking up the thread.
And returning from the read call and still have a mutex locked is not
nice either.

Thanks for the explanation. I will keep you informed on Monday.


Have a nice weekend !


Remy
-

From: Remy Bohmer
Date: Monday, November 19, 2007 - 5:55 am

I removed the 'struct semaphore' completely from my driver, using real
mutexes now instead, replace the signalling semaphores by 'struct
completion' mechanisms and got rid of the possible race I saw with the
completions in a different way, and now the problem is completely
gone!

Posix Signals work properly now (no OOPS anymore), so the problem was
likely related to the way I used the 'struct semaphore' types, which
is thus different compared to the non-RT kernel and therefor quite
confusing.

So, thank you (and Daniel) for pointing me into the right direction.

Now lets get rid of the 'struct semaphore' completely in the kernel :-))


Kind Regards,

Remy Bohmer
-

From: Steven Rostedt
Date: Monday, November 19, 2007 - 6:54 am

Remy,

Thanks a lot for looking further into this. I'd like to join the fun in
removing the rest of the semaphores in the kernel, but with you, Ingo,
Daniel and Jon going to do that, one more cook will just spoil the stew.

Once we get rid of all the semaphores in the kernel that are being used as
mutexes, then we can change the code in the -rt patch to keep semaphores
default as compat_semaphores.  And any out of tree driver would just need
to be fixed to use mutexes.

Well, read/write semaphores might have to stay special.

Thanks,

-- Steve

-

From: Daniel Walker
Date: Saturday, November 17, 2007 - 9:22 am

I tried your example and I was able to reproduce the OOPS that you
found.. Although there is one problem, you don't have the same number of
up()'s to down() calls so you end up leaving the dummy_read function
with the lock still held ..

Reviewing the OOPS and the warnings it looks like your progressively
corrupting the mutex waiter list since remove_waiter() actually leaves
the stack based waiter object on the waiter list.. (That's what it looks
like anyway)..

So I converted your code to use a compat_semaphore, and no oops
happens.. Which makes sense because compat_semaphores are designed to
work the way your using them.

Daniel

-

From: Remy Bohmer
Date: Saturday, November 17, 2007 - 10:09 am

Hello Daniel,

Thanks for looking into it also.
Steven already made clear to me that the 'struct semaphore' type on
the RT-kernel should not be used as a counting-semaphore, but as some
sort of legacy-mutex... (The confusion that this will cause is clear
by now...)

I still do not understand the problems I had with the
interruptible-waits on a real rt-mutex, but I have to figure that out
again on Monday. Maybe one confusion let to another...

(Note, A completion will not work for me, because they are not
designed for reuse across several threads. The read/write runs in user
context and as such it can be called by different threads, which would
require a init of a completion before waiting on it, but that would be

Actually, IMO, compat_semaphores behave like semaphores should behave,
and thus the same as they behave on a non-RT kernel, and at the
locations where the semaphores are now misused as mutexes on RT, we
should replace them by differently-named-mutex-type-semaphores, or
better: real-RT-mutexes..
IMO this wrong usage of semaphores is solved by modifying the code
that actually made proper use of the semaphores, and I think that if
the naming matches the mainline kernel, we only need to patch the
files that really need to be patched during the integration in
mainline of the RT-patch.

Kind Regards,

Remy Bohmer
-

From: Daniel Walker
Date: Saturday, November 17, 2007 - 10:29 am

The vast majority of semaphore are actually binary semaphores in the
Linux kernel .. So it's easier to mass convert semaphores to mutexes,
then address the ones that don't conform.. Usually they are converted to

As I say above, it's happen already.. Code is slowly getting converted
to the complete API or the code gets converted to use a binary semaphore
(or a mutex)..

Daniel

-

From: Ingo Molnar
Date: Saturday, November 17, 2007 - 10:46 am

right now there are 3992 mutex_lock() critical sections in the kernel 
and only 351 down() based critical sections are left.

fixing the top 20:

      4 &vuart_bus_priv.probe_mutex
      5 &connections_lock
      5 &irq_ptr->setting_up_sema
      5 &kbd->sem
      5 &pnp_res_mutex
      5 &port->port_lock
      5 &tq_init_sem
      6 &adb_handler_sem
      6 &dev->parent->sem
      6 &driver_lock
      6 &ha->vport_sem
      7 &big_buffer_sem
      8 &dir_f->sem
      9 &c->alloc_sem
     11 &dev->sem
     11 &usbvision->lock
     12 &c->erase_free_sem
     15 &u132->scheduler_lock
     16 &zfcp_data.config_sema
     17 &f->sem

would remove 164 of them, so it would convert half of the remaining 
semaphore use in the kernel. So the job is almost finished - would 
anyone like to go for the final grand feat: complete removal of 
semaphores from the kernel? :-)

	Ingo
-

From: Daniel Walker
Date: Saturday, November 17, 2007 - 10:55 am

Sure, you want to split the list?

Daniel

-

From: Ingo Molnar
Date: Saturday, November 17, 2007 - 11:04 am

split the list with you? Feel free to take any of those :-) dev->sem is 
nontrivial and probably not possible right now - and some of the others 
might be problematic too. But there might be fixable ones in the list. 
This shouldnt become like the BKL conversion - never truly finished.

	Ingo
-

From: Peter Zijlstra
Date: Sunday, November 18, 2007 - 5:33 am

Nothing really, other than that they use semaphores to avoid lockdep :-/

I think I know how to annotate this, after Alan Stern explained all the
use cases, but I haven't come around to implementing it. Hope to do that
soonish.

Another real semaphore user is XFS, they really use the down/up
asymmetry that semaphores allow, last time I spoke with Dave Chinner he
didn't know a way around this.

-

From: David Chinner
Date: Sunday, November 18, 2007 - 3:26 pm

Hasn't changed recently. We could convert some to completions, but
others are much more difficult.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-

From: Remy Bohmer
Date: Saturday, November 17, 2007 - 3:49 pm

Hey Ingo and Daniel, Leave some of the fun open for me :-)

I just looked at the list and I found a few that seem doable.


Remy
-

From: Jon Masters
Date: Monday, November 19, 2007 - 12:25 am

I'm happy to grab a few of these too. Let me know if either of you or
Ingo is working on the whole lot and about to dump it on us ;-)

Jon.



-

From: Daniel Walker
Date: Monday, November 19, 2007 - 8:31 am

[Empty message]
From: Remy Bohmer
Date: Monday, November 19, 2007 - 8:51 am

Hello Daniel,

Can we coordinate who is doing what somehow, to prevent double work?

Remy

-

From: Daniel Walker
Date: Monday, November 19, 2007 - 9:11 am

I actually haven't looked at any on Ingo's list (i8042tregs).. Feel free
to call out the ones your looking at tho..

Daniel

-

From: Daniel Walker
Date: Tuesday, November 20, 2007 - 9:43 am

There are about 25 DECLARE_MUTEX() semaphores remaining .. One is the
BKL which I would guess can't be converted. The others I've looked at
appear to be trivial find/replace changes to get them to use the mutex
type.. Any reason those haven't been converted over?

Daniel

-

From: Ingo Molnar
Date: Tuesday, November 20, 2007 - 1:37 pm

i guess talking about it instead of posting patches is a factor ;-)

	Ingo
-

From: Daniel Walker
Date: Tuesday, November 20, 2007 - 1:54 pm

There's several using init_MUTEX() (~70 or so), reviewing a few they
look fairly trivial ..

Daniel

-

Previous thread: [PATCH 4/4] proc: fix PDE refcounting by Alexey Dobriyan on Friday, November 16, 2007 - 8:12 am. (2 messages)

Next thread: [PATCH] Driver core: fix race in __device_release_driver by Alan Stern on Friday, November 16, 2007 - 9:57 am. (6 messages)