The above sounds more like you need a completion. What's used to wake up the caller of down_interruptible? Can you post some code to see what you are doing. That would make it much easier to analyze. -
Hello Steven, Funny, I first started with using completion structures, but that did not work either. I get similar OOPses on all these kind of locking mechanisms, as long as I use the _interruptible() type. I tried every work-around I can think of, but none worked :-(( Even if I block on an ordinary rt-mutex in the same routine, wait a A call to up() is used from inside an interrupt(thread) context, but this is not relevant for the problem, because only blocking on a semaphore with down_interruptible() and waking the thread by CTRL-C is enough to get this Oops. I saw that the code is trying to wake 'other waiters', while there is only 1 thread waiting on the semaphore at most. I feel that the root cause of this problem has to be searched there. I believe that executing any PI code on semaphores is a strange behavior anyway, because a semaphore is never 'owned' by a thread, and it is always another thread that wakes the thread that blocks on a semaphore, and because the waker is unknown, the PI code will always boost the prio of the wrong thread. Strange is also, that I get different behavior on ARM if I use sema_init(&sema, 1) versus sema_init(&sema,0). The latter seems to crash less, it will not crash until the first up(); while the first will crash even without any up(). Attached I have put a sample driver I just hacked together a few minutes ago. It is NOT the driver that has generates the oops in the previous mail, but I have stripped a scull-driver down that much that it will be much easier to talk about, and to keep us focussed on the part of the code that is causing this. Besides: I tested this driver on X86 with 2.6.23.1-rt5 and I get the also OOPSes although slightly different than on ARM. See the attached dummy.txt file. Beware: The up(&sema) is NOT required to get this OOPS, I get it even without any up(&sema) ! I hope you can look at the attached driver source and help me with this... Kind Regards, Remy Bohmer
The taker of a mutex must also be the one that releases it. I don't see
how you could use a mutex for this. It really requires some kind of
Exactly why it should be a completion or compat semaphores. The reason we
did PI on semaphores is only because they were used as mutexes before Ingo
pushed to actually get a mutex primative into the kernel. Since then,
we've been trying to remove all semaphores with either a mutex or
down_interruptible(&dummy);
printk("We will block now, and if you press CTRL-C from here, we get an OOPS.\n");
down_interruptible(&dummy);
This double down is actually illegal with rt semaphores. Because we treat
semaphores as mutexes unless they are declared as compat_semaphores. In
which case we don't do PI.
Seems that you need to work out how to use a completion for your code. And
if that doesn't work, then use a compat_semaphore. But beware, that the
compat_semaphore can cause unbounded latencies. But then again, so can
completions.
-- Steve
-
I tried several ways of working around the bug, even tried implementing it with kernel threads and protecting global data with mutexes. Therefor I know that I have the same problem with mutexes. I just created a simple example that showed the problem quickly, this does not mean that this is the only case that does not work. BTW: I am hacking around in the PREEMPT-RT kernel for years now, I know the history very well, and I know what I am doing... Please, do not find holes in the example I quickly hacked together, please focus on the OOPS message, and help me figuring out what is causing this. I can give you other examples of code that shows the same problem. But, basically, every call that blocks with _inturruptible() on a rt-mutex beneath the surface in the context of a user space process (and Okay, sounds fair. But: the current implementation does counting the number of up's and down's, suggesting that it really behaves like a semaphore. It only does some special things during the transition of the counter from 0->1 and from 1->0. If this counting is illegal use of the mutex mechanism, it should report (compile) errors if: * sema_init is used on 'struct semaphore' -> init_MUTEX() must be used instead. * sema_init should only be used on 'struct compat_semaphore' types. * calls to up() and down() in a row should report a BUG message, * if up() is called from a different thread than the down() it should report a BUG message. Further, the counting up's and down's are not allowed on struct semphore types, so it should be removed from the code. * PI should only take place if it is for 100% sure that the 'struct semaphore' is used as a mutex. And this is only the case when it is initialised with init_MUTEX(). So, because all these items are not there, I doubt it is really true that it is illegal to use 'struct semaphore' types as counting semaphores across multiple threads. BESIDES: Everything works fine UNTIL a signal is generated during a block on the semaphore. I ...
Please don't think that I just took your example to find a hole in it. Actually, the reason I pointed that out was not because of "Oh, this example is broken", but because of the description of the problem you are trying to solve. To me, it needs to be done with a completion or Well, we can't determine that with code ;-) Remember, there are still drivers out in the world that use semaphores as mutexes. So the PI Right, because a non-RT semaphore _is_ a compat_semaphore. I'm saying if you see the bug in your driver with the compat_semaphore then lets debug Not really. There are things that the preempt-rt kernels require. One, is that things that need to keep semaphores instead of using them as mutexes, they should be converted to compat_semaphores. Perhaps now that we have mutexes, we can remove the PI on semaphores, and out-of-tree drivers will Yeah, that code is more of a hack to convert counting semaphores into mutexes. But semaphores still need to have owners, and they should not OK, I wont be able to work on this this weekend, but I'll try to get to it on Monday. A better example to show the bug you are looking for is simply create a mutex and create a thread that grabs that mutex and goes to sleep. Have your driver read grab that mutex with mutex_lock_interruptible. And if the signal code is broken with this, then you definitely got a point that the inerruptible code is broken. This will keep the semantics clean and not obfuscate it with the semaphore code. I'll write up that example on Monday if you don't have the time. Note, that the unloading of the module should wake up the thread that grabbed the mutex so it can release it. -- Steve -
Oh, I missed your point here. I was up late last night so I'm blaming that for my _not_so_thorough_ reading ;-) You are saying that we should change the semaphore to a mutex only if it was initialized by init_MUTEX(). I see your point. Actually, if something is initialized by init_MUTEX it really needs to be a mutex :-) Right now the compiler determines that something is a mutex or a semaphore. If we make it a run time option, that will add more complexity to the code and probably make it less efficient. The more complex part is something we can deal with. The less efficient part we can not. find . -name "*.c" ! -type d | xargs grep "init_MUTEX" | wc -l 100 Heh, nice number. Well, I guess I could start sending patches to mainline to convert semaphores to mutexes. I'm sure that will probably annoy akpm, but I'll do it one at a time, with lots of thought behind each change. BTW, drivers breaking with the RT patch is not always considered a regression. For example, this is fine in vanilla kernel: local_irq_save(flags); [...] spin_lock(&lock); Where as in the RT patch, that would break. And it would require either changing the local_irq_save to a local_irq_save_nort, or find a way to do the spin_lock_irqsave(&lock, flags) instead. -- Steve -
Nope, not yet... I will do that on Monday also. (On ARM I have as less as debug options enabled per default, because it eats too much Very strange, although there is now a real mutex in non-RT as in RT, all semaphores are still converted to some badly-implemented-recursive-but-not-recursive-callable-mutexes, because the assumption is made that semaphores are intentionally always used in the wrong way?!! So, the code that is written nicely must be adapted to prevent adaption of bad code? I would expect a more logical solution like the introduction of raw_ types for the exceptions, just like what is done with spinlocks and so Up to this semaphore implementation the standard was always that a driver written for a non-RT kernel should compile and work properly without any adaption on a RT-kernel, until there are some specific realtime requirements. When it comes to semaphores this is thus completely not true. Or there should be a real recursive mutex implementation. But the need for recursive mutexes is usually a sign for bad locking, and should Unloading of a module is not possible as long as there is still a handle open to it. So, it should be safe without waking up the thread. And returning from the read call and still have a mutex locked is not nice either. Thanks for the explanation. I will keep you informed on Monday. Have a nice weekend ! Remy -
I removed the 'struct semaphore' completely from my driver, using real mutexes now instead, replace the signalling semaphores by 'struct completion' mechanisms and got rid of the possible race I saw with the completions in a different way, and now the problem is completely gone! Posix Signals work properly now (no OOPS anymore), so the problem was likely related to the way I used the 'struct semaphore' types, which is thus different compared to the non-RT kernel and therefor quite confusing. So, thank you (and Daniel) for pointing me into the right direction. Now lets get rid of the 'struct semaphore' completely in the kernel :-)) Kind Regards, Remy Bohmer -
Remy, Thanks a lot for looking further into this. I'd like to join the fun in removing the rest of the semaphores in the kernel, but with you, Ingo, Daniel and Jon going to do that, one more cook will just spoil the stew. Once we get rid of all the semaphores in the kernel that are being used as mutexes, then we can change the code in the -rt patch to keep semaphores default as compat_semaphores. And any out of tree driver would just need to be fixed to use mutexes. Well, read/write semaphores might have to stay special. Thanks, -- Steve -
I tried your example and I was able to reproduce the OOPS that you found.. Although there is one problem, you don't have the same number of up()'s to down() calls so you end up leaving the dummy_read function with the lock still held .. Reviewing the OOPS and the warnings it looks like your progressively corrupting the mutex waiter list since remove_waiter() actually leaves the stack based waiter object on the waiter list.. (That's what it looks like anyway).. So I converted your code to use a compat_semaphore, and no oops happens.. Which makes sense because compat_semaphores are designed to work the way your using them. Daniel -
Hello Daniel, Thanks for looking into it also. Steven already made clear to me that the 'struct semaphore' type on the RT-kernel should not be used as a counting-semaphore, but as some sort of legacy-mutex... (The confusion that this will cause is clear by now...) I still do not understand the problems I had with the interruptible-waits on a real rt-mutex, but I have to figure that out again on Monday. Maybe one confusion let to another... (Note, A completion will not work for me, because they are not designed for reuse across several threads. The read/write runs in user context and as such it can be called by different threads, which would require a init of a completion before waiting on it, but that would be Actually, IMO, compat_semaphores behave like semaphores should behave, and thus the same as they behave on a non-RT kernel, and at the locations where the semaphores are now misused as mutexes on RT, we should replace them by differently-named-mutex-type-semaphores, or better: real-RT-mutexes.. IMO this wrong usage of semaphores is solved by modifying the code that actually made proper use of the semaphores, and I think that if the naming matches the mainline kernel, we only need to patch the files that really need to be patched during the integration in mainline of the RT-patch. Kind Regards, Remy Bohmer -
The vast majority of semaphore are actually binary semaphores in the Linux kernel .. So it's easier to mass convert semaphores to mutexes, then address the ones that don't conform.. Usually they are converted to As I say above, it's happen already.. Code is slowly getting converted to the complete API or the code gets converted to use a binary semaphore (or a mutex).. Daniel -
right now there are 3992 mutex_lock() critical sections in the kernel
and only 351 down() based critical sections are left.
fixing the top 20:
4 &vuart_bus_priv.probe_mutex
5 &connections_lock
5 &irq_ptr->setting_up_sema
5 &kbd->sem
5 &pnp_res_mutex
5 &port->port_lock
5 &tq_init_sem
6 &adb_handler_sem
6 &dev->parent->sem
6 &driver_lock
6 &ha->vport_sem
7 &big_buffer_sem
8 &dir_f->sem
9 &c->alloc_sem
11 &dev->sem
11 &usbvision->lock
12 &c->erase_free_sem
15 &u132->scheduler_lock
16 &zfcp_data.config_sema
17 &f->sem
would remove 164 of them, so it would convert half of the remaining
semaphore use in the kernel. So the job is almost finished - would
anyone like to go for the final grand feat: complete removal of
semaphores from the kernel? :-)
Ingo
-
Sure, you want to split the list? Daniel -
split the list with you? Feel free to take any of those :-) dev->sem is nontrivial and probably not possible right now - and some of the others might be problematic too. But there might be fixable ones in the list. This shouldnt become like the BKL conversion - never truly finished. Ingo -
Nothing really, other than that they use semaphores to avoid lockdep :-/ I think I know how to annotate this, after Alan Stern explained all the use cases, but I haven't come around to implementing it. Hope to do that soonish. Another real semaphore user is XFS, they really use the down/up asymmetry that semaphores allow, last time I spoke with Dave Chinner he didn't know a way around this. -
Hasn't changed recently. We could convert some to completions, but others are much more difficult. Cheers, Dave. -- Dave Chinner Principal Engineer SGI Australian Software Group -
Hey Ingo and Daniel, Leave some of the fun open for me :-) I just looked at the list and I found a few that seem doable. Remy -
I'm happy to grab a few of these too. Let me know if either of you or Ingo is working on the whole lot and about to dump it on us ;-) Jon. -
Hello Daniel, Can we coordinate who is doing what somehow, to prevent double work? Remy -
I actually haven't looked at any on Ingo's list (i8042tregs).. Feel free to call out the ones your looking at tho.. Daniel -
There are about 25 DECLARE_MUTEX() semaphores remaining .. One is the BKL which I would guess can't be converted. The others I've looked at appear to be trivial find/replace changes to get them to use the mutex type.. Any reason those haven't been converted over? Daniel -
i guess talking about it instead of posting patches is a factor ;-) Ingo -
There's several using init_MUTEX() (~70 or so), reviewing a few they look fairly trivial .. Daniel -
