[Sorry, I read and replied to my inbox before the mailing lists. Please ignore the last mail on this patch and reply to this one, which is properly threaded.]

Ah, thanks, but can we just use my earlier patch that does the proper __bit_spin_unlock, which is provided by bit_spin_lock-use-lock-bitops.patch?

This primitive should have a better chance of being correct, and could also be better optimised for each architecture (it only has to provide release consistency).

I have attached the patch here just for reference, but I am actually submitting it properly as part of a patch series today.

This looks wrong, because it would allow the store that unlocks flags to pass a load within the critical section.

Stores aren't allowed to pass loads on x86 (only vice versa), so you might have been misled by x86's spinlocks into thinking this will work. On powerpc and sparc, however, I don't think it gives you the right types of barriers.
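To illustrate the ordering requirement discussed above, here is a minimal userspace sketch using C11 atomics as a stand-in for the kernel primitives. It is not the kernel implementation: the names demo_bit_trylock, demo_bit_unlock, demo_roundtrip and DEMO_LOCK_BIT are hypothetical, and memory_order_acquire / memory_order_release are only an analogy for the lock bitops being proposed. The point is that the unlock must be a release operation, so that neither loads nor stores inside the critical section can move past it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_LOCK_BIT 0	/* hypothetical: bit 0 of the flags word is the lock */

/* Try to set the lock bit. Acquire ordering keeps the critical section
 * from being reordered before the lock is taken. */
static bool demo_bit_trylock(atomic_ulong *flags)
{
	unsigned long mask = 1UL << DEMO_LOCK_BIT;
	unsigned long old = atomic_fetch_or_explicit(flags, mask,
						     memory_order_acquire);
	return (old & mask) == 0;	/* true if the bit was previously clear */
}

/* Clear the lock bit. Release ordering keeps earlier loads AND stores
 * from being reordered after the unlock; a write-only barrier would
 * order the stores but not the loads. */
static void demo_bit_unlock(atomic_ulong *flags)
{
	unsigned long mask = 1UL << DEMO_LOCK_BIT;
	atomic_fetch_and_explicit(flags, ~mask, memory_order_release);
}

/* Single-threaded walk-through of the lock/unlock pair. */
static unsigned long demo_roundtrip(void)
{
	atomic_ulong flags;
	bool first, second;

	atomic_init(&flags, 4UL);		/* an unrelated flag bit is set */
	first = demo_bit_trylock(&flags);	/* lock is free: succeeds */
	second = demo_bit_trylock(&flags);	/* already held: fails */
	assert(first && !second);
	demo_bit_unlock(&flags);
	return atomic_load(&flags);		/* unrelated bit is preserved */
}
```

On a strongly ordered machine such as x86 the distinction is invisible at the hardware level, which is why testing there alone would not expose the problem.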
Yes, that is what I attempted to do with the write barrier. To my knowledge there are no reads that could bleed out, and I wanted to avoid a full fence.

Good. Andrew: drop my patch when this goes in.
Oh, OK. Bit risky ;) You might be right, but anyway I think it should be just as fast with the optimised bit_unlock on most architectures.

Which reminds me, it would be interesting to test the ia64 implementation I did. For the non-atomic unlock, I'm actually doing an atomic operation there so that it can use the release barrier rather than the mf. Maybe it's faster the other way around, though? It will be useful to test with something that isn't a trivial loop, so the slub case would be a good benchmark.
How expensive is the fence? A store with release semantics would be safer.

Let's avoid mf (too expensive) and just use a store with release semantics.

Where can I find your patchset? I looked through lkml but did not see it.
I'm not sure. I had an idea it was relatively expensive on ia64, but I didn't really test with a good workload (a microbenchmark probably isn't that good because it won't generate too much out

Infrastructure in -mm, starting at bitops-introduce-lock-ops.patch. bit_spin_lock-use-lock-bitops.patch and ia64-lock-bitops.patch are the ones to look at. The rest of the patches I have queued here, apart from the SLUB patch, I guess aren't so interesting to you (they don't do anything fancy like converting to non-atomic unlocks, they just switch things like page and buffer locks to use the new bitops).
ia64-lock-bitops.patch defines:
static __inline__ void
clear_bit_unlock (int nr, volatile void *addr)
{
	__u32 mask, old, new;
	volatile __u32 *m;
	CMPXCHG_BUGCHECK_DECL

	m = (volatile __u32 *) addr + (nr >> 5);
	mask = ~(1 << (nr & 31));
	do {
		CMPXCHG_BUGCHECK(m);
		old = *m;
		new = old & mask;
	} while (cmpxchg_rel(m, old, new) != old);
}

/**
 * __clear_bit_unlock - Non-atomically clear a bit with release
 *
 * This is like clear_bit_unlock, but the implementation may use a non-atomic
 * store (this one uses an atomic, however).
 */
#define __clear_bit_unlock clear_bit_unlock
A non-atomic store is a misaligned store on IA64. That is not
relevant here: the data is properly aligned. I guess the comment was
intended to refer to the cmpxchg.
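For comparison, the two unlock flavours under discussion can be sketched in portable C11. This is only an illustration under stated assumptions, not the ia64 code: demo_clear_bit_unlock, demo_clear_bit_unlock_nonatomic and demo_check are hypothetical names, and a C11 release store stands in for an ia64 store with release semantics. The first variant mirrors the cmpxchg_rel retry loop quoted above; the second replaces it with a plain load followed by a release store, which is only safe if nothing else modifies the word while the lock bit is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Atomic variant: retry a compare-and-swap with release ordering, so
 * concurrent updates to other bits in the same word are never lost. */
static void demo_clear_bit_unlock(int nr, _Atomic uint32_t *addr)
{
	_Atomic uint32_t *m = addr + (nr >> 5);		/* word holding the bit */
	uint32_t mask = ~(UINT32_C(1) << (nr & 31));
	uint32_t old = atomic_load_explicit(m, memory_order_relaxed);

	while (!atomic_compare_exchange_weak_explicit(m, &old, old & mask,
			memory_order_release, memory_order_relaxed))
		;	/* 'old' was refreshed by the failed exchange; retry */
}

/* Non-atomic variant: the read-modify-write is NOT atomic as a whole.
 * Only the final store carries release ordering. A concurrent change
 * to another bit between the load and the store would be overwritten. */
static void demo_clear_bit_unlock_nonatomic(int nr, _Atomic uint32_t *addr)
{
	_Atomic uint32_t *m = addr + (nr >> 5);
	uint32_t mask = ~(UINT32_C(1) << (nr & 31));
	uint32_t old = atomic_load_explicit(m, memory_order_relaxed);

	atomic_store_explicit(m, old & mask, memory_order_release);
}

/* Single-threaded check that both variants clear only the requested bit. */
static void demo_check(void)
{
	_Atomic uint32_t w[2];

	atomic_init(&w[0], UINT32_C(0xF));
	atomic_init(&w[1], UINT32_C(0x3));
	demo_clear_bit_unlock(0, w);			/* bit 0 of word 0 */
	demo_clear_bit_unlock_nonatomic(33, w);		/* bit 1 of word 1 */
	assert(atomic_load(&w[0]) == UINT32_C(0xE));
	assert(atomic_load(&w[1]) == UINT32_C(0x1));
}
```

Whether the second form is actually faster than the first on real hardware is exactly the open question in this thread, so the sketch makes no performance claim.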
How about this patch? [Works fine on IA64 simulator...]
IA64: Slim down __clear_bit_unlock
__clear_bit_unlock does not need to perform atomic operations on the variable.
Avoid a cmpxchg and simply do a store with release semantics. Add a barrier to
be safe that the compiler does not do funky things.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
include/asm-ia64/bitops.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
Index: linux-2.6.23-mm1/include/asm-ia64/bitops.h
===================================================================
--- linux-2.6.23-mm1.orig/include/asm-ia64/bitops.h 2007-10-18 19:37:22.000000000 -0700
+++ linux-2.6.23-mm1/include/asm-ia64/bitops.h 2007-10-18 19:50:22.000000000 -0700
@@ -124,10 +124,21 @@ clear_bit_unlock (int nr, volatile void
/**
* __clear_bit_unlock - Non-atomically clear a bit with release
*
- * This is like clear_bit_unlock, but the implementation may use a non-atomic
- * store (this one uses an atomic, however).
+ * This is like clear_bit_unlock, but the implementation uses a store
+ * with release semantics. See also __raw_spin_unlock().
*/
-#define ...

Acked-by: Christoph Lameter <clameter@sgi.com>
Slub can use the non-atomic version to unlock because other flags will not
get modified with the lock held.
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
mm/slub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c
+++ linux-2.6/mm/slub.c
@@ -1185,7 +1185,7 @@ static __always_inline void slab_lock(st
static __always_inline void slab_unlock(struct page *page)
{
- bit_spin_unlock(PG_locked, &page->flags);
+ __bit_spin_unlock(PG_locked, &page->flags);
}
static __always_inline int slab_trylock(struct page *page)
