[PATCH 1/2] Make page->private usable in compound pages V1

Previous thread: Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13 by Andrew Burgess on Thursday, April 5, 2007 - 3:13 pm. (5 messages)

Next thread: optimizing sendfile by Yaar Schnitman on Thursday, April 5, 2007 - 3:04 pm. (1 message)
From: Christoph Lameter
Date: Thursday, April 5, 2007 - 3:36 pm

[PATCH] Free up page->private for compound pages

If we add a new flag so that we can distinguish between the
first page and the tail pages then we can avoid to use page->private
in the first page. page->private == page for the first page, so there
is no real information in there.

Freeing up page->private makes the use of compound pages more transparent.
They become more usable like real pages. Right now we have to be careful f.e.
if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
can then no longer use the private field. This is one of the issues that
cause us not to support debugging for page size slabs in SLAB.

Having page->private available for SLUB would allow more meta information
in the page struct. I can probably avoid the 16 bit ints that I have in
there right now.

Also if page->private is available then a compound page may be equipped
with buffer heads. This may free up the way for filesystems to support
larger blocks than page size.

We add PageTail as an alias of PageReclaim. Compound pages cannot
currently be reclaimed. Because of the alias one needs to check
PageCompound first.

The RFC for the this approach was discussed at
http://marc.info/?t=117574302800001&r=1&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.21-rc5-mm4/include/linux/mm.h
===================================================================
--- linux-2.6.21-rc5-mm4.orig/include/linux/mm.h	2007-04-05 13:59:23.000000000 -0700
+++ linux-2.6.21-rc5-mm4/include/linux/mm.h	2007-04-05 14:08:11.000000000 -0700
@@ -297,17 +297,28 @@ static inline int get_page_unless_zero(s
 	return atomic_inc_not_zero(&page->_count);
 }
 
+static inline struct page *compound_head(struct page *page)
+{
+	/*
+	 * We could avoid the PageCompound(page) check if
+	 * we would not overload PageTail().
+	 *
+	 * This check has to be done in several performance critical
+	 * paths of the slab etc. IMHO PageTail deserves its own flag.
+	 */
+	if ...
From: Christoph Lameter
Date: Thursday, April 5, 2007 - 3:36 pm

Unalias PG_tail for performance reasons

If PG_tail is an alias then we need to check PageCompound before PageTail.
This is particularly bad because the slab and others have to use these tests
in performance critical paths. 

This patch uses one of the freed up software suspend flags that is defined
next to PG_compound.

Excerpt from kfree (page = compound_head(page)) before patch:

r33 = pointer to page struct.

0xa000000100170271 <kfree+49>:              ld4.acq r14=[r33]
0xa000000100170272 <kfree+50>:              nop.i 0x0;;
0xa000000100170280 <kfree+64>:  [MIB]       nop.m 0x0
0xa000000100170281 <kfree+65>:              tbit.z p9,p8=r14,14
0xa000000100170282 <kfree+66>:        (p09) br.cond.dptk.few 0xa0000001001702c0 <kfree+128>
0xa000000100170290 <kfree+80>:  [MMI]       ld4.acq r9=[r33]
0xa000000100170291 <kfree+81>:              nop.m 0x0
0xa000000100170292 <kfree+82>:              adds r8=16,r33;;
0xa0000001001702a0 <kfree+96>:  [MII]       nop.m 0x0
0xa0000001001702a1 <kfree+97>:              tbit.z p10,p11=r9,17
0xa0000001001702a2 <kfree+98>:              nop.i 0x0
0xa0000001001702b0 <kfree+112>: [MMI]       nop.m 0x0;;
0xa0000001001702b1 <kfree+113>:       (p11) ld8 r33=[r8]
0xa0000001001702b2 <kfree+114>:             nop.i 0x0;;
0xa0000001001702c0 <kfree+128>: [MII]   ...

After patch:

r34 pointer to page struct

0xa00000010016f541 <kfree+65>:              ld4.acq r3=[r34]
0xa00000010016f542 <kfree+66>:              nop.i 0x0
0xa00000010016f550 <kfree+80>:  [MMI]       adds r2=16,r34;;
0xa00000010016f551 <kfree+81>:              nop.m 0x0
0xa00000010016f552 <kfree+82>:              tbit.z p10,p11=r3,13;;
0xa00000010016f560 <kfree+96>:  [MII] (p11) ld8 r34=[r2]

No branch anymore.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.21-rc5-mm4/include/linux/page-flags.h
===================================================================
--- linux-2.6.21-rc5-mm4.orig/include/linux/page-flags.h	2007-04-05 15:18:33.000000000 -0700
+++ ...
From: Andrew Morton
Date: Friday, April 6, 2007 - 10:23 pm

I get a reject from this because it is dependent upon later patches.  As
the Official Protector Of Page Flags, I'm going to drop it.

Did you investigate

static inline int page_tail(struct page *page)
{
	return ((page->flags & (PG_compound|PG_tail)) == (PG_compound|PG_tail));
}

and

static inline int page_tail(struct page *page)
{
	return unlikely(PageCompound(page)) && unlikely(PageTail(page));
}

?

In the latter case we _should_ have a not-taken branch to not-inline code. 
If the compiler doesn't do that, we can make the PageTail() test an
out-of-line function.  Or make the whole thing an uninlined function.

More work needed, please.  I don't expect that a not-taken branch to
not-inline code is worth a new page flag.  Especially as it does not
actually reduce the number of branch decisions in the common case.

(I'm assuming in all of this that !PageCompound() is the very common case
with slub.  If that is not true, we need to talk).

-

From: Christoph Lameter
Date: Saturday, April 7, 2007 - 3:16 pm

The usual test_bit that we are using there uses a volatile reference 
so these wont be combined if I check them separately.

A working example of the above would be much uglier:

static inline int page_tail(struct page *page)
{
 	return ((page->flags & ((1L << PG_compound)|(1L << PG_tail))) == 
((1L << PG_compound)|(1L << PG_tail)));
}


Two volatile references in the bit opes that the compiler cannot combine. 
Wont work unless we clean up the bitops first. This means we still have 

Still two branches which cannot be optimized in the same way as the single 

A new page flag does reduce the number of branches. On several platforms
it eliminates the branch completely since a single instruction can be

Yes, it is common for slabs to only have a single page.

The most promising avenue seems to be the simultaneous check for two bits.

-

From: Andrew Morton
Date: Saturday, April 7, 2007 - 3:51 pm

It might generate better code to do

	unsigned long compound;

	compound = page->flags & (1 << PG_compound);
	if (PG_compound > PG_tail)
		return compound & (page->flags << (PG_compound - PG_tail));
	else
		return compound & (page->flags << (PG_tail - PG_compound));


ie: get the PG_compound flag into `compound', then bitwise-and that with
the PG_tail flag, after shifting it into PG_compound' slot.  The return
value will be zero if either bit is clear, (1<<PG_compound) if both are
set.  The `if (PG_compound > PG_tail)' will be swallowed by the compiler.

The compiler should turn it all into

	(page->flags & N) & (page->flags << M)

Which may or may not be better than (page->flags & N == N), dunno. 
Probably not - if the compiler's any good it won't save a branch, I
suspect.



Which is all a ton of fun, but this subversion of the architecture's
freedom to use volatile, memory barriers etc is a worry.  We do the same in
page_alloc.c, of course...  

-

From: Christoph Lameter
Date: Saturday, April 7, 2007 - 5:21 pm

I just tried the approach that we discussed earlier and it was not 
nice either. Lets just use a page flag please. This check will be in 
several hot code paths. And it may become more important because the file 
system folks want to support buffers > page size. Then we may want more 
transparent support for huge pages... For all of this page->private gets 
in the way.

And I think we curently have 5 or so page flags available?

-

From: Andrew Morton
Date: Saturday, April 7, 2007 - 6:25 pm

Nope, try harder.

PageCompound is an unlikely case.  Back in the old days we would have done



	if (PageCompound(page))
		goto out_of_line;
back:

	do_stuff_with(page);
	return;

out_of_line:
	if (PageTail(page)) {
		page = page_tail(page);
		goto back;
	}
	<do other stuff>

and nowadays we hope that gcc does the above for us.  If it doesn't do it
for us, perhaps it needs open-coded help.

Because I don't expect there will be much efficiency difference between the
above and the use of another page flag.

-

From: Christoph Lameter
Date: Saturday, April 7, 2007 - 6:32 pm

Thats the approach of checking two flags at the same time. In that case 
the compiler will generate and "and-immediate" and then a 

Then we end up with all these small efficiency differences in all 
the code paths. I'd rather go for optimal performance in a frequently used 
construct like this.

This check is not rare. It is done on every SLAB free and on every 
get_page() and put_page(). Lets do the page flag.

-

From: Andrew Morton
Date: Saturday, April 7, 2007 - 6:48 pm

Right.

        movl    (%ebx), %eax    # <variable>.flags, tmp399
        andl    $48, %eax       #, tmp399
        cmpl    $48, %eax       #, tmp399
        je      .L265   #,

what's "yuck" about that?

With the single page flag:

	movl	(%ebx), %eax	#* page.521, D.21940
	testb	$32, %al	#, D.21940
	jne	.L265	#,


You can save that worrisome single instruction in the common case by putting the
handling of the uncommon compound pages out of line, as I indicated.

-

From: Christoph Lameter
Date: Monday, April 9, 2007 - 10:22 am

We are also using another register. The bit instructions can just go in an 
test. Ok. Will submit this patch. .... Maybe I can special case 64 bit.

-

From: Christoph Lameter
Date: Monday, April 9, 2007 - 11:09 am

Add PageTail / PageHead in order to avoid multiple branches when compound 
pages are checked.

The patch adds PageTail(page) and PageHead(page) to check if a page is the
head or the tail of a compound page. This is done by masking the two
bits describing the state of a compound page and then comparing them. So 
one comparision and a branch instead of two bit checks and two branches.


---
 include/linux/mm.h         |   11 ++---------
 include/linux/page-flags.h |   28 +++++++++++++++++-----------
 mm/page_alloc.c            |   10 ++++------
 3 files changed, 23 insertions(+), 26 deletions(-)

Index: linux-2.6.21-rc5-mm4/include/linux/mm.h
===================================================================
--- linux-2.6.21-rc5-mm4.orig/include/linux/mm.h	2007-04-09 11:04:15.000000000 -0700
+++ linux-2.6.21-rc5-mm4/include/linux/mm.h	2007-04-09 11:04:31.000000000 -0700
@@ -299,14 +299,7 @@ static inline int get_page_unless_zero(s
 
 static inline struct page *compound_head(struct page *page)
 {
-	/*
-	 * We could avoid the PageCompound(page) check if
-	 * we would not overload PageTail().
-	 *
-	 * This check has to be done in several performance critical
-	 * paths of the slab etc. IMHO PageTail deserves its own flag.
-	 */
-	if (unlikely(PageCompound(page) && PageTail(page)))
+	if (unlikely(PageTail(page)))
 		return page->first_page;
 	return page;
 }
@@ -363,7 +356,7 @@ static inline compound_page_dtor *get_co
 
 static inline int compound_order(struct page *page)
 {
-	if (!PageCompound(page) || PageTail(page))
+	if (!PageHead(page))
 		return 0;
 	return (unsigned long)page[1].lru.prev;
 }
Index: linux-2.6.21-rc5-mm4/include/linux/page-flags.h
===================================================================
--- linux-2.6.21-rc5-mm4.orig/include/linux/page-flags.h	2007-04-09 11:04:10.000000000 -0700
+++ linux-2.6.21-rc5-mm4/include/linux/page-flags.h	2007-04-09 11:04:31.000000000 -0700
@@ -95,12 +95,6 @@
 /* PG_owner_priv_1 users should have descriptive ...
From: Andrew Morton
Date: Monday, April 9, 2007 - 11:45 am

On Mon, 9 Apr 2007 11:09:40 -0700 (PDT)

OK.  I'm still a bit concerned about bypassing the bitops synchronisation:
barriers, volatile, etc.  We had lengthy ruminations on that a few years

hm.  The lack of parenthesisation here _might_ be OK, but I haven't
thought it through.

And I'd prefer not to have to, because I know that the do { } while (0)

You meant __SetPageCompound and __ClearPageCompound.


-

From: Christoph Lameter
Date: Monday, April 9, 2007 - 11:49 am

Those are important for bits that are changing while a page is in use. 


Right.

-

From: Christoph Lameter
Date: Monday, April 9, 2007 - 3:05 pm

Here is another rev on that one. Inlined the two functions but that
required #including <mm_types.h>


Add PageTail / PageHead in order to avoid multiple branches when compound
pages are checked.

The patch adds PageTail(page) and PageHead(page) to check if a page is the
head or the tail of a compound page. This is done by masking the two
bits describing the state of a compound page and then comparing them. So
one comparision and a branch instead of two bit checks and two branches.

---
 include/linux/mm.h         |   11 ++---------
 include/linux/page-flags.h |   28 +++++++++++++++++-----------
 mm/page_alloc.c            |   10 ++++------
 3 files changed, 23 insertions(+), 26 deletions(-)

Index: linux-2.6.21-rc6-mm1/include/linux/mm.h
===================================================================
--- linux-2.6.21-rc6-mm1.orig/include/linux/mm.h	2007-04-09 12:08:33.000000000 -0700
+++ linux-2.6.21-rc6-mm1/include/linux/mm.h	2007-04-09 12:09:50.000000000 -0700
@@ -299,14 +299,7 @@ static inline int get_page_unless_zero(s
 
 static inline struct page *compound_head(struct page *page)
 {
-	/*
-	 * We could avoid the PageCompound(page) check if
-	 * we would not overload PageTail().
-	 *
-	 * This check has to be done in several performance critical
-	 * paths of the slab etc. IMHO PageTail deserves its own flag.
-	 */
-	if (unlikely(PageCompound(page) && PageTail(page)))
+	if (unlikely(PageTail(page)))
 		return page->first_page;
 	return page;
 }
@@ -363,7 +356,7 @@ static inline compound_page_dtor *get_co
 
 static inline int compound_order(struct page *page)
 {
-	if (!PageCompound(page) || PageTail(page))
+	if (!PageHead(page))
 		return 0;
 	return (unsigned long)page[1].lru.prev;
 }
Index: linux-2.6.21-rc6-mm1/include/linux/page-flags.h
===================================================================
--- linux-2.6.21-rc6-mm1.orig/include/linux/page-flags.h	2007-04-09 12:08:33.000000000 -0700
+++ ...
From: Andrew Morton
Date: Thursday, April 5, 2007 - 4:43 pm

On Thu,  5 Apr 2007 15:36:51 -0700 (PDT)

So slub is using compound pages so that it can locate the head page in
higher-order pages, whereas slab uses per-object (or per-order-0-page?)
metadata for that?

I see four instances of

+	page = virt_to_page(p);
+
+	if (unlikely(PageCompound(page)))
+		page = page->first_page;

A new virt_to_head_page() is needed.


Sigh.  We're seeing rather a lot of churn to accommodate slub.  Do we
actually have any justification for all this?  If we end up deciding to
merge slub and to deprecate then remove slab, what would our reasons have
been?
-

From: Christoph Lameter
Date: Friday, April 6, 2007 - 10:01 am

Ok.
 

This is not SLUB specific. SLAB does the same. SLAB does special casing 
for page size slabs and does not allow slab debugging because it would 
use page->private if debugging would be enabled. Which would get into 
trouble on i386.

If you want less SLUB churn on other fronts then I can add the special 
casing for PAGE_SIZE slabs back into SLUB. Then we keep all the 
inconsistencies in SLAB use in the code.

I'd rather clean up the stuff and then also remove the special casing from 
SLAB.

 
-

From: Andrew Morton
Date: Friday, April 6, 2007 - 1:04 pm

On Fri, 6 Apr 2007 10:01:38 -0700 (PDT)

Not really.  slab sets the page->lru.prev of each constituent page to point
at the controlling slab.

I assume the PageCompound() handling in (for example) page_get_cache() is

Will slub handle NOMMU anonymous pages appropriately?
-

From: Christoph Lameter
Date: Friday, April 6, 2007 - 1:28 pm

Uhh.. More slab inconsistencies. page[1]->lru is used for compound 

/*
 * Higher-order pages are called "compound pages".  They are structured 
thusly:
 *
 * The first tail page's ->lru.next holds the address of the compound 
page's
 * put_page() function.  Its ->lru.prev holds the order of allocation.
 * This usage means that zero-order pages may not be compound.

I am not sure how anonymous pages relate to the slab allocators.

SLUB consistently uses compound pages for all higher order allocations.
And those are usually fine for mmu and no mmu even without this 
series of patches unless someone pokes around page->private.
-

From: Christoph Lameter
Date: Friday, April 6, 2007 - 12:46 pm

Add virt_to_head_page and consolidate code in slab and slub.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.21-rc5-mm4/include/linux/mm.h
===================================================================
--- linux-2.6.21-rc5-mm4.orig/include/linux/mm.h	2007-04-06 19:35:51.000000000 +0000
+++ linux-2.6.21-rc5-mm4/include/linux/mm.h	2007-04-06 19:42:53.000000000 +0000
@@ -316,6 +316,12 @@
 	atomic_inc(&page->_count);
 }
 
+static inline struct page *virt_to_head_page(const void *x)
+{
+	struct page *page = virt_to_page(x);
+	return compound_head(page);
+}
+
 /*
  * Setup the page count before being freed into the page allocator for
  * the first time (boot or memory hotplug)
Index: linux-2.6.21-rc5-mm4/mm/slab.c
===================================================================
--- linux-2.6.21-rc5-mm4.orig/mm/slab.c	2007-04-06 19:38:58.000000000 +0000
+++ linux-2.6.21-rc5-mm4/mm/slab.c	2007-04-06 19:41:57.000000000 +0000
@@ -614,20 +614,19 @@
 
 static inline struct slab *page_get_slab(struct page *page)
 {
-	page = compound_head(page);
 	BUG_ON(!PageSlab(page));
 	return (struct slab *)page->lru.prev;
 }
 
 static inline struct kmem_cache *virt_to_cache(const void *obj)
 {
-	struct page *page = virt_to_page(obj);
+	struct page *page = virt_to_head_page(obj);
 	return page_get_cache(page);
 }
 
 static inline struct slab *virt_to_slab(const void *obj)
 {
-	struct page *page = virt_to_page(obj);
+	struct page *page = virt_to_head_page(obj);
 	return page_get_slab(page);
 }
 
@@ -2884,7 +2883,7 @@
 
 	objp -= obj_offset(cachep);
 	kfree_debugcheck(objp);
-	page = virt_to_page(objp);
+	page = virt_to_head_page(objp);
 
 	slabp = page_get_slab(page);
 
@@ -3108,7 +3107,7 @@
 		struct slab *slabp;
 		unsigned objnr;
 
-		slabp = page_get_slab(virt_to_page(objp));
+		slabp = page_get_slab(virt_to_head_page(objp));
 		objnr = (unsigned)(objp - slabp->s_mem) / cachep->buffer_size;
 		slab_bufctl(slabp)[objnr] = ...
Previous thread: Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13 by Andrew Burgess on Thursday, April 5, 2007 - 3:13 pm. (5 messages)

Next thread: optimizing sendfile by Yaar Schnitman on Thursday, April 5, 2007 - 3:04 pm. (1 message)