Re: [PATCH v2 1/2] vmscan: don't subtraction of unsined

Previous thread: [Possible regression] i915 GEM related oops since 2.6.34.1 by M. Vefa Bicakci on Thursday, July 8, 2010 - 12:37 am. (1 message)

Next thread: about conntrack for oracle tns by Neco.F on Thursday, July 8, 2010 - 12:56 am. (2 messages)
From: KOSAKI Motohiro
Date: Thursday, July 8, 2010 - 12:38 am

'slab_reclaimable' and 'nr_pages' are unsigned. so, subtraction is
unsafe.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
---
 mm/vmscan.c |   14 +++++++-------
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c7e57c..8715da1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2588,7 +2588,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		.swappiness = vm_swappiness,
 		.order = order,
 	};
-	unsigned long slab_reclaimable;
+	unsigned long n, m;
 
 	disable_swap_token();
 	cond_resched();
@@ -2615,8 +2615,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		} while (priority >= 0 && sc.nr_reclaimed < nr_pages);
 	}
 
-	slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
-	if (slab_reclaimable > zone->min_slab_pages) {
+	n = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
+	if (n > zone->min_slab_pages) {
 		/*
 		 * shrink_slab() does not currently allow us to determine how
 		 * many pages were freed in this zone. So we take the current
@@ -2628,16 +2628,16 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		 * take a long time.
 		 */
 		while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
-			zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
-				slab_reclaimable - nr_pages)
+		       (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages > n))
 			;
 
 		/*
 		 * Update nr_reclaimed by the number of slab pages we
 		 * reclaimed from this zone.
 		 */
-		sc.nr_reclaimed += slab_reclaimable -
-			zone_page_state(zone, NR_SLAB_RECLAIMABLE);
+		m = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
+		if (m < n)
+			sc.nr_reclaimed += n - m;
 	}
 
 	p->reclaim_state = NULL;
-- 
1.6.5.2



--

From: KOSAKI Motohiro
Date: Thursday, July 8, 2010 - 12:40 am

Fix simple argument error. Usually 'order' is very small value than
lru_pages. then it can makes unnecessary icache dropping.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8715da1..60d813b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2617,6 +2617,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 
 	n = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
 	if (n > zone->min_slab_pages) {
+		unsigned long lru_pages = zone_reclaimable_pages(zone);
+
 		/*
 		 * shrink_slab() does not currently allow us to determine how
 		 * many pages were freed in this zone. So we take the current
@@ -2627,7 +2629,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		 * Note that shrink_slab will free memory on all zones and may
 		 * take a long time.
 		 */
-		while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
+		while (shrink_slab(sc.nr_scanned, gfp_mask, lru_pages) &&
 		       (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages > n))
 			;
 
-- 
1.6.5.2



--

From: Rik van Riel
Date: Thursday, July 8, 2010 - 6:23 am

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed
--

From: Christoph Lameter
Date: Thursday, July 8, 2010 - 7:04 am

AFAICT this is not argument error but someone changed the naming of the
parameter. The "lru_pages" parameter is really a division factor affecting
the number of pages scanned. This patch increases this division factor
significantly and therefore reduces the number of items scanned during
zone_reclaim.


--

From: Andrew Morton
Date: Thursday, July 8, 2010 - 1:31 pm

On Thu, 8 Jul 2010 09:04:18 -0500 (CDT)


And for that reason I won't apply the patch.  I'd be crazy to do so. 
It tosses away four years testing, replacing it with something which
could have a large effect on reclaim behaviour, with no indication
whether that effect is good or bad.

--

From: Christoph Lameter
Date: Thursday, July 8, 2010 - 2:01 pm

That only shows that the order parameter was passed to shrink_slab() which
is what I remember being done intentionally.

Vaguely recall that this was necessary to avoid shrink_slab() throwing out
too many pages for higher order allocs.

Initially zone_reclaim was full of heuristics that later were replaced by
counter as the new ZVCs became available and gradually better ways of
accounting for pages became possible.



--

From: KOSAKI Motohiro
Date: Thursday, July 8, 2010 - 5:46 pm

But It make opposite effect. number of scanning objects of shrink_slab() are

                          lru_scanned        max_pass
basic_scan_objects = 4 x -------------  x -----------------------------
                          lru_pages        shrinker->seeks (default:2)

scan_objects = min(basic_scan_objects, max_pass * 2)


That said, small lru_pages parameter makes too many slab dropping.
Practically, zone-reclaim always take max_pass*2. about inode, 
shrink_icache_memory() takes number of unused inode as max_pass.
It mean one shrink_slab() calling drop all icache. Is this optimal
behavior? why?



--

From: KOSAKI Motohiro
Date: Friday, July 9, 2010 - 1:21 am

Unfortunatelly, current code is more crazy. it is nearly worst bad behavior.
I've applied attached debugging patch and got following result.

That said, really low memory pressure (scan = 32, order = 0) drop _all_
icache and dcache (about 100MB!). I don't blieve anyone tested slab behavior
on zone_reclaim=1 for this four years.

please remember shrink_slab() scanning equation. order=0 always makes
all slab dropping!

btw, current shrink_slab() don't exit even if (*shrinker->shrink)(0) return 0.
It's a bit inefficient and meaningless loop iteration. I'll make a fix.


           <...>-2677  [001]   840.832711: shrink_slab: shrink_icache_memory before=56100 after=56000
           <...>-2677  [001]   840.832898: shrink_slab: shrink_icache_memory before=56000 after=55900
           <...>-2677  [001]   840.833081: shrink_slab: shrink_icache_memory before=55900 after=55800
           <...>-2677  [001]   840.833693: shrink_slab: shrink_icache_memory before=55800 after=55600
           <...>-2677  [001]   840.833860: shrink_slab: shrink_icache_memory before=55600 after=55500
           <...>-2677  [001]   840.834033: shrink_slab: shrink_icache_memory before=55500 after=55400
           <...>-2677  [001]   840.834201: shrink_slab: shrink_icache_memory before=55400 after=55200
           <...>-2677  [001]   840.834385: shrink_slab: shrink_icache_memory before=55200 after=55100
           <...>-2677  [001]   840.834553: shrink_slab: shrink_icache_memory before=55100 after=55000
           <...>-2677  [001]   840.834720: shrink_slab: shrink_icache_memory before=55000 after=54900
           <...>-2677  [001]   840.834889: shrink_slab: shrink_icache_memory before=54900 after=54700
           <...>-2677  [001]   840.835063: shrink_slab: shrink_icache_memory before=54700 after=54600
           <...>-2677  [001]   840.835229: shrink_slab: shrink_icache_memory before=54600 after=54500
           <...>-2677  [001]   840.835596: shrink_slab: shrink_icache_memory before=54500 after=54300
           ...
From: KOSAKI Motohiro
Date: Friday, July 9, 2010 - 3:13 am

If number of reclaimable slabs are zero, shrink_icache_memory() and
shrink_dcache_memory() return 0. but strangely shrink_slab() ignore
it and continue meaningless loop iteration.

This patch fixes it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0f9f624..8f61adb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -243,6 +243,11 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 			int nr_before;
 
 			nr_before = (*shrinker->shrink)(0, gfp_mask);
+			/* no slab objects, no more reclaim. */
+			if (nr_before == 0) {
+				total_scan = 0;
+				break;
+			}
 			shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
 			if (shrink_ret == -1)
 				break;
-- 
1.6.5.2



--

From: Minchan Kim
Date: Friday, July 9, 2010 - 3:53 am

On Fri, Jul 9, 2010 at 7:13 PM, KOSAKI Motohiro

Why do you reset totoal_scan to 0?
I don't know exact meaning of shrinker->nr.
AFAIU, it can affect next shrinker's total_scan.
Isn't it harmful?

-- 
Kind regards,
Minchan Kim
--

From: KOSAKI Motohiro
Date: Friday, July 9, 2010 - 4:04 am

similar meaning of reclaim_stat->nr_saved_scan.

No.  This loop is

                total_scan = shrinker->nr;		/* Reset and init total_scan */
                shrinker->nr = 0;

                while (total_scan >= SHRINK_BATCH) {
                        nr_before = (*shrinker->shrink)(0, gfp_mask);
                        /* no slab objects, no more reclaim. */
                        if (nr_before == 0) {
                                total_scan = 0;
                                break;
                        }
                        shrink_ret = (*shrinker->shrink)(this_scan, gfp_mask);
                        if (shrink_ret == -1)
                                break;
                        if (shrink_ret < nr_before)
                                ret += nr_before - shrink_ret;
                        total_scan -= this_scan;
                }

                shrinker->nr += total_scan;		/* save remainder #of-scan */



--

From: Minchan Kim
Date: Sunday, July 11, 2010 - 3:28 pm

On Fri, Jul 9, 2010 at 8:04 PM, KOSAKI Motohiro
I can't understand your point.


old shrink_slab

shrinker->nr += delta; /* Add delta to previous shrinker's remained count */
total_scan = shrinker->nr;

while(total_scan >= SHRINK_BATCH) {
	nr_before = shrink(xxx);
	total_scan =- this_scan;
}

shrinker->nr += total_scan;

The total_scan can always be the number < SHRINK_BATCH.
So, when next shrinker calcuates loop count, the number can affect.

new shrink_slab

shrinker->nr += delta; /* nr is always zero by your patch */
total_scan = shrinker->nr;

while(total_scan >= SHRINK_BATCH) {
	nr_before = shrink(xxx);
	if (nr_before == 0) {
		total_scan = 0;
		break;
	}
}

shrinker->nr += 0;

But after your patch, total_scan is always zero. It never affect
next shrinker's loop count.

Am I missing something?
-- 
Kind regards,
Minchan Kim
--

From: KOSAKI Motohiro
Date: Monday, July 12, 2010 - 9:48 pm

no.

No. after my patch this loop has two exiting way
 1) total_scan are less than SHRINK_BATCH.
      -> no behavior change.  we still pass shrinker->nr += total_scan code.
 2) (*shrinker->shrink)(0, gfp_mask) return 0
      don't increase shrinker->nr.  because two reason,
      a) if total_scan are 10000,  we shouldn't carry over such big number.
      b) now, we have zero slab objects, then we have been freed form the guilty of keeping
          balance page and slab reclaim. shrinker->nr += 0; have zero side effect.




--

From: Minchan Kim
Date: Monday, July 12, 2010 - 11:33 pm

On Tue, Jul 13, 2010 at 1:48 PM, KOSAKI Motohiro

Totally, I agree with you.
Thanks for good explanation, Kosaki.

Reviewed-by: Minchan kim <minchan.kim@gmail.com>


-- 
Kind regards,
Minchan Kim
--

From: Christoph Lameter
Date: Friday, July 9, 2010 - 7:02 am

There is also a per zone/node/global counter SLAB_RECLAIM_ACCOUNT that
could be used to determine if its worth looking at things at all. I saw
some effort going into making the shrinkers zone aware. If so then we may
be able to avoid scanning slabs.
--

From: KOSAKI Motohiro
Date: Monday, July 12, 2010 - 9:59 pm

Yup.
After to merge nick's effort, we can makes more imrovement. I bet :)




--

From: Minchan Kim
Date: Friday, July 9, 2010 - 1:36 am

On Thu, Jul 8, 2010 at 4:40 PM, KOSAKI Motohiro
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

With your test result, This patch makes sense to me.
Please, include your test result in description.

-- 
Kind regards,
Minchan Kim
--

From: Christoph Lameter
Date: Friday, July 9, 2010 - 6:54 am

Ok. I am convinced.

Acked-by: Christoph Lameter <cl@linux-foundation.org>

--

From: KOSAKI Motohiro
Date: Monday, July 12, 2010 - 10:41 pm

How's this?

==============================================================
From 19872d74875e2331cbe7eca46c8ef65f5f00d7c4 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Tue, 13 Jul 2010 13:39:11 +0900
Subject: [PATCH] vmscan: shrink_slab() require number of lru_pages, not page order

Now, shrink_slab() has following scanning equation.

                            lru_scanned        max_pass
  basic_scan_objects = 4 x -------------  x -----------------------------
                            lru_pages        shrinker->seeks (default:2)

  scan_objects = min(basic_scan_objects, max_pass * 2)

Then, If we pass very small value as lru_pages instead real number of
lru pages, shrink_slab() drop much objects rather than necessary. and
now, __zone_reclaim() pass 'order' as lru_pages by mistake. that makes
bad result.

Example, If we receive very low memory pressure (scan = 32, order = 0),
shrink_slab() via zone_reclaim() always drop _all_ icache/dcache
objects. (see above equation, very small lru_pages make very big
scan_objects result)

This patch fixes it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Rik van Riel <riel@redhat.com>
---
 mm/vmscan.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6ff51c0..1bf9f72 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2612,6 +2612,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 
 	nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
 	if (nr_slab_pages0 > zone->min_slab_pages) {
+		unsigned long lru_pages = zone_reclaimable_pages(zone);
+
 		/*
 		 * shrink_slab() does not currently allow us to determine how
 		 * many pages were freed in this zone. So we take the current
@@ -2622,7 +2624,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t ...
From: Andrew Morton
Date: Thursday, July 15, 2010 - 12:15 pm

On Tue, 13 Jul 2010 14:41:28 +0900 (JST)

Wouldn't it be better to recalculate zone_reclaimable_pages() each time
around the loop?  For example, shrink_icache_memory()->prune_icache()
will remove pagecache from an inode if it hits the tail of the list. 
This can change the number of reclaimable pages by squigabytes, but
this code thinks nothing changed?


--

From: KOSAKI Motohiro
Date: Thursday, July 15, 2010 - 6:39 pm

Ah, I missed this. incrementa patch is here.

thank you!



From 8f7c70cfb4a25f8292a59564db6c3ff425a69b53 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Fri, 16 Jul 2010 08:40:01 +0900
Subject: [PATCH] vmscan: recalculate lru_pages on each shrink_slab()

Andrew Morton pointed out shrink_slab() may change number of reclaimable
pages (e.g. shrink_icache_memory()->prune_icache() will remove unmapped
pagecache).

So, we need to recalculate lru_pages on each shrink_slab() calling.
This patch fixes it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |   18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1bf9f72..1da9b14 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2612,8 +2612,6 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 
 	nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
 	if (nr_slab_pages0 > zone->min_slab_pages) {
-		unsigned long lru_pages = zone_reclaimable_pages(zone);
-
 		/*
 		 * shrink_slab() does not currently allow us to determine how
 		 * many pages were freed in this zone. So we take the current
@@ -2624,10 +2622,18 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		 * Note that shrink_slab will free memory on all zones and may
 		 * take a long time.
 		 */
-		while (shrink_slab(sc.nr_scanned, gfp_mask, lru_pages) &&
-		       (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages >
-			nr_slab_pages0))
-			;
+		for (;;) {
+			unsigned long lru_pages = zone_reclaimable_pages(zone);
+
+			/* No reclaimable slab or very low memroy pressure */
+			if (!shrink_slab(sc.nr_scanned, gfp_mask, lru_pages))
+				break;
+
+			/* Freed enouch memory */
+			nr_slab_pages1 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
+			if (nr_slab_pages1 + nr_pages <= nr_slab_pages0)
+				break;
+		}
 
 		/*
 		 * Update nr_reclaimed by ...
From: Minchan Kim
Date: Thursday, July 15, 2010 - 6:44 pm

On Fri, Jul 16, 2010 at 10:39 AM, KOSAKI Motohiro
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

It does make sense.

-- 
Kind regards,
Minchan Kim
--

From: KOSAKI Motohiro
Date: Thursday, July 8, 2010 - 12:41 am

Oops, sorry. I did forget cc Christoph. 



--

From: Christoph Lameter
Date: Thursday, July 8, 2010 - 7:01 am

Reviewed-by: Christoph Lameter <cl@linux-foundation.org>


--

From: Andrew Morton
Date: Thursday, July 8, 2010 - 1:00 pm

On Thu,  8 Jul 2010 16:38:10 +0900 (JST)


And it's not a completly trivial objection.  Your patch made the above
code snippet quite a lot harder to read (and hence harder to maintain).

--

From: KOSAKI Motohiro
Date: Thursday, July 8, 2010 - 6:16 pm

Initially, I proposed following patch to Christoph. but he prefer n and m.
To be honest, I don't think this naming is big matter. so you prefer following
I'll submit it.




=====================================================================
From 397199d69860061eaa5e1aaadac45c46c76b0522 Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Wed, 30 Jun 2010 13:35:16 +0900
Subject: [PATCH] vmscan: don't subtraction of unsined

'slab_reclaimable' and 'nr_pages' are unsigned. so, subtraction is
unsafe.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/vmscan.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9c7e57c..79ff877 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2588,7 +2588,7 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		.swappiness = vm_swappiness,
 		.order = order,
 	};
-	unsigned long slab_reclaimable;
+	unsigned long nr_slab_pages0, nr_slab_pages1;
 
 	disable_swap_token();
 	cond_resched();
@@ -2615,8 +2615,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		} while (priority >= 0 && sc.nr_reclaimed < nr_pages);
 	}
 
-	slab_reclaimable = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
-	if (slab_reclaimable > zone->min_slab_pages) {
+	nr_slab_pages0 = zone_page_state(zone, NR_SLAB_RECLAIMABLE);
+	if (nr_slab_pages0 > zone->min_slab_pages) {
 		/*
 		 * shrink_slab() does not currently allow us to determine how
 		 * many pages were freed in this zone. So we take the current
@@ -2628,16 +2628,17 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		 * take a long time.
 		 */
 		while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
-			zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
-				slab_reclaimable - nr_pages)
+		       (zone_page_state(zone, NR_SLAB_RECLAIMABLE) + nr_pages >
+				nr_slab_pages0))
 			;
 
 ...
From: Minchan Kim
Date: Thursday, July 8, 2010 - 6:46 pm

On Fri, Jul 9, 2010 at 10:16 AM, KOSAKI Motohiro
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>

I like this than n,m.

-- 
Kind regards,
Minchan Kim
--

From: Andrew Morton
Date: Friday, July 9, 2010 - 3:28 pm

On Fri,  9 Jul 2010 10:16:33 +0900 (JST)

Well no, but it could do so, with some minor changes to struct
reclaim_state and its handling.  Put a zone* and a counter in

My, that's horrible.  The whole expression says "this number is
--

From: KOSAKI Motohiro
Date: Tuesday, July 13, 2010 - 2:32 am

How's this?

Christoph, Can we hear your opinion about to add new branch in slab-free path?
I think this is ok, because reclaim makes a lot of cache miss then branch
mistaken is relatively minor penalty. thought?



From 9f7d7a9bd836b7373ade3056e6a3d2a3d82ac7ce Mon Sep 17 00:00:00 2001
From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Date: Tue, 13 Jul 2010 14:43:21 +0900
Subject: [PATCH] vmscan: count reclaimed slab pages properly

Andrew Morton pointed out __zone_reclaim() shouldn't compare old and new
zone_page_state(NR_SLAB_RECLAIMABLE) result. Instead, it have to account
number of free slab pages by to enhance reclaim_state.

This patch does it.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 include/linux/swap.h |    3 ++-
 mm/slab.c            |    4 +++-
 mm/slob.c            |    4 +++-
 mm/slub.c            |    7 +++++--
 mm/vmscan.c          |   44 ++++++++++++++++----------------------------
 5 files changed, 29 insertions(+), 33 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index ff4acea..b8d3f33 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -107,7 +107,8 @@ typedef struct {
  * memory reclaim
  */
 struct reclaim_state {
-	unsigned long reclaimed_slab;
+	unsigned long	reclaimed_slab;
+	struct zone	*zone;
 };
 
 #ifdef __KERNEL__
diff --git a/mm/slab.c b/mm/slab.c
index 4e9c46f..aac9306 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1741,7 +1741,9 @@ static void kmem_freepages(struct kmem_cache *cachep, void *addr)
 		page++;
 	}
 	if (current->reclaim_state)
-		current->reclaim_state->reclaimed_slab += nr_freed;
+		if (!current->reclaim_state->zone ||
+		    current->reclaim_state->zone == page_zone(page))
+			current->reclaim_state->reclaimed_slab += nr_freed;
 	free_pages((unsigned long)addr, cachep->gfporder);
 }
 
diff --git a/mm/slob.c b/mm/slob.c
index 3f19a34..192d05c 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -260,7 +260,9 @@ static void *slob_new_pages(gfp_t ...
From: Christoph Lameter
Date: Tuesday, July 13, 2010 - 6:50 pm

Its on the slow path so I would think that should be okay. But is this
really necessary? Working with the per zone slab reclaim counters is not
enough? We are adding counter after counter that have similar purposes and
the handling gets more complex.

Maybe we can get rid of the code in the slabs instead by just relying on
the difference of the zone counters?

--

From: KOSAKI Motohiro
Date: Tuesday, July 13, 2010 - 7:15 pm

Okey, I agree. I'm pending this work at once. and I'll (probably) resume it
after Nick's work merged.

Thanks.




--

Previous thread: [Possible regression] i915 GEM related oops since 2.6.34.1 by M. Vefa Bicakci on Thursday, July 8, 2010 - 12:37 am. (1 message)

Next thread: about conntrack for oracle tns by Neco.F on Thursday, July 8, 2010 - 12:56 am. (2 messages)