Re: [-mm PATCH] register_memory/unregister_memory clean ups

Previous thread: reading user. extended attributes from symlinks in 2.6 kernels by Rok Ruzic on Monday, February 11, 2008 - 9:38 am. (2 messages)

Next thread: [PATCH] dmi: Prevent linked list corruption by Jean Delvare on Monday, February 11, 2008 - 10:22 am. (4 messages)
From: Badari Pulavarty
Date: Monday, February 11, 2008 - 10:23 am

Hi Andrew,

While testing hotplug memory remove against -mm, I noticed
that unregister_memory() is not cleaning up /sysfs entries
correctly. It also de-references structures after destroying
them (luckily in the code which never gets used). So, I cleaned
up the code and fixed the extra reference issue.

Could you please include it in -mm ?

Thanks,
Badari

register_memory()/unregister_memory() never gets called with
"root". unregister_memory() is accessing kobject_name of
the object just freed up. Since no one uses the code,
lets take the code out. And also, make register_memory() static.  

Another bug fix - before calling unregister_memory()
remove_memory_block() gets a ref on kobject. unregister_memory()
need to drop that ref before calling sysdev_unregister().

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
---
 drivers/base/memory.c |   22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)

Index: linux-2.6.24/drivers/base/memory.c
===================================================================
--- linux-2.6.24.orig/drivers/base/memory.c	2008-02-07 16:59:52.000000000 -0800
+++ linux-2.6.24/drivers/base/memory.c	2008-02-08 15:54:45.000000000 -0800
@@ -62,8 +62,8 @@ void unregister_memory_notifier(struct n
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
-int register_memory(struct memory_block *memory, struct mem_section *section,
-		struct node *root)
+static
+int register_memory(struct memory_block *memory, struct mem_section *section)
 {
 	int error;
 
@@ -71,26 +71,18 @@ int register_memory(struct memory_block 
 	memory->sysdev.id = __section_nr(section);
 
 	error = sysdev_register(&memory->sysdev);
-
-	if (root && !error)
-		error = sysfs_create_link(&root->sysdev.kobj,
-					  &memory->sysdev.kobj,
-					  kobject_name(&memory->sysdev.kobj));
-
 	return error;
 }
 
 static void
-unregister_memory(struct memory_block *memory, struct mem_section *section,
-		struct node ...
From: Greg KH
Date: Monday, February 11, 2008 - 10:54 am

Want me to add this to my tree and send it in my next update for the
driver core to Linus?

I'll be glad to do that.

thanks,

greg k-h
--

From: Badari Pulavarty
Date: Monday, February 11, 2008 - 11:05 am

Please do. Only reason I wanted to push through -mm is, I didn't
test this with mainline (since I have patches in -mm for hotplug 
memory remove).

Thanks,
Badari

--

From: Andrew Morton
Date: Monday, February 11, 2008 - 12:48 pm

On Mon, 11 Feb 2008 09:23:18 -0800


is rather tame.  These are more than cleanups!  These sound like
machine-crashing bugs.  Do they crash machines?  How come nobody noticed
it?

All very strange...


--

From: Greg KH
Date: Monday, February 11, 2008 - 1:35 pm

No one has ever run the 'remove memory' codepath before, that's why they
were never seen before :)

thanks,

greg k-h
--

From: Badari Pulavarty
Date: Monday, February 11, 2008 - 2:32 pm

No they don't crash machine - mainly because, they never get called
with "root" argument (where we have the bug). They were never tested
before, since we don't have memory remove work yet. All it does
is, it leave /sysfs directory laying around and causing next
memory add failure. 

Thanks,
Badari

--

From: Yasunori Goto
Date: Tuesday, February 12, 2008 - 1:06 am

Badari-san.

Which function does call unregister_memory() or unregister_memory_section()?
I can't find its caller in current 2.6.24-mm1.


???????()
  |
  |nothing calls?
  |
  +-->unregister_memory_section()
       |
       |call
       |
       +---> remove_memory_block()
              |
              |call
              |
              +----> unregister_memory()

unregister_memory_section() is only externed in linux/memory.h.

Do you have any another patch to call it?
I think it is necessary for physical memory removing.

If you have not posted it or it is not merged to -mm,
I can understand why this bug remains.
If you posted it, could you point it to me?

Or do I misunderstand something?


Thanks.

-- 
Yasunori Goto 


--

From: Badari Pulavarty
Date: Tuesday, February 12, 2008 - 10:22 am

Yes. I am trying to complete the hotplug memory remove
support, so that I can use it for supporting it on ppc64
DLPAR environment.

Here is the patch to finish up some of the generic work
left. As you can see, I still need to finish up some work :(
Any help is appreciated :)

Thanks,
Badari

---
 include/linux/memory_hotplug.h |    4 ++++
 mm/memory_hotplug.c            |   33 +++++++++++++++++++++++++++++++++
 mm/sparse.c                    |   40 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+)

Index: linux-2.6.24/mm/memory_hotplug.c
===================================================================
--- linux-2.6.24.orig/mm/memory_hotplug.c	2008-02-07 17:16:52.000000000 -0800
+++ linux-2.6.24/mm/memory_hotplug.c	2008-02-07 17:17:57.000000000 -0800
@@ -81,6 +81,14 @@ static int __add_zone(struct zone *zone,
 	return 0;
 }
 
+static void __remove_zone(struct zone *zone, unsigned long phys_start_pfn)
+{
+	/*
+	 * TODO - Check to see if the zone is correctly adjusted
+	 * 	  Need to mark pages reserved ?
+	 */
+}
+
 static int __add_section(struct zone *zone, unsigned long phys_start_pfn)
 {
 	int nr_pages = PAGES_PER_SECTION;
@@ -102,6 +110,16 @@ static int __add_section(struct zone *zo
 	return register_new_memory(__pfn_to_section(phys_start_pfn));
 }
 
+static void __remove_section(struct zone *zone, unsigned long phys_start_pfn)
+{
+	if (!pfn_valid(phys_start_pfn))
+		return;
+
+	unregister_memory_section(__pfn_to_section(phys_start_pfn));
+	__remove_zone(zone, phys_start_pfn);
+	sparse_remove_one_section(zone, phys_start_pfn, PAGES_PER_SECTION);
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -135,6 +153,21 @@ int __add_pages(struct zone *zone, unsig
 }
 EXPORT_SYMBOL_GPL(__add_pages);
 
+void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
+		 unsigned long nr_pages)
+{
+	unsigned long i;
+	int start_sec, ...
From: Dave Hansen
Date: Tuesday, February 12, 2008 - 1:59 pm

I think you need at least a WARN_ON() there.  


Can none of this ever fail?

I also think having a function called __remove_section() that takes a
pfn is a bad idea.  How about passing an actual 'struct mem_section *'
into it?  One of the reasons I even made that structure was so that you
could hand it around to things and never be confused about pfn vs. paddr

I'd like to see some warnings in there if nr_pages or phys_start_pfn are
not section-aligned and some other sanity checks.  If someone is trying
to remove non-section-aligned areas, we either have something wrong, or
some other work to do, first keeping track of what section portions are
"removed".

I'd probably do:

void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
  		    unsigned long nr_pages)
{
	int i;
	int sections_to_remove;

	/*
	 * We can only remove entire sections.
	 */
	if (phys_start_pfn & ~PAGE_SECTION_MASK)
		return ...;
	if (nr_pages % PAGES_PER_SECTION)
		return ...;
	sections_to_remove = nr_pages / PAGES_PER_SECTION;

	for (i = 0; i < sections_to_remove; i++) {
		unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION;
		__remove_section(zone, __pfn_to_section(pfn));
	}

You're abusing this memmap variable.  Please make another variable
that's unsigned long and has a nice name if you're actually going to
store an 'unsigned long' in it.  That cast below should be a big red

Ugh.  Please put this in its own helper.  Also, sparse_decode_mem_map()
has absolutely no other users.  Please modify it so that you don't have
to do this gunk, like put the '& SECTION_MAP_MASK' in there.  You
probably just need:

struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum)
{
	/*
	 * mask off the extra low bits of information
	 */
	coded_mem_map &= SECTION_MAP_MASK;
        return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum);
}

Then, you can just do this:

	memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);

No ...
From: Badari Pulavarty
Date: Tuesday, February 12, 2008 - 2:56 pm

Yes. I got similar feedback from Andy. I was closely trying to mimic
__add_pages() for easy review/understanding.

I have an updated version (not fully tested) which takes section_nr as
argument instead of playing with pfns. Please review this one and see if



Hmm. my understand of memmap is limited. Can you help me out here ?
I was trying to use free_bootmem_node() to free up the allocations,
but I need nodeid from which allocation came from :(

Here is the updated (currently testing) patch.

Thanks,
Badari

Generic helper function to remove section mappings and sysfs entries
for the section of the memory we are removing.  offline_pages() correctly 
adjusted zone and marked the pages reserved.

Issue: If mem_map, usemap allocation could come from different places -
kmalloc, vmalloc, alloc_pages or bootmem. There is no easy way
to find and free up properly. Especially for bootmem, we need to
know which node the allocation came from.

---
 include/linux/memory_hotplug.h |    4 +++
 mm/memory_hotplug.c            |   30 ++++++++++++++++++++++++++++
 mm/sparse.c                    |   43 ++++++++++++++++++++++++++++++++++++++---
 3 files changed, 74 insertions(+), 3 deletions(-)

Index: linux-2.6.24/mm/memory_hotplug.c
===================================================================
--- linux-2.6.24.orig/mm/memory_hotplug.c	2008-02-07 17:16:52.000000000 -0800
+++ linux-2.6.24/mm/memory_hotplug.c	2008-02-12 13:35:52.000000000 -0800
@@ -102,6 +102,15 @@ static int __add_section(struct zone *zo
 	return register_new_memory(__pfn_to_section(phys_start_pfn));
 }
 
+static void __remove_section(struct zone *zone, unsigned long section_nr)
+{
+	if (!valid_section_nr(section_nr))
+		return;
+
+	unregister_memory_section(__nr_to_section(section_nr));
+	sparse_remove_one_section(zone, section_nr);
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -135,6 +144,27 @@ int __add_pages(struct ...
From: Dave Hansen
Date: Tuesday, February 12, 2008 - 2:57 pm

I do think passing in a mem_section* here is highly superior.  It makes
it impossible to pass a pfn in and not get a warning.

-- Dave

--

From: Badari Pulavarty
Date: Tuesday, February 12, 2008 - 3:07 pm

Only problem is, I need to hold pgdat_resize_lock() if pass *ms. 
If I don't hold the resize_lock, I have to re-evaluate. And also,
I need to pass section_nr for decoding the mem_map anyway :(

Thanks,
Badari

--

From: Dave Hansen
Date: Tuesday, February 12, 2008 - 3:15 pm

What's wrong with holding the resize lock?  What races, precisely, are

See sparse.c::__section_nr().  It takes a mem_section* and returns a
section_nr.

-- Dave

--

From: Badari Pulavarty
Date: Tuesday, February 12, 2008 - 3:51 pm

I was trying to avoid holding resize lock for entire duration of
remove_section(), which includes removing sysfs entries etc. Its
needed only to decode and clear out sectionmap. (I am no longer 
passing pfns).

Whats wrong with passing section_nr ? It simply checks if that
section exists and if so removes sysfs entries and corresponding

I know. It looked like a round about of getting section_nr while
we have that information easily available.

If you are really passionate about passing mem_section*, sure I
can do that :)

Thanks,
Badari

--

From: Badari Pulavarty
Date: Tuesday, February 12, 2008 - 4:03 pm

Here is the version with your suggestion. Do you like this better ?

Thanks,
Badari

Generic helper function to remove section mappings and sysfs entries
for the section of the memory we are removing.  offline_pages() correctly 
adjusted zone and marked the pages reserved.

Issue: If mem_map, usemap allocation could come from different places -
kmalloc, vmalloc, alloc_pages or bootmem. There is no easy way
to find and free up properly. Especially for bootmem, we need to
know which node the allocation came from.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>

---
 include/linux/memory_hotplug.h |    4 ++++
 mm/memory_hotplug.c            |   34 ++++++++++++++++++++++++++++++++++
 mm/sparse.c                    |   39 ++++++++++++++++++++++++++++++++++++---
 3 files changed, 74 insertions(+), 3 deletions(-)

Index: linux-2.6.24/mm/memory_hotplug.c
===================================================================
--- linux-2.6.24.orig/mm/memory_hotplug.c	2008-02-07 17:16:52.000000000 -0800
+++ linux-2.6.24/mm/memory_hotplug.c	2008-02-12 14:49:07.000000000 -0800
@@ -102,6 +102,15 @@ static int __add_section(struct zone *zo
 	return register_new_memory(__pfn_to_section(phys_start_pfn));
 }
 
+static void __remove_section(struct zone *zone, struct mem_section *ms)
+{
+	if (!valid_section(ms))
+		return;
+
+	unregister_memory_section(ms);
+	sparse_remove_one_section(zone, ms);
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -135,6 +144,31 @@ int __add_pages(struct zone *zone, unsig
 }
 EXPORT_SYMBOL_GPL(__add_pages);
 
+void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
+		 unsigned long nr_pages)
+{
+	unsigned long i;
+	int sections_to_remove;
+	unsigned long flags;
+	struct pglist_data *pgdat = zone->zone_pgdat;
+
+	/*
+	 * We can only remove entire sections
+	 */
+	BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
+	BUG_ON(nr_pages % ...
From: Dave Hansen
Date: Tuesday, February 12, 2008 - 4:20 pm

I do like how it looks, better, thanks.


--

From: Dave Hansen
Date: Tuesday, February 12, 2008 - 3:06 pm

Never mind.  That was a bad suggestion.  I do think it would be a good
idea to mark the 'struct page' of ever page we use as bootmem in some
way.  Perhaps page->private?  Otherwise, you can simply try all of the
possibilities and consider the remainder bootmem.  Did you ever find out
if we properly initialize the bootmem 'struct page's?

Please have mercy and put this in a helper, first of all.

static void free_usemap(unsigned long *usemap)
{
	if (!usemap_
		return;

	if (PageSlab(virt_to_page(usemap))) {
		kfree(usemap)
	} else if (is_vmalloc_addr(usemap)) {
		vfree(usemap);
	} else {
		int nid = page_to_nid(virt_to_page(usemap));
		bootmem_fun_here(NODE_DATA(nid), usemap);
	}
}


How is this any different from pfn_to_nid() on the thing?  Or, can you
not use that because we never init'd the bootmem 'struct page's?

If so, I think the *CORRECT* fix is to give the bootmem areas real
struct pages, probably at boot-time.

-- Dave

--

From: Yasunori Goto
Date: Tuesday, February 12, 2008 - 10:09 pm

Thanks Badari-san.


I agree. page->private is not used by bootmem allocator.

I would like to mark not only memmap but also pgdat (and so on)

It may work. But, to be honest, I feel there are TOO MANY allocation/free
way for memmap (usemap and so on). If possible, I would like to
unify some of them. I would like to try it.

Bye.

-- 
Yasunori Goto 


--

From: Badari Pulavarty
Date: Wednesday, February 13, 2008 - 10:31 am

Thank you for the offer. Here is the latest patch, feel free to
rip it out.

Thanks,
Badari

Generic helper function to remove section mappings and sysfs entries
for the section of the memory we are removing.  offline_pages() correctly 
adjusted zone and marked the pages reserved.

Issue: Need help on freeing up allocation made from bootmem. 

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>

---
 include/linux/memory_hotplug.h |    4 +++
 mm/memory_hotplug.c            |   34 +++++++++++++++++++++++++++++++
 mm/sparse.c                    |   44 ++++++++++++++++++++++++++++++++++++++---
 3 files changed, 79 insertions(+), 3 deletions(-)

Index: linux-2.6.24/mm/memory_hotplug.c
===================================================================
--- linux-2.6.24.orig/mm/memory_hotplug.c	2008-02-12 15:07:09.000000000 -0800
+++ linux-2.6.24/mm/memory_hotplug.c	2008-02-12 15:08:50.000000000 -0800
@@ -102,6 +102,15 @@ static int __add_section(struct zone *zo
 	return register_new_memory(__pfn_to_section(phys_start_pfn));
 }
 
+static void __remove_section(struct zone *zone, struct mem_section *ms)
+{
+	if (!valid_section(ms))
+		return;
+
+	unregister_memory_section(ms);
+	sparse_remove_one_section(zone, ms);
+}
+
 /*
  * Reasonably generic function for adding memory.  It is
  * expected that archs that support memory hotplug will
@@ -135,6 +144,31 @@ int __add_pages(struct zone *zone, unsig
 }
 EXPORT_SYMBOL_GPL(__add_pages);
 
+void __remove_pages(struct zone *zone, unsigned long phys_start_pfn,
+		 unsigned long nr_pages)
+{
+	unsigned long i;
+	int sections_to_remove;
+	unsigned long flags;
+	struct pglist_data *pgdat = zone->zone_pgdat;
+
+	/*
+	 * We can only remove entire sections
+	 */
+	BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK);
+	BUG_ON(nr_pages % PAGES_PER_SECTION);
+
+	sections_to_remove = nr_pages / PAGES_PER_SECTION;
+
+	for (i = 0; i < sections_to_remove; i++) {
+		unsigned long pfn = phys_start_pfn + ...

This is a note to let you know that I've just added the patch titled

     Subject: driver core: register_memory/unregister_memory clean ups and bugfix

to my gregkh-2.6 tree.  Its filename is

     driver-core-register_memory-unregister_memory-clean-ups-and-bugfix.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


From pbadari@us.ibm.com Mon Feb 11 09:20:30 2008
From: Badari Pulavarty <pbadari@us.ibm.com>
Date: Mon, 11 Feb 2008 09:23:18 -0800
Subject: driver core: register_memory/unregister_memory clean ups and bugfix
To: Andrew Morton <akpm@linux-foundation.org>, lkml <linux-kernel@vger.kernel.org>
Cc: Greg KH <greg@kroah.com>, haveblue@us.ibm.com, linux-mm <linux-mm@kvack.org>
Message-ID: <1202750598.25604.3.camel@dyn9047017100.beaverton.ibm.com>

register_memory()/unregister_memory() never gets called with
"root". unregister_memory() is accessing kobject_name of
the object just freed up. Since no one uses the code,
lets take the code out. And also, make register_memory() static.

Another bug fix - before calling unregister_memory()
remove_memory_block() gets a ref on kobject. unregister_memory()
need to drop that ref before calling sysdev_unregister().

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/base/memory.c |   22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)

--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -62,8 +62,8 @@ void unregister_memory_notifier(struct n
 /*
  * register_memory - Setup a sysfs device for a memory block
  */
-int register_memory(struct memory_block *memory, struct mem_section *section,
-		struct node *root)
+static
+int register_memory(struct memory_block *memory, struct mem_section *section)
 {
 	int error;
 
@@ -71,26 +71,18 @@ int register_memory(struct memory_block 
 	memory->sysdev.id = __section_nr(section);
 
 	error = ...
Previous thread: reading user. extended attributes from symlinks in 2.6 kernels by Rok Ruzic on Monday, February 11, 2008 - 9:38 am. (2 messages)

Next thread: [PATCH] dmi: Prevent linked list corruption by Jean Delvare on Monday, February 11, 2008 - 10:22 am. (4 messages)