Re: [PATCH 0/20] x86_64 Relocatable bzImage support (V4)

Previous thread: none

Next thread: [PATCH] Fix race between proc_get_inode() and remove_proc_entry() by Alexey Dobriyan on Wednesday, March 7, 2007 - 2:01 am. (1 message)
From: Vivek Goyal
Date: Tuesday, March 6, 2007 - 11:57 pm

Hi,

Here is another attempt on x86_64 relocatable bzImage patches(V4). This
patchset makes a bzImage relocatable and same kernel binary can be loaded
and run from different physical addresses.

As on now, this mainly helps distros who have to ship an extra kernel compiled
for a different physical address to capture the kernel crash dump. This
patchset will allow distros and kdump users to use production kernel itself
as dump capture kernel and there is no need to ship/build an extra kernel.
I am hopeful people will find other interesting usages down the line.

Eric has done all the heavy weight lifting requird to make this patchset
work. Last time I posted this patchset (V3), there were minor comments
which I have taken care of. Following are the changes since V3.

- Reduced the usage of _AC() macro to only shift operations, as per 
  Andi's comment.
- Restored the CONFIG_PHYSICAL_START option.
- Fixed few bugs with suspend to disk code path.

It would be good if these patches get into -mm so that it can undergo more
testing. I have been testing them and these just work fine for me.

Thanks
Vivek
-

From: Vivek Goyal
Date: Tuesday, March 6, 2007 - 11:59 pm

This patch makes pgtable.h and page.h safe to include
in assembly files like head.S.  Allowing us to use
symbolic constants instead of hard coded numbers when
refering to the page tables.

This patch copies asm-sparc64/const.h to asm-x86_64 to
get a definition of _AC() a very convinient macro that
allows us to force the type when we are compiling the
code in C and to drop all of the type information when
we are using the constant in assembly.  Previously this
was done with multiple definition of the same constant.
const.h was modified slightly so that it works when given
CONFIG options as arguments.

This patch adds #ifndef __ASSEMBLY__ ... #endif
and _AC(1,UL) where appropriate so the assembler won't
choke on the header files.  Otherwise nothing
should have changed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 include/asm-x86_64/const.h   |   20 ++++++++++++++++++++
 include/asm-x86_64/page.h    |   28 ++++++++++------------------
 include/asm-x86_64/pgtable.h |   33 +++++++++++++++++++++------------
 3 files changed, 51 insertions(+), 30 deletions(-)

diff -puN /dev/null include/asm-x86_64/const.h
--- /dev/null	2007-03-07 00:46:17.354096448 +0530
+++ linux-2.6.21-rc2-reloc-root/include/asm-x86_64/const.h	2007-03-07 01:20:54.000000000 +0530
@@ -0,0 +1,20 @@
+/* const.h: Macros for dealing with constants.  */
+
+#ifndef _X86_64_CONST_H
+#define _X86_64_CONST_H
+
+/* Some constant macros are used in both assembler and
+ * C code.  Therefore we cannot annotate them always with
+ * 'UL' and other type specificers unilaterally.  We
+ * use the following macros to deal with this.
+ */
+
+#ifdef __ASSEMBLY__
+#define _AC(X,Y)	X
+#else
+#define __AC(X,Y)	(X##Y)
+#define _AC(X,Y)	__AC(X,Y)
+#endif
+
+
+#endif /* !(_X86_64_CONST_H) */
diff -puN include/asm-x86_64/page.h~x86_64-Assembly-safe-page.h-and-pgtable.h include/asm-x86_64/page.h
--- ...
From: Sam Ravnborg
Date: Wednesday, March 7, 2007 - 12:24 pm

Should this file not live in asm-generic and be useable
for all architectures?

	Sam
-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 11:01 pm

Hi Sam,

Thanks for the review. This makes sense to me. Move const.h into
asm-generic and let everybody use it.

This is more of a small cleanup issue and involves changing few header files
in asm-sparc64 and make sure nothing is broken on sparc64. This patchset
is already becoming big and complex. Is it ok if we let the patch 
remain unmodified for now and once this gets in and settles down, I can
post another patch to do above modification?

Thanks
Vivek
-

From: Eric W. Biederman
Date: Wednesday, March 7, 2007 - 11:16 pm

Actually unless there is a reason not to, we can probably move this
into include/linux instead of include/asm-generic.  I don't see anything
in that header file that is architecture specific in any way.  Except
that it happens to only be used in architecture specific code.

Eric
-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:00 am

Early in the boot process we need the ability to set
up temporary mappings, before our normal mechanisms are
initialized.  Currently this is used to map pages that
are part of the page tables we are building and pages
during the dmi scan.

The core problem is that we are using the user portion of
the page tables to implement this.  Which means that while
this mechanism is active we cannot catch NULL pointer dereferences
and we deviate from the normal ways of handling things.

In this patch I modify early_ioremap to map pages into
the kernel portion of address space, roughly where
we will later put modules, and I make the discovery of
which addresses we can use dynamic which removes all
kinds of static limits and remove the dependencies
on implementation details between different parts of the code.

Now alloc_low_page() and unmap_low_page() use 
early_iomap() and early_iounmap() to allocate/map and 
unmap a page.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/head.S |    3 -
 arch/x86_64/mm/init.c     |  100 ++++++++++++++++++++--------------------------
 2 files changed, 45 insertions(+), 58 deletions(-)

diff -puN arch/x86_64/kernel/head.S~x86_64-Kill-temp_boot_pmds arch/x86_64/kernel/head.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/head.S~x86_64-Kill-temp_boot_pmds	2007-03-07 01:21:26.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/head.S	2007-03-07 01:21:26.000000000 +0530
@@ -288,9 +288,6 @@ NEXT_PAGE(level2_ident_pgt)
 	.quad	i << 21 | 0x083
 	i = i + 1
 	.endr
-	/* Temporary mappings for the super early allocator in arch/x86_64/mm/init.c */
-	.globl temp_boot_pmds
-temp_boot_pmds:
 	.fill	492,8,0
 	
 NEXT_PAGE(level2_kernel_pgt)
diff -puN arch/x86_64/mm/init.c~x86_64-Kill-temp_boot_pmds arch/x86_64/mm/init.c
--- linux-2.6.21-rc2-reloc/arch/x86_64/mm/init.c~x86_64-Kill-temp_boot_pmds	2007-03-07 01:21:26.000000000 +0530
+++ ...
From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:02 am

- Merge physmem_pgt and ident_pgt, removing physmem_pgt.  The merge
  is broken as soon as mm/init.c:init_memory_mapping is run.
- As physmem_pgt is gone don't export it in pgtable.h.
- Use defines from pgtable.h for page permissions.
- Fix the physical memory identity mapping so it is at the correct
  address.
- Remove the physical memory mapping from wakeup_level4_pgt it
  is at the wrong address so we can't possibly be usinging it.
- Simply NEXT_PAGE the work to calculate the phys_ alias
  of the labels was very cool.  Unfortuantely it was a brittle
  special purpose hack that makes maitenance more difficult.
  Instead just use label - __START_KERNEL_map like we do
  everywhere else in assembly.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/head.S    |   61 +++++++++++++++++++------------------------
 include/asm-x86_64/pgtable.h |    1 
 2 files changed, 28 insertions(+), 34 deletions(-)

diff -puN arch/x86_64/kernel/head.S~x86_64-Cleanup-the-early-boot-page-table arch/x86_64/kernel/head.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/head.S~x86_64-Cleanup-the-early-boot-page-table	2007-03-07 01:22:07.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/head.S	2007-03-07 01:22:07.000000000 +0530
@@ -13,6 +13,7 @@
 #include <linux/init.h>
 #include <asm/desc.h>
 #include <asm/segment.h>
+#include <asm/pgtable.h>
 #include <asm/page.h>
 #include <asm/msr.h>
 #include <asm/cache.h>
@@ -260,52 +261,48 @@ ljumpvector:
 ENTRY(stext)
 ENTRY(_stext)
 
-	$page = 0
 #define NEXT_PAGE(name) \
-	$page = $page + 1; \
-	.org $page * 0x1000; \
-	phys_/**/name = $page * 0x1000 + __PHYSICAL_START; \
+	.balign	PAGE_SIZE; \
 ENTRY(name)
 
+/* Automate the creation of 1 to 1 mapping pmd entries */
+#define PMDS(START, PERM, COUNT)		\
+	i = 0 ;					\
+	.rept (COUNT) ;				\
+	.quad	(START) + (i << 21) + (PERM) ;	\
+	i = i + 1 ;				\
+	.endr
+
 NEXT_PAGE(init_level4_pgt)
 ...
From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:03 am

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/early_printk.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff -puN arch/x86_64/kernel/early_printk.c~x86_64-fix-early_printk-to-use-the-standard-ISA-mapping arch/x86_64/kernel/early_printk.c
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/early_printk.c~x86_64-fix-early_printk-to-use-the-standard-ISA-mapping	2007-03-07 01:22:33.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/early_printk.c	2007-03-07 01:22:33.000000000 +0530
@@ -11,11 +11,10 @@
 
 #ifdef __i386__
 #include <asm/setup.h>
-#define VGABASE		(__ISA_IO_base + 0xb8000)
 #else
 #include <asm/bootsetup.h>
-#define VGABASE		((void __iomem *)0xffffffff800b8000UL)
 #endif
+#define VGABASE		(__ISA_IO_base + 0xb8000)
 
 static int max_ypos = 25, max_xpos = 80;
 static int current_ypos = 25, current_xpos = 0;
_
-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:04 am

Use virtual addresses instead of physical addresses
in copy bootdata.  In addition fix the implementation
of the old bootloader convention.  Everything is
at real_mode_data always.  It is just that sometimes
real_mode_data was relocated by setup.S to not sit at
0x90000.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/head64.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff -puN arch/x86_64/kernel/head64.c~x86_64-modify-copy_bootdata-to-use-virtual-addresses arch/x86_64/kernel/head64.c
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/head64.c~x86_64-modify-copy_bootdata-to-use-virtual-addresses	2007-03-07 01:23:55.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/head64.c	2007-03-07 01:23:55.000000000 +0530
@@ -29,25 +29,24 @@ static void __init clear_bss(void)
 }
 
 #define NEW_CL_POINTER		0x228	/* Relative to real mode data */
-#define OLD_CL_MAGIC_ADDR	0x90020
+#define OLD_CL_MAGIC_ADDR	0x20
 #define OLD_CL_MAGIC            0xA33F
-#define OLD_CL_BASE_ADDR        0x90000
-#define OLD_CL_OFFSET           0x90022
+#define OLD_CL_OFFSET           0x22
 
 static void __init copy_bootdata(char *real_mode_data)
 {
-	int new_data;
+	unsigned long new_data;
 	char * command_line;
 
 	memcpy(x86_boot_params, real_mode_data, BOOT_PARAM_SIZE);
-	new_data = *(int *) (x86_boot_params + NEW_CL_POINTER);
+	new_data = *(u32 *) (x86_boot_params + NEW_CL_POINTER);
 	if (!new_data) {
-		if (OLD_CL_MAGIC != * (u16 *) OLD_CL_MAGIC_ADDR) {
+		if (OLD_CL_MAGIC != *(u16 *)(real_mode_data + OLD_CL_MAGIC_ADDR)) {
 			return;
 		}
-		new_data = OLD_CL_BASE_ADDR + * (u16 *) OLD_CL_OFFSET;
+		new_data = __pa(real_mode_data) + *(u16 *)(real_mode_data + OLD_CL_OFFSET);
 	}
-	command_line = (char *) ((u64)(new_data));
+	command_line = __va(new_data);
 	memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
 }
 
@@ -74,7 +73,7 @@ void __init ...
From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:06 am

Move __KERNEL32_CS up into the unused gdt entry.  __KERNEL32_CS is
used when entering the kernel so putting it first is useful when
trying to keep boot gdt sizes to a minimum.

Set the accessed bit on all gdt entries.  We don't care
so there is no need for the cpu to burn the extra cycles,
and it potentially allows the pages to be immutable.  Plus
it is confusing when debugging and your gdt entries mysteriously
change.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/head.S    |   12 ++++++------
 include/asm-x86_64/segment.h |    2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff -puN arch/x86_64/kernel/head.S~x86_64-cleanup-segments arch/x86_64/kernel/head.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/head.S~x86_64-cleanup-segments	2007-03-07 01:24:53.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/head.S	2007-03-07 01:24:53.000000000 +0530
@@ -362,13 +362,13 @@ gdt:
 	
 ENTRY(cpu_gdt_table)
 	.quad	0x0000000000000000	/* NULL descriptor */
+	.quad	0x00cf9b000000ffff	/* __KERNEL32_CS */
+	.quad	0x00af9b000000ffff	/* __KERNEL_CS */
+	.quad	0x00cf93000000ffff	/* __KERNEL_DS */
+	.quad	0x00cffb000000ffff	/* __USER32_CS */
+	.quad	0x00cff3000000ffff	/* __USER_DS, __USER32_DS  */
+	.quad	0x00affb000000ffff	/* __USER_CS */
 	.quad	0x0			/* unused */
-	.quad	0x00af9a000000ffff	/* __KERNEL_CS */
-	.quad	0x00cf92000000ffff	/* __KERNEL_DS */
-	.quad	0x00cffa000000ffff	/* __USER32_CS */
-	.quad	0x00cff2000000ffff	/* __USER_DS, __USER32_DS  */		
-	.quad	0x00affa000000ffff	/* __USER_CS */
-	.quad	0x00cf9a000000ffff	/* __KERNEL32_CS */
 	.quad	0,0			/* TSS */
 	.quad	0,0			/* LDT */
 	.quad   0,0,0			/* three TLS descriptors */ 
diff -puN include/asm-x86_64/segment.h~x86_64-cleanup-segments include/asm-x86_64/segment.h
--- linux-2.6.21-rc2-reloc/include/asm-x86_64/segment.h~x86_64-cleanup-segments	2007-03-07 01:24:53.000000000 +0530
+++ ...
From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:10 am

o Get rid of dead code in wakeup.S

o We never restore from saved_gdt, saved_idt, saved_ltd, saved_tss, saved_cr3,
  saved_cr4, saved_cr0, real_save_gdt, saved_efer, saved_efer2. Get rid
  of of associated code.

o Get rid of bogus_magic, bogus_31_magic and bogus_magic2. No longer being
  used.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/acpi/wakeup.S |   57 ---------------------------------------
 1 file changed, 1 insertion(+), 56 deletions(-)

diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-get-rid-of-dead-code-in-suspend-resume arch/x86_64/kernel/acpi/wakeup.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-get-rid-of-dead-code-in-suspend-resume	2007-03-07 01:26:21.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/acpi/wakeup.S	2007-03-07 01:26:21.000000000 +0530
@@ -258,8 +258,6 @@ gdt_48a:
 	.word	0, 0				# gdt base (filled in later)
 	
 	
-real_save_gdt:	.word 0
-		.quad 0
 real_magic:	.quad 0
 video_mode:	.quad 0
 video_flags:	.quad 0
@@ -272,10 +270,6 @@ bogus_32_magic:
 	movb	$0xb3,%al	;  outb %al,$0x80
 	jmp bogus_32_magic
 
-bogus_31_magic:
-	movb	$0xb1,%al	;  outb %al,$0x80
-	jmp bogus_31_magic
-
 bogus_cpu:
 	movb	$0xbc,%al	;  outb %al,$0x80
 	jmp bogus_cpu
@@ -346,16 +340,6 @@ check_vesaa:
 
 _setbada: jmp setbada
 
-	.code64
-bogus_magic:
-	movw	$0x0e00 + 'B', %ds:(0xb8018)
-	jmp bogus_magic
-
-bogus_magic2:
-	movw	$0x0e00 + '2', %ds:(0xb8018)
-	jmp bogus_magic2
-	
-
 wakeup_stack_begin:	# Stack grows down
 
 .org	0xff0
@@ -373,28 +357,11 @@ ENTRY(wakeup_end)
 #
 # Returned address is location of code in low memory (past data and stack)
 #
+	.code64
 ENTRY(acpi_copy_wakeup_routine)
 	pushq	%rax
-	pushq	%rcx
 	pushq	%rdx
 
-	sgdt	saved_gdt
-	sidt	saved_idt
-	sldt	saved_ldt
-	str	saved_tss
-
-	movq    %cr3, %rdx
-	movq    %rdx, saved_cr3
-	movq    %cr4, %rdx
-	movq    %rdx, saved_cr4
-	movq	%cr0, ...
From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:13 am

o Various cleanups. One of the main purpose of cleanups is that make
  wakeup.S as close as possible to trampoline.S.

o Following are the changes
	- Indentations for comments.
	- Changed the gdt table to compact form and to resemble the
	  one in trampoline.S
	- Take the jump to 32bit from real mode using ljmpl. Makes code
	  more readable.
	- After enabling long mode, directly take a long jump for 64bit
	  mode. No need to take an extra jump to "reach_comaptibility_mode"
	- Stack is not used after real mode. So don't load stack in
 	  32 bit mode.
	- No need to enable PGE here.
	- No need to do extra EFER read, anyway we trash the read contents.
	- No need to enable system call (EFER_SCE). Anyway it will be 
	  enabled when original EFER is restored.
	- No need to set MP, ET, NE, WP, AM bits in cr0. Very soon we will
  	  reload the original cr0 while restroing the processor state.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/acpi/wakeup.S |  112 +++++++++++++--------------------------
 1 file changed, 40 insertions(+), 72 deletions(-)

diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-misc-cleanups arch/x86_64/kernel/acpi/wakeup.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-misc-cleanups	2007-03-07 01:27:32.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/acpi/wakeup.S	2007-03-07 01:27:32.000000000 +0530
@@ -30,11 +30,12 @@ wakeup_code:
 	cld
 	# setup data segment
 	movw	%cs, %ax
-	movw	%ax, %ds					# Make ds:0 point to wakeup_start
+	movw	%ax, %ds		# Make ds:0 point to wakeup_start
 	movw	%ax, %ss
-	mov	$(wakeup_stack - wakeup_code), %sp		# Private stack is needed for ASUS board
+					# Private stack is needed for ASUS board
+	mov	$(wakeup_stack - wakeup_code), %sp
 
-	pushl	$0						# Kill any dangerous flags
+	pushl	$0			# Kill any dangerous flags
 	popfl
 
 	movl	real_magic - wakeup_code, %eax
@@ -45,7 +46,7 @@ ...
From: Pavel Machek
Date: Wednesday, March 7, 2007 - 3:40 pm

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 9:25 pm

Hi Pavel,

Thanks. I tested all the S3 related changes during my last posting. That
time I had access to an x86_64 box which supported ACPI state S3. Since then
this code has not changed and it has been running successfully in RHEL5 
kernels. Now I don't have access to an x86_64 machine which supports S3
so I can't test suspend to RAM. But I am sure that these patches are working
as nothing has changed since last posting.

Just now Nigel reported successful suspend to RAM results for 2.6.20. I
have requested him to test it for 2.6.21-rc2 also, if possible.

I have throughly tested suspend to disk (S4) and it works fine.

Thanks
Vivek
-

From: Pavel Machek
Date: Wednesday, March 7, 2007 - 3:41 pm

Outbs were my debugging hacks, perhaps you can simply remove them at
this point? Not sure how useful "Linux" debug print is, it can
probably be removed, too.


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 9:29 pm

Hi Pavel,

I found these debugging hacks useful while debugging some problem with 
my changes in this code. It helps to find out till what poing code flow
as reached in this assembly code. So I think its not a bad idea to let
this piece code be there.

Thanks
Vivek
-

From: Pavel Machek
Date: Thursday, March 8, 2007 - 4:43 am

Ok. (I was not sure how common are displays of port 0x80).


-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Lombard, David N
Date: Thursday, March 8, 2007 - 9:45 am

They're now typically available in reasonably well stocked stores; I picked
one up from CompUSA about a year ago...

-- 
David N. Lombard, Intel, Irvine, CA
I do not speak for Intel Corporation; all comments are strictly my own.
-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:14 am

o Moved wakeup_level4_pgt into the wakeup routine so we can
  run the kernel above 4G.

o Now we first go to 64bit mode and continue to run from trampoline and
  then then start accessing kernel symbols and restore processor context.
  This enables us to resume even in relocatable kernel context when 
  kernel might not be loaded at physical addr it has been compiled for.

o Removed the need for modifying any existing kernel page table.

o Increased the size of the wakeup routine to 8K. This is required as
  wake page tables are on trampoline itself and they got to be at 4K
  boundary, hence one page is not sufficient.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/acpi/sleep.c  |   22 ++------------
 arch/x86_64/kernel/acpi/wakeup.S |   59 ++++++++++++++++++++++++---------------
 arch/x86_64/kernel/head.S        |    9 -----
 3 files changed, 41 insertions(+), 49 deletions(-)

diff -puN arch/x86_64/kernel/acpi/sleep.c~x86_64-64bit-ACPI-wakeup-trampoline arch/x86_64/kernel/acpi/sleep.c
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/acpi/sleep.c~x86_64-64bit-ACPI-wakeup-trampoline	2007-03-07 01:28:11.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/acpi/sleep.c	2007-03-07 01:28:11.000000000 +0530
@@ -60,17 +60,6 @@ extern char wakeup_start, wakeup_end;
 
 extern unsigned long acpi_copy_wakeup_routine(unsigned long);
 
-static pgd_t low_ptr;
-
-static void init_low_mapping(void)
-{
-	pgd_t *slot0 = pgd_offset(current->mm, 0UL);
-	low_ptr = *slot0;
-	set_pgd(slot0, *pgd_offset(current->mm, PAGE_OFFSET));
-	WARN_ON(num_online_cpus() != 1);
-	local_flush_tlb();
-}
-
 /**
  * acpi_save_state_mem - save kernel state
  *
@@ -79,8 +68,6 @@ static void init_low_mapping(void)
  */
 int acpi_save_state_mem(void)
 {
-	init_low_mapping();
-
 	memcpy((void *)acpi_wakeup_address, &wakeup_start,
 	       &wakeup_end - &wakeup_start);
 ...
From: Pavel Machek
Date: Wednesday, March 7, 2007 - 3:45 pm

Hmm, if you split it like
		printk(KERN_CRIT "ACPI: Wakeup code way too big, will crash"
			"on attempt to suspend\n");


spaces vs. tabs problem.

Otherwise looks good. ACK if it was tested.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Bernhard Walle
Date: Wednesday, March 7, 2007 - 3:57 pm

But I guess with a space after "crash" or before "on" :)


Regards,
Bernhard
-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 9:58 pm

On Wed, Mar 07, 2007 at 11:45:08PM +0100, Pavel Machek wrote:


Thanks. Done the changes in attached patch.

Vivek



o Moved wakeup_level4_pgt into the wakeup routine so we can
  run the kernel above 4G.

o Now we first go to 64bit mode and continue to run from trampoline and
  then then start accessing kernel symbols and restore processor context.
  This enables us to resume even in relocatable kernel context when 
  kernel might not be loaded at physical addr it has been compiled for.

o Removed the need for modifying any existing kernel page table.

o Increased the size of the wakeup routine to 8K. This is required as
  wake page tables are on trampoline itself and they got to be at 4K
  boundary, hence one page is not sufficient.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/acpi/sleep.c  |   23 ++-------------
 arch/x86_64/kernel/acpi/wakeup.S |   59 ++++++++++++++++++++++++---------------
 arch/x86_64/kernel/head.S        |    9 -----
 3 files changed, 41 insertions(+), 50 deletions(-)

diff -puN arch/x86_64/kernel/acpi/sleep.c~x86_64-64bit-ACPI-wakeup-trampoline arch/x86_64/kernel/acpi/sleep.c
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/acpi/sleep.c~x86_64-64bit-ACPI-wakeup-trampoline	2007-03-07 01:28:11.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/acpi/sleep.c	2007-03-08 18:26:13.000000000 +0530
@@ -60,17 +60,6 @@ extern char wakeup_start, wakeup_end;
 
 extern unsigned long acpi_copy_wakeup_routine(unsigned long);
 
-static pgd_t low_ptr;
-
-static void init_low_mapping(void)
-{
-	pgd_t *slot0 = pgd_offset(current->mm, 0UL);
-	low_ptr = *slot0;
-	set_pgd(slot0, *pgd_offset(current->mm, PAGE_OFFSET));
-	WARN_ON(num_online_cpus() != 1);
-	local_flush_tlb();
-}
-
 /**
  * acpi_save_state_mem - save kernel state
  *
@@ -79,8 +68,6 @@ static void init_low_mapping(void)
  */
 int acpi_save_state_mem(void)
 {
-	init_low_mapping();
-
 ...
From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:17 am

With the rewrite of the SMP trampoline and the early page
allocator there is nothing that needs identity mapped pages,
once we start executing C code.

So add zap_identity_mappings into head64.c and remove
zap_low_mappings() from much later in the code.  The functions
 are subtly different thus the name change.

This also kills boot_level4_pgt which was from an earlier
attempt to move the identity mappings as early as possible,
and is now no longer needed.  Essentially I have replaced
boot_level4_pgt with trampoline_level4_pgt in trampoline.S

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/head.S    |   39 ++++++++++++++-------------------------
 arch/x86_64/kernel/head64.c  |   17 +++++++++++------
 arch/x86_64/kernel/setup.c   |    2 --
 arch/x86_64/kernel/setup64.c |    1 -
 arch/x86_64/mm/init.c        |   24 ------------------------
 include/asm-x86_64/pgtable.h |    1 -
 include/asm-x86_64/proto.h   |    2 --
 7 files changed, 25 insertions(+), 61 deletions(-)

diff -puN arch/x86_64/kernel/head64.c~x86_64-Remove-the-identity-mapping-as-early-as-possible arch/x86_64/kernel/head64.c
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/head64.c~x86_64-Remove-the-identity-mapping-as-early-as-possible	2007-03-07 01:29:50.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/head64.c	2007-03-07 01:29:50.000000000 +0530
@@ -18,8 +18,16 @@
 #include <asm/setup.h>
 #include <asm/desc.h>
 #include <asm/pgtable.h>
+#include <asm/tlbflush.h>
 #include <asm/sections.h>
 
+static void __init zap_identity_mappings(void)
+{
+	pgd_t *pgd = pgd_offset_k(0UL);
+	pgd_clear(pgd);
+	__flush_tlb();
+}
+
 /* Don't add a printk in there. printk relies on the PDA which is not initialized 
    yet. */
 static void __init clear_bss(void)
@@ -57,18 +65,15 @@ void __init x86_64_start_kernel(char * r
 	/* clear bss before set_intr_gate with early_idt_handler */
 	clear_bss();
 
+	/* Make NULL ...
From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:18 am

o __pa() should be used only on kernel linearly mapped virtual addresses
  and not on kernel text and data addresses.

o Hibernation code needs to determine the physical address associated
  with kernel symbol to mark a section boundary which contains pages which
  don't have to be saved and restored during hibernate/resume operation.

o Move this piece of code in arch dependent section. So that architectures
  which don't have kernel text/data mapped into kernel linearly mapped
  region can come up with their own ways of determining physical addresses
  associated with a kernel text.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/i386/power/suspend.c     |   14 ++++++++++++++
 arch/powerpc/kernel/Makefile  |    1 +
 arch/powerpc/kernel/suspend.c |   24 ++++++++++++++++++++++++
 arch/x86_64/kernel/suspend.c  |   14 ++++++++++++++
 kernel/power/power.h          |    5 ++---
 kernel/power/snapshot.c       |   11 -----------
 6 files changed, 55 insertions(+), 14 deletions(-)

diff -puN arch/i386/power/suspend.c~move-swsusp-__pa-dependent-code-to-arch-portion arch/i386/power/suspend.c
--- linux-2.6.21-rc2-reloc/arch/i386/power/suspend.c~move-swsusp-__pa-dependent-code-to-arch-portion	2007-03-07 01:30:18.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/i386/power/suspend.c	2007-03-07 01:30:18.000000000 +0530
@@ -16,6 +16,9 @@
 /* Defined in arch/i386/power/swsusp.S */
 extern int restore_image(void);
 
+/* References to section boundaries */
+extern const void __nosave_begin, __nosave_end;
+
 /* Pointer to the temporary resume page tables */
 pgd_t *resume_pg_dir;
 
@@ -156,3 +159,14 @@ int swsusp_arch_resume(void)
 	restore_image();
 	return 0;
 }
+
+/*
+ *	pfn_is_nosave - check if given pfn is in the 'nosave' section
+ */
+
+int pfn_is_nosave(unsigned long pfn)
+{
+	unsigned long nosave_begin_pfn = __pa_symbol(&__nosave_begin) >> PAGE_SHIFT;
+	unsigned long nosave_end_pfn = PAGE_ALIGN(__pa_symbol(&__nosave_end)) >> PAGE_SHIFT;
+	return (pfn >= ...
From: Pavel Machek
Date: Wednesday, March 7, 2007 - 3:47 pm

...in asm-generic/suspend.h (or something) and then just include it?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 10:34 pm

Actually it is not exactly same code. i386 and x86_64 use __pa_symbol()
and powerpc uses __pa() for determining physical address associated with
a kernel text symbol. That's the precise intent here. Leave it to arch
code to decide how to calculate physical address associated with a kernel

As code is not exactly same, we can't put it in asm-generic/suspend.h.

Thanks
Vivek
-

From: Pavel Machek
Date: Thursday, March 8, 2007 - 4:47 am

Ok... ACK on the original patch, then.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:25 am

o This patch moves the code to verify long mode and SSE to a common file.
  This code is now shared by trampoline.S, wakeup.S, boot/setup.S and
  boot/compressed/head.S

o So far we used to do very limited check in trampoline.S, wakeup.S and
  in 32bit entry point. Now all the entry paths are forced to do the
  exhaustive check, including SSE because verify_cpu is shared.

o I am keeping this patch as last in the x86 relocatable series because
  previous patches have got quite some amount of testing done and don't want
  to distrub that. So that if there is problem introduced by this patch, at
  least it can be easily isolated.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/boot/compressed/head.S |   19 ++++++
 arch/x86_64/boot/setup.S           |   65 ++-------------------
 arch/x86_64/kernel/acpi/wakeup.S   |   30 +++++-----
 arch/x86_64/kernel/trampoline.S    |   51 +----------------
 arch/x86_64/kernel/verify_cpu.S    |  110 +++++++++++++++++++++++++++++++++++++
 5 files changed, 152 insertions(+), 123 deletions(-)

diff -puN arch/x86_64/boot/compressed/head.S~x86_64-move-cpu-verfication-code-to-common-file arch/x86_64/boot/compressed/head.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/boot/compressed/head.S~x86_64-move-cpu-verfication-code-to-common-file	2007-03-07 01:32:27.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/boot/compressed/head.S	2007-03-07 01:32:27.000000000 +0530
@@ -54,6 +54,15 @@ startup_32:
 1:	popl	%ebp
 	subl	$1b, %ebp
 
+/* setup a stack and make sure cpu supports long mode. */
+	movl	$user_stack_end, %eax
+	addl	%ebp, %eax
+	movl	%eax, %esp
+
+	call	verify_cpu
+	testl	%eax, %eax
+	jnz	no_longmode
+
 /* Compute the delta between where we were compiled to run at
  * and where the code will actually run at.
  */
@@ -159,13 +168,21 @@ startup_32:
 	/* Jump from 32bit compatibility mode into 64bit mode. */
 	lret
 
+no_longmode:
+	/* This isn't an x86-64 ...
From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:24 am

o Extend the bzImage protocol (same as i386) to allow bzImage loaders to
  load the protected mode kernel at non-1MB address. Now protected mode
  component is relocatable and can be loaded at non-1MB addresses.

o As of today kdump uses it to run a second kernel from a reserved memory
  area.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/boot/setup.S |   13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff -puN arch/x86_64/boot/setup.S~x86_64-extend-bzImage-protocol-for-relocatable-bzImage arch/x86_64/boot/setup.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/boot/setup.S~x86_64-extend-bzImage-protocol-for-relocatable-bzImage	2007-03-07 01:32:01.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/boot/setup.S	2007-03-07 01:32:01.000000000 +0530
@@ -80,7 +80,7 @@ start:
 # This is the setup header, and it must start at %cs:2 (old 0x9020:2)
 
 		.ascii	"HdrS"		# header signature
-		.word	0x0204		# header version number (>= 0x0105)
+		.word	0x0205		# header version number (>= 0x0105)
 					# or else old loadlin-1.5 will fail)
 realmode_swtch:	.word	0, 0		# default_switch, SETUPSEG
 start_sys_seg:	.word	SYSSEG
@@ -155,7 +155,16 @@ cmd_line_ptr:	.long 0			# (Header versio
 					# low memory 0x10000 or higher.
 
 ramdisk_max:	.long 0xffffffff
-	
+kernel_alignment:  .long 0x200000       # physical addr alignment required for
+					# protected mode relocatable kernel
+#ifdef CONFIG_RELOCATABLE
+relocatable_kernel:    .byte 1
+#else
+relocatable_kernel:    .byte 0
+#endif
+pad2:                  .byte 0
+pad3:                  .word 0
+
 trampoline:	call	start_of_setup
 		.align 16
 					# The offset at this point is 0x240
_
-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:12 am

o Use appropriate names for 64bit regsiters.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/acpi/wakeup.S |   36 ++++++++++++++++++------------------
 include/asm-x86_64/suspend.h     |   12 ++++++------
 2 files changed, 24 insertions(+), 24 deletions(-)

diff -puN arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-rename-registers-to-reflect-right-names arch/x86_64/kernel/acpi/wakeup.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/acpi/wakeup.S~x86_64-wakeup.S-rename-registers-to-reflect-right-names	2007-03-07 01:26:55.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/acpi/wakeup.S	2007-03-07 01:26:55.000000000 +0530
@@ -211,16 +211,16 @@ wakeup_long64:
 	movw	%ax, %es
 	movw	%ax, %fs
 	movw	%ax, %gs
-	movq	saved_esp, %rsp
+	movq	saved_rsp, %rsp
 
 	movw	$0x0e00 + 'x', %ds:(0xb8018)
-	movq	saved_ebx, %rbx
-	movq	saved_edi, %rdi
-	movq	saved_esi, %rsi
-	movq	saved_ebp, %rbp
+	movq	saved_rbx, %rbx
+	movq	saved_rdi, %rdi
+	movq	saved_rsi, %rsi
+	movq	saved_rbp, %rbp
 
 	movw	$0x0e00 + '!', %ds:(0xb801a)
-	movq	saved_eip, %rax
+	movq	saved_rip, %rax
 	jmp	*%rax
 
 .code32
@@ -408,13 +408,13 @@ do_suspend_lowlevel:
 	movq %r15, saved_context_r15(%rip)
 	pushfq ; popq saved_context_eflags(%rip)
 
-	movq	$.L97, saved_eip(%rip)
+	movq	$.L97, saved_rip(%rip)
 
-	movq %rsp,saved_esp
-	movq %rbp,saved_ebp
-	movq %rbx,saved_ebx
-	movq %rdi,saved_edi
-	movq %rsi,saved_esi
+	movq %rsp,saved_rsp
+	movq %rbp,saved_rbp
+	movq %rbx,saved_rbx
+	movq %rdi,saved_rdi
+	movq %rsi,saved_rsi
 
 	addq	$8, %rsp
 	movl	$3, %edi
@@ -461,12 +461,12 @@ do_suspend_lowlevel:
 	
 .data
 ALIGN
-ENTRY(saved_ebp)	.quad	0
-ENTRY(saved_esi)	.quad	0
-ENTRY(saved_edi)	.quad	0
-ENTRY(saved_ebx)	.quad	0
+ENTRY(saved_rbp)	.quad	0
+ENTRY(saved_rsi)	.quad	0
+ENTRY(saved_rdi)	.quad	0
+ENTRY(saved_rbx)	.quad	0
 ...
From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:08 am

EFER varies like %cr4 depending on the cpu capabilities, and which cpu
capabilities we want to make use of.  So save/restore it make certain
we have the same EFER value when we are done.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/suspend.c |    3 ++-
 include/asm-x86_64/suspend.h |    1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff -puN arch/x86_64/kernel/suspend.c~x86_64-Add-EFER-to-the-set-registers-saved-by-save_processor_state arch/x86_64/kernel/suspend.c
--- linux-2.6.19-rc6-reloc/arch/x86_64/kernel/suspend.c~x86_64-Add-EFER-to-the-set-registers-saved-by-save_processor_state	2006-11-17 00:08:16.000000000 -0500
+++ linux-2.6.19-rc6-reloc-root/arch/x86_64/kernel/suspend.c	2006-11-17 00:08:16.000000000 -0500
@@ -33,7 +33,6 @@ void __save_processor_state(struct saved
 	asm volatile ("str %0"  : "=m" (ctxt->tr));
 
 	/* XMM0..XMM15 should be handled by kernel_fpu_begin(). */
-	/* EFER should be constant for kernel version, no need to handle it. */
 	/*
 	 * segment registers
 	 */
@@ -50,6 +49,7 @@ void __save_processor_state(struct saved
 	/*
 	 * control registers 
 	 */
+	rdmsrl(MSR_EFER, ctxt->efer);
 	asm volatile ("movq %%cr0, %0" : "=r" (ctxt->cr0));
 	asm volatile ("movq %%cr2, %0" : "=r" (ctxt->cr2));
 	asm volatile ("movq %%cr3, %0" : "=r" (ctxt->cr3));
@@ -75,6 +75,7 @@ void __restore_processor_state(struct sa
 	/*
 	 * control registers
 	 */
+	wrmsrl(MSR_EFER, ctxt->efer);
 	asm volatile ("movq %0, %%cr8" :: "r" (ctxt->cr8));
 	asm volatile ("movq %0, %%cr4" :: "r" (ctxt->cr4));
 	asm volatile ("movq %0, %%cr3" :: "r" (ctxt->cr3));
diff -puN include/asm-x86_64/suspend.h~x86_64-Add-EFER-to-the-set-registers-saved-by-save_processor_state include/asm-x86_64/suspend.h
--- linux-2.6.19-rc6-reloc/include/asm-x86_64/suspend.h~x86_64-Add-EFER-to-the-set-registers-saved-by-save_processor_state	2006-11-17 00:08:16.000000000 -0500
+++ ...
From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:22 am

This patch modifies the x86_64 kernel so that it can be loaded and run
at any 2M aligned address, below 512G.  The technique used is to
compile the decompressor with -fPIC and modify it so the decompressor
is fully relocatable.  For the main kernel the page tables are
modified so the kernel remains at the same virtual address.  In
addition a variable phys_base is kept that holds the physical address
the kernel is loaded at.  __pa_symbol is modified to add that when
we take the address of a kernel symbol.

When loaded with a normal bootloader the decompressor will decompress
the kernel to 2M and it will run there.  This both ensures the
relocation code is always working, and makes it easier to use 2M
pages for the kernel and the cpu.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/Kconfig                     |   50 ++++
 arch/x86_64/boot/compressed/Makefile    |   12 -
 arch/x86_64/boot/compressed/head.S      |  322 +++++++++++++++++++++++---------
 arch/x86_64/boot/compressed/misc.c      |  251 +++++++++++++-----------
 arch/x86_64/boot/compressed/vmlinux.lds |   44 ++++
 arch/x86_64/boot/compressed/vmlinux.scr |    9 
 arch/x86_64/kernel/head.S               |  225 ++++++++++++----------
 arch/x86_64/kernel/suspend_asm.S        |    7 
 include/asm-x86_64/page.h               |    6 
 9 files changed, 596 insertions(+), 330 deletions(-)

diff -puN arch/x86_64/boot/compressed/head.S~x86_64-Relocatable-kernel-support arch/x86_64/boot/compressed/head.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/boot/compressed/head.S~x86_64-Relocatable-kernel-support	2007-03-07 01:31:35.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/boot/compressed/head.S	2007-03-07 01:31:35.000000000 +0530
@@ -26,116 +26,262 @@
 
 #include <linux/linkage.h>
 #include <asm/segment.h>
+#include <asm/pgtable.h>
 #include <asm/page.h>
+#include <asm/msr.h>
 
+.section ".text.head"
 	.code32
 	.globl startup_32
-	
+
 ...
From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:16 am

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/setup.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN arch/x86_64/kernel/setup.c~x86_64-Modify-discover_ebda-to-use-virtual-addresses arch/x86_64/kernel/setup.c
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/setup.c~x86_64-Modify-discover_ebda-to-use-virtual-addresses	2007-03-07 01:28:51.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/setup.c	2007-03-07 01:28:51.000000000 +0530
@@ -205,10 +205,10 @@ static void discover_ebda(void)
 	 * there is a real-mode segmented pointer pointing to the 
 	 * 4K EBDA area at 0x40E
 	 */
-	ebda_addr = *(unsigned short *)EBDA_ADDR_POINTER;
+	ebda_addr = *(unsigned short *)__va(EBDA_ADDR_POINTER);
 	ebda_addr <<= 4;
 
-	ebda_size = *(unsigned short *)(unsigned long)ebda_addr;
+	ebda_size = *(unsigned short *)__va(ebda_addr);
 
 	/* Round EBDA up to pages */
 	if (ebda_size == 0)
_
-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:20 am

o virt_to_page() call should be used on kernel linear addresses and not
  on kernel text and data addresses. Swsusp code uses it on kernel data
  (statically allocated swsusp_header).

o Allocate swsusp_header dynamically so that virt_to_page() can be used
  safely.

o I am changing this because in next few patches, __pa() on x86_64 will
  no longer support kernel text and data addresses and hibernation breaks. 

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 kernel/power/swap.c |   42 +++++++++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff -puN kernel/power/swap.c~swsusp-do-not-use-virt_to_page-on-kernel-data-addr kernel/power/swap.c
--- linux-2.6.21-rc2-reloc/kernel/power/swap.c~swsusp-do-not-use-virt_to_page-on-kernel-data-addr	2007-03-07 01:30:43.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/kernel/power/swap.c	2007-03-07 01:30:43.000000000 +0530
@@ -33,12 +33,14 @@ extern char resume_file[];
 
 #define SWSUSP_SIG	"S1SUSPEND"
 
-static struct swsusp_header {
+struct swsusp_header {
 	char reserved[PAGE_SIZE - 20 - sizeof(sector_t)];
 	sector_t image;
 	char	orig_sig[10];
 	char	sig[10];
-} __attribute__((packed, aligned(PAGE_SIZE))) swsusp_header;
+} __attribute__((packed));
+
+static struct swsusp_header *swsusp_header;
 
 /*
  * General things
@@ -141,14 +143,14 @@ static int mark_swapfiles(sector_t start
 {
 	int error;
 
-	bio_read_page(swsusp_resume_block, &swsusp_header, NULL);
-	if (!memcmp("SWAP-SPACE",swsusp_header.sig, 10) ||
-	    !memcmp("SWAPSPACE2",swsusp_header.sig, 10)) {
-		memcpy(swsusp_header.orig_sig,swsusp_header.sig, 10);
-		memcpy(swsusp_header.sig,SWSUSP_SIG, 10);
-		swsusp_header.image = start;
+	bio_read_page(swsusp_resume_block, swsusp_header, NULL);
+	if (!memcmp("SWAP-SPACE",swsusp_header->sig, 10) ||
+	    !memcmp("SWAPSPACE2",swsusp_header->sig, 10)) {
+		memcpy(swsusp_header->orig_sig,swsusp_header->sig, 10);
+		memcpy(swsusp_header->sig,SWSUSP_SIG, ...
From: Pavel Machek
Date: Wednesday, March 7, 2007 - 3:49 pm

I do not like the panic, but I guess it is okay as we are running
during boot? (Could you add a comment?) Otherwise ok.

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 10:17 pm

Hi Pavel,

Yes, it is an initcall and this memory page will be allocated during
boot time. Not very sure what comment to put there. To me it seems
pretty obivious with "core_initcall".

Thanks
Vivek
-

From: Pavel Machek
Date: Thursday, March 8, 2007 - 4:47 am

I'd put "/* running at boot time, so allocation can't fail */" there
(and maybe just replace panic with BUG_ON), but I guess that's not
important.

ACK.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
-

From: Pavel Machek
Date: Wednesday, March 7, 2007 - 3:50 pm

From: Nigel Cunningham
Date: Wednesday, March 7, 2007 - 4:15 pm

Hi.


Absolutely.

Nigel

-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 10:04 pm

Yes. I have tested this and it works fine.

Thanks
Vivek
-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:21 am

Currently __pa_symbol is for use with symbols in the kernel address
map and __pa is for use with pointers into the physical memory map.
But the code is implemented so you can usually interchange the two.

__pa which is much more common can be implemented much more cheaply
if it is it doesn't have to worry about any other kernel address
spaces.  This is especially true with a relocatable kernel as
__pa_symbol needs to peform an extra variable read to resolve
the address.

There is a third macro that is added for the vsyscall data
__pa_vsymbol for finding the physical addesses of vsyscall pages.

Most of this patch is simply sorting through the references to
__pa or __pa_symbol and using the proper one.  A little of
it is continuing to use a physical address when we have it
instead of recalculating it several times.

swapper_pgd is now NULL.  leave_mm now uses init_mm.pgd
and init_mm.pgd is initialized at boot (instead of compile time)
to the physmem virtual mapping of init_level4_pgd.  The
physical address changed.

Except for the for EMPTY_ZERO page all of the remaining references
to __pa_symbol appear to be during kernel initialization.  So this
should reduce the cost of __pa in the common case, even on a relocated
kernel.

As this is technically a semantic change we need to be on the lookout
for anything I missed.  But it works for me (tm).

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/i386/kernel/alternative.c     |    8 ++++----
 arch/i386/mm/init.c                |   15 ++++++++-------
 arch/x86_64/kernel/machine_kexec.c |   14 +++++++-------
 arch/x86_64/kernel/setup.c         |    9 +++++----
 arch/x86_64/kernel/smp.c           |    2 +-
 arch/x86_64/kernel/vsyscall.c      |    9 +++++++--
 arch/x86_64/mm/init.c              |   21 +++++++++++----------
 arch/x86_64/mm/pageattr.c          |   16 ++++++++--------
 include/asm-x86_64/page.h          |    6 ++----
 ...
From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 12:09 am

This modifies the SMP trampoline and all of the associated code so
it can jump to a 64bit kernel loaded at an arbitrary address.

The dependencies on having an idenetity mapped page in the kernel
page tables for SMP bootup have all been removed.

In addition the trampoline has been modified to verify
that long mode is supported.  Asking if long mode is implemented is
down right silly but we have traditionally had some of these checks,
and they can't hurt anything.  So when the totally ludicrous happens
we just might handle it correctly.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
---

 arch/x86_64/kernel/head.S       |    1 
 arch/x86_64/kernel/setup.c      |    9 --
 arch/x86_64/kernel/trampoline.S |  168 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 156 insertions(+), 22 deletions(-)

diff -puN arch/x86_64/kernel/head.S~x86_64-64bit-PIC-SMP-trampoline arch/x86_64/kernel/head.S
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/head.S~x86_64-64bit-PIC-SMP-trampoline	2007-03-07 01:25:32.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/head.S	2007-03-07 01:25:32.000000000 +0530
@@ -101,6 +101,7 @@ startup_32:
 	.org 0x100	
 	.globl startup_64
 startup_64:
+ENTRY(secondary_startup_64)
 	/* We come here either from startup_32
 	 * or directly from a 64bit bootloader.
 	 * Since we may have come directly from a bootloader we
diff -puN arch/x86_64/kernel/setup.c~x86_64-64bit-PIC-SMP-trampoline arch/x86_64/kernel/setup.c
--- linux-2.6.21-rc2-reloc/arch/x86_64/kernel/setup.c~x86_64-64bit-PIC-SMP-trampoline	2007-03-07 01:25:32.000000000 +0530
+++ linux-2.6.21-rc2-reloc-root/arch/x86_64/kernel/setup.c	2007-03-07 01:25:32.000000000 +0530
@@ -329,15 +329,8 @@ void __init setup_arch(char **cmdline_p)
 #endif
 
 #ifdef CONFIG_SMP
-	/*
-	 * But first pinch a few for the stack/trampoline stuff
-	 * FIXME: Don't need the extra page at 4K, but need to fix
-	 * trampoline before removing it. (see ...
From: Arjan van de Ven
Date: Wednesday, March 7, 2007 - 8:07 am

have these patches been extensively tested with various suspend
scenarios? (S1,S3,S4 in acpi speak or s2ram and s2disk in Linux speak)

-

From: Eric W. Biederman
Date: Wednesday, March 7, 2007 - 12:08 pm

It should be noted what broke was the non-portable constructs in the generic
suspend code.

In particular using __pa() outside of architecture code is not allowed.
Using virt_to_phys() on addresses not part of the kernel's linear mapping is
not generically supported.

text/data are not required to be part of the kernel's linear mapping.

This patchset now causes all code using these non-portable constructs to
fail on x86_64.  Which I think is a good thing so we can more easily
spot these kinds of problems.

Patches 15 and 16 appear to make the swpsuspend code rely on portable
constructs.   I will let Vivek reply to the amount of testing he has
done in this area.

Eric
-

From: Nigel Cunningham
Date: Wednesday, March 7, 2007 - 1:49 pm

Hi.


We did work on this for RHEL5, getting relocatable kernel support
working fine with S4. While doing it and since, I've been running
Suspend2 with the same patch.

Since that work, Vivek has done more modifications, but I can confirm
that the basic design is reliable with S4. Haven't tried S3, but can do.
Will report back shortly.

Regards,

Nigel

-

From: Nigel Cunningham
Date: Wednesday, March 7, 2007 - 4:15 pm

Hi.


S3 works okay here with a relocatable x86_64 kernel (2.6.20).

Regards,

Nigel

-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 9:40 pm

Hi Nigel,

Is it possible to test S3 with 2.6.21-rc2 kernels also. Right now I don't 
have access to any machine supporting S3. I tested it at the time of my last
posting and it had worked well. Appreciate your help.

Thanks
Vivek
-

From: Nigel Cunningham
Date: Thursday, March 8, 2007 - 1:07 am

Hi.


Tested with rc3 (rc2 wouldn't compile), and it works fine.

If you're willing, please add

Signed-off-by: Nigel Cunningham <ncunning@redhat.com>

or

Acked-by: Nigel Cunningham <ncunning@redhat.com>

to the hibernation related parts as you see appropriate, since I helped
(albeit in a minor way compared to your work and Eric's work) with
preparing and testing them for RHEL5 and have confirmed they're still ok
in this version.

Regards,

Nigel

-

From: Vivek Goyal
Date: Thursday, March 8, 2007 - 1:27 am

Sure. You have helped a lot. I think either Andi or Andrew needs to
add "Acked-by:" string while adding the hibernation related patches
(Assuming they decide do pick up the patches.)

Thanks
Vivek
-

From: Vivek Goyal
Date: Thursday, March 8, 2007 - 12:48 am

Ok. Got hold of a system which supports Standby mode (S1) and it works fine
with 2.6.21-rc2 + relocatable patchset.

Thanks
Vivek
-

From: Vivek Goyal
Date: Wednesday, March 7, 2007 - 8:36 pm

Hi Arjan,

I have tested these patches for suspend to RAM and suspend to disk and they
work fine. In the past we had few issues with suspend to disk and now
these issues have been resolved in this patchset. 

Thanks
Vivek

-

From: Andi Kleen
Date: Wednesday, March 14, 2007 - 4:10 pm

I merged them all for now. Thanks.

-Andi
-

Previous thread: none

Next thread: [PATCH] Fix race between proc_get_inode() and remove_proc_entry() by Alexey Dobriyan on Wednesday, March 7, 2007 - 2:01 am. (1 message)