Re: [PATCH 3/8] xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.

Previous thread: [PATCH 1/8] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY. by Konrad Rzeszutek Wilk on Thursday, December 30, 2010 - 12:48 pm. (3 messages)

Next thread: [PATCH 8/8] xen/mmu: Set _PAGE_IOMAP if PFN is in identity P2M. by Konrad Rzeszutek Wilk on Thursday, December 30, 2010 - 12:48 pm. (1 message)
From: Konrad Rzeszutek Wilk
Date: Thursday, December 30, 2010 - 12:48 pm

We walk the E820 region and start at 0 (for PV guests we start
at ISA_END_ADDRESS) and skip any E820 RAM regions. For all other
regions and as well the gaps we set them to be identity mappings.

The reasons we do not want to set the identity mapping from 0->
ISA_END_ADDRESS when running as PV is b/c that the kernel would
try to read DMI information and fail (no permissions to read that).

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/xen/setup.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 7201800..643b3fc 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -143,6 +143,38 @@ static unsigned long __init xen_return_unused_memory(unsigned long max_pfn,
 	return released;
 }
 
+static unsigned long __init xen_set_identity(const struct e820map *e820)
+{
+	phys_addr_t last = xen_initial_domain() ? 0 : ISA_END_ADDRESS;
+	int i;
+	unsigned long identity = 0;
+	unsigned long pfn;
+
+	for (i = 0; i < e820->nr_map; i++) {
+		phys_addr_t start = e820->map[i].addr;
+		phys_addr_t end = start + e820->map[i].size;
+
+		if (end < start)
+			continue;
+
+		/* Skip over the 1MB region. */
+		if (last > end)
+			continue;
+
+		if (e820->map[i].type == E820_RAM) {
+			/* Without saving 'last' we would gooble RAM too. */
+			last = end;
+			continue;
+		}
+
+		for (pfn = PFN_UP(last); pfn < PFN_DOWN(end); pfn++)
+			__set_phys_to_machine(pfn, pfn);
+		identity += pfn - PFN_UP(last);
+
+		last = end;
+	}
+	return identity;
+}
 /**
  * machine_specific_memory_setup - Hook for machine specific memory setup.
  **/
@@ -156,6 +188,7 @@ char * __init xen_memory_setup(void)
 	struct xen_memory_map memmap;
 	unsigned long extra_pages = 0;
 	unsigned long extra_limit;
+	unsigned long identity_pages = 0;
 	int i;
 	int op;
 
@@ -251,6 +284,12 @@ char * __init xen_memory_setup(void)
 
 	xen_add_extra_mem(extra_pages);
 ...
From: Ian Campbell
Date: Tuesday, January 4, 2011 - 10:18 am

I was trying to figure out what any of this had to do with HVM guests,
but you mean as opposed to dom0, which with my pedant hat on is also a

The reason for this special case is that in domU we have already punched
a hole from 640k-1M into the e820 which the hypervisor gave us.

Should we perhaps be doing this identity mapping before we punch that
extra hole? i.e. setup ID mappings based on the hypervisors idea of the
guest e820 not the munged one we subsequently magicked up? Only the
original e820 is going to bear any possible resemblance to the identity
pages which the guest can actually see.

Ian.


--

From: Konrad Rzeszutek Wilk
Date: Tuesday, January 4, 2011 - 11:38 am

For the privileged guest - yes. But for the non-priviligied it does not have

You mean the ISA_START_ADDRESS->ISA_END_ADDRESS we mark as reserved?

It sure would be easier (and it would mean we can return that memory
--

From: Ian Campbell
Date: Tuesday, January 4, 2011 - 12:27 pm

xen_memory_setup has: 
        e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
                        E820_RESERVED);
which is unconditional but is actually more for domU's benefit than
dom0's which already sees the host e820 presumably with the right hole
already in place, which we simply shadow, or maybe slightly extend,
here.

In a domU we do this because if you let these pages into the general
allocation pool then they will potentially get used as page table pages
(hence be R/O) but e.g. the DMI code tries to map them to probe for
signatures and tries to does so R/W which fails. We could try and find
everywhere in the kernel which does this or we can simply reserve the
region which stops it getting used for page tables or other special


I don't think you can return it, since something like the DMI code which
wants to probe it expects to be able to map that PFN, if you've given
the MFN back then that will fail.

I suppose we could alias all such PFNs to the same scratch MFN but I'd
be concerned about some piece of code which expects to interact with
firmware scribbling over it and surprising some other piece of code
which interacts with the firmware...

Ian.

--

From: Konrad Rzeszutek Wilk
Date: Tuesday, January 4, 2011 - 2:28 pm

Actually we don't do anything with that region in Dom0 case. We just
return the PFN without consulting the P2M for 0->0x100 while for DomU _we_

Yeah, went that hole once.. too many generic pieces of code.


It actually works. I setup 0x1->0x100 to point to whatever the MFN was at
0x0, and released the pages from 0x1->0x100 and it worked for DomU PV guests 
(and dom0 since I ended up stomping those regions with the PFN|
IDENTITY_BIT_FRAME).

However, the tools weren't happy ('xm save'). They did not like the same PFN 
across a couple of entries in the P2M table and complained about a potential 
race. But there is another way and that is to special case in 'xen_make_pte' 
when we want to create a PTE for 0->ISA_END_ADDRESS and just give it the MFN 
weird corner cases. Thought the code that is there is already special casing 

Fortunatly the all look for a some signature first before trying to scribble.
--

Previous thread: [PATCH 1/8] xen: Mark all initial reserved pages for the balloon as INVALID_P2M_ENTRY. by Konrad Rzeszutek Wilk on Thursday, December 30, 2010 - 12:48 pm. (3 messages)

Next thread: [PATCH 8/8] xen/mmu: Set _PAGE_IOMAP if PFN is in identity P2M. by Konrad Rzeszutek Wilk on Thursday, December 30, 2010 - 12:48 pm. (1 message)