We walk the E820 map starting at 0 (for PV guests we start at
ISA_END_ADDRESS) and skip any E820 RAM regions. All other regions,
as well as the gaps between them, are set to be identity mappings.
The reason we do not want to set the identity mapping from 0 to
ISA_END_ADDRESS when running as PV is that the kernel would
try to read DMI information and fail (no permission to read that).
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
arch/x86/xen/setup.c | 39 +++++++++++++++++++++++++++++++++++++++
1 files changed, 39 insertions(+), 0 deletions(-)
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 7201800..643b3fc 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -143,6 +143,38 @@ static unsigned long __init xen_return_unused_memory(unsigned long max_pfn,
return released;
}
+static unsigned long __init xen_set_identity(const struct e820map *e820)
+{
+ phys_addr_t last = xen_initial_domain() ? 0 : ISA_END_ADDRESS;
+ int i;
+ unsigned long identity = 0;
+ unsigned long pfn;
+
+ for (i = 0; i < e820->nr_map; i++) {
+ phys_addr_t start = e820->map[i].addr;
+ phys_addr_t end = start + e820->map[i].size;
+
+ if (end < start)
+ continue;
+
+ /* Skip over the 1MB region. */
+ if (last > end)
+ continue;
+
+ if (e820->map[i].type == E820_RAM) {
+ /* Without saving 'last' we would gobble RAM too. */
+ last = end;
+ continue;
+ }
+
+ for (pfn = PFN_UP(last); pfn < PFN_DOWN(end); pfn++)
+ __set_phys_to_machine(pfn, pfn);
+ identity += pfn - PFN_UP(last);
+
+ last = end;
+ }
+ return identity;
+}
/**
* machine_specific_memory_setup - Hook for machine specific memory setup.
**/
@@ -156,6 +188,7 @@ char * __init xen_memory_setup(void)
struct xen_memory_map memmap;
unsigned long extra_pages = 0;
unsigned long extra_limit;
+ unsigned long identity_pages = 0;
int i;
int op;
@@ -251,6 +284,12 @@ char * __init xen_memory_setup(void)
xen_add_extra_mem(extra_pages);
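To make the walk in xen_set_identity() easier to follow, here is a minimal userspace sketch of the same loop. It only counts the pages that would be identity mapped instead of calling __set_phys_to_machine(). The toy_entry struct, the TOY_* type values, count_identity() and the 4 KiB page size are assumptions made for illustration; they are not part of the patch.

```c
#include <stdint.h>

#define PAGE_SHIFT      12
#define PAGE_SIZE       (1ULL << PAGE_SHIFT)
#define PFN_UP(x)       (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x)     ((x) >> PAGE_SHIFT)
#define ISA_END_ADDRESS 0x100000ULL

/* Hypothetical stand-ins for the kernel's e820 entry and region types. */
enum { TOY_RAM = 1, TOY_RESERVED = 2 };

struct toy_entry {
	uint64_t addr;
	uint64_t size;
	int type;
};

/*
 * Count the PFNs that the walk would identity map.
 * initial_domain != 0 starts at 0 (dom0); otherwise the walk starts at
 * ISA_END_ADDRESS, as in the patch.
 */
static unsigned long count_identity(const struct toy_entry *map, int n,
				    int initial_domain)
{
	uint64_t last = initial_domain ? 0 : ISA_END_ADDRESS;
	unsigned long identity = 0;

	for (int i = 0; i < n; i++) {
		uint64_t start = map[i].addr;
		uint64_t end = start + map[i].size;

		if (end < start)	/* address wrapped around */
			continue;
		if (last > end)		/* region ends below the start point */
			continue;
		if (map[i].type == TOY_RAM) {
			last = end;	/* remember where RAM stopped */
			continue;
		}

		/* Non-RAM region plus any gap since 'last'. */
		uint64_t first = PFN_UP(last);
		uint64_t stop = PFN_DOWN(end);

		if (stop > first)
			identity += stop - first;
		last = end;
	}
	return identity;
}
```

With a toy map of RAM at 0-0x9f000, a reserved region at 0xa0000-0x100000, RAM at 0x100000-0x8000000 and one reserved page at 0xfee00000, the dom0 variant counts 0x61 more pages than the domU variant: the gap page at PFN 0x9f plus the reserved pages up to PFN 0x100.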
...I was trying to figure out what any of this had to do with HVM guests, but you mean as opposed to dom0, which with my pedant hat on is also a
The reason for this special case is that in domU we have already punched a hole from 640k-1M into the e820 which the hypervisor gave us.
Should we perhaps be doing this identity mapping before we punch that extra hole? i.e. set up ID mappings based on the hypervisor's idea of the guest e820, not the munged one we subsequently magicked up? Only the original e820 is going to bear any possible resemblance to the identity pages which the guest can actually see.
Ian.
--
For the privileged guest - yes. But for the non-privileged it does not have
You mean the ISA_START_ADDRESS->ISA_END_ADDRESS we mark as reserved?
It sure would be easier (and it would mean we can return that memory
--
xen_memory_setup has:
e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
E820_RESERVED);
which is unconditional but is actually more for domU's benefit than
dom0's which already sees the host e820 presumably with the right hole
already in place, which we simply shadow, or maybe slightly extend,
here.
In a domU we do this because if you let these pages into the general
allocation pool then they will potentially get used as page table pages
(hence be R/O) but e.g. the DMI code tries to map them to probe for
signatures and tries to do so R/W, which fails. We could try and find
everywhere in the kernel which does this or we can simply reserve the
region which stops it getting used for page tables or other special
I don't think you can return it, since something like the DMI code which
wants to probe it expects to be able to map that PFN, if you've given
the MFN back then that will fail.
I suppose we could alias all such PFNs to the same scratch MFN but I'd
be concerned about some piece of code which expects to interact with
firmware scribbling over it and surprising some other piece of code
which interacts with the firmware...
Ian.
--
Actually we don't do anything with that region in Dom0 case. We just
return the PFN without consulting the P2M for 0->0x100 while for DomU _we_
Yeah, went down that hole once.. too many generic pieces of code.
It actually works. I set up 0x1->0x100 to point to whatever the MFN was at
0x0, and released the pages from 0x1->0x100 and it worked for DomU PV guests
(and dom0 since I ended up stomping those regions with the
PFN|IDENTITY_BIT_FRAME).
However, the tools weren't happy ('xm save'). They did not like the same PFN
across a couple of entries in the P2M table and complained about a potential
race. But there is another way and that is to special case in 'xen_make_pte'
when we want to create a PTE for 0->ISA_END_ADDRESS and just give it the MFN
weird corner cases. Though the code that is there is already special casing
Fortunately they all look for some signature first before trying to scribble.
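For readers following along, the dom0 behaviour mentioned earlier in this thread (returning the PFN for 0->0x100 without consulting the P2M) can be pictured with a small userspace sketch. Everything named toy_* is hypothetical, including the fake P2M lookup, and this is not the code in xen_make_pte(); it only illustrates the shape of a range-based special case.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT        12
#define ISA_END_ADDRESS   0x100000ULL
/* First PFN above the low 1MB: 0x100 with 4 KiB pages. */
#define TOY_LOW_PFN_LIMIT (ISA_END_ADDRESS >> PAGE_SHIFT)

/*
 * Hypothetical stand-in for a real P2M lookup. In this toy every PFN is
 * backed by the MFN 0x1000 frames higher.
 */
static uint64_t toy_p2m_lookup(uint64_t pfn)
{
	return pfn + 0x1000;
}

/*
 * Translate a PFN to an MFN. For the initial domain, PFNs below the
 * ISA_END_ADDRESS boundary are returned unchanged (identity) and the
 * P2M is never consulted. Everything else goes through the lookup.
 */
static uint64_t toy_pfn_to_mfn(uint64_t pfn, bool initial_domain)
{
	if (initial_domain && pfn < TOY_LOW_PFN_LIMIT)
		return pfn;
	return toy_p2m_lookup(pfn);
}
```

In this sketch toy_pfn_to_mfn(0x10, true) yields 0x10, while the same PFN for a non-initial domain goes through the lookup and yields 0x1010.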
--
