This path adds validation of the MMCONFIG table against the ACPI reserved motherboard resources. If the MMCONFIG table is found to be reserved in ACPI, we don't bother checking the E820 table. The PCI Express firmware spec apparently tells BIOS developers that reservation in ACPI is required and E820 reservation is optional, so checking against ACPI first makes sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though it is perfectly functional, the existing check needlessly disables MMCONFIG in these cases. In order to do this, MMCONFIG setup has been split into two phases. If PCI configuration type 1 is not available (like on EFI Macs) then MMCONFIG is enabled early as before. Otherwise, it is enabled later after the ACPI interpreter is enabled, since we need to be able to execute control methods in order to check the ACPI reserved resources. Presently this is just triggered off the end of ACPI interpreter initialization. There are a few other behavioral changes here: -Validate all MMCONFIG configurations provided, not just the first one. -Validate the entire required length of each configuration according to the provided ending bus number is reserved, not just the minimum required allocation. -Validate that the area is reserved even if we read it from the chipset directly and not from the MCFG table. This catches the case where the BIOS didn't set the location properly in the chipset and has mapped it over other things it shouldn't have. This might be overly pessimistic - we might be able to instead verify that no other reserved resources (like chipset registers) are inside this memory range. Some testing is needed to see if this rejects MMCONFIG on all systems where it is problematic. There were some patches floating around to read the table location out of the chipset for Intel 915 and 965, I think the author found the latter to be problematic since the chipset had the table mapped over top of motherboard resources. The extra checking here may catch ...
if (( --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -
I have a fundamental problem with that: you don't validate a higher reliability information against a lower one. The chipset registers are high reliability. Modulo unknown hardware erratas and bugs in the code (and accepting f0000000 is in practice a bug in the code, the docs are starting to catch up with it too), the chipset *will* decode mmconfig at the looked up address no matter what. On the other side, the ACPI data is bios generated, and that is well known to be horribly unreliable. Hell, if it was reliable we could just use the MFCG ACPI table without questions. So you can check the ACPI stuff for coherency (MFCG vs. the rest), you can validate the ACPI stuff against the results of the lookup if you want, but validating the lookup against ACPI is nonsensical. OG. -
The problem is that in the event the MMCONFIG table is assigned to an address range that conflicts with other devices, there's no guarantee that MMCONFIG will have the higher decode priority. Apparently this is exactly the case on some boards, the MMCONFIG is mapped on top of chipset registers, and when we think we're accessing the MMCONFIG table we're really scrambling random chipset registers and hosing things. So we can't just blindly trust the values as being usable even when they come directly from the chipset. As I mentioned, though, really what we want to verify in this case is that no other reserved resources fall inside the range being decoded by the chipset. I may see if I can code something to do this. Essentially, though, the existing patch effectively does this, on the assumption that the board won't have conflicting reserved resources in the ACPI tables, which is probably a safe assumption as Windows would likely be terribly unhappy with that.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ -
We need to look at the register, but we may not want to use it if it looks too confused. If it doesn't agree with what we see in ACPI, we likely have a problem due to the issues Robert outlined in his other mail. Jesse -
Now that I've read his patch closely I think you're right. Robert, it looks like you'll trust acpi_table_parse if pci_mmcfg_check_hostbridge returns a failure. I think it should be treated with a higher priority. If pci_mmcfg_check_hostbridge returns a failure, there's no way MCFG space can work, so we should disable it unconditionally in that case (even if ACPI says "trust me, when have I ever lied to you?"). I'm testing it now on my 965... Thanks, Jesse -
Bah... nevermind Robert, I see you're doing this already in pci_mmcfg_reject_broken. I'm about to reboot & test now. Jesse -
Ok, I've tested a bit on my 965 (after re-adding my old patch to support it) and the new checks are more complete, but my BIOS still appears to be buggy. The extended config space (as defined by the register) is at 0xf0000000 (full value is 0xf0000003 indicating 128M enabled). The ACPI MCFG table has this space reserved according to Robert's new code, but the machine hangs due to the address space aliasing Olivier mentioned awhile back. I don't have a PCIe card to test with (or any devices that require extended config space that I know of) so I can't really tell if Windows supports PCIe on this platform, but if it does I don't see how it would w/o having a full bridge driver and sophisticated address space allocation builtin. I'm going to try updating my BIOS, but if that doesn't solve this problem, I'm not sure what we can do about it. Should pci_mmcfg_insert_resources check for conflicts? Should we just blacklist certain boards? I can try pinging our BIOS folks about this board to see what was intended, but I'm sure this won't be the only board we have problems with, so we'll need to address it generically somehow. Thanks, Jesse -
Windows XP doesn't use MMCONFIG or any extended configuration space. I believe Vista is supposed to, though. Not sure how they are handling Can you post what your board has for PNPACPI reserved resources (I believe they're in /sys/devices/pnp0/*/resources IIRC, don't have a Linux box handy right now). Full dmesg would also be useful, I think it dumps out those reservations at boot nowadays.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ -
BIOS update didn't help. Here's the boot log and a dump of the pnp0 resources. Jesse
Curious.. It looks like the ACPI resources have the correct reservation for the MMCONFIG window according to what the register says the location should be. There's no other reservations that overlap with that range (f000000-f7ffffff), and according to the 965 datasheet there's nothing that's hard-coded to occupy that memory range. I can't really see what this range could be conflicting with. What happens if you take out the chipset register detection, does the MCFG table give you the same result? Wonder if they're doing something funny with start/end bus values or something in their table. There's some code in my patch that prints out the important data from the MCFG table, can you tell me what that shows with the chipset detection taken out? If that doesn't provide any useful information, I think we may need some assistance from Intel chipset/motherboard people to figure out what is going on here.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ -
Yeah, it's strange. Even /proc/iomem from a working boot looks ok: d0700000-d07fffff : PCI Bus #04 d0800000-d08fffff : PCI Bus #05 f0000000-f7ffffff : pnp 00:01 fec00000-fec00fff : IOAPIC 0 Yeah, I'll look a little more closely. It could also be that another register needs tweaking somewhere to actually get the bridge to decode I'm talking with them now, hopefully they'll shed some light on it. Thanks, Jesse -
I did a little more debugging this morning, and found that I can actually do reads from the space described by ACPI and the device register, but later when ACPI actually scans the root bridges, it hangs. Specifically the call to pci_acpi_scan_root in pci_root.c:acpi_pci_root_add() never seems to return. I'll walk through that logic when I get back to my test box, but it's also worth noting that Vista's MCFG on this machine apparently works ok too. Jesse -
I would try sticking some debug in arch/x86_64/pci/mmconfig.c at the beginning and end of pci_mmcfg_read and pci_mmcfg_write to print the seg, bus, devfn and reg for each read and write. Hopefully that will track down the one that is causing the lockup, if it is an actual MMCONFIG access that's doing it.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ -
I can't see how any MCFG based accesses could work on this box, but I
don't know why. According to the boot log (with our code patched in
but disabled after checking the ACPI reserved status), the space is fine:
...
ACPI: (supports S0 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing
pciexbar lo: 0xf0000003
pciexbar hi: 0x00000000
Enabled MCFG space at 0x00000000f0000000, size 134217728
PCI: Found Intel Corporation G965 Express Memory Controller Hub with MMCONFIG support.
PCI: MCFG configuration 0: base 00000000f0000000 segment 0 buses 0 - 127
PCI: MCFG area at f0000000 reserved in ACPI motherboard resources
PCI: Not using MMCONFIG. <-- due to the 'goto reject' after
if (is_acpi_reserved) { ... }
PM: Adding info for acpi:acpi_system:00
PM: Adding info for acpi:button_power:00
...
Same thing happens if I disable the chipset specific code and just use
the ACPI stuff you added.
If I leave it enabled, several config cycles work fine, but the box
eventually hangs after probing 24 devices or so. I don't see anything
else mapped into this space, and the MTRRs seem ok, so either there's
something hidden in this memory range or there's another chipset register
that needs poking to fully enable this space properly.
Sysrq doesn't seem to work, and I don't see any events in my machine log,
so figuring out exactly why it's hanging is a bit difficult.
Any ideas on what to try next? I'll see if I can get some more details
from our BIOS folks and do yet another pass over the documentation to see
if there's something I'm missing.
Thanks,
Jesse
-
Can you find out which config access (bus, device, function, address) is the one that hangs the box? I assume that either the corresponding address in the MCFG table is problematic (i.e. has something else mapped over it), or maybe that device just doesn't like being probed with MCFG somehow. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ -
Yeah, I've got that data... just a sec while I make sure it's reproducable... Aha, I hadn't decoded the devfn before, looks like it's dying on an access to the graphics device (bus 0, slot 2, device 0): ... pci_mmcfg_read: 0, 0, 0x10, 0x18, 4 = 0xc000000c pci_mmcfg_read: 0, 0, 0x10, 0x18, 4 = <hang> ... Offset 0x18 into the graphics config space should be the graphics memory range address, and 0xc000000c is the correct value. But for some reason it hangs on the second access. It hangs here everytime. Jesse -
That register is in the config space BAR region, so it should be ok to write 0xffffffff to it and read it back to size the register. However, it's after writing the 0xffffffff to it and trying to read it back that the machine hangs. I didn't see any accesses to the command register to disable decoding (at least not via the mmconfig methods), so maybe that's broken during MCFG based probing? Jesse -
Eww. I don't see where we disable the decode at all while we probe the BARs on the device. That seems like a bad thing, especially with the way we probe 64-bit BARs (do the low 32 bits first and then the high 32 bits). This means the base address effectively gets set to 0xfffffff0 momentarily, which might cause some issues. I'd try adding some code inside pci_setup_device (drivers/pci/probe.c) to disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY on the device when probing devices with the standard header type and then restoring the previous command bits afterwards, and see what effect that has. It'll be interesting if it does, since obviously it seems to work as it is with non-MMCONFIG access methods. Maybe the base address being set like that interferes with MMCONFIG access itself somehow? -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ -
I'm a bit shocked that things work as well as they do without the I tried that, and it seems to get past probing the graphics device at least, but it hangs a bit later. It could be that the enable/disable I added wasn't correct though, I didn't check to see which one I should disable in the command word, which may be a problem (just disabled them both every probe). I'll try again with more precise enable/disable semantics. Jesse -
It'd be interesting to see at what access it ran into trouble next, at least if it's consistent. Could be that some device doesn't like having the decode disabled.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ -
I think it actually gets through the probing but hangs elsewhere, but I'll have to test again to be sure. Jesse -
There was a big discussion about this back in 2002, in which Linus wasn't overly enthused about disabling the decode during probing due to risk of causing problems with some devices: http://lkml.org/lkml/2002/12/19/145 In this particular case (64-bit BAR) we might be able to avoid the problem by changing the order in which we probe the two halves of the address, i.e. change the top half to 0xffffffff before messing with the bottom half and then change it back last. That way, we end up mapping it way to the top of 64-bit address space, which hopefully is less likely to conflict.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ -
Fixed it (finally). I don't think moving the 64 bit probing around would make a difference, since we'd restore its original value anyway before moving on to the 32 bit probe which is where I think the problem is. I think what's happening is the probe is writing 0xffffffff to the video device, which is in the GMCH, and without memory decoding disabled, it'll start claiming PCI config access cycles causing the problems I saw. So my code to disable I/O and memory decode was actually working but I had a bug in the re-enable path so all my devices were staying disabled. :) So here's the patch I used (along with your ACPI patch and my 965 MCFG enable patch of course). The probing code could probably use a bit more cleanup, but this patch limits itself to implementing PCI_COMMAND based I/O and memory space decode disabling during size probing. We might want to do this unconditionally if we're using mmconfig based configuration access, since I imagine other machines might end up having similar address space layouts that would cause problems. Linus, since you were the one concerned about breaking working setups, what do you think? Should we use this approach, or specifically quirk out cases where mmconfig space might conflict with BAR probing? Thanks, Jesse diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index e48fcf0..69dfe0c 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -170,6 +170,48 @@ static inline int is_64bit_memory(u32 mask) return 0; } +#define BAR_IS_MEMORY(bar) (((bar) & PCI_BASE_ADDRESS_SPACE) == \ + PCI_BASE_ADDRESS_SPACE_MEMORY) + +/** + * pci_bar_size - get raw PCI BAR size + * @dev: PCI device + * @reg: BAR to probe + * + * Use basic PCI probing: + * - save original BAR value + * - disable MEM or IO decode as appropriate in PCI_COMMAND reg + * - write all 1s to the BAR + * - read back value + * - reenble MEM or IO decode as necessary + * - write original value back + * + * ...
Well, the thing is, I'm pretty sure there is at least one northbridge that stops memory accesses from the CPU when you turn off the MEM bit on it. Oops, you just killed the machine. Looking at the 925X datasheet (which I happened to have around in my google search history because of the discussions of the sky2 DMA problems), it looks like at least that one just hardcodes the MEM bit to be 1, and thus writing to it is a total no-op. But I really think that clearing the MEM bit for at least the host bridge is conceptually quite wrong, even if it might turn out that all chipsets end up just saying (like Intel) "screw it, the user is insane, we're not going to actually do what he asks us to do". Do we really want to be that insane? Turn off memory accesses when probing the CPU host bridge? So at a _minimum_ I would say that that thing needs to be more careful So see above. I think at a minimum, we should consider the host bridge special. I also suspect that we'd be simply better off if we didn't use mmconfig at all unless we _have_ to. Why use mmconfig for the standard BAR accesses? Is there really any reason? I can understand using it for extended config space, since then the old-fashioned approach won't work. But for normal accesses? What's the point, really? mmconfig seems to be fundamentally designed to be impossible to bootstrap off, so there's no way you can have a machine that _only_ supports mmconfig. So why do people seem to think it's so wonderful? Please fill me in on this fundamental mystery. Quite frankly, if we just didn't use mmconfig, the whole issue would go away. Isn't _that_ the much better solution? Linus -
CS5520. But it doesn't have 64bit or PCI Express. Alan -
That patch does it for _all_ PCI probing. So it would turn any machine using that northbridge into a brick. Linus -
I'm not sure either, but the PCI spec is pretty clear about how probing ought to be done, and it seems that other OSes do the disabling (though I'm not sure about how they handle broken host bridges like the one you Yeah, it's mainly needed for extended config space and PCIe (lots of regular PCIe features are in the extended space and are assumed to be Not for systems with PCIe... and the platforms I've been having trouble with have PCIe slots, so I'd really like mmconfig to be used at least on machines with PCIe bridges. For other machines, it probably doesn't matter much. I don't know of any regular PCI devices offhand that really need extended config space. Jesse -
Umm. Why? Think about it. You ASKED it to stop forwarding memory. So who is lamer: the chip that does what it is told, or the software that tells it to do it? I'd vote for the software. Any programmer who expects the hardware to "just do what I mean, not what I say" is not a programmer, but a dreamer. Ehh. Even for PCIe, why not use the normal accesses for the first 256 bytes? Problem solved. Linus -
Well, because that's not actually very useful functionality, and likely Yeah, that's another option. Would just mean an additional conditional in the mmconfig code, I'll give it a try... Apparently Vista will move away from using type 1 config space accesses though, so if we keep using it, we'll probably run into some lame board that assumes you're using mmconfig at some point in the near future. But then again, we're often on that less tested path (e.g. with ACPI), so maybe that doesn't matter much. Jesse -
One of the reasons why hardware vendors want to move away from traditional accesses is to be able to use the larger config space in PCI-Express, rather than being locked into the 256-byte legacy PCI config space. Several modern PCI-Express devices utilize the upper config space, but due to legacy reasons the registers are usually ones that do not require OS drivers to know about (like BIST stuff or diagnostic registers). Expect that to change, as MS shakes out the bugs (or maybe we are doing their job for them?). Jeff -
Mostly for treacherous computing extensions where subsets of the config space can only be accessed by signed machines blessed by your favourite The longer it takes - the better. Alan -
Um, no, Mr. Paranoia, it's a standard part of the spec. Jeff -
I hate "trusted" platform garbage as much as the next guy (where "trusted" means the actual user can't trust it, just the seller), but I think there are legitimate uses of extended space as well, PCIe AER uses it iirc, so don't dismiss it on those grounds. :) Jesse -
Indeed. It's just a register space. Assuming one register space is "more evil" than another, simply because it is bigger, is.. well.. silly. Jeff -
On Wed, 23 May 2007 17:09:37 -0400 On some PCI-Express boards, if you don't clear the advanced error reporting registers on boot up, they will cause IRQ storm. The AER registers are above 256 boundary. In fact, the AER support in Linux should depend on MMCONFIG. -- Stephen Hemminger <shemminger@linux-foundation.org> -
I agree that a chip that doesn't do it isn't broken either, but the fact is, there is never any reason to disable MEM/IO on a host bridge. Doing so is senseless - it can never be a valid operation. So I duspute the "obviously correct" part. It's _not_ obviously correct at all. To get back to the MMIO example: even if you were to never shut off RAM, if you turn off just PCI MMIO on the northbridge, what is a mmconfig cycle supposed to do? It's not going to _work_ if you disable MEM accesses. So again, the only sane situation is: don't do it then! You claim that hardware shouldn't do it, but I don't think software is in any different situation at all! If it's insane to do, then software shouldn't do it. It's just insane to turn off the MEM bit. There's simply no valid reason to. And any PCI spec that says you should is *broken*, or written by somebody who really only meant to talk about normal PCI devices, not How are those boards going to set up mmconfig? The whole standard is broken, since there is no way to set it up. Trust the firmware? What a piece of crap! Linus -
Well theoretically for just sizing BARs, turning off the MEM bit should be fine, since your next accesses should only be to config space until the MEM bit is reenabled. But if RAM accesses really are disabled, then you'd better be sure all the code you need is already in cache, or you'll get into trouble. So yeah, I guess special handling for host bridges is needed, but that What do you mean? You set it up the normal way, by poking at config space to see what's there, then size the BARs (disabling mem and I/O accesses in PCI_COMMAND shouldn't affect config space cycles afaik). You just have to be careful to disable decoding for I/O and memory regions, especially if your mmconfig space overlaps with what the devices end up with in their BARs. Which is why my initial patch works ok (because fortunately the Intel host bridges hard code the mem decode bit to 1 too). Jesse -
HOW DO YOU GET TO THE CONFIG SPACE IN THE FIRST PLACE? The reason mmconfig is *BROKEN*CRAP* is that you cannot bootstrap it. There's no standard way to even figure out WHERE IT IS! So we depend on firmware tables that are known to be broken! That crap should be seen for the crap it is! Dammit, how hard can it be to just admit that mmconfig isn't that great? Linus -
Ah, yeah, that's platform specific, I thought you were confused about how the sizing worked. On x86, we either have to look at ACPI tables (yay) or use type 1 config accesses to get at the mmconfig base register (which is what the patches Olivier and I posted do). On ia64 there are firmware calls to do config space accesses. Not sure about other platforms. I'm not claiming mmconfig is great and we should make everything use it, but we do need it these days, so we should figure out a good way of getting at it. Jesse -
From: Linus Torvalds <torvalds@linux-foundation.org> I knew mmconfig was broken conceptually the first time I started seeing write posting "bug fixes" for it that would do a read back from PCI config space via mmconfig to post the write, which of course has potential side-effects on the device and is absolutely illegal if the write just performed put the device into a PM state or whatever. Truth is stranger than fiction at times. MMCONFIG is very much an ill-conceived idea. -
I've actually seen that specific form of posted write flushing cause crashes on some machines, so yes, it sucks. Unfortunately, I don't think we have any other way of getting at extended config space on x86, unless EFI provides methods or something, but I'm not sure that would be an improvement... Jesse -
That "fix" shouldn't be needed at all, the MMCONFIG memory range shouldn't be covered by PCI ordering rules, so there should be no such thing as write posting. I suspect that the author of such patch(es) was doing so out of some misguided sense that it was needed. (And if there is some chipset where it is actually needed, better just disable MMCONFIG on that one, as there's no way to use it sanely.) -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ -
PCI allows write posting, and on systems using mmconfig the posting generally (unfortunately) extends to that space as well. So drivers need to deal with it somehow. But many get it wrong. Jesse -
Ok, this patch also works. We still need to enable mmconfig space for
PCIe and extended config space, but we can continue to use type 1
accesses for legacy PCI config space cycles to avoid decode trouble
with mmconfig based BAR sizing.
Assuming Robert's and my patches to enable mmconfig space go in, we'd
want a similar patch to the i386 mmconfig code.
Jesse
diff --git a/arch/x86_64/pci/mmconfig.c b/arch/x86_64/pci/mmconfig.c
index 65d8273..5052f80 100644
--- a/arch/x86_64/pci/mmconfig.c
+++ b/arch/x86_64/pci/mmconfig.c
@@ -61,7 +61,7 @@ static int pci_mmcfg_read(unsigned int seg, unsigned
int bus,
}
addr = pci_dev_base(seg, bus, devfn);
- if (!addr)
+ if (!addr || reg < 256) /* Use type 1 for non-extended access */
return pci_conf1_read(seg,bus,devfn,reg,len,value);
switch (len) {
@@ -89,7 +89,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned
int bus,
return -EINVAL;
addr = pci_dev_base(seg, bus, devfn);
- if (!addr)
+ if (!addr || reg < 256) /* Use type 1 for non-extended access */
return pci_conf1_write(seg,bus,devfn,reg,len,value);
switch (len) {
-
Isn't that a mac-intel instant killer? AFAIK they don't have type1, period. OG. -
Yuck. I'll have to add a check for type 1 then... but that also means Macs will probably want the decode disable stuff I posted earlier. Jesse -
mac-intel are totally standard Intel chipsets. They have all of conf1/conf2/mmconfig afaik. I just happily booted my mac-mini with "pci=nommconf", nothing bad happened, and the kernel says PCI: Using configuration type 1 and I don't think you even _can_ disable conf1 type accesses: they are deep in the Intel chipsets. Of course, in a virtualized environment, anything can happen. Virtual machines prefer mmconf, because you can use page-level remapping to hide devices or make pseudo-devices show up by mapping in pages that have nothing to do with the true hardware. So no, I don't think Alan was totally smoking crack when he talked about "trusted" computing. Read the above paragraph a few times. (You can do it with trapping IO port accesses too, but it's going to cost you a lot, so if you want to make a fast but untrustoworthy setup, MMIO is the better option). Linus -
After I sent my last message I realized the same thing... though I occasionally hear people talk about removing it (I seriously doubt that will ever happen). I don't even think there's a way to disable type 1 config access on Intel chipsets... So the last patch is ok then, as long as we can find mmconfig space in the first place, but that's a separate problem for another set of patches (ones that seem to be working fairly well now btw). Jesse -
Considering that the chipsets still have support for features that *really* aren't used (and haven't been used in over a decade), I doubt the conf1 thing is going away any time soon. Things like: A20 gate, 15-16MB holes, i387 FP exception on irq 13 are totally pointless in this day and age. Things like the DMA controller are getting there, along with PS/2 keyboard support. So there's a lot of things that are likely to be removed before conf1 accesses would. Removing CONF1 accesses would break every single current OS, they'll do that ten years from now at the earliest. Linus -
So what do you think? You ok with enabling mmconfig if it's available as long as we use type 1 accesses for non-extended stuff? If so, I think the patches are pretty much ready... Thanks, Jesse -
Sure. I think mmconfig is perfectly sane if it falls back to conf1 accesses for legacy stuff.. And I also actually think that your patch to disable MMIO/PIO when testing the BAR size is fine - I just think that it should likely only be done for non-bridge devices (or at least non-host-bridge). Linus -
.. but without a regression, it's obviously a post-2.6.22 thing, I guess I should make that clear, just because I think people send me patches after -rc1 way too eagerly just because they think it fixes a bug. Basically if it's not somethign that has _ever_ worked some way, it's not a bug, it's a feature ;) Linus -
No, I know better than to send something after your merge window closes. I have no desire to be flamed even further on this topic. :) And come to think of it, adding the enable/disable bits might be good even with the patch to make legacy accesses go through type 1, since PCIe BAR probing is probably done the same way (I haven't looked) and so we might run into the same problems there. Thanks, Jesse -
I think that disabling decode on non-host-bridge devices during the BAR sizing is something we should at least try, indeed. The issue I have with forcing legacy config space accesses to type1 is that it would make it much less obvious if the MMCONFIG access wasn't working properly. You'd likely be able to boot up but then wonder why something that does extended config space accesses didn't work or hung the box. As I mentioned before, either we trust the MMCONFIG or we don't, and if we decide that we don't on a particular box, we should really be shutting it off entirely. Hopefully with the ACPI reservation checking patch and the disable-decode-during-BAR-sizing patch we wouldn't need to add that restriction. But yes, post-2.6.22 for all of this :-) -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ -
The latest Intel chipset I have (ICH9) is legacy free: no serial port and no PS/2 ports. I had to disable the Linux PS2 input drivers completely, just to get the thing to boot. Whee, "progress". :) Jeff, digging out that USB debug cable cuz there's no serial -
Ahh, that would be a bug. Can you help trying to debug where it locks up? I'm also surprised, since on the mac mini I have, I already have: i8042.c: No controller found. and it all works beautifully. Of course, it only did that after the horrible crud that is "grub" got fixed, because the bootloader used to wait forever, but I thought the kernel itself was able to handle a missing PS/2 controller fine. Linus -
Which is retarded, since the command bits are only supposed to be for memory ranges that are part of the BARs, it's not supposed to completely kill the device function. Unless somehow the memory on that system is accessed through the PCI bus or something. Anyway, it's something we I think we should likely avoid disabling the command bits on host bridges (maybe any bridge) due to this risk of disabling something that will break things. Ideally we can get around this without doing any Why not? Either you trust that the MMCONFIG is working or you don't. If you trust it, you might as well use it for everything, and if you don't, you can't risk using it for anything. If there are problems that show up only with MMCONFIG, doing what you propose would simply cover them up Sure you can bootstrap off it, you just need to have some way to know I don't think that is going to be viable in the long run now that Windows Vista is out and MS is actually encouraging HW developers to allow using that config space.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ -
You couldn't just reorder the code the way it is now, you'd have to rearrange the way we do things for 64-bit BARs: -write FFFFFFFF to high part of 64-bit address (we end up moving the BAR to 0xFFFFFFFFC0000000 for example) -If any bits stick, we know what the size is now (more than 4GB of decode), so just change it back, we're done -If not, we need to check the low part, so write FFFFFFFF to low part of 64-bit address (BAR moves to 0xFFFFFFFFFFFFFFFF) -Check which bits stick and calculate the address -Change the low part of the address back (BAR moves to 0xFFFFFFFFC000000) -Change the high part of the address back (BAR moves to the original 0xC0000000 address) This means that at no point do we map the BAR anywhere near the top of 32-bit memory, so we should avoid this issue in this particular case. I don't think this strategy is too likely to break anything, surely less likely than disabling command bits. Jesse, you might want to try hacking up something like this and see what happens. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ -
Ah yeah, that would probably work in this particular case, but doesn't seem very general. I think just using type 1 accesses for non-extended config space is a bit more solid. Jesse -
Acked-by: Jesse Barnes <jesse.barnes@intel.com>
As long as we get a fix for the mmconfig based probing issues in the
other thread, I think this patch should go in. Robert, maybe you could
submit it along with this one (and an i386 equivalent)? Type 1 config
access is already a fallback for mmconfig for x86_64 at least, so it
should be safe for non-extended access too, and it avoids problems with
our lack of decode disable in the generic PCI probing code.
Signed-off-by: Jesse Barnes <jesse.barnes@intel.com>
diff --git a/arch/x86_64/pci/mmconfig.c b/arch/x86_64/pci/mmconfig.c
index 65d8273..5052f80 100644
--- a/arch/x86_64/pci/mmconfig.c
+++ b/arch/x86_64/pci/mmconfig.c
@@ -61,7 +61,7 @@ static int pci_mmcfg_read(unsigned int seg, unsigned
int bus,
}
addr = pci_dev_base(seg, bus, devfn);
- if (!addr)
+ if (!addr || reg < 256) /* Use type 1 for non-extended access */
return pci_conf1_read(seg,bus,devfn,reg,len,value);
switch (len) {
@@ -89,7 +89,7 @@ static int pci_mmcfg_write(unsigned int seg, unsigned
int bus,
return -EINVAL;
addr = pci_dev_base(seg, bus, devfn);
- if (!addr)
+ if (!addr || reg < 256) /* Use type 1 for non-extended access */
return pci_conf1_write(seg,bus,devfn,reg,len,value);
switch (len) {
-
