Re: kernel 2.6.37-rc2 breaks i915 graphics

From: Chris Vine
Date: Sunday, November 21, 2010 - 3:23 am

Hi,

With kernel 2.6.37-rc2, i915 graphics usually fails on boot-up after
modesetting with my Lenovo S12 netbook which uses the Intel 945GME
Express Integrated Graphics Controller.  It displays up to the point at
which modesetting takes place and then usually goes blank.

There may be some kind of race at work here: first, sometimes (maybe 1
time in 4) graphics comes up correctly on a first cold boot, but I
have never managed to get it to come up on a warm reboot. Secondly,
graphics can be restored when I know (but cannot see) that boot-up has
concluded, simply by suspending the laptop and then resuming.  Resuming
the laptop after a suspend always brings up the graphics correctly.

I have not tested against 2.6.37-rc1, so I cannot say whether the
problem is there as well.

lspci -v gives this:

00:00.0 Host bridge: Intel Corporation Mobile 945GME Express Memory
Controller Hub (rev 03)
	Subsystem: Lenovo Device 386f
	Flags: bus master, fast devsel, latency 0
	Capabilities: [e0] Vendor Specific Information <?>
	Kernel driver in use: agpgart-intel
	Kernel modules: intel-agp

00:02.0 VGA compatible controller: Intel Corporation Mobile 945GME
Express Integrated Graphics Controller (rev 03) (prog-if 00 [VGA
controller])
	Subsystem: Lenovo Device 3870
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at fc000000 (32-bit, non-prefetchable) [size=512K]
	I/O ports at 1800 [size=8]
	Memory at d0000000 (32-bit, prefetchable) [size=256M]
	Memory at fc100000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
	Capabilities: [d0] Power Management version 2
	Kernel driver in use: i915
	Kernel modules: i915

00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME,
943/940GML Express Integrated Graphics Controller (rev 03)
	Subsystem: Lenovo Device 3870
	Flags: bus master, fast devsel, latency 0
	Memory at fc080000 (32-bit, non-prefetchable) [size=512K]
	Capabilities: [d0] ...
From: Chris Wilson
Date: Sunday, November 21, 2010 - 3:30 am

Add drm.debug=0xe to your boot command line and check whether there is
any difference between a successful cold boot, a broken cold boot and a
warm boot. Similarly, comparing the output of intel_reg_dumper after each
should yield a few clues as to what stage in the boot process we fail.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
--
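
One way to make the suggested dmesg comparison easier is sketched below. It assumes the kernel prints printk timestamps in the usual "[   12.345678] " form, which would otherwise make every line differ between two boots. The function name and the output file names are placeholders, not something from the thread.

```shell
# Strip the leading "[   12.345678] " printk timestamp from each line on
# stdin, so that logs from two boots can be diffed line by line.
# Lines without such a prefix are passed through unchanged.
strip_timestamps() {
    sed 's/^\[ *[0-9]*\.[0-9]*] //'
}

# Usage (boot with drm.debug=0xe first; file names are placeholders):
#   dmesg | strip_timestamps > boot-good.txt   # after a working boot
#   dmesg | strip_timestamps > boot-bad.txt    # after a blank-screen boot
#   diff -u boot-good.txt boot-bad.txt | less
```

Some differences in ordering are expected between any two boots, so the diff is only a starting point for spotting messages that appear in one case and not the other.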

From: Chris Vine
Date: Sunday, November 21, 2010 - 11:34 am

On Sun, 21 Nov 2010 10:30:43 +0000

First an additional data point: 2.6.37-rc1 works normally, so the bug
is something introduced between 2.6.37-rc1 and 2.6.37-rc2 (to that
extent it may be related to this bug which reports a similar phenomenon:
http://lkml.org/lkml/2010/11/21/23 ).

Attached is the dmesg output from a successful cold boot with
2.6.37-rc2, an unsuccessful cold boot and an unsuccessful warm boot,
each with drm.debug=0xe.

I can't provide the output of intel_reg_dumper:  I can't compile the
latest intel-gpu-tools from git (missing declaration/definition of
I915_EXE_BLT).  Probably something in a relevant library is too old.

Chris
From: Boaz Harrosh
Date: Monday, November 22, 2010 - 1:36 am

Do you know how to, and can you, run a "git bisect" with 2.6.37-rc1 as
good and 2.6.37-rc2 as bad? It should not take that long. This will
pinpoint the bug to a specific patch.

Thanks
Boaz

--
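
For readers who have not bisected before, the request above roughly corresponds to the sketch below. It assumes a git clone of the kernel tree in which the two release candidates carry the usual v2.6.37-rc1 and v2.6.37-rc2 tags. The helper name start_bisect is made up for illustration.

```shell
# Start a bisection between a known-good and a known-bad revision.
# Run this inside the kernel source tree.
start_bisect() {
    good="$1"
    bad="$2"
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}

# Usage:
#   start_bisect v2.6.37-rc1 v2.6.37-rc2
# Then build and boot the kernel that git checks out, and report:
#   git bisect good    # graphics came up after modesetting
#   git bisect bad     # screen went blank
# Repeat until git names the first bad commit, then clean up with:
#   git bisect reset
```

Because the report says a cold boot sometimes succeeds even on the broken kernel, a single working boot is weak evidence. Testing each step with a warm reboot, which reportedly never works on 2.6.37-rc2, or with several boots, is probably safer before marking a commit good.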

From: Florian Mickler
Date: Friday, November 26, 2010 - 2:23 am

On Sun, 21 Nov 2010 18:34:23 +0000

I trimmed the failure boot logs to the stuff before suspending, in
order to only compare the boot sequence. 

It looks like "pipe b underrun" is reported only in the failure case.
In the success case, no "pipe b underrun" is reported.

Also, in the success case, connectors are probed much more often:

$ grep drm:drm_helper_probe_single_connector_modes dmesg-2.6.37-rc2.succeed  | wc -l
76
$ grep drm:drm_helper_probe_single_connector_modes dmesg-2.6.37-rc2.*.fail  | wc -l
32
$ grep drm:drm_helper_probe_single_connector_modes dmesg-2.6.37-rc2.cold.fail  | wc -l
16


I could not spot any other differences that both failure modes share relative to the success case.

If Chris (Wilson) doesn't yet know what could be the bug a bisection 
--
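
The counts above can also be produced per file in one pass. Note that piping grep over a glob into wc -l, as in the second command, sums the matches across all matching files. The sketch below uses grep -c instead, which reports one count per file. The function name count_marker is hypothetical, and the match is case-insensitive because the exact capitalisation of the underrun message in the logs is an assumption.

```shell
# Print "<count><TAB><file>" for each log file, counting the lines that
# contain the given marker text (matched case-insensitively).
count_marker() {
    pattern="$1"
    shift
    for log in "$@"; do
        printf '%s\t%s\n' "$(grep -i -c -- "$pattern" "$log")" "$log"
    done
}

# Usage, with the log names from the commands above:
#   count_marker 'pipe b underrun' dmesg-2.6.37-rc2.succeed dmesg-2.6.37-rc2.*.fail
#   count_marker 'drm:drm_helper_probe_single_connector_modes' dmesg-2.6.37-rc2.*
```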

From: Zdenek Kabelac
Date: Friday, November 26, 2010 - 6:34 am

Reminds me - I need to keep drm_kms_helper disabled for reliable
resume on T61:

https://bugzilla.kernel.org/show_bug.cgi?id=19052
https://bugzilla.redhat.com/show_bug.cgi?id=617809

Zdenek

--
