"This case is a good example to use the next time a stupid thread starts up about bug reports not being looked into. To me it seems clearly more a matter of the quality of the bug report."
I admire David and consider him one of the best kernel developers (on par with Ingo).
However, this example is NOT a good example that shows how kernel developers care for bugs. The bug reporter essentially did all the investigative work, and he/other developers just came up with a solution based on the analysis.
Those who have worked on bug fixing know that the most time-consuming part is analyzing and, where possible, reproducing the bug, not finding a fix once the cause is known. If David thinks all kernel developers have to do is let other people hand them the root cause after a thorough bisection and a vast investment of time, then it's no wonder Andrew is upset about the situation.
Thumbs down for you on this one, David.
I completely agree. My previous laptop used to have suspend to RAM working flawlessly until it got broken somewhere in the 2.6.12 process. It still worked sometimes, but once in a while the laptop would never wake up. So I reported the bug and got told nobody would do anything until I found which rc broke it. So after months of tests (this was my main work machine and the bug could take a week to reproduce), I finally came back with the answer. I reported that it got broken between 2.6.12-rc5 and 2.6.12-rc6. I was then expecting a developer to at least have a closer look at what had changed and possibly suggest I try a "diagnostic patch" or something. No, I just got told: "that's not enough, we want you to look at the stuff in testing", so after another several months of testing, I got that done. I reported it got broken between 2.6.12-rc5-git5 and 2.6.12-rc5-git6. Cool, my work was finally going to pay off, as there was only one day between the two. What was the answer? We won't have a look until you do a git-bisect to find the actual commit. Never mind the fact that I had never used git at that time; nobody was even willing to look at the code in case there was something obvious that would have saved me lots of testing. That's when I gave up. I just wasn't going to learn git and go through another month or two of testing just to have the bug (most likely) ignored in the end anyway. From then on, I decided I wasn't going to spend any time doing testing unless a developer proved to me (e.g. by doing actual work) that he's interested in solving the problem. (FYI, the bug is http://bugzilla.kernel.org/show_bug.cgi?id=6166)
It looks like you were given help. I can't imagine it being easy for someone to fix a bug that happens on hardware they don't own. I also don't see them asking you for a bisect; I only see lots of kind suggestions and lots of "thank you for your help on this." David did offer to look at the changes.
>between 2.6.12-rc5-git5 and 2.6.12-rc5-git6.
That's great. I'll do a diff and see the changes.
What do you have against the kernel developers? Please go into more detail.
Here's the relevant LKML thread about git-bisect:
What annoys me is that they're all happy to tell me to test this and test that (even though each test takes at least one week), but as far as I know, nobody has actually looked closely at the code in case there was something obvious. In general, it seems like most developers see it as "we're doing a favor to the guy who's reporting the bug", when really it's the other way around (which is how I definitely try to treat it with Speex). There's a lot of emphasis on "please test kernels and report bugs", but in practice, not everyone can report bugs the way things are. I mean, most Linux users don't even know how to compile a kernel. You think they're going to learn git-bisect (which BTW I now use and find really useful)?
Note that I haven't mentioned names because I'm not directing this at anyone in particular. I think what's broken is really the process.
I don't see why the kernel developers should bother; this should be filtered through your distribution. Filing a bug report with the upstream kernel developers is okay and useful for tracking and coordinating with other distros, but since they had so little to go on, you can't really blame them. It's their time, and they might have better things to do than come up with a test case and identify the offending commit; that's not for you or me to judge. You have to communicate with them on their terms if you want them to fix anything, and with the knowledge you have, that didn't really work out well. You say that you don't know if they even checked their code. Well, I'm quite certain most devs check their code many times daily, with no inclination to prove that to anyone. Not knowing how to use git is an end-user problem and should be handled in an end-user forum, without bothering kernel devs or distro maintainers. Ergo, first file a bug report with your distro (always), and jointly decide if and what should go in an upstream bug report. Together with your distro maintainers, narrow down the scope of your bug as much as possible and take "simple" questions to end-user forums. Feed findings to them and answer questions from them. Fix things at the "lowest level" (the best I could come up with, no offence intended to anyone). And this goes for just about all bugs. Please don't take this the wrong way: it's great that you report bugs. Many people don't, and I'm quite sure that fixing this bug would help not only you but others as well. But with the expectations you have, you'll get disappointed and possibly even meet resentment/abuse that your good intentions do not deserve.
It was a kernel issue. The regression I found was in the vanilla kernel, between two development versions two days apart. There's nothing the distro guys would have done (I'm using Ubuntu and, from my experience, they won't do much *even* when you pinpoint the bug). Basically, what you're describing is the "it's a privilege to have the developers look at your bugs" attitude, which I really dislike in some projects. And yes, it means I will no longer waste time tracking regressions in the kernel -- unless I see up-front a developer willing to spend some time to help. And I'm pretty sure I'm not the only one in this situation. The whole "we have thousands/millions of users/testers so we can do a really good job" only works when you actually do something about the bug reports. Hell, how many kernel users do you think even know how to actually compile a kernel? Oops, you just lost more than half of your testers when it comes to finding regressions. Now, if you require people to know how to use git-bisect in the state it was in a year ago (I'm now a git user, but that's more recent), then you end up with your testers being pretty much the same as your developers.
Thank you for the extra information. It does look bad from that angle. Hopefully things will get better.
Shaohua is from Intel. I don't think it's that difficult for Intel to find a common PC with a common distribution to reproduce the bug.
It's more like Shaohua had good intentions to help but was not the right person for this bug. Part of the problem. Most kernel developers aren't so interested in fixing bugs (especially when there is no one pointing a gun at their head screaming "it's your code!"), compared to, say, posting patches of glorious new features.
That is a much more reasonable stance than simply spreading FUD around. Hopefully the bug is fixed now.
Also, the poster mentions "months of testing" a few times. The bug was reported on 03/04/2006, and the post makes it sound like it then took 4 to 6 months to find that it was -git6, yet that was reported on 04/17/2006. That's a little over a month.
I normally never care to reply to things like this, but when people start bashing others who have been nothing but helpful to them, I begin to take offense.
Last kernel I checked was 2.6.15 and it still had the bug. I doubt it's been fixed since it's closed now.
You'll notice that "a little over a month" is only what it took to go from "between rc5 and rc6" to "between rc5-git5 and rc5-git6". But before I even filed the bug report, I had to find out it was a kernel regression (not scripts that changed) and track it down to [rc5,rc6]. As I mentioned before, it's particularly annoying because it was my main work machine, so I had to close everything every time I tested (the bug only happened 10% of the time and only after a few days) or risk losing some stuff. Overall, I spent 6 months on this. I wasn't always testing things, but was still running a kernel that didn't suspend properly.
First, I did not bash any *person* (those who did something are already better than those who didn't do anything) and did not name names. I am complaining about the bug reporting/fixing process. So now it's my fault it didn't get solved? What should I have done? Start learning git so I can bisect, then maybe do more tests, and then more, while praying for some developer to have a look? But anyway, I've learned now. I'm no longer reporting anything unless it's really annoying to me and if I report it, I'm not spending any serious amount of time on testing unless I find someone who's really willing to fix the problem (not just tell me things to test).
I find many projects have this sort of attitude where the developers are doing you a favor by fixing the bugs you report (Ubuntu being one example). OTOH, once in a while I run into great developers/projects that really pay attention. For those, I make sure to report any problem I find and I know it's worth doing work myself because it's not going to be wasted. That's also what I'm trying to achieve when writing Speex (though I guess you'd have to ask my users whether I'm doing a good job).
> So now it's my fault it didn't get solved? What should I have done? Start learning git so I can
> bisect, then maybe do more tests, and then more, while praying for some developer to have a look?
You nailed it. Actually, since you already develop in C, you could have fixed the entire bug with some effort.
Joking aside, I have read the thread, and I see the discussion is entirely reasonable on both sides. Based on my reading of the situation, the problem with this bug is that you do not really have a case that shows when it triggers and when it does not. A developer tends to only get really committed after he hears "it works with version x" and "it doesn't work with version x+1", and "I can trigger it every time with this testcase I developed". That identifies a testable regression and is always worth looking at.
The largest single problem with kernel development is that users aren't able to produce testcases, which means developers can't actually fix the bugs: they have no metric to determine the fix status apart from asking the original bug reporter, and this is a lossy mode of communication with a long roundtrip time. And with the blinding speed of development, who knows when that bug will inadvertently be reintroduced. So that is why good test cases tend to dictate which bugs get (and stay) fixed.
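To make the "works with version x, fails with version x+1, and I can trigger it every time" point concrete, here is a minimal sketch of the search that a reliable testcase makes possible. It is not git-bisect itself, only the same binary-search idea. The function name first_bad, the list of versions, and the is_bad callback are all hypothetical, and the sketch assumes the test gives the same answer every time it is run on a given version.

```python
def first_bad(versions, is_bad):
    """Return the first version for which is_bad(version) is True.

    Assumptions (hypothetical setup, not taken from the bug report):
    - versions is ordered oldest to newest,
    - versions[0] is known good and versions[-1] is known bad,
    - is_bad() is deterministic, i.e. the testcase triggers the bug
      every time on a bad version and never on a good one.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return versions[hi]


if __name__ == "__main__":
    # 16 made-up snapshots; the regression is introduced in "c11".
    snapshots = [f"c{i}" for i in range(16)]
    print(first_bad(snapshots, lambda v: int(v[1:]) >= 11))
```

With a deterministic test the number of runs grows only with the logarithm of the number of candidate versions. If each run takes a week and can give the wrong answer, the same search becomes both slow and unreliable.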
In this case, you yourself say that someone with an identical laptop crashes with a setup that you do not seem to be able to crash, but since the bug occurs rarely, you could just be lucky. If you do not have a certain way to trigger the bug, you don't really have a bug. It's all too easy to be chasing incorrect commits because luck sometimes lasts longer than you expected. It's up to you to show where the bug is, and if your own report of the bug conflicts with someone else's report, then the issue is either related or not. If it's related, you haven't found your bug yet.
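The "luck sometimes lasts longer than you expected" argument can be put into numbers. The sketch below is a back-of-the-envelope calculation, not a model of the actual bug: it assumes each test run on a bad kernel triggers the bug independently with a fixed probability, and it uses the 10% figure quoted elsewhere in this thread purely as an illustrative input. The function names are made up for the example.

```python
import math


def miss_probability(p_trigger, trials):
    """Chance that a bad kernel shows no failure in `trials` runs.

    Assumes each run triggers the bug independently with probability
    p_trigger (a simplifying assumption, not a measured property).
    """
    return (1.0 - p_trigger) ** trials


def trials_needed(p_trigger, max_miss):
    """Smallest number of clean runs after which the chance of having
    wrongly called a bad kernel "good" is at most max_miss."""
    return math.ceil(math.log(max_miss) / math.log(1.0 - p_trigger))


if __name__ == "__main__":
    p = 0.10  # illustrative trigger rate per run
    for n in (1, 5, 10, 20):
        print(n, "clean runs, miss probability:", round(miss_probability(p, n), 3))
    print("runs for a 5% miss rate:", trials_needed(p, 0.05))
```

Under these assumptions, a handful of clean runs says very little about whether a kernel is good, which is why an unreliable trigger makes it easy to blame the wrong commit.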
On to philosophy.
As a developer strictly for my own needs, I take offence at the position that I'm supposed to serve my users. Software is the king. Users can tag along if they pull their own weight, which means they are not an inconvenience to me. Factors such as the nature of the software or the money involved do affect this calculation, and sometimes I may feel responsible for a bug I have caused or unearthed, and go to extra effort to fix things for everybody.
For some software such as speech codecs, it's important to have a wide user base, so I understand that serving users well makes economic sense; they determine the software's quality for its stated purpose. What I mean is that such software has no "specification" per se -- it only has to serve its users well, through ad-hoc criteria based on the input users throw at it. I believe the Linux kernel has portions that fall into this basket, such as the hardware support -- it's inherently "unspecified" because of the complexities involved in hardware production and design.
Other software should only be a good (or the best) implementation of a technical standard or algorithm, and users who do not understand this are merely annoying and totally miss the point. This better characterizes the sort of thing I sometimes do. I am not a kernel contributor.
Actually, the only person I knew who had that same laptop model had exactly the same problem. As for being lucky, it's clearly not the case. I was clearly able to pinpoint where things broke to within a day. Going back and forth across versions would always give the expected behaviour (git5 worked, git6 didn't). Also, I'm not saying finding the bug would have been easy, but as far as I know, nobody was willing to seriously look into it and I was just wasting my time testing for nobody.
> As a developer strictly for my own needs, I take offence at the position that I'm supposed to serve my users. Software is the king.
I actually don't see that directly as "serving the users". When one of my users finds a bug and reports it, I consider that *he* is doing me a favor by helping me improve my code (and fixing the bug is a way for me to give an incentive for people to report bugs). If nobody had bothered reporting bugs, Speex would really suck today.
> For some software such as speech codecs, it's important to have wide user base, so I understand that serving users well makes economic sense; they determine the software's quality for its stated purpose. What I mean is that such software has no "specification" per se -- it only has to serve its users well, through ad-hoc criteria based on the input users throw at it.
I'm not sure I understand the point here. All OSS projects benefit from a wide user base. As for specs, codecs are actually more restricted because there are a lot of things you cannot change. I don't think my users would be very happy if a new release were to make the bit-stream incompatible!