"The following three patches are intended to start the redesign of the suspend and hibernation framework for devices," began Rafael Wysocki. He noted that the first patch introduces new callbacks for suspend and hibernation, while the other two patches implement the new suspend and hibernation callbacks for the platform and PCI bus types. In describing the first patch in the series, he noted that previous callbacks were being phased out, explaining:
"The main purpose of doing this is to separate suspend (aka S2RAM and standby) callbacks from hibernation callbacks in such a way that the new callbacks won't take arguments and the semantics of each of them will be clearly specified. This has been requested for multiple times by many people, including Linus himself, and the reason is that within the current scheme if ->resume() is called, for example, it's difficult to say why it's been called (ie. is it a resume from RAM or from hibernation or a suspend/hibernation failure etc.?).
"The second purpose is to make the suspend/hibernation callbacks more flexible so that device drivers can handle more than they can within the current scheme. For example, some drivers may need to prevent new children of the device from being registered before their ->suspend() callbacks are executed or they may want to carry out some operations requiring the availability of some other devices, not directly bound via the parent-child relationship, in order to prepare for the execution of ->suspend(), etc."
"Ok, it's out there, ready for your enjoyment," Linus Torvalds said, announcing the 2.6.25-rc3 kernel. He summarized the changes:
"As usual, most of the updates are in architecture and drivers, with the dirstat showing about 37% in arch (and that's with rename detection: there's some file movement in arch/xtensa that would bring it up to 43% if you looked at it as a traditional diff) and almost 50% in drivers. Much of the include file stuff is also architecture-related updates. The driver updates are mostly fairly spread out, but some of it comes from a couple of new drivers: the mvsas SCSI driver, a new adt7473 driver, and a couple of new watchdog drivers."
Linus continued, "if you ignore the architecture-specific stuff and drivers, the rest is mostly in networking, some Documentation updates, and a few filesystem updates (mainly efs and xfs). Anyway, the upshot of it all? Quite frankly, it's all over the place. The changes in -rc3 are bigger than -rc2, probably mostly because we had some more time (-rc2 was a couple of days early because of the long weekend in the US), but hopefully also because people have started to find regressions." Among the bug fixes, he highlighted, "we had a nasty SLUB corruption issue in -rc2 that is fixed (not that very many people probably saw it), and we've hopfully fixed a number of regressions in networking and suspend/resume."
Issues reported during the suspend-to-disk process lead Linux creator Linus Torvalds to suggest, "please - just make the f*cking suspend-to-disk use other routines already. 99% of all hardware needs to do exactly *nothing* on suspend-to-disk, and the ones that really do need things tend to need to not do a whole lot." He went on to explain why sharing the code path for suspend-to-disk and freezing to RAM is wrong:
"For example, the 'freeze' action for USB (which is one of the hardest things to suspend) should literally be something like just setting the controller STOP bit, and waiting for it to have stopped. The 'unfreeze' should be to just clear the stop bit, while the 'restart' should be just a controller reset to use the current memory image. NONE OF THIS HAS ABSOLUTELY ANYTHING TO DO WITH SUSPEND. It never did. I've told people so for years. Maybe actually seeing the problems will make people realize."
Linus also noted another advantage to having separate code paths for the two actions, "the other issue is that I've long wanted to make sure that when people fix suspend-to-ram, they don't screw up suspend-to-disk by mistake and vice versa." During the discussion, Rafael Wysocki noted that he would be fixing this up presently, "I'm already convinced, really. :-)"
"Sorry, but I've had it with this stuff and I'm tired of fixing everyone else's stuff. I'm just going to ship it. Good luck."
"Current Linux versions can enter suspend-to-RAM just fine, but only can do it on explicit request. But suspend-to-RAM is important, eating something like 10% of [the] power needed [compared to an] idle system. Starting suspend manually is not too convenient," began Pavel Machek, describing an idea he referred to as Sleepy Linux. He continued, "[starting suspend manually] is not an option on multiuser machines, and even on single user machines some things are not easy: 1) Download this big chunk in Mozilla, then go to sleep; 2) Compile this, then go to sleep; 3) You can sleep now, but wake me up in 8:30 with mp3 player". Pavel provided a simple not-fully-functional patch, then described his proposed solution:
"Today's hardware is mostly capable of doing better: with correctly set up wakeups, machine can sleep and successfully pretend it is not sleeping -- by waking up whenever something interesting happens. Of course, it is easier on machines not connected to the network, and on notebook computers."
"Perhaps it will also help with whatever effort I find time to make towards convincing Andrew that [TuxOnIce] really does have significant advantages over [u]swsusp and kexec based hibernation."
"The following patches add a new testing facility for suspend and hibernation," noted Rafael J. Wysocki. He continued, "the first patch adds the possibility to test the suspend (STD) core code without actually suspending, which is useful for tracking problems with drivers etc. The second one modifies the hibernation core so that it can use the same facility (it's a bit more powerful than the existing hibernation test modes, since they really can't test the ACPI global methods)."
The testing facility introduces a new
/sys/power/pm_test_level attribute, accepting a number from 1 to 5, with each value simulating a different level of the suspend or hibernation code. Rafael explained: "5 - test the freezing of processes; 4 - test the freezing of processes and suspending of devices; 3 - test the freezing of processes, suspending of devices and platform global control methods; 2 - test the freezing of processes, suspending of devices, platform global control methods and the disabling of nonboot CPUs; 1 - test the freezing of processes, suspending of devices, platform global control methods, the disabling of nonboot CPUs and suspending of platform/system devices". He added, "if a suspend is started by normal means, the suspend core will perform its normal operations up to the point indicated by the test level. Next, it will wait for 5 seconds and carry out the resume operations needed to transition the system back to the fully functional state." Rafael noted that setting pm_test_level to 0 disables the testing facility.
"Ok, I think I'm getting close to releasing a real 2.6.23," began Linus Torvalds in his release announcement for the eighth release candidate of the upcoming 2.6.23 kernel. "Things seem to have calmed down, and I think Thomas Gleixner may have found the suspend/resume regression that has dogged us for a while, so I'm feeling happy about things." Linus continued:
"Of course, me feeling happy is usually immediately followed by some nasty person finding new problems, but I'll just ignore that and enjoy the feeling anyway, however fleeting it may be.
"The shortlog really is pretty short, and I'm appending the diffstat at the end too in case anybody cares, but basically it's just a number of fairly small but real fixes, with some support for a few new chips to the sky2 network driver.."
"It took me quite a while to realize the real root cause of the VAIO - and probably many other machines - suspend/resume regressions, which were unearthed by the dyntick / clockevents patches," Thomas Gleixner explained regarding two patches for fixing suspend issues that Andrew Morton experienced with his VAIO laptop. He continued, "we disable a lot of ACPI/BIOS functionality during suspend, but we keep the lower idle C-states functionality active across suspend/resume. It seems that this causes trouble with certain BIOSes, but I assume that the problem is more wide spread and just not surfacing due to the various scenarios in which a machine goes into suspend/resume." Thomas concluded, "I really hope that this two patches finally set an end to the 'jinxed VAIO heisenbug series', which started when we removed the periodic tick with the clockevents/dyntick patches."
Linus Torvalds expressed some concerns, "the patches look fine, but I somehow have this slight feeling that you gave up a bit too soon on the '*why* does this happen?' question." He agreed that at that point there was a problem with ACPI, but cautioned that this could be triggered by another bug, "in particular, I also suspect that this may not really fix the problem - maybe it just makes the window sufficiently small that it no longer triggers. Because we don't necessarily understand what the real background for the problem is, I'm not sure we can say that it is solved." Linus concluded, "but hey, I think I'll apply the patches as-is. I'd just feel even better if we actually understood *why* doing the CPU Cx states is not something we can do around the suspend code!"
Rafael Wysocki posted a lengthy status report "describing the current state of development of the suspend and hibernation infrastructure: how it works, what known problems there are in it and what the future development plans are". Regarding future plans, Rafael noted, "the part of the suspend and hibernation code that should be taken care of first is the handling of devices. Namely, I think that we should first separate the hibernation-related handling of devices from the suspend-related handling of them in order to overcome limitations mentioned in Section IX. This also will be necessary if we want to try some new approaches to hibernation, such as the kexec-based one recently discussed on the LKML." He added, "the next thing that seems reasonable to do is to eliminate the freezing of tasks, described in Section VI, from the suspend and resume code, since the limitations related to it are regarded by many people as too restrictive." He went on to note:
"There also is the alternative hibernation framework TuxOnIce maintained by Nigel Cunningham, which is more feature-rich than the current in-kernel hibernation code. It therefore seems reasonable to incorporate at least some of the more advanced TuxOnIce features into the in-kernel code. I believe that by combining TuxOnIce with the current in-kernel hibernation implementation we can obtain a relatively simple, but powerful and solid hibernation framework, so I am going to work in this direction".
"Since many alternative approaches to hibernation are now being considered and discussed," Rafael Wysocki began on the lkml, "I thought it might be a good idea to list some things that in my not so humble opinion should be taken care of by any hibernation framework. They are listed below, not in any particular order, because I think they all are important." He continued with the following list, including a paragraph discussing each: "filesystems mounted before the hibernation are untouchable; swap space in use before the hibernation must be handled with care; there are memory regions that must not be saved or restored; the user should be able to limit the size of a hibernation image; hibernation should be transparent from the applications' point of view; state of devices from before hibernation should be restored, if possible; on ACPI systems special platform-related actions have to be carried out at the right points, so that the platform works correctly after the restore; hibernation and restore should not be too slow; hibernation framework should not be too difficult to set up." Rafael went on to summarize:
"In my opinion any hibernation framework that doesn't take the above requirements into account in any way will be a failure. Moreover, the existing frameworks fail to follow some of them too, so I consider all of these frameworks as a work in progress. For this reason, I will much more appreciate ideas allowing us to improve the existing frameworks in a more or less evolutionary way, then attempts to replace them all with something entirely new."
Offering a potential alternative to the existing suspend and restore implementations in the Linux Kernel, Ying Huang posted a patch utilizing kexec, "kexec based hibernation has some potential advantages over uswsusp and suspend2. " He listed two such potential advantages, "the hibernation image size can exceed half of memory size easily," and, "the hibernation image can be written to and read from almost anywhere, such as a USB disk [or] NFS." He described the feature implemented by his patch as "jumping from a kexeced kernel to the original kernel", allowing someone to first boot from one kernel, then to kexec another crashdump kernel in reserved memory and run from it for a while, and finally to "jump back" to the original kernel.
Andrew Morton replied to the idea very positively, "this sounds awesome. Am I correct in expecting that ultimately the existing hibernation implementation just goes away and we reuse (and hence strengthen) the existing kexec (and kdump?) infrastructure? And that we get hibernation support almost for free on all kexec (and relocatable-kernel?) capable architectures? And that all the management of hibernation and resume happens in userspace?" He went on to ask, "how close do you think all this is to being a viable thing?" Ying replied, "the kexec jump is the first step, maybe the simplest step. There are many other issues to be resolved, at least the following ones," going on to list a series of steps that still have to be implemented before kexec based hibernation would be a viable option.
The 'Suspend2' project [story] has been renamed to 'TuxOnIce' Nigel Cunningham announced on the lkml, "this is for a couple of reasons: In recent discussions on LKML, the point was made that the word 'Suspend' is confusing. It is used to refer to both suspending to disk and suspending to ram. Life will be simpler if we more clearly differentiate the two. The name Suspend2 came about a couple of years ago when we made the 2.0 release and started self-hosting. If we ever get to a 3.0 release, the name could become even more confusing! (And there are already problems with people confusing the name with swsusp and talking about uswsusp as version 3!)."
Some concern was expressed that this move indicated that the rewritten suspend infrastructure was not being targetted for core inclusion. Nigel explained that this was not the case, "as far as I know, the desire on Rafael's part and mine is still to get at least some of the functionality merged. The only reason this isn't happening yet is that we're both busy with other aspects of the work. Rafael is focussing on infrastructure issues, I'm focussing on minimising the diff and finishing off cleaning up and adding comments to functions." Rafael Wysocki agreed, "my first target is to introduce a framework allowing drivers to have different callbacks for hibernation and suspend. Unfortunately, that turned out to require quite a lot of work - and time - to do."
What started as the review of a bug report grew into an interesting debate as Linus Torvalds slammed the current suspend and resume [story] design in the Linux Kernel, "why the HELL cannot you realize that kernel threads are different? The right thing to do is AND HAS ALWAYS BEEN, to stop and start user threads only around the whole thing. Don't touch those kernel threads. Stop freezing them." Later in the discussion, Linus noted that he had no interest in Suspend to Disk (STD), and was only interested in a working Suspend to Ram (STR) implementation. He noted that complexity introduced by STD was infecting the STR logic, and that the two should be completely separated, "what irritates me is that STR really shouldn't have _had_ that bug at all. The only reason STR had the same bug as STD was exactly the fact that the two features are too closely inter-twined in the kernel. That irritates me hugely. We had a bug we should never had had! We had a bug because people are sharing code that shouldn't be shared! We had a bug because of code that makes no sense in the first place!" Linus noted that he doesn't use laptops much, but still likes STR on his desktop, "STR means they are quiet and don't waste energy when I don't use them, but they're instantly available when I care." He then went on to point to design flaws in the freezer:
"I actually don't think that processes should be frozen really at all. I agree that filesystems have to be frozen (and I think that checkpointing of the filesystem or block device is 'too clever'), but I just don't think that has anything to do with freezing processes. So I'd actually much prefer to freeze at the VFS (and socket layers, etc), and make sure that anybody who tries to write or do something else that we cannot do until resuming, will just be blocked (or perhaps just buffered)!"