Okay, and the kernel never suspends. We _are_ talking about a kind of No, it won't. That's the problem suspend blockers were meant to solve. The event winds up sitting in a kernel queue, the PM core doesn't know about it (that's what I meant above -- the PM core doesn't know as much It would be okay if that happened. But once the event gets into the kernel and the hardware IRQ source has turned off, there's nothing to Agreed. Badly behaved apps must not be allowed to block suspends. As far as I'm concerned, we can ignore them. Alan Stern --
On Thu, 27 May 2010 17:38:03 -0400 (EDT) No ? We are talking about just letting power management solve the whole No - because we are not forcing the suspend. The app must go idle. If you force the suspend of running processes then yes the entire thing goes Read the discussion about how the race is avoided at the hardware level. That race is I think not there and furthermore most drivers get it right already. There are several cases
1. IRQ during app layer (ie policy in user space) asking applications to go passive - The event occurs, we undo the app layer policy, easy (or app wakes process and we let it fall through)
2. IRQ after the app layer quiesces its clients - The task wakes, the app layer won't see it - the app layer allows suspend as an idle mode. Not a problem - the app is running the cpu policy manager will see this and not suspend until the app has been asleep for a bit. The app may well of course tell the UI layer 'hey I want you back on' and it takes you back to the full on case.
3. IRQ after kernel suspend begins - The driver will refuse the suspend, we don't suspend, we unwind the resume so far, the app wakes, we propagate stuff back up to user space whose policy manager unwinds its position
4. IRQ after driver has done its final checks - Wake up lines are set - We suspend - We immediately get resumed - We follow the full resume path
This is I believe robust (and has been implemented on some non x86 boxes). It depends on not forcing running tasks into suspend. That is the key. Alan --
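The four cases above can be read as a small decision table. The sketch below is only an illustration of that reading, not code from any kernel or from the poster; the phase names and the step strings are hypothetical labels chosen here. It shows that in each case the wakeup event is either acted on before suspend or triggers an immediate resume.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical names for the point at which the wakeup IRQ arrives."""
    USER_QUIESCING = 1     # case 1: policy layer is asking apps to go passive
    USER_QUIESCED = 2      # case 2: app layer has quiesced its clients
    KERNEL_SUSPENDING = 3  # case 3: kernel suspend has begun
    FINAL_CHECKS_DONE = 4  # case 4: driver has done its final checks


def handle_wakeup(phase):
    """Return the steps taken when a wakeup IRQ arrives during `phase`."""
    if phase is Phase.USER_QUIESCING:
        return ["undo app layer policy", "stay awake"]
    if phase is Phase.USER_QUIESCED:
        return ["task wakes", "policy manager sees a running task", "stay awake"]
    if phase is Phase.KERNEL_SUSPENDING:
        return ["driver refuses suspend", "unwind drivers",
                "notify user space policy manager", "stay awake"]
    if phase is Phase.FINAL_CHECKS_DONE:
        return ["wakeup line armed", "suspend", "immediate resume",
                "full resume path"]
    raise ValueError(phase)


def event_is_lost(phase):
    """The event is lost only if we suspend and never resume."""
    steps = handle_wakeup(phase)
    return "suspend" in steps and "immediate resume" not in steps
```

Under these assumptions `event_is_lost()` is false for every phase, which is the robustness claim made in the message.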
We've already established that ACPI systems require us to force running tasks into suspend. How do we avoid the race in that situation? -- Matthew Garrett | mjg59@srcf.ucam.org --
On Thu, 27 May 2010 23:09:49 +0100 Android phones do not have ACPI. Embedded platforms do not have ACPI. MID x86 devices do not have ACPI. I would imagine the existing laptops will handle power management limited by the functionality they have available. Just like any other piece of hardware. Alan --
It doesn't matter. Right now there's a race condition in terms of wakeup events on ACPI systems. What's your proposal for fixing that? -- Matthew Garrett | mjg59@srcf.ucam.org --
On Thu, 27 May 2010 23:36:05 +0100 I see it as a different problem - and one that seems to be minimally pressing to most users judging by the amount of noise it hasn't caused in the past seven odd years. This started because the Android people came to a meeting that was put together of various folks to try and sort out the big blockage in getting Android and Linux kernels back towards merging. I am interested right now in finding a general solution to the Android case and the fact it looks very similar to the VM, hard RT, gamer and other related problems although we seem to have diverged from that logic. I don't think it particularly useful to go off on a mostly unrelated wild goose chase into ACPI land, especially one based on a premise of changing all the apps when the hardware will end up fixed faster. Alan --
Keep in mind, though, that a solution which is acceptable for Android has to include making sure that crappy applications don't cause the battery to get drained. There are some people who seem adamantly against this requirement. From the Android folks' perspective, this is part of what is required to have an open app store, as opposed to one where each application has to be carefully screened and approved ala the Apple iPhone App Store. Maybe it would be acceptable if there were an easy way THAT A USER AND NOT A DEVELOPER COULD USE ON A SMART PHONE to find the bad application, but realistically, it's much better if the solution can work well even in the face of crappy applications. Having interacted with application programmers, I can assure you there are a lot of crappy application programmers out there, and they vastly outnumber us kernel developers. (See as exhibit A all of the application programmers who refuse to use fsync, even though it's going to wipe them out on all new modern file systems, including btrfs.) We need to agree on the requirements up front, because otherwise this is going to be a waste of everyone's time. And if we can't get agreement on requirements, I'd suggest appealing this whole thing to Linus. Either he'll agree to the requirements and/or the existing implementation, in which case we can move on with our lives, or he'll say no, in which case it will be blatantly obvious that it was the Linux developer community who rejected the Android approach, despite a fairly large amount of effort trying to get something that satisfies *all* of the various LKML developers who have commented on this patch, and we can continue with Android having a kernel which is different from mainline --- just as many other embedded companies have patches which are utterly required by their products, but which have been judged Too Ugly To Live In Mainline --- and we can also move on and get on with our lives. - Ted P.S. Keep in mind how this looks from an ...
Ted if you are speaking for Android do you think you should post from a The other vendors appear to be managing nicely without magic blockers. I The existing implementation has been comprehensively rejected by half the x86 maintainers and scheduler people to start with. That's a fairly big Ted save the politicking and blame mongering for management meetings please. If we don't have a solution it means that between us we couldn't find a viable solution. Maybe there isn't one, maybe we missed it. It's as much 'google rejects kernel approach' as 'kernel rejects google approach' and more importantly it's actually 'we (cumulative) were not smart enough In some cases it is easier to do stuff yourself than work with others. One of the conditions of working in a public space is that you do so without harming others. This is why in much of the western world you can drive a car around your own land without a licence but must have one to drive on a public road. This is why a restaurant must meet different food standards to a home kitchen. This is why the kernel standards are higher than what you go off and do in private. Android is a very unique and extremely narrow environment. If it really is special enough to need its own kernel fork it isn't the first case for that and it's not a problem. The GPL is quite happy to encourage this. Time will then answer the questions because in 3 years time either every non Google phone will be kicking butt without suspend blockers, or every phone vendor using Linux with a traditional user space will be demanding them. Alan --
Actually, no. A badly behaved application will kill the N900's battery life. Nobody else has "managed nicely" - they've just made life harder for application developers and users, which may have something to do with the relative levels of market adoption of Maemo and Android. I'm not aware of any form of resource management framework in MeeGo either, so as far as I know it'll have exactly the same problem. -- Matthew Garrett | mjg59@srcf.ucam.org --
It's true that a braindead app can kill the battery. However we provide a version of powertop that is tailored to the N900, there is a nokia energy profiler meant to give graphical representation of the battery current, there is htop available and you can even get the processor activity visualized on the leftmost and rightmost keyboard backlight LEDs, when in RD mode and with screen blanked. I would advise you to not start debating on company strategies, this is not the right place. Otherwise I'll have to ask what's the expected threshold of devices sold with broken sw design to get automatic admission into the mainline kernel source tree. But this is not the direction we want to take. Notice also that we _do_ have a store and official repository where apps are monitored for sanity, also with feedback from users and their help to promote new apps to trusted state. Former Maemo 6, now MeeGo do introduce resource management from security POV, but that will also have the side effect of discriminating between signers. igor --
At a certain point, if one side of the argument is using "N900 / OMAP3 works just fine as is" (which has certainly been the case stated by a number of folks throughout these discussions), I think it's a little unrealistic to express shock that somebody argues the opposing point. I've personally avoided commenting on specific power management issues or properties of competitive platforms because it can easily be viewed as rather rude or unprofessional. (though in theory we all could benefit from any improvements to the kernel regarding power management, no?). I am quite willing to state that on both MSM and OMAP based Android platforms, we've found that the suspend blocker model allows us to obtain a lower average power draw than if we don't use it -- Mike Chan provided some numbers earlier in another thread in the trivial device idle case, the win is of course much larger in the case of several poorly behaved apps being active. I do think that everyone involved agrees that it is beneficial to educate users and developers in hopes that users will understand that some apps are non-optimal and developers will be encouraged to write better apps. I think we also all agree that striving to obtain the lowest power state at all times through cpu frequency scaling, runtime pm, drivers that aggressively clock/power down when idle, etc is a worthy goal. Some have argued that suspend blockers may deter further development in these areas, but I think this is unlikely -- power usage while the device is active and the user is interacting with it is just as critical as when it's not being used interactively. We (Android) certainly pursue aggressive low power optimization in both states. There appears to be some disagreement in terms of what one should do in the face of poorly behaved applications. 
The Android approach has been to both gather as much data as possible for education of user and developer and to mitigate the impact of poorly written apps on endusers, goals which are ...
The problem lies in the definition of the goal and means to achieve it. We do rely on repositories to discriminate on the quality of applications. As I stated some are accessible and run by our community. What I consider plain wrong is to claim that since there are this many units out, some code should be merged. A company needs to cut corners sometimes when making a product but this That's very good. But if it is done in a conceptually flawed way, some better solution should be considered for upstream merge. Sure. I simply disagree on the methods proposed (suspend_blockers) and some of the rationale used for promoting them (volume of otherwise unsupported units). igor --
I've never suggested that we should get a get-out-of-code-review-free card or be automatically merged based on shipping volume. Hell, I never thought we should even bother trying to merge wakelocks upstream, because I assumed that they'd be hated for not being the linux way (tm). Greg KH and others have spent a bunch of time shouting at me (or Google) that we should be doing this, and here we are giving it a go. At this point we've spent more engineering time on revising this one patchset (10 revisions to address various rounds of feedback) and discussion of it than we have on rebasing our working kernel trees to roughly every other linux release from 2.6.16 to I will disagree that wakelocks are "cutting corners" (we certainly have some corner cutting code in our trees, because yeah, ship is compromise, but I don't believe wakelocks are an example). They're a real solution for real problems faced on real devices. Obviously not a solution that everyone here likes, and maybe they'll never end up in mainline as a result, but so far I haven't seen a counter proposed solution that seems to solve the same problem, avoid races, and be How is it flawed? Serious question. Brian --
I would avoid repeating all the good arguments given so far, but to make it short:
* I believe runtime PM is a much better starting point (at least for the type of HW targeted at mobile devices) because it mimics an always-on system toward userspace, which requires less disruption in the way apps are designed
* QoS is closer to the apps pov: fps if it is a media player or a game, transfer speed if it is a file manager, bandwidth if it is a network app, etc The app is required to express its opinion by using a format that it understands better and is less system dependent. Actually the kernel should only be concerned with 2 parameters at most for any given operation: latency and bandwidth/throughput
* Some form of resource management is needed as trust mechanism to discriminate "trusted" vs untrusted apps that can give reliable info (but in your case you should give trust to whoever prevents the suspend)
* Most of this could be done in userspace with the kernel merely providing the means to enforce the decisions taken by the userspace manager.
* The kernel wouldn't even have to try to outsmart the "evil application writer"
igor --
I agree. If I understand correctly, if we have a perfect user-space that only does work when strictly needed and trying to do it in bursts, then we would be reaching the lowest power state, and there would be no need for suspend. The problem is that Android's user-space is pretty far from that, so they said "let's segregate user-space and go to lower power mode anyway". If that's true, then this problem can be fixed in user-space, and in fact, it already is on N900. Well-behaved applications are asynchronous, use g_timeout_add_seconds() to align bursts of work at the same second intervals, and don't do polls directly, but use GLib's mainloop. Same as in GNOME desktop. It seems there are other methods to align multiple processes for longer periods of time, but that code I think this information can be obtained dynamically while the application is running, and perhaps the limits can be stored. It would be pretty difficult for the applications to give this kind of information because there are so many variables. For example, a media player can tell you: this clip has 24 fps, but if the user is moving the time slider, the fps would increase and drop very rapidly, and how much depends at least on the container format and type of seek. A game or a telephony app could tell you "I need real-time priority" but so much as giving the details of latency and bandwidth? I find that very unlikely. Cheers. -- Felipe Contreras --
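To make the point about aligning bursts of work concrete, here is a minimal sketch of the idea behind second-granularity timers. It is a simplification and not GLib's actual algorithm; `aligned_deadline` and `wake_instants` are hypothetical helpers. Timers that start at different sub-second offsets are rounded up to whole seconds, so they fire together and the CPU leaves idle less often.

```python
import math


def aligned_deadline(now, interval_s):
    """Round the next deadline up to a whole second so timers coalesce."""
    return math.ceil(now + interval_s)


def wake_instants(start_times, interval_s, horizon_s, align):
    """Collect the distinct instants at which the CPU has to wake up.

    Each entry in start_times is one periodic timer; with align=True every
    timer uses whole-second deadlines, otherwise exact ones.
    """
    instants = set()
    for start in start_times:
        t = start
        while True:
            t = aligned_deadline(t, interval_s) if align else t + interval_s
            if t > horizon_s:
                break
            # Rounding only guards against floating point noise.
            instants.add(round(t, 6))
    return instants


starts = [0.1, 0.35, 0.8]  # three apps that armed their timers at odd offsets
unaligned = wake_instants(starts, 2, 10, align=False)
aligned = wake_instants(starts, 2, 10, align=True)
```

With these made-up offsets the unaligned timers produce twelve separate wakeups over ten seconds, while the aligned ones share four.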
On Sat, 29 May 2010 02:42:35 +0300 This has already been mentioned (who knew?): Android doesn't want to depend on userspace for this. Cheers, Flo --
I doubt that belongs to typical QoS. Maybe the target could be to be from my gaming days the games were still evaluated in fps ... maybe i made the wrong assumption? A telephony app should still be able to tell if it's dropping audio frames. In all cases there should be some device independent limit - like: what is the sort of degradation that is considered acceptable by the typical user? Tuning might be offered, but at least this should set some sane set of defaults. igor --
I'm not sure what you mean. I-frames usually come one per second, so if you only decode I-frames, your experience would be really bad. Moreover, you don't know beforehand when an I-frame is coming, only when it's there, and some clips can have only one I-frame at the Yes, the more fps, the better, but you calculate that by counting the amount of frames rendered over a period of time; you know the fps Yes, which could be unrelated to PM, like bad network conditions, but yeah, it should also be able to tell if the problem is with the It is easy to tell after the PM actions have been made, as in "wait! I'm not able to perform gimme more power!". But I don't see how that could be done _before_ the PM actions are done. From all the QoS proposals I have seen here, and considering that some people said that suspend blockers could be a specific case of QOS, I don't think people have been considering QoS as something to state Huh? Defaults in what units, based on what, and when and how to update? Cheers. -- Felipe Contreras --
- It means changing drivers and quite a few apps
- It doesn't solve the problem of rogue apps if they end up owning locks
- It puts the deep knowledge of the platform in the applications
- It gives the apps control of the action taken not policy indication
- It doesn't resolve the problem of synchronization of take/releases stopping any suspend
- The kernel parts are not generically useful, merely effective for solving a specific problem right now - even things like VM migration to/from phones seem to break it
- It inverts the whole logic the kernel is following and trend it is following that suspend is simply a very deep idle (with implementations merged)
If it was a localised turd I wouldn't worry. There are plenty of deep unmentionables hidden away entirely in platform specific code that deal with everything from stoned hardware engineers to crazed software stack implementations. Here is a question back the other way perhaps - If the existing kernel was almost entirely read only, or you had to pay a large fee per line of code changed outside your own driver how would you implement the wakelock/suspend blocker API ? Because if you take the path that 'we want wakelockers' that is essentially the question you have to answer. How do you merge it so that nobody outside of your driver and maybe a spot of arch code knows about it. You are permitted a couple of sneaky substitutions of core function bits in headers. Right now bits are going to leak out over the kernel which is the cause of friction. At the point it's invisible to everyone else they cease to be stakeholders so you don't have to keep them happy. You've only got a couple in your patches but it's painfully obvious from Matthew and your comments you'll end up needing a ton more and these will get everywhere as Android grows hardware platforms and CPU support as phones become more featureful and PC like. The moment a phone grows a USB base station with hub for example the entire USB stack becomes ...
Linus will disagree with you there. Linus *has* merged code on the basis that it is shipping in distributions, regardless of the fact that some developers objected to it. Sometimes "perfect" should not be the enemy of "good enough" shipping code. For example, I used to point out that we shipped PCMCIA code in mainline that had a 10% chance of crashing the system if you ejected the card. NetBSD was proud to say that their code was so iron-clad and well designed that it always did the right thing, even if you ejected while it was busily passing network traffic. Unfortunately, NetBSD had working PCMCIA support 3 years later than Linux. So it used to be that we were the technical pragmatists (and Linus, fortunately, still very much is the pragmatist), while others were the hard-line perfectionists. It seems to me we've started getting some of the NetBSD attitude infecting LKML, and IMHO, that's unfortunate. We've rewritten our networking stack, 3 or 4 times, depending on how you count. And sometimes shipping in products counts for a lot. It doesn't count for everything, and it isn't a get-out-of-jail card, for sure. But if it's a hard problem, and we have something that's good enough, maybe the right call is to merge it now, and we'll rework things to make something better and more general later. Ultimately that's a call only Linus can make. If everyone agrees we're making progress, then we can let this 100+ mail thread keep going. But if anyone feels that we are spinning endlessly without making forward progress (which is after all the same criteria the OOM killer uses, no? :-), people should remember that sometimes Linus *has* ended arguments that have gone on too long by making a "merge or kill" decision. - Ted --
I have seen very good proposals for saner solutions. Is that progress? igor --
The proposals so far involve either redefining the problem space or being inherently racy. It may be that we can redefine the problem space in such a way that everyone's happy, but it's not possible to do so by fiat. -- Matthew Garrett | mjg59@srcf.ucam.org --
I think the suggestion that has the closest fit with what we're trying to accomplish is Ingo's (or perhaps Ingo's explanation of Alan's): http://lkml.org/lkml/2010/5/28/106 where it's implemented as a constraint of some sort. Arve points out that qos constraint objects could work (but not if specifically tied to apps): http://lkml.org/lkml/2010/5/28/120 though he suggests that "latency" constraints don't represent this as well as "state" constraints. Though if you look at it that way, then suspend_blockers become qos constraint objects, but their implementation and usage remain pretty much the same as we have now, which does not address Alan's concern regarding code turning up in drivers, etc. I'm not sure how you can solve this problem (avoiding races around entering/exiting the suspend or suspend-like state) without having a means for drivers to prevent entry to that state. I need to think more about the cgroups approach, but I'm pretty sure it still suffers from wakeup race situations, and due to the complexity of userspace (at least ours), I suspect it would risk livelock/deadlock/priority-inversion style issues due to interaction between different processes in different groups. Brian --
I think the cgroups approach works if you assume that applications that consume wakeup events can be trusted to otherwise be good citizens. Everything that has no direct interest in wakeup events (except the generic Android userspace) can be frozen, and you can use the scheduler to make everything else Just Work. That's a rather big if, but you've got a better idea of the state of the Android app base than I do. -- Matthew Garrett | mjg59@srcf.ucam.org --
With latency you have an "I don't give a damn" latency in your model which I am much much less concerned about general expressions of constraint appearing in drivers. One of my early mails gave a list of other people/projects/problems that need them - from hard real time, to high speed serial on low end embedded to virtualisation. They fix a general problem in terms of a driver specific item. We end up making changes around the tree but we make everyone happy not just Android. Also we are isolating policy properly. The apps and drivers say "I have these needs", the power manager figures out how to meet them. Where it gets ugly is if you start trying to have drivers giving an app a guarantee which the app then magically has to know to dispose of. If you are prepared to exclude untrusted apps from perfectly reliable event reporting (ie from finger to application action) that doesn't seem Priority inversion with the cgroup case is like synchronization effects with the suspend blockers - it's a real ugly problem and one that is known to be hard to fix if you let it happen so I agree there. --
I think Arve's concern was the representation of the "I care, but only a little" or "just low enough to ensure threads must run" level which is what suspend blockers would map to (low enough to ensure we shouldn't halt the world but not necessarily implying a hard latency That makes sense -- and as I've mentioned elsewhere, we're really not super picky about naming -- if it turns out that wakelocks/suspendblockers were shorthand for "request a qos constraint that ensures that threads are running", we'll be able to get things Yeah -- which is something we've avoided in the existing model with overlapping wakelocks during handoff between domains.
- input service is select()ing on input devices
- when select() returns it grabs a wakelock, reads events, passes them on, releases the wakelock
- the event subsystem can then safely drop its "should be running threads" constraint as soon as the last event is read because it has no queues for userspace to drain, but the overlapping wakelock
Currently in the Android userspace only trusted (system) apps can directly obtain wakelocks -- arbitrary apps obtain them via rpc to a trusted system service (which ensures the app has been granted permission to do this and tracks usage for accountability to user/developer). Brian --
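The overlapping handoff described above can be sketched as a small userspace simulation. This is only an illustration under stated assumptions: `WakeLocks` is a hypothetical stand-in for named suspend blockers (not the Android API), a POSIX pipe plays the role of the input device, and the driver-side steps are simulated in the same process. The point it shows is that the userspace lock is taken before the read, so it overlaps the driver's constraint and suspend is never permitted while the event is in flight.

```python
import os
import select


class WakeLocks:
    """Hypothetical stand-in for a set of named suspend blockers."""

    def __init__(self):
        self.held = set()

    def acquire(self, name):
        self.held.add(name)

    def release(self, name):
        self.held.discard(name)

    def may_suspend(self):
        return not self.held


def run_handoff(payload=b"key-press"):
    """Simulate one event travelling from the driver to the input service."""
    locks = WakeLocks()
    trace = []
    r, w = os.pipe()  # the read end stands in for an input device node
    try:
        # Driver side: an event is queued, so the driver blocks suspend.
        locks.acquire("input-driver")
        os.write(w, payload)
        trace.append(("queued", locks.may_suspend()))

        # Input service: wait in select(), then lock BEFORE reading.
        readable, _, _ = select.select([r], [], [], 1.0)
        if not readable:
            raise RuntimeError("no event became readable")
        locks.acquire("input-service")
        data = os.read(r, 4096)
        # Queue drained: the driver can drop its constraint, because the
        # service's lock already overlaps it.
        locks.release("input-driver")
        trace.append(("read", locks.may_suspend()))

        # Pass the event on, then release.
        handled = data.decode()
        locks.release("input-service")
        trace.append(("done", locks.may_suspend()))
    finally:
        os.close(r)
        os.close(w)
    return handled, trace
```

Taking the lock after `os.read()` instead would open a window in which neither side holds a lock, which is the race the overlap is meant to close.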
That's why I suggested "manyana" (can't get accents for mañana in a define) or perhaps "dreckly"[1]. They are both words that mean "at some point" but in a very very vague and 'relax it'll happen eventually' sense. More importantly it's policy. It's a please meet this constraint guide Cool. I think they are or at least they are close enough that nobody will I'm not sure avoided is the right description - it's there in all its identical ugliness in wakelock magic If you treat QoS guarantees as a wakelock for your purposes (which is just fine, drivers and apps give you policy, you use it how you like) then you could write the paragraph below substituting the word 'guarantee' for 'wakelock' So in that sense the mess is the same because in both cases you are trying to suspend active tasks rather than asking The conventional PC model is 'we don't go back into sleep proper fast enough for that race to occur'. It's hard to see how you change it. An app->device "thank you for that event, I enjoyed it very much and have finished with it" message moves the underlying event management and QoS Clearly that would continue to work out. Alan [1] Dreckly being used in Cornwall, as one friend put it 'Like mañana but without that dreadful sense of urgency' --
This is the same as saying these two threads don't run often enough to need a mutex around their critical section. Just because you have not If each layer prevents suspend while it knows there are pending events Yes you can do this, and it is how the android alarm driver works, but we found the select()/poll(), block suspend, read event, process event then unblock suspend sequence cleaner (especially for interfaces that can return more than one event at a time). Kernel suspend blocker lets you implement the alarm driver model, adding user-space suspend -- Arve Hjønnevåg --
From my reading of this thread, there's a lot of overlap between suspendblockers and constraints. Many use cases are served equally well with one or the other, except for one: a case where an event that should ultimately wake the system triggers a code execution path (or data flow path) that wanders through a user-space full of complex interacting processes where the kernel (and maybe even the processes) can't see it. Suspend-blockers in user-space handle this by making such code/data paths visible to the kernel. An all-kernel constraint-based approach has no way to see the user-space paths, so the system will end up trying to sleep when it should be waking up. Wait, what? Surely all the user-space code handling such events is running under a PM-QoS constraint that says "don't sleep if this process is runnable," so the system won't go to sleep. Presumably all other processes which don't handle wakeup events will be running under a PM-QoS constraint that says "do sleep even if this process is runnable." That's true, except for one common case: a process is drawing things on the display on behalf of other processes, and that drawing process can't have the "don't sleep" constraint because if it did the system would seem to be continuously busy and never go to sleep. Any process that is handling a critical event but also needs to talk to the display process will end up being not-runnable, and the system may go to sleep before the display process wakes up. So we need another PM-QoS constraint that says "don't sleep even if this process isn't runnable, because some *other* runnable process might do something that makes our critical process runnable again." The critical event handling app would switch to this PM-QoS constraint until it had received an ack from whatever it talked to in user-space, then switch back to the "don't sleep if this process is runnable" state until a new event comes in. So, three constraint policies should do it (*): 1. Do sleep even if ...
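The three constraint policies discussed in the message above can be modelled in a few lines. The sketch below is only an illustration of the prose: the policy names, the `Proc` record and `may_sleep()` are hypothetical and do not correspond to any existing PM-QoS interface. It reproduces the display-server scenario, where an event handler blocked on another process is not protected by the "don't sleep if runnable" policy alone.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    """Hypothetical names for the three policies described in the text."""
    SLEEP_EVEN_IF_RUNNABLE = 1  # e.g. the display process, ordinary apps
    AWAKE_WHILE_RUNNABLE = 2    # wakeup-event handlers while they run
    AWAKE_EVEN_IF_BLOCKED = 3   # handler waiting for an ack from elsewhere


@dataclass
class Proc:
    name: str
    policy: Policy
    runnable: bool


def may_sleep(procs):
    """Return True if no process's constraint forbids sleeping right now."""
    for p in procs:
        if p.policy is Policy.AWAKE_EVEN_IF_BLOCKED:
            return False
        if p.policy is Policy.AWAKE_WHILE_RUNNABLE and p.runnable:
            return False
    return True


# The handler has sent a request to the display process and is now blocked.
display = Proc("display", Policy.SLEEP_EVEN_IF_RUNNABLE, runnable=True)
handler = Proc("handler", Policy.AWAKE_WHILE_RUNNABLE, runnable=False)
sleeps_too_early = may_sleep([display, handler])

# Switching the handler to the third policy until the ack arrives fixes it.
handler.policy = Policy.AWAKE_EVEN_IF_BLOCKED
stays_awake = not may_sleep([display, handler])
```

In this model the third policy behaves much like holding a suspend blocker across the wait, which is the overlap between the two approaches that the message points out.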
If using suspend-blockers, Please explain to me how: - I will avoid the cpu going into some idle state for which the wakeup latency is larger than my RT app fancies? - to avoid some tasks from being serviced by the filesystems whilst others are? (ionice on steroids). - does my sporadic task (with strict bandwidth budget) not suffer bandwidth inversion? suspend blockers do a bit of each of that, but none of it in a usable fashion. --
Oops, I apparently meant "many use cases *of suspendblockers* are served ...though I'd think you could do that by holding a suspendblocker, thus preventing the CPU from going into any idle state at all. There are four likely outcomes, corresponding to inclusion or non-inclusion of suspend blockers and PM constraints in the kernel. Both could coexist in the same kernel, since a suspend blocker can be trivially expressed as "an extreme PM constraint with other non-constraint-related semantics." It's the "other non-constraint-related semantics" that seem to be the contentious issue. What can a suspend blocker do that a PM resource constraint cannot do? If that set contains at least one useful use case, then we need either suspend blockers, or some other thing that provides for the use case. Lots of people want PM constraints, and I haven't seen anyone suggest there should *not* be PM constraints in the kernel some day. I've seen a few "working and useful PM constraints aren't going to happen any time soon" statements, and several "there's lots of stuff you still can't do with PM constraints or suspend blockers" statements, but those aren't arguments *against* PM constraints or *for* suspend blockers. --
Without the clear description of the experiments, that statement proves just nothing other than your applications work better with your model, but I would expect that to be so without any experiments at all. ~Vitaly --
On Fri, 28 May 2010 12:41:23 +0100 Maemo has battery management applications. Right now they show you what is going on but haven't gone to a pop-up 'XYZ is eating all your battery' kill it behaviour. The information is there. If my phone eventually becomes a 1GB RAM PC class system I will be running PC class apps on it and I will be migrating virtual machines to and from my phone which have no idea about the device properties of each device they migrate to and from. Be that as it may the question of how you manage a naughty app is a good one. Historically we've managed them for network abuse, memory abuse, cpu use abuse, access rights, but not yet power. Whether that looks like setrlimit(pid, LIMIT_CHARGE, 150mWH); or setrlimit(pid, LIMIT_POWER, 150mW); or something else is the question. I rather like the above but I don't see how to implement them nicely at the moment. Alan --
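As a thought experiment only, the charge-limit idea above could be accounted for as follows. Nothing like `LIMIT_CHARGE` exists in setrlimit(); `EnergyBudget` is a hypothetical class, and the sketch assumes something else can attribute an average power draw to a process over an interval, which is the hard part the message says is unsolved.

```python
class EnergyBudget:
    """Hypothetical per-process energy allowance, in milliwatt-hours."""

    def __init__(self, limit_mwh):
        self.limit_mwh = limit_mwh
        self.used_mwh = 0.0

    def charge(self, power_mw, seconds):
        """Account for `power_mw` drawn over `seconds`.

        Returns True while the process is still within its budget.
        """
        self.used_mwh += power_mw * seconds / 3600.0
        return self.used_mwh <= self.limit_mwh


budget = EnergyBudget(limit_mwh=150)
within = budget.charge(power_mw=300, seconds=1800)  # uses exactly 150 mWh
over = not budget.charge(power_mw=1, seconds=3600)  # one more mWh exceeds it
```

What the system does once the budget is exhausted (freeze, throttle, notify the user) is a separate policy question that this sketch leaves open.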
Either way, this will require a detailed model of the system in terms of latency, throughput, current consumption and heat generation. Which can be provided only by the HW manufacturer. But, should such model be available (and we have some form of it for the OMAP3 in N900), then it can be abstracted through generic interfaces, which accept constraints and produce the selected target state (typically a vector of states for each sub component). igor --
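A generic interface of the kind described above, accepting constraints and producing a vector of per-component states, might look like the following sketch. Everything in it is assumed for illustration: `select_states` is a hypothetical function, and the component names, latencies and power figures in `EXAMPLE_MODEL` are made-up numbers, not data for the OMAP3 or any real hardware.

```python
# Made-up platform model: component -> [(state, wakeup latency in
# microseconds, power in milliwatts)]. A real model would come from the
# hardware manufacturer.
EXAMPLE_MODEL = {
    "cpu": [("active", 0, 300), ("retention", 200, 20), ("off", 5000, 1)],
    "display": [("on", 0, 400), ("off", 100000, 0)],
}


def select_states(model, max_latency_us):
    """Pick, per component, the lowest-power state that meets its constraint.

    max_latency_us maps component names to the largest acceptable wakeup
    latency; components without an entry are unconstrained.
    """
    chosen = {}
    for component, states in model.items():
        limit = max_latency_us.get(component, float("inf"))
        allowed = [s for s in states if s[1] <= limit]
        if not allowed:
            raise ValueError(f"no state of {component} meets {limit} us")
        # Lowest power among the states that satisfy the latency bound.
        chosen[component] = min(allowed, key=lambda s: s[2])[0]
    return chosen
```

For example, `select_states(EXAMPLE_MODEL, {"cpu": 500})` keeps the CPU in retention while letting the unconstrained display turn off. A fuller model would also cover throughput, current and heat, as the message notes.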
Maybe. And perhaps the right solution in that case is to merge both, as opposed to "consign one to the outer darkness". And I think that's a decision Linus should make. I do hope we can come up with a better solution, eventually. But I do want to point out as a process point of view, we do have other alternates other than "spinning endlessly". -- Ted --
Those apps were from an experimental repository, which is not enabled by default in stock SW. Of course tools can be improved, but if someone decides to run sw which is clearly under heavy development, i see little point in complaining that it might not work as expected. igor --
Well, yes, if the company strategy is to have a walled garden ala the Apple iPhone App store, life is much simpler. But if the requirements mean that apps don't need preapproval, the requirements on the platform get harder. I think the take-home here is we have a requirement that the platform behave well even without someone screening the applications for the "default SW repository". --
No, the strategy is to try to merge commercial and community needs. We do support signed repositories. The community has control on the public one. Members are encouraged to help by alpha/beta testing apps that are under development. That's a wrong way to put it. By installing something on your phone you What it meant is totally different. Regardless how much effort you put into twisting it. It means that different repositories provide different levels of trust. As a Debian user, I don't blame anybody other than myself if something I've pulled from unstable or experimental breaks my system. Debian by default doesn't ship with either unstable or experimental enabled. And using suspend blockers doesn't really solve the problem of who to trust to take the block and who not. Or we'll have to have suspend-blockers-blockers and so on ... Like it or not, QoS and resource management - in some form - are needed to allow trusted applications to provide valuable feedback, while filtering requests from untrusted applications. You might want to add dynamic profiling and try to use some heuristic to have the system doing runtime evaluation of good vs bad applications, but still some discrimination mechanism will be required. igor --
Sorry, miswording: s/faster/less frequent/ I'm not convinced CPU activity LEDs help either, BTW. It only takes the CPU getting crowbarred out of idle for a tiny amount of time before you start impacting battery life, and if the crapplication is only doing it every 30-60 seconds or so, I doubt you'd see it on the LEDs... that sort of thing might be acceptable if you have a 1-3 pound battery, but maybe much less so if you have a battery which is cell-phone sized. -- Ted --
Ted As a PS to the previous email the situation has I think more choices than you portray. Given the need for various constraints imposed by drivers for things like RT it's entirely possible that a solution ends up being something like Kernel proper: Turn suspend block kernel API into an expression of constraints (or whatever else seems to work) Throw the user space in the bin Google: Use the constraints in a sledgehammer manner (hey it solves your problem in that form so why not) Patch in a private user space API That makes things much much easier as we don't risk getting a horribly broken API into the kernel that is hard to remove, while hopefully meaning it's rather easier for google to merge drivers and other code as well as to maintain a smaller patch set. --
Again, Alan, Thomas and myself don't argue against that, what we do however argue against is suspending running apps as a form of power management. If you were to read Alan's latest posts he clearly outlines how you can contain crappy apps. A combination of weakening QoS guarantees (delaying wakeups etc.), blocking on resources (delay servicing requests), monitoring resource usage (despite all that it's still not idle) and taking affirmative action (shoot it in the head). If we pose that a well behaved application is one that listens to the environment hints and idles when told to, we can let regular power management kick in and let deep idle states do their thing. If a bad application ignores those hints and manages to avoid getting blocked on denied resources, we can easily spot it and promote an attitude of violence toward it in the form of SIGXCPU, SIGSTOP, SIGTERM and SIGKILL, possibly coupled with a pop-up dialog -- much like we get today when we try to close a window and the app isn't responding. If we then also let the environment maintain a shitlist of crappy apps (those it had to take affirmative action against) and maybe set up a service that allows people to share their results, it provides an incentive to the app developers to fix their thing. How is this not working? --
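The escalation described above could be sketched roughly as follows. This is only an illustration of the idea, not anything proposed in the thread: the function name `next_action`, the `ignored_hints` counter and the one-signal-per-ignored-hint policy are assumptions made up for the example. Only the order of the signals comes from the message.

```python
import signal

# Escalation order as listed in the message above. The rule "one step per
# ignored hint" is a hypothetical policy chosen for this sketch.
ESCALATION = [signal.SIGXCPU, signal.SIGSTOP, signal.SIGTERM, signal.SIGKILL]


def next_action(ignored_hints, blocked_on_resource):
    """Return the signal to send to an app, or None if no action is needed.

    ignored_hints: number of consecutive "please idle" hints the app ignored.
    blocked_on_resource: True if the app is already stuck waiting on a denied
    resource, in which case it is not burning power and can be left alone.
    """
    if ignored_hints <= 0 or blocked_on_resource:
        return None
    step = min(ignored_hints, len(ESCALATION)) - 1
    return ESCALATION[step]
```

A policy manager in user space would call something like this each time it re-evaluates a task, and could add the task to its list of offenders whenever the result is not None.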
You seem to argue that android is not allowed to use suspend because the hardware we have shipped on can enter the same power state from idle. From my point of view, since we need to support suspend on some hardware we should be allowed to leverage this solution on the better I have not seen any suggestions for how to deal with all our interprocess dependencies when pausing a subset of processes. Without a solution to that we can only pause a subset of the processes we want These solutions do not allow us to use suspend. They may get us closer to the power consumption we get from suspend on the good hardware or even surpass it, but we still need suspend on some hardware, and we would get even better results by using these solutions in addition to suspend compared to using them instead of suspend. -- Arve Hjønnevåg --
Correct, I strongly oppose using suspend. Not running runnable tasks is not a sane solution. If current hardware can't cope, too friggin bad, get better hardware. Do not 'pause' processes and you don't have the problem, make them stop of their own accord or kill them if they don't listen. Who cares about ill-behaved apps anyway? But really, if you want a more detailed answer, you need to provide more detail on these problems. If you want to allow an untrusted app to provide a dependency for a trusted app, you've lost and I don't care. Not using suspend is exactly the point. As Alan has argued, propagating suspend blockers up into all regions of userspace will take much longer than fixing the hardware. You've got to realize this is about Linux as a whole, I really don't care one whit about the specific Android case. We want a solution that is generic enough to solve the power consumption problem and makes sense on future hardware. The only abstraction that really makes sense in that view is idle states. --
Look, this is getting into the realms of a pointless semantic quibble. The problem is that untrusted tasks need to be forcibly suspended when they have no legitimate work to do and the user hasn't authorised them to continue even if the scheduler sees them as runnable. Whether that's achieved by suspending the entire system or forcibly idling the tasks (using blocking states or freezers or something) so the scheduler can suspend from idle is something to be discussed, but the net result is that we have to stop a certain set of tasks in such a way that they can still receive certain external events ... semantically, this is equivalent to not running runnable tasks in my book. (Perhaps this whole thing is because the word runnable means different things ... I'm thinking a task that would consume power ... are you thinking in the scheduler R state?) Realistically, the main thing we need to do is stop timers posted against the task (which is likely polling in a main loop, that being the usual form of easy to write but power crazy app behaviour) from waking the task and bringing the system out of suspend (whether from idle or That's rubbish and you know it. We do software workarounds for hardware problems all the time ... try doing a git grep -i errata in arch x86, or imagine a USB subsystem that only supported sane standards conforming devices: that would have an almost zero intersect with the current USB device market. The job of the kernel is to accommodate hardware as best it can ... sometimes it might not be able to, but most of the time it does a pretty good job. The facts are that C states and S states are different and are entered differently. For some omap hardware, the power consumption in the lowest C state (with all the ancillary power control) is the same as S3, that's fine, suspend from idle works as well as suspend to ram modulo bad apps. For quite a lot of MSM hardware, the lowest C state power consumption is quite a bit above S3. 
It's not acceptable to ...
So what happens if your task is CPU bound and gets suspended and is holding a resource (lock, whatever) that is required by someone else that didn't get suspended? That's the classic inversion problem, and is caused by not running Why would we care about external events? Clearly these apps are ill behaved, otherwise they would have listened to the environment telling them to idle. Why would you try to let buggy apps work as intended instead of breaking them as hard as possible? Such policy promotes crappy code since people Sure, that same main loop will probably receive a message along the lines of, 'hey, screen is off, we ought to go sleep'. If after that it doesn't listen, and more serious messages don't get responded to, simply kill the thing. Again, there is no reason whatsoever to tolerate broken apps, it only promotes crappy apps. --
On Sat, 29 May 2010 20:12:14 +0200 The trick with the approach currently discussed (i.e. opportunistic suspend, if you missed it): We suspend the whole machine. And I really think, this is the only way to do it. It is a big hammer, If I have a simple shell script then I don't wanna jump through hoops just to please your fragile kernel. And before you judge code that does not behave to work with YOUR buggy kernel, I would think twice. This cuts both ways. Just because the problem is too hard for you, this does not excuse forcing crappy kernels on other people. I think you have a point in that it is _in general_ not easily possible to solve. But for this case this is clearly a simple, to the point and working solution for android based phones. I think this would be a possibility. And maybe even sane. But I also think this has nothing to do with suspend_blockers. They block This simply doesn't solve the problem. Cheers, Flo --
On Mon, 31 May 2010 22:12:19 +0200 Also why should that code on one device kill my uptime and on the other machine (my wall-plugged desktop) work just well? That doesn't sound right. Clearly opportunistic suspend is a workaround for battery-driven devices and no general solution. But it is not specific to android. At least not inherently. It could be useful for any embedded or mobile device where you can clearly distinguish important functions from convenience functions. I really can't understand the whole _fundamental_ opposition to this design choice. Cheers, Flo --
Sounds perfectly right to me; one code runs perfectly fine on one machine, and on the other doesn't even compile. Well, sure, it wasn't Yes, it could, but why go for the hacky solution when we know how to Nobody is using it, except Android. Nobody will use it, except Android. I have seen recent proposals that don't require changing the whole user-space. That might actually be used by other players. -- Felipe Contreras --
That's like saying "Android is not a legitimate user of the kernel". Is that Sure, an approach benefitting more platforms than just Android would be better, but saying that the kernel shouldn't address the Android's specific needs as a rule if no one else has those needs too is quite too far reaching to me. Rafael --
Well, if the android people keep rejecting all sensible approaches to power savings except their suspend blocker mess, then I don't see why we should support their ill designed mess. We should strive to provide an interface that can be used by all interested parties to conserve power; if Android really is the only possible user of the interface then I don't see any reason at all to merge it, they might as well keep it in their private tree. --
Well, I certainly would like the Android people to be more appreciative of our There is a number of kernel users that depend on Android user space (phone vendors using Android on their hardware, but providing their own drivers), so I don't think we really can identify Android with Google in that respect. Rafael --
I don't see why we can't merge the platform code and drivers without suspend blockers. Google can patch them back in on their side if they want to. --
Read the context: opportunistic suspend, which is considered a workaround, which requires new user-space API for suspend blockers, might be remotely considered for inclusion *if* it indeed solves a problem for battery-driven devices, which other parties also experience and could benefit from this solution. There are no Android specific needs, why should a certain user-space ecosystem need a certain API that somehow *nobody* else does? I think in this huge thread it has become obvious that people are reluctant about this idea... whatever problem Android user-space presents (I don't think there's any), it can be solved for "the rest of the world" too, and such a generic solution is worth exploring. -- Felipe Contreras --
Hi, again! My two mails were probably a bit pointless and not helping to find a solution. There are notable and useful approaches mentioned by Peter to the mitigation problem. It's just that it's not the one and only way to think about this. Just rants, Flo --
OK ... but if the options are running and S3 for the entire platform, then all tasks get suspended and this isn't a problem. This is why the current wakelock implementation on the android platform works flawlessly today. Inversion only becomes a problem if tasks get individually idled, so you can see that, from the android point of view, we're creating a problem which their implementation doesn't have. In this view, S3 suspend does look elegant: it solves the inversion problem by suspending everything, it controls rogue applications' power consumption and it gets certain hardware into a lower power state than is possible from suspend from idle. The inelegance of the S3 suspend solution is the requirement to use these suspend blockers through kernel and user space to get the whole thing up again to respond to an event, which is an inelegance suspend That's not a correct characterisation. Badly behaved apps from a power point of view can do useful things for the user. The object is to Actually, no, if this were a correct view, we wouldn't have the huge x86 hardware work around problem because we'd just be able to tell manufacturers of shoddy or badly standards compliant stuff where to stick it. The great strength of the x86 commodity platform revolution was the fact that the hardware became cheap, plentiful and outside the ambit of a single walled garden manufacturer. Its great weakness is integration problems and shoddy hardware. We tolerate the weakness because the strength vastly outweighs it: and toleration to us in the kernel means driver work arounds ... it also means that if a device doesn't work with the kernel, we get blamed (rather than the manufacturer). By the same token, the revolution in smart phones is driven in quite a large part by the provision of third party applications. This commodity app view is almost the direct software analogue of the commodity platform view that has been so successful in hardware; Therefore, fair play seems to ...
That, among other things, is why suspend uses the freezer which guarantees Do you realistically think that by hurting the _user_ you will make the _developer_ write better code? No, really. If the user likes the app very much (or depends on it or whatever makes him use it), he will rather switch the platform to one that allows him to run that app without _visible_ problems than complain to the developer, because _the_ _user_ _doesn't_ _realize_ that the app is broken. From the user's perspective, the platform that has problems with the app is broken, because the app apparently runs without problems on competing platforms. The whole "no reason to tolerate broken apps" mindset is simply misguided IMO, because it's based on unrealistic assumptions. That's because in general users only need the platform for running apps they like (or need or whatever). If they can't run apps they like on a given platform, or it is too painful to them to run their apps on it, they will rather switch to another platform than stop using the apps. Thanks, Rafael --
As an application writer, if my users complain that their battery is being drained (as it happened), they stop using it, and other people see there are problems, so they stop using it, if people get angry about it they will vote it down. New users will see it has a low score; they will not install it. That's a network effect. Yeah, right. I don't think anybody has ever bought an iPhone because of Tweetie. People care how the applications run on their phones, not how their phone's platform runs their favorite application, in fact, most probably it became their favorite application because it was running great on their phone, and they wouldn't expect it to run on phones with other platforms. Either applications run on S60, iPhone OS, Android, or Maemo, but not in a combination of those. And if there is a certain app that runs on multiple platforms, and the user actually knows that (probably a geek), then he knows he can't expect it to work You seriously think people switch high-end phones just to run their favorite apps? It's much cheaper to switch apps, and that's what users do. -- Felipe Contreras --
On Sat, 5 Jun 2010 20:16:33 +0300 That is nice. But how does it impact the problem that suspend blockers solve? And why do suspend blockers interfere with that? Cheers, Flo --
It doesn't, I don't know why people keep bringing up this argument, I just thought it should not be left open as a valid one. I should have mentioned that this is indeed irrelevant. -- Felipe Contreras --
On Sat, 5 Jun 2010 22:56:45 +0300 Uh! I found out how this is relevant to the suspend blockers case. Because not having users means that the bugs don't get fixed. Whereas in the suspend blockers case the users can use the app and get the bugs fixed. Cheers, Flo p.s.: I really wished you would focus more on solving the problem and not on dismissing it. --
Sure, and if x86 could wake from S3 on a keypress/mouse movement etc.. you could use S3 as idle state.. not sure people would love the wakeup-latency, but that's a QoS matter. But if there simply are no suitable wakeup sources from an idle state (S3 really is nothing more than a hardware idle state) then it might not be suitable for transparent idle modes and no amount of software hackery will solve that. So what I'm saying is, if your hardware can't generate the needed wakeup events, the auto-suspend stuff won't work either. If it can it can be Wth is MSM? But really, why can't existing hardware get shipped with existing hacks, and for future hardware that does behave we have a proper solution? --
That's an x86'ism which is going away. And that's really completely irrelevant for the mobile device space. Can we please stop trying to If you'd have read the answers from Alan carefully, then you'd have noticed that even x86 hardware is getting to the point where OMAP is today. i.e. support of transparent suspend from idle. If that wouldn't happen then x86 would be simply unusable for mobile devices. It's that easy. And we really do _NOT_ care about the existing laptop hardware which does not provide that because it's a lost cause. Not only due to the missing (or just disabled) wakeup sources, also due to the fact that you cannot do sensible power management by completely disabling clock and/or power of unused devices in the chipset. There is a damn good reason why the mobile space is _NOT_ x86 based at the moment. Thanks, tglx --
You're the one mentioning x86, not me. I already explained that some MSM hardware (the G1 for example) has lower power consumption in S3 (which I'm using as an ACPI shorthand for suspend to ram) than any suspend from idle C state. The fact that current x86 hardware has the So not at all interested in x86 at the moment. For MSM hardware, it looks possible to unify the S and C states by doing suspend to ram from idle but I'm not sure how much work that is. James --
On ARM, it's not rocket science and we have in tree support for this already (OMAP). I have done the same thing on a Samsung part as a proof of concept two years ago and it's really easy as the hardware is sane. Hint: It's designed for mobile devices :) Thanks, tglx --
We already enter the same power state from idle and suspend on msm. In the absence of misbehaving apps, the difference in power consumption is entirely caused by periodic timers in the user-space framework _and_ kernel. It only takes a few timers triggering per second (I think 3 if they do no work) to double the average power consumption on the G1 if the radio is off. We originally added wakelocks because the hardware we had at the time had much lower power consumption in suspend than idle, but we still use suspend because it saves power. -- Arve Hjønnevåg --
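A back-of-the-envelope model shows why a handful of wakeups per second can matter this much. It assumes a fixed energy cost per wakeup on top of a constant sleep-state draw, so the average power is P_avg = P_sleep + f * E_wake. The numbers in the usage note are made up to show the arithmetic and are not G1 measurements.

```python
def average_power_mw(sleep_mw, wakeups_per_s, energy_per_wakeup_mj):
    """Average power in mW for a device that sleeps between periodic wakeups.

    sleep_mw: power drawn in the low power state (mW).
    wakeups_per_s: timer-driven wakeups per second.
    energy_per_wakeup_mj: energy spent leaving and re-entering the low power
    state for one wakeup (mJ); mJ per second is mW, so the units add up.
    """
    return sleep_mw + wakeups_per_s * energy_per_wakeup_mj
```

With a hypothetical 6 mW sleep draw and 2 mJ per wakeup, `average_power_mw(6.0, 3, 2.0)` gives 12.0, i.e. three wakeups per second double the average. In this model doubling happens whenever f * E_wake equals P_sleep.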
So how do you differentiate between timers which _should_ fire and those you do not care about ? We have mechanisms in place to defer timers so the wakeups are minimized. If that's not enough we need to revisit. Thanks, tglx
Deferring the timers forever without stopping the clock can cause problems. Our user space code has a lot of timeouts that will trigger an error if an app does not respond in time. Freezing everything and stopping the clock while suspended is a lot simpler than trying to stop individual timers and processes from running. -- Arve Hjønnevåg --
And resume updates timekeeping to account for the slept time. So the only way to get away with that is to sleep under a second or just ignoring the update by avoiding the access to rtc. So how do you keep timekeeping happy ? Thanks, tglx
No, for the monotonic clock it does the opposite. The hardware clock is read on resume and the offset is set so the monotonic clock gets -- Arve Hjønnevåg --
Grr, yes. Misread the code. -ENOTENOUGHCOFFEE Thanks, tglx
Those machines can go from idle into S2RAM just fine w/o touching the /sys/power/state S2RAM mechanism. It's just a deeper "C" state, really. The confusion is that S3 is considered to be a completely different mechanism - which is true for PC style x86 - but not relevant for hardware which is sane from the PM point of view. Now some people think that suspend blockers are a cure for the existing x86/ACPI/BIOS mess, which cannot go to S3 from idle, but that's simply not feasible. Thanks, tglx --
As long as you can set a wakeup timer, an S state is just a C state with side effects. The significant one is that entering an S state stops the process scheduler and any in-kernel timers. I don't think Google care at all about whether suspend is entered through an explicit transition or something hooked into cpuidle - the relevant issue is that they want to be able to express a set of constraints that lets them control whether or not the scheduler keeps on scheduling, and which doesn't let them lose wakeup events in the process. -- Matthew Garrett | mjg59@srcf.ucam.org --
Exactly, so my understanding of where we currently are is:
1. pm_qos will be updated to be able to express the android suspend
blockers as interactivity constraints (exact name TBD, but
probably /dev/cpu_interactivity)
2. pm_qos will be updated to be callable from atomic context
3. pm_qos will be updated to export statistics initially closely
matching what suspend blockers provides (simple update of the rw
interface?)
After this is done, the current android suspend block patch becomes a
re-expression in kernel space in terms of pm_qos, with the current
userspace wakelocks being adapted by the android framework into pm_qos
requirements expressed to /dev/cpu_interactivity (or whatever name is
chosen). Then opportunistic suspend is either a small add-on kernel
patch they have in their tree to suspend when the interactivity
constraint goes to NONE, or it could be done entirely by a userspace
process. Long term this could migrate to the freezer and suspend from
idle approach as the various problem timers get fixed.
I think the big unresolved issue is the stats extension. For android,
we need just a name written along with the value, so we have something
to hang the stats off ... current pm_qos userspace users just write a
value, so the name would be optional. From the kernel, we probably just
need an additional API that takes a stats name or NULL if none
(pm_qos_add_request_named()?). Then reading the stats could be done by
implementing a fops read routine on the misc device.
Did I miss anything?
James
--
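The stats extension discussed in the plan above can be modelled in a few lines. This is a user-space toy, not kernel code: `QosClass`, `add_request_named` and `read_stats` are hypothetical names echoing the `pm_qos_add_request_named()` idea, and the two values `INTERACTIVE` and `NONE` and the "largest value wins" aggregation are assumptions made for the sketch.

```python
# Hypothetical two-level constraint: INTERACTIVE blocks suspend, NONE allows it.
NONE = 0
INTERACTIVE = 1


class QosClass:
    """Toy model of one pm_qos class with optional per-request stats names."""

    def __init__(self):
        self.requests = []   # each request is a dict owned by its caller
        self.stats = {}      # name -> number of times that request was set

    def add_request_named(self, name, value):
        """Register a request; name may be None for callers without stats."""
        req = {"name": name, "value": value}
        self.requests.append(req)
        self._count(req)
        return req

    def update(self, req, value):
        req["value"] = value
        self._count(req)

    def _count(self, req):
        if req["name"] is not None:
            self.stats[req["name"]] = self.stats.get(req["name"], 0) + 1

    def aggregate(self):
        """Strongest constraint currently requested (NONE if there are none)."""
        return max((r["value"] for r in self.requests), default=NONE)

    def read_stats(self):
        """What a read on the misc device might report, as a plain dict."""
        return dict(self.stats)
```

An opportunistic suspend policy, in kernel or user space, would then suspend only while `aggregate()` returns `NONE`.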
I think that's not been decided yet precisely enough. I saw a few ideas Is the original idea of having that information in debugfs objectionable? Rafael --
Well, android only needs two states (block and don't block), so that gets translated as 2 s32 values (say 0 and INT_MAX). I've seen defines like QOS_INTERACTIVE and QOS_NONE (or QOS_DRECKLY or QOS_MANANA) to describe these, but if all we're arguing over is the define name, that's progress. The other piece they need is the suspend block name, which comes with the stats API, and finally we need to decide what the actual constraint Well ... debugfs is usually used to get around the sysfs rules. In this case, pm_qos has a dev interface ... I don't specifically object to using debugfs, but I don't see any reason to forbid it from being a simple dev read interface either. James --
I think we need separate state constraints for suspend and idle low power modes. On the msm platform only a subset of the interrupts can wake up from the low power mode, so we block the use of the low power mode from idle while other interrupts are enabled. We do not block suspend however if those interrupts are not marked as wakeup interrupts. Most constraints that prevent suspend are not hardware specific and should not prevent entering low power modes from idle. In other words we may need to prevent low power idle modes while allowing suspend, and we may need to prevent suspend while allowing low power idle modes. It would also be good to not have an implementation that gets slower and slower the more clients you have. With binary constraints this is 4. It would be useful to change pm_qos_add_request to not allocate anything so we can add constraints from init functions that currently We don't currently have a dev interface for stats so this is not an immediate requirement. The suspend blocker debugfs interface is just as good as the proc interface we have for wakelocks. -- Arve Hjønnevåg --
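One way to read the point about binary constraints and scaling: if a constraint is only ever "held" or "not held", a counter is enough and nothing needs to be walked, however many clients there are. The sketch below is an assumption about how that might look, with a hypothetical `BinaryConstraint` class, and it keeps suspend and low power idle as two separate instances as the message argues.

```python
class BinaryConstraint:
    """A held / not-held constraint tracked with a counter (O(1) per change)."""

    def __init__(self):
        self.holders = 0

    def acquire(self):
        self.holders += 1

    def release(self):
        if self.holders == 0:
            raise ValueError("release without matching acquire")
        self.holders -= 1

    def blocked(self):
        return self.holders > 0


# Two independent constraints, so either one can be held without the other.
block_suspend = BinaryConstraint()
block_low_power_idle = BinaryConstraint()
```

A driver in the msm situation described above would hold `block_low_power_idle` while non-wakeup interrupts are enabled, without touching `block_suspend`.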
2010/6/1 Gross, Mark <mark.gross@intel.com>: The calling code will have to store a pointer to your structure anyway, you may as well have them provide the whole structure. -- Arve Hjønnevåg --
[mtg: ] duh! You are right. Make the callers hold the structure. It's been a long day. That would be easy to do. --mgross --
Well, as I said, pm_qos is s32 ... it's easy to make the constraint Well, that's an implementation detail ... ordering the list or using a btree would significantly fix that. However, the largest number of constraint users I've seen in android is around 60 ... that's not huge from a kernel linear list perspective, so is this really a concern? ... particularly when most uses don't necessarily change the constraint, so a Sure .. we do that for the delayed work queues, it's just an API which takes the structure as an argument leaving it the responsibility of the OK, great ... what actually exports the statistics is just an implementation detail. James --
No, they have to be two separate constraints, otherwise a constraint to block suspend would override a constraint to block a low power idle True. I think we also need timeout support in the short term though which is also somewhat simpler to implement in an efficient way for -- Arve Hjønnevåg --
Depends. If you block the system from going into low power idle, does
that mean you still want it to be fully suspended?
If yes, then we do have independent constraints. If not, they have a
hierarchy:
* Fully Interactive (no low power idle or suspend)
* Partially Interactive (may go into low power idle but not
suspend)
* None (may go into low power idle or suspend)
Which is expressable as a ternary constraint.
James
--
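The hierarchy above maps naturally onto an ordered enum where the strongest request wins. A minimal sketch, with names (`Interactivity`, `aggregate`, `may_suspend`, `may_low_power_idle`) invented for the example:

```python
from enum import IntEnum


class Interactivity(IntEnum):
    NONE = 0      # may go into low power idle or suspend
    PARTIAL = 1   # may go into low power idle but not suspend
    FULL = 2      # no low power idle and no suspend


def aggregate(requests):
    """The strongest outstanding request wins; no requests means NONE."""
    return max(requests, default=Interactivity.NONE)


def may_suspend(level):
    return level == Interactivity.NONE


def may_low_power_idle(level):
    return level != Interactivity.FULL
```

Note what this ordering cannot express: "low power idle blocked, suspend allowed". That combination is what the msm case raised earlier in the thread asks for, and it needs two independent constraints.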
On Wed, 02 Jun 2010 10:05:11 -0500 But unblocking suspend at the moment is independent of getting idle. If you have the requirement to stay in the highest-idle level (i.e. best latency you can get) that does not (currently) mean that you cannot suspend. To preserve that explicit fall-through while still having working run-time-powermanagement I think the qos-constraints need to be separated. <disclaimer: just from what I read> Provided you can reach the same power state from idle, current suspend could probably also be implemented by just the freezing part and a hint to the idle-loop to provide accelerated fall-through to lowest power. </disclaimer> At that point, you could probably merge the constraints. But the freezing part is also the hard part, isn't it? (I have no idea. Thomas seems to think about cgroups for that and doing something about the timers.) Cheers, Flo --
I don't understand that as a reason. If we look at this as qos constraints, you're saying that the system may not drop into certain low power states because it might turn something off that's currently being used by a driver or a process. Suspend is certainly the lowest state of that because it turns everything off, why would it be legal to drop into that? I also couldn't find this notion of separation of idleness power from suspend blocking in the original suspend block patch set ... if you can either tell me where it is, or give me an example of the separated use Um, well, as I said, I think using suspend from idle and freezer is longer term. I think if we express the constraints as qos android can then use them to gate when to enter S3 .. which is functionally equivalent to suspend blockers. And the vanilla kernel can use them to gate power states for the drivers in suspend from idle. James --
Because the driver gets called on suspend which gives it a chance to The suspend block patchset only deals with suspend, not low power idle modes. The original wakelock patchset had two wakelock types, idle and The i2c bus on the Nexus One is used by the other core to turn off the power to our core when we enter the lowest power mode. This means that we cannot enter that low power mode while the i2c bus is active, so we block low power idle modes. At some point we also tried to block suspend in this case, but this caused a lot of failed suspend attempts since the frequency scaling code would try to ramp up while freezing -- Arve Hjønnevåg --
OK, so this is a device specific power constraint state. I suppose it makes sense to have a bunch of those, because the device isn't necessarily going to know what idle power mode it can't go into, so the cpu governor should sort it out rather than have the device specify a minimum state. James --
On Wed, 02 Jun 2010 15:41:11 -0500
Hm. Maybe it is me who doesn't understand.
With proposed patchset:
1. As soon as we unblock suspend we go down. (i.e. suspending)
2. While suspend is blocked, the idle-loop does its thing. (i.e.
runtime power management -> can give same power-result as suspend)
possible cases:
1:
- qos-latency-constraints: 1ms, [here: forbids anything other than
C1 idle state.]
- suspend is blocked
2: - qos latency-constraints: as in 1
- suspend unblocked
3: - qos latency-constraints: infinity, cpu in lowest power state.
- suspend is blocked
4: - qos latency-constraints: infinity, cpu in lowest power state.
- suspend unblocked
in case 2 and 4 we would suspend, regardless of the qos-latency.
in case 1 and 3 we would stay awake, regardless of the qos-latency
constraint.
If only one constraint, then case 2 (or 3) wouldn't be possible. But it
is possible now.
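The four cases can be written down as a small decision function. This is only a sketch to make the point that the two constraints are independent; the type names, the pm_decide() helper and the deep-idle exit latency parameter are hypothetical and not taken from the patchset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not kernel code. */
#define LATENCY_NONE UINT32_MAX          /* "infinity": no latency constraint */

enum pm_action { PM_SUSPEND, PM_IDLE_SHALLOW, PM_IDLE_DEEP };

struct pm_constraints {
	uint32_t max_latency_us;         /* qos latency constraint */
	bool suspend_blocked;            /* is any suspend blocker held? */
};

/*
 * deep_exit_latency_us is an assumed platform number: the wakeup latency
 * of the deepest idle state.
 */
enum pm_action pm_decide(const struct pm_constraints *c,
			 uint32_t deep_exit_latency_us)
{
	assert(c != NULL);
	if (!c->suspend_blocked)
		return PM_SUSPEND;       /* cases 2 and 4: latency is ignored */
	if (c->max_latency_us < deep_exit_latency_us)
		return PM_IDLE_SHALLOW;  /* case 1: stay awake, shallow idle only */
	return PM_IDLE_DEEP;             /* case 3: stay awake, deepest idle */
}
```

With a single constraint, pm_decide() could not return PM_SUSPEND while a 1ms latency request is active (case 2), nor stay out of suspend while no latency request exists (case 3).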
A possible use case as an example?
(hmm... i'm trying my imagination hard now):
Your sound needs low latency, so that could be a cause for the
qos-latency constraint.
And unblocking suspend could nonetheless happen:
For example... you have a firefox open and don't want to
prevent suspend for that case when the display is turned off.
Cheers,
Flo
--
[mtg: ] This has been a pain point for the PM_QOS implementation. They change the constraint back and forth at the transaction level of the i2c driver. The pm_qos code really wasn't made to deal with such hot path use, as each such change triggers a re-computation of what the aggregate qos request is. We've had a number of attempts at fixing this, but I think the proper fix is to bolt a "disable C-states > x" interface into cpu_idle that bypasses pm_qos altogether. Or, perhaps add a new pm_qos API that does the equivalent operation, overriding whatever constraint is active. --mgross --
> [mtg: ] This has been a pain point for the PM_QOS implementation. They change the constraint back and forth at the transaction level of the i2c driver. The pm_qos code really wasn't made to deal with such hot path use, as each such change triggers a re-computation of what the aggregate qos request is.
That should be trivial in the usual case because 99% of the time you can hot path:
- the QoS entry changing is the latest one
- there have been no other changes
- if it is valid I can use the cached previous aggregate I cunningly saved in the top QoS entry when I computed the new one
We need some of this anyway for deep power saving because there is hardware which can't wake from some states, which in turn means if that device is active we need to be above the state in question. --
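A minimal sketch of that hot path, assuming the aggregate is a minimum over latency requests. Everything here (struct qos_list, qos_push(), qos_update(), the slow_recomputes counter) is made up for illustration and is not the pm_qos API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QOS_MAX_REQ  8
#define QOS_NO_LIMIT INT32_MAX

/* Hypothetical request list; the newest request sits at req[n - 1]. */
struct qos_list {
	int32_t req[QOS_MAX_REQ];
	size_t n;
	int32_t agg_below_top;     /* cached min of every request below the top */
	unsigned slow_recomputes;  /* how often the list had to be walked */
};

static int32_t min32(int32_t a, int32_t b) { return a < b ? a : b; }

void qos_push(struct qos_list *q, int32_t v)
{
	assert(q->n < QOS_MAX_REQ);
	/* Everything below the new top is the old top plus the old cache. */
	q->agg_below_top = q->n ? min32(q->agg_below_top, q->req[q->n - 1])
				: QOS_NO_LIMIT;
	q->req[q->n++] = v;
}

void qos_update(struct qos_list *q, size_t idx, int32_t v)
{
	assert(idx < q->n);
	q->req[idx] = v;
	if (idx == q->n - 1)
		return;            /* hot path: latest entry, cache still valid */
	/* Slow path: an older entry changed, rebuild the cache. */
	q->agg_below_top = QOS_NO_LIMIT;
	for (size_t i = 0; i + 1 < q->n; i++)
		q->agg_below_top = min32(q->agg_below_top, q->req[i]);
	q->slow_recomputes++;
}

int32_t qos_aggregate(const struct qos_list *q)
{
	if (q->n == 0)
		return QOS_NO_LIMIT;
	return min32(q->agg_below_top, q->req[q->n - 1]);
}
```

A driver that toggles its own, most recently added request per transaction never walks the list in this model; only a change to an older request pays for the recomputation.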
Why would the kernel change the QoS state of a task? Why not have two interacting QoS variables, one for the task, one for the subsystem in Right, and I can imagine that depending on the platform details and not the device details, so we get platform hooks in the drivers, or possibly up in the generic stack because I don't think NICs actually know if there are open connections. --
Yes, having a QoS parameter per-subsystem (or even per-device) is very important for SoCs that have independently controlled powerdomains. If all devices/subsystems in a particular powerdomain have QoS parameters that permit, the power state of that powerdomain can be lowered independently from system-wide power state and power states of other power domains. Kevin --
This seems similar to that pm_qos generalization into bus drivers we were waving our hands at during the collab summit in April? We never did get into meaningful detail at that time. --mgross --
The hand-waving was around how to generalize it into the driver-model, or PM QoS. We're already doing this for OMAP, but in an OMAP-specific way, but it's become clear that this is something useful to generalize. Kevin --
Do you have a pointer to the source and description? It might be useful to look at to do a reality check on what we're talking about. James --
Hi Kevin, Mark, all, Yes, from our brief discussions at ELC, and all the ensuing discussions that have happened in the last few weeks, it certainly seems like a good time to think about: - what is a good model to tie up device idleness, latencies, constraints with cpu idle infrastructure - extensions to PM_QOS, part of what is being discussed, especially Kevin's earlier mail about QOS parameter per subsystem/device that may have independent clock/power domain control. - what is a good infrastructure to subsequently allow platform-specific low power state - extensions to cpuidle infrastructure to allow platform-wide low power state? Exact conditions for such entry/exit into low power state (latency, wake, etc.) could be platform specific. Is it a good idea to discuss about a model that could be applicable to other SOCs/platforms as well? Thanks Rajeev -----Original Message----- From: linux-pm-bounces@lists.linux-foundation.org [mailto:linux-pm-bounces@lists.linux-foundation.org] On Behalf Of Kevin Hilman Sent: Thursday, June 03, 2010 10:28 PM To: Gross, Mark Cc: Neil Brown; tytso@mit.edu; Peter Zijlstra; felipe.balbi@nokia.com; LKML; Florian Mickler; James Bottomley; Thomas Gleixner; Linux OMAP Mailing List; Linux PM; Alan Cox Subject: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8) --
I think there is definitely a need for QoS parameters per-device. I've been pondering how to incorporate this concept into runtime_pm. One idea would be to add pm_qos-like callbacks to struct dev_pm_ops, e.g. runtime_pm_qos_add/update/remove_requirement(). Requirements would be passed up the tree to the first parent that cares, usually a bus driver. Is this similar to what you guys were discussing at the collab summit? Thanks. - Bryan --
It's not just the list based computation: that's trivial to fix, as you say ... the other problem is the notifier chain, because that's blocking and could be long. Could we invoke the notifier through a workqueue? It doesn't seem to have veto power, so it's pure notification, does it? James --
On Thu, 03 Jun 2010 08:24:31 -0500 I think schedule_work() (workqueue.h) can take care of that. That's how the rfkill subsystem does it. Cheers, Flo --
[mtg: ] true. The notifications "could be" done as a scheduled work item in most cases. I think there is only one user of the notification so far anyway. Most pm_qos users do a poll of the current value for whatever parameter they are interested in. --
It depends on the information type and for a lot of things we might get away without notifiers. The only real issue is when you need to get other cores out of their deep idle state to make a new constraint work. That's what we do with the DMA latency notifier right now. Thanks, tglx --
But the only DMA latency notifier is cpuidle_latency_notifier. That looks callable from atomic context, so we could have two chains: one atomic and one not. The only other notifier in use is the ieee80211_max_network_latency, which uses mutexes, so does require user context. James --
This is all nice but, all this does is implement the exact same thing as the wake lock / suspend blocker API as a pm_qos request-class. It leaves the overlapping constraint issue from ISR to user mode in place depending on exactly how the opportunistic suspend is implemented. I expect it will be via a notifier on the pm_qos request-class update that would do exactly what the wake lock code does today. Just load up a "suspend_on_non_interactivity" driver that registers for the callback, have it enabled by the user mode PM, and you have the equivalent architecture to what was proposed by the wake lock patches. It gives the Android guys what they want, without adding a new subsystem, minimizing the changes and makes most of the architecture much more politically acceptable. But doesn't it have the same issues with getting the overlapping constraints right from wake up source to user mode and dealing with the wake up events in a sane way? Instead of sprinkling suspend-blockers about the kernel we'll sprinkle pm_qos_requests about. I like getting I don't think the status would be a big deal to add. However; I am really burned out by this discussion. I am willing to stub this out ASAP if it puts this behind us if the principles in the discussion are in more or less agreement. --mgross For the record, I still like my low power event idea, which could coexist with the above. --
if the vanilla kernel is simply consuming the pm_qos infrastructure and using suspend from idle, this is irrelevant. As I said, S3 suspend *can* be implemented via a suspend manager process from userspace (the alan stern proposal). However, if I were coding the android kernel, I'd do it as a tiny add on kernel patch. The main goal of making the android kernel close enough to the vanilla kernel for there not to be two separate upstreams for the device driver writers has been achieved Suspend from idle doesn't have the wakeup problem. it only manifests if you want to take the system down via the S states. I think long term, making suspend from idle work for all hardware is the agreed goal, even if android can't implement it today and has to use an S state work The proposal is isomorphic to what I said above ... just s/pm_qos/whatever the lp API is/ James --
That's wrong. You only need the explicit dynamic QoS constraints for
applications which follow the scheme:
	while (1) {
		if (event_available())
			process_event();
		else
			do_useless_crap_which_consumes_power();
	}
which need the following annotation:
	while (1) {
		block_suspend();
		if (event_available()) {
			process_event();
			unblock_suspend();
		} else {
			unblock_suspend();
			do_useless_crap_which_consumes_power();
		}
	}
Plus the kernel counterpart of drivers which take the suspend blocker
in the interrupt handler and release it when the event queue is empty.
So that's done for making polling event handling power "efficient".
Even worse, you need the same "annotation" for non polling mode and it
enforces the use of select() because you cannot take a suspend blocker
across a blocking read() without adding more invasive interactions to
the kernel..
So the "sane" app looks like:
	while (1) {
		select();
		block_suspend();
		process_events();
		unblock_suspend();
	}
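For reference, one iteration of that select()-based loop could look roughly like the sketch below. block_suspend() and unblock_suspend() are hypothetical stand-ins implemented as counters so the example compiles; they are not the interface from the suspend block patches, and wait_and_process() is an invented name.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical stand-ins for the suspend blocker calls. */
int blocker_depth;      /* > 0 means suspend is blocked */
int blocker_acquires;   /* how many times the blocker was taken */

void block_suspend(void)
{
	blocker_depth++;
	blocker_acquires++;
}

void unblock_suspend(void)
{
	assert(blocker_depth > 0);
	blocker_depth--;
}

/* One loop iteration: sleep in select(), then process under the blocker. */
ssize_t wait_and_process(int fd, char *buf, size_t len)
{
	fd_set rfds;
	ssize_t n;

	FD_ZERO(&rfds);
	FD_SET(fd, &rfds);

	/* No blocker is held while waiting, so the system may suspend here. */
	if (select(fd + 1, &rfds, NULL, NULL, NULL) < 0)
		return -1;

	block_suspend();
	n = read(fd, buf, len);   /* stands in for process_events() */
	unblock_suspend();
	return n;
}
```

The window between select() returning and block_suspend() being called is where the kernel counterpart matters: the driver has to hold its own blocker until the event queue has been drained.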
I'm really tired of arguing that this promotion of "programming style"
is the worst idea ever, so let's look how you can do the same thing
QoS based.
s/block_suspend()/qos(INTERACTIVE)/ and
s/unblock_suspend()/qos(NONE)/ and
s/block_magic()/qos_magic()/ in the drivers.
Yes, it's mostly the same, with a subtle difference:
While android can use it in the big hammer approach to disable the
existing user initiated suspend via /sys/power/state, the rest of the
world can benefit as well in various ways.
- Sane applications which use a blocking event wait can be handled
with a static QoS setting simply because a blocking read relies on
the QoS state of the underlying I/O system.
- Idle based suspend as the logical consequence of idle states is
just a matter of QoS constraint based decisions.
- Untrusted apps can be confined in cgroups. The groups are set to
...I generally agree. I think Alan Stern's recent proposal goes along these lines, but it has the advantage of being a bit more specific. ;-) Thanks, Rafael --
Yes, Alan Stern's proposal is going into that direction and I'm not opposed. Just wanted to get the overall picture straight for James :) Thanks, tglx --
So this is the re-expression in terms of a QoS API that I mentioned ... as I said, I think it's the way forwards. (from the android point of view, it keeps the user space expression in exactly the same place as the original wake locks, or suspend blocks, which is why it looks like a I understand this ... it's effectively the alan stern approach. I've Yes, which is why I think something like this can be made to work ... I don't really see that we differ on the broad brush picture. As long as the acceptable implementation accomplishes what everyone wants, I think we're home. --
I think that the suspend block model can be viewed as a constraints problem (similar to some of the things you've been sketching out in these threads), but I think we (Google/Android) view it as more of a state constraint (don't enter suspend) than a latency constraint. We think there's a need for these constraints both from the driver side and userspace side, and that these constraints are not tied to processes (multiple entities in one process may have different constraints at different times or multiple processes may be working together to accomplish some goal under a single constraint -- at least both cases exist in the Android system as it ships today). The exact naming of the API is not terribly important to us. The first thing we spent a bunch of time discussing last summer when Arve first looked into sending wakelocks upstream was changing the name because many objected to "wakelock" for various reasons. Being able to have useful statistics (which drivers/processes/etc held which wakelock for how long, how many times, etc) is important to us. While we want to do the best we can in the face of poorly written apps, we also want to educate users and developers about which apps are contributing to their poor battery life -- so users can decide to uninstall an app if its usefulness does not justify its impact on battery life and application developers can be more aware of what the cost of their app is to endusers. As an example, http://frotz.net/misc/battery-stats-unplugged.txt contains a dump from the "battery service" aggregating wakelock usage, cpu usage, and sensor device usage of processes (#....: sections) on my phone the other day for a ~3 hour period. This data is presented visually to the enduser in a "what's using my battery" feature of the platform. "realtime" refers to wall clock time here and "uptime" refers to not-in-suspend execution time. Brian --
On Thu, 27 May 2010 21:55:26 -0700 Hi! Thinking about the issue a little more, this isn't really about trusted apps and not trusted apps. Or crapplications. The point is, that as soon as an app takes a suspend-blocker it becomes what is here referred to as a "trusted app". But just because it is then visible as consuming power in an official way. Android suspends (as in echo mem > /sys/power/state) whenever possible. It's as if there were a spring on the laptop lid, and if the user doesn't hold his grip on it, the thing closes. How does he hold his grip? The application registers a suspend-blocker for him. So, why not use something like idle/QOS with this? I can imagine to theoretically have a "latency requirement" where 0 means this application does not interact with the user. and != 0 means this application interacts with the user. ("latency requirement" doesn't quite get it, but it works for now) In android land, the default would be that every application has a latency-requirement of 0. And then everything (userland) that takes a suspend-blocker would be changed to take a "latency requirement != 0". Now, if the system interacts with the user ( i.e. there is a global latency requirement > 0, where "global latency requirement" is computed by the pm framework maxing over all the userland processes and the kernel side) everything has to run. So we also need to schedule things which specify a latency requirement == 0. This last thing means, that it has to be independent of the scheduler, doesn't it? I don't see how renaming suspend_blocker to set_pidle would not do something equivalent to this, but the bits are probably a bit scattered throughout the kernel. (Which I don't think is introduced by that patch set, but by the fact that suspend is currently not an idle state.) I can understand if there needs to be a good solution in the kernel from day 1. So, what would compose to a good solution? Here should probably the more experienced ...
Android does not only run on phones. It is possible that no android devices have ACPI, but I don't know that for a fact. What I do know is that people want to run Android on x86 hardware and supporting suspend I think existing laptops (and desktops) can benefit from opportunistic suspend support. If opportunistic suspend is used for auto-sleep after inactivity instead of forced suspend, the user space suspend blocker api will allow an application to delay this auto sleep until for instance a download completes. This part could also be done with a user-space IPC call, but having a standard kernel interface for it may make it more common. A less common case, but more critical, is RTC alarms. I know my desktops can wakeup at a specific time by programming an RTC alarm, but without suspend blockers how do you ensure that the system does not suspend right after the alarm triggered? I have a system that wakes up at specific times requested by my DVR application, but I cannot use this system for anything else unless I manually turn off the DVR application's auto-sleep feature. With suspend blockers and something like the android alarm driver, I could use this system for more than one application that have scheduled tasks and it would be more usable for interactive applications. -- Arve Hjønnevåg --
Sufficiently beneficial to justify putting all this stuff all over the kernel and apps ? That is a *very* high hurdle, doubly so when those vendors who have chosen to be part of the community are shipping phones This assumes you modify all the applications. That isn't going to happen. How do you know that isn't the correct behavior. My laptop behaves in that way if for example the battery is almost flat. Your suspend blocker would cause me to lose all my work with a flat battery. This is another example of why the application must not be the policy manager. In the normal case in the PC world outside of corner cases like flat batteries the answer is really simple. The laptop suspend to RAM on idle intervals set in the BIOS and the like are sufficient that progress will have been made before it considers going back to sleep again. Right now it's about ten seconds in each direction plus other costs (wear on LCD backlight, disc parking etc). Alan --
No it does not. You only have to modify the applications where you want If the inactivity timeout happens to expire at the same time as my alarm that would wake up the system to run my scheduled task if it was already suspended my scheduled task will not run when scheduled. How I'm not sure what you are trying to say here. Are you saying your laptop enters S3 from idle? -- Arve Hjønnevåg --
If I have an alarm set on my laptop it will wake up when the alarm goes off. Once it has woken up it will not go back to suspend (except for something like a battery event) until a timeout has elapsed that began when the laptop woke up. This in the laptop world solves the problem of making progress. On a laptop power budget, with laptop constraints on suspend (both physical cycle limits of hardware and performance) this works fine. If I suspend/resume my laptop every time I have a 30 second idle gap I will need a new laptop much sooner than makes me happy. I don't claim this is true for a typical mobile phone obviously. Alan --
Forced suspend is still supported. No new API is needed if you really I think you are missing the point. It works fine if the alarm caused the wakeup, but if you had just used your system and your inactivity timeout expired just as your alarm goes off, the alarm will not wake Then don't set your inactivity timeout to 30 seconds. I don't see how The only difference on the phone is that we have way more wakeup events which makes the race conditions more visible. The race exists on your laptop as well. -- Arve Hjønnevåg --
As far as I can tell (and it's an extremely hard situation to replicate), this is not true. My laptop sleeps and wakes straight back up. The following cannot occur on my laptop for simple idling
	Alarm
	Suspend
because the Alarm resets the suspend timer when it is delivered. The wake pins and wake logic also ensure that the sequence
	Suspend
	Alarm
always causes
	Suspend
	Alarm
	Suspend Finishes
It's very relevant because it means that considering current laptops is The number of events is I think only partly relevant. What matters is how long you wait between idle and suspending. The longer you wait the less potential you have to end up with an event successfully owned by an application you are not considering relevant to suspend. --
Userspace is about to write to /sys/power/state when it gets scheduled. Alarm delivery occurs at that instant. Kernel has no idea that it's about to go to sleep, so the driver handles things appropriately and clears the hardware state. Userspace gets scheduled, writes and the system suspends. The problem is that having userspace decide to initiate a suspend and then actually initiate a suspend isn't an atomic operation. -- Matthew Garrett | mjg59@srcf.ucam.org --
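The interleaving can be modelled in a few lines. This is a toy model, not kernel code: struct pm_model and all three functions are hypothetical, and the event counter is just one assumed way of making "decide" and "initiate" behave like a single step.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the race; none of these names are kernel APIs. */
struct pm_model {
	unsigned wake_events;   /* wakeup events delivered so far */
	bool suspended;
};

/* Driver side: the alarm is delivered and the hardware state is cleared. */
void deliver_alarm(struct pm_model *pm)
{
	assert(pm);
	pm->wake_events++;
}

/* Naive path: userspace decided earlier and now writes /sys/power/state. */
void suspend_naive(struct pm_model *pm)
{
	assert(pm);
	pm->suspended = true;   /* nothing notices the alarm in between */
}

/* Check-and-commit: refuse if any event arrived after the decision. */
bool suspend_if_unchanged(struct pm_model *pm, unsigned seen_events)
{
	assert(pm);
	if (pm->wake_events != seen_events)
		return false;
	pm->suspended = true;
	return true;
}
```

In the naive path an alarm delivered between the decision and the write is handled by the driver and then slept through; in the second path the stale snapshot makes the suspend attempt fail, and userspace has to decide again.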
Ok, let's try and produce something more concrete. The control groups may
be the wrong tool but we've got several such tools already.
Kernel involved
----------------
acquire: mark myself important (into cgroup important)
acquire(timeout) ditto, plus app timer/timeout handler
release: mark myself unimportant (into cgroup downtrodden)
All user
--------
isHeld: app implementation internal
setReferenceCounted: app implementation internal
In the idle manager [Android's own probably]
if (member of ignored cgroup && in user space)
ignore for idle purposes
In the Android code managing this [Android specific bits of
probably userspace]
mark downtrodden as ignored
mark downtrodden as not ignored
[Total kernel changes
Ability to mark/unmark a scheduler control group as outside of
some parts of idle consideration. Generically useful and
localised. Group latency will do most jobs fine (Zygo is correct
it can't solve his backup case elegantly I think)
Test in the idling logic to distinguish the case and only needed
for a single Android specific power module. Generically useful
and localised]
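The idle-manager test described above might be sketched as follows. The group names follow the mail; struct task, struct idle_policy and both functions are hypothetical and only model the decision, not the scheduler or cgroup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposal; not scheduler or cgroup code. */
enum task_group { GROUP_IMPORTANT, GROUP_DOWNTRODDEN };

struct task {
	enum task_group group;
	bool runnable;
	bool in_user_space;
};

struct idle_policy {
	bool downtrodden_ignored;   /* set by the Android-specific policy code */
};

/* Does this task keep the system out of the suspend-from-idle path? */
bool task_blocks_idle(const struct idle_policy *p, const struct task *t)
{
	assert(p != NULL && t != NULL);
	if (!t->runnable)
		return false;
	/* "member of ignored cgroup && in user space": ignore for idle purposes */
	if (t->group == GROUP_DOWNTRODDEN && p->downtrodden_ignored &&
	    t->in_user_space)
		return false;
	return true;
}

bool system_may_suspend(const struct idle_policy *p,
			const struct task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (task_blocks_idle(p, &tasks[i]))
			return false;
	return true;
}
```

acquire and release then reduce to moving the calling task between the two groups, and the UI manager flips downtrodden_ignored when the phone has been down for a while.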
So I put my phone down
The UI manager gets told the phone is 'down'
Ten seconds later it is still down
It marks the downtrodden group as 'ignored'
The idle logic goes
Nothing to run powersave
Still nothing
Ooh 0.3 seconds of nothing
Drop into suspend state
If I push the button we get an IRQ
We come out of power save
The app gets poked
The app may be unimportant but the IRQ means we have a new timeout of
some form to run down to idle
The app marks itself important
The app stays awake for 60 seconds rsyncing your email
The app marks itself unimportant
Time elapses
We return to suspend
If you are absolutely utterly paranoid about it you need the button
driver to mark the task it wakes back as important rather than rely on
time for response like everyone else. That specific bit is uggglly but
worst case its just a google ...I really don't like this.. Why can't we go with the previously suggested: make bad apps block on QoS resources or send SIGXCPU, SIGSTOP, SIGTERM and eventually SIGKILL? --
On Fri, 28 May 2010 14:30:36 +0200 Ok. Are you happy with the QoS being attached to a scheduler control group and the use of them to figure out what is what ? --
Up to a point, but explicitly not running runnable tasks complicates the task model significantly, and interacts with fun stuff like bandwidth inheritance and priority/deadline inheritance like things -- a subject you really don't want to complicate further. We really want to do our utmost best to make applications block on something without altering our task model. If applications keep running despite being told repeatedly to cease, I think the SIGKILL option is a sane one (they got SIGXCPU, SIGSTOP and SIGTERM before that) and got ample opportunity to block on something. Traditional cpu resource management treats the CPU as an ever replenished resource, breaking that assumption (not running runnable tasks) puts us on very shaky ground indeed. --
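The escalation order named above can be written as a small table. This is a sketch only: the signal order is the one listed in the mail, while escalation_signal() and the idea of counting ignored warnings are assumptions made for illustration.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Escalation order as listed in the mail; the policy around it is assumed. */
static const int escalation[] = { SIGXCPU, SIGSTOP, SIGTERM, SIGKILL };
#define ESCALATION_STEPS (sizeof escalation / sizeof escalation[0])

/*
 * Signal to send to a task that has already ignored `warnings_ignored`
 * requests to cease; anything past the end of the table stays at SIGKILL.
 */
int escalation_signal(size_t warnings_ignored)
{
	if (warnings_ignored >= ESCALATION_STEPS)
		warnings_ignored = ESCALATION_STEPS - 1;
	return escalation[warnings_ignored];
}
```

A policy daemon would pass the result to kill(2). Note that a stopped process does not act on SIGTERM until it is continued, so a real policy would probably need a SIGCONT in between; that detail is outside what the mail describes.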
Also, I'm not quite sure why we would need cgroups to pull this off. It seems most of the problems the suspend-blockers are trying to solve are due to the fact of not running runnable tasks. Not running runnable tasks can be seen as assigning tasks 0 bandwidth. Which is a situation extremely prone to all things inversion. Such a situation would require bandwidth inheritance to function at all, so possibly we can see suspend-blockers as a misguided implementation of that. So let's look at the problem, we want to be frugal with power, this means that the system as a whole should strive to do nothing. And we want to enforce this as strict as possible. If we look at the windowing thing, let's call it X, X will inform its clients about the visibility of their window, any client trying to draw to its window when it has been informed about it not being visible is wasting energy and should be punished. (I really wish the actual X on my desktop would do more of that -- it's utterly ridiculous that firefox keeps animating banners and the like when nobody can possibly see them) Clearly when we turn the screen off, nothing is visible and all clients should cease to draw. How do we want to punish dis-obedient clients? Is blocking them sufficient? Do we want to maintain a shitlist of iffy clients? Note that the 'buggy' client doesn't function properly, if we block its main event loop doing this, it won't respond to other events -- but as argued, it's a buggy app, hence it's per definition unreliable and we don't care. Next comes the interesting problem of who gets to keep the screen lit, I think in the above case that is a pure userspace problem and doesn't need kernel intervention. Can we apply the same reasoning to other resources, filesystems, network? For both of them it seems the main governing body isn't this windowing system, but the kernel (although arguably you could fully do it in middle-ware, just like X is that). But in both cases I think we can work with a QoS ...
On Fri, 28 May 2010 16:59:54 +0200 An interesting thought might be to add the costs of staying in a state versus going to a lower power state into consideration. If the system is busy doing stuff it would need to do anyway (today stuff that is guarded/annotated by the suspend blockers), the costs for not being in suspend have to be paid anyway. So it is opportune for processes to run. Even if they by themselves would not justify the system running. If instead nothing system-relevant has to be done, the costs of running anything non-relevant is the full amount of battery-life that could be saved by suspending + (some minor) running costs. Also if there is much work to do (many tasks) it's more likely that it's good to do the work. Something along the lines of: (amount of energy saved by being in suspend) / (number of tasks we would run if we weren't suspended) * some_parameter_for_this_tasks_importance (which falls clearly into scheduler-territory) And if this goes above some threshold we run it. But this isn't easily done in a robust way. Also it complicates things. Cheers, Flo --
I think this is a matter of what is regarded as a "runnable task". Some tasks may not even be regarded as runnable in specific power conditions, although otherwise they would be. Consider updatedb or another file indexing ... thing on a laptop. I certainly don't want anything like this to run and drain my battery, even if it has already been started when the machine was on AC power. Now, of course, I can kill it, but for that I need to notice that it's running and it presumably might have done some job already and it would be wasteful to lose it. It would be quite nice if that app was not regarded as runnable when the system was on battery power. In my view that's quite analogous to the Android situation, when they simply don't want some tasks to be regarded as runnable in specific situations. Rafael --
How will an ionice on steroids that will defer servicing IO when the IO system QoS limit doesn't meet the updatedb process's level is too low, not solve this? In that case the updatedb process will simply block on IO, will hence not be runnable and thus not drain your battery. --
It will only work for apps that use I/O, but there may be purely CPU-bound ones that need that kind of approach too. --
<- wakeup event that should be delivered to untrusted app arrives here At this point you may mark the downtrodden group as ignored between the untrusted app receiving the event and the untrusted app marking itself as important. To avoid this you need the UI manager to receive every (The cgroup has to have some awareness of suspend/resume so that it can The timeout-based nature means that if the application doesn't get scheduled for some reason (say there's heavy swap pressure - not likely in the embedded world, but an issue on laptop-type devices) the event may not be handled before you get back to sleep. I accept that this isn't likely to be a problem in the real world, but it does make this Not just the button driver. Every driver that generates wakeups. This gets difficult when it comes to the network layer, for instance, when the network driver has very little idea how the packet it just received The problem is that you still have a race, and fixing that race requires every event that could generate a wakeup to be proxied out to the policy manager as well. That's a moderate additional overhead. -- Matthew Garrett | mjg59@srcf.ucam.org --
The event wakes the device, the event itself means the kernel is doing bits so the kernel is active and we are not idled so we have a time before we will consider re-suspending. [If you accept that untrusted apps must be constrained then you can't allow one to mark itself important - or at least you can't listen to it I don't think so. The apps will get scheduled anyway when not suspended. The only reason they are not being scheduled is that the device is No. Every driver which generates wakeups which should wake an untrusted application. If network packets to untrusted applications should wake the box up then a simple background ping process left running is going to drain your battery and bugger your containment of the mess completely as you've just accepted an infinite supply of untrusted timed wakeup events I am not convinced at this point. If the app gets put into the important group by the driver then you don't need to poke a policy manager. This again moves us beyond containment because we just allowed an 'untrusted' app a way to be trusted - just as it might abuse a suspend blocker. If you accept untrusted apps can't be fixed (for example they could simply lose the event internally due to app code bugs) then the static case all looks pretty trivial. With a Meego hat on you'd dump all the stuff you didn't trust into a scheduler group and tell the suspend aspect of the idle choice to ignore it when the screen blanks. While you are at it you also get a free ticket to putting trusted rt apps into the 'and don't even C6' group. Alan --
Ok, I think I've misunderstood you. You're actually saying that only applications that are trusted to behave well are allowed to receive wakeup events? Yes, that makes implementation significantly easier. If that maps reasonably well onto the existing Android application space, it may even be an acceptable compromise. -- Matthew Garrett | mjg59@srcf.ucam.org --
To receive them in a manner that they are permitted to defer a suspend. There is no reason why bouncing cows shouldn't get to see an event, but there is always the minuscule possibility that we choose to suspend as it gets the event. That to me seems fine. Our starting basis was - Bouncing cows is not trusted Android's reaction was - We reserve the right to suspend bouncing cows whether it likes it or not The caveat becomes - Bouncing cows may get suspended then get an event when the phone wakes back up. So I might press "Moo" just before a suspend and get the noise when it resumes. Given the untrusted cows could respond to the event otherwise by blocking the suspend for as long as permitted with a suspend blocker or similar that seems no worse. In this case probably better [oof zap! as opposed to 60 seconds of 'event, no sorry got a cow to draw at 100% CPU'] As the app is untrusted we can't assume they would get suspend blockers right even if they had any. You can still be nice to the cows app and when the phone is put down send it a 10 second warning via dbus or Android equivalents. Your trusted call handling app can still request (by QoS or big hammers) that the phone does not suspend even if the app goes idle (because you have a wakeup latency QoS) A naïve trusted app will behave according to power management idling to suspend and get stopped A naïve untrusted app that is doing sane things will spend most of its life asleep and behave. --
May we somehow live without acquire(timeout)? This is the feature that can screw up a lot of things with very complicated debugging options. ~Vitaly --
I'm not sure "other people are shipping without them" is such a good metric, especially for scheduler features. For some reason (I have some ideas what it might be, but I won't speculate here) people don't like messing with the scheduler in mainline, even though there's a lot of special cases where a bit of messing with the scheduler (or replacing it outright) goes a long way toward qualitatively improving performance on some workloads. I'd love to have several more ways to have large classes of processes stop executing, and stay stopped, even though traditional Unix and mainline Linux would try to run them. I don't want to put knowledge of this into every application I run since there are literally thousands of them, and IMNSHO it's not even an application's responsibility to know this kind of thing. The "sort" program can't know what QoS to ask for in any sane system design. The best it can do is try to execute as hard as it can whenever the kernel lets it, and have some other application advise the kernel about how much or how little service (including cases like "no service at all") the sort program should get from the system. To choose a random example, I'd like a "duty cycle" constraint on process execution (i.e. a runnable task must execute between L and M ns per N ns interval--stealing slices from lower priority processes if it doesn't get enough and isn't blocked on I/O, and leaving the CPU idle even though the process is runnable if it gets too much). I usually want to apply this kind of limit to programs like Firefox, because Firefox is a) big enough that controlling it actually matters for power consumption, b) sensitive enough to user interaction latency that I want it to have fairly high CPU priority when it has something to do, and c) big and complex enough that I wouldn't want to try to adjust its behavior by modifying its source. Also, Firefox's behavior tends to be driven by the data it pulls from random web sites, over which I have no ...
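To make the "duty cycle" constraint above concrete, here is a minimal sketch of the per-interval decision it implies. It assumes the scheduler tracks how many nanoseconds the task has run in the current N ns interval; the function name, the parameter names and the returned labels are hypothetical and only illustrate the rule as described.

```python
def duty_cycle_decision(ran_ns, low_ns, high_ns, runnable):
    """Classify a task within the current accounting interval.

    ran_ns  : time the task has executed so far in this interval
    low_ns  : minimum it should get (L in the text above)
    high_ns : maximum it may get (M in the text above)
    """
    if not runnable:
        # Blocked on I/O or sleeping: the minimum does not apply.
        return "blocked"
    if ran_ns >= high_ns:
        # Over budget: leave the CPU idle even though it is runnable.
        return "throttle"
    if ran_ns < low_ns:
        # Under the minimum: may steal slices from lower priorities.
        return "boost"
    return "normal"
```

The policy that picks L, M and N for a program such as Firefox would live in some other application, which is the point being made above.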
(If there's a sane framework then we'll fix x86 to fit into it and will deal I really like the level of detail and care that went into suspend-blockers, and i think the Android solution is very mature in terms of functionality offered to users. In terms of bringing this depth of functionality and control to the upstream kernel, what do you think about Alan's QoS scheme, described in: <20100528001514.28e593ef@lxorguk.ukuu.org.uk> ? It's in essence suspend-blockers on steroids. It consists of two main components: - Unify the 'suspended' state into the regular chain of idle states, and create a single, coherent and transparent way we handle system idleness. - Give apps a QoS attribute that allows them to express how long they can afford to wait for a wakeup. (A downloading app would set it to, say, 50 msecs, and thus the kernel would automatically know which method of idleness is still achievable. If all currently running apps have a max(QoS) attribute of infinite, then the kernel can suspend for an unlimited amount of time.) AFAICS, and i have read through your suspend-blocker usecases, this should handle all the usecases you listed - and some more. (please yell if that's not so) Suspend-blockers are equivalent to: 'app sets idle QoS latency to 0 msecs'. (And on x86, for BIOS/CPU combos that allow it we can implement this scheme too.) Thoughts? Ingo --
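As an illustration of the scheme described above, the following sketch picks the deepest idle state that every app's wakeup-latency tolerance still permits. It is a toy model, not the kernel's PM QoS or cpuidle code: the state table, its latency numbers and the function name are made up, and treating the tightest (smallest) tolerance as the binding constraint is an assumption about the intended reading.

```python
import math

# Hypothetical idle states as (name, wakeup latency in ms), ordered from
# shallowest to deepest. The numbers are illustrative only.
IDLE_STATES = [
    ("running", 0.0),
    ("C1", 0.01),
    ("C6", 1.0),
    ("suspend", 500.0),
]


def deepest_allowed_state(tolerances_ms):
    """Return the deepest state whose wakeup latency fits every app.

    tolerances_ms holds, per app, how long it can afford to wait for a
    wakeup; math.inf means the app has no requirement at all.
    """
    # Assumption: the smallest tolerance is the one that binds.
    limit = min(tolerances_ms, default=math.inf)
    chosen = IDLE_STATES[0][0]
    for name, latency_ms in IDLE_STATES:
        if latency_ms <= limit:
            chosen = name
    return chosen
```

In this model a tolerance of 0 behaves like a held suspend blocker, and a system where every app reports math.inf is free to suspend.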
Tying the QoS attribute to apps does not work (all proposals I have seen have race conditions), but replacing every suspend blocker with a unique QoS object will work, since it is the same thing as what suspend blockers provide. I think replacing suspend blockers with artificial latency requirements is a bad idea though, since we use them to ensure a specific level of functionality (tasks, timers and interrupts operate normally). If we get a more generic constraint framework, suspend blockers may possibly be absorbed by this, but I think the current implementation is useful as is (it could even be useful to someone working on a generic constraints framework). -- Arve Hjønnevåg --
