Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

Previous thread: Congratulations... by INFO on Thursday, May 27, 2010 - 2:33 pm. (1 message)

Next thread: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8) by Alan Stern on Thursday, May 27, 2010 - 2:49 pm. (2 messages)
From: Alan Stern
Date: Thursday, May 27, 2010 - 2:38 pm

Okay, and the kernel never suspends.  We _are_ talking about a kind of

No, it won't.  That's the problem suspend blockers were meant to solve.  
The event winds up sitting in a kernel queue, the PM core doesn't know
about it (that's what I meant above -- the PM core doesn't know as much

It would be okay if that happened.  But once the event gets into the
kernel and the hardware IRQ source has turned off, there's nothing to

Agreed.  Badly behaved apps must not be allowed to block suspends.  As 
far as I'm concerned, we can ignore them.

Alan Stern

--

From: Alan Cox
Date: Thursday, May 27, 2010 - 3:08 pm

On Thu, 27 May 2010 17:38:03 -0400 (EDT)

No ? We are talking about just letting power management solve the whole

No - because we are not forcing the suspend. The app must go idle. If you
force the suspend of running processes then yes the entire thing goes

Read the discussion about how the race is avoided at the hardware level.
That race is I think not there and furthermore most drivers get it right
already.

There are several cases

1.	IRQ during app layer (ie policy in user space) asking
		applications to go passive

	- The event occurs, we undo the app layer policy, easy
	  (or app wakes process and we let it fall through)

2.	IRQ after the app layer quiesces its clients

	- The task wakes, the app layer won't see it - the app layer
	  allows suspend as an idle mode. Not a problem - the app is
	  running the cpu policy manager will see this and not suspend
	  until the app has been asleep for a bit. The app may well of
	  course tell the UI layer 'hey I want you back on' and it take
	  you back to the full on case.

3.	IRQ after kernel suspend begins

	- The driver will refuse the suspend, we don't suspend, we unwind
	  the resume so far, the app wakes, we propogate stuff back up to
	  user space whose policy manager unwinds its position

4.	IRQ after driver has done its final checks

	- Wake up lines are set
	- We suspend
	- We immediately get resumed
	- We follow the full resume path

This is I believe robust (and has been implemented on some non x86
boxes). It depends on not forcing running tasks into suspend. That is the
key.

Alan
--

From: Matthew Garrett
Date: Thursday, May 27, 2010 - 3:09 pm

We've already established that ACPI systems require us to force running 
tasks into suspend. How do we avoid the race in that situation?

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Alan Cox
Date: Thursday, May 27, 2010 - 3:23 pm

On Thu, 27 May 2010 23:09:49 +0100

Android phones do not have ACPI. Embedded platforms do not have ACPI. MID
x86 devices do not have ACPI.

I would imagine the existing laptops will handle power management limited
by the functionality they have available. Just like any other piece of
hardware.

Alan
--

From: Matthew Garrett
Date: Thursday, May 27, 2010 - 3:36 pm

It doesn't matter. Right now there's a race condition in terms of 
wakeup events on ACPI systems. What's your proposal for fixing that?

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Alan Cox
Date: Thursday, May 27, 2010 - 3:55 pm

On Thu, 27 May 2010 23:36:05 +0100

I see it as a different problem  - and one that seems to be minimally
pressing to most users jduging by the amount of noise it hasn't caused in
the past seven odd years.

This started because the Android people came to a meeting that was put
together of various folks to try and sort of the big blockage in getting
Android and Linux kernels back towards merging.

I am interested right now in finding a general solution to the Android
case and the fact it looks very similar to the VM, hard RT, gamer and
other related problems although we seem to have diverged from that logic.

I dont think it particularly useful to go off on a mostly unrelated wild
goose chase into ACPI land, especially one based on a premise of changing
all the apps when the hardware will end up fixed faster.

Alan
--

From: tytso
Date: Thursday, May 27, 2010 - 9:31 pm

Keep in mind, though, that a solution which is acceptable for Android
has to include making sure that crappy applications don't cause the
battery to get drained.  There seem to be some people who seem
adamently against this requirement.  From the Android folks'
perspective, this is part of what is required to have an open app
store, as opposed to one where each application has to be carefully
screened and approved ala the Apple iPhone App Store.

Maybe it would be acceptable if there were an easy way THAT A USER AND
NOT A DEVELOPER COULD USE ON A SMART PHONE to find the bad
application, but realistically, it's much better if the solution can
work well even in the face of crappy application.  Having interacted
with application programmers, I can assure you there are a lot of
crappy application programmers out there, and they vastly outnumber us
kernel developers.  (See as exhibit A all of the application programs
who refuse to use fsync, even though it's going to wipe them out on
all new modern file systems, including btrfs.)

We need to agree on the requirements up front, because otherwise this
is going to be a waste of everyone's time.

And if we can't get agreement on requirements, I'd suggest appealing
this whole thing to Linus.  Either he'll agree to the requirements
and/or the existing implementation, in which case we can move on with
our lives, or he'll say no, in which case it will be blately obvious
that it was Linux developer community who rejected the Android
approach, despite a fairly large amount of effort trying to get
something that satisfies *all* of the various LKML developers who have
commented on this patch, and we can continue with Android having
kernel which is different from mainline --- just as many other
embedded companies have patches which are utterly required by their
products, but which have been judged Too Ugly To Live In Mainline ---
and we can also move on and get on with our lives.

						- Ted

P.S.  Keep in mind how this looks from an ...
From: Alan Cox
Date: Friday, May 28, 2010 - 2:37 am

Ted if you are speaking for Android do you think you should post from a

The other vendors appear to be managing nicely without magic blockers. I


The existing implementation has been comprehensively rejected by half the
x86 maintainers and scheduler people to start with. That's a fairly big

Ted save the politicing and blame mongering for management meetings
please.

If we don't have a solution it means that between us we couldn't find a
viable solution. Maybe there isn't one, maybe we missed it. It's as much
'google rejects kernel approach' as 'kernel rejects google approach' and
more importantly its actually 'we (cumulative) were not smart enough

In some cases it is easier to do stuff yourself than work with others.
One of the conditions of working in a public space is that you do so
without harming others. This is why in much of the western world you can
drive a car around your own land without a licence but must have one to
drive on a public road. This is why a restuarant must meet different food
standards to a home kitchen. This is why the kernel standards are higher
than what you go off and do in private.

Android is a very unique and extremely narrow environment. If it really
is special enough to need its own kernel fork it isn't the first case for
that and it's not a problem. The GPL is quite happy to encourage this.
Time will then answer the questions because in 3 years time either every
non Google phone will be kicking butt without suspend blockers, or every
phone vendor using Linux with a traditional user space will be demanding
them.

Alan
--

From: Matthew Garrett
Date: Friday, May 28, 2010 - 4:41 am

Actually, no. A badly behaved application will kill the N900's battery 
life. Nobody else has "managed nicely" - they've just made life harder 
for application developers and users, which may have something to do 
with the relative levels of market adoption of Maemo and Android. I'm 
not aware of any form of resource management framework in MeeGo either, 
so as far as I know it'll have exactly the same problem.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Igor Stoppa
Date: Friday, May 28, 2010 - 5:26 am

It's true that a braindead app can kill the battery.

However we provide a version of powertop that is tailored to the N900, 
there is a nokia energy profiler meant to give graphical representation 
of the battery current, there is htop available and you can even get the 
processor activity visualized on the leftmost and rightmost keyboard 
backlight LEDS, when in RD mode and with screen blanked.

I would advice you to not start debating on company strategies, this is 
not the right place.

Otherwise I'll have to ask what's the expected threshold of devices sold 
with broken sw design to get automatic admission into the mainline 
kernel source tree.

But this is not the direction we want to take.

Notice also that we _do_ have a store and official repository where apps 
are monitored for sanity, also with feedback from users and their help 
to promote new apps to trusted state.

Former Maemo 6, now MeeGo do introduce resource management from security 
POV, but that will also have the side effect of discriminating between 
signers.

igor
--

From: Brian Swetland
Date: Friday, May 28, 2010 - 5:52 am

At a certain point, if one side of the argument is using "N900 / OMAP3
works just fine as is" (which has certainly been the case stated by a
number of folks throughout these discussions), I think it's a little
unrealistic to express shock that somebody argues the opposing point.

I've personally avoided commenting on specific power management issues
or properties of competitive platforms because it can easily be viewed
as rather rude or unprofessional.  (though in theory we all could
benefit from any improvements to the kernel regarding power
management, no?).

I am quite willing to state that on both MSM and OMAP based Android
platforms, we've found that the suspend blocker model allows us to
obtain a lower average power draw than if we don't use it -- Mike Chan
provided some numbers earlier in another thread in the trivial device
idle case, the win is of course much larger in the case of several
poorly behaved apps being active.

I do think that everyone involved agrees that it is beneficial to
educate users and developers in hopes that users will understand that
some apps are non-optimal and developers will be encouraged to write
better apps.

I think we also all agree that striving to obtain the lowest power
state at all times through cpu frequency scaling, runtime pm, drivers
that aggressively clock/power down when idle, etc is a worthy goal.
Some have argued that suspend blockers may deter further development
in these areas, but I think this is unlikely -- power usage while the
device is active and the user is interacting with it is just as
critical as when it's not being used interactively.  We (Android)
certainly pursue aggressive low power optimization in both states.

There appears to be some disagreement in terms of what one should do
in the face of poorly behaved applications.  The Android approach has
been to both gather as much data as possible for education of user and
developer and to mitigate the impact of poorly written apps on
endusers, goals which are ...
From: Igor Stoppa
Date: Friday, May 28, 2010 - 6:32 am

The problem lies in the definition of the goal and means to achieve it.
We do rely on repositories to discriminate on the quality of applications.
As I stated some are accessible and run by our community.


What I consider plain wrong i to claim that since there are this many 
units out, some code should be merged.
A company needs to cut corners sometimes when making a product but this 

That's very good. But if it is done in a conceptually flawed way, some 
better solution should be considered for upstream merge.


Sure.

I simply disagree on the methods proposed (suspend_blockers) and some of 
the rationale used for promoting them (volume of otherwise unsupported 
units).

igor
--

From: Brian Swetland
Date: Friday, May 28, 2010 - 6:27 am

I've never suggested that we should get a get-out-of-code-review-free
card or be automatically merged based on shipping volume.

Hell, I never thought we should even bother trying to merge wakelocks
upstream, because I assumed that they'd be hated for not being the
linux way (tm).  Greg KH and others have spent a bunch of time
shouting at me (or Google) that we should be doing this, and here we
are giving it a go.  At this point we've spent more engineering time
on revising this one patchset (10 revisions to address various rounds
of feedback) and discussion of it than we have on rebasing our working
kernel trees to roughly every other linux release from 2.6.16 to

I will disagree that wakelocks are "cutting corners" (we certainly
have some corner cutting code in our trees, because yeah, ship is
compromise, but I don't believe wakelocks are an example).  They're a
real solution for real problems faced on real devices.  Obviously not
a solution that everyone here likes, and maybe they'll never end up in
mainline as a result, but so far I haven't seen a counter proposed
solution that seems to solve the same problem, avoid races, and be

How is it flawed?  Serious question.

Brian
--

From: Igor Stoppa
Date: Friday, May 28, 2010 - 7:12 am

I would avoid repeating all the good arguments given so far, but to make 
it short:

* I believe runtime PM is a much better starting point (at least for the 
type of HW targeted at mobile devices) because it mimics an always-on 
system toward userspace, which requires less disruption in the way apps 
are designed

* QoS is closer to the apps pov: fps if it is a media player or a game, 
transfer speed if it is a file manager, bandwidth if it is a network 
app, etc
The app is required to express its opinion by using a format that it 
understands better and is less system dependent.
Actually the kernel should only be concerned with 2 parameters at most 
for any given operation: latency and bandwidth/throughput

* Some form of resource management is needed as trust mechanism to 
discriminate "trusted" vs untrusted apps that can give reliable info 
(but in your case you should give trust to whom prevents the suspend)

* Most of this could be done in userspace with the kernel merely 
providing the means to enforce the decisions taken by the userspace manager.

* The kernel wouldn't even have to try to outsmart the "evil application 
writer"

igor
--

From: Felipe Contreras
Date: Friday, May 28, 2010 - 4:42 pm

I agree.

If I understand correctly, if we have a perfect user-space that only
does work when strictly needed and trying to do it in bursts, then we
would be reaching the lowest power state, and there would be no need
for suspend. The problem is that Android's user-space is pretty far
from that, so they said "let's segregate user-space and go to lower
power mode anyway".

If that's true, then this problem can be fixed in user-space, and in
fact, it already is on N900. Good behaving applications are
asynchronous, use g_timeout_add_seconds() to align bursts of work at
the same second intervals, and don't do polls directly, but use GLib's
mainloop. Same as in GNOME desktop. It seems there are other methods
to align multiple processes for longer periods of time, but that code

I think this information can be obtained dynamically while the
application is running, and perhaps the limits can be stored. It would
be pretty difficult for the applications to give this kind of
information because there are so many variables.

For example, an media player can tell you: this clip has 24 fps, but
if the user is moving the time slider, the fps would increase and drop
very rapidly, and how much depends at least on the container format
and type of seek.

A game or a telephony app could tell you "I need real-time priority"
but so much as giving the details of latency and bandwidth? I find
that very unlikely.

Cheers.

-- 
Felipe Contreras
--

From: Florian Mickler
Date: Saturday, May 29, 2010 - 1:28 am

On Sat, 29 May 2010 02:42:35 +0300

This has already been mentioned (who knew?): Android doesn't
want to depend on userspace for this.

Cheers,
Flo
--

From: Florian Mickler
Date: Saturday, May 29, 2010 - 1:56 am

On Sat, 29 May 2010 10:28:19 +0200

--

From: Igor Stoppa
Date: Sunday, May 30, 2010 - 10:55 pm

I doubt that belongs to typical QoS. Maybe the target could be to be 

from my gaming days the games were still evaluated in fps ... maybe i 
made the wrong assumption?

A telephony app should still be able to tell if it's dropping audio frames.

In all cases there should be some device independent limit - like: what 
is the sort of degradation that is considered acceptable by the typical 
user?

Tuning might be offered, but at least this should set some sane set of 
defaults.

igor
--

From: Felipe Contreras
Date: Saturday, June 5, 2010 - 9:58 am

I'm not sure what you mean. I-frames comes usually one per second, so
if you only decode I-frames, your experience would be really bad.
Moreover, you don't know beforehand when an I-frame is coming, only
when it's there, and some clips can have only one I-frame at the

Yes, the more fps, the better, but you calculate that by counting the
amount of frames rendered over a period of time; you know the fps

Yes, which could be unrelated to PM, like bad network conditions, but
yeah, it should also be able to tell if the problem is with the

It is easy to tell after the PM actions have been made, as in "wait!
I'm not able to perform gimme more power!". But I don't see how that
could be done _before_ the PM actions are done.

From all the QoS proposals I have seen here, and considering that some
people said that suspend blockers could be a specific case of QOS, I
don't think people have been considering QoS as something to state

Huh? Defaults in what units, based on what, and when and how to update?

Cheers.

-- 
Felipe Contreras
--

From: Alan Cox
Date: Friday, May 28, 2010 - 7:20 am

- It means changing drivers and quite a few apps
- It doesn't solve the problem of rogue apps if they end up owning locks
- It puts the deep knowledge of the platform in the applications
- It gives the apps control of the action taken not policy indication
- It doesn't resolve the problem of synchronization of take/releases
  stopping any suspend
- The kernel parts are not generically useful, merely effective for
  solving a specific problem right now - even things like VM migration
  to/from phones seems to break it
- It inverts the whole logic the kernel is following and trend it is
  following that suspend is simply a very deep idle (with implementations
  merged)

If it was a localised turd I wouldn't worry. There are plenty df deep
unmentionables hidden away enirely in platform specific code that deal
with everything from stoned hardware engineers to crazed software stack
implementations.

Here is a question back the other way perhaps

- If the existing kerne was almostl entirely read only, or you had to pay
  a large fee per line of code changed outside your own driver how would
  you implement the wakelock/suspend blocker API ?

Because if you take the path that 'we want wakelockers' that is
essentially the question you have to answer. How do you merge it so that
nobody outside of your driver and maybe a spot of arch code knows about
it. You are permitted a couple of sneaky substitions of core function
bits in headers.

Right now bits are going to leak out over the kernel which is the cause of
friction. At the point it's invisible to everyone else they cease to be
stakeholders so you don't have keep them happy. You've only got a couple
in your patches but its painfully obvious from Matthew and your comments
you'll end up needing a ton more and these will get everywhere as Android
grows hardware platforms and CPU support as phones become more featureful
and PC like. The moment a phone grows a USB base station with hub for
example the entire USB stack becomes ...
From: tytso
Date: Friday, May 28, 2010 - 6:39 am

Linus will disagree with you there.  Linus *has* merged code on the
basis that it is shipping in distributions, regardless of the fact
that some developers objected to it.  Sometimes "perfect" should not
be the enemy of "good enough" shipping code.

For example, I used to point out that we shipped PCMCIA code in
mainline that had a 10% chance of crashing the system if you ejected
the card.  NetBSD was proud to say that their code was so iron-clad
and well designed that it always did the right thing, even if you
ejected while it was busily passing network traffic.  Unfortunately,
NetBSD had working PCMCIA support 3 years later than Linux.  So it
used to be that we were the technical pragmatists (and Linus
fortunately, still very much is the pragmatists, while others were the
hard-line perfectionists.  It seems to me we've started getting some
of the NetBSD attitude infecting LKML, and IMHO, that's unfortunate.

We've rewritten our networking stack, 3 or 4 times, depending on how
you count.  And sometimes shipping in products counts for a lot.  It
doesn't count for everything, and it isn't a get-out-of-jail card, for
sure.  But if it's a hard problem, and we have something that's good
enough, maybe the right call is to merge it now, and we'll rework
things to make something better and more general later.  Ultimately
that's a call only Linus can make.

If everyone agrees we're making progress, and we can let this 100+
mail thread keep going.  But if anyone feels that we are spinning
endlessly without making forward progress (which is after all the same
criteria the OOM killer uses, no? :-), people should remember that
sometimes Linus *has* ended arguments that have gone on too long by
making a "merge or kill" decision.

						- Ted
--

From: Igor Stoppa
Date: Friday, May 28, 2010 - 7:14 am

I have seen very good proposals for saner solutions.

Is that progress?

igor
--

From: Matthew Garrett
Date: Friday, May 28, 2010 - 7:21 am

The proposals so far involve either redefining the problem space or 
being inherently racey. It may be that we can redefine the problem space 
in such a way that everyone's happy, but it's not possible to do so by 
fiat.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Brian Swetland
Date: Friday, May 28, 2010 - 7:29 am

I think the suggestion that has the closet fit with what we're trying
to accomplish is Ingo's (or perhaps Ingo's explanation of Alan's):
http://lkml.org/lkml/2010/5/28/106 where it's implemented as a
constraint of some sort.

Arve points out that qos constraint objects could work (but not if
specifically tied to apps): http://lkml.org/lkml/2010/5/28/120 though
he suggests that "latency" constraints don't represent this as well as
"state" constraints.

Though if you look at it that way, then suspend_blockers become qos
constraint objects, but their implementation and usage remain pretty
much the same as we have now, which does not address Alan's concern
regarding code turning up in drivers, etc.  I'm not sure how you can
solve this problem (avoiding races around entering/exiting the suspend
or suspend-like state) without having a means for drivers to prevent
entry to that state.

I need to think more about the cgroups approach, but I'm pretty sure
it still suffers from wakeup race situations, and due to the
complexity of userspace (at least ours), I suspect it would risk
livelock/deadlock/priority-inversion style issues due to interaction
between different processes in different groups.

Brian
--

From: Matthew Garrett
Date: Friday, May 28, 2010 - 7:41 am

I think the cgroups approach works if you assume that applications that 
consume wakeup events can be trusted to otherwise be good citizens. 
Everything that has no direct interest in wakeup events (except the 
generic Android userspace) can be frozen, and you can use the scheduler 
to make everything else Just Work. That's a rather big if, but you've 
got a better idea of the state of the Android app base than I do.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Alan Cox
Date: Friday, May 28, 2010 - 8:06 am

With latency you have an "I don't give damn" latency in your model which

I am much much less concerned about general expressions of constraint
appearing in drivers. One of my early mails gave a list of other
people/projects/problems that need them - from hard real time, to high
speed serial on low end embedded to virtualisation.

They fix a general problem in terms of a driver specific item. We end up
making changes around the tree but we make everyone happy not just
Android. Also we are isolating policy properly. The apps and drivers say
"I have these needs", the power manager figures out how to meet them.

Where it gets ugly is if you start trying to have drivers giving an app a
guarantee which the app then magically has to know to dispose of.

If you are prepared to exclude untrusted apps from perfectly reliable
event reporting (ie from finger to application action) that doesn't seem

Priority inversion with the cgroup case is like synchronization effects
with the suspend blockers - its a real ugly problem and one that is known
to be hard to fix if you let it happen so I agree there.

--

From: Brian Swetland
Date: Friday, May 28, 2010 - 8:13 am

I think Arve's concern was the representation of the "I care, but only
a little" or "just low enough to ensure threads must run" level which
is what suspend blockers would map to (low enough to ensure we
shouldn't halt the world but not necessarily implying a hard latency

That makes sense -- and as I've mentioned elsewhere, we're really not
super picky about naming -- if it turns out that
wakelocks/suspendblockers were shorthand for "request a qos constraint
that ensures that threads are running", we'll be able to get things

Yeah -- which is something we've avoided in the existing model with
overlapping wakelocks during handoff between domains.
- input service is select()ing on input devices
- when select() returns it grabs a wakelock, reads events, passes them
on, releases the wakelock
- the event subsystem can then safely drop its "should be running
threads" constraint as soon as the last event is read because it has
no queues for userspace to drain, but the overlapping wakelock

Currently in the Android userpace only trusted (system) apps can
directly obtain wakelocks -- arbitrary apps obtain them via rpc to a
trusted system service (which ensures the app has been granted
permission to do this and tracks usage for accountability to
user/developer).

Brian
--

From: Alan Cox
Date: Friday, May 28, 2010 - 9:31 am

That's why I suggested "manyana" (can't get accents for mañana in a
define) or perhaps "dreckly"[1]. They are both words that mean "at some
point" but in a very very vague and 'relax it'll happen eventually' sense.

More importantly it's policy. It's a please meet this constraint guide

Cool. I think they are or at least they are close enough that nobody will

I'm not sure avoided is the right description - its there in all its
identical ugliness in wakelock magic

If you treat QoS guarantees as a wakelock for your purposes (which is
just fine, drivers and apps give you policy, you use it how you like)
then you could write the paragraph below substituting the word
'guarantee' for 'wakelock' So in that sense the mess is the same because
in both cases you are trying to suspend active tasks rather than asking

The conventional PC model is 'we don't go back into sleep proper fast
enough for that race to occur'. It's hard to see how you change it. An
app->device "thank you for that event, I enjoyed it very much and have
finished with it" message moves the underlying event management and QoS

Clearly that would continue to work out.

Alan
[1] Dreckly being used in Cornwall, as one friend put it 'Like manãna but
without that dreadful sense of urgency'

--

From: Arve Hjønnevåg
Date: Friday, May 28, 2010 - 2:53 pm

This is the same as saying these two threads don't run often enough to
need a mutex around their critical section. Just because you have not

If each layer prevents suspend while it knows there are pending events
Yes you can do this, and it it how the android alarm driver works, but
we found the select()/poll(), block suspend, read event, process event
then unblock suspend sequence cleaner (especially for interfaces that
can return more than one event at a time). Kernel suspend blocker lets
you implement the alarm driver model, adding user-space suspend



-- 
Arve Hjønnevåg
--

From: Zygo Blaxell
Date: Friday, May 28, 2010 - 10:27 am

From my reading of this thread, there's a lot of overlap between
suspendblockers and constraints.  Many use cases are served equally
well with one or the other, except for one:  a case where an event that
should ultimately wake the system triggers a code execution path (or data
flow path) that wanders through a user-space full of complex interacting
processes where the kernel (and maybe even the processes) can't see it.

Suspend-blockers in user-space handle this by making such code/data paths
visible to the kernel.  An all-kernel constraint-based approach has no
way to see the user-space paths, so the system will end up trying to
sleep when it should be waking up.

Wait, what?  Surely all the user-space code handling such events is
running under a PM-QoS constraint that says "don't sleep if this process
is runnable," so the system won't go to sleep.  Presumably all other
processes which don't handle wakeup events will be running under a
PM-QoS constraint that says "do sleep even if this process is runnable."

That's true, except for one common case:  a process is drawing things on
the display on behalf of other processes, and that drawing process can't
have the "don't sleep" constraint because if it did the system would
seem to be continuously busy and never go to sleep.  Any process that is
handling a critical event but also needs to talk to the display process
will end up being not-runnable, and the system may go to sleep before the
display process wakes up.  So we need another PM-QoS constraint that says
"don't sleep even if this process isn't runnable, because some *other*
runnable process might do something that makes our critical process
runnable again."  The critical event handling app would switch to this
PM-QoS constraint until it had received an ack from whatever it talked
to in user-space, then switch back to the "don't sleep if this process
is runnable" state until a new event comes in.

So, three constraint policies should do it (*):

	1.  Do sleep even if ...
From: Peter Zijlstra
Date: Friday, May 28, 2010 - 11:16 am

If using suspend-blockers, 

Please explain to me how:

- I will avoid the cpu going into some idle state for which the wakeup
latency is larger than my RT app fancies?

- to avoid some tasks from being serviced by the filesystems whilst
others are? (ionice on steroids).

- does my sporadic task (with strict bandwidth budget) not suffer
bandwidth inversion?


suspend blockers do a bit of each of that, but none of it in a usable
fashion.

--

From: Zygo Blaxell
Date: Friday, May 28, 2010 - 12:51 pm

Oops, I apparently meant "many use cases *of suspendblockers* are served

...though I'd think you could do that by holding a suspendblocker, thus
preventing the CPU from going into any idle state at all.

There's four likely outcomes, corresponding to inclusion or non-inclusion
of suspend blockers and PM constraints in the kernel.  Both could coexist
in the same kernel, since a suspend blocker can be trivially expressed as
"an extreme PM constraint with other non-constraint-related semantics."

It's the "other non-constraint-related semantics" that seem to be the
contentious issue.  What can a suspend blocker do that a PM resource
constraint cannot do?  If that set contains at least one useful use case,
then we need either suspend blockers, or some other thing that provides
for the use case.

Lots of people want PM constraints, and I haven't seen anyone suggest
there should *not* be PM constraints in the kernel some day.  I've seen
a few "working and useful PM constraints aren't going to happen any time
soon" statements, and several "there's lots of stuff you still can't do
with PM constraints or suspend blockers" statements, but those aren't
arguments *against* PM constraints or *for* suspend blockers.

--

From: Vitaly Wool
Date: Saturday, May 29, 2010 - 1:43 am

Without the clear description of the experiments, that statement
proves just nothing other than your applications work better with your
model, but I would expect that to be so without any experiments at
all.

~Vitaly
--

From: Alan Cox
Date: Friday, May 28, 2010 - 6:54 am

On Fri, 28 May 2010 12:41:23 +0100

Maemo has battery management applications. Right now they show you what
is going on but haven't gone to a pop-up 'XYZ is eating all your battery'
kill it behaviour. The information is there.

If my phone eventually becomes a 1GB RAM PC class system I will be running
PC class apps on it and I will be migrating virtual machines to and from
my phone which have no idea about the device properties of each device
they migrate to and from.

Be that as it may the question of how you manage a naughty app is a good
one. Historically we've managed them for network abuse, memory abuse, cpu
use abuse, access rights, but not yet power.

Whether that looks like

	setrlimit(pid, LIMIT_CHARGE, 150mWH);

or
	setrlimit(pid, LIMIT_POWER, 150mW);

or something else is the question. I rather like the above but I don't
see how to implement them nicely at the moment.

Alan
--

From: Igor Stoppa
Date: Friday, May 28, 2010 - 7:28 am

Either way, this will require a detailed model of the system in terms of 
latency, throughput, current consumption and heat generation.

Which can be provided only by the HW manufacturer.

But, should such model be available (and we have some form of it for the 
OMAP3 in N900), then it can be abstracted through generic interfaces, 
which accept constraints and produce the selected target state 
(typically a vector of states for each sub component).

igor
--

From: Theodore Tso
Date: Friday, May 28, 2010 - 5:16 am

Maybe.  And perhaps the right solution in that case is to merge both, as opposed to "consign one to the outer darkness".   And I think that's a decision Linus should make.

I do hope we can come up with a better solution, eventually.  But I do want to point out as a process point of view, we do have other alternates other than "spinning endlessly".

-- Ted

--

From: Igor Stoppa
Date: Friday, May 28, 2010 - 5:49 am

Those apps were from an experimental repository, which is not enabled by 
default in stock SW.

Of course tools can be improved, but if someone decides to run sw which 
is clearly under heavy development, i see little point in complaining 
that it might not work as expected.

igor
--

From: Theodore Tso
Date: Friday, May 28, 2010 - 5:31 am

Well, yes, if the company strategy is to have a walled garden ala the Apple iPhone App store, life is much simpler.   But if the requirements mean that apps don't need preapproval, the requirements on the platform get harder.   I think the take-home here is we have a requirement that the platform behave well even without someone screening the applications for the "default SW repository".


--

From: Igor Stoppa
Date: Friday, May 28, 2010 - 6:30 am

No, the strategy is to try to merge commercial and community needs.

We do support signed repositories.

The community has control on the public one.

Members are encouraged to help by alpha/beta testing apps that are under 
development.

That's a wrong way to put it. By installing something on your phone you 

What it meant is totally different. Regardless how much effort you put 
into twisting it.
It means that different repositories provide different level of trust.

As Debian user, I don't blame anybody other than myself is something 
I've pulled from unstable or experimental breaks my system.
Debian by default doesn't ship with either unstable or experimental enabled.

And using suspend blockers doesn't really solve the problem of who to 
trust to take the block and who not.

Or we'll have to have suspend-blockers-blockers and so on ...

Like it or not, QoS and resource management - in some form - are needed 
to allow trusted application to provide valuable feedback, while 
filtering requests from untrusted applications.

You might want to add dynamic profiling and try to use some heuristic to 
have the system doing runtime evaluation of good vs bad applications, 
but still some discrimination mechanism will be required.


igor
--

From: Theodore Tso
Date: Friday, May 28, 2010 - 5:28 am

Sorry, miswording:   s/faster/less frequent/

I'm not convinced CPU activity LEDs help either, BTW.    It only takes the CPU getting crowbarred out of idle for a tiny amount of time before you start impacting battery life, and if the crapplication is only doing it every 30-60 seconds or so, I doubt you'd see it on the LED's....  that sort of thing might be acceptable if you have a 1-3 pound battery, but maybe much less so if you have a bettery which is cell-phoned sized.

-- Ted


--

From: Alan Cox
Date: Friday, May 28, 2010 - 2:53 am

Ted

As a PS to the previous email the situation has I think more choices than
you portray.

Given the need for various constraints imposed by drivers for things like
RT it's entirely possible that a solution ends up being something like


Kernel proper:
Turn suspend block kernel API into an expression of constraints (or
				whatever else seems to work)
Throw the user space in the bin

Google:
Use the constraints in a sledgehammer manner (hey it solves your problem
			in that form so why not)
Patch in a private user space API


That makes things much much easier as we don't risk getting a horribly
broken API into the kernel that is hard to remove, while hopefully
meaning its rather easier for google to merge drivers and other code as
well as to maintain a smaller patch set.

--

From: Peter Zijlstra
Date: Friday, May 28, 2010 - 12:11 am

Again, Alan, Thomas and myself don't argue against that, what we do
however argue against is suspend running apps as a form of power
management.

If you were to read Alan's latest posts he clearly outlines how you can
contain crappy apps.

A combination of weakening QoS guarantees (delaying wakeups etc.)
blocking on resources (delay servicing requests) and monitoring resource
usage (despite all that its still not idle) and taking affirmative
action (shoot it in the head).

If we pose that a well behaved application is one that listens to the
environment hints and idles when told to, we can let regular power
management kick in and let deep idle states do their thing.

If a bad application ignores those hints and manages to avoid getting
blocked on denied resources, we can easily spot it and promote an
attitude of violence toward it in the form of SIGXCPU, SIGSTOP, SIGTERM
and SIGKILL, possibly coupled with a pop-up dialog -- much like we get
today when we try to close a window and the app isn't responding.

If we then also let the environment maintain a shitlist of crappy apps
(those it had to take affirmative action against) and maybe set up a
service that allows people to share their results, it provides an
incentive to the app developers to fix their thing.

How is this not working?



--

From: Arve Hjønnevåg
Date: Friday, May 28, 2010 - 5:43 pm

You seem to argue that android is not allowed to use suspend because
the hardware we have shipped on can enter the same power state from
idle. From my point of view, since we need to support suspend on some
hardware we should be allowed to leverage this solution on the better

I have not seen any suggestions for how to deal with all our
interprocess dependencies when pausing a subset of processes. Without
a solution to that we can only pause a subset of the processes we want

These solutions do not allow us to use suspend. They may get us closer
to the power consumption we get from suspend on the good hardware or
even surpass it, but we still need suspend on some hardware, and we
would get event better results by using these solutions in addition to
suspend compared to using them instead of suspend.

-- 
Arve Hjønnevåg
--

From: Peter Zijlstra
Date: Saturday, May 29, 2010 - 1:10 am

Correct, I strongly oppose using suspend. Not running runnable tasks is
not a sane solution.

If current hardware can't cope, too friggin bad, get better hardware.


Do not 'pause' processes and you don't have the problem, make them stop
on their own accord or kill them if they dont listen.. who cares about
ill-behaved apps anyway?

But really, if you want a more detailed answer, you need to provide more
detail on these problems.

If you want to allow an untrusted app to provide a dependency for a
trusted app, you've lost and I don't care.


Not using suspend is exactly the point. As Alan has argued, propagating
suspend blockers up into all regions of userspace will take much longer
than fixing the hardware.

You got to realize this is about Linux as a whole, I really don't care
one whit about the specific Android case. We want a solution that is
generic enough to solve the power consumption problem and makes sense on
future hardware.

The only abstraction that really makes sense in that view is idle
states.


--

From: James Bottomley
Date: Saturday, May 29, 2010 - 9:10 am

Look, this is getting into the realms of a pointless semantic quibble.
The problem is that untrusted tasks need to be forcibly suspended when
they have no legitimate work to do and the user hasn't authorised them
to continue even if the scheduler sees them as runnable.  Whether that's
achieved by suspending the entire system or forcibly idling the tasks
(using blocking states or freezers or something) so the scheduler can
suspend from idle is something to be discussed, but the net result is
that we have to stop a certain set of tasks in such a way that they can
still receive certain external events ... semantically, this is
equivalent to not running runnable tasks in my book. (Perhaps this whole
thing is because the word runnable means different things ... I'm
thinking a task that would consume power ... are you thinking in the
scheduler R state?)

Realistically, the main thing we need to do is stop timers posted
against the task (which is likely polling in a main loop, that being the
usual form of easy to write but power crazy app behaviour) from waking
the task and bringing the system out of suspend (whether from idle or

That's rubbish and you know it.  We do software workarounds for hardware
problems all the time ... try doing a git grep -i errata in arch x86, or
imagine a USB subsystem that only supported sane standards conforming
devices: that would have an almost zero intersect with the current USB
device market.

The job of the kernel is to accommodate hardware as best it can ...
sometimes it might not be able to, but most of the time it does a pretty
good job.

The facts are that C states and S states are different and are entered
differently.  For some omap hardware, the power consumption in the
lowest C state (with all the ancillary power control) is the same as S3,
that's fine, suspend from idle works as well as suspend to ram modulo
bad apps. For quite a lot of MSM hardware, the lowest C state power
consumption is quite a bit above S3.  It's not acceptable to ...
From: Peter Zijlstra
Date: Saturday, May 29, 2010 - 11:12 am

So what happens if you task is CPU bound and gets suspended and is
holding a resource (lock, whatever) that is required by someone else
that didn't get suspended?

That's the classic inversion problem, and is caused by not running

Why would be care about external events? Clearly these apps are ill
behaved, otherwise they would have listened to the environment telling
them to idle.

Why would you try to let buggy apps work as intended instead of break
them as hard as possible? Such policy promotes crappy code since people


Sure, that same main loop will probably receive a message along the
lines of, 'hey, screen is off, we ought to go sleep'. If after that it
doesn't listen, and more serious messages don't get responded to, simply
kill the thing.

Again, there is no reason what so ever to tolerate broken apps, it only
promotes crappy apps.



--

From: Florian Mickler
Date: Monday, May 31, 2010 - 1:12 pm

On Sat, 29 May 2010 20:12:14 +0200

The trick with the approach currently discussed (i.e.
opportunistic suspend, if you missed it): We suspend the whole machine.

And I really think, this is the only way to do it. It is a big hammer,

If I have a simple shell script then I don't wanna jump through
hoops just to please your fragile kernel. 

And before you judge code that does not behave to work with YOUR buggy
kernel, i would think twice. This cuts both ways. Just because the
problem is too hard for you, this does not excuse forcing crappy
kernels on other people. 

I think you have a point in that it is _in general_ not easily
possible to solve. But for this case this is clearly a simple, to the
point and working solution for android based phones. 


I think this would be a possibility. And maybe even sane. But I also
think this has nothing to do with suspend_blockers. They block

This simple doesn't solve the problem.

Cheers,
Flo 
--

From: Florian Mickler
Date: Monday, May 31, 2010 - 1:47 pm

On Mon, 31 May 2010 22:12:19 +0200

Also why should that code on one device kill my uptime and on the
other machine (my wall-plugged desktop) work just well? That doesn't
sound right.

Clearly opportunistic suspend is a workaround for battery-driven devices
and no general solution. But it is not specific to android. At least
not inherently. It could be useful for any embedded or mobile device
where you can clearly distinguish important functions from convenience
functions.

I really can't understand the whole _fundamental_ opposition to this
design choice. 

Cheers,
Flo
--

From: Felipe Contreras
Date: Saturday, June 5, 2010 - 10:04 am

Sounds perfectly right to me; one code runs perfectly fine on one
machine, and on the other doesn't even compile. Well, sure, it wasn't

Yes, it could, but why go for the hacky solution when we know how to

Nobody is using it, except Android. Nobody will use it, except Android.

I have seen recent proposals that don't require changing the whole
user-space. That might actually be used by other players.

-- 
Felipe Contreras
--

From: Rafael J. Wysocki
Date: Saturday, June 5, 2010 - 12:04 pm

That's like saying "Android is not a legitimate user of the kernel".  Is that

Sure, an approach benefitting more platforms than just Android would be better,
but saying that the kernel shouldn't address the Android's specific needs as a
rule if no one else has those needs too is quite too far reaching to me.

Rafael
--

From: Peter Zijlstra
Date: Saturday, June 5, 2010 - 12:16 pm

Well, if the android people keep rejecting all sensible approaches to
power savings except their suspend blocker mess, then I don't see why we
should support their ill designed mess.

We should strive to provide an interface that can be used by all
interested parties to conserve power; if Android really is the only
possible user of the interface then I don't see any reason at all to
merge it, they might as well keep it in their private tree.



--

From: Rafael J. Wysocki
Date: Saturday, June 5, 2010 - 12:39 pm

Well, I certainly would like the Android people to be more appreciative of our

There is a number of kernel users that depend on Android user space
(phone vendors using Android on their hardware, but providing their own
drivers), so I don't think we really can identify Android with Google in that
respect.

Rafael
--

From: Peter Zijlstra
Date: Saturday, June 5, 2010 - 12:52 pm

I don't see why we can't merge the platform code and drivers without
suspend blockers. Google can patch them back in on their side if they
want to.



--

From: Felipe Contreras
Date: Saturday, June 5, 2010 - 12:53 pm

Read the context: opportunistic suspend, which is considered a
workaround, which requires new user-space API for suspend blockers,
might be remotely considered for inclusion *if* it indeed solves a
problem for battery-driven devices, which other parties also
experience and could benefit from this solution.


There are no Android specific needs, why should certain user-space
ecosystem need certain API that somehow *nobody* else does? I think in
this huge thread it has become obvious that people are reluctant to
this idea... whatever problem Android user-space presents (I don't
think there's any), it can be solved for "he rest of the world" too,
and such generic solution is worth exploring.

-- 
Felipe Contreras
--

From: Florian Mickler
Date: Monday, May 31, 2010 - 2:13 pm

Hi, again!

My two mails were probably a bit pointless and not helping to
find a solution. 

There are notable and useful approaches mentioned by Peter to the
mitigation problem. It's just that it's not the one and only way to
think about this.

Just rants,
Flo 
--

From: James Bottomley
Date: Monday, May 31, 2010 - 1:52 pm

OK ... but if the options are running and S3 for the entire platform,
then all tasks get suspended and this isn't a problem.  This is why the
current wakelock implementation on the android platform works flawlessly
today.

Inversion only becomes a problem if tasks get individually idled, so you
can see that, from the android point of view, we're creating a problem
which their implementation doesn't have.

In this view, S3 suspend does look elegant: it solves the inversion
problem by suspending everything, it controls rogue applications' power
consumption and it gets certain hardware into a lower power state than
is possible from suspend from idle.

The inelegance of the S3 suspend solution is the requirement to use
these suspend blockers through kernel and user space to get the whole
thing up again to respond to an event, which is an inelegance suspend

That's not a correct characterisation.  Badly behaved apps from a power
point of view can do useful things for the user.  The object is to

Actually, no, if this were a correct view, we wouldn't have the huge x86
hardware work around problem because we'd just be able to tell
manufacturers of shoddy or badly standards compliant stuff where to
stick it.

The great strength of the x86 commodity platform revolution was the fact
that the hardware became cheap, plentiful and outside the ambit of a
single walled garden manufacturer.  It's great weakness is integration
problems and shoddy hardware.  We tolerate the weakness because the
strength vastly outweighs it: and toleration to us in the kernel means
driver work arounds ... it also means that if a device doesn't work with
the kernel, we get blamed (rather than the manufacturer).

By the same token, the revolution in smart phones is driven in quite a
large part by the provision of third party applications.  This commodity
app view is almost the direct software analogue of the commodity
platform view that has been so successful in hardware; Therefore, fair
play seems to ...
From: Rafael J. Wysocki
Date: Monday, May 31, 2010 - 2:14 pm

That, among other things, is why suspend uses the freezer which guarantees


Do you realistically think that by hurting the _user_ you will make the
_developer_ write better code?  No, really.

If the user likes the app very much (or depends on it or whatever makes him
use it), he will rather switch the platform to one that allows him to run that
app without _visible_ problems than complain to the developer, because _the_
_user_ _doesn't_ _realize_ that the app is broken.  From the user's
perspective, the platform that has problems with the app is broken, because
the app apparently runs without problems on concurrent platforms.

The whole "no reason to tolerate broken apps" midset is simply misguided IMO,
because it's based on unrealistic assumptions.  That's because in general users
only need the platform for running apps they like (or need or whatever).  If
they can't run apps they like on a given platform, or it is too painful to them
to run their apps on it, they will rather switch to another platform than stop
using the apps.

Thanks,
Rafael
--

From: Felipe Contreras
Date: Saturday, June 5, 2010 - 10:16 am

As an application writer, if my users complain that their battery is
being drained (as it happened), they stop using it, and other people
see there are problems, so they stop using it, if people get angry
about it they will vote it down.

New users will see it has low score; they will not install it. That's
a network effect.


Yeah, right. I don't think anybody has every bought an iPhone because
of Tweetie. People care how the applications run on their phones, not
how their phone's platform runs their favorite application, in fact,
most probably it became their favorite application because it was
running great on their phone, and they wouldn't expect it to run on
phones with other platforms. Either applications run on S60, iPhone
OS, Android, or Maemo, but not in a combination of those. And if their
certain app that runs on multiple platforms, and the user actually
knows that (probably a geek), then he knows he can't expect it to work

You seriously think people switch high-end phones just to run their
favorite apps? It's much cheaper to switch apps, and that's what users
do.

-- 
Felipe Contreras
--

From: Florian Mickler
Date: Saturday, June 5, 2010 - 12:49 pm

On Sat, 5 Jun 2010 20:16:33 +0300

That is nice. But how does it impact the problem that suspend blockers
solve? And why do suspend blockers interfere with that?

Cheers,
Flo
--

From: Felipe Contreras
Date: Saturday, June 5, 2010 - 12:56 pm

It doesn't, I don't know why people keep bringing this argument, I
just though it should not be left open as a valid one.

I should have mentioned that this is indeed irrelevant.

-- 
Felipe Contreras
--

From: Florian Mickler
Date: Saturday, June 5, 2010 - 2:52 pm

On Sat, 5 Jun 2010 22:56:45 +0300

Uh! I found out how this is relevant to the suspend blockers case.
Because not having users means that the bugs don't get fixed.
Whereas in the suspend blockers case the users can use the app and get
the bugs fixed. 

Cheers,
Flo

p.s.: I really wished you would focus more on solving the
problem and not on dismissing it.
--

From: Peter Zijlstra
Date: Saturday, May 29, 2010 - 11:12 am

Sure, and if x86 could wake from S3 on a keypress/mouse movement etc..
you could use S3 as idle state.. not sure people would love the
wakeup-latency, but that's a QoS matter.

But if there simply are no suitable wakeup sources from an idle state
(S3 really is nothing more than a hardware idle state) then it might not
be suitable for transparent idle modes and no amount of software hackery
will solve that.

So what I'm saying is, if your hardware can't generate the needed wakeup
events, the auto-suspend stuff won't work either. If it can it can be

Wth is MSM?

But really, why can't existing hardware get shipped with existing hacks,
and for future hardware that does behave we have a proper solution?

--

From: Peter Zijlstra
Date: Saturday, May 29, 2010 - 11:12 am

<20100527232043.784d5c72@lxorguk.ukuu.org.uk>
<20100528101755.7b5f6b8a@lxorguk.ukuu.org.uk>



--

From: Thomas Gleixner
Date: Monday, May 31, 2010 - 1:49 pm

That's an x86'ism which is going away. And that's really completely
irrelevant for the mobile device space. Can we please stop trying to

If you'd have read the answers from Alan carefully, then you'd have
noticed that even x86 hardware is getting to the point where OMAP is
today. i.e. support of transparent suspend from idle. If that wouldn't
happen then x86 would be simply unusable for mobile devices. It's that
easy. And we really do _NOT_ care about the existing laptop hardware
which does not provide that because it's a lost case. Not only due to
the missing (or just disabled) wakeup sources, also due to the fact
that you cannot do sensible power management by completely disabling
clock and/or power of unused devices in the chipset. There is a damn
good reason why the mobile space is _NOT_ x86 based at the moment.

Thanks,

	tglx
--

From: James Bottomley
Date: Monday, May 31, 2010 - 2:21 pm

You're the one mentioning x86, not me.  I already explained that some
MSM hardware (the G1 for example) has lower power consumption in S3
(which I'm using as an ACPI shorthand for suspend to ram) than any
suspend from idle C state.  The fact that current x86 hardware has the

So not at all interested in x86 at the moment.

For MSM hardware, it looks possible to unify the S and C states by doing
suspend to ram from idle but I'm not sure how much work that is.

James


--

From: Thomas Gleixner
Date: Monday, May 31, 2010 - 2:46 pm

On ARM, it's not rocket science and we have in tree support for this
already (OMAP). I have done the same thing on a Samsung part as a
prove of concept two years ago and it's really easy as the hardware is
sane. Hint: It's designed for mobile devices :)

Thanks,

	tglx
--

From: Arve Hjønnevåg
Date: Monday, May 31, 2010 - 10:21 pm

We already enter the same power state from idle and suspend on msm. In
the absence of misbehaving apps, the difference in power consumption
is entirely caused by periodic timers in the user-space framework
_and_ kernel. It only takes a few timers triggering per second (I
think 3 if they do no work) to double the average power consumption on
the G1 if the radio is off. We originally added wakelocks because the
hardware we had at the time had much lower power consumption in
suspend then idle, but we still use suspend because it saves power.

-- 
Arve Hjønnevåg
--

From: Thomas Gleixner
Date: Tuesday, June 1, 2010 - 4:10 am

So how do you differentiate between timers which _should_ fire and
those you do not care about ?

We have mechanisms in place to defer timers so the wakeups are
minimized. If that's not enough we need to revisit.

Thanks,

	tglx

From: Arve Hjønnevåg
Date: Tuesday, June 1, 2010 - 8:32 pm

Deferring the the timers forever without stopping the clock can cause
problems. Our user space code has a lot of timeouts that will trigger
an error if an app does not respond in time. Freezing everything and
stopping the clock while suspended is a lot simpler than trying to
stop individual timers and processes from running.


-- 
Arve Hjønnevåg
--

From: Thomas Gleixner
Date: Wednesday, June 2, 2010 - 12:00 am

And resume updates timekeeping to account for the slept time. So the
only way to get away with that is to sleep under a second or just
ignoring the update by avoiding the access to rtc. 

So how do you keep timekeeping happy ?

Thanks,

	tglx
From: Arve Hjønnevåg
Date: Wednesday, June 2, 2010 - 12:17 am

No, for the monotonic clock it does the opposite. The hardware clock
is read on resume and the offset is set so the monotonic clock gets

-- 
Arve Hjønnevåg
--

From: Thomas Gleixner
Date: Wednesday, June 2, 2010 - 12:21 am

Grr, yes. Misread the code. -ENOTENOUGHCOFFEE

Thanks,

	tglx
From: Thomas Gleixner
Date: Monday, May 31, 2010 - 3:17 pm

Those machines can go from idle into S2RAM just fine w/o touching the
/sys/power/state S2RAM mechanism.

It's just a deeper "C" state, really.

The confusion is that S3 is considered to be a complete different
mechanism - which is true for PC style x86 - but not relevant for
hardware which is sane from the PM point of view.

Now some people think, that suspend blockers are a cure for the
existing x86/ACPI/BIOS mess, which cannot go to S3 from idle, but
that's simply not feasible.

Thanks,

	tglx
--

From: Matthew Garrett
Date: Tuesday, June 1, 2010 - 6:51 am

As long as you can set a wakeup timer, an S state is just a C state with 
side effects. The significant one is that entering an S state stops the 
process scheduler and any in-kernel timers. I don't think Google care at 
all about whether suspend is entered through an explicit transition or 
something hooked into cpuidle - the relevant issue is that they want to 
be able to express a set of constraints that lets them control whether 
or not the scheduler keeps on scheduling, and which doesn't let them 
lose wakeup events in the process.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: James Bottomley
Date: Tuesday, June 1, 2010 - 2:01 pm

Exactly, so my understanding of where we currently are is:

     1. pm_qos will be updated to be able to express the android suspend
        blockers as interactivity constraints (exact name TBD, but
        probably /dev/cpu_interactivity)
     2. pm_qos will be updated to be callable from atomic context
     3. pm_qos will be updated to export statistics initially closely
        matching what suspend blockers provides (simple update of the rw
        interface?)

After this is done, the current android suspend block patch becomes a
re-expression in kernel space in terms of pm_qos, with the current
userspace wakelocks being adapted by the android framework into pm_qos
requirements expressed to /dev/cpu_interactivity (or whatever name is
chosen).  Then opportunistic suspend is either a small add-on kernel
patch they have in their tree to suspend when the interactivity
constraint goes to NONE, or it could be done entirely by a userspace
process.  Long term this could migrate to the freezer and suspend from
idle approach as the various problem timers get fixed.

I think the big unresolved issue is the stats extension.  For android,
we need just a name written along with the value, so we have something
to hang the stats off ... current pm_qos userspace users just write a
value, so the name would be optional.  From the kernel, we probably just
need an additional API that takes a stats name or NULL if none
(pm_qos_add_request_named()?).  Then reading the stats could be done by
implementing a fops read routine on the misc device.

Did I miss anything?

James


--

From: Rafael J. Wysocki
Date: Tuesday, June 1, 2010 - 3:24 pm

I think that's not been decided yet precisely enough.  I saw a few ideas

Is the original idea of having that information in debugfs objectionable?

Rafael
--

From: James Bottomley
Date: Tuesday, June 1, 2010 - 3:36 pm

Well, android only needs two states (block and don't block), so that
gets translated as 2 s32 values (say 0 and INT_MAX).  I've seen defines
like QOS_INTERACTIVE and QOS_NONE (or QOS_DRECKLY or QOS_MANANA) to
describe these, but if all we're arguing over is the define name, that's
progress.

The other piece they need is the suspend block name, which comes with
the stats API, and finally we need to decide what the actual constraint

Well ... debugfs is usually used to get around the sysfs rules.  In this
case, pm_qos has a dev interface ... I don't specifically object to
using debugfs, but I don't see any reason to forbid it from being a
simple dev read interface either.

James


--

From: Arve Hjønnevåg
Date: Tuesday, June 1, 2010 - 6:10 pm

I think we need separate state constraints for suspend and idle low
power modes. On the msm platform only a subset of the interrupts can
wake up from the low power mode, so we block the use if the low power
mode from idle while other interrupts are enabled. We do not block
suspend however if those interrupts are not marked as wakeup
interrupts. Most constraints that prevent suspend are not hardware
specific and should not prevent entering low power modes from idle. In
other words we may need to prevent low power idle modes while allowing
suspend, and we may need to prevent suspend while allowing low power
idle modes.

It would also be good to not have an implementation that gets slower
and slower the more clients you have. With binary constraints this is

4. It would be useful to change pm_qos_add_request to not allocate
anything so can add constraints from init functions that currently

We don't currently have a dev interface for stats so this is not an
immediate requirement. The suspend blocker debugfs interface is just
as good as the proc interface we have for wakelocks.

-- 
Arve Hjønnevåg
--

From: Gross, Mark
Date: Tuesday, June 1, 2010 - 7:44 pm

From: Arve Hjønnevåg
Date: Tuesday, June 1, 2010 - 8:15 pm

2010/6/1 Gross, Mark <mark.gross@intel.com>:

The calling code will have to store a pointer to your structure
anyway, you may as well have them provide the whole structure.

-- 
Arve Hjønnevåg
--

From: Gross, Mark
Date: Tuesday, June 1, 2010 - 8:26 pm

[mtg: ] duh!  You are right.  Make the caller's hold the structure.  Its been a long day.  That would be easy todo.

--gmross


--

From: James Bottomley
Date: Tuesday, June 1, 2010 - 9:02 pm

Well, as I said, pm_qos is s32 ... it's easy to make the constraint

Well, that's an implementation detail ... ordering the list or using a
btree would significantly fix that.  However, the most number of
constraint users I've seen in android is around 60 ... that's not huge
from a kernel linear list perspective, so is this really a concern? ...
particularly when most uses don't necessarily change the constrain, so a

Sure .. we do that for the delayed work queues, it's just an API which
takes the structure as an argument leaving it the responsibility of the

OK, great ... what actually exports the statistics is just an
implementation detail. 

James



--

From: Arve Hjønnevåg
Date: Tuesday, June 1, 2010 - 9:41 pm

No, they have to be two separate constraints, otherwise a constraint
to block suspend would override a constraint to block a low power idle

True. I think we also need timeout support in the short term though
which is also somewhat simpler to implement in an efficient way for




-- 
Arve Hjønnevåg
--

From: James Bottomley
Date: Wednesday, June 2, 2010 - 8:05 am

Depends.  If you block the system from going into low power idle, does
that mean you still want it to be fully suspended?

If yes, then we do have independent constraints.  If not, they have a
hierarchy:

      * Fully Interactive (no low power idle or suspend)
      * Partially Interactive (may go into low power idle but not
        suspend)
      * None (may go into low power idle or suspend)

Which is expressable as a ternary constraint.

James


--

From: Florian Mickler
Date: Wednesday, June 2, 2010 - 12:47 pm

On Wed, 02 Jun 2010 10:05:11 -0500

But unblocking suspend at the moment is independent to getting idle.
If you have the requirement to stay in the highest-idle level (i.e.
best latency you can get) that does not (currently) mean, that you can
not suspend.

To preserve that explicit fall-through while still having working
run-time-powermanagement I think the qos-constraints need to be
separated. 

<disclaimer: just from what I read>
Provided you can reach the same power state from idle, current suspend
could probably also be implemented by just the freezing part and a hint
to the idle-loop to provide accelerated fall-through to lowest power. 
</disclaimer>

At that point, you could probably merge the constraints. 

But the freezing part is also the hard part, isn't it? (I have no
idea. Thomas seems to think about cgroups for that and doing smth about the timers.)

Cheers,
Flo
--

From: James Bottomley
Date: Wednesday, June 2, 2010 - 1:41 pm

I don't understand that as a reason.  If we looks at this a qos
constraints, you're saying that the system may not drop into certain low
power states because it might turn something off that's currently being
used by a driver or a process.  Suspend is certainly the lowest state of
that because it turns everything off, why would it be legal to drop into
that?

I also couldn't find this notion of separation of idleness power from
suspend blocking in the original suspend block patch set ... if you can
either tell me where it is, or give me an example of the separated use


Um, well, as I said, I think using suspend from idle and freezer is
longer term.  I think if we express the constraints as qos android can
then use them to gate when to enter S3 .. which is functionally
equivalent to suspend blockers.  And the vanilla kernel can use them to
gate power states for the drivers in suspend from idle.

James


--

From: Arve Hjønnevåg
Date: Wednesday, June 2, 2010 - 3:27 pm

Because the driver gets called on suspend which gives it a change to

The suspend block patchset only deals with suspend, not low power idle
modes. The original wakelock patchset had two wakelock types, idle and

The i2c bus on the Nexus One is used by the other core to turn off the
power you our core when we enter the lowest power mode. This means
that we cannot enter that low power mode while the i2c bus is active,
so we block low power idle modes. At some point we also tries to block
suspend in this case, but this caused a lot of failed suspend attempts
since the frequency scaling code would try to ramp up while freezing



-- 
Arve Hjønnevåg
--

From: James Bottomley
Date: Wednesday, June 2, 2010 - 4:03 pm

OK, so this is a device specific power constraint state.  I suppose it
makes sense to have a bunch of those, because the device isn't
necessarily going to know what idle power mode it can't go into, so the
cpu govenor should sort it out rather than have the device specify a
minimum state.  

James


--

From: Florian Mickler
Date: Wednesday, June 2, 2010 - 4:06 pm

On Wed, 02 Jun 2010 15:41:11 -0500

Hm. Maybe it is me who doesn't understand. 

With proposed patchset: 
1. As soon as we unblock suspend we go down.  (i.e. suspending)
2. While suspend is blocked, the idle-loop does it's things. (i.e.
runtime power managment -> can give same power-result as suspend)

possible cases:
1: 
   - qos-latency-constraints: 1ms,  [here: forbids anything other than
     C1 idle state.]
   - suspend is blocked

2: - qos latency-constraints: as in 1
   - suspend unblocked

3: - qos latency-constraints: infinity, cpu in lowest power state.
   - suspend is blocked

4: - qos latency-constraints: infinity, cpu in lowest power state.
   - suspend unblocked


in case 2 and 4 we would suspend, regardeless of the qos-latency.

in case 1 and 3 we would stay awake, regardeless of the qos-latency
constraint.


If only one constraint, then case 2 (or 3) wouldn't be possible. But it
is possible now. 

A possible use case as an example?
(hmm... i'm trying my imagination hard now): 
	Your sound needs low latency, so that could be a cause for the
	qos-latency constraint. 

	And unblocking suspend could nonetheless happen:
	For example... you have an firefox open and don't want to
	prevent suspend for that case when the display is turned off


Cheers,
Flo

--

From: Gross, Mark
Date: Wednesday, June 2, 2010 - 4:15 pm

[mtg: ] This has been a pain point for the PM_QOS implementation.  They change the constrain back and forth at the transaction level of the i2c driver.  The pm_qos code really wasn't made to deal with such hot path use, as each such change triggers a re-computation of what the aggregate qos request is.

We've had a number of attempts at fixing this, but I think the proper fix is to bolt a "disable C-states > x" interface into cpu_idle that bypases pm_qos altogether.  Or, perhaps add a new pm_qos API that does the equivalent operation, overriding whatever constraint is active.

--mgross


--

From: Alan Cox
Date: Thursday, June 3, 2010 - 3:03 am

> [mtg: ] This has been a pain point for the PM_QOS implementation.  They change the constrain back and forth at the transaction level of the i2c driver.  The pm_qos code really wasn't made to deal with such hot path use, as each such change triggers a re-computation of what the aggregate qos request is.

That should be trivial in the usual case because 99% of the time you can
hot path

	the QoS entry changing is the latest one
	there have been no other changes
	If it is valid I can use the cached previous aggregate I cunningly
		saved in the top QoS entry when I computed the new one


We need some of this anyway for deep power saving because there is
hardware which can't wake from soem states, which in turn means if that
device is active we need to be above the state in question.

--

From: Peter Zijlstra
Date: Thursday, June 3, 2010 - 3:05 am

Why would the kernel change the QoS state of a task? Why not have two
interacting QoS variables, one for the task, one for the subsystem in

Right, and I can imagine that depending on the platform details and not
the device details, so we get platform hooks in the drivers, or possible
up in the generic stack because I don't think NICs actually know if
there are open connections.
--

From: Kevin Hilman
Date: Thursday, June 3, 2010 - 7:42 am

Yes, having a QoS parameter per-subsystem (or even per-device) is very
important for SoCs that have independently controlled powerdomains.
If all devices/subsystems in a particular powerdomain have QoS
parameters that permit, the power state of that powerdomain can be
lowered independently from system-wide power state and power states of
other power domains.

Kevin
--

From: Gross, Mark
Date: Thursday, June 3, 2010 - 7:52 am

This seems similar to that pm_qos generalization into bus drivers we where 
waving our hands at during the collab summit in April?  We never did get 
into meaningful detail at that time.

--mgross


--

From: Kevin Hilman
Date: Thursday, June 3, 2010 - 9:58 am

The hand-waving was around how to generalize it into the driver-model,
or PM QoS.  We're already doing this for OMAP, but in an OMAP-specific
way, but it's become clear that this is something useful to
generalize.

Kevin
--

From: James Bottomley
Date: Thursday, June 3, 2010 - 10:01 am

Do you have a pointer to the source and description?  It might be useful
to look at to do a reality check on what we're talking about.

James


--

From: Muralidhar, Rajeev D
Date: Thursday, June 3, 2010 - 10:16 am

Hi Kevin, Mark, all,

Yes, from our brief discussions at ELC, and all the ensuing discussions that have happened in the last few weeks, it certainly seems like a good time to think about:
- what is a good model to tie up device idleness, latencies, constraints with cpu idle infrastructure - extensions to PM_QOS, part of what is being discussed, especially Kevin's earlier mail about QOS parameter per subsystem/device that may have independent clock/power domain control.

- what is a good infrastructure to subsequently allow platform-specific low power state - extensions to cpuidle infrastructure to allow platform-wide low power state? Exact conditions for such entry/exit into low power state (latency, wake, etc.) could be platform specific.

Is it a good idea to discuss about a model that could be applicable to other SOCs/platforms as well?

Thanks
Rajeev


-----Original Message-----
From: linux-pm-bounces@lists.linux-foundation.org [mailto:linux-pm-bounces@lists.linux-foundation.org] On Behalf Of Kevin Hilman
Sent: Thursday, June 03, 2010 10:28 PM
To: Gross, Mark
Cc: Neil Brown; tytso@mit.edu; Peter Zijlstra; felipe.balbi@nokia.com; LKML; Florian Mickler; James Bottomley; Thomas Gleixner; Linux OMAP Mailing List; Linux PM; Alan Cox
Subject: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)


The hand-waving was around how to generalize it into the driver-model,
or PM QoS.  We're already doing this for OMAP, but in an OMAP-specific
way, but it's become clear that this is something useful to
generalize.

Kevin
_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm
--

From: Bryan Huntsman
Date: Thursday, June 3, 2010 - 2:50 pm

I think there is definitely a need for QoS parameters per-device.  I've 
been pondering how to incorporate this concept into runtime_pm.  One 
idea would be to add pm_qos-like callbacks to struct dev_pm_ops, e.g. 
runtime_pm_qos_add/update/remove_requirement().  Requirements would be 
passed up the tree to the first parent that cares, usually a bus driver. 
  Is this similar to what you guys were discussing at the collab summit? 
  Thanks.

- Bryan
--

From: James Bottomley
Date: Thursday, June 3, 2010 - 6:24 am

It's not just the list based computation: that's trivial to fix, as you
say ... the other problem is the notifier chain, because that's blocking
and could be long.  Could we invoke the notifier through a workqueue?
It doesn't seem to have veto power, so it's pure notification, does it

James


--

From: Florian Mickler
Date: Thursday, June 3, 2010 - 7:18 am

On Thu, 03 Jun 2010 08:24:31 -0500

I think schedule_work() (worqueue.h) can take care of that. 
Thats how the rfkill subsystem does it. 

Cheers,
Flo
--

From: Gross, Mark
Date: Thursday, June 3, 2010 - 7:26 am

[mtg: ] true.  The notifications "could be" done on as a scheduled work item
in most cases.  I think there is only one user of the notification so far 
any way.  Most pm_qos users do a pole of the current value for whatever parameter they are interested in.


--

From: Thomas Gleixner
Date: Thursday, June 3, 2010 - 7:35 am

It depends on the information type and for a lot of things we might
get away without notifiers. 

The only real issue is when you need to get other cores out of their
deep idle state to make a new constraint work. That's what we do with
the DMA latency notifier right now.

Thanks,

	tglx
--

From: James Bottomley
Date: Thursday, June 3, 2010 - 7:55 am

But the only DMA latency notifier is cpuidle_latency_notifier.  That
looks callable from atomic context, so we could have two chains: one
atomic and one not.

The only other notifier in use is the ieee80211_max_network_latency,
which uses mutexes, so does require user context.

James


--

From: mark gross
Date: Tuesday, June 1, 2010 - 7:45 pm

This is all nice but, all this does is implement the exact same thing as
the wake lock / suspend blocker API as a pm_qos request-class.  It
leaves the overlapping constraint issue from ISR to user mode in place
depending on exactly how the oppertunistic suspend is implemented.

I expect it will be via a notifier on the pm_qos request-class update
that would do exactly what the wake lock code does today.  just load up
an a "suspend_on_non_interactivity" driver that registers for the call
back, have it enabled by the user mode PM, and you have the equivelent
architecture as what was proposed by the wake lock patches.

it gives the Android guys what they want, without adding a new
subsystem, minimizing the changes and makes most of the architecture
much more politicaly acceptible.

But doesn't it have the same issues with getting the overlapping
constraints right from wake up source to user mode and dealing with the
wake up envents in a sane way?  Instead of sprinkling suspend-blockers
about the kernel we'll sprinkle pm_qos_requests about.  I like getting

I don't think the status would be a big deal to add.


However; I am really burned out by this discussion.  I am willing to
stub this out ASAP if it puts this behind us if the principles in the
discussion are in more or less agreement.

--mgross

For the record, I still like my low power event idea, which could
coexist with the above.


--

From: James Bottomley
Date: Tuesday, June 1, 2010 - 9:14 pm

if the vanilla kernel is simply consuming the pm_qos infrastructure and
using suspend from idle, this is irrelevant.  As I said, S3 suspend
*can* be implemented via a suspend manager process from userspace (the
alan stern proposal).  However, if I were coding the android kernel, I'd
do it as a tiny add on kernel patch.  The main goal of making the
android kernel close enough to the vanilla kernel for there not to be
two separate upstreams for the device driver writers has been achieved

Suspend from idle doesn't have the wakeup problem.  it only manifests if
you want to take the system down via the S states.  I think long term,
making suspend from idle work for all hardware is the agreed goal, even
if android can't implement it today and has to use an S state work

The proposal is isomorphic to what I said above ... just
s/pm_qos/whatever the lp API is/

James


--

From: Thomas Gleixner
Date: Monday, May 31, 2010 - 2:41 pm

That's wrong. You only need the explicit dynamic QoS constraints for
applications which follow the scheme:

     while (1) {
     	   if (event_available())
	      process_event();
	   else
	      do_useless_crap_which_consumes_power();
     }	   

which need the following annotation:

     while (1) {
     	   block_suspend();
     	   if (event_available()) {
	      process_event();
	      unblock_suspend();
	   } else {
	      unblock_suspend();
	      do_useless_crap_which_consumes_power();
           }
     }	   

Plus the kernel counterpart of drivers which take the suspend blocker
in the interrupt handler and release it when the event queue is empty.

So that's done for making polling event handling power "efficient".

Even worse, you need the same "annotation" for non polling mode and it
enforces the use of select() because you cannot take a suspend blocker
across a blocking read() without adding more invasive interactions to
the kernel..

So the "sane" app looks like:

   while (1) {
   	 select();
	 block_suspend();
	 process_events();
	 unblock_suspend();
   }

I'm really tired of arguing that this promotion of "programming style"
is the worst idea ever, so let's look how you can do the same thing
QoS based.

s/block_suspend()/qos(INTERACTIVE)/ and
s/unblock_suspend()/qos(NONE)/ and
s/block_magic()/qos_magic()/ in the drivers.

Yes, it's mostly the same, with a subtle difference:

While android can use it in the big hammer approach to disable the
existing user initiated suspend via /sys/power/state, the rest of the
world can benefit as well in various ways.

 - Sane applications which use a blocking event wait can be handled
   with a static QoS setting simply because a blocking read relies on
   the QoS state of the underlying I/O system.

 - Idle based suspend as the logical consequence of idle states is
   just a matter of QoS constraint based decisions.

 - Untrusted apps can be confined in cgroups. The groups are set to
   ...
From: Rafael J. Wysocki
Date: Monday, May 31, 2010 - 3:23 pm

I generally agree.

I think the Alan Stern's recent proposal goes along these lines, but it has
the advantage of being a bit more specific. ;-)

Thanks,
Rafael
--

From: Thomas Gleixner
Date: Monday, May 31, 2010 - 3:27 pm

Yes, Alan Stern's proposal is going into that direction and I'm not
opposed. Just wanted to get the overall picture straight for James :)

Thanks,

	tglx
--

From: James Bottomley
Date: Monday, May 31, 2010 - 4:47 pm

So this is the re-expression in terms of a QoS API that I mentioned ...
as I said, I think it's the way forwards. (from the android point of
view, it keeps the user space expression in exactly the same place as
the original wake locks, or suspend blocks, which is why it looks like a

I understand this ... it's effectively the alan stern approach.  I've

Yes, which is why I think something like this can be made to work ... I
don't really see that we differ on the broad brush picture.  As long as
the acceptable implementation accomplishes what everyone wants, I think
we're home.



--

From: Brian Swetland
Date: Thursday, May 27, 2010 - 9:55 pm

I think that the suspend block model can be viewed as a constraints
problem (similar to some of things things you've been sketching out in
these threads), but I think we (Google/Android) view it as more of a
state constraint (don't enter suspend) than a latency constraint.

We think there's a need for these constraints both from the driver
side and userspace side, and that these constraints are not tied to
processes (multiple entities in one process may have different
constraints at different times or multiple processes may be working
together to accomplish some goal under a single constraint -- at least
both cases exist in the Android system as it ships today).

The exact naming of the API is not terribly important to us.  The
first thing we spent a bunch of time discussing last summer when Arve
first looked into sending wakelocks upstream was changing the name
because many objected to "wakelock" for various reasons.

Being able to have userful statistics (which drivers/processes/etc
held which wakelock for how long, how many times, etc) is important to
us.  While we want to do the best we can in the face of poorly written
apps, we also want to educate users and developers about which apps
are contributing to their poor battery life -- so users can decide to
uninstall an app if its usefulness does not justify its impact on
battery life and application developers can be more aware of what the
cost of their app is to endusers.

As an example, http://frotz.net/misc/battery-stats-unplugged.txt
contains a dump from the "battery service" aggregating wakelock usage,
cpu usage, and sensor device usage of processes (#....: sections) on
my phone the other day for a ~3 hour period.  This data is presented
visually to the enduser in a "what's using my battery" feature of the
platform.  "realtime" refers to wall clock time here and "uptime"
refers to not-in-suspend execution time.

Brian
--

From: Florian Mickler
Date: Thursday, May 27, 2010 - 11:39 pm

On Thu, 27 May 2010 21:55:26 -0700


Hi!
Thinking about the issue a little more, this isn't really about trusted
apps and not trusted apps. Or crapplications. 

The point is, that as soon as an app takes a suspend-blocker it becomes
 what is here referred to as a "trusted app". But just because it is then visible as
consuming power in an official way. 

Android suspends (as in echo mem > /sys/power/state)
whenever possible. It's as if there were a spring on the laptop lid,
and if the user doesnt hold his grip on it, the thing closes. How does
he hold his grip? The application registers a suspend-blocker for him.

So, why not use something like idle/QOS with this? 

I can imagine to theoretically have a "latency requirement" where 0
means this application does not interact with the user. and != 0 means
this application interacts with the user.

("latency requirement" doesn't quite get it, but it works for now)

In android land, the default would be that every application has a
latency-requirement of 0. And then everything (userland) that takes a
suspend-blocker would be changed to take a "latency requirement != 0". 

Now, if the system interacts with the user
( i.e. there is a global
latency requirement > 0, where "global latency requirement" is
computed by the pm framework maxing over all the userland processes
and the kernel side)
everything has to run. So we also need to schedule things which specify 
a latency requirement == 0.

This last thing means, that it has to be independent of the scheduler, doesn't it?

I don't see how renaming suspend_blocker to set_pidle would not do
something equivalent to this, but the bit's are probably a bit scattered
throughout the kernel. 
(Which I don't think is introduced by that patch set, but by the fact that 
suspend is currently not an idle state.)

I can understand if there needs to be a good solution in the kernel
from day 1. 

So, what would compose to a good solution? 

Here should probably the more experienced ...
From: Arve Hjønnevåg
Date: Thursday, May 27, 2010 - 7:47 pm

Android does not only run on phones. It is possible that no android
devices have ACPI, but I don't know that for a fact. What I do know is
that people want to run Android on x86 hardware and supporting suspend

I think existing laptops (and desktops) can benefit from opportunistic
suspend support. If opportunistic suspend is used for auto-sleep after
inactivity instead of forced suspend, the user space suspend blocker
api will allow an application to delay this auto sleep until for
instance a download completes. This part could also be done with a
user-space IPC call, but having a standard kernel interface for it may
make it more common. A less common case, but more critical, is RTC
alarms. I know my desktops can wakeup at a specific time by
programming an RTC alarm, but without suspend blockers how do you
ensure that the system does not suspend right after the alarm
triggered? I have a system that wakes up at specific times requested
by my DVR application, but I cannot use this system for anything else
unless I manually turn off the DVR application's auto-sleep feature.
With suspend blockers and something like the android alarm driver, I
could use this system for more than one application that have
scheduled tasks and it would be more usable for interactive
applications.

-- 
Arve Hjønnevåg
--

From: Alan Cox
Date: Friday, May 28, 2010 - 2:17 am

Sufficently beneficial to justify putting all this stuff all over the
kernel and apps ? That is a *very* high hurdle, doubly so when those
vendors who have chosen to be part of the community are shipping phones

This assumes you modify all the applications. That isn't going to happen.

How do you know that isn't the correct behavior. My laptop behaves in
that way if for example the battery is almost flat. Your suspend blocker
would cause me to lose all my work with a flat battery. This is another
example of why the application must not be the policy manager.

In the normal case in the PC world outside of corner cases like flat
batteries the answer is really simple. The laptop suspend to RAM
on idle intervals set in the BIOS and the like are sufficient that
progress will have been made before it considers going back to sleep
again. Right now its about ten seconds in each direction plus other costs
(wear on LCD backlight, disc parking etc).

Alan
--

From: Arve Hjønnevåg
Date: Friday, May 28, 2010 - 2:32 am

No it does not. You only have to modify the applications were you want


If the inactivity timeout happens to expire at the same time as my
alarm that would wake up the system to run my scheduled task if it was
already suspended my sceduled task will not run when scheduled. How


I'm not sure what you are trying to say here. Are you saying your
laptop enters S3 from idle?

-- 
Arve Hjønnevåg
--

From: Alan Cox
Date: Friday, May 28, 2010 - 4:16 am

If I have an alarm set on my laptop it will wake up when the alarm goes
off. Once it has woken up it will not go back to suspend (except for
something libe a battery event) until a timeout has elapsed that began
when the laptop woke up.

This in the laptop work solves the problem of making progress. On a
laptop power budget, with laptop constraints on suspend (both physical
cycle limits of hardware and performance) this works fine.

If I suspend/resume my laptop every time I have a 30 second idle gap I
will need a new laptop much sooner than makes me happy.

I don't claim this is true for a typical mobile phone obviously.

Alan
--

From: Arve Hjønnevåg
Date: Friday, May 28, 2010 - 4:20 am

Forced suspend is still supported. No new API is needed if you really

I think you are missing the point. It works fine if the alarm caused
the wakeup, but if you had just used your system and your inactivity
timeout expired just as your alarm goes off, the alarm will not wake

Then don't set your inactivity timeout to 30 seconds. I don't see how
The only difference on the phone is that we have way more wakeup
events which makes the race conditions more visible. The race exist on
your laptop as well.

-- 
Arve Hjønnevåg
--

From: Alan Cox
Date: Friday, May 28, 2010 - 6:55 am

As far as I can tell (and its an extremely hard situation to replicate),
this is not true. My laptop sleeps and wakes straight back up.

The following cannot occur on my laptop for simple idling

	Alarm
		Suspend

because the Alarm resets the suspend timer when it is delivered. The wake
pins and wake logic also ensure that the sequence

		Suspend
			Alarm

always causes

		Suspend
			Alarm
		Suspend Finishes

It's very relevant because it means that considering current laptops is

The number of events is I think only partly relevant. What matters is how
long you wait between idle and suspending. The longer you wait the less
potential you have to end up with an event successfully owned by an
application you are not considering relevant to suspend.

--

From: Matthew Garrett
Date: Friday, May 28, 2010 - 7:05 am

Userspace is about to write to /sys/power/state when it gets scheduled. 
Alarm delivery occurs at that instant. Kernel has no idea that it's 
about to go to sleep, so the driver handles things appropriately and 
clears the hardware state. Userspace gets scheduled, writes and the 
system suspends. The problem is that having userspace decidie to 
initiate a suspend and then actually initiate a suspend isn't an atomic 
operation.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Alan Cox
Date: Friday, May 28, 2010 - 5:21 am

Ok lets try and produce something more concrete. The control groups may
be the wrong tool but we've got several such tools already


Kernel involved
----------------
acquire:		mark myself important (into cgroup important)
acquire(timeout)	ditto, plus app timer/timeout handler
release:		mark myself unimportant (into cgroup downtrodden) 

All user
--------
isHeld:			app implementation internal
setReferenceCounted:	app implementation internal


In the idle manager [Androids own probably]

	if (member of ignored cgroup && in user space)
		ignore for idle purposes


In the Android code managing this [Android specific bits of
probably userspace]
	mark downtrodden as ignored
	mark downtrodden as not ignored


[Total kernel changes

	Ability to mark/unmark a scheduler control group as outside of
	some parts of idle consideration. Generically useful and
	localised. Group latency will do most jobs fine (Zygo is correct
	it can't solve his backup case elegantly I think)

	Test in the idling logic to distinguish the case and only needed
	for a single Android specific power module. Generically useful
	and localised]

So I put my phone down

	The UI manager gets told the phone is 'down'
	Ten seconds later it is still down
	It marks the downtrodden group as 'ignored'

	The idle logic goes
		Nothing to run powersave
		Still nothing
		Ooh 0.3 seconds of nothing
		Drop into suspend state


If I push the button we get an IRQ
We come out of power save
The app gets poked
The app may be unimportant but the IRQ means we have a new timeout of
    some form to run down to idle
The app marks itself important
The app stays awake for 60 seconds rsyncing your email
The app marks itself unimportant
Time elapses
We return to suspend


If you are absolutely utterly paranoid about it you need the button
driver to mark the task it wakes back as important rather than rely on
time for response like everyone else. That specific bit is uggglly but
worst case its just a google ...
From: Peter Zijlstra
Date: Friday, May 28, 2010 - 5:30 am

I really don't like this..

Why can't we go with the previously suggested: make bad apps block on
QoS resources or send SIGXCPU, SIGSTOP, SIGTERM and eventually SIGKILL?



--

From: Alan Cox
Date: Friday, May 28, 2010 - 6:02 am

On Fri, 28 May 2010 14:30:36 +0200

Ok. Are you happy with the QoS being attached to a scheduler control
group and the use of them to figure out what is what ?
--

From: Peter Zijlstra
Date: Friday, May 28, 2010 - 6:20 am

Up to a point, but explicitly not running runnable tasks complicates the
task model significantly, and interacts with fun stuff like bandwidth
inheritance and priority/deadline inheritance like things -- a subject
you really don't want to complicate further.

We really want to do our utmost best to make applications block on
something without altering our task model.

If applications keep running despite being told repeatedly to cease, I
think the SIGKILL option is a sane one (they got SIGXCPU, SIGSTOP and
SIGTERM before that) and got ample opportunity to block on something.

Traditional cpu resource management treats the CPU as an ever
replenished resource, breaking that assumption (not running runnable
tasks) puts us on very shaky ground indeed.

--

From: Peter Zijlstra
Date: Friday, May 28, 2010 - 7:59 am

Also, I'm not quite sure why we would need cgroups to pull this off.

It seems most of the problems the suspend-blockers are trying to solve
are due to the fact of not running runnable tasks. Not running runnable
tasks can be seen as assigning tasks 0 bandwidth. Which is a situation
extremely prone to all things inversion. Such a situation would require
bandwidth inheritance to function at all, so possibly we can see
suspend-blockers as a misguided implementation of that.

So lets look at the problem, we want to be frugal with power, this means
that the system as a whole should strive to do nothing. And we want to
enforce this as strict as possible.

If we look at the windowing thing, lets call it X, X will inform its
clients about the visibility of their window, any client trying to draw
to its window when it has been informed about it not being visible is
wasting energy and should be punished.

(I really wish the actual X on my desktop would do more of that -- its
utterly rediculous that firefox keeps animating banners and the like
when nobody can possibly see them)

Clearly when we turn the screen off, nothing is visible and all clients
should cease to draw.

How do we want to punish dis-obedient clients? Is blocking them
sufficient? Do we want to maintain a shitlist of iffy clients?

Note that the 'buggy' client doesn't function properly, if we block its
main event loop doing this, it won't respond to other events -- but as
argued, its a buggy app, hence its per definition unreliable and we
don't care.

Next comes the interesting problem of who gets to keep the screen lit, I
think in the above case that is a pure userspace problem and doesn't
need kernel intervention.

Can we apply the same reasoning to other resources, filesystems,
network? For both of them it seems the main governing body isn't this
windowing system, but the kernel (although arguably you could fully do
it in middle-ware, just like X is that).

But in both cases I think we can work with a QoS ...
From: Florian Mickler
Date: Friday, May 28, 2010 - 8:53 am

On Fri, 28 May 2010 16:59:54 +0200

An interesting thought might be to add the costs of staying in
a state versus going to a lower power state into consideration. 

If the system is busy doing stuff it would need to do anyway (today
stuff that is guarded/annotated by the suspend blockers) , the costs for
not being in suspend have to be paid anyway. So it is opportune for
processes to run. Even if they by themselves would not justify the
system running. 

If instead nothing system-relevant has to be done, the costs of running
anything non-relevant is the full amount of battery-life that could
be saved by suspending + (some minor) running costs. 

Also if there is much work to do (many tasks) its more likely that it's
good to do the work.

something along the lines :

(amount of energy saved by being in suspend) / (number of tasks we
would run if we werent suspended) *
some_parameter_for_this_tasks_importance (which falls clearly into
scheduler-territory)

And if this goes above some threshold we run it.

But this isn't easily done in a robust way.
Also it complicates things. 

Cheers,
Flo
--

From: Rafael J. Wysocki
Date: Friday, May 28, 2010 - 2:44 pm

I think this is a matter of what is regarded as a "runnable task".  Some
tasks may not even be regarded as runnable in specific power conditions,
although otherwise they would be.

Consider updatedb or another file indexing ... thing on a laptop.  I certainly
don't want anything like this to run and drain my battery, even if it has
already been started when the machine was on AC power.  Now, of course,
I can kill it, but for that I need to notice that it's running and it presumably
might have done some job already and it would be wasteful to lose it.
It would be quite nice if that app was not regarded as runnable when the
system was on battery power.

In my view that's quite analogous to the Android situation, when they simply
don't want some tasks to be regarded as runnable in specific situations.

Rafael
--

From: Peter Zijlstra
Date: Saturday, May 29, 2010 - 12:53 am

How will a ionice on steriods that will defer servicing IO when the IO
system QoS limit doesn't meet the updatedb process's level is too low,
not solve this?

In that case the updatedb process will simply block on IO, will hence
not be runnable and thus not drain your battery.
--

From: Rafael J. Wysocki
Date: Saturday, May 29, 2010 - 1:12 pm

It will only work for apps that use I/O, but there may be purely CPU-bound
ones that need that kind of approach too.
--

From: Matthew Garrett
Date: Friday, May 28, 2010 - 5:31 am

<- wakeup event that should be delivered to untrusted app arrives here

At this point you may mark the downtrodden group as ignored between the 
untrusted app receiving the event and the untrusted app marking itself 
as important. To avoid this you need the UI manager to receive every 

(The cgroup has to have some awareness of suspend/resume so that it can 

The timeout-based nature means that if the application doesn't get 
scheduled for some reason (say there's heavy swap pressure - not likely 
in the embedded world, but an issue on laptop-type devices) the event 
may not be handled before you get back to sleep. I accept that this 
isn't likely to be a problem in the real world, but it does make this 

Not just the button driver. Every driver that generates wakeupa. This 
gets difficult when it comes to the network layer, for instance, when 
the network driver has very little idea how the packet it just received 

The problem is that you still have a race, and fixing that race requires 
every event that could generate a wakeup to be proxied out to the policy 
manager as well. That's a moderate additional overhead.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Alan Cox
Date: Friday, May 28, 2010 - 6:54 am

The event wakes the device, the event itself means the kernel is doing
bits so the kernel is active and we are not idled so we have a time
before we will consider re-suspending.

[If you accept that untrusted apps must be constrained then you can't
 allow one to mark itself important - or at least you can't listen to it

I don't think so. The apps will get scheduled anyway when not suspended.
The only reason they are not being scheduled is that the device is

No. Every driver which generates wakeups which should wake an untrusted
application. If network packets to untrusted applications should wake the
box up then a simple background ping process left running is going to
drain your battery and bugger your containment of the mess completely as
you've just accepted an infinite supply of untrusted timed wakeup events

I am not convinced at this point. If the app gets put into the important
group by the driver then you don't need to poke a policy manager.

This again moves us beyond containment because we just allowed an
'untrusted' app a way to be trusted - just as it might abuse a suspend
blocker.

If you accept untrusted apps can't be fixed (for example they could
simply lose the event internally due to app code bugs) then the static
case all looks pretty trivial.

With a Meego hat on you'd dump all the stuff you didn't trust into a
scheduler group and tell the suspend aspect of the idle choice to ignore
it when the screen blanks. While you are it you also get a free ticket to
putting trusted rt apps into the 'and don't even C6' group.

Alan
--

From: Matthew Garrett
Date: Friday, May 28, 2010 - 7:02 am

Ok, I think I've misunderstood you. You're actually saying that only 
applications that are trusted to behave well are allowed to receive 
wakeup events? Yes, that makes implementation significantly easier. If 
that maps reasonably well onto the existing Android application space, 
it may even be an acceptable compromise.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
--

From: Alan Cox
Date: Friday, May 28, 2010 - 8:24 am

To receive them in a manner that they are permitted to defer a suspend.
There is non reason why bouncing cows shouldn't get to see an event, but
there is always the miniscule possibility that we choose to suspend as it
gets the event.

That to me seems fine. Our starting basis was

- Bouncing cows is not trusted

Android's reaction was

- We reserve the right to suspend bouncing cows where it likes it or not

The caveat becomes

- Bouncing cows may get suspended then get an event when the phone wakes
  back up. So I might press "Moo" just before a suspend and get the noise
  when it resumes.

Given the untrusted cows could respond to the event otherwise by blocking
the suspend for as long as permitted with a suspend blocker or similar
that seems no worse. In this case probably better [oof zap! as opposed to
60 seconds of 'event, no sorry got a cow to draw at 100% CPU')

As the app is untrusted we can't assume they would get suspend blockers
right even if they had any.

You can still be nice to the cows app and when the phone is put down send
it a 10 second warning via dbus or Android equivalents.

Your trusted call handling app can still request (by QoS or big hammers)
that the phone does not suspend even if the app goes idle (because you
have a wakeup latency QoS)

A naïve trusted app will behave according to power management idling to
suspend and get stopped

A naïve untrusted app that is doing sane things will spend most of its
life asleep and behave.

--

From: Vitaly Wool
Date: Saturday, May 29, 2010 - 1:39 am

May we somehow live without acquire(timeout)? This is the feature that
can screw up a lot of things with very complicated debugging options.

~Vitaly
--

From: Zygo Blaxell
Date: Sunday, May 30, 2010 - 6:57 pm

I'm not sure "other people are shipping without them" is such a good
metric, especially for scheduler features.  For some reason (I have some
ideas what it might be, but I won't speculate here) people don't like
messing with the scheduler in mainline, even though there's a lot of
special cases where a bit of messing with the scheduler (or replacing
it outright) goes a long way toward qualitatively improving performance
on some workloads.

I'd love to have several more ways to have large classes of processes stop
executing, and stay stopped, even though traditional Unix and mainline
Linux would try to run them.  I don't want to put knowledge of this into
every application I run since there are literally thousands of them,
and IMNSHO it's not even an application's responsibility to know this
kind of thing.  The "sort" program can't know what QoS to ask for in any
sane system design.  The best it can do is try to execute as hard as it
can whenever the kernel lets it, and have some other application advise
the kernel about how much or how little service (including cases like
"no service at all") the sort program should get from the system.

To choose a random example, I'd like a "duty cycle" constraint on
process execution (i.e. a runnable task must execute between L and M ns
per N ns interval--stealing slices from lower priority processes if it
doesn't get enough and isn't blocked on I/O, and leaving the CPU idle even
though the process is runnable if it gets too much).  I usually want to
apply this kind of limit to programs like Firefox, because Firefox is a)
big enough that controlling it actually matters for power consumption,
b) sensitive enough to user interaction latency that I want it to have
fairly high CPU priority when it has something to do, and c) big and
complex enough that I wouldn't want to try to adjust its behavior by
modifying its source.  Also, Firefox's behavior tends to be driven by
the data it pulls from random web sites, over which I have no ...
From: Ingo Molnar
Date: Friday, May 28, 2010 - 2:21 am

(If there's a sane framework then we'll fix x86 to fit into it and will deal 

I really like the level of detail and care that went into suspend-blockers, 
and i think the Android solution is very mature in terms of functionality 
offered to users.

In terms of bringing this depth of functionality and control to the upstream 
kernel, what do you think about Alan's QoS scheme, described in:

   <20100528001514.28e593ef@lxorguk.ukuu.org.uk>

?

It's in essence suspend-blockers on steroids. It consists of two main 
components:

 - Unify the 'suspended' state into the regular chain of idle states, and
   create a single, coherent and transparent way we handle system idleness.

 - Give apps a QoS attribute that allows them to express how long they can
   afford to wait for a wakeup. (A downloading app would set it to say 50msecs,
   and thus the kernel would know it automatically which method of idleness is 
   still achievable. If all currently running apps have a max(QoS) attribute 
   of infinite, then the kernel can suspend for an unlimited amount of time.)

AFAICS, and i have read through your suspend-blocker usecases, this should 
handle all the usecases you listed - and some more. (please yell if that's not 
so)

Suspend-blockers are equivalent to: 'app sets idle QoS latency to 0 msecs'.

(And on x86, for BIOS/CPU combos that allow it we can implement this scheme 
too.)

Thoughts?

	Ingo
--

From: Arve Hjønnevåg
Date: Friday, May 28, 2010 - 2:59 am

Tying the QoS attribute to apps does not work (all proposals I have
seen have race conditions), but replacing every suspend blocker with
unique QoS object will work, since is the same thing as what suspend
blockers provide. I think replacing suspend blockers with artificial
latency requirements is a bad idea though, since we use them to ensure
a specific level of functionality (tasks, timers and interrupts
operate normally). If we get a more generic constraint framework,
suspend blockers may possibly be absorbed by this, but I think the
current implementation is useful as is (it could even be useful to
someone working on a generic constraints framework).

-- 
Arve Hjønnevåg
--

Previous thread: