So after all the heat that was generated in the various Android discussions, I took a look a look at the android git tree, and while there are a fair number of large and controversial infrastructure changes, there are also a number of small fixes that apply easily against Linus' git tree. So after cherry picking these 50-some small patches out of the android tree, I organized them into topic branches, and over the next few weeks, I hope to send them out to lkml and topic maintainers for comments. Now, I'm not proposing that these changes be merged as-is. It may very well be that, unknown to me, android developers have already tried to submit these patches and they have been rejected for good reason. Or some patches may very well be necessary hacks to get thing shipping while deeper fixes are being worked on. If that is the case, let me know and forgive me for the noise. But as, it seemed many of these small changes have been obscured by the debate over the larger infrastructure changes, I wanted to bring them forward so that possibly good fixes were not missed in the controversy. Maintainers: If you do find any of these patches distasteful, that's fine, I'll be happy to drop them from my tree for now. I really don't want to stir up another huge mail thread over these small patches, but I'd appreciate if you'd consider them as a bug report illustrating an issue or a desired feature, and suggest what you see as a reasonable way to accomplish the desired functionality presented in the patch. The following patches are just the scheduler related trivial patches from the Android tree. You can find this as well as my other trivial Android topic branches here: http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=summary thanks -john Cc: Arve Hj
From: Erik Gilling <konkers@android.com> CC: Ingo Molnar <mingo@elte.hu> CC: Peter Zijlstra <peterz@infradead.org> Change-Id: I1a5c9676baa06c9f9b4424bbcab01b9b2fbfcd99 Signed-off-by: Erik Gilling <konkers@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org> --- kernel/sched.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/kernel/sched.c b/kernel/sched.c index 0b58415..c99bbb2 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -5390,7 +5390,7 @@ void sched_show_task(struct task_struct *p) unsigned state; state = p->state ? __ffs(p->state) + 1 : 0; - printk(KERN_INFO "%-13.13s %c", p->comm, + printk(KERN_INFO "%-15.15s %c", p->comm, state < sizeof(stat_nam) - 1 ? stat_nam[state] : '?'); #if BITS_PER_LONG == 32 if (state == TASK_RUNNING) -- 1.7.3.2.146.gca209 --
Commit-ID: 28d0686cf7b14e30243096bd874d3f80591ed392 Gitweb: http://git.kernel.org/tip/28d0686cf7b14e30243096bd874d3f80591ed392 Author: Erik Gilling <konkers@android.com> AuthorDate: Fri, 19 Nov 2010 18:08:51 -0800 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Tue, 23 Nov 2010 10:29:07 +0100 sched: Make task dump print all 15 chars of proc comm Signed-off-by: Erik Gilling <konkers@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290218934-8544-3-git-send-email-john.stultz@linaro.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> --- kernel/sched.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/kernel/sched.c b/kernel/sched.c index 550cf3a..324afce 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -5249,7 +5249,7 @@ void sched_show_task(struct task_struct *p) unsigned state; state = p->state ? __ffs(p->state) + 1 : 0; - printk(KERN_INFO "%-13.13s %c", p->comm, + printk(KERN_INFO "%-15.15s %c", p->comm, state < sizeof(stat_nam) - 1 ? stat_nam[state] : '?'); #if BITS_PER_LONG == 32 if (state == TASK_RUNNING) --
From: Mike Chan <mike@android.com> Introduce new platform callback hooks for cpuacct for tracking CPU frequencies Not all platforms / architectures have a set CPU_FREQ_TABLE defined for CPU transition speeds. In order to track time spent in at various CPU frequencies, we enable platform callbacks from cpuacct for this accounting. Architectures that support overclock boosting, or don't have pre-defined frequency tables can implement their own bucketing system that makes sense given their cpufreq scaling abilities. New file: cpuacct.cpufreq reports the CPU time (in nanoseconds) spent at each CPU frequency. CC: Ingo Molnar <mingo@elte.hu> CC: Peter Zijlstra <peterz@infradead.org> Change-Id: I10a80b3162e6fff3a8a2f74dd6bb37e88b12ba96 Signed-off-by: Mike Chan <mike@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org> --- Documentation/cgroups/cpuacct.txt | 4 +++ include/linux/cpuacct.h | 41 +++++++++++++++++++++++++++++++ kernel/sched.c | 49 +++++++++++++++++++++++++++++++++++++ 3 files changed, 94 insertions(+), 0 deletions(-) create mode 100644 include/linux/cpuacct.h diff --git a/Documentation/cgroups/cpuacct.txt b/Documentation/cgroups/cpuacct.txt index 8b93094..600d2d0 100644 --- a/Documentation/cgroups/cpuacct.txt +++ b/Documentation/cgroups/cpuacct.txt @@ -40,6 +40,10 @@ system: Time spent by tasks of the cgroup in kernel mode. user and system are in USER_HZ unit. +cpuacct.cpufreq file gives CPU time (in nanoseconds) spent at each CPU +frequency. Platform hooks must be implemented inorder to properly track +time at each CPU frequency. + cpuacct controller uses percpu_counter interface to collect user and system times. This has two side effects: diff --git a/include/linux/cpuacct.h b/include/linux/cpuacct.h new file mode 100644 index 0000000..560df02 --- /dev/null +++ b/include/linux/cpuacct.h @@ -0,0 +1,41 @@ +/* include/linux/cpuacct.h + * + * Copyright (C) 2010 Google, Inc. + * + * This ...
I utterly detest all such accounting crap.. it adds ABI constraints it add runtime overhead. etc.. Can't you get the same information by using the various perf bits? If you trace the cpufreq changes you can compute the time spend in each power state, if you additionally trace the sched_switch you can compute it for each task. --
On Sat, 20 Nov 2010 11:48:24 +0100 This is probably used for "on-site" debugging of production systems. I.e. when someone sends them a problem report using an bugreport-tool, they gather all useful information they can get on the system because they only have one-way communication with their bug reporters. Do the perf bits work for such a usecase? If I guess correctly, the perf bits need a userspace part that computes what would be in the cpuacct.cpufreq file? Regards, Flo --
Dude, its from the _android_ tree... its cpufreq crud.. it must be some crack induced power management scheme. --
On Mon, 22 Nov 2010 11:43:59 +0100 what I wanted to get at, was that they probably need these stats aggregated somewhere neat and tidy and can not compute them on the fly recording massive amounts of data... I wonder why they didn't put this in the idle-driver. I don't know. Regards, Flo --
This is useful for tracking cpu power per c-group. We split each android application into its own c-group and track what cpu speeds and how long the cpu spent for each one. Peter we've actually discussed this before: http://lkml.org/lkml/2010/5/6/301 These patches were discussed with Paul Menage and Balbir Singh back in April, as well as on lmkl and the cpufreq mailing lists. These may or may not be useful for mainline, I assume anyone wanting to track power specific for c-groups would be interested. I'm open for different implementations that can help achieve cpu power tracking per-cgroup if this particular implementation is controversial, or if you just want to help make Android's kernel better. --
Right, so Stephane is working on perf-cgroup bits (I saw he recently posted another version, which I guess I ought to look at soonish). With that it would be rather simple to use perf to track per-cgroup power state. --
From: Dima Zavin <dima@android.com>
After pulling the thread off the run-queue during a cgroup change,
the cfs_rq.min_vruntime gets recalculated. The dequeued thread's vruntime
then gets normalized to this new value. This can then lead to the thread
getting an unfair boost in the new group if the vruntime of the next
task in the old run-queue was way further ahead.
CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
Cc: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Dima Zavin <dima@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
kernel/sched_fair.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index f4f6a83..72f19ad 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -802,6 +802,8 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se)
static void
dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
+ u64 min_vruntime;
+
/*
* Update run-time statistics of the 'current'.
*/
@@ -826,6 +828,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
if (se != cfs_rq->curr)
__dequeue_entity(cfs_rq, se);
account_entity_dequeue(cfs_rq, se);
+
+ min_vruntime = cfs_rq->min_vruntime;
update_min_vruntime(cfs_rq);
/*
@@ -834,7 +838,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
* movement in our normalized position.
*/
if (!(flags & DEQUEUE_SLEEP))
- se->vruntime -= cfs_rq->min_vruntime;
+ se->vruntime -= min_vruntime;
}
/*
--
1.7.3.2.146.gca209
--
Right, so assuming the reasoning is right (my brain still needs to wake up) the patch is weird, by not simply move the code bock up and avoid the whole extra variable like so? --- kernel/sched_fair.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c index d35f464..dfa28ef 100644 --- a/kernel/sched_fair.c +++ b/kernel/sched_fair.c @@ -1003,8 +1003,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) se->on_rq = 0; update_cfs_load(cfs_rq, 0); account_entity_dequeue(cfs_rq, se); - update_min_vruntime(cfs_rq); - update_cfs_shares(cfs_rq, 0); /* * Normalize the entity after updating the min_vruntime because the @@ -1013,6 +1011,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) */ if (!(flags & DEQUEUE_SLEEP)) se->vruntime -= cfs_rq->min_vruntime; + + update_min_vruntime(cfs_rq); + update_cfs_shares(cfs_rq, 0); } /* --
Also, clearly that comments needs addressing.. --
