Re: [PATCH 3/5] scheduler: cpuacct: Enable platform hooks to track cpuusage for CPU frequencies

Previous thread: [PATCH 4/5] scheduler: cpuacct: Enable platform callbacks for cpuacct power tracking by John Stultz on Friday, November 19, 2010 - 7:08 pm. (1 message)

Next thread: [PATCH] ARM: Fix spinlock bad magic on disabling nonboot cpu by Colin Cross on Friday, November 19, 2010 - 7:16 pm. (3 messages)
From: John Stultz
Date: Friday, November 19, 2010 - 7:08 pm

So after all the heat that was generated in the various Android
discussions, I took a look a look at the android git tree, and 
while there are a fair number of large and controversial 
infrastructure changes, there are also a number of small fixes 
that apply easily against Linus' git tree.

So after cherry picking these 50-some small patches out of the
android tree, I organized them into topic branches, and over
the next few weeks, I hope to send them out to lkml and topic
maintainers for comments.

Now, I'm not proposing that these changes be merged as-is. It
may very well be that, unknown to me, android developers have
already tried to submit these patches and they have been rejected
for good reason. Or some patches may very well be necessary hacks
to get thing shipping while deeper fixes are being worked on. If
that is the case, let me know and forgive me for the noise.

But as, it seemed many of these small changes have been obscured
by the debate over the larger infrastructure changes, I wanted 
to bring them forward so that possibly good fixes were not missed
in the controversy.

Maintainers: If you do find any of these patches distasteful,
that's fine, I'll be happy to drop them from my tree for now.
I really don't want to stir up another huge mail thread over these 
small patches, but I'd appreciate if you'd consider them as a
bug report illustrating an issue or a desired feature, and suggest 
what you see as a reasonable way to accomplish the desired
functionality presented in the patch.

The following patches are just the scheduler related trivial patches
from the Android tree. You can find this as well as my other trivial
Android topic branches here:
http://git.linaro.org/gitweb?p=people/jstultz/linux.git;a=summary

thanks
-john

Cc: Arve Hj
From: John Stultz
Date: Friday, November 19, 2010 - 7:08 pm

From: Erik Gilling <konkers@android.com>

CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
Change-Id: I1a5c9676baa06c9f9b4424bbcab01b9b2fbfcd99
Signed-off-by: Erik Gilling <konkers@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 0b58415..c99bbb2 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5390,7 +5390,7 @@ void sched_show_task(struct task_struct *p)
 	unsigned state;
 
 	state = p->state ? __ffs(p->state) + 1 : 0;
-	printk(KERN_INFO "%-13.13s %c", p->comm,
+	printk(KERN_INFO "%-15.15s %c", p->comm,
 		state < sizeof(stat_nam) - 1 ? stat_nam[state] : '?');
 #if BITS_PER_LONG == 32
 	if (state == TASK_RUNNING)
-- 
1.7.3.2.146.gca209

--

From: tip-bot for Erik Gilling
Date: Tuesday, November 23, 2010 - 3:21 am

Commit-ID:  28d0686cf7b14e30243096bd874d3f80591ed392
Gitweb:     http://git.kernel.org/tip/28d0686cf7b14e30243096bd874d3f80591ed392
Author:     Erik Gilling <konkers@android.com>
AuthorDate: Fri, 19 Nov 2010 18:08:51 -0800
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 23 Nov 2010 10:29:07 +0100

sched: Make task dump print all 15 chars of proc comm

Signed-off-by: Erik Gilling <konkers@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1290218934-8544-3-git-send-email-john.stultz@linaro.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 550cf3a..324afce 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5249,7 +5249,7 @@ void sched_show_task(struct task_struct *p)
 	unsigned state;
 
 	state = p->state ? __ffs(p->state) + 1 : 0;
-	printk(KERN_INFO "%-13.13s %c", p->comm,
+	printk(KERN_INFO "%-15.15s %c", p->comm,
 		state < sizeof(stat_nam) - 1 ? stat_nam[state] : '?');
 #if BITS_PER_LONG == 32
 	if (state == TASK_RUNNING)
--

From: John Stultz
Date: Friday, November 19, 2010 - 7:08 pm

From: Mike Chan <mike@android.com>

Introduce new platform callback hooks for cpuacct for tracking CPU frequencies

Not all platforms / architectures have a set CPU_FREQ_TABLE defined
for CPU transition speeds. In order to track time spent in at various
CPU frequencies, we enable platform callbacks from cpuacct for this accounting.

Architectures that support overclock boosting, or don't have pre-defined
frequency tables can implement their own bucketing system that makes sense
given their cpufreq scaling abilities.

New file:
cpuacct.cpufreq reports the CPU time (in nanoseconds) spent at each CPU
frequency.

CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
Change-Id: I10a80b3162e6fff3a8a2f74dd6bb37e88b12ba96
Signed-off-by: Mike Chan <mike@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 Documentation/cgroups/cpuacct.txt |    4 +++
 include/linux/cpuacct.h           |   41 +++++++++++++++++++++++++++++++
 kernel/sched.c                    |   49 +++++++++++++++++++++++++++++++++++++
 3 files changed, 94 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/cpuacct.h

diff --git a/Documentation/cgroups/cpuacct.txt b/Documentation/cgroups/cpuacct.txt
index 8b93094..600d2d0 100644
--- a/Documentation/cgroups/cpuacct.txt
+++ b/Documentation/cgroups/cpuacct.txt
@@ -40,6 +40,10 @@ system: Time spent by tasks of the cgroup in kernel mode.
 
 user and system are in USER_HZ unit.
 
+cpuacct.cpufreq file gives CPU time (in nanoseconds) spent at each CPU
+frequency. Platform hooks must be implemented inorder to properly track
+time at each CPU frequency.
+
 cpuacct controller uses percpu_counter interface to collect user and
 system times. This has two side effects:
 
diff --git a/include/linux/cpuacct.h b/include/linux/cpuacct.h
new file mode 100644
index 0000000..560df02
--- /dev/null
+++ b/include/linux/cpuacct.h
@@ -0,0 +1,41 @@
+/* include/linux/cpuacct.h
+ *
+ * Copyright (C) 2010 Google, Inc.
+ *
+ * This ...
From: Peter Zijlstra
Date: Saturday, November 20, 2010 - 3:48 am

I utterly detest all such accounting crap.. it adds ABI constraints it
add runtime overhead. etc.. 

Can't you get the same information by using the various perf bits? If
you trace the cpufreq changes you can compute the time spend in each
power state, if you additionally trace the sched_switch you can compute
it for each task.


--

From: Florian Mickler
Date: Sunday, November 21, 2010 - 10:51 pm

On Sat, 20 Nov 2010 11:48:24 +0100
This is probably used for "on-site" debugging of production systems.

I.e. when someone sends them a problem report using an
bugreport-tool, they gather all useful information they can get on the
system because they only have one-way communication with their bug
reporters.

Do the perf bits work for such a usecase? If I guess correctly, the
perf bits need a userspace part that computes what would be in the
cpuacct.cpufreq file? 

Regards,
Flo
--

From: Peter Zijlstra
Date: Monday, November 22, 2010 - 3:43 am

Dude, its from the _android_ tree... its cpufreq crud.. it must be some
crack induced power management scheme.


--

From: Florian Mickler
Date: Monday, November 22, 2010 - 5:23 am

On Mon, 22 Nov 2010 11:43:59 +0100

what I wanted to get at, was that they probably need these stats
aggregated somewhere neat and tidy and can not compute them on the fly
recording massive amounts of data...

I wonder why they didn't put this in the
idle-driver.  I don't know. 

Regards,
Flo

--

From: Mike Chan
Date: Monday, November 22, 2010 - 7:05 pm

This is useful for tracking cpu power per c-group. We split each
android application into its own c-group and track what cpu speeds and
how long the cpu spent for each one. Peter we've actually discussed
this before:
http://lkml.org/lkml/2010/5/6/301

These patches were discussed with Paul Menage and Balbir Singh back in
April, as well as on lmkl and the cpufreq mailing lists. These may or
may not be useful for mainline, I assume anyone wanting to track power
specific for c-groups would be interested. I'm open for different
implementations that can help achieve cpu power tracking per-cgroup if
this particular implementation is controversial, or if you just want
to help make Android's kernel better.

--

From: Peter Zijlstra
Date: Tuesday, November 23, 2010 - 4:35 am

Right, so Stephane is working on perf-cgroup bits (I saw he recently
posted another version, which I guess I ought to look at soonish).

With that it would be rather simple to use perf to track per-cgroup
power state.
--

From: John Stultz
Date: Friday, November 19, 2010 - 7:08 pm

From: Dima Zavin <dima@android.com>

After pulling the thread off the run-queue during a cgroup change,
the cfs_rq.min_vruntime gets recalculated. The dequeued thread's vruntime
then gets normalized to this new value. This can then lead to the thread
getting an unfair boost in the new group if the vruntime of the next
task in the old run-queue was way further ahead.

CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
Cc: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Dima Zavin <dima@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/sched_fair.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index f4f6a83..72f19ad 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -802,6 +802,8 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se)
 static void
 dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
+	u64 min_vruntime;
+
 	/*
 	 * Update run-time statistics of the 'current'.
 	 */
@@ -826,6 +828,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	if (se != cfs_rq->curr)
 		__dequeue_entity(cfs_rq, se);
 	account_entity_dequeue(cfs_rq, se);
+
+	min_vruntime = cfs_rq->min_vruntime;
 	update_min_vruntime(cfs_rq);
 
 	/*
@@ -834,7 +838,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * movement in our normalized position.
 	 */
 	if (!(flags & DEQUEUE_SLEEP))
-		se->vruntime -= cfs_rq->min_vruntime;
+		se->vruntime -= min_vruntime;
 }
 
 /*
-- 
1.7.3.2.146.gca209

--

From: Peter Zijlstra
Date: Saturday, November 20, 2010 - 3:55 am

Right, so assuming the reasoning is right (my brain still needs to wake
up) the patch is weird, by not simply move the code bock up and avoid
the whole extra variable like so?

---
 kernel/sched_fair.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index d35f464..dfa28ef 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1003,8 +1003,6 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	se->on_rq = 0;
 	update_cfs_load(cfs_rq, 0);
 	account_entity_dequeue(cfs_rq, se);
-	update_min_vruntime(cfs_rq);
-	update_cfs_shares(cfs_rq, 0);
 
 	/*
 	 * Normalize the entity after updating the min_vruntime because the
@@ -1013,6 +1011,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 */
 	if (!(flags & DEQUEUE_SLEEP))
 		se->vruntime -= cfs_rq->min_vruntime;
+
+	update_min_vruntime(cfs_rq);
+	update_cfs_shares(cfs_rq, 0);
 }
 
 /*

--

From: Peter Zijlstra
Date: Saturday, November 20, 2010 - 5:33 am

Also, clearly that comments needs addressing..
--

Previous thread: [PATCH 4/5] scheduler: cpuacct: Enable platform callbacks for cpuacct power tracking by John Stultz on Friday, November 19, 2010 - 7:08 pm. (1 message)

Next thread: [PATCH] ARM: Fix spinlock bad magic on disabling nonboot cpu by Colin Cross on Friday, November 19, 2010 - 7:16 pm. (3 messages)