Re: [RFC][PATCH 0/3] Refactoring sched_entity and sched_rt_entity.

Previous thread: [PATCH] fix miscompiling with GCC 4.5 -finline-functions by Dzianis Kahanovich on Tuesday, January 4, 2011 - 8:17 am. (1 message)

Next thread: RE:FW by jyothi v on Tuesday, January 4, 2011 - 9:22 am. (1 message)
From: Dario Faggioli
Date: Tuesday, January 4, 2011 - 8:55 am

Hi everybody,

After having heard Peter talking about it many times, I finally decided
to give this thing a spin. It is basically the idea of putting the
current sched_entity and sched_rt_entity together in an union, contained
in a more general structure which hosts the common fields of the twos.
For example, it came out here: http://lkml.org/lkml/2009/7/9/153

Therefore, this patchset _DOES_NOT_ do that! :-O

In fact, this is some preliminary refactoring which I think it's needed
before union-ify the two data structure, and on which I would like to
get some preliminary comments, before going ahead with the wrong
approach.

Basically, I renamed struct sched_entity to struct sched_cfs_entity and
put it and sched_rt_entity inside a more general struct sched_entity...
Is that clear? :-)
The patches just cope with that. Yes, there's some work done by a patch
and undone by the next one. I now it's annoying, and I'm sorry, but I
did such for the sake of reviewability and for making each patch compile
cleanly.
As for the name of CFS's scheduling entities, sched_fair_entity would
probably have been better, but it's longer, and "_cfs" is already
present in many places, so I went for it... No big deal in changing
that, apart from stay-within-80-characters-per-line issues! :-P

They're not inside an union yet, because I'm not sure on how to treat
the task group case. In fact, tasks can only have _just_one_ scheduling
policy at a time, and thus, for example, they need the run_list _or_ the
rb_node for queueing (if the task is RT or fair, respectively), which is
perfect with respect to using an union.
OTOH, groups are always considered both fair _and_ RT entities, for
example they're always queued in _both_ an RT run_list and a fair
rb-tree. So I can't put them in an union, because I need both at the
same time!
Suggestions on how to deal with that will be appreciated.

I compiled and tested this on x86_64, and for [!]SMP, [!]CGROUP_SCHED,
[!]FAIR_GROUP_SCHED, [!]RT_GROUP_SCHED and ...
From: Dario Faggioli
Date: Tuesday, January 4, 2011 - 9:00 am

Using the name `struct sched_entity' for the scheduling entity
of the fair scheduling class is not fair to the other scheduling
classes, which have to use things like `struct sched_rt_entity'.

Moreover, changing that would make it possible to put all the
`struct sched_*_entity' of all the scheduling classes in a more
general structure (which will also contains all the common fields).
This would make the code easier to understand and, if we manage in
using an union for the class-specific fields, it would also save
some memory.

Therefore, this commit turns `sched_entity' into `sched_cfs_entity',
wherever it is being used.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
---
 fs/proc/base.c            |    2 +-
 include/linux/init_task.h |    4 +-
 include/linux/sched.h     |    6 +-
 kernel/delayacct.c        |    2 +-
 kernel/exit.c             |    2 +-
 kernel/posix-cpu-timers.c |   10 +-
 kernel/sched.c            |  152 ++++++------
 kernel/sched_debug.c      |  117 +++++-----
 kernel/sched_fair.c       |  611 +++++++++++++++++++++++----------------------
 kernel/sched_rt.c         |   26 +-
 kernel/sched_stoptask.c   |    2 +-
 11 files changed, 479 insertions(+), 455 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 08cba2c..9591d6e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -356,7 +356,7 @@ static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns,
 static int proc_pid_schedstat(struct task_struct *task, char *buffer)
 {
 	return sprintf(buffer, "%llu %llu %lu\n",
-			(unsigned long long)task->se.sum_exec_runtime,
+			(unsigned long long)task->cfs.sum_exec_runtime,
 			(unsigned long long)task->sched_info.run_delay,
 			task->sched_info.pcount);
 }
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index caa151f..8baee0b 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -142,8 +142,8 @@ extern struct cred init_cred;
 	.cpus_allowed	= CPU_MASK_ALL,					\
 	.mm		= NULL,						\
 ...
From: Dario Faggioli
Date: Tuesday, January 4, 2011 - 9:01 am

Now that the fair scheduling entity structure is called
sched_cfs_entity we can use the name sched_entity for a more
general data structure, which accommodates both sched_cfs_entity
and sched_rt_entity, along with all the fields that are common
and useful for both the classes.

In fact, it sounds a bit awkward for fields like on_rq or
exec_start, that are meaningful and actually used from within
sched_rt.c (for RT tasks and groups scheduling), to be placed
in some other scheduling class' scheduling entity.

This commit addresses that. Notice that it places
sched_cfs_entity and sched_rt_entity inside sched_entity as full
structure, it doesn't use an union for them yet.

No fields are added, and the memory occupation of this solution
should be pretty much the same of the previous one, apart from
the case where just one of CONFIG_FAIR_GROUP_SCHED and
CONFIG_RT_GROUP_SCHED is defined, which is a bit more expensive
than before.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
---
 fs/proc/base.c            |    2 +-
 include/linux/init_task.h |    8 +-
 include/linux/sched.h     |   33 ++++--
 kernel/delayacct.c        |    2 +-
 kernel/exit.c             |    2 +-
 kernel/kthread.c          |    2 +-
 kernel/posix-cpu-timers.c |   14 ++--
 kernel/sched.c            |  262 ++++++++++++++++++++++++++-------------------
 kernel/sched_debug.c      |  109 ++++++++++---------
 kernel/sched_fair.c       |  184 +++++++++++++++++--------------
 kernel/sched_rt.c         |   80 +++++++-------
 kernel/sched_stoptask.c   |    2 +-
 12 files changed, 385 insertions(+), 315 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9591d6e..08cba2c 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -356,7 +356,7 @@ static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns,
 static int proc_pid_schedstat(struct task_struct *task, char *buffer)
 {
 	return sprintf(buffer, "%llu %llu %lu\n",
-			(unsigned long long)task->cfs.sum_exec_runtime,
+			(unsigned long ...
From: Dario Faggioli
Date: Tuesday, January 4, 2011 - 9:02 am

Fields like on_rq or statistics are accessed quite frequently
from CFS within implementation, although they're no longer part of
sched_cfs_entity (they're in the general sched_entity).

This commit provides some macros that makes the code which
deals with them a bit less cumbersome.

Signed-off-by: Dario Faggioli <raistlin@linux.it>
---
 kernel/sched_fair.c |  111 ++++++++++++++++++++++++++-------------------------
 1 files changed, 56 insertions(+), 55 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index e9b8260..0c3006e 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -107,6 +107,11 @@ static inline struct sched_entity *se_of_cfs_se(struct sched_cfs_entity *cfs_se)
 	return container_of(cfs_se, struct sched_entity, cfs);
 }
 
+static inline int cfs_se_on_rq(struct sched_cfs_entity *cfs_se)
+{
+	return se_of_cfs_se(cfs_se)->on_rq;
+}
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 
 /* cpu runqueue to which this cfs_rq is attached */
@@ -455,6 +460,8 @@ static struct sched_cfs_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
  * Scheduling class statistics methods:
  */
 
+#define cfs_se_statistics(cfs_se)	((se_of_cfs_se(cfs_se))->statistics)
+
 #ifdef CONFIG_SCHED_DEBUG
 int sched_proc_update_handler(struct ctl_table *table, int write,
 		void __user *buffer, size_t *lenp,
@@ -521,8 +528,7 @@ static u64 __sched_period(unsigned long nr_running)
  */
 static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_cfs_entity *cfs_se)
 {
-	struct sched_entity *se = se_of_cfs_se(cfs_se);
-	u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);
+	u64 slice = __sched_period(cfs_rq->nr_running + !cfs_se_on_rq(cfs_se));
 
 	for_each_sched_cfs_entity(cfs_se) {
 		struct load_weight *load;
@@ -531,7 +537,7 @@ static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_cfs_entity *cfs_se)
 		cfs_rq = cfs_rq_of(cfs_se);
 		load = &cfs_rq->load;
 
-		if (unlikely(!se_of_cfs_se(cfs_se)->on_rq)) {
+		if (unlikely(!cfs_se_on_rq(cfs_se))) {
 			lw = ...
From: Lucas De Marchi
Date: Tuesday, January 4, 2011 - 9:18 am

Hi Dario


In fact I created the patchset but I didn't have time to finish,
polish, test and send. There are several corner cases that must be
treated that made me think if it was really worth the work.

In this repo you can find a greatly outdated branch that treats some
of the troubles you'll need to think about:

git://git.profusion.mobi/users/lucas/linux-2.6.git schedentity

And this was a rebase I did, but irc it was not working well:

git://git.profusion.mobi/users/lucas/linux-2.6.git schedentity-rebased




Lucas De Marchi
--

From: Dario Faggioli
Date: Tuesday, January 4, 2011 - 11:31 am

Ok, again, while git-clone and git-push do their jobs, feel free to
share some of these concerns of yours with me, because I'm not sure I'm
Ok, this is mine (it'll be ready in a while, I'm pushing it up):

  git://gitorious.org/sched_deadline/linux-26.git schedentity


Right now I'm pushing my schedentity branch and cloning yours! :-D
Well, I'll be very busy in the next days, but I plan to keep this
updated enough, so feel free to comment and/or use it as a basis for
further patching if you think it could help.

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa  (Italy)

http://retis.sssup.it/people/faggioli -- dario.faggioli@jabber.org
From: Peter Zijlstra
Date: Tuesday, January 4, 2011 - 9:19 am

Just like its now, keep a sched_entity per class.

struct task_group {

#ifdef CONFIG_FAIR_GROUP_SCHED
	struct sched_entity **cfs_se;
	...
#endif

#ifdef CONFIG_RT_GROUP_SCHED
	struct sched_entity **rt_se;
	...
#endif


Yay!

I see once clash with my current ttwu patch set though, see:

  http://lkml.org/lkml/2011/1/4/228

But that should be easy to resolve.
--

From: Dario Faggioli
Date: Tuesday, January 4, 2011 - 11:11 am

Well, sure this can be done. But what about the common fields? I guess
you're suggesting to use something like `struct sched_entity_common' and
putting them there, aren't you?

If yes, I'm fine with that, although it'll add one more level of
indirection for those fields (e.g., p->se.comm.on_rq). Are we cool with
It's based on current tip, and yes, I plan to keep it updated and solve
the clashes as soon as I run into them. :-)

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa  (Italy)

http://retis.sssup.it/people/faggioli -- dario.faggioli@jabber.org
From: Peter Zijlstra
Date: Tuesday, January 4, 2011 - 12:26 pm

I'm pretty sure we're mis-understanding each other here, I meant to keep
two complete sched_entity structures (both containing the common/fair/rt
parts). That way, one can use the fair and the other the rt part.
--

From: Lucas De Marchi
Date: Tuesday, January 4, 2011 - 9:37 am

Don't forget the PI case too. You will need to change
rt_mutex_setprio() to keep a copy of sched_cfs_entity in struct
rt_mutex_waiter.

Peter, isn't sched_fair_entity a better name?



Lucas De Marchi
--

From: Peter Zijlstra
Date: Tuesday, January 4, 2011 - 9:41 am

Yeah, it probably is..
--

From: Dario Faggioli
Date: Tuesday, January 4, 2011 - 11:08 am

As you wish. I know it's just one letter longer, but be prepared for it
to break a lot of lines, since you may find four of it in a row! :-P

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa  (Italy)

http://retis.sssup.it/people/faggioli -- dario.faggioli@jabber.org
From: Dario Faggioli
Date: Tuesday, January 4, 2011 - 10:58 am

Mmm... do I? While I'm cloning your git, could you elaborate a bit on
why, because I don't seem to see that... :-P

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa  (Italy)

http://retis.sssup.it/people/faggioli -- dario.faggioli@jabber.org
From: Lucas De Marchi
Date: Tuesday, January 4, 2011 - 11:29 am

Suppose a RT task blocks on a PI-mutex, the lock owner will be boosted
to RT and go through a class change in rt_mutex_setprio().
Since now a class change reinitializes the class-specific, if fair and
rt fields are on the same memory space, we need to save the
sched_fair_entity before changing the class to RT and put it again
when going back to the fair class.

Quoting Peter about this:

 [ Initially I was thinking not, because the task slept we'll have to
reinsert it in the rb-tree anyway, but upon further consideration
that'll loose the old vruntime setting, which can lead to an unseemly
gain of time in place_entity()'s never backward check failing.

So yes, we'd have to place a copy of the old sched_entity in struct
rt_mutex_waiter, not very hard to do. ]


As a side-note: irc this is one thing i didn't do when i prepared that patches.



Lucas De Marchi
--

From: Dario Faggioli
Date: Tuesday, January 4, 2011 - 11:57 am

Well, I know, but you're deactivating+dequeueing and then
activating+enqueueing it back and forth within the proper scheduling
class, so that shouldn't be a big deal...

Actually, the point might be that forgetting something like, e.g.,
vruntime would then lead to unexpected behaviour when the task is back
to fair scheduling, but do we need to cache the whole sched_[cfs|
Ok, exactly, I now see that, and it's probably not hard to do... But I'm
now thinking of how many fields must be saved to avoid this. Surely, not
these ones:
... 
	struct rb_node          run_node;
	struct list_head        group_node;

#ifdef CONFIG_FAIR_GROUP_SCHED
	struct sched_cfs_entity *parent;
	/* rq on which this entity is (to be) queued: */
	struct cfs_rq           *cfs_rq;
	/* rq "owned" by this entity/group: */
	struct cfs_rq           *my_q;
#endif
...

I'll double check, but if it's just a matter of vruntime and a couple of
other u64, isn't it worth to just save them?

Thanks and Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
----------------------------------------------------------------------
Dario Faggioli, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa  (Italy)

http://retis.sssup.it/people/faggioli -- dario.faggioli@jabber.org