Ingo Molnar announced that version 24 of his Completely Fair Scheduler patch is now available backported to the 2.6.24-rc3, 2.6.23.8, 2.6.22.13, and 2.6.21.7 kernels. He noted that there have been significant changes since the last backport, "36 files changed, 2359 insertions(+), 1082 deletions(-). That's 187 individual commits from 32 authors." Ingo noted, "99% of these changes are already upstream in Linus's git tree and they will be released as part of v2.6.24. (there are 4 pending commits that are in the small 2.6.24-rc3-v24 patch.)" He also highlighted some of the more significant improvements:
"Improved interactivity via Peter Zijlstra's 'virtual slices' feature. As load increases, the scheduler shortens the virtual timeslices that tasks get, so that applications observe the same constant latency for getting on the CPU. (This goes on until the slices reach a minimum granularity value).
"CONFIG_FAIR_USER_SCHED is now available across all backported kernels and the per user weights are configurable via /sys/kernel/uids/. Group scheduling got refined all around."
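The "virtual slices" behaviour described in the first highlight can be illustrated with a little arithmetic. This is only a rough sketch, not the kernel's code: it assumes equally weighted tasks, and the function names and the 20 ms / 4 ms figures are hypothetical values chosen for the example. The slice shrinks as the number of runnable tasks grows, so the time to cycle through all tasks stays at the latency target, until the slice hits the minimum granularity and the cycle time starts to stretch.

```python
# Illustrative values only: these are assumptions for the example,
# not the defaults of any particular kernel release.
TARGET_LATENCY_NS = 20_000_000   # hypothetical latency target (20 ms)
MIN_GRANULARITY_NS = 4_000_000   # hypothetical minimum slice (4 ms)


def virtual_slice_ns(nr_running,
                     target_latency_ns=TARGET_LATENCY_NS,
                     min_granularity_ns=MIN_GRANULARITY_NS):
    """Slice given to each of nr_running equally weighted tasks."""
    if nr_running < 1:
        raise ValueError("nr_running must be at least 1")
    # Divide the latency target among the runnable tasks, but never
    # hand out a slice shorter than the minimum granularity.
    return max(target_latency_ns // nr_running, min_granularity_ns)


def scheduling_period_ns(nr_running,
                         target_latency_ns=TARGET_LATENCY_NS,
                         min_granularity_ns=MIN_GRANULARITY_NS):
    """Time needed for every runnable task to get one slice."""
    return nr_running * virtual_slice_ns(
        nr_running, target_latency_ns, min_granularity_ns)


if __name__ == "__main__":
    for n in (1, 2, 4, 5, 10):
        print(n, virtual_slice_ns(n), scheduling_period_ns(n))
```

With these example numbers the period stays at the latency target up to five runnable tasks; beyond that the slice is pinned at the minimum granularity and the period grows linearly with load.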
From: Ingo Molnar
Subject: [patch/backport] CFS scheduler, -v24, for v2.6.24-rc3, v2.6.23.8, v2.6.22.13, v2.6.21.7
Date: Nov 19, 8:17 am 2007

By popular demand, here is release -v24 of the CFS scheduler patch.

It is a full backport of the latest & greatest scheduler code to v2.6.24-rc3, v2.6.23.8, v2.6.22.13, v2.6.21.7. The patches can be downloaded from the usual place:

http://people.redhat.com/mingo/cfs-scheduler/

There's tons of changes since v22 was released:

36 files changed, 2359 insertions(+), 1082 deletions(-)

that's 187 individual commits from 32 authors. So even if CFS v22 worked well for you, please try this release too and report regressions (if any).

There are countless improvements in -v24 (see the shortlog further below for details), but here are a few highlights:

- improved interactivity via Peter Zijlstra's "virtual slices" feature. As load increases, the scheduler shortens the virtual timeslices that tasks get, so that applications observe the same constant latency for getting on the CPU. (This goes on until the slices reach a minimum granularity value)
- CONFIG_FAIR_USER_SCHED is now available across all backported kernels and the per user weights are configurable via /sys/kernel/uids/. Group scheduling got refined all around.
- performance improvements
- bugfixes

99% of these changes are already upstream in Linus's git tree and they will be released as part of v2.6.24. (there are 4 pending commits that are in the small 2.6.24-rc3-v24 patch.)

As usual, any sort of feedback, bugreport, fix and suggestion is more than welcome!
Ingo

------------------>
Adrian Bunk (3):
      sched: make kernel/sched.c:account_guest_time() static
      sched: proper prototype for kernel/sched.c:migration_init()
      sched: make sched_nr_latency static

Alexey Dobriyan (1):
      sched: uninline scheduler

Andi Kleen (5):
      sched: cleanup: remove unnecessary gotos
      sched: cleanup: refactor common code of sleep_on / wait_for_completion
      sched: cleanup: refactor normalize_rt_tasks
      sched: remove stale comment from sched_group_set_shares()
      sched: fix return value of wait_for_completion_interruptible()

Arjan van de Ven (1):
      Make scheduler debug file operations const

Balbir Singh (1):
      sched: fix delay accounting regression

Christian Borntraeger (1):
      sched: fix accounting of interrupts during guest execution on s390

Cliff Wickman (1):
      hotplug cpu: migrate a task within its cpuset

Dhaval Giani (1):
      sched: group scheduling, sysfs tunables

Dmitry Adamushko (16):
      sched: clean up struct load_stat
      sched: clean up schedstat block in dequeue_entity()
      sched: sched_setscheduler() fix
      sched: add set_curr_task() calls
      sched: do not keep current in the tree and get rid of sched_entity::fair_key
      sched: optimize task_new_fair()
      sched: simplify sched_class::yield_task()
      sched: rework enqueue/dequeue_entity() to get rid of set_curr_task()
      sched: yield fix
      sched: fix __pick_next_entity()
      sched: tidy up SCHED_RR
      sched: cleanup, remove calc_weighted()
      sched: cleanup, make dequeue_entity() and update_stats_wait_end() similar
      sched: fix group scheduling for SCHED_BATCH
      sched: fix __set_task_cpu() SMP race
      sched: remove activate_idle_task()

Eric Dumazet (1):
      sched: cleanup, use NSEC_PER_MSEC and NSEC_PER_SEC

Eugene Teo (1):
      Fix tsk->exit_state usage

Gautham R Shenoy (1):
      sched: fix rt ptracer monopolizing CPU

Hiroshi Shimamoto (1):
      sched: clean up sched_fork()

Ingo Molnar (80):
      sched: fix sysctl_sched_child_runs_first flag
      sched: resched task in task_new_fair()
      sched: small sched_debug cleanup
      sched: debug: track maximum 'slice'
      sched: uniform tunings
      sched: use constants if !CONFIG_SCHED_DEBUG
      sched: remove stat_gran
      sched: remove precise CPU load
      sched: remove precise CPU load calculations #2
      sched: track cfs_rq->curr on !group-scheduling too
      sched: cleanup: simplify cfs_rq_curr() methods
      sched: uninline __enqueue_entity()/__dequeue_entity()
      sched: speed up update_load_add/_sub()
      sched: clean up calc_weighted()
      sched: introduce se->vruntime
      sched: move sched_feat() definitions
      sched: optimize vruntime based scheduling
      sched: simplify check_preempt() methods
      sched: wakeup granularity increase
      sched: add se->vruntime debugging
      sched: remove SCHED_FEAT_SKIP_INITIAL
      sched: add more vruntime statistics
      sched: debug: update exec_clock only when SCHED_DEBUG
      sched: remove wait_runtime limit
      sched: remove wait_runtime fields and features
      sched: fix delay accounting performance regression
      sched: prettify /proc/sched_debug output
      sched: enhance debug output
      sched: kernel/sched_fair.c whitespace cleanups
      sched debug: BKL usage statistics
      sched: remove unneeded tunables
      sched debug: print settings
      sched debug: more width for parameter printouts
      sched: entity_key() fix
      sched: remove condition from set_task_cpu()
      sched: remove last_min_vruntime effect
      sched: undo some of the recent changes
      sched: fix sign check error in place_entity()
      sched: fix sched_fork()
      sched: remove set_leftmost()
      sched: clean up schedstats, cnt -> count
      sched: cleanup, remove stale comment
      sched: mark scheduling classes as const
      sched: whitespace cleanups
      sched: vslice fixups for non-0 nice levels
      sched: optimize schedule() a bit on SMP
      sched: tweak wakeup granularity
      sched: run sched_domain_debug() if CONFIG_SCHED_DEBUG=y
      sched: break out if printing a warning in sched_domain_debug()
      sched: style cleanup
      sched: kfree(NULL) is valid
      sched: cleanup: rename SCHED_FEAT_USE_TREE_AVG to SCHED_FEAT_TREE_AVG
      sched: cleanup: rename task_grp to task_group
      sched: cleanup: function prototype cleanups
      sched: fix: move the CPU check into ->task_new_fair()
      sched: update comment
      sched: clean up is_migration_thread()
      sched: do not normalize kernel threads via SysRq-N
      sched: do not wakeup-preempt with SCHED_BATCH tasks
      sched: speed up context-switches a bit
      sched: reintroduce cache-hot affinity
      sched: debug: increase width of debug line
      sched: debug, improve migration statistics
      sched: allow the immediate migration of cache-cold tasks
      sched: affine sync wakeups
      sched: sync wakeups preempt too
      sched: cleanup, fix spacing
      sched: cleanup, make struct rq comments more consistent
      sched: add KERN_CONT annotation
      sched: fix fastcall mismatch in completion APIs
      sched: clean up sched_domain_debug()
      sched: fix style of swap() macro in kernel/sched_fair.c
      sched: fix style in kernel/sched.c
      sched: reintroduce SMP tunings again
      sched: turn off PREEMPT_RESTRICT
      sched: remove PREEMPT_RESTRICT
      sched: wakeup preemption fix
      sched: clean up the wakeup preempt check
      sched: clean up the wakeup preempt check, #2
      sched: reorder SCHED_FEAT_ bits

James Bottomley (1):
      sched: fix incorrect assumption that cpu 0 exists

Ken Chen (2):
      sched: fix improper load balance across sched domain
      sched: reduce schedstat variable overhead a bit

Laurent Vivier (2):
      sched: guest CPU accounting: maintain stats in account_system_time()
      sched: don't clear PF_VCPU in scheduler

Matthias Kaehlcke (1):
      sched: use list_for_each_entry_safe() in __wake_up_common()

Michael Neuling (2):
      Add scaled time to taskstats based process accounting
      kernel/sched.c: remove bogus comment from account_user_time

Mike Galbraith (3):
      sched: fix SMP migration latencies
      sched: fix formatting of /proc/sched_debug
      sched: prevent wakeup over-scheduling

Milton Miller (7):
      sched: domain sysctl fixes: use kcalloc()
      sched: domain sysctl fixes: use for_each_online_cpu()
      sched: domain sysctl fixes: unregister the sysctl table before domains
      sched: domain sysctl fixes: do not crash on allocation failure
      sched: domain sysctl fixes: add terminator comment
      sched: more robust sd-sysctl entry freeing
      sched: fix sched_domain sysctl registration again

Oleg Nesterov (3):
      do CPU_DEAD migrating under read_lock(tasklist) instead of write_lock_irq(tasklist)
      migration_call(CPU_DEAD): use spin_lock_irq() instead of task_rq_lock()
      sched: fix SCHED_FIFO tasks & FAIR_GROUP_SCHED

Paul E. McKenney (1):
      sched: export cpu_clock()

Paul Jackson (2):
      cpuset: remove sched domain hooks from cpusets
      cpuset sched_load_balance flag

Paul Menage (4):
      Task Control Groups: example CPU accounting subsystem
      Fix cpusets update_cpumask
      sched: clean up some control group code
      sched: report CPU usage in CFS cgroup directories

Pavel Emelyanov (3):
      pid namespaces: changes to show virtual ids to user
      Uninline find_task_by_xxx set of functions
      Use helpers to obtain task pid in printks

Peter Williams (2):
      sched: reduce balance-tasks overhead
      sched: isolate SMP balancing code a bit more

Peter Zijlstra (21):
      sched: simplify SCHED_FEAT_* code
      sched: new task placement for vruntime
      sched: simplify adaptive latency
      sched: clean up new task placement
      sched: add tree based averages
      sched: handle vruntime 64-bit overflow
      sched: better min_vruntime tracking
      sched: add vslice
      sched debug: check spread
      sched: max_vruntime() simplification
      sched: clean up min_vruntime use
      sched: speed up and simplify vslice calculations
      sched: another wakeup_granularity fix
      sched: disable sleeper_fairness on SCHED_BATCH
      sched: disable forced preemption by default
      sched: activate task_hot() only on fair-scheduled tasks
      sched: fix unconditional irq lock
      sched: fix vslice
      sched: documentation: place_entity() comments
      sched: reintroduce the sched_min_granularity tunable
      sched: avoid large irq-latencies in smp-balancing

S.Caglar Onur (1):
      sched debug: BKL usage statistics, fix

Satyam Sharma (1):
      sched: use show_regs() to improve __schedule_bug() output

Srivatsa Vaddagiri (16):
      sched: group-scheduler core
      sched: revert recent removal of set_curr_task()
      sched: fix minor bug in yield
      sched: print nr_running and load in /proc/sched_debug
      sched: print &rq->cfs stats
      sched: clean up code under CONFIG_FAIR_GROUP_SCHED
      sched: add fair-user scheduler
      sched: group scheduler wakeup latency fix
      sched: group scheduler SMP migration fix
      sched: group scheduler, fix coding style issues
      sched: group scheduler, fix bloat
      sched: group scheduler, fix latency
      sched: fix new task startup crash
      Hook up group scheduler with control groups
      sched: move rcu_head to task_group struct
      sched: fix copy_namespace() <-> sched_fork() dependency in do_fork

Zou Nan hai (1):
      sched: some proc entries are missed in sched_domain sys_ctl debug code
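The announcement says that per-user weights are configurable under /sys/kernel/uids/. As a rough illustration of what a weight means under fair user scheduling, the sketch below assumes that each user with runnable tasks receives CPU time in proportion to their weight. The function name, the UIDs and the weight values are hypothetical, and the exact file layout under /sys/kernel/uids/ is not spelled out in the announcement, so the example does only the arithmetic and touches no files.

```python
def cpu_fractions(weights):
    """Map each uid to its expected CPU fraction, given per-user weights.

    `weights` is a dict of uid -> positive weight. Only users that
    currently have runnable tasks should be included (an assumption
    of this sketch).
    """
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one user is required")
    return {uid: w / total for uid, w in weights.items()}


if __name__ == "__main__":
    # Two hypothetical users with equal weights split the CPU evenly.
    print(cpu_fractions({1000: 1024, 1001: 1024}))
    # Doubling one user's weight shifts the split toward that user.
    print(cpu_fractions({1000: 2048, 1001: 1024}))
```

The point of the per-user split is that a user running many tasks does not crowd out a user running one: the weight is divided per user first, and only then among that user's tasks.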


Thanks!!
That is precisely what I was waiting for to roll out the updates on the kernel I use (the 2.6.22 series). Many thanks to all kernel developers for the hard work.
smp-only
Make sure you compile with CONFIG_SMP=y even if you have only one core. The -v24 backport patch (at least the 2.6.23.8 variant) doesn't work for uniprocessor kernels.
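For anyone who wants to check this before building, here is a small sketch that looks for an enabled option in the text of a kernel .config file. It relies only on the usual .config convention that an enabled built-in option appears as a line of the form CONFIG_SMP=y; the helper name and the sample text are made up for the example.

```python
def kconfig_enabled(config_text, symbol):
    """Return True if `symbol` is set to 'y' in the given .config text."""
    wanted = f"{symbol}=y"
    for line in config_text.splitlines():
        # Disabled options show up as comments such as
        # "# CONFIG_SMP is not set", which never match `wanted`.
        if line.strip() == wanted:
            return True
    return False


if __name__ == "__main__":
    sample = "CONFIG_X86=y\n# CONFIG_SMP is not set\n"
    print(kconfig_enabled(sample, "CONFIG_SMP"))
```

To check a real tree, pass in the contents of the .config file at the top of the kernel source directory.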
If it doesn't work then it's a bug and should be reported to Ingo.
Works fine here on UP.
Sound familiar
Just like Roman's scheduler did months ago. Imagine that.
And your point is?
I really don't see the point of your statement, but you seem to be implying that Roman's scheduler was better just because it had one additional useful feature. Have you forgotten that, at that point, CFS already had several more features than Roman's, such as group scheduling and instrumentation?
My point
My point is that the scheduler mafia routinely receives valuable contributions and ignores them. Then they deviously reimplement the ideas without giving proper attribution. For an outside contributor this is the worst place in the kernel to work. And, not surprisingly, this is technically the worst part of the kernel.
My POV
I'm seeing a completely inverse picture here.
In order to take advantage of Roman's contributions, the kernel team would have had to replace the whole CFS. That wouldn't have made much sense as I explained in my previous post.
Ironically, CFS was already fully functional when Roman ignored it and started writing his own reimplementation of CFS. Roman decided not to cooperate with other developers and add to CFS.
I personally found his exchanges with Ingo evasive, as if he didn't even want other developers to understand his scheduler. For instance, he was unwilling to break his work into a set of smaller patches, and this is absolutely essential to getting your code reviewed and accepted in the first place (even if it had made sense to throw out CFS completely at that point). Ingo even offered to do this work for him in order to learn from his scheduler, with the "RSDL".
Obviously nobody could force Roman to port his improvements over to CFS, so there was no other choice than to wait for someone else to do it, such as Peter Zijlstra.
I can't agree with this. Quoting Ingo's announcement: "That's 187 individual commits from 32 authors." Only 80 of these commits came from Ingo. None of these contributions were "ignored by the scheduler mafia".
Just because some people fail to get along with kernel developers and make a huge fuss about it, doesn't mean that this is the case in general.
My POV
Nice POV, but why bother? Everyone should know all this by now. Yet some guys keep telling the same old story over and over again. It's something like football to them; they don't really care about arguments, facts and reality.
pfff... :)
Pot calling the kettle black
You mean like the way Con's SD scheduler was already fully functional when Molnar wrote his own reimplementation, CFS? The patches thing was a ruse. Molnar is known to stonewall contributors in this way, never honestly intending to merge their code. The question came down to whether Molnar was able or willing to understand Roman's work. The answer is a definitive no. A lot of us think that today's CFS scheduler is joe code.
Ahhh... When the first attack is refuted, try another one. Then another. Then another.
Oh, and only answer the paragraph where you think you have an edge.
You'd do fine in politics.
*Re*implementation
CFS was not a "reimplementation" of the SD, because the design of the two schedulers is nothing alike.
Roman's scheduler pretty much re-used the same approach as CFS, with various tweaks (many of which had already been implemented in CFS by Peter Zijlstra by the time Roman posted his scheduler).
Wrong
"Just like Roman's scheduler did months ago."
No, the RFS patch did not do that at all.
Take a look at the check_preempt_curr_fair() function in kernel/sched_norm.c that Roman wrote: it's using the same static timeslices that CFS is using. "gran_norm" is not load-dependent at all, it's static. (It's a modified version of the original CFS code and it did not change CFS's time-slicing logic.)
So your argument does not even pass the sniff test.
Just like Roman's scheduler
Much like something I did as an exercise, seven years ago or so--I don't think it's a revolutionary idea, but it's nice to see it in the kernel.
How to tune the scheduling on 2.6
Hi,
I have been using kernel 2.4 for a long time and I installed 2.6.22.12 and
2.6.23.8 last week. I find that when the CPU usage is 100%, kernel 2.6
becomes non responsive (sluggish). Currently, I am running kernel 2.4.35,
the CPU usage is 100% and I don't even notice.
I pointed my browser at KernelTrap and the first thing I see is Ingo's
message.
Is there a simple explanation as to why scheduling on kernel 2.6 is not
as good as on kernel 2.4?
Or are there parameters that I can set to improve interactivity under high load?
Thanks
Richard
Just to point out here, CFS was merged into mainline for the 2.6.23 release, so you might want to check that kernel out. Other than that, there are quite a few reasons you could be having perceived sluggishness. One that seems common to me is not having proper DMA support in your kernel (IO slowness seems to make everything sluggish).
Lack of responsiveness at 100% CPU on 2.6 kernel
Thanks for the info. I am trying to get up to date.
What I am referring to by 'non responsiveness' is the lag
between the cursor movement on the screen and the mouse movement,
the time between typing a letter and seeing the character
on the screen, and general window operations such as getting
the focus on a window. All at 100% CPU.
Anyway I need to get used to 2.6. I was just surprised by
the difference in behavior between 2.4 and 2.6 on an otherwise
identical system and with mostly the same kernel parameters
(kernel 2.6 inherited most of the parameters from kernel 2.4
in my installation).
I noted already that DMA activation works differently on 2.6 than on 2.4.
I believe that DMA is activated, but I still have to make sure.
Many of the options' names have changed between 2.4 and 2.6, so it is probably just as easy to start from scratch when configuring a 2.6 kernel if you are coming from 2.4.
The responsiveness of my pc is back to normal
Thanks for the nice comments. I replaced the 40-wire hard disk IDE cable
with an 80-wire cable so my computer can now use dma5.
At last I think I have got the setup of kernel 2.6 right. My CPU is
currently running at 100% use (preparing a live dvd) and the response
is very good.
So I apologize for raising this issue.
But now, when I switch to a virtual console the screen becomes
dim. I tried 'setterm -half-bright off' with no effect.
After I boot, the brightness is normal, but after I switch
to another virtual console the text becomes almost unreadable.
I have searched the net and the kernel documentation without luck so far.
Otherwise I feel comfortable running 2.6.