divide by zero bug in find_busiest_group

From: Chetan Ahuja
Date: Wednesday, August 25, 2010 - 6:17 pm

This has been filed as a bug in the kernel bugzilla
(https://bugzilla.kernel.org/show_bug.cgi?id=16991), but visibility on
bugzilla seems low (and the bugzilla server seems to get overly
"stressed" during certain parts of the day), so here's my "summary" of
the discussion so far. If nothing else, this way it gets indexed by
search engines.

We've seen a divide-by-zero crash in the function update_sg_lb_stats
(inlined into find_busiest_group) at the following location:

 /usr/src/linux/kernel/sched.c:3769
         *balance = 0;
         return;
     }

     /* Adjust by relative CPU power of the group */
     sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) /
                     group->cpu_power;
     aff5: 48 c1 e0 0a    shl $0xa,%rax
     aff9: 48 f7 f6       div %rsi

Apparently group->cpu_power can be zero under some conditions.
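
To make the failure mode concrete, here is a minimal user-space sketch of
the same arithmetic with an explicit guard on the divisor. It is not the
kernel code: struct group_stats and avg_load_guarded are hypothetical
names, the scale of 1024 is inferred from the "shl $0xa" in the
disassembly, and falling back to a divisor of 1 is only an assumption
about what a defensive fix might do.

```c
#include <assert.h>

/* 1 << 10, inferred from the "shl $0xa" in the disassembly. */
#define SCHED_LOAD_SCALE 1024UL

/* Hypothetical stand-in for the two fields the crashing statement uses. */
struct group_stats {
    unsigned long group_load;
    unsigned long cpu_power;
};

/*
 * Same arithmetic as the crashing statement, but a zero cpu_power is
 * replaced by 1 instead of being handed to the div instruction.
 * The fallback value is an assumption made for this sketch only.
 */
static unsigned long avg_load_guarded(const struct group_stats *g)
{
    unsigned long power = g->cpu_power ? g->cpu_power : 1;

    assert(power != 0); /* the divisor can no longer be zero here */
    return (g->group_load * SCHED_LOAD_SCALE) / power;
}
```

A guard like this would only hide the symptom. It does not explain how
the zero got into cpu_power in the first place.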

I noticed what I thought was a race condition between cpu_power being
initialized to zero (in the build_xxx_groups functions in sched.c) and
its use as the denominator in the find_busiest/idlest_group functions.
PeterZ replied that there is a safe code path from the build_*_groups
functions to the crash location which guarantees a non-zero value. I
did express concern that, in the absence of explicit synchronization or
memory barriers, we are at the mercy of the compiler and hardware doing
us favors (by not reordering instructions in an adverse way) for that
guarantee to hold. But I don't think we got hit by the initial zeroes,
because all the crashes I saw happened after many months of uptime.
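
As an illustration of the ordering concern, here is a user-space sketch
in C11 atomics of "initialize first, then publish". It is an analogy,
not the scheduler's code: struct group, published_group, publish_group
and read_published_power are hypothetical names, and the kernel has its
own barrier primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, stripped-down group: only the field under discussion. */
struct group {
    unsigned long cpu_power;
};

/* Pointer through which readers find the group; NULL until published. */
static _Atomic(struct group *) published_group;

/* Writer: store the real value first, then publish with release ordering. */
static void publish_group(struct group *g, unsigned long power)
{
    g->cpu_power = power;
    atomic_store_explicit(&published_group, g, memory_order_release);
}

/*
 * Reader: the acquire load pairs with the release store above, so seeing
 * a non-NULL pointer implies the cpu_power store is visible as well.
 * Returns 1 and fills *out if a group has been published, 0 otherwise.
 */
static int read_published_power(unsigned long *out)
{
    struct group *g =
        atomic_load_explicit(&published_group, memory_order_acquire);

    if (g == NULL)
        return 0;
    assert(g->cpu_power != 0); /* holds if writers publish non-zero values */
    *out = g->cpu_power;
    return 1;
}
```

Without the release/acquire pair, nothing in the language prevents the
pointer store from becoming visible before the cpu_power store, which is
the kind of adverse reordering described above.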

There's also another place where the group->cpu_power value gets
updated without any synchronization: the update_cpu_power function.
The only way this could result in a bad value for cpu_power, though, is
core A reading an in-transit value of a non-atomically-updated 64-bit
value being written by core B :-). Unlikely? Very! Should we make that
update explicitly atomic? It would be prudent.
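
For reference, an explicitly atomic update could look roughly like the
following user-space sketch in C11 atomics. The names cpu_power_shared,
store_cpu_power and load_cpu_power are hypothetical, the initial value of
1024 is arbitrary, and clamping a computed zero to 1 is an assumption
added for illustration. Kernel code would use the kernel's own
primitives rather than <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared value standing in for group->cpu_power. */
static _Atomic uint64_t cpu_power_shared = 1024;

/*
 * Writer side: the whole 64-bit value is stored in one atomic operation,
 * so a reader on another core cannot observe a half-written value.
 * Clamping zero to 1 is an assumption made for this sketch.
 */
static void store_cpu_power(uint64_t power)
{
    if (power == 0)
        power = 1;
    atomic_store_explicit(&cpu_power_shared, power, memory_order_relaxed);
}

/* Reader side: one atomic load; the value is then safe as a divisor. */
static uint64_t load_cpu_power(void)
{
    uint64_t power =
        atomic_load_explicit(&cpu_power_shared, memory_order_relaxed);

    assert(power != 0); /* every store above writes a non-zero value */
    return power;
}
```

Relaxed ordering is enough here because the only property wanted is
that each load sees some complete value that was actually stored. No
ordering against other memory is implied.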

We do need more ideas on how the zero could have gotten there. The two
paths I mentioned above don't
provide that warm, fuzzy feeling yet.

Thanks
Chetan

P.S.

a) Kernel version: 2.6.32 release from kernel.org. A similar
   divide-by-zero has been reported as recently as 2.6.35 in an Ubuntu
   distribution kernel here:
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/615135
b) Hardware: 8-core Nehalem (Intel E5520). /proc/cpuinfo shows 16
   "hyperthreaded" cores.

Some relevant CONFIG settings:
CONFIG_NUMA=y
CONFIG_K8_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
# CONFIG_NUMA_EMU is not set
CONFIG_ACPI_NUMA=y
.
.
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_USER_SCHED is not set
CONFIG_CGROUP_SCHED=y

CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
CONFIG_SCHED_HRTICK=y
CONFIG_SCHED_DEBUG=y
# CONFIG_SCHEDSTATS is not set
# CONFIG_SCHED_TRACER is not set
--

Messages in current thread:
divide by zero bug in find_busiest_group, Chetan Ahuja, (Wed Aug 25, 6:17 pm)
Re: divide by zero bug in find_busiest_group, Venkatesh Pallipadi, (Thu Aug 26, 12:19 pm)
Re: divide by zero bug in find_busiest_group, Chetan Ahuja, (Thu Aug 26, 4:52 pm)
Re: divide by zero bug in find_busiest_group, Peter Zijlstra, (Fri Aug 27, 12:51 am)
Re: divide by zero bug in find_busiest_group, Peter Zijlstra, (Fri Aug 27, 1:08 am)
Re: divide by zero bug in find_busiest_group, Peter Zijlstra, (Fri Aug 27, 1:13 am)
Re: divide by zero bug in find_busiest_group, Venkatesh Pallipadi, (Fri Aug 27, 10:39 am)