[BUGFIX] memcg: avoid deadlock between move charge and try_charge()

Previous thread: [PATCH] Make swap accounting default behavior configurable by Michal Hocko on Tuesday, November 16, 2010 - 3:17 am. (15 messages)

Next thread: [PATCH] kernel: make /proc/kallsyms mode 400 to reduce ease of attacking by Marcus Meissner on Tuesday, November 16, 2010 - 3:46 am. (30 messages)
From: Daisuke Nishimura
Date: Tuesday, November 16, 2010 - 3:17 am

From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

__mem_cgroup_try_charge() can be called under down_write(&mmap_sem)(e.g.
mlock does it). This means it can cause deadlock if it races with move charge:

Ex.1)
                move charge             |        try charge
  --------------------------------------+------------------------------
    mem_cgroup_can_attach()             |  down_write(&mmap_sem)
      mc.moving_task = current          |    ..
      mem_cgroup_precharge_mc()         |  __mem_cgroup_try_charge()
        mem_cgroup_count_precharge()    |    prepare_to_wait()
          down_read(&mmap_sem)          |    if (mc.moving_task)
          -> cannot aquire the lock     |    -> true
                                        |      schedule()

Ex.2)
                move charge             |        try charge
  --------------------------------------+------------------------------
    mem_cgroup_can_attach()             |
      mc.moving_task = current          |
      mem_cgroup_precharge_mc()         |
        mem_cgroup_count_precharge()    |
          down_read(&mmap_sem)          |
          ..                            |
          up_read(&mmap_sem)            |
                                        |  down_write(&mmap_sem)
    mem_cgroup_move_task()              |    ..
      mem_cgroup_move_charge()          |  __mem_cgroup_try_charge()
        down_read(&mmap_sem)            |    prepare_to_wait()
        -> cannot aquire the lock       |    if (mc.moving_task)
                                        |    -> true
                                        |      schedule()

To avoid this deadlock, we do all the move charge works (both can_attach() and
attach()) under one mmap_sem section.
And after this patch, we set/clear mc.moving_task outside mc.lock, because we
use the lock only to check mc.from/to.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: <stable@kernel.org>
---
 mm/memcontrol.c |   43 ...
From: Andrew Morton
Date: Tuesday, November 16, 2010 - 1:41 pm

On Tue, 16 Nov 2010 19:17:48 +0900


The patch doesn't apply well to 2.6.36 so if we do want it backported
then please prepare a tested backport for the -stable guys?

Thanks.

--

From: Daisuke Nishimura
Date: Tuesday, November 16, 2010 - 5:24 pm

On Tue, 16 Nov 2010 12:41:17 -0800
O.K.
I'll test a backported patch for 2.6.36.y and send it after this is merged into mainline.

Thanks,
Daisuke Nishimura.
--

From: Daisuke Nishimura
Date: Friday, November 26, 2010 - 12:48 am

Done.

I've tested this backported patch on 2.6.36 and it works properly.
There is no change in mm/memcontrol.c from v2.6.36 to v2.6.36.1, so
this can be applied to 2.6.36.1 too.

===
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>

commit b1dd693e5b9348bd68a80e679e03cf9c0973b01b upstream.

__mem_cgroup_try_charge() can be called under down_write(&mmap_sem)(e.g.
mlock does it). This means it can cause deadlock if it races with move charge:

Ex.1)
                move charge             |        try charge
  --------------------------------------+------------------------------
    mem_cgroup_can_attach()             |  down_write(&mmap_sem)
      mc.moving_task = current          |    ..
      mem_cgroup_precharge_mc()         |  __mem_cgroup_try_charge()
        mem_cgroup_count_precharge()    |    prepare_to_wait()
          down_read(&mmap_sem)          |    if (mc.moving_task)
          -> cannot aquire the lock     |    -> true
                                        |      schedule()

Ex.2)
                move charge             |        try charge
  --------------------------------------+------------------------------
    mem_cgroup_can_attach()             |
      mc.moving_task = current          |
      mem_cgroup_precharge_mc()         |
        mem_cgroup_count_precharge()    |
          down_read(&mmap_sem)          |
          ..                            |
          up_read(&mmap_sem)            |
                                        |  down_write(&mmap_sem)
    mem_cgroup_move_task()              |    ..
      mem_cgroup_move_charge()          |  __mem_cgroup_try_charge()
        down_read(&mmap_sem)            |    prepare_to_wait()
        -> cannot aquire the lock       |    if (mc.moving_task)
                                        |    -> true
                                        |      schedule()

To avoid this deadlock, we do all the move charge works (both can_attach() and
attach()) under one mmap_sem section.
And after this ...
From: Greg KH
Date: Monday, December 6, 2010 - 3:46 pm

Thanks for the patch, I've now queued it up for .36-stable.

greg k-h
--

From: KAMEZAWA Hiroyuki
Date: Tuesday, November 16, 2010 - 5:11 pm

On Tue, 16 Nov 2010 19:17:48 +0900

Thanks,
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--

Previous thread: [PATCH] Make swap accounting default behavior configurable by Michal Hocko on Tuesday, November 16, 2010 - 3:17 am. (15 messages)

Next thread: [PATCH] kernel: make /proc/kallsyms mode 400 to reduce ease of attacking by Marcus Meissner on Tuesday, November 16, 2010 - 3:46 am. (30 messages)