Re: [PATCH 1/2] Don't merge different partition's IOs

From: Satoru Takeuchi
Date: Tuesday, December 7, 2010 - 12:18 am

Hi Linus, Yasuaki, and Jens,

(2010/12/07 1:08), Linus Torvalds wrote:

The problem can occur even if your patches are applied. Consider the
following case.

  1) There are two partitions, sda1 and sda2, on sda.
  2) Open sda and issue an IO to sda2's first sector. sda2's in_flight is
     then incremented even though you opened sda, not sda2. This is because
     the partition lookup is based on which partition rq->__sector belongs
     to.
  3) Issue an IO to sda1's last sector. It is merged into the IO issued in
     step (2) because the part of both requests is sda. In addition,
     rq->__sector is modified to point into sda1's region.
  4) When the IO completes, sda1's in_flight is decremented, and the disk
     statistics are corrupted at this point.
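
The four steps above can be replayed in a small userspace model. This is
only a sketch, not kernel code: the partition layout (sda1 = sectors 0-999,
sda2 = sectors 1000-1999) and the names map_sector(), model_request and
replay_cross_partition_merge() are assumptions made for illustration.
map_sector() stands in for disk_map_sector_rcu().

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

enum { SDA1, SDA2, NR_PARTS };

/* Assumed layout: sda1 = sectors [0, 1000), sda2 = sectors [1000, 2000). */
static const sector_t part_start[NR_PARTS] = { 0, 1000 };
static const sector_t part_end[NR_PARTS]   = { 1000, 2000 };

/* Per-partition in-flight counters; signed so an underflow is visible. */
int in_flight[NR_PARTS];

/* Hypothetical, stripped-down stand-in for struct request. */
struct model_request {
	sector_t sector;		/* stands in for rq->__sector */
	unsigned int nr_sectors;
};

/* Stand-in for disk_map_sector_rcu(): find the partition holding sector. */
int map_sector(sector_t sector)
{
	int i;

	for (i = 0; i < NR_PARTS; i++)
		if (sector >= part_start[i] && sector < part_end[i])
			return i;
	return -1;
}

/* Replays steps (2) to (4) and leaves the result in in_flight[]. */
void replay_cross_partition_merge(void)
{
	struct model_request rq = { 1000, 1 };
	int part;

	in_flight[SDA1] = 0;
	in_flight[SDA2] = 0;

	/* Step 2: new IO to sda2's first sector, accounted to sda2. */
	part = map_sector(rq.sector);
	assert(part == SDA2);
	in_flight[part]++;

	/* Step 3: front merge of an IO to sda1's last sector. */
	rq.sector = 999;
	rq.nr_sectors += 1;

	/* Step 4: completion looks up the partition from the moved sector. */
	part = map_sector(rq.sector);
	assert(part == SDA1);
	in_flight[part]--;
}
```

After replay_cross_partition_merge() returns, in_flight[SDA1] is -1 and
in_flight[SDA2] is still 1 in this model, which is the corruption described
in step (4).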

I think fixing this case is difficult and would add more complexity.

I hit on another approach. Although it doesn't prevent any merge as Linus
preferred, it can fix the problem anyway. In this approach, in_flight is
incremented and decremented for the partition that the request belonged
to when it was created. It has the following merits.

  - It fixes the problem that Yasuaki reported, including the case that
    I described above.
  - It only adds one extra field to struct request.

Although it would cause a small gap in the statistics, it should have
little impact because merging requests across partitions is rare.
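
As a rough illustration of the idea, the same scenario can be modelled in
userspace with the creation-time sector remembered in the request. Again
this is a sketch under assumptions, not the patch itself: the partition
layout (sda1 = sectors 0-999, sda2 = sectors 1000-1999), struct part_stats
and replay_with_initial_sector() are hypothetical, and initial_sector
mirrors the __initial_sector field added by the patch below.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

enum { SDA1, SDA2, NR_PARTS };

/* Assumed layout: sda1 = sectors [0, 1000), sda2 = sectors [1000, 2000). */
static const sector_t part_start[NR_PARTS] = { 0, 1000 };
static const sector_t part_end[NR_PARTS]   = { 1000, 2000 };

/* Hypothetical per-partition counters for this model. */
struct part_stats {
	int in_flight;
	unsigned long sectors;
};

/* Hypothetical, stripped-down stand-in for struct request. */
struct model_request {
	sector_t sector;		/* stands in for rq->__sector */
	sector_t initial_sector;	/* stands in for rq->__initial_sector */
	unsigned int nr_sectors;
};

/* Stand-in for disk_map_sector_rcu(): find the partition holding sector. */
int map_sector(sector_t sector)
{
	int i;

	for (i = 0; i < NR_PARTS; i++)
		if (sector >= part_start[i] && sector < part_end[i])
			return i;
	return -1;
}

/* Replays the scenario, always accounting against the initial sector. */
void replay_with_initial_sector(struct part_stats stats[NR_PARTS])
{
	struct model_request rq;
	int i, part;

	for (i = 0; i < NR_PARTS; i++) {
		stats[i].in_flight = 0;
		stats[i].sectors = 0;
	}

	/* New IO to sda2's first sector: remember where it started. */
	rq.sector = 1000;
	rq.initial_sector = rq.sector;
	rq.nr_sectors = 1;
	part = map_sector(rq.initial_sector);
	assert(part == SDA2);
	stats[part].in_flight++;

	/* Front merge moves sector into sda1; initial_sector is unchanged. */
	rq.sector = 999;
	rq.nr_sectors += 1;

	/* Completion uses the initial sector, so sda2 is decremented. */
	part = map_sector(rq.initial_sector);
	assert(part == SDA2);
	stats[part].sectors += rq.nr_sectors;
	stats[part].in_flight--;
}
```

In this model both in_flight counters end at 0. The gap mentioned above
also shows up: both sectors are counted for sda2, although one of them
belongs to sda1.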

I confirmed that the attached patch applies to 2.6.37-rc4 and compiles
successfully. If this idea is acceptable, I'll test it soon.

---
  block/blk-core.c       |   12 +++++++-----
  block/blk-merge.c      |    2 +-
  include/linux/blkdev.h |    6 ++++++
  3 files changed, 14 insertions(+), 6 deletions(-)

Index: linux-2.6.37-rc4/block/blk-core.c
===================================================================
--- linux-2.6.37-rc4.orig/block/blk-core.c	2010-11-30 13:42:04.000000000 +0900
+++ linux-2.6.37-rc4/block/blk-core.c	2010-12-07 14:31:55.000000000 +0900
@@ -64,11 +64,13 @@ static void drive_stat_acct(struct reque
  		return;
  
  	cpu = part_stat_lock();
-	part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
  
-	if (!new_io)
+	if (!new_io) {
+		part = disk_map_sector_rcu(rq->rq_disk, blk_rq_init_pos(rq));
  		part_stat_inc(cpu, part, merges[rw]);
-	else {
+	} else {
+		rq->__initial_sector = rq->__sector;
+		part = disk_map_sector_rcu(rq->rq_disk, blk_rq_init_pos(rq));
  		part_round_stats(cpu, part);
  		part_inc_in_flight(part, rw);
  	}
@@ -1776,7 +1778,7 @@ static void blk_account_io_completion(st
  		int cpu;
  
  		cpu = part_stat_lock();
-		part = disk_map_sector_rcu(req->rq_disk, blk_rq_pos(req));
+		part = disk_map_sector_rcu(req->rq_disk, blk_rq_init_pos(req));
  		part_stat_add(cpu, part, sectors[rw], bytes >> 9);
  		part_stat_unlock();
  	}
@@ -1796,7 +1798,7 @@ static void blk_account_io_done(struct r
  		int cpu;
  
  		cpu = part_stat_lock();
-		part = disk_map_sector_rcu(req->rq_disk, blk_rq_pos(req));
+		part = disk_map_sector_rcu(req->rq_disk, blk_rq_init_pos(req));
  
  		part_stat_inc(cpu, part, ios[rw]);
  		part_stat_add(cpu, part, ticks[rw], duration);
Index: linux-2.6.37-rc4/block/blk-merge.c
===================================================================
--- linux-2.6.37-rc4.orig/block/blk-merge.c	2010-11-30 13:42:04.000000000 +0900
+++ linux-2.6.37-rc4/block/blk-merge.c	2010-12-07 14:14:55.000000000 +0900
@@ -351,7 +351,7 @@ static void blk_account_io_merge(struct
  		int cpu;
  
  		cpu = part_stat_lock();
-		part = disk_map_sector_rcu(req->rq_disk, blk_rq_pos(req));
+		part = disk_map_sector_rcu(req->rq_disk, blk_rq_init_pos(req));
  
  		part_round_stats(cpu, part);
  		part_dec_in_flight(part, rq_data_dir(req));
Index: linux-2.6.37-rc4/include/linux/blkdev.h
===================================================================
--- linux-2.6.37-rc4.orig/include/linux/blkdev.h	2010-11-30 13:42:04.000000000 +0900
+++ linux-2.6.37-rc4/include/linux/blkdev.h	2010-12-07 14:13:11.000000000 +0900
@@ -91,6 +91,7 @@ struct request {
  	/* the following two fields are internal, NEVER access directly */
  	unsigned int __data_len;	/* total data len */
  	sector_t __sector;		/* sector cursor */
+	sector_t __initial_sector;
  
  	struct bio *bio;
  	struct bio *biotail;
@@ -730,6 +731,11 @@ static inline sector_t blk_rq_pos(const
  	return rq->__sector;
  }
  
+static inline sector_t blk_rq_init_pos(const struct request *rq)
+{
+	return rq->__initial_sector;
+}
+
  static inline unsigned int blk_rq_bytes(const struct request *rq)
  {
  	return rq->__data_len;

--
