Hello, An external HDD, usb-encased, works fine under 2.6.26.5, but under 2.6.27.2 I get hundreds of errors per second, of 'No Sense [current]'. On connection, under 2.6.26.5, I get the following: [ 84.503949] usb 1-2.1.1: new high speed USB device using ehci_hcd and address 8 [ 84.696861] usb 1-2.1.1: configuration #1 chosen from 1 choice [ 84.740856] usbcore: registered new interface driver libusual [ 84.817625] Initializing USB Mass Storage driver... [ 84.817625] scsi2 : SCSI emulation for USB Mass Storage devices [ 84.818820] usbcore: registered new interface driver usb-storage [ 84.818823] USB Mass Storage support registered. [ 89.819567] scsi 2:0:0:0: Direct-Access SAMSUNG HD400LD WQ10 PQ: 0 ANSI: 0 [ 89.857627] Driver 'sd' needs updating - please use bus_type methods [ 89.857837] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB) [ 89.858754] sd 2:0:0:0: [sda] Write Protect is off [ 89.858754] sd 2:0:0:0: [sda] Assuming drive cache: write through [ 89.858869] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB) [ 89.858869] sd 2:0:0:0: [sda] Write Protect is off [ 89.858869] sd 2:0:0:0: [sda] Assuming drive cache: write through [ 89.858869] sda: sda1 [ 89.879851] sd 2:0:0:0: [sda] Attached SCSI disk [ 89.901630] sd 2:0:0:0: Attached scsi generic sg0 type 0 [ 89.963632] sd 2:0:0:0: [sda] Sense Key : No Sense [current] [ 89.963632] sd 2:0:0:0: [sda] Add. Sense: No additional sense information [ 106.913629] kjournald starting. Commit interval 5 seconds [ 106.913629] EXT3 FS on dm-5, internal journal On 2.6.27.2, I get: [ 551.484341] usb 1-2.1.1: new high speed USB device using ehci_hcd and address 8 [ 551.677467] usb 1-2.1.1: configuration #1 chosen from 1 choice [ 551.718201] usbcore: registered new interface driver libusual [ 551.799287] Initializing USB Mass Storage driver... [ 551.800330] scsi2 : SCSI emulation for USB Mass Storage devices [ 551.800694] usbcore: ...
[Adding CCs] --
You can use usbmon to capture the details of what happens when you plug in the drive. Instructions are in the kernel source file Documentation/usb/usbmon.txt. Alan Stern --
Try reversing this patch (apply with patch -p1 -R)
James
---
commit 10dab22664914505dcb804d9ad09cad6bc94d349
Author: Jamie Wellnitz <Jamie.Wellnitz@emulex.com>
Date: Thu Sep 11 21:39:36 2008 -0400
[SCSI] sd: Fix handling of NO_SENSE check condition
The current handling of NO_SENSE check condition is the same as
RECOVERED_ERROR, and assumes that in both cases, the I/O was fully
transferred.
We have seen cases of arrays returning with NO_SENSE (no error), but
the I/O was not completely transferred, thus residual set. Thus,
rather than return good_bytes as the entire transfer, set good_bytes
to 0, so that the midlayer then applies the residual in calculating
the transfer, and for sd, will fail the I/O and fall into a retry
path.
Signed-off-by: Jamie Wellnitz <Jamie.Wellnitz@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7c4d2e6..55e6ed4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1054,7 +1054,6 @@ static int sd_done(struct scsi_cmnd *SCpnt)
good_bytes = sd_completed_bytes(SCpnt);
break;
case RECOVERED_ERROR:
- case NO_SENSE:
/* Inform the user, but make sure that it's not treated
* as a hard error.
*/
@@ -1063,6 +1062,15 @@ static int sd_done(struct scsi_cmnd *SCpnt)
memset(SCpnt->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
good_bytes = scsi_bufflen(SCpnt);
break;
+ case NO_SENSE:
+ /* This indicates a false check condition, so ignore it. An
+ * unknown amount of data was transferred so treat it as an
+ * error.
+ */
+ scsi_print_sense("sd", SCpnt);
+ SCpnt->result = 0;
+ memset(SCpnt->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
+ break;
case ABORTED_COMMAND:
if (sshdr.asc == 0x10) { /* DIF: Disk detected corruption */
scsi_print_result(SCpnt);
--
I'll try it tomorrow. Thanks, Luciano Rocha --=20 Luciano Rocha <luciano@eurotux.com> Eurotux Inform=E1tica, S.A. <http://www.eurotux.com/>
<snip> 2.6.27.4 comes with it reverted. But I still get the same error. Reapplied it as one more test, with the same error. Regards, Luciano Rocha --=20 Luciano Rocha <luciano@eurotux.com> Eurotux Inform=E1tica, S.A. <http://www.eurotux.com/>
Now in 2.6.27.4, same problem. The usb traffic is: f7161ec0 3564884672 C Ii:1:005:1 0:2048 1 = 02 f7161ec0 3564884690 S Ii:1:005:1 -115:2048 1 < f20138c0 3564884706 S Ci:1:005:0 s a3 00 0000 0001 0004 4 < f20138c0 3564885023 C Ci:1:005:0 0 4 = 01010100 f20138c0 3564885039 S Co:1:005:0 s 23 01 0010 0001 0000 0 f20138c0 3564885273 C Co:1:005:0 0 0 f20138c0 3564885291 S Co:1:005:0 s 23 03 0016 0001 0000 0 f20138c0 3564885522 C Co:1:005:0 0 0 f20138c0 3564885534 S Ci:1:005:0 s a3 00 0000 0001 0004 4 < f20138c0 3564885773 C Ci:1:005:0 0 4 = 01010000 f20138c0 3564911773 S Ci:1:005:0 s a3 00 0000 0001 0004 4 < f20138c0 3564912022 C Ci:1:005:0 0 4 = 01010000 f20138c0 3564937773 S Ci:1:005:0 s a3 00 0000 0001 0004 4 < f20138c0 3564938022 C Ci:1:005:0 0 4 = 01010000 f20138c0 3564963772 S Ci:1:005:0 s a3 00 0000 0001 0004 4 < f20138c0 3564964020 C Ci:1:005:0 0 4 = 01010000 f20138c0 3564989773 S Ci:1:005:0 s a3 00 0000 0001 0004 4 < f20138c0 3564990020 C Ci:1:005:0 0 4 = 01010000 f20138c0 3564990043 S Co:1:005:0 s 23 03 0004 0001 0000 0 f20138c0 3564990270 C Co:1:005:0 0 0 f20138c0 3565000771 S Ci:1:005:0 s a3 00 0000 0001 0004 4 < f20138c0 3565001020 C Ci:1:005:0 0 4 = 11050000 f20138c0 3565011770 S Ci:1:005:0 s a3 00 0000 0001 0004 4 < f20138c0 3565012019 C Ci:1:005:0 0 4 = 03051000 f20138c0 3565062770 S Co:1:005:0 s 23 01 0014 0001 0000 0 f20138c0 3565063018 C Co:1:005:0 0 0 f20138c0 3565063102 S Ci:1:000:0 s 80 06 0100 0000 0040 64 < f20138c0 3565072896 C Ci:1:000:0 0 18 = 12010002 00000040 40088200 01000102 0301 f20138c0 3565072920 S Co:1:005:0 s 23 03 0004 0001 0000 0 f20138c0 3565073268 C Co:1:005:0 0 0 f20138c0 3565083782 S Ci:1:005:0 s a3 00 0000 0001 0004 4 < f20138c0 3565084018 C Ci:1:005:0 0 4 = 03051000 f20138c0 3565134773 S Co:1:005:0 s 23 01 0014 0001 0000 0 f20138c0 3565135017 C Co:1:005:0 0 0 f20138c0 3565135045 S Co:1:000:0 s 00 05 0008 0000 0000 0 f20138c0 3565135266 C Co:1:000:0 0 0 f20138c0 3565146771 S Ci:1:008:0 s 80 06 0100 0000 0012 18 < f20138c0 ...
This looks exactly like the "infinite retry" problem I warned about earlier. Here are the important parts of the log. For people who don't know how to interpret these messages, the CDB starts in the 16th byte of the 31-byte messages. For example, the first command here The response is 0x2e9390b0. In typical broken fashion, that is undoubtedly the total number of sectors rather than the highest sector number. Later on the system tries to read the contents of what it thinks is the There's no data in the response, and the 01 on the line above The automatically-generated REQUEST SENSE gets the 18-byte response you see above. It is entirely empty (No Sense). The remainder of the trace shows the same command being repeated over etc... There's a patch which might help resolve this problem: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8bfa24... It is already present in 2.6.28-rc1, so it's worth a try. If it does fix things, let me know so I can submit it for a future 2.6.27.stable release. Alan Stern --
Since the READ CAPACITY "off by one" error is so common, perhaps drivers such as usb-storage could have a hook to do a pseudo READ CAPACITY. Then if the capacity value looked odd (in both senses) the driver could do an IO to the suspect block and if that failed decrement the capacity value passed back to the mid level. Put another way, why don't these defective devices trip up another OS? BTW a single disk in RAID 0 (seen on a HP E200 controller) has a shortened capacity value seen in the midlevel on the corresponding logical drive. That missing chunk is probably where the RAID controller puts its control information. Anyway, playing with the capacity value returned by READ I know that happens but it seems strange that upper levels are reading a block that has never been written to. Read ahead? --
Window$ never reads the last sector unless it is actually written to. I had such a device it got stuck when I wrote to the That would be udev or hald, I can't remember which. It is a special Linux Boaz --
It's vol_id from udev. To auto-detect raid setups, LVM, MD metadata, ... Such devices carry their magic sequence at the "end" of the device, usually the last sector or a few sectors before the last sector. The "last" sector is obviously determined by the capacity the device reports. There is no way around looking at all block devices' last sector on a Linux box, as long as people expect auto-setup, auto-mounting, anything that is not put in /etc/fstab to work, as long as any "meta device" format puts a magic number at the end of the device. Kay --
We thought of that years ago. Unfortunately there is no reliable way of telling when a capacity value is wrong. There definitely do exist disks with an odd number of sectors. Furthermore, doing I/O to a suspect block is not a good idea. There are plenty of devices which simply crash when you try to access a I imagine they do. However Linux has partition code that stores information in the last sector of a partition (EFI GUID and md, for example). Other OS's apparently do not try to access the medium's last usb-storage isn't in the business of altering the data it gets from a device. It's just a transport. That's why the sdev->fix_capacity flag exists; we tell the upper layer that the data it gets is going to be No, partition scanning. Also maybe /lib/udev/vol_id, which seems to read an inordinate number of irrelevant sectors. Alan Stern --
And you do this by setting US_FL_FIX_CAPACITY in unusual_devs.h James --
Hm, where in the code can I change it to return less that what it thinks is the last sector? No luck. I'm going to try 2.6.28-rc2 now. Regards, Luciano Rocha -- Luciano Rocha <luciano@eurotux.com> Eurotux Informática, S.A. <http://www.eurotux.com/> --
I can write a patch to do that for you. However I'm more interested in solving the "infinite-retry" problem first. I did write some patches which might help. They weren't meant as bug fixes exactly, more as infrastructure cleanup. But you ought to try them out, because they do affect the logic in this area. The patches are here: http://marc.info/?l=linux-scsi&m=122443015406309&w=2 They are based on 2.6.27, not 2.6.28-rc. Alan Stern --
I've tested them (on 2.6.27.4), but I still get the bug. That email mentions: "Neither patch addresses the infinite-retry problem; I wanted to keep the issues separate." So am I missing some other patch? 2.6.28-rc2 is also buggy. Thanks, Luciano Rocha -- Luciano Rocha <luciano@eurotux.com> Eurotux Informática, S.A. <http://www.eurotux.com/> --
Okay, I tried the same sort of experiment in 2.6.27.4 and got the same
result as you did. I was able to fix it by applying the 8bfa24727
commit mentioned earlier together with the patch below.
Now, I'm pretty sure this is not the ideal solution. Fixing the block
core would be better. Still, it does indicate for sure that there's a
real problem.
It's worth noting that the test added to scsi_requeue_command()
succeeds every time, that is, req->retries is always 0. Clearly more
work is needed.
Alan Stern
Index: 2.6.27.4/drivers/scsi/scsi_lib.c
===================================================================
--- 2.6.27.4.orig/drivers/scsi/scsi_lib.c
+++ 2.6.27.4/drivers/scsi/scsi_lib.c
@@ -611,6 +611,11 @@ static void scsi_requeue_command(struct
struct request *req = cmd->request;
unsigned long flags;
+ if (--req->retries <= 0) {
+ blk_end_request(req, -EIO, blk_rq_bytes(req));
+ scsi_next_command(cmd);
+ return;
+ }
scsi_unprep_request(req);
spin_lock_irqsave(q->queue_lock, flags);
blk_requeue_request(q, req);
--
I tried it, but I get some I/O errors with it. With 8bfa24727 and this patch, I get: [ 94.289819] usb 1-2.1.1: new high speed USB device using ehci_hcd and address 8 [ 94.493891] usb 1-2.1.1: configuration #1 chosen from 1 choice [ 94.537884] usbcore: registered new interface driver libusual [ 94.611888] Initializing USB Mass Storage driver... [ 94.612704] scsi2 : SCSI emulation for USB Mass Storage devices [ 94.619835] usbcore: registered new interface driver usb-storage [ 94.619891] USB Mass Storage support registered. [ 94.622190] usb-storage: device found at 8 [ 94.622232] usb-storage: waiting for device to settle before scanning [ 99.622933] scsi 2:0:0:0: Direct-Access SAMSUNG HD400LD WQ10 PQ: 0 ANSI: 0 [ 99.627569] usb-storage: device scan complete [ 99.657746] Driver 'sd' needs updating - please use bus_type methods [ 99.661428] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB) [ 99.662551] sd 2:0:0:0: [sda] Write Protect is off [ 99.662662] sd 2:0:0:0: [sda] Mode Sense: 03 00 00 00 [ 99.662702] sd 2:0:0:0: [sda] Assuming drive cache: write through [ 99.666582] sd 2:0:0:0: [sda] 781422769 512-byte hardware sectors (400088 MB) [ 99.672877] sd 2:0:0:0: [sda] Write Protect is off [ 99.672884] sd 2:0:0:0: [sda] Mode Sense: 03 00 00 00 [ 99.672886] sd 2:0:0:0: [sda] Assuming drive cache: write through [ 99.672890] sda: sda1 [ 99.683592] sd 2:0:0:0: [sda] Attached SCSI disk [ 99.704160] sd 2:0:0:0: Attached scsi generic sg0 type 0 [ 99.808029] sd 2:0:0:0: [sda] Sense Key : No Sense [current] [ 99.808040] sd 2:0:0:0: [sda] Add. Sense: No additional sense information [ 99.808051] end_request: I/O error, dev sda, sector 781422768 [ 99.808056] Buffer I/O error on device sda, logical block 781422768 [ 99.837771] sd 2:0:0:0: [sda] Sense Key : No Sense [current] [ 99.837780] sd 2:0:0:0: [sda] Add. Sense: No additional sense information [ 99.837790] end_request: I/O error, dev sda, sector ...
That's to be expected. These patches aren't supposed to prevent I/O errors, they are merely supposed to prevent the errors from continuing Now that is truly bizarre. It means that the drive responded differently even though it received the same sequence of commands as Yes. Please post a usbmon trace for each of the last two tests. That is, with the unusual_devs update only, and with the unusual_devs update plus the invoke_transport update and the scsi_requeue_command update. Alan Stern --
I've updated the bugzilla entry with traces for some combinations of the discussed patches: http://bugzilla.kernel.org/show_bug.cgi?id=11843 -- Luciano Rocha <luciano@eurotux.com> Eurotux Informática, S.A. <http://www.eurotux.com/> --
Okay, thanks. I see the problem. This whole area is a mess, and it's
going to be different in 2.6.28. But for now...
Please rerun the A+B and A+B+C tests, but this time using the B' patch
below instead of B.
Alan Stern
Index: 2.6.27.4/drivers/scsi/scsi_lib.c
===================================================================
--- 2.6.27.4.orig/drivers/scsi/scsi_lib.c
+++ 2.6.27.4/drivers/scsi/scsi_lib.c
@@ -611,6 +611,11 @@ static void scsi_requeue_command(struct
struct request *req = cmd->request;
unsigned long flags;
+ if (--req->retries <= 0) {
+ blk_end_request(req, -EIO, blk_rq_bytes(req));
+ scsi_next_command(cmd);
+ return;
+ }
scsi_unprep_request(req);
spin_lock_irqsave(q->queue_lock, flags);
blk_requeue_request(q, req);
@@ -690,6 +695,8 @@ static struct scsi_cmnd *scsi_end_reques
* leftovers in the front of the
* queue, and goose the queue again.
*/
+ if (bytes > 0) /* Made progress */
+ ++req->retries;
scsi_requeue_command(q, cmd);
cmd = NULL;
}
--
New test data in the bugzilla. Regards, Luciano Rocha -- Luciano Rocha <luciano@eurotux.com> Eurotux Informática, S.A. <http://www.eurotux.com/> --
Sorry, dumb mistake on my part. B'' is below. It is the same as B'
except for one character.
Alan Stern
Index: 2.6.27.4/drivers/scsi/scsi_lib.c
===================================================================
--- 2.6.27.4.orig/drivers/scsi/scsi_lib.c
+++ 2.6.27.4/drivers/scsi/scsi_lib.c
@@ -611,6 +611,11 @@ static void scsi_requeue_command(struct
struct request *req = cmd->request;
unsigned long flags;
+ if (--req->retries < 0) {
+ blk_end_request(req, -EIO, blk_rq_bytes(req));
+ scsi_next_command(cmd);
+ return;
+ }
scsi_unprep_request(req);
spin_lock_irqsave(q->queue_lock, flags);
blk_requeue_request(q, req);
@@ -690,6 +695,8 @@ static struct scsi_cmnd *scsi_end_reques
* leftovers in the front of the
* queue, and goose the queue again.
*/
+ if (bytes > 0) /* Made progress */
+ ++req->retries;
scsi_requeue_command(q, cmd);
cmd = NULL;
}
--
A+B''+C seems to have done it. The bugzilla entry was updated with the kernel messages and usb traffic. http://bugzilla.kernel.org/show_bug.cgi?id=3D11843 Regards, Luciano Rocha --=20 Luciano Rocha <luciano@eurotux.com> Eurotux Inform=E1tica, S.A. <http://www.eurotux.com/>
Here's the promised patch. Alan Stern Index: usb-2.6/drivers/usb/storage/unusual_devs.h =================================================================== --- usb-2.6.orig/drivers/usb/storage/unusual_devs.h +++ usb-2.6/drivers/usb/storage/unusual_devs.h @@ -1251,6 +1251,13 @@ UNUSUAL_DEV( 0x0839, 0x000a, 0x0001, 0x0 US_SC_DEVICE, US_PR_DEVICE, NULL, US_FL_FIX_INQUIRY), +/* Reported by Luciano Rocha <luciano@eurotux.com> */ +UNUSUAL_DEV( 0x0840, 0x0082, 0x0001, 0x0001, + "Argosy", + "Storage", + US_SC_DEVICE, US_PR_DEVICE, NULL, + US_FL_FIX_CAPACITY), + /* Entry and supporting patch by Theodore Kilgore <kilgota@auburn.edu>. * Flag will support Bulk devices which use a standards-violating 32-byte * Command Block Wrapper. Here, the "DC2MEGA" cameras (several brands) with --
