Read-only filesystem on 16.04 (linux 4.15)
Some of our Ubuntu machines (16.04 with the 4.15 kernel) suddenly remount their root filesystem read-only after approximately 5 minutes of operation.
The kernel log shows the following error:
{{{
Jan 7 13:26:12 lj000601 kernel [ 311.818652] ata1.00: READ LOG DMA EXT failed, trying PIO
Jan 7 13:26:12 lj000601 kernel [ 311.823232] ata1.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
Jan 7 13:26:12 lj000601 kernel [ 311.823237] ata1.00: irq_stat 0x40000008
Jan 7 13:26:12 lj000601 kernel [ 311.823242] ata1.00: failed command: READ FPDMA QUEUED
Jan 7 13:26:12 lj000601 kernel [ 311.823250] ata1.00: cmd 60/08:80:
Jan 7 13:26:12 lj000601 kernel [ 311.823250] res 41/40:00:
Jan 7 13:26:12 lj000601 kernel [ 311.823254] ata1.00: status: { DRDY ERR }
Jan 7 13:26:12 lj000601 kernel [ 311.823257] ata1.00: error: { UNC }
Jan 7 13:26:12 lj000601 kernel [ 311.828470] ata1.00: configured for UDMA/133
Jan 7 13:26:12 lj000601 kernel [ 311.829567] sd 0:0:0:0: [sda] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=
Jan 7 13:26:12 lj000601 kernel [ 311.829571] sd 0:0:0:0: [sda] tag#16 Sense Key : Medium Error [current]
Jan 7 13:26:12 lj000601 kernel [ 311.829575] sd 0:0:0:0: [sda] tag#16 Add. Sense: Unrecovered read error - auto reallocate failed
Jan 7 13:26:12 lj000601 kernel [ 311.829579] sd 0:0:0:0: [sda] tag#16 CDB: Read(10) 28 00 02 c1 1b 38 00 00 08 00
Jan 7 13:26:12 lj000601 kernel [ 311.829582] print_req_error: I/O error, dev sda, sector 46209848
Jan 7 13:26:12 lj000601 kernel [ 311.829615] EXT4-fs error (device sda1): ext4_find_
Jan 7 13:26:12 lj000601 kernel [ 311.829617] ata1: EH complete
Jan 7 13:26:12 lj000601 kernel [ 311.830654] Aborting journal on device sda1-8.
Jan 7 13:26:12 lj000601 kernel [ 311.831394] EXT4-fs (sda1): Remounting filesystem read-only
Jan 7 13:26:12 lj000601 kernel [ 311.831407] EXT4-fs error (device sda1): ext4_journal_
}}}
PS: see further details in kernel.log
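As a cross-check on the log above, the SCSI `Read(10)` CDB can be decoded by hand: bytes 2 to 5 hold the big-endian logical block address and bytes 7 to 8 the transfer length. The sketch below is only an illustration (the function name `decode_read10` is made up for this post); it shows that the CDB points at the same sector that `print_req_error` reports.

```python
from typing import Tuple


def decode_read10(cdb_hex: str) -> Tuple[int, int]:
    """Return (lba, block_count) from a space-separated Read(10) CDB hex dump."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte Read(10) CDB (opcode 0x28)")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2..5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length in blocks
    return lba, count


# CDB copied from the "tag#16 CDB: Read(10)" log line
lba, count = decode_read10("28 00 02 c1 1b 38 00 00 08 00")
print(lba, count)  # 46209848 8
```

The decoded address, 46209848, matches the `sector 46209848` in the I/O error line, and the transfer covers 8 sectors (4 KiB with 512-byte sectors).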
The machines have moderate disk access rates. They are retail point-of-sale systems (graphical interface, internal web server, local PostgreSQL and several USB devices), nothing terribly complex.
The recovery process is laborious: it requires local intervention to run fsck on the faulty block. The machine then comes back as if nothing had happened, but only for a while; we are starting to see the issue resurface.
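To relate the failing disk sector to a filesystem block (for example, to look the block up with `debugfs icheck`), the sector has to be made relative to the partition and rescaled to the filesystem block size. The following is a rough sketch only: the partition start of 2048 sectors, the 512-byte sector size and the 4096-byte ext4 block size are assumptions, not values taken from our machines; the real ones would come from `fdisk -l` and `tune2fs -l`.

```python
def sector_to_fs_block(disk_sector: int,
                       part_start_sector: int,
                       sector_size: int = 512,
                       fs_block_size: int = 4096) -> int:
    """Map an absolute disk sector to a block number inside the partition's filesystem."""
    if disk_sector < part_start_sector:
        raise ValueError("sector lies before the start of the partition")
    offset_bytes = (disk_sector - part_start_sector) * sector_size
    return offset_bytes // fs_block_size


# 46209848 is the sector from the log; 2048 is an ASSUMED start sector for sda1
print(sector_to_fs_block(46209848, part_start_sector=2048))  # 5775975
```

With different partition offsets or block sizes the result changes, so the numbers above are only meaningful under the stated assumptions.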
The easy conclusion is a hardware defect, but the problem occurs across a wide range of SSD manufacturers and usage levels, as seen in the attached smartctl.txt.
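One way to compare drives would be to pull the sector-health counters out of each `smartctl -A` report. The sketch below assumes the usual ten-column attribute table and uses a made-up two-line sample, not data from the attached smartctl.txt; attribute names also vary between SSD vendors, so the `WATCHED` set is an assumption.

```python
# Attribute names commonly used for bad-sector counters (vendor-dependent)
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable", "Reported_Uncorrect"}


def parse_smart_attributes(text: str) -> dict:
    """Return {attribute_name: raw_value} for watched rows of a `smartctl -A` table."""
    found = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric ID and have at least 10 columns
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        name, raw = parts[1], parts[9]
        if name in WATCHED:
            try:
                found[name] = int(raw)
            except ValueError:
                pass  # skip raw values that are not plain integers
    return found


# HYPOTHETICAL sample rows, for illustration only
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""
print(parse_smart_attributes(SAMPLE))
```

A non-zero pending or uncorrectable count on several unrelated drives would point back towards the media; zeros everywhere would make a controller, cabling, power or driver cause more plausible.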
Looking forward to any hints on debugging this problem further.
Question information
- Language: English
- Status: Answered
- For: Ubuntu linux
- Assignee: No assignee
This question was originally filed as bug #1858784.