san_database_messages_corruptions

PhilippeLeroux edited this page Oct 26, 2017 · 1 revision


Following the August 2017 update, the database datafiles become corrupted very quickly; the corruption shows up as soon as a backup is run.

Errors in the alert.log

2017-09-06T15:55:16.051719+02:00
Hex dump of (file 1, block 67616) in trace file /u01/app/oracle/diag/rdbms/again/AGAIN/trace/AGAIN_ora_11641_12285.trc

Corrupt block relative dba: 0x00410820 (file 1, block 67616)
Bad check value found during backing up datafile
Data in bad block:
 type: 6 format: 2 rdba: 0x00410820
 last change scn: 0x0000.0000.000b5e22 seq: 0x1 flg: 0x06
 spare3: 0x0
 consistency value in tail: 0x5e220601
 check value in block header: 0x8ba5
 computed block checksum: 0xb81d

Reread of blocknum=67616, file=/u02/database/AGAIN/datafile/o1_mf_system_dv0dxc93_.dbf. found valid data
2017-09-06T15:55:34.392480+02:00
Hex dump of (file 3, block 34608) in trace file /u01/app/oracle/diag/rdbms/again/AGAIN/trace/AGAIN_ora_11641_12285.trc

Corrupt block relative dba: 0x00c08730 (file 3, block 34608)
Fractured block found during backing up datafile
Data in bad block:
 type: 6 format: 2 rdba: 0x00c08730
 last change scn: 0x0000.0000.001587a3 seq: 0x1 flg: 0x04
 spare3: 0x0
 consistency value in tail: 0x30003300
 check value in block header: 0xe6e4
 computed block checksum: 0x7685

Reread of blocknum=34608, file=/u02/database/AGAIN/datafile/o1_mf_sysaux_dv0dygp4_.dbf. found same corrupt data
Reread of blocknum=34608, file=/u02/database/AGAIN/datafile/o1_mf_sysaux_dv0dygp4_.dbf. found same corrupt data
Reread of blocknum=34608, file=/u02/database/AGAIN/datafile/o1_mf_sysaux_dv0dygp4_.dbf. found same corrupt data
Reread of blocknum=34608, file=/u02/database/AGAIN/datafile/o1_mf_sysaux_dv0dygp4_.dbf. found same corrupt data
Reread of blocknum=34608, file=/u02/database/AGAIN/datafile/o1_mf_sysaux_dv0dygp4_.dbf. found same corrupt data
2017-09-06T15:55:45.399543+02:00
Deleted Oracle managed file /u03/recovery/AGAIN/datafile/o1_mf_sysaux_dv0fngyn_.dbf
Checker run found 1 new persistent data failures
2017-09-06T15:55:52.186407+02:00
PDB$SEED(2):Hex dump of (file 6, block 2136) in trace file /u01/app/oracle/diag/rdbms/again/AGAIN/tce/AGAIN_ora_11641_12285.trc
PDB$SEED(2):
PDB$SEED(2):Corrupt block relative dba: 0x01000858 (file 6, block 2136)
PDB$SEED(2):Fractured block found during backing up datafile
PDB$SEED(2):Data in bad block:
PDB$SEED(2): type: 6 format: 2 rdba: 0x01000858
PDB$SEED(2): last change scn: 0x0000.0000.000a1b27 seq: 0x1 flg: 0x04
PDB$SEED(2): spare3: 0x0
PDB$SEED(2): consistency value in tail: 0xe12446d6
PDB$SEED(2): check value in block header: 0xbd9e
PDB$SEED(2): computed block checksum: 0xec59
PDB$SEED(2):
PDB$SEED(2):Reread of blocknum=2136, file=/u02/database/AGAIN/datafile/o1_mf_sysaux_dv0f153x_.dbf. found same corrupt data
PDB$SEED(2):Reread of blocknum=2136, file=/u02/database/AGAIN/datafile/o1_mf_sysaux_dv0f153x_.dbf. found same corrupt data
PDB$SEED(2):Reread of blocknum=2136, file=/u02/database/AGAIN/datafile/o1_mf_sysaux_dv0f153x_.dbf. found same corrupt data
PDB$SEED(2):Reread of blocknum=2136, file=/u02/database/AGAIN/datafile/o1_mf_sysaux_dv0f153x_.dbf. found same corrupt data
PDB$SEED(2):Reread of blocknum=2136, file=/u02/database/AGAIN/datafile/o1_mf_sysaux_dv0f153x_.dbf. found same corrupt data
2017-09-06T15:55:57.377766+02:00
PDB$SEED(2):Deleted Oracle managed file /u03/recovery/AGAIN/5889E9722C52233EE05366F0FAC07BA8/datafi/o1_mf_sysaux_dv0fo7xm_.dbf
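The alert.log excerpt above can be condensed into one entry per corrupt block, separating blocks whose reread returned valid data from blocks whose reread returned the same corrupt data. The following is a minimal sketch in Python that only assumes the message layout visible in this excerpt; the function name `summarize_corruptions` and the labels `transient` and `persistent` are illustrative and not Oracle terminology.

```python
import re

# Patterns derived from the alert.log lines shown above (an assumption:
# other Oracle versions may word these messages differently).
CORRUPT_RE = re.compile(
    r"Corrupt block relative dba: (0x[0-9a-fA-F]+) \(file (\d+), block (\d+)\)"
)
REREAD_RE = re.compile(
    r"Reread of blocknum=(\d+), file=(\S+?)\. .*?(valid data|same corrupt data)"
)


def summarize_corruptions(alert_log_text):
    """Return {(file_no, block_no): {"rdba", "datafile", "status"}}."""
    blocks = {}
    current = None
    for line in alert_log_text.splitlines():
        m = CORRUPT_RE.search(line)
        if m:
            # A new corrupt block report starts here.
            current = (int(m.group(2)), int(m.group(3)))
            blocks[current] = {
                "rdba": m.group(1),
                "datafile": None,
                "status": "unknown",
            }
            continue
        m = REREAD_RE.search(line)
        if m and current is not None and int(m.group(1)) == current[1]:
            # The reread result tells whether the bad read was reproducible.
            blocks[current]["datafile"] = m.group(2)
            blocks[current]["status"] = (
                "transient" if m.group(3) == "valid data" else "persistent"
            )
    return blocks
```

Calling `summarize_corruptions(open(path).read())` on a saved copy of the alert.log (the `path` variable is hypothetical) returns a dictionary keyed by `(file, block)` that is quick to scan after a backup.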

On the K2 server, output of journalctl -f

sept. 06 15:46:51 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:46:51 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:46:53 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:47:26 K2.orcl chronyd[739]: Selected source 95.81.173.74
sept. 06 15:49:12 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:49:13 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:50:01 K2.orcl systemd[1]: Started Session 16 of user root.
sept. 06 15:50:01 K2.orcl systemd[1]: Starting Session 16 of user root.
sept. 06 15:50:01 K2.orcl CROND[3135]: (root) CMD (/usr/lib64/sa/sa1 1 1)
sept. 06 15:51:24 K2.orcl sshd[3141]: Accepted publickey for root from 192.250.240.1 port 33160 ssh RSA SHA256:1h0yG9mU+UeetNA7KvpUUdi3e6XLJli97UpM3gue1gI
sept. 06 15:51:24 K2.orcl systemd[1]: Started Session 17 of user root.
sept. 06 15:51:24 K2.orcl systemd-logind[706]: New session 17 of user root.
sept. 06 15:51:24 K2.orcl systemd[1]: Starting Session 17 of user root.
sept. 06 15:51:24 K2.orcl sshd[3141]: pam_unix(sshd:session): session opened for user root by (uid=
sept. 06 15:51:24 K2.orcl dbus[708]: [system] Activating service name='org.freedesktop.problems' (ung servicehelper)
sept. 06 15:51:24 K2.orcl dbus-daemon[708]: dbus[708]: [system] Activating service name='org.freedetop.problems' (using servicehelper)
sept. 06 15:51:24 K2.orcl dbus[708]: [system] Successfully activated service 'org.freedesktop.probls'
sept. 06 15:51:24 K2.orcl dbus-daemon[708]: dbus[708]: [system] Successfully activated service 'orgreedesktop.problems'
sept. 06 15:53:15 K2.orcl sshd[3141]: Received disconnect from 192.250.240.1 port 33160:11: disconnected by user
sept. 06 15:53:15 K2.orcl sshd[3141]: Disconnected from 192.250.240.1 port 33160
sept. 06 15:53:15 K2.orcl sshd[3141]: pam_unix(sshd:session): session closed for user root
sept. 06 15:53:15 K2.orcl systemd-logind[706]: Removed session 17.
sept. 06 15:55:03 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:04 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:09 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:09 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:09 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:11 K2.orcl kernel: eth1: bad gso type 162.
sept. 06 15:55:12 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:12 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:13 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:14 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:16 K2.orcl kernel: eth1: bad gso type 162.
sept. 06 15:55:18 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:20 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:20 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:27 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:29 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:29 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:55:32 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:56:14 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:56:15 K2.orcl kernel: net eth1: Unexpected TXQ (0) queue failure: -28
sept. 06 15:56:24 K2.orcl kernel: eth1: zero gso size.

Resolution

The problem was resolved by switching the infrastructure server to the Red Hat kernel.

Adding the nobarrier option to the file systems was not a very smart move.
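To check whether any file system is still mounted with `nobarrier`, the mount table can be scanned. The sketch below, in Python, assumes the standard `/proc/mounts` layout (device, mount point, file system type, comma-separated options); the function name `mounts_with_option` is illustrative, and which server and file systems carried the option is not stated on this page.

```python
def mounts_with_option(mounts_text, option="nobarrier"):
    """Return (mount_point, fs_type) pairs whose mount options include `option`."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        # Options are comma-separated; compare whole tokens, not substrings.
        if option in options.split(","):
            hits.append((mount_point, fs_type))
    return hits
```

On a Linux host, `mounts_with_option(open("/proc/mounts").read())` should return an empty list once the option has been removed from `/etc/fstab` and the file systems have been remounted.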
