Case Study: How Dingjia Used kfed to Repair a Failed ASM Disk Group

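Preface

On March 30, the Oracle RAC database of an organization in Guangzhou failed after new disks were added, leaving a disk group that could not be mounted. The organization first asked Oracle for help, but the quoted cost was very high, so it approached Dingjia on the off chance that we could help. Guangzhou Dingjia Computer Technology Co., Ltd. (Dingjia) is a leading Chinese vendor of disaster recovery and backup products and works closely with customers across many industries on the strength of its technical expertise and service organization. Dingjia's engineers resolved the failure in the shortest possible time and at no charge, restoring the organization's business to normal operation, and the organization praised Dingjia's technical skill and service quality. This article walks through how kfed was used to repair the ASM disk group in this case.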

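1. Symptoms

Once Dingjia understood the situation, it immediately sent an engineer on site. A check of the ASM view v$asm_disk showed every disk in a normal state. Running "alter diskgroup dgdata mount;" reported success, yet a subsequent query of v$asm_diskgroup showed the disk group still in the DISMOUNTED state. The alert log showed the disk group reporting an error and then being dismounted immediately after it was mounted. The alert log reads as follows: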

Sat Mar 30 10:51:59 2013
NOTE: erasing incomplete header on grp 1 disk VOL19
NOTE: cache opening disk 0 of grp 1: VOL10 label:VOL10
NOTE: F1X0 found on disk 0 fcn 0.4276074
NOTE: cache opening disk 1 of grp 1: VOL11 label:VOL11
NOTE: cache opening disk 2 of grp 1: VOL12 label:VOL12
NOTE: cache opening disk 3 of grp 1: VOL13 label:VOL13
NOTE: cache opening disk 4 of grp 1: VOL14 label:VOL14
NOTE: cache opening disk 5 of grp 1: VOL3 label:VOL3
NOTE: cache opening disk 6 of grp 1: VOL4 label:VOL4
NOTE: cache opening disk 7 of grp 1: VOL5 label:VOL5
NOTE: cache opening disk 8 of grp 1: VOL6 label:VOL6
NOTE: cache opening disk 9 of grp 1: VOL7 label:VOL7
NOTE: cache opening disk 10 of grp 1: VOL8 label:VOL8
NOTE: cache opening disk 11 of grp 1: VOL9 label:VOL9
NOTE: cache opening disk 12 of grp 1: VOL1 label:VOL1
NOTE: cache opening disk 13 of grp 1: VOL2 label:VOL2
NOTE: cache opening disk 14 of grp 1: VOL15 label:VOL15
NOTE: cache opening disk 15 of grp 1: VOL16 label:VOL16
NOTE: cache opening disk 16 of grp 1: VOL17 label:VOL17
NOTE: cache opening disk 17 of grp 1: VOL18 label:VOL18
NOTE: cache mounting (first) group 1/0x36E8615F (DGDATA)
* allocate domain 1, invalid = TRUE
kjbdomatt send to node 1
Sat Mar 30 10:51:59 2013
NOTE: attached to recovery domain 1
Sat Mar 30 10:51:59 2013
NOTE: starting recovery of thread=1 ckpt=75.5792 group=1
NOTE: advancing ckpt for thread=1 ckpt=75.5792
NOTE: cache recovered group 1 to fcn 0.5174872
Sat Mar 30 10:51:59 2013
NOTE: opening chunk 1 at fcn 0.5174872 ABA
NOTE: seq=76 blk=5793
Sat Mar 30 10:51:59 2013
NOTE: cache mounting group 1/0x36E8615F (DGDATA) succeeded
WARNING: offlining disk 16.3915944441 (VOL17) with mask 0x3
NOTE: PST update: grp = 1, dsk = 16, mode = 0x6
Sat Mar 30 10:51:59 2013
ERROR: too many offline disks in PST (grp 1)
NOTE: cache closing disk 16 of grp 1: VOL17 label:VOL17
NOTE: cache closing disk 16 of grp 1: VOL17 label:VOL17
Sat Mar 30 10:51:59 2013
SUCCESS: diskgroup DGDATA was mounted
Sat Mar 30 10:51:59 2013
ERROR: PST-initiated MANDATORY DISMOUNT of group DGDATA
NOTE: cache dismounting group 1/0x36E8615F (DGDATA)
Sat Mar 30 10:51:59 2013
NOTE: halting all I/Os to diskgroup DGDATA
Sat Mar 30 10:51:59 2013
kjbdomdet send to node 1
detach from dom 1, sending detach message to node 1
Sat Mar 30 10:51:59 2013
Dirty detach reconfiguration started (old inc 2, new inc 2)
List of nodes:
 0 1
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE
 10 GCS resources traversed, 0 cancelled
 4014 GCS resources on freelist, 6138 on array, 6138 allocated
Dirty Detach Reconfiguration complete
Sat Mar 30 10:51:59 2013
freeing rdom 1
Sat Mar 30 10:51:59 2013
WARNING: dirty detached from domain 1
Sat Mar 30 10:51:59 2013
SUCCESS: diskgroup DGDATA was dismounted
 Received detach msg from node 1 for dom 2

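From a working knowledge of ASM, it was fairly clear that one disk had a problem and had been taken offline, and that the disk group was then dismounted because it was missing a disk.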
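This reading of the alert log can be cross-checked mechanically. The following is a minimal Python sketch, not part of the original investigation, that scans alert-log text for the "offlining disk" warnings and the PST-initiated dismount. The function name `find_offline_events` and the sample string are illustrative assumptions; the message formats are taken from the log excerpt above.

```python
import re

# Matches e.g. "WARNING: offlining disk 16.3915944441 (VOL17) with mask 0x3".
# \s* between words tolerates logs whose spacing was damaged by copy and paste.
OFFLINE_RE = re.compile(r"WARNING:\s*offlining\s*disk\s+(\d+)\.\d+\s+\((\w+)\)")
DISMOUNT_RE = re.compile(
    r"ERROR:\s*PST-initiated\s+MANDATORY\s+DISMOUNT\s+of\s+group\s+(\w+)"
)


def find_offline_events(log_text):
    """Return (offlined, dismounted): disks taken offline and groups dismounted."""
    offlined = []    # list of (disk_number, disk_name)
    dismounted = []  # list of disk group names
    for line in log_text.splitlines():
        m = OFFLINE_RE.search(line)
        if m:
            offlined.append((int(m.group(1)), m.group(2)))
        m = DISMOUNT_RE.search(line)
        if m:
            dismounted.append(m.group(1))
    return offlined, dismounted


if __name__ == "__main__":
    # Sample lines copied from the alert log excerpt.
    sample = (
        "WARNING: offlining disk 16.3915944441 (VOL17) with mask 0x3\n"
        "ERROR: too many offline disks in PST (grp 1)\n"
        "ERROR: PST-initiated MANDATORY DISMOUNT of group DGDATA\n"
    )
    print(find_offline_events(sample))
```

Run against the full alert log, a scan like this points straight at the disk that triggered the dismount, which here is VOL17 (disk 16).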

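The biggest problem at this point was that the disk group could not be mounted, so no operation could be performed on it; even dropping the suspect disk was impossible. Talking with the organization's technician revealed the operations that led to the failure: three new disks were first added to the disk group together, which produced an error; the technician then tried adding the new disks one at a time, which also produced errors; after that the disk group was found to be dismounted.

Research and comparison of related material turned up fixes for similar situations on Metalink: use dd to wipe the header of the failed disk, or force the failed disk into a new disk group, so that the original disk group no longer recognizes the failed disk and can then be mounted. To avoid causing further damage, I agreed with the organization that nothing would be modified until a confirmed, workable plan was in place.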

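2. Analysis

The first step was to go through the logs for the operation that originally triggered the problem and the messages related to it.

On node 1, the first attempt to add VOL17, VOL18, and VOL19 together produced no obvious error, but it did log the warning "WARNING: offlining disk 18.3915945713 (VOL19) with mask 0x3", which suggested that something went wrong when VOL19 was added. The log reads as follows: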

Fri Mar 29 18:31:37 2013
SQL> alter diskgroup DGDATA add disk 'ORCL:VOL17','ORCL:VOL18','ORCL:VOL19'
Fri Mar 29 18:31:37 2013
NOTE: reconfiguration of group 1/0x44e8663d (DGDATA), full=1
Fri Mar 29 18:31:38 2013
NOTE: initializing header on grp 1 disk VOL17
NOTE: initializing header on grp 1 disk VOL18
NOTE: initializing header on grp 1 disk VOL19
NOTE: cache opening disk 16 of grp 1: VOL17 label:VOL17
NOTE: cache opening disk 17 of grp 1: VOL18 label:VOL18
NOTE: cache opening disk 18 of grp 1: VOL19 label:VOL19
NOTE: PST update: grp = 1
NOTE: requesting all-instance disk validation for group=1
Fri Mar 29 18:31:38 2013
NOTE: disk validation pending for group 1/0x44e8663d (DGDATA)
SUCCESS: validated disks for 1/0x44e8663d (DGDATA)
Fri Mar 29 18:31:40 2013
NOTE: requesting all-instance membership refresh for group=1
Fri Mar 29 18:31:40 2013
NOTE: membership refresh pending for group 1/0x44e8663d (DGDATA)
SUCCESS: refreshed membership for 1/0x44e8663d (DGDATA)
Fri Mar 29 18:31:43 2013
WARNING: offlining disk 18.3915945713 (VOL19) with mask 0x3
NOTE: PST update: grp = 1, dsk = 18, mode = 0x6
NOTE: PST update: grp = 1, dsk = 18, mode = 0x4
NOTE: cache closing disk 18 of grp 1: VOL19
NOTE: PST update: grp = 1
NOTE: requesting all-instance membership refresh for group=1
Fri Mar 29 18:31:49 2013
NOTE: membership refresh pending for group 1/0x44e8663d (DGDATA)
NOTE: cache closing disk 18 of grp 1: VOL19
SUCCESS: refreshed membership for 1/0x44e8663d (DGDATA)
 Received dirty detach msg from node 1 for dom 1
Fri Mar 29 18:31:51 2013
Dirty detach reconfiguration started (old inc 4, new inc 4)
List of nodes:
 0 1
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE
 2817 GCS resources traversed, 0 cancelled
 1981 GCS resources on freelist, 7162 on array, 6138 allocated
 1719 GCS shadows traversed, 0 replayed
Dirty Detach Reconfiguration complete
Fri Mar 29 18:31:51 2013
NOTE: PST enabling heartbeating (grp 1)
Fri Mar 29 18:31:51 2013
NOTE: SMON starting instance recovery for group 1 (mounted)
NOTE: F1X0 found on disk 0 fcn 0.4276074
NOTE: starting recovery of thread=1 ckpt=39.5722 group=1
NOTE: advancing ckpt for thread=1 ckpt=39.5722
NOTE: smon did instance recovery for domain 1
Fri Mar 29 18:31:53 2013
NOTE: recovering COD for group 1/0x44e8663d (DGDATA)
SUCCESS: completed COD recovery for group 1/0x44e8663d (DGDATA)
Fri Mar 29 18:32:18 2013

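At the same moment node 2 logged the error "ERROR: group 1/0x44e86390 (DGDATA): could not validate disk 18". VOL19 (that is, disk 18) was then taken offline, which caused the disk group to be dismounted on that node. Some of these error messages match the log from the later failed mount. The log reads as follows: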

Fri Mar 29 18:31:37 2013
NOTE: reconfiguration of group 1/0x44e86390 (DGDATA), full=1
NOTE: disk validation pending for group 1/0x44e86390 (DGDATA)
ERROR: group 1/0x44e86390 (DGDATA): could not validate disk 18
SUCCESS: validated disks for 1/0x44e86390 (DGDATA)
NOTE: membership refresh pending for group 1/0x44e86390 (DGDATA)
NOTE: PST update: grp = 1, dsk = 18, mode = 0x4
Fri Mar 29 18:31:43 2013
ERROR: too many offline disks in PST (grp 1)
Fri Mar 29 18:31:43 2013
SUCCESS: refreshed membership for 1/0x44e86390 (DGDATA)
ERROR: ORA-15040 thrown in RBAL for group number 1
Fri Mar 29 18:31:43 2013
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm2_rbal_14019.trc:
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "" may result in a data loss
ORA-15042: ASM disk "18" is missing
NOTE: cache closing disk 18 of grp 1:
NOTE: membership refresh pending for group 1/0x44e86390 (DGDATA)
NOTE: cache closing disk 18 of grp 1:
NOTE: cache opening disk 16 of grp 1: VOL17 label:VOL17
NOTE: cache opening disk 17 of grp 1: VOL18 label:VOL18
SUCCESS: refreshed membership for 1/0x44e86390 (DGDATA)
Fri Mar 29 18:31:50 2013
ERROR: PST-initiated MANDATORY DISMOUNT of group DGDATA
NOTE: cache dismounting group 1/0x44E86390 (DGDATA)
Fri Mar 29 18:31:51 2013
NOTE: halting all I/Os to diskgroup DGDATA
Fri Mar 29 18:31:51 2013
kjbdomdet send to node 0
detach from dom 1, sending detach message to node 0
Fri Mar 29 18:31:51 2013
Dirty detach reconfiguration started (old inc 4, new inc 4)
List of nodes:
 0 1
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE
 2214 GCS resources traversed, 0 cancelled
 5528 GCS resources on freelist, 7162 on array, 6138 allocated
Dirty Detach Reconfiguration complete
Fri Mar 29 18:31:51 2013
WARNING: dirty detached from domain 1
Fri Mar 29 18:31:51 2013
SUCCESS: diskgroup DGDATA was dismounted

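From this, the likely cause was a permission problem on VOL19 on node 2 when the disks were added; typically the Oracle user lacks access to the device. To test this, I ran RAC in my own virtual machines and simulated the mistake: on node 1 I granted the Oracle user access to the new disks, on node 2 I did not, and then I added the new disks. The resulting log was almost identical, with one difference: my simulated environment logged "ORA-15075: disk(s) are not visible cluster-wide", while the logs supplied by the organization did not contain this error, so I could not yet conclude that it was the same problem.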
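A permission gap of this kind can be caught before any disk is added. The following is a minimal Python sketch, assuming the new disks are exposed as device files under /dev/oracleasm/disks (the ASMLib path used later in this article) and that the script is run as the Oracle software owner on every node. The function name `check_disk_access` is hypothetical.

```python
import os


def check_disk_access(paths):
    """Map each path to True if the current user can both read and write it."""
    result = {}
    for path in paths:
        # os.access returns False for a missing path as well as for a
        # path the current user is not permitted to open.
        result[path] = os.access(path, os.R_OK | os.W_OK)
    return result


if __name__ == "__main__":
    # Hypothetical device paths; adjust to the disks being added.
    disks = ["/dev/oracleasm/disks/VOL%d" % n for n in (17, 18, 19)]
    for path, ok in check_disk_access(disks).items():
        print(path, "accessible" if ok else "NOT accessible")
```

A disk reported as not accessible on any one node is a candidate for the "could not validate disk" error seen on node 2.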

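Later, ORA-15075 did turn up in the organization's operation records, confirming that the first failed attempt to add disks was caused by the permission problem. Repeated tests around this mistake showed that, in the simulated environment, the mistake by itself does not leave the disk group unmountable: any node on which the Oracle user has access to the disks can mount the disk group successfully.

I then continued to simulate the actual sequence of operations. Issuing the add-disk command again after the failure caused no further problem; each time Oracle correctly reported "ORA-15029: disk '…' is already mounted by this instance". The organization's operation records show that on the second attempt to add VOL17 and VOL18, Oracle correctly reported ORA-15029, meaning that VOL17 and VOL18 had already joined the disk group.

The next operation in the records, however, was abnormal: another attempt to add VOL17 failed with the error ORA-15033: disk 'ORCL:VOL17' belongs to diskgroup "DGDATA". This error is unexpected here. Judging from the earlier tests, it means that VOL17 belongs to another disk group and cannot be added to the specified disk group unless the FORCE option is used. In other words, on the second attempt VOL17 was still recognized as a member of DGDATA, but on the third attempt it no longer was. The log also shows abnormal behavior at this point: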

Fri Mar 29 18:35:41 2013
SQL> alter diskgroup DGDATA add disk 'ORCL:VOL17'
Fri Mar 29 18:35:41 2013
NOTE: reconfiguration of group 1/0x44e8663d (DGDATA), full=1
Fri Mar 29 18:35:41 2013
WARNING: ignoring disk ORCL:VOL18 in deep discovery
WARNING: ignoring disk ORCL:VOL19 in deep discovery
NOTE: requesting all-instance membership refresh for group=1
Fri Mar 29 18:35:41 2013
NOTE: membership refresh pending for group 1/0x44e8663d (DGDATA)
SUCCESS: validated disks for 1/0x44e8663d (DGDATA)
NOTE: PST update: grp = 1, dsk = 16, mode = 0x4
Fri Mar 29 18:35:45 2013
ERROR: too many offline disks in PST (grp 1)
Fri Mar 29 18:35:45 2013
SUCCESS: refreshed membership for 1/0x44e8663d (DGDATA)
ERROR: ORA-15040 thrown in RBAL for group number 1
Fri Mar 29 18:35:45 2013
Errors in file /opt/app/oracle/admin/+ASM/bdump/+asm1_rbal_13974.trc:
ORA-15040: diskgroup is incomplete
ORA-15066: offlining disk "" may result in a data loss
ORA-15042: ASM disk "16" is missing
Fri Mar 29 18:35:45 2013
ERROR: PST-initiated MANDATORY DISMOUNT of group DGDATA
NOTE: cache dismounting group 1/0x44E8663D (DGDATA)
Fri Mar 29 18:35:45 2013
NOTE: halting all I/Os to diskgroup DGDATA
Fri Mar 29 18:35:45 2013
kjbdomdet send to node 1
detach from dom 1, sending detach message to node 1
Fri Mar 29 18:35:45 2013
Dirty detach reconfiguration started (old inc 4, new inc 4)
List of nodes:
 0 1
 Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE
 1291 GCS resources traversed, 0 cancelled
 2347 GCS resources on freelist, 7162 on array, 6138 allocated
Dirty Detach Reconfiguration complete
Fri Mar 29 18:35:45 2013
freeing rdom 1
Fri Mar 29 18:35:45 2013
WARNING: dirty detached from domain 1
Fri Mar 29 18:35:46 2013
SUCCESS: diskgroup DGDATA was dismounted

At this point the disk group on node 1 was dismounted as well, and it is this anomaly that led to the later failure.
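When comparing several alert logs like the ones above, it helps to see which disks received PST updates and in what order. The following Python sketch, which was not part of the original analysis, collects the "PST update" lines per group and disk. The function name `pst_mode_history` is hypothetical, and the sketch deliberately does not interpret the mode values, since their meaning is not documented in this article.

```python
import re
from collections import defaultdict

# Matches e.g. "NOTE: PST update: grp = 1, dsk = 18, mode = 0x6".
# Lines without a dsk field, such as "PST update: grp = 1", are skipped.
PST_RE = re.compile(
    r"PST update:\s*grp\s*=\s*(\d+),\s*dsk\s*=\s*(\d+),"
    r"\s*mode\s*=\s*(0x[0-9a-fA-F]+)"
)


def pst_mode_history(log_text):
    """Return {(group, disk): [mode, ...]} in the order the updates were logged."""
    history = defaultdict(list)
    for line in log_text.splitlines():
        m = PST_RE.search(line)
        if m:
            group, disk, mode = int(m.group(1)), int(m.group(2)), m.group(3)
            history[(group, disk)].append(mode)
    return dict(history)


if __name__ == "__main__":
    # Lines copied from the node 1 logs in this article.
    sample = (
        "NOTE: PST update: grp = 1, dsk = 18, mode = 0x6\n"
        "NOTE: PST update: grp = 1, dsk = 18, mode = 0x4\n"
        "NOTE: PST update: grp = 1, dsk = 16, mode = 0x4\n"
    )
    for key, modes in pst_mode_history(sample).items():
        print(key, modes)
```

For the node 1 logs this shows updates for disk 18 (VOL19) during the first add and for disk 16 (VOL17) during the third attempt, which matches the sequence described in the text.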

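Because repeatedly adding disks in the simulated environment never reproduced the failure, the only conclusion available at this point was that the failure was probably an Oracle bug: the add-disk command may have interfered with Oracle's rebalance onto the new disks, after which Oracle marked the disk offline and the disk group was dismounted. After sharing the test results with the organization's technician, I learned that on node 2 the Oracle user had since been given access to the disks, but the failure persisted. I then went on to test dd and forcing the failed disk into a new disk group.

A series of tests followed. Since disks do not fail on their own in a test environment, I had to take the disk group offline by hand, apply a "repair", and then try to mount the disk group. I tried overwriting the header of the "failed disk" with dd, and I tried adding the "failed disk" to a new disk group and then dropping it; in both cases the original disk group could no longer be mounted. In the test environment, though, a failed mount always reported the error ORA-15042: ASM disk "…" is missing, rather than reporting a successful mount as the real environment did. Comparing articles by others who had used dd or a forced add to a new disk group revealed one major difference: the repaired cases online all used "normal redundancy" disk groups. Such a disk group holds redundant data, so the failure of one disk does not dismount the disk group, and many operations remain possible. The failed system, by contrast, used "external redundancy": as far as Oracle is concerned the data has no redundancy, which is one reason the disk group could not be mounted.

This suggested two possible solutions. The first was to check whether Oracle has a command to force-mount a disk group; such a command might exist for exactly this kind of repair. The second came from the fact that kfed can modify disk header information: if I took the header of a healthy disk, edited it, and restored it onto the failed disk, would the failed disk be recognized correctly? The first option was soon ruled out: the documentation showed that only 11g has a force option for mounting a disk group and, crucially, that it applies only to "normal redundancy" disk groups. The second option was tested on the simulated environment that had been broken the day before, and it worked. I sent the procedure to the organization's technician so that he could verify it in his own test environment.

The kfed header repair works as follows: pick a healthy disk, dump its disk header with kfed, compare it with the header dumped from the failed disk, merge the two into a repaired header for the failed disk, and write that back to the failed disk. For example, if the healthy disk is /dev/rdsk/c1t0d0s3 and the failed disk is /dev/rdsk/c1t1d0s1, the steps are: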

kfed read /dev/rdsk/c1t0d0s3 text=header0
kfed read /dev/rdsk/c1t1d0s1 text=header1
vimdiff header0 header1
(... edit until the failed disk's header is "correct", then save it as header1fix ...)
kfed merge /dev/rdsk/c1t1d0s1 text=header1fix

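If the failed disk's header contains no usable information, add the disk to a new disk group and then drop it, so that its header carries the new disk group's information.

The key fields that need to be repaired are: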

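kfdhdb.dsknum: the disk's number within the disk group, counted from 0. The alert logs above use the same zero-based numbering (disk 16 is VOL17 and disk 18 is VOL19), so the value should match the disk number reported in the log.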

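kfdhdb.grpname: the name of the disk group. If the disk was dropped from a new disk group, change this back to the name of the original disk group.

kfdhdb.grpstmp.hi: the disk group timestamp; copy it from the header of a healthy disk.

kfdhdb.grpstmp.lo: same as above.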
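The manual vimdiff step can be sketched in code. The following Python sketch, which is an illustration and not the procedure used in this case, copies the group name and timestamp fields from a healthy header dump into a failed disk's dump and sets the disk number. It assumes that each line of a kfed text dump has the form "name: value ; offset: comment"; the function names are hypothetical. It rewrites only the value column and leaves the trailing comment untouched, and whether kfed merge accepts such a file has not been verified here, so any result should be tried on a copy first.

```python
import re

# A kfed text dump line is assumed to look like "name:   value ; 0xOFFSET: comment".
FIELD_RE = re.compile(r"^(\S+):(\s+)(\S+)(\s*;.*)$")

# Fields the article says must come from the original disk group.
COPY_FIELDS = ("kfdhdb.grpname", "kfdhdb.grpstmp.hi", "kfdhdb.grpstmp.lo")


def parse_fields(dump_text):
    """Return {field_name: value} for every parsable line of a dump."""
    fields = {}
    for line in dump_text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            fields[m.group(1)] = m.group(3)
    return fields


def build_fixed_header(good_dump, bad_dump, dsknum):
    """Rewrite bad_dump: copy COPY_FIELDS from good_dump and set kfdhdb.dsknum."""
    good = parse_fields(good_dump)
    overrides = {k: v for k, v in good.items() if k in COPY_FIELDS}
    overrides["kfdhdb.dsknum"] = str(dsknum)
    out = []
    for line in bad_dump.splitlines():
        m = FIELD_RE.match(line)
        if m and m.group(1) in overrides:
            # Replace only the value column; the trailing comment is left as-is.
            line = "%s:%s%s%s" % (
                m.group(1), m.group(2), overrides[m.group(1)], m.group(4)
            )
        out.append(line)
    return "\n".join(out) + "\n"
```

A typical use would read header0 and header1 from the kfed read commands above, call `build_fixed_header(header0_text, header1_text, dsknum)`, review the result by eye, and only then save it as header1fix.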

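Feedback from the organization's technician, however, was that the headers of VOL17, VOL18, and VOL19 on the failed system were all correct, so this method would not help.

Cloning the failed environment

It was then proposed to copy all the disks of the failed system with dd and build a test environment from the copies, so that the investigation could continue on a clone of the failed system. The copied data arrived on Wednesday; I attached it to the test environment over iSCSI, ran oracleasm scandisks, and started testing on the clone.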
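The copies in this case were made with dd. As an illustration of the same step, the following Python sketch copies a device or file in fixed-size chunks and also computes a SHA-256 digest, so that the image can be verified after it has been moved to the test environment. The checksum is an addition that the original procedure does not mention, and the function name `clone_with_checksum` is hypothetical.

```python
import hashlib


def clone_with_checksum(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path in chunks; return (bytes_copied, sha256_hex)."""
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of the source device or file
            dst.write(chunk)
            digest.update(chunk)
            copied += len(chunk)
    return copied, digest.hexdigest()
```

Recomputing the digest of the image on the receiving side and comparing it with the returned value confirms that the clone is byte-identical to the source disk.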

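3. Resolution

A check with kfed confirmed that the headers of VOL17, VOL18, and VOL19 were indeed all normal. Retrying the earlier methods on the clone confirmed that none of them worked. Another approach was needed.

Following the article at http://blog.csdn.net/tianlesoftware/article/details/6740716 , I first located KFBTYP_LISTHEAD in the disk group and from there KFBTYP_DISKDIR. The DISKDIR blocks contain information about each disk, and the state of VOL17 differs from that of all the other disks: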

kfddde[0].entry.incarn:               4 ; 0x024: A=0 NUMM=0x1

All the other disks (including VOL18 and VOL19) show:

kfddde[0].entry.incarn:               1 ; 0x024: A=1 NUMM=0x0

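My analysis at the time was that changing the state of VOL17 should bring VOL17 back to normal. I did not try it right away, though; instead I followed this line of thought to the PST. Rather than modify the state of VOL17, it seemed better to find the PST and delete VOL17 from it.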
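Spotting the odd disk by eye is easy with three disks but tedious with nineteen. The following Python sketch, assuming the incarn line of each disk's DISKDIR entry has already been collected into a dictionary keyed by disk name, parses the decoded A and NUMM values and reports the disks that differ from the majority. The function names are hypothetical, and the sketch does not interpret what A and NUMM mean.

```python
import re
from collections import Counter

# Matches the decoded flags in e.g.
# "kfddde[0].entry.incarn:   4 ; 0x024: A=0 NUMM=0x1".
INCARN_RE = re.compile(
    r"entry\.incarn:\s+(\d+)\s*;.*?A=(\d+)\s+NUMM=(0x[0-9a-fA-F]+)"
)


def parse_incarn(line):
    """Return (raw_value, A, NUMM) from a kfed incarn line, or None."""
    m = INCARN_RE.search(line)
    if not m:
        return None
    return int(m.group(1)), int(m.group(2)), int(m.group(3), 16)


def find_outliers(lines_by_disk):
    """Return the disks whose incarn tuple differs from the most common one."""
    parsed = {disk: parse_incarn(line) for disk, line in lines_by_disk.items()}
    common, _count = Counter(parsed.values()).most_common(1)[0]
    return sorted(disk for disk, value in parsed.items() if value != common)


if __name__ == "__main__":
    # Values taken from the two kfed lines quoted above.
    normal = "kfddde[0].entry.incarn:               1 ; 0x024: A=1 NUMM=0x0"
    odd = "kfddde[0].entry.incarn:               4 ; 0x024: A=0 NUMM=0x1"
    print(find_outliers({"VOL17": odd, "VOL18": normal, "VOL19": normal}))
```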

Definition of the PST: Partner Status Table. Maintains info on disk-to-diskgroup membership.

According to http://blog.csdn.net/tianlesoftware/article/details/6743677 , the PST should be located at AU=1 on one of the disks. I checked every disk in the disk group; only VOL10 contained the PST, but the first block of AU=1 held nothing useful: its type is KFBTYP_PST_META. Drawing on the earlier search for DISKDIR, I went on to check AU=1, BLK=1, which is also KFBTYP_PST_META, and then AU=1, BLK=2, where I found KFBTYP_PST_DTA. Its contents follow a very regular pattern:

kfdpDtaE[0].status:           117440512 ; 0x000: V=1 R=1 W=1
kfdpDtaE[0].index:                    0 ; 0x004: CURR=0x0 CURR=0x0 FORM=0x0 FORM=0x0
kfdpDtaE[0].partner[0]:               0 ; 0x008: 0x0000
kfdpDtaE[0].partner[1]:               0 ; 0x00a: 0x0000
kfdpDtaE[0].partner[2]:               0 ; 0x00c: 0x0000
......
kfdpDtaE[0].partner[19]:              0 ; 0x02e: 0x0000
kfdpDtaE[1].status:           117440512 ; 0x030: V=1 R=1 W=1
kfdpDtaE[1].index:                    0 ; 0x034: CURR=0x0 CURR=0x0 FORM=0x0 FORM=0x0
kfdpDtaE[1].partner[0]:               0 ; 0x038: 0x0000
kfdpDtaE[1].partner[1]:               0 ; 0x03a: 0x0000
kfdpDtaE[1].partner[2]:               0 ; 0x03c: 0x0000
......
kfdpDtaE[1].partner[19]:              0 ; 0x05e: 0x0000
kfdpDtaE[2].status:            83886080 ; 0x060: V=1 R=1 W=1
......

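The pattern continues up to kfdpDtaE[18].status, from which point the status is 0. Matching the entries against the disks one by one, entries 0 to 15 correspond to the original 16 disks, entries 16 and 17 correspond to the newly added VOL17 and VOL18, and VOL19 does not appear in the table because of the permission problem. I decided to try editing the table to remove VOL17 and VOL18 from the disk group: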

dd if=/dev/oracleasm/disks/VOL10 of=vol10.save bs=1048576 count=10
kfed read /dev/oracleasm/disks/VOL10 aun=1 blkn=2 text=pst.data
vi pst.data
(... set kfdpDtaE[16].status and kfdpDtaE[17].status to 0, then save as pst.update ...)
kfed merge /dev/oracleasm/disks/VOL10 aun=1 blkn=2 text=pst.update
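The hand edit in vi can be sketched as a small script, which makes the change repeatable between the clone and the production system. The following Python sketch lists the PST entries whose status is non-zero and writes 0 into the status value of chosen entries in the text dump. The function names are hypothetical, and the sketch changes only the numeric value, leaving the decoded comment after the semicolon untouched; whether kfed merge reads that comment has not been verified here.

```python
import re

# Matches e.g. "kfdpDtaE[16].status:   117440512 ; 0x300: V=1 R=1 W=1".
STATUS_RE = re.compile(r"^(kfdpDtaE\[(\d+)\]\.status:)(\s+)(\d+)(\s*;.*)$")


def list_active_entries(dump_text):
    """Return {entry_index: status_value} for entries whose status is non-zero."""
    active = {}
    for line in dump_text.splitlines():
        m = STATUS_RE.match(line)
        if m and int(m.group(4)) != 0:
            active[int(m.group(2))] = int(m.group(4))
    return active


def zero_status(dump_text, indexes):
    """Return dump_text with the status value of the given entries set to 0."""
    out = []
    for line in dump_text.splitlines():
        m = STATUS_RE.match(line)
        if m and int(m.group(2)) in indexes:
            # Keep the column width so the rest of the line stays aligned.
            width = len(m.group(3)) + len(m.group(4))
            line = m.group(1) + "0".rjust(width) + m.group(5)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    with open("pst.data") as f:
        original = f.read()
    print("active entries before:", sorted(list_active_entries(original)))
    with open("pst.update", "w") as f:
        f.write(zero_status(original, {16, 17}))
```

In hexadecimal, the two status values that appear in the dump are 0x07000000 (117440512) and 0x05000000 (83886080); the sketch treats both simply as non-zero.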

Mounting the disk group again reported success, just as before, so the log had to be checked:

Thu Apr  4 14:15:08 2013
SQL> alter diskgroup dgdata mount
Thu Apr  4 14:15:08 2013
NOTE: cache registered group DGDATA number=2 incarn=0x0c76f699
Thu Apr  4 14:15:08 2013
NOTE: Hbeat: instance first (grp 2)
Thu Apr  4 14:15:13 2013
NOTE: start heartbeating (grp 2)
Thu Apr  4 14:15:13 2013
NOTE: erasing incomplete header on grp 2 disk VOL17
NOTE: erasing incomplete header on grp 2 disk VOL18
NOTE: erasing incomplete header on grp 2 disk VOL19
NOTE: cache opening disk 0 of grp 2: VOL10 label:VOL10
NOTE: F1X0 found on disk 0 fcn 0.4276074
NOTE: cache opening disk 1 of grp 2: VOL11 label:VOL11
NOTE: cache opening disk 2 of grp 2: VOL12 label:VOL12
NOTE: cache opening disk 3 of grp 2: VOL13 label:VOL13
NOTE: cache opening disk 4 of grp 2: VOL14 label:VOL14
NOTE: cache opening disk 5 of grp 2: VOL3 label:VOL3
NOTE: cache opening disk 6 of grp 2: VOL4 label:VOL4
NOTE: cache opening disk 7 of grp 2: VOL5 label:VOL5
NOTE: cache opening disk 8 of grp 2: VOL6 label:VOL6
NOTE: cache opening disk 9 of grp 2: VOL7 label:VOL7
NOTE: cache opening disk 10 of grp 2: VOL8 label:VOL8
NOTE: cache opening disk 11 of grp 2: VOL9 label:VOL9
NOTE: cache opening disk 12 of grp 2: VOL1 label:VOL1
NOTE: cache opening disk 13 of grp 2: VOL2 label:VOL2
NOTE: cache opening disk 14 of grp 2: VOL15 label:VOL15
NOTE: cache opening disk 15 of grp 2: VOL16 label:VOL16
NOTE: cache mounting (first) group 2/0x0C76F699 (DGDATA)
NOTE: starting recovery of thread=1 ckpt=94.5829 group=2
NOTE: advancing ckpt for thread=1 ckpt=94.5830
NOTE: cache recovered group 2 to fcn 0.5174912
Thu Apr  4 14:15:13 2013
NOTE: opening chunk 1 at fcn 0.5174912 ABA
NOTE: seq=95 blk=5831
Thu Apr  4 14:15:13 2013
NOTE: cache mounting group 2/0x0C76F699 (DGDATA) succeeded
SUCCESS: diskgroup DGDATA was mounted
Thu Apr  4 14:15:13 2013
NOTE: recovering COD for group 2/0xc76f699 (DGDATA)
SUCCESS: completed COD recovery for group 2/0xc76f699 (DGDATA)

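This time something had changed. The headers of VOL17 and VOL18 were erased just like that of VOL19, the disk group no longer reported that VOL17 had to be taken offline, and the disk group was not dismounted. The disk group and disk states all checked out as normal. I edited the pfile and started the database, which opened successfully. Force-adding the three new disks again succeeded, and everything ran stably.

Repairing the production system

With the database open in the simulated environment, I began exporting part of the business data with Data Pump. The next day some of the application developers came on site to check data consistency. Finally the production system was repaired using the same procedure as in the simulated environment; the production database opened normally and the business ran normally.

Summary

This ASM disk group failure underlines the importance of disaster recovery and backup: to keep operator error or system failure from causing data loss, take backups in advance.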
