Fast-Start Failover (FSFO) is the feature through which the Oracle Data Guard broker automatically fails over from a failed primary database to a preconfigured standby database. It increases database availability by removing the need for a DBA to step in.
Once the feature is enabled and the observer is started, this process, which is part of the Data Guard broker and runs inside DGMGRL, monitors the availability of the primary database. Besides the loss of the primary itself, a failover is triggered by any of the following (configurable) events:
- a datafile goes offline because of an I/O error
- the database dictionary is corrupted
- the controlfile is corrupted
- a logfile is inaccessible
- the archiver is stuck
- an explicit request made by calling the function dbms_dg.initiate_fs_failover
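The last item in the list, the explicit request, can be sketched as a small shell wrapper around sqlplus. This is only an illustration under assumptions: the function name `request_fs_failover` and both of its arguments are hypothetical, sqlplus is assumed to be on the PATH, and the connect string is assumed to log in with SYSDBA privileges. `dbms_dg.initiate_fs_failover` takes a free-text condition string and returns a status number, where 0 means the request was accepted and any other value is an ORA error number.

```shell
#!/bin/sh
# Sketch: request an immediate fast-start failover from a script.
# request_fs_failover and its arguments are hypothetical names, not Oracle tooling.
request_fs_failover() {
  connect_string="$1"   # assumed SYSDBA login, e.g. "sys/<password>@orcl as sysdba"
  reason="$2"           # condition text; must not contain a single quote
  sqlplus -s "$connect_string" <<EOF
set serveroutput on
declare
  status binary_integer;
begin
  -- 0 = request accepted, anything else is an ORA error number
  status := dbms_dg.initiate_fs_failover('$reason');
  dbms_output.put_line('initiate_fs_failover status: ' || status);
end;
/
EOF
}
```

A call would look like `request_fs_failover "sys/<password>@orcl as sysdba" "application detected a fatal error"`. The request only succeeds when FSFO is enabled and the observer is running.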
Below I describe how to enable this feature, and at the end I run a test. The starting configuration is identical to the one described in the previous article about Data Guard.
DGMGRL> show configuration
Configuration - dgmcfg
  Protection Mode: MaxPerformance
  Members:
  orcl    - Primary database
    orcl_dg - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
Before starting, Flashback Database must be enabled on both databases. On the primary:
SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     67
Next log sequence to archive   69
Current log sequence           69
SQL> select log_mode,flashback_on from v$database;
LOG_MODE     FLASHBACK_ON
------------ ------------------
ARCHIVELOG   NO
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
Total System Global Area 1090519040 bytes
Fixed Size                  2923440 bytes
Variable Size             654312528 bytes
Database Buffers          419430400 bytes
Redo Buffers               13852672 bytes
Database mounted.
SQL> alter database flashback on;
Database altered.
SQL> alter database open;
Database altered.
SQL> select log_mode,flashback_on from v$database;
LOG_MODE     FLASHBACK_ON
------------ ------------------
ARCHIVELOG   YES
The same is done on the standby, where redo apply is stopped first and restarted afterwards:
SQL> select log_mode,flashback_on from v$database;
LOG_MODE     FLASHBACK_ON
------------ ------------------
ARCHIVELOG   NO
SQL> select open_mode from v$database;
OPEN_MODE
--------------------
MOUNTED
SQL> alter database recover managed standby database cancel;
Database altered.
SQL> alter database flashback on;
Database altered.
SQL> select log_mode,flashback_on from v$database;
LOG_MODE     FLASHBACK_ON
------------ ------------------
ARCHIVELOG   YES
SQL> alter database recover managed standby database using current logfile disconnect;
Database altered.
The next step is to change the protection mode from MAXIMUM PERFORMANCE to MAXIMUM AVAILABILITY, which requires switching redo transport to SYNC on both databases:
DGMGRL> show configuration;
Configuration - dgmcfg
  Protection Mode: MaxPerformance
  Members:
  orcl    - Primary database
    orcl_dg - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS   (status updated 40 seconds ago)
DGMGRL> edit database 'orcl'
> set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database 'orcl_dg'
> set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit configuration set protection mode as MaxAvailability;
Succeeded.
DGMGRL> show configuration;
Configuration - dgmcfg
  Protection Mode: MaxAvailability
  Members:
  orcl    - Primary database
    orcl_dg - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS   (status updated 56 seconds ago)
All that remains is to start the observer process and enable FSFO. Each database is first given its failover target:
DGMGRL> edit database 'orcl' set property faststartfailovertarget='orcl_dg';
Property "faststartfailovertarget" updated
DGMGRL> edit database 'orcl_dg' set property faststartfailovertarget='orcl';
Property "faststartfailovertarget" updated
DGMGRL> ENABLE FAST_START FAILOVER;
Enabled.
It is preferable to run the observer on a separate machine that has either the RDBMS binaries or the Oracle client installed. If the observer runs on the machine that hosts the primary database and that machine becomes unavailable, FSFO will not be triggered.
$ nohup dgmgrl sys/oracle@orcl "start observer" -logfile $HOME/observer.log &
DGMGRL> show configuration verbose;
Configuration - dgmcfg
  Protection Mode: MaxAvailability
  Members:
  orcl    - Primary database
    orcl_dg - (*) Physical standby database
  (*) Fast-Start Failover target
  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
Fast-Start Failover: ENABLED
  Threshold:          30 seconds
  Target:             orcl_dg
  Observer:           scott
  Lag Limit:          30 seconds (not in use)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE
Configuration Status:
SUCCESS
DGMGRL> show fast_start failover
Fast-Start Failover: ENABLED
  Threshold:          30 seconds
  Target:             orcl_dg
  Observer:           scott
  Lag Limit:          30 seconds (not in use)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE   <--- we will make use of this option later
  Observer Reconnect: (none)
  Observer Override:  FALSE
Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Offline               YES
  Oracle Error Conditions:
    (none)
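The output above reports two health conditions, Inaccessible Logfile and Stuck Archiver, as NO. As a minimal sketch, they could be switched on with the DGMGRL command `ENABLE FAST_START FAILOVER CONDITION`. The wrapper name `enable_fsfo_conditions` and its arguments are hypothetical, and the sketch assumes dgmgrl is on the PATH and accepts commands on standard input.

```shell
#!/bin/sh
# Sketch: enable one or more configurable FSFO health conditions.
# enable_fsfo_conditions is a hypothetical helper name.
enable_fsfo_conditions() {
  connect_string="$1"   # assumed broker login, e.g. sys/<password>@orcl
  shift
  {
    # one ENABLE command per condition name passed on the command line
    for condition in "$@"; do
      printf 'enable fast_start failover condition "%s";\n' "$condition"
    done
    # print the resulting condition list for verification
    echo 'show fast_start failover;'
  } | dgmgrl -silent "$connect_string"
}
```

For example, `enable_fsfo_conditions sys/<password>@orcl "Stuck Archiver" "Inaccessible Logfile"`. The matching `DISABLE FAST_START FAILOVER CONDITION` command turns a condition off again.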
In another session we follow the observer log with tail to watch its messages:
$ tail -f observer.log
Observer started
[W000 10/04 20:21:58.01] Observer started.
We simulate a disaster by forcibly killing the pmon process of the primary instance:
$ ps -ef | grep pmon
oracle    7925     1  0 19:52 ?        00:00:00 ora_pmon_orcl
$ kill -9 7925
The observer log we are following now shows these messages:
Initiating Fast-Start Failover to database "orcl_dg"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "orcl_dg"
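For an unattended test, the same check can be scripted rather than read by eye. The sketch below extracts the new primary's name from the last "Failover succeeded" line of the observer log. The helper name `last_failover_target` is hypothetical, and the message format is assumed to be exactly the one shown in the log excerpt above.

```shell
#!/bin/sh
# Sketch: print the database named in the last "Failover succeeded" log line.
# last_failover_target is a hypothetical helper; prints nothing if no failover is logged.
last_failover_target() {
  log_file="$1"   # path to the observer log, e.g. $HOME/observer.log
  sed -n 's/.*Failover succeeded, new primary is "\([^"]*\)".*/\1/p' "$log_file" | tail -n 1
}
```

A monitoring script could call `last_failover_target "$HOME/observer.log"` and raise an alert whenever the result is non-empty.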
We check what the Data Guard broker command-line interface (DGMGRL) reports:
$ dgmgrl sys/oracle@orcl_dg
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production
Copyright (c) 2000, 2013, Oracle. All rights reserved.
Welcome to DGMGRL, type "help" for information.
Connected as SYSDBA.
DGMGRL> show configuration verbose
Configuration - dgmcfg
  Protection Mode: MaxAvailability
  Members:
  orcl_dg - Primary database
    Warning: ORA-16817: unsynchronized fast-start failover configuration
    orcl    - (*) Physical standby database (disabled)
      ORA-16661: the standby database needs to be reinstated
  (*) Fast-Start Failover target
  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'
Fast-Start Failover: ENABLED
  Threshold:          30 seconds
  Target:             orcl
  Observer:           scott
  Lag Limit:          30 seconds (not in use)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE
Configuration Status:
WARNING
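The same state can also be read from SQL, because V$DATABASE exposes several FS_FAILOVER_* columns. The following is a sketch only: the helper name `show_fsfo_status` and its argument are hypothetical, and sqlplus is assumed to be on the PATH. The here-document is quoted so that the shell does not expand `$database`.

```shell
#!/bin/sh
# Sketch: query the fast-start failover state of a database from V$DATABASE.
# show_fsfo_status is a hypothetical helper name.
show_fsfo_status() {
  connect_string="$1"   # assumed login, e.g. "sys/<password>@orcl_dg as sysdba"
  # quoted 'EOF' keeps the shell from expanding $database inside the query
  sqlplus -s "$connect_string" <<'EOF'
set linesize 200
select fs_failover_status,
       fs_failover_current_target,
       fs_failover_observer_present,
       fs_failover_observer_host
  from v$database;
EOF
}
```

Run against the new primary, this is a quick way to confirm which database is the current failover target and whether the observer is still connected.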
I consider the test successfully completed, and now I want to return to the configuration from before the test.
I start the standby database (the former primary) in mount mode:
SQL> startup mount
Because Auto-reinstate is TRUE, the observer log then shows the following messages:
Initiating reinstatement for database "orcl"...
Reinstating database "orcl", please wait...
Reinstatement of database "orcl" succeeded
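If Auto-reinstate had been FALSE, the reinstatement shown above would have to be requested by hand with the DGMGRL command `REINSTATE DATABASE`, issued while connected to the new primary and with the old primary mounted. A minimal sketch follows, in which the wrapper name `reinstate_old_primary` and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: manually reinstate the former primary as a standby.
# reinstate_old_primary is a hypothetical helper name.
reinstate_old_primary() {
  connect_string="$1"   # assumed login to the NEW primary, e.g. sys/<password>@orcl_dg
  db_name="$2"          # database to reinstate, e.g. orcl
  # the command is passed to dgmgrl as a single argument
  dgmgrl -silent "$connect_string" "reinstate database '$db_name'"
}
```

In this article's setup the call would be `reinstate_old_primary sys/<password>@orcl_dg orcl`. Reinstatement relies on Flashback Database, which is why it was enabled on both databases earlier.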
Now orcl_dg is the primary database and orcl is the standby. To swap the roles back, a switchover is needed:
DGMGRL> show configuration;
Configuration - dgmcfg
  Protection Mode: MaxAvailability
  Members:
  orcl_dg - Primary database
    orcl    - (*) Physical standby database
Fast-Start Failover: ENABLED
Configuration Status:
SUCCESS   (status updated 51 seconds ago)
DGMGRL> switchover to orcl
Performing switchover NOW, please wait...
New primary database "orcl" is opening...
Operation requires start up of instance "orcl_dg" on database "orcl_dg"
Starting instance "orcl_dg"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "orcl"
DGMGRL> show configuration;
Configuration - dgmcfg
  Protection Mode: MaxAvailability
  Members:
  orcl    - Primary database
    orcl_dg - (*) Physical standby database
Fast-Start Failover: ENABLED
Configuration Status:
SUCCESS   (status updated 35 seconds ago)
This example showed that Fast-Start Failover is a powerful feature that simplifies a DBA's work: a Disaster Recovery test turns into little more than watching a log.