Fast-Start Failover (FSFO) is the feature through which the Oracle Data Guard Broker automatically fails over from a failed primary database to a preconfigured standby database. It increases database availability by removing the need for a DBA to intervene.

Once this feature is enabled, a separate process called the OBSERVER, part of the Data Guard Broker command-line client (DGMGRL), monitors the availability of the primary database. The observer initiates a failover when the primary stays unreachable for longer than the configured threshold, or when one of the following (configurable) events occurs:

  • a datafile goes offline because of an I/O error
  • the database dictionary is corrupted
  • the controlfile is corrupted
  • a logfile is inaccessible
  • the archiver is stuck
  • an explicit request made by calling the dbms_dg.initiate_fs_failover function
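The last item in the list lets an application request the failover itself. Below is a minimal sketch, assuming a small shell wrapper (the name print_initiate_failover_sql is hypothetical) that only prints a PL/SQL block around DBMS_DG.INITIATE_FS_FAILOVER. The function takes a free-text reason and returns a status code. Nothing is executed against a database here.

```shell
# Hypothetical helper: prints a PL/SQL block that asks the broker to start
# a fast-start failover. It only prints text; nothing is executed here.
print_initiate_failover_sql() {
  reason=$1   # free-text reason; must not contain single quotes in this sketch
  cat <<EOF
SET SERVEROUTPUT ON
DECLARE
  status BINARY_INTEGER;
BEGIN
  status := DBMS_DG.INITIATE_FS_FAILOVER('$reason');
  DBMS_OUTPUT.PUT_LINE('INITIATE_FS_FAILOVER returned ' || status);
END;
/
EOF
}

print_initiate_failover_sql "Application detected a fatal error"
```

The printed block could then be run in a SQL*Plus session on the primary. On a configuration with FSFO enabled that would start a real failover, so it belongs on a test system only.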

Below I describe how to enable this feature, and at the end I run a test. The starting configuration is identical to the one described in the previous article about Data Guard.

DGMGRL> show configuration

Configuration - dgmcfg

  Protection Mode: MaxPerformance
  Members:
  orcl    - Primary database
  orcl_dg - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

Before starting, Flashback Database must be enabled on both databases. First on the primary:

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     67
Next log sequence to archive   69
Current log sequence           69
SQL> select log_mode,flashback_on from v$database;

LOG_MODE     FLASHBACK_ON
------------ ------------------
ARCHIVELOG   NO

SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1090519040 bytes
Fixed Size                  2923440 bytes
Variable Size             654312528 bytes
Database Buffers          419430400 bytes
Redo Buffers               13852672 bytes
Database mounted.
SQL> alter database flashback on;

Database altered.

SQL> alter database open;

Database altered.

SQL>  select log_mode,flashback_on from v$database;

LOG_MODE     FLASHBACK_ON
------------ ------------------
ARCHIVELOG   YES

And likewise on the standby:

SQL> select log_mode,flashback_on from v$database;

LOG_MODE     FLASHBACK_ON
------------ ------------------
ARCHIVELOG   NO

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

SQL> alter database recover managed standby database cancel;
Database altered.
SQL> alter database flashback on;

Database altered.

SQL> select log_mode,flashback_on from v$database;

LOG_MODE     FLASHBACK_ON
------------ ------------------
ARCHIVELOG   YES

SQL> alter database recover managed standby database using current logfile disconnect;

Database altered.
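Since the same check is run on both databases, it can be scripted. Below is a minimal sketch, assuming the SQL*Plus output of the query above is available on standard input. The helper name flashback_status is hypothetical and the sample text is copied from the session above.

```shell
# Hypothetical helper: reads the SQL*Plus output of
#   select log_mode,flashback_on from v$database;
# on stdin and prints the first word of the FLASHBACK_ON column.
flashback_status() {
  awk '$1 == "ARCHIVELOG" || $1 == "NOARCHIVELOG" { print $2; exit }'
}

# Sample output copied from the session above.
sample='LOG_MODE     FLASHBACK_ON
------------ ------------------
ARCHIVELOG   YES'

printf '%s\n' "$sample" | flashback_status   # prints YES
```

Only the first word of the column is returned, so a value such as RESTORE POINT ONLY would be reported as RESTORE. For this article only YES and NO matter.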

The next step is to change the protection mode from MAXIMUM PERFORMANCE to MAXIMUM AVAILABILITY:

DGMGRL>  show configuration;

Configuration - dgmcfg

  Protection Mode: MaxPerformance
  Members:
  orcl    - Primary database
    orcl_dg - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 40 seconds ago)

DGMGRL> edit database 'orcl'
> set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit database 'orcl_dg'
> set property LogXptMode='SYNC';
Property "logxptmode" updated
DGMGRL> edit configuration set protection mode as MaxAvailability;
Succeeded.
DGMGRL> show configuration;

Configuration - dgmcfg

  Protection Mode: MaxAvailability
  Members:
  orcl    - Primary database
    orcl_dg - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS   (status updated 56 seconds ago)

All that remains is to enable FSFO and start the observer process:

DGMGRL> edit database 'orcl' set property faststartfailovertarget='orcl_dg';
Property "faststartfailovertarget" updated
DGMGRL> edit database 'orcl_dg' set property faststartfailovertarget='orcl';
Property "faststartfailovertarget" updated
DGMGRL> ENABLE FAST_START FAILOVER;
Enabled.
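The broker commands used so far can be collected in one place for reuse on another pair of databases. Below is a minimal sketch, assuming a hypothetical helper fsfo_setup_script that takes the primary and standby names and only prints the DGMGRL commands from this article. It does not connect to anything.

```shell
# Hypothetical helper: prints the DGMGRL commands used in this article
# for a given primary/standby pair. It does not connect to any database.
fsfo_setup_script() {
  primary=$1
  standby=$2
  cat <<EOF
EDIT DATABASE '$primary' SET PROPERTY LogXptMode='SYNC';
EDIT DATABASE '$standby' SET PROPERTY LogXptMode='SYNC';
EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
EDIT DATABASE '$primary' SET PROPERTY FastStartFailoverTarget='$standby';
EDIT DATABASE '$standby' SET PROPERTY FastStartFailoverTarget='$primary';
ENABLE FAST_START FAILOVER;
EOF
}

fsfo_setup_script orcl orcl_dg
```

The printed commands can be pasted into a DGMGRL session connected to the primary. Flashback Database must already be enabled on both sides, as shown earlier.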

The observer should preferably run on a separate machine that has the RDBMS binaries or the Oracle client installed. If the observer runs on the same machine as the primary database and that machine becomes unavailable, FSFO will not be triggered.

$ nohup dgmgrl -logfile $HOME/observer.log sys/oracle@orcl "start observer" &

DGMGRL> show configuration verbose;

Configuration - dgmcfg

  Protection Mode: MaxAvailability
  Members:
  orcl    - Primary database
    orcl_dg - (*) Physical standby database

  (*) Fast-Start Failover target

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: ENABLED

  Threshold:          30 seconds
  Target:             orcl_dg
  Observer:           scott
  Lag Limit:          30 seconds (not in use)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configuration Status:
SUCCESS

DGMGRL> show fast_start failover

Fast-Start Failover: ENABLED

  Threshold:          30 seconds
  Target:             orcl_dg
  Observer:           scott
  Lag Limit:          30 seconds (not in use)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE     <--- we will rely on this option later
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configurable Failover Conditions
  Health Conditions:
    Corrupted Controlfile          YES
    Corrupted Dictionary           YES
    Inaccessible Logfile            NO
    Stuck Archiver                  NO
    Datafile Offline               YES

  Oracle Error Conditions:
    (none)
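In the listing above, Inaccessible Logfile and Stuck Archiver are set to NO, so they do not trigger a failover in this configuration. Below is a minimal sketch, assuming a hypothetical helper fsfo_condition_cmds, that prints the DGMGRL ENABLE FAST_START FAILOVER CONDITION command for each condition name passed to it. The names are taken verbatim from the listing.

```shell
# Hypothetical helper: prints one DGMGRL command per condition name.
# It only prints text; the commands still have to be run in DGMGRL.
fsfo_condition_cmds() {
  for cond in "$@"; do
    printf 'ENABLE FAST_START FAILOVER CONDITION "%s";\n' "$cond"
  done
}

fsfo_condition_cmds "Inaccessible Logfile" "Stuck Archiver"
```

After running the printed commands in DGMGRL, show fast_start failover should list the two conditions as YES. That behaviour was not tested in this article.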

In another session, follow the observer log with tail to watch its messages:

$ tail -f $HOME/observer.log
Observer started
[W000 10/04 20:21:58.01] Observer started.

We simulate a disaster by forcibly killing the pmon process:

$ ps -ef | grep pmon
oracle     7925      1  0 19:52 ?        00:00:00 ora_pmon_orcl
$ kill -9 7925

The observer log now shows the following messages:

Initiating Fast-Start Failover to database "orcl_dg"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "orcl_dg"
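Rather than watching the log by eye, a script can look for the success message. Below is a minimal sketch, assuming the message format is exactly the one shown above. The helper name new_primary_from_log is hypothetical, and the demo runs on a temporary copy of the three lines.

```shell
# Hypothetical helper: prints the name of the new primary reported in the
# observer log, or nothing if no failover has completed yet.
new_primary_from_log() {
  sed -n 's/.*Failover succeeded, new primary is "\([^"]*\)".*/\1/p' "$1" | tail -n 1
}

# Demo on a temporary file holding the three log lines shown above.
log=$(mktemp)
cat > "$log" <<'EOF'
Initiating Fast-Start Failover to database "orcl_dg"...
Performing failover NOW, please wait...
Failover succeeded, new primary is "orcl_dg"
EOF

new_primary_from_log "$log"   # prints orcl_dg
rm -f "$log"
```

Pointed at $HOME/observer.log, the same function could feed an alerting script. An empty result means no completed failover has been logged.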

Let's check what DGMGRL reports:

$ dgmgrl sys/oracle@orcl_dg
DGMGRL for Linux: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected as SYSDBA.
DGMGRL> show configuration verbose

Configuration - dgmcfg

  Protection Mode: MaxAvailability
  Members:
  orcl_dg - Primary database
    Warning: ORA-16817: unsynchronized fast-start failover configuration

    orcl    - (*) Physical standby database (disabled)
      ORA-16661: the standby database needs to be reinstated

  (*) Fast-Start Failover target

  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    TraceLevel                      = 'USER'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    ObserverReconnect               = '0'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
    ObserverOverride                = 'FALSE'
    ExternalDestination1            = ''
    ExternalDestination2            = ''
    PrimaryLostWriteAction          = 'CONTINUE'

Fast-Start Failover: ENABLED

  Threshold:          30 seconds
  Target:             orcl
  Observer:           scott
  Lag Limit:          30 seconds (not in use)
  Shutdown Primary:   TRUE
  Auto-reinstate:     TRUE    
  Observer Reconnect: (none)
  Observer Override:  FALSE

Configuration Status:
WARNING
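The two ORA- codes in the output above are what a monitoring script would care about at this point. Below is a minimal sketch, assuming the text of show configuration is available on standard input. The helper name broker_ora_codes is hypothetical and the sample lines are copied from the output above.

```shell
# Hypothetical helper: extracts the distinct ORA- error codes from
# DGMGRL output read on stdin, one per line, sorted.
broker_ora_codes() {
  grep -o 'ORA-[0-9][0-9]*' | sort -u
}

# Sample lines copied from the output above.
sample='  orcl_dg - Primary database
    Warning: ORA-16817: unsynchronized fast-start failover configuration

    orcl    - (*) Physical standby database (disabled)
      ORA-16661: the standby database needs to be reinstated'

printf '%s\n' "$sample" | broker_ora_codes
```

For the sample this lists ORA-16661 and ORA-16817. An empty result would correspond to a configuration that reports no errors or warnings.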

I consider the test successful, and now I want to return to the configuration from before the test.

I start the former primary (now a standby) in mount mode:

SQL> startup mount

Because Auto-reinstate is TRUE, the following messages appear in the observer log:

Initiating reinstatement for database "orcl"...
Reinstating database "orcl", please wait...
Reinstatement of database "orcl" succeeded

Now orcl_dg is the primary database and orcl is the standby. To swap the roles back, a switchover is needed:

DGMGRL> show configuration;

Configuration - dgmcfg

  Protection Mode: MaxAvailability
  Members:
  orcl_dg - Primary database
    orcl    - (*) Physical standby database

Fast-Start Failover: ENABLED

Configuration Status:
SUCCESS   (status updated 51 seconds ago)

DGMGRL> switchover to orcl
Performing switchover NOW, please wait...
New primary database "orcl" is opening...
Operation requires start up of instance "orcl_dg" on database "orcl_dg"
Starting instance "orcl_dg"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "orcl"
DGMGRL> show configuration;

Configuration - dgmcfg

  Protection Mode: MaxAvailability
  Members:
  orcl    - Primary database
    orcl_dg - (*) Physical standby database

Fast-Start Failover: ENABLED

Configuration Status:
SUCCESS   (status updated 35 seconds ago)

In this example we saw that Fast-Start Failover is a powerful feature that simplifies a DBA's tasks, turning a Disaster Recovery test into little more than watching a log file.
