How does SCAN work?
- The PMON process of each instance registers the database services with the default listener on the local node and with each SCAN listener, as specified by the REMOTE_LISTENER database parameter.
- The Oracle client connects using the SCAN name: myscan:1521/sales.example.com
- The client queries DNS to resolve scan_name.
- The SCAN listener selects the least loaded node (node2 in this example).
- The client connects to the local listener on node2. The local listener starts a dedicated server process for the connection to the database.
- The client connects directly to the dedicated server process on node2 and accesses the sales2 database instance.
Ref: http://www.freeoraclehelp.com/2011/12/scan-setup-for-oracle-11g-release211gr2.html
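The connection flow above can be sketched in a few lines of Python. This is only an illustration of the "pick the least loaded node" step, not how the SCAN listener is implemented: the Instance class, the load figures and the node1/sales1 entry are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    node: str    # host running the instance
    name: str    # instance name, e.g. "sales2"
    load: float  # hypothetical load metric known to the SCAN listener


def parse_connect_string(connect_string):
    """Split 'host:port/service' into (host, port, service)."""
    host_port, service = connect_string.split("/", 1)
    host, port = host_port.split(":", 1)
    return host, int(port), service


def pick_instance(instances):
    """Return the registered instance with the lowest load."""
    if not instances:
        raise ValueError("no instances registered for this service")
    return min(instances, key=lambda inst: inst.load)


# Hypothetical registrations; the load values are invented for illustration.
registered = [
    Instance(node="node1", name="sales1", load=0.80),
    Instance(node="node2", name="sales2", load=0.35),
]

host, port, service = parse_connect_string("myscan:1521/sales.example.com")
target = pick_instance(registered)
print(f"{service} via {host}:{port} -> {target.name} on {target.node}")
```

With these made-up numbers the client is redirected to sales2 on node2, which matches the example in the list.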
RAC Questions
What command would you use to check the availability of the RAC system?
crs_stat -t -v (the -t and -v options are optional). In 11g Release 2, crs_stat is deprecated in favor of crsctl status resource -t.
How do we verify that RAC instances are running?
SQL> select * from V$ACTIVE_INSTANCES;
The query returns the instance number in the INST_NUMBER column and host_name:instance_name in the INST_NAME column.
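Since INST_NAME combines the host and the instance name, a script that reads this view usually has to split the value. A minimal Python sketch, assuming the rows have already been fetched; the sample rows below are hypothetical, not real query output:

```python
def split_inst_name(inst_name):
    """Split an INST_NAME value of the form 'host:instance' into (host, instance)."""
    host, sep, instance = inst_name.partition(":")
    if not sep:
        raise ValueError(f"expected 'host:instance', got {inst_name!r}")
    return host.strip(), instance.strip()


# Hypothetical rows as (INST_NUMBER, INST_NAME); real values come from the query.
rows = [(1, "node1:orcl1"), (2, "node2:orcl2")]

for inst_number, inst_name in rows:
    host, instance = split_inst_name(inst_name)
    print(inst_number, host, instance)
```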
What are the Oracle Clusterware components?
Voting Disk: Oracle RAC uses the voting disk to manage cluster membership by way of a health check, and to arbitrate cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.
Oracle Cluster Registry (OCR): Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster.
How do you back up the OCR?
Oracle Clusterware backs up the OCR automatically. The default location is:
$ORA_CRS_HOME/cdata/<cluster_name>/
To display backups:
# ocrconfig -showbackup
To restore a backup:
# ocrconfig -restore backup_file_name
With Oracle RAC 10g Release 2 or later, you can also take a logical export:
# ocrconfig -export export_file_name -s online
and use the -import option to restore the contents.
With Oracle RAC 11g Release 1 or later, you can take a manual backup of the OCR with the command:
# ocrconfig -manualbackup
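How do you back up the voting disk?
# dd if=voting_disk_name of=backup_file_name
From 11g Release 2 onward, dd backups of the voting disk are no longer supported; the voting disk data is backed up automatically as part of the OCR.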
How do I identify the voting disk location?
# crsctl query css votedisk
How do I identify the OCR file location?
Check /var/opt/oracle/ocr.loc or /etc/oracle/ocr.loc (depending on the platform)
or
# ocrcheck
Is ssh required for normal Oracle RAC operation?
ssh is not required for normal Oracle RAC operation. However, ssh should be enabled for Oracle RAC and patch set installation.
What is SCAN?
Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is that clients using SCAN do not need to change their configuration when you add or remove nodes in the cluster.
What is the purpose of the private interconnect?
Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the clustered nodes. This communication is based on the TCP protocol. RAC uses the interconnect for Cache Fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.
Why do we have a Virtual IP (VIP) in Oracle RAC?
Without VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 minutes) before getting an error. As a result, you don't really have a good HA solution without VIPs. When a node fails, the VIP associated with it automatically fails over to another node, and the new node re-ARPs the network to announce a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which sends error RST packets back to the clients. The clients therefore get errors immediately.
How many nodes are supported in a RAC database?
Oracle RAC 10g Release 2 supports up to 100 nodes in a cluster using Oracle Clusterware, and 100 instances in a RAC database.
What is a voting disk?
The voting disk is a file that sits in the shared storage area and must be accessible by all nodes in the cluster. All nodes in the cluster register their heartbeat information in the voting disk to confirm that they are operational. If the heartbeat information of any node is not available in the voting disk, that node is evicted from the cluster. The CSS (Cluster Synchronization Service) daemon in the clusterware maintains the heartbeat of all nodes to the voting disk. When a node is unable to send its heartbeat to the voting disk, it reboots itself, which helps avoid split-brain syndrome.
For high availability, Oracle recommends an odd number of voting disks, with a minimum of three.
The voting disk also manages cluster membership: it reassigns cluster ownership between the nodes in case of failure.
Oracle Clusterware uses the voting disk files to determine which nodes are currently members of the cluster. The voting disk files are also used in concert with other cluster components, such as CRS, to maintain the cluster's integrity.
Oracle Database 11g Release 2 provides
the ability to store the voting disks in ASM along with the OCR. Oracle
Clusterware can access the OCR and the voting disks present in ASM even if the
ASM instance is down. As a result CSS can continue to maintain the Oracle cluster
even if the ASM instance has failed.
How many voting disks are you maintaining?
By default Oracle will create 3 voting
disk files in ASM.
Oracle expects that you will configure
at least 3 voting disks for redundancy purposes. You should always configure an
odd number of voting disks >= 3. This is because loss of more than half your
voting disks will cause the entire cluster to fail.
You should plan on allocating 280MB for
each voting disk file. For example, if you are using ASM and external
redundancy then you will need to allocate 280MB of disk for the voting disk. If
you are using ASM and normal redundancy you will need 560MB.
Why do we need to keep an odd number of voting disks?
Oracle expects that you will configure at least 3 voting disks for redundancy purposes. You should always configure an odd number of voting disks >= 3. A node must be able to access more than half of the voting disks at any time, so losing half or more of them causes the entire cluster to fail. An even number therefore adds no extra failure tolerance over the odd number just below it.
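The arithmetic behind the odd-number rule is easy to check. The sketch below only assumes the rule stated above, that a node has to reach a strict majority of the voting disks; the function name is made up for the example.

```python
def tolerated_failures(voting_disks):
    """Number of voting disks that can be lost while a strict majority stays reachable."""
    if voting_disks < 1:
        raise ValueError("at least one voting disk is required")
    required = voting_disks // 2 + 1  # strict majority (more than half)
    return voting_disks - required


for n in (1, 2, 3, 4, 5):
    print(f"{n} voting disk(s): can lose {tolerated_failures(n)}")
```

Going from 3 to 4 voting disks still tolerates only one failure, while 5 tolerates two, which is why an even count buys nothing.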
What is SCAN? (11gR2 feature)
Single Client Access Name (SCAN) eliminates the need to change TNSNAMES entries when nodes are added to or removed from the cluster. RAC instances register with the SCAN listeners as remote listeners. SCAN is a domain name registered to at least one and up to three IP addresses, either in DNS (Domain Name Service) or GNS (Grid Naming Service), and it must resolve to at least one address on the public network. For high availability and scalability, Oracle recommends configuring the SCAN to resolve to three addresses, which creates three SCAN listeners, even if the cluster has dozens of nodes.
What are SCAN components in a cluster?
1.SCAN Name
2.SCAN IPs (3)
3.SCAN Listeners (3)
What are the requirements for Oracle Clusterware?
1. External shared disk to store the Oracle Clusterware files (voting disk and Oracle Cluster Registry, OCR)
2. Two network cards on each clusterware node (and three sets of IP addresses):
Network Card 1 (with IP address set 1) for the public network
Network Card 2 (with IP address set 2) for the private network (for inter-node communication between RAC nodes, used by the clusterware and the RAC database)
IP address set 3 for the Virtual IP (VIP) (used as the virtual IP address for client connections and for connection failover)
3. Storage option for the OCR and voting disk: RAW, OCFS2 (Oracle Cluster File System), NFS, …
How to find the location of the OCR file when CRS is down?
When CRS is down:
Look in the ocr.loc file. The location of this file depends on the OS:
On Linux: /etc/oracle/ocr.loc
On Solaris: /var/opt/oracle/ocr.loc
When CRS is up:
Set the ASM environment or the CRS environment, then run the command below:
ocrcheck
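When CRS is down, the ocr.loc file can also be read from a script. The sketch below assumes the file holds simple key=value lines such as ocrconfig_loc=...; the sample content and device paths are hypothetical, so check the file on your own system before relying on the key names.

```python
def parse_ocr_loc(text):
    """Parse key=value lines from ocr.loc content into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical file content, for illustration only.
sample = """\
ocrconfig_loc=/dev/raw/raw1
ocrmirrorconfig_loc=/dev/raw/raw2
local_only=FALSE
"""

settings = parse_ocr_loc(sample)
print("OCR location:", settings.get("ocrconfig_loc"))
print("OCR mirror:  ", settings.get("ocrmirrorconfig_loc"))
```

On a real node the text would come from reading /etc/oracle/ocr.loc (Linux) or /var/opt/oracle/ocr.loc (Solaris).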
In a 2-node RAC, how many NICs are you using?
Two network cards on each clusterware node:
Network Card 1 (with IP address set 1)
for public network
Network Card 2 (with IP address set 2)
for private network (for inter node communication between rac nodes used by
clusterware and rac database)
In a 2-node RAC, how many IPs are you using?
6: three sets of IP addresses, with one address per node in each set
## eth1-Public: 2
## eth0-Private: 2
## VIP: 2
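The same count can be worked out for any cluster size. A small sketch, assuming one public, one private and one virtual address per node as in the answer above; the optional scan_ips argument is an addition for 11gR2 clusters that also use SCAN addresses.

```python
def count_ips(nodes, scan_ips=0):
    """Count the IP addresses needed: one public, private and VIP per node, plus SCAN."""
    per_node = {"public": 1, "private": 1, "vip": 1}
    counts = {kind: n * nodes for kind, n in per_node.items()}
    counts["scan"] = scan_ips
    counts["total"] = sum(counts.values())
    return counts


print(count_ips(2))              # the 2-node case from the answer above
print(count_ips(2, scan_ips=3))  # same cluster with three SCAN addresses
```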
How to find IP information in RAC?
Look at the /etc/hosts file, for example:
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
## Public Node names
192.168.10.11 node1-pub.hingu.net node1-pub
192.168.10.22 node2-pub.hingu.net node2-pub
## Private Network (Interconnect)
192.168.0.11 node1-prv node1-prv
192.168.0.22 node2-prv node2-prv
## Private Network (Network Area storage)
192.168.1.11 node1-nas node1-nas
192.168.1.22 node2-nas node2-nas
192.168.1.33 nas-server nas-server
## Virtual IPs
192.168.10.111 node1-vip.hingu.net node1-vip
192.168.10.222 node2-vip.hingu.net node2-vip
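A hosts file like this one can be grouped by network type with a short script. The sketch below relies on the -pub, -prv and -vip naming suffixes used in this listing, which are a site convention rather than anything Oracle requires; the sample text repeats part of the listing so the example is self-contained.

```python
def parse_hosts(text):
    """Map each IP address to its list of host names, ignoring comments."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        entries.setdefault(ip, []).extend(names)
    return entries


def classify(entries):
    """Group addresses by the naming suffix of their host names."""
    groups = {"pub": [], "prv": [], "vip": []}
    for ip, names in entries.items():
        for suffix in groups:
            # Compare the short host name (before the first dot) to the suffix.
            if any(name.split(".")[0].endswith("-" + suffix) for name in names):
                groups[suffix].append(ip)
    return groups


sample = """\
## Public Node names
192.168.10.11 node1-pub.hingu.net node1-pub
192.168.10.22 node2-pub.hingu.net node2-pub
## Private Network (Interconnect)
192.168.0.11 node1-prv node1-prv
192.168.0.22 node2-prv node2-prv
## Virtual IPs
192.168.10.111 node1-vip.hingu.net node1-vip
192.168.10.222 node2-vip.hingu.net node2-vip
"""

groups = classify(parse_hosts(sample))
for kind, ips in groups.items():
    print(kind, ips)
```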
What is the difference between the RAC IP addresses?
The public IP address is the normal IP address typically used by DBAs and system administrators to manage storage, the system and the database. It belongs to the public (client-facing) network of the cluster.
The private IP address is used only for internal cluster processing such as Cache Fusion (also known as the interconnect). It belongs to a private network reserved for traffic between the cluster nodes.
The VIP is used by database applications to enable failover when a cluster node fails. Its purpose is to let client connections fail over to surviving nodes when a node fails.
Can an application developer access the private IP?
No. The private IP address is used only for internal cluster processing such as Cache Fusion (also known as the interconnect).
How many OCR and voting disks should one
have?
For redundancy, one should have at least
two OCR disks and three voting disks (raw disk partitions).
These disk partitions should be spread
across different physical disks.
Why are we required to maintain an odd number of voting disks?
An odd number of disks avoids split brain. When the nodes in a cluster can't talk to each other, they race to lock the voting disks, and whichever node locks more disks survives. If the number of disks is even, two nodes might each lock 50% of the disks (2 out of 4), and there is no way to decide which node to evict. When the number is odd, one node always holds more disks than the other, so the cluster can evict the node holding fewer.
How do you check the health of your RAC database?
The crsctl command can be run as root or as the oracle user to check clusterware health, but starting or stopping the clusterware requires root or another privileged user.
[oracle@TEST_NODE1 ~]$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
How will you back up your RAC database?
The backup strategy for a RAC database covers these components:
1) OCR
2) Voting disk
3) Database files, control files, redo log files and archive log files
Give the usage of srvctl?
srvctl start instance -d db_name -i "inst_name_list" [-o start_options]
srvctl stop instance -d name -i "inst_name_list" [-o stop_options]
srvctl stop instance -d orcl -i "orcl3,orcl4" -o immediate
srvctl start database -d name [-o start_options]
srvctl stop database -d name [-o stop_options]
srvctl start database -d orcl -o mount
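If these commands are driven from a script, it helps to build the argument list in one place. The sketch below only assembles the command lines shown above and prints them; the helper names are made up for the example, and nothing is executed.

```python
import shlex


def srvctl_instance(action, db_name, instances, options=None):
    """Build 'srvctl start|stop instance -d db -i inst1,inst2 [-o options]'."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    cmd = ["srvctl", action, "instance", "-d", db_name, "-i", ",".join(instances)]
    if options:
        cmd += ["-o", options]
    return cmd


def srvctl_database(action, db_name, options=None):
    """Build 'srvctl start|stop database -d db [-o options]'."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    cmd = ["srvctl", action, "database", "-d", db_name]
    if options:
        cmd += ["-o", options]
    return cmd


# Print the two concrete examples from the list above.
print(shlex.join(srvctl_instance("stop", "orcl", ["orcl3", "orcl4"], "immediate")))
print(shlex.join(srvctl_database("start", "orcl", "mount")))
```

The returned lists can be passed to subprocess.run on a node where srvctl is on the PATH.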
What are the major RAC wait events?
In a RAC environment the buffer cache is global across all instances in the cluster, so block requests are processed differently than on a single instance. The most common wait events related to this are gc cr request and gc buffer busy.
gc cr request: the time it takes to retrieve the data from the remote cache.
Reason: RAC traffic using a slow connection, or inefficient queries. Poorly tuned queries increase the number of data blocks requested by an Oracle session, and the more blocks requested, the more often a block has to be read from a remote instance via the interconnect.
gc buffer busy: the time the remote instance spends locally accessing the requested data block.
How do we verify an existing current backup of OCR?
We can verify the current backup of OCR using the following command: ocrconfig -showbackup
How to move OCR and voting disk to a new storage device? (Verify in non-prod first.)
Moving OCR
==========
You must be logged in as the root user, because root owns the OCR files. An ocrmirror must also be in place before you try to replace the OCR device.
Make sure there is a recent backup of the OCR file before making any changes:
# ocrconfig -showbackup
If there is no recent backup copy of the OCR file, take an export or a manual backup of the current OCR file:
In 10.2
# ocrconfig -export export_file_name -s online
In 11g
# ocrconfig -manualbackup
The new OCR disk must be owned by root, must be in the oinstall group, and must have permissions set to 640. Provide at least 100 MB of disk space for the OCR.
On one node, as root, run:
# ocrconfig -replace ocr new_ocr_path
# ocrconfig -replace ocrmirror new_ocrmirror_path
Now run ocrcheck to verify that the OCR is pointing to the new files.
Moving Voting Disk
==================
Note: crsctl votedisk commands must be run as root.
Shut down Oracle Clusterware (crsctl stop crs as root) on all nodes before making any modification to the voting disk. Determine the current voting disk location using:
# crsctl query css votedisk
Take a backup of every voting disk:
# dd if=voting_disk_name of=backup_file_name
To move a voting disk, provide the full path including the file name:
# crsctl delete css votedisk old_voting_disk_path -force
# crsctl add css votedisk new_voting_disk_path -force
After modifying the voting disk, start the Oracle Clusterware stack on all nodes:
# crsctl start crs
Verify the voting disk location using:
# crsctl query css votedisk
What is split brain?
In a RAC environment, server nodes communicate with each other over a high-speed private interconnect network. A split-brain situation happens when all the links of the private interconnect fail but the instances are still up and running. Each instance then thinks that the other nodes/instances are dead and that it should take over ownership.
In a split-brain situation, instances independently access the data and modify the same blocks, so changed blocks get overwritten, which can lead to data corruption. To avoid this, various algorithms have been implemented to handle the split-brain scenario.
In RAC, the IMR (Instance Membership Recovery) service is one of the efficient algorithms used to detect and resolve split-brain syndrome. When one instance fails to communicate with the other instances, or when one instance becomes inactive for any reason and is unable to issue the control file heartbeat, the split brain is detected and the detecting instance evicts the failed instance from the database. This process is called node eviction.
What is FAN?
Applications can use Fast Application Notification (FAN) to enable rapid failure detection, balancing of connection pools after failures, and re-balancing of connection pools when failed components are repaired. The FAN process uses system events that Oracle publishes when cluster servers become unreachable or if network interfaces fail.
What is FCF?
Fast Connection Failover provides high
availability to FAN integrated clients, such as clients that use JDBC, OCI, or
ODP.NET. If you configure the client to use fast connection failover, then the
client automatically subscribes to FAN events and can react to database UP and
DOWN events.
In response, Oracle gives the client a
connection to an active instance that provides the requested database
service.
What is the difference between CRSCTL and SRVCTL?
crsctl manages clusterware-related operations:
· Starting and stopping Oracle Clusterware
· Enabling and disabling Oracle Clusterware daemons
· Registering cluster resources
srvctl manages Oracle resource-related operations:
· Starting and stopping database instances and services
· From 11gR2, also managing cluster resources such as network, VIP, disks etc.
How to control Oracle Clusterware?
To start or stop Oracle Clusterware on a specific node:
# crsctl stop crs
# crsctl start crs
To enable or disable Oracle Clusterware on a specific node:
# crsctl enable crs
# crsctl disable crs
How to check the cluster (all nodes) status?
To check the viability of Cluster Synchronization Services (CSS)
across nodes:
$ crsctl check cluster
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
How to check the cluster (one node) status?
$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
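For monitoring scripts, output like the above can be reduced to a single healthy/unhealthy answer. The sketch below assumes the "<component> is online" wording shown in the sample and treats anything else as not online; the list of expected components is taken from this sample output, not from an official list.

```python
import re

# Components expected in 'crsctl check crs' output, per the sample above.
EXPECTED = (
    "Oracle High Availability Services",
    "Cluster Ready Services",
    "Cluster Synchronization Services",
    "Event Manager",
)

ONLINE = re.compile(r"^CRS-\d+:\s+(.+) is online\s*$")


def online_components(output):
    """Return the set of components that the output reports as online."""
    found = set()
    for line in output.splitlines():
        match = ONLINE.match(line.strip())
        if match:
            found.add(match.group(1))
    return found


def stack_is_healthy(output, expected=EXPECTED):
    """True only if every expected component is reported online."""
    return set(expected) <= online_components(output)


sample = """\
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
"""

print(stack_is_healthy(sample))
```

In practice the text would come from running crsctl check crs through subprocess and capturing its standard output.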
How do you troubleshoot a node reboot?
Check the following Metalink notes:
Note 265769.1 Troubleshooting CRS Reboots
Note 559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions.
====================================================================================
What is split brain?
When database nodes in a cluster are unable to communicate with each other, they may continue to process and modify the data blocks independently. If the same block is modified by more than one instance, synchronization/locking of the data blocks does not take place and blocks may be overwritten by others in the cluster. This state is called split brain.
What is cache fusion?
In a RAC environment, it is the combining of data blocks, which are shipped across the interconnect from remote database caches (SGA) to the local node, in order to fulfill the requirements for a transaction (DML, Query of Data Dictionary).
What is the difference between Crash recovery and Instance recovery?
When an instance crashes in a single-node database, crash recovery takes place on startup. In a RAC environment the same recovery for a failed instance is performed by the surviving nodes and is called instance recovery.
What is the interconnect used for?
It is a private network which is used to ship data blocks from one instance to another for cache fusion. The physical data blocks as well as data dictionary blocks are shared across this interconnect.
How do you determine what protocol is being used for Interconnect traffic?
One of the ways is to look at the database alert log for the time period when the database was started up.
What methods are available to keep the time synchronized on all nodes in the cluster?
Either the Network Time Protocol (NTP) can be configured or, in 11gR2, the Cluster Time Synchronization Service (CTSS) can be used.
What file components in RAC must reside on shared storage?
Spfiles, control files, datafiles and redo log files should be created on shared storage.
Where does the Clusterware write when there is a network or Storage missed heartbeat?
The network ping failure is written in $CRS_HOME/log
How do you find out what OCR backups are available?
The ocrconfig -showbackup command lists the automatic and manual backups.
If your OCR is corrupted, what options do you have to resolve this?
You can use either the logical or the physical OCR backup copy to restore the repository.
How do you find out what object has its blocks being shipped across the instance the most?
You can query the DBA_HIST_SEG_STAT view.
What is a VIP in RAC used for?
The VIP is an alternate virtual IP address assigned to each node in a cluster. During a node failure the VIP of the failed node moves to a surviving node and signals to the application that the node has gone down. Without a VIP, the application waits for the TCP timeout and only then finds out that the session is no longer live due to the failure.
How do we know which database instances are part of a RAC cluster?
You can query the V$ACTIVE_INSTANCES view to determine the member instances of the RAC cluster.
What is OCLUMON used for in a cluster environment?
OCLUMON is the command-line tool for querying the Cluster Health Monitor (CHM) repository. CHM stores operating system metrics in the CHM repository for all nodes in a RAC cluster: information on CPU, memory, process, network and other OS data. This information can later be retrieved and used to troubleshoot and identify cluster-related issues. CHM is a default component of the 11gR2 grid install. The data is stored in the master repository and replicated to a standby repository on a different node.
What would be the possible performance impact in a cluster if a less powerful node (e.g. slower CPU’s) is added to the cluster?
Cluster-wide processing tends to slow down to the CPU speed of the slowest server.
What is the purpose of OLR?
The Oracle Local Registry (OLR) contains the information that allows the cluster processes to be started when the OCR is stored in ASM. Since ASM storage is unavailable until the Grid processes are started, a local copy of the required OCR contents is kept in the OLR.
What is the default memory allocation for ASM?
In 10g the default SGA size is 1G, in 11g it is set to 256M, and in 12c ASM it is set back to 1G.
How do you backup ASM Metadata?
You can use the ASMCMD md_backup command to back up the ASM disk group metadata, and md_restore to restore the disk group configuration in case of ASM disk group storage loss.
What files can be stored in the ASM diskgroup?
In 11g the following files can be stored in ASM diskgroups.
- Datafiles
- Redo logfiles
- Spfiles
In 12c the files below can now also be stored in the ASM disk group:
- Password file
What is the ASM POWER_LIMIT?
ASM_POWER_LIMIT is the parameter that controls how much work the ASM instance puts into a rebalance at any given time, and therefore how fast the rebalance runs. The default value is 1. In ASM versions before 11.2.0.2 the valid range is 0 to 11; from 11.2.0.2 onward the upper limit is 1024 when the disk group attribute COMPATIBLE.ASM is set to 11.2.0.2 or higher.
What is a rolling upgrade?
A patch is considered rolling if it can be applied to the cluster binaries without shutting down the whole database in a RAC environment. All nodes in the cluster are patched in a rolling manner, one by one; only the node being patched is unavailable while all other instances stay open.
What are some of the RAC specific parameters?
Some of the RAC parameters are:
- CLUSTER_DATABASE
- CLUSTER_DATABASE_INSTANCES
- INSTANCE_TYPE (RDBMS or ASM)
- ACTIVE_INSTANCE_COUNT
- UNDO_MANAGEMENT
What is the future of the Oracle Grid?
The Grid software is becoming more and more capable of not just supporting HA for Oracle Databases but also other applications including Oracle’s applications. With 12c there are more features and functionality built-in and it is easier to deploy these pre-built solutions, available for common Oracle applications.
What components of the Grid should I back up?
Is there an easy way to verify the inventory for all remote nodes?
You can run the opatch lsinventory -all_nodes command from a single node to look at the inventory details for all nodes in the cluster.