Backing up and Recovering a Data Warehouse

This section discusses some considerations for data warehouse backup and recovery, and includes the following topics:

Strategies and Best Practices for Backup and Recovery

Devising a backup and recovery strategy can be a daunting task. And when you have hundreds of gigabytes of data that must be protected and recovered in the case of a failure, the strategy can be very complex.

The following best practices can help you implement your warehouse's backup and recovery strategy:

Best Practice A: Use ARCHIVELOG Mode
Best Practice B: Use RMAN
Best Practice C: Use Read-Only Tablespaces
Best Practice D: Plan for NOLOGGING Operations
Best Practice E: Not All Tablespaces are Equally Important

Best Practice A: Use ARCHIVELOG Mode

Archived redo logs are crucial for recovery when no data can be lost, because they constitute a record of changes to the database. Oracle can be run in either of two modes:

ARCHIVELOG -- Oracle archives the filled online redo log files before reusing them in the cycle.
NOARCHIVELOG -- Oracle does not archive the filled online redo log files before reusing them in the cycle.

Running the database in ARCHIVELOG mode has the following benefits:

The database can be completely recovered from both instance and media failure.
The user can perform backups while the database is open and available for use.
Oracle supports multiplexed archive logs to avoid any possible single point of failure on the archive logs
The user has more recovery options, such as the ability to perform tablespace-point-in-time recovery (TSPITR).
Archived redo logs can be transmitted and applied to the physical standby database, which is an exact replica of the primary database.
The database can be completely recovered from both instance and media failure.

Running the database in NOARCHIVELOG mode has the following consequences:

The user can only back up the database while it is completely closed after a clean shutdown.
Typically, the only media recovery option is to restore the whole database, which causes the loss of all transactions since the last backup.

Is Downtime Acceptable?

Oracle database backups can be made while the database is open or closed. Planned downtime of the database can be disruptive to operations, especially in global enterprises that support users in multiple time zones, up to 24-hours per day. In these cases it is important to design a backup plan to minimize database interruptions.

Depending on your business, some enterprises can afford downtime. If your overall business strategy requires little or no downtime, then your backup strategy should implement an online backup. The database needs never to be taken down for a backup. An online backup requires the database to be in ARCHIVELOG mode.

There is essentially no reason not to use ARCHIVELOG mode. All data warehouses (and for that matter, all mission-critical databases) should use ARCHIVELOG mode. Specifically, given the size of a data warehouse (and consequently the amount of time to back up a data warehouse), it is generally not viable to make an offline backup of a data warehouse, which would be necessitated if one were using NOARCHIVELOG mode.

Of course, large-scale data warehouses may undergo large amounts of data-modification, which in turn will generate large volumes of log files. To accommodate the management of large volumes of archived log files, RMAN provides the option to compress log files as they are archived. This will enable you to keep more archive logs on disk for faster accessibility for recovery.

In summary, a best practice is to put the database in archive log mode to provide online backups and point-in-time recovery options.

Best Practice B: Use RMAN

There are many reasons to adopt RMAN. Some of the reasons to integrate RMAN into your backup and recovery strategy are that it offers:

extensive reporting
incremental backups
downtime free backups
backup and restore validation
backup and restore optimization
easily integrates with media managers
block media recovery
archive log validation and management
corrupt block detection

Best Practice C: Use Read-Only Tablespaces

One of the biggest issues facing a data warehouse is sheer size of a typical data warehouse. Even with powerful backup hardware, backups may still take several hours. Thus, one important consideration in improving backup performance is minimizing the amount of data to be backed up. Read-only tablespaces are the simplest mechanism to reduce the amount of data to be backed up in a data warehouse.

The advantage of a read-only tablespace is that the data only need to be backed up once. So, if a data warehouse contains five years of historical data, and the first four years of data can be made read-only. Theoretically, the regular backup of the database would only back up 20% of the data. This can dramatically reduce the amount of time required to back up the data warehouse.

Most data warehouses store their data in tables that have been range-partitioned by time. In a typical data warehouse, data is generally 'active' for a period ranging anywhere from 30 days to one year. During this period, the historical data can still be updated and changed (for example, a retailer may accept returns up to 30 days beyond the date of purchase, so that sales data records could change during this period). However, once a data has reached a certain date, it is often known to be static.

By taking advantage of partitioning, users can make the static portions of their data read-only. RMAN supports read-only tablespaces rather than read-only partitions or tables. To take advantage of the read-only tablespaces and reduce the backup window, a strategy of storing constant data partitions in a read-only tablespace should be devised. Two strategies for implementing a rolling window are as follows:

Implement a regularly scheduled process to move partitions from a read/write tablespace to a read-only tablespace when the data matures to the point where it is entirely static.

The best practice in this case is to put the database in ARCHIVELOG mode to provide online backups and point-in-time recovery options.
Create a series of tablespaces, each containing a small number of partitions and regularly modify one tablespace from read/write to read-only as the data in that tablespaces ages.

One consideration is that backing up data is only half of the recovery process. If you configure a tape system so that it can backup the read/write portions of a data warehouse in 4 hours, the corollary is that a tape system might take 20 hours to recover the database if a complete recovery is necessary when 80% of the database is read-only.

In summary, a best practice is to place static tables and partitions into read-only tablespaces. A read-only tablespace must be backed up only once.

Best Practice D: Plan for NOLOGGING Operations

In gener al, one of the highest priorities for a data warehouse is performance. Not only must the data warehouse provide good query performance for online users, but the data warehouse must also be efficient during the ETL process so that large amount of data can be loaded in the shortest amount of time.

One common optimization leveraged by data warehouses is to execute bulk-data operations using the NOLOGGING mode. The database operations which support NOLOGGING modes are direct-path loads and inserts, index creation, and table creation. When an operation runs in NOLOGGING mode, data is not written to the redo log (or more precisely, only a small set of metadata is written to the redo log). This mode is widely used within data warehouses and can improve the performance of bulk data operations by up to 50%.

However, the trade-off is that a NOLOGGING operation cannot be recovered using conventional recovery mechanisms, since the necessary data to support the recovery was never written to the log file. Moreover, subsequent operations to the data upon which a nologging operation has occurred also cannot be recovered even if those operations were not using nologging mode. Because of the performance gains provided by NOLOGGING operations, it is generally recommended that data warehouses utilize nologging mode in their ETL process.

The presence of NOLOGGING operations must be taken into account when devising the backup and recovery strategy. When a database relies on NOLOGGING operations, the conventional recovery strategy (of recovering from the latest tape backup and applying the archived logfiles) is no longer applicable because the log files cannot recover the NOLOGGING operation.

The first principle to remember is, do not make a backup when a NOLOGGING operation is occurring. Oracle does not currently enforce this rule, so the DBA must schedule the backup jobs and the ETL jobs such that the nologging operations do not overlap with backup operations.

There are two approaches to backup and recovery in the presence of NOLOGGING operations: ETL or incremental backups. If you are not using NOLOGGING operations in your data warehouse, then you do not have to choose either of the following options: you can recover your data warehouse using archived logs. However, the following options may offer some performance benefits over an archive-log-based approach in the event of recovery.

Extraction, Transformation, and Loading
Incremental Backup

Extraction, Transformation, and Loading

The ETL process uses several Oracle features or tools and a combination of methods to load (re-load) data into a data. These features or tools may consist of:

Transportable Tablespaces. The Oracle Transportable Tablespace feature enables users to quickly move a tablespace across Oracle databases. It is the most efficient way to move bulk data between databases. Oracle Database provides the ability to transport tablespaces across platforms. If the source platform and the target platform are of different endianness, then RMAN will convert the tablespace being transported to the target format.
SQL*Loader. SQL*Loader loads data from external flat files into tables of an Oracle database. It has a powerful data-parsing engine that puts little limitation on the format of the data in the datafile.
Data Pump (export/import). Oracle Database offers the Oracle Data Pump technology, which enables very high-speed movement of data and metadata from one database to another. This technology is the basis for Oracle's data movement utilities, Data Pump Export and Data Pump Import.
External Tables. The external tables feature is a complement to existing SQL*Loader functionality. It enables you to access data in external sources as if it were in a table in the database.

The ETL Strategy and NOLOGGING Operations

One approach is to take regular database backups and store the necessary data files to re-create the ETL process for that entire week. If a recovery is necessary, the data warehouse could be recovered from the most recent backup. Then, instead of rolling forward by applying the archived redo logs (as would be done in a conventional recovery scenario), the data warehouse could be rolled forward by re-running the ETL processes. This paradigm assumes that the ETL processes can be easily replayed, which would typically involve storing a set of extract files for each ETL process (many data warehouses do this already as a best practice, in order to be able to identify repair a bad data feed for example).

A sample implementation of this approach is make a backup of the data warehouse every weekend, and then store the necessary files to support the ETL process for each night. Thus, at most, seven days of ETL processing would need to be re-applied in order to recover a database. The data warehouse administrator can easily project the length of time to recover the data warehouse, based upon the recovery speeds from tape and performance data from previous ETL runs.

Essentially, the data warehouse administrator is gaining better performance in the ETL process through nologging operations, at a price of slight more complex and less-automated recovery process. Many data warehouse administrators have found that this is a desirable trade-off.

One downside to this approach is that the burden is upon the data warehouse administrator to track all of the relevant changes that have occurred in the data warehouse. This approach will not capture changes that fall outside of the ETL process. For example, in some data warehouses, end-users may create their own tables and data structures. Those changes will be lost in the event of a recovery. This restriction must be conveyed to the end-users. Alternatively, one could also mandate that end-users create all of private database objects in a separate tablespace, and during recovery, the DBA could recover this tablespace using conventional recovery while recovering the rest of the database using the approach of replaying the ETL process.

In summary, a best practice is to restore a backup that does not contain non-recoverable (nologging) transactions. Then replay the ETL process to reload the data.

Sizing the Block Change Tracking File

The size of the block change tracking file is proportional to:

Database size in bytes. The block change tracking file contains data representing data file blocks in the database. The data is approximately 1/250000 of the total size of the database.
The number of enabled threads. All Real Application Cluster (RAC) instances have access to the same block change tracking file, however, the instances update different areas of the tracking file without any locking or inter-node block swapping. You enable block change tracking for the entire database and not for individual instances.
Changed Block Metadata. The block change tracking file keeps a record of all changes between previous backups, in addition to the modifications since the last backup. The tracking file retains the change history for a maximum of eight backups. If the tracking file contains the change history for eight backups then the Oracle database overwrites the oldest change history information.

Let us take an example of a 500 GB database, with only one thread, and having eight backups kept in the RMAN repository will require a block change tracking file of 20 MB.

((Threads * 2) + number of old backups) * (database size in bytes)
------------------------------------------------------------------ = 20MB
 250000

Incremental Backup

A more automated backup and recovery strategy in the presence of nologging operations leverages RMAN's incremental backup capability Incremental backups have been part of RMAN since it was first released. Incremental backups provide the capability to backup only the changed blocks since the previous backup. Incremental backups of datafiles capture data changes on a block-by-block basis, rather than requiring the backup of all used blocks in a datafile. The resulting backups set are generally smaller and more efficient than full datafile backups, unless every block in the datafile is change.

Oracle Database delivers the ability for faster incrementals with the implementation of the change tracking file feature. When you enable block change tracking, Oracle tracks the physical location of all database changes. RMAN automatically uses the change tracking file to determine which blocks must be read during an incremental backup and directly accesses that block to back it up.

The Incremental Approach

A typical backup and recovery strategy using this approach is to backup the data warehouse every weekend, and then take incremental backups of the data warehouse every night following the completion of the ETL process. Note that incremental backups, like conventional backups, must not be run concurrently with nologging operations. In order to recover the data warehouse, the database backup would be restored, and then each night's incremental backups would be re-applied. Although the nologging operations were not captured in the archive logs, the data from the nologging operations is present in the incremental backups. Moreover, unlike the previous approach, this backup and recovery strategy can be completely managed using RMAN.

The replay ETL approach and the incremental backup approach are both recommended solutions to efficiently and safely backing up and recovering a database which is a workload consisting of many NOLOGGING operations. The most important consideration is that your backup and recovery strategy must take these NOLOGGING operations into account.

In summary, a best practice is to implement Block Change Tracking functionality and make an incremental backup after a direct load that leaves objects unrecoverable due to NOLOGGING operations.

Best Practice E: Not All Tablespaces are Equally Important

While the simplest backup and recovery scenario is to treat every tablespace in the database the same, Oracle provides the flexibility for a DBA to devise a backup and recovery scenario for each tablespace as needed.

Not all of the tablespaces in a data warehouse are equally significant from a backup and recovery perspective. DBAs can leverage this information to devise more efficient backup and recovery strategies when necessary. The basic granularity of backup and recovery is a tablespace, so different tablespaces can potentially have different backup and recovery strategies. On the most basic level, temporary tablespaces never need to be backed up (a rule that RMAN enforces).

Moreover, in some data warehouses, there may be tablespaces that are not explicit temporary tablespaces but are essentially functioning as temporary tablespaces as they are dedicated to scratch space for end-users to store temporary tables and incremental results. Depending upon the business requirements, these tablespaces may not need to backed up and restored. Instead, in the case of a loss of these tablespaces, the end users would re-create their own data objects.

In many data warehouses, some data is more important than other data. For example, the sales data in a data warehouse may be crucial and, in a recovery situation, this data must be online as soon as possible. But, in the same data warehouse, a table storing clickstream data from the corporate Web site may be much less critical. The business may tolerate this data being offline for a few days or may even be able to accommodate the loss of several days of clickstream data in the event of a loss of database files. In this scenario, the tablespaces containing sales data must be backed up often, while the tablespaces containing clickstream data need only to be backed up once every week or two.