Oracle Parallel ProcessingBy Tushar Mahapatra & Sanjay Mishra
1st Edition August 2000
1-56592-701-X, Order Number: 701X
400 pages, $32.95
Chapter 1In this chapter:
About Parallel Processing
Parallel Processing for Databases
Parallel Processing in Oracle
Overhead for Oracle Parallel Processing
Requirements for Oracle's Parallel Features
Parallel processing is becoming increasingly important in the world of database computing. These days, databases often grow to enormous sizes and are accessed by larger and larger numbers of users. This growth strains the ability of single-processor and single-computer systems to handle the load. More and more organizations are turning to parallel processing technologies to give them the performance, scalability, and reliability they need. Oracle Corporation is a leader in providing parallel processing technologies in a wide range of products. This chapter provides an overview of parallel processing in general and also describes how parallel processing features are implemented in an Oracle environment.
About Parallel Processing
Parallel processing involves taking a large task, dividing it into several smaller tasks, and then working on each of those smaller tasks simultaneously. The goal of this divide-and-conquer approach is to complete the larger task in less time than it would have taken to do it in one large chunk.
Your local grocery store provides a good, real-life analogy to parallel processing. Your grocer must collect money from customers for the groceries they purchase. He could install just one checkout stand, with one cash register, and force everyone to go through the same line. However, the line would move slowly, people would get fidgety, and some would go elsewhere to shop. To speed up the process, your grocer doubtless uses several checkout stands, each with a cash register of its own. This is parallel processing at work. Instead of checking out one customer at a time, your grocer can now handle several at a time.
In our grocery store analogy, parallel processing required several checkout stands, each with its own cash register. Without trying to push the analogy too far, think of each checkout stand as a computer and each cash register as a processor. In a computing environment, the multiple processors in a parallel processing system may all reside on the same computer, or they may be spread across separate computers. When they are spread across separate computers, each computer is referred to as a node.
There are a few basic requirements of parallel computing:
- Computer hardware that is designed to work with multiple processors and that provides a means of communication between those processors
- An operating system that is capable of managing multiple processors
- Application software that is capable of breaking large tasks into multiple smaller tasks that can be performed in parallel
Weather forecasting provides another real-life example of parallel processing at work. Satellites used for weather forecasting collect millions of bytes of data per second on the condition of earth's atmosphere, formation of clouds, wind intensity and direction, temperature, and so on. This huge amount of data has to be processed by complex algorithms to arrive at a proper forecast. Thousands of iterations of computation may be needed to interpret this environmental data. Parallel computers are used to perform these computations in a timely manner so a weather forecast can be generated early enough for it to be useful.
Why Parallel Processing?
Why do you need parallel processing? Why not just buy a faster computer? The answers to these questions lie largely in the laws of physics.
Computers were invented to solve problems faster than a human being could. Since day one, people have wanted computers to do more and to do it faster. Vendors responded with improved circuitry design for the processor, improved instruction sets, and improved algorithms to meet the demand for faster response time. Advances in engineering made it possible to add more logic circuits to processors. Processor circuit designs developed from small-scale to medium-scale integration, and then to large-scale and very large-scale integration. Some of today's processors have billions of transistors in them. The clock cycle of processors has also been reduced over the years. Some of today's processors have a clock cycle on the order of nanoseconds, and CPU frequencies have crossed the one-gigahertz barrier. All of these advances have led to processors that can do more work faster than ever before.
However, there are physical limitations on this trend of constant improvement. The processing speed of processors depends on the transmission speed of information between the electronic components within the processor. This speed, in turn, is limited by the speed of light, which is 300 mm per nanosecond. But to achieve the speed of light, optical communication methods would have to be used within processors. Therefore, the speed of processors cannot be increased beyond a certain point. Another limiting factor is that the density of the transistors within a processor can be pushed only to a certain limit. Beyond that limit, the transistors create electromagnetic interference for one another.
As improvements in clock cycle and circuitry design reached an optimum level, hardware designers looked for other alternatives to increase performance. Parallelism is the result of those efforts. Parallelism enables multiple processors to work simultaneously on several parts of a task in order to complete it faster than could be done otherwise.
Do You Need Parallel Processing?
Parallel processing not only increases processing power, it also offers several other advantages when it's implemented properly. These advantages are:
- Higher throughput
- More fault tolerance
- Better price/performance
There are hundreds of applications today that benefit from these advantages.
But parallelism is not the answer for everything. There are some added costs associated with parallelism. Synchronization between parts of a program executed by different processors is an overhead of parallelism that needs to be managed and kept at a minimum. Also, administering a parallel computing environment is more complicated than administering a serial environment.
Applications that already run satisfactorily in a serial environment may not benefit from a switch to a parallel processing environment. In addition, not all problems are amenable to parallel solutions. Unless your application is capable of decomposing large tasks into multiple smaller, parallelizable tasks, parallel processing will be of no benefit.
Parallel processing is useful for only those applications that can break larger tasks into smaller parallel tasks and that can manage the synchronization between those tasks. In addition, there must be a performance gain large enough to justify the overhead of parallelism.
Parallel Hardware Architectures
The subject of parallel computing has attracted attention from scientists and engineers, as well as from commercial vendors. Over the years, several commercially successful parallel hardware platforms have been developed. The most common of these are listed here, and are described in greater detail in Chapter 2, Architectures for Parallel Processing.
- Symmetric Multiprocessing systems
- Symmetric Multiprocessing (SMP) systems have multiple CPUs. The number usually varies from 2 to 64. All of the CPUs in an SMP machine share the same memory, the system bus, and the I/O system. A single copy of the operating system controls all of the CPUs.
- Massively Parallel Processing systems
- Massively Parallel Processing (MPP) systems consist of several nodes connected together. Each node has its own CPU, memory, bus, disks, and I/O system. Each node runs its own copy of the operating system. The number of nodes in an MPP system can vary from two all the way to several thousand.
- Clustered systems
- A clustered system consists of several nodes loosely coupled using local area network (LAN) interconnection technology. Each of these nodes can be a single-processor machine or SMP machine. In a cluster, system software balances the workload among the nodes and provides for high availability.
- Non Uniform Memory Access systems
- Non Uniform Memory Access (NUMA) systems consist of several SMP systems that are interconnected in order to form a larger system. All of the memory in all of the SMP systems are connected together to form a single large memory space. NUMA systems run one copy of the operating system across all nodes.
Parallel Processing for Databases
Three issues are driving the increasing use of parallel processing in database environments:
- The need for increased speed or performance
- Database sizes are increasing, queries are becoming more complex--especially in data warehouse systems--and the database software must somehow cope with the increasing demands that result from this complexity.
- The need for scalability
- This requirement goes hand-in-hand with performance. Databases often grow rapidly, and companies need a way to easily and cost-effectively scale their systems to match that growth.
- The need for high availability
- High availability refers to the need to keep a database up and running with minimal or no downtime. With the increasing use of the Internet, companies need to accommodate users at all hours of the day and night.
Database sizes have been increasing steadily, and it's now quite common to find data warehouses holding several hundred gigabytes of data. Some databases, referred to as Very Large Databases (VLDBs), even hold several terabytes of data. Complex queries are run on these data warehouses to gather business intelligence and to aid in decision making. Such queries require a lot of processing time to execute. By executing these queries in parallel, you can reduce the elapsed time while still providing the required processor time.
Speedup is defined as the ratio between the runtime with one processor and the runtime using multiple processors. It measures the performance improvement gained using multiple processors instead of a single processor and is calculated using the following formula:
Speedup = Time1 / Timem
Time1 is the time it takes to execute a task using only one processor, while Timem is the time it takes to execute that same task using m processors.
Figure 1-1 shows a query that takes four minutes to complete using one processor, but that takes only one minute to complete using four processors.
Figure 1-1. Parallel processing speedup
Plugging the values from Figure 1-1 into the speedup formula yields the following results:
Speedup = 4 / 1
Speedup = 4.0
In this case, the speedup is 4. Multiplying the number of processors by 4 caused the query to finish in one-fourth the time. Unfortunately, such an ideal result is seldom achieved in real life.
In an ideal world, the parallel processing speedup would track with the number of processors used for any given task. In other words, the ideal speedup curve is a 45-degree line like the one you see in Figure 1-2.
Figure 1-2. Ideal speedup curve
The ideal speedup curve is rarely reached because parallelism entails a certain amount of overhead. The inherent parallelism of the application also plays an important role in the amount of speedup you can achieve. Some tasks are easily divided into parts that can be processed in parallel. The join of two large tables, for example, can be done in parallel. Other tasks, however, cannot be divided. A nonpartitioned index scan is one such example. If an application has little or no inherent parallelism, then little or no speedup will be achieved.
Efficiency is the speedup divided by the number of processors used. In our example, the number of processors is 4, and the speedup achieved is also 4. The efficiency then is 100%, which represents an ideal case.
Scalability is the ability to maintain performance levels as the workload increases by incrementally adding more system capacity (adding more processors and disks). On a single-processor system, it is very difficult to achieve scalability beyond a certain point. Parallel systems provide better scalability.
Parallel systems improve scalability
In many applications, the number of database users and the transaction volume are both likely to increase over time. The demand for added processing power to handle the increased load, without the loss of response time, can be met by using parallel systems. For example, to handle a higher load, an SMP system with four processors can be augmented to eight processors with additional memory and disk capacity.
In situations in which several thousand users are expected to use a database, the processing and memory requirements may be beyond the capacity of a single SMP system. In such a situation, you'll want to consider using parallel database systems. These allow you to have several nodes, each with its own copy of the database server software and memory structures, working together on a single, shared database. Such parallel systems can be clusters or MPP systems. As the user population grows, the number of nodes in the cluster or MPP system may be increased as needed to handle the additional load.
Scaleup is the ability of an application to retain response time as the job size or the transaction volume increases by adding additional processors and disks. The term scalability often is used in reference to scaleup.
In database applications, scaleup can be either batch or transactional. With batch scaleup, larger batch jobs can be supported without a loss of response time. With transaction scaleup, larger numbers of transactions can be supported without loss of response time. In both cases, response time is maintained by the addition of more processors. For example, a 4-processor system can provide the same response time with a workload of 400 transactions per minute as the response time of a single-processor system that supports a workload of 100 transactions per minute.
There is a key difference between scaleup and speedup: when calculating speedup, the problem size is kept fixed, whereas scaleup is calculated by increasing the problem size or transaction volume. Scaleup is measured in terms of how much the transaction volume can be increased by adding more processors while still maintaining a constant response time. Scaleup is calculated using the following formula:
Scaleup = Volumem / Volume1
Volumem is the transaction volume executed in a given amount of time using m processors, while Volume1 is the transaction volume executed in the same time using one processor. For our previous example:
Scaleup = 400 / 100
Scaleup = 4
this scaleup of 4 is achieved with 4 processors. This is an example of ideal (linear) scaleup. Figure 1-3 shows a graph of an ideal transaction scaleup. Notice that the response time remains constant by increasing the number of processors as the transaction volume is increased.
The curve, or flat line, really, in Figure 1-3 represents an ideal. In reality, after a certain point, the response time increases for higher transaction volumes even if additional processors are added.
Figure 1-3. Ideal scaleup curve
Databases are used in the mission-critical applications in organizations such as stock exchanges, banks, and airlines. Many database applications are expected to be available 24 hours a day, 7 days a week, all year round. The availability of such databases (along with other system components) is crucial to the success of these organizations. With an e-commerce application, for example, customers may log in any time of the day or night to request products and services. Database downtime causes loss of revenue and customer dissatisfaction. As the negative impact of database downtime has increased for many applications, high availability requirements have become an important factor in the design of database software.
Running parallel databases on a multinode parallel system is one way to provide high availability. Other high availability options include maintaining standby databases and replicated databases. With a parallel database, when one node goes down, it affects only the subset of users connected to the failed node; moreover, users of the failed node still can access the database after switching to one of the surviving nodes.
Economics is another driver toward parallel computing. It costs money to make processors faster. After a certain limit, increasing the processing power on a single CPU system becomes technically very difficult. Once that limit has been reached, SMP systems often provide better performance for the price. Likewise, when the scalability limit of SMP systems is reached, clusters or MPP systems may provide better price/performance ratios.
Types of Parallelism in Databases
Database applications can exploit two types of parallelism in a parallel computing environment: inter-query parallelism and intra-query parallelism. While inter-query parallelism has been around for many years, database vendors recently have started to implement intra-query parallelism as well.
Inter-query parallelism is the ability to use multiple processors to execute several independent queries simultaneously. Figure 1-4 illustrates inter-query parallelism, showing how three independent queries can be performed simultaneously by three processors. Inter-query parallelism does not provide speedup, because each query is still executed by only one processor.
In online transaction processing (OLTP) applications, each query is independent and takes a relatively short time to execute. As the number of OLTP users increases, more queries are generated. Without inter-query parallelism, all queries will be performed by a single processor in a time-shared manner. This slows down response time. With inter-query parallelism, queries generated by OLTP users can be distributed over multiple processors. Since the queries are performed simultaneously by multiple processors, response time remains satisfactory.
Figure 1-4. Inter-query parallelism
Intra-query parallelism is the ability to break a single query into subtasks and to execute those subtasks in parallel using a different processor for each. The result is a decrease in the overall elapsed time needed to execute a single query. Intra-query parallelism is very beneficial in decision support system (DSS) applications, which often have complex, long-running queries. As DSS systems have become more widely used, database vendors have been increasing their support for intra-query parallelism.
Figure 1-5 shows how one large query may be decomposed into two subtasks, which then are executed simultaneously using two processors. The results of the subtasks then are merged to generate a result for the original query. Intra-query parallelism is useful not only with queries, but also with other tasks such as data loading, index creation, and so on. Chapter 3, Parallel Execution Concepts, and Chapter 4, Using Parallel Execution, provide detailed information on Oracle's support of intra-query parallelism.
Figure 1-5. Intra-query parallelism
Parallel Processing in Oracle
Most modern commercial DBMS products have implemented parallel features, and Oracle is no exception. With every release, Oracle has consistently improved its support for parallel processing. Oracle's support for parallel processing can be divided into the following two specific feature sets:
- Parallel execution
- Refers to intra-query parallelism
- Parallel server
- Refers to the use of multiple instances to open a single, shared database
As you delve into Oracle's parallel feature set, you'll encounter several very similar terms that all begin with the word "parallel." Read through the following definitions; they will help you understand these terms before you read further:
- Oracle Parallel Server
- Refers to the Oracle option allowing you to share one database among multiple instances. Typically, these instances will be running on separate nodes of a clustered system.
- Parallel Server
- The same as Oracle Parallel Server.
- Parallel execution
- Refers to Oracle's ability to apply multiple CPUs to the task of executing a single SQL statement in order to complete execution faster than would be possible using only a single CPU.
- Parallel SQL
- Means the same thing as parallel execution.
- Parallel Query
- Refers to Oracle's ability to execute SELECT statements in parallel, using multiple CPUs. When parallel features first were introduced into Oracle years ago, the only support was for parallel SELECT statements, and at that point the feature was known as Parallel Query and was available through the Parallel Query Option (PQO). Now, Parallel Query is only a subset of Oracle's parallel execution features.
- Parallel DML
- Refers to Oracle's ability to execute Data Manipulation Language (DML) statements in parallel. Parallel DML is a subset of Oracle's parallel execution feature set.
- Parallel DDL
- Refers to Oracle's ability to execute some Data Definition Language (DDL) statements in parallel. Parallel DDL is a subset of Oracle's parallel execution feature set.
Oracle's parallel execution features enable Oracle to divide a task among multiple processes in order to complete the task faster. This allows Oracle to take advantage of multiple CPUs on a machine. The parallel processes acting on behalf of a single task are called parallel slave processes. Parallel execution features first were introduced in Oracle Version 7.1 in the form of the Parallel Query Option, which supported only parallel SELECT statements. Since then many new functions have been added. In Oracle7, support for parallel execution was a separately installed option. However, in Oracle8 and Oracle8i, it is embedded into the Oracle RDBMS product.
Let's look at a simple example that illustrates how parallel execution works in Oracle. The following SQL statement counts the number of orders in the orders table:
SQL> SELECT COUNT(*) FROM orders;
When you execute this statement serially--that is, without using any parallel execution features--a single process scans the orders table and counts the number of rows. However, if you had a four-processor machine and used Oracle's parallel execution features, the orders table would be split into four parts. A process would be started on each CPU, and the four parts of the table would be scanned simultaneously. The results of each of the four processes then would be merged to arrive at the total count. Figure 1-6 illustrates this situation.
Figure 1-6. Serial and parallel scans of a table
Oracle's parallel execution support extends far beyond simply executing SELECT statements in parallel. The full range of features includes all of the following:
- Parallel Query
- Large queries (SELECT statements) can be split into smaller tasks and executed in parallel by multiple slave processes in order to reduce the overall elapsed time. The task of scanning a large table, for example, can be performed in parallel by multiple slave processes. Each process scans a part of the table, and the results are merged together at the end. Oracle's parallel query feature can significantly improve the performance of large queries and is very useful in decision support applications, as well as in other environments with large reporting requirements.
- Parallel DML
- In addition to SELECT statements, Oracle can execute DML operations such as INSERT, UPDATE, and DELETE in parallel. There are some restrictions on this capability, however. UPDATE and DELETE operations can be parallelized only on partitioned tables. INSERT INTO . . . SELECT . . . FROM statements can be parallelized for nonpartitioned as well as partitioned tables. Parallel DML is particularly advantageous in data warehouse environments that maintain summary and historical tables. The time needed to rebuild or otherwise maintain these tables is reduced because the work can be done in parallel. Parallel DML also is useful in OLTP systems to improve the performance of long-running batch jobs.
- Parallel DDL
- Oracle now has the ability to parallelize table and index creation. Statements such as the following can be parallelized:
CREATE TABLE...AS SELECT...FROM
ALTER INDEX REBUILD
- Data warehouse applications frequently require summary and temporary tables to be built, and the parallel object creation feature can be very useful in performing such tasks. OLTP applications can use this feature to rebuild indexes at regular intervals in order to keep those indexes efficient.
- Parallel data loading
- Bulk data loading can be parallelized by splitting the input data into multiple files and running multiple SQL*Loader sessions simultaneously to load data into a table. Loading large amounts of data in bulk is a necessary requirement of all data warehouse applications, and Oracle's parallel loading feature can greatly reduce the time needed to load that data.
- Parallel recovery
- Oracle's parallel recovery feature can reduce the time needed for instance and media recovery. With parallel recovery, multiple parallel slave processes will be used to perform recovery operations. The system monitor (SMON) background process reads the redo log files, and the parallel slave processes apply the changes to the datafiles. Recovery of a large database takes a significant amount of time, and parallel recovery can be used to reduce that time.
- Parallel replication propagation
- If you are using replication to maintain copies of database objects in multiple databases, you can use parallel propagation to update those copies efficiently. Changes made in one database can be propagated to another database using multiple slave processes to speed up the propagation.
Oracle Parallel Server
Oracle Parallel Server (OPS) enables one database to be mounted and opened concurrently by multiple instances. Each OPS instance is like any standalone Oracle instance and runs on a separate node having its own CPU and memory. The database resides on a disk subsystem shared by all nodes. OPS takes parallelism to a higher plane by allowing you to spread work not only over multiple CPUs, but also over multiple nodes. OPS offers many more advantages that we'll explain later in this section.
Figure 1-7 illustrates an OPS database comprised of two nodes. Each node runs one Oracle instance. Each instance has its own set of background processes and its own System Global Area (SGA). Both of the instances mount and open a database residing on a shared disk subsystem.
Figure 1-7. Oracle Parallel Server allows many instances to operate on a single database
Oracle Parallel Server is a separately installable option available with the Oracle RDBMS software. If you are planning to run a parallel server database, you need to install this option, along with the Oracle RDBMS software, on a system with a shared disk architecture. In addition, even after installation, you need to enable the parallel server option. Chapter 6, Oracle Parallel Server Architecture, and Chapter 7, Administering an OPS Database, cover this subject in detail.
Because a parallel server system is a multi-instance configuration, such a system provides some distinct advantages over a single-instance (often referred to as a standalone-instance) system. These advantages include:
- High availability
- Better scalability
- Load balancing
When one instance goes down, other instances continue functioning. This increases availability, because the failure of one instance does not result in the database's becoming unavailable to users. Users connected to other instances continue working without disruption. Users connected to the instance that failed can reconnect to any of the surviving instances.
If you get more users on an OPS system than can be handled by the existing nodes, you can add another node easily and start an additional instance on that node. Doing so results in better scalability than you typically would get with a single-instance system. Oracle also has features that allow you to balance the workload among the instances of an OPS database, enabling you to optimize the load on each node.
Managing an OPS database is a much more complex task than managing a stand-alone, single-instance database. Not only do you have the complexity of dealing with multiple instances, you also have several performance issues that come about as a result. , discusses the unique aspects of managing Oracle Parallel Server.
Parallel execution and the Oracle Parallel Server Option are two separate Oracle features that can work independently as well as together. Parallel execution features can be utilized on databases with or without the Oracle Parallel Server Option. Without OPS, the parallel slave processes on behalf of a task run on only one instance, whereas with OPS the parallel slave processes on behalf of a task run on one or more instances.
Overhead for Oracle Parallel Processing
As you might imagine, the benefits of parallel execution and Oracle Parallel Server do not come without a price. The next two sections discuss the various overhead issues that apply to parallel execution and Oracle Parallel Server.
Parallel Execution Overhead
Parallel execution entails a cost in terms of the processing overhead necessary to break up a task into pieces, manage the execution of each of those pieces, and combine the results when the execution is complete. Figure 1-8 illustrates some of the steps involved in parallel execution.
Figure 1-8. Steps in parallel execution
Parallel execution overhead can be divided into three areas: startup cost, interference, and skew.
Startup cost refers to the time it takes to start parallel execution of a query or a DML statement. It takes time and resources to divide one large task into smaller subtasks that can be run in parallel. Time also is required to create the processes needed to execute those subtasks and to assign each subtask to one of these processes. For a large query, this startup time may not be significant in terms of the overall time required to execute the query. For a small query, however, the startup time may end up being a significant portion of the total time.
Interference refers to the slowdown that one process imposes on other processes when accessing shared resources. While the slowdown resulting from one process is small, with a large number of processors the impact can be substantial.
Skew refers to the variance in execution time of parallel subtasks. As the number of parallel subtasks increases (perhaps as a result of using more processors), the amount of work performed by each subtask decreases. The result is a reduction in processing time required for each of those subtasks and also for the overall task. There are always variations, however, in the size of these subtasks. In some situations these variations may lead to large differences in execution time between the various subtasks. The net effect, when this happens, is that the processing time of the overall task becomes equivalent to that of the longest subtask.
Let's suppose you have a query that takes ten minutes to execute without parallel processing. Let's further suppose when parallel processing is used that the query is broken down into ten subtasks and that the average processing time of each subtask is one minute. In this perfect situation, the overall query is also completed in one minute, resulting in a speedup of ten. However, if one subtask takes two minutes, the response of the overall query becomes two minutes. The speedup achieved is then only five. The skew, or the variation in execution time between subtasks, has reduced the efficiency of this particular parallel operation.
Oracle Parallel Server Overhead
When Oracle Parallel Server is used, multiple instances need to synchronize access to the objects in the shared database. Synchronization is achieved by passing messages back and forth between the OPS nodes. A Lock Manager called Integrated Distributed Lock Manager (IDLM) facilitates this synchronization. The amount of synchronization required depends on the data access requirements of the particular applications running on each of the OPS nodes. If multiple OPS nodes are trying to access the same object, then there will be a lot of conflict. This increases the need for synchronization, resulting in a high amount of overhead. An OPS database needs to be designed and tuned to minimize this overhead. Otherwise, the required synchronization overhead may reduce, or even negate, the benefits of using OPS in the first place.
Oracle's parallel execution features also may be used, and often are used, in an OPS environment. If you do use Oracle's parallel execution features in an OPS environment, then all the issues discussed in the previous section on parallel execution overhead also apply.
Requirements for Oracle's Parallel Features
In order to use Oracle's parallel processing features for your database applications, you must:
- Have both suitable hardware and operating system software that supports parallel processing
- Configure Oracle Server appropriately
- Modify your database applications to get the benefit of parallel processing
Table 1-1 briefly summarizes the requirements for using Oracle's parallel execution features and for using Oracle Parallel Server.
Table 1-1: Requirements for Using Oracle's Parallel Features
Type of Requirement
Oracle Parallel Server
Multiple processors (SMP, MPP, or NUMA).
Multiple nodes (cluster, MPP, or NUMA cluster) shared disk architecture.
Operating system software
Operating system must support multiple
Operating system must support resource management and communication across nodes. The operating system also must support a disk-sharing configuration such as the use of raw devices on Unix and NT platforms.
Configure initialization parameters. If using Oracle7, install the Oracle Parallel Query Option.
Purchase and install the Oracle Parallel Server Option. Configure initialization parameters.
Minor changes to SQL statements.
The following sections discuss some of the Table 1-1 requirements in greater detail.
Parallel Execution Requirements
Requirements for implementing parallel execution in Oracle are relatively straightforward. You need a computer with multiple processors, and you need to make relatively minor changes to Oracle initialization parameters and to your database application for parallel execution.
Parallel execution is not beneficial with a single-processor machine. In order to take advantage of Oracle's parallel execution features, you need to be running in an environment that supports multiple CPUs. This could be an SMP system, an MPP system, or a clustered system. In MPP systems and clusters, parallel tasks are run on different nodes. Sometimes, each node of the clusters or MPP systems may be another SMP system.
Not only does your hardware need to support multiple CPUs, but your operating system also needs to be capable of managing multiple processors and be able to share common resources (memory, I/O, and system bus) among multiple processors. For example, several operating systems such as various flavors of Unix (AIX, Sun Solaris, HPUX, etc.), Windows NT, and VAX/VMS support multiprocessing.
Oracle DBMS requirements
In Oracle7, the parallel execution features were a separately installed option known as the Oracle Parallel Query Option. In Oracle8, these features are now part of the core database product. To enable them, all you need to do is to properly set some initialization parameters.
Setting up parallel execution in database applications is simple compared to scientific and other applications. In scientific applications, special algorithms are required to take advantage of parallel processing. Parallel compiling also is needed to compile those algorithms.
In contrast, once you've enabled the parallel execution features in Oracle, only minimal application modifications are necessary to take advantage of them. The number of parallel processes used to execute a query is referred to as the degree of parallelism. Minor changes may be required to the SQL statements in your application in order to specify the degree of parallelism to be used for each. The rest of the work required for parallel execution is handled transparently by the database server software. Oracle Server automatically takes care of dividing the task into smaller units, assigning work to multiple slave processes, and combining the results of individual slave processes to make the final result.
Oracle Parallel Server Requirements
Setting up Oracle Parallel Server is somewhat more complex than setting up parallel execution in an SMP environment. The installation process for Oracle Parallel Server is platform-specific, so be sure to check your platform-specific documentation before attempting an OPS install. Once you have the software installed, read Chapter 6 for a detailed explanation of the OPS architecture, and read Chapter 7 for detailed information on the steps involved in creating and managing an OPS database.
An Oracle Parallel Server runs on shared disk architectures where multiple nodes share common databases that are accessible from all nodes. Oracle Parallel Server can run on clusters and MPP systems but cannot be used on an SMP system, because SMP systems do not have more than one node.
System software requirements
To meet the system software requirements for parallel server execution, you must have a copy of the operating system on each node, and you must have system software to manage resources and communication among nodes. This management component often is referred to as a cluster manager. The system software also should be capable of making disks sharable among all the nodes. For example, on IBM RISC/6000 clusters, this component is referred to as HACMP (High Availability Clusters Multiprocessing).
Oracle DBMS requirements
Oracle Parallel Server is a separate option that has to be purchased and installed on all the nodes. Setting up and configuring the OPS option is more involved than setting up and configuring a standalone Oracle instance. Several additional initialization parameters have to be configured for OPS. Also, in order to synchronize database activity across multiple database instances, you will need to configure a new kind of database lock, referred to as an instances lock. Installation and configuration of OPS is quite involved, and you will find that it requires additional planning and careful analysis.
You can use the following query to check to see if the Oracle Parallel Server Option has been installed in your database. If the OPS option is installed, the VALUE column will return TRUES; otherwise, the VALUE column will return FALSE:
SQL> SELECT * FROM V$OPTION WHERE PARAMETER = 'Parallel Server';
Parallel Server TRUE
In Oracle Parallel Server, database applications run on more than one instance but access the same physical database. As we have mentioned before, OPS overhead increases if database applications running on multiple instances access the same set of database tables. Thus, in designing a database application for OPS, you need to give additional consideration to the application design to ensure that applications running on multiple database instances access different set of database tables. This process, referred to as application partitioning, has a huge impact on the performance of an OPS database.
Back to: Oracle Parallel Processing
© 2001, O'Reilly & Associates, Inc.