This post is based on ‘Performance Tuning in Federated systems‘ presentation in DB2’s Got Talent 2014 at db2nightshow. This was my first presentation on federated systems. I worked closely on the federated systems so I wanted to share my experience with all the db2 community. Since there were too many things to tell I ran very fast while talking but overall I took it very good at the end. Here is an overview of Performance Tuning in Federated Systems.
What is Federation ?
1. Federation allows the users and applications to access the data from more than one RDBMS in the single requests.
2. Its a special type of distributed database management system that allows the distributed request across the multiple databases at the same time.
3. Federation consists of a DB2 instance, a database or federated database and one or more remote databases of same or different type. I work on the federated system with ORACLE as a remote data source in the federation.
How Federation Works:
Application can communicate with Federated server with any supported interface like JDBC or ODBC etc.
The federated server communicates with the data sources by means of software modules called Wrappers. Once the wrapper is installed federated system need to told where the wrapper is located this is done by using the create wrapper statement.
db2 “create wrapper net8”
For the multiple sources of same type only single wrapper is needed. For example multiple oracle data source can use the single wrapper for the communication.
Every remote data source is identified by the federated system by using the create server statement. For multiple oracle data sources we will need multiple create server statement.
db2 “create server ora_server type oracle version 11.1.0 wrapper net8 options (node ‘ORCL’)”
Once the wrapper and server is created the data at the remote data source can be accessed by creating the nicknames for the equivalent table in the remote data source including the remote data source is RDBMS.
Understanding the Federated Query Execution Steps:
This slide I skipped on time because of the time constraints at the show but it is important to understand how exactly the federated query works.
Before understanding the query performance we need to look at the compilation steps included in the Query compilation. The input query is parsed and semantics are checked for the query. Then based on the remote data source information available in the wrappers and using the push down analysis, optimizer re-writes the query and then chose the plan with lowest cost. Then remote SQL generation for the data source happens this is important because filtering or grouping of the data at the remote data source is always critical for the performance rather than to bring whole data to the local server and then performing the filtering operation which causes bottlenecks due to the network constraints.
Once the remote SQL is generated, local data is gathered using the local SQL generation.
Bottlenecks in Federated Query Performance:
Now after understanding the execution steps of the query, it will be easy to pin point the bottlenecks that affect the performance of any federated query. Here are the few commonly observed issues:
This is one of the biggest bottleneck in the performance of a federated query since the data is fetched from the remote data source and is brought to the local federated system.
We can check the round trip time of the database by using the ping command from the federated database.
Federated Server Performance:
Performance of the federated server in terms of the CPU utilization, I/O Utilization, paging space utilizations needs to be checked if it is high can hamper the performance.
Size of the data in remote data source:
Size of the data in the remote data source if it is huge then the time required for the filtering, grouping and bringing it on local will be expensive.
Operators that are not pushed down can also significantly affect query performance:
If operations such as GROUP BY/ ORDER BY can not be performed on the remote data source by any case then we will need to bring whole data on local server and then perform those operation which will be very expensive. This might occur due to the limitations like some functionality is not provided by the remote data source in that case you need to bring whole data set to the federated system over the network.
How to Improve the Performance:
1. Use of MQTs:
Using MQTs for caching the data locally will help to improve the performance of the query to a great extent. It will also ensure the data availability in case if the remote server is down for some reason.
2. Filtering and grouping results at the data source:
As mentioned earlier, Filtering and grouping the data at the remote data source and bringing the required data instead of full data will save the network overhead and will help to improve the performance of the query.
3. Index Information/Statistics:
Updated statistics of the remote objects as well as the local will help the optimizer to choose most effective plan for the execution.
4. Tuning of the sort heap and the bufferpools associated with the temporary tablespace:
Higher the value more is the need to investigate more into the Sorting related setting the federated database. When sort overflows occur, data is written to the temporary tablespaces that are associated with a buffer pool. When the buffer pool fills, data is overflowed to disk. This can result in poor query performance. For this We need to Tune the SORTHEAP and/or SHEAPTHRES configuration parameters to minimize the percent of sort overflows.
Benefits of Using Materialized Query Tables:
A materialized query table (MQT) is a table that materializes the pre-computed result of a query involving one or more tables or nicknames.
Caching data using MQTs and allowing the optimizer to make the cost based decision to use the MQT can be useful to improve the performance of the Query in the federated system. Creating the MQT which involves nicknames makes the remote data locally available which consequently reduces the network cost of the bringing the data from the remote host over the network. If the remote data source from where the data is needed is unavailable in that case also we can execute the query since the data is locally populated using the MQT. Using MQT also reduces the consumption of the CPU and also avoids the overloading of the data on the local federated server.
Tips for Using MQTs:
1. Use of the design advisor (db2advis) to create the MQT that will help the query workload to improve the performance.
db2advis –d <DBNAME> -i <InputFile> –m M –o <OutputFile>
2. When we create MQT, it goes in the check-pending state. Use set integrity statement to bring the MQT out of check-pending state.
set integrity for <MQTNAME> all immediate unchecked;
3. REFRESH IMMEDIATE option will not be available for the MQTs which are referring to the nicknames.
4. Do not create too many MQTs. because it will take more time for the optimizer to choose the MQT from many and will increase the execution time. Sometimes optimizer may not choose the best MQT for the execution.
5. If the MQT is created at the lower isolation level than that of the query’s isolation level, then the isolation level at which MQT is created will be ignored.
Enable MQT to be considered for the optimization:
1. Use CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION Special Register
2. DFT_MTTB_TYPES database configuration parameter.
3. Set the REFRESH AGE special register or setting the DFT_REFRESH_AGE database configuration parameter
There are few more posts on Federation performance that I will post in coming time. I am currently working closely on a project which includes the Federation setup and there are very few documentation I observed on internet. That’s why I feel of publishing as many post as I can to share the knowledge.
Check out the presentation at the following link:
I will recommend that please watch full episode since there are other very good presentations given my other DB2 friends as well. Check out the full episode at the following link of DB2NightShow: