This post is a reference to my presentation in the Grand Finale of DB2's Got Talent 2014 on the DB2Night Show. This was my last presentation of the season, and the show theme was "DB2 Success Story". I selected one of my favourite topics: Federated Database Performance. I have been working on a federated setup for the last 1.5 years. I faced many performance issues in production whose resolutions were difficult, which is why I spent most of my time searching for and learning different ways to improve performance in such distributed systems. I had given a presentation on federated query performance before, on The DB2Night Show #128; this one is an extension of that presentation. You can check out the federation basics and a recap of the Episode #128 presentation at this link.
I discussed how data distributed across multiple databases can be accessed using federated database systems, and I also described the components of federation such as wrappers, servers, and nicknames. Here are a few steps to set up a federated database.
CREATE WRAPPER net8
LIBRARY 'libdb2net8.a';

CREATE SERVER ora_server TYPE oracle
VERSION 11.2.0 WRAPPER net8
OPTIONS (NODE 'ORCL');

CREATE USER MAPPING FOR userName SERVER ora_server
OPTIONS (REMOTE_AUTHID 'username', REMOTE_PASSWORD 'pwd');

CREATE NICKNAME ora_emp FOR ora_server.scott.emp;
I also mentioned in my previous post that using an MQT caches all the data locally, saving the network cost as well as the round-trip time of the query. But MQTs are useful only for data objects that are mostly static. If I have customers' personal information that does not change frequently, I can create an MQT on the DB2 side and access the data quickly. But if I have a balance view that is updated every minute, or a transaction table, then DB2 needs to handle the additional burden of refreshing the MQT. Moreover, for MQTs that refer to federated objects, the refresh operation needs to be performed explicitly, because the AUTOMATIC option is disabled in that case.
So in such cases a remote call to the federated objects becomes mandatory.
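As a sketch of the MQT approach for the static case, assuming a hypothetical nickname ora_customer and MQT name cust_mqt (not from the original post), a deferred-refresh MQT over a nickname might look like this:

```sql
-- Hypothetical MQT over a nickname. REFRESH DEFERRED is used because
-- automatic (immediate) refresh is not available for MQTs that
-- reference federated objects.
CREATE TABLE cust_mqt AS
  (SELECT custid, cust_name, cust_address FROM ora_customer)
  DATA INITIALLY DEFERRED REFRESH DEFERRED;

-- The refresh must be triggered explicitly, e.g. from a scheduled job:
REFRESH TABLE cust_mqt;
```

After the refresh, queries against cust_mqt read local data and avoid the round trip to the remote source.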
The performance of a query that accesses data in remote databases depends largely on one vital mechanism: PUSHDOWN. Almost all the parameters that IBM provides for improving the performance of the SHIP operation depend on this mechanism.
What is the PUSHDOWN mechanism?
Using the pushdown mechanism, the DB2 optimizer pushes as many operations as possible to the remote database server, so that filtering of the data is done at the remote server and only the required data is transferred to the federated server. With pushdown, one can filter the data remotely instead of bringing all the data to the local DB2 server and filtering it there. This improves performance by reducing the network load and the data transfer cost and time.
I came across the SYSCAT.SERVEROPTIONS view and found some interesting parameters that proved to be helpful for our application. SYSCAT.SERVEROPTIONS catalogs the federated server settings for each remote data source; each row specifies a server-specific option and its value. There are a few very important parameters that I studied and mentioned during this presentation.
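To see which options are currently set for a given server, the catalog view can be queried directly (the server name ORA_SERVER here matches the earlier CREATE SERVER example):

```sql
-- List the options set for one remote data source
SELECT servername, option, setting
FROM SYSCAT.SERVEROPTIONS
WHERE servername = 'ORA_SERVER';
```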
The first is the PUSHDOWN server option. Setting this parameter to 'Y' enables filtering of the data at the remote data source: it pushes the filtering and sorting operations to the remote data source, thereby reducing the network load. If this parameter is set to 'N', the DB2 optimizer selects all the data from the remote data objects and the predicates are applied locally, which degrades the performance of the query.
Here is a sample that I tried on my local setup.
Q1: SELECT * FROM ORA_EMPLOYEE
WHERE CUSTOMERID = '053141245124';
Cardinality of the ORA_EMPLOYEE table: 42 million
Original Execution Time: 29 secs
Then I set the PUSHDOWN parameter to 'Y' using an ALTER statement.
ALTER SERVER ora_server
OPTIONS (ADD PUSHDOWN 'Y');
Final Execution Time: < 1 sec
This simple example proves the importance of this parameter in a federated environment.
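One way to confirm that the predicate is actually being pushed down is to explain the statement and look for the filtering inside the remote portion of the plan (below the SHIP operator). A sketch, assuming the explain tables have already been created in the federated database:

```sql
-- Populate the explain tables for the federated query
EXPLAIN PLAN FOR
SELECT * FROM ora_employee WHERE customerid = '053141245124';

-- A SHIP operator in the result indicates the part of the plan
-- that is sent to the remote data source
SELECT operator_id, operator_type
FROM EXPLAIN_OPERATOR
ORDER BY operator_id;
```

With PUSHDOWN set to 'N', the filter shows up above the SHIP operator on the local side instead.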
The next parameter, DB2_MAXIMAL_PUSHDOWN, determines whether to use the execution plan based on cost, or to favour pushing down the maximum number of operations. Setting this parameter to 'Y' chooses the execution plan with the lowest number of SHIP operators.
The decision considers neither the statistical information of the nicknames nor the CPU_RATIO and IO_RATIO settings.
That means this parameter forces the optimizer to use the execution plan that contains the least number of SHIP operators. Most of the time these plans are the optimal execution plans, but there can be exceptions in rare cases.
SET SERVER OPTION DB2_MAXIMAL_PUSHDOWN
TO 'Y' FOR SERVER ora_server;
The following two parameters are not documented by IBM, since they can be problematic in some cases. But in our application these parameters proved to be very helpful and successful; they were actually the key to the success of our federated system. So I did not miss mentioning them in my presentation.
During the initial stage of the project we had problems with specific queries that have sub-queries in their SELECT list. Consider the following sample query.
SELECT emp_name, (SELECT job_desc FROM job
WHERE emp_no = 123) FROM ora_emp;
In such cases, if a remote object is referred to in the sub-query, pushdown was avoided by the optimizer and the performance of the query suffered. We have many such queries in our Actuate report scenarios, where we had a serious performance bottleneck. When we contacted IBM Labs, they suggested setting this parameter to 'Y', which allows the pushdown of such queries.
SET SERVER OPTION DB2_SELECT_SCALAR_SQ TO 'Y'
FOR SERVER ora_server;
NOTE: One needs to be careful while using this parameter, since it can be dangerous: it might send unsupported SQL to the remote source for push-down, causing a FAILURE.
When we need to use a scalar function that is created in the remote data source, we cannot use it directly. We need to create a function template in the local database and then map this template to the definition of the function in the remote data source. Here are the commands for creating the function template and mapping it to the remote function.
Creating the template for remote functions with no compatible counterpart
CREATE FUNCTION localFunc_tmpl(INTEGER)
RETURNS INTEGER AS TEMPLATE
DETERMINISTIC NO EXTERNAL ACTION;
Mapping the template with remote function
CREATE FUNCTION MAPPING func_mapping
FOR localFunc_tmpl SERVER ORACLE_SERVER
OPTIONS (REMOTE_NAME 'RemoteFuncName');
The parameter DB2_SCALARFUNC_SCALAR_SQ tells the optimizer that a sub-query is allowed as a parameter to such a function. After creating the function template and mapping it to the remote function, we can use that function in our queries. But if we need to pass a parameter in the form of a sub-query, we need to set this parameter to 'Y' so that the operation does not cause a FAILURE. Consider the following sample query.
SELECT localFunc_tmpl((SELECT empno FROM ora_employee WHERE employee_name = 'PRASAD'));
To make such queries work, we need to set DB2_SCALARFUNC_SCALAR_SQ to 'Y'.
ALTER SERVER ORACLE_SERVER
OPTIONS (ADD DB2_SCALARFUNC_SCALAR_SQ 'Y');
I presented these four important parameters in a short span of 5 minutes. This was my last presentation at DB2's Got Talent this season, and it concluded my amazing and memorable journey on The DB2Night Show with one of my favourite topics. I think my last presentation was one of the best of all the presentations I gave throughout this season, because I enjoyed studying these parameters, trying them out on my local setup, and presenting them in my style 😉
I was pleased with the comments that Ember gave on her blog, db2commerce.com, about my presentation. After that I was sure it was a very good presentation.
I shared my presentation on SlideShare. Go and check out my Success Story.
Check out some of the best DB2 Success Stories from my friends Saurabh, Ken and Michael at this link.