SQL Query Optimization and Performance Tuning
We often write many different SQL statements that retrieve the same result from the database. Among those alternatives, we should select the one that performs best; in other words, we need to tune and optimize our queries to improve performance when fetching results from the database.
When optimizing your database server, you need to tune the performance of individual queries. This is as important as – perhaps even more important than – tuning other aspects of your server installation that affect performance, such as hardware and software configurations.
Even if your database server runs on the most powerful hardware available, its performance can be dragged down by a handful of misbehaving queries. In fact, even one bad query can cause serious performance issues for your database.
The overall performance of your database can be greatly improved by proper table design and by tuning the most expensive or most frequently executed queries. In this blog, we will look at some of the techniques we can use to identify and tune the most expensive and worst-performing queries:
- Designing tables
- Optimizing queries
- Analysing execution plans
Designing tables
- Each table must have a primary key, which helps in retrieving, updating, and deleting rows.
- The primary key column should be the first column of the table.
- Normalize your tables to the third normal form.
A table is in third normal form (3NF) if it is in second normal form (2NF) and contains no transitive dependencies. Normalization is the process of reducing redundant data in the database: with less data there is less work for SQL Server to perform, which speeds up performance.
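As an illustrative sketch (the table and column names are hypothetical), a supplier table that repeats city and state text in every row can be split to remove the transitive dependency SupplierID → CityID → StateName:

```sql
-- Before: StateName depends on the city, not directly on the key
-- (a transitive dependency, so the table is not in 3NF):
-- CREATE TABLE Supplier (SupplierID INT PRIMARY KEY, Name NVARCHAR(100),
--                        City NVARCHAR(50), StateName NVARCHAR(50));

-- After: 3NF -- the city/state relationship moves to its own table
CREATE TABLE City (
    CityID    INT PRIMARY KEY,
    CityName  NVARCHAR(50),
    StateName NVARCHAR(50)
);

CREATE TABLE Supplier (
    SupplierID INT PRIMARY KEY,
    Name       NVARCHAR(100),
    CityID     INT REFERENCES City(CityID)  -- foreign key replaces repeated text
);
```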
- Consider denormalizing tables that are in fourth or fifth normal form back to third normal form.
Normalizing up to the fourth and fifth normal forms can degrade performance, because queries then need to perform more joins across several tables. In such cases it may be necessary to denormalize your tables to prevent this degradation.
- Consider horizontal partitioning of very large tables into current and archive versions.
A smaller table uses less space, so SQL Server has less work to do when evaluating your queries against it.
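A minimal sketch of the idea, using an assumed Orders schema and an assumed cut-off date; rows older than the cut-off are periodically moved to the archive table so that everyday queries touch only the small current table:

```sql
-- Current and archive versions of the same (assumed) table
CREATE TABLE OrdersCurrent (OrderID INT PRIMARY KEY, OrderDate DATETIME, Amount MONEY);
CREATE TABLE OrdersArchive (OrderID INT PRIMARY KEY, OrderDate DATETIME, Amount MONEY);

-- Periodically move old rows out of the current table
INSERT INTO OrdersArchive
SELECT * FROM OrdersCurrent WHERE OrderDate < '2020-01-01';

DELETE FROM OrdersCurrent WHERE OrderDate < '2020-01-01';
```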
- Try to reduce the number of columns in a table.
The fewer the number of columns in a table, the less space the table will use, since more rows will fit on a single data page, and less I/O overhead will be required to access the table’s data.
- Try to use constraints instead of triggers, rules, and defaults whenever possible.
Constraints are much more efficient than triggers and can improve performance. Compared to triggers, constraints are also more consistent and reliable, because you can make mistakes when writing your own trigger code to perform the same checks.
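For example (hypothetical table and rules), a CHECK constraint enforces a business rule declaratively, with no trigger code to write or maintain:

```sql
-- CHECK constraints instead of AFTER INSERT/UPDATE triggers that validate values
CREATE TABLE OfferTable (
    OfferID INT PRIMARY KEY,
    Status  TINYINT NOT NULL
        CONSTRAINT CK_OfferTable_Status CHECK (Status IN (0, 1, 2)),
    Amount  SMALLMONEY NOT NULL
        CONSTRAINT CK_OfferTable_Amount CHECK (Amount >= 0)
);
```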
- If you have to store large binary objects, use the file system instead of a blob data type.
If you store large binary objects in a table, the server consumes significant resources reading its rows while scanning data. It is better to store such objects in the file system. If they must be stored within the database, put them in a separate table on which no data scanning is performed, so they do not impact server performance.
- SQL Server provides various data types for storing different kinds of data. Using the proper data type for each value really improves the performance of the database server.
For integer data from 0 through 255, use the tinyint data type (one byte).
For integer data from -32,768 through 32,767, use the smallint data type (two bytes).
For integer data from -2,147,483,648 through 2,147,483,647, use the int data type (four bytes).
Use the smallmoney data type instead of money if you need to store monetary values from -214,748.3648 through 214,748.3647 (four bytes).
Use the smalldatetime data type instead of datetime if you need to store dates and times from January 1, 1900 through June 6, 2079, with accuracy to the minute (four bytes).
Use varchar/nvarchar columns instead of text or ntext columns whenever possible.
Because SQL Server stores text and ntext columns on separate Text/Image pages, away from the other data stored on the Data pages, it can take more time to retrieve text or ntext values.
Use char/varchar columns instead of nchar/nvarchar if you do not need to store Unicode data.
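Putting these choices together, a right-sized table might be sketched as follows (the table and columns are hypothetical):

```sql
CREATE TABLE SupplierOrder (
    OrderID    INT PRIMARY KEY,        -- up to ~2.1 billion values: int, 4 bytes
    Quantity   TINYINT NOT NULL,       -- 0-255 is enough: tinyint, 1 byte
    LineTotal  SMALLMONEY NOT NULL,    -- small monetary values: 4 bytes, not 8
    OrderedAt  SMALLDATETIME NOT NULL, -- minute accuracy suffices: 4 bytes
    Notes      VARCHAR(500) NULL       -- varchar, not text; non-Unicode is enough
);
```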
- Always use a WHERE clause in SELECT queries when you don't need all rows returned. This narrows down the rows returned; otherwise SQL Server performs a whole table scan, wasting server resources and increasing network bandwidth. Scanning the whole table may also lock it, preventing other users from accessing it.
- Developers are often seen writing code like

```sql
SELECT * FROM OfferTable WHERE LOWER(UserName) = 'jshah'
```

instead of writing it like

```sql
SELECT * FROM OfferTable WHERE UserName = 'jshah'
```

Both queries do the same work, but the second one retrieves rows faster, because the default SQL Server database collation is case insensitive and wrapping the column in LOWER() prevents the optimizer from using an index on it. If the database collation is set to case sensitive, then you need to use the first form.
- The operators used in the WHERE clause directly affect how a query performs. In decreasing order of performance they are: = first, then >, >=, <, <=, then LIKE, and finally <>.
- Queries written with NOT IN perform poorly, because the optimizer has to use a nested table scan to evaluate them; NOT IN can usually be replaced with NOT EXISTS.
Likewise, where there is a choice between IN and EXISTS, prefer the EXISTS clause for better performance.
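A sketch of the substitution, with assumed table and column names (note the two forms are equivalent only when the subquery column is not nullable):

```sql
-- Slower: the optimizer may fall back to a nested scan of SupplierTable
SELECT * FROM OfferTable o
WHERE o.SupplierID NOT IN (SELECT s.SupplierID FROM SupplierTable s);

-- Usually faster: probing stops as soon as a match is found
SELECT * FROM OfferTable o
WHERE NOT EXISTS (SELECT 1 FROM SupplierTable s
                  WHERE s.SupplierID = o.SupplierID);
```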
- When the columns in the WHERE clause are covered by an index, you can force the Query Optimizer to use an index seek by adding an index hint when IN or OR appears in the WHERE clause:

```sql
SELECT * FROM OfferTable WHERE Status = 1 AND OfferID IN (2, 20, 50)
```

takes more time than

```sql
SELECT * FROM OfferTable WITH (INDEX(IX_OfferID))
WHERE Status = 1 AND OfferID IN (2, 20, 50)
```
- When using LIKE in a query, it is better to use one or more leading characters in the pattern instead of starting with the wildcard character:

```sql
SELECT * FROM SupplierTable WHERE SupplierName LIKE 'm%'
SELECT * FROM SupplierTable WHERE SupplierName LIKE '%m'
```

In the first query the Query Optimizer can use an index on SupplierName to perform the query, reducing the load on SQL Server. In the second query no index can be used, because the pattern starts with a wildcard.
- Where there is a choice between IN and BETWEEN in a query, it is advisable to use BETWEEN for better results:

```sql
SELECT * FROM SupplierTable WHERE SupplierID BETWEEN 300 AND 303
```

performs better than

```sql
SELECT * FROM SupplierTable WHERE SupplierID IN (300, 301, 302, 303)
```
- Avoid using the SUBSTRING function in queries:

```sql
SELECT * FROM SupplierTable WHERE SupplierName LIKE 'n%'
```

is much better than

```sql
SELECT * FROM SupplierTable WHERE SUBSTRING(SupplierName, 1, 1) = 'n'
```
- Conditions in a WHERE clause connected by AND operators are evaluated from left to right in the order they are written, so put the expressions least likely to be true first. If an AND expression evaluates to false early, the rest of the clause is short-circuited immediately, saving execution time.
- It is sometimes better to combine queries using UNION ALL instead of using many OR clauses.
```sql
SELECT SupplierID, FirstName, LastName FROM SupplierTable
WHERE City = 'Vadodara' OR ZIP = '380010' OR State = 'Gujarat'
```

For the above query to use an index, indexes are required on all three columns.

The same query can be written as:

```sql
SELECT SupplierID, FirstName, LastName FROM SupplierTable WHERE City = 'Vadodara'
UNION ALL
SELECT SupplierID, FirstName, LastName FROM SupplierTable WHERE ZIP = '380010'
UNION ALL
SELECT SupplierID, FirstName, LastName FROM SupplierTable WHERE State = 'Gujarat'
```

Both queries return the same results (use UNION instead of UNION ALL if a row could match more than one condition, to avoid duplicates), but if there is an index only on City and none on ZIP or State, the first query cannot use the index and a table scan is performed, whereas the second query can use the index on City for that part of the query.
- Do not use the COUNT() aggregate in a subquery to do an existence check
Do not use:

```sql
SELECT column_list FROM table WHERE 0 < (SELECT COUNT(*) FROM table2 WHERE ...)
```

Instead use:

```sql
SELECT column_list FROM table WHERE EXISTS (SELECT * FROM table2 WHERE ...)
```
When you use COUNT(), SQL Server does not know that you are doing an existence check, so it counts all matching values, either by doing a table scan or by scanning the smallest non-clustered index.
When you use EXISTS, SQL Server knows you are doing an existence check: on finding the first matching value it returns TRUE and stops looking. The same preference for EXISTS over COUNT() also applies when using IN or ANY.
- Implement the following good practices in user-defined functions:
Do not call functions repeatedly within your stored procedures, triggers, functions, and batches. For example, you might need the length of a string variable in many places in your procedure; don't call the LEN function every time it's needed. Instead, call LEN once and store the result in a variable for later use.
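A minimal sketch of the idea (the variable names are only illustrative):

```sql
DECLARE @name NVARCHAR(100) = N'Vadodara';

-- Call LEN once and cache the result ...
DECLARE @nameLen INT = LEN(@name);

-- ... then reuse the variable instead of re-evaluating LEN(@name) each time
IF @nameLen > 50 PRINT 'name too long';
IF @nameLen = 0  PRINT 'name empty';
```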
- Try to avoid dynamic SQL
Unless really required, try to avoid the use of dynamic SQL because:
It is hard to debug and troubleshoot.
When user input is concatenated into the dynamic SQL, there is a possibility of SQL injection attacks.
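When dynamic SQL is unavoidable, parameterizing it with sp_executesql (sketched below against the hypothetical SupplierTable) keeps user input out of the statement text:

```sql
DECLARE @city NVARCHAR(50) = N'Vadodara';  -- imagine this came from user input

-- The input is passed as a typed parameter, never spliced into the SQL string
EXEC sp_executesql
    N'SELECT SupplierID FROM SupplierTable WHERE City = @city',
    N'@city NVARCHAR(50)',
    @city = @city;
```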
- Minimize the use of temporary tables
Unless really required, try to minimize the use of temporary tables; use table variables instead. In most cases a table variable resides in memory, which is a lot faster.
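A sketch of the substitution, reusing the hypothetical SupplierTable from the earlier examples:

```sql
-- A table variable in place of a #temp table
DECLARE @GujaratSuppliers TABLE (
    SupplierID   INT PRIMARY KEY,
    SupplierName NVARCHAR(100)
);

INSERT INTO @GujaratSuppliers (SupplierID, SupplierName)
SELECT SupplierID, SupplierName
FROM SupplierTable
WHERE State = 'Gujarat';
```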
- Instead of LIKE search, use full text search for searching textual data
Full-text searches almost always outperform LIKE searches. Full-text search also lets you implement complex search criteria that can't be expressed with LIKE, such as searching for a single word or phrase, for a word or phrase near another word or phrase, or for synonymous forms of a specific word.
- When a SELECT statement contains a HAVING clause, it is better to let the WHERE clause do most of the work (removing undesired rows) rather than leaving it to the HAVING clause, which is applied only after the rows have been grouped.
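A sketch of the split, using the hypothetical SupplierTable: row-level filters go in WHERE so fewer rows reach the grouping step, and only aggregate conditions stay in HAVING:

```sql
SELECT State, COUNT(*) AS Suppliers
FROM SupplierTable
WHERE City <> 'Vadodara'   -- row filter: belongs in WHERE, applied before grouping
GROUP BY State
HAVING COUNT(*) > 10;      -- aggregate filter: the only work left for HAVING
```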
- We can use a hint like:

```sql
SELECT * FROM SupplierTable WHERE City = 'Vadodara' OPTION (FAST n)
```

where n is the number of rows we want returned as fast as possible. This hint tells the optimizer to return the first n rows as quickly as possible, without regard to the time taken by the overall query.
Analysing execution plans
To tune a query, we usually start by analysing its execution plan. The execution plan shows the sequence of physical and logical operations SQL Server will perform to fulfil the query and produce the desired result set. It is produced by a database engine component called the Query Optimizer during the optimization phase of query processing, which takes into account many factors: the search predicates used in the query, the number of tables involved and the join conditions between them, the list of columns returned, and the usable indexes.
As with most performance optimization techniques, there are tradeoffs. For example, with more indexes, SELECT queries will potentially run faster. However, DML (INSERT, UPDATE, and DELETE) operations will slow down significantly, because more indexes must be maintained with each executed operation. So if your queries are mostly SELECT statements, more indexes can be helpful; if your application performs many DML operations, be conservative with the number of indexes you create.
What information do we get by viewing execution plans?
Whenever one of your queries performs slowly, you can view its estimated (and, if required, actual) execution plan and identify the item taking the largest percentage of time in the query. When you start reviewing any TSQL for optimization, the very first thing to do is view the execution plan; you will most likely quickly identify the area of the SQL that is creating the bottleneck.
Watch for the following costly operators in the execution plan of your query. If you find one of them, you are likely to have problems in your TSQL, and you should re-factor it to improve performance.
- Table Scan: Occurs when the corresponding table does not have a clustered index. Most likely, creating a clustered index or defragmenting the existing index will enable you to get rid of it.
- Clustered Index Scan: Sometimes considered equivalent to a Table Scan. It also happens when a non-clustered index on an eligible column is missing. Many times, creating a non-clustered index will enable you to get rid of it.
- Hash Join: The most expensive joining methodology; it takes place when the joining columns between two tables are not indexed. Creating indexes on those columns will enable you to get rid of it.
- Nested Loops: In most cases, this happens when a non-clustered index does not include (cover) a column used in the SELECT column list. For each row found via the non-clustered index, the database server has to seek into the clustered index to retrieve the other column values specified in the SELECT list. Creating a covering index will enable you to get rid of it.
- RID Lookup: Takes place when you have a non-clustered index but the same table has no clustered index (a heap). The database engine has to look up the actual row using the row ID, which is an expensive operation. Creating a clustered index on the corresponding table will enable you to get rid of it.
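For instance, a covering index on the hypothetical OfferTable used earlier (the included columns are assumptions for illustration) lets a query's SELECT list be satisfied from the index alone, avoiding the lookups described above:

```sql
-- Covers: SELECT OfferID, Amount FROM OfferTable WHERE Status = 1
CREATE NONCLUSTERED INDEX IX_OfferTable_Status
ON OfferTable (Status)
INCLUDE (OfferID, Amount);  -- assumed columns, stored at the index leaf level
```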