The last week I posted a blog about auto update stats and this is a…
This is a sequal to previous blog posts about auto update stats in SQL Server, but the focus of this post is how to handle when you know for sure that the statistics will be out of date when you depend on it. As I mentioned in previous blog post, you could use the OPTIMIZE FOR UNKNOWN, but there is some other issues with this query hint. The hint is applied on all tables in your select, even if there are only one table with bad statistics. That means that you might get a good plan, but probably not the most optimal plan. If you know for sure that you always will get more records from one of the tables than what the optimizer will estimate depending on the statistics, you can try to hint the optimizer to use a hash join instead of a nested loop. Well, is hash join faster than nested loop? It depends, nested loop is much faster than hash match when handling few records, and vice versa. I am not a fan of hints in SQL server, but sometimes a hint can perform miracles (at least for a while).
You can try the impact of having missguiding statistics by using the following code:
SET NOCOUNT ON; SELECT * FROM Production.Product P INNER JOIN Production.TransactionHistoryArchive tha ON P.ProductID=tha.ProductID WHERE P.DaysToManufacture>3 UPDATE Production.Product SET DaysToManufacture=10 WHERE ProductID=875 SELECT * FROM Production.Product P INNER JOIN Production.TransactionHistoryArchive tha ON P.ProductID=tha.ProductID WHERE P.DaysToManufacture>=10 ALTER DATABASE AdventureWorks2008R2 SET AUTO_UPDATE_STATISTICS OFF; BEGIN TRAN UPDATE Production.Product SET DaysToManufacture=10 SET STATISTICS IO ON; SET STATISTICS TIME ON; SELECT * FROM Production.Product P INNER JOIN Production.TransactionHistoryArchive tha ON P.ProductID=tha.ProductID WHERE P.DaysToManufacture>=10 SELECT * FROM Production.Product P INNER HASH JOIN Production.TransactionHistoryArchive tha ON P.ProductID=tha.ProductID WHERE P.DaysToManufacture>=10 UPDATE STATISTICS Production.Product; SELECT * FROM Production.Product P INNER JOIN Production.TransactionHistoryArchive tha ON P.ProductID=tha.ProductID WHERE P.DaysToManufacture>=10 SET STATISTICS IO OFF; SET STATISTICS TIME OFF; ROLLBACK TRAN ALTER DATABASE AdventureWorks2008R2 SET AUTO_UPDATE_STATISTICS ON;
As you can see, when the optimizer thinks that there are few records in the product table that matches your condition, it creates a plan with nested loop and you should take a look at the number of logical reads in that case. When you force the HASH join the number of logical reads decreases and when you update the statistics the optimizer choose the hash join by itself.
I am not saying that you should turn of auto update statistics, I am saying that you can get performance issues in some cases if you rely on the automatic updates.