By looking at Extended Events i found some interesting Events in a category called Query…
Joins, CAST and UDFs are a bad mix
Transactional data is usually stored with some sort of time stamp. This time stamp is more often than not of type DATETIME, the data will look like this: ‘2016-02-05 15:22:15.167’
When you want to sum or group this data per day you run into problem because ‘2016-02-05 15:22:15.167’ and ‘2016-02-05 15:23:44.543’ is the same day but it’s not the same time.
A quick fix is to CAST the data to DATE, CAST(@MyTimeStamp as DATE) or you could define a function that does it. The problem with this approach is that the query optimizer can not use the indexes but has to resort to a table scan which is VERY slow.
Solution
One solution is to define another column of type DATE where the date is stored without the time part. In the example in the code below the same query time goes from 2200 ms to 33 ms on my machine. That is 67 times faster.
A solution with a cache
If the data arrives in batches, usually early in the morning, and the same query is run over and over during the day in various reports, the result of the query can be cached.
In the example below the query time goes from 33ms to 0 ms. Yep, ZERO ms.
USE test; GO SET STATISTICS TIME ON; --SET STATISTICS TIME OFF ----------------------------------- --Create the test tables IF OBJECT_ID('dbo.Dates1','U') IS NOT NULL BEGIN DROP TABLE dbo.Dates1; END; GO IF OBJECT_ID('dbo.Dates2','U') IS NOT NULL BEGIN DROP TABLE dbo.Dates2; END; GO CREATE TABLE Dates1(CalendarDateTime DATETIME,CalendarDate DATE); GO CREATE TABLE Dates2(CalendarDateTime DATETIME,CalendarDate DATE); GO ----------------------------------- --Insert random data --Dates going back ~10 years DECLARE @i INT = 0; WHILE @i < 50000 BEGIN --Dates going 10 years back INSERT INTO Dates1(CalendarDateTime) SELECT DATEADD(SECOND,-10 * 365 * 24 * 60 * 60 * RAND(),GETDATE()); INSERT INTO Dates2(CalendarDateTime) SELECT DATEADD(SECOND,-10 * 365 * 24 * 60 * 60 * RAND(),GETDATE()); SET @i = @i + 1; END; GO --Update the column of type DATE UPDATE Dates1 SET CalendarDate = CalendarDateTime; GO UPDATE Dates2 SET CalendarDate = CalendarDateTime; GO --Create indexes CREATE NONCLUSTERED INDEX IX_Caldt ON dbo.Dates1(CalendarDateTime); GO CREATE NONCLUSTERED INDEX IX_Caldt ON dbo.Dates2(CalendarDateTime); GO CREATE NONCLUSTERED INDEX IX_Cald ON dbo.Dates1(CalendarDate); GO CREATE NONCLUSTERED INDEX IX_Cald ON dbo.Dates2(CalendarDate); GO ----------------------------------- -- SQL Server 2016 only --DROP FUNCTION IF EXISTS dbo.DateOnlySlow --go --Create a very slow User Defined Function, UDF --You should NOT try this at home 🙂 IF OBJECT_ID('dbo.DateOnlySlow','FN') IS NOT NULL BEGIN DROP FUNCTION dbo.DateOnlySlow; END; GO CREATE FUNCTION dbo.DateOnlySlow(@CalendarDateTime DATETIME) RETURNS DATE AS BEGIN RETURN @CalendarDateTime; END; GO ----------------------------------- --Slowest: Using User defined function, DateOnlySlow SELECT TOP 10 dbo.DateOnlySlow(s1.CalendarDateTime),COUNT(*) FROM Dates1 AS s1 JOIN Dates2 AS s2 ON dbo.DateOnlySlow (s2.CalendarDateTime) = dbo.DateOnlySlow(s1.CalendarDateTime) GROUP BY dbo.DateOnlySlow(s1.CalendarDateTime) ORDER BY COUNT(*) DESC,dbo.DateOnlySlow(s1.CalendarDateTime); GO --Faster: Casting in the query SELECT TOP 10 CAST(s1.CalendarDateTime AS DATE),COUNT(*) FROM Dates1 AS s1 JOIN Dates2 AS s2 ON CAST(s2.CalendarDateTime AS DATE) = CAST(s1.CalendarDateTime AS DATE) GROUP BY CAST(s1.CalendarDateTime AS DATE) ORDER BY COUNT(*) DESC,CAST(s1.CalendarDateTime AS DATE); GO --Even faster: Casting one side in the join and using the column 'CalendarDate' on the other SELECT TOP 10 CAST(s1.CalendarDateTime AS DATE),COUNT(*) FROM Dates1 AS s1 JOIN Dates2 AS s2 ON s2.CalendarDate = CAST(s1.CalendarDateTime AS DATE) GROUP BY CAST(s1.CalendarDateTime AS DATE) ORDER BY COUNT(*) DESC,CAST(s1.CalendarDateTime AS DATE); GO --Fastest: Joining only on 'CalendarDate SELECT TOP 10 s1.CalendarDate,COUNT(*) FROM Dates1 AS s1 JOIN Dates2 AS s2 ON s2.CalendarDate = s1.CalendarDate GROUP BY s1.CalendarDate ORDER BY COUNT(*) DESC,s1.CalendarDate; GO ----------------------------------- --Awesome speed: Pre-processing the query and cache the result in ##IncludedSalesDates -- SQL Server 2016 only --DROP TABLE IF EXISTS ##IncludedSalesDates IF OBJECT_ID('tempdb..##IncludedSalesDates','U') IS NOT NULL BEGIN DROP TABLE ##IncludedSalesDates; END; SELECT s1.CalendarDate,COUNT(*) AS num INTO ##IncludedSalesDates FROM Dates1 AS s1 JOIN Dates2 AS s2 ON s2.CalendarDate = s1.CalendarDate GROUP BY s1.CalendarDate; CREATE CLUSTERED INDEX IX_Cald ON dbo.##IncludedSalesDates(num DESC,CalendarDate); GO SELECT TOP 10 * FROM ##IncludedSalesDates ORDER BY num DESC,CalendarDate; GO