Monday, January 9, 2012

Selecting Random Date Between a Date Range with tSQL

I had a very interesting experience recently trying to figure out how to select a random date between a date range using tSQL, it was a lot of trial and error, but ultimately this is what I came up with. It is only as random as the RAND() function will allow it to be unfortunately.

Read up on the function here: MSDN tSQL RAND()

WARNING When using the RAND() function with a specified seed for multiple results/rows in a select statement, unfortunately the RAND() function with a specified seed will yield the same result each time. To get a different result for each row you must unfortunately use a loop (while, cursor, etc...). I think this is highly inconvenient, but it is the truth. Examples: SELECT (RAND(100)*100 - RAND(100)*100) will yield zero, where as SELECT (RAND() - RAND()) will yield a random number.

Here are some of the basics first about how to work with this stuff:
/* Some of the basics first */
-- Get the int value for today's date
Select cast(getdate() as int) -- 40913
Select cast(cast('1 Jan 2010' as datetime) as int) -- 40177

Next this is a worked out version of selecting random dates based on a range.
/* Selecting a random date based on a range */

-- Get the High date int value
Select cast(cast('1/6/2012' as datetime) as int) -- 40912

-- Get the Low date int value
Select cast(cast('1/6/2010' as datetime) as int) -- 40182

-- Compute the Difference
Select 40912 - 40182 -- 730

-- Get a number that is not higher than the higher limit and not lower than the lower limit
-- High - Random * Difference
SELECT (40912 - RAND()*730) -- Random limited multiplier
SELECT cast(cast((40912 - RAND()*730) as int) as datetime) -- Selecting random date based on limited multiplier

Here is a full script to make this work:
declare @intDateDiff int,  -- The numerical difference between dates
        @intDtmHigh int,   -- The numerical representation of the High Date
        @dtmLow datetime,  -- The Low Date
        @dtmHigh datetime; -- The High Date

-- I always initialize my variables, I have run into avoidable hard to trace problems when I haven't in the past
set @intDtmHigh = 0;
set @intDateDiff = 0;
set @dtmHigh = GETDATE(); -- I just happened to use GETDATE(), you can use whatever you want
set @dtmLow = DATEADD(YEAR, -2, GETDATE()); -- I put a negative 2 to indicate a two year spread, change to whatever you want

-- Get the High date int value
Select @intDtmHigh = cast(@dtmHigh as int);

-- Subtract the Low date int value from the High Date int value
Select @intDateDiff = (@intDtmHigh - cast(@dtmLow as int));

-- Select the random date
Select cast(cast((@intDtmHigh - RAND()*@intDateDiff) as int) as datetime);

As usual, writing something semi-complicated in tSQL is a giant pain in the ass, but expected...

No comments:

Post a Comment