Search

Thursday, February 24, 2011

Union

The union statement probably isn't the most used statement in TSQL, but it is useful. What's more interesting is that it's often used in a way that might not return the results you would expect. If you're a TSQL master none of this will be new to you, but for the rest of us - it's worth a few minutes review!

We'll start with a somewhat contrived example, let's say we want to invite 10 employees with the lastname of Adams and 10 customers with the last name of Smith to our SomethingOrOther.com launch.All things being equal, most people expect this statement to return 20 rows if there are 10 rows that match each separate query - would you agree?

select top 10 firstname, lastname from person.contact where lastname='adams' union select top 10 firstname, lastname from dbo.customers where lastname='smith' 

When I run each statement separately I get 10 rows each, as follows:

If I run the entire query, I only get back 18 rows:

Ref:http://www.sqlservercentral.com

If you look carefully you'll see that our first query for last name 'Smith' return a duplicate row for Samantha and another for Denise. Union by design removes duplicates from the final result set, even if the duplicates were within a single statement as in our example. It can be very useful behavior, or problematic if you're not expecting it! In this case are the two rows we eliminated truly duplicate people, or more likely they have different ID's and addresses, meaning we managed to exclude two people by accident.

If we change to use UNION ALL we get the expected behavior of returning 20 rows, duplicates and all.

Depending on our needs either could be correct. But it's so important and so often misunderstood that I make it a standard practice to follow up any time I see UNION to see if they know the difference. If they indeed meant UNION I add a comment to make life easier for the next DBA, like thisL

select top 10 firstname, lastname from person.contact where lastname='adams' union --8/10/08 verified UNION is correct select top 10 firstname, lastname from dbo.customers where lastname='smith' 

There's also a difference from a performance perspective. UNION ALL requires little additional work besides running the combined queries, but just UNION requires an additional step to remove the duplicates - and that could be expensive, it depends on how many rows you're checking. The top query plan is UNION, the bottom is UNION ALL. We can see that the UNION example includes a SORT operator that changes the plan, but always check Profiler to see the actual difference in cost.

So, there's a little trivia you can test your developers with, and it might save you from an embarrassing mistake someday too! If you're new to UNION you might want to take a look at both EXCEPT and INTERSECT, both were added to SQL 2005.

No comments:

Post a Comment