SQL SET Operators: A Comprehensive Guide
Day 51 Of 60 : Intoduction to SET Operators in SQL
SET Operators in SQL
SQL set operators are used to combine the results obtained from two or more queries into a single result.
The queries which contain two or more subqueries are known as compounded queries.
Set operations in SQL are used to combine or manipulate the result sets of multiple SELECT queries.
They allow you to perform operations similar to those in set theory, such as union, intersection, and difference, on the data retrieved from different tables or queries.
Set operations provide powerful tools for managing and manipulating data, enabling you to analyse and combine information in various ways.
There are four major types of SQL operators, namely:
Union
Union all
Intersect
Minus
Here is an abstract table for whatever we will be learning in this article.
SQL Set Operator | Function |
Union | Combines distinct results of two or more SELECT statements. |
Union All | Combines all results of two or more SELECT statements, including duplicates. |
Intersect | Returns only the common records obtained from two or more SELECT statements. |
Minus | Returns only those records which are exclusive to the first table. |
Let us look into each set operator in more detail with examples. This article uses the following two tables to execute the queries. All the operations are performed on the comp1_employees and comp2_employees table that is given below.
comp1_employees:
employee_id | employee_name | employee_city |
1 | Bhim Shekh | Surat |
2 | Mehul Mohan | Goa |
3 | Palash Yadav | Ahmedabad |
4 | Ela Shikha | Delhi |
5 | Mrinal Thakur | Bangalore |
6 | Sitara Vani | Bangalore |
7 | Mohak Jain | Gurugram |
8 | Adesh Patel | Jaipur |
9 | Kunal Tandon | Delhi |
10 | Romit Soni | Mumbai |
comp2_employees:
employee_id | employee_name | employee_city |
1 | Sahdev Ramiah | Raipur |
2 | Mehul Mohan | Goa |
3 | Mohak Jain | Gurugram |
4 | Pragun Sarika | Chennai |
5 | Pooja Srivastava | Bangalore |
6 | Vani Shekhawat | Delhi |
7 | Mohak Jain | Ahmedabad |
8 | Neera Shah | Ahmedabad |
9 | Poonam Oberoi | Delhi |
10 | Abhishek Saini | Raipur |
1. UNION:
The UNION operator combines the result sets of two or more SELECT queries into a single result set.
It removes duplicates by default, meaning that if there are identical rows in the result sets, only one instance of each row will appear in the final result.
UNION will be used to combine the result of two select statements.
Duplicate rows will be eliminated from the results obtained after performing the UNION operation.
Rules for SQL UNION
Each table used within UNION must have the same number of columns.
The columns must have the same data types.
The columns in each table must be in the same order.
Syntax
The Syntax of the SQL UNION operator is:
SELECT columnnames FROM table1
UNION
SELECT columnnames FROM table2;
Now, let's take an example to clearly understand how the UNION operator works.
SELECT * FROM comp1_employees
UNION
SELECT * FROM comp2_employees;
The result of the above query is shown below:
employee_id | employee_name | employee_city |
1 | Bhim Shekh | Surat |
1 | Sahdev Ramiah | Raipur |
2 | Mehul Mohan | Goa |
3 | Mohak Jain | Gurugram |
3 | Palash Yadav | Ahmedabad |
4 | Ela Shikha | Delhi |
4 | Pragun Sarika | Chennai |
5 | Mrinal Thakur | Bangalore |
5 | Pooja Srivastava | Bangalore |
6 | Sitara Vani | Bangalore |
6 | Vani Shekhawat | Delhi |
7 | Mohak Jain | Gurugram |
8 | Adesh Patel | Jaipur |
8 | Neera Shah | Ahmedabad |
9 | Kunal Tandon | Delhi |
9 | Poonam Oberoi | Delhi |
10 | Abhishek Saini | Raipur |
10 | Romit Soni | Mumbai |
As shown in the above code snippet, there are two SELECT queries, and a UNION operator is used.
The first SELECT query will fetch the records from comp1_employees, and the second SELECT query will fetch the records from comp2_employees, and the UNION operation is performed with the results of both the query.
After performing the UNION operation with both the tables, all the records from the comp1_employees table and comp2_employees table are displayed except for the duplicate data, i.e., employee_id -- 2 and 77 are duplicates. Hence, they are displayed only one time.
UNION ALL :
This operator combines all the records from both the queries.
Duplicate rows will be not be eliminated from the results obtained after performing the UNION ALL operation.
To successfully execute the operation of Union All, the number of columns and the data type must be the same in both tables.
The syntax of the UNION ALL operator is shown below:
SELECT expression_1, expression_2, ... , expression_n
FROM table_1
UNION ALL
SELECT expression_1, expression_2, ... , expression_n
FROM table_2
Now, let's take an example to clearly understand how the UNION ALL operator works.
SELECT * FROM comp1_employees
UNION ALL
SELECT * FROM comp2_employees;
The result of the above query is shown below:
employee_id | employee_name | employee_city |
1 | Bhim Shekh | Surat |
2 | Mehul Mohan | Goa |
3 | Palash Yadav | Ahmedabad |
4 | Ela Shikha | Delhi |
5 | Mrinal Thakur | Bangalore |
6 | Sitara Vani | Bangalore |
7 | Mohak Jain | Gurugram |
8 | Adesh Patel | Jaipur |
9 | Kunal Tandon | Delhi |
10 | Romit Soni | Mumbai |
1 | Sahdev Ramiah | Raipur |
2 | Mehul Mohan | Goa |
3 | Mohak Jain | Gurugram |
4 | Pragun Sarika | Chennai |
5 | Pooja Srivastava | Bangalore |
6 | Vani Shekhawat | Delhi |
7 | Mohak Jain | Gurugram |
8 | Neera Shah | Ahmedabad |
9 | Poonam Oberoi | Delhi |
10 | Abhishek Saini | Raipur |
As shown in the above code snippet, there are two SELECT queries, and a UNION All operator is used. The first SELECT query will fetch the records from comp1_employees, and the second SELECT query will fetch the records from comp2_employees, and the UNION ALL operation is performed with the results of both the query.
After performing the UNION ALL operation with both tables, all the records from the comp1_employees table and comp2_employees table are displayed. Since it's a UNION ALL operation, all the records are displayed, including the duplicate words, which is not the case in the UNION operation.
3. INTERSECT:
It is used to combine two SELECT statements, but it only returns the records which are common from both SELECT statements.
To successfully execute the operation of INTERSECT, the number of columns and the data type must be the same in both tables.
After performing the INTERSECT operation, the data/records which are common in both the SELECT statements are returned.
The syntax of the INTERSECT operator is shown below:
SELECT expression_1, expression_2, ... , expression_n
FROM table_1
INTERSECT
SELECT expression_1, expression_2, ... , expression_n
FROM table_2
Now, let's take an example to clearly understand how the INTERSECT operator works.
SELECT * FROM comp1_employees
INTERSECT
SELECT * FROM comp2_employees;
The result of the above query is shown below:
employee_id | employee_name | employee_city |
2 | Mehul Mohan | Goa |
7 | Mohak Jain | Gurugram |
As shown in the above code snippet, there are two SELECT queries, and an INTERSECT operator is used. The first SELECT query will fetch the records from comp1_employees, and the second SELECT query will fetch the records from comp2_employees, and the INTERSECT operation is performed with the results of both the query.
After performing the INTERSECT operation with both tables, all the data/records that are common from the comp1_employees table and comp2_employees table are displayed. As you can see in the result above, the INTERSECT gets the results that exist in both queries. And the employee_id -- 2 and 77 exist in both tables; hence they are displayed in the final result.
4.MINUS/EXCEPT
It displays the rows which are present in the first query but absent in the second query with no duplicates.
The MINUS operator allows you to filter out the results which are present in the first query but absent in the second query.
To successfully execute the MINUS operation, the number of columns and the data type must be the same in both tables.
After performing the MINUS operation, the data/records which are not present in the second SELECT statement or query are displayed.
Note: The MINUS operator is supported only in Oracle databases. For other databases like SQLite, PostgreSQL, SQL server, you can use EXCEPT operator to perform similar operations.
The syntax of the EXCEPT operator is shown below:
SELECT expression_1, expression_2, ... , expression_n
FROM table_1
EXCEPT
SELECT expression_1, expression_2, ... , expression_n
FROM table_2
Let's take an example to clearly understand how the EXCEPT operator works in MySQL, which is similar to the MINUS operator.
SELECT * FROM comp1_employees
EXCEPT
SELECT * FROM comp2_employees;
The result of the above query is shown below:
employee_id | employee_name | employee_city |
1 | Bhim Shekh | Surat |
3 | Palash Yadav | Ahmedabad |
4 | Ela Shikha | Delhi |
5 | Mrinal Thakur | Bangalore |
6 | Sitara Vani | Bangalore |
8 | Adesh Patel | Jaipur |
9 | Kunal Tandon | Delhi |
10 | Romit Soni | Mumbai |
As shown in the above code snippet, there are two SELECT queries, and an EXCEPT operator is used. The first SELECT query will fetch the records from comp1_employees, and the second SELECT query will fetch the records from comp2_employees, and the EXCEPT operation is performed with the results of both the query.
After performing the EXCEPT operation with both tables, all the data/records that are present in the comp1_employees table but not in the comp2_employees table are displayed. As you can see in the result above, the EXCEPT gets the results that are present only in the comp1_employees without any duplicates.
Difference Between UNION and UNION ALL
UNION and UNION ALL, both commands are used to combine the result of two or more SELECT Statements, but there are many difference in UNION and UNION ALL.
UNION vs UNION ALL | ||
Description | UNION | UNION ALL |
Duplication Handling | The UNION removes duplicate rows from the result set, presenting only unique records. | The UNION ALL remains all rows, including duplicates, without any elimination. |
Performance Impact | The UNION involves the additional step of identifying and eliminating duplicates, potentially impacting performance. | The UNION ALL generally performs faster, as it does not have the overhead of duplicate removal. |
Result Set Structure | The UNION creates a distinct result set with unique records. | The UNION ALL produces a result set that includes all rows from the combined queries. |
Syntax | The syntax for both UNION and UNION ALL is similar, with the key difference lying in the choice of the operator. | The syntax for both UNION and UNION ALL is similar, with the key difference lying in the choice of the operator. |
Use Case | The UNION is suitable when the emphasis is on unique records and the removal of duplicates is essential. | The UNION ALL is preferred when retaining all rows, including duplicates, is acceptable or even desired. |
Data Volume | For large datasets, UNION may not be efficient due to its complex processing with duplicate checks. | For large datasets, UNION ALL may be more efficient due to its simplified processing without duplicate checks. |
Query Optimization | The UNION may involve additional processing to identify and eliminate duplicates, potentially leading to a longer execution time. | The UNION ALL, being less restrictive, may result in quicker query execution. |
Resource Utilization | The UNION consumes additional resources to identify and remove duplicate rows, impacting memory usage. | The UNION ALL is generally more resource-efficient as it bypasses the duplicate elimination step. |
Compatibility | Both UNION and UNION ALL are widely supported across various relational database management systems (RDBMS). | Both UNION and UNION ALL are widely supported across various relational database management systems (RDBMS). |
Consideration for Result Accuracy | If eliminating duplicates is crucial for the accuracy of the result set, then UNION is the appropriate choice. | When duplicate records are acceptable or necessary, and performance is a priority, UNION ALL is the preferred option. |
Difference between JOIN and UNION in SQL
JOIN | UNION |
JOIN combines data from many tables based on a matched condition between them | SQL combines the result set of two or more SELECT statements. |
It combines data into new columns. | It combines data into new rows |
The number of columns selected from each table may not be the same. | The number of columns selected from each table should be the same. |
Datatypes of corresponding columns selected from each table can be different. | The data types of corresponding columns selected from each table should be the same. |
It may not return distinct columns. | It returns distinct rows. |
Conclusion :
SQL supports various set operators, which you can use to filter out the desired data from the SELECT queries/tables.
The number of columns in the SELECT statement on which you want to apply the SQL set operators and the data type must be the same.
The UNION operator combines the result of two or more SELECT statements and eliminates the duplicate rows from the result.
UNION ALL is similar to UNION, but a difference is that UNION ALL does not eliminate the duplicates, and it displays all the data/records.
INTERSECT operator only returns the record, which is common from both the SELECT statements.
The MINUS operator combines the result of two statements and returns only those that are present in the first SELECT query statement.