Shortest Distance in a Plane
Problem:
In a plane, you are given the coordinates of two points (x1, y1) and (x2, y2). Find the shortest distance between these two points.
SQL Solution:
SELECT SQRT(POW((x2 - x1), 2) + POW((y2 - y1), 2)) AS distance
FROM table_name
WHERE ...;
Breakdown:
The POW() function raises a number to a power; with an exponent of 2 it calculates the square.
The SQRT() function calculates the square root of a number.
The + operator adds two numbers.
The formula for calculating the distance between two points is:
distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)
The SQL query uses this formula to calculate the distance between the two points.
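The formula can be checked on a 3-4-5 right triangle. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL; the table name point_pairs is hypothetical, and SQRT and POW are registered by hand because not every SQLite build ships them.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register SQRT and POW explicitly; some SQLite builds lack the math functions.
conn.create_function("SQRT", 1, math.sqrt)
conn.create_function("POW", 2, math.pow)

# Hypothetical table holding one pair of points per row.
conn.execute("CREATE TABLE point_pairs (x1 REAL, y1 REAL, x2 REAL, y2 REAL)")
conn.execute("INSERT INTO point_pairs VALUES (0, 0, 3, 4)")

distance = conn.execute(
    "SELECT SQRT(POW(x2 - x1, 2) + POW(y2 - y1, 2)) FROM point_pairs"
).fetchone()[0]
print(distance)  # 5.0
```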
Real-World Application:
This problem can be applied in various real-world scenarios, such as:
Estimating the straight-line distance between two cities (the driving distance follows roads and is usually longer).
Calculating the distance between two points on a map.
Determining the minimum distance required to reach a destination.
Find Candidates for Data Scientist Position
Problem Statement:
You have a table called candidates with the following columns:
candidate_id - Unique identifier for each candidate
name - Name of the candidate
email - Email address of the candidate
skills - A comma-separated list of skills possessed by the candidate
You want to find all candidates who have at least one of the skills "Python", "SQL", or "Machine Learning".
SQL Solution:
SELECT
*
FROM candidates
WHERE
skills LIKE '%Python%'
OR skills LIKE '%SQL%'
OR skills LIKE '%Machine Learning%';
Breakdown and Explanation:
LIKE Operator: The LIKE operator performs pattern matching in SQL. It checks whether a string matches a specified pattern. The % wildcard character matches any number of characters, including no characters.
Using LIKE with Skills: In the WHERE clause, we use the LIKE operator with each skill we want to match. For example, skills LIKE '%Python%' will match any candidate whose skills column contains the string "Python". Because this is a substring match, '%SQL%' also matches values such as "MySQL" or "NoSQL".
OR Operator: The OR operator combines multiple conditions. In this case, it combines the three LIKE conditions, so a candidate is selected if any of the three is met.
Real-World Applications:
This query can be used in a variety of real-world applications, such as:
Candidate Screening: HR professionals can use this query to identify candidates who possess specific skills required for a job.
Skill Analysis: Managers can use this query to analyze the skills of their employees and identify areas where training or development is needed.
Market Research: Companies can use this query to research the availability of candidates with certain skills in a particular market.
Example:
Consider the following candidates table:
| candidate_id | name | email | skills |
|---|---|---|---|
| 1 | John Doe | john.doe@example.com | Python, SQL, Machine Learning |
| 2 | Jane Smith | jane.smith@example.com | SQL, Machine Learning |
| 3 | Mark Jones | mark.jones@example.com | Java, C++, JavaScript |
Running the above query on this table will return the following result:
| candidate_id | name | email | skills |
|---|---|---|---|
| 1 | John Doe | john.doe@example.com | Python, SQL, Machine Learning |
| 2 | Jane Smith | jane.smith@example.com | SQL, Machine Learning |
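Note: An ordinary index on the skills column does not speed up this query, because a pattern that starts with % cannot use it and every row has to be scanned. On large tables, consider storing skills in a separate table with one row per candidate and skill, or using a full-text index.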
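The substring behavior of LIKE can be seen on a small, made-up data set. This is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for MySQL; the rows are hypothetical. The second query wraps the list in its separator so that only whole items match. It uses SQLite's || concatenation operator, which in MySQL would be written as CONCAT(', ', skills, ', ').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candidates (candidate_id INTEGER, skills TEXT)")
conn.executemany(
    "INSERT INTO candidates VALUES (?, ?)",
    [(1, "Python, SQL"), (2, "NoSQL, Java"), (3, "Java, C++")],
)

# Substring match: also picks up "NoSQL".
substring_ids = [row[0] for row in conn.execute(
    "SELECT candidate_id FROM candidates "
    "WHERE skills LIKE '%SQL%' ORDER BY candidate_id"
)]

# Whole-item match: surround the list with the separator before matching.
exact_ids = [row[0] for row in conn.execute(
    "SELECT candidate_id FROM candidates "
    "WHERE ', ' || skills || ', ' LIKE '%, SQL, %' ORDER BY candidate_id"
)]

print(substring_ids)  # [1, 2]
print(exact_ids)      # [1]
```

The whole-item variant depends on the list using exactly ", " as its separator, which is an assumption about how the column is filled.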
The Number of Rich Customers
Problem Statement:
Given a table called Customers, which contains the following columns:
id - Integer: Unique identifier for each customer
name - String: Customer's name
amount - Integer: Transaction amount for each customer
Find the number of distinct customers who have made transactions of more than a certain amount X.
SQL Solution:
SELECT COUNT(DISTINCT id)
FROM Customers
WHERE amount > X;
Explanation:
SELECT COUNT(DISTINCT id): This expression counts the number of distinct id values, which corresponds to the number of distinct customers. The DISTINCT keyword ensures that each unique customer is counted only once.
FROM Customers: The query retrieves data from the Customers table.
WHERE amount > X: This condition filters the rows where the amount column is greater than the specified value X. Only customers who have made transactions of more than X are included in the count.
Example:
Consider the following Customers table:
| id | name | amount |
|---|---|---|
| 1 | Alice | 100 |
| 2 | Bob | 50 |
| 3 | Cindy | 150 |
| 4 | David | 75 |
| 5 | Emily | 200 |
If we want to find the number of customers who have made transactions of more than 125, we would run the following query:
SELECT COUNT(DISTINCT id)
FROM Customers
WHERE amount > 125;
This query would return the result:
2
Explanation: There are two customers, Cindy (150) and Emily (200), who have made transactions of more than 125.
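The example can be reproduced with the table above. This is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for MySQL; the variable threshold plays the role of X and is passed as a query parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (id INTEGER, name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(1, "Alice", 100), (2, "Bob", 50), (3, "Cindy", 150),
     (4, "David", 75), (5, "Emily", 200)],
)

threshold = 125  # the value X from the problem statement
rich_count = conn.execute(
    "SELECT COUNT(DISTINCT id) FROM Customers WHERE amount > ?", (threshold,)
).fetchone()[0]
rich_names = [row[0] for row in conn.execute(
    "SELECT DISTINCT name FROM Customers WHERE amount > ? ORDER BY name",
    (threshold,),
)]
print(rich_count, rich_names)  # 2 ['Cindy', 'Emily']
```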
Real-World Application:
This query can be used to analyze customer spending patterns and identify high-value customers. For example, a retail company could use this query to identify customers who have spent a certain amount in the past month and offer them exclusive discounts or rewards.
Biggest Window Between Visits
Problem Statement
Given a table of patient_visits that logs when patients visit a clinic, find the largest time gap between any two consecutive visits for each patient.
SQL Query:
WITH PatientTimeGaps AS (
SELECT patient_id,
visit_date,
DATEDIFF(visit_date, LAG(visit_date) OVER (PARTITION BY patient_id ORDER BY visit_date)) AS time_gap
FROM patient_visits
)
SELECT patient_id, MAX(time_gap) AS largest_time_gap
FROM PatientTimeGaps
GROUP BY patient_id;
Explanation:
This query uses a common table expression (CTE) named PatientTimeGaps to calculate the time difference between each patient's consecutive visits.
Breakdown:
Window Function (LAG): The LAG function returns the visit_date of the previous row, allowing us to calculate the time difference between the current and previous visits. A patient's first visit has no previous row, so LAG returns NULL and the gap is NULL.
Partitioning: The PARTITION BY patient_id clause ensures that the LAG function calculates time gaps for each patient separately.
Subquery (CTE): The CTE PatientTimeGaps stores the time gaps for each patient.
Main Query (Outer Query): The outer query then finds the maximum time gap for each patient using GROUP BY and MAX. MAX ignores NULL values, so a patient with only one visit gets NULL as the largest gap.
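The LAG-based approach can be tried on a few made-up visits. This is a minimal sketch, assuming SQLite 3.25 or later (for window functions) through Python's sqlite3 module; the visit dates are hypothetical, and because SQLite has no DATEDIFF the day difference is computed with julianday instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient_visits (patient_id INTEGER, visit_date TEXT)")
conn.executemany(
    "INSERT INTO patient_visits VALUES (?, ?)",
    [(1, "2023-01-01"), (1, "2023-01-10"), (1, "2023-02-01"),
     (2, "2023-03-01"), (2, "2023-03-05"),
     (3, "2023-04-01")],  # patient 3 has a single visit
)

largest_gaps = conn.execute("""
    WITH PatientTimeGaps AS (
        SELECT patient_id,
               -- SQLite stand-in for MySQL's DATEDIFF(current, previous)
               CAST(julianday(visit_date) - julianday(
                   LAG(visit_date) OVER (PARTITION BY patient_id ORDER BY visit_date)
               ) AS INTEGER) AS time_gap
        FROM patient_visits
    )
    SELECT patient_id, MAX(time_gap) AS largest_time_gap
    FROM PatientTimeGaps
    GROUP BY patient_id
    ORDER BY patient_id
""").fetchall()
print(largest_gaps)  # [(1, 22), (2, 4), (3, None)]
```

Patient 3 comes back with None (SQL NULL) because a single visit has no gap to measure.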
Real-World Applications:
Patient Monitoring: Tracking the time between visits can help healthcare providers monitor patients' health and adherence to treatment plans.
Predictive Analytics: By analyzing the distribution of time gaps, healthcare systems can predict future visit patterns and adjust staffing or resources accordingly.
Patient Engagement: Identifying long time gaps between visits can prompt outreach efforts to re-engage patients with their care.
Number of Accounts That Did Not Stream
Problem Statement:
Given a database of user accounts and their streaming activity, find the number of accounts that have not streamed anything.
Database Schema:
CREATE TABLE accounts (
account_id INT PRIMARY KEY,
name VARCHAR(255)
);
CREATE TABLE streams (
stream_id INT PRIMARY KEY,
account_id INT,
start_time TIMESTAMP,
end_time TIMESTAMP,
FOREIGN KEY (account_id) REFERENCES accounts (account_id)
);
SQL Query:
SELECT
  (SELECT COUNT(*) FROM accounts)
  - (SELECT COUNT(DISTINCT account_id) FROM streams) AS num_inactive_accounts;
Explanation:
The first subquery, SELECT COUNT(*) FROM accounts, counts the number of rows in the accounts table, which represents the total number of accounts.
The second subquery counts the accounts that appear in the streams table. The DISTINCT keyword ensures that only unique account_id values are counted, so an account with several streams counts once.
The - operator subtracts the number of active accounts from the total number of accounts, which gives the number of inactive accounts (accounts with no streaming activity). The foreign key guarantees that every account_id in streams exists in accounts, so the result is never negative.
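The count of accounts without streams can also be expressed as an anti-join, which is easy to verify on a tiny data set. This is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for MySQL; the three accounts and their streams are hypothetical, and NOT EXISTS is an alternative formulation rather than the query shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE streams (
        stream_id INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES accounts (account_id)
    );
    INSERT INTO accounts VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cal');
    INSERT INTO streams VALUES (10, 1), (11, 1), (12, 3);
""")

# Count the accounts for which no stream row exists.
num_inactive = conn.execute("""
    SELECT COUNT(*)
    FROM accounts AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM streams AS s WHERE s.account_id = a.account_id
    )
""").fetchone()[0]
print(num_inactive)  # 1 (only account 2 has no streams)
```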
Real-World Applications:
Analytics dashboards: To track the percentage of inactive accounts and understand user engagement.
Targeted marketing campaigns: To identify users who may benefit from introductory streaming offers.
Product development: To gather insights into user behavior and improve streaming recommendations.
Group Employees of the Same Salary
Problem Statement
Given a table employees with columns id, name, and salary, group employees with the same salary together.
Table: employees
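| id | name | salary |
|---|---|---|
| 1 | John Doe | 10000 |
| 2 | Jane Doe | 10000 |
| 3 | David Doe | 15000 |
| 4 | Mary Doe | 15000 |
| 5 | Tom Doe | 20000 |
Result Table:
| salary | names |
|---|---|
| 10000 | John Doe, Jane Doe |
| 15000 | David Doe, Mary Doe |
| 20000 | Tom Doe |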
Solution
Use the GROUP_CONCAT() function to concatenate the names of employees with the same salary.
SELECT
  salary,
  GROUP_CONCAT(name ORDER BY id SEPARATOR ', ') AS names
FROM
  employees
GROUP BY
  salary;
Breakdown
SELECT salary, GROUP_CONCAT(...) AS names: Selects the salary and concatenated names for each group.
FROM employees: Specifies the input table.
GROUP BY salary: Groups the rows by salary.
GROUP_CONCAT(name ORDER BY id SEPARATOR ', '): Concatenates the names of employees within each salary group in id order. The default separator is a comma without a space, so SEPARATOR ', ' is given to produce the spacing shown in the result table.
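The grouping can be reproduced with the example table. This is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for MySQL; SQLite's GROUP_CONCAT takes the separator as a second argument instead of MySQL's SEPARATOR keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "John Doe", 10000), (2, "Jane Doe", 10000), (3, "David Doe", 15000),
     (4, "Mary Doe", 15000), (5, "Tom Doe", 20000)],
)

rows = conn.execute(
    "SELECT salary, GROUP_CONCAT(name, ', ') FROM employees "
    "GROUP BY salary ORDER BY salary"
).fetchall()

# The order of names inside each string is not guaranteed here,
# so the names are compared as sets.
groups = {salary: set(names.split(", ")) for salary, names in rows}
print(groups)
```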
Real-World Applications
This query can be useful in various real-world scenarios, such as:
Salary Analysis: Analyzing the distribution of salaries within an organization.
Payroll Management: Identifying employees with the same salary for payroll processing.
Employee Compensation: Grouping employees by salary for performance reviews and compensation decisions.
HR Reporting: Generating reports on the number of employees at different salary levels.
Loan Types
Problem Statement
Write a SQL query to find all loan types and the number of loans for each type.
Table Schema
CREATE TABLE loans (
id INT PRIMARY KEY,
type VARCHAR(255) NOT NULL,
amount DECIMAL(10, 2) NOT NULL,
term INT NOT NULL
);
Example Data
INSERT INTO loans (id, type, amount, term) VALUES
(1, 'Personal', 10000.00, 12),
(2, 'Business', 20000.00, 24),
(3, 'Mortgage', 30000.00, 36),
(4, 'Personal', 15000.00, 18),
(5, 'Business', 25000.00, 30);
Query
SELECT type, COUNT(*) AS num_loans
FROM loans
GROUP BY type;
Results
| type | num_loans |
|---|---|
| Business | 2 |
| Mortgage | 1 |
| Personal | 2 |
Explanation
The query uses the GROUP BY clause to group the loans by type. The COUNT(*) function counts the number of loans in each group. The query has no ORDER BY clause, so the order of the rows is not guaranteed; add ORDER BY type to get the alphabetical order shown above.
Applications
This query can be used to analyze the distribution of loan types in a database. This information can be used to make decisions about which types of loans to offer, and to set interest rates and other terms for each type of loan.
Market Analysis I
Problem Statement:
You are given a table called Orders that contains the following columns:
order_id: The unique ID of an order.
product_id: The ID of the product purchased.
quantity: The number of units of the product purchased.
price: The price of each unit of the product purchased.
order_date: The date the order was placed.
You need to write a SQL query to calculate the total revenue generated by each product in the given date range. The date range is specified by the start_date and end_date parameters.
Best & Performant Solution:
SELECT
product_id,
SUM(quantity * price) AS total_revenue
FROM Orders
WHERE
order_date BETWEEN start_date AND end_date
GROUP BY
product_id;
Breakdown and Explanation:
1. Select Columns:
SELECT
product_id,
SUM(quantity * price) AS total_revenue
We select the product_id column to identify each product, and the total revenue generated by each product, calculated as SUM(quantity * price).
2. Filter Orders:
WHERE
order_date BETWEEN start_date AND end_date
We filter the orders to include only those placed within the specified date range. BETWEEN is inclusive, so orders placed on start_date or end_date are counted.
3. Group By Products:
GROUP BY
product_id
We group the results by product_id to aggregate the total revenue for each product.
Performance Considerations:
The query can be efficient if the table has an index on the order_date column; the problem statement does not define one, so this is an assumption. Such an index lets MySQL read only the orders within the specified date range instead of scanning the whole table. Grouping by product_id then collapses the matching rows into one output row per product.
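The date filter and the grouping can be tried with parameters for the date range. This is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for MySQL; the orders and the index name idx_orders_order_date are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (order_id INTEGER, product_id INTEGER, "
    "quantity INTEGER, price REAL, order_date TEXT)"
)
# Hypothetical index; the problem statement does not define one.
conn.execute("CREATE INDEX idx_orders_order_date ON Orders (order_date)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [(1, 100, 2, 5.0, "2023-01-05"),
     (2, 100, 1, 5.0, "2023-01-31"),    # on the end date: included
     (3, 200, 3, 10.0, "2023-01-15"),
     (4, 200, 1, 10.0, "2023-02-01")],  # after the end date: excluded
)

# start_date and end_date are supplied as query parameters.
revenue = conn.execute(
    "SELECT product_id, SUM(quantity * price) FROM Orders "
    "WHERE order_date BETWEEN ? AND ? "
    "GROUP BY product_id ORDER BY product_id",
    ("2023-01-01", "2023-01-31"),
).fetchall()
print(revenue)  # [(100, 15.0), (200, 30.0)]
```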
Real-World Applications:
This query can be used to calculate revenue for various time periods, such as daily, weekly, or monthly, providing valuable insights for businesses to analyze sales trends, identify top-selling products, and make informed decisions.
Longest Winning Streak
Problem Statement: Given a table of games played by players, find the longest winning streak for each player.
Example Table:
CREATE TABLE Games (
player_id INT,
game_date DATE,
won BOOLEAN
);
INSERT INTO Games (player_id, game_date, won) VALUES
(1, '2023-01-01', TRUE),
(1, '2023-01-02', FALSE),
(1, '2023-01-03', TRUE),
(1, '2023-01-04', TRUE),
(1, '2023-01-05', FALSE),
(2, '2023-02-01', TRUE),
(2, '2023-02-02', FALSE),
(2, '2023-02-03', TRUE),
(2, '2023-02-04', TRUE),
(2, '2023-02-05', TRUE);
Solution:
WITH Numbered AS (
  SELECT
    player_id,
    won,
    ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY game_date)
      - ROW_NUMBER() OVER (PARTITION BY player_id, won ORDER BY game_date) AS streak_id
  FROM Games
)
SELECT player_id, MAX(streak_length) AS longest_winning_streak
FROM (
  SELECT player_id, COUNT(*) AS streak_length
  FROM Numbered
  WHERE won = TRUE
  GROUP BY player_id, streak_id
) AS WinStreaks
GROUP BY player_id;
Breakdown:
Number the games twice:
ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY game_date)
ROW_NUMBER() OVER (PARTITION BY player_id, won ORDER BY game_date)
The first expression numbers all of a player's games in date order. The second numbers the player's wins and losses separately.
Label each streak:
Within an unbroken run of wins, both numbers grow by one per game, so their difference (streak_id) stays constant. A loss in between advances only the first number, so the next run of wins gets a different streak_id.
Filter for wins:
WHERE won = TRUE
This condition ensures that we only consider games where the player won.
Count the games in each streak:
COUNT(*) ... GROUP BY player_id, streak_id
This gives the length of every winning streak.
Group by player ID:
MAX(streak_length) ... GROUP BY player_id
This keeps the longest winning streak for each player. Players who never won do not appear in the result.
Example Output:
+-----------+------------------------+
| player_id | longest_winning_streak |
+-----------+------------------------+
| 1         | 2                      |
| 2         | 3                      |
+-----------+------------------------+
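The example output can be reproduced from the sample Games rows with a gaps-and-islands query, in which two row numberings are subtracted to give every unbroken run its own label. This is a minimal sketch, assuming SQLite 3.25 or later through Python's sqlite3 module as a stand-in for MySQL; won is stored as 1 or 0, and the names Numbered, WinStreaks, streak_id, and streak_len are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Games (player_id INTEGER, game_date TEXT, won INTEGER)")
conn.executemany(
    "INSERT INTO Games VALUES (?, ?, ?)",
    [(1, "2023-01-01", 1), (1, "2023-01-02", 0), (1, "2023-01-03", 1),
     (1, "2023-01-04", 1), (1, "2023-01-05", 0),
     (2, "2023-02-01", 1), (2, "2023-02-02", 0), (2, "2023-02-03", 1),
     (2, "2023-02-04", 1), (2, "2023-02-05", 1)],
)

streaks = conn.execute("""
    WITH Numbered AS (
        SELECT player_id, won,
               -- constant within each unbroken run of equal results
               ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY game_date)
             - ROW_NUMBER() OVER (PARTITION BY player_id, won ORDER BY game_date)
               AS streak_id
        FROM Games
    ),
    WinStreaks AS (
        SELECT player_id, COUNT(*) AS streak_len
        FROM Numbered
        WHERE won = 1
        GROUP BY player_id, streak_id
    )
    SELECT player_id, MAX(streak_len)
    FROM WinStreaks
    GROUP BY player_id
    ORDER BY player_id
""").fetchall()
print(streaks)  # [(1, 2), (2, 3)]
```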
Real-World Applications:
Tracking performance trends for athletes or teams.
Analyzing sales records to identify the most successful sales periods.
Monitoring customer churn rates to identify risk factors.
Game Play Analysis V
Problem:
Given a table of game play data, find the average score for each player who has played at least 3 games.
SQL Query:
SELECT player_id, AVG(score) AS average_score
FROM game_plays
GROUP BY player_id
HAVING COUNT(*) >= 3;
Breakdown:
SELECT player_id, AVG(score) AS average_score: Selects the player's ID and the average score for each player.
FROM game_plays: Specifies the table containing the game play data.
GROUP BY player_id: Groups the results by player ID. This allows us to calculate the average score for each player.
HAVING COUNT(*) >= 3: Filters the results to only include players who have played at least 3 games.
Example:
SELECT player_id, AVG(score) AS average_score
FROM game_plays
GROUP BY player_id
HAVING COUNT(*) >= 3;
+-----------+---------------+
| player_id | average_score |
+-----------+---------------+
| 1         | 60.0          |
| 2         | 75.0          |
| 3         | 85.0          |
+-----------+---------------+
This query will return the average score for each player who has played at least 3 games. In this example, player 1 has an average score of 60.0, player 2 has an average score of 75.0, and player 3 has an average score of 85.0.
Real-World Applications:
This query can be used to analyze game play data and identify players who are performing well. It can also be used to compare the performance of different players or teams.
Count Artist Occurrences On Spotify Ranking List
Problem Statement:
Count the number of times each artist appears in a given Spotify ranking list.
SQL Solution:
WITH ArtistOccurrences AS (
SELECT
a.name AS artist_name,
COUNT(*) AS occurrence_count
FROM artists a
JOIN albums b ON a.id = b.artist_id
JOIN songs c ON b.id = c.album_id
JOIN spotify_ranking_list d ON c.id = d.song_id
GROUP BY
artist_name
)
SELECT
artist_name,
occurrence_count
FROM ArtistOccurrences
ORDER BY
occurrence_count DESC;
Explanation:
Create a CTE (Common Table Expression) called ArtistOccurrences:
This CTE calculates the artist occurrences by joining the artists, albums, songs, and spotify_ranking_list tables.
The COUNT(*) function counts the number of songs by each artist that appear in the ranking list.
Select the Artist Name and Occurrence Count:
The main query selects the artist_name and occurrence_count from the ArtistOccurrences CTE.
Order the Results:
The query orders the results by occurrence_count in descending order, showing the artists with the highest number of occurrences first.
Real-World Application:
This query can be used to identify popular artists on Spotify, track artist trends, and make recommendations based on user preferences. It can also be used to analyze the performance of artists' albums and songs in the ranking list.
Example:
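Consider the following data:
| artist | album | song | rank |
|---|---|---|---|
| Artist A | Album A | Song A | 1 |
| Artist A | Album A | Song B | 2 |
| Artist B | Album B | Song C | 3 |
| Artist C | Album C | Song D | 4 |
The query would produce the following output:
| artist_name | occurrence_count |
|---|---|
| Artist A | 2 |
| Artist B | 1 |
| Artist C | 1 |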
This shows that Artist A has the highest number of occurrences in the Spotify ranking list with 2 songs, followed by Artist B and Artist C with 1 song each.
The Number of Passengers in Each Bus II
LeetCode Problem: Number of Passengers in Each Bus II
Problem Statement:
Given a table called Trips that records passenger journeys on buses, write a SQL query to count the total number of passengers on each bus.
Table Schema:
CREATE TABLE Trips (
bus_id INT NOT NULL,
passenger_count INT NOT NULL,
start_time DATETIME NOT NULL,
end_time DATETIME NOT NULL
);
Sample Data:
| bus_id | passenger_count | start_time | end_time |
|--------|-----------------|-------------------|--------------------|
| 1 | 10 | 2023-01-01 10:00 | 2023-01-01 11:00 |
| 2 | 15 | 2023-01-01 11:00 | 2023-01-01 12:00 |
| 1 | 5 | 2023-01-01 12:00 | 2023-01-01 13:00 |
| 2 | 20 | 2023-01-01 13:00 | 2023-01-01 14:00 |
Solution:
SELECT bus_id, SUM(passenger_count) AS total_passengers
FROM Trips
GROUP BY bus_id;
Breakdown:
SELECT bus_id, SUM(passenger_count) AS total_passengers: Calculates the total number of passengers for each bus. It returns the bus ID and the sum of the passenger count for that bus.
FROM Trips: Specifies the Trips table as the data source.
GROUP BY bus_id: Groups the results by the bus ID. This ensures that the passenger count is summed up for each unique bus.
Result:
Using the sample data, the query will produce the following result:
| bus_id | total_passengers |
|--------|-----------------|
| 1 | 15 |
| 2 | 35 |
Applications:
Passenger Transportation Tracking: Tracking the total number of passengers for each bus can help transportation companies optimize their routes and schedules.
Bus Sales Analysis: Businesses can use this query to determine which buses generate the highest revenue by analyzing the total number of passengers carried.
Customer Insights: By analyzing passenger counts, businesses can gain insights into passenger travel patterns and demographics.
Find Expensive Cities
Problem: Find the cities whose price level is higher than the average price level of all cities.
SQL Query:
SELECT City, PriceLevel
FROM Costs
WHERE PriceLevel > (
SELECT AVG(PriceLevel)
FROM Costs
)
ORDER BY PriceLevel DESC;
Breakdown:
The query performs the following steps:
Select the City and PriceLevel columns from the Costs table: This retrieves the city names and their corresponding price levels from the database.
Filter the results where the PriceLevel is greater than the average: The query calculates the average price level of all cities using the subquery (SELECT AVG(PriceLevel) FROM Costs). It then keeps only the cities whose price levels exceed this average.
Order the results in descending order of PriceLevel: This sorts the remaining cities from most expensive to least expensive based on their price levels.
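The effect of the average subquery can be seen with four made-up cities. This is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for MySQL; the city names and price levels are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Costs (City TEXT, PriceLevel INTEGER)")
conn.executemany(
    "INSERT INTO Costs VALUES (?, ?)",
    [("City A", 50), ("City B", 100), ("City C", 150), ("City D", 300)],
)

average = conn.execute("SELECT AVG(PriceLevel) FROM Costs").fetchone()[0]
expensive = conn.execute("""
    SELECT City, PriceLevel
    FROM Costs
    WHERE PriceLevel > (SELECT AVG(PriceLevel) FROM Costs)
    ORDER BY PriceLevel DESC
""").fetchall()
print(average)    # 150.0
print(expensive)  # [('City D', 300)]
```

City C sits exactly at the average, so the strict > comparison leaves it out.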
Simplified Explanation:
Imagine a restaurant menu where each dish has a price level. We want to find out which cities have dishes that are more expensive than the average dish price across all cities.
Read the menu (Costs table): We start by reading the menu, which contains the city names and their corresponding dish prices (price levels).
Find the average price (Average subquery): We then calculate the average price of all dishes on the menu. This represents the expected price level.
Identify expensive dishes (Filter condition): We focus on dishes that are more expensive than the average. This filter separates the high-priced dishes from the more moderately priced ones.
Rank the cities (Order by clause): Finally, we arrange the cities in order, starting with the most expensive dishes and ending with the least expensive dishes.
Real-World Application:
This query can be useful for travelers, tourists, or businesses evaluating the cost of living in different cities. It helps identify cities where goods and services tend to be more expensive than average, allowing individuals and organizations to make informed decisions about their budget and expenses.
Running Total for Different Genders
Problem Statement:
You have a table called person with columns:
id (primary key)
gender
age
name
Calculate the running total of age for each gender group. The running total for a row is the sum of the ages of all people of the same gender up to and including that row, in id order.
Example Table:
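| id | gender | age | name |
|---|---|---|---|
| 1 | Male | 20 | John |
| 2 | Female | 15 | Jane |
| 3 | Male | 25 | Peter |
| 4 | Female | 22 | Susan |
Expected Output:
| gender | running_total |
|---|---|
| Female | 15 |
| Female | 37 |
| Male | 20 |
| Male | 45 |
Each row shows the total after adding one more person of that gender, so the last row for each gender (Female 37, Male 45) is the overall total for that group.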
Solution:
Step 1: Create Common Table Expression (CTE)
WITH GenderRunningTotal AS (
SELECT
gender,
SUM(age) OVER (PARTITION BY gender ORDER BY id) AS running_total
FROM
person
)
WITH GenderRunningTotal AS (...) creates a CTE.
SUM(age) OVER (PARTITION BY gender ORDER BY id) calculates the running total for each gender.
Step 2: Select Gender and Running Total
SELECT DISTINCT
gender,
running_total
FROM
GenderRunningTotal;
SELECT DISTINCT gender, running_total removes duplicate (gender, running_total) pairs.
FROM GenderRunningTotal uses the CTE created in Step 1.
Explanation:
The CTE GenderRunningTotal calculates the running total for each gender by using the OVER clause:
PARTITION BY gender groups the data by gender.
ORDER BY id sorts the data by ID within each gender group.
SUM(age) calculates the cumulative sum of ages.
The main query selects distinct values of gender and their corresponding running totals from the CTE.
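The windowed SUM can be run against the example table to see how the total builds up row by row. This is a minimal sketch, assuming SQLite 3.25 or later through Python's sqlite3 module as a stand-in for MySQL; the id and age columns are added to the output only to make each step visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, gender TEXT, age INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [(1, "Male", 20, "John"), (2, "Female", 15, "Jane"),
     (3, "Male", 25, "Peter"), (4, "Female", 22, "Susan")],
)

running = conn.execute("""
    SELECT id, gender, age,
           SUM(age) OVER (PARTITION BY gender ORDER BY id) AS running_total
    FROM person
    ORDER BY gender, id
""").fetchall()
for row in running:
    print(row)
# (2, 'Female', 15, 15)
# (4, 'Female', 22, 37)
# (1, 'Male', 20, 20)
# (3, 'Male', 25, 45)
```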
Real World Applications:
Sales Data: Calculate the running total of sales for different product categories or regions to track sales trends.
Patient Records: Track the cumulative number of patients seen by a doctor over time to monitor their workload.
Inventory Management: Keep track of the running total of inventory items in each warehouse to optimize stock levels.
Consecutive Numbers
Problem:
Given a table Nums with integer values, find the longest sequences of consecutive numbers.
Table:
Nums (
num INT
)
Query:
SELECT
MAX(LENGTH) AS LongestConsecutiveSequence
FROM
(
SELECT
num,
@cur_num := CASE
WHEN @prev_num = num - 1 THEN @cur_num + 1
ELSE 1
END AS LENGTH,
@prev_num := num
FROM
Nums,
(SELECT @prev_num := NULL, @cur_num := 0) AS vars
ORDER BY
num
) AS subquery
GROUP BY
num - LENGTH + 1
Breakdown:
1. Subquery:
SELECT
num,
@cur_num := CASE
WHEN @prev_num = num - 1 THEN @cur_num + 1
ELSE 1
END AS LENGTH,
@prev_num := num
FROM
Nums,
(SELECT @prev_num := NULL, @cur_num := 0) AS vars
ORDER BY
num
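This subquery iterates through the Nums table in ascending order and calculates, for each row, the length of the consecutive sequence that ends at that row.
It uses the user-defined variables @prev_num and @cur_num to track the previous number and the length of the current consecutive sequence.
If the current number is consecutive with the previous number, it increments the length by 1. Otherwise, it sets the length to 1.
It also updates the @prev_num variable with the current number for the next iteration.
This technique relies on MySQL evaluating the select expressions row by row in ORDER BY order, which MySQL does not guarantee, and assigning user variables inside expressions is deprecated as of MySQL 8.0. It also assumes that Nums contains no duplicate values.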
2. Group By and Max:
GROUP BY
num - LENGTH + 1
SELECT
MAX(LENGTH) AS LongestConsecutiveSequence
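The outer query groups the results of the subquery by num - LENGTH + 1, which is the starting number of the consecutive sequence that a row belongs to.
For each group it takes the maximum running length, which is the full length of that sequence, and returns it in the LongestConsecutiveSequence column. The result therefore has one row per sequence; the largest value among them is the length of the longest sequence.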
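The same grouping idea can be written without user-defined variables by using a window function: subtracting a row number from each value gives the same result for every member of a run. This is a minimal sketch of that alternative, assuming SQLite 3.25 or later through Python's sqlite3 module; the sample values and the names Islands, island_id, start_num, and run_length are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Nums (num INTEGER)")
conn.executemany(
    "INSERT INTO Nums VALUES (?)",
    [(1,), (2,), (3,), (5,), (6,), (10,)],
)

runs = conn.execute("""
    WITH Islands AS (
        -- num minus its rank is constant within a run of consecutive values
        SELECT num, num - ROW_NUMBER() OVER (ORDER BY num) AS island_id
        FROM (SELECT DISTINCT num FROM Nums)
    )
    SELECT MIN(num) AS start_num, COUNT(*) AS run_length
    FROM Islands
    GROUP BY island_id
    ORDER BY start_num
""").fetchall()
longest = max(length for _, length in runs)
print(runs)     # [(1, 3), (5, 2), (10, 1)]
print(longest)  # 3
```

The DISTINCT in the inner query keeps duplicate values from splitting a run.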
Real-World Application:
This query can be used in various real-world applications, such as:
Finding the longest consecutive winning streak in a sports league.
Finding the longest run of consecutive days on which an item was sold.
Detecting gaps or missing values in a dataset.
First and Last Call On the Same Day
Problem: Find the first and last phone call received by each customer on the same day.
Input:
table1 (calls):
| customer_id | call_date | call_time |
|---|---|---|
| 1 | 2022-08-01 | 09:00:00 |
| 1 | 2022-08-01 | 12:00:00 |
| 2 | 2022-08-02 | 10:00:00 |
| 2 | 2022-08-02 | 15:00:00 |
| 3 | 2022-08-03 | 11:00:00 |
| 3 | 2022-08-04 | 13:00:00 |
Output:
| customer_id | call_date | first_call | last_call |
|---|---|---|---|
| 1 | 2022-08-01 | 09:00:00 | 12:00:00 |
| 2 | 2022-08-02 | 10:00:00 | 15:00:00 |
| 3 | 2022-08-03 | 11:00:00 | 11:00:00 |
| 3 | 2022-08-04 | 13:00:00 | 13:00:00 |
Customer 3 has one call on each of two different days, so each of those calls is both the first and the last call of its day.
Solution:
SELECT
  customer_id,
  call_date,
  MIN(call_time) AS first_call,
  MAX(call_time) AS last_call
FROM calls
GROUP BY
  customer_id,
  call_date
ORDER BY
  customer_id,
  call_date;
Breakdown:
GROUP BY: Groups the calls by customer ID and call date, so that each group holds one customer's calls on one day.
MIN and MAX Aggregation: Calculates the first and last call times within each group using the MIN and MAX functions.
ORDER BY: Sorts the results by customer ID and call date for the final output.
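Grouping by customer and date and taking the earliest and latest call time can be checked against the input table. This is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for MySQL; dates and times are stored as text, which sorts correctly in the fixed formats used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (customer_id INTEGER, call_date TEXT, call_time TEXT)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [(1, "2022-08-01", "09:00:00"), (1, "2022-08-01", "12:00:00"),
     (2, "2022-08-02", "10:00:00"), (2, "2022-08-02", "15:00:00"),
     (3, "2022-08-03", "11:00:00"), (3, "2022-08-04", "13:00:00")],
)

first_last = conn.execute("""
    SELECT customer_id, call_date,
           MIN(call_time) AS first_call,
           MAX(call_time) AS last_call
    FROM calls
    GROUP BY customer_id, call_date
    ORDER BY customer_id, call_date
""").fetchall()
for row in first_last:
    print(row)
# (1, '2022-08-01', '09:00:00', '12:00:00')
# (2, '2022-08-02', '10:00:00', '15:00:00')
# (3, '2022-08-03', '11:00:00', '11:00:00')
# (3, '2022-08-04', '13:00:00', '13:00:00')
```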
Real-World Application:
Tracking customer call history and identifying patterns in call frequency.
Analyzing call center performance by tracking the time range of calls.
Enhancing customer service by providing information about the first and last contact points on a specific day.
Queries Quality and Percentage
Problem Statement
Given a table of queries and their corresponding execution times, find the top k queries that have the highest average execution time and also calculate the percentage of total execution time contributed by these queries.
SQL Query
WITH RankedQueries AS (
  SELECT query,
         AVG(execution_time) AS avg_execution_time,
         SUM(execution_time) AS total_execution_time,
         RANK() OVER (ORDER BY AVG(execution_time) DESC) AS rnk
  FROM Queries
  GROUP BY query
), TopQueries AS (
  SELECT query, avg_execution_time, total_execution_time
  FROM RankedQueries
  WHERE rnk <= k
)
SELECT query, avg_execution_time,
       total_execution_time * 100.0 / (SELECT SUM(execution_time) FROM Queries) AS percentage
FROM TopQueries
ORDER BY avg_execution_time DESC;
Explanation
Step 1: Create a RankedQueries Table
The subquery RankedQueries groups the queries by their name and calculates the average and the total execution time for each group.
It then ranks the queries in descending order of average execution time. The rank column is named rnk because RANK is a reserved word in MySQL 8.0.
Step 2: Create a TopQueries Table
The subquery TopQueries selects the queries with a rank less than or equal to k, where k is a placeholder for the number of queries wanted.
These queries are the top k queries with the highest average execution time. Queries that tie on the average share a rank, so more than k rows can be returned.
Step 3: Calculate the Percentage
The main query divides each top query's total execution time by the total execution time of all rows in the Queries table and multiplies by 100.
The result is the percentage of the overall execution time that each top query contributes; adding up the percentage column gives the share of the top k queries together.
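The ranking and the percentage can be checked on a few made-up timings. This is a minimal sketch, assuming SQLite 3.25 or later through Python's sqlite3 module as a stand-in for MySQL; the data, the value of top_k, and the names QueryStats and rnk are hypothetical. Each query's percentage is its own total execution time divided by the total over all rows, and the aggregation and the ranking are done in two separate CTEs to keep each step simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Queries (query TEXT, execution_time INTEGER)")
conn.executemany(
    "INSERT INTO Queries VALUES (?, ?)",
    [("A", 10), ("A", 30), ("B", 5), ("B", 5), ("C", 50)],
)

top_k = 2  # hypothetical value for the parameter k
top_queries = conn.execute("""
    WITH QueryStats AS (
        SELECT query,
               AVG(execution_time) AS avg_execution_time,
               SUM(execution_time) AS total_execution_time
        FROM Queries
        GROUP BY query
    ),
    RankedQueries AS (
        SELECT *, RANK() OVER (ORDER BY avg_execution_time DESC) AS rnk
        FROM QueryStats
    )
    SELECT query, avg_execution_time,
           total_execution_time * 100.0
               / (SELECT SUM(execution_time) FROM Queries) AS percentage
    FROM RankedQueries
    WHERE rnk <= ?
    ORDER BY avg_execution_time DESC
""", (top_k,)).fetchall()
print(top_queries)  # [('C', 50.0, 50.0), ('A', 20.0, 40.0)]
```

Together the two top queries account for 90 percent of the total execution time in this sample.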
Real-World Applications
This query can be useful in optimizing database performance by identifying the queries that are consuming the most resources. By understanding this information, database administrators can take steps to improve the performance of the database, such as optimizing the queries or creating indexes.
Product's Price for Each Store
Problem:
Given two tables:
Products (id, name, price)
Stores (id, name, city)
Find the price of each product in each store.
Solution:
-- Pair every product with every store.
SELECT
  p.name AS product_name,
  s.name AS store_name,
  p.price AS product_price
FROM
  Products AS p
CROSS JOIN
  Stores AS s;
Explanation:
Join the Tables: The Products table holds a single price per product, and the two tables share no linking column, so every store carries every product at the same price. The CROSS JOIN clause pairs each row of Products with each row of Stores. The id columns are unrelated keys (a product ID and a store ID), so they are not used as a join condition.
Select the Columns: The SELECT clause specifies the columns that you want to retrieve from the joined rows. In this case, we want the product name (product_name), store name (store_name), and product price (product_price).
Alias the Columns: The AS keyword is used to alias the column names. This makes it easier to refer to the columns in the output.
Example:
Consider the following tables:
Products:
+----+--------+-------+
| id | name | price |
+----+--------+-------+
| 1 | Apple | 10 |
| 2 | Banana | 5 |
| 3 | Orange | 7 |
+----+--------+-------+
Stores:
+----+---------+---------+
| id | name    | city    |
+----+---------+---------+
| 1  | Walmart | Atlanta |
| 2  | Target  | Chicago |
| 3  | Kroger  | Dallas  |
+----+---------+---------+
Running the SQL query:
SELECT
  p.name AS product_name,
  s.name AS store_name,
  p.price AS product_price
FROM
  Products AS p
CROSS JOIN
  Stores AS s;
will produce the following output:
+-------------+-----------+--------------+
| product_name | store_name | product_price |
+-------------+-----------+--------------+
| Apple | Walmart | 10 |
| Apple | Target | 10 |
| Apple | Kroger | 10 |
| Banana | Walmart | 5 |
| Banana | Target | 5 |
| Banana | Kroger | 5 |
| Orange | Walmart | 7 |
| Orange | Target | 7 |
| Orange | Kroger | 7 |
+-------------+-----------+--------------+
This output shows the price of each product in each store.
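The nine-row output is what a cross join produces, while an inner join on the two id columns would return only three rows. This is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for MySQL, using the example tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (id INTEGER, name TEXT, price INTEGER);
    CREATE TABLE Stores (id INTEGER, name TEXT, city TEXT);
    INSERT INTO Products VALUES (1, 'Apple', 10), (2, 'Banana', 5), (3, 'Orange', 7);
    INSERT INTO Stores VALUES
        (1, 'Walmart', 'Atlanta'), (2, 'Target', 'Chicago'), (3, 'Kroger', 'Dallas');
""")

# Every product paired with every store: 3 x 3 = 9 rows.
all_pairs = conn.execute("""
    SELECT p.name, s.name, p.price
    FROM Products AS p
    CROSS JOIN Stores AS s
    ORDER BY p.id, s.id
""").fetchall()

# Joining on the unrelated id columns keeps only rows where the ids happen to match.
id_matches = conn.execute("""
    SELECT p.name, s.name, p.price
    FROM Products AS p
    JOIN Stores AS s ON p.id = s.id
    ORDER BY p.id
""").fetchall()

print(len(all_pairs), all_pairs[0])  # 9 ('Apple', 'Walmart', 10)
print(len(id_matches))               # 3
```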
Applications:
This query can be used in various real-world applications, such as:
Displaying product prices on an e-commerce website
Comparing product prices across different stores
Managing inventory and pricing for a retail business
Strong Friendship
Problem Statement
Find all pairs of friends who have at least three mutual friends.
Table Schema
CREATE TABLE Friends (
id1 INT NOT NULL,
id2 INT NOT NULL,
PRIMARY KEY (id1, id2),
FOREIGN KEY (id1) REFERENCES Users (id),
FOREIGN KEY (id2) REFERENCES Users (id)
);
Optimal Solution
WITH AllFriends AS (
  SELECT id1 AS user_id, id2 AS friend_id FROM Friends
  UNION
  SELECT id2, id1 FROM Friends
)
SELECT f.id1, f.id2
FROM Friends f
JOIN AllFriends a ON a.user_id = f.id1
JOIN AllFriends b ON b.user_id = f.id2 AND b.friend_id = a.friend_id
GROUP BY f.id1, f.id2
HAVING COUNT(*) >= 3;
Explanation
This query joins the Friends table to a symmetric copy of itself to find pairs of friends who have at least three mutual friends. It assumes that each friendship is stored once, as a single (id1, id2) row.
The AllFriends CTE lists every friendship in both directions, so that a user's friends can be looked up regardless of which column holds the user's ID.
The first join (f JOIN a) lists the friends of id1. The second join (JOIN b) keeps only those who are also friends of id2, leaving one row per mutual friend.
The GROUP BY clause collects the rows for each pair, and the HAVING clause keeps the pairs with at least three mutual friends.
Example
Consider the following Friends
table:
| id1 | id2 |
|---|---|
| 1 | 2 |
| 1 | 3 |
| 2 | 3 |
| 2 | 4 |
| 3 | 4 |
| 3 | 5 |
| 4 | 5 |
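The query will return an empty result for this table. No pair of friends has three mutual friends; the highest count is two, for the following pairs:
| id1 | id2 | mutual friends |
|---|---|---|
| 2 | 3 | 1, 4 |
| 3 | 4 | 2, 5 |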
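Mutual friends can be counted for the example table by listing every friendship in both directions and matching the friend lists of the two users in each pair. This is a minimal sketch, assuming SQLite through Python's sqlite3 module as a stand-in for MySQL; the helper friend_pairs and the names AllFriends and mutual_count are hypothetical, and the threshold is a parameter so that lower values can be tried as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Friends (id1 INTEGER, id2 INTEGER, PRIMARY KEY (id1, id2))")
conn.executemany(
    "INSERT INTO Friends VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)],
)

QUERY = """
    WITH AllFriends AS (
        SELECT id1 AS user_id, id2 AS friend_id FROM Friends
        UNION
        SELECT id2, id1 FROM Friends
    )
    SELECT f.id1, f.id2, COUNT(*) AS mutual_count
    FROM Friends AS f
    JOIN AllFriends AS a ON a.user_id = f.id1
    JOIN AllFriends AS b ON b.user_id = f.id2 AND b.friend_id = a.friend_id
    GROUP BY f.id1, f.id2
    HAVING COUNT(*) >= ?
    ORDER BY f.id1, f.id2
"""

def friend_pairs(min_mutual):
    """Return (id1, id2, mutual_count) for pairs with at least min_mutual mutual friends."""
    return conn.execute(QUERY, (min_mutual,)).fetchall()

print(friend_pairs(3))  # []
print(friend_pairs(2))  # [(2, 3, 2), (3, 4, 2)]
```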
Real-World Applications
This query can be used in a variety of real-world applications, such as:
Identifying social groups within a population
Recommending friends to users on social media platforms
Identifying potential collaborators for research projects
List the Products Ordered in a Period
Problem Statement:
Write a SQL query to list all products ordered in a specific period.
Table Schema:
orders (
order_id int,
product_id int,
order_date date
)
Query:
SELECT DISTINCT product_id
FROM orders
WHERE order_date BETWEEN '2022-01-01' AND '2022-12-31';
Breakdown:
The SELECT DISTINCT statement retrieves the unique product_id values.
The FROM clause specifies the orders table as the data source.
The WHERE clause filters the rows where the order_date column falls within the specified period, from '2022-01-01' to '2022-12-31' inclusive.
Real-World Application:
This query can be used by businesses to identify which products were ordered at least once in a specific time frame. It does not show how often each product was ordered. This information can be useful for:
Sales analysis: Understanding which products are popular during different seasons or periods.
Inventory management: Determining which products need to be restocked based on demand.
Marketing campaigns: Identifying products that can be promoted based on their recent sales performance.
The Number of Employees Which Report to Each Employee
Problem Statement
Given a table Employees with columns id and managerId, where managerId represents the ID of the employee's manager, find the number of employees who report to each employee.
Table Schema
CREATE TABLE Employees (
id INT PRIMARY KEY,
managerId INT,
FOREIGN KEY (managerId) REFERENCES Employees(id)
);
Example Data
| Id | ManagerId |
|---|---|
| 1 | NULL |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
Expected Output
| Id | NumEmployees |
|---|---|
| 1 | 2 |
| 2 | 2 |
Solution
1. Recursive Common Table Expression (CTE)
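WITH RECURSIVE EmployeeHierarchy AS (
-- Anchor member: employees at the top of the hierarchy (no manager)
SELECT id, managerId
FROM Employees
WHERE managerId IS NULL
UNION ALL
-- Recursive member: employees whose manager is already in the hierarchy
SELECT e.id, e.managerId
FROM Employees e
JOIN EmployeeHierarchy h ON e.managerId = h.id
)
SELECT managerId AS id, COUNT(*) AS NumEmployees
FROM EmployeeHierarchy
WHERE managerId IS NOT NULL
GROUP BY managerId;
Explanation
The CTE EmployeeHierarchy walks the employee tree from the root node (the employee with no manager) downward. The anchor member selects the root, and the recursive member repeatedly adds the employees whose manager is already in the result.
The final WHERE clause drops the root row, which has no manager to be counted under.
The GROUP BY clause counts the rows for each managerId, which is the number of direct reports of that manager.
The recursion is not strictly needed to count direct reports; it only guarantees that every counted employee is reachable from the root.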
2. Join and Aggregation
SELECT
em1.id,
COUNT(DISTINCT em2.id) AS NumEmployees
FROM Employees em1
JOIN Employees em2
ON em1.id = em2.managerId
GROUP BY
em1.id;
Explanation
This query uses a self-join to connect managers (em1) with their employees (em2). An inner join is used so that employees with no reports are left out, matching the expected output; a LEFT JOIN would list them with a count of 0.
The COUNT(DISTINCT) function counts the number of unique employees who report to each manager.
The GROUP BY clause groups the results by manager ID.
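The choice of join type decides whether employees without reports appear in the output. As a minimal sketch (Python's sqlite3 module, loaded with the example data from this section), the two variants can be compared side by side:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (
        id INTEGER PRIMARY KEY,
        managerId INTEGER REFERENCES Employees(id)
    );
    INSERT INTO Employees VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, 2);
""")

# Inner join: only employees who have at least one direct report.
inner_counts = conn.execute("""
    SELECT m.id, COUNT(e.id) AS NumEmployees
    FROM Employees m
    JOIN Employees e ON e.managerId = m.id
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()

# Left join: every employee, with 0 for those nobody reports to.
left_counts = conn.execute("""
    SELECT m.id, COUNT(e.id) AS NumEmployees
    FROM Employees m
    LEFT JOIN Employees e ON e.managerId = m.id
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()

print("inner join:", inner_counts)
print("left join: ", left_counts)
```

Only the inner-join variant reproduces the expected output shown earlier; the left-join variant is the one to use when zero-report rows are wanted.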
Applications
Organizational Hierarchy: Determine the number of direct reports for each manager in an organizational hierarchy.
Performance Management: Identify managers with the highest and lowest number of reports for performance evaluation purposes.
Resource Allocation: Allocate resources (e.g., training, equipment) based on the number of employees under each manager.
Build the Equation
Problem Statement:
Given a table Sales with columns product_id, quantity, and price, calculate the total sales amount for each product by multiplying the quantity and price for each row.
SQL Query:
SELECT product_id, SUM(quantity * price) AS total_sales
FROM Sales
GROUP BY product_id;
Breakdown and Explanation:
Selecting Product ID and Total Sales Amount:
SELECT product_id, SUM(quantity * price) AS total_sales
This part of the query selects the product_id column and calculates the sum of quantity * price for each product, which represents the total sales amount. The SUM() function adds up the per-row product of quantity and price.
Grouping by Product ID:
GROUP BY product_id
This part groups the results by the product_id column. It ensures that the total sales amount is calculated for each unique product.
Real-World Application:
This SQL query can be used in various real-world scenarios:
Sales Analysis: Businesses can use this query to analyze the total sales for different products.
Revenue Forecasting: By tracking total sales over time, businesses can forecast future revenue.
Inventory Management: The total sales amount can help determine which products are selling more and need to be restocked.
Customer Segmentation: Businesses can use total sales to segment customers based on their purchase patterns.
Example:
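Consider the following Sales table:
| product_id | quantity | price |
|---|---|---|
| 1 | 10 | 5 |
| 1 | 20 | 6 |
| 2 | 15 | 4 |
| 2 | 25 | 7 |
Result:
| product_id | total_sales |
|---|---|
| 1 | 170 |
| 2 | 235 |
This result shows the total sales amount for each product: 170 for product ID 1 (10 × 5 + 20 × 6) and 235 for product ID 2 (15 × 4 + 25 × 7).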
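The per-product totals for the four example rows can be recomputed with a short script. This is a sketch using Python's sqlite3 module; it assumes the three values in each example row are product_id, quantity, and price, in that order (the totals are the same if quantity and price are swapped).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (product_id INTEGER, quantity INTEGER, price INTEGER);
    INSERT INTO Sales VALUES (1, 10, 5), (1, 20, 6), (2, 15, 4), (2, 25, 7);
""")

# Multiply quantity by price on each row, then add the products per product_id.
totals = conn.execute("""
    SELECT product_id, SUM(quantity * price) AS total_sales
    FROM Sales
    GROUP BY product_id
    ORDER BY product_id
""").fetchall()

print(totals)
```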
Rising Temperature
Problem Statement: Given a table Temperature that records the temperature of a region at different time intervals, find the regions with the highest and lowest average temperatures.
Table Schema:
CREATE TABLE Temperature (
region VARCHAR(255) NOT NULL,
time TIMESTAMP NOT NULL,
temperature FLOAT NOT NULL,
PRIMARY KEY (region, time)
);
Solution:
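-- A CTE is visible only to the single statement it is attached to,
-- so both lookups are combined into one statement with UNION ALL.
WITH RegionAverageTemperatures AS (
SELECT region, AVG(temperature) AS avg_temperature
FROM Temperature
GROUP BY region
)
SELECT region, avg_temperature
FROM (
SELECT region, avg_temperature
FROM RegionAverageTemperatures
ORDER BY avg_temperature DESC
LIMIT 1
) AS highest
UNION ALL
SELECT region, avg_temperature
FROM (
SELECT region, avg_temperature
FROM RegionAverageTemperatures
ORDER BY avg_temperature ASC
LIMIT 1
) AS lowest;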
Explanation:
Create a Common Table Expression (CTE) called RegionAverageTemperatures:
Calculate the average temperature for each region using the AVG() function and group the results by region.
Find the Region with the Highest Average Temperature:
Select the region and average temperature from the CTE, order the results in descending order by average temperature, and limit the results to 1 row using LIMIT 1.
Find the Region with the Lowest Average Temperature:
Same as above, but order the results in ascending order by average temperature.
Simplified Explanation:
We define a named intermediate result (the CTE) that holds the average temperature for each region. It exists only for the duration of the statement and is not stored as a table.
We find the regions with the highest and lowest average temperatures by sorting the average temperatures in descending and ascending order, respectively.
We only display the top 1 result for each. If several regions tie for highest or lowest, only one of them is returned.
Real-World Applications:
Climate analysis: Identifying regions with extreme temperature variations.
Weather forecasting: Predicting future temperature trends based on historical data.
Climate change modeling: Studying the impact of rising global temperatures on specific regions.
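Because a CTE belongs to a single statement, one way to obtain both extremes is to combine the two lookups with UNION ALL. The sketch below uses Python's sqlite3 module; the sample rows, the region names, and the extra kind column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Temperature (
        region TEXT NOT NULL,
        time TEXT NOT NULL,
        temperature REAL NOT NULL,
        PRIMARY KEY (region, time)
    );
    INSERT INTO Temperature VALUES
        ('North', '2023-01-01 00:00', 2.0),  ('North', '2023-01-02 00:00', 4.0),
        ('South', '2023-01-01 00:00', 20.0), ('South', '2023-01-02 00:00', 22.0),
        ('East',  '2023-01-01 00:00', 10.0), ('East',  '2023-01-02 00:00', 12.0);
""")

# One statement, one CTE, two ordered single-row lookups glued by UNION ALL.
extremes = conn.execute("""
    WITH RegionAverageTemperatures AS (
        SELECT region, AVG(temperature) AS avg_temperature
        FROM Temperature
        GROUP BY region
    )
    SELECT 'highest' AS kind, region, avg_temperature FROM (
        SELECT region, avg_temperature FROM RegionAverageTemperatures
        ORDER BY avg_temperature DESC LIMIT 1)
    UNION ALL
    SELECT 'lowest', region, avg_temperature FROM (
        SELECT region, avg_temperature FROM RegionAverageTemperatures
        ORDER BY avg_temperature ASC LIMIT 1)
""").fetchall()

for kind, region, avg_temperature in extremes:
    print(kind, region, avg_temperature)
```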
Top Travellers
Problem:
Find the top X most frequent travelers in a database of travel records.
SQL Query:
SELECT traveler_id, COUNT(*) AS travel_count
FROM TravelRecords
GROUP BY traveler_id
ORDER BY travel_count DESC
LIMIT X;
Breakdown:
SELECT traveler_id, COUNT(*) AS travel_count: Selects each traveler ID and the count of travel records for that traveler.
FROM TravelRecords: Specifies the table containing the travel records.
GROUP BY traveler_id: Groups the records by traveler ID, so that each traveler's travel count can be aggregated.
ORDER BY travel_count DESC: Orders the results in descending order of travel count.
LIMIT X: Specifies the maximum number of travelers to return (top X). X is a placeholder for an integer.
Real-World Application:
This query can be used by travel companies to identify their most frequent travelers, who can then be targeted with exclusive offers, loyalty programs, or personalized experiences.
Example:
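Consider the following TravelRecords table:
| traveler_id | travel_date |
|---|---|
| 1 | 2023-01-01 |
| 1 | 2023-01-15 |
| 2 | 2023-02-01 |
| 3 | 2023-03-01 |
| 3 | 2023-03-15 |
| 4 | 2023-04-01 |
Running the query with X = 2 would return:
| traveler_id | travel_count |
|---|---|
| 1 | 2 |
| 3 | 2 |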
Employees Whose Manager Left the Company
Problem:
Find the employees whose manager has left the company.
SQL Solution:
SELECT DISTINCT E.employee_id, E.employee_name
FROM Employees E
LEFT JOIN Employees M ON E.manager_id = M.employee_id
WHERE E.manager_id IS NOT NULL
AND M.employee_id IS NULL;
Explanation:
SELECT DISTINCT E.employee_id, E.employee_name: Selects the distinct employee IDs and names of employees (E).
FROM Employees E: Specifies the Employees table as the source table for selecting the employees.
LEFT JOIN Employees M ON E.manager_id = M.employee_id: Performs a left join between the Employees table (E) and itself (M) on the condition that the employee's manager ID (E.manager_id) matches the manager's employee ID (M.employee_id). Employees whose manager has no row in the table are kept, with NULL in every M column.
WHERE E.manager_id IS NOT NULL AND M.employee_id IS NULL: Keeps only the employees who have a manager ID on record but for whom the join found no matching manager row. This indicates that the manager has left the company. The first condition excludes employees who never had a manager (their manager_id is NULL); without it they would also be returned, because their join finds no match either.
Example:
Consider the following Employees table:
| employee_id | employee_name | manager_id |
|---|---|---|
| 1 | John Smith | 5 |
| 2 | Mary Jones | NULL |
| 3 | Peter Parker | 6 |
| 4 | Tony Stark | NULL |
Using the above SQL query, we get the result:
| employee_id | employee_name |
|---|---|
| 1 | John Smith |
| 3 | Peter Parker |
Explanation:
John Smith's manager (employee ID 5) has no row in the table, indicating that his manager has left the company.
Peter Parker's manager (employee ID 6) is also missing from the table, indicating that his manager has left the company.
Mary Jones and Tony Stark have no manager (their manager_id is NULL), so they are not included in the result.
Real-World Applications:
This query can be useful in various scenarios, such as:
Identifying employees who may need additional support or guidance due to the absence of their managers.
Ensuring that tasks and responsibilities are reassigned or redistributed effectively to maintain operational efficiency.
Tracking changes in the organizational structure and identifying potential management gaps or areas for restructuring.
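The difference between "manager has left" and "never had a manager" is easy to get wrong with a left join. The following sketch, using Python's sqlite3 module and three made-up employees, contrasts the anti-join with and without a manager_id IS NOT NULL filter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (
        employee_id INTEGER PRIMARY KEY,
        employee_name TEXT,
        manager_id INTEGER
    );
    INSERT INTO Employees VALUES
        (1, 'Ana', 10),    -- manager 10 has no row: that manager left
        (2, 'Ben', NULL),  -- top of the hierarchy, never had a manager
        (3, 'Cho', 2);     -- manager 2 is still employed
""")

# Employees with a manager on record whose manager row is missing.
with_filter = conn.execute("""
    SELECT E.employee_id, E.employee_name
    FROM Employees E
    LEFT JOIN Employees M ON E.manager_id = M.employee_id
    WHERE E.manager_id IS NOT NULL AND M.employee_id IS NULL
    ORDER BY E.employee_id
""").fetchall()

# Without the filter, employees who never had a manager slip in as well.
without_filter = conn.execute("""
    SELECT E.employee_id, E.employee_name
    FROM Employees E
    LEFT JOIN Employees M ON E.manager_id = M.employee_id
    WHERE M.employee_id IS NULL
    ORDER BY E.employee_id
""").fetchall()

print("with filter:   ", with_filter)
print("without filter:", without_filter)
```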
Employees With Deductions
SELECT E.name, E.salary, SUM(D.amount) AS total_deductions
FROM Employees E
JOIN Deductions D ON E.id = D.employee_id
GROUP BY E.id, E.name, E.salary
ORDER BY total_deductions DESC;
This query retrieves the name and salary of each employee who has at least one deduction, along with the total amount of deductions applied to their paychecks.
Breakdown:
JOIN Operation: The query uses an INNER JOIN between the Employees (E) and Deductions (D) tables to match employees with their respective deductions. The E.id column from the Employees table is linked to the D.employee_id column from the Deductions table. This ensures that only deductions belonging to employees in the Employees table are retrieved, and that employees without deductions are left out.
SUM Aggregation: For each employee, the SUM() aggregate function is applied to the D.amount column to calculate the total amount of deductions. This value is aliased as total_deductions.
GROUP BY Clause: The results are grouped by E.id, together with the E.name and E.salary columns, meaning that all deductions for each employee are aggregated together. Grouping by the ID keeps two employees who share a name and salary from being merged into one row.
ORDER BY Clause: Finally, the results are ordered in descending order based on the total_deductions column, displaying employees with the highest total deductions at the top of the list.
Real-World Applications:
This query can be useful in various real-world scenarios:
Payroll Processing: Companies can use this query to determine the total deductions for each employee, which is necessary for calculating net pay.
Budgeting and Financial Planning: Employees can use this query to assess their total deductions and adjust their budgets or financial plans accordingly.
Employee Performance Analysis: HR departments can use this query to identify employees with higher or lower deduction amounts, which may indicate areas for improvement in compensation and benefits packages.
Employees Project Allocation
Problem Statement
You have a table called Employees with the following columns:
emp_id (int)
name (string)
And a table called Projects with the following columns:
proj_id (int)
name (string)
Each employee can be assigned to multiple projects, and each project can have multiple employees assigned to it. The assignments are recorded in a third table, EmployeeProjects, with one row per (emp_id, proj_id) pair. You want to write a query to list the names of employees and the projects they are assigned to.
Solution
The following query will list the names of employees and the projects they are assigned to:
SELECT e.name, p.name
FROM Employees e
JOIN EmployeeProjects ep ON e.emp_id = ep.emp_id
JOIN Projects p ON ep.proj_id = p.proj_id;
Breakdown of the Solution
The query uses two JOIN operations. The first joins Employees to the EmployeeProjects assignment table on the common column emp_id; the second joins the result to Projects on proj_id. Each row of the result therefore pairs one employee with one project they are assigned to. Because both joins are inner joins, employees with no assignments and projects with no employees do not appear.
The SELECT list takes the name column from the Employees table and the name column from the Projects table.
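A many-to-many relationship like this one is resolved through a junction table. As a minimal sketch (Python's sqlite3 module; the sample names, the primary-key layout of EmployeeProjects, and the employee_name and project_name aliases are assumptions made for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Projects (proj_id INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: one row per assignment.
    CREATE TABLE EmployeeProjects (
        emp_id INTEGER REFERENCES Employees(emp_id),
        proj_id INTEGER REFERENCES Projects(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO Employees VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO Projects VALUES (10, 'Apollo'), (20, 'Borealis');
    INSERT INTO EmployeeProjects VALUES (1, 10), (1, 20), (2, 20);
""")

# Aliases keep the two "name" columns apart in the result set.
assignments = conn.execute("""
    SELECT e.name AS employee_name, p.name AS project_name
    FROM Employees e
    JOIN EmployeeProjects ep ON e.emp_id = ep.emp_id
    JOIN Projects p ON ep.proj_id = p.proj_id
    ORDER BY e.name, p.name
""").fetchall()

for employee_name, project_name in assignments:
    print(employee_name, "->", project_name)
```

Giving the two name columns distinct aliases is optional in SQL, but it avoids two result columns with the same label.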
Real-World Applications
This query could be used in a variety of real-world applications, such as:
Generating reports on employee productivity
Tracking employee assignments
Managing project staffing
Potential Applications
Here are some potential applications of this solution in the real world:
Human Resources: This query could be used to generate reports on employee productivity. For example, the query could be used to identify employees who are assigned to multiple projects and to track their progress on each project.
Project Management: This query could be used to track employee assignments. For example, the query could be used to identify which employees are assigned to a particular project and to track their progress on the project.
Staffing: This query could be used to manage project staffing. For example, the query could be used to identify which employees are available to be assigned to a new project and to track their availability.
Triangle Judgement
Triangle Problem:
Imagine you have three sticks or line segments. You want to know if they can form a valid triangle.
SQL Solution:
SELECT CASE
WHEN a + b > c AND a + c > b AND b + c > a
THEN 'Valid Triangle'
ELSE 'Invalid Triangle'
END AS Triangle_Validity
FROM Triangle_Info
WHERE a > 0 AND b > 0 AND c > 0;
Explanation:
Table Setup: We assume you have a table named Triangle_Info with three columns: a, b, and c, representing the lengths of the three sticks.
Query: The query checks, for each pair of sticks, that their combined length is greater than the length of the remaining stick (the triangle inequality). If all three pairs pass this check, the sticks form a valid triangle. Otherwise, they do not. The WHERE clause skips rows that contain a non-positive length; such rows could not pass the check anyway.
Output: The query returns the validity of the triangle as 'Valid Triangle' or 'Invalid Triangle'.
Real-World Applications:
Construction: Validating if materials have the right proportions to create a strong frame.
Engineering: Designing structures like bridges or buildings to ensure they can withstand forces.
Furniture Design: Determining if the dimensions of furniture pieces will provide proper support and stability.
Simplified Explanation:
Think of the three sticks as a floor beam and two rafters. If the two rafters laid end to end are no longer than the beam, they cannot meet above it to form a roof. The same must hold whichever stick plays the part of the beam: every stick has to be shorter than the other two combined. When that is true for all three, the roof closes and you have a valid triangle!
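The triangle inequality can be tried out on a few rows. This sketch uses Python's sqlite3 module with made-up side lengths; it selects the sides alongside the verdict and omits the WHERE filter so that degenerate rows are reported as invalid rather than dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Triangle_Info (a INTEGER, b INTEGER, c INTEGER);
    INSERT INTO Triangle_Info VALUES
        (3, 4, 5),    -- every pair sums to more than the third side
        (1, 2, 3),    -- 1 + 2 equals 3: the sticks lie flat
        (0, 4, 4),    -- a zero-length side
        (10, 1, 1);   -- 1 + 1 is far short of 10
""")

verdicts = conn.execute("""
    SELECT a, b, c,
           CASE WHEN a + b > c AND a + c > b AND b + c > a
                THEN 'Valid Triangle'
                ELSE 'Invalid Triangle'
           END AS Triangle_Validity
    FROM Triangle_Info
    ORDER BY rowid
""").fetchall()

for a, b, c, validity in verdicts:
    print(a, b, c, validity)
```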
Accepted Candidates From the Interviews
Problem:
Given the tables candidates and interviews, find the names of candidates who passed the interviews.
Tables:
candidates (candidate_id, name)
interviews (candidate_id, interview_date, result)
Solution:
SELECT c.name
FROM candidates c
JOIN interviews i ON c.candidate_id = i.candidate_id
WHERE i.result = 'passed';
Explanation:
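Join the candidates and interviews tables using the candidate_id column, which is the common column between the two tables.
Filter the joined table to only include candidates who passed the interviews, by checking for the condition i.result = 'passed'.
A candidate with several passed interviews appears once per passed interview; add DISTINCT if each name should appear only once.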
Real-World Application:
This query can be used in a real-world scenario to identify candidates who have passed the interview process for a particular job position. The results can be used to:
Send official job offers to the successful candidates.
Schedule onboarding for the new hires.
Track the progress of the hiring process and identify any potential bottlenecks.
Example:
Consider the following data in the candidates and interviews tables:
candidates
+-----------+--------+
| candidate_id | name |
+-----------+--------+
| 1 | John |
| 2 | Mary |
| 3 | Bob |
+-----------+--------+
interviews
+-----------+--------------+--------+
| candidate_id | interview_date | result |
+-----------+--------------+--------+
| 1 | 2023-01-01 | passed |
| 2 | 2023-01-02 | failed |
| 3 | 2023-01-03 | passed |
+-----------+--------------+--------+
Running the SQL query on this data will produce the following result:
+--------+
| name |
+--------+
| John |
| Bob |
+--------+
This result shows that John and Bob passed their interviews and are eligible for job offers.
User Activity for the Past 30 Days I
Problem Statement:
Given a table of user activities, find the total number of activities for each user in the past 30 days.
SQL Query:
SELECT
user_id,
COUNT(*) AS total_activities
FROM user_activities
WHERE
activity_date >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY
user_id;
Breakdown:
SELECT: Selects two columns: user_id and total_activities.
FROM: Selects from the user_activities table.
WHERE: Filters the rows to include only those where activity_date is no earlier than 30 days before the current time.
GROUP BY: Groups the results by user_id so that we get a count for each user.
Example:
Consider the following table:
| user_id | activity_date |
|---|---|
| 1 | 2023-03-01 |
| 1 | 2023-03-05 |
| 2 | 2023-03-07 |
| 2 | 2023-03-10 |
Assuming the query is run within 30 days of these dates (for example, on 2023-03-15), it will return the following result:
| user_id | total_activities |
|---|---|
| 1 | 2 |
| 2 | 2 |
Explanation:
The query first selects all rows from the user_activities table where the activity_date is within the past 30 days.
It then groups the results by user_id and counts the number of activities for each user.
The result is a table that shows the total number of activities for each user in the past 30 days.
Real-World Applications:
This query can be useful for:
Tracking user engagement: Monitoring the number of activities performed by users can help you understand how engaged they are with your application or website.
Identifying inactive users: Users who are missing from the result have had no activity in the past 30 days. Comparing the result with the full user list identifies them, so that they can be targeted with marketing campaigns.
Analyzing user behavior: By comparing the total number of activities for different users, you can identify patterns and trends in user behavior.
The Airport With the Most Traffic
Problem Statement:
Find the airport with the most total takeoffs and landings.
SQL Solution:
-- Find the airport with the most total takeoffs and landings
SELECT airport_id,
airport_name,
SUM(takeoffs + landings) AS total_operations
FROM airport_operations
GROUP BY airport_id, airport_name
ORDER BY total_operations DESC
LIMIT 1;
Explanation:
This query uses a combination of aggregation and grouping to count the total number of takeoffs and landings for each airport.
Aggregation: The SUM() function is used to calculate the total number of takeoffs and landings for each airport. The expression (takeoffs + landings) adds the values of the takeoffs and landings columns for each row.
Grouping: The GROUP BY clause groups the results by airport_id and airport_name. This means that all rows with the same airport_id and airport_name are grouped together.
Ordering: The ORDER BY clause orders the results in descending order of total_operations. This means that the airports with the most total operations will be listed first.
Limiting: The LIMIT 1 clause limits the results to the top 1 row. This means that only the airport with the most total operations will be returned.
Real-World Applications:
This query can be used in a variety of real-world applications, including:
Identifying the busiest airports in a region or country
Planning airport expansion projects
Forecasting air traffic demand
Analyzing trends in air travel
Example:
Suppose we have the following table of airport operations data:
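| airport_id | airport_name | takeoffs | landings |
|---|---|---|---|
| 1 | JFK | 1000 | 1200 |
| 2 | LAX | 1500 | 1300 |
| 3 | ORD | 1200 | 1100 |
Running the above query on this data would return the following result:
| airport_id | airport_name | total_operations |
|---|---|---|
| 2 | LAX | 2800 |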
This result shows that LAX is the airport with the most total takeoffs and landings (2800).
Restaurant Growth
Problem Statement
Given a table Restaurant, which contains the following columns:
rid: Restaurant ID
name: Restaurant name
city: Restaurant city
year: Year the restaurant was established
You are tasked to find the restaurants that have experienced the most growth in terms of the number of cities they operate in over a given period of time.
Solution
Step-by-Step Explanation
1. Find the Number of Cities for Each Restaurant in Each Year
We can achieve this by using a self-join:
SELECT r1.rid, r1.year AS year1, r2.year AS year2,
COUNT(DISTINCT r1.city) AS cities_year1,
COUNT(DISTINCT r2.city) AS cities_year2
FROM Restaurant r1
JOIN Restaurant r2 ON r1.rid = r2.rid
WHERE r1.year < r2.year
GROUP BY r1.rid, r1.year, r2.year;
This query counts the number of distinct cities each restaurant operates in during each of the two years, for every pair of years (year1, year2) where year1 is less than year2.
2. Calculate the Growth for Each Restaurant
We can calculate the growth for each restaurant by finding the difference in the number of cities operated between year2 and year1:
SELECT rid, year1, year2, cities_year2 - cities_year1 AS growth
FROM (
SELECT r1.rid, r1.year AS year1, r2.year AS year2,
COUNT(DISTINCT r1.city) AS cities_year1,
COUNT(DISTINCT r2.city) AS cities_year2
FROM Restaurant r1
JOIN Restaurant r2 ON r1.rid = r2.rid
WHERE r1.year < r2.year
GROUP BY r1.rid, r1.year, r2.year
) AS yearly_pairs
WHERE year2 - year1 = 1;
We filter the results to only include pairs of years that are one year apart.
3. Find Restaurants with the Most Growth
Finally, we can find the restaurants with the most growth by ranking them based on their growth:
SELECT rid, year1, year2, cities_year2 - cities_year1 AS growth
FROM (
SELECT r1.rid, r1.year AS year1, r2.year AS year2,
COUNT(DISTINCT r1.city) AS cities_year1,
COUNT(DISTINCT r2.city) AS cities_year2
FROM Restaurant r1
JOIN Restaurant r2 ON r1.rid = r2.rid
WHERE r1.year < r2.year
GROUP BY r1.rid, r1.year, r2.year
) AS yearly_pairs
WHERE year2 - year1 = 1
ORDER BY growth DESC;
Real-World Applications
This query can be used to identify restaurants that are rapidly expanding their geographical footprint. This information can be valuable for investors, real estate developers, and city planners who are interested in tracking the growth of the restaurant industry in specific areas.
Complete Code Example
The following is a complete code example in SQL:
WITH RestaurantGrowth AS (
SELECT r1.rid, r1.year AS year1, r2.year AS year2,
COUNT(DISTINCT r2.city) - COUNT(DISTINCT r1.city) AS growth
FROM Restaurant r1
JOIN Restaurant r2 ON r1.rid = r2.rid
WHERE r1.year < r2.year
GROUP BY r1.rid, r1.year, r2.year
)
SELECT rid, year1, year2, growth
FROM RestaurantGrowth
WHERE year2 - year1 = 1
ORDER BY growth DESC;
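To see year-over-year growth computed as a difference of city counts, the self-join can be run on a small data set. The sketch below uses Python's sqlite3 module; the rows are made up, and it assumes that each row means "restaurant rid had a location in city during year", which is one possible reading of the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Restaurant (rid INTEGER, name TEXT, city TEXT, year INTEGER);
    INSERT INTO Restaurant VALUES
        (1, 'Noodle Bar', 'Lyon',  2020),
        (1, 'Noodle Bar', 'Lyon',  2021),
        (1, 'Noodle Bar', 'Nice',  2021),
        (1, 'Noodle Bar', 'Lille', 2021),
        (2, 'Taco Hut',   'Paris', 2020),
        (2, 'Taco Hut',   'Rouen', 2020),
        (2, 'Taco Hut',   'Paris', 2021);
""")

# For each restaurant and each pair of consecutive years, subtract the
# number of distinct cities in the earlier year from that of the later year.
growth_rows = conn.execute("""
    SELECT r1.rid, r1.year AS year1, r2.year AS year2,
           COUNT(DISTINCT r2.city) - COUNT(DISTINCT r1.city) AS growth
    FROM Restaurant r1
    JOIN Restaurant r2 ON r1.rid = r2.rid
    WHERE r2.year - r1.year = 1
    GROUP BY r1.rid, r1.year, r2.year
    ORDER BY growth DESC
""").fetchall()

for rid, year1, year2, growth in growth_rows:
    print(rid, year1, year2, growth)
```

A negative growth value, as for the second restaurant here, indicates that it operated in fewer cities than the year before.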
Convert Date Format
Problem: Convert Date Format
SQL Code:
SELECT DATE_FORMAT(input_date, '%Y-%m-%d') AS formatted_date
FROM table_name;
Breakdown:
DATE_FORMAT() Function: This MySQL function formats a date value as a string according to a format string.
%Y-%m-%d: This is the format string that specifies the desired output format. It represents "Year-Month-Day."
Example Input and Output:
| input_date | formatted_date |
|---|---|
| '2023-04-05' | '2023-04-05' |
| '2022-12-25' | '2022-12-25' |
| '2021-07-14' | '2021-07-14' |
Explanation:
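The DATE_FORMAT() function takes the input_date as an argument and converts it into the specified format. In this case, the format is "Year-Month-Day." This means that the output date string will be in the format YYYY-MM-DD. Because MySQL already displays DATE values as YYYY-MM-DD, the output in this example looks identical to the input. A different format string changes the result: '%W, %M %e, %Y' would turn 2023-04-05 into 'Wednesday, April 5, 2023'.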
Real-World Applications:
Format dates for display in user interfaces or reports.
Convert dates to a standard format for data exchange or storage.
Extract parts of a date, such as the year or the month name, for labelling or grouping results.
Tasks Count in the Weekend
Problem Statement
Given a table Tasks containing task information, including the date when each task was created (created_at) and its status, write a SQL query to count the number of tasks created during the weekend (Saturday and Sunday).
Table Schema
CREATE TABLE Tasks (
id INT PRIMARY KEY,
created_at TIMESTAMP NOT NULL,
status VARCHAR(255) NOT NULL
);
Query
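SELECT COUNT(*) AS weekend_task_count
FROM Tasks
WHERE DAYOFWEEK(created_at) IN (1, 7);
Explanation
The query uses MySQL's DAYOFWEEK() function, which returns 1 for Sunday through 7 for Saturday, to keep only the rows whose created_at value falls on a Saturday or Sunday, whatever the week. A fixed range such as BETWEEN '2022-08-13' AND '2022-08-14' would cover a single weekend only, and on a TIMESTAMP column it would also miss every task created after midnight at the start of Sunday, because the upper bound is read as '2022-08-14 00:00:00'. The COUNT(*) function counts the number of rows that meet the condition, providing the count of tasks created during the weekend.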
Real-World Applications
This query can be used in a project management system to track the number of tasks created during weekends, which can provide insights into project progress and potential workload issues. For example, if the number of weekend tasks is consistently high, it may indicate that the project timeline is too ambitious or that the team is understaffed.
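Filtering on the day of the week, rather than on a fixed date range, counts tasks from every weekend. The sketch below uses Python's sqlite3 module with made-up tasks. SQLite has no DAYOFWEEK(), so it uses strftime('%w', ...), which returns '0' for Sunday through '6' for Saturday; a fixed-range filter is included for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tasks (
        id INTEGER PRIMARY KEY,
        created_at TEXT NOT NULL,
        status TEXT NOT NULL
    );
    INSERT INTO Tasks VALUES
        (1, '2022-08-12 10:00:00', 'open'),  -- Friday
        (2, '2022-08-13 09:00:00', 'open'),  -- Saturday
        (3, '2022-08-14 18:30:00', 'done'),  -- Sunday evening
        (4, '2022-08-15 08:00:00', 'open'),  -- Monday
        (5, '2022-08-20 12:00:00', 'open');  -- Saturday of the next week
""")

# Day-of-week filter: matches Saturdays and Sundays in any week.
weekend_count = conn.execute("""
    SELECT COUNT(*) FROM Tasks
    WHERE strftime('%w', created_at) IN ('0', '6')
""").fetchone()[0]

# Fixed range: one weekend only, and the bare upper bound sorts before
# any timestamp later on that Sunday, so task 3 is missed.
range_count = conn.execute("""
    SELECT COUNT(*) FROM Tasks
    WHERE created_at BETWEEN '2022-08-13' AND '2022-08-14'
""").fetchone()[0]

print("day-of-week filter:", weekend_count)
print("fixed date range:  ", range_count)
```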
Users With Two Purchases Within Seven Days
Problem Statement
Find users who have made two or more purchases within a seven-day period.
SQL Solution
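SELECT DISTINCT user_id
FROM (
SELECT user_id, purchase_date,
LAG(purchase_date) OVER (PARTITION BY user_id ORDER BY purchase_date) AS prev_purchase_date
FROM purchases
) AS ordered_purchases
WHERE purchase_date <= DATE(prev_purchase_date, '+7 days');
Explanation
The inner query uses the LAG() window function to pair each purchase with the same user's previous purchase, in date order.
The WHERE clause keeps the purchases made no more than seven days after the previous one. A user's first purchase has no previous purchase, so prev_purchase_date is NULL and the row is dropped.
Comparing consecutive purchases is enough: if any two purchases of a user lie within seven days of each other, so do two consecutive ones.
Breakdown
PARTITION BY user_id ORDER BY purchase_date: Orders each user's purchases by date, independently of other users.
LAG(purchase_date): Returns the date of the previous purchase in that order.
DATE(prev_purchase_date, '+7 days'): Adds seven days to the previous purchase date (SQLite syntax; dates are assumed to be stored as YYYY-MM-DD).
DISTINCT: Ensures that each user is listed only once, however many close pairs of purchases they have.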
Real-World Applications
Identifying active users for targeted marketing campaigns.
Analyzing customer buying behavior to improve sales strategies.
Detecting fraudulent purchases by identifying users who make multiple purchases in a short period.
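Finding two purchases within any seven-day window, rather than within the last seven days, can be sketched by comparing each purchase with the same user's previous one. The example uses Python's sqlite3 module and made-up purchases; it relies on the LAG() window function (available in SQLite 3.25 and later) and assumes dates are stored as YYYY-MM-DD text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (user_id INTEGER, purchase_date TEXT);
    INSERT INTO purchases VALUES
        (1, '2023-01-01'), (1, '2023-01-05'),  -- 4 days apart
        (2, '2023-01-01'), (2, '2023-02-01'),  -- 31 days apart
        (3, '2023-03-10');                     -- a single purchase
""")

# Pair each purchase with the user's previous purchase, then keep the
# users for whom some pair is at most seven days apart.
repeat_buyers = conn.execute("""
    SELECT DISTINCT user_id
    FROM (
        SELECT user_id, purchase_date,
               LAG(purchase_date) OVER (
                   PARTITION BY user_id ORDER BY purchase_date
               ) AS prev_purchase_date
        FROM purchases
    )
    WHERE purchase_date <= DATE(prev_purchase_date, '+7 days')
    ORDER BY user_id
""").fetchall()

print(repeat_buyers)
```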
Daily Leads and Partners
Problem Statement:
You are given a table called leads that contains information about leads, including their lead_id and partner_id. The table also includes a date column and a column called daily_leads, which represents the number of leads generated by each partner on a specific day.
Write a SQL query to find the total number of daily leads generated by all partners on a specific day.
Breakdown:
Leads: A lead is a potential customer who has shown interest in a product or service.
Partner: A partner is a company or individual who collaborates with the business to generate leads.
Daily Leads: The number of leads generated by a partner on a specific day.
SQL Query:
SELECT SUM(daily_leads) AS total_daily_leads
FROM leads
WHERE date = '2023-03-08';
Explanation:
The query first filters the leads table to only include rows where the date column matches the specified date, which is '2023-03-08' in this example.
Then, it calculates the sum of the daily_leads column for all the filtered rows. This gives us the total number of daily leads generated by all partners on the specified date.
Real-World Applications:
This query can be used in a variety of real-world applications, such as:
Tracking Lead Generation Performance: Businesses can use this query to monitor how many leads their partners generate. With partner_id added to the SELECT list and a GROUP BY partner_id clause, the same query gives the total per partner, so that businesses can identify which partners are most effective and invest resources accordingly.
Optimizing Marketing Campaigns: Businesses can use this query to identify which days are most effective for lead generation. By analyzing the daily lead counts for different days of the week or month, they can plan their marketing campaigns accordingly.
Forecasting Lead Generation: Businesses can use this query to forecast future lead generation based on historical data. By analyzing the total daily leads generated over time, they can identify trends and predict future lead volume.
Top Percentile Fraud
Problem: Top Percentile Fraud
SQL Query:
WITH RankedTransactions AS (
SELECT
user_id,
transaction_amount,
ROW_NUMBER() OVER (ORDER BY transaction_amount DESC) AS transaction_rank,
COUNT(*) OVER () AS total_transactions
FROM Transactions
),
TopPercentile AS (
SELECT
user_id
FROM RankedTransactions
WHERE
transaction_rank <= CEILING(0.01 * total_transactions)
)
SELECT DISTINCT
user_id
FROM TopPercentile;
Explanation:
Create a Ranked Transactions Table:
This subquery ranks all transactions in descending order of transaction amount and records the total number of transactions on every row.
Calculate the 99th Percentile Cutoff:
This subquery uses the CEILING function to find how many rows make up the top 1% of transactions, that is, those at or above the 99th percentile.
Identify Top Percentile Users:
The main query returns each user who has at least one transaction within that top 1%.
Steps in Detail:
Window Function (ROW_NUMBER):
The ROW_NUMBER() function assigns a unique rank to each row. The rows are not partitioned by user_id, because the transactions of all users have to be compared with one another. They are ranked by transaction_amount in descending order, so rank 1 is the largest transaction.
99th Percentile Calculation:
COUNT(*) OVER () gives the total number of transactions. It is computed in the first CTE because window functions are not allowed in a WHERE clause. The CEILING function rounds 1% of that total up to a whole number, which is the largest rank that still falls within the top 1%.
User Identification:
The main query selects the distinct user_id values from the TopPercentile CTE.
Real-World Applications:
Fraud Detection: Identifying users who exhibit unusually high transaction amounts, potentially indicating fraudulent activity.
Customer Segmentation: Classifying customers into different tiers based on their transaction activity, enabling targeted marketing campaigns.
Risk Management: Assessing the risk associated with individual users based on their transaction history.
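One way to flag users with transactions in the top 1% by amount is to rank all transactions globally and compare each rank with a cutoff derived from the total row count. The sketch below uses Python's sqlite3 module with 200 generated transactions; the data and the user assignment are made up, and integer arithmetic stands in for CEILING(), which not every SQLite build provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions (user_id INTEGER, transaction_amount INTEGER)"
)
# Amounts 1..200, spread over five users (ids 1..5).
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?)",
    [((amount % 5) + 1, amount) for amount in range(1, 201)],
)

# Rank every transaction by amount and carry the total count on each row,
# so the cutoff can be applied in an ordinary WHERE clause.
flagged_users = conn.execute("""
    WITH RankedTransactions AS (
        SELECT user_id,
               ROW_NUMBER() OVER (ORDER BY transaction_amount DESC) AS transaction_rank,
               COUNT(*) OVER () AS total_transactions
        FROM Transactions
    )
    SELECT DISTINCT user_id
    FROM RankedTransactions
    -- (n + 99) / 100 is the integer ceiling of 1% of n
    WHERE transaction_rank <= (total_transactions + 99) / 100
    ORDER BY user_id
""").fetchall()

print(flagged_users)
```

With 200 rows the cutoff is 2, so only the owners of the two largest transactions are returned.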
Page Recommendations II
Problem:
Find all the web pages that are not visited by any users.
SQL Query:
SELECT
p.url
FROM pages AS p
LEFT JOIN visits AS v
ON p.id = v.page_id
WHERE
v.page_id IS NULL;
Breakdown:
SELECT p.url: This line selects the url column of the pages table (p).
FROM pages AS p: This line specifies that we are selecting from the "pages" table, and we are aliasing it as "p".
LEFT JOIN visits AS v ON p.id = v.page_id: This line performs a LEFT JOIN between the "pages" table and the "visits" table (v) based on the id column of the "pages" table and the page_id column of the "visits" table. A LEFT JOIN will return all rows from the left table (in this case, "pages"), even if there are no matching rows in the right table ("visits").
WHERE v.page_id IS NULL: This line filters the results to only include rows where the page_id column in the "visits" table is NULL. This means that these are pages that have not been visited by any users.
Real World Example:
This query can be used by website administrators to identify web pages that are not getting any traffic. This information can be used to make decisions about which pages to remove or update.
Potential Applications:
Identifying underperforming web pages for SEO optimization.
Removing unused pages to improve website performance.
Finding orphaned pages that can be removed to avoid security vulnerabilities.
Project Employees II
Problem:
You are given a database table Projects with the following columns:
project_id (int)
project_name (string)
num_employees (int)
And another table Employees with the following columns:
employee_id (int)
project_id (int)
employee_name (string)
You need to write a SQL query to find all projects that have more employees than the average number of employees across all projects.
Solution:
SELECT
Projects.project_id,
Projects.project_name,
Projects.num_employees
FROM Projects
JOIN (
SELECT
AVG(num_employees) AS avg_num_employees
FROM Projects
) AS AverageEmployees
ON Projects.num_employees > AverageEmployees.avg_num_employees;
Breakdown:
Calculate the average number of employees: We calculate the average number of employees across all projects using a subquery:
SELECT AVG(num_employees) AS avg_num_employees FROM Projects
Join the Projects table with the AverageEmployees subquery: We join the Projects table with the AverageEmployees subquery to compare the number of employees in each project with the average number of employees. The ON clause specifies that we only want to include projects where the number of employees is greater than the average. Because the subquery returns exactly one row, the join acts as a filter:
JOIN ( SELECT AVG(num_employees) AS avg_num_employees FROM Projects ) AS AverageEmployees ON Projects.num_employees > AverageEmployees.avg_num_employees
Select the desired columns: We select the project_id, project_name, and num_employees columns from the Projects table for the resulting rows:
SELECT Projects.project_id, Projects.project_name, Projects.num_employees
Real-World Applications:
This query can be used in various real-world scenarios, such as:
Identifying projects that are overstaffed or understaffed.
Planning staffing levels for new or ongoing projects.
Analyzing the efficiency of project teams based on the number of employees assigned.
Sales Person
Problem Statement
Given a table containing sales data, implement a query to find the top salespersons for each month.
SQL Implementation
WITH MonthlySales AS (
SELECT salesperson_id, MONTH(sale_date) AS sale_month, SUM(sales_amount) AS total_sales
FROM sales_data
GROUP BY salesperson_id, sale_month
)
SELECT salesperson_id, sale_month, total_sales
FROM MonthlySales
ORDER BY sale_month, total_sales DESC
LIMIT 10;
Breakdown and Explanation
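MonthlySales CTE:
This common table expression (not a view) calculates the monthly sales for each salesperson.
MONTH(sale_date) extracts the month number from the sale_date column. Because the year is discarded, sales from the same month of different years are combined.
SUM(sales_amount) calculates the total sales for each salesperson-month combination.
Main Query:
The main query orders the results by sale_month in ascending order and then by total_sales in descending order, so the top salespersons appear first within each month.
LIMIT 10 keeps only the first 10 rows of the whole result, not 10 rows per month. Limiting the result per month requires a window function such as ROW_NUMBER() partitioned by sale_month.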
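A per-month limit can be sketched with ROW_NUMBER(). The example below is an illustration rather than the original solution: it runs on SQLite through Python's sqlite3 module, so strftime('%Y-%m', ...) stands in for MONTH(), and the function name, the top_n parameter, and the pos column are assumptions. Window functions require SQLite 3.25 or later.

```python
import sqlite3

def top_salespersons_per_month(sales, top_n=1):
    """Return (sale_month, salesperson_id, total_sales) for the top_n sellers of each month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_data (salesperson_id INTEGER, sale_date TEXT, sales_amount INTEGER)"
    )
    conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?)", sales)
    rows = conn.execute("""
        WITH MonthlySales AS (
            SELECT salesperson_id,
                   strftime('%Y-%m', sale_date) AS sale_month,
                   SUM(sales_amount) AS total_sales
            FROM sales_data
            GROUP BY salesperson_id, strftime('%Y-%m', sale_date)
        ),
        Ranked AS (
            SELECT salesperson_id, sale_month, total_sales,
                   -- Number the rows inside each month, best seller first
                   ROW_NUMBER() OVER (
                       PARTITION BY sale_month
                       ORDER BY total_sales DESC, salesperson_id
                   ) AS pos
            FROM MonthlySales
        )
        SELECT sale_month, salesperson_id, total_sales
        FROM Ranked
        WHERE pos <= ?
        ORDER BY sale_month, pos
    """, (top_n,)).fetchall()
    conn.close()
    return rows

sales = [
    (1, "2023-01-01", 100),
    (2, "2023-01-05", 150),
    (1, "2023-02-10", 200),
    (3, "2023-02-15", 250),
    (1, "2023-02-20", 300),
]
print(top_salespersons_per_month(sales))  # [('2023-01', 2, 150), ('2023-02', 1, 500)]
```

Keeping the year in sale_month avoids mixing January 2023 with January of another year; the salesperson_id tie-breaker is only there to make the order deterministic.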
Real-World Applications
Tracking sales performance over time.
Identifying underperforming and overperforming salespersons.
Making informed decisions on sales strategies and incentives.
Example
Consider the following sales data:
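| salesperson_id | sale_date | sales_amount |
|---|---|---|
| 1 | 2023-01-01 | 100 |
| 2 | 2023-01-05 | 150 |
| 1 | 2023-02-10 | 200 |
| 3 | 2023-02-15 | 250 |
| 1 | 2023-02-20 | 300 |
The result of the query would be:
| salesperson_id | sale_month | total_sales |
|---|---|---|
| 2 | 1 | 150 |
| 1 | 1 | 100 |
| 1 | 2 | 500 |
| 3 | 2 | 250 |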
Find the Team Size
Problem: Find the Team Size
SQL Query:
WITH TeamSizes AS (
SELECT team_id, COUNT(*) AS team_size
FROM team_members
GROUP BY team_id
)
SELECT team_id, team_size
FROM TeamSizes
ORDER BY team_size DESC;
Breakdown and Explanation:
Common Table Expression (CTE): TeamSizes
This CTE calculates the team size for each team.
It uses the COUNT() function to count the number of rows for each team_id in the team_members table.
The result is a named temporary result set, TeamSizes, with columns team_id and team_size. It exists only for the duration of the query and is not stored as a table.
Main Query
The main query selects team_id and team_size from the TeamSizes CTE.
It sorts the results in descending order of team_size, displaying the teams with the largest sizes first.
Real-World Example:
This query can be used to find the teams with the largest number of members in a company or organization. It can help with:
Identifying teams that need more resources or support.
Optimizing team structure for efficiency and collaboration.
Tracking team growth over time.
Code Implementation:
-- Create the `team_members` table
CREATE TABLE team_members (
member_id INT NOT NULL,
team_id INT NOT NULL
);
-- Insert data into the `team_members` table
INSERT INTO team_members (member_id, team_id) VALUES
(1, 1),
(2, 1),
(3, 2),
(4, 2),
(5, 3),
(6, 3),
(7, 4);
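-- Execute the query to find the team sizes
-- (the CTE must be defined in the same statement that uses it)
WITH TeamSizes AS (
SELECT team_id, COUNT(*) AS team_size
FROM team_members
GROUP BY team_id
)
SELECT team_id, team_size
FROM TeamSizes
ORDER BY team_size DESC;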
Output (teams 1, 2, and 3 are tied at two members each, so their relative order is not guaranteed):
team_id | team_size
------- | --------
2 | 2
3 | 2
1 | 2
4 | 1
Capital Gain/Loss
Problem:
Given a table Transactions
with the following columns:
transaction_id (primary key)
stock_symbol
quantity
price_per_share
transaction_date
Calculate the capital gain or loss for each transaction.
Solution:
SELECT
transaction_id,
stock_symbol,
quantity,
price_per_share,
transaction_date,
(
(price_per_share - previous_price_per_share) * quantity
) AS capital_gain_loss
FROM (
SELECT
t.transaction_id,
t.stock_symbol,
t.quantity,
t.price_per_share,
t.transaction_date,
(
SELECT
price_per_share
FROM Transactions
WHERE
stock_symbol = t.stock_symbol AND transaction_date < t.transaction_date
ORDER BY
transaction_date DESC
LIMIT 1
) AS previous_price_per_share
FROM Transactions AS t
) AS derived_table;
Explanation:
The inner query (derived_table) adds previous_price_per_share to each transaction using a correlated subquery. This subquery finds the most recent transaction for the same stock symbol before the current transaction date and returns its price per share.
The outer query selects the original columns and calculates capital_gain_loss by multiplying the quantity by the difference between the current price per share and the previous price per share.
If a stock has no earlier transaction, previous_price_per_share is NULL, and so is capital_gain_loss.
Example:
| transaction_id | stock_symbol | quantity | price_per_share | transaction_date | capital_gain_loss |
|---|---|---|---|---|---|
| 1 | AAPL | 100 | 100.00 | 2023-01-01 | NULL |
| 2 | AAPL | 50 | 120.00 | 2023-01-02 | 1000.00 |
| 3 | MSFT | 200 | 50.00 | 2023-01-03 | NULL |
| 4 | MSFT | 150 | 45.00 | 2023-01-04 | -750.00 |
Explanation:
Transaction 1 does not have a previous transaction, so its capital gain/loss is NULL.
Transaction 2 has a previous transaction (Transaction 1) with a price per share of 100.00. The capital gain/loss is (120.00 - 100.00) * 50 = 1000.00.
Transaction 3 does not have a previous transaction, so its capital gain/loss is NULL.
Transaction 4 has a previous transaction (Transaction 3) with a price per share of 50.00. The capital gain/loss is (45.00 - 50.00) * 150 = -750.00.
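The same figures can be reproduced with the LAG() window function in place of the correlated subquery. This is a sketch of an alternative technique, not the query shown above; it runs on SQLite (3.25 or later) through Python's sqlite3 module, and the function name capital_gain_loss is an assumption.

```python
import sqlite3

def capital_gain_loss(transactions):
    """Return (transaction_id, capital_gain_loss) using LAG() for the previous price."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE Transactions (
            transaction_id INTEGER PRIMARY KEY,
            stock_symbol TEXT,
            quantity INTEGER,
            price_per_share REAL,
            transaction_date TEXT
        )
    """)
    conn.executemany("INSERT INTO Transactions VALUES (?, ?, ?, ?, ?)", transactions)
    rows = conn.execute("""
        SELECT transaction_id,
               -- LAG() returns NULL for the first transaction of each stock
               (price_per_share - LAG(price_per_share) OVER (
                    PARTITION BY stock_symbol ORDER BY transaction_date
               )) * quantity AS capital_gain_loss
        FROM Transactions
        ORDER BY transaction_id
    """).fetchall()
    conn.close()
    return rows

transactions = [
    (1, "AAPL", 100, 100.00, "2023-01-01"),
    (2, "AAPL", 50, 120.00, "2023-01-02"),
    (3, "MSFT", 200, 50.00, "2023-01-03"),
    (4, "MSFT", 150, 45.00, "2023-01-04"),
]
print(capital_gain_loss(transactions))
# [(1, None), (2, 1000.0), (3, None), (4, -750.0)]
```

LAG() scans each partition once, which usually scales better than running a correlated subquery for every row.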
Potential Applications:
Calculating capital gains and losses is important for tax reporting purposes. This query can be used to create a report that shows the capital gains and losses for a given period of time. This information can then be used to determine the amount of taxes that need to be paid.
Market Analysis III
LeetCode Problem: Market Analysis III
SQL Query:
SELECT Market, SUM(Revenue) AS TotalRevenue
FROM RevenueTable
WHERE Market IN (
SELECT Market
FROM RevenueTable
GROUP BY Market
HAVING SUM(Revenue) > (
SELECT SUM(Revenue)
FROM RevenueTable
WHERE Market = 'US'
)
)
GROUP BY Market
ORDER BY TotalRevenue DESC;
Explanation:
1. Subquery:
The subquery
(SELECT Market FROM RevenueTable GROUP BY Market HAVING SUM(Revenue) > (SELECT SUM(Revenue) FROM RevenueTable WHERE Market = 'US'))
identifies all markets with total revenue greater than the total revenue in the US market.
2. Main Query:
The main query selects the markets from the subquery and then groups them by market to calculate the total revenue for each market.
The results are sorted in descending order of total revenue.
Example:
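Consider the following RevenueTable:
| Market | Revenue |
|---|---|
| US | 100,000 |
| UK | 75,000 |
| Canada | 60,000 |
| France | 55,000 |
| Germany | 70,000 |
Results:
The query returns no rows for this data, because no market has a total revenue greater than the US total of 100,000.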
Real-World Applications:
Market Analysis: Identifying markets with high revenue potential and comparing them to a benchmark market (e.g., US market).
Sales Optimization: Prioritizing sales efforts on markets with the highest revenue growth.
Competitive Analysis: Monitoring market share and identifying competitive threats from other markets.
Monthly Transactions I
Topic or Step 1: Understanding the Question
Question: Find the total number of transactions for each month.
Breakdown:
Transaction: An activity involving the exchange of money or goods and services for a particular sum of money.
Month: A period of about 30 or 31 days.
To answer the question, you need to count the number of transactions for each month.
Topic or Step 2: SQL Solution
SELECT
strftime('%Y-%m', Date) AS Month, -- Extract the year and month from the Date column
COUNT(*) AS TotalTransactions -- Count the number of rows for each unique Month
FROM
Transactions -- The table containing the transaction records
GROUP BY
Month -- Group the transactions by Month
ORDER BY
Month; -- Order the results by Month
Breakdown:
The strftime() function extracts the year and month from the Date column, formatting it as 'YYYY-MM'. This form of strftime() is SQLite syntax; other databases use functions such as DATE_FORMAT() in MySQL or TO_CHAR() in PostgreSQL.
The COUNT(*) function counts the number of rows for each unique Month.
The GROUP BY clause groups the transactions by Month to count the transactions separately for each month.
The ORDER BY clause orders the results by Month in ascending order.
Simplified Explanation:
Imagine a table with a list of transactions, each having a Date column. To get the total transactions for each month, we first extract the year and month from the Date column. Then, we count the number of transactions for each unique year-month combination. Finally, we sort the results by month to make them easy to read.
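Because the query uses SQLite's strftime(), it can be run directly from Python's sqlite3 module. The sketch below assumes a Transactions table that has only a Date column stored as ISO text; the function name and the sample dates are illustrative.

```python
import sqlite3

def monthly_transaction_counts(dates):
    """Count transactions per 'YYYY-MM' month for a list of ISO date strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Transactions (Date TEXT)")
    conn.executemany("INSERT INTO Transactions (Date) VALUES (?)", [(d,) for d in dates])
    rows = conn.execute("""
        SELECT strftime('%Y-%m', Date) AS Month,
               COUNT(*) AS TotalTransactions
        FROM Transactions
        GROUP BY Month
        ORDER BY Month
    """).fetchall()
    conn.close()
    return rows

dates = ["2023-01-05", "2023-01-20", "2023-02-11"]
print(monthly_transaction_counts(dates))  # [('2023-01', 2), ('2023-02', 1)]
```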
Topic or Step 3: Real-World Applications
This query can be used in many real-world applications, such as:
Financial analysis: Tracking the number of transactions per month to monitor business trends and identify any seasonal patterns.
Customer relationship management: Identifying months with the highest transaction volumes to target customers with personalized offers.
Fraud detection: Analyzing transaction patterns to detect suspicious activity or identify potential fraud.
Find Cumulative Salary of an Employee
Problem Statement:
Given an employee table with columns (emp_id, salary, start_date, and end_date), find the cumulative salary of each employee for the specified date range.
Optimal Solution:
SELECT emp_id,
SUM(salary) OVER (PARTITION BY emp_id ORDER BY start_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_salary
FROM employee
WHERE start_date <= @end_date AND end_date >= @start_date;
Explanation:
Window Function: The OVER clause defines a window of rows for each employee, running from the employee's first row that passes the WHERE filter up to the current row.
SUM Aggregate Function: Within each window, the SUM aggregate function calculates the cumulative salary by summing up the salary values.
PARTITION BY Clause: The PARTITION BY clause groups the rows by employee ID, ensuring that the cumulative salary is calculated separately for each employee.
ORDER BY Clause: The ORDER BY clause sorts the rows by start date in ascending order. This order is necessary for the window function to correctly calculate the cumulative salary.
ROWS BETWEEN Syntax: The ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame specifies that the window extends from the first row of the partition up to the current row.
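The running total can be observed on a few rows. This is a minimal sketch on SQLite (3.25 or later) via Python's sqlite3 module; named parameters :start_date and :end_date replace the @start_date and @end_date variables, and the function name and sample records are assumptions.

```python
import sqlite3

def cumulative_salary(records, start_date, end_date):
    """Return (emp_id, start_date, cumulative_salary) for rows overlapping the date range."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (emp_id INTEGER, salary INTEGER, start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", records)
    rows = conn.execute("""
        SELECT emp_id, start_date,
               SUM(salary) OVER (
                   PARTITION BY emp_id
                   ORDER BY start_date
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS cumulative_salary
        FROM employee
        WHERE start_date <= :end_date AND end_date >= :start_date
        ORDER BY emp_id, start_date
    """, {"start_date": start_date, "end_date": end_date}).fetchall()
    conn.close()
    return rows

# Hypothetical salary periods; the March row falls outside the requested range
records = [
    (1, 1000, "2023-01-01", "2023-01-31"),
    (1, 1200, "2023-02-01", "2023-02-28"),
    (1, 1500, "2023-03-01", "2023-03-31"),
    (2, 900, "2023-01-01", "2023-01-31"),
]
print(cumulative_salary(records, "2023-01-15", "2023-02-15"))
# [(1, '2023-01-01', 1000), (1, '2023-02-01', 2200), (2, '2023-01-01', 900)]
```

The WHERE clause is applied before the window function, so rows outside the date range never contribute to the running total.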
Real-World Applications:
This query can be used in HR or payroll systems to generate reports on employee salaries, bonuses, or other compensation-related information. For example, it can provide insights into an employee's salary progression over time or help determine total compensation for a given period.
All the Matches of the League
Problem: Find all the matches of a league.
Solution:
SELECT
*
FROM MATCHES
WHERE
LEAGUE_ID = ?;
Explanation:
The query is straightforward. It selects all the columns from the MATCHES table where the LEAGUE_ID is equal to the provided parameter.
Real World Applications:
This query can be used to find all the matches of a particular league, such as the Premier League or the Bundesliga. Finding the matches played in a particular stadium or city would follow the same pattern with a filter on the corresponding column instead of LEAGUE_ID, provided the table has such a column.
Example:
The following query finds all the matches that have been played in the Premier League, assuming its LEAGUE_ID is 1:
SELECT
*
FROM MATCHES
WHERE
LEAGUE_ID = 1;
Sort the Olympic Table
Problem Statement:
You are given an Olympic table with columns:
Country (string): Name of the country
Gold (integer): Number of gold medals won
Silver (integer): Number of silver medals won
Bronze (integer): Number of bronze medals won
Total (integer): Total number of medals won (Gold + Silver + Bronze)
Sort the Olympic table to show countries with the most medals (Total) first. If two or more countries have the same number of medals, sort them by the most gold medals won, then by the most silver medals won, and finally by the most bronze medals won.
Best and Performant SQL Solution:
SELECT Country,
Gold,
Silver,
Bronze,
Total
FROM OlympicTable
ORDER BY Total DESC,
Gold DESC,
Silver DESC,
Bronze DESC;
Breakdown and Explanation:
The query uses the ORDER BY clause to sort the results by multiple columns:
Total DESC: Sort the countries in descending order by the total number of medals won.
Gold DESC: If two or more countries have the same total, sort them in descending order by the number of gold medals won.
Silver DESC: If they also have the same number of gold medals, sort them in descending order by the number of silver medals won.
Bronze DESC: If they also have the same number of silver medals, sort them in descending order by the number of bronze medals won. Because Total is the sum of the three counts, countries that tie on Total, Gold, and Silver necessarily tie on Bronze as well, so this last key never changes the order.
Real-World Applications:
This query can be used to display the rankings of countries in the Olympics or any other sporting event where medals are awarded. It can also be used for data analysis to identify the countries with the most successful Olympic programs or the most medals won in a particular sport.
All the Pairs With the Maximum Number of Common Followers
Problem Description:
You are given a table followers
in which each row records that the account follower_id follows the account followee_id. You need to find all the pairs of accounts that have the maximum number of common followers, that is, followers who follow both accounts of the pair.
Example:
Consider the following table:
followers (follower_id, followee_id)
+------------+-------------+
| follower_id | followee_id |
+------------+-------------+
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 3 |
| 2 | 4 |
| 3 | 4 |
+------------+-------------+
In this table, the pair (3, 4)
has the maximum number of common followers (2): accounts 1 and 2 both follow 3 and 4.
Solution:
The solution to this problem involves the following steps, assuming each (follower_id, followee_id) row appears only once:
Count the common followers of each pair of accounts. Joining the table to itself on follower_id pairs up every two accounts followed by the same follower, and the condition f1.followee_id < f2.followee_id lists each pair only once:
SELECT f1.followee_id AS user1_id, f2.followee_id AS user2_id, COUNT(*) AS common_count FROM followers AS f1 JOIN followers AS f2 ON f1.follower_id = f2.follower_id AND f1.followee_id < f2.followee_id GROUP BY f1.followee_id, f2.followee_id;
This query returns a table with the following columns:
user1_id: The ID of the first account in the pair.
user2_id: The ID of the second account in the pair.
common_count: The number of followers the two accounts have in common.
Find the maximum number of common followers. With the first query wrapped in a CTE named PairCounts, this can be done using the following query:
SELECT MAX(common_count) AS max_common_count FROM PairCounts;
This query returns a single row with a single column called max_common_count.
Find all the pairs of accounts that have the maximum number of common followers. This can be done using the following query:
WITH PairCounts AS ( SELECT f1.followee_id AS user1_id, f2.followee_id AS user2_id, COUNT(*) AS common_count FROM followers AS f1 JOIN followers AS f2 ON f1.follower_id = f2.follower_id AND f1.followee_id < f2.followee_id GROUP BY f1.followee_id, f2.followee_id ) SELECT user1_id, user2_id FROM PairCounts WHERE common_count = ( SELECT MAX(common_count) FROM PairCounts );
This query returns a table with the following columns:
user1_id: The ID of the first account in the pair.
user2_id: The ID of the second account in the pair.
Real-World Application:
This problem can be used to find the most influential users on a social media platform. By finding the pairs of users that have the maximum number of common followers, we can identify the users who are most likely to reach a large audience. This information can be used to target marketing campaigns and to develop strategies for growing the user base.
Investments in 2016
Problem:
Given a table of investments made in 2016, find the total amount invested in each country.
Table:
CREATE TABLE investments (
id INT AUTO_INCREMENT PRIMARY KEY,
country VARCHAR(255),
amount INT
);
INSERT INTO investments (country, amount) VALUES
('USA', 1000),
('UK', 500),
('France', 700),
('Germany', 300),
('Spain', 200);
Best & Performant Solution:
SELECT country, SUM(amount) AS total_investment
FROM investments
GROUP BY country;
Explanation:
SELECT country, SUM(amount) AS total_investment: This part of the query selects the country column and calculates the sum of the amount column for each unique value in the country column. The result is a new column named total_investment which contains the total investment amount for each country.
FROM investments: This specifies the table from which the data should be retrieved.
GROUP BY country: This part of the query groups the results by the country column. This means that for each unique value in the country column, the SUM(amount) will be calculated separately.
Real-World Example:
This query can be useful for analyzing investment data and identifying the countries with the highest investment levels. This information can be used to make investment decisions or to understand investment trends.
Potential Applications:
Investment analysis
Financial planning
Economic development
Fix Names in a Table
Problem: There is a table called "Names" with the following schema:
CREATE TABLE Names (
id INT NOT NULL AUTO_INCREMENT,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
PRIMARY KEY (id)
);
Some of the first names and last names in the table are misspelled. Fix these misspellings by updating the table.
Solution:
-- Update first names
UPDATE Names
SET first_name = CASE
WHEN first_name = 'Johen' THEN 'Johan'
WHEN first_name = 'Kris' THEN 'Chris'
WHEN first_name = 'Mihal' THEN 'Michael'
ELSE first_name
END;
-- Update last names
UPDATE Names
SET last_name = CASE
WHEN last_name = 'Smuth' THEN 'Smith'
WHEN last_name = 'Jonhson' THEN 'Johnson'
WHEN last_name = 'Davids' THEN 'Davis'
ELSE last_name
END;
Explanation:
The first UPDATE statement uses a CASE expression to check for misspelled first names. If a first name matches one of the misspellings, it is updated to the correct spelling. Otherwise, the first name is left unchanged.
The second UPDATE statement does the same for last names.
Real World Application:
This technique can be used to fix a known list of misspellings in any text column. It is particularly useful for tables that contain a large number of records, where correcting the misspellings manually is not feasible.
League Statistics
Problem Statement:
Given a table LeagueStatistics with columns:
rank: Rank of the team (1 for the top team)
team_name: Unique name of the team
score: Score of the team in the season
You need to find the average score of the top 3 teams in the league.
Solution:
-- Find the average score of the top 3 teams in the league
SELECT AVG(score)
FROM LeagueStatistics
WHERE rank <= 3;
Explanation:
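We first filter the LeagueStatistics table to select only the top 3 teams by using the condition rank <= 3.
Then, we calculate the average score of the selected teams using the AVG() function.
Note that RANK is a reserved word in MySQL 8.0 and later, where the column name must be quoted with backticks.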
Example:
-- Input
CREATE TABLE LeagueStatistics (
rank INT,
team_name VARCHAR(255),
score INT
);
INSERT INTO LeagueStatistics (rank, team_name, score) VALUES
(1, 'Liverpool', 95),
(2, 'Manchester City', 92),
(3, 'Chelsea', 88),
(4, 'Tottenham', 78),
(5, 'Arsenal', 75);
-- Query
SELECT AVG(score)
FROM LeagueStatistics
WHERE rank <= 3;
-- Output
91.66666666666667
Real-World Applications:
This query can be useful for:
Determining the average performance of the top teams in a sports league
Analyzing team performance and identifying areas for improvement
Comparing the strength of different leagues or divisions
Providing insights for sports analysts and fans
Last Person to Fit in the Bus
Problem Statement:
Given a table of bus stops trips with columns stop_id, stop_sequence, and trip_id, find the stop at which the last person boarded the bus on each trip.
Example:
| stop_id | stop_sequence | trip_id |
|---------|---------------|---------|
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 3 | 1 |
| 4 | 4 | 1 |
| 5 | 1 | 2 |
| 6 | 2 | 2 |
| 7 | 3 | 2 |
For trip_id 1, the last person boarded at stop_id 4. For trip_id 2, the last person boarded at stop_id 7.
Best & Performant SQL Solution:
WITH LastBoarding AS (
SELECT trip_id, MAX(stop_sequence) AS last_sequence
FROM trips
GROUP BY trip_id
)
SELECT t.trip_id, t.stop_id
FROM trips AS t
JOIN LastBoarding AS lb
ON t.trip_id = lb.trip_id
AND t.stop_sequence = lb.last_sequence;
Explanation:
Create a CTE (LastBoarding): This CTE groups the rows by trip_id only and, for each trip, calculates the maximum stop_sequence, which represents the last boarding stop. Grouping by stop_id as well would put every row in its own group and make the maximum meaningless.
Join Back to trips: The main query joins the trips table to LastBoarding on trip_id and keeps only the rows whose stop_sequence equals the last_sequence of that trip.
Find Last Boarding Stop: The remaining rows, one per trip, give the stop_id of the last boarding stop. For the example table this is stop 4 for trip 1 and stop 7 for trip 2.
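Finding the last boarding stop of each trip comes down to taking the maximum stop_sequence per trip_id and joining it back to the table. The following sketch checks that approach against the example table with Python's sqlite3 module; the function name last_boarding_stops is an assumption.

```python
import sqlite3

def last_boarding_stops(trips):
    """Return (trip_id, stop_id) of the stop with the highest stop_sequence on each trip."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (stop_id INTEGER, stop_sequence INTEGER, trip_id INTEGER)")
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", trips)
    rows = conn.execute("""
        WITH LastBoarding AS (
            SELECT trip_id, MAX(stop_sequence) AS last_sequence
            FROM trips
            GROUP BY trip_id
        )
        SELECT t.trip_id, t.stop_id
        FROM trips AS t
        JOIN LastBoarding AS lb
          ON t.trip_id = lb.trip_id
         AND t.stop_sequence = lb.last_sequence
        ORDER BY t.trip_id
    """).fetchall()
    conn.close()
    return rows

trips = [
    (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 4, 1),
    (5, 1, 2), (6, 2, 2), (7, 3, 2),
]
print(last_boarding_stops(trips))  # [(1, 4), (2, 7)]
```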
Real-World Applications:
This query can be useful in the following scenarios:
Tracking Passenger Flow: Determining the last stop where passengers boarded a bus helps transportation authorities analyze passenger flow patterns and optimize bus routes.
Security Monitoring: Identifying the last boarding stop can assist in security investigations and tracing the movement of individuals.
Passenger Assistance: It can help bus drivers identify the stops where special assistance is needed, such as wheelchair accessibility or language translation.
Number of Comments per Post
Leet Code Problem:
Number of Comments per Post
Problem Statement:
Given a table called Posts with columns id, title, and num_comments, and a table called Comments with columns id, post_id, and content, find the number of comments for each post in the Posts table.
SQL Solution:
SELECT
p.id AS post_id,
COUNT(c.id) AS num_comments
FROM Posts AS p
LEFT JOIN Comments AS c
ON p.id = c.post_id
GROUP BY
p.id;
Breakdown and Explanation:
SELECT Column List: The SELECT clause specifies the columns we want to include in the result:
p.id AS post_id: The ID of the post.
COUNT(c.id) AS num_comments: The number of comments for the post. COUNT(c.id) ignores NULL values, so a post without comments gets 0.
FROM Clause: The FROM clause specifies the table(s) we're querying from:
Posts AS p: The Posts table aliased as p.
JOIN Clause: The LEFT JOIN clause joins the Posts table with the Comments table on the post_id column:
LEFT JOIN Comments AS c ON p.id = c.post_id: This ensures that we include all posts, even if they don't have any comments.
GROUP BY Clause: The GROUP BY clause groups the results by the post ID:
GROUP BY p.id: This ensures that the number of comments is calculated for each unique post.
Real-World Application:
This query could be used in a social media or blog application to display the number of comments for each post. This information can be useful for users to quickly see how engaged a post is.
Customers With Strictly Increasing Purchases
Problem Statement: Find the customers who have made a strictly increasing sequence of purchases. A strictly increasing sequence means that each purchase amount is strictly greater than the previous purchase amount.
Optimal Solution in SQL:
WITH CustomerPurchaseLag AS (
SELECT
customer_id,
purchase_amount,
LAG(purchase_amount) OVER (PARTITION BY customer_id ORDER BY purchase_date) AS previous_purchase_amount
FROM Purchases
)
SELECT
customer_id
FROM CustomerPurchaseLag
GROUP BY
customer_id
HAVING
SUM(CASE WHEN purchase_amount <= previous_purchase_amount THEN 1 ELSE 0 END) = 0;
Explanation:
CustomerPurchaseLag: This CTE (Common Table Expression) adds a column called previous_purchase_amount to each purchase. The LAG() function retrieves the amount of the customer's previous purchase, ordered by purchase date, and returns NULL for the customer's first purchase.
Final Query: The main query groups the rows by customer and keeps only the customers for whom no purchase is less than or equal to the previous one. Checking individual rows is not enough, because a customer with one increase followed by a decrease would still have a row where the amount went up. The comparison with NULL on the first purchase does not count as a violation, so a customer with a single purchase also qualifies.
Example:
Given the following Purchases
table:
| customer_id | purchase_amount | purchase_date |
|-------------|-----------------|---------------|
| 1 | 100 | 2023-01-01 |
| 1 | 150 | 2023-01-02 |
| 1 | 180 | 2023-01-03 |
| 1 | 200 | 2023-01-04 |
| 2 | 200 | 2023-02-01 |
| 2 | 150 | 2023-02-02 |
The output of the query would be:
| customer_id |
|-------------|
| 1 |
Customer 1 has made a strictly increasing sequence of purchases, while Customer 2 has not (their February 2nd purchase was lower than their February 1st purchase).
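A check over all of a customer's purchases can be sketched with LAG() and a HAVING clause. The example runs on SQLite (3.25 or later) through Python's sqlite3 module; the function name is an assumption, and customer 3 is a hypothetical addition to the example table that shows why a single increase is not enough.

```python
import sqlite3

def strictly_increasing_customers(purchases):
    """Return the IDs of customers whose purchase amounts never stay flat or drop."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Purchases (customer_id INTEGER, purchase_amount INTEGER, purchase_date TEXT)"
    )
    conn.executemany("INSERT INTO Purchases VALUES (?, ?, ?)", purchases)
    rows = conn.execute("""
        WITH CustomerPurchaseLag AS (
            SELECT customer_id,
                   purchase_amount,
                   LAG(purchase_amount) OVER (
                       PARTITION BY customer_id ORDER BY purchase_date
                   ) AS previous_purchase_amount
            FROM Purchases
        )
        SELECT customer_id
        FROM CustomerPurchaseLag
        GROUP BY customer_id
        -- Count the purchases that fail to exceed the previous one; none are allowed
        HAVING SUM(CASE WHEN purchase_amount <= previous_purchase_amount THEN 1 ELSE 0 END) = 0
        ORDER BY customer_id
    """).fetchall()
    conn.close()
    return [row[0] for row in rows]

purchases = [
    (1, 100, "2023-01-01"), (1, 150, "2023-01-02"),
    (1, 180, "2023-01-03"), (1, 200, "2023-01-04"),
    (2, 200, "2023-02-01"), (2, 150, "2023-02-02"),
    # Hypothetical customer: rises once, then falls
    (3, 100, "2023-03-01"), (3, 150, "2023-03-02"), (3, 120, "2023-03-03"),
]
print(strictly_increasing_customers(purchases))  # [1]
```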
Real-World Applications:
This query can be used in loyalty programs to identify customers who have shown a consistent increase in their spending. By understanding these valuable customers, businesses can tailor their marketing and reward programs to encourage them to make even more purchases.
Product Sales Analysis III
Best & Performant SQL Solution for LeetCode Product Sales Analysis III
Problem Statement:
Given a database table Sales with columns product_id, date, units_sold, and revenue, find the top 5 products that generated the highest revenue in a given month.
SQL Solution:
-- Extract the top 5 products with highest revenue in a given month
WITH MonthlyRevenue AS (
SELECT
product_id,
SUM(revenue) AS total_revenue,
EXTRACT(MONTH FROM date) AS sales_month
FROM
Sales
WHERE
EXTRACT(MONTH FROM date) = 3 -- month of interest as a number (here March)
GROUP BY
product_id,
EXTRACT(MONTH FROM date)
),
RankedProducts AS (
SELECT
product_id,
total_revenue,
-- "rank" is a reserved word in some databases, so the alias avoids it
RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
FROM
MonthlyRevenue
)
SELECT
product_id,
total_revenue
FROM
RankedProducts
WHERE
revenue_rank <= 5;
Explanation:
Create a common table expression (CTE) called MonthlyRevenue:
Calculate the total revenue for each product in the given month.
Create another CTE called RankedProducts:
Rank the products based on their total revenue in descending order.
Select the product_id and total_revenue from RankedProducts:
Filter for products ranked within the top 5.
Example Data and Output:
Sales Table:
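| product_id | date | units_sold | revenue |
|---|---|---|---|
| 1 | 2023-03-01 | 10 | $100 |
| 2 | 2023-03-05 | 15 | $150 |
| 1 | 2023-03-10 | 5 | $50 |
| 3 | 2023-03-15 | 20 | $200 |
| 2 | 2023-03-20 | 10 | $100 |
| 4 | 2023-03-25 | 15 | $150 |
| 3 | 2023-03-30 | 10 | $100 |
Result for Month '03':
| product_id | total_revenue |
|---|---|
| 3 | $300 |
| 2 | $250 |
| 1 | $150 |
| 4 | $150 |
Products 1 and 4 tie at $150 and share the same rank, so all four products fall within the top 5.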
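The March totals can be recomputed with a short script. This is a sketch on SQLite (3.25 or later) via Python's sqlite3 module, so strftime('%m', ...) stands in for EXTRACT(MONTH FROM ...) and revenue is stored as a plain integer; the function name, the revenue_rank alias, and the limit parameter are assumptions.

```python
import sqlite3

def top_products(sales, month, limit=5):
    """Return (product_id, total_revenue, revenue_rank) for the top products of a month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Sales (product_id INTEGER, date TEXT, units_sold INTEGER, revenue INTEGER)"
    )
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", sales)
    rows = conn.execute("""
        WITH MonthlyRevenue AS (
            SELECT product_id, SUM(revenue) AS total_revenue
            FROM Sales
            WHERE strftime('%m', date) = ?
            GROUP BY product_id
        ),
        RankedProducts AS (
            SELECT product_id, total_revenue,
                   RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
            FROM MonthlyRevenue
        )
        SELECT product_id, total_revenue, revenue_rank
        FROM RankedProducts
        WHERE revenue_rank <= ?
        ORDER BY revenue_rank, product_id
    """, (month, limit)).fetchall()
    conn.close()
    return rows

sales = [
    (1, "2023-03-01", 10, 100), (2, "2023-03-05", 15, 150),
    (1, "2023-03-10", 5, 50), (3, "2023-03-15", 20, 200),
    (2, "2023-03-20", 10, 100), (4, "2023-03-25", 15, 150),
    (3, "2023-03-30", 10, 100),
]
print(top_products(sales, "03"))
# [(3, 300, 1), (2, 250, 2), (1, 150, 3), (4, 150, 3)]
```

RANK() gives tied products the same rank, so a top-5 filter can return more than five rows when ties occur at the cut-off.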
Potential Applications:
Identifying top-selling products for inventory planning and sales forecasting.
Analyzing product performance and identifying areas for improvement.
Tracking revenue trends and comparing sales performance across different products and time periods.
Ads Performance
LeetCode Problem: Ads Performance
Problem Statement:
Given an AdsPerformance table containing data about ad campaigns, calculate the total impressions, clicks, revenue, and cost for each date.
CREATE TABLE AdsPerformance (
date DATE NOT NULL,
impressions INT NOT NULL,
clicks INT NOT NULL,
revenue DECIMAL(10, 2) NOT NULL,
cost DECIMAL(10, 2) NOT NULL
);
Best & Performant SQL Solution:
WITH AdMetrics AS (
SELECT
date,
SUM(impressions) AS total_impressions,
SUM(clicks) AS total_clicks,
SUM(revenue) AS total_revenue,
SUM(cost) AS total_cost
FROM AdsPerformance
GROUP BY date
)
SELECT
date,
total_impressions,
total_clicks,
total_revenue,
total_cost
FROM AdMetrics
ORDER BY date;
Implementation and Explanation:
Create a Common Table Expression (CTE) called AdMetrics:
This CTE calculates the sum of impressions, clicks, revenue, and cost for each unique date.
The GROUP BY date clause ensures that the results are aggregated by date.
Select the columns from the AdMetrics CTE:
The final query selects the date, total_impressions, total_clicks, total_revenue, and total_cost columns from the AdMetrics CTE.
The ORDER BY date clause sorts the results by date in ascending order.
Real World Application:
Tracking Ad Campaign Performance:
This query can be used by marketing teams to track the performance of ad campaigns over time. By analyzing the daily totals of impressions, clicks, revenue, and cost, they can identify which periods are most effective and make data-driven decisions about future ad spending.
Department Top Three Salaries
Problem Statement:
Find the top three salaries for each department in a company.
SQL Query:
SELECT department_id, salary AS top_salary
FROM (
SELECT department_id, salary, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num
FROM employee
) AS subquery
WHERE row_num <= 3
ORDER BY department_id, salary DESC;
Breakdown and Explanation:
1. Subquery:
The subquery returns the department ID (department_id), the salary, and the result of the ROW_NUMBER() function applied within each department partition.
ROW_NUMBER() assigns a unique number to each row within each department, starting from 1 for the highest salary and incrementing for each subsequent row.
The result of this subquery is as follows:
| department_id | salary | row_num |
|---|---|---|
| 1 | 10000 | 1 |
| 1 | 9000 | 2 |
| 1 | 8000 | 3 |
| 2 | 15000 | 1 |
| 2 | 14000 | 2 |
| 2 | 13000 | 3 |
2. WHERE Clause:
The WHERE clause filters the subquery to select only rows where row_num is less than or equal to 3. This ensures that only the top three salaries for each department are selected.
3. ORDER BY Clause:
The ORDER BY clause sorts the remaining rows by department_id and then by salary in descending order. No GROUP BY is applied here: aggregating with MAX(salary) would collapse each department to its single highest salary instead of its top three.
Result:
The final result is a table that contains the department_id and up to three top salaries (top_salary) for each department, one per row.
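If "top three salaries" is taken to mean the three highest distinct salary values, so that employees with equal pay are all kept, DENSE_RANK() can replace ROW_NUMBER(). The sketch below illustrates that variant on SQLite (3.25 or later) through Python's sqlite3 module; the function name, the salary_rank alias, and the sample rows are assumptions.

```python
import sqlite3

def top_three_distinct_salaries(employees):
    """Return (department_id, salary) rows within the three highest distinct salaries per department."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (department_id INTEGER, salary INTEGER)")
    conn.executemany("INSERT INTO employee VALUES (?, ?)", employees)
    rows = conn.execute("""
        SELECT department_id, salary
        FROM (
            SELECT department_id, salary,
                   -- Equal salaries share a rank and leave no gap in the numbering
                   DENSE_RANK() OVER (
                       PARTITION BY department_id ORDER BY salary DESC
                   ) AS salary_rank
            FROM employee
        ) AS ranked
        WHERE salary_rank <= 3
        ORDER BY department_id, salary DESC
    """).fetchall()
    conn.close()
    return rows

# Hypothetical data: department 1 has two employees tied at the top
employees = [
    (1, 10000), (1, 10000), (1, 9000), (1, 8000), (1, 7000),
    (2, 15000), (2, 14000),
]
print(top_three_distinct_salaries(employees))
# [(1, 10000), (1, 10000), (1, 9000), (1, 8000), (2, 15000), (2, 14000)]
```

With ROW_NUMBER() the tie in department 1 would use up two of the three slots, and the 8000 salary would be dropped.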
Real-World Applications:
Human Resources: Determine the highest-paid employees in each department for performance evaluations or salary negotiations.
Payroll: Calculate bonuses or benefits for employees based on their position within their department.
Management: Identify pay disparities within departments and make adjustments to ensure fairness and equity.
Get Highest Answer Rate Question
Problem Statement
Find the question with the highest answer rate in a database of questions and answers.
SQL Solution
SELECT Question
FROM Questions
WHERE AnswerRate = (SELECT MAX(AnswerRate) FROM Questions);
Breakdown
1. SELECT Question: This selects the Question column from the Questions table.
2. FROM Questions: This specifies the Questions table to select from.
3. WHERE AnswerRate = (SELECT MAX(AnswerRate) FROM Questions):
This subquery finds the maximum AnswerRate in the Questions table.
The outer query then filters the Questions table to find the question with the maximum AnswerRate.
Example
CREATE TABLE Questions (
Question VARCHAR(255) PRIMARY KEY,
AnswerRate FLOAT
);
INSERT INTO Questions (Question, AnswerRate) VALUES
('Question 1', 0.5),
('Question 2', 0.7),
('Question 3', 0.9);
SELECT Question
FROM Questions
WHERE AnswerRate = (SELECT MAX(AnswerRate) FROM Questions);
Output:
Question 3
Real-World Applications
This query can be used to identify the most popular or frequently asked questions in a knowledge base system, online forum, or customer support database. This information can be valuable for:
Improving user experience by prioritizing questions with high answer rates.
Identifying areas where users need more support or assistance.
Targeting marketing campaigns to address specific questions and topics.
Customer Placing the Largest Number of Orders
Problem Statement:
Find the customer who has placed the highest number of orders.
SQL Script:
SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
ORDER BY order_count DESC
LIMIT 1;
Explanation:
This SQL script performs the following steps:
SELECT customer_id, COUNT(*) AS order_count: For each unique customer, count the number of orders they have placed. The result is a table with two columns: the customer_id and the order_count.
FROM orders: The data source is the orders table, which contains information about all orders.
GROUP BY customer_id: Group the rows in the table by the customer_id column. This combines all orders for each customer into a single row.
ORDER BY order_count DESC: Sort the rows in descending order by the order_count. This puts the customers with the highest order counts at the top.
LIMIT 1: Limit the result to only the first row. This gives us the customer with the highest order count.
Real-World Application:
This query can be used in various real-world scenarios:
Marketing: Identify the most valuable customers based on their order history.
Customer Service: Prioritize customers with the highest order counts for better support.
Logistics: Plan inventory and shipping based on customer demand.
Fraud Detection: Identify potential fraud by comparing order counts to known customer behavior.
Simplified Example:
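Imagine a table of orders:
| customer_id | order_date |
|---|---|
| 1 | 2023-01-01 |
| 1 | 2023-01-05 |
| 2 | 2023-01-03 |
| 2 | 2023-01-10 |
| 3 | 2023-01-07 |
Before LIMIT 1 is applied, the grouped counts are:
| customer_id | order_count |
|---|---|
| 1 | 2 |
| 2 | 2 |
| 3 | 1 |
Customers 1 and 2 have placed the same number of orders (2). ORDER BY order_count DESC does not define an order between tied rows, so LIMIT 1 may return either customer. Adding a tie-breaker such as ORDER BY order_count DESC, customer_id makes the result deterministic and returns Customer 1.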
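When ties matter, one option is to return every customer that shares the highest count instead of relying on LIMIT 1. The following is a minimal sketch using Python's built-in sqlite3 module; the two-column orders table (customer_id, order_date) is an assumption based on the simplified example, and the function name is hypothetical.

```python
import sqlite3


def top_customers(rows):
    """Return every customer tied for the highest order count."""
    conn = sqlite3.connect(":memory:")
    # Assumed layout: one row per order.
    conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    query = """
        SELECT customer_id, COUNT(*) AS order_count
        FROM orders
        GROUP BY customer_id
        HAVING COUNT(*) = (
            SELECT MAX(cnt)
            FROM (SELECT COUNT(*) AS cnt FROM orders GROUP BY customer_id)
        )
        ORDER BY customer_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample_orders = [
    (1, "2023-01-01"),
    (1, "2023-01-05"),
    (2, "2023-01-03"),
    (2, "2023-01-10"),
    (3, "2023-01-07"),
]
print(top_customers(sample_orders))
```

The HAVING clause compares each customer's count with the maximum count, so both tied customers come back.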
The Most Recent Orders for Each Product
Problem Statement:
Given a table Orders with columns order_id, product_id, and order_date, find the most recent order for each unique product.
Schema:
CREATE TABLE Orders (
order_id INT NOT NULL,
product_id INT NOT NULL,
order_date DATE NOT NULL,
PRIMARY KEY (order_id)
);
Solution:
SELECT
product_id,
MAX(order_date) AS latest_order_date
FROM
Orders
GROUP BY
product_id;
Breakdown:
SELECT product_id, MAX(order_date) AS latest_order_date: This selects the product ID and the maximum order date for each product.
FROM Orders: This specifies the table to query from.
GROUP BY product_id: This groups the results by product ID, so that we get the maximum order date for each unique product.
Simplified Explanation:
For each unique product, we find the order with the latest order date and show the product ID and the latest order date.
Real World Application:
This query can be used in a variety of real-world scenarios, such as:
Identifying the most recent orders for a given product to track inventory levels.
Finding the latest orders from a particular customer to provide personalized recommendations.
Monitoring order trends and patterns for specific products over time.
Example:
INSERT INTO Orders (order_id, product_id, order_date) VALUES
(1, 1, '2023-01-01'),
(2, 2, '2023-01-02'),
(3, 1, '2023-01-03'),
(4, 2, '2023-01-04'),
(5, 3, '2023-01-05');
SELECT
product_id,
MAX(order_date) AS latest_order_date
FROM
Orders
GROUP BY
product_id;
Output:
product_id latest_order_date
1 2023-01-03
2 2023-01-04
3 2023-01-05
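The grouped query returns only the latest date per product. If the full order row (including order_id) is needed, one common approach is to join the table back to its per-product maximum. This is a minimal sketch with Python's sqlite3 module, using the sample rows above; the function name is hypothetical.

```python
import sqlite3


def latest_orders(rows):
    """Return (order_id, product_id, order_date) of the newest order per product."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Orders ("
        "order_id INT NOT NULL, product_id INT NOT NULL, "
        "order_date DATE NOT NULL, PRIMARY KEY (order_id))"
    )
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", rows)
    query = """
        SELECT o.order_id, o.product_id, o.order_date
        FROM Orders AS o
        JOIN (
            SELECT product_id, MAX(order_date) AS latest_order_date
            FROM Orders
            GROUP BY product_id
        ) AS m
          ON o.product_id = m.product_id
         AND o.order_date = m.latest_order_date
        ORDER BY o.product_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample_product_orders = [
    (1, 1, "2023-01-01"),
    (2, 2, "2023-01-02"),
    (3, 1, "2023-01-03"),
    (4, 2, "2023-01-04"),
    (5, 3, "2023-01-05"),
]
print(latest_orders(sample_product_orders))
```

If a product has two orders on the same latest date, this join returns both of them.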
Top Three Wineries
1. Select the Top Three Wineries
Problem Statement:
Given a table of wine reviews, identify the top three wineries with the highest average ratings.
Query:
SELECT winery, AVG(rating) AS avg_rating
FROM reviews
GROUP BY winery
ORDER BY avg_rating DESC
LIMIT 3;
Explanation:
The SELECT statement retrieves the winery column and calculates the average rating (AVG(rating)) for each unique winery.
The FROM clause references the reviews table, which contains the wine review data.
The GROUP BY clause groups the data by winery to calculate the average rating for each winery.
The ORDER BY clause sorts the results by the average rating in descending order.
The LIMIT 3 clause limits the output to the top three wineries with the highest average ratings.
2. Using a Common Table Expression (CTE)
Problem Statement:
Identify the top three wineries with the highest average ratings, but also include the number of reviews for each winery.
Query:
WITH WineryReviews AS (
SELECT winery, AVG(rating) AS avg_rating, COUNT(*) AS review_count
FROM reviews
GROUP BY winery
)
SELECT winery, avg_rating, review_count
FROM WineryReviews
ORDER BY avg_rating DESC
LIMIT 3;
Explanation:
The WITH statement creates a Common Table Expression (CTE) named WineryReviews, which calculates the average rating and review count for each winery.
The SELECT statement inside the CTE retrieves the winery, AVG(rating), and COUNT(*) for each unique winery.
The GROUP BY clause groups the data by winery to calculate these values.
The outer SELECT statement selects the winery, avg_rating, and review_count columns from the WineryReviews CTE.
The ORDER BY clause sorts the results by the average rating in descending order.
The LIMIT 3 clause limits the output to the top three wineries with the highest average ratings.
3. Using a Subquery
Problem Statement:
Find the top three wineries with the highest average ratings, but only include wineries that have at least 10 reviews.
Query:
SELECT winery, AVG(rating) AS avg_rating
FROM reviews
WHERE winery IN (
SELECT winery
FROM reviews
GROUP BY winery
HAVING COUNT(*) >= 10
)
GROUP BY winery
ORDER BY avg_rating DESC
LIMIT 3;
Explanation:
The inner subquery calculates the number of reviews for each unique winery and selects wineries that have at least 10 reviews.
The WHERE clause in the main query checks if the winery is present in the subquery, ensuring that only wineries with at least 10 reviews are included.
The GROUP BY clause groups the data by winery to calculate the average rating.
The ORDER BY clause sorts the results by the average rating in descending order.
The LIMIT 3 clause limits the output to the top three wineries with the highest average ratings.
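The minimum-review filter can also be written without a subquery by placing HAVING on the main grouped query. The sketch below uses Python's sqlite3 module; the sample ratings and the min_reviews parameter are made up for illustration (a threshold of 2 keeps the data small).

```python
import sqlite3


def top_wineries(rows, min_reviews, limit=3):
    """Top wineries by average rating among those with enough reviews."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reviews (winery TEXT, rating REAL)")
    conn.executemany("INSERT INTO reviews VALUES (?, ?)", rows)
    query = """
        SELECT winery, AVG(rating) AS avg_rating, COUNT(*) AS review_count
        FROM reviews
        GROUP BY winery
        HAVING COUNT(*) >= ?
        ORDER BY avg_rating DESC
        LIMIT ?
    """
    result = conn.execute(query, (min_reviews, limit)).fetchall()
    conn.close()
    return result


sample_reviews = [
    ("A", 90), ("A", 94),
    ("B", 99),              # highest rating, but only one review
    ("C", 88), ("C", 90),
    ("D", 85), ("D", 87),
    ("E", 80), ("E", 82),
]
print(top_wineries(sample_reviews, min_reviews=2))
```

Winery B is excluded despite its high rating because it has a single review, which is the effect the review threshold is meant to have.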
Real-World Applications:
These queries can be used in various real-world scenarios, such as:
Identifying wineries to feature in a restaurant's wine list based on their ratings and popularity.
Marketing campaigns to target wineries with high average ratings and many reviews.
Identifying potential investment opportunities in the wine industry based on winery performance.
Viewers Turned Streamers
LeetCode Problem: Viewers Turned Streamers
SQL Query:
SELECT *
FROM Streamers AS s
WHERE s.streamer_id IN (
SELECT viewer_id
FROM Viewers
);
Breakdown and Explanation:
1. Subquery to Collect Viewer IDs:
SELECT viewer_id
FROM Viewers;
This subquery returns the ID of every user who appears in the Viewers table.
2. Main Query to Select Viewers Turned Streamers:
SELECT *
FROM Streamers AS s
WHERE s.streamer_id IN (
SELECT viewer_id
FROM Viewers
);
The main query selects all the columns from the Streamers table where the streamer_id matches one of these viewer IDs. This assumes viewer_id and streamer_id share the same user ID space.
The result is a table of streamers who also appear as viewers, that is, viewers who have become streamers themselves.
The subquery must not filter with viewer_id NOT IN (SELECT streamer_id FROM Streamers): that filter removes exactly the IDs the outer query is looking for, so the result would always be empty.
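The sketch below checks the membership logic on a tiny data set with Python's sqlite3 module. The channel column and the sample IDs are hypothetical. It also runs the variant whose subquery excludes existing streamer IDs, to show that such a filter can never match a row of Streamers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schemas; only the ID columns matter here.
conn.execute("CREATE TABLE Streamers (streamer_id INTEGER, channel TEXT)")
conn.execute("CREATE TABLE Viewers (viewer_id INTEGER)")
conn.executemany("INSERT INTO Streamers VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta"), (3, "gamma")])
conn.executemany("INSERT INTO Viewers VALUES (?)", [(2,), (3,), (4,)])

# Streamers whose ID also appears among the viewers.
viewers_turned_streamers = conn.execute("""
    SELECT streamer_id
    FROM Streamers
    WHERE streamer_id IN (SELECT viewer_id FROM Viewers)
    ORDER BY streamer_id
""").fetchall()

# Variant with a NOT IN filter inside the subquery: the inner list holds
# only IDs that are absent from Streamers, so the outer IN never matches.
contradictory = conn.execute("""
    SELECT streamer_id
    FROM Streamers
    WHERE streamer_id IN (
        SELECT viewer_id
        FROM Viewers
        WHERE viewer_id NOT IN (SELECT streamer_id FROM Streamers)
    )
""").fetchall()
conn.close()

print(viewers_turned_streamers)
print(contradictory)
```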
Real-World Applications:
Monitoring viewer growth: By tracking which viewers have become streamers, streaming platforms can monitor their viewer base growth and identify potential new talent.
Content strategy: Understanding which viewers turn into streamers can help platforms refine their content strategy to cater to those who aspire to become creators.
Community building: Identifying viewers who have become streamers can facilitate community building by connecting them with other streamers and viewers.
Grand Slam Titles
Problem: Find all distinct tennis players who have won at least one Grand Slam tournament in both men's and women's categories.
SQL Query:
SELECT DISTINCT Name
FROM Player
WHERE Men_GrandSlams > 0
AND Women_GrandSlams > 0;
Explanation:
Step 1: Identify Players with Grand Slam Wins in Both Categories
Men_GrandSlams > 0 AND Women_GrandSlams > 0
This condition checks for players who have won at least one Grand Slam in both men's and women's categories.
Step 2: Handle Null Values
Null values represent missing data. A comparison such as Men_GrandSlams > 0 evaluates to unknown when the column is NULL, and WHERE keeps only rows for which the condition is true. Players with a NULL in either column are therefore excluded without any extra IS NULL checks.
Branches such as (Men_GrandSlams IS NULL AND Women_GrandSlams > 0) would instead admit players with wins recorded in only one category, which contradicts the "both categories" requirement.
Step 3: Remove Duplicates
DISTINCT ensures each player name appears only once, even if the Player table holds several rows for the same name.
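The way NULL interacts with the "both categories" condition can be checked directly. This is a minimal sketch with Python's sqlite3 module; the player names and counts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Player (Name TEXT, Men_GrandSlams INTEGER, Women_GrandSlams INTEGER)"
)
conn.executemany(
    "INSERT INTO Player VALUES (?, ?, ?)",
    [
        ("P1", 2, 1),     # wins recorded in both categories
        ("P2", 3, None),  # women's count missing
        ("P3", None, 4),  # men's count missing
        ("P4", 0, 5),     # no wins in the men's category
    ],
)

# A comparison against NULL yields NULL (unknown), not true or false.
null_comparison = conn.execute("SELECT NULL > 0").fetchone()[0]

# WHERE keeps only rows whose condition is true, so P2 and P3 drop out.
both_categories = conn.execute("""
    SELECT DISTINCT Name
    FROM Player
    WHERE Men_GrandSlams > 0 AND Women_GrandSlams > 0
    ORDER BY Name
""").fetchall()
conn.close()

print(null_comparison)
print(both_categories)
```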
Real-World Applications: This query can be used in various applications, such as:
Identifying legendary tennis players who have achieved success in both men's and women's competitions.
Analyzing historical Grand Slam data to identify trends and patterns.
Tracking the accomplishments of players and comparing their performances across categories.
Count Salary Categories
Question: Count Salary Categories
SQL Query:
SELECT
CASE
WHEN salary < 10000 THEN 'Low'
WHEN salary BETWEEN 10000 AND 20000 THEN 'Medium'
WHEN salary > 20000 THEN 'High'
ELSE 'Invalid'
END AS salary_category,
COUNT(*) AS salary_count
FROM employees
GROUP BY salary_category;
Explanation:
1. CASE Statement:
The CASE expression checks its WHEN conditions in order and returns the label of the first one that is true. Here it sorts salaries into three categories plus a fallback:
< 10000: Low
Between 10000 and 20000 (inclusive): Medium
> 20000: High
Otherwise (for example, a NULL salary): Invalid
2. COUNT(*) Function:
The COUNT(*) function counts all rows in the selected group, in this case, grouped by the salary category.
Breakdown of the Query:
Select the CASE statement's result (salary_category) and the count of salary_category (salary_count).
From the 'employees' table.
Group the results by salary_category.
Real-World Applications:
This query can be used in HR systems to:
Analyze salary distribution across categories.
Identify potential salary disparities.
Make informed decisions about salary adjustments or bonuses.
Friendly Movies Streamed Last Month
Problem:
You are given a table Movie that contains the following columns:
id (int): The unique identifier of the movie.
title (varchar): The title of the movie.
genre (varchar): The genre of the movie.
The solution below also filters on a streaming_date column, so the table is assumed to record the date each movie was streamed.
You want to find all the movies that were streamed last month that are friendly.
Solution:
SELECT
*
FROM Movie
WHERE
genre = 'Friendly' AND
streaming_date >= DATE('now', '-1 month');
Breakdown:
The SELECT statement selects all the columns from the Movie table.
The WHERE clause filters the results to only include movies that meet the following criteria:
The genre column is equal to 'Friendly'.
The streaming_date column is greater than or equal to the current date minus one month.
Explanation:
This query uses SQLite's DATE() function with the '-1 month' modifier to compute the date one month before today. Filtering streaming_date against that date keeps movies streamed within the last month, which is a rolling window ending today. If "last month" means the previous calendar month, the bounds become streaming_date >= DATE('now', 'start of month', '-1 month') AND streaming_date < DATE('now', 'start of month').
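The difference between a rolling one-month window and the previous calendar month is easy to see with a fixed reference date. The sketch below uses Python's sqlite3 module and SQLite date modifiers. The streaming_date column, the sample rows, and the ref_date parameter (used in place of 'now' so the result is reproducible) are assumptions for illustration.

```python
import sqlite3

MOVIES = [
    (1, "A", "Friendly", "2024-02-10"),
    (2, "B", "Friendly", "2024-03-05"),
    (3, "C", "Horror", "2024-02-20"),
    (4, "D", "Friendly", "2024-01-31"),
]


def friendly_titles(where_clause, params):
    """Run the Friendly-movie query with the given date condition."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Movie (id INTEGER, title TEXT, genre TEXT, streaming_date TEXT)"
    )
    conn.executemany("INSERT INTO Movie VALUES (?, ?, ?, ?)", MOVIES)
    sql = (
        "SELECT title FROM Movie "
        "WHERE genre = 'Friendly' AND " + where_clause + " ORDER BY title"
    )
    titles = [row[0] for row in conn.execute(sql, params)]
    conn.close()
    return titles


ref_date = "2024-03-15"

# Rolling window: everything on or after 2024-02-15.
rolling = friendly_titles("streaming_date >= DATE(?, '-1 month')", (ref_date,))

# Previous calendar month: 2024-02-01 up to, but not including, 2024-03-01.
calendar = friendly_titles(
    "streaming_date >= DATE(?, 'start of month', '-1 month') "
    "AND streaming_date < DATE(?, 'start of month')",
    (ref_date, ref_date),
)

print(rolling)
print(calendar)
```

With the reference date in mid-March, the rolling window picks up the March movie, while the calendar-month version picks up the February one.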
Applications:
This query can be used to find all the friendly movies that were streamed last month on a streaming service. This information can be used to recommend movies to users or to track the popularity of friendly movies.
Product Sales Analysis II
Problem Statement
Given a table ProductSales containing the following columns:
product_id (primary key)
product_name
sales_date (date)
quantity (number of units sold)
sales_price (price per unit)
Write a SQL query to analyze product sales data and answer the following questions:
Total sales amount for each product
Average sales amount per month
Total sales amount by product category
SQL Solution
-- Calculate total sales amount for each product
SELECT product_id, product_name, SUM(quantity * sales_price) AS total_sales_amount
FROM ProductSales
GROUP BY product_id, product_name;
-- Calculate average sales amount per month
SELECT
SUBSTR(sales_date, 1, 7) AS sales_month, -- Extract year-month from sales_date
AVG(quantity * sales_price) AS avg_sales_amount_per_month
FROM ProductSales
GROUP BY sales_month;
-- Calculate total sales amount by product category
SELECT product_category, SUM(quantity * sales_price) AS total_sales_amount_by_category
FROM ProductSales
JOIN ProductCategories ON ProductSales.product_id = ProductCategories.product_id
GROUP BY product_category;
Breakdown and Explanation
1. Total Sales Amount for Each Product
GROUP BY product_id, product_name groups the rows by product ID and product name, producing one result row for each product.
SUM(quantity * sales_price) calculates the total sales amount for each product by multiplying the quantity sold by the sales price and summing the results.
2. Average Sales Amount per Month
SUBSTR(sales_date, 1, 7) extracts the year-month (YYYY-MM) from the sales_date column so that rows can be grouped by month. This relies on sales_date being stored as ISO-formatted text.
AVG(quantity * sales_price) averages the sale amount (quantity multiplied by price) over the rows in each month. It is the average amount per sales row, not the month's total; the total would use SUM instead.
3. Total Sales Amount by Product Category
JOIN ProductCategories ON ProductSales.product_id = ProductCategories.product_id joins the ProductSales table with the ProductCategories table to associate each sale with its product category. ProductCategories is a second table, not listed in the problem statement, that maps product_id to product_category.
GROUP BY product_category groups the rows by product category, producing one result row for each category.
SUM(quantity * sales_price) calculates the total sales amount for each product category by summing the sale amounts across all products in that category.
Real-World Applications
This SQL query can be used in various real-world scenarios:
Inventory Management: To identify which products are selling well and which are not, helping companies optimize their inventory levels.
Marketing and Sales Analysis: To understand the performance of different products and categories over time, and to make informed decisions regarding marketing and sales strategies.
Financial Analysis: To calculate the overall sales revenue and profitability of a business, and to identify trends and patterns in sales data.
Tree Node
Problem Statement: Find the maximum depth of a binary tree from a given table.
Table Schema:
| Column | Type | Description |
|---|---|---|
| id | int | Unique ID of the node |
| parent_id | int | ID of the parent node |
| value | int | Value of the node |
Example Table:
| id | parent_id | value |
|---|---|---|
| 1 | null | 2 |
| 2 | null | 4 |
| 3 | 1 | 6 |
| 4 | 2 | 8 |
SQL Query:
WITH RECURSIVE TreeDepth AS (
SELECT id, parent_id, value, 1 AS depth
FROM Tree
WHERE parent_id IS NULL
UNION ALL
SELECT t.id, t.parent_id, t.value, td.depth + 1
FROM Tree t
JOIN TreeDepth td ON t.parent_id = td.id
)
SELECT MAX(depth) AS max_depth
FROM TreeDepth;
Breakdown:
Create a recursive CTE (Common Table Expression):
We create a CTE called TreeDepth that calculates the depth of each node in the tree.
The base case is when the parent ID is null (root node). We assign a depth of 1 to these nodes.
The recursive part fetches child nodes of each parent node and increments the depth by 1.
Select the maximum depth:
After calculating the depth of all nodes, we select the maximum value from the depth column to get the maximum depth of the tree.
Example Output:
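| max_depth |
|---|
| 2 |
In the example table, nodes 1 and 2 are roots (depth 1) and nodes 3 and 4 are their children (depth 2), so the maximum depth is 2.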
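The recursive CTE can be run against the example table to confirm the depth. This is a minimal sketch using Python's sqlite3 module; the max_depth helper name is hypothetical. With the example rows, the roots (1 and 2) sit at depth 1 and their children (3 and 4) at depth 2.

```python
import sqlite3


def max_depth(rows):
    """Return the maximum depth of the tree(s) stored as (id, parent_id, value)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Tree (id INTEGER, parent_id INTEGER, value INTEGER)")
    conn.executemany("INSERT INTO Tree VALUES (?, ?, ?)", rows)
    query = """
        WITH RECURSIVE TreeDepth AS (
            -- Base case: root nodes have no parent and depth 1.
            SELECT id, parent_id, value, 1 AS depth
            FROM Tree
            WHERE parent_id IS NULL
            UNION ALL
            -- Recursive step: each child is one level below its parent.
            SELECT t.id, t.parent_id, t.value, td.depth + 1
            FROM Tree t
            JOIN TreeDepth td ON t.parent_id = td.id
        )
        SELECT MAX(depth) FROM TreeDepth
    """
    depth = conn.execute(query).fetchone()[0]
    conn.close()
    return depth


example_tree = [(1, None, 2), (2, None, 4), (3, 1, 6), (4, 2, 8)]
chain = [(1, None, 0), (2, 1, 0), (3, 2, 0)]  # a three-level chain

print(max_depth(example_tree))
print(max_depth(chain))
```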
Real-World Applications:
File Systems: Determining the maximum nesting level of folders in a hierarchical file system.
Organizational Structures: Finding the maximum reporting level in a company's organizational hierarchy.
Data Mining: Identifying patterns and trends in hierarchical data structures.
The Winner University
LeetCode Problem: Winner University
SQL Solution:
WITH UniversityWins AS (
SELECT University, SUM(Score) AS TotalScore
FROM Wins
GROUP BY University
)
SELECT University
FROM UniversityWins
WHERE TotalScore = (
SELECT MAX(TotalScore)
FROM UniversityWins
);
Breakdown:
UniversityWins CTE:
Calculates the total score for each university by grouping the Wins table by University and summing the Score column.
Result: A table with two columns: University and TotalScore.
Main Query:
Selects the university with the maximum total score from the UniversityWins CTE.
Result: A table with one column: University.
Explanation:
The WITH clause defines UniversityWins as a common table expression (CTE), so the aggregated totals can be referenced twice: once in the outer FROM clause and once in the scalar subquery.
The scalar subquery uses the MAX() aggregate function to find the largest TotalScore, and the WHERE clause keeps every university whose total equals it. If several universities tie for the highest total, all of them are returned.
Example:
Consider the following Wins table:
| University | Score |
|---|---|
| A | 10 |
| B | 5 |
| C | 15 |
| A | 20 |
UniversityWins Subquery:
SELECT University, SUM(Score) AS TotalScore
FROM Wins
GROUP BY University
| University | TotalScore |
|---|---|
| A | 30 |
| B | 5 |
| C | 15 |
Main Query:
SELECT University
FROM UniversityWins
WHERE TotalScore = (SELECT MAX(TotalScore) FROM UniversityWins)
Result:
| University |
|---|
| A |
This shows that University A is the winner with a total score of 30.
Potential Applications:
This problem can be used in real-world applications to determine the winner of a competition or tournament. For example, it can be used to find the team with the most points in a sports league or the student with the highest GPA in a university.
Concatenate the Name and the Profession
Problem: Given a table with two columns:
| Name | Profession |
|---|---|
| John | Doctor |
| Mary | Teacher |
| Bob | Engineer |
Concatenate the Name and Profession columns to create a new column FullName.
Solution:
SELECT Name || ' ' || Profession AS FullName
FROM table_name;
Breakdown:
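The || operator concatenates two strings. It is the standard SQL concatenation operator and works in SQLite, PostgreSQL, and Oracle; MySQL (in its default mode) and SQL Server use CONCAT() instead.
The ' ' literal between the two operators adds a space between the name and the profession.
The AS keyword is used to alias the new column.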
Example:
SELECT Name || ' ' || Profession AS FullName
FROM table_name;
Output:
| FullName |
|---|
| John Doctor |
| Mary Teacher |
| Bob Engineer |
Real-World Applications:
Displaying full names in a report or dashboard.
Creating a drop-down list of names for a form.
Generating a list of employees with their job titles for printing or emailing.
Bikes Last Time Used
Problem Statement:
Given a table Bikes with the following columns:
bike_id (int): Unique identifier for each bike
last_used_date (date): The date the bike was last used
Write a SQL query to find the most recently used bike and the date it was last used.
Solution:
SELECT bike_id, MAX(last_used_date) AS most_recent_date
FROM Bikes
GROUP BY bike_id
ORDER BY most_recent_date DESC
LIMIT 1;
Breakdown of the Solution:
SELECT bike_id, MAX(last_used_date) AS most_recent_date: This line selects the bike ID and the maximum value of the last_used_date column, which represents the most recent date the bike was used. The MAX aggregate function is used to find the maximum value in a group of rows.
FROM Bikes: This line specifies that the query should be executed on the Bikes table.
GROUP BY bike_id: This line groups the rows in the table by the bike_id column. This allows us to find the most recent date for each bike.
ORDER BY most_recent_date DESC: This line sorts the rows in descending order of the most_recent_date column. This places the most recently used bike at the top of the result set.
LIMIT 1: This line limits the result set to only the first row, which is the most recently used bike.
Real-World Applications:
This query can be used in various real-world applications, such as:
Bike rental companies: To track which bikes are being used most frequently and to ensure that all bikes are being maintained and repaired as needed.
City planners: To analyze bike usage patterns and identify areas where bike infrastructure can be improved.
Insurance companies: To assess the risk of bike theft or damage by determining which bikes are most commonly targeted.
Users That Actively Request Confirmation Messages
Problem:
You are given two tables:
Requests: Contains a list of requests made by users.
Users: Contains information about users.
The Requests table has the following columns:
request_id: The ID of the request.
user_id: The ID of the user who made the request.
request_type: The type of request.
The Users table has the following columns:
user_id: The ID of the user.
name: The name of the user.
email: The email address of the user.
You want to find the users who actively request confirmation messages. A user is considered active if they have made at least 3 requests of type "CONFIRMATION".
Solution:
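SELECT
U.user_id,
U.name,
U.email
FROM Users AS U
JOIN Requests AS R
ON U.user_id = R.user_id
WHERE
R.request_type = 'CONFIRMATION' -- string literals use single quotes
GROUP BY
U.user_id, U.name, U.email -- group by every selected non-aggregate column
HAVING
COUNT(*) >= 3;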
Breakdown:
Join the Users and Requests tables: This is done using the JOIN keyword. The ON clause specifies that the join should be performed based on the user_id column.
Filter the Requests table: This is done using the WHERE clause. We only want to include requests of type CONFIRMATION.
Group the results by user ID: This is done using the GROUP BY clause. This will group the results together based on the user_id column.
Count the number of requests per user: This is done using the COUNT(*) aggregate function.
Filter the results to include only users with at least 3 requests: This is done using the HAVING clause. We only want to include users who have made at least 3 requests of type CONFIRMATION.
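The join, filter, group, and HAVING steps can be exercised on a small data set. This is a minimal sketch using Python's sqlite3 module; the user names, email addresses, the OTHER request type, and the function name are invented for illustration.

```python
import sqlite3


def active_confirmation_users(users, requests, threshold=3):
    """Users with at least `threshold` requests of type CONFIRMATION."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (user_id INTEGER, name TEXT, email TEXT)")
    conn.execute(
        "CREATE TABLE Requests (request_id INTEGER, user_id INTEGER, request_type TEXT)"
    )
    conn.executemany("INSERT INTO Users VALUES (?, ?, ?)", users)
    conn.executemany("INSERT INTO Requests VALUES (?, ?, ?)", requests)
    query = """
        SELECT U.user_id, U.name, U.email
        FROM Users AS U
        JOIN Requests AS R ON U.user_id = R.user_id
        WHERE R.request_type = 'CONFIRMATION'
        GROUP BY U.user_id, U.name, U.email
        HAVING COUNT(*) >= ?
        ORDER BY U.user_id
    """
    result = conn.execute(query, (threshold,)).fetchall()
    conn.close()
    return result


sample_users = [(1, "Ann", "ann@example.com"), (2, "Ben", "ben@example.com")]
sample_requests = [
    (1, 1, "CONFIRMATION"), (2, 1, "CONFIRMATION"), (3, 1, "CONFIRMATION"),
    (4, 1, "OTHER"),
    (5, 2, "CONFIRMATION"), (6, 2, "CONFIRMATION"),
    (7, 2, "OTHER"), (8, 2, "OTHER"),
]
print(active_confirmation_users(sample_users, sample_requests))
```

Ben has four requests in total but only two of type CONFIRMATION, so filtering in WHERE before grouping keeps him out of the result.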
Real World Application:
This query can be used to identify users who actively request confirmation messages. This information can be used to:
Send automated confirmation messages to these users.
Provide customer support to these users.
Target these users with marketing campaigns.
Game Play Analysis III
Problem Statement
Given a table GamePlay with the following sample data:
| id | player_id | points | game_date |
|---|---|---|---|
| 1 | 1 | 100 | 2022-01-01 |
| 2 | 2 | 50 | 2022-01-02 |
| 3 | 1 | 150 | 2022-01-03 |
| 4 | 2 | 75 | 2022-01-04 |
| 5 | 1 | 200 | 2022-01-05 |
Find the players with the highest total points scored in the last 7 days.
Solution
To solve this problem, we can combine a date filter, an aggregate, and a window function.
Keep only the last 7 days:
WHERE game_date >= DATE('now', '-7 days')
This filters out games older than 7 days. A window frame such as ROWS BETWEEN 6 PRECEDING AND CURRENT ROW counts rows, not days, so it cannot express this condition.
Calculate the total points per player:
SUM(points) ... GROUP BY player_id
This will calculate the total points scored by each player within the last 7 days.
Rank the players by total points:
RANK() OVER (ORDER BY total_points DESC)
This will rank the players in descending order by their total points within the last 7 days. Tied players share the same rank.
Select the players with the highest rank:
WHERE player_rank = 1
This will select the players with the highest total points within the last 7 days. A window function result cannot be filtered in the WHERE or HAVING clause of the query level that computes it, so the rank is computed in a CTE and filtered in the outer query.
Complete SQL Statement
WITH totals AS (
SELECT player_id, SUM(points) AS total_points
FROM GamePlay
WHERE game_date >= DATE('now', '-7 days')
GROUP BY player_id
),
ranked AS (
SELECT
player_id,
total_points,
RANK() OVER (ORDER BY total_points DESC) AS player_rank
FROM totals
)
SELECT player_id, total_points
FROM ranked
WHERE player_rank = 1;
Example
The sample rows are dated January 2022. If the query is evaluated on 2022-01-05, all five rows fall within the last 7 days and the ranked CTE contains:
| player_id | total_points | player_rank |
|---|---|---|
| 1 | 450 | 1 |
| 2 | 125 | 2 |
The final filter keeps only Player 1, who has the highest total points (450) within the last 7 days; Player 2 follows with 125 points.
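The sketch below runs a filter-aggregate-rank pipeline on the sample rows with Python's sqlite3 module. Two things are assumptions made for illustration: a ref_date parameter replaces 'now' so the result is reproducible, and the bundled SQLite is version 3.25 or newer, which is required for window functions such as RANK().

```python
import sqlite3

GAMES = [
    (1, 1, 100, "2022-01-01"),
    (2, 2, 50, "2022-01-02"),
    (3, 1, 150, "2022-01-03"),
    (4, 2, 75, "2022-01-04"),
    (5, 1, 200, "2022-01-05"),
]


def top_players(ref_date):
    """Players with the highest total points in the 7 days up to ref_date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE GamePlay (id INTEGER, player_id INTEGER, points INTEGER, game_date TEXT)"
    )
    conn.executemany("INSERT INTO GamePlay VALUES (?, ?, ?, ?)", GAMES)
    query = """
        WITH totals AS (
            SELECT player_id, SUM(points) AS total_points
            FROM GamePlay
            WHERE game_date >= DATE(?, '-7 days')
            GROUP BY player_id
        ),
        ranked AS (
            SELECT player_id, total_points,
                   RANK() OVER (ORDER BY total_points DESC) AS player_rank
            FROM totals
        )
        SELECT player_id, total_points
        FROM ranked
        WHERE player_rank = 1
    """
    result = conn.execute(query, (ref_date,)).fetchall()
    conn.close()
    return result


print(top_players("2022-01-05"))  # all five rows are inside the window
print(top_players("2022-01-10"))  # only games from 2022-01-03 onward count
```

Moving the reference date forward shrinks Player 1's total from 450 to 350, which shows that the date filter, not a row-count frame, controls the window.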
Real-World Applications
This query can be used to identify the most active players in a game or the players who have scored the most points in a certain period of time. This information can be used to reward players, create leaderboards, or improve the game experience.
The First Day of the Maximum Recorded Degree in Each City
Problem:
Given a database containing weather records for multiple cities, find the first day when the maximum recorded temperature was achieved in each city.
Database Schema:
CREATE TABLE Weather (
City TEXT,
Date TEXT,
Temperature REAL
);
SQL Query:
SELECT w.City, MIN(w.Date) AS FirstMaxTempDate
FROM Weather AS w
WHERE w.Temperature = (SELECT MAX(w2.Temperature) FROM Weather AS w2 WHERE w2.City = w.City)
GROUP BY w.City;
Query Breakdown:
Subquery:
(SELECT MAX(w2.Temperature) FROM Weather AS w2 WHERE w2.City = w.City)
This correlated subquery calculates the maximum temperature recorded for the city of the current outer row. The aliases w and w2 are required: without them, both City references resolve to the inner table, the condition is always true, and the subquery returns the maximum over all cities.
Outer Query:
w refers to the main Weather table.
w.Temperature = (SELECT MAX(w2.Temperature) FROM Weather AS w2 WHERE w2.City = w.City): This condition filters the main table to include only rows where the temperature is equal to the maximum temperature for that row's city.
GROUP BY w.City: This groups the results by city to get the first date for each city where the maximum temperature was recorded.
MIN(w.Date): This returns the earliest date within each city group, which corresponds to the first day when the maximum temperature was achieved. Because Date is stored as TEXT, this relies on ISO-formatted dates (YYYY-MM-DD), which sort chronologically as strings.
Example:
Consider the following table:
| City | Date | Temperature |
|---|---|---|
| London | 2020-06-01 | 20 |
| London | 2020-06-02 | 25 |
| Paris | 2020-07-01 | 15 |
| Paris | 2020-07-02 | 20 |
| Paris | 2020-07-03 | 25 |
The query would return:
| City | FirstMaxTempDate |
|---|---|
| London | 2020-06-02 |
| Paris | 2020-07-03 |
This shows that the first day when the maximum temperature was recorded in London was June 2, 2020, and in Paris was July 3, 2020.
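Whether the subquery is really evaluated per city depends on how its column references are qualified. The sketch below, using Python's sqlite3 module, compares a version with explicit table aliases against a version without them. The sample rows are invented so that the two cities have different maxima, which makes the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Weather (City TEXT, Date TEXT, Temperature REAL)")
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?)",
    [
        ("London", "2020-06-01", 20),
        ("London", "2020-06-02", 25),
        ("London", "2020-06-03", 25),
        ("Paris", "2020-07-01", 15),
        ("Paris", "2020-07-02", 20),
        ("Paris", "2020-07-03", 20),
    ],
)

# Aliased: the subquery is correlated with the outer row's city.
aliased = conn.execute("""
    SELECT w.City, MIN(w.Date) AS FirstMaxTempDate
    FROM Weather AS w
    WHERE w.Temperature = (
        SELECT MAX(w2.Temperature) FROM Weather AS w2 WHERE w2.City = w.City
    )
    GROUP BY w.City
    ORDER BY w.City
""").fetchall()

# Unaliased: inside the subquery both City and Weather.City refer to the
# inner table, so the subquery returns the maximum over all cities (25).
unaliased = conn.execute("""
    SELECT City, MIN(Date) AS FirstMaxTempDate
    FROM Weather
    WHERE Temperature = (
        SELECT MAX(Temperature) FROM Weather WHERE City = Weather.City
    )
    GROUP BY City
    ORDER BY City
""").fetchall()
conn.close()

print(aliased)
print(unaliased)
```

Paris never reaches the overall maximum of 25, so it disappears from the unaliased result.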
Applications:
This query can be useful for weather analysis, such as finding the hottest day or analyzing seasonal patterns. It can also be used for comparative analysis between cities or understanding the impact of climate change over time.
Replace Employee ID With The Unique Identifier
Problem:
You have a table called Employees with the following columns:
EmployeeID (unique identifier for each employee)
Name
Salary
Department
You want to replace the EmployeeID column with a unique identifier called UID.
Solution:
-- SQL Server (T-SQL) syntax
ALTER TABLE Employees
ADD UID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL DEFAULT NEWID(); -- Add UID, marked as ROWGUIDCOL and filled by NEWID()
ALTER TABLE Employees
DROP COLUMN EmployeeID; -- Drop the EmployeeID column
Explanation:
The UNIQUEIDENTIFIER data type stores a 16-byte GUID. The ROWGUIDCOL property only designates UID as the table's row GUID column; it does not generate values or enforce uniqueness. The DEFAULT NEWID() constraint is what populates UID, both for the existing rows when the column is added and for new rows inserted later. Adding and dropping columns take two separate ALTER TABLE statements, and any primary key, foreign key, or index that references EmployeeID must be dropped before the column can be removed. If uniqueness must be guaranteed, add a PRIMARY KEY or UNIQUE constraint on UID.
Real-World Applications:
Data Integrity: A uniqueness constraint on the identifier prevents two rows from sharing the same key.
Efficient Data Access: When the identifier is indexed, specific rows can be located without a full table scan.
Replication: Unique identifiers simplify data replication across multiple systems, ensuring that each record is uniquely identified and can be tracked accurately.
Data Security: Unique identifiers can be used to enforce data security by restricting access to specific records based on the identifier.
Evaluate Boolean Expression
Problem Description:
Given a table Employee with the following columns: id, name, and salary, return the names of employees whose salary is greater than the average salary of all employees.
SQL Query:
SELECT name
FROM Employee
WHERE salary > (
SELECT AVG(salary)
FROM Employee
);
Breakdown:
Subquery: The subquery (SELECT AVG(salary) FROM Employee) calculates the average salary of all employees.
Comparison: The WHERE clause compares the salary of each employee to the average salary.
Selection: Employees whose salary is greater than the average salary are selected.
Real-World Application:
This query can be used in various applications, such as:
Identifying employees who should receive bonuses or raises.
Analyzing the salary distribution within an organization.
Identifying potential pay discrepancies.
Simplified Explanation:
Imagine a table of employee salaries. To find employees who earn more than the average, we do the following:
We calculate the average salary using the AVG() function.
We compare each employee's salary to the average using the > operator.
We select the names of employees whose salary is greater than the average.
Example:
Consider the following table:
| id | name | salary |
|---|---|---|
| 1 | John | 5000 |
| 2 | Mary | 4000 |
| 3 | Alice | 6000 |
The average salary is (5000 + 4000 + 6000) / 3 = 5000. The query will return:
| name |
|---|
| Alice |
Alice's salary (6000) is the only one greater than the average; John's salary equals the average, so he is not included.
The Change in Global Rankings
Problem:
We have two tables:
Player table:
player_id (int)
player_name (string)
rank (int)
Score table:
player_id (int)
score (int)
We need to find the change in rank for each player after updating the player's score.
Explanation:
Create a Common Table Expression (CTE) called PlayerWithUpdatedScore to calculate the updated score for each player.
WITH PlayerWithUpdatedScore AS (
SELECT
p.player_id,
p.player_name,
p.rank AS old_rank,
COALESCE(SUM(s.score), 0) AS updated_score
FROM
Player p
LEFT JOIN
Score s ON p.player_id = s.player_id
GROUP BY
p.player_id, p.player_name, p.rank
)
This CTE calculates the updated score for each player by summing up their scores from the Score table. It also includes the player's old rank. Because rank is a reserved word in some databases (MySQL 8.0, for example), the column may need to be quoted there.
Add a second CTE called PlayerWithRankChange to the same WITH clause to calculate the new rank for each player. Additional CTEs are introduced with a comma rather than another WITH keyword.
, PlayerWithRankChange AS (
SELECT
p.player_id,
p.player_name,
p.old_rank,
p.updated_score,
(SELECT COUNT(*) FROM PlayerWithUpdatedScore q WHERE q.updated_score > p.updated_score) + 1 AS new_rank
FROM
PlayerWithUpdatedScore p
)
This CTE calculates the new rank for each player by counting the number of players with a higher updated score and adding 1.
Select the results from the PlayerWithRankChange CTE and compute the change in rank.
SELECT
player_id,
player_name,
old_rank,
new_rank,
new_rank - old_rank AS rank_change
FROM
PlayerWithRankChange
ORDER BY
player_id;
The rank change is computed in the final SELECT because a column alias such as new_rank cannot be referenced by another expression in the same SELECT list. A negative rank_change means the player moved up the rankings.
Example:
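| player_id | player_name | old_rank | new_rank | rank_change |
|---|---|---|---|---|
| 1 | John | 5 | 3 | -2 |
| 2 | Mary | 3 | 2 | -1 |
| 3 | Bob | 2 | 1 | -1 |
| 4 | Alice | 1 | 4 | 3 |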
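The rank-change calculation can be run end to end on a small data set. This is a minimal sketch using Python's sqlite3 module; the players, ranks, and scores are invented for illustration, and the rank column is quoted as a precaution. Both CTEs share one WITH clause, and rank_change is computed in the final SELECT.

```python
import sqlite3


def rank_changes(players, scores):
    """Return (player_id, name, old_rank, new_rank, rank_change) per player."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE Player (player_id INTEGER, player_name TEXT, "rank" INTEGER)')
    conn.execute("CREATE TABLE Score (player_id INTEGER, score INTEGER)")
    conn.executemany("INSERT INTO Player VALUES (?, ?, ?)", players)
    conn.executemany("INSERT INTO Score VALUES (?, ?)", scores)
    query = """
        WITH PlayerWithUpdatedScore AS (
            SELECT p.player_id, p.player_name, p."rank" AS old_rank,
                   COALESCE(SUM(s.score), 0) AS updated_score
            FROM Player p
            LEFT JOIN Score s ON p.player_id = s.player_id
            GROUP BY p.player_id, p.player_name, p."rank"
        ),
        PlayerWithRankChange AS (
            SELECT p.player_id, p.player_name, p.old_rank,
                   -- New rank = 1 + number of players with a higher score.
                   (SELECT COUNT(*) FROM PlayerWithUpdatedScore q
                    WHERE q.updated_score > p.updated_score) + 1 AS new_rank
            FROM PlayerWithUpdatedScore p
        )
        SELECT player_id, player_name, old_rank, new_rank,
               new_rank - old_rank AS rank_change
        FROM PlayerWithRankChange
        ORDER BY player_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample_players = [(1, "John", 1), (2, "Mary", 2), (3, "Bob", 3)]
sample_scores = [(1, 10), (2, 30), (2, 5), (3, 20)]
print(rank_changes(sample_players, sample_scores))
```

Mary's two scores sum to 35, which moves her to first place (rank_change of -1), while John drops from first to third (rank_change of 2).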
Potential Applications:
This problem can be applied in any scenario where you need to track changes in rankings based on updated scores, such as in:
Sports (tracking changes in player or team rankings based on game performance)
Gaming (tracking changes in player rankings based on in-game accomplishments)
Education (tracking changes in student rankings based on test scores)
Invalid Tweets
LeetCode Problem: Invalid Tweets
Problem: Find all the tweets with hashtags that start with the letter 'a' or 'b'.
SQL Query:
SELECT *
FROM Tweets
WHERE hashtag LIKE 'a%' OR hashtag LIKE 'b%';
Breakdown and Explanation:
SELECT *: Selects all columns from the Tweets table.
FROM Tweets: Specifies the Tweets table to query from.
WHERE hashtag LIKE 'a%': Filters the tweets to include only those with hashtags that start with the letter 'a'. The wildcard character % represents any number of characters after 'a'.
OR hashtag LIKE 'b%': Adds a second filter to include tweets with hashtags that start with the letter 'b'.
Real-World Application:
This query can be used by social media companies to analyze tweets for specific topics or trends. For example, a marketing team could use the query to identify popular hashtags related to their products or services.
Code Example:
-- Sample Tweets table
CREATE TABLE Tweets (
id INT PRIMARY KEY,
text VARCHAR(255),
hashtag VARCHAR(255)
);
-- Insert sample data
INSERT INTO Tweets (id, text, hashtag) VALUES
(1, 'This is a tweet with #awesome', 'awesome'),
(2, 'This is a tweet with #bad', 'bad'),
(3, 'This is a tweet with #coffee', 'coffee');
-- Execute the query
SELECT *
FROM Tweets
WHERE hashtag LIKE 'a%' OR hashtag LIKE 'b%';
-- Output
+----+-------------------------------+---------+
| id | text                          | hashtag |
+----+-------------------------------+---------+
| 1  | This is a tweet with #awesome | awesome |
| 2  | This is a tweet with #bad     | bad     |
+----+-------------------------------+---------+
Find Peak Calling Hours for Each City
Problem:
Find the peak calling hours for each city from a table of phone call records.
Solution:
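Step 1: Total Calls by City and Hour
SELECT city, hour, SUM(call_count) AS call_count
FROM phone_calls
GROUP BY city, hour
ORDER BY city, hour;
This query groups the call records by city and hour and totals the calls for each combination. It matches the example table below, where each row already carries a call_count; if the table stored one row per call instead, COUNT(*) would replace SUM(call_count).
Step 2: Find Max Call Count for Each City
WITH hourly AS (
SELECT city, hour, SUM(call_count) AS call_count
FROM phone_calls
GROUP BY city, hour
)
SELECT city, MAX(call_count) AS peak_call_count
FROM hourly
GROUP BY city;
This query finds the maximum hourly call count for each city, using the Step 1 totals as a CTE.
Step 3: Join Results to Find Peak Hours
WITH hourly AS (
SELECT city, hour, SUM(call_count) AS call_count
FROM phone_calls
GROUP BY city, hour
)
SELECT h.city, h.hour, h.call_count
FROM hourly h
JOIN (
SELECT city, MAX(call_count) AS peak_call_count
FROM hourly
GROUP BY city
) p ON h.city = p.city AND h.call_count = p.peak_call_count;
This query joins the hourly totals with the per-city maximum and returns the city, hour, and call count for the peak hours. If two hours tie for a city's maximum, both are returned.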
Example:
CREATE TABLE phone_calls (
id INTEGER PRIMARY KEY,
city TEXT,
hour INTEGER,
call_count INTEGER
);
INSERT INTO phone_calls (city, hour, call_count) VALUES
('New York', 10, 100),
('New York', 11, 150),
('New York', 12, 120),
('New York', 13, 100),
('London', 8, 50),
('London', 9, 70),
('London', 10, 90),
('London', 11, 100),
('Paris', 6, 20),
('Paris', 7, 40),
('Paris', 8, 60),
('Paris', 9, 80);
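-- Peak hour(s) per city
WITH hourly AS (
SELECT city, hour, SUM(call_count) AS call_count
FROM phone_calls
GROUP BY city, hour
)
SELECT h.city, h.hour, h.call_count
FROM hourly h
JOIN (
SELECT city, MAX(call_count) AS peak_call_count
FROM hourly
GROUP BY city
) p ON h.city = p.city AND h.call_count = p.peak_call_count;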
Simplified Explanation:
We first group the call records by city and hour to count the number of calls for each combination.
Then, we find the maximum call count for each city, which is the call count of that city's peak hour.
Finally, we join these results to get the city, hour, and call count for the peak calling hours.
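The join against the per-city maximum can be checked end to end. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module and the pre-aggregated sample rows from the example; the function name peak_hours is hypothetical.

```python
import sqlite3

def peak_hours(rows):
    """Return (city, hour, call_count) for each city's busiest hour(s)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE phone_calls (city TEXT, hour INTEGER, call_count INTEGER)"
    )
    conn.executemany("INSERT INTO phone_calls VALUES (?, ?, ?)", rows)
    query = """
        SELECT t1.city, t1.hour, t1.call_count
        FROM phone_calls AS t1
        JOIN (
            SELECT city, MAX(call_count) AS peak_call_count
            FROM phone_calls
            GROUP BY city
        ) AS t2
          ON t1.city = t2.city AND t1.call_count = t2.peak_call_count
        ORDER BY t1.city, t1.hour
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

sample = [
    ("New York", 10, 100), ("New York", 11, 150),
    ("New York", 12, 120), ("New York", 13, 100),
    ("London", 8, 50), ("London", 9, 70),
    ("London", 10, 90), ("London", 11, 100),
    ("Paris", 6, 20), ("Paris", 7, 40),
    ("Paris", 8, 60), ("Paris", 9, 80),
]
for row in peak_hours(sample):
    print(row)
```

The ORDER BY is added only to make the output order predictable; it is not part of the technique.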
Real-World Applications:
Call Center Staffing: Determine the optimal staffing levels for call centers based on peak calling hours.
Network Optimization: Identify congested network areas during peak hours to improve call quality.
Marketing Campaigns: Target marketing campaigns to specific cities during their peak calling hours for maximum impact.
Dynamic Pivoting of a Table
Problem:
You have a table with data in a specific format, and you need to transform it into a different format by pivoting the table. The pivot operation rotates the rows and columns of the table to create a new table.
Solution:
To perform dynamic pivoting in SQL Server, you can combine the PIVOT operator with a column list built using FOR XML PATH. Here's a step-by-step breakdown:
Step 1: Create a Sample Table
CREATE TABLE Sales (
Product VARCHAR(255),
Region VARCHAR(255),
Sales INT
);
INSERT INTO Sales (Product, Region, Sales) VALUES ('Product A', 'East', 100);
INSERT INTO Sales (Product, Region, Sales) VALUES ('Product B', 'West', 200);
INSERT INTO Sales (Product, Region, Sales) VALUES ('Product C', 'South', 300);
Step 2: Use the PIVOT Operator
The PIVOT operator rotates a table by turning the distinct values of one column into columns and aggregating another column for each of them. In our case, we pivot on the Region column and calculate the sum of Sales for each region. SQL Server requires an alias for the pivoted result.
SELECT *
FROM Sales
PIVOT (SUM(Sales) FOR Region IN ([East], [West], [South])) AS PivotTable;
Output:
| Product | East | West | South |
| --- | --- | --- | --- |
| Product A | 100 | NULL | NULL |
| Product B | NULL | 200 | NULL |
| Product C | NULL | NULL | 300 |
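Step 3: Build the Column List with FOR XML PATH
FOR XML PATH('') concatenates the distinct values of the Region column into a single string, which becomes the column list of the PIVOT clause. PIVOT does not accept a variable in its IN list, so the final query is assembled as a string and run with sp_executesql.
DECLARE @PivotColumns NVARCHAR(MAX), @Sql NVARCHAR(MAX);
-- Build a comma-separated list of bracketed region names, e.g. [East],[South],[West]
SET @PivotColumns = STUFF((
SELECT ',' + QUOTENAME(Region)
FROM (
SELECT DISTINCT Region
FROM Sales
) AS r
FOR XML PATH('')
), 1, 1, ''); -- STUFF removes the leading comma
-- Assemble the PIVOT query as text and execute it
SET @Sql = N'SELECT * FROM Sales PIVOT (SUM(Sales) FOR Region IN ('
+ @PivotColumns + N')) AS PivotTable;';
EXEC sp_executesql @Sql;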
Output:
| Product | East | West | South |
| --- | --- | --- | --- |
| Product A | 100 | NULL | NULL |
| Product B | NULL | 200 | NULL |
| Product C | NULL | NULL | 300 |
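PIVOT is specific to a few databases, so the same idea can also be sketched with conditional aggregation, which is a different technique from the PIVOT operator used above. This sketch assumes SQLite through Python's sqlite3 module; the function name dynamic_pivot is hypothetical. The region names are read from the table first and then turned into one SUM(CASE ...) column each.

```python
import sqlite3

def dynamic_pivot(rows):
    """Pivot Sales by Region, one output column per distinct region."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sales (Product TEXT, Region TEXT, Sales INTEGER)")
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)", rows)

    # Step 1: discover the pivot columns from the data itself.
    regions = [r[0] for r in conn.execute(
        "SELECT DISTINCT Region FROM Sales ORDER BY Region")]

    # Step 2: build one aggregate per region. Values are bound as
    # parameters; column aliases are quoted as identifiers.
    columns = ", ".join(
        'SUM(CASE WHEN Region = ? THEN Sales END) AS "{}"'.format(
            name.replace('"', '""'))
        for name in regions
    )
    query = f"SELECT Product, {columns} FROM Sales GROUP BY Product ORDER BY Product"

    cursor = conn.execute(query, regions)
    header = [d[0] for d in cursor.description]
    data = cursor.fetchall()
    conn.close()
    return header, data

header, data = dynamic_pivot([
    ("Product A", "East", 100),
    ("Product B", "West", 200),
    ("Product C", "South", 300),
])
print(header)
for row in data:
    print(row)
```

Python's None stands in for SQL NULL in the returned rows.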
Real-World Applications:
Dynamic pivoting can be useful in various scenarios, such as:
Converting data from a relational model to a multidimensional model (e.g., for reporting or analysis)
Summarizing data across multiple dimensions
Creating reports with tabular data where the columns vary dynamically based on the data
Calculate Special Bonus
Problem Statement:
Given a table Employees with columns EmployeeID, Salary, and Performance, calculate the special bonus for each employee based on their performance.
| EmployeeID | Salary | Performance |
| --- | --- | --- |
| 1 | 1000 | Excellent |
| 2 | 2000 | Good |
| 3 | 3000 | Average |
Special Bonus Calculation:
Excellent: 10% of salary
Good: 5% of salary
Average: 0% of salary
SQL Query:
SELECT
EmployeeID,
Salary,
Performance,
CASE
WHEN Performance = 'Excellent' THEN Salary * 0.10
WHEN Performance = 'Good' THEN Salary * 0.05
ELSE 0
END AS SpecialBonus
FROM
Employees;
Breakdown:
1. Select Columns:
SELECT
EmployeeID,
Salary,
Performance,
This part selects the necessary columns from the Employees table.
2. Calculate Special Bonus Using a CASE Expression:
CASE
WHEN Performance = 'Excellent' THEN Salary * 0.10
WHEN Performance = 'Good' THEN Salary * 0.05
ELSE 0
END AS SpecialBonus
The CASE expression evaluates the Performance column and calculates the special bonus based on the following conditions:
If Performance is 'Excellent', the bonus is 10% of the salary.
If Performance is 'Good', the bonus is 5% of the salary.
Otherwise, the bonus is 0.
3. Alias the Result:
AS SpecialBonus
This aliases the result of the CASE expression as SpecialBonus.
4. From Employees Table:
FROM
Employees;
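This specifies the source table for the query, which is the Employees table.
Output:
| EmployeeID | Salary | Performance | SpecialBonus |
| --- | --- | --- | --- |
| 1 | 1000 | Excellent | 100 |
| 2 | 2000 | Good | 100 |
| 3 | 3000 | Average | 0 |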
Real-World Application:
This query can be used in an HR system to calculate special bonuses for employees based on their performance. It can also be used to analyze the performance of employees and make decisions regarding promotions or raises.
Reported Posts
Problem:
Given a table called posts that contains the following columns:
post_id: The unique ID of the post.
title: The title of the post.
content: The content of the post.
reported: A flag indicating whether the post has been reported.
You need to write a query to find the number of reported posts and the percentage of reported posts out of the total number of posts.
Solution:
SELECT
COUNT(*) AS total_posts,
(
SELECT
COUNT(*)
FROM posts
WHERE
reported = 1
) AS reported_posts,
(
(
SELECT
COUNT(*)
FROM posts
WHERE
reported = 1
) * 100.0 / COUNT(*)
) AS reported_percentage
FROM posts;
Explanation:
The query first calculates the total number of posts in the total_posts column. Then, it calculates the number of reported posts in the reported_posts column by using a subquery. Finally, it calculates the percentage of reported posts in the reported_percentage column by dividing the number of reported posts by the total number of posts and multiplying the result by 100.
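In databases that divide integers with truncation (SQLite, PostgreSQL, and SQL Server do), dividing one count by another before multiplying gives 0 for any percentage below 100. The sketch below, which assumes SQLite through Python's sqlite3 module and a hypothetical function name reported_stats, computes both forms side by side. It uses a conditional SUM in place of the scalar subqueries, which is a simplification and not the query shown above.

```python
import sqlite3

def reported_stats(flags):
    """Return (total, reported, truncated_percentage, percentage)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY, reported INTEGER)")
    conn.executemany("INSERT INTO posts (reported) VALUES (?)", [(f,) for f in flags])
    row = conn.execute("""
        SELECT
            COUNT(*) AS total_posts,
            SUM(CASE WHEN reported = 1 THEN 1 ELSE 0 END) AS reported_posts,
            -- integer / integer truncates before the multiplication
            SUM(CASE WHEN reported = 1 THEN 1 ELSE 0 END) / COUNT(*) * 100
                AS truncated_percentage,
            -- multiplying by 100.0 first forces decimal arithmetic
            100.0 * SUM(CASE WHEN reported = 1 THEN 1 ELSE 0 END) / COUNT(*)
                AS reported_percentage
        FROM posts
    """).fetchone()
    conn.close()
    return row

print(reported_stats([1, 0, 0, 1]))
```

With two reported posts out of four, the truncated form yields 0 while the decimal form yields 50.0.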
Real-World Applications:
This query can be used to find the percentage of reported posts on a social media platform or other online forum. This information can be used to identify posts that may contain inappropriate content or violate the platform's terms of service.
Find Customer Referee
Problem Statement:
Given a table of customer records with their referee information, find all customers who have been referred by a customer with a specific ID, as well as the number of customers they have referred.
SQL Query:
WITH Referrals AS (
SELECT
referred_by,
COUNT(*) AS num_referrals
FROM
customers
GROUP BY
referred_by
)
SELECT
c.customer_id,
c.name,
COALESCE(r.num_referrals, 0) AS num_referrals
FROM
customers c
LEFT JOIN
Referrals r ON c.customer_id = r.referred_by
WHERE
c.referred_by = @specific_customer_id;
Explanation:
Create a Common Table Expression (CTE) called Referrals: This CTE counts the number of referrals for each customer by grouping the customers table by the referred_by column.
Join the customers table with the Referrals CTE: Each customer's customer_id is matched to the CTE's referred_by column, which attaches to each customer the number of customers they have referred.
Filter the results: Keep only the rows where the referred_by column in the customers table matches the specified customer ID.
Simplified Explanation:
Imagine a table with customer information, including who referred them. To find all customers referred by a specific customer, we first count the number of referrals for each customer using a CTE. Then, we join this count information with the customer table, filtering it to show only the customers referred by the specified customer.
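One detail worth checking is what happens to a referred customer who has not referred anyone yet: an inner join against the counts drops that customer, while a LEFT JOIN with COALESCE reports a count of 0. The following sketch assumes SQLite through Python's sqlite3 module, a customers table with the columns customer_id, name, and referred_by, and made-up sample rows; the function name referred_by is hypothetical.

```python
import sqlite3

def referred_by(customers, referrer_id):
    """Customers referred by referrer_id, with their own referral counts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, referred_by INTEGER)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)
    query = """
        WITH Referrals AS (
            SELECT referred_by, COUNT(*) AS num_referrals
            FROM customers
            WHERE referred_by IS NOT NULL
            GROUP BY referred_by
        )
        SELECT c.customer_id, c.name, COALESCE(r.num_referrals, 0)
        FROM customers AS c
        LEFT JOIN Referrals AS r ON c.customer_id = r.referred_by
        WHERE c.referred_by = ?
        ORDER BY c.customer_id
    """
    result = conn.execute(query, (referrer_id,)).fetchall()
    conn.close()
    return result

# Illustrative data: Ann referred Bob and Cid; Bob referred Dee and Eve;
# Cid referred Fay.
people = [
    (1, "Ann", None), (2, "Bob", 1), (3, "Cid", 1),
    (4, "Dee", 2), (5, "Eve", 2), (6, "Fay", 3),
]
print(referred_by(people, 1))
print(referred_by(people, 2))
```

The second call shows customers with no referrals of their own, who appear with a count of 0.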
Real-World Applications:
Referral programs: Tracking customer referrals to reward the referring customers.
Customer relationship management (CRM): Identifying key influencers or evangelists within a customer base.
Marketing campaigns: Targeting referred customers with tailored promotions to increase conversion rates.
Hopper Company Queries III
Problem Statement:
Find the total number of "Hopper" employees in the "Hopper Company" database.
Solution:
SELECT COUNT(*) AS TotalHopperEmployees
FROM Employees
WHERE Name LIKE '%Hopper%';
Simplified Explanation:
SELECT COUNT(*) AS TotalHopperEmployees: This calculates the total number of rows in the "Employees" table where the "Name" column contains the substring "Hopper".
FROM Employees: Specifies the table to query, which is "Employees".
WHERE Name LIKE '%Hopper%': This is a filter condition that selects rows where the "Name" column contains the substring "Hopper" anywhere in the string. The percent signs (%) act as wildcards, allowing for partial matches.
Real-World Example:
In a human resources system for a large company, this query can be used to quickly count the employees whose name contains "Hopper". Because the pattern matches the substring anywhere in the name, it is not an exact last-name match. This information could be used for various purposes, such as reporting on employee demographics or identifying potential candidates for promotions.
Potential Applications:
Employee Management: Tracking the number of employees with specific characteristics, such as last name, job title, or department.
Hiring and Recruitment: Identifying candidates with desired qualifications or experience.
Performance Evaluation: Analyzing the distribution of employees by performance ratings or salary range.
Reported Posts II
Problem Statement
Given a table named ReportedPosts with the following schema:
CREATE TABLE ReportedPosts (
reportId INT NOT NULL,
postId INT NOT NULL,
reporterId INT NOT NULL,
PRIMARY KEY (reportId)
);
You need to write a SQL query to find the top 10 reported posts.
Optimized Solution
SELECT postId, COUNT(DISTINCT reporterId) AS report_count
FROM ReportedPosts
GROUP BY postId
ORDER BY report_count DESC
LIMIT 10;
Explanation
The query starts with a SELECT statement that retrieves the postId and the count of distinct reporterIds for each post. The DISTINCT keyword ensures that each reporter is counted only once per post.
The FROM clause specifies the ReportedPosts table as the data source.
The GROUP BY clause groups the results by postId so that the count of distinct reporterIds can be computed for each post.
The ORDER BY clause sorts the results in descending order of report_count.
The LIMIT 10 clause limits the results to the top 10 reported posts.
Performance Analysis
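The schema above defines only a primary key on reportId, so as written the query reads the whole table. An index on postId (for example, a composite index on postId and reporterId) lets the database read the rows already ordered by post, which can avoid a separate sort for the GROUP BY. The ORDER BY on report_count still has to sort the aggregated result.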
Real-World Applications
This query can be used to identify the most reported posts on a social media platform or any other platform that allows users to report content. This information can be used to investigate the content and take appropriate action, such as removing the post or banning the user who posted it.
The Number of Seniors and Juniors to Join the Company II
Problem:
Count the number of seniors and juniors in a company with the following employee table:
CREATE TABLE Employee (
id INT PRIMARY KEY,
name VARCHAR(255),
position VARCHAR(255)
);
Solution:
SELECT
COUNT(CASE WHEN position = 'Senior' THEN 1 END) AS num_seniors,
COUNT(CASE WHEN position = 'Junior' THEN 1 END) AS num_juniors
FROM
Employee;
Breakdown:
The COUNT() function counts the non-NULL values of its argument.
The CASE expression checks the value of the position column. With no ELSE branch, it returns NULL for rows that do not match, so those rows are not counted.
The WHEN clause specifies the condition that must be met for the CASE expression to return a value.
The THEN clause specifies the value that the CASE expression returns if the condition is met.
The END keyword closes the CASE expression.
Real-World Application:
This query could be used by a company to track the number of senior and junior employees they have. This information could be used to make decisions about hiring, promotion, and training.
Example:
| num_seniors | num_juniors |
|-------------|-------------|
| 5 | 10 |
This example shows that the company has 5 senior employees and 10 junior employees.
Active Businesses
Problem Statement:
Given a table called Businesses that contains information about businesses, including their business_id, address, and active status, find all active businesses.
SQL Solution:
SELECT *
FROM Businesses
WHERE active = 1;
Breakdown and Explanation:
SELECT *: This clause selects all columns from the Businesses table.
FROM Businesses: This clause specifies the table from which the data will be retrieved.
WHERE active = 1: This clause filters the results to only include businesses where the active column is set to 1. In this table, a value of 1 indicates that the business is active.
Example:
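Consider the following Businesses table:
| business_id | address | active |
| --- | --- | --- |
| 1 | 123 Main Street | 1 |
| 2 | 456 Elm Street | 0 |
| 3 | 789 Oak Street | 1 |
Running the query on this table will return the following result:
| business_id | address | active |
| --- | --- | --- |
| 1 | 123 Main Street | 1 |
| 3 | 789 Oak Street | 1 |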
These are the two businesses that are active.
Real-World Applications:
This query can be useful in various real-world applications, such as:
Creating a list of active businesses for a directory or website.
Identifying active businesses for marketing campaigns or promotions.
Tracking the status of businesses for regulatory purposes.
Delete Duplicate Emails
Problem: Delete Duplicate Emails
SQL Solution:
DELETE FROM users
WHERE id NOT IN (SELECT MIN(id)
FROM users
GROUP BY email);
Breakdown:
DELETE FROM users: This deletes rows from the "users" table.
WHERE id NOT IN(...): This condition checks for rows where the "id" column is not included in the list of minimum "id"s for each email address.
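SELECT MIN(id) FROM users GROUP BY email: This subquery finds the minimum "id" for each unique email address. Any row whose "id" is not in this list is a duplicate and is deleted. MySQL does not allow a DELETE statement to select from its own target table in a subquery, so there the subquery has to be wrapped in a derived table.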
Simplified Explanation:
Imagine you have a table of users with columns for "id" and "email." Each email address may be associated with multiple rows (e.g., if a user has multiple accounts). We want to keep only the row with the smallest "id" for each email address and delete the duplicates.
The query accomplishes this by first identifying the minimum "id" for each email address. Then, it deletes all rows except those with the matching minimum "id."
Example:
| id | email |
| --- | ------------- |
| 1 | user@example.com |
| 2 | user@example.com |
| 3 | another@example.com |
| 4 | another@example.com |
DELETE FROM users
WHERE id NOT IN (SELECT MIN(id)
FROM users
GROUP BY email);
Result:
| id | email |
| --- | ------------- |
| 1 | user@example.com |
| 3 | another@example.com |
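The example can be reproduced with a short script. This is a minimal sketch, assuming SQLite through Python's sqlite3 module (SQLite accepts the self-referencing subquery as written); the function name delete_duplicate_emails is hypothetical.

```python
import sqlite3

def delete_duplicate_emails(rows):
    """Keep the lowest id per email and return the surviving rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    conn.execute("""
        DELETE FROM users
        WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY email)
    """)
    remaining = conn.execute("SELECT id, email FROM users ORDER BY id").fetchall()
    conn.close()
    return remaining

users = [
    (1, "user@example.com"),
    (2, "user@example.com"),
    (3, "another@example.com"),
    (4, "another@example.com"),
]
print(delete_duplicate_emails(users))
```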
Real-World Applications:
This query can be useful in various scenarios, such as:
Cleaning up user data tables by removing duplicate email addresses for better data integrity.
Enforcing unique email addresses in a registration system to prevent multiple accounts for the same user.
Identifying and removing duplicate emails from a marketing list to improve email deliverability rates.
Sales Analysis III
Problem Statement:
Given a sales database with tables Orders and Products, find the total sales for each product group.
Tables:
Orders:
- order_id: INTEGER
- customer_id: INTEGER
- product_id: INTEGER
- quantity: INTEGER
- unit_price: DECIMAL
Products:
- product_id: INTEGER
- product_group: STRING
- product_name: STRING
Solution:
SELECT
p.product_group,
SUM(o.quantity * o.unit_price) AS total_sales
FROM Orders AS o
JOIN Products AS p
ON o.product_id = p.product_id
GROUP BY
p.product_group;
Explanation:
We start by joining the Orders and Products tables on the product_id column to link orders to products.
We then use the SUM() function to calculate the total sales for each product group by multiplying the quantity ordered by the unit price and summing up the results.
The GROUP BY clause groups the results by product group to show the total sales for each group.
Example:
| product_group | total_sales |
| --- | --- |
| Electronics | 10000 |
| Clothing | 5000 |
| Furniture | 2000 |
Real-World Applications:
This query can be used for various analytics and reporting purposes, such as:
Identifying the most profitable product groups
Analyzing sales trends and patterns
Making informed decisions about product development and marketing
Number of Trusted Contacts of a Customer
Problem: Given a table of customer information and a table of trusted contacts, determine the number of trusted contacts for each customer.
Table Schema:
customers: Contains customer data, including customer_id.
trusted_contacts: Contains trusted contact data, including customer_id and contact_id.
SQL Query:
SELECT c.customer_id, COUNT(DISTINCT tc.contact_id) AS num_trusted_contacts
FROM customers c
LEFT JOIN trusted_contacts tc ON c.customer_id = tc.customer_id
GROUP BY c.customer_id;
Explanation:
The LEFT JOIN combines the customers and trusted_contacts tables, preserving all customers even if they have no trusted contacts.
The COUNT(DISTINCT tc.contact_id) function counts the distinct contact IDs for each customer, giving us the total number of trusted contacts. For a customer with no contacts, tc.contact_id is NULL and the count is 0.
The GROUP BY c.customer_id clause groups the results by customer_id, ensuring that each customer has its own row with the corresponding number of trusted contacts.
Example:
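customers table:
| customer_id | name |
| --- | --- |
| 1 | John Doe |
| 2 | Jane Smith |
trusted_contacts table:
| customer_id | contact_id |
| --- | --- |
| 1 | 101 |
| 1 | 102 |
| 2 | 201 |
| 2 | 202 |
Result:
| customer_id | num_trusted_contacts |
| --- | --- |
| 1 | 2 |
| 2 | 2 |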
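The effect of the LEFT JOIN shows most clearly for a customer without any trusted contacts. The sketch below assumes SQLite through Python's sqlite3 module and adds a third, hypothetical customer (Sam Lee) to the sample data for that purpose; the function name count_trusted_contacts is also hypothetical.

```python
import sqlite3

def count_trusted_contacts(customers, contacts):
    """Return (customer_id, number of distinct trusted contacts)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE trusted_contacts (customer_id INTEGER, contact_id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO trusted_contacts VALUES (?, ?)", contacts)
    query = """
        SELECT c.customer_id, COUNT(DISTINCT tc.contact_id) AS num_trusted_contacts
        FROM customers AS c
        LEFT JOIN trusted_contacts AS tc ON c.customer_id = tc.customer_id
        GROUP BY c.customer_id
        ORDER BY c.customer_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

customers = [(1, "John Doe"), (2, "Jane Smith"), (3, "Sam Lee")]  # Sam Lee is illustrative
contacts = [(1, 101), (1, 102), (2, 201), (2, 202)]
print(count_trusted_contacts(customers, contacts))
```

Customer 3 is kept by the LEFT JOIN and reported with a count of 0, because COUNT ignores the NULL contact_id.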
Real-World Applications:
Customer Relationship Management (CRM) systems: Understand the level of trust between customers and their trusted contacts.
Fraud detection: Identify customers with an unusually large number of trusted contacts, which may indicate suspicious activity.
Social network analysis: Determine the connectivity and influence of customers within a social network.
The Most Recent Three Orders
Problem Statement: Given a table Orders that contains columns like order_id, customer_id, created_at, and total_amount, find the most recent three orders for each customer.
Solution:
-- Select the order_id, created_at, and total_amount for the most recent three orders for each customer.
SELECT order_id,
created_at,
total_amount
-- From the Orders table.
FROM Orders
-- Use a subquery to find the most recent three orders for each customer.
WHERE order_id IN (
SELECT order_id
FROM (
SELECT order_id,
customer_id,
created_at,
total_amount,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS row_num
FROM Orders
) AS subquery
WHERE row_num <= 3
);
Explanation:
The main query selects the order_id, created_at, and total_amount for the most recent three orders for each customer.
The subquery finds the most recent three orders for each customer using the ROW_NUMBER() function.
The ROW_NUMBER() function assigns a sequential number to each row within a partition.
The PARTITION BY customer_id clause ensures that the rows are partitioned by customer_id.
The ORDER BY created_at DESC clause sorts the rows within each partition in descending order of created_at.
The row_num <= 3 condition selects the rows with the lowest three row numbers, which correspond to the most recent three orders for each customer.
The IN clause in the main query uses the subquery to select the order_ids of the most recent three orders for each customer.
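The ranking step can be tried on a small data set. This is a minimal sketch, assuming SQLite 3.25 or later (for window functions) through Python's sqlite3 module, made-up order rows, and a hypothetical function name recent_orders. It filters the ranked rows directly instead of going through the IN clause, which returns the same orders.

```python
import sqlite3

def recent_orders(orders, n=3):
    """Return (customer_id, order_id, created_at) for each customer's n latest orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "created_at TEXT, total_amount REAL)"
    )
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", orders)
    query = """
        SELECT customer_id, order_id, created_at
        FROM (
            SELECT customer_id, order_id, created_at,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id ORDER BY created_at DESC
                   ) AS row_num
            FROM Orders
        ) AS ranked
        WHERE row_num <= ?
        ORDER BY customer_id, created_at DESC
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result

# Illustrative data: customer 1 has four orders, customer 2 has one.
orders = [
    (1, 1, "2024-01-01", 10.0), (2, 1, "2024-01-05", 20.0),
    (3, 1, "2024-01-09", 30.0), (4, 1, "2024-01-12", 40.0),
    (5, 2, "2024-01-03", 15.0),
]
print(recent_orders(orders))
```

Customer 1's oldest order is dropped, and customer 2 keeps the single order they have.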
Real-World Applications:
Identifying the most recent orders for a customer can be useful for various purposes, such as:
Providing personalized recommendations based on recent purchases.
Tracking the status of recent orders and providing updates to customers.
Identifying any issues or delays with recent orders.
Apples & Oranges
Problem Statement
Implement a SQL query to return a list of apples and oranges and their total count. The table is given as:
+-------------+-------------+
| Fruits | Count |
+-------------+-------------+
| Apples | 10 |
| Oranges | 5 |
| Apples | 7 |
| Oranges | 10 |
+-------------+-------------+
Solution
-- Group the fruits by name and sum their counts
SELECT Fruits, SUM(Count) AS TotalCount
FROM Fruits
GROUP BY Fruits
ORDER BY Fruits;
Output
+-------------+-------------+
| Fruits | TotalCount |
+-------------+-------------+
| Apples | 17 |
| Oranges | 15 |
+-------------+-------------+
Explanation
GROUP BY Fruits: This clause groups the rows in the Fruits table by the Fruits column. This means that all rows with the same fruit name will be grouped together.
SUM(Count): This expression calculates the sum of the Count column for each group. This gives us the total count of each fruit.
ORDER BY Fruits: This clause sorts the results by the Fruits column in ascending order.
Real-World Application
This query can be used in a variety of real-world applications, such as:
Inventory management: To track the total count of apples and oranges in a warehouse.
Sales analysis: To determine which fruits are selling the best.
Market research: To gather data on the popularity of different fruits.
Recyclable and Low Fat Products
LeetCode Problem:
Recyclable and Low Fat Products
Problem Statement:
You are given two tables:
products: Each row represents a product with product_id, product_name, and product_type.
product_tags: Each row represents a tag associated with a product with product_id and tag_id.
Find all products that are both recyclable and low fat.
Example:
products:
+--------------+-------------------+--------------------+
| product_id | product_name | product_type |
+--------------+-------------------+--------------------+
| 1 | Apple | Fruits |
| 2 | Banana | Fruits |
| 3 | Milk | Dairy |
| 4 | Yogurt | Dairy |
| 5 | Cheese | Dairy |
+--------------+-------------------+--------------------+
product_tags:
+--------------+--------+
| product_id | tag_id |
+--------------+--------+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 1 | 6 |
+--------------+--------+
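Result (assuming, for illustration, that tag_id 1 means 'recyclable' and tag_id 6 means 'low_fat'; the problem statement does not show the tag names, and product 1 is the only product in the sample data that carries two tags):
+--------------+-------------------+
| product_id | product_name |
+--------------+-------------------+
| 1 | Apple |
+--------------+-------------------+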
Solution:
SELECT p.product_id, p.product_name
FROM products AS p
INNER JOIN product_tags AS pt ON p.product_id = pt.product_id
INNER JOIN tags AS t ON pt.tag_id = t.tag_id
WHERE t.tag IN ('recyclable', 'low_fat')
GROUP BY p.product_id, p.product_name
HAVING COUNT(DISTINCT t.tag) = 2;
Breakdown:
Join Products and Tags: We join the products and product_tags tables on product_id to associate products with their tags, then join the tags lookup table on tag_id to get each tag's name. The tags table (tag_id, tag) is not listed in the problem statement and is assumed here.
Filter by Tag: The WHERE clause keeps only the rows whose tag is 'recyclable' or 'low_fat'. A single product_tags row carries one tag, so requiring both tags on the same row (tag_id IN (...) AND tag_id IN (...)) would never match.
Require Both Tags: We group by product and keep the groups that contain two distinct tags, that is, both 'recyclable' and 'low_fat'.
Final Result: We select the product_id and product_name of the products that have both a recyclable tag and a low fat tag.
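Requiring several tags at once is commonly expressed by grouping per product and counting the distinct matching tags. The sketch below illustrates that pattern under stated assumptions: SQLite through Python's sqlite3 module, a hypothetical tags lookup table (tag_id, tag), made-up sample rows, and a hypothetical function name products_with_all_tags.

```python
import sqlite3

def products_with_all_tags(wanted):
    """Return products that carry every tag name in `wanted`."""
    wanted = sorted(set(wanted))
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
        CREATE TABLE product_tags (product_id INTEGER, tag_id INTEGER);
        INSERT INTO products VALUES (1, 'Apple'), (2, 'Banana'), (3, 'Milk');
        INSERT INTO tags VALUES (1, 'recyclable'), (2, 'low_fat'), (3, 'organic');
        -- Apple: recyclable + low_fat; Banana: recyclable; Milk: low_fat + organic
        INSERT INTO product_tags VALUES (1, 1), (1, 2), (2, 1), (3, 2), (3, 3);
    """)
    placeholders = ", ".join("?" for _ in wanted)
    query = f"""
        SELECT p.product_id, p.product_name
        FROM products AS p
        JOIN product_tags AS pt ON p.product_id = pt.product_id
        JOIN tags AS t ON pt.tag_id = t.tag_id
        WHERE t.tag IN ({placeholders})
        GROUP BY p.product_id, p.product_name
        HAVING COUNT(DISTINCT t.tag) = ?
        ORDER BY p.product_id
    """
    result = conn.execute(query, [*wanted, len(wanted)]).fetchall()
    conn.close()
    return result

print(products_with_all_tags(["recyclable", "low_fat"]))
```

Only the product that has both tags survives the HAVING filter; products with just one of the two are grouped but discarded.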
Real-World Applications:
Consumer Information: Retailers can use this information to help consumers identify products that are both environmentally friendly and healthy.
Product Development: Manufacturers can use this analysis to develop and market products that meet the growing demand for sustainable and low-fat options.
Online Shopping: E-commerce platforms can provide filters and recommendations to guide customers towards recyclable and low fat products.
Leetcodify Friends Recommendations
Problem Statement
You are given a table Friends with the following columns:
user_id1
user_id2
This table represents friendships between users. Each row indicates that user user_id1 is friends with user user_id2.
You are also given a table RecommendedFriends with the following columns:
user_id
recommended_friend_id
This table contains potential friend recommendations for users. Each row indicates that user user_id may be interested in becoming friends with user recommended_friend_id.
Your task is to write a query to find a list of recommended friends for each user in the Friends table. The recommended friends should not already be friends with the user.
SQL Implementation
SELECT DISTINCT
F.user_id1 AS user_id,
RF.recommended_friend_id
FROM
Friends AS F
INNER JOIN
RecommendedFriends AS RF
ON
F.user_id1 = RF.user_id
WHERE
NOT EXISTS (
SELECT
1
FROM
Friends
WHERE
(user_id1 = F.user_id1 AND user_id2 = RF.recommended_friend_id)
OR (user_id2 = F.user_id1 AND user_id1 = RF.recommended_friend_id)
);
Explanation
The query starts by joining the Friends table (aliased as F) with the RecommendedFriends table (aliased as RF), matching F.user_id1 to RF.user_id. This ensures that we only consider potential friend recommendations for users who appear in the user_id1 column of the Friends table. A user with several friendships appears in several Friends rows, so the same recommendation can be produced more than once by the join.
The WHERE clause uses a NOT EXISTS subquery to exclude any recommended friends who are already friends with the user. The subquery checks whether there are any rows in the Friends table where one of the users is the same as the user_id1 from the outer query and the other user is the same as the recommended_friend_id from the outer query. Both orderings are checked because a friendship can be stored in either direction.
The final result is a table containing a list of recommended friends for each user in the Friends table.
Real-World Applications
This query can be used in a variety of real-world applications, including:
Social media: Recommending new friends to users on social media platforms.
E-commerce: Recommending products to users based on their past purchases and browsing history.
Customer relationship management (CRM): Identifying potential customers who may be interested in a particular product or service.
Drop Type 1 Orders for Customers With Type 0 Orders
Problem Statement:
You are given a table Orders
with the following schema:
CREATE TABLE Orders (
order_id INT PRIMARY KEY, -- customer_id cannot be the key: a customer may have several orders
customer_id INT,
order_type INT,
order_amount INT
);
You need to delete all orders of type 1 for customers who have at least one order of type 0.
Solution:
DELETE FROM Orders
WHERE order_type = 1
AND customer_id IN (SELECT customer_id FROM Orders WHERE order_type = 0);
Explanation:
Identify Customers with Type 0 Orders: We use a subquery to select the customer_ids of all customers who have at least one order of type 0.
Delete Type 1 Orders: We then use these customer_ids to delete all orders of type 1 for those customers.
Simplified Explanation:
Imagine you have a table of orders, where each row represents an order placed by a customer. Each order has an order type (0 or 1) and an order amount.
We want to delete all orders of type 1 for customers who have also placed at least one order of type 0.
To do this, we first find all the customers who have placed an order of type 0. Then, we use this list to delete all their orders of type 1.
Real-World Applications:
This query can be used in various real-world scenarios, such as:
Customer Segmentation: Identifying customers who have placed specific types of orders can help businesses segment their customers and target marketing campaigns accordingly.
Inventory Management: Removing duplicate or unwanted orders can help optimize inventory levels and prevent overstocking.
Fraud Detection: Identifying customers who place orders with conflicting order types can help detect fraudulent activities.
Rectangles Area
Problem Statement:
You are given a table rectangles that contains the following columns:
id: Integer, the unique identifier of the rectangle.
x1: Integer, the x-coordinate of the lower left corner.
y1: Integer, the y-coordinate of the lower left corner.
x2: Integer, the x-coordinate of the upper right corner.
y2: Integer, the y-coordinate of the upper right corner.
Find the total area that is covered by all the rectangles.
Example:
Input:
+---+-----+-----+-----+-----+
| id | x1 | y1 | x2 | y2 |
+---+-----+-----+-----+-----+
| 1 | 1 | 1 | 4 | 5 |
| 2 | 3 | 2 | 5 | 7 |
| 3 | 6 | 3 | 8 | 6 |
+---+-----+-----+-----+-----+
Output:
28
Solution:
Calculate the area of each rectangle:
SELECT id, (x2 - x1) * (y2 - y1) AS area
FROM rectangles;
Find the total area:
SELECT SUM(area) AS total_area
FROM (
SELECT id, (x2 - x1) * (y2 - y1) AS area
FROM rectangles
) AS rect_areas; -- most databases require an alias for a derived table
Complete Solution:
SELECT SUM(area) AS total_area
FROM (
SELECT (x2 - x1) * (y2 - y1) AS area
FROM rectangles
) AS rect_areas;
Explanation:
The subquery calculates the area of each rectangle by subtracting x1 from x2 and y1 from y2, then multiplying the results.
The outer query then sums the areas of all the rectangles.
This is the sum of the individual areas, so a region where two rectangles overlap is counted once for each rectangle that covers it. The formula also assumes that (x1, y1) is the lower left corner and (x2, y2) the upper right corner, as the column definitions state.
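For the sample rectangles, the individual areas are 12, 10, and 6. Rectangles 1 and 2 overlap in a 1 by 3 strip, so the summed area and the area actually covered differ. The sketch below, which assumes SQLite through Python's sqlite3 module and hypothetical function names, runs the summing query and compares it with a brute-force count of covered unit cells. The cell count is an illustration that works only for small integer coordinates, not part of the SQL solution.

```python
import sqlite3

RECTANGLES = [(1, 1, 1, 4, 5), (2, 3, 2, 5, 7), (3, 6, 3, 8, 6)]

def summed_area(rects):
    """Sum of individual rectangle areas, as computed by the SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rectangles "
        "(id INTEGER PRIMARY KEY, x1 INTEGER, y1 INTEGER, x2 INTEGER, y2 INTEGER)"
    )
    conn.executemany("INSERT INTO rectangles VALUES (?, ?, ?, ?, ?)", rects)
    (total,) = conn.execute("""
        SELECT SUM(area)
        FROM (SELECT (x2 - x1) * (y2 - y1) AS area FROM rectangles) AS rect_areas
    """).fetchone()
    conn.close()
    return total

def covered_area(rects):
    """Area covered at least once, by counting distinct unit cells."""
    cells = set()
    for _id, x1, y1, x2, y2 in rects:
        for x in range(x1, x2):
            for y in range(y1, y2):
                cells.add((x, y))
    return len(cells)

print(summed_area(RECTANGLES), covered_area(RECTANGLES))
```

The two numbers agree only when no rectangles overlap.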
Real-World Applications:
Calculating the total area covered by objects in a geographic area (e.g., buildings, parks, lakes).
Determining the amount of material needed to cover a surface (e.g., paint, fabric, flooring).
Estimating the amount of space required for a particular purpose (e.g., a warehouse, a parking lot, a garden).
Average Time of Process per Machine
Problem:
Find the average processing time of a process for each machine.
SQL Query:
-- Select the machine name and average processing time
SELECT machine_name,
AVG(processing_time) AS average_processing_time
-- From the 'processes' table
FROM processes
-- Group the results by machine name
GROUP BY machine_name;
Breakdown and Explanation:
SELECT:
machine_name: Select the name of the machine.
AVG(processing_time): Calculate the average processing time for each machine.
FROM:
processes: The table containing the processing time data.
GROUP BY:
machine_name: Group the results by machine name to calculate the average processing time for each machine.
Real-World Application:
This query can be used in a manufacturing or production environment to:
Identify machines with longer processing times, indicating potential bottlenecks.
Compare the performance of different machines or configurations.
Set performance targets and monitor progress towards improving process efficiency.
Example:
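Consider the following table:
| id | machine_name | processing_time |
| --- | --- | --- |
| 1 | Machine A | 10 |
| 2 | Machine A | 15 |
| 3 | Machine B | 20 |
| 4 | Machine B | 25 |
The query would produce:
| machine_name | average_processing_time |
| --- | --- |
| Machine A | 12.5 |
| Machine B | 22.5 |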
This indicates that Machine A has an average processing time of 12.5 units while Machine B has an average processing time of 22.5 units.
Find Interview Candidates
Problem:
Find candidates for an interview based on their skills and experience.
Table Schema:
candidates (
id INT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
skills VARCHAR(255) NOT NULL,
experience VARCHAR(255) NOT NULL
);
interviews (
id INT PRIMARY KEY,
position VARCHAR(255) NOT NULL,
required_skills VARCHAR(255) NOT NULL,
required_experience VARCHAR(255) NOT NULL
);
Solution:
SELECT
c.id,
c.name
FROM Candidates AS c
JOIN Interviews AS i
ON c.skills LIKE '%'||i.required_skills||'%'
AND c.experience LIKE '%'||i.required_experience||'%'
WHERE
i.position = 'Software Engineer';
Explanation:
JOIN: Combine the candidates and interviews tables on matching skill and experience requirements.
LIKE: Use the LIKE operator to match candidates' skills and experience to the required skills and experience for the position.
'%'||i.required_skills||'%': Surround the required skills with wildcard characters to allow for partial matches. The || operator is standard SQL string concatenation; MySQL treats || as a logical OR by default, so use CONCAT() there.
WHERE: Filter the results to the interview for the specified position, in this case, "Software Engineer."
Example:
candidates:
+----+------+--------+------------+
| id | name | skills | experience |
+----+------+--------+------------+
| 1 | John | Java | 5 years |
| 2 | Mary | Python | 3 years |
| 3 | Tom | C++ | 2 years |
+----+------+--------+------------+
interviews:
+----+-------------------+-----------------+---------------------+
| id | position | required_skills | required_experience |
+----+-------------------+-----------------+---------------------+
| 1 | Software Engineer | Java, Python | 3 years |
+----+-------------------+-----------------+---------------------+
Result of running the solution query on these tables:
+----+------+
| id | name |
+----+------+
+----+------+
The query returns no rows for this data. LIKE performs a substring match, so a candidate matches only if their skills column contains the entire text "Java, Python" and their experience column contains "3 years". John (Java, 5 years) would be returned only if the requirement were stored as "Java" and "5 years". Matching any one skill from a list, or a minimum number of years, requires storing skills in separate rows and experience in a numeric column.
Real-World Applications:
Recruiting: Find potential candidates for job openings based on their skills and experience.
Talent Mapping: Identify internal employees with the right skills for promotions or new projects.
Skill Gap Analysis: Determine the skills that are lacking in an organization and develop training programs accordingly.
Find All Unique Email Domains
Problem: Find All Unique Email Domains
Description: Given a table of email addresses, find all unique domains.
SQL Query:
SELECT SUBSTR(email, INSTR(email, '@') + 1) AS domain
FROM email_table
GROUP BY domain;
Breakdown:
Extract the Domain: The SUBSTR() function is used to extract the substring of the email address that starts after the '@' symbol. This is where the domain is located.
Group by Domain: The results are grouped by the domain column using the GROUP BY clause. This ensures that only unique domains are returned.
Example:
| email |
|---|
| john@example.com |
| jane@example.com |
| bob@gmail.com |
| alice@yahoo.com |
Result:
| domain |
|---|
| example.com |
| gmail.com |
| yahoo.com |
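The example can be reproduced with a short script. This is a minimal sketch, assuming SQLite through Python's sqlite3 module, which provides both SUBSTR() and INSTR(); the function name unique_domains is hypothetical, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

def unique_domains(emails):
    """Return the distinct domains found in a list of email addresses."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE email_table (email TEXT)")
    conn.executemany("INSERT INTO email_table VALUES (?)", [(e,) for e in emails])
    query = """
        SELECT SUBSTR(email, INSTR(email, '@') + 1) AS domain
        FROM email_table
        GROUP BY domain
        ORDER BY domain
    """
    domains = [row[0] for row in conn.execute(query)]
    conn.close()
    return domains

addresses = [
    "john@example.com", "jane@example.com",
    "bob@gmail.com", "alice@yahoo.com",
]
print(unique_domains(addresses))
```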
Real-World Applications:
Email Marketing: Identifying unique email domains can help email marketers target specific audiences.
Spam Detection: Analyzing email domains can help identify potential spam or phishing attempts.
Data Analysis: Understanding the distribution of email domains can provide insights into user demographics and online behavior.
Hopper Company Queries II
Problem:
Given a table called Accounts with columns id, email, and balance, and a table called Transactions with columns id, from_account, to_account, amount, and timestamp:
Write a SQL query to find the accounts with the highest balance. If more than one account has the highest balance, return all of them.
SQL Query:
SELECT id, email, balance
FROM Accounts
WHERE balance = (SELECT MAX(balance) FROM Accounts);
Explanation:
The subquery (SELECT MAX(balance) FROM Accounts) finds the maximum balance in the Accounts table.
The outer query selects all accounts with a balance equal to the maximum balance.
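A small sqlite3 sketch can confirm that the subquery approach returns every account tied for the highest balance. The table contents below are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (id INTEGER PRIMARY KEY, email TEXT, balance REAL)"
)
# Hypothetical rows: accounts 2 and 3 are tied for the highest balance.
conn.executemany(
    "INSERT INTO Accounts VALUES (?, ?, ?)",
    [(1, "a@example.com", 500.0),
     (2, "b@example.com", 900.0),
     (3, "c@example.com", 900.0)],
)

top_accounts = conn.execute(
    """
    SELECT id, email, balance
    FROM Accounts
    WHERE balance = (SELECT MAX(balance) FROM Accounts)
    ORDER BY id
    """
).fetchall()
print(top_accounts)
```

An ORDER BY balance DESC LIMIT 1 query would return only one of the tied accounts, which is why the comparison against MAX(balance) is used here.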
Real-World Example:
A bank wants to know which accounts have the highest balance. This information can be used to target marketing campaigns to these accounts.
Applications in Real World:
Identifying high-value customers for targeted marketing campaigns
Monitoring account balances for potential fraud or overdraft fees
Tracking the financial health of a business or organization
Friday Purchases II
Problem Statement: Find the total amount spent by customers on Fridays from the 'Sales' table.
SQL Query:
SELECT SUM(Amount)
FROM Sales
WHERE DAYNAME(Date) = 'Friday';
Breakdown:
SELECT SUM(Amount): Calculates the total amount spent by summing up the 'Amount' column.
FROM Sales: Specifies the table from which data will be retrieved.
WHERE DAYNAME(Date) = 'Friday': Filters the rows to include only sales made on Friday. The DAYNAME() function returns the name of the day of the week for a given date.
Explanation:
The WHERE clause ensures that only records where the 'Date' column has a day name of 'Friday' are included in the calculation. The SUM() function computes the total amount spent by summing up the 'Amount' column across all eligible rows.
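DAYNAME() is a MySQL function, so the query will not run unchanged everywhere. As a hedged sketch, the same filter can be written for SQLite with strftime('%w', ...), which returns the weekday as a digit with Sunday as 0 and Friday as 5. The sample rows below are assumptions chosen so that two of them fall on a Friday.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (ID INTEGER PRIMARY KEY, Date TEXT, Amount INTEGER)")
# Hypothetical data: 2023-01-06 and 2023-01-13 are Fridays.
conn.executemany(
    "INSERT INTO Sales (ID, Date, Amount) VALUES (?, ?, ?)",
    [(1, "2023-01-01", 10), (2, "2023-01-06", 20),
     (3, "2023-01-07", 15), (4, "2023-01-13", 25)],
)

# strftime('%w', ...) yields '0' (Sunday) through '6' (Saturday).
friday_total = conn.execute(
    "SELECT SUM(Amount) FROM Sales WHERE strftime('%w', Date) = '5'"
).fetchone()[0]
print(friday_total)
```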
Real-World Application:
This query can be used in various real-world scenarios:
Retail Analytics: To analyze customer behavior and understand their spending patterns on specific days of the week.
Sales Performance Monitoring: To track weekly sales performance and identify trends based on weekdays.
Loyalty Program Management: To award rewards or discounts to customers who make purchases on designated days, such as Fridays.
Complete Code Implementation:
-- Create the 'Sales' table
CREATE TABLE Sales (
    ID INT PRIMARY KEY,
    Date DATE,
    Amount INT
);
-- Insert sample data (2023-01-06 and 2023-01-13 are Fridays)
INSERT INTO Sales (ID, Date, Amount) VALUES
    (1, '2023-01-01', 10),
    (2, '2023-01-06', 20),
    (3, '2023-01-07', 15),
    (4, '2023-01-13', 25);
-- Execute the query (DAYNAME() is a MySQL function)
SELECT SUM(Amount)
FROM Sales
WHERE DAYNAME(Date) = 'Friday';
Expected Result:
Two of the sample sales fall on a Friday ('2023-01-06' and '2023-01-13'), so the query returns:
45
Generate the Invoice
Problem:
You are given two tables, plus the Customer and Product lookup tables that supply the names in the column list below:
Invoice: Contains invoice data such as invoice number, invoice date, customer ID, etc.
InvoiceLine: Contains invoice line item data such as product ID, quantity, unit price, etc.
Write a query to generate an invoice for a specific invoice number. The invoice should include the following columns:
Invoice Number
Invoice Date
Customer ID
Customer Name (from the Customer table)
Product ID
Product Name (from the Product table)
Quantity
Unit Price
Line Total
Invoice Total
Solution:
SELECT
    Invoice.InvoiceNo,
    Invoice.InvoiceDate,
    Invoice.CustomerID,
    Customer.CustomerName,
    InvoiceLine.ProductID,
    Product.ProductName,
    InvoiceLine.Quantity,
    InvoiceLine.UnitPrice,
    InvoiceLine.Quantity * InvoiceLine.UnitPrice AS LineTotal,
    -- Invoice total: sum of all line totals, repeated on every line of the invoice
    SUM(InvoiceLine.Quantity * InvoiceLine.UnitPrice)
        OVER (PARTITION BY Invoice.InvoiceNo) AS InvoiceTotal
FROM Invoice
JOIN Customer ON Invoice.CustomerID = Customer.CustomerID
JOIN InvoiceLine ON Invoice.InvoiceNo = InvoiceLine.InvoiceNo
JOIN Product ON InvoiceLine.ProductID = Product.ProductID
WHERE
    Invoice.InvoiceNo = 'INV0001';
Explanation:
JOIN Tables: We first join the Invoice, Customer, InvoiceLine, and Product tables using the appropriate foreign key relationships. This ensures that we can access data from all four tables.
WHERE Clause: We use the WHERE clause to filter the results based on the specified invoice number ('INV0001' in this example).
SELECT Clause: The SELECT clause specifies the columns to be included in the invoice. These include invoice details, customer information, product details, and invoice line item details such as quantity, unit price, and line total.
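The problem also asks for an Invoice Total column. One possible way to produce it, sketched here with sqlite3, is a window function that sums the line totals per invoice. The table contents are hypothetical, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE Invoice (InvoiceNo TEXT PRIMARY KEY, InvoiceDate TEXT, CustomerID INTEGER);
    CREATE TABLE InvoiceLine (InvoiceNo TEXT, ProductID INTEGER, Quantity INTEGER, UnitPrice REAL);
    -- Hypothetical sample data
    INSERT INTO Customer VALUES (1, 'Acme Ltd');
    INSERT INTO Product VALUES (10, 'Widget'), (11, 'Gadget');
    INSERT INTO Invoice VALUES ('INV0001', '2023-01-15', 1);
    INSERT INTO InvoiceLine VALUES ('INV0001', 10, 2, 10.0), ('INV0001', 11, 1, 5.5);
    """
)

invoice_rows = conn.execute(
    """
    SELECT i.InvoiceNo, c.CustomerName, p.ProductName,
           il.Quantity * il.UnitPrice AS LineTotal,
           SUM(il.Quantity * il.UnitPrice)
               OVER (PARTITION BY i.InvoiceNo) AS InvoiceTotal
    FROM Invoice AS i
    JOIN Customer AS c ON i.CustomerID = c.CustomerID
    JOIN InvoiceLine AS il ON i.InvoiceNo = il.InvoiceNo
    JOIN Product AS p ON il.ProductID = p.ProductID
    WHERE i.InvoiceNo = ?
    ORDER BY il.ProductID
    """,
    ("INV0001",),
).fetchall()
for row in invoice_rows:
    print(row)
```

The invoice number is passed as a bound parameter rather than concatenated into the SQL text, which is the usual way to avoid SQL injection when the value comes from user input.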
Real-World Application:
This query is useful for generating invoices for customers in various business applications, such as e-commerce platforms, billing systems, and accounting software. It provides a complete view of an invoice, including customer information, product details, and the total amount due.
All People Report to the Given Manager
Problem Statement:
Given two tables:
Employee (id, name, manager_id)
Manager (id, name, department_id)
Return a list of employees who report directly to the given manager.
SQL Solution:
SELECT e.name
FROM Employee e
INNER JOIN Manager m ON e.manager_id = m.id
WHERE m.name = 'Given Manager Name';
Breakdown:
INNER JOIN: We join the Employee and Manager tables on the manager_id column to connect employees with their managers.
WHERE Clause: We filter the results to include only employees whose managers have the specified name.
Example:
Employee:
+----+------+------------+
| id | name | manager_id |
+----+------+------------+
| 1  | John | 2          |
| 2  | Mary | 3          |
| 3  | Bob  | null       |
+----+------+------------+
Manager:
+----+-------+---------------+
| id | name  | department_id |
+----+-------+---------------+
| 1  | Alice | 10            |
| 2  | Tom   | 20            |
| 3  | Susan | 30            |
+----+-------+---------------+
Query:
SELECT e.name
FROM Employee e
INNER JOIN Manager m ON e.manager_id = m.id
WHERE m.name = 'Tom';
Result:
+------+
| name |
+------+
| John |
+------+
John is the only employee who reports directly to Tom, so his name is returned.
Real-World Applications:
Employee Management Systems: Identify the employees who report to a specific manager for performance evaluations, project assignments, or organizational restructuring.
HR Reporting: Generate reports on the number of employees reporting to each manager or the average salaries of teams under their supervision.
Team Building: Create a list of employees who work directly with a given manager for team-building activities or project collaboration.
Classifying Triangles by Lengths
Problem Statement:
You have a table triangles that contains the lengths of the three sides of a triangle: a, b, and c. Classify the triangles based on the lengths of their sides into three categories:
Equilateral: All three sides are equal.
Isosceles: Two of the three sides are equal.
Scalene: All three sides are different.
Best & Performant SQL Solution:
SELECT
CASE
WHEN a = b AND b = c
THEN 'Equilateral'
WHEN a = b OR b = c OR a = c
THEN 'Isosceles'
ELSE 'Scalene'
END AS triangle_type
FROM triangles;
Implementation Details:
The CASE expression is used to determine the triangle type based on the values of a, b, and c.
If all three sides are equal (a = b = c), the triangle is Equilateral.
If any two sides are equal (a = b, b = c, or a = c), the triangle is Isosceles.
Otherwise, all three sides are different and the triangle is Scalene.
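The branch order matters: CASE stops at the first matching WHEN, so an equilateral triangle is labeled before the Isosceles test is reached. The following sqlite3 sketch, using assumed sample rows, runs the classification query as described.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triangles (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO triangles VALUES (?, ?, ?)",
    [(3, 3, 3), (3, 4, 3), (2, 3, 4)],
)

classified = conn.execute(
    """
    SELECT a, b, c,
           CASE
               WHEN a = b AND b = c THEN 'Equilateral'
               WHEN a = b OR b = c OR a = c THEN 'Isosceles'
               ELSE 'Scalene'
           END AS triangle_type
    FROM triangles
    ORDER BY rowid  -- keep insertion order for a predictable printout
    """
).fetchall()
triangle_types = [row[3] for row in classified]
print(triangle_types)
```

Note that the query only compares side lengths; it does not check whether the three lengths can form a valid triangle.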
Example Usage:
CREATE TABLE triangles (a INT, b INT, c INT);
INSERT INTO triangles VALUES (3, 3, 3); -- Equilateral
INSERT INTO triangles VALUES (3, 4, 3); -- Isosceles
INSERT INTO triangles VALUES (2, 3, 4); -- Scalene
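SELECT a, b, c,
    CASE WHEN a = b AND b = c THEN 'Equilateral'
         WHEN a = b OR b = c OR a = c THEN 'Isosceles'
         ELSE 'Scalene' END AS triangle_type
FROM triangles;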
+---+---+---+--------------+
| a | b | c | triangle_type |
+---+---+---+--------------+
| 3 | 3 | 3 | Equilateral |
| 3 | 4 | 3 | Isosceles |
| 2 | 3 | 4 | Scalene |
+---+---+---+--------------+
Real-World Applications:
Triangle classification is used in various fields, including:
Engineering: Determining the stability and strength of structures based on the shape of their components.
Architecture: Designing buildings with specific structural properties.
Biology: Identifying and classifying plant and animal species based on their shape.
Computer Graphics: Rendering objects with different triangles to create realistic models.
Compute the Rank as a Percentage
Problem Statement
Given a table Scores with two columns:
score (an integer)
name (a string)
Write a SQL query to rank the students based on their scores in descending order, and output the rank as a percentage rounded to two decimal places.
Real World Application
Employee Performance Evaluation:
Rank employees based on their performance scores.
Calculate the percentage rank to provide a fair comparison among employees.
SQL Solution
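SELECT
    name,
    score,
    RANK() OVER (ORDER BY score DESC) AS rank,
    -- COUNT(*) OVER () counts all rows without collapsing them;
    -- multiplying by 100.0 first avoids integer division
    ROUND(RANK() OVER (ORDER BY score DESC) * 100.0 / COUNT(*) OVER (), 2) AS rank_percentage
FROM
    Scores;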
Code Explanation
The RANK() function assigns a rank to each row based on the score column in descending order.
The COUNT(*) function returns the total number of rows in the table.
The ROUND() function rounds the rank percentage to two decimal places for better readability.
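As a rough check of how these three functions combine, the sketch below runs a rank-divided-by-row-count query in sqlite3 (window functions need SQLite 3.25 or newer). The alias score_rank and the sample rows are assumptions; the count is written as COUNT(*) OVER () so that it can sit next to RANK() without collapsing the rows, and the multiplication by 100.0 avoids integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?)",
    [("John", 90), ("Mary", 80), ("Bob", 70)],
)

ranked = conn.execute(
    """
    SELECT name,
           score,
           RANK() OVER (ORDER BY score DESC) AS score_rank,
           ROUND(RANK() OVER (ORDER BY score DESC) * 100.0
                 / COUNT(*) OVER (), 2) AS rank_percentage
    FROM Scores
    ORDER BY score_rank
    """
).fetchall()
for row in ranked:
    print(row)
```

With this formula the top scorer gets the smallest percentage (rank 1 of 3 is about 33.33) and the lowest scorer gets 100.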
Example
Input Table:
| name | score |
|---|---|
| John | 90 |
| Mary | 80 |
| Bob | 70 |
Output:
| name | score | rank | rank_percentage |
|---|---|---|---|
| John | 90 | 1 | 33.33 |
| Mary | 80 | 2 | 66.67 |
| Bob | 70 | 3 | 100.00 |
Breakdown
The RANK() function assigns ranks based on the score column in descending order:
John has the highest score of 90, so he is ranked 1st.
Mary has the next highest score of 80, so she is ranked 2nd.
Bob has the lowest score of 70, so he is ranked 3rd.
The COUNT(*) function returns the total number of rows in the table, which is 3.
The ROUND() function rounds the rank percentage to two decimal places:
John's rank percentage is 33.33% (1 / 3 * 100).
Mary's rank percentage is 66.67% (2 / 3 * 100).
Bob's rank percentage is 100.00% (3 / 3 * 100).
Total Traveled Distance
Problem Statement:
Given a table logs that tracks the locations of a fleet of vehicles, calculate the total distance traveled by each vehicle between each pair of consecutive timestamps.
Table Schema:
CREATE TABLE logs (
vehicle_id INT NOT NULL,
timestamp TIMESTAMP NOT NULL,
longitude FLOAT NOT NULL,
latitude FLOAT NOT NULL,
PRIMARY KEY (vehicle_id, timestamp)
);
SQL Solution:
WITH CTE AS (
SELECT
vehicle_id,
timestamp,
longitude,
latitude,
LAG(timestamp) OVER (PARTITION BY vehicle_id ORDER BY timestamp) AS prev_timestamp,
LAG(longitude) OVER (PARTITION BY vehicle_id ORDER BY timestamp) AS prev_longitude,
LAG(latitude) OVER (PARTITION BY vehicle_id ORDER BY timestamp) AS prev_latitude
FROM
logs
)
SELECT
vehicle_id,
prev_timestamp,
timestamp,
3959 * acos(
cos(radians(prev_latitude)) * cos(radians(latitude))
* cos(radians(prev_longitude) - radians(longitude))
+ sin(radians(prev_latitude)) * sin(radians(latitude))
) AS distance_traveled
FROM
CTE
WHERE
prev_timestamp IS NOT NULL
ORDER BY
vehicle_id,
prev_timestamp;
Explanation:
This solution uses a Common Table Expression (CTE) to calculate the distance traveled by each vehicle between consecutive timestamps. Here's how it works:
Create the CTE: The CTE, named CTE, selects the relevant columns from the logs table and uses LAG() to add three columns:
prev_timestamp: The previous timestamp for each row.
prev_longitude: The previous longitude for each row.
prev_latitude: The previous latitude for each row.
Calculate the Distance: For each row, the distance_traveled column is calculated with the spherical law of cosines, which gives the great-circle distance between two points on a sphere (Earth). The constant 3959 is the Earth's approximate radius in miles, so the result is in miles.
Filter Out Null Values: The WHERE clause filters out rows where prev_timestamp is null, as these rows represent the first location entry for each vehicle and have no previous data to calculate the distance from.
Sort the Results: The results are sorted by vehicle_id and prev_timestamp so that each vehicle's distances appear in chronological order. To get one total per vehicle, sum distance_traveled grouped by vehicle_id.
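To sanity-check the numbers the query produces, the acos(...) expression can be mirrored in plain Python. This is a sketch under stated assumptions: the function names are invented for illustration, the sample rows are hypothetical, and the clamp on the cosine value is an addition that guards against rounding pushing it slightly outside the range acos accepts.

```python
import math

EARTH_RADIUS_MILES = 3959  # same constant as in the query

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance using the same acos expression as the query."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon1) - math.radians(lon2)
    cos_angle = (math.cos(phi1) * math.cos(phi2) * math.cos(dlon)
                 + math.sin(phi1) * math.sin(phi2))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return EARTH_RADIUS_MILES * math.acos(cos_angle)

def segment_distances(logs):
    """logs: iterable of (vehicle_id, timestamp, longitude, latitude).

    Returns (vehicle_id, prev_timestamp, timestamp, miles) per consecutive pair,
    which is what LAG() plus the IS NOT NULL filter yields in SQL.
    """
    previous = {}
    segments = []
    for vehicle_id, ts, lon, lat in sorted(logs, key=lambda r: (r[0], r[1])):
        if vehicle_id in previous:
            prev_ts, prev_lon, prev_lat = previous[vehicle_id]
            miles = great_circle_miles(prev_lat, prev_lon, lat, lon)
            segments.append((vehicle_id, prev_ts, ts, miles))
        previous[vehicle_id] = (ts, lon, lat)
    return segments

sample_logs = [
    (1, "2022-01-01 10:00:00", -122.4194, 37.7749),
    (1, "2022-01-01 12:00:00", -122.4205, 37.7758),
    (2, "2022-01-01 11:00:00", -118.2437, 34.0522),
]
segments = segment_distances(sample_logs)
for segment in segments:
    print(segment)
```

The two positions for vehicle 1 are only about a tenth of a mile apart, and vehicle 2 contributes no segment because it has a single log entry.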
Example:
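Consider the following logs table:
| vehicle_id | timestamp | longitude | latitude |
|---|---|---|---|
| 1 | 2022-01-01 10:00:00 | -122.4194 | 37.7749 |
| 1 | 2022-01-01 12:00:00 | -122.4205 | 37.7758 |
| 2 | 2022-01-01 11:00:00 | -118.2437 | 34.0522 |
The query would produce the following result (distance in miles, rounded to two decimal places):
| vehicle_id | prev_timestamp | timestamp | distance_traveled |
|---|---|---|---|
| 1 | 2022-01-01 10:00:00 | 2022-01-01 12:00:00 | 0.09 |
Vehicle 2 has only one log entry, so it has no previous position and its row is removed by the WHERE clause.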
Real-World Applications:
This solution can be used for various real-world applications, such as:
Fleet Management: Tracking the total distance traveled by vehicles in a fleet helps monitor fuel consumption, maintenance schedules, and driver performance.
Ride-Sharing Services: Calculating the distance traveled for ride-sharing trips is used to determine fares and provide insights into traffic patterns.
Logistics and Supply Chain Management: Measuring the distance traveled by trucks or ships can help optimize routes, reduce transportation costs, and improve delivery times.
Rank Scores
Problem:
Given a table of scores:
+-----+-------+
| ID | Score |
+-----+-------+
| 1 | 90 |
| 2 | 80 |
| 3 | 70 |
| 4 | 60 |
| 5 | 50 |
+-----+-------+
Rank the scores in descending order, with ranks starting from 1 for the highest score.
Solution:
SELECT ID, Score,
ROW_NUMBER() OVER (ORDER BY Score DESC) AS Rank
FROM Scores;
Breakdown:
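ROW_NUMBER() OVER (ORDER BY Score DESC): This function numbers the rows by the Score column, starting from 1 for the highest score and incrementing by 1 for each subsequent row. Tied scores still receive different numbers; use RANK() or DENSE_RANK() instead if equal scores should share a rank.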
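The sample table has no tied scores, so the choice of ranking function does not show up in the result. The sqlite3 sketch below uses hypothetical data with one tie to compare ROW_NUMBER(), RANK(), and DENSE_RANK() side by side (window functions need SQLite 3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (ID INTEGER PRIMARY KEY, Score INTEGER)")
# Hypothetical data with a tie at 80.
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?)",
    [(1, 90), (2, 80), (3, 80), (4, 70)],
)

comparison = conn.execute(
    """
    SELECT ID, Score,
           ROW_NUMBER() OVER (ORDER BY Score DESC, ID) AS row_num,
           RANK()       OVER (ORDER BY Score DESC)     AS rnk,
           DENSE_RANK() OVER (ORDER BY Score DESC)     AS dense_rnk
    FROM Scores
    ORDER BY Score DESC, ID
    """
).fetchall()
row_numbers = [row[2] for row in comparison]
ranks = [row[3] for row in comparison]
dense_ranks = [row[4] for row in comparison]
print(row_numbers, ranks, dense_ranks)
```

ROW_NUMBER() breaks the tie arbitrarily unless a second sort key such as ID is given, RANK() leaves a gap after the tie, and DENSE_RANK() does not.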
Example:
| ID | Score | Rank |
+-----+-------+------+
| 1 | 90 | 1 |
| 2 | 80 | 2 |
| 3 | 70 | 3 |
| 4 | 60 | 4 |
| 5 | 50 | 5 |
+-----+-------+------+
Real-World Applications:
Ranking bidders in an auction based on their bids.
Sorting students in a class based on their test scores.
Identifying top-performing employees based on their sales figures.
Snaps Analysis
Problem: Find the average rating of movies released in a specific year.
SQL Query:
SELECT AVG(rating) AS avg_rating
FROM movies
WHERE YEAR(release_date) = 2020;
Breakdown:
SELECT AVG(rating) AS avg_rating: Calculates the average rating of movies.
FROM movies: Specifies the movies table to retrieve data from.
WHERE YEAR(release_date) = 2020: Filters the movies by release year, selecting only movies released in 2020.
Simplified Explanation:
We get the average rating of movies by adding up all the ratings and dividing by the total number of movies. We only include movies released in 2020 by filtering the results.
Real-World Application:
Movie Recommendation Systems: To determine the average rating of movies released in a particular year, helping users find popular movies.
Entertainment Industry Analysis: To track trends in movie ratings over time.
Marketing Campaigns: To gauge the success of movie releases by comparing their average rating to other movies released in the same year.
Sales Analysis I
Problem Statement:
Given two tables:
Sales (id, product_id, quantity, price)
Products (id, name, category)
Find the total sales amount for each category in the Products table.
Solution:
SELECT p.category, SUM(s.quantity * s.price) AS total_sales
FROM Sales s
JOIN Products p ON s.product_id = p.id
GROUP BY p.category;
Breakdown:
JOIN: Join the Sales and Products tables on the product_id column to connect sales data with product categories.
SUM(): Use SUM() to calculate the total sales for each category by multiplying the quantity sold by the price and summing the results for each row in the joined table.
GROUP BY: Group the results by the category column to get the total sales for each category.
Explanation:
Imagine a grocery store that sells various products. The Sales table records the sales transactions, including the product ID, quantity sold, and price. The Products table contains the category of each product.
To find the total sales for each category, we need to:
Link the sales data to the product categories using the product ID.
Calculate the sales amount for each transaction by multiplying the quantity sold by the price.
Sum up the total sales for each category to get the final results.
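The three steps above can be sketched with sqlite3 as follows. The grocery items and prices are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Products (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE Sales (id INTEGER PRIMARY KEY, product_id INTEGER,
                        quantity INTEGER, price REAL);
    -- Hypothetical sample data
    INSERT INTO Products VALUES (1, 'Apple', 'Fruit'), (2, 'Banana', 'Fruit'),
                                (3, 'Milk', 'Dairy');
    INSERT INTO Sales VALUES (1, 1, 2, 1.5), (2, 2, 4, 0.5), (3, 3, 1, 2.0);
    """
)

# Join each sale to its product, then sum quantity * price per category.
totals = dict(
    conn.execute(
        """
        SELECT p.category, SUM(s.quantity * s.price) AS total_sales
        FROM Sales AS s
        JOIN Products AS p ON s.product_id = p.id
        GROUP BY p.category
        """
    ).fetchall()
)
print(totals)
```

Because this is an inner join, a category with no sales does not appear in the result; a LEFT JOIN from Products would be needed to list such categories with a NULL total.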
Real-World Application:
This solution can be useful in various business scenarios:
Product Performance Analysis: Determine which product categories are performing well or need improvement.
Inventory Management: Identify categories with high or low sales to optimize inventory levels and reduce waste.
Marketing Campaign Evaluation: Analyze the effectiveness of marketing campaigns by comparing sales performance across different categories.
Customer Segmentation: Group customers based on the categories they purchase to tailor marketing efforts and product recommendations.
Bank Account Summary II
Problem Statement:
Given a table BankAccounts with the following schema:
| Column | Type |
|---|---|
| account_id | int |
| account_number | varchar(255) |
| account_balance | decimal(10, 2) |
Write a SQL query to summarize the account balances for each account holder, with the account number and account balance.
Solution:
SELECT
account_number,
SUM(account_balance) AS total_balance
FROM BankAccounts
GROUP BY account_number;
Explanation:
SELECT account_number, SUM(account_balance) AS total_balance: This part of the query selects the account number and the sum of account balances for each account number. The SUM() function adds up the account balances for each account number, and the resulting column is aliased as total_balance.
FROM BankAccounts: This part specifies the table from which to retrieve the data. In this case, it is the BankAccounts table.
GROUP BY account_number: This part groups the rows in the result set by the account number, so all rows with the same account number are grouped together. The total_balance is calculated for each group.
Example:
Consider the following data in the BankAccounts table:
| account_id | account_number | account_balance |
|---|---|---|
| 1 | 123456789 | 100.00 |
| 2 | 987654321 | 200.00 |
| 3 | 123456789 | 50.00 |
| 4 | 987654321 | 100.00 |
The SQL query will return the following result:
| account_number | total_balance |
|---|---|
| 123456789 | 150.00 |
| 987654321 | 300.00 |
Real-World Application:
This query can be used in various real-world applications, such as:
Bank reports: Banks can use this query to generate reports on account balances for customers.
Financial analysis: Financial analysts can use this query to analyze the distribution of account balances across different accounts.
Customer service: Customer service representatives can use this query to quickly determine the total balance of a customer's accounts.
Arrange Table by Gender
Problem:
Given a table with columns name and gender, arrange the rows by gender in ascending order (i.e., male, female).
Solution:
SQL Query:
SELECT *
FROM table_name
ORDER BY CASE
WHEN gender = 'male' THEN 0
WHEN gender = 'female' THEN 1
ELSE 2 -- Handle any other gender values (optional)
END;
Breakdown:
SELECT * FROM table_name: Selects all columns from the specified table.
ORDER BY clause: Arranges the rows in ascending order based on the specified condition.
CASE expression: The CASE expression evaluates the gender column and assigns a numeric value to each row:
0 for rows with gender = 'male'
1 for rows with gender = 'female'
2 (or any other value) for rows with other gender values (if handling them is necessary)
The rows are then sorted in ascending order based on these numeric values, effectively grouping the rows by gender.
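A minimal sqlite3 sketch of this custom sort order follows. The table name people is an assumption standing in for table_name, and a second sort key (rowid, SQLite's implicit row identifier) is added because the CASE value alone does not fix the order of rows within the same gender.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("John", "male"), ("Mary", "female"), ("Alice", "female"), ("Bob", "male")],
)

ordered = conn.execute(
    """
    SELECT name, gender
    FROM people
    ORDER BY CASE
                 WHEN gender = 'male' THEN 0
                 WHEN gender = 'female' THEN 1
                 ELSE 2
             END,
             rowid  -- tie-breaker: keep insertion order within each gender
    """
).fetchall()
names = [row[0] for row in ordered]
print(names)
```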
Example:
Input Table:
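| name | gender |
|---|---|
| John | male |
| Mary | female |
| Alice | female |
| Bob | male |
Result:
| name | gender |
|---|---|
| John | male |
| Bob | male |
| Mary | female |
| Alice | female |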
Real-World Applications:
Arranging employee data by gender for reporting or analysis
Displaying user information in a sorted manner based on gender in social networks or e-commerce websites
Aggregating data and generating statistics related to gender distribution
Status of Flight Tickets
Problem: Given a table of flight bookings, find the status of each flight, which can be "Cancelled", "Active", or "Completed".
Schema:
CREATE TABLE Flights (
id INT PRIMARY KEY,
start_date DATE,
end_date DATE,
status VARCHAR(255)
);
SQL Solution:
SELECT
id,
CASE
WHEN start_date > CURRENT_DATE THEN 'Active'
WHEN end_date < CURRENT_DATE THEN 'Completed'
ELSE 'Cancelled'
END AS status
FROM
Flights;
Explanation:
The query uses a CASE expression to determine the status of each flight based on the current date:
If the start_date is greater than the CURRENT_DATE, the flight is considered "Active".
If the end_date is less than the CURRENT_DATE, the flight is considered "Completed".
If neither of the above conditions is met (for example, when today falls between start_date and end_date, or when one of the dates is NULL), the flight is labeled "Cancelled".
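To see which rows land in each branch, the CASE expression can be run in sqlite3 with the current date passed in as a parameter instead of CURRENT_DATE, which keeps the outcome reproducible. The flights and the chosen "today" are hypothetical, and the stored status column is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (id INTEGER PRIMARY KEY, start_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO Flights VALUES (?, ?, ?)",
    [(1, "2024-07-01", "2024-07-02"),   # starts after "today"
     (2, "2024-06-01", "2024-06-02"),   # ended before "today"
     (3, "2024-06-14", "2024-06-16")],  # range includes "today"
)

# ISO-8601 date strings compare correctly as text.
statuses = dict(
    conn.execute(
        """
        SELECT id,
               CASE
                   WHEN start_date > :today THEN 'Active'
                   WHEN end_date < :today THEN 'Completed'
                   ELSE 'Cancelled'
               END AS status
        FROM Flights
        """,
        {"today": "2024-06-15"},
    ).fetchall()
)
print(statuses)
```

Flight 3 shows that the ELSE branch is a catch-all: a flight whose date range includes today is labeled "Cancelled" by this logic, so a real system would likely need an explicit cancellation flag.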
Real-World Application:
This query can be used to provide real-time information about flight status to passengers or travel agents. It can also be used to track the performance of flights and identify patterns in cancellations or delays.
Additional Notes:
The query assumes that the start_date and end_date columns represent the dates when the flight is scheduled to depart and arrive, respectively.
The query can be modified to handle additional flight statuses, such as "Delayed" or "Rescheduled".
The query can be made more efficient by using an index on the start_date and end_date columns.
Human Traffic of Stadium
Problem:
Query the database to find information about human traffic at a stadium on a specific day.
SQL Implementation:
SELECT
*
FROM HumanTraffic
WHERE
Date = '2023-03-08';
Breakdown:
SELECT *: Selects all columns from the HumanTraffic table.
FROM HumanTraffic: Specifies the table to query from.
WHERE Date = '2023-03-08': Filters the results to include only records where the Date column matches the specified date.
Explanation:
This query retrieves all the rows from the HumanTraffic table where the Date column is equal to the specified date. It can be used to analyze the human traffic at a stadium on a particular day, such as:
Total number of visitors: Count the number of rows in the result to get the total number of visitors who entered the stadium that day.
Peak and off-peak hours: Analyze the data to determine the hours when the stadium had the highest and lowest traffic.
Demographics: Use additional columns in the HumanTraffic table, such as Age and Gender, to understand the demographic profile of the visitors.
Real-World Applications:
This query can be used in the following real-world applications:
Stadium management: Optimize stadium operations and staffing based on expected traffic patterns.
Event planning: Plan events and promotions that attract the desired number and type of attendees.
City planning: Understand how human traffic impacts transportation and infrastructure in the area surrounding the stadium.
The Latest Login in 2020
Problem Statement:
Find the latest login timestamp for each user in the year 2020.
Solution:
SELECT user_id, MAX(login_timestamp) AS latest_login
FROM login_log
WHERE YEAR(login_timestamp) = 2020
GROUP BY user_id;
Breakdown:
SELECT user_id, MAX(login_timestamp) AS latest_login: Selects the user ID and the maximum login timestamp as the latest login.
FROM login_log: Specifies the table containing the login records.
WHERE YEAR(login_timestamp) = 2020: Filters the records to include only those in the year 2020.
GROUP BY user_id: Groups the records by user ID to find the latest login for each user.
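YEAR() is not available in every database (SQLite, for instance, lacks it), and wrapping the column in a function can prevent an index on login_timestamp from being used. A common alternative, sketched here with sqlite3 and hypothetical rows, is a half-open range on the timestamp itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login_log (user_id INTEGER, login_timestamp TEXT)")
conn.executemany(
    "INSERT INTO login_log VALUES (?, ?)",
    [(1, "2020-01-01 00:00:00"), (2, "2020-01-02 00:00:00"),
     (1, "2020-01-03 00:00:00"), (3, "2020-01-04 00:00:00"),
     (2, "2020-01-05 00:00:00"), (1, "2020-01-06 00:00:00"),
     (1, "2021-02-01 00:00:00"),   # outside 2020, must be ignored
     (3, "2019-12-31 23:59:59")],  # outside 2020, must be ignored
)

latest_logins = conn.execute(
    """
    SELECT user_id, MAX(login_timestamp) AS latest_login
    FROM login_log
    WHERE login_timestamp >= '2020-01-01 00:00:00'
      AND login_timestamp <  '2021-01-01 00:00:00'
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
print(latest_logins)
```

The comparison works on text here because the timestamps are stored in ISO-8601 format, which sorts chronologically.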
Example:
Consider the following login_log table:
| user_id | login_timestamp |
|---|---|
| 1 | 2020-01-01 00:00:00 |
| 2 | 2020-01-02 00:00:00 |
| 1 | 2020-01-03 00:00:00 |
| 3 | 2020-01-04 00:00:00 |
| 2 | 2020-01-05 00:00:00 |
| 1 | 2020-01-06 00:00:00 |
The query would return the following result:
| user_id | latest_login |
|---|---|
| 1 | 2020-01-06 00:00:00 |
| 2 | 2020-01-05 00:00:00 |
| 3 | 2020-01-04 00:00:00 |
Real-World Applications:
User Activity Analysis: Identifying the most recent login for each user can help track user engagement and identify any potential issues with account access.
Fraud Detection: Comparing the latest login timestamp to known successful logins can help detect unauthorized access attempts.
Security Auditing: Tracking the latest login for critical accounts can help identify any suspicious activity and improve security.
Calculate Compressed Mean
Problem Statement:
Given a table of numbers, find the compressed mean of all the positive numbers. Compressed mean is defined as the mean of all positive numbers, rounded to the nearest integer.
Example:
Input Table:
| number |
| ------- |
| 1 |
| 3 |
| 5 |
| -2 |
| 4 |
Output:
| compressed_mean |
| --------------- |
| 3 |
Solution:
SELECT ROUND(AVG(number)) AS compressed_mean
FROM table_name
WHERE number > 0;
Explanation:
Average: AVG(number) calculates the average of all the positive numbers in the table (the WHERE clause has already removed the rest).
Rounding: ROUND() rounds the average to the nearest integer. This is what "compressed" means in compressed mean.
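The arithmetic can be checked with a short sqlite3 sketch. The table name numbers is an assumption standing in for table_name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (number INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(1,), (3,), (5,), (-2,), (4,)])

# Average of the positive values before rounding: (1 + 3 + 5 + 4) / 4
raw_mean = conn.execute(
    "SELECT AVG(number) FROM numbers WHERE number > 0"
).fetchone()[0]

compressed_mean = conn.execute(
    "SELECT ROUND(AVG(number)) FROM numbers WHERE number > 0"
).fetchone()[0]
print(raw_mean, compressed_mean)
```

The negative value is excluded by the WHERE clause before AVG() runs, so it affects neither the sum nor the count.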
Examples:
Input: (1, 3, 5, -2, 4)
Output: 3 (The average of the positive numbers 1, 3, 5, 4 is 3.25, which rounds to 3.)
Input: (2, 4, 6, 8, 10)
Output: 6 (The average of 2, 4, 6, 8, 10 is 6.)
Real-World Applications:
Compressed mean can be useful in various scenarios:
Average Ratings: In online review systems, the compressed mean of ratings can provide a simplified and meaningful representation of the average rating.
Temperature Analysis: In weather data, the compressed mean temperature for a month can give a quick overview of the typical temperature range.
Health Metrics: In medical settings, compressed mean can be used to calculate the average blood sugar level or other health metrics that often fall within an integer range.
Patients With a Condition
Problem Statement:
Find all patients who have a specific condition.
SQL Query:
SELECT patient_id, condition_name
FROM PatientConditions
WHERE condition_name = 'Asthma';
Breakdown:
1. Table:
The problem mentions a table called PatientConditions, which contains the following columns:
patient_id: The unique identifier for each patient.
condition_name: The name of the condition that the patient has.
2. WHERE Clause:
The WHERE clause specifies which patients to retrieve. In this case, we want to find all patients who have the condition Asthma. The = operator checks for equality between the condition_name column and the value 'Asthma'.
3. Output:
The query selects two columns from the table:
patient_id: The unique identifier for each patient.
condition_name: The name of the condition that the patient has.
Simplified Explanation:
Imagine you have a table with records of all patients and their conditions. You want to find all the patients who have asthma.
1. Table:
It's like a big spreadsheet with rows for each patient and columns for different information, like their patient ID and the conditions they have.
2. WHERE Clause:
This is like a filter that you use to narrow down the list of patients. You're saying that you only want to see patients who have asthma.
3. Output:
The query will give you a list of all the patients who have asthma, along with their patient IDs.
Real-World Applications:
Identifying patients for clinical trials or studies focused on specific conditions.
Tracking the prevalence of different conditions within a population.
Providing information to patients about their health conditions and treatment options.
Calculate Orders Within Each Interval
Problem Statement:
Given a table Orders with the following schema:
| Column | Type |
|---|---|
| order_id | integer |
| order_date | date |
Calculate the number of orders within each interval of days (e.g., 7 days).
SQL Solution:
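SELECT
    MIN(order_date) AS interval_start,
    COUNT(*) AS order_count
FROM Orders
-- Assumes that subtracting two dates yields a number of days (as in PostgreSQL)
GROUP BY FLOOR((order_date - (SELECT MIN(order_date) FROM Orders)) / 7)
ORDER BY interval_start;
Example:
Suppose we have the following Orders table:
| order_id | order_date |
|---|---|
| 1 | 2022-01-01 |
| 2 | 2022-01-02 |
| 3 | 2022-01-03 |
| 4 | 2022-01-04 |
| 5 | 2022-01-07 |
| 6 | 2022-01-09 |
| 7 | 2022-01-10 |
The query result will be:
| interval_start | order_count |
|---|---|
| 2022-01-01 | 5 |
| 2022-01-09 | 2 |
Explanation:
FLOOR((order_date - (SELECT MIN(order_date) FROM Orders)) / 7): This expression calculates the interval number for each order date. It subtracts the minimum order date from each order date and then divides the result by 7 (the interval size). The FLOOR function rounds down to the nearest integer, assigning each order date to an interval. An aggregate such as MIN() cannot appear directly in GROUP BY, so the earliest date comes from a scalar subquery.
GROUP BY: The GROUP BY clause groups the results by the interval number, effectively counting the number of orders within each interval.
In the example, the first interval covers 2022-01-01 through 2022-01-07 and contains five orders; the second interval starts on 2022-01-08 and contains the orders from 2022-01-09 and 2022-01-10. The interval_start column shows the earliest order date found in each interval.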
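The bucketing can be tried in sqlite3 with a small variation, offered as a sketch rather than a drop-in replacement: SQLite has no date subtraction operator, so julianday() is used to get day differences, and the earliest date is fetched with a scalar subquery because an aggregate cannot be used directly inside GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, order_date TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, "2022-01-01"), (2, "2022-01-02"), (3, "2022-01-03"), (4, "2022-01-04"),
     (5, "2022-01-07"), (6, "2022-01-09"), (7, "2022-01-10")],
)

# Days since the earliest order, divided by 7 and truncated, give the bucket number.
buckets = conn.execute(
    """
    SELECT MIN(order_date) AS interval_start, COUNT(*) AS order_count
    FROM Orders
    GROUP BY CAST(
        (julianday(order_date) - (SELECT MIN(julianday(order_date)) FROM Orders)) / 7
        AS INTEGER)
    ORDER BY interval_start
    """
).fetchall()
print(buckets)
```

With 7-day buckets counted from 2022-01-01, the order placed on 2022-01-07 is day 6 and therefore still belongs to the first bucket, which is why that bucket holds five orders.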
Real-World Applications:
Sales Analysis: Tracking orders within intervals can help businesses analyze sales patterns and identify trends. For example, a restaurant may want to count the number of orders during each week to determine peak hours or adjust staffing accordingly.
Resource Planning: If orders require specific resources, such as raw materials or labor, businesses can use this information to plan their resource allocation based on the expected order volume within different intervals.
Marketing Campaigns: By understanding the distribution of orders over time, businesses can tailor their marketing campaigns to target specific intervals with higher order volumes.
Exchange Seats
Problem:
You have two tables:
Students (id, name)
Seats (id, student_id)
Each student can occupy at most one seat.
You want to exchange the seats of two students. Specifically, given two student IDs, student_a and student_b, you want to update the Seats table to reflect that student_a now occupies the seat that was occupied by student_b, and vice versa.
Best & Performant SQL Solution:
UPDATE Seats
SET student_id = CASE
WHEN student_id = @student_a THEN @student_b
WHEN student_id = @student_b THEN @student_a
ELSE student_id
END
WHERE student_id IN (@student_a, @student_b);
Explanation:
This solution uses a single UPDATE statement with a CASE expression to conditionally update the student_id field in the Seats table. Specifically:
If the student_id field is equal to @student_a, it is updated to @student_b.
If the student_id field is equal to @student_b, it is updated to @student_a.
Otherwise, the student_id field remains unchanged.
The WHERE clause ensures that only the rows corresponding to the two students (@student_a and @student_b) are updated.
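A quick way to confirm that the single-statement swap behaves as described is to run it in sqlite3 with named parameters in place of the @student_a and @student_b variables. The parameter names :a and :b and the third seat are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Seats (id INTEGER PRIMARY KEY, student_id INTEGER)")
conn.executemany("INSERT INTO Seats VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])

# Each row's new value is computed from its old value, so no temporary
# placeholder is needed to swap the two students.
conn.execute(
    """
    UPDATE Seats
    SET student_id = CASE
                         WHEN student_id = :a THEN :b
                         WHEN student_id = :b THEN :a
                         ELSE student_id
                     END
    WHERE student_id IN (:a, :b)
    """,
    {"a": 1, "b": 2},
)
seats = conn.execute("SELECT id, student_id FROM Seats ORDER BY id").fetchall()
print(seats)
```

Seat 3 is untouched because the WHERE clause limits the update to the two students being swapped.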
Real-World Applications:
This query could be used in a school system to manage student seating arrangements in classrooms. For example, a teacher may want to exchange the seats of two students who are causing distractions or who would benefit from sitting near each other.
Example:
Consider the following tables:
Students:
id name
1 Alice
2 Bob
Seats:
id student_id
1 1
2 2
To exchange the seats of Alice (student ID 1) and Bob (student ID 2), we would use the following query:
UPDATE Seats
SET student_id = CASE
WHEN student_id = 1 THEN 2
WHEN student_id = 2 THEN 1
ELSE student_id
END
WHERE student_id IN (1, 2);
This would result in the following updated Seats table:
Seats:
id student_id
1 2
2 1
Not Boring Movies
Problem Statement: Find all movies that are not boring.
SQL Query:
SELECT title
FROM Movies
WHERE rating > 3;
Explanation:
The SQL query uses the SELECT statement to retrieve the title column from the Movies table. The WHERE clause filters the results to only include movies where the rating column is greater than 3.
Breakdown:
SELECT title: This part of the query specifies the column that we want to retrieve from the table. In this case, we want to retrieve the title column.
FROM Movies: This part of the query specifies the table that we want to select the data from. In this case, we want to select the data from the Movies table.
WHERE rating > 3: This part of the query filters the results to only include rows where the rating column is greater than 3.
Real-World Example:
This query could be used by a website or app to show users a list of movies that are not boring. This could be useful for users who are looking for movies to watch that they will enjoy.
Potential Applications:
A website or app could use this query to create a list of recommended movies for users.
A movie streaming service could use this query to filter movies by their rating.
Median Employee Salary
Problem Statement:
Given a table called Employees with columns id, name, and salary, find the median salary of employees.
Solution:
1. Median Function:
Create a user-defined function to calculate the median value from a set of numbers. This function assumes the input values are sorted in ascending order.
CREATE FUNCTION median(arr NUMERIC[]) RETURNS NUMERIC AS $$
DECLARE
arr_size INTEGER := array_length(arr, 1);
ret NUMERIC;
BEGIN
IF arr_size % 2 = 1 THEN
ret := arr[arr_size / 2 + 1];
ELSE
ret := (arr[arr_size / 2] + arr[arr_size / 2 + 1]) / 2;
END IF;
RETURN ret;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
2. Collect Sorted Salaries:
The median function expects a sorted array, so collect the salaries into an array in ascending order. A running total of salaries is not useful here: the median depends on the salary values themselves, not on their cumulative sums.
SELECT
    ARRAY(
        SELECT salary
        FROM employees
        ORDER BY salary
    )::NUMERIC[] AS sorted_salaries;
3. Calculate Median Salary:
Pass the sorted array to the median function.
SELECT
    median(
        ARRAY(
            SELECT salary
            FROM employees
            ORDER BY salary
        )::NUMERIC[]
    ) AS median_salary;
Example:
Consider the following table:
| id | name | salary |
|-----|--------|--------|
| 1 | John | 500 |
| 2 | Mary | 1000 |
| 3 | Tom | 1500 |
| 4 | Lisa | 2000 |
Output:
| median_salary |
|---------------|
| 1250 |
With four rows, the median is the average of the two middle salaries: (1000 + 1500) / 2 = 1250.
Applications:
Finding the median income of a population
Determining the midpoint of a distribution
Comparing performance of different groups based on salary levels
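The median can also be computed without a user-defined function by numbering the rows and averaging the middle one or two. The following is a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) rather than PostgreSQL; the helper name median_salary is hypothetical and exists only for illustration.

```python
import sqlite3


def median_salary(salaries):
    """Return the median of the given salaries using a window-function query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, salary NUMERIC)")
    conn.executemany(
        "INSERT INTO employees (salary) VALUES (?)", [(s,) for s in salaries]
    )
    row = conn.execute(
        """
        SELECT AVG(salary)
        FROM (
            SELECT salary,
                   ROW_NUMBER() OVER (ORDER BY salary) AS rn,  -- position in sorted order
                   COUNT(*) OVER () AS cnt                     -- total number of rows
            FROM employees
        )
        -- For an odd count both expressions name the same middle row;
        -- for an even count they name the two middle rows.
        WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
        """
    ).fetchone()
    conn.close()
    return row[0]


print(median_salary([500, 1000, 1500, 2000]))
```

The integer division in the WHERE clause is what selects one row for odd counts and two rows for even counts, so AVG handles both cases with the same query.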
Students With Invalid Departments
Problem Statement:
Find all students who are enrolled in departments that don't exist.
Database Schema:
CREATE TABLE Departments (
id INT PRIMARY KEY,
name VARCHAR(255) NOT NULL
);
-- No foreign key on department_id: an enforced foreign key would reject
-- the invalid department IDs that this query is meant to find.
CREATE TABLE Students (
id INT PRIMARY KEY,
name VARCHAR(255) NOT NULL,
department_id INT
);
SQL Solution:
SELECT
s.id,
s.name,
s.department_id
FROM
Students AS s
LEFT JOIN
Departments AS d
ON
s.department_id = d.id
WHERE
d.id IS NULL;
Breakdown and Explanation:
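LEFT JOIN is used to include all rows from the Students table, even if they don't have a matching row in the Departments table.
The WHERE d.id IS NULL condition filters out students who have valid department IDs (i.e., those with matching rows in the Departments table), leaving only students without a match. Students whose department_id is itself NULL also appear in the result; add AND s.department_id IS NOT NULL to exclude them.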
Real-World Application:
This query can be used in a university database to identify students who have registered for courses in non-existent departments. This allows for quick identification and correction of any errors in student enrollment data.
Immediate Food Delivery III
Problem Statement
Given two tables:
Restaurants with columns restaurant_id and name
Orders with columns order_id, restaurant_id, customer_id, order_time
You are asked to find the number of orders each restaurant has received within the last 30 minutes.
Solution
SELECT
r.name AS restaurant_name,
COUNT(o.order_id) AS num_orders
FROM Restaurants AS r
JOIN Orders AS o
ON r.restaurant_id = o.restaurant_id
WHERE
o.order_time BETWEEN DATE_SUB(NOW(), INTERVAL 30 MINUTE) AND NOW()
GROUP BY
r.name
ORDER BY
num_orders DESC;
Explanation
This query uses the following steps:
Join the Restaurants and Orders tables on the restaurant_id column.
Filter the Orders table to include only orders that were placed within the last 30 minutes.
Group the results by the restaurant name.
Count the number of orders for each restaurant.
Sort the results in descending order by the number of orders.
Real-World Applications
This query can be used to identify popular restaurants based on the number of orders they receive. This information can be used to make decisions about which restaurants to highlight on a food delivery app or to offer special promotions to.
Count Occurrences in Text
Problem Statement
The Count Occurrences in Text problem asks you to find the number of occurrences of a given substring within a larger string. For example, in the string "Hello World" the substring "el" occurs once, while the substring "l" occurs three times.
SQL Implementation
Here is a simplified SQL implementation of the Count Occurrences in Text problem:
SELECT COUNT(*)
FROM table_name
WHERE column_name LIKE '%substring%';
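In this implementation, we use the LIKE operator to find all rows in the table_name table where the column_name column contains the substring we are looking for. The % wildcard character represents any number of characters before or after the substring. Note that COUNT(*) returns the number of matching rows, not the total number of occurrences: a row that contains the substring several times is still counted once.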
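To count how many times a substring occurs inside a single string, a common trick is to compare the string's length before and after removing the substring. The sketch below assumes SQLite through Python's built-in sqlite3 module; the function name count_occurrences is hypothetical, and the substring must be non-empty.

```python
import sqlite3


def count_occurrences(text, substring):
    """Count non-overlapping occurrences of substring in text using SQL string functions."""
    conn = sqlite3.connect(":memory:")
    row = conn.execute(
        # Each removed occurrence shortens the string by LENGTH(substring) characters.
        "SELECT (LENGTH(?) - LENGTH(REPLACE(?, ?, ''))) / LENGTH(?)",
        (text, text, substring, substring),
    ).fetchone()
    conn.close()
    return row[0]


print(count_occurrences("Hello World", "el"))
```

REPLACE is case-sensitive in SQLite and removes non-overlapping matches, so "aaa" contains "aa" once under this definition.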
Real-World Applications
Counting occurrences in text has many real-world applications, including:
Search engines: Search engines use this technique to find the number of times a keyword appears on a web page.
Spam filters: Spam filters use this technique to identify emails that contain certain keywords or phrases.
Data analysis: Data analysts use this technique to identify patterns and trends in text data.
Potential Gotchas
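One potential gotcha is that this technique can be slow for large datasets: a pattern with a leading % wildcard generally cannot use an ordinary index, so the database has to scan every row. If you are working with a large dataset, consider a full-text index, or do the matching in application code with an efficient string-search algorithm such as Boyer-Moore.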
Conclusion
Counting occurrences in text is a common task in data science and has many real-world applications. The SQL implementation provided in this article is a simple way to perform this task on small and medium-sized tables.
Combine Two Tables
LeetCode Problem Statement:
Table: Orders
| 1 | 10 | 2022-01-01 | 1 | 10 |
| 2 | 20 | 2022-01-02 | 2 | 5 |
| 3 | 10 | 2022-01-03 | 1 | 20 |
| 4 | 30 | 2022-01-04 | 3 | 15 |
Table: Customers
| 10 | John |
| 20 | Mary |
| 30 | Bob |
Goal:
Combine the Orders and Customers tables into a single table, using the customer_id column that appears in both tables to connect them.
SQL Solution:
SELECT * FROM Orders o
INNER JOIN Customers c ON o.customer_id = c.customer_id;
Breakdown and Explanation:
Step 1: SELECT * FROM Orders o
This line retrieves all rows from the Orders table and aliases it as o.
Step 2: INNER JOIN Customers c ON o.customer_id = c.customer_id
This line joins the Orders table with the Customers table on the customer_id column, effectively mapping customers to their orders.
Result:
The query combines all the columns from both tables into a single result set, creating a unified view of orders and customer information.
Real-World Applications:
Order Management: Display customer details alongside orders for better order tracking and customer service.
Customer Analysis: Analyze customer behavior by combining order history with demographic data.
Marketing: Personalize marketing campaigns by targeting customers with specific product preferences.
Fraud Detection: Identify potential fraudulent activity by linking customer information to suspicious order patterns.
Inventory Management: Forecast demand and optimize inventory levels by tracking customer orders and preferences.
Average Salary: Departments VS Company
Problem Description
Given two tables, Salaries and Departments, calculate the average salary for each department and for the company as a whole.
Tables:
Salaries:
emp_id: Employee ID
dept_id: ID of the department the employee belongs to (needed to join the two tables)
salary: Employee salary
Departments:
dept_id: Department ID
dept_name: Department name
SQL Query:
SELECT
    d.dept_name,
    AVG(s.salary) AS avg_salary
FROM Salaries AS s
JOIN Departments AS d
    ON s.dept_id = d.dept_id
GROUP BY
    d.dept_name
UNION ALL
SELECT
    'Company Average',
    AVG(s.salary)
FROM Salaries AS s;
Explanation:
The JOIN clause combines the Salaries and Departments tables on the common column dept_id. Joining emp_id to dept_id would compare unrelated identifiers and match rows only by coincidence.
The GROUP BY clause groups the results by department name and calculates the average salary for each department.
The UNION ALL operator combines the department-level results with the overall company average.
The SELECT clause retrieves the department name and the average salary for each department and the company as a whole.
Sample Data and Output:
The dept_id values in the Salaries rows below are assumed for illustration.
Salaries
| emp_id | dept_id | salary |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 2 | 1200 |
| 3 | 3 | 1500 |
| 4 | 1 | 1800 |
Departments
| dept_id | dept_name |
|---|---|
| 1 | Sales |
| 2 | Marketing |
| 3 | Engineering |
Output:
| dept_name | avg_salary |
|---|---|
| Sales | 1400 |
| Marketing | 1200 |
| Engineering | 1500 |
| Company Average | 1375 |
Real-World Application:
The query can be used by HR departments to analyze salary trends and compare compensation levels across different departments and the company as a whole. It can also be used to identify departments with lower-than-average salaries or to allocate compensation budgets more effectively.
Find Latest Salaries
Problem Statement:
Find the latest salaries for each employee.
Table Schema:
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(255),
salary INT,
start_date DATE
);
Sample Data:
INSERT INTO employees (id, name, salary, start_date) VALUES
(1, 'John Doe', 50000, '2020-01-01'),
(2, 'Jane Smith', 60000, '2020-03-01'),
(3, 'Mike Jones', 40000, '2020-05-01'),
(4, 'Mary Johnson', 55000, '2021-01-01'),
(5, 'Bob Johnson', 45000, '2021-03-01');
SOLUTION IN MYSQL:
SELECT id, name, MAX(salary) AS latest_salary
FROM employees
GROUP BY id, name;
Breakdown and Explanation:
MAX(salary) AS latest_salary: This expression calculates the maximum salary for each employee. The MAX() function returns the highest salary within each group, and the AS latest_salary alias gives the result column a meaningful name. The highest salary equals the latest salary only if an employee's salary never decreases over time.
GROUP BY id, name: This clause groups the employees by their ID and name. This ensures that the MAX() function is applied separately to each employee. In the sample schema id is the primary key, so each employee has a single row; the grouping becomes meaningful when the table stores several salary records per employee.
SELECT id, name, latest_salary: This clause selects the employee's ID, name, and latest salary from the result of the GROUP BY operation.
Real-World Applications:
This query can be used in various real-world applications, such as:
Finding the latest salary history of employees for payroll purposes
Identifying employees who have received the most recent salary increases
Analyzing salary trends within an organization
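When salaries can go down as well as up, the latest salary has to be chosen by date rather than by amount. The sketch below assumes a hypothetical salary_history table (emp_id, salary, start_date) with several rows per employee, and runs on SQLite through Python's built-in sqlite3 module; neither the table nor the function name comes from the original problem.

```python
import sqlite3


def latest_salaries(rows):
    """Return (emp_id, salary) for each employee's most recent start_date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE salary_history (emp_id INTEGER, salary INTEGER, start_date TEXT)"
    )
    conn.executemany("INSERT INTO salary_history VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT emp_id, salary
        FROM (
            SELECT emp_id, salary,
                   -- rn = 1 marks the newest record of each employee
                   ROW_NUMBER() OVER (
                       PARTITION BY emp_id ORDER BY start_date DESC
                   ) AS rn
            FROM salary_history
        )
        WHERE rn = 1
        ORDER BY emp_id
        """
    ).fetchall()
    conn.close()
    return result


history = [
    (1, 50000, "2020-01-01"),
    (1, 48000, "2021-01-01"),  # a later, lower salary
    (2, 60000, "2020-03-01"),
]
print(latest_salaries(history))
```

For employee 1 this picks the 2021 record, whereas MAX(salary) would report the older, higher amount.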
The Number of Seniors and Juniors to Join the Company
SELECT
s.dept_name AS department,
COUNT(j.job_id) AS juniors,
COUNT(DISTINCT s.senior_id) AS seniors
FROM
seniors s
LEFT JOIN
juniors j ON s.senior_id = j.senior_id
GROUP BY
s.dept_name
ORDER BY
s.dept_name;
This SQL statement counts the number of seniors and juniors in each department. The seniors table contains the senior employees, and the juniors table contains the junior employees. The LEFT JOIN statement joins the two tables on the senior_id column, which is the foreign key in the juniors table that references the primary key in the seniors table.
The GROUP BY statement groups the results by department, and the COUNT function counts the number of senior and junior employees in each department. The ORDER BY statement orders the results by department name.
Here is an example of the output of this SQL statement:
+-----------+---------+---------+
| department | juniors | seniors |
+-----------+---------+---------+
| Engineering | 3 | 2 |
| Marketing | 2 | 1 |
| Sales | 1 | 3 |
+-----------+---------+---------+
This output shows that the Engineering department has 3 junior employees and 2 senior employees, the Marketing department has 2 junior employees and 1 senior employee, and the Sales department has 1 junior employee and 3 senior employees.
This SQL statement can be used to analyze the distribution of senior and junior employees across different departments. This information can be used to make decisions about hiring, training, and development programs.
Report Contiguous Dates
Problem: Find contiguous date ranges in a table of dates.
SQL Query:
WITH Grouped AS (
    SELECT
        dt,
        dt - CAST(ROW_NUMBER() OVER (ORDER BY dt) AS INTEGER) AS grp
    FROM dates
)
SELECT MIN(dt) AS start_date, MAX(dt) AS end_date
FROM Grouped
GROUP BY grp
ORDER BY start_date;
Breakdown:
Number the dates:
ROW_NUMBER() assigns 1, 2, 3, and so on to the dates in ascending order inside the Grouped Common Table Expression (CTE).
Compute a group key:
Subtracting the row number from the date (PostgreSQL date arithmetic, where date minus integer yields a date) gives the same value for every date in an unbroken run, because the date and the row number both advance by one. A gap makes the date jump ahead of the row number, which produces a new key.
Aggregate each group:
The GROUP BY clause collects each run, and MIN(dt) and MAX(dt) give its start and end dates.
This approach, often called gaps and islands, assumes the dates table contains no duplicate dates.
Example:
-- Sample data
CREATE TABLE dates (dt DATE);
INSERT INTO dates VALUES ('2023-01-01'), ('2023-01-02'), ('2023-01-04'), ('2023-01-05'), ('2023-01-07');
-- Run the query above against this table
Output:
start_date end_date
2023-01-01 2023-01-02
2023-01-04 2023-01-05
2023-01-07 2023-01-07
Real-World Applications:
Calculating sales figures for specific date ranges.
Identifying periods of activity or inactivity in a process.
Analyzing user visits to a website over time.
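One common way to find contiguous ranges is the gaps-and-islands technique: subtract each date's row number from the date, and dates in the same unbroken run share the result. The following is a minimal sketch, assuming SQLite through Python's built-in sqlite3 module, where julianday() stands in for native date arithmetic; the function name contiguous_ranges is hypothetical, and the input is assumed to contain no duplicate dates.

```python
import sqlite3


def contiguous_ranges(dates):
    """Return (start_date, end_date) pairs for each run of consecutive dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dates (dt TEXT)")
    conn.executemany("INSERT INTO dates VALUES (?)", [(d,) for d in dates])
    rows = conn.execute(
        """
        WITH Grouped AS (
            SELECT dt,
                   -- day number minus row number is constant within a run
                   CAST(julianday(dt) AS INTEGER)
                       - ROW_NUMBER() OVER (ORDER BY dt) AS grp
            FROM dates
        )
        SELECT MIN(dt) AS start_date, MAX(dt) AS end_date
        FROM Grouped
        GROUP BY grp
        ORDER BY start_date
        """
    ).fetchall()
    conn.close()
    return rows


sample = ["2023-01-01", "2023-01-02", "2023-01-04", "2023-01-05", "2023-01-07"]
for start, end in contiguous_ranges(sample):
    print(start, end)
```

A date with no neighbours forms a range of its own, with the same start and end date.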
Article Views II
Problem Description:
Given a table articles with columns article_id, author_id, title, view_count, and last_viewed_at, you need to find the top view count for each author along with their average view count.
SQL Query:
SELECT
author_id,
MAX(view_count) AS max_view,
AVG(view_count) AS avg_view
FROM
articles
GROUP BY
author_id;
Breakdown and Explanation:
SELECT: Specifies the columns to be retrieved:
author_id: The ID of the author
MAX(view_count): The maximum view count for each author
AVG(view_count): The average view count for each author
FROM: Specifies the table to be used: articles
GROUP BY: Groups the rows by author_id. This means that the results will be grouped together for each unique author.
HAVING: (Optional) Can be used to filter the results based on the groupings. In this case, it is not used.
Real-World Application:
This query can be used in a website or analytics dashboard to track the performance of authors and identify those with the highest engagement. It can also help in optimizing content strategies and promoting popular articles.
Change Null Values in a Table to the Previous Value
Problem: You have a table with a column that contains null values. You want to change these null values to the previous value in the column.
Best & Performant Solution:
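UPDATE table_name
SET column_name = (
    -- prev is the inner copy of the table; table_name.row_id is the row being updated
    SELECT prev.column_name
    FROM table_name AS prev
    WHERE prev.row_id < table_name.row_id
    AND prev.column_name IS NOT NULL
    ORDER BY prev.row_id DESC
    LIMIT 1
)
WHERE column_name IS NULL;
Explanation (Simplified):
This solution uses a subquery to find the previous value for each row that has a null value.
The subquery selects the column_name value from the same table (table_name), aliased as prev, but only from rows whose row_id is less than the current row's row_id. This ensures that we find a previous value. Without the alias, the comparison would be made between a row and itself and would never match.
The condition prev.column_name IS NOT NULL skips earlier rows that are themselves null, so consecutive nulls are all filled with the last known value.
The subquery is sorted in descending order by row_id to get the most recent previous value.
LIMIT 1 is used to select only the first row in the sorted result, which is the closest previous value.
The main UPDATE query sets column_name to the value obtained from the subquery for all rows where column_name is null.
This form works in PostgreSQL and SQLite. MySQL does not allow an UPDATE to read its target table in a subquery directly, so it needs a derived table or a join there.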
Real-World Implementation:
Consider a table named Sales with the following data:
| row_id | product_sales |
|---------|----------------|
| 1 | 100 |
| 2 | 200 |
| 3 | NULL |
| 4 | 400 |
| 5 | 500 |
Running the following query will fill the null value in row 3 with the previous value 200:
UPDATE Sales
SET product_sales = (
    -- prev is the inner copy of the table; Sales.row_id is the row being updated
    SELECT prev.product_sales
    FROM Sales AS prev
    WHERE prev.row_id < Sales.row_id
    AND prev.product_sales IS NOT NULL
    ORDER BY prev.row_id DESC
    LIMIT 1
)
WHERE product_sales IS NULL;
After running the query, the Sales table will look like this:
| row_id | product_sales |
|---------|----------------|
| 1 | 100 |
| 2 | 200 |
| 3 | 200 |
| 4 | 400 |
| 5 | 500 |
Potential Applications:
This solution can be used in various real-world applications, such as:
Filling missing data in time-series data, where the values are expected to be consecutive.
Creating interpolated data for estimation or prediction tasks.
Imputing missing values in datasets for data analysis and modeling.
Maintaining data integrity by ensuring that columns with sequential or auto-incrementing values do not contain gaps or inconsistencies.
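The fill-forward update can be tried end to end on a small table. The sketch below is an illustration under stated assumptions: it uses SQLite through Python's built-in sqlite3 module, aliases the inner copy of the table as prev, and skips earlier rows that are themselves null; the function name fill_nulls_with_previous is hypothetical.

```python
import sqlite3


def fill_nulls_with_previous(rows):
    """Fill NULL product_sales values with the closest earlier non-NULL value."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Sales (row_id INTEGER PRIMARY KEY, product_sales INTEGER)"
    )
    conn.executemany("INSERT INTO Sales VALUES (?, ?)", rows)
    conn.execute(
        """
        UPDATE Sales
        SET product_sales = (
            SELECT prev.product_sales
            FROM Sales AS prev                    -- inner copy of the table
            WHERE prev.row_id < Sales.row_id      -- Sales.row_id is the row being updated
              AND prev.product_sales IS NOT NULL  -- skip earlier gaps
            ORDER BY prev.row_id DESC
            LIMIT 1
        )
        WHERE product_sales IS NULL
        """
    )
    result = conn.execute(
        "SELECT row_id, product_sales FROM Sales ORDER BY row_id"
    ).fetchall()
    conn.close()
    return result


data = [(1, 100), (2, 200), (3, None), (4, None), (5, 500)]
print(fill_nulls_with_previous(data))
```

A null in the very first row stays null, because there is no earlier value to copy.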
Sellers With No Sales
Problem: Find all sellers who have not made any sales.
SQL Query:
SELECT
seller_id
FROM
Sellers
EXCEPT
SELECT
DISTINCT seller_id
FROM
Sales;
Explanation:
The EXCEPT operator is used to find seller IDs in the Sellers table that are not in the Sales table.
The DISTINCT keyword ensures that we only get unique seller IDs in the Sales subquery. It is optional here, because EXCEPT already removes duplicates from its result.
Example:
Consider the following tables:
Sellers Table:
| 1 | John Smith |
| 2 | Jane Doe |
| 3 | Michael Jones |
| 4 | Mary Brown |
Sales Table:
| 1 | 1 | 100 |
| 2 | 2 | 200 |
| 3 | 3 | 300 |
Result:
| seller_id |
|---|
| 4 |
Mary Brown (seller ID 4) is the only seller who has not made any sales.
Real-World Application:
This query can be used to identify inactive sellers who may not be generating revenue for the company. The company can then take steps to address this issue, such as providing additional training or incentives to these sellers.
Find Active Users
Problem: Find Active Users
SQL:
-- Table: Users
-- Columns:
-- id INT UNSIGNED NOT NULL,
-- name VARCHAR(255) NOT NULL,
-- last_active TIMESTAMP NOT NULL,
-- PRIMARY KEY (id)
--
-- Table: Posts
-- Columns:
-- id INT UNSIGNED NOT NULL,
-- author_id INT UNSIGNED NOT NULL,
-- title VARCHAR(255) NOT NULL,
-- created_at TIMESTAMP NOT NULL,
-- PRIMARY KEY (id)
--
SELECT name
FROM Users
WHERE last_active >= DATE_SUB(NOW(), INTERVAL 30 DAY)
ORDER BY last_active DESC;
Breakdown:
SELECT name: Selects the names of active users. Each user has exactly one row in Users, so DISTINCT is not needed; combining SELECT DISTINCT name with ORDER BY last_active is also rejected by many databases, because the sort column is not in the select list.
FROM Users: Specifies the Users table to query.
WHERE last_active >= DATE_SUB(NOW(), INTERVAL 30 DAY): Filters users who have been active within the last 30 days.
ORDER BY last_active DESC: Orders the results in descending order of last activity.
How it Works:
The query starts by selecting the name column from the Users table.
It then applies a filter to include only users whose last_active timestamp is no older than 30 days. This identifies users who have been active recently.
Finally, the query orders the results in descending order of last_active to display the most recently active users first.
Example:
Consider the following tables:
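Users:
| id | name | last_active |
|---|---|---|
| 1 | John | 2023-03-08 18:00:00 |
| 2 | Mary | 2023-02-15 12:00:00 |
| 3 | Bob | 2023-04-01 15:00:00 |
Posts:
| id | author_id | title | created_at |
|---|---|---|---|
| 1 | 1 | Post 1 | 2023-03-09 10:00:00 |
| 2 | 3 | Post 2 | 2023-04-02 13:00:00 |
| 3 | 2 | Post 3 | 2023-02-16 14:00:00 |
Assuming the query runs on 2023-04-05, so that the 30-day window starts on 2023-03-06, executing it will return the following result, with the most recently active user first:
name
Bob
John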
Real-World Applications:
Marketing and customer engagement: Identifying active users can help businesses target their marketing campaigns and engage with customers who are actively interacting with their products or services.
Product development: Analyzing user activity can provide insights into which features are being used the most and identify areas for improvement.
Security and fraud detection: Monitoring user activity can help identify suspicious patterns or detect potential threats.
Rolling Average Steps
Problem:
Implement a rolling average query that calculates the average number of steps taken by a user over a specified number of days.
SQL Query:
WITH RollingAverage AS (
    SELECT
        user_id,
        date,
        steps,
        AVG(steps) OVER w AS rolling_average,
        COUNT(*) OVER w AS days_in_window
    FROM steps_table
    WINDOW w AS (
        PARTITION BY user_id
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    )
)
SELECT
    user_id,
    date,
    rolling_average
FROM RollingAverage
WHERE days_in_window = 7;
Explanation:
Common Table Expression (CTE):
The WITH clause creates a Common Table Expression (CTE) called RollingAverage.
Window Function:
The window function AVG(steps) OVER w calculates the average of steps over a 7-day window for each row: the current day plus the 6 preceding days.
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW defines the window frame, and PARTITION BY user_id keeps each user's rows separate. The frame counts rows, not calendar days, so this assumes one row per user per day with no missing days.
COUNT(*) OVER w records how many rows actually fell inside the frame.
Final Query:
The final query selects the user_id, date, and rolling_average from the RollingAverage CTE.
It filters out rows where fewer than 7 days of data were available (days_in_window is less than 7), since the average for those rows would cover a shorter period.
Real-World Application:
This query can be used in a fitness tracking app to:
Calculate a user's average daily steps over a specified period.
Track the user's progress and identify trends in their activity levels.
Provide personalized recommendations for exercise goals.
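A 7-day rolling average can be checked on a small data set. The sketch below is a minimal illustration, assuming SQLite through Python's built-in sqlite3 module, one row per user per day, and a window of the current row plus the 6 preceding rows; the function name rolling_average is hypothetical.

```python
import sqlite3


def rolling_average(rows):
    """Return (user_id, date, 7-day average) for rows with a full 7-day window."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE steps_table (user_id INTEGER, date TEXT, steps INTEGER)")
    conn.executemany("INSERT INTO steps_table VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH RollingAverage AS (
            SELECT user_id, date,
                   AVG(steps) OVER w AS rolling_average,
                   COUNT(*) OVER w AS days_in_window  -- rows actually in the frame
            FROM steps_table
            WINDOW w AS (
                PARTITION BY user_id
                ORDER BY date
                ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
            )
        )
        SELECT user_id, date, rolling_average
        FROM RollingAverage
        WHERE days_in_window = 7
        ORDER BY user_id, date
        """
    ).fetchall()
    conn.close()
    return result


# User 1 has eight days of data; user 2 has only three, so no full window.
steps = [(1, f"2023-01-{day:02d}", day * 1000) for day in range(1, 9)]
steps += [(2, f"2023-01-{day:02d}", 500) for day in range(1, 4)]
for row in rolling_average(steps):
    print(row)
```

Counting the rows in the frame is what makes it possible to drop incomplete windows; AVG alone never returns NULL when the current row has a value.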
Weather Type in Each Country
Problem Statement:
Given two tables:
Country (country_name, country_code)
Weather (country_code, weather_type, measurement_value)
Write a SQL query to find the weather type for each country.
Solution:
SELECT
c.country_name,
w.weather_type
FROM
Country c
INNER JOIN
Weather w
ON
c.country_code = w.country_code;
Explanation:
Country Table: Stores the country names and their corresponding country codes.
Weather Table: Stores the weather types (e.g., "Sunny", "Rainy", "Snowy") and their corresponding measurement values for different countries.
The INNER JOIN operation combines the two tables based on the common column country_code. This ensures that only rows where the country code matches are included in the result.
The query retrieves the country_name and weather_type columns from the joined table, providing a list of weather types for each country.
Example:
Country Table:
| country_name | country_code |
|---|---|
| United States | US |
| France | FR |
| Australia | AU |
Weather Table:
| country_code | weather_type | measurement_value |
|---|---|---|
| US | Sunny | 75 |
| US | Rainy | 60 |
| FR | Cloudy | 55 |
| AU | Sunny | 80 |
Result:
| country_name | weather_type |
|---|---|
| United States | Sunny |
| United States | Rainy |
| France | Cloudy |
| Australia | Sunny |
Real-World Applications:
This query can be used for various purposes, such as:
Displaying weather forecasts on websites or mobile apps.
Analyzing weather patterns and trends across different countries.
Providing customized weather alerts based on location.
Conducting research on climate change and its impact on different regions.
Products With Three or More Orders in Two Consecutive Years
Problem Statement: Find products that have received at least three orders in two consecutive years (e.g., 2021 and 2022).
Solution:
/* Find the total orders for each product in each year */
WITH ProductYearOrderCount AS (
SELECT
product_id,
YEAR(order_date) AS order_year,
COUNT(*) AS order_count
FROM
orders
GROUP BY
product_id,
order_year
),
/* Find products with at least 3 orders in two consecutive years */
ProductConsecutiveThreeOrders AS (
    SELECT DISTINCT
        a.product_id
    FROM
        ProductYearOrderCount a
    /* Pair each year with the following year for the same product */
    JOIN
        ProductYearOrderCount b
    ON
        b.product_id = a.product_id
        AND b.order_year = a.order_year + 1
    WHERE
        a.order_count >= 3
        AND b.order_count >= 3
)
SELECT
p.product_name
FROM
products p
JOIN
ProductConsecutiveThreeOrders c
ON
p.product_id = c.product_id;
Breakdown:
ProductYearOrderCount: This subquery counts the orders for each product in each year.
ProductConsecutiveThreeOrders: This subquery identifies products with at least three orders in two consecutive years by:
Joining the yearly counts to themselves so that each year is paired with the following year for the same product.
Keeping only the pairs in which both years have at least three orders.
Using DISTINCT so that a product with several qualifying pairs of years is listed once.
Final Query: This query joins the original products table with the ProductConsecutiveThreeOrders subquery to retrieve the product names of those that meet the criteria.
Example:
CREATE TABLE products (
product_id INT,
product_name VARCHAR(255)
);
CREATE TABLE orders (
order_id INT,
product_id INT,
order_date DATE
);
INSERT INTO products (product_id, product_name) VALUES
(1, 'Product 1'),
(2, 'Product 2'),
(3, 'Product 3');
INSERT INTO orders (order_id, product_id, order_date) VALUES
(1, 1, '2021-01-01'),
(2, 1, '2021-03-01'),
(3, 1, '2021-05-01'),
(4, 2, '2021-02-01'),
(5, 2, '2021-04-01'),
(6, 3, '2021-06-01'),
(7, 1, '2022-02-01'),
(8, 1, '2022-04-01'),
(9, 2, '2022-03-01'),
(10, 2, '2022-05-01'),
(11, 3, '2022-07-01');
-- Execute the full query from the Solution section (both CTEs followed by the final SELECT)
Result:
No rows are returned for this sample data. Product 1 has three orders in 2021 but only two in 2022, and Products 2 and 3 never reach three orders in a single year, so no product has at least three orders in each of two consecutive years.
Real-World Applications:
Identifying popular products with sustained demand for targeted advertising or inventory management.
Analyzing customer behavior to determine if specific products are seasonal or have consistent sales throughout the year.
Tracking product performance over time to identify potential issues or growth opportunities.
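The consecutive-year check can be expressed as a self-join that pairs each year's order count with the following year's count. The sketch below assumes SQLite through Python's built-in sqlite3 module, so strftime('%Y', ...) replaces MySQL's YEAR(); the function name and the extra order row used in the usage example are hypothetical additions for illustration.

```python
import sqlite3

QUERY = """
WITH ProductYearOrderCount AS (
    SELECT product_id,
           CAST(strftime('%Y', order_date) AS INTEGER) AS order_year,
           COUNT(*) AS order_count
    FROM orders
    GROUP BY product_id, order_year
)
SELECT DISTINCT a.product_id
FROM ProductYearOrderCount AS a
JOIN ProductYearOrderCount AS b
  ON b.product_id = a.product_id
 AND b.order_year = a.order_year + 1   -- b is the year right after a
WHERE a.order_count >= 3
  AND b.order_count >= 3
ORDER BY a.product_id
"""


def products_with_consecutive_years(orders):
    """Return product IDs with at least 3 orders in each of two consecutive years."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, product_id INTEGER, order_date TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


# Product 1: three orders in 2021, two in 2022.
base_orders = [
    (1, 1, "2021-01-01"), (2, 1, "2021-03-01"), (3, 1, "2021-05-01"),
    (4, 1, "2022-02-01"), (5, 1, "2022-04-01"),
]
print(products_with_consecutive_years(base_orders))
# One more 2022 order (hypothetical row) gives Product 1 three orders in both years.
print(products_with_consecutive_years(base_orders + [(6, 1, "2022-06-01")]))
```

Because the join requires order_year + 1, a product with three orders in 2020 and three in 2022 but none in 2021 does not qualify.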
Number of Unique Subjects Taught by Each Teacher
Problem:
Number of Unique Subjects Taught by Each Teacher
Given two tables:
Teachers (id, name)
Subjects (id, name)
And a join table:
TeacherSubject (teacher_id, subject_id)
Write a SQL query to find the number of unique subjects taught by each teacher.
Solution:
SELECT
t.name AS TeacherName,
COUNT(DISTINCT s.name) AS NumberOfUniqueSubjects
FROM Teachers AS t
JOIN TeacherSubject AS ts
ON t.id = ts.teacher_id
JOIN Subjects AS s
ON ts.subject_id = s.id
GROUP BY
    t.id,
    t.name
ORDER BY
    NumberOfUniqueSubjects DESC;
Explanation:
JOIN Tables: We join the Teachers, TeacherSubject, and Subjects tables using the teacher_id and subject_id fields to link them.
COUNT DISTINCT Subjects: For each teacher, we count the number of unique subjects taught using COUNT(DISTINCT s.name).
GROUP BY Teacher: We group the results by the teacher's ID and name, so that two teachers who share a name are counted separately.
ORDER BY Number of Subjects: Finally, we sort the results in descending order of the number of unique subjects taught by each teacher.
Result:
The query returns a table with the teacher's name and the number of unique subjects they teach.
Real-World Application:
This query can be used in a school or university system to track the number of subjects taught by each teacher. This information can be used for administrative purposes, such as assigning teachers to classes or evaluating their workload.
Market Analysis II
Problem:
Find the top k products with the highest total revenue in a given period.
SQL Query:
SELECT product_id, SUM(quantity * unit_price) AS total_revenue
FROM sales_data
WHERE sales_date BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY product_id
ORDER BY total_revenue DESC
LIMIT k;
Breakdown:
SELECT product_id, SUM(quantity * unit_price) AS total_revenue: Selects the product ID and calculates the total revenue for each product by summing the product of quantity sold and unit price for each sale within the specified period.
FROM sales_data: Specifies the sales data table to query.
WHERE sales_date BETWEEN '2023-01-01' AND '2023-01-31': Filters the data to include sales only within the specified period.
GROUP BY product_id: Groups the results by product ID to calculate the total revenue for each product.
ORDER BY total_revenue DESC: Orders the results in descending order of total revenue.
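LIMIT k: Limits the results to the top k products with the highest total revenue. Here k is a placeholder; replace it with an integer literal or a bound parameter before running the query.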
Real-World Applications:
Analyzing sales data to identify top-selling products
Identifying which products are driving the most revenue
Making informed decisions about product inventory and marketing strategies
Tracking customer purchasing trends
Managers with at Least 5 Direct Reports
Problem Statement: Find all managers who have at least 5 direct reports.
SQL Query:
SELECT ManagerID, COUNT(*) AS DirectReports
FROM EmployeeTable
GROUP BY ManagerID
HAVING COUNT(*) >= 5;
Explanation:
The EmployeeTable table contains columns EmployeeID and ManagerID.
The COUNT(*) function returns the number of rows in a group.
The GROUP BY clause groups the results by ManagerID.
The HAVING clause filters the results to include only managers with at least 5 direct reports.
Simplified Explanation:
We count the number of direct reports for each manager (COUNT(*) AS DirectReports).
We group the results by manager (GROUP BY ManagerID).
We filter out managers with fewer than 5 direct reports (HAVING COUNT(*) >= 5).
Real World Applications:
Identifying top performers in a management team.
Assessing the efficiency of managers in managing large teams.
Supporting succession planning by identifying potential replacements for managers with a high number of direct reports.
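Example:
Suppose the direct-report counts per manager, before the HAVING filter is applied, are:
| ManagerID | DirectReports |
|---|---|
| 1 | 5 |
| 2 | 7 |
| 3 | 3 |
| 4 | 6 |
In this example, managers with IDs 1, 2, and 4 have at least 5 direct reports and would be returned by the query. Manager 3, with 3 direct reports, is filtered out.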
Manager of the Largest Department
Problem: Find the manager who manages the largest number of employees.
SQL Query:
SELECT ManagerID, COUNT(*) AS NumEmployees
FROM EmployeeTable
GROUP BY ManagerID
ORDER BY NumEmployees DESC
LIMIT 1;
Explanation:
SELECT ManagerID, COUNT(*) AS NumEmployees: This line selects the manager ID and the number of employees managed by that manager. The COUNT(*) function returns the number of rows in a group, which in this case is the number of employees managed by each manager.
FROM EmployeeTable: This line specifies the table from which the data is retrieved. In this case, it is the EmployeeTable.
GROUP BY ManagerID: This line groups the results by the manager ID. This means that all the employees managed by a particular manager are grouped together.
ORDER BY NumEmployees DESC: This line orders the results in descending order of the number of employees managed. This means that the manager with the largest number of employees appears first in the results.
LIMIT 1: This line limits the results to only the top row. This means that only the manager with the largest number of employees is returned.
Example:
Consider the following EmployeeTable:
| EmployeeID | ManagerID |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
| 4 | 2 |
| 5 | 3 |
The following SQL query would return the manager with the largest number of employees:
SELECT ManagerID, COUNT(*) AS NumEmployees
FROM EmployeeTable
GROUP BY ManagerID
ORDER BY NumEmployees DESC
LIMIT 1;
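The results of this query would be:
| ManagerID | NumEmployees |
|---|---|
| 2 | 2 |
This indicates that manager 2 manages 2 employees. Manager 3 also manages 2 employees in this data, so the two are tied; LIMIT 1 returns only one of them, and without a tie-breaking column in ORDER BY the choice between them is not guaranteed.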
Real-World Applications:
This query can be useful in a variety of real-world scenarios, such as:
Identifying managers who are responsible for a large number of employees.
Analyzing the distribution of employees across different managers.
Making decisions about promotions and job assignments.
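LIMIT 1 returns a single row even when several managers share the largest team. One way to return every tied manager is to compare each team size with the maximum team size. The sketch below assumes SQLite through Python's built-in sqlite3 module; the function name largest_team_managers and the CTE name TeamSize are hypothetical.

```python
import sqlite3


def largest_team_managers(pairs):
    """Return (ManagerID, NumEmployees) for every manager tied for the largest team."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EmployeeTable (EmployeeID INTEGER PRIMARY KEY, ManagerID INTEGER)"
    )
    conn.executemany("INSERT INTO EmployeeTable VALUES (?, ?)", pairs)
    rows = conn.execute(
        """
        WITH TeamSize AS (
            SELECT ManagerID, COUNT(*) AS NumEmployees
            FROM EmployeeTable
            GROUP BY ManagerID
        )
        SELECT ManagerID, NumEmployees
        FROM TeamSize
        -- keep every manager whose team size equals the maximum
        WHERE NumEmployees = (SELECT MAX(NumEmployees) FROM TeamSize)
        ORDER BY ManagerID
        """
    ).fetchall()
    conn.close()
    return rows


employees = [(1, 2), (2, 3), (3, 1), (4, 2), (5, 3)]
print(largest_team_managers(employees))
```

With this data, managers 2 and 3 each manage two employees, so both are returned.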
Class Performance
Problem:
Given a table student_performance with the following columns:
student_id (integer)
subject_id (integer)
score (float)
test_time (timestamp)
Find the student with the highest average score in each subject.
SQL Query:
-- Step 1: Calculate the average score for each student in each subject
WITH student_avg AS (
    SELECT
        student_id,
        subject_id,
        AVG(score) AS avg_score
    FROM
        student_performance
    GROUP BY
        student_id,
        subject_id
),
-- Step 2: Find the highest average score for each subject
SubjectAvg AS (
    SELECT
        subject_id,
        MAX(avg_score) AS max_avg_score
    FROM
        student_avg
    GROUP BY
        subject_id
)
-- Step 3: Find the students with the highest average score in each subject
SELECT
    student_avg.student_id,
    student_avg.subject_id
FROM
    student_avg
JOIN
    SubjectAvg ON student_avg.subject_id = SubjectAvg.subject_id
    AND student_avg.avg_score = SubjectAvg.max_avg_score;
Explanation:
Step 1: Calculate the average score for each student in each subject using GROUP BY and AVG, in a Common Table Expression (CTE) named student_avg.
Step 2: Find the highest average score for each subject using MAX with GROUP BY subject_id in a second CTE named SubjectAvg.
Step 3: Join SubjectAvg with student_avg to find the students with the highest average score in each subject. Students who tie for the highest average in a subject are all returned.
Breakdown:
GROUP BY: Groups the rows in a table based on the specified columns. In this case, we group the rows by student_id and subject_id to calculate the average score for each student in each subject.
AVG: Calculates the average value of a numeric column for each group.
Common Table Expressions (CTEs): Temporary named result sets that can be used in a query. In this case, SubjectAvg is used to store the highest average score for each subject.
MAX: Finds the maximum value of a numeric column.
JOIN: Combines rows from two or more tables based on a common column. In this case, we join SubjectAvg and student_avg on subject_id to find the students with the highest average score in each subject.
Real-World Applications:
Identifying top performers in a school or organization
Analyzing student performance trends over time
Evaluating the effectiveness of different learning methods
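The three steps can be combined into one statement by chaining two Common Table Expressions. The sketch below is a minimal illustration, assuming SQLite through Python's built-in sqlite3 module and a student_performance table without the test_time column; the function name top_students is hypothetical.

```python
import sqlite3


def top_students(rows):
    """Return (subject_id, student_id) for the best average score in each subject."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student_performance (student_id INTEGER, subject_id INTEGER, score REAL)"
    )
    conn.executemany("INSERT INTO student_performance VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        WITH student_avg AS (            -- step 1: average per student and subject
            SELECT student_id, subject_id, AVG(score) AS avg_score
            FROM student_performance
            GROUP BY student_id, subject_id
        ),
        SubjectAvg AS (                  -- step 2: best average per subject
            SELECT subject_id, MAX(avg_score) AS max_avg_score
            FROM student_avg
            GROUP BY subject_id
        )
        SELECT sa.subject_id, sa.student_id   -- step 3: students matching the best average
        FROM student_avg AS sa
        JOIN SubjectAvg AS best
          ON sa.subject_id = best.subject_id
         AND sa.avg_score = best.max_avg_score
        ORDER BY sa.subject_id, sa.student_id
        """
    ).fetchall()
    conn.close()
    return result


scores = [
    (1, 10, 80), (1, 10, 90),   # student 1 averages 85 in subject 10
    (2, 10, 85), (2, 10, 85),   # student 2 also averages 85 in subject 10
    (3, 10, 70),
    (1, 20, 60), (2, 20, 75),
]
print(top_students(scores))
```

Students 1 and 2 tie in subject 10, and the join returns both of them.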
Find the Quiet Students in All Exams
Problem Statement:
Find the students who never participated in any exam, where a score of zero means that the student did not participate.
Schema:
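CREATE TABLE Exam (
StudentId INT,
ExamId INT,
Score INT DEFAULT 0,
-- One row per student per exam; a key on StudentId alone would allow only one exam per student
PRIMARY KEY (StudentId, ExamId)
);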
Solution:
SELECT StudentId
FROM Exam
GROUP BY StudentId
HAVING SUM(Score) = 0;
Explanation:
SELECT StudentId: Returns one row per student ID, because the query is grouped by StudentId.
GROUP BY StudentId: Groups the rows by student ID so that all of a student's exam scores are aggregated together.
HAVING SUM(Score) = 0: Keeps only the students whose scores add up to zero. Assuming scores are never negative, a total of zero means the student scored zero in every exam. If negative scores are possible, HAVING MAX(Score) = 0 AND MIN(Score) = 0 is the safer filter.
Example Usage:
SELECT StudentId, SUM(Score) AS TotalScore
FROM Exam
GROUP BY StudentId
HAVING SUM(Score) = 0;
Output:
+-----------+------------+
| StudentId | TotalScore |
+-----------+------------+
| 1         | 0          |
| 3         | 0          |
+-----------+------------+
This output shows that students with IDs 1 and 3 have never scored non-zero in any exam.
Real-World Applications:
Identifying students who need academic support: The results of this query can help identify students who may need additional attention or support in their studies.
Evaluating student participation: It can be used to assess student engagement and participation in class activities.
Attendance tracking: By considering exams as attendance records, this query can identify students who have not participated (attended) any exams.
Fix Product Name Format
Problem: Given a table products with the following schema:
CREATE TABLE products (
product_id INT NOT NULL,
product_name VARCHAR(255) NOT NULL,
PRIMARY KEY (product_id)
);
Convert product names to lowercase and remove special characters. For example, "iPhone 13 Pro" should become "iphone 13 pro".
Solution:
UPDATE products
SET product_name = LOWER(
REGEXP_REPLACE(product_name, '[^a-zA-Z0-9 ]', '')
);
Explanation:
The solution uses two MySQL functions:
LOWER() converts the product name to lowercase.
REGEXP_REPLACE() removes every character that is not a letter, a digit, or a space (the pattern [^a-zA-Z0-9 ]) from the product name. It is available in MySQL 8.0 and later.
The resulting product name is then stored back in the product_name column.
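The same transformation can be checked outside the database. The sketch below mirrors the pattern with Python's re module; fix_product_name is a hypothetical helper name, and the second input is a made-up value chosen because it contains special characters.

```python
import re

def fix_product_name(name: str) -> str:
    """Drop everything except letters, digits and spaces, then lowercase."""
    return re.sub(r"[^a-zA-Z0-9 ]", "", name).lower()

print(fix_product_name("iPhone 13 Pro"))        # iphone 13 pro
print(fix_product_name("Galaxy S22+ (Ultra)"))  # galaxy s22 ultra
```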
Example:
INSERT INTO products (product_id, product_name) VALUES
(1, 'iPhone 13 Pro'),
(2, 'Samsung Galaxy S22 Ultra'),
(3, 'Google Pixel 6 Pro');
UPDATE products
SET product_name = LOWER(
REGEXP_REPLACE(product_name, '[^a-zA-Z0-9 ]', '')
);
SELECT * FROM products;
Output:
+------------+--------------------------+
| product_id | product_name             |
+------------+--------------------------+
| 1          | iphone 13 pro            |
| 2          | samsung galaxy s22 ultra |
| 3          | google pixel 6 pro       |
+------------+--------------------------+
Real-World Applications:
Fixing product names in this way can improve search results and data analysis. For example:
A search for "iphone" will now match products with the name "iPhone" or "iphone".
Analysis of product sales by category will be more accurate because all "iPhone" products will be grouped together.
Number of Times a Driver Was a Passenger
Problem Statement:
Given two tables:
drivers: Contains information about drivers.
passengers: Contains information about passengers.
Find the number of times each driver has been a passenger.
SQL Solution:
-- Assumption: passengers.passenger_id identifies the person who rode.
-- The schema is not given, so adjust the column name to match your table.
SELECT d.id, d.name, COUNT(p.passenger_id) AS passenger_count
FROM drivers AS d
LEFT JOIN passengers AS p ON d.id = p.passenger_id
GROUP BY d.id, d.name;
Breakdown and Explanation:
JOIN Operation: The LEFT JOIN matches each driver with the rows in passengers where that same person appears as the rider. Joining on a column that records who drove the ride would instead count the rides each driver gave. Because it is a LEFT JOIN, drivers who were never passengers are kept.
GROUP BY Operation: The GROUP BY clause groups the results by the driver's id and name columns. This combines all passenger records for each driver into a single row.
COUNT Function: COUNT(p.passenger_id) counts the matched passenger records for each driver. Unlike COUNT(*), it returns 0 for a driver with no matching rows.
Real-World Applications:
Taxi Service: Track the number of rides each taxi driver has completed.
Ride-Sharing Service: Determine which drivers have the highest passenger demand.
Bus Service: Monitor the number of passengers carried by each bus driver.
Popularity Percentage
Problem Statement:
Given a table containing a list of votes, find the percentage of votes each candidate received.
Table Structure:
CREATE TABLE votes (
candidate TEXT,
num_votes INTEGER
);
SQL Solution:
SELECT
candidate,
SUM(num_votes) * 100.0 / SUM(SUM(num_votes)) OVER () AS percentage
FROM
votes
GROUP BY
candidate
ORDER BY
percentage DESC;
Explanation:
Per-candidate SUM(): With GROUP BY candidate, the inner SUM(num_votes) adds up the votes for each candidate.
Window function SUM() OVER (): Wrapping that sum in SUM(...) OVER () totals it across all candidates, which gives the overall vote count. A plain SUM(num_votes) in a grouped query only sees one candidate's votes, so every percentage would come out as 100.
Division: Each candidate's votes are multiplied by 100.0 and divided by the total. The decimal literal avoids integer division.
ORDER BY: Sorts the results in descending order of percentage, showing the candidates who received the most votes at the top.
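The percentage calculation can be tried with Python's built-in sqlite3 module. This sketch is one way to write it, not the only one: it computes the per-candidate totals in a CTE named totals (a name chosen here) and then divides by a window sum over all candidates. The sample votes are invented.

```python
import sqlite3

def vote_percentages(rows):
    """Return (candidate, percentage) sorted from most to least popular."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE votes (candidate TEXT, num_votes INTEGER)")
    conn.executemany("INSERT INTO votes VALUES (?, ?)", rows)
    query = """
        WITH totals AS (
            SELECT candidate, SUM(num_votes) AS candidate_votes
            FROM votes
            GROUP BY candidate
        )
        SELECT candidate,
               candidate_votes * 100.0 / SUM(candidate_votes) OVER () AS percentage
        FROM totals
        ORDER BY percentage DESC, candidate
    """
    result = [(name, round(pct, 2)) for name, pct in conn.execute(query)]
    conn.close()
    return result

# Hypothetical data; Alice appears twice to show that rows are summed first.
sample_votes = [("Alice", 30), ("Bob", 50), ("Alice", 20), ("Carol", 100)]
print(vote_percentages(sample_votes))
```

The empty OVER () makes the window cover every row, so each candidate is divided by the same grand total.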
Real-World Applications:
Election Analysis: Determine the popularity of candidates in an election.
Market Research: Track the popularity of products or services.
Social Media Analytics: Analyze the engagement of different posts or content creators.
Customer Feedback: Gauge the satisfaction levels of customers.
Find the Start and End Number of Continuous Ranges
Problem Statement:
Given a table ranges containing the start and end numbers of a set of ranges, find the start and end numbers of all continuous ranges.
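Example:
Input:
| start | end |
|---|---|
| 1 | 5 |
| 6 | 9 |
| 10 | 13 |
| 14 | 16 |
Output:
| start | end |
|---|---|
| 1 | 5 |
| 6 | 9 |
| 10 | 13 |
| 14 | 16 |
No range starts at or before the end of the range before it, so nothing is merged and the output matches the input.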
Solution Explanation:
The idea is to use the LAG() function to get the previous range's end and check whether the current range's start is less than or equal to it. If it is, the current range continues the previous one; otherwise it starts a new group. A running count of those group starts gives every range a group number, and each group collapses into one continuous range.
SQL Query:
-- "end" (and in some databases "start") is a reserved word; quote the column names if your database rejects them.
SELECT MIN(start) AS start, MAX(end) AS end
FROM
(
SELECT *, SUM(is_new) OVER (ORDER BY start) AS grp
FROM
(
SELECT *, CASE WHEN start <= LAG(end) OVER (ORDER BY start) THEN 0 ELSE 1 END AS is_new
FROM ranges
) AS flagged
) AS grouped
GROUP BY grp
ORDER BY start;
Breakdown of the Query:
Flag Group Starts: The innermost subquery uses LAG(end) OVER (ORDER BY start) to read the end number of the previous range. The CASE expression sets is_new to 0 when the current start is less than or equal to that value and to 1 otherwise. The first row has no previous range, so it is flagged as a new group.
Number the Groups: SUM(is_new) OVER (ORDER BY start) keeps a running total of the flags, so all ranges that belong together share the same grp value.
Get Range Start and End: GROUP BY grp merges each group into a single row, and MIN(start) AS start and MAX(end) AS end calculate the start and end numbers of each continuous range.
Order Results: The ORDER BY start clause orders the results in ascending order by the range start.
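The merging logic can be sketched with Python's built-in sqlite3 module. The sketch assumes the columns are named range_start and range_end (renamed here to avoid reserved words) and that no two ranges share the same start. It uses the common flag-and-running-sum pattern, which may differ in detail from the query above.

```python
import sqlite3

def merge_ranges(rows):
    """Merge ranges that start at or before the previous range's end."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ranges (range_start INT, range_end INT)")
    conn.executemany("INSERT INTO ranges VALUES (?, ?)", rows)
    query = """
        WITH flagged AS (
            -- is_new = 1 when the range does not touch the previous one
            SELECT range_start, range_end,
                   CASE WHEN range_start <= LAG(range_end) OVER (ORDER BY range_start)
                        THEN 0 ELSE 1 END AS is_new
            FROM ranges
        ),
        grouped AS (
            -- running total of the flags numbers the groups
            SELECT range_start, range_end,
                   SUM(is_new) OVER (ORDER BY range_start) AS grp
            FROM flagged
        )
        SELECT MIN(range_start), MAX(range_end)
        FROM grouped
        GROUP BY grp
        ORDER BY 1
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(merge_ranges([(1, 5), (4, 8), (10, 12), (12, 15), (20, 25)]))
```

Comparing only against the immediately preceding range is enough when ranges are not nested; a range fully contained in an earlier, longer one would need a running maximum of the previous ends instead of LAG.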
Real-World Applications:
This query can be used in various real-world applications, such as:
Time Management: Finding continuous time slots within a schedule.
Inventory Management: Identifying continuous stock levels that need to be replenished.
Data Analysis: Identifying patterns and trends in continuous data.
Average Selling Price
Problem Statement:
Given two tables, Orders and Products, find the average selling price of products.
Tables:
Orders (order_id, product_id, quantity)
Products (product_id, price)
Solution:
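SELECT
SUM(p.price * o.quantity) * 1.0 / SUM(o.quantity) AS average_selling_price
FROM
Orders AS o
JOIN
Products AS p
ON o.product_id = p.product_id;
Breakdown:
Step 1: Join the Tables: We join the Orders and Products tables on the product_id column to connect each order with its product's price.
Step 2: Calculate the Weighted Average: We divide the total revenue, SUM(p.price * o.quantity), by the total number of units sold, SUM(o.quantity). A plain AVG(p.price) would weight every order row equally and ignore the quantities. Multiplying by 1.0 avoids integer division when both columns are integers.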
Simplified Explanation:
Imagine a store that sells two products:
Product 1: $10
Product 2: $20
A customer orders 2 units of Product 1 and 1 unit of Product 2. So, the customer pays:
Product 1: $10 x 2 = $20
Product 2: $20 x 1 = $20
Total amount paid: $40
Number of products sold: 3 (2 + 1)
Average selling price = $40 / 3 = $13.33
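The arithmetic above can be reproduced with Python's built-in sqlite3 module. This is a small sketch using the two products and the single customer's order from the example; the order_id values are made up. It returns the quantity-weighted average next to a plain AVG(price) for comparison.

```python
import sqlite3

def average_selling_price(products, orders):
    """Return (quantity-weighted average, plain AVG of price), rounded to cents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Products (product_id INT, price REAL)")
    conn.execute("CREATE TABLE Orders (order_id INT, product_id INT, quantity INT)")
    conn.executemany("INSERT INTO Products VALUES (?, ?)", products)
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", orders)
    query = """
        SELECT SUM(p.price * o.quantity) * 1.0 / SUM(o.quantity),
               AVG(p.price)
        FROM Orders AS o
        JOIN Products AS p ON o.product_id = p.product_id
    """
    weighted, unweighted = conn.execute(query).fetchone()
    conn.close()
    return round(weighted, 2), round(unweighted, 2)

# Product 1 costs $10 (2 units ordered), product 2 costs $20 (1 unit ordered).
print(average_selling_price([(1, 10), (2, 20)], [(1, 1, 2), (2, 2, 1)]))
```

The first value matches the $13.33 worked out by hand, while the unweighted average is $15.00 because it ignores the quantities.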
Real-World Application:
Retail: Calculate the average selling price of products in a retail store.
Manufacturing: Determine the average selling price of finished goods.
Financial Analysis: Analyze the profitability of products by comparing their average selling price with their production costs.
Friends With No Mutual Friends
Problem: Find pairs of friends who do not have any mutual friends.
Example:
Table: Friend
| Person1 | Person2 |
| --------| --------|
| A | B |
| B | C |
In this example, A and B are friends, and so are B and C. Neither pair shares a third friend, so both pairs qualify. (A and C are not friends with each other; they only share B as a mutual friend.)
Solution:
WITH AllFriends AS (
SELECT Person1 AS person, Person2 AS friend FROM Friend
UNION
SELECT Person2, Person1 FROM Friend
)
SELECT f.Person1, f.Person2
FROM Friend f
WHERE NOT EXISTS (
SELECT 1
FROM AllFriends a
JOIN AllFriends b ON a.friend = b.friend
WHERE a.person = f.Person1
AND b.person = f.Person2
);
Breakdown:
The AllFriends CTE lists every friendship in both directions, so that each person's friends can be looked up from a single column. This assumes friendship is mutual and that each pair is stored once in Friend.
The main query walks through each friend pair in the Friend table.
The NOT EXISTS subquery looks for a person who appears as a friend of both Person1 and Person2. If such a mutual friend exists, the pair is removed.
Output:
| Person1 | Person2 |
| --------| --------|
| A | B |
| B | C |
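A mutual-friend check of this kind can be tried with Python's built-in sqlite3 module. The sketch below assumes friendships are mutual and stored once per pair; the AllFriends CTE and the second data set are introduced here for illustration.

```python
import sqlite3

def friends_without_mutuals(pairs):
    """Return the friend pairs that share no third friend."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Friend (Person1 TEXT, Person2 TEXT)")
    conn.executemany("INSERT INTO Friend VALUES (?, ?)", pairs)
    query = """
        WITH AllFriends AS (
            -- every friendship, listed in both directions
            SELECT Person1 AS person, Person2 AS friend FROM Friend
            UNION
            SELECT Person2, Person1 FROM Friend
        )
        SELECT f.Person1, f.Person2
        FROM Friend AS f
        WHERE NOT EXISTS (
            SELECT 1
            FROM AllFriends AS a
            JOIN AllFriends AS b ON a.friend = b.friend
            WHERE a.person = f.Person1 AND b.person = f.Person2
        )
        ORDER BY f.Person1, f.Person2
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(friends_without_mutuals([("A", "B"), ("B", "C")]))
print(friends_without_mutuals([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]))
```

In the second call A, B and C all know each other, so each of those pairs has a mutual friend and only the pair C and D remains.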
Real-World Application:
This query can be used to analyze connections in a social network. Friend pairs with no mutual friends are not yet part of a shared circle, so each person's other friends can be recommended to the other as potential connections.
Finding the Topic of Each Post
Problem Statement:
Write a SQL query to find the topic of each post in a forum. Posts can belong to multiple topics.
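Example Table:
| post_id | content | topic_id |
|---|---|---|
| 1 | Hello world! | 1 |
| 2 | This is a post | 2 |
| 3 | About Python | 1 |
| 4 | SQL query | 3 |
| 5 | Java tutorial | 2 |
Answer Table:
| post_id | topic |
|---|---|
| 1 | General |
| 2 | Programming |
| 3 | Python |
| 4 | SQL |
| 5 | Java |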
Explanation:
To find the topic of each post, we join the Posts and Topics tables on the topic_id column. Then we use the GROUP_CONCAT() function to concatenate the topic names for each post.
Real-World Applications:
This query can be used to find the topics that are most commonly discussed in a forum. This information can be used to improve the forum's organization and make it easier for users to find the content they are interested in.
Simplified Implementation:
SELECT
post_id,
GROUP_CONCAT(topic_name) AS topic
FROM Posts
INNER JOIN Topics
ON Posts.topic_id = Topics.topic_id
GROUP BY
post_id;
Symmetric Coordinates
Problem:
Given a table coordinates_table with columns x, y, and z, find all pairs of points that lie symmetrically across the origin.
SQL Solution:
WITH SymmetricPoints AS (
SELECT
x AS x1,
y AS y1,
z AS z1,
-x AS x2,
-y AS y2,
-z AS z2
FROM coordinates_table
)
SELECT
x1,
y1,
z1,
x2,
y2,
z2
FROM SymmetricPoints;
Explanation:
This query builds a CTE named SymmetricPoints that contains each original point together with its reflection through the origin. The unary - operator negates each coordinate to produce the reflected point. The WITH clause defines the CTE so that it can be used in the main query, and the final SELECT statement returns every point paired with its reflection. Note that the query generates the reflections; it does not check whether a reflected point is itself stored in coordinates_table.
Example:
Consider the following table:
+----+---+----+
| x  | y | z  |
+----+---+----+
| 1  | 2 | 3  |
| -5 | 4 | -2 |
| 7  | 0 | 1  |
+----+---+----+
The SymmetricPoints table would look like this:
+----+----+----+----+----+----+
| x1 | y1 | z1 | x2 | y2 | z2 |
+----+----+----+----+----+----+
| 1 | 2 | 3 | -1 | -2 | -3 |
| -5 | 4 | -2 | 5 | -4 | 2 |
| 7 | 0 | 1 | -7 | 0 | -1 |
+----+----+----+----+----+----+
The final SELECT has no filter, so the final result contains the same three rows:
+----+----+----+----+----+----+
| x1 | y1 | z1 | x2 | y2 | z2 |
+----+----+----+----+----+----+
| 1 | 2 | 3 | -1 | -2 | -3 |
| -5 | 4 | -2 | 5 | -4 | 2 |
| 7 | 0 | 1 | -7 | 0 | -1 |
+----+----+----+----+----+----+
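If the goal is to find pairs of points that are both stored in the table, a self-join is one way to do it. The sketch below uses Python's built-in sqlite3 module; it is an alternative to the negation query above, not a restatement of it. It relies on SQLite's implicit rowid to list each pair once, and the fourth sample point is added here so that one symmetric pair exists.

```python
import sqlite3

def symmetric_pairs(points):
    """Return (x1, y1, z1, x2, y2, z2) for stored points that mirror each other."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE coordinates_table (x INT, y INT, z INT)")
    conn.executemany("INSERT INTO coordinates_table VALUES (?, ?, ?)", points)
    query = """
        SELECT a.x, a.y, a.z, b.x, b.y, b.z
        FROM coordinates_table AS a
        JOIN coordinates_table AS b
          ON b.x = -a.x AND b.y = -a.y AND b.z = -a.z
         AND a.rowid < b.rowid   -- SQLite-specific: report each pair only once
        ORDER BY a.rowid
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# The three points from the example plus (-1, -2, -3), which mirrors the first.
print(symmetric_pairs([(1, 2, 3), (-5, 4, -2), (7, 0, 1), (-1, -2, -3)]))
```

With only the three original points the function returns an empty list, because none of them is the reflection of another.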
Real-World Application:
This problem has applications in physics, geometry, and computer graphics. For example, it can be used to find the center of mass of a system or to compute the moment of inertia of an object.
User Purchase Platform
Problem Statement
Given a table purchases with the following schema:
| user_id | product_id | purchase_date | purchase_amount |
Write a SQL query to create a user purchase platform that shows the top 10 products purchased by each user.
Solution
WITH UserPurchaseCounts AS (
SELECT
user_id,
product_id,
COUNT(*) AS purchase_count
FROM purchases
GROUP BY
user_id,
product_id
),
RankedProducts AS (
SELECT
user_id,
product_id,
purchase_count,
RANK() OVER (PARTITION BY user_id ORDER BY purchase_count DESC) AS purchase_rank
FROM UserPurchaseCounts
)
SELECT
user_id,
product_id
FROM RankedProducts
WHERE
purchase_rank <= 10;
Breakdown
1. UserPurchaseCounts Subquery:
This CTE groups the purchases by user_id and product_id and counts the number of purchases made for each combination.
2. RankedProducts Subquery:
This CTE ranks the products within each user's purchase history using the RANK() function. The PARTITION BY clause ensures that the ranking is done separately for each user. Because RANK() gives tied products the same rank, a user can end up with more than 10 rows; ROW_NUMBER() returns at most 10.
3. Final Query:
The final query selects the user_id and product_id for all products that are ranked within the top 10 for each user.
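A top-N-per-user query of this shape can be tried with Python's built-in sqlite3 module. The sketch below is a variation, not the query above: it uses ROW_NUMBER() with product_id as a tie-breaker so that each user gets at most limit rows, and it keeps only the two columns the ranking needs. The sample purchases are invented.

```python
import sqlite3

def top_products_per_user(purchases, limit):
    """Return (user_id, product_id) for each user's most purchased products."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (user_id INT, product_id INT)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", purchases)
    query = """
        WITH UserPurchaseCounts AS (
            SELECT user_id, product_id, COUNT(*) AS purchase_count
            FROM purchases
            GROUP BY user_id, product_id
        ),
        RankedProducts AS (
            SELECT user_id, product_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY user_id
                       ORDER BY purchase_count DESC, product_id
                   ) AS row_num
            FROM UserPurchaseCounts
        )
        SELECT user_id, product_id
        FROM RankedProducts
        WHERE row_num <= ?
        ORDER BY user_id, row_num
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result

# Hypothetical purchases: user 1 buys product 10 three times, 20 twice, 30 once.
sample_purchases = [(1, 10), (1, 10), (1, 10), (1, 20), (1, 20), (1, 30), (2, 40), (2, 10)]
print(top_products_per_user(sample_purchases, 2))
```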
Real-World Application
This query can be used to create personalized user purchase experiences, such as:
Product Recommendations: Showing users products they are likely to purchase based on their previous purchases.
Loyalty Programs: Rewarding users for purchasing specific products or combinations of products.
Fraud Detection: Identifying unusual purchase patterns that may indicate fraudulent activity.
The Category of Each Member in the Store
Problem: Given a table that stores the categories of members in a store, find the category of each member.
Table:
| member_id | category |
|---|---|
| 1 | Gold |
| 2 | Silver |
| 3 | Bronze |
Query:
SELECT
member_id,
category
FROM members;
Output:
| member_id | category |
|---|---|
| 1 | Gold |
| 2 | Silver |
| 3 | Bronze |
Explanation:
The query simply selects the member_id and category columns from the members table. This gives us a list of all the members and their respective categories.
Real-World Applications:
This query can be used in a variety of real-world applications, such as:
Customer segmentation: Businesses can use this query to segment their customers into different categories based on their membership status. This information can then be used to tailor marketing campaigns and other promotions.
Loyalty programs: Businesses can use this query to track the progress of their members in loyalty programs. This information can then be used to reward members for their loyalty.
Fraud detection: Businesses can use this query to identify members who are attempting to commit fraud. For example, a business could flag members who are trying to make purchases with stolen credit cards.
Binary Tree Nodes
SQL Solution
WITH RECURSIVE Tree AS (
SELECT id, parent_id, 1 AS depth
FROM table_name
WHERE parent_id IS NULL -- Start with the root node
UNION ALL
SELECT t.id, t.parent_id, tr.depth + 1
FROM table_name t
JOIN Tree tr ON t.parent_id = tr.id
)
SELECT id, depth FROM Tree;
Explanation
Recursive Common Table Expression (CTE): The WITH RECURSIVE Tree AS ( ... ) clause defines a recursive CTE named Tree. It serves as the base for generating a hierarchical representation of the tree structure.
Initialization: The base case of the recursion selects the root node (where parent_id is NULL) and sets its depth to 1.
Recursive Step: The recursive part selects child nodes and sets their depth to their parent's depth plus 1. This step populates the CTE with all the nodes and their respective depths.
Projection: The final SELECT statement projects the id and depth columns from the Tree CTE. This gives us the desired result: a list of nodes with their corresponding depths.
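The recursion can be watched in action with Python's built-in sqlite3 module, which supports WITH RECURSIVE. In this sketch the table is called nodes (a placeholder for table_name) and the five-node tree is made up.

```python
import sqlite3

def node_depths(rows):
    """Return (id, depth) for every node reachable from the root."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INT PRIMARY KEY, parent_id INT)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", rows)
    query = """
        WITH RECURSIVE Tree AS (
            -- base case: the root has no parent
            SELECT id, parent_id, 1 AS depth
            FROM nodes
            WHERE parent_id IS NULL
            UNION ALL
            -- recursive step: a child is one level below its parent
            SELECT n.id, n.parent_id, t.depth + 1
            FROM nodes AS n
            JOIN Tree AS t ON n.parent_id = t.id
        )
        SELECT id, depth FROM Tree ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical tree: 1 is the root, 2 and 3 are its children, 4 is under 2, 5 under 4.
print(node_depths([(1, None), (2, 1), (3, 1), (4, 2), (5, 4)]))
```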
Real-World Applications
Genealogical Trees: Represent family relationships and track lineage.
Organizational Charts: Model hierarchical structures within companies or organizations.
File Systems: Organize files and folders into a nested hierarchy.
Graph Algorithms: Perform depth-first or breadth-first search operations on tree structures.
Dynamic Unpivoting of a Table
Problem:
You have a table containing multiple columns, and you want to "unpivot" it to create a new table with two columns: one for the original column names and one for the corresponding values.
Example:
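Original Table:
| Name | Age | City |
|---|---|---|
| John | 25 | New York |
| Mary | 30 | London |
| Bob | 28 | Paris |
Unpivoted Table:
| Column_Name | Value |
|---|---|
| Name | John |
| Age | 25 |
| City | New York |
| Name | Mary |
| Age | 30 |
| City | London |
| Name | Bob |
| Age | 28 |
| City | Paris |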
Solution:
The dynamic unpivoting technique combines the database's column metadata with dynamic SQL to build the unpivoted table. The statements below use SQL Server (T-SQL) syntax. Here's a step-by-step breakdown:
Create a temporary table to store the column names:
SELECT COLUMN_NAME
INTO #ColumnNames
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'OriginalTable';
Generate the unpivoted SQL statement dynamically:
DECLARE @UnpivotSQL nvarchar(max) = N'';
-- One SELECT per column; values are cast to text so the SELECTs can be combined with UNION ALL
SELECT @UnpivotSQL += N'SELECT ''' + COLUMN_NAME + N''' AS Column_Name, CAST(' + QUOTENAME(COLUMN_NAME) + N' AS nvarchar(max)) AS Value FROM OriginalTable UNION ALL '
FROM #ColumnNames;
-- Remove the trailing ' UNION ALL' (10 characters; LEN already ignores the final space)
SET @UnpivotSQL = LEFT(@UnpivotSQL, LEN(@UnpivotSQL) - 10);
Execute the dynamic SQL statement:
EXEC (@UnpivotSQL);
Explanation:
The INFORMATION_SCHEMA.COLUMNS view provides metadata about the columns in the original table.
The first SQL statement creates a temporary table of column names (ColumnNames) for the original table.
The second SQL statement dynamically generates the unpivot SQL statement from those column names, one SELECT per column.
The EXEC statement executes the dynamically generated SQL statement, producing the unpivoted rows.
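The same idea, reading the column list from metadata and generating one SELECT per column, can be sketched in Python with the built-in sqlite3 module. This is an illustration in a different database, not a translation of the T-SQL above: SQLite has no INFORMATION_SCHEMA, so the sketch uses PRAGMA table_info instead, and unpivot is a hypothetical helper name.

```python
import sqlite3

def unpivot(conn, table):
    """Return (column_name, value) rows for every cell of `table`.

    `table` is interpolated into the SQL text, so it must be a trusted name.
    """
    # PRAGMA table_info is SQLite's counterpart to INFORMATION_SCHEMA.COLUMNS;
    # the column name is the second field of each row it returns.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    selects = [
        f"SELECT '{col}' AS Column_Name, CAST({col} AS TEXT) AS Value FROM {table}"
        for col in columns
    ]
    # Joining with UNION ALL avoids having to trim a trailing separator.
    return conn.execute(" UNION ALL ".join(selects)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OriginalTable (Name TEXT, Age INT, City TEXT)")
conn.executemany(
    "INSERT INTO OriginalTable VALUES (?, ?, ?)",
    [("John", 25, "New York"), ("Mary", 30, "London"), ("Bob", 28, "Paris")],
)
unpivoted = unpivot(conn, "OriginalTable")
print(unpivoted)
```

Casting every value to text is what lets columns of different types share the single Value column.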
Potential Applications:
Reporting: Unpivoting can help create reports that summarize data across multiple columns in a table.
Data Analysis: Unpivoted data can be used to perform more complex data analysis tasks, such as identifying trends and patterns.
Data Integration: Unpivoting can be useful when integrating data from different sources with different schemas.
Pizza Toppings Cost Analysis
Problem Statement
You have a pizza shop and offer various toppings to your customers. Each topping has a different cost per serving. Given a list of orders, you need to calculate the total cost of all the toppings for each order.
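Input Table:
| order_id | topping_id | quantity |
|---|---|---|
| 1 | 1 | 2 |
| 1 | 2 | 1 |
| 2 | 1 | 1 |
| 2 | 3 | 2 |
Toppings Table:
| topping_id | name | cost_per_serving |
|---|---|---|
| 1 | Pepperoni | 0.50 |
| 2 | Mushrooms | 0.25 |
| 3 | Onions | 0.15 |
Output Table:
| order_id | total_topping_cost |
|---|---|
| 1 | 1.25 |
| 2 | 0.80 |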
Solution in SQL
SELECT o.order_id, SUM(t.cost_per_serving * o.quantity) AS total_topping_cost
FROM orders o
JOIN toppings t ON o.topping_id = t.topping_id
GROUP BY o.order_id;
Explanation
INNER JOIN: The query starts by joining the orders table (aliased as o) with the toppings table (aliased as t) on the topping_id column using an INNER JOIN. This operation ensures that only orders with valid topping IDs are considered.
SUM Aggregation: After joining the tables, the query uses SUM() to calculate the total topping cost for each order. It multiplies the cost_per_serving for each topping by the quantity ordered and then sums these values for each order.
GROUP BY: The GROUP BY clause groups the results by the order_id column. This step ensures that the total topping cost is calculated separately for each order.
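The totals can be checked with Python's built-in sqlite3 module. This sketch loads the sample rows from the tables above, assuming the order columns are order_id, topping_id and quantity; the name of the topping-name column is a guess and is not used by the query.

```python
import sqlite3

def topping_costs(orders, toppings):
    """Return (order_id, total topping cost) per order, rounded to cents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INT, topping_id INT, quantity INT)")
    conn.execute("CREATE TABLE toppings (topping_id INT, name TEXT, cost_per_serving REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    conn.executemany("INSERT INTO toppings VALUES (?, ?, ?)", toppings)
    query = """
        SELECT o.order_id, SUM(t.cost_per_serving * o.quantity)
        FROM orders AS o
        JOIN toppings AS t ON o.topping_id = t.topping_id
        GROUP BY o.order_id
        ORDER BY o.order_id
    """
    result = [(order_id, round(total, 2)) for order_id, total in conn.execute(query)]
    conn.close()
    return result

sample_orders = [(1, 1, 2), (1, 2, 1), (2, 1, 1), (2, 3, 2)]
sample_toppings = [(1, "Pepperoni", 0.50), (2, "Mushrooms", 0.25), (3, "Onions", 0.15)]
print(topping_costs(sample_orders, sample_toppings))
```

Order 1 costs 2 x 0.50 + 1 x 0.25 = 1.25 and order 2 costs 1 x 0.50 + 2 x 0.15 = 0.80.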
Real-World Example
This query can be used in a real-world pizza shop to calculate the total cost of toppings for each customer's order. This information can be used for inventory management, cost analysis, and billing purposes.
Winning Candidate
SQL Code:
WITH CandidateScores AS (
SELECT candidate, SUM(score) AS total_score
FROM Votes
GROUP BY candidate
),
WinningCandidate AS (
SELECT candidate
FROM CandidateScores
WHERE total_score = (SELECT MAX(total_score) FROM CandidateScores)
)
SELECT *
FROM WinningCandidate;
Breakdown and Explanation:
Common Table Expression (CTE):
CandidateScores: This CTE calculates the total score for each candidate by summing the scores from the Votes table.
Subquery:
SELECT MAX(total_score) FROM CandidateScores: This subquery finds the maximum total score across all candidates.
WinningCandidate CTE:
This CTE selects the candidate with the maximum total score, effectively identifying the winning candidate. If several candidates tie for the maximum, all of them are returned.
SELECT Statement:
Finally, the SELECT * statement retrieves all columns from the WinningCandidate CTE, which contains the winning candidate.
Example:
Suppose we have a Votes table with the following data:
| candidate | score |
|---|---|
| John | 40 |
| Mary | 50 |
| Bob | 30 |
The Winning Candidate query would produce the following result:
| candidate |
|---|
| Mary |
Applications in Real World:
This query can be used in various scenarios:
Election Results: Determine the winner of an election based on the number of votes received.
Customer Feedback: Identify the best-rated product or service based on customer reviews.
School Performance: Find the top-performing student in a class based on their exam scores.
Primary Department for Each Employee
Problem Statement:
Given a table named Employees with the following columns:
id (int)
name (string)
department (string)
You need to find the primary department for each employee. The primary department is the department with the highest number of employees.
SQL Solution:
SELECT
e.name,
d.department AS department
FROM Employees AS e
JOIN (
SELECT
department,
COUNT(*) AS employee_count
FROM Employees
GROUP BY
department
ORDER BY
employee_count DESC
LIMIT 1
) AS d
ON e.department = d.department;
Breakdown:
Join the Employees Table: We start by joining the Employees table with a subquery that finds the department with the highest number of employees.
Subquery: The subquery calculates the employee count for each department and sorts the results in descending order.
Limit 1: We only want the department with the highest count, so we limit the results to 1 row. If two departments tie for the highest count, which one is kept is not defined.
Join on Department: We then join the Employees table with the subquery result on the department column, which keeps only the employees who belong to that department.
Example:
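Consider the following Employees table:
| id | name | department |
|---|---|---|
| 1 | John | Sales |
| 2 | Mary | Engineering |
| 3 | Bob | Sales |
| 4 | Jane | Engineering |
| 5 | Tom | Marketing |
Result:
Sales and Engineering are tied with two employees each, so LIMIT 1 keeps whichever of them the database returns first. If it keeps Sales, the result is:
| name | department |
|---|---|
| John | Sales |
| Bob | Sales |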
Real-World Application:
This query can be useful in HR systems to identify the primary department of employees for various purposes, such as:
Staffing decisions
Resource allocation
Performance evaluations
Countries You Can Safely Invest In
Problem Statement
Given a table of countries and their risk ratings, find the countries that are safe to invest in.
Table Schema
CREATE TABLE countries (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
risk_rating INT NOT NULL,
PRIMARY KEY (id)
);
Solution
SELECT name
FROM countries
WHERE risk_rating <= 5;
Explanation
The SELECT statement retrieves the name column from the countries table. The WHERE clause filters the results to only include countries with a risk_rating of 5 or less.
Example
SELECT name
FROM countries
WHERE risk_rating <= 5;
| name |
|---|
| Canada |
| Switzerland |
| Sweden |
Real World Applications
This query can be used by investors to identify countries that are considered safe for making investments. By investing in countries with low risk ratings, investors can reduce the risk of losing their money.
Game Play Analysis II
Problem Statement:
Given two tables:
Game_Play ( player_id INT, game_id INT, start_time TIMESTAMP, end_time TIMESTAMP )
Player ( player_id INT, name VARCHAR(255) )
Find the top 10 players with the longest total playtime.
Best & Performant Solution:
WITH PlayerTotalTime AS (
SELECT
player_id,
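-- Total playtime in seconds. TIMESTAMPDIFF is MySQL syntax; in PostgreSQL, SUM(end_time - start_time) yields an interval instead.
SUM(TIMESTAMPDIFF(SECOND, start_time, end_time)) AS total_time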
FROM Game_Play
GROUP BY player_id
)
SELECT
p.name,
ptt.total_time
FROM Player p
JOIN PlayerTotalTime ptt ON p.player_id = ptt.player_id
ORDER BY ptt.total_time DESC
LIMIT 10;
Explanation:
The solution uses a Common Table Expression (CTE) named PlayerTotalTime to calculate the total playtime for each player. It then joins the Player table with the PlayerTotalTime CTE to get the player names and total playtime. Finally, it orders the results by total playtime in descending order and limits the results to the top 10 players.
Simplified Explanation:
Create a temporary result called PlayerTotalTime that calculates the total playtime for each player.
Join the Player table with PlayerTotalTime to get the player names and total playtime.
Sort the results by the total playtime in descending order.
Limit the results to the top 10 players.
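The four steps can be sketched with Python's built-in sqlite3 module. SQLite stores timestamps as text, so this sketch converts them to Unix seconds with strftime('%s', ...) before subtracting; that conversion is specific to SQLite and is an assumption of the example, as are the player names and sessions.

```python
import sqlite3

def top_players(players, sessions, limit):
    """Return (name, total playtime in seconds) for the longest-playing players."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Player (player_id INT, name TEXT)")
    conn.execute(
        "CREATE TABLE Game_Play (player_id INT, game_id INT, start_time TEXT, end_time TEXT)"
    )
    conn.executemany("INSERT INTO Player VALUES (?, ?)", players)
    conn.executemany("INSERT INTO Game_Play VALUES (?, ?, ?, ?)", sessions)
    query = """
        WITH PlayerTotalTime AS (
            SELECT player_id,
                   SUM(CAST(strftime('%s', end_time) AS INTEGER)
                     - CAST(strftime('%s', start_time) AS INTEGER)) AS total_seconds
            FROM Game_Play
            GROUP BY player_id
        )
        SELECT p.name, ptt.total_seconds
        FROM Player AS p
        JOIN PlayerTotalTime AS ptt ON p.player_id = ptt.player_id
        ORDER BY ptt.total_seconds DESC
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result

# Hypothetical data: Ann plays 60 + 30 minutes, Ben 120 minutes, Cy 10 minutes.
sample_players = [(1, "Ann"), (2, "Ben"), (3, "Cy")]
sample_sessions = [
    (1, 1, "2024-01-01 10:00:00", "2024-01-01 11:00:00"),
    (1, 2, "2024-01-02 10:00:00", "2024-01-02 10:30:00"),
    (2, 1, "2024-01-01 09:00:00", "2024-01-01 11:00:00"),
    (3, 1, "2024-01-01 09:00:00", "2024-01-01 09:10:00"),
]
print(top_players(sample_players, sample_sessions, 2))
```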
Real World Implementation and Examples:
This problem can be used to analyze gameplay data in real-world applications. For example, a game developer could use this query to identify the players who are most engaged with their game. This information could then be used to tailor marketing campaigns or improve game design.
Potential Applications:
Identifying the most engaged players in a game.
Analyzing player behavior patterns.
Improving game design by understanding how players interact with the game.
Employee Bonus
Problem:
Given a table of employee bonuses, find the total bonus received by each employee.
Table:
bonuses (employee_id, bonus_amount)
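Example:
| employee_id | bonus_amount |
|---|---|
| 1 | 100 |
| 2 | 200 |
| 1 | 300 |
| 3 | 400 |
Output:
| employee_id | total_bonus |
|---|---|
| 1 | 400 |
| 2 | 200 |
| 3 | 400 |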
Solution:
SELECT employee_id, SUM(bonus_amount) AS total_bonus
FROM bonuses
GROUP BY employee_id;
Explanation:
The SUM(bonus_amount) function calculates the total bonus received by each employee.
The GROUP BY employee_id clause groups the results by employee ID, so that the total bonus is calculated for each employee.
Real-World Application:
This query can be used to generate a report of the total bonuses received by employees in a company. This report can be used for performance evaluations, compensation planning, and other HR-related tasks.
Second Degree Follower
Problem Statement:
Given a table of social media followers, find the second-degree followers of a given user, that is, the users who follow one of that user's followers.
Table Schema:
CREATE TABLE followers (
id INT PRIMARY KEY,
follower INT,
followed INT,
FOREIGN KEY (follower) REFERENCES users (id),
FOREIGN KEY (followed) REFERENCES users (id)
);
Example Data:
INSERT INTO followers (id, follower, followed) VALUES
(1, 1, 2),
(2, 1, 3),
(3, 2, 4),
(4, 3, 5),
(5, 4, 6);
Example Query (for user 4):
SELECT DISTINCT u.name
FROM followers f1
JOIN followers f2 ON f2.followed = f1.follower
JOIN users u ON u.id = f2.follower
WHERE f1.followed = 4;
Output:
John
(This assumes user 1 is named John; the rows of the users table are not shown.)
Breakdown and Explanation:
1. Find the followers of the given user (first-degree followers):
SELECT follower FROM followers WHERE followed = 4;
This query returns the IDs of the first-degree followers of user 4. In the example data that is user 2.
2. Find the followers of the first-degree followers (second-degree followers):
SELECT follower FROM followers WHERE followed IN (SELECT follower FROM followers WHERE followed = 4);
This query returns the IDs of the second-degree followers of user 4. In the example data that is user 1, who follows user 2.
3. Join the results to get the names of the second-degree followers:
SELECT u.name FROM users u JOIN followers f ON u.id = f.follower WHERE f.followed IN (SELECT follower FROM followers WHERE followed = 4);
This query joins the users table with the followers table on the follower column to get the names of the second-degree followers.
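The two-hop lookup can be tried with Python's built-in sqlite3 module. This sketch loads the follower rows from the example data; the users table is not shown in the text, so every name other than John is hypothetical, and the foreign keys are left out for brevity.

```python
import sqlite3

def second_degree_followers(users, follows, user_id):
    """Return the names of users who follow a follower of `user_id`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE followers (id INT PRIMARY KEY, follower INT, followed INT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.executemany("INSERT INTO followers VALUES (?, ?, ?)", follows)
    query = """
        SELECT DISTINCT u.name
        FROM followers AS f1                              -- f1.follower follows the user
        JOIN followers AS f2 ON f2.followed = f1.follower -- f2.follower follows that follower
        JOIN users AS u ON u.id = f2.follower
        WHERE f1.followed = ?
        ORDER BY u.name
    """
    result = [name for (name,) in conn.execute(query, (user_id,))]
    conn.close()
    return result

# Names are assumptions; only the follower rows come from the example data.
sample_users = [(1, "John"), (2, "Mary"), (3, "Bob"), (4, "Alice"), (5, "Tom"), (6, "Jane")]
sample_follows = [(1, 1, 2), (2, 1, 3), (3, 2, 4), (4, 3, 5), (5, 4, 6)]
print(second_degree_followers(sample_users, sample_follows, 4))
```

User 4 is followed by user 2, who is followed by user 1, so the lookup returns John. For user 2 the list is empty because nobody follows user 1.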
Potential Applications:
Social Media Platform: Identify the second-degree connections of users to provide recommendations or targeted advertising.
Customer Relationship Management (CRM): Track the connections between customers to understand their relationships and interactions.
Fraud Detection: Identify potential fraud by analyzing the connections between users involved in suspicious activities.
Find Cutoff Score for Each School
Problem:
Find the cutoff score for each school, where the cutoff score is taken to be the lowest score among the students admitted to that school.
Table:
Students (id, name, score, school_id)
Schools (id, name)
SQL Query:
WITH SchoolCutoff AS (
SELECT
S.school_id,
MIN(S.score) AS cutoff_score
FROM Students AS S
GROUP BY
S.school_id
)
SELECT
S.name,
C.cutoff_score
FROM Schools AS S
JOIN SchoolCutoff AS C
ON S.id = C.school_id;
Breakdown:
Step 1: Calculate Cutoff Scores for Each School
The CTE SchoolCutoff calculates the minimum score (cutoff score) for each school. It groups the students by their school ID and finds the minimum score for each group.
Step 2: Join Schools and Cutoff Scores
The main query joins the Schools table with the SchoolCutoff CTE using the school ID. This combines the school names with their respective cutoff scores.
Result:
The result is a table that lists each school's name and its corresponding cutoff score.
Example:
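Students Table:
| id | name | score | school_id |
|---|---|---|---|
| 1 | John | 75 | 1 |
| 2 | Mary | 80 | 1 |
| 3 | Bob | 65 | 2 |
| 4 | Alice | 70 | 2 |
Schools Table:
| id | name |
|---|---|
| 1 | Harvard |
| 2 | Stanford |
Result:
| name | cutoff_score |
|---|---|
| Harvard | 75 |
| Stanford | 65 |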
Real-World Application:
This query can be used by universities to determine the cutoff scores for admission to different programs or schools. It helps ensure that only students who meet or exceed the minimum requirements are admitted.
Department Highest Salary
Problem Statement: Given a table containing employee data, including their department and salary, find the highest salary for each department.
SQL Query:
SELECT department, MAX(salary) AS highest_salary
FROM employee
GROUP BY department;
Explanation:
The query can be broken down into the following steps:
Select the department and maximum salary: The SELECT statement selects two columns:
department: The department of the employee.
MAX(salary): The maximum salary for each department. This is calculated using the MAX() aggregate function.
Group by department: The GROUP BY clause groups the results by department. This means that for each department, the maximum salary will be calculated based on all employees in that department.
Example:
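Consider the following employee table:
| id | department | salary |
|---|---|---|
| 1 | Sales | 50000 |
| 2 | Marketing | 60000 |
| 3 | Sales | 45000 |
| 4 | Engineering | 70000 |
The query will return the following result:
| department | highest_salary |
|---|---|
| Sales | 50000 |
| Marketing | 60000 |
| Engineering | 70000 |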
Real-World Applications:
This query can be used in various real-world applications, such as:
Human Resource Management: To identify the highest-paid employee in each department for salary negotiations.
Budget Planning: To estimate the total budget required for salaries in each department based on the highest salary.
Performance Analysis: To compare the performance of different departments based on the average or highest salary of their employees.
Immediate Food Delivery I
LeetCode Problem: Immediate Food Delivery I
Problem Statement:
A delivery company provides immediate food delivery services. You are given a table named orders that contains the following columns:
| Column | Type | Description |
|---|---|---|
| order_id | INT | Unique ID of the order |
| restaurant_id | INT | ID of the restaurant |
| rider_id | INT | ID of the rider assigned to deliver the order |
| delivery_time | VARCHAR(20) | Time taken for delivery in "HH:MM:SS" format |
You need to write a SQL query that calculates the average delivery time for each rider.
Solution:
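-- Calculate the average delivery time for each rider.
-- delivery_time is stored as text, so convert it to seconds before averaging
-- (TIME_TO_SEC and SEC_TO_TIME are MySQL functions).
SELECT rider_id,
SEC_TO_TIME(AVG(TIME_TO_SEC(delivery_time))) AS avg_delivery_time
FROM orders
GROUP BY rider_id;
Explanation:
SELECT rider_id, SEC_TO_TIME(AVG(TIME_TO_SEC(delivery_time))): This line selects the rider_id and calculates the average delivery_time for each rider. Averaging the "HH:MM:SS" strings directly does not work, because only their leading digits would be read as a number, so each value is converted to seconds, averaged, and converted back to a time.
GROUP BY rider_id: This line groups the results by rider_id, so that the average delivery time is calculated for each rider separately.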
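Averaging a duration stored as "HH:MM:SS" text can be tried with Python's built-in sqlite3 module. SQLite has no TIME_TO_SEC, so this sketch splits the string with substr() and formats the result in Python; the helper name and the sample orders are invented, and averages are rounded to whole seconds.

```python
import sqlite3

def average_delivery_times(orders):
    """Return (rider_id, average delivery time as HH:MM:SS) per rider."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INT, restaurant_id INT, rider_id INT, delivery_time TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", orders)
    query = """
        SELECT rider_id,
               AVG(CAST(substr(delivery_time, 1, 2) AS INTEGER) * 3600
                 + CAST(substr(delivery_time, 4, 2) AS INTEGER) * 60
                 + CAST(substr(delivery_time, 7, 2) AS INTEGER)) AS avg_seconds
        FROM orders
        GROUP BY rider_id
        ORDER BY rider_id
    """
    result = []
    for rider_id, avg_seconds in conn.execute(query):
        minutes, seconds = divmod(round(avg_seconds), 60)
        hours, minutes = divmod(minutes, 60)
        result.append((rider_id, f"{hours:02d}:{minutes:02d}:{seconds:02d}"))
    conn.close()
    return result

# Hypothetical orders: (order_id, restaurant_id, rider_id, delivery_time)
sample_deliveries = [
    (1, 7, 1, "00:10:00"),
    (2, 7, 1, "00:20:00"),
    (3, 8, 2, "01:00:30"),
]
print(average_delivery_times(sample_deliveries))
```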
Example:
| rider_id | avg_delivery_time |
|---|---|
| 1 | 00:15:00 |
| 2 | 00:20:00 |
| 3 | 00:25:00 |
Real World Applications:
This query can be used by the delivery company to:
Identify riders who have consistently fast delivery times.
Monitor rider performance and provide training or support to improve delivery times.
Optimize delivery routes and allocate riders to minimize delivery times.
Active Users
Problem: Find the users who are currently active on a website, where a user counts as active if one of their sessions started within the last 24 hours.
Input: A table named user_sessions with the following columns:
user_id: The unique ID of the user.
start_time: The time when the user's session started.
end_time: The time when the user's session ended.
Output: A table named active_users with the following columns:
user_id: The unique ID of the active user.
Solution:
SELECT DISTINCT user_id
FROM user_sessions
WHERE start_time >= NOW() - INTERVAL 1 DAY;
Breakdown:
The
SELECT DISTINCT user_id
clause selects the distinct user IDs of active users.The
FROM user_sessions
clause specifies the input table.The
WHERE start_time >= NOW() - INTERVAL 1 DAY
clause filters the rows to only include user sessions that started within the last 24 hours.
Performance:
A few points affect how well this query performs:
Indexing: An index on the start_time column of the user_sessions table lets the database find recent sessions without scanning the whole table.
Filtering: The WHERE clause discards older sessions early, which reduces the number of rows that need to be processed.
DISTINCT: The DISTINCT keyword removes duplicate user IDs from the result. Deduplication adds some work of its own, but it keeps the result small when users have many sessions.
Real-World Application:
This query can be used to identify active users on a website for various purposes, such as:
Website Analytics: Tracking the number of active users can provide insights into website traffic and engagement.
Targeted Marketing: Active users can be targeted with personalized marketing campaigns.
Customer Support: Identifying active users can help customer support teams prioritize support requests.
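If the goal is the number of active users rather than the list of IDs, the same filter can feed a count. This is a sketch assuming MySQL; the index name idx_user_sessions_start_time is a hypothetical choice, not something defined by the problem.

```sql
-- Hypothetical index name; it lets the range filter on start_time avoid a full table scan.
CREATE INDEX idx_user_sessions_start_time ON user_sessions (start_time);

-- Count each user once, however many sessions they started in the last 24 hours.
SELECT COUNT(DISTINCT user_id) AS active_user_count
FROM user_sessions
WHERE start_time >= NOW() - INTERVAL 1 DAY;
```

COUNT(DISTINCT user_id) returns a single row, which is usually what a dashboard or report needs.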
Leetflex Banned Accounts
Problem Statement:
A social media platform, Leetflex, bans accounts for violating their community guidelines. They store the banned accounts in a table called BannedAccounts. Given a list of account IDs, determine which accounts are banned.
SQL Implementation:
SELECT account_id
FROM BannedAccounts
WHERE account_id IN (SELECT account_id FROM InputAccounts);
Simplified Explanation:
Input Table: The InputAccounts table contains the list of account IDs we want to check.
BannedAccounts Table: The BannedAccounts table contains the account IDs of all banned accounts.
IN Operator: The IN operator checks whether an account ID from the BannedAccounts table is also present in the InputAccounts table.
SELECT Statement: The query retrieves the account IDs from the BannedAccounts table that match the ones in the InputAccounts table.
Real-World Application:
This query can be used by Leetflex to identify banned accounts from a list of user-submitted account IDs. This is essential for enforcing their community guidelines and maintaining the integrity of the platform. For example, if someone reports an account for harassment, Leetflex can use this query to determine if the account has already been banned.
Example:
InputAccounts Table:
| account_id |
|---|
| 1 |
| 2 |
| 3 |
BannedAccounts Table:
| account_id |
|---|
| 2 |
| 4 |
Query Result:
| account_id |
|---|
| 2 |
Explanation: Account ID 2 is present in both the InputAccounts and BannedAccounts tables, indicating that it is a banned account.
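The same lookup can also be written with EXISTS, which some readers find easier to extend with extra conditions. This sketch assumes, as the example does, that both tables expose an account_id column; the aliases b and i are illustrative.

```sql
-- Return every banned account that also appears in the input list.
SELECT b.account_id
FROM BannedAccounts AS b
WHERE EXISTS (
    SELECT 1
    FROM InputAccounts AS i
    WHERE i.account_id = b.account_id
);
```

On the example data this returns account 2, the only ID present in both tables.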
Find Customers With Positive Revenue this Year
SQL Query
SELECT customer_id
FROM customers
WHERE year(revenue_date) = year(now()) AND revenue > 0;
Explanation
This query finds customers who have generated positive revenue in the current year. It uses the following steps:
Extract the year from the revenue_date column: The year() function extracts the year component from a date value. This is used to filter out revenue data from previous years.
Compare the extracted year with the current year: The year(now()) expression returns the current year. The query checks if the year extracted from the revenue_date column matches the current year.
Filter for positive revenue: The revenue > 0 condition ensures that only customers with positive revenue are included in the results.
Real-World Examples
This query can be used in various real-world applications, such as:
Sales analysis: Identifying customers who are generating revenue can help businesses understand their customer base and target sales efforts accordingly.
Customer segmentation: Segmenting customers based on their revenue can help businesses personalize marketing campaigns and provide tailored promotions.
Customer retention: Focusing on customers with positive revenue can help businesses identify valuable customers and develop strategies to retain them.
Potential Applications
Customer relationship management (CRM) systems
Financial reporting platforms
Sales analytics dashboards
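Wrapping revenue_date in year() usually prevents the database from using an index on that column. A possible variant, sketched here for MySQL, expresses the same filter as a date range and adds DISTINCT in case a customer has several revenue rows in the year (an assumption about the data, since the table layout is not given).

```sql
-- Same filter as a half-open date range: from January 1 of this year
-- up to, but not including, January 1 of next year.
SELECT DISTINCT customer_id
FROM customers
WHERE revenue_date >= MAKEDATE(YEAR(CURDATE()), 1)
  AND revenue_date <  MAKEDATE(YEAR(CURDATE()) + 1, 1)
  AND revenue > 0;
```

MAKEDATE(year, 1) builds the first day of the given year, so the comparison is made on the raw column value.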
Find the Missing IDs
Problem Statement
Given a table employees with the following columns:
employee_id (primary key)
employee_name
manager_id (foreign key referencing the employee_id column of the same table)
Find the employee IDs of employees who don't have managers.
SQL Query
SELECT employee_id
FROM employees
WHERE manager_id IS NULL;
Explanation
The SELECT clause retrieves the employee_id column.
The FROM clause selects from the employees table.
The WHERE clause specifies that only employees with a NULL value in the manager_id column are selected.
Real-World Application
This query can be used to find employees who report directly to the CEO or other top-level executives. This information can be useful for organizational planning and communication.
Find Total Time Spent by Each Employee
Problem Statement:
Given a table EmployeeHours that stores employee hours worked, find the total time spent by each employee.
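Example:
Table:
| employee_id | date | hours_worked |
|---|---|---|
| 1 | 2022-01-01 | 8 |
| 1 | 2022-01-02 | 6 |
| 2 | 2022-01-01 | 10 |
Output:
| employee_id | total_hours |
|---|---|
| 1 | 14 |
| 2 | 10 |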
Solution:
SELECT
employee_id,
SUM(hours_worked) AS total_hours
FROM EmployeeHours
GROUP BY
employee_id;
Explanation:
SELECT employee_id, SUM(hours_worked) AS total_hours: This line calculates the total hours worked for each employee and assigns it to the alias total_hours.
FROM EmployeeHours: Specifies the table from which to fetch data.
GROUP BY employee_id: Groups the data by employee ID, ensuring that each employee's hours are summed correctly.
Real-World Applications:
Tracking employee productivity by calculating total hours worked.
Identifying employees with high or low work volume.
Planning employee schedules based on total hours available.
Project Employees I
Problem:
Find all employees who work on at least three different projects.
SQL Query:
SELECT EmployeeID
FROM Employees
GROUP BY EmployeeID
HAVING COUNT(DISTINCT ProjectID) >= 3;
Explanation:
SELECT EmployeeID: Select the employee IDs; grouping guarantees that each ID appears once.
FROM Employees: From the 'Employees' table.
GROUP BY EmployeeID: Group the rows by employee ID to count the number of projects each employee works on.
HAVING COUNT(DISTINCT ProjectID) >= 3: Keep only employees who work on three or more different projects. DISTINCT prevents a project that is listed twice for the same employee from being counted twice.
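Example:
| EmployeeID | Number of projects |
|---|---|
| 10 | 3 |
| 20 | 5 |
| 30 | 2 |
Result:
| EmployeeID |
|---|
| 10 |
| 20 |
Employee 30 is excluded because two projects is below the threshold of three.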
Breakdown:
Group BY: Groups rows in a table based on one or more columns. In this case, we group by 'EmployeeID' to count the number of projects for each employee.
Having: Filters the results of a group operation based on a condition. In this case, we filter to include only groups (employees) with three or more projects.
Real-World Application:
This query can be used to identify employees who are involved in multiple projects, which can be helpful in:
Project management: Tracking employee workload and project involvement.
Resource allocation: Identifying employees with diverse skills who can contribute to multiple projects.
Performance evaluation: Assessing employees based on their contributions to different projects.
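Because the problem asks for different projects, duplicate assignment rows matter. The sketch below uses made-up rows in a two-column Employees table (an assumed layout) to show how COUNT(DISTINCT ProjectID) handles a project that is listed twice for the same employee.

```sql
-- Hypothetical sample data: employee 10 is listed twice on project 1.
CREATE TABLE Employees (
    EmployeeID INT,
    ProjectID  INT
);

INSERT INTO Employees (EmployeeID, ProjectID) VALUES
    (10, 1), (10, 1), (10, 2),
    (20, 1), (20, 2), (20, 3);

-- Employee 10 has three rows but only two different projects,
-- so only employee 20 satisfies the condition.
SELECT EmployeeID
FROM Employees
GROUP BY EmployeeID
HAVING COUNT(DISTINCT ProjectID) >= 3;
```

With a plain COUNT(ProjectID), employee 10 would also be returned, which is not what "different projects" means.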
Consecutive Available Seats
Problem Statement:
Find the maximum number of consecutive available seats in a row of seats in a theater.
SQL Query:
WITH AvailableSeats AS (
    SELECT seat_number,
           seat_number - CAST(ROW_NUMBER() OVER (ORDER BY seat_number) AS SIGNED) AS run_id
    FROM seats
    WHERE is_available = 1
), Runs AS (
    SELECT run_id, COUNT(*) AS run_length
    FROM AvailableSeats
    GROUP BY run_id
)
SELECT MAX(run_length) AS max_consecutive_available_seats
FROM Runs;
Breakdown and Explanation:
AvailableSeats: This CTE (Common Table Expression) keeps only the available seats and numbers them in seat order. Within a run of consecutive seats, the seat number and the row number increase together, so their difference (run_id) stays constant. For example, if seats 5 and 6 are the first two available seats, both get run_id 4.
Runs: This CTE groups the available seats by run_id and counts the seats in each group, which gives the length of each run of consecutive available seats.
MAX(run_length): Finally, we take the largest run length, which is the maximum number of consecutive available seats. The query assumes seat numbers are unique integers and needs window function support (MySQL 8.0 or later).
Real-World Application:
This query can be used in a theater booking system to find the best seats for a group of people. By maximizing the number of consecutive available seats, you can ensure that the group can sit together.
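One common way to measure runs of consecutive values is the "gaps and islands" technique: subtracting a row number from the seat number yields the same value for every seat in an unbroken run. The following sketch assumes MySQL 8.0 or later and a seats table with seat_number and is_available columns; the rows are made up for illustration.

```sql
-- Hypothetical sample data: seats 1-2 and 4-6 are available.
CREATE TABLE seats (
    seat_number  INT PRIMARY KEY,
    is_available TINYINT
);

INSERT INTO seats (seat_number, is_available) VALUES
    (1, 1), (2, 1), (3, 0), (4, 1), (5, 1), (6, 1), (7, 0);

WITH AvailableSeats AS (
    -- run_id is constant within each run of consecutive available seats.
    SELECT seat_number,
           seat_number - CAST(ROW_NUMBER() OVER (ORDER BY seat_number) AS SIGNED) AS run_id
    FROM seats
    WHERE is_available = 1
)
SELECT MAX(run_length) AS max_consecutive_available_seats
FROM (
    SELECT run_id, COUNT(*) AS run_length
    FROM AvailableSeats
    GROUP BY run_id
) AS Runs;
```

Here seats 1 and 2 share run_id 0 and seats 4, 5, and 6 share run_id 1, so the longest run has three seats.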
Find Users With Valid E-Mails
Problem Statement:
Find all users in a database table who have valid email addresses. A valid email address is one that contains an "@" symbol and a period ".".
Solution:
SELECT *
FROM Users
WHERE email LIKE '%@%' AND email LIKE '%.%';
Breakdown and Explanation:
SELECT *: This selects all columns (fields) from the "Users" table.
FROM Users: This specifies which table to search in, in this case, the "Users" table.
WHERE: This is the filtering condition that specifies which rows to select.
email LIKE '%@%': This checks if the email column contains the "@" symbol anywhere within the string. The wildcard character "%" matches any number of characters.
AND email LIKE '%.%': This checks if the email column contains the "." symbol anywhere within the string.
Together, these conditions ensure that only rows with email addresses that contain both "@" and "." are selected.
Real-World Application:
This query can be useful in various scenarios:
Validating user input: Websites and applications often require users to provide email addresses, and this query can serve as a first, coarse check that an entered email has a plausible shape. It does not guarantee that the address is valid or deliverable.
Cleaning up data: Databases can contain outdated or invalid email addresses, and this query can help identify and remove such records.
Marketing campaigns: Email marketing campaigns rely on accurate email addresses to reach the intended recipients. This query can help ensure that the target list contains only valid emails.
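The two LIKE conditions also accept strings such as "@." or ".@", because they only test that both symbols occur somewhere. A slightly stricter sketch, still using only LIKE, requires at least one character before the "@", at least one between the "@" and a later ".", and at least one after that ".". This is an illustrative tightening, not a full email validator.

```sql
-- "_" matches exactly one character, "%" matches any number of characters.
SELECT *
FROM Users
WHERE email LIKE '%_@_%._%'
  AND email NOT LIKE '% %';   -- reject addresses containing spaces
```

For example, 'a@b.c' matches this pattern, while 'a@.c' and '@b.c' do not.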
Trips and Users
Problem:
Given two tables: Trips and Users.
Trips:
| id | user_id | start_time | end_time |
|---|---|---|---|
| 1 | 10 | 2022-01-01 10:00:00 | 2022-01-01 12:00:00 |
| 2 | 15 | 2022-01-02 14:00:00 | 2022-01-02 16:00:00 |
| 3 | 20 | 2022-01-03 09:00:00 | 2022-01-03 11:00:00 |
Users:
| user_id | name |
|---|---|
| 10 | John |
| 15 | Mary |
| 20 | Bob |
Find the total duration of trips for each user.
SQL Solution:
SELECT u.name, SUM(TIMESTAMPDIFF(SECOND, t.start_time, t.end_time)) AS total_duration
FROM Trips t
JOIN Users u ON t.user_id = u.user_id
GROUP BY u.user_id, u.name;
Explanation:
Join the Trips and Users tables: This is done using the JOIN clause, which matches rows from the two tables based on the common column user_id.
Calculate the duration of each trip: This is done using the TIMESTAMPDIFF() function, which calculates the time difference between two timestamps (in seconds).
Sum the durations for each user: The SUM() function is used to sum the durations of all trips for each user.
Group the results by user: The GROUP BY clause groups the results by user_id and name from the Users table. Grouping by user_id as well as name keeps two different users who share the same name from being merged into one row.
Output:
| name | total_duration |
|---|---|
| John | 7200 |
| Mary | 7200 |
| Bob | 7200 |
Additional Notes:
This query assumes that the timestamps in the Trips table are in the same time zone. If they are not, you may need to use the CONVERT_TZ() function (MySQL) to convert them to a common time zone.
The query can be optimized using an index on the user_id column in the Trips table.
This query can be used in a variety of real-world applications, such as:
Calculating the total duration of trips for employees in a travel expense reimbursement system.
Analyzing the usage of shared vehicles in a carpooling application.
Determining the most popular travel routes for a ride-sharing service.
Percentage of Users Attended a Contest
SELECT user_id,
COUNT(*) AS total_contests_attended
FROM contest_logs
WHERE contest_id IN (SELECT contest_id
FROM contests
WHERE start_date >= '2022-01-01'
AND end_date <= '2022-12-31')
GROUP BY user_id
ORDER BY total_contests_attended DESC;
This query counts how many contests held during 2022 each user attended and ranks the users by that count. It does not compute a percentage; that would also require the total number of users. In this query:
The contest_logs table contains a record for each time a user attends a contest. Each row includes the user_id and the contest_id.
The contests table contains information about each contest, including the contest_id, start_date, and end_date.
The subquery in the WHERE clause selects the contest_ids of all contests that took place between '2022-01-01' and '2022-12-31'.
The GROUP BY clause groups the results by user_id to count the number of contests each user attended.
The ORDER BY clause sorts the results in descending order by the number of contests attended.
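To obtain an actual percentage per contest, the attendee count has to be divided by the total number of users. The sketch below assumes a users table with one row per user, which is hypothetical and not part of the query above; contest_logs is used as described.

```sql
-- Percentage of all users who attended each contest, rounded to two decimals.
-- The users table is an assumption made for this illustration.
SELECT contest_id,
       ROUND(COUNT(DISTINCT user_id) * 100.0 / (SELECT COUNT(*) FROM users), 2) AS percentage
FROM contest_logs
GROUP BY contest_id
ORDER BY percentage DESC, contest_id;
```

COUNT(DISTINCT user_id) guards against a user being logged more than once for the same contest.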
Monthly Transactions II
Problem Statement:
Given a table Transactions containing the following columns:
user_id (int)
month (string)
transaction_amount (int)
You need to find the total transaction amount for each user in each month.
Best & Performant SQL Solution:
SELECT
user_id,
month,
SUM(transaction_amount) AS total_amount
FROM Transactions
GROUP BY
user_id,
month;
Breakdown and Explanation:
SELECT ... specifies the columns to be included in the result.
user_id is the user identifier.
month is the month in which the transaction occurred.
SUM(transaction_amount) calculates the total amount of transactions for each user in each month.
FROM Transactions specifies the table from which the data is retrieved.
GROUP BY groups the results by the user ID and month to calculate the total amount for each user in each month.
Real-World Implementation and Examples:
This query can be used in various real-world applications, such as:
Financial Analysis: To analyze user spending patterns over time.
Customer Segmentation: To identify users with similar transaction behaviors.
Sales Forecasting: To predict future transaction amounts based on historical data.
Example:
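Consider the following Transactions table:
| user_id | month | transaction_amount |
|---|---|---|
| 1 | January | 100 |
| 1 | February | 200 |
| 2 | January | 150 |
| 2 | March | 250 |
| 3 | February | 300 |
| 3 | March | 400 |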
Running the query above returns the total transaction amount for each user in each month. Because every user has only one transaction per month in this table, each total equals the single transaction amount.
Output:
| user_id | month | total_amount |
|---|---|---|
| 1 | January | 100 |
| 1 | February | 200 |
| 2 | January | 150 |
| 2 | March | 250 |
| 3 | February | 300 |
| 3 | March | 400 |
Customers Who Bought Products A and B but Not C
Problem Statement: Find customers who have purchased products A and B, but haven't purchased product C.
SQL Solution:
WITH ProductPurchases AS (
SELECT
customer_id,
product_id
FROM
purchases
), PurchasesCounts AS (
SELECT
customer_id,
SUM(CASE WHEN product_id = 'A' THEN 1 ELSE 0 END) AS num_bought_A,
SUM(CASE WHEN product_id = 'B' THEN 1 ELSE 0 END) AS num_bought_B,
SUM(CASE WHEN product_id = 'C' THEN 1 ELSE 0 END) AS num_bought_C
FROM
ProductPurchases
GROUP BY
customer_id
)
SELECT
customer_id
FROM
PurchasesCounts
WHERE
num_bought_A > 0 AND num_bought_B > 0 AND num_bought_C = 0;
Breakdown:
ProductPurchases: A common table expression (CTE) that selects the customer_id and product_id from the purchases table.
PurchasesCounts: Another CTE that groups the purchases by customer_id and counts the number of times products A, B, and C were purchased.
The final query uses the PurchasesCounts CTE to select customer_ids where products A and B were each purchased at least once, but product C was never purchased.
Real-World Application:
Identifying customers who may be interested in purchasing product C based on their past purchases.
Understanding customer behavior and product preferences.
Targeted marketing campaigns to offer product C to specific customers.
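The same filter can be expressed without CTEs by moving the conditional counts into a HAVING clause. This sketch relies on MySQL evaluating a comparison such as product_id = 'A' to 1 or 0, so it is specific to that dialect; the table and column names are those used above.

```sql
-- One pass over purchases: each SUM counts the rows for one product.
SELECT customer_id
FROM purchases
GROUP BY customer_id
HAVING SUM(product_id = 'A') > 0
   AND SUM(product_id = 'B') > 0
   AND SUM(product_id = 'C') = 0;
```

In other databases the comparisons would need to be wrapped in CASE expressions, as in the CTE version.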
Bank Account Summary
Problem Statement:
Write an SQL query to get a summary of a bank account. The summary should include the account number, account balance, and a list of all transactions made on the account.
Sample Data:
Accounts:
+------------+---------+
| account_id | balance |
+------------+---------+
| 1001       | 1000.00 |
| 1002       | 500.00  |
| 1003       | 2500.00 |
+------------+---------+
Transactions:
+----------------+------------+---------+
| transaction_id | account_id | amount  |
+----------------+------------+---------+
| 2001           | 1001       | 100.00  |
| 2002           | 1001       | 200.00  |
| 2003           | 1002       | 50.00   |
| 2004           | 1003       | 1000.00 |
+----------------+------------+---------+
Solution:
WITH AccountSummary AS (
SELECT
a.account_id,
a.balance,
SUM(t.amount) AS total_transactions
FROM
Accounts a
LEFT JOIN
Transactions t ON a.account_id = t.account_id
GROUP BY
a.account_id, a.balance
)
SELECT
account_id,
balance,
total_transactions,
(
SELECT
GROUP_CONCAT(amount)
FROM
Transactions t
WHERE
t.account_id = AccountSummary.account_id
) AS transactions
FROM
AccountSummary;
Explanation:
Create a Common Table Expression (CTE) named AccountSummary to calculate the account summary:
Join the Accounts and Transactions tables using a left join on the account_id column.
Group the results by account_id and balance.
Calculate the total amount of transactions for each account using SUM(t.amount) and alias it as total_transactions.
Select the Columns:
From the AccountSummary CTE, select the account_id, balance, total_transactions, and a subquery to retrieve the list of transactions.
Subquery to Get Transactions:
The subquery selects the amount column from the Transactions table where t.account_id matches the AccountSummary.account_id.
The results are concatenated using GROUP_CONCAT and returned as the transactions column.
Output:
+------------+---------+--------------------+---------------+
| account_id | balance | total_transactions | transactions  |
+------------+---------+--------------------+---------------+
| 1001       | 1000.00 | 300.00             | 100.00,200.00 |
| 1002       | 500.00  | 50.00              | 50.00         |
| 1003       | 2500.00 | 1000.00            | 1000.00       |
+------------+---------+--------------------+---------------+
Real-World Application:
This query can be used to provide a summary of a bank account to the account holder or for internal reporting purposes. The summary includes key information such as the account balance, total transactions, and a list of all transactions made on the account.
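The total and the list of transactions can also be produced in a single pass, without a CTE or a correlated subquery. This is a sketch for MySQL using the Accounts and Transactions tables from the sample data; ordering the list by transaction_id is an illustrative choice.

```sql
-- LEFT JOIN keeps accounts that have no transactions;
-- COALESCE turns their NULL total into 0.
SELECT a.account_id,
       a.balance,
       COALESCE(SUM(t.amount), 0) AS total_transactions,
       GROUP_CONCAT(t.amount ORDER BY t.transaction_id SEPARATOR ',') AS transactions
FROM Accounts AS a
LEFT JOIN Transactions AS t
       ON t.account_id = a.account_id
GROUP BY a.account_id, a.balance;
```

For an account without transactions, the transactions column is NULL because GROUP_CONCAT has no values to join.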
Find Median Given Frequency of Numbers
SQL Solution:
SELECT AVG(number) AS median
FROM (
    SELECT number,
           frequency,
           SUM(frequency) OVER (ORDER BY number ASC) AS partial_sum,
           SUM(frequency) OVER () AS total
    FROM frequency_table
) AS CumulativeTable
WHERE partial_sum >= total / 2
  AND partial_sum - frequency <= total / 2;
Explanation:
CumulativeTable: For each number, calculate the cumulative sum of frequencies (partial_sum) and the total sum of all frequencies (total). If the values were written out in sorted order, a number would occupy the positions from partial_sum - frequency + 1 through partial_sum.
Median Query: A number contains the middle of the sorted list when its range of positions reaches the halfway point, that is, when partial_sum >= total / 2 and partial_sum - frequency <= total / 2. Testing for partial_sum equal to exactly half the total would miss the median whenever the halfway point falls inside a number's range.
AVG(number): When the total is even and the two middle positions belong to different numbers, both numbers satisfy the condition and the median is their average.
Example:
Suppose we have a frequency table like this:
| number | frequency |
|---|---|
| 1 | 4 |
| 2 | 6 |
| 3 | 2 |
Cumulative Table:
| number | frequency | partial_sum |
|---|---|---|
| 1 | 4 | 4 |
| 2 | 6 | 10 |
| 3 | 2 | 12 |
The total sum of frequencies is 12, so the halfway point is 6. Number 1 fails the first condition (4 < 6). Number 3 fails the second condition (12 - 2 = 10 > 6). Number 2 satisfies both (10 >= 6 and 10 - 6 = 4 <= 6).
The query therefore returns 2, which is the median: in the sorted list of 12 values, the 6th and 7th values are both 2.
Applications:
Finding the median of a distribution is useful in many real-world applications, such as:
Data Analysis: Identifying the middle value of a dataset, representing the "typical" value.
Statistics: Calculating the 50th percentile, which is often used to summarize data.
Machine Learning: Evaluating the performance of models by using the median as a threshold or target value.
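The median of a frequency table can be checked by hand on a small case. The sketch below assumes MySQL 8.0 or later and uses the frequencies 4, 6, and 2 for the numbers 1, 2, and 3; the range condition on cumulative sums is one common way to locate the middle of the expanded list.

```sql
-- Sample frequency table: the expanded list is 1,1,1,1,2,2,2,2,2,2,3,3.
CREATE TABLE frequency_table (
    number    INT PRIMARY KEY,
    frequency INT
);

INSERT INTO frequency_table (number, frequency) VALUES
    (1, 4), (2, 6), (3, 2);

-- A number holds the median if the halfway point lies within
-- the positions it occupies in the sorted, expanded list.
SELECT AVG(number) AS median
FROM (
    SELECT number,
           frequency,
           SUM(frequency) OVER (ORDER BY number) AS cum_freq,
           SUM(frequency) OVER ()                AS total
    FROM frequency_table
) AS c
WHERE cum_freq >= total / 2
  AND cum_freq - frequency <= total / 2;
```

Only number 2 passes both conditions here (10 >= 6 and 4 <= 6), which matches the 6th and 7th values of the expanded list.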
Immediate Food Delivery II
Problem:
Given three tables:
Restaurants
restaurant_id (int)
restaurant_name (string)
address (string)
phone_number (string)
Orders
order_id (int)
restaurant_id (int)
customer_id (int)
order_time (string)
total_amount (int)
Customers
customer_id (int)
customer_name (string)
address (string)
phone_number (string)
Find all restaurants that have at least one order placed within the last T minutes. An order falls inside the time frame when the difference between the current time and its order time is at most T minutes.
SOLUTION:
SELECT DISTINCT
R.restaurant_id,
R.restaurant_name
FROM Restaurants AS R
JOIN Orders AS O
ON R.restaurant_id = O.restaurant_id
WHERE
O.order_time >= DATE_SUB(NOW(), INTERVAL T MINUTE);
Explanation:
Join the Restaurants and Orders tables on the common column restaurant_id to link restaurant information with customer orders.
Filter the orders by checking if the order time is greater than or equal to the current time minus the specified time frame T. This condition ensures that we only select orders that were placed within the specified time frame.
Select the distinct restaurant_id and restaurant_name. DISTINCT removes the duplicates that arise when a restaurant has several qualifying orders, so each restaurant that meets the time frame criteria appears once. No GROUP BY is needed.
Example:
Suppose we have the following data:
Restaurants:
+---------------+------------------+------------------+----------------+
| restaurant_id | restaurant_name | address | phone_number |
+---------------+------------------+------------------+----------------+
| 1 | Pizza Palace | 123 Main Street | 555-1234 |
| 2 | Burgers & Fries | 456 Oak Street | 555-2345 |
| 3 | Sushi Delight | 789 Pine Street | 555-3456 |
Orders:
+----------+---------------+------------+-----------------+---------------+
| order_id | restaurant_id | customer_id | order_time | total_amount |
+----------+---------------+------------+-----------------+---------------+
| 1 | 1 | 10 | 2023-01-01 12:00 | $50.00 |
| 2 | 2 | 15 | 2023-01-01 13:00 | $40.00 |
| 3 | 3 | 20 | 2023-01-01 14:00 | $60.00 |
| 4 | 1 | 10 | 2023-01-01 15:00 | $45.00 |
| 5 | 2 | 15 | 2023-01-01 16:00 | $50.00 |
Customers:
+------------+-------------------+-------------------+-----------------+
| customer_id | customer_name | address | phone_number |
+------------+-------------------+-------------------+-----------------+
| 10 | John Doe | 101 Elm Street | 555-5678 |
| 15 | Jane Smith | 202 Cedar Street | 555-6789 |
| 20 | Michael Johnson | 303 Maple Street | 555-7890 |
If we run the query with T = 90 at 2023-01-01 16:15, the cutoff is 14:45 and we get the following result:
+---------------+------------------+
| restaurant_id | restaurant_name |
+---------------+------------------+
| 1 | Pizza Palace |
| 2 | Burgers & Fries |
+---------------+------------------+
This result shows that both Pizza Palace (order 4 at 15:00) and Burgers & Fries (order 5 at 16:00) have orders within the 90-minute time frame. Sushi Delight is excluded because its only order was placed at 14:00, before the cutoff.
Applications:
This query can be used in real-world applications that provide on-demand food delivery services. It can help customers find restaurants that can deliver food within a specific time frame, ensuring that they receive their food quickly and efficiently.
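In the solution, T stands for a value supplied by the caller. One way to supply it in MySQL is a user variable, as sketched below; the variable name @T, its value, and the extra summary columns are illustrative assumptions rather than part of the original problem.

```sql
-- Time frame in minutes (illustrative value).
SET @T = 90;

-- Restaurants with at least one order in the last @T minutes,
-- together with how many such orders they have and the most recent one.
SELECT R.restaurant_id,
       R.restaurant_name,
       COUNT(*)          AS recent_orders,
       MAX(O.order_time) AS latest_order_time
FROM Restaurants AS R
JOIN Orders AS O
  ON O.restaurant_id = R.restaurant_id
WHERE O.order_time >= DATE_SUB(NOW(), INTERVAL @T MINUTE)
GROUP BY R.restaurant_id, R.restaurant_name;
```

Grouping replaces DISTINCT here because the query also reports per-restaurant aggregates.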
Count the Number of Experiments
Problem: Count the number of experiments stored in the experiments table.
SQL:
SELECT COUNT(*) AS experiment_count
FROM experiments;
Explanation:
This SQL query counts the number of rows in the experiments table. Each row represents an experiment, so the count of rows is the count of experiments.
Breakdown:
The SELECT clause specifies the columns to be returned in the result set. In this case, we want to count the number of experiments, so we select COUNT(*). The COUNT(*) function counts all rows in the table, regardless of their column values.
The FROM clause specifies the table to be used in the query. In this case, we use the experiments table.
The WHERE clause can be used to filter the rows in the table based on certain criteria. In this case, we don't use a WHERE clause because we want to count all experiments.
Example:
Consider the following experiments table:
| id | name |
|---|---|
| 1 | Experiment 1 |
| 2 | Experiment 2 |
| 3 | Experiment 3 |
If we run the above SQL query on this table, we will get the following result:
| experiment_count |
|-----------------|
| 3 |
This means that there are three experiments in the experiments table.
Real-World Application:
This SQL query can be used in various real-world applications, such as:
To track the number of experiments conducted in a scientific study.
To determine the number of experiments that have been completed in a laboratory setting.
To count the number of experiments that have been published in a scientific journal.
User Activities within Time Bounds
Problem:
Given a table activities with columns id, user_id, start_time, and end_time, find the users who have activities within a specific time range.
Solution:
SELECT DISTINCT user_id
FROM activities
WHERE start_time >= '2022-01-01' AND end_time < '2023-01-01';
Breakdown:
The SELECT statement retrieves the user_id column.
The DISTINCT keyword ensures that each user is listed only once.
The FROM clause specifies the activities table.
The WHERE clause filters the rows based on the following conditions:
The start_time column must be greater than or equal to '2022-01-01'.
The end_time column must be earlier than '2023-01-01'. If end_time stores a time of day, comparing with <= '2022-12-31' would compare against midnight at the start of December 31 and drop activities that end later that day.
Real-World Application:
This query can be used to analyze user activity within a specific time period. For example, a business could use this query to determine which users have been active in the last year or to identify users who have recently stopped using the platform.
Simplification:
In plain English, the query finds all the users who have had activities between January 1, 2022, and December 31, 2022. This is useful if you want to identify users who have been active during a particular time period, such as during a promotional campaign or holiday season.
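The solution keeps only activities that lie entirely inside the range. If activities that merely overlap the range should count as well, the conditions can be swapped as in this sketch; whether partial overlap qualifies is an assumption about the requirement, not something the problem states.

```sql
-- An activity overlaps 2022 if it starts before the range ends
-- and ends on or after the range starts.
SELECT DISTINCT user_id
FROM activities
WHERE start_time <  '2023-01-01'
  AND end_time   >= '2022-01-01';
```

This variant also returns users whose activity began in 2021 and finished in 2022, or began in 2022 and finished in 2023.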
Account Balance
Problem Statement:
Given a table Account containing the following columns:
account_id: Unique identifier for each account
balance: Current balance of the account
Write a query to calculate the sum of balances for each account.
Solution:
SELECT
account_id,
SUM(balance) AS total_balance
FROM
Account
GROUP BY
account_id;
Explanation:
The SELECT statement retrieves the account_id and the sum of balance for each account_id.
The FROM clause specifies the Account table from which the data is retrieved.
The GROUP BY clause groups the results by account_id so that the SUM function can calculate the total balance for each account.
Real-World Application:
This query is useful in any system that tracks financial transactions, such as banking or accounting systems. It can be used to:
Calculate the total balance of a customer's savings and checking accounts
Monitor account balances to identify potential fraud or suspicious activity
Generate reports on account activity
Example:
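Consider the following data in the Account table:
| account_id | balance |
|---|---|
| 100 | 1000 |
| 101 | 2000 |
| 100 | 500 |
The query would return the following result:
| account_id | total_balance |
|---|---|
| 100 | 1500 |
| 101 | 2000 |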
This shows that account 100 has a total balance of $1500 (1000 + 500), while account 101 has a balance of $2000.
Suspicious Bank Accounts
Problem Statement:
You have a database of bank accounts and their corresponding transactions. A bank account is considered suspicious if it has more than $1,000 in deposits on a single day. Your task is to identify all the suspicious accounts.
SQL Solution:
SELECT DISTINCT account_number
FROM transactions
WHERE amount > 0
GROUP BY account_number, transaction_date
HAVING SUM(amount) > 1000;
Explanation:
The query assumes that the transactions table has a transaction_date column holding the day of each transaction; the problem statement does not list the columns, so adjust the name to your schema.
The SELECT DISTINCT statement fetches the account numbers from the transactions table, listing each suspicious account once even if it exceeds the limit on several days.
The WHERE amount > 0 clause keeps only deposits.
The GROUP BY statement groups the transactions by account number and day, because the rule applies to deposits made on a single day.
The HAVING SUM(amount) > 1000 clause keeps only the account and day combinations whose total deposits exceed $1,000.
Real-World Applications:
Suspicious bank account detection systems are used in fraud detection and anti-money laundering efforts. By identifying accounts that exhibit unusual activity, banks can prevent financial crimes and protect their customers.
Additional Notes:
The query treats positive amounts as deposits. If withdrawals are stored as negative amounts, the WHERE amount > 0 filter keeps them out of the daily total. Applying ABS() instead would wrongly count withdrawals as deposits.
You can optimize the query by creating an index on the account_number and transaction_date columns.
Simplified Explanation:
Imagine a bank has a bunch of accounts and keeps track of all the deposits made into each account. To find the suspicious accounts, we need to check each account and see if the total amount of deposits on any day exceeds $1,000.
The SQL query does this by grouping the deposits for each account by day and then checking if any daily total is more than $1,000. If it is, the account is considered suspicious.
This is like checking every kid's piggy bank each day. If a kid puts more than $1,000 into their piggy bank on a single day, that piggy bank is considered suspicious.
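Because the rule applies to deposits made on a single day, the grouping has to include the day as well as the account. The sketch below illustrates this with made-up rows; the transaction_date column is an assumption, since the problem statement does not list the table's columns.

```sql
-- Hypothetical sample data.
CREATE TABLE transactions (
    account_number   INT,
    transaction_date DATE,
    amount           DECIMAL(10, 2)
);

INSERT INTO transactions (account_number, transaction_date, amount) VALUES
    (1, '2023-01-01', 600.00),
    (1, '2023-01-01', 500.00),   -- account 1 receives 1100.00 on one day
    (2, '2023-01-01', 700.00),
    (2, '2023-01-02', 700.00);   -- account 2 never exceeds 1000.00 in a day

-- Sum deposits per account per day and flag days above the limit.
SELECT DISTINCT account_number
FROM transactions
WHERE amount > 0
GROUP BY account_number, transaction_date
HAVING SUM(amount) > 1000;
```

Only account 1 is flagged. Grouping by account alone would also flag account 2, whose deposits total 1400.00 across two different days.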
Second Highest Salary
Problem:
Find the second highest salary paid to an employee.
Solution:
SELECT DISTINCT
Salary -- Select the distinct salary values
FROM Employee -- From the Employee table
ORDER BY
Salary DESC -- Order the salaries in descending order
LIMIT 1, 1; -- Skip the first row (offset 1) and return one row: the second highest salary
Breakdown:
SELECT DISTINCT Salary: Selects only unique salary values to avoid duplicates.
FROM Employee: Specifies the table to search for salaries.
ORDER BY Salary DESC: Orders the salaries in descending order, so that the highest salary is at the top.
LIMIT 1, 1: Limits the result to the second row, as the first row will contain the highest salary.
Real-World Example:
Consider an employee database with the following table:
| Id | Name | Salary |
|---|---|---|
| 1 | John | 1000 |
| 2 | Mary | 1200 |
| 3 | Bob | 900 |
The query would return 1000, which is the second highest salary (the highest is 1200).
Potential Applications:
Identifying employees for bonuses or promotions based on their salary.
Analyzing salary trends within a company or industry.
Calculating average or median salaries for comparison purposes.
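When the table holds fewer than two distinct salaries, the LIMIT-based query returns no rows at all. A common alternative, sketched here with the same Employee table, returns NULL in that case; the alias SecondHighestSalary is an illustrative name.

```sql
-- Highest salary that is strictly below the overall maximum.
-- MAX over an empty set yields NULL, so the result is always one row.
SELECT MAX(Salary) AS SecondHighestSalary
FROM Employee
WHERE Salary < (SELECT MAX(Salary) FROM Employee);
```

For salaries 1000, 1200, and 900 this returns 1000, the same answer as the LIMIT version.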
Low-Quality Problems
LeetCode Problem:
Find the average salary of all employees in a company who earn more than a certain threshold. Here the threshold is the company-wide average salary.
SQL Query:
SELECT AVG(salary)
FROM Employee
WHERE salary > (SELECT AVG(salary) FROM Employee);
Explanation:
Subquery: The subquery (SELECT AVG(salary) FROM Employee) calculates the average salary of all employees in the company.
Main Query: The main query selects the average salary (AVG(salary)) from the Employee table.
Filter: The WHERE clause filters the results to include only employees whose salary is greater than the average salary calculated in the subquery.
Real-World Application:
This query can be used by a company to determine which employees are earning above-average salaries. It can help with:
Performance reviews: Identifying top performers who deserve promotions or bonuses.
Compensation planning: Setting salaries and benefits that are competitive within the industry.
Budgeting: Forecasting employee expenses by estimating the cost of salaries above the average.
Example:
Consider the following Employee table:
| id | name | salary |
|---|---|---|
| 1 | John Doe | 10,000 |
| 2 | Jane Smith | 15,000 |
| 3 | Michael Jones | 20,000 |
| 4 | Susan Miller | 12,000 |
| 5 | William Brown | 18,000 |
The average salary in this company is (10000 + 15000 + 20000 + 12000 + 18000) / 5 = 15000.
The employees earning above average are:
Michael Jones (20,000)
William Brown (18,000)
Jane Smith earns exactly the average (15,000), so the strict comparison salary > 15000 excludes her.
The average salary of these employees is (20000 + 18000) / 2 = 19000.
Order Two Columns Independently
Problem:
Given two tables:
orders: id, order_date, customer_id
products: id, product_name, price
The solutions below also rely on a linking table, order_products (order_id, product_id), that records which products belong to each order.
Write an SQL query that ranks the products of each order independently by price and shows the highest-priced product for each order.
SOLUTION 1 (Using a Subquery):
SQL:
SELECT o.id, o.order_date, o.customer_id,
       (SELECT p.product_name
        FROM order_products op
        JOIN products p ON p.id = op.product_id
        WHERE op.order_id = o.id
        ORDER BY p.price DESC
        LIMIT 1) AS highest_priced_product
FROM orders o
ORDER BY o.order_date, o.customer_id;
Breakdown:
The subquery (SELECT p.product_name ...) finds the product name for the highest-priced product in each order.
It joins order_products to products so that the price is available, orders the products of the current order by price in descending order, and keeps the first row.
The outer query selects the order details and assigns the highest-priced product to the highest_priced_product column. Orders without products get NULL.
The result is ordered by order date and customer ID.
SOLUTION 2 (Using a Window Function):
SQL:
SELECT DISTINCT o.id, o.order_date, o.customer_id,
       FIRST_VALUE(p.product_name) OVER (PARTITION BY o.id ORDER BY p.price DESC) AS highest_priced_product
FROM orders o
JOIN order_products op ON o.id = op.order_id
JOIN products p ON op.product_id = p.id
ORDER BY o.order_date, o.customer_id;
Breakdown:
The window function FIRST_VALUE(p.product_name) OVER (PARTITION BY o.id ORDER BY p.price DESC) returns the name of the highest-priced product for each order.
The PARTITION BY o.id clause groups the products by order ID.
The ORDER BY p.price DESC clause orders the products within each partition by price in descending order.
The FIRST_VALUE() function then returns the product name from the first row of each partition, which is the highest-priced product. MAX(product_name) would instead return the alphabetically last name, not the most expensive product.
DISTINCT collapses the result to one row per order, because the window function repeats the same value on every product row of the order.
Real-World Applications:
Displaying products in e-commerce stores for each customer's order.
Analyzing customer preferences and purchasing patterns based on the highest-priced products ordered.
Identifying upselling and cross-selling opportunities by recommending related products with higher prices.
Product Sales Analysis V
LeetCode Problem: Product Sales Analysis V
Problem Statement: Given a table containing daily product sales, determine the total sales for each product over a specified date range.
Example Input Table:
CREATE TABLE Sales (
product_id INT,
product_name VARCHAR(255),
sale_date DATE,
sale_amount DECIMAL(10, 2)
);
Sample Data:
INSERT INTO Sales (product_id, product_name, sale_date, sale_amount) VALUES
(1, 'Product A', '2023-03-01', 100.00),
(2, 'Product B', '2023-03-02', 200.00),
(1, 'Product A', '2023-03-03', 300.00),
(2, 'Product B', '2023-03-04', 400.00),
(3, 'Product C', '2023-03-05', 500.00);
Parameters:
from_date: Start date of the date range (inclusive).
to_date: End date of the date range (inclusive).
Output:
product_id: ID of the product.
product_name: Name of the product.
total_sales: Total sales of the product over the specified date range.
SQL Solution:
SELECT
product_id,
product_name,
SUM(sale_amount) AS total_sales
FROM Sales
WHERE
sale_date >= from_date
AND sale_date <= to_date
GROUP BY
product_id, product_name;
Explanation:
JOIN Operation: None required in this case.
FROM Clause: Selects the Sales table.
WHERE Clause: Filters rows based on the specified date range.
GROUP BY Clause: Groups rows by product_id and product_name.
SELECT Clause: Calculates the total sales (sum of sale_amount) for each product and returns the product_id, product_name, and total sales.
Example Input and Output:
-- Example Input Parameters
DECLARE @from_date DATE = '2023-03-01';
DECLARE @to_date DATE = '2023-03-04';
-- Execute the Query
SELECT
product_id,
product_name,
SUM(sale_amount) AS total_sales
FROM Sales
WHERE
sale_date >= @from_date
AND sale_date <= @to_date
GROUP BY
product_id, product_name;
-- Example Output
+------------+--------------+-------------+
| product_id | product_name | total_sales |
+------------+--------------+-------------+
| 1          | Product A    | 400.00      |
| 2          | Product B    | 600.00      |
+------------+--------------+-------------+
Real-World Applications:
Sales Reporting: Analyze product sales over time to identify trends and make data-driven decisions.
Inventory Management: Track sales to ensure optimal inventory levels and avoid overstocking or shortages.
Customer Behavior Analysis: Understand customer preferences and identify cross-selling opportunities by analyzing sales of different products together.
Financial Analysis: Calculate total revenue generated from product sales for financial reporting and planning.
Page Recommendations
Problem:
Find the most popular page visited by users from a specific country.
SQL Solution:
-- Count page views by country and page
WITH page_counts AS (
    SELECT country, page_url, COUNT(*) AS page_count
    FROM page_views
    GROUP BY country, page_url
)
-- Keep the page(s) whose count equals the maximum for that country
SELECT pc.country, pc.page_url, pc.page_count AS max_page_count
FROM page_counts pc
WHERE pc.page_count = (
    SELECT MAX(pc2.page_count)
    FROM page_counts pc2
    WHERE pc2.country = pc.country
);
Breakdown:
The page_counts CTE groups page views by country and page URL and counts the number of views for each combination.
The correlated subquery finds the maximum page count for the country of the current row.
The outer query keeps only the rows whose count equals that maximum, so page_url always belongs to the page that actually has the highest count. Selecting page_url next to MAX(page_count) with only GROUP BY country would not guarantee this.
If two pages tie for the highest count in a country, both are returned.
Example:
+---------+-------------+----------------+
| country | page_url | max_page_count |
+---------+-------------+----------------+
| USA | page1.html | 1000 |
| UK | page2.html | 500 |
| Canada | page3.html | 250 |
+---------+-------------+----------------+
Applications:
Website analytics: Track the most popular pages visited by users from different countries to understand user demographics and preferences.
Marketing: Target specific products or services to users based on their country and page preferences.
Content optimization: Tailor website content based on the most popular pages visited by users from different countries to increase engagement.
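A quick way to try the per-country maximum is to run it against a few rows in an in-memory SQLite database. This is a minimal sketch, assuming a page_views table with just the two columns country and page_url; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: one row per page view.
conn.execute("CREATE TABLE page_views (country TEXT, page_url TEXT)")
conn.executemany(
    "INSERT INTO page_views (country, page_url) VALUES (?, ?)",
    [
        ("USA", "page1.html"), ("USA", "page1.html"), ("USA", "page1.html"),
        ("USA", "page2.html"),
        ("UK", "page2.html"), ("UK", "page2.html"),
        ("UK", "page1.html"),
    ],
)

# Count views per (country, page), then keep the page with the highest count per country.
top_pages = conn.execute(
    """
    WITH page_counts AS (
        SELECT country, page_url, COUNT(*) AS page_count
        FROM page_views
        GROUP BY country, page_url
    )
    SELECT pc.country, pc.page_url, pc.page_count
    FROM page_counts pc
    WHERE pc.page_count = (
        SELECT MAX(pc2.page_count)
        FROM page_counts pc2
        WHERE pc2.country = pc.country
    )
    ORDER BY pc.country
    """
).fetchall()

print(top_pages)
```

Filtering on the maximum, rather than selecting page_url beside MAX(page_count), keeps each URL tied to its own count.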
Get the Second Most Recent Activity
Problem Statement:
Given a table of activities, write a SQL query to find the second most recent activity for each user.
Input Table:
activities (user_id, activity_date)
Output Table:
last_2_activities (user_id, activity_date)
Solution:
Step 1: Find the Most Recent Activity
SELECT user_id, MAX(activity_date) AS most_recent_date
FROM activities
GROUP BY user_id;
This query finds the most recent activity date for each user and gives it the alias most_recent_date.
Step 2: Find the Second Most Recent Activity
SELECT a.user_id, MAX(a.activity_date) AS activity_date
FROM activities AS a
JOIN (
    SELECT user_id, MAX(activity_date) AS most_recent_date
    FROM activities
    GROUP BY user_id
) AS b ON a.user_id = b.user_id AND a.activity_date < b.most_recent_date
GROUP BY a.user_id
ORDER BY a.user_id;
This query finds the second most recent activity for each user. We join the activities table (aliased as a) with the subquery from Step 1 (aliased as b) on the user ID. The condition a.activity_date < b.most_recent_date discards each user's most recent activity, and MAX(a.activity_date) with GROUP BY a.user_id then picks the latest of the remaining dates, which is the second most recent one. Without this aggregation the query would return every earlier activity, not just one per user. Users with only one distinct activity date produce no row.
Final Query:
WITH MostRecentActivity AS (
    SELECT user_id, MAX(activity_date) AS most_recent_date
    FROM activities
    GROUP BY user_id
)
SELECT a.user_id, MAX(a.activity_date) AS activity_date
FROM activities AS a
JOIN MostRecentActivity AS b ON a.user_id = b.user_id AND a.activity_date < b.most_recent_date
GROUP BY a.user_id
ORDER BY a.user_id;
Explanation:
MostRecentActivity CTE:
This common table expression calculates the most recent activity date for each user. It groups the activities by user ID and finds the maximum activity date for each user.
Main Query:
The main query joins the activities table with MostRecentActivity on the user ID and keeps only the activities that are older than the user's most recent one. It then groups by user and takes the maximum of the remaining dates, which is the second most recent activity for each user.
Real-World Application:
This query can be useful in scenarios where you need to retrieve historical data for analysis. For example, you could use it to:
Analyze user behavior patterns
Identify trends over time
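The approach of excluding the latest date and then taking the maximum of what remains can be tried on a few rows. The sketch below uses Python's sqlite3 module; the sample dates are invented, and dates are stored as ISO-8601 text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (user_id INTEGER, activity_date TEXT)")
conn.executemany(
    "INSERT INTO activities (user_id, activity_date) VALUES (?, ?)",
    [
        (1, "2023-01-01"), (1, "2023-01-05"), (1, "2023-01-09"),
        (2, "2023-02-01"), (2, "2023-02-03"),
        (3, "2023-03-01"),  # only one activity, so no second most recent
    ],
)

# Drop each user's latest date, then take the latest of the remaining dates.
second_most_recent = conn.execute(
    """
    WITH MostRecentActivity AS (
        SELECT user_id, MAX(activity_date) AS most_recent_date
        FROM activities
        GROUP BY user_id
    )
    SELECT a.user_id, MAX(a.activity_date) AS activity_date
    FROM activities AS a
    JOIN MostRecentActivity AS b
      ON a.user_id = b.user_id AND a.activity_date < b.most_recent_date
    GROUP BY a.user_id
    ORDER BY a.user_id
    """
).fetchall()

print(second_most_recent)
```

User 3 does not appear in the result because there is no earlier activity to fall back on.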
Swap Salary
Problem Statement:
Given a table Salaries with the following columns:
emp_id (integer) - Employee ID
name (string) - Employee name
salary (integer) - Employee salary
Swap the salaries of two employees with specific IDs.
Simplified Solution:
To swap the salaries of employees with IDs id1 and id2, you can use the following query:
-- Swap salaries of employees with IDs id1 and id2
UPDATE Salaries
SET salary = CASE
WHEN emp_id = id1 THEN (SELECT salary FROM Salaries WHERE emp_id = id2)
WHEN emp_id = id2 THEN (SELECT salary FROM Salaries WHERE emp_id = id1)
ELSE salary
END
WHERE emp_id IN (id1, id2);
Breakdown and Explanation:
CASE Expression: The CASE expression conditionally sets the salary column based on the value of emp_id.
WHEN Clauses: If emp_id equals id1, the salary is set to the salary of the employee whose emp_id equals id2; if emp_id equals id2, it is set to the salary of the employee whose emp_id equals id1.
ELSE Clause: If neither WHEN clause matches, the salary is left unchanged.
WHERE Clause: The WHERE clause restricts the update to employees whose emp_id is in the list (id1, id2).
Example:
Let's say we have a table with the following data:
| emp_id | name       | salary |
| ------ | ---------- | ------ |
| 1      | John Doe   | 5000   |
| 2      | Jane Smith | 6000   |
To swap the salaries of John and Jane, we can use the following query:
UPDATE Salaries
SET salary = CASE
WHEN emp_id = 1 THEN (SELECT salary FROM Salaries WHERE emp_id = 2)
WHEN emp_id = 2 THEN (SELECT salary FROM Salaries WHERE emp_id = 1)
ELSE salary
END
WHERE emp_id IN (1, 2);
After executing this query, the salaries will be swapped:
| emp_id | name       | salary |
| ------ | ---------- | ------ |
| 1      | John Doe   | 6000   |
| 2      | Jane Smith | 5000   |
Real-World Applications:
Swapping salaries is useful in scenarios where employees need to adjust their salaries for various reasons, such as:
Fairness and equity adjustments
Promotions and demotions
Salary negotiations
Temporary salary adjustments (e.g., for training or special projects)
Sales by Day of the Week
Problem:
Given a table Sales that contains information about sales transactions, determine the total sales for each day of the week.
Schema:
CREATE TABLE Sales (
sales_id INT PRIMARY KEY,
sales_date TIMESTAMP,
sales_amount INT
);
Solution:
SELECT
strftime('%w', sales_date) AS day_of_week,
SUM(sales_amount) AS total_sales
FROM Sales
GROUP BY day_of_week
ORDER BY day_of_week;
Implementation and Explanation:
Convert sales_date to Day of Week:
Use the strftime function with the '%w' format specifier to extract the day of the week from the sales_date column. In SQLite this returns the day of the week as a text digit from 0 to 6 (0=Sunday, 1=Monday, ..., 6=Saturday).
Group by Day of Week:
Use the GROUP BY clause to group the results by the day_of_week column. This creates a separate group for each day of the week.
Sum Sales Amount:
Within each group, use the SUM aggregate function to calculate the total sales amount.
Order by Day of Week:
Use the ORDER BY clause to sort the results in ascending order by day_of_week for easy readability.
Example:
| day_of_week | total_sales |
| ----------- | ----------- |
| 0 | 100 |
| 1 | 200 |
| 2 | 300 |
| 3 | 400 |
| 4 | 500 |
| 5 | 600 |
| 6 | 700 |
Real-World Applications:
Sales Analysis: Retailers can use this query to identify which days of the week are most profitable for sales.
Scheduling: Businesses can optimize staffing and inventory levels based on the sales volume for each day of the week.
Marketing Campaigns: Marketers can tailor campaigns to target different customer segments based on their weekday shopping patterns.
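To see what strftime('%w', ...) actually returns, the query can be run in SQLite from Python. This is a small sketch with invented rows; 2023-01-01 and 2023-01-08 are Sundays and 2023-01-02 is a Monday.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (sales_id INTEGER PRIMARY KEY, sales_date TIMESTAMP, sales_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales (sales_id, sales_date, sales_amount) VALUES (?, ?, ?)",
    [
        (1, "2023-01-01 10:00:00", 100),  # Sunday
        (2, "2023-01-02 09:30:00", 200),  # Monday
        (3, "2023-01-08 15:45:00", 50),   # Sunday
    ],
)

# strftime('%w', ...) yields the weekday as text, with '0' meaning Sunday.
sales_by_day = conn.execute(
    """
    SELECT strftime('%w', sales_date) AS day_of_week,
           SUM(sales_amount) AS total_sales
    FROM Sales
    GROUP BY day_of_week
    ORDER BY day_of_week
    """
).fetchall()

print(sales_by_day)
```

Note that day_of_week comes back as a string, so it would need a cast before being compared with integer weekday numbers.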
Game Play Analysis I
Problem:
You're given a table called GamePlays with the following columns:
user_id: ID of the user
game_id: ID of the game
session_start: Start time of the game session
session_end: End time of the game session
Your task is to find the total number of unique users who played a game for at least n minutes.
Solution:
SELECT
COUNT(DISTINCT user_id)
FROM GamePlays
WHERE
CAST((session_end - session_start) AS INTEGER) / 60 >= n;
Breakdown:
The DISTINCT keyword before user_id ensures that each user is counted only once.
The subtraction session_end - session_start gives the session length; the query assumes both columns hold numeric timestamps in seconds.
The CAST to INTEGER ensures the difference is treated as an integer, and dividing by 60 converts seconds to whole minutes.
The >= operator checks whether the session length in minutes is greater than or equal to n.
Real-World Applications:
This query can be used in various real-world scenarios:
Measuring User Engagement: Track the number of users who play a game for a significant amount of time to identify highly engaged players.
Retention Analysis: Determine how many users continue playing a game after a certain period of inactivity by measuring the number of unique users who have played in the past n minutes.
Marketing Campaigns: Target users who have recently played a game for a specific duration to promote new features or game updates.
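The duration filter can be exercised with a few sessions. This sketch assumes session_start and session_end are stored as integer timestamps in seconds, which is what the arithmetic in the query needs; the helper name count_long_players and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: both timestamps are integer seconds.
conn.execute(
    "CREATE TABLE GamePlays (user_id INTEGER, game_id INTEGER, "
    "session_start INTEGER, session_end INTEGER)"
)
conn.executemany(
    "INSERT INTO GamePlays VALUES (?, ?, ?, ?)",
    [
        (1, 10, 0, 3600),     # 60 minutes
        (1, 10, 4000, 4120),  # 2 minutes
        (2, 10, 0, 599),      # 9 whole minutes
        (3, 11, 0, 600),      # 10 minutes
    ],
)


def count_long_players(connection, n):
    """Count distinct users with at least one session of n or more whole minutes."""
    row = connection.execute(
        """
        SELECT COUNT(DISTINCT user_id)
        FROM GamePlays
        WHERE CAST((session_end - session_start) AS INTEGER) / 60 >= ?
        """,
        (n,),
    ).fetchone()
    return row[0]


players_10 = count_long_players(conn, 10)
players_61 = count_long_players(conn, 61)
print(players_10, players_61)
```

Because the division is integer division, a session of 599 seconds counts as 9 minutes and does not satisfy n = 10.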
Duplicate Emails
LeetCode Problem:
Duplicate Emails
Problem Statement:
Find all email addresses that appear more than once in a table of email addresses.
Table:
emails (email)
Example:
Input:
| email             |
|-------------------|
| john@example.com  |
| mary@example.com  |
| bob@example.com   |
| john@example.com  |
| alice@example.com |
Output:
| email            |
|------------------|
| john@example.com |
Solution:
The most straightforward solution is to use the COUNT() function to count the occurrences of each email address and then filter for those that appear more than once.
SELECT email
FROM emails
GROUP BY email
HAVING COUNT(*) > 1;
Explanation:
GROUP BY email: This groups the rows in the emails table by the email column.
COUNT(*): This counts the number of rows in each group.
HAVING COUNT(*) > 1: This filters for the groups that have more than one row, indicating that the email address appears more than once.
Example Usage:
This query can be used in a variety of real-world applications, such as:
Detecting duplicate email addresses in a user database to prevent multiple accounts from being created with the same email.
Identifying potential spam emails, as spammers often use the same email address to send multiple emails.
Analyzing email usage patterns to identify popular email domains or email service providers.
Students and Examinations
Problem:
Find the students who have taken all the exams.
SQL Query:
SELECT StudentID
FROM Exams
GROUP BY StudentID
HAVING COUNT(DISTINCT ExamID) = (SELECT COUNT(DISTINCT ExamID) FROM Exams);
Breakdown:
This query assumes that Exams holds one row per exam a student has taken, so the same ExamID can appear for several students.
The GROUP BY StudentID clause collects each student's exam rows into one group.
COUNT(DISTINCT ExamID) counts how many different exams that student has taken.
The scalar subquery (SELECT COUNT(DISTINCT ExamID) FROM Exams) counts how many different exams exist in total.
The HAVING clause keeps only the students whose count matches the total, that is, the students who have taken every exam.
Real-World Example:
This query can be used to determine which students have completed all of their exams in a particular semester. This information can be used to:
Identify students who need additional support or tutoring.
Provide feedback to instructors on the effectiveness of their exams.
Generate reports on student progress.
Code Implementation:
-- Create the Students table
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
Name VARCHAR(255)
);
-- Insert data into the Students table
INSERT INTO Students (StudentID, Name) VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Jack Jones');
-- Create the Exams table (one row per exam a student has taken)
CREATE TABLE Exams (
ExamID INT,
StudentID INT,
Score INT,
PRIMARY KEY (ExamID, StudentID),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID)
);
-- Insert data into the Exams table: exam 1 was taken by students 1 and 2, exam 2 by students 2 and 3
INSERT INTO Exams (ExamID, StudentID, Score) VALUES
(1, 1, 90),
(1, 2, 85),
(2, 2, 80),
(2, 3, 95);
-- Query to find students who have taken all the exams
SELECT StudentID
FROM Exams
GROUP BY StudentID
HAVING COUNT(DISTINCT ExamID) = (SELECT COUNT(DISTINCT ExamID) FROM Exams);
Output:
StudentID
---------
2
This output shows that the student with StudentID 2 is the only one who has taken every exam.
Activity Participants
LeetCode SQL Competitive Coding Problem:
Problem Statement:
Find the most active participants in a series of activities.
SQL Implementation:
WITH ActivityCounts AS (
SELECT participant, COUNT(*) AS activity_count
FROM ActivityParticipants
GROUP BY participant
),
RankedParticipants AS (
SELECT participant, activity_count,
DENSE_RANK() OVER (ORDER BY activity_count DESC) AS activity_rank
FROM ActivityCounts
)
SELECT participant
FROM RankedParticipants
WHERE activity_rank <= 3;
Explanation:
Step 1: Count Activity Participation
The ActivityCounts CTE counts the number of activities each participant took part in.
Step 2: Rank Participants
The RankedParticipants CTE ranks the participants by activity count in descending order. With DENSE_RANK, participants with the same activity count share the same rank, and the next distinct count receives the next consecutive rank, so there are no gaps in the ranking.
Step 3: Select Most Active Participants
The final query selects the participants in the top 3 ranks. Because ties share a rank, this can return more than three participants.
Real-World Applications:
Identifying the most engaged users on a social media platform.
Determining the top performers in a fitness competition.
Tracking the most active employees in a project management system.
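The effect of ties on DENSE_RANK is easiest to see with data. The following sketch runs the ranking in SQLite (window functions need SQLite 3.25 or newer); the ActivityParticipants table is assumed to have the columns participant and activity, and the rank alias is spelled activity_rank to avoid clashing with the RANK keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per (participant, activity) pair.
conn.execute("CREATE TABLE ActivityParticipants (participant TEXT, activity TEXT)")
activity_totals = {"alice": 4, "bob": 4, "carol": 3, "dave": 2, "erin": 1}
rows = [
    (name, f"activity_{i}")
    for name, total in activity_totals.items()
    for i in range(total)
]
conn.executemany("INSERT INTO ActivityParticipants VALUES (?, ?)", rows)

# Ties share a rank, and the next distinct count gets the next consecutive rank.
top_participants = [
    row[0]
    for row in conn.execute(
        """
        WITH ActivityCounts AS (
            SELECT participant, COUNT(*) AS activity_count
            FROM ActivityParticipants
            GROUP BY participant
        ),
        RankedParticipants AS (
            SELECT participant, activity_count,
                   DENSE_RANK() OVER (ORDER BY activity_count DESC) AS activity_rank
            FROM ActivityCounts
        )
        SELECT participant
        FROM RankedParticipants
        WHERE activity_rank <= 3
        ORDER BY activity_rank, participant
        """
    )
]

print(top_participants)
```

Here alice and bob share rank 1, carol is rank 2 and dave is rank 3, so four participants fall within the top three ranks.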
Consecutive Transactions with Increasing Amounts
Problem: Given a table of financial transactions, identify consecutive transactions where the amount increases.
Table:
transactions (id, amount, time)
Query:
WITH CTE AS (
SELECT id, amount, time, ROW_NUMBER() OVER (ORDER BY id) AS row_num
FROM transactions
)
SELECT t1.id, t1.amount
FROM CTE t1
JOIN CTE t2 ON t1.row_num + 1 = t2.row_num AND t1.amount < t2.amount;
Breakdown:
1. Common Table Expression (CTE):
Creates a temporary result set called CTE that adds a row_num column for each row.
2. Joining Rows:
Joins t1 and t2 based on the row_num column.
The condition t1.row_num + 1 = t2.row_num ensures that the rows are consecutive.
3. Filtering by Amount:
The condition t1.amount < t2.amount ensures that the amount increases between consecutive rows.
Real-World Applications:
Fraud Detection: Identifying suspicious patterns of increasing transactions can help detect fraudulent activity.
Financial Analysis: Tracking the increasing trend of transactions can provide insights into spending habits and investment strategies.
Customer Segmentation: Identifying customers with a history of consecutive increasing transactions can help segment them for targeted marketing campaigns.
Example:
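Transactions Table:
| id | amount | time       |
| -- | ------ | ---------- |
| 1  | 100    | 2023-01-01 |
| 2  | 120    | 2023-01-02 |
| 3  | 150    | 2023-01-03 |
| 4  | 70     | 2023-01-04 |
Result:
| id | amount |
| -- | ------ |
| 1  | 100    |
| 2  | 120    |
The query returns the first transaction of each increasing pair. Transaction 3 (150) is not returned because the transaction that follows it (70) is lower.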
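The example can be reproduced with the same four rows. The sketch below runs the query from this section in SQLite through Python (ROW_NUMBER needs SQLite 3.25 or newer); an ORDER BY is added only so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount INTEGER, time TEXT)")
conn.executemany(
    "INSERT INTO transactions (id, amount, time) VALUES (?, ?, ?)",
    [
        (1, 100, "2023-01-01"),
        (2, 120, "2023-01-02"),
        (3, 150, "2023-01-03"),
        (4, 70, "2023-01-04"),
    ],
)

# Pair each row with the next one by row number and keep the pairs where the amount rises.
increasing = conn.execute(
    """
    WITH CTE AS (
        SELECT id, amount, time, ROW_NUMBER() OVER (ORDER BY id) AS row_num
        FROM transactions
    )
    SELECT t1.id, t1.amount
    FROM CTE t1
    JOIN CTE t2 ON t1.row_num + 1 = t2.row_num AND t1.amount < t2.amount
    ORDER BY t1.id
    """
).fetchall()

print(increasing)
```

Only the earlier transaction of each increasing pair is selected; adding t2.id and t2.amount to the SELECT list would show both sides of the pair.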
Election Results
Question:
Implement a SQL query to find the top k candidates in an election based on the number of votes they received.
Optimal Solution:
WITH RankedCandidates AS (
SELECT candidate_id, SUM(votes) AS total_votes
FROM election_results
GROUP BY candidate_id
)
SELECT candidate_id, total_votes
FROM RankedCandidates
ORDER BY total_votes DESC
LIMIT k;
Breakdown:
Step 1: Create a Common Table Expression (CTE) to Rank Candidates
WITH RankedCandidates AS (
SELECT candidate_id, SUM(votes) AS total_votes
FROM election_results
GROUP BY candidate_id
)
Creates a CTE called RankedCandidates that calculates the total votes for each candidate.
Step 2: Select the Top K Candidates
SELECT candidate_id, total_votes
FROM RankedCandidates
ORDER BY total_votes DESC
LIMIT k;
Selects the candidate_id and total_votes from RankedCandidates and orders them in descending order by the number of votes.
Uses the LIMIT clause to return only the top k candidates.
Simplification:
The election results are stored in a table called election_results with columns like candidate_id and votes.
We first calculate the total votes for each candidate and store them in a CTE called RankedCandidates.
Then we select the top k candidates from RankedCandidates based on the number of votes they received.
Real-World Application:
This query can be used to determine the winners in elections, where the candidates with the highest number of votes are declared victors.
It can also be used to rank contestants in competitions, such as sports tournaments or talent shows.
Biggest Single Number
Problem Statement:
Find the largest single digit number in a column of a table.
SQL Query:
SELECT MAX(SUBSTRING(num_column, 1, 1)) AS largest_digit
FROM table_name;
Breakdown:
SUBSTRING(num_column, 1, 1): This function extracts the first character (the leading digit) from num_column.
MAX(): This function finds the maximum value among the extracted digits.
Example:
Consider the following table:
| num_column |
| ---------- |
| 123        |
| 456        |
| 789        |
Running the query returns:
| largest_digit |
| ------------- |
| 7             |
Explanation:
The query extracts the first character from each row of num_column. The extracted digits are '1', '4', and '7'. The MAX() function then finds the maximum value among these digits, which is '7'. The digit 9 in 789 is not considered, because only the leading character of each value is extracted.
Real-World Applications:
Finding the highest-valued digit in a set of numeric codes.
Analyzing financial data to identify the largest single digit in a series of values.
Determining the most frequent single digit in a dataset for statistical purposes.
NPV Queries
Problem Statement:
Given a list of cash flows, calculate the Net Present Value (NPV) using a specified discount rate.
SQL Implementation:
-- Define the cash flow table
CREATE TABLE CashFlows (
Period INT, -- Period number
CashFlow INT -- Cash flow amount
);
-- Insert sample data
INSERT INTO CashFlows (Period, CashFlow) VALUES
(0, 1000),
(1, 500),
(2, -250),
(3, 100),
(4, 50);
-- Calculate the NPV (DiscountRate is a placeholder for the chosen rate, for example 0.10)
SELECT SUM(CashFlow / POWER(1.0 + DiscountRate, Period)) AS NPV
FROM CashFlows
WHERE Period >= 0;
Breakdown and Explanation:
Cash Flow Table: We create a table named CashFlows to store the period and cash flow amount for each period.
Sample Data: We insert sample cash flow data into the table: a cash flow of $1000 in period 0, subsequent cash inflows and outflows, and a final cash flow of $50 in period 4. In a real project, an initial investment would normally be entered as a negative amount.
NPV Calculation: The NPV is calculated using the formula:
NPV = ∑ (CashFlow / (1 + DiscountRate)^Period)
CashFlow is the cash flow amount for a specific period.
DiscountRate is the specified discount rate.
Period is the period number.
SQL Query: We use a SQL query to calculate the NPV by summing the present value of each cash flow, using the POWER function to apply the discount rate. DiscountRate is not a column of CashFlows, so it must be replaced by a literal or a query parameter. The condition Period >= 0 keeps period 0 and every later period, so the period-0 cash flow is included undiscounted.
Real-World Applications:
NPV is a widely used financial metric to evaluate investment projects. It helps businesses determine the profitability of a project by considering the time value of money. Potential applications include:
Analyzing the financial feasibility of a new product launch.
Determining the return on investment for a marketing campaign.
Evaluating capital budgeting decisions, such as purchasing new equipment or constructing a new facility.
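The NPV formula can be checked numerically against the sample cash flows. This is a sketch under stated assumptions: the discount rate of 10% is hypothetical, the rate is passed as a query parameter instead of the DiscountRate placeholder, and POWER is registered from Python because not every SQLite build includes the math functions.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register POWER explicitly so the query does not depend on the SQLite build.
conn.create_function("POWER", 2, lambda base, exponent: base ** exponent)

conn.execute("CREATE TABLE CashFlows (Period INTEGER, CashFlow INTEGER)")
cash_flows = [(0, 1000), (1, 500), (2, -250), (3, 100), (4, 50)]
conn.executemany("INSERT INTO CashFlows (Period, CashFlow) VALUES (?, ?)", cash_flows)

discount_rate = 0.10  # hypothetical rate, chosen only for illustration

# Sum of discounted cash flows, computed in SQL.
npv_sql = conn.execute(
    "SELECT SUM(CashFlow / POWER(1.0 + ?, Period)) FROM CashFlows WHERE Period >= 0",
    (discount_rate,),
).fetchone()[0]

# The same formula evaluated directly in Python as a cross-check.
npv_python = sum(cf / (1.0 + discount_rate) ** period for period, cf in cash_flows)

print(round(npv_sql, 2), round(npv_python, 2))
```

Both computations apply the same formula term by term, so they should agree up to floating-point rounding.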
Employees Earning More Than Their Managers
Problem Statement
Given a table Employees with columns id, name, salary, and manager_id, find all employees who earn more than their managers.
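Example
| id | name  | salary | manager_id |
| -- | ----- | ------ | ---------- |
| 1  | John  | 1000   | 2          |
| 2  | Mary  | 980    | null       |
| 3  | Bob   | 950    | 2          |
| 4  | Alice | 1100   | 2          |
Output:
| id | name  | salary | manager_id |
| -- | ----- | ------ | ---------- |
| 1  | John  | 1000   | 2          |
| 4  | Alice | 1100   | 2          |
John and Alice both earn more than their manager Mary (980), while Bob (950) does not. Mary has no manager and is therefore never returned.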
Solution
Find all employees and their managers.
Keep only the employees who earn more than their managers.
SELECT e.id, e.name, e.salary, e.manager_id
FROM Employees e
INNER JOIN Employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
Explanation
INNER JOIN the Employees table with itself, matching each employee's manager_id to the manager's id. This pairs every employee with their manager; employees without a manager are dropped by the inner join.
The WHERE clause keeps only the employees who earn more than their managers.
Real-World Applications
This query can be used to identify cases where an employee is paid more than their manager. It can also be used when reviewing whether the pay structure within an organization is fair.
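The self-join can be tried on a small table. The following sketch uses Python's sqlite3 module with hypothetical rows in which the manager (Mary) earns 980, so that some of her reports earn more than she does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, manager_id INTEGER)"
)
# Hypothetical sample rows; None is stored as NULL (no manager).
conn.executemany(
    "INSERT INTO Employees (id, name, salary, manager_id) VALUES (?, ?, ?, ?)",
    [
        (1, "John", 1000, 2),
        (2, "Mary", 980, None),
        (3, "Bob", 950, 2),
        (4, "Alice", 1100, 2),
    ],
)

# Pair each employee (e) with their manager (m) and compare salaries.
higher_paid = conn.execute(
    """
    SELECT e.id, e.name, e.salary, e.manager_id
    FROM Employees e
    INNER JOIN Employees m ON e.manager_id = m.id
    WHERE e.salary > m.salary
    ORDER BY e.id
    """
).fetchall()

print(higher_paid)
```

Mary never appears on the employee side of the join because her manager_id is NULL, and NULL does not match any id.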
The Users That Are Eligible for Discount
Problem Statement:
Find all users who are eligible for a discount.
Solution:
/* Users Table */
CREATE TABLE Users (
id INT NOT NULL,
email VARCHAR(255) NOT NULL,
discount TINYINT NOT NULL DEFAULT 0,
PRIMARY KEY (id)
);
INSERT INTO Users (id, email, discount) VALUES
(1, 'user1@example.com', 0),
(2, 'user2@example.com', 1),
(3, 'user3@example.com', 0),
(4, 'user4@example.com', 1);
/* Orders Table */
CREATE TABLE Orders (
id INT NOT NULL,
user_id INT NOT NULL,
order_date DATETIME NOT NULL,
total_amount DECIMAL(10, 2) NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (user_id) REFERENCES Users (id)
);
INSERT INTO Orders (id, user_id, order_date, total_amount) VALUES
(1, 1, '2022-01-01', 100.00),
(2, 2, '2022-01-02', 200.00),
(3, 3, '2022-01-03', 300.00),
(4, 4, '2022-01-04', 400.00);
/* Query to find eligible users */
SELECT DISTINCT u.id, u.email
FROM Users u
JOIN Orders o ON u.id = o.user_id
WHERE o.total_amount >= 200.00 AND u.discount = 0;
Explanation:
Create Tables (Users & Orders): We start by creating the Users and Orders tables with their respective columns.
Insert Data into Tables: We insert sample data into both tables to demonstrate how the query will work.
Discount Eligibility Query: The main query retrieves eligible users by joining the Users and Orders tables and filtering based on the following conditions:
o.total_amount >= 200.00: The user has placed at least one order with a total amount of $200 or more.
u.discount = 0: The user is not currently receiving a discount.
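Output:
| id | email             |
| -- | ----------------- |
| 3  | user3@example.com |
User 3 has placed an order of $200 or more and is not yet receiving a discount, so this user is eligible. Users 2 and 4 also have qualifying orders but are excluded because their discount value is 1, and user 1 is excluded because the only order is below $200.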
Real-World Application:
This query can be used in e-commerce websites to identify customers who are eligible for discounts based on their purchase history. By offering discounts to such customers, businesses can encourage repeat purchases and increase customer loyalty.
Unique Orders and Customers Per Month
Problem:
You are given a table called orders that contains the following columns:
order_id | user_id | order_date
You want to find the number of unique orders and customers per month.
Solution:
To solve this problem, we can use the following SQL query:
SELECT
strftime('%Y-%m', order_date) AS month,
COUNT(DISTINCT order_id) AS num_orders,
COUNT(DISTINCT user_id) AS num_customers
FROM
orders
GROUP BY
month
ORDER BY
month;
Explanation:
The strftime('%Y-%m', order_date) expression extracts the year and month from the order_date column. The COUNT(DISTINCT order_id) expression counts the number of distinct order IDs, which gives us the number of unique orders. The COUNT(DISTINCT user_id) expression counts the number of distinct user IDs, which gives us the number of unique customers. The GROUP BY month clause groups the results by month. The ORDER BY month clause orders the results by month.
Real-World Example:
This query can be used to analyze the number of unique orders and customers per month for an online store. This information can be used to identify trends and patterns in customer behavior, such as when orders are most likely to be placed or when new customers are most likely to sign up.
Potential Applications:
This query can be used for a variety of purposes, including:
Identifying trends and patterns in customer behavior
Forecasting demand
Optimizing marketing campaigns
Improving customer service
Count Apples and Oranges
Problem Statement:
You have two tables:
Apples: Contains information about apples, including their id and quantity.
Oranges: Contains information about oranges, including their id and quantity.
Write a query to count the total number of apples and oranges in the database.
Solution:
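SELECT
    SUM(quantity) AS total_fruits
FROM (
    SELECT quantity FROM Apples
    UNION ALL
    SELECT quantity FROM Oranges
) AS all_fruits;
Breakdown:
The main query uses a subquery to combine the quantity values from the Apples and Oranges tables into a single derived table. UNION ALL is used so that equal quantities are not merged as duplicates, and the derived table is given an alias (all_fruits) because several databases require one.
The SUM(quantity) function adds up the quantities in the combined table, which gives us the total number of fruits. COUNT(*) would only count the rows, not the pieces of fruit recorded in each row.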
Example:
| Total Fruits |
| ------------ |
| 20 |
Applications:
This query can be used in the following real-world scenarios:
Inventory management: To track the total number of apples and oranges in a warehouse.
Data analysis: To identify trends and patterns in the production and consumption of fruits.
Sales forecasting: To predict future demand for apples and oranges based on historical data.
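The difference between counting rows and adding up quantities shows up with a few sample rows. This sketch uses Python's sqlite3 module; the quantities are invented so that the total matches the 20 shown in the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Apples (id INTEGER PRIMARY KEY, quantity INTEGER)")
conn.execute("CREATE TABLE Oranges (id INTEGER PRIMARY KEY, quantity INTEGER)")
conn.executemany("INSERT INTO Apples (id, quantity) VALUES (?, ?)", [(1, 5), (2, 7)])
conn.executemany("INSERT INTO Oranges (id, quantity) VALUES (?, ?)", [(1, 8)])

# Stack both tables with UNION ALL, then add up the quantities.
total_fruits = conn.execute(
    """
    SELECT SUM(quantity)
    FROM (
        SELECT quantity FROM Apples
        UNION ALL
        SELECT quantity FROM Oranges
    ) AS all_fruits
    """
).fetchone()[0]

# For comparison: the number of rows, which is not the number of fruits.
total_rows = conn.execute(
    """
    SELECT COUNT(*)
    FROM (
        SELECT quantity FROM Apples
        UNION ALL
        SELECT quantity FROM Oranges
    ) AS all_fruits
    """
).fetchone()[0]

print(total_fruits, total_rows)
```

With these rows the sum is 20 fruits, while the row count is only 3.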
Friend Requests II: Who Has the Most Friends
Problem Statement:
You are given a table Friend_Requests that represents friend requests between users on a social media platform. The table has the following schema:
CREATE TABLE Friend_Requests (
requester_id INT NOT NULL,
receiver_id INT NOT NULL,
status VARCHAR(10) NOT NULL -- 'pending', 'accepted', 'rejected'
);
Your task is to write a SQL query to find the user with the most accepted friends.
Solution:
-- List every accepted friendship once from each side
WITH all_friends AS (
    SELECT requester_id AS user_id FROM Friend_Requests WHERE status = 'accepted'
    UNION ALL
    SELECT receiver_id AS user_id FROM Friend_Requests WHERE status = 'accepted'
)
-- Count friends per user and keep the user with the highest count
SELECT user_id, COUNT(*) AS friend_count
FROM all_friends
GROUP BY user_id
ORDER BY friend_count DESC
LIMIT 1;
Explanation:
List Accepted Friendships: An accepted request makes both users friends, so the all_friends CTE lists each accepted request twice, once under the requester and once under the receiver. Grouping by the pair (requester_id, receiver_id) would only count requests per pair and would ignore friendships in which the user was the receiver.
Find User with Most Friends: The main query groups the combined rows by user_id and counts them, which gives the number of accepted friends per user. ORDER BY friend_count DESC with LIMIT 1 returns the user with the highest count. If several users tie for the highest count, only one of them is returned.
Real-World Application:
This query can be used in a social media platform to identify the most popular users. This information can be used for various purposes, such as:
Recommending users to follow
Displaying leaderboards of popular users
Identifying influencers for targeted advertising
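Counting friendships from both sides of a request can be sketched as follows. The example runs in SQLite through Python and assumes each pair of users appears in at most one accepted request; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Friend_Requests ("
    "requester_id INT NOT NULL, receiver_id INT NOT NULL, status VARCHAR(10) NOT NULL)"
)
conn.executemany(
    "INSERT INTO Friend_Requests (requester_id, receiver_id, status) VALUES (?, ?, ?)",
    [
        (1, 2, "accepted"),
        (1, 3, "accepted"),
        (2, 3, "accepted"),
        (3, 4, "accepted"),
        (4, 1, "pending"),  # not counted: the request was never accepted
    ],
)

# Each accepted request contributes one friend to the requester and one to the receiver.
most_friends = conn.execute(
    """
    WITH all_friends AS (
        SELECT requester_id AS user_id FROM Friend_Requests WHERE status = 'accepted'
        UNION ALL
        SELECT receiver_id AS user_id FROM Friend_Requests WHERE status = 'accepted'
    )
    SELECT user_id, COUNT(*) AS friend_count
    FROM all_friends
    GROUP BY user_id
    ORDER BY friend_count DESC
    LIMIT 1
    """
).fetchone()

print(most_friends)
```

User 3 is friends with users 1, 2 and 4, which is more than any other user in this sample.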
Count Student Number in Departments
Problem Statement:
Given two tables, Students and Departments, find the count of students in each department.
Table Schema:
Students:
student_id (int)
name (string)
department_id (int)
Departments:
department_id (int)
name (string)
Implementation:
-- Count the number of students in each department
SELECT
d.name AS department_name,
COUNT(s.student_id) AS number_of_students
FROM
Students s
JOIN
Departments d
ON
s.department_id = d.department_id
GROUP BY
d.name
ORDER BY
d.name;
Explanation:
JOIN the Students and Departments tables:
The JOIN clause connects the Students and Departments tables on the common column department_id. This produces one row per student, extended with the columns of that student's department. Because this is an inner join, departments without any students do not appear in the result.
Count the number of students in each department:
The COUNT() function counts the student rows in each department. This gives us the number of students in each department.
GROUP BY department name:
The GROUP BY clause groups the results by d.name, the column returned as department_name. This means that the results show the count of students for each department separately.
ORDER BY department name:
The ORDER BY clause sorts the results by department name in ascending order.
Output:
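| department_name  | number_of_students |
| ---------------- | ------------------ |
| Computer Science | 50                 |
| Mathematics      | 30                 |
| Physics          | 20                 |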
Real-World Applications:
This query can be used in various real-world scenarios, such as:
Generating reports on student enrollment by department
Allocating resources (e.g., teachers, classrooms) based on student numbers
Analyzing student trends and patterns within departments
Product Sales Analysis IV
Problem Statement
You are given a table named ProductSales that contains the following columns:
product_id (integer) - The unique identifier of the product.
quantity_sold (integer) - The quantity of the product sold.
sale_date (date) - The date when the product was sold.
You are asked to write a SQL query that calculates the total quantity sold and the average sale date for each unique product.
Solution
SELECT product_id, SUM(quantity_sold) AS total_quantity_sold, datetime(AVG(julianday(sale_date))) AS average_sale_date
FROM ProductSales
GROUP BY product_id;
Explanation
The above SQL query uses the GROUP BY clause to group the rows in the ProductSales table by the product_id column. This means that for each unique product_id, the query will calculate the total quantity sold and the average sale date.
The SUM() function is used to calculate the total quantity sold for each product. The average sale date cannot be computed portably by applying AVG() directly to a date column, so the date is first converted to a number, averaged, and converted back. In SQLite, julianday() turns a date into a fractional day number and datetime() turns the averaged number back into a date and time; other databases need their own date-to-number conversion.
The GROUP BY clause is important because it makes SUM() and AVG() operate on each product's rows separately, so the query returns exactly one row for each unique product_id.
Example
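Consider the following ProductSales table:
| product_id | quantity_sold | sale_date |
|---|---|---|
| 1 | 10 | 2023-01-01 |
| 1 | 20 | 2023-01-02 |
| 2 | 30 | 2023-01-03 |
Running the query (shown here in MySQL syntax, averaging the dates as Unix timestamps):
SELECT product_id,
       SUM(quantity_sold) AS total_quantity_sold,
       FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(sale_date))) AS average_sale_date
FROM ProductSales
GROUP BY product_id;
returns the following results:
| product_id | total_quantity_sold | average_sale_date |
|---|---|---|
| 1 | 30 | 2023-01-01 12:00:00 |
| 2 | 30 | 2023-01-03 00:00:00 |
The average of 2023-01-01 and 2023-01-02 falls at noon on 2023-01-01, halfway between the two dates.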
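As a rough cross-check of the date-averaging idea, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption made only to keep the example self-contained: it has no UNIX_TIMESTAMP(), so the sketch converts dates with julianday() and back with datetime(). The helper name average_sale_dates is hypothetical.

```python
import sqlite3


def average_sale_dates(rows):
    """Return (product_id, total_quantity_sold, average_sale_date) per product."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ProductSales "
        "(product_id INTEGER, quantity_sold INTEGER, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO ProductSales VALUES (?, ?, ?)", rows)
    # julianday() turns an ISO date into a number that AVG() can average;
    # datetime() turns the averaged number back into a timestamp string.
    result = conn.execute(
        """
        SELECT product_id,
               SUM(quantity_sold) AS total_quantity_sold,
               datetime(AVG(julianday(sale_date))) AS average_sale_date
        FROM ProductSales
        GROUP BY product_id
        ORDER BY product_id
        """
    ).fetchall()
    conn.close()
    return result


sample_sales = [(1, 10, "2023-01-01"), (1, 20, "2023-01-02"), (2, 30, "2023-01-03")]
for row in average_sale_dates(sample_sales):
    print(row)
```

The same pattern applies in other databases: convert the date to a number, average it, and convert it back.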
Real-World Applications
The query provided above can be used in a variety of real-world applications, such as:
Inventory management: The query can be used to determine the total quantity of a product that has been sold and the average date on which it was sold. This information can be used to manage inventory levels and ensure that there is always enough stock on hand.
Sales forecasting: The query can be used to forecast future sales by identifying trends in the total quantity sold and the average sale date. This information can be used to make decisions about product pricing, marketing, and staffing.
Product performance analysis: The query can be used to compare the performance of different products. This information can be used to identify products that are selling well and products that are not selling well.
Highest Salaries Difference
Problem Statement:
Find the difference between the maximum and minimum salaries for each department.
SQL Query:
SELECT department, MAX(salary) - MIN(salary) AS salary_difference
FROM employee
GROUP BY department;
Breakdown and Explanation:
SELECT department, MAX(salary) - MIN(salary) AS salary_difference:
SELECT department: Selects the department column.
MAX(salary): Calculates the maximum salary for each department.
MIN(salary): Calculates the minimum salary for each department.
-: Subtracts the minimum salary from the maximum salary to find the difference.
AS salary_difference: Aliases the result as salary_difference.
FROM employee:
Selects data from the
employee
table.
GROUP BY department:
Groups the results by the
department
column.
Real-World Application:
This query can be used in HR systems to analyze salary disparities within departments. It can help identify departments where there is a significant gap between the highest and lowest salaries. This information can be used to address potential pay inequality issues.
Example:
Consider the following employee
table:
| id | department | salary |
|---|---|---|
| 1 | Sales | 100000 |
| 2 | Sales | 80000 |
| 3 | Engineering | 120000 |
| 4 | Engineering | 90000 |
Query Result:
| department | salary_difference |
|---|---|
| Sales | 20000 |
| Engineering | 30000 |
This result shows that the Sales department has a salary difference of $20,000, while the Engineering department has a salary difference of $30,000.
Team Scores in Football Tournament
Problem Statement:
In a football tournament, there are two teams in each match: a home team and an away team. Each team earns 3 points for a win, 1 point for a draw, and 0 points for a loss, regardless of whether it plays at home or away.
Given a table Matches
that contains the results of the matches played in the tournament, you need to find the final scores of each team.
Table Structure:
CREATE TABLE Matches (
match_id INT PRIMARY KEY,
home_team_id INT,
away_team_id INT,
home_team_score INT,
away_team_score INT
);
Example:
INSERT INTO Matches (match_id, home_team_id, away_team_id, home_team_score, away_team_score) VALUES
(1, 1, 2, 2, 1),
(2, 3, 4, 0, 3),
(3, 5, 6, 1, 1);
Output:
| team_id | total_score |
|---|---|
| 1 | 3 |
| 2 | 0 |
| 3 | 0 |
| 4 | 3 |
| 5 | 1 |
| 6 | 1 |
Solution:
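To find the total score for each team, we first turn every match into two rows, one for the home team and one for the away team, using a CASE expression to determine the points each side earned. We then use the SUM() function to add up the points for each team.
SELECT team_id, SUM(points) AS total_score
FROM (
    SELECT home_team_id AS team_id,  -- points earned as the home team
           CASE WHEN home_team_score > away_team_score THEN 3
                WHEN home_team_score = away_team_score THEN 1 ELSE 0 END AS points
    FROM Matches
    UNION ALL
    SELECT away_team_id,             -- points earned as the away team
           CASE WHEN away_team_score > home_team_score THEN 3
                WHEN away_team_score = home_team_score THEN 1 ELSE 0 END
    FROM Matches
) AS team_points
GROUP BY team_id
ORDER BY team_id;
Breakdown:
The Matches table has no team_id column, so the subquery builds one: the first SELECT lists each home team with its points, and the second lists each away team with its points.
Each CASE expression assigns points from that team's point of view: 3 if its score is higher than the opponent's, 1 if the scores are equal, and 0 otherwise.
UNION ALL stacks the two sets of rows and keeps duplicates, which matters because a team appears once for every match it plays.
We group the combined rows by team_id using the GROUP BY clause and use the SUM() function to calculate the total points earned by each team.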
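The scoring rules can be checked against the example data with a short sketch using Python's built-in sqlite3 module. SQLite and the helper name team_scores are assumptions made for illustration; the idea is to give each team one row per match it played (home rows and away rows stacked with UNION ALL) before summing.

```python
import sqlite3

TEAM_SCORES_SQL = """
SELECT team_id, SUM(points) AS total_score
FROM (
    -- One row per match for the home side
    SELECT home_team_id AS team_id,
           CASE WHEN home_team_score > away_team_score THEN 3
                WHEN home_team_score = away_team_score THEN 1
                ELSE 0 END AS points
    FROM Matches
    UNION ALL
    -- One row per match for the away side
    SELECT away_team_id AS team_id,
           CASE WHEN away_team_score > home_team_score THEN 3
                WHEN away_team_score = home_team_score THEN 1
                ELSE 0 END AS points
    FROM Matches
)
GROUP BY team_id
ORDER BY team_id
"""


def team_scores(matches):
    """Return [(team_id, total_score), ...] for the given match rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Matches (match_id INTEGER PRIMARY KEY, home_team_id INTEGER, "
        "away_team_id INTEGER, home_team_score INTEGER, away_team_score INTEGER)"
    )
    conn.executemany("INSERT INTO Matches VALUES (?, ?, ?, ?, ?)", matches)
    rows = conn.execute(TEAM_SCORES_SQL).fetchall()
    conn.close()
    return rows


example_matches = [(1, 1, 2, 2, 1), (2, 3, 4, 0, 3), (3, 5, 6, 1, 1)]
for team_id, total_score in team_scores(example_matches):
    print(team_id, total_score)
```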
Real-World Applications:
This query can be used to find the final standings in a football tournament or league. It can also be used to track the performance of individual teams over time.
Merge Overlapping Events in the Same Hall
Problem Statement:
You are given a table event
that stores information about events happening in different halls.
CREATE TABLE event (
id INT PRIMARY KEY,
hall_id INT,
start_time DATETIME,
end_time DATETIME
);
You need to merge overlapping events in the same hall into a single event.
Solution:
SELECT hall_id, MIN(start_time) AS start_time, MAX(end_time) AS end_time
FROM event
GROUP BY hall_id
ORDER BY hall_id, start_time;
Explanation:
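Group By Hall ID: Group the events by their hall_id. This gives one group per hall containing all of its events.
Calculate Minimum Start Time: For each hall, find the minimum start_time. This becomes the start time of the merged event.
Calculate Maximum End Time: For each hall, find the maximum end_time. This becomes the end time of the merged event.
Order by Hall ID and Start Time: Finally, order the merged events by hall_id and start_time.
Note that this query collapses every event in a hall into a single span, so it is a correct merge only when all of a hall's events form one overlapping chain. If a hall has a gap between events, the gap is swallowed: in the example below, the hall 1 event starting on 2022-05-05 does not overlap the two earlier events, so a strict merge would report 2022-05-01 to 2022-05-04 and 2022-05-05 to 2022-05-07 as separate events.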
Example:
| hall_id | start_time | end_time |
|---------|------------|----------|
| 1 | 2022-05-01 | 2022-05-03 |
| 1 | 2022-05-02 | 2022-05-04 |
| 1 | 2022-05-05 | 2022-05-07 |
| 2 | 2022-06-01 | 2022-06-02 |
| 2 | 2022-06-02 | 2022-06-03 |
Output:
| hall_id | start_time | end_time |
|---------|------------|----------|
| 1 | 2022-05-01 | 2022-05-07 |
| 2 | 2022-06-01 | 2022-06-03 |
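Grouping by hall_id alone merges every event in a hall, including events separated by a gap. One common way to merge only the events that actually overlap is the gaps-and-islands pattern, sketched below with Python's built-in sqlite3 module (an assumption for the sake of a runnable example; it needs SQLite 3.25 or later for window functions). The CTE names, the merge_events helper, and the choice to treat events that share a boundary as overlapping are all illustrative.

```python
import sqlite3

MERGE_SQL = """
WITH ordered AS (
    -- For each event, the latest end_time among all earlier events in the hall
    SELECT hall_id, start_time, end_time,
           MAX(end_time) OVER (
               PARTITION BY hall_id
               ORDER BY start_time, end_time
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS prev_max_end
    FROM event
), flagged AS (
    -- An event starts a new island when it begins after everything before it ended
    SELECT hall_id, start_time, end_time,
           SUM(CASE WHEN prev_max_end IS NULL OR start_time > prev_max_end
                    THEN 1 ELSE 0 END) OVER (
               PARTITION BY hall_id
               ORDER BY start_time, end_time
           ) AS island
    FROM ordered
)
SELECT hall_id, MIN(start_time) AS merged_start, MAX(end_time) AS merged_end
FROM flagged
GROUP BY hall_id, island
ORDER BY hall_id, merged_start
"""


def merge_events(events):
    """events: iterable of (hall_id, start_time, end_time) with ISO date strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event (id INTEGER PRIMARY KEY, hall_id INTEGER, "
        "start_time TEXT, end_time TEXT)"
    )
    conn.executemany(
        "INSERT INTO event (hall_id, start_time, end_time) VALUES (?, ?, ?)", events
    )
    rows = conn.execute(MERGE_SQL).fetchall()
    conn.close()
    return rows


example_events = [
    (1, "2022-05-01", "2022-05-03"),
    (1, "2022-05-02", "2022-05-04"),
    (1, "2022-05-05", "2022-05-07"),
    (2, "2022-06-01", "2022-06-02"),
    (2, "2022-06-02", "2022-06-03"),
]
for row in merge_events(example_events):
    print(row)
```

On the example data this keeps the hall 1 event that starts on 2022-05-05 separate, because it begins after both earlier events have ended.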
Real-World Applications:
This query can be used in real-world applications such as:
Event Management: To merge overlapping events in a calendar to avoid scheduling conflicts.
Room Booking: To find available time slots in a conference room by merging overlapping bookings.
Resource Allocation: To optimize resource utilization by merging overlapping tasks.
Number of Calls Between Two Persons
Problem Statement: Given a table CallLog that records the call history of a group of people, find the number of calls between two specific persons, A and B.
Table Schema:
CallLog (
CallerId INT,
ReceiverId INT,
CallTime TIMESTAMP
)
SQL Query:
SELECT COUNT(*) AS NumberOfCalls
FROM CallLog
WHERE (CallerId = A AND ReceiverId = B) OR (CallerId = B AND ReceiverId = A);
Explanation:
The query counts the rows where A was the caller and B was the receiver, or vice versa.
The OR keyword combines the two conditions into a single expression, so calls in either direction are counted.
Example:
| CallerId | ReceiverId | CallTime |
|---|---|---|
| 1 | 2 | 2023-03-08 12:34:56 |
| 2 | 3 | 2023-03-09 11:12:34 |
| 3 | 4 | 2023-03-10 10:23:15 |
| 4 | 1 | 2023-03-11 09:34:26 |
To find the number of calls between person 1 and person 2, the query would be:
SELECT COUNT(*) AS NumberOfCalls
FROM CallLog
WHERE (CallerId = 1 AND ReceiverId = 2) OR (CallerId = 2 AND ReceiverId = 1);
The result would be 1, as only the first row records a call between these two persons. The last row is a call between persons 4 and 1, so it is not counted.
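In application code, A and B would normally be passed as query parameters rather than written into the SQL text. The sketch below uses Python's built-in sqlite3 module, which is an assumption made for illustration, and a hypothetical helper named count_calls, with the example rows from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CallLog (CallerId INTEGER, ReceiverId INTEGER, CallTime TEXT)")
conn.executemany(
    "INSERT INTO CallLog VALUES (?, ?, ?)",
    [
        (1, 2, "2023-03-08 12:34:56"),
        (2, 3, "2023-03-09 11:12:34"),
        (3, 4, "2023-03-10 10:23:15"),
        (4, 1, "2023-03-11 09:34:26"),
    ],
)


def count_calls(connection, person_a, person_b):
    """Count calls between two persons, in either direction."""
    sql = """
        SELECT COUNT(*) AS NumberOfCalls
        FROM CallLog
        WHERE (CallerId = ? AND ReceiverId = ?)
           OR (CallerId = ? AND ReceiverId = ?)
    """
    # The pair is bound twice: once as (A, B) and once as (B, A).
    return connection.execute(sql, (person_a, person_b, person_b, person_a)).fetchone()[0]


print(count_calls(conn, 1, 2))
print(count_calls(conn, 1, 4))
```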
Real-World Applications:
Telecom companies can use this query to analyze call patterns and identify frequently called contacts.
Law enforcement agencies can use it to investigate communication networks and track relationships between individuals.
Product Price at a Given Date
Problem: You have a table Product
that contains the following columns:
product_id (int)
price (float)
date (date)
You want to find the price of a product on a given date.
Solution:
SELECT price
FROM Product
WHERE product_id = ? AND date = ?;
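Explanation: This query uses the equality operator (=) to find the row in the Product table that has the specified product_id and date, and returns the price column from that row. It therefore returns a result only when a row exists for exactly that date. If prices are recorded only when they change, the lookup should instead take the most recent row on or before the given date.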
Example:
SELECT price
FROM Product
WHERE product_id = 1 AND date = '2023-01-01';
This query would return the price of the product with product_id
1 on January 1, 2023.
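When the table records a price only on the dates it changes, an exact-date match finds nothing for the days in between. A minimal sketch of the "most recent price on or before the date" variant follows, using Python's built-in sqlite3 module. SQLite, the sample rows, and the helper name price_on are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (product_id INTEGER, price REAL, date TEXT)")
# Hypothetical price history: product 1 changes price on 2023-01-10.
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [(1, 9.99, "2023-01-01"), (1, 12.5, "2023-01-10"), (2, 4.0, "2023-01-05")],
)


def price_on(connection, product_id, on_date):
    """Return the latest recorded price on or before on_date, or None."""
    row = connection.execute(
        """
        SELECT price
        FROM Product
        WHERE product_id = ? AND date <= ?
        ORDER BY date DESC
        LIMIT 1
        """,
        (product_id, on_date),
    ).fetchone()
    return row[0] if row else None


print(price_on(conn, 1, "2023-01-05"))
print(price_on(conn, 1, "2022-12-31"))
```

ISO-formatted dates (YYYY-MM-DD) sort correctly as text, which is why the comparison and ordering work on a TEXT column here.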
Real-World Applications: This query can be used in a variety of real-world applications, such as:
Tracking the price history of a product
Finding the lowest price for a product on a given date
Generating invoices for products sold on a given date
Customers with Maximum Number of Transactions on Consecutive Days
Problem Statement:
Given a table of customer transactions transactions
with columns customer_id
, transaction_date
, and amount
, find the customers who have the maximum number of consecutive days with at least one transaction.
Best & Performant SQL Solution:
WITH DistinctTransactionDays AS (
    -- Several transactions on the same day count as one day
    SELECT DISTINCT
        customer_id,
        transaction_date
    FROM
        transactions
), StreakKeys AS (
    -- Consecutive dates minus their row numbers collapse to the same date
    SELECT
        customer_id,
        transaction_date,
        DATE_SUB(
            transaction_date,
            INTERVAL ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date) DAY
        ) AS streak_key
    FROM
        DistinctTransactionDays
), Streaks AS (
    SELECT
        customer_id,
        COUNT(*) AS consecutive_days
    FROM
        StreakKeys
    GROUP BY
        customer_id,
        streak_key
)
SELECT DISTINCT
    customer_id
FROM
    Streaks
WHERE
    consecutive_days = (SELECT MAX(consecutive_days) FROM Streaks)
ORDER BY
    customer_id;
Explanation:
Deduplicate Days: The DistinctTransactionDays CTE keeps one row per customer and day, so that several transactions on the same day do not break or inflate a streak.
Build Streak Keys: The StreakKeys CTE numbers each customer's days in date order and subtracts that row number (in days) from the date. Within a run of consecutive days both the date and the row number increase by one, so the difference (streak_key) stays constant; a gap in the dates changes it.
Measure Streaks: The Streaks CTE groups by customer_id and streak_key and counts the rows in each group, which gives the length of every run of consecutive days.
Select the Maximum: The final query returns the customers who have a streak equal to the longest streak found across all customers. DISTINCT is needed because a customer may reach that length more than once.
The query assumes that transaction_date is a DATE column; a DATETIME column would need to be truncated to a date first.
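The usual way to measure runs of consecutive days is to subtract a per-customer row number from each date, so that every day in one run maps to the same key. The sketch below shows that idea with Python's built-in sqlite3 module (an assumption; it needs SQLite 3.25 or later for ROW_NUMBER()). SQLite has no DATE_SUB(), so julianday() supplies a day number instead. The sample rows and the helper name customers_with_longest_streak are hypothetical.

```python
import sqlite3

LONGEST_STREAK_SQL = """
WITH distinct_days AS (
    SELECT DISTINCT customer_id, transaction_date
    FROM transactions
), keyed AS (
    -- Day number minus row number is constant within a run of consecutive days
    SELECT customer_id,
           julianday(transaction_date)
             - ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date)
             AS streak_key
    FROM distinct_days
), streaks AS (
    SELECT customer_id, COUNT(*) AS streak_length
    FROM keyed
    GROUP BY customer_id, streak_key
)
SELECT DISTINCT customer_id
FROM streaks
WHERE streak_length = (SELECT MAX(streak_length) FROM streaks)
ORDER BY customer_id
"""


def customers_with_longest_streak(rows):
    """rows: iterable of (customer_id, transaction_date, amount)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (customer_id INTEGER, transaction_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    ids = [r[0] for r in conn.execute(LONGEST_STREAK_SQL)]
    conn.close()
    return ids


sample_transactions = [
    (1, "2023-01-01", 10.0), (1, "2023-01-02", 5.0), (1, "2023-01-03", 7.5),
    (1, "2023-01-05", 3.0),
    (2, "2023-01-01", 20.0), (2, "2023-01-03", 8.0),
    (3, "2023-02-01", 1.0), (3, "2023-02-02", 2.0), (3, "2023-02-02", 4.0),
    (3, "2023-02-03", 6.0),
]
print(customers_with_longest_streak(sample_transactions))
```

In this sample, customers 1 and 3 each have a three-day run, and customer 2 never transacts on two consecutive days.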
Real-World Applications:
This query is useful for analyzing customer loyalty and engagement. Businesses can use it to:
Identify customers who are most actively engaged with their products or services.
Reward customers for consecutive purchases to encourage repeat business.
Target marketing campaigns to customers with high engagement levels.
Group Sold Products By The Date
Problem: Given a table SoldProducts
with the following columns:
id (int)
product_id (int)
date (date)
quantity (int)
Group the sold products by date and product ID and calculate the total quantity sold for each group.
Solution:
SELECT
date,
product_id,
SUM(quantity) AS total_quantity_sold
FROM
SoldProducts
GROUP BY
date, product_id;
Breakdown:
SELECT ...: Selects the date, product_id, and the sum of quantity as total_quantity_sold.
FROM SoldProducts: Specifies the table to be queried.
GROUP BY ...: Groups the rows in the table by date and product_id. Rows with the same date and product_id are grouped together.
SUM(quantity) ...: Calculates the sum of quantity for each group. This gives the total quantity sold for each date and product_id combination.
Example: Consider the following SoldProducts
table:
| id | product_id | date | quantity |
|---|---|---|---|
| 1 | 1 | 2023-01-01 | 10 |
| 2 | 2 | 2023-01-01 | 5 |
| 3 | 1 | 2023-01-02 | 15 |
| 4 | 2 | 2023-01-02 | 10 |
The query would produce the following output:
| date | product_id | total_quantity_sold |
|---|---|---|
| 2023-01-01 | 1 | 10 |
| 2023-01-01 | 2 | 5 |
| 2023-01-02 | 1 | 15 |
| 2023-01-02 | 2 | 10 |
Real-World Applications:
This query can be used in real-world scenarios such as:
Analyzing sales trends for different products over time.
Identifying products with the highest and lowest sales on specific dates.
Forecasting future sales based on historical data.
Friend Requests I: Overall Acceptance Rate
Problem:
Given a table FriendRequests
containing friend request records, the task is to find the overall acceptance rate of friend requests.
Table Schema:
CREATE TABLE FriendRequests (
id INT PRIMARY KEY,
sender_id INT NOT NULL,
receiver_id INT NOT NULL,
status INT NOT NULL,
created_at TIMESTAMP NOT NULL
);
SQL Query:
WITH AcceptedRequests AS (
    SELECT
        COUNT(*) AS accepted_count
    FROM FriendRequests
    WHERE
        status = 1
), TotalRequests AS (
    SELECT
        COUNT(*) AS total_count
    FROM FriendRequests
)
SELECT
    ROUND(1.0 * AR.accepted_count / NULLIF(TR.total_count, 0), 2) AS acceptance_rate
FROM AcceptedRequests AS AR
CROSS JOIN TotalRequests AS TR;
Explanation:
Step 1: Count Accepted Requests
The AcceptedRequests Common Table Expression (CTE) counts the rows of the FriendRequests table that represent accepted requests (i.e., status = 1).
WITH AcceptedRequests AS (
    SELECT
        COUNT(*) AS accepted_count
    FROM FriendRequests
    WHERE
        status = 1
)
Step 2: Count Total Requests
The TotalRequests CTE counts all requests in the table, whatever their status.
WITH TotalRequests AS (
    SELECT
        COUNT(*) AS total_count
    FROM FriendRequests
)
Step 3: Calculate Acceptance Rate
The outer query combines the two single-row CTEs with a CROSS JOIN and divides the number of accepted requests by the total number of requests. Multiplying by 1.0 avoids integer division in databases that would otherwise truncate the result, and NULLIF returns NULL instead of raising a division-by-zero error when the table is empty. Summing sender_id values or joining per-sender counts back to individual requests would not give a rate, because IDs are not counts and the join repeats each sender's total once per accepted request.
SELECT
    ROUND(1.0 * AR.accepted_count / NULLIF(TR.total_count, 0), 2) AS acceptance_rate
FROM AcceptedRequests AS AR
CROSS JOIN TotalRequests AS TR;
Output:
The output is a single row with a decimal value representing the overall acceptance rate of friend requests.
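The overall rate is simply accepted requests divided by all requests. A small sketch with Python's built-in sqlite3 module shows the calculation on made-up rows. SQLite, the sample data, and the helper name acceptance_rate are assumptions, and status = 1 is taken to mean "accepted" as in the text above.

```python
import sqlite3


def acceptance_rate(requests):
    """requests: iterable of (sender_id, receiver_id, status). Returns a float or None."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FriendRequests (id INTEGER PRIMARY KEY, sender_id INTEGER NOT NULL, "
        "receiver_id INTEGER NOT NULL, status INTEGER NOT NULL, "
        "created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.executemany(
        "INSERT INTO FriendRequests (sender_id, receiver_id, status) VALUES (?, ?, ?)",
        requests,
    )
    # 1.0 * ... forces floating-point division; NULLIF guards the empty-table case.
    rate = conn.execute(
        """
        SELECT ROUND(
            1.0 * SUM(CASE WHEN status = 1 THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0),
            2
        )
        FROM FriendRequests
        """
    ).fetchone()[0]
    conn.close()
    return rate


sample_requests = [(1, 2, 1), (1, 3, 0), (2, 3, 1), (3, 4, 1)]
print(acceptance_rate(sample_requests))
```

With three of the four sample requests accepted, the rate is three quarters; an empty table yields None rather than an error.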
Real-World Applications:
Measuring the success of social media platforms in connecting users
Identifying popular users or influencers
Understanding user behavior and engagement patterns
Find Followers Count
Problem Statement
Given a table of users and their followers, find the count of followers for each user.
SQL Query:
SELECT user_id, COUNT(*) AS follower_count
FROM followers
GROUP BY user_id;
Breakdown and Explanation:
SELECT user_id, COUNT(*) AS follower_count: Selects the user's ID and counts the rows in the followers table that share that user_id. The count is returned in a column named follower_count.
FROM followers: Specifies that the data is selected from the followers table.
GROUP BY user_id: Groups the results by the user_id column, so that for each unique user_id the query returns a single row containing the user_id and the total count of followers associated with it.
Real-World Application:
This query can be used in social networking applications to display the number of followers for each user. It can also be used for analysis, such as identifying the most popular users or tracking the growth of user followings over time.
Example:
| user_id | follower_count |
|---|---|
| 1 | 10 |
| 2 | 5 |
| 3 | 15 |
This table shows the number of followers for each user ID. User 1 has 10 followers, User 2 has 5 followers, and User 3 has 15 followers.
Warehouse Manager
Problem Statement
You are given a table WarehouseManagers
with the following schema:
| Column | Type |
|---|---|
| ManagerID | int |
| ManagerName | varchar(255) |
| Department | varchar(255) |
| Salary | int |
Find the Department with the maximum average salary.
SOLUTION
-- Calculate the average salary for each department
WITH DepartmentAverages AS (
    SELECT Department, AVG(Salary) AS AverageSalary
    FROM WarehouseManagers
    GROUP BY Department
)
-- Find the department with the maximum average salary
SELECT Department
FROM DepartmentAverages
WHERE AverageSalary = (SELECT MAX(AverageSalary) FROM DepartmentAverages);
Breakdown of the Solution
Calculate the average salary for each department:
SELECT Department, AVG(Salary) AS AverageSalary
FROM WarehouseManagers
GROUP BY Department
This part calculates the average salary for each department. It is wrapped in a common table expression (CTE) named DepartmentAverages so that the rest of the query can refer to it more than once; a derived table in the FROM clause cannot be referenced again from a nested subquery.
Find the department with the maximum average salary:
SELECT Department
FROM DepartmentAverages
WHERE AverageSalary = (SELECT MAX(AverageSalary) FROM DepartmentAverages);
This part returns the department whose average salary equals the maximum average salary in DepartmentAverages. If several departments tie for the maximum, all of them are returned.
Real-World Application
This query can be used to identify the departments with the highest average salaries, which can be helpful for HR planning and budgeting. For example, a company may want to offer higher bonuses or promotions to employees in departments with the highest average salaries to retain top talent.
Additional Notes
This solution computes the per-department averages as a named intermediate result and compares each one against their maximum. Whether the database evaluates that intermediate result once or twice depends on the engine and its optimizer.
The
AVG()
function is used to calculate the average salary. This function takes a set of values and returns the average of those values.
Project Employees III
Problem Statement
Given a table of projects and employees, find the number of employees working on each project.
Input Table
| Project | Employee |
|---|---|
| 1 | John |
| 1 | Mary |
| 2 | Jane |
| 2 | Peter |
| 2 | Susan |
| 3 | Michael |
Output Table
| Project | Employee_Count |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
Solution
SELECT
    Projects.Project,  -- qualified, because both tables have a Project column
    COUNT(*) AS Employee_Count
FROM Projects
JOIN Employees
    ON Projects.Project = Employees.Project
GROUP BY
    Projects.Project;
Explanation
Join the tables: We join the Projects and Employees tables on the Project column to create a single table that contains all the project and employee data.
Count the employees: We use the COUNT(*) function to count the number of employees for each project.
Group the results: We group the results by project to get the employee count for each project.
Real-World Applications
This query can be used to track employee workload and resource allocation in a project management system. It can also be used to identify projects that are understaffed or overstaffed.
Potential Applications
Project planning
Resource allocation
Employee management
Performance evaluation
The Number of Users That Are Eligible for Discount
Problem: Find the number of users eligible for a discount.
Data:
CREATE TABLE users (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL,
orders INT NOT NULL DEFAULT 0,
PRIMARY KEY (id),
UNIQUE INDEX idx_email (email)
);
CREATE TABLE orders (
id INT NOT NULL AUTO_INCREMENT,
user_id INT NOT NULL,
amount DECIMAL(10, 2) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id),
FOREIGN KEY (user_id) REFERENCES users(id)
);
Solution:
SELECT COUNT(*) AS num_eligible_users
FROM users
WHERE orders >= 3;
Explanation:
COUNT(*) counts the number of rows.
WHERE orders >= 3 filters out users with fewer than 3 orders.
The result is the number of users who have placed at least 3 orders, which is the eligibility rule this solution assumes.
Potential Applications:
Identifying customers who are eligible for loyalty discounts.
Segmenting users based on their purchase history.
Analyzing customer behavior and targeting marketing campaigns.
Students Report By Geography
Problem:
Given a table of students with their names, countries, and grades, find the average grade for students from each country.
Solution:
SELECT country, AVG(grade) AS average_grade
FROM students
GROUP BY country;
Breakdown:
The SELECT clause specifies the columns to be included in the result: country and the average grade.
The FROM clause specifies the table from which to retrieve the data: students.
The GROUP BY clause groups the rows in the table by the country column.
The AVG() function calculates the average grade for each group.
Example:
| country | average_grade |
|---|---|
| USA | 85 |
| UK | 90 |
| France | 75 |
Applications in Real World:
Analyzing student performance by country
Identifying countries with the highest or lowest average grades
Targeting educational resources to specific regions
Big Countries
Problem:
Find all countries with a population greater than 100 million.
SQL Query:
SELECT name
FROM Country
WHERE population > 100000000;
Explanation:
SELECT name: This part of the query selects the name column from the Country table. This column contains the names of the countries.
FROM Country: This specifies that we are selecting data from the Country table.
WHERE population > 100000000: This is a filter condition. It keeps only the rows in the Country table where the population column is greater than 100 million.
Real-World Applications:
The query can be used in various applications, such as:
Analyzing global population trends
Identifying countries with high birth rates
Studying the distribution of population across the world
Sales Analysis II
Problem Statement:
Given a table Sales
with columns product_id
, date
, quantity
, and price
, find the total sales for each product in a given date range.
Solution:
SELECT product_id,
SUM(quantity) AS total_quantity,
SUM(price * quantity) AS total_sales
FROM Sales
WHERE date BETWEEN '2023-01-01' AND '2023-03-31'
GROUP BY product_id;
Breakdown:
SELECT product_id, SUM(quantity) AS total_quantity, SUM(price * quantity) AS total_sales: This line selects the product ID, total quantity sold, and total sales for each product.
FROM Sales: This specifies the table from which to retrieve the data.
WHERE date BETWEEN '2023-01-01' AND '2023-03-31': This filters the rows based on dates within the specified range ('January 1, 2023' to 'March 31, 2023').
GROUP BY product_id: This groups the results by product ID, so that the total sales and quantity are calculated for each product separately.
Example:
| product_id | total_quantity | total_sales |
|-------------|----------------|-------------|
| 1 | 100 | 1000 |
| 2 | 50 | 750 |
| 3 | 75 | 1125 |
Explanation:
This query will return a table with three columns: product ID, total quantity sold, and total sales for products sold between '2023-01-01' and '2023-03-31'.
Performance:
The GROUP BY operation can be expensive on large datasets. An index on the date column lets the database read only the rows in the requested range, and a composite index that also includes product_id may help with the grouping, depending on the database and the data distribution.
Real-World Applications:
This query can be used in a variety of scenarios, including:
Sales analysis: To track sales for different products over time and identify trends.
Inventory management: To ensure adequate stock levels for high-selling products.
Product performance evaluation: To compare the sales of different products and identify areas for improvement.
Find the Subtasks That Did Not Execute
Problem: Find the Subtasks That Did Not Execute
SQL Query:
SELECT SubtaskID
FROM Subtasks
WHERE TaskID NOT IN (SELECT TaskID FROM CompletedTasks);
Explanation:
This query retrieves the SubtaskIDs of subtasks whose TaskID has no corresponding entry in the CompletedTasks table.
Breakdown:
Subtasks Table: Stores information about subtasks, including their SubtaskID.
CompletedTasks Table: Stores information about completed tasks, including their TaskID.
TaskID Column: This column is present in both tables and establishes a relationship between subtasks and their parent tasks.
NOT IN Operator: The NOT IN operator keeps a subtask only when its TaskID is absent from the CompletedTasks table. If the subquery can return NULL, NOT IN matches no rows at all, so TaskID should be non-null in CompletedTasks (or the condition rewritten with NOT EXISTS).
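One pitfall with NOT IN is worth seeing in action: if the subquery returns even one NULL, the comparison is never true and the outer query returns nothing. The sketch below uses Python's built-in sqlite3 module with made-up rows (both are assumptions for illustration) to contrast NOT IN with a NOT EXISTS rewrite, which is not affected by NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Subtasks (SubtaskID INTEGER, TaskID INTEGER);
    CREATE TABLE CompletedTasks (TaskID INTEGER);
    INSERT INTO Subtasks VALUES (1, 10), (2, 10), (3, 20);
    -- Task 10 is completed; the NULL row simulates dirty data.
    INSERT INTO CompletedTasks VALUES (10), (NULL);
    """
)


def pending_not_in(connection):
    """Original formulation: silently returns nothing when the subquery has a NULL."""
    sql = """
        SELECT SubtaskID FROM Subtasks
        WHERE TaskID NOT IN (SELECT TaskID FROM CompletedTasks)
        ORDER BY SubtaskID
    """
    return [row[0] for row in connection.execute(sql)]


def pending_not_exists(connection):
    """NULL-safe formulation using a correlated NOT EXISTS."""
    sql = """
        SELECT s.SubtaskID FROM Subtasks AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM CompletedTasks AS c WHERE c.TaskID = s.TaskID
        )
        ORDER BY s.SubtaskID
    """
    return [row[0] for row in connection.execute(sql)]


print(pending_not_in(conn))
print(pending_not_exists(conn))
```

Declaring CompletedTasks.TaskID as NOT NULL removes the problem at the schema level, in which case both queries return the same rows.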
Simplified Explanation:
Imagine you have a task management system. Each task can have multiple subtasks. This query finds subtasks that belong to tasks that have not been completed yet.
Real-World Example:
Inventory Management: Find products that have not been checked into the warehouse.
Project Management: Identify sub-steps of a project that still need to be completed.
Code Implementation:
import mysql.connector
# Connect to the database
connection = mysql.connector.connect(
host="localhost",
user="root",
password="",
database="tasks"
)
# Create a cursor to execute queries
cursor = connection.cursor()
# Execute the query
cursor.execute("SELECT SubtaskID FROM Subtasks WHERE TaskID NOT IN (SELECT TaskID FROM CompletedTasks);")
# Fetch the results
results = cursor.fetchall()
# Print the list of subtask IDs
for result in results:
    print(result[0])  # SubtaskID
# Close the cursor and connection
cursor.close()
connection.close()
Shortest Distance in a Line
Problem Statement:
Given a line segment defined by two points (x1, y1)
and (x2, y2)
, find the shortest distance from a point (x3, y3)
to the line segment.
SQL Solution:
WITH Projection AS (
    SELECT
        x1, y1, x2, y2, x3, y3,
        -- Position of the closest point along the segment, clamped to [0, 1]
        LEAST(1, GREATEST(0,
            ((x3 - x1) * (x2 - x1) + (y3 - y1) * (y2 - y1))
            / (POW(x2 - x1, 2) + POW(y2 - y1, 2))
        )) AS t
    FROM Points
)
SELECT
    SQRT(POW(x3 - (x1 + t * (x2 - x1)), 2) + POW(y3 - (y1 + t * (y2 - y1)), 2)) AS shortest_distance
FROM Projection;
This query assumes that each row of Points holds the six coordinates x1, y1, x2, y2, x3, y3, and that the two endpoints are distinct (otherwise the division is by zero).
Step-by-Step Explanation:
1. Project the Point onto the Line:
The Projection CTE (Common Table Expression) computes t, the position along the segment of the point closest to (x3, y3). It is the dot product of the vector from (x1, y1) to (x3, y3) with the segment's direction vector, divided by the squared length of the segment. A value of 0 corresponds to (x1, y1) and a value of 1 to (x2, y2).
2. Clamp to the Segment:
GREATEST(0, ...) and LEAST(1, ...) restrict t to the range from 0 to 1. If the projection falls before the first endpoint or beyond the second, the closest point on the segment is that endpoint.
3. Locate the Closest Point:
The closest point on the segment is (x1 + t * (x2 - x1), y1 + t * (y2 - y1)).
4. Calculate Final Answer:
The final query applies the Euclidean distance formula between (x3, y3) and the closest point. Comparing only the distances to the two endpoints is not enough, because the nearest point on a segment often lies between them.
Example:
Let's say we have a line segment defined by the points (1, 2) and (3, 4) and a point (2, 3).
The following query would return the shortest distance:
WITH Projection AS (
    SELECT
        1 AS x1, 2 AS y1, 3 AS x2, 4 AS y2, 2 AS x3, 3 AS y3,
        LEAST(1, GREATEST(0,
            ((2 - 1) * (3 - 1) + (3 - 2) * (4 - 2))
            / (POW(3 - 1, 2) + POW(4 - 2, 2))
        )) AS t
)
SELECT
    SQRT(POW(x3 - (x1 + t * (x2 - x1)), 2) + POW(y3 - (y1 + t * (y2 - y1)), 2)) AS shortest_distance
FROM Projection;
Result:
shortest_distance = 0
Here t = 4 / 8 = 0.5, so the closest point is (2, 3), which is the given point itself: it is the midpoint of the segment.
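The distance from a point to a line segment can also be checked outside the database. The sketch below is a plain Python version of the standard projection-and-clamp calculation; the function name distance_to_segment is hypothetical, and treating a zero-length segment as a single point is a choice made for this example.

```python
import math


def distance_to_segment(x1, y1, x2, y2, x3, y3):
    """Shortest distance from (x3, y3) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(x3 - x1, y3 - y1)
    # Projection of the point onto the infinite line, as a fraction of the segment.
    t = ((x3 - x1) * dx + (y3 - y1) * dy) / length_sq
    # Clamp so the closest point stays on the segment.
    t = max(0.0, min(1.0, t))
    closest_x, closest_y = x1 + t * dx, y1 + t * dy
    return math.hypot(x3 - closest_x, y3 - closest_y)


print(distance_to_segment(1, 2, 3, 4, 2, 3))  # point on the segment
print(distance_to_segment(0, 0, 2, 0, 1, 1))  # point above the middle
print(distance_to_segment(0, 0, 2, 0, 5, 4))  # point beyond an endpoint
```

The three calls cover the three cases: a point lying on the segment, a point whose projection falls inside the segment, and a point whose nearest neighbour is an endpoint.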
Orders With Maximum Quantity Above Average
Problem Statement
Given a table 'Orders' with the following schema:
| Column | Type |
|---|---|
| id | int |
| product_id | int |
| quantity | int |
Write an SQL query to find all orders with a quantity greater than the average quantity for all orders.
Solution
The following SQL query solves the problem:
SELECT *
FROM Orders
WHERE quantity > (SELECT AVG(quantity) FROM Orders);
Breakdown
The query is a simple SELECT statement that retrieves all rows from the 'Orders' table where the quantity
column is greater than the average quantity of all orders.
The subquery (SELECT AVG(quantity) FROM Orders) calculates the average quantity of all orders in the table. This value is then used in the main query to keep only the orders whose quantity is above that average.
Example
Consider the following 'Orders' table:
| id | product_id | quantity |
|---|---|---|
| 1 | 1 | 10 |
| 2 | 2 | 5 |
| 3 | 3 | 15 |
| 4 | 4 | 12 |
| 5 | 5 | 8 |
The average quantity of all orders is:
(10 + 5 + 15 + 12 + 8) / 5 = 10
Therefore, the query SELECT * FROM Orders WHERE quantity > 10 would return the following rows. Order 1 has a quantity of exactly 10, which is not greater than the average, so it is not included:
| id | product_id | quantity |
|---|---|---|
| 3 | 3 | 15 |
| 4 | 4 | 12 |
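The strict comparison matters at the boundary: an order whose quantity equals the average is not "above average". A short sketch with Python's built-in sqlite3 module runs the query on the example rows; SQLite and the helper name above_average_order_ids are assumptions made for illustration.

```python
import sqlite3


def above_average_order_ids(orders):
    """orders: iterable of (id, product_id, quantity). Returns matching order ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (id INTEGER, product_id INTEGER, quantity INTEGER)")
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", orders)
    ids = [
        row[0]
        for row in conn.execute(
            """
            SELECT id
            FROM Orders
            WHERE quantity > (SELECT AVG(quantity) FROM Orders)
            ORDER BY id
            """
        )
    ]
    conn.close()
    return ids


example_orders = [(1, 1, 10), (2, 2, 5), (3, 3, 15), (4, 4, 12), (5, 5, 8)]
print(above_average_order_ids(example_orders))
```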
Real-World Applications
This query can be used in a variety of real-world applications, such as:
Identifying customers who are purchasing above-average quantities of a product
Flagging orders that may be fraudulent due to unusually high quantities
Analyzing sales trends to identify products that are selling well and products that are not selling well
Leetcodify Similar Friends
Problem:
Find pairs of friends who have at least k
common friends.
Table Schema:
CREATE TABLE Friends (
user_id1 INT NOT NULL,
user_id2 INT NOT NULL,
PRIMARY KEY (user_id1, user_id2)
);
Solution:
WITH AllFriends AS (
    -- List every friendship in both directions
    SELECT user_id1 AS user_id, user_id2 AS friend_id FROM Friends
    UNION
    SELECT user_id2, user_id1 FROM Friends
)
SELECT
    f.user_id1 AS user1,
    f.user_id2 AS user2
FROM
    Friends AS f
JOIN
    AllFriends AS a ON a.user_id = f.user_id1
JOIN
    AllFriends AS b ON b.user_id = f.user_id2 AND b.friend_id = a.friend_id
GROUP BY
    f.user_id1, f.user_id2
HAVING
    COUNT(*) >= k;
Explanation:
The AllFriends CTE lists every friendship in both directions, so that each user's friends can be looked up through a single column regardless of which side of the Friends row they appear on.
For each pair of friends f, alias a ranges over the friends of user_id1 and alias b over the friends of user_id2.
The join condition b.friend_id = a.friend_id keeps only the users who are friends with both members of the pair, that is, their common friends.
GROUP BY produces one group per pair of friends, and COUNT(*) counts the common friends in that group.
HAVING filters out pairs with fewer than k common friends.
The query assumes that each friendship is stored once in Friends (for example, with user_id1 < user_id2), so that every pair is reported once.
Example:
WITH AllFriends AS (
    SELECT user_id1 AS user_id, user_id2 AS friend_id FROM Friends
    UNION
    SELECT user_id2, user_id1 FROM Friends
)
SELECT
    f.user_id1 AS user1,
    f.user_id2 AS user2
FROM
    Friends AS f
JOIN
    AllFriends AS a ON a.user_id = f.user_id1
JOIN
    AllFriends AS b ON b.user_id = f.user_id2 AND b.friend_id = a.friend_id
GROUP BY
    f.user_id1, f.user_id2
HAVING
    COUNT(*) >= 2;
This query returns all pairs of friends who have at least 2 common friends.
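Counting common friends comes down to looking up both users' friend lists and intersecting them. The sketch below does this in SQL through Python's built-in sqlite3 module. SQLite, the sample friendships, and the helper name similar_pairs are assumptions, as is the convention that each friendship is stored once.

```python
import sqlite3

COMMON_FRIENDS_SQL = """
WITH all_friends AS (
    -- Every friendship in both directions
    SELECT user_id1 AS user_id, user_id2 AS friend_id FROM Friends
    UNION
    SELECT user_id2, user_id1 FROM Friends
)
SELECT f.user_id1, f.user_id2, COUNT(*) AS common_friends
FROM Friends AS f
JOIN all_friends AS a ON a.user_id = f.user_id1
JOIN all_friends AS b ON b.user_id = f.user_id2 AND b.friend_id = a.friend_id
GROUP BY f.user_id1, f.user_id2
HAVING COUNT(*) >= ?
ORDER BY f.user_id1, f.user_id2
"""


def similar_pairs(friendships, k):
    """Return (user1, user2, common_friends) for friend pairs with at least k common friends."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Friends (user_id1 INTEGER NOT NULL, user_id2 INTEGER NOT NULL, "
        "PRIMARY KEY (user_id1, user_id2))"
    )
    conn.executemany("INSERT INTO Friends VALUES (?, ?)", friendships)
    rows = conn.execute(COMMON_FRIENDS_SQL, (k,)).fetchall()
    conn.close()
    return rows


# Hypothetical network: users 1 and 2 are friends and both know users 3 and 4.
sample_friendships = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 5)]
print(similar_pairs(sample_friendships, 2))
```

Pairs with no common friends never appear, because the inner joins produce no rows for them; this is fine as long as k is at least 1.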
Possible applications in real world:
Recommending friends to users on social media platforms based on their common connections.
Identifying influencers in a network by analyzing their number of common friends.
Detecting fraudulent activities by identifying connected groups of users.
Confirmation Rate
Problem Statement:
Given a table of bookings, determine the confirmation rate for each hotel.
Table:
bookings (
hotel_id INT,
confirmed BOOLEAN
)
Query:
SELECT
hotel_id,
SUM(confirmed) AS total_confirmed,
COUNT(*) AS total_bookings,
ROUND((SUM(confirmed) * 100.0) / COUNT(*), 2) AS confirmation_rate
FROM bookings
GROUP BY hotel_id;
Explanation:
SUM(confirmed): Counts the number of confirmed bookings for each hotel.
COUNT(*): Counts the total number of bookings for each hotel.
ROUND((SUM(confirmed) * 100.0) / COUNT(*), 2): Calculates the confirmation rate as a percentage by dividing the number of confirmed bookings by the total bookings and multiplying by 100. The
ROUND
function rounds the result to two decimal places.
Result:
The query returns the following columns for each hotel:
hotel_id: The unique identifier for the hotel.
total_confirmed: The total number of confirmed bookings.
total_bookings: The total number of bookings.
confirmation_rate: The confirmation rate as a percentage.
Real-World Application:
This query is useful for hotel managers to track the performance of their confirmation process. A high confirmation rate indicates that the hotel is efficiently confirming bookings, while a low rate suggests areas for improvement.
Calculate the Influence of Each Salesperson
Problem Statement:
Sales records are maintained in a table "Sales", which has columns like "SalespersonId", "SalesAmount", and "Date". Calculate the influence of each salesperson by computing their average daily sales. The influence is defined as the salesperson's average daily sales divided by the total average daily sales of all salespeople.
SQL Query:
WITH SalespersonDailySales AS (
SELECT
SalespersonId,
DATE(Date) AS SaleDate,
SUM(SalesAmount) AS DailySales
FROM
Sales
GROUP BY
SalespersonId,
SaleDate
), SalespersonTotalSales AS (
SELECT
SalespersonId,
SUM(DailySales) AS TotalSales
FROM
SalespersonDailySales
GROUP BY
SalespersonId
)
SELECT
SalespersonId,
(
AVG(DailySales) / (
SELECT
AVG(DailySales)
FROM
SalespersonDailySales
)
) AS Influence
FROM
SalespersonDailySales
GROUP BY
SalespersonId;
Explanation:
SalespersonDailySales: This Common Table Expression (CTE) calculates the daily sales for each salesperson.
SELECT SalespersonId, DATE(Date) AS SaleDate, SUM(SalesAmount) AS DailySales
It groups the sales records by salesperson and the date of sale, then sums up the sales amounts for each salesperson on each date.
SalespersonTotalSales: This CTE calculates the total sales for each salesperson.
SELECT SalespersonId, SUM(DailySales) AS TotalSales
It groups the SalespersonDailySales CTE by salesperson and sums up their daily sales. The final query never references this CTE, so it can be removed without changing the result.
Final Query: This query calculates the influence of each salesperson.
(AVG(DailySales) / (SELECT AVG(DailySales) FROM SalespersonDailySales))
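This calculates the average daily sales for each salesperson and divides it by the overall average daily sales, taken over every (salesperson, day) row in SalespersonDailySales. A value above 1 means the salesperson sells more per active day than the average.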
GROUP BY SalespersonId
This groups the results by salesperson, giving the influence of each salesperson.
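The ratio can be checked on a small data set. The sketch below runs the influence calculation through Python's built-in sqlite3 module; the sample rows are made up for illustration, and the unused SalespersonTotalSales CTE is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SalespersonId INTEGER, SalesAmount REAL, Date TEXT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        (1, 100.0, "2023-01-01 09:00:00"),
        (1, 100.0, "2023-01-01 15:00:00"),  # same day, so both rows form one daily total of 200
        (1, 400.0, "2023-01-02 10:00:00"),
        (2, 100.0, "2023-01-01 11:00:00"),
    ],
)

influence = conn.execute("""
    WITH SalespersonDailySales AS (
        SELECT SalespersonId, DATE(Date) AS SaleDate, SUM(SalesAmount) AS DailySales
        FROM Sales
        GROUP BY SalespersonId, DATE(Date)
    )
    SELECT SalespersonId,
           AVG(DailySales) / (SELECT AVG(DailySales) FROM SalespersonDailySales) AS Influence
    FROM SalespersonDailySales
    GROUP BY SalespersonId
    ORDER BY SalespersonId
""").fetchall()

for salesperson_id, value in influence:
    print(salesperson_id, round(value, 4))
```

Salesperson 1 averages 300 per day against an overall average of about 233.33, which gives an influence of roughly 1.29; salesperson 2 comes out at roughly 0.43.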
Real-World Application:
This query can be used to:
Evaluate the performance of individual salespersons.
Identify high-performing salespersons who can mentor or train others.
Make data-driven decisions about sales strategies and resource allocation.
Flight Occupancy and Waitlist Analysis
Problem Statement:
Analyze flight data to determine the flight occupancy rate and waitlist status for a given airline.
SQL Solution:
-- Calculate the flight occupancy rate for each flight
WITH FlightOccupancy AS (
SELECT
flight_id,
SUM(passengers) AS total_passengers,
SUM(capacity) AS total_capacity,
ROUND(SUM(passengers) * 100.0 / SUM(capacity), 2) AS occupancy_rate
FROM Flights
GROUP BY
flight_id
),
-- Check if a flight has any waitlisted passengers
WaitlistStatus AS (
SELECT
flight_id,
CASE
WHEN SUM(waitlisted) > 0 THEN 'Waitlisted'
ELSE 'Not Waitlisted'
END AS waitlist_status
FROM Flights
GROUP BY
flight_id
)
-- Combine the results
SELECT
FlightOccupancy.flight_id,
FlightOccupancy.total_passengers,
FlightOccupancy.total_capacity,
FlightOccupancy.occupancy_rate,
WaitlistStatus.waitlist_status
FROM FlightOccupancy
INNER JOIN WaitlistStatus
ON FlightOccupancy.flight_id = WaitlistStatus.flight_id;
Breakdown and Explanation:
1. Calculate Flight Occupancy Rate:
The FlightOccupancy Common Table Expression (CTE) calculates the total passengers, total capacity, and occupancy rate for each flight.
It does this by grouping the data by flight ID and summing the passengers and capacity columns.
The occupancy rate is then calculated as the number of passengers divided by the capacity, multiplied by 100 to get a percentage. When both columns are integers, many databases perform integer division, so the multiplication by a decimal constant such as 100.0 should happen before the division.
2. Check Waitlist Status:
The WaitlistStatus CTE determines if a flight has any waitlisted passengers.
It does this by grouping the data by flight ID and summing the waitlisted column.
If the sum is greater than 0, the waitlist status is set to 'Waitlisted', otherwise it is set to 'Not Waitlisted'.
3. Combine Results:
The main query joins the two CTEs to combine the flight occupancy rate and waitlist status information for each flight.
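The effect of integer division on the occupancy rate is easy to reproduce. This sketch uses Python's built-in sqlite3 module with two made-up flights and computes the rate both ways in a single query; the column name int_rate is an illustrative addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flights (flight_id INTEGER, passengers INTEGER, "
    "capacity INTEGER, waitlisted INTEGER)"
)
conn.executemany(
    "INSERT INTO Flights VALUES (?, ?, ?, ?)",
    [(101, 150, 200, 0), (102, 180, 180, 5)],
)

rows = conn.execute("""
    SELECT flight_id,
           -- integer / integer truncates before the multiplication
           SUM(passengers) / SUM(capacity) * 100 AS int_rate,
           -- multiplying by 100.0 first forces floating-point arithmetic
           ROUND(SUM(passengers) * 100.0 / SUM(capacity), 2) AS occupancy_rate,
           CASE WHEN SUM(waitlisted) > 0 THEN 'Waitlisted'
                ELSE 'Not Waitlisted' END AS waitlist_status
    FROM Flights
    GROUP BY flight_id
    ORDER BY flight_id
""").fetchall()

for row in rows:
    print(row)
```

Flight 101 is 75% full, but the integer version reports 0 because 150 / 200 truncates to 0.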
Real-World Applications:
Airline Management: Airlines can use this analysis to identify flights with low occupancy rates and adjust schedules or pricing accordingly.
Passenger Experience: Customers can use this information to choose flights with no waitlist or with lower occupancy rates for a more comfortable flying experience.
Revenue Optimization: Airlines can use this analysis to maximize revenue by selling more seats on flights with high occupancy rates and adjusting prices on flights with low occupancy rates.
User Activity for the Past 30 Days II
Problem Statement:
Find the number of active users for each day in the past 30 days. A user is considered active if they have performed any action on the website or app.
SQL Query:
WITH UserActivity AS (
SELECT
user_id,
DATE(timestamp) AS activity_date
FROM
user_actions
WHERE
timestamp >= DATE('now', '-30 days')
)
SELECT
activity_date,
COUNT(DISTINCT user_id) AS active_users
FROM
UserActivity
GROUP BY
activity_date
ORDER BY
activity_date;
Breakdown:
CTE (UserActivity):
This Common Table Expression (CTE) defines a named intermediate result that contains the user IDs and activity dates for the past 30 days.
DATE(timestamp) extracts the date from the timestamp column.
DATE('now', '-30 days') is SQLite syntax for the date 30 days before today; other databases spell this differently (for example, CURRENT_DATE - INTERVAL '30 days' in PostgreSQL).
Main Query:
SELECT activity_date, COUNT(DISTINCT user_id) AS active_users: Counts the number of distinct user IDs for each activity date.
GROUP BY activity_date: Groups the results by activity date.
ORDER BY activity_date: Orders the results by activity date in ascending order.
Real-World Applications:
User Engagement Analysis: This query can help website and app owners track user activity over time and identify any trends or fluctuations.
Trend Analysis: By comparing the number of active users across different days, you can identify patterns and make data-driven decisions about marketing campaigns or product updates.
User Segmentation: You can use the results to segment users based on their activity patterns and tailor marketing efforts accordingly.
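Example:
Consider the following user_actions table:
| user_id | timestamp |
|---|---|
| 1 | 2023-03-01 12:00:00 |
| 2 | 2023-03-02 09:30:00 |
| 3 | 2023-03-03 14:30:00 |
| 1 | 2023-03-04 18:15:00 |
Running the SQL query on this table, at a time when all four dates fall within the past 30 days, would produce the following results:
| activity_date | active_users |
|---|---|
| 2023-03-01 | 1 |
| 2023-03-02 | 1 |
| 2023-03-03 | 1 |
| 2023-03-04 | 1 |
Each day has exactly one distinct user, so every count is 1.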
Movie Rating
Problem Statement: Given a table of movie ratings by users, find the top 10 movies with the highest average rating.
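Input Table:
| user_id | movie_id | rating |
|---|---|---|
| 1 | 1 | 5 |
| 2 | 2 | 4 |
| 3 | 3 | 3 |
| 4 | 4 | 5 |
| 5 | 2 | 4 |
| 6 | 5 | 2 |
| 7 | 3 | 4 |
| 8 | 1 | 3 |
Output:
| movie_id | average_rating |
|---|---|
| 4 | 5 |
| 1 | 4 |
| 2 | 4 |
| 3 | 3.5 |
| 5 | 2 |
Movies 1 and 2 tie at an average of 4, so their relative order is not guaranteed unless a tie-breaker is added to ORDER BY.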
SQL Solution:
WITH MovieAverageRatings AS (
SELECT movie_id, AVG(rating) AS average_rating
FROM ratings
GROUP BY movie_id
)
SELECT movie_id, average_rating
FROM MovieAverageRatings
ORDER BY average_rating DESC
LIMIT 10;
Breakdown:
The WITH clause is used to create a common table expression (CTE) called MovieAverageRatings.
The CTE calculates the average rating for each movie.
The GROUP BY clause groups the ratings by movie_id.
The AVG() function calculates the average rating for each movie.
The SELECT clause selects the movie_id and average_rating from the CTE.
The ORDER BY clause sorts the results in descending order of average_rating.
The LIMIT clause limits the results to the top 10 movies.
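The query can be run against the eight sample ratings with Python's built-in sqlite3 module. In this sketch the first column is assumed to be a user ID, and movie_id is added to ORDER BY as a tie-breaker so the output order is deterministic; neither detail comes from the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 2, 4), (3, 3, 3), (4, 4, 5),
     (5, 2, 4), (6, 5, 2), (7, 3, 4), (8, 1, 3)],
)

top_movies = conn.execute("""
    WITH MovieAverageRatings AS (
        SELECT movie_id, AVG(rating) AS average_rating
        FROM ratings
        GROUP BY movie_id
    )
    SELECT movie_id, average_rating
    FROM MovieAverageRatings
    ORDER BY average_rating DESC, movie_id  -- movie_id breaks ties (assumption)
    LIMIT 10
""").fetchall()

for movie_id, average_rating in top_movies:
    print(movie_id, average_rating)
```

Only five movies exist in the sample, so LIMIT 10 returns all of them, highest average first.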
Real-World Applications:
Creating a list of recommended movies for users based on their average rating.
Identifying popular movies for marketing or promotion.
Analyzing user preferences and trends in movie ratings.
Employees With Missing Information
Problem Statement:
Find all employees who have missing information in any of these columns: first_name, last_name, email, or phone_number.
Solution:
SELECT
emp_id,
first_name,
last_name,
email,
phone_number
FROM
employees
WHERE
first_name IS NULL
OR last_name IS NULL
OR email IS NULL
OR phone_number IS NULL;
Breakdown:
Step 1: Select the necessary columns.
We select the employee ID (emp_id), first name (first_name), last name (last_name), email (email), and phone number (phone_number) columns from the employees table.
Step 2: Use the WHERE clause to filter the results.
We filter the results using the WHERE clause to return only employees who have at least one of the following columns set to NULL:
first_name IS NULL: Employees with missing first names.
last_name IS NULL: Employees with missing last names.
email IS NULL: Employees with missing emails.
phone_number IS NULL: Employees with missing phone numbers.
Real-World Applications:
Data Integrity Verification: Ensuring that all employee information is complete and accurate for compliance purposes or internal processes.
Employee Management: Identifying employees with incomplete profiles to avoid communication issues or data errors.
HR Reporting: Generating reports on employees with missing information to improve data quality and overall HR management.
Actors and Directors Who Cooperated At Least Three Times
SQL Implementation:
SELECT
A.actor_name,
D.director_name,
COUNT(DISTINCT M.movie_name) AS num_collaborations
FROM Actor AS A
JOIN Movie AS M
ON A.actor_id = M.actor_id
JOIN Director AS D
ON M.director_id = D.director_id
GROUP BY
A.actor_name,
D.director_name
HAVING
COUNT(DISTINCT M.movie_name) >= 3;
Breakdown and Explanation:
Join Tables: We join the Actor (A), Movie (M), and Director (D) tables based on the relationships between them:
A.actor_id = M.actor_id
M.director_id = D.director_id
Group By: Group the results by the actor's name (A.actor_name) and director's name (D.director_name). This creates groups for each unique actor-director pair.
Aggregate Function: Count the distinct movie names (DISTINCT M.movie_name) for each group. This gives us the number of collaborations between each actor and director.
Having Clause: Filter the results to include only those groups where the number of collaborations is greater than or equal to 3.
Real-World Application:
This query can be used to identify actors and directors who have worked together multiple times in the film industry. It provides insights into long-standing collaborations and can be useful for:
Identifying actors and directors who have a strong professional chemistry.
Analyzing the success rate of collaborations between specific individuals.
Predicting future collaborations based on past history.
Creating lists of notable actor-director pairings for awards or promotional purposes.
The Number of Passengers in Each Bus I
Problem:
You have a table called Buses that contains information about buses and their passengers. Each row in the table represents a bus, with the following columns:
bus_id: The unique ID of the bus.
passengers: The number of passengers on the bus.
You want to write a query that finds the number of passengers in each bus.
Solution:
The following query will find the number of passengers in each bus:
SELECT bus_id, passengers
FROM Buses;
Explanation:
The SELECT statement selects the bus_id and passengers columns from the Buses table. The FROM clause specifies that the rows to be selected should come from the Buses table.
Real-world Application:
This query can be used to track the number of passengers on each bus in a public transportation system. This information can be used to improve scheduling, determine which buses need to be replaced, and estimate the revenue generated by each bus.
Potential Applications in the Real World:
Public transportation: Tracking the number of passengers on each bus can help transportation planners improve scheduling and determine which buses need to be replaced.
School transportation: Tracking the number of passengers on each school bus can help schools ensure that there are enough buses to meet the needs of their students.
Private transportation: Tracking the number of passengers on each private bus can help businesses optimize their transportation operations and estimate revenue.
Classes More Than 5 Students
Problem:
Find all classes that have more than five students.
SQL Query:
SELECT class_id, count(*) AS student_count
FROM students
GROUP BY class_id
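HAVING COUNT(*) > 5;
Breakdown:
FROM students: Reads all rows from the 'students' table.
GROUP BY class_id: Groups the rows by the 'class_id' column, aggregating the rows for each class.
HAVING COUNT(*) > 5: Filters the results to include only classes with more than five students. The aggregate is repeated here because standard SQL does not allow the student_count alias in HAVING; MySQL and SQLite accept the alias, but PostgreSQL and SQL Server do not.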
Explanation:
The GROUP BY clause creates a set of groups, each containing the rows that share the same value in the specified column. In this case, we are grouping by class_id.
The COUNT(*) function counts the number of rows in each group. This gives us the total number of students in each class.
The HAVING clause filters the groups based on a condition. In this case, we are only interested in groups with more than five students.
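A quick way to see the grouping and filtering in action is to run it on a tiny table. The sketch below uses Python's built-in sqlite3 module; the student_id column and the class names are made up for illustration, and the filter is written as HAVING COUNT(*) > 5 so it also works on databases that reject aliases in HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER, class_id TEXT)")

# Six students in Math, five in Art: only Math has MORE than five.
sample = [(i, "Math") for i in range(1, 7)] + [(i, "Art") for i in range(7, 12)]
conn.executemany("INSERT INTO students VALUES (?, ?)", sample)

big_classes = conn.execute("""
    SELECT class_id, COUNT(*) AS student_count
    FROM students
    GROUP BY class_id
    HAVING COUNT(*) > 5
""").fetchall()

print(big_classes)
```

Art has exactly five students, so it is excluded; the condition is strictly greater than five.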
Applications:
School administrators can use this query to identify classes that have exceeded their capacity.
Teachers can use this query to identify classes that need additional support or resources.
Parents can use this query to find out how many students are in their child's class.
Product's Worth Over Invoices
Problem:
Given two tables:
Products with columns (id, name, price)
Invoices with columns (id, product_id, quantity)
Find the total worth of each product over all invoices.
Solution:
SELECT
p.id,
p.name,
SUM(p.price * i.quantity) AS total_worth
FROM Products p
JOIN Invoices i ON p.id = i.product_id
GROUP BY
p.id, p.name;
Breakdown:
Join the Products and Invoices tables: We use an INNER JOIN to match products with the corresponding invoices. The join condition is p.id = i.product_id, which ensures that only products with matching invoices are included.
Calculate the product worth: For each matched product and invoice, we calculate the worth as p.price * i.quantity. This gives us the value of the product sold in that invoice.
Group and aggregate: We group the results by the p.id and p.name columns to get the total worth for each product. The SUM() function is used to accumulate the worth values for each product.
Real-World Application:
This query can be used in a business analytics system to track the total sales value of products over a period of time. It allows businesses to analyze product performance, identify trends, and make informed decisions about inventory management and pricing strategies.
Create a Session Bar Chart
Problem Statement:
Given a table sessions with columns user_id and timestamp, find the total number of active users in each hour.
SQL Solution:
SELECT
STRFTIME('%Y-%m-%d %H:00:00', timestamp) AS hour,
COUNT(DISTINCT user_id) AS active_users
FROM sessions
GROUP BY hour
ORDER BY hour;
Breakdown:
STRFTIME('%Y-%m-%d %H:00:00', timestamp): Truncates the timestamp column to the start of its hour and formats it as 'YYYY-MM-DD HH:00:00'.
COUNT(DISTINCT user_id): Counts the number of distinct user_id values for each hour.
GROUP BY hour: Groups the results by hour.
ORDER BY hour: Orders the results by hour.
Explanation:
The STRFTIME function (SQLite) truncates the timestamp column to the hour and converts it to the specified format. This ensures that all timestamps within an hour are grouped together.
The COUNT(DISTINCT user_id) function counts the number of unique user_id values for each hour. This gives us the total number of active users during that hour.
The GROUP BY clause groups the results by hour, so we get one row for each hour.
The ORDER BY clause orders the results by hour, making it easier to read and analyze the data.
Real-World Application:
This query can be used to analyze user activity patterns, such as:
Identifying peak usage hours
Monitoring user engagement over time
Optimizing server capacity based on usage trends
Customer Who Visited but Did Not Make Any Transactions
Problem Statement:
Find all customers who visited the store but did not make any transactions.
SQL Solution:
SELECT DISTINCT c.customer_id, c.first_name, c.last_name
FROM Customers c
LEFT JOIN Transactions t ON c.customer_id = t.customer_id
WHERE t.transaction_id IS NULL;
Explanation:
This query uses a LEFT JOIN to combine the Customers table with the Transactions table. For each customer, the query checks if there is a matching transaction. If there is no matching transaction, the query returns the customer's information in the result set.
Breakdown:
SELECT DISTINCT c.customer_id, c.first_name, c.last_name: This part of the query selects the columns we are interested in: the customer ID, first name, and last name. The DISTINCT keyword is used to ensure that only unique customer records are returned.
FROM Customers c: This part of the query specifies the table we are selecting from, which is the Customers table. We assign the alias c to this table for brevity.
LEFT JOIN Transactions t ON c.customer_id = t.customer_id: This part of the query performs a LEFT JOIN between the Customers table and the Transactions table. The LEFT JOIN operation matches rows from the Customers table with rows from the Transactions table based on the customer_id column. If there is no matching row in the Transactions table, every column taken from t, including transaction_id, is NULL in the joined row.
WHERE t.transaction_id IS NULL: This part of the query filters the results to include only customers who have a NULL value for the transaction_id column. This means that these customers visited the store but did not make any transactions.
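The LEFT JOIN plus IS NULL pattern (often called an anti-join) can be tried on a few rows. This is a minimal sketch using Python's built-in sqlite3 module; the customer names and transaction IDs are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE Transactions (transaction_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO Customers VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing'), (3, 'Grace', 'Hopper');
    -- Customer 1 has two transactions, customer 3 has one, customer 2 has none.
    INSERT INTO Transactions VALUES (10, 1), (11, 1), (12, 3);
""")

no_transactions = conn.execute("""
    SELECT DISTINCT c.customer_id, c.first_name, c.last_name
    FROM Customers c
    LEFT JOIN Transactions t ON c.customer_id = t.customer_id
    WHERE t.transaction_id IS NULL
""").fetchall()

print(no_transactions)
```

Only the customer without any row in Transactions survives the IS NULL filter.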
Real-World Applications:
This query can be used in a variety of real-world applications, such as:
Identifying potential customers who may be interested in making a purchase.
Analyzing customer behavior to improve marketing strategies.
Tracking customer engagement with a business.
Calculate Salaries
Problem Statement:
Given a table employees with the following columns:
employee_id (primary key)
name
salary
Write a SQL query to calculate the salaries of all employees.
Best & Performant Solution:
SELECT
employee_id,
name,
salary
FROM
employees;
Breakdown and Explanation:
SELECT Clause: This clause specifies the columns to be included in the result set. In this case, we want to include all columns: employee_id, name, and salary.
FROM Clause: This clause specifies the table from which we want to retrieve the data. In this case, we want to retrieve data from the employees table.
Real World Applications:
This query can be used in various real-world applications, such as:
Payroll Processing: To determine the salary of each employee for payroll purposes.
Compensation Analysis: To analyze the compensation structure of a company and identify any disparities or trends.
Human Resource Reporting: To generate reports on employee salaries and benefits for internal or external use.
Product Sales Analysis I
Problem Statement
Given a table Sales with columns product_id, sales_date, and sales_amount, find the total sales for each product category and the percentage contribution of each category to the total sales. The table has no separate category column, so each product_id is treated as its own category.
SQL Solution
-- First, create a temporary table with the total sales for each product category
CREATE TEMP TABLE CategorySales AS
SELECT
product_id,
SUM(sales_amount) AS total_sales
FROM Sales
GROUP BY product_id;
-- Then, divide each category total by the grand total to get its percentage contribution
SELECT
cs.product_id,
cs.total_sales,
ROUND(cs.total_sales * 100.0 / (SELECT SUM(total_sales) FROM CategorySales), 2) AS percentage_contribution
FROM CategorySales cs
ORDER BY cs.product_id;
Explanation
This solution uses a temporary table to calculate the total sales for each product category. The second query then divides each category's total by the grand total across all categories. The grand total comes from a scalar subquery, so the denominator is the same for every row. Multiplying by 100.0 before dividing avoids integer division when sales_amount is stored as an integer. The result has one row per category; sales_date plays no part in the calculation.
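As a sanity check, the percentages for all categories should add up to 100. The sketch below computes the per-product totals and shares in one query with Python's built-in sqlite3 module; the sample rows are made up, and product_id stands in for the category as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (product_id INTEGER, sales_date TEXT, sales_amount REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, "2023-01-01", 100.0), (1, "2023-01-02", 200.0), (2, "2023-01-01", 100.0)],
)

shares = conn.execute("""
    SELECT product_id,
           SUM(sales_amount) AS total_sales,
           -- the scalar subquery is the grand total, identical for every row
           ROUND(SUM(sales_amount) * 100.0 / (SELECT SUM(sales_amount) FROM Sales), 2)
               AS percentage_contribution
    FROM Sales
    GROUP BY product_id
    ORDER BY product_id
""").fetchall()

for row in shares:
    print(row)
print("sum of percentages:", sum(pct for _, _, pct in shares))
```

Product 1 accounts for 300 of the 400 total (75%) and product 2 for the remaining 25%.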
Applications
This solution can be used to analyze sales data by product category. This information can be used to identify trends, make informed decisions about product marketing, and improve overall sales performance.
Calculate Trapping Rain Water
Problem Description:
You have a bunch of containers in a row. Each container can hold some amount of water. Some of them are not filled to the brim. Calculate how much rain water you could collect in these containers if it rains.
Example:
Input:
| container_id | water_level |
|---|---|
| 1 | 3 |
| 2 | 0 |
| 3 | 2 |
| 4 | 1 |
Output:
2 units of water
Explanation:
Here water_level is read as the height of each container's walls. Water above a container can rise only to the lower of the tallest container on its left and the tallest container on its right.
Container 2: The tallest container to its left is 3 units high and the tallest to its right is 2 units high. Water can rise to 2 units, and the container's own level is 0, so it collects 2 units.
Containers 1, 3, and 4: No water is collected, because on at least one side there is no taller container to hold the water in.
Total water collected: 2 units
Best Solution:
WITH Bounds AS (
SELECT
container_id,
water_level,
-- Tallest container from the first one up to and including this one
MAX(water_level) OVER (ORDER BY container_id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS left_max,
-- Tallest container from this one through the last one
MAX(water_level) OVER (ORDER BY container_id ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS right_max
FROM Container
)
SELECT SUM(
CASE WHEN left_max < right_max THEN left_max ELSE right_max END - water_level
) AS trapped_water
FROM Bounds;
Explanation:
Bounds:
For each container, compute the running maximum from the left (left_max) and from the right (right_max). Both maxima include the container itself, so neither is ever below water_level.
Final Query:
For each container, water can rise to the lower of left_max and right_max.
The water held above the container is that bound minus its own water_level, which is never negative.
Sum up the water collected from all the containers.
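One common way to compute trapped water is to bound each position by the running maximum from the left and from the right. The sketch below applies that technique to the four sample containers using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer); the names left_max and right_max are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Container (container_id INTEGER PRIMARY KEY, water_level INTEGER)")
conn.executemany("INSERT INTO Container VALUES (?, ?)", [(1, 3), (2, 0), (3, 2), (4, 1)])

trapped_water = conn.execute("""
    WITH Bounds AS (
        SELECT container_id,
               water_level,
               MAX(water_level) OVER (
                   ORDER BY container_id
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS left_max,
               MAX(water_level) OVER (
                   ORDER BY container_id
                   ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS right_max
        FROM Container
    )
    -- water above a container = lower of the two bounds minus its own level
    SELECT SUM(CASE WHEN left_max < right_max THEN left_max ELSE right_max END
               - water_level)
    FROM Bounds
""").fetchone()[0]

print(trapped_water)
```

Only container 2 holds water here: its bounds are 3 on the left and 2 on the right, so it collects 2 units.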
Real-World Applications:
Rainfall analysis: Predicting runoff and flooding risks.
Civil engineering: Designing water retention systems, such as dams and reservoirs.
Agriculture: Optimizing irrigation techniques and minimizing water loss.
Ad-Free Sessions
Problem Statement:
Given two tables, Sessions and Purchases, determine the total number of ad-free sessions for each product.
Database Schema:
| Sessions Table | |
|---|---|
| session_id | integer |
| product_id | integer |
| ad_free | boolean |

| Purchases Table | |
|---|---|
| purchase_id | integer |
| product_id | integer |
| user_id | integer |
Solution:
SELECT
s.product_id,
SUM(CASE WHEN s.ad_free = 1 THEN 1 ELSE 0 END) AS ad_free_sessions
FROM Sessions AS s
WHERE EXISTS (
SELECT 1
FROM Purchases AS p
WHERE p.product_id = s.product_id
)
GROUP BY
s.product_id;
Breakdown:
Keep only sessions for purchased products: The EXISTS subquery checks whether the Purchases table has at least one row with the same product_id. Unlike a plain JOIN, it never duplicates a session row, so a product with several purchases does not have each of its sessions counted several times.
Count ad-free sessions: The CASE expression checks whether the ad_free column in the Sessions table is set to 1. If it is, the expression evaluates to 1, otherwise it evaluates to 0. The SUM() function is then used to count the number of ad-free sessions for each product. Databases with a native boolean type, such as PostgreSQL, expect s.ad_free or s.ad_free = TRUE instead of s.ad_free = 1.
Group results: The GROUP BY s.product_id clause groups the results by product ID. This allows you to count the ad-free sessions for each product separately.
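Joining sessions directly to purchases repeats each session once per matching purchase, which inflates the count. The sketch below, using Python's built-in sqlite3 module and made-up rows, compares a plain JOIN with an EXISTS filter on the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sessions (session_id INTEGER PRIMARY KEY, product_id INTEGER, ad_free INTEGER);
    CREATE TABLE Purchases (purchase_id INTEGER PRIMARY KEY, product_id INTEGER, user_id INTEGER);
    -- Product 1: one ad-free session, one session with ads. Product 2: one ad-free session.
    INSERT INTO Sessions VALUES (1, 1, 1), (2, 1, 0), (3, 2, 1);
    -- Product 1 was purchased twice, product 2 once.
    INSERT INTO Purchases VALUES (1, 1, 10), (2, 1, 11), (3, 2, 12);
""")

count_expr = "SUM(CASE WHEN s.ad_free = 1 THEN 1 ELSE 0 END)"

with_join = conn.execute(f"""
    SELECT s.product_id, {count_expr}
    FROM Sessions s
    JOIN Purchases p ON s.product_id = p.product_id
    GROUP BY s.product_id ORDER BY s.product_id
""").fetchall()

with_exists = conn.execute(f"""
    SELECT s.product_id, {count_expr}
    FROM Sessions s
    WHERE EXISTS (SELECT 1 FROM Purchases p WHERE p.product_id = s.product_id)
    GROUP BY s.product_id ORDER BY s.product_id
""").fetchall()

print("JOIN:  ", with_join)
print("EXISTS:", with_exists)
```

Product 1 has a single ad-free session, but the JOIN version reports 2 because that session is paired with both purchases.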
Example:
| product_id | ad_free_sessions |
|---|---|
| 1 | 3 |
| 2 | 5 |
| 3 | 0 |
Real-World Applications:
This query can be used by businesses to analyze the effectiveness of their ad-free offerings. For example:
Identifying products with the highest ad-free session rates can help businesses decide which products to invest more in.
Tracking changes in ad-free session rates over time can help businesses understand the impact of new ad campaigns or changes to their pricing models.
Comparing ad-free session rates across different platforms or channels can help businesses determine which marketing efforts are most effective.
Reformat Department Table
Problem Statement:
You are given a table called Department with columns Id (unique identifier), Name, and Parent_Id. The table represents a hierarchical structure where each department has a parent department, except for the root department which has a Parent_Id of NULL.
Reformat the table to have a new column called Path that contains the path from the root department to the current department. The path should be separated by a forward slash (/).
Example:
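Original Table:
| Id | Name | Parent_Id |
|---|---|---|
| 1 | HR | NULL |
| 2 | Sales | 1 |
| 3 | IT | 1 |
| 4 | Dev | 2 |
| 5 | QA | 4 |
Reformatted Table:
| Id | Name | Parent_Id | Path |
|---|---|---|---|
| 1 | HR | NULL | /HR |
| 2 | Sales | 1 | /HR/Sales |
| 3 | IT | 1 | /HR/IT |
| 4 | Dev | 2 | /HR/Sales/Dev |
| 5 | QA | 4 | /HR/Sales/Dev/QA |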
SQL Solution:
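ALTER TABLE Department
ADD COLUMN Path VARCHAR(255);
WITH RECURSIVE DeptPath (Id, Path) AS (
-- Anchor: the root department's path is just its own name
SELECT Id, '/' || Name
FROM Department
WHERE Parent_Id IS NULL
UNION ALL
-- Recursive step: append each child's name to its parent's path
SELECT d.Id, p.Path || '/' || d.Name
FROM Department d
JOIN DeptPath p ON d.Parent_Id = p.Id
)
UPDATE Department
SET Path = (
SELECT DeptPath.Path
FROM DeptPath
WHERE DeptPath.Id = Department.Id
);
Breakdown:
Add the 'Path' Column: Use the ALTER TABLE statement to add a new column named Path of type VARCHAR(255) to the Department table.
Set the Root Department's Path: The anchor member of the recursive CTE selects the root department (Parent_Id of NULL) and sets its path to '/' followed by its name. The name must be concatenated with ||; a quoted literal such as '/Department.Name' would store that exact text instead of the department's name.
Recursively Build Child Departments' Paths: The recursive member joins each department to its parent's row in DeptPath and appends its own name, separated by a forward slash. The recursion repeats level by level until no new departments are found, so paths of any depth are built correctly.
Write the Paths Back: The UPDATE statement copies each computed path into the Path column. WITH RECURSIVE in front of UPDATE and the || operator work in SQLite and PostgreSQL; other databases may need CONCAT() or a different statement layout.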
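Building a path level by level is a natural fit for a recursive CTE. The sketch below is a read-only variant that computes the paths with a SELECT instead of storing them, run through Python's built-in sqlite3 module on the five sample departments; the CTE name DeptPath is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (Id INTEGER PRIMARY KEY, Name TEXT, Parent_Id INTEGER);
    INSERT INTO Department VALUES
        (1, 'HR', NULL), (2, 'Sales', 1), (3, 'IT', 1), (4, 'Dev', 2), (5, 'QA', 4);
""")

paths = conn.execute("""
    WITH RECURSIVE DeptPath (Id, Path) AS (
        -- anchor member: the root department
        SELECT Id, '/' || Name FROM Department WHERE Parent_Id IS NULL
        UNION ALL
        -- recursive member: extend the parent's path by the child's name
        SELECT d.Id, p.Path || '/' || d.Name
        FROM Department d
        JOIN DeptPath p ON d.Parent_Id = p.Id
    )
    SELECT Id, Path FROM DeptPath ORDER BY Id
""").fetchall()

for dept_id, path in paths:
    print(dept_id, path)
```

QA sits three levels below the root, and its path still comes out complete because each recursion step reuses the path already built for its parent.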
Real-World Applications:
The Path
column can be useful in various applications:
Hierarchical Navigation: Easily navigate through the department hierarchy by traversing the paths.
Permission Management: Control user access based on their department's position within the hierarchy.
Reporting and Analysis: Group data and perform analysis based on department paths.
User Interface Design: Display department structures in a tree-like view for user interaction.
Total Sales Amount by Year
Problem Statement:
Given a sales table with the following columns:
order_id
product_id
order_date
quantity
unit_price
Calculate the total sales amount for each year.
SQL Solution:
-- Calculate the total sales amount for each year
SELECT
strftime('%Y', order_date) AS year, -- Extract the year from the order date
SUM(quantity * unit_price) AS total_sales_amount -- Calculate the total sales amount for each year
FROM
sales -- Your sales table
GROUP BY
year -- Group the results by year
ORDER BY
year; -- Order the results by year
Breakdown and Explanation:
strftime('%Y', order_date) AS year: This line extracts the year from the order_date column using the strftime function.
SUM(quantity * unit_price): This line calculates the total sales amount for each year by multiplying the quantity and unit_price columns and then summing the results.
GROUP BY year: This line groups the results by year, so that the total sales amount is calculated for each unique year.
ORDER BY year: This line orders the results by year in ascending order.
Real-World Applications:
This query can be used to analyze sales trends over time. For example, a business could use this query to:
Identify years with the highest and lowest sales
Track sales growth or decline over the years
Compare sales performance to previous years
Forecast future sales trends
Maximum Transaction Each Day
Problem Statement:
You are given a table Transactions that contains the following columns:
id (primary key)
customer_id
amount
date
Find the maximum transaction amount for each day.
Explanation:
The goal of this problem is to find the highest transaction amount that occurred on each day. We can achieve this by grouping the transactions by day and then finding the maximum amount within each group.
SQL Solution:
SELECT
date,
MAX(amount) AS max_amount
FROM Transactions
GROUP BY
date;
Breakdown:
The SELECT statement retrieves the date column and the maximum amount for each date.
The FROM clause specifies the Transactions table as the source of data.
The GROUP BY clause groups the transactions by date, which means that all transactions that occurred on the same day will be grouped together.
The MAX() function is used to find the maximum amount within each group.
Real-World Applications:
This query can be used in various real-world applications, such as:
Identifying the days with the largest single transactions.
Analyzing spending patterns and identifying days with unusually high transactions.
Detecting potential fraudulent transactions by comparing daily maximum amounts to established baselines.
Example:
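Consider the following Transactions table:
| id | customer_id | amount | date |
|---|---|---|---|
| 1 | 100 | 100 | 2023-07-01 |
| 2 | 200 | 200 | 2023-07-01 |
| 3 | 300 | 300 | 2023-07-02 |
| 4 | 400 | 400 | 2023-07-02 |
| 5 | 500 | 500 | 2023-07-03 |
Running the SQL query on this table will produce the following result:
| date | max_amount |
|---|---|
| 2023-07-01 | 200 |
| 2023-07-02 | 400 |
| 2023-07-03 | 500 |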
This result shows that the maximum transaction amounts for each day are:
July 1, 2023: $200
July 2, 2023: $400
July 3, 2023: $500
Maximize Items
Problem:
Given a table Items with columns:
id: Integer
name: String
size: Integer
Find the items that maximize the total size of all selected items while ensuring that the total size of selected items does not exceed a given limit.
Constraints:
1 <= id <= 1000
name is a string
1 <= size <= 10000
1 <= limit <= 1000000
Solution:
WITH LargestItems AS (
SELECT id, name, size,
-- Running total of sizes, largest item first; id breaks ties
SUM(size) OVER (ORDER BY size DESC, id) AS running_total
FROM Items
)
SELECT id, name, size
FROM LargestItems
WHERE running_total <= @limit;
Explanation:
Create a Common Table Expression (CTE) LargestItems:
This CTE lists all items from largest to smallest and attaches a running total of their sizes.
Filter LargestItems by Running Total:
We keep only the items whose running total does not exceed the given limit (@limit is a placeholder for the limit parameter). An aggregate such as SUM(size) cannot be used directly in a WHERE clause, which is why the running total is computed with a window function first.
Limitation:
This is a greedy heuristic, not an exact solution. Choosing the subset with the largest total size under a limit is the subset-sum (knapsack) problem, and taking the largest items first can miss a better combination.
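A largest-first selection is only a heuristic for this problem, which a brute-force comparison makes visible. The sketch below runs a running-total query through Python's built-in sqlite3 module on the five sample items with a limit of 60, then checks every subset with itertools; the column name running_total is illustrative.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (id INTEGER PRIMARY KEY, name TEXT, size INTEGER)")
items = [(1, "Item A", 10), (2, "Item B", 20), (3, "Item C", 30),
         (4, "Item D", 40), (5, "Item E", 50)]
conn.executemany("INSERT INTO Items VALUES (?, ?, ?)", items)
limit = 60

# Greedy: keep items, largest first, while the running total stays within the limit.
greedy = conn.execute("""
    WITH LargestItems AS (
        SELECT id, name, size,
               SUM(size) OVER (ORDER BY size DESC, id) AS running_total
        FROM Items
    )
    SELECT id, name, size FROM LargestItems
    WHERE running_total <= ?
    ORDER BY size DESC
""", (limit,)).fetchall()
greedy_total = sum(size for _, _, size in greedy)

# Exhaustive check over all subsets (fine for five items, exponential in general).
sizes = [size for _, _, size in items]
best_total = max(
    sum(combo)
    for r in range(len(sizes) + 1)
    for combo in combinations(sizes, r)
    if sum(combo) <= limit
)

print("greedy:", greedy, greedy_total)
print("best possible total:", best_total)
```

The greedy query stops after Item E with a total of 50, while the exhaustive search finds subsets that reach the limit exactly (for example Item A plus Item E).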
Real-World Applications:
Knapsack Problem: Maximizing the value of items that can be fit into a limited-capacity backpack.
Resource Allocation: Distributing resources (e.g., storage space, processing power) optimally to maximize utilization.
Inventory Management: Determining the most valuable items to stock within a limited warehouse capacity.
Simplified Example:
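Items Table:
| id | name | size |
|---|---|---|
| 1 | Item A | 10 |
| 2 | Item B | 20 |
| 3 | Item C | 30 |
| 4 | Item D | 40 |
| 5 | Item E | 50 |
Given Limit: 60
Output:
| id | name | size |
|---|---|---|
| 5 | Item E | 50 |
Taking the largest items first, Item E (50) fits, and adding Item D (40) would bring the total to 90, which exceeds the limit. A largest-first selection is not optimal here: Item A and Item E together total exactly 60.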
Customers Who Never Order
Problem Statement:
Given a table customers with a customer_id column and a table orders that contains the following columns:
order_id (int): Unique ID of the order
customer_id (int): ID of the customer who placed the order
order_date (date): Date when the order was placed
Find the customers who have never placed an order.
Best & Performant Solution:
SELECT customer_id
FROM customers
EXCEPT
SELECT customer_id
FROM orders;
Breakdown and Explanation:
SELECT customer_id FROM customers: This query retrieves all the customer IDs from the customers table.
EXCEPT: The EXCEPT operator removes from the first result any rows that also appear in the second result, and it removes duplicates.
SELECT customer_id FROM orders: This query retrieves the customer IDs from the orders table.
Putting it together: The EXCEPT operator ensures that only the customer IDs that are not present in the orders table (i.e., customers who haven't placed any orders) are returned.
Real-World Application:
This query can be useful for identifying inactive customers in an e-commerce system. Businesses can use this information to target these customers with special promotions or incentives to encourage them to make purchases.
Hopper Company Queries I
LeetCode Problem:
Find all employees who have a manager and the manager's manager is the CEO.
SQL Query:
SELECT E.employee_id, E.name
FROM Employee E
JOIN Employee M ON E.manager_id = M.employee_id
JOIN Employee C ON M.manager_id = C.employee_id
WHERE C.name = 'CEO';
Breakdown and Explanation:
Join Tables:
We first join the Employee table with itself using an inner join (JOIN). The ON condition pairs each employee row only with the row of that employee's manager; it is not a Cartesian product.
We alias the second table as M to represent the manager of each employee.
We then join M with the Employee table again to get the manager of each manager. We alias this third table as C.
Filter Results:
We use the WHERE clause to filter the results based on the condition that the manager of the manager (C.name) is equal to 'CEO'. This assumes the CEO's row has the literal name 'CEO'.
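The double self-join can be traced on a small hierarchy. This sketch uses Python's built-in sqlite3 module; the employee names and reporting lines are made up for illustration, and the top row is assumed to carry the literal name 'CEO'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (employee_id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    -- CEO <- Alice <- Bob, Carol;  Bob <- Dan
    INSERT INTO Employee VALUES
        (1, 'CEO', NULL), (2, 'Alice', 1), (3, 'Bob', 2), (4, 'Carol', 2), (5, 'Dan', 3);
""")

two_levels_down = conn.execute("""
    SELECT E.employee_id, E.name
    FROM Employee E
    JOIN Employee M ON E.manager_id = M.employee_id   -- M is E's manager
    JOIN Employee C ON M.manager_id = C.employee_id   -- C is M's manager
    WHERE C.name = 'CEO'
    ORDER BY E.employee_id
""").fetchall()

print(two_levels_down)
```

Alice reports directly to the CEO and Dan is three levels down, so neither appears; only Bob and Carol have a manager whose manager is the CEO.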
Real-World Application:
This query can be useful in scenarios where you need to identify employees who are reporting to a specific manager and that manager is also reporting to a higher-level manager. For example, in a company with a hierarchical structure, this query finds the employees exactly two levels below the CEO: those whose manager reports directly to the CEO. Direct reports of the CEO and employees further down the hierarchy are not returned.
Nth Highest Salary
Problem Statement:
Given a table employees with the following columns:
emp_id
name
salary
Find the Nth highest salary among all employees.
Example Table:
| emp_id | name | salary |
|---|---|---|
| 1 | John Doe | 10000 |
| 2 | Jane Smith | 12000 |
| 3 | Michael Johnson | 15000 |
| 4 | David Wilson | 8000 |
| 5 | Sarah Jones | 9000 |
Nth Highest Salary Function:
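WITH RankedSalaries AS (
-- salary_rank is used as the alias because RANK is a reserved word in some databases
SELECT emp_id, name, salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees
)
SELECT name, salary
FROM RankedSalaries
WHERE salary_rank = N;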
Breakdown:
Common Table Expression (CTE) - The WITH clause creates a temporary result set called RankedSalaries that ranks the employees in descending order of salary using the DENSE_RANK() function. Employees with equal salaries share a rank, and the next distinct salary gets the next consecutive rank, so there are no gaps.
Outer query - The SELECT statement selects the name and salary of every employee whose rank is N.
How it Works:
The RankedSalaries CTE produces the original employee information along with the calculated ranks.
The outer SELECT statement then filters RankedSalaries to find the employees with the Nth rank.
Example Usage:
To find the 3rd highest salary in the example table:
WITH RankedSalaries AS (
SELECT emp_id, name, salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rank
FROM employees
)
SELECT name, salary
FROM RankedSalaries
WHERE rank = 3;
Output:
name | salary
---- | ------
John Doe | 10000
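The ranking can be checked against the example table with a small sketch using Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later); the helper name nth_highest and the alias salary_rank are illustrative choices, not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
INSERT INTO employees VALUES
  (1, 'John Doe', 10000), (2, 'Jane Smith', 12000), (3, 'Michael Johnson', 15000),
  (4, 'David Wilson', 8000), (5, 'Sarah Jones', 9000);
""")

def nth_highest(n):
    """Return (name, salary) rows whose dense rank by salary equals n."""
    return conn.execute("""
        WITH RankedSalaries AS (
            SELECT name, salary,
                   DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
            FROM employees
        )
        SELECT name, salary FROM RankedSalaries WHERE salary_rank = ?
    """, (n,)).fetchall()

print(nth_highest(1))  # highest salary
print(nth_highest(3))  # third highest salary
```

Passing N as a bound parameter avoids editing the SQL text for each rank. A rank beyond the number of distinct salaries simply yields an empty list.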
Potential Applications:
This function can be used in various real-world scenarios, such as:
Finding the highest-paid employees in a company
Calculating salary ranges for different job roles
Analyzing salary distribution patterns
Setting HR policies related to compensation and benefits
Customers Who Bought All Products
Problem:
Find the customers who have purchased all the products in a database.
Table Structure:
CREATE TABLE customers (
customer_id INT NOT NULL,
customer_name VARCHAR(255) NOT NULL,
PRIMARY KEY (customer_id)
);
CREATE TABLE products (
product_id INT NOT NULL,
product_name VARCHAR(255) NOT NULL,
PRIMARY KEY (product_id)
);
CREATE TABLE orders (
order_id INT NOT NULL,
customer_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT NOT NULL,
order_date DATE NOT NULL,
PRIMARY KEY (order_id),
FOREIGN KEY (customer_id) REFERENCES customers (customer_id),
FOREIGN KEY (product_id) REFERENCES products (product_id)
);
Solution:
SELECT
    c.customer_id,
    c.customer_name
FROM
    customers AS c
JOIN
    (
        -- One row per customer; GROUP BY already makes the IDs unique
        SELECT
            customer_id
        FROM
            orders
        GROUP BY
            customer_id
        HAVING
            COUNT(DISTINCT product_id) = (
                SELECT
                    COUNT(*)
                FROM
                    products
            )
    ) AS t
ON
    c.customer_id = t.customer_id;
Explanation:
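Join Customers and Order Details: We join the customers table to a subquery that selects the IDs of customers who have ordered every product (HAVING COUNT(DISTINCT product_id) = (SELECT COUNT(*) FROM products)). Because orders.product_id is a foreign key into products, a customer whose distinct product count equals the size of products must have ordered each product at least once.
Final Result: The query returns the customer IDs and names of customers who have purchased all products.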
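The counting argument can be exercised with a short sqlite3 sketch. The rows are hypothetical, and the orders table is trimmed to the columns the query actually reads (quantity and order_date are omitted here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL);
CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, product_name  TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
    product_id  INTEGER NOT NULL REFERENCES products (product_id)
);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
INSERT INTO products  VALUES (1, 'Pen'), (2, 'Ink'), (3, 'Pad');
-- Ann buys every product (Pen twice); Ben misses one; Cy buys nothing
INSERT INTO orders VALUES (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 1), (5, 2, 1), (6, 2, 2);
""")

bought_all = conn.execute("""
SELECT c.customer_id, c.customer_name
FROM customers AS c
JOIN (
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(DISTINCT product_id) = (SELECT COUNT(*) FROM products)
) AS t ON c.customer_id = t.customer_id
""").fetchall()
print(bought_all)
```

Ann's repeated Pen order does not inflate her count because the HAVING clause counts distinct product IDs.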
Real-World Application:
This query can be used to identify customers who are potentially loyal to a brand or who have a complete set of products in a specific category. This information can be used to target marketing campaigns or provide personalized recommendations.
Highest Grade For Each Student
Problem Statement
Given a table containing student records, find the highest grade for each student.
Example Table:
student_id | name | grade
---------- | ---- | -----
1 | John | 90
1 | John | 92
2 | Mary | 80
2 | Mary | 85
Output:
student_id | highest_grade
---------- | -------------
1 | 92
2 | 85
SQL Solution:
-- Using the MAX() aggregate function
SELECT student_id, MAX(grade) AS highest_grade
FROM student_records
GROUP BY student_id;
Explanation:
MAX() Aggregate Function: The MAX() function returns the maximum value in a group of rows. In this case, it finds the highest grade for each student.
GROUP BY Clause: The GROUP BY clause groups the results by student_id, so that the MAX() function is applied separately to each student's grades.
Breakdown:
student_id: The column that identifies each student.
grade: The column that contains the grades.
highest_grade: The column that stores the highest grade for each student.
Real-World Applications:
This query can be used in various educational applications, such as:
Finding the top-performing students in a class.
Calculating average grades for students.
Identifying students who need additional support.
New Users Daily Count
LeetCode Problem: New Users Daily Count
SQL Query:
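-- Create a temporary table to store the daily count of new users
CREATE TEMP TABLE DailyNewUserCount AS
SELECT
    DATE(u.created_at) AS date,
    COUNT(DISTINCT u.user_id) AS new_users
FROM
    users AS u
WHERE
    -- Keep a row only if the same user has no row on an earlier date.
    -- The aliases u and prev keep the outer and inner tables apart.
    NOT EXISTS (
        SELECT
            1
        FROM
            users AS prev
        WHERE
            prev.user_id = u.user_id
            AND DATE(prev.created_at) < DATE(u.created_at)
    )
GROUP BY
    DATE(u.created_at);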
-- Select the date and new user count from the temporary table
SELECT
date,
new_users
FROM
DailyNewUserCount;
Explanation:
Step 1: Create Temporary Table with Daily New User Count
We create a temporary table named DailyNewUserCount to store the daily count of new users.
We use the DATE() function to extract the date from the created_at column.
We count the number of distinct user_id values for each date to get the daily new user count.
The subquery ensures that a user is counted only on the first date they appear, by excluding rows for which the same user_id also has a row on an earlier date.
Step 2: Select Date and New User Count
From the temporary table, we select the date and new_users columns.
This gives us a list of dates and the corresponding number of new users for each date.
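The first-appearance logic can be sketched in Python's sqlite3 module as follows. The sample rows are hypothetical, and the users table is assumed to hold one row per user event, so a user can appear on several dates. The aliases u and prev are illustrative names that keep the outer and inner references to users apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, created_at TEXT);
INSERT INTO users VALUES
  (1, '2023-01-01 09:00:00'),
  (1, '2023-01-02 10:00:00'),
  (2, '2023-01-02 11:00:00'),
  (3, '2023-01-02 12:00:00'),
  (2, '2023-01-03 08:00:00');
""")

daily_new_users = conn.execute("""
SELECT DATE(u.created_at) AS first_date,
       COUNT(DISTINCT u.user_id) AS new_users
FROM users AS u
WHERE NOT EXISTS (
    -- Any row for the same user on an earlier date disqualifies this row
    SELECT 1
    FROM users AS prev
    WHERE prev.user_id = u.user_id
      AND DATE(prev.created_at) < DATE(u.created_at)
)
GROUP BY DATE(u.created_at)
ORDER BY first_date
""").fetchall()
print(daily_new_users)
```

User 1 is counted on 2023-01-01 only, users 2 and 3 on 2023-01-02, and 2023-01-03 does not appear because nobody is new on that date.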
Real-World Applications:
User Growth Analysis: Tracking new users daily can help businesses understand their user acquisition rate and growth trends.
Campaign Effectiveness: By comparing the daily new user count to marketing campaigns, businesses can evaluate the effectiveness of their user acquisition efforts.
Product Usage Analysis: Analyzing the daily new user count can provide insights into the usage patterns and onboarding experience of new users.
Customer Support Optimization: High daily new user counts may indicate a need for additional customer support resources to assist with onboarding and troubleshoot issues.
Find Third Transaction
Problem Statement:
You have a table Transactions with the following columns:
id (int)
amount (int)
sender_id (int)
receiver_id (int)
timestamp (timestamp)
You want to find the third transaction in the table when the rows are ordered by timestamp.
Solution:
SELECT
*
FROM Transactions
ORDER BY timestamp
LIMIT 2, 1;
Explanation:
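The ORDER BY timestamp clause orders the transactions by their timestamps in ascending order. The LIMIT 2, 1 clause skips the first two transactions and returns the next one. This offset, count form is MySQL and SQLite syntax; the equivalent LIMIT 1 OFFSET 2 also works in PostgreSQL.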
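A minimal sqlite3 sketch of the skip-two-take-one pattern, written with the LIMIT 1 OFFSET 2 spelling. The rows are inserted out of chronological order on purpose, to show that the result depends on ORDER BY rather than on insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (
    id INTEGER PRIMARY KEY, amount INTEGER,
    sender_id INTEGER, receiver_id INTEGER, timestamp TEXT
);
INSERT INTO Transactions VALUES
  (4, 400, 7, 8,  '2023-01-04 18:00:00'),
  (1, 100, 1, 2,  '2023-01-01 12:00:00'),
  (5, 500, 9, 10, '2023-01-05 20:00:00'),
  (3, 300, 5, 6,  '2023-01-03 16:00:00'),
  (2, 200, 3, 4,  '2023-01-02 14:00:00');
""")

# Skip the two earliest rows, then return exactly one
third = conn.execute(
    "SELECT * FROM Transactions ORDER BY timestamp LIMIT 1 OFFSET 2"
).fetchone()
print(third)
```

If the table holds fewer than three rows, fetchone() returns None instead of raising an error.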
Example:
Consider the following table:
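id | amount | sender_id | receiver_id | timestamp
-- | ------ | --------- | ----------- | ---------
1 | 100 | 1 | 2 | 2023-01-01 12:00:00
2 | 200 | 3 | 4 | 2023-01-02 14:00:00
3 | 300 | 5 | 6 | 2023-01-03 16:00:00
4 | 400 | 7 | 8 | 2023-01-04 18:00:00
5 | 500 | 9 | 10 | 2023-01-05 20:00:00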
The query would return the following result:
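id | amount | sender_id | receiver_id | timestamp
-- | ------ | --------- | ----------- | ---------
3 | 300 | 5 | 6 | 2023-01-03 16:00:00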
This is the third transaction in the table.
Applications in Real World:
This query can be used in various real-world applications, such as:
Identifying fraudulent transactions by looking for anomalous patterns in the sequence of transactions.
Tracking the flow of money in a system by identifying the source and destination of each transaction.
Analyzing customer behavior by understanding the types of transactions they make and the frequency of their transactions.
Rearrange Products Table
Problem:
Rearrange the Products table so that products with higher price values are listed first.
Solution:
SELECT *
FROM Products
ORDER BY price DESC;
Explanation:
The ORDER BY clause allows us to sort the results of a query. In this case, we sort the results in descending order by the price column. This means that products with higher prices will be listed first.
Real-World Application:
This query can be used in a variety of applications, such as:
Displaying a list of products on a website, with the most expensive products listed first
Generating a report of the most expensive products sold in a given period
Identifying products that are overpriced compared to their competitors
Potential Performance Improvements:
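If the Products table is very large, the query may take a long time to execute because the database has to sort every row. In this case, you can often improve performance by creating an index on the price column. An index is a data structure that helps the database quickly find rows based on the values in a particular column.
To create an index on the price column, you can use the following query:
CREATE INDEX idx_products_price ON Products (price);
Once the index is created, the database can read rows in price order from the index instead of sorting them, which is usually faster. Whether the optimizer actually uses the index for a SELECT * over the whole table depends on the database and the size of the table.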
Unpopular Books
Problem:
Find books with fewer than 100 sales.
SQL Query:
SELECT
BookId,
Title,
Sales
FROM
Books
WHERE
Sales < 100;
Breakdown:
The SELECT clause specifies the columns to retrieve: BookId, Title, and Sales.
The FROM clause specifies the table to retrieve the data from, which is Books in this case.
The WHERE clause filters the results to only include books with sales less than 100.
Real-World Application:
This query can be used by a bookstore to identify books that are not selling well and need to be discounted or removed from inventory.
Article Views I
Problem: Find the number of views for each article.
SQL Query:
SELECT article_id, COUNT(*) AS views
FROM article_views
GROUP BY article_id;
Breakdown:
article_views: The table containing the article views.
article_id: The ID of the article.
COUNT(*) AS views: The number of views for each article. The alias views is used to name the column.
GROUP BY article_id: Groups the results by article ID, so that the count is calculated for each article.
Example:
CREATE TABLE article_views (
article_id INT,
user_id INT,
date DATETIME
);
INSERT INTO article_views VALUES
(1, 10, '2023-01-01'),
(1, 20, '2023-01-02'),
(2, 30, '2023-01-03'),
(2, 40, '2023-01-04'),
(3, 50, '2023-01-05');
SELECT article_id, COUNT(*) AS views
FROM article_views
GROUP BY article_id;
Output:
article_id | views
---------- | -----
1 | 2
2 | 2
3 | 1
Real-World Application:
This query can be used to analyze the popularity of articles on a website, such as in a content management system or blog. The results can be used to:
Identify which articles are most popular with users.
Track the performance of different articles over time.
Make decisions about which articles to promote or feature.
Customer Order Frequency
Objective: Find the number of orders placed by each customer, together with the average number of orders per customer across all customers.
SQL Implementation:
-- Table: orders
-- Columns:
--   id - Order ID
--   customer_id - Customer ID
SELECT
    customer_id,
    COUNT(*) AS order_count,
    -- Average of the per-customer counts, taken over all customers
    AVG(COUNT(*)) OVER () AS avg_orders
FROM
    orders
GROUP BY
    customer_id;
Breakdown:
SELECT: Select columns for the result:
customer_id: The unique identifier for each customer.
order_count: The total number of orders placed by that customer.
avg_orders: The average number of orders per customer, computed across all customers.
FROM: Specify the input table orders that contains the order records.
GROUP BY: Group the rows by customer_id so that the order count is calculated for each customer.
COUNT(): Counts the number of orders for each customer and stores it in the order_count column.
AVG() OVER(): Averages the per-customer counts. The window function runs after grouping, and the empty OVER () makes the window span every customer. The alias order_count cannot be referenced inside the same SELECT list, so the expression repeats COUNT(*). Partitioning by customer_id would leave one row per partition and simply return each customer's own count.
Example:
customer_id | order_count | avg_orders
----------- | ----------- | ----------
1 | 5 | 5
2 | 3 | 5
3 | 7 | 5
This result shows that:
Customer with ID 1 has placed 5 orders.
Customer with ID 2 has placed 3 orders.
Customer with ID 3 has placed 7 orders.
Across the three customers, the average is (5 + 3 + 7) / 3 = 5 orders.
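One reading of the objective is to show each customer's order count next to the average count over all customers. The sketch below takes that reading and computes it in two steps with a CTE, which sidesteps referencing a column alias inside the same SELECT list. It uses Python's sqlite3 module, assumes window-function support (SQLite 3.25 or later), and the CTE name per_customer is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")

# Hypothetical data: customer 1 places 5 orders, customer 2 places 3, customer 3 places 7
customer_ids = [1] * 5 + [2] * 3 + [3] * 7
conn.executemany("INSERT INTO orders (customer_id) VALUES (?)",
                 [(c,) for c in customer_ids])

rows = conn.execute("""
WITH per_customer AS (
    -- Step 1: one row per customer with that customer's order count
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id
)
-- Step 2: empty OVER () averages the counts across every customer
SELECT customer_id, order_count, AVG(order_count) OVER () AS avg_orders
FROM per_customer
ORDER BY customer_id
""").fetchall()
print(rows)
```

Every row carries the same avg_orders value, (5 + 3 + 7) / 3 = 5, which makes it easy to compare each customer against the overall average.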
Applications:
Customer Segmentation: Identifying customer groups based on their order frequency can help businesses tailor marketing campaigns.
Loyalty Programs: Rewarding customers with higher order frequencies can encourage loyalty and repeat business.
Inventory Management: Understanding the average order frequency for each customer can help businesses optimize inventory levels.
Fraud Detection: Customers with unusually high or low order frequencies may be flagged for potential fraud investigation.
All Valid Triplets That Can Represent a Country
Problem Statement
Given a table Country containing country information, write a query to find all valid triplets that can represent a country.
Country (code, name)
A valid triplet is a set of three country codes that satisfy the following conditions:
The first two codes are neighboring countries, sharing a border.
The third code is a neighbor of the first country, but not the second country.
SELECT c1.code, c2.code, c3.code
FROM Country c1
JOIN Country c2 ON c1.code != c2.code AND c1.name = c2.name
JOIN Country c3 ON c3.code != c1.code AND c1.code = c3.name
WHERE c2.code != c3.code;
Breakdown and Explanation
Join Countries with Shared Borders:
JOIN Country c2 ON c1.code != c2.code AND c1.name = c2.name
This joins Country with itself (using the alias c2) on the condition that the two rows have different codes but the same name. The query treats such a pair as neighbors. Country (code, name) carries no border information, so this condition is a stand-in for a real adjacency test.
Find Neighbors of the First Country:
JOIN Country c3 ON c3.code != c1.code AND c1.code = c3.name
This joins Country again (aliased as c3). It ensures that c3 is not the same row as c1 and matches rows whose name equals c1's code, which the query treats as c3 being a neighbor of c1.
Filter Out Invalid Pairs:
WHERE c2.code != c3.code
This condition eliminates cases where c2 and c3 are the same country. It does not check that c3 is not a neighbor of c2, so the second condition of the problem statement is only partly enforced.
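Since Country (code, name) holds no adjacency data, the two stated conditions can only be tested directly with an extra table. The sketch below assumes a hypothetical Border (code_a, code_b) table that stores one row per direction of each shared border; that table, its column names, and the sample countries are illustrative assumptions and do not come from the problem statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (code TEXT PRIMARY KEY, name TEXT);
-- Hypothetical adjacency table: one row per direction of each shared border
CREATE TABLE Border (
    code_a TEXT REFERENCES Country (code),
    code_b TEXT REFERENCES Country (code)
);
INSERT INTO Country VALUES ('A', 'Aland'), ('B', 'Bland'), ('C', 'Cland');
-- A borders B and C; B and C do not border each other
INSERT INTO Border VALUES ('A', 'B'), ('B', 'A'), ('A', 'C'), ('C', 'A');
""")

triplets = conn.execute("""
SELECT b12.code_a, b12.code_b, b13.code_b
FROM Border b12                                   -- first and second are neighbors
JOIN Border b13 ON b13.code_a = b12.code_a        -- third is a neighbor of the first
               AND b13.code_b != b12.code_b
WHERE NOT EXISTS (                                -- third is not a neighbor of the second
    SELECT 1 FROM Border b23
    WHERE b23.code_a = b12.code_b AND b23.code_b = b13.code_b
)
ORDER BY 1, 2, 3
""").fetchall()
print(triplets)
```

With this data the valid triplets are (A, B, C) and (A, C, B). Adding a B-C border would remove both, because the third country would then neighbor the second.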
The Most Frequently Ordered Products for Each Customer
Problem Statement
Given a table of customer orders, find the most frequently ordered products for each customer.
Table Schema
CREATE TABLE orders (
customer_id INT NOT NULL,
product_id INT NOT NULL,
order_date DATE NOT NULL
);
Solution
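The following SQL query counts the orders for each customer and product, then uses the ROW_NUMBER() function to rank the products for each customer based on the number of orders:
SELECT customer_id,
       product_id,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY COUNT(*) DESC) AS rank
FROM orders
GROUP BY customer_id, product_id;
Output
The output of the query is a table with the following columns:
customer_id: The ID of the customer.
product_id: The ID of the product.
rank: The rank of the product for the customer.
Explanation
The GROUP BY clause produces one row per customer and product, so COUNT(*) is the number of times that customer ordered that product. Without it, the aggregate inside the window's ORDER BY is not valid.
The ROW_NUMBER() function is used to assign a rank to each product for each customer. The PARTITION BY clause groups the products by customer, and the ORDER BY clause orders the products by the number of orders in descending order.
The ROW_NUMBER() function assigns a rank to each product within each partition. The first product in each partition is assigned a rank of 1, the second product is assigned a rank of 2, and so on. The rows with rank 1 are each customer's most frequently ordered product. ROW_NUMBER() breaks ties arbitrarily; RANK() gives tied products the same rank if all of them should be kept.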
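To go from a ranking to the most frequently ordered products themselves, the ranked rows can be filtered to rank 1. The sketch below does this in Python's sqlite3 module with hypothetical data. It swaps in RANK() so that tied products are all kept, splits the work into two CTEs (counts and ranked, both illustrative names), and assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    customer_id INTEGER NOT NULL,
    product_id  INTEGER NOT NULL,
    order_date  TEXT NOT NULL
);
-- Customer 1 clearly prefers product 10; customer 2 has a tie between 20 and 30
INSERT INTO orders VALUES
  (1, 10, '2023-01-01'), (1, 10, '2023-01-02'), (1, 10, '2023-01-03'), (1, 20, '2023-01-04'),
  (2, 20, '2023-01-01'), (2, 20, '2023-01-02'), (2, 30, '2023-01-03'), (2, 30, '2023-01-04');
""")

top_products = conn.execute("""
WITH counts AS (
    SELECT customer_id, product_id, COUNT(*) AS times_ordered
    FROM orders
    GROUP BY customer_id, product_id
), ranked AS (
    SELECT customer_id, product_id,
           RANK() OVER (PARTITION BY customer_id
                        ORDER BY times_ordered DESC) AS product_rank
    FROM counts
)
SELECT customer_id, product_id
FROM ranked
WHERE product_rank = 1
ORDER BY customer_id, product_id
""").fetchall()
print(top_products)
```

Customer 2 gets two rows because products 20 and 30 are tied; with ROW_NUMBER() only one of them would survive the filter.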
Real-World Application
This query can be used to identify the most popular products for each customer. This information can be used to personalize marketing campaigns, improve product recommendations, and optimize inventory management.
Potential Applications
Personalized marketing campaigns: Retailers can use this information to send targeted marketing campaigns to customers based on their purchase history. For example, a retailer could send a discount coupon for a customer's favorite product.
Improved product recommendations: Online retailers can use this information to recommend products to customers based on their past purchases. For example, an online retailer could recommend a product that is similar to a product that the customer has purchased in the past.
Optimized inventory management: Retailers can use this information to optimize their inventory by stocking more of the products that are most popular with their customers. This can help to reduce lost sales and improve customer satisfaction.
Friday Purchases I
Problem Statement
Find the total sum of purchases made on Fridays for the following table:
id | purchase_date | purchase_amount
-- | ------------- | ---------------
1 | 2023-01-09 | 10
2 | 2023-01-10 | 20
3 | 2023-01-11 | 30
4 | 2023-01-12 | 40
5 | 2023-01-13 | 50
Solution
SELECT SUM(purchase_amount)
FROM purchases
WHERE strftime('%w', purchase_date) = '5';
Explanation
strftime('%w', purchase_date) = '5': This condition checks whether the day of the week for the purchase_date column is '5'. In SQLite's strftime(), %w returns the weekday as a digit from 0 to 6 with Sunday = 0, so '5' represents Friday.
SUM(purchase_amount): Calculates the sum of all purchase_amount values that satisfy the condition above, representing the total amount spent on Fridays. In the table above only 2023-01-13 falls on a Friday, so the result is 50.
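Because strftime() is SQLite's date function, the query can be run as-is through Python's sqlite3 module. The sketch below loads the sample rows and also prints the weekday digit for each date; the id column name is an assumption, since the sample table does not label its first column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (id INTEGER PRIMARY KEY, purchase_date TEXT, purchase_amount INTEGER);
INSERT INTO purchases VALUES
  (1, '2023-01-09', 10), (2, '2023-01-10', 20), (3, '2023-01-11', 30),
  (4, '2023-01-12', 40), (5, '2023-01-13', 50);
""")

# %w yields '0' (Sunday) through '6' (Saturday) as text
weekdays = conn.execute(
    "SELECT purchase_date, strftime('%w', purchase_date) FROM purchases ORDER BY id"
).fetchall()
friday_total = conn.execute(
    "SELECT SUM(purchase_amount) FROM purchases WHERE strftime('%w', purchase_date) = '5'"
).fetchone()[0]
print(weekdays)
print(friday_total)
```

The comparison is against the string '5' because strftime() returns text. If no purchase falls on a Friday, SUM() returns NULL, which Python receives as None.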
Real-Time Example
Suppose you have an online shopping website and you want to track the total revenue generated on Fridays. The provided SQL query can be used to retrieve this information. The results can be used to analyze sales patterns and optimize marketing campaigns for higher conversions on Fridays.
Potential Applications
Sales Analysis: Determine the days of the week with the highest sales.
Marketing Optimization: Target users with promotions and discounts on days with high purchasing activity.
Inventory Management: Predict demand based on historical sales patterns by day of the week.
Revenue Forecasting: Estimate future revenue based on historical data and anticipated trends.
Tournament Winners
Problem Statement:
Tournament Winners (LeetCode)
Write an SQL query to find the players who won at least one tournament.
Schema:
| Table: Players |
| Column: id | Type: INTEGER | Primary Key |
| Column: name | Type: VARCHAR(255) |
| Column: team | Type: VARCHAR(255) |
| Table: Tournaments |
| Column: id | Type: INTEGER | Primary Key |
| Column: name | Type: VARCHAR(255) |
| Column: winner_id | Type: INTEGER | Foreign Key (References Players.id) |
Solution:
-- DISTINCT keeps a player who won several tournaments from appearing more than once
SELECT DISTINCT
    P.id,
    P.name,
    P.team
FROM Players AS P
INNER JOIN Tournaments AS T
    ON P.id = T.winner_id;
Explanation:
Join the Tables: We join the Players and Tournaments tables on the winner_id column, which links the player who won a tournament to the tournament itself.
Filter for Winners: The INNER JOIN operator only returns rows that have matching values in both tables. In this case, it selects only the players who have won at least one tournament.
Select Player Data: The SELECT statement retrieves the id, name, and team columns from the Players table, which contains the information about the winning players.
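An inner join returns one row per matching tournament, so a player with several wins shows up several times unless the result is de-duplicated. The sqlite3 sketch below, with hypothetical players and tournaments, runs the join with and without DISTINCT to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Players (id INTEGER PRIMARY KEY, name VARCHAR(255), team VARCHAR(255));
CREATE TABLE Tournaments (
    id INTEGER PRIMARY KEY, name VARCHAR(255),
    winner_id INTEGER REFERENCES Players (id)
);
INSERT INTO Players VALUES (1, 'Ann', 'Red'), (2, 'Ben', 'Red'), (3, 'Cy', 'Blue');
-- Ann wins twice, Cy once, Ben never
INSERT INTO Tournaments VALUES (1, 'Open', 1), (2, 'Cup', 1), (3, 'Masters', 3);
""")

join_sql = """
SELECT {distinct} P.id, P.name, P.team
FROM Players AS P
INNER JOIN Tournaments AS T ON P.id = T.winner_id
ORDER BY P.id
"""
with_duplicates = conn.execute(join_sql.format(distinct="")).fetchall()
winners = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()
print(with_duplicates)
print(winners)
```

The plain join lists Ann twice, once per tournament she won; the DISTINCT version lists each winner once.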
Real-World Applications:
Track and reward players for their accomplishments in tournaments.
Identify and showcase the most successful teams and individuals in a competition.
Analyze player performance and team dynamics for improvement.
Number of Transactions per Visit
Problem:
You are given a table called Transactions with the following schema:
| Column Name | Type |
|-------------|------|
| user_id | int |
| visit_id | int |
| transaction_id | int |
| amount | int |
Each row in this table represents a transaction made by a user during a visit to a website. You need to write an SQL query to find the number of transactions made by each user per visit.
Solution:
SELECT user_id, visit_id, COUNT(*) AS num_transactions
FROM Transactions
GROUP BY user_id, visit_id;
Explanation:
This query uses the GROUP BY clause to group the transaction records by the user_id and visit_id columns. The COUNT(*) function is then used to count the number of transactions in each group.
Breakdown:
The SELECT clause selects the user_id, visit_id, and num_transactions columns.
The FROM clause specifies the Transactions table.
The GROUP BY clause groups the rows by the user_id and visit_id columns.
The COUNT(*) function counts the number of rows in each group.
Real-World Example:
This query can be used to analyze website usage data. For example, you could use it to identify users who make multiple transactions during a single visit. This information could be used to target those users with personalized offers or discounts.
Form a Chemical Bond
Problem Statement:
Given two tables:
elements (id, symbol, atomic_number)
bonds (element1_id, element2_id, bond_type)
Form a chemical bond between two elements based on their atomic numbers.
Solution:
ALTER TABLE bonds
ADD COLUMN bond_length FLOAT;
UPDATE bonds
SET bond_length =
CASE
WHEN element1_id = element2_id THEN 0
WHEN (element1_id + element2_id) % 3 = 0 THEN 1.0
WHEN (element1_id + element2_id) % 5 = 0 THEN 1.5
ELSE 2.0
END;
Explanation:
ALTER TABLE bonds ADD COLUMN bond_length FLOAT creates a new column bond_length of data type FLOAT in the bonds table.
UPDATE bonds SET bond_length = ... updates the bond_length column based on the following conditions, checked in order so that the first match wins:
If both sides of the bond refer to the same element, the bond length is 0.
If the sum of the two element IDs is divisible by 3, the bond length is 1.0.
If the sum of the two element IDs is divisible by 5, the bond length is 1.5.
Otherwise, the bond length is 2.0.
The query operates on element1_id and element2_id, which correspond to atomic numbers only if elements.id was assigned to equal atomic_number.
Real-World Applications:
This query can be used to:
Simulate chemical reactions: Predict the bond lengths of molecules formed by combining different elements.
Design materials: Determine the strength and properties of materials based on the bond lengths between atoms.
Understand molecular structure: Analyze the geometric arrangement of atoms in molecules.
Game Play Analysis IV
LeetCode Problem: Game Play Analysis IV
Problem Statement:
Given a table Gameplay that records player gameplay data, where:
player_id is the ID of the player
game_id is the ID of the game
event_type is the type of event that occurred during the game, either "START" or "END"
timestamp is the timestamp of the event
Find the number of players who have completed at least 5 games.
Best & Performant Solution:
WITH PlayerGamesCompleted AS (
SELECT player_id, COUNT(*) AS num_games_completed
FROM Gameplay
WHERE event_type = 'END'
GROUP BY player_id
HAVING COUNT(*) >= 5
), PlayerCount AS (
SELECT COUNT(*) AS num_players_completed_5_games
FROM PlayerGamesCompleted
)
SELECT num_players_completed_5_games
FROM PlayerCount;
Breakdown:
1. PlayerGamesCompleted Common Table Expression (CTE):
Groups gameplay events by player_id and counts the number of "END" events for each player.
Filters out players who have completed fewer than 5 games.
2. PlayerCount CTE:
Counts the number of players who have completed at least 5 games from the PlayerGamesCompleted CTE.
3. Final Query:
Selects the count of players who have completed at least 5 games from the PlayerCount CTE.
Simplified Explanation:
We first count the number of completed games for each player.
Then, we only keep the players who have completed at least 5 games.
Finally, we count the number of players in this filtered group to get the number of players who have completed at least 5 games.
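The three steps can be run end to end with a short sqlite3 sketch. The event data is generated in Python and is purely hypothetical; the final count reads straight from the first CTE, which is one way to make sure the outer SELECT has a FROM clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Gameplay (player_id INTEGER, game_id INTEGER, event_type TEXT, timestamp TEXT)"
)

# Hypothetical data: player 1 finishes 5 games, player 2 finishes 4, player 3 finishes 6
finished = {1: 5, 2: 4, 3: 6}
events = []
for player_id, games in finished.items():
    for game_id in range(1, games + 1):
        events.append((player_id, game_id, "START", f"2023-01-{game_id:02d} 10:00:00"))
        events.append((player_id, game_id, "END", f"2023-01-{game_id:02d} 11:00:00"))
# Player 2 starts a fifth game but never finishes it
events.append((2, 5, "START", "2023-01-05 10:00:00"))
conn.executemany("INSERT INTO Gameplay VALUES (?, ?, ?, ?)", events)

num_players = conn.execute("""
WITH PlayerGamesCompleted AS (
    SELECT player_id, COUNT(*) AS num_games_completed
    FROM Gameplay
    WHERE event_type = 'END'
    GROUP BY player_id
    HAVING COUNT(*) >= 5
)
SELECT COUNT(*) FROM PlayerGamesCompleted
""").fetchone()[0]
print(num_players)
```

Player 2 is excluded because an unfinished game produces a START event but no END event, so only players 1 and 3 are counted.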
Real-World Applications:
This query can be used in game analytics to identify players who are highly engaged and have progressed significantly in the game. This information can be used to reward active players, offer them exclusive perks, or track player retention rates.