Shortest Distance in a Plane

Problem:

In a plane, you are given the coordinates of two points (x1, y1) and (x2, y2). Find the shortest distance between these two points.

SQL Solution:

SELECT SQRT(POW((x2 - x1), 2) + POW((y2 - y1), 2)) AS distance
FROM table_name
WHERE ...;

Breakdown:

  • The POW(x, n) function raises x to the power n; with n = 2 it squares each coordinate difference.

  • The SQRT() function calculates the square root of a number.

  • The + operator adds two numbers.

The formula for calculating the distance between two points is:

distance = sqrt((x2 - x1)^2 + (y2 - y1)^2)

The SQL query uses this formula to calculate the distance between the two points.
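The formula can be checked end to end with a small sketch that runs the same expression in an in-memory SQLite database from Python. The table name points and the helper distance are hypothetical, and SQRT and POW are registered by hand on the assumption that the local SQLite build may not ship its math functions.

```python
import math
import sqlite3


def distance(x1, y1, x2, y2):
    """Evaluate the Euclidean distance formula in SQL for one pair of points."""
    conn = sqlite3.connect(":memory:")
    # Register SQRT and POW explicitly; not every SQLite build includes them.
    conn.create_function("SQRT", 1, math.sqrt)
    conn.create_function("POW", 2, math.pow)
    # Hypothetical table holding one pair of points per row.
    conn.execute("CREATE TABLE points (x1 REAL, y1 REAL, x2 REAL, y2 REAL)")
    conn.execute("INSERT INTO points VALUES (?, ?, ?, ?)", (x1, y1, x2, y2))
    (d,) = conn.execute(
        "SELECT SQRT(POW(x2 - x1, 2) + POW(y2 - y1, 2)) AS distance FROM points"
    ).fetchone()
    conn.close()
    return d


print(distance(0, 0, 3, 4))  # a 3-4-5 right triangle
```

In MySQL both functions are built in, so the registration step is only needed for this SQLite-based check.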

Real-World Application:

This problem can be applied in various real-world scenarios, such as:

  • Estimating the straight-line ("as the crow flies") distance between two cities. Actual driving distance depends on the road network and is usually longer.

  • Calculating the distance between two points on a map.

  • Determining the minimum distance required to reach a destination.


Find Candidates for Data Scientist Position

Problem Statement:

You have a table called candidates with the following columns:

  • candidate_id - Unique identifier for each candidate

  • name - Name of the candidate

  • email - Email address of the candidate

  • skills - A comma-separated list of skills possessed by the candidate

You want to find all candidates who have at least one of the skills: "Python", "SQL", and "Machine Learning".

Best & Performant SQL Solution:

SELECT
  *
FROM candidates
WHERE
  skills LIKE '%Python%'
  OR skills LIKE '%SQL%'
  OR skills LIKE '%Machine Learning%';

Breakdown and Explanation:

  1. LIKE Operator:

    The LIKE operator is used to perform pattern matching in SQL. It checks if a string matches a specified pattern. The % wildcard character matches any number of characters, including no characters.

  2. Using LIKE with Skills:

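    In the WHERE clause, we use the LIKE operator with each skill we want to match. For example, skills LIKE '%Python%' will match any candidate whose skills column contains the string "Python". Because this is a substring match, it can also produce false positives: '%SQL%' matches a candidate who lists only "MySQL" or "NoSQL".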

  3. OR Operator:

    The OR operator is used to combine multiple conditions. In this case, we are combining the three conditions using OR. This means that a candidate will be selected if any of the three conditions are met.
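The substring behaviour of LIKE is easy to see on a small table. The sketch below, run through Python's sqlite3 module, compares the plain '%SQL%' pattern with a delimiter-padded pattern that only matches whole list entries. The third row is invented for illustration, and the padding trick assumes skills are always separated by a comma and a space.

```python
import sqlite3


def matching_names(where_clause):
    """Return candidate names selected by the given WHERE clause (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE candidates (candidate_id INTEGER, name TEXT, skills TEXT)")
    conn.executemany(
        "INSERT INTO candidates VALUES (?, ?, ?)",
        [
            (1, "John Doe", "Python, SQL, Machine Learning"),
            (3, "Mark Jones", "Java, C++, JavaScript"),
            (4, "Hypothetical Hana", "MySQL, Excel"),  # invented row
        ],
    )
    # The clause is interpolated only because it is a trusted constant here.
    rows = conn.execute(
        f"SELECT name FROM candidates WHERE {where_clause} ORDER BY candidate_id"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


# Plain substring match: also catches "MySQL".
SUBSTRING = "skills LIKE '%SQL%'"
# Pad the list with delimiters so only the whole entry "SQL" matches.
EXACT = "', ' || skills || ', ' LIKE '%, SQL, %'"

print(matching_names(SUBSTRING))
print(matching_names(EXACT))
```

The first call returns both John Doe and the invented MySQL-only candidate; the second returns John Doe alone.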

Real-World Applications:

This query can be used in a variety of real-world applications, such as:

  • Candidate Screening: HR professionals can use this query to identify candidates who possess specific skills required for a job.

  • Skill Analysis: Managers can use this query to analyze the skills of their employees and identify areas where training or development is needed.

  • Market Research: Companies can use this query to research the availability of candidates with certain skills in a particular market.

Example:

Consider the following candidates table:

| candidate_id | name | email | skills |
|---|---|---|---|
| 1 | John Doe | john.doe@example.com | Python, SQL, Machine Learning |
| 2 | Jane Smith | jane.smith@example.com | SQL, Machine Learning |
| 3 | Mark Jones | mark.jones@example.com | Java, C++, JavaScript |

Running the above query on this table will return the following result:

| candidate_id | name | email | skills |
|---|---|---|---|
| 1 | John Doe | john.doe@example.com | Python, SQL, Machine Learning |
| 2 | Jane Smith | jane.smith@example.com | SQL, Machine Learning |

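Note: An ordinary B-tree index on the skills column does not speed up this query, because a pattern that starts with the % wildcard forces a scan of every row. For better performance, store skills in a separate table with one row per candidate and skill, or use a full-text index.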


The Number of Rich Customers

Problem Statement:

Given a table called Customers, which contains the following columns:

  • id - Integer: Unique identifier for each customer

  • name - String: Customer's name

  • amount - Integer: Transaction amount for each customer

Find the number of distinct customers who have made transactions of more than a certain amount X.

SQL Solution:

SELECT COUNT(DISTINCT id)
FROM Customers
WHERE amount > X;

Explanation:

  • SELECT COUNT(DISTINCT id): This expression counts the number of distinct id values, which corresponds to the number of distinct customers. The DISTINCT keyword ensures that each unique customer is counted only once.

  • FROM Customers: The query retrieves data from the Customers table.

  • WHERE amount > X: This condition filters the rows where the amount column is greater than the specified value X. Only customers who have made transactions of more than X are included in the count.

Example:

Consider the following Customers table:

| id | name | amount |
|---|---|---|
| 1 | Alice | 100 |
| 2 | Bob | 50 |
| 3 | Cindy | 150 |
| 4 | David | 75 |
| 5 | Emily | 200 |

If we want to find the number of customers who have made transactions of more than 125, we would run the following query:

SELECT COUNT(DISTINCT id)
FROM Customers
WHERE amount > 125;

This query would return the result:

| count |
|---|
| 2 |

Explanation: There are two customers, Cindy (150) and Emily (200), who have made transactions of more than 125.
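The example can be reproduced with a short sketch that loads the table above into an in-memory SQLite database and runs the counting query with the threshold passed as a bound parameter. The helper name rich_customers is hypothetical; it also returns the matching names so the filter is visible.

```python
import sqlite3


def rich_customers(threshold):
    """Return (count, names) of customers whose amount exceeds the threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (id INTEGER, name TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO Customers VALUES (?, ?, ?)",
        [(1, "Alice", 100), (2, "Bob", 50), (3, "Cindy", 150),
         (4, "David", 75), (5, "Emily", 200)],
    )
    # X from the problem statement becomes a bound parameter.
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT id) FROM Customers WHERE amount > ?", (threshold,)
    ).fetchone()
    names = [row[0] for row in conn.execute(
        "SELECT name FROM Customers WHERE amount > ? ORDER BY id", (threshold,)
    )]
    conn.close()
    return count, names


print(rich_customers(125))
```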

Real-World Application:

This query can be used to analyze customer spending patterns and identify high-value customers. For example, a retail company could use this query to identify customers who have spent a certain amount in the past month and offer them exclusive discounts or rewards.


Biggest Window Between Visits

Problem Statement

Given a table of patient_visits that logs when patients visit a clinic, find the largest time gap between any two consecutive visits for each patient.

SQL Query:

WITH PatientTimeGaps AS (
    SELECT patient_id,
           visit_date,
           DATEDIFF(visit_date, LAG(visit_date) OVER (PARTITION BY patient_id ORDER BY visit_date)) AS time_gap
    FROM patient_visits
)

SELECT patient_id, MAX(time_gap) AS largest_time_gap
FROM PatientTimeGaps
GROUP BY patient_id;

Explanation:

This query uses a common table expression (CTE) named PatientTimeGaps to calculate the time difference between each patient's consecutive visits.

Breakdown:

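  1. Window Function (LAG): LAG(visit_date) returns the visit_date of the previous row in the same partition, which lets us subtract the previous visit from the current one. A patient's first visit has no previous row, so its gap is NULL. MAX ignores NULLs, so a patient with only one visit gets NULL as the result.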

  2. Partitioning: The PARTITION BY patient_id clause ensures that the LAG function calculates time gaps for each patient separately.

  3. Subquery (CTE): The CTE PatientTimeGaps stores the time gaps for each patient.

  4. Main Query (Outer Query): The outer query then finds the maximum time gap for each patient using GROUP BY and MAX.
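The steps above can be sketched in SQLite through Python. SQLite has no DATEDIFF, so this sketch substitutes a difference of julianday() values, which is an assumption of the example rather than part of the MySQL query. The visit rows are invented for illustration.

```python
import sqlite3


def largest_gaps(visits):
    """Return (patient_id, largest gap in days) per patient; None if only one visit."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patient_visits (patient_id INTEGER, visit_date TEXT)")
    conn.executemany("INSERT INTO patient_visits VALUES (?, ?)", visits)
    rows = conn.execute(
        """
        WITH PatientTimeGaps AS (
            SELECT patient_id,
                   -- julianday() difference stands in for MySQL's DATEDIFF
                   julianday(visit_date)
                   - julianday(LAG(visit_date) OVER (
                         PARTITION BY patient_id ORDER BY visit_date)) AS time_gap
            FROM patient_visits
        )
        SELECT patient_id, MAX(time_gap) AS largest_time_gap
        FROM PatientTimeGaps
        GROUP BY patient_id
        ORDER BY patient_id
        """
    ).fetchall()
    conn.close()
    return rows


# Invented sample: patient 1 has gaps of 4 and 15 days, patient 2 has one visit.
SAMPLE_VISITS = [
    (1, "2023-01-01"), (1, "2023-01-05"), (1, "2023-01-20"),
    (2, "2023-03-01"),
]
print(largest_gaps(SAMPLE_VISITS))
```

Patient 2 comes back with None (SQL NULL) because a single visit has no previous visit to compare against.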

Real-World Applications:

  • Patient Monitoring: Tracking the time between visits can help healthcare providers monitor patients' health and adherence to treatment plans.

  • Predictive Analytics: By analyzing the distribution of time gaps, healthcare systems can predict future visit patterns and adjust staffing or resources accordingly.

  • Patient Engagement: Identifying long time gaps between visits can prompt outreach efforts to re-engage patients with their care.


Number of Accounts That Did Not Stream

Problem Statement:

Given a database of user accounts and their streaming activity, find the number of accounts that have not streamed anything.

Database Schema:

CREATE TABLE accounts (
  account_id INT PRIMARY KEY,
  name VARCHAR(255)
);

CREATE TABLE streams (
  stream_id INT PRIMARY KEY,
  account_id INT,
  start_time TIMESTAMP,
  end_time TIMESTAMP,
  FOREIGN KEY (account_id) REFERENCES accounts (account_id)
);

SQL Query:

SELECT COUNT(*) AS num_inactive_accounts
FROM accounts AS a
WHERE NOT EXISTS (
  SELECT 1
  FROM streams AS s
  WHERE s.account_id = a.account_id
);

Explanation:

  • The outer SELECT COUNT(*) counts the rows of the accounts table that pass the WHERE filter.

  • NOT EXISTS keeps an account only when the correlated subquery finds no row in streams with the same account_id, that is, when the account has never streamed.

  • EXCEPT is not suitable here: it is a set-difference operator that removes matching rows, not an arithmetic subtraction. Applied to two single-row counts, it returns the first count unchanged whenever the two counts differ.

  • The result is the number of inactive accounts (accounts with no streaming activity). An equivalent formulation subtracts two scalar subqueries: the total number of accounts minus COUNT(DISTINCT account_id) from streams.
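A small sketch makes the counting concrete. It loads three invented accounts, two of which have streams, into SQLite via Python, counts inactive accounts with a NOT EXISTS anti-join, and also shows what EXCEPT returns when applied to the two counts. The trimmed-down schema and the helper names are assumptions of this example.

```python
import sqlite3


def setup():
    """Build an in-memory database with 3 accounts, 2 of which have streamed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accounts (account_id INT PRIMARY KEY, name VARCHAR(255));
        CREATE TABLE streams (stream_id INT PRIMARY KEY, account_id INT);
        INSERT INTO accounts VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO streams VALUES (10, 1), (11, 1), (12, 2);
        """
    )
    return conn


def inactive_count(conn):
    """Count accounts that have no matching row in streams."""
    (n,) = conn.execute(
        """
        SELECT COUNT(*) FROM accounts AS a
        WHERE NOT EXISTS (SELECT 1 FROM streams AS s
                          WHERE s.account_id = a.account_id)
        """
    ).fetchone()
    return n


def except_result(conn):
    """EXCEPT removes matching rows; it does not subtract numbers."""
    return conn.execute(
        "SELECT COUNT(*) FROM accounts "
        "EXCEPT SELECT COUNT(DISTINCT account_id) FROM streams"
    ).fetchall()


conn = setup()
print(inactive_count(conn), except_result(conn))
```

Here the anti-join reports one inactive account, while the EXCEPT form returns the single row (3,), the total number of accounts.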

Real-World Applications:

  • Analytics dashboards: To track the percentage of inactive accounts and understand user engagement.

  • Targeted marketing campaigns: To identify users who may benefit from introductory streaming offers.

  • Product development: To gather insights into user behavior and improve streaming recommendations.


Group Employees of the Same Salary

Problem Statement

Given a table employees with columns id, name, and salary, group employees with the same salary together.

Table: employees

| id | name | salary |
|---|---|---|
| 1 | John Doe | 10000 |
| 2 | Jane Doe | 10000 |
| 3 | David Doe | 15000 |
| 4 | Mary Doe | 15000 |
| 5 | Tom Doe | 20000 |

Result Table:

| salary | names |
|---|---|
| 10000 | John Doe, Jane Doe |
| 15000 | David Doe, Mary Doe |
| 20000 | Tom Doe |

Solution

Use the GROUP_CONCAT() function to concatenate the names of employees with the same salary.

SELECT
    salary,
    GROUP_CONCAT(name) AS names
FROM
    employees
GROUP BY
    salary;

Breakdown

  • SELECT salary, GROUP_CONCAT(name) AS names: Selects the salary and concatenated names for each group.

  • FROM employees: Specifies the input table.

  • GROUP BY salary: Groups the rows by salary.

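  • GROUP_CONCAT(name): Concatenates the names of employees within each salary group. The default separator is a comma with no space, so the raw output is "John Doe,Jane Doe"; write GROUP_CONCAT(name SEPARATOR ', ') to get the spacing shown in the result table. The order of names within a group is not guaranteed unless an ORDER BY is given inside GROUP_CONCAT.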
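The grouping can be tried out on the example table with SQLite via Python. One difference to be aware of: SQLite takes the separator as a second argument to GROUP_CONCAT, whereas MySQL uses the SEPARATOR keyword, so the SQL below is an adaptation and not the exact MySQL statement.

```python
import sqlite3

EMPLOYEES = [
    (1, "John Doe", 10000), (2, "Jane Doe", 10000),
    (3, "David Doe", 15000), (4, "Mary Doe", 15000),
    (5, "Tom Doe", 20000),
]


def names_by_salary(rows=EMPLOYEES):
    """Return (salary, comma-separated names) for each salary level."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    result = conn.execute(
        # SQLite syntax: separator as second argument (MySQL: SEPARATOR ', ').
        "SELECT salary, GROUP_CONCAT(name, ', ') AS names "
        "FROM employees GROUP BY salary ORDER BY salary"
    ).fetchall()
    conn.close()
    return result


for salary, names in names_by_salary():
    print(salary, names)
```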

Real-World Applications

This query can be useful in various real-world scenarios, such as:

  • Salary Analysis: Analyzing the distribution of salaries within an organization.

  • Payroll Management: Identifying employees with the same salary for payroll processing.

  • Employee Compensation: Grouping employees by salary for performance reviews and compensation decisions.

  • HR Reporting: Generating reports on the number of employees at different salary levels.


Loan Types

Problem Statement

Write a SQL query to find all loan types and the number of loans for each type.

Table Schema

CREATE TABLE loans (
  id INT PRIMARY KEY,
  type VARCHAR(255) NOT NULL,
  amount DECIMAL(10, 2) NOT NULL,
  term INT NOT NULL
);

Example Data

INSERT INTO loans (id, type, amount, term) VALUES
(1, 'Personal', 10000.00, 12),
(2, 'Business', 20000.00, 24),
(3, 'Mortgage', 30000.00, 36),
(4, 'Personal', 15000.00, 18),
(5, 'Business', 25000.00, 30);

Query

SELECT type, COUNT(*) AS num_loans
FROM loans
GROUP BY type;

Results

| type | num_loans |
|---|---|
| Business | 2 |
| Mortgage | 1 |
| Personal | 2 |

Explanation

The query uses the GROUP BY clause to group the loan types together. The COUNT(*) function is used to count the number of loans for each type. The query has no ORDER BY clause, so the row order is not guaranteed; add ORDER BY type to get the alphabetical order shown above.

Applications

This query can be used to analyze the distribution of loan types in a database. This information can be used to make decisions about which types of loans to offer, and to set interest rates and other terms for each type of loan.


Market Analysis I

Problem Statement:

You are given a table called Orders that contains the following columns:

  • order_id: The unique ID of an order.

  • product_id: The ID of the product purchased.

  • quantity: The number of units of the product purchased.

  • price: The price of each unit of the product purchased.

  • order_date: The date the order was placed.

You need to write a SQL query to calculate the total revenue generated by each product in the given date range. The date range is specified by the start_date and end_date parameters.

Best & Performant Solution:

SELECT
  product_id,
  SUM(quantity * price) AS total_revenue
FROM Orders
WHERE
  order_date BETWEEN start_date AND end_date
GROUP BY
  product_id;

Breakdown and Explanation:

1. Select Columns:

SELECT
  product_id,
  SUM(quantity * price) AS total_revenue

We select the product_id column to identify each product and the total revenue generated by each product calculated as SUM(quantity * price).

2. Filter Orders:

WHERE
  order_date BETWEEN start_date AND end_date

We filter the orders to include only those placed within the specified date range. This ensures that we only calculate revenue for orders within the given period.

3. Group By Products:

GROUP BY
  product_id

We group the results by product_id to aggregate the total revenue for each product.

Performance Considerations:

The query can be efficient if an index exists on the order_date column: MySQL can then read only the orders in the specified date range instead of scanning the whole table. Grouping by product_id does not reduce the number of rows read; it only determines how the filtered rows are aggregated. Note that BETWEEN is inclusive at both ends, so orders placed on start_date and on end_date are both counted.
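The date-range filter and the aggregation can be sketched with bound parameters standing in for start_date and end_date. The example runs in SQLite via Python, stores dates as ISO-8601 text (which compares correctly as strings), and creates the index explicitly. The index name and the order rows are invented for illustration.

```python
import sqlite3

# Invented sample orders: (order_id, product_id, quantity, price, order_date)
SAMPLE_ORDERS = [
    (1, 101, 2, 10.0, "2023-01-05"),
    (2, 101, 1, 10.0, "2023-02-10"),
    (3, 102, 3, 5.0, "2023-01-20"),
    (4, 102, 1, 5.0, "2023-03-01"),
]


def revenue_by_product(orders, start_date, end_date):
    """Total revenue per product for orders dated within [start_date, end_date]."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Orders (order_id INTEGER, product_id INTEGER, "
        "quantity INTEGER, price REAL, order_date TEXT)"
    )
    # The index the performance note relies on has to be created explicitly.
    conn.execute("CREATE INDEX idx_orders_order_date ON Orders (order_date)")
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", orders)
    rows = conn.execute(
        """
        SELECT product_id, SUM(quantity * price) AS total_revenue
        FROM Orders
        WHERE order_date BETWEEN ? AND ?
        GROUP BY product_id
        ORDER BY product_id
        """,
        (start_date, end_date),
    ).fetchall()
    conn.close()
    return rows


print(revenue_by_product(SAMPLE_ORDERS, "2023-01-01", "2023-01-31"))
```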

Real-World Applications:

This query can be used to calculate revenue for various time periods, such as daily, weekly, or monthly, providing valuable insights for businesses to analyze sales trends, identify top-selling products, and make informed decisions.


Longest Winning Streak

Problem Statement: Given a table of games played by players, find the longest winning streak for each player.

Example Table:

CREATE TABLE Games (
  player_id INT,
  game_date DATE,
  won BOOLEAN
);

INSERT INTO Games (player_id, game_date, won) VALUES
(1, '2023-01-01', TRUE),
(1, '2023-01-02', FALSE),
(1, '2023-01-03', TRUE),
(1, '2023-01-04', TRUE),
(1, '2023-01-05', FALSE),
(2, '2023-02-01', TRUE),
(2, '2023-02-02', FALSE),
(2, '2023-02-03', TRUE),
(2, '2023-02-04', TRUE),
(2, '2023-02-05', TRUE);

Solution:

WITH Grouped AS (
  SELECT
    player_id,
    won,
    ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY game_date)
      - ROW_NUMBER() OVER (PARTITION BY player_id, won ORDER BY game_date) AS grp
  FROM Games
),
Streaks AS (
  SELECT player_id, COUNT(*) AS streak_len
  FROM Grouped
  WHERE won = TRUE
  GROUP BY player_id, grp
)
SELECT player_id, MAX(streak_len) AS longest_winning_streak
FROM Streaks
GROUP BY player_id;

Breakdown:

  1. Number each player's games twice:

    ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY game_date)
      - ROW_NUMBER() OVER (PARTITION BY player_id, won ORDER BY game_date) AS grp

    The first counter numbers all of a player's games in date order; the second numbers only the games with the same outcome. Within an unbroken run of wins both counters advance together, so their difference grp stays constant. A loss in between advances only the first counter, so the next run of wins gets a different grp.

  2. Filter for wins:

    WHERE won = TRUE

    This condition ensures that only winning games are counted.

  3. Count the games in each streak:

    GROUP BY player_id, grp

    Each (player_id, grp) group is one winning streak, and COUNT(*) is its length in games.

  4. Take the longest streak per player:

    MAX(streak_len) ... GROUP BY player_id

    This returns the longest winning streak for each player. Players who never won do not appear in the result. A self-join on game dates cannot give this answer, because the difference between two win dates says nothing about losses in between.

Example Output:

+----------+--------------------------+
| player_id | longest_winning_streak |
+----------+--------------------------+
| 1         | 2                        |
| 2         | 3                        |
+----------+--------------------------+
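The example output can be checked with a sketch that computes streak lengths from the difference of two ROW_NUMBER() counters (a gaps-and-islands approach) and runs it on the example table. It uses SQLite through Python and stores won as 1 or 0; both choices are assumptions of this sketch.

```python
import sqlite3

GAMES = [
    (1, "2023-01-01", 1), (1, "2023-01-02", 0), (1, "2023-01-03", 1),
    (1, "2023-01-04", 1), (1, "2023-01-05", 0),
    (2, "2023-02-01", 1), (2, "2023-02-02", 0), (2, "2023-02-03", 1),
    (2, "2023-02-04", 1), (2, "2023-02-05", 1),
]


def longest_streaks(games=GAMES):
    """Return (player_id, longest run of consecutive wins) per player."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Games (player_id INTEGER, game_date TEXT, won INTEGER)")
    conn.executemany("INSERT INTO Games VALUES (?, ?, ?)", games)
    rows = conn.execute(
        """
        WITH Grouped AS (
            SELECT player_id, won,
                   -- the difference stays constant inside one unbroken run
                   ROW_NUMBER() OVER (PARTITION BY player_id ORDER BY game_date)
                 - ROW_NUMBER() OVER (PARTITION BY player_id, won ORDER BY game_date) AS grp
            FROM Games
        ),
        Streaks AS (
            SELECT player_id, COUNT(*) AS streak_len
            FROM Grouped
            WHERE won = 1
            GROUP BY player_id, grp
        )
        SELECT player_id, MAX(streak_len) FROM Streaks
        GROUP BY player_id ORDER BY player_id
        """
    ).fetchall()
    conn.close()
    return rows


print(longest_streaks())
```

A player with no wins at all is absent from the result; a LEFT JOIN from a player list with COALESCE(..., 0) would be one way to include them.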

Real-World Applications:

  • Tracking performance trends for athletes or teams.

  • Analyzing sales records to identify the most successful sales periods.

  • Monitoring customer churn rates to identify risk factors.


Game Play Analysis V

Problem:

Given a table of game play data, find the average score for each player who has played at least 3 games.

SQL Query:

SELECT player_id, AVG(score) AS average_score
FROM game_plays
GROUP BY player_id
HAVING COUNT(*) >= 3;

Breakdown:

  • SELECT player_id, AVG(score) AS average_score: Selects the player's ID and the average score for each player.

  • FROM game_plays: Specifies the table containing the game play data.

  • GROUP BY player_id: Groups the results by player ID. This allows us to calculate the average score for each player.

  • HAVING COUNT(*) >= 3: Filters the results to only include players who have played at least 3 games.

Example:

SELECT player_id, AVG(score) AS average_score
FROM game_plays
GROUP BY player_id
HAVING COUNT(*) >= 3;

+----------+-----------------+
| player_id | average_score |
+----------+-----------------+
| 1         | 60.0           |
| 2         | 75.0           |
| 3         | 85.0           |
+----------+-----------------+

This query will return the average score for each player who has played at least 3 games. In this example, player 1 has an average score of 60.0, player 2 has an average score of 75.0, and player 3 has an average score of 85.0.

Real-World Applications:

This query can be used to analyze game play data and identify players who are performing well. It can also be used to compare the performance of different players or teams.


Count Artist Occurrences On Spotify Ranking List

Problem Statement:

Count the number of times each artist appears in a given Spotify ranking list.

SQL Solution:

WITH ArtistOccurrences AS (
  SELECT
    a.name AS artist_name,
    COUNT(*) AS occurrence_count
  FROM artists a
  JOIN albums b ON a.id = b.artist_id
  JOIN songs c ON b.id = c.album_id
  JOIN spotify_ranking_list d ON c.id = d.song_id
  GROUP BY
    artist_name
)
SELECT
  artist_name,
  occurrence_count
FROM ArtistOccurrences
ORDER BY
  occurrence_count DESC;

Explanation:

  1. Create a CTE (Common Table Expression) called ArtistOccurrences:

    • This CTE calculates the artist occurrences by joining the artists, albums, songs, and spotify_ranking_list tables.

    • The COUNT(*) function counts the number of songs by each artist that appear in the ranking list.

  2. Select the Artist Name and Occurrence Count:

    • The main query selects the artist_name and occurrence_count from the ArtistOccurrences CTE.

  3. Order the Results:

    • The query orders the results by the occurrence_count in descending order, showing the artists with the highest number of occurrences first.

Real-World Application:

This query can be used to identify popular artists on Spotify, track artist trends, and make recommendations based on user preferences. It can also be used to analyze the performance of artists' albums and songs in the ranking list.

Example:

Consider the following data:

| artists | albums | songs | spotify_ranking_list |
|---|---|---|---|
| Artist A | Album A | Song A | 1 |
| Artist A | Album A | Song B | 2 |
| Artist B | Album B | Song C | 3 |
| Artist C | Album C | Song D | 4 |

The query would produce the following output:

| artist_name | occurrence_count |
|---|---|
| Artist A | 2 |
| Artist B | 1 |
| Artist C | 1 |

This shows that Artist A has the highest number of occurrences in the Spotify ranking list with 2 songs, followed by Artist B and Artist C with 1 song each.


The Number of Passengers in Each Bus II

LeetCode Problem:

Number of Passengers in Each Bus II

Problem Statement:

Given a table called Trips that records passenger journeys on buses, write a SQL query to count the total number of passengers on each bus.

Table Schema:

CREATE TABLE Trips (
  bus_id INT NOT NULL,
  passenger_count INT NOT NULL,
  start_time DATETIME NOT NULL,
  end_time DATETIME NOT NULL
);

Sample Data:

| bus_id | passenger_count | start_time        | end_time          |
|--------|-----------------|-------------------|--------------------|
| 1       | 10              | 2023-01-01 10:00 | 2023-01-01 11:00 |
| 2       | 15              | 2023-01-01 11:00 | 2023-01-01 12:00 |
| 1       | 5               | 2023-01-01 12:00 | 2023-01-01 13:00 |
| 2       | 20              | 2023-01-01 13:00 | 2023-01-01 14:00 |

Solution:

SELECT bus_id, SUM(passenger_count) AS total_passengers
FROM Trips
GROUP BY bus_id;

Breakdown:

  1. SELECT bus_id, SUM(passenger_count) AS total_passengers: Calculates the total number of passengers for each bus. It returns the bus ID and the sum of the passenger count for that bus.

  2. FROM Trips: Specifies the Trips table as the data source.

  3. GROUP BY bus_id: Groups the results by the bus ID. This ensures that the passenger count is summed up for each unique bus.

Result:

| bus_id | total_passengers |
|--------|-----------------|
| 1       | 15              |
| 2       | 35              |


Applications:

  • Passenger Transportation Tracking: Tracking the total number of passengers for each bus can help transportation companies optimize their routes and schedules.

  • Bus Sales Analysis: Businesses can use this query to determine which buses generate the highest revenue by analyzing the total number of passengers carried.

  • Customer Insights: By analyzing passenger counts, businesses can gain insights into passenger travel patterns and demographics.


Find Expensive Cities

Problem: Find the cities whose price level is above the average price level across all cities.

SQL Query:

SELECT City, PriceLevel
FROM Costs
WHERE PriceLevel > (
    SELECT AVG(PriceLevel)
    FROM Costs
)
ORDER BY PriceLevel DESC;

Breakdown:

The query performs the following steps:

  1. Select the City and PriceLevel columns from the Costs table: This retrieves the city names and their corresponding price levels from the database.

  2. Filter the results where the PriceLevel is greater than the average: The query calculates the average price level of all cities using the subquery (SELECT AVG(PriceLevel) FROM Costs). It then keeps only the cities whose price levels exceed this average and discards the rest.

  3. Order the results in descending order of PriceLevel: This sorts the cities from most expensive to least expensive based on their price levels.
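The three steps can be sketched on a tiny table. The city names and price levels below are invented, and the query is run in SQLite via Python. With price levels 10, 20, 30, and 40 the average is 25, so only the two most expensive cities remain.

```python
import sqlite3


def expensive_cities(costs):
    """Return (City, PriceLevel) rows above the overall average, highest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Costs (City TEXT, PriceLevel REAL)")
    conn.executemany("INSERT INTO Costs VALUES (?, ?)", costs)
    rows = conn.execute(
        """
        SELECT City, PriceLevel
        FROM Costs
        WHERE PriceLevel > (SELECT AVG(PriceLevel) FROM Costs)
        ORDER BY PriceLevel DESC
        """
    ).fetchall()
    conn.close()
    return rows


# Invented sample data; the average price level is 25.
SAMPLE_COSTS = [("A", 10), ("B", 20), ("C", 30), ("D", 40)]
print(expensive_cities(SAMPLE_COSTS))
```

Because the comparison is strict, a table in which every city has the same price level returns no rows.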

Simplified Explanation:

Imagine a restaurant menu where each dish has a price level. We want to find out which cities have dishes that are more expensive than the average dish price across all cities.

  1. Read the menu (Costs table): We start by reading the menu, which contains the city names and their corresponding dish prices (price levels).

  2. Find the average price (Average subquery): We then calculate the average price of all dishes on the menu. This represents the expected price level.

  3. Identify expensive dishes (Filter condition): We focus on dishes that are more expensive than the average. This filter separates the high-priced dishes from the more moderately priced ones.

  4. Rank the cities (Order by clause): Finally, we arrange the cities in order, starting with the most expensive dishes and ending with the least expensive dishes.

Real-World Application:

This query can be useful for travelers, tourists, or businesses evaluating the cost of living in different cities. It helps identify cities where goods and services tend to be more expensive than average, allowing individuals and organizations to make informed decisions about their budget and expenses.


Running Total for Different Genders

Problem Statement:

You have a table called person with columns:

  • id (primary key)

  • gender

  • age

  • name

Calculate the running total for each gender group. The running total is the sum of all ages up to that point.

Example Table:

| id | gender | age | name |
|---|---|---|---|
| 1 | Male | 20 | John |
| 2 | Female | 15 | Jane |
| 3 | Male | 25 | Peter |
| 4 | Female | 22 | Susan |

Expected Output:

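| gender | running_total |
|---|---|
| Female | 15 |
| Female | 37 |
| Male | 20 |
| Male | 45 |

There is one row per person. The last running total for each gender (37 for Female, 45 for Male) is that gender's overall sum of ages. Row order is not guaranteed unless the final query adds an ORDER BY.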

Solution:

Step 1: Create Common Table Expression (CTE)

WITH GenderRunningTotal AS (
  SELECT
    gender,
    SUM(age) OVER (PARTITION BY gender ORDER BY id) AS running_total
  FROM
    person
)
  • WITH GenderRunningTotal AS (...) creates a CTE.

  • SUM(age) OVER (PARTITION BY gender ORDER BY id) calculates the running total for each gender.

Step 2: Select Gender and Running Total

SELECT DISTINCT
  gender,
  running_total
FROM
  GenderRunningTotal;
  • SELECT DISTINCT gender, running_total removes duplicates.

  • FROM GenderRunningTotal uses the CTE created in Step 1.

Explanation:

  1. The CTE GenderRunningTotal calculates the running total for each gender by using the OVER clause:

    • PARTITION BY gender groups the data by gender.

    • ORDER BY id sorts the data by ID within each gender group.

    • SUM(age) calculates the cumulative sum of ages.

  2. The main query selects distinct values of gender and their corresponding running totals from the CTE.
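To see what the window function produces row by row, the sketch below runs the running-sum expression on the example table in SQLite via Python. It keeps the id column in the output and adds an ORDER BY so the accumulation is easy to follow; both are additions for illustration only.

```python
import sqlite3

PEOPLE = [
    (1, "Male", 20, "John"), (2, "Female", 15, "Jane"),
    (3, "Male", 25, "Peter"), (4, "Female", 22, "Susan"),
]


def running_totals(people=PEOPLE):
    """Return (id, gender, running total of age) ordered by gender, then id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, gender TEXT, age INTEGER, name TEXT)")
    conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", people)
    rows = conn.execute(
        """
        SELECT id, gender,
               SUM(age) OVER (PARTITION BY gender ORDER BY id) AS running_total
        FROM person
        ORDER BY gender, id
        """
    ).fetchall()
    conn.close()
    return rows


for row in running_totals():
    print(row)
```

Each gender's total restarts from zero, and the last row of each partition holds the overall sum for that gender.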

Real World Applications:

  • Sales Data: Calculate the running total of sales for different product categories or regions to track sales trends.

  • Patient Records: Track the cumulative number of patients seen by a doctor over time to monitor their workload.

  • Inventory Management: Keep track of the running total of inventory items in each warehouse to optimize stock levels.


Consecutive Numbers

Problem:

Given a table Nums with integer values, find the length of the longest run of consecutive integers (for example, 4, 5, 6 is a run of length 3).

Table:

Nums (
    num INT
)

Query:

# Write your MySQL query statement below
SELECT
    MAX(LENGTH) AS LongestConsecutiveSequence
FROM
    (
        SELECT
            num,
            @cur_num := CASE
                WHEN @prev_num = num - 1 THEN @cur_num + 1
                ELSE 1
            END AS LENGTH,
            @prev_num := num
        FROM
            Nums,
            (SELECT @prev_num := NULL, @cur_num := 0) AS vars
        ORDER BY
            num
    ) AS subquery
GROUP BY
    num - LENGTH + 1

Breakdown:

1. Subquery:

SELECT
    num,
    @cur_num := CASE
        WHEN @prev_num = num - 1 THEN @cur_num + 1
        ELSE 1
    END AS LENGTH,
    @prev_num := num
FROM
    Nums,
    (SELECT @prev_num := NULL, @cur_num := 0) AS vars
ORDER BY
    num
  • This subquery iterates through the Nums table and calculates the length of each consecutive sequence.

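  • It uses user-defined variables @prev_num and @cur_num to track the previous number and the length of the current consecutive sequence. Assigning user variables inside a SELECT is deprecated in MySQL 8.0, and the order in which such expressions are evaluated is not guaranteed, so treat this approach as legacy; window functions are the more robust choice on MySQL 8.0 and later.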

  • If the current number is consecutive with the previous number, it increments the length by 1. Otherwise, it sets the length to 1.

  • It also updates the @prev_num variable with the current number for the next iteration.

2. Group By and Max:

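SELECT
    MAX(LENGTH) AS LongestConsecutiveSequence
...
GROUP BY
    num - LENGTH + 1

  • The outer query groups the results of the subquery by num - LENGTH + 1, which is the first number of the run that a row belongs to (for the run 4, 5, 6 with lengths 1, 2, 3 the expression is 4 in every row).

  • MAX(LENGTH) within each group is the full length of that run, so the query returns one row per run. To get a single overall maximum, drop the GROUP BY clause or take the largest of the returned values.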
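As an alternative to user variables, the same result can be sketched with a window function: after removing duplicates, num minus its row number is constant within a run of consecutive integers. This is a swapped-in technique, not the query above, and it is run here in SQLite via Python with invented input values.

```python
import sqlite3


def longest_run(nums):
    """Length of the longest run of consecutive integers (None for an empty table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Nums (num INT)")
    conn.executemany("INSERT INTO Nums VALUES (?)", [(n,) for n in nums])
    (length,) = conn.execute(
        """
        WITH Deduped AS (
            SELECT DISTINCT num FROM Nums
        ),
        Runs AS (
            -- num - row number is the same for every member of one run
            SELECT num - ROW_NUMBER() OVER (ORDER BY num) AS grp
            FROM Deduped
        )
        SELECT MAX(run_len)
        FROM (SELECT COUNT(*) AS run_len FROM Runs GROUP BY grp)
        """
    ).fetchone()
    conn.close()
    return length


print(longest_run([1, 2, 3, 7, 8, 10]))
```

For the values 1, 2, 3, 7, 8, 10 the runs are 1-3, 7-8, and 10, so the longest has length 3. Duplicates are collapsed first so that a repeated value does not break a run.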

Real-World Application:

This query can be used in various real-world applications, such as:

  • Finding the longest consecutive winning streak in a sports league.

  • Analyzing inventory data to identify items that are consistently selling together.

  • Detecting gaps or missing values in a dataset.


First and Last Call On the Same Day

Problem: Find the first and last phone call received by each customer on the same day.

Input:

  • table1 (calls):

| customer_id | call_date | call_time |
|---|---|---|
| 1 | 2022-08-01 | 09:00:00 |
| 1 | 2022-08-01 | 12:00:00 |
| 2 | 2022-08-02 | 10:00:00 |
| 2 | 2022-08-02 | 15:00:00 |
| 3 | 2022-08-03 | 11:00:00 |
| 3 | 2022-08-04 | 13:00:00 |

Output:

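| customer_id | call_date | first_call | last_call |
|---|---|---|---|
| 1 | 2022-08-01 | 09:00:00 | 12:00:00 |
| 2 | 2022-08-02 | 10:00:00 | 15:00:00 |
| 3 | 2022-08-03 | 11:00:00 | 11:00:00 |
| 3 | 2022-08-04 | 13:00:00 | 13:00:00 |

Customer 3 received one call on each of two days, so each day has its own row and the first and last call coincide.

Solution:

WITH RankedCalls AS (
  SELECT
    customer_id,
    call_date,
    call_time,
    ROW_NUMBER() OVER (PARTITION BY customer_id, call_date ORDER BY call_time) AS rn_first,
    ROW_NUMBER() OVER (PARTITION BY customer_id, call_date ORDER BY call_time DESC) AS rn_last
  FROM calls
)
SELECT
  customer_id,
  call_date,
  MAX(CASE WHEN rn_first = 1 THEN call_time END) AS first_call,
  MAX(CASE WHEN rn_last = 1 THEN call_time END) AS last_call
FROM RankedCalls
GROUP BY
  customer_id,
  call_date
ORDER BY
  customer_id,
  call_date;

Breakdown:

  1. RankedCalls Subquery: Assigns two row numbers to each call within a customer and call date. rn_first numbers the calls from earliest to latest, and rn_last numbers them from latest to earliest.

  2. Conditional Aggregation: For each customer and call date, the call with rn_first = 1 is the first call and the call with rn_last = 1 is the last call. Filtering on a single ascending row number for both columns would return the first call twice.

  3. GROUP BY and ORDER BY: Groups the results by customer ID and call date, then sorts them by customer ID and call date for the final output.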
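A more compact way to get the first and last call per customer and day is to aggregate directly with MIN and MAX, with no ranking step. The sketch below runs that variant on the input table in SQLite via Python; times are stored as HH:MM:SS text, which sorts correctly as strings.

```python
import sqlite3

CALLS = [
    (1, "2022-08-01", "09:00:00"), (1, "2022-08-01", "12:00:00"),
    (2, "2022-08-02", "10:00:00"), (2, "2022-08-02", "15:00:00"),
    (3, "2022-08-03", "11:00:00"), (3, "2022-08-04", "13:00:00"),
]


def first_and_last_calls(calls=CALLS):
    """Return (customer_id, call_date, first_call, last_call) per customer and day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (customer_id INTEGER, call_date TEXT, call_time TEXT)")
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", calls)
    rows = conn.execute(
        """
        SELECT customer_id, call_date,
               MIN(call_time) AS first_call,
               MAX(call_time) AS last_call
        FROM calls
        GROUP BY customer_id, call_date
        ORDER BY customer_id, call_date
        """
    ).fetchall()
    conn.close()
    return rows


for row in first_and_last_calls():
    print(row)
```

The ranking approach becomes worthwhile when other columns of the first or last call (for example, who called) are needed, since MIN and MAX only return the time itself.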

Real-World Application:

  • Tracking customer call history and identifying patterns in call frequency.

  • Analyzing call center performance by tracking the time range of calls.

  • Enhancing customer service by providing information about the first and last contact points on a specific day.


Queries Quality and Percentage

Problem Statement

Given a table of queries and their corresponding execution times, find the top k queries that have the highest average execution time and also calculate the percentage of total execution time contributed by these queries.

SQL Query

WITH RankedQueries AS (
    SELECT query,
           AVG(execution_time) AS avg_execution_time,
           SUM(execution_time) AS total_execution_time,
           RANK() OVER (ORDER BY AVG(execution_time) DESC) AS rnk
    FROM Queries
    GROUP BY query
)
SELECT query,
       avg_execution_time,
       total_execution_time * 100.0 / (SELECT SUM(execution_time) FROM Queries) AS percentage
FROM RankedQueries
WHERE rnk <= k
ORDER BY avg_execution_time DESC;

Explanation

Step 1: Create a RankedQueries Table

  • The CTE RankedQueries groups the rows by query and calculates the average and the total execution time for each group.

  • It then ranks the queries in descending order of average execution time. The alias is rnk because RANK is a reserved word in MySQL 8.0.

Step 2: Keep the Top k Queries

  • The main query keeps the rows with a rank less than or equal to k, where k is a parameter supplied by the caller.

  • These are the top k queries by average execution time. RANK gives tied queries the same rank, so more than k rows can be returned when there are ties.

Step 3: Calculate the Percentage

  • For each top query, the total execution time of that query is divided by the sum of execution times over all rows in Queries and multiplied by 100.

  • Dividing by the overall total matters: joining the top queries back to Queries and dividing each query's summed average by its own summed execution time would always give 100.
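The percentage calculation is easiest to follow with numbers. In the sketch below, three invented queries have total execution times of 40, 50, and 10, so the overall total is 100 and each percentage can be read off directly. It runs in SQLite via Python; the column alias rnk and the bound parameter for k are choices of this example.

```python
import sqlite3

# Invented sample: (query, execution_time)
SAMPLE_RUNS = [("q1", 10), ("q1", 30), ("q2", 50), ("q3", 10)]


def top_queries(runs, k):
    """Return (query, avg time, share of total time in percent) for the top k."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Queries (query TEXT, execution_time REAL)")
    conn.executemany("INSERT INTO Queries VALUES (?, ?)", runs)
    rows = conn.execute(
        """
        WITH RankedQueries AS (
            SELECT query,
                   AVG(execution_time) AS avg_execution_time,
                   SUM(execution_time) AS total_execution_time,
                   RANK() OVER (ORDER BY AVG(execution_time) DESC) AS rnk
            FROM Queries
            GROUP BY query
        )
        SELECT query, avg_execution_time,
               -- share of the overall total, not of the query's own total
               total_execution_time * 100.0
                   / (SELECT SUM(execution_time) FROM Queries) AS percentage
        FROM RankedQueries
        WHERE rnk <= ?
        ORDER BY avg_execution_time DESC
        """,
        (k,),
    ).fetchall()
    conn.close()
    return rows


print(top_queries(SAMPLE_RUNS, 2))
```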

Real-World Applications

This query can be useful in optimizing database performance by identifying the queries that are consuming the most resources. By understanding this information, database administrators can take steps to improve the performance of the database, such as optimizing the queries or creating indexes.


Product's Price for Each Store

Problem:

Given two tables:

  • Products (id, name, price)

  • Stores (id, name, city)

Find the price of each product in each store.

Solution:

-- Pair every product with every store. The schema has no per-store price,
-- so each product's single price applies in all stores.
SELECT
  p.name AS product_name,
  s.name AS store_name,
  p.price AS product_price
FROM
  Products AS p
CROSS JOIN
  Stores AS s
ORDER BY
  p.id, s.id;

Explanation:

  1. Join the Tables: The CROSS JOIN pairs every row of Products with every row of Stores. Joining on p.id = s.id would be wrong here: a product id and a store id are unrelated keys, and matching them would pair each product with only the store that happens to share its number.

  2. Select the Columns: The SELECT clause specifies the columns that you want to retrieve from the joined table. In this case, we want the product name (product_name), store name (store_name), and product price (product_price).

  3. Alias the Columns: The AS keyword is used to alias the column names. This makes it easier to refer to the columns in the output.

Example:

Consider the following tables:

Products:
+----+--------+-------+
| id | name   | price |
+----+--------+-------+
| 1  | Apple  | 10    |
| 2  | Banana | 5     |
| 3  | Orange | 7     |
+----+--------+-------+

Stores:
+----+---------+---------+
| id | name    | city    |
+----+---------+---------+
| 1  | Walmart | Atlanta |
| 2  | Target  | Chicago |
| 3  | Kroger  | Dallas  |
+----+---------+---------+

Running the query above produces the following output, one row for each of the 3 × 3 product and store combinations:

+-------------+-----------+--------------+
| product_name | store_name | product_price |
+-------------+-----------+--------------+
| Apple        | Walmart    | 10           |
| Apple        | Target     | 10           |
| Apple        | Kroger     | 10           |
| Banana       | Walmart    | 5            |
| Banana       | Target     | 5            |
| Banana       | Kroger     | 5            |
| Orange       | Walmart    | 7            |
| Orange       | Target     | 7            |
| Orange       | Kroger     | 7            |
+-------------+-----------+--------------+

This output shows the price of each product in each store.

Applications:

This query can be used in various real-world applications, such as:

  • Displaying product prices on an e-commerce website

  • Comparing product prices across different stores

  • Managing inventory and pricing for a retail business


Strong Friendship

Problem Statement

Find all pairs of friends who have at least three mutual friends.

Table Schema

CREATE TABLE Friends (
  id1 INT NOT NULL,
  id2 INT NOT NULL,
  PRIMARY KEY (id1, id2),
  FOREIGN KEY (id1) REFERENCES Users (id),
  FOREIGN KEY (id2) REFERENCES Users (id)
);

Optimal Solution

WITH AllFriends AS (
  SELECT id1, id2 FROM Friends UNION SELECT id2, id1 FROM Friends
)
SELECT f.id1, f.id2
FROM Friends f
JOIN AllFriends a ON a.id1 = f.id1
JOIN AllFriends b ON b.id1 = f.id2 AND b.id2 = a.id2
GROUP BY f.id1, f.id2
HAVING COUNT(*) >= 3;

Explanation

This query assumes that each friendship is stored once, with id1 < id2, as in the example below. It joins the Friends table to a symmetric view of itself and counts the friends that both members of a pair have in common.

  • The AllFriends CTE lists every friendship in both directions, so that each user's friends can be looked up through the id1 column.

  • The first join (f JOIN a) lists every friend of f.id1. The second join (JOIN b) keeps only those who are also friends of f.id2, which leaves one row per mutual friend.

  • GROUP BY collects these rows for each pair, and HAVING COUNT(*) >= 3 keeps the pairs with at least three mutual friends. Because each friendship appears only once in Friends, each pair is listed only once.

Example

Consider the following Friends table:

| id1 | id2 |
|---|---|
| 1   | 2   |
| 1   | 3   |
| 2   | 3   |
| 2   | 4   |
| 3   | 4   |
| 3   | 5   |
| 4   | 5   |

For this table the query returns no rows, because no pair of friends has three mutual friends. The closest pairs are (2, 3), who share friends 1 and 4, and (3, 4), who share friends 2 and 5.
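Counting mutual friends can be tried out on small data sets. The sketch below is one possible formulation, run in SQLite through Python's sqlite3 module (an assumed engine). It assumes each friendship is stored once with id1 < id2; the helper strong_pairs and the second data set, dense, are hypothetical and exist only for illustration.

```python
import sqlite3

# Counts the mutual friends of every stored friend pair; keeps pairs with 3 or more.
QUERY = """
    WITH AllFriends AS (
        SELECT id1 AS user_id, id2 AS friend_id FROM Friends
        UNION
        SELECT id2, id1 FROM Friends
    )
    SELECT f.id1, f.id2, COUNT(*) AS common_friends
    FROM Friends f
    JOIN AllFriends a ON a.user_id = f.id1
    JOIN AllFriends b ON b.user_id = f.id2 AND b.friend_id = a.friend_id
    GROUP BY f.id1, f.id2
    HAVING COUNT(*) >= 3
    ORDER BY f.id1, f.id2
"""


def strong_pairs(edges):
    """Load the given friendships into a fresh database and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Friends (id1 INTEGER, id2 INTEGER, PRIMARY KEY (id1, id2))")
    conn.executemany("INSERT INTO Friends VALUES (?, ?)", edges)
    return conn.execute(QUERY).fetchall()


# The table from the example: no pair shares three friends.
sample = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
# Hypothetical data: users 1 and 2 are friends and both know users 3, 4 and 5.
dense = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5)]

print(strong_pairs(sample))
print(strong_pairs(dense))
```

Only the pair (1, 2) in the second data set qualifies, with three friends in common.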

Real-World Applications

This query can be used in a variety of real-world applications, such as:

  • Identifying social groups within a population

  • Recommending friends to users on social media platforms

  • Identifying potential collaborators for research projects


List the Products Ordered in a Period

Problem Statement:

Write a SQL query to list all products ordered in a specific period.

Table Schema:

orders (
  order_id int,
  product_id int,
  order_date date
)

Query:

SELECT DISTINCT product_id
FROM orders
WHERE order_date BETWEEN '2022-01-01' AND '2022-12-31';

Breakdown:

  • The SELECT DISTINCT statement retrieves the unique product_id values.

  • The FROM clause specifies the orders table as the data source.

  • The WHERE clause filters the rows where the order_date column is between the specified period, '2022-01-01' and '2022-12-31'.

Real-World Application:

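This query can be used by businesses to identify which products were ordered at least once in a specific time frame. It does not count orders, so ranking products by popularity would additionally need a COUNT(*) with GROUP BY product_id. The list of ordered products can be useful for: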

  • Sales analysis: Understanding which products are popular during different seasons or periods.

  • Inventory management: Determining which products need to be restocked based on demand.

  • Marketing campaigns: Identifying products that can be promoted based on their recent sales performance.


The Number of Employees Which Report to Each Employee

Problem Statement

Given a table Employees with columns id and managerId, where managerId represents the ID of the employee's manager, find the number of employees who report to each employee.

Table Schema

CREATE TABLE Employees (
  id INT PRIMARY KEY,
  managerId INT,
  FOREIGN KEY (managerId) REFERENCES Employees(id)
);

Example Data

| Id | ManagerId |
|---|---|
| 1  | NULL      |
| 2  | 1        |
| 3  | 1        |
| 4  | 2        |
| 5  | 2        |

Expected Output

| Id | NumEmployees |
|---|---|
| 1  | 2           |
| 2  | 2           |

Solution

1. Recursive Common Table Expression (CTE)

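WITH RECURSIVE EmployeeHierarchy AS (
  -- Anchor member: employees at the top of the tree (no manager).
  SELECT id, managerId
  FROM Employees
  WHERE managerId IS NULL
  UNION ALL
  -- Recursive member: employees whose manager was reached in the previous step.
  SELECT e.id, e.managerId
  FROM Employees e
  JOIN EmployeeHierarchy h
    ON e.managerId = h.id
)
SELECT
  managerId AS id,
  COUNT(*) AS NumEmployees
FROM EmployeeHierarchy
WHERE
  managerId IS NOT NULL
GROUP BY
  managerId;

Explanation

  • The CTE EmployeeHierarchy walks the employee tree from the root node (the employee with no manager) downward, adding at each step the employees who report to someone already reached.

  • The outer query groups the collected rows by managerId, so COUNT(*) gives the number of direct reports for each manager.

  • The WHERE clause of the outer query drops the root row, whose managerId is NULL and which therefore reports to nobody.

  • The traversal is not strictly needed for counting direct reports. When every employee is reachable from a root, a plain GROUP BY managerId over Employees gives the same counts.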

2. Join and Aggregation

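SELECT
  em1.id,
  COUNT(DISTINCT em2.id) AS NumEmployees
FROM Employees em1
JOIN Employees em2
  ON em1.id = em2.managerId
GROUP BY
  em1.id;

Explanation

  • This query uses an inner JOIN to connect managers (em1) with their employees (em2). Employees with no direct reports have no matching row and are left out, which matches the expected output. A LEFT JOIN would list them as well, with a count of 0.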

  • The COUNT(DISTINCT) function counts the number of unique employees who report to each manager.

  • The GROUP BY clause groups the results by manager ID.
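The effect of the join type on the result can be seen with the example data. The sketch below uses SQLite through Python's sqlite3 module (an assumed engine) and runs the aggregation once with an inner join and once with a left join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (
        id INTEGER PRIMARY KEY,
        managerId INTEGER REFERENCES Employees(id)
    );
    INSERT INTO Employees VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, 2);
""")

# Inner join: only employees who actually manage someone appear.
inner_counts = conn.execute("""
    SELECT m.id, COUNT(e.id) AS NumEmployees
    FROM Employees m
    JOIN Employees e ON e.managerId = m.id
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()

# Left join: every employee appears; those without reports get a count of 0
# because COUNT(e.id) ignores the NULLs produced by the unmatched rows.
left_counts = conn.execute("""
    SELECT m.id, COUNT(e.id) AS NumEmployees
    FROM Employees m
    LEFT JOIN Employees e ON e.managerId = m.id
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()

print(inner_counts)
print(left_counts)
```

The inner join reproduces the expected output (employees 1 and 2 with two reports each), while the left join adds employees 3, 4 and 5 with a count of 0.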

Applications

  • Organizational Hierarchy: Determine the number of direct reports for each manager in an organizational hierarchy.

  • Performance Management: Identify managers with the highest and lowest number of reports for performance evaluation purposes.

  • Resource Allocation: Allocate resources (e.g., training, equipment) based on the number of employees under each manager.


Build the Equation

Problem Statement:

Given a table Sales with columns product_id, quantity, and price, calculate the total sales amount for each product by multiplying the quantity and price for each row.

SQL Query:

SELECT product_id, SUM(quantity * price) AS total_sales
FROM Sales
GROUP BY product_id;

Breakdown and Explanation:

  • Selecting Product ID and Total Sales Amount:

    SELECT product_id, SUM(quantity * price) AS total_sales

    This part of the query selects the product_id column and calculates the sum of quantity * price for each product, which represents the total sales amount. The SUM() function aggregates the product of quantity and price for each row.

  • Grouping by Product ID:

    GROUP BY product_id

    This part groups the results by the product_id column. It ensures that the total sales amount is calculated for each unique product.

Real-World Application:

This SQL query can be used in various real-world scenarios:

  • Sales Analysis: Businesses can use this query to analyze the total sales for different products.

  • Revenue Forecasting: By tracking total sales over time, businesses can forecast future revenue.

  • Inventory Management: The total sales amount can help determine which products are selling more and need to be restocked.

  • Customer Segmentation: Businesses can use total sales to segment customers based on their purchase patterns.

Example:

Consider the following Sales table:

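| product_id | quantity | price |
|---|---|---|
| 1 | 10 | 5 |
| 1 | 20 | 6 |
| 2 | 15 | 4 |
| 2 | 25 | 7 |

Result:

| product_id | total_sales |
|---|---|
| 1 | 170 |
| 2 | 235 |

This result shows the total sales amount for each product: 10 × 5 + 20 × 6 = 170 for product ID 1 and 15 × 4 + 25 × 7 = 235 for product ID 2.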
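The per-product totals can be recomputed from the sample rows. This is a minimal sketch using SQLite through Python's sqlite3 module (an assumed engine); the column types are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (product_id INTEGER, quantity INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, 10, 5), (1, 20, 6), (2, 15, 4), (2, 25, 7)],
)

# quantity * price is evaluated per row, then summed within each product group.
totals = conn.execute("""
    SELECT product_id, SUM(quantity * price) AS total_sales
    FROM Sales
    GROUP BY product_id
    ORDER BY product_id
""").fetchall()

print(totals)
```

Product 1 sums to 50 + 120 = 170 and product 2 to 60 + 175 = 235.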


Rising Temperature

Problem Statement: Given a table Temperature that records the temperature of a region at different time intervals, find the regions with the highest and lowest average temperatures.

Table Schema:

CREATE TABLE Temperature (
    region VARCHAR(255) NOT NULL,
    time TIMESTAMP NOT NULL,
    temperature FLOAT NOT NULL,
    PRIMARY KEY (region, time)
);

Solution:

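WITH RegionAverageTemperatures AS (
    SELECT region, AVG(temperature) AS avg_temperature
    FROM Temperature
    GROUP BY region
)
-- A CTE is visible only to the single statement that follows it,
-- so both lookups are combined into one statement with UNION ALL.
SELECT * FROM (
    SELECT region, avg_temperature
    FROM RegionAverageTemperatures
    ORDER BY avg_temperature DESC
    LIMIT 1
) AS hottest
UNION ALL
SELECT * FROM (
    SELECT region, avg_temperature
    FROM RegionAverageTemperatures
    ORDER BY avg_temperature ASC
    LIMIT 1
) AS coldest;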

Explanation:

  1. Create a Common Table Expression (CTE) called RegionAverageTemperatures:

    • Calculate the average temperature for each region using the AVG() function and group the results by region.

  2. Find the Region with the Highest Average Temperature:

    • Select the region and average temperature from the CTE, order the results in descending order by average temperature, and limit the results to 1 row using LIMIT 1.

  3. Find the Region with the Lowest Average Temperature:

    • Same as above, but order the results in ascending order by average temperature.

Simplified Explanation:

  1. We create a temporary table that calculates the average temperature for each region.

  2. We find the regions with the highest and lowest average temperatures by sorting the average temperatures in descending and ascending order, respectively.

  3. We only display the top 1 result for each.
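The steps above can be sketched end to end. The example below runs in SQLite through Python's sqlite3 module (an assumed engine) on hypothetical readings for three made-up regions. Both lookups share one statement, since a CTE is only visible to the statement it belongs to, and the kind label is an addition for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Temperature (
        region TEXT NOT NULL,
        time TEXT NOT NULL,
        temperature REAL NOT NULL,
        PRIMARY KEY (region, time)
    );
    -- Hypothetical readings, two per region.
    INSERT INTO Temperature VALUES
        ('North', '2023-01-01 00:00', 10.0), ('North', '2023-01-02 00:00', 14.0),
        ('Coast', '2023-01-01 00:00', 18.0), ('Coast', '2023-01-02 00:00', 20.0),
        ('South', '2023-01-01 00:00', 25.0), ('South', '2023-01-02 00:00', 27.0);
""")

rows = conn.execute("""
    WITH RegionAverageTemperatures AS (
        SELECT region, AVG(temperature) AS avg_temperature
        FROM Temperature
        GROUP BY region
    )
    SELECT 'highest' AS kind, region, avg_temperature FROM (
        SELECT region, avg_temperature FROM RegionAverageTemperatures
        ORDER BY avg_temperature DESC LIMIT 1
    )
    UNION ALL
    SELECT 'lowest' AS kind, region, avg_temperature FROM (
        SELECT region, avg_temperature FROM RegionAverageTemperatures
        ORDER BY avg_temperature ASC LIMIT 1
    )
""").fetchall()

# Index the two result rows by their label.
extremes = {kind: (region, avg) for kind, region, avg in rows}
print(extremes)
```

With these readings, South has the highest average (26.0) and North the lowest (12.0).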

Real-World Applications:

  • Climate analysis: Identifying the regions with the highest and lowest average temperatures.

  • Weather forecasting: Predicting future temperature trends based on historical data.

  • Climate change modeling: Studying the impact of rising global temperatures on specific regions.


Top Travellers

Problem:

Find the top X most frequent travelers in a database of travel records.

SQL Query:

SELECT traveler_id, COUNT(*) AS travel_count
FROM TravelRecords
GROUP BY traveler_id
ORDER BY travel_count DESC
LIMIT X;

Breakdown:

  1. SELECT traveler_id, COUNT(*) AS travel_count:

    • Selects the unique traveler ID and the count of travel records for each traveler.

  2. FROM TravelRecords:

    • Specifies the table containing the travel records.

  3. GROUP BY traveler_id:

    • Groups the records by traveler ID, so that each traveler's travel count can be aggregated.

  4. ORDER BY travel_count DESC:

    • Orders the results in descending order of travel count.

  5. LIMIT X:

    • Specifies the maximum number of travelers to return (top X).

Real-World Application:

This query can be used by travel companies to identify their most frequent travelers, who can then be targeted with exclusive offers, loyalty programs, or personalized experiences.

Example:

Consider the following TravelRecords table:

| traveler_id | travel_date |
|---|---|
| 1 | 2023-01-01 |
| 1 | 2023-01-15 |
| 2 | 2023-02-01 |
| 3 | 2023-03-01 |
| 3 | 2023-03-15 |
| 4 | 2023-04-01 |

Running the query with X = 2 would return:

| traveler_id | travel_count |
|---|---|
| 1 | 2 |
| 3 | 2 |
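Since X is a placeholder, the query is usually run with a bound parameter. The sketch below does this in SQLite through Python's sqlite3 module (an assumed engine); the helper top_travellers is hypothetical. A second sort key, traveler_id, is added as an assumption so that ties in travel_count come back in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TravelRecords (traveler_id INTEGER, travel_date TEXT)")
conn.executemany(
    "INSERT INTO TravelRecords VALUES (?, ?)",
    [
        (1, "2023-01-01"), (1, "2023-01-15"), (2, "2023-02-01"),
        (3, "2023-03-01"), (3, "2023-03-15"), (4, "2023-04-01"),
    ],
)


def top_travellers(x):
    """Return the x most frequent travellers as (traveler_id, travel_count) rows."""
    return conn.execute(
        """
        SELECT traveler_id, COUNT(*) AS travel_count
        FROM TravelRecords
        GROUP BY traveler_id
        ORDER BY travel_count DESC, traveler_id ASC  -- tie-breaker is an assumption
        LIMIT ?
        """,
        (x,),
    ).fetchall()


print(top_travellers(2))
print(top_travellers(3))
```

Without a tie-breaker, travellers with equal counts may be returned in any order, which matters when the tie straddles the LIMIT boundary.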


Employees Whose Manager Left the Company

Problem:

Find the employees whose manager has left the company.

SQL Solution:

SELECT E.employee_id, E.employee_name
FROM Employees E
LEFT JOIN Employees M ON E.manager_id = M.employee_id
WHERE E.manager_id IS NOT NULL
  AND M.employee_id IS NULL;

Explanation:

  • SELECT E.employee_id, E.employee_name: Selects the IDs and names of the employees (E).

  • FROM Employees E: Specifies the Employees table as the source table for selecting the employees.

  • LEFT JOIN Employees M ON E.manager_id = M.employee_id: Performs a left join between the Employees table (E) and itself (M), matching each employee's manager ID (E.manager_id) to the manager's employee ID (M.employee_id). Employees whose manager_id has no matching row keep NULL values in the M columns.

  • WHERE E.manager_id IS NOT NULL AND M.employee_id IS NULL: Keeps the employees who reference a manager but whose manager no longer has a row in the table. The first condition matters: without it, employees who never had a manager (manager_id is NULL) would be returned as well.

Example:

Consider the following Employees table:

| employee_id | employee_name | manager_id |
|---|---|---|
| 1 | John Smith | 2 |
| 2 | Mary Jones | NULL |
| 3 | Peter Parker | 4 |
| 4 | Tony Stark | NULL |

Using the above SQL query, we get an empty result.

Explanation:

  • John Smith's manager (employee ID 2, Mary Jones) is still in the table, so he is not returned.

  • Peter Parker's manager (employee ID 4, Tony Stark) is also still in the table, so he is not returned.

  • Mary Jones and Tony Stark have a NULL manager_id, meaning they have no manager at all, so they are excluded as well.

  • If Tony Stark's row were deleted, Peter Parker would be returned, because his manager_id (4) would no longer match any employee_id.
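A left join on its own cannot tell "the manager's row is gone" apart from "this employee never had a manager", because both cases leave the joined columns NULL. The sketch below illustrates the difference in SQLite through Python's sqlite3 module (an assumed engine). The data is a hypothetical variant of the example table in which employee 4 has been removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (
        employee_id INTEGER PRIMARY KEY,
        employee_name TEXT,
        manager_id INTEGER
    );
    -- Hypothetical variant of the example: employee 4 no longer has a row.
    INSERT INTO Employees VALUES
        (1, 'John Smith', 2),
        (2, 'Mary Jones', NULL),
        (3, 'Peter Parker', 4);
""")

# Left join only: also matches employees whose manager_id is NULL.
join_only = conn.execute("""
    SELECT E.employee_id, E.employee_name
    FROM Employees E
    LEFT JOIN Employees M ON E.manager_id = M.employee_id
    WHERE M.employee_id IS NULL
    ORDER BY E.employee_id
""").fetchall()

# With the extra check, only employees pointing at a missing manager remain.
manager_left = conn.execute("""
    SELECT E.employee_id, E.employee_name
    FROM Employees E
    LEFT JOIN Employees M ON E.manager_id = M.employee_id
    WHERE E.manager_id IS NOT NULL
      AND M.employee_id IS NULL
    ORDER BY E.employee_id
""").fetchall()

print(join_only)
print(manager_left)
```

The first query also returns Mary Jones, who simply has no manager; the second returns only Peter Parker, whose manager ID 4 no longer exists.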

Real-World Applications:

This query can be useful in various scenarios, such as:

  • Identifying employees who may need additional support or guidance due to the absence of their managers.

  • Ensuring that tasks and responsibilities are reassigned or redistributed effectively to maintain operational efficiency.

  • Tracking changes in the organizational structure and identifying potential management gaps or areas for restructuring.


Employees With Deductions

SELECT E.name, E.salary, SUM(D.amount) AS total_deductions
FROM Employees E
JOIN Deductions D ON E.id = D.employee_id
GROUP BY E.id, E.name, E.salary
ORDER BY total_deductions DESC;

This query retrieves the name and salary of employees along with the total amount of deductions applied to their paychecks.

Breakdown:

  1. JOIN Operation: The query uses an INNER JOIN between the Employees (E) and Deductions (D) tables to match employees with their respective deductions. The E.id column from the Employees table is linked to the D.employee_id column from the Deductions table. Only deductions that belong to an employee in the Employees table are retrieved, and employees with no deductions do not appear in the result.

  2. SUM Aggregation: For each employee, the SUM() aggregate function is applied to the D.amount column to calculate the total amount of deductions. This value is aliased as total_deductions.

  3. GROUP BY Clause: The results are grouped by E.id together with E.name and E.salary, so that all deductions for each employee are aggregated together. Grouping by E.id keeps two employees who happen to share a name and salary from being merged into one row.

  4. ORDER BY Clause: Finally, the results are ordered in descending order based on the total_deductions column, displaying employees with the highest total deductions at the top of the list.

Real-World Applications:

This query can be useful in various real-world scenarios:

  • Payroll Processing: Companies can use this query to determine the total deductions for each employee, which is necessary for calculating net pay.

  • Budgeting and Financial Planning: Employees can use this query to assess their total deductions and adjust their budgets or financial plans accordingly.

  • Employee Performance Analysis: HR departments can use this query to identify employees with higher or lower deduction amounts, which may indicate areas for improvement in compensation and benefits packages.


Employees Project Allocation

Problem Statement

You have a table called Employees with the following columns:

  • emp_id (int)

  • name (string)

And a table called Projects with the following columns:

  • proj_id (int)

  • name (string)

Each employee can be assigned to multiple projects, and each project can have multiple employees assigned to it. The assignments are stored in a third table, EmployeeProjects, with the columns emp_id and proj_id. You want to write a query to list the names of employees and the projects they are assigned to.

Solution

The following query will list the names of employees and the projects they are assigned to:

SELECT e.name AS employee_name, p.name AS project_name
FROM Employees e
JOIN EmployeeProjects ep ON e.emp_id = ep.emp_id
JOIN Projects p ON ep.proj_id = p.proj_id;

Breakdown of the Solution

The query uses two JOIN operations. The first matches each employee to their rows in the EmployeeProjects table on emp_id, and the second matches each of those rows to the corresponding project on proj_id. The result contains one row for every employee and project pair recorded in EmployeeProjects.

The SELECT clause then returns the name column from the Employees table and the name column from the Projects table, aliased as employee_name and project_name so that the two columns can be told apart.
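A many-to-many relationship like this one is resolved through the linking table. The following sketch runs the two joins in SQLite through Python's sqlite3 module (an assumed engine). All rows and names are hypothetical, and the column aliases and ORDER BY are additions that make the output easier to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Projects (proj_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE EmployeeProjects (emp_id INTEGER, proj_id INTEGER);
    -- Hypothetical sample rows.
    INSERT INTO Employees VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cleo');
    INSERT INTO Projects VALUES (10, 'Apollo'), (20, 'Borealis');
    INSERT INTO EmployeeProjects VALUES (1, 10), (1, 20), (2, 10);
""")

assignments = conn.execute("""
    SELECT e.name AS employee_name, p.name AS project_name
    FROM Employees e
    JOIN EmployeeProjects ep ON e.emp_id = ep.emp_id
    JOIN Projects p ON ep.proj_id = p.proj_id
    ORDER BY employee_name, project_name
""").fetchall()

print(assignments)
```

Ana appears twice because she has two assignments, and Cleo does not appear at all because she has none.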

Real-World Applications

This query could be used in a variety of real-world applications, such as:

  • Generating reports on employee productivity

  • Tracking employee assignments

  • Managing project staffing

Potential Applications

Here are some potential applications of this solution in the real world:

  • Human Resources: This query could be used to generate reports on employee productivity. For example, the query could be used to identify employees who are assigned to multiple projects and to track their progress on each project.

  • Project Management: This query could be used to track employee assignments. For example, the query could be used to identify which employees are assigned to a particular project and to track their progress on the project.

  • Staffing: This query could be used to manage project staffing. For example, the query could be used to identify which employees are available to be assigned to a new project and to track their availability.


Triangle Judgement

Triangle Problem:

Imagine you have three sticks or line segments. You want to know if they can form a valid triangle.

SQL Solution:

SELECT a, b, c,
    CASE
        WHEN a + b > c AND a + c > b AND b + c > a
        THEN 'Valid Triangle'
        ELSE 'Invalid Triangle'
    END AS Triangle_Validity
FROM Triangle_Info;

Explanation:

  • Table Setup: We assume you have a table named Triangle_Info with three columns: a, b, and c, representing the lengths of the three sticks.

  • Query: The query checks that the sum of every pair of sticks is greater than the length of the remaining stick (the triangle inequality). If all three comparisons pass, it's a valid triangle. Otherwise, it's invalid. No separate check for positive lengths is needed, because a zero or negative length always fails at least one of the comparisons.

  • Output: The query returns the validity of the triangle as 'Valid Triangle' or 'Invalid Triangle'.
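The triangle inequality test can be tried on a few rows. The sketch below uses SQLite through Python's sqlite3 module (an assumed engine) with hypothetical stick lengths. No positivity filter is applied, so a zero length is reported as invalid rather than dropped from the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Triangle_Info (a INTEGER, b INTEGER, c INTEGER);
    -- Hypothetical rows: a right triangle, a degenerate case, and a zero length.
    INSERT INTO Triangle_Info VALUES (3, 4, 5), (1, 2, 3), (0, 4, 4);
""")

verdicts = conn.execute("""
    SELECT a, b, c,
        CASE
            WHEN a + b > c AND a + c > b AND b + c > a
            THEN 'Valid Triangle'
            ELSE 'Invalid Triangle'
        END AS Triangle_Validity
    FROM Triangle_Info
    ORDER BY rowid
""").fetchall()

for row in verdicts:
    print(row)
```

The row (1, 2, 3) is invalid because 1 + 2 equals 3 rather than exceeding it, and (0, 4, 4) is invalid because 0 + 4 is not greater than 4.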

Real-World Applications:

  • Construction: Validating if materials have the right proportions to create a strong frame.

  • Engineering: Designing structures like bridges or buildings to ensure they can withstand forces.

  • Furniture Design: Determining if the dimensions of furniture pieces will provide proper support and stability.

Simplified Explanation:

Think of a triangle as a roof. The longest stick is the base, and the two shorter sticks are the rafters leaning toward each other from its ends. If the rafters together are longer than the base, they meet at a peak and you have a valid triangle. If together they are only as long as the base, or shorter, they can never meet above it and no triangle forms.


Accepted Candidates From the Interviews

Problem:

Given the tables candidates and interviews, find the names of candidates who passed the interviews.

Tables:

candidates (candidate_id, name)
interviews (candidate_id, interview_date, result)

Solution:

SELECT c.name
FROM candidates c
JOIN interviews i ON c.candidate_id = i.candidate_id
WHERE i.result = 'passed';

Explanation:

  • Join the candidates and interviews tables using the candidate_id column, which is the common column between the two tables.

  • Filter the joined table to only include candidates who passed the interviews, by checking for the condition i.result = 'passed'.

Real-World Application:

This query can be used in a real-world scenario to identify candidates who have passed the interview process for a particular job position. The results can be used to:

  • Send official job offers to the successful candidates.

  • Schedule onboarding for the new hires.

  • Track the progress of the hiring process and identify any potential bottlenecks.

Example:

Consider the following data in the candidates and interviews tables:

candidates
+--------------+------+
| candidate_id | name |
+--------------+------+
| 1            | John |
| 2            | Mary |
| 3            | Bob  |
+--------------+------+

interviews
+--------------+----------------+--------+
| candidate_id | interview_date | result |
+--------------+----------------+--------+
| 1            | 2023-01-01     | passed |
| 2            | 2023-01-02     | failed |
| 3            | 2023-01-03     | passed |
+--------------+----------------+--------+

Running the SQL query on this data will produce the following result:

+--------+
| name   |
+--------+
| John   |
| Bob    |
+--------+

This result shows that John and Bob passed their interviews and are eligible for job offers.


User Activity for the Past 30 Days I

Problem Statement:

Given a table of user activities, find the total number of activities for each user in the past 30 days.

SQL Query:

SELECT
  user_id,
  COUNT(*) AS total_activities
FROM user_activities
WHERE
  activity_date >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY
  user_id;

Breakdown:

  • SELECT: Selects two columns: user_id and total_activities.

  • FROM: Selects from the user_activities table.

  • WHERE: Filters the rows to include only those where activity_date is greater than or equal to 30 days ago.

  • GROUP BY: Groups the results by user_id so that we get a count for each user.

Example:

Consider the following table:

| user_id | activity_date |
|---|---|
| 1        | 2023-03-01    |
| 1        | 2023-03-05    |
| 2        | 2023-03-07    |
| 2        | 2023-03-10    |

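The result depends on the date on which the query runs. Run on a date such as 2023-03-15, when all four rows fall within the previous 30 days, the query will return the following result: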

| user_id | total_activities |
|---|---|
| 1        | 2                 |
| 2        | 2                 |

Explanation:

  • The query first selects all rows from the user_activities table where the activity_date is within the past 30 days.

  • It then groups the results by user_id and counts the number of activities for each user.

  • The result is a table that shows the total number of activities for each user in the past 30 days.
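Because the 30-day window moves with the current date, it helps to test the query against a fixed reference date. The sketch below is a SQLite variant run through Python's sqlite3 module, where date(?, '-30 days') stands in for MySQL's DATE_SUB(NOW(), INTERVAL 30 DAY). The helper activity_counts, the reference dates, and the extra row for user 3 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_activities (user_id INTEGER, activity_date TEXT)")
conn.executemany(
    "INSERT INTO user_activities VALUES (?, ?)",
    [
        (1, "2023-03-01"), (1, "2023-03-05"),
        (2, "2023-03-07"), (2, "2023-03-10"),
        (3, "2023-01-10"),  # hypothetical older activity
    ],
)


def activity_counts(today):
    """Count each user's activities in the 30 days up to the given ISO date."""
    return conn.execute(
        """
        SELECT user_id, COUNT(*) AS total_activities
        FROM user_activities
        WHERE activity_date >= date(?, '-30 days')
        GROUP BY user_id
        ORDER BY user_id
        """,
        (today,),
    ).fetchall()


print(activity_counts("2023-03-15"))  # cutoff 2023-02-13
print(activity_counts("2023-04-08"))  # cutoff 2023-03-09
```

Users with no activity inside the window, such as user 3 here, are absent from the result instead of being listed with a count of 0.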

Real-World Applications:

This query can be useful for:

  • Tracking user engagement: Monitoring the number of activities performed by users can help you understand how engaged they are with your application or website.

  • Identifying inactive users: Users who are missing from the result have had no activity in the past 30 days. Comparing the result with the full list of users identifies them, for example to target them with marketing campaigns.

  • Analyzing user behavior: By comparing the total number of activities for different users, you can identify patterns and trends in user behavior.


The Airport With the Most Traffic

Problem Statement:

Find the airport with the most total takeoffs and landings.

SQL Solution:

-- Find the airport with the most total takeoffs and landings
SELECT airport_id, 
       airport_name, 
       SUM(takeoffs + landings) AS total_operations
FROM airport_operations
GROUP BY airport_id, airport_name
ORDER BY total_operations DESC
LIMIT 1;

Explanation:

This query uses a combination of aggregation and grouping to count the total number of takeoffs and landings for each airport.

  1. Aggregation: The SUM() function is used to calculate the total number of takeoffs and landings for each airport. The expression (takeoffs + landings) adds the values of the takeoffs and landings columns for each row.

  2. Grouping: The GROUP BY clause groups the results by airport_id and airport_name. This means that all rows with the same airport_id and airport_name are grouped together.

  3. Ordering: The ORDER BY clause orders the results in descending order of total_operations. This means that the airports with the most total operations will be listed first.

  4. Limiting: The LIMIT 1 clause limits the results to the top 1 row. This means that only the airport with the most total operations will be returned.

Real-World Applications:

This query can be used in a variety of real-world applications, including:

  • Identifying the busiest airports in a region or country

  • Planning airport expansion projects

  • Forecasting air traffic demand

  • Analyzing trends in air travel

Example:

Suppose we have the following table of airport operations data:

| airport_id | airport_name | takeoffs | landings |
|---|---|---|---|
| 1 | JFK | 1000 | 1200 |
| 2 | LAX | 1500 | 1300 |
| 3 | ORD | 1200 | 1100 |

Running the above query on this data would return the following result:

| airport_id | airport_name | total_operations |
|---|---|---|
| 2 | LAX | 2800 |

This result shows that LAX is the airport with the most total takeoffs and landings (2800).
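The example can be reproduced with a short script. This is a minimal sketch using SQLite through Python's sqlite3 module (an assumed engine); the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE airport_operations (
        airport_id INTEGER, airport_name TEXT, takeoffs INTEGER, landings INTEGER
    )
""")
conn.executemany(
    "INSERT INTO airport_operations VALUES (?, ?, ?, ?)",
    [(1, "JFK", 1000, 1200), (2, "LAX", 1500, 1300), (3, "ORD", 1200, 1100)],
)

# Totals per airport: JFK 2200, LAX 2800, ORD 2300; LIMIT 1 keeps the largest.
busiest = conn.execute("""
    SELECT airport_id, airport_name, SUM(takeoffs + landings) AS total_operations
    FROM airport_operations
    GROUP BY airport_id, airport_name
    ORDER BY total_operations DESC
    LIMIT 1
""").fetchall()

print(busiest)
```

If two airports were tied for first place, LIMIT 1 would return only one of them, so a tie-aware variant would compare against the maximum total instead.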


Restaurant Growth

Problem Statement

Given a table Restaurant, which contains the following columns:

  • rid: Restaurant ID

  • name: Restaurant name

  • city: Restaurant city

  • year: Year the restaurant was established

You are tasked to find the restaurants that have experienced the most growth in terms of the number of cities they operate in over a given period of time.

Solution

Step-by-Step Explanation

1. Find the Number of Cities for Each Restaurant in Each Year

We can achieve this by using a self-join:

SELECT r1.rid, r1.year AS year1, r2.year AS year2,
       COUNT(DISTINCT r1.city) AS city_count1,
       COUNT(DISTINCT r2.city) AS city_count2
FROM Restaurant r1
JOIN Restaurant r2 ON r1.rid = r2.rid
WHERE r1.year < r2.year
GROUP BY r1.rid, r1.year, r2.year;

This query assumes that the table holds one row per restaurant, city, and year. For each pair of years (year1, year2) where year1 is less than year2, it counts the number of distinct cities the restaurant operates in during year1 (city_count1) and during year2 (city_count2).

2. Calculate the Growth for Each Restaurant

We can calculate the growth for each restaurant by finding the difference in the number of cities operated between year2 and year1:

SELECT rid, year1, year2, city_count2 - city_count1 AS growth
FROM (
    SELECT r1.rid, r1.year AS year1, r2.year AS year2,
           COUNT(DISTINCT r1.city) AS city_count1,
           COUNT(DISTINCT r2.city) AS city_count2
    FROM Restaurant r1
    JOIN Restaurant r2 ON r1.rid = r2.rid
    WHERE r1.year < r2.year
    GROUP BY r1.rid, r1.year, r2.year
) subquery
WHERE year2 - year1 = 1;

We filter the results to only include pairs of years that are one year apart.

3. Find Restaurants with the Most Growth

Finally, we can find the restaurants with the most growth by ranking them based on their growth:

SELECT rid, year1, year2, city_count2 - city_count1 AS growth
FROM (
    SELECT r1.rid, r1.year AS year1, r2.year AS year2,
           COUNT(DISTINCT r1.city) AS city_count1,
           COUNT(DISTINCT r2.city) AS city_count2
    FROM Restaurant r1
    JOIN Restaurant r2 ON r1.rid = r2.rid
    WHERE r1.year < r2.year
    GROUP BY r1.rid, r1.year, r2.year
) subquery
WHERE year2 - year1 = 1
ORDER BY growth DESC;

Real-World Applications

This query can be used to identify restaurants that are rapidly expanding their geographical footprint. This information can be valuable for investors, real estate developers, and city planners who are interested in tracking the growth of the restaurant industry in specific areas.

Complete Code Example

The following is a complete code example in SQL:

WITH RestaurantGrowth AS (
    SELECT r1.rid, r1.year AS year1, r2.year AS year2,
           COUNT(DISTINCT r1.city) AS city_count1,
           COUNT(DISTINCT r2.city) AS city_count2
    FROM Restaurant r1
    JOIN Restaurant r2 ON r1.rid = r2.rid
    WHERE r1.year < r2.year
    GROUP BY r1.rid, r1.year, r2.year
)
SELECT rid, year1, year2, city_count2 - city_count1 AS growth
FROM RestaurantGrowth
WHERE year2 - year1 = 1
ORDER BY growth DESC;
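Year-over-year growth measured as a difference in city counts can be sketched as follows, using SQLite through Python's sqlite3 module (an assumed engine). The example assumes one row per restaurant, city, and year of operation, computes growth as the number of distinct cities in the later year minus the number in the earlier year, and uses entirely hypothetical restaurants and cities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Restaurant (rid INTEGER, name TEXT, city TEXT, year INTEGER);
    -- Hypothetical data: restaurant 1 expands, restaurant 2 shrinks.
    INSERT INTO Restaurant VALUES
        (1, 'Taco Town',   'Austin',  2020),
        (1, 'Taco Town',   'Austin',  2021),
        (1, 'Taco Town',   'Dallas',  2021),
        (1, 'Taco Town',   'Houston', 2021),
        (2, 'Pasta Place', 'Boston',  2020),
        (2, 'Pasta Place', 'Salem',   2020),
        (2, 'Pasta Place', 'Boston',  2021);
""")

# Pair each year with the following year and subtract the distinct city counts.
growth_rows = conn.execute("""
    SELECT r1.rid, r1.year AS year1, r2.year AS year2,
           COUNT(DISTINCT r2.city) - COUNT(DISTINCT r1.city) AS growth
    FROM Restaurant r1
    JOIN Restaurant r2 ON r1.rid = r2.rid
    WHERE r2.year - r1.year = 1
    GROUP BY r1.rid, r1.year, r2.year
    ORDER BY growth DESC
""").fetchall()

print(growth_rows)
```

Restaurant 1 goes from one city to three (growth 2), while restaurant 2 goes from two cities to one (growth -1).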

Convert Date Format

Problem:

Format the dates stored in a column as YYYY-MM-DD strings.

SQL Code:

SELECT DATE_FORMAT(input_date, '%Y-%m-%d') AS formatted_date
FROM table_name;

Breakdown:

  • DATE_FORMAT() Function: This MySQL function takes a date value and returns it as a string in the specified format.

  • %Y-%m-%d: This is the format string that specifies the desired output format. It represents "Year-Month-Day."

Example Input and Output:

| input_date | formatted_date |
|---|---|
| '2023-04-05' | '2023-04-05' |
| '2022-12-25' | '2022-12-25' |
| '2021-07-14' | '2021-07-14' |

Explanation:

The DATE_FORMAT() function takes the input_date as an argument and converts it into the specified format. In this case, the format is "Year-Month-Day." This means that the output date string will be in the format YYYY-MM-DD.

Real-World Applications:

  • Format dates for display in user interfaces or reports.

  • Convert dates to a standard format for data exchange or storage.

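  • Prepare dates for grouping, for example by formatting them as '%Y-%m' to build monthly reports. The formatted value is a string, so date arithmetic such as finding the difference between two dates should be done on the original date values.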


Tasks Count in the Weekend

Problem Statement

Given a table Tasks containing task information, including the date when each task was created (created_at) and its status, write a SQL query to count the number of tasks created during the weekend (Saturday and Sunday).

Table Schema

CREATE TABLE Tasks (
  id INT PRIMARY KEY,
  created_at TIMESTAMP NOT NULL,
  status VARCHAR(255) NOT NULL
);

Query

SELECT COUNT(*) AS weekend_task_count
FROM Tasks
WHERE DAYOFWEEK(created_at) IN (1, 7);

Explanation

The query uses MySQL's DAYOFWEEK() function, which returns 1 for Sunday through 7 for Saturday, to keep only the rows whose created_at falls on a Saturday or a Sunday. The COUNT(*) function counts the number of rows that meet this condition, providing the count of tasks created during any weekend. A fixed range such as BETWEEN '2022-08-13' AND '2022-08-14' would cover a single weekend only and, because created_at is a TIMESTAMP, would end at midnight at the start of Sunday.
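A day-of-week test and a fixed date range give different counts on timestamp data. The sketch below compares them in SQLite through Python's sqlite3 module (an assumed engine), where strftime('%w', ...) returns '0' for Sunday and '6' for Saturday. The rows are hypothetical, and timestamps are stored as ISO text, which is how SQLite commonly represents them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tasks (
        id INTEGER PRIMARY KEY,
        created_at TEXT NOT NULL,
        status TEXT NOT NULL
    );
    -- Hypothetical rows: Fri, Sat, Sun, Mon, and the following Sat.
    INSERT INTO Tasks VALUES
        (1, '2022-08-12 10:00:00', 'open'),
        (2, '2022-08-13 09:00:00', 'open'),
        (3, '2022-08-14 18:30:00', 'done'),
        (4, '2022-08-15 08:00:00', 'open'),
        (5, '2022-08-20 12:00:00', 'done');
""")

# Day-of-week test: counts tasks from every Saturday and Sunday.
weekend_count = conn.execute(
    "SELECT COUNT(*) FROM Tasks WHERE strftime('%w', created_at) IN ('0', '6')"
).fetchone()[0]

# Fixed range: the upper bound '2022-08-14' sorts before any time on that day,
# so the Sunday evening task is missed, as is the later weekend.
between_count = conn.execute(
    "SELECT COUNT(*) FROM Tasks WHERE created_at BETWEEN '2022-08-13' AND '2022-08-14'"
).fetchone()[0]

print(weekend_count, between_count)
```

The day-of-week test finds three weekend tasks, while the fixed range finds only the one created on Saturday morning.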

Real-World Applications

This query can be used in a project management system to track the number of tasks created during weekends, which can provide insights into project progress and potential workload issues. For example, if the number of weekend tasks is consistently high, it may indicate that the project timeline is too ambitious or that the team is understaffed.


Users With Two Purchases Within Seven Days

Problem Statement

Find users who have made two or more purchases within a seven-day period.

SQL Solution

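SELECT DISTINCT user_id
FROM (
  SELECT user_id, purchase_date,
         LAG(purchase_date) OVER (PARTITION BY user_id ORDER BY purchase_date) AS prev_date
  FROM purchases
) AS ordered
WHERE julianday(purchase_date) - julianday(prev_date) <= 7;

Explanation

  • The subquery orders each user's purchases by date and uses LAG() to place the date of the previous purchase next to each purchase.

  • The WHERE clause keeps purchases made at most seven days after the same user's previous purchase. A user's first purchase has no previous date (NULL), so it is never kept.

  • If any two purchases of a user are within seven days of each other, then two consecutive purchases are as well, so comparing neighbouring purchases is enough.

Breakdown

  • DISTINCT: Ensures that each user is listed only once, even if several of their purchases qualify.

  • LAG(purchase_date) OVER (PARTITION BY user_id ORDER BY purchase_date): Returns the date of the same user's previous purchase.

  • julianday(purchase_date) - julianday(prev_date): Gives the number of days between the two purchases. julianday() is SQLite syntax; in MySQL the equivalent is DATEDIFF(purchase_date, prev_date).

  • <= 7: Keeps purchases that are at most seven days after the previous one.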
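One way to find two purchases that are at most seven days apart, at any point in a user's history, is to compare each purchase with the same user's other purchases. The sketch below does this with a self-join in SQLite through Python's sqlite3 module (an assumed engine). The data is hypothetical, and SQLite's implicit rowid is used as an assumed unique row identifier so that a purchase is never paired with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER, purchase_date TEXT)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [
        (1, "2023-01-01"), (1, "2023-01-05"),  # 4 days apart
        (2, "2023-01-01"), (2, "2023-02-01"),  # 31 days apart
        (3, "2023-01-10"),                     # single purchase
        (4, "2023-03-01"), (4, "2023-03-08"),  # exactly 7 days apart
    ],
)

# Pair each purchase with a later-inserted purchase of the same user and keep
# the users for whom some pair is at most seven days apart.
users = conn.execute("""
    SELECT DISTINCT p1.user_id
    FROM purchases p1
    JOIN purchases p2
      ON p2.user_id = p1.user_id
     AND p2.rowid > p1.rowid
     AND ABS(julianday(p2.purchase_date) - julianday(p1.purchase_date)) <= 7
    ORDER BY p1.user_id
""").fetchall()

print(users)
```

Users 1 and 4 qualify regardless of how long ago the purchases were made, whereas a filter on the last seven days from today would depend on when the query runs.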

Real-World Applications

  • Identifying active users for targeted marketing campaigns.

  • Analyzing customer buying behavior to improve sales strategies.

  • Detecting fraudulent purchases by identifying users who make multiple purchases in a short period.


Daily Leads and Partners

Problem Statement:

You are given a table called leads with the columns lead_id, partner_id, date, and daily_leads, where daily_leads is the number of leads generated by a partner on the given date.

Write a SQL query to find the total number of daily leads generated by all partners on a specific day.

Breakdown:

  • Leads: A lead is a potential customer who has shown interest in a product or service.

  • Partner: A partner is a company or individual who collaborates with the business to generate leads.

  • Daily Leads: The number of leads generated by a partner on a specific day.

SQL Query:

SELECT SUM(daily_leads) AS total_daily_leads
FROM leads
WHERE date = '2023-03-08';

Explanation:

The query first filters the leads table to only include rows where the date column matches the specified date, which is '2023-03-08' in this example.

Then, it calculates the sum of the daily_leads column for all the filtered rows. This gives us the total number of daily leads generated by all partners on the specified date.

Real-World Applications:

This query can be used in a variety of real-world applications, such as:

  • Tracking Lead Generation Performance: Businesses can use this query to monitor how many leads their partners generate in total. With partner_id added to the SELECT list and a GROUP BY partner_id clause, the same query reports the daily total for each partner, which shows which partners are most effective and where to invest resources.

  • Optimizing Marketing Campaigns: Businesses can use this query to identify which days are most effective for lead generation. By analyzing the daily lead counts for different days of the week or month, they can plan their marketing campaigns accordingly.

  • Forecasting Lead Generation: Businesses can use this query to forecast future lead generation based on historical data. By analyzing the total daily leads generated over time, they can identify trends and predict future lead volume.


Top Percentile Fraud

Problem: Top Percentile Fraud

SQL Query:

WITH RankedTransactions AS (
  SELECT
    user_id,
    transaction_amount,
    -- Rank every transaction from the largest amount to the smallest
    ROW_NUMBER() OVER (ORDER BY transaction_amount DESC) AS transaction_rank,
    -- Total number of transactions, repeated on every row
    COUNT(*) OVER () AS total_transactions
  FROM Transactions
),
TopPercentile AS (
  SELECT
    user_id
  FROM RankedTransactions
  WHERE
    transaction_rank <= CEILING(0.01 * total_transactions)
)
SELECT DISTINCT
  user_id
FROM TopPercentile;

Explanation:

  1. Create a Ranked Transactions Table:

    • The RankedTransactions CTE ranks every transaction in descending order of transaction amount and attaches the total number of transactions to each row.

  2. Calculate the 99th Percentile:

    • The TopPercentile CTE uses the CEILING function to work out how many transactions make up the top 1% (those at or above the 99th percentile) and keeps only the rows within that rank.

  3. Identify Top Percentile Users:

    • The main query returns the distinct users who own at least one of those transactions.

Steps in Detail:

  1. Window Functions (ROW_NUMBER and COUNT):

    • The ROW_NUMBER() function assigns a unique rank to each row. There is no PARTITION BY here, so the ranking runs across the whole table, ordered by transaction_amount in descending order. COUNT(*) OVER () returns the total number of rows. Window functions cannot appear in a WHERE clause, which is why both values are computed in the CTE and filtered in the next step.

  2. 99th Percentile Calculation:

    • CEILING(0.01 * total_transactions) rounds the size of the top 1% up to a whole number, so at least one transaction always qualifies. With 250 transactions, for example, the cutoff rank is 3.

  3. User Identification:

    • The main query reads the TopPercentile result and uses DISTINCT so that a user with several qualifying transactions is listed only once.
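The top-percentile cutoff can be tried out on a small data set. The following is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The function name top_percentile_users and the sample rows are made up for illustration, and the rounding up is done with integer arithmetic because not every SQLite build ships a CEILING function.

```python
import sqlite3

def top_percentile_users(transactions, percent=1):
    """Return users owning a transaction in the top `percent` % by amount."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Transactions (user_id INTEGER, transaction_amount REAL)")
    conn.executemany("INSERT INTO Transactions VALUES (?, ?)", transactions)
    rows = conn.execute(
        """
        WITH Ranked AS (
          SELECT user_id,
                 ROW_NUMBER() OVER (ORDER BY transaction_amount DESC) AS rn,
                 COUNT(*) OVER () AS total
          FROM Transactions
        )
        SELECT DISTINCT user_id
        FROM Ranked
        WHERE rn <= (total * ? + 99) / 100  -- integer ceiling of total * percent / 100
        ORDER BY user_id
        """,
        (percent,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

# Hypothetical data: 198 small transactions plus two very large ones (users 7 and 9)
sample = [(i % 5, float(i)) for i in range(198)] + [(7, 10000.0), (9, 9000.0)]
print(top_percentile_users(sample))  # [7, 9]
```

With 200 rows the cutoff rank is 2, so only the owners of the two largest transactions are returned.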

Real-World Applications:

  • Fraud Detection: Identifying users who exhibit unusually high transaction amounts, potentially indicating fraudulent activity.

  • Customer Segmentation: Classifying customers into different tiers based on their transaction activity, enabling targeted marketing campaigns.

  • Risk Management: Assessing the risk associated with individual users based on their transaction history.


Page Recommendations II

Problem:

Find all the web pages that are not visited by any users.

SQL Query:

SELECT
  p.url
FROM pages AS p
LEFT JOIN visits AS v
  ON p.id = v.page_id
WHERE
  v.page_id IS NULL;

Breakdown:

  1. SELECT p.url: This line selects the URL of the pages table (p).

  2. FROM pages AS p: This line specifies that we are selecting from the "pages" table, and we are aliasing it as "p".

  3. LEFT JOIN visits AS v ON p.id = v.page_id: This line performs a LEFT JOIN between the "pages" table and the "visits" table (v) based on the id column of the "pages" table and the page_id column of the "visits" table. A LEFT JOIN will return all rows from the left table (in this case, "pages"), even if there are no matching rows in the right table ("visits").

  4. WHERE v.page_id IS NULL: This line filters the results to only include rows where the page_id column in the "visits" table is NULL. This means that these are pages that have not been visited by any users.
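The LEFT JOIN plus IS NULL pattern described above can be checked with a short script. This is a minimal sketch using Python's built-in sqlite3 module; the page URLs and visit rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE visits (user_id INTEGER, page_id INTEGER);
INSERT INTO pages VALUES (1, '/home'), (2, '/about'), (3, '/old-promo');
INSERT INTO visits VALUES (10, 1), (11, 1), (10, 2);
""")

# Pages without a matching visit row come back with v.page_id set to NULL
unvisited = [row[0] for row in conn.execute("""
    SELECT p.url
    FROM pages AS p
    LEFT JOIN visits AS v ON p.id = v.page_id
    WHERE v.page_id IS NULL
    ORDER BY p.url
""")]
print(unvisited)  # ['/old-promo']
```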

Real World Example:

This query can be used by website administrators to identify web pages that are not getting any traffic. This information can be used to make decisions about which pages to remove or update.

Potential Applications:

  • Identifying underperforming web pages for SEO optimization.

  • Removing unused pages to improve website performance.

  • Finding orphaned pages that can be removed to avoid security vulnerabilities.


Project Employees II

Problem:

You are given a database table Projects with the following columns:

  • project_id (int)

  • project_name (string)

  • num_employees (int)

And another table Employees with the following columns:

  • employee_id (int)

  • project_id (int)

  • employee_name (string)

You need to write a SQL query to find all projects that have more employees than the average number of employees across all projects.

Solution:

SELECT
  Projects.project_id,
  Projects.project_name,
  Projects.num_employees
FROM Projects
JOIN (
  SELECT
    AVG(num_employees) AS avg_num_employees
  FROM Projects
) AS AverageEmployees
ON Projects.num_employees > AverageEmployees.avg_num_employees;

Breakdown:

  1. Calculate the average number of employees: We calculate the average number of employees across all projects using a subquery:

    SELECT AVG(num_employees) AS avg_num_employees FROM Projects

  2. Join the Projects table with the AverageEmployees subquery: We join the Projects table with the AverageEmployees subquery to compare the number of employees in each project with the average number of employees. The ON clause specifies that we only want to include projects where the number of employees is greater than the average.

    JOIN (
      SELECT AVG(num_employees) AS avg_num_employees
      FROM Projects
    ) AS AverageEmployees
    ON Projects.num_employees > AverageEmployees.avg_num_employees

  3. Select the desired columns: We select the project_id, project_name, and num_employees columns from the Projects table for the resulting rows.

    SELECT
      Projects.project_id,
      Projects.project_name,
      Projects.num_employees
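To see the comparison against the average in action, here is a minimal sketch using Python's built-in sqlite3 module. The three projects are hypothetical sample data; their average head count is 7, so only the project with 10 employees should be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Projects (project_id INTEGER, project_name TEXT, num_employees INTEGER);
INSERT INTO Projects VALUES (1, 'Alpha', 4), (2, 'Beta', 10), (3, 'Gamma', 7);
""")

# The subquery yields one row (the average), so the join acts as a filter
above_average = conn.execute("""
    SELECT p.project_id, p.project_name, p.num_employees
    FROM Projects AS p
    JOIN (SELECT AVG(num_employees) AS avg_num_employees FROM Projects) AS a
      ON p.num_employees > a.avg_num_employees
""").fetchall()
print(above_average)  # [(2, 'Beta', 10)]
```

A project whose head count equals the average (Gamma here) is excluded because the comparison is strict.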

Real-World Applications:

This query can be used in various real-world scenarios, such as:

  • Identifying projects that are overstaffed or understaffed.

  • Planning staffing levels for new or ongoing projects.

  • Analyzing the efficiency of project teams based on the number of employees assigned.


Sales Person

Problem Statement

Given a table containing sales data, implement a query to find the top salespersons for each month.

SQL Implementation

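WITH MonthlySales AS (
    SELECT salesperson_id, MONTH(sale_date) AS sale_month, SUM(sales_amount) AS total_sales
    FROM sales_data
    GROUP BY salesperson_id, MONTH(sale_date)
),
RankedSales AS (
    SELECT salesperson_id, sale_month, total_sales,
           RANK() OVER (PARTITION BY sale_month ORDER BY total_sales DESC) AS sales_rank
    FROM MonthlySales
)
SELECT salesperson_id, sale_month, total_sales
FROM RankedSales
WHERE sales_rank <= 10
ORDER BY sale_month, total_sales DESC;

Breakdown and Explanation

  • MonthlySales CTE:

    • This CTE calculates the monthly sales for each salesperson.

    • MONTH(sale_date) extracts the month number (1 to 12) from the sale_date column. If the data spans several years, group by YEAR(sale_date) as well.

    • SUM(sales_amount) calculates the total sales for each salesperson-month combination.

  • RankedSales CTE:

    • RANK() OVER (PARTITION BY sale_month ORDER BY total_sales DESC) ranks the salespersons within each month, with the highest total first.

  • Main Query:

    • The main query keeps the top 10 salespersons for each month (sales_rank <= 10). A plain LIMIT 10 would instead cap the whole result at 10 rows across all months.

    • It orders the results by sale_month, and by total_sales in descending order within each month.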
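One way to pick the top salespersons within each month, rather than across the whole result, is to rank inside a per-month partition. The sketch below uses Python's built-in sqlite3 module (SQLite 3.25 or later for window functions). It swaps MONTH() for SQLite's strftime(), and the helper name top_sellers_per_month is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_data (salesperson_id INTEGER, sale_date TEXT, sales_amount INTEGER);
INSERT INTO sales_data VALUES
  (1, '2023-01-01', 100), (2, '2023-01-05', 150),
  (1, '2023-02-10', 200), (3, '2023-02-15', 250), (1, '2023-02-20', 300);
""")

def top_sellers_per_month(top_n):
    """Return (salesperson_id, month, total) for the top_n sellers of each month."""
    return conn.execute("""
        WITH MonthlySales AS (
          SELECT salesperson_id,
                 CAST(strftime('%m', sale_date) AS INTEGER) AS sale_month,
                 SUM(sales_amount) AS total_sales
          FROM sales_data
          GROUP BY salesperson_id, strftime('%m', sale_date)
        ),
        Ranked AS (
          SELECT salesperson_id, sale_month, total_sales,
                 RANK() OVER (PARTITION BY sale_month
                              ORDER BY total_sales DESC) AS sales_rank
          FROM MonthlySales
        )
        SELECT salesperson_id, sale_month, total_sales
        FROM Ranked
        WHERE sales_rank <= ?
        ORDER BY sale_month, total_sales DESC
    """, (top_n,)).fetchall()

print(top_sellers_per_month(1))  # [(2, 1, 150), (1, 2, 500)]
```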

Real-World Applications

  • Tracking sales performance over time.

  • Identifying underperforming and overperforming salespersons.

  • Making informed decisions on sales strategies and incentives.

Example

Consider the following sales data:

| salesperson_id | sale_date | sales_amount |
|---|---|---|
| 1 | 2023-01-01 | 100 |
| 2 | 2023-01-05 | 150 |
| 1 | 2023-02-10 | 200 |
| 3 | 2023-02-15 | 250 |
| 1 | 2023-02-20 | 300 |

The result of the query would be:

| salesperson_id | sale_month | total_sales |
|---|---|---|
| 2 | 1 | 150 |
| 1 | 1 | 100 |
| 1 | 2 | 500 |
| 3 | 2 | 250 |


Find the Team Size

Problem: Find the Team Size

SQL Query:

WITH TeamSizes AS (
  SELECT team_id, COUNT(*) AS team_size
  FROM team_members
  GROUP BY team_id
)
SELECT team_id, team_size
FROM TeamSizes
ORDER BY team_size DESC;

Breakdown and Explanation:

  1. Common Table Expression (CTE): TeamSizes

    • This CTE calculates the team size for each team.

    • It uses the COUNT() function to count the number of rows for each team_id in the team_members table.

    • The result is available to the main query under the name TeamSizes, with columns team_id and team_size. A CTE is not stored; it exists only for the duration of the statement.

  2. Main Query

    • The main query selects team_id and team_size from the TeamSizes CTE.

    • It sorts the results in descending order of team_size, displaying the teams with the largest sizes first.

Real-World Example:

This query can be used to find the teams with the largest number of members in a company or organization. It can help with:

  • Identifying teams that need more resources or support.

  • Optimizing team structure for efficiency and collaboration.

  • Tracking team growth over time.

Code Implementation:

-- Create the `team_members` table
CREATE TABLE team_members (
  member_id INT NOT NULL,
  team_id INT NOT NULL
);

-- Insert data into the `team_members` table
INSERT INTO team_members (member_id, team_id) VALUES
(1, 1),
(2, 1),
(3, 2),
(4, 2),
(5, 3),
(6, 3),
(7, 4);

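-- Execute the query to find the team sizes
-- (the CTE must be part of the same statement, because it is not stored)
WITH TeamSizes AS (
  SELECT team_id, COUNT(*) AS team_size
  FROM team_members
  GROUP BY team_id
)
SELECT team_id, team_size
FROM TeamSizes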

Output:

team_id | team_size
------- | --------
2       | 2
3       | 2
1       | 2
4       | 1


Capital Gain/Loss

Problem:

Capital Gain/Loss

Given a table Transactions with the following columns:

  • transaction_id (primary key)

  • stock_symbol

  • quantity

  • price_per_share

  • transaction_date

Calculate the capital gain or loss for each transaction.

Solution:

SELECT
  transaction_id,
  stock_symbol,
  quantity,
  price_per_share,
  transaction_date,
  (
    (price_per_share - previous_price_per_share) * quantity
  ) AS capital_gain_loss
FROM (
  SELECT
    t.transaction_id,
    t.stock_symbol,
    t.quantity,
    t.price_per_share,
    t.transaction_date,
    (
      SELECT
        price_per_share
      FROM Transactions
      WHERE
        stock_symbol = t.stock_symbol AND transaction_date < t.transaction_date
      ORDER BY
        transaction_date DESC
      LIMIT 1
    ) AS previous_price_per_share
  FROM Transactions AS t
) AS derived_table;

Explanation:

  • The outer query selects all columns from the subquery, including the calculated capital_gain_loss column.

  • The subquery calculates the previous_price_per_share for each transaction using a correlated subquery. This subquery finds the most recent transaction for the same stock symbol before the current transaction date and returns its price per share.

  • The capital_gain_loss column is then calculated by multiplying the quantity by the difference between the current price per share and the previous price per share.
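The correlated subquery can be exercised on the four transactions used in the example table of this section. This is a minimal sketch with Python's built-in sqlite3 module; the query is condensed into a single SELECT, which is a simplification of the nested form above rather than the original layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (
  transaction_id INTEGER PRIMARY KEY, stock_symbol TEXT,
  quantity INTEGER, price_per_share REAL, transaction_date TEXT);
INSERT INTO Transactions VALUES
  (1, 'AAPL', 100, 100.0, '2023-01-01'),
  (2, 'AAPL',  50, 120.0, '2023-01-02'),
  (3, 'MSFT', 200,  50.0, '2023-01-03'),
  (4, 'MSFT', 150,  45.0, '2023-01-04');
""")

# For each row, look up the most recent earlier price of the same stock
gains = conn.execute("""
    SELECT t.transaction_id,
           (t.price_per_share - (
              SELECT p.price_per_share
              FROM Transactions AS p
              WHERE p.stock_symbol = t.stock_symbol
                AND p.transaction_date < t.transaction_date
              ORDER BY p.transaction_date DESC
              LIMIT 1
           )) * t.quantity AS capital_gain_loss
    FROM Transactions AS t
    ORDER BY t.transaction_id
""").fetchall()
print(gains)  # [(1, None), (2, 1000.0), (3, None), (4, -750.0)]
```

The first transaction of each stock has no earlier price, so the subquery returns NULL and the whole expression is NULL (None in Python).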

Example:

| transaction_id | stock_symbol | quantity | price_per_share | transaction_date | capital_gain_loss |
|---|---|---|---|---|---|
| 1 | AAPL | 100 | 100.00 | 2023-01-01 | NULL |
| 2 | AAPL | 50 | 120.00 | 2023-01-02 | 1000.00 |
| 3 | MSFT | 200 | 50.00 | 2023-01-03 | NULL |
| 4 | MSFT | 150 | 45.00 | 2023-01-04 | -750.00 |

Explanation:

  • Transaction 1 does not have a previous transaction, so its capital gain/loss is NULL.

  • Transaction 2 has a previous transaction (Transaction 1) with a price per share of 100.00. The capital gain/loss is (120.00 - 100.00) * 50 = 1000.00.

  • Transaction 3 does not have a previous transaction, so its capital gain/loss is NULL.

  • Transaction 4 has a previous transaction (Transaction 3) with a price per share of 50.00. The capital gain/loss is (45.00 - 50.00) * 150 = -750.00.

Potential Applications:

Calculating capital gains and losses is important for tax reporting purposes. This query can be used to create a report that shows the capital gains and losses for a given period of time. This information can then be used to determine the amount of taxes that need to be paid.


Market Analysis III

LeetCode Problem: Market Analysis III

SQL Query:

SELECT Market, SUM(Revenue) AS TotalRevenue
FROM RevenueTable
WHERE Market IN (
    SELECT Market
    FROM RevenueTable
    GROUP BY Market
    HAVING SUM(Revenue) > (
        SELECT SUM(Revenue)
        FROM RevenueTable
        WHERE Market = 'US'
    )
)
GROUP BY Market
ORDER BY TotalRevenue DESC;

Explanation:

1. Subquery:

  • The subquery (SELECT Market FROM RevenueTable GROUP BY Market HAVING SUM(Revenue) > (SELECT SUM(Revenue) FROM RevenueTable WHERE Market = 'US')) identifies all markets with total revenue greater than the total revenue in the US market.

2. Main Query:

  • The main query selects the markets from the subquery and then groups them by market to calculate the total revenue for each market.

  • The results are sorted in descending order of total revenue.
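The benchmark comparison can be sketched with Python's built-in sqlite3 module. The helper markets_above and its benchmark parameter are assumptions added for illustration, and the query is a shortened equivalent that applies HAVING directly instead of going through the IN subquery. The revenue figures are those of the example table in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RevenueTable (Market TEXT, Revenue INTEGER);
INSERT INTO RevenueTable VALUES
  ('US', 100000), ('UK', 75000), ('Canada', 60000),
  ('France', 55000), ('Germany', 70000);
""")

def markets_above(benchmark):
    """Markets whose total revenue exceeds the benchmark market's total."""
    return conn.execute("""
        SELECT Market, SUM(Revenue) AS TotalRevenue
        FROM RevenueTable
        GROUP BY Market
        HAVING SUM(Revenue) > (SELECT SUM(Revenue)
                               FROM RevenueTable
                               WHERE Market = ?)
        ORDER BY TotalRevenue DESC
    """, (benchmark,)).fetchall()

print(markets_above('US'))      # []
print(markets_above('Canada'))  # [('US', 100000), ('UK', 75000), ('Germany', 70000)]
```

No market exceeds the US total in this data, so the US benchmark yields an empty result; a smaller benchmark such as Canada shows the filter and the ordering.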

Example:

Consider the following RevenueTable:

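| Market | Revenue |
|---|---|
| US | 100,000 |
| UK | 75,000 |
| Canada | 60,000 |
| France | 55,000 |
| Germany | 70,000 |

Results:

No market has a total revenue above the US total of 100,000, so the query returns no rows for this data. If the US total were 65,000 instead, the result would be:

| Market | TotalRevenue |
|---|---|
| UK | 75,000 |
| Germany | 70,000 |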

Real-World Applications:

  • Market Analysis: Identifying markets with high revenue potential and comparing them to a benchmark market (e.g., US market).

  • Sales Optimization: Prioritizing sales efforts on markets with the highest revenue growth.

  • Competitive Analysis: Monitoring market share and identifying competitive threats from other markets.


Monthly Transactions I

Step 1: Understanding the Question

Question: Find the total number of transactions for each month.

Breakdown:

  • Transaction: An activity involving the exchange of money or goods and services for a particular sum of money.

  • Month: A period of about 30 or 31 days.

To answer the question, you need to count the number of transactions for each month.

Step 2: SQL Solution

SELECT
  strftime('%Y-%m', Date) AS Month,  -- Extract the year and month from the Date column
  COUNT(*) AS TotalTransactions     -- Count the number of rows for each unique Month
FROM
  Transactions                      -- The table containing the transaction records
GROUP BY
  Month                            -- Group the transactions by Month
ORDER BY
  Month;                          -- Order the results by Month

Breakdown:

  • The strftime() function (SQLite syntax) extracts the year and month from the Date column, formatting it as 'YYYY-MM'. In MySQL, DATE_FORMAT(Date, '%Y-%m') produces the same result.

  • The COUNT(*) function counts the number of rows for each unique Month.

  • The GROUP BY clause groups the transactions by Month to count the transactions separately for each month.

  • The ORDER BY clause orders the results by Month in ascending order.

Simplified Explanation:

Imagine a table with a list of transactions, each having a Date column. To get the total transactions for each month, we first extract the year and month from the Date column. Then, we count the number of transactions for each unique year-month combination. Finally, we sort the results by month to make them easy to read.
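The grouping by year and month can be run as is in SQLite. Below is a minimal sketch using Python's built-in sqlite3 module; the three transaction dates are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Transactions (Date TEXT);
INSERT INTO Transactions VALUES ('2023-01-05'), ('2023-01-20'), ('2023-02-03');
""")

# strftime('%Y-%m', ...) turns '2023-01-05' into '2023-01'
monthly_counts = conn.execute("""
    SELECT strftime('%Y-%m', Date) AS Month, COUNT(*) AS TotalTransactions
    FROM Transactions
    GROUP BY Month
    ORDER BY Month
""").fetchall()
print(monthly_counts)  # [('2023-01', 2), ('2023-02', 1)]
```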

Step 3: Real-World Applications

This query can be used in many real-world applications, such as:

  • Financial analysis: Tracking the number of transactions per month to monitor business trends and identify any seasonal patterns.

  • Customer relationship management: Identifying months with the highest transaction volumes to target customers with personalized offers.

  • Fraud detection: Analyzing transaction patterns to detect suspicious activity or identify potential fraud.


Find Cumulative Salary of an Employee

Problem Statement:

Given an employee table with columns (emp_id, salary, start_date, and end_date), find the cumulative salary of each employee for the specified date range.

Optimal Solution:

SELECT emp_id,
       SUM(salary) OVER (PARTITION BY emp_id ORDER BY start_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_salary
FROM employee
WHERE start_date <= @end_date AND end_date >= @start_date;

Explanation:

  1. Window Function: The OVER clause defines, for each row, a window of rows belonging to the same employee, running from that employee's first row that passes the WHERE filter up to the current row.

  2. SUM Aggregate Function: Within each window, the SUM aggregate function calculates the cumulative salary by summing up the salary values.

  3. PARTITION BY Clause: The PARTITION BY clause groups the rows by employee ID, ensuring that the cumulative salary is calculated separately for each employee.

  4. ORDER BY Clause: The ORDER BY clause sorts the rows by start date in ascending order. This order is necessary for the window function to correctly calculate the cumulative salary.

  5. ROWS BETWEEN Syntax: The ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW specifies that the window extends back to the first row of the partition (the employee's earliest salary record within the date range) and ends at the current row. The WHERE clause is applied before the window function, so records outside the date range are not part of the running total.
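The running total can be reproduced with Python's built-in sqlite3 module (SQLite 3.25 or later for window functions). This is a minimal sketch: the salary rows are hypothetical, and named parameters stand in for the @start_date and @end_date variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER, salary INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO employee VALUES
  (1, 1000, '2023-01-01', '2023-01-31'),
  (1, 1200, '2023-02-01', '2023-02-28'),
  (1, 1500, '2023-03-01', '2023-03-31'),
  (2,  900, '2023-02-01', '2023-02-28');
""")

# Rows that do not overlap the date range are filtered out before the window runs
cumulative = conn.execute("""
    SELECT emp_id, start_date,
           SUM(salary) OVER (PARTITION BY emp_id ORDER BY start_date
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
             AS cumulative_salary
    FROM employee
    WHERE start_date <= :end_date AND end_date >= :start_date
    ORDER BY emp_id, start_date
""", {"start_date": "2023-02-01", "end_date": "2023-03-31"}).fetchall()
print(cumulative)
# [(1, '2023-02-01', 1200), (1, '2023-03-01', 2700), (2, '2023-02-01', 900)]
```

Employee 1's January record lies outside the range, so the running total starts at the February salary.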

Real-World Applications:

This query can be used in HR or payroll systems to generate reports on employee salaries, bonuses, or other compensation-related information. For example, it can provide insights into an employee's salary progression over time or help determine total compensation for a given period.


All the Matches of the League

Problem: Find all the matches of a league.

Solution:

SELECT
  *
FROM MATCHES
WHERE
  LEAGUE_ID = ?;

Explanation:

The query is straightforward. It selects all the columns from the MATCHES table where the LEAGUE_ID is equal to the provided parameter.

Real World Applications:

This query can be used to find all the matches of a particular league, such as the Premier League or the Bundesliga. The same pattern, filtering on a different column, can find all the matches played in a particular stadium or city, provided the MATCHES table stores that information.

Example:

The following query finds all the matches that have been played in the Premier League, assuming the Premier League has LEAGUE_ID 1:

SELECT
  *
FROM MATCHES
WHERE
  LEAGUE_ID = 1;


Sort the Olympic Table

Problem Statement:

You are given an Olympic table with columns:

  • Country (string): Name of the country

  • Gold (integer): Number of gold medals won

  • Silver (integer): Number of silver medals won

  • Bronze (integer): Number of bronze medals won

  • Total (integer): Total number of medals won (Gold + Silver + Bronze)

Sort the Olympic table to show countries with the most medals (Total) first. If two or more countries have the same number of medals, sort them by the most gold medals won, then by the most silver medals won, and finally by the most bronze medals won.

Best and Performant SQL Solution:

SELECT Country,
       Gold,
       Silver,
       Bronze,
       Total
FROM OlympicTable
ORDER BY Total DESC,
         Gold DESC,
         Silver DESC,
         Bronze DESC;

Breakdown and Explanation:

The query uses the ORDER BY clause to sort the results by multiple columns:

  1. Total DESC: Sort the countries in descending order by the total number of medals won.

  2. Gold DESC: If two or more countries have the same total medals, sort them in descending order by the number of gold medals won.

  3. Silver DESC: If two or more countries have the same total and the same number of gold medals, sort them in descending order by the number of silver medals won.

  4. Bronze DESC: If two or more countries also have the same number of silver medals, sort them in descending order by the number of bronze medals won.
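The tie-breaking order can be observed on a small table. This is a minimal sketch using Python's built-in sqlite3 module; the four countries and their medal counts are invented so that three of them share the same total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OlympicTable (Country TEXT, Gold INTEGER, Silver INTEGER,
                           Bronze INTEGER, Total INTEGER);
INSERT INTO OlympicTable VALUES
  ('A', 10,  5, 5, 20),
  ('B',  8,  7, 5, 20),
  ('C', 10,  6, 4, 20),
  ('D', 12, 10, 3, 25);
""")

# Each later sort key only matters when all earlier keys are tied
ranking = [row[0] for row in conn.execute("""
    SELECT Country
    FROM OlympicTable
    ORDER BY Total DESC, Gold DESC, Silver DESC, Bronze DESC
""")]
print(ranking)  # ['D', 'C', 'A', 'B']
```

D leads on total medals. A, B, and C tie on total, so gold puts B last, and silver then places C ahead of A.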

Real-World Applications:

This query can be used to display the rankings of countries in the Olympics or any other sporting event where medals are awarded. It can also be used for data analysis to identify the countries with the most successful Olympic programs or the most medals won in a particular sport.


All the Pairs With the Maximum Number of Common Followers

Problem Description:

You are given a table followers in which each row records that the account follower_id follows the account followee_id. You need to find all the pairs of accounts that have the maximum number of common followers.

Example:

Consider the following table:

followers (follower_id, followee_id)
+------------+-------------+
| follower_id | followee_id |
+------------+-------------+
| 1           | 2           |
| 1           | 3           |
| 1           | 4           |
| 2           | 3           |
| 2           | 4           |
| 3           | 4           |
+------------+-------------+

In this table, the pair (3, 4) has the maximum number of common followers (2): accounts 3 and 4 are both followed by accounts 1 and 2.

Solution:

The solution to this problem involves the following steps:

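  1. Count the common followers for every pair of accounts. Joining the followers table to itself on follower_id pairs up the accounts that share a follower, and the condition a.followee_id < b.followee_id keeps each pair only once:

    SELECT a.followee_id AS user1_id, b.followee_id AS user2_id, COUNT(*) AS common_count
    FROM followers AS a
    JOIN followers AS b
      ON a.follower_id = b.follower_id AND a.followee_id < b.followee_id
    GROUP BY a.followee_id, b.followee_id;

    This query returns a table with the following columns:

    • user1_id: The ID of the first account in the pair.

    • user2_id: The ID of the second account in the pair.

    • common_count: The number of followers the two accounts have in common.

  2. Find the maximum number of common followers. With the result of step 1 available as CommonFollowers (for example, as a CTE), this can be done using the following query:

    SELECT MAX(common_count) AS max_common_count
    FROM CommonFollowers;

    This query returns a single row with one column called max_common_count that contains the largest number of common followers shared by any pair.

  3. Find all the pairs of accounts that have the maximum number of common followers. Combining the two previous steps gives the complete query:

    WITH CommonFollowers AS (
        SELECT a.followee_id AS user1_id, b.followee_id AS user2_id, COUNT(*) AS common_count
        FROM followers AS a
        JOIN followers AS b
          ON a.follower_id = b.follower_id AND a.followee_id < b.followee_id
        GROUP BY a.followee_id, b.followee_id
    )
    SELECT user1_id, user2_id
    FROM CommonFollowers
    WHERE common_count = (SELECT MAX(common_count) FROM CommonFollowers);

    This query returns a table with the following columns:

    • user1_id: The ID of the first account in the pair.

    • user2_id: The ID of the second account in the pair.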
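Counting common followers comes down to a self-join on follower_id. The sketch below uses Python's built-in sqlite3 module and the six rows of the example table from this section. The column names user1_id, user2_id, and common_count are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE followers (follower_id INTEGER, followee_id INTEGER);
INSERT INTO followers VALUES (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4);
""")

# Pair up accounts that share a follower, count per pair, keep the maximum
best_pairs = conn.execute("""
    WITH CommonFollowers AS (
      SELECT a.followee_id AS user1_id,
             b.followee_id AS user2_id,
             COUNT(*) AS common_count
      FROM followers AS a
      JOIN followers AS b
        ON a.follower_id = b.follower_id
       AND a.followee_id < b.followee_id
      GROUP BY a.followee_id, b.followee_id
    )
    SELECT user1_id, user2_id
    FROM CommonFollowers
    WHERE common_count = (SELECT MAX(common_count) FROM CommonFollowers)
    ORDER BY user1_id, user2_id
""").fetchall()
print(best_pairs)  # [(3, 4)]
```

Accounts 3 and 4 are both followed by accounts 1 and 2, which gives them two common followers; every other pair shares at most one.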

Real-World Application:

This problem can be used to find the most influential users on a social media platform. By finding the pairs of users that have the maximum number of common followers, we can identify the users who are most likely to reach a large audience. This information can be used to target marketing campaigns and to develop strategies for growing the user base.


Investments in 2016

Problem:

Given a table of investments made in 2016, find the total amount invested in each country.

Table:

CREATE TABLE investments (
  id INT AUTO_INCREMENT PRIMARY KEY,
  country VARCHAR(255),
  amount INT
);
INSERT INTO investments (country, amount) VALUES
('USA', 1000),
('UK', 500),
('France', 700),
('Germany', 300),
('Spain', 200);

Best & Performant Solution:

SELECT country, SUM(amount) AS total_investment
FROM investments
GROUP BY country;

Explanation:

  • SELECT country, SUM(amount) AS total_investment: This part of the query selects the country column and calculates the sum of the amount column for each unique value in the country column. The result is a new column named total_investment which contains the total investment amount for each country.

  • FROM investments: This specifies the table from which the data should be retrieved.

  • GROUP BY country: This part of the query groups the results by the country column. This means that for each unique value in the country column, the SUM(amount) will be calculated separately.

Real-World Example:

This query can be useful for analyzing investment data and identifying the countries with the highest investment levels. This information can be used to make investment decisions or to understand investment trends.

Potential Applications:

  • Investment analysis

  • Financial planning

  • Economic development


Fix Names in a Table

Problem: There is a table called "Names" with the following schema:

CREATE TABLE Names (
  id INT NOT NULL AUTO_INCREMENT,
  first_name VARCHAR(255) NOT NULL,
  last_name VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
);

Some of the first names and last names in the table are misspelled. Fix these misspellings by updating the table.

Solution:

-- Update first names
UPDATE Names
SET first_name = CASE
  WHEN first_name = 'Johen' THEN 'Johan'
  WHEN first_name = 'Kris' THEN 'Chris'
  WHEN first_name = 'Mihal' THEN 'Michael'
  ELSE first_name
END;

-- Update last names
UPDATE Names
SET last_name = CASE
  WHEN last_name = 'Smuth' THEN 'Smith'
  WHEN last_name = 'Jonhson' THEN 'Johnson'
  WHEN last_name = 'Davids' THEN 'Davis'
  ELSE last_name
END;

Explanation:

The first UPDATE statement uses a CASE expression to check for misspelled first names. If a first name matches one of the misspellings, it is updated to the correct spelling. Otherwise, the first name is left unchanged.

The second UPDATE statement does the same for last names.
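The CASE-based correction can be tried on a throwaway table. This is a minimal sketch using Python's built-in sqlite3 module with invented rows. It adds a WHERE clause, which is not part of the statements above, so that only rows containing a known misspelling are rewritten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Names (id INTEGER PRIMARY KEY,
                    first_name TEXT NOT NULL, last_name TEXT NOT NULL);
INSERT INTO Names (first_name, last_name) VALUES
  ('Johen', 'Smuth'), ('Kris', 'Davis'), ('Anna', 'Jonhson');
""")

# Correct the first names; the WHERE clause limits the update to affected rows
cursor = conn.execute("""
    UPDATE Names
    SET first_name = CASE first_name
      WHEN 'Johen' THEN 'Johan'
      WHEN 'Kris'  THEN 'Chris'
      ELSE first_name
    END
    WHERE first_name IN ('Johen', 'Kris')
""")
updated_rows = cursor.rowcount
names = conn.execute("SELECT first_name, last_name FROM Names ORDER BY id").fetchall()
print(updated_rows, names)
# 2 [('Johan', 'Smuth'), ('Chris', 'Davis'), ('Anna', 'Jonhson')]
```

Last names are left untouched here; they would be corrected by a second statement of the same shape.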

Real World Application:

This technique can be used to fix a known set of misspellings in any text column. It is particularly useful when a table contains too many records to correct by hand.


League Statistics

Problem Statement:

Given a table LeagueStatistics with columns:

  • rank: Rank of the team (1 for the top team)

  • team_name: Unique name of the team

  • score: Score of the team in the season

You need to find the average score of the top 3 teams in the league.

Solution:

-- Find the average score of the top 3 teams in the league
-- (RANK is a reserved word in MySQL 8.0 and later; quote the column as `rank` there)
SELECT AVG(score)
FROM LeagueStatistics
WHERE rank <= 3;

Explanation:

  1. We first filter the LeagueStatistics table to select only the top 3 teams by using the condition rank <= 3.

  2. Then, we calculate the average score of the selected teams using the AVG() function.

Example:

-- Input
CREATE TABLE LeagueStatistics (
    rank INT,
    team_name VARCHAR(255),
    score INT
);
INSERT INTO LeagueStatistics (rank, team_name, score) VALUES
(1, 'Liverpool', 95),
(2, 'Manchester City', 92),
(3, 'Chelsea', 88),
(4, 'Tottenham', 78),
(5, 'Arsenal', 75);

-- Query
SELECT AVG(score)
FROM LeagueStatistics
WHERE rank <= 3;

-- Output
91.66666666666667

Real-World Applications:

This query can be useful for:

  • Determining the average performance of the top teams in a sports league

  • Analyzing team performance and identifying areas for improvement

  • Comparing the strength of different leagues or divisions

  • Providing insights for sports analysts and fans


Last Person to Fit in the Bus

Problem Statement:

Given a table trips of bus stops with columns stop_id, stop_sequence, and trip_id, find the stop at which the last person boarded the bus on each trip, that is, the stop with the highest stop_sequence.

Example:

| stop_id | stop_sequence | trip_id |
|---------|---------------|---------|
| 1       | 1             | 1        |
| 2       | 2             | 1        |
| 3       | 3             | 1        |
| 4       | 4             | 1        |
| 5       | 1             | 2        |
| 6       | 2             | 2        |
| 7       | 3             | 2        |

For trip_id 1, the last person boarded at stop_id 4. For trip_id 2, the last person boarded at stop_id 7.

Best & Performant SQL Solution:

WITH LastBoarding AS (
    -- Highest stop_sequence of each trip
    SELECT trip_id, MAX(stop_sequence) AS last_sequence
    FROM trips
    GROUP BY trip_id
)
SELECT t.trip_id, t.stop_id
FROM trips AS t
JOIN LastBoarding AS lb
  ON lb.trip_id = t.trip_id
 AND lb.last_sequence = t.stop_sequence;

Explanation:

  1. Create a CTE (LastBoarding): This CTE groups the rows by trip_id only and, for each trip, calculates the maximum stop_sequence, which represents the last boarding stop. Grouping by stop_id as well would put every row in its own group and return every stop.

  2. Join the trips Table: The main query joins the trips table to LastBoarding on trip_id, so each stop is compared with the last sequence number of its own trip.

  3. Find Last Boarding Stop: The second join condition keeps only the rows whose stop_sequence equals last_sequence. This identifies the row representing the last boarding stop for each trip.
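The last stop of each trip can be checked against the example table from this section. This is a minimal sketch using Python's built-in sqlite3 module; it groups by trip_id alone and joins the maximum back to the trips table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (stop_id INTEGER, stop_sequence INTEGER, trip_id INTEGER);
INSERT INTO trips VALUES
  (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 4, 1),
  (5, 1, 2), (6, 2, 2), (7, 3, 2);
""")

# One MAX(stop_sequence) per trip, joined back to find the matching stop
last_stops = conn.execute("""
    WITH LastBoarding AS (
      SELECT trip_id, MAX(stop_sequence) AS last_sequence
      FROM trips
      GROUP BY trip_id
    )
    SELECT t.trip_id, t.stop_id
    FROM trips AS t
    JOIN LastBoarding AS lb
      ON lb.trip_id = t.trip_id AND lb.last_sequence = t.stop_sequence
    ORDER BY t.trip_id
""").fetchall()
print(last_stops)  # [(1, 4), (2, 7)]
```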

Real-World Applications:

This query can be useful in the following scenarios:

  • Tracking Passenger Flow: Determining the last stop where passengers boarded a bus helps transportation authorities analyze passenger flow patterns and optimize bus routes.

  • Security Monitoring: Identifying the last boarding stop can assist in security investigations and tracing the movement of individuals.

  • Passenger Assistance: It can help bus drivers identify the stops where special assistance is needed, such as wheelchair accessibility or language translation.


Number of Comments per Post

Leet Code Problem:

Number of Comments per Post

Problem Statement:

Given a table called Posts with columns id, title, and num_comments, and a table called Comments with columns id, post_id, and content, find the number of comments for each post in the Posts table.

SQL Solution:

SELECT
  p.id AS post_id,
  COUNT(c.id) AS num_comments
FROM Posts AS p
LEFT JOIN Comments AS c
  ON p.id = c.post_id
GROUP BY
  p.id;

Breakdown and Explanation:

  • SELECT Column List: The SELECT clause specifies the columns we want to include in the result:

    • p.id AS post_id: The ID of the post.

    • COUNT(c.id) AS num_comments: The number of comments for the post.

  • FROM Clause: The FROM clause specifies the table(s) we're querying from:

    • Posts AS p: The Posts table aliased as p.

  • JOIN Clause: The LEFT JOIN clause joins the Posts table with the Comments table on the post_id column:

    • LEFT JOIN Comments AS c ON p.id = c.post_id: This ensures that we include all posts, even if they don't have any comments.

  • GROUP BY Clause: The GROUP BY clause groups the results by the post_id column:

    • GROUP BY p.id: This ensures that the number of comments is calculated for each unique post.
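The effect of counting c.id after a LEFT JOIN shows most clearly on a post without comments. Below is a minimal sketch using Python's built-in sqlite3 module with invented posts and comments; the Posts table is reduced to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Comments (id INTEGER PRIMARY KEY, post_id INTEGER, content TEXT);
INSERT INTO Posts VALUES (1, 'First'), (2, 'Second'), (3, 'No comments yet');
INSERT INTO Comments VALUES (1, 1, 'Nice'), (2, 1, 'Thanks'), (3, 2, 'Agreed');
""")

# COUNT(c.id) ignores the NULL produced for posts without comments
comment_counts = conn.execute("""
    SELECT p.id AS post_id, COUNT(c.id) AS num_comments
    FROM Posts AS p
    LEFT JOIN Comments AS c ON p.id = c.post_id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()
print(comment_counts)  # [(1, 2), (2, 1), (3, 0)]
```

Using COUNT(*) instead would report 1 for post 3, because the LEFT JOIN still produces one row for it.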

Real-World Application:

This query could be used in a social media or blog application to display the number of comments for each post. This information can be useful for users to quickly see how engaged a post is.


Customers With Strictly Increasing Purchases

Problem Statement: Find the customers who have made a strictly increasing sequence of purchases. A strictly increasing sequence means that each purchase amount is strictly greater than the previous purchase amount.

Optimal Solution in SQL:

WITH CustomerPurchases AS (
  SELECT
    customer_id,
    purchase_amount,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date) AS purchase_rank
  FROM Purchases
), CustomerPurchaseLag AS (
  SELECT
    customer_id,
    purchase_amount,
    purchase_rank,
    LAG(purchase_amount) OVER (PARTITION BY customer_id ORDER BY purchase_rank) AS previous_purchase_amount
  FROM CustomerPurchases
)
SELECT
  customer_id
FROM CustomerPurchaseLag
GROUP BY
  customer_id
HAVING
  -- No purchase may be less than or equal to the one before it
  SUM(CASE WHEN purchase_amount <= previous_purchase_amount THEN 1 ELSE 0 END) = 0;

Explanation:

  1. CustomerPurchases: This CTE (Common Table Expression) numbers each customer's purchases in order of purchase date. The ROW_NUMBER() function assigns a unique position to each purchase within the customer's purchase history.

  2. CustomerPurchaseLag: This CTE adds a column called previous_purchase_amount, which stores the amount of the customer's previous purchase. It uses the LAG() function, ordered by purchase_rank, to retrieve that amount. For a customer's first purchase the value is NULL.

  3. Final Query: The main query groups the rows by customer and keeps only the customers for whom no purchase is less than or equal to the previous one. Checking for at least one increase would not be enough, because a customer whose amounts go up and later down would also pass. The comparison with NULL for the first purchase does not count as a violation, so a customer with a single purchase also qualifies.
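A strictly increasing sequence means that no purchase is less than or equal to the one before it, which can be tested by counting violations per customer. The sketch below uses Python's built-in sqlite3 module (SQLite 3.25 or later for LAG). Customers 1 and 2 come from the example in this section; customer 3, whose amounts rise and then fall, is a hypothetical addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Purchases (customer_id INTEGER, purchase_amount INTEGER, purchase_date TEXT);
INSERT INTO Purchases VALUES
  (1, 100, '2023-01-01'), (1, 150, '2023-01-02'),
  (1, 180, '2023-01-03'), (1, 200, '2023-01-04'),
  (2, 200, '2023-02-01'), (2, 150, '2023-02-02'),
  (3, 100, '2023-03-01'), (3, 200, '2023-03-02'), (3, 150, '2023-03-03');
""")

# Count the purchases that fail to exceed their predecessor; keep customers with none
increasing = [row[0] for row in conn.execute("""
    WITH Lagged AS (
      SELECT customer_id, purchase_amount,
             LAG(purchase_amount) OVER (PARTITION BY customer_id
                                        ORDER BY purchase_date) AS previous_purchase_amount
      FROM Purchases
    )
    SELECT customer_id
    FROM Lagged
    GROUP BY customer_id
    HAVING SUM(CASE WHEN purchase_amount <= previous_purchase_amount
                    THEN 1 ELSE 0 END) = 0
    ORDER BY customer_id
""")]
print(increasing)  # [1]
```

Customer 3 has one increase followed by a decrease, so a check for "some purchase is larger than the previous one" would wrongly include them.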

Example:

Given the following Purchases table:

| customer_id | purchase_amount | purchase_date |
|-------------|-----------------|---------------|
| 1           | 100             | 2023-01-01   |
| 1           | 150             | 2023-01-02   |
| 1           | 180             | 2023-01-03   |
| 1           | 200             | 2023-01-04   |
| 2           | 200             | 2023-02-01   |
| 2           | 150             | 2023-02-02   |

The output of the query would be:

| customer_id |
|-------------|
| 1           |

Customer 1 has made a strictly increasing sequence of purchases, while Customer 2 has not (their February 2nd purchase was lower than their February 1st purchase).

Real-World Applications:

This query can be used in loyalty programs to identify customers who have shown a consistent increase in their spending. By understanding these valuable customers, businesses can tailor their marketing and reward programs to encourage them to make even more purchases.


Product Sales Analysis III

Best & Performant SQL Solution for LeetCode Product Sales Analysis III

Problem Statement:

Given a database table Sales with columns product_id, date, units_sold, and revenue, find the top 5 products that generated the highest revenue in a given month.

SQL Solution:

-- Extract the top 5 products with highest revenue in a given month
WITH MonthlyRevenue AS (
    SELECT
        product_id,
        SUM(revenue) AS total_revenue,
        EXTRACT(MONTH FROM date) AS sales_month
    FROM
        Sales
    WHERE
        EXTRACT(MONTH FROM date) = 3  -- month of interest as a number (here: March)
    GROUP BY
        product_id,
        EXTRACT(MONTH FROM date)
),
RankedProducts AS (
    SELECT
        product_id,
        total_revenue,
        -- "rank" is a reserved word in some databases, so the alias avoids it
        RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
    FROM
        MonthlyRevenue
)
SELECT
    product_id,
    total_revenue
FROM
    RankedProducts
WHERE
    revenue_rank <= 5;

Explanation:

  1. Create a common table expression (CTE) called MonthlyRevenue:

    • Calculate the total revenue for each product in the given month.

  2. Create another CTE called RankedProducts:

    • Rank the products based on their total revenue in descending order.

  3. Select the product_id and total_revenue from RankedProducts:

    • Filter for products ranked within the top 5.
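The two-step ranking can be run on the sample rows from this section. This is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later for RANK). It uses strftime() in place of EXTRACT, stores revenue as plain integers, and wraps the query in a helper named top_products, all of which are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (product_id INTEGER, date TEXT, units_sold INTEGER, revenue INTEGER);
INSERT INTO Sales VALUES
  (1, '2023-03-01', 10, 100), (2, '2023-03-05', 15, 150),
  (1, '2023-03-10',  5,  50), (3, '2023-03-15', 20, 200),
  (2, '2023-03-20', 10, 100), (4, '2023-03-25', 15, 150),
  (3, '2023-03-30', 10, 100);
""")

def top_products(month, top_n=5):
    """Products ranked by revenue for a two-digit month string such as '03'."""
    return conn.execute("""
        WITH MonthlyRevenue AS (
          SELECT product_id, SUM(revenue) AS total_revenue
          FROM Sales
          WHERE strftime('%m', date) = ?
          GROUP BY product_id
        ),
        RankedProducts AS (
          SELECT product_id, total_revenue,
                 RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
          FROM MonthlyRevenue
        )
        SELECT product_id, total_revenue
        FROM RankedProducts
        WHERE revenue_rank <= ?
        ORDER BY revenue_rank, product_id
    """, (month, top_n)).fetchall()

print(top_products('03'))     # [(3, 300), (2, 250), (1, 150), (4, 150)]
print(top_products('03', 2))  # [(3, 300), (2, 250)]
```

Because RANK() gives tied products the same rank, a limit of 5 can return more than 5 rows when revenues tie at the cutoff.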

Example Data and Output:

Sales Table:

| product_id | date | units_sold | revenue |
|---|---|---|---|
| 1 | 2023-03-01 | 10 | $100 |
| 2 | 2023-03-05 | 15 | $150 |
| 1 | 2023-03-10 | 5 | $50 |
| 3 | 2023-03-15 | 20 | $200 |
| 2 | 2023-03-20 | 10 | $100 |
| 4 | 2023-03-25 | 15 | $150 |
| 3 | 2023-03-30 | 10 | $100 |

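Result for Month '03':

| product_id | total_revenue |
|---|---|
| 3 | $300 |
| 2 | $250 |
| 1 | $150 |
| 4 | $150 |

Only four products have sales in the month, so all of them fall within the top 5. Products 1 and 4 tie at $150 and share rank 3.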
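
The sample data can be used to check the ranking end to end. The following is a minimal sketch, not a definitive implementation: it assumes Python's built-in sqlite3 module with SQLite 3.25 or later (for window functions), stores dates as ISO text, filters the month with strftime because SQLite has no EXTRACT, and uses the alias revenue_rank, which is a name chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (product_id INTEGER, date TEXT, units_sold INTEGER, revenue REAL);
INSERT INTO Sales VALUES
  (1, '2023-03-01', 10, 100), (2, '2023-03-05', 15, 150),
  (1, '2023-03-10',  5,  50), (3, '2023-03-15', 20, 200),
  (2, '2023-03-20', 10, 100), (4, '2023-03-25', 15, 150),
  (3, '2023-03-30', 10, 100);
""")

TOP_PRODUCTS_SQL = """
WITH MonthlyRevenue AS (
    SELECT product_id, SUM(revenue) AS total_revenue
    FROM Sales
    WHERE strftime('%Y-%m', date) = ?      -- SQLite stand-in for EXTRACT
    GROUP BY product_id
),
RankedProducts AS (
    SELECT product_id, total_revenue,
           RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
    FROM MonthlyRevenue
)
SELECT product_id, total_revenue, revenue_rank
FROM RankedProducts
WHERE revenue_rank <= 5
ORDER BY revenue_rank, product_id;
"""

# Each row is (product_id, total_revenue, revenue_rank) for March 2023.
top_products = conn.execute(TOP_PRODUCTS_SQL, ("2023-03",)).fetchall()
for row in top_products:
    print(row)
```

Because RANK() gives tied rows the same rank, a "top 5" filter can return more than five rows when several products tie at the cutoff.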

Potential Applications:

  • Identifying top-selling products for inventory planning and sales forecasting.

  • Analyzing product performance and identifying areas for improvement.

  • Tracking revenue trends and comparing sales performance across different products and time periods.


Ads Performance

LeetCode Problem: Ads Performance

Problem Statement:

Given an AdsPerformance table containing data about ad campaigns, calculate the total impressions, clicks, revenue, and cost for each date. The table has no ad identifier, so the totals are per day rather than per ad.

CREATE TABLE AdsPerformance (
  date DATE NOT NULL,
  impressions INT NOT NULL,
  clicks INT NOT NULL,
  revenue DECIMAL(10, 2) NOT NULL,
  cost DECIMAL(10, 2) NOT NULL
);

Best & Performant SQL Solution:

WITH AdMetrics AS (
  SELECT
    date,
    SUM(impressions) AS total_impressions,
    SUM(clicks) AS total_clicks,
    SUM(revenue) AS total_revenue,
    SUM(cost) AS total_cost
  FROM AdsPerformance
  GROUP BY date
)
SELECT
  date,
  total_impressions,
  total_clicks,
  total_revenue,
  total_cost
FROM AdMetrics
ORDER BY date;

Implementation and Explanation:

  1. Create a Common Table Expression (CTE) called AdMetrics:

    • This CTE calculates the sum of impressions, clicks, revenue, and cost for each unique date.

    • The GROUP BY date clause ensures that the results are aggregated by date.

  2. Select the columns from the AdMetrics CTE:

    • The final query selects the date, total_impressions, total_clicks, total_revenue, and total_cost columns from the AdMetrics CTE.

    • The ORDER BY date clause sorts the results by date in ascending order.

Real World Application:

Tracking Ad Campaign Performance:

This query can be used by marketing teams to track the performance of ad campaigns over time. By analyzing the daily totals of impressions, clicks, revenue, and cost, they can see which days performed best and make data-driven decisions about future ad spending.


Department Top Three Salaries

Problem Statement:

Find the top three salaries for each department in a company.

SQL Query:

SELECT department_id, salary
FROM (
    SELECT department_id, salary, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num
    FROM employee
) AS subquery
WHERE row_num <= 3
ORDER BY department_id, salary DESC;

Breakdown and Explanation:

1. Subquery:

  • The subquery returns the department ID (department_id), the salary, and a row number computed by ROW_NUMBER() within each department partition.

  • ROW_NUMBER() assigns a unique number to each row within each department, starting from 1 for the highest salary and incrementing for each subsequent row.

  • For sample data with two departments, the result of this subquery is as follows:

| department_id | salary | row_num |
|---|---|---|
| 1 | 10000 | 1 |
| 1 | 9000 | 2 |
| 1 | 8000 | 3 |
| 2 | 15000 | 1 |
| 2 | 14000 | 2 |
| 2 | 13000 | 3 |

2. WHERE Clause:

  • The WHERE clause keeps only rows where row_num is less than or equal to 3, so at most three salaries remain for each department.

  • If two employees in a department earn the same salary, ROW_NUMBER() still gives them different numbers, so a tie can push a lower salary out of the result. To return the three highest distinct salary levels instead, replace ROW_NUMBER() with DENSE_RANK().

3. ORDER BY Clause:

  • The ORDER BY clause sorts the output by department and then by salary, highest first. The rows are not aggregated: wrapping them in MAX(salary) with GROUP BY department_id would collapse each department to its single highest salary.

Result:

The final result is a table that contains the department_id and salary for the top three salaries in each department.
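
How ties affect a top-three-per-department query can be seen with a small experiment. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module (SQLite 3.25+), a made-up employee table with a name column, and a hypothetical helper top_three that swaps the ranking function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, department_id INTEGER, salary INTEGER);
INSERT INTO employee VALUES
  ('a', 1, 10000), ('b', 1, 10000), ('c', 1, 9000), ('d', 1, 8000), ('e', 1, 7000),
  ('f', 2, 15000), ('g', 2, 14000);
""")

def top_three(rank_function):
    """Return (department_id, salary) rows ranked 1-3 by the given window function."""
    sql = f"""
        SELECT department_id, salary
        FROM (
            SELECT department_id, salary,
                   {rank_function}() OVER (
                       PARTITION BY department_id ORDER BY salary DESC
                   ) AS salary_rank
            FROM employee
        ) AS ranked
        WHERE salary_rank <= 3
        ORDER BY department_id, salary DESC;
    """
    return conn.execute(sql).fetchall()

by_row_number = top_three("ROW_NUMBER")   # at most three rows per department
by_dense_rank = top_three("DENSE_RANK")   # three distinct salary levels
print(by_row_number)
print(by_dense_rank)
```

With ROW_NUMBER the two 10000 salaries take two of the three slots in department 1, so 8000 is dropped; with DENSE_RANK they share rank 1 and 8000 is kept as the third salary level.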

Real-World Applications:

  • Human Resources: Determine the highest-paid employees in each department for performance evaluations or salary negotiations.

  • Payroll: Calculate bonuses or benefits for employees based on their position within their department.

  • Management: Identify pay disparities within departments and make adjustments to ensure fairness and equity.


Get Highest Answer Rate Question

Problem Statement

Find the question with the highest answer rate in a database of questions and answers.

SQL Solution

SELECT Question
FROM Questions
WHERE AnswerRate = (SELECT MAX(AnswerRate) FROM Questions);

Breakdown

1. SELECT Question: This selects the Question column from the Questions table.

2. FROM Questions: This specifies the Questions table to select from.

3. WHERE AnswerRate = (SELECT MAX(AnswerRate) FROM Questions):

  • This subquery finds the maximum AnswerRate in the Questions table.

  • The outer query then keeps every row whose AnswerRate equals that maximum. If several questions tie for the highest rate, all of them are returned.

Example

CREATE TABLE Questions (
  Question VARCHAR(255) PRIMARY KEY,
  AnswerRate FLOAT
);

INSERT INTO Questions (Question, AnswerRate) VALUES
('Question 1', 0.5),
('Question 2', 0.7),
('Question 3', 0.9);

SELECT Question
FROM Questions
WHERE AnswerRate = (SELECT MAX(AnswerRate) FROM Questions);

Output:

Question 3

Real-World Applications

This query can be used to identify the questions that are answered most reliably in a knowledge base system, online forum, or customer support database. This information can be valuable for:

  • Improving user experience by prioritizing questions with high answer rates.

  • Identifying areas where users need more support or assistance.

  • Targeting marketing campaigns to address specific questions and topics.


Customer Placing the Largest Number of Orders

Problem Statement:

Find the customer who has placed the highest number of orders.

SQL Script:

SELECT customer_id, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
ORDER BY order_count DESC
LIMIT 1;

Explanation:

This SQL script performs the following steps:

  1. SELECT customer_id, COUNT(*) AS order_count: For each unique customer, count the number of orders they have placed. The result is a table with two columns: the customer_id and the order_count.

  2. FROM orders: The data source is the orders table, which contains information about all orders.

  3. GROUP BY customer_id: Group the rows in the table by the customer_id column. This combines all orders for each customer into a single row.

  4. ORDER BY order_count DESC: Sort the rows in descending order by the order_count. This puts the customers with the highest order counts at the top.

  5. LIMIT 1: Limit the result to only the first row. This gives us the customer with the highest order count.

Real-World Application:

This query can be used in various real-world scenarios:

  • Marketing: Identify the most valuable customers based on their order history.

  • Customer Service: Prioritize customers with the highest order counts for better support.

  • Logistics: Plan inventory and shipping based on customer demand.

  • Fraud Detection: Identify potential fraud by comparing order counts to known customer behavior.

Simplified Example:

Imagine a table of orders:

| customer_id | order_date |
|---|---|
| 1 | 2023-01-01 |
| 1 | 2023-01-05 |
| 2 | 2023-01-03 |
| 2 | 2023-01-10 |
| 3 | 2023-01-07 |

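Before LIMIT 1 is applied, the grouped and sorted counts are:

| customer_id | order_count |
|---|---|
| 1 | 2 |
| 2 | 2 |
| 3 | 1 |

Customers 1 and 2 have placed the same number of orders (2). LIMIT 1 returns only one of them, and because ORDER BY order_count DESC does not say how to order tied rows, which one is returned is up to the database. Adding a tie-breaker, such as ORDER BY order_count DESC, customer_id, makes the result deterministic.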
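
Two common ways to handle the tie can be sketched as follows. This is an illustrative example using Python's sqlite3 module and the sample orders; the tie-breaking rule (smallest customer_id wins) is an assumption made for the demonstration, not part of the problem statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, order_date TEXT);
INSERT INTO orders VALUES
  (1, '2023-01-01'), (1, '2023-01-05'),
  (2, '2023-01-03'), (2, '2023-01-10'),
  (3, '2023-01-07');
""")

# Deterministic single winner: break ties on the smallest customer_id.
single_winner = conn.execute("""
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id
    ORDER BY order_count DESC, customer_id ASC
    LIMIT 1;
""").fetchall()

# All customers tied for the highest count.
all_winners = conn.execute("""
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(*) = (
        SELECT MAX(order_count)
        FROM (SELECT COUNT(*) AS order_count FROM orders GROUP BY customer_id)
    )
    ORDER BY customer_id;
""").fetchall()

print(single_winner)
print(all_winners)
```

The first form suits cases where exactly one row is required; the second reports every customer who shares the maximum.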


The Most Recent Orders for Each Product

Problem Statement:

Given a table Orders with columns order_id, product_id, order_date, find the most recent order for each unique product.

Schema:

CREATE TABLE Orders (
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    order_date DATE NOT NULL,
    PRIMARY KEY (order_id)
);

Solution:

SELECT
    product_id,
    MAX(order_date) AS latest_order_date
FROM
    Orders
GROUP BY
    product_id;

Breakdown:

  1. SELECT product_id, MAX(order_date) AS latest_order_date: This selects the product ID and the maximum order date for each product.

  2. FROM Orders: This specifies the table to query from.

  3. GROUP BY product_id: This groups the results by product ID, so that we get the maximum order date for each unique product.

Simplified Explanation:

For each unique product, we find the order with the latest order date and show the product ID and the latest order date.

Real World Application:

This query can be used in a variety of real-world scenarios, such as:

  • Identifying the most recent orders for a given product to track inventory levels.

  • Finding the latest order date per customer by grouping on a customer column instead (the Orders table above does not have one).

  • Monitoring order trends and patterns for specific products over time.

Example:

INSERT INTO Orders (order_id, product_id, order_date) VALUES
(1, 1, '2023-01-01'),
(2, 2, '2023-01-02'),
(3, 1, '2023-01-03'),
(4, 2, '2023-01-04'),
(5, 3, '2023-01-05');

SELECT
    product_id,
    MAX(order_date) AS latest_order_date
FROM
    Orders
GROUP BY
    product_id;

Output:

product_id  latest_order_date
1           2023-01-03
2           2023-01-04
3           2023-01-05
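
The GROUP BY form returns the latest date but not the order it belongs to. When the whole order row is needed, one option is to number each product's orders by recency and keep the first. The sketch below assumes Python's sqlite3 module (SQLite 3.25+); the alias recency and the tie-break on order_id are choices made here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (
    order_id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    order_date TEXT NOT NULL
);
INSERT INTO Orders VALUES
  (1, 1, '2023-01-01'), (2, 2, '2023-01-02'), (3, 1, '2023-01-03'),
  (4, 2, '2023-01-04'), (5, 3, '2023-01-05');
""")

latest_orders = conn.execute("""
    SELECT order_id, product_id, order_date
    FROM (
        SELECT o.*,
               ROW_NUMBER() OVER (
                   PARTITION BY product_id
                   ORDER BY order_date DESC, order_id DESC   -- newest first
               ) AS recency
        FROM Orders AS o
    ) AS ranked
    WHERE recency = 1
    ORDER BY product_id;
""").fetchall()

print(latest_orders)
```

Each returned row carries the order_id of the most recent order, which the MAX(order_date) query alone cannot provide.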

Top Three Wineries

1. Select the Top Three Wineries

Problem Statement:

Given a table of wine reviews, identify the top three wineries with the highest average ratings.

Query:

SELECT winery, AVG(rating) AS avg_rating
FROM reviews
GROUP BY winery
ORDER BY avg_rating DESC
LIMIT 3;

Explanation:

  1. The SELECT statement retrieves the winery column and calculates the average rating (AVG(rating)) for each unique winery.

  2. The FROM clause references the reviews table, which contains the wine review data.

  3. The GROUP BY clause groups the data by winery to calculate the average rating for each winery.

  4. The ORDER BY clause sorts the results by the average rating in descending order.

  5. The LIMIT 3 clause limits the output to the top three wineries with the highest average ratings.

2. Using a Common Table Expression (CTE)

Problem Statement:

Identify the top three wineries with the highest average ratings, but also include the number of reviews for each winery.

Query:

WITH WineryReviews AS (
  SELECT winery, AVG(rating) AS avg_rating, COUNT(*) AS review_count
  FROM reviews
  GROUP BY winery
)
SELECT winery, avg_rating, review_count
FROM WineryReviews
ORDER BY avg_rating DESC
LIMIT 3;

Explanation:

  1. The WITH statement creates a Common Table Expression (CTE) named WineryReviews, which calculates the average rating and review count for each winery.

  2. The SELECT statement inside the CTE retrieves the winery, AVG(rating), and COUNT(*) for each unique winery.

  3. The GROUP BY clause groups the data by winery to calculate these values.

  4. The outer SELECT statement selects the winery, avg_rating, and review_count columns from the WineryReviews CTE.

  5. The ORDER BY clause sorts the results by the average rating in descending order.

  6. The LIMIT 3 clause limits the output to the top three wineries with the highest average ratings.

3. Using a Subquery

Problem Statement:

Find the top three wineries with the highest average ratings, but only include wineries that have at least 10 reviews.

Query:

SELECT winery, AVG(rating) AS avg_rating
FROM reviews
WHERE winery IN (
  SELECT winery
  FROM reviews
  GROUP BY winery
  HAVING COUNT(*) >= 10
)
GROUP BY winery
ORDER BY avg_rating DESC
LIMIT 3;

Explanation:

  1. The inner subquery calculates the number of reviews for each unique winery and selects wineries that have at least 10 reviews.

  2. The WHERE clause in the main query checks if the winery is present in the subquery, ensuring that only wineries with at least 10 reviews are included.

  3. The GROUP BY clause groups the data by winery to calculate the average rating.

  4. The ORDER BY clause sorts the results by the average rating in descending order.

  5. The LIMIT 3 clause limits the output to the top three wineries with the highest average ratings.
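
The review-count filter can also be written with HAVING on the main query, which avoids scanning the table a second time. The following sketch compares both forms on made-up data (the winery names and ratings are hypothetical) using Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (winery TEXT, rating REAL)")
sample = (
    [("Alpha", 90.0)] * 10      # 10 reviews, average 90
    + [("Beta", 95.0)] * 3      # only 3 reviews: filtered out
    + [("Gamma", 85.0)] * 12    # 12 reviews, average 85
)
conn.executemany("INSERT INTO reviews VALUES (?, ?)", sample)

with_subquery = conn.execute("""
    SELECT winery, AVG(rating) AS avg_rating
    FROM reviews
    WHERE winery IN (
        SELECT winery FROM reviews GROUP BY winery HAVING COUNT(*) >= 10
    )
    GROUP BY winery
    ORDER BY avg_rating DESC
    LIMIT 3;
""").fetchall()

with_having = conn.execute("""
    SELECT winery, AVG(rating) AS avg_rating
    FROM reviews
    GROUP BY winery
    HAVING COUNT(*) >= 10
    ORDER BY avg_rating DESC
    LIMIT 3;
""").fetchall()

print(with_subquery)
print(with_having)
```

Both queries return the same rows here; Beta has the highest average but is excluded because it has only three reviews.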

Real-World Applications:

These queries can be used in various real-world scenarios, such as:

  • Identifying wineries to feature in a restaurant's wine list based on their ratings and popularity.

  • Marketing campaigns to target wineries with high average ratings and many reviews.

  • Identifying potential investment opportunities in the wine industry based on winery performance.


Viewers Turned Streamers

LeetCode Problem: Viewers Turned Streamers

SQL Query:

SELECT *
FROM Streamers AS s
WHERE s.streamer_id IN (
    SELECT viewer_id
    FROM Viewers
);

Breakdown and Explanation:

This solution assumes that viewer_id and streamer_id come from the same pool of user IDs, so a user who watched and later streamed appears in both tables under the same ID.

1. Subquery to Identify Viewers:

SELECT viewer_id
FROM Viewers;

  • This subquery returns the IDs of all users who have watched a stream.

  • It must not exclude IDs that also appear in Streamers. A filter such as viewer_id NOT IN (SELECT streamer_id FROM Streamers) would remove exactly the users we are looking for and leave the main query with no matches.

2. Main Query to Select Viewers Turned Streamers:

  • The main query selects all the columns from the Streamers table where the streamer_id matches one of the viewer IDs.

  • The result is a table of streamers who also appear as viewers. Neither table carries a timestamp here, so the query cannot confirm that the viewing came before the streaming; that would require session dates.
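
A quick check on toy data shows which users the membership test returns. This is a sketch under assumptions: Python's sqlite3 module, a hypothetical channel column on Streamers, and viewer and streamer IDs drawn from the same user ID space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Viewers (viewer_id INTEGER);
CREATE TABLE Streamers (streamer_id INTEGER, channel TEXT);
INSERT INTO Viewers VALUES (1), (2), (3);
INSERT INTO Streamers VALUES (2, 'two_live'), (3, 'three_tv'), (9, 'nine_cast');
""")

viewers_turned_streamers = conn.execute("""
    SELECT s.streamer_id, s.channel
    FROM Streamers AS s
    WHERE s.streamer_id IN (SELECT viewer_id FROM Viewers)
    ORDER BY s.streamer_id;
""").fetchall()

# Restricting the inner list to viewers who are NOT streamers can never match
# a row of Streamers, so this variant is always empty.
contradictory = conn.execute("""
    SELECT s.streamer_id
    FROM Streamers AS s
    WHERE s.streamer_id IN (
        SELECT viewer_id FROM Viewers
        WHERE viewer_id NOT IN (SELECT streamer_id FROM Streamers)
    );
""").fetchall()

print(viewers_turned_streamers)
print(contradictory)
```

Users 2 and 3 appear in both tables and are returned; user 9 streams but never appears as a viewer, and user 1 only watches.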

Real-World Applications:

  • Monitoring viewer growth: By tracking which viewers have become streamers, streaming platforms can monitor their viewer base growth and identify potential new talent.

  • Content strategy: Understanding which viewers turn into streamers can help platforms refine their content strategy to cater to those who aspire to become creators.

  • Community building: Identifying viewers who have become streamers can facilitate community building by connecting them with other streamers and viewers.


Grand Slam Titles

Problem: Find all distinct tennis players who have won at least one Grand Slam tournament in both men's and women's categories.

SQL Query:

SELECT DISTINCT Name
FROM Player
WHERE Men_GrandSlams > 0
  AND Women_GrandSlams > 0;

Explanation:

Step 1: Identify Players with Grand Slam Wins in Both Categories

Men_GrandSlams > 0 AND Women_GrandSlams > 0

This condition checks for players who have won at least one Grand Slam in both men's and women's categories.

Step 2: Handle Null Values

Null values represent missing data. A comparison such as Men_GrandSlams > 0 evaluates to UNKNOWN when the column is NULL, and WHERE keeps only rows whose condition is TRUE. A player with a NULL count in either category is therefore excluded automatically, which is the desired behavior: a missing count is not evidence of a win.

Step 3: Avoid OR Branches for Single-Category Winners

Adding conditions such as (Men_GrandSlams IS NULL AND Women_GrandSlams > 0) with OR would also return players who have won in only one category, which contradicts the "both categories" requirement. If the goal were instead "at least one category", the condition would be COALESCE(Men_GrandSlams, 0) > 0 OR COALESCE(Women_GrandSlams, 0) > 0.
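
How NULL counts interact with these conditions can be checked on a few rows. The sketch below uses Python's sqlite3 module and hypothetical player names; it contrasts a strict "both categories" filter with an "either category" filter built on COALESCE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Player (Name TEXT, Men_GrandSlams INTEGER, Women_GrandSlams INTEGER);
INSERT INTO Player VALUES
  ('Both', 2, 1), ('MenOnly', 3, NULL), ('WomenOnly', NULL, 4), ('ZeroMen', 0, 5);
""")

# NULL > 0 is UNKNOWN, so rows with a NULL count never pass the AND filter.
both_categories = conn.execute("""
    SELECT DISTINCT Name FROM Player
    WHERE Men_GrandSlams > 0 AND Women_GrandSlams > 0
    ORDER BY Name;
""").fetchall()

# COALESCE turns NULL into 0 so that one missing count does not hide the other.
either_category = conn.execute("""
    SELECT DISTINCT Name FROM Player
    WHERE COALESCE(Men_GrandSlams, 0) > 0 OR COALESCE(Women_GrandSlams, 0) > 0
    ORDER BY Name;
""").fetchall()

print(both_categories)
print(either_category)
```

Only the player with positive counts in both columns satisfies the strict filter, while every player with at least one positive count satisfies the relaxed one.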

Real-World Applications: This query can be used in various applications, such as:

  • Identifying legendary tennis players who have achieved success in both men's and women's competitions.

  • Analyzing historical Grand Slam data to identify trends and patterns.

  • Tracking the accomplishments of players and comparing their performances across categories.


Count Salary Categories

Question: Count Salary Categories

SQL Query:

SELECT
  CASE
    WHEN salary < 10000 THEN 'Low'
    WHEN salary BETWEEN 10000 AND 20000 THEN 'Medium'
    WHEN salary > 20000 THEN 'High'
    ELSE 'Invalid'
  END AS salary_category,
  COUNT(*) AS salary_count
FROM employees
GROUP BY salary_category;

Explanation:

1. CASE Statement:

The CASE expression evaluates its WHEN conditions in order and returns the value of the first one that is true. In this case, it sorts salaries into three categories, plus a fallback:

  • < 10000: Low

  • Between 10000 and 20000 (inclusive): Medium

  • > 20000: High

  • Otherwise (for example, a NULL salary): Invalid

2. COUNT(*) Function:

The COUNT(*) function counts all rows in the selected group, in this case, grouped by the salary category.

Breakdown of the Query:

  1. Select the CASE statement's result (salary_category) and the count of salary_category (salary_count).

  2. From the 'employees' table.

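  3. Group the results by salary_category. Grouping by a SELECT alias works in MySQL, PostgreSQL, and SQLite; some databases, such as SQL Server, require repeating the CASE expression in GROUP BY.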
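
The boundary behavior of the CASE expression (10000 and 20000 both count as Medium, NULL falls through to Invalid) can be confirmed with a short run. This sketch uses Python's sqlite3 module and made-up employee rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('a', 5000), ('b', 10000), ('c', 15000), ('d', 20000), ('e', 25000), ('f', NULL);
""")

# Map each category label to its row count.
category_counts = dict(conn.execute("""
    SELECT
      CASE
        WHEN salary < 10000 THEN 'Low'
        WHEN salary BETWEEN 10000 AND 20000 THEN 'Medium'
        WHEN salary > 20000 THEN 'High'
        ELSE 'Invalid'
      END AS salary_category,
      COUNT(*) AS salary_count
    FROM employees
    GROUP BY salary_category;
""").fetchall())

print(category_counts)
```

A category with no matching rows does not appear in the output at all; producing explicit zero counts would need a separate list of categories to join against.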

Real-World Applications:

This query can be used in HR systems to:

  • Analyze salary distribution across categories.

  • Identify potential salary disparities.

  • Make informed decisions about salary adjustments or bonuses.


Friendly Movies Streamed Last Month

Problem:

You are given a table Movie that contains the following columns:

  • id (int): The unique identifier of the movie.

  • title (varchar): The title of the movie.

  • genre (varchar): The genre of the movie.

  • streaming_date (date): The date the movie was streamed (used by the solution below).

You want to find all the movies that were streamed last month that are friendly.

Solution:

SELECT
  *
FROM Movie
WHERE
  genre = 'Friendly' AND
  streaming_date >= DATE('now', '-1 month');

Breakdown:

  • The SELECT statement selects all the columns from the Movie table.

  • The WHERE clause filters the results to only include movies that meet the following criteria:

    • The genre column is equal to 'Friendly'.

    • The streaming_date column is greater than or equal to the current date minus one month.

Explanation:

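This query uses the DATE() function with the '-1 month' modifier to subtract one month from the current date. This is SQLite syntax; other databases use their own date arithmetic (for example, CURRENT_DATE - INTERVAL '1 month' in PostgreSQL). The resulting date is then used to filter the streaming_date column to only include movies that were streamed within the past month up to today, which is not the same as the previous calendar month.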

Applications:

This query can be used to find all the friendly movies that were streamed last month on a streaming service. This information can be used to recommend movies to users or to track the popularity of friendly movies.


Product Sales Analysis II

Problem Statement

Given a table ProductSales containing the following columns:

  • product_id (identifier of the product sold; a product can appear in many sales rows)

  • product_name

  • sales_date (date)

  • quantity (number of units sold)

  • sales_price (price per unit)

Write a SQL query to analyze product sales data and answer the following questions:

  1. Total sales amount for each product

  2. Average sales amount per month

  3. Total sales amount by product category

SQL Solution

-- Calculate total sales amount for each product
SELECT product_id, product_name, SUM(quantity * sales_price) AS total_sales_amount
FROM ProductSales
GROUP BY product_id, product_name;

-- Calculate average sales amount per month
SELECT
    SUBSTR(sales_date, 1, 7) AS sales_month,  -- Extract year-month from sales_date
    AVG(quantity * sales_price) AS avg_sales_amount_per_month
FROM ProductSales
GROUP BY sales_month;

-- Calculate total sales amount by product category
SELECT product_category, SUM(quantity * sales_price) AS total_sales_amount_by_category
FROM ProductSales
JOIN ProductCategories ON ProductSales.product_id = ProductCategories.product_id
GROUP BY product_category;

Breakdown and Explanation

1. Total Sales Amount for Each Product

  • GROUP BY product_id, product_name groups the rows by product ID and product name, creating a separate row for each product.

  • SUM(quantity * sales_price) calculates the total sales amount for each product by multiplying the quantity sold by the sales price and summing the results.

2. Average Sales Amount per Month

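  • SUBSTR(sales_date, 1, 7) extracts the year-month prefix (YYYY-MM) from the sales_date column, and GROUP BY sales_month groups the rows by that month. This relies on sales_date being stored or rendered as ISO-8601 text, as in SQLite.

  • AVG(quantity * sales_price) calculates the average amount per sale row within each month. It is not the average of monthly totals; for that, sum by month first and then average the sums.

3. Total Sales Amount by Product Category

  • JOIN ProductCategories ON ProductSales.product_id = ProductCategories.product_id joins the ProductSales table with a ProductCategories table to associate each sale with its product category. ProductCategories is not part of the schema above; the query assumes it maps each product_id to a product_category.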

  • GROUP BY product_category groups the rows by product category, creating a separate row for each category.

  • SUM(quantity * sales_price) calculates the total sales amount for each product category by summing the total sales amount across all products in that category.
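
The first two queries can be run on a handful of rows to see what each aggregate measures. This is a sketch with hypothetical products and prices, using Python's sqlite3 module and ISO text dates; the extra total_amount column is added here only to contrast it with the per-sale average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductSales (
    product_id INTEGER, product_name TEXT, sales_date TEXT,
    quantity INTEGER, sales_price REAL
);
INSERT INTO ProductSales VALUES
  (1, 'Widget', '2023-01-10', 2, 10.0),
  (1, 'Widget', '2023-01-20', 1, 10.0),
  (2, 'Gadget', '2023-01-15', 4, 5.0),
  (2, 'Gadget', '2023-02-01', 3, 5.0);
""")

per_product = conn.execute("""
    SELECT product_id, product_name, SUM(quantity * sales_price) AS total_sales_amount
    FROM ProductSales
    GROUP BY product_id, product_name
    ORDER BY product_id;
""").fetchall()

per_month = conn.execute("""
    SELECT SUBSTR(sales_date, 1, 7) AS sales_month,
           AVG(quantity * sales_price) AS avg_amount_per_sale,
           SUM(quantity * sales_price) AS total_amount
    FROM ProductSales
    GROUP BY sales_month
    ORDER BY sales_month;
""").fetchall()

print(per_product)
print(per_month)
```

January has three sales worth 20, 10, and 20, so its per-sale average is about 16.67 while its total is 50.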

Real-World Applications

This SQL query can be used in various real-world scenarios:

  • Inventory Management: To identify which products are selling well and which are not, helping companies optimize their inventory levels.

  • Marketing and Sales Analysis: To understand the performance of different products and categories over time, and to make informed decisions regarding marketing and sales strategies.

  • Financial Analysis: To calculate the overall sales revenue and profitability of a business, and to identify trends and patterns in sales data.


Tree Node

Problem Statement: Find the maximum depth of a binary tree from a given table.

Table Schema:

| Column | Type | Description |
|---|---|---|
| id | int | Unique ID of the node |
| parent_id | int | ID of the parent node |
| value | int | Value of the node |

Example Table:

| id | parent_id | value |
|---|---|---|
| 1 | null | 2 |
| 2 | null | 4 |
| 3 | 1 | 6 |
| 4 | 2 | 8 |

SQL Query:

WITH RECURSIVE TreeDepth AS (
  SELECT id, parent_id, value, 1 AS depth
  FROM Tree
  WHERE parent_id IS NULL
  
  UNION ALL
  
  SELECT t.id, t.parent_id, t.value, td.depth + 1
  FROM Tree t
  JOIN TreeDepth td ON t.parent_id = td.id
)

SELECT MAX(depth) AS max_depth
FROM TreeDepth;

Breakdown:

  1. Create a recursive CTE (Common Table Expression):

    • We create a CTE called TreeDepth that calculates the depth of each node in the tree.

    • The base case is when the parent ID is null (root node). We assign a depth of 1 to these nodes.

    • The recursive part fetches child nodes of each parent node and increments the depth by 1.

  2. Select the maximum depth:

    • After calculating the depth of all nodes, we select the maximum value from the depth column to get the maximum depth of the tree.

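Example Output:

max_depth
2

In the example table, nodes 1 and 2 are both roots (depth 1) and nodes 3 and 4 are their children (depth 2), so the maximum depth is 2.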
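
The recursion can be traced node by node by selecting each node's depth instead of only the maximum. The following sketch runs the recursive CTE on the four example rows with Python's sqlite3 module; depth_by_id and max_depth are names introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tree (id INTEGER PRIMARY KEY, parent_id INTEGER, value INTEGER);
INSERT INTO Tree VALUES (1, NULL, 2), (2, NULL, 4), (3, 1, 6), (4, 2, 8);
""")

DEPTH_SQL = """
WITH RECURSIVE TreeDepth AS (
    -- Base case: roots have no parent and sit at depth 1.
    SELECT id, parent_id, value, 1 AS depth
    FROM Tree
    WHERE parent_id IS NULL
    UNION ALL
    -- Recursive step: a child is one level below its parent.
    SELECT t.id, t.parent_id, t.value, td.depth + 1
    FROM Tree t
    JOIN TreeDepth td ON t.parent_id = td.id
)
SELECT id, depth FROM TreeDepth ORDER BY id;
"""

depth_by_id = dict(conn.execute(DEPTH_SQL).fetchall())
max_depth = max(depth_by_id.values())
print(depth_by_id, max_depth)
```

For these rows every node is either a root or the direct child of a root, so the depths are 1, 1, 2, 2. If the parent_id links ever formed a cycle, the recursion would not terminate on its own, so real data may need a depth limit.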

Real-World Applications:

  • File Systems: Determining the maximum nesting level of folders in a hierarchical file system.

  • Organizational Structures: Finding the maximum reporting level in a company's organizational hierarchy.

  • Data Mining: Identifying patterns and trends in hierarchical data structures.


The Winner University

LeetCode Problem: Winner University

SQL Solution:

WITH UniversityWins AS (
    SELECT University, SUM(Score) AS TotalScore
    FROM Wins
    GROUP BY University
)

SELECT University
FROM UniversityWins
WHERE TotalScore = (
    SELECT MAX(TotalScore)
    FROM UniversityWins
)

Breakdown:

  1. UniversityWins Subquery:

    • Calculates the total score for each university by grouping the Wins table by University and summing the Score column.

    • Result: A table with two columns: University and TotalScore.

  2. Main Query:

    • Selects the university with the maximum total score from the UniversityWins subquery.

    • Result: A table with one column: University.

Explanation:

  • The subquery UniversityWins is used to calculate the total score for each university. This is done by grouping the rows in the Wins table by University and then summing the Score column.

  • The main query then selects the university with the maximum total score from the UniversityWins subquery. This is done using the MAX() aggregate function to find the maximum value of the TotalScore column.

Example:

Consider the following Wins table:

| University | Score |
|---|---|
| A | 10 |
| B | 5 |
| C | 15 |
| A | 20 |

UniversityWins Subquery:

SELECT University, SUM(Score) AS TotalScore
FROM Wins
GROUP BY University

| University | TotalScore |
|---|---|
| A | 30 |
| B | 5 |
| C | 15 |

Main Query:

SELECT University
FROM UniversityWins
WHERE TotalScore = (SELECT MAX(TotalScore) FROM UniversityWins)

Result:

| University |
|---|
| A |

This shows that University A is the winner with a total score of 30.
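
The same steps can be run on the example rows, and extended to show what happens on a tie. This is a sketch using Python's sqlite3 module; the extra row that makes C tie with A is hypothetical and is added only to illustrate the behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Wins (University TEXT, Score INTEGER);
INSERT INTO Wins VALUES ('A', 10), ('B', 5), ('C', 15), ('A', 20);
""")

WINNER_SQL = """
WITH UniversityWins AS (
    SELECT University, SUM(Score) AS TotalScore
    FROM Wins
    GROUP BY University
)
SELECT University, TotalScore
FROM UniversityWins
WHERE TotalScore = (SELECT MAX(TotalScore) FROM UniversityWins)
ORDER BY University;
"""

winners = conn.execute(WINNER_SQL).fetchall()

# Hypothetical extra row: C scores 15 more and ties A at 30.
conn.execute("INSERT INTO Wins VALUES ('C', 15)")
winners_with_tie = conn.execute(WINNER_SQL).fetchall()

print(winners)
print(winners_with_tie)
```

Because the filter compares against MAX(TotalScore) rather than using LIMIT 1, every university that shares the top total is reported.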

Potential Applications:

This problem can be used in real-world applications to determine the winner of a competition or tournament. For example, it can be used to find the team with the most points in a sports league or the student with the highest GPA in a university.


Concatenate the Name and the Profession

Problem: Given a table with two columns, Name and Profession:

| Name | Profession |
|---|---|
| John | Doctor |
| Mary | Teacher |
| Bob | Engineer |

Concatenate the Name and Profession columns to create a new column FullName.

Solution:

SELECT Name || ' ' || Profession AS FullName
FROM table_name;

Breakdown:

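  • The || operator concatenates two strings in standard SQL, PostgreSQL, SQLite, and Oracle. In MySQL, || is a logical OR by default, so use CONCAT(Name, ' ', Profession) there.

  • The ' ' literal (a single space between the quotes) adds a space between the name and the profession.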

  • The AS keyword is used to alias the new column.

Example:

SELECT Name || ' ' || Profession AS FullName
FROM table_name;

Output:

| FullName |
|---|
| John Doctor |
| Mary Teacher |
| Bob Engineer |

Real-World Applications:

  • Displaying full names in a report or dashboard.

  • Creating a drop-down list of names for a form.

  • Generating a list of employees with their job titles for printing or emailing.


Bikes Last Time Used

Problem Statement:

Given a table Bikes with the following columns:

  • bike_id (int): Unique identifier for each bike

  • last_used_date (date): The date the bike was last used

Write a SQL query to find the most recently used bike and the date it was last used.

Solution:

SELECT bike_id, MAX(last_used_date) AS most_recent_date
FROM Bikes
GROUP BY bike_id
ORDER BY most_recent_date DESC
LIMIT 1;

Breakdown of the Solution:

  1. SELECT bike_id, MAX(last_used_date) AS most_recent_date: This line selects the bike ID and the maximum value of the last_used_date column, which represents the most recent date the bike was used. The MAX aggregate function is used to find the maximum value in a group of rows.

  2. FROM Bikes: This line specifies that the query should be executed on the Bikes table.

  3. GROUP BY bike_id: This line groups the rows in the table by the bike_id column. This allows us to find the most recent date for each bike.

  4. ORDER BY most_recent_date DESC: This line sorts the rows in descending order of the most_recent_date column. This places the most recently used bike at the top of the result set.

  5. LIMIT 1: This line limits the result set to only the first row, which is the most recently used bike.

Real-World Applications:

This query can be used in various real-world applications, such as:

  • Bike rental companies: To track which bikes are being used most frequently and to ensure that all bikes are being maintained and repaired as needed.

  • City planners: To analyze bike usage patterns and identify areas where bike infrastructure can be improved.

  • Insurance companies: To assess the risk of bike theft or damage by determining which bikes are most commonly targeted.


Users That Actively Request Confirmation Messages

Problem:

You are given two tables:

  • Requests: Contains a list of requests made by users.

  • Users: Contains information about users.

The Requests table has the following columns:

  • request_id: The ID of the request.

  • user_id: The ID of the user who made the request.

  • request_type: The type of request.

The Users table has the following columns:

  • user_id: The ID of the user.

  • name: The name of the user.

  • email: The email address of the user.

You want to find the users who actively request confirmation messages. A user is considered active if they have made at least 3 requests of type "CONFIRMATION".

Solution:

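SELECT
  U.user_id,
  U.name,
  U.email
FROM Users AS U
JOIN Requests AS R
  ON U.user_id = R.user_id
WHERE
  R.request_type = 'CONFIRMATION'  -- single quotes: double quotes denote identifiers in standard SQL
GROUP BY
  U.user_id, U.name, U.email       -- group by every selected non-aggregate column
HAVING
  COUNT(*) >= 3;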

Breakdown:

  • Join the Users and Requests tables: This is done using the JOIN keyword. The ON clause specifies that the join should be performed based on the user_id column.

  • Filter the Requests table: This is done using the WHERE clause. We only want to include requests of type "CONFIRMATION".

  • Group the results by user ID: This is done using the GROUP BY clause. This will group the results together based on the user_id column.

  • Count the number of requests per user: This is done using the COUNT(*) aggregate function.

  • Filter the results to include only users with at least 3 requests: This is done using the HAVING clause. We only want to include users who have made at least 3 requests of type "CONFIRMATION".
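
The join, filter, group, and HAVING steps can be exercised on two users. This sketch uses Python's sqlite3 module; the user names, email addresses, and the PASSWORD_RESET request type are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (user_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE Requests (request_id INTEGER PRIMARY KEY, user_id INTEGER, request_type TEXT);
INSERT INTO Users VALUES
  (1, 'Ann', 'ann@example.com'), (2, 'Ben', 'ben@example.com');
INSERT INTO Requests (user_id, request_type) VALUES
  (1, 'CONFIRMATION'), (1, 'CONFIRMATION'), (1, 'CONFIRMATION'),
  (2, 'CONFIRMATION'), (2, 'CONFIRMATION'), (2, 'PASSWORD_RESET');
""")

active_users = conn.execute("""
    SELECT U.user_id, U.name, U.email
    FROM Users AS U
    JOIN Requests AS R ON U.user_id = R.user_id
    WHERE R.request_type = 'CONFIRMATION'      -- string literal in single quotes
    GROUP BY U.user_id, U.name, U.email
    HAVING COUNT(*) >= 3;
""").fetchall()

print(active_users)
```

Ben has three requests in total but only two of type CONFIRMATION, so the WHERE filter removes him before the count is taken.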

Real World Application:

This query can be used to identify users who actively request confirmation messages. This information can be used to:

  • Send automated confirmation messages to these users.

  • Provide customer support to these users.

  • Target these users with marketing campaigns.


Game Play Analysis III

Problem Statement

Given a table GamePlay with the following schema:

| id | player_id | points | game_date |
|---|---|---|---|
| 1 | 1 | 100 | 2022-01-01 |
| 2 | 2 | 50 | 2022-01-02 |
| 3 | 1 | 150 | 2022-01-03 |
| 4 | 2 | 75 | 2022-01-04 |
| 5 | 1 | 200 | 2022-01-05 |

Find the players with the highest total points scored in the last 7 days.

Solution

To solve this problem, we can combine a date filter, an aggregate, and a ranking window function.

  1. Keep only the rows from the last 7 days:

    WHERE game_date >= DATE('now', '-7 days')

    This filter is what defines the 7-day window. A frame such as ROWS BETWEEN 6 PRECEDING AND CURRENT ROW would not work here, because it counts rows (the last seven games), not days. DATE('now', '-7 days') is SQLite syntax; other databases have their own date arithmetic.

  2. Calculate the total points per player:

    SUM(points) ... GROUP BY player_id

    This will calculate the total points scored by each player within the last 7 days.

  3. Rank the players by total points:

    RANK() OVER (ORDER BY total_points DESC)

    This will rank the players in descending order by their total points within the last 7 days. The totals are computed in an earlier step (a CTE), because in standard SQL an expression cannot refer to a column alias defined in the same SELECT list.

  4. Select the players with the highest rank:

    WHERE points_rank = 1

    This will select the players with the highest rank, which are the players with the highest total points within the last 7 days. The filter goes in an outer query, since the result of a window function cannot be used in the WHERE or HAVING clause of the query that computes it. Tied players all receive rank 1 and are all returned.

Complete SQL Statement

WITH recent_totals AS (
  SELECT player_id, SUM(points) AS total_points
  FROM GamePlay
  WHERE game_date >= DATE('now', '-7 days')
  GROUP BY player_id
),
ranked AS (
  SELECT player_id, total_points,
         RANK() OVER (ORDER BY total_points DESC) AS points_rank
  FROM recent_totals
)
SELECT player_id, total_points
FROM ranked
WHERE points_rank = 1;

Example

The sample rows are dated January 2022, so a query that uses the current date would find nothing. If the query is evaluated as of 2022-01-07, so that all five rows fall inside the window, the ranked CTE contains:

| player_id | total_points | points_rank |
|---|---|---|
| 1 | 450 | 1 |
| 2 | 125 | 2 |

The final result is the single row for Player 1, who has the highest total points (450) within the last 7 days. Player 2 follows with 125 points and is filtered out.
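
Because the sample rows are dated January 2022, a reproducible check needs a fixed reference date rather than the current date. The sketch below is an illustration under assumptions: it uses Python's sqlite3 module (SQLite 3.25+), aggregates before ranking, and passes the reference date through a named parameter :as_of, which is a name introduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GamePlay (id INTEGER, player_id INTEGER, points INTEGER, game_date TEXT);
INSERT INTO GamePlay VALUES
  (1, 1, 100, '2022-01-01'), (2, 2, 50, '2022-01-02'), (3, 1, 150, '2022-01-03'),
  (4, 2, 75, '2022-01-04'), (5, 1, 200, '2022-01-05');
""")

TOP_PLAYERS_SQL = """
WITH recent_totals AS (
    SELECT player_id, SUM(points) AS total_points
    FROM GamePlay
    WHERE game_date >= DATE(:as_of, '-7 days')   -- fixed reference date
    GROUP BY player_id
),
ranked AS (
    SELECT player_id, total_points,
           RANK() OVER (ORDER BY total_points DESC) AS points_rank
    FROM recent_totals
)
SELECT player_id, total_points FROM ranked WHERE points_rank = 1;
"""

def top_players(as_of):
    """Players with the highest point total in the 7 days up to `as_of`."""
    return conn.execute(TOP_PLAYERS_SQL, {"as_of": as_of}).fetchall()

print(top_players("2022-01-07"))   # all five sample rows are in the window
print(top_players("2022-01-10"))   # rows before 2022-01-03 drop out
```

Moving the reference date forward shrinks the set of rows in the window, which is the behavior a row-count frame would not reproduce.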

Real-World Applications

This query can be used to identify the most active players in a game or the players who have scored the most points in a certain period of time. This information can be used to reward players, create leaderboards, or improve the game experience.


The First Day of the Maximum Recorded Degree in Each City

Problem:

Given a database containing weather records for multiple cities, find the first day when the maximum recorded temperature was achieved in each city.

Database Schema:

CREATE TABLE Weather (
  City TEXT,
  Date TEXT,
  Temperature REAL
);

SQL Query:

SELECT w.City, MIN(w.Date) AS FirstMaxTempDate
FROM Weather AS w
WHERE w.Temperature = (SELECT MAX(w2.Temperature) FROM Weather AS w2 WHERE w2.City = w.City)
GROUP BY w.City;

Query Breakdown:

  1. Subquery: (SELECT MAX(w2.Temperature) FROM Weather AS w2 WHERE w2.City = w.City)

    • This correlated subquery calculates the maximum temperature recorded for the city of the current outer row.

    • The aliases matter. Without them, both sides of City = Weather.City resolve to the inner table, the condition is always true, and the subquery returns the maximum over all cities.

  2. Outer Query:

    • w refers to the main Weather table, and w2 to the copy scanned by the subquery.

    • w.Temperature = (...): This condition filters the main table to include only rows where the temperature is equal to the maximum temperature for that row's city.

    • GROUP BY w.City: This groups the results by city to get the first date for each city where the maximum temperature was recorded.

    • MIN(w.Date): This calculates the minimum date within each city group, which corresponds to the first day when the maximum temperature was achieved. Because Date is stored as TEXT, this relies on the ISO YYYY-MM-DD format, which sorts chronologically.

Example:

Consider the following table:

| City | Date | Temperature |
|---|---|---|
| London | 2020-06-01 | 20 |
| London | 2020-06-02 | 25 |
| Paris | 2020-07-01 | 15 |
| Paris | 2020-07-02 | 20 |
| Paris | 2020-07-03 | 25 |

The query would return:

| City | FirstMaxTempDate |
|---|---|
| London | 2020-06-02 |
| Paris | 2020-07-03 |

This shows that the first day when the maximum temperature was recorded in London was June 2, 2020, and in Paris was July 3, 2020.
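The example can be checked end to end with a small script. This is a minimal sketch, assuming SQLite through Python's built-in sqlite3 module; the aliases w and w2 and the extra Paris row (a later day that ties the maximum) are assumptions added for illustration and are not part of the original data.

```python
import sqlite3

# In-memory database holding the example Weather rows from the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Weather (City TEXT, Date TEXT, Temperature REAL)")
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?)",
    [
        ("London", "2020-06-01", 20),
        ("London", "2020-06-02", 25),
        ("Paris", "2020-07-01", 15),
        ("Paris", "2020-07-02", 20),
        ("Paris", "2020-07-03", 25),
        ("Paris", "2020-07-04", 25),  # hypothetical row: a later day tying the maximum
    ],
)

# The aliases keep the inner and outer references to Weather apart,
# so the maximum is computed per city rather than across all cities.
query = """
SELECT w.City, MIN(w.Date) AS FirstMaxTempDate
FROM Weather AS w
WHERE w.Temperature = (
    SELECT MAX(w2.Temperature) FROM Weather AS w2 WHERE w2.City = w.City
)
GROUP BY w.City
ORDER BY w.City
"""
first_max_days = conn.execute(query).fetchall()
print(first_max_days)
```

The tie on 2020-07-04 does not change the answer for Paris, because MIN(Date) keeps only the earliest of the days that reached the maximum.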

Applications:

This query can be useful for weather analysis, such as finding the hottest day or analyzing seasonal patterns. It can also be used for comparative analysis between cities or understanding the impact of climate change over time.


Replace Employee ID With The Unique Identifier

Problem:

You have a table called Employees with the following columns:

  • EmployeeID (unique identifier for each employee)

  • Name

  • Salary

  • Department

You want to replace the EmployeeID column with a unique identifier called UID.

Solution:

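ALTER TABLE Employees
ADD UID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL DEFAULT NEWID();  -- SQL Server: add the GUID column; the default fills existing and new rows

ALTER TABLE Employees
DROP COLUMN EmployeeID;  -- Separate statement; drop any key or constraint that references EmployeeID first

Explanation:

The statements above use SQL Server (T-SQL) syntax, where ADD takes no COLUMN keyword and a column cannot be added and dropped in the same ALTER TABLE statement. The UNIQUEIDENTIFIER data type stores a 16-byte GUID. The ROWGUIDCOL property only designates UID as the table's row GUID column; it does not generate values or enforce uniqueness. The values come from DEFAULT NEWID(), which assigns a new GUID to each existing row when the column is added and to each row inserted later. To guarantee uniqueness, add a PRIMARY KEY or UNIQUE constraint on UID. A GUID key is larger than a typical integer key, so it trades some storage and index efficiency for identifiers that can be generated independently on any system.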

Real-World Applications:

  • Data Integrity: A unique identifier backed by a PRIMARY KEY or UNIQUE constraint prevents two rows from sharing the same key.

  • Efficient Data Access: When the identifier column is indexed, specific rows can be located without a full table scan.

  • Replication: Unique identifiers simplify data replication across multiple systems, ensuring that each record is uniquely identified and can be tracked accurately.

  • Data Security: Unique identifiers can be used to enforce data security by restricting access to specific records based on the identifier.


Evaluate Boolean Expression

Problem Description:

Given a table Employee with the following columns: id, name, and salary, return the names of employees whose salary is greater than the average salary of all employees.

SQL Query:

SELECT name
FROM Employee
WHERE salary > (
    SELECT AVG(salary)
    FROM Employee
);

Breakdown:

  1. Subquery: The subquery (SELECT AVG(salary) FROM Employee) calculates the average salary of all employees.

  2. Comparison: The WHERE clause compares the salary of each employee to the average salary.

  3. Selection: Employees whose salary is greater than the average salary are selected.

Real-World Application:

This query can be used in various applications, such as:

  • Identifying employees who should receive bonuses or raises.

  • Analyzing the salary distribution within an organization.

  • Identifying potential pay discrepancies.

Simplified Explanation:

Imagine a table of employee salaries. To find employees who earn more than the average, we do the following:

  1. We calculate the average salary using the AVG() function.

  2. We compare each employee's salary to the average using the > operator.

  3. We select the names of employees whose salary is greater than the average.

Example:

Consider the following table:

| id | name | salary |
|---|---|---|
| 1 | John | 5000 |
| 2 | Mary | 4000 |
| 3 | Alice | 6000 |

The average salary is (5000 + 4000 + 6000) / 3 = 5000. The query will return:

| name |
|---|
| Alice |

Alice's salary (6000) is greater than the average. John's salary equals the average, so the strict > comparison excludes him.
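The arithmetic in this example can be reproduced with a short script. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the column types are assumptions, since the problem only names the columns.

```python
import sqlite3

# Load the three example employees into an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "John", 5000), (2, "Mary", 4000), (3, "Alice", 6000)],
)

# The scalar subquery is evaluated once and yields the overall average.
average = conn.execute("SELECT AVG(salary) FROM Employee").fetchone()[0]

above_average = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM Employee "
        "WHERE salary > (SELECT AVG(salary) FROM Employee) "
        "ORDER BY name"
    )
]
print(average, above_average)
```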


The Change in Global Rankings

Problem:

We have two tables:

  • Player table:

    • player_id (int)

    • player_name (string)

    • rank (int)

  • Score table:

    • player_id (int)

    • score (int)

We need to find the change in rank for each player after updating the player's score.

Explanation:

  1. Create a Common Table Expression (CTE) called PlayerWithUpdatedScore to calculate the updated rank for each player.

WITH PlayerWithUpdatedScore AS (
    SELECT
        p.player_id,
        p.player_name,
        p.rank AS old_rank,
        COALESCE(SUM(s.score), 0) AS updated_score
    FROM
        Player p
    LEFT JOIN
        Score s ON p.player_id = s.player_id
    GROUP BY
        p.player_id, p.player_name, p.rank
)
  • This CTE calculates the updated score for each player by summing up their scores from the Score table. It also includes the player's old rank.

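  2. Add a second CTE called PlayerWithRankChange to the same WITH clause to calculate the change in rank for each player. CTEs in one statement are separated by commas, so the keyword WITH is not repeated.

, PlayerWithRankChange AS (
    SELECT
        r.player_id,
        r.player_name,
        r.old_rank,
        r.updated_score,
        r.new_rank,
        (r.new_rank - r.old_rank) AS rank_change
    FROM (
        -- new_rank is computed in an inner query because an alias cannot be reused in the SELECT list that defines it
        SELECT
            p.*,
            (SELECT COUNT(*) FROM PlayerWithUpdatedScore q WHERE q.updated_score > p.updated_score) + 1 AS new_rank
        FROM
            PlayerWithUpdatedScore p
    ) r
)
  • This CTE calculates the new rank for each player by counting the number of players with a higher updated score and adding 1. It then subtracts the old rank from the new rank to get the change in rank, so a negative value means the player moved up.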

  3. Select the results from the PlayerWithRankChange CTE.

SELECT
    player_id,
    player_name,
    old_rank,
    new_rank,
    rank_change
FROM
    PlayerWithRankChange
ORDER BY
    player_id;
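The three steps can be assembled into one statement and run as a check. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the score values are hypothetical, chosen only so that the ranks come out as in the example that follows, and new_rank is computed in an inner query so that rank_change can refer to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Player (player_id INTEGER, player_name TEXT, rank INTEGER)")
conn.execute("CREATE TABLE Score (player_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO Player VALUES (?, ?, ?)",
    [(1, "John", 5), (2, "Mary", 3), (3, "Bob", 2), (4, "Alice", 1)],
)
# Hypothetical scores; Alice has no rows, so COALESCE gives her 0.
conn.executemany(
    "INSERT INTO Score VALUES (?, ?)",
    [(1, 40), (1, 30), (2, 80), (3, 90)],
)

query = """
WITH PlayerWithUpdatedScore AS (
    SELECT p.player_id, p.player_name, p.rank AS old_rank,
           COALESCE(SUM(s.score), 0) AS updated_score
    FROM Player p
    LEFT JOIN Score s ON p.player_id = s.player_id
    GROUP BY p.player_id, p.player_name, p.rank
),
PlayerWithRankChange AS (
    SELECT r.player_id, r.player_name, r.old_rank, r.new_rank,
           r.new_rank - r.old_rank AS rank_change
    FROM (
        SELECT p.*,
               (SELECT COUNT(*) FROM PlayerWithUpdatedScore q
                WHERE q.updated_score > p.updated_score) + 1 AS new_rank
        FROM PlayerWithUpdatedScore p
    ) r
)
SELECT player_id, player_name, old_rank, new_rank, rank_change
FROM PlayerWithRankChange
ORDER BY player_id
"""
rank_changes = conn.execute(query).fetchall()
for row in rank_changes:
    print(row)
```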

Example:

| player_id | player_name | old_rank | new_rank | rank_change |
|---|---|---|---|---|
| 1 | John | 5 | 3 | -2 |
| 2 | Mary | 3 | 2 | -1 |
| 3 | Bob | 2 | 1 | -1 |
| 4 | Alice | 1 | 4 | 3 |

Potential Applications:

This problem can be applied in any scenario where you need to track changes in rankings based on updated scores, such as in:

  • Sports (tracking changes in player or team rankings based on game performance)

  • Gaming (tracking changes in player rankings based on in-game accomplishments)

  • Education (tracking changes in student rankings based on test scores)


Invalid Tweets

Problem: Find all the tweets with hashtags that start with the letter 'a' or 'b'.

SQL Query:

SELECT *
FROM Tweets
WHERE hashtag LIKE 'a%' OR hashtag LIKE 'b%';

Breakdown and Explanation:

  • SELECT *: Selects all columns from the Tweets table.

  • FROM Tweets: Specifies the Tweets table to query from.

  • WHERE hashtag LIKE 'a%': Filters the tweets to include only those with hashtags that start with the letter 'a'. The wildcard character % represents any number of characters after 'a'.

  • OR hashtag LIKE 'b%': Adds a second filter to include tweets with hashtags that start with the letter 'b'.

Real-World Application:

This query can be used by social media companies to analyze tweets for specific topics or trends. For example, a marketing team could use the query to identify popular hashtags related to their products or services.

Code Example:

-- Sample Tweets table
CREATE TABLE Tweets (
    id INT PRIMARY KEY,
    text VARCHAR(255),
    hashtag VARCHAR(255)
);

-- Insert sample data
INSERT INTO Tweets (id, text, hashtag) VALUES
(1, 'This is a tweet with #awesome', 'awesome'),
(2, 'This is a tweet with #bad', 'bad'),
(3, 'This is a tweet with #coffee', 'coffee');

-- Execute the query
SELECT *
FROM Tweets
WHERE hashtag LIKE 'a%' OR hashtag LIKE 'b%';

-- Output
+----+-------------------------------+---------+
| id | text                          | hashtag |
+----+-------------------------------+---------+
| 1  | This is a tweet with #awesome | awesome |
| 2  | This is a tweet with #bad     | bad     |
+----+-------------------------------+---------+


Find Peak Calling Hours for Each City

Problem:

Find the peak calling hours for each city from a table of phone call records.

Solution:

Step 1: Group Calls by City and Hour

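SELECT city, hour, SUM(call_count) AS call_count
FROM phone_calls
GROUP BY city, hour
ORDER BY city, hour;

This query groups the call records by city and hour and totals the calls for each combination. The phone_calls table used in this solution already stores a call_count column for each city and hour, so the calls are summed rather than counted row by row.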

Step 2: Find Max Call Count for Each City

SELECT city, MAX(call_count) AS peak_call_count
FROM phone_calls
GROUP BY city;

This query finds the maximum call count for each city.

Step 3: Join Results to Find Peak Hours

SELECT t1.city, t1.hour, t1.call_count
FROM phone_calls t1
JOIN (
    SELECT city, MAX(call_count) AS peak_call_count
    FROM phone_calls
    GROUP BY city
) t2 ON t1.city = t2.city AND t1.call_count = t2.peak_call_count;

This query joins phone_calls to the per-city maximum from Step 2 and returns the city, hour, and call count of every row whose call count equals that maximum. If two hours in a city tie for the maximum, both are returned.

Example:

CREATE TABLE phone_calls (
  id INTEGER PRIMARY KEY,
  city TEXT,
  hour INTEGER,
  call_count INTEGER
);
INSERT INTO phone_calls (city, hour, call_count) VALUES
('New York', 10, 100),
('New York', 11, 150),
('New York', 12, 120),
('New York', 13, 100),
('London', 8, 50),
('London', 9, 70),
('London', 10, 90),
('London', 11, 100),
('Paris', 6, 20),
('Paris', 7, 40),
('Paris', 8, 60),
('Paris', 9, 80);
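-- Run the Step 3 query against the sample data
SELECT t1.city, t1.hour, t1.call_count
FROM phone_calls t1
JOIN (
    SELECT city, MAX(call_count) AS peak_call_count
    FROM phone_calls
    GROUP BY city
) t2 ON t1.city = t2.city AND t1.call_count = t2.peak_call_count;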
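The sample data can be run through the join from Step 3 to see which hours come out as peaks. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the ORDER BY is an addition that only makes the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone_calls ("
    "id INTEGER PRIMARY KEY, city TEXT, hour INTEGER, call_count INTEGER)"
)
conn.executemany(
    "INSERT INTO phone_calls (city, hour, call_count) VALUES (?, ?, ?)",
    [
        ("New York", 10, 100), ("New York", 11, 150),
        ("New York", 12, 120), ("New York", 13, 100),
        ("London", 8, 50), ("London", 9, 70),
        ("London", 10, 90), ("London", 11, 100),
        ("Paris", 6, 20), ("Paris", 7, 40),
        ("Paris", 8, 60), ("Paris", 9, 80),
    ],
)

# Join each row to its city's maximum and keep the rows that match it.
query = """
SELECT t1.city, t1.hour, t1.call_count
FROM phone_calls t1
JOIN (
    SELECT city, MAX(call_count) AS peak_call_count
    FROM phone_calls
    GROUP BY city
) t2 ON t1.city = t2.city AND t1.call_count = t2.peak_call_count
ORDER BY t1.city
"""
peak_hours = conn.execute(query).fetchall()
print(peak_hours)
```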

Simplified Explanation:

  1. We first group the call records by city and hour to count the number of calls for each combination.

  2. Then, we find the maximum call count for each city, which is the call volume of that city's peak hour.

  3. Finally, we join these results to get the city, hour, and call count for the peak calling hours.

Real-World Applications:

  • Call Center Staffing: Determine the optimal staffing levels for call centers based on peak calling hours.

  • Network Optimization: Identify congested network areas during peak hours to improve call quality.

  • Marketing Campaigns: Target marketing campaigns to specific cities during their peak calling hours for maximum impact.


Dynamic Pivoting of a Table

Problem:

You have a table with data in a specific format, and you need to transform it into a different format by pivoting the table. The pivot operation rotates the rows and columns of the table to create a new table.

Solution:

To perform dynamic pivoting in SQL Server, you can combine the PIVOT operator with a column list built using FOR XML PATH and run the resulting statement as dynamic SQL. Here's a step-by-step breakdown:

Step 1: Create a Sample Table

CREATE TABLE Sales (
  Product VARCHAR(255),
  Region VARCHAR(255),
  Sales INT
);

INSERT INTO Sales (Product, Region, Sales) VALUES ('Product A', 'East', 100);
INSERT INTO Sales (Product, Region, Sales) VALUES ('Product B', 'West', 200);
INSERT INTO Sales (Product, Region, Sales) VALUES ('Product C', 'South', 300);

Step 2: Use the PIVOT Function

The PIVOT operator (SQL Server syntax) allows you to pivot a table by specifying the column whose values become new columns and the aggregation function to use. In our case, we want to pivot the table by the Region column and calculate the sum of Sales for each region.

SELECT *
FROM Sales
PIVOT (SUM(Sales) FOR Region IN ([East], [West], [South])) AS PivotTable;  -- the alias after PIVOT is required

Output:

| Product | East | West | South |
|---|---|---|---|
| Product A | 100 | NULL | NULL |
| Product B | NULL | 200 | NULL |
| Product C | NULL | NULL | 300 |

Step 3: Use the XML PATH Function

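The FOR XML PATH clause can be used to concatenate the distinct values of the Region column into a column list. PIVOT does not accept a variable in its IN list, so the list is spliced into the statement text, which is then executed as dynamic SQL.

DECLARE @PivotColumns NVARCHAR(MAX), @Sql NVARCHAR(MAX);

-- Build a comma-separated list such as [East],[South],[West]; STUFF removes the leading comma
SET @PivotColumns = STUFF((
  SELECT ',' + QUOTENAME(Region)
  FROM (
    SELECT DISTINCT Region
    FROM Sales
  ) AS r
  FOR XML PATH('')
), 1, 1, '');

-- Splice the list into the statement text and run it
SET @Sql = N'SELECT * FROM Sales PIVOT (SUM(Sales) FOR Region IN ('
         + @PivotColumns + N')) AS PivotTable;';

EXEC sp_executesql @Sql;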

Output:

| Product | East | West | South |
|---|---|---|---|
| Product A | 100 | NULL | NULL |
| Product B | NULL | 200 | NULL |
| Product C | NULL | NULL | 300 |
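The same idea can be tried on engines that have no PIVOT operator. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; it swaps PIVOT for conditional aggregation (one SUM(CASE ...) per region), and the quote_ident helper is a hypothetical stand-in for QUOTENAME. The dynamic part is the same: read the distinct regions first, then build the column list from them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Product TEXT, Region TEXT, Sales INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [("Product A", "East", 100), ("Product B", "West", 200), ("Product C", "South", 300)],
)


def quote_ident(name):
    """Quote a value for use as a column name (hypothetical stand-in for QUOTENAME)."""
    return '"' + name.replace('"', '""') + '"'


# Step 1: discover the pivot columns from the data.
regions = [row[0] for row in conn.execute("SELECT DISTINCT Region FROM Sales ORDER BY Region")]

# Step 2: one conditional aggregate per region; the region values are bound
# as parameters and only the quoted column aliases are spliced into the text.
columns = ", ".join(
    "SUM(CASE WHEN Region = ? THEN Sales END) AS " + quote_ident(region)
    for region in regions
)
pivot_sql = f"SELECT Product, {columns} FROM Sales GROUP BY Product ORDER BY Product"

cursor = conn.execute(pivot_sql, regions)
header = [column[0] for column in cursor.description]
pivoted = cursor.fetchall()
print(header)
for row in pivoted:
    print(row)
```

Because the regions are sorted here, the columns come out as East, South, West; NULL appears as None in Python.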

Real-World Applications:

Dynamic pivoting can be useful in various scenarios, such as:

  • Converting data from a relational model to a multidimensional model (e.g., for reporting or analysis)

  • Summarizing data across multiple dimensions

  • Creating reports with tabular data where the columns vary dynamically based on the data


Calculate Special Bonus

Problem Statement:

Given a table Employees with columns EmployeeID, Salary, and Performance, calculate the special bonus for each employee based on their performance.

| EmployeeID | Salary | Performance |
|---|---|---|
| 1 | 1000 | Excellent |
| 2 | 2000 | Good |
| 3 | 3000 | Average |

Special Bonus Calculation:

  • Excellent: 10% of salary

  • Good: 5% of salary

  • Average: 0% of salary

SQL Query:

SELECT
  EmployeeID,
  Salary,
  Performance,
  CASE
    WHEN Performance = 'Excellent' THEN Salary * 0.10
    WHEN Performance = 'Good' THEN Salary * 0.05
    ELSE 0
  END AS SpecialBonus
FROM
  Employees;

Breakdown:

1. Select Columns:

SELECT
  EmployeeID,
  Salary,
  Performance,

This part selects the necessary columns from the Employees table.

2. Calculate Special Bonus Using CASE Statement:

  CASE
    WHEN Performance = 'Excellent' THEN Salary * 0.10
    WHEN Performance = 'Good' THEN Salary * 0.05
    ELSE 0
  END AS SpecialBonus

The CASE statement evaluates the Performance column and calculates the special bonus based on the following conditions:

  • If Performance is 'Excellent', the bonus is 10% of the salary.

  • If Performance is 'Good', the bonus is 5% of the salary.

  • Otherwise, the bonus is 0.

3. Alias the Result:

  AS SpecialBonus

This aliases the result of the CASE statement as SpecialBonus.

4. From Employees Table:

FROM
  Employees;

This specifies the source table for the query, which is the Employees table.

Output:

| EmployeeID | Salary | Performance | SpecialBonus |
|---|---|---|---|
| 1 | 1000 | Excellent | 100 |
| 2 | 2000 | Good | 100 |
| 3 | 3000 | Average | 0 |
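The output above can be reproduced with a short script. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the column types are assumptions. In SQLite the two percentage branches return floating-point values (100.0), while the ELSE branch returns the integer 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, Salary INTEGER, Performance TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, 1000, "Excellent"), (2, 2000, "Good"), (3, 3000, "Average")],
)

# CASE picks the bonus rate from the Performance value, row by row.
query = """
SELECT
  EmployeeID,
  Salary,
  Performance,
  CASE
    WHEN Performance = 'Excellent' THEN Salary * 0.10
    WHEN Performance = 'Good' THEN Salary * 0.05
    ELSE 0
  END AS SpecialBonus
FROM Employees
ORDER BY EmployeeID
"""
bonus_rows = conn.execute(query).fetchall()
for row in bonus_rows:
    print(row)
```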

Real-World Application:

This query can be used in an HR system to calculate special bonuses for employees based on their performance. It can also be used to analyze the performance of employees and make decisions regarding promotions or raises.


Reported Posts

Problem:

Given a table called posts that contains the following columns:

  • post_id: The unique ID of the post.

  • title: The title of the post.

  • content: The content of the post.

  • reported: A flag indicating whether the post has been reported.

You need to write a query to find all the reported posts and the percentage of reported posts out of the total number of posts.

Solution:

SELECT
  COUNT(*) AS total_posts,
  (
    SELECT
      COUNT(*)
    FROM posts
    WHERE
      reported = 1
  ) AS reported_posts,
  (
    (
      SELECT
        COUNT(*)
      FROM posts
      WHERE
        reported = 1
    ) * 100.0 / COUNT(*)
  ) AS reported_percentage
FROM posts;

Explanation:

The query first calculates the total number of posts in the total_posts column. Then, it calculates the number of reported posts in the reported_posts column by using a subquery. Finally, it calculates the percentage of reported posts in the reported_percentage column by multiplying the number of reported posts by 100.0 and dividing by the total number of posts. Multiplying by 100.0 before dividing forces decimal arithmetic; dividing one integer count by another first would truncate the ratio to 0 in engines that use integer division, such as SQLite, PostgreSQL, and SQL Server.
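The effect of integer division on the percentage is easy to see on a small table. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the four posts are hypothetical sample rows. It compares dividing the two integer counts first with multiplying by 100.0 before dividing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (post_id INTEGER PRIMARY KEY, title TEXT, content TEXT, reported INTEGER)"
)
# Hypothetical sample: four posts, one of them reported.
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?, ?)",
    [(1, "a", "...", 0), (2, "b", "...", 1), (3, "c", "...", 0), (4, "d", "...", 0)],
)

# Integer count / integer count is truncated to 0 before the multiplication.
truncated = conn.execute(
    "SELECT (SELECT COUNT(*) FROM posts WHERE reported = 1) / COUNT(*) * 100 FROM posts"
).fetchone()[0]

# Multiplying by 100.0 first keeps the arithmetic in floating point.
total_posts, reported_posts, reported_percentage = conn.execute(
    """
    SELECT
      COUNT(*),
      (SELECT COUNT(*) FROM posts WHERE reported = 1),
      (SELECT COUNT(*) FROM posts WHERE reported = 1) * 100.0 / COUNT(*)
    FROM posts
    """
).fetchone()
print(truncated, total_posts, reported_posts, reported_percentage)
```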

Real-World Applications:

This query can be used to find the percentage of reported posts on a social media platform or other online forum. This information can be used to identify posts that may contain inappropriate content or violate the platform's terms of service.


Find Customer Referee

Problem Statement:

Given a table of customer records with their referee information, find all customers who have been referred by a customer with a specific ID, as well as the number of customers they have referred.

SQL Query:

WITH Referrals AS (
    SELECT 
        referred_by,
        COUNT(*) AS num_referrals
    FROM
        customers
    GROUP BY
        referred_by
)

SELECT 
    c.customer_id,
    c.name,
    COALESCE(r.num_referrals, 0) AS num_referrals
FROM 
    customers c
LEFT JOIN 
    Referrals r ON c.customer_id = r.referred_by
WHERE 
    c.referred_by = @specific_customer_id;

Explanation:

  1. Create a Common Table Expression (CTE) called Referrals: This CTE counts the number of referrals made by each customer by grouping the customers table by the referred_by column.

  2. Left join the customers table with the Referrals CTE: Each customer's customer_id is matched to the CTE's referred_by value, which attaches to every customer the number of people they have referred. The LEFT JOIN together with COALESCE keeps customers who have referred nobody and reports 0 for them; an inner join would drop them from the result.

  3. Filter the results: Only rows where the referred_by column in the customers table matches the specified customer ID are kept.

Simplified Explanation:

Imagine a table with customer information, including who referred them. To find all customers referred by a specific customer, we first count the number of referrals for each customer using a CTE. Then, we join this count information with the customer table, filtering it to show only the customers referred by the specified customer.
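A small run shows why customers with no referrals of their own need care. This is a minimal sketch, assuming SQLite through Python's sqlite3 module, a customers table with columns customer_id, name and referred_by, hypothetical sample rows, and a named parameter in place of @specific_customer_id. It uses a LEFT JOIN with COALESCE so that referred customers who have referred nobody are reported with 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT, referred_by INTEGER)")
# Hypothetical data: Ann referred Ben and Cara; Ben referred Dan and Eve.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", None), (2, "Ben", 1), (3, "Cara", 1), (4, "Dan", 2), (5, "Eve", 2)],
)

query = """
WITH Referrals AS (
    SELECT referred_by, COUNT(*) AS num_referrals
    FROM customers
    GROUP BY referred_by
)
SELECT c.customer_id, c.name, COALESCE(r.num_referrals, 0) AS num_referrals
FROM customers c
LEFT JOIN Referrals r ON c.customer_id = r.referred_by
WHERE c.referred_by = :specific_customer_id
ORDER BY c.customer_id
"""
referred_by_ann = conn.execute(query, {"specific_customer_id": 1}).fetchall()
print(referred_by_ann)
```

Cara appears with a count of 0; with an inner join she would not appear at all.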

Real-World Applications:

  • Referral programs: Tracking customer referrals to reward the referring customers.

  • Customer relationship management (CRM): Identifying key influencers or evangelists within a customer base.

  • Marketing campaigns: Targeting referred customers with tailored promotions to increase conversion rates.


Hopper Company Queries III

Problem Statement:

Find the total number of "Hopper" employees in the "Hopper Company" database.

Solution:

SELECT COUNT(*) AS TotalHopperEmployees
FROM Employees
WHERE Name LIKE '%Hopper%';

Simplified Explanation:

  1. SELECT COUNT(*) AS TotalHopperEmployees: This calculates the total number of rows in the "Employees" table where the "Name" column contains the substring "Hopper".

  2. FROM Employees: Specifies the table to query, which is "Employees".

  3. WHERE Name LIKE '%Hopper%': This is a filter condition that selects rows where the "Name" column contains the substring "Hopper" anywhere in the string. The percent signs (%) act as wildcards, allowing for partial matches.

Real-World Example:

In a human resources system for a large company, this query can be used to quickly count the employees whose name contains "Hopper". Because the pattern matches anywhere in the string, it also counts first names and longer names such as "Hopperton". This information could be used for various purposes, such as reporting on employee demographics.

Potential Applications:

  • Employee Management: Tracking the number of employees with specific characteristics, such as last name, job title, or department.

  • Hiring and Recruitment: Identifying candidates with desired qualifications or experience.

  • Performance Evaluation: Analyzing the distribution of employees by performance ratings or salary range.


Reported Posts II

Problem Statement

Given a table named ReportedPosts with the following schema:

CREATE TABLE ReportedPosts (
    reportId INT NOT NULL,
    postId INT NOT NULL,
    reporterId INT NOT NULL,
    PRIMARY KEY (reportId)
);

You need to write a SQL query to find the top 10 reported posts.

Optimized Solution

SELECT postId, COUNT(DISTINCT reporterId) AS report_count
FROM ReportedPosts
GROUP BY postId
ORDER BY report_count DESC
LIMIT 10;

Explanation

  • The query starts with a SELECT statement that retrieves the postId and the count of distinct reporterIds for each post. The DISTINCT keyword is used to ensure that each reporter is counted only once.

  • The FROM clause specifies the ReportedPosts table as the data source.

  • The GROUP BY clause groups the results by postId so that the count of distinct reporterIds can be computed for each post.

  • The ORDER BY clause sorts the results in descending order of report_count.

  • The LIMIT 10 clause limits the results to the top 10 reported posts.

Performance Analysis

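The schema above defines only the primary key on reportId, so as written the query has to read every row of ReportedPosts. An index that starts with postId, for example on (postId, reporterId), gives the database the option of computing the GROUP BY and the distinct count from the index alone. The final sort on report_count is computed from the aggregated rows and cannot use an index.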
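The query and a supporting index can be tried on a few rows. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the sample reports and the index name idx_reports_post_reporter are hypothetical, and postId is added as a tie-breaker only to make the order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ReportedPosts (
        reportId INT NOT NULL,
        postId INT NOT NULL,
        reporterId INT NOT NULL,
        PRIMARY KEY (reportId)
    )
    """
)
# Hypothetical index covering the columns that the query groups and counts.
conn.execute("CREATE INDEX idx_reports_post_reporter ON ReportedPosts (postId, reporterId)")

# Reporter 1 reports post 10 twice; DISTINCT counts that reporter once.
conn.executemany(
    "INSERT INTO ReportedPosts VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 10, 2), (3, 10, 3), (4, 10, 1), (5, 20, 1), (6, 20, 2), (7, 30, 1)],
)

top_reported = conn.execute(
    """
    SELECT postId, COUNT(DISTINCT reporterId) AS report_count
    FROM ReportedPosts
    GROUP BY postId
    ORDER BY report_count DESC, postId
    LIMIT 10
    """
).fetchall()
print(top_reported)
```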

Real-World Applications

This query can be used to identify the most reported posts on a social media platform or any other platform that allows users to report content. This information can be used to investigate the content and take appropriate action, such as removing the post or banning the user who posted it.


The Number of Seniors and Juniors to Join the Company II

Problem:

Count the number of seniors and juniors in a company with the following employee table:

CREATE TABLE Employee (
  id INT PRIMARY KEY,
  name VARCHAR(255),
  position VARCHAR(255)
);

Solution:

SELECT
  COUNT(CASE WHEN position = 'Senior' THEN 1 END) AS num_seniors,
  COUNT(CASE WHEN position = 'Junior' THEN 1 END) AS num_juniors
FROM
  Employee;

Breakdown:

  • The COUNT() function counts the rows for which its argument is not NULL.

  • The CASE expression checks the value of the position column. Because it has no ELSE branch, it returns NULL for rows that do not match, so each COUNT() tallies only the rows for its own position.

  • The WHEN clause is used to specify the condition that must be met for the CASE statement to return a value.

  • The THEN clause specifies the value that the CASE statement will return if the condition is met.

  • The END clause closes the CASE statement.

Real-World Application:

This query could be used by a company to track the number of senior and junior employees they have. This information could be used to make decisions about hiring, promotion, and training.

Example:

| num_seniors | num_juniors |
|-------------|-------------|
| 5            | 10           |

This example shows that the company has 5 senior employees and 10 junior employees.


Active Businesses

Problem Statement:

Given a table called Businesses that contains information about businesses, including their business_id, address, and active status. Find all active businesses.

SQL Solution:

SELECT *
FROM Businesses
WHERE active = 1;

Breakdown and Explanation:

  • SELECT *: This clause selects all columns from the Businesses table.

  • FROM Businesses: This clause specifies the table from which the data will be retrieved.

  • WHERE active = 1: This clause filters the results to only include businesses where the active column is set to 1. In this table, a value of 1 indicates that the business is active.

Example:

Consider the following Businesses table:

| business_id | address | active |
|---|---|---|
| 1 | 123 Main Street | 1 |
| 2 | 456 Elm Street | 0 |
| 3 | 789 Oak Street | 1 |

Running the query on this table will return the following result:

| business_id | address | active |
|---|---|---|
| 1 | 123 Main Street | 1 |
| 3 | 789 Oak Street | 1 |

These are the two businesses that are active.

Real-World Applications:

This query can be useful in various real-world applications, such as:

  • Creating a list of active businesses for a directory or website.

  • Identifying active businesses for marketing campaigns or promotions.

  • Tracking the status of businesses for regulatory purposes.


Delete Duplicate Emails

Problem: Delete Duplicate Emails

SQL Solution:

DELETE FROM users
WHERE id NOT IN (SELECT MIN(id) 
                  FROM users 
                  GROUP BY email);

Breakdown:

  • DELETE FROM users: This deletes rows from the "users" table.

  • WHERE id NOT IN(...): This condition checks for rows where the "id" column is not included in the list of minimum "id"s for each email address.

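  • SELECT MIN(id) FROM users GROUP BY email: This subquery finds the minimum "id" for each unique email address. Every row whose "id" is not in this list is a duplicate and is deleted. Note that MySQL rejects this form with error 1093 because the subquery reads the table being deleted from; there, the subquery has to be wrapped in a derived table.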

Simplified Explanation:

Imagine you have a table of users with columns for "id" and "email." Each email address may be associated with multiple rows (e.g., if a user has multiple accounts). We want to keep only the row with the smallest "id" for each email address and delete the duplicates.

The query accomplishes this by first identifying the minimum "id" for each email address. Then, it deletes all rows except those with the matching minimum "id."

Example:

| id | email         |
| --- | ------------- |
| 1   | user@example.com |
| 2   | user@example.com |
| 3   | another@example.com |
| 4   | another@example.com |

DELETE FROM users
WHERE id NOT IN (SELECT MIN(id) 
                  FROM users 
                  GROUP BY email);

Result:

| id | email         |
| --- | ------------- |
| 1   | user@example.com |
| 3   | another@example.com |
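The before and after tables can be reproduced with a short script. This is a minimal sketch, assuming SQLite through Python's sqlite3 module, where a DELETE may read its own target table in a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, email) VALUES (?, ?)",
    [
        (1, "user@example.com"),
        (2, "user@example.com"),
        (3, "another@example.com"),
        (4, "another@example.com"),
    ],
)

# Keep the smallest id per email and delete every other row.
cursor = conn.execute(
    """
    DELETE FROM users
    WHERE id NOT IN (SELECT MIN(id)
                      FROM users
                      GROUP BY email)
    """
)
deleted_count = cursor.rowcount
remaining = conn.execute("SELECT id, email FROM users ORDER BY id").fetchall()
print(deleted_count, remaining)
```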

Real-World Applications:

This query can be useful in various scenarios, such as:

  • Cleaning up user data tables by removing duplicate email addresses for better data integrity.

  • Enforcing unique email addresses in a registration system to prevent multiple accounts for the same user.

  • Identifying and removing duplicate emails from a marketing list to improve email deliverability rates.


Sales Analysis III

Problem Statement:

Given a sales database with tables Orders and Products, find the total sales for each product group.

Tables:

Orders:
- order_id: INTEGER
- customer_id: INTEGER
- product_id: INTEGER
- quantity: INTEGER
- unit_price: DECIMAL

Products:
- product_id: INTEGER
- product_group: STRING
- product_name: STRING

Solution:

SELECT
  p.product_group,
  SUM(o.quantity * o.unit_price) AS total_sales
FROM Orders AS o
JOIN Products AS p
  ON o.product_id = p.product_id
GROUP BY
  p.product_group;

Explanation:

  1. We start by joining the Orders and Products tables on the product_id column to link orders to products.

  2. We then use the SUM() function to calculate the total sales for each product group by multiplying the quantity ordered with the unit price and summing up the results.

  3. The GROUP BY clause groups the results by product group to show the total sales for each group.
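The join and aggregation described above can be sketched on a handful of rows. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the products, orders and prices are hypothetical and are not the figures shown in the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (order_id INTEGER, customer_id INTEGER, "
    "product_id INTEGER, quantity INTEGER, unit_price REAL)"
)
conn.execute(
    "CREATE TABLE Products (product_id INTEGER, product_group TEXT, product_name TEXT)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(1, "Electronics", "Phone"), (2, "Electronics", "Laptop"), (3, "Clothing", "Shirt")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [(1, 100, 1, 2, 300.0), (2, 101, 2, 1, 1000.0), (3, 100, 3, 5, 20.0)],
)

# Each order line contributes quantity * unit_price to its product's group.
group_totals = conn.execute(
    """
    SELECT p.product_group, SUM(o.quantity * o.unit_price) AS total_sales
    FROM Orders AS o
    JOIN Products AS p ON o.product_id = p.product_id
    GROUP BY p.product_group
    ORDER BY p.product_group
    """
).fetchall()
print(group_totals)
```

Here Electronics totals 2 * 300 + 1 * 1000 = 1600 and Clothing totals 5 * 20 = 100.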

Example:

| product_group | total_sales |
|---|---|
| Electronics | 10000 |
| Clothing | 5000 |
| Furniture | 2000 |

Real-World Applications:

This query can be used for various analytics and reporting purposes, such as:

  • Identifying the most profitable product groups

  • Analyzing sales trends and patterns

  • Making informed decisions about product development and marketing


Number of Trusted Contacts of a Customer

Problem: Given a table of customer information and a table of trusted contacts, determine the number of trusted contacts for each customer.

Table Schema:

  • customers: Contains customer data, including customer_id.

  • trusted_contacts: Contains trusted contact data, including customer_id and contact_id.

SQL Query:

SELECT c.customer_id, COUNT(DISTINCT tc.contact_id) AS num_trusted_contacts
FROM customers c
LEFT JOIN trusted_contacts tc ON c.customer_id = tc.customer_id
GROUP BY c.customer_id;

Explanation:

  • The LEFT JOIN combines the customers and trusted_contacts tables, preserving all customers even if they have no trusted contacts.

  • The COUNT(DISTINCT tc.contact_id) function counts the distinct contact IDs for each customer, giving us the total number of trusted contacts.

  • The GROUP BY c.customer_id clause groups the results by customer_id, ensuring that each customer has its own row with the corresponding number of trusted contacts.

Example:

customers table:

| customer_id | name |
|---|---|
| 1 | John Doe |
| 2 | Jane Smith |

trusted_contacts table:

| customer_id | contact_id |
|---|---|
| 1 | 101 |
| 1 | 102 |
| 2 | 201 |
| 2 | 202 |

Result:

| customer_id | num_trusted_contacts |
|---|---|
| 1 | 2 |
| 2 | 2 |
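The role of the LEFT JOIN becomes visible once a customer has no trusted contacts. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; it uses the example rows plus one hypothetical customer (Sam Lee) who has no contacts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE trusted_contacts (customer_id INTEGER, contact_id INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "John Doe"), (2, "Jane Smith"), (3, "Sam Lee")],  # customer 3 is hypothetical
)
conn.executemany(
    "INSERT INTO trusted_contacts VALUES (?, ?)",
    [(1, 101), (1, 102), (2, 201), (2, 202)],
)

# COUNT ignores the NULL contact_id produced by the LEFT JOIN,
# so a customer without contacts is reported with 0.
contact_counts = conn.execute(
    """
    SELECT c.customer_id, COUNT(DISTINCT tc.contact_id) AS num_trusted_contacts
    FROM customers c
    LEFT JOIN trusted_contacts tc ON c.customer_id = tc.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
    """
).fetchall()
print(contact_counts)
```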

Real-World Applications:

  • Customer Relationship Management (CRM) systems: Understand the level of trust between customers and their trusted contacts.

  • Fraud detection: Identify customers with an unusually large number of trusted contacts, which may indicate suspicious activity.

  • Social network analysis: Determine the connectivity and influence of customers within a social network.


The Most Recent Three Orders

Problem Statement: Given a table Orders that contains columns like order_id, customer_id, created_at, and total_amount, find the most recent three orders for each customer.

Solution:

-- Select the order_id, created_at, and total_amount for the most recent three orders for each customer.
SELECT order_id,
       created_at,
       total_amount
-- From the Orders table.
FROM Orders
-- Use a subquery to find the most recent three orders for each customer.
WHERE order_id IN (
    SELECT order_id
    FROM (
        SELECT order_id,
               customer_id,
               created_at,
               total_amount,
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS row_num
        FROM Orders
    ) AS subquery
    WHERE row_num <= 3
);

Explanation:

  • The main query selects the order_id, created_at, and total_amount for the most recent three orders for each customer.

  • The subquery finds the most recent three orders for each customer using the ROW_NUMBER() function.

  • The ROW_NUMBER() function assigns a sequential number to each row within a partition.

  • The PARTITION BY customer_id clause ensures that the rows are partitioned by customer_id.

  • The ORDER BY created_at DESC clause sorts the rows within each partition in descending order of created_at.

  • The row_num <= 3 condition selects the rows with the lowest three row numbers, which correspond to the most recent three orders for each customer.

  • The IN clause in the main query uses the subquery to select the order_ids of the most recent three orders for each customer.
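The behaviour of ROW_NUMBER() per customer can be checked on a few rows. This is a minimal sketch, assuming SQLite 3.25 or later (the first version with window functions) through Python's sqlite3 module; the orders are hypothetical, with four for customer 1 and two for customer 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (order_id INTEGER, customer_id INTEGER, "
    "created_at TEXT, total_amount REAL)"
)
# Hypothetical orders.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2024-01-01", 10.0),
        (2, 1, "2024-01-02", 20.0),
        (3, 1, "2024-01-03", 30.0),
        (4, 1, "2024-01-04", 40.0),
        (5, 2, "2024-01-01", 50.0),
        (6, 2, "2024-01-05", 60.0),
    ],
)

# Number each customer's orders from newest to oldest and keep numbers 1 to 3.
recent_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT order_id
        FROM (
            SELECT order_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer_id ORDER BY created_at DESC
                   ) AS row_num
            FROM Orders
        ) AS numbered
        WHERE row_num <= 3
        ORDER BY order_id
        """
    )
]
print(recent_ids)
```

Order 1 is the only one dropped: it is the fourth most recent order of customer 1, while customer 2 keeps both of its orders.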

Real-World Applications:

  • Identifying the most recent orders for a customer can be useful for various purposes, such as:

    • Providing personalized recommendations based on recent purchases.

    • Tracking the status of recent orders and providing updates to customers.

    • Identifying any issues or delays with recent orders.


Apples & Oranges

Problem Statement

Implement a SQL query to return a list of apples and oranges and their total count. The table is given as:

+---------+-------+
| Fruits  | Count |
+---------+-------+
| Apples  | 10    |
| Oranges | 5     |
| Apples  | 7     |
| Oranges | 10    |
+---------+-------+

Solution

-- Group the fruits by name and sum their counts
SELECT Fruits, SUM(Count) AS TotalCount
FROM Fruits
GROUP BY Fruits
ORDER BY Fruits;

Output

+---------+------------+
| Fruits  | TotalCount |
+---------+------------+
| Apples  | 17         |
| Oranges | 15         |
+---------+------------+

Explanation

  • GROUP BY Fruits: This clause groups the rows in the Fruits table by the Fruits column. This means that all rows with the same fruit name will be grouped together.

  • SUM(Count): This clause calculates the sum of the Count column for each group. This gives us the total count of each fruit.

  • ORDER BY Fruits: This clause sorts the results by the Fruits column in ascending order.
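The totals in the output can be confirmed by running the query on the four input rows. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The table and its first column share the name Fruits, as in the problem.
conn.execute("CREATE TABLE Fruits (Fruits TEXT, Count INTEGER)")
conn.executemany(
    "INSERT INTO Fruits VALUES (?, ?)",
    [("Apples", 10), ("Oranges", 5), ("Apples", 7), ("Oranges", 10)],
)

# One output row per fruit name, with the counts of its rows added up.
fruit_totals = conn.execute(
    """
    SELECT Fruits, SUM(Count) AS TotalCount
    FROM Fruits
    GROUP BY Fruits
    ORDER BY Fruits
    """
).fetchall()
print(fruit_totals)
```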

Real-World Application

This query can be used in a variety of real-world applications, such as:

  • Inventory management: To track the total count of apples and oranges in a warehouse.

  • Sales analysis: To determine which fruits are selling the best.

  • Market research: To gather data on the popularity of different fruits.


Recyclable and Low Fat Products

Problem Statement:

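You are given the following tables:

  • products: Each row represents a product with product_id, product_name, and product_type.

  • product_tags: Each row represents a tag associated with a product with product_id and tag_id.

  • tags: Each row maps a tag_id to a tag name such as 'recyclable' or 'low_fat'. The solution below relies on this table, although its contents are not listed in the example.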

Find all products that are both recyclable and low fat.

Example:

products:
+------------+--------------+--------------+
| product_id | product_name | product_type |
+------------+--------------+--------------+
| 1          | Apple        | Fruits       |
| 2          | Banana       | Fruits       |
| 3          | Milk         | Dairy        |
| 4          | Yogurt       | Dairy        |
| 5          | Cheese       | Dairy        |
+------------+--------------+--------------+

product_tags:
+------------+--------+
| product_id | tag_id |
+------------+--------+
| 1          | 1      |
| 2          | 2      |
| 3          | 3      |
| 4          | 4      |
| 5          | 5      |
| 1          | 6      |
+------------+--------+

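Result (assuming that tag_id 1 is 'recyclable' and tag_id 6 is 'low_fat'; the tags table is not shown):

+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 1          | Apple        |
+------------+--------------+

Apple is the only product with two rows in product_tags, so it is the only product that can carry both tags. Yogurt and Cheese each have a single tag and cannot qualify.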

Solution:

SELECT p.product_id, p.product_name
FROM products AS p
INNER JOIN product_tags AS pt ON p.product_id = pt.product_id
INNER JOIN tags AS t ON pt.tag_id = t.tag_id
WHERE t.tag IN ('recyclable', 'low_fat')
GROUP BY p.product_id, p.product_name
HAVING COUNT(DISTINCT t.tag) = 2;

Breakdown:

  1. Join Products and Tags: We join the products, product_tags, and tags tables to associate each product with the names of its tags.

  2. Filter by Tag Name: The WHERE clause keeps only the rows whose tag is 'recyclable' or 'low_fat'. A single product_tags row holds one tag_id, so it can never match both tags at once; the "both" condition has to be checked per product, not per row.

  3. Require Both Tags: We group the remaining rows by product and keep only the groups that contain two distinct tag names, which means the product has both the recyclable tag and the low fat tag.

  4. Final Result: We select the product_id and product_name of the products that pass the HAVING condition.
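The "has both tags" check can be tried on a small in-memory database. The sketch below uses Python's sqlite3 module; the rows and the contents of the tags lookup table are made up for illustration and are not part of the original problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);  -- hypothetical lookup table
CREATE TABLE product_tags (product_id INTEGER, tag_id INTEGER);
INSERT INTO products VALUES (1, 'Apple'), (2, 'Banana'), (3, 'Milk');
INSERT INTO tags VALUES (1, 'recyclable'), (2, 'low_fat');
-- Apple has both tags, Banana only 'recyclable', Milk only 'low_fat'
INSERT INTO product_tags VALUES (1, 1), (1, 2), (2, 1), (3, 2);
""")

# Keep rows for the two tags of interest, then require both per product.
rows = conn.execute("""
SELECT p.product_id, p.product_name
FROM products AS p
JOIN product_tags AS pt ON pt.product_id = p.product_id
JOIN tags AS t ON t.tag_id = pt.tag_id
WHERE t.tag IN ('recyclable', 'low_fat')
GROUP BY p.product_id, p.product_name
HAVING COUNT(DISTINCT t.tag) = 2
ORDER BY p.product_id
""").fetchall()

print(rows)
```

Only Apple is returned, because it is the only product linked to both tag names.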

Real-World Applications:

  • Consumer Information: Retailers can use this information to help consumers identify products that are both environmentally friendly and healthy.

  • Product Development: Manufacturers can use this analysis to develop and market products that meet the growing demand for sustainable and low-fat options.

  • Online Shopping: E-commerce platforms can provide filters and recommendations to guide customers towards recyclable and low fat products.


Leetcodify Friends Recommendations

Problem Statement

You are given a table Friends with the following columns:

  • user_id1

  • user_id2

This table represents friendships between users. Each row indicates that user user_id1 is friends with user user_id2.

You are also given a table RecommendedFriends with the following columns:

  • user_id

  • recommended_friend_id

This table contains potential friend recommendations for users. Each row indicates that user user_id may be interested in becoming friends with user recommended_friend_id.

Your task is to write a query to find a list of recommended friends for each user in the Friends table. The recommended friends should not already be friends with the user.

SQL Implementation

SELECT DISTINCT
    RF.user_id,
    RF.recommended_friend_id
FROM
    RecommendedFriends AS RF
INNER JOIN
    Friends AS F
ON
    RF.user_id IN (F.user_id1, F.user_id2)
WHERE
    NOT EXISTS (
        SELECT
            1
        FROM
            Friends AS X
        WHERE
            (X.user_id1 = RF.user_id AND X.user_id2 = RF.recommended_friend_id)
            OR (X.user_id2 = RF.user_id AND X.user_id1 = RF.recommended_friend_id)
    );

Explanation

  • The query joins the RecommendedFriends table (aliased as RF) with the Friends table (aliased as F) wherever the recommended-to user appears in either user_id1 or user_id2. This ensures that we only consider recommendations for users who are in the Friends table, whichever column they appear in.

  • A user with several friends matches several Friends rows, so the same recommendation would be repeated once per friendship. SELECT DISTINCT removes these repeats.

  • The WHERE clause uses a NOT EXISTS subquery to exclude any recommended friends who are already friends with the user. The subquery checks both directions: the user in user_id1 and the recommended friend in user_id2, or the other way around.

  • The final result is a table containing a list of recommended friends for each user in the Friends table.
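A minimal sketch of the filtering logic, run with Python's sqlite3 module on made-up rows (the data is an assumption for illustration). User 5 has a recommendation but no friendships, so the join drops it; pairs that are already friends in either direction are removed by NOT EXISTS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Friends (user_id1 INTEGER, user_id2 INTEGER);
CREATE TABLE RecommendedFriends (user_id INTEGER, recommended_friend_id INTEGER);
INSERT INTO Friends VALUES (1, 2), (1, 3);
INSERT INTO RecommendedFriends VALUES (1, 2), (1, 4), (2, 1), (2, 3), (5, 1);
""")

rows = conn.execute("""
SELECT DISTINCT RF.user_id, RF.recommended_friend_id
FROM RecommendedFriends AS RF
JOIN Friends AS F
  ON RF.user_id IN (F.user_id1, F.user_id2)      -- user appears in Friends
WHERE NOT EXISTS (                               -- pair is not already friends
    SELECT 1
    FROM Friends AS X
    WHERE (X.user_id1 = RF.user_id AND X.user_id2 = RF.recommended_friend_id)
       OR (X.user_id2 = RF.user_id AND X.user_id1 = RF.recommended_friend_id)
)
ORDER BY RF.user_id, RF.recommended_friend_id
""").fetchall()

print(rows)
```

The surviving recommendations are (1, 4) and (2, 3): users 1 and 2 are already friends with each other, and user 5 does not appear in Friends.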

Real-World Applications

This query can be used in a variety of real-world applications, including:

  • Social media: Recommending new friends to users on social media platforms.

  • E-commerce: Recommending products to users based on their past purchases and browsing history.

  • Customer relationship management (CRM): Identifying potential customers who may be interested in a particular product or service.


Drop Type 1 Orders for Customers With Type 0 Orders

Problem Statement:

You are given a table Orders with the following schema:

CREATE TABLE Orders (
  customer_id INT,  -- not unique: a customer may have several orders
  order_type INT,
  order_amount INT
);

You need to delete all orders of type 1 for customers who have at least one order of type 0.

Solution:

DELETE FROM Orders
WHERE order_type = 1
AND customer_id IN (SELECT customer_id FROM Orders WHERE order_type = 0);

Explanation:

  1. Identify Customers with Type 0 Orders: We use a subquery to select the customer_ids of all customers who have at least one order of type 0.

  2. Delete Type 1 Orders: We then use these customer_ids to delete all orders of type 1 for those customers.

Simplified Explanation:

Imagine you have a table of orders, where each row represents an order placed by a customer. Each order has an order type (0 or 1) and an order amount.

We want to delete all orders of type 1 for customers who have also placed at least one order of type 0.

To do this, we first find all the customers who have placed an order of type 0. Then, we use this list to delete all their orders of type 1.
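The delete can be checked on a few made-up rows with Python's sqlite3 module. This is only a sketch: SQLite accepts a subquery on the table being deleted from, whereas MySQL rejects that form (error 1093) unless the subquery is wrapped in a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (customer_id INTEGER, order_type INTEGER, order_amount INTEGER);
-- Customer 1 has both types, customer 2 only type 1, customer 3 only type 0
INSERT INTO Orders VALUES (1, 0, 50), (1, 1, 70), (2, 1, 30), (3, 0, 20);

DELETE FROM Orders
WHERE order_type = 1
  AND customer_id IN (SELECT customer_id FROM Orders WHERE order_type = 0);
""")

remaining = conn.execute("""
SELECT customer_id, order_type, order_amount
FROM Orders
ORDER BY customer_id, order_type
""").fetchall()

print(remaining)
```

Customer 1 loses the type 1 order, while customer 2 keeps theirs because they have no type 0 order.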

Real-World Applications:

This query can be used in various real-world scenarios, such as:

  • Customer Segmentation: Identifying customers who have placed specific types of orders can help businesses segment their customers and target marketing campaigns accordingly.

  • Inventory Management: Removing duplicate or unwanted orders can help optimize inventory levels and prevent overstocking.

  • Fraud Detection: Identifying customers who place orders with conflicting order types can help detect fraudulent activities.


Rectangles Area

Problem Statement:

You are given a table rectangles that contains the following columns:

  • id: Integer, the unique identifier of the rectangle.

  • x1: Integer, the x-coordinate of the lower left corner.

  • y1: Integer, the y-coordinate of the lower left corner.

  • x2: Integer, the x-coordinate of the upper right corner.

  • y2: Integer, the y-coordinate of the upper right corner.

Find the total area that is covered by all the rectangles.

Example:

Input:

+----+----+----+----+----+
| id | x1 | y1 | x2 | y2 |
+----+----+----+----+----+
| 1  | 1  | 1  | 4  | 5  |
| 2  | 3  | 2  | 5  | 7  |
| 3  | 6  | 3  | 8  | 6  |
+----+----+----+----+----+

Output:

28

Solution:

  1. Calculate the area of each rectangle:

SELECT id, (x2 - x1) * (y2 - y1) AS area
FROM rectangles;

  2. Find the total area:

SELECT SUM(area) AS total_area
FROM (
  SELECT id, (x2 - x1) * (y2 - y1) AS area
  FROM rectangles
) AS rectangle_areas;

Complete Solution:

SELECT SUM(area) AS total_area
FROM (
  SELECT (x2 - x1) * (y2 - y1) AS area
  FROM rectangles
) AS rectangle_areas;

Explanation:

  • The subquery calculates the area of each rectangle by subtracting x1 from x2 and y1 from y2, then multiplying the results. The alias (rectangle_areas) is required for a derived table in MySQL and PostgreSQL.

  • The outer query then sums the areas of all the rectangles to find the total area.

  • Each rectangle is measured independently, so a region where two rectangles overlap is counted once per rectangle. For the example, the individual areas are 12, 10, and 6, which sum to 28.
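As a quick check of the arithmetic, the following sketch loads the example rectangles into an in-memory SQLite database (via Python's sqlite3 module) and runs both steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rectangles (id INTEGER PRIMARY KEY, x1 INTEGER, y1 INTEGER, x2 INTEGER, y2 INTEGER);
INSERT INTO rectangles VALUES (1, 1, 1, 4, 5), (2, 3, 2, 5, 7), (3, 6, 3, 8, 6);
""")

# Step 1: area of each rectangle (width * height)
areas = conn.execute("""
SELECT id, (x2 - x1) * (y2 - y1) AS area
FROM rectangles
ORDER BY id
""").fetchall()

# Step 2: sum of the individual areas
total = conn.execute("""
SELECT SUM(area)
FROM (SELECT (x2 - x1) * (y2 - y1) AS area FROM rectangles) AS rectangle_areas
""").fetchone()[0]

print(areas, total)
```

The per-rectangle areas are 12, 10, and 6, and their sum is 28. Rectangles 1 and 2 overlap, so this sum is larger than the area of their union.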

Real-World Applications:

  • Calculating the total area covered by objects in a geographic area (e.g., buildings, parks, lakes).

  • Determining the amount of material needed to cover a surface (e.g., paint, fabric, flooring).

  • Estimating the amount of space required for a particular purpose (e.g., a warehouse, a parking lot, a garden).


Average Time of Process per Machine

Problem:

Find the average processing time of a process for each machine.

SQL Query:

-- Select the machine name and average processing time
SELECT machine_name,
       AVG(processing_time) AS average_processing_time
-- From the 'processes' table
FROM processes
-- Group the results by machine name
GROUP BY machine_name;

Breakdown and Explanation:

SELECT:

  • machine_name: Select the name of the machine.

  • AVG(processing_time): Calculate the average processing time for each machine.

FROM:

  • processes: The table containing the processing time data.

GROUP BY:

  • machine_name: Group the results by machine name to calculate the average processing time for each machine.

Real-World Application:

This query can be used in a manufacturing or production environment to:

  • Identify machines with longer processing times, indicating potential bottlenecks.

  • Compare the performance of different machines or configurations.

  • Set performance targets and monitor progress towards improving process efficiency.

Example:

Consider the following table:

| process_id | machine_name | processing_time |
|---|---|---|
| 1 | Machine A | 10 |
| 2 | Machine A | 15 |
| 3 | Machine B | 20 |
| 4 | Machine B | 25 |

The query would produce:

| machine_name | average_processing_time |
|---|---|
| Machine A | 12.5 |
| Machine B | 22.5 |

This indicates that Machine A has an average processing time of 12.5 units while Machine B has an average processing time of 22.5 units.
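The example can be reproduced with a short sketch using Python's sqlite3 module and the four rows shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE processes (process_id INTEGER PRIMARY KEY, machine_name TEXT, processing_time REAL);
INSERT INTO processes VALUES
  (1, 'Machine A', 10), (2, 'Machine A', 15),
  (3, 'Machine B', 20), (4, 'Machine B', 25);
""")

# One output row per machine, with the mean of its processing times
averages = conn.execute("""
SELECT machine_name, AVG(processing_time) AS average_processing_time
FROM processes
GROUP BY machine_name
ORDER BY machine_name
""").fetchall()

print(averages)
```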


Find Interview Candidates

Problem:

Find candidates for an interview based on their skills and experience.

Table Schema:

candidates (
  id INT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  skills VARCHAR(255) NOT NULL,
  experience VARCHAR(255) NOT NULL
);

interviews (
  id INT PRIMARY KEY,
  position VARCHAR(255) NOT NULL,
  required_skills VARCHAR(255) NOT NULL,
  required_experience VARCHAR(255) NOT NULL
);

Solution:

SELECT
  c.id,
  c.name
FROM candidates AS c
JOIN interviews AS i
  ON i.required_skills LIKE '%'||c.skills||'%'
  AND c.experience LIKE '%'||i.required_experience||'%'
WHERE
  i.position = 'Software Engineer';

Explanation:

  1. JOIN: Combine the candidates and interviews tables on matching skill and experience requirements.

  2. LIKE: Use the LIKE operator to compare candidates' skills and experience with the requirements for the position. Both comparisons are plain substring matches on text.

  3. '%'||c.skills||'%': Surround the candidate's skill with wildcard characters so that it matches when it appears anywhere in the required_skills list. The || operator concatenates strings in SQLite, PostgreSQL, and Oracle; MySQL uses CONCAT() by default.

  4. WHERE: Filter the results to only include candidates applying for the specified position, in this case, "Software Engineer."

Example:

candidates:
+----+------+--------+------------+
| id | name | skills | experience |
+----+------+--------+------------+
| 1  | John | Java   | 5 years    |
| 2  | Mary | Python | 3 years    |
| 3  | Tom  | C++    | 2 years    |
+----+------+--------+------------+

interviews:
+----+-------------------+-----------------+---------------------+
| id | position          | required_skills | required_experience |
+----+-------------------+-----------------+---------------------+
| 1  | Software Engineer | Java, Python    | 3 years             |
+----+-------------------+-----------------+---------------------+

Query:

SELECT
  c.id,
  c.name
FROM candidates AS c
JOIN interviews AS i
  ON i.required_skills LIKE '%'||c.skills||'%'
  AND c.experience LIKE '%'||i.required_experience||'%'
WHERE
  i.position = 'Software Engineer';

Result:

+----+------+
| id | name |
+----+------+
| 2  | Mary |
+----+------+

Mary is the only candidate returned: her skill (Python) appears in the required skills and her experience text matches '3 years'. John's skill matches too, but '5 years' does not contain the text '3 years', so a substring match cannot express "at least 3 years". Storing experience as a number would allow a >= comparison instead.

Real-World Applications:

  • Recruiting: Find potential candidates for job openings based on their skills and experience.

  • Talent Mapping: Identify internal employees with the right skills for promotions or new projects.

  • Skill Gap Analysis: Determine the skills that are lacking in an organization and develop training programs accordingly.


Find All Unique Email Domains

Problem: Find All Unique Email Domains

Description: Given a table of email addresses, find all unique domains.

SQL Query:

SELECT SUBSTR(email, INSTR(email, '@') + 1) AS domain
FROM email_table
GROUP BY domain;

Breakdown:

  1. Extract the Domain: The SUBSTR() function is used to extract the substring of the email address that starts after the '@' symbol. This is where the domain is located.

  2. Group by Domain: The results are grouped by the domain column using the GROUP BY clause. This ensures that only unique domains are returned.

Example:

| email |
|---|
| john@example.com |
| jane@example.com |
| bob@gmail.com |
| alice@yahoo.com |

Result:

| domain |
|---|
| example.com |
| gmail.com |
| yahoo.com |
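The extraction can be tried on the example rows with Python's sqlite3 module, which also provides SUBSTR() and INSTR(). This sketch uses SELECT DISTINCT, which returns the same set of domains as grouping by the domain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE email_table (email TEXT);
INSERT INTO email_table VALUES
  ('john@example.com'), ('jane@example.com'), ('bob@gmail.com'), ('alice@yahoo.com');
""")

# INSTR finds the position of '@'; SUBSTR returns everything after it.
domains = [row[0] for row in conn.execute("""
SELECT DISTINCT SUBSTR(email, INSTR(email, '@') + 1) AS domain
FROM email_table
ORDER BY domain
""")]

print(domains)
```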

Real-World Applications:

  • Email Marketing: Identifying unique email domains can help email marketers target specific audiences.

  • Spam Detection: Analyzing email domains can help identify potential spam or phishing attempts.

  • Data Analysis: Understanding the distribution of email domains can provide insights into user demographics and online behavior.


Hopper Company Queries II

Problem:

Given a table called Accounts with columns id, email, balance, and a table called Transactions with columns id, from_account, to_account, amount, and timestamp.

Write a SQL query to find the accounts with the highest balance. If there is more than one account with the highest balance, return all of them.

SQL Query:

SELECT id, email, balance
FROM Accounts
WHERE balance = (SELECT MAX(balance) FROM Accounts);

Explanation:

  • The subquery (SELECT MAX(balance) FROM Accounts) finds the maximum balance in the Accounts table.

  • The outer query selects all accounts with a balance equal to the maximum balance.

Real-World Example:

A bank wants to know which accounts have the highest balance. This information can be used to target marketing campaigns to these accounts.

Applications in Real World:

  • Identifying high-value customers for targeted marketing campaigns

  • Monitoring account balances for potential fraud or overdraft fees

  • Tracking the financial health of a business or organization


Friday Purchases II

Problem Statement: Find the total amount spent by customers on Fridays from the 'Sales' table.

SQL Query:

SELECT SUM(Amount)
FROM Sales
WHERE DAYNAME(Date) = 'Friday';

Breakdown:

  • SELECT SUM(Amount): Calculates the total amount spent by summing up the 'Amount' column.

  • FROM Sales: Specifies the table from which data will be retrieved.

  • WHERE DAYNAME(Date) = 'Friday': Filters the rows to include only sales made on Friday. The DAYNAME() function returns the name of the day of the week for a given date. It is a MySQL function; other databases provide different date functions for the same purpose.

Explanation:

The WHERE clause ensures that only records where the 'Date' column has a day name of 'Friday' are included in the calculation. The SUM() function computes the total amount spent by summing up the 'Amount' column across all eligible rows.
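A runnable sketch of the same filter using Python's sqlite3 module. SQLite has no DAYNAME(), so this version swaps in strftime('%w', ...), which returns the weekday as a digit ('0' for Sunday through '6' for Saturday, so '5' is Friday). The sample rows are made up for illustration.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (ID INTEGER PRIMARY KEY, Date TEXT, Amount INTEGER);
INSERT INTO Sales VALUES
  (1, '2023-01-01', 10),   -- Sunday
  (2, '2023-01-06', 20),   -- Friday
  (3, '2023-01-07', 15),   -- Saturday
  (4, '2023-01-13', 25);   -- Friday
""")

# Sanity check of the calendar: Monday is 0 in Python, so Friday is 4.
is_friday = date(2023, 1, 6).weekday() == 4

# Sum only the rows whose weekday digit is '5' (Friday).
friday_total = conn.execute("""
SELECT SUM(Amount)
FROM Sales
WHERE strftime('%w', Date) = '5'
""").fetchone()[0]

print(is_friday, friday_total)
```

The two Friday rows contribute 20 and 25, so the total is 45.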

Real-World Application:

This query can be used in various real-world scenarios:

  • Retail Analytics: To analyze customer behavior and understand their spending patterns on specific days of the week.

  • Sales Performance Monitoring: To track weekly sales performance and identify trends based on weekdays.

  • Loyalty Program Management: To award rewards or discounts to customers who make purchases on designated days, such as Fridays.

Complete Code Implementation:

-- Create the 'Sales' table
CREATE TABLE Sales (
  ID INT PRIMARY KEY,
  Date DATE,
  Amount INT
);

-- Insert sample data (2023-01-06 and 2023-01-13 are Fridays)
INSERT INTO Sales (ID, Date, Amount) VALUES
(1, '2023-01-01', 10),
(2, '2023-01-06', 20),
(3, '2023-01-07', 15),
(4, '2023-01-13', 25);

-- Execute the query
SELECT SUM(Amount)
FROM Sales
WHERE DAYNAME(Date) = 'Friday';

Expected Result:

The sample data contains two sales made on Fridays ('2023-01-06' and '2023-01-13'), so the query returns:

45

Generate the Invoice

Problem:

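You are given four tables:

  • Invoice: Contains invoice data such as invoice number, invoice date, customer ID, etc.

  • InvoiceLine: Contains invoice line item data such as product ID, quantity, unit price, etc.

  • Customer: Contains customer data such as customer ID and customer name.

  • Product: Contains product data such as product ID and product name.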

Write a query to generate an invoice for a specific invoice number. The invoice should include the following columns:

  • Invoice Number

  • Invoice Date

  • Customer ID

  • Customer Name (from the Customer table)

  • Product ID

  • Product Name (from the Product table)

  • Quantity

  • Unit Price

  • Line Total

  • Invoice Total

Solution:

SELECT
  Invoice.InvoiceNo,
  Invoice.InvoiceDate,
  Invoice.CustomerID,
  Customer.CustomerName,
  InvoiceLine.ProductID,
  Product.ProductName,
  InvoiceLine.Quantity,
  InvoiceLine.UnitPrice,
  InvoiceLine.Quantity * InvoiceLine.UnitPrice AS LineTotal,
  SUM(InvoiceLine.Quantity * InvoiceLine.UnitPrice)
    OVER (PARTITION BY Invoice.InvoiceNo) AS InvoiceTotal
FROM Invoice
JOIN Customer ON Invoice.CustomerID = Customer.CustomerID
JOIN InvoiceLine ON Invoice.InvoiceNo = InvoiceLine.InvoiceNo
JOIN Product ON InvoiceLine.ProductID = Product.ProductID
WHERE
  Invoice.InvoiceNo = 'INV0001';

Explanation:

  1. JOIN Tables: We first join the Invoice, Customer, InvoiceLine, and Product tables using appropriate foreign key relationships. This ensures that we can access data from all four tables.

  2. WHERE Clause: We use the WHERE clause to filter the results based on the specified invoice number ('INV0001' in this example).

  3. SELECT Clause: The SELECT clause specifies the columns to be included in the invoice. These include invoice details, customer information, product details, and invoice line item details such as quantity, unit price, and line total. The invoice total is computed with SUM() as a window function, so every line of the invoice carries the sum of all its line totals; this requires a database with window function support.
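One way to produce a per-line total together with an invoice-wide total is a window SUM. The sketch below uses Python's sqlite3 module (window functions need SQLite 3.25 or later) and only the InvoiceLine table, with made-up rows, to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE InvoiceLine (InvoiceNo TEXT, ProductID INTEGER, Quantity INTEGER, UnitPrice REAL);
INSERT INTO InvoiceLine VALUES
  ('INV0001', 1, 2, 10.0),
  ('INV0001', 2, 1, 5.5),
  ('INV0002', 1, 3, 10.0);
""")

# LineTotal is computed per row; InvoiceTotal repeats the invoice-wide sum on each row.
lines = conn.execute("""
SELECT InvoiceNo,
       ProductID,
       Quantity * UnitPrice AS LineTotal,
       SUM(Quantity * UnitPrice) OVER (PARTITION BY InvoiceNo) AS InvoiceTotal
FROM InvoiceLine
WHERE InvoiceNo = 'INV0001'
ORDER BY ProductID
""").fetchall()

print(lines)
```

Both lines of INV0001 show the same invoice total of 25.5 (20.0 + 5.5), while the row for INV0002 is filtered out.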

Real-World Application:

This query is useful for generating invoices for customers in various business applications, such as e-commerce platforms, billing systems, and accounting software. It provides a complete view of an invoice, including customer information, product details, and the total amount due.


All People Report to the Given Manager

Problem Statement:

Given two tables:

  • Employee (id, name, manager_id)

  • Manager (id, name, department_id)

Return a list of employees who report directly to the given manager.

SQL Solution:

SELECT e.name
FROM Employee e
INNER JOIN Manager m ON e.manager_id = m.id
WHERE m.name = 'Given Manager Name';

Breakdown:

  1. INNER JOIN: We join the Employee and Manager tables on the manager_id column to connect employees with their managers.

  2. WHERE Clause: We filter the results to include only employees whose managers have the specified name.

Example:

Employee:
+----+------+------------+
| id | name | manager_id |
+----+------+------------+
| 1  | John | 2          |
| 2  | Mary | 3          |
| 3  | Bob  | null       |
+----+------+------------+

Manager:
+----+-------+---------------+
| id | name  | department_id |
+----+-------+---------------+
| 1  | Alice | 10            |
| 2  | Tom   | 20            |
| 3  | Susan | 30            |
+----+-------+---------------+

Query:

SELECT e.name
FROM Employee e
INNER JOIN Manager m ON e.manager_id = m.id
WHERE m.name = 'Tom';

Result:

+------+
| name |
+------+
| John |
+------+

John is the only employee who reports directly to Tom, so his name is returned.

Real-World Applications:

  • Employee Management Systems: Identify the employees who report to a specific manager for performance evaluations, project assignments, or organizational restructuring.

  • HR Reporting: Generate reports on the number of employees reporting to each manager or the average salaries of teams under their supervision.

  • Team Building: Create a list of employees who work directly with a given manager for team-building activities or project collaboration.


Classifying Triangles by Lengths

Problem Statement:

You have a table triangles that contains the lengths of the three sides of a triangle: a, b, and c. Classify the triangles based on the lengths of their sides into three categories:

  • Equilateral: All three sides are equal.

  • Isosceles: Two of the three sides are equal.

  • Scalene: All three sides are different.

Best & Performant SQL Solution:

SELECT
  CASE
    WHEN a = b AND b = c
    THEN 'Equilateral'
    WHEN a = b OR b = c OR a = c
    THEN 'Isosceles'
    ELSE 'Scalene'
  END AS triangle_type
FROM triangles;

Implementation Details:

  • The CASE expression is used to determine the triangle type based on the values of a, b, and c.

  • If all three sides are equal (a = b = c), the triangle is Equilateral.

  • If any two sides are equal (a = b, b = c, or a = c), the triangle is Isosceles.

  • Otherwise, all three sides are different and the triangle is Scalene.

Example Usage:

CREATE TABLE triangles (a INT, b INT, c INT);
INSERT INTO triangles VALUES (3, 3, 3); -- Equilateral
INSERT INTO triangles VALUES (3, 4, 3); -- Isosceles
INSERT INTO triangles VALUES (2, 3, 4); -- Scalene

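SELECT a, b, c,
  CASE
    WHEN a = b AND b = c THEN 'Equilateral'
    WHEN a = b OR b = c OR a = c THEN 'Isosceles'
    ELSE 'Scalene'
  END AS triangle_type
FROM triangles;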

+---+---+---+--------------+
| a | b | c | triangle_type |
+---+---+---+--------------+
| 3 | 3 | 3 | Equilateral   |
| 3 | 4 | 3 | Isosceles     |
| 2 | 3 | 4 | Scalene       |
+---+---+---+--------------+
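The classification can be run with Python's sqlite3 module. The sketch below also adds one optional guard that is not part of the solution above: a first WHEN branch that labels side lengths violating the triangle inequality as 'Not a triangle'. The extra row (1, 2, 5) is made up to exercise that branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE triangles (a INTEGER, b INTEGER, c INTEGER);
INSERT INTO triangles VALUES (3, 3, 3), (3, 4, 3), (2, 3, 4), (1, 2, 5);
""")

# CASE branches are evaluated top to bottom; the first match wins.
classified = conn.execute("""
SELECT a, b, c,
  CASE
    WHEN a + b <= c OR a + c <= b OR b + c <= a THEN 'Not a triangle'  -- optional guard
    WHEN a = b AND b = c THEN 'Equilateral'
    WHEN a = b OR b = c OR a = c THEN 'Isosceles'
    ELSE 'Scalene'
  END AS triangle_type
FROM triangles
ORDER BY rowid
""").fetchall()

print(classified)
```

Because the Equilateral test comes before the Isosceles test, a triangle with three equal sides is never reported as Isosceles.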

Real-World Applications:

Triangle classification is used in various fields, including:

  • Engineering: Determining the stability and strength of structures based on the shape of their components.

  • Architecture: Designing buildings with specific structural properties.

  • Biology: Identifying and classifying plant and animal species based on their shape.

  • Computer Graphics: Rendering objects with different triangles to create realistic models.


Compute the Rank as a Percentage

Problem Statement

Given a table Scores with two columns:

  • score (an integer)

  • name (a string)

Write a SQL query to rank the students based on their scores in descending order, and output the rank as a percentage rounded to two decimal places.

Real World Application

  • Employee Performance Evaluation:

    • Rank employees based on their performance scores.

    • Calculate the percentage rank to provide a fair comparison among employees.

SQL Solution

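SELECT
    name,
    score,
    RANK() OVER (ORDER BY score DESC) AS rank,
    ROUND(RANK() OVER (ORDER BY score DESC) * 100.0 / COUNT(*) OVER (), 2) AS rank_percentage
FROM
    Scores;

Code Explanation

  • The RANK() function assigns a rank to each row based on the score column in descending order.

  • COUNT(*) OVER () returns the total number of rows in the table on every row. A plain COUNT(*) next to window functions would either raise an error or collapse the result to a single row, depending on the database.

  • Multiplying by 100.0 before dividing avoids integer division, which would otherwise truncate the result in many databases.

  • The ROUND() function rounds the rank percentage to two decimal places for better readability.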

Example

Input Table:

| name | score |
|---|---|
| John | 90 |
| Mary | 80 |
| Bob | 70 |

Output:

| name | score | rank | rank_percentage |
|---|---|---|---|
| John | 90 | 1 | 33.33 |
| Mary | 80 | 2 | 66.67 |
| Bob | 70 | 3 | 100.00 |

Breakdown

  • The RANK() function assigns ranks based on the score column in descending order:

    • John has the highest score of 90, so he is ranked 1st.

    • Mary has the next highest score of 80, so she is ranked 2nd.

    • Bob has the lowest score of 70, so he is ranked 3rd.

  • The COUNT(*) function returns the total number of rows in the table, which is 3.

  • The ROUND() function rounds the rank percentage (rank / total rows * 100) to two decimal places:

    • John's rank percentage is 33.33% (1 / 3 * 100).

    • Mary's rank percentage is 66.67% (2 / 3 * 100).

    • Bob's rank percentage is 100.00% (3 / 3 * 100).
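The arithmetic above can be checked with Python's sqlite3 module (window functions need SQLite 3.25 or later). In this sketch the rank column is aliased as score_rank, a name chosen here because RANK is a reserved word in some databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Scores (name TEXT, score INTEGER);
INSERT INTO Scores VALUES ('John', 90), ('Mary', 80), ('Bob', 70);
""")

# rank * 100.0 / total rows, rounded to two decimals
ranked = conn.execute("""
SELECT name,
       score,
       RANK() OVER (ORDER BY score DESC) AS score_rank,
       ROUND(RANK() OVER (ORDER BY score DESC) * 100.0 / COUNT(*) OVER (), 2) AS rank_percentage
FROM Scores
ORDER BY score_rank
""").fetchall()

for row in ranked:
    print(row)
```

With this formula the top score gets the smallest percentage (1 of 3 rows, about 33.33) and the lowest score gets 100.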


Total Traveled Distance

Problem Statement:

Given a table logs that tracks the locations of a fleet of vehicles, calculate the total distance traveled by each vehicle between each pair of consecutive timestamps.

Table Schema:

CREATE TABLE logs (
  vehicle_id INT NOT NULL,
  timestamp TIMESTAMP NOT NULL,
  longitude FLOAT NOT NULL,
  latitude FLOAT NOT NULL,
  PRIMARY KEY (vehicle_id, timestamp)
);

SQL Solution:

WITH CTE AS (
  SELECT
    vehicle_id,
    timestamp,
    longitude,
    latitude,
    LAG(timestamp) OVER (PARTITION BY vehicle_id ORDER BY timestamp) AS prev_timestamp,
    LAG(longitude) OVER (PARTITION BY vehicle_id ORDER BY timestamp) AS prev_longitude,
    LAG(latitude) OVER (PARTITION BY vehicle_id ORDER BY timestamp) AS prev_latitude
  FROM
    logs
)
SELECT
  vehicle_id,
  prev_timestamp,
  timestamp,
  3959 * acos(
    cos(radians(prev_latitude)) * cos(radians(latitude))
    * cos(radians(prev_longitude) - radians(longitude))
    + sin(radians(prev_latitude)) * sin(radians(latitude))
  ) AS distance_traveled
FROM
  CTE
WHERE
  prev_timestamp IS NOT NULL
ORDER BY
  vehicle_id,
  prev_timestamp;

Explanation:

This solution uses a Common Table Expression (CTE) to calculate the distance traveled by each vehicle between consecutive timestamps. Here's how it works:

  1. Create the CTE: The CTE, named CTE, selects the relevant columns from the logs table and adds three additional columns:

    • prev_timestamp: The previous timestamp for each row.

    • prev_longitude: The previous longitude for each row.

    • prev_latitude: The previous latitude for each row.

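  2. Calculate the Distance: For each row, the distance_traveled column is calculated using the spherical law of cosines, which measures the great-circle distance between two points on a sphere (Earth). The constant 3959 is the Earth's radius in miles, so the result is in miles. The Haversine formula is a common alternative for the same quantity.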

  3. Filter Out Null Values: The WHERE clause filters out rows where prev_timestamp is null, as these rows represent the first location entry for each vehicle and have no previous data to calculate the distance from.

  4. Sort the Results: The results are sorted by vehicle_id and prev_timestamp to group the distances traveled by each vehicle in chronological order.
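To get a feel for the distance expression, the sketch below re-implements it in plain Python and compares it with the Haversine formula for two nearby points. The function names are chosen here for illustration; the clamp before acos is an added safeguard against rounding pushing the cosine slightly outside [-1, 1].

```python
import math

EARTH_RADIUS_MILES = 3959  # same constant as in the SQL query


def law_of_cosines_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance using the same expression as the SQL query."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon1) - math.radians(lon2)
    cos_angle = (math.cos(p1) * math.cos(p2) * math.cos(dlon)
                 + math.sin(p1) * math.sin(p2))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return EARTH_RADIUS_MILES * math.acos(cos_angle)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance using the Haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Two points a short walk apart
d_cos = law_of_cosines_miles(37.7749, -122.4194, 37.7758, -122.4205)
d_hav = haversine_miles(37.7749, -122.4194, 37.7758, -122.4205)
print(d_cos, d_hav)
```

Both functions agree closely here; the points are less than a tenth of a mile apart.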

Example:

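Consider the following logs table:

| vehicle_id | timestamp | longitude | latitude |
|---|---|---|---|
| 1 | 2022-01-01 10:00:00 | -122.4194 | 37.7749 |
| 1 | 2022-01-01 12:00:00 | -122.4205 | 37.7758 |
| 2 | 2022-01-01 11:00:00 | -118.2437 | 34.0522 |

The query would produce the following result:

| vehicle_id | prev_timestamp | timestamp | distance_traveled |
|---|---|---|---|
| 1 | 2022-01-01 10:00:00 | 2022-01-01 12:00:00 | approximately 0.09 (miles) |

Vehicle 2 has a single log entry, so it has no previous position and its row is removed by the WHERE clause.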

Real-World Applications:

This solution can be used for various real-world applications, such as:

  • Fleet Management: Tracking the total distance traveled by vehicles in a fleet helps monitor fuel consumption, maintenance schedules, and driver performance.

  • Ride-Sharing Services: Calculating the distance traveled for ride-sharing trips is used to determine fares and provide insights into traffic patterns.

  • Logistics and Supply Chain Management: Measuring the distance traveled by trucks or ships can help optimize routes, reduce transportation costs, and improve delivery times.


Rank Scores

Problem:

Given a table of scores:

+----+-------+
| ID | Score |
+----+-------+
| 1  | 90    |
| 2  | 80    |
| 3  | 70    |
| 4  | 60    |
| 5  | 50    |
+----+-------+

Rank the scores in descending order, with ranks starting from 1 for the highest score.

Solution:

SELECT ID, Score,
       ROW_NUMBER() OVER (ORDER BY Score DESC) AS Rank
FROM Scores;

Breakdown:

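  • ROW_NUMBER() OVER (ORDER BY Score DESC): This function assigns ranks to each row based on the Score column, starting from 1 for the highest score and incrementing for each subsequent row. Rows with equal scores still receive different numbers; if tied scores should share a rank, DENSE_RANK() can be used in place of ROW_NUMBER().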

Example:

+----+-------+------+
| ID | Score | Rank |
+----+-------+------+
| 1  | 90    | 1    |
| 2  | 80    | 2    |
| 3  | 70    | 3    |
| 4  | 60    | 4    |
| 5  | 50    | 5    |
+----+-------+------+
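The example has no tied scores, so it does not show how ties are handled. The sketch below, using Python's sqlite3 module (SQLite 3.25 or later) and made-up rows with one tie, puts ROW_NUMBER() and DENSE_RANK() side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Scores (ID INTEGER PRIMARY KEY, Score INTEGER);
INSERT INTO Scores VALUES (1, 90), (2, 80), (3, 80), (4, 70);
""")

# ROW_NUMBER numbers every row; DENSE_RANK gives tied scores the same rank.
ranks = conn.execute("""
SELECT ID,
       Score,
       ROW_NUMBER() OVER (ORDER BY Score DESC, ID) AS row_num,
       DENSE_RANK() OVER (ORDER BY Score DESC)     AS dense
FROM Scores
ORDER BY Score DESC, ID
""").fetchall()

for row in ranks:
    print(row)
```

The two rows with a score of 80 get row numbers 2 and 3 but share dense rank 2, and the next score gets dense rank 3.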

Real-World Applications:

  • Ranking bidders in an auction based on their bids.

  • Sorting students in a class based on their test scores.

  • Identifying top-performing employees based on their sales figures.


Snaps Analysis

Problem: Find the average rating of movies released in a specific year.

SQL Query:

SELECT AVG(rating) AS avg_rating
FROM movies
WHERE YEAR(release_date) = 2020;

Breakdown:

  • SELECT AVG(rating) AS avg_rating: Calculates the average rating of movies.

  • FROM movies: Specifies the movies table to retrieve data from.

  • WHERE YEAR(release_date) = 2020: Filters the movies by release year, selecting only movies released in 2020.

Simplified Explanation:

We get the average rating of movies by adding up all the ratings and dividing by the total number of movies. We only include movies released in 2020 by filtering the results.

Real-World Application:

  • Movie Recommendation Systems: To determine the average rating of movies released in a particular year, helping users find popular movies.

  • Entertainment Industry Analysis: To track trends in movie ratings over time.

  • Marketing Campaigns: To gauge the success of movie releases by comparing their average rating to other movies released in the same year.


Sales Analysis I

Problem Statement:

Given two tables:

  • Sales (id, product_id, quantity, price)

  • Products (id, name, category)

Find the total sales amount for each category in the Products table.

Solution:

SELECT p.category, SUM(s.quantity * s.price) AS total_sales
FROM Sales s
JOIN Products p ON s.product_id = p.id
GROUP BY p.category;

Breakdown:

  1. JOIN: Join the Sales and Products tables on the product_id column to connect sales data with product categories.

  2. SUM(): Use SUM() to calculate the total sales for each category by multiplying the quantity sold by the price and summing the results for each row in the joined table.

  3. GROUP BY: Group the results by the category column to get the total sales for each category.

Explanation:

Imagine a grocery store that sells various products. The Sales table records the sales transactions, including the product ID, quantity sold, and price. The Products table contains the category of each product.

To find the total sales for each category, we need to:

  1. Link the sales data to the product categories using the product ID.

  2. Calculate the total sales for each product by multiplying the quantity sold by the price.

  3. Sum up the total sales for each category to get the final results.

Real-World Application:

This solution can be useful in various business scenarios:

  • Product Performance Analysis: Determine which product categories are performing well or need improvement.

  • Inventory Management: Identify categories with high or low sales to optimize inventory levels and reduce waste.

  • Marketing Campaign Evaluation: Analyze the effectiveness of marketing campaigns by comparing sales performance across different categories.

  • Customer Segmentation: Group customers based on the categories they purchase to tailor marketing efforts and product recommendations.


Bank Account Summary II

Problem Statement:

Given a table BankAccounts with the following schema:

| Column | Type |
|---|---|
| account_id | int |
| account_number | varchar(255) |
| account_balance | decimal(10, 2) |

Write a SQL query to summarize the account balances for each account holder, with the account number and account balance.

Solution:

SELECT
  account_number,
  SUM(account_balance) AS total_balance
FROM BankAccounts
GROUP BY account_number;

Explanation:

  1. SELECT account_number, SUM(account_balance) AS total_balance: This part of the query selects the account number and the sum of account balances for each account number. The SUM() function is used to add up the account balances for each account number. The resulting column is aliased as total_balance.

  2. FROM BankAccounts: This part specifies the table from which to retrieve the data. In this case, it is the BankAccounts table.

  3. GROUP BY account_number: This part groups the rows in the result set by the account number. This means that all rows with the same account number will be grouped together. The total_balance will be calculated for each group.

Example:

Consider the following data in the BankAccounts table:

| account_id | account_number | account_balance |
|---|---|---|
| 1 | 123456789 | 100.00 |
| 2 | 987654321 | 200.00 |
| 3 | 123456789 | 50.00 |
| 4 | 987654321 | 100.00 |

The SQL query will return the following result:

| account_number | total_balance |
|---|---|
| 123456789 | 150.00 |
| 987654321 | 300.00 |

Real-World Application:

This query can be used in various real-world applications, such as:

  • Bank reports: Banks can use this query to generate reports on account balances for customers.

  • Financial analysis: Financial analysts can use this query to analyze the distribution of account balances across different accounts.

  • Customer service: Customer service representatives can use this query to quickly determine the total balance of a customer's accounts.


Arrange Table by Gender

Problem:

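Given a table with columns name and gender, arrange the rows by gender in a custom order so that male rows come first and female rows second. (A plain ORDER BY gender would sort alphabetically and put female first.)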

Solution:

SQL Query:

SELECT *
FROM table_name
ORDER BY CASE
  WHEN gender = 'male' THEN 0
  WHEN gender = 'female' THEN 1
  ELSE 2  -- Handle any other gender values (optional)
END;

Breakdown:

  • SELECT * FROM table_name: Selects all columns from the specified table.

  • ORDER BY clause: Arranges the rows in ascending order based on the specified condition.

  • CASE expression:

    • The CASE expression evaluates the gender column and assigns a numeric value to each row:

      • 0 for rows with gender = 'male'

      • 1 for rows with gender = 'female'

      • 2 (or any other value) for rows with other gender values (if handling them is necessary)

  • The rows are then sorted in ascending order based on these numeric values, effectively grouping the rows by gender.
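
The custom ordering can be tried out with a short script. This is a minimal sketch using Python's sqlite3 module and a hypothetical table name, people. It adds name as a second sort key, which is not part of the original query, so that the order inside each gender group is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, gender TEXT)")  # hypothetical table name
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("John", "male"), ("Mary", "female"), ("Alice", "female"), ("Bob", "male")],
)

# The CASE expression maps each gender to a sort rank; name breaks ties.
rows = conn.execute(
    """
    SELECT name, gender
    FROM people
    ORDER BY CASE
               WHEN gender = 'male' THEN 0
               WHEN gender = 'female' THEN 1
               ELSE 2
             END,
             name
    """
).fetchall()
print(rows)
```

Without a tie-breaker such as name, the database is free to return the rows of one gender in any order.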

Example:

Input Table:

| name | gender |
|---|---|
| John | male |
| Mary | female |
| Alice | female |
| Bob | male |

Result:

| name | gender |
|---|---|
| John | male |
| Bob | male |
| Mary | female |
| Alice | female |

Real-World Applications:

  • Arranging employee data by gender for reporting or analysis

  • Displaying user information in a sorted manner based on gender in social networks or e-commerce websites

  • Aggregating data and generating statistics related to gender distribution


Status of Flight Tickets

Problem: Given a table of flight bookings, find the status of each flight, which can be "Cancelled", "Active", or "Completed".

Schema:

CREATE TABLE Flights (
  id INT PRIMARY KEY,
  start_date DATE,
  end_date DATE,
  status VARCHAR(255)
);

SQL Solution:

SELECT
  id,
  CASE
    WHEN status = 'Cancelled' THEN 'Cancelled'   -- assumes cancellations are recorded in the stored status column
    WHEN end_date < CURRENT_DATE THEN 'Completed'
    ELSE 'Active'
  END AS flight_status
FROM
  Flights;

Explanation:

The query uses a CASE expression to determine the status of each flight. Dates alone cannot tell whether a flight was cancelled, so this version assumes that cancellations are recorded in the stored status column. The conditions are checked in order:

  • If the stored status is 'Cancelled', the flight is reported as "Cancelled".

  • Otherwise, if the end_date is earlier than CURRENT_DATE, the flight is considered "Completed".

  • Otherwise the flight is upcoming or in progress and is considered "Active".

Real-World Application:

This query can be used to provide real-time information about flight status to passengers or travel agents. It can also be used to track the performance of flights and identify patterns in cancellations or delays.

Additional Notes:

  • The query assumes that the start_date and end_date columns represent the dates when the flight is scheduled to depart and arrive, respectively.

  • The query can be modified to handle additional flight statuses, such as "Delayed" or "Rescheduled".

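  • Because the query has no WHERE clause, it reads every row, and an index does not speed it up. Indexes on start_date and end_date become useful once the query filters on those columns, for example to list only the flights departing in a given week.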
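
The status logic can be checked against a fixed date. The sketch below runs in SQLite through Python's sqlite3 module and passes the date as a parameter instead of using CURRENT_DATE, so the result is reproducible. It assumes that cancellations are recorded in the stored status column and that any flight not yet finished counts as "Active"; the sample rows and the function name flight_statuses are hypothetical.

```python
import sqlite3


def flight_statuses(conn, today):
    """Return (id, status) pairs, evaluated as of the ISO date string `today`."""
    return conn.execute(
        """
        SELECT id,
               CASE
                 WHEN status = 'Cancelled' THEN 'Cancelled'
                 WHEN end_date < :today THEN 'Completed'
                 ELSE 'Active'
               END AS flight_status
        FROM Flights
        ORDER BY id
        """,
        {"today": today},
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Flights ("
    "id INTEGER PRIMARY KEY, start_date TEXT, end_date TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO Flights VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-01", "2024-01-02", None),         # already finished
        (2, "2024-01-09", "2024-01-11", None),         # in progress on 2024-01-10
        (3, "2024-02-01", "2024-02-02", None),         # not yet departed
        (4, "2024-01-05", "2024-01-06", "Cancelled"),  # cancelled, dates in the past
    ],
)

statuses = flight_statuses(conn, "2024-01-10")
print(statuses)
```

Dates are stored as ISO-8601 text here, which compares correctly as strings in SQLite.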


Human Traffic of Stadium

Problem:

Query the database to find information about human traffic at a stadium on a specific day.

SQL Implementation:

SELECT
  *
FROM HumanTraffic
WHERE
  Date = '2023-03-08';

Breakdown:

  • SELECT *: Selects all columns from the HumanTraffic table.

  • FROM HumanTraffic: Specifies the table to query from.

  • WHERE Date = '2023-03-08': Filters the results to include only records where the Date column matches the specified date.

Explanation:

This query retrieves all the rows from the HumanTraffic table where the Date column is equal to the specified date. It can be used to analyze the human traffic at a stadium on a particular day, such as:

  • Total number of visitors: If the table stores one row per visitor entry, count the rows in the result to get the total number of visitors who entered the stadium that day.

  • Peak and off-peak hours: Analyze the data to determine the hours when the stadium had the highest and lowest traffic.

  • Demographics: Use additional columns in the HumanTraffic table, such as Age and Gender, to understand the demographic profile of the visitors.

Real-World Applications:

This query can be used in the following real-world applications:

  • Stadium management: Optimize stadium operations and staffing based on expected traffic patterns.

  • Event planning: Plan events and promotions that attract the desired number and type of attendees.

  • City planning: Understand how human traffic impacts transportation and infrastructure in the area surrounding the stadium.


The Latest Login in 2020

Problem Statement:

Find the latest login timestamp for each user in the year 2020.

Solution:

SELECT user_id, MAX(login_timestamp) AS latest_login
FROM login_log
WHERE YEAR(login_timestamp) = 2020
GROUP BY user_id;

Breakdown:

  • SELECT user_id, MAX(login_timestamp) AS latest_login: Selects the user ID and the maximum login timestamp as the latest login.

  • FROM login_log: Specifies the table containing the login records.

  • WHERE YEAR(login_timestamp) = 2020: Filters the records to include only those in the year 2020.

  • GROUP BY user_id: Groups the records by user ID to find the latest login for each user.
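
YEAR() is a MySQL function, and wrapping the column in a function generally prevents the database from using an index on login_timestamp. A portable alternative is a half-open date range. The following is a minimal sketch in SQLite via Python's sqlite3 module; it uses the sample rows from the example plus one extra 2021 row (not in the sample) to show that the filter excludes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login_log (user_id INTEGER, login_timestamp TEXT)")
conn.executemany(
    "INSERT INTO login_log VALUES (?, ?)",
    [
        (1, "2020-01-01 00:00:00"),
        (2, "2020-01-02 00:00:00"),
        (1, "2020-01-03 00:00:00"),
        (3, "2020-01-04 00:00:00"),
        (2, "2020-01-05 00:00:00"),
        (1, "2020-01-06 00:00:00"),
        (1, "2021-02-01 00:00:00"),  # extra row outside 2020, added for illustration
    ],
)

# Half-open range: from the first instant of 2020 up to, but excluding, 2021.
latest = conn.execute(
    """
    SELECT user_id, MAX(login_timestamp) AS latest_login
    FROM login_log
    WHERE login_timestamp >= '2020-01-01 00:00:00'
      AND login_timestamp <  '2021-01-01 00:00:00'
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
print(latest)
```

The range form returns the same rows as the YEAR() filter while leaving the column bare, which lets an index on login_timestamp be used where one exists.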

Example:

Consider the following login_log table:

| user_id | login_timestamp |
|---|---|
| 1 | 2020-01-01 00:00:00 |
| 2 | 2020-01-02 00:00:00 |
| 1 | 2020-01-03 00:00:00 |
| 3 | 2020-01-04 00:00:00 |
| 2 | 2020-01-05 00:00:00 |
| 1 | 2020-01-06 00:00:00 |

The query would return the following result:

| user_id | latest_login |
|---|---|
| 1 | 2020-01-06 00:00:00 |
| 2 | 2020-01-05 00:00:00 |
| 3 | 2020-01-04 00:00:00 |

Real-World Applications:

  • User Activity Analysis: Identifying the most recent login for each user can help track user engagement and identify any potential issues with account access.

  • Fraud Detection: Comparing the latest login timestamp to known successful logins can help detect unauthorized access attempts.

  • Security Auditing: Tracking the latest login for critical accounts can help identify any suspicious activity and improve security.


Calculate Compressed Mean

Problem Statement:

Given a table of numbers, find the compressed mean of all the positive numbers. Compressed mean is defined as the mean of all positive numbers, rounded to the nearest integer.

Example:

  • Input Table:

| number |
| ------- |
| 1 |
| 3 |
| 5 |
| -2 |
| 4 |

  • Output:

| compressed_mean |
| --------------- |
| 3 |

Solution:

SELECT ROUND(AVG(number)) AS compressed_mean
FROM table_name
WHERE number > 0;

Explanation:

  1. Average: The WHERE number > 0 clause keeps only the positive numbers, and AVG(number) calculates their average.

  2. Rounding: ROUND() rounds the average to the nearest integer. This is what "compressed" means in compressed mean.

Examples:

  • Input: (1, 3, 5, -2, 4)

  • Output: 3 (The average of 1, 3, 5, 4 is 3.25, rounded to 3)

  • Input: (2, 4, 6, 8, 10)

  • Output: 6 (The average of 2, 4, 6, 8, 10 is 6)
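
The first example can be verified with a few lines of code. This is a minimal sketch using Python's sqlite3 module and a hypothetical table name, numbers; note that SQLite's ROUND returns a floating-point value, so the result prints as 3.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (number INTEGER)")  # hypothetical table name
conn.executemany("INSERT INTO numbers VALUES (?)", [(1,), (3,), (5,), (-2,), (4,)])

# WHERE drops -2 before AVG runs, so the mean is taken over 1, 3, 5 and 4.
mean, compressed_mean = conn.execute(
    """
    SELECT AVG(number), ROUND(AVG(number))
    FROM numbers
    WHERE number > 0
    """
).fetchone()
print(mean, compressed_mean)
```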

Real-World Applications:

Compressed mean can be useful in various scenarios:

  • Average Ratings: In online review systems, the compressed mean of ratings can provide a simplified and meaningful representation of the average rating.

  • Temperature Analysis: In weather data, the compressed mean temperature for a month can give a quick overview of the typical temperature range.

  • Health Metrics: In medical settings, compressed mean can be used to calculate the average blood sugar level or other health metrics that often fall within an integer range.


Patients With a Condition

Problem Statement:

Find all patients who have a specific condition.

SQL Query:

SELECT patient_id, condition_name
FROM PatientConditions
WHERE condition_name = 'Asthma';

Breakdown:

1. Table:

The problem mentions a table called PatientConditions, which contains the following columns:

  • patient_id: The unique identifier for each patient.

  • condition_name: The name of the condition that the patient has.

2. WHERE Clause:

The WHERE clause specifies which patients to retrieve. In this case, we want to find all patients who have the condition Asthma. The = operator checks for equality between the condition_name column and the value 'Asthma'.

3. Output:

The query selects two columns from the table:

  • patient_id: The unique identifier for each patient.

  • condition_name: The name of the condition that the patient has.

Simplified Explanation:

Imagine you have a table with records of all patients and their conditions. You want to find all the patients who have asthma.

1. Table:

It's like a big spreadsheet with rows for each patient and columns for different information, like their patient ID and the conditions they have.

2. WHERE Clause:

This is like a filter that you use to narrow down the list of patients. You're saying that you only want to see patients who have asthma.

3. Output:

The query will give you a list of all the patients who have asthma, along with their patient IDs.

Real-World Applications:

  • Identifying patients for clinical trials or studies focused on specific conditions.

  • Tracking the prevalence of different conditions within a population.

  • Providing information to patients about their health conditions and treatment options.


Calculate Orders Within Each Interval

Problem Statement:

Given a table Orders with the following schema:

| Column | Type |
|---|---|
| order_id | integer |
| order_date | date |

Calculate the number of orders within each interval of days (e.g., 7 days).

SQL Solution:

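SELECT
    MIN(o.order_date) AS first_order_date,
    COUNT(*) AS order_count
FROM Orders AS o
CROSS JOIN (SELECT MIN(order_date) AS start_date FROM Orders) AS s
GROUP BY FLOOR(DATEDIFF(o.order_date, s.start_date) / 7)
ORDER BY first_order_date;

The query is written for MySQL, where DATEDIFF(a, b) returns the number of days from b to a. An aggregate such as MIN(order_date) cannot appear directly inside GROUP BY, so the earliest order date is computed once in a derived table and joined to every row.

Example:

Suppose we have the following Orders table:

| order_id | order_date |
|---|---|
| 1 | 2022-01-01 |
| 2 | 2022-01-02 |
| 3 | 2022-01-03 |
| 4 | 2022-01-04 |
| 5 | 2022-01-07 |
| 6 | 2022-01-09 |
| 7 | 2022-01-10 |

The query result will be:

| first_order_date | order_count |
|---|---|
| 2022-01-01 | 5 |
| 2022-01-09 | 2 |

The first interval covers 2022-01-01 through 2022-01-07 and contains five orders; the second covers 2022-01-08 through 2022-01-14 and contains two. The first_order_date column is the earliest order that falls in the interval, not the interval boundary itself.

Explanation:

  • FLOOR(DATEDIFF(o.order_date, s.start_date) / 7): This expression calculates the interval number for each order date. It takes the number of days between the order date and the earliest order date and divides the result by 7 (the interval size). The FLOOR function rounds down to the nearest integer, assigning each order date to an interval (0 for the first seven days, 1 for the next seven, and so on).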

  • GROUP BY: The GROUP BY clause groups the results by the interval number, effectively counting the number of orders within each interval.
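
The bucketing idea can be tried in SQLite, which has no DATEDIFF but can subtract Julian day numbers. This is a minimal sketch via Python's sqlite3 module; the function name orders_per_interval and the label first_order_date (the earliest order inside each interval) are illustrative choices, not part of the original problem.

```python
import sqlite3


def orders_per_interval(conn, days):
    """Count orders in consecutive `days`-long intervals starting at the earliest order."""
    return conn.execute(
        """
        SELECT MIN(order_date) AS first_order_date, COUNT(*) AS order_count
        FROM Orders
        GROUP BY CAST(
            (julianday(order_date)
             - (SELECT julianday(MIN(order_date)) FROM Orders)) / :days
            AS INTEGER)
        ORDER BY first_order_date
        """,
        {"days": days},
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (order_id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [
        (1, "2022-01-01"), (2, "2022-01-02"), (3, "2022-01-03"), (4, "2022-01-04"),
        (5, "2022-01-07"), (6, "2022-01-09"), (7, "2022-01-10"),
    ],
)

weekly = orders_per_interval(conn, 7)
print(weekly)
```

The day offsets from the first order are 0, 1, 2, 3, 6, 8 and 9, so with a 7-day interval the first five orders fall into interval 0 and the last two into interval 1.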

Real-World Applications:

  • Sales Analysis: Tracking orders within intervals can help businesses analyze sales patterns and identify trends. For example, a restaurant may want to count the number of orders during each week to find its busiest weeks and adjust staffing accordingly.

  • Resource Planning: If orders require specific resources, such as raw materials or labor, businesses can use this information to plan their resource allocation based on the expected order volume within different intervals.

  • Marketing Campaigns: By understanding the distribution of orders over time, businesses can tailor their marketing campaigns to target specific intervals with higher order volumes.


Exchange Seats

Problem:

You have two tables:

  • Students (id, name)

  • Seats (id, student_id)

Each student can occupy at most one seat.

You want to exchange the seats of two students. Specifically, given two student IDs, student_a and student_b, you want to update the Seats table to reflect that student_a now occupies the seat that was occupied by student_b, and vice versa.

Best & Performant SQL Solution:

UPDATE Seats
SET student_id = CASE
  WHEN student_id = @student_a THEN @student_b
  WHEN student_id = @student_b THEN @student_a
  ELSE student_id
END
WHERE student_id IN (@student_a, @student_b);

Explanation:

This solution uses a single UPDATE statement with a CASE expression to conditionally update the student_id field in the Seats table. Specifically:

  • If the student_id field is equal to @student_a, it is updated to @student_b.

  • If the student_id field is equal to @student_b, it is updated to @student_a.

  • Otherwise, the student_id field remains unchanged.

The WHERE clause ensures that only the rows corresponding to the two students (@student_a and @student_b) are updated.
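
The swap can be wrapped in a small helper to see that both rows change in one statement. This is a minimal sketch using Python's sqlite3 module with named placeholders in place of the @student_a and @student_b variables; the function name swap_seats is hypothetical, and the table is created without a UNIQUE constraint on student_id, which some databases would check row by row during the update.

```python
import sqlite3


def swap_seats(conn, student_a, student_b):
    """Exchange the seats of two students with a single UPDATE."""
    conn.execute(
        """
        UPDATE Seats
        SET student_id = CASE
                           WHEN student_id = :a THEN :b
                           WHEN student_id = :b THEN :a
                           ELSE student_id
                         END
        WHERE student_id IN (:a, :b)
        """,
        {"a": student_a, "b": student_b},
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Seats (id INTEGER PRIMARY KEY, student_id INTEGER)")
conn.executemany("INSERT INTO Seats VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])

swap_seats(conn, 1, 2)
seats = conn.execute("SELECT id, student_id FROM Seats ORDER BY id").fetchall()
print(seats)
```

Each row's new value is computed from its old value, so no temporary placeholder ID is needed, and the third seat is left untouched by the WHERE clause.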

Real-World Applications:

This query could be used in a school system to manage student seating arrangements in classrooms. For example, a teacher may want to exchange the seats of two students who are causing distractions or who would benefit from sitting near each other.

Example:

Consider the following tables:

Students:
id  name
1   Alice
2   Bob

Seats:
id  student_id
1   1
2   2

To exchange the seats of Alice (student ID 1) and Bob (student ID 2), we would use the following query:

UPDATE Seats
SET student_id = CASE
  WHEN student_id = 1 THEN 2
  WHEN student_id = 2 THEN 1
  ELSE student_id
END
WHERE student_id IN (1, 2);

This would result in the following updated Seats table:

Seats:
id  student_id
1   2
2   1

Not Boring Movies

Problem Statement: Find all movies that are not boring. In this version of the problem, a movie counts as "not boring" when its rating is greater than 3.

SQL Query:

SELECT title
FROM Movies
WHERE rating > 3;

Explanation:

The SQL query uses the SELECT statement to retrieve the title column from the Movies table. The WHERE clause filters the results to only include movies where the rating column is greater than 3.

Breakdown:

  • SELECT title: This part of the query specifies the column that we want to retrieve from the table. In this case, we want to retrieve the title column.

  • FROM Movies: This part of the query specifies the table that we want to select the data from. In this case, we want to select the data from the Movies table.

  • WHERE rating > 3: This part of the query filters the results to only include rows where the rating column is greater than 3.

Real-World Example:

This query could be used by a website or app to show users a list of movies that are not boring. This could be useful for users who are looking for movies to watch that they will enjoy.

Potential Applications:

  • A website or app could use this query to create a list of recommended movies for users.

  • A movie streaming service could use this query to filter movies by their rating.


Median Employee Salary

Problem Statement:

Given a table called Employees with columns id, name, and salary, find the median salary of employees.

Solution:

1. Median Function:

Create a user-defined function to calculate the median value from a set of numbers. This function assumes the input values are sorted in ascending order.

CREATE FUNCTION median(arr NUMERIC[]) RETURNS NUMERIC AS $$
DECLARE
    arr_size INTEGER := array_length(arr, 1);
    ret NUMERIC;
BEGIN
    IF arr_size % 2 = 1 THEN
        ret := arr[arr_size / 2 + 1];
    ELSE
        ret := (arr[arr_size / 2] + arr[arr_size / 2 + 1]) / 2;
    END IF;
    RETURN ret;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

2. Collect the Sorted Salaries:

The median function expects its input sorted in ascending order, so aggregate the salaries into an array with an ORDER BY inside array_agg. A running total of salaries is not useful here: the median depends on the position of each salary in the sorted list, not on cumulative sums.

SELECT
    array_agg(salary::NUMERIC ORDER BY salary) AS sorted_salaries
FROM
    employees;

3. Calculate Median Salary:

Pass the sorted array to the median function.

SELECT
    median(array_agg(salary::NUMERIC ORDER BY salary)) AS median_salary
FROM
    employees;

Example:

Consider the following table:

| id | name | salary |
|---|---|---|
| 1 | John | 500 |
| 2 | Mary | 1000 |
| 3 | Tom | 1500 |
| 4 | Lisa | 2000 |

Output:

| median_salary |
|---|
| 1250 |

The table has an even number of rows, so the median is the average of the two middle salaries: (1000 + 1500) / 2 = 1250.
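
For databases without array functions, the median can also be computed by sorting the salaries and averaging the middle row (odd count) or the two middle rows (even count). The following is a minimal sketch in SQLite via Python's sqlite3 module, using the sample table above; the LIMIT/OFFSET formulation is one common idiom, not the only option.

```python
import sqlite3

# LIMIT is 1 for an odd row count and 2 for an even one;
# OFFSET skips the rows below the middle position(s).
MEDIAN_SQL = """
SELECT AVG(salary)
FROM (
    SELECT salary
    FROM Employees
    ORDER BY salary
    LIMIT 2 - (SELECT COUNT(*) FROM Employees) % 2
    OFFSET (SELECT (COUNT(*) - 1) / 2 FROM Employees)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "John", 500), (2, "Mary", 1000), (3, "Tom", 1500), (4, "Lisa", 2000)],
)

median_even = conn.execute(MEDIAN_SQL).fetchone()[0]  # four rows: average of 1000 and 1500

conn.execute("DELETE FROM Employees WHERE id = 4")
median_odd = conn.execute(MEDIAN_SQL).fetchone()[0]   # three rows: the middle value

print(median_even, median_odd)
```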

Applications:

  • Finding the median income of a population

  • Determining the midpoint of a distribution

  • Comparing performance of different groups based on salary levels


Students With Invalid Departments

Problem Statement:

Find all students who are enrolled in departments that don't exist.

Database Schema:

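CREATE TABLE Departments (
  id INT PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);

-- Departments is created first so that the foreign key below can reference it.
-- While this constraint is enforced, a non-NULL department_id always matches a department;
-- invalid IDs appear only if the constraint is absent, disabled, or added after the data was loaded.
CREATE TABLE Students (
  id INT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  department_id INT,
  FOREIGN KEY (department_id) REFERENCES Departments(id)
);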

SQL Solution:

SELECT
  s.id,
  s.name,
  s.department_id
FROM
  Students AS s
LEFT JOIN
  Departments AS d
ON
  s.department_id = d.id
WHERE
  d.id IS NULL;

Breakdown and Explanation:

  • LEFT JOIN is used to include all rows from the Students table, even if they don't have a matching row in the Departments table.

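  • The WHERE d.id IS NULL condition filters out students who have valid department IDs (i.e., those with matching rows in the Departments table). Students whose department_id is NULL are also returned, because NULL never matches any Departments.id; add AND s.department_id IS NOT NULL to exclude them.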
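
The anti-join pattern described above can be sketched as follows, using Python's sqlite3 module. The sample rows are hypothetical, and the tables are created without a foreign key so that an invalid department_id can actually be stored. The second query shows the effect of also excluding students with no department at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT NOT NULL, department_id INTEGER)"
)
conn.execute("INSERT INTO Departments VALUES (10, 'Math')")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [
        (1, "Ann", 10),    # valid department
        (2, "Ben", 99),    # department 99 does not exist
        (3, "Cal", None),  # no department assigned
    ],
)

ANTI_JOIN = """
SELECT s.id, s.name, s.department_id
FROM Students AS s
LEFT JOIN Departments AS d ON s.department_id = d.id
WHERE d.id IS NULL
"""

# Unmatched rows include both invalid IDs and NULL department_id values.
unmatched = conn.execute(ANTI_JOIN + " ORDER BY s.id").fetchall()

# Keep only students that reference a department that does not exist.
invalid_only = conn.execute(
    ANTI_JOIN + " AND s.department_id IS NOT NULL ORDER BY s.id"
).fetchall()

print(unmatched)
print(invalid_only)
```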

Real-World Application:

This query can be used in a university database to identify students who have registered for courses in non-existent departments. This allows for quick identification and correction of any errors in student enrollment data.


Immediate Food Delivery III

Problem Statement

Given two tables:

  • Restaurants with columns restaurant_id and name

  • Orders with columns order_id, restaurant_id, customer_id, order_time

You are asked to find the number of orders each restaurant has received within the last 30 minutes.

Solution

SELECT
  r.name AS restaurant_name,
  COUNT(o.order_id) AS num_orders
FROM Restaurants AS r
JOIN Orders AS o
  ON r.restaurant_id = o.restaurant_id
WHERE
  o.order_time BETWEEN DATE_SUB(NOW(), INTERVAL 30 MINUTE) AND NOW()
GROUP BY
  r.name
ORDER BY
  num_orders DESC;

Explanation

This query uses the following steps:

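  1. Join the Restaurants and Orders tables on the restaurant_id column. Because this is an inner join, restaurants with no orders in the window do not appear in the result.

  2. Filter the Orders table to include only orders that were placed within the last 30 minutes.

  3. Group the results by the restaurant name. If two restaurants can share a name, group by r.restaurant_id as well so that their counts are not merged.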

  4. Count the number of orders for each restaurant.

  5. Sort the results in descending order by the number of orders.

Real-World Applications

This query can be used to identify the restaurants that are currently receiving the most orders. A food delivery app can use this information to decide which restaurants to highlight or which ones should receive special promotions.


Count Occurrences in Text

Problem Statement

The Count Occurrences in Text problem asks you to find the number of occurrences of a given substring within a larger string. For example, if you have the string "Hello World" and you want to find the number of occurrences of the substring "el", you would get the result 1.

SQL Implementation

Here is a simplified SQL implementation of the Count Occurrences in Text problem:

SELECT COUNT(*)
FROM table_name
WHERE column_name LIKE '%substring%';

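In this implementation, we use the LIKE operator to find all rows in the table_name table where the column_name column contains the substring we are looking for. The % wildcard character is used to represent any number of characters before or after the substring. Note that COUNT(*) here returns the number of rows that contain the substring at least once; a row that contains it several times is still counted once.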
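
The difference between counting matching rows and counting occurrences can be sketched as follows, using SQLite through Python's sqlite3 module. The table name texts and its rows are hypothetical. The occurrence count relies on a common trick: remove the substring with REPLACE and divide the drop in length by the substring's length, which counts non-overlapping, case-sensitive matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE texts (id INTEGER PRIMARY KEY, body TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO texts VALUES (?, ?)",
    [(1, "Hello World"), (2, "yellow bell"), (3, "none")],
)

params = {"sub": "el"}

# Number of rows containing the substring at least once.
rows_containing = conn.execute(
    "SELECT COUNT(*) FROM texts WHERE body LIKE '%' || :sub || '%'", params
).fetchone()[0]

# Total number of occurrences across all rows.
total_occurrences = conn.execute(
    """
    SELECT SUM((LENGTH(body) - LENGTH(REPLACE(body, :sub, ''))) / LENGTH(:sub))
    FROM texts
    """,
    params,
).fetchone()[0]

print(rows_containing, total_occurrences)
```

"Hello World" contains "el" once and "yellow bell" contains it twice, so two rows match but the total number of occurrences is three. A search string that itself contains % or _ would need escaping in the LIKE pattern.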

Real-World Applications

Counting occurrences in text has many real-world applications, including:

  • Search engines: Search engines use this technique to find the number of times a keyword appears on a web page.

  • Spam filters: Spam filters use this technique to identify emails that contain certain keywords or phrases.

  • Data analysis: Data analysts use this technique to identify patterns and trends in text data.

Potential Gotchas

One potential gotcha is that this technique can be slow for large datasets. A pattern that starts with % generally cannot use an ordinary index on the column, so the database has to scan every row. If you are working with a large dataset, consider a full-text index or a dedicated text-search engine instead.

Conclusion

Counting occurrences in text is a common task in data science and has many real-world applications. The SQL implementation provided in this article is a simple way to perform this task on small and medium-sized tables.


Combine Two Tables

LeetCode Problem Statement:

Table: Orders

| order_id | customer_id | order_date | product_id | quantity |
|---|---|---|---|---|
| 1 | 10 | 2022-01-01 | 1 | 10 |
| 2 | 20 | 2022-01-02 | 2 | 5 |
| 3 | 10 | 2022-01-03 | 1 | 20 |
| 4 | 30 | 2022-01-04 | 3 | 15 |

Table: Customers

| customer_id | customer_name |
|---|---|
| 10 | John |
| 20 | Mary |
| 30 | Bob |

Goal:

Combine the Orders and Customers tables into a single result set by matching rows on the customer_id column that both tables share.

SQL Solution:

SELECT * FROM Orders o
INNER JOIN Customers c ON o.customer_id = c.customer_id;

Breakdown and Explanation:

Step 1: SELECT * FROM Orders o

This line retrieves all rows from the Orders table and aliases it as o.

Step 2: INNER JOIN Customers c ON o.customer_id = c.customer_id

This line joins the Orders table with the Customers table on the customer_id column, effectively mapping customers to their orders.

Result:

The query combines all the columns from both tables into a single result set, creating a unified view of orders and customer information.

Real-World Applications:

  • Order Management: Display customer details alongside orders for better order tracking and customer service.

  • Customer Analysis: Analyze customer behavior by combining order history with demographic data.

  • Marketing: Personalize marketing campaigns by targeting customers with specific product preferences.

  • Fraud Detection: Identify potential fraudulent activity by linking customer information to suspicious order patterns.

  • Inventory Management: Forecast demand and optimize inventory levels by tracking customer orders and preferences.


Average Salary: Departments VS Company

Problem Description

Given two tables, Salaries and Departments, calculate the average salary for each department and the company as a whole.

Tables:

  • Salaries:

    • emp_id: Employee ID

    • dept_id: ID of the employee's department (assumed here; the join needs a column that links each employee to a department)

    • salary: Employee salary

  • Departments:

    • dept_id: Department ID

    • dept_name: Department name

SQL Query:

SELECT
  d.dept_name,
  AVG(s.salary) AS avg_salary
FROM Salaries AS s
JOIN Departments AS d
  ON s.dept_id = d.dept_id
GROUP BY
  d.dept_name
UNION ALL
SELECT
  'Company Average',
  AVG(s.salary)
FROM Salaries AS s;

Explanation:

  • The JOIN clause combines the Salaries and Departments tables on the common column dept_id. Joining an employee ID to a department ID would pair unrelated rows.

  • The GROUP BY clause groups the results by department name and calculates the average salary for each department.

  • The UNION ALL operator combines the department-level results with the overall company average.

  • The SELECT clause retrieves the department name and the average salary for each department and the company as a whole.

Sample Data and Output:

Salaries (the dept_id values are illustrative)

| emp_id | dept_id | salary |
|---|---|---|
| 1 | 1 | 1000 |
| 2 | 1 | 1200 |
| 3 | 2 | 1500 |
| 4 | 3 | 1800 |

Departments

| dept_id | dept_name |
|---|---|
| 1 | Sales |
| 2 | Marketing |
| 3 | Engineering |

Output:

| dept_name | avg_salary |
|---|---|
| Sales | 1100 |
| Marketing | 1500 |
| Engineering | 1800 |
| Company Average | 1375 |

Real-World Application:

The query can be used by HR departments to analyze salary trends and compare compensation levels across different departments and the company as a whole. It can also be used to identify departments with lower-than-average salaries or to allocate compensation budgets more effectively.


Find Latest Salaries

Problem Statement:

Find the latest salaries for each employee.

Table Schema:

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(255),
  salary INT,
  start_date DATE
);

Sample Data:

INSERT INTO employees (id, name, salary, start_date) VALUES
(1, 'John Doe', 50000, '2020-01-01'),
(2, 'Jane Smith', 60000, '2020-03-01'),
(3, 'Mike Jones', 40000, '2020-05-01'),
(4, 'Mary Johnson', 55000, '2021-01-01'),
(5, 'Bob Johnson', 45000, '2021-03-01');

MySQL Solution:

SELECT id, name, MAX(salary) AS latest_salary
FROM employees
GROUP BY id, name;

Breakdown and Explanation:

  1. MAX(salary) AS latest_salary: This expression returns the highest salary in each group. It equals the latest salary only if an employee's salary never decreases; in a table that stores several salary records per employee, the latest salary is the one on the row with the most recent start_date. The AS latest_salary alias gives the result column a meaningful name.

  2. GROUP BY id, name: This clause groups the rows by employee ID and name, so MAX() is applied separately to each employee. In the sample schema id is the primary key, so every group contains exactly one row and the query returns each employee's single stored salary.

  3. SELECT id, name, latest_salary: This clause selects the employee's ID, name, and latest salary from the result of the GROUP BY operation.
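
When a table keeps several salary records per employee, "latest" is better defined by start_date than by the size of the salary. The following is a minimal sketch of that idea in SQLite via Python's sqlite3 module. The salary_history table and its rows are hypothetical and are not part of the schema above; employee 1 receives a pay cut so that the two approaches give different answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salary_history (emp_id INTEGER, salary INTEGER, start_date TEXT)"
)  # hypothetical table with one row per salary change
conn.executemany(
    "INSERT INTO salary_history VALUES (?, ?, ?)",
    [
        (1, 50000, "2020-01-01"),
        (1, 48000, "2021-06-01"),  # later record with a lower salary
        (2, 60000, "2020-03-01"),
        (2, 65000, "2021-03-01"),
    ],
)

# Latest salary: the row whose start_date is the most recent for that employee.
latest = conn.execute(
    """
    SELECT h.emp_id, h.salary
    FROM salary_history AS h
    WHERE h.start_date = (
        SELECT MAX(start_date) FROM salary_history WHERE emp_id = h.emp_id
    )
    ORDER BY h.emp_id
    """
).fetchall()

# Highest salary, for comparison.
highest = conn.execute(
    "SELECT emp_id, MAX(salary) FROM salary_history GROUP BY emp_id ORDER BY emp_id"
).fetchall()

print(latest)
print(highest)
```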

Real-World Applications:

This query can be used in various real-world applications, such as:

  • Finding the latest salary history of employees for payroll purposes

  • Identifying employees who have received the most recent salary increases

  • Analyzing salary trends within an organization


The Number of Seniors and Juniors to Join the Company

SELECT
    s.dept_name AS department,
    COUNT(j.job_id) AS juniors,
    COUNT(DISTINCT s.emp_id) AS seniors  -- DISTINCT: a senior with several juniors appears on several joined rows
FROM
    seniors s
LEFT JOIN
    juniors j ON s.senior_id = j.senior_id
GROUP BY
    s.dept_name
ORDER BY
    s.dept_name;

This SQL statement counts the number of seniors and juniors in each department. The seniors table contains the senior employees, and the juniors table contains the junior employees. The LEFT JOIN statement joins the two tables on the senior_id column, which is the foreign key in the juniors table that references the primary key in the seniors table.

The GROUP BY statement groups the results by department, and the COUNT function counts the number of senior and junior employees in each department. The ORDER BY statement orders the results by department name.

Here is an example of the output of this SQL statement:

+-------------+---------+---------+
| department  | juniors | seniors |
+-------------+---------+---------+
| Engineering | 3       | 2       |
| Marketing   | 2       | 1       |
| Sales       | 1       | 3       |
+-------------+---------+---------+

This output shows that the Engineering department has 3 junior employees and 2 senior employees, the Marketing department has 2 junior employees and 1 senior employee, and the Sales department has 1 junior employee and 3 senior employees.

This SQL statement can be used to analyze the distribution of senior and junior employees across different departments. This information can be used to make decisions about hiring, training, and development programs.


Report Contiguous Dates

Problem: Find contiguous date ranges in a table of dates.

SQL Query:

WITH Numbered AS (
    SELECT dt, ROW_NUMBER() OVER (ORDER BY dt) AS rn
    FROM dates
)

SELECT MIN(dt) AS start_date, MAX(dt) AS end_date
FROM Numbered
GROUP BY dt - CAST(rn AS INTEGER)
ORDER BY start_date;

Breakdown:

  1. Number the dates:

    • The Numbered Common Table Expression (CTE) assigns each date its position (rn) in ascending date order. The dates are assumed to be unique.

  2. Build a key that is constant within a contiguous range:

    • Within a run of consecutive dates, dt and rn both increase by one per row, so dt - rn stays the same. A gap makes dt jump by more than one while rn still increases by one, which changes the key. The date-minus-integer arithmetic shown is PostgreSQL syntax; other databases need their own date functions.

  3. Group by the key:

    • The GROUP BY clause puts all dates of one contiguous range into the same group.

  4. Calculate start and end dates:

    • MIN(dt) and MAX(dt) return the first and last date of each range.

Example:

-- Sample data
CREATE TABLE dates (dt DATE);
INSERT INTO dates VALUES ('2023-01-01'), ('2023-01-02'), ('2023-01-04'), ('2023-01-05'), ('2023-01-07');

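-- Then run the query shown above against this table.
-- (A CTE exists only inside its own statement, so it cannot be selected from separately.)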

Output:

start_date  end_date
2023-01-01  2023-01-02
2023-01-04  2023-01-05
2023-01-07  2023-01-07
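
A common way to find contiguous ranges is the row-number-difference technique, often called "gaps and islands": number the dates in order and group by the date minus its row number, which is constant inside a run of consecutive days. The following is a minimal sketch in SQLite via Python's sqlite3 module, using the sample data above. It assumes unique dates and an SQLite version with window functions (3.25 or later); julianday() is used because SQLite stores dates as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (dt TEXT)")
conn.executemany(
    "INSERT INTO dates VALUES (?)",
    [("2023-01-01",), ("2023-01-02",), ("2023-01-04",), ("2023-01-05",), ("2023-01-07",)],
)

# julianday(dt) - rn is the same for every date in one contiguous run.
ranges = conn.execute(
    """
    WITH numbered AS (
        SELECT dt, ROW_NUMBER() OVER (ORDER BY dt) AS rn
        FROM dates
    )
    SELECT MIN(dt) AS start_date, MAX(dt) AS end_date
    FROM numbered
    GROUP BY julianday(dt) - rn
    ORDER BY start_date
    """
).fetchall()
print(ranges)
```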

Real-World Applications:

  • Calculating sales figures for specific date ranges.

  • Identifying periods of activity or inactivity in a process.

  • Analyzing user visits to a website over time.


Article Views II

Problem Description:

Given a table articles with columns article_id, author_id, title, view_count, and last_viewed_at, you need to find the top view count for each author along with their average view count.

SQL Query:

SELECT
    author_id,
    MAX(view_count) AS max_view,
    AVG(view_count) AS avg_view
FROM
    articles
GROUP BY
    author_id;

Breakdown and Explanation:

  • SELECT: Specifies the columns to be retrieved:

    • author_id: The ID of the author

    • MAX(view_count): The maximum view count for each author

    • AVG(view_count): The average view count for each author

  • FROM: Specifies the table to be used: articles

  • GROUP BY: Groups the rows by author_id. This means that the results will be grouped together for each unique author.

  • HAVING: (Optional) Can be used to filter the results based on the groupings. In this case, it is not used.

Real-World Application:

This query can be used in a website or analytics dashboard to track the performance of authors and identify those with the highest engagement. It can also help in optimizing content strategies and promoting popular articles.


Change Null Values in a Table to the Previous Value

Problem: You have a table with a column that contains null values. You want to change these null values to the previous value in the column.

Best & Performant Solution:

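UPDATE table_name
SET column_name = (
    SELECT prev.column_name
    FROM table_name AS prev                  -- alias the inner copy so table_name.row_id means the row being updated
    WHERE prev.row_id < table_name.row_id
      AND prev.column_name IS NOT NULL       -- skip earlier rows that are themselves null
    ORDER BY prev.row_id DESC
    LIMIT 1
)
WHERE column_name IS NULL;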

Explanation (Simplified):

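This solution uses a correlated subquery to find the previous value for each row that has a null value. Note that MySQL rejects an UPDATE whose subquery reads from the table being updated, so this form targets databases such as PostgreSQL and SQLite.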

  1. The subquery selects the column_name value from the same table (table_name) but only for rows where the row_id is less than the current row's row_id. This ensures that we find the previous value.

  2. The subquery is sorted in descending order by row_id to get the most recent previous value.

  3. LIMIT 1 is used to select only the first row in the sorted result, which is the immediate previous value.

  4. The main UPDATE query sets the column_name to the value obtained from the subquery for all rows where column_name is null.

Real-World Implementation:

Consider a table named Sales with the following data:

| row_id | product_sales |
|---------|----------------|
| 1       | 100           |
| 2       | 200           |
| 3       | NULL           |
| 4       | 400           |
| 5       | 500           |

Running the following query will fill the null value in row 3 with the previous value 200:

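UPDATE Sales
SET product_sales = (
    SELECT prev.product_sales
    FROM Sales AS prev                       -- alias the inner copy so Sales.row_id means the row being updated
    WHERE prev.row_id < Sales.row_id
      AND prev.product_sales IS NOT NULL     -- skip earlier rows that are themselves null
    ORDER BY prev.row_id DESC
    LIMIT 1
)
WHERE product_sales IS NULL;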

After running the query, the Sales table will look like this:

| row_id | product_sales |
|---------|----------------|
| 1       | 100           |
| 2       | 200           |
| 3       | 200           |
| 4       | 400           |
| 5       | 500           |
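
The fill can be reproduced with a short script. This is a minimal sketch in SQLite via Python's sqlite3 module, using the Sales sample above. The inner copy of the table is given the alias prev so that Sales.row_id refers to the row being updated, and the subquery skips earlier rows that are themselves null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (row_id INTEGER PRIMARY KEY, product_sales INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [(1, 100), (2, 200), (3, None), (4, 400), (5, 500)],
)

# For every null row, copy the closest earlier non-null value.
conn.execute(
    """
    UPDATE Sales
    SET product_sales = (
        SELECT prev.product_sales
        FROM Sales AS prev
        WHERE prev.row_id < Sales.row_id
          AND prev.product_sales IS NOT NULL
        ORDER BY prev.row_id DESC
        LIMIT 1
    )
    WHERE product_sales IS NULL
    """
)

filled = conn.execute("SELECT row_id, product_sales FROM Sales ORDER BY row_id").fetchall()
print(filled)
```

A null in the very first row has no earlier value to copy, so it stays null.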

Potential Applications:

This solution can be used in various real-world applications, such as:

  • Filling missing data in time-series data, where the values are expected to be consecutive.

  • Creating interpolated data for estimation or prediction tasks.

  • Imputing missing values in datasets for data analysis and modeling.

  • Maintaining data integrity by ensuring that columns with sequential or auto-incrementing values do not contain gaps or inconsistencies.


Sellers With No Sales

Problem: Find all sellers who have not made any sales.

SQL Query:

SELECT
  seller_id
FROM
  Sellers
EXCEPT
SELECT
  DISTINCT seller_id
FROM
  Sales;

Explanation:

  1. The EXCEPT operator is used to find rows in the Sellers table that are not in the Sales table.

  2. The DISTINCT keyword makes the Sales subquery return each seller ID once. It is optional here, because EXCEPT already removes duplicate rows from its result.

Example:

Consider the following tables:

Sellers Table:

| seller_id | name |
|---|---|
| 1 | John Smith |
| 2 | Jane Doe |
| 3 | Michael Jones |
| 4 | Mary Brown |

Sales Table:

| sale_id | seller_id | product_id |
|---|---|---|
| 1 | 1 | 100 |
| 2 | 2 | 200 |
| 3 | 3 | 300 |

Result:

| seller_id |
|---|
| 4 |

Mary Brown (seller ID 4) is the only seller who has not made any sales.
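
The example can be run as follows. This is a minimal sketch in SQLite via Python's sqlite3 module, using the sample tables above. It also shows a NOT EXISTS variant, which is an alternative formulation (not the query from this section) that can return other columns such as the seller's name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sellers (seller_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE Sales (sale_id INTEGER, seller_id INTEGER, product_id INTEGER)")
conn.executemany(
    "INSERT INTO Sellers VALUES (?, ?)",
    [(1, "John Smith"), (2, "Jane Doe"), (3, "Michael Jones"), (4, "Mary Brown")],
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 2, 200), (3, 3, 300)],
)

# Set difference: seller IDs that never appear in Sales.
no_sales_ids = conn.execute(
    """
    SELECT seller_id FROM Sellers
    EXCEPT
    SELECT seller_id FROM Sales
    ORDER BY seller_id
    """
).fetchall()

# Alternative: keep sellers for which no matching sale exists.
no_sales_named = conn.execute(
    """
    SELECT s.seller_id, s.name
    FROM Sellers AS s
    WHERE NOT EXISTS (SELECT 1 FROM Sales AS x WHERE x.seller_id = s.seller_id)
    ORDER BY s.seller_id
    """
).fetchall()

print(no_sales_ids)
print(no_sales_named)
```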

Real-World Application:

This query can be used to identify inactive sellers who may not be generating revenue for the company. The company can then take steps to address this issue, such as providing additional training or incentives to these sellers.


Find Active Users

Problem: Find the names of users who have been active within the last 30 days, listing the most recently active users first.

SQL:

-- Table: Users
-- Columns:
--   id         INT UNSIGNED NOT NULL,
--   name       VARCHAR(255) NOT NULL,
--   last_active TIMESTAMP NOT NULL,
--   PRIMARY KEY (id)
--
-- Table: Posts
-- Columns:
--   id         INT UNSIGNED NOT NULL,
--   author_id  INT UNSIGNED NOT NULL,
--   title      VARCHAR(255) NOT NULL,
--   created_at TIMESTAMP NOT NULL,
--   PRIMARY KEY (id)
--
SELECT name
FROM Users
WHERE last_active >= DATE_SUB(NOW(), INTERVAL 30 DAY)
ORDER BY last_active DESC;

Breakdown:

  • SELECT name: Selects the names of active users. Each user has exactly one row in Users (id is the primary key), so DISTINCT is not needed; combining SELECT DISTINCT name with ORDER BY last_active is also rejected by several databases, because the sort column is not in the select list.

  • FROM Users: Specifies the Users table to query.

  • WHERE last_active >= DATE_SUB(NOW(), INTERVAL 30 DAY): Filters users who have been active within the last 30 days.

  • ORDER BY last_active DESC: Orders the results in descending order of last activity.

How it Works:

  1. The query starts by selecting the name column from the Users table.

  2. It then applies a filter to include only users whose last_active timestamp is no earlier than 30 days before the current time. This identifies users who have been active recently.

  3. Finally, the query orders the results in descending order of last_active to display the most recently active users first.

Example:

Consider the following tables:

Users:

| id | name | last_active |
|---|---|---|
| 1 | John | 2023-03-08 18:00:00 |
| 2 | Mary | 2023-02-15 12:00:00 |
| 3 | Bob | 2023-04-01 15:00:00 |

Posts:

| id | author_id | title | created_at |
|---|---|---|---|
| 1 | 1 | Post 1 | 2023-03-09 10:00:00 |
| 2 | 3 | Post 2 | 2023-04-02 13:00:00 |
| 3 | 2 | Post 3 | 2023-02-16 14:00:00 |

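The result depends on when the query runs. Executed on 2023-04-05, for example, the 30-day cutoff is 2023-03-06, so Mary is excluded and the query returns the remaining users with the most recently active first:

| name |
|---|
| Bob |
| John |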
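
Since the result depends on the current time, it can help to test the filter against a fixed reference time. The sketch below uses Python's sqlite3 module as a stand-in: SQLite has no DATE_SUB, so datetime(?, '-30 days') plays that role, and the function name active_users and its now parameter are illustrative assumptions, not part of the original query.

```python
import sqlite3

def active_users(conn, now):
    """Names of users active in the 30 days before `now`, most recent first."""
    rows = conn.execute(
        """
        SELECT name
        FROM Users
        WHERE last_active >= datetime(?, '-30 days')
        ORDER BY last_active DESC
        """,
        (now,),
    ).fetchall()
    return [name for (name,) in rows]

# Sample rows from the Users table above; timestamps are stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT, last_active TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        (1, "John", "2023-03-08 18:00:00"),
        (2, "Mary", "2023-02-15 12:00:00"),
        (3, "Bob", "2023-04-01 15:00:00"),
    ],
)
print(active_users(conn, "2023-04-05 00:00:00"))  # ['Bob', 'John']
```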

Real-World Applications:

  • Marketing and customer engagement: Identifying active users can help businesses target their marketing campaigns and engage with customers who are actively interacting with their products or services.

  • Product development: Analyzing user activity can provide insights into which features are being used the most and identify areas for improvement.

  • Security and fraud detection: Monitoring user activity can help identify suspicious patterns or detect potential threats.


Rolling Average Steps

Problem:

Implement a rolling average query that calculates the average number of steps taken by a user over a specified number of days.

SQL Query:

WITH RollingAverage AS (
    SELECT
        user_id,
        date,
        steps,
        AVG(steps) OVER w AS rolling_average,
        COUNT(*) OVER w AS days_in_window
    FROM steps_table
    WINDOW w AS (
        PARTITION BY user_id
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    )
)
SELECT
    user_id,
    date,
    rolling_average
FROM RollingAverage
WHERE days_in_window = 7;

Explanation:

Common Table Expression (CTE):

  • The WITH clause creates a Common Table Expression (CTE) called RollingAverage.

Window Function:

  • AVG(steps) OVER w calculates the average of steps over the current row and the 6 rows before it, which is a 7-day window when each user has exactly one row per day.

  • The named window w uses PARTITION BY user_id so that each user's average is computed separately, ORDER BY date to put the rows in chronological order, and ROWS BETWEEN 6 PRECEDING AND CURRENT ROW to define the frame.

  • COUNT(*) OVER w records how many rows the frame actually contains.

Final Query:

  • The final query selects the user_id, date, and rolling_average from the RollingAverage CTE.

  • It keeps only rows where days_in_window = 7, i.e., rows backed by a full 7 days of data. A window aggregate over a partial frame is not NULL, so a filter on rolling_average IS NOT NULL would not remove those rows.
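
To see a 7-day rolling average in action, the sketch below runs an equivalent query with Python's sqlite3 module (window functions need SQLite 3.25 or later). The function name rolling_average_7d and the sample data are assumptions made for illustration, and the data is assumed to contain one row per user per day.

```python
import sqlite3

def rolling_average_7d(rows):
    """rows: iterable of (user_id, date, steps). Returns full 7-day averages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE steps_table (user_id INTEGER, date TEXT, steps INTEGER)")
    conn.executemany("INSERT INTO steps_table VALUES (?, ?, ?)", rows)
    return conn.execute(
        """
        SELECT user_id, date, rolling_average
        FROM (
            SELECT user_id, date,
                   AVG(steps) OVER (PARTITION BY user_id ORDER BY date
                                    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_average,
                   COUNT(*) OVER (PARTITION BY user_id ORDER BY date
                                  ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS days_in_window
            FROM steps_table
        )
        WHERE days_in_window = 7   -- drop rows without a full week of history
        ORDER BY user_id, date
        """
    ).fetchall()

# Hypothetical data: user 1 walks 1000, 2000, ..., 8000 steps on 8 consecutive days.
sample = [(1, f"2023-01-{day:02d}", 1000 * day) for day in range(1, 9)]
print(rolling_average_7d(sample))
# [(1, '2023-01-07', 4000.0), (1, '2023-01-08', 5000.0)]
```

Only days 7 and 8 are returned, because the first six days do not yet have a full window behind them.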

Real-World Application:

This query can be used in a fitness tracking app to:

  • Calculate a user's average daily steps over a specified period.

  • Track the user's progress and identify trends in their activity levels.

  • Provide personalized recommendations for exercise goals.


Weather Type in Each Country

Problem Statement:

Given two tables:

Country (country_name, country_code)
Weather (country_code, weather_type, measurement_value)

Write a SQL query to find the weather type for each country.

Solution:

SELECT
  c.country_name,
  w.weather_type
FROM
  Country c
INNER JOIN
  Weather w
ON
  c.country_code = w.country_code;

Explanation:

  1. Country Table: Stores the country names and their corresponding country codes.

  2. Weather Table: Stores the weather types (e.g., "Sunny", "Rainy", "Snowy") and their corresponding measurement values for different countries.

  3. The INNER JOIN operation combines the two tables based on the common column country_code. This ensures that only rows where the country code matches are included in the result.

  4. The query retrieves the country_name and weather_type columns from the joined table, providing a list of weather types for each country.

Example:

Country Table:

| country_name | country_code |
|---|---|
| United States | US |
| France | FR |
| Australia | AU |

Weather Table:

| country_code | weather_type | measurement_value |
|---|---|---|
| US | Sunny | 75 |
| US | Rainy | 60 |
| FR | Cloudy | 55 |
| AU | Sunny | 80 |

Result:

| country_name | weather_type |
|---|---|
| United States | Sunny |
| United States | Rainy |
| France | Cloudy |
| Australia | Sunny |

Real-World Applications:

This query can be used for various purposes, such as:

  • Displaying weather forecasts on websites or mobile apps.

  • Analyzing weather patterns and trends across different countries.

  • Providing customized weather alerts based on location.

  • Conducting research on climate change and its impact on different regions.


Products With Three or More Orders in Two Consecutive Years

Problem Statement: Find products that have received at least three orders in each of two consecutive years (e.g., at least three in 2021 and at least three in 2022).

Solution:

/* Find the total orders for each product in each year */
WITH ProductYearOrderCount AS (
    SELECT
        product_id,
        YEAR(order_date) AS order_year,
        COUNT(*) AS order_count
    FROM
        orders
    GROUP BY
        product_id,
        order_year
),

/* Find products with at least 3 orders in each of two consecutive years */
ProductConsecutiveThreeOrders AS (
    SELECT DISTINCT
        cur.product_id
    FROM
        ProductYearOrderCount cur
    JOIN
        ProductYearOrderCount nxt
    ON
        nxt.product_id = cur.product_id
        AND nxt.order_year = cur.order_year + 1
    WHERE
        cur.order_count >= 3
        AND nxt.order_count >= 3
)

SELECT
    p.product_name
FROM
    products p
JOIN
    ProductConsecutiveThreeOrders c
ON
    p.product_id = c.product_id;

Breakdown:

  1. ProductYearOrderCount: This subquery counts the orders for each product in each year.

  2. ProductConsecutiveThreeOrders: This subquery identifies products with at least three orders in each of two consecutive years by:

    • Joining ProductYearOrderCount to itself so that each product-year row (cur) is paired with the same product's row for the following year (nxt).

    • Keeping a pair only when both years have at least three orders.

    • Using DISTINCT so that a product that qualifies in several pairs of years is listed once. Orders in other years do not disqualify a product, which a check on the overall minimum and maximum year would do.

  3. Final Query: This query joins the original products table with the ProductConsecutiveThreeOrders subquery to retrieve the product names of those that meet the criteria.
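
The consecutive-year check can be tried out with Python's sqlite3 module. This is a sketch under a few assumptions: SQLite has no YEAR() function, so strftime('%Y', ...) is used instead; the function name and the sample rows are made up for illustration; and the query pairs each product-year with the following year through a self-join.

```python
import sqlite3

QUERY = """
WITH ProductYearOrderCount AS (
    SELECT product_id,
           CAST(strftime('%Y', order_date) AS INTEGER) AS order_year,
           COUNT(*) AS order_count
    FROM orders
    GROUP BY product_id, order_year
)
SELECT DISTINCT p.product_name
FROM ProductYearOrderCount cur
JOIN ProductYearOrderCount nxt
  ON nxt.product_id = cur.product_id
 AND nxt.order_year = cur.order_year + 1      -- the year right after cur
JOIN products p ON p.product_id = cur.product_id
WHERE cur.order_count >= 3 AND nxt.order_count >= 3
ORDER BY p.product_name
"""

def products_with_consecutive_years(products, orders):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (product_id INTEGER, product_name TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, product_id INTEGER, order_date TEXT)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", products)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    return [name for (name,) in conn.execute(QUERY)]

# Hypothetical data: Product 1 has 3 orders in 2021 and 2022 (consecutive);
# Product 2 has 3 orders in 2020 and 2022 (not consecutive).
PRODUCTS = [(1, "Product 1"), (2, "Product 2")]
ORDERS = [
    (1, 1, "2021-01-01"), (2, 1, "2021-03-01"), (3, 1, "2021-05-01"),
    (4, 1, "2022-02-01"), (5, 1, "2022-04-01"), (6, 1, "2022-06-01"),
    (7, 2, "2020-01-01"), (8, 2, "2020-02-01"), (9, 2, "2020-03-01"),
    (10, 2, "2022-01-01"), (11, 2, "2022-02-01"), (12, 2, "2022-03-01"),
]
print(products_with_consecutive_years(PRODUCTS, ORDERS))  # ['Product 1']
```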

Example:

CREATE TABLE products (
    product_id INT,
    product_name VARCHAR(255)
);

CREATE TABLE orders (
    order_id INT,
    product_id INT,
    order_date DATE
);

INSERT INTO products (product_id, product_name) VALUES
(1, 'Product 1'),
(2, 'Product 2'),
(3, 'Product 3');

INSERT INTO orders (order_id, product_id, order_date) VALUES
(1, 1, '2021-01-01'),
(2, 1, '2021-03-01'),
(3, 1, '2021-05-01'),
(4, 2, '2021-02-01'),
(5, 2, '2021-04-01'),
(6, 3, '2021-06-01'),
(7, 1, '2022-02-01'),
(8, 1, '2022-04-01'),
(9, 2, '2022-03-01'),
(10, 2, '2022-05-01'),
(11, 3, '2022-07-01'),
(12, 1, '2022-06-01');

-- Run the complete query from the Solution section (both CTEs plus the final SELECT).

Result:

+--------------+
| product_name |
+--------------+
| Product 1    |
+--------------+

Product 1 has three orders in 2021 and three in 2022, so it qualifies. Product 2 has only two orders in each year and Product 3 only one, so neither is returned.

Real-World Applications:

  • Identifying popular products with sustained demand for targeted advertising or inventory management.

  • Analyzing customer behavior to determine if specific products are seasonal or have consistent sales throughout the year.

  • Tracking product performance over time to identify potential issues or growth opportunities.


Number of Unique Subjects Taught by Each Teacher

Problem:

Given two tables:

  • Teachers (id, name)

  • Subjects (id, name)

And a join table:

  • TeacherSubject (teacher_id, subject_id)

Write a SQL query to find the number of unique subjects taught by each teacher.

Solution:

SELECT
  t.name AS TeacherName,
  COUNT(DISTINCT s.name) AS NumberOfUniqueSubjects
FROM Teachers AS t
JOIN TeacherSubject AS ts
  ON t.id = ts.teacher_id
JOIN Subjects AS s
  ON ts.subject_id = s.id
GROUP BY
  t.id,
  t.name
ORDER BY
  NumberOfUniqueSubjects DESC;

Explanation:

  1. JOIN Tables: We join the Teachers, TeacherSubject, and Subjects tables using the teacher_id and subject_id fields to link them.

  2. COUNT DISTINCT Subjects: For each teacher, COUNT(DISTINCT s.name) counts the number of different subject names among that teacher's rows in TeacherSubject.

  3. GROUP BY Teacher: We group the results by the teacher's id and name, so the count is computed per teacher and two teachers who happen to share a name are not merged into one row.

  4. ORDER BY Number of Subjects: Finally, we sort the results in descending order of the number of unique subjects taught by each teacher.

Result:

The query returns a table with the teacher's name and the number of unique subjects they teach.

Real-World Application:

This query can be used in a school or university system to track the number of subjects taught by each teacher. This information can be used for administrative purposes, such as assigning teachers to classes or evaluating their workload.


Market Analysis II

Problem:

Find the top k products with the highest total revenue in a given period.

SQL Query:

SELECT product_id, SUM(quantity * unit_price) AS total_revenue
FROM sales_data
WHERE sales_date BETWEEN '2023-01-01' AND '2023-01-31'
GROUP BY product_id
ORDER BY total_revenue DESC
LIMIT k;

Breakdown:

SELECT product_id, SUM(quantity * unit_price) AS total_revenue: Selects the product ID and calculates the total revenue for each product by summing the product of quantity sold and unit price for each sale within the specified period.

FROM sales_data: Specifies the sales data table to query.

WHERE sales_date BETWEEN '2023-01-01' AND '2023-01-31': Filters the data to include sales only within the specified period.

GROUP BY product_id: Groups the results by product ID to calculate the total revenue for each product.

ORDER BY total_revenue DESC: Orders the results in descending order of total revenue.

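LIMIT k: Limits the results to the top k products with the highest total revenue. Here k is a placeholder: replace it with an integer literal (for example, LIMIT 10) or bind it as a query parameter.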

Real-World Applications:

  • Analyzing sales data to identify top-selling products

  • Identifying which products are driving the most revenue

  • Making informed decisions about product inventory and marketing strategies

  • Tracking customer purchasing trends


Managers with at Least 5 Direct Reports

Problem Statement: Find all managers who have at least 5 direct reports.

SQL Query:

SELECT ManagerID, COUNT(*) AS DirectReports
FROM EmployeeTable
GROUP BY ManagerID
HAVING COUNT(*) >= 5;

Explanation:

  • The EmployeeTable table contains columns EmployeeID and ManagerID.

  • The COUNT(*) function returns the number of rows in a group.

  • The GROUP BY clause groups the results by ManagerID.

  • The HAVING clause filters the results to include only managers with at least 5 direct reports.

Simplified Explanation:

  1. We count the number of direct reports for each manager (COUNT(*) AS DirectReports).

  2. We group the results by manager (GROUP BY ManagerID).

  3. We filter out managers with fewer than 5 direct reports (HAVING COUNT(*) >= 5).

Real World Applications:

  • Identifying top performers in a management team.

  • Assessing the efficiency of managers in managing large teams.

  • Succession planning: identifying managers with a high number of direct reports, for whom potential replacements should be lined up.

Example:

| ManagerID | DirectReports |
|---|---|
| 1 | 5 |
| 2 | 7 |
| 3 | 3 |
| 4 | 6 |

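The table shows the number of direct reports for every manager before the HAVING filter is applied. Managers 1, 2, and 4 each have at least 5 direct reports and are returned by the query; manager 3, with 3 direct reports, is excluded.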


Manager of the Largest Department

Problem: Find the manager who manages the largest number of employees.

SQL Query:

SELECT ManagerID, COUNT(*) AS NumEmployees
FROM EmployeeTable
GROUP BY ManagerID
ORDER BY NumEmployees DESC
LIMIT 1;

Explanation:

  1. SELECT ManagerID, COUNT(*) AS NumEmployees: This line selects the manager ID and the number of employees managed by that manager. The COUNT(*) function returns the number of rows in a group, which in this case is the number of employees managed by each manager.

  2. FROM EmployeeTable: This line specifies the table from which the data is retrieved. In this case, it is the EmployeeTable.

  3. GROUP BY ManagerID: This line groups the results by the manager ID. This means that all the employees managed by a particular manager are grouped together.

  4. ORDER BY NumEmployees DESC: This line orders the results in descending order of the number of employees managed. This means that the manager with the largest number of employees appears first in the results.

  5. LIMIT 1: This line limits the results to only the top row. This means that only the manager with the largest number of employees is returned.

Example:

Consider the following EmployeeTable:

| EmployeeID | ManagerID |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
| 4 | 2 |
| 5 | 3 |

The following SQL query would return the manager with the largest number of employees:

SELECT ManagerID, COUNT(*) AS NumEmployees
FROM EmployeeTable
GROUP BY ManagerID
ORDER BY NumEmployees DESC
LIMIT 1;

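The results of this query could be:

| ManagerID | NumEmployees |
|---|---|
| 2 | 2 |

This indicates that manager 2 manages the largest number of employees (2 employees). Note that manager 3 also manages 2 employees, so this is a tie: LIMIT 1 returns only one of the tied managers, and without a tie-breaker in the ORDER BY clause (for example, ORDER BY NumEmployees DESC, ManagerID) it is not guaranteed which one.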

Real-World Applications:

This query can be useful in a variety of real-world scenarios, such as:

  • Identifying managers who are responsible for a large number of employees.

  • Analyzing the distribution of employees across different managers.

  • Making decisions about promotions and job assignments.


Class Performance

Problem:

Given a table student_performance with the following columns:

  • student_id (integer)

  • subject_id (integer)

  • score (float)

  • test_time (timestamp)

Find the student with the highest average score in each subject.

SQL Query:

-- Step 1: Calculate the average score for each student in each subject
WITH student_avg AS (
    SELECT
        student_id,
        subject_id,
        AVG(score) AS avg_score
    FROM
        student_performance
    GROUP BY
        student_id,
        subject_id
),
-- Step 2: Find the highest average score for each subject
SubjectAvg AS (
    SELECT
        subject_id,
        MAX(avg_score) AS max_avg_score
    FROM
        student_avg
    GROUP BY
        subject_id
)
-- Step 3: Find the students with the highest average score in each subject
SELECT
    student_avg.student_id,
    student_avg.subject_id
FROM
    student_avg
JOIN
    SubjectAvg ON student_avg.subject_id = SubjectAvg.subject_id
    AND student_avg.avg_score = SubjectAvg.max_avg_score;

Explanation:

  • Step 1: Calculate the average score for each student in each subject using GROUP BY and AVG.

  • Step 2: Find the highest average score for each subject using MAX in a Common Table Expression (CTE) named SubjectAvg.

  • Step 3: Join SubjectAvg with student_avg to find the students with the highest average score in each subject.

Breakdown:

GROUP BY: Groups the rows in a table based on the specified columns. In this case, we group the rows by student_id and subject_id to calculate the average score for each student in each subject.

AVG: Calculates the average value of a numeric column for each group.

Common Table Expressions (CTEs): Temporary tables that can be used in a query. In this case, SubjectAvg is used to store the highest average score for each subject.

MAX: Finds the maximum value of a numeric column.

JOIN: Combines rows from two or more tables based on a common column. In this case, we join SubjectAvg and student_avg on subject_id to find the students with the highest average score in each subject.
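
The three steps can be chained as CTEs in a single statement. The sketch below does this with Python's sqlite3 module; the function name top_students, the CTE name subject_max, and the sample scores are assumptions made for illustration.

```python
import sqlite3

QUERY = """
WITH student_avg AS (          -- step 1: average per student and subject
    SELECT student_id, subject_id, AVG(score) AS avg_score
    FROM student_performance
    GROUP BY student_id, subject_id
),
subject_max AS (               -- step 2: best average per subject
    SELECT subject_id, MAX(avg_score) AS max_avg_score
    FROM student_avg
    GROUP BY subject_id
)
SELECT sa.student_id, sa.subject_id   -- step 3: students matching the best average
FROM student_avg sa
JOIN subject_max sm
  ON sm.subject_id = sa.subject_id
 AND sm.max_avg_score = sa.avg_score
ORDER BY sa.subject_id, sa.student_id
"""

def top_students(rows):
    """rows: iterable of (student_id, subject_id, score, test_time)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student_performance "
        "(student_id INTEGER, subject_id INTEGER, score REAL, test_time TEXT)"
    )
    conn.executemany("INSERT INTO student_performance VALUES (?, ?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()

# Hypothetical scores: in subject 10, student 1 averages 85.0 and student 2 82.5;
# in subject 20, student 3 (75.0) beats student 2 (60.0).
SCORES = [
    (1, 10, 80.0, "2023-01-10 09:00:00"),
    (1, 10, 90.0, "2023-02-10 09:00:00"),
    (2, 10, 95.0, "2023-01-10 09:00:00"),
    (2, 10, 70.0, "2023-02-10 09:00:00"),
    (2, 20, 60.0, "2023-01-11 09:00:00"),
    (3, 20, 75.0, "2023-01-11 09:00:00"),
]
print(top_students(SCORES))  # [(1, 10), (3, 20)]
```

If several students tie for the best average in a subject, all of them are returned.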

Real-World Applications:

  • Identifying top performers in a school or organization

  • Analyzing student performance trends over time

  • Evaluating the effectiveness of different learning methods


Find the Quiet Students in All Exams

Problem Statement:

Find the students who scored zero in every exam they are recorded in (a score of zero is treated here as not having participated).

Schema:

CREATE TABLE Exam (
  StudentId INT,
  ExamId INT,
  Score INT DEFAULT 0,
  PRIMARY KEY (StudentId, ExamId)
);

Solution:

SELECT StudentId
FROM Exam
GROUP BY StudentId
HAVING SUM(Score) = 0;

Explanation:

  1. SELECT StudentId: This selects the unique student IDs from the Exam table.

  2. GROUP BY StudentId: This groups the results by student ID, combining all the exam scores for each student.

  3. HAVING SUM(Score) = 0: This filter only returns the student IDs where the sum of all exam scores is zero. Assuming scores are never negative, a zero sum means the student scored zero in every exam.

Example Usage:

SELECT StudentId, SUM(Score) AS TotalScore
FROM Exam
GROUP BY StudentId
HAVING SUM(Score) = 0;

Output:

+------------+------------+
| StudentId  | TotalScore |
+------------+------------+
| 1          | 0          |
| 3          | 0          |
+------------+------------+

This output shows that students with IDs 1 and 3 have never scored non-zero in any exam.

Real-World Applications:

  • Identifying students who need academic support: The results of this query can help identify students who may need additional attention or support in their studies.

  • Evaluating student participation: It can be used to assess student engagement and participation in class activities.

  • Attendance tracking: By considering exams as attendance records, this query can identify students who have not participated (attended) any exams.


Fix Product Name Format

Problem: Given a table products with the following schema:

CREATE TABLE products (
  product_id INT NOT NULL,
  product_name VARCHAR(255) NOT NULL,
  PRIMARY KEY (product_id)
);

Convert product names to lowercase and remove special characters. For example, "iPhone 13 Pro" should become "iphone 13 pro".

Solution:

UPDATE products
SET product_name = LOWER(
  REGEXP_REPLACE(product_name, '[^a-zA-Z0-9 ]', '')
);

Explanation:

The solution uses two MySQL functions:

  • LOWER() converts the product name to lowercase.

  • REGEXP_REPLACE() removes every character that is not a letter, a digit, or a space (the pattern [^a-zA-Z0-9 ]) from the product name. This function requires MySQL 8.0 or later.

The resulting product name is then stored in the product_name column.
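
The same transformation can be checked outside the database. The sketch below applies the identical pattern with Python's re module; the function name normalize_product_name and the second sample name are made up for illustration.

```python
import re

def normalize_product_name(name: str) -> str:
    """Drop everything except letters, digits and spaces, then lowercase."""
    cleaned = re.sub(r"[^a-zA-Z0-9 ]", "", name)
    return cleaned.lower()

print(normalize_product_name("iPhone 13 Pro"))        # iphone 13 pro
print(normalize_product_name("Galaxy S22+ (Ultra)"))  # galaxy s22 ultra
```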

Example:

INSERT INTO products (product_id, product_name) VALUES
(1, 'iPhone 13 Pro'),
(2, 'Samsung Galaxy S22 Ultra'),
(3, 'Google Pixel 6 Pro');

UPDATE products
SET product_name = LOWER(
  REGEXP_REPLACE(product_name, '[^a-zA-Z0-9 ]', '')
);

SELECT * FROM products;

Output:

+------------+--------------------------+
| product_id | product_name             |
+------------+--------------------------+
| 1          | iphone 13 pro            |
| 2          | samsung galaxy s22 ultra |
| 3          | google pixel 6 pro       |
+------------+--------------------------+

Real-World Applications:

Fixing product names in this way can improve search results and data analysis. For example:

  • A search for "iphone" will now match products with the name "iPhone" or "iphone".

  • Analysis of product sales by category will be more accurate because all "iPhone" products will be grouped together.


Number of Times a Driver Was a Passenger

Problem Statement:

Given two tables:

  • drivers: Contains information about drivers.

  • passengers: Contains information about passengers.

Find the number of times each driver has been a passenger.

SQL Solution:

SELECT d.id, d.name, COUNT(*) AS passenger_count
FROM drivers AS d
JOIN passengers AS p ON d.id = p.driver_id
GROUP BY d.id, d.name;

Breakdown and Explanation:

  • JOIN Operation:

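    • We use the JOIN operation to combine rows from the drivers and passengers tables, matching drivers.id against passengers.driver_id (the query assumes the passengers table has such a column).

    • This pairs each driver with their matching passenger records. Because it is an inner join, drivers with no matching records are omitted rather than reported with a count of 0; a LEFT JOIN with COUNT(p.driver_id) would include them.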

  • GROUP BY Operation:

    • The GROUP BY operation groups the results by the driver's id and name columns.

    • This combines all passenger records for each driver into a single row.

  • COUNT(*) Function:

    • The COUNT(*) function counts the number of passenger records for each driver.

Real-World Applications:

  • Taxi Service:

    • Track the number of rides each taxi driver has completed.

  • Ride-Sharing Service:

    • Determine which drivers have the highest passenger demand.

  • Bus Service:

    • Monitor the number of passengers carried by each bus driver.


Popularity Percentage

Problem Statement:

Given a table containing a list of votes, find the percentage of votes each candidate received.

Table Structure:

CREATE TABLE votes (
  candidate TEXT,
  num_votes INTEGER
);

SQL Solution:

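SELECT
  candidate,
  SUM(num_votes) * 100.0 / SUM(SUM(num_votes)) OVER () AS percentage
FROM
  votes
GROUP BY
  candidate
ORDER BY
  percentage DESC;

Explanation:

  • Aggregate SUM(num_votes): With GROUP BY candidate, adds up the votes of each candidate, which also covers a candidate who appears in more than one row.

  • Window Function SUM(...) OVER (): Sums those per-candidate totals across all groups, giving the total number of votes cast. This is the denominator for the percentages. Without OVER (), SUM(num_votes) would only ever see one candidate's rows and every percentage would be 100.

  • Division: Divides each candidate's vote count by the total votes and multiplies by 100.0 to get a percentage; the decimal literal avoids integer division.

  • ORDER BY: Sorts the results in descending order of percentage, showing the candidates who received the most votes at the top.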
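
A small check of the percentage calculation, using Python's sqlite3 module (SQLite 3.25 or later for window functions). The function name vote_percentages and the vote counts are assumptions made for illustration; the query divides each candidate's total by a window sum taken over all candidates.

```python
import sqlite3

def vote_percentages(votes):
    """votes: iterable of (candidate, num_votes). Returns (candidate, percentage)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE votes (candidate TEXT, num_votes INTEGER)")
    conn.executemany("INSERT INTO votes VALUES (?, ?)", votes)
    return conn.execute(
        """
        SELECT candidate,
               SUM(num_votes) * 100.0 / SUM(SUM(num_votes)) OVER () AS percentage
        FROM votes
        GROUP BY candidate
        ORDER BY percentage DESC
        """
    ).fetchall()

# Hypothetical counts: 100 votes in total.
print(vote_percentages([("Alice", 50), ("Bob", 30), ("Carol", 20)]))
# [('Alice', 50.0), ('Bob', 30.0), ('Carol', 20.0)]
```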

Real-World Applications:

  • Election Analysis: Determine the popularity of candidates in an election.

  • Market Research: Track the popularity of products or services.

  • Social Media Analytics: Analyze the engagement of different posts or content creators.

  • Customer Feedback: Gauge the satisfaction levels of customers.


Find the Start and End Number of Continuous Ranges

Problem Statement:

Given a table ranges in which each row holds the start and end numbers of a range, merge overlapping ranges and return the start and end numbers of each resulting continuous range.

Example:

Input:

| start | end |
|---|---|
| 1 | 5 |
| 6 | 9 |
| 10 | 13 |
| 14 | 16 |

Output:

| start | end |
|---|---|
| 1 | 5 |
| 6 | 9 |
| 10 | 13 |
| 14 | 16 |

Solution Explanation:

The idea is to compare each range's start with the largest end among all earlier ranges (ordered by start). If the start is less than or equal to that value, the range overlaps the current continuous range and extends it; otherwise it begins a new one. A running sum of these "new range" flags gives every row a group number, and each group is then collapsed into a single row. A plain LAG(end) only looks at the immediately preceding row, so it would miss a range nested inside an earlier, longer one. In the example above no range overlaps an earlier one, so the output equals the input.

SQL Query:

SELECT MIN(start) AS start, MAX(end) AS end
FROM
(
    SELECT *, SUM(is_new) OVER (ORDER BY start) AS grp
    FROM
    (
        SELECT *,
               CASE WHEN start <= MAX(end) OVER (
                        ORDER BY start
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                    ) THEN 0 ELSE 1 END AS is_new
        FROM ranges
    ) AS flagged
) AS grouped
GROUP BY grp
ORDER BY start;

Breakdown of the Query:

  1. Flag New Ranges: The innermost subquery computes, for each row, the largest end among all earlier rows with MAX(end) OVER (ORDER BY start ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING). The column is_new is 0 when the current start is less than or equal to that value and 1 otherwise, including for the first row, which has no earlier rows.

  2. Number the Groups: SUM(is_new) OVER (ORDER BY start) is a running total of the flags. It increases by one every time a new continuous range begins, so all rows of the same continuous range share the same grp value.

  3. Group Ranges: The GROUP BY grp clause merges all rows of one continuous range into a single row.

  4. Get Range Start and End: The MIN(start) AS start and MAX(end) AS end expressions calculate the start and end numbers of each continuous range, respectively.

  5. Order Results: The ORDER BY start clause orders the results in ascending order by the range start.
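
Merging overlapping ranges is a "gaps and islands" problem, and one way to sketch it is shown below with Python's sqlite3 module (SQLite 3.25 or later). This is an illustration under assumptions rather than the query above verbatim: the columns are renamed range_start and range_end to avoid keyword clashes, the function name merge_ranges is made up, and the sample ranges are hypothetical. A range starts a new group when its start exceeds the largest end seen so far.

```python
import sqlite3

QUERY = """
SELECT MIN(range_start), MAX(range_end)
FROM (
    SELECT range_start, range_end,
           SUM(is_new) OVER (ORDER BY range_start) AS grp   -- running group number
    FROM (
        SELECT range_start, range_end,
               CASE WHEN range_start <= MAX(range_end) OVER (
                        ORDER BY range_start
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                    ) THEN 0 ELSE 1 END AS is_new            -- 1 = starts a new range
        FROM ranges
    )
)
GROUP BY grp
ORDER BY MIN(range_start)
"""

def merge_ranges(ranges):
    """ranges: iterable of (start, end). Returns the merged continuous ranges."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ranges (range_start INTEGER, range_end INTEGER)")
    conn.executemany("INSERT INTO ranges VALUES (?, ?)", ranges)
    return conn.execute(QUERY).fetchall()

# Hypothetical input: (3, 8) overlaps (1, 5); (11, 11) lies inside (10, 12).
print(merge_ranges([(1, 5), (3, 8), (10, 12), (11, 11), (20, 25)]))
# [(1, 8), (10, 12), (20, 25)]
```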

Real-World Applications:

This query can be used in various real-world applications, such as:

  • Time Management: Finding continuous time slots within a schedule.

  • Inventory Management: Identifying continuous stock levels that need to be replenished.

  • Data Analysis: Identifying patterns and trends in continuous data.


Average Selling Price

Problem Statement:

Given two tables, Orders and Products, find the average selling price of products.

Tables:

Orders (order_id, product_id, quantity)

Products (product_id, price)

Solution:

SELECT
  SUM(p.price * o.quantity) / SUM(o.quantity) AS average_selling_price
FROM
  Orders AS o
JOIN
  Products AS p
  ON o.product_id = p.product_id;

Breakdown:

  • Step 1: Join the Tables: We join the Orders and Products tables on the product_id column to connect orders with product prices.

  • Step 2: Calculate Average Price: We divide the total revenue, SUM(p.price * o.quantity), by the total number of units sold, SUM(o.quantity). A plain AVG(p.price) would average one price per order row and ignore the quantities.

Simplified Explanation:

Imagine a store that sells two products:

  • Product 1: $10

  • Product 2: $20

A customer orders 2 units of Product 1 and 1 unit of Product 2. So, the customer pays:

  • Product 1: $10 x 2 = $20

  • Product 2: $20 x 1 = $20

Total amount paid: $40

Number of products sold: 3 (2 + 1)

Average selling price = $40 / 3 = $13.33
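
The arithmetic above can be reproduced with Python's sqlite3 module. This is a sketch: the function name average_selling_price is made up, and the price column is declared REAL so that the division is not carried out on integers. The quantity-weighted average is total revenue divided by total units.

```python
import sqlite3

def average_selling_price(products, orders):
    """products: (product_id, price); orders: (order_id, product_id, quantity)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Products (product_id INTEGER PRIMARY KEY, price REAL);
        CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER);
        """
    )
    conn.executemany("INSERT INTO Products VALUES (?, ?)", products)
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", orders)
    (avg_price,) = conn.execute(
        """
        SELECT SUM(p.price * o.quantity) / SUM(o.quantity)
        FROM Orders AS o
        JOIN Products AS p ON o.product_id = p.product_id
        """
    ).fetchone()
    return avg_price

# The store example: 2 units at $10 and 1 unit at $20.
price = average_selling_price([(1, 10.0), (2, 20.0)], [(1, 1, 2), (2, 2, 1)])
print(round(price, 2))  # 13.33
```

An unweighted AVG(price) over the same two order rows would give 15.00 instead.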

Real-World Application:

  • Retail: Calculate the average selling price of products in a retail store.

  • Manufacturing: Determine the average selling price of finished goods.

  • Financial Analysis: Analyze the profitability of products by comparing their average selling price with their production costs.


Friends With No Mutual Friends

Problem: Find pairs of friends who do not have any mutual friends.

Example:

Table: Friend

| Person1 | Person2 |
| --------| --------|
| A       | B       |
| B       | C       |

In this example, A and B are friends, and B and C are friends. Neither pair shares a mutual friend (a third person who is a friend of both), so both pairs are returned. A and C are not friends with each other; B is their mutual friend.

Solution:

WITH AllFriends AS (
    SELECT Person1, Person2 FROM Friend
    UNION
    SELECT Person2, Person1 FROM Friend
)
SELECT f.Person1, f.Person2
FROM Friend f
WHERE NOT EXISTS (
    SELECT 1
    FROM AllFriends a
    JOIN AllFriends b ON b.Person2 = a.Person2
    WHERE a.Person1 = f.Person1
    AND b.Person1 = f.Person2
);

Breakdown:

  • The AllFriends CTE lists every friendship in both directions, so that each person's friends can be looked up through the Person1 column regardless of how the pair was stored. The query assumes each friendship is stored once in Friend.

  • The main query goes through each friendship f in the Friend table.

  • The NOT EXISTS subquery looks for a third person who appears both among the friends of f.Person1 (alias a) and among the friends of f.Person2 (alias b). If such a mutual friend exists, the pair is removed from the result.

Output:

| Person1 | Person2 |
| --------| --------|
| A       | B       |
| B       | C       |

Real-World Application:

This query can be used to identify friendships in a social network whose two members have no overlap in their social circles. Since neither user is yet connected to anyone in the other's circle, the friends of one are natural candidates to recommend to the other as potential connections.


Finding the Topic of Each Post

Problem Statement:

Write a SQL query to find the topic of each post in a forum. Posts can belong to multiple topics.

Example Table:

| post_id | post_content | topic_id |
|---|---|---|
| 1 | Hello world! | 1 |
| 2 | This is a post | 2 |
| 3 | About Python | 1 |
| 4 | SQL query | 3 |
| 5 | Java tutorial | 2 |

Answer Table:

| post_id | topic |
|---|---|
| 1 | General |
| 2 | Programming |
| 3 | General |
| 4 | SQL |
| 5 | Programming |

Explanation:

To find the topic of each post, we need to join the Posts and Topics tables on the topic_id column. Then, we can use the GROUP_CONCAT() function to concatenate the topic names for each post.

Real-World Applications:

This query can be used to find the topics that are most commonly discussed in a forum. This information can be used to improve the forum's organization and make it easier for users to find the content they are interested in.

Simplified Implementation:

SELECT
  post_id,
  GROUP_CONCAT(topic_name) AS topic
FROM Posts
INNER JOIN Topics
  ON Posts.topic_id = Topics.topic_id
GROUP BY
  post_id;

Symmetric Coordinates

Problem:

Given a table coordinates_table with columns x, y, and z, find all pairs of points that lie symmetrically across the origin.

SQL Solution:

WITH SymmetricPoints AS (
  SELECT
    x AS x1,
    y AS y1,
    z AS z1,
    -x AS x2,
    -y AS y2,
    -z AS z2
  FROM coordinates_table
)
SELECT DISTINCT
  s.x1,
  s.y1,
  s.z1,
  s.x2,
  s.y2,
  s.z2
FROM SymmetricPoints s
JOIN coordinates_table c
  ON c.x = s.x2 AND c.y = s.y2 AND c.z = s.z2
WHERE (s.x1, s.y1, s.z1) > (s.x2, s.y2, s.z2);

Explanation:

The SymmetricPoints CTE pairs each point with its reflection through the origin, obtained by negating every coordinate with the - operator. The WITH clause makes this intermediate result available to the main query. The main query then joins the reflections back to coordinates_table, keeping a row only when the reflected point is itself stored in the table. Without the WHERE condition every pair would be reported twice, once from each side; the row comparison (s.x1, s.y1, s.z1) > (s.x2, s.y2, s.z2) keeps one orientation of each pair and also excludes the origin, which is its own reflection.

Example:

Consider the following table:

+----+----+----+
| x  | y  | z  |
+----+----+----+
| 1  | 2  | 3  |
| -1 | -2 | -3 |
| -5 | 4  | -2 |
| 7  | 0  | 1  |
+----+----+----+

The SymmetricPoints table would look like this:

+----+----+----+----+----+----+
| x1 | y1 | z1 | x2 | y2 | z2 |
+----+----+----+----+----+----+
| 1  | 2  | 3  | -1 | -2 | -3 |
| -1 | -2 | -3 | 1  | 2  | 3  |
| -5 | 4  | -2 | 5  | -4 | 2  |
| 7  | 0  | 1  | -7 | 0  | -1 |
+----+----+----+----+----+----+

Only the reflections of (1, 2, 3) and (-1, -2, -3) exist in the table, and they form a single pair, so the final result would be:

+----+----+----+----+----+----+
| x1 | y1 | z1 | x2 | y2 | z2 |
+----+----+----+----+----+----+
| 1  | 2  | 3  | -1 | -2 | -3 |
+----+----+----+----+----+----+

Real-World Application:

This problem has applications in physics, geometry, and computer graphics. For example, it can be used to find the center of mass of a system or to compute the moment of inertia of an object.


User Purchase Platform

Problem Statement

Given a table purchases with the following schema:

| user_id | product_id | purchase_date | purchase_amount |

Write a SQL query to create a user purchase platform that shows the top 10 products purchased by each user.

Solution

WITH UserPurchaseCounts AS (
  SELECT
    user_id,
    product_id,
    COUNT(*) AS purchase_count
  FROM purchases
  GROUP BY
    user_id,
    product_id
),
RankedProducts AS (
  SELECT
    user_id,
    product_id,
    purchase_count,
    RANK() OVER (PARTITION BY user_id ORDER BY purchase_count DESC) AS purchase_rank
  FROM UserPurchaseCounts
)
SELECT
  user_id,
  product_id
FROM RankedProducts
WHERE
  purchase_rank <= 10;

Breakdown

1. UserPurchaseCounts Subquery:

This subquery groups the purchases by user_id and product_id and counts the number of purchases made for each combination.

2. RankedProducts Subquery:

This subquery ranks the products within each user's purchase history using the RANK() function. The PARTITION BY clause ensures that the ranking is done separately for each user.

3. Final Query:

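The final query selects the user_id and product_id for all products that are ranked within the top 10 for each user. Because RANK() gives tied products the same rank, a user can end up with more than 10 rows; ROW_NUMBER() can be used instead when at most 10 rows per user are required.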

Real-World Application

This query can be used to create personalized user purchase experiences, such as:

  • Product Recommendations: Showing users products they are likely to purchase based on their previous purchases.

  • Loyalty Programs: Rewarding users for purchasing specific products or combinations of products.

  • Fraud Detection: Identifying unusual purchase patterns that may indicate fraudulent activity.


The Category of Each Member in the Store

Problem: Given a table that stores the categories of members in a store, find the category of each member.

Table:

| member_id | category |
|---|---|
| 1 | Gold |
| 2 | Silver |
| 3 | Bronze |

Query:

SELECT
  member_id,
  category
FROM members;

Output:

| member_id | category |
|---|---|
| 1 | Gold |
| 2 | Silver |
| 3 | Bronze |

Explanation:

The query simply selects all the columns from the members table. This will give us a list of all the members and their respective categories.

Real-World Applications:

This query can be used in a variety of real-world applications, such as:

  • Customer segmentation: Businesses can use this query to segment their customers into different categories based on their membership status. This information can then be used to tailor marketing campaigns and other promotions.

  • Loyalty programs: Businesses can use this query to track the progress of their members in loyalty programs. This information can then be used to reward members for their loyalty.

  • Fraud detection: Businesses can use this query to identify members who are attempting to commit fraud. For example, a business could flag members who are trying to make purchases with stolen credit cards.


Binary Tree Nodes

SQL Solution

WITH RECURSIVE Tree AS (
    SELECT id, parent_id, 1 AS depth
    FROM table_name
    WHERE parent_id IS NULL  -- Start with the root node
    UNION ALL
    SELECT t.id, t.parent_id, tr.depth + 1
    FROM table_name t
    JOIN Tree tr ON t.parent_id = tr.id
)
SELECT id, depth FROM Tree;

Explanation

  1. Recursive Common Table Expression (CTE): The WITH RECURSIVE Tree AS ( ... ) clause defines a recursive CTE named Tree. It serves as the base for generating a hierarchical representation of the tree structure.

  2. Initialization: The base case of the recursion selects the root node (where parent_id is NULL) and sets its depth to 1.

  3. Recursive Step: The recursive part selects child nodes and increments their depth by 1 based on the depth of their parent nodes. This step populates the CTE with all the nodes and their respective depths.

  4. Projection: The final SELECT statement projects the id and depth columns from the Tree CTE. This gives us the desired result: a list of nodes with their corresponding depths.
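The recursion described above can be tried out with a minimal sketch that runs the same kind of query in SQLite through Python's built-in sqlite3 module. The table name nodes and its five rows are assumptions made for illustration; they do not come from the original problem.

```python
import sqlite3

# In-memory database with a small, hypothetical tree:
# 1 is the root, 2 and 3 are its children, 4 is under 2, 5 is under 4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER)")
conn.executemany(
    "INSERT INTO nodes (id, parent_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)],
)

depth_query = """
WITH RECURSIVE Tree AS (
    SELECT id, parent_id, 1 AS depth
    FROM nodes
    WHERE parent_id IS NULL          -- base case: the root
    UNION ALL
    SELECT n.id, n.parent_id, t.depth + 1
    FROM nodes n
    JOIN Tree t ON n.parent_id = t.id  -- recursive step: children of known nodes
)
SELECT id, depth FROM Tree ORDER BY id;
"""

# Map each node id to its depth.
depths = dict(conn.execute(depth_query).fetchall())
print(depths)
```

Each pass of the recursive step adds the children of the rows found in the previous pass, so the recursion stops by itself once the leaves have been reached.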

Real-World Applications

  • Genealogical Trees: Represent family relationships and track lineage.

  • Organizational Charts: Model hierarchical structures within companies or organizations.

  • File Systems: Organize files and folders into a nested hierarchy.

  • Graph Algorithms: Perform depth-first or breadth-first search operations on tree structures.


Dynamic Unpivoting of a Table

Problem:

You have a table containing multiple columns, and you want to "unpivot" it to create a new table with two columns: one for the original column names and one for the corresponding values.

Example:

Original Table:

| Name | Age | City |
|---|---|---|
| John | 25 | New York |
| Mary | 30 | London |
| Bob | 28 | Paris |

Unpivoted Table:

| Column_Name | Value |
|---|---|
| Name | John |
| Age | 25 |
| City | New York |
| Name | Mary |
| Age | 30 |
| City | London |
| Name | Bob |
| Age | 28 |
| City | Paris |

Solution:

The Dynamic Unpivoting technique involves using a combination of SQL functions and dynamic SQL to create the unpivoted table. Here's a step-by-step breakdown:

  1. Create a temporary table to store the column names:

SELECT COLUMN_NAME
INTO #ColumnNames
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'OriginalTable';

  2. Generate the unpivoted SQL statement dynamically:

DECLARE @UnpivotSQL nvarchar(max) = N'';

-- Build one SELECT per column; CAST gives every branch the same Value type
SELECT @UnpivotSQL += N'SELECT ''' + COLUMN_NAME + N''' AS Column_Name, CAST('
    + QUOTENAME(COLUMN_NAME) + N' AS nvarchar(max)) AS Value FROM OriginalTable UNION ALL '
FROM #ColumnNames;

-- Remove the trailing " UNION ALL" (LEN ignores the final space)
SET @UnpivotSQL = LEFT(@UnpivotSQL, LEN(@UnpivotSQL) - 10);

  3. Execute the dynamic SQL statement:

EXEC (@UnpivotSQL);

Explanation:

  • The code uses SQL Server (T-SQL) syntax. The INFORMATION_SCHEMA.COLUMNS view provides metadata about the columns in the original table.

  • The first SQL statement creates a temporary table #ColumnNames that contains the names of all columns in the original table.

  • The second SQL statement builds one SELECT per column from the names in #ColumnNames and chains them with UNION ALL. Each value is cast to nvarchar(max) so that columns of different types (here, text and numbers) can share the single Value column.

  • The EXEC statement executes the dynamically generated SQL statement and returns the unpivoted rows. Without an ORDER BY, the rows are not guaranteed to come back grouped by original row as in the example above.
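The same idea can be sketched in a runnable form with SQLite and Python's built-in sqlite3 module. This is an illustration under stated assumptions rather than a port of the code above: SQLite has no INFORMATION_SCHEMA and no EXEC, so PRAGMA table_info supplies the column names and Python assembles the statement. The row_id column is an addition that keeps the values of one original row together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OriginalTable (Name TEXT, Age INTEGER, City TEXT)")
conn.executemany(
    "INSERT INTO OriginalTable VALUES (?, ?, ?)",
    [("John", 25, "New York"), ("Mary", 30, "London"), ("Bob", 28, "Paris")],
)

# Column names from the table metadata (index 1 of each PRAGMA row is the name).
columns = [row[1] for row in conn.execute("PRAGMA table_info(OriginalTable)")]

# One SELECT per column; CAST makes every Value a string.
template = (
    "SELECT rowid AS row_id, '{c}' AS Column_Name, "
    "CAST(\"{c}\" AS TEXT) AS Value FROM OriginalTable"
)
unpivot_sql = " UNION ALL ".join(template.format(c=col) for col in columns)
unpivot_sql += " ORDER BY row_id"

rows = conn.execute(unpivot_sql).fetchall()
unpivoted = [(name, value) for _, name, value in rows]
for pair in unpivoted:
    print(pair)
```

Column names are interpolated into the SQL text here, which is acceptable only because they come from the database's own metadata; values supplied by users should never be spliced into a statement this way.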

Potential Applications:

  • Reporting: Unpivoting can help create reports that summarize data across multiple columns in a table.

  • Data Analysis: Unpivoted data can be used to perform more complex data analysis tasks, such as identifying trends and patterns.

  • Data Integration: Unpivoting can be useful when integrating data from different sources with different schemas.


Pizza Toppings Cost Analysis

Problem Statement

You have a pizza shop and offer various toppings to your customers. Each topping has a different cost per serving. Given a list of orders, you need to calculate the total cost of all the toppings for each order.

Input Table:

| order_id | topping_id | quantity |
|---|---|---|
| 1 | 1 | 2 |
| 1 | 2 | 1 |
| 2 | 1 | 1 |
| 2 | 3 | 2 |

Toppings Table:

| topping_id | topping_name | cost_per_serving |
|---|---|---|
| 1 | Pepperoni | 0.50 |
| 2 | Mushrooms | 0.25 |
| 3 | Onions | 0.15 |

Output Table:

| order_id | total_topping_cost |
|---|---|
| 1 | 1.25 |
| 2 | 0.80 |

Solution in SQL

SELECT o.order_id, SUM(t.cost_per_serving * o.quantity) AS total_topping_cost
FROM orders o
JOIN toppings t ON o.topping_id = t.topping_id
GROUP BY o.order_id;

Explanation

  1. INNER JOIN: The query starts by joining the orders table (aliased as o) with the toppings table (aliased as t) on the topping_id column using an INNER JOIN. This operation ensures that only orders with valid topping IDs are considered.

  2. SUM Aggregation: After joining the tables, the query uses SUM() to calculate the total topping cost for each order. It multiplies the cost_per_serving for each topping by the quantity ordered and then sums these values for each order.

  3. GROUP BY: The GROUP BY clause groups the results by the order_id column. This step ensures that the total topping cost is calculated separately for each order.
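As a check on the arithmetic, the query can be run against the sample rows in SQLite through Python's built-in sqlite3 module. This is a minimal sketch; the REAL column type and the rounding to two decimals are assumptions added for the demonstration. For order 2 the expected total is 1 × 0.50 + 2 × 0.15 = 0.80.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, topping_id INTEGER, quantity INTEGER);
CREATE TABLE toppings (topping_id INTEGER, topping_name TEXT, cost_per_serving REAL);
INSERT INTO orders VALUES (1, 1, 2), (1, 2, 1), (2, 1, 1), (2, 3, 2);
INSERT INTO toppings VALUES (1, 'Pepperoni', 0.50), (2, 'Mushrooms', 0.25), (3, 'Onions', 0.15);
""")

rows = conn.execute("""
SELECT o.order_id, SUM(t.cost_per_serving * o.quantity) AS total_topping_cost
FROM orders o
JOIN toppings t ON o.topping_id = t.topping_id
GROUP BY o.order_id
ORDER BY o.order_id
""").fetchall()

# Round to cents so floating-point noise does not leak into the report.
totals = {order_id: round(cost, 2) for order_id, cost in rows}
print(totals)
```

For real billing, storing prices as integer cents or in a DECIMAL column avoids floating-point rounding altogether.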

Real-World Example

This query can be used in a real-world pizza shop to calculate the total cost of toppings for each customer's order. This information can be used for inventory management, cost analysis, and billing purposes.


Winning Candidate

SQL Code:

WITH CandidateScores AS (
    SELECT candidate, SUM(score) AS total_score
    FROM Votes
    GROUP BY candidate
),
WinningCandidate AS (
    SELECT candidate
    FROM CandidateScores
    WHERE total_score = (SELECT MAX(total_score) FROM CandidateScores)
)
SELECT *
FROM WinningCandidate;

Breakdown and Explanation:

Common Table Expression (CTE):

  • CandidateScores: This CTE calculates the total score for each candidate by summing the scores from the Votes table.

Subquery:

  • SELECT MAX(total_score) FROM CandidateScores: This subquery finds the maximum total score from all candidates.

WinningCandidate CTE:

  • This CTE selects the candidate with the maximum total score, effectively identifying the winning candidate.

SELECT Statement:

  • Finally, the SELECT * statement retrieves all columns from the WinningCandidate CTE, which contains the information about the winning candidate.

Example:

Suppose we have a Votes table with the following data:

| candidate | score |
|---|---|
| John | 40 |
| Mary | 50 |
| Bob | 30 |

The Winning Candidate query would produce the following result:

| candidate |
|---|
| Mary |

Applications in Real World:

This query can be used in various scenarios:

  • Election Results: Determine the winner of an election based on the number of votes received.

  • Customer Feedback: Identify the best-rated product or service based on customer reviews.

  • School Performance: Find the top-performing student in a class based on their exam scores.


Primary Department for Each Employee

Problem Statement:

Given a table named Employees with the following columns:

  • id (int)

  • name (string)

  • department (string)

You need to find the primary department for each employee. The primary department is the department with the highest number of employees.

SQL Solution:

SELECT
  e.name,
  d.department
FROM Employees AS e
JOIN (
  SELECT
    department,
    COUNT(*) AS employee_count
  FROM Employees
  GROUP BY
    department
  ORDER BY
    employee_count DESC
  LIMIT 1
) AS d
  ON e.department = d.department;

Breakdown:

  1. Join the Employees Table: We start by joining the Employees table with a subquery to find the department with the highest number of employees.

  2. Subquery: The subquery calculates the employee count for each department and sorts the results in descending order.

  3. Limit 1: We only want the department with the highest count, so we limit the results to 1 row.

  4. Join on Department: We then join the Employees table with the subquery result on the department column to identify the primary department for each employee.
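The effect of the LIMIT 1 subquery is easiest to see by running it. The sketch below uses SQLite through Python's built-in sqlite3 module with the employee rows from the example that follows. One assumption is added: a tie-breaker on the department name, because Sales and Engineering both have two employees and LIMIT 1 alone would pick between them arbitrarily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (id INTEGER, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [
        (1, "John", "Sales"),
        (2, "Mary", "Engineering"),
        (3, "Bob", "Sales"),
        (4, "Jane", "Engineering"),
        (5, "Tom", "Marketing"),
    ],
)

rows = conn.execute("""
SELECT e.name, d.department
FROM Employees AS e
JOIN (
  SELECT department, COUNT(*) AS employee_count
  FROM Employees
  GROUP BY department
  ORDER BY employee_count DESC, department ASC  -- tie-breaker added for this sketch
  LIMIT 1
) AS d
  ON e.department = d.department
ORDER BY e.name
""").fetchall()
print(rows)
```

Only the employees of the single largest department survive the inner join; with the tie-breaker that department is Engineering, so John, Bob, and Tom are not returned.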

Example:

Consider the following Employees table:

| id | name | department |
|---|---|---|
| 1 | John | Sales |
| 2 | Mary | Engineering |
| 3 | Bob | Sales |
| 4 | Jane | Engineering |
| 5 | Tom | Marketing |

Result:

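Sales and Engineering are tied with two employees each, so LIMIT 1 keeps whichever of the two the database returns first, and the inner join then drops every employee outside that department. If Sales is chosen, the result is:

| name | department |
|---|---|
| John | Sales |
| Bob | Sales |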

Real-World Application:

This query can be useful in HR systems to identify the primary department of employees for various purposes, such as:

  • Staffing decisions

  • Resource allocation

  • Performance evaluations


Countries You Can Safely Invest In

Problem Statement

Given a table of countries and their risk ratings, find the countries that are safe to invest in.

Table Schema

CREATE TABLE countries (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(255) NOT NULL,
  risk_rating INT NOT NULL,
  PRIMARY KEY (id)
);

Solution

SELECT name
FROM countries
WHERE risk_rating <= 5;

Explanation

The SELECT statement retrieves the name column from the countries table. The WHERE clause filters the results to only include countries with a risk_rating of 5 or less.

Example

SELECT name
FROM countries
WHERE risk_rating <= 5;

| name |
|---|
| Canada |
| Switzerland |
| Sweden |


Real-World Applications

This query can be used by investors to identify countries that are considered safe for making investments. By investing in countries with low risk ratings, investors can reduce the risk of losing their money.


Game Play Analysis II

Problem Statement:

Given two tables:

Game_Play ( player_id INT, game_id INT, start_time TIMESTAMP, end_time TIMESTAMP )

Player ( player_id INT, name VARCHAR(255) )


Find the top 10 players with the longest total playtime.

Best & Performant Solution:

WITH PlayerTotalTime AS (
  SELECT
    player_id,
    SUM(TIMESTAMPDIFF(SECOND, start_time, end_time)) AS total_time  -- total playtime in seconds (MySQL)
  FROM Game_Play
  GROUP BY player_id
)
SELECT
  p.name,
  ptt.total_time
FROM Player p
JOIN PlayerTotalTime ptt ON p.player_id = ptt.player_id
ORDER BY ptt.total_time DESC
LIMIT 10;

Explanation:

The solution uses a Common Table Expression (CTE) named PlayerTotalTime to calculate the total playtime for each player. It then joins the Player table with the PlayerTotalTime CTE to get the player names and total playtime. Finally, it orders the results by total playtime in descending order and limits the results to the top 10 players.

Simplified Explanation:

  1. Define a CTE called PlayerTotalTime (a named, temporary result set, not a stored table) that calculates the total playtime for each player.

  2. Join the Player table with the PlayerTotalTime CTE to get the player names and total playtime.

  3. Sort the results by the total playtime in descending order.

  4. Limit the results to the top 10 players.

Real World Implementation and Examples:

This problem can be used to analyze gameplay data in real-world applications. For example, a game developer could use this query to identify the players who are most engaged with their game. This information could then be used to tailor marketing campaigns or improve game design.

Potential Applications:

  • Identifying the most engaged players in a game.

  • Analyzing player behavior patterns.

  • Improving game design by understanding how players interact with the game.


Employee Bonus

Problem:

Given a table of employee bonuses, find the total bonus received by each employee.

Table:

bonuses (employee_id, bonus_amount)

Example:

| employee_id | bonus_amount |
|---|---|
| 1 | 100 |
| 2 | 200 |
| 1 | 300 |
| 3 | 400 |

Output:

| employee_id | total_bonus |
|---|---|
| 1 | 400 |
| 2 | 200 |
| 3 | 400 |

Solution:

SELECT employee_id, SUM(bonus_amount) AS total_bonus
FROM bonuses
GROUP BY employee_id;

Explanation:

  • The SUM(bonus_amount) function calculates the total bonus received by each employee.

  • The GROUP BY employee_id clause groups the results by employee ID, so that the total bonus is calculated for each employee.

Real-World Application:

This query can be used to generate a report of the total bonuses received by employees in a company. This report can be used for performance evaluations, compensation planning, and other HR-related tasks.


Second Degree Follower

Problem Statement:

Given a table of social media followers, find the second-degree followers of a given user.

Table Schema:

CREATE TABLE followers (
  id INT PRIMARY KEY,
  follower INT,
  followed INT,
  FOREIGN KEY (follower) REFERENCES users (id),
  FOREIGN KEY (followed) REFERENCES users (id)
);

Example Data:

INSERT INTO followers (id, follower, followed) VALUES
(1, 1, 2),
(2, 1, 3),
(3, 2, 4),
(4, 3, 5),
(5, 4, 6);

Example Query:

The example uses user 4 as the given user.

SELECT DISTINCT u.name
FROM followers f
JOIN followers sf ON sf.followed = f.follower
JOIN users u ON u.id = sf.follower
WHERE f.followed = 4;

Output:

John

The output assumes that user 1 is named John in the users table, which is not shown here.

Breakdown and Explanation:

1. Find the followers of the given user (first-degree followers):

SELECT follower FROM followers WHERE followed = 4;

This query returns the IDs of the first-degree followers of user 4. In the example data, that is user 2.

2. Find the followers of the first-degree followers (second-degree followers):

SELECT follower FROM followers WHERE followed IN (SELECT follower FROM followers WHERE followed = 4);

This query returns the IDs of the second-degree followers of user 4. In the example data, that is user 1, who follows user 2.

3. Join the results to get the names of the second-degree followers:

SELECT u.name FROM users u JOIN followers f ON u.id = f.follower WHERE f.followed IN (SELECT follower FROM followers WHERE followed = 4);

This query joins the users table with the followers table on the follower column to get the names of the second-degree followers. It returns the same names as the two-join example query above.
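To see the two hops in action, the lookup can be run against the example follower rows in SQLite through Python's built-in sqlite3 module. This is a sketch under assumptions: the users table is not shown in the original, so all names except John for user 1 are hypothetical, and the helper function second_degree_followers is introduced only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE followers (id INTEGER PRIMARY KEY, follower INTEGER, followed INTEGER);
-- Names other than John are made up for this sketch.
INSERT INTO users VALUES (1, 'John'), (2, 'Mary'), (3, 'Bob'),
                         (4, 'Jane'), (5, 'Tom'), (6, 'Ann');
INSERT INTO followers VALUES (1, 1, 2), (2, 1, 3), (3, 2, 4), (4, 3, 5), (5, 4, 6);
""")


def second_degree_followers(user_id):
    """Return names of users who follow someone who follows user_id."""
    rows = conn.execute(
        """
        SELECT DISTINCT u.name
        FROM followers f                                 -- hop 1: who follows user_id
        JOIN followers sf ON sf.followed = f.follower    -- hop 2: who follows them
        JOIN users u ON u.id = sf.follower
        WHERE f.followed = ?
        ORDER BY u.name
        """,
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(second_degree_followers(4))
print(second_degree_followers(2))
```

User 4 is followed by user 2, who is followed by user 1, so the first call finds John. Nobody follows user 1, so user 2 has no second-degree followers in this data and the second call returns an empty list.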

Potential Applications:

  • Social Media Platform: Identify the second-degree connections of users to provide recommendations or targeted advertising.

  • Customer Relationship Management (CRM): Track the connections between customers to understand their relationships and interactions.

  • Fraud Detection: Identify potential fraud by analyzing the connections between users involved in suspicious activities.


Find Cutoff Score for Each School

Problem:

Find the cutoff score for each school, where the cutoff score is the minimum score required to be admitted to that school.

Table:

Students (id, name, score, school_id)
Schools (id, name)

SQL Query:

WITH SchoolCutoff AS (
  SELECT
    S.school_id,
    MIN(S.score) AS cutoff_score
  FROM Students AS S
  GROUP BY
    S.school_id
)
SELECT
  S.name,
  C.cutoff_score
FROM Schools AS S
JOIN SchoolCutoff AS C
  ON S.id = C.school_id;

Breakdown:

Step 1: Calculate Cutoff Scores for Each School

The SchoolCutoff CTE calculates the minimum score (cutoff score) for each school. It groups the students by their school ID and finds the minimum score in each group, that is, the lowest score among the students listed for that school.

Step 2: Join Schools and Cutoff Scores

The main query joins the Schools table with the SchoolCutoff CTE on the school ID. This combines the school names with their respective cutoff scores.

Result:

The result is a table that lists each school's name and its corresponding cutoff score.

Example:

Students Table:

| id | name | score | school_id |
|---|---|---|---|
| 1 | John | 75 | 1 |
| 2 | Mary | 80 | 1 |
| 3 | Bob | 65 | 2 |
| 4 | Alice | 70 | 2 |

Schools Table:

| id | name |
|---|---|
| 1 | Harvard |
| 2 | Stanford |

Result:

| name | cutoff_score |
|---|---|
| Harvard | 75 |
| Stanford | 65 |

Real-World Application:

This query can be used by universities to determine the cutoff scores for admission to different programs or schools. It helps ensure that only students who meet or exceed the minimum requirements are admitted.


Department Highest Salary

Problem Statement: Given a table containing employee data, including their department and salary, find the highest salary for each department.

SQL Query:

SELECT department, MAX(salary) AS highest_salary
FROM employee
GROUP BY department;

Explanation:

The query can be broken down into the following steps:

  1. Select the department and maximum salary: The SELECT statement selects two columns:

    • department: The department of the employee.

    • MAX(salary): The maximum salary for each department. This is calculated using the MAX() aggregate function.

  2. Group by department: The GROUP BY clause groups the results by department. This means that for each department, the maximum salary will be calculated based on all employees in that department.

Example:

Consider the following employee table:

| employee_id | department | salary |
|---|---|---|
| 1 | Sales | 50000 |
| 2 | Marketing | 60000 |
| 3 | Sales | 45000 |
| 4 | Engineering | 70000 |

The query will return the following result:

| department | highest_salary |
|---|---|
| Sales | 50000 |
| Marketing | 60000 |
| Engineering | 70000 |

Real-World Applications:

This query can be used in various real-world applications, such as:

  • Human Resource Management: To find the top salary in each department as a reference point for salary negotiations. The query returns the salary figure only, not the employee who earns it.

  • Budget Planning: To estimate the total budget required for salaries in each department based on the highest salary.

  • Performance Analysis: To compare the performance of different departments based on the average or highest salary of their employees.


Immediate Food Delivery I

LeetCode Problem: Immediate Food Delivery I

Problem Statement:

A delivery company provides immediate food delivery services. You are given a table named orders that contains the following columns:

| Column | Type | Description |
|---|---|---|
| order_id | INT | Unique ID of the order |
| restaurant_id | INT | ID of the restaurant |
| rider_id | INT | ID of the rider assigned to deliver the order |
| delivery_time | VARCHAR(20) | Time taken for delivery in "HH:MM:SS" format |

You need to write a SQL query that calculates the average delivery time for each rider.

Solution:

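-- Calculate the average delivery time for each rider
-- delivery_time is a string, so convert it to seconds before averaging (MySQL functions)
SELECT rider_id,
       SEC_TO_TIME(AVG(TIME_TO_SEC(delivery_time))) AS avg_delivery_time
FROM orders
GROUP BY rider_id;

Explanation:

  1. SELECT rider_id, SEC_TO_TIME(AVG(TIME_TO_SEC(delivery_time))): This line selects the rider_id and calculates the average delivery_time for each rider. Because delivery_time is stored as an "HH:MM:SS" string, applying AVG() to it directly would not average the durations. TIME_TO_SEC() first converts each value to seconds, AVG() averages the seconds, and SEC_TO_TIME() converts the result back to a time value.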

  2. GROUP BY rider_id: This line groups the results by rider_id, so that the average delivery time is calculated for each rider separately.
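The convert-average-convert-back approach can be sketched in SQLite through Python's built-in sqlite3 module. SQLite does not provide TIME_TO_SEC() or SEC_TO_TIME(), so this sketch slices the "HH:MM:SS" string with substr() and formats the result in Python; the sample orders and the helper to_hms are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, restaurant_id INTEGER, "
    "rider_id INTEGER, delivery_time TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 100, 1, "00:10:00"),
        (2, 100, 1, "00:20:00"),
        (3, 101, 2, "00:20:00"),
    ],
)

# Convert HH:MM:SS to seconds inside SQL, then average per rider.
rows = conn.execute("""
SELECT rider_id,
       AVG(CAST(substr(delivery_time, 1, 2) AS INTEGER) * 3600
         + CAST(substr(delivery_time, 4, 2) AS INTEGER) * 60
         + CAST(substr(delivery_time, 7, 2) AS INTEGER)) AS avg_seconds
FROM orders
GROUP BY rider_id
ORDER BY rider_id
""").fetchall()


def to_hms(seconds):
    """Format a number of seconds as HH:MM:SS, rounded to the nearest second."""
    seconds = int(round(seconds))
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"


averages = {rider: to_hms(avg) for rider, avg in rows}
print(averages)
```

Storing durations as a numeric column (seconds) or a native time type in the first place would make the conversion step unnecessary.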

Example:

| Rider ID | Average Delivery Time |
|---|---|
| 1 | 00:15:00 |
| 2 | 00:20:00 |
| 3 | 00:25:00 |

Real World Applications:

This query can be used by the delivery company to:

  • Identify riders who have consistently fast delivery times.

  • Monitor rider performance and provide training or support to improve delivery times.

  • Optimize delivery routes and allocate riders to minimize delivery times.


Active Users

Problem: Find the active users on a website, where a user counts as active if at least one of their sessions started within the last 24 hours.

Input: A table named user_sessions with the following columns:

  • user_id: The unique ID of the user.

  • start_time: The time when the user's session started.

  • end_time: The time when the user's session ended.

Output: A table named active_users with the following columns:

  • user_id: The unique ID of the active user.

Solution:

SELECT DISTINCT user_id
FROM user_sessions
WHERE start_time >= NOW() - INTERVAL 1 DAY;

Breakdown:

  1. The SELECT DISTINCT user_id clause selects the distinct user IDs of active users.

  2. The FROM user_sessions clause specifies the input table.

  3. The WHERE start_time >= NOW() - INTERVAL 1 DAY clause filters the rows to only include user sessions that started within the last 24 hours.

Performance:

The query is optimized for performance because it uses the following techniques:

  • Indexing: The user_sessions table should be indexed on the start_time column to improve the performance of the query.

  • Filtering: The WHERE clause filters out inactive users, which reduces the number of rows that need to be processed.

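  • DISTINCT: The DISTINCT keyword prevents duplicate user IDs from being returned when a user has several recent sessions. It is there for correctness rather than speed, since removing duplicates adds some work; the smaller result set is cheaper to transfer and process downstream.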
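A repeatable version of this filter can be sketched in SQLite through Python's built-in sqlite3 module. Two assumptions are made for the sake of a deterministic example: the "current time" is a fixed value passed in as a parameter instead of NOW(), and timestamps are stored as "YYYY-MM-DD HH:MM:SS" strings, which compare correctly as text.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_sessions (user_id INTEGER, start_time TEXT, end_time TEXT)")

fmt = "%Y-%m-%d %H:%M:%S"
now = datetime(2024, 1, 10, 12, 0, 0)  # fixed stand-in for NOW()

# (user_id, session start, session end); user 1 has two recent sessions.
sessions = [
    (1, now - timedelta(hours=2), now - timedelta(hours=1)),
    (1, now - timedelta(hours=5), now - timedelta(hours=4)),
    (2, now - timedelta(days=3), now - timedelta(days=3) + timedelta(hours=1)),
    (3, now - timedelta(hours=23), now - timedelta(hours=22)),
]
conn.executemany(
    "INSERT INTO user_sessions VALUES (?, ?, ?)",
    [(uid, start.strftime(fmt), end.strftime(fmt)) for uid, start, end in sessions],
)

# An index on start_time lets the range filter avoid a full table scan.
conn.execute("CREATE INDEX idx_sessions_start ON user_sessions (start_time)")

cutoff = (now - timedelta(days=1)).strftime(fmt)
active = [
    uid
    for (uid,) in conn.execute(
        "SELECT DISTINCT user_id FROM user_sessions "
        "WHERE start_time >= ? ORDER BY user_id",
        (cutoff,),
    )
]
print(active)
```

User 1 appears once despite having two qualifying sessions, which is the job DISTINCT does here, and user 2 is filtered out because the only session is three days old.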

Real-World Application:

This query can be used to identify active users on a website for various purposes, such as:

  • Website Analytics: Tracking the number of active users can provide insights into website traffic and engagement.

  • Targeted Marketing: Active users can be targeted with personalized marketing campaigns.

  • Customer Support: Identifying active users can help customer support teams prioritize support requests.


Leetflex Banned Accounts

Problem Statement:

A social media platform, Leetflex, bans accounts for violating their community guidelines. They store the banned accounts in a table called BannedAccounts. Given a list of account IDs, determine which accounts are banned.

SQL Implementation:

SELECT account_id
FROM BannedAccounts
WHERE account_id IN (SELECT account_id FROM InputAccounts);

Simplified Explanation:

  1. Input Table: The InputAccounts table contains the list of account IDs we want to check.

  2. BannedAccounts Table: The BannedAccounts table contains the account IDs of all banned accounts.

  3. IN Operator: The IN operator keeps a row from the BannedAccounts table only if its account_id also appears in the list returned by the subquery on InputAccounts.

  4. SELECT Statement: The query retrieves the account IDs from the BannedAccounts table that match the ones in the InputAccounts table.

Real-World Application:

This query can be used by Leetflex to identify banned accounts from a list of user-submitted account IDs. This is essential for enforcing their community guidelines and maintaining the integrity of the platform. For example, if someone reports an account for harassment, Leetflex can use this query to determine if the account has already been banned.

Example:

InputAccounts Table:

| account_id |
|---|
| 1 |
| 2 |
| 3 |

BannedAccounts Table:

| account_id |
|---|
| 2 |
| 4 |

Query Result:

| account_id |
|---|
| 2 |

Explanation: Account ID 2 is present in both the InputAccounts and BannedAccounts tables, indicating that it is a banned account.


Find Customers With Positive Revenue this Year

SQL Query

SELECT customer_id
FROM customers
WHERE year(revenue_date) = year(now()) AND revenue > 0;

Explanation

This query finds customers who have generated positive revenue in the current year. It uses the following steps:

  1. Extract the year from the revenue_date column: The year() function extracts the year component from a date value. This is used to filter out revenue data from previous years.

  2. Compare the extracted year with the current year: The year(now()) expression returns the current year. The query checks if the year extracted from the revenue_date column matches the current year.

  3. Filter for positive revenue: The revenue > 0 condition ensures that only customers with positive revenue are included in the results.

Real-World Examples

This query can be used in various real-world applications, such as:

  • Sales analysis: Identifying customers who are generating revenue can help businesses understand their customer base and target sales efforts accordingly.

  • Customer segmentation: Segmenting customers based on their revenue can help businesses personalize marketing campaigns and provide tailored promotions.

  • Customer retention: Focusing on customers with positive revenue can help businesses identify valuable customers and develop strategies to retain them.

Potential Applications

  • Customer relationship management (CRM) systems

  • Financial reporting platforms

  • Sales analytics dashboards


Find the Missing IDs

Problem Statement

Given a table employees with the following columns:

  • employee_id (primary key)

  • employee_name

  • manager_id (foreign key referencing the employee_id column of the same table)

Find the employee IDs of employees who don't have managers.

SQL Query

SELECT employee_id
FROM employees
WHERE manager_id IS NULL;

Explanation

  • The SELECT clause retrieves the employee_id column.

  • The FROM clause selects from the employees table.

  • The WHERE clause specifies that only employees with a NULL value in the manager_id column are selected.

Real-World Application

This query can be used to find employees who report directly to the CEO or other top-level executives. This information can be useful for organizational planning and communication.


Find Total Time Spent by Each Employee

Problem Statement:

Given a table EmployeeHours that stores employee hours worked, find the total time spent by each employee.

Example:

Table:

| employee_id | date | hours_worked |
|---|---|---|
| 1 | 2022-01-01 | 8 |
| 1 | 2022-01-02 | 6 |
| 2 | 2022-01-01 | 10 |

Output:

| employee_id | total_hours |
|---|---|
| 1 | 14 |
| 2 | 10 |

Solution:

SELECT
  employee_id,
  SUM(hours_worked) AS total_hours
FROM EmployeeHours
GROUP BY
  employee_id;

Explanation:

  1. SELECT employee_id, SUM(hours_worked) AS total_hours: This line calculates the total hours worked for each employee and assigns it to the alias total_hours.

  2. FROM EmployeeHours: Specifies the table from which to fetch data.

  3. GROUP BY employee_id: Groups the data by employee ID, ensuring that each employee's hours are summed correctly.

Real-World Applications:

  • Tracking employee productivity by calculating total hours worked.

  • Identifying employees with high or low work volume.

  • Planning employee schedules based on total hours available.


Project Employees I

Problem:

Find all employees who work on at least three different projects.

SQL Query:

SELECT EmployeeID
FROM Employees
GROUP BY EmployeeID
HAVING COUNT(DISTINCT ProjectID) >= 3;

Explanation:

  1. SELECT EmployeeID: Select the employee IDs, one per group.

  2. FROM Employees: From the 'Employees' table.

  3. GROUP BY EmployeeID: Group the results by employee ID to count the number of projects each employee works on.

  4. HAVING COUNT(DISTINCT ProjectID) >= 3: Filter the results to include only employees who work on three or more different projects. DISTINCT keeps a project that appears in several rows for the same employee from being counted more than once.

Example:

| EmployeeID | Count(ProjectID) |
|---|---|
| 10 | 3 |
| 20 | 5 |
| 30 | 2 |

Result:

| EmployeeID |
|---|
| 10 |
| 20 |

Breakdown:

  • Group BY: Groups rows in a table based on one or more columns. In this case, we group by 'EmployeeID' to count the number of projects for each employee.

  • Having: Filters the results of a group operation based on a condition. In this case, we filter to include only groups (employees) with three or more projects.
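The difference between counting rows and counting different projects shows up as soon as the table contains a repeated assignment. The sketch below runs both variants in SQLite through Python's built-in sqlite3 module; the rows, including the duplicate for employee 30, and the helper employees_with are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID INTEGER, ProjectID INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [
        (10, 1), (10, 2), (10, 3),
        (20, 1), (20, 2), (20, 3), (20, 4), (20, 5),
        (30, 1), (30, 1), (30, 2),  # project 1 is listed twice for employee 30
    ],
)


def employees_with(count_expr):
    """Return employee IDs whose count expression is at least 3."""
    sql = (
        "SELECT EmployeeID FROM Employees "
        f"GROUP BY EmployeeID HAVING {count_expr} >= 3 "
        "ORDER BY EmployeeID"
    )
    return [employee_id for (employee_id,) in conn.execute(sql)]


by_rows = employees_with("COUNT(ProjectID)")
by_projects = employees_with("COUNT(DISTINCT ProjectID)")
print(by_rows)
print(by_projects)
```

Employee 30 has three rows but only two different projects, so only the DISTINCT variant answers the question as stated.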

Real-World Application:

This query can be used to identify employees who are involved in multiple projects, which can be helpful in:

  • Project management: Tracking employee workload and project involvement.

  • Resource allocation: Identifying employees with diverse skills who can contribute to multiple projects.

  • Performance evaluation: Assessing employees based on their contributions to different projects.


Consecutive Available Seats

Problem Statement:

Find the maximum number of consecutive available seats in a row of seats in a theater.

SQL Query:

WITH Available AS (
    -- Seats in the same unbroken run share the same run_id
    SELECT seat_number,
           seat_number - ROW_NUMBER() OVER (ORDER BY seat_number) AS run_id
    FROM seats
    WHERE is_available = 1
), Runs AS (
    SELECT run_id, COUNT(*) AS run_length
    FROM Available
    GROUP BY run_id
)
SELECT MAX(run_length) AS max_consecutive_available_seats
FROM Runs;

Breakdown and Explanation:

This query uses the "gaps and islands" technique and requires window function support (for example, MySQL 8.0 or later). It assumes that seat numbers in the row are consecutive integers.

  1. Available: This CTE (Common Table Expression) keeps the available seats and numbers them in seat order. Within an unbroken run of seats, the seat number and the row number rise together, so their difference (run_id) stays constant; it changes whenever an unavailable seat interrupts the run. For example, if seats 3, 4, 5 and 7 are available, their run_id values are 2, 2, 2 and 3.

  2. Runs: This CTE groups the available seats by run_id and counts the seats in each group, which gives the length of every run of consecutive available seats.

  3. MAX(run_length): Finally, we take the largest run length, which is the maximum number of consecutive available seats.
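One common way to measure runs of consecutive available seats is the "gaps and islands" technique: subtract a running row number from the seat number, so that every seat in an unbroken run gets the same group key. The sketch below is one possible implementation in SQLite through Python's built-in sqlite3 module, not necessarily the approach of the query above. It needs window functions (SQLite 3.25 or later), and the eight sample seats are an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seats (seat_number INTEGER PRIMARY KEY, is_available INTEGER)")
conn.executemany(
    "INSERT INTO seats VALUES (?, ?)",
    # Seats 2 and 6 are taken; the available runs are [1], [3, 4, 5], and [7, 8].
    [(1, 1), (2, 0), (3, 1), (4, 1), (5, 1), (6, 0), (7, 1), (8, 1)],
)

(longest_run,) = conn.execute("""
WITH Available AS (
    SELECT seat_number,
           seat_number - ROW_NUMBER() OVER (ORDER BY seat_number) AS run_id
    FROM seats
    WHERE is_available = 1
)
SELECT MAX(run_length)
FROM (
    SELECT COUNT(*) AS run_length
    FROM Available
    GROUP BY run_id
) AS runs
""").fetchone()
print(longest_run)
```

The available seats 1, 3, 4, 5, 7, 8 receive row numbers 1 to 6, giving group keys 0, 1, 1, 1, 2, 2, so the longest run is the three seats 3 to 5.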

Real-World Application:

This query can be used in a theater booking system to find the best seats for a group of people. By maximizing the number of consecutive available seats, you can ensure that the group can sit together.


Find Users With Valid E-Mails

Problem Statement:

Find all users in a database table who have valid email addresses. A valid email address is one that contains an "@" symbol and a period ".".

Solution:

SELECT *
FROM Users
WHERE email LIKE '%@%' AND email LIKE '%.%';

Breakdown and Explanation:

  1. SELECT *: This selects all columns (fields) from the "Users" table.

  2. FROM Users: This specifies which table to search in, in this case, the "Users" table.

  3. WHERE: This is the filtering condition that specifies which rows to select.

  4. email LIKE '%@%': This checks if the email column contains the "@" symbol anywhere within the string. The wildcard character "%" matches any number of characters.

  5. AND email LIKE '%.%': This checks if the email column contains the "." symbol anywhere within the string.

Together, these conditions ensure that only rows with email addresses that contain both "@" and "." are selected.
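A quick experiment shows what the two LIKE conditions accept and reject. The sketch below uses SQLite through Python's built-in sqlite3 module; the four sample addresses are assumptions chosen to probe the edges of the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [
        (1, "ann@example.com"),   # has both '@' and '.'
        (2, "bob.example.com"),   # no '@'
        (3, "carol@localhost"),   # no '.'
        (4, ".@"),                # has both characters but is not a usable address
    ],
)

valid_ids = [
    user_id
    for (user_id,) in conn.execute(
        "SELECT id FROM Users WHERE email LIKE '%@%' AND email LIKE '%.%' ORDER BY id"
    )
]
print(valid_ids)
```

Row 4 passes even though it is clearly not a real address, so this check is best treated as a coarse first filter under the problem's definition of "valid" rather than as full email validation.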

Real-World Application:

This query can be useful in various scenarios:

  • Validating user input: Websites and applications often require users to provide email addresses, and this query can ensure that the entered emails are in a valid format.

  • Cleaning up data: Databases can contain outdated or invalid email addresses, and this query can help identify and remove such records.

  • Marketing campaigns: Email marketing campaigns rely on accurate email addresses to reach the intended recipients. This query can help ensure that the target list contains only valid emails.


Trips and Users

Problem:

Given two tables: Trips and Users.

Trips:
| id | user_id | start_time | end_time |
|---|---|---|---|
| 1 | 10 | 2022-01-01 10:00:00 | 2022-01-01 12:00:00 |
| 2 | 15 | 2022-01-02 14:00:00 | 2022-01-02 16:00:00 |
| 3 | 20 | 2022-01-03 09:00:00 | 2022-01-03 11:00:00 |

Users:
| user_id | name |
|---|---|
| 10 | John |
| 15 | Mary |
| 20 | Bob |

Find the total duration of trips for each user.

SQL Solution:

SELECT u.name, SUM(TIMESTAMPDIFF(SECOND, t.start_time, t.end_time)) AS total_duration
FROM Trips t
JOIN Users u ON t.user_id = u.user_id
GROUP BY u.name;

Explanation:

  • Join the Trips and Users tables: This is done using the JOIN clause, which matches rows from the two tables based on the common column user_id.

  • Calculate the duration of each trip: This is done using the TIMESTAMPDIFF() function, which calculates the time difference between two timestamps (in seconds).

  • Sum the durations for each user: The SUM() function is used to sum the durations of all trips for each user.

  • Group the results by user name: The GROUP BY clause groups the results by the name column from the Users table.
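The steps above can be checked against the sample rows with SQLite through Python's built-in sqlite3 module. SQLite has no TIMESTAMPDIFF(), so this sketch substitutes the difference of strftime('%s', ...) values, which are Unix timestamps in seconds. Grouping by user_id as well as name is an added precaution so that two users who share a name are not merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (user_id INTEGER, name TEXT);
CREATE TABLE Trips (id INTEGER, user_id INTEGER, start_time TEXT, end_time TEXT);
INSERT INTO Users VALUES (10, 'John'), (15, 'Mary'), (20, 'Bob');
INSERT INTO Trips VALUES
  (1, 10, '2022-01-01 10:00:00', '2022-01-01 12:00:00'),
  (2, 15, '2022-01-02 14:00:00', '2022-01-02 16:00:00'),
  (3, 20, '2022-01-03 09:00:00', '2022-01-03 11:00:00');
""")

rows = conn.execute("""
SELECT u.name,
       SUM(CAST(strftime('%s', t.end_time) AS INTEGER)
         - CAST(strftime('%s', t.start_time) AS INTEGER)) AS total_duration
FROM Trips t
JOIN Users u ON t.user_id = u.user_id
GROUP BY u.user_id, u.name
ORDER BY u.user_id
""").fetchall()
print(rows)
```

Each sample trip lasts two hours, so every user ends up with 7200 seconds.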

Output:

| name | total_duration |
|---|---|
| John | 7200 |
| Mary | 7200 |
| Bob | 7200 |

Additional Notes:

  • This query assumes that the timestamps in the Trips table are in the same time zone. If they are not, you may need to convert them to a common time zone first, for example with MySQL's CONVERT_TZ() function.

  • The query can be optimized using an index on the user_id column in the Trips table.

  • This query can be used in a variety of real-world applications, such as:

    • Calculating the total duration of trips for employees in a travel expense reimbursement system.

    • Analyzing the usage of shared vehicles in a carpooling application.

    • Determining the most popular travel routes for a ride-sharing service.


Percentage of Users Attended a Contest

SELECT user_id,
       COUNT(*) AS total_contests_attended
FROM contest_logs
WHERE contest_id IN (SELECT contest_id
                     FROM contests
                     WHERE start_date >= '2022-01-01'
                       AND end_date <= '2022-12-31')
GROUP BY user_id
ORDER BY total_contests_attended DESC;

In this query:

  • The contest_logs table contains a record for each time a user attends a contest. Each row includes the user_id and the contest_id.

  • The contests table contains information about each contest, including the contest_id, start_date, and end_date.

  • The subquery in the WHERE clause selects the contest_ids of all contests that took place between '2022-01-01' and '2022-12-31'.

  • The GROUP BY clause groups the results by user_id to count the number of contests each user attended.

  • The ORDER BY clause sorts the results in descending order by the number of contests attended.


Monthly Transactions II

Problem Statement:

Given a table Transactions containing the following columns:

  • user_id (int)

  • month (string)

  • transaction_amount (int)

You need to find the total transaction amount for each user in each month.

Best & Performant SQL Solution:

SELECT
  user_id,
  month,
  SUM(transaction_amount) AS total_amount
FROM Transactions
GROUP BY
  user_id,
  month;

Breakdown and Explanation:

  • SELECT ... specifies the columns to be included in the result.

    • user_id is the user identifier.

    • month is the month in which the transaction occurred.

    • SUM(transaction_amount) calculates the total amount of transactions for each user in each month.

  • FROM Transactions specifies the table from which the data is retrieved.

  • GROUP BY groups the results by the user ID and month to calculate the total amount for each user in each month.

Real-World Implementation and Examples:

This query can be used in various real-world applications, such as:

  • Financial Analysis: To analyze user spending patterns over time.

  • Customer Segmentation: To identify users with similar transaction behaviors.

  • Sales Forecasting: To predict future transaction amounts based on historical data.

Example:

Consider the following Transactions table:

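+---------+----------+--------------------+
| user_id | month    | transaction_amount |
+---------+----------+--------------------+
| 1       | January  | 100                |
| 1       | February | 200                |
| 2       | January  | 150                |
| 2       | March    | 250                |
| 3       | February | 300                |
| 3       | March    | 400                |
+---------+----------+--------------------+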

The following query would return the total transaction amount for each user in each month:

SELECT
  user_id,
  month,
  SUM(transaction_amount) AS total_amount
FROM Transactions
GROUP BY
  user_id,
  month;

Output:

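+---------+----------+--------------+
| user_id | month    | total_amount |
+---------+----------+--------------+
| 1       | January  | 100          |
| 1       | February | 200          |
| 2       | January  | 150          |
| 2       | March    | 250          |
| 3       | February | 300          |
| 3       | March    | 400          |
+---------+----------+--------------+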


Customers Who Bought Products A and B but Not C

Problem Statement: Find customers who have purchased products A and B, but haven't purchased product C.

SQL Solution:

WITH ProductPurchases AS (
  SELECT
    customer_id,
    product_id
  FROM
    purchases
), PurchasesCounts AS (
  SELECT
    customer_id,
    SUM(CASE WHEN product_id = 'A' THEN 1 ELSE 0 END) AS num_bought_A,
    SUM(CASE WHEN product_id = 'B' THEN 1 ELSE 0 END) AS num_bought_B,
    SUM(CASE WHEN product_id = 'C' THEN 1 ELSE 0 END) AS num_bought_C
  FROM
    ProductPurchases
  GROUP BY
    customer_id
)
SELECT
  customer_id
FROM
  PurchasesCounts
WHERE
  num_bought_A > 0 AND num_bought_B > 0 AND num_bought_C = 0;

Breakdown:

  1. ProductPurchases: A common table expression (CTE) that selects the customer_id and product_id from the purchases table.

  2. PurchasesCounts: Another CTE that groups the purchases by customer_id and counts the number of times products A, B, and C were purchased.

  3. The final query uses the PurchasesCounts CTE to select customer_ids where products A and B were purchased more than 0 times, but product C was not purchased.
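
The conditional counts can also be checked directly in a HAVING clause. The following is a minimal sketch, run against SQLite through Python's sqlite3 module; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (customer_id INTEGER, product_id TEXT);
    INSERT INTO purchases VALUES
        (1, 'A'), (1, 'B'),            -- bought A and B only
        (2, 'A'), (2, 'B'), (2, 'C'),  -- also bought C
        (3, 'A'),                      -- never bought B
        (4, 'B'), (4, 'A'), (4, 'A');  -- repeat purchases still count
""")

query = """
    SELECT customer_id
    FROM purchases
    GROUP BY customer_id
    HAVING SUM(CASE WHEN product_id = 'A' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN product_id = 'B' THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN product_id = 'C' THEN 1 ELSE 0 END) = 0
    ORDER BY customer_id
"""
matching_customers = [row[0] for row in conn.execute(query)]
print(matching_customers)
```

Customers 1 and 4 qualify: customer 2 is dropped for buying C and customer 3 for never buying B.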

Real-World Application:

  • Identifying customers who may be interested in purchasing product C based on their past purchases.

  • Understanding customer behavior and product preferences.

  • Targeted marketing campaigns to offer product C to specific customers.


Bank Account Summary

Problem Statement:

Write an SQL query to get a summary of a bank account. The summary should include the account number, account balance, and a list of all transactions made on the account.

Sample Data:

Accounts:
+-------------+-------------+
| account_id  | balance     |
+-------------+-------------+
| 1001        | 1000.00     |
| 1002        | 500.00      |
| 1003        | 2500.00     |
+-------------+-------------+

Transactions:
+----------------+------------+---------+
| transaction_id | account_id | amount  |
+----------------+------------+---------+
| 2001           | 1001       | 100.00  |
| 2002           | 1001       | 200.00  |
| 2003           | 1002       | 50.00   |
| 2004           | 1003       | 1000.00 |
+----------------+------------+---------+

Solution:

WITH AccountSummary AS (
    SELECT
        a.account_id,
        a.balance,
        SUM(t.amount) AS total_transactions
    FROM
        Accounts a
    LEFT JOIN
        Transactions t ON a.account_id = t.account_id
    GROUP BY
        a.account_id, a.balance
)
SELECT
    account_id,
    balance,
    total_transactions,
    (
        SELECT
            GROUP_CONCAT(amount)
        FROM
            Transactions t
        WHERE
            t.account_id = AccountSummary.account_id
    ) AS transactions
FROM
    AccountSummary;

Explanation:

  1. Create a Common Table Expression (CTE) named AccountSummary to calculate the account summary:

    • Join the Accounts and Transactions tables using a left join on the account_id column.

    • Group the results by account_id and balance.

    • Calculate the total amount of transactions for each account using SUM(t.amount) and alias it as total_transactions.

  2. Select the Columns:

    • From the AccountSummary CTE, select the account_id, balance, total_transactions, and a subquery to retrieve the list of transactions.

  3. Subquery to Get Transactions:

    • The subquery selects the amount column from the Transactions table where t.account_id matches the AccountSummary.account_id.

    • The results are concatenated using GROUP_CONCAT and returned as the transactions column.
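
As a minimal sketch, the summary can be reproduced on the sample data with SQLite through Python's sqlite3 module. Two choices here are assumptions for illustration: amounts are stored as whole numbers, and the transaction list is built with GROUP_CONCAT in the same grouped query instead of a correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Accounts (account_id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE Transactions (
        transaction_id INTEGER PRIMARY KEY, account_id INTEGER, amount INTEGER);
    INSERT INTO Accounts VALUES (1001, 1000), (1002, 500), (1003, 2500);
    INSERT INTO Transactions VALUES
        (2001, 1001, 100), (2002, 1001, 200), (2003, 1002, 50), (2004, 1003, 1000);
""")

# One row per account: balance, summed transaction amounts, and the amounts as a list.
summary = conn.execute("""
    SELECT a.account_id,
           a.balance,
           SUM(t.amount) AS total_transactions,
           GROUP_CONCAT(t.amount) AS transactions
    FROM Accounts AS a
    LEFT JOIN Transactions AS t ON a.account_id = t.account_id
    GROUP BY a.account_id, a.balance
    ORDER BY a.account_id
""").fetchall()
for row in summary:
    print(row)
```

Note that SUM(t.amount) yields the summed amounts (300 for account 1001), not the number of transactions. Because of the LEFT JOIN, an account without transactions would show NULL in both aggregate columns.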

Output:

+------------+---------+--------------------+---------------+
| account_id | balance | total_transactions | transactions  |
+------------+---------+--------------------+---------------+
| 1001       | 1000.00 | 300.00             | 100.00,200.00 |
| 1002       | 500.00  | 50.00              | 50.00         |
| 1003       | 2500.00 | 1000.00            | 1000.00       |
+------------+---------+--------------------+---------------+

Real-World Application:

This query can be used to provide a summary of a bank account to the account holder or for internal reporting purposes. The summary includes key information such as the account balance, total transactions, and a list of all transactions made on the account.


Find Median Given Frequency of Numbers

SQL Solution:

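WITH CumulativeTable AS (
  SELECT
    number,
    frequency,
    SUM(frequency) OVER (ORDER BY number ASC) AS partial_sum,
    SUM(frequency) OVER () AS total
  FROM frequency_table
)
SELECT AVG(number) AS median
FROM CumulativeTable
WHERE partial_sum >= total / 2.0
  AND partial_sum - frequency <= total / 2.0;

Explanation:

  1. CumulativeTable: A common table expression (CTE) that computes, for each number, the cumulative sum of frequencies up to and including that number (partial_sum) and the total of all frequencies (total).

  2. Median Condition: In the expanded, sorted list of values, a number occupies positions partial_sum - frequency + 1 through partial_sum. It contains the median when that range covers the midpoint, that is, when partial_sum >= total / 2 and partial_sum - frequency <= total / 2.

  3. Median Query: Select the numbers that satisfy this condition and average them. Usually one number qualifies. When the total is even and the midpoint falls exactly between two different numbers, both qualify and the median is their average.

Example:

Suppose we have a frequency table like this:

+--------+-----------+
| Number | Frequency |
+--------+-----------+
| 1      | 4         |
| 2      | 6         |
| 3      | 2         |
+--------+-----------+

Cumulative Table:

+--------+-----------+-------------+
| Number | Frequency | Partial Sum |
+--------+-----------+-------------+
| 1      | 4         | 4           |
| 2      | 6         | 10          |
| 3      | 2         | 12          |
+--------+-----------+-------------+

The total of all frequencies is 12, so the midpoint is 6.

  • Number 1: the partial sum 4 is less than 6, so it does not qualify.

  • Number 2: the partial sum 10 is at least 6, and 10 - 6 = 4 is at most 6, so it qualifies.

  • Number 3: 12 - 2 = 10 is greater than 6, so it does not qualify.

The query returns 2, which is the median: of the 12 values, the 6th and 7th are both 2.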
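
A number holds the median when the range of positions it occupies in the sorted, expanded list covers the midpoint. The sketch below checks that range condition with SQLite through Python's sqlite3 module. It uses correlated subqueries instead of window functions so that it also runs on older SQLite builds; the helper name median_from_frequencies is hypothetical.

```python
import sqlite3

# A number qualifies when the cumulative frequency up to and including it reaches
# half the total, while the cumulative frequency before it does not exceed half.
MEDIAN_SQL = """
    SELECT AVG(f.number)
    FROM frequency_table AS f
    WHERE (SELECT SUM(frequency) FROM frequency_table WHERE number <= f.number)
              >= (SELECT SUM(frequency) FROM frequency_table) / 2.0
      AND (SELECT COALESCE(SUM(frequency), 0) FROM frequency_table WHERE number < f.number)
              <= (SELECT SUM(frequency) FROM frequency_table) / 2.0
"""

def median_from_frequencies(pairs):
    """Return the median of the values described by (number, frequency) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE frequency_table (number INTEGER, frequency INTEGER)")
    conn.executemany("INSERT INTO frequency_table VALUES (?, ?)", pairs)
    median = conn.execute(MEDIAN_SQL).fetchone()[0]
    conn.close()
    return median

print(median_from_frequencies([(1, 4), (2, 6), (3, 2)]))  # values 1,1,1,1,2,2,2,2,2,2,3,3
print(median_from_frequencies([(1, 2), (3, 2)]))          # values 1,1,3,3
```

In the second call the midpoint falls between 1 and 3, so both numbers qualify and AVG returns their mean.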

Applications:

Finding the median of a distribution is useful in many real-world applications, such as:

  • Data Analysis: Identifying the middle value of a dataset, representing the "typical" value.

  • Statistics: Calculating the 50th percentile, which is often used to summarize data.

  • Machine Learning: Evaluating the performance of models by using the median as a threshold or target value.


Immediate Food Delivery II

Problem:

Given three tables:

  • Restaurants

    • restaurant_id (int)

    • restaurant_name (string)

    • address (string)

    • phone_number (string)

  • Orders

    • order_id (int)

    • restaurant_id (int)

    • customer_id (int)

    • order_time (string)

    • total_amount (int)

  • Customers

    • customer_id (int)

    • customer_name (string)

    • address (string)

    • phone_number (string)

Find all restaurants that can deliver food to customers within a certain time frame T. The time frame T can be calculated as the difference between the current time and the order time.

SOLUTION:

SELECT DISTINCT
  R.restaurant_id,
  R.restaurant_name
FROM Restaurants AS R
JOIN Orders AS O
  ON R.restaurant_id = O.restaurant_id
WHERE
  O.order_time >= DATE_SUB(NOW(), INTERVAL T MINUTE);

Explanation:

  1. Join the Restaurants and Orders tables on the common column restaurant_id to link restaurant information with customer orders.

  2. Filter the orders by checking if the order time is greater than or equal to the current time minus the specified time frame T. This condition ensures that we only select orders that were placed within the specified time frame.

  3. Select the distinct restaurant_id and restaurant_name so that a restaurant with several qualifying orders appears only once. No GROUP BY is needed because the query does not aggregate anything.
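
Because NOW() changes on every run, the filter is easier to verify with the current time passed in as a parameter. The sketch below does this with SQLite through Python's sqlite3 module, where datetime(:now, '-N minutes') plays the role of DATE_SUB(NOW(), INTERVAL T MINUTE). The helper name restaurants_with_recent_orders and the chosen timestamps are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Restaurants (restaurant_id INTEGER PRIMARY KEY, restaurant_name TEXT);
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, restaurant_id INTEGER, order_time TEXT);
    INSERT INTO Restaurants VALUES
        (1, 'Pizza Palace'), (2, 'Burgers & Fries'), (3, 'Sushi Delight');
    INSERT INTO Orders VALUES
        (1, 1, '2023-01-01 12:00:00'), (2, 2, '2023-01-01 13:00:00'),
        (3, 3, '2023-01-01 14:00:00'), (4, 1, '2023-01-01 15:00:00'),
        (5, 2, '2023-01-01 16:00:00');
""")

def restaurants_with_recent_orders(now, minutes):
    """Restaurants with at least one order placed in the `minutes` before `now`."""
    return conn.execute("""
        SELECT DISTINCT r.restaurant_id, r.restaurant_name
        FROM Restaurants AS r
        JOIN Orders AS o ON r.restaurant_id = o.restaurant_id
        WHERE o.order_time >= datetime(:now, '-' || :minutes || ' minutes')
        ORDER BY r.restaurant_id
    """, {"now": now, "minutes": minutes}).fetchall()

print(restaurants_with_recent_orders("2023-01-01 16:30:00", 120))
print(restaurants_with_recent_orders("2023-01-01 16:10:00", 15))
```

Timestamps are stored as ISO-8601 text with seconds so that plain string comparison orders them correctly.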

Example:

Suppose we have the following data:

Restaurants:
+---------------+-----------------+-----------------+--------------+
| restaurant_id | restaurant_name | address         | phone_number |
+---------------+-----------------+-----------------+--------------+
| 1             | Pizza Palace    | 123 Main Street | 555-1234     |
| 2             | Burgers & Fries | 456 Oak Street  | 555-2345     |
| 3             | Sushi Delight   | 789 Pine Street | 555-3456     |
+---------------+-----------------+-----------------+--------------+

Orders:
+----------+---------------+-------------+------------------+--------------+
| order_id | restaurant_id | customer_id | order_time       | total_amount |
+----------+---------------+-------------+------------------+--------------+
| 1        | 1             | 10          | 2023-01-01 12:00 | $50.00       |
| 2        | 2             | 15          | 2023-01-01 13:00 | $40.00       |
| 3        | 3             | 20          | 2023-01-01 14:00 | $60.00       |
| 4        | 1             | 10          | 2023-01-01 15:00 | $45.00       |
| 5        | 2             | 15          | 2023-01-01 16:00 | $50.00       |
+----------+---------------+-------------+------------------+--------------+

Customers:
+-------------+-----------------+------------------+--------------+
| customer_id | customer_name   | address          | phone_number |
+-------------+-----------------+------------------+--------------+
| 10          | John Doe        | 101 Elm Street   | 555-5678     |
| 15          | Jane Smith      | 202 Cedar Street | 555-6789     |
| 20          | Michael Johnson | 303 Maple Street | 555-7890     |
+-------------+-----------------+------------------+--------------+

If we run the query with T = 120 at a current time of 2023-01-01 16:30 (both values are assumed here for illustration), the cutoff is 14:30 and we get the following result:

+---------------+-----------------+
| restaurant_id | restaurant_name |
+---------------+-----------------+
| 1             | Pizza Palace    |
| 2             | Burgers & Fries |
+---------------+-----------------+

Pizza Palace (order at 15:00) and Burgers & Fries (order at 16:00) each have an order placed within the last 120 minutes. Sushi Delight is excluded because its only order was placed at 14:00, before the cutoff.

Applications:

This query can be used in real-world applications that provide on-demand food delivery services. It can help customers find restaurants that can deliver food within a specific time frame, ensuring that they receive their food quickly and efficiently.


Count the Number of Experiments

Problem: Count the Number of Experiments

SQL:

SELECT COUNT(*) AS experiment_count
FROM experiments;

Explanation:

This SQL query counts the number of rows in the experiments table. Each row represents an experiment, so the count of rows is the count of experiments.

Breakdown:

  • The SELECT clause specifies the columns to be returned in the result set. In this case, we want to count the number of experiments, so we select COUNT(*). The COUNT(*) function counts all rows in the table, regardless of their column values.

  • The FROM clause specifies the table to be used in the query. In this case, we use the experiments table.

  • The WHERE clause can be used to filter the rows in the table based on certain criteria. In this case, we don't use a WHERE clause because we want to count all experiments.

Example:

Consider the following experiments table:

| id | name |
|---|---|
| 1 | Experiment 1 |
| 2 | Experiment 2 |
| 3 | Experiment 3 |

If we run the above SQL query on this table, we will get the following result:

| experiment_count |
|-----------------|
| 3                |

This means that there are three experiments in the experiments table.

Real-World Application:

This SQL query can be used in various real-world applications, such as:

  • To track the number of experiments conducted in a scientific study.

  • To determine the number of experiments that have been completed in a laboratory setting.

  • To count the number of experiments that have been published in a scientific journal.


User Activities within Time Bounds

Problem:

Given a table activities with columns id, user_id, start_time, and end_time, find the users who have activities within a specific time range.

Solution:

SELECT DISTINCT user_id
FROM activities
WHERE start_time >= '2022-01-01' AND end_time <= '2022-12-31';

Breakdown:

  • The SELECT statement retrieves the user_id column.

  • The DISTINCT keyword ensures that each user is listed only once.

  • The FROM clause specifies the activities table.

  • The WHERE clause filters the rows based on the following condition:

    • The start_time column must be greater than or equal to '2022-01-01'.

    • The end_time column must be less than or equal to '2022-12-31'.

Real-World Application:

This query can be used to analyze user activity within a specific time period. For example, a business could use this query to determine which users have been active in the last year or to identify users who have recently stopped using the platform.

Simplification:

In plain English, the query finds all the users who have had activities between January 1, 2022, and December 31, 2022. This is useful if you want to identify users who have been active during a particular time period, such as during a promotional campaign or holiday season.


Account Balance

Problem Statement:

Given a table Account containing the following columns:

  • account_id: Unique identifier for each account

  • balance: Current balance of the account

Write a query to calculate the sum of balances for each account.

Solution:

SELECT
  account_id,
  SUM(balance) AS total_balance
FROM
  Account
GROUP BY
  account_id;

Explanation:

  1. The SELECT statement retrieves the account_id and the sum of balance for each account_id.

  2. The FROM clause specifies the Account table from which the data is retrieved.

  3. The GROUP BY clause groups the results by account_id so that the SUM function can calculate the total balance for each account.

Real-World Application:

This query is useful in any system that tracks financial transactions, such as banking or accounting systems. It can be used to:

  • Calculate the total balance of a customer's savings and checking accounts

  • Monitor account balances to identify potential fraud or suspicious activity

  • Generate reports on account activity

Example:

Consider the following data in the Account table:

+------------+---------+
| account_id | balance |
+------------+---------+
| 100        | 1000    |
| 101        | 2000    |
| 100        | 500     |
+------------+---------+

The query would return the following result:

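+------------+---------------+
| account_id | total_balance |
+------------+---------------+
| 100        | 1500          |
| 101        | 2000          |
+------------+---------------+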

This shows that account 100 has a total balance of $1500 (1000 + 500), while account 101 has a balance of $2000.


Suspicious Bank Accounts

Problem Statement:

You have a database of bank accounts and their corresponding transactions. A bank account is considered suspicious if it has more than $1,000 in deposits on a single day. Your task is to identify all the suspicious accounts.

SQL Solution:

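SELECT DISTINCT account_number
FROM transactions
GROUP BY account_number, transaction_date  -- transaction_date is an assumed column holding the day of each deposit
HAVING SUM(amount) > 1000;

Explanation:

  • The SELECT DISTINCT statement fetches each qualifying account number once from the transactions table, even if the account exceeds the limit on several days.

  • The GROUP BY clause groups the transactions by account number and day, so that deposits are totaled per account per day. The problem statement does not give the schema, so the transaction_date column is an assumption.

  • The HAVING SUM(amount) > 1000 clause keeps only the groups whose deposits for that day total more than $1,000.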

Real-World Applications:

Suspicious bank account detection systems are used in fraud detection and anti-money laundering efforts. By identifying accounts that exhibit unusual activity, banks can prevent financial crimes and protect their customers.

Additional Notes:

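  • The above query assumes that all transactions are positive (deposits). If the table also contains withdrawals, for example stored as negative amounts, exclude them with a condition such as WHERE amount > 0 so that they do not offset the deposit total. Wrapping the amounts in ABS() would instead count withdrawals as deposits.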

  • You can optimize the query by creating an index on the account_number column.
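
As a minimal sketch of the single-day rule, the snippet below groups deposits by account and day using SQLite through Python's sqlite3 module. The transaction_date column, the convention that withdrawals are negative amounts, and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (account_number TEXT, transaction_date TEXT, amount INTEGER);
    INSERT INTO transactions VALUES
        ('A', '2023-01-01',  600), ('A', '2023-01-01',  500),  -- 1,100 on one day
        ('B', '2023-01-01',  700), ('B', '2023-01-02',  700),  -- 1,400 spread over two days
        ('C', '2023-01-03', 1500), ('C', '2023-01-03', -600);  -- the withdrawal is ignored
""")

# Total the deposits per account per day, then keep accounts with any day above 1,000.
suspicious = [row[0] for row in conn.execute("""
    SELECT DISTINCT account_number
    FROM transactions
    WHERE amount > 0
    GROUP BY account_number, transaction_date
    HAVING SUM(amount) > 1000
    ORDER BY account_number
""")]
print(suspicious)
```

Account B deposits more than $1,000 in total but never on a single day, so grouping by account alone would flag it while grouping by account and day does not.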

Simplified Explanation:

Imagine a bank has a bunch of accounts and keeps track of all the deposits made into each account. To find the suspicious accounts, we need to check each account and see if the total amount of deposits on any day exceeds $1,000.

The SQL query does this by grouping the transactions for each account by day and then checking whether any day's deposits total more than $1,000. If so, the account is considered suspicious.

This is like checking each kid's piggy bank day by day: if a kid puts in more than $1,000 on a single day, that piggy bank is flagged as suspicious.


Second Highest Salary

Problem:

Find the second highest salary paid to an employee.

Solution:

SELECT DISTINCT
  Salary  -- Select the distinct salary values
FROM Employee  -- From the Employee table
ORDER BY
  Salary DESC  -- Order the salaries in descending order
LIMIT 1, 1;  -- Skip 1 row (the highest salary) and return the next 1 row

Breakdown:

  • SELECT DISTINCT Salary: Selects only unique salary values to avoid duplicates.

  • FROM Employee: Specifies the table to search for salaries.

  • ORDER BY Salary DESC: Orders the salaries in descending order, so that the highest salary is at the top.

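  • LIMIT 1, 1: Skips the first row, which contains the highest salary, and returns the next row. The first number is the offset and the second is the row count; this form is MySQL syntax, and LIMIT 1 OFFSET 1 is the equivalent in PostgreSQL and SQLite.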

Real-World Example:

Consider an employee database with the following table:

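+----+------+--------+
| ID | Name | Salary |
+----+------+--------+
| 1  | John | 1000   |
| 2  | Mary | 1200   |
| 3  | Bob  | 900    |
+----+------+--------+

The query would return 1000, which is the second highest salary (1200 is the highest).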

Potential Applications:

  • Identifying employees for bonuses or promotions based on their salary.

  • Analyzing salary trends within a company or industry.

  • Calculating average or median salaries for comparison purposes.


Low-Quality Problems

LeetCode Problem:

Find the average salary of all employees in a company who earn more than a certain threshold.

SQL Query:

SELECT AVG(salary)
FROM Employee
WHERE salary > (SELECT AVG(salary) FROM Employee);

Explanation:

  1. Subquery: The subquery (SELECT AVG(salary) FROM Employee) calculates the average salary of all employees in the company.

  2. Main Query: The main query selects the average salary (AVG(salary)) from the Employee table.

  3. Filter: The WHERE clause filters the results to include only employees whose salary is greater than the average salary calculated in the subquery.
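
The steps above can be sketched with SQLite through Python's sqlite3 module. The rows are sample data for illustration; the point to notice is that the comparison is strict, so an employee earning exactly the average is not included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO Employee VALUES
        (1, 'John Doe', 10000), (2, 'Jane Smith', 15000), (3, 'Michael Jones', 20000),
        (4, 'Susan Miller', 12000), (5, 'William Brown', 18000);
""")

# Average over everyone: this is the threshold used by the WHERE clause.
overall_avg = conn.execute("SELECT AVG(salary) FROM Employee").fetchone()[0]

# Employees strictly above the threshold, and the average of their salaries.
above_avg_names = [row[0] for row in conn.execute(
    "SELECT name FROM Employee "
    "WHERE salary > (SELECT AVG(salary) FROM Employee) ORDER BY id")]
above_avg_mean = conn.execute(
    "SELECT AVG(salary) FROM Employee "
    "WHERE salary > (SELECT AVG(salary) FROM Employee)").fetchone()[0]

print(overall_avg, above_avg_names, above_avg_mean)
```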

Real-World Application:

This query can be used by a company to determine which employees are earning above-average salaries. It can help with:

  • Performance reviews: Identifying top performers who deserve promotions or bonuses.

  • Compensation planning: Setting salaries and benefits that are competitive within the industry.

  • Budgeting: Forecasting employee expenses by estimating the cost of salaries above the average.

Example:

Consider the following Employee table:

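+----+---------------+--------+
| id | name          | salary |
+----+---------------+--------+
| 1  | John Doe      | 10,000 |
| 2  | Jane Smith    | 15,000 |
| 3  | Michael Jones | 20,000 |
| 4  | Susan Miller  | 12,000 |
| 5  | William Brown | 18,000 |
+----+---------------+--------+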

The average salary in this company is (10000 + 15000 + 20000 + 12000 + 18000) / 5 = 15000.

The employees earning above average are:

  • Michael Jones (20,000)

  • William Brown (18,000)

Jane Smith earns exactly the average (15,000), so the strict > comparison excludes her.

The average salary of these employees is (20000 + 18000) / 2 = 19000.


Order Two Columns Independently

Problem:

Given two tables:

  • orders: id, order_date, customer_id

  • products: id, product_name, price

Write an SQL query to order the products independently for each order by price. The result should show the product with the highest price at the top for each order.

SOLUTION 1 (Using a Subquery):

SQL:

SELECT o.id, o.order_date, o.customer_id,
(SELECT p.product_name
 FROM order_products AS op
 JOIN products AS p ON p.id = op.product_id
 WHERE op.order_id = o.id
 ORDER BY p.price DESC
 LIMIT 1) AS highest_priced_product
FROM orders AS o
ORDER BY o.order_date, o.customer_id;

Breakdown:

  • The query assumes a junction table order_products(order_id, product_id) that links orders to products. The problem statement does not list this table.

  • The subquery joins order_products to products, because price is a column of products, and keeps only the rows that belong to the current order.

  • ORDER BY p.price DESC with LIMIT 1 returns the name of the highest-priced product in that order.

  • The outer query selects the order details and assigns the highest-priced product to the highest_priced_product column.

  • The result is ordered by order date and customer ID.

SOLUTION 2 (Using a Window Function):

SQL:

SELECT DISTINCT o.id, o.order_date, o.customer_id,
FIRST_VALUE(p.product_name) OVER (PARTITION BY o.id ORDER BY p.price DESC) AS highest_priced_product
FROM orders AS o
JOIN order_products AS op ON o.id = op.order_id
JOIN products AS p ON op.product_id = p.id
ORDER BY o.order_date, o.customer_id;

Breakdown:

  • The window function FIRST_VALUE(p.product_name) OVER (PARTITION BY o.id ORDER BY p.price DESC) returns the name of the highest-priced product for each order. MAX(product_name) would not work here, because it returns the alphabetically last name rather than the name of the most expensive product.

  • The PARTITION BY o.id clause groups the joined rows by order ID. The column is qualified because both orders and products have an id column.

  • The ORDER BY p.price DESC clause orders the products within each partition by price in descending order, so the first value is the most expensive product.

  • SELECT DISTINCT collapses the joined rows to one row per order, since every row in a partition carries the same window result.
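
The difference between the highest-priced product and the alphabetically last product name can be seen on a small data set. This is a minimal sketch using SQLite through Python's sqlite3 module; the order_products junction table and all sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, customer_id INTEGER);
    CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT, price INTEGER);
    CREATE TABLE order_products (order_id INTEGER, product_id INTEGER);  -- assumed junction table
    INSERT INTO orders VALUES (1, '2023-01-01', 100), (2, '2023-01-02', 101);
    INSERT INTO products VALUES (1, 'Pen', 2), (2, 'Laptop', 900), (3, 'Zebra Toy', 15);
    INSERT INTO order_products VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);
""")

# For each order: the product with the highest price, next to MAX(product_name).
rows = conn.execute("""
    SELECT o.id,
           (SELECT p.product_name
            FROM order_products AS op
            JOIN products AS p ON p.id = op.product_id
            WHERE op.order_id = o.id
            ORDER BY p.price DESC
            LIMIT 1) AS highest_priced_product,
           (SELECT MAX(p.product_name)
            FROM order_products AS op
            JOIN products AS p ON p.id = op.product_id
            WHERE op.order_id = o.id) AS alphabetical_max
    FROM orders AS o
    ORDER BY o.order_date, o.customer_id
""").fetchall()
for row in rows:
    print(row)
```

For order 1 the two columns disagree: the most expensive product is Laptop, while the alphabetically last name is Zebra Toy.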

Real-World Applications:

  • Displaying products in e-commerce stores for each customer's order.

  • Analyzing customer preferences and purchasing patterns based on the highest-priced products ordered.

  • Identifying upselling and cross-selling opportunities by recommending related products with higher prices.


Product Sales Analysis V

LeetCode Problem: Product Sales Analysis V

Problem Statement: Given a table containing daily product sales, determine the total sales for each product over a specified date range.

Example Input Table:

CREATE TABLE Sales (
  product_id INT,
  product_name VARCHAR(255),
  sale_date DATE,
  sale_amount DECIMAL(10, 2)
);

Sample Data:

INSERT INTO Sales (product_id, product_name, sale_date, sale_amount) VALUES
(1, 'Product A', '2023-03-01', 100.00),
(2, 'Product B', '2023-03-02', 200.00),
(1, 'Product A', '2023-03-03', 300.00),
(2, 'Product B', '2023-03-04', 400.00),
(3, 'Product C', '2023-03-05', 500.00);

Parameters:

  • from_date: Start date of the date range (inclusive).

  • to_date: End date of the date range (inclusive).

Output:

  • product_id: ID of the product.

  • product_name: Name of the product.

  • total_sales: Total sales of the product over the specified date range.

SQL Solution:

SELECT
  product_id,
  product_name,
  SUM(sale_amount) AS total_sales
FROM Sales
WHERE
  sale_date >= from_date
  AND sale_date <= to_date
GROUP BY
  product_id, product_name;

Explanation:

  • JOIN Operation: None required in this case.

  • FROM Clause: Selects the Sales table.

  • WHERE Clause: Filters rows based on the specified date range.

  • GROUP BY Clause: Groups rows by product_id and product_name.

  • SELECT Clause: Calculates the total sales (sum of sale_amount) for each product and returns the product_id, product_name, and total sales.

Example Input and Output:

-- Example Input Parameters
DECLARE @from_date DATE = '2023-03-01';
DECLARE @to_date DATE = '2023-03-04';

-- Execute the Query
SELECT
  product_id,
  product_name,
  SUM(sale_amount) AS total_sales
FROM Sales
WHERE
  sale_date >= @from_date
  AND sale_date <= @to_date
GROUP BY
  product_id, product_name;

-- Example Output
+------------+--------------+-------------+
| product_id | product_name | total_sales |
+------------+--------------+-------------+
| 1          | Product A    | 400.00      |
| 2          | Product B    | 600.00      |
+------------+--------------+-------------+

Real-World Applications:

  • Sales Reporting: Analyze product sales over time to identify trends and make data-driven decisions.

  • Inventory Management: Track sales to ensure optimal inventory levels and avoid overstocking or shortages.

  • Customer Behavior Analysis: Understand customer preferences and identify cross-selling opportunities by analyzing sales of different products together.

  • Financial Analysis: Calculate total revenue generated from product sales for financial reporting and planning.


Page Recommendations

Problem:

Find the most popular page visited by users from a specific country.

SQL Solution:

-- Count page views by country and page
WITH page_counts AS (
    SELECT country, page_url, COUNT(*) AS page_count
    FROM page_views
    GROUP BY country, page_url
)
-- Keep the page(s) whose count equals the highest count in their country
SELECT pc.country, pc.page_url, pc.page_count AS max_page_count
FROM page_counts AS pc
WHERE pc.page_count = (
    SELECT MAX(page_count)
    FROM page_counts
    WHERE country = pc.country
);

Breakdown:

  • The page_counts CTE groups page views by country and page URL and counts the number of views for each combination.

  • The correlated subquery in the WHERE clause finds the maximum page count for the country of the current row.

  • The main query keeps only the rows whose count equals that maximum, so each country is returned with its most viewed page. If several pages tie for the maximum, all of them are returned. Selecting page_url next to MAX(page_count) with only GROUP BY country would not work reliably, because page_url would be neither grouped nor aggregated.
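
A minimal sketch of the per-country lookup, run with SQLite through Python's sqlite3 module. It compares each page's view count with the highest count in the same country; the sample rows are invented and include a tie to show that more than one page per country can come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_views (country TEXT, page_url TEXT);
    INSERT INTO page_views VALUES
        ('USA', 'page1.html'), ('USA', 'page1.html'), ('USA', 'page1.html'),
        ('USA', 'page2.html'),
        ('UK', 'page2.html'), ('UK', 'page2.html'),
        ('UK', 'page3.html'), ('UK', 'page3.html');
""")

# Count views per (country, page), then keep rows that match the country's maximum.
top_pages = conn.execute("""
    WITH page_counts AS (
        SELECT country, page_url, COUNT(*) AS page_count
        FROM page_views
        GROUP BY country, page_url
    )
    SELECT pc.country, pc.page_url, pc.page_count
    FROM page_counts AS pc
    WHERE pc.page_count = (SELECT MAX(page_count) FROM page_counts WHERE country = pc.country)
    ORDER BY pc.country, pc.page_url
""").fetchall()
for row in top_pages:
    print(row)
```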

Example:

+---------+------------+----------------+
| country | page_url   | max_page_count |
+---------+------------+----------------+
| USA     | page1.html | 1000           |
| UK      | page2.html | 500            |
| Canada  | page3.html | 250            |
+---------+------------+----------------+

Applications:

  • Website analytics: Track the most popular pages visited by users from different countries to understand user demographics and preferences.

  • Marketing: Target specific products or services to users based on their country and page preferences.

  • Content optimization: Tailor website content based on the most popular pages visited by users from different countries to increase engagement.


Get the Second Most Recent Activity

Problem Statement:

Given a table of activities, write a SQL query to find the second most recent activity for each user.

Input Table:

activities (user_id, activity_date)

Output Table:

last_2_activities (user_id, activity_date)

Solution:

Step 1: Find the Most Recent Activity

SELECT user_id, MAX(activity_date) AS most_recent_date
FROM activities
GROUP BY user_id;

This subquery finds the most recent activity date for each user. We assign it an alias, most_recent_date.

Step 2: Find the Second Most Recent Activity

SELECT a.user_id, MAX(a.activity_date) AS activity_date
FROM activities AS a
JOIN (
  SELECT user_id, MAX(activity_date) AS most_recent_date
  FROM activities
  GROUP BY user_id
) AS b ON a.user_id = b.user_id AND a.activity_date < b.most_recent_date
GROUP BY a.user_id
ORDER BY a.user_id;

This query finds the second most recent activity for each user. We join the main activities table (aliased as a) with the subquery from Step 1 (aliased as b) on the user ID and keep only the rows where a.activity_date is earlier than b.most_recent_date, which removes the most recent activity. Of the remaining rows, MAX(a.activity_date) grouped by user is the second most recent date. Users with only one activity date have no remaining rows and do not appear in the result.

Final Query:

WITH MostRecentActivity AS (
  SELECT user_id, MAX(activity_date) AS most_recent_date
  FROM activities
  GROUP BY user_id
)
SELECT a.user_id, MAX(a.activity_date) AS activity_date
FROM activities AS a
JOIN MostRecentActivity AS b ON a.user_id = b.user_id AND a.activity_date < b.most_recent_date
GROUP BY a.user_id
ORDER BY a.user_id;

Explanation:

MostRecentActivity CTE:

This common table expression calculates the most recent activity date for each user. It groups the activities by user ID and finds the maximum activity date for each user.

Main Query:

The main query joins the activities table with the MostRecentActivity CTE on the user ID and filters out the most recent activity by comparing the activity dates. It then groups the remaining rows by user and takes the latest remaining date, which is the second most recent activity for each user.
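
Filtering out the most recent date leaves every older activity, so one more aggregation is needed to reduce each user to a single row. The sketch below shows that step with SQLite through Python's sqlite3 module, using invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activities (user_id INTEGER, activity_date TEXT);
    INSERT INTO activities VALUES
        (1, '2023-01-01'), (1, '2023-01-05'), (1, '2023-01-03'),
        (2, '2023-02-01'), (2, '2023-02-02'),
        (3, '2023-03-01');  -- only one activity, so no second most recent
""")

# Drop each user's latest date, then take the latest of what remains.
second_most_recent = conn.execute("""
    WITH MostRecentActivity AS (
        SELECT user_id, MAX(activity_date) AS most_recent_date
        FROM activities
        GROUP BY user_id
    )
    SELECT a.user_id, MAX(a.activity_date) AS activity_date
    FROM activities AS a
    JOIN MostRecentActivity AS b
      ON a.user_id = b.user_id AND a.activity_date < b.most_recent_date
    GROUP BY a.user_id
    ORDER BY a.user_id
""").fetchall()
print(second_most_recent)
```

User 3 is absent from the result. Whether such users should instead be reported with their only activity depends on the requirements, which the problem statement leaves open.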

Real-World Application:

This query can be useful in scenarios where you need to retrieve historical data for analysis. For example, you could use it to:

  • Analyze user behavior patterns

  • Identify trends over time


Swap Salary

Problem Statement:

Given a table Salaries with the following columns:

  • emp_id (integer) - Employee ID

  • name (string) - Employee name

  • salary (integer) - Employee salary

Swap the salaries of two employees with specific IDs.

Simplified Solution:

To swap the salaries of employees with IDs id1 and id2, you can use the following query:

-- Swap salaries of employees with IDs id1 and id2
UPDATE Salaries
SET salary = CASE
    WHEN emp_id = id1 THEN (SELECT salary FROM Salaries WHERE emp_id = id2)
    WHEN emp_id = id2 THEN (SELECT salary FROM Salaries WHERE emp_id = id1)
    ELSE salary
END
WHERE emp_id IN (id1, id2);

Breakdown and Explanation:

  1. CASE Statement: The CASE expression conditionally updates the salary column based on the value of emp_id.

  2. First WHEN Clause: If emp_id equals id1, the salary is set to the salary of the employee whose emp_id equals id2.

  3. Second WHEN Clause: If emp_id equals id2, the salary is set to the salary of the employee whose emp_id equals id1.

  4. ELSE Clause: If neither WHEN clause matches, the salary is left unchanged.

  5. WHERE Clause: The WHERE clause ensures that the update is only applied to employees with emp_id in the list (id1, id2).
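
Database engines differ in how they treat a subquery that reads the table being updated; MySQL, for example, rejects it with error 1093. The sketch below sidesteps the question by reading both salaries first and writing them back inside one transaction. It uses SQLite through Python's sqlite3 module, and the helper name swap_salaries is hypothetical.

```python
import sqlite3

def swap_salaries(conn, id1, id2):
    """Swap the salaries of two employees inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        current = dict(conn.execute(
            "SELECT emp_id, salary FROM Salaries WHERE emp_id IN (?, ?)", (id1, id2)))
        if len(current) != 2:
            raise ValueError("both employees must exist and be different")
        conn.executemany(
            "UPDATE Salaries SET salary = ? WHERE emp_id = ?",
            [(current[id2], id1), (current[id1], id2)])

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salaries (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO Salaries VALUES (1, 'John Doe', 5000), (2, 'Jane Smith', 6000);
""")

swap_salaries(conn, 1, 2)
salaries_after = conn.execute(
    "SELECT emp_id, salary FROM Salaries ORDER BY emp_id").fetchall()
print(salaries_after)
```

Reading the values before writing means the second update can never see the result of the first, regardless of how the engine orders row updates.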

Example:

Let's say we have a table with the following data:

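+--------+------------+--------+
| emp_id | name       | salary |
+--------+------------+--------+
| 1      | John Doe   | 5000   |
| 2      | Jane Smith | 6000   |
+--------+------------+--------+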

To swap the salaries of John and Jane, we can use the following query:

UPDATE Salaries
SET salary = CASE
    WHEN emp_id = 1 THEN (SELECT salary FROM Salaries WHERE emp_id = 2)
    WHEN emp_id = 2 THEN (SELECT salary FROM Salaries WHERE emp_id = 1)
    ELSE salary
END
WHERE emp_id IN (1, 2);

After executing this query, the salaries will be swapped:

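+--------+------------+--------+
| emp_id | name       | salary |
+--------+------------+--------+
| 1      | John Doe   | 6000   |
| 2      | Jane Smith | 5000   |
+--------+------------+--------+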

Real-World Applications:

Swapping salaries is useful in scenarios where employees need to adjust their salaries for various reasons, such as:

  • Fairness and equity adjustments

  • Promotions and demotions

  • Salary negotiations

  • Temporary salary adjustments (e.g., for training or special projects)


Sales by Day of the Week

Problem:

Given a table Sales that contains information about sales transactions, determine the total sales for each day of the week.

Schema:

CREATE TABLE Sales (
  sales_id INT PRIMARY KEY,
  sales_date TIMESTAMP,
  sales_amount INT
);

Solution:

SELECT
  strftime('%w', sales_date) AS day_of_week,
  SUM(sales_amount) AS total_sales
FROM Sales
GROUP BY day_of_week
ORDER BY day_of_week;

Implementation and Explanation:

  1. Convert sales_date to Day of Week:

    • Use the strftime function with the '%w' format specifier to extract the day of the week from the sales_date column. In SQLite, this returns a single-digit text value from 0 to 6 (0=Sunday, 1=Monday, ..., 6=Saturday).

  2. Group by Day of Week:

    • Use the GROUP BY clause to group the results by the day_of_week column. This will create separate groups for each day of the week.

  3. Sum Sales Amount:

    • Within each group, use the SUM aggregate function to calculate the total sales amount.

  4. Order by Day of Week:

    • Use the ORDER BY clause to sort the results in ascending order by day_of_week for easy readability.
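
The four steps above can be sketched with SQLite through Python's sqlite3 module. The sample rows are invented; 2023-01-01 and 2023-01-08 are Sundays and 2023-01-02 is a Monday.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (sales_id INTEGER PRIMARY KEY, sales_date TEXT, sales_amount INTEGER);
    INSERT INTO Sales VALUES
        (1, '2023-01-01 10:00:00', 100),  -- Sunday
        (2, '2023-01-02 09:30:00', 200),  -- Monday
        (3, '2023-01-08 12:00:00',  50);  -- Sunday
""")

# strftime('%w', ...) yields '0' for Sunday through '6' for Saturday, as text.
totals = conn.execute("""
    SELECT strftime('%w', sales_date) AS day_of_week,
           SUM(sales_amount) AS total_sales
    FROM Sales
    GROUP BY day_of_week
    ORDER BY day_of_week
""").fetchall()
print(totals)
```

Days without any sales do not appear in the output; producing a row for every weekday would require joining against a list of the seven day numbers.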

Example:

| day_of_week | total_sales |
| ----------- | ----------- |
| 0           | 100         |
| 1           | 200         |
| 2           | 300         |
| 3           | 400         |
| 4           | 500         |
| 5           | 600         |
| 6           | 700         |

Real-World Applications:

  • Sales Analysis: Retailers can use this query to identify which days of the week are most profitable for sales.

  • Scheduling: Businesses can optimize staffing and inventory levels based on the sales volume for each day of the week.

  • Marketing Campaigns: Marketers can tailor campaigns to target different customer segments based on their weekday shopping patterns.


Game Play Analysis I

Problem:

You're given a table called GamePlays with the following columns:

  • user_id: ID of the user

  • game_id: ID of the game

  • session_start: Start time of the game session

  • session_end: End time of the game session

Your task is to find the total number of unique users who played a game for at least n minutes.

Solution:

SELECT
  COUNT(DISTINCT user_id)
FROM GamePlays
WHERE
  CAST((session_end - session_start) AS INTEGER) / 60 >= n;

Breakdown:

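  • The DISTINCT keyword inside COUNT ensures that each user is counted once, however many qualifying sessions they have.

  • session_end - session_start gives the length of the session. The query assumes both columns hold numeric timestamps in seconds (for example Unix epoch values).

  • CAST(... AS INTEGER) converts that difference to an integer, and dividing by 60 converts seconds to whole minutes.

  • The >= operator keeps the sessions that lasted at least n minutes, where n is a placeholder for the threshold.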
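As a quick check of the filter, the sketch below runs the query in SQLite through Python's sqlite3 module. It assumes session_start and session_end are stored as Unix epoch seconds and uses n = 30; both the storage format and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE GamePlays (user_id INT, game_id INT, session_start INT, session_end INT)"
)
# Hypothetical sessions, with start/end given in epoch seconds.
conn.executemany(
    "INSERT INTO GamePlays VALUES (?, ?, ?, ?)",
    [
        (1, 10, 0, 1800),     # exactly 30 minutes
        (1, 11, 5000, 9000),  # same user again, 66 whole minutes
        (2, 10, 0, 1799),     # one second short of 30 minutes
        (3, 10, 100, 4000),   # 65 minutes
    ],
)
n = 30  # threshold in minutes
(count,) = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM GamePlays "
    "WHERE CAST((session_end - session_start) AS INTEGER) / 60 >= ?",
    (n,),
).fetchone()
print(count)  # users 1 and 3 qualify; user 1 is counted once
```

Integer division truncates, so a session of 29 minutes and 59 seconds does not count as 30 minutes.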

Real-World Applications:

This query can be used in various real-world scenarios:

  • Measuring User Engagement: Track the number of users who play a game for a significant amount of time to identify highly engaged players.

  • Retention Analysis: Compare the number of unique users with sessions of at least n minutes across time periods or releases to see whether long sessions are becoming more or less common.

  • Marketing Campaigns: Target users who have recently played a game for a specific duration to promote new features or game updates.


Duplicate Emails

Problem Statement:

Find all email addresses that appear more than once in a table of email addresses.

Table:

emails (email)

Example:

Input:
| email           |
|-----------------|
| john@example.com |
| mary@example.com |
| bob@example.com  |
| john@example.com |
| alice@example.com |

Output:
| email           |
|-----------------|
| john@example.com |

Solution:

The most straightforward solution is to use the COUNT() function to count the occurrences of each email address and then filter for those that appear more than once.

SELECT email
FROM emails
GROUP BY email
HAVING COUNT(*) > 1;

Explanation:

  1. GROUP BY email: This groups the rows in the emails table by the email column.

  2. COUNT(*): This counts the number of rows in each group.

  3. HAVING COUNT(*) > 1: This filters for the groups that have more than one row, indicating that the email address appears more than once.
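The grouping and filtering steps can be tried on the example input with Python's built-in sqlite3 module. This is a minimal sketch using an in-memory database; the TEXT column type is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (email TEXT)")
conn.executemany(
    "INSERT INTO emails VALUES (?)",
    [
        ("john@example.com",),
        ("mary@example.com",),
        ("bob@example.com",),
        ("john@example.com",),
        ("alice@example.com",),
    ],
)
# One output row per email group that contains more than one input row.
duplicates = [
    row[0]
    for row in conn.execute(
        "SELECT email FROM emails GROUP BY email HAVING COUNT(*) > 1"
    )
]
print(duplicates)
```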

Example Usage:

This query can be used in a variety of real-world applications, such as:

  • Detecting duplicate email addresses in a user database to prevent multiple accounts from being created with the same email.

  • Identifying potential spam emails, as spammers often use the same email address to send multiple emails.

  • Analyzing email usage patterns to identify popular email domains or email service providers.


Students and Examinations

Problem:

Find the students who have taken all the exams.

SQL Query:

SELECT StudentID
FROM Students
EXCEPT
SELECT DISTINCT StudentID
FROM Exams
WHERE StudentID NOT IN (
    SELECT StudentID
    FROM Students
);

Breakdown:

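  • The first SELECT statement retrieves every StudentID from the Students table.

  • The EXCEPT operator removes from this result set any StudentID that is returned by the second query.

  • The second query uses SELECT DISTINCT to retrieve the StudentID values in Exams that do not appear in the Students table. When Exams.StudentID is a foreign key to Students, as in the schema used in the code implementation, no such rows exist, so nothing is removed.

  • As written, the query therefore returns every student. It does not yet check that a student has taken all the exams; doing so requires comparing each student's number of distinct exams with the total number of exams.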

Real-World Example:

This query can be used to determine which students have completed all of their exams in a particular semester. This information can be used to:

  • Identify students who need additional support or tutoring.

  • Provide feedback to instructors on the effectiveness of their exams.

  • Generate reports on student progress.

Code Implementation:

-- Create the Students table
CREATE TABLE Students (
  StudentID INT PRIMARY KEY,
  Name VARCHAR(255)
);

-- Insert data into the Students table
INSERT INTO Students (StudentID, Name) VALUES
(1, 'John Doe'),
(2, 'Jane Smith'),
(3, 'Jack Jones');

-- Create the Exams table
CREATE TABLE Exams (
  ExamID INT PRIMARY KEY,
  StudentID INT,
  Score INT,
  FOREIGN KEY (StudentID) REFERENCES Students(StudentID)
);

-- Insert data into the Exams table
INSERT INTO Exams (ExamID, StudentID, Score) VALUES
(1, 1, 90),
(2, 2, 85),
(3, 3, 95),
(4, 1, 80);

-- Query to find students who have taken all the exams
SELECT StudentID
FROM Students
EXCEPT
SELECT DISTINCT StudentID
FROM Exams
WHERE StudentID NOT IN (
    SELECT StudentID
    FROM Students
);

Output:

StudentID
---------
1
2
3

The query returns all three students: every StudentID in Exams also exists in Students, so the second query yields no rows and EXCEPT removes nothing.
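Finding students who have taken every exam is a relational-division problem: count each student's distinct exams and compare the count with the total number of exams. The sketch below illustrates this with Python's sqlite3 module. It assumes a hypothetical ExamResults table with one row per (exam, student) pair, which differs from the Exams table above, where each ExamID appears only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Students (StudentID INT PRIMARY KEY, Name VARCHAR(255));
    -- Hypothetical table: one row per exam sitting.
    CREATE TABLE ExamResults (
      ExamID INT, StudentID INT, Score INT,
      PRIMARY KEY (ExamID, StudentID)
    );
    INSERT INTO Students VALUES (1, 'John Doe'), (2, 'Jane Smith'), (3, 'Jack Jones');
    -- Two exams exist (1 and 2); only student 1 sat both.
    INSERT INTO ExamResults VALUES (1, 1, 90), (2, 1, 80), (1, 2, 85), (2, 3, 95);
    """
)
query = """
    SELECT StudentID
    FROM ExamResults
    GROUP BY StudentID
    HAVING COUNT(DISTINCT ExamID) = (SELECT COUNT(DISTINCT ExamID) FROM ExamResults)
    ORDER BY StudentID
"""
took_all = [row[0] for row in conn.execute(query)]
print(took_all)
```

If the full list of exams lives in its own table, the subquery would count rows in that table instead.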


Activity Participants

Problem Statement:

Find the most active participants in a series of activities.

SQL Implementation:

WITH ActivityCounts AS (
  SELECT participant, COUNT(*) AS activity_count
  FROM ActivityParticipants
  GROUP BY participant
),
RankedParticipants AS (
  SELECT participant, activity_count,
    DENSE_RANK() OVER (ORDER BY activity_count DESC) AS activity_rank
  FROM ActivityCounts
)
SELECT participant
FROM RankedParticipants
WHERE activity_rank <= 3;

Explanation:

Step 1: Count Activity Participation

The ActivityCounts subquery counts the number of activities each participant participated in.

Step 2: Rank Participants

The RankedParticipants subquery ranks the participants by their activity count in descending order. The DENSE_RANK function gives participants with the same activity count the same rank and leaves no gaps, so the next distinct count receives the next consecutive rank.

Step 3: Select Most Active Participants

The final query selects the participants with the top 3 ranks. Because tied participants share a rank, this can return more than three participants.
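The effect of ties on DENSE_RANK can be seen by running the three steps on a small data set. The sketch below uses Python's sqlite3 module (window functions need SQLite 3.25 or later); the participants and their counts are hypothetical, and the rank column is aliased as activity_rank here because RANK is a reserved word in some databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ActivityParticipants (participant TEXT, activity TEXT)")

# Hypothetical participation counts; bob and cy tie on purpose.
counts = {"ann": 4, "bob": 3, "cy": 3, "dee": 2, "eve": 1}
conn.executemany(
    "INSERT INTO ActivityParticipants VALUES (?, ?)",
    [(name, f"activity_{i}") for name, total in counts.items() for i in range(total)],
)

top = conn.execute(
    """
    WITH ActivityCounts AS (
      SELECT participant, COUNT(*) AS activity_count
      FROM ActivityParticipants
      GROUP BY participant
    ),
    RankedParticipants AS (
      SELECT participant, activity_count,
        DENSE_RANK() OVER (ORDER BY activity_count DESC) AS activity_rank
      FROM ActivityCounts
    )
    SELECT participant, activity_rank
    FROM RankedParticipants
    WHERE activity_rank <= 3
    ORDER BY activity_rank, participant
    """
).fetchall()
print(top)  # four participants: bob and cy share rank 2, dee gets rank 3
```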

Real-World Applications:

  • Identifying the most engaged users on a social media platform.

  • Determining the top performers in a fitness competition.

  • Tracking the most active employees in a project management system.


Consecutive Transactions with Increasing Amounts

Problem: Given a table of financial transactions, identify consecutive transactions where the amount increases.

Table:

transactions (id, amount, time)

Query:

WITH CTE AS (
  SELECT id, amount, time, ROW_NUMBER() OVER (ORDER BY id) AS row_num
  FROM transactions
)
SELECT t1.id, t1.amount
FROM CTE t1
JOIN CTE t2 ON t1.row_num + 1 = t2.row_num AND t1.amount < t2.amount;

Breakdown:

1. Common Table Expression (CTE):

  • Creates a temporary table called CTE that adds a row_num column for each row.

2. Joining Rows:

  • Joins t1 and t2 based on the row_num column.

  • The condition t1.row_num + 1 = t2.row_num ensures that the rows are consecutive.

3. Filtering by Amount:

  • The condition t1.amount < t2.amount ensures that the amount increases between consecutive rows.

Real-World Applications:

  • Fraud Detection: Identifying suspicious patterns of increasing transactions can help detect fraudulent activity.

  • Financial Analysis: Tracking the increasing trend of transactions can provide insights into spending habits and investment strategies.

  • Customer Segmentation: Identifying customers with a history of consecutive increasing transactions can help segment them for targeted marketing campaigns.

Example:

Transactions Table:

| id | amount | time       |
| -- | ------ | ---------- |
| 1  | 100    | 2023-01-01 |
| 2  | 120    | 2023-01-02 |
| 3  | 150    | 2023-01-03 |
| 4  | 70     | 2023-01-04 |

Result:

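| id | amount |
| -- | ------ |
| 1  | 100    |
| 2  | 120    |

Each returned row is the first transaction of an increasing pair. Transaction 3 is not returned because the transaction after it (70) is lower than 150.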
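The result can be reproduced by running the query on the example table. This is a minimal sketch using Python's sqlite3 module (ROW_NUMBER needs SQLite 3.25 or later); the column types and the final ORDER BY are assumptions added to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INT, amount INT, time TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, 100, "2023-01-01"),
        (2, 120, "2023-01-02"),
        (3, 150, "2023-01-03"),
        (4, 70, "2023-01-04"),
    ],
)
result = conn.execute(
    """
    WITH CTE AS (
      SELECT id, amount, time, ROW_NUMBER() OVER (ORDER BY id) AS row_num
      FROM transactions
    )
    SELECT t1.id, t1.amount
    FROM CTE t1
    JOIN CTE t2 ON t1.row_num + 1 = t2.row_num AND t1.amount < t2.amount
    ORDER BY t1.id
    """
).fetchall()
print(result)  # rows whose next transaction has a larger amount
```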


Election Results

Question:

Implement a SQL query to find the top k candidates in an election based on the number of votes they received.

Optimal Solution:

WITH RankedCandidates AS (
    SELECT candidate_id, SUM(votes) AS total_votes
    FROM election_results
    GROUP BY candidate_id
)
SELECT candidate_id, total_votes
FROM RankedCandidates
ORDER BY total_votes DESC
LIMIT k;

Breakdown:

Step 1: Create a Common Table Expression (CTE) to Rank Candidates

WITH RankedCandidates AS (
    SELECT candidate_id, SUM(votes) AS total_votes
    FROM election_results
    GROUP BY candidate_id
)
  • Creates a CTE called RankedCandidates that calculates the total votes for each candidate.

Step 2: Select the Top K Candidates

SELECT candidate_id, total_votes
FROM RankedCandidates
ORDER BY total_votes DESC
LIMIT k;
  • Selects the candidate_id and total_votes from RankedCandidates and orders them in descending order by the number of votes.

  • Uses the LIMIT clause to return only the top k candidates.

Simplification:

  • The election results are stored in a table called election_results with columns like candidate_id and votes.

  • We first calculate the total votes for each candidate and store them in a CTE called RankedCandidates.

  • Then we select the top k candidates from RankedCandidates based on the number of votes they received.

Real-World Application:

  • This query can be used to determine the winners in elections, where the candidates with the highest number of votes are declared victors.

  • It can also be used to rank contestants in competitions, such as sports tournaments or talent shows.


Biggest Single Number

Problem Statement:

Find the largest single digit number in a column of a table.

SQL Query:

SELECT MAX(SUBSTRING(num_column, 1, 1)) AS largest_digit
FROM table_name;

Breakdown:

  • SUBSTRING(num_column, 1, 1): This function extracts the first character (single digit) from the num_column.

  • MAX(): This function finds the maximum value among the extracted single digits.

Example:

Consider the following table:

| num_column |
| ---------- |
| 123        |
| 456        |
| 789        |

Running the query returns:

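largest_digit
7

Explanation:

The query extracts the first character (single digit) from each row of the num_column. The extracted digits are '1', '4', and '7'. The MAX() function then finds the maximum value among these digits, which is '7'. Note that only the leading digit of each value is inspected; digits in other positions, such as the 9 in 789, are ignored.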

Real-World Applications:

  • Finding the highest-valued digit in a set of numeric codes.

  • Analyzing financial data to identify the largest single digit in a series of values.

  • Determining the most frequent single digit in a dataset for statistical purposes.


NPV Queries

Problem Statement:

Given a list of cash flows, calculate the Net Present Value (NPV) using a specified discount rate.

SQL Implementation:

-- Define the cash flow table
CREATE TABLE CashFlows (
    Period INT,  -- Period number
    CashFlow INT  -- Cash flow amount
);

-- Insert sample data
INSERT INTO CashFlows (Period, CashFlow) VALUES
(0, 1000),
(1, 500),
(2, -250),
(3, 100),
(4, 50);

-- Calculate the NPV
-- DiscountRate is a placeholder: substitute a literal (for example 0.10) or bind a parameter
SELECT SUM(CashFlow / POWER(1.0 + DiscountRate, Period)) AS NPV
FROM CashFlows
WHERE Period >= 0;

Breakdown and Explanation:

  1. Cash Flow Table: We create a table named CashFlows to store the period and cash flow amount for each period.

  2. Sample Data: We insert sample cash flow data into the table: a period-0 cash flow of $1000, subsequent cash inflows and outflows, and a final cash flow of $50 in period 4. (An initial investment would normally be entered as a negative amount.)

  3. NPV Calculation: The NPV is calculated using the formula:

    NPV = ∑ (CashFlow / (1 + DiscountRate)^Period)
    • CashFlow is the cash flow amount for a specific period.

    • DiscountRate is the specified discount rate.

    • Period is the period number.

  4. SQL Query: We use a SQL query to calculate the NPV by summing the present value of each cash flow, using the POWER function to apply the discount rate. The WHERE Period >= 0 filter keeps period 0 and later, so only rows with negative period numbers are ignored. Period 0 is not discounted because (1 + DiscountRate)^0 = 1.
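To see the formula produce a number, the sketch below runs the calculation in SQLite through Python's sqlite3 module with an assumed discount rate of 10%. The rate is an illustration only, since the original leaves DiscountRate unspecified. POWER is registered explicitly because not every SQLite build includes the math functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Provide POWER ourselves so the example does not depend on the SQLite build.
conn.create_function("POWER", 2, lambda base, exponent: base ** exponent)

conn.execute("CREATE TABLE CashFlows (Period INT, CashFlow INT)")
conn.executemany(
    "INSERT INTO CashFlows VALUES (?, ?)",
    [(0, 1000), (1, 500), (2, -250), (3, 100), (4, 50)],
)

discount_rate = 0.10  # assumed value, bound as a query parameter
(npv,) = conn.execute(
    "SELECT SUM(CashFlow / POWER(1.0 + ?, Period)) AS NPV "
    "FROM CashFlows WHERE Period >= 0",
    (discount_rate,),
).fetchone()
print(round(npv, 2))
```

The period-0 amount enters the sum undiscounted; each later amount is divided by 1.1 raised to its period number.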

Real-World Applications:

NPV is a widely used financial metric to evaluate investment projects. It helps businesses determine the profitability of a project by considering the time value of money. Potential applications include:

  • Analyzing the financial feasibility of a new product launch.

  • Determining the return on investment for a marketing campaign.

  • Evaluating capital budgeting decisions, such as purchasing new equipment or constructing a new facility.


Employees Earning More Than Their Managers

Problem Statement

Given a table Employees with columns id, name, salary, and manager_id, find all employees who earn more than their managers.

Example

| id | name  | salary | manager_id |
| -- | ----- | ------ | ---------- |
| 1  | John  | 1000   | 2          |
| 2  | Mary  | 980    | null       |
| 3  | Bob   | 950    | 2          |
| 4  | Alice | 1100   | 2          |

Output:

| id | name  | salary | manager_id |
| -- | ----- | ------ | ---------- |
| 1  | John  | 1000   | 2          |
| 4  | Alice | 1100   | 2          |

Solution

  1. Pair each employee with their manager.

  2. Keep only the employees who earn more than their managers.

SELECT e.id, e.name, e.salary, e.manager_id
FROM Employees e
INNER JOIN Employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

Explanation

  1. INNER JOIN the Employees table with itself, matching each employee's manager_id to the manager's id. Each resulting row pairs an employee (e) with their manager (m); employees without a manager are dropped.

  2. The WHERE clause keeps only the rows where the employee's salary is greater than the manager's salary.

Real-World Applications

This query can be used to identify reporting lines where an employee earns more than their manager, which may indicate that a manager is underpaid or that a pay band needs review. It can also be used when auditing the pay structure within an organization.


The Users That Are Eligible for Discount

Problem Statement:

Find all users who are eligible for a discount.

Solution:

/* Users Table */
CREATE TABLE Users (
  id INT NOT NULL,
  email VARCHAR(255) NOT NULL,
  discount TINYINT NOT NULL DEFAULT 0,
  PRIMARY KEY (id)
);
INSERT INTO Users (id, email, discount) VALUES
(1, 'user1@example.com', 0),
(2, 'user2@example.com', 1),
(3, 'user3@example.com', 0),
(4, 'user4@example.com', 1);

/* Orders Table */
CREATE TABLE Orders (
  id INT NOT NULL,
  user_id INT NOT NULL,
  order_date DATETIME NOT NULL,
  total_amount DECIMAL(10, 2) NOT NULL,
  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES Users (id)
);
INSERT INTO Orders (id, user_id, order_date, total_amount) VALUES
(1, 1, '2022-01-01', 100.00),
(2, 2, '2022-01-02', 200.00),
(3, 3, '2022-01-03', 300.00),
(4, 4, '2022-01-04', 400.00);

/* Query to find eligible users */
SELECT DISTINCT u.id, u.email
FROM Users u
JOIN Orders o ON u.id = o.user_id
WHERE o.total_amount >= 200.00 AND u.discount = 0;

Explanation:

  1. Create Tables (Users & Orders): We start by creating the Users and Orders tables with their respective columns.

  2. Insert Data into Tables: We insert sample data into both tables to demonstrate how the query will work.

  3. Discount Eligibility Query: The main query retrieves eligible users by joining the Users and Orders tables and filtering based on the following conditions:

    • o.total_amount >= 200.00: The user has placed at least one order with a total amount of $200 or more.

    • u.discount = 0: The user is not currently receiving a discount.

Output:

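| id | email             |
| -- | ----------------- |
| 3  | user3@example.com |

User 3 has placed an order of $200 or more and is not yet receiving a discount, so they are eligible. Users 2 and 4 also have qualifying orders, but they already have discount = 1 and are excluded by the u.discount = 0 condition.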

Real-World Application:

This query can be used in e-commerce websites to identify customers who are eligible for discounts based on their purchase history. By offering discounts to such customers, businesses can encourage repeat purchases and increase customer loyalty.


Unique Orders and Customers Per Month

Problem:

You are given a table called orders that contains the following columns:

order_id | user_id | order_date

You want to find the number of unique orders and customers per month.

Solution:

To solve this problem, we can use the following SQL query:

SELECT 
    strftime('%Y-%m', order_date) AS month,
    COUNT(DISTINCT order_id) AS num_orders,
    COUNT(DISTINCT user_id) AS num_customers
FROM 
    orders
GROUP BY 
    month
ORDER BY 
    month;

Explanation:

The strftime('%Y-%m', order_date) expression extracts the year and month from the order_date column. The COUNT(DISTINCT order_id) expression counts the number of distinct order IDs, which gives us the number of unique orders. The COUNT(DISTINCT user_id) expression counts the number of distinct user IDs, which gives us the number of unique customers. The GROUP BY month clause groups the results by month. The ORDER BY month clause orders the results by month.
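The monthly grouping can be tried on a few rows with Python's sqlite3 module. The sample orders below are hypothetical, and order_date is assumed to be stored as ISO-8601 text, which is what SQLite's strftime expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, user_id INT, order_date TEXT)")
# Hypothetical orders: user 10 orders three times, user 11 once.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10, "2023-01-05"),
        (2, 10, "2023-01-20"),
        (3, 11, "2023-01-21"),
        (4, 10, "2023-02-02"),
    ],
)
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', order_date) AS month,
           COUNT(DISTINCT order_id) AS num_orders,
           COUNT(DISTINCT user_id) AS num_customers
    FROM orders
    GROUP BY month
    ORDER BY month
    """
).fetchall()
print(monthly)  # January: 3 orders from 2 customers; February: 1 order from 1
```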

Real-World Example:

This query can be used to analyze the number of unique orders and customers per month for an online store. This information can be used to identify trends and patterns in customer behavior, such as when orders are most likely to be placed or when new customers are most likely to sign up.

Potential Applications:

This query can be used for a variety of purposes, including:

  • Identifying trends and patterns in customer behavior

  • Forecasting demand

  • Optimizing marketing campaigns

  • Improving customer service


Count Apples and Oranges

Problem Statement:

You have two tables:

  • Apples: Contains information about apples, including their id and quantity.

  • Oranges: Contains information about oranges, including their id and quantity.

Write a query to count the total number of apples and oranges in the database.

Solution:

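SELECT
  SUM(quantity) AS total_fruits
FROM (
  SELECT quantity FROM Apples
  UNION ALL
  SELECT quantity FROM Oranges
) AS fruits;

Breakdown:

  • The main query uses a subquery to combine the quantity values from the Apples and Oranges tables into a single derived table. UNION ALL keeps every row, including duplicates, and the alias (fruits) is required by some databases.

  • The SUM(quantity) function adds up the quantities in the combined table, which gives us the total number of fruits. COUNT(*) would only count the rows, not the amount of fruit each row represents.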

Example:

| Total Fruits |
| ------------ |
| 20           |
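The difference between counting rows and summing quantities is easy to see on a small data set. The sketch below uses Python's sqlite3 module with hypothetical rows chosen so that the quantities add up to 20.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Apples (id INT, quantity INT);
    CREATE TABLE Oranges (id INT, quantity INT);
    -- Hypothetical stock: 5 + 7 apples and 8 oranges.
    INSERT INTO Apples VALUES (1, 5), (2, 7);
    INSERT INTO Oranges VALUES (1, 8);
    """
)
(row_count,) = conn.execute(
    "SELECT COUNT(*) FROM ("
    "SELECT id, quantity FROM Apples UNION ALL SELECT id, quantity FROM Oranges"
    ") AS fruits"
).fetchone()
(total_fruits,) = conn.execute(
    "SELECT SUM(quantity) FROM ("
    "SELECT quantity FROM Apples UNION ALL SELECT quantity FROM Oranges"
    ") AS fruits"
).fetchone()
print(row_count, total_fruits)  # number of rows vs. number of fruits
```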

Applications:

This query can be used in the following real-world scenarios:

  • Inventory management: To track the total number of apples and oranges in a warehouse.

  • Data analysis: To identify trends and patterns in the production and consumption of fruits.

  • Sales forecasting: To predict future demand for apples and oranges based on historical data.


Friend Requests II: Who Has the Most Friends

Problem Statement:

You are given a table Friend_Requests that represents friend requests between users on a social media platform. The table has the following schema:

CREATE TABLE Friend_Requests (
    requester_id INT NOT NULL,
    receiver_id INT NOT NULL,
    status VARCHAR(10) NOT NULL  -- 'pending', 'accepted', 'rejected'
);

Your task is to write a SQL query to find the user with the most accepted friends.

Solution:

-- Count accepted friendships per user, whichever side of the request they were on
SELECT user_id, COUNT(*) AS friend_count
FROM (
    SELECT requester_id AS user_id FROM Friend_Requests WHERE status = 'accepted'
    UNION ALL
    SELECT receiver_id AS user_id FROM Friend_Requests WHERE status = 'accepted'
) AS friendships
GROUP BY user_id
-- Keep the user with the maximum number of accepted friends
ORDER BY friend_count DESC
LIMIT 1;

Explanation:

  1. List Both Sides of Each Friendship: An accepted request makes the requester and the receiver friends of each other, so the subquery emits one row for the requester and one row for the receiver of every accepted request. UNION ALL keeps all of these rows. This assumes each pair of users has at most one accepted request.

  2. Find User with Most Friends: The outer query groups the rows by user_id and counts them, which gives each user's number of accepted friends. Ordering by friend_count in descending order and applying LIMIT 1 returns the user with the most friends. If several users tie for the maximum, only one of them is returned.
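A friendship belongs to both the requester and the receiver, so one way to count friends per user is to list both sides of every accepted request and then group. The sketch below runs that idea with Python's sqlite3 module; the sample requests are hypothetical, and the extra user_id sort key is an assumption added to make ties deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Friend_Requests ("
    "requester_id INT NOT NULL, receiver_id INT NOT NULL, status VARCHAR(10) NOT NULL)"
)
# Hypothetical requests: user 3 ends up with friends 1, 2 and 4.
conn.executemany(
    "INSERT INTO Friend_Requests VALUES (?, ?, ?)",
    [
        (1, 2, "accepted"),
        (1, 3, "accepted"),
        (2, 3, "accepted"),
        (3, 4, "accepted"),
        (4, 1, "rejected"),
    ],
)
most_friends = conn.execute(
    """
    SELECT user_id, COUNT(*) AS friend_count
    FROM (
        SELECT requester_id AS user_id FROM Friend_Requests WHERE status = 'accepted'
        UNION ALL
        SELECT receiver_id AS user_id FROM Friend_Requests WHERE status = 'accepted'
    ) AS friendships
    GROUP BY user_id
    ORDER BY friend_count DESC, user_id
    LIMIT 1
    """
).fetchone()
print(most_friends)  # (user_id, friend_count)
```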

Real-World Application:

This query can be used in a social media platform to identify the most popular users. This information can be used for various purposes, such as:

  • Recommending users to follow

  • Displaying leaderboards of popular users

  • Identifying influencers for targeted advertising


Count Student Number in Departments

Problem Statement:

Given two tables, Students and Departments, find the count of students in each department.

Table Schema:

  • Students:

    • student_id (int)

    • name (string)

    • department_id (int)

  • Departments:

    • department_id (int)

    • name (string)

Implementation:

-- Count the number of students in each department
SELECT
  d.name AS department_name,
  COUNT(s.student_id) AS number_of_students
FROM
  Students s
JOIN
  Departments d
ON
  s.department_id = d.department_id
GROUP BY
  d.name
ORDER BY
  d.name;

Explanation:

  1. JOIN the Students and Departments tables:

    • The JOIN clause connects the Students and Departments tables on the common column department_id, producing one row for each student together with their department. Because this is an inner join, departments that have no students do not appear in the result.

  2. Count the number of students in each department:

    • The COUNT() function counts the number of rows in the Students table for each department. This gives us the number of students in each department.

  3. GROUP BY department name:

    • The GROUP BY clause groups the results by the department_name column. This means that the results will show the count of students for each department separately.

  4. ORDER BY department name:

    • The ORDER BY clause sorts the results by the department_name column in ascending order.
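If departments without students should be listed with a count of 0, one option is to start from Departments and use a LEFT JOIN. The sketch below shows this variant with Python's sqlite3 module; the sample rows are hypothetical, and grouping by department_id as well as name is an assumption made so that two departments with the same name would stay separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Departments (department_id INT, name TEXT);
    CREATE TABLE Students (student_id INT, name TEXT, department_id INT);
    INSERT INTO Departments VALUES (1, 'Computer Science'), (2, 'Mathematics'), (3, 'Physics');
    -- Hypothetical students; nobody is enrolled in Physics.
    INSERT INTO Students VALUES (1, 'Ann', 1), (2, 'Ben', 1), (3, 'Cara', 2);
    """
)
per_department = conn.execute(
    """
    SELECT d.name AS department_name,
           COUNT(s.student_id) AS number_of_students
    FROM Departments d
    LEFT JOIN Students s ON s.department_id = d.department_id
    GROUP BY d.department_id, d.name
    ORDER BY d.name
    """
).fetchall()
print(per_department)
```

COUNT(s.student_id) ignores the NULL produced for an unmatched department, which is why Physics is reported as 0 rather than 1.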

Output:

| department_name  | number_of_students |
| ---------------- | ------------------ |
| Computer Science | 50                 |
| Mathematics      | 30                 |
| Physics          | 20                 |

Real-World Applications:

This query can be used in various real-world scenarios, such as:

  • Generating reports on student enrollment by department

  • Allocating resources (e.g., teachers, classrooms) based on student numbers

  • Analyzing student trends and patterns within departments


Product Sales Analysis IV

Problem Statement

You are given a table named ProductSales that contains the following columns:

  • product_id (integer) - The unique identifier of the product.

  • quantity_sold (integer) - The quantity of the product sold.

  • sale_date (date) - The date when the product was sold.

You are asked to write a SQL query that calculates the total quantity sold and the average sale date for each unique product.

Solution

SELECT product_id, SUM(quantity_sold) AS total_quantity_sold, datetime(AVG(julianday(sale_date))) AS average_sale_date
FROM ProductSales
GROUP BY product_id;

Explanation

The above SQL query uses the GROUP BY clause to group the rows in the ProductSales table by the product_id column. This means that for each unique product_id, the query will calculate the total quantity sold and the average sale date.

The SUM() function is used to calculate the total quantity sold for each product. AVG() cannot be applied portably to a date column, so the query (written in SQLite syntax) converts each date to a Julian day number with julianday(), averages those numbers, and converts the result back to a date and time with datetime().

The GROUP BY clause is important because it makes the query return one row for each unique product_id, with SUM() and AVG() computed over the rows of that product only. Without it, the aggregates would be computed once over the whole table.
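Averaging a date column directly with AVG() is not portable: depending on the database it raises an error or returns a number that is not a date. In SQLite, one option is to average Julian day numbers and convert the result back. The sketch below shows this with Python's sqlite3 module; storing sale_date as ISO-8601 text is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductSales (product_id INT, quantity_sold INT, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO ProductSales VALUES (?, ?, ?)",
    [(1, 10, "2023-01-01"), (1, 20, "2023-01-02"), (2, 30, "2023-01-03")],
)
summary = conn.execute(
    """
    SELECT product_id,
           SUM(quantity_sold) AS total_quantity_sold,
           datetime(AVG(julianday(sale_date))) AS average_sale_date
    FROM ProductSales
    GROUP BY product_id
    ORDER BY product_id
    """
).fetchall()
print(summary)  # the midpoint of Jan 1 and Jan 2 is noon on Jan 1
```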

Example

Consider the following ProductSales table:

| product_id | quantity_sold | sale_date  |
| ---------- | ------------- | ---------- |
| 1          | 10            | 2023-01-01 |
| 1          | 20            | 2023-01-02 |
| 2          | 30            | 2023-01-03 |

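The following SQL query (SQLite syntax; the dates are averaged as Julian day numbers because AVG() on a date column does not yield a date) would return the following results:

SELECT product_id, SUM(quantity_sold) AS total_quantity_sold, datetime(AVG(julianday(sale_date))) AS average_sale_date
FROM ProductSales
GROUP BY product_id;

| product_id | total_quantity_sold | average_sale_date   |
| ---------- | ------------------- | ------------------- |
| 1          | 30                  | 2023-01-01 12:00:00 |
| 2          | 30                  | 2023-01-03 00:00:00 |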

Real-World Applications

The query provided above can be used in a variety of real-world applications, such as:

  • Inventory management: The query can be used to determine the total quantity of a product that has been sold and the average date on which it was sold. This information can be used to manage inventory levels and ensure that there is always enough stock on hand.

  • Sales forecasting: The query can be used to forecast future sales by identifying trends in the total quantity sold and the average sale date. This information can be used to make decisions about product pricing, marketing, and staffing.

  • Product performance analysis: The query can be used to compare the performance of different products. This information can be used to identify products that are selling well and products that are not selling well.


Highest Salaries Difference

Problem Statement:

Find the difference between the maximum and minimum salaries for each department.

SQL Query:

SELECT department, MAX(salary) - MIN(salary) AS salary_difference
FROM employee
GROUP BY department;

Breakdown and Explanation:

SELECT department, MAX(salary) - MIN(salary) AS salary_difference:

  • SELECT department: Selects the department column.

  • MAX(salary): Calculates the maximum salary for each department.

  • MIN(salary): Calculates the minimum salary for each department.

  • -: Subtracts the minimum salary from the maximum salary to find the difference.

  • AS salary_difference: Aliases the result as salary_difference.

FROM employee:

  • Selects data from the employee table.

GROUP BY department:

  • Groups the results by the department column.

Real-World Application:

This query can be used in HR systems to analyze salary disparities within departments. It can help identify departments where there is a significant gap between the highest and lowest salaries. This information can be used to address potential pay inequality issues.

Example:

Consider the following employee table:

| employee_id | department  | salary |
| ----------- | ----------- | ------ |
| 1           | Sales       | 100000 |
| 2           | Sales       | 80000  |
| 3           | Engineering | 120000 |
| 4           | Engineering | 90000  |

Query Result:

| department  | salary_difference |
| ----------- | ----------------- |
| Sales       | 20000             |
| Engineering | 30000             |

This result shows that the Sales department has a salary difference of $20,000, while the Engineering department has a salary difference of $30,000.


Team Scores in Football Tournament

Problem Statement:

In a football tournament, there are two teams in each match. Each match has a home team and an away team. The home team wins 3 points if they win the match, 1 point if they draw, and 0 points if they lose. The away team wins 0 points if they lose, 1 point if they draw, and 3 points if they win.

Given a table Matches that contains the results of the matches played in the tournament, you need to find the final scores of each team.

Table Structure:

CREATE TABLE Matches (
    match_id INT PRIMARY KEY,
    home_team_id INT,
    away_team_id INT,
    home_team_score INT,
    away_team_score INT
);

Example:

INSERT INTO Matches (match_id, home_team_id, away_team_id, home_team_score, away_team_score) VALUES
(1, 1, 2, 2, 1),
(2, 3, 4, 0, 3),
(3, 5, 6, 1, 1);

Output:

| team_id | total_score |
|---|---|
| 1 | 3 |
| 2 | 0 |
| 3 | 0 |
| 4 | 3 |
| 5 | 1 |
| 6 | 1 |

Solution:

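To find the total score for each team, we first list every team's points for every match it played, once from the home side and once from the away side, using a CASE expression to award the points. We then use the SUM() function to calculate the total points for each team.

SELECT team_id, SUM(points) AS total_score
FROM (
    SELECT home_team_id AS team_id,
        CASE WHEN home_team_score > away_team_score THEN 3
             WHEN home_team_score = away_team_score THEN 1 ELSE 0 END AS points
    FROM Matches
    UNION ALL
    SELECT away_team_id AS team_id,
        CASE WHEN away_team_score > home_team_score THEN 3
             WHEN away_team_score = home_team_score THEN 1 ELSE 0 END AS points
    FROM Matches
) AS team_points
GROUP BY team_id;

Breakdown:

  • The subquery produces one row per team per match. The Matches table has no team_id column, so the first SELECT reports the home team's points and the second SELECT reports the away team's points; UNION ALL combines them.

  • Each CASE expression evaluates the match result from that team's point of view:

    • If the team scored more than its opponent, it earns 3 points.

    • If the scores are equal, the match is a draw and the team earns 1 point.

    • Otherwise, the team lost and earns 0 points.

  • We group the results by the team ID using the GROUP BY clause.

  • We use the SUM() function to calculate the total points earned by each team.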
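Since every match awards points to two teams, one way to total them is to emit one row per team per match and then sum. The sketch below runs this on the example matches with Python's sqlite3 module; the ORDER BY is an addition to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Matches (match_id INT PRIMARY KEY, home_team_id INT, "
    "away_team_id INT, home_team_score INT, away_team_score INT)"
)
conn.executemany(
    "INSERT INTO Matches VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 2, 2, 1), (2, 3, 4, 0, 3), (3, 5, 6, 1, 1)],
)
scores = conn.execute(
    """
    SELECT team_id, SUM(points) AS total_score
    FROM (
        -- Points from the home team's point of view
        SELECT home_team_id AS team_id,
            CASE WHEN home_team_score > away_team_score THEN 3
                 WHEN home_team_score = away_team_score THEN 1 ELSE 0 END AS points
        FROM Matches
        UNION ALL
        -- Points from the away team's point of view
        SELECT away_team_id AS team_id,
            CASE WHEN away_team_score > home_team_score THEN 3
                 WHEN away_team_score = home_team_score THEN 1 ELSE 0 END AS points
        FROM Matches
    ) AS team_points
    GROUP BY team_id
    ORDER BY team_id
    """
).fetchall()
print(scores)
```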

Real-World Applications:

This query can be used to find the final standings in a football tournament or league. It can also be used to track the performance of individual teams over time.


Merge Overlapping Events in the Same Hall

Problem Statement:

You are given a table event that stores information about events happening in different halls.

CREATE TABLE event (
  id INT PRIMARY KEY,
  hall_id INT,
  start_time DATETIME,
  end_time DATETIME
);

You need to merge overlapping events in the same hall into a single event.

Solution:

SELECT hall_id, MIN(start_time) AS start_time, MAX(end_time) AS end_time
FROM event
GROUP BY hall_id
ORDER BY hall_id, start_time;

Explanation:

  1. Group By Hall ID: Group the events by their hall_id. This will give us a list of all the events in each hall.

  2. Calculate Minimum Start Time: For each group of events in a hall, find the minimum start_time. This will be the start time of the merged event.

  3. Calculate Maximum End Time: For each group of events in a hall, find the maximum end_time. This will be the end time of the merged event.

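  4. Order by Hall ID and Start Time: Finally, order the merged events by their hall_id and start_time.

Note that grouping by hall_id alone collapses all events in a hall into a single span, even when some of them are separated by a gap. In the example below, hall 1's event from 2022-05-05 to 2022-05-07 does not overlap the two earlier events, yet it is absorbed into the same output row. Keeping such events separate requires first assigning events to overlap groups (for example with window functions) and then grouping by hall_id and group.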

Example:

| hall_id | start_time | end_time |
|---------|------------|----------|
| 1        | 2022-05-01 | 2022-05-03 |
| 1        | 2022-05-02 | 2022-05-04 |
| 1        | 2022-05-05 | 2022-05-07 |
| 2        | 2022-06-01 | 2022-06-02 |
| 2        | 2022-06-02 | 2022-06-03 |

Output:

| hall_id | start_time | end_time |
|---------|------------|----------|
| 1        | 2022-05-01 | 2022-05-07 |
| 2        | 2022-06-01 | 2022-06-03 |
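One way to merge only the events that actually overlap is the gaps-and-islands technique: flag each event that starts after every earlier event in its hall has ended, turn the running total of those flags into a group number, and aggregate per group. The sketch below is one possible implementation using Python's sqlite3 module (window functions need SQLite 3.25 or later). It assumes that events sharing an endpoint count as overlapping; the CTE and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (id INT PRIMARY KEY, hall_id INT, start_time TEXT, end_time TEXT)"
)
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2022-05-01", "2022-05-03"),
        (2, 1, "2022-05-02", "2022-05-04"),
        (3, 1, "2022-05-05", "2022-05-07"),
        (4, 2, "2022-06-01", "2022-06-02"),
        (5, 2, "2022-06-02", "2022-06-03"),
    ],
)
merged = conn.execute(
    """
    WITH flagged AS (
        SELECT hall_id, start_time, end_time,
               -- 1 when no earlier event in the hall reaches this start time
               CASE WHEN start_time <= MAX(end_time) OVER (
                        PARTITION BY hall_id ORDER BY start_time, end_time
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                    THEN 0 ELSE 1 END AS starts_new_group
        FROM event
    ), grouped AS (
        SELECT hall_id, start_time, end_time,
               SUM(starts_new_group) OVER (
                   PARTITION BY hall_id ORDER BY start_time, end_time) AS group_no
        FROM flagged
    )
    SELECT hall_id, MIN(start_time), MAX(end_time)
    FROM grouped
    GROUP BY hall_id, group_no
    ORDER BY hall_id, MIN(start_time)
    """
).fetchall()
print(merged)
```

On the example data this keeps hall 1's third event as its own row, while the two touching events in hall 2 are merged.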

Real-World Applications:

This query can be used in real-world applications such as:

  • Event Management: To merge overlapping events in a calendar to avoid scheduling conflicts.

  • Room Booking: To find available time slots in a conference room by merging overlapping bookings.

  • Resource Allocation: To optimize resource utilization by merging overlapping tasks.


Number of Calls Between Two Persons

Problem Statement:

Given a table CallLog that records the call history of a group of people, find the number of calls between two specific persons, A and B.

Table Schema:

CallLog (
  CallerId INT,
  ReceiverId INT,
  CallTime TIMESTAMP
)

SQL Query:

SELECT COUNT(*) AS NumberOfCalls
FROM CallLog
WHERE (CallerId = A AND ReceiverId = B) OR (CallerId = B AND ReceiverId = A);

Explanation:

  • The WHERE clause matches calls where A was the caller and B was the receiver, or vice versa. A and B are placeholders for the IDs of the two persons.

  • The OR keyword combines the two directions into a single condition, and COUNT(*) counts the matching rows.

Example:

| CallerId | ReceiverId | CallTime |
|---|---|---|
| 1 | 2 | 2023-03-08 12:34:56 |
| 2 | 3 | 2023-03-09 11:12:34 |
| 3 | 4 | 2023-03-10 10:23:15 |
| 4 | 1 | 2023-03-11 09:34:26 |

To find the number of calls between person 1 and person 2, the query would be:

SELECT COUNT(*) AS NumberOfCalls
FROM CallLog
WHERE (CallerId = 1 AND ReceiverId = 2) OR (CallerId = 2 AND ReceiverId = 1);

The result would be 1: only the first row is a call between these two persons. The call from person 4 to person 1 does not involve person 2.

Real-World Applications:

  • Telecom companies can use this query to analyze call patterns and identify frequently called contacts.

  • Law enforcement agencies can use it to investigate communication networks and track relationships between individuals.


Product Price at a Given Date

Problem: You have a table Product that contains the following columns:

  • product_id (int)

  • price (float)

  • date (date)

You want to find the price of a product on a given date.

Solution:

SELECT price
FROM Product
WHERE product_id = ? AND date = ?;

Explanation: This query uses the equality operator (=) to find the row in the Product table that has the specified product_id and date. It then returns the price column from that row.

Example:

SELECT price
FROM Product
WHERE product_id = 1 AND date = '2023-01-01';

This query would return the price of the product with product_id 1 on January 1, 2023.

Real-World Applications: This query can be used in a variety of real-world applications, such as:

  • Tracking the price history of a product

  • Finding the lowest price for a product on a given date

  • Generating invoices for products sold on a given date


Customers with Maximum Number of Transactions on Consecutive Days

Problem Statement:

Given a table of customer transactions transactions with columns customer_id, transaction_date, and amount, find the customers who have the maximum number of consecutive days with at least one transaction.

Best & Performant SQL Solution:

WITH DistinctDays AS (
    -- One row per customer per calendar day
    SELECT DISTINCT
        customer_id,
        DATE(transaction_date) AS transaction_date
    FROM
        transactions
), NumberedDays AS (
    SELECT
        customer_id,
        transaction_date,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date) AS row_num
    FROM
        DistinctDays
), Streaks AS (
    -- Within a run of consecutive days, the date minus the row number is constant
    SELECT
        customer_id,
        COUNT(*) AS consecutive_days,
        GROUP_CONCAT(transaction_date ORDER BY transaction_date) AS consecutive_transactions
    FROM
        NumberedDays
    GROUP BY
        customer_id,
        DATE_SUB(transaction_date, INTERVAL row_num DAY)
)
SELECT
    customer_id,
    consecutive_days,
    consecutive_transactions
FROM
    Streaks
WHERE
    consecutive_days = (SELECT MAX(consecutive_days) FROM Streaks);

Explanation:

  1. Collapse to Distinct Days: The DistinctDays CTE keeps one row per customer per day, so several transactions on the same day count once.

  2. Number the Days: The NumberedDays CTE assigns each customer's days a row number in date order.

  3. Group into Streaks: The Streaks CTE subtracts the row number (in days) from each date. Dates in the same run of consecutive days all produce the same result, so grouping by it yields one row per streak, with its length (consecutive_days) and a comma-separated list of its dates (consecutive_transactions).

  4. Keep the Longest Streaks: The final query returns the customers whose streak length equals the maximum streak length across all customers. Ties are all returned.
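The streak-finding idea can be tried out with a small sketch that uses the "date minus row number" (gaps-and-islands) technique in SQLite through Python's sqlite3 module. The sample rows are invented, julianday() stands in for MySQL date arithmetic, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (customer_id INTEGER, transaction_date TEXT, amount REAL)"
)
# Hypothetical sample data: customer 1 has a 3-day run, customer 2 a 2-day run
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, "2023-01-01", 10.0), (1, "2023-01-02", 5.0), (1, "2023-01-03", 7.5),
        (1, "2023-01-10", 3.0),
        (2, "2023-01-01", 4.0), (2, "2023-01-02", 6.0), (2, "2023-01-02", 1.0),
        (3, "2023-01-05", 9.0),
    ],
)

QUERY = """
WITH DistinctDays AS (
    SELECT DISTINCT customer_id, transaction_date FROM transactions
), NumberedDays AS (
    SELECT customer_id, transaction_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY transaction_date) AS row_num
    FROM DistinctDays
), Streaks AS (
    -- julianday(date) - row_num is constant within a run of consecutive days
    SELECT customer_id, COUNT(*) AS consecutive_days
    FROM NumberedDays
    GROUP BY customer_id, julianday(transaction_date) - row_num
)
SELECT customer_id, consecutive_days
FROM Streaks
WHERE consecutive_days = (SELECT MAX(consecutive_days) FROM Streaks)
ORDER BY customer_id
"""

longest = conn.execute(QUERY).fetchall()
print(longest)  # [(1, 3)]
```

Customer 1's days 1 to 3 January form the longest run, while the isolated 10 January transaction falls into a separate group.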

Real-World Applications:

This query is useful for analyzing customer loyalty and engagement. Businesses can use it to:

  • Identify customers who are most actively engaged with their products or services.

  • Reward customers for consecutive purchases to encourage repeat business.

  • Target marketing campaigns to customers with high engagement levels.


Group Sold Products By The Date

Problem: Given a table SoldProducts with the following columns:

  • id (int)

  • product_id (int)

  • date (date)

  • quantity (int)

Group the sold products by date and product ID and calculate the total quantity sold for each group.

Solution:

SELECT 
    date, 
    product_id, 
    SUM(quantity) AS total_quantity_sold
FROM 
    SoldProducts
GROUP BY 
    date, product_id;

Breakdown:

  • SELECT ...: Selects the date, product_id, and the sum of quantity as total_quantity_sold.

  • FROM SoldProducts: Specifies the table to be queried from.

  • GROUP BY ...: Groups the rows in the table by date and product_id. The rows with the same date and product_id are grouped together.

  • SUM(quantity) ...: Calculates the sum of quantity for each group. This gives the total quantity sold for each date and product_id combination.

Example: Consider the following SoldProducts table:

| id | product_id | date | quantity |
|---|---|---|---|
| 1 | 1 | 2023-01-01 | 10 |
| 2 | 2 | 2023-01-01 | 5 |
| 3 | 1 | 2023-01-02 | 15 |
| 4 | 2 | 2023-01-02 | 10 |

The query would produce the following output:

| date | product_id | total_quantity_sold |
|---|---|---|
| 2023-01-01 | 1 | 10 |
| 2023-01-01 | 2 | 5 |
| 2023-01-02 | 1 | 15 |
| 2023-01-02 | 2 | 10 |

Real-World Applications:

This query can be used in real-world scenarios such as:

  • Analyzing sales trends for different products over time.

  • Identifying products with the highest and lowest sales on specific dates.

  • Forecasting future sales based on historical data.


Friend Requests I: Overall Acceptance Rate

Problem:

Given a table FriendRequests containing friend request records, the task is to find the overall acceptance rate of friend requests.

Table Schema:

CREATE TABLE FriendRequests (
  id INT PRIMARY KEY,
  sender_id INT NOT NULL,
  receiver_id INT NOT NULL,
  status INT NOT NULL,
  created_at TIMESTAMP NOT NULL
);

SQL Query:

WITH AcceptedRequests AS (
  SELECT
    COUNT(*) AS accepted_count
  FROM FriendRequests
  WHERE
    status = 1
), TotalRequests AS (
  SELECT
    COUNT(*) AS total_count
  FROM FriendRequests
)
SELECT
  COALESCE(AR.accepted_count * 1.0 / NULLIF(TR.total_count, 0), 0) AS acceptance_rate
FROM AcceptedRequests AS AR
CROSS JOIN TotalRequests AS TR;

Explanation:

Step 1: Get Accepted Requests

The AcceptedRequests Common Table Expression (CTE) counts the rows of the FriendRequests table that are accepted requests (i.e., status = 1).

WITH AcceptedRequests AS (
  SELECT
    COUNT(*) AS accepted_count
  FROM FriendRequests
  WHERE
    status = 1
)

Step 2: Get Total Requests

The TotalRequests CTE counts all requests, whatever their status.

WITH TotalRequests AS (
  SELECT
    COUNT(*) AS total_count
  FROM FriendRequests
)

Step 3: Calculate Acceptance Rate

Each CTE produces exactly one row, so the outer query combines them with a CROSS JOIN and divides the number of accepted requests by the total number of requests. Multiplying by 1.0 forces decimal division, NULLIF avoids a division by zero when the table is empty, and COALESCE reports a rate of 0 in that case.

SELECT
  COALESCE(AR.accepted_count * 1.0 / NULLIF(TR.total_count, 0), 0) AS acceptance_rate
FROM AcceptedRequests AS AR
CROSS JOIN TotalRequests AS TR;

Output:

The output is a single row with a decimal value representing the overall acceptance rate of friend requests.
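A minimal sketch of the rate calculation, run against SQLite through Python's sqlite3 module. The sample rows are invented, and treating status = 0 as "not accepted" is an assumption; the text only defines status = 1 as accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FriendRequests ("
    "id INTEGER PRIMARY KEY, sender_id INTEGER NOT NULL, receiver_id INTEGER NOT NULL, "
    "status INTEGER NOT NULL, created_at TEXT NOT NULL)"
)
# Hypothetical sample data: 2 of the 4 requests are accepted (status = 1)
conn.executemany(
    "INSERT INTO FriendRequests VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 20, 1, "2023-01-01 09:00:00"),
        (2, 10, 30, 0, "2023-01-01 10:00:00"),
        (3, 20, 30, 1, "2023-01-02 11:00:00"),
        (4, 30, 40, 0, "2023-01-03 12:00:00"),
    ],
)

# Accepted requests divided by all requests; NULLIF avoids dividing by zero
# on an empty table, and COALESCE reports 0 in that case.
rate = conn.execute(
    "SELECT COALESCE("
    "SUM(CASE WHEN status = 1 THEN 1 ELSE 0 END) * 1.0 / NULLIF(COUNT(*), 0), 0) "
    "FROM FriendRequests"
).fetchone()[0]

print(rate)  # 0.5
```

This version folds both counts into a single scan with a CASE expression, which gives the same ratio as counting accepted and total requests separately.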

Real-World Applications:

  • Measuring the success of social media platforms in connecting users

  • Identifying popular users or influencers

  • Understanding user behavior and engagement patterns


Find Followers Count

Problem Statement

Given a table of users and their followers, find the count of followers for each user.

SQL Query:

SELECT user_id, COUNT(*) AS follower_count
FROM followers
GROUP BY user_id;

Breakdown and Explanation:

  1. SELECT user_id, COUNT(*) AS follower_count: This line selects the user's ID and counts the rows in that user's group, assuming each row in the followers table records one follower of user_id. The count is returned in a column named follower_count.

  2. FROM followers: This line specifies that the data is being selected from the followers table.

  3. GROUP BY user_id: This line groups the results by the user_id column. This means that for each unique user_id, the query will return a single row containing the user_id and the total count of followers associated with that user_id.

Real-World Application:

This query can be used in social networking applications to display the number of followers for each user. It can also be used for analysis, such as identifying the most popular users or tracking the growth of user followings over time.

Example:

| user_id | follower_count |
|---|---|
| 1 | 10 |
| 2 | 5 |
| 3 | 15 |

This table shows the number of followers for each user ID. User 1 has 10 followers, User 2 has 5 followers, and User 3 has 15 followers.


Warehouse Manager

Problem Statement

You are given a table WarehouseManagers with the following schema:

| Column | Type |
|---|---|
| ManagerID | int |
| ManagerName | varchar(255) |
| Department | varchar(255) |
| Salary | int |

Find the Department with the maximum average salary.

SOLUTION

-- Calculate the average salary for each department
WITH DepartmentAverages AS (
    SELECT Department, AVG(Salary) AS AverageSalary
    FROM WarehouseManagers
    GROUP BY Department
)
-- Find the department with the maximum average salary
SELECT Department
FROM DepartmentAverages
WHERE AverageSalary = (SELECT MAX(AverageSalary) FROM DepartmentAverages);

Breakdown of the Solution

  1. Calculate the average salary for each department:

WITH DepartmentAverages AS (
    SELECT Department, AVG(Salary) AS AverageSalary
    FROM WarehouseManagers
    GROUP BY Department
)

This CTE calculates the average salary for each department and names the result DepartmentAverages. A CTE is used instead of a derived table in the FROM clause because the result is referenced twice, and a derived-table alias is not visible inside a nested subquery.

  2. Find the department with the maximum average salary:

SELECT Department
FROM DepartmentAverages
WHERE AverageSalary = (SELECT MAX(AverageSalary) FROM DepartmentAverages);

This query returns the department whose average salary equals the maximum average in DepartmentAverages.
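To see the "maximum of the per-department averages" step in action, here is a small sketch using Python's sqlite3 module. The manager names, departments, and salaries are invented, and DepartmentAverages is just the name this sketch gives to the intermediate result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE WarehouseManagers "
    "(ManagerID INTEGER, ManagerName TEXT, Department TEXT, Salary INTEGER)"
)
# Hypothetical sample data
conn.executemany(
    "INSERT INTO WarehouseManagers VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", "Receiving", 50000),
        (2, "Ben", "Receiving", 70000),
        (3, "Cara", "Shipping", 65000),
        (4, "Dev", "Returns", 40000),
    ],
)

QUERY = """
WITH DepartmentAverages AS (
    SELECT Department, AVG(Salary) AS AverageSalary
    FROM WarehouseManagers
    GROUP BY Department
)
SELECT Department
FROM DepartmentAverages
WHERE AverageSalary = (SELECT MAX(AverageSalary) FROM DepartmentAverages)
"""

top_departments = [row[0] for row in conn.execute(QUERY)]
print(top_departments)  # ['Shipping']
```

Receiving averages 60000 and Shipping 65000, so Shipping wins even though the single highest salary is in Receiving.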

Real-World Application

This query can be used to identify the departments with the highest average salaries, which can be helpful for HR planning and budgeting. For example, a company may want to offer higher bonuses or promotions to employees in departments with the highest average salaries to retain top talent.

Additional Notes

  • The per-department averages are computed as an intermediate result, and the outer query keeps every department whose average equals the maximum of those averages. If several departments tie for the highest average, all of them are returned.

  • The AVG() function is used to calculate the average salary. This function takes a set of values and returns the average of those values.


Project Employees III

Problem Statement

Given a table of projects and employees, find the number of employees working on each project.

Input Table

| Project | Employee |
|---|---|
| 1 | John |
| 1 | Mary |
| 2 | Jane |
| 2 | Peter |
| 2 | Susan |
| 3 | Michael |

Output Table

| Project | Employee_Count |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |

Solution

SELECT
  Projects.Project,
  COUNT(*) AS Employee_Count
FROM Projects
JOIN Employees
  ON Projects.Project = Employees.Project
GROUP BY
  Projects.Project;

Explanation

  1. Join the tables: We join the Projects and Employees tables on the Project column to create a single table that contains all the project and employee data.

  2. Count the employees: We use the COUNT(*) function to count the number of employees for each project.

  3. Group the results: We group the results by project to get the employee count for each project.

Real-World Applications

This query can be used to track employee workload and resource allocation in a project management system. It can also be used to identify projects that are understaffed or overstaffed.

Potential Applications

  • Project planning

  • Resource allocation

  • Employee management

  • Performance evaluation


The Number of Users That Are Eligible for Discount

Problem: Find the number of users eligible for a discount.

Data:

CREATE TABLE users (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL,
  orders INT NOT NULL DEFAULT 0,
  PRIMARY KEY (id),
  UNIQUE INDEX idx_email (email)
);

CREATE TABLE orders (
  id INT NOT NULL AUTO_INCREMENT,
  user_id INT NOT NULL,
  amount DECIMAL(10, 2) NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES users(id)
);

Solution:

SELECT COUNT(*) AS num_eligible_users
FROM users
WHERE orders >= 3;

Explanation:

  1. COUNT(*) counts the number of rows.

  2. WHERE orders >= 3 keeps only users whose orders column is at least 3.

  3. The result is the number of users who have placed at least 3 orders.

Potential Applications:

  • Identifying customers who are eligible for loyalty discounts.

  • Segmenting users based on their purchase history.

  • Analyzing customer behavior and targeting marketing campaigns.


Students Report By Geography

Problem:

Given a table of students with their names, countries, and grades, find the average grade for students from each country.

Solution:

SELECT country, AVG(grade) AS average_grade
FROM students
GROUP BY country;

Breakdown:

  • The SELECT clause specifies the columns to be included in the result: country and the average grade.

  • The FROM clause specifies the table from which to retrieve the data: students.

  • The GROUP BY clause groups the rows in the table by the country column.

  • The AVG() function calculates the average grade for each group.

Example:

| Country | Average Grade |
|---|---|
| USA | 85 |
| UK | 90 |
| France | 75 |

Applications in Real World:

  • Analyzing student performance by country

  • Identifying countries with the highest or lowest average grades

  • Targeting educational resources to specific regions


Big Countries

Problem:

Find all countries with a population greater than 100 million.

SQL Query:

SELECT name
FROM Country
WHERE population > 100000000;

Explanation:

  1. SELECT name: This part of the query selects the name column from the Country table. This column contains the names of the countries.

  2. FROM Country: This specifies that we are selecting data from the Country table.

  3. WHERE population > 100000000: This is a filter condition. It checks for all rows in the Country table where the population column is greater than 100 million.

Real-World Applications:

The query can be used in various applications, such as:

  • Analyzing global population trends

  • Identifying countries with high birth rates

  • Studying the distribution of population across the world


Sales Analysis II

Problem Statement:

Given a table Sales with columns product_id, date, quantity, and price, find the total sales for each product in a given date range.

Solution:

SELECT product_id,
       SUM(quantity) AS total_quantity,
       SUM(price * quantity) AS total_sales
FROM Sales
WHERE date BETWEEN '2023-01-01' AND '2023-03-31'
GROUP BY product_id;

Breakdown:

  • SELECT product_id, SUM(quantity) AS total_quantity, SUM(price * quantity) AS total_sales: This line selects the product ID, total quantity sold, and total sales for each product.

  • FROM Sales: This specifies the table from which to retrieve the data.

  • WHERE date BETWEEN '2023-01-01' AND '2023-03-31': This filters the rows based on dates within the specified range ('January 1, 2023' to 'March 31, 2023').

  • GROUP BY product_id: This groups the results by product ID, so that the total sales and quantity are calculated for each product separately.

Example:

| product_id | total_quantity | total_sales |
|-------------|----------------|-------------|
| 1           | 100            | 1000        |
| 2           | 50             | 750         |
| 3           | 75             | 1125        |

Explanation:

This query will return a table with three columns: product ID, total quantity sold, and total sales for products sold between '2023-01-01' and '2023-03-31'.

Performance:

The GROUP BY operation can be expensive, especially for large datasets. An index on the date column helps the database restrict the scan to the requested range, and an index on product_id can help the grouping step. Which index is actually used depends on the database and on the data.

Real-World Applications:

This query can be used in a variety of scenarios, including:

  • Sales analysis: To track sales for different products over time and identify trends.

  • Inventory management: To ensure adequate stock levels for high-selling products.

  • Product performance evaluation: To compare the sales of different products and identify areas for improvement.


Find the Subtasks That Did Not Execute

Problem: Given a Subtasks table and a CompletedTasks table, find the subtasks whose parent task has no entry in CompletedTasks.

SQL Query:

SELECT SubtaskID
FROM Subtasks
WHERE TaskID NOT IN (SELECT TaskID FROM CompletedTasks);

Explanation:

This query retrieves the SubtaskIDs of subtasks that do not have a corresponding entry in the CompletedTasks table.

Breakdown:

  1. Subtasks Table: Stores information about subtasks, including their SubtaskID.

  2. CompletedTasks Table: Stores information about completed tasks, including their TaskID.

  3. TaskID Column: This column is present in both tables and establishes a relationship between subtasks and their parent tasks.

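  4. NOT IN Operator: The NOT IN operator keeps a subtask only when its TaskID does not appear in the CompletedTasks table. If CompletedTasks.TaskID can contain NULL, NOT IN matches no rows at all, so NOT EXISTS is the safer choice in that case.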

Simplified Explanation:

Imagine you have a task management system. Each task can have multiple subtasks. This query finds subtasks that belong to tasks that have not been completed yet.
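One thing worth checking with NOT IN is how it behaves when the subquery returns a NULL. The sketch below, using Python's sqlite3 module with made-up rows, compares NOT IN against an equivalent NOT EXISTS; the two-column Subtasks layout is an assumption based on the breakdown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Subtasks (SubtaskID INTEGER, TaskID INTEGER)")
conn.execute("CREATE TABLE CompletedTasks (TaskID INTEGER)")
conn.executemany("INSERT INTO Subtasks VALUES (?, ?)", [(1, 100), (2, 100), (3, 200)])
# One completed task plus a row whose TaskID is NULL
conn.executemany("INSERT INTO CompletedTasks VALUES (?)", [(100,), (None,)])

# NOT IN: the NULL makes the comparison unknown for every non-matching row
not_in = conn.execute(
    "SELECT SubtaskID FROM Subtasks "
    "WHERE TaskID NOT IN (SELECT TaskID FROM CompletedTasks)"
).fetchall()

# NOT EXISTS: NULL rows simply never match, so they do not affect the result
not_exists = conn.execute(
    "SELECT SubtaskID FROM Subtasks AS s "
    "WHERE NOT EXISTS (SELECT 1 FROM CompletedTasks AS c WHERE c.TaskID = s.TaskID)"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(3,)]
```

Subtask 3 belongs to a task that was never completed, yet NOT IN drops it because of the NULL row; NOT EXISTS returns it as expected.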

Real-World Example:

  • Inventory Management: Find products that have not been checked into the warehouse.

  • Project Management: Identify sub-steps of a project that still need to be completed.

Code Implementation:

import mysql.connector

# Connect to the database
connection = mysql.connector.connect(
    host="localhost",
    user="root",
    password="",
    database="tasks"
)

# Create a cursor to execute queries
cursor = connection.cursor()

# Execute the query
cursor.execute("SELECT SubtaskID FROM Subtasks WHERE TaskID NOT IN (SELECT TaskID FROM CompletedTasks);")

# Fetch the results
results = cursor.fetchall()

# Print the list of subtask IDs
for result in results:
    print(result[0])  # SubtaskID

# Close the cursor and connection
cursor.close()
connection.close()

Shortest Distance in a Line

Problem Statement:

Given a line segment defined by two points (x1, y1) and (x2, y2), find the shortest distance from a point (x3, y3) to the line segment.

SQL Solution:

WITH LineSegment AS (
  -- Assumes each row of Points holds both endpoints and the query point
  SELECT
    x1, y1, x2, y2, x3, y3
  FROM Points
), Projection AS (
  SELECT
    x1, y1, x2, y2, x3, y3,
    -- Position of the projection of (x3, y3) along the segment, clamped to [0, 1]
    CASE
      WHEN x1 = x2 AND y1 = y2 THEN 0
      ELSE GREATEST(0, LEAST(1,
        ((x3 - x1) * (x2 - x1) + (y3 - y1) * (y2 - y1))
        / (POW(x2 - x1, 2) + POW(y2 - y1, 2))
      ))
    END AS t
  FROM LineSegment
)
SELECT
  SQRT(POW(x3 - (x1 + t * (x2 - x1)), 2) + POW(y3 - (y1 + t * (y2 - y1)), 2)) AS shortest_distance
FROM Projection;

Step-by-Step Explanation:

1. Define the Line Segment:

The LineSegment CTE (Common Table Expression) reads the coordinates of the two endpoints and of the point (x3, y3). The query assumes that all six values are stored in one row of Points.

2. Project the Point onto the Line:

The Projection CTE computes t, the position of the perpendicular projection of (x3, y3) on the line through the endpoints. t = 0 corresponds to (x1, y1) and t = 1 to (x2, y2).

3. Clamp to the Segment:

GREATEST and LEAST restrict t to the range [0, 1]. If the projection falls outside the segment, the nearest point is therefore one of the endpoints. The CASE expression handles a degenerate segment whose endpoints coincide. Comparing only the distances to the two endpoints is not enough, because the nearest point can lie in the interior of the segment.

4. Calculate Final Answer:

The final query calculates the distance between (x3, y3) and the nearest point on the segment, (x1 + t * (x2 - x1), y1 + t * (y2 - y1)).

Example:

Let's say we have a line segment defined by the points (1, 2) and (3, 4) and a point (2, 3).

To run the query on these values, replace the LineSegment CTE with literals and leave the rest of the query unchanged:

WITH LineSegment AS (
  SELECT
    1 AS x1, 2 AS y1, 3 AS x2, 4 AS y2, 2 AS x3, 3 AS y3
)

Result:

t = ((2 - 1) * (3 - 1) + (3 - 2) * (4 - 2)) / ((3 - 1)^2 + (4 - 2)^2) = 4 / 8 = 0.5, so the nearest point is (2, 3), which is the given point itself.

shortest_distance = 0
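For comparison, here is a minimal Python sketch of a point-to-segment distance computed by projecting the point onto the segment and clamping the projection to the endpoints. The function name distance_to_segment is hypothetical.

```python
import math


def distance_to_segment(x1, y1, x2, y2, x3, y3):
    """Shortest distance from (x3, y3) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        # Degenerate segment: both endpoints coincide
        return math.hypot(x3 - x1, y3 - y1)
    # Projection parameter along the segment, clamped to [0, 1]
    t = ((x3 - x1) * dx + (y3 - y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    # Distance from the point to its nearest point on the segment
    return math.hypot(x3 - (x1 + t * dx), y3 - (y1 + t * dy))


print(distance_to_segment(1, 2, 3, 4, 2, 3))  # 0.0
print(distance_to_segment(0, 0, 2, 0, 1, 1))  # 1.0
print(distance_to_segment(0, 0, 2, 0, 5, 4))  # 5.0
```

The first call uses the segment (1, 2) to (3, 4) and the point (2, 3), which lies on the segment, so the distance is 0. The third call shows the clamped case, where the nearest point is the endpoint (2, 0).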

Orders With Maximum Quantity Above Average

Problem Statement

Given a table 'Orders' with the following schema:

| Column | Type |
|---|---|
| id | int |
| product_id | int |
| quantity | int |

Write an SQL query to find all orders with a quantity greater than the average quantity for all orders.

Solution

The following SQL query solves the problem:

SELECT *
FROM Orders
WHERE quantity > (SELECT AVG(quantity) FROM Orders);

Breakdown

The query is a simple SELECT statement that retrieves all rows from the 'Orders' table where the quantity column is greater than the average quantity of all orders.

The subquery (SELECT AVG(quantity) FROM Orders) calculates the average quantity of all orders in the table. This value is then used in the main query to keep only the orders whose quantity is above that average.

Example

Consider the following 'Orders' table:

| id | product_id | quantity |
|---|---|---|
| 1 | 1 | 10 |
| 2 | 2 | 5 |
| 3 | 3 | 15 |
| 4 | 4 | 12 |
| 5 | 5 | 8 |

The average quantity of all orders is:

(10 + 5 + 15 + 12 + 8) / 5 = 10

Therefore, the query SELECT * FROM Orders WHERE quantity > 10 would return the following rows. Order 1 is excluded because its quantity of 10 equals the average and is not strictly greater than it:

| id | product_id | quantity |
|---|---|---|
| 3 | 3 | 15 |
| 4 | 4 | 12 |
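The worked example can be reproduced with a short sketch using Python's sqlite3 module and the five sample orders listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (id INTEGER, product_id INTEGER, quantity INTEGER)")
# The five sample orders from the example
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 2, 5), (3, 3, 15), (4, 4, 12), (5, 5, 8)],
)

average = conn.execute("SELECT AVG(quantity) FROM Orders").fetchone()[0]

# Strictly greater than the average, so the order with quantity 10 is excluded
above_average = conn.execute(
    "SELECT id, quantity FROM Orders "
    "WHERE quantity > (SELECT AVG(quantity) FROM Orders) "
    "ORDER BY id"
).fetchall()

print(average)        # 10.0
print(above_average)  # [(3, 15), (4, 12)]
```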

Real-World Applications

This query can be used in a variety of real-world applications, such as:

  • Identifying customers who are purchasing above-average quantities of a product

  • Flagging orders that may be fraudulent due to unusually high quantities

  • Analyzing sales trends to identify products that are selling well and products that are not selling well


Leetcodify Similar Friends

Problem:

Find pairs of friends who have at least k common friends.

Table Schema:

CREATE TABLE Friends (
    user_id1 INT NOT NULL,
    user_id2 INT NOT NULL,
    PRIMARY KEY (user_id1, user_id2)
);

Solution:

WITH AllFriends AS (
    -- List every friendship in both directions
    SELECT user_id1 AS user_id, user_id2 AS friend_id FROM Friends
    UNION
    SELECT user_id2, user_id1 FROM Friends
)
SELECT
    f.user_id1 AS user1,
    f.user_id2 AS user2
FROM
    Friends AS f
JOIN
    AllFriends AS a ON a.user_id = f.user_id1
JOIN
    AllFriends AS b ON b.user_id = f.user_id2
    AND b.friend_id = a.friend_id
GROUP BY
    f.user_id1,
    f.user_id2
HAVING
    COUNT(*) >= k;

Explanation:

  1. The AllFriends CTE lists every friendship in both directions, so that each user's friends can be looked up by user_id. The query assumes that each friendship is stored once in Friends.

  2. Start from the Friends table (f), so that only pairs who are already friends are considered.

  3. Join AllFriends as a to get the friends of f.user_id1, and as b to get the friends of f.user_id2.

  4. The condition b.friend_id = a.friend_id keeps only the users who are friends with both members of the pair, i.e. their common friends.

  5. Group by the pair and use HAVING COUNT(*) >= k to filter out pairs with fewer than k common friends.

Example:

To find pairs with at least 2 common friends, replace k with 2 in the HAVING clause:

HAVING
    COUNT(*) >= 2;

With this change, the query returns all pairs of friends who have at least 2 common friends.
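A minimal sketch of counting common friends, run in SQLite through Python's sqlite3 module. It expands each friendship into both directions and then counts the users who are friends with both members of a pair. The sample friendships, the AllFriends name, and the helper similar_friends are assumptions, as is the convention that each friendship is stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Friends (user_id1 INTEGER NOT NULL, user_id2 INTEGER NOT NULL, "
    "PRIMARY KEY (user_id1, user_id2))"
)
# Hypothetical friendships, each stored once
conn.executemany(
    "INSERT INTO Friends VALUES (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 5)],
)

QUERY = """
WITH AllFriends AS (
    -- Every friendship in both directions
    SELECT user_id1 AS user_id, user_id2 AS friend_id FROM Friends
    UNION
    SELECT user_id2, user_id1 FROM Friends
)
SELECT f.user_id1, f.user_id2
FROM Friends AS f
JOIN AllFriends AS a ON a.user_id = f.user_id1
JOIN AllFriends AS b ON b.user_id = f.user_id2 AND b.friend_id = a.friend_id
GROUP BY f.user_id1, f.user_id2
HAVING COUNT(*) >= ?
ORDER BY f.user_id1, f.user_id2
"""


def similar_friends(conn, k):
    """Pairs of friends with at least k common friends (hypothetical helper)."""
    return conn.execute(QUERY, (k,)).fetchall()


print(similar_friends(conn, 2))  # [(1, 2)]
```

Users 1 and 2 share friends 3 and 4, so they are the only pair returned for k = 2. Every other pair of friends shares at most one common friend.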

Possible applications in real world:

  • Recommending friends to users on social media platforms based on their common connections.

  • Identifying influencers in a network by analyzing their number of common friends.

  • Detecting fraudulent activities by identifying connected groups of users.


Confirmation Rate

Problem Statement:

Given a table of bookings, determine the confirmation rate for each hotel.

Table:

bookings (
  hotel_id INT,
  confirmed BOOLEAN
)

Query:

SELECT
  hotel_id,
  SUM(confirmed) AS total_confirmed,
  COUNT(*) AS total_bookings,
  ROUND((SUM(confirmed) * 100.0) / COUNT(*), 2) AS confirmation_rate
FROM bookings
GROUP BY hotel_id;

Explanation:

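  1. SUM(confirmed): Counts the number of confirmed bookings for each hotel. This works in databases that store booleans as 0 and 1, such as MySQL and SQLite. Where booleans cannot be summed directly, use SUM(CASE WHEN confirmed THEN 1 ELSE 0 END) instead.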

  2. COUNT(*): Counts the total number of bookings for each hotel.

  3. ROUND((SUM(confirmed) * 100.0) / COUNT(*), 2): Calculates the confirmation rate as a percentage by dividing the number of confirmed bookings by the total bookings and multiplying by 100. The ROUND function rounds the result to two decimal places.

Result:

The query returns the following columns for each hotel:

  • hotel_id: The unique identifier for the hotel.

  • total_confirmed: The total number of confirmed bookings.

  • total_bookings: The total number of bookings.

  • confirmation_rate: The confirmation rate as a percentage.
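As a quick check of the per-hotel calculation, the sketch below runs the same query in SQLite through Python's sqlite3 module. The bookings are invented, and confirmed is stored as 0 or 1 because SQLite has no separate boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (hotel_id INTEGER, confirmed INTEGER)")
# Hypothetical bookings: hotel 1 has 2 of 3 confirmed, hotel 2 has 1 of 2
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 0), (2, 0), (2, 1)],
)

rows = conn.execute(
    "SELECT hotel_id, "
    "SUM(confirmed) AS total_confirmed, "
    "COUNT(*) AS total_bookings, "
    "ROUND((SUM(confirmed) * 100.0) / COUNT(*), 2) AS confirmation_rate "
    "FROM bookings GROUP BY hotel_id ORDER BY hotel_id"
).fetchall()

for hotel_id, total_confirmed, total_bookings, confirmation_rate in rows:
    print(hotel_id, total_confirmed, total_bookings, confirmation_rate)
```

Hotel 1 comes out at roughly 66.67 percent and hotel 2 at 50 percent. Multiplying by 100.0 rather than 100 keeps the division from being truncated to an integer.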

Real-World Application:

This query is useful for hotel managers to track the performance of their confirmation process. A high confirmation rate indicates that the hotel is efficiently confirming bookings, while a low rate suggests areas for improvement.


Calculate the Influence of Each Salesperson

Problem Statement:

Sales records are maintained in a table "Sales", which has columns like "SalespersonId", "SalesAmount", and "Date". Calculate the influence of each salesperson by computing their average daily sales. The influence is defined as the salesperson's average daily sales divided by the total average daily sales of all salespeople.

SQL Query:

WITH SalespersonDailySales AS (
    SELECT
        SalespersonId,
        DATE(Date) AS SaleDate,
        SUM(SalesAmount) AS DailySales
    FROM
        Sales
    GROUP BY
        SalespersonId,
        SaleDate
), SalespersonTotalSales AS (
    SELECT
        SalespersonId,
        SUM(DailySales) AS TotalSales
    FROM
        SalespersonDailySales
    GROUP BY
        SalespersonId
)
SELECT
    SalespersonId,
    (
        AVG(DailySales) / (
            SELECT
                AVG(DailySales)
            FROM
                SalespersonDailySales
        )
    ) AS Influence
FROM
    SalespersonDailySales
GROUP BY
    SalespersonId;

Explanation:

  1. SalespersonDailySales: This Common Table Expression (CTE) calculates the daily sales for each salesperson.

    • SELECT SalespersonId, DATE(Date) AS SaleDate, SUM(SalesAmount) AS DailySales

    • It groups the sales records by salesperson and the date of sale, then sums up the sales amounts for each salesperson on each date.

  2. SalespersonTotalSales: This CTE calculates the total sales for each salesperson. The final query does not reference it, so it can be removed without changing the result.

    • SELECT SalespersonId, SUM(DailySales) AS TotalSales

    • It groups the SalespersonDailySales CTE by salesperson and sums up their daily sales.

  3. Final Query: This query calculates the influence of each salesperson.

    • (AVG(DailySales) / (SELECT AVG(DailySales) FROM SalespersonDailySales))

    • This calculates the average daily sales for each salesperson and divides it by the overall average daily sales of all salespeople.

    • GROUP BY SalespersonId

    • This groups the results by salesperson, giving the influence of each salesperson.

Real-World Application:

This query can be used to:

  • Evaluate the performance of individual salespersons.

  • Identify high-performing salespersons who can mentor or train others.

  • Make data-driven decisions about sales strategies and resource allocation.


Flight Occupancy and Waitlist Analysis

Problem Statement:

Analyze flight data to determine the flight occupancy rate and waitlist status for a given airline.

SQL Solution:

-- Calculate the flight occupancy rate for each flight
WITH FlightOccupancy AS (
  SELECT
    flight_id,
    SUM(passengers) AS total_passengers,
    SUM(capacity) AS total_capacity,
    ROUND(SUM(passengers) * 100.0 / NULLIF(SUM(capacity), 0), 2) AS occupancy_rate
  FROM Flights
  GROUP BY
    flight_id
),

-- Check if a flight has any waitlisted passengers
WaitlistStatus AS (
  SELECT
    flight_id,
    CASE
      WHEN SUM(waitlisted) > 0 THEN 'Waitlisted'
      ELSE 'Not Waitlisted'
    END AS waitlist_status
  FROM Flights
  GROUP BY
    flight_id
)

-- Combine the results
SELECT
  FlightOccupancy.flight_id,
  FlightOccupancy.total_passengers,
  FlightOccupancy.total_capacity,
  FlightOccupancy.occupancy_rate,
  WaitlistStatus.waitlist_status
FROM FlightOccupancy
INNER JOIN WaitlistStatus
  ON FlightOccupancy.flight_id = WaitlistStatus.flight_id;

Breakdown and Explanation:

1. Calculate Flight Occupancy Rate:

  • The FlightOccupancy Common Table Expression (CTE) calculates the total passengers, total capacity, and occupancy rate for each flight.

  • It does this by grouping the data by flight ID and summing the passengers and capacity columns.

  • The occupancy rate is then calculated as the number of passengers divided by the capacity, multiplied by 100 to get a percentage.

2. Check Waitlist Status:

  • The WaitlistStatus CTE determines if a flight has any waitlisted passengers.

  • It does this by grouping the data by flight ID and summing the waitlisted column.

  • If the sum is greater than 0, the waitlist status is set to 'Waitlisted', otherwise it is set to 'Not Waitlisted'.

3. Combine Results:

  • The main query joins the two CTEs to combine the flight occupancy rate and waitlist status information for each flight.

Real-World Applications:

  • Airline Management: Airlines can use this analysis to identify flights with low occupancy rates and adjust schedules or pricing accordingly.

  • Passenger Experience: Customers can use this information to choose flights with no waitlist or with lower occupancy rates for a more comfortable flying experience.

  • Revenue Optimization: Airlines can use this analysis to maximize revenue by selling more seats on flights with high occupancy rates and adjusting prices on flights with low occupancy rates.


User Activity for the Past 30 Days II

Problem Statement:

Find the number of active users for each day in the past 30 days. A user is considered active if they have performed any action on the website or app.

SQL Query:

WITH UserActivity AS (
    SELECT
        user_id,
        DATE(timestamp) AS activity_date
    FROM
        user_actions
    WHERE
        timestamp >= DATE('now', '-30 days')
)
SELECT
    activity_date,
    COUNT(DISTINCT user_id) AS active_users
FROM
    UserActivity
GROUP BY
    activity_date
ORDER BY
    activity_date;

Breakdown:

CTE (UserActivity):

  • This Common Table Expression (CTE) is used to create a temporary table that contains the user IDs and activity dates for the past 30 days.

  • DATE(timestamp) extracts the date from the timestamp column. DATE('now', '-30 days') is SQLite syntax for the date 30 days before today.

Main Query:

  • SELECT activity_date, COUNT(DISTINCT user_id) AS active_users: Counts the number of distinct user IDs for each activity date.

  • GROUP BY activity_date: Groups the results by activity date.

  • ORDER BY activity_date: Orders the results by activity date in ascending order.

Real-World Applications:

  • User Engagement Analysis: This query can help website and app owners track user activity over time and identify any trends or fluctuations.

  • Trend Analysis: By comparing the number of active users across different days, you can identify patterns and make data-driven decisions about marketing campaigns or product updates.

  • User Segmentation: You can use the results to segment users based on their activity patterns and tailor marketing efforts accordingly.

Example:

Consider the following table:

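| user_id | timestamp |
|---|---|
| 1 | 2023-03-01 12:00:00 |
| 2 | 2023-03-02 09:30:00 |
| 3 | 2023-03-03 14:30:00 |
| 1 | 2023-03-04 18:15:00 |

Running the SQL query on this table, within 30 days of these dates, would produce the following results. Each day has exactly one distinct active user:

| activity_date | active_users |
|---|---|
| 2023-03-01 | 1 |
| 2023-03-02 | 1 |
| 2023-03-03 | 1 |
| 2023-03-04 | 1 |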
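A minimal sketch of the daily active-user count using Python's sqlite3 module. To keep the result reproducible, it passes a reference date (the hypothetical as_of parameter) instead of relying on 'now'. The sample rows extend the example with one extra user on 2023-03-04 and one row that is too old to count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_actions (user_id INTEGER, timestamp TEXT)")
conn.executemany(
    "INSERT INTO user_actions VALUES (?, ?)",
    [
        (1, "2023-03-01 12:00:00"),
        (2, "2023-03-02 09:30:00"),
        (3, "2023-03-03 14:30:00"),
        (1, "2023-03-04 18:15:00"),
        (2, "2023-03-04 19:00:00"),
        (1, "2023-01-15 08:00:00"),  # more than 30 days before the reference date
    ],
)


def daily_active_users(conn, as_of):
    """Distinct active users per day in the 30 days up to the date as_of."""
    return conn.execute(
        "SELECT DATE(timestamp) AS activity_date, COUNT(DISTINCT user_id) "
        "FROM user_actions "
        "WHERE timestamp >= DATE(?, '-30 days') "
        "GROUP BY activity_date ORDER BY activity_date",
        (as_of,),
    ).fetchall()


result = daily_active_users(conn, "2023-03-05")
print(result)
# [('2023-03-01', 1), ('2023-03-02', 1), ('2023-03-03', 1), ('2023-03-04', 2)]
```

The January row is filtered out by the 30-day window, and 2023-03-04 counts two distinct users.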


Movie Rating

Problem Statement: Given a table of movie ratings by users, find the top 10 movies with the highest average rating.

Input Table:

| user_id | movie_id | rating |
|---|---|---|
| 1 | 1 | 5 |
| 2 | 2 | 4 |
| 3 | 3 | 3 |
| 4 | 4 | 5 |
| 5 | 2 | 4 |
| 6 | 5 | 2 |
| 7 | 3 | 4 |
| 8 | 1 | 3 |

Output:

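| movie_id | average_rating |
|---|---|
| 4 | 5 |
| 1 | 4 |
| 2 | 4 |
| 3 | 3.5 |
| 5 | 2 |

Rows are sorted by average_rating in descending order. Movies 1 and 2 are tied, so their relative order is not guaranteed.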

SQL Solution:

WITH MovieAverageRatings AS (
  SELECT movie_id, AVG(rating) AS average_rating
  FROM ratings
  GROUP BY movie_id
)
SELECT movie_id, average_rating
FROM MovieAverageRatings
ORDER BY average_rating DESC
LIMIT 10;

Breakdown:

  • The WITH clause is used to create a common table expression (CTE) called MovieAverageRatings.

  • The CTE calculates the average rating for each movie.

  • The GROUP BY clause groups the ratings by movie_id.

  • The AVG() function calculates the average rating for each movie.

  • The SELECT clause selects the movie_id and average_rating from the CTE.

  • The ORDER BY clause sorts the results in descending order of average_rating.

  • The LIMIT clause limits the results to the top 10 movies.
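The steps above can be checked against the sample table. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database; the movie_id tie-breaker in ORDER BY is an addition for deterministic output and is not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER)")
# Rows copied from the input table in this section
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 2, 4), (3, 3, 3), (4, 4, 5),
     (5, 2, 4), (6, 5, 2), (7, 3, 4), (8, 1, 3)],
)

top_movies = conn.execute("""
    WITH MovieAverageRatings AS (
      SELECT movie_id, AVG(rating) AS average_rating
      FROM ratings
      GROUP BY movie_id
    )
    SELECT movie_id, average_rating
    FROM MovieAverageRatings
    ORDER BY average_rating DESC, movie_id  -- tie-breaker added for this sketch
    LIMIT 10
""").fetchall()

print(top_movies)
```

Without a tie-breaker, movies with equal averages may come back in any order, which matters when the LIMIT cuts through a group of ties.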

Real-World Applications:

  • Creating a list of recommended movies for users based on their average rating.

  • Identifying popular movies for marketing or promotion.

  • Analyzing user preferences and trends in movie ratings.


Employees With Missing Information

Problem Statement:

Find all employees who have missing information in any of these columns: first_name, last_name, email, or phone_number.

Solution:

SELECT
  emp_id,
  first_name,
  last_name,
  email,
  phone_number
FROM
  employees
WHERE
  first_name IS NULL
  OR last_name IS NULL
  OR email IS NULL
  OR phone_number IS NULL;

Breakdown:

  • Step 1: Select the necessary columns.

    • We select the employee ID (emp_id), first name (first_name), last name (last_name), email (email), and phone number (phone_number) columns from the employees table.

  • Step 2: Use the WHERE clause to filter the results.

    • We filter the results using the WHERE clause to return only employees who have at least one of the following columns set to NULL.

      • first_name IS NULL: Employees with missing first names.

      • last_name IS NULL: Employees with missing last names.

      • email IS NULL: Employees with missing emails.

      • phone_number IS NULL: Employees with missing phone numbers.
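One detail worth seeing in action is that IS NULL does not match empty strings. The sketch below uses Python's sqlite3 module with hypothetical sample rows (the names and contact details are invented for illustration) to show which employees the filter returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER, first_name TEXT, last_name TEXT, "
    "email TEXT, phone_number TEXT)"
)
# Hypothetical rows, invented for this sketch
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "ann@example.com", "555-0100"),  # complete
        (2, "Bob", None, "bob@example.com", "555-0101"),   # NULL last_name
        (3, "Cy", "Diaz", "", "555-0102"),                 # empty string, not NULL
        (4, None, "Evans", "eve@example.com", None),       # two NULL columns
    ],
)

missing_ids = [row[0] for row in conn.execute("""
    SELECT emp_id
    FROM employees
    WHERE first_name IS NULL
       OR last_name IS NULL
       OR email IS NULL
       OR phone_number IS NULL
    ORDER BY emp_id
""")]

print(missing_ids)
```

Employee 3 is not returned because an empty string is a value, not NULL. If blank strings should also count as missing, the filter would need an extra condition such as email = ''.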

Real-World Applications:

  • Data Integrity Verification: Ensuring that all employee information is complete and accurate for compliance purposes or internal processes.

  • Employee Management: Identifying employees with incomplete profiles to avoid communication issues or data errors.

  • HR Reporting: Generating reports on employees with missing information to improve data quality and overall HR management.


Actors and Directors Who Cooperated At Least Three Times

SQL Implementation:

SELECT
  A.actor_name,
  D.director_name,
  COUNT(DISTINCT M.movie_name) AS num_collaborations
FROM Actor AS A
JOIN Movie AS M
  ON A.actor_id = M.actor_id
JOIN Director AS D
  ON M.director_id = D.director_id
GROUP BY
  A.actor_name,
  D.director_name
HAVING
  COUNT(DISTINCT M.movie_name) >= 3;

Breakdown and Explanation:

  1. Join Tables: We join the Actor (A), Movie (M), and Director (D) tables based on the relationships between them:

    • A.actor_id = M.actor_id

    • M.director_id = D.director_id

  2. Group By: Group the results by the actor's name (A.actor_name) and director's name (D.director_name). This creates groups for each unique actor-director pair.

  3. Aggregate Function: Count the distinct movie names (DISTINCT M.movie_name) for each group. This gives us the number of collaborations between each actor and director.

  4. Having Clause: Filter the results to include only those groups where the number of collaborations is greater than or equal to 3.

Real-World Application:

This query can be used to identify actors and directors who have worked together multiple times in the film industry. It provides insights into long-standing collaborations and can be useful for:

  • Identifying actors and directors who have a strong professional chemistry.

  • Analyzing the success rate of collaborations between specific individuals.

  • Predicting future collaborations based on past history.

  • Creating lists of notable actor-director pairings for awards or promotional purposes.


The Number of Passengers in Each Bus I

Problem:

You have a table called Buses that contains information about buses and their passengers. Each row in the table represents a bus, with the following columns:

  • bus_id: The unique ID of the bus.

  • passengers: The number of passengers on the bus.

You want to write a query that finds the number of passengers in each bus.

Solution:

The following query will find the number of passengers in each bus:

SELECT bus_id, passengers
FROM Buses;

Explanation:

The SELECT clause selects the bus_id and passengers columns. The FROM clause specifies that the rows should come from the Buses table. Because each row already represents one bus, no grouping or aggregation is needed.

Real-world Application:

This query can be used to track the number of passengers on each bus in a public transportation system. This information can be used to improve scheduling, determine which buses need to be replaced, and estimate the revenue generated by each bus.

Potential Applications in the Real World:

  • Public transportation: Tracking the number of passengers on each bus can help transportation planners improve scheduling and determine which buses need to be replaced.

  • School transportation: Tracking the number of passengers on each school bus can help schools ensure that there are enough buses to meet the needs of their students.

  • Private transportation: Tracking the number of passengers on each private bus can help businesses optimize their transportation operations and estimate revenue.


Classes More Than 5 Students

Problem:

Find all classes that have more than five students.

SQL Query:

SELECT class_id, COUNT(*) AS student_count
FROM students
GROUP BY class_id
HAVING COUNT(*) > 5;

Breakdown:

  • FROM students: Selects all rows from the 'students' table.

  • GROUP BY class_id: Groups the rows by the 'class_id' column, aggregating the rows for each class.

  • HAVING COUNT(*) > 5: Filters the results to include only classes with more than five students. The aggregate is repeated instead of referencing the student_count alias because some databases (for example PostgreSQL and SQL Server) do not accept a SELECT alias in HAVING.

Explanation:

  1. The GROUP BY clause creates a set of groups, each containing the rows that share the same value in the specified column. In this case, we are grouping by class_id.

  2. The COUNT(*) function counts the number of rows in each group. This gives us the total number of students in each class.

  3. The HAVING clause filters the groups based on a condition. In this case, we are only interested in groups with more than five students.
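The grouping and filtering steps can be tried on a small data set. This is a minimal sketch using Python's sqlite3 module; the students table layout (student_id, class_id) and the two classes are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER, class_id TEXT)")

# Hypothetical enrollment: 6 students in Math, 5 in Art
enrollments = [(i, "Math") for i in range(1, 7)] + [(i, "Art") for i in range(7, 12)]
conn.executemany("INSERT INTO students VALUES (?, ?)", enrollments)

large_classes = conn.execute("""
    SELECT class_id, COUNT(*) AS student_count
    FROM students
    GROUP BY class_id
    HAVING COUNT(*) > 5
""").fetchall()

print(large_classes)
```

Art has exactly five students, so it is excluded: the condition is strictly "more than five".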

Applications:

  • School administrators can use this query to identify classes that have exceeded their capacity.

  • Teachers can use this query to identify classes that need additional support or resources.

  • Parents can use this query to find out how many students are in their child's class.


Product's Worth Over Invoices

Problem:

Given two tables:

  • Products with columns (id, name, price)

  • Invoices with columns (id, product_id, quantity)

Find the total worth of each product over all invoices.

Solution:

SELECT
  p.id,
  p.name,
  SUM(p.price * i.quantity) AS total_worth
FROM Products p
JOIN Invoices i ON p.id = i.product_id
GROUP BY
  p.id, p.name;

Breakdown:

  1. Join the Products and Invoices tables: We use an INNER JOIN to match products with the corresponding invoices. The join condition is p.id = i.product_id, which ensures that only products with matching invoices are included.

  2. Calculate the product worth: For each matched product and invoice, we calculate the worth as p.price * i.quantity. This gives us the value of the product sold in that invoice.

  3. Group and aggregate: We group the results by the p.id and p.name columns to get the total worth for each product. The SUM() function is used to accumulate the worth values for each product.
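The three steps above can be sketched end to end with Python's sqlite3 module. The product names, prices, and quantities below are hypothetical sample values, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (id INTEGER, name TEXT, price REAL);
    CREATE TABLE Invoices (id INTEGER, product_id INTEGER, quantity INTEGER);
    -- Hypothetical sample rows
    INSERT INTO Products VALUES (1, 'Pen', 2.0), (2, 'Book', 10.0), (3, 'Lamp', 25.0);
    INSERT INTO Invoices VALUES (1, 1, 10), (2, 1, 5), (3, 2, 3);
""")

worth = conn.execute("""
    SELECT p.id, p.name, SUM(p.price * i.quantity) AS total_worth
    FROM Products p
    JOIN Invoices i ON p.id = i.product_id
    GROUP BY p.id, p.name
    ORDER BY p.id
""").fetchall()

print(worth)
```

The Lamp has no invoices, so the inner join drops it. A LEFT JOIN combined with COALESCE(SUM(...), 0) would keep such products with a worth of 0.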

Real-World Application:

This query can be used in a business analytics system to track the total sales value of products over a period of time. It allows businesses to analyze product performance, identify trends, and make informed decisions about inventory management and pricing strategies.


Create a Session Bar Chart

Problem Statement:

Given a table sessions with columns user_id, timestamp, find the total number of active users in each hour.

SQL Solution:

SELECT
  STRFTIME('%Y-%m-%d %H:00:00', timestamp) AS hour,
  COUNT(DISTINCT user_id) AS active_users
FROM sessions
GROUP BY hour
ORDER BY hour;

Breakdown:

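  • STRFTIME('%Y-%m-%d %H:00:00', timestamp): Truncates each timestamp to the start of its hour, formatted as 'YYYY-MM-DD HH:00:00'. STRFTIME is SQLite's date-formatting function; other databases offer equivalents such as DATE_TRUNC (PostgreSQL) or DATE_FORMAT (MySQL).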

  • COUNT(DISTINCT user_id): Counts the number of distinct user_id values for each hour.

  • GROUP BY hour: Groups the results by hour.

  • ORDER BY hour: Orders the results by hour.

Explanation:

  1. The STRFTIME function keeps the date and hour of each timestamp and replaces the minutes and seconds with zeros. All timestamps within the same hour therefore produce the same string and are grouped together.

  2. The COUNT(DISTINCT user_id) function counts the number of unique user_id values for each hour. This gives us the total number of active users during that hour.

  3. The GROUP BY clause groups the results by hour, so we get one row for each hour.

  4. The ORDER BY clause orders the results by hour, making it easier to read and analyze the data.
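To see the hourly bucketing in action, here is a minimal sketch with Python's sqlite3 module. The four session rows are hypothetical and chosen so that one user appears twice in the same hour and another user straddles an hour boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (user_id INTEGER, timestamp TEXT)")
# Hypothetical rows: user 1 twice in the 09:00 hour, user 2 on both sides of 10:00
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [
        (1, "2023-03-01 09:05:00"),
        (1, "2023-03-01 09:40:00"),
        (2, "2023-03-01 09:59:59"),
        (2, "2023-03-01 10:00:00"),
    ],
)

hourly = conn.execute("""
    SELECT
      STRFTIME('%Y-%m-%d %H:00:00', timestamp) AS hour,
      COUNT(DISTINCT user_id) AS active_users
    FROM sessions
    GROUP BY hour
    ORDER BY hour
""").fetchall()

print(hourly)
```

User 1 is counted once in the 09:00 bucket despite two sessions, because COUNT(DISTINCT user_id) counts users rather than rows.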

Real-World Application:

This query can be used to analyze user activity patterns, such as:

  • Identifying peak usage hours

  • Monitoring user engagement over time

  • Optimizing server capacity based on usage trends


Customer Who Visited but Did Not Make Any Transactions

Problem Statement:

Find all customers who visited the store but did not make any transactions.

SQL Solution:

SELECT DISTINCT c.customer_id, c.first_name, c.last_name
FROM Customers c
LEFT JOIN Transactions t ON c.customer_id = t.customer_id
WHERE t.transaction_id IS NULL;

Explanation:

This query uses a LEFT JOIN to combine the Customers table with the Transactions table. For each customer, the query checks if there is a matching transaction. If there is no matching transaction, the query returns the customer's information in the result set.

Breakdown:

  • SELECT DISTINCT c.customer_id, c.first_name, c.last_name: This part of the query selects the columns we are interested in: the customer ID, first name, and last name. The DISTINCT keyword is used to ensure that only unique customer records are returned.

  • FROM Customers c: This part of the query specifies the table we are selecting from, which is the Customers table. We assign the alias c to this table for brevity.

  • LEFT JOIN Transactions t ON c.customer_id = t.customer_id: This part of the query performs a LEFT JOIN between the Customers table and the Transactions table. The LEFT JOIN operation matches rows from the Customers table with rows from the Transactions table based on the customer_id column. If there is no matching row in the Transactions table, the t table will have a NULL value for the transaction_id column.

  • WHERE t.transaction_id IS NULL: This part of the query filters the results to include only customers who have a NULL value for the transaction_id column. This means that these customers visited the store but did not make any transactions.
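The LEFT JOIN plus IS NULL pattern described above (often called an anti-join) can be sketched as follows with Python's sqlite3 module. The customer names and transaction rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE Transactions (transaction_id INTEGER, customer_id INTEGER);
    -- Hypothetical sample rows: customer 2 has no transactions
    INSERT INTO Customers VALUES (1, 'Al', 'Ray'), (2, 'Bo', 'Chen'), (3, 'Cy', 'Park');
    INSERT INTO Transactions VALUES (10, 1), (11, 1), (12, 3);
""")

no_transactions = conn.execute("""
    SELECT DISTINCT c.customer_id, c.first_name, c.last_name
    FROM Customers c
    LEFT JOIN Transactions t ON c.customer_id = t.customer_id
    WHERE t.transaction_id IS NULL
""").fetchall()

print(no_transactions)
```

Customer 1 has two transactions and so appears twice in the joined rows, but both rows have a non-NULL transaction_id and are filtered out.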

Real-World Applications:

This query can be used in a variety of real-world applications, such as:

  • Identifying potential customers who may be interested in making a purchase.

  • Analyzing customer behavior to improve marketing strategies.

  • Tracking customer engagement with a business.


Calculate Salaries

Problem Statement:

Given a table employees with the following columns:

  • employee_id (primary key)

  • name

  • salary

Write a SQL query to calculate the salaries of all employees.

Best & Performant Solution:

SELECT
    employee_id,
    name,
    salary
FROM
    employees;

Breakdown and Explanation:

  1. SELECT Clause: This clause specifies the columns to be included in the result set. In this case, we want to include all columns: employee_id, name, and salary.

  2. FROM Clause: This clause specifies the table from which we want to retrieve the data. In this case, we want to retrieve data from the employees table.

Real World Applications:

This query can be used in various real-world applications, such as:

  • Payroll Processing: To determine the salary of each employee for payroll purposes.

  • Compensation Analysis: To analyze the compensation structure of a company and identify any disparities or trends.

  • Human Resource Reporting: To generate reports on employee salaries and benefits for internal or external use.


Product Sales Analysis I

Problem Statement

Given a table Sales with columns product_id, sales_date, and sales_amount, find the total sales for each product category and the percentage contribution of each category to the total sales.

SQL Solution

-- First, create a temporary table with the total sales for each product
CREATE TEMP TABLE CategorySales AS
SELECT
  product_id,
  SUM(sales_amount) AS total_sales
FROM Sales
GROUP BY product_id;

-- Then, divide each product's total by the grand total to get its percentage contribution
SELECT
  product_id,
  total_sales,
  total_sales * 100.0 / (SELECT SUM(total_sales) FROM CategorySales) AS percentage_contribution
FROM CategorySales
ORDER BY product_id;

Explanation

This solution uses a temporary table to calculate the total sales for each product. The Sales table has no separate category column, so product_id serves as the category here. The second query divides each product's total by the grand total (computed by the scalar subquery) and multiplies by 100 to get the percentage contribution. Multiplying by 100.0 before dividing avoids integer division when sales_amount is stored as an integer.
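A percentage contribution is each group's total divided by the grand total. The following is a minimal sketch of that calculation using Python's sqlite3 module and a scalar subquery for the grand total; the sales rows are hypothetical, and product_id stands in for the category because the table has no category column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (product_id INTEGER, sales_date TEXT, sales_amount REAL)")
# Hypothetical rows; the grand total is 1000
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        (1, "2023-01-01", 100.0),
        (1, "2023-01-02", 200.0),
        (2, "2023-01-01", 300.0),
        (3, "2023-01-03", 400.0),
    ],
)

shares = conn.execute("""
    SELECT
      product_id,
      SUM(sales_amount) AS total_sales,
      100.0 * SUM(sales_amount) / (SELECT SUM(sales_amount) FROM Sales)
        AS percentage_contribution
    FROM Sales
    GROUP BY product_id
    ORDER BY product_id
""").fetchall()

print(shares)
```

The percentages add up to 100, which is a quick sanity check for this kind of query.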

Applications

This solution can be used to analyze sales data by product category. This information can be used to identify trends, make informed decisions about product marketing, and improve overall sales performance.


Calculate Trapping Rain Water

Problem Description:

You have a row of containers, each with a recorded water level that acts as its height. When it rains, water collects in the dips between taller containers. Calculate how much rain water the row can trap.

Example:

Input:

| Container | Water Level |
|---|---|
| 1 | 3 |
| 2 | 0 |
| 3 | 2 |
| 4 | 1 |

Output:

2 units of water

Explanation:

Water collects above a container only when there is a taller container somewhere on both sides of it. The water rises to the lower of the tallest level to its left and the tallest level to its right:

  • Container 2 (level 0): the tallest level to its left is 3 and the tallest to its right is 2, so it holds min(3, 2) - 0 = 2 units.

  • Containers 1, 3, and 4: none of them has a taller container on both sides, so they hold no water.

Total water collected: 2 units

Best Solution:

WITH Bounds AS (
  SELECT container_id, water_level,
    -- Tallest level from the start of the row up to and including this container
    MAX(water_level) OVER (ORDER BY container_id
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS left_max,
    -- Tallest level from this container to the end of the row
    MAX(water_level) OVER (ORDER BY container_id
      ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS right_max
  FROM Container
)
SELECT SUM(
  CASE WHEN left_max < right_max THEN left_max ELSE right_max END
  - water_level
) AS trapped_water
FROM Bounds;

Explanation:

  1. Bounds:

    • For each container, two window functions compute left_max (the tallest level at or to the left of it) and right_max (the tallest level at or to the right of it).

  2. Final Query:

    • The water above a container rises to the lower of left_max and right_max; the CASE expression picks that lower bound.

    • Subtracting the container's own water_level gives the amount it holds. Because both maxima include the container itself, this difference is never negative.

    • Sum up the water held by all the containers.
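A common way to compute trapped water is to take, for each position, the lower of the running maximum from the left and the running maximum from the right, then subtract the position's own level. The sketch below applies that technique to the example rows using Python's sqlite3 module. It assumes an SQLite build with window-function support (version 3.25 or later) and a Container table with container_id and water_level columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Container (container_id INTEGER, water_level INTEGER)")
# Rows from the example: levels 3, 0, 2, 1
conn.executemany("INSERT INTO Container VALUES (?, ?)", [(1, 3), (2, 0), (3, 2), (4, 1)])

trapped = conn.execute("""
    WITH Bounds AS (
      SELECT container_id, water_level,
        MAX(water_level) OVER (ORDER BY container_id
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS left_max,
        MAX(water_level) OVER (ORDER BY container_id
          ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS right_max
      FROM Container
    )
    SELECT SUM(
      CASE WHEN left_max < right_max THEN left_max ELSE right_max END
      - water_level
    )
    FROM Bounds
""").fetchone()[0]

print(trapped)
```

Only container 2 contributes here: its bounds are 3 on the left and 2 on the right, so it holds 2 units, matching the expected output.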

Real-World Applications:

  • Rainfall analysis: Predicting runoff and flooding risks.

  • Civil engineering: Designing water retention systems, such as dams and reservoirs.

  • Agriculture: Optimizing irrigation techniques and minimizing water loss.


Ad-Free Sessions

Problem Statement:

Given two tables, Sessions and Purchases, determine the total number of ad-free sessions for each product.

Database Schema:

Sessions table:

| Column | Type |
|---|---|
| session_id | integer |
| product_id | integer |
| ad_free | boolean |

Purchases table:

| Column | Type |
|---|---|
| purchase_id | integer |
| product_id | integer |
| user_id | integer |

Solution:

SELECT
  s.product_id,
  SUM(CASE WHEN s.ad_free = 1 THEN 1 ELSE 0 END) AS ad_free_sessions
FROM Sessions AS s
WHERE EXISTS (
  SELECT 1
  FROM Purchases AS p
  WHERE p.product_id = s.product_id
)
GROUP BY
  s.product_id;

Breakdown:

  1. Restrict to purchased products: The EXISTS subquery keeps only sessions whose product_id appears in the Purchases table, so you only count sessions for products that have been purchased. A plain JOIN on product_id would repeat each session once per matching purchase and inflate the counts, which is why EXISTS is used instead.

  2. Count ad-free sessions: The CASE expression checks whether the ad_free column in the Sessions table is set to 1. If it is, the expression evaluates to 1, otherwise it evaluates to 0. The SUM() function is then used to count the number of ad-free sessions for each product.

  3. Group results: The GROUP BY s.product_id clause groups the results by product ID. This allows you to count the ad-free sessions for each product separately.
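A join on product_id pairs every session with every purchase of the same product, which can inflate a per-product count when a product has more than one purchase. The sketch below, using Python's sqlite3 module and hypothetical sample rows, compares a plain JOIN with an EXISTS filter to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sessions (session_id INTEGER, product_id INTEGER, ad_free INTEGER);
    CREATE TABLE Purchases (purchase_id INTEGER, product_id INTEGER, user_id INTEGER);
    -- Hypothetical rows: product 1 has two ad-free sessions and two purchases
    INSERT INTO Sessions VALUES (1, 1, 1), (2, 1, 1), (3, 1, 0), (4, 2, 1);
    INSERT INTO Purchases VALUES (1, 1, 10), (2, 1, 11), (3, 2, 12);
""")

count_expr = "SUM(CASE WHEN s.ad_free = 1 THEN 1 ELSE 0 END)"

# Plain join: each session row is repeated once per matching purchase
joined = conn.execute(f"""
    SELECT s.product_id, {count_expr}
    FROM Sessions AS s
    JOIN Purchases AS p ON s.product_id = p.product_id
    GROUP BY s.product_id
    ORDER BY s.product_id
""").fetchall()

# EXISTS filter: each session row is counted at most once
filtered = conn.execute(f"""
    SELECT s.product_id, {count_expr}
    FROM Sessions AS s
    WHERE EXISTS (SELECT 1 FROM Purchases AS p WHERE p.product_id = s.product_id)
    GROUP BY s.product_id
    ORDER BY s.product_id
""").fetchall()

print(joined, filtered)
```

Product 1 really has two ad-free sessions; the plain join reports four because each session is matched with both purchases.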

Example:

| product_id | ad_free_sessions |
|---|---|
| 1 | 3 |
| 2 | 5 |
| 3 | 0 |

Real-World Applications:

This query can be used by businesses to analyze the effectiveness of their ad-free offerings. For example:

  • Identifying products with the highest ad-free session rates can help businesses decide which products to invest more in.

  • Tracking changes in ad-free session rates over time can help businesses understand the impact of new ad campaigns or changes to their pricing models.

  • Comparing ad-free session rates across different platforms or channels can help businesses determine which marketing efforts are most effective.


Reformat Department Table

Problem Statement:

You are given a table called Department with columns Id (unique identifier), Name, and Parent_Id. The table represents a hierarchical structure where each department has a parent department, except for the root department which has a Parent_Id of NULL.

Reformat the table to have a new column called Path that contains the path from the root department to the current department. The path should be separated by a forward slash (/).

Example:

Original Table:

| Id | Name | Parent_Id |
|---|---|---|
| 1 | HR | NULL |
| 2 | Sales | 1 |
| 3 | IT | 1 |
| 4 | Dev | 2 |
| 5 | QA | 4 |

Reformatted Table:

| Id | Name | Parent_Id | Path |
|---|---|---|---|
| 1 | HR | NULL | /HR |
| 2 | Sales | 1 | /HR/Sales |
| 3 | IT | 1 | /HR/IT |
| 4 | Dev | 2 | /HR/Sales/Dev |
| 5 | QA | 4 | /HR/Sales/Dev/QA |

SQL Solution:

ALTER TABLE Department
ADD COLUMN Path VARCHAR(255);

WITH RECURSIVE DeptPath (Id, Path) AS (
    -- Anchor: the root department starts the path
    SELECT Id, '/' || Name
    FROM Department
    WHERE Parent_Id IS NULL
    UNION ALL
    -- Recursive step: append each child's name to its parent's path
    SELECT d.Id, p.Path || '/' || d.Name
    FROM Department d
    JOIN DeptPath p ON d.Parent_Id = p.Id
)
UPDATE Department
SET Path = (SELECT dp.Path FROM DeptPath dp WHERE dp.Id = Department.Id);

Breakdown:

  1. Add the 'Path' Column: Use the ALTER TABLE statement to add a new column named Path of type VARCHAR(255) to the Department table.

  2. Build Each Path with a Recursive CTE: The anchor member selects the root department (Parent_Id IS NULL) and sets its path to '/' followed by its name. The recursive member joins each department to its parent's already-computed path and appends '/' and the department's own name, so paths are built one level at a time.

  3. Write the Paths Back: The UPDATE statement copies each computed path into the Path column of the matching row. A single non-recursive UPDATE is not enough, because a department's path depends on its parent's path having been computed first. The || concatenation operator and the WITH ... UPDATE form are supported by SQLite and PostgreSQL; other databases may need CONCAT() or a different statement layout.
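Hierarchical paths like these are commonly built with a recursive common table expression, which walks the tree from the root downwards. The following minimal sketch uses Python's sqlite3 module and the example rows from this section; it only selects the computed paths and does not modify the table. The CTE name DeptPath is chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (Id INTEGER, Name TEXT, Parent_Id INTEGER)")
# Rows from the example table; None becomes NULL for the root
conn.executemany(
    "INSERT INTO Department VALUES (?, ?, ?)",
    [(1, "HR", None), (2, "Sales", 1), (3, "IT", 1), (4, "Dev", 2), (5, "QA", 4)],
)

paths = conn.execute("""
    WITH RECURSIVE DeptPath (Id, Path) AS (
      -- Anchor member: the root department
      SELECT Id, '/' || Name
      FROM Department
      WHERE Parent_Id IS NULL
      UNION ALL
      -- Recursive member: extend the parent's path with the child's name
      SELECT d.Id, p.Path || '/' || d.Name
      FROM Department AS d
      JOIN DeptPath AS p ON d.Parent_Id = p.Id
    )
    SELECT Id, Path
    FROM DeptPath
    ORDER BY Id
""").fetchall()

for dept_id, path in paths:
    print(dept_id, path)
```

The recursion stops on its own once no department has a parent in the previous level, so the depth of the hierarchy does not need to be known in advance.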

Real-World Applications:

The Path column can be useful in various applications:

  • Hierarchical Navigation: Easily navigate through the department hierarchy by traversing the paths.

  • Permission Management: Control user access based on their department's position within the hierarchy.

  • Reporting and Analysis: Group data and perform analysis based on department paths.

  • User Interface Design: Display department structures in a tree-like view for user interaction.


Total Sales Amount by Year

Problem Statement:

Given a sales table with the following columns:

  • order_id

  • product_id

  • order_date

  • quantity

  • unit_price

Calculate the total sales amount for each year.

SQL Solution:

-- Calculate the total sales amount for each year
SELECT
    strftime('%Y', order_date) AS year,  -- Extract the year from the order date
    SUM(quantity * unit_price) AS total_sales_amount  -- Calculate the total sales amount for each year
FROM
    sales  -- Your sales table
GROUP BY
    year  -- Group the results by year
ORDER BY
    year;  -- Order the results by year

Breakdown and Explanation:

  • strftime('%Y', order_date) AS year: This line extracts the year from the order_date column using the strftime function.

  • SUM(quantity * unit_price): This line calculates the total sales amount for each year by multiplying the quantity and unit_price columns and then summing the results.

  • GROUP BY year: This line groups the results by year, so that the total sales amount is calculated for each unique year.

  • ORDER BY year: This line orders the results by year in ascending order.

Real-World Applications:

This query can be used to analyze sales trends over time. For example, a business could use this query to:

  • Identify years with the highest and lowest sales

  • Track sales growth or decline over the years

  • Compare sales performance to previous years

  • Forecast future sales trends


Maximum Transaction Each Day

Problem Statement:

You are given a table Transactions that contains the following columns:

  • id (primary key)

  • customer_id

  • amount

  • date

Find the maximum transaction amount for each day.

Explanation:

The goal of this problem is to find the highest transaction amount that occurred on each day. We can achieve this by grouping the transactions by day and then finding the maximum amount within each group.

SQL Solution:

SELECT
  date,
  MAX(amount) AS max_amount
FROM Transactions
GROUP BY
  date;

Breakdown:

  • The SELECT statement retrieves the date column and the maximum amount for each date.

  • The FROM clause specifies the Transactions table as the source of data.

  • The GROUP BY clause groups the transactions by date, which means that all transactions that occurred on the same day will be grouped together.

  • The MAX() function is used to find the maximum amount within each group.

Real-World Applications:

This query can be used in various real-world applications, such as:

  • Identifying the days on which the largest single transactions occur.

  • Analyzing spending patterns and identifying days with unusually high transactions.

  • Detecting potential fraudulent transactions by comparing daily maximum amounts to established baselines.

Example:

Consider the following Transactions table:

| id | customer_id | amount | date |
|---|---|---|---|
| 1 | 100 | 100 | 2023-07-01 |
| 2 | 200 | 200 | 2023-07-01 |
| 3 | 300 | 300 | 2023-07-02 |
| 4 | 400 | 400 | 2023-07-02 |
| 5 | 500 | 500 | 2023-07-03 |

Running the SQL query on this table will produce the following result:

| date | max_amount |
|---|---|
| 2023-07-01 | 200 |
| 2023-07-02 | 400 |
| 2023-07-03 | 500 |

This result shows that the maximum transaction amounts for each day are:

  • July 1, 2023: $200

  • July 2, 2023: $400

  • July 3, 2023: $500


Maximize Items

Problem:

Given a table Items with columns:

  • id: Integer

  • name: String

  • size: Integer

Find the items that maximize the total size of all selected items while ensuring that the total size of selected items does not exceed a given limit.

Constraints:

  • 1 <= id <= 1000

  • name is a string

  • 1 <= size <= 10000

  • 1 <= limit <= 1000000

Solution:

WITH LargestItems AS (
  SELECT id, name, size,
    -- Cumulative size of this item and all larger items before it
    SUM(size) OVER (ORDER BY size DESC, id) AS running_size
  FROM Items
)
SELECT id, name, size
FROM LargestItems
WHERE running_size <= @limit;

Explanation:

  1. Create a Common Table Expression (CTE) LargestItems:

    • This CTE lists all items from largest to smallest and computes running_size, the cumulative size of each item together with all larger items before it. Ties in size are broken by id.

  2. Filter by Running Total:

    • The outer query keeps the items whose running total does not exceed the given limit. @limit is a parameter placeholder; the exact syntax varies by database.

  3. Limitation:

    • This is a greedy, largest-first selection. It is simple and fast, but it does not always find the combination with the largest possible total; an exact answer requires solving a subset-sum (knapsack) problem.
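A largest-first selection under a size limit can be expressed with a running total. The following is a minimal sketch, assuming Python's sqlite3 module, an SQLite build with window functions (3.25 or later), the five items from this section's simplified example, and a bound ? parameter in place of @limit. It is a greedy heuristic, not an exact solver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (id INTEGER, name TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [(1, "Item A", 10), (2, "Item B", 20), (3, "Item C", 30),
     (4, "Item D", 40), (5, "Item E", 50)],
)

limit = 60
picked = conn.execute("""
    WITH LargestItems AS (
      SELECT id, name, size,
        -- Running total in largest-first order; id breaks ties
        SUM(size) OVER (ORDER BY size DESC, id) AS running_size
      FROM Items
    )
    SELECT id, name, size
    FROM LargestItems
    WHERE running_size <= ?
    ORDER BY size DESC, id
""", (limit,)).fetchall()

print(picked)
```

With a limit of 60 the greedy pass stops after Item E (50), because adding Item D would reach 90. Item B plus Item D would fill the limit exactly, which illustrates why a greedy pass is not guaranteed to be optimal.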

Real-World Applications:

  • Knapsack Problem: Maximizing the value of items that can be fit into a limited-capacity backpack.

  • Resource Allocation: Distributing resources (e.g., storage space, processing power) optimally to maximize utilization.

  • Inventory Management: Determining the most valuable items to stock within a limited warehouse capacity.

Simplified Example:

Items Table:

| id | name | size |
|---|---|---|
| 1 | Item A | 10 |
| 2 | Item B | 20 |
| 3 | Item C | 30 |
| 4 | Item D | 40 |
| 5 | Item E | 50 |

Given Limit: 60

Output:

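| id | name | size |
|---|---|---|
| 5 | Item E | 50 |

Taking items largest first, Item E (50) fits within the limit, but adding Item D (40) would bring the total to 90, so only Item E is selected.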


Customers Who Never Order

Problem Statement:

Given a table customers with a customer_id column, and a table orders that contains the following columns:

  • order_id (int): Unique ID of the order

  • customer_id (int): ID of the customer who placed the order

  • order_date (date): Date when the order was placed

Find the customers who have never placed an order.

Best & Performant Solution:

SELECT customer_id
FROM customers
EXCEPT
SELECT customer_id
FROM orders;

Breakdown and Explanation:

  1. SELECT customer_id FROM customers: This subquery retrieves all the unique customer IDs from the customers table.

  2. EXCEPT: The EXCEPT operator is used to exclude any rows that are present in the second subquery.

  3. SELECT customer_id FROM orders: This subquery retrieves all the unique customer IDs from the orders table.

  4. Putting it together: The EXCEPT operator ensures that only the customer IDs that are not present in the orders table (i.e., customers who haven't placed any orders) are returned.

Real-World Application:

This query can be useful for identifying inactive customers in an e-commerce system. Businesses can use this information to target these customers with special promotions or incentives to encourage them to make purchases.


Hopper Company Queries I

LeetCode Problem:

Find all employees who have a manager and the manager's manager is the CEO.

SQL Query:

SELECT E.employee_id, E.name
FROM Employee E
JOIN Employee M ON E.manager_id = M.employee_id
JOIN Employee C ON M.manager_id = C.employee_id
WHERE C.name = 'CEO';

Breakdown and Explanation:

  1. Join Tables:

    • We first join the Employee table with itself using an inner join (JOIN). The ON condition E.manager_id = M.employee_id pairs each employee row only with the row of that employee's manager, so this is not a Cartesian product.

    • We alias the second table as M to represent the manager of each employee.

    • We then join M with the Employee table again to get the manager of each manager. We alias this third table as C.

  2. Filter Results:

    • We use the WHERE clause to filter the results based on the condition that the manager of the manager (C.name) is equal to 'CEO'.
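The double self-join can be tried on a small hierarchy. This minimal sketch uses Python's sqlite3 module; the employee names are hypothetical, and, as in the query above, it assumes the CEO's row has the name 'CEO'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (employee_id INTEGER, name TEXT, manager_id INTEGER)")
# Hypothetical hierarchy: CEO <- Ana, Ben; Ana <- Cal; Cal <- Dee
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, "CEO", None), (2, "Ana", 1), (3, "Ben", 1), (4, "Cal", 2), (5, "Dee", 4)],
)

two_levels_down = conn.execute("""
    SELECT E.employee_id, E.name
    FROM Employee E
    JOIN Employee M ON E.manager_id = M.employee_id   -- M is E's manager
    JOIN Employee C ON M.manager_id = C.employee_id   -- C is M's manager
    WHERE C.name = 'CEO'
    ORDER BY E.employee_id
""").fetchall()

print(two_levels_down)
```

Only Cal is returned. Ana and Ben report to the CEO directly, and Dee is three levels down, so none of them has the CEO as their manager's manager.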

Real-World Application:

This query can be useful when you need to identify employees who sit exactly two levels below a particular person in a reporting hierarchy: here, employees whose manager reports directly to the CEO. It does not return the CEO's direct reports or employees further down the hierarchy, and it assumes the CEO's row has the name 'CEO'.


Nth Highest Salary

Problem Statement:

Given a table employees with the following columns:

  • emp_id

  • name

  • salary

Find the Nth highest salary among all employees.

Example Table:

| emp_id | name | salary |
|---|---|---|
| 1 | John Doe | 10000 |
| 2 | Jane Smith | 12000 |
| 3 | Michael Johnson | 15000 |
| 4 | David Wilson | 8000 |
| 5 | Sarah Jones | 9000 |

Nth Highest Salary Function:

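WITH RankedSalaries AS (
    -- The alias is salary_rank rather than rank because RANK is a reserved word in some databases
    SELECT emp_id, name, salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM employees
)
SELECT name, salary
FROM RankedSalaries
WHERE salary_rank = N;  -- N is a placeholder for the desired position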

Breakdown:

  1. Common Table Expression (CTE) - The WITH clause defines a named result set called RankedSalaries that ranks the employees in descending order of salary using the DENSE_RANK() function. Employees with equal salaries share a rank, and ranks are consecutive with no gaps.

  2. Outer Query - The final SELECT statement selects the name and salary of every employee whose rank equals N. N is a placeholder for the desired position; substitute a literal or a bound parameter.

How it Works:

  1. The RankedSalaries CTE creates a new table that contains the original employee information along with their calculated ranks.

  2. The outer SELECT statement then filters the RankedSalaries table to find the employee with the Nth rank.
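The ranking approach can be wrapped in a small helper that takes N as a parameter. This is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later for window functions) and the example table above; the function name nth_highest and the alias salary_rank are choices made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "John Doe", 10000), (2, "Jane Smith", 12000), (3, "Michael Johnson", 15000),
     (4, "David Wilson", 8000), (5, "Sarah Jones", 9000)],
)


def nth_highest(connection, n):
    """Return (name, salary) rows for every employee holding the n-th highest salary."""
    return connection.execute("""
        WITH RankedSalaries AS (
          SELECT name, salary,
                 DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
          FROM employees
        )
        SELECT name, salary
        FROM RankedSalaries
        WHERE salary_rank = ?
        ORDER BY name
    """, (n,)).fetchall()


first = nth_highest(conn, 1)
third = nth_highest(conn, 3)
tenth = nth_highest(conn, 10)
print(first, third, tenth)
```

Asking for a rank that does not exist (here the 10th) returns an empty list rather than raising an error, so callers should decide how to treat that case.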

Example Usage:

To find the 3rd highest salary in the example table:

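WITH RankedSalaries AS (
    SELECT emp_id, name, salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM employees
)
SELECT name, salary
FROM RankedSalaries
WHERE salary_rank = 3;

Output:

| name | salary |
|---|---|
| John Doe | 10000 |

In descending order the salaries are 15000, 12000, 10000, 9000, and 8000, so the third highest is 10000.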

Potential Applications:

This function can be used in various real-world scenarios, such as:

  • Finding the highest-paid employees in a company

  • Calculating salary ranges for different job roles

  • Analyzing salary distribution patterns

  • Setting HR policies related to compensation and benefits


Customers Who Bought All Products

Problem:

Find the customers who have purchased all the products in a database.

Table Structure:

CREATE TABLE customers (
  customer_id INT NOT NULL,
  customer_name VARCHAR(255) NOT NULL,
  PRIMARY KEY (customer_id)
);

CREATE TABLE products (
  product_id INT NOT NULL,
  product_name VARCHAR(255) NOT NULL,
  PRIMARY KEY (product_id)
);

CREATE TABLE orders (
  order_id INT NOT NULL,
  customer_id INT NOT NULL,
  product_id INT NOT NULL,
  quantity INT NOT NULL,
  order_date DATE NOT NULL,
  PRIMARY KEY (order_id),
  FOREIGN KEY (customer_id) REFERENCES customers (customer_id),
  FOREIGN KEY (product_id) REFERENCES products (product_id)
);

Solution:

SELECT
  c.customer_id,
  c.customer_name
FROM
  customers AS c
JOIN
  (
    SELECT DISTINCT
      customer_id
    FROM
      orders
    GROUP BY
      customer_id
    HAVING
      COUNT(DISTINCT product_id) = (
        SELECT
          COUNT(*)
        FROM
          products
      )
  ) AS t
ON
  c.customer_id = t.customer_id;

Explanation:

  1. Join Customers and Order Details: We join the customers table and a subquery that selects distinct customer IDs who have ordered all products (HAVING COUNT(DISTINCT product_id) = (SELECT COUNT(*) FROM products)).

  2. Final Result: The query returns the customer IDs and names of customers who have purchased all products.
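The count-matching idea (a customer's number of distinct products equals the size of the product catalog) can be sketched with Python's sqlite3 module. The tables below are trimmed versions of the schema above, and the customers, products, and orders are hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Trimmed, hypothetical versions of the tables in this section
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, product_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    -- Ann buys both products (one of them twice); Bob buys only one
    INSERT INTO orders VALUES (1, 1, 1), (2, 1, 2), (3, 1, 1), (4, 2, 1);
""")

bought_all = conn.execute("""
    SELECT c.customer_id, c.customer_name
    FROM customers AS c
    JOIN (
      SELECT customer_id
      FROM orders
      GROUP BY customer_id
      HAVING COUNT(DISTINCT product_id) = (SELECT COUNT(*) FROM products)
    ) AS t ON c.customer_id = t.customer_id
    ORDER BY c.customer_id
""").fetchall()

print(bought_all)
```

COUNT(DISTINCT product_id) matters here: Ann has three orders but only two distinct products, which is what must equal the catalog size.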

Real-World Application:

This query can be used to identify customers who are potentially loyal to a brand or who have a complete set of products in a specific category. This information can be used to target marketing campaigns or provide personalized recommendations.


Highest Grade For Each Student

Problem Statement

Given a table containing student records, find the highest grade for each student.

Example Table:

| student_id | name | grade |
|---|---|---|
| 1 | John | 90 |
| 1 | John | 92 |
| 2 | Mary | 80 |
| 2 | Mary | 85 |

Output:

| student_id | highest_grade |
|---|---|
| 1 | 92 |
| 2 | 85 |

SQL Solution:

-- Using the MAX() aggregate function
SELECT student_id, MAX(grade) AS highest_grade
FROM student_records
GROUP BY student_id;

Explanation:

  1. MAX() Aggregate Function: The MAX() function returns the maximum value in a group of rows. In this case, it finds the highest grade for each student.

  2. GROUP BY Clause: The GROUP BY clause groups the results by student_id, so that the MAX() function is applied separately to each student's grades.

Breakdown:

  • student_id: The column that identifies each student.

  • grade: The column that contains the grades.

  • highest_grade: The column that stores the highest grade for each student.

Real-World Applications:

This query can be used in various educational applications, such as:

  • Finding the top-performing students in a class.

  • Reporting each student's best result, for example when only the highest attempt counts.

  • Identifying students who need additional support.


New Users Daily Count

LeetCode Problem: New Users Daily Count

SQL Query:

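-- Create a temporary table to store the daily count of new users
CREATE TEMPORARY TABLE DailyNewUserCount AS
SELECT
    DATE(u.created_at) AS date,
    COUNT(DISTINCT u.user_id) AS new_users
FROM
    users AS u
WHERE
    -- Keep a row only if the same user has no row from an earlier date.
    -- The aliases u and earlier make the subquery refer to the outer row.
    NOT EXISTS (
        SELECT
            1
        FROM
            users AS earlier
        WHERE
            earlier.user_id = u.user_id
            AND earlier.created_at < DATE(u.created_at)
    )
GROUP BY
    DATE(u.created_at);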

-- Select the date and new user count from the temporary table
SELECT
    date,
    new_users
FROM
    DailyNewUserCount;

Explanation:

Step 1: Create Temporary Table with Daily New User Count

  • We create a temporary table named DailyNewUserCount to store the daily count of new users.

  • We use the DATE() function to extract the date from the created_at column.

  • We count the number of distinct user_id for each date to get the daily new user count.

  • The NOT EXISTS subquery keeps a row only when the same user_id has no row on an earlier date, so each user is counted once, on the first date they appear.

Step 2: Select Date and New User Count

  • From the temporary table, we select the date and new_users columns.

  • This gives us a list of dates and the corresponding number of new users for each date.
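The two steps can be tried end to end with Python's sqlite3 module. This is a sketch under assumptions: the users table here is hypothetical sample data in which a user_id may appear on more than one date, and the temporary table is skipped so the result is read directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        (1, "2023-01-01 09:00:00"),
        (2, "2023-01-01 17:30:00"),
        (1, "2023-01-02 08:00:00"),  # user 1 again: not new on this date
        (3, "2023-01-02 12:00:00"),
    ],
)

# A row counts only if the same user has no row from an earlier date.
daily_new_users = conn.execute(
    """
    SELECT DATE(u.created_at) AS date,
           COUNT(DISTINCT u.user_id) AS new_users
    FROM users AS u
    WHERE NOT EXISTS (
        SELECT 1
        FROM users AS earlier
        WHERE earlier.user_id = u.user_id
          AND earlier.created_at < DATE(u.created_at)
    )
    GROUP BY DATE(u.created_at)
    ORDER BY 1
    """
).fetchall()

print(daily_new_users)  # [('2023-01-01', 2), ('2023-01-02', 1)]
```

The table aliases matter: without them, the column references inside the subquery resolve to the subquery's own copy of users and the comparison never looks at the outer row.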

Real-World Applications:

  • User Growth Analysis: Tracking new users daily can help businesses understand their user acquisition rate and growth trends.

  • Campaign Effectiveness: By comparing the daily new user count to marketing campaigns, businesses can evaluate the effectiveness of their user acquisition efforts.

  • Product Usage Analysis: Analyzing the daily new user count can provide insights into the usage patterns and onboarding experience of new users.

  • Customer Support Optimization: High daily new user counts may indicate a need for additional customer support resources to assist with onboarding and troubleshoot issues.


Find Third Transaction

Problem Statement:

You have a table Transactions with the following columns:

  • id (int)

  • amount (int)

  • sender_id (int)

  • receiver_id (int)

  • timestamp (timestamp)

You want to find the third transaction in the table in chronological order, that is, the row with the third-earliest timestamp.

Solution:

SELECT
  *
FROM Transactions
ORDER BY timestamp
LIMIT 2, 1;

Explanation:

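The ORDER BY timestamp clause orders the transactions by their timestamps in ascending order. The LIMIT 2, 1 clause skips the first two transactions and returns the next one. The offset, count form of LIMIT is MySQL syntax (SQLite accepts it too); the equivalent LIMIT 1 OFFSET 2 is more widely supported, for example by PostgreSQL.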

Example:

Consider the following table:

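id | amount | sender_id | receiver_id | timestamp
-- | ------ | --------- | ----------- | -------------------
1  | 100    | 1         | 2           | 2023-01-01 12:00:00
2  | 200    | 3         | 4           | 2023-01-02 14:00:00
3  | 300    | 5         | 6           | 2023-01-03 16:00:00
4  | 400    | 7         | 8           | 2023-01-04 18:00:00
5  | 500    | 9         | 10          | 2023-01-05 20:00:00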

The query would return the following result:

id | amount | sender_id | receiver_id | timestamp
-- | ------ | --------- | ----------- | -------------------
3  | 300    | 5         | 6           | 2023-01-03 16:00:00

This is the third transaction in the table.
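The example can be reproduced with Python's sqlite3 module. This sketch uses the LIMIT 1 OFFSET 2 spelling and inserts the rows out of order on purpose, to show that the result depends on ORDER BY rather than on insertion order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions ("
    "id INTEGER, amount INTEGER, sender_id INTEGER, "
    "receiver_id INTEGER, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?, ?, ?)",
    [
        (4, 400, 7, 8, "2023-01-04 18:00:00"),
        (1, 100, 1, 2, "2023-01-01 12:00:00"),
        (5, 500, 9, 10, "2023-01-05 20:00:00"),
        (3, 300, 5, 6, "2023-01-03 16:00:00"),
        (2, 200, 3, 4, "2023-01-02 14:00:00"),
    ],
)

# Sort by timestamp, skip two rows, return the next one.
third = conn.execute(
    "SELECT * FROM Transactions ORDER BY timestamp LIMIT 1 OFFSET 2"
).fetchone()

print(third)  # (3, 300, 5, 6, '2023-01-03 16:00:00')
```

If two transactions can share a timestamp, adding a tie-breaker such as ORDER BY timestamp, id keeps the result deterministic.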

Applications in Real World:

This query can be used in various real-world applications, such as:

  • Identifying fraudulent transactions by looking for anomalous patterns in the sequence of transactions.

  • Tracking the flow of money in a system by identifying the source and destination of each transaction.

  • Analyzing customer behavior by understanding the types of transactions they make and the frequency of their transactions.


Rearrange Products Table

Problem:

Rearrange the Products table so that products with higher price values are listed first.

Solution:

SELECT *
FROM Products
ORDER BY price DESC;

Explanation:

The ORDER BY clause allows us to sort the results of a query. In this case, we sort the results in descending order by the price column. This means that products with higher prices will be listed first.

Real-World Application:

This query can be used in a variety of applications, such as:

  • Displaying a list of products on a website, with the most expensive products listed first

  • Generating a report of the most expensive products sold in a given period

  • Identifying products that are overpriced compared to their competitors

Potential Performance Improvements:

If the Products table is very large, the query may take a long time to execute. In this case, you can improve performance by creating an index on the price column. An index is a data structure that helps the database quickly find rows based on the values in a particular column.

To create an index on the price column, you can use the following query:

CREATE INDEX idx_products_price ON Products (price);

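With the index in place, the database can read rows in price order directly from the index instead of sorting the whole table, which usually makes the query faster on large tables. Whether the optimizer actually uses the index depends on the database engine and on the query.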


Unpopular Books

Problem:

Find books with fewer than 100 sales.

SQL Query:

SELECT
  BookId,
  Title,
  Sales
FROM
  Books
WHERE
  Sales < 100;

Breakdown:

  • The SELECT clause specifies the columns to retrieve: BookId, Title, and Sales.

  • The FROM clause specifies the table to retrieve the data from, which is Books in this case.

  • The WHERE clause filters the results to only include books with sales less than 100.

Real-World Application:

This query can be used by a bookstore to identify books that are not selling well and need to be discounted or removed from inventory.


Article Views I

Problem: Find the number of views for each article.

SQL Query:

SELECT article_id, COUNT(*) AS views
FROM article_views
GROUP BY article_id;

Breakdown:

  • article_views: The table containing the article views.

  • article_id: The ID of the article.

  • COUNT(*) AS views: The number of views for each article. The alias views is used to name the column.

  • GROUP BY article_id: Groups the results by article ID, so that the count is calculated for each article.

Example:

CREATE TABLE article_views (
  article_id INT,
  user_id INT,
  date DATETIME
);
INSERT INTO article_views VALUES
  (1, 10, '2023-01-01'),
  (1, 20, '2023-01-02'),
  (2, 30, '2023-01-03'),
  (2, 40, '2023-01-04'),
  (3, 50, '2023-01-05');

SELECT article_id, COUNT(*) AS views
FROM article_views
GROUP BY article_id;

Output:

article_id | views
---------- | -----
1          | 2
2          | 2
3          | 1

Real-World Application:

This query can be used to analyze the popularity of articles on a website, such as in a content management system or blog. The results can be used to:

  • Identify which articles are most popular with users.

  • Track the performance of different articles over time.

  • Make decisions about which articles to promote or feature.


Customer Order Frequency

Objective: Find the average number of orders placed by each customer in a given table.

SQL Implementation:

-- Table: orders
-- Columns:
-- id - Order ID
-- customer_id - Customer ID

SELECT
    customer_id,
    COUNT(*) AS order_count,
    -- Window over the grouped result: average of the per-customer counts
    AVG(COUNT(*)) OVER () AS avg_orders
FROM
    orders
GROUP BY
    customer_id;

Breakdown:

  1. SELECT: Select columns for the result:

    • customer_id: The unique identifier for each customer.

    • order_count: The total number of orders placed by each customer.

    • avg_orders: The average number of orders per customer, taken across all customers and repeated on every row.

  2. FROM: Specify the input table orders that contains the order records.

  3. GROUP BY: Group the rows by customer_id so that COUNT(*) is evaluated once per customer.

  4. COUNT(): Counts the number of orders for each customer and returns it as the order_count column.

  5. AVG() OVER(): A window function evaluated after grouping. An alias such as order_count cannot be referenced in the same SELECT list, so the aggregate is nested as AVG(COUNT(*)). The empty OVER () makes the window span every customer row. Partitioning by customer_id instead would leave a single row in each partition, so the "average" would just repeat order_count.

Example:

customer_id | order_count | avg_orders
----------- | ----------- | ----------
1           | 5           | 5
2           | 3           | 5
3           | 7           | 5

This result shows that:

  • Customer with ID 1 has placed 5 orders, equal to the average.

  • Customer with ID 2 has placed 3 orders, below the average.

  • Customer with ID 3 has placed 7 orders, above the average of (5 + 3 + 7) / 3 = 5.
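One way to read the objective is "each customer's order count next to the average count across all customers". The sketch below takes that reading and runs it with Python's sqlite3 module. The per-customer counts come from the example; the CTE name per_customer is an assumption, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")

# Sample data: customer 1 has 5 orders, customer 2 has 3, customer 3 has 7.
orders_per_customer = {1: 5, 2: 3, 3: 7}
conn.executemany(
    "INSERT INTO orders (customer_id) VALUES (?)",
    [(cid,) for cid, n in orders_per_customer.items() for _ in range(n)],
)

# Count per customer first, then average those counts over the whole result.
frequency = conn.execute(
    """
    WITH per_customer AS (
        SELECT customer_id, COUNT(*) AS order_count
        FROM orders
        GROUP BY customer_id
    )
    SELECT customer_id,
           order_count,
           AVG(order_count) OVER () AS avg_orders
    FROM per_customer
    ORDER BY customer_id
    """
).fetchall()

print(frequency)  # [(1, 5, 5.0), (2, 3, 5.0), (3, 7, 5.0)]
```

Splitting the count into a CTE keeps the window function simple: it only ever sees one row per customer.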

Applications:

  • Customer Segmentation: Identifying customer groups based on their order frequency can help businesses tailor marketing campaigns.

  • Loyalty Programs: Rewarding customers with higher order frequencies can encourage loyalty and repeat business.

  • Inventory Management: Understanding the average order frequency for each customer can help businesses optimize inventory levels.

  • Fraud Detection: Customers with unusually high or low order frequencies may be flagged for potential fraud investigation.


All Valid Triplets That Can Represent a Country

Problem Statement

Given a table Country containing country information, write a query to find all valid triplets that can represent a country.

Country (code, name)

A valid triplet is a set of three country codes that satisfy the following conditions:

  • The first two codes are neighboring countries, sharing a border.

  • The third code is a neighbor of the first country, but not the second country.

SELECT c1.code, c2.code, c3.code
FROM Country c1
JOIN Country c2 ON c1.code != c2.code AND c1.name = c2.name
JOIN Country c3 ON c3.code != c1.code AND c1.code = c3.name
WHERE c2.code != c3.code;

Breakdown and Explanation

  1. Join Countries with Shared Borders:

    JOIN Country c2 ON c1.code != c2.code AND c1.name = c2.name

    This joins Country with itself (using the alias c2) on the condition that the two rows have different codes but the same name. The query treats a shared name as its signal that two countries border each other. The Country (code, name) schema has no dedicated column for borders, so this only holds if the data is encoded that way.

  2. Find Neighbors of the First Country:

    JOIN Country c3 ON c3.code != c1.code AND c1.code = c3.name

    This joins a third copy of Country (aliased as c3) whose code differs from c1's and whose name column equals c1's code. The query treats that match as "c3 is a neighbor of c1".

  3. Filter Out Invalid Pairs:

    WHERE c2.code != c3.code

    This condition eliminates cases where c2 and c3 are the same country. It does not test whether c3 borders c2, so the requirement that the third country is not a neighbor of the second is only partly enforced by this query.
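Both triplet conditions can be checked directly when adjacency is stored explicitly. The sketch below assumes a hypothetical Borders (code_a, code_b) table, which is not part of the Country (code, name) schema given above, and runs the query with Python's sqlite3 module. The country codes A to D are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical adjacency table: each shared border is stored once.
conn.execute("CREATE TABLE Borders (code_a TEXT, code_b TEXT)")
conn.executemany(
    "INSERT INTO Borders VALUES (?, ?)",
    [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")],
)

triplets = conn.execute(
    """
    WITH neighbors AS (
        -- Make the relation symmetric: A borders B implies B borders A.
        SELECT code_a AS code, code_b AS neighbor FROM Borders
        UNION
        SELECT code_b, code_a FROM Borders
    )
    SELECT n12.code, n12.neighbor, n13.neighbor
    FROM neighbors AS n12
    JOIN neighbors AS n13
      ON n13.code = n12.code            -- third country borders the first
     AND n13.neighbor <> n12.neighbor   -- and differs from the second
    WHERE NOT EXISTS (                  -- third must not border the second
        SELECT 1
        FROM neighbors AS n23
        WHERE n23.code = n12.neighbor
          AND n23.neighbor = n13.neighbor
    )
    ORDER BY 1, 2, 3
    """
).fetchall()

print(triplets)
# [('B', 'A', 'D'), ('B', 'C', 'D'), ('B', 'D', 'A'), ('B', 'D', 'C')]
```

Triplets such as (A, B, C) are excluded because C borders B, which is the condition a plain inequality on codes cannot express.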


The Most Frequently Ordered Products for Each Customer

Problem Statement

Given a table of customer orders, find the most frequently ordered products for each customer.

Table Schema

CREATE TABLE orders (
    customer_id INT NOT NULL,
    product_id INT NOT NULL,
    order_date DATE NOT NULL
);

Solution

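The following SQL query counts the orders for each customer and product pair, then uses the ROW_NUMBER() function to rank each customer's products by that count:

SELECT customer_id,
       product_id,
       COUNT(*) AS order_count,
       ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY COUNT(*) DESC) AS product_rank
FROM orders
GROUP BY customer_id, product_id;

Output

The output of the query is a table with the following columns:

  • customer_id: The ID of the customer.

  • product_id: The ID of the product.

  • order_count: The number of times the customer ordered the product.

  • product_rank: The rank of the product for the customer, where 1 is the most frequently ordered. The alias avoids the bare name rank because RANK is a reserved word in some databases, such as MySQL 8.0.

Explanation

The GROUP BY clause produces one row per customer and product, so COUNT(*) is the number of orders for that pair. Without it, COUNT(*) cannot be used inside the window's ORDER BY. The ROW_NUMBER() function then assigns a rank to each product for each customer. The PARTITION BY clause restarts the numbering for every customer, and the ORDER BY clause orders the products by the number of orders in descending order.

The ROW_NUMBER() function assigns a rank to each product within each partition. The first product in each partition is assigned a rank of 1, the second product is assigned a rank of 2, and so on. Rows with product_rank = 1 are each customer's most frequently ordered product. Tied products are numbered in an arbitrary order; using RANK() instead gives tied products the same rank.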
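To go from a ranking to "the most frequently ordered product per customer", the ranked rows can be filtered to rank 1. The sketch below does this with Python's sqlite3 module. The sample orders, the CTE names, and the product_id tie-breaker are assumptions, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "customer_id INT NOT NULL, product_id INT NOT NULL, order_date DATE NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10, "2023-01-01"), (1, 10, "2023-01-05"), (1, 10, "2023-01-09"),
        (1, 20, "2023-01-02"),
        (2, 20, "2023-01-03"), (2, 20, "2023-01-04"),
        (2, 30, "2023-01-06"),
    ],
)

top_products = conn.execute(
    """
    WITH counts AS (
        -- One row per (customer, product) with its number of orders.
        SELECT customer_id, product_id, COUNT(*) AS order_count
        FROM orders
        GROUP BY customer_id, product_id
    ),
    ranked AS (
        SELECT customer_id, product_id, order_count,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY order_count DESC, product_id  -- tie-breaker
               ) AS product_rank
        FROM counts
    )
    SELECT customer_id, product_id, order_count
    FROM ranked
    WHERE product_rank = 1
    ORDER BY customer_id
    """
).fetchall()

print(top_products)  # [(1, 10, 3), (2, 20, 2)]
```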

Real-World Application

This query can be used to identify the most popular products for each customer. This information can be used to personalize marketing campaigns, improve product recommendations, and optimize inventory management.

Potential Applications

  • Personalized marketing campaigns: Retailers can use this information to send targeted marketing campaigns to customers based on their purchase history. For example, a retailer could send a discount coupon for a customer's favorite product.

  • Improved product recommendations: Online retailers can use this information to recommend products to customers based on their past purchases. For example, an online retailer could recommend a product that is similar to a product that the customer has purchased in the past.

  • Optimized inventory management: Retailers can use this information to optimize their inventory by stocking more of the products that are most popular with their customers. This can help to reduce lost sales and improve customer satisfaction.


Friday Purchases I

Problem Statement

Find the total sum of purchases made on Fridays for the following table:

user_id | purchase_date | purchase_amount
------- | ------------- | ---------------
1       | 2023-01-09    | 10
2       | 2023-01-10    | 20
3       | 2023-01-11    | 30
4       | 2023-01-12    | 40
5       | 2023-01-13    | 50

Solution

SELECT SUM(purchase_amount)
FROM purchases
WHERE strftime('%w', purchase_date) = '5';

Explanation

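  1. strftime('%w', purchase_date) = '5': strftime is SQLite's date-formatting function, and '%w' returns the day of the week as a digit from '0' (Sunday) to '6' (Saturday), so '5' is Friday. Other databases expose the weekday through different functions with their own numbering, for example DAYOFWEEK() in MySQL, where 1 is Sunday and Friday is 6.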

  2. SUM(purchase_amount): Calculates the sum of all purchase_amount values that satisfy the condition in step 1, representing the total amount spent on Fridays.
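Because strftime is a SQLite function, the query can be run as written with Python's sqlite3 module. This sketch loads the sample table and also lists the weekday digit for each date, which makes it easy to confirm that 2023-01-13 is the only Friday.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "user_id INTEGER, purchase_date TEXT, purchase_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        (1, "2023-01-09", 10),
        (2, "2023-01-10", 20),
        (3, "2023-01-11", 30),
        (4, "2023-01-12", 40),
        (5, "2023-01-13", 50),
    ],
)

# '%w' gives '0' for Sunday through '6' for Saturday.
weekdays = conn.execute(
    "SELECT purchase_date, strftime('%w', purchase_date) "
    "FROM purchases ORDER BY purchase_date"
).fetchall()

friday_total = conn.execute(
    "SELECT SUM(purchase_amount) FROM purchases "
    "WHERE strftime('%w', purchase_date) = '5'"
).fetchone()[0]

print(weekdays[-1])   # ('2023-01-13', '5')
print(friday_total)   # 50
```

SUM returns NULL (None in Python) when no row matches, so a table with no Friday purchases yields None rather than 0.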

Real-Time Example

Suppose you have an online shopping website and you want to track the total revenue generated on Fridays. The provided SQL query can be used to retrieve this information. The results can be used to analyze sales patterns and optimize marketing campaigns for higher conversions on Fridays.

Potential Applications

  • Sales Analysis: Determine the days of the week with the highest sales.

  • Marketing Optimization: Target users with promotions and discounts on days with high purchasing activity.

  • Inventory Management: Predict demand based on historical sales patterns by day of the week.

  • Revenue Forecasting: Estimate future revenue based on historical data and anticipated trends.


Tournament Winners

Problem Statement:

Tournament Winners (LeetCode)

Write an SQL query to find the players who won at least one tournament.

Schema:

| Table: Players |
| Column: id | Type: INTEGER | Primary Key |
| Column: name | Type: VARCHAR(255) |
| Column: team | Type: VARCHAR(255) |

| Table: Tournaments |
| Column: id | Type: INTEGER | Primary Key |
| Column: name | Type: VARCHAR(255) |
| Column: winner_id | Type: INTEGER | Foreign Key (References Players.id) |

Solution:

SELECT DISTINCT
  P.id,
  P.name,
  P.team
FROM Players AS P
INNER JOIN Tournaments AS T
  ON P.id = T.winner_id;

Explanation:

  1. Join the Tables: We join the Players and Tournaments tables by matching Players.id to Tournaments.winner_id, which links each tournament to the player who won it.

  2. Filter for Winners: The INNER JOIN operator only returns rows that have matching values in both tables, so players who never won a tournament are dropped. The join produces one row per tournament won, so DISTINCT collapses the repeats and each winner appears once.

  3. Select Player Data: The SELECT statement retrieves the id, name, and team columns from the Players table, which contains the information about the winning players.
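An INNER JOIN returns one row per tournament won, so a player with several wins appears several times unless duplicates are removed. The sketch below shows the difference with Python's sqlite3 module. The player and tournament names are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Players (id INTEGER PRIMARY KEY, name TEXT, team TEXT)")
conn.execute(
    "CREATE TABLE Tournaments ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "winner_id INTEGER REFERENCES Players(id))"
)
conn.executemany(
    "INSERT INTO Players VALUES (?, ?, ?)",
    [(1, "Alice", "Red"), (2, "Bob", "Blue"), (3, "Cara", "Red")],
)
conn.executemany(
    "INSERT INTO Tournaments VALUES (?, ?, ?)",
    [(1, "Spring Open", 1), (2, "Summer Cup", 1), (3, "Autumn Classic", 2)],
)

join_sql = (
    "SELECT {distinct} P.id, P.name, P.team "
    "FROM Players AS P "
    "INNER JOIN Tournaments AS T ON P.id = T.winner_id "
    "ORDER BY P.id"
)

# Alice won twice, so the plain join lists her twice.
with_repeats = conn.execute(join_sql.format(distinct="")).fetchall()
winners = conn.execute(join_sql.format(distinct="DISTINCT")).fetchall()

print(with_repeats)  # [(1, 'Alice', 'Red'), (1, 'Alice', 'Red'), (2, 'Bob', 'Blue')]
print(winners)       # [(1, 'Alice', 'Red'), (2, 'Bob', 'Blue')]
```

Cara never won a tournament, so she is absent from both results.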

Real-World Applications:

  • Track and reward players for their accomplishments in tournaments.

  • Identify and showcase the most successful teams and individuals in a competition.

  • Analyze player performance and team dynamics for improvement.


Number of Transactions per Visit

Problem:

You are given a table called Transactions with the following schema:

| Column Name | Type |
|-------------|------|
| user_id    | int   |
| visit_id   | int   |
| transaction_id | int   |
| amount      | int   |

Each row in this table represents a transaction made by a user during a visit to a website. You need to write an SQL query to find the number of transactions made by each user per visit.

Solution:

SELECT user_id, visit_id, COUNT(*) AS num_transactions
FROM Transactions
GROUP BY user_id, visit_id;

Explanation:

This query uses the GROUP BY clause to group the transaction records by the user_id and visit_id columns. The COUNT(*) function is then used to count the number of transactions in each group.

Breakdown:

  • The SELECT clause selects the user_id, visit_id, and num_transactions columns.

  • The FROM clause specifies the Transactions table.

  • The GROUP BY clause groups the rows by the user_id and visit_id columns.

  • The COUNT(*) function counts the number of rows in each group.

Real-World Example:

This query can be used to analyze website usage data. For example, you could use it to identify users who make multiple transactions during a single visit. This information could be used to target those users with personalized offers or discounts.


Form a Chemical Bond

Problem Statement:

Given two tables:

  • elements (id, symbol, atomic_number)

  • bonds (element1_id, element2_id, bond_type)

Form a chemical bond between two elements based on their atomic numbers.

Solution:

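ALTER TABLE bonds
ADD COLUMN bond_length FLOAT;

-- Look up each element's atomic_number through the elements table;
-- the ids stored in bonds are keys, not atomic numbers.
UPDATE bonds
SET bond_length = (
    SELECT
        CASE
            WHEN e1.atomic_number = e2.atomic_number THEN 0
            WHEN (e1.atomic_number + e2.atomic_number) % 3 = 0 THEN 1.0
            WHEN (e1.atomic_number + e2.atomic_number) % 5 = 0 THEN 1.5
            ELSE 2.0
        END
    FROM elements AS e1
    JOIN elements AS e2 ON e2.id = bonds.element2_id
    WHERE e1.id = bonds.element1_id
);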

Explanation:

  • ALTER TABLE bonds ADD COLUMN bond_length FLOAT creates a new column bond_length of data type FLOAT in the bonds table.

  • UPDATE bonds SET bond_length = ... updates the bond_length column based on the following conditions, which are checked in order so the first match wins (a sum divisible by both 3 and 5 gets 1.0):

    • If the two elements have the same atomic number, the bond length is 0.

    • If the sum of the atomic numbers is divisible by 3, the bond length is 1.0.

    • If the sum of the atomic numbers is divisible by 5, the bond length is 1.5.

    • Otherwise, the bond length is 2.0.
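The rule is stated in terms of atomic numbers, while bonds stores element ids. The sketch below, run with Python's sqlite3 module, looks the atomic numbers up through elements before applying the rule. The sample ids are chosen to differ from the atomic numbers on purpose, and the data is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elements (id INTEGER, symbol TEXT, atomic_number INTEGER)")
conn.execute("CREATE TABLE bonds (element1_id INTEGER, element2_id INTEGER, bond_type TEXT)")
conn.executemany(
    "INSERT INTO elements VALUES (?, ?, ?)",
    [(1, "H", 1), (2, "He", 2), (3, "C", 6), (4, "O", 8), (5, "N", 7)],
)
conn.executemany(
    "INSERT INTO bonds VALUES (?, ?, ?)",
    [
        (1, 1, "single"),  # same element        -> 0
        (1, 2, "single"),  # 1 + 2 = 3           -> 1.0
        (2, 4, "single"),  # 2 + 8 = 10          -> 1.5
        (3, 4, "double"),  # 6 + 8 = 14          -> 2.0
        (4, 5, "single"),  # 8 + 7 = 15, 3 first -> 1.0
    ],
)

conn.execute("ALTER TABLE bonds ADD COLUMN bond_length FLOAT")
conn.execute(
    """
    UPDATE bonds
    SET bond_length = (
        SELECT CASE
                   WHEN e1.atomic_number = e2.atomic_number THEN 0
                   WHEN (e1.atomic_number + e2.atomic_number) % 3 = 0 THEN 1.0
                   WHEN (e1.atomic_number + e2.atomic_number) % 5 = 0 THEN 1.5
                   ELSE 2.0
               END
        FROM elements AS e1
        JOIN elements AS e2 ON e2.id = bonds.element2_id
        WHERE e1.id = bonds.element1_id
    )
    """
)

lengths = [row[0] for row in conn.execute("SELECT bond_length FROM bonds ORDER BY rowid")]
print(lengths)  # [0.0, 1.0, 1.5, 2.0, 1.0]
```

The last bond shows the effect of CASE ordering: 15 is divisible by both 3 and 5, and the first matching branch decides the result.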

Real-World Applications:

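The divisibility rule here is a toy model rather than real chemistry. With a physically meaningful rule substituted into the CASE expression, the same ALTER TABLE and UPDATE pattern could be used to: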

  • Simulate chemical reactions: Predict the bond lengths of molecules formed by combining different elements.

  • Design materials: Determine the strength and properties of materials based on the bond lengths between atoms.

  • Understand molecular structure: Analyze the geometric arrangement of atoms in molecules.


Game Play Analysis IV

LeetCode Problem: Game Play Analysis IV

Problem Statement:

Given a table Gameplay that records player gameplay data, where:

  • player_id is the ID of the player

  • game_id is the ID of the game

  • event_type is the type of event that occurred during the game, either "START" or "END"

  • timestamp is the timestamp of the event

Find the number of players who have completed at least 5 games.

Best & Performant Solution:

WITH PlayerGamesCompleted AS (
  SELECT player_id, COUNT(*) AS num_games_completed
  FROM Gameplay
  WHERE event_type = 'END'
  GROUP BY player_id
  HAVING COUNT(*) >= 5
), PlayerCount AS (
  SELECT COUNT(*) AS num_players_completed_5_games
  FROM PlayerGamesCompleted
)
SELECT num_players_completed_5_games
FROM PlayerCount;

Breakdown:

1. PlayerGamesCompleted Common Table Expression (CTE):

  • Groups gameplay events by player_id and counts the number of "END" events for each player.

  • Filters out players who have completed fewer than 5 games.

2. PlayerCount CTE:

  • Counts the number of players who have completed at least 5 games from the PlayerGamesCompleted CTE.

3. Final Query:

  • Selects the count of players who have completed at least 5 games from the PlayerCount CTE.

Simplified Explanation:

  1. We first count the number of completed games for each player.

  2. Then, we only keep the players who have completed at least 5 games.

  3. Finally, we count the number of players in this filtered group to get the number of players who have completed at least 5 games.
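The three steps can be sketched with Python's sqlite3 module as follows. The sample data is hypothetical: player 1 finishes 5 games, player 2 starts 5 but finishes only 4, and player 3 finishes 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Gameplay ("
    "player_id INTEGER, game_id INTEGER, event_type TEXT, timestamp TEXT)"
)

# (player_id, games started, games finished) for the sample data.
sample = [(1, 5, 5), (2, 5, 4), (3, 6, 6)]
events = []
for player_id, started, finished in sample:
    for game_id in range(1, started + 1):
        day = f"2023-01-{game_id:02d}"
        events.append((player_id, game_id, "START", f"{day} 10:00:00"))
        if game_id <= finished:
            events.append((player_id, game_id, "END", f"{day} 11:00:00"))
conn.executemany("INSERT INTO Gameplay VALUES (?, ?, ?, ?)", events)

num_players = conn.execute(
    """
    WITH PlayerGamesCompleted AS (
        -- Steps 1 and 2: count END events per player, keep those with >= 5.
        SELECT player_id, COUNT(*) AS num_games_completed
        FROM Gameplay
        WHERE event_type = 'END'
        GROUP BY player_id
        HAVING COUNT(*) >= 5
    )
    -- Step 3: count the players that remain.
    SELECT COUNT(*) FROM PlayerGamesCompleted
    """
).fetchone()[0]

print(num_players)  # 2
```

If a game could log more than one END event, counting COUNT(DISTINCT game_id) instead of COUNT(*) would avoid counting the same game twice.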

Real-World Applications:

This query can be used in game analytics to identify players who are highly engaged and have progressed significantly in the game. This information can be used to reward active players, offer them exclusive perks, or track player retention rates.