Flight Routes using Recursive CTE
Flight Routes Problem
Database Source Table
FlightID | Origin | Destination |
---|---|---|
1 | A | B |
1 | B | C |
2 | D | E |
3 | F | G |
3 | G | H |
3 | H | I |
4 | A | N |
4 | M | D |
4 | D | A |
Desired Output table
FlightID | Origin | Destination | Route | Step |
---|---|---|---|---|
2 | D | E | D,E | 1 |
1 | A | C | A,B,C | 2 |
3 | F | I | F,G,H,I | 3 |
4 | M | N | M,D,A,N | 3 |
How the Solution Works
- The base query will fetch the relevant records from the source table using the conditions specified.
- It will then call then call the recursive portion of the query.
- The first iteration will take the output of the base query as the input and generate row entries.
- These row entries will be appended to the output of the base query.
- The second iteration will take the generated row entries/records of the first iteration as input.
- The inputs to the second iteration will only the records fetched as output from the first iteration.
- Meanwhile, that the overall output table would will continue to update with rows from each iteration getting appended.
- We can declare/separately track the number of iterations.
- It is easy to appreciate the flow of input, output across subsequent iterations and the continuously growing target table through the concrete example.
Summary
The base query retrieves the initial set of records from the source table based on specified conditions. The recursive portion then takes the output of the base query and generates additional row entries. These new entries are appended to the output of the base query. In each iteration, the input for the next iteration consists of the records generated in the previous iteration. This process continues, and the overall output table grows with each iteration. The number of iterations can be separately tracked, providing a clear view of the flow of input and output across subsequent iterations.
Deep-Dive Link
To deep-dive, go to this link
Flights Table
We will start from scratch in MySQL.
Let’s create a Flights table and populate it with dummy data.
SHOW tables;
SELECT * from Flights;
DROP TABLE Flights;
-- Create the table
CREATE TABLE Flights (
FlightID INT,
Origin CHAR(1),
Destination CHAR(1)
);
-- Insert data into the table
INSERT INTO Flights (FlightID, Origin, Destination) VALUES
(1, 'A', 'B'),
(1, 'B', 'C'),
(2, 'D', 'E'),
(3, 'F', 'G'),
(3, 'G', 'H'),
(3, 'H', 'I'),
(4, 'A', 'N'),
(4, 'M', 'D'),
(4, 'D', 'A');
Results of the Insert Query
FlightID | Origin | Destination |
---|---|---|
1 | A | B |
1 | B | C |
2 | D | E |
3 | F | G |
3 | G | H |
3 | H | I |
4 | A | N |
4 | M | D |
4 | D | A |
Solution using correlated Subquery
WITH RECURSIVE FlightRoutes AS (
-- Anchor Member
SELECT FlightID, Origin, Destination,
CAST(CONCAT(Origin, ',', Destination) AS CHAR(100)) AS Route,
1 AS Step
FROM Flights
WHERE Origin NOT IN (
SELECT Destination
FROM Flights AS subFlights
WHERE subFlights.FlightID = Flights.FlightID
)
UNION ALL
-- Recursive Member
SELECT fr.FlightID, fr.Origin, t.Destination,
CAST(CONCAT(fr.Route, ',', t.Destination) AS CHAR(100)),
fr.Step + 1 AS Step
FROM FlightRoutes fr
JOIN Flights t ON fr.FlightID = t.FlightID AND fr.Destination = t.Origin
WHERE POSITION(t.Destination IN fr.Route) = 0
AND fr.Step < 3
)
SELECT
FlightID,
Origin,
Destination AS FinalDestination,
Route,
Step AS CountOfConnectingFlights
FROM (
SELECT
FlightID,
Origin,
Destination,
Route,
Step,
ROW_NUMBER() OVER (PARTITION BY FlightID ORDER BY FlightID, Step DESC) AS RowNumber
FROM FlightRoutes
) AS tbl
WHERE RowNumber = 1;
Output
FlightID | Origin | Destination | Route | Step |
---|---|---|---|---|
2 | D | E | D,E | 1 |
1 | A | C | A,B,C | 2 |
3 | F | I | F,G,H,I | 3 |
4 | M | N | M,D,A,N | 3 |
Solution using correlated subquery and with additional check columns
WITH RECURSIVE FlightRoutes AS (
-- Anchor Member
SELECT FlightID, Origin, Destination, CAST(CONCAT(Origin,',',Destination) AS CHAR(100)) As Route, 1 as Step
FROM Flights
WHERE Origin NOT IN (
SELECT Destination
FROM Flights AS subFlights
WHERE subFlights.FlightID = Flights.FlightID
)
UNION ALL
-- Recursive Member
SELECT fr.FlightID, fr.Origin, t.Destination, CAST(CONCAT(fr.Route, ',', t.Destination) AS CHAR(100)), fr.Step+1 as Step
FROM FlightRoutes fr
JOIN Flights t ON fr.FlightID = t.FlightID AND fr.Destination = t.Origin
-- Terminating condition to avoid infinite recursion
WHERE POSITION(t.Destination IN fr.Route) = 0
AND fr.Step <2
)
select FlightID,
Origin,
Destination as FinalDestination,
Route,
Step As CountOfConnectingFlights,
SUBSTRING_INDEX(Route, ',', 1) as Origin_check,
SUBSTRING_INDEX(Route, ',', -1) as FinalDestination_check
From (Select FlightID, Origin, Destination, Route, Step,
ROW_NUMBER() OVER ( PARTITION BY FlightID
ORDER BY FlightID, Step DESC) as RowNumber
from FlightRoutes) as tbl
where RowNumber = 1;
Output
FlightID | Origin | Destination | Route | Step | Origin_check | FinalDestination_Check |
---|---|---|---|---|---|---|
2 | D | E | D,E | 1 | D | E |
1 | A | C | A,B,C | 2 | A | C |
3 | F | I | F,G,H,I | 3 | F | I |
4 | M | N | M,D,A,N | 3 | M | N |
Solution Refactored with joins instead of correlated subquery in Base query
Correlated subqueries are computationally inefficient as they require re-running the subquery for each record in the dataset.
Hence, we will be refactoring our code and utilise joins to achieve same outcome.
WITH RECURSIVE FlightRoutes AS (
-- base query/ anchor member
SELECT
f1.FlightID,
f1.Origin,
f1.Destination,
CAST(CONCAT(f1.Origin, ',', f1.Destination) AS CHAR(100)) AS Route,
1 AS Step
FROM
Flights AS f1
LEFT JOIN
Flights AS f2
ON
f1.Origin = f2.Destination
AND
f1.FlightID = f2.FlightID
WHERE
f2.Destination IS NULL
UNION ALL
-- recursive part
SELECT fr.FlightID,
fr.Origin,
t.Destination,
CAST(CONCAT(fr.Route, ',', t.Destination) AS CHAR(100)),
fr.Step+1 as Step
FROM FlightRoutes fr
JOIN Flights t
ON
fr.FlightID = t.FlightID
AND
fr.Destination = t.Origin
-- terminating condition
WHERE POSITION(t.Destination IN fr.Route) = 0
)
select FlightID,
Origin,
Destination as FinalDestination,
Route,
Step As CountOfConnectingFlights,
SUBSTRING_INDEX(Route, ',', 1) as Origin_check,
SUBSTRING_INDEX(Route, ',', -1) as FinalDestination_check
From (Select FlightID, Origin, Destination, Route, Step,
ROW_NUMBER() OVER ( PARTITION BY FlightID
ORDER BY FlightID, Step DESC) as RowNumber
from FlightRoutes) as tbl
where RowNumber = 1;
Output
FlightID | Origin | Destination | Route | Step | Origin_check | FinalDestination_Check |
---|---|---|---|---|---|---|
2 | D | E | D,E | 1 | D | E |
1 | A | C | A,B,C | 2 | A | C |
3 | F | I | F,G,H,I | 3 | F | I |
4 | M | N | M,D,A,N | 3 | M | N |