Greedy Algorithms Scheduling
Greedy Algorithms
Interval Scheduling
The Problem
Given a set of jobs with a start and end time. Find the largest set of jobs such that no two jobs are running at the same time (the jobs are compatible).
Input: A Set of jobs with a start and end time
Output: The largest set of jobs such that all are compatible with each other
Strategy
- Sort the jobs by end time in ascending order
- Create an empty set of selected jobs
- Iterate though the set of jobs
-
- If a job is compatible with the current selected jobs then add it
Checking For compatibility
- keep a record of the last job added
- if the start time is after, or at the finish time of the last job added the jobs are compatible
The set is compatible with itself and the last job added has the last finish time so any job that occurs after the finish is compatible with the rest fo the set.
Pseudocode
jobs <- The available jobs with .start and .end
jobs = jobs.sort(job.end) // Sort the job by end time in increasing order O(nlogn)
selected = [] // empty set
for( job : jobs ){
// the first job is added by default
if (selected.length ==0 || last.finish <= job.start){
selected.add(job)
last=job
}
}
return selected // An array of selected Jobs
Proof
It is necessary to prove that this algorithm generates the most efficient set.
proof by contradiction
- Assume the set (\(i_1,i_2,i_3...i_n\)) generated by the algorithm is not the most efficient
- The most efficient set (\(j_1,j_2,j_3...j_m\)) also exists such that the largest \(r\) can be found
- Select an \(r\) such that \(\forall v < r : i_v = j_v \land i_r \neq j_r\) (r is the largest value such that all pairs before r are equal)
As $i_r$ is selected by the finish first algorithm; no job exists that is compatible with \(\{ i_1...i_{r-1} \}\) and finishes before \(i_r\) so \(j_r\) finishes after or with \(i_r\)
This implies that \(i_r\) can replace \(j_r\) this is contradictory as it goes against the maximality of \(r\)
A more efficient solution can not be found, so the algorithm provides the most efficient solution
Interval Partitioning Problem
Given a set of jobs (intervals) schedule the jobs into the least number of rooms such that at any given time a room only has one job in it.
Input: A set of jobs with a start and end time
Output: A set of schedules for each room
Strategy
- Sort the jobs by start time
- Going through each job in turn
-
- check if the new schedule and add the job
**Checking For compatibility
- Store the schedules in a priority queue
- use the end time of the last job added as the key
- Check compatibility with queue.Find_Min
- if is not compatible is not compatible with any
- if is compatible add to it
Pseudocode
rooms= new PriorityQueue
jobs = jobs.sort(job.start)
for (job in jobs){
// the first job is added by default
if (rooms.length>0 && rooms.find_Min_Key <= job.start){
room=rooms.pop_Min()
room.add(job)
rooms.add(room,job.end)
}else{
room=[job]
rooms.add(room,job.end)
}
}
return rooms.values();
Proof
terminology
The
depth
of a set of jobs is the maximum number of jobs running at the same time.
equally
The
depth
of a set of intervals is the maximum number of intervals that contain any given point
When rooms.find_Min_Key is called the room with the earliest finish time is found, there are no rooms that has a finish time earlier. This shows the depth = roomCount
. Adding the job means depth+=1
and roomCount+=1
, roomCount matches depth. A more efficient solution can’t be found as depth
\(\nless\) roomCount
Minimizing Lateness
Given a set of jobs with a running time and deadline, order the jobs to result in the smallest maximum lateness.
Input: Jobs with running time and deadline
Output: Schedule of Jobs
Strategy
- Sort the jobs by the deadline
This is the optimal strategy adding start and end times can be added
Pseudocode
jobs <- The available jobs with .time and .deadline
jobs = jobs.sort(job.deadline) // Sort the job by deadline time in increasing order O(nlogn)
//not needed for correct order just being explicit
time=0
for (job in jobs){
job.setStart(time);
job.setEnd(time+job.time);
time+=job.time;
}
return jobs
Proof
Prove that the strategy produces an optimal solution
Observation 1: There exists an optimal solution with no idle time
Observation 2: The earliest-deadline-first strategy produces no idle time
Definition: Given a schedule S an inversion is a pair of jobs \(i\) , \(j\) where \(i \lt j\) (\(i\)’s deadline is before \(j\)’s) however \(j\) appears before \(i\)
Observation 3: The earliest-deadline-first strategy produces the unique schedule with no inversions
Observation 4: if an idle-free schedule has an inversion it has an adjacent inversion
Proof:
- Let \(i , j\) be the closest inversion
- Let \(k\) be the element to the right of \(j\)
- Case 1 [\(j \gt k\)] then \(j ,k\) is an adjacent inversion
- Case 2 [\(j \lt k\)] then \(i,k\) is a closer inversion as \(i \lt j \lt K\) this is contradictory as \(i,j\) is a closer inversion
Key Claim: Exchanging two adjacent inverted jobs reduces the number of inverses by 1 and does not increase the max lateness. When applied to an optimal solution the result will also be an optimal solution
Proof of claim
let \(l_x\) be the lateness before exchange and \(l'_x\) after for job \(x\).
let \(i,j\) be jobs s.t \(i \lt j\) but \(j\) is before \(i\) in the solution \(S\) (inversion)
\(\forall k \neq i,j : l'_k = l_k\)(1)
\(l'_j \leq l_j\) (2)
if job \(j\) is late (start time is getting later) \(l'_j = f'_j -d_j\)(3) \(l'_j = f_i -d_j\)(4) \(l'_j \leq f_i -d_i\)(5) \(l'_j \leq l_i\)(6)
- There is no change from the exchange to the lateness of any other jobs.
- The start time of \(j\) was moved forward so its new lateness has improved or remained the same
- The new lateness of \(j\) is \(f'_j-d_j\) definition (finish - delay)
- The new finish time of \(j\) is equal to the old finish time of \(i\)
- as \(i \lt j\) by definition \(d_i \lt d_j\) so \(f_i -d_j \leq f_i-d_i\)
- relying on the definition \(l_i = f_i-d_i\)
Shows that the new lateness of \(j)\) is no worse than the old lateness of \(i\) so the solution is still just as optimal.
Final Proof
Let \(S\) be the solution generated bu the earliest-deadline-first strategy
Let \(S'\) be an optimal solution with the fewest inversionsGoodVibes
- Can assume \(S'\) has no idle time (claim 1)
- Case 1 [\(S'\) has no inversions]
-
- Then \(S'=S\) the optimal solution was found
- Case 2 [\(S'\)] has an inversion
-
- let \(i-j\) be an adjacent inversion
-
- exchange \(i,j\), is still an optimal solution
-
- contradicts the “optimal solution with the fewest inversions”
So \(S =S'\) the earliest-deadline-first strategy is optimal
Generic Proof Strategies
Greedy Algorithm Stays Ahead
Show that at every stage of the the greedy algorithm produces a solution that is better then or as good as an optimal solution.
Structural
Discover structural bounds asserting that every solution must have a set value. Then show your algorithm achieves this bound.
Exchange
Gradually transform any optimal solution to the one found by the greedy algorithm without hurting the solution quality.
Greedy Algorithms Shortest Path
Shortest path problems
Single-Pair Shortest path problem
Input:
- A weighted graph/digraph
- A source node(start)
- A destination node(end) Output:
- A shortest path through the graph from the source to the destination
Single-Source shortest path problem
- A weighted graph/digraph
- A source node(start) Output:
- A set of shortest paths through the graph from the source to every node in the graph
- Can be represented as a tree with the source as the node
Dijkstra’s algorithm Greedy
Maintain a set of explored nodes \(S\)
let \(d[u]\) be the length of a shortest \(s \rightarrow u\) path
initialize \(S \leftarrow \{s\}\) , \(d[s] \leftarrow 0\)
repeatedly:
- choose unexplored node \(v\notin S\) that minimises \(\pi(v)\) where:
[\pi(v) = \min_{e = (u,v):u \in S}]
- add \(v\) to \(S\) , \(d[v]\leftarrow \pi(v)\)
- prev[v] \(\leftarrow e\)
The shortest path can then be found by traversing from any point to the start by following prev
Proof
Prove:
For every node \(u \in S\), \(d[u]\) is the shortest path \(s \rightarrow u\)
strategy: induction
base Case:
when \(\vert S \vert = 1\) is easy as \(S = \{s\}\), as \(s[s]=0\) there is no shorter solution
Inductive case:
Assume is true for \(\vert S \vert \geq 1\)
Let \(v\) be the next path added to \(S\) and let \((u,v)\) be the edge
A shortest path \(s \rightarrow u\) + \(L(u,v)\) is a \(s \rightarrow v\) path of length \(\pi(v)\)
Consider anther path \(P\)
- let \(e = (x,y)\) be the first edge in \(P\) that leaves \(S\)
- so \(x \in S \land y \notin nS\)
- let \(P'\) be the subpath of \(s \rightarrow x\)
- the length of \(P \geq \pi(v)\) as soon as it reaches y
[L(P) \geq L(P’)+ L(e) \geq d[x]+L(e) \geq \pi(y) \geq \pi(v)]
when L determines the length of a path or edge
Dijkstra’s algorithm Efficient
Efficient algorithm can be found by implementing some optimizations on the previous strategy.
Optimization 1
For each unexplored node \(v \in S\)
Maintain \(\pi(v)\) instead of computing form the definition
As elements are added to \(S\) for some \(v\notin S , \pi(v)\) can only decrease.
Suppose \(u\) is added to \(S\) if there is an edge \(e=(u,v)\) leaving \(u\)
then:
\(pi(v) \leftarrow \min {\pi(v), \pi(u)+L(e)}\)
Optimization 2
Use a min-optimized priority queue to chose an unexplored node that minimises \(\pi(v)\)
Implementation
graph <- graph to search
s <- start node
dist <- array of distances
prev <- array of previous nodes
//initialize start values
pq = priorityQueue()
for each vertex in graph{
dist[vertex] = INFINITY
prev[vertex] = UNDEFINED
}
dist[s] = 0;
for each vertex in graph{
pq.insert(s,dist[s]);
}
// calculate minimum path
while (pq.empty =false){
u = pq.removeMin();
for edge in graph.edgesFrom(u){
v = edge.goingTo()
if (dist[v] > dist[u]+edge.weight){
dist[v]=dist[u]+edge.weight
pq.decreaseKey(v,dist[v])
prev[v]=u
}
}
}
The shortest path for a node can be found by traversing prev from the node to the start
Complexity
The time complexity fo the algorithm depends on the priority queue algorithm chosen and the ratio of edges to nodes.
For a dense graph \(e\) is \(O(n^2)\) (where \(e\) is the number of edges and \(n\) is the number of nodes) using an array bases implementation is optimal as decreaseKey
is \(O(1)\)
For a sparse graph \(e\) is \(O(n)\) then a heap based method is better as removeMin
is \(O(n)\)
Minimum Spanning Tree
Spanning Tree
A spanning tree fo a graph is a subgraph fot he graph that is both acyclic and connected
Properties
if \(H=(V,T)\) is a subgraph of undirected graph \(G=(V,E)\) then the following are equivalent
- H is a spanning tree of G
- H is connected and hs \(\vert V \vert -1\) edges
- H is acyclic and hs \(\vert V \vert -1\) edges
- H is minimally connected, removing any edge disconnects it
- H is maximally acyclic, adding an edge creates a cycle
Minimum Spanning Tree
Given a weighted connected undirected graph a minimum spanning tree is one there the total sum of it’s edges weights are minimized.
Cuts and Cutset
Def Cut: A partition of nodes into two non-empty subsets of \(S\) and \(V \setminus S\)
Def Cutset: The cutset of a cut S is the ste of edges with exactly one endpoint in S
Proposition
A Cycle and a cutset intersect on an even number of edges
If cycle C is within cut S or outside then the intersection is empty and even
If a node on the Cycle is outside S and the cycle enters S then the cycle must leave, this can be repeated but results in an intersection of even size.
Fundamental Cycle
let \(H=(V,T)\) be a spanning tree of \(G(V,E)\)
For any edge \(e \in E \notin T : (V,T\cup {e})\) contains a cycle \(C\)
For any edge \(f\in C : (v,(T\cup{e})\setminus{f})\) is a spanning tree
if \(C_e \leq C_f\) then \((V,T)\) is not a minimum spanning tree.
Fundamental Cut Set
let \(H=(V,T)\) be a spanning tree of \(G(V,E)\)
for any edge \(f \in T : (V,T\setminus{f})\) has two connected components
Let \(D\) denote the cutset between the two components
For any edge \(e \in D : (V,(T\setminus{f})\cup{e})\) is a spanning tree
if \(C_e \leq C_f\) then \((V,T)\) is not a minimum spanning tree.
Greedy Algorithms
The Red Rule
let \(C\) be a cycle with no red edges
Select an uncolored edge of \(C\) of max cost and color it red.
The Blue Rule
let \(D\) be a cutset with no blue edges
select an uncolored edge of \(D\) of min cost and color it blue
Algorithm
Apply the red and blue rules nondeterministically until all edges are blue or red
The blue edges form a minimum spanning tree
Note: We can stop when n-1 edges are colored blue.
Proof
Step 1
Color Invariant: There exists a MST \((V,T')\) containing every blue edge and no red edge
Induction On Blue Rule
Base Case: No edges colored \(\implies\) Every MST satisfies invariant
Inductive Step:
Suppose the color invariant is true before the blue rule
Let \(D\) be a chosen cutset, let \(f \in D\) be a blue edge
If \(f \in T'\) then \(T'\) still satisfies the invariant
Otherwise: Consider fundament cycle \(C\) by adding \(f\) to \(T'\)
let \(e \in C\) be another edge in \(D\)
we can show \(e\)is uncolored and \(C_e \geq C_f\)
- as \(e \in T' \implies e\) is not red
- as blue rule \(\implies e\) not blue \(C_e \geq C_f\)
Thus \((T' \cup {f})\setminus {e}\) satisfies the invariant
Induction on Red Rule
Base Case: No edges colored \(\implies\) Every MST satisfies invariant
Suppose the color invariant is true before the red rule
let \(C\) be the chosen cycle and let \(e\) be the edge colored red
If \(e \notin T'\) then the \(T'\) still satisfies the invariant
Otherwise: Consider the fundamental cutset \(D\) by deleting \(e\) form \(T'\)
let \(f \in D\) be an edge in \(C\)
we can show \(f\) is uncolored and \(C_e \geq C_f\)
- as \(f \notin T' \implies f\) is not blue
- as red rule \(\implies f\) not red and \(C_e \geq C_f\)
Thus \((T' \cup {f})\setminus {e}\) satisfies the invariant
Step 2
Theorem: The algorithm terminates
Show that a uncolored edge or any other uncolored ede can be colored red or blue.
Select uncolored edge \(e\)
The blue edges form a forrest
Case 1
Both endpoints of \(e\) are in the same blue tree
- Apply red rule to the cycle formed by adidng \(e\) to the forrest
- Coloring \(e\) red
Case 2
Both endpoints of \(e\) are in different blue trees
- Apply blue rule to the cutset introduces by either of the two blue trees
- Some uncolored line in the cutset will be colored
Prim’s Algorithm
Algorithm
initialize \(S = {s}\) for any node \(s\)
let \(T = \emptyset\)
repeat \(n-1\) times
- Add to \(T\) a minimum cost edge with exactly one endpoint in \(S\)
- Add the other endpoint to \(S\)
Proof
Theorem: Prim’s algorithms computes a MST
Assume all vertices in \(S\) are connected by a blue edges
All vertices in cutset \(C\) of \(S\) are nto colored can apply blue rule by coloring cheapest edge in \(C\) blue
Complexity
Prims algorithm can run in \(O(m\log n)\) time
Pseudocode
graph <- graph to find MST
S <- empty set
T <- empty set
dist <- current known distances
//initialize start values
pq = priorityQueue()
for each vertex in graph{
dist[vertex] = INFINITY
prev[vertex] = UNDEFINED
}
S.add( arbitrary node from graph)
dist[s] = 0;
for each vertex in graph{
pq.insert(s,dist[s]);
}
// calculate minimum path
while (pq.empty =false){
u = pq.removeMin();
S.add(u)
for edge (u,v) where v not in S{
if (edge.length < dist[v]){
dist[v]=edge.weight
pq.decreaseKey(v,dist[v])
prev[v]=edge
}
}
}
Kruskal’s Algorithms
Consider edges in acceding order of coset
- Add to the tree unless it would form a cycle
Proof
Theorem: kruskal’s algorithm computes an MST
select edge \(e\)
Case 1: both end points of e in the same blue tree
- Color e by applying red rule to unique cycle Case 2: both endpoints of e in different blue trees
- Color e blue by applying blue rule to cutset defined by either tree
Complexity
Kruskal’s algorithm can run in \(O(m \log m)\)
Pseudocode
Using union-find data structure to dynamically maintain connected components
edges <- edges in the graph
vertices <- vertices in the graph
T <- empty set
Sort edges by cost
uf=UnionFind()
for vertex of vertices{
uf.makeSet(vertex)
}
for edge of edges{
if (uf.getSet(edge.start)!=uf.getSet(edge.end)){
T.add(edge)
uf.union(u,v)
}
}
return T
Reverse-delete algorithm
Start with all edges in T
Sort into descending order
Delete edge form T unless it would disconnect T
Proof
Theorem the reverse-delete algorithm
Case 1: removing edge does not disconnects \(T\)
- apply red rule to the cycle the line becomes red Case 2: removing edge disconnects \(T\)
- apply blue rule to cutset \(D\) introduced by either component
Complexity
Can be done in \(O(m \log n (\log \log n)^3)\)