Greedy Algorithms Scheduling

Greedy Algorithms

Interval Scheduling

The Problem

Given a set of jobs with a start and end time. Find the largest set of jobs such that no two jobs are running at the same time (the jobs are compatible).

Input: A Set of jobs with a start and end time

Output: The largest set of jobs such that all are compatible with each other

Strategy

  • Sort the jobs by end time in ascending order
  • Create an empty set of selected jobs
  • Iterate though the set of jobs
    • If a job is compatible with the current selected jobs then add it
Checking For compatibility
  • keep a record of the last job added
  • if the start time is after, or at the finish time of the last job added the jobs are compatible

The set is compatible with itself and the last job added has the last finish time so any job that occurs after the finish is compatible with the rest fo the set.

Pseudocode

jobs <- The available jobs with .start and .end

jobs = jobs.sort(job.end) // Sort the job by end time in increasing order O(nlogn)
selected = [] // empty set
for( job : jobs ){
    // the first job is added by default
    if (selected.length ==0 || last.finish <= job.start){
        selected.add(job)
        last=job
    }
}
return selected // An array of selected Jobs

Proof

It is necessary to prove that this algorithm generates the most efficient set.

proof by contradiction

  • Assume the set (\(i_1,i_2,i_3...i_n\)) generated by the algorithm is not the most efficient
  • The most efficient set (\(j_1,j_2,j_3...j_m\)) also exists such that the largest \(r\) can be found
  • Select an \(r\) such that \(\forall v < r : i_v = j_v \land i_r \neq j_r\) (r is the largest value such that all pairs before r are equal)

As $i_r$ is selected by the finish first algorithm; no job exists that is compatible with \(\{ i_1...i_{r-1} \}\) and finishes before \(i_r\) so \(j_r\) finishes after or with \(i_r\)

This implies that \(i_r\) can replace \(j_r\) this is contradictory as it goes against the maximality of \(r\)

A more efficient solution can not be found, so the algorithm provides the most efficient solution

Interval Partitioning Problem

Given a set of jobs (intervals) schedule the jobs into the least number of rooms such that at any given time a room only has one job in it.

Input: A set of jobs with a start and end time

Output: A set of schedules for each room

Strategy

  • Sort the jobs by start time
  • Going through each job in turn
    • check if the new schedule and add the job

**Checking For compatibility

  • Store the schedules in a priority queue
  • use the end time of the last job added as the key
  • Check compatibility with queue.Find_Min
  • if is not compatible is not compatible with any
  • if is compatible add to it

Pseudocode

rooms= new PriorityQueue
jobs = jobs.sort(job.start)
for (job in jobs){
    // the first job is added by default
    if (rooms.length>0 && rooms.find_Min_Key <= job.start){
        room=rooms.pop_Min()
        room.add(job)
        rooms.add(room,job.end)
    }else{
        room=[job]
        rooms.add(room,job.end)
    }
}
return rooms.values();

Proof

terminology

The depth of a set of jobs is the maximum number of jobs running at the same time.

equally

The depth of a set of intervals is the maximum number of intervals that contain any given point

When rooms.find_Min_Key is called the room with the earliest finish time is found, there are no rooms that has a finish time earlier. This shows the depth = roomCount. Adding the job means depth+=1 and roomCount+=1, roomCount matches depth. A more efficient solution can’t be found as depth \(\nless\) roomCount

Minimizing Lateness

Given a set of jobs with a running time and deadline, order the jobs to result in the smallest maximum lateness.

Input: Jobs with running time and deadline

Output: Schedule of Jobs

Strategy

  • Sort the jobs by the deadline

This is the optimal strategy adding start and end times can be added

Pseudocode

jobs <- The available jobs with .time and .deadline

jobs = jobs.sort(job.deadline) // Sort the job by deadline time in increasing order O(nlogn)

//not needed for correct order just being explicit
time=0
for (job in jobs){
    job.setStart(time);
    job.setEnd(time+job.time);
    time+=job.time;
}
return jobs

Proof

Prove that the strategy produces an optimal solution

Observation 1: There exists an optimal solution with no idle time

Observation 2: The earliest-deadline-first strategy produces no idle time

Definition: Given a schedule S an inversion is a pair of jobs \(i\) , \(j\) where \(i \lt j\) (\(i\)’s deadline is before \(j\)’s) however \(j\) appears before \(i\)

Observation 3: The earliest-deadline-first strategy produces the unique schedule with no inversions

Observation 4: if an idle-free schedule has an inversion it has an adjacent inversion

Proof:

  • Let \(i , j\) be the closest inversion
  • Let \(k\) be the element to the right of \(j\)
  • Case 1 [\(j \gt k\)] then \(j ,k\) is an adjacent inversion
  • Case 2 [\(j \lt k\)] then \(i,k\) is a closer inversion as \(i \lt j \lt K\) this is contradictory as \(i,j\) is a closer inversion

Key Claim: Exchanging two adjacent inverted jobs reduces the number of inverses by 1 and does not increase the max lateness. When applied to an optimal solution the result will also be an optimal solution

Proof of claim

let \(l_x\) be the lateness before exchange and \(l'_x\) after for job \(x\).

let \(i,j\) be jobs s.t \(i \lt j\) but \(j\) is before \(i\) in the solution \(S\) (inversion)

\(\forall k \neq i,j : l'_k = l_k\)(1)

\(l'_j \leq l_j\) (2)

if job \(j\) is late (start time is getting later) \(l'_j = f'_j -d_j\)(3) \(l'_j = f_i -d_j\)(4) \(l'_j \leq f_i -d_i\)(5) \(l'_j \leq l_i\)(6)

  1. There is no change from the exchange to the lateness of any other jobs.
  2. The start time of \(j\) was moved forward so its new lateness has improved or remained the same
  3. The new lateness of \(j\) is \(f'_j-d_j\) definition (finish - delay)
  4. The new finish time of \(j\) is equal to the old finish time of \(i\)
  5. as \(i \lt j\) by definition \(d_i \lt d_j\) so \(f_i -d_j \leq f_i-d_i\)
  6. relying on the definition \(l_i = f_i-d_i\)

Shows that the new lateness of \(j)\) is no worse than the old lateness of \(i\) so the solution is still just as optimal.

Final Proof

Let \(S\) be the solution generated bu the earliest-deadline-first strategy

Let \(S'\) be an optimal solution with the fewest inversionsGoodVibes

  • Can assume \(S'\) has no idle time (claim 1)
  • Case 1 [\(S'\) has no inversions]
    • Then \(S'=S\) the optimal solution was found
  • Case 2 [\(S'\)] has an inversion
    • let \(i-j\) be an adjacent inversion
    • exchange \(i,j\), is still an optimal solution
    • contradicts the “optimal solution with the fewest inversions”

So \(S =S'\) the earliest-deadline-first strategy is optimal

Generic Proof Strategies

Greedy Algorithm Stays Ahead

Show that at every stage of the the greedy algorithm produces a solution that is better then or as good as an optimal solution.

Structural

Discover structural bounds asserting that every solution must have a set value. Then show your algorithm achieves this bound.

Exchange

Gradually transform any optimal solution to the one found by the greedy algorithm without hurting the solution quality.




Greedy Algorithms Shortest Path

Shortest path problems

Single-Pair Shortest path problem

Input:

  • A weighted graph/digraph
  • A source node(start)
  • A destination node(end) Output:
  • A shortest path through the graph from the source to the destination

Single-Source shortest path problem

  • A weighted graph/digraph
  • A source node(start) Output:
  • A set of shortest paths through the graph from the source to every node in the graph
  • Can be represented as a tree with the source as the node

Dijkstra’s algorithm Greedy

Maintain a set of explored nodes \(S\)

let \(d[u]\) be the length of a shortest \(s \rightarrow u\) path

initialize \(S \leftarrow \{s\}\) , \(d[s] \leftarrow 0\)

repeatedly:

  • choose unexplored node \(v\notin S\) that minimises \(\pi(v)\) where:

[\pi(v) = \min_{e = (u,v):u \in S}]

  • add \(v\) to \(S\) , \(d[v]\leftarrow \pi(v)\)
  • prev[v] \(\leftarrow e\)

The shortest path can then be found by traversing from any point to the start by following prev

Proof

Prove:

For every node \(u \in S\), \(d[u]\) is the shortest path \(s \rightarrow u\)

strategy: induction

base Case:

when \(\vert S \vert = 1\) is easy as \(S = \{s\}\), as \(s[s]=0\) there is no shorter solution

Inductive case:

Assume is true for \(\vert S \vert \geq 1\)

Let \(v\) be the next path added to \(S\) and let \((u,v)\) be the edge

A shortest path \(s \rightarrow u\) + \(L(u,v)\) is a \(s \rightarrow v\) path of length \(\pi(v)\)

Consider anther path \(P\)

  • let \(e = (x,y)\) be the first edge in \(P\) that leaves \(S\)
    • so \(x \in S \land y \notin nS\)
  • let \(P'\) be the subpath of \(s \rightarrow x\)
  • the length of \(P \geq \pi(v)\) as soon as it reaches y

[L(P) \geq L(P’)+ L(e) \geq d[x]+L(e) \geq \pi(y) \geq \pi(v)]

when L determines the length of a path or edge

Dijkstra’s algorithm Efficient

Efficient algorithm can be found by implementing some optimizations on the previous strategy.

Optimization 1

For each unexplored node \(v \in S\)

Maintain \(\pi(v)\) instead of computing form the definition

As elements are added to \(S\) for some \(v\notin S , \pi(v)\) can only decrease.

Suppose \(u\) is added to \(S\) if there is an edge \(e=(u,v)\) leaving \(u\)

then:

\(pi(v) \leftarrow \min {\pi(v), \pi(u)+L(e)}\)

Optimization 2

Use a min-optimized priority queue to chose an unexplored node that minimises \(\pi(v)\)

Implementation

graph <- graph to search
s <- start node

dist <- array of distances
prev <- array of previous nodes

//initialize start values
pq = priorityQueue()
for each vertex in graph{
    dist[vertex] = INFINITY
    prev[vertex] = UNDEFINED
}
dist[s] = 0;
for each vertex in graph{
    pq.insert(s,dist[s]);
}
// calculate minimum path
while (pq.empty =false){
    u = pq.removeMin();
    for edge in graph.edgesFrom(u){
        v = edge.goingTo() 
        if (dist[v] > dist[u]+edge.weight){
            dist[v]=dist[u]+edge.weight
            pq.decreaseKey(v,dist[v])
            prev[v]=u
        }
    }

}

The shortest path for a node can be found by traversing prev from the node to the start

Complexity

The time complexity fo the algorithm depends on the priority queue algorithm chosen and the ratio of edges to nodes.

For a dense graph \(e\) is \(O(n^2)\) (where \(e\) is the number of edges and \(n\) is the number of nodes) using an array bases implementation is optimal as decreaseKey is \(O(1)\)

For a sparse graph \(e\) is \(O(n)\) then a heap based method is better as removeMin is \(O(n)\)




Minimum Spanning Tree

Spanning Tree

A spanning tree fo a graph is a subgraph fot he graph that is both acyclic and connected

Properties

if \(H=(V,T)\) is a subgraph of undirected graph \(G=(V,E)\) then the following are equivalent

  • H is a spanning tree of G
  • H is connected and hs \(\vert V \vert -1\) edges
  • H is acyclic and hs \(\vert V \vert -1\) edges
  • H is minimally connected, removing any edge disconnects it
  • H is maximally acyclic, adding an edge creates a cycle

Minimum Spanning Tree

Given a weighted connected undirected graph a minimum spanning tree is one there the total sum of it’s edges weights are minimized.

Cuts and Cutset

Def Cut: A partition of nodes into two non-empty subsets of \(S\) and \(V \setminus S\)

Def Cutset: The cutset of a cut S is the ste of edges with exactly one endpoint in S

Proposition

A Cycle and a cutset intersect on an even number of edges

If cycle C is within cut S or outside then the intersection is empty and even

If a node on the Cycle is outside S and the cycle enters S then the cycle must leave, this can be repeated but results in an intersection of even size.

Fundamental Cycle

let \(H=(V,T)\) be a spanning tree of \(G(V,E)\)

For any edge \(e \in E \notin T : (V,T\cup {e})\) contains a cycle \(C\)

For any edge \(f\in C : (v,(T\cup{e})\setminus{f})\) is a spanning tree

if \(C_e \leq C_f\) then \((V,T)\) is not a minimum spanning tree.

Fundamental Cut Set

let \(H=(V,T)\) be a spanning tree of \(G(V,E)\)

for any edge \(f \in T : (V,T\setminus{f})\) has two connected components

Let \(D\) denote the cutset between the two components

For any edge \(e \in D : (V,(T\setminus{f})\cup{e})\) is a spanning tree

if \(C_e \leq C_f\) then \((V,T)\) is not a minimum spanning tree.

Greedy Algorithms

The Red Rule

let \(C\) be a cycle with no red edges

Select an uncolored edge of \(C\) of max cost and color it red.

The Blue Rule

let \(D\) be a cutset with no blue edges

select an uncolored edge of \(D\) of min cost and color it blue

Algorithm

Apply the red and blue rules nondeterministically until all edges are blue or red

The blue edges form a minimum spanning tree

Note: We can stop when n-1 edges are colored blue.

Proof

Step 1

Color Invariant: There exists a MST \((V,T')\) containing every blue edge and no red edge

Induction On Blue Rule

Base Case: No edges colored \(\implies\) Every MST satisfies invariant

Inductive Step:

Suppose the color invariant is true before the blue rule

Let \(D\) be a chosen cutset, let \(f \in D\) be a blue edge

If \(f \in T'\) then \(T'\) still satisfies the invariant

Otherwise: Consider fundament cycle \(C\) by adding \(f\) to \(T'\)

let \(e \in C\) be another edge in \(D\)

we can show \(e\)is uncolored and \(C_e \geq C_f\)

  • as \(e \in T' \implies e\) is not red
  • as blue rule \(\implies e\) not blue \(C_e \geq C_f\)

Thus \((T' \cup {f})\setminus {e}\) satisfies the invariant

Induction on Red Rule

Base Case: No edges colored \(\implies\) Every MST satisfies invariant

Suppose the color invariant is true before the red rule

let \(C\) be the chosen cycle and let \(e\) be the edge colored red

If \(e \notin T'\) then the \(T'\) still satisfies the invariant

Otherwise: Consider the fundamental cutset \(D\) by deleting \(e\) form \(T'\)

let \(f \in D\) be an edge in \(C\)

we can show \(f\) is uncolored and \(C_e \geq C_f\)

  • as \(f \notin T' \implies f\) is not blue
  • as red rule \(\implies f\) not red and \(C_e \geq C_f\)

Thus \((T' \cup {f})\setminus {e}\) satisfies the invariant

Step 2

Theorem: The algorithm terminates

Show that a uncolored edge or any other uncolored ede can be colored red or blue.

Select uncolored edge \(e\)

The blue edges form a forrest

Case 1

Both endpoints of \(e\) are in the same blue tree

  • Apply red rule to the cycle formed by adidng \(e\) to the forrest
  • Coloring \(e\) red

Case 2

Both endpoints of \(e\) are in different blue trees

  • Apply blue rule to the cutset introduces by either of the two blue trees
  • Some uncolored line in the cutset will be colored

Prim’s Algorithm

Algorithm

initialize \(S = {s}\) for any node \(s\)

let \(T = \emptyset\)

repeat \(n-1\) times

  • Add to \(T\) a minimum cost edge with exactly one endpoint in \(S\)
  • Add the other endpoint to \(S\)

Proof

Theorem: Prim’s algorithms computes a MST

Assume all vertices in \(S\) are connected by a blue edges

All vertices in cutset \(C\) of \(S\) are nto colored can apply blue rule by coloring cheapest edge in \(C\) blue

Complexity

Prims algorithm can run in \(O(m\log n)\) time

Pseudocode

graph <- graph to find MST

S <- empty set
T <- empty set

dist <- current known distances

//initialize start values
pq = priorityQueue()
for each vertex in graph{
    dist[vertex] = INFINITY
    prev[vertex] = UNDEFINED
}
S.add( arbitrary node from graph)
dist[s] = 0;
for each vertex in graph{
    pq.insert(s,dist[s]);
}
// calculate minimum path
while (pq.empty =false){
    u = pq.removeMin();
    S.add(u)
    for edge (u,v) where v not in S{
        if (edge.length < dist[v]){
            dist[v]=edge.weight
            pq.decreaseKey(v,dist[v])
            prev[v]=edge
        }
    }
}

Kruskal’s Algorithms

Consider edges in acceding order of coset

  • Add to the tree unless it would form a cycle

Proof

Theorem: kruskal’s algorithm computes an MST

select edge \(e\)

Case 1: both end points of e in the same blue tree

  • Color e by applying red rule to unique cycle Case 2: both endpoints of e in different blue trees
  • Color e blue by applying blue rule to cutset defined by either tree

Complexity

Kruskal’s algorithm can run in \(O(m \log m)\)

Pseudocode

Using union-find data structure to dynamically maintain connected components

edges <- edges in the graph
vertices <- vertices in the graph
T <- empty set

Sort edges by cost
uf=UnionFind()

for vertex of vertices{
    uf.makeSet(vertex)
}
for edge of edges{
    if (uf.getSet(edge.start)!=uf.getSet(edge.end)){
        T.add(edge)
        uf.union(u,v)
    }
}
return T

Reverse-delete algorithm

Start with all edges in T

Sort into descending order

Delete edge form T unless it would disconnect T

Proof

Theorem the reverse-delete algorithm

Case 1: removing edge does not disconnects \(T\)

  • apply red rule to the cycle the line becomes red Case 2: removing edge disconnects \(T\)
  • apply blue rule to cutset \(D\) introduced by either component

Complexity

Can be done in \(O(m \log n (\log \log n)^3)\)