Line breaking slides

Updates

The problem

Dividing words into lines

Esthetic results (big differences, how to quantify)

optimization problem (bad news)

Not NP-complete (good news)

What is unesthetic?

paragraphs with big holes in them

paragraphs with words squeezed together

paragraphs with words pulled apart

uneven paragraphs

strategies

choice: use dynamic programming or a greedy algorithm

Define penalty function

minimize penalty function over paragraph
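
As one illustration of a penalty function, each line might be charged the square of its unused width, so that one very loose line costs more than several slightly loose ones. The names `line_penalty` and `paragraph_penalty`, the squared-slack form, and the one-unit gap between words are all assumptions made for this sketch; the slides do not fix a particular penalty.

```python
def line_penalty(word_lengths, limit, space=1):
    """Penalty for setting the given words on one line of width `limit`.

    Assumed form: the square of the leftover width, or infinity when the
    words do not fit. `space` is the assumed width of one inter-word gap.
    """
    gaps = max(len(word_lengths) - 1, 0)
    used = sum(word_lengths) + space * gaps
    if used > limit:
        return float("inf")
    return (limit - used) ** 2


def paragraph_penalty(lines, limit):
    """Total penalty of a paragraph given as a list of lines of word lengths."""
    return sum(line_penalty(line, limit) for line in lines)
```

Squaring the slack is one way to make "uneven paragraphs" expensive: two lines with slack 2 each cost 8 in total, while one line with slack 4 and one with slack 0 cost 16.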

Greedy algorithms

make the best available choice at each step, and hope the overall result is the best

greedy text breaking

fill up each line until full

l = 0
i = 1
while (i <= n)
    l += 1
    linestart[l] = i
    length[l] = wordlength[i]
    i += 1
    while (i <= n && length[l] + wordlength[i] <= limit)
       length[l] += wordlength[i]
       i += 1
    endwhile
endwhile

At the end:

linestart[l] holds the index of the first word in line l.
length[l] holds the length of that line.
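
The greedy strategy can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name `greedy_break` is hypothetical, indices are 0-based, spaces between words are ignored as in the pseudocode, and a word longer than `limit` is assumed to sit alone on its own line.

```python
def greedy_break(word_lengths, limit):
    """Return (line_start, line_length) lists using first-fit line filling.

    line_start[k] is the 0-based index of the first word on line k and
    line_length[k] is the summed word length of that line.
    """
    line_start = []
    line_length = []
    i = 0
    n = len(word_lengths)
    while i < n:
        # Every line takes at least one word, even an oversized one.
        line_start.append(i)
        length = word_lengths[i]
        i += 1
        # Keep adding words while the next one still fits.
        while i < n and length + word_lengths[i] <= limit:
            length += word_lengths[i]
            i += 1
        line_length.append(length)
    return line_start, line_length
```

For example, `greedy_break([3, 4, 2, 5, 1], 8)` puts words 0 and 1 on the first line and the remaining three words on the second.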

What is dynamic programming?

an optimization strategy that takes advantage of the structure of the problem

based on divide and conquer

based on remembering intermediate results

Simplest case: memoizing a function

In a non-search context, the Fibonacci function:

f(n) = (n > 1) f(n-1) + f(n-2)

(else) n

This is inefficient, unless we remember all values of f(n) once we calculate them.
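
A minimal Python sketch of memoizing, assuming the usual base cases f(0) = 0 and f(1) = 1. The names `fib_naive` and `fib_memo` are illustrative; `functools.lru_cache` is the standard-library decorator that remembers earlier results.

```python
from functools import lru_cache


def fib_naive(n):
    """Plain recursion: recomputes the same subproblems exponentially often."""
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Same recurrence, but each f(k) is computed once and then remembered."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)
```

With the cache, computing `fib_memo(n)` evaluates only n + 1 distinct subproblems, whereas `fib_naive` makes a number of calls that grows exponentially in n.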

Dynamic programming works when we can simplify exponential divide and conquer by remembering a polynomial number of results of smaller problems, whose solutions can be used to solve the larger problem.

dynamic line breaking

search for minimum penalty function for paragraph

we will consider the value of adding a break at every point in the paragraph

But we can arrange things so that a break does not have to be considered in combination with every set of earlier breaks

We need to guarantee that no break we consider later can invalidate an earlier optimum

The Dynamic line breaker

array cost[i] is the lowest total penalty for breaking
words i to n into lines, given that a line starts at word i.

line(i, r) = the line holding words i to r-1, so that
word r starts the next line.

length(i, r) = sum of the word lengths in line(i, r).

Function Legal(i, r) = boolean test as to
whether line(i, r) has a legal line length.

nextbreak[i] = if a line starts at word i, index of the word
that starts the next line in the best breaking of words i to n.

cost[n+1] = 0
for i = n down to 1
   if length(i, n+1) <= optimum
      cost[i] = 0                (final line: no penalty)
      nextbreak[i] = n + 1
   else if Legal(i, r) for some r
      choose r such that Legal(i, r)
            and cost[r] + penalty(line(i, r)) is minimal
      cost[i] = cost[r] + penalty(line(i, r))
      nextbreak[i] = r
   else
      cost[i] = infinity

at the end, if cost[1] < infinity, there is a legal,
optimal breaking sequence.

To get that sequence, follow the nextbreak array
from word 1: lines start at 1, nextbreak[1],
nextbreak[nextbreak[1]], and so on until n+1 is reached.
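
The backward pass can be sketched in Python as follows. Several details are assumptions rather than anything the slides specify: the name `dynamic_break`, a line counting as legal when its word lengths sum to at most `limit`, a penalty equal to the squared unused width, and a single `limit` standing in for both the legal and the optimum length. Indices are 0-based, and a line (i, r) holds words i to r-1.

```python
def dynamic_break(word_lengths, limit):
    """Return (total_cost, line_starts) for a minimum-penalty breaking.

    Returns (inf, []) when no legal breaking exists, for example when a
    single word is longer than `limit`.
    """
    n = len(word_lengths)
    INF = float("inf")
    # prefix[k] = total length of the first k words, for O(1) line lengths.
    prefix = [0]
    for w in word_lengths:
        prefix.append(prefix[-1] + w)

    cost = [INF] * (n + 1)
    nextbreak = [n] * (n + 1)
    cost[n] = 0
    for i in range(n - 1, -1, -1):
        if prefix[n] - prefix[i] <= limit:
            # Everything from word i onward fits on the free final line.
            cost[i] = 0
            nextbreak[i] = n
            continue
        for r in range(i + 1, n):
            slack = limit - (prefix[r] - prefix[i])
            if slack < 0:
                break  # any longer line starting at i is illegal too
            candidate = cost[r] + slack ** 2
            if candidate < cost[i]:
                cost[i] = candidate
                nextbreak[i] = r

    if cost[0] == INF:
        return INF, []
    # Follow nextbreak from the first word to recover the line starts.
    starts = []
    i = 0
    while i < n:
        starts.append(i)
        i = nextbreak[i]
    return cost[0], starts
```

With word lengths 3, 2, 2, 5 and a limit of 6, filling each line greedily gives lines of length 5, 2, 5 and a penalty of 1 + 16 = 17 under this measure, while the dynamic version chooses lines of length 3, 4, 5 for a penalty of 9 + 4 = 13.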

dynamic line breaking discussion

For simplicity, the algorithm given works backwards from the end of the paragraph to the beginning, which makes the special case of zero penalty for the final line easy to accommodate within the algorithm. There are a number of reasons, however, why one might want to run the algorithm in the other direction. For one thing, in an interactive system the line break array is likely to be easier to update incrementally in forward order, since typed input often appears at the end of a paragraph and, in any case, always follows previously typed data. Making the algorithm incremental in this way requires recalculating penalties from before the final line, so that the special case for the last line is handled correctly.

Breaking lines in a forward direction is also essential if paragraphs with variable line lengths are to be accommodated; while it is easy to parameterize the line-length by current line number, the line number is not known when breaking from the end of a paragraph back to the beginning.
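
One way to break forward with a line length that depends on the line number is to add the line number to the table, as in the sketch below. Everything here is an assumption made for illustration: the name `forward_break`, the hypothetical callback `width_of_line(k)` giving the width of line k (0-based), the squared-slack penalty, and the free final line.

```python
def forward_break(word_lengths, width_of_line):
    """Minimum-penalty breaking computed from the front of the paragraph.

    best[k][j] is the lowest penalty for setting the first j words in
    exactly k lines. Returns (total_cost, line_starts), or (inf, []) when
    no legal breaking exists.
    """
    n = len(word_lengths)
    INF = float("inf")
    if n == 0:
        return 0, []
    prefix = [0]
    for w in word_lengths:
        prefix.append(prefix[-1] + w)

    best = [[INF] * (n + 1) for _ in range(n + 1)]
    prev = [[None] * (n + 1) for _ in range(n + 1)]
    best[0][0] = 0
    answer = (INF, None, None)  # (cost, index of last line, its first word)
    for k in range(n):
        width = width_of_line(k)
        for i in range(k, n):
            if best[k][i] == INF:
                continue
            # Option 1: words i .. n-1 form the final line k, at no charge.
            if prefix[n] - prefix[i] <= width and best[k][i] < answer[0]:
                answer = (best[k][i], k, i)
            # Option 2: words i .. j-1 form a non-final line k.
            for j in range(i + 1, n):
                slack = width - (prefix[j] - prefix[i])
                if slack < 0:
                    break
                candidate = best[k][i] + slack ** 2
                if candidate < best[k + 1][j]:
                    best[k + 1][j] = candidate
                    prev[k + 1][j] = i

    total, last_line, i = answer
    if total == INF:
        return INF, []
    # Walk back through prev to collect the first word of every line.
    starts = []
    for level in range(last_line, -1, -1):
        starts.append(i)
        i = prev[level][i]
    starts.reverse()
    return total, starts
```

For example, `forward_break([3, 2, 2, 5], lambda k: 4 if k == 0 else 6)` models a paragraph whose first line is narrower than the rest. The price of tracking the line number is a table with a quadratic number of entries, which is a design trade-off of this sketch rather than a requirement of forward breaking in general.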