Line breaking slides
The problem
- Dividing words into lines
- Esthetic results (big differences between breakings; how to quantify them?)
- It is an optimization problem (bad news)
- It is not NP-complete (good news)

What is unesthetic?
- Paragraphs with big holes in them
- Paragraphs with words squeezed together
- Paragraphs with words pulled apart
- Uneven paragraphs

Strategies
- Choice: use dynamic programming or a greedy algorithm
- Define a penalty function
- Minimize the penalty function over the paragraph

Greedy algorithms
- Make the locally best choice at each step and hope the overall result is the best

Greedy text breaking
- Fill up each line until full

    l = 0; i = 1
    while (i <= n)
        l += 1
        linestart[l] = i
        length[l] = wordlength[i]
        while (i < n && length[l] + wordlength[i+1] <= limit)
            length[l] += wordlength[i+1]
            i += 1
        endwhile
        i += 1
    endwhile

- At the end, linestart[l] has the index of the initial word in line l, and length[l] has the length of that line.

What is dynamic programming?
- An optimization strategy that takes advantage of the structure of the problem
- Based on divide and conquer
- Based on remembering intermediate results

Simplest case: memoizing a function
- In a non-search context, the Fibonacci function:

    f(n) = n                    if n < 2
    f(n) = f(n-1) + f(n-2)      otherwise

- Evaluated naively this is inefficient, because the same values are recomputed exponentially often, unless we remember each value of f(n) once we have calculated it.
- Dynamic programming works when we can simplify exponential divide and conquer by remembering a polynomial number of results of smaller problems, whose solutions can be used to solve the larger problem.

Dynamic line breaking
- Search for the minimum of the penalty function over the paragraph
- We will consider the value of adding a break at every point in the paragraph
- But we can ensure that we don't need to consider every combination of previous breaks with that break
- We need to guarantee that no break we consider later can invalidate an earlier optimum

The dynamic line breaker
- Array cost[i] is the lowest total penalty for breaking words i to n into lines, given that a line starts at word i.
- length(i, j) = sum of the word lengths from word i up to, but not including, word j. Here j is the first word of the next line, so j = n+1 means the line runs to the end of the paragraph.
- Function Legal(i, j) = boolean test as to whether the line holding words i to j-1 has a legal line length.
- nextbreak[i] = if a line starts at word[i], the index of the word that starts the next line in the best breaking of words i to n.

    cost[n+1] = 0
    for i = n down to 1
        if length(i, n+1) <= optimum
            cost[i] = 0
            nextbreak[i] = n + 1
        else if Legal(i, j) for some j
            choose r such that Legal(i, r) and cost[r] + penalty(line(i, r)) is minimal
            cost[i] = cost[r] + penalty(line(i, r))
            nextbreak[i] = r
        else
            cost[i] = infinity

- Here optimum is the optimum (target) line length: if words i to n fit within it, they form the final line at no penalty.
- At the end, if cost[1] < infinity, there is a legal, optimal breaking sequence. To get that sequence, follow the nextbreak array starting from nextbreak[1]: the second line starts at word nextbreak[1], the third at nextbreak[nextbreak[1]], and so on until n+1 is reached.

Dynamic line breaking discussion

For simplicity, the algorithm given works backwards from the end of the paragraph to the beginning. This makes the special case of zero penalty for words on the final line easy to accommodate within the algorithm.

There are a number of reasons, however, why one might want to run the algorithm in the other direction. For one thing, in an interactive system the line break array is likely to be easier to process incrementally in forward order, since typed input often appears at the end of a paragraph and in any case always follows previously typed data. Making the algorithm incremental in this way requires recalculating the penalties from before the final line, so that the special case for the last line is handled correctly.

Breaking lines in a forward direction is also essential if paragraphs with variable line lengths are to be accommodated. While it is easy to parameterize the line length by the current line number, the line number is not known when breaking from the end of a paragraph back to the beginning.
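The greedy breaker and the backward dynamic breaker described above can be sketched side by side in Python. This is only an illustrative sketch under several assumptions that are not in the slides: words are 0-indexed, one space separates adjacent words, a line is legal when its width does not exceed `limit`, and the penalty of a non-final line is the square of its unused width. The function names (`greedy_breaks`, `optimal_breaks`, `paragraph_penalty`, `line_penalty`, `line_length`) are hypothetical.

```python
import math


def line_length(word_lengths, i, j):
    """Width of a line holding words i..j-1 (0-based), one space between words."""
    return sum(word_lengths[i:j]) + (j - i - 1)


def line_penalty(length, limit, is_last):
    """Assumed penalty: squared unused width; the final line is free."""
    return 0 if is_last else (limit - length) ** 2


def greedy_breaks(word_lengths, limit):
    """Fill each line until the next word no longer fits; return line starts."""
    starts, i, n = [], 0, len(word_lengths)
    while i < n:
        starts.append(i)
        length = word_lengths[i]  # an over-long word still gets its own line
        while i + 1 < n and length + 1 + word_lengths[i + 1] <= limit:
            length += 1 + word_lengths[i + 1]
            i += 1
        i += 1
    return starts


def optimal_breaks(word_lengths, limit):
    """Backward dynamic program; returns line starts, or None if no legal breaking."""
    n = len(word_lengths)
    cost = [math.inf] * n + [0]        # cost[n] = 0: nothing left to break
    nextbreak = [None] * (n + 1)
    for i in range(n - 1, -1, -1):
        length = -1                    # cancels the space added before the first word
        for r in range(i + 1, n + 1):  # r = first word of the next line
            length += 1 + word_lengths[r - 1]
            if length > limit:         # line i..r-1 is no longer legal
                break
            total = cost[r] + line_penalty(length, limit, r == n)
            if total < cost[i]:
                cost[i], nextbreak[i] = total, r
    if cost[0] == math.inf:
        return None
    starts, i = [], 0
    while i < n:                       # follow nextbreak from the first word
        starts.append(i)
        i = nextbreak[i]
    return starts


def paragraph_penalty(word_lengths, starts, limit):
    """Total penalty of a breaking given as a list of line-start indices."""
    n = len(word_lengths)
    ends = starts[1:] + [n]
    return sum(
        line_penalty(line_length(word_lengths, i, j), limit, j == n)
        for i, j in zip(starts, ends)
    )
```

For word lengths `[3, 2, 2, 5]` and a limit of 6, `greedy_breaks` starts lines at words 0, 2 and 3 (total penalty 16 under the assumed penalty), while `optimal_breaks` starts them at words 0, 1 and 3 (total penalty 10). Leaving the first line short makes the second line fuller.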