# =====================================================================
# xmlbreak.sed:
# break long [x]html input into lines, for better RCS versioning.
#
# Copyright (c) 2007,2008,2009 Carlo Strozzi
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; version 2 dated June, 1991.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
# =====================================================================

# =====================================================================
# This program is meant to be used instead of tidy(1) on data produced
# by some buggy GUI editors, such as TinyMCE, that produce broken
# markup which will not pass through tidy. The resulting data may still
# be rendered correctly by browsers, so it may be acceptable HTML but
# broken XHTML. If the latter mode is used, we therefore need to tweak
# the input data, since it is not going to be scanned by tidy(1).
# This filter also keeps RCS version files from bloating because of
# HTML editors that send all the data on one single line.
# =====================================================================

# Turn the CR+LF line-end convention into NL.
# (\r is a GNU sed escape; other seds may need a literal CR here.)
s/\r$//

# Same as above, but for (classic) Macintosh clients, which end lines
# with a bare CR.
s/\r/\
/g

# This looks a bit elaborate at first glance, but we need to avoid
# inserting an additional newline every time the same page is edited.
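#
# For instance, given the illustrative (made-up) input line
#
#   <p>one<b>two</b></p>
#
# the substitution below alone yields
#
#   <p>one
#   <b>two</b></p>
#
# A newline goes in front of "<b" because "\(.\)" matches the "e" that
# precedes it, while "<p" at the start of the line has no character
# before it and is left alone; closing tags such as "</b>" are skipped
# by "[^\/]". Running that output through the filter again adds no
# further line breaks, since each opening tag now starts its own line.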

s/\(.\)<\([^\/]\)/\1\
<\2/g

# The TinyMCE WYSIWYG AJAX editor insists on turning hard line breaks
# into <br /> sequences, which will result in duplicated line breaks in
# <pre> element contents that end with real newlines. Although it is
# not up to TypeWriter to fix the bugs of other programs, in this
# case it may be worth doing, since the solution is general enough.
# It sounds logical to me that if a <br /> follows a newline, then the
# newline can be safely removed, as the task of breaking the line in
# the rendered page is left to the <br /> itself.

s/\n<br \/>/<br \/>/g

# Make sure each input line ends with exactly one blank, or XHTML 1.1
# will swallow the spacing between two words that are split across the
# line break.

s/ *$/ /

# EOF