{"id":418,"date":"2008-01-11T19:50:27","date_gmt":"2008-01-12T01:50:27","guid":{"rendered":"http:\/\/blog.codedread.com\/archives\/2008\/01\/11\/adobitrocity\/"},"modified":"2008-01-11T19:50:27","modified_gmt":"2008-01-12T01:50:27","slug":"adobitrocity","status":"publish","type":"post","link":"https:\/\/www.codedread.com\/blog\/archives\/2008\/01\/11\/adobitrocity\/","title":{"rendered":"Adobitrocity"},"content":{"rendered":"<p>I came across <a href=\"http:\/\/tnerual.eriogerg.free.fr\/\">Laurent Gr\u00e9goire<\/a>'s <span style=\"acronym\" title=\"Concurrent Versions System\">CVS<\/span> <a href=\"http:\/\/tnerual.eriogerg.free.fr\/cvs.html\">Quick Reference Card<\/a>.  I needed something quick and handy to put on my thumb drive, so I figured this would do.  Only problem is that my Windows box only understood Adobe's PDF format.  I've grown to really dislike PDF, primarily because the Adobe Acrobat Reader takes forever to come up and has become bloated.  Since I have Adobe Acrobat Professional Version 8.0.0 installed here, I thought I'd see what formats I could convert the file into so I could make some minor edits.<!--more--><\/p>\n<div class=\"ads\"><object type=\"text\/html\" width=\"468\" height=\"60\" data=\"http:\/\/www.codedread.com\/gads.php\"><\/object><\/div>\n<p>The CVS Quick Reference Card is a 3-column, 2-page document consisting of 99% text, 2 horizontal lines, 2 vertical lines and a couple of bullet symbols.  Here is a screenshot of the top-left of the document:<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/www.codedread.com\/images\/cvsqrc-fragment.png\" border=\"1\" alt=\"Screenshot of a portion of the CVS Quick Reference Card pdf consisting of text laid out in columns\"><\/img><\/p>\n<p>Looks nice.  The PDF weighs in at 84kb.  Not bad, really.  
But what I want to do is add a couple more items to the cheat sheet and send it back to Laurent (like <a href=\"http:\/\/www.lullabot.com\/articles\/cvs_annotate_or_what_the_heck_were_they_thinking\">cvs annotate<\/a> for instance).  This is not easy with a document in PDF format.  These are the types of <a href=\"http:\/\/weblogs.macromedia.com\/jd\/archives\/2006\/05\/mcnealy_on_pdf.cfm#c35162\">modifications<\/a> that I'd like to do to PDFs every once in a while and it's why I consider PDF more of a closed \"publishing\" format than a true collaborative document format.  I can see the appeal of an electronic document that you know cannot be changed, but for my day-to-day use such documents are few and far between.  Plus I just <a href=\"http:\/\/blog.codedread.com\/archives\/2005\/03\/02\/improving-adobe-reader\/\">can't stand the Acrobat Reader<\/a>, I'm sorry. \ud83d\ude09<\/p>\n<h3>Adobe Output<\/h3>\n<p>I was surprised to see a variety of formats listed in Acrobat Professional under Save As (though unsurprised that <a href=\"http:\/\/www.w3.org\/Graphics\/SVG\" title=\"Scalable Vector Graphics\">SVG<\/a> was not present).  Since I recently started experimenting with <a href=\"http:\/\/shared.snapgrid.com\/index.html\" title=\"GTDTiddlyWiki - a self-contained client-side wiki that you can use with simply a browser\">GTDTiddlyWiki<\/a> on my thumb drive for note-taking, I decided to keep everything HTML-browser-centric and settled on the format \"HTML 4.01 with CSS 1.0 (*.htm, *.html)\".  What's the worst that can happen?<\/p>\n<p>Well, the first thing that happened was I got back some errors saying that some of the glyphs could not be converted.  Ok, I can live with some glyph funkiness.<\/p>\n<p>Then I brought up the resultant 249kb .htm file in Firefox and was dismayed to see it was not at all laid out in the nice 3-column layout of the PDF.  
This pretty much makes the HTML+CSS output from Adobe Acrobat unusable.<\/p>\n<p>Then I looked at the source and was a little dumbfounded:<\/p>\n<div class=\"code\">\n<p>&#60;BODY bgcolor=white text=black link=blue vlink=purple alink=fushia ><\/p>\n<p>&#60;P><\/p>\n<p>&#60;SPAN style=\"color:#000000\"<\/p>\n<p>><span style=\"color:blue; font-weight:bold;\">CV<\/span>&#60;\/SPAN<\/p>\n<p>>&#60;SPAN style=\"color:#000000\"<\/p>\n<p>><span style=\"color:blue; font-weight:bold;\">S<\/span> &#60;\/SPAN<\/p>\n<p>>&#60;SPAN style=\"color:#000000\"<\/p>\n<p>><span style=\"color:blue; font-weight:bold;\">QUIC<\/span>&#60;\/SPAN<\/p>\n<p>>&#60;SPAN style=\"color:#000000\"<\/p>\n<p>><span style=\"color:blue; font-weight:bold;\">K<\/span> &#60;\/SPAN<\/p>\n<p>>&#60;SPAN style=\"color:#000000\"<\/p>\n<p>><span style=\"color:blue; font-weight:bold;\">REFERENC<\/span>&#60;\/SPAN<\/p>\n<p>>&#60;SPAN style=\"color:#000000\"<\/p>\n<p>><span style=\"color:blue; font-weight:bold;\">E<\/span> &#60;\/SPAN<\/p>\n<p>>&#60;SPAN style=\"color:#000000\"<\/p>\n<p>><span style=\"color:blue; font-weight:bold;\">CAR<\/span>&#60;\/SPAN<\/p>\n<p>>&#60;SPAN style=\"color:#000000\"<\/p>\n<p>><span style=\"color:blue; font-weight:bold;\">D<\/span> &#60;\/SPAN<\/p>\n<p>>&#60;\/P><\/p>\n<p>&#60;P style=\"margin-bottom:0px; margin-left:0px; line-height:16px\"><\/p>\n<p>&#60;SPAN style=\"font-family:'sans-serif', 'CMT I'; color:#000000\"<\/p>\n<p>><span style=\"color:blue; font-weight:bold;\">Overvie<\/span>&#60;\/SPAN<\/p>\n<p>>&#60;SPAN style=\"font-family:'sans-serif', 'CMT I'; color:#000000\"<\/p>\n<p>><span style=\"color:blue; font-weight:bold;\">w<\/span> &#60;\/SPAN<\/p>\n<p>>&#60;\/P><\/p>\n<\/div>\n<p>Maybe someone understands the logic of how contiguous text is broken up into many &#60;span> elements with the same style, but I certainly do not.<\/p>\n<p>I tried Plain Text - not a very legible document (well, at least not the nice 3-column layout of course).<\/p>\n<p>I tried PNG - good but each 
PNG page was 2339x1654 pixels requiring some manual scaling down for readability on my screen.<\/p>\n<p>I tried JPG - good, but fuzzy if not fullscreen.<\/p>\n<p>In fairness to Adobe, the RTF output was pretty legible and only 44kb.<\/p>\n<h3>SVG Output<\/h3>\n<p>So then I thought - I'm already half-into this, let's try to get a decent SVG out of it.<\/p>\n<p>I was going to try <a href=\"http:\/\/inkscape.org\/\">Inkscape<\/a>, but their <a href=\"http:\/\/wiki.inkscape.org\/wiki\/index.php\/Required_PDF_Support\">PDF Import<\/a> feature is still not there yet.  Too bad.<\/p>\n<p>I next tried the evaluation version of the <a href=\"http:\/\/www.pdftron.com\/downloads.html#PDF2SVGCMD\">PDF2SVG<\/a> command-line tool.  The SVG output was two pages, totalling 526kb uncompressed.  Bringing it up in the <a href=\"http:\/\/www.opera.com\/\" title=\"The Opera Web Browser\">best available desktop SVG viewer<\/a> pretty much crippled the browser (extremely sluggish) and resulted in the following:<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/www.codedread.com\/images\/cvsqrc-svg-frag.png\" alt=\"SVG output of CVS Quick Reference card as rendered in Opera 9.5 Beta showing poor quality of SVG output\" border=\"1\"><\/img><\/p>\n<p>Firefox and Safari were worse.<\/p>\n<p>I then looked at the SVG source:<\/p>\n<div class=\"code\">\n<p>&#60;g clip-path=\"url(#clp1)\" transform=\"matrix(1 0 0 -1 0 595.276)\"><\/p>\n<p>&#60;text transform=\"matrix(1 0 0 -1 0 0)\">&#60;tspan x=\"58.333,66.612,75.27\" y=\"-556.97\" class=\"ps00 ps20\"><span style=\"color:blue; font-weight:bold;\">CVS<\/span>&#60;\/tspan>&#60;tspan x=\"85.452,94.06,102.88,107.22\" y=\"-556.97\" class=\"ps00 ps20\"><span style=\"color:blue; font-weight:bold;\">QUIC<\/span>&#60;\/tspan>&#60;tspan x=\"115.49\" y=\"-556.97\" class=\"ps00 ps20\"><span style=\"color:blue; font-weight:bold;\">K<\/span>&#60;\/tspan>&#60;tspan x=\"128.29,136.88,144.41\" y=\"-556.97\" class=\"ps00 ps20\"><span style=\"color:blue; 
font-weight:bold;\">REF<\/span>&#60;\/tspan>&#60;tspan x=\"151.62,159.15,167.74,175.27,184.24\" y=\"-556.97\" class=\"ps00 ps20\"><span style=\"color:blue; font-weight:bold;\">ERENC<\/span>&#60;\/tspan>&#60;tspan x=\"192.5\" y=\"-556.97\" class=\"ps00 ps20\"><span style=\"color:blue; font-weight:bold;\">E<\/span>&#60;\/tspan>&#60;tspan x=\"203.85,212.13,220.79\" y=\"-556.97\" class=\"ps00 ps20\"><span style=\"color:blue; font-weight:bold;\">CAR<\/span>&#60;\/tspan>&#60;tspan x=\"229.39\" y=\"-556.97\" class=\"ps00 ps20\"><span style=\"color:blue; font-weight:bold;\">D<\/span>&#60;\/tspan>&#60;\/text><\/p>\n<p>&#60;g id=\"xfrm1\" transform=\"matrix(1 0 0 1 27.78 552.638)\"><\/p>\n<p>&#60;g id=\"q2\" class=\"ps00 ps20\"><\/p>\n<p>&#60;path d=\"M0 0.199 L240.94 0.199\" class=\"ps01 ps10\"\/><\/p>\n<p>&#60;\/g><\/p>\n<p>&#60;g id=\"xfrm3\" transform=\"matrix(1 0 0 1 -27.78 -552.638)\"><\/p>\n<p>&#60;text transform=\"matrix(1 0 0 -1 0 0)\">&#60;tspan x=\"27.78,35.422,40.005,44.588\" y=\"-534.32\" class=\"ps00 ps21\"><span style=\"color:blue; font-weight:bold;\">Over<\/span>&#60;\/tspan>&#60;tspan x=\"48.782,53.365,56.424,61.007\" y=\"-534.32\" class=\"ps00 ps21\"><span style=\"color:blue; font-weight:bold;\">view<\/span>&#60;\/tspan><\/p>\n<\/div>\n<p>Next up was <a href=\"http:\/\/www.mattercast.com\/products.aspx\">matterCast's SVG Imprint<\/a>.  The output clocked in at 334kb uncompressed.  The SVG was similarly mangled with &#60;tspan>s breaking up the text as above.  Finally, they didn't produce valid SVG (the root &#60;svg> node was missing the namespace declaration: xmlns=\"http:\/\/www.w3.org\/2000\/svg\").  Once I fixed that, the files looked similar to the above, though the browser wasn't as sluggish.<\/p>\n<p>I then found <a href=\"http:\/\/freesvg.texterity.com:90\/\">this service<\/a> which converts submitted PDFs to SVGs and then sends you a link for free.  
It's really a promotion of its <a href=\"http:\/\/www.texterity.com\/artstech\/textcafe\/\">TextCafe<\/a> conversion product, which sounds nice in theory, especially the \"Detection of text blocks and paragraphs, which can be reflowed automatically\".  I sent the PDF to them and didn't hear anything back for an hour.  So I emailed them and got a response from Martin Hensel that it takes 6-12 hours.  When I got it, it was a zip file containing an HTML harness and some SVG and JPG files.  The SVG files have the same problems as other similar tools - basically a whole bunch of &#60;tspan> elements instead of text blocks.  The HTML harness is kind of a neat idea because it provides a bookmarking\/search pane similar to Acrobat Reader (provided via JS).  However, the HTML harness insists that you have to install Adobe SVG Viewer.  This might have been a good idea 2 years ago, but these days not only is <span class=\"definition\" title=\"Adobe SVG Viewer\">ASV<\/span> no longer supported by Adobe, but all but one browser supports enough SVG to be useful.<\/p>\n<p>This isn't Opera or SVG's fault.  I guess it's really just an algorithmic problem?  The SVG output is similar in nature to the HTML output:  contiguous text is mangled into successive spans\/tspans with the same style applied.  I'm really curious if anyone has a clue why this happens - is it that the conversion engines are trying to duplicate (down to the pixel) Adobe's kerning from the PDF source and fail, so they just default to fragments of text?  Is it that the PDF is \"optimized\" in such a way that it's not possible to determine what was a contiguous chunk of meaningful text?  Looking at Texterity's metadata.js it seems that at least they can determine a list of indexable terms from the PDF...<\/p>\n<p>I don't buy the idea that SVG is too verbose for something like this either.  
<a href=\"http:\/\/zrusin.blogspot.com\/2007\/09\/git-cheat-sheet.html\">Zack Rusin's git Cheat Sheet<\/a> has fancy flow charts as well as hunks of text and was produced by Inkscape (not known for the conciseness of its SVG, shall we say?).  That file weighs in at only 161kb and that's not even compressed.  <em>And<\/em> it looks great in every browser that supports SVG (ok, I saw one glitch in Firefox).<\/p>\n<h3>Costs<\/h3>\n<p><a href=\"https:\/\/store1.adobe.com\/cfusion\/store\/index.cfm?store=OLS-US&view=ols_prod&category=\/Applications\/AcrobatPro&distributionMethod=FULL&nr=0&&sdid=BQMWR&s_kwcid=adobe%20acrobat%208%20professional&#124;1243220333\">Adobe Acrobat Professional Version 8<\/a> = $450 USD<\/p>\n<p><a href=\"http:\/\/www.mattercast.com\/products.aspx\">matterCast's SVG Imprint<\/a> = $199.95 USD (requires .NET 1.1)<\/p>\n<p><a href=\"http:\/\/www.pdftron.com\/pdf2svg\/index.html\">PDFTron's PDF2SVG<\/a> = $549 USD (without annual maintenance contract)<\/p>\n<p><a href=\"http:\/\/freesvg.texterity.com:90\/\">Texterity's FreeSVG<\/a> = Free service if you're willing to submit your PDF to them<\/p>\n<p><a href=\"http:\/\/www.texterity.com\/artstech\/textcafe\/\">Texterity's TextCafe<\/a> = Must obtain a quote<\/p>\n<h3>Conclusion<\/h3>\n<p>Like I said at the top, I can appreciate the need for a document format that you know will render pixel-perfectly the way you want.  I can also appreciate that some authors want to make it difficult\/impossible for other people to modify their documents.  My gripes about PDF these days are mostly about using Acrobat Reader, but my experience this evening showed me how difficult it is to convert from optimized PDF to some other format... it was an education.  I think it's fair to say that conversion to PDF is more-or-less a one-way street.  
<i>[Update: See <a href=\"http:\/\/blog.codedread.com\/archives\/2008\/01\/11\/adobitrocity\/#comment-12252\">Fran\u00e7ois<\/a>'s comment below - he thinks it's the fault of the TeX->PDF conversion.  Since I don't know enough about TeX and don't have the time to really dig into the source, I'll believe him.]<\/i><\/p>\n<p>And yeah, I'm aware that my Google ads will probably be all for PDF converters.  The universe is funny that way.<\/p>\n<div class=\"ads\"><object type=\"text\/html\" width=\"468\" height=\"60\" data=\"http:\/\/www.codedread.com\/gads.php\"><\/object><\/div>\n","protected":false},"excerpt":{"rendered":"<p>I came across Laurent Gr\u00e9goire&#8217;s CVS Quick Reference Card. I needed something quick and handy to put on my thumb drive, so I figured this would do. Only problem is that my Windows box only understood Adobe&#8217;s PDF format. I&#8217;ve grown to really dislike PDF, primarily for the fact that the Adobe Acrobat Reader takes [&#8230;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,25,11],"tags":[],"class_list":["post-418","post","type-post","status-publish","format-standard","hentry","category-adobe","category-software","category-technology"],"_links":{"self":[{"href":"https:\/\/www.codedread.com\/blog\/wp-json\/wp\/v2\/posts\/418","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.codedread.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codedread.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codedread.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codedread.com\/blog\/wp-json\/wp\/v2\/comments?post=418"}],"version-history":[{"count":0,"href":"https:\/\/www.codedread.com\/blog\/wp-json\/wp\/v2\/posts\/418\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.cod
edread.com\/blog\/wp-json\/wp\/v2\/media?parent=418"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.codedread.com\/blog\/wp-json\/wp\/v2\/categories?post=418"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codedread.com\/blog\/wp-json\/wp\/v2\/tags?post=418"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}