Scribus is a pretty good free desktop publishing package. Aside from the cost, there are other good reasons to use it, as the files it generates aren’t locked into some proprietary unreadable format.
I wanted to convert a book from Scribus into an ePub file. That should be easy: ePubs are basically HTML with markup, and Scribus text boxes are text with markup. But it turns out that it’s not. I don’t care about any of the layout, or page headers/footers, or graphics. All I want is the body text of the document saved out, clearly marked up with the paragraph styles and character styles so that I can turn it into clean HTML for the ePub.
And there we find the problem with Scribus. You can open up each block of text and see all the styles clearly, but it can only export the text as plain ASCII. This seems a big deficiency, and not something that should be hard to fix, but as with all free software projects people work on the things that they need, so I can’t blame anyone for the lack of the feature. There seems to have been work on ePub exporting for a while, but nothing seems easy to use as yet. Maybe I didn’t look hard enough, or there’s a plugin, or a newer version, but it wasn’t obvious to me. I don’t want full conversion to ePub (that sounds like a hard problem, with graphics and all the other boxes and objects on the page converted), just the body text with the styles.
So I hacked together a quick Perl script to extract the text and styles from the Scribus document, and save them as some sort of HTML that can be used in Sigil for ePub generation. Here’s the beauty of free software. If the document had been in a proprietary format I could not have converted it anything like as easily. Scribus documents are saved as XML (gzip-compressed in the case of .sla.gz files), so it just means uncompressing the file if necessary and then looking for the useful bits.
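To give a feel for what "uncompressing and looking for the useful bits" involves, here is a minimal sketch in Python rather than Perl. It is not the script from the gist. The element and attribute names (StoryText, ITEXT with a CH attribute, para and trail with a PARENT attribute) are assumptions about how Scribus stores text; open your own .sla file in a text editor and check before relying on them. The function names are made up for this illustration.

```python
import gzip
import xml.etree.ElementTree as ET


def read_sla_bytes(path):
    """Return the raw XML bytes of a Scribus file, gunzipping if needed."""
    with open(path, "rb") as fh:
        data = fh.read()
    # Gzip streams always start with the two magic bytes 0x1f 0x8b.
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    return data


def iter_paragraphs(xml_bytes):
    """Yield (paragraph_style, text) pairs from every story in the document.

    Assumed layout: each <StoryText> holds <ITEXT CH="..."/> text runs, and
    a paragraph is closed by <para PARENT="style"/> or a final <trail .../>.
    """
    root = ET.fromstring(xml_bytes)
    for story in root.iter("StoryText"):
        runs = []
        for node in story:
            if node.tag == "ITEXT":
                runs.append(node.get("CH", ""))
            elif node.tag in ("para", "trail"):
                text = "".join(runs)
                if text:  # skip empty paragraphs
                    yield node.get("PARENT", ""), text
                runs = []
```

Calling list(iter_paragraphs(read_sla_bytes("document.sla.gz"))) would give a list of (style name, text) pairs to work from. This is only an illustration of the approach, separate from the Perl script described here, and it ignores character styles entirely.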
I can’t guarantee this script will work for anyone else, and it certainly won’t work as-is—the style names and HTML tags need to be set to match your own document—but if it’s of any help to anyone then this article was worth the time to write.
Get the script at the GitHub gist here.
Usage is simple. First edit the script and set up the HTML tags for the style names in your Scribus document. Run it on the Scribus .sla file (in my experience it doesn’t even need to be ungzipped first) to output HTML:
$ ./scribus2epub.pl document.sla.gz > output.html
You can then edit the HTML file to extract the bits you need and clean it up. It should be fine to import it directly into Sigil and do that work there. As a bonus, if the %styles hash adds <hr class="sigil_split_marker"/> in the appropriate places, you can then use the “Split at Markers” feature in Sigil to break the file up into separate chapters, too.
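The idea of a table that maps style names to HTML tags can be sketched in a few lines. Again this is Python rather than the Perl of the actual script, and the style names ("Chapter Title", "Body Text", "Quote"), the STYLES table and the paragraphs_to_html function are all hypothetical; swap in the paragraph style names from your own document.

```python
from html import escape

# Hypothetical style names: replace the keys with the paragraph styles
# used in your own Scribus document. Each value is (opening, closing) HTML.
STYLES = {
    # The split marker goes in front of each chapter heading so that
    # Sigil can break the file into one section per chapter.
    "Chapter Title": ('<hr class="sigil_split_marker"/>\n<h1>', "</h1>"),
    "Body Text": ("<p>", "</p>"),
    "Quote": ("<blockquote><p>", "</p></blockquote>"),
}

# Fallback for any style that is not listed above.
DEFAULT = ("<p>", "</p>")


def paragraphs_to_html(paragraphs):
    """Turn (style, text) pairs into HTML, one output line per paragraph."""
    lines = []
    for style, text in paragraphs:
        open_tag, close_tag = STYLES.get(style, DEFAULT)
        # escape() protects &, < and > in the body text.
        lines.append(f"{open_tag}{escape(text)}{close_tag}")
    return "\n".join(lines)
```

Falling back to a plain paragraph for unknown styles is a design choice for the sketch: nothing gets silently dropped, and any style you forgot to map is easy to spot and fix up later in Sigil.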