How to concatenate ODT files

(or anything else that LibreOffice can read)

I recently found myself wanting to concatenate two ODT files. Now obviously, this is a very simple task with LibreOffice or OpenOffice (open both, copy from one, paste into the other), but being a massive nerd I wanted to see if this could be automated for batches of files. Well, it can:

soffice --headless --convert-to html file1.odt file2.odt
pandoc file1.html file2.html -o output.odt

This requires LibreOffice (or OpenOffice) to convert the ODT files to HTML, and then pandoc (I cannot over-emphasise how great this tool is) to concatenate them properly and convert back to ODT. I wrote up a quick bash script to do it easily: save it somewhere in your $PATH as concat-odt and invoke it with concat-odt file1.odt file2.odt -o output.odt. This should work with any file format that LibreOffice can read, and will always output an ODT file, no matter the input:

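#!/usr/bin/env bash
# Usage: concat-odt file1.odt file2.odt -o output.odt
inputs=()
until [[ "$1" = '' ]]; do
  case "$1" in
    -o )  shift
      out="$1"
      ;;
    *  )  inputs+=("$1")
      ;;
  esac
  shift
done

# Convert every input to HTML; each result lands in /tmp under the input's base name.
soffice --headless --convert-to html --outdir /tmp "${inputs[@]}"

# Build one /tmp/<name>.html path per input, in order (directory and extension stripped).
html=()
for f in "${inputs[@]}"; do
  base="${f##*/}"
  html+=("/tmp/${base%.*}.html")
done

pandoc "${html[@]}" -t odt -o "$out"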

Caveats

This method has caused me some trouble. Any blank lines in the document will be doubled, so

foo

bar

will become

foo


bar

I’m sure this is a solvable problem (and I’ll update this post if I ever get around to figuring out how to fix it), but for now it’s annoying. I think I may need to just run a regex through the intermediate HTML files and remove all <br>s, which LibreOffice seems to place wherever there’s an empty line. That would cause its own problems, of course.
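
For what it’s worth, that regex idea might look something like the sketch below. It is untested against real LibreOffice output, so treat it as a starting point only. The function name strip_br is made up for illustration, and the pattern assumes the tags look like <br>, <br/> or <br clear="left" /> in any letter case.

```shell
#!/usr/bin/env bash
# Sketch only: strip_br is a hypothetical helper, not part of the script above.
# Reads HTML on stdin and deletes every <br ...> tag, with or without
# attributes or a self-closing slash, in upper or lower case.
strip_br() {
  sed -E 's#<[bB][rR]([[:space:]][^>]*)?/?>##g'
}
```

It would sit between the two steps, e.g. strip_br < /tmp/file1.html > /tmp/file1.clean.html, with pandoc then reading the cleaned files. Note that it also deletes any line breaks that were put there on purpose, which is exactly the kind of problem I mean.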

This method will also revert the documents to pandoc’s default styling. It will retain italics and bold and so on. This is, I think, a less easily solved problem; you could mitigate it by giving pandoc a custom reference ODT (the --reference-doc option, or --reference-odt in older versions), if you don’t like 12 pt Times New Roman.
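
A minimal sketch of the custom-styling route, assuming pandoc 2.0 or later (where the option is --reference-doc; older releases called it --reference-odt). The function name pandoc_styled and the file name my-reference.odt are placeholders I made up; the reference file is any ODT whose styles you have edited to taste.

```shell
#!/usr/bin/env bash
# Sketch only: pandoc_styled is a hypothetical wrapper.
# Usage: pandoc_styled REFERENCE.odt OUTPUT.odt INPUT.html [INPUT.html ...]
# Concatenates the inputs into OUTPUT.odt, taking styles from REFERENCE.odt.
pandoc_styled() {
  ref=$1
  out=$2
  shift 2
  pandoc "$@" -t odt --reference-doc="$ref" -o "$out"
}
```

For example: pandoc_styled my-reference.odt output.odt /tmp/file1.html /tmp/file2.html. According to the pandoc manual, a starting reference file can be produced with pandoc -o custom-reference.odt --print-default-data-file reference.odt and then restyled in LibreOffice.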

This method requires the creation of intermediate HTML files – in my script I brushed them out of the way and into /tmp – which isn’t a really big deal, but is something I personally find annoying. Oh well.
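
If the leftover files bother you as well, one option is a private temporary directory that is deleted when the script exits. This is only a sketch: make_workdir and html_path_for are hypothetical helper names, and it assumes LibreOffice names each converted file after the input’s base name with the extension swapped for .html.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical names, not part of the script above.

# Create a fresh private directory for the intermediate HTML and print its path.
make_workdir() {
  mktemp -d "${TMPDIR:-/tmp}/concat-odt.XXXXXX"
}

# html_path_for DIR INPUT prints DIR/<INPUT's base name without extension>.html
html_path_for() {
  base=${2##*/}
  printf '%s/%s.html\n' "$1" "${base%.*}"
}

# Intended use inside the script (left as comments so nothing runs on its own):
#   workdir=$(make_workdir)
#   trap 'rm -rf "$workdir"' EXIT
#   soffice --headless --convert-to html --outdir "$workdir" file1.odt file2.odt
#   pandoc "$(html_path_for "$workdir" file1.odt)" \
#          "$(html_path_for "$workdir" file2.odt)" -t odt -o output.odt
```

The trap removes the directory however the script ends, so nothing piles up in /tmp, and two runs at the same time no longer overwrite each other’s HTML.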

EDIT 17/07/2014: According to one of the comments, this will also break on anything containing tables. I wouldn’t exactly call that surprising – this whole thing was basically just me playing with a new toy, and I’m sure there are proper tools out there that can accomplish this task more cleanly and… well, better. So. Caveat emptor, I suppose. I’m sure there are dozens of other problems with this approach that I haven’t mentioned here (which I would like to hear about, but have no intention (or ability, really) of fixing).

3 responses to “How to concatenate ODT files”

  1. Karl Schmidt says :

    Your bash script is broken, and even after fixing it, it fails if there are tables.

  2. Clay Claiborne says :

    This script won’t run as is. A done is missing.

  3. Ade Malsasa Akbar says :

    To evilsoup,

    Thank you. This pandoc of yours is a big clue.
