One more wibble about Typst: I used it for the latest Path of Cunning fanzine with @jgd. I would say “we used it” but our workflow tends to be that we both discuss the next thing to do and then I poke the layout engine until it produces the right sort of output.
But my goodness it is so much more pleasant an experience than doing layout in LibreOffice, which was how we did issues 1-5. It’s a combination of small things (can make the table of contents work the way we want it to and update automatically, can build things like the next issue deadline date automatically from the release date) and big things (an image stays where you put it and doesn’t suddenly decide to jump to the opposite column or the previous page just because you changed something later in the document). More generally, because everything except the images is plain text, once a thing is done it is done, so rather than a note to future me saying “click on this, then that, then that” I can just write a wrapper function that will do all those things the way we want them done (and if we ever make a big layout change, like a different header font, I can update that once and it’ll be right everywhere).
And thanks to pandoc we can still accept submissions in LibreOffice, MS Word, etc., and convert them as part of the layout process.
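(Roughly speaking, that’s one pandoc call per submission. The filenames below are placeholders, and the Typst writer only exists in fairly recent pandoc releases; older versions can go via markdown instead.)

pandoc submission.docx -t typst -o submission.typ
pandoc submission.odt -t markdown -o submission.md   # fallback for older pandoc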
Sounds great. Scripted layouts FTW.
It reminded me of my PDF film festival ticket hack. My film festival habits tend to be “completely ignore it” or “go to literally dozens of films”. The first year that they did online sales with PDF tickets, I discovered that the PDFs were “one ticket per page”, with a smallish proportion of the page containing “actual ticket” and the majority containing “boilerplate text which was repeated on every other page too”.
I was unwilling to rely upon my phone and equally unwilling to waste dozens of pages printing them, so I manually cropped, cut, and pasted tickets in bulk onto a small number of new pages, and printed those. That was such an annoying process that the next year I spent the time figuring out how to automate it.
The PDF layout that they use has remained the same in all the years since, and so I’ve barely had to touch the script in the interim. Each year I book tickets, I just throw the massive multi-page PDF at my script, and it gives me back a PDF with a 2x6 grid of tickets on each page, and I print that instead.
Sharing it below, less because I think other NZIFF-goers will see it here, and more in case the general framework is helpful for anyone with a similar need!
#!/bin/sh
if [ -z "$1" ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
    cat <<EOF
Usage:
$(basename "$0") file.pdf [dimensions]
$(basename "$0") - [dimensions]
Process file.pdf to produce a smaller output pdf with the tickets,
arranged in a grid format. The grid defaults to 2x6, but any "XxY"
value may be supplied as the dimensions argument.
file.pdf is expected to be a collection of Ticketek tickets; the
parameters were designed for New Zealand International Film Festival
tickets, but may apply to Ticketek tickets in general.
The file argument may be "-" to read the file from standard input.
EOF
    exit 0
fi

# Check dependencies.
aptget=
which evince >/dev/null \
    || aptget="${aptget} evince"
which pdftoppm >/dev/null \
    || aptget="${aptget} poppler-utils"
which montage >/dev/null && which convert >/dev/null \
    || aptget="${aptget} imagemagick"
if [ -n "${aptget}" ]; then
    printf "Required utilities not found. Install the following:\n" >&2
    printf "sudo apt-get install%s\n" "${aptget}" >&2
    exit 1
fi

tmpdir=$(mktemp -d -t ticketek.XXXXXXXXXX)
if [ $? -ne 0 ]; then
    printf %s\\n "Unable to create temporary directory." >&2
    exit 1
fi

# Dimensions of the resulting grid of tickets.
xy=${2:-2x6}

# Establish the input and output files.
file=$1
if [ "${file}" = "-" ]; then
    # The output file will be written to the temporary directory.
    pdf="grid-${xy}.pdf"
else
    if [ ! -f "${file}" ]; then
        printf "No such file: %s\n" "${file}" >&2
        exit 1
    fi
    file=$(readlink -f "${file}")
    # Output to a derivative of the original filename, indicating dimensions.
    pdf="${file%.pdf}-${xy}.pdf"
fi

# Convert PDF to cropped image files (-x/-y give the top-left corner of the
# crop region and -W/-H its size, in pixels of the rendered page)
# Tile images in an XxY grid (default 2x6) per page
# Create PDF of all pages
# Open in evince PDF viewer
################################################################################
cd "${tmpdir}" \
    && pdftoppm -gray -png -W 1080 -H 450 -x 50 -y 340 "${file}" ticket \
    && montage ticket-[0-9]*.png -geometry 1080x450 -tile "${xy}" page.png \
    && convert page*.png "${pdf}" \
    && evince "${pdf}"
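If you save it as, say, nziff-grid.sh, a typical run looks like this (the script and PDF names here are placeholders; the second argument is the optional grid size):

chmod +x nziff-grid.sh
./nziff-grid.sh NZIFF-tickets.pdf 2x6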
It’s weird. Many people clearly think of a PDF as this monolithic Thing that can’t be changed, when in fact it’s more like a container format for chunks of text and image — and they can be extracted again.
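For instance, poppler-utils will happily pull those pieces back out (filenames are placeholders):

pdftotext tickets.pdf tickets.txt   # the text chunks
pdfimages -list tickets.pdf         # an inventory of the embedded images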
Hooray! I’ve recently started communicating the inefficiencies of GUI/WUI-based workflows (and of workflow documentation) to my management. If a process is to be repeatable, it should be extremely easily repeatable; to me, that means “automation”.
Adobe’s biggest victory, then. I know I’ve encountered dozens of people who assume that PDF content is immutable. If I had fewer scruples, I would exploit that assumption to great profit, I suppose.
I was afraid that this would go slower than with a GUI program, but it did not. The “Roger tinkers with layout” intervals were shorter, much happier for Roger, and produced repeatable, reliable output. It’s great!
I have to admit that it didn’t occur to me to try to extract images directly from the PDF. I’m just using tools to render and crop each page at known positions, saving the resulting images, and re-assembling them into a new document. I wonder whether accessing the ticket images directly would have been as stable over all these years as the approach I actually took. The visual output has always been consistent for my purposes, but I’ve no idea whether the PDF internals have stayed the same. Something for me to look into the next time I have a need to do something with PDFs, though.
What are your preferred tools for manipulating PDF data?
poppler-utils. In particular pdfimages, which you can tell to dump all the images from a range of pages or from the whole thing. The advantage over screenshotting is that you get the original image at the resolution at which they embedded it. However, if they used a transparency mask or various other sorts of manipulation, those won’t come out in an obviously reproducible way. (When building monster pawn images for my GURPS Pathfinder game a while back, I ended up looking for the two largest bitmaps on a page and then using a heuristic to see which one was the mask.)
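For example (the filenames and page numbers are placeholders; -f/-l bound the page range, and -all keeps each image in its originally embedded format):

pdfimages -all -f 10 -l 12 rulebook.pdf out   # just pages 10-12
pdfimages -all rulebook.pdf out               # the whole document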
Also the Perl library PDF::API2, which is probably meant mostly for creating PDFs but has some useful tools for mucking about with them.
pdftk is a good general utility.
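“General utility” covers things like merging and slicing (filenames below are placeholders):

pdftk part1.pdf part2.pdf cat output combined.pdf   # merge two documents
pdftk big.pdf cat 2-5 output excerpt.pdf            # pull out pages 2-5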
cpdf (non-commercial use only) has a “strip out images” option which I often use before printing something.
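If memory serves, the relevant switch is -draft, roughly like this (filenames are placeholders; check cpdf -help for the exact spelling):

cpdf -draft rulebook.pdf -o rulebook-noimages.pdf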
And the ultimate interactive tool is good old Inkscape; pdfimages does a great job with bitmaps, but if a rulebook has vector images, Inkscape will generally recover them in a clean way. (E.g. so that I can include the actual icon in the mini-rulebook I’m writing.)
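(I think recent Inkscape versions can also do the PDF import non-interactively, something like the line below, though the option names are from memory, so check inkscape --help.)

inkscape --pdf-page=3 --export-type=svg --export-filename=icon-page.svg rulebook.pdf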
Good stuff. The poppler-utils package provided the pdftoppm utility that I used in my script. I was aware of poppler at the time on account of the pdf-tools Emacs package (for reading, searching, and annotating PDFs), which uses poppler behind the scenes to do the heavy lifting.