brhfl.com

Semaphore and sips redux

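In this article, I use sem -j +5. Note that the + makes the value relative: it allows five more jobs than there are CPU cores to run at a time, not five jobs total (that would be -j 5). -j can be used with integers, percentages, and +/- values, so one can say -j +0 to run one job per available core, -j -1 to run one fewer job than there are cores, etc.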
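As a rough illustration of how those -j forms resolve to a job count, here is a small sketch. resolve_jobs is a hypothetical helper written for this example (it is not part of GNU parallel), the core count is passed in by hand, and the way percentages are rounded is an assumption.

```shell
# resolve_jobs is a hypothetical helper, not part of GNU parallel.
# Usage: resolve_jobs SPEC CORES
# (parallel treats a plain 0 specially; that case is left out here.)
resolve_jobs() {
  spec=$1
  cores=$2
  case $spec in
    +*) off=${spec#?}; n=$((cores + off)) ;;        # +N: cores plus N
    -*) off=${spec#?}; n=$((cores - off)) ;;        # -N: cores minus N
    *%) pct=${spec%?}; n=$((cores * pct / 100)) ;;  # N%: rounding is an assumption
    *)  n=$spec ;;                                  # N: exactly N jobs
  esac
  # The relative forms never drop below one job.
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}

# Pretend we are on a 4-core machine.
for s in +5 +0 -1 5 50%; do
  echo "-j $s on 4 cores: $(resolve_jobs "$s" 4) jobs"
done
```

On that pretend 4-core machine, -j +5 comes out at nine jobs, which is rather more than five.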

I was going to simply edit my last post, but this might warrant its own, as it’s really more about sem and parallel than it is sips. parallel’s manpage describes it as ‘a shell tool for executing jobs in parallel using one or more computers’. It’s kind of a better version of xargs, and it is super powerful. The manpage starts early with a recommendation to watch a series of tutorials on YouTube and continues on to example after example after example. It’s intense.

In my previous post, I suggested using sem for easy parallel execution of sips conversions. sem is really just an alias for parallel --semaphore, described by its manpage (yes, it gets its own manpage) as a ‘counting semaphore [that] simply waits for a semaphore to become available and then runs the command given’. It’s a convenient and fairly accessible way to parallelize tasks. Backing up for a second, it does have its own manpage, which focuses on some of the specifics about how it queues things up, how it waits to execute tasks, etc. It does this using toilet metaphors, which is a whole other conversation, but for the most part it’s fairly clear, and it’s what I tend to reference when I’m figuring something out using sem.

In my last post (and in years of converting things this way), I had to decide between automating the cleanup/rm process and parallelizing the sips calls. The problem is, if you do this:

for i in ./*.tif; sem -j +5 sips -s format png "$i" --out "${i/.tif/.png}" && rm "$i"

…the parallelism gets all thrown off. sem executes, queues up sips, presumably exits 0, and then rm destroys the file before sem even gets the chance to spawn sips. None of the files exist, and sips has nothing to convert. The sem manpage doesn’t really address chaining commands in this manner; presumably it would be too difficult to fit into a toilet metaphor. But it occurred to me that I might come up with the answer if I just looked through enough of the examples in the parallel manpage (worth noting that a lot of the parallel syntax is specific to not being run in semaphore mode). The solution is facepalmingly simple: wrap the && in double quotes, so that the shell hands it to sem as an ordinary argument instead of acting on it itself:

for i in ./*.tif; sem -j +5 sips -s format png "$i" --out "${i/.tif/.png}" "&&" rm "$i"
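A minimal sketch of why the quoting matters, without involving sem or sips at all. show_args is a hypothetical stand-in for sem that only reports the arguments it was handed, and the "convert" and "removed" strings are placeholders.

```shell
# show_args is a hypothetical stand-in for sem: it only reports what it was handed.
show_args() {
  printf '%d args:' "$#"
  printf ' [%s]' "$@"
  printf '\n'
}

# Unquoted: the shell splits the line at &&, so show_args sees only two arguments,
# and "echo removed" is run by the shell itself as soon as show_args returns 0.
unquoted=$(show_args convert a.tif && echo removed)

# Quoted: "&&" is just a string, so the whole chain is passed along as arguments.
quoted=$(show_args convert a.tif "&&" echo removed)

printf '%s\n' "$unquoted" "$quoted"
```

In the quoted case the && arrives intact, which is what lets sem pass the whole chain on to the shell it eventually spawns for the job, so that rm only runs after sips has actually finished.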

…which works a charm. We could take this even further and feed the PNGs directly into optipng:

for i in ./*.tif; sem -j +5 sips -s format png "$i" --out "${i/.tif/.png}" "&&" rm "$i" "&&" optipng "${i/.tif/.png}"

…or potentially adding optipng to the sem queue instead:

for i in ./*.tif; sem -j +5 sips -s format png "$i" --out "${i/.tif/.png}" "&&" rm "$i" "&&" sem -j +5 optipng "${i/.tif/.png}"

…I’m really not sure which is better (and I don’t think time will help me since sem technically exits pretty quickly).
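One possible way to compare the two is to time the whole batch rather than the individual sem calls, blocking on sem --wait (which waits for all queued commands to complete) before stopping the clock. The sketch below is untested against real conversions: time_batch is a hypothetical helper, and whether --wait also catches jobs that were queued from inside another job, as in the nested variant, is an assumption worth checking.

```shell
# time_batch is a hypothetical helper: it runs a command that queues sem jobs,
# then blocks until the semaphore is drained, and prints the elapsed seconds.
time_batch() {
  start=$(date +%s)
  "$@"          # queue the jobs; this part returns almost immediately
  sem --wait    # wait for every queued job to finish
  end=$(date +%s)
  echo "$((end - start))s"
}

# Usage (assumes GNU parallel's sem, sips, and optipng are installed):
#   chained() {
#     for i in ./*.tif; do
#       sem -j +5 sips -s format png "$i" --out "${i/.tif/.png}" "&&" rm "$i" "&&" optipng "${i/.tif/.png}"
#     done
#   }
#   time_batch chained
```

Running this once per variant on copies of the same set of TIFFs would give two wall-clock figures to compare, at one-second resolution.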