Pandoc and Python

    I’ve talked before about using Pandoc to convert Markdown to LaTeX. I’ve been using it a lot this year, and it’s been great to write more in Markdown and code less in LaTeX. Pandoc does a really nice job converting Markdown, and even characters like $, %, and em dashes, into the correct LaTeX syntax (\$, \%, ---, etc.). It also leaves alone any chunks of LaTeX I intersperse in my Markdown files (e.g. equations), so they’re still there in the final LaTeX output, married with the LaTeX created from Markdown.

    So that’s all the good. For me, the bad of Pandoc is that I don’t think the LaTeX code it generates is very nice to look at. It typesets just fine, but Pandoc inserts extra line breaks by default and does funny things with itemize and enumerate environments that make reading the LaTeX code later difficult—especially annoying when I need to go back into the LaTeX to edit/add things in the future.

    Fortunately I’ve been able to address these shortcomings by learning more about Pandoc’s options and bringing Python into the mix.

    To illustrate, let’s start with some Markdown written in Ulysses, which I use a lot for writing on both macOS and iOS:

    If I copy that Markdown to my clipboard and run a basic pandoc command in terminal like:

    pbpaste | pandoc -f markdown -t latex | pbcopy

    It will create the following LaTeX (shown in Sublime Text 3):

    If I never need to look at the LaTeX code again, this isn’t so bad. It will typeset just fine. But it’s very messy. Namely:

    1. Pandoc uses a \tightlist command to reduce line spacing inside itemize environments. This is unnecessary for me because I have preset styles for bulleted lists in itemize environments.
    2. Pandoc prefers to keep each \item on its own line. This is very inefficient space-wise because you’ll always have at least two lines for each item in a bulleted list. Visually it just looks bad to me because \item represents a bullet, which will precede the text on each line in the actual PDF (like all bullet points do).
    3. By default, Pandoc hard-wraps lines at a fixed width (72 columns unless you change it), essentially assuming the text will be viewed in a text editor that doesn’t wrap lines. As you can see in the screenshot above, I have Sublime Text 3 set to wrap lines for LaTeX because it usually makes LaTeX easier to read. This is just one aspect of LaTeX that makes it more prose-like than code-like. Paragraphs and long lines should not have arbitrary line breaks.

    Fortunately, Pandoc’s developers later added an option to preserve wrapping, which solves problem 3 above:

    pbpaste | pandoc -f markdown -t latex --wrap=preserve | pbcopy

    To fix problems 1 and 2 above (as well as others), I turned to Python, a move that was very wise in hindsight because it led me to finally develop a reliable process for executing shell commands within Python, something I can see myself using a lot in the future for all kinds of things.

    This is my current Python script:

    import re
    from subprocess import Popen, PIPE, STDOUT
    
    # Function to get system clipboard contents
    def getClipboardData():
    	p = Popen(['pbpaste'], stdout=PIPE)
    	# communicate() drains stdout before waiting, so a large clipboard can't stall the pipe
    	data = p.communicate()[0]
    	return data.decode('utf-8')
    
    # Function to put data on system clipboard
    def setClipboardData(data):
    	p = Popen(['pbcopy'], stdin=PIPE)
    	# communicate() writes the bytes, closes stdin, and waits for pbcopy to exit
    	p.communicate(input=data.encode('utf-8'))
    
    # Get Markdown copied to clipboard
    input_text = getClipboardData()
    
    # Popen pandoc shell command (stderr is merged into stdout, so any pandoc error text ends up in the output)
    p = Popen(['pandoc', '-f', 'markdown', '-t', 'latex', '--wrap=preserve'], stdout=PIPE, stdin=PIPE, stderr=STDOUT)
    
    # Pass Markdown text to pandoc through stdin and get raw LaTeX from pandoc
    # (pipes carry bytes in Python 3, so encode on the way in and decode on the way out)
    latex = p.communicate(input=input_text.encode('utf-8'))[0].decode('utf-8')
    
    # Clean LaTeX:
    latex = re.sub(r'\\tightlist\n', r'', latex) # remove \tightlist
    latex = re.sub(r'\\item\n\s+', r'\t\\item ', latex) # join \item with its text on a single line; also put tabs in front of \item
    latex = re.sub(r'\\label.*', r'', latex) # remove all LaTeX labels (and anything after them on the same line)
    
    setClipboardData(latex)
    

    getClipboardData() and setClipboardData(data) are functions that I stole from Macdrifter. They are really handy for working with the clipboard in macOS. Since I work with Markdown in a lot of different ways, copying it to my clipboard has been the best all-purpose way of getting it into a script like this.

    The biggest innovation in this script, for me, is Popen and communicate from the subprocess module. This is really powerful stuff because it basically lets me execute shell commands and work with stdin and stdout just like I would if I were running ad hoc Terminal commands.

    In my script, communicate passes the Markdown text via stdin to the pandoc shell command, then sends the pandoc output through stdout back to Python as a string. This was a huge milestone because once I had the raw LaTeX in a text string within Python, it made it possible to use Python to clean the LaTeX any way I liked.
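
    As a rough sketch of that stdin/stdout round trip on its own, the snippet below swaps pandoc for a stand-in child process (the Python interpreter uppercasing its input) so it runs without pandoc installed. The helper name run_filter and the stand-in command are assumptions made for illustration; they are not part of the script above.

```python
import sys
from subprocess import Popen, PIPE

def run_filter(argv, text):
    """Send text to a command's stdin and return what it writes to stdout."""
    p = Popen(argv, stdin=PIPE, stdout=PIPE)
    # communicate() writes the input, closes stdin, reads stdout, and waits for exit
    out, _ = p.communicate(input=text.encode('utf-8'))
    if p.returncode != 0:
        raise RuntimeError('command failed with exit code %d' % p.returncode)
    return out.decode('utf-8')

# Stand-in for pandoc (an assumption for this sketch): a child Python that uppercases stdin
upper_cmd = [sys.executable, '-c',
             'import sys; sys.stdout.write(sys.stdin.read().upper())']

result = run_filter(upper_cmd, 'hello from stdin\n')
print(result)
```

    Swapping upper_cmd for the pandoc argument list from the script should give the same flow, with the returned string being the raw LaTeX.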

    The final part of the script runs several regular expression substitutions to clean the output more. Honestly, these could have been done just as easily with the basic replace method, but I have a commitment to myself to use regular expressions as often as possible to get better at them.
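
    To see what those substitutions do, here is a small sketch that applies the same three patterns to a hand-written sample. The sample only imitates the list layout described earlier (a \tightlist line, and each \item on its own line); it is not captured pandoc output, and the wrapper name clean_latex is made up for this example.

```python
import re

def clean_latex(latex):
    """Apply the same three substitutions used in the script above."""
    latex = re.sub(r'\\tightlist\n', r'', latex)         # remove \tightlist
    latex = re.sub(r'\\item\n\s+', r'\t\\item ', latex)  # join \item with its text, tab-indented
    latex = re.sub(r'\\label.*', r'', latex)             # remove labels
    return latex

# Hand-written sample (an assumption, not real pandoc output)
sample = (
    "\\begin{itemize}\n"
    "\\tightlist\n"
    "\\item\n"
    "  First item\n"
    "\\item\n"
    "  Second item\n"
    "\\end{itemize}\n"
)

cleaned = clean_latex(sample)
print(cleaned)

# The \tightlist step is a fixed string, so str.replace gives the same result
same_as_replace = sample.replace("\\tightlist\n", "") == re.sub(r'\\tightlist\n', r'', sample)
```

    The \item pattern is the one place where a regular expression is more convenient than replace, since \s+ absorbs however much indentation follows the line break.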

    Running the script results in LaTeX that I think is much easier to read and takes up a lot fewer lines:

    I’m sure there are a thousand better ways of accomplishing what this script does for me, but I’m really happy that I can continue writing more LaTeX-bound text in Markdown and know that the final LaTeX will be even easier to work with in the future. It’s also opened the door for a lot more Python automation, which on some level, is probably the entire point of investing time in automation.

    Two great iPhone scanning apps

    As great as the iPhone’s camera is for taking pictures of people, landscapes, and cappuccino milk foam, it’s also turned the iPhone into the most obvious choice for short to medium-length document scanning.

    I use ScanBot almost every day to capture everything from receipts to miscellaneous paper mail items. I keep it on a black and white filter setting, and the scan quality is incredible. The resolution of text is right on par with what I get from a desktop ScanSnap scanner, and when I OCR it later, the text layer is perfect 99.9% of the time.

    The other day, I even used ScanBot to scan an 80-page section of a textbook that I didn’t want to butcher for my ScanSnap. ScanBot recognizes page borders automatically and nearly instantly, making multi-page scanning a breeze. It’s amazing.

    Just recently I discovered TextGrabber via David Sparks. TextGrabber does just what the name says: you simply take a picture of something containing text, then select the portion of the photo containing the part you want. TextGrabber OCRs just that cropped region, allowing you to grab the text right away.

    For a lot of things, this is way more efficient than storing a photo or PDF that would have to be titled with keywords, saved, and/or OCR’ed later. Things like business card information, serial numbers, a block of text from a newspaper article, etc. TextGrabber is made by ABBYY, who makes, in my experience, the most accurate OCR software in the world. DEVONthink Pro Office ships with ABBYY’s OCR technology, and it’s by far the most thorough I’ve ever used.

    Inbox self flagellation

    Kristin Aardsma on letting go of Inbox Zero:

    Inbox Zero is an arbitrary goal; there will always be another customer email or phone call or tweet. Inbox Zero is a fruitless fight for control. It became an image that tied us to our screens, that swallowed any self-care practices we had instilled over the years. It also became a habit of treating our customers less like humans who needed support and more like screens to get rid of.

    I have a deep respect for, and I would go so far as to say a deep understanding of, the philosophical roots from which Merlin Mann grew Inbox Zero into an internet movement in the mid-2000s. But that was a very different epoch in internet history.

    In the mid-2000s, email was the primary form of text-based internet communication, and it was actually feasible to deal with it like a paper inbox. But so many inboxes have proliferated since then, and email has only gotten worse. Today, to have an Inbox Zero mindset is to be adversarial with one’s self.

    There is just too much to read. Too much to process. Most importantly, too little to be gained by zeroing out an inbox in any moment of earth’s rotation about its axis. In the last year, more than ever, I’ve realized the most enlightened decision is to let the dam break, walk away, and focus on work outside of open inboxes.

    Fraud culture survived the financial crisis

    Stacy Cowley writing about Wells Fargo:

    “Everybody knew there was fraud going on, and the people trying to flag it were the ones who got in trouble,” said Ricky M. Hansen Jr., a former branch manager in Scottsdale, Ariz., who was fired after contacting both human resources and the ethics hotline about illegal accounts he had seen being opened.

    This, along with her other compilation of personal accounts from Wells Fargo employees, is telling. Wells Fargo’s crimes (why is the word crime still not used more in the context of banks?) are some of the most significantly exposed since the financial crisis.

    I can’t help but wonder if all the time spent on massive pieces of bank regulation like Dodd-Frank was misguided. Regulation focuses on metrics, audit procedures, and corporate governance policies that are no match for real human behavior in an environment that rewards deception.

    I really feel like regulation needs to focus more on human behavior and group psychology. After all, the greatest systemic risks arise once the herd’s mentality shifts so that ethical behavior becomes the exception to fraudulent behavior.

    Though in a different context, this is exactly what brought the financial system to its knees in 2008. The few people that understood and acted in accordance with the truth became the fools.

    Tables are hard unless you own TableFlip

    I’ve been a fan of Christian Tietze for a while. He was one of the original people to fork Notational Velocity in a more Markdown-centric direction—a burst of evolution in 2010 that later culminated in Brett Terpstra and David Halter’s nvALT.

    Christian’s latest contribution to the world of plain text writing is TableFlip, a wonderful Mac app that solves a problem that’s as old as HTML: making tables for the web sucks.

    Even though Fletcher Penney’s MultiMarkdown made creating tables in a Markdown-kind-of-way possible—and significantly easier than hand-coding HTML tables—creating a table in plain text is still a visually challenging task. For very small tables (e.g. 2x2), it’s fairly straightforward. But for larger tables, the column alignment becomes cumbersome.
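
    To make the alignment chore concrete, here is a rough Python sketch of the padding that has to happen for a pipe-delimited table to line up in plain text. The function name align_table and the sample rows are invented for illustration, and the output is only meant to resemble a simple MultiMarkdown-style table (header row, dashed separator, body rows).

```python
def align_table(rows):
    """Pad every cell so the pipes line up; the first row is treated as the header."""
    # Each column is as wide as its longest cell (at least 3, so the dashes stay visible)
    widths = [max(3, *(len(cell) for cell in column)) for column in zip(*rows)]

    def format_row(cells):
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return '| ' + ' | '.join(padded) + ' |'

    header, *body = rows
    separator = '| ' + ' | '.join('-' * width for width in widths) + ' |'
    return '\n'.join([format_row(header), separator] + [format_row(r) for r in body])

# Sample rows made up for this sketch
table = align_table([
    ['App', 'Platform'],
    ['TableFlip', 'Mac'],
    ['Ulysses', 'Mac and iOS'],
])
print(table)
```

    Every edit to a cell can change a column's width, which means re-padding every other row by hand. That is the part that gets cumbersome as tables grow.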

    Before TableFlip, I usually created MultiMarkdown tables with a spreadsheet workflow. This worked well, but required a lot of ad hoc spreadsheet formula writing and also meant I had to store spreadsheet files indefinitely for any tables that I might want to edit later.

    TableFlip is the best of the plain text and spreadsheet-like world in one. It provides an intuitive spreadsheet-like tabular interface, which makes creating tables from scratch really easy.

    When you’re ready to plant the table in your plain text file, you can simply copy it as MultiMarkdown to your clipboard. As a bonus, it comes out beautifully aligned in plain text as well.

    You can go the other way, too. TableFlip can read an existing MultiMarkdown table into its tabular UI. You can even copy an existing MultiMarkdown table and create a new table in TableFlip from the clipboard.

    For me, this two-way feature is brilliantly simple and effective. It shows that TableFlip is made for anyone who knows all the practical difficulties of working with tables.

    I’m excited for the future of TableFlip. Christian is planning to add even more features, including LaTeX export. I really think TableFlip is a must-have for anyone that routinely creates tables as a part of any kind of plain text writing workflow. Like I said before, making tables sucks. But now I have to modify that statement:

    Making tables sucks unless you own TableFlip.

    At $18.99, it’s a no-brainer, but you can get it even cheaper than that through October 31.

    Diverting screenshots from the Desktop

    Downloads is by far the busiest folder on my Mac. Almost all files go to Downloads before other destinations. Hazel keeps a watchful eye on Downloads and does all sorts of file moving magic, never leaving anything there for more than 24 hours.

    For some silly reason I’ve always accepted that screenshots in OS X (now macOS) are saved directly to the Desktop. Today I finally learned that screenshots, too, can go straight to Downloads (or any other folder). No more F11’ing to view my Desktop. Now, like all other download-esque files, I can grab recent screenshots from the fan view of my Downloads dock shortcut.
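
    The post doesn’t say which setting was used, so the following is an assumption: a commonly documented approach is to set the location key of the com.apple.screencapture preferences with defaults write, then restart SystemUIServer so the change takes effect. A minimal Python sketch, with hypothetical function names, that builds those two commands and only runs them on macOS when asked to:

```python
import os
import subprocess
import sys

def screenshot_location_commands(folder):
    """Return the two commands that point macOS screenshots at a folder."""
    path = os.path.expanduser(folder)
    return [
        # Assumed preference domain and key for the screenshot save location
        ['defaults', 'write', 'com.apple.screencapture', 'location', path],
        # Restart SystemUIServer so the new location is picked up
        ['killall', 'SystemUIServer'],
    ]

def set_screenshot_location(folder):
    """Run the commands; check=True raises if either one fails."""
    for argv in screenshot_location_commands(folder):
        subprocess.run(argv, check=True)

commands = screenshot_location_commands('~/Downloads')
for argv in commands:
    print(' '.join(argv))

# Only touch system settings on macOS, and only when --apply is passed explicitly
if sys.platform == 'darwin' and '--apply' in sys.argv:
    set_screenshot_location('~/Downloads')
```

    Without --apply the script just prints the commands, which is a safer way to check the folder path before changing anything.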

    Survivor

    Modern survival is antithetical to everything evolution programmed us for. Today we have to:

    1. Eat much less than is available
    2. Move much more than we have to
    3. Take many more daily risks than we have to

    Today, complacency is the tiger rustling in the bushes. Vulnerability is the key to longevity.

    Applauding the epitome

    David Hansson on not hating modern Apple:

    There’s just something deeply inspiring about seeing what companies, teams, and people can accomplish at the peak of their ability. Especially when it’s happening not just for a single season, but as a reign of excellence.

    Post-1997 Apple, Alabama football under Nick Saban, even the US economy post-World War II: non-fans love to make digs at a perennial winner. But I agree with David. It’s OK to admire any person or entity that can sustain peak success.

    Survival and hunger will naturally motivate any organism. But it’s uniquely human to redefine the possible just for the sake of it. It’s the loneliest of places to work—at the top. But those that do elevate us all.

    Pros and cons of the wireless future

    The ear bud status quo

    • Cons
      • Persistently tangled ear bud wires
      • Wires catching on things while walking
    • Pros
      • No battery life issue
      • Nothing additional to charge
      • Cheap (they’re in every drawer of your home by now)

    The wireless AirPod future

    • Cons
      • Another wire on the night stand (another battery to charge)
      • No music when the battery dies
      • Additional pairing step for use with Mac, iPad, etc.
      • Easier to drop and lose
      • More expensive to buy/replace
    • Pros
      • Fewer tangled wire issues
      • Only plug in a wire once per day instead of throughout day
      • More practical to use in a car (but why?)

    Conclusion: This is going to take some courage.

    Decoupling the original vision

    Efficient creative work at the edge of the adjacent possible is basically:

    1. Envisioning your dream home fully built
    2. Finding the most elemental materials to build it
    3. Building a completely different and better home than you ever could have imagined

    Inefficient creative work is the failure to release the original vision—the false notion that you can know in advance what lies beyond the been-done-before.

    It was the glass-half-emptiest of times

    The Guardian:

    In an ironic vicious circle,… the despair that people feel – about developments including the rise of Trump – is the same kind of thing that fuels the rise of Trump. (The campaign to leave the European Union seemed similarly focused on sweeping away the status quo and hoping for the best.) The sense that the world is an increasingly terrible place, whether or not it really is, is itself a phenomenon with real effects that we can’t afford to ignore.

    Earlier in the article:

    The trouble is that, when it comes to getting an accurate grip on things, the modern media and the human brain are both strikingly poorly designed.

    Perhaps making the world better has to start with the realization that it’s far, far better than we perceive. No easy feat when our social information systems and senses prioritize negativity above all else.

    A penny for your apps

    David Sparks made a sobering observation on the closing down of the note taking app Vesper:

    If the dream team couldn’t make it work, who can?

    In my mind, the challenge faced by productivity app developers can be traced to the Jekyll and Hyde personas of a capitalist citizen:

    1. Joe Seller wants a monopoly market
    2. Joe Buyer wants a commodity market

    In most markets, profits spike initially then dissipate as #1 moves to #2. With modern software, this transition happens at the speed of thought. It doesn’t matter how prolific an app team might be. Never before have so many people known how to code. Never before have software products been so accessible to so many.

    The decentralization of software development and distribution makes establishing monopolies or seller-controlled markets virtually impossible. Software is an extreme opposite to a product category like pharmaceuticals. An app developer can’t pull off an EpiPen. As soon as a great app appears, clones abound.

    I could go into the App Store right now and find 20 great note taking apps whose pros and cons cancel out. Any one of them would be great for me, but I can’t use them all.

    Pricing apps as non-digital goods is hopeless in the long run. If you’ve read the Internet at all, you’ve seen what I call “the latte rationalization,” which goes something like this:

    If you spend $5 a day on coffee, why can’t you spend $5 one time on an app that benefits you every day?

    This is a great example of an argument that holds up from a rational perspective but fails spectacularly from a behavioral perspective.

    Most people who buy apps do so in response to the pleasurable feeling of experiencing something new. It’s a sensation—a very fleeting one. This is extremely different from the recurring pleasure people get from regularly consuming caffeine and sugar—substances that please a much more primitive part of the brain than cerebral software-based productivity can ever hope to in the current version of the human mind.

    As I’ve written here before, software is simply a form of encoded human information. People are willing to purchase information, but they will only spend so much, and they will only purchase the same information so many times. There is no pleasure center in the brain for redundancy.

    A service model might be the answer, but only if buyers believe the service is unique and essential. That is, it must provide new and useful information on an ongoing basis. For most successful business models, this means the real product is not the app, but the thoughts that pass through the app.

    We may step back one day and realize that the software economy was humankind’s first (inadvertent) success at valuing knowledge. For now, we’re learning the hard way that few thoughts are original.

    Procrastination is for the over-serious

    “Just start” is the generic advice you usually hear from someone trying to help a procrastinator. Another: “break it into smaller pieces.” Seems logical, but not very interesting.

    In my personal experience, the best advice is: “make it fun.”

    No matter your age, your mind still wants to have a good time. The best products are by-products of someone having a good time and enjoying the process. So change the setting, spend some money, or just goof off until something begins to materialize. Why not?

    DEVONthink To Go 2

    As a quick follow-up to my “state of my content address,” DEVONthink To Go 2 is now in the App Store. And it’s wonderful. The best all-media content manager for the Mac now has the iOS companion app it deserves.

    DTTG isn’t just a bandaid for my Evernote problems. It’s vastly superior for my uses. There are multiple options for syncing data, everything is encrypted in transit and storage, and the sync granularity is unrivaled. I can finally feel comfortable syncing sensitive documents like tax forms, and I can maintain massive archives of non-synced data on my Mac and local network without having them blow up the size of the databases on my iPhone. A place for everything.

    Ulysses batch export

    I agree with Federico Viticci that Ulysses is a great plain text writing app, and version 2.6 makes it even better. I bought both the Mac and iOS versions earlier this year and began using them for both professional and personal writing. Ulysses provides a consistent, tight writing experience on the Mac, iPad, and iPhone.

    But the more I wrote in Ulysses, the more this closing thought from Jason Snell’s early 2015 review of Ulysses began echoing in my head:

    Futureproofing is a big worry for me. And that’s why it might be dumb for me to do all my writing in Ulysses — sure, it’s possible for me to get my writing out if I stop using Ulysses. But is it practical?

    So eventually I stopped doing much personal writing in Ulysses because most of my personal writing is highly fragmented—bits and pieces of thoughts that sometimes sit idle for years before coalescing with other things. On one hand, this makes the Ulysses “sheet” concept perfect for the way I write, but it gets harder to fight the constant worry of always having access to those thoughts. That’s why I love open plain text storage of notes—I never have to worry about solving a tricky batch export from a proprietary system.

    Since I started using Ulysses, I had assumed there was no batch export of sheets because nothing in the UI indicates one, and Jason’s comments led me to believe sheets could only be exported one at a time (perhaps that was true in 2015).

    Fortunately you can batch export any group of Ulysses sheets to plain text by simply dragging them to an external folder. You have to add the external folder to Ulysses first. (You can’t just drag into the Finder because you’ll get unreadable .ulysses files.)

    Simple, effective, and future-proof enough for me.

    2016 is a great year for writing apps. Scrivener is also better than ever with its awesome new iOS app. There’s no reason not to try them all. Whatever it takes to keep writing—do that.