Live Markdown to PDF and HTML Rendering with Vim

12 October 2021

Linux - Vim - Python

Abstract

The hardest part about transitioning to Vim for me was losing interactive rendering for Markdown and LaTeX. It wasn’t a dealbreaker, since most of the interactive rendering I need is covered by the development tools of whatever framework or static site generator I’m working with, but those tools leave out single-file Markdown rendering. Since anything that makes its way onto GitHub is going to have a README.md, this is the one workflow that shows up in almost everything I do.

Overview

There were two IDEs whose functionality I had enjoyed and wanted to emulate. VS Code rendered Markdown out of the box with a simple keyboard shortcut, and TeXMaker provided PDF rendering on demand. If I had only wanted to preview Markdown live in a browser, I could have just sent the name of the current buffer over to grip. I still ended up doing this, since a live preview of how Markdown renders on GitHub is super helpful when writing READMEs. To get live PDF rendering, I turned to Python and wrote a simple wrapper that reruns a given pandoc command every time the source file passed to pandoc changes. After that, it came down to picking a method to asynchronously call either grip or my Python pandoc wrapper on the current buffer.

Live Markdown (GitHub Flavored) to HTML Rendering

As mentioned earlier, I chose grip for this. The main benefit of grip is that it is designed to render Markdown as closely as it can to how GitHub would. Since GitHub is essentially the only place where I upload raw Markdown, any other Markdown I render is likely going through some larger static site generator like Sphinx or Jekyll, and live previews of those files are easily handled within those ecosystems.

grip is able to match GitHub’s rendering because it uses GitHub’s API directly. The catch is that it only works online, and excessive use without authentication is going to cause rate limiting errors. I’ve yet to run into rate limiting myself, and since any Markdown I preview this way is more than likely going to be rendered by GitHub eventually, these are non-issues for me. If I wanted something that worked locally, I’d probably have reached for something like markdown-it-py and built it into what I ended up building for PDF rendering.
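To make the authentication and rate-limiting point concrete, here is a minimal standard-library sketch of the kind of call a tool like grip depends on: a POST to GitHub’s public Markdown rendering endpoint (https://api.github.com/markdown) carrying the text and a gfm mode. This is not grip’s code; the helper names are hypothetical, and the optional token only shows where authentication would go to get past the lower anonymous rate limit.

```python
import json
import urllib.request

GITHUB_MARKDOWN_API = "https://api.github.com/markdown"


def build_markdown_request(text, token=None, context=None):
    """Build, but do not send, a request to GitHub's Markdown endpoint.

    Without a token the request counts against GitHub's much lower
    anonymous rate limit; a personal access token raises that limit.
    """
    payload = {"text": text, "mode": "gfm"}
    if context:
        # "owner/repo": lets gfm mode resolve references like #123
        payload["context"] = context
    headers = {
        "Accept": "application/vnd.github+json",
        "Content-Type": "application/json",
    }
    if token:
        headers["Authorization"] = "Bearer {}".format(token)
    return urllib.request.Request(
        GITHUB_MARKDOWN_API,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def render_markdown(text, token=None, context=None):
    """Send the request and return the rendered HTML (needs network access)."""
    request = build_markdown_request(text, token, context)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

build_markdown_request only constructs the request, so it can be inspected offline; render_markdown is the only part that actually needs a network connection.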

Live Markdown (pandoc) to PDF Rendering

Since I was focused on single-document rendering, I was able to write something in Python simple enough to satisfy my needs. The script had two main jobs:

  1. Watch a specific file for changes.
  2. Run a predefined pandoc command with that file as input whenever a change occurs.

The only other requirement was simplicity. I didn’t want to define my own interface that I’d then have to maintain to stay compatible with pandoc.
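Those requirements are small enough to be met without any dependencies. As a point of comparison with the watchdog-based approach described below, here is a minimal sketch that swaps OS-level file notifications for plain modification-time polling; the function names and the max_checks parameter are hypothetical and not part of live-pandoc.

```python
import os
import subprocess
import time


def poll_once(src_file, last_mtime):
    """Return (changed, mtime): whether src_file's modification time moved."""
    mtime = os.stat(src_file).st_mtime
    return mtime != last_mtime, mtime


def watch_and_run(src_file, cmd, interval=0.5, max_checks=None):
    """Re-run cmd whenever src_file changes; return how many times it ran.

    max_checks is a hypothetical knob so the loop can stop (e.g. in tests);
    leave it as None to poll until interrupted with Ctrl-C.
    """
    _, last_mtime = poll_once(src_file, None)
    runs = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        time.sleep(interval)
        checks += 1
        changed, last_mtime = poll_once(src_file, last_mtime)
        if changed:
            subprocess.run(cmd)  # e.g. ["pandoc", src_file, "-o", "out.pdf"]
            runs += 1
    return runs
```

Polling is simpler, but it wakes up every interval even when nothing has changed, which is one reason to lean on watchdog instead.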

Usage

To satisfy these conditions, I wrote a small Python package called live-pandoc. It can be installed with pip:

$ pip install live-pandoc

It is designed to work exactly like a pandoc command in which the input file is supplied as the first argument. For example, if you would render a README to PDF with pandoc like this:

$ pandoc README.md -o README.pdf

The live-pandoc equivalent would simply be:

$ live-pandoc README.md -o README.pdf

live-pandoc then watches README.md for changes and runs pandoc README.md -o README.pdf every time one occurs.

How it Works

live-pandoc is extremely simple and depends only on watchdog, which handles the file watching. The script relies on two small classes: one that acts as the main observer, and another that handles the modification events the observer reports for our Markdown file.

The Handler

For the handler, we subclass FileSystemEventHandler from watchdog.events. This handler lets us override the following methods to define our own behavior: dispatch, on_any_event, on_created, on_deleted, on_modified, and on_moved. In our case, we only care about what happens when the file is modified, so we’ll only override on_modified.

This means the class has to run the desired pandoc command each time the file we care about changes. First, let’s make sure the class can be initialized with all the information it will need to do this.

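import os
import subprocess
import threading

from watchdog.events import FileModifiedEvent, FileSystemEventHandler


class PandocRenderHandler(FileSystemEventHandler):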

    DEFAULT_OUTFILE = 'out.pdf'

    def __init__(self, src_file, *pandoc_args):
        super(PandocRenderHandler, self).__init__()
        self._pandoc_thread_timer = None
        self.src_file = os.path.abspath(src_file)
        if pandoc_args:
            self.pandoc_cmd = ('pandoc', self.src_file) + pandoc_args
        else:
            self.pandoc_cmd = ('pandoc', self.src_file, '-o', PandocRenderHandler.DEFAULT_OUTFILE)

This allows us to initialize the handler with just an input file to watch, src_file, plus any other arguments we want to pass along to pandoc. It also allows us to initialize a handler without any pandoc arguments, in which case it renders to a file called out.pdf in the current working directory.

From here we’re going to define the method that allows us to render the document with pandoc.

class PandocRenderHandler(FileSystemEventHandler):

    # ...

    def pandoc_render(self):
        print("Running: {}".format(" ".join(self.pandoc_cmd)))
        process = subprocess.run(
            self.pandoc_cmd,
            stderr=subprocess.PIPE,
            stdout=subprocess.PIPE,
            universal_newlines=True
        )
        if process.returncode:
            print("WARNING: the following error was encountered from pandoc:")
            print(process.stderr)
        self._pandoc_thread_timer = None

This runs the pandoc command through subprocess.run, capturing its stdout and stderr. If pandoc exits with a non-zero return code, we print a warning followed by pandoc’s stderr, so any error on the pandoc side shows up in the terminal. Once that’s done, we set an instance variable called _pandoc_thread_timer to None.

This _pandoc_thread_timer is there to handle the fact that it’s extremely common to get more than one FileModifiedEvent for what you want to treat as a single modification. To solve this, we dispatch our rendering method with a threading.Timer thread. This adds a slight delay before pandoc_render is called, and if we get another FileModifiedEvent within that window (currently 0.25 seconds), we cancel the current timer and start a new one, which batches all the updates into a single render.

class PandocRenderHandler(FileSystemEventHandler):

    # ...

    def start_render_timer(self):
        self._pandoc_thread_timer = threading.Timer(0.25, self.pandoc_render)
        self._pandoc_thread_timer.start()

    def restart_render_timer(self):
        if self._pandoc_thread_timer:
            self._pandoc_thread_timer.cancel()
        self.start_render_timer()
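The same debouncing idea can be pulled out into a small reusable class. This is a sketch rather than live-pandoc’s code; the Debouncer name and its generation counter are my own additions. The counter guards against a subtle race in the timer pattern: if a new event arrives just as an old timer fires, the old callback could otherwise clear the reference to the newer timer, allowing two renders for one burst of events.

```python
import threading


class Debouncer:
    """Collapse a burst of trigger() calls into a single call of func,
    made `delay` seconds after the last trigger."""

    def __init__(self, delay, func):
        self.delay = delay
        self.func = func
        self._timer = None
        self._generation = 0
        self._lock = threading.Lock()

    def trigger(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # the burst isn't over; drop the pending call
            self._generation += 1
            self._timer = threading.Timer(
                self.delay, self._fire, args=(self._generation,)
            )
            self._timer.daemon = True
            self._timer.start()

    def _fire(self, generation):
        with self._lock:
            if generation != self._generation:
                return  # a newer trigger superseded this timer
            self._timer = None
        self.func()  # runs on the timer thread, outside the lock
```

Wiring it into the handler would amount to creating Debouncer(0.25, self.pandoc_render) in __init__ and calling its trigger method from on_modified.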

Now we can override the on_modified method and check whether the event was a FileModifiedEvent on the src_file that the handler is tracking. If there is a timer currently running, we call restart_render_timer; if there isn’t, we call start_render_timer, which sets up a thread to start the render in 0.25 seconds.

class PandocRenderHandler(FileSystemEventHandler):

    # ...

    def on_modified(self, event):
        if isinstance(event, FileModifiedEvent) and event.src_path == self.src_file:
            if self._pandoc_thread_timer:
                self.restart_render_timer()
            else:
                self.start_render_timer()

The main limitation of this implementation is that we only call pandoc_render when our single file has changed. It doesn’t track whether we move or delete the file we called src_file, and if you’re rendering Markdown that spans multiple files, it won’t watch for changes to those other files.

These limitations could be addressed by overriding the on_moved and on_deleted methods, and by reacting to changes of any Markdown (or other document) file in on_modified instead of just the input file. I built this to be called with respect to a single Vim buffer, which is why I’ve restricted it to watching that one file. I didn’t want to get into the weeds, since any multi-file Markdown rendering I do is normally handled by some other framework, but if your workflow is entirely Vim, Markdown/LaTeX, and pandoc, changing how on_modified filters render events should let you extend this to that use case easily.
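As a sketch of that extension, the filter below accepts the tracked file plus any file in the same directory tree with a document-like extension. The should_render name and the extension list are assumptions to adjust for your project. The rendered output must not match the filter, or each render would trigger another one.

```python
import os

# Hypothetical set of extensions whose changes should trigger a re-render.
WATCHED_EXTENSIONS = (".md", ".markdown", ".tex", ".bib")


def should_render(event_path, src_file, extensions=WATCHED_EXTENSIONS):
    """Decide whether a modification at event_path warrants a re-render.

    The tracked src_file always counts; any other file counts if it lives
    under src_file's directory and has one of the watched extensions.
    """
    # realpath normalizes both sides so symlinked paths still compare equal
    event_path = os.path.realpath(event_path)
    src_file = os.path.realpath(src_file)
    if event_path == src_file:
        return True
    root = os.path.dirname(src_file)
    in_tree = os.path.commonpath([root, event_path]) == root
    return in_tree and event_path.lower().endswith(extensions)
```

In the handler, the check in on_modified would then become isinstance(event, FileModifiedEvent) and should_render(event.src_path, self.src_file).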

The Observer

Most of the hard work is already done. We can rely on the base Observer from watchdog.observers to watch a directory of our choice. All we need to do is call the schedule method on the initialized observer with our PandocRenderHandler and the directory we want it to watch, then start it, and we’re good.

I’ve wrapped this in a class only so that the observer can be stopped either through a stop method on the class or when a KeyboardInterrupt is raised.

import os
import time

from watchdog.observers import Observer


class MarkdownWatcher:

    def __init__(self, src_file, *pandoc_args):
        self._kill = False
        if not os.path.isfile(src_file):
            raise FileNotFoundError("The file {} does not appear to exist.".format(src_file))
        file_dir = os.path.dirname(os.path.abspath(src_file))
        self.observer = Observer()
        self.handler = PandocRenderHandler(src_file, *pandoc_args)
        self.observer.schedule(self.handler, file_dir)

    def run(self):
        self.observer.start()
        try:
            while True:
                if self._kill:
                    self.observer.stop()
                    break
                time.sleep(0.1)
        except KeyboardInterrupt:
            self.observer.stop()
        self.observer.join()

    def stop(self):
        self._kill = True
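A variation on the same run/stop structure replaces the _kill flag and time.sleep loop with threading.Event, whose wait method returns as soon as stop() sets the event. The sketch below is hypothetical: start and shutdown stand in for observer.start and a function that calls observer.stop() and then observer.join().

```python
import threading


class StoppableRunner:
    """Sketch of MarkdownWatcher.run()/stop() built on threading.Event."""

    def __init__(self, start, shutdown, poll_interval=0.1):
        self._start = start        # e.g. observer.start
        self._shutdown = shutdown  # e.g. stops and joins the observer
        self._poll_interval = poll_interval
        self._stop_event = threading.Event()

    def run(self):
        self._start()
        try:
            # wait() returns True as soon as stop() sets the event, and False
            # after poll_interval otherwise, so the loop wakes up periodically
            # and Ctrl-C is handled promptly.
            while not self._stop_event.wait(self._poll_interval):
                pass
        except KeyboardInterrupt:
            pass
        finally:
            self._shutdown()

    def stop(self):
        # Safe to call from any thread.
        self._stop_event.set()
```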

Running as a Script

With these pieces, using it as a simple command line script was as simple as the following:

import sys

# usage(), which prints the help text, is not shown here.


def main():
    if len(sys.argv) == 1:
        print("Error: No arguments given.")
        print()
        usage()
        sys.exit(1)
    elif len(sys.argv) == 2:
        file = sys.argv[1]
        if file in ('-h', '--help', 'help'):
            usage()
            sys.exit(0)
        pandoc_args = ()
    else:
        file = sys.argv[1]
        pandoc_args = sys.argv[2:]

    watcher = MarkdownWatcher(file, *pandoc_args)
    watcher.run()

if __name__ == '__main__':
    main()

This makes the simple assumption that the first argument is the file (unless it’s -h, --help, or help) and that any other arguments should be passed along to pandoc. It then initializes the watcher and starts it by calling the run method.
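Because main() reads sys.argv and immediately starts a watcher, the mapping from command line to pandoc command is hard to test on its own. Pulling it into pure functions, as in this sketch (parse_argv and pandoc_command are hypothetical names), makes it easy to check that live-pandoc README.md -o README.pdf really results in pandoc README.md -o README.pdf, with the input path made absolute as in PandocRenderHandler.

```python
import os

HELP_FLAGS = ("-h", "--help", "help")


def parse_argv(argv):
    """Split argv into (src_file, pandoc_args), mirroring main().

    Returns None when help was requested and raises ValueError when no
    file was given, instead of printing usage and exiting.
    """
    args = list(argv[1:])
    if not args:
        raise ValueError("No arguments given.")
    if len(args) == 1 and args[0] in HELP_FLAGS:
        return None
    return args[0], tuple(args[1:])


def pandoc_command(src_file, pandoc_args=(), default_outfile="out.pdf"):
    """Build the pandoc command the way PandocRenderHandler.__init__ does."""
    src = os.path.abspath(src_file)
    if pandoc_args:
        return ("pandoc", src) + tuple(pandoc_args)
    return ("pandoc", src, "-o", default_outfile)
```

main() would then reduce to calling parse_argv(sys.argv), printing usage on None or ValueError, and handing the result to MarkdownWatcher.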

Integrating with Vim

Now that we have ways to live render Markdown to HTML and PDF, we need to find a way to hook these into Vim. The goal was that if I was currently editing a markdown file, I could simply hit an F key to start rendering into either live HTML or PDF.

Picking a Runner

I chose to use a plugin called AsyncRun, simply because I like that it populates the command output into the QuickFix window. There are other asynchronous job runners it could be swapped for depending on your needs: Neomake comes to mind, as does tpope’s vim-dispatch, and apparently in Neovim this can be handled directly with Lua; see this post. The goal is simply to be able to call out to live-pandoc or grip when we press a key, and then kill that process with another key.

I added AsyncRun to the vim-plug section of my ~/.vimrc and ran :PlugInstall after modifying it.

call plug#begin("~/.vim/plugged")
" Other plugins
Plug 'skywind3000/asyncrun.vim'
" Rest of plugins
call plug#end()

Customizing our Runners

Then I placed any custom configuration for the plugin in ~/.vim/after/plugin/asyncrun.vim. Files in ~/.vim/after/plugin are sourced after the regular plugins have been loaded, and keeping everything in one file means that if I ever remove AsyncRun, I don’t have to hunt through my .vimrc for leftover configuration to get a working .vimrc again. While I could just put all of the configuration after call plug#end(), I’ve found that organizing my configuration like this keeps things cleaner and much easier to manage.

In ~/.vim/after/plugin/asyncrun.vim, I also have custom runners for Python, Vimscript, JavaScript, and bash/shell scripts in addition to the ones for Markdown, plus functions that track the status of a job started from a buffer and return an icon that can be displayed in my status line. I’m not going to dive into the details of those, but that’s what the g:job_presenting variable below takes care of, in case it’s not clear why I track it here.

The first thing I did was bind F3 to toggle the QuickFix window that displays the output of AsyncRun, and I bound F4 to kill any process that AsyncRun was currently running.

" Toggle quickfix for asyncrun with <F3>
noremap <silent> <F3> :call asyncrun#quickfix_toggle(8)<CR>
" Kill running async job with <F4>
noremap <F4> :call AsyncKill()<CR>

After that I defined the functions that would call out to live-pandoc/grip based on my current buffer.

The call to grip is easy, since all grip needs is the name of the buffer, which AsyncRun lets us access as $(VIM_FILEPATH); calling it with the -b flag opens the result in the default browser.

The call to live-pandoc first runs an initial render with plain pandoc, saving the output under the same name as the current Markdown file but with the .md extension replaced by .pdf. It then calls xdg-open on the newly rendered PDF to open it in the default PDF viewer, before finally calling live-pandoc with the same arguments as the initial pandoc call to watch for changes. This way, when live-pandoc rewrites the rendered PDF, it is already open, and the viewer should reload it when it detects the change.

" Render markdown to Github HTML via grip
function! StartMarkdownHTMLPreview() abort
	let g:job_presenting = 1
	AsyncRun -raw grip -b "$(VIM_FILEPATH)"
endfunction

"Render markdown to PDF via pandoc
function! StartMarkdownPDFPreview() abort
	let g:job_presenting = 1
	AsyncRun -raw pandoc "$(VIM_FILEPATH)" -o %:p:r.pdf; xdg-open %:p:r.pdf; live-pandoc "$(VIM_FILEPATH)" -o %:p:r.pdf
endfunction
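For readers not using AsyncRun, the one-liner in StartMarkdownPDFPreview() can be approximated in Python. This sketch is illustrative only: the function names are hypothetical, and it assumes pandoc, xdg-open, and live-pandoc are all on the PATH.

```python
import os
import subprocess


def preview_commands(md_path):
    """Build the three commands StartMarkdownPDFPreview() chains with ';'.

    Vim's %:p:r is the full path minus the last extension, which
    os.path.splitext reproduces here.
    """
    src = os.path.abspath(md_path)
    pdf = os.path.splitext(src)[0] + ".pdf"
    return [
        ("pandoc", src, "-o", pdf),       # initial render so the PDF exists
        ("xdg-open", pdf),                # open it in the default viewer
        ("live-pandoc", src, "-o", pdf),  # re-render on every save (blocks)
    ]


def run_preview(md_path):
    # Like ';' in the shell, run each step even if the previous one failed.
    # Unlike the shell, a missing executable raises FileNotFoundError here.
    for cmd in preview_commands(md_path):
        subprocess.run(cmd)
```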

Finally, we’ll set it up so that if we load a markdown file, we’ll bind F5 to render with grip to HTML, and F6 to render with live-pandoc to PDF in that buffer.

augroup MarkdownRunners
  autocmd!
  autocmd Filetype markdown map <silent> <buffer> <F5> <ESC>:w<CR>:call StartMarkdownHTMLPreview()<CR>
  autocmd Filetype markdown map <silent> <buffer> <F6> <ESC>:w<CR>:call StartMarkdownPDFPreview()<CR>
augroup END
