Jens Nöckel's Homepage

From LaTeX to the Web: some simple examples with tex4ht

Keep it simple

LaTeX lets you create PDF and PostScript files directly, but these formats are historically more print-oriented. I say "historically" because that is no longer really true of the modern PDF specification. See, for example, my page on movies in PDF documents.

Nevertheless, the format of choice for web-based communication is currently HTML, and increasingly the more general XML. For about a decade, people have been waiting more or less patiently for MathML, an XML-based standard, to be implemented in common web browsers, and that is finally becoming a reality.

So the question is: how do you get all your LaTeX source files, which produce such beautiful PDF, to make beautiful HTML, too? Here I'll focus on one solution, based on tex4ht. [Note, July 2009: the creator of tex4ht is Eitan Gurari. Sadly, he died last month.] I get this package through fink.

Here is a sample LaTeX source:

\documentclass[12pt]{article}
\usepackage[latin1]{inputenc}
\usepackage{graphicx}
\begin{document}
This shows inline math, where $\alpha$ is related to $\sqrt{\beta}=2$. Not to be confused with \(\sum_{\nu}a_{\nu}x^{\nu}\) and finally
\[
\sin\frac{1}{2\gamma\sum_{\mu=1}^{\infty}C_{\mu}} \ne \alpha
\]

\begin{figure}[t]
\centering
\includegraphics{../IllustratorScreenShot.png}

\caption{This is an unrelated figure.}
\end{figure}

And bye.
\end{document}

Assuming this file is called tex4htExample.tex and can be processed successfully with pdflatex, you should get a PDF file that looks like the one linked here: tex4htExample.pdf
Now try the following two procedures from the command line:

  1. htlatex tex4htExample
    This produces (among others) a file tex4htexample.html which you can inspect here.
  2. /sw/share/tex4ht/bin/mzlatex tex4htExample (the path prefix is only needed if you have installed tex4ht from fink)
    This produces (among others) a file tex4htexample.xml which you can inspect here, provided your browser understands this XML dialect. Some more remarks on this approach are at the Tex4Moz web site.

The difference between the HTML in case 1 and the XML in case 2 is the way math is represented. In case 1, the displayed equation and some of the inline math are rendered as bitmaps, which makes the baseline of the inline square root look wrong. This is something that latex2html handles better out of the box.

Case 2, though, has no bitmapped math at all. Everything is represented using MathML, which means the structure of each mathematical expression is preserved instead of being flattened into an image. Even with case 1, improvements are possible by adding your own configuration commands to help htlatex avoid bitmaps. However, since MathML support is getting better, it may soon be unnecessary to worry about HTML export at all.
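As a rough sketch of where such private configuration commands for case 1 would live: tex4ht reads configuration files that are bracketed by \Preamble and \EndPreamble. The file name myconfig.cfg and the CSS rule below are illustrative assumptions only. The \Configure hooks that actually replace particular math bitmaps depend on the tex4ht version and are not shown here.

```latex
% myconfig.cfg: hypothetical name for a private tex4ht configuration file
\Preamble{html}   % options that would otherwise be passed on the command line
% Configuration commands go here, before \begin{document}.
% Illustrative assumption: a CSS rule that ends up in the generated style sheet.
\Css{body { max-width: 40em; margin: 0 auto; }}
\begin{document}
\EndPreamble
```

If this file sits next to the source, it should be picked up by passing its name as the first quoted argument: htlatex tex4htExample "myconfig"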

The figure is placed where it appears in the source, because the placement specifier [t] (for top of the page) has no meaning in an HTML document, which isn't page-oriented.
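To illustrate with a variation of the sample source (a sketch only; the two surrounding sentences are made up for this example): in the HTML output the figure should show up between the two paragraphs, right where the environment sits, whereas in the PDF it may float to the top of a page.

```latex
This paragraph introduces the figure.

\begin{figure}[t] % [t] only matters for the paginated (PDF) output
\centering
\includegraphics{../IllustratorScreenShot.png}
\caption{This is an unrelated figure.}
\end{figure}

This paragraph follows the figure in the HTML version.
```

So if the position of a figure matters for the web version, it is the position of the figure environment in the source that counts, not the placement specifier.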

The fonts used for math display in case 2 are still incomplete on my Mac. But there is hope. The Scientific and Technical Information Exchange (STIX) font creation project will release a free set of fonts that can be used across platforms and applications, specifically aimed at scientists who use TeX typesetting. Mozilla browsers will likely use these fonts once they are released.


noeckel@uoregon.edu
Last modified: Sat Feb 10 21:52:27 PST 2007