I'm going to collect information that I believe to be workable, without having done a huge amount of testing. That's simply because I don't have MS Office on this Mac computer. I almost never use OpenOffice or LibreOffice, either — so the information on this page may be outdated.
Probably the biggest challenge is the conversion of formula objects into LaTeX. I'll return to this at the end. Conversion is a game that can be played at different levels of sophistication, and I'm looking for the simplest and cheapest routes here.
This is not a complete list of possible routes. More alternatives can be found at TUG. I'm only listing the things that I think are really worth trying.
Assume you have a Word file text.doc. Here I'll list some ways of dealing with this file, ranked in order of their quality:
textutil -convert html text.doc

The converted HTML document (text.html) has graphics and bitmapped formulas included. HTML is in principle a very readable source format, and at this point I would say one gains almost nothing by taking the extra step of converting it to LaTeX. The main point of LaTeX for me would be to be able to edit math formulas easily, but HTML conversion eliminates that possibility because it turns formulas into bitmaps.

Nevertheless, there are several converters that all share the obvious name html2latex but differ in their capabilities as well as their implementation. An official place where you can find these converters (plus converters from HTML to other formats) is html2things. Most of these are so old that they don't recognize modern HTML tags or style sheets. I've tried and ruled out the sed script, and found LaTeX bugs with nc-html2latex, so the best remaining choice ended up being HTML to LaTeX (version 2.7). The fact that this converter has no graphics support is irrelevant here, for the reason stated above: images can't be edited anyway.
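Before bothering with html2latex at all, it can help to see how much of the document textutil turned into pictures, since every one of them is a graphic or formula you would have to redo by hand in LaTeX. The following is a minimal Python sketch of that check, not a tested tool. It assumes textutil writes each picture as an ordinary <img> element (I haven't verified how it names or stores those files), and the helper names convert_with_textutil and list_images are made up for illustration.

```python
import shutil
import subprocess
import sys
from html.parser import HTMLParser
from pathlib import Path


def convert_with_textutil(doc_path: str) -> Path:
    """Run macOS textutil on a Word file and return the path of the HTML result.

    By default textutil writes text.html next to text.doc.
    Raises RuntimeError when textutil is missing (it only ships with macOS).
    """
    if shutil.which("textutil") is None:
        raise RuntimeError("textutil not found; it is only available on macOS")
    subprocess.run(["textutil", "-convert", "html", doc_path], check=True)
    return Path(doc_path).with_suffix(".html")


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> element.

    HTMLParser lowercases tag and attribute names, and its default
    handle_startendtag forwards self-closing <img ... /> tags here too.
    """

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)


def list_images(html_text: str) -> list[str]:
    """Return the image sources referenced by an HTML document, in order."""
    collector = ImageCollector()
    collector.feed(html_text)
    collector.close()
    return collector.sources


if __name__ == "__main__":
    # Usage (macOS only): python3 check_images.py text.doc
    html_file = convert_with_textutil(sys.argv[1])
    images = list_images(html_file.read_text(encoding="utf-8", errors="replace"))
    print(f"{len(images)} image(s) referenced in {html_file}:")
    for src in images:
        print("  ", src)
```

The script name check_images.py is arbitrary. The count can't tell formulas from ordinary figures, but a long list is a quick hint that the HTML route will leave you with a lot of uneditable math.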