Journal of Information Science and Engineering, Vol. 36 No. 5, pp. 1021-1034

Rule Based Conversion of LaTeX Math Equations into Content MathML (CMML)

This paper formulates grammar rules for LaTeX math equations and proposes a rule-based algorithm that converts LaTeX math expressions into Content MathML (CMML), which adds semantic enrichment to web documents. The rules for writing LaTeX math equations are implemented as a LaTeX Math Grammar (LMG), which is used to generate an Abstract Syntax Tree (AST) that captures the structural information of a mathematical expression given in LaTeX format. The AST is then converted into an XML structure, which makes the expression machine-readable in heterogeneous environments and from which the CMML encoding is generated. The conversion algorithm is first tested on 20 equations used in an NTCIR-12 math competition, and then on the NTCIR-12 Wikipedia-MathIR and ArXiv data sets. The results show that our algorithm converts complex LaTeX equations into CMML more extensively than existing converters and that its time efficiency is better than that of contemporary systems.

Keywords:
LaTeX AST, XML tree, math grammar, CMML conversion, semantic analyzer
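
As an illustration of the pipeline summarized in the abstract (LaTeX string, then AST, then XML/CMML), the following is a minimal sketch in Python. It is not the paper's LMG or conversion algorithm: the toy grammar, the tuple-based AST, and the names `tokenize`, `Parser`, `to_cmml`, and `latex_to_cmml` are assumptions made only for this example, and the grammar covers just numbers, single-letter identifiers, `+`, `-`, `*`, `^`, `\frac`, braces, and parentheses.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy grammar (an assumption, not the paper's LMG):
#   expr  := term (('+' | '-') term)*
#   term  := power ('*' power)*
#   power := atom ('^' atom)?
#   atom  := NUMBER | LETTER | '\frac' group group | group | '(' expr ')'
#   group := '{' expr '}'
TOKEN_RE = re.compile(r"\s*(\\[A-Za-z]+|\d+(?:\.\d+)?|[A-Za-z]|[-+*^{}()])")


def tokenize(src):
    """Split a LaTeX math string into a flat list of tokens."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN_RE.match(src, pos)
        if match is None:
            raise SyntaxError(f"cannot tokenize at position {pos}: {src[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser that builds a tuple-based AST."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = "plus" if self.take() == "+" else "minus"
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.power()
        while self.peek() == "*":
            self.take()
            node = ("times", node, self.power())
        return node

    def power(self):
        base = self.atom()
        if self.peek() == "^":
            self.take()
            return ("power", base, self.atom())
        return base

    def group(self):
        self.take("{")
        node = self.expr()
        self.take("}")
        return node

    def atom(self):
        tok = self.peek()
        if tok == "\\frac":
            self.take()
            return ("divide", self.group(), self.group())
        if tok == "{":
            return self.group()
        if tok == "(":
            self.take()
            node = self.expr()
            self.take(")")
            return node
        tok = self.take()
        if tok[0].isdigit():
            return ("cn", tok)   # numeric literal
        if tok.isalpha():
            return ("ci", tok)   # identifier
        raise SyntaxError(f"unexpected token {tok!r}")


def to_cmml(node):
    """Map an AST node to a Content MathML element (apply/ci/cn)."""
    kind = node[0]
    if kind in ("cn", "ci"):
        leaf = ET.Element(kind)
        leaf.text = node[1]
        return leaf
    apply_el = ET.Element("apply")
    ET.SubElement(apply_el, kind)  # operator element, e.g. <plus/>
    for child in node[1:]:
        apply_el.append(to_cmml(child))
    return apply_el


def latex_to_cmml(src):
    parser = Parser(tokenize(src))
    ast = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    math = ET.Element("math", xmlns="http://www.w3.org/1998/Math/MathML")
    math.append(to_cmml(ast))
    return ET.tostring(math, encoding="unicode")


if __name__ == "__main__":
    print(latex_to_cmml(r"\frac{a+b}{2} - x^2"))
```

In this sketch the AST node labels are chosen to coincide with Content MathML element names (`plus`, `minus`, `times`, `divide`, `power`, `ci`, `cn`), so the tree-to-XML step reduces to wrapping each operator node in an `apply` element. A fuller grammar would need explicit rules for functions, relations, and other LaTeX constructs.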