Journal of Information Science and Engineering, Vol. 36 No. 1, pp. 75-89

Retrieval of Mathematical Information with Syntactic and Semantic Structure over Web

Department of Computer Science
Institute of Business Administration
Karachi, 74400 Pakistan
E-mail: {shussain; skhoja}@iba.edu.pk


Efficient retrieval of mathematical expressions over the web is more complex than simple text search. It is possible only when the syntactic (e.g., textual) and semantic (e.g., structural) information of a mathematical expression is retrieved properly and analyzed methodically. In this paper, we propose a technique that indexes expressions along with their syntactic and semantic information. The expressions are represented in Content MathML (CMML). To improve the memory efficiency of the index, we introduce an encoding technique that encodes CMML mathematical expressions in Braille Unicode characters. To improve the ranking of retrieved documents, we introduce a weighting function that assigns a weight to each indexing term. The weight of each term contributes to a ranking function, which improves the rank of documents that contain query terms. The proposed technique is evaluated on the NTCIR-12 Wikipedia and arXiv corpora, and performance is measured using the NTCIR-MathIR evaluation criteria. At the top 5 documents, the precision is 47% for Wikipedia formula queries and 44% for arXiv.

Keywords: information retrieval, formula retrieval, term ranking, structure matching, term encoding, formula indexing
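
As a rough illustration of the encoding idea mentioned in the abstract, the following minimal sketch maps CMML element names to single characters from the Unicode Braille Patterns block (U+2800 to U+28FF). The element list, the assignment order, and the names `CMML_ELEMENTS`, `encode_tokens`, and `decode_string` are assumptions made for this example only; they are not the symbol table or procedure used in the paper.

```python
# Hypothetical vocabulary of Content MathML element names (an assumption for
# illustration; the actual symbol table is not given in this section).
CMML_ELEMENTS = ["apply", "plus", "minus", "times", "divide",
                 "power", "ci", "cn", "eq"]

# Start of the Unicode Braille Patterns block (U+2800..U+28FF).
BRAILLE_BASE = 0x2800

# Assign code points from U+2801 upward, skipping the blank pattern U+2800.
ENCODE = {name: chr(BRAILLE_BASE + 1 + i)
          for i, name in enumerate(CMML_ELEMENTS)}
DECODE = {ch: name for name, ch in ENCODE.items()}


def encode_tokens(tokens):
    """Map a sequence of CMML element names to one Braille character each."""
    return "".join(ENCODE[t] for t in tokens)


def decode_string(encoded):
    """Recover the element names from an encoded Braille string."""
    return [DECODE[ch] for ch in encoded]


# Element names in document order for x + 2, written in CMML as
# <apply><plus/><ci>x</ci><cn>2</cn></apply> (identifier and number
# values are ignored in this sketch).
tokens = ["apply", "plus", "ci", "cn"]
encoded = encode_tokens(tokens)
```

Here `encoded` holds one character per element name, and `decode_string(encoded)` returns the original list, so the mapping is reversible. A block of 256 code points limits such a scheme to 256 distinct symbols per character, which is one reason a real encoder may need a different assignment than the one assumed here.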

