Since mathematical expressions play fundamental roles in Science, Technology, Engineering and Mathematics (STEM) documents, it is beneficial to extract meanings from formulae. Such extraction enables us to construct databases of mathematical knowledge, search for formulae, and develop a system that generates executable codes automatically.
TeX is widely used to write STEM documents and provides us with a way to represent meanings of elements in formulae in TeX by macros. As a simple example, we can define a macro `\def\inverse#1{#1^{-1}}` and use it as `$\inverse{A}$` in documents to make it clear that the expression means "the inverse of matrix~$A$" rather than "value~$A$ to the power of $-1$". Using such meaningful representations is useful in practice for maintaining document sources, as well as converting TeX sources to other formal formats such as first-order logic and content markup in MathML. However, this manner is optional and not forced by TeX. As a result, many authors neglect it and write messy formulae in TeX documents (even with a wrong markup).
To make it possible to associate elements in formulae and their meanings automatically instead of requiring it of authors, recently I began research on detecting or disambiguating the meaning for each element in formulae by conducting synthetic analyses on mathematical expressions and natural language text. In this presentation, I will show the goal of my research, the approach I'm taking, and the current status of the
work.