Deep Learning for Symbolic Mathematics

Review of paper by Guillaume Lample and François Charton, Facebook AI Research, 2019

This paper uses deep sequence-to-sequence models to perform integration and solve differential equations in symbolic form.

What can we learn from this paper?

The paper shows that deep neural network architectures developed for language translation can also perform complex symbolic mathematics. It is exciting that such problems can be tackled with deep learning.

However, you probably only need to read the paper thoroughly if you are actually interested in implementing these algorithms, as a lot of the discussion is devoted to specific implementation details.

Prerequisites (what one needs to be familiar with to understand the paper)

  • General calculus (integrals, derivatives, etc.)
  • Sequence-to-sequence models (not necessary, but helpful)


Even the best rule-based software packages still fail on many integrals and differential equations. However, any mathematical expression can be written as a tree, with operators and functions as internal nodes and numbers, constants, and variables as leaves, which allows automated parsing and transformation. This paper explores applying neural language translation models to expressions represented as such trees, serialized into token sequences in prefix notation, in order to perform symbolic integration and find closed-form solutions of differential equations.


The first obstacle to training neural networks to perform symbolic computations is the difficulty of generating labeled training data. The authors apply several techniques to create such data automatically. For integration, they use forward generation, where an existing software package integrates randomly generated functions, as well as backward generation, where a randomly generated function is differentiated (which is easy to do symbolically) and the derivative becomes the problem while the original function becomes its solution. For differential equations, they differentiate a certain class of functions with known solutions.
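The backward-generation idea can be sketched in a few lines of Python. Everything here is an illustrative assumption: the tuple encoding, the tiny operator set, and the arbitrary sampling in `random_tree` are not the authors' generator, and the sketch skips the simplification step entirely.

```python
import random

def diff(tree):
    """Differentiate a tuple-encoded expression with respect to x."""
    if tree == "x":
        return "1"
    if isinstance(tree, str):              # numeric constant
        return "0"
    op = tree[0]
    if op == "+":
        return ("+", diff(tree[1]), diff(tree[2]))
    if op == "*":                          # product rule
        return ("+", ("*", diff(tree[1]), tree[2]), ("*", tree[1], diff(tree[2])))
    if op == "sin":                        # chain rule
        return ("*", ("cos", tree[1]), diff(tree[1]))
    if op == "cos":
        return ("*", ("*", "-1", ("sin", tree[1])), diff(tree[1]))
    if op == "exp":
        return ("*", tree, diff(tree[1]))
    raise ValueError(f"unsupported operator: {op}")

def random_tree(rng, depth):
    """Sample a small random expression; the distribution is arbitrary."""
    if depth == 0:
        return rng.choice(["x", "2", "3"])
    op = rng.choice(["+", "*", "sin", "cos", "exp"])
    if op in ("+", "*"):
        return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))
    return (op, random_tree(rng, depth - 1))

def backward_pair(rng, depth=2):
    """Return (integrand, antiderivative): the input and target of one example."""
    f = random_tree(rng, depth)
    return diff(f), f

rng = random.Random(0)
integrand, antiderivative = backward_pair(rng)
print("input :", integrand)
print("target:", antiderivative)
```

The label is correct by construction, since the integrand is defined as the derivative of the target. No integration routine is needed, which is what makes this direction cheap.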

Once the training data is generated, simplified, and cleaned of invalid expressions, it is used to train models with the standard transformer architecture from the machine translation domain, described in the classic 2017 paper by Vaswani et al., which everyone interested in sequence models should read.

Once trained, the models were used to predict the answers to test problems. Beam search, a common decoding technique for sequence-to-sequence tasks, was used. The idea of beam search of size n is to keep only the n highest-scoring partial candidate solutions at each step while generating the output sequence, thus limiting the space and time complexity of the search.
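A minimal sketch of beam search, in Python, may help here. The `toy_model` function is a hypothetical stand-in for a trained decoder: it returns fixed next-token log-probabilities instead of running a network. The names and the `<eos>` end-of-sequence token are assumptions for illustration.

```python
import math

def beam_search(next_log_probs, beam_size, max_len, eos="<eos>"):
    """Return up to beam_size (tokens, score) hypotheses, best first.

    next_log_probs(prefix) -> dict mapping each possible next token to its
    log-probability given the tokens generated so far.
    """
    beams = [([], 0.0)]                       # (tokens so far, total log-prob)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == eos:  # finished hypotheses carry over
                candidates.append((tokens, score))
                continue
            for tok, lp in next_log_probs(tokens).items():
                candidates.append((tokens + [tok], score + lp))
        # Keep only the beam_size highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
        if all(t and t[-1] == eos for t, _ in beams):
            break
    return beams

def toy_model(prefix):
    """Hypothetical stand-in for a trained decoder (fixed distributions)."""
    if len(prefix) == 0:
        return {"sin": math.log(0.6), "cos": math.log(0.4)}
    if len(prefix) == 1:
        return {"x": math.log(0.9), "<eos>": math.log(0.1)}
    return {"<eos>": 0.0}

results = beam_search(toy_model, beam_size=2, max_len=5)
for tokens, score in results:
    print(tokens, round(math.exp(score), 2))
```

With `beam_size=1` the procedure reduces to greedy decoding, which keeps only the single most likely continuation at every step.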

In machine translation, small beam sizes are usually preferred, sometimes as low as 1 (greedy decoding). In this paper, larger beam sizes of 10 and 50 were also tried. Unlike a translation, a candidate answer to these symbolic math problems is easy to verify: an integral can be checked by differentiating it, and a solution of a differential equation by substituting it back into the equation. All n final outputs of beam search were therefore checked, and a problem was considered solved if at least one of them was correct.
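The "solved if any candidate is correct" criterion can be sketched as follows. Note the substitution: the check described above is symbolic, while this Python sketch uses a numerical stand-in (a central-difference derivative compared at a few sample points) to stay dependency-free. The function names, sample points, and tolerances are assumptions chosen for illustration.

```python
import math

def is_antiderivative(candidate, integrand,
                      points=(0.3, 0.7, 1.1, 1.9), h=1e-5, tol=1e-6):
    """Numerically test whether candidate'(x) matches integrand(x) at sample points."""
    for x in points:
        slope = (candidate(x + h) - candidate(x - h)) / (2 * h)   # central difference
        if abs(slope - integrand(x)) > tol * max(1.0, abs(integrand(x))):
            return False
    return True

def solved(candidates, integrand):
    """A problem counts as solved if any beam hypothesis passes the check."""
    return any(is_antiderivative(c, integrand) for c in candidates)

# Problem: integrate x * cos(x). The "beam" below is a hand-written list
# of hypotheses, not the output of a real model.
integrand = lambda x: x * math.cos(x)
beam = [
    lambda x: x * math.sin(x),                 # wrong: derivative is sin x + x cos x
    lambda x: x * math.sin(x) + math.cos(x),   # correct
    lambda x: math.sin(x) - x * math.cos(x),   # wrong: derivative is x sin x
]
print(solved(beam, integrand))
```

A numerical check like this can in principle accept a wrong candidate that happens to agree at the sample points, which is one reason a symbolic comparison is the more reliable choice in practice.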

For symbolic integration, beam size 1 already achieved very good results, which were only slightly improved by increasing the beam size to 10 and 50. On the authors' test sets, all of these results far exceeded the reported accuracy of the rule-based commercial solvers (Mathematica, MATLAB, Maple). For solving first- and second-order ODEs, the accuracy improved significantly when moving to larger beam sizes, once again exceeding that of Mathematica run with a timeout of 30 seconds.

To summarize, this paper seems to be an important early step in developing deep-learning symbolic computation.

Original paper link

GitHub repository
