14 January

Developing a symbolic-expression library with C#. Part I. Computation, derivation, simplification, and other features

Original author: WhiteBlackGoose

For some reason, programming a calculator is associated with something that every beginner should do. Perhaps, the reason is history: computers were created exactly for computation. But we will develop a smart calculator, not sympy of course, but to be able to do basic algebraic operations, such as differentiation, simplicification, and features like compilation to speed up calculations.

What are the articles about?
It will superficially tell about assembling an expression, parsing from a string, variable substitution, analytic derivative, equation numerical solving, and definite integration, rendering to LaTeX format, complex numbers, compiling functions, simplifying, expanding brackets, and blah blah blah.
For those who urgently need to clone something, repository link.

Let's do it!

Who is this article for?
I think that the article may be useful for a beginner, but maybe those who are a little more experienced will also find something interesting. However, I hope to write an article so that it can be read without being a C# programmer at all.

Expression assembly

What is an expression?

When I was a child...
Then of course I wanted to write a calculator. What should it be able to do? Four basic operations, and that's enough. So, my goal was to calculate the value from a string expression, for example, «1 + (3/4 — (5 + 3 * 1))». I started my favorite Delphi 7 and wrote a parser, which first recursively processed parentheses, and then replaced the expression in parentheses with the calculated value, and removed the braces. Basically, a quite working way for me at that time.

Of course, this is not a string. It is pretty obvious that a mathematical formula is either a tree or a stack, and here we stop at the first one. So, each node of this tree is some kind of operation, variable, or constant.

An operation is either a function or an operator, which are actually about the same. Its children are arguments of a function (operator).

Class hierarchy in your code

Of course, the implementation can widely vary. However, the idea is that if your tree only consists of nodes and leaves, then they differ from each other being generalized by something. I call these «things» — entities. Therefore, the upper class will be the abstract class Entity.

As everyone knows from basic programming courses, an abstract class generalizes some classes on the one hand, and allows you to separate the logic and behavior of some objects on the other. An instance of an abstract class cannot be created, but its child can.

And there will also be four child classes: NumberEntity, VariableEntity, OperatorEntity, FunctionEntity.

How to build an expression?

We will start with building an in-code expression, i.e.

var x = new VariableEntity("x");
var expr = x * x + 3 * x + 12;

If you declare an empty class VariableEntity, then such a code will throw an exception, something like «I don't know how to multiply and add.»

Overriding Operators

Is a very important and useful feature of most languages, it allows you to customize the execution of arithmetic operations. It is syntactically implemented differently depending on the language. For example, an implementation in C#

public static YourClass operator +(YourClass a, YourClass b) {
    return new YourClass(a.ToString() + b.ToString());

Learn more about overriding statements in C#
My implementation is here.

(Ex|Im)plicit type casting

In compilable languages like C#, such a thing is usually present and allows you to cast the type if necessary without an additional call of myvar.ToAnotherType(). So, for example, it would be more convenient to write

NumberEntity myvar = 3;

Instead of the usual
NumberEntity myvar = new NumberEntity(3);

More on type casting in C#
My implementation here.


The Entity class has a Children field — this is just a list of instances of Entity, which are the arguments for this entity.

In fact, only two classes of the four can have children: OperatorEntity and FunctionEntity. We could have created, say, NodeEntity, inherited these two classes from it, created a LeafEntity, and inherit VariableEntity and NumberEntity from it.

When we call a function or operator, we should create a new entity, and put in it the children from which the function or operator is called. For example, the add operator in theory should look something like this:

public static Entity operator +(Entity a, Entity b){ 
    var res = new OperatorEntity("+");
    return res;

That is, now if we have entity x and entity 3, then x + 3 will return the entity of the sum operator with two children: 3 and x. So, we can build expression trees.

A function call is simpler and not as beautiful as that with an operator:

public Entity Sin(Entity a)
    var res = new FunctionEntity("sin");
    return res;

My implementation is here.

Ok, we made up an expression tree.

Variable substitution

Everything is extremely simple here. We have Entity — we check whether it is a variable itself; if so, we return the value, otherwise we run through the children.

In this «huge» 48-line file implements such a complex function.

Value calculation

Yeah, the calculator must calculate. Here we are supposed to add some kind of method to Entity

public Entity Eval()
    if (IsLeaf)
        return this;
        return MathFunctions.InvokeEval(Name, Children);

The leaf is unchanged, but for everything else we have a custom calculation. Once more, I will only provide an example:

public static Entity Eval(List<Entity> args)
    MathFunctions.AssertArgs(args.Count, 1);
    var r = args[0].Eval();
    if (r is NumberEntity)
        return new NumberEntity(Number.Sin((r as NumberEntity).Value));
        return r.Sin();

If the argument is a number, then we perform a numerical operation, otherwise we will return the arguments as they were.


This is the simplest unit, number. Arithmetic operations can be performed on it. By default, it is complex. It also has operations such as Sin, Cos, and some others.

If you are interested in its implementation, Number is described here.


Anyone can calculate the derivative numerically, and such a function is written truly in one line:

public double Derivative(Func<double, double> f, double x) => (f(x + 1.0e-5) - f(x)) * 1.0e+5;

But of course we want an analytical derivative. Since we already have an expression tree, we can recursively replace each node in accordance with the differentiation rule. It should work something like this:

I realized it in the following way:

public static Entity Derive(List<Entity> args, VariableEntity variable) {
    MathFunctions.AssertArgs(args.Count, 2);
    var a = args[0];
    var b = args[1];
    return a.Derive(variable) + b.Derive(variable);

public static Entity Derive(List<Entity> args, VariableEntity variable)
    MathFunctions.AssertArgs(args.Count, 2);
    var a = args[0];
    var b = args[1];
    return a.Derive(variable) * b + b.Derive(variable) * a;

And here is a workaround in itself:
public Entity Derive(VariableEntity x)
    if (IsLeaf)
        if (this is VariableEntity && this.Name == x.Name)
            return new NumberEntity(1);
            return new NumberEntity(0);
    return MathFunctions.InvokeDerive(Name, Children, x);

This is the Entity method. As we see, the leaf has only two states: either it is a variable by which we differentiate, then its derivative is 1, or it is a constant (number or VariableEntity), then its derivative is 0, or a node, then there is a reference by name (InvokeDerive refers to the dictionary of functions, where the desired one is located (for example, the sum or sine)).

Notice that I do not leave something like dy/dx here but claim that the derivative of the variable not by which we differentiate is 0. But here it is done differently.

All differentiation is described in a single file.

Simplification of an expression. Patterns

Simplification of an expression in the general case is non-trivial. Well, for example, which expression is simpler: $ x ^ 2 - y ^ 2 $ or $ (x - y) (x + y) $? At this point, we stick to some ideas, and basing on them we want to make those rules so they could accurately simplify the expression.

It is possible to write at every Eval that if we have the sum node, and its children are product nodes, then we will sort out four options, and if something is equal something else, we will take out the factor… But of course I do not want to do that. Therefore, we will use a system of rules and patterns. So what do we want? Something like this syntax:

{ any1 / (any2 / any3)   ->   any1 * any3 / any2 },
{ const1 * var1 + const2 * var1   ->   (const1 + const2) * var1 },
{ any1 + any1 * any2   ->   any1 * (Num(1) + any2) },

Here is an example of a tree in which a subtree was found (circled in green) that matches the pattern any1 + const1 * any1 (any1 found is circled in orange).

As you can see, sometimes it is important for us that one and the same entity must be repeated, for example, to reduce the expression $x + a * x$, we need $x$ to be both factor of the first and second monomial, because $x + a * y$ is no longer reduceable. Therefore, we need to make an algorithm that not only checks that the tree matches the pattern, but also

  1. Verify that the same Entity patterns match the same Entity.
  2. Gathers list of entites matching each pattern entity

The entry point looks something like this:
internal Dictionary<int, Entity> EqFits(Entity tree)
    var res = new Dictionary<int, Entity>();
    if (!tree.PatternMakeMatch(this, res))
        return null;
        return res;

In tree.PaternMakeMatch, we recursively add keys and their values to the dictionary. Here is an example of a list of the Entity pattern itself:

static readonly Pattern any1 = new Pattern(100, PatType.COMMON);        
static readonly Pattern any2 = new Pattern(101, PatType.COMMON);
static readonly Pattern const1 = new Pattern(200, PatType.NUMBER);
static readonly Pattern const2 = new Pattern(201, PatType.NUMBER);
static readonly Pattern func1 = new Pattern(400, PatType.FUNCTION);

When we write any1 * const1 — func1 and so on, each node will have a number — this is the key. In other words, when filling out the dictionary, these numbers will appear as keys: 100, 101, 200, 201, 400… And when building a tree, we will look at the value corresponding to the key and substitute it.

Implemented here.

Simplification. Tree Sort

In the article, to which I have already referred, the author decided to make it simple, and sorted it practically by the hash of the tree. He managed to reduce $a$ and $-a$, $b + c + b$ to turn $2b + c$. But we, of course, also want $(x + y) + x * y - 3 * x$ to be reduced, and in general more complicated things.

Will the patterns not work for that?

In general, patterns, that we used before, are a monstrously wonderful thing. It will allow you to reduce expressions like $(x-y)(x+y)$, and Pythagorean trigonometric identity, and other complex things. But the elementary palm, $(((((x + 1) + 1) + 1) + 1)$, it will not reduce, because the main rule here is the commutativity of the terms. Therefore, the first step is to extract the «linear children.»

Linear children

Actually for each node of the sum or difference (and, by the way, the product/division), we want to get a list of terms (factors).

This is basically straightforward. Let the LinearChildren(Entity node) function return a list, then we look at a child in node.Children: if child is not a sum, then result.Add (child), otherwise — result.AddRange (LinearChildren (child)).

Implemented not in the best way here.

Grouping children

So we have a list of children, but what's next? Suppose we have $\sin(x) + x + y + \sin(x) + 2 * x$. Obviously, our algorithm will receive five terms. Next, we want to group by similarity, for example, $x$ looks like $2 * x$ more than $sin(x)$.

Here is a good grouping:

Since the patterns in it will cope further with the conversion of $2 * x + x$ to $3 * x$.

That is, we first group by some hash, and then do MultiHang — converting n-ary summation to binary.

Node Hash

On the one hand, $x$ and $x + 1$ should be placed in one group. On the other hand, in the presence of $a * x$, placing in the same group with $ y * (x + 1) $ is pointless.

If you think about it, then $a * x + y * (x + 1) = (a + y) * x + y$. Although it seems to me, this is practically not easier, and certainly not necessary. Anyway, simplification is never an obvious thing, and this is certainly not the first thing to write when writing a “calculator”.

Therefore, we implement multi-level sorting. First, we pretend that $x - 3$ and $x$ to be the same. Then we pretend that $x + 1$ can be placed only with other $x - 1$, $x + 1$ etc. Finally, $x + 1$ is only compatible with $x + 1$. And now our $a * (x + 1)$ and $y * (x + 1)$ finally merged. Implemented quite simply:

internal string Hash(SortLevel level)
    if (this is FunctionEntity)
        return this.Name + "_" + string.Join("_", from child in Children select child.Hash(level));
    else if (this is NumberEntity)
        return level == SortLevel.HIGH_LEVEL ? "" : this.Name + " ";
    else if (this is VariableEntity)
        return "v_" + Name;
        return (level == SortLevel.LOW_LEVEL ? this.Name + "_" : "") + string.Join("_", from child in Children where child.Hash(level) != "" select child.Hash(level));

As you can see, the function entity affects sorting in any way (of course, because $f(x)$ with $x$ is cannot be somehow reduced). Likewise, we cannot reduce $x + y$ with $y$. But the constants and operators are taken into account at some levels (but not all). That is how it is performed

public Entity Simplify(int level)
    // First, we make the easiest simplification: calculating values where possible, multiplying by zero, etc.
    var stage1 = this.InnerSimplify();
    Entity res = stage1;
    for (int i = 0; i < level; i++)
        // This block is responsible for sorting. First we group something like x and x + 1 (variables and functions), then something like x-1 and x + 1 (variables, functions and constants), then something like x + 1 and x + 1 (everything is taken into account).
        switch (i)
            case 0: res = res.Sort(SortLevel.HIGH_LEVEL); break;
            case 2: res = res.Sort(SortLevel.MIDDLE_LEVEL); break;
            case 4: res = res.Sort(SortLevel.LOW_LEVEL); break;
        // Here we replace the patterns.
        res = TreeAnalyzer.Replace(Patterns.CommonRules, res).InnerSimplify();
    return res;

Is this the best implementation? I have been thinking for a long time how to make it maximally beautiful, although in my opinion, it is far from «beautiful» here.

I sort the tree here.

«Compilation» of functions

In quotation marks — since it is not in the IL code itself, but only in a very fast set of instructions. But it is very simple.

Substitution Problem

To calculate the value of a function, we just need to call the variable substitution and eval, for example

var x = MathS.Var("x");
var expr = x * x + 3;
var result = expr.Substitute(x, 5).Eval();

But it works slowly, about 1.5 microseconds per sine.


To speed up the calculation, we do a function calculation on the stack, namely:

1) We come up with the FastExpression class, which will have a list of instructions

2) When compiling, the instructions are stacked in the reverse order, that is, if there is a function $x * x + \sin(x) + 3$, then the instructions will be something like this:

PUSHVAR 0 // Variable Substitution Number 0 - x
CALL 6 // Call function number 6 - sine
CALL 0 // Call function number 0 - sum
Call 2

Next, when invoked, we run these instructions and return a Number.

An example of executing a sum statement:

internal static void Sumf(Stack<Number> stack)
    Number n1 = stack.Pop();
    Number n2 = stack.Pop();
    stack.Push(n1 + n2);

The sine call was reduced from 1500ns to 60ns (system Complex.Sin works for 30ns).
It is implemented here.

Fuh, seems like that's it for now. Although there is still something to tell about. Part II will be about analytical solving of equations, rendering to LaTeX, and we will yet speed up functions (surpassing the system ones!)

Link to the repository containing all the code, as well as tests and samples.

Actually, I continue to work on this project. It is distributed under MIT (meaning do literally whatever you want with it), and it will never become neither closed nor commercial. Moreover, if there are ideas for improvement and contribution, pull requests are very welcome.

Thanks for attention!

122 0
Leave a comment
Top of the day