banner
飞天御剑流

飞天御剑流

前端,jAVAsCRIPT
github

Frontend Compilation Principles the-super-tiny-compiler

Update after four months#

In fact, the recursive steps later only simulate the pointer functionality similar to the C language by leveraging the reference data type feature of JS 😐. It simply passes the pointer and continuously generates a data structure similar to a linked list, fills in the data, and finally obtains the target code... that's it... already.

At the beginning#

the-super-tiny-compiler is a compiler written in JavaScript on GitHub. The code comments claim that it may be the smallest compiler that can convert Lisp-style language into C-style language.

This compiler project can be said to be small but complete.

However, when I was reading the source code of the compiler, I was puzzled by the recursive process of generating a new AST.

So from now on, I will analyze the source code.

In simple terms, the principle of the compiler is as follows:
Lexical analysis
Syntax analysis (generate AST)
Convert oldAst to newAst
Finally, generate the target language syntax from the generated newAst
Taking (add 2 (subtract 4 2)) as input, the resulting AST structure is as follows

const ast = {
  type: "Program",
  body: [
    {
      type: "CallExpression",
      name: "add",
      params: [
        {
          type: "NumberLiteral",
          value: "2",
        },
        {
          type: "CallExpression",
          name: "subtract",
          params: [
            {
              type: "NumberLiteral",
              value: "4",
            },
            {
              type: "NumberLiteral",
              value: "2",
            },
          ],
        },
      ],
    },
  ],
};

After obtaining this structure, the following function is executed


function transformer(ast) {
  let newAst = {
    type: "Program",
    body: []
  }

  ast._context = newAst.body
}

A new property is added to the ast object, and this property is pointed to the body property of newAst. Then, the following code is executed

function transformer(ast) {
  let newAst = {
    type: "Program",
    body: []
  }

// Changing the properties of ast can directly affect newAst 
  ast._context = newAst.body

  traverser(ast, {
    // Process numbers
    NumberLiteral: {
      enter(node, parent) {
        parent._context.push({
          type: 'NumberLiteral',
          value: node.value
        })
      }
    },
    // Strings
    StringLiteral: {
      enter(node, parent) {
        parent._context.push({
          type: 'StringLiteral',
          value: node.value,
        });
      },
    },

    CallExpression: {
      enter(node, parent) {
        let expression = {
          type: 'CallExpression',
          callee: {
            type: 'Identifier',
            name: node.name
          },
          arguments: []
        }

        node._context = expression.arguments;

        if (parent.type !== "CallExpression") {
          expression = {
            type: 'ExpressionStatement',
            expression
          }
        }

        parent._context.push(expression)
      }
    }


  })

  return newAst
}

As can be seen, the data changes of newAst are completed by executing only one traverser function. The function takes the previous ast as a parameter and copies the behavior of newAst's body based on different types.

The internal of this function is as follows

function traverser(ast, visitor) {

  function traverseArray(array, parent) {
    
  }

  function traverseNode(node, parent) {
    // Check if the passed node has the corresponding property
    let methods = visitor[node.type]

    // If it does, assign it to the body of its parent node
    if (methods && methods.enter) {
      methods.enter(node, parent)
    }

    // Then separately handle the properties not included in the visitor
    switch (node.type) {
      // First execute the traversal of the body of the node
      case "Program":
        traverseArray(node.body, node);
        break;

      case "CallExpression":
        traverseArray(node.params, node)
        break;

      case "NumberLiteral":
      case "StringLiteral":
        break;

      default:
        throw new TypeError(node.type)
    }
  }

  traverseNode(ast, null)

}

When executing it, the main line is traverser -> traverseNode ->
The ast is passed as the node parameter. The execution process of this function involves recursive calls.
The first execution, methods is undefined, and then enters the switch statement.
In the first execution, the ast type is Program, and the body of ast is passed as a parameter, and then break;
At this point, the main line of the function has been executed.
The subsequent execution flow is a pseudo-recursion or nested call to ast.body. The function determines whether to generate a new ast based on its passed tree parameter, expression, and arguments.

It is worth mentioning that when generating newAst, there is a statement like this
node._context = expression.arguments;
It makes the _context property of the passed node point to a property under the current object, achieving the effect of reference. At this time, when using visitor to traverse the syntax tree, no matter what object is passed in, because the reference relationship of memory addresses has been built in the upper level, it is only necessary to add to parent._context property.

Finally, the generated newAst is used to generate our target language. There is nothing special about this. Just recursively generate the string.

In summary, seemingly simple small projects require consideration of many details if implemented by oneself. Therefore, the road to in-depth frontend development is long and arduous!

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.