Understanding Compilers - A Beginner's Guide with Interactive Demo

Kite Eugine • Nov 26, 2025

Have you ever wondered how your favorite programming languages actually work under the hood? How Python, JavaScript, or C code gets executed on your computer? The magic behind it all is something called a compiler.

In this post, we'll explore the basics of compilers, walk through the fundamental concepts you need to understand, and build a fun interactive demo that "transpiles" Python code into JavaScript right in your browser.


What is a Compiler?

A compiler is a program that takes code written in one programming language (called the source language) and translates it into another language (the target language).

Depending on the target, this could be:

  • Machine code (like C → x86 instructions) - the raw binary that your CPU understands
  • Bytecode (like Python → Python bytecode) - an intermediate representation that's faster to interpret
  • Another programming language (like Python → JavaScript) - this is called transpiling

Compilers are everywhere—they make sure your human-readable code can actually run on your computer or a server. Without them, we'd all be writing in 1s and 0s!


Compiler vs. Interpreter: What's the Difference?

Before we dive deeper, let's clarify a common confusion:

  • Compiler: Translates the entire program into the target language before execution. The result is a standalone executable or script. Think: C, Go, or TypeScript.
  • Interpreter: Reads and executes the code statement by statement at runtime, without producing a separate output file first. Think: Python (mostly), Ruby, or JavaScript (though modern JS engines use JIT compilation).

Many modern languages blur the lines—Python compiles to bytecode then interprets it, and JavaScript engines compile code on the fly. But for our purposes, we're building a transpiler: source-to-source compilation.


The Main Steps in a Compiler

Even the simplest compiler goes through several well-defined stages. Understanding these is key to building your own:

1. Lexical Analysis (Tokenization)

The first step is breaking your source code into tokens—the smallest meaningful pieces.

Think of it like breaking a sentence into words. For example:

name = "Cool Name"

Gets tokenized into:

  • name (identifier)
  • = (assignment operator)
  • "Cool Name" (string literal)

The component that does this is called a lexer or tokenizer. It strips out whitespace and comments and converts your raw text into a structured sequence of tokens.
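To make this concrete, here's a minimal lexer sketch in Python. The token names (IDENTIFIER, ASSIGN, and so on) and the tokenize function are made up for this illustration, and the patterns cover only the tiny example above, not real Python.

```python
import re

# Hypothetical token categories for this sketch; a real lexer defines many more.
TOKEN_SPEC = [
    ("STRING", r'"[^"\n]*"'),
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("COMMENT", r"#[^\n]*"),
    ("WHITESPACE", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, text) pairs, skipping whitespace and comments."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r} at offset {position}")
        if match.lastgroup not in ("WHITESPACE", "COMMENT"):
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


print(tokenize('name = "Cool Name"  # a comment'))
# → [('IDENTIFIER', 'name'), ('ASSIGN', '='), ('STRING', '"Cool Name"')]
```

Each token keeps both its kind and its original text, which is what the parser in the next stage works with.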

2. Parsing (Syntax Analysis)

Once you have tokens, the parser organizes them into a tree structure called an Abstract Syntax Tree (AST).

The AST represents the logical structure of your code. For our example above:

AssignmentNode
├── identifier: "name"
└── value: StringLiteral("Cool Name")

The parser enforces the language's grammar—making sure parentheses match, statements are properly formed, etc. If the code doesn't follow the rules, you get a syntax error here.
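Here's a rough sketch of what that could look like for our one-line example. The Assignment and StringLiteral classes and the parse_assignment function are hypothetical names, and the token list is assumed to be a list of (kind, text) pairs produced by some lexer.

```python
from dataclasses import dataclass


@dataclass
class StringLiteral:
    value: str


@dataclass
class Assignment:
    identifier: str
    value: StringLiteral


def parse_assignment(tokens):
    """Turn [IDENTIFIER, ASSIGN, STRING] tokens into an Assignment node."""
    expected = ["IDENTIFIER", "ASSIGN", "STRING"]
    kinds = [kind for kind, _ in tokens]
    if kinds != expected:
        # This is the point where a syntax error would be reported.
        raise SyntaxError(f"expected {expected}, got {kinds}")
    name = tokens[0][1]
    quoted_text = tokens[2][1]
    return Assignment(identifier=name, value=StringLiteral(quoted_text[1:-1]))  # drop the quotes


tokens = [("IDENTIFIER", "name"), ("ASSIGN", "="), ("STRING", '"Cool Name"')]
print(parse_assignment(tokens))
# → Assignment(identifier='name', value=StringLiteral(value='Cool Name'))
```

A real parser handles many statement shapes and nested expressions, but the idea is the same: tokens in, tree out, error if the shape doesn't fit the grammar.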

3. Semantic Analysis

This stage checks whether the code actually makes sense logically:

  • Are variables defined before they're used?
  • Do function calls match function signatures?
  • Are types compatible (in statically-typed languages)?

This is where you'd catch errors like calling a function that doesn't exist or trying to add a string to a number in a statically typed language.

The compiler often builds a symbol table here—a lookup table tracking all variables, functions, and their scopes.
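As a small illustration, here's a sketch of a "defined before use" check backed by a symbol table. It assumes a single global scope and a simplified program representation, a list of (kind, name) pairs, which is invented for this example; real compilers walk the AST and track nested scopes.

```python
def check_names(statements):
    """Return messages for names that are used before they are defined.

    `statements` is a list of (kind, name) pairs where kind is "assign",
    "def" or "use". This shape is an assumption made for the sketch.
    """
    symbol_table = {"print": "builtin function"}  # names known before the program starts
    errors = []
    for line_number, (kind, name) in enumerate(statements, start=1):
        if kind == "assign":
            symbol_table[name] = "variable"
        elif kind == "def":
            symbol_table[name] = "function"
        elif kind == "use" and name not in symbol_table:
            errors.append(f"line {line_number}: name '{name}' is not defined")
    return errors


program = [
    ("assign", "greeting"),
    ("use", "greeting"),
    ("use", "show_greeting"),  # used before its definition on the next line
    ("def", "show_greeting"),
]
print(check_names(program))
# → ["line 3: name 'show_greeting' is not defined"]
```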

4. Intermediate Representation (IR) (Optional but Common)

Many compilers generate an intermediate representation—a format that's easier to optimize and independent of both source and target languages.

For example, LLVM uses its own IR that many languages compile to. This allows one optimizer to work for many languages!
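To give a feel for what an IR looks like, here's a sketch that lowers an expression tree into three-address code, a classic textbook IR where each instruction has at most one operator. This is not LLVM IR, and the tuple-based tree format (operator, left, right) is just an assumption for the example.

```python
import itertools


def lower(expression, instructions, temporaries):
    """Append three-address instructions for `expression`; return the name holding its value."""
    if isinstance(expression, str):
        return expression  # a variable or literal is already a simple operand
    operator, left, right = expression
    left_name = lower(left, instructions, temporaries)
    right_name = lower(right, instructions, temporaries)
    result = f"t{next(temporaries)}"  # fresh temporary: t1, t2, ...
    instructions.append(f"{result} = {left_name} {operator} {right_name}")
    return result


def to_three_address(expression):
    instructions = []
    lower(expression, instructions, itertools.count(1))
    return instructions


# a + b * c, written as a nested tuple
print(to_three_address(("+", "a", ("*", "b", "c"))))
# → ['t1 = b * c', 't2 = a + t1']
```

Because every instruction is this simple, an optimizer can analyze and rewrite the list without knowing anything about the original source syntax.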

5. Code Generation

This is where the magic happens—the compiler walks through the AST (or IR) and generates code in the target language.

For a Python-to-JS transpiler:

def greet():
    print("Hello!")

Becomes:

function greet() {
    console.log("Hello!");
}

The code generator needs to map concepts from the source language to equivalent constructs in the target language.
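Here's one way that mapping could be sketched: a function that walks a small AST and emits JavaScript text. The node classes (FunctionDef, Print, Call) are hypothetical and only cover the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Print:
    argument: str  # source text of the expression, copied through unchanged


@dataclass
class Call:
    name: str


@dataclass
class FunctionDef:
    name: str
    body: list = field(default_factory=list)


def generate(node, indent=0):
    """Return JavaScript source text for one AST node."""
    pad = "    " * indent
    if isinstance(node, Print):
        return f"{pad}console.log({node.argument});"
    if isinstance(node, Call):
        return f"{pad}{node.name}();"
    if isinstance(node, FunctionDef):
        lines = [f"{pad}function {node.name}() {{"]
        lines += [generate(child, indent + 1) for child in node.body]
        lines.append(f"{pad}}}")
        return "\n".join(lines)
    raise TypeError(f"no code generator for {type(node).__name__}")


print(generate(FunctionDef("greet", [Print('"Hello!"')])))
```

Running it prints the same three-line JavaScript function shown above. Each node type gets its own small rule, and nesting in the tree becomes nesting (and indentation) in the output.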

6. Optimization (Optional)

The compiler can make the generated code faster, smaller, or more efficient:

  • Dead code elimination: Remove code that never runs
  • Constant folding: Compute 2 + 2 at compile time instead of at runtime
  • Inlining: Replace function calls with the function body
  • Loop unrolling: Reduce loop overhead

Real-world compilers like GCC or Clang have dozens of optimization passes!
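Constant folding is the easiest of these to try yourself. The sketch below folds constant subexpressions in a tuple-based expression tree; the (operator, left, right) format and the fold function are assumptions for illustration, not how GCC or Clang represent code.

```python
import operator

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expression):
    """Replace subexpressions made only of integer constants with their value."""
    if not isinstance(expression, tuple):
        return expression  # a constant or a variable name: nothing to fold
    symbol, left, right = expression
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPERATORS[symbol](left, right)
    return (symbol, left, right)


# x + 2 * (2 + 2)
print(fold(("+", "x", ("*", 2, ("+", 2, 2)))))
# → ('+', 'x', 8)
```

The part involving x has to wait until runtime, but everything else was computed once, ahead of time.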


Key Concepts for Compiler Writers

Context-Free Grammars (CFG)

The syntax of most programming languages is described by context-free grammars: sets of rules that spell out which sequences of tokens are valid.

Example rule in BNF notation:

<assignment> ::= <identifier> "=" <expression>
<expression> ::= <string> | <number> | <identifier>

This says: "An assignment is an identifier, followed by =, followed by an expression."
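Grammar rules translate into data quite directly. Here's a small sketch that stores the two rules above in a dictionary and checks whether a sequence of token kinds fits them. The GRAMMAR layout and the function names are made up for this example, and the matcher simply takes the first alternative that fits, which is enough for this grammar but not a general parsing algorithm.

```python
# Each rule maps to a list of alternatives; each alternative is a sequence of symbols.
GRAMMAR = {
    "assignment": [["identifier", "=", "expression"]],
    "expression": [["string"], ["number"], ["identifier"]],
}
TERMINALS = {"identifier", "string", "number", "="}


def match(symbol, kinds, position=0):
    """Return the position after matching `symbol` at kinds[position:], or None."""
    if symbol in TERMINALS:
        if position < len(kinds) and kinds[position] == symbol:
            return position + 1
        return None
    for alternative in GRAMMAR[symbol]:
        current = position
        for part in alternative:
            current = match(part, kinds, current)
            if current is None:
                break
        else:
            return current  # every part of this alternative matched
    return None


def is_assignment(kinds):
    return match("assignment", kinds) == len(kinds)


print(is_assignment(["identifier", "=", "string"]))  # → True
print(is_assignment(["identifier", "=", "="]))       # → False
```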

Recursion is Your Friend

Parsing and tree traversal rely heavily on recursion: breaking a problem into smaller subproblems of the same shape. Recursive descent parsing, for example, is built directly on this idea.

When parsing a function call like foo(bar(x)), the parser recursively handles the nested structure naturally.
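Here's a minimal recursive descent sketch for exactly that case. It assumes a tiny expression language made of names, parentheses, and commas; the parse_expression and parse functions and the dictionary-based tree are illustrative choices, and the quick regex tokenizer silently skips characters it doesn't know.

```python
import re

NAME = re.compile(r"[A-Za-z_]\w*")


def parse_expression(tokens):
    """expression ::= name | name "(" [ expression { "," expression } ] ")" """
    if not tokens or not NAME.fullmatch(tokens[0]):
        raise SyntaxError("expected a name")
    name = tokens.pop(0)
    if not tokens or tokens[0] != "(":
        return name  # a plain name, not a call
    tokens.pop(0)  # consume "("
    arguments = []
    if tokens and tokens[0] != ")":
        arguments.append(parse_expression(tokens))  # recursion handles nesting
        while tokens and tokens[0] == ",":
            tokens.pop(0)
            arguments.append(parse_expression(tokens))
    if not tokens or tokens.pop(0) != ")":
        raise SyntaxError("expected ')'")
    return {"call": name, "arguments": arguments}


def parse(source):
    tokens = re.findall(r"\w+|[(),]", source)
    tree = parse_expression(tokens)
    if tokens:
        raise SyntaxError(f"unexpected token {tokens[0]!r}")
    return tree


print(parse("foo(bar(x))"))
# → {'call': 'foo', 'arguments': [{'call': 'bar', 'arguments': ['x']}]}
```

Notice that parse_expression calls itself for each argument, so the nesting in the input turns into nesting in the call stack and then into nesting in the tree.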

Two Main Parsing Approaches

  1. Top-down parsing (recursive descent, LL parsers): Start from the highest-level rule and work down
  2. Bottom-up parsing (LR parsers, used by tools like Yacc): Build from tokens up to higher constructs

For hand-written parsers, recursive descent is simpler and more intuitive.


A Fun Demo: Tiny Python-to-JavaScript Transpiler

Now let's build something real! We'll create a toy transpiler that converts a tiny subset of Python into JavaScript and runs it in the browser.

What It Supports

  • Variable assignments: name = "value"
  • Function definitions: def function_name():
  • Print statements: print("text")
  • Function calls: function_name()

The Interactive Demo

Here's our transpiler in action. Try editing the Python code and clicking "Transpile & Run":

Try These Examples

Replace the code in the textarea with these:

Example 1: Simple greeting

greeting = "Hello, World!"
def show_greeting():
    print(greeting)
show_greeting()

Example 2: Multiple functions

message = "Compilers are fun!"
def first():
    print("First: " + message)
def second():
    print("Second: " + message)
first()
second()

How It Works (Technical Breakdown)

  1. Lexical Analysis: We split the input by newlines, so each line becomes a "token" (very simplified!)

  2. Parsing: We use regular expressions to identify patterns:

    • /^\w+\s*=\s*".*"$/ matches assignments
    • /^def\s+(\w+)\(\):$/ matches function definitions
    • /^print\((.*)\)$/ matches print statements

  3. Code Generation: For each matched pattern, we output equivalent JavaScript:

    • name = "value" → let name = "value";
    • def foo(): → function foo() {
    • print(x) → console.log(x);

  4. Execution: We use eval() to run the generated JavaScript (never do this in production!)
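The first three steps can be sketched in a few lines. The version below is written in Python rather than the browser JavaScript the demo runs, and it is a reconstruction under assumptions, not the demo's actual source: capture groups are added to the assignment pattern, a CALL pattern is added for lines like foo(), and a function body is assumed to end at the first non-indented line. It only produces JavaScript text; it does not execute it.

```python
import re

ASSIGNMENT = re.compile(r'^(\w+)\s*=\s*(".*")$')
FUNCTION_DEF = re.compile(r"^def\s+(\w+)\(\):$")
PRINT = re.compile(r"^print\((.*)\)$")
CALL = re.compile(r"^(\w+)\(\)$")  # assumption: this pattern is not listed in the post


def transpile(python_source):
    """Translate the supported Python subset into JavaScript text, line by line."""
    output = []
    inside_function = False
    for raw_line in python_source.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        indented = raw_line[0] in " \t"
        if inside_function and not indented:
            output.append("}")  # assumption: a non-indented line ends the function body
            inside_function = False
        pad = "    " if inside_function else ""
        if match := FUNCTION_DEF.match(line):
            output.append(f"function {match.group(1)}() {{")
            inside_function = True
        elif match := ASSIGNMENT.match(line):
            output.append(f"{pad}let {match.group(1)} = {match.group(2)};")
        elif match := PRINT.match(line):
            output.append(f"{pad}console.log({match.group(1)});")
        elif match := CALL.match(line):
            output.append(f"{pad}{match.group(1)}();")
        else:
            raise SyntaxError(f"unsupported line: {line!r}")
    if inside_function:
        output.append("}")
    return "\n".join(output)


example = "\n".join([
    'greeting = "Hello, World!"',
    "def show_greeting():",
    "    print(greeting)",
    "show_greeting()",
])
print(transpile(example))
```

For Example 1 this prints a let declaration, a show_greeting function containing a console.log call, and the final show_greeting(); call.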

What's Missing?

This toy compiler doesn't handle:

  • Indentation (Python's significant whitespace)
  • Function parameters (the def pattern above only matches an empty parameter list)
  • If statements, loops, classes
  • Error handling and proper semantic analysis
  • Scoping rules
  • Type checking

But it's perfect for understanding the fundamental flow of compilation!
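If you're curious how the first gap might be closed, one common approach (CPython's own tokenizer works along these lines) is to keep a stack of indentation widths and emit INDENT and DEDENT markers as the width changes. Here's a rough sketch; the indentation_tokens function and its output format are invented for this example, and it counts leading spaces only, ignoring tabs.

```python
def indentation_tokens(source):
    """Return a list mixing "INDENT", "DEDENT" and ("LINE", text) entries."""
    levels = [0]  # stack of currently open indentation widths
    tokens = []
    for raw_line in source.splitlines():
        if not raw_line.strip():
            continue  # blank lines do not affect block structure
        width = len(raw_line) - len(raw_line.lstrip(" "))
        if width > levels[-1]:
            levels.append(width)
            tokens.append("INDENT")
        while width < levels[-1]:
            levels.pop()
            tokens.append("DEDENT")
        if width != levels[-1]:
            raise IndentationError(f"dedent does not match any outer level: {raw_line!r}")
        tokens.append(("LINE", raw_line.strip()))
    tokens.extend("DEDENT" for _ in levels[1:])  # close blocks still open at the end
    return tokens


print(indentation_tokens("def f():\n    print(1)\nf()"))
# → [('LINE', 'def f():'), 'INDENT', ('LINE', 'print(1)'), 'DEDENT', ('LINE', 'f()')]
```

With these markers in the token stream, a parser can treat INDENT and DEDENT much like the opening and closing braces of a block.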


Learning Resources

Want to dive deeper? Check these out:

Books

  • "Crafting Interpreters" by Robert Nystrom (free online) - THE book for beginners
  • "Compilers: Principles, Techniques, and Tools" (The Dragon Book) - Classic but dense
  • "Writing An Interpreter In Go" by Thorsten Ball - Practical and modern

Projects to Study

  • Babel - JavaScript to JavaScript transpiler (transforms modern JS to older versions)
  • TypeScript - TypeScript to JavaScript compiler
  • Sucrase - Fast TypeScript/JSX transpiler
  • esbuild - Blazing-fast JavaScript bundler and transpiler

Your Turn: Experiment!

Here are challenges to extend the demo:

Easy:

  1. Add support for numeric variables: count = 42
  2. Support function parameters: def greet(name):
  3. Add comments (ignore lines starting with #)

Medium:

  4. Handle if statements: if x == 5:
  5. Implement while loops: while x < 10:
  6. Support string methods: name.upper()

Hard:

  7. Implement proper indentation tracking
  8. Add a symbol table for scoping
  9. Generate source maps for debugging

Insane:

  10. Build a real lexer with token objects
  11. Create a proper AST with node types
  12. Implement a multi-pass compiler with optimization


Takeaways

Let's recap what we've learned:

  • Compilers translate code from one language to another through well-defined stages

  • The main stages are: Lexing → Parsing → Semantic Analysis → Intermediate Representation (optional) → Code Generation → Optimization (optional)

  • Key concepts include ASTs, grammars, tokens, and symbol tables

  • You can build toy transpilers with just a few lines of code

  • Hands-on practice is the best way to learn compiler concepts


What's Next?

Building toy compilers is fun, but what about real-world applications? Could we build a full Python-to-JavaScript transpiler that lets beginners deploy Python code to free JavaScript hosting platforms?

In my next post, "Why I Can't Build My Dream Python-to-JavaScript Transpiler (And Why That's Okay)", I explore this ambitious idea and the surprising technical challenges that make it nearly impossible.


Final Thoughts

Compilers are one of the most fascinating areas of computer science. They're at the intersection of:

  • Language design
  • Algorithm optimization
  • Abstract theory (grammars, automata)
  • Practical engineering

Starting with toy examples like our Python-to-JS transpiler gives you the foundation to understand production compilers. You don't need to build the next GCC—but understanding how Babel works or why TypeScript caught on makes you a better developer.

The journey from "I wonder how this works" to "I built a tiny compiler" is shorter than you think. So play with the demo, break things, and learn!
