
Architecture

Edigen is a source-to-source compiler for a small DSL. Its input is an .eds file, and its output is Java source code for a decoder and a disassembler.

High-Level Structure

CLI
  Edigen
    ->
Translation orchestration
  Translator
    ->
Parsing
  Parser (generated from Grammar.jj)
    ->
AST
  Specification
    |- Decoder subtree
    `- Disassembler subtree
    ->
Visitor passes
  semantic checks + structural rewrites
    ->
Generators
  DecoderGenerator / DisassemblerGenerator
    ->
Template rendering
  Template + PrettyPrinter + *.edt
    ->
Generated Java source

Entry Points

CLI layer

net.emustudio.edigen.Edigen is the command-line entry point. Its responsibilities are small by design:

This keeps the actual compiler pipeline out of the CLI.

Translation layer

net.emustudio.edigen.Translator is the real application core. It performs three steps:

  1. parse the input file into a Specification
  2. run the fixed visitor pipeline over the AST
  3. run the decoder and disassembler generators
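The three steps can be pictured as a small orchestration method. The sketch below is not Edigen's Translator. The names `PipelineSketch`, `Spec`, `Pass`, `Gen` and `translate` are hypothetical, and parsing is stubbed out. It only shows the shape: one AST, a fixed list of passes applied in order, and then each generator reading the finished tree.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a parse -> passes -> generators pipeline.
 * None of these types are Edigen's real classes.
 */
public class PipelineSketch {

    /** Stand-in for the Specification AST root; records what touched it. */
    static final class Spec {
        final List<String> log = new ArrayList<>();
    }

    /** A pass mutates the AST in place. */
    interface Pass {
        void apply(Spec spec);
    }

    /** A generator reads the final AST and produces source text. */
    interface Gen {
        String generate(Spec spec);
    }

    // Step 1: parse the input into an AST (stubbed out here).
    static Spec parse(String input) {
        Spec spec = new Spec();
        spec.log.add("parsed:" + input);
        return spec;
    }

    // Steps 2 and 3: run every pass in a fixed order, then every generator.
    static List<String> translate(String input, List<Pass> passes, List<Gen> generators) {
        Spec spec = parse(input);
        for (Pass pass : passes) {
            pass.apply(spec);
        }
        List<String> outputs = new ArrayList<>();
        for (Gen gen : generators) {
            outputs.add(gen.generate(spec));
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Pass> passes = List.of(
                s -> s.log.add("check"),
                s -> s.log.add("rewrite"));
        List<Gen> gens = List.of(
                s -> "decoder<" + String.join(",", s.log) + ">",
                s -> "disassembler<" + s.log.size() + ">");
        // prints [decoder<parsed:cpu.eds,check,rewrite>, disassembler<3>]
        System.out.println(translate("cpu.eds", passes, gens));
    }
}
```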

That class is the best place to understand the implementation order, because transform(...) defines the exact pass sequence used in production.

Parsing Layer

The parser grammar is defined in src/main/javacc/Grammar.jj. JavaCC generates net.emustudio.edigen.parser.Parser.

The parser is intentionally lightweight:

That separation matters because later visitors need to reason over the tree as a whole, including forward references and cross-links between decoder and disassembler sections.
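The following sketch shows why cross-references are easier to handle after parsing. It is a minimal resolution pass that sees the whole tree at once, so a reference may appear before its definition. `ForwardRefSketch`, `Rule` and `resolve` are hypothetical names. The tree is flattened to a list of named rules for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: resolving references only after the whole input is parsed. */
public class ForwardRefSketch {

    /** A named definition plus the names it refers to (illustrative node shape). */
    record Rule(String name, List<String> references) {}

    /** Links references to definitions after the whole input is known. */
    static Map<String, Rule> resolve(List<Rule> rules) {
        // First sweep: index every definition, wherever it appears.
        Map<String, Rule> byName = new HashMap<>();
        for (Rule rule : rules) {
            if (byName.putIfAbsent(rule.name(), rule) != null) {
                throw new IllegalStateException("duplicate rule: " + rule.name());
            }
        }
        // Second sweep: every reference must now resolve, including forward ones.
        for (Rule rule : rules) {
            for (String ref : rule.references()) {
                if (!byName.containsKey(ref)) {
                    throw new IllegalStateException("undefined rule: " + ref);
                }
            }
        }
        return byName;
    }

    public static void main(String[] args) {
        // "instruction" refers to "register", which is defined after it.
        List<Rule> rules = List.of(
                new Rule("instruction", List.of("register")),
                new Rule("register", List.of()));
        System.out.println(resolve(rules).size() + " rules resolved"); // prints 2 rules resolved
    }
}
```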

AST Model

The root node is Specification, which always contains two subtrees: one for the decoder and one for the disassembler.

All AST nodes extend TreeNode. TreeNode provides:

The important domain nodes are:

Visitor Model

All analysis, validation, transformation, and emission logic is expressed as subclasses of Visitor.

This is the central extension mechanism in the project. The base Visitor implementation simply traverses children, so each specialized visitor overrides only the node types it cares about.

That leads to a consistent structure:
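The visitor shape described above can be sketched as a hypothetical miniature. It is not Edigen's `Visitor` or `TreeNode`. For each node type, the base visitor's default is to walk the children. `LeafCollector` overrides only `visit(Leaf)` and inherits traversal for everything else.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature of a traverse-by-default visitor; names are illustrative. */
public class VisitorSketch {

    static abstract class Node {
        final List<Node> children = new ArrayList<>();

        Node add(Node child) {
            children.add(child);
            return this;
        }

        /** Double dispatch: each subclass calls the visit overload for its own type. */
        abstract void accept(Visitor v);

        void acceptChildren(Visitor v) {
            for (Node child : children) {
                child.accept(v);
            }
        }
    }

    static final class Group extends Node {
        @Override void accept(Visitor v) { v.visit(this); }
    }

    static final class Leaf extends Node {
        final String name;
        Leaf(String name) { this.name = name; }
        @Override void accept(Visitor v) { v.visit(this); }
    }

    /** Default behaviour for every node type: traverse children, nothing else. */
    static class Visitor {
        void visit(Group node) { node.acceptChildren(this); }
        void visit(Leaf node) { node.acceptChildren(this); }
    }

    /** Overrides only Leaf; Group traversal is inherited from Visitor. */
    static final class LeafCollector extends Visitor {
        final List<String> names = new ArrayList<>();

        @Override void visit(Leaf node) {
            names.add(node.name);
            super.visit(node); // keep traversing below this node
        }
    }

    public static void main(String[] args) {
        Node root = new Group()
                .add(new Leaf("a"))
                .add(new Group().add(new Leaf("b")));
        LeafCollector collector = new LeafCollector();
        root.accept(collector);
        System.out.println(collector.names); // prints [a, b]
    }
}
```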

Why the AST Is Mutated In Place

Edigen does not build a separate IR for every compiler phase. Instead, the original AST is progressively rewritten into a shape that is closer to code generation.

This has two practical benefits:

The tradeoff is that pass ordering matters. Each pass relies on the tree shape left by the passes before it, so the order itself is part of the architecture.
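The following minimal sketch shows why in-place rewriting makes order significant. It assumes a hypothetical alias-expansion pass followed by a check that depends on it. The tree is reduced to a mutable token list, and none of these names come from Edigen. Running the check first fails because the tree has not yet been rewritten into the shape the check expects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of two in-place passes whose order matters. */
public class PassOrderSketch {

    /** The "tree" is simplified to a mutable list of tokens. */
    static final class Tree {
        final List<String> tokens;
        Tree(List<String> tokens) { this.tokens = new ArrayList<>(tokens); }
    }

    /** Pass 1: rewrite aliases in place to their targets. */
    static void expandAliases(Tree tree, Map<String, String> aliases) {
        tree.tokens.replaceAll(t -> aliases.getOrDefault(t, t));
    }

    /** Pass 2: assumes pass 1 already ran, so no alias may remain. */
    static void checkNoAliases(Tree tree, Map<String, String> aliases) {
        for (String t : tree.tokens) {
            if (aliases.containsKey(t)) {
                throw new IllegalStateException("unexpanded alias: " + t);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> aliases = Map.of("acc", "A");

        Tree ok = new Tree(List.of("mov", "acc"));
        expandAliases(ok, aliases);
        checkNoAliases(ok, aliases);
        System.out.println(ok.tokens); // prints [mov, A]

        Tree wrongOrder = new Tree(List.of("mov", "acc"));
        try {
            checkNoAliases(wrongOrder, aliases); // pass 2 before pass 1
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints unexpanded alias: acc
        }
    }
}
```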

Generation Layer

The generation package has two public generators, DecoderGenerator and DisassemblerGenerator.

Both inherit from Generator, which handles:

The actual generated fragments come from dedicated visitors:

Template Layer

The final Java file is not assembled manually with string concatenation. Instead, a generator fills the variables of a Template object, and the template renders a .edt template file into the final Java source.
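The following minimal template filler illustrates the division of labour between generator and template. The `%name%` placeholder syntax, `TemplateSketch` and `render` are assumptions made for illustration. They are not taken from the .edt format or from Edigen's Template class. The generator supplies only variable values, and all fixed text lives in the template.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical template filler; the %name% syntax is an assumption, not the .edt format. */
public class TemplateSketch {

    private static final Pattern VARIABLE = Pattern.compile("%([A-Za-z_][A-Za-z0-9_]*)%");

    static String render(String template, Map<String, String> variables) {
        Matcher m = VARIABLE.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = variables.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("unset template variable: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in generated code literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "package %package%;\n\npublic class %name% {\n%body%\n}\n";
        System.out.print(render(template, Map.of(
                "package", "com.example",
                "name", "MyDecoder",
                "body", "    // emitted code goes here")));
    }
}
```

Failing on an unset variable is one possible design choice. A template engine could instead leave the placeholder in place or substitute an empty string.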

Two helpers are important here: Template and PrettyPrinter.

The shipped templates are:

Generated Runtime Model

The generated decoder and disassembler are not just thin wrappers around emitted switch statements. The templates also define the runtime support code that every generated class shares.
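The class below is a hypothetical illustration of that split. It combines a runtime helper that every generated decoder could share (a bit reader over a byte array) with a switch that a generator would emit for a specific instruction set. `GeneratedDecoderSketch`, `getBits`, `decode` and the two opcodes are invented for this sketch. They do not describe what Edigen's templates actually contain.

```java
/** Hypothetical generated class: shared runtime helper plus emitted decoding logic. */
public class GeneratedDecoderSketch {

    private final byte[] memory;

    GeneratedDecoderSketch(byte[] memory) {
        this.memory = memory;
    }

    // Runtime support (would come from the template, shared by every decoder).

    /** Reads `length` bits (1..31) starting at bit offset `start`, most significant bit first. */
    int getBits(int start, int length) {
        int value = 0;
        for (int i = 0; i < length; i++) {
            int bit = start + i;
            int b = memory[bit / 8] & 0xFF;
            value = (value << 1) | ((b >> (7 - bit % 8)) & 1);
        }
        return value;
    }

    // Emitted logic (would come from the generator, specific to one instruction set).

    String decode(int address) {
        int opcode = getBits(address * 8, 8);
        switch (opcode) {
            case 0x00:
                return "nop";
            case 0x3E:
                return "ld a, " + getBits((address + 1) * 8, 8);
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        GeneratedDecoderSketch d = new GeneratedDecoderSketch(new byte[] {0x00, 0x3E, 0x05});
        System.out.println(d.decode(0)); // prints nop
        System.out.println(d.decode(1)); // prints ld a, 5
    }
}
```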

The decoder template currently provides:

The disassembler template currently provides:

Extension Points

The architecture exposes two supported extension points without changing Java code:

Because the generators produce template variables rather than final files directly, template replacement is a first-class way to customize the emitted runtime while keeping the AST pipeline unchanged.

Debugging the Pipeline

Translator supports a debug mode that dumps the AST after each pass. Architecturally, this is important because the transformation pipeline is where most of the real compiler logic lives. If generated output looks wrong, the fastest way to reason about it is usually to inspect the post-pass tree rather than the final Java.
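The following minimal sketch of a per-pass dump assumes that passes are kept in an ordered map from name to action. `DebugDumpSketch`, `run` and the dump format are hypothetical. The snapshot is taken after each pass, so a wrong rewrite can be traced back to the pass that introduced it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch of dumping the tree after every pass. */
public class DebugDumpSketch {

    /** Applies each pass in insertion order; optionally records a dump after each one. */
    static List<String> run(List<String> tree,
                            Map<String, Consumer<List<String>>> passes,
                            boolean debug) {
        List<String> dumps = new ArrayList<>();
        for (Map.Entry<String, Consumer<List<String>>> pass : passes.entrySet()) {
            pass.getValue().accept(tree); // mutate the tree in place
            if (debug) {
                dumps.add("after " + pass.getKey() + ": " + tree);
            }
        }
        return dumps;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the passes in the order they were added.
        Map<String, Consumer<List<String>>> passes = new LinkedHashMap<>();
        passes.put("dropEmpty", t -> t.removeIf(String::isEmpty));
        passes.put("sort", t -> t.sort(null));

        List<String> tree = new ArrayList<>(List.of("b", "", "a"));
        run(tree, passes, true).forEach(System.out::println);
        // prints:
        // after dropEmpty: [b, a]
        // after sort: [a, b]
    }
}
```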