Architecture
Edigen is a source-to-source compiler for a small DSL.
Its input is an .eds file and its output is Java source code for:
- a decoder implementing net.emustudio.emulib.plugins.cpu.Decoder
- a disassembler implementing net.emustudio.emulib.plugins.cpu.Disassembler
High-Level Structure
CLI: Edigen
-> Translation orchestration: Translator
-> Parsing: Parser (generated from Grammar.jj)
-> AST: Specification
   |- Decoder subtree
   `- Disassembler subtree
-> Visitor passes: semantic checks + structural rewrites
-> Generators: DecoderGenerator / DisassemblerGenerator
-> Template rendering: Template + PrettyPrinter + *.edt
-> Generated Java source
Entry Points
CLI layer
net.emustudio.edigen.Edigen is the command-line entry point.
Its responsibilities are small by design:
- print startup and error messages
- parse command-line arguments
- create Translator
- terminate with a non-zero exit code on failure
This keeps the actual compiler pipeline out of the CLI.
Translation layer
net.emustudio.edigen.Translator is the real application core.
It performs three steps:
- parse the input file into a Specification
- run the fixed visitor pipeline over the AST
- run the decoder and disassembler generators
That class is the best place to understand the implementation order, because transform(...) defines the exact pass
sequence used in production.
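The three steps can be pictured roughly as follows. This is a minimal sketch of the shape only, not Edigen's code: TranslatorSketch and its nested Parser, Pass, and Generator interfaces are hypothetical names, and the real pass list lives in transform(...).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the orchestration role described above.
// N is the tree type (Specification in Edigen).
final class TranslatorSketch<T> {
    interface Parser<N> { N parse(String source); }
    interface Pass<N> { void apply(N tree); }          // validates or rewrites in place
    interface Generator<N> { String generate(N tree); }

    private final Parser<T> parser;
    private final List<Pass<T>> passes;
    private final List<Generator<T>> generators;

    TranslatorSketch(Parser<T> parser, List<Pass<T>> passes, List<Generator<T>> generators) {
        this.parser = parser;
        this.passes = passes;
        this.generators = generators;
    }

    /** Runs parse, then every pass in list order, then every generator. */
    List<String> translate(String source) {
        T tree = parser.parse(source);                  // step 1: parse
        for (Pass<T> pass : passes) {                   // step 2: fixed pass sequence
            pass.apply(tree);
        }
        List<String> outputs = new ArrayList<>();
        for (Generator<T> generator : generators) {     // step 3: generate
            outputs.add(generator.generate(tree));
        }
        return outputs;
    }
}
```

Keeping the pass list in one place is what makes a single class the natural spot to read off the implementation order.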
Parsing Layer
The parser grammar is defined in src/main/javacc/Grammar.jj.
JavaCC generates net.emustudio.edigen.parser.Parser.
The parser is intentionally lightweight:
- it recognizes decoder and disassembler syntax
- it creates AST nodes
- it attaches source line numbers
- it does not attempt to resolve names or normalize structure
That separation matters because later visitors need to reason over the tree as a whole, including forward references and cross-links between decoder and disassembler sections.
AST Model
The root node is Specification, which always contains two subtrees:
- Decoder
- Disassembler
All AST nodes extend TreeNode.
TreeNode provides:
- parent/child relationships
- insertion-order child storage
- in-place mutation helpers such as addChild(...) and remove()
- deep copying via copy()
- tree dumping via dump(...)
- source line association
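To make these capabilities concrete, here is a minimal sketch of a node type with the same kinds of helpers. SketchNode is a hypothetical class; the signatures and behavior of Edigen's real TreeNode may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified node, loosely modeled on the list above.
class SketchNode {
    private final String label;
    private final int line;                                      // source line association
    private SketchNode parent;
    private final List<SketchNode> children = new ArrayList<>(); // insertion order

    SketchNode(String label, int line) {
        this.label = label;
        this.line = line;
    }

    /** Appends a child, detaching it from any previous parent first. */
    SketchNode addChild(SketchNode child) {
        child.remove();
        child.parent = this;
        children.add(child);
        return this;
    }

    /** Detaches this node (and its subtree) from its parent, in place. */
    void remove() {
        if (parent != null) {
            parent.children.remove(this);
            parent = null;
        }
    }

    /** Deep copy: the result shares no nodes with the original. */
    SketchNode copy() {
        SketchNode clone = new SketchNode(label, line);
        for (SketchNode child : children) {
            clone.addChild(child.copy());
        }
        return clone;
    }

    /** Renders the subtree, one node per line, indented by depth. */
    String dump() {
        StringBuilder out = new StringBuilder();
        dump(out, 0);
        return out.toString();
    }

    private void dump(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(label).append(" (line ").append(line).append(")\n");
        for (SketchNode child : children) {
            child.dump(out, depth + 1);
        }
    }
}
```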
The important domain nodes are:
- Rule: a decoder rule, possibly with multiple names.
- Variant: one branch of a rule.
- Subrule: a reference to another rule, or a value-capturing field.
- Pattern: a bit pattern.
- Mask: a bit mask used during matching.
- Format: one disassembler output format string.
- Value: one disassembler parameter, bound to a decoder rule name.
Visitor Model
All analysis, validation, transformation, and emission logic is expressed as subclasses of Visitor.
This is the central extension mechanism in the project.
The base Visitor implementation simply traverses children, so each specialized visitor overrides only the node types
it cares about.
That leads to a consistent structure:
- semantic passes mutate or validate the same AST in place
- generator visitors render source from the normalized AST
- adding a new pass usually does not require changing node classes
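The pattern can be sketched in miniature as below. All class names here (VNode, VRule, VPattern, VVisitor, CollectPatterns) are hypothetical and only mirror the idea; they are not Edigen's classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of a visitor with default traversal.
abstract class VNode {
    final List<VNode> children = new ArrayList<>();

    VNode add(VNode child) {
        children.add(child);
        return this;
    }

    abstract void accept(VVisitor visitor);

    void acceptChildren(VVisitor visitor) {
        // Iterate over a copy so a visitor may add or remove children while traversing.
        for (VNode child : new ArrayList<>(children)) {
            child.accept(visitor);
        }
    }
}

class VRule extends VNode {
    final String name;
    VRule(String name) { this.name = name; }
    @Override void accept(VVisitor visitor) { visitor.visit(this); }
}

class VPattern extends VNode {
    final String bits;
    VPattern(String bits) { this.bits = bits; }
    @Override void accept(VVisitor visitor) { visitor.visit(this); }
}

/** Base visitor: every visit method simply descends into the children. */
class VVisitor {
    void visit(VRule rule) { rule.acceptChildren(this); }
    void visit(VPattern pattern) { pattern.acceptChildren(this); }
}

/** A specialized pass overrides only the node type it cares about. */
class CollectPatterns extends VVisitor {
    final List<String> found = new ArrayList<>();

    @Override void visit(VPattern pattern) {
        found.add(pattern.bits);
        pattern.acceptChildren(this);
    }
}
```

Because CollectPatterns inherits the default handling of rules, adding it required no change to any node class, which is the property the list above describes.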
Why the AST Is Mutated In Place
Edigen does not build a separate IR for every compiler phase. Instead, the original AST is progressively rewritten into a shape that is closer to code generation.
This has two practical benefits:
- passes can reuse the same node types and tree utilities
- generated source visitors can be simple because the hard structural work is already done
The tradeoff is that pass ordering is important and part of the architecture.
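A small sketch shows why ordering becomes part of the design once passes share one mutable tree. The passes here are invented for illustration (they are not Edigen passes): one rewrites alias nodes in place, and a later check assumes that rewrite has already happened.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical example; the "tree" is flattened to a list of node labels for brevity.
final class PassOrderSketch {
    /** Rewriting pass: replaces each alias node with its definition, in place. */
    static Consumer<List<String>> expand(Map<String, String> aliases) {
        return tree -> tree.replaceAll(node -> aliases.getOrDefault(node, node));
    }

    /** Validating pass: assumes aliases (labels starting with '$') are already gone. */
    static final Consumer<List<String>> CHECK = tree -> {
        for (String node : tree) {
            if (node.startsWith("$")) {
                throw new IllegalStateException("unresolved alias: " + node);
            }
        }
    };

    /** Applies the passes in list order to a private copy of the input. */
    static List<String> run(List<String> input, List<Consumer<List<String>>> passes) {
        List<String> tree = new ArrayList<>(input);
        for (Consumer<List<String>> pass : passes) {
            pass.accept(tree);
        }
        return tree;
    }
}
```

Running expand before CHECK succeeds, while the reverse order throws on the first alias: the same passes are correct or incorrect depending only on their sequence.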
Generation Layer
The generation package has two public generators:
- DecoderGenerator
- DisassemblerGenerator
Both inherit from Generator, which handles:
- package/class name splitting
- template selection
- output file creation
- common template variable setup
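As an illustration of the first responsibility, package/class name splitting can be as small as the sketch below. QualifiedName is a hypothetical helper, not the code in Generator, and treating a name without dots as belonging to the default package is an assumption.

```java
// Hypothetical helper showing one way to split a fully qualified class name.
final class QualifiedName {
    final String packageName; // empty string for the default package (assumed convention)
    final String className;

    QualifiedName(String fullName) {
        int lastDot = fullName.lastIndexOf('.');
        // With no dot, lastDot is -1: the package is empty and the whole string is the class name.
        packageName = lastDot < 0 ? "" : fullName.substring(0, lastDot);
        className = fullName.substring(lastDot + 1);
    }
}
```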
The actual generated fragments come from dedicated visitors:
- GenerateFieldsVisitor
- GenerateMethodsVisitor
- GenerateFormatsVisitor
- GenerateParametersVisitor
Template Layer
The final Java file is not assembled manually with string concatenation.
Instead, a generator fills in the variables of a Template object that is backed by a .edt template file, and the rendered result becomes the generated Java source.
Two helpers are important here:
- Template: replaces %name% variables, with special handling for block variables so inserted code keeps indentation.
- PrettyPrinter: adds indentation while generator visitors emit Java line-by-line.
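The block-variable behavior can be sketched as follows. TemplateSketch is a hypothetical simplification; in particular, treating a placeholder that stands alone on its line as a block variable, and leaving unknown placeholders untouched, are assumptions and may not match the real Template class.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical simplified renderer for %name% placeholders.
final class TemplateSketch {
    private static final Pattern VARIABLE = Pattern.compile("%(\\w+)%");

    static String render(String template, Map<String, String> variables) {
        String[] lines = template.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            // Leading whitespace of the template line.
            String indent = line.substring(0, line.indexOf(line.trim()));
            Matcher matcher = VARIABLE.matcher(line);
            StringBuffer rendered = new StringBuffer();
            while (matcher.find()) {
                String value = variables.get(matcher.group(1));
                if (value == null) {
                    value = matcher.group();           // unknown variable: leave as-is
                } else if (line.trim().equals(matcher.group())) {
                    // Block variable: every continuation line inherits the indentation.
                    value = value.replace("\n", "\n" + indent);
                }
                matcher.appendReplacement(rendered, Matcher.quoteReplacement(value));
            }
            matcher.appendTail(rendered);
            if (i > 0) {
                out.append('\n');
            }
            out.append(rendered);
        }
        return out.toString();
    }
}
```

With a template line consisting of four spaces followed by %body%, a two-line value is emitted with both lines indented by four spaces, so generated fragments do not need to know how deeply the template nests them.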
The shipped templates are:
- src/main/resources/Decoder.edt
- src/main/resources/Disassembler.edt
Generated Runtime Model
The generated decoder and disassembler are not just thin wrappers around emitted switch statements. The templates also define the runtime support code that every generated class shares.
The decoder template currently provides:
- memory-backed bit reading
- instruction image buffering
- a verify-on-read LRU cache keyed by memory address
- generated rule methods
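The idea behind a verify-on-read LRU cache can be sketched like this. It is an illustration under assumptions, not the code in Decoder.edt: VerifyingLruCache is a hypothetical class, memory is modeled as a plain byte array, and "verify" is taken to mean comparing the cached instruction bytes with the current memory content before trusting a hit.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: LRU eviction plus a staleness check on every read.
final class VerifyingLruCache<V> {
    private static final class Cached<V> {
        final byte[] image;   // the bytes the value was decoded from
        final V decoded;

        Cached(byte[] image, V decoded) {
            this.image = image;
            this.decoded = decoded;
        }
    }

    private final LinkedHashMap<Integer, Cached<V>> entries;

    VerifyingLruCache(final int capacity) {
        // accessOrder = true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<Integer, Cached<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Cached<V>> eldest) {
                return size() > capacity;
            }
        };
    }

    void put(int address, byte[] image, V decoded) {
        entries.put(address, new Cached<>(image.clone(), decoded));
    }

    /** Returns the cached value only if memory at the address still holds the same bytes. */
    V get(int address, byte[] memory) {
        Cached<V> hit = entries.get(address);
        if (hit == null) {
            return null;
        }
        int end = address + hit.image.length;
        if (end > memory.length
                || !Arrays.equals(hit.image, Arrays.copyOfRange(memory, address, end))) {
            entries.remove(address);   // memory changed since decoding: drop the stale entry
            return null;
        }
        return hit.decoded;
    }

    int size() {
        return entries.size();
    }
}
```

Verifying on read means writes to memory need no invalidation hook; a stale entry is simply discovered and discarded the next time its address is decoded.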
The disassembler template currently provides:
- format lookup by decoded rule set
- constant-decoding strategies
- a last-position decode cache
- a last-render cache for mnemonic and byte string reuse
Extension Points
The architecture exposes two supported extension points without changing Java code:
- custom decoder template via -dt
- custom disassembler template via -at
Because the generators produce template variables rather than final files directly, template replacement is a first-class way to customize the emitted runtime while keeping the AST pipeline unchanged.
Debugging the Pipeline
Translator supports a debug mode that dumps the AST after each pass.
Architecturally, this is important because the transformation pipeline is where most of the real compiler logic lives.
If generated output looks wrong, the fastest way to reason about it is usually to inspect the post-pass tree rather than
the final Java.