Assembler “as-z80”
The assembler syntax is similar to that of the as-8080 assembler. Z80 instructions are described here. The assembler supports the following features:
- macros (unlimited nesting)
- including other source files
- conditional assembly
- data definition
- relative addressing using labels
- literals and expressions in various radixes (bin, dec, hex, oct)
- output is in Intel HEX format
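As an illustration of the output format named in the last item, here is a minimal Python sketch that builds Intel HEX records. The function name `hex_record` is hypothetical, and the record length and layout the assembler actually emits are not specified here; only the general record structure (byte count, address, record type, data, checksum) is shown.

```python
def hex_record(address: int, data: bytes, record_type: int = 0) -> str:
    """Build one Intel HEX record: ':' + count + address + type + data + checksum."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, record_type]) + data
    # The checksum is the two's complement of the sum of all preceding bytes.
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()


# Two bytes (06 80, i.e. "ld b, 80h") placed at address 0000h:
print(hex_record(0x0000, bytes([0x06, 0x80])))   # :02000000068078
# End-of-file record (type 01, no data):
print(hex_record(0x0000, b"", record_type=1))    # :00000001FF
```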
Running from the command line
The assembler is provided as part of emuStudio and is usually run from the GUI. It can also be run from the command line, as follows:
- on Linux:
> bin/as-z80 [--output output_file.hex] [source_file.asm]
- on Windows:
> bin\as-z80.bat [--output output_file.hex] [source_file.asm]
All command line options include:
Options:
--output, -o file: name of the output file
--version, -v : print version
--help, -h : this help
Lexical symbols
The assembler does not differentiate between upper and lower case (it is case-insensitive). The token/symbol types are as follows:
Type | Description |
---|---|
Keywords | instruction names; preprocessor directives (org, equ, var, macro, endm, include, if, endif); data definitions (db, dw, ds); CPU registers |
Identifiers | ([a-zA-Z_\?@])[a-zA-Z_\?@0-9]* except keywords |
Labels | identifier followed by a colon (:) |
Constants | strings or integers |
Operators | + , - , * , / , = , % , & , \| , ! , ~ , << , >> , > , < , >= , <= |
Comments | semicolon (;) with text after it until the end of the line |
Constants
Numeric constants must be integers, written in one of several radixes. The accepted formats, expressed as regular expressions, are:
- binary numbers:
[0-1]+[bB]
- decimal numbers:
[0-9]+[dD]?
- octal numbers:
[0-7]+[oOqQ]
- hexadecimal numbers:
[0-9][0-9a-fA-F]*[hH] or 0[xX][0-9a-fA-F]+
Characters or strings must be enclosed in double-quotes, e.g.: LD E, "*"
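The numeric formats above can be tried out with a small Python sketch. It reuses the listed regular expressions verbatim; the function name `parse_number` is hypothetical, and this is not the assembler's actual lexer.

```python
import re

# Each entry pairs one of the documented patterns with its radix.
_FORMATS = [
    (re.compile(r"0[xX]([0-9a-fA-F]+)"), 16),      # 0x1F
    (re.compile(r"([0-9][0-9a-fA-F]*)[hH]"), 16),  # 1Fh, 0A3h
    (re.compile(r"([0-1]+)[bB]"), 2),              # 1010b
    (re.compile(r"([0-7]+)[oOqQ]"), 8),            # 17q
    (re.compile(r"([0-9]+)[dD]?"), 10),            # 42, 42d
]


def parse_number(text: str) -> int:
    """Return the value of a numeric literal, or raise ValueError."""
    for pattern, radix in _FORMATS:
        match = pattern.fullmatch(text)
        if match:
            return int(match.group(1), radix)
    raise ValueError(f"not a numeric constant: {text!r}")


print(parse_number("0A3h"), parse_number("1010b"), parse_number("17q"))  # 163 10 15
```

Note that a hexadecimal literal with the h suffix has to start with a decimal digit: `0FFh` is a number, while `FFh` matches the identifier pattern instead.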
Identifiers
Identifiers must match the following regex: ([a-zA-Z_\?@])[a-zA-Z_\?@0-9]*. That is, an identifier must start with a letter (a-z or A-Z), an underscore (_), a question mark (?), or the at-sign (@). It can then be followed by any of these characters or by digits.
However, an identifier must not be equal to any keyword.
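A minimal Python sketch of this check follows. The keyword set is a deliberately incomplete, hypothetical sample (the real set covers all instruction names, directives, and registers), and `is_valid_identifier` is an illustrative name only.

```python
import re

# The documented identifier pattern.
IDENTIFIER = re.compile(r"[a-zA-Z_?@][a-zA-Z_?@0-9]*")

# Hypothetical sample of keywords, for illustration only.
SAMPLE_KEYWORDS = {
    "org", "equ", "var", "macro", "endm", "include", "if", "endif",
    "db", "dw", "ds", "a", "b", "hl", "ld", "jp",
}


def is_valid_identifier(name: str) -> bool:
    """True if the name matches the pattern and is not a (sample) keyword."""
    if IDENTIFIER.fullmatch(name) is None:
        return False
    # The assembler is case-insensitive, so compare in lower case.
    return name.lower() not in SAMPLE_KEYWORDS


print(is_valid_identifier("_loop1"))  # True
print(is_valid_identifier("1abc"))    # False: starts with a digit
print(is_valid_identifier("Org"))     # False: keyword, regardless of case
```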
Also, if an identifier is used for one kind of definition (label, variable, constant, or macro), it cannot be used to define another kind. For example, the following code is not valid:
label:
label var 1
First, the identifier label is used to define a label, and on the second line the same identifier is used to define a variable. This is not allowed and will produce an error.
Instructions syntax
A program is basically a sequence of instructions, separated by new lines. An instruction has optional and mandatory parts, e.g.:
LABEL: CODE OPERANDS ; COMMENT
Part | Required | Notes |
---|---|---|
LABEL | Optional | Identifier of the memory position, followed by a colon (:). It can be used as a forward or backward reference in instructions that expect a memory address (or a 16-bit number). |
CODE | Mandatory | Instruction name. |
OPERANDS | It depends | If applicable, comma-separated (,) operands of the instruction. |
COMMENT | Optional | Semicolon (;) followed by any text until the end of the line. |
Fields CODE and OPERANDS must be separated by at least one space. For example:
HERE: LD C, 0 ; Put 0 into C register
DB 3Ah ; Data constant of size 1 byte
LOOP: JP LOOP ; Infinite loop
Labels are optional. Instruction names, pseudo-instruction names, and register names are reserved by the assembler and cannot be used as labels. Also, a label cannot be defined more than once.
Operands must be separated with a comma (,). There exist several operand types, which represent so-called “addressing modes”. Allowed addressing modes depend on the instruction. The possibilities are:
- Implicit addressing: instructions do not have operands. They are implicit.
- Register addressing: operands are registers. 8-bit general-purpose register names are: A, B, C, D, E, H, L. Register pairs have names: BC, DE, HL. The stack pointer is defined as SP, and the program status word (used by push/pop instructions) as AF. Other 16-bit registers are defined as IX, IY.
- Register indirect addressing: for example, loading a memory value at the address in the HL pair: LD A, (HL).
- Immediate addressing: the operand is an 8-bit constant. It can also be a single character, enclosed in double-quotes.
- Direct addressing: the operand is either an 8-bit or a 16-bit constant, which is understood as a memory location (address). For example: LD (1234h), HL.
Immediate data or addresses can be defined in various ways:
- Integer constant
- Integer constant as a result of evaluation of some expression (e.g. 2 << 4, or 2 + 2)
- Current address - denoted by the special variable $. For example, the instruction JP $+6 denotes a jump 6 bytes further from the current address.
- Character constants, enclosed in double-quotes (e.g. LD A, "*")
- Labels. For example: JP THERE will jump to the label THERE.
- Variables. For example:
VALUE VAR 'A'
LD A, VALUE
Expressions
An expression is a combination of data constants and operators. Expressions are evaluated at compile time. Expressions must not be defined circularly, i.e. in terms of each other. Expressions can be used anywhere a constant is expected.
There exist several operators, such as:
Operator | Notes |
---|---|
+ | Addition. Example: DB 2 + 2 ; evaluates to DB 4 |
- | Subtraction. Example: DW $ - 2 ; evaluates to the current compilation address minus 2. |
* | Multiply. |
/ | Integer division. |
= | Comparison for equality. Returns 1 if operands equal, 0 otherwise. Example: DB 2 = 2 ; evaluates to DB 1. |
% | Remainder after integer division. Example: DB 4 % 3 ; evaluates to DB 1. |
& | Logical and. |
\| | Logical or. |
~ | Logical xor. |
! | Logical not. |
<< | Shift left by the number of bits given by the right operand. Example: DB 1 << 3 ; evaluates to DB 8 |
>> | Shift right by the number of bits given by the right operand. |
> | Greater than. Example: DB 3 > 2 ; evaluates to DB 1 |
< | Less than. |
>= | Greater than or equal to. |
<= | Less than or equal to. |
Operator priorities are as follows:
Priority | Operator | Type |
---|---|---|
1 | ( ) | Unary |
2 | * , / , % , << , >> , > , < , >= , <= | Binary |
3 | + , - | Unary and binary |
4 | = | Binary |
5 | ! | Unary |
6 | & | Binary |
7 | \| , ~ | Binary |
All operators treat their arguments as 16-bit values, and their results are always 16-bit numbers. If an 8-bit number is expected, the result is automatically “cut” using the operation result AND 0FFh. This is often useful, but it may also be unwanted and lead to bugs, so the programmer must ensure correctness.
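The truncation rule can be sketched in Python as follows. The helper names `to_word` and `to_byte` are hypothetical, and wrapping to 16 bits with a mask is an assumption about how "always 16-bit" results behave; only the final AND 0FFh step is stated above.

```python
def to_word(value: int) -> int:
    """Keep only the low 16 bits of an expression result (assumed wrap-around)."""
    return value & 0xFFFF


def to_byte(value: int) -> int:
    """Apply the documented 'result AND 0FFh' cut for 8-bit contexts."""
    return to_word(value) & 0xFF


print(hex(to_byte(2 << 4)))   # 0x20: fits in a byte, unchanged
print(hex(to_byte(0x1FF)))    # 0xff: the high byte is silently dropped
print(hex(to_word(-3)))       # 0xfffd: two's complement in 16 bits
```

The second line shows the kind of silent data loss the paragraph warns about: `DB 1FFh`-style values lose their upper bits without any explicit operator in the source.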
Defining data
Data can be defined using special pseudo-instructions. These accept constants. Negative integers are encoded in two’s complement.
The following table describes all possible data definition pseudo-instructions:
Pseudo-instruction | Notes |
---|---|
DB [expression] | Define byte. The [expression] must fit in 1 byte. This pseudo-instruction can also define a string, enclosed in single quotes. For example: DB 'Hello, world!' is equal to DB 'H', DB 'e', etc. on separate lines. |
DW [expression] | Define word. The [expression] must fit in at most 2 bytes. Data are stored in little-endian order. |
DS [expression] | Define storage. The [expression] represents the number of bytes to be “reserved”. The reserved space will not be modified in memory. It is similar to “skipping” a particular number of bytes. |
Examples
HERE: DB 0A3H ; A3
W0RD1: DB 5*2, 2FH-0AH ; 0A25
W0RD2: DB 5ABCH >> 8 ; 5A
STR: DB "STRING 1" ; 535452494E472031
MINUS: DB -03H ; FD
ADD1: dw COMP ; 1C3B (assume COMP is 3B1CH)
ADD2: dw FILL ; B43E (assume FILL is 3EB4H)
ADD3: dw 3C01H, 3CAEH ; 013CAE3C
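The byte sequences in the comments above can be reproduced with a short Python sketch. The functions `encode_db` and `encode_dw` are hypothetical helpers that only model the encoding rules described in this section (8-bit truncation, two's complement, little-endian words); they are not part of the assembler.

```python
def encode_db(*items) -> bytes:
    """Model DB: each integer becomes one byte, each string one byte per character."""
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            out += item.encode("ascii")
        else:
            out.append(item & 0xFF)      # two's complement for negatives
    return bytes(out)


def encode_dw(*values) -> bytes:
    """Model DW: each value becomes two bytes, low byte first (little endian)."""
    out = bytearray()
    for value in values:
        value &= 0xFFFF
        out += bytes([value & 0xFF, value >> 8])
    return bytes(out)


print(encode_db(5 * 2, 0x2F - 0x0A).hex().upper())  # 0A25
print(encode_db(-0x03).hex().upper())               # FD
print(encode_dw(0x3B1C).hex().upper())              # 1C3B
print(encode_dw(0x3C01, 0x3CAE).hex().upper())      # 013CAE3C
```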
Including other source files
It is both useful and good practice to write modular programs. According to the DRY principle, the repetitive parts of the program should be refactored out into functions or modules. Functionally similar groups of these functions or modules can be put into a library, reusable in other programs.
The pseudo-instruction include exists for the purpose of including already written source code into the current program. The pseudo-instruction is defined as follows:
INCLUDE "[filename]"
where [filename] is a relative or absolute path to the file to be included, enclosed in double-quotes. The included file can include other files, but circular includes are not allowed (the compiler will complain).
The current address (denoted by the $ variable) below the include pseudo-instruction will be advanced by the binary size of the included file.
The namespace of the current program and the included file is shared. It means that labels or variables with the same name in the current program and the included file are prohibited. The included file “sees” everything in the current program as if it were part of it.
Example
Let a.asm contain:
ld b, 80h
Let b.asm contain:
include "a.asm"
Then compiling b.asm will result in:
06 80 ; ld b, 80h
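The textual effect of include, together with the circular-include check, can be sketched in Python. This is an assumption-laden model: files are represented by an in-memory dictionary instead of the file system, `expand_includes` is a hypothetical name, and trailing comments on include lines are not handled.

```python
def expand_includes(name, files, _active=()):
    """Return the lines of `name` with every include replaced by the included lines."""
    if name in _active:
        chain = " -> ".join(_active + (name,))
        raise ValueError("circular include: " + chain)
    lines = []
    for line in files[name]:
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].lower() == "include":
            target = parts[1].strip().strip('"')
            # Recurse, remembering which files are currently being expanded.
            lines += expand_includes(target, files, _active + (name,))
        else:
            lines.append(line)
    return lines


files = {"a.asm": ["ld b, 80h"], "b.asm": ['include "a.asm"']}
print(expand_includes("b.asm", files))   # ['ld b, 80h']
```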
Origin address
Syntax: ORG [expression]
Sets the value of the $ variable. From that point on, the following instructions are placed at the address given by the [expression]. Effectively, it is the same as using the DS pseudo-instruction, but instead of the number of skipped bytes, a concrete memory location (address) is given.
The following two code snippets are equivalent:
Address | Block 1 | Block 2 | Opcode |
---|---|---|---|
2C00 | LD A,C | LD A,C | 79 |
2C01 | JP NEXT | JP NEXT | C3 10 2C |
2C04 | DS 12 | ORG $+12 | |
2C10 | NEXT: XOR A | NEXT: XOR A | AF |
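The address column of the table can be checked with a small Python model of the current address ($). The statement tuples and the function `assign_addresses` are hypothetical; instruction sizes (1 byte for LD A,C and XOR A, 3 bytes for JP) are taken from the opcode column.

```python
def assign_addresses(statements, start):
    """Return the address at which each statement begins."""
    pc = start
    addresses = []
    for kind, arg in statements:
        addresses.append(pc)
        if kind in ("code", "ds"):
            pc += arg          # emitted or reserved bytes advance $
        elif kind == "org":
            pc = arg(pc)       # ORG expression, evaluated with the old $
        else:
            raise ValueError(kind)
    return addresses


common = [("code", 1), ("code", 3)]                            # LD A,C ; JP NEXT
block1 = common + [("ds", 12), ("code", 1)]                    # DS 12
block2 = common + [("org", lambda pc: pc + 12), ("code", 1)]   # ORG $+12

print([hex(a) for a in assign_addresses(block1, 0x2C00)])
# ['0x2c00', '0x2c01', '0x2c04', '0x2c10']
print(assign_addresses(block1, 0x2C00) == assign_addresses(block2, 0x2C00))  # True
```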
Equate
Syntax: [identifier] EQU [expression]
Defines a constant. The [identifier] is the mandatory name of the constant. The [expression] is a 16-bit expression.
The pseudo-instruction assigns a name to the given expression. The name of the constant can then be used anywhere a constant is expected, and the compiler will replace it with the expression.
It is not possible to redefine a constant.
Variables
Syntax: [identifier] VAR [expression]
Defines or redefines a variable. The [identifier] is the mandatory name of the variable. The [expression] is a 16-bit expression.
The pseudo-instruction assigns a name to the given expression. The name of the variable can then be used anywhere a constant is expected.
It is possible to redefine a variable, which means assigning a new expression to the same name and forgetting the old one. The reassignment is aware of locality, i.e. the old value is used before it and the new value after it.
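The difference between EQU constants and VAR variables can be modelled with a small Python symbol table. The class `Symbols` and its methods are hypothetical and process definitions strictly in source order; they only mirror the rules stated above (a constant cannot be redefined, a variable can, and a name cannot change its kind).

```python
class Symbols:
    """Toy symbol table: 'equ' names are fixed, 'var' names may be reassigned."""

    def __init__(self):
        self._kinds = {}
        self._values = {}

    def define(self, kind, name, value):
        name = name.lower()                      # names are case-insensitive
        old = self._kinds.get(name)
        # Only var -> var redefinition is allowed.
        if old is not None and (old != kind or kind != "var"):
            raise ValueError(f"{name} is already defined as {old}")
        self._kinds[name] = kind
        self._values[name] = value & 0xFFFF      # 16-bit expression result

    def lookup(self, name):
        return self._values[name.lower()]


symbols = Symbols()
symbols.define("var", "COUNT", 1)
print(symbols.lookup("count"))   # 1 (value seen before the reassignment)
symbols.define("var", "COUNT", 2)
print(symbols.lookup("count"))   # 2 (value seen after the reassignment)
```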
Conditional assembly
Syntax:
if [expression]
instructions
endif
First, the compiler evaluates the [expression]. If the result is 0, the statements between if and endif are ignored.
Labels defined inside the if block occupy the namespace even if the if-expression evaluates to 0. Hence, the following code yields an error (Label already defined):
if 0
label1: ld (bc), a
endif
label1: halt
Evaluation of the expression in the if statement must not use forward references. For example, the following code is not valid (it will produce an error):
if variable
ld (bc), a
endif
variable var $
In this case, variable is set to the current address, which would be 0 if the if block is skipped, or 1 if the block is assembled (ld (bc), a occupies one byte). Both outcomes would be semantically correct, and the compiler cannot know which one the programmer intended.
Defining and using macros
Syntax:
[identifier] macro [operands]
instructions
endm
The [identifier] is the mandatory name of the macro.
The [operands] part is a list of identifiers, separated by commas (,). Inside the macro, operands act as constants. If the macro does not use any operands, this part can be omitted.
The namespace of the operand identifiers is macro-local, i.e. the operand names are not visible outside the macro. Also, the operand names can hide variables, labels, or constants defined in the outer scope.
Macros can be understood as “templates” that are expanded at the place where they are “called”. The call syntax is as follows:
[macro name] [arguments]
where [macro name] is the macro name as defined above, and [arguments] are comma-separated expressions, in the same order as the operands in the macro definition. The number of arguments must be the same as the number of macro operands.
A macro can be defined anywhere in the program, even in an included file. It also does not matter whether the macro is called above or below its definition.
Examples
SHV MACRO
LOOP: RRCA ; Right rotate with carry
AND 7FH ; Clear MSB of accumulator
DEC D ; Decrement rotation counter - register D
JP NZ, LOOP ; Jump to next rotation
ENDM
The macro SHV can be used as follows:
LD A, (TEMP)
LD D,3 ; 3 rotations
SHV
LD (TEMP), A
Or another definition:
SHV MACRO AMT
LD D,AMT ; Number of rotations
LOOP: RRCA
AND 7FH
DEC D
JP NZ, LOOP
ENDM
And usage:
LD A, (TEMP)
SHV 5
LD (TEMP), A
This has the same effect as the previous example, except that it performs 5 rotations instead of 3.
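The “template” view of macros can be illustrated with a Python sketch that expands the parameterized SHV macro by textual substitution. This is only a model: the function `expand_macro` is hypothetical, the real assembler may well operate on parsed instructions rather than text, and string literals in the macro body are not treated specially.

```python
import re

# An identifier that does not start in the middle of another token (e.g. 7FH).
_WORD = re.compile(r"(?<![A-Za-z_?@0-9])[A-Za-z_?@][A-Za-z_?@0-9]*")


def expand_macro(params, body, args):
    """Return the macro body with each operand name replaced by its argument."""
    if len(args) != len(params):
        raise ValueError("number of arguments must match number of operands")
    bindings = {p.lower(): str(a) for p, a in zip(params, args)}

    def substitute(match):
        word = match.group(0)
        return bindings.get(word.lower(), word)

    expanded = []
    for line in body:
        code, sep, comment = line.partition(";")   # leave comments untouched
        expanded.append(_WORD.sub(substitute, code) + sep + comment)
    return expanded


shv_body = [
    "LD D,AMT ; Number of rotations",
    "LOOP: RRCA",
    "AND 7FH",
    "DEC D",
    "JP NZ, LOOP",
]
for line in expand_macro(["AMT"], shv_body, [5]):
    print(line)
# First printed line: LD D,5 ; Number of rotations
```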
END pseudo-instruction
On encountering the END pseudo-instruction, the compiler allows only comments below it. It is a marker of the “program end”.
The following example won’t compile:
LD A, 0
END
HALT ; no code allowed, just comments!