A limitation of NASM is that it is a two-pass assembler; unlike TASM and others, it will always do exactly two assembly passes. Therefore it is unable to cope with source files that are complex enough to require three or more passes.
The first pass is used to determine the size of all the assembled code and data, so that the second pass, when generating all the code, knows all the symbol addresses the code refers to. So one thing NASM can’t handle is code whose size depends on the value of a symbol declared after the code in question. For example,
        times (label-$) db 0
label:  db 'Where am I?'
The argument to TIMES in this case could equally legally evaluate to anything at all; NASM will reject this example because it cannot tell the size of the TIMES line when it first sees it. It will just as firmly reject the slightly paradoxical code
        times (label-$+1) db 0
label:  db 'NOW where am I?'
in which any value for the TIMES argument is by definition wrong!
NASM rejects these examples by means of a concept called a critical expression, which is defined to be an expression whose value is required to be computable in the first pass, and which must therefore depend only on symbols defined before it. The argument to the TIMES prefix is a critical expression; for the same reason, the arguments to the RESB family of pseudo-instructions are also critical expressions.
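By contrast, a TIMES or RESB argument that refers only to quantities already known at that point is acceptable. The following is a minimal sketch, assuming a 512-byte boot-sector style layout; the names BUFSIZE and buffer are hypothetical and do not come from the text above.

```nasm
BUFSIZE equ     64              ; hypothetical constant, defined before its first use

        hlt                     ; placeholder instruction standing in for real code

        ; $ is the current position and $$ is the start of the section.
        ; Both are known when this line is first seen, so the pad length
        ; can be computed in pass one.
        times 510-($-$$) db 0
        dw      0xAA55          ; signature word occupying bytes 510 and 511

section .bss
buffer: resb    BUFSIZE         ; BUFSIZE is already defined, so the size is known
```

Neither critical expression here mentions a symbol defined later in the file, which is what distinguishes this fragment from the rejected examples.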
Critical expressions can crop up in other contexts as well: consider the following code.
        mov ax, symbol1
symbol1 equ symbol2
symbol2:
On the first pass, NASM cannot determine the value of symbol1, because symbol1 is defined to be equal to symbol2, which NASM hasn’t seen yet. On the second pass, therefore, when it encounters the line mov ax,symbol1, it is unable to generate the code for it because it still doesn’t know the value of symbol1. On the next line, it would see the EQU again and be able to determine the value of symbol1, but by then it would be too late.
NASM avoids this problem by defining the right-hand side of an EQU statement to be a critical expression, so the definition of symbol1 would be rejected in the first pass.
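One way to satisfy the rule is to move the EQU below the label it depends on. This is only a sketch of that reordering under the two-pass model described here, not the only possible fix:

```nasm
        mov     ax, symbol1     ; forward reference in a non-critical context
symbol2:                        ; address fixed in pass one
symbol1 equ     symbol2         ; right-hand side now uses only an earlier symbol
```

The size of the mov does not depend on the value of symbol1, so pass one can lay out the code, evaluate the EQU once symbol2 has been seen, and leave pass two with a known value for symbol1 when it reaches the mov.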
There is a related issue involving forward references: consider this code fragment.
        mov eax, [ebx+offset]
offset  equ 10
NASM, on pass one, must calculate the size of the instruction mov eax,[ebx+offset] without knowing the value of offset. It has no way of knowing that offset is small enough to fit into a one-byte offset field and that it could therefore get away with generating a shorter form of the effective-address encoding; for all it knows, in pass one, offset could be a symbol in the code segment, and it might need the full four-byte form. So it is forced to compute the size of the instruction to accommodate a four-byte address part. In pass two, having made this decision, it is now forced to honour it and keep the instruction large, so the code generated in this case is not as small as it could have been. This problem can be solved by defining offset before using it, or by forcing byte size in the effective address by coding [byte ebx+offset].
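The two remedies can be sketched side by side. The names disp_a and disp_b are hypothetical, chosen so that both variants can sit in one file, and the BITS 32 directive is an assumption made so that the 32-bit addressing form is the natural one.

```nasm
bits 32                                 ; assumed 32-bit code

; Remedy 1: define the constant before it is used.
disp_a  equ     10                      ; hypothetical name
        mov     eax, [ebx+disp_a]       ; value known in pass one, so the short form can be chosen

; Remedy 2: keep the forward reference but state the displacement size.
        mov     eax, [byte ebx+disp_b]  ; byte forces the one-byte displacement field
disp_b  equ     10                      ; hypothetical name, defined after use
```

With the second remedy the programmer takes responsibility for the value actually fitting in a signed byte.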