Yasm defines a set of standard macros in the NASM preprocessor which are already defined when it starts to process any source file. If you really need a program to be assembled with no pre-defined macros, you can use the %clear directive to empty the preprocessor of everything.
Most user-level NASM syntax directives are implemented as macros which invoke primitive directives; these are described in Chapter 5. The rest of the standard macro set is described here.
The single-line macros __YASM_MAJOR__, __YASM_MINOR__, and __YASM_SUBMINOR__ expand to the major, minor, and subminor parts of the version number of Yasm being used. In addition, __YASM_VER__ expands to a string representation of the Yasm version and __YASM_VERSION_ID__ expands to a 32-bit BCD-encoded representation of the Yasm version, with the major version in the most significant 8 bits, followed by the 8-bit minor version and 8-bit subminor version, and 0 in the least significant 8 bits.
For example, under Yasm 0.5.1, __YASM_MAJOR__ would be defined as 0, __YASM_MINOR__ would be defined as 5, __YASM_SUBMINOR__ would be defined as 1, __YASM_VER__ would be defined as "0.5.1", and __YASM_VERSION_ID__ would be defined as 00050100h.
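A version check built on these macros might look like the following sketch. The 0.5.1 threshold is an arbitrary value chosen for illustration, and the %ifdef guard is there so that the comparison is skipped under assemblers that do not define the Yasm macros.

```nasm
%ifdef __YASM_VERSION_ID__
  ; 00050100h is the ID produced by 0.5.1:
  ; major 00, minor 05, subminor 01, low byte 00
  %if __YASM_VERSION_ID__ < 00050100h
    %error "this file needs Yasm 0.5.1 or later"
  %endif
%endif
```

Comparing the single packed value avoids having to test the major, minor and subminor macros one at a time.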
In addition, the single-line macro __YASM_BUILD__ expands to the Yasm “build” number, typically the Subversion changeset number. It should be seen as less significant than the subminor version, and is generally only useful in discriminating between Yasm nightly snapshots or pre-release (e.g. release candidate) Yasm versions.
Like the C preprocessor, the NASM preprocessor allows the user to find out the file name and line number containing the current instruction. The macro __FILE__ expands to a string constant giving the name of the current input file (which may change through the course of assembly if %include directives are used), and __LINE__ expands to a numeric constant giving the current line number in the input file.
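As a small illustration, both macros can be used directly in data definitions. The labels src_name and src_line below are hypothetical names chosen for this sketch.

```nasm
section .data
; name of this source file, stored as a NUL-terminated string
src_name: db __FILE__, 0
; number of the source line on which this definition appears
src_line: dd __LINE__
```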
These macros could be used, for example, to communicate debugging information to a macro, since invoking __LINE__ inside a macro definition (either single-line or multi-line) will return the line number of the macro call, rather than that of the definition. So to determine where in a piece of code a crash is occurring, for example, one could write a routine stillhere, which is passed a line number in EAX and outputs something like “line 155: still here”. You could then write a macro

    %macro notdeadyet 0
        push eax
        mov eax, __LINE__
        call stillhere
        pop eax
    %endmacro

and then pepper your code with calls to notdeadyet until you find the crash point.
__YASM_OBJFMT__, and its NASM-compatible alias __OUTPUT_FORMAT__, expand to the object format keyword specified on the command line with -f (see Section 1.3.1.2). For example, if yasm is invoked with -f elf, __YASM_OBJFMT__ expands to elf.
These expansions match the option given on the command line exactly, even when the object formats are equivalent. For example, -f elf and -f elf32 are equivalent specifiers for the 32-bit ELF format, and -f elf -m amd64 and -f elf64 are equivalent specifiers for the 64-bit ELF format, but __YASM_OBJFMT__ would expand to elf and elf32 respectively for the first two cases, and to elf and elf64 respectively for the second two cases.
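Because the expansion is a plain keyword, it can be compared with %ifidn to select format-specific code. The following is only a sketch: NEWLINE and msg are hypothetical names, and the keywords tested are assumed to be the ones actually passed to -f. Since the expansion follows the command line exactly, equivalent spellings such as elf and elf32 each need their own test.

```nasm
%ifidn __YASM_OBJFMT__, win32
  ; CR LF
  %define NEWLINE 13, 10
%elifidn __YASM_OBJFMT__, elf
  ; LF
  %define NEWLINE 10
%elifidn __YASM_OBJFMT__, elf32
  ; same format as above, different spelling on the command line
  %define NEWLINE 10
%else
  ; fallback chosen for this sketch
  %define NEWLINE 10
%endif

section .data
msg: db 'hello', NEWLINE, 0
```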
The NASM preprocessor is sufficiently powerful that data structures can be implemented as a set of macros. The macros STRUC and ENDSTRUC are used to define a structure data type.
STRUC takes one parameter, which is the name of the data type. This name is defined as a symbol with the value zero, and also has the suffix _size appended to it and is then defined as an EQU giving the size of the structure. Once STRUC has been issued, you are defining the structure, and should define fields using the RESB family of pseudo-instructions, and then invoke ENDSTRUC to finish the definition.
For example, to define a structure called mytype containing a longword, a word, a byte and a string of bytes, you might code

    struc mytype
    mt_long: resd 1
    mt_word: resw 1
    mt_byte: resb 1
    mt_str:  resb 32
    endstruc

The above code defines six symbols: mt_long as 0 (the offset from the beginning of a mytype structure to the longword field), mt_word as 4, mt_byte as 6, mt_str as 7, mytype_size as 39, and mytype itself as zero.
The reason why the structure type name is defined at zero is a side effect of allowing structures to work with the local label mechanism: if your structure members tend to have the same names in more than one structure, you can define the above structure like this:

    struc mytype
    .long: resd 1
    .word: resw 1
    .byte: resb 1
    .str:  resb 32
    endstruc

This defines the offsets to the structure fields as mytype.long, mytype.word, mytype.byte and mytype.str.
Since NASM syntax has no intrinsic structure support, it does not support any form of period notation to refer to the elements of a structure once you have one (except the above local-label notation), so code such as mov ax,[mystruc.mt_word] is not valid. mt_word is a constant just like any other constant, so the correct syntax is mov ax,[mystruc+mt_word] or mov ax,[mystruc+mytype.word].
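To make the base-plus-offset addressing concrete, here is a minimal sketch assuming 32-bit code. The structure point, the label pt and the routine set_point are hypothetical names invented for this example.

```nasm
bits 32

; hypothetical structure type: two longword fields
struc point
  .x: resd 1          ; point.x = 0
  .y: resd 1          ; point.y = 4
endstruc              ; point_size = 8

section .bss
pt: resb point_size   ; room for one instance

section .text
set_point:
    ; field offsets are ordinary constants added to the base address
    mov dword [pt + point.x], 10
    mov dword [pt + point.y], 20
    ; the same offsets work when the base is held in a register
    mov ebx, pt
    mov eax, [ebx + point.y]
    ret
```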
Having defined a structure type, the next thing you typically want to do is to declare instances of that structure in your data segment. The NASM preprocessor provides an easy way to do this in the ISTRUC mechanism. To declare a structure of type mytype in a program, you code something like this:

    mystruc: istruc mytype
        at mt_long, dd 123456
        at mt_word, dw 1024
        at mt_byte, db 'x'
        at mt_str, db 'hello, world', 13, 10, 0
    iend

The function of the AT macro is to make use of the TIMES prefix to advance the assembly position to the correct point for the specified structure field, and then to declare the specified data. Therefore the structure fields must be declared in the same order as they were specified in the structure definition.
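The effect described above can be approximated by hand. The sketch below is not the text of the actual macros; it assumes that each AT pads from the current position up to the field offset with zero bytes, and that IEND pads the instance out to mytype_size. The label manual is a hypothetical name.

```nasm
struc mytype
mt_long: resd 1
mt_word: resw 1
mt_byte: resb 1
mt_str:  resb 32
endstruc

section .data
manual:
    ; already at offset 0, so the count is 0 and nothing is emitted
    times mt_long - ($ - manual) db 0
    dd 123456
    ; at offset 4 after the longword, which is where mt_word lives
    times mt_word - ($ - manual) db 0
    dw 1024
    times mt_byte - ($ - manual) db 0
    db 'x'
    times mt_str - ($ - manual) db 0
    db 'hello, world', 13, 10, 0
    ; assumed IEND behaviour: pad the rest of the 39-byte structure
    times mytype_size - ($ - manual) db 0
```

Written this way, it is easy to see why the field order matters: declaring a field out of order would make its TIMES count negative, because the assembly position would already be past the field's offset.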
If the data to go in a structure field requires more than one source line to specify, the remaining source lines can easily come after the AT line. For example:

    at mt_str, db 123,134,145,156,167,178,189
               db 190,100,0

Depending on personal taste, you can also omit the code part of the AT line completely, and start the structure field on the next line:

    at mt_str
            db 'hello, world'
            db 13,10,0

The ALIGN and ALIGNB macros provide a convenient way to align code or data on a word, longword, paragraph or other boundary. The syntax of the ALIGN and ALIGNB macros is

    align 4             ; align on 4-byte boundary
    align 16            ; align on 16-byte boundary
    align 16,nop        ; equivalent to previous line
    align 8,db 0        ; pad with 0s rather than NOPs
    align 4,resb 1      ; align to 4 in the BSS
    alignb 4            ; equivalent to previous line

Both macros require their first argument to be a power of two; they both compute the number of additional bytes required to bring the length of the current section up to a multiple of that power of two, and output either NOP fill or apply the TIMES prefix to their second argument to perform the alignment.
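The padding computation can also be written out by hand. The sketch below assumes the count is derived as ($$ - $) & (boundary - 1), which is one common way to express "bytes needed to reach the next multiple of a power of two"; it is not claimed to be the exact text of the Yasm macro. The labels three and aligned are hypothetical.

```nasm
section .data
three:   db 1, 2, 3                      ; section is now 3 bytes long
         ; ($$ - $) is -3 here, and -3 AND 3 is 1: one zero byte is emitted
         times ($$ - $) & (4 - 1) db 0
aligned: dd 0                            ; starts at offset 4
```

The mask trick only works for powers of two, which is why a first argument that is not a power of two produces wrong padding rather than an error.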
If the second argument is not specified, the default for ALIGN is NOP, and the default for ALIGNB is RESB 1. ALIGN treats a NOP argument specially by generating maximal NOP fill instructions (not necessarily NOP opcodes) for the current BITS setting, whereas ALIGNB takes its second argument literally. Otherwise, the two macros are equivalent when a second argument is specified. Normally, you can just use ALIGN in code and data sections and ALIGNB in BSS sections, and never need the second argument except for special purposes.
ALIGN and ALIGNB, being simple macros, perform no error checking: they cannot warn you if their first argument fails to be a power of two, or if their second argument generates more than one byte of code. In each of these cases they will silently do the wrong thing.
ALIGNB (or ALIGN with a second argument of RESB 1) can be used within structure definitions:

    struc mytype2
    mt_byte: resb 1
             alignb 2
    mt_word: resw 1
             alignb 4
    mt_long: resd 1
    mt_str:  resb 32
    endstruc

This will ensure that the structure members are sensibly aligned relative to the base of the structure.
A final caveat: ALIGNB works relative to the beginning of the section, not the beginning of the address space in the final executable. Aligning to a 16-byte boundary when the section you’re in is only guaranteed to be aligned to a 4-byte boundary, for example, is a waste of effort. Again, Yasm does not check that the section’s alignment characteristics are sensible for the use of ALIGNB. ALIGN is more intelligent and does adjust the section alignment to be the maximum specified alignment.