ALT: The Assembly Language Tutorial
by vulture, Email: Srstanek@aol.com, 1998
Assembly language is composed of mnemonics, each of which is a short
group of letters that represents a command in machine language. For
example, MOV is a mnemonic that basically MOVes a value somewhere. In
the Intel assembly format, which is the standard for IBM PCs, the
mnemonic comes first, then the destination, then the source.
There are 8 basic registers, each of which is special in its own way.
AX = Accumulator Register
BX = Base Register
CX = Count Register
DX = Data Register
Each of those 16-bit registers is split into two 8-bit registers.
AH = Accumulator high
AL = Accumulator low
BH, BL, CH, CL, DH, DL
AX = AH*256 + AL
More registers...
SI = Source Index
DI = Destination Index
These and all the other registers cannot be split into 8 bits. They
are entirely 16-bit. The Source Index register usually holds a pointer
to the place in memory that some data is read from, whereas the
Destination Index points to where data is written. These registers
don't *HAVE* to stay within those guidelines though.
Then the stack registers...
BP = Base Pointer
SP = Stack Pointer
When you push a value onto the stack, the Stack Pointer is first
decremented by the size of the value pushed, normally 16 bits = 2
bytes, and the value is then saved into the place in memory where the
Stack Pointer now points. The Stack Pointer should NOT ever be changed
directly unless you know what you're doing and have a good reason. The
Base Pointer is generally used to address values that are already on
the stack, such as parameters and local variables, without disturbing
the Stack Pointer. The stack grows downward from the top of the space
allocated for it. If you push more than that space can hold, then you
could possibly overwrite some other vital place in memory. This is
called a stack overflow. Just make sure that you don't use too much
stack compared to what you've allocated. Especially beware of
recursive functions with large local variables.
Then there are Segment Registers. Memory in real mode (just think of
it as what you program in as normal; although it is not the 386 and
higher's native processing mode, the processor still supports it) is
accessed via a segment:offset pair. When people say that video memory
starts at A000h, they're saying that the memory at [A000h:0] is the
top-left pixel. A new segment starts every 16 bytes, but you may
access up to 65536 bytes in one, so the actual address is
segment*16 + offset. For example, you can use [A000h:63999] as the
bottom-right pixel of the 320x200x256 video mode screen. You could
also use [A001h:63983] as the same exact byte. Another exact location
would be [A003h:63951].
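To see that different segment:offset pairs can name the same byte,
here is a small sketch. It assumes mode 13h (320x200x256) has already
been set and that AX, BX, DX and ES are free to use; the color 15 is
just an arbitrary test value.

```asm
; Sketch only: write a byte through one segment:offset pair and
; read it back through another. Both pairs give A0000h + 63999.
        MOV DX,0A000h
        MOV ES,DX           ; ES = A000h
        MOV BX,63999
        MOV AL,15           ; arbitrary color value for this test
        MOV ES:[BX],AL      ; A000h*16 + 63999 = 0A0000h + 63999
        MOV DX,0A001h
        MOV ES,DX           ; ES = A001h
        MOV BX,63983
        MOV AH,ES:[BX]      ; A001h*16 + 63983 = 0A0010h + 63983
                            ; AH now equals AL: it is the same byte
```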
The Segment Registers are
CS = Code Segment
DS = Data Segment
ES = Extra Segment
SS = Stack Segment
The Stack Segment, like your Stack Pointer, should not be changed. It
is the segment where your PUSHed number is put. It is addressed as
either SS:[SP] or [SS:SP]. I generally put the segment on the outside
for clarity (this is the standard).
Your Data Segment is basically where data is accessed. If you fail to
specify a segment when accessing memory, then DS is used as a default
(unless the address uses BP, which defaults to SS). In fact, while
programming, you usually don't specify a segment, so you need to make
sure that DS is set to the right value. For example, [46Ch] is the
same as DS:[46Ch].
The Extra Segment is used as an extra value in case you want to access
a different segment, but you don't want to change your other segments.
The Code Segment is the segment where your code runs. The exact
location of your code is specified by the IP (Instruction Pointer).
Neither of these should EVER be changed directly. The only time you
change these is indirectly, like when you jump, call a procedure, or
return from a procedure or a function via the RET command.
Now, onto the mnemonics!
MOV - this MOVes a value to a register or a place in memory. A
variable is the exact same as a place in memory, but the assembler
pre-calculates the pointer for it and substitutes that value each time
the variable is accessed. You can move any immediate value (a number
like, say, 4), register, or memory location to a register or memory
location. You may NOT, however, MOV a memory location to another
memory location. For example, if Var1 and Var2 were two variables, you
could not MOV Var1,Var2 ... you would have to load Var2 into a
register and then MOV that register into Var1. Also, be sure that the
source and destination are the same size.
Examples:
MOV AX,BX ; The value in AX now equals the one in BX
; The source was BX and it is stored in AX
MOV [BX],AL ; This stores AL into the memory location
; pointed at by BX. This does not change
; the value of the BX register; only the
; value of the memory location at DS:[BX]
Just a note.. you cannot move an immediate value into a segment
register directly.. you must move the value to something else, and
then that something else into the segment register.
Example:
MOV AX,0A000h ; 0A000h is the segment for the VGA
MOV ES,AX ; ES now holds the segment for VGA addressing
MOV BX,320 ; BX is a register that can address a place
; in memory. AX cannot, so we use BX. In
; other words, you can use [BX] but not [AX]
; Also, 320 in a 320x200x256 graphics mode
; points to the coordinate (0,1) as (x,y)
; since the memory increases by 1 as X
; increases and by 320 (the x-width) as Y
; increases
MOV AL,55 ; 55 is a blue color in 320x200x256
MOV ES:[BX],AL ; Move the value 55 into video at [320]=(0,1)
; This effectively plots a pixel into memory
MOV Var1,Var2 ; This does not work... instead use
MOV AX,Var2 ; Move the value of Var2 into AX
; This overwrites the old value of AX, but
; maybe we didn't need it in the first place
MOV Var1,AX ; Move the value of AX into Var1
; Thus, Var1 now equals Var2
You cannot move to or from IP at all except by the indirect methods
discussed later.
Then there are other commands. ASM is composed mostly of math
commands, which are really all that are needed besides the commands
for I/O.
MOV DEST,SRC : Moves SRC into DEST
ADD DEST,SRC : Adds SRC to DEST and stores into DEST
SUB DEST,SRC : Subtracts SRC from DEST and stores into DEST
MUL SRC : Multiplies AL, AX, or EAX by SRC depending on the size of
SRC. If SRC was 16 bits, then AX would get multiplied by SRC. The
final value is stored in AX (AH:AL), DX:AX, or EDX:EAX; again
depending on the size of SRC. The colon just tells that the value is
expanded to a larger size but could not be held in one register; thus
two registers are used as a destination.
DIV SRC : Divides AX (AH:AL), DX:AX, or EDX:EAX by SRC and stores into
AL, AX or EAX. The value stored is an integer. It is not rounded, but
rather truncated to the lower value. 5 divided by 2 = 2. The remainder
is stored in either AH, DX, or EDX.
This reminds me, you might not know what these EDX or EAX are. The
E-prefix tells that the register is 32 bits instead of 16 (386 and
higher only). It can only be used on the 8 basic registers as well as
the Instruction Pointer. You cannot access the upper 16 bits of an
extended register on their own, but the lower 16 bits are still
accessible as normal.
EAX = ??*65536 + AX = ??*65536 + AH*256 + AL
Example:
MOV DX,0 ; DX contains the high value
MOV AX,8 ; AX contains the low value
MOV BX,3 ; Can't divide by immediate, so store here
DIV BX ; Divide 8 by 3
; After the DIV, AX = 2 and DX = 2
ADD DX,BX ; DX = DX + BX --> DX now equals 5
ADD AX,BX ; AX = AX + BX --> AX now equals 5
SUB AX,DX ; We end here with AX=0, BX=3, DX=5
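A matching sketch for MUL may help, since its result is split across
two registers the same way the DIV input is. The numbers are only
illustrative.

```asm
        MOV AX,300
        MOV BX,400          ; MUL cannot take an immediate either
        MUL BX              ; DX:AX = 300 * 400 = 120000
                            ; 120000 = 1*65536 + 54464
                            ; so DX = 1 and AX = 54464 (0D4C0h)
```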
Then there are basic stack commands.
PUSH SRC : SP is decreased by the size of SRC and the value in SRC is
stored to the memory at SS:[SP]. SRC can be an immediate value (186
and higher), a 16 or 32 bit register (not 8 bit), or a 16 or 32 bit
variable or memory location.
POP DEST : pops the last value pushed into DEST. SP is increased
accordingly.
Example:
MOV AX,3
PUSH AX ; 3 (word size) is on the stack
Exact same now...
PUSH 3 ; 3 (word size) is on the stack
Exact same again...
MOV AX,3
MOV BX,SP ; Set BX to SP since SP is not an addressable register
SUB SP,2 ; Adjust SP by a word size (16 bits=2 bytes) first, so
; that an interrupt cannot overwrite the new value
MOV SS:[BX-2],AX ; 3 (word size) is on the stack
; SS: is needed since [BX] defaults to DS
; Note here that you can often displace a
; memory address by an immediate number.
; That's what we do here
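Because the last value pushed is the first one popped, popping in the
same order you pushed swaps two registers. A minimal sketch with
arbitrary values:

```asm
        MOV AX,1111h
        MOV BX,2222h
        PUSH AX             ; stack: 1111h
        PUSH BX             ; stack: 1111h, 2222h (2222h on top)
        POP AX              ; AX = 2222h (last pushed comes off first)
        POP BX              ; BX = 1111h, and SP is back where it began
```

To simply save and restore registers instead, pop them in the reverse
of the order you pushed them.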
Now I'll expand into ways that memory can be read/written.
[BX+SI] : This could also be specified as [SI+BX]
[BX+DI]
[BP+SI]
[BP+DI]
[SI]
[DI]
[imm16] : A direct 16 bit address. A variable is accessed this way.
[BX]
[BX+SI+imm8] : The "imm" prefix always means immediate
[BX+DI+imm8]
[BP+SI+imm8]
[BP+DI+imm8]
[SI+imm8]
[DI+imm8]
[BP+imm8]
[BX+imm8]
[BX+SI+imm16] : Here a 16 bit value is used instead of an 8 bit value
[BX+DI+imm16]
[BP+SI+imm16]
[BP+DI+imm16]
[SI+imm16]
[DI+imm16]
[BP+imm16]
[BX+imm16]
Also note that with two's complement negation format, you can displace
the address by a negative number; you don't use 0FFFFh for -1
though... use [BX-1] or whatever and the assembler modifies for two's
complement for you. Addresses that use BP default to the SS segment;
all the others default to DS.
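Here is a short sketch of a few of these forms reading from a small
table. The name Table and its contents are made up for illustration,
and DS is assumed to already point at the segment that holds it.

```asm
Table   dw 10,20,30,40      ; hypothetical data; keep it out of the
                            ; path of execution

        MOV BX,OFFSET Table ; BX points at the first word
        MOV SI,4            ; SI = byte offset of the third word
        MOV AX,[BX]         ; AX = 10
        MOV AX,[BX+SI]      ; AX = 30
        MOV AX,[BX+SI+2]    ; AX = 40 (displacement of 2)
        MOV AX,[BX+SI-2]    ; AX = 20 (negative displacement)
```

Offsets are always counted in bytes, so the index of a word array has
to be doubled before it is used.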
Okay... now to continue...
AND DEST,SRC : logically ANDs DEST with SRC and stores in DEST
OR DEST,SRC : logically ORs DEST with SRC and stores in DEST
XOR DEST,SRC : logically XORs DEST with SRC and stores in DEST
NOT DEST : logically NOTs DEST by reversing all bits and stores in
DEST
NEG DEST : uses two's complement negation format to negate DEST and
stores in DEST
Example:
MOV AX,10111b ; the 'b' at the end makes it binary
XOR AX,01001b ; AX becomes 0000000000011110b
NOT AX ; AX becomes 1111111111100001b
NEG AX ; AX becomes 0000000000011111b
AND AX,10101b ; AX becomes 0000000000010101b
MOV BX,11101b ; Use BX instead of immediate value
OR AX,BX ; AX becomes 0000000000011101b
XOR AX,AX ; AX becomes 0000000000000000b
; XOR AX,AX is effectively a MOV AX,0
; (this is an optimization)
Those are the basics... here's some more helpful ones. The forms that
take an immediate count other than 1 are 186 and higher only, but
don't worry since few people have 8086's now.
SHL DEST,imm8 : Shifts DEST imm8 bits left. On an 8086, you must
repeat imm8 shifts by 1 or put the count in CL
SHL DEST,CL : Shifts DEST left by CL bits. Each bit shifted left is
basically a multiply by two
SHR DEST,imm8 : Shifts DEST imm8 bits right and fills with zeros. Each
bit shifted right is basically an unsigned divide by two
SHR DEST,CL : Shifts DEST right by CL bits.
SAL is a synonym for SHL
SAR is like SHR, but it copies the sign bit into the vacated bits, so
it works on signed integers
ROL DEST,imm8 : Rotates DEST imm8 bits left. The bits cut off are
returned on the right side of DEST rather than lost like SHL.
ROL DEST,CL : Rotates DEST left by CL bits. The bits cut off are
returned on the right side of DEST rather than lost like SHL.
ROR DEST,imm8 : Rotates DEST imm8 bits right.
ROR DEST,CL : Rotates DEST right by CL bits.
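A few worked values may make the shifts and rotates concrete. This is
only a sketch with arbitrary numbers; note that SAR keeps the sign bit
while SHR shifts in a zero, so the two differ on negative values. Only
shifts by 1 or by CL are used, so it does not need a 186.

```asm
        MOV AX,5
        SHL AX,1            ; AX = 10
        SHL AX,1            ; AX = 20 (5 * 4)
        MOV CL,2
        SHR AX,CL           ; AX = 5 again (20 / 4)

        MOV AX,-8           ; AX = 0FFF8h
        SAR AX,1            ; AX = 0FFFCh = -4, the sign bit is kept
        MOV AX,-8
        SHR AX,1            ; AX = 7FFCh = 32764, a zero came in on
                            ; the left

        MOV AL,10000001b
        ROL AL,1            ; AL = 00000011b, the top bit wrapped
                            ; around to the bottom
```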
Now some more advanced commands...
XCHG DEST1,DEST2 : The values of DEST1 and DEST2 are swapped.
MOVZX DEST,SRC : DEST is larger in size than SRC. DEST gets the value
of SRC, Zero eXtended (386 and higher).
MOVSX DEST,SRC : DEST is, again, larger in size than SRC. This allows
for Sign eXtending so you can use signed integers (386 and higher).
LODSB : Same as MOV AL,DS:[SI] - "Load string byte" and increments SI
LODSW : Same as MOV AX,DS:[SI] - "Load string word" and increments SI
by 2
LODSD : Same as MOV EAX,DS:[SI] - "Load string dword" and increments
SI by 4
STOSB : Same as MOV ES:[DI],AL - "Store string byte" and increments DI
STOSW : Same as MOV ES:[DI],AX and increments DI by 2 - SI is the
source register.. just a hint
STOSD : Same as MOV ES:[DI],EAX and increments DI by 4 - DI is the
destination.. any similarities?
MOVSB : Moves byte at DS:[SI] to ES:[DI] and increments SI and DI
MOVSW : Moves word at DS:[SI] to ES:[DI] and increments both by 2
MOVSD : Moves dword at DS:[SI] to ES:[DI] and increments both by 4
Also note here that LODxx, STOxx, and MOVxx increment SI and/or DI by
the size B, W, or D; or decrement depending on the Direction Flag
which is discussed later... normally, it will increment.
INC DEST : Same as ADD DEST,1 - Increment DEST
DEC DEST : Same as SUB DEST,1 - These take up less space than the
ADD and SUB forms, but they leave the Carry flag alone
IMUL SRC : Same as MUL but accepts signed integers instead of unsigned
IDIV SRC : Same as DIV but signed.. make sure your dividend is really
sign-extended.. CBW, CWD, or CDQ below would help if it's not
CBW : Convert byte to word - AX is sign-extend of AL
CWD : Convert word to doubleword - DX:AX is sign-extend of AX
CDQ : Convert dword to quadword - EDX:EAX is sign-extend of EAX
I will not tell you how to use LEA, LDS, LES, LSS, etc. in this
tutorial.
Here's how you define variables as a certain size...
Var1 db imm8 ; immediate for initial value or '?' for undefined
Var2 dw imm16
Var3 dd imm32
Var4 db 'String here' ; this is used to declare a string
Var5 db 500 dup(4) ; defines a 500 byte size array containing the
; value 4 in each
Var6 dd 111 dup(1234h) ; 111 dword size array containing the value
; 1234h in each
Var7 dw 50 dup(50 dup(?)) ; defines a 50 by 50 array with initial
; value undefined
Now I will talk about flags. These are very important for some math
functions and especially important for IF statements.
There are 9 basic flags... ODITSZAPC
Overflow Flag
Direction Flag
Interrupt Flag
Trap Flag
Sign Flag
Zero Flag
Auxiliary Carry Flag
Parity Flag
Carry Flag
The Overflow Flag will be set after any math command (i.e. ADD) if the
result does not fit as a signed integer, that is, if the sign came out
wrong (for example, two positive numbers added up to a negative one).
Interrupt Flag tells the CPU whether or not it can accept hardware
interrupts. Just forget this until you get into advanced programming.
Direction Flag tells the CPU to increment or decrement on LODSB,
STOSB, and similar commands.
Trap Flag makes the CPU stop after every instruction so a debugger can
single-step. Forget this one too.
Sign Flag is a copy of the highest bit of the final value, so it is
set if the final value is negative when viewed as signed.
Zero Flag is set if the final value was zero.
Carry Flag is like the Overflow Flag but is set if you overflowed on
an unsigned integer.
Auxiliary Carry Flag is set on a carry out of the low 4 bits. It is
only used for BCD math.
Parity Flag is set if the low byte of the final value has an even
number of 1 bits.
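The difference between the Carry and Overflow flags is easiest to see
with two small byte additions. This is a sketch with values picked to
trigger one flag each; the comments describe the state after each ADD.

```asm
        MOV AL,0FFh
        ADD AL,1            ; AL = 0: Carry set (255 + 1 does not fit
                            ; unsigned), Zero set, Overflow clear
                            ; (signed: -1 + 1 = 0 is fine), Sign clear
        MOV AL,7Fh
        ADD AL,1            ; AL = 80h: Overflow set (signed: 127 + 1
                            ; came out as -128), Sign set, Carry
                            ; clear, Zero clear
```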
Here's how these flags come in handy as well as some more commands...
The famous IF command can be implemented with these
CMP xxx,yyy : Compares xxx and yyy (xxx AND yyy cannot both be
variables)
Here's what CMP does: it subtracts yyy from xxx, throws the result
away, and keeps the flags
If xxx >= yyy (unsigned) then Carry flag clear and Zero flag might be
set
If xxx <= yyy (unsigned) then Carry flag set or Zero flag set
If xxx > yyy (unsigned) then Carry flag clear and Zero flag clear
If xxx < yyy (unsigned) then Carry flag set and Zero flag clear
If xxx = yyy then Zero flag set
If xxx >= yyy (signed) then Zero flag might be set and SF=OF
If xxx < yyy (signed) then SF<>OF
Since the way flags work is very confusing, there are easy commands to
help us: mainly the JMP command is associated.
JMP <label> : Sets IP to <label> thus jumping to that place in the code
JA <label> : Jumps to <label> if above (unsigned)
JZ <label> : Jump if Zero (two operands are equal)
JE <label> : Same as JZ
JLE <label> : Jump if Less than or Equal (signed operands)
All of the following are available on all Intel compatible assemblers.
They are basically all like English. I prefer JZ to JE even though JE
is more English-like. JZ just makes more sense to a master assembler
because of what CMP actually does in the CPU.
JA Jump Above (unsigned)
JAE Jump Above or Equal (unsigned)
JB Jump Below (unsigned)
JBE Jump Below or Equal (unsigned)
JC Jump Carry
JCXZ Jump if CX is Zero
JE Jump if Equal
JZ Jump if Zero (Equal)
JG Jump if Greater (signed)
JGE Jump if Greater or Equal (signed)
JL Jump if Less (signed)
JLE Jump if Less or Equal (signed)
JNA Jump if not Above (unsigned)
JNAE Jump if not Above or Equal (unsigned) (same as JB)
JNB Jump if not Below (unsigned)
JNBE Jump if not Below or Equal (unsigned)
JNC Jump if not Carry
JNE Jump if not Equal
JNG Jump if not Greater (signed)
JNGE Jump if not Greater or Equal (signed)
JNL Jump if not Less (signed)
JNLE Jump if not Less or Equal (signed)
JNO Jump if not Overflow
JNP Jump if not Parity
JNS Jump if not Signed
JNZ Jump if not Zero (Equal)
JO Jump if Overflow
JP Jump if Parity
JPE Jump if Parity Even (PF=1)
JPO Jump if Parity Odd (PF=0)
JS Jump if Signed
JZ Jump if Zero (Equal)
These all jump to labels where execution is resumed, if the conditions
are met.
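As a sketch of how CMP and the conditional jumps combine into an IF
and into a loop, consider the fragments below. The label names are
made up for illustration.

```asm
; IF AX < BX (unsigned) THEN AX = BX, which leaves the larger in AX
        CMP AX,BX
        JAE KeepAX          ; AX >= BX, so skip the THEN part
        MOV AX,BX
KeepAX:

; The same test, but treating the values as signed
        CMP AX,BX
        JGE KeepAXSigned
        MOV AX,BX
KeepAXSigned:

; A counted loop: add up 10 + 9 + ... + 1
        XOR AX,AX           ; AX = running total
        MOV CX,10           ; CX = counter
AddNext:
        ADD AX,CX
        DEC CX              ; DEC sets the Zero flag when CX hits 0
        JNZ AddNext         ; AX = 55 when the loop ends
```

The jump is written for the opposite of the IF condition, since it
skips over the code that should run when the condition holds.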
Since math functions can set flags, they can also sometimes use
them...
ADC DEST,SRC : Add with Carry
This will add SRC to DEST like ADD, but if the Carry flag is set, then
we increment DEST by another 1. Why have this you ask? Because...
MOV AX,50000
MOV BX,20020
XOR DX,DX
ADD AX,BX ; AX would be 70020 but that is over 65535, so
; AX = 4484 and the Carry Flag is set
ADC DX,0 ; DX = 1. Now 50000 + 20020 is in DX:AX
MOV BX,5000
DIV BX ; 70020 / 5000 = 14
; DX = 20, AX = 14
Other commands using flags...
SBB DEST, SRC : Subtract with Borrow
Just like ADC but subtracts an additional 1 if carry is set.
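ADC and SBB are what make arithmetic on values larger than one
register possible. The sketch below adds and then subtracts two 32 bit
numbers held in register pairs, using only 16 bit instructions; the
values are arbitrary.

```asm
; DX:AX = 0001FFFFh, CX:BX = 00000001h
        MOV DX,1
        MOV AX,0FFFFh
        XOR CX,CX
        MOV BX,1
        ADD AX,BX           ; AX = 0000h, Carry set
        ADC DX,CX           ; DX = 1 + 0 + Carry = 2
                            ; DX:AX = 00020000h
        SUB AX,BX           ; AX = 0FFFFh, Carry (borrow) set
        SBB DX,CX           ; DX = 2 - 0 - Carry = 1
                            ; DX:AX = 0001FFFFh again
```

The low halves always go first, with plain ADD or SUB, so that the
carry or borrow is ready for the high halves.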
Flags can also be used to return the status of a procedure... like if
it completed successfully or failed.
CLC clears the carry flag
CMC complements the carry flag
STC sets the carry flag
The carry flag is the general flag used to return an error just
because it is the easiest to manipulate. Not all other flags can be
changed by a command of their own. For example, you can't set the Zero
flag directly.
Here are all the direct flag manipulation commands:
CLC : Clear Carry Flag
CLD : Clear Direction Flag
CLI : Clear Interrupt Enable Flag
CMC : Complement Carry Flag
STC : Set Carry Flag
STD : Set Direction Flag
STI : Set Interrupt Enable Flag
PUSHF and POPF do what you think they should do but with the flags
instead of a value. The size is a word and the format is weird (each
flag sits at its own bit position).
Now, I will explain about procedures and functions...
For these we use CALL and RET to call and return from procedures. If
you want a function to return something, it is easiest to just store
the return value in a register and RET.
Also note that when you CALL NEAR (meaning not to a different CS) your
IP value is PUSHed to the stack. This way, you can RETurn from the
function. If you CALL FAR (not usually used except in complex
programs), then both your CS and IP are PUSHed to the stack and you
must use the RETF command to return far rather than RET.
Here's how to write a simple function that writes the character in AL
to the screen coordinates (5,10) in text mode.
WriteChar PROC ; AL is the character. BX is destroyed.
PUSH ES ; Save ES in case it was used previously
MOV BX,0B800h ; B800h is the segment for video memory in text mode
MOV ES,BX ; ES is now the segment of video memory
MOV BX,10*160+5*2 ; Row 10, column 5: the screen is 80 characters
; wide with 2 bytes per character (the character,
; then its attribute), so a row is 160 bytes
MOV ES:[BX],AL ; Write AL to ES:[BX]
POP ES ; Get ES back
RET ; Return to previous IP address
; Note that if you PUSHed and did not pop, then you will
; return to an unknown address and your program will
; crash
ENDP ; End Procedure (not a command, but for the
; assembler to know where to quit)
Start: ; Define the label 'Start'
MOV AL,'Y' ; Move the character 'Y' into AL... this stores the
; ASCII value of 'Y' as a number into AL
Call WriteChar ; Call the procedure to write AL at (5,10)
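The carry-flag convention for reporting success or failure fits
naturally with procedures. The sketch below is a hypothetical
procedure, IsDigit, that returns with Carry clear if AL holds an ASCII
digit and Carry set otherwise; the names IsDigit, NotDigit, and
BadInput are made up for illustration.

```asm
; Hypothetical: Carry clear if AL is '0'..'9', else Carry set
IsDigit PROC
        CMP AL,'0'
        JB  NotDigit        ; below '0'
        CMP AL,'9'
        JA  NotDigit        ; above '9'
        CLC                 ; success
        RET
NotDigit:
        STC                 ; failure
        RET
IsDigit ENDP

; The caller tests the Carry flag right after the CALL
        MOV AL,'7'
        CALL IsDigit
        JC  BadInput        ; not taken here, since '7' is a digit
                            ; (BadInput would be your error handler)
```

AL is left untouched, so the caller still has the character after the
test.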
Now that you (hopefully) understand about procedures, it is time to
discuss one of the most common, easy to use features on the PC. They
are called Interrupts. All they are are predefined functions that any
program may call. These greatly reduce the need for extra work. For
example, printing a string might be tough for a beginner, so DOS
implements a function for you. Also, basic Disk and Video I/O
functions are implemented.
To know what all the interrupts do, you must have a list or just know.
You call an interrupt by specifying a number:
INT 21h
Interrupt 21h is the DOS interrupt for all basic I/O functions as well
as file access. There are many subfunctions, and DOS needs to know
which one you want. The AH register contains the subfunction number,
and the other parameters are passed in the other registers.
Here is an example of an interrupt as well as a completely compatible
Turbo Assembler program:
Ideal ; 'Segment Name' and a bare 'EndS' are TASM Ideal mode syntax,
; so switch to Ideal mode first
;
; Interrupt 21h
; Subfunction for Print String
; AH=9
; DS:DX -> ASCII string (DS:DX points to the string you want printed)
; terminated by a '$'
;
;
; Interrupt 21h
; Subfunction for Terminate Program
; AH=4Ch
; AL=error code
;
Segment Data ; The DATA segment
HelloWorld db 'Hello World!$' ; INT 21h / AH=9 uses '$' to define
; where to terminate
EndS ; End the Data Segment
Segment Code ; The Code Segment.. code goes here
Assume CS:Code, DS:Data ; Tell TASM what CS and DS will hold
Start: ; Declare where DOS starts our CS:IP
; Just a note here that when DOS loads an EXE file, it will
; not set DS to the data segment; we must do that
MOV AX,data ; Get the segment of our DATA segment
MOV DS,AX ; Set DS to the segment where DATA is
MOV AH,9 ; DOS Print String
MOV DX,offset HelloWorld ; DS:DX must point to the string
INT 21h ; Call INT 21h to print the string
MOV AX,4C00h ; DOS Exit Program (AH=4Ch, error code AL=0)
INT 21h ; Terminate the program - nothing happens after this
EndS ; End the Code Segment
End Start ; This tells TASM to not assemble anything after this
; Because of this, the EndS must come before here
Wow! Now we can write a program that prints "Hello World!" to the
screen. Aren't we awesome.
I suggest getting Ralf Brown's interrupt list if you don't know what
all the interrupts are (most likely you won't know and you'd want the
list anyway).
Just one more common mnemonic.. the REP prefix. This repeats a string
command CX times. The CX register must be set to the number of
repetitions first. The common commands used with REP are MOVxx, LODxx,
and STOxx. Each repetition, DI and/or SI are incremented or
decremented depending on the Direction Flag.
For example, if you wanted to set an entire array that is 50 words
long to zero, then you would do this:
XOR AX,AX ; Set AX to 0
MOV CX,50 ; Array is 50 large
PUSH ES ; Save ES
MOV DI,Seg Array ; ES must be the segment of the array
MOV ES,DI ; ES is now the segment
MOV DI,Offset Array ; DI must point to the array
CLD ; Clear the Direction Flag so that DI increments
REP STOSW ; Store 50 AX's at ES:DI
POP ES ; Get ES back
Array dw 50 dup(?) ; Don't forget that an array must be declared
; And don't just put this anywhere in the program
; After all, you don't want an array to be
; executed!!
Now you're asking: "How do I do all those complicated graphics?"
I'll let you write your own graphic routines, but here I'll show some
basic video and I/O functions. Just a hint: if you want to give input
as (x,y) to a function so you can write a pixel to graphics mode, you
must multiply the Y value by the X-width and then add the X value to
get the address in video memory.
We can use a different interrupt to switch to graphics mode. To do
this we use the BIOS graphics interrupt: INT 10h.
Here's a simple snippet of code to do some video stuff.
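That hint can be sketched as a small procedure. PutPixel and its
register usage are assumptions made for illustration, not a standard
routine: it takes x in CX (0..319), y in DX (0..199), and the color in
AL, and it assumes 320x200x256 mode is already set. It builds y*320 as
y*256 + y*64 so that AL is not disturbed by a MUL.

```asm
; Hypothetical PutPixel: CX = x, DX = y, AL = color
; Destroys BX, DX and ES
PutPixel PROC
        MOV BX,0A000h
        MOV ES,BX           ; ES = video segment
        XCHG DH,DL          ; DX = y*256 (y < 256, so DH was 0)
        MOV BX,DX           ; BX = y*256
        SHR DX,1
        SHR DX,1            ; DX = y*64
        ADD BX,DX           ; BX = y*256 + y*64 = y*320
        ADD BX,CX           ; BX = y*320 + x
        MOV ES:[BX],AL      ; plot the pixel
        RET
PutPixel ENDP
```

For the bottom-right pixel, x=319 and y=199 give 199*320 + 319 =
63999, the last byte of the screen.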
;
; Interrupt 10h
; Subfunction for Set video mode
; AH=00h
; AL=Video mode
;
;
; Interrupt 16h
; Subfunction for Get keypress
; AH=00h
;
MOV AX,3 ; AH becomes 0 and AL becomes 3
INT 10h ; Screen is cleared, cursor is set to (0,0), and text
; mode 80x25 is set if it is not already
; Video memory segment is now B800h
MOV AX,13h ; AH becomes 0 and AL becomes 13h
INT 10h ; Screen is cleared and graphics mode 320x200x256 is entered
; Video memory segment is now A000h
MOV AX,0A000h ; TASM allows the use of hex, but you need to start a
; number with 0 if it begins with a letter
MOV ES,AX ; Set ES to A000h
MOV DI,0 ; We want to fill ES:DI with a color to fill the screen
MOV CX,320*200/2 ; CX=32000 since we'll fill that many words for
; one screen
MOV AX,2828h ; Color 40 (28 hex) is a bright red. Since each pixel is
; addressed per byte and we are storing by the word, we
; must put 40 into AH and AL.
CLD ; Clear the Direction Flag so that DI increments
REP STOSW ; Store 32000 2828h's at 0A000h:0 to fill the screen
; with red
WaitForKey: ; A Label so we can loop
XOR AX,AX ; Sets AH=0 and AL=0
INT 16h ; Wait for keypress
; Returned ASCII character is in AL (scan code in AH)
CMP AL,'E' ; Wait till 'E' is pressed (capital letter)
JNZ WaitForKey ; If it's not equal, get another keypress
MOV AX,3
INT 10h ; Switch back to text mode before leaving
MOV AX,4C00h
INT 21h ; Exit Program
That's the end for now. Keep in mind that if you have a program to
switch to graphics mode and you don't switch back to text, then you
will be in DOS in graphics mode. This can look confusing.