vulture's "The Assembly Language Tutorial"

ALT

The Assembly Language Tutorial
by vulture
Email: Srstanek@aol.com / 1998

Some notes from Ice-Digga' before reading the tutorial:

Download the tutorial if you want it on your computer.
vulture refers to Ralf Brown's Interrupt List in this tutorial;
you can get it on the downloads page.

Assembly language is composed of mnemonics, each of which is a small group
of letters that represents a command in machine language. For example,
MOV is a mnemonic that basically MOVes a value somewhere. In the Intel
assembly format, which is the standard for IBM PCs, the mnemonic comes
first, then the destination, then the source.

AX = Accumulator Register
BX = Base Register
CX = Count Register
DX = Data Register

AH = Accumulator high
AL = Accumulator low
BH, BL, CH, CL, DH, DL
AX = AH*256 + AL
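
A small sketch of that relationship (the values are arbitrary and chosen only for illustration):

```asm
MOV AH,12h    ; high byte of AX
MOV AL,34h    ; low byte of AX
              ; AX is now 1234h = 12h*256 + 34h
MOV AX,0ABCDh ; writing AX replaces both halves at once:
              ; AH = 0ABh and AL = 0CDh
```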

The remaining registers are SI (Source Index), DI (Destination Index),
SP (Stack Pointer), and BP (Base Pointer). These cannot be split into
8-bit halves; they are entirely 16-bit. The Source Index register usually
holds a pointer to the source of some data, whereas the Destination Index
points to a destination. These registers don't *HAVE* to stay within those
guidelines though.

When you push a value onto the stack, the Stack Pointer is first
decremented by the size of the value pushed, normally 16 bits = 2 bytes.
The value is then saved into the place in memory that the Stack Pointer
now points to. The Stack Pointer should NOT ever be changed unless you know
what you're doing and have a good reason. The Base Pointer is generally
used as a fixed reference point into the stack (for example, to reach
parameters and local variables), since SP itself moves with every push and
pop. If you push more than the stack space you've allocated, SP moves past
the end of the stack area and you could possibly overwrite some other vital
place in memory. This is called a stack overflow. Just make sure that you
don't use too much stack compared to what you've allocated. Especially
beware of recursive functions with large local variables.

Then there are Segment Registers. Memory in real mode (just think of it as
the mode you normally program in under DOS. Although it does not offer all
the features of the 386 and higher, it is still fully supported by the
processor) is accessed via a segment:offset pair. When people say that
video memory starts at A000h, they're saying that the memory at [A000h:0]
is the top-left pixel.

Segments start 16 bytes apart, but you may access up to 65536 bytes in each
one, so neighboring segments overlap. The physical address is
segment*16 + offset. For example, you can use [A000h:63999] as the
bottom-right pixel of the 320x200x256 video mode screen. You could also use
[A001h:63983] as the same exact byte. Another name for the same location
would be [A003h:63951].
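
As a rough illustration of how two segment:offset pairs can name the same byte, the sketch below writes through one pair and reads back through the other. It assumes the 320x200x256 mode is already active; the color value 15 is arbitrary.

```asm
MOV AX,0A000h
MOV ES,AX
MOV BX,63999
MOV BYTE PTR ES:[BX],15 ; physical address 0A0000h + 63999 = 0AF9FFh
MOV AX,0A001h
MOV ES,AX
MOV BX,63983
MOV AL,ES:[BX]          ; physical address 0A0010h + 63983 = 0AF9FFh
                        ; the same byte, so AL = 15
```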

The Stack Segment, like your Stack Pointer, should not be changed. It is
the segment where your PUSHed number is put. It is addressed as either
SS:[SP] or [SS:SP]. I generally put the segment on the outside for
clarity (this is the standard).

Your Data Segment is basically where data is accessed. If you fail to
specify a segment when accessing memory, then DS is used as a default. In
fact, while programming, you usually don't specify a segment, so you need
to make sure that DS is set to the right value. For example, [46Ch] is the
same as DS:[46Ch].
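
For instance, the BIOS keeps its timer tick counter at 0040h:006Ch, which is the same physical address as 0000h:046Ch. A minimal sketch of reading its low word, with the DS: override written out so that the operand is clearly a memory reference:

```asm
PUSH DS          ; remember the old data segment
XOR AX,AX
MOV DS,AX        ; DS = 0000h
MOV AX,DS:[46Ch] ; low word of the BIOS timer tick count
POP DS           ; restore DS before touching our own variables
```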

The Extra Segment is used as an extra value in case you want to access a
different segment, but you don't want to change your other segments.

The Code Segment is the segment in which your code runs. The exact location
of your code is specified by the IP (Instruction Pointer). Neither of these
should EVER be changed directly. They only change indirectly, such as when
you jump, call a procedure, or return from one via the RET command.

MOV - this MOVes a value to a register or a place in memory. A variable is
the exact same as a place in memory, but the assembler pre-calculates the
pointer for it and substitutes that value each time the variable is
accessed. You can move any immediate value (a number like, say, 4),
register, or memory location to a register or memory location. You may NOT,
however, MOV a memory location to another memory location. For example, if
Var1 and Var2 were two variables, you could not MOV Var1,Var2 ... you would
have to save Var2 into a register and then MOV that register into Var1.
Also, be sure to use the same size source and destination values.

Examples:
MOV AX,BX ; The value in AX now equals the one in BX
; The source was BX and it is stored in AX
MOV [BX],AL ; This stores AL into the memory location
; pointed at by BX. This does not change
; the value of the BX register; only the
; value of the memory location at DS:[BX]

Just a note.. you cannot MOV an immediate value into a segment register
directly.. you must move the value into something else (a general register
or a memory location), and then that something else into the segment
register.

Example:
MOV AX,0A000h ; 0A000h is the segment for the VGA
MOV ES,AX ; ES now holds the segment for VGA addressing
MOV BX,320 ; BX is a register that can address a place
; in memory. AX cannot, so we use BX. In
; other words, you can use [BX] but not [AX]
; Also, 320 in a 320x200x256 graphics mode
; points to the coordinate (0,1) as (x,y)
; since the memory increases by 1 as X
; increases and by 320 (the x-width) as Y
; increases
MOV AL,55 ; 55 is a blue color in 320x200x256
MOV ES:[BX],AL ; Move the value 55 into video at [320]=(0,1)
; This effectively plots a pixel into memory

MOV AX,Var2 ; Move the value of Var2 into AX
; This overwrites the old value of AX, but
; maybe we didn't need it in the first place
MOV Var1,AX ; Move the value of AX into Var1
; Thus, Var1 now equals Var2

You cannot move to or from IP at all except by the indirect methods
discussed later.

Then there are other commands. ASM is composed mostly of math commands,
which are really all that are needed besides the commands for I/O.

MOV DEST,SRC : Moves SRC into DEST
ADD DEST,SRC : Adds SRC to DEST and stores into DEST
SUB DEST,SRC : Subtracts SRC from DEST and stores into DEST
MUL SRC : Multiplies AL, AX, or EAX by SRC depending on the size of the
value of SRC. If SRC was 16 bits, then AX would get multiplied
by SRC. The final value is stored in AH:AL, DX:AX, or EDX:EAX;
again depending on the size of SRC. The colon just tells that
the value is expanded to a larger size but could not be held
in one register; thus two registers are used as a destination.
DIV SRC : Divides AH:AL, DX:AX, or EDX:EAX by SRC and stores into AL, AX
or EAX. The value stored is an integer. It is not rounded, but
rather truncated to the lower value. 5 divided by 2 = 2. The
remainder is stored in either AH, DX, or EDX.

This reminds me, you might not know what these EDX or EAX are. The E-prefix
tells that the value is 32 bits instead of 16 (386 and higher only). This
can only be used on the 8 basic registers as well as the Instruction Pointer.
You cannot access the upper 16 bits of an extended register by themselves,
but the lower 16 bits are still accessible as normal.
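
One common workaround for reaching the upper half, sketched here on the assumption that the assembler has been told to accept 386 instructions (for example with the .386 directive), is to rotate the register so that the upper 16 bits land in the lower 16:

```asm
MOV EAX,12345678h ; AX = 5678h, the upper half 1234h has no name
ROR EAX,16        ; EAX = 56781234h, so AX = 1234h
ROR EAX,16        ; EAX = 12345678h again
```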

MOV DX,0 ; DX contains the high value
MOV AX,8 ; AX contains the low value
MOV BX,3 ; Can't divide by immediate, so store here
DIV BX ; Divide 8 by 3
; After the DIV, AX = 2 and DX = 2
ADD DX,BX ; DX = DX + BX --> DX now equals 5
ADD AX,BX ; AX = AX + BX --> AX now equals 5
SUB AX,DX ; We end here with AX=0, BX=3, DX=5
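
A matching sketch for MUL, with arbitrary values chosen so that the product no longer fits in 16 bits:

```asm
MOV AX,1000 ; AX is the implied operand for a 16-bit MUL
MOV BX,100  ; Can't multiply by immediate either
MUL BX      ; DX:AX = 1000 * 100 = 100000 = 186A0h
            ; After the MUL, DX = 1 and AX = 86A0h
```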

PUSH SRC : SP is decreased by the size of SRC and the value in SRC is then
stored to the memory at SS:[SP]. SRC can be a 16 or 32 bit
register (not 8 bit), a 16 or 32 bit variable or memory
location, or (on a 186 and higher) an immediate value.
POP DEST : pops the last value pushed into DEST. SP is increased accordingly.

Example:
MOV AX,3
PUSH AX ; 3 (word size) is on the stack
Exact same now (186 and higher)...
PUSH 3 ; 3 (word size) is on the stack
Exact same again...
MOV AX,3
MOV BX,SP ; Set BX to SP since SP is not an addressable
; register
SUB SP,2 ; Adjust SP by a word size (16 bits=2 bytes)
; first, so an interrupt can't overwrite the
; value we are about to store
MOV SS:[BX-2],AX ; 3 (word size) is on the stack
; SS: is needed since [BX] defaults to DS
; Note here that you can often modify
; a direct memory location by an immediate
; number before writing there. That's
; what we do here
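
Because the last value pushed is the first one popped, the order of the POPs matters. A small sketch (the register values are arbitrary):

```asm
MOV AX,1
MOV BX,2
PUSH AX ; stack holds: 1
PUSH BX ; stack holds: 1, 2
POP AX  ; AX = 2 (the last value pushed)
POP BX  ; BX = 1, so AX and BX have been swapped
```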

The following are all the ways memory can be addressed with 16-bit
registers:

[BX+SI] : This could also be specified as [SI+BX]
[BX+DI]
[BP+SI]
[BP+DI]
[SI]
[DI]
[imm16] : An immediate address. This can be a variable and such.
[BX]

[BX+SI+imm8] : The "imm" prefix always means immediate
[BX+DI+imm8]
[BP+SI+imm8]
[BP+DI+imm8]
[SI+imm8]
[DI+imm8]
[BP+imm8]
[BX+imm8]

[BX+SI+imm16] : Here a 16 bit value is used instead of an 8 bit value
[BX+DI+imm16] : Also note that with two's complement negation format,
[BP+SI+imm16] : you could displace the address by negative; you don't
[BP+DI+imm16] : use 0FFFFh for -1 though... use [BX-1] or whatever and
[SI+imm16] : the assembler modifies for two's complement for you
[DI+imm16]
[BP+imm16]
[BX+imm16]
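
A short sketch of a few of these forms in use. Table is a hypothetical byte array, assumed to live in the segment that DS points to. Note also that the forms containing BP use SS rather than DS as their default segment.

```asm
Table db 10,20,30,40,50 ; hypothetical data, declared in the data segment

MOV BX,offset Table ; BX = address of the first element
MOV SI,2
MOV AL,[BX]         ; AL = 10 (element 0)
MOV AL,[BX+SI]      ; AL = 30 (element 2)
MOV AL,[BX+SI+1]    ; AL = 40 (element 3)
MOV AL,[BX+SI-1]    ; AL = 20 (element 1)
MOV BP,SP
MOV AX,[BP]         ; AX = word on top of the stack, read from SS:[BP]
```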

AND DEST, SRC : logically ANDs DEST with SRC and stores in DEST
OR DEST, SRC : logically ORs DEST with SRC and stores in DEST
XOR DEST, SRC : logically XORs DEST with SRC and stores in DEST
NOT DEST : logically NOTs DEST by reversing all bits and stores in DEST
NEG DEST : uses two's complement negation format to negate DEST and stores
in DEST

Example:
MOV AX,10111b ; the 'b' at the end makes it binary
XOR AX,01001b ; AX becomes 0000000000011110b
NOT AX ; AX becomes 1111111111100001b
NEG AX ; AX becomes 0000000000011111b
AND AX,10101b ; AX becomes 0000000000010101b
MOV BX,11101b ; Use BX instead of immediate value
OR AX,BX ; AX becomes 0000000000011101b
XOR AX,AX ; AX becomes 0000000000000000b
; XOR AX,AX is effectively a MOV AX,0
; (this is an optimization)

Those are the basics... here are some more helpful ones. Some of these need
a 186 or higher (shifts and rotates by an immediate other than 1) or a 386
or higher (MOVZX, MOVSX, and the dword string commands), but don't worry
since few people have anything older now.

SHL DEST,imm8 : Shifts DEST imm8 bits left. On an 8086, the assembler must
repeat imm8 shifts by 1
SHL DEST,CL : Shifts DEST left by CL bits. Each bit shifted left is
basically a multiply by two
SHR DEST,imm8 : Shifts DEST imm8 bits right. Each bit shifted right is
basically an unsigned divide by two
SHR DEST,CL : Shifts DEST right by CL bits.

ROL DEST,imm8 : Rotates DEST imm8 bits left. The bits cut off are returned
on the right side of DEST rather than lost like SHL.
ROL DEST,CL : Rotates DEST left by CL bits. The bits cut off are returned
on the right side of DEST rather than lost like SHL.
ROR DEST,imm8 : Rotates DEST imm8 bits right.
ROR DEST,CL : Rotates DEST right by CL bits.
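
A quick sketch of the difference between shifting and rotating, followed by shifts used as multiply and divide (all values arbitrary):

```asm
MOV AL,10000001b
SHL AL,1  ; AL = 00000010b, the top bit is lost (it goes to the Carry Flag)
MOV AL,10000001b
ROL AL,1  ; AL = 00000011b, the top bit comes back in on the right
MOV AX,5
MOV CL,3
SHL AX,CL ; AX = 5 * 2 * 2 * 2 = 40
SHR AX,1  ; AX = 40 / 2 = 20
```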

XCHG DEST1,DEST2 : The values of DEST1 and DEST2 are swapped.
MOVZX DEST,SRC : DEST is larger in size than SRC. DEST is Zero eXtended
with the value of SRC.
MOVSX DEST,SRC : DEST is again, larger in size than SRC. This allows for
Sign eXtending so you can use signed integers.
LODSB : Same as MOV AL,DS:[SI] - "Load string byte" and increments SI
LODSW : Same as MOV AX,DS:[SI] - "Load string word" and increments SI by 2
LODSD : Same as MOV EAX,DS:[SI] - "Load string dword" and increments SI by 4
STOSB : Same as MOV ES:[DI],AL - "Store string byte" and increments DI
STOSW : Same as MOV ES:[DI],AX - SI is the source register.. just a hint
STOSD : Same as MOV ES:[DI],EAX - DI is the destination.. any similarities?
MOVSB : Moves byte at DS:[SI] to ES:[DI] and increments SI and DI
MOVSW : Moves word at DS:[SI] to ES:[DI] and increments SI and DI by 2
MOVSD : Moves dword at DS:[SI] to ES:[DI] and increments SI and DI by 4

Also note here that LODxx, STOxx, and MOVxx increment SI and/or DI by the
size B, W, or D; or decrement depending on the Direction Flag which
is discussed later... normally, it will increment.

INC DEST : Same as ADD DEST,1 - Increment DEST
DEC DEST : Same as SUB DEST,1 - These take up less space than ADD and SUB,
and they leave the Carry Flag unchanged

IMUL SRC : Same as MUL but accepts signed integers instead of unsigned
IDIV SRC : Same as DIV but signed.. make sure the value being divided is
really sign extended into AH, DX, or EDX.. the CBW, CWD, and CDQ
commands below do this

CBW : Convert byte to word - AX is sign-extend of AL
CWD : Convert word to doubleword - DX:AX is sign-extend of AX
CDQ : Convert dword to quadword - EDX:EAX is sign extend of EAX
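
A small sketch of a signed division that uses CWD to prepare the value being divided (the numbers are arbitrary):

```asm
MOV AX,-7 ; the value to divide, as a signed word
CWD       ; sign extend AX into DX:AX (DX = 0FFFFh since AX is negative)
MOV BX,2
IDIV BX   ; AX = -3 (quotient, truncated toward zero)
          ; DX = -1 (remainder, same sign as the value divided)
```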

I will not tell you how to use LEA, LDS, LES, LSS, etc. here; you can get
by without them while learning the basics.

Variables are declared with db (byte), dw (word), and dd (dword):

Var1 db imm8 ; immediate for initial value or '?' for undefined
Var2 dw imm16
Var3 dd imm32
Var4 db 'String here' ; this is used to declare a string
Var5 db 500 dup(4) ; defines a 500 byte size array containing the value 4
; in each
Var6 dd 111 dup(1234h) ; 111 dword size array containing the value 1234h
Var7 dw 50 dup(50 dup(?)) ; defines a 50 by 50 array with initial value
; undefined

Now I will talk about flags. These are very important for some math
functions and especially important for IF statements.

Overflow Flag
Direction Flag
Interrupt Flag
Sign Flag
Zero Flag
Auxiliary Carry Flag
Parity Flag
Carry Flag

The Overflow Flag will be set after any math command (e.g. ADD) if the
result, treated as a signed number, did not fit in the destination. For
example, adding two positive numbers and getting a negative result.

Interrupt Flag tells the CPU whether or not it can accept hardware
interrupts. Just forget this until you get into advanced programming.

Direction Flag tells the CPU to increment or decrement on a LODSB, STOSB,
and similar commands.

Sign Flag tells whether or not the final value was negative (highest bit
set) and is set if it was.
Zero Flag tells if the final value was zero and is set if it was zero.

Carry Flag is like the Overflow Flag but is set if you went overflow on an
unsigned integer.

Here's how these flags come in handy as well as some more commands...
The famous IF command can be implemented with these

Here's what CMP xxx,yyy does. It subtracts yyy from xxx, throws the result
away, and keeps only the flags:
If xxx >= yyy then Carry flag cleared (unsigned) and Zero flag might be set
If xxx <= yyy then Carry flag set or Zero flag set (unsigned)
If xxx > yyy then Carry flag cleared (unsigned) and Zero flag clear
If xxx < yyy then Carry flag set (unsigned) and Zero flag clear
If xxx = yyy then Zero flag set
If xxx >= yyy (signed) then Zero flag might be set and SF=OF

Since the way flags work is very confusing, there are easy commands to help
us: mainly the conditional versions of the JMP command.

JMP <label> : Sets IP to <label> thus jumping to that place in the code
JA <label> : Jumps to <label> if above (unsigned)
JZ <label> : Jump if Zero (two operands are equal)
JE <label> : Same as JZ
JLE <label> : Jump if Less than or Equal (signed operands)

All of the following are available on all Intel compatible assemblers:
They are basically all like English. I prefer JZ to JE even though
JE is more English-like. JZ just makes more sense to a master assembler
because of what CMP actually does in the CPU.

JA Jump Above (unsigned)
JAE Jump Above or Equal (unsigned)
JB Jump Below (unsigned)
JBE Jump Below or Equal (unsigned)
JC Jump Carry
JCXZ Jump if CX is Zero
JE Jump if Equal
JZ Jump if Zero (Equal)
JG Jump if Greater (signed)
JGE Jump if Greater or Equal (signed)
JL Jump if Less (signed)
JLE Jump if Less or Equal (signed)
JNA Jump if not Above (unsigned)
JNAE Jump if not Above or Equal (unsigned) (same as JB)
JNB Jump if not Below (unsigned)
JNBE Jump if not Below or Equal (unsigned)
JNC Jump if not Carry
JNE Jump if not Equal
JNG Jump if not Greater (signed)
JNGE Jump if not Greater or Equal (signed)
JNL Jump if not Less (signed)
JNLE Jump if not Less or Equal (signed)
JNO Jump if not Overflow
JNP Jump if not Parity
JNS Jump if not Signed
JNZ Jump if not Zero (Equal)
JO Jump if Overflow
JP Jump if Parity
JPE Jump if Parity Even (PF=1)
JPO Jump if Parity Odd (PF=0)
JS Jump if Signed
JZ Jump if Zero (Equal)
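
Putting CMP and a conditional jump together, here is a rough sketch of an IF statement. The label and the values are made up for the example; it clamps AX so that it never exceeds BX, treating both as unsigned.

```asm
MOV AX,500
MOV BX,300
CMP AX,BX   ; sets the flags as if computing AX - BX
JBE NoClamp ; IF AX <= BX (unsigned) THEN skip the next line
MOV AX,BX   ; AX was above BX, so pull it down to BX
NoClamp:    ; AX = 300 here
            ; For signed values, use JLE instead of JBE:
            ; 0FFFFh is 65535 unsigned but -1 signed
```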

ADC DEST,SRC : Add with Carry
This will add SRC to DEST like ADD, but if the Carry flag is
set, then we increment DEST by another 1. Why have this you
ask? Because...

MOV AX,50000
MOV BX,20020
XOR DX,DX
ADD AX,BX ; AX would be 70020 but it is over 65536
; so the Carry Flag is set
ADC DX,0 ; Now 50000 + 20020 is in DX:AX
MOV BX,5000
DIV BX ; 70020 / 5000 = 14
; DX = 20, AX = 14

SBB DEST, SRC : Subtract with Borrow
Just like ADC but subtracts an additional 1 if carry is set.
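
A sketch of SBB carrying a borrow from the low word into the high word of a 32-bit value held in DX:AX (the values are arbitrary):

```asm
MOV DX,1 ; DX:AX = 00010000h = 65536
MOV AX,0
SUB AX,1 ; AX = 0FFFFh and the Carry Flag is set (we borrowed)
SBB DX,0 ; DX = 1 - 0 - 1 = 0
         ; DX:AX = 0000FFFFh = 65535
```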

Flags can also be used to return the status of a procedure... like if it
completed successfully or failed.

CLC clears the carry flag
CMC complements the carry flag
STC sets the carry flag

The carry flag is the general flag used to return an error just because it
is the easiest to manipulate. Not all other flags are allowed to change
via a command. For example, you can't set the Zero flag directly.

CLC : Clear Carry Flag
CLD : Clear Direction Flag
CLI : Clear Interrupt Enable Flag
CMC : Complement Carry Flag
STC : Set Carry Flag
STD : Set Direction Flag
STI : Set Interrupt Enable Flag

PUSHF and POPF do what you think they should do but with flags instead of
a value. The size is a word, with each flag at its own fixed bit position
(for example, Carry is bit 0 and Zero is bit 6).

Now, I will explain about procedures and functions...
For these we use CALL and RET to call and return from procedures. If you
want a function to return something, it is easier to just store the return
value in a register and RET.

Also note that when you CALL NEAR (meaning not to a different CS) that your
IP value (the address of the next command) is PUSHed to the stack. This way,
you can RETurn from the function. If you CALL FAR (not usually used except
in complex programs), then both your CS and IP are PUSHed to the stack and
you must use the RETF command to return far rather than RET.
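
As a sketch of returning status in the Carry Flag, the hypothetical procedure below (IsDigit is a name invented for this example) reports whether AL holds an ASCII digit. Carry clear means yes, carry set means no.

```asm
IsDigit PROC  ; AL is the character to test
CMP AL,'0'
JB NotDigit   ; below '0', so not a digit
CMP AL,'9'
JA NotDigit   ; above '9', so not a digit
CLC           ; success: carry clear
RET
NotDigit:
STC           ; failure: carry set
RET
IsDigit ENDP

MOV AL,'7'
CALL IsDigit
JC Failed     ; jump if the procedure reported an error
              ; ... AL is known to be a digit here ...
Failed:
```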

Here's how to write a simple function that writes the character in AL to the
screen coordinates (5,10) in text mode.

WriteChar PROC ; AL is the character
PUSH ES ; Save ES in case it was used previously
MOV BX,0B800h ; B800h is the segment for video memory in text mode
MOV ES,BX ; ES is now the segment of video memory
MOV BX,10*160+5*2 ; Screen width is 80 characters with 2 bytes per character
; (the character, then its color), so a row is 160 bytes
; and column 5 is 10 bytes into the row
MOV ES:[BX],AL ; Write AL to ES:[BX]
POP ES ; Get ES back
RET ; Return to previous IP address
; Note that if you PUSHed and did not pop, then you will
; return to an unknown address and your program will
; crash
WriteChar ENDP ; End Procedure (not a command, but for the
; assembler to know where to quit)

Start: ; Define the label 'Start'
MOV AL,'Y' ; Move the character 'Y' into AL... this stores the ASCII
; value of 'Y' as a number into AL
Call WriteChar ; Call the procedure to write AL at (5,10)

Now that you (hopefully) understand about procedures, it is time to discuss
one of the most common, easy to use commands on the PC.

They are called Interrupts. They are simply predefined functions that any
program may call. These greatly reduce the need for extra work. For example,
printing a string might be tough for a beginner, so DOS implements a
function for you. Also, basic Disk and Video I/O functions are implemented.

To know what all the interrupts do, you must have a list or just know. You
call an interrupt by specifying its number to the INT command, as in INT 21h.

Interrupt 21h is the DOS interrupt for all basic I/O functions as well as
file access. It has many subfunctions, which is how DOS knows what you want
to do. The subfunction number goes in the AH register, and any other
parameters are passed in the other registers.

Here is an example of an interrupt as well as a completely compatible
Turbo Assembler program:

;
; Interrupt 21h
; Subfunction for Print String
; AH=9
; DS:DX -> ASCII string (DS:DX points to the string you want printed)
; terminated by a '$'
;

;
; Interrupt 21h
; Subfunction for Terminate Program
; AH=4Ch
; AL=error code
;

Data SEGMENT ; The DATA segment
HelloWorld db 'Hello World!$' ; INT 21h / AH=9 uses '$' to define
; where to terminate
Data ENDS ; End the Data Segment

StackSeg SEGMENT STACK ; Give DOS a stack to point SS:SP at
dw 128 dup(?)
StackSeg ENDS

Code SEGMENT ; The CODE segment
ASSUME CS:Code, DS:Data, SS:StackSeg

Start: ; Declare where DOS starts our CS:IP
; Just a note here that when DOS loads an EXE file, it will
; not set DS to the data segment; we must do that
MOV AX,Data ; Get the segment of our DATA segment
MOV DS,AX ; Set DS to the segment where DATA is

MOV AH,9 ; DOS Print String
MOV DX,offset HelloWorld ; DS:DX must point to the offset of string
INT 21h ; Call INT 21h to print the string

MOV AX,4C00h ; DOS Exit Program
INT 21h ; Terminate the program - nothing happens after this

Code ENDS ; End the Code Segment
End Start ; This tells TASM to not assemble anything after this
; and that the program begins at 'Start'
; Because of this, the ENDS must come before here

Wow! Now we can write a program that prints "Hello World!" to the screen.
Aren't we awesome.

I suggest getting Ralf Brown's interrupt list if you don't know what all
the interrupts are (most likely you won't know and you'd want the list
anyway).
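
As one more taste of what the list describes, here is a sketch that uses two other INT 21h subfunctions: AH=1 (read a character from the keyboard, with echo, returned in AL) and AH=2 (write the character in DL to the screen).

```asm
MOV AH,1  ; DOS Read Character (with echo), result comes back in AL
INT 21h
MOV DL,AL ; Subfunction 2 wants the character in DL
MOV AH,2  ; DOS Write Character
INT 21h   ; Prints the key that was just typed a second time
```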

Just one more common mnemonic.. the REP prefix. This repeats a string
command CX times, counting CX down to zero as it goes. The CX register
must be set to the number of repetitions beforehand.
The common commands used with REP are MOVxx, LODxx, and STOxx.
Each repetition, DI and/or SI are incremented or decremented depending on
the Direction Flag.

For example, if you wanted to set an entire array that is 50 words long to
zero, then you would do this:

PUSH ES ; Save ES
MOV DI,Seg Array ; ES must be the segment of the array
MOV ES,DI ; ES is now the segment
MOV DI,Offset Array ; DI must point to the array
XOR AX,AX ; AX = 0, the value to store
MOV CX,50 ; Repeat 50 times
CLD ; Make sure DI increments
REP STOSW ; Store 50 AX's at ES:DI
POP ES ; Get ES back

Array dw 50 dup(?) ; Don't forget that an array must be declared
; And don't just put this anywhere in the program
; After all, you don't want an array to be
; executed!!
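
In the same spirit, here is a sketch of copying one array to another with REP MOVSW. Array2 is a hypothetical second array, and both arrays are assumed to sit in the segment that DS already points to.

```asm
PUSH ES
PUSH DS
POP ES               ; ES = DS, since both arrays share a segment
MOV SI,Offset Array  ; DS:SI points to the source
MOV DI,Offset Array2 ; ES:DI points to the destination
MOV CX,50            ; 50 words to copy
CLD                  ; Count upward through memory
REP MOVSW            ; Copy CX words from DS:SI to ES:DI
POP ES

Array2 dw 50 dup(?)  ; hypothetical destination array
```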

I'll let you write your own graphic routines, but here I'll show some
basic video and I/O functions. Just a hint: if you want to give input
as (x,y) to a function so you can write a pixel to graphics mode, you
must multiply the Y value by the X-width and then add the X value to
get the address in video memory.
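
That hint could be sketched roughly as follows for the 320x200x256 mode. PutPixel and its register convention (BX = x, DX = y, AL = color) are made up for this example, and it is meant as a starting point rather than a fast routine.

```asm
PutPixel PROC   ; BX = x (0..319), DX = y (0..199), AL = color
PUSH ES
PUSH DI
PUSH DX
PUSH AX         ; Save the color, since MUL destroys AX
MOV AX,320
MUL DX          ; DX:AX = y * 320 (at most 63680, so it fits in AX)
ADD AX,BX       ; AX = y * 320 + x
MOV DI,AX       ; DI = offset of the pixel
MOV AX,0A000h
MOV ES,AX       ; ES = video segment
POP AX          ; Get the color back into AL
MOV ES:[DI],AL  ; Plot the pixel
POP DX
POP DI
POP ES
RET
PutPixel ENDP
```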

We can use a different interrupt to switch to graphics mode. To do this we
use the BIOS graphics interrupt: INT 10h.

MOV AX,3 ; AH becomes 0 and AL becomes 3
INT 10h ; Screen is cleared, cursor is set to (0,0), and text
; mode 80x25 is set if it is not already
; Video memory segment is now B800h

MOV AX,13h ; AH becomes 0 and AL becomes 13h
INT 10h ; Screen is cleared and graphics mode 320x200x256 is entered
; Video memory segment is now A000h

MOV AX,0A000h ; TASM allows the use of hex, but you need to start a number
; with 0 if it begins with a letter
MOV ES,AX ; Set ES to A000h
MOV DI,0 ; We want to fill ES:DI with a color to fill the screen
MOV CX,320*200/2 ; CX=32000 since we'll fill that many words for
; one screen
MOV AX,2828h ; Color 40 (28 hex) is a bright red. Since each pixel is
; addressed per byte and we are storing by the word, we
; must put 40 into AH and AL.
REP STOSW ; Store 32000 2828h's at 0A000h:0 to fill the screen
; with red

WaitForKey: ; A Label so we can loop
XOR AX,AX ; Sets AH=0 and AL=0
INT 16h ; Wait for keypress
; Returned keypress is in AL
CMP AL,'E' ; Wait till 'E' is pressed (capital letter)
JNZ WaitForKey ; If it's not equal, get another keypress

That's the end for now. Keep in mind that if you have a program to switch
to graphics mode and you don't switch back to text, then you will be in
DOS in graphics mode. This can look confusing.
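
A minimal sketch of a clean exit, using only the interrupt calls already shown above:

```asm
MOV AX,3     ; BIOS: set 80x25 text mode again
INT 10h
MOV AX,4C00h ; DOS: exit program with error code 0
INT 21h
```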