not an expert on this, and although I have been planning for ages to return
to the subject, I will NOT do so now :-) still this page should be sufficient
to get you started.
Okay, let's start by saying that I am NOT an expert on this. In fact, I'm pretty much the absolute beginner, which makes me perfectly suited to write an introduction to assembly in PureBasic... Not!
Here's a little information I gathered over time, but I definitely do not have much experience with this, so feel free to doubt my statements and correct my endless mistakes...
This is not the next Wikipedia, but some terms will show up that you need to understand. I'll add some links if I run into them. Skip ahead if you know these already and just want to see how PureBasic does things...
A bit is the smallest part a computer knows, and can be either 0 or 1.
A nibble is a group of four bits, numbered from 0 to 3.
bit3 bit2 bit1 bit0
It can contain values from 0 to 15.
A byte is a group of eight bits, numbered from 0 to 7.
bit7 ... bit0
It can contain values from 0 to 255. A byte is eight bits, or two nibbles.
A word is a group of sixteen bits, numbered from 0 to 15.
bit15 ... bit0
It can contain values from 0 to 2^16-1. A word is two bytes. The lower half (bit0 to bit7) is called the lo-byte, the upper half (bit8 to bit15) is called the hi-byte.
A long is also known as 'double word' or 'dword'. It is a group of 32 bits, numbered from 0 to 31, containing values from 0 to 2^32-1. A long contains four bytes or two words. The lower half (bit0 to bit15) is called the lo-word, the upper half (bit16 to bit31) is called the hi-word.
A quad is a group of 64 bits, numbered from 0 to 63, containing values from 0 to 2^64-1. A quad contains eight bytes or four words or two longs.
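To get a feel for these sizes you can split a value with shifts and masks. The sketch below is only an illustration: the variable names and sample values are made up, and it uses nothing but plain PureBasic operators plus the built-in Byte, Word, Long and Quad structures.

```purebasic
; Split a 16 bit value into its lo-byte and hi-byte.
Define w.l = $1234                 ; sample value, kept in a long to stay positive
Define loByte.l = w & $FF          ; bit0 to bit7
Define hiByte.l = (w >> 8) & $FF   ; bit8 to bit15
Debug Hex(loByte)                  ; 34
Debug Hex(hiByte)                  ; 12

; Split a 32 bit value into its lo-word and hi-word.
Define l.l = $12345678
Debug Hex(l & $FFFF)               ; 5678
Debug Hex((l >> 16) & $FFFF)       ; 1234

; Sizes in bytes of the built-in types.
Debug SizeOf(Byte)                 ; 1
Debug SizeOf(Word)                 ; 2
Debug SizeOf(Long)                 ; 4
Debug SizeOf(Quad)                 ; 8
```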
Binary is also known as base 2. Your computer on its lowest level uses just two digits, 0 and 1. We can represent any number as a combination of zeroes and ones. Each little part is called a 'bit'.
%00000001 = 2^0 = 1
In Basic languages a 'percentage' symbol is often placed in front of a binary number to identify it, for example %1001 is equivalent to decimal 9. Other Basic dialects write that same binary number as &B1001, and C compilers that support binary literals write it as 0b1001.
Decimal is also known as base 10. It's what we humans use. That's what you get for having ten fingers...
Hexadecimal is also known as base 16. Binary numbers are a bit hard to remember and way too long for practical purposes. Four bits together are called a 'nibble' and can be represented by one character in hexadecimal:
%00000001 = 2^0 = 1 = 16^0 = $01
In PureBasic, as in a number of other Basic dialects, the '$' character is placed in front of the number to indicate it's a hexadecimal number. Other Basic dialects use the combination &H, and C uses 0x. For example decimal 255 is $FF in PureBasic, &HFF in those other Basics, and 0xFF (or 0XFF) in C.
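Here's a small sketch of how the '%' and '$' notations behave in PureBasic itself. It only uses the standard Bin(), Hex() and Val() functions; the sample values are arbitrary.

```purebasic
; Literals in binary and hexadecimal notation.
Debug %1001            ; 9
Debug $FF              ; 255

; Converting a number to a binary or hexadecimal string.
Debug Bin(9)           ; 1001
Debug Hex(255)         ; FF

; Each nibble maps to exactly one hexadecimal character.
Debug Hex(%10101111)   ; AF  (%1010 = A, %1111 = F)

; Val() understands the same prefixes inside a string.
Debug Val("$FF")       ; 255
Debug Val("%1001")     ; 9
```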
Octal is also known as base 8. Just group three bits together and turn them into their decimal equivalent. I still have to see a practical use for this one :-) but what the heck, I mostly listed this variation here as the way C handles it has some consequences...
%00000001 = 2^0 = 1 = 8^0 = &O01
There's no PureBasic equivalent for this one. Elsewhere there are three ways of writing octals! Decimal 19 can be written as &O23 (that's an ampersand and an 'ooh') or &023 (that's an ampersand and a zero) in some Basic dialects, and as 023 (that's just a number starting with a zero) in C. Mightily confusing, especially when using a font which does not clearly differentiate between zeroes and capital 'o', and in C an innocent leading zero silently changes the value of a number. Good thing we don't have them in PureBasic.
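If you ever do need to read an octal number in PureBasic you will have to convert it yourself. A minimal sketch, assuming the input string contains only the digits 0 to 7 (OctalToNumber is a made-up helper name, not a PureBasic function):

```purebasic
; Hypothetical helper: convert a string of octal digits to a number.
; No validation is done, the text is assumed to hold only '0'..'7'.
Procedure.q OctalToNumber(text.s)
  Protected result.q = 0
  Protected i
  For i = 1 To Len(text)
    ; shift the result one octal digit to the left, then add the new digit
    result = result * 8 + (Asc(Mid(text, i, 1)) - '0')
  Next
  ProcedureReturn result
EndProcedure

Debug OctalToNumber("23")   ; 19
Debug OctalToNumber("37")   ; 31
```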
The CPU is the brains of our computer. A little black box that looks for instructions in memory, fetches the appropriate information, and then does something with it :-) A CPU only understands machinecode.
A register is a small storage location inside the CPU that can hold some information. Depending on the type of CPU, registers can have different sizes.
An opcode is an instruction for the CPU. On older CPUs an instruction would be a single byte. On newer CPUs instructions can take multiple bytes.
An operand is a parameter for the opcode, typically data, numbers, memory addresses etc.
Machinecode is the actual stream of instructions that the CPU reads, and pretty much unreadable for humans. An example in 'human programming language':
$01 ; open fridge
A mnemonic is an easier to remember equivalent of the numbers that actually make up machinecode. The code above would then read:
OPNFR ; open FRidge
Which, with sufficient exercise, may result in faster programming and you enlisting in Alcoholics Anonymous.
Assembly is also known as ASM. It is a collection of mnemonics and related data, making up a program. An assembler then takes the whole package and turns it into a program (well, almost, there's often a linker as well).
The linker fills in the blanks. When the assembler is done it may have created a complete set of instructions and data, but it still may miss some information for your program to run. The linker fills in this missing information, and makes sure your program can run on your operating system. Assuming you haven't made any mistakes :-)
(In the old days, there were no linkers.)
80186, 80286, 80386, 80486, Pentium, Pentium II, Pentium III, Pentium IV, Core 2, Sempron, Celeron, Duron, Athlon, Amd64, Phenom, Opteron...
All those numbers and names refer to CPUs, all based on or related to the 8086 of old, and all these processors have some level of compatibility with each other. For simplicity, x86 mostly refers to this group, although the older models are pretty much ignored and miss many instructions that are considered 'standard' these days (think everything before the Pentium III).
On Win32 the only registers you can use freely (that is, without saving and restoring them yourself) are eax, ecx, edx, xmm0, xmm1, xmm2 and xmm3.
When AMD arrived on the scene with a 64 bit extension of the instruction set in the Amd64 processor, a new (64 bit) standard was set. It required certain hardware changes and a full rebuild of the operating systems to make the best use of the larger memory space. For the sake of simplicity, x64 refers to these 64 bit CPUs, which offer more memory and some protection mechanisms.
On Win64 the only registers you can use freely are rax, rcx, rdx, r8, r9, xmm0, xmm1, xmm2 and xmm3.
The generic registers
The basic 4 registers when using Win32 are each 4 bytes wide: EAX, EBX, ECX and EDX. There are multiple ways to refer to them. Take for example register A: EAX is the full 32 bits (bit31 to bit0), AX is the lower 16 bits (bit15 to bit0), AH is bit15 to bit8 and AL is bit7 to bit0.
31..16 15..8 7..0
Please be aware: not all registers are created equal... certain instructions work only on certain registers.
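The different names for parts of register A can be tried out with a few lines passed straight to the assembler. This is a sketch under some assumptions: it needs the assembler (not the C) backend, 'result' is a made-up variable name, and it relies on PureBasic exposing a global variable to the assembler under the name v_ plus the variable name.

```purebasic
Global result.l              ; hypothetical name, seen by the assembler as v_result

! MOV eax, 0x12345678        ; fill all 32 bits of EAX
! MOV al, 0xFF               ; overwrites only bit0 to bit7   -> EAX = 0x123456FF
! MOV ah, 0                  ; overwrites only bit8 to bit15  -> EAX = 0x123400FF
! MOV dword [v_result], eax  ; copy the register into the PureBasic variable

Debug Hex(result)
```

Writing to AL or AH leaves the upper 16 bits of EAX untouched, which is exactly why the register can be addressed in pieces.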
The other registers
There's a lot more to be told about registers, but hey, this is only a survival guide :-) I'll add the stuff when I run into it :-) It's for now enough to know that they hold all other kinds of information, for example a pointer to the next instruction that the CPU will execute, or a place where it stores the results of a calculation etc.
The 'stack' is a little table that can hold values or addresses. It's a 'last in first out' table. Think about a stack of papers: we put new sheets on top, and take them out from the top, i.e. the last one put on top is the first one to go out.
A typical use of the stack is to store the return address when calling a 'subroutine'. And, as there are only four generic registers, it's also used to store temporary values.
The easiest way to store all registers is using PUSHAD, which stores ALL generic registers on the 'stack' (32 bit only, the instruction does not exist in 64 bit mode):
! PUSHAD
Of course that's not the smart thing to do if you just need to push a single register or value, for which we have PUSH and POP.
! PUSH dword EAX
Note! If you plan to use local variables inside a procedure grab them first before messing around with the stack!
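The 'last in first out' behaviour is easy to see with two pushes and two pops. The sketch below is 32 bit only (hence the CompilerIf), assumes the assembler backend, and uses made-up global variables that the assembler sees as v_first and v_second. It runs at the main level, so no local variables are involved.

```purebasic
CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
  Global first.l, second.l       ; hypothetical names

  ! MOV  eax, 1
  ! PUSH eax                     ; stack holds: 1
  ! MOV  eax, 2
  ! PUSH eax                     ; stack holds: 1, 2 (2 is on top)
  ! POP  eax                     ; takes 2, the last value that went in
  ! MOV  dword [v_first], eax
  ! POP  eax                     ; takes 1
  ! MOV  dword [v_second], eax

  Debug first                    ; the value popped first
  Debug second                   ; the value popped second
CompilerEndIf
```

Every PUSH needs a matching POP before PureBasic code continues, otherwise the stack is left in a state the rest of the program does not expect.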
PureBasic allows you to include assembly directly in your source code. The instructions will either be processed by the PureBasic compiler (inline), or passed on directly to the assembler (direct). There are certain differences between the two methods.
No matter what method you use, the following rules always apply:
For Inline ASM: go in the PureBasic IDE to the menu Compiler / Compiler Options and switch on 'Enable inline ASM support'. By enclosing a section of your code with EnableASM and DisableASM you can now enter mnemonics directly as if they were PureBasic keywords...
EnableASM
As you can see you can use variable names directly.
If you have installed the ASM.HLP file you can move the cursor on top of the MOV instruction and hit F1, and you will see what that instruction does. With inline ASM support enabled your mnemonics can also access any PureBasic variables...
Global b.l
As you can see, the regular rules apply to variable scope (local vs. global etc.).
We can return the results of our assembly code through a variable, by using MOV or something similar.
Procedure.l x()
In procedures we can also leave a value behind in EAX, which will be returned by a ProcedureReturn without parameter:
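A minimal sketch of that last trick, assuming the assembler backend (Answer is a made-up procedure name). Only a register and a constant are used, so there are no variable names to worry about:

```purebasic
Procedure.l Answer()
  EnableASM
    MOV eax, 45      ; leave the value behind in EAX
  DisableASM
  ProcedureReturn    ; no parameter: whatever is in EAX is returned
EndProcedure

Debug Answer()
```

Don't put any PureBasic statements between the last mnemonic and the ProcedureReturn, as they may overwrite EAX.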
Direct to compiler
It is also possible to pass the instructions directly to the assembler. In other words, the PureBasic compiler will not process the lines, but just passes them on. The following limitations apply:
Procedure.l x()
And when directly passed to the assembler:
Procedure.l x()
Here's another example, to show the differences between inline and direct:
; survival guide 10_4_400 assembly
When using Win64:
; survival guide 10_4_401 assembly
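To round things off, here is a sketch of the direct method working on a PureBasic variable. It is not the original example from this page. It assumes the assembler backend, uses made-up names (counter, NextCount), and relies on a global variable being visible to the assembler as v_ plus its name. A long is 4 bytes on both Win32 and Win64, so the same lines should assemble on either.

```purebasic
Global counter.l                   ; seen by the assembler as v_counter

Procedure.l NextCount()
  ! MOV eax, dword [v_counter]     ; load the global into EAX
  ! INC eax                        ; add one
  ! MOV dword [v_counter], eax     ; store it back
  ProcedureReturn                  ; returns the value left in EAX
EndProcedure

Debug NextCount()
Debug NextCount()
Debug counter
```

Lines starting with '!' are not checked by PureBasic at all, so a typo in a variable name shows up as an assembler error rather than a PureBasic one.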