Wednesday, October 4, 2017

Hello Assembly

*wipes the dust off the blog* Hello there! It's been a long time since I last wrote, but today I want to talk about the Assembly language itself (unlike the previous article, which showed how to write your first Assembly program).

At the beginning of the computing revolution, computers were hardwired: the circuit of the computer was designed to perform a single operation and nothing else.
It became obvious very quickly that we needed a universal computer that could perform any task it was programmed to do.
And this was the real beginning (from my perspective) of the computing revolution.

When you talk about general-purpose computers, you have to mention two people without whose efforts these computers wouldn't have come true: Alan Turing and John von Neumann.

Alan Turing is famous for helping to break the Enigma cipher in the Second World War, and he is the godfather of computing. He was the first one who proposed the idea of the "Turing machine", a machine that can carry out any computation that can be described as an algorithm. I won't go further into how he described it and what it really is, but it's all about something that can be written somewhere, then executed by something, producing something else as a result that could be written somewhere else, and so on.

This idea was picked up and taken further by von Neumann. He described a design with a CPU and a memory that holds both the instructions and the data, where the CPU can alter that memory while it processes and executes instructions.

In fact, the vast majority of the computers we use nowadays follow the so-called "Von Neumann architecture".

So, you might be wondering, what does all of this have to do with Assembly language?

Ok, let me explain. When the first programmable computers came out, they were pretty hard to program, and the coding process was backbreaking even for a professional computer scientist.
They used to program those computers using only zeros and ones; a bunch of zeros and ones tells the CPU to do something. Change a single zero to a one and you will face the butterfly effect: the CPU will do something completely different from what you expected.

After that, they came up with an idea to solve that problem and get rid of programming in the error-prone machine language. If 01001010 means moving something from A to B, ok, I will substitute that value with a fixed word, and I will create a piece of software that understands this word and converts it back to its original value so the CPU can understand and execute it.

And this idea was the Assembly language (of course it's not that easy, it's more complicated than you can ever imagine, but I don't wanna take you through the mind-blowing details).

The program that converts Assembly code into machine code is called an assembler.
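
To make the substitution idea concrete, here is a tiny sketch in C of what the heart of an assembler does: look a word up in a table and hand back the byte it stands for. Everything in it is an assumption for illustration. The words MOVE, ADD and STOP, the function assemble_word, and the byte values are all made up (0x4A is just the 01001010 pattern from the example above), so these are not real 8086 encodings.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mnemonic table. The byte values are invented for
   illustration; they are NOT real 8086 opcodes. */
struct mnemonic {
    const char *name;
    uint8_t     code;
};

static const struct mnemonic TABLE[] = {
    { "MOVE", 0x4A },  /* 01001010, the pattern used in the article */
    { "ADD",  0x21 },  /* invented value */
    { "STOP", 0xFF },  /* invented value */
};

/* Translate one word into its byte value.
   Returns -1 when the word is not in the table. */
int assemble_word(const char *word)
{
    for (size_t i = 0; i < sizeof TABLE / sizeof TABLE[0]; i++) {
        if (strcmp(TABLE[i].name, word) == 0)
            return TABLE[i].code;
    }
    return -1;
}
```

With this table, assemble_word("MOVE") gives back 0x4A, and an unknown word gives back -1. A real assembler also has to deal with operands, labels and addresses, which is where the complicated part lives.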


Assembly language has many variants. Each processor family has its own Assembly language. Yeah, Assembly language is not portable. A program written in Assembly for an ARM computer cannot run on, for example, an Intel computer.

Shocked? Ok, let me tell you more. Each variant has different versions, which span the gamut from 16-bit to 64-bit. Brace yourself.

But the most popular Assembly flavors right now are ARM and x86. Google and read about them on your own, because in this article I'm going to talk about the 16-bit Assembly language for the Intel 8086 processor.


To illustrate the difference between Assembly and other programming languages, let me show you a snippet of Assembly and its equivalent in another language like C.

This code will print Hello World on the screen, simple enough.

In C, you can do it in a single line (after doing your essential stuff like declaring the main function and including stdio and so on):

printf("Hello World");

The code explains itself. You are calling a function that prints the passed string on the screen. Pretty easy and straightforward.

What about Assembly?

Ok, let's see...

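.data
str db "Hello World$"   ; the text, one byte per character, ending with '$'

.code
mov ah, 9               ; function code 9: print a string
lea dx, str             ; dx = address of the first character
int 21h                 ; ask DOS to do the job
hlt                     ; stop here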

Is it hard? Then what about coding it using only 0s and 1s?

Let's go through the first segment, which is the data segment. Unlike in other programming languages, in Assembly you can't define variables wherever and whenever you want, no.

Variables are defined in the data segment before getting to the code segment (some developers begin with the code segment and place the data segment underneath. It's ok, you can do either as long as the variables reside in the data segment).

The next line is str db "Hello World$"

In Assembly, the variable definition formula is var_name size_directive var_value
str is the variable name, db stands for "define byte", and Hello World is the variable value. Carry on, we will get to the dollar sign later.

Hey hey, hold on. Are you kidding? You are defining bytes to store a string? How come!
Actually, db doesn't limit us to a single byte: given a string, the assembler lays down one byte per character, one after another in memory. The name str is just a label for the memory address of our first letter, which is 'H', and by going through the memory byte by byte you can access the whole string until you get to the dollar sign. The dollar sign is the string delimiter: the DOS print function we are about to call stops when it reaches it.
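
If it helps to see this from the C side, here is a small sketch of the same bytes in memory and of walking them until the dollar sign. The names str_bytes and dollar_strlen are mine, not anything from DOS, and this is only a model of the idea.

```c
#include <stddef.h>

/* The same characters that  str db "Hello World$"  places in memory,
   one byte each. (C also appends a zero byte at the end; the
   Assembly version does not.) */
const char str_bytes[] = "Hello World$";

/* Walk the bytes one by one and count how many come before the '$',
   which is the stop condition described above.
   Assumes the buffer really contains a '$'. */
size_t dollar_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '$')
        n++;
    return n;
}
```

Here dollar_strlen(str_bytes) counts the 11 characters of Hello World. Notice that nothing stores the length anywhere: forget the dollar sign and the walk just keeps going through whatever comes next in memory.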

Now, let's move to the next line, which is mov ah, 9

Let's discuss the mov instruction first. As its name implies, it moves (copies, really) a value from one place to another. In this line, mov puts 9 into the register AH.

Come on Mohamed! Stop telling mysterious stuff, what is a register? Ok, a register is a temporary memory that can store a very tiny amount of data. It's a small data-holding place and it's part of the processor itself. A processor usually has a set of several registers; the 8086, for example, has 14 of them.

The general-purpose registers of the 8086 are 16-bit registers, and four of them (AX, BX, CX and DX) are divided into two sub-registers. Take AX: it is the full register, AH is its high 8 bits and AL is its low 8 bits.
i.e. if AH = 01010101 and AL = 11001011 then AX = 0101010111001011, clear enough.
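
The same relationship can be written as a bit of C arithmetic. This is only a sketch of how the two halves fit together, and the function names make_ax, high_byte and low_byte are made up for the example.

```c
#include <stdint.h>

/* Build the 16-bit AX value from its two 8-bit halves:
   AH goes into the upper 8 bits, AL into the lower 8 bits. */
uint16_t make_ax(uint8_t ah, uint8_t al)
{
    return (uint16_t)((ah << 8) | al);
}

/* Split a 16-bit value back into its halves. */
uint8_t high_byte(uint16_t ax) { return (uint8_t)(ax >> 8); }
uint8_t low_byte(uint16_t ax)  { return (uint8_t)(ax & 0xFF); }
```

With AH = 01010101 (0x55) and AL = 11001011 (0xCB), make_ax(0x55, 0xCB) gives 0x55CB, which is 0101010111001011 in binary. Writing to AH or AL on the real chip changes just that half of AX, exactly as the shift and mask suggest.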

Let's get back on track. The register AH now has the value 9. Ok, but why? I will tell you later.
After that, we have lea dx, str. This line loads the address of the first char in our string into the dx register. lea stands for load effective address.

The most important line is int 21h
This is where the magic happens. For simplicity, you can consider this line a function call like in the higher-level languages: int 21h raises software interrupt 21h, the entry point to the DOS services, and the parameters are 9 and str. Yeah, we placed them in the registers so the interrupt handler could access them. When int 21h is reached, the handler looks at AH to obtain the function code and know the task it should do. 9 is for printing a string (you can use Google to search for what else this interrupt can do).

Ok, I'm about to print, but what to print? Function 9 will print whatever dx is pointing to (strictly speaking, DS:DX), which in our case is str.
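
If you think in C, the whole "put the parameters in registers, then fire the interrupt" dance looks roughly like the sketch below. It is a toy model and not how DOS is actually implemented: the struct regs, the function fake_int21 and the memory buffer are all hypothetical, and only function code 9 is modeled.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the two registers the article uses. */
struct regs {
    uint8_t  ah;  /* function code */
    uint16_t dx;  /* offset of the data inside memory */
};

/* Toy stand-in for the int 21h handler: it looks at AH to decide
   which service was requested. Only function 9 (print a
   '$'-terminated string) is modeled here.
   Returns the number of characters written, or -1 for a function
   code this sketch does not handle. */
int fake_int21(const struct regs *r, const char *memory, FILE *out)
{
    switch (r->ah) {
    case 9: {
        int n = 0;
        /* Start at the offset held in DX and stop at the '$'. */
        for (const char *p = memory + r->dx; *p != '$'; p++) {
            fputc(*p, out);
            n++;
        }
        return n;
    }
    default:
        return -1;
    }
}
```

Filling a struct regs with ah = 9 and dx set to the offset of the string, then calling fake_int21, is the C-flavored version of mov ah, 9 / lea dx, str / int 21h. The callee never receives arguments in the usual sense; it just reads what the caller left in the registers.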

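Last but not least, we have hlt. Think of it as the return 0; of our tiny program, with a catch: hlt doesn't really hand control back to the OS, it halts the CPU until the next interrupt arrives. In an emulator that is usually enough to end the run, but a real DOS program would normally exit by calling int 21h with AH = 4Ch, which does return control to the OS so you can do another task or run another program. And that's it!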

Assembly is not as hard as it seems. It's easy, very easy, but it's really complicated too. Being easy doesn't make it simple. C is hard but simple. Can you feel the difference?
