some major architecture changes
@@ -12,6 +12,70 @@ I want to compile Bootler and Twasm with the Twasm assembler
- [opcodes,ModR/M,SIB](http://ref.x86asm.net/coder64.html) (no secure site available)
- [calling conventions](https://wiki.osdev.org/Calling_Conventions); I try to use System V

### tokeniser

whitespace is ignored for the sake of readability; it can go between pretty much anything

```
------------------------
tokeniser
------------------------
byte(s)    -> next byte(s)
------------------------
Newline    -> Newline
           -> Komment
           -> Operator
           -> Directive

Komment    -> Newline

Operator   -> Newline
           -> Komment
           -> Operand

Operand    -> Newline
           -> Komment
           -> Comma

Comma      -> Operand

Directive  -> Newline
           -> Komment
           -> Operator
------------------------
```
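
the transition table above could be encoded as one bitmask per token class, where bit N is set if class N may follow. This is only a sketch: the `TK_*` numbers, `follow_table` and `may_follow` are hypothetical names made up for illustration, not part of Twasm. The routine assumes the System V convention and trusts the caller to pass valid class numbers.

```nasm
bits 64

; hypothetical token classes, numbered only for this sketch
TK_NEWLINE   equ 0
TK_KOMMENT   equ 1
TK_OPERATOR  equ 2
TK_OPERAND   equ 3
TK_COMMA     equ 4
TK_DIRECTIVE equ 5

section .rodata

; one byte per "previous" class; bit N set = class N may come next
follow_table:
    db (1 << TK_NEWLINE) | (1 << TK_KOMMENT) | (1 << TK_OPERATOR) | (1 << TK_DIRECTIVE) ; after Newline
    db (1 << TK_NEWLINE)                                                                ; after Komment
    db (1 << TK_NEWLINE) | (1 << TK_KOMMENT) | (1 << TK_OPERAND)                        ; after Operator
    db (1 << TK_NEWLINE) | (1 << TK_KOMMENT) | (1 << TK_COMMA)                          ; after Operand
    db (1 << TK_OPERAND)                                                                ; after Comma
    db (1 << TK_NEWLINE) | (1 << TK_KOMMENT) | (1 << TK_OPERATOR)                       ; after Directive

section .text

; may_follow
; in:  edi = previous class, esi = next class (both assumed valid, no bounds check)
; out: eax = 1 if the transition is in the table, 0 otherwise
may_follow:
    mov edi, edi                  ; zero-extend the index into rdi
    lea rax, [rel follow_table]
    movzx eax, byte [rax + rdi]   ; mask of classes allowed after edi
    bt eax, esi                   ; CF = bit esi of the mask
    setc al
    movzx eax, al
    ret
```

with these numbers, `may_follow` with `edi = TK_COMMA` returns 1 only for `esi = TK_OPERAND`, which matches the `Comma -> Operand` row.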

not yet implemented:

```
------------------------
operand parser
------------------------
byte(s)    -> next byte(s)
------------------------
START      -> '['
           -> Register
           -> Constant

'['        -> Register
           -> Constant

']'        -> END

Register   -> IF #[, ']'
           -> Operator

Constant   -> IF #[, ']'
           -> Operator

Operator   -> IF NOT #R, Register
           -> Constant
------------------------
:R: = whether a register has been found
:[: = whether a '[' has been found
------------------------
```
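
since the operand parser is not implemented yet, the following is only one possible reading of the table above, not the planned implementation. It assumes the unconditional rows are always allowed, the `IF #[` rows only apply once a '[' has been seen, and the `IF NOT #R` row only applies while no register has been seen. All names (`OP_*`, `FLAG_*`, `next_*`, `operand_next_mask`) are hypothetical.

```nasm
bits 64

; hypothetical state numbers for this sketch
OP_START    equ 0
OP_LBRACKET equ 1
OP_RBRACKET equ 2
OP_REGISTER equ 3
OP_CONSTANT equ 4
OP_OPERATOR equ 5
OP_END      equ 6

; hypothetical flag bits
FLAG_R       equ 1 << 0         ; a register has been found
FLAG_BRACKET equ 1 << 1         ; a '[' has been found

section .rodata

; next states that are always allowed, indexed by current state
next_always:
    db (1 << OP_LBRACKET) | (1 << OP_REGISTER) | (1 << OP_CONSTANT) ; START
    db (1 << OP_REGISTER) | (1 << OP_CONSTANT)                      ; '['
    db (1 << OP_END)                                                ; ']'
    db (1 << OP_OPERATOR)                                           ; Register
    db (1 << OP_OPERATOR)                                           ; Constant
    db (1 << OP_CONSTANT)                                           ; Operator
    db 0                                                            ; END

; extra next states, only allowed while the '[' flag is set
next_if_bracket:
    db 0, 0, 0, (1 << OP_RBRACKET), (1 << OP_RBRACKET), 0, 0

; extra next states, only allowed while the register flag is clear
next_if_no_register:
    db 0, 0, 0, 0, 0, (1 << OP_REGISTER), 0

section .text

; operand_next_mask
; in:  edi = current state (assumed valid), esi = flags
; out: eax = bitmask of allowed next states
operand_next_mask:
    mov edi, edi                    ; zero-extend the index into rdi
    lea rdx, [rel next_always]
    movzx eax, byte [rdx + rdi]
    test esi, FLAG_BRACKET
    jz .no_bracket
    lea rdx, [rel next_if_bracket]
    movzx ecx, byte [rdx + rdi]
    or eax, ecx
.no_bracket:
    test esi, FLAG_R
    jnz .done
    lea rdx, [rel next_if_no_register]
    movzx ecx, byte [rdx + rdi]
    or eax, ecx
.done:
    ret
```

the table does not say how an operand outside brackets ends, so this sketch leaves that case out as well.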

### memory map

```
@@ -50,15 +114,15 @@ each token gets loaded into the token table with the following form:

### internal data structures

#### `tokens.by_nameX`
#### `tokens.[operators|registers]`

contains all tokens of that length followed by their ID. For some non-empty `tokens.by_nameX`, it is true that `tokens.by_name<X+1> - tokens.by_nameX` is the size in bytes of `tokens.by_nameX`.
contains tokens by their type. Intended to be searched by token name to get the token's ID.

each entry is in the following form:

```
+----------+--------------------------------+
|[2 bytes] | 8 * token_length - 1         0 |
| 47    32 | 31                           0 |
+----------+--------------------------------+
| token ID | string without null terminator |
+----------+--------------------------------+
@@ -68,19 +132,16 @@ each entry is in the following form:
example implementation:

```nasm
tokens:
.by_name1:
    db "+"
    dw 0x0062
    db "-"
    dw 0x0063
.by_name2:
    db "r8"
tokens
.registers:
    dd "r8"
    dw 0x0008
.by_name3:  ; this is required for futureproofness; the caller can use this to
            ; find the size of tokens.by_name2
            ; find the size of registers.by_name2
```
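
searching such a table by name could look roughly like the sketch below. It assumes the 6-byte entry form shown above (4-byte zero-padded string, then a 2-byte token ID). The `.registers_end` label, the `find_register` routine and the `-1` "not found" value are hypothetical; only the `r8` entry is taken from the example.

```nasm
bits 64

section .rodata

tokens:
.registers:
    dd "r8"                     ; NASM pads the string to 4 bytes with zeros
    dw 0x0008                   ; token ID
.registers_end:                 ; hypothetical end marker for the search loop

section .text

; find_register
; in:  edi = token name packed into 4 bytes, zero-padded
; out: eax = token ID, or -1 if the name is not in the table
find_register:
    lea rsi, [rel tokens.registers]
    lea rdx, [rel tokens.registers_end]
.loop:
    cmp rsi, rdx
    jae .not_found              ; walked past the last entry
    cmp edi, [rsi]              ; compare the 4-byte name field
    je .found
    add rsi, 6                  ; 4 bytes of name + 2 bytes of ID
    jmp .loop
.found:
    movzx eax, word [rsi + 4]   ; the ID follows the name
    ret
.not_found:
    mov eax, -1                 ; cannot collide with a 16-bit ID
    ret
```

a caller would load the name as a character constant, for example `mov edi, "r8"` followed by `call find_register`, which leaves `0x0008` in `eax` for the table above.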

note that tokens longer than 4 bytes are problematic, since the string field of an entry is only 4 bytes wide :/

#### `tokens.by_id`

contains some tokens with their metadata. Some tokens have embedded information (`0x10XX` for instance). Those will not have entries in this table, being handled instead inside the assemble function itself.