tokenise labels and constants! Now assembly highkey fails but ok
109
twasm/README.md
@@ -22,11 +22,14 @@ tokeniser
------------------------
byte(s) -> next byte(s)
------------------------
Newline -> Label
        -> Newline
        -> Komment
        -> Operator
        -> Directive

Label -> Newline

Komment -> Newline

Operator -> Newline
@@ -45,37 +48,6 @@ Directive -> Newline
------------------------
```
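a minimal sketch of how the tokeniser table above could be checked, in Python rather than the project's assembly. It lists only the transitions visible in this excerpt (the full table may allow more), and `TRANSITIONS`, `first_invalid` and the choice of `Newline` as the start state are assumptions made for illustration:

```python
# Allowed follow-ups per token kind, copied from the tokeniser table.
# Only the transitions visible in this excerpt are listed; the real table
# may contain more.
TRANSITIONS = {
    "Newline": {"Label", "Newline", "Komment", "Operator", "Directive"},
    "Label": {"Newline"},
    "Komment": {"Newline"},
    "Operator": {"Newline"},
    "Directive": {"Newline"},
}


def first_invalid(kinds):
    """Return the index of the first kind that may not follow its
    predecessor, or None if the whole sequence is allowed."""
    previous = "Newline"  # assumption: input starts as if after a newline
    for index, kind in enumerate(kinds):
        if kind not in TRANSITIONS.get(previous, set()):
            return index
        previous = kind
    return None
```

for instance, `first_invalid(["Label", "Operator"])` reports index 1, because the table only lets `Newline` follow a `Label`.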
not yet implemented:

```
------------------------
operand parser
------------------------
byte(s) -> next byte(s)
------------------------
START -> '['
      -> Register
      -> Constant

'[' -> Register
    -> Constant

']' -> END

Register -> IF #[, ']'
         -> Operator

Constant -> IF #[, ']'
         -> Operator

Operator -> IF NOT #R, Register
         -> Constant
------------------------
:R: = whether a register has been found
:[: = whether a '[' has been found
------------------------
```

### memory map

```
@@ -88,6 +60,10 @@ Operator -> IF NOT #R, Register
+------ 0x00060000 ------+
|       test arena       |
+------ 0x00050000 ------+
|      label table       |
+------ 0x00040000 ------+
|  awaiting label table  |
+------ 0x00030000 ------+
|      stack (rsp)       |
+------------------------+
|         input          |
@@ -105,6 +81,7 @@ each word represents a token on the token table.
each token gets loaded into the token table with the following form:

```
2 bytes
+----------+
| 15     0 |
+----------+
@@ -112,6 +89,40 @@ each token gets loaded into the token table with the following form:
+----------+
```
#### label table (LT)

label definitions are stored in and recalled from this table. The memory addresses are relative to the start of the program.

```
16 bytes
+---------+
| 127  64 |
+---------+
| address |
+---------+
| 63    0 |
+---------+
|  hash   |
+---------+
```
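a rough sketch of the label table entry layout, in Python for illustration only. It assumes little-endian storage (so the hash is the first quadword in memory); `pack_lt_entry` and `unpack_lt_entry` are hypothetical helper names, and the hash function itself is not specified here, so the hash is taken as an opaque 64-bit value:

```python
import struct

LT_ENTRY_SIZE = 16  # bytes per entry, as in the diagram


def pack_lt_entry(label_hash: int, address: int) -> bytes:
    """Pack one entry: hash in bits 63..0, address in bits 127..64.

    Assumes little-endian storage, so the hash comes first in memory.
    """
    return struct.pack("<QQ", label_hash, address)


def unpack_lt_entry(entry: bytes) -> tuple:
    """Return (hash, address) from a 16-byte entry."""
    return struct.unpack("<QQ", entry)
```

the address is stored relative to the start of the program, so the caller would add the load address when an absolute value is needed.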
#### awaiting label table (ALT)

forward references are stored in this table to be filled in after assembly is otherwise complete. The memory addresses are relative to the start of the program.

```
16 bytes
+----------+----------+------------------+---------+
| 127  105 | 104  104 | 103           96 | 95   64 |
+----------+----------+------------------+---------+
| reserved | abs flag | # bytes reserved | address |
+----------+----------+------------------+---------+
| 63                                             0 |
+--------------------------------------------------+
|                       hash                       |
+--------------------------------------------------+
```
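the bit fields of an awaiting label table entry can be sketched as follows, again in Python and assuming little-endian storage. The function and key names are hypothetical, and the reserved bits 127..105 are simply left at zero:

```python
MASK64 = (1 << 64) - 1


def pack_alt_entry(label_hash, address, bytes_reserved, absolute):
    """Build a 16-byte entry from its fields (bit positions as in the diagram)."""
    if not 0 <= label_hash <= MASK64:
        raise ValueError("hash must fit in 64 bits")
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    if not 0 <= bytes_reserved <= 0xFF:
        raise ValueError("# bytes reserved must fit in 8 bits")
    value = (
        label_hash                 # bits 63..0
        | (address << 64)          # bits 95..64
        | (bytes_reserved << 96)   # bits 103..96
        | (int(bool(absolute)) << 104)  # bit 104, the abs flag
    )
    return value.to_bytes(16, "little")


def unpack_alt_entry(entry):
    """Split a 16-byte entry back into its fields."""
    value = int.from_bytes(entry, "little")
    return {
        "hash": value & MASK64,
        "address": (value >> 64) & 0xFFFFFFFF,
        "bytes_reserved": (value >> 96) & 0xFF,
        "absolute": bool((value >> 104) & 1),
    }
```

once assembly is otherwise complete, each entry's hash would be looked up in the label table and the resolved address written into the `bytes_reserved` bytes at `address`.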
### internal data structures

#### `tokens.[operators|registers]`
@@ -121,6 +132,7 @@ contains tokens by their type. Intended to be searched by token name to get the
each entry is in the following form:

```
6 bytes
+----------+--------------------------------+
| 47    32 | 31                           0 |
+----------+--------------------------------+
@@ -129,26 +141,16 @@ each entry is in the following form:
```

example implementation:

```nasm
tokens:
    .registers:
        dd "r8"     ; token name, zero-padded by NASM to 4 bytes
        dw 0x0008   ; 2-byte value stored after the name (presumably the token ID)
    .by_name3:      ; this is required for future-proofing; the caller can use this to
                    ; find the size of registers.by_name2
```

note that tokens longer than 4 bytes are problematic :/

#### `tokens.by_id`

contains some tokens with their metadata. Some tokens have embedded information (`0x10XX` for instance). Those do not have entries in this table, being handled instead inside the assemble function itself.

metadata about some tokens in the following form:

```
4 bytes
+----------------+----------+-------+----------+
| 31          24 | 23    20 | 19 16 | 15     0 |
+----------------+----------+-------+----------+
@@ -168,6 +170,7 @@ the `type` hex digit is defined as the following:
type metadata for the different types is as follows:

```
1 byte
+----------+
| type 0x0 |
+----------+
@@ -178,6 +181,7 @@ type metadata for the different types is as follows:
```

```
1 byte
+-------------------------------+
|           type 0x1            |
+----------+--------------------+
@@ -188,6 +192,7 @@ type metadata for the different types is as follows:
```

```
1 byte
+------------------------------+
|           type 0x2           |
+----------+-----------+-------+
@@ -210,6 +215,7 @@ type metadata for the different types is as follows:
entries are as follows:

```
16 bytes
+------------------------------+
|     0 operand operators      |
+------------------------------+
@@ -230,6 +236,7 @@ entries are as follows:
| reserved | opcode | token ID |
+----------+--------+----------+

16 bytes
+-------------------------------------------------------------+
|                     1 operand operators                     |
+-------------------------------------------------------------+
@@ -252,6 +259,7 @@ entries are as follows:
|          |    dst=r/m    |                                  |
+----------+---------------+----------------------------------+

16 bytes
+----------------------------------------------+
|             2 operand operators              |
+----------------------------------------------+
@@ -389,14 +397,23 @@ supported tokens are listed below
| ret | 0x005A | |
| cmp | 0x005B | |
| | 0x10XX | some memory address; `XX` is as specified below |
| | 0x20XX | some constant; `XX` is as specified below |
| | 0x3XXX | some label definition; `XXX` is its entry index in the label table |
| | 0x4XXX | some label reference; `XXX` is its entry index in the label table |
| | 0xFEXX | passes a raw value `XX` in place of a token ID; only functions that document this as a feature accept it, and passing it to any other function is undefined behaviour |
| | 0xFFFF | unrecognised token |

values of `XX` in `0x10XX`:

| XX | description |
|------|-------------|
| 0x00 | following word is the token ID of some register |

values of `XX` in `0x20XX`:

| XX | description |
|------|-------------|
| 0x00 | following 8 bytes are the constant's value |
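to make the new token IDs concrete, here is a small Python sketch of how a constant and a label might appear in the token table. It assumes tokens are little-endian 2-byte words and that the constant is stored as an unsigned 64-bit value; the helper names are hypothetical and not part of twasm:

```python
import struct


def constant_tokens(value: int) -> bytes:
    """Token 0x2000 followed by the constant's 8-byte value."""
    return struct.pack("<HQ", 0x2000, value)


def label_definition_token(index: int) -> bytes:
    """Token 0x3XXX, where XXX is the entry index in the label table."""
    if not 0 <= index <= 0xFFF:
        raise ValueError("index must fit in 12 bits")
    return struct.pack("<H", 0x3000 | index)


def label_reference_token(index: int) -> bytes:
    """Token 0x4XXX, where XXX is the entry index in the label table."""
    if not 0 <= index <= 0xFFF:
        raise ValueError("index must fit in 12 bits")
    return struct.pack("<H", 0x4000 | index)
```

with 12 bits for `XXX`, a token of this form can address at most 4096 table entries.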
### example program