clear up internal data structures, add to README
@@ -48,6 +48,99 @@ each token gets loaded into the token table with the following form:
+----------+
```

### internal data structures

#### `tokens.by_nameX`

contains all tokens whose names are X bytes long, each followed by its ID. For a non-empty `tokens.by_nameX`, `tokens.by_name<X+1> - tokens.by_nameX` is the size in bytes of `tokens.by_nameX`.

each entry has the following form (in memory the string comes first and the 2-byte ID follows it):

```
+----------+--------------------------------+
|[2 bytes] | 8 * token_length - 1         0 |
+----------+--------------------------------+
| token ID | string without null terminator |
+----------+--------------------------------+
```

example implementation:

```nasm
tokens:
  .by_name1:
    db "+"
    dw 0x0062
    db "-"
    dw 0x0063
  .by_name2:
    db "r8"
    dw 0x0008
  .by_name3: ; this is required for futureproofness; the caller can use this to
             ; find the size of tokens.by_name2
```
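
a lookup over one of these tables could be sketched as follows. This is only an illustration in C (the project itself is NASM), not the project's actual routine: `find_token_id` and the `by_name1` byte array are hypothetical names, and the array simply mirrors the bytes that the `.by_name1` example emits, with each ID stored little-endian as `dw` does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the bytes between .by_name1 and .by_name2:
 * each entry is the name followed by a little-endian 2-byte ID. */
static const uint8_t by_name1[] = { '+', 0x62, 0x00, '-', 0x63, 0x00 };

/* Scan one by_nameX table for `name` (name_len == X).
 * `table_size` plays the role of tokens.by_name<X+1> - tokens.by_nameX.
 * Returns 1 and stores the ID in *id_out on a match, 0 otherwise. */
static int find_token_id(const uint8_t *table, size_t table_size,
                         const char *name, size_t name_len, uint16_t *id_out)
{
    const size_t entry_size = name_len + 2; /* name bytes + 2-byte ID */

    for (size_t off = 0; off + entry_size <= table_size; off += entry_size) {
        const uint8_t *entry = table + off;
        if (memcmp(entry, name, name_len) == 0) {
            /* The ID sits right after the name, low byte first. */
            *id_out = (uint16_t)(entry[name_len] | (entry[name_len + 1] << 8));
            return 1;
        }
    }
    return 0;
}
```

because every entry in a given table has the same size (X + 2 bytes), the scan needs only the start of the table and its size, which is why the trailing label after the last table matters.
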

#### `tokens.by_id`

contains metadata for some of the tokens. Some tokens have embedded information (`0x10XX`, for instance); those have no entry in this table and are instead handled inside the assemble function itself.

each entry holds a token's metadata in the following form:

```
+----------------+----------+-------+----------+
| 31          24 | 23    20 | 19 16 | 15     0 |
+----------------+----------+-------+----------+
| typed metadata | reserved | type  | token ID |
+----------------+----------+-------+----------+
```
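
the field boundaries above translate into shifts and masks. The sketch below is illustrative C rather than the project's NASM; `struct token_entry`, `decode_entry` and `encode_entry` are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Field names are illustrative; only the bit positions come from the layout. */
struct token_entry {
    uint16_t token_id;       /* bits 15..0  */
    uint8_t  type;           /* bits 19..16 */
    uint8_t  reserved;       /* bits 23..20 */
    uint8_t  typed_metadata; /* bits 31..24 */
};

/* Split a 32-bit entry into its four fields. */
static struct token_entry decode_entry(uint32_t raw)
{
    struct token_entry e;
    e.token_id       = (uint16_t)(raw & 0xFFFFu);
    e.type           = (uint8_t)((raw >> 16) & 0xFu);
    e.reserved       = (uint8_t)((raw >> 20) & 0xFu);
    e.typed_metadata = (uint8_t)(raw >> 24);
    return e;
}

/* Build an entry with the reserved bits left at zero. */
static uint32_t encode_entry(uint16_t token_id, uint8_t type,
                             uint8_t typed_metadata)
{
    return ((uint32_t)typed_metadata << 24)
         | ((uint32_t)(type & 0xFu) << 16)
         | token_id;
}
```

for instance, the made-up value `0x03020008` decodes to token ID `0x0008`, type `0x2`, reserved `0x0` and typed metadata `0x03`.
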

the `type` hex digit is defined as follows:

| hex | meaning  | examples |
|-----|----------|----------|
| 0x0 | ignored  | `; this entire comment is 1 token` |
| 0x1 | operator | `mov`, `hlt` |
| 0x2 | register | `rsp`, `al` |

the typed metadata for each type is as follows:

```
+----------+
| type 0x0 |
+----------+
| 31    24 |
+----------+
| reserved |
+----------+
```

```
+-------------------------------+
|           type 0x1            |
+----------+--------------------+
| 31    26 | 25              24 |
+----------+--------------------+
| reserved | number of operands |
+----------+--------------------+
```

```
+------------------+
|     type 0x2     |
+----------+-------+
| 31    26 | 25 24 |
+----------+-------+
| reserved | width |
+----------+-------+

; width:
00b ; 8 bit
01b ; 16 bit
10b ; 32 bit
11b ; 64 bit
```
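
reading the typed metadata back out could look like the sketch below. It is illustrative C, not the project's NASM; the enum and function names are hypothetical. The width encoding doubles at each step, so the width in bits is `8 << width`.

```c
#include <stdint.h>

/* Values of the `type` hex digit (bits 19..16 of an entry). */
enum token_type {
    TOKEN_IGNORED  = 0x0,
    TOKEN_OPERATOR = 0x1,
    TOKEN_REGISTER = 0x2
};

/* Extract the type digit from a 32-bit tokens.by_id entry. */
static enum token_type entry_type(uint32_t raw)
{
    return (enum token_type)((raw >> 16) & 0xFu);
}

/* Type 0x1: number of operands, stored in bits 25..24. */
static unsigned operator_operand_count(uint32_t raw)
{
    return (raw >> 24) & 0x3u;
}

/* Type 0x2: register width in bits.
 * 00b -> 8, 01b -> 16, 10b -> 32, 11b -> 64. */
static unsigned register_width_bits(uint32_t raw)
{
    return 8u << ((raw >> 24) & 0x3u);
}
```

as a usage example under stated assumptions: if `r8` (ID `0x0008`) were stored as a 64-bit register, its entry would be `0x03020008`, for which `entry_type` gives `TOKEN_REGISTER` and `register_width_bits` gives 64. Whether the real table contains exactly this entry is not confirmed by the text above.
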

### token IDs

supported tokens are listed below