clear up internal data structures, add to README

This commit is contained in:
andromeda
2026-03-08 16:03:24 +01:00
parent 76e9cc4cd7
commit 0b7526661c
2 changed files with 160 additions and 19 deletions

View File

@@ -48,6 +48,99 @@ each token gets loaded into the token table with the following form:
+----------+
```
### internal data structures
#### `tokens.by_nameX`
contains all tokens of that length followed by their ID. For some non-empty `tokens.by_nameX`, it is true that `tokens.by_name<X+1> - tokens.by_nameX` is the size in bytes of `tokens.by_nameX`.
each entry is in the following form:
```
+----------+--------------------------------+
|[2 bytes] | 8 * token_length - 1 0 |
+----------+--------------------------------+
| token ID | string without null terminator |
+----------+--------------------------------+
```
example implementation:
```nasm
tokens:
.by_name1:
db "+"
dw 0x0062
db "-"
dw 0x0063
.by_name2:
db "r8"
dw 0x0008
.by_name3: ; this is required for futureproofness; the caller can use this to
; find the size of tokens.by_name2
```
#### `tokens.by_id`
contains some tokens with their metadata. Some tokens have embedded information (`0x10XX` for instance). Those will not have entries in this table, being handled instead inside the assemble function itself.
metadata about some tokens in the following form:
```
+----------------+----------+-------+----------+
| 31 24 | 23 20 | 19 16 | 15 0 |
+----------------+----------+-------+----------+
| typed metadata | reserved | type | token ID |
+----------------+----------+-------+----------+
```
the `type` hex digit is defined as the following:
| hex | meaning | examples |
|-----|----------|-|
| 0x0 | ignored | `; this entire comment is 1 token` |
| 0x1 | operator | `mov`, `hlt` |
| 0x2 | register | `rsp`, `al` |
type metadata for the different types is as follows:
```
+----------+
| type 0x0 |
+----------+
| 31 24 |
+----------+
| reserved |
+----------+
```
```
+-------------------------------+
| type 0x1 |
+----------+--------------------+
| 31 26 | 25 24 |
+----------+--------------------+
| reserved | number of operands |
+----------+--------------------+
```
```
+------------------+
| type 0x2 |
+----------+-------+
| 31 26 | 25 24 |
+----------+-------+
| reserved | width |
+----------+-------+
; width:
00b ; 8 bit
01b ; 16 bit
10b ; 32 bit
11b ; 64 bit
```
### token IDs
supported tokens are listed below