ttx 0.1.0
|
Terminal is modelled as a grid of cells. In a naive implementation, each cell has its own graphics attributes and text. However, in practice, this data model breaks down because of a number of requirements around handling unicode text and certain terminal features.
Some operations are controlled by the process running in the terminal (escape sequences), while others are controlled by the user (scrolling).
Because attributes like hyperlinks and the graphics rendition will be shared across cells, it makes sense to refer to these indirectly in a cell (if we store cells at all). This means there would be map from id to value and each cell would have an id for each of these attributes.
For text, we can store the size of the associated text for each cell. This allows for arbitrary width cells with an arbitrary amount of text, while still be reasonably efficient. The cursor will cache the actual text offset for each cell, which makes normal text operations efficient. By storing the width, we don't need to update every cell when performing certain operations (specifically overwriting a cell with new text). However, this approach is non-ideal when implementing insert/delete lines when a horizontal margin is present (but this is super rare).
For multi-line cells, they would have no corresponding text but will have some attribute set to indicate the cell's true information is stored elsewhere.
Cells also need damage tracking as an individual flag for optimizations when rendering.
Because scroll back cells are read-only, and the amount of scroll back the user may want is very large, a different data structure should be used for scroll back. In particular, the 16 bit integers used for the modifiable screen contents won't work at all when the scroll back becomes large enough. Additionally, most of the cells in scroll back will be completely empty, so it makes sense to optimize them out.
To handle the large size restriction, we can either use cells with larger index types, and "global" maps for all indirect metadata, or split these up into independent chunks/blocks/pages. The latter approach is clearly more efficient but is more complicated, especially because hyperlinks and multi cell info can span multiple terminal rows. This can be handled by either choosing boundaries which do not induce this behavior (and splitting hyperlink / drop multi cells in pathological cases), or with special markers which indicate to look at a separate block. Splitting hyperlinks is reasonable, but dropping multi cells is non-ideal (although realistically multi cells are going to be used by full screen applications with no scroll back so its not super important.
The other important operation with the scroll back is reflowing lines which we auto-wrapped. This is very useful functionality, but has important performance constraints, because the scroll back buffer can be potentially very large. This is especially problematic in cases where there is an extremely large line (1 million characters) when using a chunk based approach, since the entire line cannot fit in scroll back. Calling cat
on a large file with no newline needs to not hang the terminal. This implies that the scroll back limit should not be measured in lines ( which makes sense anyway because lines have no fixed meaning when we reflow text dynamically).
For this reason, we cannot consolidate logical lines into a single line for the purpose of scroll back. However, handling the extremely large line case is still tricky (unless we enforce a maximum line length).
Multi cells with a height greater than 0 are also very difficult to deal with when reflowing lines, when the cells have mixed height.
Combined cells come in 2 forms: wide characters as determined by the grapheme cluster width, and explicitly sized text specified by the kitty text-sizing protocol. Although wide characters represent an extreme subset of the text-sizing protocol, wide characters are more common and may deserve special optimizations.
To represent combined cells, it likely makes the most sense to store the associated text and graphics rendition in the top-left cell. This is useful because it lets us re-use several fields, and maps naturally when rendering cells. Every other "cell" linked to the primary cell can be ignored.
However, blank cells cannot be ignored when modifying terminal contents. In particular, erasing or overwriting a single cell which is part of a combined cell clears the whole mulit-cell. This gives us to ways of handling this operation: eager clearing or lazy clearing.
In this model, when we "clear" a cell which is part of a multi-cell, we need to find the all other joined cells and erase those too. This is somewhat inefficient and often redundant, as depending on the mutation, we're going to erase those other cells next (think something clearing the current line).
In this model, clearing a part of a multi-cell simply sets a flag on the shared metadata indicating the cell is no longer valid. When rendering, we can check this flag and treat the cell as blank if this flag is set. To properly handle background character erase, we'd need to store an associated background color on the multicell object. Additionally, we can longer share multi-cell metadata objects between "equivalent" multi-cells. This is especially bad when considering text consisting of all width 2 characters. We'd effectively need 1.5 as many objects as we would if we used the eager approach.
Given these constraints, it makes the most sense to lazily clear multicells with height > 1 and eagerly clear cells with height = 1. Almost all terminal mutations are implemented on a per-row basis, and so will be able to eagerly clear multicells based by checking the boundaries of the operation. For the common case of width=2, we'll still give the cell a multicell id, but it will have a fast path for lookup.
To determine which cell of a multicell is "primary", we'll need a 2 bit flags per cell. The primary cell will have both bits set, while the top-most and left-most cells will have only 1 of the bits set. Finding the primary cell can be done by iterating up and to the left looking for these flags.