Typed Inform

Introduction

Frequently Asked Questions
A Short Example

New Features

The Struct Directive
Typed Globals and Properties
Struct-type Constant Expressions
Routine Headers
The Declare Directive
The new Operator
Manual Memory Management
Structs With Destructors

Technical Details

Virtual Stack Frames
Routine Stubs
Reference Counting
Type Routines

Introduction

Typed Inform is an extension of Inform 6.31 with the goal of introducing a general, straightforward way to manipulate values larger than a single virtual machine word.

For more information, contact vaporware on ifMUD, or follow the email snake:

j  ansp  com
m  h  r  .
c  @  e  e
grew  stig

Frequently Asked Questions

Why would I want structs when I can already use objects?

Objects aren't suitable for some uses: there's a fixed number of them, and they're always referred to by their numbers—i.e. by reference. Structs can be passed by value, and they can be used as local variables.

Suppose you want to do some 32-bit math (on the Z-machine): each of those numbers is going to be made up of two 16-bit words. If you used an object for each number, the overhead would be huge.

OK, so I'd use arrays instead.

That's only slightly better than using objects, because either way you still only have a fixed number of long-integer variables. You need to define a separate array for each term in an expression. Those arrays are shared among all routines, so you have to be careful with reentrancy, and any changes you make will be visible everywhere.

OK, fine, but I still wouldn't need structs. I can just use two local variables for each value.

Remember, the Z-machine imposes a limit on the number of local variables and an even tighter limit on the number of parameters to a routine. If you use two locals for each value, a routine can only accept three and a half values as parameters, and it only has enough local storage space for seven and a half values. (Oh, and how are you going to return one of those two-word values?)

I can return more than one word at a time with this?

You sure can. By the way, with Typed Inform you can also have as many local variables as you want, even if they're just regular words (up to the maximum size of the routine frame, which is configurable).

Doesn't this use up a lot of space in the Z-machine's 64K of "RAM"?

Only enough to store two stack frames, and the size of those frames is a configurable, fixed number of words.

Wait a minute. Don't I need enough Z-machine RAM to store the local variables of the currently executing routine, plus any routines it calls, and any routines those routines call, etc.? I thought that's how a stack works.

That's the old kind of stack, baby. With Typed Inform, "stack" means "free memory you don't have to worry about". (See Virtual Stack Frames below for details.)

Uh, whatever. Now what's this about a "new" operator? That sounds like heap allocation. You're not crazy enough to build malloc() into Inform, are you?

Yes.

A Short Example

struct Point {
    int x, y;
};

struct Rect {
    Point topLeft, bottomRight;
};

[ Constrain:
  in/out Point pt,
  in Rect bounds;
  
  if (pt.x < bounds.topLeft.x)
      pt.x = bounds.topLeft.x;
  else if (pt.x > bounds.bottomRight.x)
      pt.x = bounds.bottomRight.x;
  
  if (pt.y < bounds.topLeft.y)
      pt.y = bounds.topLeft.y;
  else if (pt.y > bounds.bottomRight.y)
      pt.y = bounds.bottomRight.y;
];

[ MakePoint:
  in int x,
  in int y,
  local Point pt,
  return Point;
  
  pt.x = x;
  pt.y = y;
  return pt;
];

[ Main:
  local Rect bounds,
  local Point pt;
  
  bounds.topLeft = MakePoint(0, 0);
  bounds.bottomRight = MakePoint(10, 10);
  pt = MakePoint(random(20), random(20));
  Constrain(pt, bounds);
];

New Features

The `Struct` Directive

! Define a new type:
struct struct_name {
    ! One or more members can be declared at once:
    type member_name;
    type member_name, member_name;
};

In these definitions (and those below), the type of each member can be any of the following:

int or object, built-in types the size of a single word,
struct *, an untyped pointer,
the name of any previously defined struct (optionally followed by an asterisk, which makes the member a pointer to that struct instead),
or the name of the struct currently being defined, which must be followed by an asterisk—a type can't contain itself, but can contain pointers to other values of the same type.

Note that int * and object * are not valid types.

As in C, when more than one member of a type is defined at the same time, the asterisk meaning "pointer" only attaches to the member name immediately following it. That is, "Point *p, q;" defines p as a pointer-to-Point and q as an actual Point; "Point *p, *q;" defines them both as pointers-to-Point.

Typed Globals and Properties

Global var_name : struct_name;
Global pointer_var_name : struct_name *;

Global variables of struct or pointer types may be declared by writing the type after a colon. Pointer variables count toward $MAX_GLOBAL_VARIABLES; struct variables count toward $MAX_ARRAYS. Struct variables may be initialized with a struct constant expression (see below).

Property prop_name : type;

Properties containing structs or pointers may be declared similarly. The structs, however, must be small enough to fit into a property: no more than 4 words for V3, 32 words for V4+, or 32,768 words for Glulx. Default values may not be specified for typed properties.
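The size limits above amount to a simple lookup. The following Python sketch is only an illustration of that check, not part of the compiler; the function name and the target labels used as dictionary keys are hypothetical.

```python
# Maximum struct size, in words, that fits in a property.
# The figures are the ones quoted in the text; the key names are made up.
PROPERTY_LIMIT_WORDS = {"V3": 4, "V4+": 32, "Glulx": 32768}


def fits_in_property(struct_words: int, target: str) -> bool:
    """Return True if a struct of struct_words words fits in a property."""
    return struct_words <= PROPERTY_LIMIT_WORDS[target]
```

Under these figures, a four-word struct such as a Rect made of two Points would just fit in a V3 property, while anything larger would need V4 or later.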

Struct-type Constant Expressions

! as part of an expression:
mypoint = Point-->(3, 5);

! the "Point-->" part may be omitted when
! initializing a global variable or property:
Global origin : Point = (0, 0);

Property location : Point;
Object foo with location (100, 100);

A constant struct expression may be written by following a type name with a long arrow and a parenthesized list of values, one value for each word in the type. These expressions may not be nested: even if a struct is made up of smaller "sub-structs", the constant expression is written as if the members of those sub-structs were actually part of the larger struct, as in the following example.

struct Point {
  int x, y;
};

struct Rect {
  Point topLeft;
  Point bottomRight;
};

! to initialize topLeft to (0, 0) and bottomRight to (100, 100):
Global my_rect : Rect = (0, 0, 100, 100);
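To make the flattening rule concrete, here is a small Python model (not Typed Inform code) that lists the words of a struct and pairs them with the values of a constant expression. It assumes that members are stored in declaration order, which the example above suggests but the text does not state outright; the STRUCTS table and both function names are hypothetical.

```python
# Struct definitions from the example above, as (member, type) pairs.
STRUCTS = {
    "Point": [("x", "int"), ("y", "int")],
    "Rect": [("topLeft", "Point"), ("bottomRight", "Point")],
}


def flat_layout(type_name, prefix=""):
    """Return the dotted name of every word in type_name, in storage order."""
    if type_name not in STRUCTS:
        # int, object, or a pointer: a single word
        return [prefix.rstrip(".")]
    words = []
    for member, member_type in STRUCTS[type_name]:
        words += flat_layout(member_type, prefix + member + ".")
    return words


def bind_constant(type_name, values):
    """Pair each value of a flat constant expression with the word it fills."""
    names = flat_layout(type_name)
    if len(names) != len(values):
        raise ValueError("expected one value per word in " + type_name)
    return dict(zip(names, values))
```

In this model, bind_constant("Rect", (0, 0, 100, 100)) assigns the first two values to topLeft and the last two to bottomRight, matching the comment in the example.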

Routine Headers

! Old style routine header
[ routine_name local1 local2;

This still works but it isn't very exciting.

! New style routine header
[ routine_name:
  direction type var_name,
  direction type var_name,
  return type;

Here, only one variable can be declared per entry; entries are separated by commas, and the header is terminated by a semicolon. The type can be any of the types allowed for struct members above. The optional direction indicates how each variable is passed into or out of the routine, as follows; if it is omitted, the default is local:

in: The variable is a parameter passed in ("by value"). Any changes made to it from within the routine will be purely local, invisible to the calling routine.
out: The variable is a parameter passed out—essentially an extra return value. An initial value will not be passed in from the caller, but whatever value the variable holds when the routine exits will be made available to the caller.
in/out: The variable is a parameter passed in and out ("by reference"). An initial value will be passed in, and any changes will be passed back out to the caller when the routine exits.
local: The variable is not passed in or out. The initial value is zero (for single-word variables) or a structure full of zeros.

The return declaration sets the return type of the routine. When the return type is omitted, the routine is presumed to return a single word (int or object); when present, the routine may return a struct or reference-counted pointer.
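The parameter directions listed above behave like copy-in/copy-out. The Python sketch below models what the caller observes; it is not how the compiler implements calls. The function name is hypothetical, and starting "out" parameters at zero is an assumption, since the text only says that no initial value is passed in.

```python
import copy


def call_with_directions(body, directions, args):
    """Run body on a private frame and return the values the caller sees afterwards."""
    frame = []
    for direction, arg in zip(directions, args):
        if direction in ("in", "in/out"):
            frame.append(copy.deepcopy(arg))  # the initial value is copied in
        else:
            frame.append(0)  # "out": nothing is passed in (zero is an assumption)
    body(frame)
    # Only "out" and "in/out" values are copied back to the caller.
    return [
        frame[i] if direction in ("out", "in/out") else args[i]
        for i, direction in enumerate(directions)
    ]


def foo(frame):
    """Model of a routine with parameters: in a, in/out b, out c."""
    frame[2] = frame[1]  # c = b
    frame[1] = frame[0]  # b = a
    frame[0] = 999       # a change to an "in" parameter stays local
```

Calling call_with_directions(foo, ["in", "in/out", "out"], [1, 2, 3]) leaves the caller's first value untouched and hands back new values for the second and third.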

The `Declare` Directive

Declare Constrain : in/out Point pt, in Rect bounds;

The Declare directive establishes a call signature for a routine that will be defined later. This is necessary when calling a routine that uses struct parameters or out parameters, or returns a struct, but is defined further down in the source code than the point where it's called.

When the routine is eventually defined, the return type must match, and so must the type, order, and direction of every parameter. (Local variables, however, do not need to be mentioned in the Declare directive.)

The `new` Operator

[ Test:
  Point *pt;
  
  pt = new Point;
  pt->x = 123;
  pt->y = 456;
];

The new operator allocates a new block of memory, big enough to hold the specified type, and returns a reference-counted pointer to it.

Manual Memory Management

x = malloc(100);
...
mfree(x);

The malloc system function can be used to allocate memory manually, bypassing the reference-counting mechanism used by the new operator and allowing the size to be specified as an arbitrary number of bytes. The return value is the address of the new block, or 0 if a block that large couldn't be allocated.

Since reference counting is not used on these blocks, the memory must be recycled with mfree when it is no longer needed.

Structs With Destructors

struct DataWrapper {
  int datablock;
  destructor [;
    mfree(datablock);
  ];
};

A destructor routine may optionally be embedded in a struct definition. The destructor will be called when a struct value is about to be disposed of, either because its memory is being reclaimed (for reference-counted pointers) or because it is a local variable of a routine that is about to return. If the struct contains manually managed pointers, as in the example, it is a good idea to free them here.

Technical Details

Virtual Stack Frames

First, some definitions: the VM stack is a feature built into the Z-machine and Glulx VMs, by which data can be stored for temporary use within a routine. A routine is not allowed to pop data off of the VM stack that was put there by a previous routine, which makes the VM stack unsuitable for passing parameters between routines (as a C compiler would do). The only way to access the VM stack is by pushing or popping a word at a time; the VM stack is not part of RAM, and therefore is exempt from the 64K limit on Z-machine RAM.

A routine frame is a block of memory laid out by the Typed Inform compiler that stores all parameters being passed into or out of a routine, as well as the routine's local variables. (Not quite all of them, actually: the VM local variables are used when possible.)

Typed Inform uses two virtual stack frames in RAM to simulate a stack. One is the local frame, which is used by the currently executing routine; the other is the remote frame, which is used by any subroutines it calls.

Before calling a subroutine, we first set up the remote frame by copying any "in" parameter values into it. We then push the contents of our local frame onto the VM stack, copy the remote frame into the local frame, and call the subroutine. Now the subroutine runs with the local frame we set up for it, and it's free to use the same trick in turn if it needs to call any other routines. After the subroutine returns, we copy its local frame back into the remote frame, pop our local frame back off of the VM stack, and copy any "out" parameters from the remote frame to their final destinations.

By ensuring that the virtual stack frames are always removed from the VM stack in the opposite order from the way they were stored there, and that each virtual frame is popped by the same routine that pushed it, we're able to use the VM stack as free storage instead of keeping several stack frames in RAM. (It isn't "free" from your computer's perspective—you still need physical RAM or virtual memory in your computer to hold all this, of course—but it is free from the perspective of the Z-machine's limited address space.)
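The frame-switching sequence described above can be modeled in a few lines of Python. This is a simplified sketch rather than the actual veneer code: the Machine class, its field names, and the sum_to routine are all hypothetical, and the model ignores details such as the frame-size word.

```python
class Machine:
    """Two fixed-size frames in "RAM" plus an unbounded VM stack."""

    def __init__(self, frame_words=4):
        self.local = [0] * frame_words   # frame of the running routine
        self.remote = [0] * frame_words  # frame being prepared for a callee
        self.vm_stack = []               # saved frames; not counted as RAM
        self.max_depth = 0

    def call(self, routine):
        # The caller has already copied its "in" parameters into self.remote.
        self.vm_stack.append(list(self.local))  # push our local frame
        self.max_depth = max(self.max_depth, len(self.vm_stack))
        self.local = list(self.remote)          # remote frame becomes the callee's
        result = routine(self)
        self.remote = list(self.local)          # callee's frame back into remote
        self.local = self.vm_stack.pop()        # restore our own frame
        return result


def sum_to(m):
    """Recursive example: returns n + (n-1) + ... + 0, with n in frame word 0."""
    n = m.local[0]
    if n == 0:
        return 0
    m.remote[0] = n - 1           # set up the remote frame for the subroutine
    partial = m.call(sum_to)
    return m.local[0] + partial   # our frame is intact again after the call


def run_sum(n):
    m = Machine()
    m.remote[0] = n
    result = m.call(sum_to)
    return result, len(m.vm_stack), m.max_depth
```

However deep the recursion goes, the model only ever holds two frames outside the VM stack; the saved frames pile up in vm_stack and are popped in reverse order by the same call that pushed them.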

What's the catch? Well, the VM stack doesn't have a specified minimum size, so we can't be sure how much free storage we actually get. On the other hand, it doesn't have a specified maximum size either—interpreter authors are free to choose a big stack size, or make it a setting that players can increase as needed, or even make the stack grow automatically. (The recommended size to run Inform 7 games is at least 16K, and ideally 64K or more. If your interpreter doesn't support a stack that big, pester the author.)

The other catch is that we can't take the address of anything in the local frame and pass that address to another routine, because by the time the other routine starts executing, whatever was in the local frame will have been moved onto the VM stack and replaced with something else. However, since parameters can be passed both in and out, we don't need to pass addresses to subroutines; we can pass the actual data.

Routine Stubs

For each routine you define that needs a frame of its own, Typed Inform generates a stub routine that handles the details of setting up the frame, allowing callers to simply call the stub instead of inserting a big pile of code at each call site. The stub routine's address is substituted for the original routine's address wherever it's called. For example, consider this routine:

[ Foo:
  in int a,
  in/out int b,
  out int c;
  
  c = b;
  b = a;
];

The actual code generated looks something like this:

[ Foo a;
  ! [begin frame setup section]
  ! set our preferred frame size
  (#local_frame_start-->0) = 2;
  ! [end frame setup section]
  
  (#local_frame_start-->2) = (#local_frame_start-->1);
  (#local_frame_start-->1) = a;
  
  ! [begin frame teardown section]
  ! empty in this example
  ! [end frame teardown section]
];

[ Foo__stub a b c __retval;
  ! copy parameters in
  @copy_table b (#remote_frame_start+1*WORDSIZE) 2;
  ! set the initial frame size
  (#remote_frame_start-->0) = 1;
  ! perform the call using a veneer routine
  __retval = TI__Call(Foo, a);
  ! copy parameters out
  @copy_table (#remote_frame_start+1*WORDSIZE) b 2;
  @copy_table (#remote_frame_start+2*WORDSIZE) c 2;
  return __retval;
];

Notes:

The virtual-stack-frame-switching magic is in the veneer routine TI__Call, not the stub.
Because "a" is a single word and it isn't passed out of the routine, it's passed in a normal Z-machine parameter and stored in a normal Z-machine local, which reduces code size and frame size.
The size of the remote frame is only one word before calling Foo, because there's only one "in" parameter in the frame ("b"), but two words after the call because there are two "out" parameters.
For routines that return a struct, the stub takes an extra parameter at the beginning, telling it where to save the returned value.
The frame setup and teardown sections are actually assembled at the end of the routine, and accessed by jumps, because the frame layout isn't fully known until the routine is fully compiled: for example, an expression in the middle of the routine may require temporary storage space in the frame, or may even require adding a frame to a routine that wouldn't need one at all otherwise. return, rtrue, and rfalse statements are compiled into jumps to the top of the teardown code, so that local variable reference counts can be updated before the routine exits.

Reference Counting

Memory allocated with the new operator is managed by reference counting. As long as the pointer returned by that operator is always stored in a pointer-type variable or passed as a pointer-type parameter, the number of references to the memory will be counted as the pointer is copied around, and the memory will automatically be reclaimed when the number of references drops to zero. This will cascade to other references as well: if A is a structure containing the last pointer to B, then when A is reclaimed, B will also be reclaimed.

As with any reference counting system, this will fail to reclaim memory if the only remaining references form a circle. For example, if A contains a pointer to B, B contains a pointer to C, and C contains a pointer back to A, then each pointer will keep the other structures "alive", and the memory will not be reclaimed. To avoid this situation, break the circle by setting at least one of the pointers to zero.
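The cascade and the circular-reference problem can both be seen in a small Python model of a reference-counted heap. This is a generic illustration of the technique, not Typed Inform's allocator; the Heap class and its method names are hypothetical.

```python
class Heap:
    """Toy reference-counted heap; address 0 stands for a null pointer."""

    def __init__(self):
        self.blocks = {}  # address -> {"refs": count, "fields": {name: address}}
        self.next_addr = 1

    def new(self):
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = {"refs": 0, "fields": {}}
        return addr

    def retain(self, addr):
        if addr:
            self.blocks[addr]["refs"] += 1

    def release(self, addr):
        if not addr:
            return
        block = self.blocks[addr]
        block["refs"] -= 1
        if block["refs"] == 0:
            del self.blocks[addr]
            for child in block["fields"].values():
                self.release(child)  # cascade to whatever this block pointed at

    def store(self, owner, field, target):
        """Set owner.field = target, adjusting both reference counts."""
        old = self.blocks[owner]["fields"].get(field, 0)
        self.retain(target)
        self.blocks[owner]["fields"][field] = target
        self.release(old)


def leaked_after_dropping_roots(break_cycle):
    """Build A -> B -> C -> A, drop the local pointers, and count surviving blocks."""
    heap = Heap()
    a, b, c = heap.new(), heap.new(), heap.new()
    for root in (a, b, c):
        heap.retain(root)         # three local pointer variables
    heap.store(a, "next", b)
    heap.store(b, "next", c)
    heap.store(c, "next", a)
    if break_cycle:
        heap.store(c, "next", 0)  # break the circle before letting go
    for root in (a, b, c):
        heap.release(root)        # the locals go out of scope
    return len(heap.blocks)
```

With the circle left intact, all three blocks survive after the local pointers are gone; zeroing one pointer first lets the cascade reclaim every block.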

Type Routines

To facilitate reference counting when structures are involved, Typed Inform generates a "type routine" for each structure definition. The typeof system function returns the address of this routine—for example, typeof(Point)—which can be called like so:

typeroutine(0): Returns the size of a Point struct, in words.
typeroutine(1): Returns the string "Point".
typeroutine(2): Invokes Point's destructor for the structure at the address given by the variable "self". (No effect if there is no destructor.)
typeroutine(routine, param): Calls routine once for each member of the struct, with the parameters (param, type, offset, size, name), and finally returns the total size in words. type is 0 for ints, 1 for objects, 2 for any reference counted pointer, or a type routine address for structs. offset and size are given in words. size and name are omitted in version 3 games.
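As a rough model of the calling conventions listed above, the Python sketch below builds a "type routine" as a closure. It covers only the size, name, and member-walking calls; the destructor call is omitted. The make_type_routine helper, the constant names, and the member tables are hypothetical, and Python callables stand in for routine addresses.

```python
INT, OBJECT, POINTER = 0, 1, 2  # type codes as described above


def make_type_routine(name, members):
    """members is a list of (member_name, type_code_or_type_routine, size_in_words)."""
    total = sum(size for _, _, size in members)

    def typeroutine(selector, param=None):
        if callable(selector):
            # Walk the members: routine(param, type, offset, size, name)
            offset = 0
            for member_name, type_code, size in members:
                selector(param, type_code, offset, size, member_name)
                offset += size
            return total
        if selector == 0:
            return total  # size in words
        if selector == 1:
            return name   # type name
        raise ValueError("selector not modeled in this sketch")

    return typeroutine


point_type = make_type_routine("Point", [("x", INT, 1), ("y", INT, 1)])
rect_type = make_type_routine(
    "Rect", [("topLeft", point_type, 2), ("bottomRight", point_type, 2)]
)


def member_offsets(typeroutine):
    """Collect (name, offset, size) for each member using the walking call."""
    seen = []
    typeroutine(
        lambda param, type_code, offset, size, name: param.append((name, offset, size)),
        seen,
    )
    return seen
```

A reference-counting pass could use the same walking call to find members whose type code marks them as pointers, or to recurse into nested structs through their own type routines.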