So, using the CakeML compiler one can generate
from a source program (left) machine code (right)
and be sure that the behaviours of both are compatible.
🤝
$\text{machine_sem}\,(\text{compile}\,prog)\,\subseteq\\
\text{extend_with_resource_limit}(\text{source_sem}\,prog)$
What we mean by compatible in this setting is that all the behaviours
present in the semantics of the machine code are also present (left)
in the semantics of the source code (right)
Or in other words, the behaviours of the source code are a super
set of those of the machine code
This allows us to translate safety properties from one side to the other
However, there is a catch!
$\text{extend_with_resource_limit}(\text{source_sem}\,prog)$
$\text{source_sem}\,prog\,\cup$ 💥
You might have noticed that the source semantics is enclosed in this
extend_with_resource_limit
function
What this does is that it takes all the behaviours of the source semantics
and extends them with running out of memory behaviours
We need this because the source semantics cannot run out of memory
🤝
$\text{machine_sem}\,(\text{compile}\,prog)\subseteq\\
\text{extend_with_resource_limit}(\text{source_sem}\,prog)$
So we would like to replace the subset relation with an equality!
👍
$\text{machine_sem}\,(\text{compile}\,prog)=\text{source_sem}\,prog$
This would allow us to translate not only safety but also liveness
properties from one side to the other
✅ 👉 👍
$\bbox[background-color: #f19a3e]{\text{is_safe}\,prog} \implies \\
\text{machine_sem}\,(\text{compile}\,prog)=
\text{source_sem}\,prog$
The approach we took was to devise a "safety" predicate over the source program
which ensures that, when compiled, the resulting machine code will not run out of memory
This eliminates the need for extend_with_resource_limit
Objectives
Make space reasoning possible in CakeML
Produce concrete and tight space bounds
Enable transportation of liveness properties from source to machine code
We set out to do this with the following objectives:
Contributions
A formal space cost semantics for CakeML
Proof of soundness w.r.t. a compiler that relies on (verified) garbage collection
Proofs of concrete and tight bounds for a number of examples (non-terminating, higher-order, and bignum)
This paper's contributions are:
Why do programs run out of memory?
Runs out of heap
Runs out of stack
Object exceeds representation limits
So why do programs run out of memory anyway?
It can run out of heap
This is when a single object, or a set of objects, grows so large that it
exhausts the available heap space
Alternatively, it can run out of stack
Here, a large number of yet-to-return nested functions fill the call-stack completely
And finally, and perhaps more exotically, a program can fail when an object exceeds its representation limit
For example, if an array representation uses, say, 3 bits in its header to keep track of its length,
one can exceed this limit simply by concatenating two arrays of length 6.
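To make the arithmetic concrete, here is a small Python sketch of that hypothetical 3-bit length field (the constant names and the check itself are ours, purely for illustration): 3 bits can represent lengths 0 through 7, so concatenating two arrays of length 6 produces a length of 12, which no longer fits.

```python
HEADER_LENGTH_BITS = 3  # hypothetical width of the length field in the header
MAX_LENGTH = 2 ** HEADER_LENGTH_BITS - 1  # lengths 0..7 are representable

def concat_length(a_len: int, b_len: int) -> int:
    """Length of a concatenation, checking the representation limit."""
    result = a_len + b_len
    if result > MAX_LENGTH:
        raise OverflowError(
            f"length {result} does not fit in {HEADER_LENGTH_BITS} bits")
    return result

# Two arrays of length 6: 6 + 6 = 12 > 7, so the concatenation
# exceeds the representation limit even though memory is plentiful.
```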
So where do out-of-memory errors appear in CakeML?
the structure of the CakeML compiler, unsurprisingly,
looks like a cake, in the sense that it has many layers
Each layer corresponds to a compilation phase or an optimization
This approach keeps things modular for both proofs and implementation
High level (Tractable)
??
Low level (Precise)
The higher a language sits in the stack, the more tractable and high-level it is,
while lower levels are more concrete and closer to machine code
CakeML
flatLang
closLang
BVL
BVI
dataLang
wordLang
stackLang
labLang
Machine code
When it comes to the representation of memory, it is worth noting that...
CakeML
flatLang
closLang
BVL
BVI
dataLang
wordLang
stackLang
labLang
Machine code
The first 6 languages in the stack have semantics that lack any form of out-of-memory behaviours
Or, in other words, their memory models have unlimited space
CakeML
=
flatLang
=
closLang
=
BVL
=
BVI
=
dataLang
$\subseteq$
wordLang
stackLang
labLang
Machine code
So the relationship between the behaviours of these languages is an equality
It is only at the point where memory becomes more concrete that the subset relation is introduced
CakeML
=
flatLang
=
closLang
=
BVL
=
BVI
=
dataLang
wordLang
=
stackLang
=
labLang
=
Machine code
It is worth noting that all the languages at the bottom also share an equality relation
So, when we considered where to do our space reasoning the first obvious candidate was...
Source language
High level
Complex cost-semantics
Loose approximation
CakeML
It is the source language after all, and is what you use to write your programs
It is high level, so one does not have to deal with allocation or pointers as much
However, the initially great upsides very quickly turn into downsides
Extending the source semantics to accommodate space reasoning at such a
high level in the language stack requires either:
A complex space semantics that contains basically all compiler phases and optimizations within it
Or completely ignoring all of that and providing a very loose upper bound
Neither of those alternatives met our objectives, so we went back to the drawing board
CakeML
=
flatLang
=
closLang
=
BVL
=
BVI
=
dataLang
$\subseteq$
wordLang
stackLang
labLang
Machine code
We noted that among all the "high level" languages, dataLang
had a number of interesting features
Imperative
Abstract values
Stateful storage
An explicit call-stack
(unlike languages above)
It is imperative, so it is easy to step through each instruction and see what effect it had on memory
It provides abstract values, meaning not everything is represented as raw machine words and there is a bit of structure to values
It has stateful storage, so actual variables instead of bindings
And finally, it has an explicit call-stack, unlike all languages above it
So pretty much everything we need to be able to reason about running out of memory is here
in a concrete enough way, a stack, the heap, variables and values.
Additionally, it gets bonus points for being at a place in the language stack where most optimizations
and tricky compiler phases have already happened, simplifying our task greatly
FOLDL f e [] = e
FOLDL f e (x::xs) = FOLDL f (f e x) xs
foldl [0; 1; 2] = # FOLDL (0=l) (1=e) (2=f)
# LENGTH l = 0?
do 4 :≡ (TagLenEq 0 0,[0],NONE);
if_var 4 (return 1) # Nil case, return e
# Cons case
do 6 :≡ (ElemAt 0,[0],NONE); # head (x)
7 :≡ (ElemAt 1,[0],NONE); # tail (xs)
...
# f x e
call (18,⦕ 2; 7 ⦖) NONE [6; 1; 2; 15] NONE;
tailcall_foldl [7; 18; 2]
However, dataLang
is NOT source CakeML, and there
is of course a clear downside to writing your programs in one language and
having to reason about them in another.
But we have improved our infrastructure to facilitate this task
as much as possible.
Lets consider as an example the FOLDL
function,
implemented in the source language at the top
and compiled into dataLang at the bottom
Variables in dataLang are numeric, so to better show the similarity
we replace them here with their corresponding source names.
The relation between them is still one-to-one.
FOLDL f e [] = e
FOLDL f e (x::xs) = FOLDL f (f e x) xs
foldl [l; e; f] =
# LENGTH l = 0?
do isNil :≡ (TagLenEq 0 0,[l],NONE);
if_var isNil (return e) # Nil case, return e
# Cons case
do x :≡ (ElemAt 0,[l],NONE); # head (x)
xs :≡ (ElemAt 1,[l],NONE); # tail (xs)
...
# f x e
call (e1,⦕ f; xs ⦖) NONE [x; e; f] NONE;
tailcall_foldl [xs; e1; f]
So, the dataLang version of FOLDL
is displayed using a monadic representation, which improves readability
Additionally, function names are preserved all the way from source into dataLang, so it is easy to know what is what
For the code itself, the arguments are the same; they are just passed in reverse order
The first few operations make a case distinction over our list argument l
If l
is nil, the base value is returned, just like in the first pattern of the source function
Otherwise, we are in the cons case
Here, head and tail are obtained
Followed by a call to our function argument f
which generates the new base value
Finally, a tail-recursive call is performed with the tail, the new base value, and the original function f
We can see then, that the original structure of the FOLDL
function is somewhat preserved
And in our experience this extends to other functions as well
However, some understanding of dataLang syntax and semantics is of course required
On that note...
v = Number int
| Word64 word64
| Block ts tag (v list) -- ts = tag = num
| CodePtr num
| RefPtr num
Let's talk about dataLang values
Values in dataLang are represented by this data type
We have unbounded integers
Words
Blocks of contiguous values with a constructor tag and a uniqueness timestamp
Code pointers
Value pointers
[1,2,3]
Block 8 cons_tag [Number 1;
Block 7 cons_tag [Number 2;
Block 6 cons_tag [Number 3;
Block 0 nil_tag []
]
]
]
As an example lets take the list [1,2,3]
which is represented in dataLang as follows
The first block contains the head of the list (Number 1
) as its first value
Followed by the tail of the list, which is again a list and is represented with a block as well
The tail follows the same structure as before, with its first element (Number 2
) being the head of the tail
Followed again by the tail of the tail
You can see how the original structure of the list value is represented using blocks
It is also worth pointing out that each block represents a constructor which is identified by the tag value
and that each non-empty block has a unique timestamp
evaluate (prog,s) = (res,s')
state = <| locals : v num_map
; stack : stack list
; refs : v ref num_map
; global : num option
...
|>
The semantics of dataLang is defined using a functional big-step semantics
it takes a program and a state, and returns a result with an updated state
The state contains everything needed to execute a dataLang program; here we
display only the fields relevant to our memory model
locals
contains all variables in the scope of the current function
stack
is the call-stack
refs
is a map from pointers to values
and global
contains global values that can be accessed from everywhere
We can see from the state then, that even though dataLang's memory model is unbounded,
one can still concretely "see" both heap and stack
evaluate (prog,s) = (res,s')
state = <| locals : v num_map
; stack : stack list
; refs : v ref num_map
; global : num option
; limits : limits
; safe_for_space : bool
; stack_max : num option
...
|>
So, to keep track of memory consumption, we extended the state with a number of extra fields
limits
contains concrete bounds for heap and stack
safe_for_space
is initially true, and signals when
the program has surpassed one of the limits
stack_max
records the maximum stack size seen during execution
With this we have everything we need to measure and keep track of memory. How do we do it, then?
size_of_heap s
At every allocation
On heap consuming operations
size_of_stack s.stack
At every function call
On stack consuming operations
The main idea is to measure the size of both heap and stack at relevant points during execution
We measure heap at every allocation
and before every heap-consuming operation, for example, the addition of two bignums, which might
require allocating some extra space for the result
The size of the stack is measured at every function call
and before each stack-consuming operation; these are operations
that are implemented in lower languages either recursively or as
multiple calls to other functions
By doing this we can make sure space consumption remains within the limits, or,
in case it doesn't, set safe_for_space
to false
size_of_stack s.stack
let new_stack = MAX s.stack_max
(size_of_stack s.stack)
in
s with <| safe_for_space :=
s.safe_for_space ∧
new_stack < s.limits.stack_limit;
stack_max := new_stack |>
This is the measurement of stack size that is performed after a function call
First, we measure the current size of the stack and take the maximum of this measurement and the largest previous one
We then check whether this new_stack
is still within the limits, and conjoin the result with the current value of safe_for_space
to update it accordingly
Finally we update stack_max
to reflect any changes
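As a rough Python analogue of the HOL record update above (a sketch only: the dict encoding of the state and the field names `stack_limit` and `stack_max` mirror the record shown, but the real definition lives in HOL4), the post-call stack check could look like:

```python
def measure_stack(s: dict, size_of_stack) -> dict:
    """Sketch of the post-call stack check: take the maximum of the
    current stack size and the largest previous measurement, then
    record whether it stays strictly under the stack limit."""
    new_stack = max(s["stack_max"], size_of_stack(s["stack"]))
    return {**s,
            "safe_for_space": s["safe_for_space"]
                              and new_stack < s["stack_limit"],
            "stack_max": new_stack}
```

Note that once `safe_for_space` goes false, the conjunction keeps it false for the rest of the run, just as in the semantics.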
size_of_heap s
let new_heap = size_of_heap s + space_consumed s op vs
in
s with <| safe_for_space :=
s.safe_for_space ∧
new_heap < s.limits.heap_limit |>
Conversely, this is the measurement of heap size before a heap consuming operation is performed
Again, we make a conjunction with the current value of safe_for_space
Then we measure the amount of heap performing the operation will require: that is, the size of
the current heap plus the space needed to perform the operation
space_consumed
is a characterization of the space needed to perform an operation in terms of its arguments
Finally, we check if our new_heap
is within the limits and update safe_for_space
accordingly
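The heap-side check can be sketched the same way (again an illustrative Python rendering of the HOL update above; `space_consumed` is passed in as a function because its real definition is a per-operation characterization):

```python
def measure_heap(s: dict, size_of_heap, space_consumed, op, vs) -> dict:
    """Sketch of the pre-operation heap check: the current heap size
    plus the space the operation needs must stay under the heap limit."""
    new_heap = size_of_heap(s) + space_consumed(s, op, vs)
    return {**s,
            "safe_for_space": s["safe_for_space"]
                              and new_heap < s["heap_limit"]}
```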
These extra measurements and checks are the only changes made to the semantics
No new behaviours were added to dataLang's semantics; in fact, if one ignores the new fields, the final result is the same as it
was before the changes
If safe_for_space
turns false at any point the semantics will continue evaluating the program.
However, the state of the final result will contain via safe_for_space
an indicator of whether or not our program has run out of memory
size_of vals refs seen
Where:
(vals
) is a list of v
values
(refs
) is a mapping from numbers to values
(seen
) timestamps we have already seen
We measure values using the function size_of
It takes as arguments
A list of values
A mapping from numbers to values; this is what pointers are pointing to
And a set of seen timestamps
size_of
traverses the list measuring the size of each value,
but it makes sure to avoid counting the same pointer multiple times, and
mitigates aliasing of blocks using timestamps
v = Number int
| Word64 word64
| Block ts tag (v list) -- ts = tag = num
| CodePtr num
| RefPtr num
Block 8 cons_tag [Number 1;
Block 7 cons_tag [Number 2;
Block 6 cons_tag [Number 3;
Block 0 nil_tag []
]
]
]
To see how this works, let's briefly recall how values are represented in dataLang
Specifically, how a block contains not only a list of
values and a constructor tag, but also a uniquely identifying timestamp
This timestamp is important since the semantics ensures that if two blocks have the same timestamp, their
contents must be the same, because they are the same value in memory
size_of [Block 3 some_tag [Number 1];
Block 3 some_tag [Number 1]]
refs seen
2 +
size_of [Block 3 some_tag [Number 1]]
refs ({3} ∪ seen)
2
In this example we are measuring the size of two blocks, we assume seen
is initially empty and refs
is the one currently in the state
The size of the first block is 2: one word for its tag, and one for its only element, the small number 1
We continue traversing the list, but the set of seen timestamps is updated with the timestamp we just saw
So when we reach the next block we check if the timestamp has already been seen, which is indeed the case
This means we don't need to measure this block since its size has already been accounted for
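The worked example above can be reproduced with a toy Python model of size_of (our own simplification: only Number and Block are modelled, so RefPtr, CodePtr, Word64, and the refs argument play no role here; a non-empty block costs one header word plus one word per element, and small numbers fit in their slot; the set of seen timestamps is threaded through and returned alongside the size):

```python
from dataclasses import dataclass

@dataclass
class Number:
    value: int

@dataclass
class Block:
    ts: int      # unique timestamp
    tag: int     # constructor tag
    vals: list   # element values

def size_of(vals, refs, seen):
    """Sum the sizes of vals, skipping blocks whose timestamp has
    already been seen so aliased blocks are counted only once.
    Returns (size, seen)."""
    size = 0
    for v in vals:
        if isinstance(v, Block) and v.vals and v.ts not in seen:
            seen = seen | {v.ts}
            inner, seen = size_of(v.vals, refs, seen)
            # one header word plus one word per element, plus the
            # space taken by any boxed elements
            size += 1 + len(v.vals) + inner
        # small Numbers (and empty blocks) add nothing extra
    return size, seen
```

Running it on two blocks that share timestamp 3 yields a total of 2, matching the example: the second block is skipped because its timestamp is already in the seen set.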
size_of_stack s.stack
To measure the size of the stack, we traverse it counting the size of all stack frames
size_of_heap s
size_of reachable_values s.refs {}
<|s.locals|> ++ <|s.stack|> ++ <|s.global|>
size_of_heap
is implemented as a call to size_of
with a list of all reachable values
which are obtained from traversing all values in the locals, the stack, and the global references
size_of_heap s =
size_of_heap (run_gc s)
A very nice side effect of this is that our measurement of heap size is idempotent w.r.t. garbage collection
is_safe s prog =
let (res,s') = evaluate (prog,s)
in s'.safe_for_space
Our safety predicate is then defined in terms of the evaluation of a dataLang program
we say that a source program is safe if the evaluation of its dataLang representation
is safe_for_space
$\text{is_safe}\,\text{init_s}\,(\text{to_data}\,prog) \implies \\
\text{machine_sem}\,(\text{compile}\,prog)=\text{source_sem}\,prog$
(See paper for actual statement.)
We have proven that if a program is_safe
no out-of-memory behaviours can occur in the machine code
Which implies that the relation between source and machine code can be proven to be an equality
All proofs were formalized using the HOL4 theorem prover in the context of the CakeML project
Conclusion
We use dataLang
as a space cost semantics for CakeML
Space cost is precisely measured by size_of
Timestamps are used to mitigate aliasing
By only counting reachable objects our measurement is idempotent over GC passes
?