This is an old revision of the document!


Plan transformations: task trees, code replacement, and serialization

Description: this tutorial is aimed at CRAM developers, and its purpose is to present an in-development aspect of CRAM; expect content to change. It contains an intro to task trees, a data structure used by CRAM functions (note: this has existed in CRAM for a while and will not change). It also describes code replacement (also long existing), failure handling through code replacement (new), and serialization of task trees and functions (new).

CRAM task trees

Even before semrec, CRAM offered a logging mechanism to store what functions were called, when, and with what parameters, a mechanism referred to as a task tree. Let's first have a look at a very simple task tree. Run a REPL, and run the following code to give us some cram functions to play with later:

(swank:operate-on-system-for-emacs "cram-language" (quote load-op))
(in-package :cpl-impl)
(def-cram-function foo () 
  (format t "Just a cram function.~%"))
(def-top-level-cram-function bar ()
  (format t "Just a toplevel cram function.~%")
  (foo))

Before calling the newly-defined top-level cram function however, try the following:

*top-level-task-trees*

The REPL will return a hash table which you can inspect; since you've just started the REPL and didn't run any CRAM functions, it should be empty.

Next, run the top-level function defined above, then try to get the value for the hash table again:

(bar)
*top-level-task-trees*

You'll notice there's an element in the hash table now, corresponding to the top-level function we just ran. There's also a more elegant way to get the task tree:

(get-top-level-task-tree 'bar)

Inspect the result the REPL returns and pay attention to the slots in the objects. Of particular importance will be the slots CODE, CODE-REPLACEMENTS, PATH, and CHILDREN. CODE-REPLACEMENTS is a list of objects of the same type as CODE, and will become important later when we try to replace code in the task tree. CHILDREN is a list of objects of type TASK-TREE-NODE. CODE contains an SEXP, a FUNCTION object, a TASK, and a list of PARAMETERS. PATH is a list of symbols.

Let's look at the structure generated for the very simple example above. It contains:

  • A TASK-TREE-NODE with NIL path and NIL code, but one child …
  • which is a node with path ( ('TOP-LEVEL 'BAR) ), non-nil code where the function object is the code defined for the bar function, and with one child …
  • which is a node with path ( ('FOO) ('TOP-LEVEL 'BAR) ), non-nil code where the function object is the code defined for the foo function, and no children.

Notice that, in order to appear in the task tree, a function must be defined as a def-cram-function or def-top-level-cram-function. format doesn't appear in the task tree, and neither would any lisp function defined without def-[top-level-]cram-function.

Let's now also try the following, to observe another aspect of task tree generation:

(def-top-level-cram-function bar ()
  (format t "Just a toplevel cram function.~%")
  (foo)
  (foo))
(bar)
(get-top-level-task-tree 'bar)

and inspect the result. Notice that the tree structure has changed: the ( ('TOP-LEVEL 'BAR) ) node now contains two children, one with path ( ('FOO) ('TOP-LEVEL 'BAR) ) and the other with ( ('FOO :CALL 2) ('TOP-LEVEL 'BAR) ).

For some more intuition about task tree construction, here's another example you should try:

(def-cram-function eek ()
  (format t "A very low level function.~%"))
(def-cram-function foo ()
  (format t "Just a cram function.~%")
  (eek))
(bar)
(get-top-level-task-tree 'bar)

and if you inspect the result you should notice yet another change in the task tree structure. Each of the top-level node's children now also has a child, and they are, respectively, ( ('EEK) ('FOO) ('TOP-LEVEL 'BAR) ) and ( ('EEK) ('FOO :CALL 2) ('TOP-LEVEL 'BAR) ).

Here are the “laws” of task trees we encountered so far:

  • only cram-functions (including the top-level cram function) appear in the task tree. (The format function didn't appear in the task tree)
  • paths are lists of symbols which are the function names, the 'TOP-LEVEL symbol, and key-value :CALL if a function was called several times from the same location.
  • paths refer to locations during execution history, not source code. (The nodes for EEK are children of the different FOO nodes, rather than EEK appearing as a child in a single FOO node)

Code replacement and plan running

The task tree isn't just a mechanism for logging however. Though the examples above don't show it, the task tree helps define what is actually run when you invoke a cram function, and we will look at this now in the next example:

(defun box-off ()
  (format t "Box is in the off state.~%"))
(def-cram-function box-state ()
  (format t "Box is in the on-state, turning off ... ~%")
  (replace-task-code '(box-off) #'box-off '((BOX-STATE) (TOP-LEVEL SELF-OFF-TURNING-BOX))))
(def-top-level-cram-function self-off-turning-box ()
  (box-state))

Load the above code into REPL and run the top-level function. The response should look something like this:

Box is in the on-state, turning off ... 
(#S(CODE
    :SEXP (BOX-OFF)
    :FUNCTION #<FUNCTION BOX-OFF>
    :TASK NIL
    :PARAMETERS NIL))

As you can see, the message in the box-state function is printed out, which isn't surprising, and notice that a CODE object is returned by replace-task-code. Let's first inspect the task tree for self-off-turning-box however. As before, use

(get-top-level-task-tree 'self-off-turning-box)

and inspect the result. Navigate to the ( (BOX-STATE) (TOP-LEVEL SELF-OFF-TURNING-BOX) ) node; you will see there is an element in the CODE-REPLACEMENTS list now, containing a function object that points to box-off. So let's try running the top-level function again and see what happens …

Box is in the off state.
NIL

Observe that the call to box-state has now been replaced with a call to box-off. In general, and this is another “law” of task trees, if a task tree node exists and has code replacements, then the car of those code replacements is run when the node is reached during execution history.

The last part of the law is very important, as the next example shows:

(def-cram-function box-state ()
  (format t "Box is in the on-state, turning off ... ~%")
  (replace-task-code '(box-off) #'box-off '((BOX-OFF) (BOX-STATE) (TOP-LEVEL SELF-OFF-TURNING-BOX))))

First, inspect the variable *top-level-task-trees* and clear the entry associated with self-off-turning-box. Alternatively, run

(setf (gethash 'self-off-turning-box *top-level-task-trees*) nil)

Next, run self-off-turning-box again and inspect its corresponding task tree that you get with get-top-level-task-tree. Notice that a ( (BOX-OFF) (BOX-STATE) (TOP-LEVEL SELF-OFF-TURNING-BOX) ) node has been created, and it has the code replacement specified. However, if you run self-off-turning-box again … nothing different seems to have happened. The CODE-REPLACEMENTS list grows longer (and you will see it as a return value) but the box-off function is never actually called. This is because there is no call to a cram function called box-off inside the cram function box-state, therefore there is no opportunity for the system to ask whether there is code defined for such an eventuality.

This isn't to say that code replacements cannot change the structure of a task tree in ways that will also impact execution. You can add meaningful (executable) nodes to the tree, you just have to make sure that there is a call to the highest node in the hierarchy you want glued to the task tree. Here's an example of that:

(def-cram-function part-a ()
  (format t "You should never see this.~%"))
(def-cram-function part-b ()
  (format t "You should never see this either.~%"))
(defun part-a-true ()
  (format t "The part A you will see.~%"))
(defun part-b-true ()
  (format t "The part B you will see.~%"))
(defun double-call ()
  (part-a)
  (part-b))
(def-cram-function simple-node ()
  (format t "Just a place-holder that you won't see."))
(def-top-level-cram-function anticipator ()
  (replace-task-code '(double-call) #'double-call '((SIMPLE-NODE) (TOP-LEVEL ANTICIPATOR)))
  (replace-task-code '(part-a-true) #'part-a-true '((PART-A) (SIMPLE-NODE) (TOP-LEVEL ANTICIPATOR)))
  (replace-task-code '(part-b-true) #'part-b-true '((PART-B) (SIMPLE-NODE) (TOP-LEVEL ANTICIPATOR)))
  (simple-node))

If you run the anticipator top-level function, you'll notice that the part-a and part-b functions (as well as simple-node) never actually ran. Instead, the code replacements caused certain nodes to be created and were invoked when execution reached certain points.

The above examples were for parameter-less functions. Parameters, however, are not changed via code-replacement. Functions are called with the parameters that actually exist at execution time; the PARAMETERS slot in the task tree node's CODE or CODE-REPLACEMENTS is for logging purposes only.

We can therefore state some additional “laws” of task trees:

  • when a top-level cram function is called, it checks whether a top-level task tree associated with it exists. If not, it creates one and sets the current execution path to ( (TOP-LEVEL <function name>) )
  • when a cram function is called, it sets the current execution path to (append (list <function name>) <current execution path>) then checks whether a task tree node exists at the current execution path.
    • If no node exists, the function creates it, then runs the code defined in the function's source code.
    • If a node exists, but it has a valid task object (that might point to an active or finished task), then the cram function modifies the current path by adding a key-value :CALL (+ 1 <max key-value so far at this location, default 1>), creating a node there, and running its source code.
    • If a node exists, but it has a nil task object, then the cram function will run either the car of CODE-REPLACEMENTS if it is not nil, or CODE otherwise.
  • When a task tree node is run, the value of PARAMETERS in either CODE or (car CODE-REPLACEMENTS) is set to the value of the parameters at the moment of the function call.

In the examples so far we've used replace-task-code, a function defined in cram-language that takes an s-expression, a function, a path (and, optionally, a task tree; default value is the current task tree). All of the above features have existed in CRAM for a long time and are not the subject of recent pull requests.

Failure handling through code replacement: a simple example

Now that we've looked at the basics of task trees and replace-task-code, it's time to see a simple possible application of this capability– We will try to replace code as part of a failure handling routine:

(in-package :cpl-impl)
(defvar *stop-infinity* 0)
(setf *stop-infinity* 0)
 
(remove-top-level-task-tree 'ptr-fail-handle)
 
(def-cram-function failure-causing () 
  (if (> 1 *stop-infinity*) 
      (progn
        (setf *stop-infinity* (+ 1 *stop-infinity*))
        (format T "FAILURE-CAUSING: Will attempt to send a fail signal from ~A.~%" *current-path*) 
        (fail 'plan-failure)))
      (format T "FAILURE-CAUSING: Why are we back here?~%"))
 
(def-cram-function some-function ()
  (failure-causing))
 
(defun plan-repaired () 
  (format T "PLAN-REPAIRED: Hi there. This was inserted via code replacement.~%"))
 
(def-top-level-cram-function ptr-fail-handle () 
  (with-failure-handling
    ((plan-failure (f)
       (let ((code-path (plan-failure/get-code-path f)))
         (format T "Code path of failure: ~A.~% Current path: ~A.~%" code-path *current-path*)
         (replace-task-code '(plan-repaired) #'plan-repaired code-path)
         (retry))))
    (progn 
      (some-function))))

In other words, we're defining a top-level function that, if a failure happens, will replace one of the cram-functions it calls; we'll see soon enough why for the sake of example we added the check to *stop-infinity*. Let's run the top-level function. The response you see should be something like this:

FAILURE-CAUSING: Will attempt to send a fail signal from ((FAILURE-CAUSING) (SOME-FUNCTION)
                                         (TOP-LEVEL
                                           PTR-FAIL-HANDLE)).
Code path of failure: ((FAILURE-CAUSING) (SOME-FUNCTION)
                       (TOP-LEVEL
                         PTR-FAIL-HANDLE)).
 Current path: ((SOME-FUNCTION)
                (TOP-LEVEL
                  PTR-FAIL-HANDLE)).
FAILURE-CAUSING: Why are we back here?
NIL

As you can see, it's only the FAILURE-CAUSING function that gets called, not PLAN-REPAIRED, which we inserted via code-replacement. What happened? If you inspect the task tree associated with PTR-FAIL-HANDLE, you'll notice that there are 2 children for the (TOP-LEVEL PTR-FAIL-HANDLE) node, ( (SOME-FUNCTION) …) and ( (SOME-FUNCTION : CALL 2) …). This is because the retry does a non-local exit and returns to the beginning of the &BODY in the WITH-FAILURE-HANDLING clause, where it encounters a call to SOME-FUNCTION, and it sees that there is still a task object associated with ( (SOME-FUNCTION) …). Therefore (see the “laws” above) it creates a new sibling node in which the code replacement has no effect, and so it eventually calls FAILURE-CAUSING again (which this time simply terminates, to avoid an infinite loop in the example). However if you run ptr-fail-handle again the answer you see looks like this:

PLAN-REPAIRED: Hi there. This was inserted via code replacement.
NIL

(You'll get the same answer if you (setf *stop-infinity* 0) and then run ptr-fail-handle.)

This is because after a top-level function completes, all task objects in the task tree are set to nil. Therefore, when we rerun the plan and reach the call to some-function again, it's ( (SOME-FUNCTION) …) that is the current path and we have the code replacement set up here.

WITH-FAILURE-HANDLING has been intended to preserve and log all attempts in the task tree. However, in case you want your plan to actually change during runtime, rather than having to wait until it ends and runs again, you will need a new CRAM macro, with-transformative-failure-handling. The code looks exactly as before, with just a change to the top-level:

(def-top-level-cram-function ptr-fail-handle () 
  (with-transformative-failure-handling
    ((plan-failure (f)
       (let ((code-path (plan-failure/get-code-path f)))
         (format T "Code path of failure: ~A.~% Current path: ~A.~%" code-path *current-path*)
         (replace-task-code '(plan-repaired) #'plan-repaired code-path)
         (retry))))
    (progn 
      (some-function))))

Let's run the new code (as well as reset some global variables to get a clean environment):

(remove-top-level-task-tree 'ptr-fail-handle)
(setf *stop-infinity* 0)
(ptr-fail-handle)

The answer should show that the top-level calls the failure causing function once, then changes itself, then will call the replacement function immediately thereafter:

FAILURE-CAUSING: Will attempt to send a fail signal from ((FAILURE-CAUSING) (SOME-FUNCTION)
                                         (TOP-LEVEL
                                           PTR-FAIL-HANDLE)).
Code path of failure: ((FAILURE-CAUSING) (SOME-FUNCTION)
                       (TOP-LEVEL
                         PTR-FAIL-HANDLE)).
 Current path: ((SOME-FUNCTION)
                (TOP-LEVEL
                  PTR-FAIL-HANDLE)).
PLAN-REPAIRED: Hi there. This was inserted via code replacement.
NIL

If you inspect the task tree for ptr-fail-handle now, you will see there is no :CALL 2 node. Unlike with-failure-handling, with-transformative-failure-handling will only store the last attempt.

Task tree serialization