Relational Cell Morphing

for Array-Manipulating Programs

Marco Gaboardi (Boston University) · gaboardi@bu.edu PRESENTER

Eric Koskinen (Stevens Institute of Technology) · ekoskine@stevens.edu

Hiroshi Unno (Tohoku University) · hiroshi.unno@acm.org

Weihao Qu (Monmouth University) · wqu@monmouth.edu

ARRAY 2026 Workshop

Background: relational properties involving arrays

Relational verification compares two executions. Four relational properties, each with its program and formal claim:

1 / 4 · monotonicityStart with two runs ordered pointwise; the result should preserve that order.

1. monotonicity 2. robustness 3. non-interference 4. relative cost

Monotonicity

arrayMax(A, n):  m := A[0]  for i = 1 to n-1:    if A[i] > m: m := A[i]  return m Relational property:  if A1[i] ≤ A2[i] for all i,  then  max(A1) ≤ max(A2)

Robustness / sensitivity

clampPositive(A, n):  for i = 0 to n-1:    B[i] := max(A[i], 0)  return B Relational property:  if A1, A2 differ at one index,  then  Hamming(B1, B2) ≤ 1

Non-interference

arrayDoubleOpNI(A, n):  out := 0  for i = 0 to n-1:    out := out + 2*pub(A[i])  return out Relational property:  if A1 =_L A2 (public part agrees),  then  out1 = out2

Relative cost analysis

countPositive(A, n):  cost := 0  for i = 0 to n-1:    if A[i] > 0: cost++ Relational property:  if A1, A2 share ≥ d_0 equal positions,  then  cost1 − cost2 ≤ n − d_0

Talk goal: prove these array-relational claims automatically, with no array-theory reasoning in the solver query.

1 / 4

Level 1 cost analysis: valid but coarse

CountPositive: cost(A) = #{ i : A[i] > 0 }. Relational cost compares two runs: c = cost(A₁) − cost(A₂).

1 / 4 · exampleThe real gap is one changed positive cell, but fixed-k L1 can only prove n − 1.

L1 is safe but coarse It sees one fixed-cell flag bk, but it cannot count the full equality set size d₀.

running example

countPositive(A,n): for i = 0..n-1:  if A[i] > 0: cost++A1 = [3, 5, 4, -2, 5]A2 = [3, -1, 4, -2, 5]

truthd₀ = 4, n − d₀ = 1

L1 boundn − 1 = 4

what L1 sees

sees bk

one equality flag at the fixed tracked cell k

cannot count d₀

how many array indices are equal across the two runs

fixed-k pfwCSP skeleton

State: Inv(i, n, k, ak1, ak2, bk, c)HIT equal:  i = k ∧ bk = 1  ⇒  c' = cMISS:       i ≠ k           ⇒  c' = c + 1Goal:       c ≤ n − 1  when bk = 1

safe fixed-k boundc ≤ n − 1

missed tight targetc ≤ n − d₀

PCSAT probesP sat,37N unsat,2V unsat,9Probe key: P positive bound · N negated goal · V vacuity check

Takeaway: fixed-k L1 charges equal cells it cannot track

Only cell k can take HIT. Every other position falls through MISS, so equal cells outside k still add +1.

1 / 5 · fixed kStart with fixed k = 0: only this tracked cell can use bk.

truth c0 L1 charge0 tight n − d₀1

HIT equal · +0 MISS cost-different · +1 MISS equal · over-charge

The trace will scan positions i = 0 → 4.

One rule creates the gap

fixed k = 0HIT equal:  i = k ∧ bk = 1  ⇒  c' = cMISS:       i ≠ k           ⇒  c' = c + 1

HIT equal i = k ∧ bk = 1 charge +0

MISS cost-different i ≠ k ∧ [A₁[i] > 0] ≠ [A₂[i] > 0] legitimate +1

MISS equal i ≠ k ∧ A₁[i] = A₂[i] over-charge +1

MISS over-charges at i = 2, 3, 4: the cells are equal, but fixed-k L1 does not represent those equalities.

Truth: c = 1 = n − d₀ L1 charge: 4 = n − 1 Gap: d₀ − 1 = 3

Takeaway: ARel's β is a set; CHC sees its size d₀ and one bit bk

ARel uses β as a mask of positions where the array relation holds [ARel, ICFP'19].

1 / 5 · β relation maskARel begins with β as the set of indices where the two arrays are related.

Bridge: β relation mask → prophecy scalar d₀ This CHC does not store β as a set; it carries d₀ = |β| and queries bk = β(k).

d / d₀0 / 4 c0 n − d₀1

What survives in CHC

ARel:      β = { i | A₁[i] = A₂[i] }example:   β = {0,2,3,4}prophecy:  d₀ = |β| = 4state:     (..., bk, d, d₀, c)fixed k:   bk = β(k) sees one cellnext:      refocus witnesses d₀ β-cells

Press Step to build the β mask, then summarize it into d₀.

β is the set. d₀ is its size. A fixed k sees only one bit, not the whole set.

From cell morphing to relational cell morphing

1 / 4 · one runStandard cell morphing tracks one arbitrary cell in one execution.

Standard cell morphing

one execution

( k, a_k )

Track one arbitrary index and replace the array with one scalar cell.

→

same index k
in both executions

Relational cell morphing

two executions

( k, a_k1, a_k2, b_k )

Track the paired cells and carry a relation abstraction between them.

new ingredient b_k abstracts the relation between a_k1 and a_k2 a_k1 = a_k2 a_k1 ≤ a_k2 |a_k1 - a_k2| ≤ 1

Click a column to explore the relational cell (here b_k tracks equality). Everything else is abstracted by ∀ clauses.

( k, ak1, ak2, bk )

Pick a cell to track.

One paired cell replaces two whole arrays, so array-relational reasoning becomes scalar CHC: no array theory, no quantifier over the arrays.

The scalar CHC template

For this talk, we present three levels. Every encoding has the same skeleton; only the state and step cases grow.

shared CHC skeleton

Init:    Inv( i, ...state... )Step:    Inv(s') :- Inv(s), case at the tracked cellQuery:   the property / bound holds at every terminal stateProbes:  main (sat) + negation (unsat) + vacuity (unsat)

L1fixed relational cell
k, a_k1, a_k2, b_k
HIT / MISS at the fixed cell k.
monotonicity · order · NI
L2search + prophecy
+ d, d0, c
PickK searches for related cells, then settles when prophecy is fulfilled.
relative cost c ≤ n - d0
L3async scheduler
+ TT / TF / FT
Scheduler aligns runs whose source loops advance at different paces.
stride-mismatched async

Same Init / Step / Query / Probes throughout. L1 starts with the relational cell; L2 adds counters and PickK; L3 adds the async scheduler.

Backend: pfwCSP + PCSat

The clause templates become pfwCSP queries, and every evaluated encoding in this talk is solved by PCSat.

Array programstwo runs, arrays, relational goal

→

Relational cell morphingscalar state and proof clauses

→

pfwCSP .clpCHC-style extension for PCSat

→

PCSatsynthesizes predicate interpretations

→

sat / unsatproof and probe evidence

Language

pfwCSP is predicate-family-wise ∀-∃ constrained Horn clauses: a CHC-style generalization expressive enough for our encodings.

ordinary predicates functional predicates well-founded predicates

Tool

PCSat [Unno, Terauchi, Gu, Koskinen, CAV'21] synthesizes the predicate interpretations and invariants needed to satisfy those queries.

sat in our proof tables means the proof query is satisfiable. For negation and vacuity probes, unsat rules out the corresponding bad or vacuous witness.

Future backend standard CHC output

The three levels: L1, L2, L3

Two orthogonal axes: how the cell is tracked (fixed → PickK) and how the two runs are scheduled (sync → async).

1 / 5 · two axesRead the levels as two axes: cell tracking and scheduling.

	L1 21	L2 22	L3 26
Cell tracking	fixed cell k	PickK + prophecy	PickK
Scheduling	synchronous	synchronous	merged async scheduler
New state	b_k	d, d0, c	TT / TF / FT
Proves	monotonicity, NI, sensitivity	relational cost c ≤ n − d0	stride-mismatched cost

Running example: CountPositive cost = #{ i : A[i] > 0 }. L1 and L2 share the same synchronous program and differ only in the encoding; L3 changes the loop structure.

Synchronous program (L1 & L2)

P1:  for i = 0 to n-1:  if A1[i]>0: c1++P2:  for i = 0 to n-1:  if A2[i]>0: c2++

L1

fixed cell k

c ≤ n − 1

one tracked relational cell for the whole proof

→

L2

PickK refocuses

c ≤ n − d0

searches candidate positions; d,d0,c make the same program prove the tight bound

Stride-mismatched program (L3)

P1:  for i = 0 to n-1:   if A1[i]>0: c1++   (stride 1)P2:  for i = 0,2,4,…:    if A2[i]>0: c2++   (stride 2)merged scheduler re-aligns the paces⇒  c ≤ n − d0   (async, still tight)

Total: 69 mains with Tier-3 status across L1, L2, L3 (21 + 22 + 26; locked 2026-06-02).

Level 1 pfwCSP encoding: arbitrary cell k

Level 1 tracks one arbitrary array index k. The proof obligation is parametric in k, so discharging the scalar clauses proves the relational obligation uniformly for every choice of k.

1

array relation ∀j. 0 ≤ A1[j] ≤ A2[j]

The assumed relation between the two runs' input arrays.

2

arbitrary cell 0 ≤ k < n; 0 ≤ A1[k] ≤ A2[k]

One distinguished index, left free; the proof is uniform in k.

3

scalar state Inv(i, k, ak1, ak2, m1, m2, n)

The relational cell plus running maxes. No arrays reach the solver query.

4

pfwCSP clauses Init / HIT / MISS / Query

The four clause shapes the next slide assembles one by one.

All four ingredients are scalar: no array theory reaches the solver. Next: assemble them into the Level 1 clauses, one component at a time.

The same template proves robustness and non-interference; only the cell relation b_k changes: |ak1−ak2| ≤ 1 (robustness), ak1 = ak2 (NI).

Level 1 components, one by one

Worked example: ArrayMax monotonicity. Step (→) to assemble the seven Level 1 components; each lights up the matching clause in the encoding.

// ArrayMax (nonnegative):  m := 0;  for i: if A[i] > m: m := A[i]// prove:     ∀i. 0 ≤ A1[i] ≤ A2[i]  ⇒  max(A1) ≤ max(A2) State: Inv(i, k, ak1, ak2, m1, m2, n)         ak1 = A1[k], ak2 = A2[k];  b_k is  0 ≤ ak1 ≤ ak2 Init:  Inv(0, k, ak1, ak2, 0, 0, n) :-         0 ≤ k < n,  0 ≤ ak1 ≤ ak2,  n > 0 HIT  (i=k):  m1' = max(m1, ak1),  m2' = max(m2, ak2)MISS (i≠k):  m1' ≥ m1,  m2' ≥ m2     // others abstracted       both branches keep   m1' ≤ m2'     (invariant) Query: m1 ≤ m2   at  i = n

Level 1 component ledger

0 / 7

1

Relational goal∀j. 0 ≤ A1[j] ≤ A2[j] ⇒ max(A1) ≤ max(A2)

Start from the relational property we want to prove between the two runs.

2

Arbitrary cell + scalar statek arbitrary; Inv(i,k,ak1,ak2,m1,m2,n)

Replace the arrays by one arbitrary tracked cell and scalar loop state.

3

Init clause0 ≤ k < n ∧ 0 ≤ ak1 ≤ ak2

Initialize the tracked-cell relation before the loop starts.

4

Transition case: HITi = k ⇒ use ak1, ak2 exactly

At the tracked index, update both executions with the concrete cell values.

5

Transition case: MISSi ≠ k ⇒ abstract other cells

Away from k, abstract the untracked cell values while keeping the scalar effects sound.

6

Invariant obligationHIT and MISS both preserve m1' ≤ m2'

The same invariant must survive both the exact and abstract transitions.

7

Query clausei = n ⇒ m1 ≤ m2

At loop exit, the invariant gives the desired relation between the maxima.

L2: Sync + PickK + Prophecy

1 / 5 · L2 goalL2 adds refocus plus prophecy: find d₀ β-positions, then prove c ≤ n − d₀.

SEARCH

SEARCH grows d

Fresh L2 trace: PickK refocuses kp, finding β positions until d = d₀.

PROPHECY

d = d₀ fulfills prophecy

Found the promised d₀ = |β| equal positions.

SETTLED + BOUND

c ≤ n − d₀ ends tight

SETTLED lets c grow worst case while preserving the target bound.

d / d₀
0 / 3
c
0
n − d₀
3

Press Step to scan positions.

Final: c = 3 ≤ n − d₀ = 3 is tight. PickK discovered the 3 equal positions in SEARCH; the 3 unequal positions each added ≤ 1 to c in SETTLED.

L2 — Property class coverage (22 examples)

Click a class to expand its sub-buckets.

Cost upper bound 16

Sensitivity (incl. subtypes) 6

Qualitative 0

Click a class to see its sub-buckets…

Locked corpus status: 59 FULL, 7 PARTIAL, 0 FAIL, 3 RETRY. Per-file details in report/TIER3_VERIFICATION_RECORDS.md.

L3 scheduler: TT / TF / FT, then SEARCH / SETTLE

1 / 5 · scheduler addedL3 adds a proof scheduler: it pairs source transitions without changing the source loops.

Who steps next?

TTat shared pointboth runs take their next source transition in the same proof step

TFi₁ < i₂advance P1 while P2 waits

FTi₂ < i₁advance P2 while P1 waits

Inside TT: two phases

SEARCHd < d₀

PickK refocuses kp := i₁ = i₂. Equal cells can increase d; unequal cells can increase c.

SETTLEd ≥ d₀

Prophecy threshold met. Keep kp = k; stop relying on further discovery; allow worst-case c.

stride 2v3 epoch: scheduler repairs pace mismatch

(0,0)TT / SEARCH (2,3)TF (4,3)FT (4,6)TF (6,6)TT / SETTLE

A TT starts at a shared comparison point, then both source loops move once. TF / FT catch up until the next joint proof step is available.

TF/FT repair pace mismatch. TT is where PickK searches for β positions, then settles once d ≥ d₀.

Scheduler adequacy: proof steps, not source steps

1 / 5 · reviewer concernThe concern: did L3 prove the program, or bake in a favorable schedule?

1

Source coverage

The modeled async-step cases are exhaustive: TT, TF, or FT. Each has a corresponding L3 transition case.

both stepTTP1 onlyTFP2 onlyFT

2

Lower bounds

During SEARCH, lower-bound clauses constrain proof traces to make progress toward promised β positions. They add invariant obligations, not source assumptions.

d < d₀ ⇒ require catch-up proof steps

3

Phase switch

Once d ≥ d₀, SETTLE stops relying on further discovery. It fixes kp = k and over-approximates cost gap c.

SEARCH → SETTLE

The scheduler guides the proof trace; it does not change the source trace.

Adequacy status: informal forward-simulation obligation. Evidence: transition-case audit + solver probes; formal forward-simulation proof in progress.

TT/SEARCH on CountPositive: aligned source cell -> CHC update

1 / 6 · source frameA TT/SEARCH step starts only when both executions are aligned.

Running source example
P1: if A1[i1] > 0: cost1++
P2: if A2[i2] > 0: cost2++
c = cost1 - cost2

Names used below

beta hitaligned cells match; record d' = d + 1

HIT-unequalaligned cells differ; record d' = d and use guards for c

Step 0: source frame

This TT/SEARCH step records

schedulerSchTT_search

tracked indexnext: kp = i1 = i2

branchsource frame

updatestep to inspect the first aligned cell

We run CountPositive on two executions. A TT/SEARCH proof step only applies when the runs are aligned, so the source position becomes the tracked cell.

In TT/SEARCH, the scheduler only gives us the aligned step. Equality decides beta progress; the source guards decide the cost-gap update.

Payoff: L3 verifies all source-state stride benchmarks

1 / 5 · encoding templateThe 2v3 encoding shows the whole L3 template in one concrete case.

2v3 example encoding lcm 6

        Inv(i1,i2,...,d,d0,c)
        TT/SEARCH: i1=i2, d<d0, kp=i1
          equal selected cells + bk=1 => d'=d+1, c'=c
          unequal cases adjust c by guards
          i1'=i1+2, i2'=i2+3
        TF: i1<i2, d<d0 => i1'=i1+2
        FT: i2<i1, d<d0 => i2'=i2+3
        SETTLE: d>=d0
        GOAL: 2*c + 2*d0 <= n + 1
      

forced SEARCH schedule 2v3

(0,0)TT/search (2,3)TF (4,3)FT (4,6)TF (6,6)

Bidirectional catch-up forces the next alignment event, so PickK can discover fresh beta positions until d = d0.

LCM = least common multiple of the two strides: the catch-up period before both loops align again.

Same source-state L3 template, larger LCM alignments: 2v5 and 3v4 are FULL here; their execost versions are RETRY on the next slide.

1v2lcm 2FULLsat,20

1v3lcm 3FULLsat,13

2v4lcm 4FULLsat,12

2v3lcm 6FULLsat,27

2v5lcm 10FULLsat,16

3v4lcm 12FULLsat,28

These chips are source-state results. Execost: 2v4 is FULL; 2v5/3v4 are RETRY (P omitted; N/V hold). The sat,N labels are locked PCSAT evidence entries, not runtimes or a speed comparison.

L3 — Both cost models for the stride family

Each stride pair is encoded under two cost models. Click a model to see what it counts.

Stride	Source-state (c = source cost1 − cost2)	V1 strict (c = per-construct trace cost diff)
1v2	sat,20 / unsat,2	sat,54 / unsat,2
1v3	sat,13 / unsat,2	sat,43 / unsat,2
2v3	sat,27 / unsat,3	sat,51 / unsat,3
2v4	sat,12 / unsat,2	sat,60 / unsat,2
2v5	sat,16 / unsat,1	deferred (template limitation)
3v4	sat,28 / unsat,3	deferred (template limitation)

4 of 6 stride pairs verified under BOTH cost models. The two deferred V1 cases are solver-template limitations, not counterexamples; the source-state versions are FULL.

Experimental results across 69 Tier-3 pfwCSP cases

59

FULL

required probes completed

7

PARTIAL

not all required probes completed or confirmed

3

RETRY

P probe omitted on retry list

0

FAIL

no expected probe contradicted

Status legend

P = positive bound · N = negated goal · V = vacuity check

FULLP sat + N unsat + V unsat; L2 also has V2 unsat.

PARTIALAt least one expected match is present, but not all required probes completed or confirmed.

RETRYP probe omitted because the main is on the retry list.

FAILAn expected probe is contradicted. Here: 0.

Corpus breakdown

By encoding level

L1 21L2 22L3 26

By property class

cost 41sensitivity 16qualitative 12

Most obligations fully verify; the remaining non-FULL cases are methodology-justified partials, not counterexamples.

Locked headline: 59 FULL / 7 PARTIAL / 0 FAIL / 3 RETRY.

Honest limitations

01

Solver/template incompleteness

Partial results mean PCSAT did not synthesize every positive proof; they do not imply invalid encodings.

7 PARTIAL; 3 RETRY

02

Corpus scope

The examples exercise mechanisms, not broad benchmark coverage over arbitrary array programs.

focused examples, not coverage

03

Evidence boundary

pfwCSP probes validate encoded obligations; source adequacy and rule soundness are separate proof obligations.

encoding evidence ≠ full metatheory

Roadmap: logic, automation, benchmarks

Logic

Finish the companion proof story

Complete the Level-3 awhile-repac rule and the source-to-pfwCSP adequacy argument.

Automation

Strengthen solver strategy

Improve invariant templates and search strategy for LCM-heavy schedules and partial positive probes.

Benchmarks

Broaden the array corpus

Add new program families while preserving the Tier-3 probe discipline from the current evaluation.

Direction: keep the pfwCSP encoding practical, and make the logic story explain exactly why the automation is sound.

Implementation + reproducibility

69 mains with Tier-3 status: 59 FULL, 7 PARTIAL, 0 FAIL, 3 RETRY. Reproducible with CoAR and the local Docker image (coar:latest, built locally; build steps available on request).

$ docker run --rm \    -v "$(pwd)":/root/coar/example \    coar:latest bash -c "      cd /root/coar &&      ./main.exe -c ./config/solver/pcsat_tbq_ar.json -p pcsp \        ./example/encodings/level-2/demo2a_countpositive_pickk_sync_pos.clp"

Press Run to see the recorded solver output.

Solver probes verify the pfwCSP encodings; paired with the source-to-pfwCSP adequacy argument from the logic side.

Recap

69 mains with Tier-3 status: 59 FULL, 7 PARTIAL, 0 FAIL, 3 RETRY
One abstraction: relational cells + scalar CHCs across cost, sensitivity, qualitative classes
PickK + prophecy: discover shared positions and prove bounds such as c ≤ n − d_0
L3 merged scheduler: keeps stride-mismatched async runs relationally aligned
Honest boundary: probes verify encodings; source adequacy is argued separately

Q & A

Thank you!

What is cell morphing?

An array program can be encoded as CHCs directly, but that needs array theory: invariants must quantify over every element, which CHC solvers handle poorly. Cell morphing avoids array theory.

1 / 4 · array CHCThe direct CHC invariant still carries the whole array.

1. array CHC 2. quantified prefix 3. arbitrary k 4. scalar invariant

Start with the direct CHC view: the invariant talks about the whole array A.

Naive: CHC over array theory

Invariants must quantify over all elements, e.g. ∀j. P(A[j]). The proof is valid, but quantified array reasoning is solver-hostile.

Cell morphing

Monniaux & Gonnord, SAS'16: track one arbitrary free index k. At access i, HIT if i = k; otherwise MISS.

Example: prove every element is positive (for i: assert A[i] > 0). Same property, two encodings:

Array-theory invariant

Inv(i, A)          // A is an arrayneed:  ∀j. 0 ≤ j < i  →  A[j] > 0∀ over the array → solver stalls

Cell-morphing invariant

Inv(i, k, a_k)     // all scalarsneed:  0 ≤ k < i  →  a_k > 0scalar; k is a free CHC variable

1 / 4

Where Level 2 stops: source loops expose the gap

1 / 5 · same paceL2 works when both runs can take the next source transition together.

L2 works: same pace

P1for i1 step +1: if A1[i1] > 0: c1++

P2for i2 step +1: if A2[i2] > 0: c2++

One TT proof step advances both. PickK can search β positions and grow counter d.

L2 breaks: source paces differ

stride 1v2i1 += 1 | i2 += 2

stride 2v3i1 += 2 | i2 += 3

InplaceMap 1v2A[i] = f(A[i]), pace 1 vs 2

heterogeneous 1v2guards differ, pace 1 vs 2

red example: stride 2v3 needs catch-up states

(0,0)TT (2,3)TF (4,3)FT (4,6)TF (6,6)

TT takes both next transitions. TF / FT advance one run while the other waits, realigning the next comparison.

Need Level 3: a merged scheduler with TT / TF / FT transitions keeps PickK aligned across different paces.