Introducing the SIEVE Circuit-IR: Retrospective

Written: , Published: , Author: Kimberlee Model, Tags: SIEVE IR, Circuit-IR, v2.1.0, Retrospective,


In prior posts to this blog, we’ve certainly aired our grievances with the prior IR release (v1; we’re now at v2). You may want to go back and skim the previous retrospective and our post on the BOLT fast interpreter. In short: IR1 (the old IR’s code name) had a completely disorganized memory layout, and some of its higher-level features proved difficult (@switch) or slow (@for) to implement. We spent a bit of time developing the BOLT interpreter, which uses a few tricks to greatly accelerate how loops are processed.
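
To give a flavor of that kind of trick, here is a minimal sketch, in C++ rather than in any of our actual interpreter code, of replaying a pre-parsed loop body instead of re-reading it from the relation on every iteration. The Gate and Wire types and the fixed per-iteration wire stride are simplifications assumed purely for illustration.

```cpp
// Illustrative sketch only, not BOLT's actual code: replay a pre-parsed loop
// body many times, shifting wire indices by a fixed stride per iteration,
// instead of re-reading and re-parsing the body on every pass.
#include <cstddef>
#include <cstdint>
#include <vector>

using Wire = std::uint64_t;

struct Gate {                           // simplified gate: out <- left (op) right
    enum class Op { Add, Mul } op;
    Wire out, left, right;
};

// `wires` is a flat wire store; `prime` is the field modulus (assumed small
// enough here that a 64-bit multiply cannot overflow).
void run_loop(const std::vector<Gate>& body, std::vector<std::uint64_t>& wires,
              std::uint64_t prime, std::size_t iterations, Wire stride) {
    for (std::size_t i = 0; i < iterations; ++i) {
        Wire off = static_cast<Wire>(i) * stride;   // fresh wires per iteration
        for (const Gate& g : body) {
            std::uint64_t a = wires[g.left + off];
            std::uint64_t b = wires[g.right + off];
            wires[g.out + off] = (g.op == Gate::Op::Add) ? (a + b) % prime
                                                         : (a * b) % prime;
        }
    }
}
```

Replaying from an in-memory structure like this is also why a bolt-style interpreter’s footprint tracks the size of its parse tree, a point we come back to below.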

Previously, our team came up with a plan for a two-layered approach to the IR (plan and proposal):

  • a low-level layer approximating a largely flat circuit

  • a layer with higher-level control flow, translatable both to the lower layer and to other high-level representations (such as our EMP-ZK’s C++).

The intention of the multi-layer approach was that translating the high-level IR down to a flat circuit would enable flexibility and backwards compatibility, while the high-level IR itself could retain additional semantic information for greater speedups.

So how does this new release fit into our earlier development plan? This revision is a development of the aforementioned low-level, flat(ish) circuit, which we’re aptly calling the Circuit-IR.

The higher-level "Translation-IR" is still in development; it has faced some pushback from other performers within the program, so we won’t discuss it in this post.

At the end of the post about BOLT, we challenged ourselves to outperform IR1 running under BOLT purely through improvements to the IR itself. We expected that the additional semantic information of a higher-level IR would be necessary to do this. Instead, we managed to outperform IR1 with the Circuit-IR.

We credit this performance to two developments:

  1. Improvements to the memory layout that minimize interpreter overhead.

  2. A better balance between minimizing relation IO and interpreter overhead.

For the matrix-product circuit on a 350×350 matrix, we see a marginal improvement from the IR1 loops/bolt column to the Circuit-IR functions column (unfortunately we also see a regression on flat/streaming, but that isn’t a column we care much about). This inspired us to try generating an IR1 relation with the equivalent function structure and running it with bolt. The result was almost twice as fast as any other column.

Note
These times were taken with the Non-ZK interpreters, and represent a lower bound on backend performance.

The memory usage of each column also tells a story. IR1’s memory layout was complicated enough that we were never able to fully implement the @delete directive (IR1 didn’t have a @new), which is very evident when comparing the two flat/streaming columns. The loops/bolt column has the lowest memory usage of all. In bolt, memory use is generally proportional to the size of the parse tree, so the functions/bolt column’s memory use grows considerably.
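
As a minimal sketch of that proportionality (not our actual interpreters; next_gate and eval are hypothetical stand-ins), a streaming evaluator can read, evaluate, and drop each gate, while a bolt-style evaluator first materializes the relation so that loop and function bodies can be replayed from memory:

```cpp
// Minimal sketch, not the actual interpreters. `next_gate` and `eval` are
// hypothetical stand-ins for the relation reader and the gate evaluator.
#include <cstdint>
#include <optional>
#include <vector>

struct Gate { std::uint64_t out, left, right; };

std::optional<Gate> next_gate() { return std::nullopt; }  // stub relation reader
void eval(const Gate&) {}                                  // stub gate evaluator

// Streaming: read, evaluate, and drop each gate, so resident memory stays
// near the live wire window regardless of relation size.
void run_streaming() {
    while (auto g = next_gate()) eval(*g);
}

// Bolt-style: materialize the relation first so loop and function bodies can
// be replayed from memory, so resident memory grows with the parse tree.
void run_bolt_style() {
    std::vector<Gate> tree;
    while (auto g = next_gate()) tree.push_back(*g);
    for (const Gate& g : tree) eval(g);
}
```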

            IR1 flat/streaming   IR1 loops/bolt   Circuit-IR flat/streaming   Circuit-IR functions/streaming   IR1 functions/bolt
size        2.9G                 2.4K             2.9G                        24M                              24M
time (s)    7.033                3.294            7.838                       3.051                            1.867
mem (KB)    2111476              12800            20692                       36016                            145700
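
As a very rough plausibility check on the size row, the sketch below assumes a textbook n³ matrix product and an average of roughly 35 bytes per textual gate; both numbers are guesses rather than measurements of the benchmark circuit. A fully flat relation must spell out every multiply and accumulate gate, while a function-based relation mostly holds one short call record per output cell.

```cpp
// Back-of-the-envelope only: the textbook n^3 structure and the ~35 bytes per
// textual gate are guesses, not measurements of the benchmark circuit.
#include <cstdint>
#include <cstdio>

int main() {
    const std::uint64_t n = 350;
    const std::uint64_t muls = n * n * n;        // one multiply per inner step
    const std::uint64_t adds = n * n * (n - 1);  // accumulation additions
    const std::uint64_t bytes_per_gate = 35;     // guessed average gate size

    // Flat: every gate is written into the relation.
    std::printf("flat gates: %llu (~%.1f GB of text)\n",
                static_cast<unsigned long long>(muls + adds),
                static_cast<double>((muls + adds) * bytes_per_gate) / 1e9);

    // Functions: the body appears once; the bulk of the relation is then
    // roughly one short call record per output cell.
    std::printf("call records: %llu (scales with n^2, not n^3)\n",
                static_cast<unsigned long long>(n * n));
    return 0;
}
```

Under those guessed constants the flat total lands in the same ballpark as the 2.9G figure, and the n² versus n³ scaling is the shape of the savings; the 24M function-based relations of course also depend on exactly how the functions are structured, which this sketch doesn’t try to predict.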

We are extremely pleased with the Circuit-IR. It managed to encode enough semantic information to significantly improve upon IR1, its memory efficiency is great, and it challenged our instincts on what makes for an ideal IR. We are strongly tempted to develop some sort of hybrid bolt/streaming interpreter for it.


This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001120C0087. The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

Distribution Statement "A": Approved for Public Release, Distribution Unlimited