1. Introduction
PFSS is a library for function secret sharing (FSS).
The C API is defined in the
src/c_cpp/include/pfss.h
file.
To build PFSS, you’ll need make
, a C compiler, and a C++ compiler.
PFSS requires CPU or library support for certain cryptographic
operations.
This can be either the AES-NI extension for x86 family CPUs, the
Cryptographic Extension for ARM family CPUs, or the
Nettle library.
Nettle should be readily available through most package managers.
For example, on Ubuntu, Nettle can be installed with
sudo apt-get install nettle-dev
.
If you have AES-NI or ARM Crypto, Nettle is not required.
If you have none of these, make
will fail.
You can build and install PFSS with the following commands:
./configure CFLAGS='-O2 -march=native' CXXFLAGS='-O2 -march=native'
make
sudo make install
If you don’t want to install PFSS to your system, you can replace the
sudo make install
command with make DESTDIR="$(pwd)"/tmp install
to
perform a staged install to a local directory tmp
.
You can then use those files however you want.
2. Build examples
This section provides several examples of building and installing PFSS.
These examples will assume that we’re building an application that’s
particularly interested in domain_bits
= 80 and range_bits
= 16.
Building and installing on a typical x86-64 machine:
# Enable template optimizations only for (80,16).
echo "80 16" >switch_db_rb.cfg
# Set up our compiler flags.
export CPPFLAGS="-DNDEBUG"
export CFLAGS="-O2 -march=native"
export CXXFLAGS="$CFLAGS"
# Run ./configure, ensuring that it detects AES-NI.
./configure --with-aes-ni
# Compile, run the test suite, and install.
make
make check
sudo make install
Cross compiling for Android ARMv8-A on a typical x86-64 Linux machine:
# Enable template optimizations only for (80,16).
echo "80 16" >switch_db_rb.cfg
# Download and extract the Android NDK.
wget https://dl.google.com/android/repository/android-ndk-r21c-linux-x86_64.zip
unzip android-ndk-r21c-linux-x86_64.zip
# Set up the environment as recommended by
# <https://developer.android.com/ndk/guides/other_build_systems>.
export NDK=$(pwd)/android-ndk-r21c
export TOOLCHAIN=$NDK/toolchains/llvm/prebuilt/linux-x86_64
export TARGET=aarch64-linux-android
export API=21
export AR=$TOOLCHAIN/bin/$TARGET-ar
export AS=$TOOLCHAIN/bin/$TARGET-as
export CC=$TOOLCHAIN/bin/$TARGET$API-clang
export CXX=$TOOLCHAIN/bin/$TARGET$API-clang++
export LD=$TOOLCHAIN/bin/$TARGET-ld
export RANLIB=$TOOLCHAIN/bin/$TARGET-ranlib
export STRIP=$TOOLCHAIN/bin/$TARGET-strip
# Set up our compiler flags.
export CPPFLAGS="-DNDEBUG"
export CFLAGS="-O2 -march=armv8-a+crypto"
export CXXFLAGS="$CFLAGS"
# Run ./configure, ensuring that it detects ARM Crypto and JNI.
./configure --build x86_64-pc-linux-gnu --host $TARGET \
--with-arm-crypto --with-jni
# Compile and do a staged install.
make
make DESTDIR=$(pwd)/staged install
# List the files of the staged install.
find staged ! -type d
# You may also need the libc++_shared.so library from the NDK:
# $NDK/sources/cxx-stl/llvm-libc++/libs/arm64-v8a/libc++_shared.so
3. Hardware acceleration
PFSS can optionally be compiled to use special CPU instructions for
increased performance.
The ./configure
step of the build system automatically detects which
of these instructions are available under the specified compiler flags
and enables them in the code accordingly.
The terms build system and host system are generally used to refer to the system on which you are compiling the code and the system on which you will be running the compiled code, respectively. This overloads the term “build system”, which is also generally used to refer to the overall build system, but the meaning should be clear from context.
3.1. x86 acceleration
For the x86 family of CPUs, there is one CPU instruction set that can be enabled: AES-NI.
You can generally enable AES-NI by adding -maes
to both CFLAGS
and
CXXFLAGS
.
If you’re not cross-compiling, it’s usually sufficient to specify
-march=native
without any other -m
options.
You can also specify --with-aes-ni
to ./configure
to instruct it to
verify that AES-NI was detected, failing if not.
3.2. ARM acceleration
For the ARM family of CPUs, there is one CPU instruction set that can be enabled: ARM Crypto.
You can generally enable ARM Crypto by adding -march=armv8-a+crypto
to
both CFLAGS
and CXXFLAGS
.
You can adjust this flag as needed, it is only the +crypto
extension
that is required.
If you’re not cross-compiling, it’s usually sufficient to specify
-march=native
without any other -m
options.
You can also specify --with-arm-crypto
to ./configure
to instruct it
to verify that ARM Crypto was detected, failing if not.
4. Logging
PFSS includes a logging layer that can be enabled at build time by
specifying the --with-logging
option to ./configure
.
When logging is enabled and a log file has been set by calling the
pfss_set_log_file
function, every call to the PFSS C API will
write a line to the log file with information about the call.
Each line has the following format, which is based on the format used by
the strace
utility:
time func(args) = ret <dura>
time
-
The Unix time at which the call was made, in nanoseconds.
func
-
The name of the function that was called.
args
-
The arguments that were passed to the function.
ret
-
The value the function returned.
dura
-
The amount of time the call took, in nanoseconds.
5. Benchmarking
PFSS provides a command-line utility named pfss-benchmark
for
benchmarking the PFSS library.
6. C API
6.1. The pfss_set_log_file
function
FILE * pfss_set_log_file(
FILE * log_file
);
The pfss_set_log_file
function sets the logging
destination to log_file
and returns the previous logging destination.
A null pointer indicates logging is disabled.
If logging was not enabled at build time with
./configure --with-logging
, getting and setting the logging
destination still behaves normally, but logging is always disabled.
6.2. The pfss_get_log_file
function
FILE * pfss_get_log_file(
void
);
The pfss_get_log_file
function returns the logging
destination.
A null pointer indicates logging is disabled.
If logging was not enabled at build time with
./configure --with-logging
, getting and setting the logging
destination still behaves normally, but logging is always disabled.
6.3. The pfss_status
type
typedef uint32_t pfss_status;
The pfss_status
type represents a status code of a function call.
Most PFSS functions return a pfss_status
code.
pfss_status
is guaranteed to be an alias of uint32_t
.
Each status code is defined as a macro that expands to an integer
constant expression that can be used in both normal code and in
preprocessor arithmetic.
In normal code, the type of the expression is pfss_status
.
The status codes and their meanings are listed below. Note that some meanings are more descriptive than others. Functions generally return the most descriptive status code they can.
-
0 (0x0)
PFSS_OK
The function succeeded. -
1 (0x1)
PFSS_UNKNOWN_ERROR
The function failed. -
2 (0x2)
PFSS_INVALID_ARGUMENT
The function failed because of an invalid argument. -
3 (0x3)
PFSS_NULL_POINTER
The function failed because of an invalid argument. In particular, an argument was incorrectly a null pointer. -
4 (0x4)
PFSS_INVALID_DOMAIN
-
5 (0x5)
PFSS_INVALID_RANGE
-
6 (0x6)
PFSS_UNSUPPORTED_DOMAIN_AND_RANGE
-
7 (0x7)
PFSS_DOMAIN_OVERFLOW
-
8 (0x8)
PFSS_RANGE_OVERFLOW
-
9 (0x9)
PFSS_MALFORMED_KEY
6.4. The pfss_get_status_name
function
char const * pfss_get_status_name(
pfss_status status
);
The pfss_get_status_name
function returns a pointer to a constant
string with static storage duration that holds the name of the status
code status
.
If status
is not a valid status code, the string will be
"PFSS_UNKNOWN_STATUS"
.
6.5. The pfss_endianness
type
typedef uint32_t pfss_endianness;
The pfss_endianness
type describes the byte order of an integer value.
The endianness constants and their meanings are listed below.
-
0 (0x0)
PFSS_NATIVE_ENDIAN
The native byte order of the system. -
1 (0x1)
PFSS_LITTLE_ENDIAN
Little endian byte order. -
2 (0x2)
PFSS_BIG_ENDIAN
Big endian byte order.
6.6. The pfss_gen_sizes
function
pfss_status pfss_gen_sizes(
uint32_t domain_bits,
uint32_t range_bits,
uint32_t * key_blob_size,
uint32_t * rand_buf_size
);
The pfss_gen_sizes
function determines how big each of the key blobs
generated by the pfss_gen
function will be for the given domain_bits
and range_bits
values, and how many random bytes will be needed to
generate them.
6.7. The pfss_gen
function
pfss_status pfss_gen(
uint32_t domain_bits,
uint32_t range_bits,
uint8_t const * alpha,
uint32_t alpha_size,
pfss_endianness alpha_endianness,
uint8_t const * beta,
uint32_t beta_size,
pfss_endianness beta_endianness,
uint8_t * key1_blob,
uint8_t * key2_blob,
uint8_t const * rand_buf
);
The pfss_gen
function generates two key blobs that form a secret
sharing of the function
\(f : \{0,1\}^\mathtt{domain\_bits} \rightarrow \{0,1\}^\mathtt{range\_bits}\)
defined by \(f(\mathtt{alpha}) = \mathtt{beta}\) and \(f(x) = 0\)
everywhere else.
domain_bits
and range_bits
should both be nonzero.
alpha
should be an unsigned integer that consists of alpha_size
bytes in
alpha_endianness
byte order.
The bits are ordered from least to most significant, and any bits above
the least significant domain_bits
bits are ignored.
alpha_size
should be nonzero.
beta
should be an unsigned integer that consists of beta_size
bytes in
beta_endianness
byte order.
The bits are ordered from least to most significant, and any bits above
the least significant range_bits
bits are ignored.
beta_size
should be nonzero.
The key blobs will be written to the key1_blob
and key2_blob
arrays.
The size of both arrays should be the size obtained by a call to the
pfss_gen_sizes
function with the same
domain_bits
and range_bits
values.
rand_buf
should be an array of random bytes.
The size of the array should be the size obtained by a call to the
pfss_gen_sizes
function with the same
domain_bits
and range_bits
values.
6.8. The pfss_map_gen
function
pfss_status pfss_map_gen(
uint32_t domain_bits,
uint32_t range_bits,
uint8_t const * alphas,
uint32_t alphas_count,
uint32_t alpha_size,
pfss_endianness alpha_endianness,
uint8_t const * betas,
uint32_t beta_size,
pfss_endianness beta_endianness,
uint8_t * key1_blobs,
uint8_t * key2_blobs,
uint8_t const * rand_bufs
);
The pfss_map_gen
function creates FSS key pairs for each of the
alphas_count
elements of alphas
and betas
, storing the key pairs
inside key1_blobs
and key2_blobs
.
Each element of alphas
should be an unsigned integer that consists of
alpha_size
bytes in alpha_endianness
byte order.
Each element of betas
should be an unsigned integer that consists of
beta_size
bytes in beta_endianness
byte order.
Each element in alphas should be less than 2^domain_bits and each
element of betas should be less than 2^range_bits. As long as the
values of each element in alphas and betas are in range, alpha_size and
beta_size can be arbitrarily large or small, but they must not be zero.
key1_blobs
and key2_blobs
should be containers consisting of
alphas_count
* key_blob_size
bytes.
Similarly, rand_bufs
should be a container consisting of
alphas_count
* rand_buf_size
bytes.
The function returns PFSS_OK
upon success, or another status code upon
failure.
6.9. The pfss_key
type
typedef struct pfss_key pfss_key;
The pfss_key
type represents a key that has been parsed from a key
blob.
6.10. The pfss_parse_key
function
pfss_status pfss_parse_key(
pfss_key * * p_key,
uint8_t const * key_blob,
uint32_t key_blob_size
);
The pfss_parse_key
function parses a key blob.
6.11. The pfss_destroy_key
function
pfss_status pfss_destroy_key(
pfss_key * key
);
The pfss_destroy_key
function destroys a key.
6.12. The pfss_get_domain_bits
function
pfss_status pfss_get_domain_bits(
pfss_key const * key,
uint32_t * domain_bits
);
The pfss_get_domain_bits
function sets *domain_bits
to the number of
domain bits in *key
.
This function returns PFSS_OK
upon success, or another status code
upon failure.
You can use the pfss_get_status_name
function to convert the
status code into a human readable name.
Note that this function takes a pfss_key const *
, not a pfss_key *
.
By convention, all functions that take a pfss_key const *
are safe to
call from multiple threads at the same time on the same key.
6.13. The pfss_get_range_bits
function
pfss_status pfss_get_range_bits(
pfss_key const * key,
uint32_t * range_bits
);
The pfss_get_range_bits
function sets *range_bits
to the number of
range bits in *key
.
This function returns PFSS_OK
upon success, or another status code
upon failure.
You can use the pfss_get_status_name
function to convert the
status code into a human readable name.
Note that this function takes a pfss_key const *
, not a pfss_key *
.
By convention, all functions that take a pfss_key const *
are safe to
call from multiple threads at the same time on the same key.
6.14. The pfss_eval
function
pfss_status pfss_eval(
pfss_key const * key,
uint8_t const * x,
uint32_t x_size,
pfss_endianness x_endianness,
uint8_t * y,
uint32_t y_size,
pfss_endianness y_endianness
);
The pfss_eval
function evaluates a key at a single domain element to
produce a single range element.
This function returns PFSS_OK
upon success, or another status code
upon failure.
You can use the pfss_get_status_name
function to convert the
status code into a human readable name.
Note that this function takes a pfss_key const *
, not a pfss_key *
.
By convention, all functions that take a pfss_key const *
are safe to
call from multiple threads at the same time on the same key.
6.15. The PFSS_DEFINE_DIRECT_EVAL
macro
#define PFSS_DEFINE_DIRECT_EVAL(f, domain_bits, range_type) \
static range_type f(void const * k, void const * x) { … }
The PFSS_DEFINE_DIRECT_EVAL
macro defines a self-contained function
f
that evaluates an unparsed key blob k
at a domain value x
.
f
can be any name of your choice.
domain_bits
should be an integer constant expression with type int
whose value is between 1 and 128 inclusive.
range_type
should be either uint8_t
, uint16_t
, uint32_t
, or
uint64_t
.
When calling f
, k
should be a key blob that was output by the
pfss_gen
function with the same number of domain bits and range bits
as specified by domain_bits
and range_type
, and x
should be an
unsigned integer that consists of
\(\lceil\mathtt{domain\_bits}/8\rceil\) bytes in little endian byte
order.
Any bits above the least significant domain_bits
bits of x
are
ignored.
The return value is the result of the evaluation.
The code of f
is valid as both C and C++.
However, f
currently requires AES-NI support via the -maes
compiler
option.
This option is normally handled by the PFSS build system, but since
you’ll be compiling f
yourself, you’ll need to provide this option
yourself.
In the future, it may be generalized to work on other systems.
The following code uses the PFSS_DEFINE_DIRECT_EVAL
macro to define a
self-contained function
static uint16_t my_eval(void const * k, void const * x) { … }
that
evaluates an unparsed key blob k
with 128 domain bits and 16 range
bits at a domain value x
:
#include <pfss.h>
#include <stdint.h>
PFSS_DEFINE_DIRECT_EVAL(my_eval, 128, uint16_t)
6.16. The pfss_reduce_sum
function
pfss_status pfss_reduce_sum(
uint32_t range_bits,
uint8_t const * ys,
uint32_t ys_count,
uint32_t y_size,
pfss_endianness y_endianness,
uint8_t * z,
uint32_t z_size,
pfss_endianness z_endianness
);
The pfss_reduce_sum
function sets x
to the sum of the ys_count
elements of ys
.
Each element of ys
should be an unsigned integer that consists of
y_size
bytes in y_endianness
byte order.
The sum is taken modulo 2range_bits
and output as an unsigned
integer that consists of z_size
bytes in z_endianness
byte order.
The function returns PFSS_OK
upon success, or another status code upon
failure.
6.17. The pfss_map_eval
function
pfss_status pfss_map_eval(
pfss_key const * key,
uint8_t const * xs,
uint32_t xs_count,
uint32_t x_size,
pfss_endianness x_endianness,
uint8_t * ys,
uint32_t y_size,
pfss_endianness y_endianness
);
The pfss_map_eval
function evaluates a key over a list of domain
elements to produce a list of range elements.
xs
should point to the first byte of an array of xs_count
unsigned integers, each of which consists of x_size
bytes in
x_endianness
byte order.
ys
should point to the first byte of an array of xs_count
unsigned integers, each of which consists of y_size
bytes in
y_endianness
byte order.
It is also required that
\(\mathtt{y\_size} \ge \lceil r / 8 \rceil\),
where \(r\) is the number of range bits in the key.
The key is evaluated at each domain element in the xs
array and each
result is written to the corresponding range element in the ys
array.
This function returns PFSS_OK
upon success, or another status code
upon failure.
You can use the pfss_get_status_name
function to convert the
status code into a human readable name.
Note that this function takes a pfss_key const *
, not a pfss_key *
.
By convention, all functions that take a pfss_key const *
are safe to
call from multiple threads at the same time on the same key.
6.18. The pfss_map_eval_reduce_sum
function
pfss_status pfss_map_eval_reduce_sum(
pfss_key const * key,
uint8_t const * xs,
uint32_t xs_count,
uint32_t x_size,
pfss_endianness x_endianness,
uint8_t * y,
uint32_t y_size,
pfss_endianness y_endianness
);
The pfss_map_eval_reduce_sum
function evaluates a key over a list of
domain elements and sums the resulting range elements to produce a
single range element.
xs
should point to the first byte of an array of xs_count
unsigned integers, each of which consists of x_size
bytes in
x_endianness
byte order.
y
should point to the first byte of a range element (i.e., an
unsigned integer) that consists of y_size
bytes in y_endianness
byte order.
It is also required that
\(\mathtt{y\_size} \ge \lceil r / 8 \rceil\),
where \(r\) is the number of range bits in the key.
The key is evaluated at each domain element in the xs
array, the
resulting range elements are summed modulo \(2^r\), and the sum is
written to the range element pointed to by y
.
This function returns PFSS_OK
upon success, or another status code
upon failure.
You can use the pfss_get_status_name
function to convert the
status code into a human readable name.
Note that this function takes a pfss_key const *
, not a pfss_key *
.
By convention, all functions that take a pfss_key const *
are safe to
call from multiple threads at the same time on the same key.
6.19. The pfss_eval_all
function
pfss_status pfss_eval_all(
pfss_key const * key,
uint8_t const * xp,
uint32_t xp_bits,
pfss_endianness xp_endianness,
uint8_t * ys,
uint32_t y_size,
pfss_endianness y_endianness
);
The pfss_eval_all
function evaluates a key over a (partial) grid of
domain elements to produce a (partial) grid of range elements.
First, the \(\lceil \mathtt{xp\_bits} / 8 \rceil\) bytes pointed to
by xp
are interpreted as an unsigned integer in xp_endianness
byte
order.
All bits of this integer are ignored except for the least significant
xp_bits
bits, which form a value \(v\).
This value is shifted left by \(k = d - \mathtt{xp\_bits}\) bits to
form the first domain element \(x_1\) at which key
will be
evaluated, where \(d\) is the number of domain bits in key
.
In other words, \(x_1 = v \cdot 2^k\).
It is required that \(k \ge 0\), i.e., that
\(\mathtt{xp\_bits} \le d\).
Next, key
is evaluated at each successive domain element up to and
including \(x_{2^k} = x_1 + 2^k - 1\), and each result is written to
a specific location in the array of bytes pointed to by ys
.
The array of bytes is treated as an array of unsigned integers, each
consisting of y_size
bytes in y_endianness
byte order.
It is required that \(\mathtt{y\_size} \ge \lceil r / 8 \rceil\),
where \(r\) is the number of range bits in key
.
The evaluation result of each \(x_i\) is written to the \(x_i\)'th
element of the array.
This function returns PFSS_OK
upon success, or another status code
upon failure.
You can use the pfss_get_status_name
function to convert the
status code into a human readable name.
Note that this function takes a pfss_key const *
, not a pfss_key *
.
By convention, all functions that take a pfss_key const *
are safe to
call from multiple threads at the same time on the same key.
#include <cstdint>
#include <future>
#include <iostream>
#include <memory>
#include <pfss.h>
#include <random>
#include <stdexcept>
namespace {
void check_status(pfss_status const s) {
if (s != PFSS_OK) {
throw std::runtime_error(pfss_get_status_name(s));
}
}
} // namespace
int main() {
constexpr uint32_t domain_bits = 16;
constexpr uint32_t range_bits = 32;
constexpr uint16_t alpha = 12345;
constexpr uint32_t beta = 67890;
constexpr std::size_t domain_size = (std::size_t)1 << domain_bits;
// Find out how big our buffers need to be.
uint32_t key_blob_size;
uint32_t rand_buf_size;
check_status(pfss_gen_sizes(domain_bits,
range_bits,
&key_blob_size,
&rand_buf_size));
// Allocate our buffers.
std::vector<uint8_t> key1_blob((std::size_t)key_blob_size);
std::vector<uint8_t> key2_blob((std::size_t)key_blob_size);
std::vector<uint8_t> rand_buf((std::size_t)rand_buf_size);
// Fill rand_buf with random bytes. Note that std::random_device is
// not necessarily cryptographically secure. In practice you should
// use something with better guarantees.
std::random_device random_device;
for (uint8_t & r : rand_buf) {
r = (uint8_t)random_device();
}
// Generate the two key blobs.
check_status(pfss_gen(domain_bits,
range_bits,
(uint8_t const *)&alpha,
(uint32_t)sizeof(alpha),
PFSS_NATIVE_ENDIAN,
(uint8_t const *)&beta,
(uint32_t)sizeof(beta),
PFSS_NATIVE_ENDIAN,
key1_blob.data(),
key2_blob.data(),
rand_buf.data()));
// Parse the first key blob.
pfss_key * key1_ptr;
check_status(
pfss_parse_key(&key1_ptr, key1_blob.data(), key_blob_size));
std::unique_ptr<pfss_key, void (*)(pfss_key *)> const key1(
key1_ptr,
[](pfss_key * const k) { pfss_destroy_key(k); });
// Parse the second key blob.
pfss_key * key2_ptr;
check_status(
pfss_parse_key(&key2_ptr, key2_blob.data(), key_blob_size));
std::unique_ptr<pfss_key, void (*)(pfss_key *)> const key2(
key2_ptr,
[](pfss_key * const k) { pfss_destroy_key(k); });
// Allocate two grids.
std::vector<uint32_t> ys1(domain_size);
std::vector<uint32_t> ys2(domain_size);
// Start 8 threads to evaluate the first grid. The first thread will
// evaluate the domain elements whose bit patterns are 000x...x, the
// second thread 001x...x, the third thread 010x...x, and so on.
std::vector<std::future<void>> futures1;
for (uint8_t i = 0; i != 8; ++i) {
futures1.push_back(std::async(std::launch::async, [&key1, i, &ys1] {
check_status(pfss_eval_all(key1.get(),
&i,
3, // 2^3 = 8
PFSS_NATIVE_ENDIAN,
(uint8_t *)ys1.data(),
sizeof(ys1[0]),
PFSS_NATIVE_ENDIAN));
}));
}
// Start 16 threads to evaluate the second grid. The first thread will
// evaluate the domain elements whose bit patterns are 0000x...x, the
// second thread 0001x...x, the third thread 0010x...x, and so on.
std::vector<std::future<void>> futures2;
for (uint8_t i = 0; i != 16; ++i) {
futures2.push_back(std::async(std::launch::async, [&key2, i, &ys2] {
check_status(pfss_eval_all(key2.get(),
&i,
4, // 2^4 = 16
PFSS_NATIVE_ENDIAN,
(uint8_t *)ys2.data(),
sizeof(ys2[0]),
PFSS_NATIVE_ENDIAN));
}));
}
// Wait for all threads to complete.
for (std::future<void> & f : futures1) {
f.get();
}
for (std::future<void> & f : futures2) {
f.get();
}
// Collapse the two grids by summing all of their elements mod
// 2^{range_bits} (2^32) to verify that we get beta (67890) as the
// final result. Note that because the two keys came from the same
// pfss_gen call, ys1[i] + ys2[i] should be zero mod 2^32 everywhere
// except at alpha (12345), where it should be beta (67890).
unsigned long ys1_sum = 0;
unsigned long ys2_sum = 0;
for (auto const y : ys1) {
ys1_sum += (unsigned long)y; // naturally mod 2^n for some n >= 32
}
for (auto const y : ys2) {
ys2_sum += (unsigned long)y; // naturally mod 2^n for some n >= 32
}
ys1_sum &= UINT32_MAX; // mod 2^32
ys2_sum &= UINT32_MAX; // mod 2^32
unsigned long const sum = (ys1_sum + ys2_sum) & UINT32_MAX;
// Print out the results.
std::cout << "ys1_sum = " << ys1_sum << "\n";
std::cout << "ys2_sum = " << ys2_sum << "\n";
std::cout << "sum = " << sum << "\n";
}
ys1_sum = 3425594880
ys2_sum = 869440306
sum = 67890
6.20. The pfss_eval_all_sum
function
pfss_status pfss_eval_all_sum(
pfss_key const * const * keys,
uint32_t keys_count,
uint8_t * ys,
uint32_t y_size,
pfss_endianness y_endianness,
uint32_t thread_count
);
The pfss_eval_all_sum
function evaluates multiple keys over the
complete grid of domain elements and sums the resulting grids into an
existing grid.
keys
should point to an array of keys_count
keys.
All keys should have the same number of domain bits.
All keys should have the same number of range bits.
If keys_count
is zero, keys
will not be accessed but should
still be a valid pointer.
ys
should point to a complete grid of range elements, i.e., an
array of \(2^d\) unsigned integers where \(d\) is the number of
domain bits in each key.
Each unsigned integer should consist of y_size
bytes in
y_endianness
byte order.
y_size
should always be positive.
If keys_count
is zero, ys
will not be accessed but it should
still be a valid pointer.
Each key is evaluated at all domain elements to produce an intermediate
grid, which is then summed into ys
.
thread_count
specifies the number of threads to use to perform the
computation.
This number should always be positive.
This function returns PFSS_OK
upon success, or another status code
upon failure.
You can use the pfss_get_status_name
function to convert the
status code into a human readable name.
Note that this function takes a pfss_key const *
, not a pfss_key *
.
By convention, all functions that take a pfss_key const *
are safe to
call from multiple threads at the same time on the same key.
#include <cstdint>
#include <future>
#include <iostream>
#include <memory>
#include <pfss.h>
#include <random>
#include <stdexcept>
namespace {
void check_status(pfss_status const s) {
if (s != PFSS_OK) {
throw std::runtime_error(pfss_get_status_name(s));
}
}
} // namespace
int main() {
constexpr uint32_t domain_bits = 16;
constexpr uint32_t range_bits = 32;
constexpr uint16_t alpha = 12345;
constexpr uint32_t beta = 67890;
constexpr std::size_t domain_size = (std::size_t)1 << domain_bits;
// Find out how big our buffers need to be.
uint32_t key_blob_size;
uint32_t rand_buf_size;
check_status(pfss_gen_sizes(domain_bits,
range_bits,
&key_blob_size,
&rand_buf_size));
// Allocate our buffers.
std::vector<uint8_t> key1_blob((std::size_t)key_blob_size);
std::vector<uint8_t> key2_blob((std::size_t)key_blob_size);
std::vector<uint8_t> rand_buf((std::size_t)rand_buf_size);
// Fill rand_buf with random bytes. Note that std::random_device is
// not necessarily cryptographically secure. In practice you should
// use something with better guarantees.
std::random_device random_device;
for (uint8_t & r : rand_buf) {
r = (uint8_t)random_device();
}
// Generate the two key blobs.
check_status(pfss_gen(domain_bits,
range_bits,
(uint8_t const *)&alpha,
(uint32_t)sizeof(alpha),
PFSS_NATIVE_ENDIAN,
(uint8_t const *)&beta,
(uint32_t)sizeof(beta),
PFSS_NATIVE_ENDIAN,
key1_blob.data(),
key2_blob.data(),
rand_buf.data()));
// Parse the first key blob.
pfss_key * key1_ptr;
check_status(
pfss_parse_key(&key1_ptr, key1_blob.data(), key_blob_size));
std::unique_ptr<pfss_key, void (*)(pfss_key *)> const key1(
key1_ptr,
[](pfss_key * const k) { pfss_destroy_key(k); });
// Parse the second key blob.
pfss_key * key2_ptr;
check_status(
pfss_parse_key(&key2_ptr, key2_blob.data(), key_blob_size));
std::unique_ptr<pfss_key, void (*)(pfss_key *)> const key2(
key2_ptr,
[](pfss_key * const k) { pfss_destroy_key(k); });
// Allocate one main grid initialized to all zeros.
std::vector<uint32_t> ys(domain_size);
// Use 7 threads to evaluate each key into an ephemeral grid (as if by
// calling the pfss_eval_all function) and sum the ephemeral grids
// into the main grid.
std::vector<pfss_key const *> const keys = {key1_ptr, key2_ptr};
check_status(pfss_eval_all_sum(keys.data(),
keys.size(),
(uint8_t *)ys.data(),
sizeof(ys[0]),
PFSS_NATIVE_ENDIAN,
7));
// Collapse the main grid by summing all of its elements mod
// 2^{range_bits} (2^32) to verify that we get beta (67890) as the
// final result. Note that because the two keys came from the same
// pfss_gen call, the grid should be zero mod 2^32 everywhere except
// at alpha (12345), where it should be beta (67890).
unsigned long sum = 0;
for (auto const y : ys) {
sum += (unsigned long)y; // naturally mod 2^n for some n >= 32
}
sum &= UINT32_MAX; // mod 2^32
// Print out the results.
std::cout << "sum = " << sum << "\n";
}
sum = 67890
6.21. The pfss_eval_all_dot
function
pfss_status pfss_eval_all_dot(
pfss_key const * const * keys,
uint32_t keys_count,
uint8_t const * ys,
uint32_t y_size,
pfss_endianness y_endianness,
uint8_t * zs,
uint32_t z_size,
pfss_endianness z_endianness,
uint32_t thread_count
);
The pfss_eval_all_dot
function evaluates multiple keys over the
complete grid of domain elements and computes the dot product of each
resulting grid with an existing grid.
keys
should point to an array of keys_count
keys.
All keys should have the same number of domain bits.
All keys should have the same number of range bits.
If keys_count
is zero, keys
will not be accessed but should
still be a valid pointer.
ys
should point to a complete grid of range elements, i.e., an
array of \(2^d\) unsigned integers where \(d\) is the number of
domain bits in each key.
Each unsigned integer should consist of y_size
bytes in
y_endianness
byte order.
y_size
should always be positive.
If keys_count
is zero, ys
will not be accessed but it should
still be a valid pointer.
zs
should point to an array of keys_count
unsigned integers.
Each unsigned integer should consist of z_size
bytes in z_endianness
byte order.
Each key is evaluated at all domain elements to produce an intermediate
grid.
The dot product of the intermediate grid is taken with ys
, and the
result is written to the corresponding element of zs
.
The result is taken modulo \(2^r\) where \(r\) is the number of
range bits in each key.
thread_count
specifies the number of threads to use to perform the
computation.
This number should always be positive.
This function returns PFSS_OK
upon success, or another status code
upon failure.
You can use the pfss_get_status_name
function to convert the
status code into a human readable name.
Note that this function takes a pfss_key const *
, not a pfss_key *
.
By convention, all functions that take a pfss_key const *
are safe to
call from multiple threads at the same time on the same key.
#include <cstdint>
#include <future>
#include <iostream>
#include <memory>
#include <pfss.h>
#include <random>
#include <stdexcept>
namespace {
void check_status(pfss_status const s) {
if (s != PFSS_OK) {
throw std::runtime_error(pfss_get_status_name(s));
}
}
} // namespace
int main() {
constexpr uint32_t domain_bits = 16;
constexpr uint32_t range_bits = 32;
constexpr uint16_t alpha = 12345;
constexpr uint32_t beta = 1;
constexpr std::size_t domain_size = (std::size_t)1 << domain_bits;
// Find out how big our buffers need to be.
uint32_t key_blob_size;
uint32_t rand_buf_size;
check_status(pfss_gen_sizes(domain_bits,
range_bits,
&key_blob_size,
&rand_buf_size));
// Allocate our buffers.
std::vector<uint8_t> key1_blob((std::size_t)key_blob_size);
std::vector<uint8_t> key2_blob((std::size_t)key_blob_size);
std::vector<uint8_t> rand_buf((std::size_t)rand_buf_size);
// Fill rand_buf with random bytes. Note that std::random_device is
// not necessarily cryptographically secure. In practice you should
// use something with better guarantees.
std::random_device random_device;
for (uint8_t & r : rand_buf) {
r = (uint8_t)random_device();
}
// Generate the two key blobs.
check_status(pfss_gen(domain_bits,
range_bits,
(uint8_t const *)&alpha,
(uint32_t)sizeof(alpha),
PFSS_NATIVE_ENDIAN,
(uint8_t const *)&beta,
(uint32_t)sizeof(beta),
PFSS_NATIVE_ENDIAN,
key1_blob.data(),
key2_blob.data(),
rand_buf.data()));
// Parse the first key blob.
pfss_key * key1_ptr;
check_status(
pfss_parse_key(&key1_ptr, key1_blob.data(), key_blob_size));
std::unique_ptr<pfss_key, void (*)(pfss_key *)> const key1(
key1_ptr,
[](pfss_key * const k) { pfss_destroy_key(k); });
// Parse the second key blob.
pfss_key * key2_ptr;
check_status(
pfss_parse_key(&key2_ptr, key2_blob.data(), key_blob_size));
std::unique_ptr<pfss_key, void (*)(pfss_key *)> const key2(
key2_ptr,
[](pfss_key * const k) { pfss_destroy_key(k); });
// Allocate one main grid initialized to all zeros everywhere except
// at alpha (12345), where it is 67890.
std::vector<uint32_t> ys(domain_size);
ys[alpha] = 67890;
// Use 7 threads to evaluate each key into an ephemeral grid (as if by
// calling the pfss_eval_all function) and compute the dot product of
// each ephemeral grid with the main grid into a result vector zs.
std::vector<uint32_t> zs(2);
std::vector<pfss_key const *> const keys = {key1_ptr, key2_ptr};
check_status(pfss_eval_all_dot(keys.data(),
keys.size(),
(uint8_t const *)ys.data(),
sizeof(ys[0]),
PFSS_NATIVE_ENDIAN,
(uint8_t *)zs.data(),
sizeof(zs[0]),
PFSS_NATIVE_ENDIAN,
7));
// Sum the two results mod 2^{range_bits} (2^32) to verify that we get
// ys[alpha] (67890) as the final result. Note that because the two
// keys came from the same pfss_gen call and beta = 1, the two keys
// work together to "select" the element at index alpha in ys.
unsigned long sum = 0;
for (auto const z : zs) {
sum += (unsigned long)z; // naturally mod 2^n for some n >= 32
}
sum &= UINT32_MAX; // mod 2^32
// Print out the results.
std::cout << "zs[0] = " << zs[0] << "\n";
std::cout << "zs[1] = " << zs[1] << "\n";
std::cout << "sum = " << sum << "\n";
}
zs[0] = 2326640992
zs[1] = 1968394194
sum = 67890
6.22. The PFSS_SWITCH_DB_RB
macro
#define PFSS_SWITCH_DB_RB( \
domain_bits, range_bits, template_code, standard_code)
The PFSS_SWITCH_DB_RB
macro expands to a
do { /* ... */ } while (0)
block of code that calls either
template_code((domain_bits), (range_bits))
or standard_code()
depending on whether PFSS was compiled with template optimizations
enabled for the given domain_bits
and range_bits
combination via
switch_db_rb.cfg
.
template_code
should be the name of a function-like macro that takes
two parameters.
The arguments passed to template_code
will not actually be
domain_bits
and range_bits
, but rather decimal constants whose
values match domain_bits
and range_bits
, which allows them to be
used as template arguments in C++.
standard_code
should be the name of a function-like macro that takes
no arguments.
template_code
and standard_code
should each expand to a
do { /* ... */ } while (0)
block of code.
domain_bits
and range_bits
will each be evaluated exactly once.
Here is an example function that determines whether PFSS was compiled
with template optimizations enabled for a given domain_bits
and
range_bits
combination:
int have_template_optimizations(int domain_bits, int range_bits) {
#define T_CODE(db, rb) do { return 1; } while (0)
#define S_CODE() do { return 0; } while (0)
PFSS_SWITCH_DB_RB(domain_bits, range_bits, T_CODE, S_CODE);
#undef S_CODE
#undef T_CODE
}
7. Using the Java bindings
The Java bindings are defined in the
src/java/com/stealthsoftwareinc/pfss/pfss.java
file.
PFSS detects whether <jni.h>
is available at ./configure
time in
order to decide whether to compile the Java bindings into libpfss.so
:
$ ./configure
...
checking for JNI... yes
...
There are three ways you can run ./configure
depending on your needs:
-
./configure
: Compile the Java bindings if<jni.h>
is available. -
./configure --with-jni-or-die
: Compile the Java bindings, erroring out if<jni.h>
is not available. -
./configure --without-jni
: Don’t compile the Java bindings.
If <jni.h>
is not in your default include path, you should add any
necessary -I
options to CPPFLAGS
at ./configure
time.
For example:
./configure --with-jni-or-die CPPFLAGS='-I/path/to/jni/dir'
If ./configure
is unexpectedly failing to detect that <jni.h>
is
available, you can search for JNI
in config.log
to see why.
For example, here is a failure where an additional -I
option was
necessary for <jni.h>
to find a dependent header "jni_md.h"
:
configure:9266: checking for JNI
configure:9337: gcc -c -g -O2 -pthread -Wall -Wextra -I/usr/lib/jvm/java-11-openjdk-amd64/include conftest.c >&5
In file included from conftest.c:54:0:
/usr/lib/jvm/java-11-openjdk-amd64/include/jni.h:45:10: fatal error: jni_md.h: No such file or directory
#include "jni_md.h"
^~~~~~~~~~
To use the Java bindings, the pfss
class is provided in the
src/java/com/stealthsoftwareinc/pfss/pfss.java
file.
This class is a minimal wrapper of the C API that’s intended to be
included in your own Java project along with libpfss.so
.
Pointers to arrays (including pointers to singular objects) are modeled
as Java arrays, uint32_t
values are modeled as Java int
values, and
opaque pointers are modeled as Java long
values.
Every array parameter is accompanied by an index parameter that
specifies the starting point of the data in the array.
You can import the pfss
class with
import com.stealthsoftwareinc.pfss.pfss
and access its members with
pfss.<member>
, or you can import it with
import static com.stealthsoftwareinc.pfss.pfss.*
and access its
members directly with <member>
.
For an example of using the Java bindings, see
src/java/com/stealthsoftwareinc/pfss/Example.java
.
You can compile and run this example program as follows, where <D>
is
an absolute path to a directory that contains both libpfss.so
and
libnettle.so
:
javac -sourcepath src/java src/java/com/stealthsoftwareinc/pfss/Example.java
java -cp src/java com.stealthsoftwareinc.pfss.Example [<D>]
If <D>
is not given, it will be taken as the current directory.
8. Node.js bindings
8.1. General
The Node.js bindings are provided in the
src/node
directory, which is actually an example project.
The intent is that you can take
pfss-node.cpp
verbatim and adapt
binding.gyp
and
package.json
into your own project.
8.2. Running the example project with Docker
-
Build a distribution archive (
pfss-*.tar.gz
) by runningmake dist
(after./configure
). -
Copy the distribution archive into the
src/node
directory. Make sure you have exactly one such file in that directory. If you don’t have exactly one, the image won’t build. -
Run
docker build -t pfss-node src/node
to build the image. -
Run
docker run -i -t --rm pfss-node
to start a container. -
In the container, run
npm install
followed bynode-gyp rebuild
. -
In the container, run
node example.js
.
8.3. Reference
8.3.1. The pfss_reduce_sum_async
function
pfss_status pfss_reduce_sum_async(
Uint32 range_bits,
Uint8Array const ys,
Uint32 ys_index,
Uint32 ys_count,
Uint32 y_size,
pfss_endianness y_endianness,
Uint8Array z,
Uint32 z_index,
Uint32 z_size,
pfss_endianness z_endianness,
function callback
)
The pfss_reduce_sum_async
function calls the
pfss_reduce_sum
function asynchronously
with all of the parameters except callback
.
After the asynchronous call completes, callback(s)
is called, where
s
is the status code returned by the asynchronous call.
The function returns PFSS_OK
upon success, or another status code if
the asynchronous call could not be initiated.
The following program uses the pfss_reduce_sum_async
function to
compute \((0 + 1 + \ldots + 99999)\ \textrm{mod}\ 2^{32}\):
Object.assign(global, require("bindings")("pfss"));
let ys = Uint32Array.from(Array(100000).keys());
let z = new Uint32Array(1);
let status1 = pfss_reduce_sum_async(
ys.BYTES_PER_ELEMENT * 8, // 32
new Uint8Array(ys.buffer),
0,
ys.length, // 100000
ys.BYTES_PER_ELEMENT, // 4
PFSS_NATIVE_ENDIAN,
new Uint8Array(z.buffer),
0,
z.BYTES_PER_ELEMENT, // 4
PFSS_NATIVE_ENDIAN,
function(status2) {
if (status2 == PFSS_OK) {
console.log("z = " + z[0]);
} else {
console.log(pfss_get_status_name(status2));
}
}
);
if (status1 != PFSS_OK) {
console.log(pfss_get_status_name(status1));
}
Output:
z = 704982704
8.3.2. The pfss_map_eval_reduce_sum_async
function
pfss_status pfss_map_eval_reduce_sum_async(
pfss_key key,
Uint8Array const xs,
Uint32 xs_index,
Uint32 xs_count,
Uint32 x_size,
pfss_endianness x_endianness,
Uint8Array y,
Uint32 y_index,
Uint32 y_size,
pfss_endianness y_endianness,
function callback
)
The pfss_map_eval_reduce_sum_async
function calls the
pfss_map_eval_reduce_sum
function asynchronously with all of the parameters except callback
.
After the asynchronous call completes, callback(s)
is called, where
s
is the status code returned by the asynchronous call.
The function returns PFSS_OK
upon success, or another status code if
the asynchronous call could not be initiated.