Publications

Peer-reviewed papers and technical reports on distributed systems, concurrency testing, and memory systems.

2025

A Benchmark Framework for Byzantine Fault Tolerance Testing Algorithms

Recent discoveries of vulnerabilities in the design and implementation of Byzantine fault-tolerant (BFT) protocols underscore the need for testing and exploration techniques to ensure their correctness. While there has been some recent effort on automated test generation for BFT protocols, no benchmark framework is available to systematically evaluate the performance of these techniques. We present ByzzBench, a benchmark framework designed to evaluate the performance of testing algorithms in detecting Byzantine fault tolerance bugs. ByzzBench is designed for a standardized implementation of BFT protocols and their execution in a controlled testing environment. It controls the nondeterminism in concurrency, network, and process faults during protocol execution, which makes it possible to enforce particular execution scenarios and thereby facilitates the implementation of testing algorithms for BFT protocols.

FMBC '25

2024

Generalized Concurrency Testing Tool for Distributed Systems

Controlled concurrency testing (CCT) is an effective approach for testing distributed system implementations. However, existing CCT tools suffer from language dependency and the cost of source code instrumentation, which make them difficult to apply to real-world production systems. We propose DSTest, a generalized CCT tool for testing distributed system implementations. DSTest intercepts messages on the application layer and hence eliminates the instrumentation cost and achieves language independence with minimal input. We provide a clean and modular interface for extending DSTest with various event schedulers for CCT. We package DSTest with three well-known event schedulers and validate the tool by applying it to popular production systems.

ISSTA '24
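
The idea of intercepting messages and handing delivery order to a pluggable event scheduler can be pictured with a minimal in-process sketch. This is not DSTest's actual interface: the names `Message`, `Scheduler`, `FifoScheduler`, `RandomScheduler`, and `run` are hypothetical, and the "network" here is only a list of pending messages rather than real application-layer interception.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass(frozen=True)
class Message:
    """An intercepted message waiting to be delivered (hypothetical type)."""
    sender: str
    receiver: str
    payload: str


class Scheduler(Protocol):
    """Hypothetical plug-in point: decides which pending message goes next."""

    def pick(self, pending: List[Message]) -> int:
        """Return the index of the next pending message to deliver."""
        ...


class FifoScheduler:
    """Delivers messages in the order they were intercepted."""

    def pick(self, pending: List[Message]) -> int:
        return 0


class RandomScheduler:
    """Delivers a uniformly random pending message; seeded so runs can be replayed."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)

    def pick(self, pending: List[Message]) -> int:
        return self._rng.randrange(len(pending))


def run(
    initial: List[Message],
    deliver: Callable[[Message], List[Message]],
    scheduler: Scheduler,
    max_steps: int = 1000,
) -> List[Message]:
    """Deliver messages one at a time in the order chosen by the scheduler.

    `deliver` stands in for the system under test: it handles one message
    and returns any new messages that the receiver sends in response.
    The returned trace records the delivery order that was explored.
    """
    pending = list(initial)
    trace: List[Message] = []
    while pending and len(trace) < max_steps:
        msg = pending.pop(scheduler.pick(pending))
        trace.append(msg)
        pending.extend(deliver(msg))
    return trace
```

Because the scheduler is the only source of ordering decisions in this sketch, swapping `FifoScheduler` for `RandomScheduler(seed)` changes the explored interleaving without touching the system under test, and reusing a seed reproduces the same trace.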

2020

Bandwidth-Aware Page Placement in NUMA

Page placement is a critical problem for memory-intensive applications running on a shared-memory multiprocessor with a non-uniform memory access (NUMA) architecture. State-of-the-art page placement mechanisms interleave pages evenly across NUMA nodes. However, this approach fails to maximize memory throughput in modern NUMA systems, characterized by asymmetric bandwidths and latencies, and sensitive to memory contention and interconnect congestion phenomena. We propose BWAP, a novel page placement mechanism based on asymmetric weighted page interleaving. BWAP combines an analytical performance model of the target NUMA system with on-line iterative tuning of page distribution for a given memory-intensive application. Our experimental evaluation with representative memory-intensive workloads shows that BWAP performs up to 66

2017

Client-Side Routing-Agnostic Gateway Selection for Heterogeneous Wireless Mesh Networks

Citizens develop Wireless Mesh Networks (WMNs) in many areas as an alternative, or their only, means of local interconnection and access to the Internet. This access is often achieved through several shared web proxy gateways. These network infrastructures consist of heterogeneous technologies and combine diverse routing protocols. Network-aware state-of-the-art proxy selection schemes for WMNs do not work in this heterogeneous environment. We developed a client-side gateway selection mechanism that optimizes the client-gateway selection, is agnostic to the underlying infrastructure and protocols, and requires no modification of the proxies or the underlying network. The choice is sensitive to network congestion and proxy load, without requiring a minimum number of participating nodes. Extended Vivaldi network coordinates are used to estimate client-proxy network performance. The load of each proxy is estimated passively by collecting the Time-to-First-Byte of HTTP requests, and is shared across clients. Our proposal was evaluated experimentally with clients and proxies deployed in guifi.net, the largest community wireless network in the world. Our selection mechanism avoids proxies with heavy load and slow internal network paths, with overhead linear in the number of clients and proxies.
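
As a rough illustration of combining the two signals named in this abstract, the sketch below ranks proxies by an estimated client-proxy latency plus an observed load term. It rests on several assumptions that are not taken from the paper: coordinates are plain 2-D Euclidean points rather than the extended Vivaldi coordinates the authors use, load is the mean of recently collected Time-to-First-Byte samples, and the two terms are simply added. The names `Proxy`, `estimated_rtt_ms`, `score`, and `select_gateway` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from statistics import fmean
from typing import List, Tuple

Coord = Tuple[float, float]


@dataclass
class Proxy:
    """A candidate gateway as seen by one client (hypothetical structure)."""
    name: str
    coord: Coord                                          # network coordinate, in ms units
    ttfb_ms: List[float] = field(default_factory=list)    # passively collected TTFB samples


def estimated_rtt_ms(a: Coord, b: Coord) -> float:
    """Estimate latency as the distance between two network coordinates."""
    return math.dist(a, b)


def score(client_coord: Coord, proxy: Proxy) -> float:
    """Lower is better: estimated path latency plus mean observed TTFB.

    Adding the two terms with equal weight is an assumption made for
    this sketch, not the weighting used in the paper.
    """
    load = fmean(proxy.ttfb_ms) if proxy.ttfb_ms else 0.0
    return estimated_rtt_ms(client_coord, proxy.coord) + load


def select_gateway(client_coord: Coord, proxies: List[Proxy]) -> Proxy:
    """Pick the proxy with the lowest combined score."""
    return min(proxies, key=lambda p: score(client_coord, p))


if __name__ == "__main__":
    client = (0.0, 0.0)
    candidates = [
        Proxy("near-but-loaded", (3.0, 4.0), [400.0, 600.0]),
        Proxy("far-but-idle", (30.0, 40.0), [20.0, 40.0]),
    ]
    print(select_gateway(client, candidates).name)
```

In this toy input the nearby proxy is skipped because its TTFB samples indicate heavy load, which mirrors the behaviour the abstract describes: avoiding proxies that are either overloaded or reached over slow paths.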

2016

Managing Object Versioning in Geo-Distributed Object Storage Systems

Object versioning is the keystone for implementing eventual consistency in modern geo-distributed object storage systems such as Amazon S3. Despite this, the implementation of object versioning has received little attention in either the academic or the industrial community. The choice of implementation method is not regarded as an important factor in overall system performance under different workloads. In this paper, we present our study of two methods of implementing object versioning in geo-distributed object storage systems and of how they affect the performance of these systems under different workloads. We propose and analyze the advantages and disadvantages of (1) a Write-Repair approach and (2) a Read-Repair approach. From our experiments, we found that the choice of approach significantly impacts the performance of storage systems, which in turn impacts the performance of applications built on top of these systems.

ScienceCloud '16

2015