Publications
Generalized Concurrency Testing Tool for Distributed Systems
ISSTA '24 - 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis
Controlled concurrency testing (CCT) is an effective approach for testing distributed system implementations. However, existing CCT tools suffer from language dependency and the cost of source code instrumentation, which make them difficult to apply to real-world production systems. We propose DSTest, a generalized CCT tool for testing distributed system implementations. DSTest intercepts messages at the application layer and hence eliminates the instrumentation cost and achieves language independence with minimal input. We provide a clean and modular interface for extending DSTest with various event schedulers for CCT. We package DSTest with three well-known event schedulers and validate the tool by applying it to popular production systems.
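As a rough illustration of what a pluggable event scheduler could look like, the sketch below models intercepted messages as plain strings and lets a scheduler decide the delivery order. It is not DSTest's actual interface: the names `Scheduler`, `FifoScheduler`, `RandomScheduler`, and `deliver_all` are hypothetical and chosen only for this example.

```python
import random
from abc import ABC, abstractmethod


class Scheduler(ABC):
    """Hypothetical scheduler interface: decides which pending message goes next."""

    @abstractmethod
    def choose(self, pending):
        """Return the index of the pending message to deliver next."""


class FifoScheduler(Scheduler):
    """Delivers messages in the order they were intercepted."""

    def choose(self, pending):
        return 0


class RandomScheduler(Scheduler):
    """Delivers messages in a seeded random order so that a run can be replayed."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def choose(self, pending):
        return self._rng.randrange(len(pending))


def deliver_all(intercepted, scheduler):
    """Release every intercepted message in the order picked by the scheduler."""
    pending = list(intercepted)
    order = []
    while pending:
        order.append(pending.pop(scheduler.choose(pending)))
    return order


if __name__ == "__main__":
    messages = ["n1->n2: vote", "n2->n1: ack", "n3->n1: vote"]
    print(deliver_all(messages, RandomScheduler(seed=7)))
```

Because the scheduler only sees already-intercepted messages, swapping in a different exploration strategy in this sketch means writing one more `Scheduler` subclass, without touching the system under test.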
Bandwidth-Aware Page Placement in NUMA
IPDPS '20 - 34th IEEE International Parallel and Distributed Processing Symposium
Page placement is a critical problem for memory-intensive applications running on a shared-memory multiprocessor with a non-uniform memory access (NUMA) architecture. State-of-the-art page placement mechanisms interleave pages evenly across NUMA nodes. However, this approach fails to maximize memory throughput in modern NUMA systems, characterized by asymmetric bandwidths and latencies, and sensitive to memory contention and interconnect congestion phenomena. We propose BWAP, a novel page placement mechanism based on asymmetric weighted page interleaving. BWAP combines an analytical performance model of the target NUMA system with on-line iterative tuning of page distribution for a given memory-intensive application. Our experimental evaluation with representative memory-intensive workloads shows that BWAP performs up to 66% better than state-of-the-art techniques. These gains are particularly relevant when multiple co-located applications run in disjoint partitions of a large NUMA machine or when applications do not scale up to the total number of cores.
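To make the idea of weighted page interleaving concrete, here is a minimal sketch that spreads pages across NUMA nodes in proportion to given per-node weights. It is not the BWAP algorithm: the weights are assumed to be supplied by the caller (BWAP derives them from its performance model and on-line tuning), and the function name `weighted_interleave` is hypothetical.

```python
def weighted_interleave(num_pages, weights):
    """Assign each page to a node so that per-node counts track the weights.

    weights[i] is the relative share of pages that node i should receive.
    Returns a list where entry p is the node chosen for page p.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    assigned = [0] * len(weights)
    placement = []
    for page in range(num_pages):
        # Pick the node that is furthest below its target share so far.
        node = max(
            range(len(weights)),
            key=lambda i: weights[i] * (page + 1) / total - assigned[i],
        )
        assigned[node] += 1
        placement.append(node)
    return placement


if __name__ == "__main__":
    # Node 0 is assumed to offer three times the usable bandwidth of node 1.
    print(weighted_interleave(8, [3, 1]))
```

With equal weights the sketch degenerates to plain round-robin interleaving, which corresponds to the uniform placement the abstract contrasts against.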
Client-Side Routing-Agnostic Gateway Selection for Heterogeneous Wireless Mesh Networks
IM '17 - IFIP/IEEE Symposium on Integrated Network and Service Management
Citizens develop Wireless Mesh Networks (WMNs) in many areas as an alternative, or as their only means, for local interconnection and access to the Internet. This access is often achieved through several shared web proxy gateways. These network infrastructures consist of heterogeneous technologies and combine diverse routing protocols. Network-aware state-of-the-art proxy selection schemes for WMNs do not work in this heterogeneous environment. We developed a client-side gateway selection mechanism that optimizes the client-gateway selection, is agnostic to the underlying infrastructure and protocols, and requires no modification of the proxies or the underlying network. The choice is sensitive to network congestion and proxy load, without requiring a minimum number of participating nodes. Extended Vivaldi network coordinates are used to estimate client-proxy network performance. The load of each proxy is estimated passively by collecting the Time-to-First-Byte of HTTP requests, and is shared across clients. Our proposal was evaluated experimentally with clients and proxies deployed in guifi.net, the largest community wireless network in the world. Our selection mechanism avoids proxies with heavy load and slow internal network paths, with overhead linear in the number of clients and proxies.
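The selection idea can be sketched as follows: estimate the network cost to each proxy from coordinate distance, estimate its load from recent Time-to-First-Byte samples, and pick the proxy with the lowest combined score. This is only an illustration under simplifying assumptions: it uses plain Euclidean distance rather than the paper's extended Vivaldi coordinates, the additive scoring rule is assumed, and the names `estimated_rtt_ms` and `pick_proxy` are hypothetical.

```python
import math
from statistics import mean


def estimated_rtt_ms(coord_a, coord_b):
    """Approximate round-trip time as the distance between network coordinates."""
    return math.dist(coord_a, coord_b)


def pick_proxy(client_coord, proxies):
    """Return the name of the proxy with the lowest estimated cost.

    proxies maps a proxy name to (coordinate, recent TTFB samples in ms).
    The cost is the estimated RTT plus the mean TTFB (an assumed rule).
    """
    def score(name):
        coord, ttfb_samples = proxies[name]
        return estimated_rtt_ms(client_coord, coord) + mean(ttfb_samples)

    return min(proxies, key=score)


if __name__ == "__main__":
    proxies = {
        "near_but_loaded": ((1.0, 0.0), [500.0, 700.0]),
        "far_but_idle": ((30.0, 40.0), [20.0, 40.0]),
    }
    print(pick_proxy((0.0, 0.0), proxies))
```

In this example the nearby proxy is skipped because its observed Time-to-First-Byte outweighs its short network path, which mirrors the behaviour described in the abstract of avoiding heavily loaded proxies.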
Managing Object Versioning in Geo-Distributed Object Storage Systems
ScienceCloud '16 - ACM 7th Workshop on Scientific Cloud Computing
Object versioning is the keystone for implementing eventual consistency in modern geo-distributed object storage systems such as Amazon S3. Despite this, implementing object versioning has received little attention in either the academic or industrial communities. The choice of implementation method is not considered an important factor in overall system performance under different workloads. In this paper, we study two methods of implementing object versioning in geo-distributed object storage systems and how they affect the performance of these systems under different workloads. We propose and analyze the advantages and disadvantages of (1) a Write-Repair approach and (2) a Read-Repair approach. Our experiments show that the choice of approach significantly impacts the performance of storage systems, which in turn impacts the performance of applications built on top of these systems.
Geo-Replicated Buckets: An Optimistic Geo-Replication Shim for Key-Value Stores
KTH, School of Information and Communication Technology (ICT)
This work introduces GeoD and VersionD: middleware for key-value stores to transparently enable geo-replication and object versioning without any changes to the database stack.
A Name Is Not A Name: The Implementation Of A Cloud Storage System
APSys '15 - 6th Asia-Pacific Workshop on Systems
Automatic conflict resolution in cloud storage services has been well studied. However, how to correctly implement it in real-world systems remains challenging. This paper presents the challenges we experienced when implementing our cloud storage system.