Publications

You can also find my articles on my Google Scholar profile.

HARMONI: Hierarchical ARchitecture MOdeling for LLMs with Near/In Memory Computing

Published in ISPASS, 2026

A hierarchical architecture modeling framework for evaluating near-memory and in-memory computing solutions for large language model inference.

Recommended citation: Khyati Kiyawat, Yasas Seneviratne, Zhenxing Fan, Morteza Baradaran, Kevin Skadron. "HARMONI: Hierarchical ARchitecture MOdeling for LLMs with Near/In Memory Computing." ISPASS, 2026.

Sangam: A Chiplet-Based DRAM-PIM Accelerator with CXL Integration for LLM Inferencing

Published in arXiv, 2025

A chiplet-based DRAM processing-in-memory accelerator with CXL integration designed for efficient LLM inference.

Recommended citation: Khyati Kiyawat, Zhenxing Fan, Yasas Seneviratne, Morteza Baradaran, Akhil Shekar, Zihan Xia, Mingu Kang, Kevin Skadron. "Sangam: A Chiplet-Based DRAM-PIM Accelerator with CXL Integration for LLM Inferencing." arXiv, Nov 2025.

TriPIM — Exact Triangle Counting on UPMEM PIM for Graph Analytics

Published in MemSys, 2025

An exact triangle counting implementation on the UPMEM PIM architecture for graph analytics workloads. (Late Breaking Results)

Recommended citation: Morteza Baradaran, Khyati Kiyawat, Akhil Shekar, Abdullah Mughrabi, Kevin Skadron. "TriPIM — Exact Triangle Counting on UPMEM PIM for Graph Analytics." MemSys, 2025.
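For readers unfamiliar with the problem, exact triangle counting can be illustrated with a minimal CPU-only sketch. This is a generic degree-ordered set-intersection method, not the TriPIM implementation and not UPMEM code; the function name `count_triangles` and the edge-list input format are assumptions made for illustration.

```python
def count_triangles(edges):
    """Count triangles exactly in an undirected graph given as (u, v) pairs.

    Illustrative only: a generic degree-ordered intersection method,
    not the TriPIM algorithm or its UPMEM mapping.
    """
    # Build an undirected adjacency map, ignoring self-loops and duplicates.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Rank nodes by (degree, id) so every edge gets a single orientation.
    rank = {node: (len(nbrs), node) for node, nbrs in adj.items()}

    # Keep only neighbors of higher rank; each triangle is then found once.
    out = {
        node: {n for n in nbrs if rank[n] > rank[node]}
        for node, nbrs in adj.items()
    }

    total = 0
    for u, higher in out.items():
        for v in higher:
            # Common higher-ranked neighbors of u and v close a triangle.
            total += len(higher & out[v])
    return total


# A complete graph on four nodes contains four triangles.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_triangles(k4))
```

Orienting each edge from the lower-ranked to the higher-ranked endpoint keeps the per-node neighbor sets small on skewed graphs and avoids counting a triangle more than once.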

Membrane: Accelerating Database Analytics with DRAM-Based PIM Filtering and Schema Denormalization

Published in ACM TACO, 2025

A system that accelerates database analytics by combining DRAM-based PIM filtering with schema denormalization.

Recommended citation: Akhil Shekar, Kevin Gaffney, Martin Prammer, Khyati Kiyawat, Lingxi Wu, Helena Caminal, Zhenxing Fan, Yimin Gao, Ashish Venkat, José F. Martínez, Jignesh Patel, Kevin Skadron. "Membrane: Accelerating Database Analytics with DRAM-Based PIM Filtering and Schema Denormalization." ACM TACO, 2025.

Architectural Modeling and Benchmarking for Digital DRAM PIM

Published in IISWC, 2024

A framework for architectural modeling and benchmarking of digital DRAM processing-in-memory architectures.

Recommended citation: F. Siddique, D. Guo, Z. Fan, M. Gholamrezaei, M. Baradaran, A. Ahmed, H. Abbot, K. Durrer, K. Nandagopal, E. Ermovick, K. Kiyawat, B. Gul, A. Mughrabi, A. Venkat, K. Skadron. "Architectural Modeling and Benchmarking for Digital DRAM PIM." IISWC, 2024.

An Efficient Scaling-Free Folded Hyperbolic CORDIC Design Using a Novel Low-Complexity Power-of-2 Taylor Series Approximation

Published in IEEE TVLSI, 2023

Hyperbolic trigonometric functions are widely used in engineering and scientific applications, including digital signal processing (DSP) and communication systems. In this article, we propose a scaling-free hyperbolic coordinate rotation digital computer (CORDIC) algorithm and its architecture, based on a novel low-complexity Taylor series approximation with power-of-2 coefficients, to implement the sinh and cosh functions. CORDIC architectures are generally slow because of their high computation latency. The proposed architecture reduces the latency and achieves the desired precision with only four iterations by mapping an optimized angle set of six CORDIC microrotations onto a four-stage folded-pipeline structure that exploits the mutually exclusive behavior of two pairs of microrotations. The proposed design is implemented on a Xilinx ZedBoard field-programmable gate array (FPGA), where it uses 65.38% fewer registers, has ~63.63% lower latency, and consumes 48.97% less power than the best of the existing designs. The design is also synthesized with Synopsys Design Compiler and a place-and-route (PnR) tool in the Taiwan Semiconductor Manufacturing Company (TSMC) 65-nm CMOS process. It occupies ~76.31% less area, has 68.75% lower computational delay, and consumes 68.92% less power than the best of the existing designs. Moreover, the proposed architecture requires 46.89% less energy per output (EPO) than the best of the existing designs. The error–energy performance (EEP) and the error–area performance (EAP) of the proposed design are, respectively, ~1.25 times and ~2.8 times better than those of the best of the existing designs. In addition, the proposed architecture is implemented and verified on a silicon chip in the TSMC 180-nm CMOS process to validate the algorithm and architecture.

Recommended citation: A. Verma, K. Kiyawat, B. P. Das and P. K. Meher, "An Efficient Scaling-Free Folded Hyperbolic CORDIC Design Using a Novel Low-Complexity Power-of-2 Taylor Series Approximation," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 31, no. 8, pp. 1167-1177, Aug. 2023.
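As background for the abstract above, the following is a minimal floating-point sketch of the conventional (scaled) hyperbolic CORDIC in rotation mode. It is not the scaling-free folded design proposed in the paper: it uses the textbook shift-and-add recurrence, the standard repeated iterations at indices 4 and 13, and an explicit gain correction, which is the kind of overhead the paper aims to avoid. The function name `hyperbolic_cordic` and the default iteration count are assumptions chosen for illustration.

```python
import math


def hyperbolic_cordic(theta, n_iter=20):
    """Approximate (cosh(theta), sinh(theta)) with conventional hyperbolic CORDIC.

    Illustrative baseline only; valid for roughly |theta| <= 1.1.
    """
    # Shift indices start at 1; indices 4, 13, 40, ... are repeated,
    # which the hyperbolic variant needs in order to converge.
    indices = []
    i, next_repeat = 1, 4
    while len(indices) < n_iter:
        indices.append(i)
        if i == next_repeat and len(indices) < n_iter:
            indices.append(i)
            next_repeat = 3 * next_repeat + 1
        i += 1

    # Accumulated gain of the microrotations; pre-dividing x by it
    # makes the final x and y come out unscaled.
    gain = 1.0
    for k in indices:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * k))

    x, y, z = 1.0 / gain, 0.0, theta
    for k in indices:
        d = 1.0 if z >= 0 else -1.0   # rotate toward z = 0
        shift = 2.0 ** (-k)           # a right shift in hardware
        x, y = x + d * y * shift, y + d * x * shift
        z -= d * math.atanh(shift)    # table lookup in hardware
    return x, y


cosh_approx, sinh_approx = hyperbolic_cordic(0.5)
print(cosh_approx, math.cosh(0.5))
print(sinh_approx, math.sinh(0.5))
```

Each iteration contributes roughly one bit of precision, so this baseline needs many more iterations than the four reported in the abstract, and it still requires the gain correction.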

Real-time minimum energy point tracking using a predetermined optimal voltage setting strategy

Published in ISVLSI, 2020

Minimizing the energy consumption of processors for a given computational workload is highly desirable for a mature, energy-efficient, information-oriented society. In this paper, we refer to the pair of supply voltage (VDD) and threshold voltage (VTH) that minimizes the energy consumption of the processor under a given computational workload as the minimum energy point (MEP). Because always running at the MEP greatly reduces the energy consumption of processors without fundamentally degrading performance, many methods for tracking the MEP at runtime have been investigated over the past several years. However, to the best of our knowledge, all previous methods rely on time-consuming power measurement to identify the MEP at runtime, which prevents real-time tracking of the MEP. This paper proposes a real-time MEP tracking method based on a predetermined MEP curve, which is characterized as a linear model for each chip during the boot phase. Experimental results obtained using a 50-stage fanout-4 inverter chain, designed to reflect the behavior of a microprocessor pipeline, demonstrate that the energy loss introduced by the linear-approximation MEP model is only 3.1% in the worst case.

Recommended citation: K. Kiyawat, Y. Masuda, J. Shiomi and T. Ishihara, "Real-Time Minimum Energy Point Tracking Using a Predetermined Optimal Voltage Setting Strategy", 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 415-421, 2020.