GPGPU 2025

The 17th Workshop on General Purpose Processing Using GPU (GPGPU 2025)

Las Vegas, NV, USA.

March 1, 2025. Afternoon, Half Day

Zoom Link: TBD

GPUs are delivering more and more computing power required by modern society. With the growing popularity of massively parallel devices, users demand better performance, programmability, reliability, and security. The goal of this workshop is to provide a forum to discuss massively parallel applications, environments, platforms, and architectures, as well as infrastructures that facilitate related research.
Authors are invited to submit papers of original research in the general area of GPU computing and architectures. Topics include, but are not limited to:

GPU Architecture and Hardware

Next-generation GPU architectures
Energy-efficient GPU designs
Scalable multi-GPU systems
GPU memory hierarchies and management

Programming Models and Compilers

High-level programming abstractions for GPUs
Compiler optimizations for GPU codes
Source-to-source translations and tools
Debugging and profiling tools for GPUs

GPU Algorithms and Data Structures

Parallel algorithms tailored for GPUs
Data structures optimized for GPU memory hierarchies
Algorithmic primitives and building blocks

Performance Optimization Techniques

Performance modeling and benchmarking
Auto-tuning and performance portability
Techniques for reducing communication overheads

GPU Applications

Case studies of real-world GPU applications
GPU applications in scientific computing, machine learning, large language models, graphics, and emerging fields (e.g., quantum, neuromorphic, bioinformatics, and genomics)
Performance comparisons between GPU and other parallel computing platforms

Integration of GPUs with Other Technologies

GPU and FPGA co-processing
Hybrid systems (e.g., CPU-GPU, GPU-TPU integration)
Cloud-based GPU computing

Challenges and Future Trends

Reliability and fault tolerance in GPU systems
Security and privacy concerns in GPU computing
The future of heterogeneity in computing platforms
GPU programming and architecture education

Workshop Program

13:00 - 14:00	[Keynote] Canceled due to unexpected issues; the workshop will begin at 2 PM.

14:00 - 14:30	Session 1: GPU Architecture Exploration
	[Paper] Performance Impact and Trade-Offs for Tuning Key Architectural Parameters on CPU+GPU Systems. Kazi Asifuzzaman (Oak Ridge National Laboratory), Narasinga Rao Miniskar (Oak Ridge National Laboratory), William Godoy (Oak Ridge National Laboratory), Oscar Hernandez (Oak Ridge National Laboratory) and Jeffrey Vetter (Oak Ridge National Laboratory)
	[Paper] Exploring the Wafer-Scale GPUs. Daoxuan Xu (William & Mary), Le Xu (Byte Dance Inc.), Jie Ren (William & Mary) and Yifan Sun (William & Mary)
14:30 - 15:00	Session 2: GPU Characterization and Modeling
	[Paper] Modeling Utilization to Identify Shared-memory Atomic Bottlenecks. Rongcui Dong (University of Rochester) and Sreepathi Pai (University of Rochester)
	[Paper] Uncovering Detailed Power Characterizations of GPUs on Edge Platforms. Mujahid Al Rafi (UC Merced), Kevin Chau (UC Merced) and Hyeran Jeon (UC Merced)
15:00 - 15:30	Coffee Break
15:30 - 16:00	Session 3: Tensor Cores and Memory Accelerators
	[Paper] Can Tensor Cores Benefit Memory-Bound Kernels? (NO!) Lingqi Zhang (RIKEN Center for Computational Science), Jiajun Huang (UC Riverside), Sheng Di (Argonne National Laboratory), Satoshi Matsuoka (RIKEN Center for Computational Science) and Mohamed Wahib (RIKEN Center for Computational Science)
	[Paper] ACTA: Automatic Configuration of the Tensor Memory Accelerator for High-End GPUs. Nicolás Meseguer (Universidad de Murcia), Yifan Sun (William & Mary), Michael Pellauer (NVIDIA), José L. Abellán (Universidad de Murcia) and Manuel E. Acacio (Universidad de Murcia)
16:00 - 16:45	Session 4: GPU Applications and Algorithm Optimization
	[Paper] Efficient Parallel Implementation of Non-Local Means Algorithm on GPU. Xiang Li (Nanjing University), Qiong Chang (Institute of Science Tokyo), Yun Li (Nanjing University) and Jun Miyazaki (Institute of Science Tokyo)
	[Paper] Evaluating Parallel Sliding Window Techniques: Algorithmic and Multi-GPU Advancements with Fast-PII. Seth Ockerman (University of Wisconsin-Madison) and Erin Carrier (Grand Valley State University)
	[Paper] Optimizing Auto-tuning of OpenMP Offload kernels for performance and power. Nafis Mustakin (UC Riverside) and Daniel Wong (UC Riverside)

Speaker: Yingyan (Celine) Lin (Georgia Institute of Technology)
Title: [Canceled] Towards Ubiquitous 3D Intelligence through Cross-Layer Algorithm-Hardware Synergy

Abstract: Real-time 3D intelligence is set to revolutionize a broad array of applications—including robotics, digital twins, and telepresence—yet achieving instant reconstruction and seamless rendering at scale remains a significant challenge. In this talk, I will introduce our recent work on advancing real-time 3D intelligence through cross-layer algorithm-hardware innovations. This collaborative effort showcases how cross-layer synergy—from adaptive algorithms to specialized hardware—can unlock scalable, instantaneous 3D intelligence across a wide range of devices and domains, potentially inspiring further integrated innovations toward ubiquitous 3D intelligence.

Bio: Yingyan (Celine) Lin is an associate professor in the School of Computer Science at the Georgia Institute of Technology, where she also serves as the co-director of the Center for Advancing Responsible Computing (CARE). Her research group focuses on developing efficient machine learning solutions through cross-layer innovations, from AI algorithms and hardware accelerators to AI acceleration chips, aiming to promote green AI and enable ubiquitous AI-powered intelligence. Their research has received various recognitions, including first place at the ACM SIGDA University Demonstration at DAC 2022, first place in the ACM/IEEE TinyML Design Contest at ICCAD 2022, and an IEEE Micro Top Pick of 2023. Additionally, their work has been spotlighted at ICLR in 2020, 2021, and 2025, presented as an oral paper at ECCV 2024, and most recently, received the Best Paper Award at MICRO 2024.

Important Dates (Tentative) (11:59 pm, Anywhere on Earth)

Papers due: December 16, 2024 (extended from December 2)
Notification: January 20, 2025
Final paper due: February 17, 2025

Submission Guidelines

Full paper submissions must be in PDF format for A4 or US letter-size paper. They must not exceed 6 pages (excluding references) in standard ACM two-column sigplan format (review mode, sigplan template). Authors may choose whether to reveal their identity in the submission. Templates for the ACM format are available for Microsoft Word and LaTeX at: https://www.acm.org/publications/proceedings-template

Submission Site: GPGPU 2025

Workshop Organizers


Hyeran Jeon (UC Merced), Co-chair
Hyeran Jeon is an Associate Professor in the Department of Computer Science and Engineering at the University of California, Merced. She received her PhD at the University of Southern California. Her research interests lie in energy-efficient, reliable, and secure GPU architectures. She received an NSF CAREER award in 2024.

Yifan Sun (William & Mary), Co-chair
Yifan Sun has been an Assistant Professor in the Department of Computer Science at William & Mary since Fall 2020. He received his Ph.D. degree from the Department of Electrical and Computer Engineering at Northeastern University in 2020. His research interests lie in GPU architecture, performance evaluation, and performance modeling.

Daniel Wong (UC Riverside), Co-chair
Daniel Wong is an Associate Professor in the Department of Electrical and Computer Engineering at the University of California, Riverside. He received his PhD in Electrical Engineering at the University of Southern California (USC). His research spans GPU architecture, high-performance computing, and warehouse-scale computing. His current research focuses on energy-efficient and high-performance computing systems, from datacenter scale to micro-architectures. His research has been recognized with an IEEE Micro Top Picks selection in 2012 and an NSF CAREER award in 2020.

Nafis Mustakin (UC Riverside), Publication Chair
Yuan Feng (UC Merced), Web Chair

Please contact the organizers if you have any questions.

Program Committee

Dongho Ha (MangoBoost Inc.)
Hongyuan Liu (The Hong Kong University of Science and Technology)
Seonjin Na (Georgia Institute of Technology)
Harisankar Sadasivan (AMD)
Devashree Tripathy (IIT Bhubaneswar)
Jinyang Liu (University of Houston)
Yujia Zhai (NVIDIA)
Jie Ren (William & Mary)

History and Impact

David Kaeli (Northeastern) and John Cavazos (Delaware) started the GPGPU workshop series, which was first held in 2007 at Northeastern University. In 2008, the workshop was held with ASPLOS 2008. This trend continued, and the workshop was held with ASPLOS for the next 6 years. From 2015 to 2018, the GPGPU workshop was co-located with PPoPP. In 2019 and 2020, the GPGPU workshop was co-hosted by Adwait Jog (William & Mary), Onur Kayiran (AMD), and Ashutosh Pattnaik (ARM). The average citation count (per Google Scholar) for a GPGPU workshop paper is currently 37.5, and 8 influential papers have 100+ citations.

Previous versions of the GPGPU workshop: