Please turn JavaScript on

hgpu

Want to stay in touch with the latest updates from Hgpu? That's easy! Just subscribe clicking the Follow button below, choose topics or keywords for filtering if you want to, and we send the news to your inbox, to your phone via push notifications or we put them on your personal page here on Specificfeeds.

Reading your RSS feed has never been easier!

Website title: High performance computing on graphics processing units | hgpu.org

Is this your feed? Claim it!

Publisher:  Unclaimed!
Message frequency:  0.47 / day

Message History

Large Language Models (LLMs) show promise for code compilation tasks, but applying them to runtime performance tuning is difficult due to complex microarchitectural effects and noisy runtime measurements. We present AutoPass, a multi-agent framework for compiler performance tuning that uses compiler and runtime evidence to guide LLM-generated optimization decisions. Rather th...


Read full story

Agentic kernel optimization automates manual GPU kernel tuning via iterative generation, validation, and profiling with reasoning LLMs, casting the optimization task as feedback-guided search. However, our workload characterization reveals three system-level inefficiencies that limit search efficiency: (1) long generation latency due to LLM reasoning, (2) insufficient profili...


Read full story

We present KernelPro, a closed-loop multi-agent system that automatically generates, profiles, and iteratively optimizes GPU kernel code by integrating large language model (LLM) code generation with hardware profiler feedback and pluggable bottleneck detection tools. KernelPro introduces four contributions: (1) a semantic feedback operator that encodes expert heuristics as p...


Read full story

LLM-based coding agents need higher-level operational knowledge about a repository (which files house which subsystems, how to run the test suite, which workflows have historically led to wrong fixes) that does not exist in the code itself. Engineers typically maintain AGENTS.md files to supply this context as instructions for coding agents, but whether they help is contested...


Read full story

Benchmarks for LLM-generated GPU kernels (KernelBench, TritonBench, GEAK) score correctness through fixed-shape, small-sample allclose-style checks. The number of inputs varies between benchmarks. The shape, dtype, and tolerance are fixed for each kernel. We test that oracle empirically. We construct a controlled corpus of 24 Triton and CPU stand-in kernels (15 correct contro...


Read full story