USCMS Researcher: Yao Yao



Postdoc dates: Oct 2023 - Mar 2026

Home Institution: Purdue University


Project: Automating algorithm loading and execution on GPUs for SONIC

Automating the process of loading and executing algorithms on GPUs is an essential aspect of the SONIC project. SONIC, short for Services for Optimized Network Inference on Coprocessors, aims to optimize computing-resource utilization for large-scale data processing that uses ML and non-ML algorithms to identify and categorize reconstructed particles from collisions.

More information: My project proposal

Mentors:
  • Miaoyuan Liu (Purdue University)

Presentations
Current Status


2025 Q4

  • Progress
    • Wrote producers in CMSSW for LST to run inference on the Triton server.
    • After discussion with LST experts and further debugging, confirmed that the current LST input HostCollection contains pointers to RecHits that are not used in LST inference itself but are consumed later by the OutputConverter in the data processing workflow. We agreed that these should not be sent to the server, so the producer was restructured to omit them, with corresponding updates to the CMS task chain in CMSSW.
    • Development is currently ongoing and nearing the testing stage.
    • Organizing a SONIC hackathon in early 2026.
    • Preparing an ML model metadata survey with the MLG Production group and Knowledge group. This serves as preparation for future work on harmonizing the interface between direct inference and as-a-service inference in CMSSW, removing hard-coded model paths and ambiguity in model versions. It may also support automatically restructuring the input and output tensors for per-event batch inference.
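
The metadata survey described above could collect, for each model, the information needed to remove hard-coded paths and version ambiguity. A minimal sketch of what one survey entry might look like follows; the field names and the example model are illustrative assumptions, not the actual survey schema.

```python
# Hypothetical sketch of one entry in a model-metadata survey.
# All field names and values are illustrative assumptions.
model_entry = {
    "model_name": "example_tagger",              # assumed example name
    "model_version": "2",                        # explicit version instead of "latest"
    "repository_path": "/models/example_tagger", # replaces a hard-coded path in a producer
    "inference_modes": ["direct", "as-a-service"],
    "inputs": [
        {"name": "features", "dtype": "FP32", "shape": [-1, 25, 100]},  # -1 = dynamic batch
    ],
    "outputs": [
        {"name": "softmax", "dtype": "FP32", "shape": [-1, 8]},
    ],
}

def has_dynamic_batch(entry):
    """A model can serve per-event batch inference only if every tensor's
    leading dimension is dynamic (-1)."""
    tensors = entry["inputs"] + entry["outputs"]
    return all(t["shape"][0] == -1 for t in tensors)
```

Recording tensor names, dtypes, and shapes per model would also make it possible to check automatically whether inputs can be restructured for per-event batch inference.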


2025 Q3

  • Progress
    • Understood the compilation procedure of the LST Alpaka standalone code, including linked libraries, included headers, and external packages. Also noted that LST now uses Structure-of-Arrays (SoA) input/output objects, in contrast to the previous CUDA version.
    • Successfully developed a custom backend for the LST Alpaka standalone code. A wrapper function was added to the LST code to avoid exposing CMSSW data formats during backend compilation. The LST algorithm now loads successfully on both CPU and GPU server instances after resolving linking issues for dependencies required by the LST shared object.
    • Moved to the testing phase for the backend. Extracted a single-event input and output by running the LST standalone code. A small discrepancy was observed between CPU and GPU inference outputs; after consulting experts, this behavior was confirmed to be expected.
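
The small CPU/GPU discrepancy mentioned above is a generic floating-point effect: GPU kernels typically reorder reductions, and floating-point addition is not associative, so results can differ at the rounding level. The toy example below (not the actual LST outputs) demonstrates the non-associativity and why such comparisons use a tolerance rather than exact equality.

```python
import numpy as np

# Float32 addition is not associative: the same three numbers summed in a
# different order give different results. GPU reductions reorder sums, so
# small CPU/GPU output differences are expected behavior.
a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(1.0)
left = (a + b) + c   # CPU-style sequential order
right = a + (b + c)  # reordered, as a parallel reduction might do

# Consequently, CPU vs GPU outputs are validated with a tolerance:
def outputs_agree(cpu_out, gpu_out, rtol=1e-4, atol=1e-6):
    return np.allclose(cpu_out, gpu_out, rtol=rtol, atol=atol)
```

Here `left` and `right` differ by a full unit even though the inputs are identical, which is why bit-for-bit agreement between backends is not expected.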


2025 Q2

  • Progress
    • Successfully developed an Alpaka demo on the Nvidia Triton server as a preliminary test before implementing the backend for LST. Created an Alpaka demo that adds two float vectors and outputs a float vector. Modified the code so that input and output types are compatible with the Triton server. The code was compiled similarly to the LST standalone code, using a Makefile, linking the Alpaka and Boost libraries from /cvmfs, and generating both a CUDA shared object and a CPU shared object.
    • Developed a custom backend to load the Alpaka demo code into the Triton server. This included supporting multiple inputs per inference request and defining input and output buffers. All backend changes were adapted for compatibility with the more recent Triton server version 24.11, compared to the previous LST on SONIC setup, which used server version 21.04. Both shared objects are successfully loaded by the Nvidia Triton server and produce correct outputs. Some additional sanity checks are still ongoing.
    • Collaborated with the University of Florida team (supervised by Prof. Philip Chang) to assist student Alexandra Aponte in initiating the project to adapt the latest LST code for SONIC.
    • Mentored graduate students Ethan Colbert and Arghya Ranjan Das in developing a producer to support GloParT on the Nvidia Triton server. This work is currently in progress.
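
The ongoing sanity checks for the vector-add demo can be expressed as a comparison against a trivial reference computation. The sketch below is a conceptual reconstruction, not the actual validation code; the tensor semantics (backend returns the elementwise sum of two FP32 inputs) follow the demo described above.

```python
import numpy as np

# Reference check for a vector-add backend: the server is expected to
# return input0 + input1 for two FP32 vectors of equal length.
def reference_vector_add(input0, input1):
    assert input0.dtype == np.float32 and input1.dtype == np.float32
    assert input0.shape == input1.shape
    return input0 + input1

def check_backend_output(input0, input1, backend_output, atol=1e-6):
    """Compare a backend response against the numpy reference."""
    return np.allclose(backend_output, reference_vector_add(input0, input1), atol=atol)
```

Running the same check against both the CPU and the CUDA shared object exercises the multiple-inputs-per-request path end to end.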


2025 Q1

  • Progress
    • Used perf_analyzer to study the throughput and latency of the CUDA code for LST on the Nvidia Triton server. Observed that input/output data transmission contributes noticeably to the overall latency.
    • Investigated the compilation process of the standalone LST Alpaka code. Developed a “helloWorld” Alpaka example that follows the same compilation procedure as the LST standalone code using a Makefile, and produces a shared object for loading into the Triton server. Plan to test the “helloWorld” example with the Triton server as an initial step toward understanding the technical requirements for running Alpaka code with Triton.
    • The SONIC producer for UParTAK4_V01 has been successfully merged into CMSSW.
    • Provided a tutorial to Purdue graduate student Yibo Zhong, who successfully developed the SONIC producer for MLPF. Investigated output differences between CPU direct inference and GPU direct inference, where the Triton server produced NaN values in the output. The issue was traced to onnxruntime compatibility with the cuDNN version. Updating to a more recent CMSSW version resolved the inference discrepancy between CPU and GPU, and switching to a newer Triton server version eliminated the NaN output issue.
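
Debugging the MLPF NaN issue above amounts to distinguishing two failure modes: NaN contamination (here traced to onnxruntime/cuDNN compatibility) versus ordinary numerical disagreement between CPU and GPU. A conceptual reconstruction of such a comparison, not the actual validation script, might look like this:

```python
import numpy as np

# Compare CPU direct-inference output against server output, flagging NaNs
# separately from ordinary numerical mismatches, since they point to
# different root causes (library incompatibility vs. float reordering).
def compare_outputs(cpu_out, server_out, rtol=1e-4, atol=1e-6):
    nan_mask = np.isnan(server_out)
    if nan_mask.any():
        return {"status": "nan", "nan_fraction": float(nan_mask.mean())}
    if not np.allclose(cpu_out, server_out, rtol=rtol, atol=atol):
        return {"status": "mismatch",
                "max_abs_diff": float(np.max(np.abs(cpu_out - server_out)))}
    return {"status": "ok"}
```

Separating the two checks made it clear that the NaNs required a server/runtime upgrade, while the residual CPU/GPU difference was a distinct, tolerance-level issue.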


2024 Q3

  • Progress
    • The ParticleTransformer SONIC producer was merged into CMSSW in August. The Run3 AOD-to-MiniAOD workflow is currently being tested with tasks prepared by PPD, and jobs are expected to run on Purdue resources. This testing is being carried out by Dr. Lisa Paspalaki and Dr. Dmitry Kondratyev and is still ongoing.
    • Regarding the model issue with the Unified Particle Transformer, it was confirmed with the author that the dimension swap within their model does not affect inference results. Therefore, the problem lies in the model’s conversion from PyTorch to ONNX.
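
The author's point that an internal dimension swap does not affect inference results can be illustrated with a toy example (this is not the Unified Particle Transformer's actual architecture): if a layer acts elementwise or per-particle, applying it in a transposed layout and transposing back is mathematically identical to applying it directly.

```python
import numpy as np

# Toy illustration: an elementwise layer is layout-independent, so an
# internal (particles, features) <-> (features, particles) swap followed by
# a swap back leaves the result unchanged.
rng = np.random.default_rng(1)
x = rng.standard_normal((8, 16)).astype(np.float32)  # (particles, features)

def act(t):
    return np.maximum(t, 0.0)  # elementwise ReLU, independent of axis order

direct = act(x)
swapped = act(x.T).T  # swap dims, apply the layer, swap back
```

Since the swap is inference-neutral, any output discrepancy must come from the PyTorch-to-ONNX conversion step itself, as concluded above.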


2024 Q2

  • Progress
    • Completed the ParticleTransformer SONIC producer implementation; the code was reviewed and is ready to be integrated into CMSSW.
    • Modified the Unified Particle Transformer PyTorch-to-ONNX model conversion, after discussion with the model’s author, to support dynamic batching, enabling multiple jets to be sent to the server in a single request.
    • Nearly completed the Unified Particle Transformer producer implementation. Due to the high structural similarity to the ParticleTransformer producer, it is recommended that the producer structure be refactored to reduce code duplication.
    • Measured throughput for each ML model using perf_analyzer.
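
The point of the dynamic-batching change above is that a model whose leading tensor dimension is a free batch axis can score any number of jets in one request, and the batched result must match running each jet individually. The mock model below illustrates this invariant; it is a toy stand-in, not the Unified Particle Transformer.

```python
import numpy as np

# Toy illustration of dynamic batching: a model mapping
# (n_jets, 10) features -> (n_jets, 3) scores, with n_jets free.
rng = np.random.default_rng(2)
weights = rng.standard_normal((10, 3)).astype(np.float32)

def infer(batch):
    """Mock model; each output row depends only on its own input row."""
    return batch @ weights

jets = rng.standard_normal((5, 10)).astype(np.float32)
batched = infer(jets)                                   # one request, 5 jets
per_jet = np.vstack([infer(j[None, :]) for j in jets])  # 5 separate requests
```

Verifying that `batched` agrees with `per_jet` is exactly the consistency check that makes it safe to send multiple jets to the server in a single request.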


2023 Q4

  • Progress
    • Learned how to run the SONIC MiniAOD workflow on the Purdue Tier-2 cluster.
    • Learned how to measure throughput and latency using the available tools to evaluate the MiniAOD workflow performance for both GPU Triton server inference and CPU direct inference, as well as how to interpret the results.


Contact me: