Wangyang Ying

About Me

I am a Ph.D. candidate at Arizona State University in Tempe and have passed my final defense. I began my Ph.D. studies in Spring 2023. Before that, I received my Bachelor's (2016) and Master's (2019) degrees from Sichuan University. After my Master's, I worked at Alibaba and Tencent, focusing on video recommendation and time-sensitive search algorithms, respectively.

My research interests lie in data-centric AI (improving AI performance by focusing on data quality and data processes), multi-agent reasoning, and scientific equation discovery. In particular, I focus on data-centric methods to enhance the robustness and effectiveness of machine learning; multi-agent frameworks for structured knowledge extraction and reasoning; and interpretable methods for equation discovery to uncover scientific patterns from data.

Education

Arizona State University - Ph.D. (2023 - 2026)
Sichuan University - M.S. (2016 - 2019)
Sichuan University - B.S. (2012 - 2016)

Research Interests

Data-Centric AI
Multi-Agent Reasoning
Scientific Equation Discovery

Selected Publications

TKDD 2025

Topology-aware Reinforcement Feature Space Reconstruction for Graph Data

Wangyang Ying, Haoyue Bai, Kunpeng Liu, Yanjie Fu

Paper

TIST 2025

Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation

Wangyang Ying, Nanxu Gong, Dongjie Wang, Yanjie Fu

project page / Paper

TKDD 2024

Feature Selection as Deep Sequential Generative Learning

Wangyang Ying, Dongjie Wang, Haifeng Chen, Yanjie Fu

Paper

CIKM 2024

Revolutionizing Biomarker Discovery: Leveraging Generative AI for Bio-Knowledge-Embedded Continuous Space Exploration

Wangyang Ying, Dongjie Wang, Xuanming Hu, Ji Qiu, Jin Park, Yanjie Fu

Paper

KDD 2024

Unsupervised Generative Feature Transformation via Graph Contrastive Pre-training and Multi-objective Fine-tuning

Wangyang Ying, Dongjie Wang, Xuanming Hu, Yuanchun Zhou, Charu C. Aggarwal, Yanjie Fu

Paper

ICDM 2023

Self-optimizing Feature Generation via Categorical Hashing Representation and Hierarchical Reinforcement Crossing

Wangyang Ying, Dongjie Wang, Kunpeng Liu, Leilei Sun, Yanjie Fu

Paper

FCS 2020

Sichuan Dialect Speech Recognition with Deep LSTM Network

Wangyang Ying, Lei Zhang, Hongli Deng

Paper

NeurIPS 2025

Sculpting Features from Noise: Reward-Guided Hierarchical Diffusion for Task-Optimal Feature Transformation

Nanxu Gong, Zijun Li, Sixun Dong, Haoyue Bai, Wangyang Ying, Xinyuan Wang, Yanjie Fu

Paper

TBD 2025

Knockoff-Guided Feature Selection via A Single Pre-trained Reinforced Agent

Xinyuan Wang, Dongjie Wang, Wangyang Ying, Rui Xie, Haifeng Chen, Yanjie Fu

Paper

WSC 2025

Supply Chain Optimization via Generative Simulation and Iterative Decision Policies

Haoyue Bai, Haoyu Wang, Nanxu Gong, Xinyuan Wang, Wangyang Ying, Haifeng Chen, Yanjie Fu

Paper

AAAI 2025

Evolutionary Large Language Model for Automated Feature Transformation

Nanxu Gong, Chandan K Reddy, Wangyang Ying, Haifeng Chen, Yanjie Fu

project page / Paper

npj AI 2025

Privacy-preserving Data Reprogramming

Haoyue Bai, Wangyang Ying, Nanxu Gong, Xinyuan Wang, Yanjie Fu

Paper

AAAI 2026

Efficient Post-Training Refinement of Latent Reasoning in Large Language Models

Xinyuan Wang, Dongjie Wang, Wangyang Ying, Haoyue Bai, Nanxu Gong, Sixun Dong, Kunpeng Liu, Yanjie Fu

Paper

AAAI 2026

Brownian Bridge Augmented Surrogate Simulation and Injection Planning for Geological CO₂ Storage

Haoyue Bai, Guodong Chen, Wangyang Ying, Xinyuan Wang, Nanxu Gong, Sixun Dong, Giulia Pedrielli, Haoyu Wang, Haifeng Chen, Yanjie Fu

Paper

IJCAI 2025

Unsupervised Feature Transformation via In-context Generation, Generator-critic LLM Agents, and Duet-play Teaming

Nanxu Gong, Xinyuan Wang, Wangyang Ying, Haoyue Bai, Sixun Dong, Haifeng Chen, Yanjie Fu

Paper

CIKM 2024

Reinforcement Feature Transformation for Polymer Property Performance Prediction

Xuanming Hu, Dongjie Wang, Wangyang Ying, Yanjie Fu

Paper

EMNLP 2022

Title2Event: Benchmarking Open Event Extraction with a Large-scale Chinese Title Dataset

Haolin Deng, Yanan Zhang, Yangfan Zhang, Wangyang Ying, Changlong Yu, Jun Gao, Wei Wang, Xiaoling Bai, Nan Yang, Jin Ma, et al.

Paper

Preprints

arXiv 2025

A Survey on Data-Centric AI: Tabular Learning from Reinforcement Learning and Generative AI Perspective

Wangyang Ying, Cong Wei, Nanxu Gong, Xinyuan Wang, Haoyue Bai, Arun Vignesh Malarkkan, Sixun Dong, Dongjie Wang, Denghui Zhang, Yanjie Fu

ArXiv

arXiv 2025

Data-Efficient Symbolic Regression via Foundation Model Distillation

Wangyang Ying, Jinghan Zhang, Haoyue Bai, Nanxu Gong, Xinyuan Wang, Kunpeng Liu, Chandan K Reddy, Yanjie Fu

ArXiv

arXiv 2025

Distribution Shift Aware Neural Tabular Learning

Wangyang Ying, Nanxu Gong, Dongjie Wang, Xinyuan Wang, Arun Vignesh Malarkkan, Vivek Gupta, Chandan K Reddy, Yanjie Fu

ArXiv

arXiv 2025

Bridging the Domain Gap in Equation Distillation with Reinforcement Feedback

Wangyang Ying, Haoyue Bai, Nanxu Gong, Xinyuan Wang, Sixun Dong, Haifeng Chen, Yanjie Fu

ArXiv

arXiv 2025

LLM-ML Teaming: Integrated Symbolic Decoding and Gradient Search for Valid and Stable Generative Feature Transformation

Xinyuan Wang, Haoyue Bai, Nanxu Gong, Wangyang Ying, Sixun Dong, Xiquan Cui, Yanjie Fu

ArXiv

arXiv 2025

Agentic Feature Augmentation: Unifying Selection and Generation with Teaming, Planning, and Memories

Nanxu Gong, Sixun Dong, Haoyue Bai, Xinyuan Wang, Wangyang Ying, Yanjie Fu

ArXiv

arXiv 2025

Towards Data-Centric AI: A Comprehensive Survey of Traditional, Reinforcement, and Generative Approaches for Tabular Data Transformation

Dongjie Wang, Yanyong Huang, Wangyang Ying, Haoyue Bai, Nanxu Gong, Xinyuan Wang, Sixun Dong, Tao Zhe, Kunpeng Liu, Meng Xiao, et al.

ArXiv

Work Experience

Research Intern

Data Science & System Security, NEC Laboratories America, Princeton

05/2025 - 08/2025

Developed multi-agent LLM frameworks for structured knowledge extraction (procedural graph representations) to support downstream retrieval-augmented generation (RAG). Explored how structured knowledge enables personalized LLM training by grounding user-specific workflows in structured representations.

Research Intern

Institute of High Performance Computing, A*STAR, Singapore

05/2024 - 08/2024

Investigated the trustworthiness of LLMs in medical applications, focusing on how jailbreak attacks compromise system reliability. Conducted a systematic analysis of jailbreak strategies as a foundation for designing future LLM safety and protection mechanisms.

Full Time

Platform and Content Group, Tencent, Beijing

11/2020 - 08/2022

Led algorithm design for time-sensitive search scenarios (e.g., weather, stock, news), serving hundreds of millions of users. Designed methods for query time-sensitivity detection, retrieval pipeline optimization, and time-aware ranking and presentation to enhance freshness and relevance in search results.

Full Time

Digital Media & Entertainment Group, Alibaba, Beijing

06/2019 - 10/2020

Built recommendation systems for long- and short-form video platforms (movies, TV shows, variety shows, and micro-videos). Worked on video content understanding (e.g., tagging, user profiling) and video retrieval, improving large-scale recommendation quality and user engagement.

News

[2025-12] Good News! I have passed my PhD defense.
[2025-11] Good News! Two papers have been accepted by AAAI 2026.
[2025-11] Good News! One paper has been accepted by IEEE TBD.
[2025-09] Good News! One paper has been accepted by ACM TKDD.
[2025-09] Good News! One paper has been accepted by NeurIPS 2025.
[2025-04] Good News! One paper has been accepted by IJCAI 2025.
[2025-03] I will join NEC Laboratories America, Princeton as a research intern during the summer (May–August 2025).
[2024-12] One paper has been accepted by AAAI 2025.
[2024-12] One paper has been accepted by ACM TIST.
[2024-08] One paper has been accepted by ACM TKDD.
[2024-07] Two papers have been accepted by CIKM 2024.
[2024-05] I will join A*STAR, Singapore as a research intern during the summer (May–August 2024).
[2024-05] One paper has been accepted by KDD 2024.

Ph.D. | SCAI, Arizona State University

I am on the job market for industry positions in search, recommendation, and advertising systems, starting in Spring 2026. I would be delighted to chat and truly appreciate any opportunity to connect and discuss potential roles!
