jweihe

I believe AGI will arrive soon.

My recent work focuses on LLM agents that can reason over multi-step tasks, use tools, interact with task environments, and improve through evaluation, data feedback, and reinforcement learning.

ByteDance · LLM algorithms

Alibaba ATH · Intern · multimodal LLMs

Tencent ARC · Intern · document AI

News

Recent Signals

✦ May 2026
SkillsBench became the fastest benchmark repository to reach 1k GitHub stars, reaching 1.1k within two months of release
✧ May 2026
SkillsBench appeared in recent model-card and release discussions, including those for Qwen 3.6 Plus and HY-3
◌ May 2026
SkillsBench now covers 86 tasks, 11 domains, 7,308 trajectories, and 40 indexed benchmarks
↗ May 2026
Harbor v0.6.5 was released for agent evaluation, task environments, and RL-ready rollout workflows
• Mar 2025
Started working on LLM algorithms at ByteDance

Selected Publications

Papers

arXiv 2026 · Agent Skills · 1.1k stars

SkillsBench: Benchmarking how well agent skills work across diverse tasks

★ 1.1k GitHub stars · 86 tasks · 11 domains · 7,308 trajectories

A benchmark for evaluating whether agent skills actually work across diverse tasks, separating skill quality from an agent’s ability to discover and use the right skill.

Paper Website GitHub

AAAI 2024 · CCF A · 99 citations

Anomaly-denoised pretraining · 3-level masking

ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection

A two-stage graph anomaly detection framework that pretrains graph autoencoders on anomaly-denoised graphs at node, edge, and subgraph levels, then retrains the decoder on the original graph to mitigate anomaly overfitting and homophily traps.

arXiv Code Scholar

ACM MM 2024 · CCF A

HGOE: Hybrid External and Internal Graph Outlier Exposure

Studies graph out-of-distribution detection with hybrid synthetic and internal outlier exposure, improving robustness when abnormal patterns are scarce or shift across graphs.

Paper arXiv

ACM MM 2024 · CCF A

RHKH: Relational Hypergraph Neural Network for N-ary Knowledge Hypergraphs

Models n-ary relational facts as hypergraphs for knowledge reasoning, moving beyond binary triples and capturing richer relational structures.

Paper OpenReview

Open Source

Projects

Fastest to 1k stars · Qwen model card

SkillsBench

1.1k stars · 86 tasks · 7,308 trajectories

A benchmark for testing whether agent skills actually work across tasks, models, and environments.

105 domain experts from Stanford, CMU, Berkeley, Oxford, Amazon, ByteDance, and more
11 domains and 40 indexed benchmarks
Referenced by leading model labs and recent agent-skill research

Website GitHub Paper