Xu Luo

I am currently pursuing a Ph.D. at UESTC, supervised by Prof. Jingkuan Song. My primary research interest lies in advancing the state of the art in generative AI, including diffusion models and (multimodal) large language models (LLMs). Beyond these, I am committed to realizing the potential of these models by shaping them into autonomous agents capable of performing complex tasks in real-world scenarios.

Email  /  CV  /  Google Scholar  /  Github

profile photo
News


[2024/09] Two papers accepted to NeurIPS 2024.
[2024/05] Thrilled to introduce our latest work, Lumina-T2X, a text-to-any-modality generation framework! Our models can generate high-resolution images & 720p videos of any aspect ratio, 3D multiview images, and audio conditioned on text.
[2023/10] I regret to say that I no longer find FSL a promising research topic. My focus has shifted from Few-Shot Learning (FSL) to the burgeoning field of generative AI. I am currently an intern at Shanghai AI Lab.
[2023/04] Our systematic study on few-shot learning has been accepted at ICML'23! This project demanded a significant investment of time and revealed numerous unexpected findings. Check it out now!
[2022/05] Our paper, which uncovers fundamental problems inherent in few-shot learning, was accepted at ICML 2022.
[2021/09] My first top-conference paper was accepted at NeurIPS 2021! The paper is not that good, but it was a great experience :)

Selected Publications
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Peng Gao*, Le Zhuo*, Dongyang Liu*, Ruoyi Du*, Xu Luo*, Longtian Qiu*, Yuhang Zhang, Chen Lin, Rongjie Huang, Shijie Geng, Renrui Zhang, Junlin Xi, Wenqi Shao, Zhengkai Jiang, Tianshuo Yang, Weicai Ye, Tong He, Jingwen He, Yu Qiao, Hongsheng Li
arXiv, 2024
[PDF] [Code]

Text-to-any-modality models that generate images, videos, audio, and 3D multiview images conditioned on text within a flow-based diffusion framework, using novel Flag-DiT architectures with up to 5B parameters and 128K-token context windows.

Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks
Xu Luo, Difan Zou, Lianli Gao, Zenglin Xu, Jingkuan Song
arXiv, 2023
[PDF]

Uncovering and analyzing the extreme feature redundancy of pretrained vision models when transferring to few-shot tasks.

DETA: Denoised Task Adaptation for Few-shot Learning
Ji Zhang, Lianli Gao, Xu Luo, Hengtao Shen, Jingkuan Song
ICCV, 2023
[PDF] [Code]

Proposing DETA, a framework that addresses potential data and label noise in downstream few-shot transfer tasks.

A Closer Look at Few-shot Classification Again
Xu Luo*, Hao Wu*, Ji Zhang, Lianli Gao, Jing Xu, Jingkuan Song
ICML, 2023
[PDF] [Code]

Empirically demonstrating that training and adaptation algorithms in few-shot classification can be disentangled and studied independently, with an analysis of each phase that yields several important observations.

Alleviating the Sample Selection Bias in Few-shot Learning by Removing Projection to the Centroid
Jing Xu, Xu Luo, Xinglin Pan, Yanan Li, Wenjie Pei, Zenglin Xu
NeurIPS, 2022   (Spotlight)
[PDF] [Code]

Revealing a strong bias caused by the centroid of features in each few-shot learning task, and designing a simple method that rectifies it by removing the dimension along the direction of the task centroid from the feature space.

Channel Importance Matters in Few-Shot Image Classification
Xu Luo, Jing Xu, Zenglin Xu
ICML, 2022
[PDF] [Code]

Revealing and analyzing the channel bias problem, which we find critical in few-shot learning, through a simple channel-wise feature transformation applied only at test time.

Rectifying the Shortcut Learning of Background for Few-Shot Learning
Xu Luo, Longhui Wei, Liangjian Wen, Jinrong Yang, Lingxi Xie, Zenglin Xu, Qi Tian
NeurIPS, 2021
[PDF] [Code]

Identifying image background as shortcut knowledge that does not generalize beyond training categories in few-shot learning. A novel framework, COSOC, is designed to tackle this problem.

Boosting Few-Shot Classification with View-Learnable Contrastive Learning
Xu Luo, Yuxuan Chen, Liangjian Wen, Lili Pan, Zenglin Xu
ICME, 2021
[PDF] [Code]

Applying contrastive learning to few-shot learning, with views generated in a learning-to-learn fashion.

Academic service


Conference reviewer

  • NeurIPS 2023, 2024
  • ICML 2022, 2024
  • ICLR 2024, 2025
  • CVPR 2023, 2024, 2025
  • ICCV 2023
  • ECCV 2022, 2024
  • CoLLAs 2023, 2024
  • AISTATS 2025

Journal reviewer

  • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
  • IEEE Transactions on Image Processing (TIP)


This well-designed template is borrowed from this guy.