Jie An  [安 捷]

Hi! I am an Applied Scientist at Amazon AGI, where I work on image and video generation models as well as omni understanding and generation models. Previously, I was a Research Scientist at Meta Reality Labs, where I worked on 3D generation and world modeling. I received my Ph.D. in Computer Science from the University of Rochester, advised by Prof. Jiebo Luo, where I received the ACM SIGMM Outstanding Ph.D. Thesis Award. Earlier, I earned my bachelor's and master's (with honors) degrees in Applied Mathematics from Peking University, advised by Prof. Jinwen Ma.

I was a research intern at Apple (Seattle, 2024–2025), working with Prof. Alexander Schwing. Before that, I interned at Microsoft Cloud & AI (Redmond, 2023–2024, now part of Microsoft AI) and Meta FAIR (New York City, 2022), hosted by Dr. Zhengyuan Yang and Prof. Harry Yang, respectively.

My research primarily focuses on visual content generation. I am particularly interested in foundational generative models, text-to-video generation, physics AI/world modeling, and artistic generation/style transfer. My research is driven by a long-term vision: from developing a fundamental understanding of generative models, to building increasingly capable foundation models, and ultimately advancing toward visual artificial general intelligence (VisualAGI).

Contact: pkuanjie [at] gmail [dot] com

Email  /  Google Scholar  /  GitHub  /  LinkedIn  /  Name Pronunciation

News

Thesis

Publications & Preprints

Latent-Reframe: Enabling Camera Control for Video Diffusion Model without Training
Zhenghong Zhou*, Jie An*, Jiebo Luo
ICCV 2025  |  Project Page  |  BibTex

We incorporate a 3D point cloud model based on MonST3R into a video diffusion model, enabling training-free control of arbitrary camera trajectories in video generation.

OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation
Jie An*, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Lijuan Wang, Jiebo Luo
ACM MM (BNI Track) 2024  |  BibTex

We introduce a benchmark dataset, an evaluation pipeline, and a set of baseline models for the interleaved image-text generation task.

Bring Metric Functions into Diffusion Models
Jie An, Zhengyuan Yang, Jianfeng Wang, Linjie Li, Zicheng Liu, Lijuan Wang, Jiebo Luo
IJCAI 2024  |  BibTex

We study how to use the LPIPS loss in diffusion model training to improve image generation quality.

Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation
Jie An*, Songyang Zhang*, Harry Yang, Sonal Gupta, Jia-Bin Huang, Jiebo Luo and Xi Yin
arXiv 2023  |  Project Page  |  BibTex

We propose an efficient text-to-video generation method based on a latent diffusion model and temporal shift.

QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity
Siyu Huang*, Jie An*, Donglai Wei, Jiebo Luo and Hanspeter Pfister
CVPR 2023  |  Code  |  BibTex

QuantArt allows the style transfer model to draw its reference from the whole artistic picture dataset, leading to improved visual fidelity.

Make-A-Video: Text-to-Video Generation without Text-Video Data
Uriel Singer*, Adam Polyak*, Thomas Hayes*, Xi Yin*, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, Devi Parikh, Sonal Gupta and Yaniv Taigman
ICLR 2023  |  BibTex

We propose a text-to-video generation method based on a diffusion model.

Domain-Scalable Unpaired Image Translation via Latent Space Anchoring
Siyu Huang*, Jie An*, Donglai Wei, Zudi Lin, Jiebo Luo and Hanspeter Pfister
TPAMI  |  Code  |  BibTex

We propose a GAN-based multi-domain image translation method that can extend to any unseen domain without the need to train the core backbone.

Facial Attribute Transformers for Precise and Robust Makeup Transfer
Zhaoyi Wan, Haoran Chen, Jie An, Wentao Jiang, Cong Yao and Jiebo Luo
WACV 2022  |  BibTex

We propose a new transformer-based method for makeup transfer and removal.

ArtFlow: Unbiased Image Style Transfer via Reversible Neural Flows
Jie An*, Siyu Huang*, Yibing Song, Dejing Dou, Wei Liu and Jiebo Luo
CVPR 2021  |  Code  |  BibTex

We propose an unbiased style transfer method based on neural flows to address the content leak issue in style transfer.

Global Image Sentiment Transfer
Jie An, Tianlang Chen, Songyang Zhang and Jiebo Luo
ICPR 2020  |  BibTex

We propose a method to transfer the global sentiment of images.

Ultrafast Photorealistic Style Transfer via Neural Architecture Search
Jie An*, Haoyi Xiong*, Jun Huan and Jiebo Luo
AAAI 2020 (Oral Presentation)  |  Code  |  BibTex

We propose a neural architecture search framework to discover efficient architectures for photorealistic style transfer.

Invited Talks

  • [2026/01] "On Diffusion-Based Visual Content Generation: From Base Model to Methodology and Evaluation" @ Apple, MLR
  • [2025/03] "On Diffusion-Based Visual Content Generation: From Base Model to Methodology and Evaluation" @ Meta, Reality Labs
  • [2025/02] "On Diffusion-Based Visual Content Generation: From Base Model to Methodology and Evaluation" @ Clemson University, Computer Science

Work & Internships

Amazon AGI
[2025/07 - Current] Applied Scientist
Video Generation Architecture, Parallelism, and Pre-training.
Omni Understanding and Generation Models.
Reinforcement Learning for Image Generation.
Meta Reality Labs
[2025/05 - 2025/07] AI Research Scientist
3D Generation and World Modeling.
Microsoft Cloud & AI
[2023/02 - 2024/04] Research Intern
Advisors: Zhengyuan Yang, Jianfeng Wang, Linjie Li, Lijuan Wang, Zicheng Liu
Project: Diffusion Model and Visual-Language Generation.

Collaborators

I am fortunate to collaborate with and, more importantly, learn from the following talented people:
  • [2025] Zhenghong Zhou, Ph.D., University of Rochester.
  • [2025] Jingyuan (Patrick) Chen, B.S., University of Rochester, now M.S. at University of Pennsylvania.
  • [2024] Yunlong Tang, Ph.D., University of Rochester.
  • [2023-2024] Junyu Chen, M.S., University of Rochester, now Ph.D. at University of Rochester.
  • [2023] Alexander Martin, B.S., University of Rochester, now Ph.D. at Johns Hopkins University.
  • [2023] Songyang Zhang, Ph.D., University of Rochester, now researcher at Utopic Studio.
  • [2023] Tao Li, Ph.D., Peking University.
  • [2022] Zhaoyi Wan, researcher, Megvii, now researcher at JQ Investments.
  • [2021-2023] Siyu Huang, researcher, Baidu, now professor at Clemson University.
  • [2018] Hanchao Li, intern, Megvii, now engineer at Microsoft (Redmond).

Academic Services

Conference Reviewer

  • CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, WACV, EMNLP, ACL, MM, AAAI, ICASSP, ICME

Journal Reviewer

  • TPAMI, TMM, TNNLS, APSIPA

Honors & Awards

  • [2025/10] SIGMM Award for Outstanding PhD Thesis
  • [2025/06] Selection for Doctoral Consortium, CVPR 2025
  • [2019/06] Outstanding Graduate of Beijing
  • [2018/10] Graduate student scholarship, Peking University
  • [2016/10] Graduate student scholarship, Peking University
  • [2015/06] "Guang Hua" scholarship, Peking University