Yefan Zhou

Hi, I'm Yefan. I am a CS PhD candidate (2023–) at Dartmouth College, advised by Prof. Yaoqing Yang. I am also a researcher at UC Berkeley/ICSI, working with Prof. Michael Mahoney.
I earned my Master's degree in EECS at UC Berkeley.


Scholar / LinkedIn / Email / Resume

profile photo
Research

I'm interested in improving the efficiency and transparency of machine learning models. My current research focuses on model diagnosis, utilizing high-dimensional features such as loss landscapes and weight matrix spectra. This research contributes to more efficient training, pruning, and fine-tuning of neural networks.

Selected first-author papers
Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training
{Yefan Zhou*, Tianyu Pang*}, Keqin Liu, Charles H. Martin, Michael Mahoney, Yaoqing Yang
NeurIPS 2023 Spotlight
Paper / Code / Video

[NN optimizer, Efficient training, Layer quality analysis]
We introduce a new learning rate scheduler, TempBalance, which "diagnoses" layer-wise training quality and dynamically adjusts the learning rate of each layer. The method quantifies and exploits the heavy-tailed structure of each layer's weight-matrix eigenspectrum.
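The idea can be sketched as follows. This is not the authors' implementation: the Hill-style tail-exponent estimator, the linear alpha-to-learning-rate mapping, and all function names here are simplifying assumptions for illustration.

```python
import numpy as np

def hill_alpha(eigs, k_frac=0.5):
    """Rough Hill-style estimate of the power-law tail exponent of an
    eigenvalue spectrum (hypothetical simplified variant)."""
    eigs = np.sort(np.asarray(eigs, dtype=float))
    k = max(int(len(eigs) * k_frac), 2)
    tail = eigs[-k:]  # k largest eigenvalues; tail[0] plays the role of x_min
    return 1.0 + k / np.sum(np.log(tail / tail[0]))

def per_layer_lrs(weight_mats, base_lr=0.1, lo=0.5, hi=1.5):
    """Map each layer's ESD tail exponent (alpha) to a per-layer learning
    rate: a less heavy-tailed spectrum (larger alpha, suggesting an
    under-trained layer) gets a larger learning rate."""
    alphas = []
    for W in weight_mats:
        svals = np.linalg.svd(W, compute_uv=False)
        alphas.append(hill_alpha(svals ** 2))  # eigenvalues of W^T W
    alphas = np.array(alphas)
    if alphas.max() == alphas.min():
        return [base_lr] * len(weight_mats)
    t = (alphas - alphas.min()) / (alphas.max() - alphas.min())
    return list(base_lr * (lo + t * (hi - lo)))
```

In this sketch the scaling range `[lo, hi]` around the base learning rate is a free design choice; the actual scheduler's mapping is more refined.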

AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
{Haiquan Lu*, Yefan Zhou*}, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, Yaoqing Yang
NeurIPS 2024
Paper

[LLM pruning, Efficient inference]
We introduce an LLM pruning method, AlphaPruning, which assigns layer-wise sparsity ratios to pre-trained LLMs based on weight-analysis metrics.
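A minimal sketch of the allocation step, under simplifying assumptions (the linear mapping, the mean-preserving recentering, and the function name are illustrative, not the paper's exact rule): given one heavy-tail metric per layer, spread a global sparsity target across layers so that layers with heavier-tailed spectra (smaller alpha, i.e., better-trained layers) are pruned less.

```python
import numpy as np

def allocate_sparsities(alphas, target=0.5, eps=0.2):
    """Map a per-layer metric (e.g., ESD power-law alpha) linearly into
    [target - eps, target + eps], keeping the mean sparsity at target.
    Larger alpha -> more pruning for that layer."""
    a = np.asarray(alphas, dtype=float)
    if a.max() == a.min():
        return np.full_like(a, target)
    t = (a - a.min()) / (a.max() - a.min())   # normalize metric to [0, 1]
    s = (target - eps) + 2 * eps * t          # linear map into the band
    s = s - s.mean() + target                 # recenter: mean sparsity == target
    return np.clip(s, 0.0, 1.0)
```

For example, `allocate_sparsities([2.0, 3.0, 4.0], target=0.5, eps=0.2)` yields `[0.3, 0.5, 0.7]`: the layer with the heaviest tail keeps the most weights.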

Sharpness-diversity tradeoff: improving flat ensembles with SharpBalance
{Haiquan Lu*, Xiaotian Liu*, Yefan Zhou*, Qunli Li*}, Kurt Keutzer, Michael W. Mahoney, Yujun Yan, Huanrui Yang, Yaoqing Yang
NeurIPS 2024
Paper

[Ensembling, Out-of-distribution, Loss landscape analysis]
We discover a sharpness-diversity trade-off: minimizing sharpness in the loss landscape tends to diminish the diversity of ensemble members, limiting the ensemble's improvement. We introduce a new ensemble training method, SharpBalance, to address this trade-off.

MD tree: a model-diagnostic tree grown on loss landscape
{Yefan Zhou*, Jianlong Chen*}, Qinxue Cao, Konstantin Schürholt, Yaoqing Yang
ICML 2024
Paper / Code / Video

[(Post-training) model diagnosis, Model selection, Scaling law, Hyperparameter tuning]
This work studies how to predict the source of a model's failure from a set of failure modes (such as a poorly chosen hyperparameter or an inadequate model size) without knowing the training configuration of the pre-trained NN. We propose a diagnosis method, MD tree, based on loss landscape metrics, and demonstrate its advantage over classical validation-based approaches.

A Three-regime model of Network Pruning
Yefan Zhou, Yaoqing Yang, Arin Chang, Michael Mahoney
ICML 2023
Paper / Code / Video

[NN pruning, Model selection, Loss landscape analysis]
The study identifies a transition phenomenon in neural network pruning, where the effect of increasing the temperature-like parameter (e.g. training epochs) depends on the value of the load-like parameter (e.g. pruning ratio), leading to different pruning outcomes. The findings are then applied to three practical scenarios, including optimizing hyperparameters for improved pruning and selecting the most suitable model for pruning.

A Dataset-dispersion Perspective on Reconstruction versus Recognition in Single-view 3D Reconstruction Networks
Yefan Zhou, Yiru Shen, Yujun Yan, Chen Feng, Yaoqing Yang
3DV 2021
arXiv / GitHub / 3DV 2021 / Video

[Dataset diagnosis, 3D reconstruction]
A single-view reconstruction (SVR) model can be biased toward recognition (classification-based behavior) or reconstruction depending on how dispersed its training data is.
We propose the "dispersion score", a data-driven metric that measures the tendency of SVR models to perform recognition or reconstruction. It can also be used to diagnose problems in the training data and to guide the design of data augmentation schemes.
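The paper's dispersion score is clustering-based; the sketch below is a simplified stand-in (hand-rolled k-means, hypothetical function name) that only illustrates the idea of scoring how spread out a set of feature vectors is.

```python
import numpy as np

def dispersion_score(X, n_clusters=3, n_iter=20, seed=0):
    """Simplified dispersion proxy: average distance of feature vectors to
    their nearest k-means centroid. Higher score = more dispersed data."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # initialize centroids from random data points, then run Lloyd's iterations
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.min(axis=1).mean()
```

Tightly clustered features score low (recognition-like regime); widely spread features score high (reconstruction-like regime).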

Collaborative papers
Model Balancing Helps Low-data Training and Fine-tuning
Zihang Liu, Yuanzhe Hu, Tianyu Pang, Yefan Zhou, Pu Ren, Yaoqing Yang
EMNLP 2024 (Main, Oral)
Paper / Code

[LLM fine-tuning, Layer quality analysis]
We show that the TempBalance optimizer effectively improves LLM fine-tuning, especially when downstream data is scarce.

AlphaExpert: Assigning LoRA Experts Based on Layer Training Quality
Peijun Qing, Chongyang Gao, Yefan Zhou, Xingjian Diao, Pu Ren, Yaoqing Yang, Soroush Vosoughi
EMNLP 2024 (Main)
Paper

[LLM efficient fine-tuning, Mixture-of-expert]
We introduce a new LoRA-MoE fine-tuning approach, AlphaExpert, which uses layer-wise weight analysis to assign the number of experts per layer.

Learn to Grasp with Less Supervision: A Data-Efficient Maximum Likelihood Grasp Sampling Loss
Xinghao Zhu, Yefan Zhou, Yongxiang Fan, Jianyu Chen, Masayoshi Tomizuka
ICRA 2022
arXiv / ICRA 2022 / Video

Empirical grasping datasets are typically sparsely labeled (i.e., a small number of successful grasp labels per image).
We propose a maximum likelihood grasp sampling loss (MLGSL) for learning robotic grasping from sparsely labeled datasets.
MLGSL is 8× more data-efficient than the state of the art, achieving a 91.8% grasp success rate in real-world experiments.


Academic service

Reviewer: ICLR 2024–2025, CVPR 2024–2025, ICML 2024, AAAI 2025, NeurIPS 2023, IROS 2022, TMLR, CAPL 2024


Website template