ZMIC Journal Club

Model Compression Techniques: "Less is More"

Presenter: 张杨
School of Data Science, Fudan University
2025-08-28


Table of Contents

  • Pre-training or Post-training?
  • Pruning
  • Approximation
  • Knowledge Distillation
  • Quantization

Pre-training "Compression"?

Reducing memory usage while maintaining the model’s capability.


Pruning


A Survey on DNN Pruning, TPAMI 2024

  • Unstructured Pruning

  • (semi-) Structured Pruning
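
A minimal NumPy sketch contrasting the two mask patterns above, using a simple magnitude criterion (illustrative only, not the survey's algorithms):

```python
import numpy as np

def unstructured_prune(W, sparsity=0.5):
    """Zero out the smallest-magnitude weights anywhere in the matrix."""
    k = int(W.size * sparsity)
    threshold = np.sort(np.abs(W), axis=None)[k]
    mask = np.abs(W) >= threshold
    return W * mask

def semi_structured_prune_2_4(W):
    """2:4 pattern: in every group of 4 consecutive weights, keep the 2 largest."""
    Wp = W.reshape(-1, 4).copy()
    idx = np.argsort(np.abs(Wp), axis=1)[:, :2]   # indices of the 2 smallest per group
    np.put_along_axis(Wp, idx, 0.0, axis=1)
    return Wp.reshape(W.shape)

W = np.random.randn(8, 8)
print((unstructured_prune(W) == 0).mean())         # ~0.5 global sparsity, irregular pattern
print((semi_structured_prune_2_4(W) == 0).mean())  # exactly 0.5, hardware-friendly 2:4 pattern
```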


Pruning Methods

  • Criteria, e.g. weight magnitude or loss/gradient sensitivity
  • Workflow:
    • Prune-Retrain, e.g. LLM-Pruner (NeurIPS 2023)
    • Prune-Optimization, e.g. SparseGPT (ICML 2023)
      • Optimize the remaining unpruned weights while keeping the pruning mask unchanged.

SparseGPT (ICML 2023)

GPT-family models can be pruned to at least 50% sparsity in one shot, without any retraining, at minimal loss of accuracy.
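
A toy sketch of the prune-then-optimize idea: prune each row by magnitude, then least-squares refit the kept weights so the layer reproduces its original outputs on calibration data. SparseGPT's actual solver uses approximate second-order (Hessian) information, so this is only a simplified analogue:

```python
import numpy as np

def prune_and_reconstruct(W, X, sparsity=0.5):
    """W: (d_out, d_in) layer weights, X: (d_in, n) calibration activations.
    Keep the largest-magnitude weights per row, then refit the kept weights
    so that W_hat @ X matches W @ X (layer-wise reconstruction)."""
    Y = W @ X                                   # original layer outputs
    W_hat = np.zeros_like(W)
    k = int(W.shape[1] * (1 - sparsity))        # weights kept per row
    for i in range(W.shape[0]):
        keep = np.argsort(-np.abs(W[i]))[:k]    # mask from the magnitude criterion
        # solve min ||w_keep @ X[keep] - Y[i]||^2 with the mask held fixed
        w_keep, *_ = np.linalg.lstsq(X[keep].T, Y[i], rcond=None)
        W_hat[i, keep] = w_keep
    return W_hat

W = np.random.randn(16, 64)
X = np.random.randn(64, 256)
W_hat = prune_and_reconstruct(W, X)
print((W_hat == 0).mean())                                        # achieved sparsity
print(np.linalg.norm(W @ X - W_hat @ X) / np.linalg.norm(W @ X))  # output reconstruction error
```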


Approximation

Deep models are fundamentally composed of matrices.

Therefore, they can be compressed by applying matrix compression (low-rank approximation) techniques, such as truncated Singular Value Decomposition (SVD):

$$W \approx W_k = U_k \Sigma_k V_k^\top,$$

where $W = U \Sigma V^\top$ is the SVD of $W$ and only the top-$k$ singular values and vectors are kept.
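
A minimal NumPy sketch of truncated-SVD compression of one weight matrix; the layer size and rank below are arbitrary choices for illustration:

```python
import numpy as np

def truncated_svd_compress(W, k):
    """Factor W (d_out x d_in) into two thin matrices of rank k.
    Parameters drop from d_out * d_in to k * (d_out + d_in)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * S[:k]        # (d_out, k)
    B = Vt[:k]                  # (k, d_in)
    return A, B

W = np.random.randn(512, 2048)
A, B = truncated_svd_compress(W, k=64)
print(W.size, A.size + B.size)                        # 1,048,576 vs 163,840 parameters
print(np.linalg.norm(W - A @ B) / np.linalg.norm(W))  # relative approximation error
```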


Standard SVD (NIPS 2014)

Speeds up layers in a CNN by a factor of 2–13×, with negligible loss of performance.


Low rank + Sparse (CVPR 2017)

$W \approx L + S$, where $L$ is a low-rank matrix and $S$ is a sparse matrix.
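
One toy way to obtain such a decomposition is to alternate a truncated SVD for L with magnitude thresholding for S; this is an illustrative assumption, not necessarily the CVPR 2017 paper's exact procedure:

```python
import numpy as np

def low_rank_plus_sparse(W, rank=8, keep_ratio=0.05, iters=20):
    """Alternate: L <- best rank-r approximation of (W - S);
                  S <- keep only the largest-magnitude entries of (W - L)."""
    S = np.zeros_like(W)
    for _ in range(iters):
        U, sv, Vt = np.linalg.svd(W - S, full_matrices=False)
        L = (U[:, :rank] * sv[:rank]) @ Vt[:rank]
        R = W - L
        thresh = np.quantile(np.abs(R), 1 - keep_ratio)
        S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S

W = np.random.randn(128, 128)
L, S = low_rank_plus_sparse(W)
print(np.linalg.norm(W - L - S) / np.linalg.norm(W))  # residual error of the decomposition
```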

Experiments:


FWSVD: Fisher-Weighted SVD (ICLR 2022)


FWSVD cont'd

Fisher information matrix:
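
A sketch of the quantity and how it enters the factorization objective (my notation, capturing the general idea of Fisher-weighted low-rank factorization rather than the paper's exact derivation):

```latex
% Empirical (diagonal) Fisher information of parameter w_j over dataset D:
\hat{I}_{w_j} = \frac{1}{|D|} \sum_{d \in D}
  \left( \frac{\partial \mathcal{L}(d; w)}{\partial w_j} \right)^{2}

% Weighted low-rank factorization of a weight matrix W, emphasizing important weights:
\min_{A, B} \; \left\| \hat{I}^{1/2} \left( W - A B \right) \right\|_{F}^{2},
\qquad \hat{I} = \operatorname{diag}\!\left( \hat{I}_{w} \right)
```

As I understand it, FWSVD aggregates the per-element weights row-wise so the weighted problem reduces to a standard SVD of the rescaled matrix.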

Experiments:


Knowledge Distillation

KD was first proposed by Hinton et al. at the NIPS 2014 Deep Learning Workshop. The prototype of KD can be traced back to KDD 2006.


Why does KD work?

Information leakage! The teacher's soft outputs carry information beyond the hard labels (inter-class similarities, Hinton's "dark knowledge"), and the student learns from this richer signal.

Adapted from Yibo's slides.
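
A minimal NumPy sketch of the classic offline KD loss with temperature-softened teacher targets (Hinton-style; the hyperparameters are illustrative):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha * CE(student, hard labels)
       + (1 - alpha) * T^2 * KL(teacher_T || student_T), with temperature T."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1).mean()
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * ce + (1 - alpha) * (T ** 2) * kl

student = np.random.randn(8, 10)   # batch of student logits
teacher = np.random.randn(8, 10)   # batch of teacher logits (fixed teacher, offline KD)
labels = np.random.randint(0, 10, size=8)
print(kd_loss(student, teacher, labels))
```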


KD schemes

  • Online / Offline / Self Distillation


DINO (ICCV 2021)

DINOv2 (2023) and DINOv3 (2025) have been released.
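
A heavily simplified PyTorch-style sketch of DINO's self-distillation loop; the tiny networks, crude augmentation, and omission of multi-crop and output centering are my simplifications, not the paper's recipe:

```python
import copy
import torch
import torch.nn.functional as F

# Toy stand-ins: the real DINO uses ViT backbones, multi-crop augmentation,
# a projection head, and centering of teacher outputs.
student = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16))
teacher = copy.deepcopy(student)                       # teacher starts as a copy of the student
optimizer = torch.optim.SGD(student.parameters(), lr=0.1)
augment = lambda x: x + 0.1 * torch.randn_like(x)      # crude "augmentation"

def dino_step(images, t_s=0.1, t_t=0.04, ema=0.996):
    v1, v2 = augment(images), augment(images)          # two random views of the same images
    with torch.no_grad():                              # sharpened teacher targets (no gradient)
        t1 = F.softmax(teacher(v1) / t_t, dim=-1)
        t2 = F.softmax(teacher(v2) / t_t, dim=-1)
    s1 = F.log_softmax(student(v1) / t_s, dim=-1)
    s2 = F.log_softmax(student(v2) / t_s, dim=-1)
    loss = -(t1 * s2).sum(-1).mean() - (t2 * s1).sum(-1).mean()  # cross-view cross-entropy
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    with torch.no_grad():                              # teacher = EMA of the student
        for p_t, p_s in zip(teacher.parameters(), student.parameters()):
            p_t.mul_(ema).add_(p_s, alpha=1 - ema)
    return loss.item()

print(dino_step(torch.randn(8, 32)))
```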


DINO cont'd


KD in the LLM era


DeepSeek-R1 (arXiv 2025)


DeepSeek-R1-Distill-Qwen

They use DeepSeek-R1 as the teacher model to generate 800K training samples and fine-tune several small dense models on them. The results are promising.
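
A schematic sketch of this kind of sequence-level distillation, i.e. supervised fine-tuning on teacher-generated completions; the toy model and random tokens below are placeholders, not DeepSeek's actual pipeline:

```python
import torch
import torch.nn.functional as F

# Toy stand-in for a small dense student LM; the real recipe fine-tunes models
# such as Qwen with a standard causal-LM loss on teacher-generated samples.
vocab, dim = 1000, 64
student = torch.nn.Sequential(torch.nn.Embedding(vocab, dim), torch.nn.Linear(dim, vocab))
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

def sft_step(tokens, prompt_len):
    """Next-token cross-entropy on the teacher-generated completion only
    (prompt positions are masked out of the loss)."""
    logits = student(tokens[:, :-1])                   # predict token t+1 from token t
    targets = tokens[:, 1:].clone()
    targets[:, :prompt_len - 1] = -100                 # ignore loss on the prompt
    loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1), ignore_index=-100)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# In practice `tokens` = tokenizer(prompt + teacher_response); here it is random.
tokens = torch.randint(0, vocab, (4, 128))
print(sft_step(tokens, prompt_len=32))
```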


Quantization

The key idea is to map the floating-point weights and/or activations of the model to low-precision representations, such as integers. (Quantization for DNN: A Survey)
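
A minimal NumPy sketch of uniform affine (int8) quantization with a scale and zero-point; per-channel and symmetric variants differ, so this is just one common scheme:

```python
import numpy as np

def quantize_int8(x):
    """Map float values to int8 via an affine transform: q = round(x / scale) + zero_point."""
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s, z = quantize_int8(w)
print(np.abs(w - dequantize(q, s, z)).max())   # quantization error, roughly bounded by scale/2
```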


Qwen Quantization


gpt-oss Quantization

gpt-oss-20b can run on systems with as little as 16 GB of memory! The magic comes from MXFP4, a new type of block floating-point format.
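
A rough NumPy sketch of the block floating-point idea as I understand MXFP4 (blocks of values share one power-of-two scale; each value is stored as a 4-bit E2M1 number); this is a simplification, not the exact specification:

```python
import numpy as np

FP4_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # |values| representable in E2M1

def mxfp4_like_quantize(x, block=32):
    """Blockwise: pick a shared power-of-two scale per block of 32 values,
    then round each scaled value to the nearest signed FP4 level."""
    x = x.reshape(-1, block)
    max_abs = np.abs(x).max(axis=1, keepdims=True) + 1e-12
    scale = 2.0 ** np.ceil(np.log2(max_abs / FP4_LEVELS[-1]))   # shared scale per block
    scaled = np.abs(x) / scale
    idx = np.abs(scaled[..., None] - FP4_LEVELS).argmin(-1)     # nearest FP4 magnitude
    return np.sign(x) * FP4_LEVELS[idx] * scale                 # dequantized values

w = np.random.randn(4, 64).astype(np.float32)
w_q = mxfp4_like_quantize(w).reshape(w.shape)
print(np.abs(w - w_q).max(), np.abs(w - w_q).mean())            # quantization error
```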


THANKS
