Deep Learning study

Overfitting vs. Underfitting in Depth: Theory, Diagnosis, and Remedies

illinaire 2025. 4. 18. 12:08

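Summary: This post takes an in-depth look at underfitting and overfitting, the two problems that determine how well a machine learning model generalizes. It covers the mathematical theory (the bias-variance tradeoff), experimental diagnosis, and a range of remedies (regularization, data augmentation, early stopping, and more).

1. Introduction

How well a machine learning model performs in practice depends heavily on how closely it fits its training data. Two main problems arise here: underfitting and overfitting. Underfitting means the model fails to learn the underlying patterns in the data. Overfitting means it learns even the noise in the training data and therefore fails to generalize to new data.

2. Theoretical Background: The Bias-Variance Tradeoff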

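For a fixed input \(x\), with \(y = f(x) + \varepsilon\), \(\mathbb{E}[\varepsilon] = 0\), and \(\mathrm{Var}(\varepsilon) = \sigma^2\), the expected squared error of a model \(\hat f\) trained on a random training set \(D\) decomposes as follows:

\[ \mathbb{E}_{D,\varepsilon}\!\bigl[(y - \hat f(x))^2\bigr] = \underbrace{\bigl(\mathbb{E}_D[\hat f(x)] - f(x)\bigr)^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}_D\bigl[(\hat f(x) - \mathbb{E}_D[\hat f(x)])^2\bigr]}_{\text{Variance}} + \underbrace{\sigma^2}_{\text{Irreducible Error}} \]

Averaging both sides over \(x\) gives the model's overall generalization error.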

Bias: how far the model's average prediction is from the true function \(f(x)\)
Variance: how much the model's prediction fluctuates when the training set changes
Irreducible Error: the noise inherent in the data itself

Figure: Bias-variance tradeoff diagram (source: Wikimedia Commons, public domain)
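The decomposition can also be checked numerically. The following is a minimal sketch, not a definitive procedure: it assumes a synthetic true function sin(2πx), Gaussian noise, fixed training inputs with fresh noise in every trial, and polynomial fits of different degrees. The function names and all parameter values are illustrative choices.

```python
import numpy as np

def true_f(x):
    # Assumed ground-truth function for this illustration
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_trials=200, n_train=30, noise_std=0.3, seed=0):
    """Estimate Bias^2 and Variance of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)   # fixed inputs (an assumption)
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Each trial plays the role of a new training set D: fresh noise on y
        y_train = true_f(x_train) + rng.normal(0.0, noise_std, n_train)
        coefs = np.polyfit(x_train, y_train, degree)
        preds[t] = np.polyval(coefs, x_test)
    mean_pred = preds.mean(axis=0)              # approximates E_D[f_hat(x)]
    bias_sq = np.mean((mean_pred - true_f(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

results = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b, v) in results.items():
    print(f"degree={d}: bias^2={b:.4f}, variance={v:.4f}")
```

In this setup the degree-1 model should show the largest squared bias (underfitting) and the degree-9 model the largest variance (a tendency to overfit).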

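3. Diagnosing Underfitting vs. Overfitting

3.1 Using Learning Curves

Track how the training and validation errors change to diagnose the model's state:

State	Training error	Validation error	Response
Underfitting	High	High	Increase model complexity, train longer
Good fit	Low	Low	Keep as is
Overfitting	Very low	Rising	Regularization, data augmentation, early stopping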
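The table can be turned into a rough rule of thumb in code. This is only a sketch: the function name and both thresholds (`high_error`, `max_gap`) are hypothetical and would need to be chosen for each task and metric.

```python
def diagnose(train_error, val_error, high_error=0.30, max_gap=0.10):
    """Map a (training, validation) error pair to a row of the table above.

    high_error and max_gap are hypothetical thresholds; tune them per task.
    """
    if train_error >= high_error:
        return "underfitting"      # the model cannot even fit the training data
    if val_error - train_error > max_gap:
        return "overfitting"       # large gap between training and validation
    return "good fit"

print(diagnose(0.40, 0.42))
print(diagnose(0.02, 0.25))
print(diagnose(0.05, 0.08))
```

A single pair of numbers is a coarse signal; the trend over epochs or training set sizes is usually more informative.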

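3.2 Code Example: Scikit-learn Learning Curve

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve
import matplotlib.pyplot as plt

# Example data (an assumption; substitute your own feature matrix X and labels y)
X, y = load_digits(return_X_y=True)

model = RandomForestClassifier(n_estimators=100, random_state=0)
# train_sizes are fractions of the training portion of each cross-validation split
train_sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, train_sizes=[0.1, 0.3, 0.5, 0.7, 1.0], n_jobs=-1
)

# Scores are accuracies (higher is better), averaged over the 5 folds
plt.plot(train_sizes, train_scores.mean(axis=1), label='Train Score')
plt.plot(train_sizes, val_scores.mean(axis=1), label='Validation Score')
plt.xlabel('Training Set Size')
plt.ylabel('Accuracy')
plt.legend()
plt.show()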

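4. Remedies for Underfitting

Increase model complexity: add layers or neurons, add nonlinearity
Train longer: increase the number of epochs
Feature engineering: add or transform useful input variables
Hyperparameter tuning: raise the learning rate if training progresses too slowly, and reduce regularization (weight decay)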
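As a small illustration of the feature engineering and model complexity items above, the sketch below fits a plain linear model and a model with added quadratic features to the same data. The quadratic dataset is synthetic and purely an assumption for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.5, size=200)   # synthetic quadratic target

# A straight line cannot represent a parabola: high training error (underfitting)
linear = LinearRegression().fit(X, y)
# Adding x^2 as a feature gives the model enough capacity
quadratic = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

r2_linear = linear.score(X, y)        # R^2 on the training data
r2_quadratic = quadratic.score(X, y)
print(f"linear R^2: {r2_linear:.3f}, quadratic R^2: {r2_quadratic:.3f}")
```

The scores are computed on the training data on purpose: underfitting shows up as poor performance even on the data the model was trained on.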

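5. Strategies for Mitigating Overfitting

5.1 Regularization

Constrain the magnitude of the weights to prevent the model from fitting the training data too closely: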

\(\displaystyle
\mathcal{L} = \mathcal{L}_{\text{data}} + \lambda \,\lVert w \rVert^2_2
\)

L2 regularization (weight decay): the penalty term shown above
Dropout: randomly deactivates neurons during training

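import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # zeroes each activation with probability 0.5 during training
    nn.Linear(50, 10),
)
# Call model.eval() at inference time so that dropout is disabled.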
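The effect of the L2 penalty can be seen directly in a linear model, where the penalized squared-error objective has a closed-form solution. The sketch below is an assumption-laden illustration: the data loss is taken to be the sum of squared errors, the data is synthetic, and `ridge_fit` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
w_true = rng.normal(size=20)
y = X @ w_true + rng.normal(scale=0.5, size=50)

def ridge_fit(X, y, lam):
    # Minimizes ||Xw - y||^2 + lam * ||w||^2 (closed-form solution)
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Weight norm for increasing regularization strength
norms = {lam: float(np.linalg.norm(ridge_fit(X, y, lam))) for lam in (0.0, 1.0, 10.0, 100.0)}
for lam, norm in norms.items():
    print(f"lambda={lam:>5}: ||w||_2 = {norm:.3f}")
```

The weight norm shrinks as λ grows, which is exactly the constraint the penalty is meant to impose. In PyTorch, the same idea is usually applied through the `weight_decay` argument of an optimizer such as `torch.optim.SGD`.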

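5.2 Data Augmentation

Increase the diversity of the training data to improve generalization (images, text, and so on):

from torchvision import transforms

aug = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # pad by 4 pixels per side, then take a random 32x32 crop
    transforms.RandomHorizontalFlip(),      # mirror left-right with probability 0.5
    transforms.ToTensor(),                  # convert a PIL image to a float tensor in [0, 1]
])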
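To make clear what the crop and flip transforms above do to the pixels, here is a rough NumPy equivalent. It is a sketch for illustration, not torchvision's implementation: `random_crop_flip` is a hypothetical helper, and the random array stands in for a 32x32 RGB image.

```python
import numpy as np

def random_crop_flip(image, rng, crop=32, padding=4, flip_prob=0.5):
    """Zero-pad the image, take a random crop x crop window, then maybe mirror it."""
    padded = np.pad(image, ((padding, padding), (padding, padding), (0, 0)))
    top = rng.integers(0, padded.shape[0] - crop + 1)
    left = rng.integers(0, padded.shape[1] - crop + 1)
    out = padded[top:top + crop, left:left + crop]
    if rng.random() < flip_prob:
        out = out[:, ::-1]          # horizontal flip: reverse the width axis
    return out

rng = np.random.default_rng(0)
# Stand-in for a CIFAR-sized image in height x width x channel layout
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = [random_crop_flip(image, rng) for _ in range(4)]
print([a.shape for a in augmented])
```

Each call yields a slightly shifted, possibly mirrored version of the same image, so the model rarely sees exactly the same input twice.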

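5.3 Early Stopping

Stop training once validation performance stops improving:

best_val = float('inf')    # lowest validation loss seen so far
patience, counter = 5, 0   # tolerate up to 5 consecutive epochs without improvement

for epoch in range(100):
    train(...)                 # placeholder: one epoch of training (user-defined)
    val_loss = validate(...)   # placeholder: returns the validation loss (user-defined)
    if val_loss < best_val:
        best_val, counter = val_loss, 0   # improvement: record it and reset the counter
    else:
        counter += 1
        if counter >= patience:
            break   # no improvement for `patience` epochs in a row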
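The loop above can be packaged into a small reusable helper that also keeps a snapshot of the best model. The sketch below is one possible design, not a standard API: the `EarlyStopping` class, its `min_delta` parameter, and the list of validation losses are all hypothetical, and a plain dictionary stands in for something like `model.state_dict()`.

```python
import copy

class EarlyStopping:
    """Track the best validation loss and signal when patience runs out."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta      # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_state = None
        self.counter = 0

    def step(self, val_loss, state):
        """Return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(state)   # snapshot of the best model
            self.counter = 0
            return False
        self.counter += 1
        return self.counter >= self.patience

# Hypothetical validation losses: improve first, then rise as overfitting sets in
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.56, 0.60, 0.65, 0.70]
stopper = EarlyStopping(patience=3)
stopped_epoch = None
for epoch, loss in enumerate(val_losses):
    state = {"epoch": epoch}            # stand-in for model.state_dict()
    if stopper.step(loss, state):
        stopped_epoch = epoch
        break
print(stopped_epoch, stopper.best_loss, stopper.best_state)
```

Restoring `best_state` after stopping matters because the model at the stopping epoch has already trained for `patience` epochs past its best validation loss.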

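6. Summary Comparison Table

Strategy	When applied	Advantage	Drawback
Increase model complexity	Early (model design)	More expressive power	Higher risk of overfitting
Regularization	During training	Reduces overfitting	Slower training
Data augmentation	Data preparation	Better generalization	Extra computation
Early stopping	During validation	Efficient use of resources	May stop before reaching the optimum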

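7. Conclusion and Exercises

Bias-variance visualization: experiment with a variety of models
Analyze how performance changes with the regularization strength (λ)
Experiment with combinations of data augmentation techniques and compare their effects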
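As a possible starting point for the second exercise, the sketch below sweeps the regularization strength with scikit-learn's `validation_curve`. The choices here are assumptions for illustration: a synthetic regression dataset, a `Ridge` model whose `alpha` parameter plays the role of λ, and a logarithmic grid of seven values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve

# Synthetic data with many features relative to the sample count, so overfitting is likely
X, y = make_regression(n_samples=80, n_features=60, n_informative=10,
                       noise=10.0, random_state=0)
alphas = np.logspace(-3, 3, 7)   # regularization strengths to compare

train_scores, val_scores = validation_curve(
    Ridge(), X, y, param_name="alpha", param_range=alphas, cv=5
)

# Rows correspond to alpha values, columns to cross-validation folds
for alpha, tr, va in zip(alphas, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"alpha={alpha:g}: train R^2={tr:.3f}, validation R^2={va:.3f}")
```

Plotting both score columns against `alphas` on a logarithmic axis gives a picture analogous to the learning curve in section 3.2, with λ on the horizontal axis instead of the training set size.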

References

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
Scikit‑learn Documentation: Learning Curves — https://scikit-learn.org
Srivastava, N., et al. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. JMLR.
Shorten, C., & Khoshgoftaar, T. M. (2019). A survey on Image Data Augmentation for Deep Learning. Journal of Big Data.

License: Attribution, NonCommercial, NoDerivatives
