Common PyTorch Errors and How to Fix Them

Deep Learning study

illinaire 2025. 4. 17. 21:37

Post summary: This post collects the errors that come up most often when using PyTorch, with the cause and fix for each. It covers “size mismatch”, “CUDA out of memory”, “RuntimeError: expected ...”, and others, with code examples and debugging tips.

1. Error Summary Table

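Error message	Cause	Fix
`size mismatch`	Layer input and output dimensions do not match	Check the model structure and input shape; adjust `in_features` / `out_features`
`CUDA out of memory`	Not enough GPU memory	Reduce the batch size, call `torch.cuda.empty_cache()`, use mixed precision
`RuntimeError: expected ... but got ...`	Argument dtype, device, or shape does not match what the function expects	Check the documentation; convert with `.float()` / `.to(device)` or reshape with `.unsqueeze()` / `.view()`
`grad is None`	Tensor is not a leaf, or `requires_grad=False`	Call `retain_grad()` before `backward()`; confirm `requires_grad=True`
`RuntimeError: grad can be implicitly created only for scalar outputs`	Loss passed to `backward()` is not a scalar	`loss = loss_tensor.mean()` or `loss_tensor.sum()`
`RuntimeError: element 0 of tensors does not require grad`	Loss is not connected to the autograd graph (for example, computed under `torch.no_grad()` or from detached tensors)	Confirm the parameters have `requires_grad=True` and the loss is not built from detached values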

2. Detailed Walkthrough of Common Errors

2.1 size mismatch

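Cause: The last dimension of the tensor passed to a Linear layer does not match the layer's in_features. In the message below, the input is [64 x 512] but the layer weight expects 256 input features. Recent PyTorch versions report the same problem as "mat1 and mat2 shapes cannot be multiplied (64x512 and 256x10)".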

RuntimeError: size mismatch, m1: [64 x 512], m2: [256 x 10]

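Fix:

Check the layer dimensions in the model definition. in_features must equal the last dimension of the input (512 for the error above, not 256):
```
self.fc = nn.Linear(512, 10)
```

Inspect the input tensor shape right before the layer:

x.shape  # torch.Size([64, 512]) for the error above

If the input is a multi-dimensional feature map, flatten it (or add a projection layer) so that its last dimension matches in_features:
```
x = x.view(x.size(0), -1)  # [batch_size, C, H, W] -> [batch_size, C*H*W]
```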
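The checks above can be sketched as a small CPU-only script that reproduces the mismatch and then fixes it. The layer sizes mirror the error message in this section; the names `wrong_fc`, `fc`, and `feature_map` are illustrative assumptions, not code from the original post.

```python
import torch
import torch.nn as nn

x = torch.randn(64, 512)          # batch of 64 samples, 512 features each

wrong_fc = nn.Linear(256, 10)     # in_features=256 does not match x.shape[-1]
try:
    wrong_fc(x)
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("shape error:", err)

fc = nn.Linear(512, 10)           # in_features now equals x.shape[-1]
out = fc(x)
print(out.shape)

# Flatten a 4-D feature map before feeding it to the Linear layer.
feature_map = torch.randn(64, 32, 4, 4)             # 32 * 4 * 4 = 512
flat = feature_map.view(feature_map.size(0), -1)    # [64, 512]
out_from_map = fc(flat)
```

Printing `x.shape` immediately before the failing layer is usually the quickest way to see which of the two sides needs to change.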

2.2 CUDA out of memory

Cause: Too many tensors have been moved to the GPU, or the batch size is too large.

RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB...

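Fix:

Reduce the batch size: batch_size=32 → 16
Release unused cached memory (this returns cached blocks that PyTorch is holding, but it does not free memory still referenced by live tensors):
```
torch.cuda.empty_cache()
```

Use mixed precision:

from torch.cuda.amp import autocast, GradScaler  # recent releases also expose these as torch.amp.autocast and torch.amp.GradScaler

Use gradient accumulation: run several small batches and accumulate their gradients before each optimizer step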
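Gradient accumulation is the least obvious of the options above, so here is a minimal sketch of it. It runs on CPU with a toy `nn.Linear` model purely for illustration; the model, data, and `accum_steps` value are assumptions. Dividing each micro-batch loss by `accum_steps` makes the accumulated gradient match what a single large batch would produce with a mean-reduced loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(16, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(32, 16)      # one "logical" batch of 32 samples
targets = torch.randn(32, 1)

accum_steps = 4                   # 4 micro-batches of 8 samples each

optimizer.zero_grad()
for x, y in zip(inputs.chunk(accum_steps), targets.chunk(accum_steps)):
    # Scale the loss so the summed gradients equal the full-batch mean gradient.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()               # gradients add up in .grad across iterations

accumulated_grad = model.weight.grad.clone()
optimizer.step()                  # one optimizer step per logical batch
optimizer.zero_grad()
```

Only one micro-batch of activations is alive at a time, which is where the memory saving comes from; the trade-off is more forward and backward passes per optimizer step.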

2.3 RuntimeError: expected … but got …

Cause: The dtype or device of an argument does not match what the function expects.

RuntimeError: expected scalar type Float but found Double

Fix:

Unify the dtype:
```
tensor = tensor.float()
```
Unify the device:
```
tensor = tensor.to(device)
```
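A short sketch of both fixes applied together, assuming a toy `nn.Linear` model (the model and tensor names are illustrative). Reading the dtype and device from a model parameter avoids hard-coding either one.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2).to(device)            # parameters are float32 by default

x = torch.ones(3, 4, dtype=torch.float64)     # e.g. data converted from a float64 array

try:
    model(x.to(device))                       # float64 input vs float32 weights
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("dtype mismatch:", err)

# Match the input to whatever dtype and device the model actually uses.
param = next(model.parameters())
x_fixed = x.to(device=param.device, dtype=param.dtype)
out = model(x_fixed)
```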

2.4 grad is None

Cause: By default, autograd does not store gradients for intermediate (non-leaf) tensors, so their .grad stays None after backward().

print(y.grad)  # None

Fix:

Make the tensor a leaf: x = x.clone().detach().requires_grad_(True)
Or call retain_grad() on the intermediate tensor before backward():
```
y.retain_grad()
```
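The difference between leaf and non-leaf tensors can be seen in a small example. This is an illustrative sketch; the tensors `h`, `y`, and `z` are made up for the demonstration.

```python
import warnings
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)  # leaf tensor
h = x * 2                 # non-leaf, gradient not retained
y = h + 1                 # non-leaf
y.retain_grad()           # must be called before backward()
z = (y ** 2).sum()
z.backward()

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # reading .grad on a non-leaf emits a UserWarning
    h_grad = h.grad

print(x.grad)    # tensor([12., 20., 28.])
print(y.grad)    # tensor([ 6., 10., 14.])
print(h_grad)    # None
```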

2.5 non-scalar loss

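Cause: loss.backward() is called on a loss that is not a scalar, for example a per-sample loss tensor produced with reduction='none'.

RuntimeError: grad can be implicitly created only for scalar outputs

A similar-looking message, "element 0 of tensors does not require grad and does not have a grad_fn", points to a different problem: the loss is not connected to the autograd graph (for example, it was computed under torch.no_grad() or from detached tensors).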

Fix:

Reduce the loss to a scalar with a mean or sum before calling backward():
```
loss = loss_tensor.mean()
```
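As a minimal sketch, the snippet below triggers the non-scalar error with a per-sample loss and then applies the fix. The toy model and the `per_sample` name are assumptions for illustration.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
x = torch.randn(8, 4)
target = torch.randn(8, 1)

# reduction="none" keeps one loss value per element: shape [8, 1], not a scalar.
per_sample = nn.MSELoss(reduction="none")(model(x), target)

try:
    per_sample.backward()
    non_scalar_raised = False
except RuntimeError as err:
    non_scalar_raised = True
    print("backward failed:", err)

loss = per_sample.mean()      # reduce to a 0-dim tensor
loss.backward()               # now succeeds and populates .grad
```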

3. Debugging Tips

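Use torch.autograd.set_detect_anomaly(True) to trace the operation whose backward pass produced NaN (for debugging only, since it slows execution)
Print tensor.shape to check for shape mismatches
Use assert not torch.isnan(tensor).any() to catch NaN early
Use an IDE debugger (breakpoint) to inspect intermediate values directly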
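Two of the tips above can be combined into a small sketch. The helper `assert_finite` is a hypothetical name introduced here; it uses `torch.isfinite` so that Inf is caught as well as NaN. The second half uses `set_detect_anomaly` as a context manager so the setting does not leak into the rest of the program.

```python
import torch

def assert_finite(name, tensor):
    """Raise early, with the tensor name and shape, if any value is NaN or Inf."""
    if not torch.isfinite(tensor).all():
        raise ValueError(f"{name} has non-finite values, shape={tuple(tensor.shape)}")

assert_finite("ok", torch.ones(2, 3))         # passes silently

try:
    assert_finite("bad", torch.tensor([1.0, float("nan")]))
    nan_caught = False
except ValueError as err:
    nan_caught = True
    print(err)

# Anomaly detection raises as soon as a backward function returns NaN.
x = torch.tensor([-1.0], requires_grad=True)
anomaly_raised = False
with torch.autograd.set_detect_anomaly(True):
    y = torch.sqrt(x)                         # forward result is already NaN
    try:
        y.sum().backward()                    # sqrt's backward also yields NaN
    except RuntimeError as err:
        anomaly_raised = True
        print("anomaly:", err)
```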

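4. Conclusion and Next Steps

When an error occurs, read the message carefully and work through it in the order "cause → reproduce → fix". For project-specific errors beyond the common cases above, the documentation and GitHub issues are good places to look.

Check the official PyTorch documentation and discussion forum regularly
Study open-source code to learn common patterns
Add simple unit tests to your CI/CD pipeline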

References

PyTorch Documentation: https://pytorch.org/docs/stable/
Stack Overflow and GitHub Issues
