PyTorch - Data Preparation, Dataset (Colab)


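Data Preparation

In PyTorch, Dataset and DataLoader from torch.utils.data can be used to prepare data.

Many ready-made datasets are available as Dataset classes (MNIST, FashionMNIST, CIFAR10, ...)

They are grouped into vision datasets, text datasets, and audio datasets.

Arguments such as train and transform (passed to the dataset) and batch_size (passed to the DataLoader) determine how the data is loaded.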

import torch
import numpy
from torch.utils.data import Dataset, DataLoader
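
The built-in datasets are subclasses of Dataset, and the same interface can be implemented by hand. The following is a minimal sketch, not code from the original notebook: SquaresDataset and the toy_ names are hypothetical, chosen only to show the two methods a map-style Dataset needs (__len__ and __getitem__) and how a DataLoader groups the samples into batches.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i**2)."""

    def __init__(self, n):
        self.x = torch.arange(n, dtype=torch.float32)
        self.y = self.x ** 2

    def __len__(self):
        # Number of samples; the DataLoader uses this to know when an epoch ends.
        return len(self.x)

    def __getitem__(self, idx):
        # Return one (input, target) pair for the given index.
        return self.x[idx], self.y[idx]


toy_dataset = SquaresDataset(10)
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=False)

# Collect every batch as plain Python lists so they are easy to inspect.
toy_batches = [(xb.tolist(), yb.tolist()) for xb, yb in toy_loader]
print(len(toy_dataset))
print(toy_batches[0])
```

With 10 samples and batch_size=4, the loader yields three batches, and the last one holds only the two remaining samples.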

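torchvision is a PyTorch companion package that collects commonly used vision datasets.
transforms: the torchvision module used for preprocessing
For preprocessing that the classes in transforms do not cover, you typically write your own class and use it as a preprocessing step.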
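
As an illustration of writing your own preprocessing class, here is a minimal sketch. Binarize is a hypothetical name and is not part of torchvision; the only requirement assumed here is that the object is callable, which is what lets it sit in a Compose list next to the built-in transforms.

```python
import torch


class Binarize:
    """Hypothetical custom transform: maps each pixel to 0.0 or 1.0."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def __call__(self, tensor):
        # Compare against the threshold, then cast the boolean mask to float.
        return (tensor > self.threshold).float()


# A tiny 2x2 stand-in for an image that has already been converted to a tensor.
fake_image = torch.tensor([[0.1, 0.6], [0.5, 0.9]])
binarized = Binarize(threshold=0.5)(fake_image)
print(binarized.tolist())
```

Because the class expects a tensor, it would have to come after ToTensor() if it were placed in a Compose list.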

import torchvision.transforms as transforms
from torchvision import datasets

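The transform passed to the dataset can be defined ahead of time, and Compose applies the preprocessing steps in its list in order.

ToTensor() is needed because torchvision datasets such as MNIST return each sample as a PIL Image, so it has to be converted to a Tensor (with pixel values scaled to [0, 1]) before further processing.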

mnist_transform = transforms.Compose([transforms.ToTensor(),
                                      transforms.Normalize(mean=(0.5,), std=(1.0,))])
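
To make the effect of Normalize with mean 0.5 and std 1.0 concrete, the sketch below applies the same arithmetic, (x - mean) / std, by hand. The pixel values are made up for illustration; they only assume the [0, 1] range that ToTensor() produces.

```python
import torch

# Made-up pixel values in [0, 1], the range ToTensor() produces.
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])
mean, std = 0.5, 1.0

# Same per-element arithmetic that Normalize applies to each channel.
normalized = (pixels - mean) / std
print(normalized.tolist())  # [-0.5, -0.25, 0.0, 0.5]
```

With these settings the values are only shifted, so the data ends up in [-0.5, 0.5]; a std of 0.5 would instead stretch it to [-1, 1].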

trainset = datasets.MNIST(root='/content/',
                          train=True, download=True,
                          transform=mnist_transform)
testset = datasets.MNIST(root='/content/',
                         train=False, download=True,
                         transform=mnist_transform)

# trainset and testset
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to 100% 9912422/9912422 [00:00<00:00, 57769126.64it/s]
Extracting /content/MNIST/raw/train-images-idx3-ubyte.gz to /content/MNIST/
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to 100% 28881/28881 [00:00<00:00, 1614668.95it/s]
Extracting /content/MNIST/raw/train-labels-idx1-ubyte.gz to /content/MNIST/
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to / 100% 1648877/1648877 [00:00<00:00, 51639647.24it/s]
Extracting /content/MNIST/raw/t10k-images-idx3-ubyte.gz to /content/MNIST/r
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to / 100% 4542/4542 [00:00<00:00, 268813.29it/s]

The DataLoader wraps the whole dataset and, during actual model training, fetches the data batch_size samples at a time.

train_loader = DataLoader(trainset, batch_size=8, shuffle=True, num_workers=2)

test_loader = DataLoader(testset, batch_size=8, shuffle=False, num_workers=2)
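
The batching behaviour can be checked without downloading anything. The sketch below is an assumption-laden stand-in rather than the notebook's own code: it swaps MNIST for 20 blank synthetic images wrapped in TensorDataset, and all fake_ names are hypothetical. It uses num_workers=0 (loading in the main process) only to keep the example simple.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 20 blank 1x28x28 "images" with labels 0-9 repeated.
fake_images = torch.zeros(20, 1, 28, 28)
fake_labels = torch.arange(20) % 10
fake_set = TensorDataset(fake_images, fake_labels)

fake_loader = DataLoader(fake_set, batch_size=8, shuffle=False, num_workers=0)

# len() of a DataLoader is the number of batches, not the number of samples.
num_batches = len(fake_loader)
batch_shapes = [tuple(x.shape) for x, _ in fake_loader]

# next(iter(...)) pulls just the first batch.
first_images, first_labels = next(iter(fake_loader))

print(num_batches)
print(batch_shapes)
print(first_labels.tolist())
```

Twenty samples with batch_size=8 give two full batches and a final batch of four; passing drop_last=True to the DataLoader would discard that short batch.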

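# In my Windows environment this worked:
# dataiter = iter(train_loader)
# images, labels = dataiter.next()
# In my Mac M1 environment only the built-in next() worked. The likely cause is the
# PyTorch version rather than the OS: newer releases no longer provide .next() on the iterator.
images, labels = next(iter(train_loader))
images.shape, labels.shape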

     (torch.Size([8, 1, 28, 28]), torch.Size([8]))

torch_image = torch.squeeze(images[0])  # drop the size-1 channel dimension
torch_image.shape

torch.Size([28, 28])
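
One detail worth knowing about torch.squeeze is that, without a dim argument, it removes every dimension of size 1. The sketch below uses blank tensors with hypothetical names to show the difference; it is an illustration, not part of the original notebook.

```python
import torch

# One grayscale image: (channel, height, width)
single = torch.zeros(1, 28, 28)
squeezed_single = torch.squeeze(single)

# A batch that happens to contain a single image: (batch, channel, height, width)
batch_of_one = torch.zeros(1, 1, 28, 28)
squeezed_all = torch.squeeze(batch_of_one)          # removes both size-1 dimensions
squeezed_dim1 = torch.squeeze(batch_of_one, dim=1)  # removes only the channel dimension

print(tuple(squeezed_single.shape))
print(tuple(squeezed_all.shape))
print(tuple(squeezed_dim1.shape))
```

Passing dim explicitly is the safer choice when the batch dimension must survive, for example when the last batch of an epoch contains a single sample.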

import matplotlib.pyplot as plt

figure = plt.figure(figsize=(12, 6))
cols, rows = 4, 2
for i in range(1, cols * rows + 1):
    # Pick a random sample index from the training set
    sample_idx = torch.randint(len(trainset), size=(1,)).item()
    img, label = trainset[sample_idx]
    figure.add_subplot(rows, cols, i)
    plt.title(label)
    plt.axis('off')
    plt.imshow(img.squeeze(), cmap='gray')
plt.show()


인생은패패승승승
