TorchStudio provides a GenericLoader dataset that can load most common dataset formats, as long as they contain pictures, audio files or NumPy tensors.

Loading classification datasets

File name based classes

For this example, download and extract the Dogs vs Cats dataset, which is made up of flat folders, each containing pictures of both cats and dogs:

cat.0.jpg
cat.1.jpg
...
dog.0.jpg
dog.1.jpg
...

In the Dataset tab, select the torchstudio.datasets category and the GenericLoader dataset.

Then drag and drop the train folder from the Dogs vs Cats dataset into the path parameter.

Make sure the classification parameter is set to True.

Set the separator parameter to ‘.’, so that the GenericLoader knows where to read the class in the file name.

Click Load and browse the samples to make sure the dataset was properly read and interpreted.
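
The effect of the separator on file names can be sketched in plain Python. This is only an illustration of the idea, not the GenericLoader's actual implementation; the helper name class_from_filename and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def class_from_filename(path, separator="."):
    """Return the part of the file name before the first separator (hypothetical helper)."""
    name = PurePosixPath(path).name      # "train/cat.0.jpg" -> "cat.0.jpg"
    return name.split(separator, 1)[0]   # "cat.0.jpg" -> "cat"


files = ["train/cat.0.jpg", "train/cat.1.jpg", "train/dog.0.jpg", "train/dog.1.jpg"]
labels = [class_from_filename(f) for f in files]
classes = sorted(set(labels))            # distinct class names, in a stable order
```

With the separator set to '.', every file whose name starts with cat. ends up in one class and every file starting with dog. in the other.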

Folder name based classes

For this example, download and extract the Audio Cats and Dogs dataset, whose train folder contains 2 folders (cat and dog) holding the audio files of each category:

cat/cat_1.wav
cat/cat_2.wav
...
dog/dog_barking_0.wav
dog/dog_barking_1.wav
...

In the Dataset tab, select the torchstudio.datasets category and the GenericLoader dataset.

Then drag and drop the train folder from the Audio Cats and Dogs dataset into the path parameter.

Make sure the classification parameter is set to True.

Set the separator parameter to ‘/’, so that the GenericLoader reads the classes from the folder name.

Click Load and browse the samples to make sure the dataset was properly read and interpreted.
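
As a rough sketch of what reading the class from the folder name means, the snippet below takes the name of the folder that directly contains each file. It illustrates the idea only and is not the GenericLoader's code; class_from_folder and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def class_from_folder(path):
    """Return the name of the folder that directly contains the file (hypothetical helper)."""
    return PurePosixPath(path).parent.name


files = ["train/cat/cat_1.wav", "train/cat/cat_2.wav", "train/dog/dog_barking_0.wav"]
labels = [class_from_folder(f) for f in files]

# Map each class name to an integer index, as classification models usually expect.
class_to_index = {name: i for i, name in enumerate(sorted(set(labels)))}
targets = [class_to_index[label] for label in labels]
```

Here the file names themselves play no role: only the cat and dog folder names decide the class.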

Loading a segmentation dataset

For this example, download and extract the Satellite Images of Water Bodies dataset, composed of 2 folders (images and masks) containing the pictures for each component:

images/water_body_1.jpg
images/water_body_2.jpg
...
masks/water_body_1.jpg
masks/water_body_2.jpg
...

In the Dataset tab, select the torchstudio.datasets category and the GenericLoader dataset.

Then drag and drop the Water Bodies Dataset folder from the Satellite Images of Water Bodies dataset into the path parameter.

Make sure the classification parameter is set to False.

Set the separator parameter to ‘/’, so that the GenericLoader reads the components from the folder name.

Click Load and browse the samples to make sure the dataset was properly read and interpreted.
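
With classification set to False, each folder is treated as one component of a sample rather than as a class. One plausible way to picture this is to pair files that share a name across the folders, as in the sketch below. How the GenericLoader actually matches files is an assumption here, and group_by_component is a hypothetical helper.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_component(paths):
    """Group files sharing a file name into one sample, keyed by parent folder (hypothetical helper)."""
    samples = defaultdict(dict)
    for p in map(PurePosixPath, paths):
        samples[p.name][p.parent.name] = str(p)
    return dict(samples)


files = [
    "images/water_body_1.jpg", "images/water_body_2.jpg",
    "masks/water_body_1.jpg", "masks/water_body_2.jpg",
]
samples = group_by_component(files)

# Each sample becomes an (input image, target mask) pair.
pairs = [(s["images"], s["masks"]) for _, s in sorted(samples.items())]
```

Browsing the loaded samples should show the same pairing: each satellite image next to the mask of the same name.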

Applying transforms

If needed you can apply transforms to image and audio samples.

Image transforms

Set up a classification dataset as described in File name based classes.

To make the dataset suitable for batch training, all pictures need to be resized to the same fixed dimensions.

To do so, we’ll use torchvision.transforms, in particular Resize and CenterCrop. Set the transforms parameter to:

torchvision.transforms.Compose([torchvision.transforms.Resize(64),torchvision.transforms.CenterCrop(64)])

Then click Load to load the dataset.
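
To see why this combination yields a fixed size, the arithmetic can be sketched without torchvision: Resize(64) scales the shorter side to 64 while keeping the aspect ratio, and CenterCrop(64) then cuts a 64x64 square from the middle. The helpers below are hypothetical and only approximate the geometry; torchvision's exact rounding may differ by a pixel.

```python
def resize_shorter_side(width, height, size):
    """Scale so the shorter side equals `size`, keeping the aspect ratio."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width, height, crop):
    """Return (left, top, right, bottom) of a centered crop x crop square."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


# A hypothetical 500x375 landscape picture.
w, h = resize_shorter_side(500, 375, 64)   # shorter side (height) becomes 64
box = center_crop_box(w, h, 64)            # square region kept by the crop
```

Whatever the original dimensions, the cropped region is always 64 pixels wide and 64 pixels high, so the samples can be stacked into batches.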

Audio transforms

Set up an audio classification dataset as described in Folder name based classes.

Let’s convert all audio waveforms to spectrograms, to make them suitable for picture-based classification models.

To do so, we’ll use torchaudio.transforms, in particular MelSpectrogram. Set the transforms parameter to:

torchaudio.transforms.MelSpectrogram()

Then click Load to load the dataset. You may notice that the spectrogram is displayed upside down: click the settings icon next to the Bitmap renderer and set its invert parameter to True.
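
A spectrogram is a 2D array per channel, which is why it can be fed to picture-based models. The sketch below estimates its shape in plain Python, assuming the MelSpectrogram defaults are n_mels=128 and hop_length=200 with a centered STFT (check the torchaudio documentation for your version). mel_spectrogram_shape is a hypothetical helper, not part of torchaudio.

```python
def mel_spectrogram_shape(num_samples, channels=1, n_mels=128, hop_length=200):
    """Expected (channels, n_mels, frames) shape, assuming a centered STFT."""
    frames = num_samples // hop_length + 1   # one frame per hop, plus the first one
    return channels, n_mels, frames


# A hypothetical one-second mono clip sampled at 16 kHz.
shape = mel_spectrogram_shape(16000)
```

Note that clips of different durations produce a different number of frames, so the spectrogram width varies with the length of each audio file.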
