What to Do When CIFAR-10 Won't Download [Keras × Google Colab]

Have you ever tried to use CIFAR-10 on Google Colab and run into an error like this?

Exception: URL fetch failure on https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz: 503 -- Service Unavailable

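In May 2026, the University of Toronto carried out a planned power outage, which made the CIFAR-10 distribution server temporarily unreachable. Because keras.datasets.cifar10.load_data() depends on this server, the download fails and training cannot start.

You might think, "Then I'll just use tfds (TensorFlow Datasets)," but unfortunately tfds also points to a University of Toronto URL on the back end, so it fails in the same way.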

RetryError: HTTPSConnectionPool(host='www.cs.toronto.edu', port=443):
Max retries exceeded with url: /~kriz/cifar-10-binary.tar.gz
(Caused by ResponseError('too many 503 error responses'))

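This article shows, with code, how to use Hugging Face as an alternative that does not touch the University of Toronto server at all. You can run it on Colab right away.

📘 What you'll learn in this article

Why tfds also depends on the University of Toronto
How to get CIFAR-10 via Hugging Face
How to use the loaded data in the same format as keras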

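Why both load_data() and tfds fail

Both download from the same University of Toronto server. The files differ (Python version vs. binary version), but the host is the same.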

Method	Download source	Behavior during the outage
`keras.datasets.cifar10.load_data()`	www.cs.toronto.edu (Python version, .tar.gz)	❌ URL fetch error
`tfds.load('cifar10')`	www.cs.toronto.edu (binary version, .tar.gz)	❌ RetryError after repeated 503 responses
Hugging Face `load_dataset`	Hugging Face servers (Parquet format)	✅ Works normally

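tfds is a dataset management library; it does not necessarily serve the data itself from a Google-hosted mirror. In the failure shown above, the first download went to the official source, the University of Toronto server, so it failed during the outage as well.

The uoft-cs/cifar10 dataset on Hugging Face is hosted directly on Hugging Face's servers in Parquet format, so loading it does not involve the University of Toronto server at all.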
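
To make the shared dependency concrete, here is a minimal sketch that compares the hostnames in the two error messages above. The helper name `host_of` is hypothetical, and the Hugging Face URL is the dataset's public page, used only to illustrate that the host differs (it is not the exact file URL that `load_dataset` fetches).

```python
from urllib.parse import urlparse

# URLs taken from the two error messages above.
KERAS_URL = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
TFDS_URL = "https://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz"
# Public dataset page on the Hugging Face Hub (illustrative only).
HF_URL = "https://huggingface.co/datasets/uoft-cs/cifar10"


def host_of(url: str) -> str:
    """Return the hostname part of a URL (hypothetical helper)."""
    return urlparse(url).hostname or ""


for name, url in [("keras", KERAS_URL), ("tfds", TFDS_URL), ("hugging face", HF_URL)]:
    print(f"{name:13s} -> {host_of(url)}")
```

The first two entries resolve to the same host, which is why an outage on that host takes down both loaders at once.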

How to get CIFAR-10 via Hugging Face

Code

!pip install -q datasets

from datasets import load_dataset
import numpy as np

# Load CIFAR-10 from Hugging Face
ds = load_dataset('uoft-cs/cifar10')

# Convert to the same layout as keras.datasets (NumPy arrays)
def hf_to_numpy(split):
    images = np.array([np.array(item['img']) for item in split])
    labels = np.array([item['label'] for item in split]).reshape(-1, 1)
    return images, labels

x_train, y_train = hf_to_numpy(ds['train'])
x_test,  y_test  = hf_to_numpy(ds['test'])

# Normalize to [0, 1] (adjust to match your existing code)
x_train = x_train.astype('float32') / 255.0
x_test  = x_test.astype('float32')  / 255.0

print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
print(x_test.shape,  y_test.shape)   # (10000, 32, 32, 3) (10000, 1)

Output:

/usr/local/lib/python3.12/dist-packages/huggingface_hub/utils/_auth.py:93: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
WARNING:huggingface_hub.utils._http:Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
README.md: 
 5.16k/? [00:00<00:00, 233kB/s]
plain_text/train-00000-of-00001.parquet: 100%
 120M/120M [00:01<00:00, 179MB/s]
plain_text/test-00000-of-00001.parquet: 100%
 23.9M/23.9M [00:00<00:00, 120MB/s]
Generating train split: 100%
 50000/50000 [00:01<00:00, 65046.88 examples/s]
Generating test split: 100%
 10000/10000 [00:00<00:00, 19826.83 examples/s]
(50000, 32, 32, 3) (50000, 1)
(10000, 32, 32, 3) (10000, 1)
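
The shapes printed above can also be checked programmatically before training. The following is a minimal sketch, not part of the original code: the function name `check_keras_cifar10_layout` is hypothetical, and the demo uses small synthetic arrays so that it runs without any download.

```python
import numpy as np


def check_keras_cifar10_layout(x, y, n_expected):
    """Raise ValueError unless (x, y) matches the Keras CIFAR-10 array layout."""
    if x.shape != (n_expected, 32, 32, 3):
        raise ValueError(f"unexpected image shape: {x.shape}")
    if y.shape != (n_expected, 1):
        raise ValueError(f"unexpected label shape: {y.shape}")
    if y.min() < 0 or y.max() > 9:
        raise ValueError("labels must be in the range 0..9")
    # If the images were divided by 255, every pixel should lie in [0, 1].
    if np.issubdtype(x.dtype, np.floating) and (x.min() < 0.0 or x.max() > 1.0):
        raise ValueError("normalized pixels must be in [0, 1]")
    return True


# Demo on small synthetic arrays standing in for the real data.
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 256, size=(8, 32, 32, 3)).astype("float32") / 255.0
y_demo = rng.integers(0, 10, size=(8, 1))
print(check_keras_cifar10_layout(x_demo, y_demo, n_expected=8))
```

On the real arrays, the calls would be `check_keras_cifar10_layout(x_train, y_train, 50000)` and `check_keras_cifar10_layout(x_test, y_test, 10000)`.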

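Swapping it into your existing code

After conversion, x_train / y_train / x_test / y_test have the same shapes as the output of keras.datasets: (50000, 32, 32, 3) and (50000, 1) for training, (10000, 32, 32, 3) and (10000, 1) for test. Note that the code above also scales the images to float32 in [0, 1], whereas load_data() returns uint8 arrays, so drop the normalization step if your existing code already divides by 255. Replace the data-loading lines at the top of your original code as shown below, and the rest of the training code needs no changes.

	Code
Before	`(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()`
After	Replace with the Hugging Face code above (pip install + load_dataset + hf_to_numpy)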
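
If you would rather keep the original loader and switch only when it fails, one option is a small fallback wrapper. This is a sketch under assumptions, not something the original code does: `load_with_fallback` is a hypothetical helper, and the two loaders below are stand-ins so that the example runs offline.

```python
def load_with_fallback(loaders):
    """Try each (name, loader) pair in order and return (name, data) from the first success."""
    errors = []
    for name, loader in loaders:
        try:
            return name, loader()
        except Exception as exc:  # download failures surface as different exception types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all loaders failed:\n" + "\n".join(errors))


# Stand-in loaders (hypothetical) that simulate the outage.
def failing_primary():
    raise Exception("URL fetch failure: 503 -- Service Unavailable")


def working_fallback():
    return "arrays from fallback"


source, data = load_with_fallback(
    [("keras", failing_primary), ("hugging face", working_fallback)]
)
print(source, data)
```

In a real notebook, the first loader could be `keras.datasets.cifar10.load_data` and the second a function that wraps the Hugging Face code above and returns `(x_train, y_train), (x_test, y_test)`. Make sure both paths apply the same normalization, since load_data() returns unscaled uint8 images.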

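Summary

keras.datasets.cifar10.load_data() and tfds.load('cifar10') both depend on the University of Toronto server, so both stop working during an outage
The uoft-cs/cifar10 dataset on Hugging Face is hosted directly on Hugging Face's servers and does not involve the University of Toronto, so the outage does not affect it
Adding one `pip install datasets` line and a short conversion function is all it takes to use it on Colab right away
The converted arrays have the same shapes as the keras.datasets output, so the training code can be reused as is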
