Hello, I tried to download the dataset from Hugging Face using the following script:
from datasets import load_dataset
afrispeech = load_dataset("tobiolatunji/afrispeech-200", "all")
And I get an EOFError. Is anyone else having the same issue? I am using Python 3.9.
Can you share the full stack trace? At what point are you getting this error?
Downloading and preparing dataset afri_speech/all to /Users/iseck/.cache/huggingface/datasets/tobiolatunji___afri_speech/all/1.0.0/041d7776b1a6e1fe90f0fdf148e58de8d8fa44fc176977bf3efbc5dcabb9f0c6...
Downloading data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 694.48it/s]
Extracting data files: 0%| | 0/2 [00:00<?, ?it/s]Traceback (most recent call last):
File "/Users/iseck/Documents/carreer_growth/competitions/afrispeech_200/download_afrispeech200.py", line 3, in <module>
afrispeech = load_dataset("tobiolatunji/afrispeech-200", "all")
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/load.py", line 1691, in load_dataset
builder_instance.download_and_prepare(
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/builder.py", line 605, in download_and_prepare
self._download_and_prepare(
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/builder.py", line 1104, in _download_and_prepare
super()._download_and_prepare(dl_manager, verify_infos, check_duplicate_keys=verify_infos)
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/builder.py", line 672, in _download_and_prepare
split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
File "/Users/iseck/.cache/huggingface/modules/datasets_modules/datasets/tobiolatunji--afrispeech-200/041d7776b1a6e1fe90f0fdf148e58de8d8fa44fc176977bf3efbc5dcabb9f0c6/afrispeech-200.py", line 193, in _split_generators
local_extracted_archive_paths = dl_manager.extract(archive_paths) if not dl_manager.is_streaming else {}
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/utils/download_manager.py", line 355, in extract
extracted_paths = map_nested(
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 314, in map_nested
mapped = [
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 315, in <listcomp>
_single_map_nested((function, obj, types, None, True, None))
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 269, in _single_map_nested
mapped = [_single_map_nested((function, v, types, None, True, None)) for v in pbar]
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 269, in <listcomp>
mapped = [_single_map_nested((function, v, types, None, True, None)) for v in pbar]
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/utils/py_utils.py", line 251, in _single_map_nested
return function(data_struct)
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/utils/file_utils.py", line 262, in cached_path
output_path = ExtractManager(cache_dir=download_config.cache_dir).extract(
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/utils/extract.py", line 40, in extract
self.extractor.extract(input_path, output_path, extractor=extractor)
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/utils/extract.py", line 179, in extract
return extractor.extract(input_path, output_path)
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/site-packages/datasets/utils/extract.py", line 53, in extract
tar_file.extractall(output_path)
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/tarfile.py", line 2045, in extractall
self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/tarfile.py", line 2086, in extract
self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/tarfile.py", line 2159, in _extract_member
self.makefile(tarinfo, targetpath)
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/tarfile.py", line 2208, in makefile
copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/tarfile.py", line 247, in copyfileobj
buf = src.read(bufsize)
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/gzip.py", line 300, in read
return self._buffer.read(size)
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/Users/iseck/opt/anaconda3/envs/env_sp/lib/python3.9/gzip.py", line 506, in read
raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
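For what it's worth, this particular EOFError from gzip usually means the compressed archive on disk ended before its end-of-stream marker, i.e. the cached download is likely incomplete or corrupted. A minimal sketch of forcing a clean re-download (assuming a `datasets` version that accepts the `download_mode` parameter):

```python
def redownload_afrispeech():
    from datasets import load_dataset  # pip install datasets

    # "force_redownload" discards any partially downloaded or cached
    # archives and fetches the data again from the Hub.
    return load_dataset(
        "tobiolatunji/afrispeech-200",
        "all",
        download_mode="force_redownload",
    )

if __name__ == "__main__":
    afrispeech = redownload_afrispeech()
```

Note this re-fetches everything, so it only makes sense if disk space and bandwidth are not the bottleneck.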
I think you don't have enough free space. To download the whole "tobiolatunji/afrispeech-200" dataset, you need more than 100 GB of free space.
Try streaming mode by passing `streaming=True` to `load_dataset`; that way you can iterate over the data without downloading it:
afrispeech = load_dataset("tobiolatunji/afrispeech-200", "all", streaming=True)
https://huggingface.co/docs/datasets/v1.10.1/dataset_streaming.html
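To expand on the streaming suggestion, here is a sketch of peeking at the first few streamed samples without downloading the full dataset (it assumes the dataset exposes a `train` split; the helper name is mine, not from the thread):

```python
from itertools import islice

def peek_afrispeech(n=3):
    from datasets import load_dataset  # pip install datasets

    # streaming=True returns an IterableDataset: samples are fetched
    # lazily over the network instead of being downloaded up front.
    ds = load_dataset("tobiolatunji/afrispeech-200", "all", streaming=True)

    # take only the first n samples from the (assumed) train split
    return list(islice(ds["train"], n))

if __name__ == "__main__":
    for sample in peek_afrispeech():
        # print the available fields rather than guessing their names
        print(sample.keys())
```

Streaming avoids the local tar extraction entirely, so it also sidesteps the EOFError above.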
I don't think it is coming from that. I have enough space.
Perhaps that's not the reason (I'm trying to help without seeing all the facts), but it can still work for you, so just try it:
`dataset = load_dataset("tobiolatunji/afrispeech-200", "all", streaming=True)`
Yes, I really appreciate that you take the time to try and help.
Try:
`dataset = load_dataset("tobiolatunji/afrispeech-200", "all", use_auth_token=True, streaming=True)`
Try the download again. It should work fine now.