[BUG] Chapter 2: Section "Download the data", buggy implementation for load_housing_data() function #156

Open
ali-moameri opened this issue Aug 22, 2024 · 1 comment

Comments

@ali-moameri

The implementation of load_housing_data() is as follows:

def load_housing_data():
    tarball_path = Path("datasets/housing.tgz")
    if not tarball_path.is_file():
        Path("datasets").mkdir(parents=True, exist_ok=True)
        url = "https://github.com/ageron/data/raw/main/housing.tgz"
        urllib.request.urlretrieve(url, tarball_path)
        with tarfile.open(tarball_path) as housing_tarball:
            housing_tarball.extractall(path="datasets")
    return pd.read_csv(Path("datasets/housing/housing.csv"))

With this implementation, if datasets/housing.tgz exists, the function skips both the download and the extraction and goes straight to reading datasets/housing/housing.csv. It can happen that datasets/housing.tgz exists but datasets/housing/housing.csv doesn't. In that case the code raises FileNotFoundError. A correct implementation should look like this:

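import tarfile
from pathlib import Path

import pandas as pd
import requests

def load_housing_data():
    tarfile_path = Path("datasets/housing.tgz")

    # Download the tarball only if it is not already on disk.
    if not tarfile_path.is_file():
        Path("datasets").mkdir(parents=True, exist_ok=True)
        response = requests.get("https://github.com/ageron/data/raw/main/housing.tgz")
        response.raise_for_status()  # fail early instead of saving an error page
        tarfile_path.write_bytes(response.content)

    # Always extract, so a missing housing.csv is restored from the tarball.
    with tarfile.open(tarfile_path) as housing_tarball:
        housing_tarball.extractall(path="datasets")
    return pd.read_csv(Path("datasets/housing/housing.csv"))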

If datasets/housing.tgz exists, extract it and then read the CSV. If it doesn't, download it, extract it, and then read the CSV.
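
The version above extracts the archive on every call. A possible variation, sketched here as an assumption rather than a definitive fix, is to check for the CSV itself and only download or extract what is actually missing. The helper name ensure_housing_csv and its data_dir and url parameters are hypothetical, introduced only for illustration. It keeps urllib.request.urlretrieve from the original code and returns the CSV path instead of a DataFrame.

```python
import tarfile
import urllib.request
from pathlib import Path

HOUSING_URL = "https://github.com/ageron/data/raw/main/housing.tgz"


def ensure_housing_csv(data_dir="datasets", url=HOUSING_URL):
    """Return the path to housing.csv, fetching or extracting only if needed.

    Hypothetical helper: the name and parameters are illustrative.
    """
    data_dir = Path(data_dir)
    tarball_path = data_dir / "housing.tgz"
    csv_path = data_dir / "housing" / "housing.csv"

    # Check for the file we actually read, not just the tarball.
    if not csv_path.is_file():
        # Download only when the tarball itself is missing.
        if not tarball_path.is_file():
            data_dir.mkdir(parents=True, exist_ok=True)
            urllib.request.urlretrieve(url, tarball_path)
        # Extract whenever the CSV is missing, even if the tarball was cached.
        with tarfile.open(tarball_path) as housing_tarball:
            housing_tarball.extractall(path=data_dir)
    return csv_path
```

With this sketch, load_housing_data() could reduce to return pd.read_csv(ensure_housing_csv()). The trade-off is that the CSV is not refreshed from the tarball once it exists.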

@Naseef03

I ran into the same issue. If you delete the housing folder, the code throws an error at the read_csv call.
