Why cloudpathlib?¶
We ❤️ pathlib¶
pathlib is a wonderful tool for working with filesystem paths, available in the Python 3 standard library.
from pathlib import Path
For example, we can easily list all the files in a directory.
list(Path(".").glob("*"))
There are methods to quickly learn everything there is to know about a filesystem path, and even do simple file manipulations.
notebook = Path("why_cloudpathlib.ipynb").resolve()
print(f"{'Path:':15}{notebook}")
print(f"{'Name:':15}{notebook.name}")
print(f"{'Stem:':15}{notebook.stem}")
print(f"{'Suffix:':15}{notebook.suffix}")
print(f"{'With suffix:':15}{notebook.with_suffix('.cpp')}")
print(f"{'Parent:':15}{notebook.parent}")
print(f"{'Read_text:'}\n{notebook.read_text()[:200]}\n")
If you're new to pathlib, we highly recommend it over the older os.path module; we find that it has a much more intuitive and convenient interface. The official documentation is a helpful reference, and we also recommend this excellent cheat sheet by Chris Moffitt.
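To see the difference concretely, here is a small side-by-side sketch (the file names are made up for illustration): the same two operations written first with os.path string functions and then with pathlib.
import os.path
from pathlib import Path

# os.path: paths are plain strings manipulated by module functions
joined = os.path.join("data", "raw", "train.csv")
stem = os.path.splitext(os.path.basename(joined))[0]

# pathlib: one object with readable attributes and operators
path = Path("data") / "raw" / "train.csv"
stem = path.stem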
Cross-platform support¶
One great feature of using pathlib over plain strings is that it lets you write code with cross-platform file paths; it "just works" on Windows too. Write path manipulations that can run on anyone's machine!
path = Path.home()
path
>>> C:\Users\DrivenData
docs = path / 'Documents'
docs
>>> C:\Users\DrivenData\Documents
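You can even preview both platforms' behavior from a single machine using pathlib's pure path classes; a minimal sketch:
from pathlib import PurePosixPath, PureWindowsPath

# The same segments render with the separator appropriate to each platform
print(PurePosixPath("Users", "DrivenData", "Documents"))    # Users/DrivenData/Documents
print(PureWindowsPath("Users", "DrivenData", "Documents"))  # Users\DrivenData\Documents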
We also ❤️ cloud storage¶
This is great, but I live in the future. Not every file I care about is on my machine. What do I do when I am working on S3? Do I have to explicitly download every file before I can do anything with it?
Of course not, if you use cloudpathlib!
# load environment variables from .env file;
# not required, just where we keep our creds
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())
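If you prefer not to rely on environment variables, you can also configure credentials explicitly on a client; a sketch with placeholder values (load real keys from a secure source):
from cloudpathlib import S3Client

# Placeholder credentials for illustration only
client = S3Client(
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
)
client.set_as_default_client()  # S3Path objects created afterward use this client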
from cloudpathlib import S3Path
s3p = S3Path("s3://cloudpathlib-test-bucket/why_cloudpathlib/file.txt")
s3p.name
# Nothing there yet...
s3p.exists()
# Touch (just like with `pathlib.Path`)
s3p.touch()
# Bingo!
s3p.exists()
# list all the files in the directory
list(s3p.parent.iterdir())
stat = s3p.stat()
print(f"File size in bytes: {stat.st_size}")
stat
s3p.write_text("Hello to all of my friends!")
stat = s3p.stat()
print(f"File size in bytes: {stat.st_size}")
stat
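Reading the contents back works just like pathlib too; this should return the text we just wrote:
# Read the file's contents back as a string
s3p.read_text()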
# Delete (again just like with `pathlib.Path`)
s3p.unlink()
s3p.exists()
Cross-cloud support¶
That's cool, but I use Azure Blob Storage. What can I do?
from cloudpathlib import AzureBlobPath
azp = AzureBlobPath("az://cloudpathlib-test-container/file.txt")
azp.name
azp.exists()
azp.write_text("I'm on Azure, boss.")
azp.exists()
# list all the files in the directory
list(azp.parent.iterdir())
azp.exists()
azp.unlink()
Cloud hopping¶
Moving between cloud storage providers should be as simple as moving between disks on your computer. Let's say that the Senior Vice President of Tomfoolery comes to me and says, "We've got a mandate to migrate our application to Azure Blob Storage from S3!"
No problem, if I used cloudpathlib! The CloudPath class constructor automatically dispatches to the appropriate concrete class, the same way that pathlib.Path does for different operating systems.
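You can see the dispatch in action by checking the type the constructor returns; the bucket and container names below are placeholders:
from cloudpathlib import AzureBlobPath, CloudPath, S3Path

# CloudPath picks the concrete class based on the URI scheme,
# just as Path picks WindowsPath or PosixPath based on the OS
assert isinstance(CloudPath("s3://some-bucket/key"), S3Path)
assert isinstance(CloudPath("az://some-container/blob"), AzureBlobPath)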
from cloudpathlib import CloudPath
cloud_directory = CloudPath("s3://cloudpathlib-test-bucket/why_cloudpathlib/")
upload = cloud_directory / "user_upload.txt"
upload.write_text("A user made this file!")
assert upload.exists()
upload.unlink()
assert not upload.exists()
from cloudpathlib import CloudPath
# Changing this root path is the ONLY change!
cloud_directory = CloudPath("az://cloudpathlib-test-container/why_cloudpathlib/")
upload = cloud_directory / "user_upload.txt"
upload.write_text("A user made this file!")
assert upload.exists()
upload.unlink()
assert not upload.exists()
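The same idea scales to application code: if your functions accept a CloudPath (or take the root as configuration) instead of a provider-specific class, a cloud migration reduces to changing that one configured root. A minimal sketch with a hypothetical helper:
from cloudpathlib import CloudPath

def save_user_upload(root: str, name: str, contents: str) -> CloudPath:
    # Hypothetical helper: works unchanged for s3:// or az:// roots
    destination = CloudPath(root) / name
    destination.write_text(contents)
    return destination

# Only the configured root differs between providers
save_user_upload("s3://cloudpathlib-test-bucket/why_cloudpathlib/", "note.txt", "hi")
save_user_upload("az://cloudpathlib-test-container/why_cloudpathlib/", "note.txt", "hi")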