Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StreamingDataset improve deletion strategy #19118

Merged
merged 53 commits into from
Dec 7, 2023
Merged
Changes from 1 commit
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
daf5220
update
Dec 5, 2023
b5b4d34
Merge branch 'master' into improve_speed
tchaton Dec 5, 2023
c0d9164
update
Dec 5, 2023
e627385
Merge branch 'improve_speed' of https://github.com/Lightning-AI/light…
Dec 5, 2023
007b9f9
update
Dec 5, 2023
fd10ed0
update
Dec 5, 2023
7ad008a
update
Dec 5, 2023
1f65326
update
Dec 5, 2023
36a62c9
update
Dec 5, 2023
50d2b6d
update
Dec 5, 2023
257d831
update
Dec 5, 2023
2b82d44
update
tchaton Dec 5, 2023
a318e65
update
tchaton Dec 5, 2023
428a3f5
update
tchaton Dec 5, 2023
1d984d1
update
tchaton Dec 5, 2023
fb412a8
update
tchaton Dec 5, 2023
b785957
update
tchaton Dec 5, 2023
89869f1
update
tchaton Dec 5, 2023
b9c5e53
update
tchaton Dec 5, 2023
734adf6
update
tchaton Dec 5, 2023
7980c05
update
tchaton Dec 5, 2023
9c3acb0
update
Dec 5, 2023
b59c38c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 5, 2023
a5e10b7
update
tchaton Dec 6, 2023
3e452ab
update
tchaton Dec 6, 2023
687bd5c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2023
7ca362c
update
Dec 6, 2023
033feed
update
Dec 6, 2023
e613774
update
tchaton Dec 6, 2023
08292f5
update
tchaton Dec 6, 2023
347db4e
update
Dec 6, 2023
513e554
update
tchaton Dec 6, 2023
d844e35
Merge branch 'improve_speed_2' of https://github.com/Lightning-AI/pyt…
tchaton Dec 6, 2023
5f4d6c0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2023
34269d6
update
tchaton Dec 6, 2023
d5c5d89
update
tchaton Dec 6, 2023
f5f650d
update
tchaton Dec 6, 2023
879473f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 6, 2023
2a3acf7
update
Dec 6, 2023
e3b4389
update
tchaton Dec 6, 2023
771d49c
Merge branch 'improve_speed_2' of https://github.com/Lightning-AI/pyt…
tchaton Dec 6, 2023
282c550
update
tchaton Dec 6, 2023
6956504
update
Dec 6, 2023
49bd18f
update
Dec 6, 2023
838da1b
update
Dec 6, 2023
268b908
update
Dec 6, 2023
f22085f
update
Dec 6, 2023
bbc7887
update
Dec 6, 2023
ee3c288
remove_delete
Dec 6, 2023
3ae48b6
update
Dec 6, 2023
e4c9b72
update
Dec 6, 2023
db16b49
update
Dec 6, 2023
d6ac211
tune the timeout
awaelchli Dec 7, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
update
tchaton committed Dec 5, 2023
commit 89869f13806ac8429b57beb0a9ebee7891d7bd6c
2 changes: 1 addition & 1 deletion src/lightning/data/streaming/cache.py
Original file line number Diff line number Diff line change
@@ -51,7 +51,7 @@ def __init__(
chunk_size: Optional[int] = None,
chunk_bytes: Optional[Union[int, str]] = None,
item_loader: Optional[BaseItemLoader] = None,
max_cache_size: Union[int, str] = "10GB",
max_cache_size: Union[int, str] = "200GB",
serializers: Optional[Dict[str, Serializer]] = None,
):
"""The Cache enables to optimise dataset format for cloud training. This is done by grouping several elements
4 changes: 0 additions & 4 deletions src/lightning/data/streaming/reader.py
Original file line number Diff line number Diff line change
@@ -46,7 +46,6 @@ def __init__(self, config: ChunksConfig, item_loader, max_cache_size: Optional[i
self._to_download_queue: multiprocessing.Queue = multiprocessing.Queue()
self._to_delete_queue: multiprocessing.Queue = multiprocessing.Queue()
self._to_stop_queue: multiprocessing.Queue = multiprocessing.Queue()
self._pre_downloaded = 0

def download(self, chunk_indexes: List[int]) -> None:
"""Receive the list of the chunk indices to download for the current epoch."""
@@ -77,7 +76,6 @@ def _delete_chunks(self):
while (self._max_cache_size and self._chunks_index_to_be_deleted and total >= self._max_cache_size):
self._delete(self._chunks_index_to_be_deleted.pop(0))
total = _get_folder_size(self._parent_cache_dir)
self._pre_downloaded -= 1
else:
self._chunks_index_to_be_deleted.append(chunk_index)
except Empty:
@@ -94,8 +92,6 @@ def run(self) -> None:
try:
chunk_index = self._to_download_queue.get(timeout=0.01)
self._config.download_chunk_from_index(chunk_index)

self._pre_downloaded += 1
except Empty:
pass
except OSError as e: