[UX] Support --tail parameter for sky logs #4241
Nice. One UX thing to discuss is do we want
So far, before this PR we didn't have
Simply adding one line would make -f valid:
@click.option(
    '-f',
    '--follow/--no-follow',
    is_flag=True,
    default=True,
    help=('Follow the logs of a job. '
          'If --no-follow is specified, print the log so far and exit. '
          '[default: --follow]'))
The difference between us and them is that their
Thanks @zpoint!
@Michaelvll please take a look when you have time to proceed with the merge.
Thanks @zpoint for adding this! Left a few comments. We need to fix them before merging for robustness : )
sky/skylet/job_lib.py
Outdated
f'\nif getattr(constants, "SKYLET_LIB_VERSION", 1) > 1: tail_log_kwargs["tail"] = {tail}',
'\nlog_lib.tail_logs(**tail_log_kwargs)',
Do we need these \n's?
Yes, the if statement needs to start on a new line; the Python interpreter requires that.
If we don't start with a new line:
Python 3.11.0 (main, Sep 24 2024, 21:11:22) [Clang 19.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a=1;if a == 1: print(a);
  File "<stdin>", line 1
    a=1;if a == 1: print(a);
        ^^
SyntaxError: invalid syntax
>>>
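To make the point concrete, here is a minimal sketch (not the PR's code) that joins statements with ; the way the generated snippet is described, and checks which variants compile. The statement strings and the helper names compiles and run are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List

# Hypothetical statements, loosely modelled on the generated snippet under review.
PLAIN: List[str] = [
    'tail_log_kwargs = {"follow": True}',
    'lib_version = 2',
    'if lib_version > 1: tail_log_kwargs["tail"] = 10',
    'result = dict(tail_log_kwargs)',
]

# Same statements, but the `if` and the statement after it start on a new line.
WITH_NEWLINES: List[str] = PLAIN[:2] + ['\n' + stmt for stmt in PLAIN[2:]]


def compiles(code: str) -> bool:
    """Return True if `code` is syntactically valid Python."""
    try:
        compile(code, '<generated>', 'exec')
    except SyntaxError:
        return False
    return True


def run(code: str) -> Dict[str, Any]:
    """Execute `code` in a fresh namespace and return that namespace."""
    namespace: Dict[str, Any] = {}
    exec(code, namespace)
    return namespace


naive = ';'.join(PLAIN)              # `if` follows a `;` on the same line
generated = ';'.join(WITH_NEWLINES)  # `if` starts its own line

print(compiles(naive))
print(compiles(generated))
print(run(generated)['result'])
```

The newline in front of the statement that follows the if matters as well: without it, that statement would sit after a ; on the if line and become part of the if body, so it would only run when the condition is true.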
I see! I just want to make sure these \n's won't cause issues. If \n works, maybe we should change the code generation to emit a multi-line script instead of joining the statements with ;
The multiline script requires writing a new _build method and a new _PREFIX, which causes code duplication. I would prefer to keep it as it is and replace the '\n' with a macro.
Thanks for the update @zpoint! It seems to me that the current handling does not cover all the cases. See comments below
Co-authored-by: Zhanghao Wu <[email protected]>
Thanks for the update @zpoint! Left a concern about the memory consumption.
sky/skylet/log_lib.py
Outdated
if tail > 0:
    current_file_lines = log_file.readlines()
    lines = collections.deque(current_file_lines, maxlen=tail)
    start_stream = _should_start_streaming(current_file_lines,
                                           lines, start_stream_at)
    for line in lines:
        if start_stream_at in line:
            start_stream = True
        if start_stream:
            print(line, end='')
    # Flush the last n lines
    print(end='', flush=True)
    # Now, the cursor is at the end of the last lines
    # if tail > 0
I found this a bit overcomplicated. Since we are already processing all the lines of the file, why can't we just do something like the following:
def _print_lines_after_start_stream_at(log_content, start_stream_at, tail) -> bool:
    # Look for the start marker within the first few lines of the log.
    for line_idx, line in enumerate(log_content[:PEEK_HEAD_LINES_FOR_START_STREAM]):
        if start_stream_at in line:
            break
    else:
        return False
    # If the tail window begins at or after the marker, stream from its first line.
    tail_start_idx = len(log_content) - tail
    start_stream = tail_start_idx >= line_idx
    log_content = log_content[-tail:]
    for line in log_content:
        if start_stream_at in line:
            start_stream = True
        if start_stream:
            print(line, end='')
    print(end='', flush=True)
    return start_stream
-if tail > 0:
-    current_file_lines = log_file.readlines()
-    lines = collections.deque(current_file_lines, maxlen=tail)
-    start_stream = _should_start_streaming(current_file_lines,
-                                           lines, start_stream_at)
-    for line in lines:
-        if start_stream_at in line:
-            start_stream = True
-        if start_stream:
-            print(line, end='')
-    # Flush the last n lines
-    print(end='', flush=True)
-    # Now, the cursor is at the end of the last lines
-    # if tail > 0
+if tail > 0:
+    start_stream = _print_lines_after_start_stream_at(log_file.readlines(), start_stream_at, tail)
+    # Now, the cursor is at the end of the last lines
+    # if tail > 0
Since we can't load the whole file into memory, I changed the approach.
Can't resolve this comment.
sky/skylet/log_lib.py
Outdated
# If tail > 0, we need to read the last n lines.
# We use double ended queue to rotate the last n lines.
current_file_lines = f.readlines()
lines = collections.deque(current_file_lines, maxlen=tail)
start_stream = _should_start_streaming(
    current_file_lines, lines, start_stream_at)
With the function above this will become:
-# If tail > 0, we need to read the last n lines.
-# We use double ended queue to rotate the last n lines.
-current_file_lines = f.readlines()
-lines = collections.deque(current_file_lines, maxlen=tail)
-start_stream = _should_start_streaming(
-    current_file_lines, lines, start_stream_at)
+# If tail > 0, we need to read the last n lines.
+# We use double ended queue to rotate the last n lines.
+_print_lines_after_start_stream_at(log_file.readlines(), start_stream_at, tail)
+return
Same reason as above: we no longer have the whole file in memory.
sky/skylet/log_lib.py
Outdated
@@ -440,15 +476,39 @@ def tail_logs(job_id: Optional[int],
         with open(log_path, 'r', newline='', encoding='utf-8') as log_file:
             # Using `_follow` instead of `tail -f` to streaming the whole
             # log and creating a new process for tail.
             if tail > 0:
                 current_file_lines = log_file.readlines()
                 lines = collections.deque(current_file_lines, maxlen=tail)
Food for thought: I think we originally used deque to avoid large memory consumption for a large log file, but here we read the whole file into memory, so the memory consumption will still be high. Can we think of a way to avoid large memory consumption from log files? (We have had users encounter large memory consumption on the controller.)
There are two ways:
- Can we avoid reading the whole file, whether or not tail is provided?
- If needed, we can also try to switch to tail -f / tail -f -n and find some way to handle the start_stream_at
Yes, we use deque and iterable readlines() to save memory.
I guess I should keep my "over complicated" implementation with cases 1, 2, and 3, using only a lookahead of the log file instead of consuming the whole file.
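As a rough illustration of the memory concern discussed here (a sketch only, not the implementation that was merged), a bounded deque can be fed one line at a time while remembering whether the start marker has already scrolled out of the window. The function name tail_after_marker and its return-a-list interface are hypothetical; start_stream_at and tail mirror the names used in the review.

```python
import collections
import io
from typing import Deque, Iterable, List


def tail_after_marker(log_file: Iterable[str], start_stream_at: str,
                      tail: int) -> List[str]:
    """Return the last `tail` lines that come at or after the marker line.

    Lines are consumed one at a time, so at most `tail` lines are held
    in memory regardless of the size of the log.
    """
    if tail <= 0:
        raise ValueError('tail must be positive in this sketch')
    window: Deque[str] = collections.deque(maxlen=tail)
    start_stream = False
    for line in log_file:
        # The oldest line is about to be evicted; remember if it was the marker.
        if len(window) == tail and start_stream_at in window[0]:
            start_stream = True
        window.append(line)
    selected: List[str] = []
    for line in window:
        if start_stream_at in line:
            start_stream = True
        if start_stream:
            selected.append(line)
    return selected


# Usage with an in-memory stand-in for a log file.
log = io.StringIO('setup\nSTART\na\nb\nc\n')
print(tail_after_marker(log, 'START', tail=2))
```

Passing the open file object itself, rather than the result of readlines(), is what keeps memory bounded, because iterating a file yields lines lazily.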
Thanks @zpoint for fixing all those comments! It looks good to me now.
Minor comments before we merge it. :)
Trying to add this feature: #4222
Config file:
Launch command:
log command 1:
log command 2:
Tested (run the relevant ones):
bash format.sh
pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh