[UX] Support --tail parameter for sky logs #4241

zpoint · 2024-11-01T09:59:05Z

Trying to add this feature: #4222

Config file:

(sky) ➜  skypilot git:(dev/zeping/support_tail_n) cat ~/Desktop/hello-sky/long_log.yaml 
# SkyPilot YAML configuration for a cluster with one node
workdir: .

resources:
  cloud: aws
  instance_type: t3.small

num_nodes: 1

setup: |
  echo "Running setup."

run: |
  i=1
  while [ $i -le 1000 ]; do
    echo "This is test log line number $i"
    i=$((i + 1))
  done

Launch command:

# Below are the output from this branching: sky/skylet/log_lib.py:436    if follow and status in [job_lib.JobStatus.SETTING_UP, job_lib.JobStatus.PENDING, job_lib.JobStatus.RUNNING, ]:
sky launch -c long_log_small ~/Desktop/hello-sky/long_log.yaml

...
(task, pid=1789) This is test log line number 985
(task, pid=1789) This is test log line number 986
(task, pid=1789) This is test log line number 987
(task, pid=1789) This is test log line number 988
(task, pid=1789) This is test log line number 989
(task, pid=1789) This is test log line number 990
(task, pid=1789) This is test log line number 991
(task, pid=1789) This is test log line number 992
(task, pid=1789) This is test log line number 993
(task, pid=1789) This is test log line number 994
(task, pid=1789) This is test log line number 995
(task, pid=1789) This is test log line number 996
(task, pid=1789) This is test log line number 997
(task, pid=1789) This is test log line number 998
(task, pid=1789) This is test log line number 999
(task, pid=1789) This is test log line number 1000
✓ Job finished (status: SUCCEEDED).

📋 Useful Commands
Job ID: 1
├── To cancel the job:          sky cancel long_log_small 1
├── To stream job logs:         sky logs long_log_small 1
└── To view job queue:          sky queue long_log_small

Cluster name: long_log_small
├── To log into the head VM:    ssh long_log_small
├── To submit a job:            sky exec long_log_small yaml_file
├── To stop the cluster:        sky stop long_log_small
└── To teardown the cluster:    sky down long_log_small

log command 1:

(sky) ➜  skypilot git:(dev/zeping/support_tail_n) ✗ sky logs long_log_small 1 --tail 10
Tailing logs of job 1 on cluster 'long_log_small'...
(task, pid=1789) This is test log line number 991
(task, pid=1789) This is test log line number 992
(task, pid=1789) This is test log line number 993
(task, pid=1789) This is test log line number 994
(task, pid=1789) This is test log line number 995
(task, pid=1789) This is test log line number 996
(task, pid=1789) This is test log line number 997
(task, pid=1789) This is test log line number 998
(task, pid=1789) This is test log line number 999
(task, pid=1789) This is test log line number 1000

log command 2:

(sky) ➜  skypilot git:(dev/zeping/support_tail_n) ✗ sky logs long_log_small 1 --tail 20
Tailing logs of job 1 on cluster 'long_log_small'...
(task, pid=1789) This is test log line number 981
(task, pid=1789) This is test log line number 982
(task, pid=1789) This is test log line number 983
(task, pid=1789) This is test log line number 984
(task, pid=1789) This is test log line number 985
(task, pid=1789) This is test log line number 986
(task, pid=1789) This is test log line number 987
(task, pid=1789) This is test log line number 988
(task, pid=1789) This is test log line number 989
(task, pid=1789) This is test log line number 990
(task, pid=1789) This is test log line number 991
(task, pid=1789) This is test log line number 992
(task, pid=1789) This is test log line number 993
(task, pid=1789) This is test log line number 994
(task, pid=1789) This is test log line number 995
(task, pid=1789) This is test log line number 996
(task, pid=1789) This is test log line number 997
(task, pid=1789) This is test log line number 998
(task, pid=1789) This is test log line number 999
(task, pid=1789) This is test log line number 1000

Tested (run the relevant ones):

Code formatting: bash format.sh
Any manual or new tests for this PR (please specify below)
All smoke tests: pytest tests/test_smoke.py
Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
Backward compatibility tests: conda deactivate; bash -i tests/backward_compatibility_tests.sh

concretevitamin · 2024-11-01T15:29:58Z

Nice. One UX thing to discuss: do we want to support -f, to be compatible with existing tools?

Before this PR we didn't have -f, because following is implicit. Maybe it's ok.

zpoint · 2024-11-01T15:47:08Z

Simply adding one line would make -f valid.

@click.option(
    '-f',
    '--follow/--no-follow',
    is_flag=True,
    default=True,
    help=('Follow the logs of a job. '
          'If --no-follow is specified, print the log so far and exit. '
          '[default: --follow]'))

The difference between us and them is that their -f defaults to false, while our -f defaults to true.
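
To make the default-on behaviour concrete without depending on click, here is a minimal sketch that uses the standard library's argparse.BooleanOptionalAction (Python 3.9+) instead. This is a swapped-in technique and not the code in sky/cli.py; the build_parser name and the logs-demo program name are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser, for illustration only; SkyPilot's CLI uses click.
    parser = argparse.ArgumentParser(prog='logs-demo')
    # BooleanOptionalAction derives --no-follow automatically from --follow.
    parser.add_argument('-f', '--follow',
                        action=argparse.BooleanOptionalAction,
                        default=True,
                        help='Follow the logs of a job.')
    return parser


parser = build_parser()
# No flag given: following is the default.
print(parser.parse_args([]).follow)
# -f is accepted, but it only restates the default.
print(parser.parse_args(['-f']).follow)
# --no-follow is the only spelling that changes behaviour.
print(parser.parse_args(['--no-follow']).follow)
```

With a default of true, -f exists mainly so that habits from other tools keep working; the flag that actually changes anything is --no-follow.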

sky/cli.py

yika-luo

Thanks @zpoint!

zpoint · 2024-11-04T05:50:54Z

@Michaelvll please take a look when you have time to proceed with the merge.

Michaelvll

Thanks @zpoint for adding this! Left a few comments. We need to fix them before merging for robustness : )

sky/skylet/log_lib.py

sky/skylet/job_lib.py

sky/skylet/log_lib.py

sky/skylet/job_lib.py

Michaelvll · 2024-11-06T03:11:25Z

sky/skylet/job_lib.py

+            f'\nif getattr(constants, "SKYLET_LIB_VERSION", 1) > 1: tail_log_kwargs["tail"] = {tail}',
+            '\nlog_lib.tail_logs(**tail_log_kwargs)',


Do we need these \n's?

Yes, the if statement needs to start on a new line; the Python interpreter requires that.

If we don't start with a new line:

Python 3.11.0 (main, Sep 24 2024, 21:11:22) [Clang 19.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a=1;if a == 1: print(a);
  File "<stdin>", line 1
    a=1;if a == 1: print(a);
        ^^
SyntaxError: invalid syntax
>>>

I see! Just want to make sure these \n's won't cause issues. If \n works, maybe we should just change the code generation to emit a multi-line script instead of joining the statements with ;

The multiline script requires writing a new _build method and a new _PREFIX, which causes code duplication. I would prefer to keep it as it is and replace the '\n' with a macro.
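
The point about \n can be checked directly. The sketch below assumes a generator that joins statement strings with '; ' (the build_script and compiles helpers are hypothetical stand-ins, not SkyPilot functions) and shows that a compound statement such as if only compiles when it starts on a new line.

```python
from typing import List


def build_script(parts: List[str]) -> str:
    # Hypothetical stand-in for a generator that joins statements with '; '.
    return '; '.join(parts)


def compiles(source: str) -> bool:
    # True if Python accepts the source as a module body.
    try:
        compile(source, '<generated>', 'exec')
        return True
    except SyntaxError:
        return False


tail = 10
without_newline = build_script([
    'tail_log_kwargs = {}',
    f'if True: tail_log_kwargs["tail"] = {tail}',
])
with_newline = build_script([
    'tail_log_kwargs = {}',
    f'\nif True: tail_log_kwargs["tail"] = {tail}',
    '\nresult = dict(tail_log_kwargs)',
])

# An `if` directly after ';' is rejected; after a newline it is accepted.
print(compiles(without_newline))
print(compiles(with_newline))

namespace = {}
exec(with_newline, namespace)
print(namespace['result'])
```

The trailing '; ' left at the end of each line by the join is harmless, because Python allows a simple statement to end with a semicolon.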

sky/skylet/log_lib.py

Michaelvll

Thanks for the update @zpoint! It seems to me that the current handling does not cover all the cases. See comments below

sky/skylet/job_lib.py

Michaelvll · 2024-11-08T01:07:20Z

sky/skylet/job_lib.py

+            f'\nif getattr(constants, "SKYLET_LIB_VERSION", 1) > 1: tail_log_kwargs["tail"] = {tail}',
+            '\nlog_lib.tail_logs(**tail_log_kwargs)',


I see! Just want to make sure these \n's won't cause issues. If \n works, maybe we should just change the code generation to emit a multi-line script instead of joining the statements with ;

sky/skylet/log_lib.py

Co-authored-by: Zhanghao Wu <[email protected]>

Michaelvll

Thanks for the update @zpoint! Left a concern with the memory consumption

sky/skylet/log_lib.py

Michaelvll · 2024-11-08T08:20:04Z

sky/skylet/log_lib.py

+            if tail > 0:
+                current_file_lines = log_file.readlines()
+                lines = collections.deque(current_file_lines, maxlen=tail)
+                start_stream = _should_start_streaming(current_file_lines,
+                                                       lines, start_stream_at)
+                for line in lines:
+                    if start_stream_at in line:
+                        start_stream = True
+                    if start_stream:
+                        print(line, end='')
+                # Flush the last n lines
+                print(end='', flush=True)
+            # Now, the cursor is at the end of the last lines
+            # if tail > 0


I found this a bit overcomplicated. Since we are already processing all the lines of the file, why can't we just do something like the following:

def _print_lines_after_start_stream_at(log_content, start_stream_at, tail) -> bool:
    # Only peek at the head of the log to find the start marker.
    for line_idx, line in enumerate(log_content[:PEEK_HEAD_LINES_FOR_START_STREAM]):
        if start_stream_at in line:
            break
    else:
        return False
    # If the tail window begins at or after the marker, stream immediately.
    tail_start_idx = len(log_content) - tail
    start_stream = tail_start_idx >= line_idx
    log_content = log_content[-tail:]
    for line in log_content:
        if start_stream_at in line:
            start_stream = True
        if start_stream:
            print(line, end='')
    print(end='', flush=True)
    return start_stream

Suggested change

-            if tail > 0:
-                current_file_lines = log_file.readlines()
-                lines = collections.deque(current_file_lines, maxlen=tail)
-                start_stream = _should_start_streaming(current_file_lines,
-                                                       lines, start_stream_at)
-                for line in lines:
-                    if start_stream_at in line:
-                        start_stream = True
-                    if start_stream:
-                        print(line, end='')
-                # Flush the last n lines
-                print(end='', flush=True)
-            # Now, the cursor is at the end of the last lines
-            # if tail > 0
+            if tail > 0:
+                start_stream = _print_lines_after_start_stream_at(log_file.readlines(), start_stream_at, tail)
+            # Now, the cursor is at the end of the last lines
+            # if tail > 0

Since we can't load the whole file into memory, I changed the approach.
I can't resolve this comment.

Michaelvll · 2024-11-08T08:21:06Z

sky/skylet/log_lib.py

+                    # If tail > 0, we need to read the last n lines.
+                    # We use double ended queue to rotate the last n lines.
+                    current_file_lines = f.readlines()
+                    lines = collections.deque(current_file_lines, maxlen=tail)
+                    start_stream = _should_start_streaming(
+                        current_file_lines, lines, start_stream_at)


With the function above this will become

Suggested change

-                    # If tail > 0, we need to read the last n lines.
-                    # We use double ended queue to rotate the last n lines.
-                    current_file_lines = f.readlines()
-                    lines = collections.deque(current_file_lines, maxlen=tail)
-                    start_stream = _should_start_streaming(
-                        current_file_lines, lines, start_stream_at)
+                    # If tail > 0, we need to read the last n lines.
+                    # We use double ended queue to rotate the last n lines.
+                    _print_lines_after_start_stream_at(log_file.readlines(), start_stream_at, tail)
+                    return

Same reason as above: we no longer hold the whole file in memory.

Michaelvll · 2024-11-08T08:24:34Z

sky/skylet/log_lib.py

@@ -440,15 +476,39 @@ def tail_logs(job_id: Optional[int],
        with open(log_path, 'r', newline='', encoding='utf-8') as log_file:
            # Using `_follow` instead of `tail -f` to streaming the whole
            # log and creating a new process for tail.
+            if tail > 0:
+                current_file_lines = log_file.readlines()
+                lines = collections.deque(current_file_lines, maxlen=tail)


One food for thought: I think the original purpose of using deque was to avoid high memory consumption for a large log file, but here we read the whole file into memory, so memory consumption will still be high. Can we think of a way to avoid high memory consumption from log files? (We have had users run into high memory consumption on the controller.)

There are two ways:

1. Can we avoid reading the whole file, whether or not tail is provided?

2. If needed, we can also try switching to tail -f / tail -f -n and find some way to handle start_stream_at.

Yes, we use deque and iterable readlines() to save memory.
I guess I should keep my over complicated implementation with cases 1, 2, and 3, with only a lookahead of the log file instead of consuming the whole file.
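
The memory concern can be illustrated with a small sketch. It is not the merged implementation (the PR peeks at the head of the file for the marker); it is a single-pass variant under the assumption that the log is consumed as a lazy iterator of lines, so only the last tail lines are ever held in memory. The last_lines_after_marker name is hypothetical.

```python
import collections
import io
from typing import Deque, Iterable, List, Optional, Tuple


def last_lines_after_marker(lines: Iterable[str], marker: str,
                            tail: int) -> List[str]:
    """Return up to `tail` trailing lines, dropping any before `marker`."""
    # A bounded deque keeps at most `tail` (index, line) pairs in memory.
    window: Deque[Tuple[int, str]] = collections.deque(maxlen=tail)
    marker_idx: Optional[int] = None
    for idx, line in enumerate(lines):
        if marker_idx is None and marker in line:
            marker_idx = idx
        window.append((idx, line))
    if marker_idx is None:
        # The marker never appeared, so nothing should be streamed yet.
        return []
    return [line for idx, line in window if idx >= marker_idx]


# Iterating a file object (or StringIO) yields one line at a time,
# unlike readlines(), which materializes the whole file as a list.
log = io.StringIO(''.join(['setup output\n', 'START\n'] +
                          [f'line {i}\n' for i in range(1, 6)]))
print(''.join(last_lines_after_marker(log, 'START', 2)), end='')
```

The file is still scanned once from start to end, so this bounds memory but not read time; seeking backwards from the end of the file would be needed to avoid the full scan.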

sky/skylet/log_lib.py

Michaelvll

Thanks @zpoint for fixing all those comments! It looks good to me now.

sky/skylet/log_lib.py

Michaelvll

Minor comments before we merge it. :)

sky/skylet/log_lib.py

support -n parameter for sky logs

345a124

zpoint changed the title ~~support -n parameter for sky logs~~ [UX]support -n parameter for sky logs Nov 1, 2024

zpoint added 3 commits November 1, 2024 18:03

format

5e6a704

fix format

b22f6e8

format fix again

7666fa3

fix pylint

5c3c8b6

fix format

09b802f

yika-luo reviewed Nov 1, 2024

sky/cli.py

zpoint added 3 commits November 2, 2024 11:18

rename the -n to --tail

a4070a1

fix format

323a685

remove -n

2aee245

zpoint requested a review from yika-luo November 2, 2024 03:32

yika-luo approved these changes Nov 2, 2024

Michaelvll requested changes Nov 5, 2024

sky/skylet/log_lib.py

sky/skylet/job_lib.py

sky/skylet/log_lib.py

sky/skylet/log_lib.py

zpoint added 5 commits November 5, 2024 17:18

resolve comment

b6a7d2f

backward compatiability

c4fe5e2

pass yapf

7f0225a

backward compatiability

d5c13ec

format

1f00be8

zpoint requested a review from Michaelvll November 5, 2024 10:55

yapf

5d5a815

Michaelvll reviewed Nov 6, 2024

restore change and add comment

da6cbe7

Michaelvll reviewed Nov 8, 2024

zpoint and others added 2 commits November 8, 2024 10:21

moving the comment closer to the place

7a69541

Co-authored-by: Zhanghao Wu <[email protected]>

reslove comment

06ce10c

zpoint requested a review from Michaelvll November 8, 2024 04:25

Michaelvll reviewed Nov 8, 2024

peek the head instead of loading the whole file to memory

8c10aa0

zpoint requested a review from Michaelvll November 8, 2024 09:59

Michaelvll reviewed Nov 8, 2024

sky/skylet/log_lib.py

zpoint added 2 commits November 8, 2024 18:09

bug fix

2d1d854

rephrase function name and comment

7776f75

zpoint requested a review from Michaelvll November 8, 2024 10:53

fix

a768d15

Michaelvll approved these changes Nov 8, 2024

sky/skylet/log_lib.py

sky/skylet/log_lib.py

sky/skylet/log_lib.py

remove readlines

a4cede6

Michaelvll approved these changes Nov 9, 2024

sky/skylet/log_lib.py

sky/skylet/log_lib.py

reslove comment

e0c9a29

Michaelvll changed the title ~~[UX]support -n parameter for sky logs~~ [UX] Support --tail parameter for sky logs Nov 9, 2024

Michaelvll added this pull request to the merge queue Nov 9, 2024

Merged via the queue into skypilot-org:master with commit 42c79e1 Nov 9, 2024
20 checks passed

Michaelvll mentioned this pull request Nov 10, 2024

[UX] sky logs should be able to tail the last lines of the logs instead of showing all logs #4222

Closed
