line_profiler orders of magnitude slower when long list with large numbers present in function #82
Comments
output of line_profiler
without line:
|
Interesting behavior.
|
Python 3.7.7. Moving the line outside the function being profiled also shows no slowdown. Trust me, I tried every combination I could think of while trying to isolate the issue. |
This is very strange. I've reproduced the issue. The bytecode also shows no noticeable difference:
I'm not sure if this is a line-profiler bug, but it is certainly peculiar. |
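For anyone who wants to repeat the bytecode comparison, here is a minimal sketch using the standard `dis` module. It is not the snippet used in the comment above: `build_function` and the variant names are made up for illustration, and the function body is only assumed to match the shape of the reproduction in this issue. Instruction counts depend on the Python version, so no output is shown.

```python
import dis
import random


def build_function(body_line):
    """Compile a small function that contains ``body_line`` as a bare statement.

    Hypothetical helper: the body mimics the shape of the reproduction in this
    issue (a short loop followed by a hard-coded constant line).
    """
    source = (
        "def main():\n"
        "    for i in range(500):\n"
        "        9**0.5\n"
        f"    {body_line}\n"
        "    return\n"
    )
    namespace = {}
    exec(compile(source, "<issue_82_sketch>", "exec"), namespace)
    return namespace["main"]


rng = random.Random(0)
numbers = [rng.randint(0, 12884571924951672) for _ in range(1000)]

# Variant names are illustrative; "pass" stands in for "no constant line".
variants = {
    "nothing": "pass",
    "list_numbers": repr(numbers),
    "tuple_numbers": repr(tuple(numbers)),
}

for name, line in variants.items():
    func = build_function(line)
    code = func.__code__
    n_instructions = sum(1 for _ in dis.get_instructions(func))
    print(
        f"{name:>14}: {n_instructions} instructions, "
        f"{len(code.co_code)} bytes of bytecode, "
        f"{len(code.co_consts)} constants"
    )
```

Calling `dis.dis(build_function(...))` on each variant prints the full disassembly if the summary counts are not enough.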
Any updates? |
No updates, but we do expect profiling to add some amount of overhead to the code. This does seem to be on the high side. My guess is that this has to do with how Python represents the function, and perhaps reference counting. I wonder if it also happens if you make this a dictionary; my guess is that it will show the same slow behavior. I'm going to try some tests. |
I've written a script to help test this in a more principled way. The testing script:

"""
Requirements:
    pip install ubelt kwplot line_profiler rich pandas
"""
import ubelt as ub


def main():
    import random
    rng = random.Random(8675309)
    test_dpath = ub.Path.appdir('line_profiler/tests/test_issue_82').ensuredir()
    results = []

    template = ub.codeblock(
        '''
        #from timeit import default_timer as time
        from line_profiler import profile

        @profile
        def main():
            #thisTimer = time()
            for i in range(500):
                9**0.5
            #print('thisTimer', (round(1/(time()-thisTimer),1)if time()-thisTimer else 999), round(time()-thisTimer,6))
            {const_line}
            return

        if __name__ == '__main__':
            main()
        ''')

    for size in [10, 300, 500, 1000, 1_500, 5_000, 10_000]:
        list_numbers = [rng.randint(0, 12884571924951672) for _ in range(size)]
        dict_numbers = {k: k for k in list_numbers}
        list_small_numbers = [rng.randint(0, 10) for _ in range(size)]
        list_same_number = [12884571924951672] * size

        variants = {
            'nothing': '',
            'list_numbers': repr(list_numbers),
            'comment_list_numbers': '# ' + repr(list_numbers),
            'dict_numbers': repr(dict_numbers),
            'list_small_numbers': repr(list_small_numbers),
            'list_same_number': repr(list_same_number),
            'tuple_numbers': repr(tuple(list_numbers)),
        }

        for name, const_line in variants.items():
            script_fpath = test_dpath / f'{name}.py'
            text = template.format(const_line=const_line)
            script_fpath.write_text(text)

            for LINE_PROFILE in [0, 1]:
                with ub.Timer() as t:
                    ub.cmd(f'LINE_PROFILE={LINE_PROFILE} python {script_fpath}', verbose=3, shell=True)
                row = {
                    'name': name,
                    'size': size,
                    'script_fpath': script_fpath,
                    'LINE_PROFILE': LINE_PROFILE,
                    'duration': t.elapsed,
                }
                results.append(row)

    import pandas as pd
    import rich
    table = pd.DataFrame(results)
    piv = table.pivot(index=['name', 'size'], columns=['LINE_PROFILE'], values=['duration'])
    rich.print(table.to_string())
    rich.print(piv.to_string())

    import kwplot
    sns = kwplot.autosns()
    ax = sns.lineplot(data=table, x='size', y='duration', hue='LINE_PROFILE', style='name')
    ax.set_yscale('log')
    kwplot.show_if_requested()


if __name__ == '__main__':
    """
    CommandLine:
        python ~/code/line_profiler/dev/devcheck/generate_examples_issue_82.py
    """
    main()

This will make a copy of the above script, but put in different variants of the hard-coded constant and time them with and without line profiler active. Interestingly I'm not seeing a difference on the current machine I'm using with line profiler 4.1.4, but on my main machine I do see a difference with 4.2.0.

Graph from main machine: (image not preserved)

Graph from current machine: (image not preserved)

EDIT: I've changed testing script to test more sizes and use a log scale and found a clear pattern: |
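To read the timings as a slowdown factor instead of two raw durations, the rows collected by the script can be reduced to a profiled/unprofiled ratio per `(name, size)`. A minimal sketch, assuming rows shaped like the `row` dicts in the testing script; `slowdown_ratios` is a hypothetical helper, and the example rows are placeholders, not measurements.

```python
from collections import defaultdict


def slowdown_ratios(rows):
    """Map (name, size) -> duration with LINE_PROFILE=1 / duration with LINE_PROFILE=0.

    Hypothetical helper; expects dicts with the keys used by the testing
    script: 'name', 'size', 'LINE_PROFILE' and 'duration'.
    """
    durations = defaultdict(dict)
    for row in rows:
        key = (row["name"], row["size"])
        durations[key][row["LINE_PROFILE"]] = row["duration"]

    ratios = {}
    for key, by_flag in durations.items():
        # Only report pairs where both runs exist and the baseline is non-zero.
        if 0 in by_flag and 1 in by_flag and by_flag[0] > 0:
            ratios[key] = by_flag[1] / by_flag[0]
    return ratios


# Placeholder rows for illustration only; these are NOT measured values.
example_rows = [
    {"name": "demo", "size": 10, "LINE_PROFILE": 0, "duration": 2.0},
    {"name": "demo", "size": 10, "LINE_PROFILE": 1, "duration": 3.0},
]
print(slowdown_ratios(example_rows))  # {('demo', 10): 1.5}
```

Because the script times a whole `python` subprocess, each duration includes interpreter startup and imports, so these ratios understate the slowdown inside the profiled function itself.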
Quite a weird bug that I can't seem to wrap my head around...
This is the code to recreate:
When there's a long list with large numbers, the timing returns 0.01s, but without the list it's 0.0005s.
This slowdown doesn't happen if I comment out the list, make the list a tuple, combine the numbers into one large number (in case it was due to the long line), split the list into multiple lists, make all the numbers the same, or make any other change.
It just seems to not like this specific case.
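The original code block did not survive here, so below is a rough sketch of a reproduction with the same shape, based on the template in the testing script from the comments. Treat the details as assumptions: `make_main` is a made-up helper, the list is generated instead of hard-coded, and the timing is returned instead of printed. It falls back to a no-op `profile` if `line_profiler` is not installed, in which case it only gives the unprofiled baseline.

```python
import random
from timeit import default_timer as time

try:
    # Explicit decorator from line_profiler; profiling is enabled, for
    # example, by setting LINE_PROFILE=1 as the testing script does.
    from line_profiler import profile
except ImportError:
    # Fallback so the sketch still runs without line_profiler installed.
    def profile(func):
        return func


def make_main(const_line):
    """Build a profiled function with ``const_line`` as a bare statement.

    Hypothetical helper: the body is an assumption about the shape of the
    original reproduction, not a copy of it.
    """
    source = (
        "@profile\n"
        "def main():\n"
        "    start = time()\n"
        "    for i in range(500):\n"
        "        9**0.5\n"
        "    elapsed = time() - start\n"
        f"    {const_line}\n"
        "    return elapsed\n"
    )
    namespace = {"profile": profile, "time": time}
    exec(compile(source, "<issue_82_repro>", "exec"), namespace)
    return namespace["main"]


rng = random.Random(8675309)
numbers = [rng.randint(0, 12884571924951672) for _ in range(1000)]

# "pass" stands in for the run without the list.
variants = {
    "nothing": "pass",
    "list_numbers": repr(numbers),
    "tuple_numbers": repr(tuple(numbers)),
}

for name, const_line in variants.items():
    main = make_main(const_line)
    print(f"{name:>14}: loop took {main():.6f}s")
```

Run it once normally and once with `LINE_PROFILE=1` in the environment to compare the loop timings across the variants.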