This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Large Tensor] Implemented LT flag for OpPerf testing #17449

Merged: 59 commits, Feb 29, 2020
629b3cd
Passing large_tensor parameter down
connorgoggins Jan 23, 2020
61732f8
Adding large tensor testing functionality for convolutional operators
connorgoggins Jan 23, 2020
8e6c0cf
Added large tensor test functionality for conv ops
connorgoggins Jan 23, 2020
dcb6daf
Fixing sizing for conv ops
connorgoggins Jan 23, 2020
b2a7cf7
Added gemm large tensor, print on conv
connorgoggins Jan 23, 2020
59f0960
Updated input for gemm ops and print statements
connorgoggins Jan 23, 2020
052aa9b
Fixed deconv large tensor test
connorgoggins Jan 24, 2020
1a0c684
Added bias for deconv
connorgoggins Jan 24, 2020
a856579
Added test functionality for nn_activation and nn_basic ops
connorgoggins Jan 24, 2020
4974184
Fixed deconv bias, implemented large tensor test logic for general op…
connorgoggins Jan 24, 2020
0ce25f2
Dropped unnecessary print statements
connorgoggins Jan 27, 2020
9690022
Fixed lint errors
connorgoggins Jan 27, 2020
75967c2
Added large_tensor parameter to existing function descriptions, added…
connorgoggins Jan 29, 2020
f40ff81
Adding docs, changed large_tensor to int64_tensor for clarity
connorgoggins Feb 12, 2020
f7cf931
Added warmup/runs to gemm ops, debugging process failure
connorgoggins Feb 12, 2020
edc7e18
Resolved merge conflicts, added default params and input switching fun…
connorgoggins Feb 25, 2020
537fbee
Dynamic input handling for default inputs, additional custom data for…
connorgoggins Feb 25, 2020
f28b3cb
Fixed RPD issue
connorgoggins Feb 25, 2020
03af393
Everything through reduction ops working
connorgoggins Feb 27, 2020
1af1d3d
Random sampling & loss ops working
connorgoggins Feb 27, 2020
d91dca8
Added indices, depth, ravel_data in default_params
connorgoggins Feb 27, 2020
729d0d0
Added indexing ops - waiting for merge on ravel
connorgoggins Feb 27, 2020
6e4c7f8
Added optimizer ops
connorgoggins Feb 27, 2020
5121d8c
Resolved merge conflicts on optimizer ops
connorgoggins Feb 27, 2020
9848f96
All misc ops working
connorgoggins Feb 28, 2020
6e802e1
Resolved merge conflicts while adding misc ops
connorgoggins Feb 28, 2020
a3fd429
All NN Basic ops working
connorgoggins Feb 28, 2020
d53388f
Fixed merge conflict on adding NN Basic ops
connorgoggins Feb 28, 2020
9689b7d
Fixed LT input for ROIPooling
connorgoggins Feb 28, 2020
c2a624a
Refactored NN Conv tests
connorgoggins Feb 28, 2020
ea96ad5
Added test for inline optimizer ops
connorgoggins Feb 28, 2020
06288f0
Dropping extra tests to decrease execution time
connorgoggins Feb 28, 2020
a2a1bf1
Switching to inline tests for RNN to support additional modes
connorgoggins Feb 28, 2020
7387f68
Added state_cell as NDArray param, removed linalg testing for int64 t…
connorgoggins Feb 28, 2020
f7bc841
Cleaned up styling
connorgoggins Feb 28, 2020
1927066
Fixed conv and deconv tests
connorgoggins Feb 28, 2020
cbb276d
Merge branch 'opperf_large_tensor_flag' of https://github.com/connorg…
connorgoggins Feb 28, 2020
52e0aea
Retrigger CI for continuous build
connorgoggins Feb 28, 2020
fec6fb2
Cleaned up GEMM op inputs
connorgoggins Feb 28, 2020
256ad70
Dropped unused param from default_params
connorgoggins Feb 28, 2020
8 changes: 5 additions & 3 deletions benchmark/opperf/nd_operations/array_rearrange.py
@@ -29,8 +29,8 @@
"""


def run_rearrange_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', warmup=25, runs=100):
"""Runs benchmarks with the given context and precision (dtype) for all the
def run_rearrange_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', int64_tensor='off', warmup=25, runs=100):
"""Runs benchmarks with the given context, precision (dtype), and input data size (int64_tensor) for all the
rearrange operators in MXNet.

Parameters
@@ -41,6 +41,8 @@ def run_rearrange_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='
Precision to use for benchmarks
profiler: str, default 'native'
Type of Profiler to use (native/python)
int64_tensor: str, default 'off'
Input tensor size to use for tests (if on, dimensions >= 2**32)
warmup: int, default 25
Number of times to run for warmup
runs: int, default 100
@@ -55,5 +57,5 @@ def run_rearrange_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='
mx_rearrange_ops = get_all_rearrange_operators()

# Run benchmarks
mx_rearrange_op_results = run_op_benchmarks(mx_rearrange_ops, dtype, ctx, profiler, warmup, runs)
mx_rearrange_op_results = run_op_benchmarks(mx_rearrange_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
return mx_rearrange_op_results
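The pattern in this hunk repeats across every category runner in the PR: the new `int64_tensor` flag is accepted by the category-level function and passed straight through to `run_op_benchmarks`, which selects the large-tensor inputs when the flag is `'on'`. A minimal runnable sketch of that flow, with simplified stand-in bodies (the real opperf signatures take MXNet contexts and operator handles, not the toy dicts used here):

```python
# Toy stand-ins mirroring how the int64_tensor flag threads from a
# category runner down into run_op_benchmarks. Names follow the PR;
# bodies are hypothetical simplifications.

def run_op_benchmarks(ops, dtype, ctx, profiler, int64_tensor, warmup, runs):
    """Pick large (int64) or standard inputs per operator based on the flag."""
    results = {}
    for name, op in ops.items():
        inputs = op['int64_inputs'] if int64_tensor == 'on' else op['inputs']
        results[name] = {'inputs': inputs, 'warmup': warmup, 'runs': runs}
    return results

def run_rearrange_operators_benchmarks(int64_tensor='off', warmup=25, runs=100):
    # One illustrative operator; the real code fetches all rearrange ops.
    ops = {'flip': {'inputs': [{'data': (1024, 1024)}],
                    'int64_inputs': [{'data': (2**32, 1)}]}}
    return run_op_benchmarks(ops, 'float32', 'cpu', 'native',
                             int64_tensor, warmup, runs)
```

With the flag off, the standard shapes are benchmarked unchanged, so existing OpPerf runs are unaffected.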
26 changes: 17 additions & 9 deletions benchmark/opperf/nd_operations/binary_operators.py
@@ -38,8 +38,8 @@
get_all_elemen_wise_binary_operators, get_all_misc_binary_operators


def run_mx_binary_misc_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', warmup=25, runs=100):
"""Runs benchmarks with the given context and precision (dtype) for all the miscellaneous
def run_mx_binary_misc_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', int64_tensor='off', warmup=25, runs=100):
"""Runs benchmarks with the given context, precision (dtype), and input data size (int64_tensor) for all the miscellaneous
binary operators in MXNet.

Parameters
@@ -48,6 +48,10 @@ def run_mx_binary_misc_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profi
Context to run benchmarks
dtype: str, default 'float32'
Precision to use for benchmarks
profiler: str, default 'native'
Type of Profiler to use (native/python)
int64_tensor: str, default 'off'
Input tensor size to use for tests (if on, dimensions >= 2**32)
warmup: int, default 25
Number of times to run for warmup
runs: int, default 100
@@ -61,12 +65,12 @@ def run_mx_binary_misc_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profi
# Fetch all Miscellaneous Binary Operators
mx_binary_misc_ops = get_all_misc_binary_operators()
# Run benchmarks
mx_binary_op_results = run_op_benchmarks(mx_binary_misc_ops, dtype, ctx, profiler, warmup, runs)
mx_binary_op_results = run_op_benchmarks(mx_binary_misc_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
return mx_binary_op_results


def run_mx_binary_broadcast_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', warmup=25, runs=100):
"""Runs benchmarks with the given context and precision (dtype) for all the binary
def run_mx_binary_broadcast_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', int64_tensor='off', warmup=25, runs=100):
"""Runs benchmarks with the given context, precision (dtype), and input data size (int64_tensor) for all the binary
broadcast operators in MXNet.

Parameters
@@ -77,6 +81,8 @@ def run_mx_binary_broadcast_operators_benchmarks(ctx=mx.cpu(), dtype='float32',
Precision to use for benchmarks
profiler: str, default 'native'
Type of Profiler to use (native/python)
int64_tensor: str, default 'off'
Input tensor size to use for tests (if on, dimensions >= 2**32)
warmup: int, default 25
Number of times to run for warmup
runs: int, default 100
@@ -90,12 +96,12 @@ def run_mx_binary_broadcast_operators_benchmarks(ctx=mx.cpu(), dtype='float32',
# Fetch all Binary Broadcast Operators
mx_binary_broadcast_ops = get_all_broadcast_binary_operators()
# Run benchmarks
mx_binary_op_results = run_op_benchmarks(mx_binary_broadcast_ops, dtype, ctx, profiler, warmup, runs)
mx_binary_op_results = run_op_benchmarks(mx_binary_broadcast_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
return mx_binary_op_results


def run_mx_binary_element_wise_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', warmup=25, runs=100):
"""Runs benchmarks with the given context and precision (dtype) for all the binary
def run_mx_binary_element_wise_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', int64_tensor='off', warmup=25, runs=100):
"""Runs benchmarks with the given context, precision (dtype), and input data size (int64_tensor) for all the binary
element_wise operators in MXNet.

Parameters
@@ -106,6 +112,8 @@ def run_mx_binary_element_wise_operators_benchmarks(ctx=mx.cpu(), dtype='float32
Precision to use for benchmarks
profiler: str, default 'native'
Type of Profiler to use (native/python)
int64_tensor: str, default 'off'
Input tensor size to use for tests (if on, dimensions >= 2**32)
warmup: int, default 10
Number of times to run for warmup
runs: int, default 50
@@ -119,5 +127,5 @@ def run_mx_binary_element_wise_operators_benchmarks(ctx=mx.cpu(), dtype='float32
# Fetch all Binary Element_wise Operators
mx_binary_element_wise_ops = get_all_elemen_wise_binary_operators()
# Run benchmarks
mx_binary_op_results = run_op_benchmarks(mx_binary_element_wise_ops, dtype, ctx, profiler, warmup, runs)
mx_binary_op_results = run_op_benchmarks(mx_binary_element_wise_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
return mx_binary_op_results
84 changes: 59 additions & 25 deletions benchmark/opperf/nd_operations/gemm_operators.py
@@ -35,8 +35,8 @@
"""


def run_gemm_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', warmup=25, runs=100):
"""Runs benchmarks with the given context and precision (dtype)for all the GEMM
def run_gemm_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', int64_tensor='off', warmup=25, runs=100):
"""Runs benchmarks with the given context, precision (dtype), and input data size (int64_tensor) for all the GEMM
operators (dot, batch_dot, khatri_rao) in MXNet.

Parameters
@@ -47,6 +47,8 @@ def run_gemm_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='nativ
Precision to use for benchmarks
profiler: str, default 'native'
Type of Profiler to use (native/python)
int64_tensor: str, default 'off'
Input tensor size to use for tests (if on, dimensions >= 2**32)
warmup: int, default 25
Number of times to run for warmup
runs: int, default 100
@@ -57,43 +59,75 @@ def run_gemm_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='nativ
Dictionary of results. Key -> Name of the operator, Value -> Benchmark results.

"""
# Benchmark tests for dot operator
standard_inputs_dot = [{"lhs": (1024, 1024),
"rhs": (1024, 1024)},
{"lhs": (1000, 10),
"rhs": (1000, 10),
"transpose_b": True},
{"lhs": (1000, 1),
"rhs": (100, 1000),
"transpose_a": True,
"transpose_b": True}]
int64_tensor_inputs_dot = [{"lhs": (2**16, 2**16),
"rhs": (2**16, 2**16)},
{"lhs": (4, 2**30),
"rhs": (4, 2**30),
"transpose_b": True},
{"lhs": (2**28, 16),
"rhs": (16, 2**28),
"transpose_a": True,
"transpose_b": True}]
standard_inputs_batch_dot = [{"lhs": (32, 1024, 1024),
"rhs": (32, 1024, 1024)},
{"lhs": (32, 1000, 10),
"rhs": (32, 1000, 10),
"transpose_b": True},
{"lhs": (32, 1000, 1),
"rhs": (32, 100, 1000),
"transpose_a": True,
"transpose_b": True}]
int64_tensor_inputs_batch_dot = [{"lhs": (1, 2**16, 2**16),
"rhs": (1, 2**16, 2**16)},
{"lhs": (1, 4, 2**30),
"rhs": (1, 4, 2**30),
"transpose_b": True},
{"lhs": (1, 2**28, 16),
"rhs": (1, 16, 2**28),
"transpose_a": True,
"transpose_b": True}]
standard_inputs_khatri_rao = [{"args": [(32, 32), (32, 32)]},
{"args": [(64, 64), (64, 64)]}]
int64_tensor_inputs_khatri_rao = [{"args": [(2**32, 1), (2**32, 1)]}]

if int64_tensor == 'on':
inputs_dot = int64_tensor_inputs_dot
inputs_batch_dot = int64_tensor_inputs_batch_dot
inputs_khatri_rao = int64_tensor_inputs_khatri_rao
else:
[Contributor review comment] It seems the only difference between the if and else branch is the inputs argument. Can we only generate different inputs in the if/else branch and pass them to the same operator function?
inputs_dot = standard_inputs_dot
inputs_batch_dot = standard_inputs_batch_dot
inputs_khatri_rao = standard_inputs_khatri_rao

# Benchmark tests for dot and batch_dot operators
dot_benchmark_res = run_performance_test(
[getattr(MX_OP_MODULE, "dot")], run_backward=True,
dtype=dtype, ctx=ctx,
inputs=[{"lhs": (1024, 1024),
"rhs": (1024, 1024)},
{"lhs": (1000, 10),
"rhs": (1000, 10),
"transpose_b": True},
{"lhs": (1000, 1),
"rhs": (100, 1000),
"transpose_a": True,
"transpose_b": True}],
inputs=inputs_dot,
warmup=warmup, runs=runs, profiler=profiler)
# Benchmark tests for batch_dot operator

batch_dot_benchmark_res = run_performance_test(
[getattr(MX_OP_MODULE, "batch_dot")], run_backward=True,
dtype=dtype, ctx=ctx,
inputs=[{"lhs": (32, 1024, 1024),
"rhs": (32, 1024, 1024)},
{"lhs": (32, 1000, 10),
"rhs": (32, 1000, 10),
"transpose_b": True},
{"lhs": (32, 1000, 1),
"rhs": (32, 100, 1000),
"transpose_a": True,
"transpose_b": True}],
inputs=inputs_batch_dot,
warmup=warmup, runs=runs, profiler=profiler)
# Operator khatri_rao is not yet implemented for GPU
khatri_rao_benchmark_res = []
if ctx != mx.gpu():
# Benchmark tests for khatri_rao operator
khatri_rao_benchmark_res = run_performance_test(
[getattr(MX_OP_MODULE, "khatri_rao")], run_backward=False,
dtype=dtype, ctx=ctx,
inputs=[{"args": [(32, 32), (32, 32)]},
{"args": [(64, 64), (64, 64)]}],
inputs=inputs_khatri_rao,
warmup=warmup, runs=runs, profiler=profiler)

# Prepare combined results for GEMM operators
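The reviewer's suggestion above — generate the different inputs in the if/else branch and share a single call path — can be sketched with a small hypothetical helper (`select_inputs` is not part of the PR; the shapes are taken from the gemm diff):

```python
# Hypothetical refactor of the per-operator if/else input switch:
# choose the input list once, then make one shared call with it.

def select_inputs(standard_inputs, int64_tensor_inputs, int64_tensor='off'):
    """Return the large-tensor inputs when the flag is 'on', else standard."""
    return int64_tensor_inputs if int64_tensor == 'on' else standard_inputs

# Shapes as they appear in the dot-operator hunk above (first entry only).
standard_inputs_dot = [{"lhs": (1024, 1024), "rhs": (1024, 1024)}]
int64_tensor_inputs_dot = [{"lhs": (2**16, 2**16), "rhs": (2**16, 2**16)}]

inputs_dot = select_inputs(standard_inputs_dot, int64_tensor_inputs_dot,
                           int64_tensor='on')
```

The benchmark invocation (`run_performance_test(..., inputs=inputs_dot, ...)`) then stays identical for both modes, which is what the merged diff converges on.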
8 changes: 5 additions & 3 deletions benchmark/opperf/nd_operations/indexing_routines.py
@@ -35,8 +35,8 @@
"""


def run_indexing_routines_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', warmup=25, runs=100):
"""Runs benchmarks with the given context and precision (dtype) for all the indexing routines
def run_indexing_routines_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', int64_tensor='off', warmup=25, runs=100):
"""Runs benchmarks with the given context, precision (dtype), and data size (int64_tensor) for all the indexing routines
in MXNet.

Parameters
@@ -47,6 +47,8 @@ def run_indexing_routines_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='na
Precision to use for benchmarks
profiler: str, default 'native'
Type of Profiler to use (native/python)
int64_tensor: str, default 'off'
Input tensor size to use for tests (if on, dimensions >= 2**32)
warmup: int, default 25
Number of times to run for warmup
runs: int, default 100
@@ -61,5 +63,5 @@ def run_indexing_routines_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='na
mx_indexing_ops = get_all_indexing_routines()

# Run benchmarks
mx_indexing_op_results = run_op_benchmarks(mx_indexing_ops, dtype, ctx, profiler, warmup, runs)
mx_indexing_op_results = run_op_benchmarks(mx_indexing_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
return mx_indexing_op_results
8 changes: 5 additions & 3 deletions benchmark/opperf/nd_operations/linalg_operators.py
@@ -34,8 +34,8 @@
from benchmark.opperf.utils.common_utils import merge_map_list
from benchmark.opperf.rules.default_params import MX_OP_MODULE

def run_linalg_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', warmup=25, runs=100):
"""Runs benchmarks with the given context and precision (dtype) for all the linear algebra
def run_linalg_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='native', int64_tensor='off', warmup=25, runs=100):
"""Runs benchmarks with the given context, precision (dtype), and data size (int64_tensor) for all the linear algebra
operators in MXNet.

Parameters
@@ -46,6 +46,8 @@ def run_linalg_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='nat
Precision to use for benchmarks
profiler: str, default 'native'
Type of Profiler to use (native/python)
int64_tensor: str, default 'off'
Input tensor size to use for tests (if on, dimensions >= 2**32)
warmup: int, default 25
Number of times to run for warmup
runs: int, default 100
@@ -74,5 +76,5 @@ def run_linalg_operators_benchmarks(ctx=mx.cpu(), dtype='float32', profiler='nat
# Fetch all Linear Algebra Operators
mx_linalg_ops = get_all_linalg_operators()
# Run benchmarks
mx_linalg_op_results = run_op_benchmarks(mx_linalg_ops, dtype, ctx, profiler, warmup, runs)
mx_linalg_op_results = run_op_benchmarks(mx_linalg_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
return merge_map_list(linalg_potrf_benchmark + [mx_linalg_op_results])
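The docstrings added throughout the PR describe the flag as "if on, dimensions >= 2**32"; in practice the int64 input shapes are chosen so that the *total element count* reaches 2**32, the point past which 32-bit indexing overflows and int64 tensor support is required. A quick sanity check over the shapes quoted in the gemm hunks (the threshold constant name is mine, not the PR's):

```python
from functools import reduce
import operator

def num_elements(shape):
    """Total element count of a tensor with the given shape."""
    return reduce(operator.mul, shape, 1)

INT64_THRESHOLD = 2**32  # beyond this, 32-bit indices overflow

# Shapes taken from the int64_tensor inputs in the gemm diff above.
for shape in [(2**16, 2**16), (4, 2**30), (1, 2**16, 2**16), (2**32, 1)]:
    assert num_elements(shape) >= INT64_THRESHOLD
```

Each shape sits exactly at the 2**32-element threshold, which keeps the large-tensor runs as small as possible while still exercising the int64 code paths.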