inline several methods to make tracing::event! smaller #2555

Merged
merged 1 commit into from
Apr 11, 2023

@ldm0 ldm0 commented Apr 11, 2023

Motivation

Make tracing::event! codegen smaller

Solution

Add `#[inline]` to several functions called by `tracing::event!`.

Simple example: https://github.com/ldm0/tracing_test

After inlining, executable size drops from 746 KB to 697 KB (`cargo build --release` followed by `strip`), saving 50 bytes per `event!`.
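
For readers less familiar with the attribute, here is a minimal sketch of the kind of change involved. The `Metadata` type and its accessors are hypothetical stand-ins, not the functions actually touched by this PR. The idea is that a tiny non-generic function in a library crate is generally only available for inlining into downstream crates if it carries an inline hint (or LTO is enabled), so without the hint each macro call site pays for a real function call.

```rust
// Hypothetical stand-in for a small type whose accessors are called
// from macro-generated code in downstream crates.
pub struct Metadata {
    level: u8,
    name: &'static str,
}

impl Metadata {
    pub const fn new(name: &'static str, level: u8) -> Self {
        Self { level, name }
    }

    // `#[inline]` is a hint, not a guarantee: it makes the function body
    // available to other crates so the optimizer may replace the call
    // with the body (here, a single field load).
    #[inline]
    pub fn level(&self) -> u8 {
        self.level
    }

    #[inline]
    pub fn name(&self) -> &'static str {
        self.name
    }
}

fn main() {
    let meta = Metadata::new("my_event", 2);
    println!("{} at level {}", meta.name(), meta.level());
}
```

Whether a hint like this shrinks or grows a binary depends on how small the function body is relative to the call sequence, so it is worth measuring, as done here.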

Test environment:

toolchain: nightly-aarch64-apple-darwin
rustc-version: rustc 1.70.0-nightly (88fb1b922 2023-04-10)

@ldm0 ldm0 requested review from hawkw, carllerche, davidbarsky and a team as code owners April 11, 2023 18:03

@hawkw hawkw left a comment

It's interesting that inlining these functions actually improves binary size — they must all compile to fewer instructions than a function call, in release mode.

Out of curiosity, would you mind running the tracing crate's benchmarks before and after this change? It would be interesting to see if there's a meaningful performance delta in those micro benchmarks.

ldm0 commented Apr 11, 2023

> Out of curiosity, would you mind running the tracing crate's benchmarks before and after this change? It would be interesting to see if there's a meaningful performance delta in those micro benchmarks.

Yes, there are some perf wins:

event/scoped [-40.689% -40.475% -40.228%]
event/scoped_recording [-14.972% -14.685% -14.410%]
event/global [-48.412% -48.217% -48.010%]
span_fields/scoped [-25.317% -24.876% -24.494%]
span_fields/global [-39.695% -39.488% -39.242%]
span_repeated/global [-27.514% -26.633% -25.298%]
static/baseline_single_threaded [-32.275% -32.032% -31.808%]
static/single_threaded [-29.628% -29.376% -29.156%]
static/enabled_one [-29.777% -29.544% -29.305%]
static/enabled_many [-30.901% -30.504% -30.140%]
dynamic/baseline_single_threaded [-20.157% -19.880% -19.603%]

I reran the benchmarks several times and the improvements seem to be fairly stable.

raw log: https://gist.github.com/ldm0/6573935f4979d2645fbcf5bde7361386
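
A note on reading the figures above: each row is a Criterion change report giving the lower bound, point estimate, and upper bound for the change in time per iteration, so negative values are improvements. As a rough sketch (the helper name below is made up for illustration), a time change of -48.217% corresponds to roughly a 1.93x speedup:

```rust
/// Converts a relative change in time per iteration (in percent)
/// into a speedup factor. Illustrative helper, not part of any crate.
fn speedup_from_time_change(percent: f64) -> f64 {
    // New time = old time * (1 + percent / 100), so
    // speedup = old time / new time = 1 / (1 + percent / 100).
    1.0 / (1.0 + percent / 100.0)
}

fn main() {
    // Point estimates taken from the table above.
    let rows = [
        ("event/global", -48.217),
        ("event/scoped", -40.475),
        ("event/scoped_recording", -14.685),
    ];
    for (name, change) in rows {
        println!("{name}: {:.2}x", speedup_from_time_change(change));
    }
}
```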

hawkw commented Apr 11, 2023

Thanks, that looks great! I'll merge this once CI passes. :)

@hawkw hawkw merged commit db64fc2 into tokio-rs:master Apr 11, 2023
@ldm0 ldm0 deleted the ldm_inline_tweak branch April 12, 2023 05:32
@ldm0 ldm0 restored the ldm_inline_tweak branch April 12, 2023 05:32
@ldm0 ldm0 deleted the ldm_inline_tweak branch April 12, 2023 05:32
@ldm0 ldm0 restored the ldm_inline_tweak branch April 12, 2023 05:33
hawkw pushed a commit that referenced this pull request Apr 21, 2023
## Motivation

Make `tracing::event!` codegen smaller

## Solution

Add `#[inline]` to several functions called by `tracing::event!`.

Simple example: https://github.com/ldm0/tracing_test

After inlining, executable size drops from 746 KB to 697 KB
(`cargo build --release` followed by `strip`), saving 50 bytes per `event!`.

Test environment:
```
toolchain: nightly-aarch64-apple-darwin
rustc-version: rustc 1.70.0-nightly (88fb1b922 2023-04-10)
```

There are also performance improvements in the benchmarks:

```
event/scoped [-40.689% -40.475% -40.228%]
event/scoped_recording [-14.972% -14.685% -14.410%]
event/global [-48.412% -48.217% -48.010%]
span_fields/scoped [-25.317% -24.876% -24.494%]
span_fields/global [-39.695% -39.488% -39.242%]
span_repeated/global [-27.514% -26.633% -25.298%]
static/baseline_single_threaded [-32.275% -32.032% -31.808%]
static/single_threaded [-29.628% -29.376% -29.156%]
static/enabled_one [-29.777% -29.544% -29.305%]
static/enabled_many [-30.901% -30.504% -30.140%]
dynamic/baseline_single_threaded [-20.157% -19.880% -19.603%]
```

I reran the benchmarks several times and the improvements seem to be fairly
stable.

raw log: https://gist.github.com/ldm0/6573935f4979d2645fbcf5bde7361386
hawkw added a commit that referenced this pull request Apr 25, 2023
# 0.1.38 (April 25th, 2023)

This `tracing` release changes the `Drop` implementation for
`Instrumented` `Future`s so that the attached `Span` is entered when
dropping the `Future`. This means that events emitted by the `Future`'s
`Drop` implementation will now be recorded within its `Span`. It also
adds `#[inline]` hints to methods called in the `event!` macro's
expansion, for an improvement in both binary size and performance.
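
The `Drop` change can be illustrated with a std-only model. Everything in this sketch (the thread-local span stack, `enter`, `event`, `Noisy`, and the simplified `Instrumented`) is a hypothetical stand-in for the real `tracing` types, intended only to show why entering the span inside `Drop` changes which span an event is attributed to.

```rust
use std::cell::RefCell;

thread_local! {
    // Stand-in for the subscriber's notion of the "current span".
    static CURRENT: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
    // Recorded (span, message) pairs, so the effect can be observed.
    static LOG: RefCell<Vec<(Option<&'static str>, &'static str)>> =
        RefCell::new(Vec::new());
}

/// Guard that keeps a span entered until it is dropped.
struct Entered;

fn enter(span: &'static str) -> Entered {
    CURRENT.with(|c| c.borrow_mut().push(span));
    Entered
}

impl Drop for Entered {
    fn drop(&mut self) {
        CURRENT.with(|c| {
            c.borrow_mut().pop();
        });
    }
}

/// Stand-in for `event!`: records the message with the current span, if any.
fn event(msg: &'static str) {
    let span = CURRENT.with(|c| c.borrow().last().copied());
    LOG.with(|l| l.borrow_mut().push((span, msg)));
}

/// A value that emits an event when dropped.
struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        event("dropping Noisy");
    }
}

/// Simplified model of an instrumented wrapper.
struct Instrumented<T> {
    inner: Option<T>,
    span: &'static str,
}

impl<T> Drop for Instrumented<T> {
    fn drop(&mut self) {
        // Enter the span first, then drop the inner value while the
        // guard is still alive, so its events land inside the span.
        let _enter = enter(self.span);
        drop(self.inner.take());
    }
}

fn main() {
    drop(Noisy); // dropped outside any span
    drop(Instrumented { inner: Some(Noisy), span: "my_span" });
    LOG.with(|l| {
        for (span, msg) in l.borrow().iter() {
            println!("{span:?}: {msg}");
        }
    });
}
```

In this model the first event is recorded with no span and the second with `my_span`, which mirrors the behavior described above.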

Additionally, this release updates the `tracing-attributes` dependency
to [v0.1.24][attrs-0.1.24], which updates the [`syn`] dependency to
v2.x.x. `tracing-attributes` v0.1.24 also includes improvements to the
`#[instrument]` macro; see [the `tracing-attributes` 0.1.24 release
notes][attrs-0.1.24] for details.

### Added

- `Instrumented` futures will now enter the attached `Span` in their
  `Drop` implementation, allowing events emitted when dropping the
  future to occur within the span (#2562)
- `#[inline]` attributes for methods called by the `event!` macros,
  making generated code smaller (#2555)
- **attributes**: `level` argument to `#[instrument(err)]` and
  `#[instrument(ret)]` to override the level of the generated return
  value event (#2335)
- **attributes**: Improved compiler error message when `#[instrument]`
  is added to a `const fn` (#2418)

### Changed

- `tracing-attributes`: updated to [0.1.24][attrs-0.1.24]
- Removed unneeded `cfg-if` dependency (#2553)
- **attributes**: Updated [`syn`] dependency to 2.0 (#2516)

### Fixed

- **attributes**: Fix `clippy::unreachable` warnings in
  `#[instrument]`-generated code (#2356)
- **attributes**: Removed unused "visit" feature flag from `syn`
  dependency (#2530)

### Documented

- **attributes**: Documented default level for `#[instrument(err)]`
  (#2433)
- **attributes**: Improved documentation for levels in `#[instrument]`
  (#2350)

Thanks to @nitnelave, @jsgf, @Abhicodes-crypto, @LukeMathWalker,
@andrewpollack, @quad, @klensy, @davidpdrsn, @dbidwell94, @ldm0,
@NobodyXu, @ilsv, and @daxpedda for contributing to this release!

[`syn`]: https://crates.io/crates/syn
[attrs-0.1.24]:
    https://github.com/tokio-rs/tracing/releases/tag/tracing-attributes-0.1.24
hawkw added a commit that referenced this pull request May 11, 2023
# 0.1.31 (May 11, 2023)

This release of `tracing-core` fixes a bug that caused threads which
call `dispatcher::get_default` _before_ a global default subscriber is
set to never see the global default once it is set. In addition, it
includes improvements for instrumentation performance in some cases,
especially when using a global default dispatcher.

### Fixed

- Fixed incorrect thread-local caching of `Dispatch::none` if
  `dispatcher::get_default` is called before
  `dispatcher::set_global_default` (#2593)

### Changed

- Cloning a `Dispatch` that points at a global default subscriber no
  longer requires an `Arc` reference count increment, improving
  performance substantially (#2593)
- `dispatcher::get_default` no longer attempts to access a thread local
  if the scoped dispatcher is not in use, improving performance when the
  default dispatcher is global (#2593)
- Added `#[inline]` annotations to methods called by the `event!` and
  `span!` macros to reduce the size of macro-generated code and improve
  recording performance (#2555)
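
As a rough illustration of the first item in this list, the following std-only sketch shows how a handle to a global default can be cloned without touching a reference count. The `Subscriber` trait, `Stdout` type, and `Dispatch` enum here are simplified, hypothetical models and not the crate's actual definitions.

```rust
use std::sync::Arc;

// Minimal stand-in for a subscriber trait.
trait Subscriber: Send + Sync {
    fn name(&self) -> &'static str;
}

struct Stdout;

impl Subscriber for Stdout {
    fn name(&self) -> &'static str {
        "stdout"
    }
}

/// Simplified model of a dispatcher handle.
enum Dispatch {
    /// A global default lives for the rest of the program, so a plain
    /// `'static` reference suffices and cloning is a pointer copy.
    Global(&'static dyn Subscriber),
    /// A scoped dispatcher is reference-counted.
    Scoped(Arc<dyn Subscriber>),
}

impl Clone for Dispatch {
    fn clone(&self) -> Self {
        match self {
            // No atomic increment on this path.
            Dispatch::Global(s) => Dispatch::Global(*s),
            // Atomic reference count increment on this path.
            Dispatch::Scoped(s) => Dispatch::Scoped(Arc::clone(s)),
        }
    }
}

impl Dispatch {
    fn name(&self) -> &'static str {
        match self {
            Dispatch::Global(s) => s.name(),
            Dispatch::Scoped(s) => s.name(),
        }
    }
}

static GLOBAL: Stdout = Stdout;

fn main() {
    let global = Dispatch::Global(&GLOBAL);
    let global_copy = global.clone();

    let shared: Arc<dyn Subscriber> = Arc::new(Stdout);
    let scoped = Dispatch::Scoped(Arc::clone(&shared));
    let scoped_copy = scoped.clone();

    println!("global: {}", global_copy.name());
    println!("scoped: {}", scoped_copy.name());
    println!("strong count: {}", Arc::strong_count(&shared));
}
```

Since the global path avoids contended atomic operations on a shared counter, it is plausible that this matters most in multi-threaded programs that clone the dispatcher frequently.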

Thanks to new contributor @ldm0 for contributing to this release!
kaffarell pushed a commit to kaffarell/tracing that referenced this pull request May 22, 2024