@kysucix
Member

@kysucix kysucix commented May 23, 2023

Backport memory fixes from https://github.com/ClickHouse/azure-sdk-for-cpp
This effectively silences the AddressSanitizer warnings we've been observing.

Includes the following PRs:
ClickHouse#4
ClickHouse#5
ClickHouse#6

Pull Request Checklist

Please leverage this checklist as a reminder to address commonly occurring feedback when submitting a pull request to make sure your PR can be reviewed quickly:

See the detailed list in the contributing guide.

  • C++ Guidelines
  • Doxygen docs
  • Unit tests
  • No unwanted commits/changes
  • Descriptive title/description
    • PR is single purpose
    • Related issue listed
  • Comments in source
  • No typos
  • Update changelog
  • Not work-in-progress
  • External references or docs updated
  • Self review of PR done
  • Any breaking changes?

@github-actions github-actions bot added the Community Contribution, customer-reported, and Storage labels May 23, 2023
@github-actions

Thank you for your contribution @kysucix! We will review the pull request and get back to you soon.

@Jinming-Hu Jinming-Hu self-assigned this May 23, 2023
@Jinming-Hu
Member

Jinming-Hu commented May 23, 2023

@kysucix Do you still see memory leak issues in the latest release?

If so, can you share a code snippet that can reproduce the leak issue?

@kysucix kysucix marked this pull request as draft May 23, 2023 09:46
@kysucix
Member Author

kysucix commented May 23, 2023

Keeping it in draft mode because a test on win2022 is failing.

@kysucix
Member Author

kysucix commented May 24, 2023

I can confirm that the leak is still present in the latest release and on the latest master (b8e6aa1).
I'm narrowing down the test into a minimal example of the leak.

@kysucix
Member Author

kysucix commented May 24, 2023

This is enough to trigger a memory leak in containerClient.ListBlobs():

  const auto serviceClient =
      Azure::Storage::Blobs::BlobServiceClient::CreateFromConnectionString(
          connectionString);

  const auto containerClient =
      serviceClient.GetBlobContainerClient(containerName);

  containerClient.ListBlobs();

@Jinming-Hu
Member

Thanks @kysucix. We'll take some time to test the repro and review your PR.

@Jinming-Hu
Member

@kysucix I had a quick glance at the changes in this PR. It seems to me that all of the proposed changes are equivalent to what is already in the main branch, such as using smart pointers for the reader/writer context rather than raw void pointers.
Do you know exactly which line(s) caused the memory leak, and how you fixed it?

@kysucix
Member Author

kysucix commented May 25, 2023

In the patch it's using unique_ptr instead of void*, which correctly cleans up memory.
See also ClickHouse/ClickHouse#44862
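
To make the unique_ptr point concrete, here is a minimal sketch of owning a C-style handle through std::unique_ptr with a custom deleter. It is not the SDK's code: FakeContext, FakeCreate, FakeFree, FakeContextPtr, and Reader are hypothetical stand-ins for a C library context and the class that wraps it, and the live-handle counter exists only so the release can be observed.

```cpp
#include <memory>

// Hypothetical stand-in for a C library handle (for example a parser context).
struct FakeContext
{
  int id;
};

static int g_liveContexts = 0; // number of handles created but not yet released

// Hypothetical C-style create/free pair.
FakeContext* FakeCreate()
{
  ++g_liveContexts;
  return new FakeContext{42};
}

void FakeFree(FakeContext* ctx)
{
  if (ctx)
  {
    --g_liveContexts;
    delete ctx;
  }
}

// Deleter so that unique_ptr releases the handle through the C-style function.
struct FakeContextDeleter
{
  void operator()(FakeContext* ctx) const noexcept { FakeFree(ctx); }
};
using FakeContextPtr = std::unique_ptr<FakeContext, FakeContextDeleter>;

class Reader final {
public:
  Reader() : m_context(FakeCreate()) {}

  // No user-written destructor is needed: m_context releases the handle when
  // the Reader is destroyed, and also if the constructor body were to throw
  // after m_context has been initialized.
  int Id() const { return m_context->id; }

private:
  FakeContextPtr m_context;
};

int LiveContexts() { return g_liveContexts; }
```

With a raw void* member, the same cleanup has to be written by hand in the destructor, and a destructor does not run for an object whose constructor throws. Whether that difference explains the reported leak is not established here.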

@kysucix
Member Author

kysucix commented May 25, 2023

Also, xmlCleanupParser() is called once.

explicit XmlNode(
XmlNodeType type,
std::string name = std::string(),
std::string value = std::string())
Member

Suggested change
- std::string value = std::string())
+ std::string value = {})

Q2:
Why are you using a copy constructor for name and value and then moving them to Name? Why not pass name and value as const& and then construct Name and Value from the reference?

In other words, something like:

explicit XmlNode(
  XmlNodeType type,
  std::string const& name = {},
  std::string const& value = {}) : Type(type), Name(name), Value(value) {}

Member Author

Indeed it's better to pass arguments as const&.

I'll fix it thanks!

Bear in mind that I'm porting a fix from https://github.com/ClickHouse/azure-sdk-for-cpp and it's not my code :).

Member

@LarryOsterman About passing by value or by reference: if you look at sdk/storage/azure-storage-common/src/xml_wrapper.cpp, every call site of this constructor passes rvalues (e.g. XmlNode(type, std::move(name), std::move(value))).

So if the constructor arguments are passed by value, the string data is first moved from the local variable at the call site into the parameter, then moved into the XmlNode members. No copy is made.
If they are passed by const reference, we'll need to make two copies.

There are a lot of discussions on stackoverflow regarding this topic, like https://stackoverflow.com/questions/4321305/best-form-for-constructors-pass-by-value-or-reference
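
The difference can be checked with a small instrumented type. This is only a sketch: Counted, NodeByValue, NodeByConstRef, Tally, and Measure are hypothetical names made up for illustration, not SDK types. Counted wraps a std::string and counts how often it is copied or moved, and both node types are constructed from rvalues, as at the call sites described above.

```cpp
#include <string>
#include <utility>

// Hypothetical string wrapper that counts copy and move constructions.
struct Counted
{
  static int copies;
  static int moves;
  std::string text;

  explicit Counted(std::string s) : text(std::move(s)) {}
  Counted(const Counted& other) : text(other.text) { ++copies; }
  Counted(Counted&& other) noexcept : text(std::move(other.text)) { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Sink parameters taken by value, then moved into the members.
struct NodeByValue
{
  NodeByValue(Counted name, Counted value)
      : Name(std::move(name)), Value(std::move(value))
  {
  }
  Counted Name;
  Counted Value;
};

// Parameters taken by const reference, then copied into the members.
struct NodeByConstRef
{
  NodeByConstRef(const Counted& name, const Counted& value) : Name(name), Value(value) {}
  Counted Name;
  Counted Value;
};

struct Tally
{
  int copies;
  int moves;
};

// Constructs a Node from two rvalues and reports the copies and moves performed.
template <class Node> Tally Measure()
{
  Counted name("name");
  Counted value("value");
  Counted::copies = 0;
  Counted::moves = 0;
  Node node(std::move(name), std::move(value));
  (void)node;
  return {Counted::copies, Counted::moves};
}
```

Under these assumptions, Measure<NodeByValue>() performs four moves and no copies, while Measure<NodeByConstRef>() performs two copies and no moves. For callers that pass lvalues, the by-value form costs one copy plus one move per argument instead.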

class XmlReader final {
public:
- explicit XmlReader(const char* data, size_t length);
+ XmlReader(const char* data, size_t length);
Member

Suggested change
- XmlReader(const char* data, size_t length);
+ XmlReader(uint8_t const* data, size_t length);

Binary data should be std::uint8_t, not char - char should be reserved for characters.

Member

XML data is seldom binary. It's usually printable strings. We should use char* instead of uint8_t* in this case, shouldn't we?

Member

If it is string data, it should be std::string. The incantation of pointer+length is normally used for binary data. If this is intended for signed 8-bit integers, you should add a comment explaining why this cannot be a std::string, and why a signed 8-bit integer is the correct representation of this data.

kysucix and others added 2 commits May 26, 2023 09:14
@Jinming-Hu
Member

Jinming-Hu commented May 29, 2023

In the patch it's using unique_ptr instead of void*, which correctly cleans up memory. See also ClickHouse/ClickHouse#44862

Even with void*, it's also deleted in the destructor of XmlReader and XmlWriter.

Also, xmlCleanupParser() is called once.

In the current implementation, XmlGlobalInitializer is a static variable, so its constructor and destructor are also executed only once, aren't they?

Anyway, I'll dig into it. What kind of leak sanitizer are you using? I'll try to reproduce. @kysucix
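
The once-only behaviour of a static initializer object can be sketched as follows. This is an illustration under assumptions, not the SDK's implementation: GlobalInitializer, EnsureInitialized, and Parser are hypothetical names, the counters stand in for the library's global init and cleanup calls, and a function-local static is used here although the SDK's XmlGlobalInitializer may be declared differently.

```cpp
// Hypothetical counters standing in for library-wide init and cleanup calls.
static int g_initCalls = 0;
static int g_cleanupCalls = 0;

struct GlobalInitializer
{
  GlobalInitializer() { ++g_initCalls; }
  ~GlobalInitializer() { ++g_cleanupCalls; }
};

void EnsureInitialized()
{
  // Function-local static: constructed on the first call only (thread-safe
  // since C++11) and destroyed once during normal program termination.
  static GlobalInitializer instance;
  (void)instance;
}

// Hypothetical user of the global state; every instance requests initialization.
struct Parser
{
  Parser() { EnsureInitialized(); }
};

int InitCalls() { return g_initCalls; }
int CleanupCalls() { return g_cleanupCalls; }
```

Creating any number of Parser objects leaves InitCalls() at 1, and the cleanup counter is only incremented when static objects are destroyed at program exit.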

@Jinming-Hu
Member

Hi @kysucix, I'm trying to repro this issue with the code below:

#include <azure/storage/blobs.hpp>

#include <cstdio>
#include <iostream>
#include <stdexcept>
#include <string>

int main()
{
  const static std::string ConnectionString = "";
  using namespace Azure::Storage::Blobs;

  // Intentional leak, to confirm that LeakSanitizer is enabled.
  int* unused_var = new int;
  (void)unused_var;

  const std::string containerName = "sample-container";
  const std::string blobName = "sample-blob";
  const std::string blobContent = "Hello Azure!";

  auto containerClient
      = BlobContainerClient::CreateFromConnectionString(ConnectionString, containerName);

  containerClient.CreateIfNotExists();

  // Create a few blobs so that the listing below has something to parse.
  for (int i = 0; i < 10; ++i)
  {
    auto blobClient = containerClient.GetAppendBlobClient("sample-blob-a-" + std::to_string(i));
    blobClient.CreateIfNotExists();
  }

  // Exercise the code path reported as leaking.
  for (int i = 0; i < 10; ++i)
  {
    containerClient.ListBlobs();
  }
  return 0;
}

The output is


=================================================================
==70490==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x4e021d in operator new(unsigned long) (/home/jamis/azure-sdk-for-cpp/build/sdk/storage/azure-storage-blobs/samples/blob-getting-started+0x4e021d)
    #1 0x4e3012 in main /home/jamis/azure-sdk-for-cpp/sdk/storage/azure-storage-blobs/samples/blob_getting_started.cpp:16:21
    #2 0x7f6b7e820082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)

SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).

LeakSanitizer detects the leak I created on purpose, which proves it is correctly enabled, but it does not detect any leak from the XML wrapper.

@github-actions

github-actions bot commented Aug 4, 2023

Hi @kysucix. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

@github-actions github-actions bot added the no-recent-activity label Aug 4, 2023
@github-actions

Hi @kysucix. Thank you for your contribution. Since there hasn't been recent engagement, we're going to close this out. Feel free to respond with a comment containing /reopen if you'd like to continue working on these changes. Please be sure to use the command to reopen or remove the no-recent-activity label; otherwise, this is likely to be closed again with the next cleanup pass.
