Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault when subparser constructed elsewhere uses inbuilt help parameter #260

Closed
tecc opened this issue Mar 5, 2023 · 11 comments · Fixed by #304
Closed

Segmentation fault when subparser constructed elsewhere uses inbuilt help parameter #260

tecc opened this issue Mar 5, 2023 · 11 comments · Fixed by #304

Comments

@tecc
Copy link

tecc commented Mar 5, 2023

The following is an example test that I created to demonstrate this issue.

#include <argparse/argparse.hpp>
#include <doctest.hpp>
#include <stdint.h>

struct SubparserContainer {
  argparse::ArgumentParser parser;
};

SubparserContainer *container = nullptr;
SubparserContainer *get_container() {
  if (container == nullptr) {
    argparse::ArgumentParser parser("subcommand");
    parser.add_description("Example");
    container = new SubparserContainer{parser};
  }
  return container;
}

TEST_CASE("Segmentation fault on help") {
  argparse::ArgumentParser program("program");
  auto *container = get_container();
  program.add_subparser(container->parser);

  program.parse_args({"program", "subcommand", "-h"});
}

A segmentation fault occurs when this is run. I've reproduced this using both GCC and Clang.

@skrobinson
Copy link
Contributor

Thank you for the test case. I like a concrete example.

It looks like the problem is that parser (in get_container) is a stack variable that is destroyed at the end of get_container. You're trying to assign the stack variable parser to the heap variable container->parser.

@tecc
Copy link
Author

tecc commented Mar 7, 2023

But it would get copied, would it not? At least as far as I know, C++ copies nearly everything by default.

@skrobinson
Copy link
Contributor

It looks like there's more to this issue than I first thought. It will be some time before I can look a this more closely.

In the mean time, one way around this is to create parser on the heap.

#include <argparse/argparse.hpp>

struct SubparserContainer {
  argparse::ArgumentParser *parser;
};

SubparserContainer *container = nullptr;
SubparserContainer *get_container() {
  if (container == nullptr) {
    argparse::ArgumentParser *parser = new argparse::ArgumentParser("subcommand");
    parser->add_description("Example");
    container = new SubparserContainer{parser};
  }
  return container;
}

int main() {
  argparse::ArgumentParser program("program");
  auto *container = get_container();
  program.add_subparser(*(container->parser));

  program.parse_args({"program", "subcommand", "-h"});
}

@tecc
Copy link
Author

tecc commented Mar 9, 2023

Yeah, I already did what I wanted to do using an alternative solution (namely my equivalent of the get_container function instead taking a reference to a parser and using a few macros to write the repeated code for me).

@lifeforce-dev
Copy link

lifeforce-dev commented Mar 10, 2023

So my opinion is that this should be an extremely high priority bug. I have a workaround but I'll continue looking at it as well when I get time. As of writing, if you attempt to use --help with any subparser that has been assigned-to where it goes through the operator=(ArgumentParser &&), it will cause a crash.

I've spent the past 2 days debugging this issue. For what its worth, this is the solution I came up with that solves it for me for the time being:

class ParserRegistry {

public:
	ParserRegistry()
	{
                // need to reserve on m_subParsers to prevent reallocation on insert which could invalidate your parsers.
	}

	argparse::ArgumentParser& GetBaseParser()
	{
		return m_baseParser;
	}

	argparse::ArgumentParser& RegisterBaseParser(const std::string& programName,
		const std::string& version)
	{
		if (m_isBaseParserSet)
			return m_baseParser;

		m_baseParser = argparse::ArgumentParser{programName, version};
		m_isBaseParserSet = true;

		return m_baseParser;
	}

	argparse::ArgumentParser& RegisterSubparser(const std::string& commandName)
	{
		auto& it = m_subParsers.emplace_back(commandName);
		m_baseParser.add_subparser(it);
		return it;
	}
	
private:
	bool m_isBaseParserSet = false;
	// First parser requested is always the base parser.
	argparse::ArgumentParser m_baseParser;
	std::vector<argparse::ArgumentParser> m_subParsers;
};

Basically I create a factory that manages the ArgumentParser memory so that it never changes underneath you. One caveat though is you need to make sure you call std::vector::reserve:: because if you don't, adding subparsers could cause it to reallocate and you'll have the same problem.

BUT, that's indicative of a design issue if that's required. I don't want to have to update that reserve number or pick an arbitrary one to prevent a crash :)

This is the minimum case you need to cause this bug:

		argparse::ArgumentParser baseParser{"program", "0.0.0.0"};
		argparse::ArgumentParser testCommand{ "file-unsafe" };
		
		// This is what causes references to be invalidated.
		testCommand = argparse::ArgumentParser{ "file-safe" };

		testCommand.add_description("File generator command description.");
		testCommand.add_argument("-p", "--path")
			.default_value("some/path/on/system")
			.required()
			.help("Specifies the path to the target output file.");
		baseParser.add_subparser(testCommand);
		try
		{
			baseParser.parse_args(argc, argv);
		}
		catch (const std::runtime_error& err)
		{
			std::cout << "error: " << err.what();
		}

Specifically testCommand = argparse::ArgumentParser{ "file-safe" };.

I'll put a larger example though that encapsulates what I'm actually trying to use that for. In that file, this line that clobbers the memory

m_commandParser = argparse::ArgumentParser{ commandName };

In debugging I found a couple of things:

  1. ArgumentParser appears to be fine after the move constructor. All of its fields seem okay.
  2. argparse::Argument on the other hand, is completely blown away.
  3. I put some comments in the larger example that explain some other interesting behavior

I suspect it has something to do with how most of the Argument functions return a reference to an Argument. So I wonder if constructing the temporary Argument parser sets up all of those references with the temporary field address, and then you move ArgumentParser with the operator=(ArgumentParser&&). Then the temporary disappears, and now all of your references are invalid.

That would explain why its only failing on the --help --version arguments.

Its just a hunch though and honestly I'm not that familiar with most of the library code being used (neat stuff though :) ) . I've been setting data breakpoints everywhere though and for example, if you can catch Argument::m_action or Argument::m_prefix_chars when the temporary is first created before the assignment, then data breakpoint at their addresses, you'll see they get blown away when the assignment happens.

Side note, I wonder if it makes sense to work with shared_ptr rather than just returning references? Since semantically, the argParse lib should share ownership with an ArgumentParser created by the client, especially if its passing references to external resources around. It needs to be sure those are kept around and it can't guarantee that with just a reference.

The scariest part of the library is add_subparser because it takes a reference to a ArgumentParser object. That means I can't even safely have a vector of sub parsers because if I'm not extremely careful it could invalidate the reference when reallocating. A shared_ptr might be a good idea for that at least, but the API would need to support it.

Taking a reference leads to some subtle restrictions in how the client handles managing parsers. Everything works great in example code written in main but when you try and integrate it into something larger and more scalable it starts to get sticky :p

Here is a larger use case that shows an attempt to piecewise construct parsers. This is really useful when you have several commands and you want to have some separation of concerns.

To get that to fail, run with file-unsafe --help. To to run successfully, run with file-safe --help

#include <argparse/argparse.hpp>
#include <iostream>
#include <algorithm>
#include <functional>
#include <memory>

class ICommand {
public:
	virtual ~ICommand() = default;
	virtual void Process() = 0;
	virtual bool Initialize() = 0;
	virtual const std::string& GetCommandName() const = 0;
	virtual argparse::ArgumentParser& GetParser() = 0;
};

class FileGeneratorCommand;
class Manager {

public:
	explicit Manager(int32_t argc, char* argv[])
		: m_baseParser{ "gen", "0.0.0.0" }
	{
		argparse::ArgumentParser testCommand{ "file-safe" };
		testCommand.add_description("File generator command description.");
		testCommand.add_argument("-t", "--test");

		// Calls the argparse::ArgumentParser(argparse::ArgumentParser&). Everything is fine after.
		std::unique_ptr<argparse::ArgumentParser> argParsePtr =
			std::make_unique< argparse::ArgumentParser>(testCommand);

		// This works just fine.
		m_baseParser.add_subparser(*argParsePtr);

		// Troubles start.
		auto& it = m_registeredCommands.emplace_back(std::make_unique<FileGeneratorCommand>(this));
		// You could call add_subparser(it) here instead of in FileGeneratorCommand,
		// but that still doesn't work. The damage is already done in the assignment operator.

		try
		{
			m_baseParser.parse_args(argc, argv);
		}
		catch (const std::runtime_error& err)
		{
			std::cout << "error: " << err.what();
		}
	}

	// Returns the base parser. Subcommands can be added from this base instance.
	argparse::ArgumentParser& GetBaseParser() { return m_baseParser; }

private:

	// Base program parser. Contains all commands/subcommands.
	argparse::ArgumentParser m_baseParser;

	std::vector<std::unique_ptr<ICommand>> m_registeredCommands;
};

class CommandBase : public ICommand {

public:
	CommandBase(Manager* buildGenManager, std::string commandName)
		: m_buildGenManager(buildGenManager)
		, m_commandName(commandName)
		, ICommand()
	{
		// Creates a temporary and then moves into m_commandParser because
		// of operator=(argparse::ArgumentParser&&); argparse::Argument data is foobar after.
		m_commandParser = argparse::ArgumentParser{ commandName };
	}
	virtual ~CommandBase() { }
	virtual const std::string& GetCommandName() const { return m_commandName; }

protected:
	// Parent.
	Manager* m_buildGenManager = nullptr;

	// The subcommand string the user will input. Used to lookup this command. i.e gen "file" <args>
	std::string m_commandName;

	// This is our parser for this subcommand.
	argparse::ArgumentParser m_commandParser;

};

class FileGeneratorCommand : public CommandBase {
public:
	~FileGeneratorCommand() { }

	FileGeneratorCommand(Manager* buildGenManager)
		: CommandBase(buildGenManager, "file-unsafe")
	{
		if (!m_buildGenManager)
			return;

		m_commandParser.add_description("File generator command description.");

		m_commandParser.add_argument("-p", "--path")
			.default_value("some/path/on/system")
			.required()
			.help("Specifies the path to the target output file.");

		m_commandParser.add_argument("-s", "--size")
			.required()
			.help("Specifies the size of the file to generator")
			.scan<'i', int64_t>();

		std::cout << "FileCommandGeneartor - FileCommandGenerator& "  << &m_commandParser << "\n";

		m_buildGenManager->GetBaseParser().add_subparser(m_commandParser);
	}
	virtual argparse::ArgumentParser& GetParser() override { return m_commandParser; }
	[[nodiscard]] virtual bool Initialize() override
	{
		return true;
	}
	virtual void Process() override
	{
		// NYI
	}
};

int main(int argc, char* argv[])
{
	Manager manager(argc, argv);
}

@skrobinson
Copy link
Contributor

@jsandersr Thank you for your analysis. My own checking has not been as extensive, but nothing I've seen disagrees with your findings.

#242 and #226 may be related. IMHO, @lingerer's suggestion to add an
Argument copy ctor might reduce (not fix) this Issue.

A further complication is that some Argument properties (e.g. m_prefix_chars) are std::string_views to ArgumentParser properties. If the ArgumentParser is moved, its Arguments lose track of these strings.

My current recommendation is to never copy or move an ArgumentParser instance.

@lifeforce-dev
Copy link

Yeah, honestly it might be enough to just mention that in the docs or something. Once you look at it enough, the design is pretty clear that its meant to have everything setup and parsed in a single function because of the references, string_views, etc. I think that's fine, there's simplicity in that.

But maybe just having a disclaimer in the docs like, "This library is meant to be used in a single function, attempting to move or copy will invalidate internal references" or something like that might cut down on bug reports for this kind of thing.

@lingerer
Copy link

Yeah, honestly it might be enough to just mention that in the docs or something. Once you look at it enough, the design is pretty clear that its meant to have everything setup and parsed in a single function because of the references, string_views, etc. I think that's fine, there's simplicity in that.

But maybe just having a disclaimer in the docs like, "This library is meant to be used in a single function, attempting to move or copy will invalidate internal references" or something like that might cut down on bug reports for this kind of thing.

If the ArgumentParse is forbided copy by design,then any copy constructor should mark as delete.
But I really don't think this lib should limit it. Why people try to copy instance is to save time for create similar parser, I believe this is what many lazy programers like me tend to do.

@lifeforce-dev
Copy link

Yeah, honestly it might be enough to just mention that in the docs or something. Once you look at it enough, the design is pretty clear that its meant to have everything setup and parsed in a single function because of the references, string_views, etc. I think that's fine, there's simplicity in that.
But maybe just having a disclaimer in the docs like, "This library is meant to be used in a single function, attempting to move or copy will invalidate internal references" or something like that might cut down on bug reports for this kind of thing.

If the ArgumentParse is forbided copy by design,then any copy constructor should mark as delete. But I really don't think this lib should limit it. Why people try to copy instance is to save time for create similar parser, I believe this is what many lazy programers like me tend to do.

Yeah I agree on the copy ctr delete. I get that its kind of a bummer of a limitation, but the biggest issue is just that it wasn't an obvious limitation. The copy ctr delete would go a long way in communicating that.

Whether or not it should be a limitation is a different discussion. For me, I was using it in an internal tool. But I had a lot of commands and so I decided to kind of break it up so that an ICommand implementer can create its own subparser and add it to the main parser. So for larger projects where you want to separate concerns, yeah its a bummer.

My solution works well enough, I just have to make sure to reserve the vector so it doesn't allocate which is kind of scary and requires contributors to read documentation. But, its fine. Its just an internal tool anyway.

I think the spirit of this library is just having a quick and easy solution for parsing command line args for cmd line tools. Ideally smaller projects are wieldy enough to do this all in main and forget about it. I think either way is fine really.

That said, I 100% agree on the copy ctr delete. I think that's a great suggestion for the immediate issue.

@lingerer
Copy link

Yeah, honestly it might be enough to just mention that in the docs or something. Once you look at it enough, the design is pretty clear that its meant to have everything setup and parsed in a single function because of the references, string_views, etc. I think that's fine, there's simplicity in that.
But maybe just having a disclaimer in the docs like, "This library is meant to be used in a single function, attempting to move or copy will invalidate internal references" or something like that might cut down on bug reports for this kind of thing.

If the ArgumentParse is forbided copy by design,then any copy constructor should mark as delete. But I really don't think this lib should limit it. Why people try to copy instance is to save time for create similar parser, I believe this is what many lazy programers like me tend to do.

Yeah I agree on the copy ctr delete. I get that its kind of a bummer of a limitation, but the biggest issue is just that it wasn't an obvious limitation. The copy ctr delete would go a long way in communicating that.

Whether or not it should be a limitation is a different discussion. For me, I was using it in an internal tool. But I had a lot of commands and so I decided to kind of break it up so that an ICommand implementer can create its own subparser and add it to the main parser. So for larger projects where you want to separate concerns, yeah its a bummer.

My solution works well enough, I just have to make sure to reserve the vector so it doesn't allocate which is kind of scary and requires contributors to read documentation. But, its fine. Its just an internal tool anyway.

I think the spirit of this library is just having a quick and easy solution for parsing command line args for cmd line tools. Ideally smaller projects are wieldy enough to do this all in main and forget about it. I think either way is fine really.

That said, I 100% agree on the copy ctr delete. I think that's a great suggestion for the immediate issue.

At least mark it delete will make people know it is not for copy and find their own way out.

@thammegowda
Copy link

thammegowda commented Nov 26, 2024

I just found out this limitation. I'd like to separate argparse into a separate function than main, as my CLI is somewhat complex with too many options.

Is it still bad to return std::shared_ptr<argparse::ArgumentParser> from a function and dereference it?

// define a function to contain CLI specs and parsing
 auto parse_args(int argc, char* argv[]) -> std::shared_ptr<argparse::ArgumentParser>;

// inside main: get a pointer and dereference it
auto& args = *parse_args(argc, argv);

// update: dereference doesn't work. Just use ptr as is
auto args = parse_args(argc, argv);

UPDATE: dereferencing didnt work. But retaining shared_ptr as it is worked!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants