Skip to content

Conversation

@martinsumner
Copy link
Contributor

Alternative PR to #1040

This includes updates to riaknostic so riak admin diag will work.

sudo riak debug appears to work straight from install then start (tested on ubuntu 18)

martinsumner and others added 9 commits December 15, 2020 15:29
* Tweaks for riak-debug and hopefully Riak OTP versions

* Added leveled and tictacaae directory listings

* Remove a forgotten "fi"

* Fix a typo

* A little tidying after testing on a different OS

* Update specfile to allow post install symlink

* Added symlink to /usr/sbin for riak-debug

Co-authored-by: root <[email protected]>
Co-authored-by: Bob-The-Marauder <[email protected]>
Caused failure when installing - file is already present (perhaps from previous install)
Not starting as an app
@martinsumner martinsumner mentioned this pull request Dec 15, 2020
@Bob-The-Marauder
Copy link
Contributor

Bob-The-Marauder commented Dec 16, 2020

I'm afraid the substitution is failing as when I run riak debug, it again runs riak-debug -s so it only runs a selection of the commands instead of the full suite. It also fails to collect other information such as logs and patches. Finally it dumps to /tmp/[email protected] as a folder and finishes there rather than bundling that folder into a tar.gz file in the folder that called riak debug originally and then removing the folder from the /tmp directory. Please see output differences below:

Pull 1047 top level

[root@localhost ~]# ls /tmp/[email protected]/
commands

Pull 1040 top level

[root@localhost ~]# ls [email protected]
commands  config  logs  patches  ring

Pull 1047 commands folder

[root@localhost ~]# ls /tmp/[email protected]/commands/
blockdev._dev_sda1  df    dmesg  hostname      last   ps   sestatus  sysctl  vmstat
date                df_i  free   java_version  mount  rpm  swapon    uname   w

Pull 1040 commands folder

[root@localhost ~]# ls [email protected]/commands/
blockdev._dev_sda1  diskstats       java_version  mount               riak_ping               riak_status     sestatus
cpuinfo             dmesg           last          ps                  riak_repl_clusterstats  riak_transfers  swapon
date                free            limits.conf   redhat_release      riak_repl_connections   riak_version    sysctl
df                  hostname        limits.d      riak_diag           riak_repl_modes         rpm             uname
df_i                iptables_nat    meminfo       riak_http_stats     riak_repl_status        rx_crc_errors   vmstat
disk_by_id          iptables_rules  messages      riak_member_status  riak_ring_status        schedulers      w

If I go in to /usr/lib64/riak/bin and execute ./riak-debug in pull 1047, I get almost correct behaviour. It pulls everything it is meant to, deletes its folder in /tmp and creates a .tar.gz in the folder I executed ./riak-debug from. This direct calling is what the symlinks achieved in pull 1040.

Given the similarity, of the two pull requests,, I think we either add riaknostic to 1040 or add the symlinks to 1047.

@martinsumner
Copy link
Contributor Author

The problem with the symlink, is that I've had them crash an install (when the symlink is already present). This changes the presence of riak debug into something that is potentially harmful for the people that don't use it.

I will try and look deeper later today.

@Bob-The-Marauder
Copy link
Contributor

Could we get around the pre-existing symlink issue by adding a rm -rf /usr/sbin/riak-debug after

preun
# Pre-uninstall script

in rel/pkg/rpm/specfile and to rel/pkg/deb/debian/postrm? By deleting the symlinks on uninstall, they should not be an issue if they are re-created.

Alternatively we could try something like [ -L /usr/sbin/riak-debug ] || (ln -s /usr/lib64/riak/bin/riak-debug /usr/sbin/riak-debug ) so we only attempt to create the symlink if it doesn't exist. Obviously there would be different paths for Ubuntu/Debian vs Rhel/CentOS.

Please note the above are off the top of my head and not tested yet.

@martinsumner
Copy link
Contributor Author

As I understand it, the link is only resolving the problem in that it now allows you to run:

sudo riak-debug

i.e. this works without the link if you ran:

sudo /usr/lib/riak/bin/riak-debug

The command we want to run:

sudo riak debug

doesn't work as expected with or without the link. It in fact behaves in a strange way, acting as if you passed in config options to produce some stats and not generate a file. If you pass in config switches it behaves in bizarre ways e.g:

sudo riak debug -r

will produce the riak outputs, and will now attempt to create the tar/zip file, but will fail as it tries to dump the tar/zip file in a privileged location.

It would perhaps be understandable if sudo riak debug behaved as if no switches had been passed, but this isn't true.

So something is garbling the call to riak-debug when it goes through the main riak script. We can ignore this and run riak-debug directly (either by calling it in its actual location or making a symbolic link to the location in the path). But there must exist the concern that this is a canary for a more general problem with passing information through the relx generated riak script.

There's nothing obvious in the actual riak script to explain this:

elx_is_extension() {
    EXTENSION=$1
    case "$EXTENSION" in
        admin|debug|repl|chkconfig|undefined)
            echo "1"
        ;;
        *)
            echo "0"
        ;;
    esac
}
relx_get_extension_script() {
    EXTENSION=$1
    # below are the extensions declarations
    # of the form:
    # foo_extension="path/to/foo_script";bar_extension="path/to/bar_script"
    admin_extension="riak-admin";debug_extension="riak-debug";repl_extension="riak-repl";chkconfig_extension="riak-
chkconfig"
    # get the command extension (eg. foo) and
    # obtain the actual script filename that it
    # refers to (eg. "path/to/foo_script"
    eval echo "$"${EXTENSION}_extension""
}
relx_run_extension() {
    # drop the first argument which is the name of the
    # extension script
    EXTENSION_SCRIPT=$1
    shift
    # all extension script locations are expected to be
    # relative to the start script location
    [ "$SCRIPT_DIR/$EXTENSION_SCRIPT" ] && . "$SCRIPT_DIR/$EXTENSION_SCRIPT" "$@"
}
*)
        # check for extension
        IS_EXTENSION=$(relx_is_extension $1)
        if [ "$IS_EXTENSION" = "1" ]; then
            EXTENSION_SCRIPT=$(relx_get_extension_script $1)
            shift
            relx_run_extension $EXTENSION_SCRIPT "$@"
            # all extension scripts are expected to exit
        else
            relx_usage $1
        fi
        exit 1
        ;;
esac

and the riak-debug script is explicit what the defaults should be if nothing is passed (and this isn't happening):

if [ 0 -eq $(( $get_cfgs + $get_logs + $get_patches + $get_riakcmds + $get_yzcmds + $get_syscmds + $get_extracmds )
) ]; then
    # Nothing specific was requested, so get everything except extracmds
    get_cfgs=1
    get_logs=1
    get_patches=1
    get_riakcmds=1
    get_yzcmds=1
    get_syscmds=1
    get_extracmds=0
fi

@martinsumner
Copy link
Contributor Author

So I don't think this is because the wrong switches are being passed in e.g. "-s"

If we're running riak debug we're doing this as the riak user. I think the script is failing at some point (probably whilst getting the system data, and as the system data is the first thing to run that's why it looks like it is as if just that switch has been passed). This may be because of privileges - whereas if we run sudo riak-debug directly were doing this as root.

@Bob-The-Marauder
Copy link
Contributor

That's exactly where my previous work got me to. I couldn't find for the life of me why riak debug called riak-debug -s and then stopped working at the /tmp directory level. I, possibly foolishly, thought it was only related to riak debug and, given time constraints, opted to find a temporary work around rather than a solution. Hence the symlinks.

What you've expressed here has raised a concern that there is more potential available for riak [something] to garble an argument or pass the wrong one to commands other than riak debug.

@martinsumner
Copy link
Contributor Author

OK, so it isn't running riak-debug -s, it is running riak-debug - however it is crashing when in that part of the debug script(the collecting of system information), so it looks like it runs -s as that is the only output you see.

This is why it doesn't try to produce the tar file, the script has already exited before it gets to that point.

So running the script direct works, but crashes when run via usr/sbin/riak and the relx generated start script.

Even if we stop the su - riak when we run riak debug, it still crashes. The problem is related to this in relx:

# all extension script locations are expected to be
    # relative to the start script location
    [ "$SCRIPT_DIR/$EXTENSION_SCRIPT" ] && . "$SCRIPT_DIR/$EXTENSION_SCRIPT" "$@"

if we instead use:

    [ "$SCRIPT_DIR/$EXTENSION_SCRIPT" ] && "$SCRIPT_DIR/$EXTENSION_SCRIPT" "$@"

The script will now run through to completion (but with errors). So it is this use of ". "$SCRIPT" to call script rather than simply "$SCRIPT" that causes the script to fail.

Precisely what this notation means is hard to discover via google. But obviously, we don't really want to mess about in the relx extensions stuff - so perhaps it might be better to just try and treat debug differently within usr\sbin\riak

Not as an extension script.

There appeasr to be problems with the way relx starts extension scripts from a relative file location (and possibly also as a riak user)
@martinsumner
Copy link
Contributor Author

The branch has bene updated to bypass the relx generated script, without using symlinks.

mas@riak-ubuntu18:~/dbroot/riak$ sudo dpkg -i rel/pkg/packages/riak_3.0.2dbg1-OTP22.3_amd64.deb 
Selecting previously unselected package riak.
(Reading database ... 184169 files and directories currently installed.)
Preparing to unpack .../riak_3.0.2dbg1-OTP22.3_amd64.deb ...
Unpacking riak (3.0.2dbg1-OTP22.3) ...
Setting up riak (3.0.2dbg1-OTP22.3) ...
Processing triggers for ureadahead (0.100.0-21) ...
Processing triggers for systemd (237-3ubuntu10.42) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
mas@riak-ubuntu18:~/dbroot/riak$ sudo service riak start
mas@riak-ubuntu18:~/dbroot/riak$ sudo riak debug
......................EEEEEE.................................E. ~/dbroot/riak/[email protected]
mas@riak-ubuntu18:~/dbroot/riak$

So there are still some errors when running riak-debug, but the script will run through to the end.

@martinsumner
Copy link
Contributor Author

@martincox @Bob-The-Marauder any objections to proceeding with this version of the PR?

Copy link
Contributor

@Bob-The-Marauder Bob-The-Marauder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just tested on CentOS. Works fine when executed as root or via sudo. When called as the riak user, it fails with:

bash-4.2$ whoami
riak
bash-4.2$ cd ~/
bash-4.2$ riak debug
Usage: riak {start|start_boot <file>|foreground|stop|restart|reboot|pid|ping|console|console_clean|console_boot <file>|attach|remote_console|upgrade|downgrade|install|uninstall|versions|escript|rpc|rpcterms|eval|status|admin|repl|chkconfig}

but I don't think that many people will be doing this as the riak user. Please proceed with this version of the PR.

@martinsumner martinsumner added the 3.0.2 to be included in release 3.0.2 label Dec 17, 2020
@martinsumner martinsumner merged commit 4bfbe3b into develop-3.0 Dec 17, 2020
@martinsumner martinsumner deleted the mas-i1037-riakdiag branch December 21, 2022 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3.0.2 to be included in release 3.0.2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants