Error: trace::get_commentry: Invalid communicator index. #8

Open
villator opened this issue Nov 22, 2017 · 0 comments

Hi all, I'm playing with dumpi2otf on a trace of the example found at http://mpi.deino.net/mpi_functions/MPI_Intercomm_create.html. The conversion crashes with the message "Error: trace::get_commentry: Invalid communicator index."

I'm wondering how to avoid this issue. It may be a bug in libundumpi.

My analysis is the following:

  1. "dumpi2otf -f dumpi-2017.11.22.12.28.10.meta -o TRACE.otf" prints the following lines (irrelevant lines are omitted):

MPI_Comm_split entering at walltime 79071.672523400, cputime 0.019312436 seconds in thread 0.
MPI_Comm oldcomm=2 (MPI_COMM_WORLD)
int color=0
int key=0
MPI_Comm newcomm=4 (user-defined-comm)
MPI_Comm_split returning at walltime 79071.672922008, cputime 0.019510788 seconds in thread 0.

MPI_Intercomm_create entering at walltime 79071.672924778, cputime 0.019513582 seconds in thread 0.
MPI_Comm localcomm=4 (user-defined-comm)
int localleader=0
MPI_Comm remotecomm=2 (MPI_COMM_WORLD)
int remoteleader=1
int tag=52
MPI_Comm newcomm=5 (user-defined-comm)
MPI_Intercomm_create returning at walltime 79071.766963076, cputime 0.064398238 seconds in thread 0.

MPI_Comm_free entering at walltime 79071.767282900, cputime 0.064599472 seconds in thread 0.
MPI_Comm comm=4 (user-defined-comm)
MPI_Comm_free returning at walltime 79071.767291600, cputime 0.064608163 seconds in thread 0.

MPI_Comm_free entering at walltime 79071.767293619, cputime 0.064610199 seconds in thread 0.
MPI_Comm comm=5 (user-defined-comm)
MPI_Comm_free returning at walltime 79071.767298083, cputime 0.064614646 seconds in thread 0.

MPI_Finalize entering at walltime 79071.767299452, cputime 0.064616031 seconds in thread 0.
MPI_Finalize returning at walltime 79071.771893553, cputime 0.066521156 seconds in thread 0.

  2. After debugging, I think the problem is that the MPI_Comm_free handler cannot find the communicators created by MPI_Intercomm_create. dumpi2otf crashes in trace::handle_comm_free, at the call to trace::get_commentry:
int trace::handle_comm_free(const dumpi_comm_free *prm, uint16_t thread,
            const dumpi_time *cpu, const dumpi_time *wall,
            const dumpi_perfinfo *perf, void *userarg) {
        (void) prm;
        (void) thread;
        (void) cpu;
        (void) wall, (void) perf;
        (void) userarg;
        trace *self = (trace*) userarg;
        std::cerr << "DEBUG:  trace::handle_comm_free comm=" << prm->comm << std::endl;
        commentry &the_comm = self->get_commentry(prm->comm); /* the_comm IS NOT FOUND */
        std::cerr << "DEBUG:  success" << std::endl;
        the_comm.freed = wall->stop;
        return 1;
}
trace::commentry& trace::get_commentry(int commhandle) {
        std::pair<commmap_t::iterator, commmap_t::iterator> it =
                this->comms_.equal_range(commhandle);
        if (it.first == it.second) // if no match was found.
            throw "trace::get_commentry:  Invalid communicator index."; /* OUR EXCEPTION */
        commmap_t::iterator ele = it.second;
        --ele;
        return ele->second;
}
  3. If we run the command "dumpi2veft -i dumpi-2017.11.22.12.28.10.meta -o TRACE.otf", the output is:
    DEBUG: trace::handle_comm_free comm=4
    DEBUG: success
    DEBUG: trace::handle_comm_free comm=5
    Error: trace::get_commentry: Invalid communicator index.

  4. I see that comm=4 (created by MPI_Comm_split) is found and freed, whereas the lookup for comm=5 (created by MPI_Intercomm_create) fails.
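
To check my reading of the lookup, here is a minimal standalone sketch of it, not the actual dumpi2otf code. It assumes that commmap_t behaves like a std::multimap keyed by the communicator handle and that the MPI_Intercomm_create handler never registers newcomm=5; both are my assumptions. CommEntry, add_comm, get_entry, find_entry and replay_issue_trace are hypothetical names. find_entry is a non-throwing variant that a free handler could use to skip unknown handles instead of aborting.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the converter's per-communicator record.
struct CommEntry {
    std::string origin;  // call that created the communicator
    double freed;        // wall time of MPI_Comm_free, negative while live
};

// Assumption: commmap_t behaves like a multimap keyed by communicator handle.
using CommMap = std::multimap<int, CommEntry>;

// Register a communicator the way a creation handler would.
void add_comm(CommMap& comms, int handle, const std::string& origin) {
    comms.emplace(handle, CommEntry{origin, -1.0});
}

// Same lookup logic as the quoted get_commentry: return the most recently
// inserted entry for the handle, or throw if the handle was never registered.
CommEntry& get_entry(CommMap& comms, int handle) {
    auto range = comms.equal_range(handle);
    if (range.first == range.second)
        throw std::out_of_range("Invalid communicator index.");
    auto last = range.second;
    --last;
    return last->second;
}

// Tolerant variant (hypothetical): nullptr instead of an exception.
CommEntry* find_entry(CommMap& comms, int handle) {
    auto range = comms.equal_range(handle);
    if (range.first == range.second)
        return nullptr;
    auto last = range.second;
    --last;
    return &last->second;
}

// Replays the communicator events from the trace above and returns the
// handles whose MPI_Comm_free could not be matched to a known communicator.
std::vector<int> replay_issue_trace() {
    CommMap comms;
    add_comm(comms, 2, "MPI_COMM_WORLD");
    add_comm(comms, 4, "MPI_Comm_split");
    // Assumption under test: newcomm=5 from MPI_Intercomm_create is
    // never registered, so nothing is added for handle 5 here.

    // Sanity check: the split communicator must be known at this point.
    assert(find_entry(comms, 4) != nullptr);

    std::vector<int> unmatched;
    const int freed_handles[] = {4, 5};  // order of MPI_Comm_free in the trace
    for (int handle : freed_handles) {
        if (CommEntry* entry = find_entry(comms, handle))
            entry->freed = 1.0;  // placeholder timestamp
        else
            unmatched.push_back(handle);
    }
    return unmatched;
}
```

Under these assumptions, replay_issue_trace() returns a vector containing only handle 5, which matches the DEBUG output above. Calling get_entry for handle 5 instead throws, as the original lookup does.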

Does anyone know what may be happening? Any solutions or suggestions?
