Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OP-TEE Benchmark #79

Merged
merged 1 commit into from
May 30, 2017
Merged

OP-TEE Benchmark #79

merged 1 commit into from
May 30, 2017

Conversation

igoropaniuk
Copy link
Contributor

@igoropaniuk igoropaniuk commented Feb 22, 2017

struct statistics *bench_stat_buf;

#ifdef CFG_TEE_BENCHMARK
static const TEEC_UUID sta_benchmark_uuid = BENCHMARK_STA_UUID;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: rather user pta_xxx than sta_xxx. "static TA" is a deprecated reference.
here and there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

double delta = x - s->m;

s->n++;
s->m += delta/s->n;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: spaces around /

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed all statistics handling code

offset = (off_t)hw_addr % getpagesize();
page_addr = (off_t)(hw_addr - offset);

hw_addr = (intptr_t *)mmap(0, 4, PROT_READ|PROT_WRITE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should check that mmap() succeeded.

@@ -553,6 +552,11 @@ TEEC_Result TEEC_InvokeCommand(TEEC_Session *session, uint32_t cmd_id,
goto out;
}

if ((cmd_id & BENCHMARK_CMD(0)) != BENCHMARK_CMD(0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This expects the 0xFA19xxxx IDs are reserved to the benchmark pseudo TA. This is not fair.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but the only one alternative(IMHO) is to hijack(how it's done for benchmark buf) session_id (opened to Benchmark PTA) and store it in all OP_TEE layers
If it's definitely better solution, I'll use it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. Now timestamp buffer is registered via RPC call from SW

struct statistics {
int n;
double m;
double M2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: i would rather reserve the capital letters to macros.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

{
uintptr_t pc;
#ifdef __aarch64__
asm volatile("adr %0, ." : "=r"(pc));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asm volatile("adr %0, ." : "=r"(pc)); should work for both AArch32 and AArch64.

@etienne-lms
Copy link
Contributor

This PR seems to deprecate #63. If so, could you close it ?

@igoropaniuk igoropaniuk changed the title [RFC] OP-TEE Benchmark OP-TEE Benchmark Apr 18, 2017
#define __TEEC_BENCHMARK_H

#include <linux/types.h>
#include <stdbool.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed here?

#include <linux/types.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

return ccounter * TEE_BENCH_DIVIDER;
}

/* Misc auxilary functions */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter now, but in the long run we'd better move it into teec_trace.c (and cleanup that file. I don't think the stuff in there is used any longer ... old ST-Ericsson code).

@@ -553,6 +553,12 @@ TEEC_Result TEEC_InvokeCommand(TEEC_Session *session, uint32_t cmd_id,
goto out;
}

if (!pthread_mutex_trylock(&teec_bench_mu)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trylock induce some memory barrier. If bench is disabled, these barriers are useless. Could we condition this to CFG_TEE_BENCHMARK ?


/* fill timestamp data */
if (cur_cpu >= bench_ts_global->cores)
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restore cpu affinity before returning.


/* set back affinity mask */
ret = sched_setaffinity(0, sizeof(cpu_set_old), &cpu_set_old);
if (ret)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can void the return value.


#define PTA_BENCHMARK_UUID \
{ 0x0b9a63b0, 0xb4c6, 0x4c85, \
{ 0xa2, 0x84, 0xa2, 0x28, 0xef, 0x54, 0x7b, 0x4e } }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: could move to tee_bench.h next to pTA command IDs definition.

TEEC_Operation op = { 0 };
uint32_t ret_orig;

benchmark_pta_open();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fails, the command should not be invoked.


res = TEEC_InvokeCommand(&bench_sess, BENCHMARK_CMD_GET_MEMREF,
&op, &ret_orig);
tee_check_res(res, "TEEC_InvokeCommand");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fails, *paddr_ts_buf should not be filled.

int devmem;
off_t offset = 0;
off_t page_addr;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: empty line

hw_addr = (intptr_t *)mmap(0, 4, PROT_READ|PROT_WRITE,
MAP_SHARED, devmem, page_addr);
if (hw_addr == MAP_FAILED)
return 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: return NULL

cpu_buf->stamps[ts_i & TEE_BENCH_MAX_MASK] = ts_data;

/* set back affinity mask */
ret = sched_setaffinity(0, sizeof(cpu_set_old), &cpu_set_old);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: drop return value.

/*
* Copyright (c) 2017, Linaro Limited
*
* This software is licensed under the terms of the GNU General Public
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer the BSD style licence.

#define BENCHMARK_CMD_GET_MEMREF BENCHMARK_CMD(2)
#define BENCHMARK_CMD_UNREGISTER BENCHMARK_CMD(3)

#define TEE_BENCH_DIVIDER 64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add same comment as in optee_os/core/.../bench.h:

/*
 * Cycle count divider is enabled (in PMCR), 
 * CCNT value is incremented every 64th clock cycle
 */

offset = (off_t)hw_addr % getpagesize();
page_addr = (off_t)(hw_addr - offset);

hw_addr = (intptr_t *)mmap(0, 4, PROT_READ|PROT_WRITE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This request maps of only 4 bytes. Should better map up to cores field, then remap the full bench ts data structure.


intptr_t *hw_addr = (intptr_t *)paddr;

devmem = open("/dev/mem", O_RDWR | O_SYNC);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely warn and exit if this fails. Maybe /dev/mem is not accessible.

op.paramTypes = TEEC_PARAM_TYPES(TEEC_VALUE_OUTPUT, TEEC_NONE,
TEEC_NONE, TEEC_NONE);

res = TEEC_InvokeCommand(&bench_sess, BENCHMARK_CMD_GET_MEMREF,
&op, &ret_orig);
tee_check_res(res, "TEEC_InvokeCommand");
if (res != TEEC_SUCCESS)
return res;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

benchmark_pta_close();

#include <stdbool.h>
#include <tee_bench.h>

void bm_init_ts_buf(void);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: deprecated bm_init_ts_buf()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

sched_setaffinity(0, sizeof(cpu_set_old), &cpu_set_old);
}
#else /* CFG_TEE_BENCHMARK */
void bm_init_ts_buf(void) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: deprecated bm_init_ts_buf()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -553,6 +554,14 @@ TEEC_Result TEEC_InvokeCommand(TEEC_Session *session, uint32_t cmd_id,
goto out;
}

#ifdef CFG_TEE_BENCHMARK
if (!pthread_mutex_trylock(&teec_bench_mu)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the lock is held, this will not get any timestamp. Is this expected ? (maybe this has already been discussed)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The full sequence (including lock and get-buffer) could be implemented inside bm_timestamp()`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

page_addr = (off_t)(hw_addr - offset);

hw_addr = (intptr_t *)mmap(0, size, PROT_READ|PROT_WRITE,
MAP_SHARED, devmem, page_addr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can (should) close(devmem); from here (after mmap() returns).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@igoropaniuk igoropaniuk force-pushed the optee_benchmark branch 2 times, most recently from 013d0dd to 8c567f6 Compare May 19, 2017 11:49
@igoropaniuk
Copy link
Contributor Author

Rebased on latest master and tested

{
uint64_t ts_buf_raw = 0;
uint64_t ts_buf_size = 0;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: empty line

ret_addr = __builtin_return_address(0);

cpu_buf = &bench_ts_global->cpu_buf[cur_cpu];
ts_i = __sync_fetch_and_add(&cpu_buf->head, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: useless usage of __sync_fetch_and_add(), you old a lock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@etienne-lms why useless?
we can be preempted here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your lock + cpu affinity prevent another thread from accessing the buffer.

Copy link
Contributor Author

@igoropaniuk igoropaniuk May 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sched_setaffinity call just tells linux scheduler that this thread should be scheduled on specified core (or cores), it doesn't somehow affect preemption (if to compare with get_cpu()/put_cpu() in the linux kernel)

It's theoretically possible, that while running simple non-atomic cpu_buf->head++ thread will be preempted, and another one will use non-actual cpu-buf->head value. Or maybe I just didn't get what actually you meant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, understood. Thanks.

cpu_buf = &bench_ts_global->cpu_buf[cur_cpu];
ts_i = __sync_fetch_and_add(&cpu_buf->head, 1);
ts_data.cnt = read_ccounter();
ts_data.addr = (uintptr_t) ret_addr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: space between cast and cast'ed.


#include <inttypes.h>

#define UNUSED(x) (void)(x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: i fear this may conflict with some other external definition of UNUSED().

Copy link
Contributor

@etienne-lms etienne-lms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With minor style comment addressed or not: Reviewed-by: Etienne Carriere <[email protected]>

Makefile Outdated
@$(MAKE) --directory=libteec --no-print-directory --no-builtin-variables
@$(MAKE) --directory=libteec CFG_TEE_BENCHMARK=$(CFG_TEE_BENCHMARK) \
CFG_TEE_PRINT_LATENCY_STAT=$(CFG_TEE_PRINT_LATENCY_STAT) \
--no-print-directory --no-builtin-variables
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor style: maybe insert an indentation in the 2 later line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Makefile Outdated
@@ -29,7 +29,9 @@ all: build install

build-libteec:
@echo "Building libteec.so"
@$(MAKE) --directory=libteec --no-print-directory --no-builtin-variables
@$(MAKE) --directory=libteec CFG_TEE_BENCHMARK=$(CFG_TEE_BENCHMARK) \
CFG_TEE_PRINT_LATENCY_STAT=$(CFG_TEE_PRINT_LATENCY_STAT) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CFG_TEE_PRINT_LATENCY_STAT seems not used. Maybe discard it until some code rely on its value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@igoropaniuk igoropaniuk force-pushed the optee_benchmark branch 2 times, most recently from 0258a56 to 113ef0d Compare May 23, 2017 10:55
@igoropaniuk
Copy link
Contributor Author

Rebased & applied tags

OP-TEE Benchmark feature provides timestamp data for the roundtrip time
from libteec to OP-TEE OS core.

Reviewed-by: Etienne Carriere <[email protected]>
Signed-off-by: Igor Opaniuk <[email protected]>
@jforissier jforissier merged commit 63106fb into OP-TEE:master May 30, 2017
@igoropaniuk igoropaniuk deleted the optee_benchmark branch July 4, 2017 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants