VC usage (activity) reports #2073

scarlett2018 · 2019-01-24T11:33:16Z

Since we suggest using VCs to organize functional teams, the operations team needs to know how the users and jobs in each VC are doing:
- A daily/weekly/monthly report of each team's VC usage, including jobs, GPU, and memory.
- Each individual's activity in a VC, for example who uses the most resources in the VC.
- [Value to be evaluated] ~~Alerts and email notifications for VC load.~~
- A report summary sent to the admin by email.

xudifsd · 2019-03-25T10:01:43Z

Do we need to alert on VC load? I think this would abuse the alert email, since high load should be a normal situation and the admin can do nothing about it.

Maybe we can use Power BI to implement the report. It's not possible to send a report via an alert rule, and although we could write another service to generate one, I think BI may be more appropriate.

xudifsd · 2019-03-25T10:07:21Z

Two problems with a monthly report:

- Prometheus currently retains only 15 days of data.
- We do not save Prometheus data in a host path, so all data will be lost when Prometheus is redeployed.

We can extend the retention period, but that may consume too much disk space; this needs investigation.
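As a rough way to size the disk cost of a longer retention period, the Prometheus storage documentation gives the estimate needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample, with roughly 1 to 2 bytes per sample on average. A minimal sketch of that arithmetic follows; the ingestion rate used in the example is a made-up figure, not a measurement from this cluster.

```python
def estimate_prometheus_disk_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rule-of-thumb disk estimate: retention seconds * ingestion rate * bytes per sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # ASSUMPTION: 10,000 samples/s is a hypothetical ingestion rate for illustration only.
    hypothetical_rate = 10_000
    for days in (15, 30, 90):
        gigabytes = estimate_prometheus_disk_bytes(days, hypothetical_rate) / 1e9
        print("%d days of retention: about %.1f GB" % (days, gigabytes))
```

The real ingestion rate can be read from Prometheus itself before deciding on a retention period; the estimate scales linearly with both the rate and the number of days.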

scarlett2018 · 2019-03-25T11:08:26Z

> Do we need to alert on VC load? I think this would abuse the alert email, since high load should be a normal situation and the admin can do nothing about it.
>
> Maybe we can use Power BI to implement the report. It's not possible to send a report via an alert rule, and although we could write another service to generate one, I think BI may be more appropriate.

Sounds reasonable. I just updated the original request and marked that item as "to be evaluated". Let's put that one on hold and gather more feedback on whether it is needed.

scarlett2018 · 2019-03-25T11:10:47Z

> Two problems with a monthly report:
>
> - Prometheus currently retains only 15 days of data.
> - We do not save Prometheus data in a host path, so all data will be lost when Prometheus is redeployed.
>
> We can extend the retention period, but that may consume too much disk space; this needs investigation.

All good points. My quick thought is that we should figure out a way to persist usage and log-related history. Please investigate and get technical suggestions from @fanyangCS and @sterowang accordingly.

xudifsd · 2019-03-26T04:10:39Z

A user's or VC's resource usage can be scraped from YARN's API /ws/v1/cluster/apps, which has each application's resource usage info and final status. But this API retains only a limited number of entries, for example 1000, so we need another service to periodically fetch this info and persist it somewhere for later report use.
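A minimal sketch of that periodic collector, assuming the standard ResourceManager response shape ({"apps": {"app": [...]}}) and its per-application fields id, user, queue, finalStatus, memorySeconds and vcoreSeconds. The function names, the in-memory store and the ResourceManager address are hypothetical; a real service would persist the store to disk or a database and would need a separate source for GPU usage, which these fields do not cover.

```python
import json
import urllib.request
from collections import defaultdict


def fetch_apps(rm_url):
    """Fetch the current application list; rm_url is an assumed address such as http://rm-host:8088."""
    with urllib.request.urlopen(rm_url + "/ws/v1/cluster/apps") as resp:
        return json.load(resp)


def merge_apps(store, payload):
    """Merge one API snapshot into the store, keyed by application id.

    Keying by id means repeated polls overwrite an application's record
    instead of double counting it, and records survive after YARN drops them.
    """
    apps = (payload.get("apps") or {}).get("app") or []  # "apps" is null when empty
    for app in apps:
        store[app["id"]] = {
            "user": app.get("user"),
            "queue": app.get("queue"),
            "finalStatus": app.get("finalStatus"),
            "memorySeconds": app.get("memorySeconds", 0),
            "vcoreSeconds": app.get("vcoreSeconds", 0),
        }
    return store


def summarize(store):
    """Aggregate the persisted records per (queue, user) for a report."""
    totals = defaultdict(lambda: {"jobs": 0, "memorySeconds": 0, "vcoreSeconds": 0})
    for record in store.values():
        entry = totals[(record["queue"], record["user"])]
        entry["jobs"] += 1
        entry["memorySeconds"] += record["memorySeconds"]
        entry["vcoreSeconds"] += record["vcoreSeconds"]
    return dict(totals)
```

Calling merge_apps(store, fetch_apps(rm_url)) on a timer and running summarize(store) at report time would give per-VC and per-user totals, provided the polling interval is short enough that no application ages out of the API between polls.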

xudifsd · 2019-03-26T08:12:04Z

The /ws/v1/cluster/scheduler API from YARN already has the info we need from /ws/v1/cluster/apps and is easier to use. We can implement this by scraping the /ws/v1/cluster/scheduler API in yarn-exporter and showing the usage graph in Grafana.
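A minimal sketch of the parsing step, assuming a Capacity Scheduler response in which each queue under scheduler.schedulerInfo.queues.queue carries queueName, numApplications and resourcesUsed (memory, vCores). The metric names and helper functions are hypothetical and are not taken from the actual yarn-exporter code.

```python
def iter_queues(info):
    """Yield every queue under a scheduler node, descending into nested queues."""
    children = (info.get("queues") or {}).get("queue") or []
    for queue in children:
        yield queue
        yield from iter_queues(queue)


def queue_usage(payload):
    """Extract per-queue usage from a parsed /ws/v1/cluster/scheduler response."""
    info = payload["scheduler"]["schedulerInfo"]
    usage = {}
    for queue in iter_queues(info):
        used = queue.get("resourcesUsed") or {}
        usage[queue["queueName"]] = {
            "used_memory_mb": used.get("memory", 0),
            "used_vcores": used.get("vCores", 0),
            "num_applications": queue.get("numApplications", 0),
        }
    return usage


def to_prometheus_text(usage):
    """Render usage in the Prometheus text exposition format (hypothetical metric names)."""
    lines = []
    for queue in sorted(usage):
        for metric, value in sorted(usage[queue].items()):
            lines.append('yarn_queue_%s{queue="%s"} %s' % (metric, queue, value))
    return "\n".join(lines) + "\n"
```

Because Prometheus scrapes and stores these values as time series, Grafana can graph usage per VC over time without a separate persistence service, subject to the retention limits discussed earlier in the thread.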

scarlett2018 added C-MII C-DLTS ops-opt PAI-Exp labels Jan 24, 2019

scarlett2018 removed the C-DLTS label Jan 25, 2019

scarlett2018 added system and removed PAI-Exp labels Mar 19, 2019

scarlett2018 added this to the End April Release milestone Mar 25, 2019

scarlett2018 mentioned this issue Mar 25, 2019

End May 2019 Release Plan #2386

Closed

4 tasks

scarlett2018 added the investigation label Mar 25, 2019

xudifsd mentioned this issue Mar 26, 2019

Need job usage report #2127

Open

scarlett2018 assigned xudifsd Apr 2, 2019

xudifsd mentioned this issue Apr 10, 2019

add script to generate reports for OpenPai cluster #2507

Merged

scarlett2018 closed this as completed Jun 11, 2019
