Enable grpc as an alternative to DBus for communications
- First pass at creating a protocol buffer definition
  for DBus methods used by ServiceIOGroup
- First draft of GEOPM grpc service
- Generate protobuffer files with autogen.sh
- Add some configure scripts for grpc.
- Add grpc requirements to spec file
- Add protoc-gen.sh to tarball
- Fixes geopm#2775
- Use LOCAL_TCP not UDS because Python support for the server context
  over UDS is limited
- Abstract use of GLib and posix pid interfaces. This will enable network
  peer ID to be used in place of PID for tracking clients
- Derive the client_id from the gRPC server context: use number
  in peer name following last colon
- TODO: Write unit tests
- TODO: Run integration tests for controls
- TODO: Add documentation
- TODO: Implement session closure when client connection ends
- Get working in containers
- Add k8 manifest and Dockerfile
- TODO: Dockerfile currently points to my debug builds
- Switch back to user provided credentials
  + Although we are able to get the peer name from the context
    it is not really possible to go from network peer to linux PID
  + We will have to switch back to a UDS approach and implement
    OpenSession and CloseSession RPCs in either C++ or golang
    where getting the UDS credentials from the server context is
    possible (cannot see a way in python)
- Switch back to UDS socket
- Add a seccomp for all discovered syscalls
- Disable PID tracking when running inside of a container
  + This is a stopgap solution.
  + Need to get credentials from UDS
  + More pressingly, we need to be able to convert between PID namespaces.
- Move seccomp files into container image
- Clean up client test
- Remove seccomp sections of manifest
- Add some documentation about the k8 files
- Remove unnecessary build requires from spec file
- Add rust proxy server to transfer UDS credentials
- Forward requests to python based geopmd server
- Transfer UDS credentials through the SessionKey message
- Switch geopmdpy to use private port for gRPC comms
- Remove use of google Empty protobuf
  + Cannot seem to properly import it into rust
- update .gitignore
- Add a mutex to protect the client object
- Add build scripts for rust
- Fix issue with stop batch (missing session key)
- Add a vendor archive to support rust build in obs
- Fixup protobuf deps
- Remove extra crate files from install
- Switch socket paths to end in ".sock" to make tonic happy
- Update proxy server to use correct pattern for UDS sockets based on tonic example
- Fix permissions on public socket
- Remove modification to geopm.service spec file:
  do not use grpc flag
- Got basic read test working with credential forwarding
- Update docker file to use Tumbleweed distro (required for latest Rust)
- Remove the seccomp files
  + Do not seem to be required on the k8 system under test.
  + May be missing system calls, only ran strace on one test.
- Add util-linux to Requires section of spec file
- Create the user and groups on the server node and share the PID namespace
- Get rid of known issue documentation (no longer a known issue)
- Restrict umask when creating secure UDS
- Add missing "not" in comment
- Add Header to CSV
- Add more documentation about the Kubernetes demo
- Clean up k8 documentation in README
- Add more information and links to experimental branch description
- Add control loop feature to cloud readme
- Add link to upstream issue in gRPC
- Update README now that work around for grpcio v1.30.2 is in place
- Merge k8 directory service readme and service Dockerfile from cloud branch
- Fixup control and rules files for grpc
- Sync grpc interface with app profiling api
- Add new build requirements to github workflow
- Add gRPC plumbing for PlatformRestoreControl
- Do not build cargo index in home
  + Build the index in $(abs_builddir) instead.
- Revert client_registry changes
- Disable array bounds checking due to issue with protoc generated code
  + protocolbuffers/protobuf#7140
- Switch geopm systemd service to using grpc in unit file
- Periodically close inactive sessions in the grpc server
- Change batch server from fork to subprocess
- Remove BatchServerTest entirely
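One intermediate step in the list above derives the client_id from the number following the last colon of the gRPC peer name. A minimal sketch of that parsing step is shown here, assuming peer strings of the form returned by `context.peer()` for TCP connections (for example `ipv4:127.0.0.1:54321`). The function name `client_id_from_peer` is hypothetical and does not appear in the commit; later items in the same list move away from this approach because a network peer cannot be mapped to a Linux PID.

```python
def client_id_from_peer(peer):
    """Return the integer that follows the last colon of a peer string.

    Hypothetical helper for illustration only; not part of the commit.
    """
    # rpartition splits on the LAST colon, so IPv6 addresses that contain
    # colons themselves are handled the same way as IPv4 addresses.
    _, sep, tail = peer.rpartition(':')
    if not sep or not tail.isdecimal():
        raise ValueError(f'peer has no numeric suffix: {peer!r}')
    return int(tail)


if __name__ == '__main__':
    print(client_id_from_peer('ipv4:127.0.0.1:54321'))
```

For a TCP peer the trailing number is the client's source port, which identifies a connection but not a process.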
cmcantalupo committed Jul 19, 2024
1 parent 9308f17 commit afc8e4d
Showing 34 changed files with 1,404 additions and 1,572 deletions.
5 changes: 3 additions & 2 deletions .github/workflows/build.yml
@@ -64,11 +64,12 @@ jobs:
CXX: ${{ matrix.config.cxx }}
FC: gfortran-9
F77: gfortran-9
CXXFLAGS: -Wno-array-bounds

steps:
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: install system dependencies
run: sudo apt-get update && sudo apt-get install libelf-dev mpich libmpich-dev libomp-15-dev libsystemd-dev liburing-dev gobject-introspection python3-gi python3-yaml libcap-dev zlib1g-dev doxygen graphviz
run: sudo apt-get update && sudo apt-get install libelf-dev mpich libmpich-dev libomp-15-dev libsystemd-dev liburing-dev gobject-introspection python3-gi python3-yaml libcap-dev zlib1g-dev doxygen graphviz cargo libgrpc++-dev libgrpc-dev libprotobuf-dev protobuf-compiler protobuf-compiler-grpc zstd
- name: install geopmpy and geopmdpy along with their development dependencies
run: |
python3 -m pip install --upgrade pip setuptools wheel pep517
@@ -151,7 +152,7 @@ jobs:
- name: show failure logs
if: ${{ failure() }}
run: |
cat ./*/test/test-suite.log || true
cat ./test-suite.log || true
cat integration/service/open_pbs/*.log || true
publish_obs:
2 changes: 1 addition & 1 deletion .github/workflows/codeql-analysis.yml
@@ -46,7 +46,7 @@ jobs:

- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
- name: install system dependencies
run: sudo apt-get update && sudo apt-get install libelf-dev mpich libmpich-dev libomp-11-dev libsystemd-dev liburing-dev libgirepository1.0-dev libcap-dev zlib1g-dev
run: sudo apt-get update && sudo apt-get install libelf-dev mpich libmpich-dev libomp-11-dev libsystemd-dev liburing-dev libgirepository1.0-dev libcap-dev zlib1g-dev cargo libgrpc++-dev libgrpc-dev libprotobuf-dev protobuf-compiler protobuf-compiler-grpc zstd
- name: install geopmpy and geopmdpy python dependencies
run: |
python3 -m pip install --upgrade pip setuptools wheel pep517
4 changes: 3 additions & 1 deletion geopmdpy/.gitignore
@@ -4,4 +4,6 @@
/geopmdpy.spec
/make_sdist.log
/geopmdpy-*/
/debian/changelog
/debian/changelog
/geopmdpy/geopm_service_pb2.py
/geopmdpy/geopm_service_pb2_grpc.py
4 changes: 3 additions & 1 deletion geopmdpy/MANIFEST.in
@@ -1 +1,3 @@
include debian/changelog
include debian/changelog
include geopmdpy/geopm_service_pb2.py
include geopmdpy/geopm_service_pb2_grpc.py
2 changes: 2 additions & 0 deletions geopmdpy/debian/control
@@ -19,6 +19,8 @@ Depends: python3 (>= 3.6~),
python3-jsonschema (>= 3.2.0),
python3-psutil (>= 5.8.0),
python3-setuptools (>= 53.0.0),
python3-grpcio (>=1.30.2),
python3-protobuf (>=3.12.4),
python3:any
Recommends: libgeopmd2 (= ${binary:Version}),
python3-geopmdpy-doc (= ${binary:Version})
2 changes: 2 additions & 0 deletions geopmdpy/geopmdpy.spec.in
@@ -45,6 +45,8 @@ Requires: python3-dasbus >= 1.6
Requires: python3-jsonschema
Requires: python3-psutil
Requires: python3-cffi
Requires: python3-grpcio
Requires: python3-protobuf
Requires: libgeopmd2 = %{version}
Recommends: python3-geopmdpy-doc

26 changes: 19 additions & 7 deletions geopmdpy/geopmdpy/__main__.py
@@ -2,15 +2,19 @@
# SPDX-License-Identifier: BSD-3-Clause
#

import sys

from dasbus.loop import EventLoop
from dasbus.connection import SystemMessageBus
from signal import signal
from signal import SIGTERM

import sys
import os
from . import service
from . import system_files
from . import __version_str__
from . import grpc_service
from geopmdpy.restorable_file_writer import RestorableFileWriter

ALLOW_WRITES_PATH = '/sys/module/msr/parameters/allow_writes'
@@ -31,15 +35,9 @@ def stop():
    if _loop is not None:
        _loop.quit()


def main():
    if len(sys.argv) > 1 and sys.argv[1] == '--version':
        print(__version_str__)
        return 0
def main_dbus():
    signal(SIGTERM, term_handler)
    global _bus, _loop
    system_files.secure_make_dirs(system_files.GEOPM_SERVICE_RUN_PATH,
                                  perm_mode=system_files.GEOPM_SERVICE_RUN_PATH_PERM)
    _loop = EventLoop()
    _bus = SystemMessageBus()
    with RestorableFileWriter(
@@ -57,5 +55,19 @@ def main():
finally:
stop()

def main_grpc():
    grpc_service.run()

def main():
    if len(sys.argv) > 1 and sys.argv[1] == '--version':
        print(__version_str__)
        return 0
    system_files.secure_make_dirs(system_files.GEOPM_SERVICE_RUN_PATH,
                                  perm_mode=system_files.GEOPM_SERVICE_RUN_PATH_PERM)
    if len(sys.argv) > 1 and sys.argv[1] == '--grpc':
        main_grpc()
    else:
        main_dbus()

if __name__ == '__main__':
    main()
184 changes: 184 additions & 0 deletions geopmdpy/geopmdpy/grpc_service.py
@@ -0,0 +1,184 @@
#
# Copyright (c) 2015 - 2022, Intel Corporation
# SPDX-License-Identifier: BSD-3-Clause
#

import os
import sys
import pwd
import grpc
import subprocess # nosec
from concurrent import futures
from . import geopm_service_pb2_grpc
from . import geopm_service_pb2
from . import service
from . import system_files

class GEOPMServiceProxy(geopm_service_pb2_grpc.GEOPMServiceServicer):
    def __init__(self):
        self._platform_service = service.PlatformService()
        self._topo_service = service.TopoService()

    def GetUserAccess(self, request, context):
        client_id = self._get_client_id(request, context)
        result = geopm_service_pb2.AccessLists()
        signals, controls = self._platform_service.get_user_access(self._get_user(client_id), client_id)
        for ss in signals:
            result.signals.append(ss)
        for cc in controls:
            result.controls.append(cc)
        return result

    def GetSignalInfo(self, request, context):
        result = geopm_service_pb2.SignalInfoList()
        signal_info = self._platform_service.get_signal_info(request.names)
        for si in signal_info:
            element = geopm_service_pb2.SignalInfoList.SignalInfo()
            element.name = si[0]
            element.description = si[1]
            element.domain = si[2]
            element.aggregation = si[3]
            element.format_type = si[4]
            element.behavior = si[5]
            result.list.append(element)
        return result

    def GetControlInfo(self, request, context):
        result = geopm_service_pb2.ControlInfoList()
        control_info = self._platform_service.get_control_info(request.names)
        for ci in control_info:
            element = geopm_service_pb2.ControlInfoList.ControlInfo()
            element.name = ci[0]
            element.description = ci[1]
            element.domain = ci[2]
            result.list.append(element)
        return result

    def StartBatch(self, request, context):
        result = geopm_service_pb2.BatchKey()
        client_id = self._get_client_id(request.session_key, context)
        signal_config = []
        if request.signal_config:
            signal_config = [(int(sc.domain), int(sc.domain_idx), str(sc.name))
                             for sc in request.signal_config]
        control_config = []
        if request.control_config:
            # Build (domain, domain_idx, name) tuples, same layout as signal_config
            control_config = [(int(cc.domain), int(cc.domain_idx), str(cc.name))
                              for cc in request.control_config]
        server_pid, server_key = self._platform_service.start_batch(client_id,
                                                                    signal_config,
                                                                    control_config)
        result.batch_pid = server_pid
        result.shmem_key = server_key
        return result

    def StopBatch(self, request, context):
        client_id = self._get_client_id(request.session_key, context)
        self._platform_service.stop_batch(client_id, request.batch_key.batch_pid)
        return geopm_service_pb2.Empty()

    def ReadSignal(self, request, context):
        result = geopm_service_pb2.Sample()
        client_id = self._get_client_id(request.session_key, context)
        platform_request = request.request
        result.sample = self._platform_service.read_signal(client_id,
                                                           platform_request.name,
                                                           platform_request.domain,
                                                           platform_request.domain_idx)
        return result

    def WriteControl(self, request, context):
        client_id = self._get_client_id(request.session_key, context)
        platform_request = request.request
        self._platform_service.write_control(client_id,
                                             platform_request.name,
                                             platform_request.domain,
                                             platform_request.domain_idx,
                                             request.setting)
        return geopm_service_pb2.Empty()

    def TopoGetCache(self, request, context):
        result = geopm_service_pb2.TopoCache()
        result.cache = self._topo_service.get_cache()
        return result

    def OpenSession(self, request, context):
        result = geopm_service_pb2.SessionKey()
        client_id = self._get_client_id(request, context)
        self._platform_service.open_session(self._get_user(client_id), client_id)
        result.name = request.name
        return result

    def CloseSession(self, request, context):
        client_id = self._get_client_id(request, context)
        self._platform_service.close_session(client_id)
        return geopm_service_pb2.Empty()

    def RestoreControl(self, request, context):
        client_id = self._get_client_id(request, context)
        self._platform_service.restore_control(client_id)
        return geopm_service_pb2.Empty()

    def StartProfile(self, request, context):
        client_id = self._get_client_id(request.session_key, context)
        profile_name = request.profile_name
        self._platform_service.start_profile(self._get_user(client_id),
                                             client_id, profile_name)
        return geopm_service_pb2.Empty()

    def StopProfile(self, request, context):
        client_id = self._get_client_id(request.session_key, context)
        region_names = list(request.region_names)
        self._platform_service.stop_profile(client_id, region_names)
        return geopm_service_pb2.Empty()

    def GetProfilePids(self, request, context):
        result = geopm_service_pb2.PidList()
        client_id = self._get_client_id(request.session_key, context)
        profile_name = request.profile_name
        pids = self._platform_service.get_profile_pids(
            self._get_user(client_id), profile_name)
        for pid in pids:
            result.pids.append(pid)
        return result

    def PopProfileRegionNames(self, request, context):
        profile_name = request.profile_name
        client_id = self._get_client_id(request.session_key, context)
        result = geopm_service_pb2.NameList()
        names = self._platform_service.pop_profile_region_names(
            self._get_user(client_id), profile_name)
        for name in names:
            result.names.append(name)
        return result

    def close_inactive_clients(self):
        self._platform_service.close_inactive_clients()

    def watch_interval(self):
        return self._platform_service.watch_interval()

    def _get_client_id(self, session_key, context):
        # The client PID is the second comma-separated field of the key name
        pid_str = session_key.name.split(',')[1]
        return int(pid_str)

    def _get_user(self, client_id):
        uid = os.stat(f'/proc/{client_id}/status').st_uid
        return pwd.getpwuid(uid).pw_name


def run():
    grpc_socket_path = os.path.join(system_files.GEOPM_SERVICE_RUN_PATH,
                                    'grpc-private.sock')
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=1))
    geopm_proxy = GEOPMServiceProxy()
    geopm_service_pb2_grpc.add_GEOPMServiceServicer_to_server(geopm_proxy, server)
    server_credentials = grpc.local_server_credentials(grpc.LocalConnectionType.UDS)
    # Restrict the umask so the private socket is only accessible by its owner
    original_umask = os.umask(0o077)
    server.add_secure_port(f'unix://{grpc_socket_path}', server_credentials)
    os.umask(original_umask)
    server.start()

    with subprocess.Popen('geopmd-proxy') as proxy:
        # wait_for_termination() returns True when the timeout expires first
        while server.wait_for_termination(geopm_proxy.watch_interval()):
            geopm_proxy.close_inactive_clients()
        proxy.terminate()
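
The umask handling in run() above can be tried in isolation without gRPC. The following is a minimal sketch using only the standard library `socket` module rather than `server.add_secure_port`, on the assumption that both create the socket file through an ordinary bind. The names `bind_private_unix_socket` and `demo` are hypothetical, as is the socket file name. Unlike run(), the sketch restores the umask in a `finally` clause so that a failed bind does not leave the process with the restrictive mask.

```python
import os
import socket
import stat
import tempfile


def bind_private_unix_socket(path):
    """Bind a Unix domain socket whose file is accessible only by its owner.

    Hypothetical helper for illustration only; not part of the commit.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # Mask out all group and other permission bits while the file is created.
    original_umask = os.umask(0o077)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    finally:
        # The umask is process-wide, so restore it even when bind() fails.
        os.umask(original_umask)
    return sock


def demo():
    """Create a socket in a temporary directory and return its mode bits."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'example-private.sock')
        sock = bind_private_unix_socket(path)
        try:
            return stat.S_IMODE(os.stat(path).st_mode)
        finally:
            sock.close()


if __name__ == '__main__':
    print(oct(demo()))
```

Because the umask applies to the whole process, any file created by another thread while the mask is tightened is affected as well; setting the mask before worker threads start handling requests, as run() does by calling it before `server.start()`, avoids that interaction.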