Skip to content

Comments

Refactor nodes package#640

Merged
javuto merged 9 commits intojmpsec:mainfrom
zhuoyuan-liu:node-cache
Apr 12, 2025
Merged

Refactor nodes package#640
javuto merged 9 commits intojmpsec:mainfrom
zhuoyuan-liu:node-cache

Conversation

@zhuoyuan-liu
Copy link
Contributor

@zhuoyuan-liu zhuoyuan-liu commented Apr 8, 2025

I’m in the middle of adding in-memory caching to the node packages, and I’ve noticed quite a few areas that could be improved along the way.

This also includes a bug fix related to how active nodes are identified. I would expect a longer time for this PR.


There is an import cycle:

pkg/nodes/node-cache.go imports pkg/cache
pkg/cache → pkg/types → pkg/queries → pkg/nodes

My idea is that the cache package should be generic and not import any other packages. The same applies to the types package, as it only stores some of the osquery data formats.

@javuto javuto added cache Cache related issues refactor Refactorization of code 🚧 bugfix Fix for an existing bug labels Apr 8, 2025
@javuto javuto requested review from Copilot and javuto April 9, 2025 16:17
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (9)

pkg/nodes/utils_test.go:70

  • The removal of cache key tests reduces coverage; ensure that caching functionality is either removed entirely from production or that new tests are added for any replacement functionality.
func TestCacheFullKey(t *testing.T) {

pkg/nodes/nodes.go:133

  • Replacing 'updated_at' with 'last_seen' in the active nodes query assumes that 'last_seen' is consistently updated. Please verify that this field correctly reflects the node's activity status.
if err := n.DB.Where(s+" = ?", selector).Where("last_seen > ?", time.Now().Add(time.Duration(hours)*time.Hour)).Find(&nodes).Error; err != nil {

pkg/nodes/nodes.go:137

  • Replacing 'updated_at' with 'last_seen' in the inactive nodes query relies on accurate maintenance of the 'last_seen' field. Ensure that this change aligns with the intended behavior.
if err := n.DB.Where(s+" = ?", selector).Where("last_seen < ?", time.Now().Add(time.Duration(hours)*time.Hour)).Find(&nodes).Error; err != nil {

pkg/nodes/nodes.go:155

  • The active nodes query in the Gets() function now uses 'last_seen'; verify that this change correctly captures the intended active records.
if err := n.DB.Where("last_seen > ?", time.Now().Add(time.Duration(hours)*time.Hour)).Find(&nodes).Error; err != nil {

pkg/nodes/nodes.go:159

  • The inactive nodes query in the Gets() function was updated to use 'last_seen'; ensure this condition accurately identifies inactive nodes.
if err := n.DB.Where("last_seen < ?", time.Now().Add(time.Duration(hours)*time.Hour)).Find(&nodes).Error; err != nil {

pkg/nodes/nodes.go:213

  • The GetStatsByEnv active count query now uses 'last_seen' instead of 'updated_at'; please confirm that 'last_seen' is the preferred field for node activity statistics.
if err := n.DB.Model(&OsqueryNode{}).Where("environment = ?", environment).Where("last_seen > ?", tHours).Count(&stats.Active).Error; err != nil {

pkg/nodes/nodes.go:216

  • Ensure that the inactive count query in GetStatsByEnv using 'last_seen' properly reflects the inactive nodes as intended.
if err := n.DB.Model(&OsqueryNode{}).Where("environment = ?", environment).Where("last_seen < ?", tHours).Count(&stats.Inactive).Error; err != nil {

pkg/nodes/nodes.go:229

  • In GetStatsByPlatform, the active query now uses 'last_seen'; please verify that this change correctly captures active nodes based on platform.
if err := n.DB.Model(&OsqueryNode{}).Where("platform = ?", platform).Where("last_seen > ?", tHours).Count(&stats.Active).Error; err != nil {

pkg/nodes/nodes.go:232

  • In GetStatsByPlatform, the inactive query now uses 'last_seen'; ensure this modification appropriately tracks inactive nodes.
if err := n.DB.Model(&OsqueryNode{}).Where("platform = ?", platform).Where("last_seen < ?", tHours).Count(&stats.Inactive).Error; err != nil {

@zhuoyuan-liu zhuoyuan-liu marked this pull request as ready for review April 10, 2025 13:30
Copy link
Contributor Author

@zhuoyuan-liu zhuoyuan-liu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javuto I'd like to get your thoughts on the next steps. If we decide to split the table, it would be a significant breaking change. In that case, we should also add more tests for the package.


// GetByKey to retrieve full node object from DB or cache, by node_key
// node_key is expected lowercase
func (n *NodeManager) GetByKey(nodekey string) (OsqueryNode, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to split the node table so that one of the table keep the node attributes like Node Key, ID, UUID, CPU etc.. and another table keep node status like last seen etc..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The good thing is that currently, all calling only used the node ID, UUID, and environment ID. So, everything should be good. @javuto Could you please double-check this part?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to cache the table, then we can explore splitting the table. Otherwise we can just cache a reduced struct with the fields that are used more often (node_key, ID, UUID, etc). Let's decide once we make changes to improve osctrl-admin and its UI.

"encoding/json"
"strconv"

"github.com/jmpsec/osctrl/pkg/queries"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the dependency for queries pkg.

"fmt"

redis "github.com/go-redis/redis/v8"
"github.com/jmpsec/osctrl/pkg/types"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the dependency for types pkg. I think the redis instance has never been used in the code. But let's keep them for now.

@zhuoyuan-liu zhuoyuan-liu requested a review from Copilot April 11, 2025 06:50
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

pkg/nodes/nodes.go:139

  • In the GetBySelector function, the column variable is reset to "platform" regardless of the selector type, which overrides a proper assignment for the "environment" case. Remove or adjust this assignment to preserve the intended column mapping based on stype.
column = "platform"

Fix for inconsistent casing

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@javuto javuto merged commit ebcf428 into jmpsec:main Apr 12, 2025
25 checks passed
// ActiveTimeCutoff returns the cutoff time for active nodes
// based on the specified number of hours
func ActiveTimeCutoff(hours int64) time.Time {
return timeNow().Add(-time.Duration(hours) * time.Hour)
Copy link
Contributor Author

@zhuoyuan-liu zhuoyuan-liu Apr 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @javuto , I forgot to mention that InactiveHours needs to be set to a positive value to make it more intuitive. Otherwise, all nodes will be marked as inactive.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🚧 bugfix Fix for an existing bug cache Cache related issues refactor Refactorization of code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants