Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(webui): Support viewing search results in context for JSON logs (clp-json). #596

Merged
merged 48 commits into from
Nov 21, 2024

Conversation

haiqi96
Copy link
Contributor

@haiqi96 haiqi96 commented Nov 19, 2024

Description

The existing implementation of webui only supports viewing unstructured logs. This PR supports viewing JSON logs in the webui.

With refactor in #584, JSON and IR log viewing share a large portion of common code, except two differences:

  1. The streamId between Json and IR are different. IR's stream id is orig_file_id, whereas JSON's streamId is archive_id.
  2. The job_config and Job type to be stored in the database are different.

This PR replaces the original IR extraction methods with stream extraction methods which can support both JSON and IR viewing. In addition to the original parameters, they take in an extra "JobType" to distinguish if the job is targeting JSON or IR.

Also fixes #608..

Validation performed

Tested package locally and verified that both CLP and CLP-S flow works

Summary by CodeRabbit

Release Notes

  • New Features

    • Updated job submission process to support extraction of both CLP IR and JSON line files.
    • Enhanced QueryStatus component with new job type handling and improved error messages.
    • Introduced new constants and functions in SearchResultsTable for better handling of storage types and URL generation.
    • Added configuration options for stream output size in the log viewer web UI.
    • New parameter for stream target uncompressed size added to settings.
  • Bug Fixes

    • Improved error handling for unsupported job types in query routes.
  • Documentation

    • Updated metadata handling and response structures to reflect new extraction types.
    • Clarified configuration comments related to stream file sizes.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
components/log-viewer-webui/server/src/routes/query.js (3)

5-13: Enhance function documentation with return type

The JSDoc documentation is thorough but missing the @return annotation. Consider adding it to improve completeness.

Add this line to the JSDoc:

 * @param {int} sanitizedLogEventIdx
 * @throws {Error} if the extract stream job submission fails.
+ * @return {Promise<void>}
 */

20-34: Consider separate size configurations for IR and JSON

Currently, both IR and JSON extractions use the same streamTargetUncompressedSize configuration. These might benefit from separate configurations due to their different nature and typical sizes.

Consider using distinct configuration values:

-    const streamTargetUncompressedSize = settings.StreamTargetUncompressedSize;
+    const streamTargetUncompressedSize = extractJobType === QUERY_JOB_TYPE.EXTRACT_IR
+        ? settings.IrStreamTargetUncompressedSize
+        : settings.JsonStreamTargetUncompressedSize;

85-86: Standardize error message format

The error message format differs from other error messages in the codebase. Consider using a consistent template.

-                const err = new Error("Unable to find the metadata of extracted stream with " +
-                    `streamId=${streamId} at logEventIdx=${sanitizedLogEventIdx}`);
+                const err = new Error(
+                    `Failed to find extracted stream metadata (streamId: ${streamId}, logEventIdx: ${sanitizedLogEventIdx})`
+                );
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ce1acdb and b631316.

📒 Files selected for processing (1)
  • components/log-viewer-webui/server/src/routes/query.js (2 hunks)
🔇 Additional comments (1)
components/log-viewer-webui/server/src/routes/query.js (1)

64-66: ⚠️ Potential issue

Input validation still needed for request parameters

The previous review comment about input validation remains valid. The code should validate all input parameters before processing.

As suggested in the previous review, add comprehensive validation:

 const {extractJobType, logEventIdx, streamId} = req.body;
+if (!streamId || typeof streamId !== 'string') {
+    const err = new Error('Invalid streamId');
+    err.statusCode = 400;
+    throw err;
+}
+
+if (!extractJobType || !Object.values(QUERY_JOB_TYPE).includes(extractJobType)) {
+    const err = new Error(`Invalid extractJobType: ${extractJobType}`);
+    err.statusCode = 400;
+    throw err;
+}
+
 const sanitizedLogEventIdx = Number(logEventIdx);
+if (isNaN(sanitizedLogEventIdx) || sanitizedLogEventIdx < 0) {
+    const err = new Error('Invalid logEventIdx');
+    err.statusCode = 400;
+    throw err;
+}

@haiqi96 haiqi96 requested a review from junhaoliao November 20, 2024 02:10
Copy link
Collaborator

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good: Passing the StreamTargetUncompressedSize from the package config to the server/settings.json with a fallback value

@@ -86,7 +86,7 @@
#stream_output:
# directory: "var/data/streams"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we check if issue #608 is valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. pushed a fix

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let also mention fixes #608 in the PR description to link the issue (or we can do it in the commit message body if you see fit).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for providing a suboptimal refactoring suggestion previously. Let's see if the new proposal makes sense.

Comment on lines 73 to 78
await submitAndWaitForExtractStreamJob(
fastify,
extractJobType,
streamId,
sanitizedLogEventIdx
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so sorry for sharing a not optimal refactoring technique previously. Instead of creating a submitAndWaitForExtractStreamJob helper in this file, can we move the job config construction logics into the DbManager's submitAndWaitForExtractStreamJob method?

Suggested change
await submitAndWaitForExtractStreamJob(
fastify,
extractJobType,
streamId,
sanitizedLogEventIdx
);
const extractResult = await fastify.dbManager.submitAndWaitForExtractStreamJob({
jobType: extractJobType,
logEventIdx: sanitizedLogEventIdx,
streamId: streamId,
targetUncompressedSize: settings.StreamTargetUncompressedSize,
});
if (null === extractResult) {
const err = new Error("Unable to extract stream with " +
`streamId=${streamId} at logEventIdx=${sanitizedLogEventIdx}`);
err.statusCode = 400;
throw err;
}

Comment on lines 3 to 54


/**
* Submits a stream extraction job with the given parameters and waits for it.
*
* @param {import("fastify").FastifyInstance | {dbManager: DbManager}} fastify
* @param {QUERY_JOB_TYPE} extractJobType
* @param {string} streamId
* @param {int} sanitizedLogEventIdx
* @throws {Error} if the extract stream job submission fails.
*/
const submitAndWaitForExtractStreamJob = async (
fastify,
extractJobType,
streamId,
sanitizedLogEventIdx
) => {
let jobConfig;
const streamTargetUncompressedSize = settings.StreamTargetUncompressedSize;
if (QUERY_JOB_TYPE.EXTRACT_IR === extractJobType) {
jobConfig = {
file_split_id: null,
msg_ix: sanitizedLogEventIdx,
orig_file_id: streamId,
target_uncompressed_size: streamTargetUncompressedSize,
};
} else if (QUERY_JOB_TYPE.EXTRACT_JSON === extractJobType) {
jobConfig = {
archive_id: streamId,
target_chunk_size: streamTargetUncompressedSize,
};
}

if (null === jobConfig) {
const err = new Error(`Unsupported Job type: ${extractJobType}`);
err.statusCode = 400;
throw err;
}

const extractResult = await fastify.dbManager.submitAndWaitForExtractStreamJob(
jobConfig,
extractJobType
);

if (null === extractResult) {
const err = new Error("Unable to extract stream with " +
`streamId=${streamId} at logEventIdx=${sanitizedLogEventIdx}`);

err.statusCode = 400;
throw err;
}
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you agree, the logics in this helper can be moved into dbManager.submitAndWaitForExtractStreamJob. Let's see the suggestion there.

Suggested change
/**
* Submits a stream extraction job with the given parameters and waits for it.
*
* @param {import("fastify").FastifyInstance | {dbManager: DbManager}} fastify
* @param {QUERY_JOB_TYPE} extractJobType
* @param {string} streamId
* @param {int} sanitizedLogEventIdx
* @throws {Error} if the extract stream job submission fails.
*/
const submitAndWaitForExtractStreamJob = async (
fastify,
extractJobType,
streamId,
sanitizedLogEventIdx
) => {
let jobConfig;
const streamTargetUncompressedSize = settings.StreamTargetUncompressedSize;
if (QUERY_JOB_TYPE.EXTRACT_IR === extractJobType) {
jobConfig = {
file_split_id: null,
msg_ix: sanitizedLogEventIdx,
orig_file_id: streamId,
target_uncompressed_size: streamTargetUncompressedSize,
};
} else if (QUERY_JOB_TYPE.EXTRACT_JSON === extractJobType) {
jobConfig = {
archive_id: streamId,
target_chunk_size: streamTargetUncompressedSize,
};
}
if (null === jobConfig) {
const err = new Error(`Unsupported Job type: ${extractJobType}`);
err.statusCode = 400;
throw err;
}
const extractResult = await fastify.dbManager.submitAndWaitForExtractStreamJob(
jobConfig,
extractJobType
);
if (null === extractResult) {
const err = new Error("Unable to extract stream with " +
`streamId=${streamId} at logEventIdx=${sanitizedLogEventIdx}`);
err.statusCode = 400;
throw err;
}
};

* @return {Promise<number|null>} The ID of the job or null if an error occurred.
*/
async submitAndWaitForExtractIrJob (jobConfig) {
async submitAndWaitForExtractStreamJob (jobConfig, jobType) {
let jobId;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let jobId;
let jobConfig;
if (QUERY_JOB_TYPE.EXTRACT_IR === jobType) {
jobConfig = {
file_split_id: null,
msg_ix: logEventIdx,
orig_file_id: streamId,
target_uncompressed_size: targetUncompressedSize,
};
} else if (QUERY_JOB_TYPE.EXTRACT_JSON === jobType) {
jobConfig = {
archive_id: streamId,
target_chunk_size: targetUncompressedSize,
};
}
let jobId;

Comment on lines 17 to 19
fastify.post("/query/extract-stream", async (req, resp) => {
const {extractJobType, logEventIdx, streamId} = req.body;
const sanitizedLogEventIdx = Number(logEventIdx);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, for extractJobType and streamId.
Let's add this list, to be exported, directly below QUERY_JOB_TYPE

/**
 * List of valid extract job types.
 */
const EXTRACT_JOB_TYPES = Object.freeze([
    QUERY_JOB_TYPE.EXTRACT_IR,
    QUERY_JOB_TYPE.EXTRACT_JSON,
]);

and add a check for extractJobType here. e.g.,

if (false === EXTRACT_JOB_TYPES.includes(extractJobType)) {
    const err = new Error("Invalid extractJobType");
    err.statusCode = 400;
    throw err;
}

We also want to check / sanitize the streamId to avoid SQL injection attacks. @kirkrodrigues do you think we should check for the input parameter's length and limit the characters set? Or maybe checking it against a RegExp is efficient.

Copy link
Collaborator

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the Validation instructions and tested the code, which worked.
There's one suggestion about moving the extract job config construction logics into the db manager, and another one about checking / sanitizing the query server inputs; otherwise, the PR is good to be merged.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
components/log-viewer-webui/server/src/DbManager.js (1)

165-176: Consider renaming 'orig_file_id' to 'stream_id' in a future update.

Based on the previous discussion in PR comments, using 'orig_file_id' for both IR and JSON streams is a temporary solution. A future update should rename this field to 'stream_id' for better semantic clarity.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d06be72 and b9b8440.

📒 Files selected for processing (2)
  • components/log-viewer-webui/server/src/DbManager.js (4 hunks)
  • components/log-viewer-webui/server/src/routes/query.js (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/log-viewer-webui/server/src/routes/query.js
🔇 Additional comments (3)
components/log-viewer-webui/server/src/DbManager.js (3)

65-65: LGTM! Well-structured enum and constant additions.

The new enum value and constant array are properly implemented, maintaining consistency with the existing codebase structure.

Also applies to: 69-75


113-140: LGTM! Well-implemented job type handling.

The method properly differentiates between IR and JSON configurations while maintaining a clean interface.


271-272: LGTM! Appropriate exports added.

The new exports provide necessary access to the job type enums and constants.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
components/log-viewer-webui/server/src/routes/query.js (2)

29-57: Consider optimizing metadata retrieval flow

The code makes two separate calls to getExtractedStreamFileMetadata. Consider restructuring to avoid this duplication.

+        const getMetadata = async () => {
+            const metadata = await fastify.dbManager.getExtractedStreamFileMetadata(
+                streamId,
+                sanitizedLogEventIdx
+            );
+            if (metadata === null) {
+                resp.code(StatusCodes.BAD_REQUEST);
+                throw new Error("Unable to find the metadata of extracted stream with " +
+                    `streamId=${streamId} at logEventIdx=${sanitizedLogEventIdx}`);
+            }
+            return metadata;
+        };
+
         let streamMetadata = await fastify.dbManager.getExtractedStreamFileMetadata(
             streamId,
             sanitizedLogEventIdx
         );

         if (null === streamMetadata) {
             const extractResult = await fastify.dbManager.submitAndWaitForExtractStreamJob(
                 extractJobType,
                 sanitizedLogEventIdx,
                 streamId,
                 settings.StreamTargetUncompressedSize,
             );

             if (null === extractResult) {
                 resp.code(StatusCodes.BAD_REQUEST);
                 throw new Error("Unable to extract stream with " +
                     `streamId=${streamId} at logEventIdx=${sanitizedLogEventIdx}`);
             }

-            streamMetadata = await fastify.dbManager.getExtractedStreamFileMetadata(
-                streamId,
-                sanitizedLogEventIdx
-            );
-
-            if (null === streamMetadata) {
-                resp.code(StatusCodes.BAD_REQUEST);
-                throw new Error("Unable to find the metadata of extracted stream with " +
-                    `streamId=${streamId} at logEventIdx=${sanitizedLogEventIdx}`);
-            }
+            streamMetadata = await getMetadata();
         }

60-60: Enhance response handling

Consider setting an explicit success status code and validating the response structure.

-        return streamMetadata;
+        resp.code(StatusCodes.OK);
+        return streamMetadata;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b9b8440 and 66e3466.

📒 Files selected for processing (1)
  • components/log-viewer-webui/server/src/routes/query.js (2 hunks)
🔇 Additional comments (1)
components/log-viewer-webui/server/src/routes/query.js (1)

3-3: Consider Node.js version compatibility

The with syntax for JSON imports is only available in newer Node.js versions. Please ensure your deployment environment supports this feature or consider using a more compatible import method.

✅ Verification successful

The with import syntax is supported as Node.js 22.4.0 is used for the log viewer webui

Based on the Taskfile.yml configuration, the log-viewer-webui component specifically uses Node.js v22.4.0 (which supports the with import syntax) for building and running. This is evident from the task configurations where nodejs-22 is used for the log-viewer-webui components.

Key findings:

  • The project explicitly uses Node.js v22.4.0 for log-viewer-webui components
  • The with import syntax is supported in Node.js since v20.10.0
  • The build system is properly configured to use the correct Node.js version
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if package.json specifies the required Node.js version
fd package.json --exec jq -r '.engines.node'

Length of output: 69


Script:

#!/bin/bash
# Check Node.js version requirements in package.json files
fd package.json --exec cat {} \; | grep -A 5 '"engines"'

# Check for any documentation about Node.js version requirements
fd -e md -e txt | xargs grep -l -i "node.*version\|prerequisites\|requirements" 2>/dev/null

# Look for any build or CI configuration files that might specify Node.js version
fd -e yml -e yaml | xargs grep -l "node-version\|nodejs" 2>/dev/null

Length of output: 636


Script:

#!/bin/bash
# Check the content of relevant CI and build files
cat Taskfile.yml lint-tasks.yml

# Check the content of relevant documentation files
cat docs/src/dev-guide/building-package.md

Length of output: 29999

Comment on lines 16 to 17
const {extractJobType, logEventIdx, streamId} = req.body;
const sanitizedLogEventIdx = Number(logEventIdx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add validation for numeric conversion

The Number(logEventIdx) conversion should be validated to ensure it doesn't result in NaN or negative values.

-        const sanitizedLogEventIdx = Number(logEventIdx);
+        const sanitizedLogEventIdx = Number(logEventIdx);
+        if (Number.isNaN(sanitizedLogEventIdx) || sanitizedLogEventIdx < 0) {
+            resp.code(StatusCodes.BAD_REQUEST);
+            throw new Error("logEventIdx must be a non-negative number");
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const {extractJobType, logEventIdx, streamId} = req.body;
const sanitizedLogEventIdx = Number(logEventIdx);
const {extractJobType, logEventIdx, streamId} = req.body;
const sanitizedLogEventIdx = Number(logEventIdx);
if (Number.isNaN(sanitizedLogEventIdx) || sanitizedLogEventIdx < 0) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("logEventIdx must be a non-negative number");
}

Comment on lines 24 to 27
if (null === streamId) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("streamId must not be null");
}
Copy link
Contributor

@coderabbitai coderabbitai bot Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance streamId validation

The current validation only checks for null. Consider adding checks for undefined, empty string, and string type.

-        if (null === streamId) {
+        if (!streamId || typeof streamId !== 'string' || streamId.trim() === '') {
             resp.code(StatusCodes.BAD_REQUEST);
-            throw new Error("streamId must not be null");
+            throw new Error("streamId must be a non-empty string");
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (null === streamId) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("streamId must not be null");
}
if (!streamId || typeof streamId !== 'string' || streamId.trim() === '') {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("streamId must be a non-empty string");
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel what rabbit makes sense. Shall I just commit suggestions? @junhaoliao

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Except that we do not need to check !streamId because typeof streamId !== "string" should have handled that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (null === streamId) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("streamId must not be null");
}
if (typeof streamId !== "string" || 0 === streamId.trim().length) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error(`"streamId" must be a non-empty string.`);
}

@haiqi96 haiqi96 requested a review from junhaoliao November 21, 2024 18:57
Copy link
Collaborator

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StreamOutput.directory default value change looks good.

Copy link
Collaborator

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes in DbManager looks good.

Copy link
Collaborator

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good except one suggestion in a method signature.

Comment on lines 24 to 27
if (null === streamId) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("streamId must not be null");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Except that we do not need to check !streamId because typeof streamId !== "string" should have handled that.

Comment on lines 121 to 126
async submitAndWaitForExtractStreamJob (
jobType,
logEventIdx,
streamId,
targetUncompressedSize
) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When there are multiple parameters to a single function, it's be better to use the object destructuring syntax:

Suggested change
async submitAndWaitForExtractStreamJob (
jobType,
logEventIdx,
streamId,
targetUncompressedSize
) {
async submitAndWaitForExtractStreamJob ({
jobType,
logEventIdx,
streamId,
targetUncompressedSize
}) {

The JsDoc can be updated as:

     * @param {object} props
     * @param {number} props.jobType
     * @param {number} props.logEventIdx
     * @param {string} props.streamId
     * @param {number} props.targetUncompressedSize

Then the method call can be modified as:

            const extractResult = await fastify.dbManager.submitAndWaitForExtractStreamJob({
                jobType: extractJobType,
                logEventIdx: sanitizedLogEventIdx,
                streamId: streamId,
                targetUncompressedSize: settings.StreamTargetUncompressedSize,
            });

which looks more readable. What do you think?

Comment on lines 271 to 272
export {EXTRACT_JOB_TYPES};
export {QUERY_JOB_TYPE};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export {EXTRACT_JOB_TYPES};
export {QUERY_JOB_TYPE};
export {
EXTRACT_JOB_TYPES,
QUERY_JOB_TYPE,
};

Comment on lines 24 to 27
if (null === streamId) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("streamId must not be null");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (null === streamId) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("streamId must not be null");
}
if (typeof streamId !== "string" || 0 === streamId.trim().length) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error(`"streamId" must be a non-empty string.`);
}

Comment on lines 19 to 22
if (false === EXTRACT_JOB_TYPES.includes(extractJobType)) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("Invalid extractJobType");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (false === EXTRACT_JOB_TYPES.includes(extractJobType)) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("Invalid extractJobType");
}
if (false === EXTRACT_JOB_TYPES.includes(extractJobType)) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error(`Invalid extractJobType="${extractJobType}".`);
}

fastify.post("/query/extract-ir", async (req, resp) => {
const {origFileId, logEventIdx} = req.body;
fastify.post("/query/extract-stream", async (req, resp) => {
const {extractJobType, logEventIdx, streamId} = req.body;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the parameter validation checks directly below this line.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
components/log-viewer-webui/server/src/routes/query.js (3)

Line range hint 6-14: Enhance route documentation

The JSDoc comment should include details about request parameters and possible error responses.

 /**
  * Creates query routes.
  *
+ * @param {Object} req.body
+ * @param {string} req.body.extractJobType - The type of extraction job (must be in EXTRACT_JOB_TYPES)
+ * @param {string} req.body.streamId - The ID of the stream to extract
+ * @param {number} req.body.logEventIdx - The index of the log event
+ * @throws {Error} When parameters are invalid or extraction fails
  * @param {import("fastify").FastifyInstance | {dbManager: DbManager}} fastify
  * @param {import("fastify").FastifyPluginOptions} options
  * @return {Promise<void>}
  */

42-44: Add logging for extraction failures

Consider logging extraction failures to help with debugging and monitoring.

+                fastify.log.error(`Stream extraction failed for streamId=${streamId} at logEventIdx=${sanitizedLogEventIdx}`);
                 resp.code(StatusCodes.BAD_REQUEST);
                 throw new Error("Unable to extract stream with " +
                     `streamId=${streamId} at logEventIdx=${sanitizedLogEventIdx}`);

59-59: Consider using a consistent response structure

For consistency across endpoints, consider wrapping the response in a standard structure.

-        return streamMetadata;
+        return {
+            success: true,
+            data: streamMetadata
+        };
components/log-viewer-webui/server/src/DbManager.js (2)

69-75: Enhance constant documentation

Consider adding more descriptive JSDoc comments to explain the purpose and usage of EXTRACT_JOB_TYPES. For example:

 /**
- * List of valid extract job types.
+ * An immutable array containing valid job types for stream extraction operations.
+ * Used for validation and type checking in stream extraction related operations.
+ * @type {ReadonlyArray<number>}
  */

172-177: Track technical debt: field naming in MongoDB

As discussed in previous reviews, orig_file_id is being used for both IR and JSON streams. Consider creating a technical debt ticket to standardize this field to stream_id in a future release.

Would you like me to create a GitHub issue to track this technical debt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 66e3466 and a897682.

📒 Files selected for processing (2)
  • components/log-viewer-webui/server/src/DbManager.js (4 hunks)
  • components/log-viewer-webui/server/src/routes/query.js (2 hunks)
🔇 Additional comments (4)
components/log-viewer-webui/server/src/routes/query.js (2)

15-60: Implementation aligns well with requirements

The route handler successfully implements the stream extraction functionality with proper validation and error handling. The code is well-structured and maintainable.


3-3: Verify JSON import assertion compatibility

The JSON import assertion syntax (with {type: "json"}) is a newer feature. Ensure your deployment environment supports this syntax or consider providing a fallback.

✅ Verification successful

JSON import assertions are properly supported

The project's configuration in log-viewer-webui-server/package.json explicitly supports JSON import assertions through the @babel/plugin-syntax-import-attributes plugin, and the project is configured as an ES module with "type": "module". The syntax is safe to use in this environment.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check package.json for Node.js version requirements
rg -A 5 '"engines"' package.json

Length of output: 87


Script:

#!/bin/bash
# Search for package.json files in the repository
fd -t f "package.json" -x cat {}

Length of output: 5779

components/log-viewer-webui/server/src/DbManager.js (2)

65-65: LGTM: Enum value addition follows established pattern

The addition of EXTRACT_JSON to QUERY_JOB_TYPE maintains consistency with the existing enum structure.


271-274: LGTM: Exports follow best practices

The addition of named exports for EXTRACT_JOB_TYPES and QUERY_JOB_TYPE is appropriate and follows JavaScript best practices.


let irMetadata = await fastify.dbManager.getExtractedIrFileMetadata(
origFileId,
const sanitizedLogEventIdx = Number(logEventIdx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add validation for logEventIdx parameter

The logEventIdx parameter should be validated before conversion to ensure it's defined.

+        if (undefined === logEventIdx) {
+            resp.code(StatusCodes.BAD_REQUEST);
+            throw new Error("\"logEventIdx\" must be provided.");
+        }
         const sanitizedLogEventIdx = Number(logEventIdx);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const sanitizedLogEventIdx = Number(logEventIdx);
if (undefined === logEventIdx) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("\"logEventIdx\" must be provided.");
}
const sanitizedLogEventIdx = Number(logEventIdx);

Comment on lines +121 to +140
async submitAndWaitForExtractStreamJob ({
jobType,
logEventIdx,
streamId,
targetUncompressedSize,
}) {
let jobConfig;
if (QUERY_JOB_TYPE.EXTRACT_IR === jobType) {
jobConfig = {
file_split_id: null,
msg_ix: logEventIdx,
orig_file_id: streamId,
target_uncompressed_size: targetUncompressedSize,
};
} else if (QUERY_JOB_TYPE.EXTRACT_JSON === jobType) {
jobConfig = {
archive_id: streamId,
target_chunk_size: targetUncompressedSize,
};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add validation for job type parameter

The method should validate that the provided jobType is one of the valid extract job types defined in EXTRACT_JOB_TYPES.

 async submitAndWaitForExtractStreamJob ({
     jobType,
     logEventIdx,
     streamId,
     targetUncompressedSize,
 }) {
+    if (!EXTRACT_JOB_TYPES.includes(jobType)) {
+        throw new Error(`Invalid job type: ${jobType}. Expected one of: ${EXTRACT_JOB_TYPES}`);
+    }
     let jobConfig;
     if (QUERY_JOB_TYPE.EXTRACT_IR === jobType) {

Committable suggestion skipped: line range outside the PR's diff.

@haiqi96 haiqi96 requested a review from junhaoliao November 21, 2024 21:36
Copy link
Collaborator

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the PR tile, how about

feat(webui): Support viewing search results in context for JSON logs (clp-json).

@haiqi96 haiqi96 changed the title feat(clp-package): Support JSON log viewing in webui feat(webui): Support viewing search results in context for JSON logs (clp-json). Nov 21, 2024
@haiqi96 haiqi96 merged commit 7d85ad3 into y-scope:main Nov 21, 2024
6 checks passed
jackluo923 pushed a commit to jackluo923/clp that referenced this pull request Dec 4, 2024
@haiqi96 haiqi96 deleted the webuiLogEventIdx branch December 6, 2024 20:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants