Skip to content

feat(websoc): implement related courses from section code range #104

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

aadi-shanker
Copy link

@aadi-shanker aadi-shanker commented Jan 31, 2025

Description

  • Added validation and transformation logic for the includeRelatedCourses flag in the websoc.ts schema. The flag now accepts "true" or "false" as strings, which are transformed into a boolean.
  • Introduced a bonus transformation for the courseIds flag in the same schema. courseIds is now accepted as a comma-separated string and preprocessed into an array of strings.
  • Updated the buildQuery function in services/websoc.ts to build conditions for related courses based on section code ranges. When includeRelatedCourses is enabled along with sectionCodes, a subquery fetches the related course IDs, and the query is constructed to filter based on these IDs.
  • Implemented custom error handling to return an error when both sectionCodes and courseIds are provided with includeRelatedCourses, since the flag is only intended to work with sectionCodes.

Related Issue

Completes #18

Motivation and Context

This change enhances the WebSoc API by providing a more flexible and user-friendly way to fetch related courses. The includeRelatedCourses flag now supports multiple boolean representations and ensures that related courses are dynamically fetched based on section code ranges. Additionally, the bonus handling for courseIds allows users to input a comma-separated string that is automatically transformed into an array, enforcing consistency and preventing ambiguous queries.

How Has This Been Tested?

Manually tested the REST and GraphQL endpoints using various scenarios:

  • Valid queries using only sectionCodes with includeRelatedCourses: "true".
  • Queries using only courseIds.
  • Queries that combine both sectionCodes and courseIds with includeRelatedCourses, which correctly trigger the error response.

Screenshots (if appropriate):

image image image image

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code involves a change to the database schema.
  • My code requires a change to the documentation.

@laggycomputer
Copy link
Member

Just confirmed; this PR as is does not fulfill the written feature request from the issue. Remember the flag was meant to set the following behavior:
If a section code filter is provided, and a section with a certain code is returned, then every other section from the same course must also be included. For example, this query is not behaving correctly:

query Websoc {
    websoc(
        query: {
            includeRelatedCourses: "true"
            year: "2025"
            quarter: Winter
            sectionCodes: "35610"
        }
    ) {
        schools {
            departments {
                courses {
                    courseNumber
                    sections {
                        sectionType
                        sectionCode
                    }
                }
            }
        }
    }
}

We would expect it to return every ICS 33 section (including the other lecture and the labs) this term, since ICS 33 is a "related course" to a section within the set of codes we specified. However it only returns the section we originally requested.

@ecxyzzy
Copy link
Member

ecxyzzy commented Feb 6, 2025

Also, please make sure to update the title of your PR to comply with Conventional Commits.

@aadi-shanker aadi-shanker changed the title [WebSoc] Related courses from section code range feat(websoc): implement related courses from section code range Feb 14, 2025
Copy link
Member

@laggycomputer laggycomputer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why the new courseIds parameter is necessary. It duplicates existing parameters and isn't documented so a user would have no idea what it does differently from the existing options or that it's only associated with related courses (and not meaningful otherwise) and mutually exclusive with sectionCodes. Remember that the spec says that you should be able to accomplish related courses without any additional parameters (or forced lack thereof).

You also accumulate SQL data into JS and then use it to request SQL again, which usually incurs excess computing, memory, and network cost. It can and should be done purely in SQL. Off the top of my head a self join or subquery can be effective.

@aadi-shanker
Copy link
Author

aadi-shanker commented Feb 15, 2025

I'm not sure why the new courseIds parameter is necessary. It duplicates existing parameters and isn't documented so a user would have no idea what it does differently from the existing options or that it's only associated with related courses (and not meaningful otherwise) and mutually exclusive with sectionCodes. Remember that the spec says that you should be able to accomplish related courses without any additional parameters (or forced lack thereof).

You also accumulate SQL data into JS and then use it to request SQL again, which usually incurs excess computing, memory, and network cost. It can and should be done purely in SQL. Off the top of my head a self join or subquery can be effective.

Hey Dante,

Thanks for the feedback. I introduced the courseIds parameter as a bonus of my implementation based on Andrew’s guidance.

  • It allows users to directly query for a list of course IDs (e.g. &courseIds=WOMNST187,WOMNST50C,WOMNST60B), so they can explicitly request all sections for specific courses.

  • It separates the concerns of query construction and execution—when sectionCodes are provided with includeRelatedCourses, a subquery is used to fetch the related course IDs and then those are passed into the main query.

I realize that in the current implementation the courseIds value gets overwritten when both sectionCodes and includeRelatedCourses are provided, which might seem confusing from a user’s perspective. Taking that into account I asked Andrew what would be the best approach and he said to either have impelementation to merge the courseIds or just let the user know includeRelatedCoures only works with section codes. I'll make sure to update the documentation to clarify that the courseIds parameter is intended for direct queries, and that when using includeRelatedCourses, only sectionCodes should be supplied.

Regarding the accumulation of SQL data in JavaScript before making another query: I understand the performance concern and the desire for a more SQL-centric solution. My current approach was designed to clearly separate query construction from execution. However, I'm open to refactoring the implementation to incorporate more logic directly in SQL—perhaps using a self-join or a subquery like you mentioned. Could you please share any specific suggestions or examples of how you'd like to see this implemented?

Let me know if this approach works or if further adjustments are needed.

@laggycomputer
Copy link
Member

laggycomputer commented Feb 16, 2025

Thanks for the clarification. I'd like to hear more of your approach and broader context, if you don't mind. Doing so would allow me to deeply understand and comment directly on the code.

It allows users to directly query for a list of course IDs

What about existing department and courseNumber parameters? It seems dissonant to have two ways to ask for sections by course. What have you and Andrew decided to do about this?

when sectionCodes are provided with includeRelatedCourses, a subquery is used to fetch the related course IDs and then those are passed into the main query

From what I see so far, the first DB query which is de-duplicated via Set (assuming that's what you meant) is not a sub-query. A sub-query is a specific concept in SQL where a SELECT statement appears in place of a reference to a table in a JOIN, SELECT FROM, etc.

It separates the concerns of query construction and execution

I agree that this is a good priority to have. I also think there may be a way to both have the cake and eat it by building conditions which also include a SQL sub-query or CTE when necessary. I argue that this would still be an acceptable separation of concerns since that expression isn't executed by the query building code.

either have impelementation to merge the courseIds or just let the user know includeRelatedCoures only works with section codes

The latter seems reasonable; to be clear I understand you want to make the inclusion of the sectionCodes parameter necessary and sufficient to allow includeRelatedCoures to be effective. I suggest you extend this to any query which does not include a filter by course or department, since I can imagine use cases where this might be useful e.g. asking "what are the sections of classes which have a section taught by this instructor?"

But "impelementation to merge the courseIds" confuses me; can you clarify?

Could you please share any specific suggestions or examples of how you'd like to see this implemented?

Of course. At a high level, you can reframe the buildQuery function into two broad cases:

First, when related courses functionality is not required; this case is the existing code from before this PR.

Second, when related courses functionality is needed; this is the new case. You have the choice of when this case should trigger (see above). In this second case, you should wrap the conditions from case 1 in a sub-query, CTE, etc. This inner expression should fetch according to the conditions from case 1, but SELECT only the course_id column. Note that this is different from your usage of the term "course ID" so far where you mean the human readable department and number. The course_id column is a UUID.
Finally, the return value of buildQuery in this case is an expression which requires that any returned row have a course_id which is in the expression you create.

The queries in these two cases might look like this:

-- case 1
SELECT something FROM websoc_section -- INNER JOIN something ON something... more joins, etc.
WHERE -- conditions
;
-- case 2
SELECT something FROM websoc_section -- INNER JOIN...
WHERE course_id IN (
    SELECT course_id FROM websoc_section -- INNER JOIN something...
    WHERE -- conditions
);

Let me know if you have questions on any of the above.
Thanks for your detailed responses to my review.

@aadi-shanker
Copy link
Author

aadi-shanker commented Feb 17, 2025

Thanks for the response Dante. Here some details answering your questions.

I'd like to hear more of your approach and broader context, if you don't mind. Doing so would allow me to deeply understand and comment directly on the code.

For sure, the basic approach for the step-through was:
User queries with includeRelatedCourses → run a separate JavaScript DB query to find related course IDs (which now I realize not a true subquery) → store those courseIds in input.courseIds → pass input.courseIds into buildQuery → execute the final SQL query to find all sections of those related courses.

Just wanted to make sure I fully understand your feedback. The issue seems to be that the "subquery" I described isn’t actually a true SQL subquery but rather a separate database query executed in JavaScript. If I’m understanding correctly, the expectation is to handle this logic within the main SQL query itself, rather than fetching courseIds separately in JavaScript and passing them into buildQuery. Andrew initially mentioned that while the initial approach avoided a single SQL query due to the complexity of architecting it, he agrees that moving everything into SQL would likely bring performance benefits. Given that, I’ll be looking into refactoring this into a fully SQL-based solution..

What about existing department and courseNumber parameters? It seems dissonant to have two ways to ask for sections by course. What have you and Andrew decided to do about this?

I hadn’t considered department and courseNumber when implementing courseIds. Andrew and I didn’t plan to change those parameters, so at first, it seemed like courseIds might not have a distinct purpose—especially since department + courseNumber already allow course lookups. However, after discussing this with Andrew, he pointed out that the existing combination of department and courseNumber only allows looking up one department at a time, whereas courseIds could provide additional flexibility. Since this is new functionality that doesn’t exist yet, we can choose to keep it. I think the key question now is how to handle cases where users provide conflicting parameters (courseIds vs. department + courseNumber). I’ll need to determine how to prioritize those conditions in buildQuery. Let me know if you have any preferences on that!

But "impelementation to merge the courseIds" confuses me; can you clarify?

Yeah, so the problem was when I had the three flags—courseIds, includeRelatedCourses, and sectionCodes—activated. It ignored whatever was inputted in courseIds, causing an ambiguity problem which Andrew adviced either needed a fix or atleast let the user know. Regarding the fix when I say "merge the courseIds," I was considering whether it would be more appropriate to take the existing courseIds from the input and combine them with the related course IDs retrieved from the query. Andrew let it up to me on what do regarding this specific ambeguity problem and I chose to just let the user know not to have all 3 of those flags active when making a query.

I am also assuming this problem would also change due to the change in subquery implementation that I need to refactor.

I had some quick follow up questions as well regarding this refactor:

  • Should the subquery (or CTE) be built using Drizzle’s TypeScript DSL, or would raw SQL be preferred for this part?

  • I talked to Andrew about the UUID vs. human-readable identifiers, and he mentioned that UUIDs should be used for joins in SQL because they offer performance benefits, while user-facing filters should avoid UUIDs since they are random and not user-friendly. I just want to confirm that this aligns with your expectations and that I’m understanding the trade-offs correctly.

  • Are there any specific examples or patterns from our codebase that you think would serve as a good reference for implementing this subquery or CTE approach?

I wanted to give you a heads-up that I should have the new implementation ready within the next week or so. I have my ICS 51 midterm this Wednesday, so I’m pretty packed with studying until then.

Thanks for your time and detailed feedback! I’ll work on the refactors and reach out if I have any questions.

@laggycomputer
Copy link
Member

User queries with includeRelatedCourses → run a separate JavaScript DB query to find related course IDs (which now I realize not a true subquery) → store those courseIds in input.courseIds → pass input.courseIds into buildQuery → execute the final SQL query to find all sections of those related courses.

That agrees with my understanding.

the expectation is to handle this logic within the main SQL query itself, rather than fetching courseIds separately in JavaScript and passing them into buildQuery.

It should be in SQL, not necessarily the last query you do or any other query you might call the "main" query. But it should not cross the SQL-JS barrier and go back in again without a very good reason (e.g. you've proven it's actually faster).

Given that, I’ll be looking into refactoring this into a fully SQL-based solution.

Excellent.

I’ll need to determine how to prioritize those conditions in buildQuery. Let me know if you have any preferences on that!

They are not mutually exclusive. Of course we are working under the SemVer constraint that we will have no breaking changes. If they are present together, I argue disjunction (logical OR) is most appropriate.

I am also assuming this problem would also change due to the change in subquery implementation that I need to refactor.

Your decision to disallow the presence of all three parameters is mostly a concern of your validation step. Your further code, including SQL queries, can assume that invariant is upheld where applicable. As particularly applied to SQL, it would probably *reduce* the amount of cases you have to consider when building queries.

Should the subquery (or CTE) be built using Drizzle’s TypeScript DSL, or would raw SQL be preferred for this part?

Prefer the ORM, since we have it. As I understand the problem statement this should be feasible unless you discover yet another uncovered corner case in Drizzle. If you have to use raw SQL, minimize its scope (e.g. if you can't express a single function, use raw SQL for only that function).

I just want to confirm that this aligns with your expectations and that I’m understanding the trade-offs correctly.

You should never attempt a join on the human-readable course IDs unless you have a very very good reason. You should also never expose the UUIDs. Your understanding is correct.

Are there any specific examples or patterns from our codebase that you think would serve as a good reference for implementing this subquery or CTE approach?

A CTE is used here to assemble an array of specializations for each major. Here, we SELECT from a CTE.

Two sub-queries are used here. The first is JOINed against to accomplish a complex filtering operation, forming the second. The second is SELECTed against to do a more simple filtering operation. Don't worry about the complexity, just understand what methods need to be called. After you create a CTE or sub-query, you can use it in place of a table identifier (mostly).

I wanted to give you a heads-up that I should have the new implementation ready within the next week or so. I have my ICS 51 midterm this Wednesday, so I’m pretty packed with studying until then.

No worries. I'm told this isn't urgent.

<!--- Provide a general summary of your changes in the Title above -->

## Description

Introduces a new "websocSectionEnrollmentLive" table to the schema,
purposed to keep track of updated enrollment data based on continuous
scrapes.

Updated the websoc scraper to populate new table every 5 minutes. The
table holds short term data, clearing out all entries older than an hour
every scrape.

Added REST endpoint functionality for the new table, which retrieves
enrollment snapshots based on a `since` timestamp.
  - The API returns:
- A "from" snapshot representing the latest known state before `since`
(if available).
    - A "to" snapshot representing the latest known state.
    - If no significant changes are detected, only "to" is returned.

## Related Issue

Closes [#59]

## Motivation and Context

Allows developers to create accurate applications of UCI's enrollment
information, helping student's quarterly enrollment processes go
smoother and be more communicative.

## How Has This Been Tested?

the schema and scraper were tested across numerous scapes in 5 minute
intervals, and the data was validated for consistency.

## Screenshots (if appropriate):

## Types of changes

<!--- What types of changes does your code introduce? Put an `x` in all
the boxes that apply: -->

- [ ] Bug fix (non-breaking change which fixes an issue)
- [X] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)

## Checklist:

- [X] My code involves a change to the database schema.
- [ ] My code requires a change to the documentation.

---------

Co-authored-by: Sanskar Mishra <[email protected]>
Co-authored-by: Jordan <[email protected]>
Co-authored-by: Jordan Yee <[email protected]>
Co-authored-by: Dante Dam <[email protected]>
Co-authored-by: Andrew Wang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[WebSoc] Related courses from section code range
5 participants