Re-architect backend tornado http server. Adds supporting decoupled library packages. (Large PR) #80

Merged Sep 27, 2021 (133 commits)

@aaraney aaraney commented Aug 17, 2021

Large backend change. In summary, all tornado HTTP handlers have been rewritten. Handlers now
validate input and output using pydantic, use hsclient to communicate with HydroShare when
required, enforce content-type headers, send meaningful HTTP response codes, and rely on an
encrypted session cookie to validate and maintain login status. Responsibility for filesystem state
has also changed: a websocket handler now pushes local HydroShare resource filesystem status
updates to the client.
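
As a rough illustration of the local versus remote filesystem state that gets pushed to the client, the sketch below maps each file under a directory to its md5 checksum and compares that map with a remote one. The function names (`md5_of`, `local_checksums`, `compare`) and the status keys are hypothetical and are not the classes added in this PR; the remote map is assumed to be a plain dict of relative path to checksum.

```python
import hashlib
from pathlib import Path
from typing import Dict, Set


def md5_of(path: Path) -> str:
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def local_checksums(root: Path) -> Dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its md5."""
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(local: Dict[str, str], remote: Dict[str, str]) -> Dict[str, Set[str]]:
    """Classify files by where they exist and whether checksums agree."""
    shared = local.keys() & remote.keys()
    return {
        "only_local": set(local) - set(remote),
        "only_remote": set(remote) - set(local),
        "out_of_sync": {f for f in shared if local[f] != remote[f]},
    }
```

A status payload along these lines is one plausible shape for what a websocket handler could serialize and send; the actual message format used by this PR may differ.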

Additions

  • handlers subpackage. WIP. In the future, the tornado handlers currently in server.py will be modularized and moved here. For now, the get_route_handlers method is the only thing that has been ported from server.py.
  • lib package added with the following:
    • filesystem package:

      • Class hierarchies (FSMap & FSResourceMap) that describe the relationships between HydroShare resources, resource files, and the resource files' md5 checksums.
        • FSResourceMap: (RemoteFSResourceMap & LocalFSResourceMap are subclasses) encapsulates the relationship between a single resource and the resource's files and file checksums.
        • FSMap: (RemoteFSMap & LocalFSMap are subclasses) encapsulates the relationship between one or more resources and FSResourceMap instances.
        • AggregateFSMap: Encapsulates a RemoteFSMap & LocalFSMap instance and treats them as if they were a single instance (updates both).
          (Class diagrams: FSResourceMap_classes, FSMap_classes)
    • events package:

      • simple events library. Can subscribe, unsubscribe, and dispatch events.
    • resource_factories:

      • factory for downloading a HydroShare resource file or folder. Simplifies API and hides implementation details.
    • resource_strategies

      • Strategy pattern (download HydroShare resource file or folder) used by resource_factories
  • watchdog added as a dependency. It is used to watch for filesystem changes and relay those changes to a client via a websocket connection.
  • utilities subpackage. Adds pathlib utilities to simplify common multi-step operations.
  • models subpackage. Houses pydantic models used in software. Used by handlers to validate and marshal http handler inputs and outputs.
  • templates subpackage. Houses tornado html templates used in software.
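
The subscribe, unsubscribe, and dispatch behavior described for the events package can be pictured with a minimal sketch like the one below. The `EventBroker` class, its method signatures, and the string event names are assumptions made for illustration; they are not taken from the lib/events code in this PR.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Listener = Callable[..., Any]


class EventBroker:
    """Hypothetical minimal event registry (not the PR's actual class)."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, event: str, listener: Listener) -> None:
        """Register listener for event; duplicate registrations are ignored."""
        if listener not in self._listeners[event]:
            self._listeners[event].append(listener)

    def unsubscribe(self, event: str, listener: Listener) -> None:
        """Remove listener from event if it is registered."""
        if listener in self._listeners.get(event, []):
            self._listeners[event].remove(listener)

    def dispatch(self, event: str, *args: Any, **kwargs: Any) -> int:
        """Call every listener for event; return how many were called."""
        # Iterate over a copy so listeners may unsubscribe while being called.
        listeners = list(self._listeners.get(event, []))
        for listener in listeners:
            listener(*args, **kwargs)
        return len(listeners)
```

For example, a listener subscribed to a hypothetical "LOGIN" event would be called by `broker.dispatch("LOGIN", username)` and would stop being called after `broker.unsubscribe("LOGIN", listener)`.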

Removals

Changes

  • Configuration file now uses standard config semantics (key, value pairs separated by '=') instead of JSON. The config file is searched for in the following locations, in order: ~/.config/hydroshare_jupyter_sync/config, ~/.hydroshare_jupyter_sync_config. Currently supported config keys are DATA and LOG.
  • Use session mix-in classes to add additional functionality to base handler subclasses.
    • HeadersMixIn overrides set_default_headers in child classes. Child classes provide a tuple of header-value pairs in the _custom_headers class variable.
    • SessionMixIn adds methods for reading the session and interacting with session state (e.g. getting the hsclient.HydroShare instance for the current session).
    • MutateSessionMixIn adds a method to mutate (reset or replace with a new instance) the global session struct object (contains the hsclient.HydroShare session, session cookie, user id, and username).
  • The user's HydroShare username and password are no longer stored in a file. Each new server session requires the user to re-enter their credentials.
  • After login, user login state is verified and maintained using an encrypted session cookie (cookie key is user). This cookie must be sent along with each request to receive a <300 response.
  • Tornado HTTP Handlers now validate input and output using schemas.
  • hsclient is now used to sign-in and interact with HydroShare (check resource remote state, download, and upload resources).
  • Endpoint changes:
    • /syncApi/ws: websocket over which local vs. remote filesystem state is pushed from the server to the client.
    • /syncApi/user: get user info (name, email, url, phone, address, org, website, orcid, google scholar id, research gate id).
    • /syncApi/resources: collection of resources a user owns (resource type, title, id, is immutable, url, creation date, last modified date).
    • /syncApi/resources/<resource-id>: collection of files and file checksum in a resource owned by a user.
    • /syncApi/resources/<resource-id>/download: download a resource owned by the user locally.
    • /syncApi/resources/<resource-id>/upload: upload a local file or folder to an existing hydroshare resource.
    • /syncApi/resources/<resource-id>/download/<file-or-directory>?folder=(true | false): download a file or folder in a HydroShare resource owned by the user locally. Paths should be specified relative to /data/contents (e.g. to download a file f in resource 1, which is saved in the bagit store at 1/1/data/contents/f, only f would be specified in the request).
  • To run hydroshare_jupyter_sync in develop mode, now use the CLI via python -m hydroshare_jupyter_sync. The CLI allows changing the port number or hostname, supplying an alternative config file, or disabling debug mode.
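
To make the new config format concrete, here is a minimal sketch of reading key=value pairs and searching the two documented locations in order. The helper names (`parse_config`, `find_config`), the treatment of `#` lines as comments, and the silent skipping of unsupported keys are assumptions for illustration; the PR's own parser may behave differently.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

# Search order as listed in the PR description.
DEFAULT_LOCATIONS = (
    Path("~/.config/hydroshare_jupyter_sync/config"),
    Path("~/.hydroshare_jupyter_sync_config"),
)
# Keys the description says are currently supported.
SUPPORTED_KEYS = {"DATA", "LOG"}


def parse_config(text: str) -> Dict[str, str]:
    """Parse 'KEY=value' lines, keeping only supported keys."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        key = key.strip()
        if key in SUPPORTED_KEYS:
            settings[key] = value.strip()
    return settings


def find_config(locations: Iterable[Path] = DEFAULT_LOCATIONS) -> Optional[Path]:
    """Return the first existing config file, or None if none is found."""
    for location in locations:
        candidate = location.expanduser()
        if candidate.is_file():
            return candidate
    return None
```

With this sketch, a file containing `DATA=/some/dir` and `LOG=/some/log/dir` (placeholder paths) would parse to a dict with those two keys.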

Notes

  • If you are a reviewer of this massive PR, I would mainly pay attention to the following subpackages and modules (in order):

    • hydroshare_jupyter_sync/handlers/__init__.py define handler endpoints
    • hydroshare_jupyter_sync/server.py tornado http handlers
    • hydroshare_jupyter_sync/websocket_handler.py tornado websocket handler
    • hydroshare_jupyter_sync/lib/ a few modularized library packages
    • hydroshare_jupyter_sync/fs_events.py & fs_event_handler.py
    • hydroshare_jupyter_sync/session_struct.py
    • hydroshare_jupyter_sync/session.py
    • hydroshare_jupyter_sync/session_sync_event_listeners.py
    • hydroshare_jupyter_sync/utilities
  • There are several handlers, methods, and modules that were not removed in this PR. Most notably, many of the tornado handlers in hydroshare_jupyter_sync/server.py are no longer used (they will be removed in the future). I would suggest looking at get_route_handlers in the handlers subpackage (this method will also change in the future).

  • Ignore changes to hydroshare_jupyter_sync/resource_manager.py; the module is no longer used in the project and the majority of the changes are code reformatting done by black.

Todos

  • Move tornado http and websocket handlers to the handlers subpackage.

Austin Raney added 30 commits July 8, 2021 13:26
…hild classes that don't need this functionality
…ookie created post login is used to retain login state.
Austin Raney added 13 commits September 27, 2021 09:52
…extension_points is not _jupyter_server_extension_paths
…e type hint declarations. Previous imp, causes duplicate value fields, to be 'removed'
…he correct intermediate baggit like structure.
…path where hydroshare resources are stored.
…ter sync data directory via a GET request. Does not require login
…s TODO action). When resource entity is downloaded, both local and remote resource are updated, meaning, at the resource level, all the md5 checksums are recomputed and the remote is re-queired for changes in the manifest-md5.txt file
Labels: backend (Issue requires backend code change), enhancement (New feature or request)