# NOTES:
# o Maybe while you have a fully functioning CAIS, it's not
# necessarily a robust CAIS? Make this clear?
#+TITLE: Defense Slidedeck Outline
* Title Slide:
+ Show logos for:
+ RAIR
+ CISL
+ RPI
+ IBM
+ AFOSR
+ ONR
"Thank you to IBM for their support of CISL and ONR & AFOSR separately for
making this research possible."
# MATT: Technically, you did receive /some/ support from ONR & AFOSR,
# and need to mention that too. Thx. //S
* Roadmap
+ Introduction, Background, and Motivation
+ Technology for Cognitive and Immersive Systems
+ Reagent: Enabling Interaction with Arbitrary Web Content
+ MUIFOLD: Mobile UI Framework for Operating Large Displays
+ Formalizing the CAIS
+ Planning and Plan Recognition
+ Closing remarks
* Introduction
** What is a CAIS?
+ A CAIS should broadly be able to fulfill the following two requirements:
+ C: Cognitive
+ I: Immersive
** Motivating Example of Alvin, Betty, and Charlie
** Requirements for a CAIS implementation
Figure: =chapters/01_introduction/figures/planning_cais_framework.png=
+ The requirements of a CAIS must support our earlier use-case of Alvin, Betty, and Charlie
+ Must be multi-user and multi-modal
+ Optimally, it should be able to handle:
+ speech input (done here)
+ gestural input (done here)
+ perception input (done here, caveat placed on headpose vs just being "in the CAIS")
+ Should operate from a generalizable standpoint, making adoption by others easier
+ Hammer home that while it is fine to define things theoretically, we wish to provide here
  an actual physical implementation to ground our theoretical underpinnings, so that we
  neither describe a theoretical system that lies far in the future nor trap ourselves
  there.
** Contributions
+ Create a definition for what constitutes a cognitive and immersive system
+ Define and provide implementation of a generalizable framework for a CAIS
+ As stated above, while this implementation does not necessarily hit the absolute limits of
  robustness, it can be used to accomplish all of the theoretical underpinnings with some slight caveats
+ Additionally, as part of this work, we provide this CAIS as open source to help drive
  its adoption and secure its longevity. A lot of prior work appears to generate a paper or two,
  get some buzz, and then disappear, leaving no technology behind for future endeavors
+ Create a formalization for the CAIS such that it can operate at the theory-of-mind level
+ Utilize items 2 and 3 for purposes of planning and plan recognition
** Remarks on Colocated vs Distributed
+ Why are they different?
+ Why can we not just pivot?
* Technology
** Prior Work on intelligent rooms
+ =bolt_put-that-there:_1980=
+ =coen_building_1997=
+ =brooks_intelligent_1997=
(Gap due to CALO project)
+ =farrell_symbiotic_2016=
+ =chakraborti_mr._2017=
+ =arai_cira:_2019=
** Architecture for a CAIS
Figure: =chapters/02_technology/figures/cais_high_level.png=
+ We have workers that sit at the sensor level (i.e. the inputs to the "brain")
+ At the next several levels, the "raw" input is parsed into semantic information and can be
  combined in any way
+ support for early vs late fusion common in ML multi-modal approaches
+ we aim for "semantic late fusion"
+ The results of the inputs are combined as necessary for resolution by the execution engine
  (a minimal fusion sketch follows below)
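A minimal sketch of what "semantic late fusion" could look like, assuming invented event
shapes rather than the actual Bishop CAIS message formats:
#+BEGIN_SRC typescript
// Illustrative sketch of semantic late fusion: inputs arrive as
// already-parsed semantic events (not raw audio/video frames) and are
// combined only after interpretation. Event shapes are assumptions,
// not the actual Bishop CAIS message formats.

interface SpeechEvent {
  kind: 'speech';
  text: string;        // e.g. "delete that note"
  timestamp: number;   // ms since epoch
}

interface PointingEvent {
  kind: 'pointing';
  target: string;      // semantic id of what was pointed at
  timestamp: number;
}

// Pair a speech event with the pointing event closest in time, within
// a tolerance window; each modality was parsed independently before
// being combined, which is the "late" part of late fusion.
function fuse(
  speech: SpeechEvent,
  pointings: PointingEvent[],
  windowMs = 1500,
): { speech: SpeechEvent; pointing?: PointingEvent } {
  const nearby = pointings
    .filter((p) => Math.abs(p.timestamp - speech.timestamp) <= windowMs)
    .sort(
      (a, b) =>
        Math.abs(a.timestamp - speech.timestamp) -
        Math.abs(b.timestamp - speech.timestamp),
    );
  return { speech, pointing: nearby[0] };
}
#+END_SRC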
** Implementing a CAIS
Figure: =chapters/02_technology/figures/cais_implementation.png=
+ Provide here an open-source implementation of the modules, "Bishop CAIS" (https://github.com/bishopcais)
+ Briefly discuss these modules in minor detail:
+ transcript-worker
+ speaker-worker
+ executor
+ conversation-worker
+ occupancy-tracker
** Implementing a CAIS: display-worker
+ Electron-based application which allows us to open webpages on a grid system
+ Webpages can point anywhere, and through Electron we can reach into the open webpages
+ Surfaces a RabbitMQ / REST API to pull details about open webpages (e.g. location on
  screen; see the sketch after this list)
+ How, though, do we make sense of what's on the screen?
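A hypothetical sketch of pulling open-webpage details out of display-worker over REST; the
endpoint path and response shape here are illustrative assumptions, not the documented
display-worker interface:
#+BEGIN_SRC typescript
// Hypothetical sketch only: endpoint path and response shape are
// assumptions for illustration, not the documented
// bishopcais/display-worker API.

interface OpenWebpage {
  url: string;
  // Grid cell the webview occupies on the display wall (assumed shape).
  bounds: { x: number; y: number; width: number; height: number };
}

async function listOpenWebpages(baseUrl: string): Promise<OpenWebpage[]> {
  const res = await fetch(`${baseUrl}/windows`); // assumed endpoint
  if (!res.ok) {
    throw new Error(`display-worker request failed: ${res.status}`);
  }
  return (await res.json()) as OpenWebpage[];
}

// Usage (assumed address): listOpenWebpages('http://localhost:8081')
#+END_SRC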
* Reagent
** Reagent
+ Motivation:
+ To make understanding pointing possible, we need the capacity to understand what is on the screen
+ Prior work demanded that all content be created in-house and be specially instrumented
(e.g. chakraborti_visualizations_2018)
+ Doable, but of course that would mean our CAIS is probably not going to see real-world usage
** Reagent Architecture
Figure: =chapters/03_reagent/figures/reagent.png=
+ Utilize pre-loading of a small JS file on top of any webview loaded by display-worker
+ Creates a bridge into the page to the Reagent server
+ Creates bindings into the page to detect interaction data, as well as hooks to allow us
to query information from the page
+ A MutationObserver attached to the page alerts us to any changes to the DOM (a minimal
  preload sketch follows below)
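A minimal sketch of the kind of preload script described above, assuming an invented server
address and message shape; only =MutationObserver= and =WebSocket= are standard APIs:
#+BEGIN_SRC typescript
// Sketch of a Reagent-style preload script: a MutationObserver
// reports DOM changes back to a server over a WebSocket. The address
// and message shape are assumptions for illustration.

const socket = new WebSocket('ws://localhost:8080/reagent'); // assumed address

const observer = new MutationObserver((mutations) => {
  if (socket.readyState !== WebSocket.OPEN) return;
  // Report a compact summary of each change rather than the raw nodes.
  const summary = mutations.map((m) => ({
    type: m.type,              // 'childList' | 'attributes' | 'characterData'
    target: m.target.nodeName, // e.g. 'DIV', '#text'
    added: m.addedNodes.length,
    removed: m.removedNodes.length,
  }));
  socket.send(JSON.stringify({ kind: 'dom-mutations', summary }));
});

// Watch the whole document for structural and attribute changes.
observer.observe(document.documentElement, {
  childList: true,
  subtree: true,
  attributes: true,
});
#+END_SRC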
** Reagent: Sample ontology use-case
+ Given an ESPN page, talk about how Reagent exposes the table on the page to users
+ Allow querying information about the table using table names, as well as gestural information
+ Walk through building the ontology (a generic table helper is sketched below)
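A hedged illustration of the kind of generic table extension mentioned above; this is not
Reagent's actual code, just standard DOM calls:
#+BEGIN_SRC typescript
// Illustrative helper for exposing an arbitrary HTML table by its
// header names, as a generic Reagent extension might. Not the actual
// Reagent implementation.

function tableToRecords(table: HTMLTableElement): Record<string, string>[] {
  const headers = Array.from(table.querySelectorAll('th')).map(
    (th) => th.textContent?.trim() ?? '',
  );
  return Array.from(table.querySelectorAll('tbody tr')).map((tr) => {
    const record: Record<string, string> = {};
    Array.from(tr.querySelectorAll('td')).forEach((td, i) => {
      // Fall back to a positional key if a header is missing.
      record[headers[i] || `col${i}`] = td.textContent?.trim() ?? '';
    });
    return record;
  });
}
#+END_SRC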
** Reagent: Experiment
+ Walkthrough experiment on:
+ Selecting pages
+ Composing our evaluation set
+ Metrics we measure against
** Reagent: Experiment Results
+ Show table
** Reagent: Summary
+ We can use this to understand screen content on the fly
+ Can build out general extensions for a number of generic pages (e.g. tables)
+ Can build out specific extensions for custom content (such as we do below for our use-cases)
* MUIFOLD
** Motivation
+ While there exists prior work on pointing in large spaces, it almost always
  relies on a range of sensors that are costly to set up and use
+ Additionally, these solutions had issues such as restricted movement zones, additional
  hardware, etc.
** Prior Work
+ hutchison_pointerphone:_2013
+ pietroszek_smartcasting:_2014
+ abascal_ucanvas_2015
+ weisker_massive_2016
+ baldauf_your_2016
+ barsotti_web_2017
** Introducing MUIFOLD
+ Framework for building out UIs for interacting with a CAIS using cellphones
+ Almost everyone has a smartphone that features a powerful accelerometer and gyroscope
+ Does not utilize external sensors or landmarks (a minimal pointing sketch follows below)
+ Also allows building out UIs that are more friendly to a given use-case
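A minimal sketch of the core idea, assuming a made-up wall resolution, neutral tilt, and
transport; the real framework's calibration is more involved:
#+BEGIN_SRC typescript
// Sketch: drive a pointer on a large display using only the phone's
// built-in orientation sensors (no external landmarks). The mapping
// below is a deliberately simple assumption.

const SCREEN = { width: 7680, height: 2160 }; // assumed wall resolution

window.addEventListener('deviceorientation', (e: DeviceOrientationEvent) => {
  if (e.alpha === null || e.beta === null) return;
  // alpha: compass heading (0-360), beta: front-back tilt (-180..180).
  // Map a +/-30 degree cone around an assumed calibrated center.
  const yaw = ((e.alpha + 180) % 360) - 180; // center heading around 0
  const pitch = e.beta - 45;                 // assumed neutral tilt of 45 deg
  const x = ((yaw + 30) / 60) * SCREEN.width;
  const y = ((30 - pitch) / 60) * SCREEN.height;
  sendPointer(Math.max(0, Math.min(SCREEN.width, x)),
              Math.max(0, Math.min(SCREEN.height, y)));
});

// Placeholder for shipping the pointer position to the CAIS
// (e.g. over a WebSocket); the transport is an assumption here.
function sendPointer(x: number, y: number): void {
  console.log(`pointer at (${Math.round(x)}, ${Math.round(y)})`);
}
#+END_SRC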
** MUIFOLD: Sample Use-Case
+ Demonstrate MUIFOLD in intelligent analyst use-case
** MUIFOLD: User Study
+ Walk through experiment
+ Walk through results
** MUIFOLD: Summary
+ The technology works; at a minimum, it provides a general pointing device we can use in the CAIS
+ At a maximum, it also provides alternative input schemes for content beyond just natural
  language sentences and gestures
* Formalizing the CAIS
** Introduce the CEC (Cognitive Event Calculus)
+ Provide brief description
+ An intensional logic, so it allows for modeling not just the B (belief) and K (knowledge)
  of a user, but also a user's B about the B of another user (or of themselves); a small
  example follows below.
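As a sketch (the operator notation here is simplified from the CEC's full sorted syntax):
#+BEGIN_SRC latex
% Nested belief: agent a believes, at time t, that agent b believes phi.
\mathbf{B}\big(a,\, t,\, \mathbf{B}(b, t, \phi)\big)

% A standard bridge principle: knowledge implies belief.
\mathbf{K}(a, t, \phi) \rightarrow \mathbf{B}(a, t, \phi)
#+END_SRC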
** Formalizing our requirements
+ Walk through one at a time per slide (C1, C2, I1, I2, I3):
+ Informal textual description of the principle
+ Formalization
** Formalizing the CAIS technology
+ Users within CAIS
+ Perception and Vicinity
+ Pointing
+ Spoken / Typed Natural Language and MUIFOLD Input
** Formalization of sample use-cases
+ Blocks world
+ Sticky-notes
** Solving the False-Belief Task with our Formalization
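A hedged sketch of how the false-belief situation might be written in the same simplified
operator notation; the predicate and constant names are illustrative, not the dissertation's
actual formalization:
#+BEGIN_SRC latex
% Agent a last saw object o at location l1 before leaving:
\mathbf{B}(a, t_2, \mathit{at}(o, l_1))
% Meanwhile the object was moved, so in fact:
\mathit{at}(o, l_2) \wedge l_1 \neq l_2
% The CAIS, modeling a's belief, predicts a will look in l1:
\mathbf{B}(\mathit{cais},\, t_2,\, \mathbf{B}(a, t_2, \mathit{at}(o, l_1)))
#+END_SRC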
* Planning / Plan Recognition
** Why?
+ Now that we have both a real-world implementation and a formalization, we can
  attack issues of planning and plan recognition in these spaces
** Define Planning (give example)
** Define Plan Recognition
** Define STRIPS-style Action Planners
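To make the precondition/add/delete framing concrete, a compact sketch of a STRIPS-style
action as a data structure; the note-deletion action is an invented example from the
sticky-notes domain:
#+BEGIN_SRC typescript
// A STRIPS-style action: applicable when its preconditions hold, and
// it transforms the state via its add and delete lists.
interface StripsAction {
  name: string;
  preconditions: string[]; // ground atoms that must hold
  addList: string[];       // atoms made true by the action
  deleteList: string[];    // atoms made false by the action
}

// Invented example in the sticky-notes domain:
const deleteNote: StripsAction = {
  name: 'delete-note(green-note)',
  preconditions: ['exists(green-note)', 'visible(green-note)'],
  addList: ['deleted(green-note)'],
  deleteList: ['exists(green-note)', 'visible(green-note)'],
};

// Applying an action to a state represented as a set of atoms.
function apply(state: Set<string>, a: StripsAction): Set<string> {
  if (!a.preconditions.every((p) => state.has(p))) {
    throw new Error(`preconditions of ${a.name} not satisfied`);
  }
  const next = new Set(state);
  a.deleteList.forEach((atom) => next.delete(atom));
  a.addList.forEach((atom) => next.add(atom));
  return next;
}
#+END_SRC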
** Discuss Spectra
** Intent Resolution (2+ slides)
+ Given sentence "Delete the green note"
+ Walk through the scenario of resolving it (a sketch of this pipeline follows after this list)
+ intent: delete, entities: ["green"]
+ match against possible action candidate
+ gain additional information about what we're looking for (a note)
+ combine with given entities
+ see if we have a matching note in the agent's beliefs (from perception)
+ if not, launch into dialogue for resolution
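A hedged sketch of this pipeline; the intent/entity shapes and candidate matching are
illustrative assumptions, not the executor's actual interfaces:
#+BEGIN_SRC typescript
// Sketch of the intent-resolution pipeline described above. All of
// the shapes here are assumptions for illustration.

interface Intent {
  name: string;        // e.g. 'delete'
  entities: string[];  // e.g. ['green']
}

interface ActionCandidate {
  intent: string;
  requiredType: string; // slot the action needs filled, e.g. a 'note'
}

interface Belief {
  id: string;
  type: string;         // e.g. 'note'
  attributes: string[]; // e.g. ['green'], populated from perception
}

// Returns the matching object from the agent's beliefs, or null, in
// which case the caller should launch a clarification dialogue.
function resolve(
  intent: Intent,
  candidates: ActionCandidate[],
  beliefs: Belief[],
): Belief | null {
  // 1. Match the intent against a known action candidate.
  const action = candidates.find((c) => c.intent === intent.name);
  if (!action) return null;
  // 2. Combine the action's required type with the given entities and
  //    search the agent's perception-derived beliefs.
  const matches = beliefs.filter(
    (b) =>
      b.type === action.requiredType &&
      intent.entities.every((e) => b.attributes.includes(e)),
  );
  // 3. Exactly one match resolves the intent; otherwise disambiguate.
  return matches.length === 1 ? matches[0] : null;
}

// "Delete the green note":
// resolve({ name: 'delete', entities: ['green'] },
//         [{ intent: 'delete', requiredType: 'note' }],
//         perceivedBeliefs)
#+END_SRC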
** Plan Recognition
+ Given the above false-belief scenario, have the user continue to operate in it
+ Have the user take an action consistent with an incorrect goal state
+ Provide course correction, utilizing the observations of the user's action
  sequence
* Conclusion
** Outline the objectives of the dissertation and how they were met
** Identify future work
+ Extensions to CEC
+ Strength Factors
+ Emotional Theory
+ Handling privacy and ethical issues in the space
+ Extensions to technology
+ User studies