The Generative AI-Assisted Interactive Resource Center is an ambitious project aimed at revolutionizing the user engagement experience of Immigration, Refugees and Citizenship Canada (IRCC) using Retrieval-Augmented Generation (RAG). At the core of this project is a Generative AI framework that facilitates a dynamic chat environment, enabling users to pose queries and receive precise, informative responses sourced from IRCC's extensive documentation.
The idea behind RAG applications is to provide LLMs with additional context at query time for answering the user’s question.
- When a user asks the support agent a question, the question first goes through an embedding model to calculate its vector representation.
- The next step is to find the most relevant nodes in the database by comparing the cosine similarity of the embedding values of the user’s question and the documents in the database.
- Once the relevant nodes are identified using vector search, the application is designed to retrieve additional information from the nodes themselves and also by traversing the relationships in the graph.
- Finally, the context information from the database is combined with the user question and additional instructions into a prompt that is passed to an LLM to generate the final answer, which is then sent to the user.
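The steps above can be sketched end to end in a few lines. Everything below is a stand-in for the real components (the embedding model, the QdrantDB vector store, and the LLM call): the toy bag-of-words embedding, the in-memory `DOCUMENTS` list, and the function names are all illustrative, not the engine's actual code.

```python
import math

# Toy embedding: bag-of-words over a fixed vocabulary. A real system would
# call an embedding model; this stand-in only illustrates the flow.
VOCAB = ["permanent", "resident", "status", "visa", "work", "study"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# In-memory stand-in for the vector database of document nodes.
DOCUMENTS = [
    "To keep permanent resident status you must be in Canada 730 days in five years.",
    "A study permit lets you study at designated learning institutions.",
]

def retrieve(question: str, top_k: int = 1) -> list[str]:
    # Vector search: rank documents by cosine similarity to the question.
    q_vec = embed(question)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine_similarity(q_vec, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question: str) -> str:
    # Combine retrieved context, instructions, and the question into one prompt,
    # which would then be passed to the LLM to generate the final answer.
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How do I keep my permanent resident status?")
```

A production pipeline would additionally traverse graph relationships from the retrieved nodes before assembling the prompt, as described above.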
Intelligent Query Handling:
- Leveraging cutting-edge Generative AI technology, the system comprehends user queries, regardless of the language they are posed in.
- The AI agent navigates through the IRCC's documentation, extracting and generating responses that succinctly address user inquiries.
- Multilingual support ensures inclusivity, allowing users from diverse linguistic backgrounds to interact with the system effortlessly.
Document Retrieval and Analysis:
- A robust document retrieval system that swiftly traverses through IRCC’s repository to fetch relevant information.
- Advanced natural language processing capabilities enable the system to understand and generate accurate responses from the documentation.
Interactive Dashboard:
- A comprehensive dashboard that visually represents critical metrics such as the volume of Permanent Residency (PR) applications, distribution of applicants by age, nationality, and other demographics.
- Real-time updates on the dashboard provide a snapshot of ongoing immigration trends and other pertinent activities.
Web Interface:
- A user-friendly web interface serving as the primary interaction point for users.
- The website will host the interactive chat environment and the dashboard, providing a one-stop solution for users seeking information and insights regarding the IRCC processes.
Using `Scrapy` for web scraping involves several steps. Here's a simplified guide to help you get started:
- You'll need to have Python installed on your machine.
- Install Scrapy using pip:
$ pip install scrapy
$ pip install html2text
- In your terminal, navigate to the directory where you want to create your project.
- Run the following command, replacing `myproject` with the name of your project:
$ scrapy startproject ircc
- A spider is a script that tells Scrapy what URLs to scrape and how to extract data from the pages.
- Inside your project directory, create a spider using the following command, replacing `myspider` and `example.com` with your desired spider name and target URL:
$ scrapy genspider ircc canada.ca
- Open the generated spider file (located in the `ircc/spiders` directory).
- Define the initial URLs, the parsing logic, and how to follow links.
- You can find some examples in the scripts directory of this repository.
- In the terminal, navigate to your project directory.
- Run the spider using the following command, replacing `myspider` with the name of your spider:
$ scrapy crawl canada_ca_md
Note: The scripts folder contains a script to convert the links to markdown inline links. It can be run as follows:
$ ./replace_links.sh /Users/mohsen/code/Personal/ircc-dump-scrapy/canadascraper/output-md
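To illustrate the kind of transformation this performs, here is a Python sketch that rewrites reference-style markdown links as inline links. The exact rules of `replace_links.sh` are an assumption; this only conveys the idea.

```python
import re

def inline_links(markdown: str) -> str:
    """Convert reference-style links ([text][id] plus [id]: url definitions)
    into inline links ([text](url)). Illustrative only; the actual behavior
    of replace_links.sh may differ."""
    # Collect link definitions like "[1]: https://example.com".
    defs = dict(re.findall(r"^\[([^\]]+)\]:\s*(\S+)\s*$", markdown, flags=re.MULTILINE))

    # Replace [text][id] with [text](url) where the id is defined.
    def repl(m):
        text, ref = m.group(1), m.group(2)
        return f"[{text}]({defs[ref]})" if ref in defs else m.group(0)

    converted = re.sub(r"\[([^\]]+)\]\[([^\]]+)\]", repl, markdown)
    # Drop the now-unused definition lines.
    return re.sub(r"^\[([^\]]+)\]:\s*\S+\s*$", "", converted, flags=re.MULTILINE).strip()
```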
To run the project locally, there are a few prerequisites:
- The Rust toolchain
- The ONNX Runtime (downloaded and installed automatically when building the project)
- Docker, to run the QdrantDB instance
- `make`, for easy automation and development workflow
Once the above requirements are satisfied, you can run the project as described below.
The `ircc-ai` engine (oracle), embed, and bot can also be run locally via Docker containers that include all the necessary dependencies.
Note: Please ensure the scraped data is available in the `content` directory. The `content` directory should contain the `output-md/en` directory from the `scrapy` project.
The data can also be passed in via the `CONTENT_PATH_HOST` environment variable, but for the sake of simplicity we will assume the data is available in the `content` directory.
To build the docker images run:
$ make -f Makefile.local local-image
To create the embeddings, run the following command. It will traverse the directory specified by `CONTENT_PATH_HOST`, creating embeddings for each file. These embeddings will then be stored in `QdrantDB`.
$ make -f Makefile.local start-embed
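Conceptually, the embed step walks the content directory, splits each markdown file into chunks, embeds each chunk, and upserts the vectors into QdrantDB. A stdlib-only sketch of the traversal and chunking follows; the fixed chunk size and the returned `(file, chunk)` shape are assumptions, not the engine's actual logic.

```python
from pathlib import Path

CHUNK_SIZE = 1000  # characters per chunk; the real engine's chunking may differ

def chunk_text(text: str, size: int = CHUNK_SIZE) -> list[str]:
    # Naive fixed-size chunking; real pipelines often split on headings or paragraphs.
    return [text[i:i + size] for i in range(0, len(text), size)]

def collect_chunks(content_dir: str) -> list[tuple[str, str]]:
    """Traverse the content directory and return (file, chunk) pairs.
    Each chunk would then be embedded and upserted into QdrantDB."""
    pairs = []
    for path in sorted(Path(content_dir).rglob("*.md")):
        for chunk in chunk_text(path.read_text(encoding="utf-8")):
            pairs.append((str(path), chunk))
    return pairs
```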
To start the engine, run the following command. It will start the engine and expose it on port `3000`.
$ make -f Makefile.local start-oracle
The database dashboard will be accessible at `localhost:6333/dashboard`; the project communicates with the DB on port `6334`.
Note: Since the service returns responses as SSEs, a REST client like Postman is recommended. The Postman web client does not support requests to `localhost`.
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Redirects to the configured redirect URL. |
| `/query` | POST | Perform a query on the API with a specific question. |
The parameters are passed as a JSON object in the request body:
- `query` (string, required): The question or query you want to ask.
The request is processed by the server and responses are sent as Server-Sent Events (SSE). The event stream will contain events with optional data.
$ curl --location 'localhost:3000/query' \
--header 'Content-Type: application/json' \
--data '{
"query": "How long must I stay in Canada to keep my permanent resident status?"
}'
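A client consuming this stream has to split it on blank lines and read the `data:` fields of each event. A minimal parser for such a stream is sketched below; the sample payloads are made up for illustration and are not the engine's actual output.

```python
def parse_sse(stream: str) -> list[str]:
    """Collect the data payloads from a Server-Sent Events stream.
    Events are separated by blank lines; each may carry one or more
    'data:' lines (multi-line data is joined with newlines)."""
    payloads = []
    for event in stream.split("\n\n"):
        data_lines = [
            line[len("data:"):].lstrip()
            for line in event.splitlines()
            if line.startswith("data:")
        ]
        if data_lines:
            payloads.append("\n".join(data_lines))
    return payloads

# Example stream, shaped like SSE but with made-up payloads:
sample = "event: token\ndata: You must be physically present\n\ndata: for 730 days.\n\n"
tokens = parse_sse(sample)
```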
To start the Telegram bot, run the following command. It will start the bot and begin listening for messages.
$ make -f Makefile.local start-bot