Piazza AI: Wikipedia Updater Framework

Welcome to the Piazza Updater, a framework developed by Piazza AI that pairs a Weaviate vector database with real-time data updates. This repository is an open-source demo showing how the framework processes Wikipedia data, fetches new information from the web in real time, and updates the vector database. The goal is to simplify Large Language Model (LLM) deployments by using Retrieval-Augmented Generation (RAG).

Piazza Workflow

How to Run the Demo

Prerequisites

Docker: Ensure Docker is installed on your machine.
API Keys: Create a .env file containing the API keys required by the other modules you use (e.g., OpenAI, Anthropic). Refer to the Verba repository for details.
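
Before starting the containers, it can help to confirm that the .env file actually defines the keys you expect. The following is a minimal sketch, not part of this repository: the key names `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are assumptions based on common SDK conventions, and the helpers `parse_env` and `missing_keys` are hypothetical. Check the Verba repository for the variable names it really reads.

```python
from pathlib import Path

# Assumed key names; adjust to the modules you actually enable.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    path = Path(".env")
    env = parse_env(path.read_text()) if path.exists() else {}
    print("Missing keys:", missing_keys(env) or "none")
```

Only the keys for the providers you plan to use need to be present, so pass a shorter `required` tuple if, for example, you only use one provider.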

Steps to Run

Clone this repository and install dependencies:
```
git clone https://github.com/piazza-tech/Piazza-Updater.git
cd Piazza-Updater
pip install -r requirements.txt
```

Provide execution permissions for the start.sh script:
```
chmod +x start.sh
```
Run the framework:
```
./start.sh
```
Open your browser and navigate to http://localhost:8000.
In the Verba web interface:
- Choose Docker Deployment.
- Select Documents to watch Wikipedia data being processed and updated in real time.
Once the initial Wikipedia dumps are processed:
- The script begins searching the internet for new data.
- Chat with the LLM using up-to-date information!

Configuration Options

Development Mode:
- Omit the PRODUCTION variable from .env for a lightweight demo (processes a small subset of Wikipedia).
- Use docker-compose-s.yml for minimal resource usage.
Production Mode:
- Add the PRODUCTION variable in .env to process the entire Wikipedia dataset.
- Use docker-compose.yml for full-scale deployment (requires more time and resources).
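
The mapping between the two modes and the two compose files can be sketched as below. This is an illustration only: how start.sh actually picks a compose file is not documented here, and the helpers `select_compose_file` and `compose_command` are hypothetical. The sketch assumes that the mere presence of PRODUCTION in the loaded variables selects production mode.

```python
def select_compose_file(env):
    """Pick the compose file: full deployment only when PRODUCTION is set.

    `env` is any mapping of the variables loaded from .env.
    """
    if "PRODUCTION" in env:
        return "docker-compose.yml"    # entire Wikipedia dataset
    return "docker-compose-s.yml"      # small subset, minimal resources


def compose_command(env):
    """Build the docker compose invocation for the selected mode."""
    return ["docker", "compose", "-f", select_compose_file(env), "up"]


if __name__ == "__main__":
    print(" ".join(compose_command({})))
    print(" ".join(compose_command({"PRODUCTION": "1"})))
```

The returned list can be passed to `subprocess.run` if you want to launch the stack from Python rather than from start.sh.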

Technologies Used

Weaviate: Vector database for efficient semantic search and data retrieval.
LLMs: Llama 3.2, served through Ollama, for natural language understanding.
RAG Framework: Combines vectorized data with real-time search to enhance LLM performance.
Verba: Web app for user interaction and deployment.
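
To make the retrieval step of RAG concrete, here is a toy sketch in plain Python. It is not this framework's implementation and does not call Weaviate: an in-memory list stands in for the vector database, the 3-dimensional vectors are made-up embeddings, and `cosine`, `retrieve`, and `build_prompt` are hypothetical helpers. A real deployment would obtain embeddings from a model and query Weaviate instead.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item["vector"]),
                    reverse=True)
    return [item["text"] for item in ranked[:k]]


def build_prompt(question, passages):
    """Prepend the retrieved passages to the question as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy store with made-up embeddings (illustration only).
STORE = [
    {"text": "Paris is the capital of France.", "vector": [0.9, 0.1, 0.0]},
    {"text": "The Nile flows through Egypt.", "vector": [0.0, 0.2, 0.9]},
    {"text": "France borders Spain.", "vector": [0.7, 0.3, 0.1]},
]

if __name__ == "__main__":
    passages = retrieve([1.0, 0.0, 0.0], STORE, k=2)
    print(build_prompt("What is the capital of France?", passages))
```

When the store is refreshed with newly fetched documents, the same retrieval call picks them up, which is what lets the LLM answer from current data without retraining.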

Use Cases Beyond Wikipedia

While this demo focuses on Wikipedia, the Piazza Updater framework is highly adaptable:

Integrate with any database or website.
Fetch and process real-time internet data for various domains, such as:
- News websites
- E-commerce platforms
- Scientific research databases
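
One way to picture this adaptability is a small source interface that any fetcher can implement. The sketch below is hypothetical and does not reflect the framework's actual classes: `Document`, `Source`, `StaticNewsSource`, and `update_store` are illustrative names, and a plain dict stands in for the vector database.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass
class Document:
    doc_id: str
    text: str


class Source(Protocol):
    """Anything that can yield documents: a news site, a shop, a paper index."""

    def fetch(self) -> Iterable[Document]: ...


class StaticNewsSource:
    """Stand-in for a real news fetcher; returns fixed documents."""

    def __init__(self, docs):
        self._docs = list(docs)

    def fetch(self):
        return iter(self._docs)


def update_store(store: dict, source: Source) -> int:
    """Upsert every fetched document into store; return how many changed."""
    changed = 0
    for doc in source.fetch():
        if store.get(doc.doc_id) != doc.text:
            store[doc.doc_id] = doc.text
            changed += 1
    return changed


if __name__ == "__main__":
    store = {}
    source = StaticNewsSource([Document("a", "first version")])
    print(update_store(store, source), "document(s) updated")
```

Supporting a new domain then means writing another class with a `fetch` method; the update loop stays the same.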

License

This project is open-source under the MIT License.

Contributing

We welcome contributions to enhance this demo! Feel free to fork the repository, make changes, and submit pull requests.

For questions or support, reach out to Piazza AI or visit the Verba repository for additional deployment details.

Start exploring the future of real-time LLMs today! 🚀
