DiscSim is a simulation tool developed for the Center for Effective Governance of Indian States (CEGIS), an organization dedicated to helping state governments in India achieve better development outcomes.
An important goal of CEGIS is to improve the quality of administrative data collected by state governments. One approach is to re-sample a subset of the data and measure deviations from the original samples collected. These deviations are quantified as discrepancy scores, and significant scores are flagged for third-party intervention.
Often, it's unclear which re-sampling strategy yields the most accurate and reliable discrepancy scores. The goal of this project is to create a simulator that predicts discrepancy scores and assesses their statistical accuracy across different re-sampling strategies.
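To make the idea concrete, here is a minimal sketch of comparing re-sampling strategies by simulation. It is not DiscSim's actual algorithm: the function names, the binary indicator, and the definition of the discrepancy score (the share of re-sampled records that disagree with the original report) are all assumptions made for illustration.

```python
import random
import statistics


def simulate_discrepancy(n_records, n_resampled, error_rate, rng):
    """Return the share of re-sampled records that disagree with the original."""
    # True value of a binary indicator for each record (hypothetical data).
    truth = [rng.random() < 0.5 for _ in range(n_records)]
    # Assumption: the original collector misreports each record independently
    # with probability error_rate.
    reported = [(not t) if rng.random() < error_rate else t for t in truth]
    # Assumption: the re-sampler revisits a simple random subset and always
    # records the true value.
    revisited = rng.sample(range(n_records), n_resampled)
    mismatches = sum(reported[i] != truth[i] for i in revisited)
    return mismatches / n_resampled


def compare_strategies(sample_sizes, error_rate, n_trials=200, seed=0):
    """Estimate the mean and spread of the score for each re-sample size."""
    rng = random.Random(seed)
    results = {}
    for size in sample_sizes:
        scores = [simulate_discrepancy(1000, size, error_rate, rng)
                  for _ in range(n_trials)]
        results[size] = (statistics.mean(scores), statistics.stdev(scores))
    return results


if __name__ == "__main__":
    for size, (mean, spread) in compare_strategies([20, 100, 500], 0.1).items():
        print(f"re-sample size {size}: mean={mean:.3f}, sd={spread:.3f}")
```

Under these assumptions, the spread of the score across trials gives a rough sense of how reliable each re-sample size is; a real strategy comparison would also vary how the subset is chosen, not just how large it is.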
DiscSim comprises a backend API built with FastAPI and a frontend interface developed using Streamlit. The project utilizes PostgreSQL for database management and is containerized with Docker for easy deployment.
Clone the repository and change into the project directory:
git clone https://github.com/cegis-org/discsim.git
cd discsim
We recommend using a virtual environment to manage dependencies. You can use either venv or conda.
Using venv:
- Create the virtual environment:
  python3 -m venv venv
- Activate the virtual environment:
  - On macOS/Linux:
    source venv/bin/activate
  - On Windows:
    venv\Scripts\activate

Using conda:
- Create the environment:
  conda create -n discsim-env python=3.11
- Activate the environment:
  conda activate discsim-env
With the virtual environment activated, install the required packages:
pip install --upgrade pip
pip install -r requirements.txt
On Ubuntu/Debian, install PostgreSQL using apt:
sudo apt update
sudo apt install postgresql postgresql-contrib
Install PostgreSQL using Homebrew:
brew update
brew install postgresql
brew services start postgresql
Download and install PostgreSQL from the official website.
- Start the PostgreSQL service (if not already running):
  - On Ubuntu/Linux:
    sudo service postgresql start
- Switch to the postgres user:
  sudo -u postgres psql
- Create the database and user:
  -- Create the database
  CREATE DATABASE discsim;
  -- Create the user with a password
  CREATE USER "user" WITH PASSWORD 'password';
  -- Grant privileges on the database
  GRANT ALL PRIVILEGES ON DATABASE discsim TO "user";
- Grant privileges on the public schema:
  -- Connect to the discsim database so the grants apply to its public schema
  \c discsim
  -- Change ownership of the public schema
  ALTER SCHEMA public OWNER TO "user";
  -- Grant all privileges on the schema
  GRANT ALL ON SCHEMA public TO "user";
  -- Grant privileges on all tables, sequences, and functions
  GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO "user";
  GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO "user";
  GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA public TO "user";
- Verify the privileges (optional):
  -- List all schemas and their owners
  \dn+
  -- Check privileges on the public schema
  SELECT nspname, nspowner,
         has_schema_privilege('user', nspname, 'CREATE, USAGE') AS has_privs
  FROM pg_namespace
  WHERE nspname = 'public';
- Exit the psql shell:
  \q
Create a .env file in the project's root directory and add the following content:
# API configuration
API_BASE_URL="http://localhost:8000"
# PostgreSQL configuration
POSTGRES_USER="user"
POSTGRES_PASSWORD="password"
POSTGRES_DB="discsim"
# Database URL
DATABASE_URL="postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@localhost:5432/${POSTGRES_DB}"
# Log level
LOG_LEVEL=info
Ensure that the DATABASE_URL matches your local PostgreSQL configuration.
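One way to check the value is to expand the `${...}` references yourself and look at the resulting parts. The sketch below uses only the Python standard library; `parse_env` and `describe_database_url` are hypothetical helpers written for this example, and they are not how DiscSim itself loads the .env file.

```python
from string import Template
from urllib.parse import urlsplit


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and comments, expanding ${VAR}."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip().strip('"')
        # Expand references to variables defined earlier in the file.
        values[key.strip()] = Template(value).safe_substitute(values)
    return values


def describe_database_url(url):
    """Split a PostgreSQL URL into the parts worth checking by eye."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# Sample content mirroring the .env file shown above.
EXAMPLE = '''
POSTGRES_USER="user"
POSTGRES_PASSWORD="password"
POSTGRES_DB="discsim"
DATABASE_URL="postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@localhost:5432/${POSTGRES_DB}"
'''

if __name__ == "__main__":
    env = parse_env(EXAMPLE)
    print(describe_database_url(env["DATABASE_URL"]))
```

The user, host, port, and database printed here should match the database and user you created in psql.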
- Activate your virtual environment if not already active:
  - On macOS/Linux:
    source venv/bin/activate
  - On Windows:
    venv\Scripts\activate
- Run the API server:
  python api/run.py
  The API server will start on http://localhost:8000.
- Open a new terminal window.
- Activate your virtual environment.
- Run the Streamlit app:
  streamlit run dashboard/app.py --server.port=8501
  The frontend will be accessible at http://localhost:8501.
- Frontend Interface: Open your web browser and navigate to http://localhost:8501 to interact with the application.
- API Documentation: Access the API docs at http://localhost:8000/docs.
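If you want to confirm that both services are running before opening a browser, a small check like the following may help. It is a sketch using only the standard library; `SERVICES` and `is_reachable` are names introduced for this example, and the URLs are the local defaults listed above.

```python
import urllib.error
import urllib.request

# Local default URLs from this README; adjust if you changed the ports.
SERVICES = {
    "API docs": "http://localhost:8000/docs",
    "Frontend": "http://localhost:8501",
}


def is_reachable(url, timeout=2.0):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        state = "up" if is_reachable(url) else "not reachable"
        print(f"{name} ({url}): {state}")
```

A "not reachable" result usually means the corresponding server has not been started yet or is listening on a different port.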
If you prefer to use Docker, you can run the entire application stack using Docker Compose.
Prerequisites:
- Docker
- Docker Compose
- Build and start the containers:
  docker-compose build
  docker-compose up
- Access the services:
  - API Server: http://localhost:8000
  - Frontend: http://localhost:8501
Note: The PostgreSQL database runs inside Docker and is accessible to the other containers.
We welcome contributions! If you'd like to contribute to DiscSim:
- Fork the repository.
- Create a new branch for your feature or bug fix:
  git checkout -b feature-name
- Commit your changes and push to your fork.
- Submit a pull request.
For major changes, please open an issue first to discuss your ideas.
MIT License.
Thank you for checking out DiscSim! We hope this tool aids in enhancing the quality of administrative data and contributes to better governance and development outcomes.