Text Analysis of Entrepreneurial Culture & Isomorphism in Korean Universities

Overview

This research project analyzes the relationship between entrepreneurial culture, institutional isomorphism, and organizational performance in Korean universities using machine learning-based text analysis. It applies natural language processing and topic modeling to university reports.

Research Process

Text Preprocessing
- Special character removal and text cleaning
- Morphological analysis using Mecab
- Custom dictionary enhancement for academic terms
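
The cleaning step above can be sketched as follows. This is a minimal illustration, not the project's actual `clean_text`: the function name `clean_text_sketch` and the set of characters kept (Hangul syllables, Latin letters, digits, whitespace) are assumptions.

```python
import re

# Assumption: keep Hangul syllables, Latin letters, digits and whitespace;
# every other character is treated as a "special character".
_DISALLOWED = re.compile(r"[^가-힣A-Za-z0-9\s]")
_WHITESPACE = re.compile(r"\s+")


def clean_text_sketch(text: str) -> str:
    """Replace special characters with spaces, then collapse whitespace."""
    text = _DISALLOWED.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


cleaned = clean_text_sketch("산학협력(LINC)  성과보고서!!\n2023년")
print(cleaned)
```

Replacing with a space rather than deleting keeps words on either side of a bracket or punctuation mark from being glued together before morphological analysis.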

Topic Analysis
- Co-occurrence analysis of key terms
- Advanced topic modeling using KoBERT+CTM
- Similarity analysis using KR-SBERT
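
As an illustration of the co-occurrence step, the sketch below counts how often two terms appear in the same document. Counting at document level (rather than within a sliding window) is an assumption, and `cooccurrence_counts` is a hypothetical name, not a function from this repository.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tokenized_docs):
    """Count the number of documents in which each unordered term pair co-occurs."""
    counts = Counter()
    for tokens in tokenized_docs:
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(tokens)), 2):
            counts[pair] += 1
    return counts


docs = [
    ["창업", "교육", "지원"],
    ["창업", "지원", "산학협력"],
    ["교육", "산학협력"],
]
pairs = cooccurrence_counts(docs)
print(pairs.most_common(3))
```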

Directory Structure

├── data/               
│   ├── raw/           # Original university reports (72 universities)
│   └── processed/     # Preprocessed text data
├── notebooks/         
│   ├── preprocessing.ipynb  # Text preprocessing steps
│   └── analysis.ipynb      # Topic modeling and analysis
└── src/               
    ├── preprocess.py  # Text preprocessing utilities
    └── modeling.py    # Topic modeling implementation

Requirements

numpy>=1.19.2
pandas>=1.2.0
konlpy>=0.6.0
torch>=1.9.0
transformers>=4.11.0
contextualized-topic-models>=2.2.0
sentence-transformers>=2.2.0

Installation

Install MeCab and related dependencies:

apt-get update
apt-get install -y git make curl xz-utils file
apt-get install -y mecab libmecab-dev mecab-ipadic-utf8 python3-dev

Install Python packages:

pip install -r requirements.txt

Install Korean language resources:

git clone https://bitbucket.org/eunjeon/mecab-ko.git
cd mecab-ko && ./autogen.sh && ./configure && make && make install
cd ..

git clone https://bitbucket.org/eunjeon/mecab-ko-dic.git
cd mecab-ko-dic && ./autogen.sh && ./configure && make && make install
cd ..

Usage

1. Text Preprocessing

from konlpy.tag import Mecab
from src.preprocess import clean_text, preprocess_documents

# Initialize MeCab tokenizer
mecab = Mecab()

# Process documents
preprocessed_docs = preprocess_documents(raw_documents, mecab)
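
The implementation of `preprocess_documents` is not shown here. One plausible shape, assuming it reduces each document to its nouns, is sketched below. `preprocess_documents_sketch`, the `min_length` filter, and `WhitespaceTagger` are hypothetical; the only konlpy behaviour relied on is that taggers such as `Mecab` provide a `nouns(text)` method returning a list of strings.

```python
def preprocess_documents_sketch(raw_documents, tagger, min_length=2):
    """Reduce each document to a space-joined string of noun tokens."""
    processed = []
    for doc in raw_documents:
        # konlpy taggers such as Mecab expose nouns(text) -> list of str.
        nouns = tagger.nouns(doc)
        # Assumption: drop very short tokens, which are often particles or noise.
        kept = [tok for tok in nouns if len(tok) >= min_length]
        processed.append(" ".join(kept))
    return processed


class WhitespaceTagger:
    """Stand-in with the same nouns() method, so the sketch runs without MeCab."""

    def nouns(self, text):
        return text.split()


result = preprocess_documents_sketch(["대학 창업 교육 및 지원"], WhitespaceTagger())
print(result)
```

In real use, a `Mecab()` instance would be passed in place of `WhitespaceTagger()`.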

2. Topic Modeling

from src.modeling import train_ctm_model

# Train CTM model
ctm_model = train_ctm_model(preprocessed_docs, n_components=5)

# Get topics
topics = ctm_model.get_topics(10)

3. Similarity Analysis

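from sentence_transformers import SentenceTransformer

# Load SBERT model
model = SentenceTransformer('snunlp/KR-SBERT-V40K-klueNLI-augSTS')

# Compute similarities (compute_similarities is a helper defined in this project's code)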
similarities = compute_similarities(documents, reference_topics, model)
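
The body of `compute_similarities` is not given above. A minimal sketch of what such a helper might do, assuming `documents` and `reference_topics` are lists of strings and the score is cosine similarity between their embeddings, is shown below. `compute_similarities_sketch`, `cosine`, and `ToyEncoder` are hypothetical names; the only model method used is `encode`, which `SentenceTransformer` provides.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compute_similarities_sketch(documents, reference_topics, model):
    """Return a len(documents) x len(reference_topics) matrix of cosine scores."""
    doc_vecs = model.encode(documents)
    ref_vecs = model.encode(reference_topics)
    return [[cosine(d, r) for r in ref_vecs] for d in doc_vecs]


class ToyEncoder:
    """Stand-in for SentenceTransformer: counts two keywords per text."""

    def encode(self, texts):
        return [[float(t.count("창업")), float(t.count("교육"))] for t in texts]


scores = compute_similarities_sketch(
    ["창업 창업 지원", "교육 과정"], ["창업", "교육"], ToyEncoder()
)
print(scores)
```

Passing the loaded KR-SBERT `model` instead of `ToyEncoder()` would yield scores based on sentence embeddings rather than keyword counts.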

Contributing

This is a research project conducted at Sogang University's Graduate School of Management of Technology. For questions or contributions, please contact:

Research Advisor: Prof. Kyootai Lee
Graduate School of MOT, Sogang University

License

This project is licensed under the MIT License - see the LICENSE file for details.
