This Python script scrapes documentation from a GitBook site and converts the extracted HTML content into Markdown format. It can be used to back up documentation or convert it for offline use in Markdown-based repositories or projects.
- Scrapes all accessible pages of a GitBook website.
- Converts the HTML content into Markdown using the html2text library (a minimal conversion sketch is shown below).
- Handles links between pages and includes them in the final Markdown document.
- Saves the final Markdown content to a file.
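As a rough, standalone illustration of the html2text conversion step (not the scraper itself), the snippet below fetches a single page and converts its HTML to Markdown. The URL is just the placeholder example used throughout this README, and the converter settings are assumptions, not necessarily what the script uses:

```python
import requests
import html2text

# Fetch one page; the URL is the placeholder example from this README.
url = "https://docs-one.example.xyz/"
response = requests.get(url, timeout=30)
response.raise_for_status()

# Configure html2text and convert the page's HTML to Markdown.
converter = html2text.HTML2Text()
converter.ignore_links = False  # keep hyperlinks in the output
converter.body_width = 0        # do not hard-wrap lines

markdown = converter.handle(response.text)
print(markdown[:500])  # preview the first 500 characters
```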
Ensure that you have the following installed:
- Python 3.x
- Required Python packages: requests, beautifulsoup4, html2text
You can install the necessary packages using:
```bash
pip install requests beautifulsoup4 html2text
```
To use the script, follow these steps:
```bash
git clone https://github.com/A2-Security/GitBook-Scraper
cd GitBook-Scraper
```
- Replace the gitbook_url variable in the script with the GitBook URL you want to scrape (the full URL, including https://; a configuration sketch follows these steps):
```python
gitbook_url = 'https://docs-one.example.xyz/'  # Example GitBook URL
```
- You can also change the output file name by modifying the output_file variable:
```python
output_file = 'documentation.md'  # Desired output file name
```
- Execute the script to start scraping the GitBook:
```bash
python gitbook_scraper.py
```
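For reference, the configuration near the top of gitbook_scraper.py might look like the sketch below. The two variable names come from this README; the scheme check is an illustrative addition (not necessarily present in the actual script) to ensure requests receives a full URL:

```python
from urllib.parse import urlparse

gitbook_url = 'https://docs-one.example.xyz/'  # Example GitBook URL
output_file = 'documentation.md'               # Desired output file name

# requests needs a full URL, so prepend a scheme if one is missing.
if not urlparse(gitbook_url).scheme:
    gitbook_url = 'https://' + gitbook_url
```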
The script will:
- Fetch the main page of the GitBook.
- Extract and follow all internal links.
- Scrape the content of each page.
- Convert the content to Markdown format.
- Save the output to a Markdown file.
Once the script has finished, the Markdown file will be available at the specified output path (documentation.md by default).
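A minimal sketch of that flow is shown below. It is illustrative only, not the actual implementation: the function name, queueing strategy, and page separator are assumptions. It fetches the start page, follows same-domain links found in anchor tags, converts each page with html2text, and writes everything to one Markdown file.

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
import html2text

def scrape_gitbook(start_url, output_path):
    """Crawl same-domain pages starting from start_url and save them as Markdown."""
    converter = html2text.HTML2Text()
    converter.body_width = 0  # keep lines unwrapped

    domain = urlparse(start_url).netloc
    to_visit, seen, sections = [start_url], set(), []

    while to_visit:
        url = to_visit.pop(0)
        if url in seen:
            continue
        seen.add(url)

        response = requests.get(url, timeout=30)
        if response.status_code != 200:
            continue

        soup = BeautifulSoup(response.text, 'html.parser')
        sections.append(converter.handle(str(soup)))

        # Queue internal links only (same domain), dropping URL fragments.
        for anchor in soup.find_all('a', href=True):
            link = urljoin(url, anchor['href']).split('#')[0]
            if urlparse(link).netloc == domain and link not in seen:
                to_visit.append(link)

    with open(output_path, 'w', encoding='utf-8') as fh:
        fh.write('\n\n---\n\n'.join(sections))

scrape_gitbook('https://docs-one.example.xyz/', 'documentation.md')
```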
For example, to scrape the documentation from a GitBook site such as https://docs-one.example.xyz/, you would:
- Update the gitbook_url variable in the script to this URL.
- Run the script; it will generate a file named documentation.md containing the full scraped content.