EPUB 3 support improvements (#21)
* Parsing EPUB 3 navigation document
* Parsing linear reading order
* Changes to WpfDemo to use the new version of the library
* General refactoring
* Documentation update
vers-one authored Apr 8, 2019
1 parent edc173a commit 0b3bcb8
Showing 96 changed files with 2,449 additions and 890 deletions.
53 changes: 37 additions & 16 deletions README.md
@@ -1,10 +1,16 @@
# EpubReader
.NET library for reading EPUB files.

Supports .NET Framework >= 4.5, .NET Core >= 1.0, and .NET Standard >= 1.3.
Supports .NET Framework >= 4.6, .NET Core >= 1.0, and .NET Standard >= 1.3.

Supports EPUB 2 (2.0, 2.0.1) and EPUB 3 (3.0, 3.0.1, 3.1).

[Download](#download-latest-stable-release) | [WPF & .NET Core demo apps](#demo-apps)

## Migration from 2.x

[How to migrate from 2.x to 3.x](https://github.com/vers-one/EpubReader/wiki/Migrating-from-2.x-to-3.x)

## Example
```csharp
// Opens a book and reads all of its content into memory
@@ -32,19 +38,25 @@ if (coverImageContent != null)
}
}

// CHAPTERS
// TABLE OF CONTENTS
// Enumerating chapters
foreach (EpubChapter chapter in epubBook.Chapters)
foreach (EpubNavigationItem chapter in epubBook.Navigation)
{
// Title of chapter
string chapterTitle = chapter.Title;

// HTML content of current chapter
string chapterHtmlContent = chapter.HtmlContent;

// Nested chapters
List<EpubChapter> subChapters = chapter.SubChapters;
List<EpubNavigationItem> subChapters = chapter.NestedItems;
}

// READING ORDER
// Enumerating the whole text content of the book in the order of reading
foreach (EpubTextContentFile textContentFile in epubBook.ReadingOrder)
{
// HTML of current text content file
string htmlContent = textContentFile.Content;
}


@@ -116,32 +128,41 @@ foreach (EpubMetadataContributor contributor in package.Metadata.Contributors)
string contributorRole = contributor.Role;
}

// EPUB NCX data
EpubNavigation navigation = epubBook.Schema.Navigation;
// EPUB 2 NCX data
Epub2Ncx epub2Ncx = epubBook.Schema.Epub2Ncx;

// Enumerating NCX metadata
foreach (EpubNavigationHeadMeta meta in navigation.Head)
// Enumerating EPUB 2 NCX metadata
foreach (Epub2NcxHeadMeta meta in epub2Ncx.Head)
{
string metadataItemName = meta.Name;
string metadataItemContent = meta.Content;
}

// EPUB 3 navigation
Epub3NavDocument epub3NavDocument = epubBook.Schema.Epub3NavDocument;

// Accessing structural semantics data of the head item
StructuralSemanticsProperty? ssp = epub3NavDocument.Navs.First().Type;
```

## More examples
[How to extract plain text from all chapters.](https://github.com/vers-one/EpubReader/tree/master/Source/VersOne.Epub.NetCoreDemo/ExtractPlainText.cs)

1. [How to extract the plain text of the whole book.](https://github.com/vers-one/EpubReader/tree/master/Source/VersOne.Epub.NetCoreDemo/ExtractPlainText.cs)
2. [How to extract the table of contents.](https://github.com/vers-one/EpubReader/tree/master/Source/VersOne.Epub.NetCoreDemo/PrintNavigation.cs)
3. [How to iterate over all EPUB files in a directory and collect some statistics.](https://github.com/vers-one/EpubReader/tree/master/Source/VersOne.Epub.NetCoreDemo/TestDirectory.cs)

## Download latest stable release
[Via NuGet package from nuget.org](https://www.nuget.org/packages/VersOne.Epub)

DLL file from GitHub: [for .NET Framework](https://github.com/vers-one/EpubReader/releases/download/v2.0.5/VersOne.Epub.Net45.zip) (26.9 KB) / [for .NET Core](https://github.com/vers-one/EpubReader/releases/download/v2.0.5/VersOne.Epub.NetCore.zip) (27.0 KB) / [for .NET Standard](https://github.com/vers-one/EpubReader/releases/download/v2.0.5/VersOne.Epub.NetStandard.zip) (27.0 KB)
DLL file from GitHub: [for .NET Framework](https://github.com/vers-one/EpubReader/releases/download/v3.0.0/VersOne.Epub.Net46.zip) (38.3 KB) / [for .NET Core](https://github.com/vers-one/EpubReader/releases/download/v3.0.0/VersOne.Epub.NetCore.zip) (38.4 KB) / [for .NET Standard](https://github.com/vers-one/EpubReader/releases/download/v3.0.0/VersOne.Epub.NetStandard.zip) (38.4 KB)

## Demo apps
[Download WPF demo app ](https://github.com/vers-one/EpubReader/releases/download/v2.0.5/WpfDemo.zip) (WpfDemo.zip, 409 KB)
[Download WPF demo app](https://github.com/vers-one/EpubReader/releases/download/v3.0.0/WpfDemo.zip) (WpfDemo.zip, 479 KB)

This .NET Framework application demonstrates how to open EPUB books and extract their content using the library.

HTML renderer used in this demo app may be a little bit slow for some books.
The HTML renderer used in this demo app may have difficulty rendering some books if their HTML structure is too complex.

[Download .NET Core console demo app](https://github.com/vers-one/EpubReader/releases/download/v2.0.5/NetCoreDemo.zip) (NetCoreDemo.zip, 17.6 MB)
[Download .NET Core console demo app](https://github.com/vers-one/EpubReader/releases/download/v3.0.0/NetCoreDemo.zip) (NetCoreDemo.zip, 17.6 MB)

This .NET Core console application demonstrates how to open EPUB books and retrieve their text content.
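The README changes above replace the old `Chapters` list with two views: `Navigation`, the table-of-contents tree, and `ReadingOrder`, the linear sequence of text content files. A rough, library-free sketch of why the two can disagree follows. It assumes a hypothetical book whose table of contents points into `chapter1.xhtml` twice via `#fragment` anchors and has no entry for the cover page; the file names are illustrative only, no VersOne.Epub types are used, and the snippet assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical reading order: every text content file, each listed exactly once.
List<string> readingOrder = new List<string> { "cover.xhtml", "chapter1.xhtml", "chapter2.xhtml" };

// Hypothetical flattened table of contents: hrefs may carry #fragment anchors,
// so several entries can point into the same file, and some files have no entry at all.
List<string> tocHrefs = new List<string> { "chapter1.xhtml#start", "chapter1.xhtml#part2", "chapter2.xhtml" };

// Drop the fragment to get the file each table-of-contents entry refers to.
List<string> tocFiles = tocHrefs.Select(href => href.Split('#')[0]).ToList();

// Files that a table-of-contents walk would visit more than once.
List<string> duplicated = tocFiles
    .GroupBy(file => file)
    .Where(group => group.Count() > 1)
    .Select(group => group.Key)
    .ToList();

// Files in the reading order that a table-of-contents walk would never reach.
List<string> missed = readingOrder.Except(tocFiles).ToList();

Console.WriteLine("Visited more than once via the TOC: " + string.Join(", ", duplicated));
Console.WriteLine("Never visited via the TOC: " + string.Join(", ", missed));
```

Under these assumptions, a table-of-contents walk visits `chapter1.xhtml` twice and never reaches `cover.xhtml`, while a reading-order walk visits every file exactly once. That may be one reason the plain-text example in this commit iterates `ReadingOrder` instead of the navigation tree.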
2 changes: 1 addition & 1 deletion Source/EpubReader.sln
@@ -7,7 +7,7 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "VersOne.Epub", "VersOne.Epu
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "VersOne.Epub.WpfDemo", "VersOne.Epub.WpfDemo\VersOne.Epub.WpfDemo.csproj", "{2C48D6FB-EC93-4B79-8E52-79B579B3C324}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "VersOne.Epub.NetCoreDemo", "VersOne.Epub.NetCoreDemo\VersOne.Epub.NetCoreDemo.csproj", "{A6ED4735-3D37-4E44-BEE4-218C6BBAC1BD}"
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "VersOne.Epub.NetCoreDemo", "VersOne.Epub.NetCoreDemo\VersOne.Epub.NetCoreDemo.csproj", "{A6ED4735-3D37-4E44-BEE4-218C6BBAC1BD}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
18 changes: 6 additions & 12 deletions Source/VersOne.Epub.NetCoreDemo/ExtractPlainText.cs
@@ -9,30 +9,24 @@ internal static class ExtractPlainText
public static void Run(string filePath)
{
EpubBook book = EpubReader.ReadBook(filePath);
foreach (EpubChapter chapter in book.Chapters)
foreach (EpubTextContentFile textContentFile in book.ReadingOrder)
{
PrintChapter(chapter);
PrintTextContentFile(textContentFile);
}
}

private static void PrintChapter(EpubChapter chapter)
private static void PrintTextContentFile(EpubTextContentFile textContentFile)
{
HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(chapter.HtmlContent);
htmlDocument.LoadHtml(textContentFile.Content);
StringBuilder sb = new StringBuilder();
foreach (HtmlNode node in htmlDocument.DocumentNode.SelectNodes("//text()"))
{
sb.AppendLine(node.InnerText.Trim());
}
string chapterTitle = chapter.Title;
string chapterText = sb.ToString();
Console.WriteLine("------------ ", chapterTitle, "------------ ");
Console.WriteLine(chapterText);
string contentText = sb.ToString();
Console.WriteLine(contentText);
Console.WriteLine();
foreach (EpubChapter subChapter in chapter.SubChapters)
{
PrintChapter(subChapter);
}
}
}
}
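`PrintTextContentFile` collects text nodes with HtmlAgilityPack's `//text()` XPath query. Because EPUB content documents use XHTML (XML syntax), LINQ to XML is one dependency-free alternative; the sketch below swaps in that technique and is not what the demo does. `ExtractText` is a hypothetical helper. It assumes well-formed XML (sloppy HTML, or a named entity such as `&nbsp;` with no DTD available, will make `XDocument.Parse` throw), it skips whitespace-only text nodes instead of emitting blank lines, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: returns the trimmed, non-empty text nodes of an XHTML
// document, one per line, in document order.
static string ExtractText(string xhtml)
{
    XDocument document = XDocument.Parse(xhtml);
    IEnumerable<string> lines = document
        .DescendantNodes()
        .OfType<XText>()                        // text (and CDATA) nodes only
        .Select(textNode => textNode.Value.Trim())
        .Where(line => line.Length > 0);        // drop whitespace-only nodes
    return string.Join(Environment.NewLine, lines);
}

string sample =
    "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>Ch 1</title></head>" +
    "<body><h1>Chapter 1</h1><p>  Down the rabbit hole. </p><p> </p></body></html>";

string text = ExtractText(sample);
Console.WriteLine(text);
```

By default, HtmlAgilityPack's `SelectNodes` returns `null` rather than an empty collection when nothing matches, so the demo's `foreach` would throw on a file with no text nodes; the LINQ query above simply yields an empty string in that case.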
26 changes: 0 additions & 26 deletions Source/VersOne.Epub.NetCoreDemo/ListChapters.cs

This file was deleted.

30 changes: 30 additions & 0 deletions Source/VersOne.Epub.NetCoreDemo/PrintNavigation.cs
@@ -0,0 +1,30 @@
using System;

namespace VersOne.Epub.NetCoreDemo
{
internal static class PrintNavigation
{
public static void Run(string filePath)
{
using (EpubBookRef bookRef = EpubReader.OpenBook(filePath))
{
Console.WriteLine("Navigation:");
foreach (EpubNavigationItemRef navigationItemRef in bookRef.GetNavigation())
{
PrintNavigationItem(navigationItemRef, 0);
}
}
Console.WriteLine();
}

private static void PrintNavigationItem(EpubNavigationItemRef navigationItemRef, int indentLevel)
{
Console.Write(new string(' ', indentLevel * 2));
Console.WriteLine(navigationItemRef.Title);
foreach (EpubNavigationItemRef nestedNavigationItemRef in navigationItemRef.NestedItems)
{
PrintNavigationItem(nestedNavigationItemRef, indentLevel + 1);
}
}
}
}
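`PrintNavigationItem` is a recursive depth-first (pre-order) walk: print an item indented by two spaces per nesting level, then recurse into its `NestedItems`. The same traversal can be written with an explicit stack. The sketch below is self-contained: a hypothetical `Dictionary` of titles stands in for `EpubNavigationItemRef` (titles are assumed unique only to keep it short), `FormatTree` is a hypothetical helper that returns lines instead of writing them, and C# 9 or later is assumed for top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a navigation tree: each title maps to its nested titles.
var nested = new Dictionary<string, List<string>>
{
    ["Part I"] = new List<string> { "Chapter 1", "Chapter 2" },
    ["Chapter 1"] = new List<string> { "Section 1.1" },
};
List<string> roots = new List<string> { "Part I", "Part II" };

// Depth-first, pre-order walk using an explicit stack instead of recursion.
static List<string> FormatTree(List<string> rootTitles, Dictionary<string, List<string>> childrenByTitle)
{
    var lines = new List<string>();
    var stack = new Stack<(string Title, int IndentLevel)>();

    // Push in reverse so that the first root is popped (and emitted) first.
    for (int i = rootTitles.Count - 1; i >= 0; i--)
    {
        stack.Push((rootTitles[i], 0));
    }

    while (stack.Count > 0)
    {
        var (title, indentLevel) = stack.Pop();
        lines.Add(new string(' ', indentLevel * 2) + title);

        if (childrenByTitle.TryGetValue(title, out List<string> children))
        {
            // Reverse order again so children come out in document order.
            for (int i = children.Count - 1; i >= 0; i--)
            {
                stack.Push((children[i], indentLevel + 1));
            }
        }
    }
    return lines;
}

List<string> output = FormatTree(roots, nested);
output.ForEach(Console.WriteLine);
```

For typical table-of-contents depths the recursive version in the demo is just as practical; the stack-based form mainly helps when the output needs to be collected and checked rather than printed directly.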
46 changes: 39 additions & 7 deletions Source/VersOne.Epub.NetCoreDemo/Program.cs
@@ -11,49 +11,81 @@ static void Main(string[] args)
while (input != 'Q')
{
Console.WriteLine("Select example:");
Console.WriteLine("1. List all chapters");
Console.WriteLine("2. Extract plain text from all chapters");
Console.WriteLine("1. Print book navigation tree (table of contents)");
Console.WriteLine("2. Extract plain text from the whole book");
Console.WriteLine("3. Test the library by reading all EPUB files from a directory");
Console.WriteLine("Q. Exit");
input = Char.ToUpper(Console.ReadKey(true).KeyChar);
Console.WriteLine();
switch (input)
{
case '1':
RunExample(ListChapters.Run);
RunFileExample(PrintNavigation.Run);
break;
case '2':
RunExample(ExtractPlainText.Run);
RunFileExample(ExtractPlainText.Run);
break;
case '3':
RunDirectoryExample(TestDirectory.Run);
break;
case 'Q':
break;
default:
Console.WriteLine("Input is not recognized. Please try again.");
Console.WriteLine();
break;
}
}
}

static void RunExample(Action<string> example)
private static void RunFileExample(Action<string> example)
{
Console.Write("Enter the path to the EPUB file: ");
string filePath = Console.ReadLine();
Console.WriteLine();
if (File.Exists(filePath) && Path.GetExtension(filePath).ToLower() == ".epub")
{
try
{
example(filePath);
Console.WriteLine();
}
catch (Exception ex)
{
Console.WriteLine("Exception was thrown:");
Console.WriteLine(ex.ToString());
Console.WriteLine();
}
}
else
{
Console.WriteLine("File doesn't exist.");
Console.WriteLine();
}
}
}

private static void RunDirectoryExample(Action<string> example)
{
Console.Write("Enter the path to the directory with EPUB files: ");
string directoryPath = Console.ReadLine();
Console.WriteLine();
if (Directory.Exists(directoryPath))
{
try
{
example(directoryPath);
}
catch (Exception ex)
{
Console.WriteLine("Exception was thrown:");
Console.WriteLine(ex.ToString());
Console.WriteLine();
}
}
else
{
Console.WriteLine("Directory doesn't exist.");
Console.WriteLine();
}
}
}
}
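`RunFileExample` accepts a path only if the file exists and `Path.GetExtension(filePath).ToLower()` equals `".epub"`. A culture-independent way to write the extension check is an ordinal, case-insensitive comparison, sketched below; `HasEpubExtension` is a hypothetical helper, not part of the demo, and the snippet assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.IO;

// Hypothetical helper: case-insensitive ".epub" check that does not depend on the
// current culture (unlike ToLower()) and does not allocate a lowercased copy.
// Path.GetExtension returns "" when there is no extension, which simply compares as false.
static bool HasEpubExtension(string filePath) =>
    string.Equals(Path.GetExtension(filePath), ".epub", StringComparison.OrdinalIgnoreCase);

Console.WriteLine(HasEpubExtension("Alice.EPUB")); // True
Console.WriteLine(HasEpubExtension("notes.txt"));  // False
```

Splitting the check out this way would also let the demo report a wrong extension separately from a missing file, since both cases currently print "File doesn't exist."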