Merged
30 commits
9fa557a
Created basic structure for a download service
christianlupus Jan 23, 2021
f01786f
Save HTML in object field for later recovery
christianlupus Jan 23, 2021
3855fe2
Integrating download service partially with existing RecipeService
christianlupus Jan 23, 2021
a98a1a2
Corrected casing of class names
christianlupus Jan 23, 2021
145d9f8
Basic HTML parser structure plus a JSON+LD parser created
christianlupus Jan 23, 2021
e06c657
Created basic Microdata parser
christianlupus Jan 23, 2021
fe7533e
Include extraction service in import routines
christianlupus Jan 23, 2021
0f35d9a
Fixed some bugs in the code
christianlupus Jan 23, 2021
a7dda3d
Added Changelog
christianlupus Jan 23, 2021
e5fc5bd
Writing test cases for new code
christianlupus Aug 26, 2021
0ee3f89
Created tests for JSON+LD metadata parser
christianlupus Aug 27, 2021
2f0a3d1
Renamed resource folder
christianlupus Aug 27, 2021
11aad12
Added test for microdata parser
christianlupus Aug 27, 2021
74c9c4b
Typo in filename
christianlupus Aug 31, 2021
fad8dcf
Added test for html decoder
christianlupus Aug 31, 2021
9267ff4
Updated test namespace
christianlupus Sep 1, 2021
77c78c3
Added parser for HTML files
christianlupus Sep 1, 2021
7b9f60b
Adding a download service
christianlupus Sep 2, 2021
4fe07d0
Added ImportException to code coverage report
christianlupus Sep 2, 2021
ae683ab
Make test code compatible with PHP 7.3
christianlupus Sep 2, 2021
19b6f43
Corrected code styling
christianlupus Sep 2, 2021
e3a57a7
Fix #724
christianlupus Oct 20, 2021
07ada44
Added test case
christianlupus Oct 20, 2021
7b0a5b1
Fixing code style after big rebase
christianlupus May 7, 2022
c608c49
Apply suggestions regarding typos and language from code review
christianlupus May 21, 2022
457e88d
Fix PR checks
christianlupus May 24, 2022
aac0600
Apply suggestions from code review
christianlupus May 24, 2022
0f6749f
Fix some manual corrections as suggested in code review
christianlupus May 24, 2022
d008b4f
Fixed test cases
christianlupus May 24, 2022
8651abd
Corrected Workflow
christianlupus May 24, 2022
2 changes: 1 addition & 1 deletion .github/actions/check-todo/action.yml
@@ -20,4 +20,4 @@ runs:
steps:
- name: Add annotations
shell: bash
- run: ./.github/actions/check-todo/check.sh "${{ inputs.path }}" "${{ inputs.extension }}"
+ run: ./.github/actions/check-todo/check.sh HEAD $GITHUB_BASE_REF
20 changes: 17 additions & 3 deletions .github/actions/check-todo/check.sh
@@ -1,11 +1,25 @@
#!/bin/bash

- # Looking for all files
- find "$1" -type f -name "*.$2" | while read line
+ # set -x

+ BRANCH_REF=HEAD
+ BASE_REF=master

+ if [ $# -gt 0 ]; then
+ BRANCH_REF="$1"
+ shift
+ fi

+ if [ $# -gt 0 ]; then
+ BASE_REF="$1"
+ shift
+ fi

+ git diff --name-only "$BASE_REF...$BRANCH_REF" | grep -E '[.](php|phpt|vue|js)$' | while read line
do
file=$(echo "$line" | sed 's@^\./@@')

- grep -noE '(TODO|ToDo|@todo|XXX|FIXME|FixMe).*' "$line" | while read match
+ grep -noE '(TODO|ToDo|@todo|XXX|FIXME|FixMe)([^a-zA-Z].*)?$' "$line" | while read match
do
IFS=: read lineno msg <<< "$match"
echo "::warning file=$file,line=$lineno::Found $msg"
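The effect of the changed `grep` pattern in this hunk can be illustrated with a small PHP sketch. It assumes the two extended regular expressions behave the same when transcribed to PCRE, and the sample lines are hypothetical, not taken from the repository. The new pattern only accepts a keyword that is followed by a non-letter or by the end of the line, so a keyword embedded in a longer word is no longer reported.

```php
<?php

// Old and new patterns from check.sh, transcribed to PCRE syntax (assumption:
// grep -E and PCRE agree for these simple expressions).
const OLD_PATTERN = '/(TODO|ToDo|@todo|XXX|FIXME|FixMe).*/';
const NEW_PATTERN = '/(TODO|ToDo|@todo|XXX|FIXME|FixMe)([^a-zA-Z].*)?$/';

/**
 * Return the matched annotation text, or null if the line has no match.
 */
function findAnnotation(string $pattern, string $line): ?string {
    return preg_match($pattern, $line, $m) === 1 ? $m[0] : null;
}

// Hypothetical sample lines for illustration only.
$samples = [
    '// TODO: handle redirects',
    '$size = "XXXL";',
    '// FIXME',
];

foreach ($samples as $line) {
    printf(
        "%-28s old=%s new=%s\n",
        $line,
        var_export(findAnnotation(OLD_PATTERN, $line), true),
        var_export(findAnnotation(NEW_PATTERN, $line), true)
    );
}
```

With these samples, the old pattern reports the `XXXL` string literal while the new pattern skips it; both patterns still report the `TODO:` comment and the bare `FIXME`.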
23 changes: 23 additions & 0 deletions .github/workflows/pull-checks.yml
@@ -39,6 +39,29 @@ jobs:
exit 1
if: ${{ steps.diff.outputs.lines == 0 && steps.file-names.outputs.num > 0 }}

+ todo-checker:
+ name: Check for added todo messages
+ runs-on: ubuntu-latest
+ steps:
+ - name: Manual checkout of the app
+ run: |
+ REPO="https://${GITHUB_ACTOR}:${GITHUB_TOKEN}@github.com/${GITHUB_REPOSITORY}.git"
+ BRANCH="${GITHUB_REF/#refs\/heads\//}"

+ git version

+ git clone --filter=tree:0 "$REPO" .
+ git checkout ${{ github.head_ref }}

+ # git init $GITHUB_WORKSPACE
+ # git remote add origin https://github.com/$GITHUB_REPOSITORY
+ # git config --local gc.auto 0

+ # git fetch --filter tree:none origin +${GITHUB_SHA}:refs/remotes/origin/${BRANCH} +${GITHUB_BASE_REF}
+ # git checkout --progress --force -B ${BRANCH} refs/remotes/origin/${BRANCH}
+ - name: Check for open TODO annotations in source code
+ uses: ./.github/actions/check-todo
appinfo:
name: Check for matching app info file
runs-on: ubuntu-latest
16 changes: 0 additions & 16 deletions .github/workflows/tests.yml
@@ -92,22 +92,6 @@ jobs:
exit 1;
}

- - name: Check for open todos in lib folder
- uses: ./.github/actions/check-todo
- with:
- path: lib

- - name: Check for open todos in unit test folder
- uses: ./.github/actions/check-todo
- with:
- path: tests/Unit

- - name: Check for open todos in integration test folder
- uses: ./.github/actions/check-todo
- with:
- path: tests/Integration



unit-tests:
name: Run the unittests
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
## [Unreleased]

+ ### Added
+ - Create service class for downloading and extracting JSON
+ [#553](https://github.com/nextcloud/cookbook/pull/553) @christianlupus

### Fixed
- Fix visual regression in edit mode to prevent overflow of breadcrumbs
[#989](https://github.com/nextcloud/cookbook/pull/989) @christianlupus
@@ -57,6 +61,9 @@
- Add example to OpenAPI specification
[#957](https://github.com/nextcloud/cookbook/pull/972) @christianlupus

+ ### Deprecated
+ - Method RecipeService::parseRecipeHtml()


## 0.9.11 - 2022-03-28

9 changes: 9 additions & 0 deletions lib/Exception/HtmlParsingException.php
@@ -0,0 +1,9 @@
<?php

namespace OCA\Cookbook\Exception;

class HtmlParsingException extends \Exception {
public function __construct($message = null, $code = null, $previous = null) {
parent::__construct($message, $code, $previous);
}
}
9 changes: 9 additions & 0 deletions lib/Exception/ImportException.php
@@ -0,0 +1,9 @@
<?php

namespace OCA\Cookbook\Exception;

class ImportException extends \Exception {
public function __construct($message = null, $code = null, $previous = null) {
parent::__construct($message, $code, $previous);
}
}
15 changes: 15 additions & 0 deletions lib/Helper/HTMLFilter/AbstractHtmlFilter.php
@@ -0,0 +1,15 @@
<?php

namespace OCA\Cookbook\Helper\HTMLFilter;

abstract class AbstractHtmlFilter {

/**
* Filter the HTML according to the rules of this class
*
* This method operates on the original HTML code, which is passed by reference, and may therefore modify the HTML string.
*
* @param string $html The HTML code to be filtered
*/
abstract public function apply(string &$html): void;
}
9 changes: 9 additions & 0 deletions lib/Helper/HTMLFilter/HtmlEntityDecodeFilter.php
@@ -0,0 +1,9 @@
<?php

namespace OCA\Cookbook\Helper\HTMLFilter;

class HtmlEntityDecodeFilter extends AbstractHtmlFilter {
public function apply(string &$html): void {
$html = html_entity_decode($html);
}
}
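As a rough illustration of the by-reference contract of `apply()`, the following sketch repeats the two filter classes from this diff without their namespace so that it runs stand-alone. The input string and the idea of looping over a list of filters are assumptions for illustration; the diff does not show how the filters are actually invoked.

```php
<?php

// Stand-alone copies of the two classes from the diff, without the namespace,
// so that the snippet runs outside of the app.
abstract class AbstractHtmlFilter {
    abstract public function apply(string &$html): void;
}

class HtmlEntityDecodeFilter extends AbstractHtmlFilter {
    public function apply(string &$html): void {
        $html = html_entity_decode($html);
    }
}

// Hypothetical input: a JSON+LD script whose quotes were entity-encoded.
$html = '<script type="application/ld+json">{&quot;name&quot;: &quot;Soup&quot;}</script>';

// Assumption: filters are applied one after another to the same string.
$filters = [new HtmlEntityDecodeFilter()];
foreach ($filters as $filter) {
    $filter->apply($html); // modifies $html in place, nothing is returned
}

echo $html, "\n";
```

Because the string is passed by reference, the caller sees the decoded HTML in `$html` after the loop without having to reassign it.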
27 changes: 27 additions & 0 deletions lib/Helper/HTMLParser/AbstractHtmlParser.php
@@ -0,0 +1,27 @@
<?php

namespace OCA\Cookbook\Helper\HTMLParser;

use OCA\Cookbook\Exception\HtmlParsingException;
use OCP\IL10N;

abstract class AbstractHtmlParser {

/**
* @var IL10N
*/
protected $l;

public function __construct(IL10N $l10n) {
$this->l = $l10n;
}

/**
* Extract the recipe from the given document.
*
* @param \DOMDocument $document The document to parse
* @return array The JSON content in the document as a PHP array
* @throws HtmlParsingException If the parsing was not successful
*/
abstract public function parse(\DOMDocument $document): array;
}
9 changes: 9 additions & 0 deletions lib/Helper/HTMLParser/AttributeNotFoundException.php
@@ -0,0 +1,9 @@
<?php

namespace OCA\Cookbook\Helper\HTMLParser;

class AttributeNotFoundException extends \Exception {
public function __construct($message = null, $code = null, $previous = null) {
parent::__construct($message, $code, $previous);
}
}
187 changes: 187 additions & 0 deletions lib/Helper/HTMLParser/HttpJsonLdParser.php
@@ -0,0 +1,187 @@
<?php

namespace OCA\Cookbook\Helper\HTMLParser;

use OCA\Cookbook\Exception\HtmlParsingException;
use OCP\IL10N;
use OCA\Cookbook\Service\JsonService;

/**
* This class is an AbstractHtmlParser which tries to extract a JSON+LD script from the HTML page.
* @author Christian Wolf
*/
class HttpJsonLdParser extends AbstractHtmlParser {

/**
* @var JsonService
*/
private $jsonService;

public function __construct(IL10N $l10n, JsonService $jsonService) {
parent::__construct($l10n);

$this->jsonService = $jsonService;
}

public function parse(\DOMDocument $document): array {
$xpath = new \DOMXPath($document);

$json_ld_elements = $xpath->query("//*[@type='application/ld+json']");

foreach ($json_ld_elements as $json_ld_element) {
if (!$json_ld_element || !$json_ld_element->nodeValue) {
continue;
}

try {
return $this->parseJsonLdElement($json_ld_element);
} catch (HtmlParsingException $ex) {
// Parsing failed for this element. Let's see if there are more...
}
}

throw new HtmlParsingException($this->l->t('Could not find recipe in HTML code.'));
}

/**
* Parse a JSON+LD element in the DOM tree for a recipe
*
* @param \DOMNode $node The node to parse
* @throws HtmlParsingException The node does not contain a valid recipe
* @return array The recipe as an associative array
*/
private function parseJsonLdElement(\DOMNode $node): array {
$string = $node->nodeValue;

$this->fixRawJson($string);

$json = json_decode($string, true);

if ($json === null) {
throw new HtmlParsingException($this->l->t('JSON cannot be decoded.'));
}

if ($json === false || $json === true || ! is_array($json)) {
throw new HtmlParsingException($this->l->t('No recipe was found.'));
}

// Look through @graph field for recipe
$this->mapGraphField($json);

// Look for an array of recipes
$this->mapArray($json);

// Ensure the type of the object is never an array
$this->checkForArrayType($json);

if ($this->jsonService->isSchemaObject($json, 'Recipe')) {
// We found our recipe
return $json;
} else {
throw new HtmlParsingException($this->l->t('No recipe was found.'));
}
}

/**
* Fix any JSON issues before trying to decode it
*
* @param string $rawJson The JSON string to check and fix
*/
private function fixRawJson(string &$rawJson): void {
$rawJson = $this->removeNewlinesInJson($rawJson);
}

/**
* Fix newlines in raw JSON string
*
* Some recipes have newlines inside quotes, which is invalid JSON. Fix this before continuing.
*
* @param string $rawJson The original string
* @return string The corrected JSON
*/
private function removeNewlinesInJson(string $rawJson): string {
return preg_replace('/\s+/', ' ', $rawJson);
}

/**
* Look for recipes in the JSON graph
*
* Some sites use the @graph property to define elements.
* This is a quick workaround to extract the corresponding recipe.
*
* @todo This only extracts the very first recipe in the graph and only that.
* It might be favorable to look further into the json objects.
* This might especially be true when the recipe uses links to external JSON objects
* (as specified by the standard).
* Then, it might become necessary to parse ALL objects in the graph in order to extract e.g.
* the instruction objects for a recipe.
*
* @param array $json The JSON object to check
*/
private function mapGraphField(array &$json) {
if (isset($json['@graph']) && is_array($json['@graph'])) {
$tmp = $this->searchForRecipeInArray($json['@graph']);

if ($tmp !== null) {
$json = $tmp;
}
}
}

/**
* Look for an array of recipes.
*
* Some sites return an array of JSON objects instead of a plain recipe object.
* This function checks for an indexed array and searches in it for recipes.
*
* When an array of recipes is found, the first found recipe will be used and written over the
* input parameter.
* @param array $json The JSON object to inspect
*/
private function mapArray(array &$json) {
if (isset($json[0])) {
$tmp = $this->searchForRecipeInArray($json);

if ($tmp !== null) {
$json = $tmp;
}
}
}

/**
* Search for a recipe object in an array
* @param array $arr The array to search
* @return array|NULL The found recipe or null if no recipe was found in the array
*/
private function searchForRecipeInArray(array $arr): ?array {
// Iterate through all objects in the array ...
foreach ($arr as $item) {
// ... looking for a recipe
if ($this->jsonService->isSchemaObject($item, 'Recipe')) {
// We found a recipe in the array, use it
return $item;
}
}

// No recipe was found
return null;
}

/**
* Check if the JSON element is a schema.org object but malformed.
*
* This checks if the '@type' entry is an array and corrects that.
*
* @param array $json The JSON object to parse
* @return void
*/
private function checkForArrayType(array &$json) {
if (! $this->jsonService->isSchemaObject($json)) {
return;
}

if (is_array($json['@type'])) {
$json['@type'] = $json['@type'][0];
}
}
}
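The lookup strategy of this parser (collapse whitespace, decode, then search the object itself, its `@graph` entries, or a plain list for a recipe) can be tried out with the simplified sketch below. It is not the class from the diff: `isRecipe()` is a hypothetical stand-in for `JsonService::isSchemaObject()`, whose implementation is not part of this diff, the normalisation of an array-valued `@type` is left out, and the sample HTML is made up. The sketch assumes PHP's DOM extension is available.

```php
<?php

/**
 * Hypothetical stand-in for JsonService::isSchemaObject($item, 'Recipe').
 * Assumption: an object is a recipe if its '@type' is the string 'Recipe'.
 */
function isRecipe($item): bool {
    return is_array($item) && ($item['@type'] ?? null) === 'Recipe';
}

/**
 * Return the first recipe found in the JSON+LD scripts of the HTML, or null.
 */
function findRecipe(string $html): ?array {
    $document = new DOMDocument();
    // Suppress warnings about imperfect real-world markup.
    @$document->loadHTML($html);

    $xpath = new DOMXPath($document);
    foreach ($xpath->query("//*[@type='application/ld+json']") as $node) {
        // Collapse whitespace so raw newlines inside strings do not break decoding.
        $json = json_decode(preg_replace('/\s+/', ' ', $node->nodeValue), true);
        if (!is_array($json)) {
            continue;
        }

        // Candidates: the object itself plus its '@graph' entries, or a plain list.
        $candidates = [$json];
        if (isset($json['@graph']) && is_array($json['@graph'])) {
            $candidates = array_merge($candidates, $json['@graph']);
        } elseif (isset($json[0])) {
            $candidates = $json;
        }

        foreach ($candidates as $candidate) {
            if (isRecipe($candidate)) {
                return $candidate;
            }
        }
    }

    return null;
}

// Hypothetical page: the recipe sits in '@graph' and its name contains a raw newline.
$html = <<<'HTML'
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [
  {"@type": "WebSite", "name": "Example"},
  {"@type": "Recipe", "name": "Tomato
soup"}
]}
</script>
</head><body></body></html>
HTML;

$recipe = findRecipe($html);
echo $recipe['name'] ?? 'no recipe found', "\n";
```

Without the whitespace collapse, `json_decode()` would reject the script because of the raw newline inside the string value. As in the parser above, only the first matching recipe is returned.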