Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HtmlNode.GetEncapsulatedData fails when target property is a nullable value type #510

Open
elgonzo opened this issue Jul 25, 2023 · 3 comments
Assignees

Comments

@elgonzo
Copy link
Contributor

elgonzo commented Jul 25, 2023

1. Description

HtmlNode.GetEncapsulatedData throws an exception when properties of some nullable value type should be populated.

2. Exception

Unhandled exception. System.Exception: Unhandled Exception : Invalid cast from 'System.String' to 'System.Nullable`1[[System.Int32, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]'.
   at HtmlAgilityPack.HtmlNode.GetEncapsulatedData(Type targetType, HtmlDocument htmlDocument) in C:\Repos\HtmlAgilityPack\HtmlAgilityPack.Shared\HtmlNode.Encapsulator.cs:line 224
   at HtmlAgilityPack.HtmlNode.GetEncapsulatedData[T]() in C:\Repos\HtmlAgilityPack\HtmlAgilityPack.Shared\HtmlNode.Encapsulator.cs:line 36

3. Fiddle or Project

https://dotnetfiddle.net/6JUCcn

[HasXPath]
public class Model
{
    [XPath("div", "foo")]
    public int? NullableInt {get;set;}
}

					
string html = @"<div foo=""42"">Hello world!</div>";
		
var doc = new HtmlDocument();
doc.LoadHtml(html);
var m = doc.DocumentNode.GetEncapsulatedData<Model>();
		
Console.WriteLine($"m.NullableInt = {m.NullableInt}");

4. Any further technical details

HAP 1.11.50
.NET 7
(as per dotnetfiddle demo linked above)

@elgonzo
Copy link
Contributor Author

elgonzo commented Jul 25, 2023

Even though the stacktrace of the exception doesn't reveal it [1], the exception is caused by a call of Convert.ChangeType. According to Github code search, only the HtmlNode.Encapsulator.cs source file features Convert.ChangeType calls: https://github.com/search?q=repo%3Azzzprojects%2Fhtml-agility-pack%20Convert.ChangeType&type=code

To properly support nullable value types, calls of Convert.ChangeType could be replaced with a utility function similar to this:

static object ChangeType(string value, Type targetType)
{
    var nullableUnderlyingValueType = Nullable.GetUnderlyingType(targetType);
    if (nullableUnderlyingValueType == null)
    {
        return Convert.ChangeType(value, targetType);
    }

    var valueInstance = Convert.ChangeType(value, nullableUnderlyingValueType);
    return Activator.CreateInstance(targetType, valueInstance);
}

(Not sure if the HAP code has otherwise issues with nullable value types, but since Convert.ChangeType famously doesn't support nullable value types, i focused on call sites of Convert.ChangeType.)


[1] Which isn't helped by the fact that the GetEncapsulatedData implementation seems to have a habit of catching exceptions and then replacing that exception with another more generic exception with less information:

catch (Exception ex)
{
throw new Exception("Unhandled Exception : " + ex.Message);
}

catch (Exception ex)
{
throw new Exception("Unhandled Exception : " + ex.Message);
}

catch (Exception ex)
{
throw new Exception("Unhandled Exception : " + ex.Message);
}

Oof... What is the point of doing that at all?

There are also many other catch clauses in the GetEncapsulatedData implementation that catch an exception and throw a new exception with a different exception message. But, except two such catch blocks, the caught original exception is then not passed as InnerException to the newly created exception. Which isn't really helping troubleshootingand maintenance efforts by zzzproject, as users reporting problems typically can only report what they do and the information they can obtain from the exceptions they see, and some such...

@rwecho
Copy link
Contributor

rwecho commented Jul 26, 2023

Additionally, I'm not sure if this is a major feature. Apart from the three places where I have seen type conversions so far, are there any other areas involved? Will it potentially affect more people since the project has not enabled the Nullable project option?

Also, some of our files are encoded in Windows-1252. Every time I make changes, to avoid disrupting others' work, I save them back as Windows-1252. This seems to be a specific encoding that certain editors use for saving files, right? It that correct to keep Windows-1252 or should we switch to UTF-8 with BOM?

@elgonzo
Copy link
Contributor Author

elgonzo commented Jul 26, 2023

Will it potentially affect more people since the project has not enabled the Nullable project option?

The <nullable> setting in .csproj project files (and the equivalent #nullable pragma) is about nullable reference types only. It is not about nullable value types and therefore has no impact on nor is it an indication of whether a project can or does use nullable value types.

Also, some of our files are encoded in Windows-1252. Every time I make changes, to avoid disrupting others' work, I save them back as Windows-1252. This seems to be a specific encoding that certain editors use for saving files, right? It that correct to keep Windows-1252 or should we switch to UTF-8 with BOM?

If this is about contributions / pull requests to projects you are not a core member of, then try to keep the original text encoding of the text/source file you modify. For any other text files that belong to a project or team you are part of, make this a team decision. Generally speaking, prefer UTF-8 (preferably with BOM) over vintage text encodings/code pages, unless there is a very strong reason not to...

@JonathanMagnan JonathanMagnan self-assigned this Jul 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants