This repository has been archived by the owner on Mar 2, 2022. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 94
Optimize dictionary lookups in C# #32
Merged
Merged
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
bin | ||
obj |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
// Original version by John Taylor | ||
|
||
using System; | ||
using System.IO; | ||
using System.Collections.Generic; | ||
using System.Linq; | ||
|
||
class Program | ||
{ | ||
public sealed class Ref<T> { | ||
public Ref(T initialValue) { | ||
Value = initialValue; | ||
} | ||
public T Value { get; set; } | ||
} | ||
|
||
static void Main(string[] args) | ||
{ | ||
var counts = new Dictionary<string, Ref<int>>(); | ||
string line; | ||
while ((line = Console.ReadLine()) != null) | ||
{ | ||
line = line.ToLower(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could save this potential string allocation by instantiating the dictionary with one of the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be honest I intentionally did not include any of those optimizations that were plentiful in the many parallel PRs :) |
||
var words = line.Split(' ', StringSplitOptions.RemoveEmptyEntries); | ||
string word; | ||
for (int i = 0; i < words.Length; i++) | ||
{ | ||
word = words[i]; | ||
if (!counts.TryGetValue(word, out var count)) { | ||
counts.Add(word, new Ref<int>(1)); | ||
} else { | ||
count.Value += 1; | ||
} | ||
} | ||
} | ||
var ordered = counts.OrderByDescending(pair => pair.Value.Value); | ||
foreach (var entry in ordered) | ||
{ | ||
Console.WriteLine("{0} {1}", entry.Key, entry.Value.Value); | ||
} | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
<Project Sdk="Microsoft.NET.Sdk"> | ||
|
||
<PropertyGroup> | ||
<OutputType>Exe</OutputType> | ||
<TargetFramework>net5.0</TargetFramework> | ||
</PropertyGroup> | ||
|
||
</Project> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
// Original version by John Taylor | ||
|
||
using System; | ||
using System.IO; | ||
using System.Collections.Generic; | ||
using System.Linq; | ||
|
||
class Program | ||
{ | ||
static void Main(string[] args) | ||
{ | ||
var counts = new Dictionary<string, int>(); | ||
string line; | ||
while ((line = Console.ReadLine()) != null) | ||
{ | ||
line = line.ToLower(); | ||
var words = line.Split(' ', StringSplitOptions.RemoveEmptyEntries); | ||
foreach (string word in words) | ||
{ | ||
counts[word] = counts.GetValueOrDefault(word, 0) + 1; | ||
} | ||
} | ||
var ordered = counts.OrderByDescending(pair => pair.Value); | ||
foreach (var entry in ordered) | ||
{ | ||
Console.WriteLine("{0} {1}", entry.Key, entry.Value); | ||
} | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
<Project Sdk="Microsoft.NET.Sdk"> | ||
|
||
<PropertyGroup> | ||
<OutputType>Exe</OutputType> | ||
<TargetFramework>net5.0</TargetFramework> | ||
</PropertyGroup> | ||
|
||
</Project> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small note: The BCL supplies
StrongBox<T>
(underSystem.Runtime.CompilerServices
) that fulfils the exact same purpose as this type.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of having wrapped the integer anyway? My initial thought was that this would actually make the program less performant, because a reference type lives on the heap, whereas an
int
lives on the stack. Clearly there must be something I'm missing here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@calledude it's a reference type, so we don't copy a new int every time we get it out of the dictionary - but instead increment the existing one. If you try it without you will see all counts are just 1.
In a more general sense: we're storing a pointer to integer instead of only the integer itself.
Otherwise we would need to get it out of the dictionary, increment it and then set the new incremented value back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a
ref struct
then? This would force it to still live on the stack and we would have a pointer to it, perhaps that would make it even more performant? I assume there's a tradeoff in the current solution where having it on the stack does not outweigh the cost of copying the value all the time?Edit; Nvm, ref structs can't be used as generic type arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I haven't looked into that part too much.
A trace I did earlier shows that the actual lookup takes the most time, followed by string creation. (not sure, if the latest code version, but should be close)
So I'm focusing on that first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW; According to BenchmarkDotNet
int
is still faster.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@calledude, I'm not sure what exactly you are measuring there, but if you simply removed wrapping object and still use the rest of the optimized code as-is (that is, using
TryGetValue
), your faster code does not work correctly. As @erikbozic already pointed out - all your counters will never increment and always stay at 1.Yes, extra object indirection adds overhead, but this code is faster than code in "simple" without it exactly because code without it will have to do hash lookup twice: first, to read value and then to store it back again. That is because current Dictionary API does not support atomic increment or in-place value modification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for you question on
ref struct
- the whole Dictionary lives on the heap and anything you store there has to be boxed and put on a heap, whichref struct
cannot be.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double checked and I could've sworn I had made sure I was actually incrementing the value correctly, but I guess not then.
Not my proudest moment... :) Sorry about that!