Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document limitations of Bounter #37

Merged
merged 16 commits into from
Jan 17, 2019
Merged

Conversation

aneesh-joshi
Copy link
Contributor

@aneesh-joshi aneesh-joshi commented Oct 11, 2018

Credit goes to the people on this page.
I merely worded their concerns.

Fixes #36 .

CC: @menshikh-iv

@menshikh-iv
Copy link
Contributor

Thanks @aneesh-joshi

@piskvorky any comments?

README.md Outdated
```python
from bounter import bounter
bounts = bounter(size_mb=1)
bounts.update(str(i) for i in range(1000000))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add another 0 (10000000), to make it clear this is not related to 1 MB.

README.md Outdated


## When not to use Bounter?
Beware, Bounter is only a probabilistic frequency counter and cannot be relied on for fine counting. (You can't expect a data structure with finite size to hold infinite data.)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine => exact

README.md Outdated
0
```

Please use `Counter` or `dict` when such fine counts matter. When they don't matter, like in most NLP applications with a huge corpora, Bounter is a very good alternative.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine => exact.

README.md Outdated
0
```

Please use `Counter` or `dict` when such fine counts matter. When they don't matter, like in most NLP applications with a huge corpora, Bounter is a very good alternative.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

…like in most NLP and ML applications with huge datasets, …

README.rst Outdated
When not to use Bounter?
------------------------

Beware, Bounter is only a probabilistic frequency counter and cannot be relied on for fine counting. (You can't expect a data structure with finite size to hold infinite data.)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file auto-generated? Or why are we keeping two parallel READMEs?

Anyway, the same changes here please.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I propose to stay only .rst (because this used for PyPI too) and drop markdown version (to don't support 2 almost same versions)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if they're out of sync already… @aneesh-joshi can you check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@piskvorky
I will check. The .rst is not auto generated. I manually edited it.

@piskvorky
Copy link
Owner

Good idea! Thanks.

@menshikh-iv
Copy link
Contributor

@piskvorky README rst & md really not synced yet. I still think that we should stay only one version (create best .rst version & drop markdown version to avoid "de-sync" state).

@piskvorky
Copy link
Owner

I still think…

Did we already discuss this? To me a single version (.rst) sounds better too, but I don't remember the pros/cons. What were the arguments against it? Why are we keeping both .rst and .md?

@menshikh-iv
Copy link
Contributor

I don't remember (I'm even not sure if we discussed this)

Maybe @isamaru remember?

Right now I see no pros to maintain both .md and .rst, I prefer .md, but PyPI can render only .rst, for this reason, single .rst should be our choice.

@aneesh-joshi
Copy link
Contributor Author

@menshikh-iv @piskvorky
I've made the mentioned changes.
Waiting on your decision on whether to keep .rst or .md

Some questions:

  • Does .rst render properly on github?
  • How do I autogenerate the .rst? (I can't remember it and couldn't get the exact one through google)

README.md Outdated
bounts['100']
0
```

Please use `Counter` or `dict` when such fine counts matter. When they don't matter, like in most NLP applications with a huge corpora, Bounter is a very good alternative.
Please use `Counter` or `dict` when such exact counts matter. When they don't matter, like in most NLP and ML applications with a huge datasets, Bounter is a very good alternative.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a huge datasets => huge datasets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoot! Missed that somehow.

README.rst Outdated
bounts['100']
0

Please use ``Counter`` or ``dict`` when such fine counts matter. When they don't matter, like in most NLP applications with a huge corpora, Bounter is a very good alternative.
Please use ``Counter`` or ``dict`` when such exact counts matter. When they don't matter, like in most NLP and ML applications with a huge datasets, Bounter is a very good alternative.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtto

@menshikh-iv
Copy link
Contributor

@aneesh-joshi

Does .rst render properly on github?

Yes

How do I autogenerate the .rst? (I can't remember it and couldn't get the exact one through google)

is http://pandoc.org/try/ works OK?

@aneesh-joshi
Copy link
Contributor Author

@menshikh-iv
I autogenerated the .rst with
pandoc --from=markdown --to=rst --output=README.rst README.md

but the generated file has a huge diff compared with the original .rst

@menshikh-iv
Copy link
Contributor

@aneesh-joshi that's expected: if you looking into commit history, you'll see a huge diff between this 2 files.

Here you need to do some manual work ("join" both to one file).

@aneesh-joshi
Copy link
Contributor Author

@menshikh-iv
If you see the diff, there is a very large change. Editing it manually will be tedious.
Is there a version mismatch or config mismatch in the pandocs?
Can i just keep the old version of the rst?

@menshikh-iv
Copy link
Contributor

@aneesh-joshi okay, but please raise an issue about syncing .rst and .md

@aneesh-joshi
Copy link
Contributor Author

@menshikh-iv
No problem. I fixed it (I think?).

README.rst Outdated Show resolved Hide resolved
@aneesh-joshi
Copy link
Contributor Author

ping @menshikh-iv @piskvorky

@menshikh-iv
Copy link
Contributor

Thanks @aneesh-joshi 🚀

@menshikh-iv menshikh-iv changed the title Add limitations of Bounter. Addresses #36 Document limitations of Bounter Jan 17, 2019
@menshikh-iv menshikh-iv merged commit 98e6ba6 into piskvorky:master Jan 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants