diff --git a/.markdown-lint.yml b/.markdown-lint.yml new file mode 100644 index 0000000..295074e --- /dev/null +++ b/.markdown-lint.yml @@ -0,0 +1,6 @@ +# MD024/no-duplicate-heading : Multiple headings with the same content : https://github.com/DavidAnson/markdownlint/blob/v0.34.0/doc/md024.md +MD024: + # Only check sibling headings (default is false) + # Set to true to conform to the Keep a Changelog format + # See also https://github.com/olivierlacan/keep-a-changelog/issues/274#issuecomment-484065486 + siblings_only: true diff --git a/.readthedocs.yaml b/.readthedocs.yml similarity index 100% rename from .readthedocs.yaml rename to .readthedocs.yml diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..31acec2 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,237 @@ +# Changelog + +All notable changes to this project will be documented here. + +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). +This project has adhered to +[Semantic Versioning](https://semver.org/spec/v2.0.0.html) since version 3.0.0. + +## [Unreleased] + +### Added + +- (planned: Add support for Python 3.13) +- Add Read the Docs documentation (). +- (planned: Document benchmark results + ()). + +### Changed + +- Change the format of the changelog to conform to the Keep a Changelog + standard. + +## [4.1.0] - 2024-01-09 + +### Added + +- Add support for Python 3.12. + +### Fixed + +- Fix issues with Bazel by changing the directory structure of the project + (). +- Fix incorrect type hints (). +- Fix invalid results on s390x when the arg `x64arch` of `hash64` or + `hash_bytes` is set to `False` (). + +## [4.0.1] - 2023-07-14 + +### Changed + +- Refactor the project structure (). + +### Fixed + +- Fix incorrect type hints. + +## [4.0.0] - 2023-05-22 + +The major version bump is due to the backward incompatible changes. + +### Added + +- Add experimental support for `hashlib`-compliant hasher classes + (). 
Note that they are not yet + fully tuned for performance. +- Add support for type hints (). +- Add wheels for more platforms (`musllinux`, `s390x`, `win_arm64`, and + `macosx_universal2`). +- Add a code of conduct (the ACM Code of Ethics and Professional Conduct). + +### Changed + +- Switch license from CC0 to MIT (). + +### Removed + +- Drop support for Python 3.7, as it will reach the end of life on 2023-06-27. +- Backward incompatible changes: + - Hash functions now return the same values on big-endian platforms as + on little-endian ones (). + - Remove the `__version__` constant from the module + (). Use `importlib.metadata` + instead. + +## [3.1.0] - 2023-03-24 + +### Added + +- Add support for Python 3.10 and 3.11. Thanks + [wouter bolsterlee](https://github.com/wbolster) and + [Dušan Nikolić](https://github.com/n-dusan)! +- Add support for 32-bit architectures such as `i686` and `armv7l`. From now on, + `hash` and `hash_from_buffer` on these architectures will generate the same + hash values as those on other environments. Thanks + [Danil Shein](https://github.com/dshein-alt)! +- In relation to the above, `manylinux2014_i686` wheels are now available. +- Add support for hashing huge data (>16GB). Thanks + [arieleizenberg](https://github.com/arieleizenberg)! + +### Removed + +- Drop support for Python 3.6; remove legacy code for Python 2.x at the source + code level. + +## [3.0.0] - 2021-02-23 + +### Added + +- Python wheels are now available, thanks to the power of + [cibuildwheel](https://github.com/joerick/cibuildwheel). + - Supported platforms are `manylinux1_x86_64`, `manylinux2010_x86_64`, + `manylinux2014_aarch64`, `win32`, `win_amd64`, `macosx_10_9_x86_64`, and + `macosx_11_0_arm64` (Apple Silicon). +- Add support for newer macOS environments. Thanks + [Matthew Honnibal](https://github.com/honnibal)! +- Add support for Python 3.7, 3.8, and 3.9. + +### Changed + +- Migrate CI from Travis CI and AppVeyor to GitHub Actions.
+ +### Removed + +- Drop support for Python 2.7, 3.3, 3.4, and 3.5. + +## [2.5.1] - 2017-10-31 + +### Fixed + +- Bugfix for `hash_bytes`. Thanks [doozr](https://github.com/doozr)! + +## [2.5] - 2017-10-28 + +### Added + +- Add `hash_from_buffer`. Thanks [Dimitri Vorona](https://github.com/alendit)! +- Add a keyword argument `signed`. + +## [2.4] - 2017-05-27 + +### Added + +- Support seeds with 32-bit unsigned integers; thanks + [Alexander Maznev](https://github.com/pik)! +- Support 64-bit data (under 64-bit environments). +- Add unit testing and continuous integration with Travis CI and AppVeyor. + +### Fixed + +- Fix compile errors for Python 3.6 under Windows systems. + +## [2.3.2] - 2017-05-26 + +### Changed + +- Relicense from public domain to CC0-1.0. + +## [2.3.1] - 2015-06-07 + +### Fixed + +- Fix compile errors for gcc >=5. + +## [2.3] - 2013-12-08 + +The first two commits are from [Derek Wilson](https://github.com/underrun). +Thanks! + +### Added + +- Add `hash128`, which returns a 128-bit signed integer. + +### Fixed + +- Fix a misplaced operator that could cause a memory leak in rare conditions. +- Fix a malformed value passed to a Python/C API function that could cause + runtime errors in recent Python 3.x versions. + +## [2.2] - 2013-03-03 + +### Added + +- Improve portability to support systems with old gcc (version < 4.4) such as + CentOS/RHEL 5.x. (Commit from + [Micha Gorelick](https://github.com/mynameisfiber). Thanks!) + +## [2.1] - 2013-02-25 + +### Added + +- Add a `__version__` constant. If the revision below matters for your + application, check whether this constant exists. + +### Changed + +- Incorporate revision r147, which includes robustness improvements and minor + tweaks. + +Beware that due to this revision, **the 32-bit results of 2.1 are NOT the +same as those of 2.0**.
For example: + +```shell +>>> mmh3.hash("foo") # in mmh3 2.0 +-292180858 +>>> mmh3.hash("foo") # in mmh3 2.1 +-156908512 +``` + +The results of `hash64` and `hash_bytes` remain unchanged. Austin Appleby, the +author of MurmurHash, stated that this revision was the final modification to +MurmurHash3's results and that any future changes would only improve +performance. + +## [2.0] - 2011-06-07 + +### Added + +- Support both Python 2.7 and 3.x. + +### Changed + +- Change the module interface. + +## [1.0] - 2011-04-27 + +### Added + +- As + [Softpedia collected mmh3 1.0 on April 27, 2011](https://web.archive.org/web/20110430172027/https://linux.softpedia.com/get/Programming/Libraries/mmh3-68314.shtml), + it must have been uploaded to PyPI on or shortly before this date. + +[unreleased]: https://github.com/hajimes/mmh3/compare/v4.1.0...HEAD +[4.1.0]: https://github.com/hajimes/mmh3/compare/v4.0.1...v4.1.0 +[4.0.1]: https://github.com/hajimes/mmh3/compare/v4.0.0...v4.0.1 +[4.0.0]: https://github.com/hajimes/mmh3/compare/v3.1.0...v4.0.0 +[3.1.0]: https://github.com/hajimes/mmh3/compare/v3.0.0...v3.1.0 +[3.0.0]: https://github.com/hajimes/mmh3/compare/v2.5.1...v3.0.0 +[2.5.1]: https://github.com/hajimes/mmh3/compare/v2.5...v2.5.1 +[2.5]: https://github.com/hajimes/mmh3/compare/v2.4...v2.5 +[2.4]: https://github.com/hajimes/mmh3/compare/v2.3.2...v2.4 +[2.3.2]: https://github.com/hajimes/mmh3/compare/v2.3.1...v2.3.2 +[2.3.1]: https://github.com/hajimes/mmh3/compare/v2.3...v2.3.1 +[2.3]: https://github.com/hajimes/mmh3/compare/v2.2...v2.3 +[2.2]: https://github.com/hajimes/mmh3/compare/v2.1...v2.2 +[2.1]: https://github.com/hajimes/mmh3/compare/v2.0...v2.1 +[2.0]: https://github.com/hajimes/mmh3/releases/tag/v2.0 +[1.0]: https://web.archive.org/web/20110430172027/https://linux.softpedia.com/get/Programming/Libraries/mmh3-68314.shtml diff --git a/README.md b/README.md index e375f69..efba441 100644 --- a/README.md +++ b/README.md @@ -9,13 +9,23 @@ [![Total
Downloads](https://static.pepy.tech/badge/mmh3)](https://pepy.tech/project/mmh3?versions=*&versions=4.*&versions=3.*&versions=2.*) [![Recent Downloads](https://static.pepy.tech/badge/mmh3/month)](https://pepy.tech/project/mmh3?versions=*&versions=4.*&versions=3.*&versions=2.*) -mmh3 is a Python extension for [MurmurHash (MurmurHash3)](https://en.wikipedia.org/wiki/MurmurHash), a set of fast and robust non-cryptographic hash functions invented by Austin Appleby. +mmh3 is a Python extension for +[MurmurHash (MurmurHash3)](https://en.wikipedia.org/wiki/MurmurHash), a set of +fast and robust non-cryptographic hash functions invented by Austin Appleby. -Combined with probabilistic techniques like a [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter), [MinHash](https://en.wikipedia.org/wiki/MinHash), and [feature hashing](https://en.wikipedia.org/wiki/Feature_hashing), mmh3 allows you to develop high-performance systems in fields such as data mining, machine learning, and natural language processing. +Combined with probabilistic techniques like a +[Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter), +[MinHash](https://en.wikipedia.org/wiki/MinHash), and +[feature hashing](https://en.wikipedia.org/wiki/Feature_hashing), mmh3 allows +you to develop high-performance systems in fields such as data mining, machine +learning, and natural language processing. -Another common use of mmh3 is to [calculate favicon hashes](https://gist.github.com/yehgdotnet/b9dfc618108d2f05845c4d8e28c5fc6a) used by [Shodan](https://www.shodan.io), the world's first IoT search engine. +Another common use of mmh3 is to +[calculate favicon hashes](https://gist.github.com/yehgdotnet/b9dfc618108d2f05845c4d8e28c5fc6a) +used by [Shodan](https://www.shodan.io), the world's first IoT search engine. -This page offers a quick start guide. For more detailed information, see the [documentation](https://mmh3.readthedocs.io/en/latest/). +This page offers a quick start guide. 
For more detailed information, see the +[documentation](https://mmh3.readthedocs.io/en/latest/). ## How to use @@ -33,7 +43,7 @@ Quickstart: >>> import mmh3 >>> mmh3.hash("foo") # returns a 32-bit signed int -156908512 ->>> mmh3.hash("foo", 42) # uses 42 as a seed +>>> mmh3.hash("foo", 42) # uses 42 as the seed -1322301282 >>> mmh3.hash("foo", signed=False) # returns a 32-bit unsigned int 4138058784 @@ -42,15 +52,15 @@ Quickstart: Other functions: ```shell ->>> mmh3.hash64("foo") # two 64 bit signed ints (by using the 128-bit algorithm as its backend) +>>> mmh3.hash64("foo") # two 64-bit signed ints using the 128-bit algorithm (-2129773440516405919, 9128664383759220103) ->>> mmh3.hash64("foo", signed=False) # two 64 bit unsigned ints +>>> mmh3.hash64("foo", signed=False) # two 64-bit unsigned ints (16316970633193145697, 9128664383759220103) ->>> mmh3.hash128("foo", 42) # 128 bit unsigned int +>>> mmh3.hash128("foo", 42) # 128-bit unsigned int 215966891540331383248189432718888555506 ->>> mmh3.hash128("foo", 42, signed=True) # 128 bit signed int +>>> mmh3.hash128("foo", 42, signed=True) # 128-bit signed int -124315475380607080215185174712879655950 ->>> mmh3.hash_bytes("foo") # 128 bit value as bytes +>>> mmh3.hash_bytes("foo") # 128-bit value as bytes 'aE\xf5\x01W\x86q\xe2\x87}\xba+\xe4\x87\xaf~' >>> import numpy as np >>> a = np.zeros(2 ** 32, dtype=np.int8) @@ -58,9 +68,11 @@ Other functions: b'V\x8f}\xad\x8eNM\xa84\x07FU\x9c\xc4\xcc\x8e' ``` -Beware that `hash64` returns **two** values, because it uses the 128-bit version of MurmurHash3 as its backend. +Beware that `hash64` returns **two** values, because it uses the 128-bit version +of MurmurHash3 as its backend. -`hash_from_buffer` hashes byte-likes without memory copying. The method is suitable when you hash a large memory-view such as `numpy.ndarray`. +`hash_from_buffer` hashes bytes-like objects without copying memory. The method +is suitable for hashing large buffers such as a `numpy.ndarray`.
```shell >>> mmh3.hash_from_buffer(numpy.random.rand(100)) @@ -69,7 +81,9 @@ Beware that `hash64` returns **two** values, because it uses the 128-bit version 3812874078 ``` -`hash64`, `hash128`, and `hash_bytes` have the third argument for architecture optimization (keyword arg: `x64arch`). Use True for x64 and False for x86 (default: True): +`hash64`, `hash128`, and `hash_bytes` take a third argument for architecture +optimization (keyword argument: `x64arch`). Use `True` for x64 and `False` for +x86 (default: `True`): ```shell >>> mmh3.hash64("foo", 42, True) @@ -78,11 +92,19 @@ Beware that `hash64` returns **two** values, because it uses the 128-bit version ### `hashlib`-style hashers -`mmh3` implements hashers whose interfaces are similar to `hashlib` in the standard library: `mmh3_32()` for 32 bit hashing, `mmh3_x64_128()` for 128 bit hashing optimized for x64 architectures, and `mmh3_x86_128()` for 128 bit hashing optimized for x86 architectures. +`mmh3` implements hashers whose interfaces are similar to `hashlib` in the +standard library: `mmh3_32()` for 32-bit hashing, `mmh3_x64_128()` for 128-bit +hashing optimized for x64 architectures, and `mmh3_x86_128()` for 128-bit +hashing optimized for x86 architectures. -In addition to the standard `digest()` method, each hasher has `sintdigest()`, which returns a signed integer, and `uintdigest()`, which returns an unsigned integer. 128 bit hashers also have `stupledigest()` and `utupledigest()` which return two 64 bit integers. +In addition to the standard `digest()` method, each hasher has `sintdigest()`, +which returns a signed integer, and `uintdigest()`, which returns an unsigned +integer. 128-bit hashers also have `stupledigest()` and `utupledigest()`, which +return two 64-bit integers. -Note that as of version 4.1.0, the implementation is still experimental and its performance can be unsatisfactory (especially `mmh3_x86_128()`). Also, `hexdigest()` is not supported. Use `digest().hex()` instead.
+Note that as of version 4.1.0, the implementation is still experimental and its +performance can be unsatisfactory (especially `mmh3_x86_128()`). Also, +`hexdigest()` is not supported. Use `digest().hex()` instead. ```shell >>> import mmh3 @@ -107,52 +129,78 @@ b'\x82_n\xdd \xac\xb6j\xef\x99\xb1e\xc4\n\xc9\xfd' ## Changelog -### 4.1.0 (2024-01-09) +See [Changelog](https://mmh3.readthedocs.io/en/latest/changelog.html) for the +complete changelog. + +### [Unreleased] + +#### Added + +- (planned: Add support for Python 3.13) +- Add Read the Docs documentation (). +- (planned: Document benchmark results + ()). + +#### Changed + +- Change the format of the changelog to conform to the Keep a Changelog + standard. + +### [4.1.0] - 2024-01-09 + +#### Added - Add support for Python 3.12. -- Change the project structure to fix issues when using Bazel (). + +#### Fixed + +- Fix issues with Bazel by changing the directory structure of the project + (). - Fix incorrect type hints (). -- Fix invalid results on s390x when the arg `x64arch` of `hash64` or `hash_bytes` is set to `False` (). +- Fix invalid results on s390x when the arg `x64arch` of `hash64` or + `hash_bytes` is set to `False` (). -### 4.0.1 (2023-07-14) +### [4.0.1] - 2023-07-14 -- Fix incorrect type hints. -- Refactor the project structure (). +#### Changed -### 4.0.0 (2023-05-22) +- Refactor the project structure (). -- Add experimental support for `hashlib`-compliant hasher classes (). Note that they are not yet fully tuned for performance. -- Add support for type hints (). -- Add wheels for more platforms (`musllinux`, `s390x`, `win_arm64`, and `macosx_universal2`). -- Drop support for Python 3.7, as it will reach the end of life on 2023-06-27. -- Switch license from CC0 to MIT (). -- Add a code of conduct (the ACM Code of Ethics and Professional Conduct). -- Backward incompatible changes: - - A hash function now returns the same value under big-endian platforms as that under little-endian ones (). 
- - Remove the `__version__` constant from the module (). Use `importlib.metadata` instead. +#### Fixed -See [Changelog](https://mmh3.readthedocs.io/en/latest/changelog.html) for the complete changelog. +- Fix incorrect type hints. ## License -[MIT](https://github.com/hajimes/mmh3/blob/master/LICENSE), unless otherwise noted within a file. +[MIT](https://github.com/hajimes/mmh3/blob/master/LICENSE), unless otherwise +noted within a file. ## Known Issues ### Getting different results from other MurmurHash3-based libraries -By default, mmh3 returns **signed** values for 32-bit and 64-bit versions and **unsigned** values for `hash128`, due to historical reasons. Please use the keyword argument `signed` to obtain a desired result. +By default, mmh3 returns **signed** values for 32-bit and 64-bit versions and +**unsigned** values for `hash128`, due to historical reasons. Please use the +keyword argument `signed` to obtain the desired result. -From version 4.0.0, `mmh3` returns the same value under big-endian platforms -as that under little-endian ones, while the original C++ library is endian-sensitive. If you need to obtain the original-compliant results under big-endian environments, please use version 3.\*. +Since version 4.0.0, `mmh3` returns the same values on big-endian platforms as +on little-endian ones, while the original C++ library is endian-sensitive. If +you need results compliant with the original library on big-endian platforms, +please use version 3.\*. -For compatibility with [Google Guava (Java)](https://github.com/google/guava), see . +For compatibility with [Google Guava (Java)](https://github.com/google/guava), +see +. -For compatibility with [murmur3 (Go)](https://pkg.go.dev/github.com/spaolacci/murmur3), see . +For compatibility with +[murmur3 (Go)](https://pkg.go.dev/github.com/spaolacci/murmur3), see +.
### Unexpected results when given non 32-bit seeds -Version 2.4 changed the type of seeds from signed 32-bit int to unsigned 32-bit int. The resulting values with signed seeds still remain the same as before, as long as they are 32-bit. +Version 2.4 changed the type of seeds from signed 32-bit int to unsigned 32-bit +int. The resulting values with signed seeds still remain the same as before, as +long as they are 32-bit. ```shell >>> mmh3.hash("aaaa", -1756908916) # signed representation for 0x9747b28c @@ -161,7 +209,8 @@ Version 2.4 changed the type of seeds from signed 32-bit int to unsigned 32-bit 1519878282 ``` -Be careful so that these seeds do not exceed 32-bit. Unexpected results may happen with invalid values. +Be careful not to pass seeds that exceed 32 bits. Unexpected results may +happen with invalid values. ```shell >>> mmh3.hash("foo", 2 ** 33) @@ -176,7 +225,9 @@ See [Contributing](https://mmh3.readthedocs.io/en/latest/CONTRIBUTING.html). ## Authors -MurmurHash3 was originally developed by Austin Appleby and distributed under public domain [https://github.com/aappleby/smhasher](https://github.com/aappleby/smhasher). +MurmurHash3 was originally developed by Austin Appleby and released into the +public domain +[https://github.com/aappleby/smhasher](https://github.com/aappleby/smhasher). Ported and modified for Python by Hajime Senuma. @@ -184,28 +235,58 @@ Ported and modified for Python by Hajime Senuma. ### Tutorials (High-Performance Computing) -The following textbooks and tutorials are great sources to learn how to use mmh3 (and other hash algorithms in general) for high-performance computing. - -- Chapter 11: _Using Less Ram_ in Micha Gorelick and Ian Ozsvald. 2014. _High Performance Python: Practical Performant Programming for Humans_. O'Reilly Media. [ISBN: 978-1-4493-6159-4](https://www.amazon.com/dp/1449361595). - - 2nd edition of the above (2020). [ISBN: 978-1492055020](https://www.amazon.com/dp/1492055026). -- Max Burstein.
February 2, 2013. _[Creating a Simple Bloom Filter](http://www.maxburstein.com/blog/creating-a-simple-bloom-filter/)_. -- Duke University. April 14, 2016. _[Efficient storage of data in memory](http://people.duke.edu/~ccc14/sta-663-2016/20B_Big_Data_Structures.html)_. -- Bugra Akyildiz. August 24, 2016. _[A Gentle Introduction to Bloom Filter](https://www.kdnuggets.com/2016/08/gentle-introduction-bloom-filter.html)_. KDnuggets. +The following textbooks and tutorials are great sources to learn how to use mmh3 +(and other hash algorithms in general) for high-performance computing. + +- Chapter 11: _Using Less Ram_ in Micha Gorelick and Ian Ozsvald. 2014. _High + Performance Python: Practical Performant Programming for Humans_. O'Reilly + Media. [ISBN: 978-1-4493-6159-4](https://www.amazon.com/dp/1449361595). + - 2nd edition of the above (2020). + [ISBN: 978-1492055020](https://www.amazon.com/dp/1492055026). +- Max Burstein. February 2, 2013. + _[Creating a Simple Bloom Filter](http://www.maxburstein.com/blog/creating-a-simple-bloom-filter/)_. +- Duke University. April 14, 2016. + _[Efficient storage of data in memory](http://people.duke.edu/~ccc14/sta-663-2016/20B_Big_Data_Structures.html)_. +- Bugra Akyildiz. August 24, 2016. + _[A Gentle Introduction to Bloom Filter](https://www.kdnuggets.com/2016/08/gentle-introduction-bloom-filter.html)_. + KDnuggets. ### Tutorials (Internet of Things) -[Shodan](https://www.shodan.io), the world's first [IoT](https://en.wikipedia.org/wiki/Internet_of_things) search engine, uses MurmurHash3 hash values for [favicons](https://en.wikipedia.org/wiki/Favicon) (icons associated with web pages). [ZoomEye](https://www.zoomeye.org) follows Shodan's convention. -[Calculating these values with mmh3](https://gist.github.com/yehgdotnet/b9dfc618108d2f05845c4d8e28c5fc6a) is useful for OSINT and cybersecurity activities. - -- Jan Kopriva. April 19, 2021. 
_[Hunting phishing websites with favicon hashes](https://isc.sans.edu/diary/Hunting+phishing+websites+with+favicon+hashes/27326)_. SANS Internet Storm Center. -- Nikhil Panwar. May 2, 2022. _[Using Favicons to Discover Phishing & Brand Impersonation Websites](https://bolster.ai/blog/how-to-use-favicons-to-find-phishing-websites)_. Bolster. -- Faradaysec. July 25, 2022. _[Understanding Spring4Shell: How used is it?](https://faradaysec.com/understanding-spring4shell/)_. Faraday Security. -- Debjeet. August 2, 2022. _[How To Find Assets Using Favicon Hashes](https://payatu.com/blog/favicon-hash/)_. Payatu. +[Shodan](https://www.shodan.io), the world's first +[IoT](https://en.wikipedia.org/wiki/Internet_of_things) search engine, uses +MurmurHash3 hash values for [favicons](https://en.wikipedia.org/wiki/Favicon) +(icons associated with web pages). [ZoomEye](https://www.zoomeye.org) follows +Shodan's convention. +[Calculating these values with mmh3](https://gist.github.com/yehgdotnet/b9dfc618108d2f05845c4d8e28c5fc6a) +is useful for OSINT and cybersecurity activities. + +- Jan Kopriva. April 19, 2021. + _[Hunting phishing websites with favicon hashes](https://isc.sans.edu/diary/Hunting+phishing+websites+with+favicon+hashes/27326)_. + SANS Internet Storm Center. +- Nikhil Panwar. May 2, 2022. + _[Using Favicons to Discover Phishing & Brand Impersonation Websites](https://bolster.ai/blog/how-to-use-favicons-to-find-phishing-websites)_. + Bolster. +- Faradaysec. July 25, 2022. + _[Understanding Spring4Shell: How used is it?](https://faradaysec.com/understanding-spring4shell/)_. + Faraday Security. +- Debjeet. August 2, 2022. + _[How To Find Assets Using Favicon Hashes](https://payatu.com/blog/favicon-hash/)_. + Payatu. 
### Similar libraries -- : mmh3 in pure python (Fredrik Kihlander and Swapnil Gusani) -- : Python bindings for CityHash (Eugene Scherba) -- : Python bindings for FarmHash (Veelion Chong) -- : Python bindings for MetroHash (Eugene Scherba) -- : Python bindings for xxHash (Yue Du) +- : mmh3 in pure Python (Fredrik Kihlander + and Swapnil Gusani) +- : Python bindings for CityHash + (Eugene Scherba) +- : Python bindings for FarmHash + (Veelion Chong) +- : Python bindings for MetroHash + (Eugene Scherba) +- : Python bindings for xxHash (Yue + Du) + +[unreleased]: https://github.com/hajimes/mmh3/compare/v4.1.0...HEAD +[4.1.0]: https://github.com/hajimes/mmh3/compare/v4.0.1...v4.1.0 +[4.0.1]: https://github.com/hajimes/mmh3/compare/v4.0.0...v4.0.1 diff --git a/docs/CODE_OF_CONDUCT.md b/docs/CODE_OF_CONDUCT.md index 7d3a275..01f7425 100644 --- a/docs/CODE_OF_CONDUCT.md +++ b/docs/CODE_OF_CONDUCT.md @@ -1,3 +1,5 @@ # Code of Conduct -This project adheres the ACM Code of Ethics and Professional Conduct ([https://www.acm.org/code-of-ethics](https://www.acm.org/code-of-ethics)), as specified in the version adopted in June 22nd, 2018. +This project adheres to the ACM Code of Ethics and Professional Conduct +([https://www.acm.org/code-of-ethics](https://www.acm.org/code-of-ethics)), as +specified in the version adopted on June 22, 2018. diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md index e724dae..54b091e 100644 --- a/docs/CONTRIBUTING.md +++ b/docs/CONTRIBUTING.md @@ -2,17 +2,18 @@ Thank you for your interest in contributing to the `mmh3` project! -Read [README](https://github.com/hajimes/mmh3/blob/master/README.md) to get an overview of the `mmh3` project, -and follow our [Code of Conduct](./CODE_OF_CONDUCT) -(ACM Code of Ethics and Professional Conduct). +Read [README](https://github.com/hajimes/mmh3/blob/master/README.md) to get an +overview of the `mmh3` project, and follow our +[Code of Conduct](./CODE_OF_CONDUCT) (ACM Code of Ethics and Professional +Conduct).
## Issues -You can contribute to our project by -simply submitting a bug report or a feature suggestion -through the [issue tracker](https://github.com/hajimes/mmh3/issues). +You can contribute to our project by simply submitting a bug report or a feature +suggestion through the [issue tracker](https://github.com/hajimes/mmh3/issues). -Before submitting a new issue, it's a good idea to check [Known Issues section on README](https://github.com/hajimes/mmh3#known-issues). +Before submitting a new issue, it's a good idea to check the +[Known Issues section of the README](https://github.com/hajimes/mmh3#known-issues). ## Maintaining and developing the project @@ -21,25 +22,32 @@ Before submitting a new issue, it's a good idea to check [Known Issues section o As of 4.1.0, the layout of the project is as follows: - `src/mmh3` - - `mmh3module.c`: the main file that serves as the interface between Python and the MurmurHash3 c implementations. - - `murmurhash.c`: implementations of the MurmurHash3 family. Auto-generated from Austin Appleby's original code. DO NOT edit this file manually. See [README in the util directory](https://github.com/hajimes/mmh3/blob/master/util/README.md) for details. - - `murmurhash.h`: headers and macros for MurmurHash3. Auto-generated from `util/refresh.py`. DO NOT edit this file manually. - - `hashlib.h`: taken from [CPython's code base](https://github.com/python/cpython/blob/9ce0f48e918860ffa32751a85b0fe7967723e2e3/Modules/hashlib.h). + - `mmh3module.c`: the main file that serves as the interface between Python + and the MurmurHash3 C implementations. + - `murmurhash.c`: implementations of the MurmurHash3 family. Auto-generated + from Austin Appleby's original code. DO NOT edit this file manually. See + [README in the util directory](https://github.com/hajimes/mmh3/blob/master/util/README.md) + for details. + - `murmurhash.h`: headers and macros for MurmurHash3. Auto-generated from + `util/refresh.py`. DO NOT edit this file manually.
+ - `hashlib.h`: taken from + [CPython's code base](https://github.com/python/cpython/blob/9ce0f48e918860ffa32751a85b0fe7967723e2e3/Modules/hashlib.h). - `util` - - `refresh.py`: file that generates `src/mmh3/murmurhash.c` and `src/mmh3/murmurhash.h` from the original MurmurHash3 C++ code. Edit this file to modify the contents of these files. + - `refresh.py`: script that generates `src/mmh3/murmurhash.c` and + `src/mmh3/murmurhash.h` from the original MurmurHash3 C++ code. Edit this + file to modify the contents of these files. ### Testing Before submitting your changes, make sure to run the project's tests to ensure -that everything is working as expected. -At least you should run `pytest` and `mypy --strict tests` -from the project root directory. +that everything is working as expected. At a minimum, run `pytest` and +`mypy --strict tests` from the project root directory. #### (Optional) Testing on s390x -When you have modified the code in a way which may cause endian issues, you may want -to locally test on s390x, the only big-endian platform officially supported by -Python. +If you modify the code in a way that may cause endianness issues, you may want +to test it locally on s390x, the only big-endian platform officially supported +by Python. [_Emulating a big-endian s390x with QEMU_](https://til.simonwillison.net/docker/emulate-s390x-with-qemu) by Simon Willison is a good introduction to Docker/QEMU settings for emulating diff --git a/docs/changelog.md b/docs/changelog.md deleted file mode 100644 index f2681f1..0000000 --- a/docs/changelog.md +++ /dev/null @@ -1,103 +0,0 @@ -# Changelog - -## 4.1.0 (2024-01-09) - -- Add support for Python 3.12. -- Change the project structure to fix issues when using Bazel (). -- Fix incorrect type hints (). -- Fix invalid results on s390x when the arg `x64arch` of `hash64` or `hash_bytes` is set to `False` (). - -## 4.0.1 (2023-07-14) - -- Fix incorrect type hints. -- Refactor the project structure ().
- -## 4.0.0 (2023-05-22) - -- Add experimental support for `hashlib`-compliant hasher classes (). Note that they are not yet fully tuned for performance. -- Add support for type hints (). -- Add wheels for more platforms (`musllinux`, `s390x`, `win_arm64`, and `macosx_universal2`). -- Drop support for Python 3.7, as it will reach the end of life on 2023-06-27. -- Switch license from CC0 to MIT (). -- Add a code of conduct (the ACM Code of Ethics and Professional Conduct). -- Backward incompatible changes: - - A hash function now returns the same value under big-endian platforms as that under little-endian ones (). - - Remove the `__version__` constant from the module (). Use `importlib.metadata` instead. - -## 3.1.0 (2023-03-24) - -- Add support for Python 3.10 and 3.11. Thanks [wouter bolsterlee](https://github.com/wbolster) and [Dušan Nikolić](https://github.com/n-dusan)! -- Drop support for Python 3.6; remove legacy code for Python 2.x at the source code level. -- Add support for 32-bit architectures such as `i686` and `armv7l`. From now on, `hash` and `hash_from_buffer` on these architectures will generate the same hash values as those on other environments. Thanks [Danil Shein](https://github.com/dshein-alt)! -- In relation to the above, `manylinux2014_i686` wheels are now available. -- Support for hashing huge data (>16GB). Thanks [arieleizenberg](https://github.com/arieleizenberg)! - -## 3.0.0 (2021-02-23) - -- Python wheels are now available, thanks to the power of [cibuildwheel](https://github.com/joerick/cibuildwheel). - - Supported platforms are `manylinux1_x86_64`, `manylinux2010_x86_64`, `manylinux2014_aarch64`, `win32`, `win_amd64`, `macosx_10_9_x86_64`, and `macosx_11_0_arm64` (Apple Silicon). -- Add support for newer macOS environments. Thanks [Matthew Honnibal](https://github.com/honnibal)! -- Drop support for Python 2.7, 3.3, 3.4, and 3.5. -- Add support for Python 3.7, 3.8, and 3.9. -- Migrate CI from Travis CI and AppVeyor to GitHub Actions. 
- -## 2.5.1 (2017-10-31) - -- Bugfix for `hash_bytes`. Thanks [doozr](https://github.com/doozr)! - -## 2.5 (2017-10-28) - -- Add `hash_from_buffer`. Thanks [Dimitri Vorona](https://github.com/alendit)! -- Add a keyword argument `signed`. - -## 2.4 (2017-05-27) - -- Support seeds with 32-bit unsigned integers; thanks [Alexander Maznev](https://github.com/pik)! -- Support 64-bit data (under 64-bit environments) -- Fix compile errors for Python 3.6 under Windows systems. -- Add unit testing and continuous integration with Travis CI and AppVeyor. - -## 2.3.2 (2017-05-26) - -- Relicensed from public domain to CC0-1.0. - -## 2.3.1 (2015-06-07) - -- Fix compile errors for gcc >=5. - -## 2.3 (2013-12-08) - -- Add `hash128`, which returns a 128-bit signed integer. -- Fix a misplaced operator which could cause memory leak in a rare condition. -- Fix a malformed value to a Python/C API function which may cause runtime errors in recent Python 3.x versions. - -The first two commits are from [Derek Wilson](https://github.com/underrun). Thanks! - -## 2.2 (2013-03-03) - -- Improve portability to support systems with old gcc (version < 4.4) such as CentOS/RHEL 5.x. (Commit from [Micha Gorelick](https://github.com/mynameisfiber). Thanks!) - -## 2.1 (2013-02-25) - -- Add `__version__` constant. Check if it exists when the following revision matters for your application. -- Incorporate the revision r147, which includes robustness improvement and minor tweaks. - -Beware that due to this revision, **the result of 32-bit version of 2.1 is NOT the same as that of 2.0**. E.g.,: - -```shell ->>> mmh3.hash("foo") # in mmh3 2.0 --292180858 ->>> mmh3.hash("foo") # in mmh3 2.1 --156908512 -``` - -The results of hash64 and hash_bytes remain unchanged. Austin Appleby, the author of Murmurhash, ensured this revision was the final modification to MurmurHash3's results and any future changes would be to improve performance only. - -## 2.0 (2011-06-07) - -- Support both Python 2.7 and 3.x. 
-- Change the module interface. - -## 1.0 (<= 2011-04-27) - -- As [Softpedia collected mmh3 1.0 on April 27, 2011](https://web.archive.org/web/20110430172027/https://linux.softpedia.com/get/Programming/Libraries/mmh3-68314.shtml), it must have been uploaded to PyPI on or slightly before this date. diff --git a/docs/changelog_link.md b/docs/changelog_link.md new file mode 100644 index 0000000..59c0bd9 --- /dev/null +++ b/docs/changelog_link.md @@ -0,0 +1,5 @@ + + +```{include} ../CHANGELOG.md + +``` diff --git a/docs/conf.py b/docs/conf.py index 613e7b0..d4925df 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -1,5 +1,5 @@ # pylint: disable=C0114,C0103 -# # Configuration file for the Sphinx documentation builder. +# Configuration file for the Sphinx documentation builder. # # For the full list of built-in configuration values, see the documentation: # https://www.sphinx-doc.org/en/master/usage/configuration.html diff --git a/docs/index.rst b/docs/index.rst index 0019c18..74470dc 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -14,7 +14,7 @@ mmh3 is a Python extension for `MurmurHash (MurmurHash3) CODE_OF_CONDUCT Indices and tables diff --git a/util/README.md b/util/README.md index 5406d62..6f9a5e6 100644 --- a/util/README.md +++ b/util/README.md @@ -5,12 +5,14 @@ This directory contains C files that were generated from the ## Updating \_mmh3 -Try `git submodule update --init` to fetch Appleby's original SMHasher project as a github submodule. -Then, run the `refresh.py` script to generate PEP 7-compliant C code from the original project, instead of editing `murmurhash3.*` files manually. -Add transformation code to the `refresh.py` script to perform further edits. - -After file generation, use `clang-format` to format the generated code. -Try `clang-format -i src/mmh3/*.{c,h}` from the project's top-level directory. +Try `git submodule update --init` to fetch Appleby's original SMHasher project +as a github submodule. 
Then, run the `refresh.py` script to generate PEP +7-compliant C code from the original project, instead of editing `murmurhash3.*` +files manually. Add transformation code to the `refresh.py` script to perform +further edits. + +After file generation, use `clang-format` to format the generated code. Try +`clang-format -i src/mmh3/*.{c,h}` from the project's top-level directory. ## Local files