Twitter::Text - Perl implementation of the twitter-text parsing library
use Twitter::Text;
$result = parse_tweet('Hello world こんにちは世界');
print $result->{valid} ? 'valid tweet' : 'invalid tweet';
Twitter::Text is a Perl implementation of the twitter-text parsing library.
This library does not implement auto-linking or hit highlighting.
Please refer to Implementation status for the latest status.
All functions below are exported by default.
$hashtags = extract_hashtags($text);
Returns an array reference of the hashtag strings extracted from $text.
$hashtags_with_indices = extract_hashtags_with_indices($text, [\%options]);
Returns an array reference of hash references, one for each hashtag extracted from $text.
Each hash reference consists of hashtag (the hashtag string) and indices (the range of the hashtag in $text).
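Because this library does not implement auto-linking, the indices are what you would use to mark up entities yourself. The sketch below uses a hypothetical helper, wrap_entities (not part of Twitter::Text), and hand-written data in the shape described above. It assumes, as in the Ruby twitter-text library, that indices holds [start, end) code point offsets whose range includes the leading #.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Twitter::Text): wraps each entity's
# [start, end) range in $text with $open/$close markers.
# Assumes "end" is exclusive and that offsets are in characters.
sub wrap_entities {
    my ($text, $entities, $open, $close) = @_;
    # Work from the last entity to the first so that inserting markers
    # never shifts the offsets of entities still to be processed.
    for my $entity (sort { $b->{indices}[0] <=> $a->{indices}[0] } @$entities) {
        my ($start, $end) = @{ $entity->{indices} };
        substr($text, $end, 0)   = $close;   # insert closing marker first
        substr($text, $start, 0) = $open;    # start < end, so unaffected
    }
    return $text;
}

my $text = 'I love #perl and #cpan';
# Hand-written data in the documented shape; real code would call
# extract_hashtags_with_indices($text) instead.
my $hashtags = [
    { hashtag => 'perl', indices => [7, 12] },
    { hashtag => 'cpan', indices => [17, 22] },
];
print wrap_entities($text, $hashtags, '<b>', '</b>'), "\n";
# prints: I love <b>#perl</b> and <b>#cpan</b>
```

Processing the entities from last to first is the design choice that matters here: each insertion lengthens the string, so working backwards keeps the remaining offsets valid.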
$screen_names = extract_mentioned_screen_names($text);
Returns an array reference of the screen name strings extracted from $text.
$screen_names_with_indices = extract_mentioned_screen_names_with_indices($text);
Returns an array reference of hash references, one for each screen name extracted from $text.
Each hash reference consists of screen_name (the screen name string) and indices (the range of the screen name).
$mentions_or_lists_with_indices = extract_mentions_or_lists_with_indices($text);
Returns an array reference of hash references, one for each screen name or list extracted from $text.
Each hash reference consists of screen_name (the screen name string) and indices (the range of the screen name or list). If the entry is a list, the hash reference also contains a list_slug item.
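To tell plain mentions from lists in the returned entries, a hypothetical helper like the one below (not part of Twitter::Text) can check list_slug. It treats both a missing and an empty list_slug as a plain mention; the sample data, including the /friends slug with its leading slash, is illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Twitter::Text): splits entries shaped like
# the result of extract_mentions_or_lists_with_indices into plain mentions
# and lists. An entry counts as a list only if list_slug is non-empty.
sub partition_mentions {
    my ($entries) = @_;
    my (@users, @lists);
    for my $entry (@$entries) {
        if (defined $entry->{list_slug} && length $entry->{list_slug}) {
            push @lists, $entry;
        }
        else {
            push @users, $entry;
        }
    }
    return (\@users, \@lists);
}

# Hand-written sample for the text '@alice and @bob/friends'.
my $entries = [
    { screen_name => 'alice', indices => [0, 6] },
    { screen_name => 'bob', list_slug => '/friends', indices => [11, 23] },
];
my ($users, $lists) = partition_mentions($entries);
printf "%d mention(s), %d list(s)\n", scalar @$users, scalar @$lists;
# prints: 1 mention(s), 1 list(s)
```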
$urls = extract_urls($text);
Returns an array reference of the URL strings extracted from $text.
$urls = extract_urls_with_indices($text, [\%options]);
Returns an array reference of hash references, one for each URL extracted from $text.
Each hash reference consists of url (the URL string) and indices (the range of the URL).
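Since indices gives the range of each URL within $text, the substring at that range should equal the url value. The hypothetical check below (not part of Twitter::Text) makes that relationship explicit, assuming an exclusive end offset; the sample entry is hand-written.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Twitter::Text): returns 1 if every entry's
# [start, end) range in "indices" covers exactly its "url" string in $text.
sub indices_match {
    my ($text, $urls) = @_;
    for my $entry (@$urls) {
        my ($start, $end) = @{ $entry->{indices} };
        # Reject ranges that are reversed or run past the end of $text.
        return 0 if $start < 0 || $end < $start || $end > length $text;
        return 0 unless substr($text, $start, $end - $start) eq $entry->{url};
    }
    return 1;
}

my $text = 'see https://example.com now';
# Hand-written entry in the documented shape.
my $urls = [ { url => 'https://example.com', indices => [4, 23] } ];
print indices_match($text, $urls) ? "ranges ok\n" : "range mismatch\n";
# prints: ranges ok
```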
$parse_result = parse_tweet($text, [\%options]);
The parse_tweet function takes a $text string and an optional \%options parameter, and returns a hash reference with the following values:
- weighted_length
  The overall length of the tweet, with code points weighted per the ranges defined in the configuration file.
- permillage
  The proportion (per thousand) of the weighted length relative to the maximum weighted length. A value > 1000 indicates input text that is longer than the allowable maximum.
- valid
  A boolean indicating whether the input text is a valid tweet.
- display_range_start, display_range_end
  Two Unicode code point indices identifying the inclusive start and exclusive end of the displayable content of the tweet.
- valid_range_start, valid_range_end
  Two Unicode code point indices identifying the inclusive start and exclusive end of the valid content of the tweet.
use Data::Dumper;
use Twitter::Text;
$result = parse_tweet('Hello world こんにちは世界');
print Dumper($result);
# $VAR1 = {
# 'weighted_length' => 33,
# 'permillage' => 117,
# 'valid' => 1,
# 'display_range_start' => 0,
# 'display_range_end' => 32,
# 'valid_range_start' => 0,
# 'valid_range_end' => 32,
# };
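The following is a simplified sketch of how weighted_length and permillage relate; it is not the parse_tweet implementation. It assumes weights modelled on twitter-text's v3 configuration (code points in 0-4351, 8192-8205, 8208-8223 and 8242-8247 weigh 1, all others weigh 2, maximum weighted length 280) and ignores URL shortening, emoji handling and normalization. Because it counts Perl characters, its input must be a decoded character string, which is why it declares use utf8. Encoding the example string above to UTF-8 bytes first makes this sketch report 33 and 117, the figures in the sample output, whereas the decoded string yields 26 and 92; when reproducing that example it may be worth checking whether $text is decoded.

```perl
use strict;
use warnings;
use utf8;   # so the literal below is a character string, not UTF-8 bytes

# Assumed weighting, modelled on twitter-text's v3 config: code points in
# these ranges weigh 100, all others 200, with a scale of 100 (so a "light"
# character counts 1 and a "heavy" one counts 2).
my @RANGES = ([0, 4351], [8192, 8205], [8208, 8223], [8242, 8247]);
my ($SCALE, $LIGHT_WEIGHT, $HEAVY_WEIGHT, $MAX_WEIGHTED) = (100, 100, 200, 280);

# Simplified sketch: ignores URLs, emoji and Unicode normalization.
sub weighted_length_sketch {
    my ($text) = @_;
    my $scaled = 0;
    for my $cp (map { ord } split //, $text) {
        # grep in scalar context counts the ranges containing $cp
        my $in_light_range = grep { $cp >= $_->[0] && $cp <= $_->[1] } @RANGES;
        $scaled += $in_light_range ? $LIGHT_WEIGHT : $HEAVY_WEIGHT;
    }
    my $weighted   = int($scaled / $SCALE);
    my $permillage = int($weighted * 1000 / $MAX_WEIGHTED);  # rounded down (assumption)
    return ($weighted, $permillage);
}

my ($len, $pm) = weighted_length_sketch('Hello world こんにちは世界');
print "weighted_length=$len permillage=$pm\n";
# prints: weighted_length=26 permillage=92
```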
$valid = is_valid_hashtag($hashtag);
Returns a boolean value indicating whether $hashtag is a valid hashtag.
$valid = is_valid_list($username_list);
Returns a boolean value indicating whether $username_list is a valid @username/list string.
$valid = is_valid_url($url, [unicode_domains => 1, require_protocol => 1]);
Returns a boolean value indicating whether $url is a valid URL.
If the unicode_domains argument is a truthy value, URLs whose domains contain Unicode characters are also accepted. (default: true)
If the require_protocol argument is a truthy value, the URL must include a protocol. (default: true)
$valid = is_valid_username($username);
Returns a boolean value indicating whether $username is a valid Twitter username.
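For a quick idea of the syntax these validators check, the rough regular-expression sketch below approximates is_valid_username, is_valid_list and is_valid_hashtag. The function names are hypothetical, and the rules are assumptions based on Ruby twitter-text conventions: a 1-20 character [A-Za-z0-9_] username after a leading @, a list slug that starts with a letter and is at most 25 characters long, and a hashtag containing at least one letter. The real functions handle more edge cases, so prefer them over these patterns.

```perl
use strict;
use warnings;

# Rough approximations only; NOT the Twitter::Text implementations.
# @ may also be the fullwidth U+FF20; # may also be the fullwidth U+FF03.
my $AT       = qr/[\@\x{FF20}]/;
my $USERNAME = qr/[A-Za-z0-9_]{1,20}/;
my $SLUG     = qr/[A-Za-z][A-Za-z0-9_\-]{0,24}/;

sub looks_like_username { return $_[0] =~ /\A$AT$USERNAME\z/ ? 1 : 0 }
sub looks_like_list     { return $_[0] =~ /\A$AT$USERNAME\/$SLUG\z/ ? 1 : 0 }

# Hashtag: letters, marks, digits or underscores after the #, with a
# lookahead requiring at least one letter or mark (all-digit tags fail).
sub looks_like_hashtag {
    return $_[0] =~ /\A[#\x{FF03}](?=[\p{Nd}_]*[\p{L}\p{M}])[\p{L}\p{M}\p{Nd}_]+\z/ ? 1 : 0;
}

for my $s ('@twitter', '@twitter/team', '#perl', '#2024') {
    printf "%-14s username=%d list=%d hashtag=%d\n", $s,
        looks_like_username($s), looks_like_list($s), looks_like_hashtag($s);
}
```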
twitter-text. The implementation of Twitter::Text (this library) is heavily based on the Ruby implementation of twitter-text.
https://developer.twitter.com/en/docs/counting-characters
Copyright (C) Twitter, Inc and other contributors
Copyright (C) utgwkk.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
utgwkk