Python tips (mostly 2.7)

Syntax tricks

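Print without a newline

# python 2.7 needs: from __future__ import print_function
for i in range(5):
    # carriage return only, no newline: each value overwrites the previous one
    print( i, end="\r" )

for i in range(5):
    # no carriage return and no newline: values are printed back to back
    print( i, end="" )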

eval: evaluate a string expression

eval() uses the current namespace by default; a custom dictionary can also be passed in

ns=dict(x=10,y=20)
eval("x+y" , ns )

eval can only handle expressions; for blocks of code (statements), use exec()
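
A minimal Python 3 sketch of the difference between the two, reusing the `ns` dictionary from above. The names `total` and `product` are made up for the illustration.

```python
# Namespace shared by both calls (same shape as the `ns` example above).
ns = dict(x=10, y=20)

# eval() evaluates one expression and returns its value.
total = eval("x + y", ns)

# exec() runs statements; any names they bind are stored in the namespace.
exec("z = x * y", ns)
product = ns["z"]

print(total, product)
```

Passing an explicit dictionary keeps the evaluated code from touching the caller's own variables.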

Get the memory address of variable x (in CPython, id() returns the object's address)

id(x)

for i, v enumeration

for i, item in enumerate(  iterable ):

min: return both the index and the value

>>> import operator
>>> scores = [30, 10,20 ]
>>> min(enumerate(scores ), key=operator.itemgetter(1))
(1, 10)

Sort a list

# .sort(), in-place sort, return None
autodances.sort( key = lambda x  :  x["time"] , reverse = False )

Use a cmp function (the cmp argument was removed in python3)

# python2 only
l.sort(cmp=lambda x,y:cmp( x.lower(), y.lower()  ))

Sort a dict with sorted

>>> d={"b":2, "a":3, "c":1}
>>> sorted(d)     # sort by key; returns a list of keys
['a', 'b', 'c']
>>> sorted(d.iteritems())     # sort by key; returns a list of (key, value) tuples
[('a', 3), ('b', 2), ('c', 1)]
>>> sorted(d.iteritems() , key=lambda x:x[1])    # sort by value; returns a list of tuples
[('c', 1), ('b', 2), ('a', 3)]
>>> sorted(d.iteritems() , key=lambda x:x[1] , reverse = True )
[('a', 3), ('b', 2), ('c', 1)]
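
For reference, a Python 3 sketch of the same three sorts. `iteritems()` does not exist in Python 3, so `items()` is used instead; the variable names are illustrative only.

```python
d = {"b": 2, "a": 3, "c": 1}

# Sort by key: returns a list of (key, value) tuples.
by_key = sorted(d.items())

# Sort by value, ascending and descending.
by_value = sorted(d.items(), key=lambda kv: kv[1])
by_value_desc = sorted(d.items(), key=lambda kv: kv[1], reverse=True)

print(by_key)
print(by_value)
print(by_value_desc)
```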

Iteration and combinations

Get to know the itertools module: it is very effective for iteration and combinatorics

>>> import itertools 
>>> perms = itertools.permutations([1,2,3])   # avoid naming it `iter`, which shadows the builtin
>>> list(perms) 
[(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]

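bisect module: keep a list sorted

This is a ready-made binary search implementation and a tool for fast insertion into a sorted sequence. After inserting an element with it, you do not need to call sort() again to keep the container sorted, which would be very expensive on long sequences.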

>>> import bisect 
>>> bisect.insort(list, element)

HTML / URL unescape

# python2: unescape HTML entities
import HTMLParser
html_parser = HTMLParser.HTMLParser()
txt = html_parser.unescape(html)

# python2: decode a percent-encoded URL
import urllib
urllib.unquote(a)

# python3
import urllib.parse
urllib.parse.unquote(url)

html escape

>>> import cgi
>>> print cgi.escape.__doc__
Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true, the quotation mark character (")
    is also translated.

>>> cgi.escape( '>' )
'&gt;'
>>> cgi.escape( '"' , True )
'&quot;'

Tuples ()

Faster than lists, and can be used as dictionary keys

The core of introspection: the getattr function

getattr(obj, name [ ,  default_method_return_if_not_exist ] )

example:

for i in  dir( obj ):
    method = getattr( obj, i   )
    print i , method

callable

methodList = [method for method in dir(object) if callable(getattr(object, method))]

dict get

Get a value from a dictionary

if d.has_key('key'):
    print d['key']
else:
    print 'not found'

This can be simplified to:

print d.get('key', 'not found')

dict setdefault

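When inserting a key-value pair, initialize the key with a default value if it is missing (typically used when the value is a list or dict)

index = {}   # word -> list of page numbers
def addword2dict(word, pagenumber): 
    index.setdefault(word, []).append(pagenumber)

Or use defaultdict directly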

from collections import defaultdict
d = defaultdict( dict )  # default value is empty dict
d = defaultdict( lambda: 2 )  # default value is 2
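
The two approaches side by side, as a small sketch. The `pages` data and the names `index_a` and `index_b` are invented for the illustration.

```python
from collections import defaultdict

pages = [("python", 3), ("dict", 7), ("python", 12)]

# Plain dict + setdefault: create the list on first use, then append.
index_a = {}
for word, page in pages:
    index_a.setdefault(word, []).append(page)

# defaultdict(list): the missing-key case is handled by the factory.
index_b = defaultdict(list)
for word, page in pages:
    index_b[word].append(page)

print(index_a)
print(dict(index_b))
```

One difference to keep in mind: merely reading `index_b["missing"]` creates that key, while `index_a.get("missing")` does not.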

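dict intersection

Find the keys that two dictionaries have in common

print "Intersects:", [k for k in some_dict if k in another_dict]

A faster variant (python2 only, has_key was removed in python3):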

print "Intersects:", filter(another_dict.has_key, some_dict.keys())

Swap dict keys and values

>>> m = {"a":1,"b":2,"c":3}
>>> dict( zip( m.values(), m.keys() ) )
{1: 'a', 2: 'b', 3: 'c'}

or

>>> {v:k for k,v in m.iteritems() }
{1: 'a', 2: 'b', 3: 'c'}

convert a list to dict

>>> a = [3,1,2,4]
>>> dict(  zip(  *[iter( a )] *2  )  )
{3: 1, 2: 4}

or, more simply

i = iter(a)
dict(zip(i, i))   # you must use a single iterator

python2: check whether a value is unicode

# python2
isinstance(u'a', unicode)

All local variables inside a function

Python has a locals() function, which returns a dictionary of the local variables within the current function.

Print the name and value of each member of an object

>>> g = lambda m: '\n'.join([ '%s=%s'%(k, repr(v)) for k, v in m.__dict__.iteritems() ])
>>> g(obj)

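Python underscore variables

Core style: avoid starting variable names with an underscore.

_xxx
- not imported by 'from module import *'
- "protected" variable: by convention, only the class and its subclasses should access it (Python does not enforce this)
__xxx
- private variable name inside a class
- "private" member: the name is mangled to _ClassName__xxx, so it is meant for the class itself, and subclasses cannot reach it under the original name
__xxx__
- names defined by the system
- identifiers reserved for Python's special methods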
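
A small sketch of how the `_xxx` and `__xxx` conventions behave at runtime. The classes `Base` and `Child` are hypothetical and exist only for this illustration.

```python
class Base:
    def __init__(self):
        self._protected = 1   # convention only, still freely accessible
        self.__private = 2    # name-mangled: stored as _Base__private


class Child(Base):
    def peek(self):
        # Inside Child, self.__private would be mangled to _Child__private
        # and raise AttributeError, so the mangled Base name is used.
        return self._protected, self._Base__private


c = Child()
names = sorted(vars(c))
print(names)
print(c.peek())
```

So the double underscore is a renaming scheme that avoids accidental clashes in subclasses rather than real access control.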

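re.sub: a group reference '\number' followed by literal digits

re.sub(r'(foo)', r'\1123', 'foobar')      # ambiguous: not read as group 1 followed by "123"
=>
re.sub(r'(foo)', r'\g<1>123', 'foobar')   # \g<1> marks where the group number ends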
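
A runnable check of the `\g<...>` form, as a sketch. The named-group variant is an addition for illustration; `word` is an arbitrary group name.

```python
import re

# \g<1> ends the group reference explicitly, so "123" is literal text.
result = re.sub(r"(foo)", r"\g<1>123", "foobar")

# The same syntax works with a named group.
named = re.sub(r"(?P<word>foo)", r"\g<word>123", "foobar")

print(result, named)
```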

call super class constructor

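def __init__(self) :
    # MyClass stands for the class being defined (placeholder name).
    # Avoid super(self.__class__, self): it recurses forever as soon as the class is subclassed.
    super( MyClass , self).__init__()   # python3: super().__init__()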

numpy argmax tie breaking

import numpy as np
b = np.array( [0,1,1] )
np.random.choice(np.where(b == b.max())[0])

or

np.random.choice(np.flatnonzero(b == b.max()))

b == b.max() returns a boolean array that is True where an item equals the maximum and False elsewhere.
flatnonzero() does two things: it drops the False values (the nonzero part) and returns the indices of the True values. In other words, you get an array of the indices of all items that match the max value.
Finally, np.random.choice picks one of these indices at random.

Number base conversion

Decimal number => base 2, 8, 16 string

>>> bin(123) # 2
'0b1111011'
>>> oct(18) # 8 (python3 output shown; python2 prints '022')
'0o22'
>>> hex(10) # 16
'0xa'

Base 2, 8, 16 string ==> decimal number

>>> int('022',8) 
18

Format a number as a hex string

>>> "%x" % 108
'6c'
>>>
>>> "%X" % 108
'6C'
>>>
>>> "%#X" % 108
'0X6C'
>>>
>>> "%#x" % 108
'0x6c'
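
The builtin format() covers the same conversions without the `0b`/`0o`/`0x` prefixes, and int() parses them back. A brief sketch using the value 108 from above; the variable names are illustrative.

```python
n = 108

as_bin = format(n, "b")       # binary digits, no prefix
as_oct = format(n, "o")       # octal digits, no prefix
as_hex = format(n, "#06x")    # hex with 0x prefix, zero-padded to width 6

# int() with an explicit base parses the strings back to a number.
round_trip = (int(as_bin, 2), int(as_oct, 8), int(as_hex, 16))

print(as_bin, as_oct, as_hex, round_trip)
```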

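String handling

List of ASCII codes -> string

A good one

# python2 only: string.joinfields does not exist in python3
import string
def f6(list):
    return string.joinfields(map(chr, list), "")

The best one

import array
def f7(list):
    # python3: use .tobytes(); .tostring() was removed in 3.9
    return array.array('B', list).tostring()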

String encoding

python2 unicode/str convert

string -> decode -> unicode
unicode -> encode -> string

char <-> ASCII code conversion

>>> print ord('a')
97
>>> print chr(97)
a

'2' == '\x32' == '\062'

python2: unichr <-> unicode string conversion

>>> print ord(u"我")
25105
>>> print unichr( 25105 )
我

python2 unicode -> special encoded string

unicodestring = u"Hello world"
utf8string = unicodestring.encode("utf-8")
asciistring = unicodestring.encode("ascii")
isostring = unicodestring.encode("ISO-8859-1")
utf16string = unicodestring.encode("utf-16")

python2 special encoded string -> unicode

plainstring1 = unicode(utf8string, "utf-8")
plainstring2 = unicode(asciistring, "ascii")
plainstring3 = unicode(isostring, "ISO-8859-1")
plainstring4 = unicode(utf16string, "utf-16")

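A unicode string can be written as u"\uxxxx". However, a "\uxxxx" string obtained from elsewhere (for example read from a file) cannot be used as unicode directly; convert it with "\uxxxx".decode("unicode-escape"). Note that xxxx must be exactly 4 hex digits, zero-padded if shorter.

Conversely, uni.encode("unicode-escape") gives the "\uxxxx" form of the string.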
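
In Python 3, str has no decode(), so the round trip goes through bytes. A sketch under that assumption, using the character 我 (code point 25105, hex 6211) from the earlier example:

```python
# Six literal characters: backslash, u, 6, 2, 1, 1.
escaped = "\\u6211"

# bytes -> str, interpreting the \uXXXX escape.
text = escaped.encode("ascii").decode("unicode_escape")

# str -> escaped form again.
back = text.encode("unicode_escape").decode("ascii")

print(len(escaped), text, back)
```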

convert '\\n' to '\n'

>>> "\\n"
'\\n'
>>> "\\n".decode('string_escape')
'\n'

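Handling Chinese text

Change the script's default encoding

Sometimes print raises a UnicodeError when printing certain unicode characters (python2-only workaround):

import sys
reload(sys)
sys.setdefaultencoding('utf8')

python 2.7: write a file containing Chinese characters

import codecs
fp = codecs.open( target_sheet_name + '.txt'  , "w", "utf-8")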
fp.write(jsonObj )
fp.close()

Json dump, indent + sort keys

json.dump( obj, fp, ensure_ascii=False , separators=(',',':') , indent=4, sort_keys=True  )

python2: get the length of a Chinese string

Decode to unicode first; len() on a byte string counts bytes, not characters

unicode_string = byte_string.decode("utf-8")
print len(unicode_string)

Encoding and hashing

base64

import base64
# base 64 decode
data = base64.b64decode( data ) 
# base64 encode
result_data = base64.b64encode( result_data)

md5

>>> import md5
>>> m = md5.new()
>>> m.update("Nobody inspects")
>>> m.update(" the spammish repetition")
>>> m.digest()
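
The standalone `md5` module is Python 2 only. A sketch of the same incremental hashing with `hashlib`, which is available in both Python 2.5+ and Python 3 (in Python 3 the input must be bytes):

```python
import hashlib

# Incremental hashing: feed the data in pieces.
m = hashlib.md5()
m.update(b"Nobody inspects")
m.update(b" the spammish repetition")
incremental = m.hexdigest()

# One-shot hashing of the concatenated data gives the same digest.
one_shot = hashlib.md5(b"Nobody inspects the spammish repetition").hexdigest()

print(incremental == one_shot)
```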

CRC32 IEEE

# python (on python2 the result may be negative; mask with & 0xffffffff for the unsigned value)
import binascii
binascii.crc32(b"hello world") & 0xffffffff

# go
import "hash/crc32"

crc32.ChecksumIEEE( []byte("hello world") )

Base58

# python (third-party base58 package)
import base58
base58.b58encode( raw_str )

# go
import "github.com/btcsuite/btcutil/base58"
decoded := base58.Decode(b58_str)

Misc

try/except: print the error

import traceback
traceback.print_exc()

or

import sys
s=sys.exc_info()
print "Error '%s' happened on line %d" % (s[1],s[2].tb_lineno)
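
Both calls only make sense inside an `except` block. A Python 3 sketch that puts them in context; `risky` is a hypothetical function that always fails.

```python
import sys
import traceback


def risky():
    return 1 / 0   # always raises ZeroDivisionError


try:
    risky()
except ZeroDivisionError:
    # exc_info() returns (type, value, traceback) for the active exception.
    exc_type, exc_value, tb = sys.exc_info()
    summary = "Error '%s' happened on line %d" % (exc_value, tb.tb_lineno)
    # format_exc() returns the text that print_exc() would print.
    details = traceback.format_exc()

print(summary)
print(details)
```

`tb.tb_lineno` here is the line in the frame that caught the exception (the `risky()` call), not the line inside `risky` where the division happened; the full traceback text shows both.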

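Python parallel task tricks

Use a map with concurrency support.
multiprocessing.dummy is a complete copy of the multiprocessing API.
The only difference is that multiprocessing uses processes, while dummy uses threads.
In short: choose multiprocessing.dummy (threads) for IO-bound tasks and multiprocessing (processes) for CPU-bound tasks.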

from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool( poolsize )
results = pool.map( func ,  param_set_list  )
pool.close()
pool.join()
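
A self-contained sketch of the thread-pool map. The worker `word_length` and the input list are placeholders standing in for `func` and `param_set_list`; a real IO-bound worker would do a network or disk call instead.

```python
from multiprocessing.dummy import Pool as ThreadPool


def word_length(word):
    # Placeholder worker; imagine a download or file read here.
    return len(word)


words = ["alpha", "be", "gamma", "d"]

pool = ThreadPool(4)              # 4 worker threads
try:
    # map() blocks until all results are ready and keeps input order.
    lengths = pool.map(word_length, words)
finally:
    pool.close()
    pool.join()

print(lengths)
```

Switching to processes is a one-line change (`from multiprocessing import Pool`), though on platforms that spawn processes the pool creation then has to sit under `if __name__ == "__main__":`.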

profile

python -m cProfile  xxx.py

or

import profile
profile.run( 'func_name()' )   # the argument is a statement to execute, so include the call parentheses

Force floating-point (true) division

>>> from __future__ import division
>>> 1/2
0.5

float -> IEEE 754

online convert

# use numpy
>>> import numpy as np
>>> # width=32 keeps the leading zero bits that bin() would drop
>>> b = np.binary_repr( np.float32( 0.15625 ).view(np.int32), width=32 )
>>> b
'00111110001000000000000000000000'
>>> b[0], b[1:9], b[9:]   # sign, exponent, mantissa
('0', '01111100', '01000000000000000000000')
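
The same bit layout can be obtained with only the standard library. A sketch using `struct`; the helper name `float32_bits` is made up for this example.

```python
import struct


def float32_bits(value):
    """Return (sign, exponent, mantissa) bit strings of an IEEE 754 float32."""
    # Pack as big-endian float32, then reinterpret the 4 bytes as an unsigned int.
    (as_int,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(as_int, "032b")      # always 32 characters, leading zeros kept
    return bits[0], bits[1:9], bits[9:]


sign, exponent, mantissa = float32_bits(0.15625)
print(sign, exponent, mantissa)
```

For 0.15625 the exponent field is 124, that is 2**(124 - 127) = 2**-3, and the mantissa encodes 1.25, giving 1.25 * 0.125 = 0.15625.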

int -> Binary

>>> np.binary_repr( 200, width=8)
'11001000'
>>> np.binary_repr( -1, width=8)
'11111111'

Read a specific line of a file

import linecache
# thefilepath            path to the file
# desired_line_number    integer, the line to read (1-based)
theline = linecache.getline(thefilepath, desired_line_number)

File modification / creation time

import os, time
time.ctime(os.stat( "d:/learn/flash.txt" ).st_mtime)   # file modification time
time.ctime(os.stat( "d:/learn/flash.txt" ).st_ctime)   # creation time on Windows; on Unix, the last metadata change

python: write to a read-only file

Before reading:

os.chmod(_path,  stat.S_IREAD)

Before writing:

os.chmod(_path, stat.S_IWRITE | stat.S_IREAD)

uninstall files installed via `python setup.py install`

python ./setup.py install --record install.txt

cat install.txt | xargs [sudo] rm -rf

enter interactive mode after executing a python file

import code 

...
code.interact(local=locals())

add python module search path

import os, sys
src_path = os.path.dirname( os.path.abspath(  __file__ ) )
# add parent folder as search path
sys.path.append( os.path.normpath( os.path.join( src_path , ".." )) )

open a file for both reading and writing

with open(filename, "r+") as f:
    data = f.read()
    f.seek(0)
    f.write(output)
    f.truncate()

read a large file

For a large file, iterate over the file object line by line; do NOT use readlines(), which loads every line into memory at once.

with open( ... ) as fp:
  for line in fp:
    pass  # handle `line` here
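
A runnable sketch of the line-by-line pattern. The helper `count_matching_lines` and the temporary sample file are invented for the demonstration.

```python
import os
import tempfile


def count_matching_lines(path, needle):
    """Count lines containing `needle` without loading the whole file."""
    count = 0
    with open(path) as fp:
        for line in fp:            # the file object yields one line at a time
            if needle in line:
                count += 1
    return count


# Small sample file standing in for a large log.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("ok\nerror: disk\nok\nerror: net\n")

matches = count_matching_lines(tmp.name, "error")
os.remove(tmp.name)
print(matches)
```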

Basics of datetime and time

import datetime as dt
import time as tm

# time returns the current time in seconds since the Epoch. (January 1st, 1970)
tm.time()
# 1511154150.7125366

# Convert the timestamp to datetime.
dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow
# datetime.datetime(2017, 11, 20, 5, 3, 1, 695393)

# Handy datetime attributes:
dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second
# (2017, 11, 20, 5, 3, 1)

# timedelta is a duration expressing the difference between two dates.
delta = dt.timedelta(days = 100) # create a timedelta of 100 days
delta
# datetime.timedelta(100)

# date.today returns the current local date.
today = dt.date.today()
today - delta # the date 100 days ago
# datetime.date(2017, 8, 12)
today > today-delta # compare dates
# True

timestamp to readable date

# here, `t` is in milliseconds
>>> datetime.datetime.utcfromtimestamp( t/1000 ).strftime('%Y-%m-%dT%H:%M:%SZ')
'2019-03-01T09:33:08Z'
>>> datetime.datetime.fromtimestamp( t/1000 ).strftime('%Y-%m-%dT%H:%M:%SZ')
'2019-03-01T17:33:08Z'   # local time (UTC+8 here), so the trailing 'Z' is misleading
>>> datetime.datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]+"Z" 
'2019-09-19 07:56:37.326Z'
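
In Python 3 a timezone-aware datetime avoids the UTC versus local-time ambiguity. A sketch assuming `t` is 1551432788000 ms, a value chosen here to match the first output above:

```python
from datetime import datetime, timezone

t = 1551432788000   # milliseconds since the epoch (assumed sample value)

# Attach UTC explicitly instead of relying on a naive datetime.
utc_dt = datetime.fromtimestamp(t / 1000, tz=timezone.utc)

stamp = utc_dt.strftime("%Y-%m-%dT%H:%M:%SZ")
with_ms = utc_dt.isoformat(timespec="milliseconds")

print(stamp)
print(with_ms)
```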

convert between seconds since the epoch and struct_time

| Use | From | To |
| --- | --- | --- |
| time.gmtime() | seconds since the epoch | struct_time in UTC |
| calendar.timegm() | struct_time in UTC | seconds since the epoch |

Example:

>>> calendar.timegm(  ( 2020,1,1,0,0,0 )  )
1577836800
>>> time.gmtime( 1577836800  )
time.struct_time(tm_year=2020, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)

Determining application path in a Python EXE generated by pyInstaller

# determine if application is a script file or frozen exe
if getattr(sys, 'frozen', False):
    application_path = os.path.dirname(sys.executable)
elif __file__:
    application_path = os.path.dirname(__file__)

config_path = os.path.join(application_path, config_name)

subprocess

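Using the subprocess module
- This module is fairly complex, and gives finer control over child processes
- Popen is non-blocking; call and check_call are blocking
- Popen(cmd , stdout=subprocess.PIPE , shell=True, cwd= arg) .stdout.readlines()
  - with stdout=subprocess.PIPE, readlines() waits for the command to finish before returning, and everything the child script prints can be read from stdout
- Some commands do not write their results to standard stdout; in that case:
  - res = subprocess.Popen( cmd , stdout=subprocess.PIPE , stdin=subprocess.PIPE, stderr=subprocess.STDOUT , shell=True ).stdout.read()
- TODO: make subprocess Popen run the shell in a specified directory
  - print Popen(cmd , stdout=subprocess.PIPE , shell=True, cwd= arg) ?
- Note: if you do not need pipe communication, the recommended usage is: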
```
subprocess.call( cmd.split() +  sys.argv[1:]  , stderr=subprocess.STDOUT ,shell=True )
```
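- shell=True runs the command in a shell, so that environment variables and similar settings are picked up. On POSIX, pass cmd as a single string when shell=True; if a list is given, only its first item is the command and the remaining items become arguments to the shell itself
- To get the exit code: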
```
child = subprocess.Popen( cmd , stdout=subprocess.PIPE , stdin=subprocess.PIPE, stderr=subprocess.STDOUT , shell=True )
streamdata = child.communicate()[0]
rc = child.returncode
```

Python3 subprocess.run

subprocess.run( [...] )

The parent process blocks until the child process finishes.
Pass capture_output=True (Python 3.7+) to collect the output.

>>> import subprocess
>>> result = subprocess.run( ["rm", "does not exist"], capture_output=True )
>>> result
CompletedProcess(args=['rm', 'does not exist'], returncode=1, stdout=b'', stderr=b'rm: does not exist: No such file or directory\n')
>>> print ( result.stdout )
b''
>>> print ( result.stderr.decode() ) # decode() defaults to utf-8
rm: does not exist: No such file or directory
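
A portable sketch of the same idea that does not depend on `rm`: it runs a tiny child Python process through `sys.executable`. The child's one-line program is made up for the demonstration, and `text=True` (Python 3.7+) is used so stdout and stderr come back as str rather than bytes.

```python
import subprocess
import sys

# Child program: print a line, then exit with status 3.
child_code = "import sys; print('out'); sys.exit(3)"

result = subprocess.run(
    [sys.executable, "-c", child_code],   # argument list, no shell involved
    capture_output=True,                  # collect stdout and stderr
    text=True,                            # decode output to str
)

print(result.returncode)
print(repr(result.stdout))
```

Adding `check=True` would raise `subprocess.CalledProcessError` for a non-zero exit status instead of returning normally.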

Use python to parse a JSON string returned by a curl command

result=`curl --silent  ... | python -c "import json,sys;obj=json.load(sys.stdin);print obj['anykey'];"`
echo result: $result

memoize decorator

implementation in Python Decorator Library

Since Python 3.2: functools.lru_cache

By default it keeps only the 128 most recently used results; set maxsize to None for an unbounded cache whose entries are never evicted:

import functools

@functools.lru_cache(maxsize=None)
def fib(num):
    if num < 2:
        return num
    else:
        return fib(num-1) + fib(num-2)
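
Where `lru_cache` is not available (Python 2.7), a hand-written decorator does the same job. This is a minimal sketch, not the Python Decorator Library implementation; it assumes all arguments are positional and hashable.

```python
import functools


def memoize(func):
    """Cache results keyed by the positional arguments (unbounded cache)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


@memoize
def fib(num):
    return num if num < 2 else fib(num - 1) + fib(num - 2)


print(fib(30))
```

Because the recursive calls go through the decorated name, each `fib(n)` is computed once, which turns the exponential recursion into a linear one.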
