python_tips_1.md

Python tips (mostly 2.7)


Syntax tricks

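Print without a newline

# Python 3 syntax (in Python 2: from __future__ import print_function)
for i in range(5):
    # carriage return only, no newline: each value overwrites the previous one
    print( i, end="\r" )
4
for i in range(5):
    # no carriage return, no newline
    print( i, end="" )
01234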

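eval: evaluate a string expression

eval() uses the current namespace by default; a custom dictionary can also be passed in.

ns=dict(x=10,y=20)
eval("x+y" , ns )

eval() only handles expressions; for blocks of statements, use exec().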
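A minimal sketch of the difference between the two, written in Python 3 syntax. The names `ns`, `code`, and `total` are only illustrative:

```python
# eval() evaluates a single expression and returns its value.
ns = dict(x=10, y=20)
total = eval("x + y", ns)

# exec() runs statements (assignments, loops, defs) and returns None;
# whatever the code defines is left in the namespace that was passed in.
code = """
z = 0
for i in range(3):
    z += x * i
"""
exec(code, ns)
print(total, ns["z"])
```

Passing an explicit dictionary keeps the evaluated code away from the caller's own variables, but it is not a security boundary for untrusted input.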

Get the memory address of variable x (in CPython, id() returns the address)

id(x) 

Enumerate with for i, v

for i, item in enumerate(  iterable ):
    pass  # i is the index, item is the element

min: return both the value and its index

>>> import operator
>>> scores = [30, 10,20 ]
>>> min(enumerate(scores ), key=operator.itemgetter(1))
(1, 10)

Sorting a list

# .sort() sorts in place and returns None
autodances.sort( key = lambda x  :  x["time"] , reverse = False )

Using a cmp function (note: the cmp argument was removed in Python 3)

# python2 only
l.sort(cmp=lambda x,y:cmp( x.lower(), y.lower()  ))
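In Python 3 the usual replacements are a key function or `functools.cmp_to_key`. A small sketch of both follows; `compare_ignore_case` and `words` are made-up names for illustration:

```python
from functools import cmp_to_key

def compare_ignore_case(x, y):
    # Return a negative, zero, or positive number, like Python 2's cmp().
    a, b = x.lower(), y.lower()
    return (a > b) - (a < b)

words = ["banana", "Apple", "cherry"]

# Wrap the old-style comparison so it can be used as a key.
by_cmp = sorted(words, key=cmp_to_key(compare_ignore_case))

# A plain key function is usually simpler when one exists.
by_key = sorted(words, key=str.lower)
```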

Sorting a dict with sorted

# Python 2; in Python 3 use d.items() instead of d.iteritems()
>>> d={"b":2, "a":3, "c":1}
>>> sorted(d)     # sort by key, returns a list of keys
['a', 'b', 'c']
>>> sorted(d.iteritems())     # sort by key, returns a list of (key, value) tuples
[('a', 3), ('b', 2), ('c', 1)]
>>> sorted(d.iteritems() , key=lambda x:x[1])    # sort by value, returns a list of tuples
[('c', 1), ('b', 2), ('a', 3)]
>>> sorted(d.iteritems() , key=lambda x:x[1] , reverse = True )
[('a', 3), ('b', 2), ('c', 1)]

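Iteration and combinations

Get to know the itertools module: it is very efficient for iteration and combinatorics.

>>> import itertools
>>> perms = itertools.permutations([1,2,3])   # avoid shadowing the built-in iter
>>> list(perms)
[(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]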

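Keeping a list sorted with the bisect module

bisect is a ready-made binary search implementation and a tool for fast insertion into a sorted sequence. After inserting an element with insort() there is no need to call sort() again to keep the container ordered, which would be expensive on long sequences.

>>> import bisect
>>> bisect.insort(sorted_list, element)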
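A short sketch of both uses, insertion and lookup. The list `scores` is an invented example and must already be sorted before `insort` is called:

```python
import bisect

scores = [10, 20, 30]                  # must already be sorted
bisect.insort(scores, 25)              # insert while keeping the order

# Binary search: index of the first item >= 20
pos = bisect.bisect_left(scores, 20)
print(scores, pos)
```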

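HTML / URL unescape

# Python 2: unescape HTML entities
import HTMLParser
html_parser = HTMLParser.HTMLParser()
txt = html_parser.unescape(html)
# Python 2: decode a percent-encoded URL
import urllib
urllib.unquote(a)
# Python 3
import urllib.parse
urllib.parse.unquote(url)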

html escape

>>> import cgi     # Python 2; cgi.escape was removed in Python 3.8
>>> print cgi.escape.__doc__
Replace special characters "&", "<" and ">" to HTML-safe sequences.
    If the optional flag quote is true, the quotation mark character (")
    is also translated.

>>> cgi.escape( '>' )
'&gt;'
>>> cgi.escape( '"' , True )
'&quot;'
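On Python 3 the standard-library `html` module covers both directions. A brief sketch with made-up sample strings; note that `html.escape` quotes `"` and `'` by default, unlike `cgi.escape`:

```python
import html

escaped = html.escape('<a href="x">')       # quote=True is the default here
unquoted = html.escape('"', quote=False)    # leave quotation marks alone
restored = html.unescape("&lt;b&gt; &amp; &quot;")
print(escaped, unquoted, restored)
```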

Tuples ()

Faster than lists, and usable as dictionary keys.

getattr: the core of introspection

getattr(obj, name [ ,  default_value_if_attribute_missing ] )

example:

for i in  dir( obj ):
    method = getattr( obj, i   )
    print i , method

callable

methodList = [method for method in dir(object) if callable(getattr(object, method))]

dict get

Get a value from a dict:

if d.has_key('key'):    # Python 2 only; in Python 3 use: if 'key' in d
    print d['key']
else:
    print 'not found'

This can be simplified to:

print d.get('key', 'not found')

dict setdefault

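When inserting a key-value pair, initialize the key with a default value if it does not exist yet (typically used when the value is a list or dict).

index = {}   # a dict instance; calling setdefault on the dict type itself fails
def addword2dict(word, pagenumber): 
    index.setdefault(word, []).append(pagenumber)

Or use defaultdict directly: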

from collections import defaultdict
d = defaultdict( dict )  # default value is empty dict
d = defaultdict( lambda: 2 )  # default value is 2
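A small sketch that builds the same word-to-pages index both ways. The sample `entries` and the names `index` and `plain` are invented for illustration:

```python
from collections import defaultdict

entries = [("eval", 3), ("dict", 7), ("eval", 9)]

index = defaultdict(list)          # missing keys start as an empty list
for word, page in entries:
    index[word].append(page)

# The same result with a plain dict and setdefault
plain = {}
for word, page in entries:
    plain.setdefault(word, []).append(page)
```

One difference worth keeping in mind: merely reading a missing key from a `defaultdict` creates it, while `setdefault` only creates a key when it is called.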

dict intersection

Find the keys that two dicts have in common

print "Intersects:", [k for k in some_dict if k in another_dict]

Faster (Python 2 only, has_key no longer exists in Python 3):

print "Intersects:", filter(another_dict.has_key, some_dict.keys())
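In Python 3, `has_key` is gone, but the key views returned by `keys()` support set operations directly. A sketch with two invented dicts:

```python
some_dict = {"a": 1, "b": 2, "c": 3}
another_dict = {"b": 20, "c": 30, "d": 40}

# keys() views behave like sets, so & gives the common keys
common = some_dict.keys() & another_dict.keys()
print("Intersects:", sorted(common))
```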

Swap dict keys and values

>>> m = {"a":1,"b":2,"c":3}
>>> dict( zip( m.values(), m.keys() ) )
{1: 'a', 2: 'b', 3: 'c'}

or

>>> {v:k for k,v in m.iteritems() }
{1: 'a', 2: 'b', 3: 'c'}

convert a list to dict

>>> a = [3,1,2,4]
>>> dict(  zip(  *[iter( a )] *2  )  )
{3: 1, 2: 4}

or, more simply

i = iter(a)
dict(zip(i, i))   # you must use a single iterator 

Python 2: check whether a value is unicode

# python2
isinstance(u'a', unicode)

All local variables inside a function

Python has a locals() function, which returns a dictionary of the local variables in the current function.

Print the names and values of an object's members

>>> g = lambda m: '\n'.join([ '%s=%s'%(k, repr(v)) for k, v in m.__dict__.iteritems() ])
>>> g(obj)

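Python underscore names

Core style: avoid starting ordinary variable names with an underscore.

  • _xxx
    • not imported by 'from module import *'
    • "protected" by convention: meant to be accessed only by the class itself and its subclasses
  • __xxx
    • private name inside a class
    • name-mangled to _ClassName__xxx, so it is meant to be accessed only by the class itself, not even by subclasses
  • __xxx__
    • names defined by the system
    • reserved for Python's special methods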

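re.sub: group reference '\number' followed by literal digits

re.sub(r'(foo)', r'\1123', 'foobar')      # ambiguous: the digits after \1 are read as part of the escape
=>
re.sub(r'(foo)', r'\g<1>123', 'foobar')   # \g<1> is group 1, then the literal "123"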
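A runnable sketch of the `\g<...>` form. The named-group variant is an extra illustration, and `word` is an arbitrary group name:

```python
import re

# \g<1> ends the group reference explicitly, so "123" stays literal text.
fixed = re.sub(r'(foo)', r'\g<1>123', 'foobar')

# The same syntax works for named groups.
named = re.sub(r'(?P<word>foo)', r'\g<word>123', 'foobar')
print(fixed, named)
```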

call super class constructor

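class Child(Base):   # Child and Base are placeholder names
    def __init__(self):
        # not super(self.__class__, self): that recurses forever once Child is subclassed
        super(Child, self).__init__()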

numpy argmax tie breaking

b = np.array( [0,1,1] )
np.random.choice(np.where(b == b.max())[0])

or

np.random.choice(np.flatnonzero(b == b.max()))
  • b == b.max() returns a boolean array: True where an item equals the max, False elsewhere
  • flatnonzero() does two things: it ignores the False values (the nonzero part) and returns the indices of the True values. In other words, you get an array of the indices of the items matching the max value
  • Finally, you pick a random index among these


Number base conversion

Decimal number => base 2, 8, 16 string

>>> bin(123) # 2
'0b1111011'
>>> oct(18) # 8 (Python 3 output; Python 2 prints '022')
'0o22'
>>> hex(10) # 16
'0xa'

Base 2, 8, 16 string => decimal number

>>> int('022',8) 
18

Format a number as a hexadecimal string

>>> "%x" % 108
'6c'
>>>
>>> "%X" % 108
'6C'
>>>
>>> "%#X" % 108
'0X6C'
>>>
>>> "%#x" % 108
'0x6c'

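String handling

List of ASCII codes -> string

good 1

import string
def f6(codes):
    # Python 2 only: string.joinfields does not exist in Python 3
    return string.joinfields(map(chr, codes), "")

the best 1

import array
def f7(codes):
    # array.tostring() was removed in Python 3.9; tobytes() replaces it
    return array.array('B', codes).tostring()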
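For Python 3, a possible equivalent of the two functions above; `codes_to_str` and `codes_to_str_join` are hypothetical names, and no speed comparison is implied:

```python
def codes_to_str(codes):
    # bytes() accepts an iterable of ints in range(256)
    return bytes(codes).decode("ascii")

def codes_to_str_join(codes):
    # the join/chr variant, closest in spirit to f6
    return "".join(map(chr, codes))

print(codes_to_str([104, 105]), codes_to_str_join([104, 105]))
```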

String encoding

Python 2 unicode/str conversion

  • str -> decode -> unicode
  • unicode -> encode -> str

Convert between a character and its ASCII code

>>> print ord('a')
97
>>> print chr(97)
a

'2' == '\x32' == '\062'

Python 2: convert between a code point (unichr) and a unicode string

>>> print ord(u"我")
25105
>>> print unichr( 25105 )
我

Python 2: unicode -> string in a specific encoding

unicodestring = u"Hello world"
utf8string = unicodestring.encode("utf-8")
asciistring = unicodestring.encode("ascii")
isostring = unicodestring.encode("ISO-8859-1")
utf16string = unicodestring.encode("utf-16")

Python 2: string in a specific encoding -> unicode

plainstring1 = unicode(utf8string, "utf-8")
plainstring2 = unicode(asciistring, "ascii")
plainstring3 = unicode(isostring, "ISO-8859-1")
plainstring4 = unicode(utf16string, "utf-16")

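A unicode string can be written as u"\uxxxx". However, a "\uxxxx" string obtained from elsewhere (a plain str containing a literal backslash) is not automatically unicode; convert it with "\uxxxx".decode("unicode-escape"). Note that xxxx must be exactly 4 hex digits, zero-padded if shorter.

Conversely, uni.encode("unicode-escape") produces the "\uxxxx" form of the string.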
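In Python 3 the same codec works on bytes rather than str. A minimal round-trip sketch, using U+6211 (the character 我 used elsewhere in these notes) as sample data:

```python
raw = b"\\u6211"                       # six ASCII bytes: \ u 6 2 1 1
text = raw.decode("unicode-escape")    # the single character U+6211
back = text.encode("unicode-escape")   # the escaped bytes again
print(len(raw), len(text))
```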

convert '\\n' (backslash + n) to '\n' (newline), Python 2

>>> "\\n"
'\\n'
>>> "\\n".decode('string_escape')
'\n'

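Handling Chinese text

Change the script's default encoding

  • Sometimes print raises a UnicodeError when printing certain unicode characters (Python 2 only)
import sys
reload(sys)
sys.setdefaultencoding('utf8') 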

Python 2.7: write a file containing Chinese characters

import codecs
fp = codecs.open( target_sheet_name + '.txt'  , "w", "utf-8")
fp.write(jsonObj )
fp.close()

Json dump, indent + sort keys

json.dump( obj, fp, ensure_ascii=False , separators=(',',':') , indent=4, sort_keys=True  )  

Python 2: get the length of a Chinese string

  • Convert it to unicode first
unicode_string = byte_string.decode("utf-8")
print len(unicode_string)

encrypt

base64

import base64
# base 64 decode
data = base64.b64decode( data ) 
# base64 encode
result_data = base64.b64encode( result_data)

md5

>>> import md5     # Python 2 only (deprecated in favor of hashlib)
>>> m = md5.new()
>>> m.update("Nobody inspects")
>>> m.update(" the spammish repetition")
>>> m.digest()
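The `md5` module does not exist in Python 3; `hashlib` offers the same update/digest interface on both versions. A sketch using the same sample text, with `digest_hex` and `one_shot` as illustrative names:

```python
import hashlib

m = hashlib.md5()
m.update(b"Nobody inspects")             # hashlib works on bytes
m.update(b" the spammish repetition")
digest_hex = m.hexdigest()               # 32 hex characters

# Feeding the data in pieces gives the same digest as hashing it at once.
one_shot = hashlib.md5(b"Nobody inspects the spammish repetition").hexdigest()
```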

CRC32 IEEE

# python
import binascii
binascii.crc32(b"hello world") & 0xffffffff   # mask: Python 2 may return a signed value
// go
import "hash/crc32"

crc32.ChecksumIEEE( []byte("hello world") )

Base58

# python
import base58
base58.b58encode( raw_str )
// go
import "github.com/btcsuite/btcutil/base58"
decoded := base58.Decode(b58_str)

Misc

Print the error in try/except

import traceback
traceback.print_exc()

or

import sys
s=sys.exc_info()
print "Error '%s' happened on line %d" % (s[1],s[2].tb_lineno)
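Both approaches placed inside an actual except block, as a Python 3 sketch. `risky` is a made-up function that fails on purpose:

```python
import sys
import traceback

def risky():
    return 1 / 0

try:
    risky()
except Exception:
    exc_type, exc_value, tb = sys.exc_info()
    summary = "Error '%s' happened on line %d" % (exc_value, tb.tb_lineno)
    details = traceback.format_exc()   # full traceback as a string, e.g. for logging
    print(summary)
```

`tb.tb_lineno` is the line inside the `try` block that was executing; the innermost failing line only appears in the full traceback.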

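Python parallel task tips

  • Use a map with built-in concurrency
  • multiprocessing.dummy is a complete copy of the multiprocessing API
  • The only difference is that multiprocessing uses processes while dummy uses threads
  • In short: for IO-bound tasks choose multiprocessing.dummy (threads); for CPU-bound tasks choose multiprocessing (processes)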
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool( poolsize )
results = pool.map( func ,  param_set_list  )
pool.close()
pool.join()
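A self-contained version of the pattern above. `fetch_length` is a hypothetical stand-in for an IO-bound call, and the pool size of 4 is arbitrary:

```python
from multiprocessing.dummy import Pool as ThreadPool

def fetch_length(text):
    # stand-in for an IO-bound call such as an HTTP request
    return len(text)

pool = ThreadPool(4)
try:
    # map blocks until all items are done; results keep the input order
    results = pool.map(fetch_length, ["a", "bb", "ccc"])
finally:
    pool.close()
    pool.join()
```

Swapping the import for `from multiprocessing import Pool` gives the process-based variant with the same calls; in that case the worker function has to be importable at module level.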

profile

python -m cProfile  xxx.py

or

import profile
profile.run ( 'func_name()' )   # the string is executed, so it must call the function

Force floating-point division (Python 2)

>>> from __future__ import division
>>> 1/2
0.5

float -> IEEE 754

online convert

# use numpy
>>> b = np.binary_repr( np.float32( 0.15625 ).view(np.int32), width=32 )   # width=32 keeps the leading zeros that bin() drops
>>> b
'00111110001000000000000000000000'
>>> sign, expo, mantissa = b[0], b[1:9], b[9:]
>>> sign, expo, mantissa
('0', '01111100', '01000000000000000000000')
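The same split can be done without numpy via the `struct` module. A sketch that assumes a 32-bit single-precision layout (1 sign bit, 8 exponent bits, 23 mantissa bits); `float32_bits` is a made-up helper name:

```python
import struct

def float32_bits(value):
    # Pack as a big-endian 32-bit float, then reinterpret the bytes as an unsigned int.
    (as_int,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(as_int, "032b")         # zero-padded, so leading zeros are kept
    return bits[0], bits[1:9], bits[9:]   # sign, exponent, mantissa

sign, expo, mantissa = float32_bits(0.15625)
unbiased_exponent = int(expo, 2) - 127    # the float32 exponent bias is 127
```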

int -> Binary

>>> np.binary_repr( 200, width=8)
'11001000'
>>> np.binary_repr( -1, width=8)
'11111111'

Read a specific line from a file

import linecache
# thefilepath            path of the file
# desired_line_number    integer, the line to read (1-based)
theline = linecache.getline(thefilepath, desired_line_number)

File modification / creation time

import   os,time 
time.ctime(os.stat( "d:/learn/flash.txt").st_mtime)   # modification time
time.ctime(os.stat( "d:/learn/flash.txt").st_ctime)   # creation time on Windows; on Unix, the last metadata change

Writing to a read-only file in Python

Before reading:

import os, stat
os.chmod(_path,  stat.S_IREAD)

Before writing:

os.chmod(_path, stat.S_IWRITE | stat.S_IREAD)

uninstall files installed via python setup.py install

python ./setup.py install --record install.txt

cat install.txt | xargs [sudo] rm -rf

enter interactive mode after executing a python file

import code 

...
code.interact(local=locals())

add python module search path

import os, sys
src_path = os.path.dirname( os.path.abspath(  __file__ ) )
# add parent folder as search path
sys.path.append( os.path.normpath( os.path.join( src_path , ".." )) )

open file with both reading and writing

with open(filename, "r+") as f:
    data = f.read()
    f.seek(0)
    f.write(output)
    f.truncate()

read a large file

For a large file, parse it line by line; do NOT use readlines(), which loads the whole file into memory.

with open( ... ) as fp:
    for line in fp:
        pass  # process one line at a time
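A runnable sketch of the line-by-line pattern. It writes a small temporary file first so that it is self-contained; the file contents and the `line_count` variable are invented for illustration:

```python
import os
import tempfile

# Build a small sample file so the sketch does not depend on an existing path.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

line_count = 0
with open(path) as fp:
    for line in fp:          # the file object yields one line at a time
        line_count += 1
os.remove(path)
```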

basis of Datetime and Time

import datetime as dt
import time as tm

# time returns the current time in seconds since the Epoch. (January 1st, 1970)
tm.time()
# 1511154150.7125366

# Convert the timestamp to datetime.
dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow
# datetime.datetime(2017, 11, 20, 5, 3, 1, 695393)

# Handy datetime attributes:
dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second
# (2017, 11, 20, 5, 3, 1)

# timedelta is a duration expressing the difference between two dates.
delta = dt.timedelta(days = 100) # create a timedelta of 100 days
delta
# datetime.timedelta(100)

# date.today returns the current local date.
today = dt.date.today()
today - delta # the date 100 days ago
# datetime.date(2017, 8, 12)
today > today-delta # compare dates
# True

seconds to readable date

# here, `t` is in milliseconds, hence the division by 1000
>>> datetime.datetime.utcfromtimestamp( t/1000 ).strftime('%Y-%m-%dT%H:%M:%SZ')
'2019-03-01T09:33:08Z'
>>> datetime.datetime.fromtimestamp( t/1000 ).strftime('%Y-%m-%dT%H:%M:%SZ')
'2019-03-01T17:33:08Z'   # local time
>>> datetime.datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]+"Z" 
'2019-09-19 07:56:37.326Z'

convert between seconds since the epoch and struct_time

| Use | From | To |
| --- | --- | --- |
| gmtime() | seconds since the epoch | struct_time in UTC |
| calendar.timegm() | struct_time in UTC | seconds since the epoch |

Example:

>>> calendar.timegm(  ( 2020,1,1,0,0,0 )  )
1577836800
>>> time.gmtime( 1577836800  )
time.struct_time(tm_year=2020, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)

Determining application path in a Python EXE generated by pyInstaller

# determine if application is a script file or frozen exe
if getattr(sys, 'frozen', False):
    application_path = os.path.dirname(sys.executable)
elif __file__:
    application_path = os.path.dirname(__file__)

config_path = os.path.join(application_path, config_name)

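subprocess

  • Use the subprocess module

    • This module is fairly complex and gives finer control over child processes
    • Popen is non-blocking; call and check_call are blocking
    • Popen(cmd , stdout=subprocess.PIPE , shell=True, cwd= arg) .stdout.readlines()
      • stdout=subprocess.PIPE is required for .stdout to be readable; readlines() waits until the command has finished, and everything the child script prints can be read from stdout
    • Some commands do not write their results to standard stdout; in that case redirect stderr into stdout:
      • res = subprocess.Popen( cmd , stdout=subprocess.PIPE , stdin=subprocess.PIPE, stderr=subprocess.STDOUT , shell=True ).stdout.read()
    • TODO: specify the shell's working path with subprocess Popen
      • print Popen(cmd , stdout=subprocess.PIPE , shell=True, cwd= arg) ?
    • Note: if no pipe communication is needed, the recommended usage is
      subprocess.call( cmd.split() +  sys.argv[1:]  , stderr=subprocess.STDOUT )
    • shell=True runs the command through a shell, which gives access to settings such as environment variables; pass the command as a single string in that case, because on POSIX only the first item of a list is treated as the command
    • If you need the exit code
      child = subprocess.Popen( cmd , stdout=subprocess.PIPE , stdin=subprocess.PIPE, stderr=subprocess.STDOUT , shell=True )
      streamdata = child.communicate()[0]
      rc = child.returncode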
  • Python3 subprocess.run

    subprocess.run( [...] )
    • the parent process blocks until the child process finishes.
    • pass capture_output=True to capture the output (Python 3.7+)
    >>> import subprocess
    >>> result = subprocess.run( ["rm", "does not exist"], capture_output=True )
    >>> result
    CompletedProcess(args=['rm', 'does not exist'], returncode=1, stdout=b'', stderr=b'rm: does not exist: No such file or directory\n')
    >>> print ( result.stdout )
    b''
    >>> print ( result.stderr.decode() ) # decode() defaults to UTF-8
    rm: does not exist: No such file or directory
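A portable sketch of `subprocess.run` that avoids depending on any particular shell command by launching the current interpreter (`sys.executable`) as the child. The one-line child script is invented for illustration:

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],   # argument list, no shell involved
    capture_output=True,                         # Python 3.7+
    text=True,                                   # decode stdout/stderr to str
)
output = result.stdout.strip()
returncode = result.returncode
```

Adding `check=True` makes `run` raise `CalledProcessError` on a non-zero exit code instead of returning it.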

Parse a JSON string returned by curl with Python (Python 2 print syntax)

result=`curl --silent  ... | python -c "import json,sys;obj=json.load(sys.stdin);print obj['anykey'];"`
echo result: $result

memoize decorator

  1. implementation in Python Decorator Library
  2. From python 3.2, functools.lru_cache
    • By default it caches only the 128 most recently used calls; set maxsize to None to make the cache unbounded, so entries are never evicted:
    import functools
    
    @functools.lru_cache(maxsize=None)
    def fib(num):
        if num < 2:
            return num
        else:
            return fib(num-1) + fib(num-2)
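For versions without `functools.lru_cache`, a hand-written decorator is short. The following is a minimal sketch, not the Python Decorator Library implementation; it assumes hashable positional arguments only, and `memoize`, `square`, and `calls` are illustrative names:

```python
import functools

def memoize(func):
    cache = {}  # maps positional-argument tuples to results

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

calls = []

@memoize
def square(n):
    calls.append(n)   # record real invocations to show the cache working
    return n * n

first, second = square(4), square(4)
```

The second call is served from the cache, so `calls` records only one real invocation.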