- Python tips (mostly 2.7)
- Syntax tricks
- Print without new line
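- eval: evaluate a string expression
- get the memory address of variable x
- for i, v enumeration
- min return both value and index
- sorting a list
- sorting a dict with sorted
- iteration and combinations
- keep a list sorted with the bisect module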
- url unescape
- html escape
- tuple ()
- getattr, the core of introspection
- callable
- dict get
- dict setdefault
- dict intersection
- dict key/value inversion
- convert a list to dict
- python2 unicode check
- all local variables inside a function
- print the name and value of each member of an object
- python underscore variables
- re.sub: group '\number' followed by normal digits
- call super class constructor
- numpy argmax tie breaking
- number base conversion
- character handling
- Chinese text handling
- encrypt
- Misc
- try - except: print the error
- python parallel task tricks
- profile
- force floating-point division
- float -> IEEE 754
- int -> Binary
- read a specific line of a file
- file modification/creation time
- writing a read-only file in python
- uninstall files installed via python setup.py install
- enter interact mode after executing a python file
- add python module search path
- open file with both reading and writing
- read a large file
- basis of Datetime and Time
- seconds to readable date
- convert between seconds since the epoch and struct_time
- Determining application path in a Python EXE generated by pyInstaller
- subprocess
- parse a JSON string fetched by curl with python
- memoize decorator
- Syntax tricks
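# python3 print function (python2: from __future__ import print_function)
for i in range(5):
    # carriage return only, no newline: each value overwrites the previous one
    print( i, end="\r" )
4
for i in range(5):
    # no carriage return, no newline
    print( i, end="" )
01234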
eval() uses the current namespace by default; you can also pass in a custom dict:
ns=dict(x=10,y=20)
eval("x+y" , ns )
eval can only handle expressions; for a block of code (statements), use exec().
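A minimal Python 3 sketch of the difference between eval() and exec(). The names `ns` and `total` are only illustrative.

```python
# eval() evaluates a single expression and returns its value
ns = dict(x=10, y=20)
result = eval("x + y", ns)
print(result)  # 30

# exec() runs statements; names they define end up in the namespace dict
exec("total = x * y", ns)
print(ns["total"])  # 200
```

Passing an explicit dict keeps the evaluated code away from your real globals, but neither function is safe to use on untrusted input.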
id(x)
for i, item in enumerate( iterable ):
>>> import operator
>>> scores = [30, 10,20 ]
>>> min(enumerate(scores ), key=operator.itemgetter(1))
(1, 10)
# .sort(), in-place sort, return None
autodances.sort( key = lambda x : x["time"] , reverse = False )
Use the cmp argument (PS: removed in Python 3; use key instead):
# python2 only
l.sort(cmp=lambda x,y:cmp( x.lower(), y.lower() ))
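Since the cmp argument no longer exists in Python 3, here is a hedged sketch of the two usual replacements: a key function, or functools.cmp_to_key for porting an old comparison function. The list `names` and the helper `compare_lower` are made up for illustration.

```python
import functools

names = ["bob", "Alice", "carol"]

# preferred: a key function
by_key = sorted(names, key=str.lower)

# porting an old cmp-style function with functools.cmp_to_key
def compare_lower(x, y):
    # return negative / zero / positive, like Python 2's cmp()
    a, b = x.lower(), y.lower()
    return (a > b) - (a < b)

by_cmp = sorted(names, key=functools.cmp_to_key(compare_lower))
print(by_key)  # ['Alice', 'bob', 'carol']
print(by_cmp)  # ['Alice', 'bob', 'carol']
```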
>>> d={"b":2, "a":3, "c":1}
>>> sorted(d) # sort by key, returns a list of keys
['a', 'b', 'c']
>>> sorted(d.iteritems()) # sort by key, returns a list of tuples
[('a', 3), ('b', 2), ('c', 1)]
>>> sorted(d.iteritems() , key=lambda x:x[1]) # sort by value, returns a list of tuples
[('c', 1), ('b', 2), ('a', 3)]
>>> sorted(d.iteritems() , key=lambda x:x[1] , reverse = True )
[('a', 3), ('b', 2), ('c', 1)]
Get to know the itertools module: it is very effective for iteration and combinations.
>>> import itertools
>>> perms = itertools.permutations([1,2,3])
>>> list(perms)
[(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
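This is a free binary search implementation and a tool for fast insertion into a sorted sequence. Once you insert an element with it, you do not need to call sort() again to keep the container sorted, which would be very expensive on long sequences.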
>>> import bisect
>>> bisect.insort(list, element)
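A small sketch of both uses of bisect, insertion and lookup. The list `scores` is made-up sample data and must already be sorted.

```python
import bisect

scores = [10, 20, 40]          # must already be sorted
bisect.insort(scores, 30)      # insert while keeping the order
print(scores)                  # [10, 20, 30, 40]

# bisect_left returns the index where a value is (or would be inserted)
idx = bisect.bisect_left(scores, 30)
found = idx < len(scores) and scores[idx] == 30
print(idx, found)              # 2 True
```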
import HTMLParser
html_parser = HTMLParser.HTMLParser()
txt = html_parser.unescape(html)
# python2
import urllib
urllib.unquote(a)
# for python3
import urllib.parse
urllib.parse.unquote(url)
>>> import cgi
>>> print cgi.escape.__doc__
Replace special characters "&", "<" and ">" to HTML-safe sequences.
If the optional flag quote is true, the quotation mark character (")
is also translated.
>>> cgi.escape( '>' )
'&gt;'
>>> cgi.escape( '"' , True )
'&quot;'
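The snippets above are Python 2. As a hedged sketch of the Python 3 counterparts (cgi.escape was removed in Python 3.8), the html and urllib.parse modules cover the same ground. The sample strings are made up.

```python
import html
import urllib.parse

escaped = html.escape('<a href="x">')            # quotes are escaped by default
print(escaped)                                    # &lt;a href=&quot;x&quot;&gt;
unquoted_only = html.escape('"', quote=False)     # leave quotes alone
print(unquoted_only)                              # "
unescaped = html.unescape("&lt;b&gt;")
print(unescaped)                                  # <b>
url_text = urllib.parse.unquote("a%20b%2Fc")
print(url_text)                                   # a b/c
```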
A tuple () is faster than a list and can be used as a dictionary key.
getattr(obj, name [ , default_method_return_if_not_exist ] )
example:
for i in dir( obj ):
    method = getattr( obj, i )
    print i , method
methodList = [method for method in dir(object) if callable(getattr(object, method))]
Get a value from a dict:
if d.has_key('key'):
    print d['key']
else:
    print 'not found'
This can be simplified to:
print d.get('key', 'not found')
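When inserting a key-value pair into a dict, initialize the key to a default value first if it does not exist (typically used when the value is a list or dict):
d = {}
def addword2dict(word, pagenumber):
    d.setdefault(word, []).append(pagenumber)
Or use defaultdict directly: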
from collections import defaultdict
d = defaultdict( dict ) # default value is empty dict
d = defaultdict( lambda: 2 ) # default value is 2
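To compare the two approaches side by side, here is a small sketch that builds the same word-to-pages index both ways. The data in `pages` is invented for the example.

```python
from collections import defaultdict

pages = [("python", 1), ("dict", 2), ("python", 7)]

# setdefault: create the list on first use, then append
index = {}
for word, page in pages:
    index.setdefault(word, []).append(page)

# defaultdict: the factory (list) is called for every missing key
index2 = defaultdict(list)
for word, page in pages:
    index2[word].append(page)

print(index)                   # {'python': [1, 7], 'dict': [2]}
print(dict(index2) == index)   # True
```

One difference worth knowing: merely reading a missing key from a defaultdict inserts it, while dict.get() does not.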
Find the intersection of two dicts:
print "Intersects:", [k for k in some_dict if k in another_dict]
A variant that wins on speed (python2):
print "Intersects:", filter(another_dict.has_key, some_dict.keys())
>>> m = {"a":1,"b":2,"c":3}
>>> dict( zip( m.values(), m.keys() ) )
{1: 'a', 2: 'b', 3: 'c'}
or
>>> {v:k for k,v in m.iteritems() }
{1: 'a', 2: 'b', 3: 'c'}
>>> a = [3,1,2,4]
>>> dict( zip( *[iter( a )] *2 ) )
{3: 1, 2: 4}
or, more simply:
i = iter(a)
dict(zip(i, i)) # you must use a single iterator
# python2
isinstance(u'a', unicode)
Python has a locals() function which gives you back a dictionary of local variables within the function
>>> g = lambda m: '\n'.join([ '%s=%s'%(k, repr(v)) for k, v in m.__dict__.iteritems() ])
>>> g(obj)
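Core style: avoid starting variable names with an underscore.
_xxx
- Cannot be imported with 'from module import *'
- "Protected" variable: by convention, only the class and its subclasses should access it
__xxx
- Private variable name in a class
- Private member: meant to be accessed only by the class itself, not even by subclasses (the name is mangled to _ClassName__xxx)
__xxx__
- System-defined names
- Identifiers reserved for Python's special methods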
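A short sketch of how these rules behave in practice. The single underscore is only a convention, while the double underscore triggers name mangling. The classes `Base` and `Child` are hypothetical.

```python
class Base(object):
    def __init__(self):
        self._shared = 1      # "protected" by convention only
        self.__hidden = 2     # stored as _Base__hidden

class Child(Base):
    def peek(self):
        return self._shared   # works: nothing enforces the convention

    def peek_hidden(self):
        return self.__hidden  # looks up _Child__hidden, which does not exist

c = Child()
print(c.peek())               # 1
print(c._Base__hidden)        # 2, the mangled name is still reachable
try:
    c.peek_hidden()
except AttributeError as e:
    print("AttributeError:", e)
```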
re.sub(r'(foo)', r'\1123', 'foobar')
=>
re.sub(r'(foo)', r'\g<1>123', 'foobar')
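The reason for the rewrite above: in r'\1123' the digits after the backslash run together, so the template is not read as "group 1 followed by 123". A minimal sketch of the unambiguous form:

```python
import re

# \g<1> ends the group reference explicitly, so the digits after it stay literal
result = re.sub(r'(foo)', r'\g<1>123', 'foobar')
print(result)  # foo123bar

# named groups can be referenced the same way
named = re.sub(r'(?P<word>foo)', r'\g<word>123', 'foobar')
print(named)   # foo123bar
```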
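def __init__(self) :
    # name the class explicitly (MyClass stands for the class this __init__ belongs to);
    # super( self.__class__ , self) recurses forever once the class is subclassed
    super( MyClass , self).__init__()   # python3: super().__init__()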
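A runnable Python 3 sketch of calling the parent constructor through a two-level hierarchy. The class names are hypothetical. Passing self.__class__ to super() would loop forever in this setup, because inside Child.__init__ self.__class__ is GrandChild, not Child.

```python
class Base(object):
    def __init__(self):
        self.ready = True

class Child(Base):
    def __init__(self):
        # python3 form; in python2 write super(Child, self).__init__()
        super().__init__()
        self.name = "child"

class GrandChild(Child):
    def __init__(self):
        super().__init__()

g = GrandChild()
print(g.ready, g.name)  # True child
```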
b = np.array( [0,1,1] )
np.random.choice(np.where(b == b.max())[0])
or
np.random.choice(np.flatnonzero(b == b.max()))
- b == b.max() returns a boolean array that is True where an item equals the max and False elsewhere
- flatnonzero() does two things: it ignores the False values (the nonzero part), then returns the indices of the True values. In other words, you get an array of the indices of the items matching the max value
- Finally, you pick a random index from these
>>> bin(123) # base 2
'0b1111011'
>>> oct(18) # base 8 (python2 prints '022')
'0o22'
>>> hex(10) # base 16
'0xa'
>>> int('022',8)
18
>>> "%x" % 108
'6c'
>>>
>>> "%X" % 108
'6C'
>>>
>>> "%#X" % 108
'0X6C'
>>>
>>> "%#x" % 108
'0x6c'
# python2: turn a list of integer byte values into a string
A good one:
import string
def f6(list):
    return string.joinfields(map(chr, list), "")
The best one:
import array
def f7(list):
    return array.array('B', list).tostring()
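Both functions above are Python 2 only (string.joinfields no longer exists in Python 3). As a hedged sketch of the Python 3 way to do the same conversion, with `codes` as made-up sample data:

```python
codes = [104, 101, 108, 108, 111]

as_bytes = bytes(codes)                   # byte values 0..255 -> bytes
as_text = "".join(map(chr, codes))        # code points -> str

print(as_bytes)  # b'hello'
print(as_text)   # hello
```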
- string -> decode -> unicode
- unicode -> encode -> string
>>> print ord('a')
97
>>> print chr(97)
a
'2' == '\x32' == '\062'
>>> print ord(u"我")
25105
>>> print unichr( 25105 )
我
unicodestring = u"Hello world"
utf8string = unicodestring.encode("utf-8")
asciistring = unicodestring.encode("ascii")
isostring = unicodestring.encode("ISO-8859-1")
utf16string = unicodestring.encode("utf-16")
plainstring1 = unicode(utf8string, "utf-8")
plainstring2 = unicode(asciistring, "ascii")
plainstring3 = unicode(isostring, "ISO-8859-1")
plainstring4 = unicode(utf16string, "utf-16")
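A unicode string can be written as u"\uxxxx", but when the text "\uxxxx" is obtained from somewhere else it cannot be used as unicode directly; convert it with "\uxxxx".decode("unicode-escape"). Note that xxxx must be exactly 4 hex digits, zero-padded if shorter.
Conversely, uni.encode("unicode-escape") yields a string of the form "\uxxxx".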
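The calls above are Python 2. A rough Python 3 equivalent, assuming the escaped text contains only ASCII characters, goes through bytes first:

```python
escaped = "\\u6211"                       # six characters: backslash, u, 6, 2, 1, 1
text = escaped.encode("ascii").decode("unicode-escape")
print(len(text), ord(text))               # 1 25105

back = text.encode("unicode-escape")      # bytes holding the \uxxxx form again
print(back)                               # b'\\u6211'
```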
>>> "\\n"
'\\n'
>>> "\\n".decode('string_escape')
'\n'
- Sometimes print raises UnicodeError when printing certain unicode characters (python2 workaround):
import sys
reload(sys)
sys.setdefaultencoding('utf8')
import codecs, json
fp = codecs.open( target_sheet_name + '.txt' , "w", "utf-8")
fp.write(jsonObj )
fp.close()
# or dump an object to an open file, keeping non-ASCII characters unescaped
json.dump( obj, fp, ensure_ascii=False , separators=(',',':') , indent=4, sort_keys=True )
- Convert to unicode characters first:
unicode_string = bytes.decode("utf-8")
print len(unicode_string)
import base64
# base 64 decode
data = base64.b64decode( data )
# base64 encode
result_data = base64.b64encode( result_data)
>>> import md5
>>> m = md5.new()
>>> m.update("Nobody inspects")
>>> m.update(" the spammish repetition")
>>> m.digest()
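The md5 module is Python 2 only. A minimal Python 3 sketch with hashlib, which expects bytes rather than str, shows that incremental updates and a one-shot call give the same digest:

```python
import hashlib

m = hashlib.md5()
m.update(b"Nobody inspects")
m.update(b" the spammish repetition")

one_shot = hashlib.md5(b"Nobody inspects the spammish repetition")
print(m.hexdigest() == one_shot.hexdigest())  # True
print(len(m.hexdigest()))                     # 32
```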
# python
import binascii
binascii.crc32(b"hello world")
# go
import "hash/crc32"
crc32.ChecksumIEEE( []byte("hello world") )
# python
import base58
base58.b58encode( raw_str )
# go
import "github.com/btcsuite/btcutil/base58"
decoded := base58.Decode(b58_str)
import traceback
traceback.print_exc()
or
import sys
s=sys.exc_info()
print "Error '%s' happened on line %d" % (s[1],s[2].tb_lineno)
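Put together as a runnable Python 3 sketch: traceback.format_exc() returns the same text that print_exc() would print, which is handy for logging. The function `risky` is a made-up stand-in for code that fails.

```python
import sys
import traceback

def risky():
    return 1 / 0

try:
    risky()
except ZeroDivisionError:
    report = traceback.format_exc()           # full traceback as a string
    exc_type, exc_value, tb = sys.exc_info()
    line_no = tb.tb_lineno                    # line in this frame where the error surfaced
    print("Error '%s' happened on line %d" % (exc_value, line_no))
```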
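- Use a map with concurrency support
- multiprocessing.dummy is a complete clone of the multiprocessing package
- The only difference is that multiprocessing uses processes, while dummy uses threads
- In short: for IO-bound tasks choose multiprocessing.dummy (threads); for CPU-bound tasks choose multiprocessing (processes)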
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool( poolsize )
results = pool.map( func , param_set_list )
pool.close()
pool.join()
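A self-contained version of the pattern above. The worker `square` and the pool size of 4 are placeholders; a real IO-bound worker would do a download or a file read instead.

```python
from multiprocessing.dummy import Pool as ThreadPool

def square(x):
    # stand-in for an IO-bound task
    return x * x

pool = ThreadPool(4)
results = pool.map(square, range(10))   # results keep the input order
pool.close()
pool.join()
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```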
python -m cProfile xxx.py
or
import profile
profile.run( 'func_name()' ) # the string is executed, so it must call the function
>>> from __future__ import division
>>> 1/2
0.5
# use numpy; width=32 keeps the leading zeros that bin() would drop
>>> b = np.binary_repr( np.float32( 0.15625 ).view(np.int32), width=32 )
>>> b
'00111110001000000000000000000000'
>>> sign, expo, mantissa = b[0], b[1:9], b[9:]
>>> print(sign, expo, mantissa)
('0', '01111100', '01000000000000000000000')
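If numpy is not available, the same decomposition can be sketched with the standard struct module. The helper name `float32_bits` is made up. 0.15625 is 1.25 × 2^-3, so the biased exponent is 127 - 3 = 124.

```python
import struct

def float32_bits(value):
    # pack as big-endian float32, then read the same 4 bytes as an unsigned int
    (as_int,) = struct.unpack(">I", struct.pack(">f", value))
    bits = format(as_int, "032b")
    # 1 sign bit, 8 exponent bits, 23 mantissa bits
    return bits[0], bits[1:9], bits[9:]

sign, expo, mantissa = float32_bits(0.15625)
print(sign, expo, mantissa)  # 0 01111100 01000000000000000000000
```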
>>> np.binary_repr( 200, width=8)
'11001000'
>>> np.binary_repr( -1, width=8)
'11111111'
import linecache
# thefilepath: path to the file
# desired_line_number: integer, the line to read (1-based)
theline = linecache.getline(thefilepath, desired_line_number)
import os,time
time.ctime(os.stat( "d:/learn/flash.txt" ).st_mtime) # file modification time
time.ctime(os.stat( "d:/learn/flash.txt" ).st_ctime) # file creation time on Windows; on Unix, the last metadata change time
import os, stat
Before reading:
os.chmod(_path, stat.S_IREAD)
Before writing:
os.chmod(_path, stat.S_IWRITE | stat.S_IREAD)
python ./setup.py install --record install.txt
cat install.txt | xargs [sudo] rm -rf
import code
...
code.interact(local=locals())
import os, sys
src_path = os.path.dirname( os.path.abspath( __file__ ) )
# add parent folder as search path
sys.path.append( os.path.normpath( os.path.join( src_path , ".." )) )
with open(filename, "r+") as f:
    data = f.read()
    f.seek(0)
    f.write(output)
    f.truncate()
For a large file, parse it line by line; do NOT use readlines(), which loads the whole file into memory.
with open( ... ) as fp:
    for line in fp:
        pass # handle one line at a time
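A runnable sketch of the line-by-line pattern. It first writes a small temporary file as a stand-in for a large one (the content is invented), then streams it while keeping only running totals in memory.

```python
import os
import tempfile

# build a small stand-in for a "large" file (hypothetical content)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for i in range(1000):
        tmp.write("line %d\n" % i)
    path = tmp.name

line_count = 0
longest = 0
with open(path) as fp:
    for line in fp:               # the file object yields one line at a time
        line_count += 1
        longest = max(longest, len(line))

os.remove(path)
print(line_count, longest)  # 1000 9
```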
import datetime as dt
import time as tm
# time returns the current time in seconds since the Epoch. (January 1st, 1970)
tm.time()
# 1511154150.7125366
# Convert the timestamp to datetime.
dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow
# datetime.datetime(2017, 11, 20, 5, 3, 1, 695393)
# Handy datetime attributes:
dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second
# (2017, 11, 20, 5, 3, 1)
# timedelta is a duration expressing the difference between two dates.
delta = dt.timedelta(days = 100) # create a timedelta of 100 days
delta
# datetime.timedelta(100)
# date.today returns the current local date.
today = dt.date.today()
today - delta # the date 100 days ago
# datetime.date(2017, 8, 12)
today > today-delta # compare dates
# True
# here, `t` is milliseconds since the epoch
>>> datetime.datetime.utcfromtimestamp( t/1000 ).strftime('%Y-%m-%dT%H:%M:%SZ')
'2019-03-01T09:33:08Z'
>>> datetime.datetime.fromtimestamp( t/1000 ).strftime('%Y-%m-%dT%H:%M:%S')
'2019-03-01T17:33:08' # local time (UTC+8 here), so no trailing Z
>>> datetime.datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]+"Z"
'2019-09-19 07:56:37.326Z'
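In Python 3 the same conversion can be done with timezone-aware objects, which avoids mixing up UTC and local time. A sketch, where the value of `t` is an assumption chosen to match the output shown above:

```python
from datetime import datetime, timezone

t = 1551432788000  # milliseconds since the epoch (assumed sample value)

utc_dt = datetime.fromtimestamp(t / 1000, tz=timezone.utc)
utc_text = utc_dt.strftime('%Y-%m-%dT%H:%M:%SZ')
print(utc_text)  # 2019-03-01T09:33:08Z

# same instant, expressed in the machine's local time zone
local_dt = utc_dt.astimezone()
print(local_dt.isoformat())
```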
Use | From | To |
---|---|---|
time.gmtime() | seconds since the epoch | struct_time in UTC |
calendar.timegm() | struct_time in UTC | seconds since the epoch |
Example:
>>> calendar.timegm( ( 2020,1,1,0,0,0 ) )
1577836800
>>> time.gmtime( 1577836800 )
time.struct_time(tm_year=2020, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=0)
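The two functions are inverses of each other, which a short round trip makes visible:

```python
import calendar
import time

seconds = calendar.timegm((2020, 1, 1, 0, 0, 0))   # UTC tuple -> seconds
st = time.gmtime(seconds)                           # seconds -> struct_time in UTC
print(seconds)                            # 1577836800
print(st.tm_year, st.tm_mon, st.tm_mday)  # 2020 1 1
print(calendar.timegm(st) == seconds)     # True
```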
import os, sys
# determine if application is a script file or frozen exe
if getattr(sys, 'frozen', False):
    application_path = os.path.dirname(sys.executable)
elif __file__:
    application_path = os.path.dirname(__file__)
# config_name is the config file name, defined elsewhere
config_path = os.path.join(application_path, config_name)
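- Use the subprocess module
- This module is more complex and gives finer control over the child process
- Popen is nonblocking. call and check_call are blocking
Popen(cmd , stdout=subprocess.PIPE , shell=True, cwd= arg) .stdout.readlines()
- With stdout=subprocess.PIPE, readlines() waits until the command has finished, and everything the child script prints can be read from stdout
- Some commands do not write their results to stdout; in that case merge stderr into stdout:
res = subprocess.Popen( cmd , stdout=subprocess.PIPE , stdin=subprocess.PIPE, stderr=subprocess.STDOUT , shell=True ).stdout.read()
- TODO: specify the shell's execution path for subprocess Popen
print Popen(cmd , stdout=subprocess.PIPE , shell=True, cwd= arg)
?
- Note: if you do not need pipe communication, the recommended usage is:
subprocess.call( cmd.split() + sys.argv[1:] , stderr=subprocess.STDOUT ,shell=True )
- shell=True runs the command in a shell, so settings such as environment variables are picked up. On POSIX, pass the command as a single string when using shell=True; with a list, only the first item is treated as the command
- If you need the exit code: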
child = subprocess.Popen( cmd , stdout=subprocess.PIPE , stdin=subprocess.PIPE, stderr=subprocess.STDOUT , shell=True )
streamdata = child.communicate()[0]
rc = child.returncode
- Python3 subprocess.run
subprocess.run( [...] )
- The parent process blocks until the child process finishes.
- Pass capture_output=True to capture the output (Python 3.7+)
>>> import subprocess
>>> result = subprocess.run( ["rm", "does not exist"], capture_output=True )
>>> result
CompletedProcess(args=['rm', 'does not exist'], returncode=1, stdout=b'', stderr=b'rm: does not exist: No such file or directory\n')
>>> print ( result.stdout )
b''
>>> print ( result.stderr.decode() ) # utf-8 by default
rm: does not exist: No such file or directory
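A portable sketch of subprocess.run that does not depend on rm being installed: it launches the current Python interpreter as the child. The child's one-line program is invented for the example. text=True (Python 3.7+) decodes stdout and stderr to str.

```python
import subprocess
import sys

# the child prints to both streams and exits with code 3
child_code = "import sys; print('out'); sys.stderr.write('err\\n'); sys.exit(3)"

result = subprocess.run(
    [sys.executable, "-c", child_code],   # list form, no shell involved
    capture_output=True,
    text=True,
)
print(result.returncode)       # 3
print(result.stdout.strip())   # out
print(result.stderr.strip())   # err
```

Adding check=True would raise CalledProcessError on a non-zero exit code instead of returning it.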
result=`curl --silent ... | python -c "import json,sys;obj=json.load(sys.stdin);print obj['anykey'];"`
echo result: $result
- implementation in Python Decorator Library
- From Python 3.2, functools.lru_cache
- By default, it only caches the 128 most recently used calls, but you can set maxsize to None so that the cache grows without bound and never evicts entries:
import functools
@functools.lru_cache(maxsize=None)
def fib(num):
    if num < 2:
        return num
    else:
        return fib(num-1) + fib(num-2)
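For interpreters without lru_cache, a hand-written decorator is a few lines. This is a minimal sketch, not the Python Decorator Library's implementation: it assumes positional, hashable arguments, and the `calls` list exists only to show how often the real function runs.

```python
import functools

def memoize(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # compute once per distinct argument tuple, then reuse
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.cache = cache
    return wrapper

calls = []

@memoize
def fib(num):
    calls.append(num)
    return num if num < 2 else fib(num - 1) + fib(num - 2)

print(fib(30))      # 832040
print(len(calls))   # 31
```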