Python

Python generators

Python has generator expressions, whose syntax closely resembles that of list comprehensions.
 
A list comprehension returns a list.
 
>>> [ x ** 2 for x in range(4)]
[0, 1, 4, 9]
A generator expression returns an iterable object called a generator object.
 
>>> ( x ** 2 for x in range(4))
<generator object at 0x00AB0648>
Generator objects support the iterator protocol.
 
>>> g = ( x ** 2 for x in range(4))
>>> g.next()
0
>>> g.next()
1
>>> g.next()
4
>>> g.next()
9
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
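
The g.next() method used above exists only in Python 2. As a minimal sketch, the built-in next() (available since Python 2.6) does the same job and also runs on Python 3. The variable names first, rest and exhausted are illustrative only.

```python
# Generator expression: values are computed one at a time, on demand.
g = (x ** 2 for x in range(4))

first = next(g)      # built-in next() works on Python 2.6+ and Python 3
rest = list(g)       # consumes whatever is left in the generator

try:
    next(g)          # the generator is exhausted by now
    exhausted = False
except StopIteration:
    exhausted = True
```

A for loop performs the same next() calls internally and stops cleanly when StopIteration is raised.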

Dictionaries

Swapping dictionary keys and values

a = dict()
a['one']=1
a['two']=2
 
res = dict((v,k) for k,v in a.iteritems())
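
A small sketch of the same inversion that avoids iteritems(), which was removed in Python 3; items() works on both versions. It assumes the values are hashable and unique, otherwise later keys silently overwrite earlier ones.

```python
a = {'one': 1, 'two': 2}

# dict() accepts any iterable of (key, value) pairs,
# so swapping each pair inverts the mapping.
res = dict((v, k) for k, v in a.items())
```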

Iterating over a dictionary

>>> d = {'Python': 'Guido van Rossum',
...      'Perl': 'Larry Wall',
...      'Tcl': 'John Ousterhout' }
Use a for statement to display each dictionary key together with its value.
 
Call the keys method to get the list of keys, then display the value for each key.
 
>>> for key in d.keys():
...     print '%s=%s' % (key, d[key])
...
Python=Guido van Rossum
Tcl=John Ousterhout
Perl=Larry Wall
Python dictionaries define an iterator, so the same for loop works without calling the keys method.
 
>>> for key in d:
...     print '%s=%s' % (key, d[key])
...
Python=Guido van Rossum
Tcl=John Ousterhout
Perl=Larry Wall
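
As a hedged variation on the loops above, items() yields key and value together, which avoids the extra d[key] lookup. The sorted() call is an addition made here only so that the order is predictable; the lines name is illustrative.

```python
d = {'Python': 'Guido van Rossum',
     'Perl': 'Larry Wall',
     'Tcl': 'John Ousterhout'}

# Each item is a (key, value) tuple; sorting orders the pairs by key.
lines = ['%s=%s' % (key, value) for key, value in sorted(d.items())]
```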

Installing MySQLdb (mysql-python) on Snow Leopard

versions:
mysql adapter for python: 1.2.3c1
mysql: 5.1.41 32-bit version
default python for snow leopard: 2.6.1
 
1.install mysql package: http://dev.mysql.com/downloads/mysql/5.1.html#macosx-dmg
 
2. grab mysql adapter for python: http://sourceforge.net/projects/mysql-python/files/
 
3.in directory for mysql for python:
Edit the setup_posix.py and change the following
 
mysql_config.path = "mysql_config"
 
to
 
mysql_config.path = "/usr/local/mysql/bin/mysql_config"
 
4. sudo ARCHFLAGS='-arch i386' CC=/usr/bin/gcc-4.0 python setup.py build
 
5. sudo ARCHFLAGS='-arch i386' CC=/usr/bin/gcc-4.0 python setup.py install
 
6. force python to be 32-bit
defaults write com.apple.versioner.python Prefer-32-Bit -bool yes
 
7. python
Greg-Elliotts-Mac-Pro:~ greg$ python
Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
>>>

Installing Python 2.6

tar -zxvf Python-2.6.4.tgz
cd Python-2.6.4 && ./configure && make && make install
wget http://pypi.python.org/packages/2.6/s/setuptools/setuptools-0.6c11-py2.6.egg
sh setuptools-0.6c11-py2.6.egg
easy_install readline
easy_install ipython

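A simple explanation of yield

As a first approximation, yield is like appending items to a list, just written in an unusual way. (Unlike a list, the values are produced lazily, one per call to next().)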
 
>>> def kk(x):
...     yield x
...     yield x+7
...     yield x*2
...
>>> b = kk(14)
>>> b.next()
14
>>> b.next()
21
>>> b.next()
28
>>> b.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration
 
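Each yield adds one item to the returned iterator; the item's value is the expression after yield. Although the function has no explicit return value, calling it returns this implicit iterator.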
 
>>> for i in kk(11):
...     print i
...
11
18
22
 
As expected.
The expression after yield can be any object, for example a list:
 
>>> def kk(x,y):
...     yield [x,y]
...     yield [x+1,y+1]
...
>>> for i in kk(12,33):
...     print i
...
[12, 33]
[13, 34]
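
The list analogy hides one detail: the function body does not run until a value is requested. The sketch below makes that visible by recording progress in a list named log, a name introduced here purely for illustration.

```python
log = []

def kk(x):
    log.append('start')      # runs only when the first value is requested
    yield x
    log.append('resumed')    # runs only when the second value is requested
    yield x + 7

b = kk(14)
before = list(log)           # calling kk() has not executed any of its body yet
first = next(b)
middle = list(log)
second = next(b)
after = list(log)
```

Because values are produced on demand, a generator can describe a very long or even endless sequence without storing it in memory.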

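* and ** in Python function parameters

Question:
     Python function definitions have two special parameter forms, written with * and **.
     For example: def myfun1(username, *keys) and def myfun2(username, **keys).
 
Explanation:
  * collects any number of positional (unnamed) arguments; they are accessed as a tuple.
 
   ** collects any number of keyword (named) arguments; they are accessed as a dict.
 
Usage: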
#########################
 
# Using "*"
 
#########################
>>> def fun1(*keys):
...     print "keys type=%s" % type(keys)
...     print "keys=%s" % str(keys)
...     for i in range(0, len(keys)):
...             print "keys[" + str(i) + "]=%s" % str(keys[i])
...
>>> fun1(2,3,4,5)
 
Output:
keys type=<type 'tuple'>
keys=(2, 3, 4, 5)
keys[0]=2
keys[1]=3
keys[2]=4
keys[3]=5
 
 
 
#########################
 
# Using "**"
 
#########################
 
>>> def fun2(**keys):
...     print "keys type=%s" % type(keys)
...     print "keys=%s" % str(keys)
...     print "name=%s" % str(keys['name'])
...
>>>
>>> fun2(name="vp", age=19)
 
Output:
keys type=<type 'dict'>
keys={'age': 19, 'name': 'vp'}
name=vp
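
The two forms can be combined with ordinary parameters, and the same * and ** markers also unpack a tuple or dict at the call site. A minimal sketch follows; the function name describe and its parameter names are hypothetical.

```python
def describe(username, *keys, **options):
    # keys is a tuple of the extra positional arguments,
    # options is a dict of the extra keyword arguments.
    return username, keys, options

result = describe('vp', 2, 3, name='x', age=19)

# At the call site, * unpacks a sequence and ** unpacks a dict.
args = (4, 5)
kwargs = {'age': 20}
forwarded = describe('vp', *args, **kwargs)
```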

Selecting the Python version on Mac

#python_select python26

Python easy_install

#wget http://peak.telecommunity.com/dist/ez_setup.py
#python ez_setup.py
#easy_install hashlib

Notes

for i in range(20):
    print ("*"*i)
 
>>> li
['a', 'b', 'mpilgrim', 'z', 'example']
 
>>> li[-3]
'mpilgrim'
 
# If negative indexes confuse you, think of it this way: li[-n] == li[len(li) - n].
# So in this list, li[-3] == li[5 - 3] == li[2].
 
>>> li.pop()
'example'
 
# pop() removes the last element of the list and returns its value

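Arrays in Python

Note: this article does not cover Python lists in detail; see the Python documentation for that.

Python has no built-in array data structure, but a list behaves much like one, for example a=[0,1,2].

Here a[0]=0, a[1]=1, a[2]=2. This raises a question: what if a should hold 0 through 999? That can be done with a = range(0, 1000),

or, shorter, a = range(1000). For a list of length 1000 with every element initialized to 0, use a = [0 for x in range(0, 1000)]. Two-dimensional arrays are defined as follows:

Direct definition: a=[[1,1],[1,1]] defines a 2*2 two-dimensional array with every element initialized to 1.

Indirect definition: a=[[0 for x in range(10)] for y in range(10)] defines a 10*10 two-dimensional array initialized to 0.


Later I found what looked like a simpler way to define a two-dimensional array online:

b = [[0]*10]*10, which defines a 10*10 two-dimensional array initialized to 0.

Compared with a=[[0 for x in range(10)] for y in range(10)], print a==b gives True.

But after replacing the definition of a with the definition of b, a program that used to run correctly broke. Careful analysis showed the difference:

After a[0][0]=1, only a[0][0] is 1 and everything else is 0.

After b[0][0]=1, b[0][0], b[1][0], and so on through b[9][0] are all 1. The 10 inner lists of the outer list are therefore one and the same object: the outer list holds 10 references to a single list.

So b = [[0]*10]*10 is not a two-dimensional array in the usual sense.

Experiments also show that c=[0]*10 has the same effect as c=[0 for x in range(10)] and does not suffer from the shared-reference problem.
The * operator copies references in both cases. In c the repeated element is the immutable integer 0, and c[0]=1 rebinds only that one slot; in b the repeated element is a mutable list, so a change made through one row is visible through every row.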
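
The difference between the two definitions can be checked directly. This is a small sketch using 3*3 lists instead of 10*10 to keep it short; the names equal_before and shared are illustrative.

```python
a = [[0 for x in range(3)] for y in range(3)]   # three independent rows
b = [[0] * 3] * 3                               # one row referenced three times

equal_before = (a == b)     # equal by value before any assignment

a[0][0] = 1                 # changes a single cell
b[0][0] = 1                 # changes the one shared row

# Every row of b is the very same list object.
shared = all(row is b[0] for row in b)

c = [0] * 3
c[0] = 1                    # rebinds one slot; the other slots still hold 0
```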

The logging module

import logging
logger = logging.getLogger('myapp')
hdlr = logging.FileHandler('var/myapp.log')
formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
hdlr.setFormatter(formatter)
logger.addHandler(hdlr)
logger.setLevel(logging.INFO)
 
####And then to use it:
 
logger.info('a log message')

Generating a consecutive sequence

num=range(1,19)
host_list=[]
for n in num:
    host_list.append("srv%02d" % n)
 
print host_list

The string of consecutive lowercase letters

import string
print string.lowercase

Dictionaries

Calling dict on a list of two-element lists or tuples converts it to a dictionary.
>>> dic=dict([['hoge',123], ['moge', 456]])
>>> print dic
{'moge': 456, 'hoge': 123}
Also, the built-in function zip builds such pairs from two separate lists, which allows the following conversion.
>>> dic=dict(zip(['hoge', 'moge'],[123, 456]))
>>> print dic
{'moge': 456, 'hoge': 123}

Converting a string to a decimal number

python -c 'print reduce(lambda a,b: a*256+ord(b), raw_input("string: "), 0)'
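
The one-liner treats the string as a base-256 number, one character per digit. A sketch of the same calculation as a reusable function follows; the name string_to_int is hypothetical, and the functools import is needed only on Python 3, where reduce is no longer a built-in.

```python
from functools import reduce   # built-in on Python 2, also importable there since 2.6

def string_to_int(s):
    # Shift the accumulated value by one byte, then add the next character code.
    return reduce(lambda acc, ch: acc * 256 + ord(ch), s, 0)
```

For example, 'AB' becomes 65 * 256 + 66.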

Showing file size

import os
f = "/tmp/test.file"
os.stat(f)[6]
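
Index 6 of the os.stat() result is the st_size field. As a hedged sketch, the named attribute and os.path.getsize() give the same number and are easier to read; the temporary file is created here only so the example is self-contained.

```python
import os
import tempfile

# Create a throwaway 5-byte file to measure.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b'hello')
    os.close(fd)
    size_from_stat = os.stat(path).st_size       # same value as os.stat(path)[6]
    size_from_getsize = os.path.getsize(path)
finally:
    os.remove(path)
```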

Python tips

pip

Starting with pip 1.2.0, Python 2.4 is no longer supported.

Python 2.4 needs pip 1.1.

python deploy

Environment: tomcat + haproxy

#!/usr/bin/env python
 
import func.overlord.client as fc
client = fc.Client("gill9008*")
print client.local.copyfile.send("/root/tomcatlog.cron", "/etc/cron.daily/tomcatlog.cron", 755)
[root@jack94202 ~]# more d2t.py
#!/usr/bin/env python
# Filename: deploy2tomcat.py
 
import func.overlord.client as fc
import sys
import os
import pycurl
 
def show_tomcat_srv_list(haproxys):
    show_stat_cmd = "echo 'show stat'|/usr/bin/socat /var/lib/haproxy/stats stdio|grep tomcatcluster|grep -v BACKEND|awk -F, '{print $2}'"
    tomcat_srv_list = {}
    for ha_srv in haproxys:
        tmp_srv_list = []
        client = fc.Client(ha_srv)
        results = client.command.run(show_stat_cmd)
        tomcat_srv_list[ha_srv] = list(set(results.values()[0][1].split("\n")) & set(deploy_srv_list(tomcats)))
    return  tomcat_srv_list
 
def deploy_srv_list(tomcats):
#    if not tomcats:
#        tomcats =  ["gill90080", "gill90082", "gill90084", "gill90086", "gill90088"]
    return ["%s:%s" % (m,n) for m in tomcats for n in tomcatcluster]
 
def deploy_url_list(tomcats):
    return ["http://honey:opera@%s:%s/manager/deploy?path=/&update=true" % (m,n) for m in tomcats for n in tomcatports]
 
def cmd_list(active):
    cmds = {}
    for (srv,deploy_srvs) in show_tomcat_srv_list(haproxys).items():
        tmp_cmd_list = []
        for deploy_srv in deploy_srvs:
            tmp_cmd_list.append('echo "%s server tomcatcluster/%s"|/usr/bin/socat /var/lib/haproxy/stats stdio' % (active,deploy_srv))
        cmds[srv] = ";".join(tmp_cmd_list)
    return cmds
 
 
def run_cmd(action):
    if not action:
        print "ERROR action"
        sys.exit()
    for (srv,cmd) in cmd_list(action).items():
        client = fc.Client(srv)
        results = client.command.run(cmd)
 
def deploy2srv(filename):
    if not filename:
        print 'ERROR deploy filename!'
        sys.exit()
 
    filesize = os.path.getsize(filename)
#    print filesize
 
    for url in deploy_url_list(tomcats):
        print url
        f = file(filename)
        c = pycurl.Curl()
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.PUT, 1)
        c.setopt(pycurl.INFILE, f)
        c.setopt(pycurl.INFILESIZE, filesize)
        c.perform()
 
def restart_tomcats(tomcats):
    stop_monit = "initctl stop monit"
    start_monit = "initctl start monit"
    monitor_all = "monit monitor all"
    restart_tomcats = "monit -g tomcat restart"
    for client in tomcats:
        client = fc.Client(client)
        client.command.run(stop_monit)
        client.command.run(restart_tomcats)
        client.command.run(start_monit)
        client.command.run(monitor_all)
 
if __name__ == "__main__":
#    print deploy_srv_list(tomcats)
    tomcats = ["gill90080", "gill90082"]
    haproxys = ["cody90081", "cody90083", "cody90085"]
    tomcatcluster = ["tomcat1", "tomcat2", "tomcat3", "tomcat4", "tomcat5", "tomcat6", "tomcat7", "tomcat8",]
    tomcatports = ["8081", "8082", "8083", "8084", "8085", "8086", "8087", "8088"]
#    tomcatcluster = ["tomcat%s" % (n) for n in range(1,9)]
 
#    print show_tomcat_srv_list(haproxys)
#    for (k,v) in cmd_list("disable").items():
#        print "%s => %s\n" % (k,v)
#    run_cmd("disable")
#    deploy2srv('/root/5238ROOT.war')
#    deploy2srv('/root/5271ROOT.war')
    run_cmd("enable")
#    restart_tomcats(tomcats)

python fabric.api notes

#!/usr/local/bin/python
# -*- coding: utf-8 -*-
 
from fabric.api import env,run,put,get
from os import path
from re import findall
from sys import argv
from fabric.context_managers import hide
from time import sleep
 
USER='root'
HOST,IP_LIST=[],[]
PORT='22'
PRI_KEY,PASSWORD,CMD,uSRC,uDST,dSRC,dDST='','','','','','',''
timeout=1
 
for i in range(1,len(argv)+1):
    if argv[i-1] == '-h' or len(argv)==1:
                print """
                USAGE:
                                -u [user]       Use this argument to specify the user,default is 'root'
                                -H [host]       The host that you want to connect
                                -f [file]       A file containing the IP addresses you want to connect to
                                -P [port]       The ssh port,default is 22
                                -p [pwd|file]   You can specify a password or a private key file to connect to the host
                                -c [command]    The command you want the host(s) to run
                                -U [src,dst]    The local file that you want to upload to the remote host(s)
                                -D [src,dst]    The remote file that you want to download to the local host
                                -t [timeout]    The program running timeout,default is 1(s)
                                -h              Print this help screen
                """
 
    if argv[i-1] == '-u':
            USER=argv[i]
            env.user='%s'%(USER)
    else:
            env.user='%s'%(USER)
    if argv[i-1] == '-H':
        arg=findall('(\d+\.\d+\.\d+\.\d+|\s+\.{3,4})',argv[i])
        for j in arg:
            if type(j).__name__ !='NoneType':
                HOST.append(j)
            else:
                print 'The HostIP input error'
    if argv[i-1] == '-P':
        PORT=argv[i]
    if argv[i-1] == '-f':
        if path.isfile('%s'%(argv[i])) == True:
            IP_LIST=open('%s'%(argv[i]),'r').readlines()
    if argv[i-1] == '-p':
        if path.isfile(argv[i]) == True:
            PRI_KEY=argv[i]
            env.key_filename='%s'%(PRI_KEY)
        else:
            PASSWORD=argv[i]
            env.password='%s'%(PASSWORD)
    if argv[i-1] == '-c':
        CMD=argv[i]
    if argv[i-1] == '-t':
        timeout=argv[i]
 
    SLP='sleep %s'%(timeout)
 
    if argv[i-1] == '-U':
        x=src=argv[i].split(',')
        uSRC=x[0]
        uDST=x[1]
 
    if argv[i-1] == '-D':
        y=src=argv[i].split(',')
        dSRC=y[0]
        dDST=y[1]
 
else:
    IP_PORT=[]
    if len(IP_LIST)!=0:
        for k in IP_LIST:
            IP_PORT.append(k.strip()+':'+PORT)
    if len(HOST)!=0:
        for k in HOST:
            IP_PORT.append(k.strip()+':'+PORT)
if CMD != '':
    def command():
        with hide('running'):
            run("%s;%s" %(CMD,SLP))
    for ip in IP_PORT:
        env.host_string=ip
 
        print "Execute command : \"%s\" at Host : %s" %(CMD,ip.split(':')[0])
        print "-------------------------------------------------"
        command()
        print "-------------------------------------------------"
 
if uSRC and uDST !='':
        def upload():
                with hide('running'):
                        put("%s" %(uSRC),"%s" %(uDST))
        for ip in IP_PORT:
                env.host_string=ip
                print "Upload local file : \"%s\" to Host : %s \"%s\"" %(uSRC,ip.split(':')[0],uDST)
                print "-------------------------------------------------"
                upload()
                print "-------------------------------------------------"
 
if dSRC and dDST !='':
        def download():
                with hide('running'):
                        get("%s" %(dSRC),"%s" %(dDST))
        for ip in IP_PORT:
                env.host_string=ip
                print "Download remote file : \"%s\" from Host : %s to local \"%s\"" %(dSRC,ip.split(':')[0],dDST)
                print "-------------------------------------------------"
                download()
                print "-------------------------------------------------"

set()

>>>basket = ['apple','orange','apple','pear','apple','banana']
 
>>>fruit=set(basket)
 
>>>fruit
set(['orange', 'pear', 'apple', 'banana'])
 
>>>'orange' in fruit
True
 
>>>a=set('abracadabew')
>>>a
set(['a', 'c', 'b', 'e', 'd', 'r', 'w'])
 
>>>b=set('wajgwaoihwb')
>>>b
set(['a', 'b', 'g', 'i', 'h', 'j', 'o', 'w'])
 
>>>a-b    # difference
set(['c', 'r', 'e', 'd'])
 
>>>a|b   # union
set(['a', 'c', 'b', 'e', 'd', 'g', 'i', 'h', 'j', 'o', 'r', 'w'])
 
>>>a&b   # intersection
set(['a', 'b', 'w'])
 
>>>a^b   # symmetric difference (union minus intersection)
set(['c', 'e', 'd', 'g', 'i', 'h', 'j', 'o', 'r'])

Process lock

Allow only one instance of the process to run
import fcntl
lockfp = file("/home/atlantis/trans2hadoop/mod.lck","w")
fcntl.flock(lockfp.fileno(),fcntl.LOCK_EX | fcntl.LOCK_NB)
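
With LOCK_NB, flock() raises IOError instead of waiting when another open file already holds the lock, so a second instance can detect this and exit. A minimal sketch for POSIX systems follows; the helper name try_lock and the temporary lock path are assumptions made for the demonstration.

```python
import fcntl
import os
import tempfile

def try_lock(path):
    """Return an open, locked file object, or None if the lock is already held."""
    fp = open(path, 'w')
    try:
        fcntl.flock(fp.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except IOError:            # the lock is held through another open file
        fp.close()
        return None
    return fp                  # keep this object alive for as long as the lock is needed

lock_path = os.path.join(tempfile.mkdtemp(), 'demo.lck')   # hypothetical lock file
first = try_lock(lock_path)    # succeeds
second = try_lock(lock_path)   # fails while first is still open
```

The lock is released when the file is closed or the process exits, so no stale lock is left behind after a crash.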

Removing duplicate elements from a list

>>> a = [11,22,33,44,11,22]
>>> b = set(a)
>>> b
set([33, 11, 44, 22])
>>> c = [i for i in b]
>>> c
[33, 11, 44, 22]
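
Converting to a set loses the original order, as the output above shows. When order matters, one common alternative is to remember which items have been seen; this is a sketch, and the function name unique_in_order is hypothetical.

```python
def unique_in_order(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:     # keep only the first occurrence
            seen.add(item)
            result.append(item)
    return result
```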

Format conversion

How do you convert [(1,), (2,), (3,)] to [1, 2, 3]?

a = [(1,), (2,), (3,)]
[i[0] for i in a]
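
Two equivalent ways to do this conversion, as a brief sketch: indexing each tuple directly, or flattening with itertools.chain.from_iterable (available since Python 2.6). The second form also handles tuples holding more than one element.

```python
from itertools import chain

a = [(1,), (2,), (3,)]

by_index = [t[0] for t in a]            # take the first element of each tuple
flat = list(chain.from_iterable(a))     # concatenate all the tuples
```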

Testing a server port

python -m SimpleHTTPServer 8888

Python server

#!/usr/bin/env python
#coding:utf-8
 
import socket,select,sys,time
import thread
 
s_list = []
 
def loop(cs,addr,s_ip,s_port):
    print '%s %d connected.' % addr
    ts = socket.socket()
 
    try:
        ts.connect((s_ip,s_port))
    except:
        cs.close()
        print '%s %d closed.' % addr
        sys.exit(0)
 
    while True:
 
        rl,wl,xl = select.select([cs.fileno(),ts.fileno()],[],[cs.fileno(),ts.fileno()])
 
        if len(xl) > 0:
            cs.close()
            ts.close()
            print '%s %d closed.' % addr
            sys.exit(0)
 
        if len(rl) > 0:
            if rl[0] == cs.fileno():
                rs = ts
                ws = cs
            else:
                rs = cs
                ws = ts
 
            try:
                buffer = ws.recv(10000)
                if len(buffer) == 0:
                    raise
                rs.send(buffer)
            except:
                cs.close()
                ts.close()
                print '%s %d closed.' % addr
                sys.exit(0)
 
def mainserver(l_port,s_ip,s_port):
    global s_list
    try:
        ss = socket.socket()
        ss.bind(('0.0.0.0',l_port))
        ss.listen(10)
        s_list.append((l_port,s_ip,s_port))
    except:
        sys.exit(0)
 
    while True:
        cs,addr = ss.accept()
 
        thread.start_new_thread(loop,(cs,addr,s_ip,s_port))
 
def manager(l_port):
    global start,s_list
 
    ss = socket.socket()
    ss.bind(('0.0.0.0',l_port))
    ss.listen(10)
 
    while True:
        cs,addr = ss.accept()
        cs.send("""trans server 1.0\r\ntype 'help' to get help\r\n""")
        buffer = ''
        while True:
            buf = cs.recv(10000)
            if len(buf) == 0:
                cs.close()
                break
            if buf[-1] not in ('\r','\n'):
                buffer += buf
                continue
            buffer += buf
            cmd = buffer.strip()
            buffer = ''
            if cmd == 'exit':
                cs.close()
                break
            elif cmd == 'stop':
                start = 0
                cs.close()
                sys.exit(0)
            elif cmd == 'list':
                b = ''
                for l in s_list:
                    b += '%4d %s:%d\r\n' % l
 
                if len(b) > 0:
                    cs.send(b)
            elif cmd in ('help','?'):
                cs.send("""-------------------------------------------\r
exit\r
    exit telnet\r
start localport serverip:serverport\r
    start a new server\r
list\r
    list all server\r
-------------------------------------------\r
""")
            else:
                cmds = cmd.split(" ",1)
                if len(cmds) > 1 and cmds[0] == 'start':
                    args = cmds[1].strip().split(" ",1)
                    if len(args) != 2:
                        cs.send('start localport serverip:serverport\r\n')
                        continue
                    arg = args[1].split(":",1)
                    if len(arg) != 2:
                        cs.send('start localport serverip:serverport\r\n')
                        continue
 
                    try:
                        l_port = int(args[0])
                        s_ip = arg[0]
                        s_port = int(arg[1])
                    except:
                        cs.send('start localport serverip:serverport\r\n')
                        continue
 
                    thread.start_new_thread(mainserver,(l_port,s_ip,s_port))
                    cs.send('start OK!\r\n')
                else:
                    cs.send('no command [%s]\r\n' % cmd)
                    continue
 
def main():
    global start
 
    if len(sys.argv) == 3:
        try:
            l_port = int(sys.argv[1])
            s_ip,s_port = sys.argv[2].split(":")
            s_port = int(s_port)
            thread.start_new_thread(mainserver,(l_port,s_ip,s_port))
        except:
            pass
 
    start = 1
 
    thread.start_new_thread(manager,(9000,))
 
    while start:
        time.sleep(1)
 
if __name__ == '__main__':
 
    start = 0
 
    main()

How to get a Month Name in Python

>>> import datetime
>>> named_month = lambda month_num:datetime.date(1900,month_num,1).strftime('%B')
>>> print named_month(5)
May
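
The same lookup written as a plain function, as a sketch; the standard calendar module exposes the month names through calendar.month_name, so it serves as a cross-check. Both depend on the current locale, so the names are English only under an English or default C locale.

```python
import calendar
import datetime

def named_month(month_num):
    # %B formats the full month name for the current locale.
    return datetime.date(1900, month_num, 1).strftime('%B')

names = [named_month(m) for m in range(1, 13)]
```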

Getting the IP address in Python

>>>import socket
>>>print socket.gethostbyname(socket.gethostname())    # same on Windows and Mac

Bulk email

#!/usr/bin/env python
# -*- coding: utf8 -*-
#Put the recipient list, one address per line, in list.txt
#Put the content of the mail to send in mail.eml
 
import smtplib
import time
from email.MIMEBase import MIMEBase
from email.MIMEText import MIMEText
from email.MIMEMultipart import MIMEMultipart
import email
 
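#SMTP server address
smtp_server = '127.0.0.1'
 
#sender address
from_usr = 'test@abc.com'
 
#mail subject
title = 'Test Mail'
 
#delay between messages, in seconds
delay = 0.1
 
#whether the SMTP server requires authentication
#1 if authentication is required, 0 if not
auth = 0
 
#if authentication is required, enter the user name and password below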
log_usr=""
log_passwd=""
 
def gingerMail(smtp_server,from_usr,to_usr,title,msg,auth,log_usr,log_passwd):
   server = smtplib.SMTP(smtp_server)
#   server.set_debuglevel(1)
   if auth == 1:
       server.login(log_usr,log_passwd)
   subject= to_usr.split('@')[0]+', Look this: '+title
   msg.replace_header('Subject',email.Header.Header(subject, 'utf-8'))
   msg.replace_header('Date',time.ctime())
   msg.replace_header('From',from_usr)
   msg.replace_header('To',to_usr)
   server.sendmail(from_usr, to_usr, msg.as_string())
   server.quit()
 
f_list='list.txt';
fp=open('mail.eml','r')
msg=email.message_from_file(fp)
fp.close()
 
try:
   f=open(f_list,'r')
   lines=f.readlines()
   #counter of mails sent
   cnt = 0
   for line in lines:
       to_usr = line.strip()
       cnt = cnt + 1
       gingerMail(smtp_server,from_usr,to_usr,title,msg,auth,log_usr,log_passwd)
       print "Email No." + str(cnt) + " has been sent to: " + to_usr + ""
       time.sleep(delay)
   f.close()
   print "......All Finished!!!"
except IOError as err:
   print 'error:', err

Directory statistics

################################################################################
##############
#  SpaceFinder.py v.1  07/01/2004
#  Plagerized from many sources by: triggernum5
#  If you see your code in here, then by all means claim the credit
#  Use:  Enter the name of the root directory you wish scanned  in the name field on line #67
#          Press White button to view output.
#  Notice:  May take a while when run on huge directories
#  Future versions will incorporate a browsing interface, and directory depth options.
#  For now please bear in mind that I began stealing this code today
################################################################################
##############
 
import os
 
listG = []
 
def GetTotalFileSize(dummy_param, directory, list_of_files):
    '''Given a list of files and the directory they're in, add the
    total size and directory name to the global list listG.
    '''
    global listG
    currdir = os.getcwd()
    os.chdir(directory)
    total_size = 0
    if len(list_of_files) != 0:
        for file in list_of_files:
            if file == ".." or file == ".": continue
            try:
                size = os.path.getsize(file)
                total_size = total_size + size
            except OSError:
                continue
    listG.append([total_size, directory])
    os.chdir(currdir)
 
def GetSize(directory):
    '''Fills the global list listG with entries of the form
    [ [a, b], [c, d], ... ] where a, c, ... are the number of total
    bytes in the directory and b, d, ... are the directory names.
    The indicated directory is recursively descended and the results
    are sorted by directory size with the largest directory at the
    beginning of the list.
    '''
    global listG
    listG = []
    os.path.walk(directory, GetTotalFileSize, "")
    listG.sort()
    listG.reverse()
 
def ShowBiggestDirectories(directory):
    GetSize(directory)
    # Get total number of bytes
    total_size = 0
    for dir in listG:
        total_size = total_size + dir[0]
    if total_size != 0:
        print "For directory '%s': " % directory,
        print "[total bytes = %.1f MB]" % (total_size / (1024.0*1024))
        print "Size            -    Directory Name"
        print ""
        print "---------------- " + "-" * 50
        not_shown_count = 0
        for dir in listG:
            # Float division so sizes below 1 MB are not truncated to 0
            dirsize = dir[0] / (1024.0*1024)
            dir[0] = 100.0 * dir[0] / total_size
            # regsub was removed in Python 2.5; str.replace does the same job here
            dir[1] = dir[1].replace("\\", "/")
            print "%6.1fMB %s" % (dirsize, dir[1])
 
if __name__ == '__main__':
    name = 'f:\\games\\'
    ShowBiggestDirectories(name)

haproxy log parser to mongodb

#!/usr/bin/env python
 
import re, time, sys, gzip, mmap
import thread
from datetime import datetime
from subprocess import Popen, PIPE, STDOUT
from pymongo import Connection
from pymongo.errors import CollectionInvalid
from multiprocessing import *
from uasparser import UASparser
import httpagentparser
 
''' exp
'''
 
def yesterday():
    return time.strftime('%Y%m%d',time.localtime(time.time() - 24*60*60) )
 
def insert_data(date):
    data = []
#    logs = open('/var/log/haproxy/haproxy.log.%s' % (date), 'r')
    with open('/var/log/haproxy/haproxy.log.%s' % (date), 'r+b') as f:
        logs = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
        for line in iter(logs.readline, ""):
            msg = parse_line(line)
            if msg:
                if msg["domain_ua_ssl_referer"]:
                    domain_ua_ssl_referer_msg = domain_ua_ssl_referer(msg.pop('domain_ua_ssl_referer'))
 
                if msg["reptype_replength"]:
                    reptype_replength_msg = reptype_replength(msg.pop('reptype_replength'))
 
                if msg["method_uri_protocol"]:
                    method_uri_protocol_msg = parse_method_uri_protocol(msg.pop('method_uri_protocol'),msg["status_code"])
 
                if domain_ua_ssl_referer_msg["user_agent"]:
                    ua_msg = httpagentparser.detect(domain_ua_ssl_referer_msg["user_agent"])
                #ua_msg = ua_parse(msg["user_agent"])
                #if msg["user_agent"]:
                #    ua_msg = ua_parse(msg["user_agent"])
                #else:
                #    ua_msg = ua_parse(" ")
 
                time_msg = timestamp(msg["accept_date"])
                #print uri_msg
                #print time_msg
                t_msg = dict(domain_ua_ssl_referer_msg, **reptype_replength_msg)
                t_msg = dict(method_uri_protocol_msg, **t_msg)
                t_msg = dict(ua_msg, **t_msg)
                msg = dict(msg, **t_msg)
                mongo_coll.insert(msg)
                #data.append(msg)
                #if len(data) == 100:
                #    mongo_coll.insert(data)
                #    data = []
            else:
                print line
                sys.exit()
        logs.close()
 
def parse_line(line):
    haproxy_re = (
    r'(?P<date>\w+\s+\d+\s+\d+:\d+:\d+)\s+'
    r'(?P<host>\w+)\s+'
    r'haproxy\[(?P<pid>\d+)\]:\s+'
    r'(?P<client_ip>(\d{1,3}\.){3}\d{1,3}):(?P<client_port>\d{1,5})\s+'
    r'\[(?P<accept_date>.*)\]\s+'
    r'(?P<frontend_name>[\w-]+)\s+'
    r'(?P<backend_name>\S+)/(?P<backendnode>\S+)\s+'
    r'(?P<Tq>(-1|\d+))/(?P<Tw>(-1|\d+))/(?P<Tc>(-1|\d+))/(?P<Tr>(-1|\d+))/(?P<Tt>\+?\d+)\s+'
    r'(?P<status_code>(-1|\d+)?) (?P<bytes_read>\+?\d+)\s+'
    r'(?P<captured_request_cookie>\S+) (?P<captured_response_cookie>\S+)\s+'
    r'(?P<termination_state>[\w-]{4}) (?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/(?P<srv_conn>\d+)/(?P<retries>\d+)\s+'
    r'(?P<server_queue>\d+)/(?P<listener_queue>\d+)\s+'
    r'(\{(?P<domain_ua_ssl_referer>.*?)\})?\s+'
    r'(\{(?P<reptype_replength>.*?)\})\s+'
    r'(\"(?P<method_uri_protocol>.*?)\")'
    )
 
#    r'(\{(?P<domain>[\S\s]+)?\|(?P<ssl>[\w-]+)?\|(?P<user_agent>.*?)\|(?P<req_content>\d+)?\|(?P<referer>[\S+\s+]+)?\})?\s+'
#    r'(\{(?P<rep_content_type>[\S+]+)?\|(?P<rep_content_length>\d+)?\})?\s+'
#    r'(("(?P<method>\S+) (?P<uri>\S+?) (?P<protocol>.*?)")?)'
 
    haproxy_re = re.compile(haproxy_re)
 
    m = haproxy_re.match(line)
    if m:
        msg = m.groupdict()
        return msg
    else:
        return {}
 
def domain_ua_ssl_referer(domain_ua_ssl_referer):
    fields = ["domain", "ssl", "user_agent", "req_content", "referer"]
    parsed = domain_ua_ssl_referer.split("|")
    domain_ua_ssl_referer_msg = dict(zip(fields, parsed))
    return domain_ua_ssl_referer_msg
 
def reptype_replength(reptype_replength):
    fields = ["rep_content_type", "rep_content_length"]
    parsed = reptype_replength.split("|")
    reptype_replength_msg = dict(zip(fields, parsed))
    return reptype_replength_msg
 
def parse_method_uri_protocol(method_uri_protocol,stat_code):
    fields = ["method", "uri", "protocol", "serviceid", "siteid", "path"]
 
    parsed = method_uri_protocol.split()
 
    if parsed[1]:
        parts = parsed[1].split("/")
        if len(parts) > 3 and stat_code != "400" :
            uri_msg = [parts[1], parts[2], "/".join(parts[3:])]
        else:
            uri_msg =[]
 
    parsed = parsed + uri_msg
    method_uri_protocol_msg = dict(zip(fields, parsed))
    return method_uri_protocol_msg
 
def ua_parse(user_agent):
    user_agent = list(user_agent)
    android_ua_re = (
    r'[\S+]+\s+'
    r'[\S+]+\s+'
    r'[\S+]+\s+'
    r'(?P<device_os>[\S+]+)\s+(?P<device_version>[\d\w\.-]+);\s+\S+\s+(?P<device_name>[\S+\s+]+)?\s+Build/\S+\)\s+'
    r'\S+\s+\(\S+\s+\S+\s+\S+\)\s+'
    r'Version/(?P<browser_version>[\d\.]+)? Mobile\s+(?P<browser_name>\w+)/[\d\. ]+'
    )
    android_ua_re = re.compile(android_ua_re,re.I)
 
    ios_ua_re = (
    r'\S+\s+'
    r'\((?P<device_name>[\w+]+)?;\s+'
    r'(U;)?\s+'
    r'\S+\s+'
    r'\S+\s+\S+\s+(?P<device_version>[\S+]+)\s+'
    r'\S+\s+'
    r'(?P<device_os>Mac OS X)[\S+\s+]+'
    r'Version/(?P<browser_version>[\w+\.]+)\s+'
    r'[\S+\s+]+\s+'
    r'(?P<browser_name>[\w+]+)'
    )
    ios_ua_re = re.compile(ios_ua_re,re.I)
 
    win_ua_re = (
    r'\S+\s+'
    r'\S+\s+'
    r'(?P<browser_name>[\S+]+)\s+'
    r'(?P<browser_version>[\S+]+);\s+'
    r'(?P<device_os>Windows Phone)\s+\S+\s+'
    r'(?P<device_version>[\S+]+);\s+'
    r'\S+\s+'
    r'\S+\s+'
    r'\S+\s+'
    r'(?P<device_name>[\S+]+);\s+'
    )
 
    win_ua_re = re.compile(win_ua_re,re.I)
 
 
#    uas_parse = UASparser()
#    ua_msg = uas_parse.parse(user_agent)
#    return ua_msg
 
def timestamp(accept_date):
    timestamp_msg = {"timestamp" : time.mktime(datetime.strptime(accept_date,"%d/%b/%Y:%H:%M:%S.%f").timetuple())}
    return timestamp_msg
 
if __name__ == '__main__':
    DB_NAME = 'haproxy'
    mongo_conn = Connection()
    mongo_db = mongo_conn[DB_NAME]
    if len(sys.argv) == 1:
        COLLECTION_NAME = 'access%s' % (yesterday())
        date = yesterday()
    else:
        COLLECTION_NAME = 'access%s' % (sys.argv[1])
        date = sys.argv[1]
 
    try:
#        mongo_coll = mongo_db.create_collection(COLLECTION_NAME, capped=True, size=MAX_COLLECTION_SIZE*1048576)
        mongo_coll = mongo_db.create_collection(COLLECTION_NAME)
    except CollectionInvalid:
        mongo_coll = mongo_db[COLLECTION_NAME]
 
    insert_data(date)

Generating a shadow-format password

python -c 'import crypt; print crypt.crypt("password", "$1$Ti.VaigZ")'
python.txt · Last modified: 2012/11/12 14:07 by admin
CC Attribution-Noncommercial-Share Alike 3.0 Unported