The Luoyang Shovel's Blog

October 13, 2010

Python generators

Filed under: Python — HackGou @ 17:58

This example comes from the documentation for yield:

def echo(value=None):
    # print "Execution starts when 'next()' is called for the first time."
    try:
        i = 1
        while True:
            try:
                a = (yield value + i)
                i = i + 1
                print "The value of yield:", a
            except GeneratorExit:
                # never catch GeneratorExit
                raise
            except Exception, e:
                a = e
    finally:
        print "Don't forget to clean up when 'close()' is called."

print "generator starting"
generator = echo(1)
print "generator returned"
print
print "1st time next starting:"
t = generator.next()
print "return of the 1st next:", t
print "1st time next end"
print
print "2nd time next starting:"
t = generator.next()
print "return of the 2nd next:", t
print "2nd time next end"
print
print "3rd time next starting:"
t = generator.send("My Value")
print "return of the send:", t
print "3rd time next end"
print
generator.throw(TypeError, "spam")
print
generator.close()

The output:

generator starting
generator returned

1st time next starting:
return of the 1st next: 2
1st time next end

2nd time next starting:
The value of yield: None
return of the 2nd next: 3
2nd time next end

3rd time next starting:
The value of yield: My Value
return of the send: 4
3rd time next end

Don't forget to clean up when 'close()' is called.

In other words, from the example above we can see how yield behaves:

1. next() and send(None) are equivalent.

2. With send(value) and the statement a = (yield value+i): value becomes the value of the yield expression, i.e. a = value, while value+i becomes the return value of send(value).

That is: a = value, and the value of send() is value+i.

3. next() returns the expression after yield, i.e. next() returns value+i.
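These three points can be checked directly. Here is a minimal Python 3 sketch of the same generator (print becomes a function, generator.next() becomes next(generator), and the exception handling is stripped down to the essentials):

```python
def echo(value=None):
    # simplified port of the echo generator above to Python 3
    i = 1
    while True:
        # send(x) makes this yield expression evaluate to x;
        # a plain next() makes it evaluate to None
        a = yield value + i
        i = i + 1

g = echo(1)
first = next(g)             # runs to the first yield: returns value + i == 2
second = g.send(None)       # send(None) is equivalent to next(g): returns 3
third = g.send("My Value")  # "My Value" is assigned to a; send returns value + i == 4
```

The returned values 2, 3, 4 match the output of the Python 2 run above.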

References:

[1]: 奇妙的流控制:Python中的迭代器与生成器 ("Marvelous flow control: iterators and generators in Python"): http://developer.51cto.com/art/201007/212841.htm

yield in Python and Ruby

Filed under: Python, ruby — HackGou @ 17:56

Both Python and Ruby have yield, but their usages are worlds apart.

In Python, yield is used in generators. Quoting [1]:

Simply put, yield makes a function a generator: a generator is a function that remembers where it was in the function body the last time it returned. The second (or nth) call to the generator jumps back into the middle of the function, with all local variables from the previous call unchanged.

A generator is a function.
All of the function's parameters and locals are preserved.
The second time the function is called, it resumes with the state preserved from the previous call.

A generator "remembers" not only its data state; it also "remembers" its position within the flow-control constructs (which, in imperative programming, are more than just data values). Continuations are more general still, since they let you jump arbitrarily between execution frames instead of always returning to the immediate caller's context, as a generator does.

Here is an example:

>>> def tt(x):
...     print "starting"
...     yield x
...     print "continue"
...     yield x+10
...     print "3rd step"
...     yield x*5
... 
>>> i=tt(1)
>>> i.next()
starting
1
>>> i.next()
continue
11
>>> i.next()
3rd step
5
>>> i.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> for k in tt(1):
...     print k
... 
starting
1
continue
11
3rd step
5
>>> type(i)
<type 'generator'>
>>> type(type(i))
<type 'type'>
>>> type(tt)
<type 'function'>
>>>

In Ruby, by contrast, yield invokes the block passed to the method. For example:

irb(main):001:0> a=[1,2,3]
=> [1, 2, 3]
irb(main):002:0> def foo(a1)
irb(main):003:1> puts "starting"
irb(main):004:1> yield a1
irb(main):005:1> puts "finished"
irb(main):006:1> end
irb(main):007:0> a.each do | a1 |
irb(main):008:1* foo(a1) { | x | puts x*x }
irb(main):009:1> end
starting
1
finished
starting
4
finished
starting
9
finished
=> [1, 2, 3]
irb(main):010:0>

As you can see, yield a1 is in fact a call to the block { |x| puts x*x }.
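To make the contrast concrete, here is a sketch in Python of what Ruby's block-yield amounts to: the block becomes an explicit callable argument, and yield becomes a plain call to it. The names (foo, block, squares) are illustrative only.

```python
def foo(a1, block):
    # Ruby: def foo(a1) ... yield a1 ... end
    print("starting")
    block(a1)  # Ruby's `yield a1` invokes the block passed to the method
    print("finished")

squares = []
for a1 in [1, 2, 3]:
    # Ruby: foo(a1) { |x| puts x*x }
    foo(a1, lambda x: squares.append(x * x))
```

After the loop, squares holds [1, 4, 9], mirroring the 1, 4, 9 printed in the irb session.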

[1]: 《Python yield 用法》 ("Usage of Python yield"): http://www.pythonclub.org/python-basic/yield

March 29, 2010

Moin wiki farm

Filed under: Python — HackGou @ 10:59

Two parts:

  1. Steps 0-2: all of them are required.
  2. Steps 3, 4, 5: web server configuration; you only need one of them.

All of this installation is done in $HOME, so no root privileges are needed.

step 0. install/upgrade Moin to 1.9.2 (when upgrading, I recommend deleting the old release first; I didn't do this, which caused a big problem)

For a fresh install, ignore this. This is the upgrade:

1.8.3->1.9.2

python setup.py install

gavin_kou@shadow:~/downloads/python/moin-1.9.2$ ll /home/gavin_kou/local/lib/python2.5/site-packages/moin-* -d
drwxrwxr-x 6 gavin_kou pg2184500 4096 2009-06-03 02:32 /home/gavin_kou/local/lib/python2.5/site-packages/moin-1.8.3-py2.5.egg
-rw-rw-r-- 1 gavin_kou pg2184500 3183 2010-03-10 02:26 /home/gavin_kou/local/lib/python2.5/site-packages/moin-1.9.2-py2.5.egg-info

>>> import MoinMoin.version

>>> MoinMoin.version.release
'1.9.2'

There is a special directory, ~/local/share/moin/: it contains all of the wiki initialization data, especially the data and underlay directories, plus some files in the server and config directories: server (e.g. moin.fcgi is the server script for a FastCGI environment, moin.cgi is the CGI script, etc.) and config (e.g. config/wikifarm/farmconfig.py is the sample wiki farm configuration).

gavin_kou@shadow:~/sites/wiki/bin$ ll ~/local/share/moin/
total 16
drwxrwxr-x 5 gavin_kou pg2184500 4096 2010-03-10 02:26 config
drwxrwxr-x 7 gavin_kou pg2184500 4096 2010-03-10 02:26 data
drwxrwxr-x 2 gavin_kou pg2184500 4096 2010-03-10 02:26 server
drwxrwxr-x 3 gavin_kou pg2184500 4096 2010-03-10 02:26 underlay

step 1. design the directory structure; all the settings below are based on this structure. If you change the structure, DO NOT forget to change the corresponding setting files.

wiki/
├─bin/
│ ├─mointwisted
│ ├─mointwisted.py
│ ├─moin.fcgi
│ ├─moin.cgi
│ └─moin
├─config/
│ ├─farmconfig.py
│ ├─hackgou.py
│ └─hiking.py
├─data/
│ ├─hackgou
│ │ ├─data/
│ │ └─underlay/
│ ├─hiking/
│ │ ├─data/
│ │ └─underlay/
│ └─user/
└─static/
  └─htdocs/
and create it by:

mkdir -p wiki/{bin,config,data/hackgou,data/hiking,static}

step 1 (continued). copy the data (the data and underlay directories):

cp -rp ~/local/share/moin/data ~/local/share/moin/underlay wiki/data/hackgou
cp -rp ~/local/share/moin/data ~/local/share/moin/underlay wiki/data/hiking

step 2. copy ~/local/share/moin/config/wikifarm/farmconfig.py and ~/local/share/moin/config/wikifarm/mywiki.py to your config dir:

cp ~/local/share/moin/config/wikifarm/farmconfig.py wiki/config/
cp ~/local/share/moin/config/wikifarm/mywiki.py wiki/config/hackgou.py
cp ~/local/share/moin/config/wikifarm/mywiki.py wiki/config/hiking.py

and change farmconfig.py:

wikis = [
    ("hiking", r'^http://hackgou.itbbq.com/wiki/hiking.*$'),
    ("hackgou", r'^http://hackgou.itbbq.com/.*$'),
]
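The farm picks a wiki by matching the request URL against these patterns in order, so the more specific pattern must come first. A rough sketch of that dispatch logic (an illustration of the idea, not MoinMoin's actual code):

```python
import re

wikis = [
    ("hiking", r'^http://hackgou.itbbq.com/wiki/hiking.*$'),
    ("hackgou", r'^http://hackgou.itbbq.com/.*$'),
]

def pick_wiki(url):
    # return the name of the first wiki whose URL pattern matches, or None
    for name, pattern in wikis:
        if re.match(pattern, url):
            return name
    return None
```

If the order were reversed, every URL would match the hackgou pattern first and the hiking wiki would be unreachable.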

add the following code to both hackgou.py and hiking.py:

import os
app_root = os.path.realpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), '..'))
data_root = os.path.join(app_root, 'data')
data_dir = os.path.join(data_root, __name__, 'data')
data_underlay_dir = os.path.join(data_root, __name__, 'underlay')
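Because the snippet uses __file__ and __name__, each copy of the file computes its own data directories from its location and module name. A sketch of the resulting paths, with the same logic pulled into a function (the helper name and the example path are illustrative only):

```python
import os

def wiki_paths(config_file, wiki_name):
    # config_file plays the role of __file__, wiki_name the role of __name__
    app_root = os.path.realpath(
        os.path.join(os.path.dirname(os.path.realpath(config_file)), '..'))
    data_root = os.path.join(app_root, 'data')
    return (os.path.join(data_root, wiki_name, 'data'),
            os.path.join(data_root, wiki_name, 'underlay'))

# e.g. a config file at <app>/config/hackgou.py maps to <app>/data/hackgou/...
data_dir, underlay_dir = wiki_paths('/srv/wiki/config/hackgou.py', 'hackgou')
```

So hackgou.py and hiking.py can share identical code yet point at wiki/data/hackgou and wiki/data/hiking respectively.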

step 3. configure mointwisted:

add the following lines to mointwisted.py:

import sys, os
app_root = os.path.realpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), '..'))
sys.path.insert(0, os.path.join(app_root, 'config'))

step 4. configure WSGI:

cp ~/local/share/moin/server/moin.wsgi bin/

change the following code:

app_root = os.path.realpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), '..'))
sys.path.insert(0, os.path.join(app_root, 'config'))

application = make_application(shared=os.path.join(app_root, 'static', 'htdocs'))

step 5. configure FastCGI; two files: moin.fcgi and .htaccess.

cp ~/local/share/moin/server/moin.fcgi bin/ and change it.

Change the following lines:

import sys, os

sys.path.insert(0, '/home/gavin_kou/local/lib/python2.5/site-packages')

app_root = os.path.realpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), '..', 'wiki'))  # based on the directory structure above
sys.path.insert(0, os.path.join(app_root, 'config'))

from MoinMoin import log
# enable logging, for troubleshooting; log.conf is the logger config file.
# if you enable it, make sure it is correct
log.load_config(os.path.join(app_root, 'config', 'log.conf'))
logging = log.getLogger(__name__)

from MoinMoin.web.serving import make_application
app = make_application(shared=os.path.join(app_root, 'static', 'htdocs'))  # <-- adapt here to the directory structure above

fix_script_name = '/wiki'

add the following lines to .htaccess:

AddHandler fastcgi-script .fcgi
RewriteRule wiki(/?.*) /moin.fcgi/$1 [L]

other advanced settings:

user_dir = os.path.join(app_root, 'data', 'user')
# session data is stored in cache_dir/__session_, so
# if wikis share the cache_dir, they share login information (SSO)
cache_dir = os.path.join(app_root, 'var', 'cache')

November 6, 2009

crawler collection

Filed under: Python — HackGou @ 18:15

Python-based crawlers:

  1. Atomisator: http://atomisator.ziade.org/ to build custom RSS feeds
  2. Orchid: http://pypi.python.org/pypi/Orchid/1.1
    Orchid is a python crawler I developed for one of my graduate courses. It is a generic multi-threaded web crawler complete with documentation. We used this crawler to locate web pages which contained malicious code. However, the logic of what to do with the crawled pages is implemented in a separate class and therefore Orchid can easily be used for any application which requires crawling the web
  3. Ruya: http://pypi.python.org/pypi/Ruya/1.0
    Ruya is a Python-based breadth-first, level-based, delayed, event-based crawler for crawling English and Japanese websites. It is targeted solely towards developers who want crawling functionality in their projects via an API, with crawl control
  4. harvestman:
    HarvestMan (with a capital ‘H’ and a capital ‘M’) is a webcrawler program. HarvestMan belongs to a family of
    programs frequently addressed as webcrawlers, webbots, web-robots, offline browsers etc.
    These programs are used to crawl a distributed network of computers like the Internet and download files locally
    1. http://code.google.com/p/harvestman-crawler/
    2. http://www.harvestmanontheweb.com/
  5. Scrapy: http://dev.scrapy.org/
    Scrapy is an application framework for crawling web sites and extracting structured data which can be used for a wide range of useful applications, like data mining, information processing or historical archival.
    Even though Scrapy was originally designed for screen scraping, it can also be used to extract data using APIs (such as Amazon Associates Web Services) or as a general purpose web crawler.
    The purpose of this document is to introduce you to the concepts behind Scrapy so you can get an idea of how it works and decide if Scrapy is what you need.
  6. Webstemmer : http://www.unixuser.org/~euske/python/webstemmer/index.html
    Webstemmer is a web crawler and HTML layout analyzer that automatically extracts main text of a news site without having banners, ads and/or navigation links mixed up

Other crawlers:

  1. droids http://incubator.apache.org/droids/
  2. Heritrix: http://crawler.archive.org/articles/user_manual/creating.html


January 17, 2008

Questioning again how Django + mod_python handles environment variables

Filed under: Apache, django, Python — HackGou @ 12:12

Today, while upgrading a Django-based system to V2, I found that ~/ was not correctly expanded to /home/hackgou (Apache was running as the hackgou account), nor to /root (Apache had been started as root). This seemed very strange; even os.path.expanduser('~/') was of no help. So I suspected that os.environ['HOME'] was wrong, since expanduser needs that variable to expand ~/. I then tried SetEnv HOME /home/hackgou/, but that did not work either. Later, in the ModPythonHandler in django/core/handlers/modpython.py, I found that the Django developers had already noticed that mod_python ignores Apache's SetEnv directive:

# mod_python fakes the environ, and thus doesn't process SetEnv. This fixes that
os.environ.update(req.subprocess_env)

But this contradicts SetEnv DJANGO_SETTINGS_MODULE app.settings. How confusing: it neither sets the right value itself nor honors what the administrator specifies. What to do?

Later I found some similar cases at http://code.google.com/p/modwsgi/wiki/ApplicationIssues. It mentions a bug with sudo that makes HOME differ from the HOME you get when starting as root. Great, that is exactly the bug I wanted. I gave the hackgou account sudo rights and started Apache with sudo, and sure enough every expanduser('~/') worked fine.

The page also mentions code like:

import os, pwd
os.environ["HOME"] = pwd.getpwuid(os.getuid()).pw_dir

which seems to solve the problem. But heavens, I really dislike this approach:

1. There seems to be no place in Django to put code like this that every app needs. So much for DRY.
2. Modifying environment variables seems to have become a habit for Django. I ran into a TIME_ZONE problem before that was caused exactly by modifying Apache's environment variables, which polluted the environment of other applications. Remember that an Apache process is shared by many applications: besides this Django app there may be other Django apps or PHP handled in the same Apache process space, and changing the environment pollutes their working environment. (It is said the differences may come from Apache's own settings and have nothing to do with Python.)

There seems to be no good way to solve this cleanly, and I find this sloppy behavior of Django, or rather of mod_python, very annoying. Is there a better, more thorough solution? If you know one, I would love to hear it.
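To see why a wrong HOME breaks things: os.path.expanduser consults the HOME environment variable first, and the workaround quoted above simply rebuilds HOME from the process's uid. A small demonstration (the path /home/hackgou is from the post, used here only as a stand-in):

```python
import os
import pwd

# expanduser('~/') is driven by $HOME, which is exactly what mod_python got wrong
os.environ["HOME"] = "/home/hackgou"
expanded = os.path.expanduser("~/")  # now "/home/hackgou/"

# the workaround quoted above: derive HOME from the real uid instead
os.environ["HOME"] = pwd.getpwuid(os.getuid()).pw_dir
```

The downside, as noted above, is that this mutates process-wide state shared by every application in the same Apache process.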

