Plan 9 from Bell Labs’s /usr/web/sources/contrib/stallion/root/sys/lib/python2.7/robotparser.pyo

Copyright © 2021 Plan 9 Foundation.
Distributed under the MIT License.


�`^c@s}dZddlZddlZdgZddd��YZddd��YZdd
d��YZd	ejfd
��YZdS(s& robotparser.py

    Copyright (C) 2000  Bastian Kleineidam

    You can choose between two licenses when using this package:
    1) GNU GPLv2
    2) PSF license for Python 2.2

    The robots.txt Exclusion Protocol is implemented as specified in
    http://www.robotstxt.org/norobots-rfc.txt

i�NtRobotFileParsercBsbeZdZdd�Zd�Zd�Zd�Zd�Zd�Zd�Z	d	�Z
d
�ZRS(ss This class provides a set of methods to read, parse and answer
    questions about a single robots.txt file.

    tcCs>g|_d|_t|_t|_|j|�d|_dS(Ni(tentriestNonet
default_entrytFalsetdisallow_allt	allow_alltset_urltlast_checked(tselfturl((s!/sys/lib/python2.7/robotparser.pyt__init__s				
cCs|jS(s�Returns the time the robots.txt file was last fetched.

        This is useful for long-running web spiders that need to
        check for new robots.txt files periodically.

        (R	(R
((s!/sys/lib/python2.7/robotparser.pytmtime!scCsddl}|j�|_dS(sYSets the time the robots.txt file was last fetched to the
        current time.

        i�N(ttimeR	(R
R((s!/sys/lib/python2.7/robotparser.pytmodified*scCs/||_tj|�dd!\|_|_dS(s,Sets the URL referring to a robots.txt file.iiN(Rturlparsethosttpath(R
R((s!/sys/lib/python2.7/robotparser.pyR2s	cCs�t�}|j|j�}g|D]}|j�^q"}|j�|j|_|jdkrkt|_nO|jdkr�|jdkr�t|_n%|jdkr�|r�|j	|�ndS(s4Reads the robots.txt URL and feeds it to the parser.i�i�i�i�i�N(i�i�(
t	URLopenertopenRtstriptcloseterrcodetTrueRRtparse(R
topenertftlinetlines((s!/sys/lib/python2.7/robotparser.pytread7s	
cCsAd|jkr-|jdkr=||_q=n|jj|�dS(Nt*(t
useragentsRRRtappend(R
tentry((s!/sys/lib/python2.7/robotparser.pyt
_add_entryEscCs&d}d}t�}|j�x�D]�}|d7}|s�|dkrZt�}d}q�|dkr�|j|�t�}d}q�n|jd�}|dkr�|| }n|j�}|s�q&n|jdd�}t|�dkr&|dj�j�|d<tj	|dj��|d<|ddkru|dkrX|j|�t�}n|j
j|d�d}q|ddkr�|dkr�jjt
|dt��d}q�|ddkr|dkr�jjt
|dt��d}q�q&q&W|dkr"|j|�nd	S(
s�parse the input lines from a robots.txt file.
           We allow that a user-agent: line is not preceded by
           one or more blank lines.iiit#t:s
user-agenttdisallowtallowN(tEntryRR#tfindRtsplittlentlowerturllibtunquoteR R!t	rulelinestRuleLineRR(R
Rtstatet
linenumberR"Rti((s!/sys/lib/python2.7/robotparser.pyRNsP	


		
	

	cCs�|jr
tS|jrtS|js'tStjtj|��}tjdd|j	|j
|j|jf�}tj
|�}|s�d}nx-|jD]"}|j|�r�|j|�Sq�W|jr�|jj|�StS(s=using the parsed robots.txt decide if useragent can fetch urlRt/(RRRRR	RR-R.t
urlunparseRtparamstquerytfragmenttquoteRt
applies_tot	allowanceR(R
t	useragentRt
parsed_urlR"((s!/sys/lib/python2.7/robotparser.pyt	can_fetch�s$					cCs-djg|jD]}t|�d^q�S(NRs
(tjoinRtstr(R
R"((s!/sys/lib/python2.7/robotparser.pyt__str__�s(t__name__t
__module__t__doc__RR
RRRR#RR>RA(((s!/sys/lib/python2.7/robotparser.pyRs								4	 R0cBs)eZdZd�Zd�Zd�ZRS(soA rule line is a single "Allow:" (allowance==True) or "Disallow:"
       (allowance==False) followed by a path.cCsS|dkr|rt}ntjtj|��}tj|�|_||_dS(NR(RRR5R-R9RR;(R
RR;((s!/sys/lib/python2.7/robotparser.pyR�s
	cCs|jdkp|j|j�S(NR(Rt
startswith(R
tfilename((s!/sys/lib/python2.7/robotparser.pyR:�scCs|jrdpdd|jS(NtAllowtDisallows: (R;R(R
((s!/sys/lib/python2.7/robotparser.pyRA�s(RBRCRDRR:RA(((s!/sys/lib/python2.7/robotparser.pyR0�s		R(cBs2eZdZd�Zd�Zd�Zd�ZRS(s?An entry has one or more user-agents and zero or more rulelinescCsg|_g|_dS(N(R R/(R
((s!/sys/lib/python2.7/robotparser.pyR�s	cCsjg}x'|jD]}|jd|dg�qWx*|jD]}|jt|�dg�q:Wdj|�S(NsUser-agent: s
R(R textendR/R@R?(R
trettagentR((s!/sys/lib/python2.7/robotparser.pyRA�scCs]|jd�dj�}x=|jD]2}|dkr9tS|j�}||kr#tSq#WtS(s2check if this entry applies to the specified agentR4iR(R*R,R RR(R
R<RK((s!/sys/lib/python2.7/robotparser.pyR:�scCs.x'|jD]}|j|�r
|jSq
WtS(sZPreconditions:
        - our agent applies to this entry
        - filename is URL decoded(R/R:R;R(R
RFR((s!/sys/lib/python2.7/robotparser.pyR;�s(RBRCRDRRAR:R;(((s!/sys/lib/python2.7/robotparser.pyR(�s
			
RcBs#eZd�Zd�Zd�ZRS(cGs tjj||�d|_dS(Ni�(R-tFancyURLopenerRR(R
targs((s!/sys/lib/python2.7/robotparser.pyR�scCsdS(N(NN(R(R
Rtrealm((s!/sys/lib/python2.7/robotparser.pytprompt_user_passwd�scCs(||_tjj||||||�S(N(RR-RLthttp_error_default(R
RtfpRterrmsgtheaders((s!/sys/lib/python2.7/robotparser.pyRP�s	(RBRCRRORP(((s!/sys/lib/python2.7/robotparser.pyR�s		((((	RDRR-t__all__RR0R(RLR(((s!/sys/lib/python2.7/robotparser.pyt<module>s	�$
