Code snippets for symfony 1.x

Navigation

Refine Tags

Snippets tagged "xml"

translate messages with html tags

Example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xliff PUBLIC "-//XLIFF//DTD XLIFF//EN" "http://www.oasis-open.org/committees/xliff/documents/xliff.dtd">
<xliff version="1.0">
    <file original="global" source-language="en" datatype="plaintext" date="2009-08-12T12:32:00Z">
        <header/>
        <body>
            ...
            <trans-unit>
                <source>from %from_date%&lt;br /&gt;to %to_date%</source>
                <target>de %from_date%&lt;br /&gt;até %to_date%</target>
            </trans-unit>
        </body>
    </file>
</xliff>
 

Use '&lt;' and '&gt;' to replace '<' and '>'. respectively.

by Raphael Araújo on 2012-01-26, tagged i18n  translate  xml 

Sending application/xhtml+xml

All web browsers support the 'text/html' content type. However, since the introduction of XHTML, a new content type has been available that brings the benefits of XML, which is 'application/xhtml+xml' (more info, why do this)

Although not all browsers support 'application/xhtml+xml', fortunately, the designers of the HTTP protocol provided a good solution: the browser tells the server what content types it can accept and which it prefers. So the server is able to choose which content type to serve.

Firefox is one of the browsers that supports 'application/xhtml+xml' and with the new Firefox3 comes incremental rendering, which means that it has fixed the earlier disadvantage in Firefox2 that 'text/html' was faster to render (more info).

The most noticeable advantage of XHTML rendering is that errors are displayed by the browser, meaning that it's easier to identify and fix layout problems caused by muddled content. If your content is ill-formed, Firefox displays a yellow error page indicating where the error was located. This can be a most useful development tool.

So how can we serve 'application/xhtml+xml' content?

  1. Your content has to be written to be XHTML instead of HTML (the W3C website provides helpful information in explaining how to do this). Your pages must include the correct DOCTYPE (how to do this).

  2. Your application has to change the content-type header depending on whether the browser can accept XHTML or not. Obviously, it's useless to send 'application/xhtml+xml' to older browsers that don't support it.

This snippet helps with the second issue. I have written a filter (shown below) to change the response headers depending on what each browser is able to handle.

Create a new class in your lib folder called XHTMLFilter.class.php thus:

<?php
 
class XHTMLFilter extends sfFilter
{
  public function execute ($filterChain)
  {
    $applyIn = $this->getParameter('environments', array('dev'));
    if (!is_array($applyIn))
    {
      throw new sfException("XHTMLFilter param: not an array ".print_r($applyIn, true));
    }
 
    if (in_array( SF_ENVIRONMENT, $applyIn ))
    {
      $accept = $this->context->getRequest()->getAcceptableContentTypes();
      if (in_array('application/xhtml+xml', $accept))
      {
        $this->context->getResponse()->setContentType( 'application/xhtml+xml' );
      }
    }
 
    $filterChain->execute();
  }
}
 

Now add it to your app/config/filters.yml:

rendering: ~
web_debug: ~
security:  ~
 
XHTMLFilter:
  class: XHTMLFilter
  param:
    environments:    [ dev ]
 
cache:     ~
common:    ~
flash:     ~
execution: ~
 

There is one configuration parameter (environments) that contains an array of the environments to which the filter will apply. By default, only the 'dev' environment is included but you can add any or all of your environments here.

by Rick Beton on 2008-05-25, tagged contenttype  filter  header  xhtml  xml 
(2 comments)

Enhanced DB Designer XSL Transformer Utility

USAGE: create folders in project directory /batch/in /batch/out

place - transform.php - dbd2propel.xsl - transform into folder /batch

call transform script from symfony project console root directory: [LINUX] 1. php batch/transformer your_schema - input from /batch/in/your_schema.xml - output to folder /config

  1. php batch/transformer your_schema yourout

call transform script from symfony project console root directory: [WINDOWS] 1. php batch/transformer.php your_schema - input from /batch/in/your_schema.xml - output to folder /config

  1. php batch/transformer.php your_schema yourout

transform

#!/usr/bin/env php
<?php
chdir(dirname(__FILE__));
include('transform.php');
?>
 

tranform.php

<?php
/*
This PHP5 script will load the transform a DB Designer 4 database model to the
propel database schema file format
*/
 
define('SCHEMA_INPUT_DIR', dirname(__FILE__). DIRECTORY_SEPARATOR . 'in' . DIRECTORY_SEPARATOR );
define('SCHEMA_OUTPUT_TEST_DIR', dirname(__FILE__). DIRECTORY_SEPARATOR . 'out' . DIRECTORY_SEPARATOR );
define('SCHEMA_OUTPUT_DIR', dirname(dirname(__FILE__)). DIRECTORY_SEPARATOR .'config'.DIRECTORY_SEPARATOR );
 
 
$args = $_SERVER['argv'];
 
if(count($args) > 1 && count($args) < 4 && is_file(SCHEMA_INPUT_DIR . $args[1] . '.xml')) {
 
    $loadedDBDesignerScheme = SCHEMA_INPUT_DIR . $args[1] . '.xml';
    $processedScheme        = !empty($args[2])
                            ? SCHEMA_OUTPUT_TEST_DIR . $args[2] . '.xml'
                            : SCHEMA_OUTPUT_DIR .'schema.xml';
 
    $transformatorScheme    = 'dbd2propel.xsl';
 
    // load the DB Designer 4 XML
    $xml = new DOMDocument;
    $xml->load($loadedDBDesignerScheme);
 
    // load the transformation stylesheet
    $xsl = new DOMDocument;
    $xsl->load($transformatorScheme);
 
    $proc = new XSLTProcessor();
    // attach the xsl rules
    $proc->importStyleSheet($xsl);
 
    $schema_xml = $proc->transformToXML($xml);
 
    file_put_contents($processedScheme, $schema_xml);
 
    echo "\nTransformation done successfully\n";
    echo "DB Designer Input Schema File: " . $args[1] . ".xml\n";
    if(!empty($args[2]))
        echo "Symfony Output Schema File: " . $args[2] . ".xml\n";
    echo "Folder:" . ($processedScheme==SCHEMA_OUTPUT_DIR .'schema.xml' ?SCHEMA_OUTPUT_DIR: SCHEMA_OUTPUT_TEST_DIR);
 
} elseif(!is_file(SCHEMA_INPUT_DIR . $args[1] . '.xml') ){
    echo "\nError!\nschema file >>". SCHEMA_INPUT_DIR . $args[1] . '.xml<< does not exist'."\n\n";
} else {
    echo "\nSyntax: \n" ;
    echo "transform <schema_in> [optional: schema_out]";
}
 
?>
 

dbd2propel.xsl

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
 
<!--
DB Designer XML to Propel Schema XML
 
==== Author: Jonathan Graham <jkgraham@gmail.com>
==== Version: 0.4 (2006-11-13) (http://blog.tooleshed.com/?p=6)
==== Description:
This XSL will transform a DB Designer 4 database model XML file into a
Propel database schema file.  This allows you to design your database
model using DB Designer 4 (models are saved in XML format) and then
easily create the Propel database schema file.
 
The PROPEL properties that this XSL will translate are listed below.
 
TABLE: name, description
COLUMN: name, primaryKey, required, type, size, scale, default, autoIncrement, description
FOREIGN-KEY: foreignTable, name, onDelete
REFERENCE: local, foreign
INDEX: index (not related to FK), unique, fulltext
 
==== Usage:
- Simply feed this XSL into your favorite XSLT engine along with your DB Designer
XML model.  The resulting output is a Propel database schema XML.
 
==== Collaboration:
Peter Banik <peter@froggle.org> - UNIQUE TAG
Benedicto Franco Jr. - MULTIPLE FOREIGN KEY, DATABASE NAME
Martin Kreidenweis <martin@kreidenweis.com> - Bug fixes, INDEX
Michiel Hakvoort - onDelete
Michel D'HOOGE - FULLTEXT
Thomas Schaefer - v0.5
 
==== Software:
Propel: http://propel.phpdb.org/
DB Designer 4: http://www.fabforce.net/dbdesigner4/
 
==== Copyright (c) 2004-2006, Jonathan Graham
Licensed under the GNU Lesser General Public License (LGPL) - http://www.gnu.org/copyleft/lgpl.html.
 
==== Change Log
version 0.1 (2004-11-08) - initial version
version 0.2 (2006-10-18) - Added Peter and Benedicto's updates.
version 0.3 (2006-11-05) - added non-unique-INDEXes and onDelete
version 0.4 (2006-11-13) - added support for index names and FULLTEXT indexes, changed license to LGPL
version 0.5 (2008-01-16) - added support for comments for enabling phpName-Attribute, changed license to LGPL
version 0.5.1 (2008-02-07) - changed support for comments for enabling phpName-Attribute, changed license to LGPL
 
-->
 
<!-- ============================================================ DATABASE template -->
<xsl:template match="/">
    <database defaultIdMethod="native">
        <xsl:attribute name="name">
            <xsl:value-of select="/DBMODEL/SETTINGS/GLOBALSETTINGS/@ModelName"/>
        </xsl:attribute>
        <xsl:attribute name="package">
            <xsl:value-of select="/DBMODEL/SETTINGS/GLOBALSETTINGS/@Comments"/>
        </xsl:attribute>
        <xsl:if test="/DBMODEL/SETTINGS/GLOBALSETTINGS/@Comments != ''">
            <xsl:if test="contains(/DBMODEL/SETTINGS/GLOBALSETTINGS/@Comments,'package=')">
                <xsl:attribute name="package">
                    <xsl:value-of select="substring-after(/DBMODEL/SETTINGS/GLOBALSETTINGS/@Comments,'package=')"  />
                </xsl:attribute>
                <xsl:variable name="aa" select="substring-before(/DBMODEL/SETTINGS/GLOBALSETTINGS/@Comments,'package=')"/>
                <xsl:choose>
                    <xsl:when test="string-length($aa) &gt; 0">
                        <xsl:attribute name="description">
                            <xsl:value-of select="substring-before(/DBMODEL/SETTINGS/GLOBALSETTINGS/@Comments,'package=')"  />
                        </xsl:attribute>
                    </xsl:when>
                    <xsl:otherwise />
                </xsl:choose>
            </xsl:if>
        </xsl:if>
        <xsl:apply-templates />
    </database>
</xsl:template>
 
<!-- ============================================================ TABLES template -->
<xsl:template match="/DBMODEL/METADATA/TABLES/TABLE">
    <xsl:if test="@Collapsed = 0">
        <xsl:if test="@IsLinkedObject = 0">
            <table>
                <xsl:attribute name="name">
                    <xsl:value-of select="@Tablename"/>
                </xsl:attribute>
                <xsl:if test="@Comments != ''">
                    <xsl:if test="contains(@Comments,'phpName=')">
                        <xsl:attribute name="phpName">
                            <xsl:value-of select="substring-after(@Comments,'phpName=')"  />
                        </xsl:attribute>
                        <xsl:variable name="aa" select="substring-before(@Comments,'phpName=')"/>
                        <xsl:choose>
                            <xsl:when test="string-length($aa) &gt; 0">
                                <xsl:attribute name="description">
                                    <xsl:value-of select="substring-before(@Comments,'phpName=')"  />
                                </xsl:attribute>
                            </xsl:when>
                            <xsl:otherwise />
                        </xsl:choose>
                    </xsl:if>
                    <xsl:if test="not(contains(@Comments,'phpName='))">
                        <xsl:attribute name="description">
                            <xsl:value-of select="@Comments"  />
                        </xsl:attribute>
                    </xsl:if>
                </xsl:if>
                <xsl:apply-templates />
            </table>
        </xsl:if>
    </xsl:if>
</xsl:template>
 
 
<!-- ============================================================ COLUMNS template -->
<xsl:template match="COLUMNS/COLUMN">
    <column>
        <!-- get data type -->
        <xsl:variable name="datatype">
            <xsl:call-template name="get_datatype">
                <xsl:with-param name="id"><xsl:value-of select="@idDatatype"/></xsl:with-param>
            </xsl:call-template>
        </xsl:variable>
 
        <!-- remove parens from datatypeparams -->
        <xsl:variable name="dtpclean">
            <xsl:call-template name="clean_dataparams">
                <xsl:with-param name="dtp"><xsl:value-of select="@DatatypeParams"/></xsl:with-param>
            </xsl:call-template>
        </xsl:variable>
 
        <!-- ==== name ==== -->
        <xsl:attribute name="name">
            <xsl:value-of select="@ColName"/>
        </xsl:attribute>
 
        <!-- ==== type ==== -->
        <xsl:attribute name="type">
            <xsl:value-of select="$datatype"/>
        </xsl:attribute>
 
        <xsl:if test="$dtpclean != ''">
            <!-- ==== size ==== -->
            <xsl:attribute name="size">
                <xsl:call-template name="get_datasize">
                    <xsl:with-param name="dtpc"><xsl:value-of select="$dtpclean"/></xsl:with-param>
                    <xsl:with-param name="dtype"><xsl:value-of select="$datatype"/></xsl:with-param>
                </xsl:call-template>
            </xsl:attribute>
 
            <xsl:if test="contains('FLOAT,DOUBLE',$datatype)">
                <!-- ==== scale ==== -->
                <xsl:attribute name="scale">
                    <xsl:value-of select="substring-after($dtpclean,',')"/>
                </xsl:attribute>
            </xsl:if>
 
        </xsl:if>
 
        <!-- ==== primaryKey ==== -->
        <xsl:if test="@PrimaryKey = '1'">
            <xsl:attribute name="primaryKey">true</xsl:attribute>
        </xsl:if>
 
        <!-- ==== required ==== -->
        <xsl:if test="@NotNull = '1'">
            <xsl:attribute name="required">true</xsl:attribute>
        </xsl:if>
 
        <!-- ==== default ==== -->
        <xsl:if test="@DefaultValue != ''">
            <xsl:attribute name="default">
                <xsl:value-of select="@DefaultValue"/>
            </xsl:attribute>
        </xsl:if>
 
        <!-- ==== autoIncrement ==== -->
        <xsl:if test="@AutoInc = '1'">
            <xsl:attribute name="autoIncrement">true</xsl:attribute>
        </xsl:if>
 
        <!-- ==== description ==== -->
        <xsl:if test="@Comments != ''">
            <xsl:if test="contains(@Comments,'phpName=')">
                <xsl:attribute name="phpName">
                    <xsl:value-of select="substring-after(@Comments,'phpName=')"  />
                </xsl:attribute>
                <xsl:variable name="aa" select="substring-before(@Comments,'phpName=')"/>
                <xsl:choose>
                    <xsl:when test="string-length($aa) &gt; 0">
                        <xsl:attribute name="description">
                            <xsl:value-of select="substring-before(@Comments,'phpName=')"  />
                        </xsl:attribute>
                    </xsl:when>
                    <xsl:otherwise />
                </xsl:choose>
            </xsl:if>
            <xsl:if test="not(contains(@Comments,'phpName='))">
                <xsl:attribute name="description">
                    <xsl:value-of select="@Comments"  />
                </xsl:attribute>
            </xsl:if>
        </xsl:if>
 
    </column>
</xsl:template>
 
<!-- ============================================================ RELATIONS template -->
<xsl:template match="RELATIONS_END/RELATION_END">
 
    <xsl:variable name="id"><xsl:value-of select="@ID"/></xsl:variable>
 
    <xsl:call-template name="show_ForeignKey">
        <xsl:with-param name="relation" select="/DBMODEL/METADATA/RELATIONS/RELATION[@ID=$id]"/>
    </xsl:call-template>
 
</xsl:template>
 
<!-- ============================================================ INDEX template -->
<xsl:template match="INDICES/INDEX">
    <xsl:choose>
        <xsl:when test="@IndexKind = '1' and @FKRefDef_Obj_id='-1'">
            <index>
                <xsl:attribute name="name"><xsl:value-of select="@IndexName"/></xsl:attribute>
                <xsl:apply-templates select="INDEXCOLUMNS/INDEXCOLUMN" mode="normal"/>
            </index>
        </xsl:when>
        <xsl:when test="@IndexKind = '2'">
            <unique>
                <xsl:attribute name="name"><xsl:value-of select="@IndexName"/></xsl:attribute>
                <xsl:apply-templates select="INDEXCOLUMNS/INDEXCOLUMN" mode="unique"/>
            </unique>
        </xsl:when>
        <xsl:when test="@IndexKind = '3'">
            <index>
                <xsl:attribute name="name"><xsl:value-of select="@IndexName"/></xsl:attribute>
                <xsl:apply-templates select="INDEXCOLUMNS/INDEXCOLUMN" mode="normal"/>
                <vendor type="mysql">
                    <parameter name="Index_type" value="FULLTEXT"/>
                </vendor>
            </index>
        </xsl:when>
    </xsl:choose>
</xsl:template>
 
 
<!-- ============================================================ columns within an index -->
<xsl:template match="INDICES/INDEX/INDEXCOLUMNS/INDEXCOLUMN" mode="normal">
    <xsl:variable name="columnId"><xsl:value-of select="@idColumn"/></xsl:variable>
    <index-column>
        <xsl:attribute name="name"><xsl:value-of select="//COLUMNS/COLUMN[@ID=$columnId]/@ColName"/></xsl:attribute>
    </index-column>
</xsl:template>
 
<xsl:template match="INDICES/INDEX/INDEXCOLUMNS/INDEXCOLUMN" mode="unique">
    <xsl:variable name="columnId"><xsl:value-of select="@idColumn"/></xsl:variable>
    <unique-column>
        <xsl:attribute name="name"><xsl:value-of select="//COLUMNS/COLUMN[@ID=$columnId]/@ColName"/></xsl:attribute>
    </unique-column>
</xsl:template>
 
 
<!-- ============================================================ show_ForeignKey -->
<xsl:template name="show_ForeignKey">
    <xsl:param name="relation"/>
    <foreign-key>
 
        <!-- foreignTable -->
        <xsl:attribute name="foreignTable">
            <xsl:value-of select="/DBMODEL/METADATA/TABLES/TABLE[@ID=$relation/@SrcTable]/@Tablename"/>
        </xsl:attribute>
 
        <!-- name -->
        <xsl:attribute name="name">
            <xsl:value-of select="$relation/@RelationName"/>
        </xsl:attribute>
 
        <!-- onDelete -->
        <xsl:attribute name="onDelete">
 
            <xsl:variable name="actionId">
                <xsl:call-template name="str_replace">
                    <xsl:with-param name="stringIn" select="substring-before(substring-after($relation/@RefDef,'\n'), '\n')"/>
                    <xsl:with-param name="charsIn" select="'OnDelete='"/>
                    <xsl:with-param name="charsOut" select="''"/>
                </xsl:call-template>
            </xsl:variable>
 
            <xsl:call-template name="get_actiontype">
                <xsl:with-param name="id" select="$actionId" />
            </xsl:call-template>
 
        </xsl:attribute>
 
        <!-- === reference tag === -->
        <xsl:call-template name="build_fk">
            <xsl:with-param name="stringIn" select="$relation/@FKFields"/>
        </xsl:call-template>
 
    </foreign-key>
</xsl:template>
 
<!--
============================================================
============================================================ template "functions"
============================================================
-->
 
<!-- ============================================================ get_datatype -->
<xsl:template name="get_datatype">
    <xsl:param name="id"/>
 
    <xsl:variable name="type">
        <xsl:value-of select="/DBMODEL/SETTINGS/DATATYPES/DATATYPE[@ID=$id]/@TypeName"/>
    </xsl:variable>
 
    <xsl:choose>
        <xsl:when test="$type = 'DATETIME'" >TIMESTAMP</xsl:when>
        <xsl:when test="$type = 'TEXT'" >LONGVARCHAR</xsl:when>
        <xsl:when test="$type = 'BOOL'" >BOOLEAN</xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$type"/>
        </xsl:otherwise>
    </xsl:choose>
 
</xsl:template>
 
 
<!-- ============================================================ get_datasize -->
<xsl:template name="get_datasize">
    <xsl:param name="dtpc"/>
    <xsl:param name="dtype"/>
 
    <xsl:choose>
        <xsl:when test="contains('FLOAT,DOUBLE',$dtype)" >
            <xsl:value-of select="substring-before($dtpc,',')"/>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$dtpc"/>
        </xsl:otherwise>
    </xsl:choose>
 
</xsl:template>
 
 
<!-- ============================================================ clean_dataparams -->
<xsl:template name="clean_dataparams">
    <xsl:param name="dtp"/>
 
    <xsl:variable name="dtp2">
        <xsl:call-template name="str_replace">
            <xsl:with-param name="stringIn" select="$dtp"/>
            <xsl:with-param name="charsIn" select="'('"/>
            <xsl:with-param name="charsOut" select="''"/>
        </xsl:call-template>
    </xsl:variable>
 
    <xsl:call-template name="str_replace">
        <xsl:with-param name="stringIn" select="$dtp2"/>
        <xsl:with-param name="charsIn" select="')'"/>
        <xsl:with-param name="charsOut" select="''"/>
    </xsl:call-template>
 
</xsl:template>
 
 
<!-- ============================================================ str_replace -->
<xsl:template name="str_replace">
    <xsl:param name="stringIn"/>
    <xsl:param name="charsIn"/>
    <xsl:param name="charsOut"/>
    <xsl:choose>
        <xsl:when test="contains($stringIn,$charsIn)">
            <xsl:value-of select="concat(substring-before($stringIn,$charsIn),$charsOut)"/>
            <xsl:call-template name="str_replace">
                <xsl:with-param name="stringIn" select="substring-after($stringIn,$charsIn)"/>
                <xsl:with-param name="charsIn" select="$charsIn"/>
                <xsl:with-param name="charsOut" select="$charsOut"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$stringIn"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>
 
 
<!-- ============================================================== build_fk -->
<xsl:template name="build_fk">
    <xsl:param name="stringIn"/>
    <xsl:variable name="FKClean">
        <xsl:value-of select="substring-before($stringIn, '\n')"/>
    </xsl:variable>
    <reference>
    <xsl:attribute name="local">
        <xsl:value-of select="substring-after($FKClean, '=')"/>
    </xsl:attribute>
    <xsl:attribute name="foreign">
        <xsl:value-of select="substring-before($FKClean, '=')"/>
    </xsl:attribute>
    </reference>
    <xsl:if test="contains(substring-after($stringIn,'\n'),'=')">
        <xsl:call-template name="build_fk">
            <xsl:with-param name="stringIn" select="substring-after($stringIn,'\n')"/>
        </xsl:call-template>
    </xsl:if>
</xsl:template>
 
<!-- ======================================================== get_actiontype -->
 
<xsl:template name="get_actiontype">
    <xsl:param name="id"/>
 
    <xsl:choose>
        <xsl:when test="$id = 0">restrict</xsl:when>
        <xsl:when test="$id = 1">cascade</xsl:when>
        <xsl:when test="$id = 2">setnull</xsl:when>
        <xsl:otherwise>restrict</xsl:otherwise>
    </xsl:choose>
</xsl:template>
 
</xsl:stylesheet>
 

Enjoy it, cause it saves a lot of time!

by Thomas Schäfer on 2008-03-19, tagged dbdesigner  model  transform  xml  xsl 

Getting Raw Data From POSTed Body

I tried to find a way to get the raw data from the body of a HTTP request. In my case the request contained an XML document upon which my action had to act.

One of the options was to use the global $HTTP_RAW_POST_DATA. However, this required altering the php.ini file.

The preferred way to achieve this eventually was to use the php://input stream wrapper. For instance, in my case the following read the XML from the request body and converts it to XML using SimpleXML:

  public function executeTest() {
         $this->xml = simplexml_load_file('php://input');
         return sfView::SUCCESS;
  }

Note: It won't work in cases where enctype="multipart/form-data". More infromation about this and other PHP wrappers can be found at http://php.net/wrappers.php

by Avi Cohen on 2006-11-07, tagged body  post  raw  wrapper  xml 

crawler (google sitemap xml)

this crawler returns an array. format is like this.

array( 'url_address' => array( 'page_title', 'status_code', 'content_type' ) )

*requires PEAR::HTTP_Client

with the returned array, you can create sitemap page and/or sitemap xml for google webmaster central.

<?php
require_once('HTTP/Client.php');
 
class HTTP_Crawler extends HTTP_Client {
 
    function _getMetaRedirect(&$request){
        return $not_to_redirected_automatically = null;
    }
 
    function getAbsoluteUrl($netUrl, $ambiguousUrl){
        return $this->_redirectUrl($netUrl, $ambiguousUrl);
    }
}
 
class Crawler {
 
    private 
        $baseNetUrl,
        $baseUrlEscaped,
        $client,
        $urlList = array(),
        $sleepMicroSeconds = 500000;    //0.5seconds
 
 
    /**
     * constructor
     *
     * @param string $baseUrl your domain name
     * @param int $sleepMicroSeconds crawler sleep time on each request
     */
    function __construct($baseUrl, $sleepMicroSeconds = 500000){
        $this->baseNetUrl = new Net_URL($baseUrl, true);
        $this->baseUrlEscaped = str_replace('/', '\/', preg_quote($baseUrl));
 
        $this->sleepMicroSeconds = $sleepMicroSeconds;
 
        $this->client = new HTTP_Crawler();
        $this->client->setDefaultHeader('User-Agent', 'php crawler');
    }
 
    /**
     * returns an array of found pages
     *
     * format:
     * array(
     *      'url_address' => array(
     *          'page_title',
     *          'status_code',
     *          'content_type'
     *      )
     * )
     * 
     * @param string $url crawling starting URL
     */
    public function clawThis($url){
        $url = $this->client->getAbsoluteUrl($this->baseNetUrl, $url);
        $this->setUrls($url);
        $links = $this->retrieveLinks($url);
        $this->setUrls($links);
 
        foreach ($this->urlList as $eachUrl => $statusCode){
            if($statusCode === false){
                $this->clawThis($eachUrl);
            }
        }
    }
 
    private static function isValidFile($each){
 
        $doNotCrawlFiles = array(
            'jpg',
            'gif',
            'png',
            'zip',
            'lzh',
            'xls',
            'ppt',
            'doc',
            'tif',
            'exe',
            'avi',
            'mpg',
            'swf',
            'mp3',
            '#[^#]*'
        );
 
        if(preg_match('/\.(' . join('|', $doNotCrawlFiles) . ')$/i', $each)){
            return false;
        }
        return true;
    }
 
    private function isLocalAddress($url){
        if(preg_match(sprintf('/^%s/i', $this->baseUrlEscaped), $url)){
            return true;
        }
        return false;
    }
 
    private function setUrls($urls = array()){
        if(!is_array($urls)){
            $urls = array($urls);
        }
 
        foreach($urls as $each){
            $each = $this->client->getAbsoluteUrl($this->baseNetUrl, $each);
 
            if(!$this->isLocalAddress($each) ||
                !$this->isValidFile($each) ||
                isset($this->urlList[$each])){
                continue;
            }
 
            $this->urlList[$each] = false;
        }
    }
 
    private function retrieveLinks($url){
        if($this->urlList[$url] !== false){
            return array();
        }
 
        if(!$this->isLocalAddress($url)){
            return array();
        }
 
        $this->client->get($url);
        $response = $this->client->currentResponse();
        $body = mb_convert_encoding($response['body'], mb_internal_encoding(), mb_detect_encoding($response['body']));
 
 
        $this->urlList[$url]['title'] = '';
        $this->urlList[$url]['code'] = $response['code'];
        $this->urlList[$url]['type'] = $response['headers']['content-type'];
 
        $document = new DOMDocument('1.0', mb_internal_encoding());
        $document->preserveWhiteSpace = false;
        $document->LoadHTML($body);
 
        foreach ($document->getElementsByTagName('title') as $title){
            $this->urlList[$url]['title'] = $title->textContent; //innerHTML
            break;
        }
 
 
        $links = array();
        foreach ($document->getElementsByTagName('a') as $eachElement){
            $links[] = $eachElement->getAttribute('href');
        }
        foreach ($document->getElementsByTagName('meta') as $eachElement){
            if(preg_match('/refresh/i', $eachElement->getAttribute('http-equiv'))){
                $content = $eachElement->getAttribute('content');
 
                foreach (split(';', $content) as $each){
                    if(preg_match('/url=(.*)/i', trim($each), $match)){
                        $links[] = $match[1];
                        break;
                    }
                }
            }
        }
 
        usleep($this->sleepMicroSeconds);
        return array_unique($links);
    }
 
    public function getUrlList(){
        $list = array();
        foreach ($this->urlList as $url =>$attributes){
            if(
                preg_match('/text\/html/i', $attributes['type']) &&
                !preg_match('/#(.*)$/i', $url)
            ){
                $list[$url] = $attributes;
            }
        }
 
        ksort($list);
        return $list;
    }
}

you can write a batch script to crawl your site periodically.

<?php
ini_set('max_execution_time', 0);
error_reporting(E_ALL);
require_once('Crawler.class.php');
require_once('File.php');
 
$root = 'http://YOUR_OWN_SITE_URL';
 
$crawler = new Crawler($root);
$crawler->clawThis($root);
 
$pageList = $crawler->getUrlList();
$fileName = dirname(__FILE__) . DIRECTORY_SEPARATOR . 'Sitemap.txt';
File::writeLine($fileName, serialize($pageList), FILE_MODE_WRITE);
File::close($fileName, FILE_MODE_WRITE);

to output sitemap xml, write an action like this

public function executeSitemapXml(){
    error_reporting(E_ALL);
 
    $sitemapCacheFile = sfConfig::get('sf_bin_dir') . DIRECTORY_SEPARATOR . 'sitemap.txt';
 
    require_once('File.php');
    $sitemapCache = File::readAll($sitemapCacheFile);
    $this->pageList = unserialize($sitemapCache);
    $this->today = date('Y-m-d', filemtime($sitemapCacheFile));
}

its template is like this

<?xml version="1.0" encoding="UTF-8"?>
<urlset
  xmlns="http://www.google.com/schemas/sitemap/0.84"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84
                      http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">
 
<?php foreach ($pageList as $url => $attributes):?>
<url>
  <loc><?php echo $url?></loc>
  <lastmod><?php echo $today?></lastmod>
  <priority>0.8</priority>
  <changefreq>weekly</changefreq>
</url>
<?php endforeach;?>
 
</urlset>

google needs the sitemap xml to be in the top directory, so you need to wirte a route rule like this.

sitemap_xml:
  url:   /sitemapxml
  param: { module: default, action: sitemapXml }

this is actually used on our site sticker20.com

by sticker 20 on 2006-10-04, tagged crawler  google  sitemap  xml 
(4 comments)