<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Web:Extend &#187; Database</title>
	<atom:link href="http://blog.extend.ws/category/database/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.extend.ws</link>
	<description>Web:Extend's Development Blog</description>
	<lastBuildDate>Thu, 16 Sep 2010 20:38:15 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Report on website optimization &#8211; Part 1</title>
		<link>http://blog.extend.ws/2008/10/11/report-on-website-optimization-part-1/</link>
		<comments>http://blog.extend.ws/2008/10/11/report-on-website-optimization-part-1/#comments</comments>
		<pubDate>Fri, 10 Oct 2008 23:24:29 +0000</pubDate>
		<dc:creator>Loïc Hoguin</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[Web:Extend]]></category>

		<guid isPermaLink="false">http://blog.extend.ws/2008/10/11/report-on-website-optimization-part-1/</guid>
		<description><![CDATA[Sorry for not posting until now. I promise we&#8217;ll update the blog more often from now on.
Today I&#8217;d like to start a series on website optimization. The order in which I will do things is not necessarily the best order to optimize a website. However it will be full of tips and explanations for every [...]]]></description>
			<content:encoded><![CDATA[<p><em>Sorry for not posting until now. I promise we&#8217;ll update the blog more often from now on.</em></p>
<p>Today I&#8217;d like to start a series on website optimization. The order in which I will do things is not necessarily the best order to optimize a website. However it will be full of tips and explanations for every single aspect of website optimization. A better structured article will later be based off this series and put in <a href="http://wee.extend.ws/">Web:Extend</a>&#8217;s <a href="http://wee.extend.ws/wiki/Documentation">documentation</a>. <em>We know, the documentation is pretty empty right now. We intend to fix this before we reach beta.</em></p>
<h2>Context</h2>
<p>I currently have a contract with a client whose website had huge performance issues. Our job was first to port his code to our framework, then optimize both the code and the server configuration. When he contacted me he had peaks of 130 registered users at the same time with loads between 15 and 30. Some pages could take up to 30 seconds to load.</p>
<h2>The rewrite</h2>
<p>So I did just that: I ported the code to use Web:Extend instead of various <a href="http://pear.php.net/">PEAR</a> and non-PEAR libraries. At the same time we removed the biggest problems in the database schema and in the queries. Let me get this straight: the website was not slow because it used PEAR. It was slow because the queries would sometimes select thousands of rows that would then be filtered by PHP, and because some operations generated a huge number of queries since <a href="http://en.wikipedia.org/wiki/Join_(SQL)">JOINs</a> were not used as often as they should have been. The use of the framework was motivated by the client&#8217;s wish to move to PHP5 instead of the now deprecated PHP4, and this framework was chosen because I&#8217;m its main developer and I know it very well.</p>
<p>Once the code was fully ported and tested (fixing a few bugs in the original code along the way), we put it on the production server. The load dropped to an average of 3 at peak times. That&#8217;s nice, but it can be better. I should point out here that every user on the website produces an <a href="http://en.wikipedia.org/wiki/AJAX">AJAX</a> request (which in turn runs several SQL queries) every 10 seconds. With 130 registered users online, that means at least 13 AJAX requests per second at peak times. It&#8217;s actually more, since that number does not include non-registered users.</p>
<h2>The biggest point of slowness</h2>
<p>The queries were not optimized yet.</p>
<p>We looked into it and decided to first find the biggest performance problem, which turned out to be the chat functionality. Whenever someone posted a new chat message, there was a noticeable delay. All chat messages since the launch of the website were stored in one table&#8230; containing about 1.5 million rows. The table was big and neither properly indexed nor optimized (it&#8217;s still not perfectly optimized but I&#8217;ll get back to this point later). Most chat messages in this table were never used again since they had been read and long forgotten. We decided to archive the table periodically (deleting wasn&#8217;t an option). A copy of the table was thus created, and a cron job now runs the PHP code that archives all the read messages sent more than 15 days ago and all unread messages sent more than 3 months ago. This left us with a nice 150 thousand rows in the main table instead of the 1.5 million we started with.</p>
<p>This last point helped a lot; the slowness went away. But this didn&#8217;t fix everything. More queries needed help, and the database schema was still in need of huge changes.</p>
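<p>For readers who want to try the same approach, here is a minimal sketch of what such an archiving job could look like. It is not the client&#8217;s actual code: the table names (<code>chat_messages</code>, <code>chat_messages_archive</code>), the column names (<code>is_read</code>, <code>sent_at</code>) and the use of PDO are all assumptions made for the illustration.</p>
<pre class="php code">
// Hypothetical schema: both tables share the same columns, including
// is_read (0 or 1) and sent_at (stored as 'Y-m-d H:i:s').
// Assumes PDO::ERRMODE_EXCEPTION (the default since PHP 8).
function archiveChatMessages(PDO $oDb, DateTimeImmutable $oNow)
{
	// Cutoffs are computed in PHP so the SQL stays portable across drivers.
	$sReadCutoff	= $oNow->modify('-15 days')->format('Y-m-d H:i:s');
	$sUnreadCutoff	= $oNow->modify('-3 months')->format('Y-m-d H:i:s');

	// Read messages older than 15 days, unread messages older than 3 months.
	$sWhere		= '(is_read = 1 AND ? > sent_at) OR (is_read = 0 AND ? > sent_at)';
	$aParams	= array($sReadCutoff, $sUnreadCutoff);

	// Copy then delete in one transaction so a failure cannot lose messages.
	$oDb->beginTransaction();

	try {
		$oDb->prepare('INSERT INTO chat_messages_archive SELECT * FROM chat_messages WHERE ' . $sWhere)
			->execute($aParams);

		$oDelete = $oDb->prepare('DELETE FROM chat_messages WHERE ' . $sWhere);
		$oDelete->execute($aParams);

		$oDb->commit();
	} catch (Throwable $e) {
		$oDb->rollBack();
		throw $e;
	}

	// Number of rows moved out of the main table.
	return $oDelete->rowCount();
}
</pre>
<p>A cron job would call this function with the current date. Note that the transaction only protects you on a transactional storage engine such as InnoDB.</p>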
<h2>EXPLAIN ALL QUERIES</h2>
<p><em>This part is very technical. Do not hesitate to post a comment asking for answers or clarification. Thank you.</em></p>
<p>This brings us to the most interesting part of this post. I needed to analyze the <a href="http://en.wikipedia.org/wiki/Select_(SQL)">SELECT queries</a> used by the application. <strong>All</strong> the queries. My first thought was: is there a way to analyze everything in a completely automated way without having to copy/paste the queries or rewrite anything? If such a beast exists, I didn&#8217;t find it. So I wrote it myself.</p>
<p>This website uses <a href="http://mysql.com">MySQL</a>. A good way to check whether your queries are optimized is to run an <a href="http://dev.mysql.com/doc/refman/5.0/en/explain.html">EXPLAIN statement</a>. An EXPLAIN statement is just a SELECT query with &#8220;EXPLAIN &#8221; put at the start. Instead of running the query, MySQL analyzes it and returns information about how it would run it. This means that I just have to put &#8220;EXPLAIN &#8221; at the start of all my SQL queries to get a full report on all of them. And this is what I did, but without changing the application&#8217;s code.</p>
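<p>To make the idea concrete, prefixing a query can be as small as the helper below. This is only a sketch: <code>explainQuery</code> is a hypothetical function name, not something that exists in Web:Extend, and it deliberately refuses anything that is not a SELECT.</p>
<pre class="php code">
// Hypothetical helper, not part of Web:Extend.
function explainQuery($sQuery)
{
	// The queries in the model are often wrapped in whitespace.
	$sQuery = trim($sQuery);

	// This sketch only handles SELECT statements and rejects everything else.
	if (strncasecmp($sQuery, 'SELECT', 6) !== 0)
		throw new InvalidArgumentException('This sketch only explains SELECT queries.');

	return 'EXPLAIN ' . $sQuery;
}
</pre>
<p>Given a SELECT query, the function returns the same query with &#8220;EXPLAIN &#8221; in front of it, ready to be sent to the database.</p>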
<p>In Web:Extend, we use the <a href="http://en.wikipedia.org/wiki/Set_(mathematics)">concept of sets</a> to define the model part of applications. A set contains elements, just like a table contains rows, or like a path contains files, directories or links. You can retrieve either &#8220;exactly 1&#8221; or &#8220;n (with n>=0)&#8221; elements from a set. Let&#8217;s take an example: the <code>users</code> set. This set has one user with the nickname <code>"essen"</code>, and 42 users with the <code>"invisible"</code> option activated. In our framework, <code>"essen"</code> is an individual element while the <code>"invisible"</code> users are a subset of the <code>users</code> set. <em>Mathematically, <code>"essen"</code> is both an individual element and a subset, but for the sake of simplicity we consider it to be nothing more than an individual element.</em></p>
<p>Web:Extend defines base classes for both a <a href="http://wee.extend.ws/browser/trunk/wee/model/weeSet.class.php">set</a> and an <a href="http://wee.extend.ws/browser/trunk/wee/model/weeModel.class.php">individual element</a>. All operations performed on a set (e.g. all SELECT statements, along with the UPDATE and DELETE operations potentially affecting n rows) are written in the set class; all operations performed on an individual element (e.g. INSERT, UPDATE, DELETE affecting only 1 row) are written in the element class. When a set returns elements, it creates the correct instance for the element associated with the given set.</p>
<p>You have probably understood by now that all SELECT queries are stored in only one type of class, the set classes, and that these classes do not contain any overhead related to the handling of individual elements. In other words, we&#8217;re free to instantiate them at will and call their methods without any worry. Most methods are of the type <em>count</em>, <em>fetch</em> or <em>fetchAll</em>, which respectively count the number of elements in the set (or in a subset), retrieve exactly 1 element and retrieve n elements.</p>
<p>All we have to do now is put an &#8220;EXPLAIN &#8221; at the start of all these queries. Now that we have seen how it works, it should be quite easy, right? As long as we resolve the following problems:</p>
<ol>
<li>When fetching exactly one element, the model will return exactly one row and not the result of the query containing all the rows returned by the EXPLAIN query.</li>
<li>When counting rows, one row is fetched from the database and a single value from it is returned directly, so the rows of the EXPLAIN result never reach the caller.</li>
</ol>
<p>Let&#8217;s take this code for example:</p>
<pre class="php code">
class demoUsersSet extends weeDbSet
{
	protected $sModel = 'demoUsersModel';

	/**
		@return int The number of users.
	*/

	public function count()
	{
		$a = $this->getDb()->query('
			SELECT COUNT(*) AS c FROM users
		')->fetch();

		return $a['c'];
	}

	/**
		@param $iUserId The user identifier.
		@return demoUsersModel The requested user.
	*/

	public function fetch($iUserId)
	{
		return $this->getDb()->query('
			SELECT * FROM users WHERE user_id=? LIMIT 1
		', $iUserId)->rowClass($this->sModel)->fetch();
	}

	/**
		@return weeDatabaseResults All the users.
	*/

	public function fetchAll()
	{
		return $this->getDb()->query('
			SELECT * FROM users
		')->rowClass($this->sModel);
	}
}
</pre>
<p>We have two things to bypass: the <code>array access</code> in <code>count</code> and the calls to the <a href="http://wee.extend.ws/browser/trunk/wee/db/weeDatabaseResult.class.php#L46">fetch</a> method. We are going to bypass them by writing a <a href="http://wee.extend.ws/browser/trunk/wee/db/weeDatabaseResult.class.php">weeDatabaseResult</a> child class that always returns <code>$this</code> for both <code>array access</code> and the <code>fetch</code> method. Let&#8217;s call the resulting class <a href="http://wee.extend.ws/browser/trunk/wee/tests/db/weeExplainSQLResult.class.php">weeExplainSQLResult</a> (no pun intended).</p>
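<p>The trick is easier to see in isolation. The class below is a standalone sketch of the idea, not the framework&#8217;s code: <code>explainResultSketch</code> is a hypothetical name, it holds its rows in a plain array instead of extending weeDatabaseResult, and the typed signatures assume PHP 8.</p>
<pre class="php code">
// Standalone sketch; the real weeExplainSQLResult extends weeDatabaseResult instead.
class explainResultSketch implements ArrayAccess, IteratorAggregate
{
	protected $aRows;

	public function __construct(array $aRows)
	{
		$this->aRows = $aRows;
	}

	// fetch() normally returns one row; here it hands back the whole result.
	public function fetch()
	{
		return $this;
	}

	// rowClass() is ignored but keeps the fluent chain of the set methods working.
	public function rowClass($sClass)
	{
		return $this;
	}

	// $a['c'] in count() also returns the whole result instead of a value.
	public function offsetGet(mixed $offset): mixed
	{
		return $this;
	}

	// Every offset is reported as existing so isset() checks do not short-circuit.
	public function offsetExists(mixed $offset): bool
	{
		return true;
	}

	// The result is read-only, so writes are refused.
	public function offsetSet(mixed $offset, mixed $value): void
	{
		throw new LogicException('Read-only result.');
	}

	public function offsetUnset(mixed $offset): void
	{
		throw new LogicException('Read-only result.');
	}

	// foreach over the object yields the EXPLAIN rows.
	public function getIterator(): Iterator
	{
		return new ArrayIterator($this->aRows);
	}
}
</pre>
<p>Whatever a set method does with such an object, be it <code>fetch()</code>, <code>rowClass()</code> or an array access, the caller ends up with the same object and can loop over the EXPLAIN rows with <code>foreach</code>.</p>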
<p>This won&#8217;t be enough, however; we still need to tell the database driver used by the model to return &#8220;EXPLAIN&#8221; result objects instead of normal results. We can do this simply by extending the database driver classes, making the required changes, and telling the model classes to use that driver for their queries. Extending was easy for both <a href="http://wee.extend.ws/browser/trunk/wee/tests/db/weeExplainMySQLDatabase.class.php">MySQL</a> and <a href="http://wee.extend.ws/browser/trunk/wee/tests/db/weeExplainPgSQLDatabase.class.php">PgSQL</a>.</p>
<p>We have now finished writing the classes that will bypass the execution of all the queries in the model and execute EXPLAIN queries instead.</p>
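<p>As a rough illustration of the driver side, here is a sketch built on PDO rather than on the framework&#8217;s own driver classes. The class name <code>explainDatabaseSketch</code> is hypothetical, and returning a plain <code>ArrayIterator</code> is a simplification: a real driver would wrap the rows in its &#8220;EXPLAIN&#8221; result class so that <code>fetch()</code> and array access keep working.</p>
<pre class="php code">
// Hypothetical wrapper around PDO; the real classes extend the framework's drivers.
class explainDatabaseSketch
{
	protected $oPdo;

	public function __construct(PDO $oPdo)
	{
		$this->oPdo = $oPdo;
	}

	// Same call shape as the model code: the SQL first, then the bound values.
	public function query($sQuery, ...$aValues)
	{
		// The query itself is never executed, only its EXPLAIN counterpart.
		$oStatement = $this->oPdo->prepare('EXPLAIN ' . trim($sQuery));
		$oStatement->execute($aValues);

		// Simplification: a real driver would return its EXPLAIN result object here.
		return new ArrayIterator($oStatement->fetchAll(PDO::FETCH_ASSOC));
	}
}
</pre>
<p>Because the change lives entirely in the driver, the model classes stay untouched: they only need to be handed this driver instead of the normal one.</p>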
<h2>All SELECT queries explained</h2>
<p>Let&#8217;s try it!</p>
<pre class="php code">
$oDb = new weeExplainMySQLDatabase(array(
	'host'		=> 'localhost',
	'user'		=> 'wee',
	'password'	=> 'wee',
	'dbname'	=> 'wee_tests',
));

$oSet = new demoUsersSet;
$oSet->setDb($oDb);

foreach ($oSet->count() as $aExplainRow)
	var_dump($aExplainRow);

foreach ($oSet->fetch(42) as $aExplainRow)
	var_dump($aExplainRow);

foreach ($oSet->fetchAll() as $aExplainRow)
	var_dump($aExplainRow);
</pre>
<p>When run, this code outputs the following:</p>
<pre class="code">
array(10) {
  ["id"]=>
  string(1) "1"
  ["select_type"]=>
  string(6) "SIMPLE"
  ["table"]=>
  NULL
  ["type"]=>
  NULL
  ["possible_keys"]=>
  NULL
  ["key"]=>
  NULL
  ["key_len"]=>
  NULL
  ["ref"]=>
  NULL
  ["rows"]=>
  NULL
  ["Extra"]=>
  string(28) "Select tables optimized away"
}
array(10) {
  ["id"]=>
  string(1) "1"
  ["select_type"]=>
  string(6) "SIMPLE"
  ["table"]=>
  string(5) "users"
  ["type"]=>
  string(5) "const"
  ["possible_keys"]=>
  string(7) "PRIMARY"
  ["key"]=>
  string(7) "PRIMARY"
  ["key_len"]=>
  string(1) "4"
  ["ref"]=>
  string(5) "const"
  ["rows"]=>
  string(1) "1"
  ["Extra"]=>
  string(0) ""
}
array(10) {
  ["id"]=>
  string(1) "1"
  ["select_type"]=>
  string(6) "SIMPLE"
  ["table"]=>
  string(5) "users"
  ["type"]=>
  string(3) "ALL"
  ["possible_keys"]=>
  NULL
  ["key"]=>
  NULL
  ["key_len"]=>
  NULL
  ["ref"]=>
  NULL
  ["rows"]=>
  string(1) "3"
  ["Extra"]=>
  string(0) ""
}
</pre>
<p>By applying this to all the queries of our application we get a complete analysis of all our SELECT queries, available for review and optimization to anyone working on the project. These &#8220;EXPLAIN&#8221; classes allow us to test the effect of a database change on all the queries at once, rather than looking at them one by one and possibly missing a change that affects a query we already looked at and fixed.</p>
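<p>Once the rows are collected, part of the review can be automated too. The helper below is a small sketch of that idea; <code>findFullTableScans</code> is a hypothetical function, and it only relies on the <code>type</code> and <code>table</code> columns shown in the output above.</p>
<pre class="php code">
// Hypothetical helper: lists the tables MySQL would scan in full.
function findFullTableScans($aExplainRows)
{
	$aTables = array();

	foreach ($aExplainRows as $aRow)
		// In MySQL's EXPLAIN output, a join type of "ALL" means a full table scan.
		if (($aRow['type'] ?? null) === 'ALL')
			$aTables[] = $aRow['table'];

	return $aTables;
}
</pre>
<p>A flagged table is a candidate for review, not automatically a problem: the <code>fetchAll</code> query above reads every user on purpose, so a full scan is expected there.</p>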
<p>Feel free to use, improve or adapt this method in your projects. If you do, please let us know about it so we can use your improvements.</p>
<h2>Conclusion</h2>
<p>We&#8217;ll take a break from query optimization in the next post of this series to talk about <a href="http://talks.php.net/show/froscon08/">Simple is Hard</a> and its implications for our framework, our thoughts about it, and our plans to improve performance as a whole in the framework. <em>I&#8217;ll only say for now that our framework performs really, really well.</em> The third part will talk about query optimization and the importance of database design, highlighting errors and explaining good practices.</p>
<p>Thanks for reading, see you soon.</p>
<p><em>Disclaimer: If you feel like exploring the framework then have fun, but remember that the documentation is still lacking and that some parts of the API may change before we reach beta. We advise you to wait for the beta release before trying to use it, unless you feel adventurous or just plain curious.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.extend.ws/2008/10/11/report-on-website-optimization-part-1/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

