Category Archives: PHP

How to Dump the Generated Zend_Db SQL Query

The Typical PHP Approach

Typically, a PHP programmer writes the SQL query as a string and executes it via mysql_query.

$sql = "SELECT * FROM my_table";
$resource = mysql_query($sql);

So when you want to dump this “complex” query, or any other query, you can simply “echo” it and inspect its syntax.

// this query is WRONG because the WHERE clause is incomplete
$sql = "SELECT * FROM my_table WHERE id = ";
 
// dump and debug the wrong query
die($sql);
 
// this line won't be executed
$resource = mysql_query($sql);

So far so good, but things are a bit different once you start working with Zend Framework. Higher levels of abstraction come with slightly more difficult ways to dump (debug) your SQL queries.

OK, you have two options: using Zend_Db_Select or … not.

Computer Algorithms: Data Compression with Diagram Encoding and Pattern Substitution

Overview

Two variants of run-length encoding are the diagram encoding and pattern substitution algorithms. Diagram encoding is actually a very simple algorithm. Run-length encoding needs an input stream that consists of many repeating elements, such as “aaaaaaaa”, and those are very rare in natural language. By contrast, almost any natural language contains many so-called “diagrams”. In plain English there are diagrams such as “the”, “and”, “ing” (in the word “waiting”, for example), “ a”, “ t”, “ e” and many doubled letters. We can even extend those diagrams by adding the surrounding spaces. Thus we can encode not only “the” but also “ the ”, which is 5 characters (2 spaces and 3 letters), with something shorter. On the other hand, as I said, plain English contains many doubled letters, which unfortunately are too short to benefit from run-length encoding, so the compression ratio will be small. Even worse, the encoded text may turn out to be longer than the input message. Let’s see some examples.
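As a first example, here is a minimal sketch of the substitution idea itself. The table of patterns and the one-byte codes are my own assumptions for illustration (they are not a standard format), and the sketch assumes the input is plain ASCII, so that bytes from 0x80 upwards are free to use as codes.

```php
<?php
// Hypothetical substitution table: frequent patterns mapped to single bytes
// above the ASCII range. Assumes the input itself contains only ASCII.
const PATTERN_CODES = [
    ' the ' => "\x80",
    ' and ' => "\x81",
    'ing'   => "\x82",
    'the'   => "\x83",
];

function substitute_patterns(string $text): string
{
    // strtr() tries the longest matching key first and never rescans
    // text it has already replaced.
    return strtr($text, PATTERN_CODES);
}

function restore_patterns(string $encoded): string
{
    // Every code is a single byte that cannot occur in ASCII input,
    // so flipping the table is enough to decode.
    return strtr($encoded, array_flip(PATTERN_CODES));
}

$text = 'the cat and the dog are waiting';
$encoded = substitute_patterns($text);
printf("%d bytes -> %d bytes\n", strlen($text), strlen($encoded));
```

Unlike run-length encoding, this needs no counters, but both sides must agree on the same table in advance.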

Let’s say we have to encode the message “successfully accomplished”, which contains four doubled letters. Run-length encoding replaces each pair with a counter and a letter, so those 8 characters are replaced by another 8 characters, which doesn’t help us at all.

// 8 chars replaced by 8 chars!?
input: 	"successfully accomplished"
output:	"su2ce2sfu2ly a2complished"

The problem is that if the input text contains numbers, “2” in particular, we have to choose an escape symbol (“@” for example) to mark where an encoded run begins. Thus if the input message is “2 successfully accomplished tasks”, it will be encoded as “2 su@2ce@2sfu@2ly a@2complished tasks”. Now the output message is longer than the input string!

// the compressed message is longer!!!
input:	"2 successfully accomplished tasks"
output:	"2 su@2ce@2sfu@2ly a@2complished tasks"
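The escape-symbol variant can be sketched as follows. This is only an illustration under my own assumptions: the function name is made up, a run is written as escape symbol + counter + character, and the function simply refuses input that already contains the escape symbol.

```php
<?php
// Encode runs of two or more identical characters as ESCAPE + count + character.
// Assumes the escape symbol does not occur in the input.
function encode_runs(string $text, string $escape = '@'): string
{
    if (strpos($text, $escape) !== false) {
        throw new InvalidArgumentException('Escape symbol occurs in the input.');
    }

    $out = '';
    $length = strlen($text);
    $i = 0;
    while ($i < $length) {
        // Measure the run that starts at position $i.
        $run = 1;
        while ($i + $run < $length && $text[$i + $run] === $text[$i]) {
            $run++;
        }
        // Single characters are copied as they are.
        $out .= $run > 1 ? $escape . $run . $text[$i] : $text[$i];
        $i += $run;
    }
    return $out;
}

echo encode_runs('2 successfully accomplished tasks'), "\n";
// prints "2 su@2ce@2sfu@2ly a@2complished tasks"
```

Every doubled letter costs three characters instead of two, which is exactly why the output grows.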

Again, if the input stream contains the escape symbol, we have to find another one, and the problem is that it is often too difficult to find a short escape symbol that doesn’t appear in the input text without a full scan of the text.

Computer Algorithms: Data Compression with Bitmaps

Overview

In my previous post we saw how to compress data consisting of very long runs of repeating elements. This type of compression is known as “run-length encoding” and can be very handy when transferring data with no loss. The problem is that the data must follow a specific format. For example, the string “aaaaaaaabbbbbbbb” can be compressed as “a8b8”. A string of length 16 is thus compressed into a string of length 4, which is 25% of its initial length, without losing any information. A problem arises when the characters (elements) are distributed differently. What happens if the characters are the same, but they don’t form long runs? What if the string is “abababababababab”? The same length, the same characters, but we cannot use run-length encoding! Indeed, using this algorithm we’ll get at best the same string.

In this case, however, we can notice something else. The string contains many repeating elements, although they are not arranged one after another. We can compress this string with a bitmap. This means that we can save the positions of the occurrences of a given element as a sequence of bits, which can easily be converted into a decimal value. In the example above, the positions of “a” in the string “abababababababab” can be recorded as the bits “1010101010101010”, which is 43690 in decimal and, even better, AAAA in hexadecimal. Thus the long string can be compressed. When decompressing (decoding) the message, we convert back from decimal/hexadecimal into binary and match the occurrences of the characters. Well, the example above is too simple, so let’s say only one of the characters repeats and the rest of the string consists of different characters, like this: “abacadaeafagahai”. Then we can use a bitmap only for the character “a”, “1010101010101010”, and compress the string as “AAAA bcdefghi”. As you can see, all the example strings are exactly 16 characters long, and that is a limitation. Using bitmaps with data of variable length is a bit tricky, and it is not always easy (or even possible) to decompress it.

Bitmap Compression
Basically bitmap compression saves the positions of an element that is repeated very often in the message!
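The “abacadaeafagahai” example can be sketched in PHP like this. The function names and the “HEX rest” output layout follow the example in the text but are otherwise my own assumptions, and the decoder has to be told the original length (16 here), which is the limitation mentioned above.

```php
<?php
// Bitmap-compress a string around one frequent character.
// The result is the bitmap in hexadecimal, a space, then the other characters.
function bitmap_encode(string $text, string $char): string
{
    $bits = '';
    $rest = '';
    foreach (str_split($text) as $c) {
        if ($c === $char) {
            $bits .= '1';
        } else {
            $bits .= '0';
            $rest .= $c;
        }
    }
    return strtoupper(dechex(bindec($bits))) . ' ' . $rest;
}

// Rebuild the original string; $length is the known length of the original.
function bitmap_decode(string $encoded, string $char, int $length = 16): string
{
    [$hex, $rest] = explode(' ', $encoded, 2);
    // Leading zero bits are lost in the hex form, so pad back to $length.
    $bits = str_pad(decbin(hexdec($hex)), $length, '0', STR_PAD_LEFT);

    $out = '';
    $next = 0;
    foreach (str_split($bits) as $bit) {
        $out .= $bit === '1' ? $char : $rest[$next++];
    }
    return $out;
}

echo bitmap_encode('abacadaeafagahai', 'a'), "\n"; // prints "AAAA bcdefghi"
echo bitmap_decode('AAAA bcdefghi', 'a'), "\n";    // prints "abacadaeafagahai"
```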


Computer Algorithms: Data Compression with Run-length Encoding

Introduction

No matter how fast today’s computers and networks are, users will constantly need faster and faster services. To reduce the volume of the transferred data we usually use some sort of compression. That is why this area of computer science will always be interesting to research and develop.

There are many data compression algorithms, some of them lossless, others lossy, but their main goal will always be to save storage space and traffic. These algorithms are very useful when talking about data transfer between two distant places. Perhaps the best example is the transfer between a web server and a browser.

In the last few years a lot of research has been done on compressing files that are executed on the client side, such as JavaScript, CSS, HTML and images. In fact, servers and clients already have some techniques to compress data, such as GZIP, that can dramatically decrease the amount of transferred data. On the other hand, there are lots of tools and tricks for decreasing the size of the data itself.

Actually, when a file is executed by the client’s virtual machine, it doesn’t matter how “beautifully” it is formatted from a programmer’s point of view. The spaces, tabs and new lines don’t carry any significant information for the environment. That is why compression tools such as YUI Compressor, Google Closure Compiler, etc. remove those symbols. They can do even more to improve the compression rate. I won’t cover this in this post, but it shows how important data compression algorithms are.

It would be great if we could just compress any data with a single tool. Unfortunately this is not the case, and the compression rate usually depends on the data itself. The choice of data compression algorithm therefore depends mainly on the data, so first of all we must explore the data.

Here I’ll cover one very simple lossless data compression algorithm called “run-length encoding” that can be very useful in some cases.

Run-length Encoding

Overview

This algorithm consists of replacing long sequences of repeating data with only one item of this data followed by a counter showing how many times this item is repeated. To make this clearer, let’s look at an example string.

aaaaaaaaaabbbaxxxxyyyzyx

This string’s length is 24 and, as we can see, there are lots of repetitions. Using the run-length algorithm, we replace each run with a single character followed by a counter.

a10b3a1x4y3z1y1x1

The length of this string is 17, which is approximately 71% of the initial length.
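A minimal sketch of this character-plus-counter scheme in PHP could look like the code below. The function names are my own, and the decoder assumes the original text contains no digits, otherwise the counters become ambiguous.

```php
<?php
// Replace every run of identical characters with: character + run length.
function rle_encode(string $text): string
{
    return preg_replace_callback('/(.)\1*/s', function (array $m): string {
        return $m[1] . strlen($m[0]);
    }, $text);
}

// Expand every "character + counter" pair back into a run.
// Assumes the original text contained no digits.
function rle_decode(string $encoded): string
{
    return preg_replace_callback('/(\D)(\d+)/s', function (array $m): string {
        return str_repeat($m[1], (int) $m[2]);
    }, $encoded);
}

echo rle_encode('aaaaaaaaaabbbaxxxxyyyzyx'), "\n"; // prints "a10b3a1x4y3z1y1x1"
echo rle_decode('a10b3a1x4y3z1y1x1'), "\n";        // prints "aaaaaaaaaabbbaxxxxyyyzyx"
```

Note that the single characters at the end (“z1y1x1”) take twice as much space as before, so the gain comes only from the long runs.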

PHP Performance: Bitwise Division

Recently I wrote about binary search, and I said that in some languages, like PHP, bitwise division by two is not faster than the typical “/” operator. However, I decided to run some experiments, and here are the results.

Important Note

It’s very important to say that the following results depend on the machine and the environment!

Source Code

Here’s the PHP source code.

// Time $n iterations of a division by two and print the elapsed seconds.
function divide($n = 1)
{
	$start = microtime(true);
	for ($i = 0; $i < $n; $i++) {
		300/2;
	}
	echo microtime(true) - $start;
}
 
divide(100);
//divide(1000);
//divide(10000);
//divide(100000);
//divide(1000000);
//divide(10000000);
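For comparison, the bitwise counterpart can be sketched in the same way. This is my own variant, not necessarily the code used for the measurements: the function names are made up, both functions return the elapsed time instead of echoing it, and the operand is kept in a variable on the assumption that a constant expression like 300/2 may be evaluated once at compile time.

```php
<?php
// Time $n divisions by two using the "/" operator.
function divide_by_two(int $n = 1): float
{
    $value = 300;
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        $result = $value / 2;
    }
    return microtime(true) - $start;
}

// Time $n divisions by two using a right shift by one bit.
function shift_right(int $n = 1): float
{
    $value = 300;
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        $result = $value >> 1;
    }
    return microtime(true) - $start;
}

printf("/  : %.6f s\n", divide_by_two(100000));
printf(">> : %.6f s\n", shift_right(100000));
```

Keep in mind that the two operators are not fully interchangeable: 7 / 2 gives 3.5 in PHP, while 7 >> 1 gives 3.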
