# Computer Algorithms: Data Compression with Relative Encoding

## Overview

Relative encoding is another data compression algorithm. While run-length encoding, bitmap encoding and diagram and pattern substitution were trying to reduce repeating data, with relative encoding the goal is a bit different. Indeed run-length encoding was searching for long runs of repeating elements, while pattern substitution and bitmap encoding were trying to “map” where the repetitions happen to occur.

The only problem with these algorithms is that not always the input stream of data is constructed out of repeating elements. It is clear that if the input stream contains many repeating elements there must be some way of reducing them. However that doesn’t mean that we cannot compress data if there are no repetitions. It all depends on the data. Let’s say we have the following stream to compress.

`1, 2, 3, 4, 5, 6, 7`

We can hardly imagine how this stream of data can be compressed. The same problem may occur when trying to compress the alphabet. Indeed the alphabet letters the very base of the words so it is the minimal part for word construction and it’s hard to compress them.

Fortunately this isn’t true always. An algorithm that tryies to deal with non repeating data is relative encoding. Let’s see the following input stream – years from a given decade (the 90’s).

`1991,1991,1999,1998,1991,1993,1992,1992`

Here we have 39 characters and we can reduce them. A natural approach is to remove the leading “19” as we humans often do.

`91,91,99,98,91,93,92,92`

Now we have a shorter string, but we can go even further with keeping only the first year. All other years will as relative to this year.

`91,0,8,7,0,2,1,1`

Now the volume of transferred data is reduced a lot (from 39 to 16 – more than 50%). However there are some questions we need to answer first, because the stream wont be always formatted in such pretty way. How about the next character stream?

`91,94,95,95,98,100,101,102,105,110`

We see that the value 100 is somehow in the middle of the interval and it is handy to use it as a base value for the relative encoding. Thus the stream above will become:

`-9,-6,-5,-5,-2,100,1,2,5,10`

The problem is that we can’t decide which value will be the base value so easily. What if the data was dispersed in a different way.

`96,97,98,99,100,101,102,103,999,1000,1001,1002`

Now the value of “100” isn’t useful, because compressing the stream will get something like this:

`-4,-3,-2,-1,100,1,2,3,899,900,901,902`

To group the relative values around “some” base values will be far more handy.

`(-4,-3,-2,-1,100,1,2,3)(-1,1000,1,2)`

However to decide which value will be the base value isn’t that easy. Also the encoding format is not so trivial. In the other hand this type of encoding can be useful in som specific cases as we can see bellow.
Continue reading Computer Algorithms: Data Compression with Relative Encoding

# Computer Algorithms: Data Compression with Run-length Encoding

## Introduction

No matter how fast today’s computers and networks are, the users will constantly need faster and faster services. To reduce the volume of the transferred data we usually use some sort of compression. That is why this computer sciences area will be always interesting to research and develop.

There are many data compression algorithms, some of them lossless, others lossy, but their main goal aways will be to spare storage space and traffic. These algorithms are very useful when talking about data transfer between two distant places. Perhaps the best example is the transfer between a web server and a browser.

In the last few years a lot of research has been done on compressing files, executed on the client side. Such files are javascript, css, htmls and images. In fact servers and clients already have some techniques to compress data, like using GZIP for instance, that can dramatically decrease the transfer. In the other hand there are lots of tools and tricks in order to decrease the size of the data.

Actually when a file is executed by the client’s virtual machine, it doesn’t matter how “beautifully” it is formatted from a programmer’s point of view. Thus the spaces, tabs and the new lines don’t bring any significant information for the environment. That is why such compressing tools like YUI Compressor, Google Closure Compiler, etc. remove those symbols. Well, they can achieve even more in order to improve the compression rate. In this post I won’t cover this, but this shows how important data compression algorithms are.

It would be great if we could just compress data with some tool. Unfortunately this is not the case and usually the compression rate depends on the data itself. It is obvious that the choice of data compression algorithm depends mainly on the data and first of all we must explore the data.

Here I’ll cover one very simple lossless data compression algorithm called “run-length encoding” that can be very useful in some cases.

## Overview

This algorithm consists of replacing large sequences of repeating data with only one item of this data followed by a counter showing how many times this item is repeated. To become clearer let’s see a string example.

`aaaaaaaaaabbbaxxxxyyyzyx`

This string’s length is 24 and as we can see there are lots of repetitions. Using the run-length algorithm, we replace any run with shorter string followed by a counter.

`a10b3a1x4y3z1y1x1`

The length of this string is 17, which is approximately 70% of the initial length. Continue reading Computer Algorithms: Data Compression with Run-length Encoding

# Diving into Node.js – A Long Polling Example

## Node.js vs. The World

What is typical for most of the web servers is that they listen for requests and respond as quickly as possible on every one of them. In fact the speed of the response is one of the main targets of optimization for developers. Fast servers are what everyone needs. From web developers to website visitors!

In the field of the that battle different web servers have different “weapons” to gain time. While this is useful in most of the cases, when it comes to a chat-like applications and Node.js approaches, the response is not always immediately returned. As I described in my posts until now about Node.js, a simple web server may wait for an event to be emitted, and than return the response. Continue reading Diving into Node.js – A Long Polling Example

# Diving into Node.js – Introduction & Installation

## Why I Need Something Like Node.js?

First of all the use of some software is needed not because of itself, but because of the need of some specific functionality. In my case this was the need of real time news feed. Of course there is a way to make this without Node.js, as I’ll describe later in this post, but there are several disadvantages. However to begin from somewhere, let me explain what is Node.js.

## Introducing Node.js

Perhaps the best way do describe what is Node.js is from its about page.

Node’s goal is to provide an easy way to build scalable network programs. In the “hello world” web server example above, many client connections can be handled concurrently. Node tells the operating system (through epoll, kqueue, /dev/poll, or select) that it should be notified when a new connection is made, and then it goes to sleep. If someone new connects, then it executes the callback. Each connection is only a small heap allocation.

In general Node is a program using the Google Chrome’s V8 JavaScript engine, which in turn is a program that can parse and execute code written in JavaScript. V8 is a very very interesting project itself. First of all from Google have developed this engine especially for one of his products – their browser Chrome. It pretends to be and by no means is one of the masterpieces of Google. It is fast and reliable engine, written in C++ and JavaScript, as Wikipedia’s page says. Actually this code is open source and can be embedded in whatever application written in C++. Thus you can have in your application a JavaScript engine. Continue reading Diving into Node.js – Introduction & Installation

# How to Overcome Zend_Cache_Frontend_Page’s Problem with Cookies

## Zend_Cache_Frontend_Page

First of all there are several things to know about Zend Framework and caching. Whenever you work on a big web application caching is one of the mostly used mechanisms of speeding up the app and improve user performance. In general the task and the solution are pretty simple and natural.

As the application grows up the visitors become more and more impatient about what they receive. A single page is becoming slower and slower and the result is painful. First of all every time a user hits a page the application server uses the web server, a script interpreter, a database server and potentially the file system. But that’s not all. After all this output is generated on the server, as HTML in the most cases, it is sent to the client where again CSS and JavaScript engines parse and execute them.

In this scenario it’s easy to imagine how many time is spent. While there are several techniques to optimize the client side by optimizing JavaScript, CSS and the static images used for the design of the site, here I’m going to talk more about the backend.

## Beside the Optimization

Let’s assume we’ve one of the very used combination between Apache (as a webserver), PHP (as server scripting language) and MySQL (as database server). Here you can choose to optimize all three of them. However beside the optimization of them one of the most simple steps you can do is to cache the output generated by these three branches of your web app.

## Caching the Content

In fact you can cache only single parts of the whole process. For instance you can cache only the result returned by some slow database query. Let’s imagine a query takes about 2 seconds to execute. Now you can cache the result into a file and during the cache is active, i.e. it has not expired, the application takes it from a file stored somewhere in the file system.

In a typical Zend Framework scenario you can first setup the frontend and backend options of the cache.

```\$frontendOptions = array( 'lifetime' => 600, // in seconds - this is 10 seconds 'automatic_serialization' => true, ); \$backendOptions = array('cache_dir' => 'cache/'); \$cache = Zend_Cache::factory('Core', 'File', \$frontendOptions, \$backendOptions);   \$cacheKey = md5('mykey');   if (!\$cache->load(\$cacheKey)) { \$slowQueryResult = \$article->fetchAll(); \$cache->save(\$slowQueryResult, \$cacheKey); } else { \$slowQueryResult = \$cache->load(\$cacheKey); }```

You can setup different options here by setting up the cache directory, lifetime, etc.

Note that the cache directory must exists and with write permissions and Zend Framework doesn’t create it for you and will throw error.

The problem here is that now you cache only part of the generated content and in many cases this is still too slow for most of the users. However all the work (in most of the cases) of the web server, the script interpreter and the database server result in a simple HTML output. What if you have this output generated or cached for you and when the user hit the page the server will return this pre-generated code?

This is indeed very fast, because it’s similar to return a text file, as the HTML is simply formatted text.

## Caching the Entire Page

Before I proceed, I’d like to say that I work with Zend Framework 1.9.x. Now in the latest versions of ZF there are new mechanisms of caching the output even Zend_Cache_Frontend_Page works fine on them.

You can simply setup the page cache within few simple lines of code:

```\$fo = array( 'lifetime' => 600, 'regexps' => array( '^/' => array( 'cache' => true, 'cache_with_cookie_variables' => true, ), ) );   \$bo = array( 'cache_dir' => 'cache/' );   \$cache = Zend_Cache::factory('Page', 'File', \$fo, \$bo); \$cache->start();```

However my advise is to place this code as high as possible, because this will cache everything generated as output. It’s a good practice if you place this even in the bootstrap before you make the connection with the database. Actually you don’t need a database connection when you’ve to return a simple text(html) file.

This will improve your app’s performance a lot!

However there are few things to know. When you setup the cache to work even with cookie variables, you can see that hitting the page with different browsers Zend Framework will generated different cache pages. This is quite useless because than you don’t have any benefit of caching the content.

First of all let me say that THIS IS NOT A BUG! of ZF. Simply the framework will use the cookie variables to generate the cache key. It’s obvious that different browsers, even more different users, will have different cookie set and the framework will generate different cache keys for them.

Thus you’ve to change the setting to generate the cache key from cookie variable by explicitly set this option to false:

```\$fo = array( 'lifetime' => 600, 'regexps' => array( '^/' => array( 'cache' => true, 'cache_with_cookie_variables' => true, 'make_id_with_cookie_variables' => false, ), ) );   \$bo = array( 'cache_dir' => 'cache/' );   \$cache = Zend_Cache::factory('Page', 'File', \$fo, \$bo); \$cache->start();```

Note that in this example we cache every single page generated by the framework explained in the regexps. This is not so good especially when the users have the possibility to login and to see customized content for them, so you can be careful what you cache.

A typical problem that this solution solves is when your application uses Google Analytics. As you may know Google Analytics sets up a cookie every time when an user hits the page, so every time the framework will generate a different cache for him and in result he won’t see any benefit and performance improvement from your site.