Tag Archives: JSON

Computer Algorithms: Data Compression with Relative Encoding

Overview

Relative encoding is another data compression algorithm. While run-length encoding, bitmap encoding and diagram and pattern substitution were trying to reduce repeating data, with relative encoding the goal is a bit different. Indeed run-length encoding was searching for long runs of repeating elements, while pattern substitution and bitmap encoding were trying to “map” where the repetitions happen to occur.

The only problem with these algorithms is that not always the input stream of data is constructed out of repeating elements. It is clear that if the input stream contains many repeating elements there must be some way of reducing them. However that doesn’t mean that we cannot compress data if there are no repetitions. It all depends on the data. Let’s say we have the following stream to compress.

1, 2, 3, 4, 5, 6, 7

We can hardly imagine how this stream of data can be compressed. The same problem may occur when trying to compress the alphabet. Indeed the alphabet letters the very base of the words so it is the minimal part for word construction and it’s hard to compress them.

Fortunately this isn’t true always. An algorithm that tryies to deal with non repeating data is relative encoding. Let’s see the following input stream – years from a given decade (the 90’s).

1991,1991,1999,1998,1991,1993,1992,1992

Here we have 39 characters and we can reduce them. A natural approach is to remove the leading “19” as we humans often do.

91,91,99,98,91,93,92,92

Now we have a shorter string, but we can go even further with keeping only the first year. All other years will as relative to this year.

91,0,8,7,0,2,1,1

Now the volume of transferred data is reduced a lot (from 39 to 16 – more than 50%). However there are some questions we need to answer first, because the stream wont be always formatted in such pretty way. How about the next character stream?

91,94,95,95,98,100,101,102,105,110

We see that the value 100 is somehow in the middle of the interval and it is handy to use it as a base value for the relative encoding. Thus the stream above will become:

-9,-6,-5,-5,-2,100,1,2,5,10

The problem is that we can’t decide which value will be the base value so easily. What if the data was dispersed in a different way.

96,97,98,99,100,101,102,103,999,1000,1001,1002

Now the value of “100” isn’t useful, because compressing the stream will get something like this:

-4,-3,-2,-1,100,1,2,3,899,900,901,902

To group the relative values around “some” base values will be far more handy.

(-4,-3,-2,-1,100,1,2,3)(-1,1000,1,2)

However to decide which value will be the base value isn’t that easy. Also the encoding format is not so trivial. In the other hand this type of encoding can be useful in som specific cases as we can see bellow.
Continue reading Computer Algorithms: Data Compression with Relative Encoding

Computer Algorithms: Data Compression with Bitmaps

Overview

In my previous post we saw how to compress data consisting of very long runs of repeating elements. This type of compression is known as “run-length encoding” and can be very handy when transferring data with no loss. The problem is that the data must follow a specific format. Thus the string “aaaaaaaabbbbbbbb” can be compressed as “a8b8”. Now a string with length 16 can be compressed as a string with length 4, which is 25% of its initial length without loosing any information. There will be a problem in case the characters (elements) were dispersed in a different way. What would happen if the characters are the same, but they don’t form long runs? What if the string was “abababababababab”? The same length, the same characters, but we cannot use run-length encoding! Indeed using this algorithm we’ll get at best the same string.

In this case, however, we can see another fact. The string consists of too many repeating elements, although not arranged one after another. We can compress this string with a bitmap. This means that we can save the positions of the occurrences of a given element with a sequence of bits, which can be easily converted into a decimal value. In the example above the string “abababababababab” can be compressed as “1010101010101010”, which is 43690 in decimals, and even better AAAA in hexadecimal. Thus the long string can be compressed. When decompressing (decoding) the message we can convert again from decimal/hexadecimal into binary and match the occurrences of the characters. Well, the example above is too simple, but let’s say only one of the characters is repeating and the rest of the string consists of different characters like this: “abacadaeafagahai”. Then we can use bitmap only for the character “a” – “1010101010101010” and compress it as “AAAA bcdefghi”. As you can see all the example strings are exactly 16 characters and that is a limitation. To use bitmaps with variable length of the data is a bit tricky and it is not always easy (if possible) to decompress it.

Bitmap Compression
Basically bitmap compression saves the positions of an element that is repeated very often in the message!

Continue reading Computer Algorithms: Data Compression with Bitmaps

Computer Algorithms: Data Compression with Run-length Encoding

Introduction

No matter how fast today’s computers and networks are, the users will constantly need faster and faster services. To reduce the volume of the transferred data we usually use some sort of compression. That is why this computer sciences area will be always interesting to research and develop.

There are many data compression algorithms, some of them lossless, others lossy, but their main goal aways will be to spare storage space and traffic. These algorithms are very useful when talking about data transfer between two distant places. Perhaps the best example is the transfer between a web server and a browser.

In the last few years a lot of research has been done on compressing files, executed on the client side. Such files are javascript, css, htmls and images. In fact servers and clients already have some techniques to compress data, like using GZIP for instance, that can dramatically decrease the transfer. In the other hand there are lots of tools and tricks in order to decrease the size of the data.

Actually when a file is executed by the client’s virtual machine, it doesn’t matter how “beautifully” it is formatted from a programmer’s point of view. Thus the spaces, tabs and the new lines don’t bring any significant information for the environment. That is why such compressing tools like YUI Compressor, Google Closure Compiler, etc. remove those symbols. Well, they can achieve even more in order to improve the compression rate. In this post I won’t cover this, but this shows how important data compression algorithms are.

It would be great if we could just compress data with some tool. Unfortunately this is not the case and usually the compression rate depends on the data itself. It is obvious that the choice of data compression algorithm depends mainly on the data and first of all we must explore the data.

Here I’ll cover one very simple lossless data compression algorithm called “run-length encoding” that can be very useful in some cases.

Run-length Encoding

Overview

This algorithm consists of replacing large sequences of repeating data with only one item of this data followed by a counter showing how many times this item is repeated. To become clearer let’s see a string example.

aaaaaaaaaabbbaxxxxyyyzyx

This string’s length is 24 and as we can see there are lots of repetitions. Using the run-length algorithm, we replace any run with shorter string followed by a counter.

a10b3a1x4y3z1y1x1

The length of this string is 17, which is approximately 70% of the initial length. Continue reading Computer Algorithms: Data Compression with Run-length Encoding

Object Oriented JavaScript: Inheritance

Objects and JavaScript

JavaScript, being a functional language, differs from most of the procedural/object oriented languages we know. The object oriented approach in JavaScript is rather strange. However there is much power in making objects! The syntax is really odd and there are several approaches.

Literal Notation

As many of you may know the most used notation is the JSON (JavaScript Object Notation).

{ 'key1' : 'val1'
, 'key2' : 'val2'
, 'key3' : 'val3'
}

Of course this is the very basic example. You can use as value any JavaScript object – another similar object or a function.

{ 'key1' : 'val1'
, 'key2' : { 'inner_key1' : 'inner_val1' }
, 'key3' : function() {
			return 10 + 5;
		 }
}

The two examples above are showing an anonymous object in JavaScript, but you can assign this code to some variable.

var myObject = 
	{ 'key1' : 'val1'
	, 'key2' : 'val2'
	, 'key3' : 'val3'
	}

or

var myObject =
	{ 'key1' : 'val1'
	, 'key2' : { 'inner_key1' : 'inner_val1' }
	, 'key3' : function() {
				return 10 + 5;
			 }
	}

and then you can call the properties of these objects with the ‘.’ operator:

myObject.key1;
myObject.key2.inner_key1;
myObject.key3();

So far so good – this is the literal object notation in JavaScript. However there is another “objects” in JavaScript.
Continue reading Object Oriented JavaScript: Inheritance

Returning JSON in a Zend Controller’s Action – Part 2

In a reply from my latest post after following the comments, there is a much elegant solution to this use case. Simply by changing the output with a JSON helper. This also changes the content-type of the returned response.

$data = array(...);
$this->_helper->json($data);