Computer Algorithms: Data Compression with Diagram Encoding and Pattern Substitution

Overview

Two variants of run-length encoding are the diagram encoding and the pattern substitution algorithms. Diagram encoding is actually a very simple algorithm. Run-length encoding needs an input stream that consists of many repeating elements, such as “aaaaaaaa”, which are very rare in a natural language. By contrast, there are many so-called “diagrams” in almost any natural language. In plain English there are diagrams such as “the”, “and”, “ing” (in the word “waiting”, for example), “ a”, “ t”, “ e” and many doubled letters. We can even extend those diagrams by adding the surrounding spaces. Thus we can encode not only “the”, but “ the ”, which is 5 characters (2 spaces and 3 letters), with something shorter. On the other hand, as I said, plain English has many doubled letters, but a run of only two characters is too short for run-length encoding to gain anything, so the compression ratio will be small. Even worse, the encoded text may turn out longer than the input message. Let’s see some examples.

Let’s say we have to encode the message “successfully accomplished”, which contains four doubled letters. To compress it with run-length encoding we’ll still need at least 8 characters for those four pairs, which doesn’t help us a lot.

```
// 8 chars replaced by 8 chars!?
input:  "successfully accomplished"
output: "su2ce2sfu2ly a2complished"
```

The problem is that if the input text contains numbers, “2” in particular, we have to choose an escape symbol (“@” for example) to mark where an encoded run begins. Thus if the input message is “2 successfully accomplished tasks”, it will be encoded as “2 su@2ce@2sfu@2ly a@2complished tasks”. Now the output message is longer than the input string!

```
// the compressed message is longer!!!
input:  "2 successfully accomplished tasks"
output: "2 su@2ce@2sfu@2ly a@2complished tasks"
```
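The escape scheme described above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the function name `rle_encode` and its `escape` parameter are made up for this example, and the sketch assumes that the escape symbol never occurs in the input and that every run of two or more characters gets encoded, even when that saves nothing.

```python
from itertools import groupby


def rle_encode(text, escape="@"):
    """Run-length encode `text`, marking each encoded run with `escape`.

    Assumptions (illustration only): `escape` does not occur in `text`,
    and every run of length >= 2 is encoded as escape + count + character.
    """
    if escape in text:
        raise ValueError("escape symbol occurs in the input")
    out = []
    for char, group in groupby(text):
        run = len(list(group))  # length of the current run of `char`
        if run >= 2:
            out.append(f"{escape}{run}{char}")
        else:
            out.append(char)
    return "".join(out)


message = "2 successfully accomplished tasks"
encoded = rle_encode(message)
print(encoded)                         # 2 su@2ce@2sfu@2ly a@2complished tasks
print(len(message), "->", len(encoded))  # 33 -> 37
```

Each doubled letter (2 characters) turns into 3 characters, so four doubled letters make the message 4 characters longer.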

Again, if the input stream contains the escape symbol, we have to find another one, and the problem is that it is often too difficult to find a short escape symbol that doesn’t appear in the input text without a full scan of the text.
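To show why a full scan is needed, here is a small Python sketch that looks for an unused escape symbol. The helper name `find_escape` and the candidate list are assumptions chosen for illustration; the point is only that the whole text has to be read once before any candidate can be declared safe.

```python
def find_escape(text, candidates="@#~^|`"):
    """Return the first candidate character that never occurs in `text`.

    Returns None when every candidate is already used by the input.
    """
    used = set(text)  # one full pass over the input
    for symbol in candidates:
        if symbol not in used:
            return symbol
    return None


print(find_escape("2 successfully accomplished tasks"))  # @
print(find_escape("mail me @ home"))                     # #
```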

PHP: Does SimpleXMLElement have toString() method? No, better use asXML()!

SimpleXML is a PHP extension that “provides a very simple and easily usable toolset to convert XML to an object” [1]. Thus you can pass the path to an XML file and SimpleXML will return an object.

`$xml = simplexml_load_file('path_to_the_file');`

Sometimes you’d need to dump or save the entire XML as a string, but there’s no toString method! As you can see, $xml is an instance of the SimpleXMLElement class.

`var_dump($xml); // object(SimpleXMLElement)...`

Actually if you take a closer look:

`$xml->toString();`

will fail with “Call to undefined method SimpleXMLElement::toString()”, which is frustrating because the developer community is used to calling toString() when converting an object into a string.

The solution

In fact there’s a method doing exactly what’s needed: SimpleXMLElement::asXML.

As described in the manual page: “SimpleXMLElement::asXML — Return a well-formed XML string based on SimpleXML element” [2].

Although it does exactly what’s needed, the name sounds odd: you already have an XML object, and “asXML” doesn’t suggest that you’ll get a string back.

`$xml->asXML(); // ?!`

PHP Strings: How to Get the Extension of a File

EXE or GIF or DLL or …

Most of the code chunks I’ve seen about getting a file extension from a string are based on some sort of string manipulation.

```
$filename = '/my/path/image.jpeg';
echo substr($filename, strrpos($filename, '.') + 1);
```

However, there is a more elegant solution.

```
$filename = '/my/path/image.jpeg';
echo strtolower(pathinfo($filename, PATHINFO_EXTENSION));
```

This way you rely on PHP’s built-in functions instead of hand-written string manipulation, where it’s easy to overlook an edge case, for example a file name that contains no dot at all.

ffmpeg, libx264 and presets

When working with ffmpeg you can convert/encode to various formats, but in my case the result should be an MP4 file, encoded with libx264. This combination is quite well known in the web community, because those videos are playable in almost any Flash player and on Apple products.

Now the problem is that this comes with a very large list of options, which are difficult to set up, especially for newbies like me. It gets even harder because the resulting quality is not always the same as the input file’s quality.

The Solution

The solution is to use presets. Thankfully, ffmpeg with libx264 can be used with presets (NORMAL, HD, etc.), which means that you don’t have to set up a command by hand and the resulting quality is the same as the input quality.

Here’s a sample command using the normal preset:

`ffmpeg -i source.mp4 -acodec libfaac -ab 128k -ac 2 -vcodec libx264 -vpre normal -threads 0 -crf 22 output.mp4`

Here the source can be either MP4 or some other file format, such as FLV.

jQuery: Setting Up a Vector Path Fill Color

It’s quite common to think of jQuery’s attr() method as something that can only change basic attributes such as value, style, etc. However, attr() can change any attribute of a DOM element. The SVG path element, for instance, can have a fill attribute, so you can simply set it:

`$('path').attr('fill', '#ccc');`