Tag Archives: Technology/Internet

Computer Algorithms: Brute Force String Matching

Introduction

String matching is something crucial for database development and text processing software. Fortunately every modern programming language and library is full of functions for string processing that help us in our everyday work. However is great to understand their principles.

String algorithms can be mainly divided into several categories. One of these categories is string matching.

When we come to string matching the most basic approach is what is known as brute force, which means just to check every single character from the text to match against the pattern. In general we have a text and a pattern (most commonly shorter than the text). What we need to do is to answer the question whether this pattern appears into the text.

Overview

The principles of brute force string matching are quite simple. We must check for a match between the first characters of the pattern with the first character of the text as on the picture bellow.

First step of brute force string matching
We start by comparing the first characters of the text and the pattern!
Continue reading Computer Algorithms: Brute Force String Matching

5 Tips on How to Spot a Bad Developer and Team Member

1. He hates libraries, framworks and cutting edge technologies

He always hates newest technologies and considers that pure programming with no libraries and frameworks is the masterpiece of coding and only way to show his greatness. Any mention of a library name results in something like “blah, this sucks” with no explanation why. However most often the reason is “because it’s slow” with no examples of how slow actually this library is – he does not make any performance tests at all. Always prefers to write his own pure “javascript” or “php” based code/library because he thinks this is the cutting edge of modern technology.

2. He hates helping and talking to clients

Always after a phone call with a client he’s angry, because “the sutpid client” doesn’t understand his way of brilliant thinking. He thinks that a project must exists just to show what a “good” developer he is and how much he knows, not because the product will be someday, somehow used by poor clients.

3. He loves to see how others make mistakes

Bugs by other developers are always accepted with a smile with some words like “I told you”. He doesn’t think bugs will be comited if he were the developer. However bugs in his code are always accepted and explained as misconseption of the client!

4. Always negative at meetings

Whatever solution is proposed on a meeting he’s negative. But he doesn’t propose his own solution. Typically a bad team member is always trying to say “whatever solution you get I’m telling you that there are some pitfalls”.

5. He hates interviews

Because he thinks he’s the best programmer in the world, he hates going to interviews. His behaviour on interviews is always like “you know less than me and I don’t know why I’m here”. However once he gets the job he behaves with his boss like “I’m making you a favor to receive a salary from you!”

Advice: To avoid working with such jerks try to spot them on the interview. If he tells you that your choice of library and technology sucks you should better get another candidate.

Computer Algorithms: Insertion Sort

Overview

Sorted data can dramatically change the speed of our program, therefore sorting algorithms are something quite special in computer science. For instance searching in a sorted list is faster than searching in an unordered list.

There are two main approaches in sorting – by comparing the elements and without comparing them. A typical algorithm from the first group is insertion sort. This algorithm is very simple and very intuitive to implement, but unfortunately it is not so effective compared to other sorting algorithms as quicksort and merge sort. Indeed insertion sort is useful for small sets of data with no more than about 20 items.

Insertion sort it is very intuitive method of sorting items and we often use it when we play card games. In this case the player often gets an unordered set of playing cards and intuitively starts to sort it. First by taking a card, making some comparisons and then putting the card on the right position.

So let’s say we have an array of data. In the first step the array is unordered, but we can say that it consists of two sub-sets: sorted and unordered, where on the first step the only item in the sorted sub-set is its first item. If the length of the array is n the algorithm is considered completed in n-1 steps. On each step our sorted subset is growing with one item. The thing is that we take the first item from the unordered sub-set and with some comparisons we put it into its place in the sorted sub-set, like on the diagram bellow.

Main principle of insertion sort
Main principle of insertion sort.

Continue reading Computer Algorithms: Insertion Sort

Computer Algorithms: Data Compression with Prefix Encoding

Overview

Prefix encoding, sometimes called front encoding, is yet another algorithm that tries to remove duplicated data in order to reduce its size. Its principles are simple, however this algorithm tend to be difficult to implement. To understand why, first let’s take a look of its nature.

Please, have a look on the following dictionary.

use
used
useful
usefully
usefulness
useless
uselessly
uselessness

Instead of keeping all these words in plain text or transferring all them over a network, we can compress (encode) them with prefix encoding.

Prefix Encoding

Continue reading Computer Algorithms: Data Compression with Prefix Encoding

Computer Algorithms: Data Compression with Relative Encoding

Overview

Relative encoding is another data compression algorithm. While run-length encoding, bitmap encoding and diagram and pattern substitution were trying to reduce repeating data, with relative encoding the goal is a bit different. Indeed run-length encoding was searching for long runs of repeating elements, while pattern substitution and bitmap encoding were trying to “map” where the repetitions happen to occur.

The only problem with these algorithms is that not always the input stream of data is constructed out of repeating elements. It is clear that if the input stream contains many repeating elements there must be some way of reducing them. However that doesn’t mean that we cannot compress data if there are no repetitions. It all depends on the data. Let’s say we have the following stream to compress.

1, 2, 3, 4, 5, 6, 7

We can hardly imagine how this stream of data can be compressed. The same problem may occur when trying to compress the alphabet. Indeed the alphabet letters the very base of the words so it is the minimal part for word construction and it’s hard to compress them.

Fortunately this isn’t true always. An algorithm that tryies to deal with non repeating data is relative encoding. Let’s see the following input stream – years from a given decade (the 90’s).

1991,1991,1999,1998,1991,1993,1992,1992

Here we have 39 characters and we can reduce them. A natural approach is to remove the leading “19” as we humans often do.

91,91,99,98,91,93,92,92

Now we have a shorter string, but we can go even further with keeping only the first year. All other years will as relative to this year.

91,0,8,7,0,2,1,1

Now the volume of transferred data is reduced a lot (from 39 to 16 – more than 50%). However there are some questions we need to answer first, because the stream wont be always formatted in such pretty way. How about the next character stream?

91,94,95,95,98,100,101,102,105,110

We see that the value 100 is somehow in the middle of the interval and it is handy to use it as a base value for the relative encoding. Thus the stream above will become:

-9,-6,-5,-5,-2,100,1,2,5,10

The problem is that we can’t decide which value will be the base value so easily. What if the data was dispersed in a different way.

96,97,98,99,100,101,102,103,999,1000,1001,1002

Now the value of “100” isn’t useful, because compressing the stream will get something like this:

-4,-3,-2,-1,100,1,2,3,899,900,901,902

To group the relative values around “some” base values will be far more handy.

(-4,-3,-2,-1,100,1,2,3)(-1,1000,1,2)

However to decide which value will be the base value isn’t that easy. Also the encoding format is not so trivial. In the other hand this type of encoding can be useful in som specific cases as we can see bellow.
Continue reading Computer Algorithms: Data Compression with Relative Encoding