Double Metaphone

Some time ago, I wrote a PHP wrapper around the CPAN module Text-DoubleMetaphone by Maurice Aubrey. This makes the Double Metaphone algorithm usable from PHP. While the PHP core already has Metaphone, it lacks support for the second version, which handles Latin languages a bit more correctly.

At the time, I posted an email to the PECL dev lists, but didn't receive any answer, positive or negative. I therefore never published it on PECL, thinking people weren't interested in the package. I was wrong. This week someone contacted me by email, asking whether I still had the code and could send them the package. Luckily, it has not had time to bitrot.

So here we go. If you need Double Metaphone inside PHP, grab doublemetaphone-0.1.0.tgz from here and follow the instructions on how to install PECL packages. You should be able to do something similar to this:

olivier@geeko-book $ pecl download http://www.olivierhill.ca/doublemetaphone-0.1.0.tgz
downloading doublemetaphone-0.1.0.tgz ...
Starting to download doublemetaphone-0.1.0.tgz (7,450 bytes)
.....done: 7,450 bytes

olivier@geeko-book $ sudo pecl install doublemetaphone-0.1.0.tgz

Now, don't forget to load the extension, either with dl() (bad...) or by adding it to your php.ini (good!).
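Once the extension is loaded, a quick check like the following can confirm that it works. This is only a sketch: the function name double_metaphone() and the 'primary' and 'secondary' array keys are assumptions about the extension's API, and phonetic_keys() is a hypothetical helper. Only metaphone() is known to be in the PHP core.

```php
<?php
// Sketch only: double_metaphone() and its 'primary'/'secondary' keys are
// assumptions about this extension's API; metaphone() is in the PHP core.
function phonetic_keys(string $word): array
{
    $keys = ['metaphone' => metaphone($word)];

    // Only call the extension function when it has actually been loaded.
    if (function_exists('double_metaphone')) {
        // Assumed to return an array holding a primary and a secondary code.
        $codes = double_metaphone($word);
        $keys['primary'] = $codes['primary'] ?? null;
        $keys['secondary'] = $codes['secondary'] ?? null;
    }

    return $keys;
}

print_r(phonetic_keys('Schmidt'));
```

If only the metaphone key shows up in the output, the extension is not loaded yet.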

If this works for you and you find it useful, you can always visit my PECL profile. It does have a wishlist URL if you have an urgent need to thank me.


geoip-0.2.0 released

Finally, GeoIP is available on PECL. It's been a while since it all started, but now the first release is out the door.

First of all, GeoIP is a piece of software that maps an IP address or hostname to a geographic location. Although it cannot be 100% accurate (since your IP address comes from your ISP, which might not be next door), it gives you a rough idea of which city the user is in. With this data handy, you can choose to present a local version of your website. It can also be really useful for demographic analysis, where you work out where your users are located based on log files. Anyway, what you do with that piece of software is up to you.

The good news, though, is that it is now easily available from PHP. After long debates and license clashes (the GPL being incompatible with the PHP License), the author of the GeoIP C Library kindly agreed to release the new version under the LGPL. With this change, the GeoIP PECL module could be created, using the skeleton from SourceForge. Since then, many memory leaks have been fixed and documentation has been written. It should now be in good working condition; let me know if you find bugs.
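As a rough illustration of the "local version of your website" idea, here is a minimal sketch. It assumes the extension exposes geoip_country_code_by_name(), as the PECL geoip documentation describes; pick_locale() and the locale table are hypothetical names and sample data made up for this example.

```php
<?php
// Hypothetical helper: maps a two-letter country code to a site locale.
function pick_locale(?string $countryCode, array $locales, string $default): string
{
    if ($countryCode === null || $countryCode === '') {
        return $default;
    }
    return $locales[strtoupper($countryCode)] ?? $default;
}

// Sample data only.
$locales = ['CA' => 'en_CA', 'FR' => 'fr_FR', 'DE' => 'de_DE'];

$country = null;
$ip = $_SERVER['REMOTE_ADDR'] ?? null; // not set when run from the command line
if ($ip !== null && function_exists('geoip_country_code_by_name')) {
    // The lookup returns false when the address is not in the database.
    $code = @geoip_country_code_by_name($ip);
    $country = is_string($code) ? $code : null;
}

// Falls back to the default when the lookup is unavailable or fails.
echo pick_locale($country, $locales, 'en_US'), "\n";
```

Keeping the lookup separate from the mapping means the page still renders with a sensible default when the database has no answer.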

By the way, if someone builds a website that uses GeoIP to determine where the user is from, plots that on a Yahoo! map using their API, and fetches Flickr photos geotagged near the user's location, leave a comment. I want to see this :)

stem-1.4.3 for PHP released!

As of today, you can download a new release of the PHP stem interface to the Snowball API on PECL. While this extension was written by Jay Smith, I have since joined him to help with further development.

If you do not know what a stemmer is, the article on Wikipedia is self-explanatory. Basically, it allows a computer program to find a common root for different forms of the same word. While Dr. Porter did a great job creating stemmers for different languages and the Snowball API, it was not available directly from a PHP script.

Now that this limitation is gone, you might want to try using the stemmer to create an intelligent search engine for your website. If you want to give it a try, issue the following command on your favorite UNIX-based machine: pecl install stem. Once the installation has completed, modify your php.ini to load the extension and then try the following example:

<?php
  print stem_english('cleaner') ."\n";
  print stem_french('épouses') ."\n";
?>

This would output clean and épous respectively. In some cases, the word output by the stemmer will not exist in a dictionary, but this is rarely a problem. In fact, you should only stem words to use them as keywords in some kind of database.
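To make the "keywords in a database" idea concrete, here is a minimal sketch of turning free text into a set of stemmed keywords. stemmed_keywords() is a hypothetical helper, not part of the extension. It accepts any callable as the stemmer, so it uses stem_english when the extension is loaded and otherwise falls back to a crude stand-in that only strips a trailing "s" (not a real stemmer).

```php
<?php
// Sketch: reduce free text to a set of stemmed keywords for an index.
// $stem is any callable mapping a word to its stem, e.g. 'stem_english'.
function stemmed_keywords(string $text, callable $stem): array
{
    // Split on anything that is not a letter or digit (Unicode-aware).
    $words = preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $stems = array_map($stem, $words);
    // Documents and queries that share a stem now share a keyword.
    return array_values(array_unique($stems));
}

// Use the extension when loaded; otherwise a crude stand-in (NOT a real stemmer).
$stem = function_exists('stem_english')
    ? 'stem_english'
    : fn(string $w): string => rtrim($w, 's');

print_r(stemmed_keywords('Cleaners clean the cleaner windows', $stem));
```

Run the same function over both the indexed documents and the search query, so that both sides are reduced to the same keywords before they are compared.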


PECL on Gentoo

If you try to install a PECL package without using Portage (thus using the PHP tool pecl), you might encounter an error like this snippet:


bender ~ # pecl install apc
downloading APC-3.0.8.tgz ...
[...]
autoconf: Undefined macros:
configure.in:63:AC_PROG_LIBTOOL
ERROR: `phpize' failed

The main problem lies with the use of automake v1.9.x. Since Gentoo comes with a bunch of different versions of the autotools, you can choose to use automake v1.8, which will result in a complete build.


bender ~ # WANT_AUTOMAKE="1.8" pecl install apc

Simple as it seems, it took me a while to figure out. Let me know if this helps.


PFE presentation

The slides for my PFE (Projet de Fin d'Études) are available here.

They are in French, but if you are looking at how to integrate different technologies to build a full-featured email server, you might want to have a look at some of the schematics. I will see if I can post the full report online shortly. The best would be to write a HOWTO, but that would take a long time.

My FOAF description

After doing some reading on the semantic web and RDF, I came across an emerging standard called FOAF.

What is FOAF? It is an acronym for Friend Of A Friend. What is its purpose? It lets you describe yourself inside an RDF document with information such as your name, your email, your school and your workplace.

But the juicy stuff comes right after. You can list your friends inside your FOAF description and link their FOAF to yours. As a result, you could build a giant tree of people knowing each other. Hypothetically, an RDF crawler could then parse the tree and permit queries such as: "Does Bob know Alice?" or "Show me Charlie's friends".
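The two example queries can be sketched in a few lines once the crawler has collected who links to whom. Everything here is hypothetical: the sample data, the helper names, and the assumption that the crawled FOAF links have already been flattened into a plain PHP array (no RDF parsing is shown).

```php
<?php
// Sample data only: each person maps to the people their FOAF file links to.
$knows = [
    'Alice'   => ['Charlie'],
    'Bob'     => ['Charlie', 'Dave'],
    'Charlie' => ['Alice'],
    'Dave'    => [],
];

// "Show me Charlie's friends": direct links only.
function friends_of(array $knows, string $person): array
{
    return $knows[$person] ?? [];
}

// "Does Bob know Alice?": breadth-first search over the links, so that
// friends of friends count too. Returns the number of hops, or null.
function hops_between(array $knows, string $from, string $to): ?int
{
    $distance = [$from => 0];
    $queue = [$from];
    while ($queue) {
        $person = array_shift($queue);
        if ($person === $to) {
            return $distance[$person];
        }
        foreach ($knows[$person] ?? [] as $friend) {
            if (!isset($distance[$friend])) {
                $distance[$friend] = $distance[$person] + 1;
                $queue[] = $friend;
            }
        }
    }
    return null;
}

var_dump(hops_between($knows, 'Bob', 'Alice'));
print_r(friends_of($knows, 'Charlie'));
```

The distance table also records who has already been visited, which keeps the search from looping when two people list each other, as Alice and Charlie do here.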

So go ahead, crawl my FOAF description and create yours today!