Author Archives: fab

New WHOIS tool released: structured WHOIS information for any domain

While developing our WebPageAnalyse project, we wanted to give users a way to look up WHOIS information for any domain they like. I run WHOIS lookups regularly to find out who owns a site, e.g. when I need to contact the owners because they have put up links on their site that they should not have.

I have used a multitude of tools in the past but was never satisfied with them, because I always wanted the WHOIS information returned in a structured way so I could actually read and understand it. Normally you just get a plain text page back. We decided to improve on this and actually parse and enrich the WHOIS data where possible. As a basis for this we use the Ruby WHOIS library. We take its results and display them in a user-friendly format.
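To give an idea of what "parsing" means here, below is a minimal Ruby sketch. It assumes a raw WHOIS response made of "Key: Value" lines, which many registries use but by no means all; that variation is exactly why we rely on the Ruby WHOIS library instead. The parse_whois helper and the sample response are hypothetical and not part of that library's API.

```ruby
# Minimal sketch: turn a raw WHOIS text response into a Hash.
# Assumes "Key: Value" lines; real registry formats vary a lot.
# parse_whois is a hypothetical helper, not the Ruby WHOIS library's API.
def parse_whois(raw)
  raw.each_line.with_object({}) do |line, fields|
    line = line.strip
    # Skip blank lines and comment/notice lines, often prefixed with % or #
    next if line.empty? || line.start_with?("%", "#")

    key, value = line.split(":", 2)
    next if value.nil? || value.strip.empty?

    # Normalise the key, e.g. "Registrar URL" -> :registrar_url
    key = key.downcase.gsub(/[^a-z0-9]+/, "_").gsub(/\A_|_\z/, "")
    next if key.empty?
    key = key.to_sym
    value = value.strip

    # Keys such as "Name Server" repeat, so collect repeated values in an Array
    fields[key] = fields.key?(key) ? Array(fields[key]) << value : value
  end
end

sample = <<~WHOIS
  % Made-up sample response, for illustration only
  Domain Name: EXAMPLE.COM
  Registrar URL: http://registrar.example
  Name Server: NS1.EXAMPLE.NET
  Name Server: NS2.EXAMPLE.NET
WHOIS

info = parse_whois(sample)
p info[:domain_name]  # "EXAMPLE.COM"
p info[:name_server]  # ["NS1.EXAMPLE.NET", "NS2.EXAMPLE.NET"]
```

A real lookup also has to pick the right WHOIS server for each TLD and cope with registries that restrict what they return, which is the part we leave to the library.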

In addition to the most commonly used domain endings such as .com, .net, .de and .fr, we support top-level domains (TLDs) from many other countries. For German .de domains we currently cannot display all information, because DENIC does not allow it.

Check out the WHOIS tool and let us know whether you like it or not.

Whois Tool WebPageAnalyse

When to use MongoDB or another document-oriented database system?

We are building a platform for comparing websites in detail. We use MongoDB to store all the information, and it works quite nicely. In particular we store all the meta-information about the domains in it, because MongoDB fits our requirements better: we retrieve different kinds of data for every domain, and MongoDB is well suited to storing this loosely structured data while keeping it searchable.
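The "different data for every domain, but still searchable" point is easiest to see with a toy example. The sketch below uses plain Ruby hashes as stand-ins for MongoDB documents and a tiny equality filter that uses dot notation for nested fields, mimicking how MongoDB queries address embedded documents. It does not use the MongoDB driver, and all field names are hypothetical.

```ruby
# Toy stand-in for a document store: each domain record is a Hash that only
# contains the fields we happened to collect for that domain.
# Plain Ruby, not the MongoDB driver; all field names are hypothetical.
domains = [
  { "domain" => "example.com", "tld" => "com", "server" => { "software" => "nginx" } },
  { "domain" => "example.de",  "tld" => "de",  "language" => "de" },
  { "domain" => "example.org", "tld" => "org", "server" => { "software" => "Apache" }, "ssl" => true }
]

# Follow a dotted path such as "server.software" into nested Hashes.
# Returns nil when a document does not have that field at all.
def dig_path(doc, path)
  path.split(".").reduce(doc) { |node, key| node.is_a?(Hash) ? node[key] : nil }
end

# Select documents whose fields equal every value in the filter,
# similar in spirit to an equality query in MongoDB.
def find_documents(docs, filter)
  docs.select { |doc| filter.all? { |path, value| dig_path(doc, path) == value } }
end

p find_documents(domains, { "server.software" => "nginx" }).map { |d| d["domain"] }
# ["example.com"]
p find_documents(domains, { "tld" => "de" }).map { |d| d["domain"] }
# ["example.de"]
```

In MongoDB itself the filter would be handed to the collection's find method; the point is that no table schema has to anticipate which fields a given domain will end up with.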

One of our next steps will be to provide a forum for our users. The question that now arises is whether to store all the forum-related data in a MySQL database or to use MongoDB for this as well. More generally: when should you use MongoDB, and when a traditional RDBMS?

After a lot of research, I found the following article.

In “NoSQL: If Only It Was That Easy”, the author writes the following about MongoDB:

…MongoDB is not a key/value store, it’s quite a bit more. It’s definitely not a RDBMS either. I haven’t used MongoDB in production, but I have used it a little building a test app and it is a very cool piece of kit. It seems to be very performant and either has, or will have soon, fault tolerance and auto-sharding (aka it will scale). I think Mongo might be the closest thing to a RDBMS replacement that I’ve seen so far. It won’t work for all data sets and access patterns, but it’s built for your typical CRUD stuff. Storing what is essentially a huge hash, and being able to select on any of those keys, is what most people use a relational database for. If your DB is 3NF and you don’t do any joins (you’re just selecting a bunch of tables and putting all the objects together, AKA what most people do in a web app), MongoDB would probably kick ass for you…

And finally he concludes with:

…The real thing to point out is that if you are being held back from making something super awesome because you can’t choose a database, you are doing it wrong. If you know mysql, just use it. Optimize when you actually need to. Use it like a k/v store, use it like a rdbms, but for god sake, build your killer app! None of this will matter to most apps. Facebook still uses MySQL, a lot. Wikipedia uses MySQL, a lot. FriendFeed uses MySQL, a lot. NoSQL is a great tool, but it’s certainly not going to be your competitive edge, it’s not going to make your app hot, and most of all, your users won’t give a shit about any of this.
What am I going to build my next app on? Probably Postgres. Will I use NoSQL? Maybe. I might also use Hadoop and Hive. I might keep everything in flat files. Maybe I’ll start hacking on Maglev. I’ll use whatever is best for the job. *If I need reporting, I won’t be using any NoSQL.* If I need caching, I’ll probably use Tokyo Tyrant. If I need ACIDity, I won’t use NoSQL. If I need a ton of counters, I’ll use Redis. If I need transactions, I’ll use Postgres. *If I have a ton of a single type of documents, I’ll probably use Mongo.* If I need to write 1 billion objects a day, I’d probably use Voldemort. If I need full text search, I’d probably use Solr. If I need full text search of volatile data, I’d probably use Sphinx.

I like this article: it is very informative and gives a good overview of the NoSQL landscape and hype. But, and that’s the most important part, it really helps you ask yourself the right questions when choosing between an RDBMS and NoSQL. Worth the read IMHO…

Hope that helps a little.

Lessons learned for large MongoDB databases

We are currently developing a system that aims to analyze all the domains on the internet. This is a really challenging task and not one that can be done in a few months’ time. Besides plenty of other problems, such as finding that many domains and parsing them in a reasonable amount of time, we are also setting up a MongoDB cluster to store the analyzed information. Our database currently holds 200 GB split across two shards, but we expect it to grow to 1-2 TB.

There are a lot of posts like this on the web, so I’m probably not telling you anything new (especially if you are a senior developer or DBA who just goes: “oh my god… I knew that 10 years ago, it’s the same with every database” :)), but I really wanted to share the following things, which bugged me for quite a while:
Continue reading

How to build a deployment pipeline in Jenkins

On our current project we are aiming for continuous delivery. There are a lot of things you need to get right to achieve that, but one of the most important is a working deployment pipeline, in which a defined version of the source code is built and pushed through various stages. The later the stage, the more confidence you should have in your software, and the safer you should feel about actually deploying it to production when the business asks for it. The whole process should be automatic, except for stages that one might want to trigger manually.
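To make the idea concrete, here is a minimal Ruby sketch of such a pipeline: one fixed version is promoted through ordered stages, a failing stage stops the promotion, and stages marked as manual only run once someone has approved them. The stage names, the version string and the approval list are hypothetical; this models the concept and is not Jenkins or Go configuration.

```ruby
# Minimal sketch of a deployment pipeline. Stage names, version and the
# approval mechanism are hypothetical; this is not Jenkins or Go config.
Stage = Struct.new(:name, :manual, :check)

# Push one fixed +version+ through +stages+ in order. Automatic stages run
# as soon as the previous stage passed; manual stages run only if approved.
# Returns the names of the stages this version has passed.
def run_pipeline(version, stages, approved: [])
  passed = []
  stages.each do |stage|
    break if stage.manual && !approved.include?(stage.name) # wait for a human trigger
    break unless stage.check.call(version)                  # a failure stops promotion
    passed << stage.name
  end
  passed
end

stages = [
  Stage.new("commit",     false, ->(_version) { true }), # e.g. compile + unit tests
  Stage.new("acceptance", false, ->(_version) { true }), # automated acceptance tests
  Stage.new("uat",        true,  ->(_version) { true }), # manually triggered
  Stage.new("production", true,  ->(_version) { true })  # manually triggered
]

p run_pipeline("1.4.2", stages)                    # ["commit", "acceptance"]
p run_pipeline("1.4.2", stages, approved: ["uat"]) # ["commit", "acceptance", "uat"]
```

Passing the same version string to every stage reflects the key rule: the artifact is built once and then promoted, never rebuilt for a later stage.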

A really nice build server that supports build pipelines out of the box is the commercial Go build server from ThoughtWorks. It was built with the ideas of CruiseControl (an open-source build server also partly developed by ThoughtWorks) in mind, but goes a step further and applies the latest real-life project experience of how to build high-quality software.
Continue reading

Web Dev Bros now runs WordPress 2.7

We upgraded the blog to the latest version and added some new widgets. The site was down for a day because we forgot to upload a file… Sorry about that.

We will try to add some posts over the next few days to get the blog up and running again. I also want to welcome Julien; we can’t wait to see a post from him :-)

What's new in Rails 2.1?

If you want to know what’s new in Rails 2.1 or think that you may have missed a new feature, you should check out the e-book “Ruby on Rails 2.1. What’s new?”.

It describes all the new features, each with an example. It’s quite amazing to see how Rails is evolving.
BTW: did you know that there is finally an i18n patch in edge Rails, to be released with Rails 2.2? This is a solid foundation for other plugins to build on.

I love HAML

Just a quick post to let everybody know that I’m switching my ERB templates over to HAML, and all my CSS will follow with SASS. These two template engines boost my productivity a lot. HAML is compatible with Rails 2.

I must admit that there are still some issues: you cannot spread Ruby code over multiple lines to format very long Ruby statements properly, and the HAML syntax highlighting in the NetBeans plugin is quite old and no longer updated. But this doesn’t stop me from using it.

Just check out the “showdown” at the bottom of the first page to get a feel for HAML.

Rounded corners with CSS (and no images)

I searched a lot before finding this great solution for rounded borders with just CSS and no images. There are loads of solutions out there that either use images for the borders or rely on funky JavaScript, which slows down page load and only applies the rounding after the onload event.

So thanks to Deathshadow for this great pure-CSS solution. Just check out the link; it has the examples and the source you need.

Powerful online bookmark service

With Web 2.0, a lot of online bookmark services came up; the most prominent is probably del.icio.us. I really like the idea of having all my bookmarks online so I can access them from wherever I want and don’t need to back them up myself. I tried del.icio.us and some others but was never really satisfied, because each service lacked some feature I’d like to have.

Must have features:

  • Possibility to group bookmarks in folders
  • Possibility to tag bookmarks
  • Search through them
  • Make them public or private
  • Plugin to access the bookmarks in FF or IE
  • Import/export your bookmarks

Nice to have features:

  • Share them with friends
  • Random bookmarks
  • Recommended bookmarks

Finally I found Chipmark, a free, GPL-licensed online bookmark service that actually has all the features mentioned above. I can highly recommend it; I use it every day in both IE and FF. They have also started to provide a public web service API that you can use in your own applications. So far it lets you log in and get all your folders and links, which might be really useful for some people.

Wow, that was finally my first article here :-) I hope it is useful for somebody.