Archive for the ‘System Administration’ Category

Hi all,

after many months and many working hours, I am finally very happy to announce a new release of munin_plugins.

This version is the fourth major release and it is a very heavy refactor of this software. First of all, the sources are available on GitHub (as usual) and on PyPI (only versions >= 4). I'm very excited about the last one.
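Since the package is published on PyPI, installation should boil down to a single command (assuming the PyPI package name matches the project name; check PyPI to be sure):

pip install munin_plugins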

Change log:

  • packaged as an egg
  • removed symlinks with complex naming
  • only one sensor for all Nginx measurements
  • an extensible class for developing Nginx plugins
  • Plone sensors based on psutil (really hard to do)
  • multigraph support
  • a lot of caching

To submit issues, you can use the GitHub issue tracker.

Cheers.

Hi All,

These days I have been studying HA (high availability) for Plone. Some years ago I tested the same thing using Oracle in an enterprise context, so with licensed Oracle support. Oracle has a client called Instant Client, which hides any problem about the presence of more than one db server, so RelStorage makes no difference between a local db and a cluster. So it was very easy.

With PostgreSQL there's pgpool, a tool that does the same thing, but I wanted to take some time to study repmgr, a kind of wrapper around PostgreSQL Hot Standby.

The configuration is not complicated, but it is a little long and I don't want to repeat every step, so here is the guide on configuring repmgr and PostgreSQL:

https://github.com/2ndQuadrant/repmgr/blob/master/autofailover_quick_setup.rst

About this guide, I have to say a couple of words:

  • promote_command.sh is never shown, but it may be something like this:
#!/bin/bash
# Promote this standby to master, then restart PostgreSQL by hand,
# since pg_ctl may not be available on Debian (see the next point).
repmgr -f /etc/repmgr/repmgr.conf --verbose standby promote
/etc/init.d/postgresql restart
  • repmgr tries to use pg_ctl: on Debian this is not necessarily available, so after a clone or promote command (see above) it is necessary to restart PostgreSQL by hand, as in the sketch after this list
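For example, setting up the standby might look like this (a sketch based on the repmgr commands of that era; masterdb is a placeholder hostname and the connection options depend on your setup):

# On the standby node: clone the master's data, then restart by hand
# because pg_ctl may not be on the PATH on Debian.
repmgr -f /etc/repmgr/repmgr.conf --verbose standby clone masterdb
/etc/init.d/postgresql restart
repmgr -f /etc/repmgr/repmgr.conf --verbose standby register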

If everything goes well, we now have two db servers in master/slave mode using streaming replication. Now a few steps are necessary on the Plone side:

  • take a buildout for Plone: a new one or your favorite
  • locate the instance or instances (sections with the recipe plone.recipe.zope2instance)
  • add the RelStorage support as usual (be sure you get at least 1.5.1)
  • remove the ‘host=…’ part from the RelStorage DSN option
  • add the replica-conf option (see the RelStorage documentation)
  • write a file in the root of the buildout with the list of db server IPs, one per line
  • buildout and enjoy (a sketch of the result follows this list)
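Putting it together, the relevant piece of the buildout might look like this (a sketch: the db name, credentials and file name are placeholders, and the exact option spelling is in the RelStorage documentation):

[instance]
recipe = plone.recipe.zope2instance
# ... your usual instance options ...
rel-storage =
    type postgresql
    dsn dbname='plone' user='plone' password='secret'
    replica-conf ${buildout:directory}/replicas.conf

Note that the DSN carries no host=… part: the replicas file decides where to connect. And replicas.conf is just a plain list:

# replicas.conf: one db server address per line (example IPs)
192.168.1.10
192.168.1.20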

To get real HA, you have to build at least two db servers and two Plone servers with public IPs, plus a round-robin DNS (DNS servers are always in HA) to manage two or more IPs for the same domain, as sketched below. Removing every single point of failure is what gets you real HA.
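Round-robin DNS is nothing more than several A records for the same name (illustrative zone file lines, with documentation-range IPs):

; two A records for the same name: clients rotate between them
www.example.com.   300   IN   A   203.0.113.10
www.example.com.   300   IN   A   203.0.113.20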

Cheers

Hi all,

finally, after two weeks of debugging, here we are with the first release of my munin plugins, Nginx version.

Changelog

  • various parsing fixes
  • locking on cache files
  • sensor for monit downtime
  • moved bots from a single sensor to multiple sensors (one per file)

Requirements

Python 2.7 at least. Be sure to have python2.7 in the system path for every user.
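A quick sanity check (munin-node runs the plugins under its own user, so a PATH set only in your login shell is not enough):

# Confirm the interpreter is on the system-wide PATH:
which python2.7
python2.7 -V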

Install

Take a look at the INSTALL file.

Cheers.

Hi All,

on GitHub I released a new version of my munin plugins.

ChangeLog

  • added Bots sensor
  • refactored code
  • moved from Apache to Nginx
  • added an etc folder for configuration
  • added a cache for shared information (bots only in this release)

Here is the repo, for download and/or issues.

Requirements

It requires Nginx as the webserver, with a custom access log; see the README file for details.
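I won't repeat the exact format here; purely as an illustration, a custom log_format declaration in nginx.conf looks like this (the fields the plugins actually expect are listed in the README):

# Illustrative only: check the README for the required fields.
log_format custom '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_user_agent" $request_time';
access_log /opt/nginx/logs/access.log custom;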

Requires Python 2.7 because I use collections.Counter; read the docs if you don't know it.
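A one-liner to confirm your interpreter has it (Counter was added in 2.7):

python2.7 -c "from collections import Counter; print(Counter('abracadabra'))"
# → Counter({'a': 5, 'b': 2, 'r': 2, ...})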

Install

Download the zip or make a clone using git. Modify etc/env.py to set the file locations: by default I assume a self-compiled Nginx in /opt/nginx, but you can change it as you want.

After configuration, you can use generate.py to link the plugins into munin's plugins folder.

Remember to restart munin-node.
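On a Debian-style system that is simply:

/etc/init.d/munin-node restart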

Known bugs

  • generate.py appends the [runner_*] config every time you run it, so it gets duplicated

Cheers.

Nginx is the current choice when talking about webservers. It is easy to configure and can serve high levels of parallel requests. Apache, the long-loved webserver, has been losing fans… until now.

Google develops a very interesting module called mod_pagespeed, which transforms a site into an optimized site using the PageSpeed analyzer. A simple example would be images: every time you analyze a site, PageSpeed shows you how your images are bad, fat and bandwidth-hungry. This module performs the optimization on the images and keeps them in a local cache to serve further requests.

The site you have to read is https://developers.google.com/speed/pagespeed/mod.

Setup is very easy: use apt, rpm or what_you_like to install Apache. After that, you can download mod_pagespeed directly from the previous URL and install it.
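On a 64-bit Debian/Ubuntu box, that download-and-install step looked like this at the time (check Google's install page for the current package URLs):

wget https://dl-ssl.google.com/dl/linux/direct/mod-pagespeed-stable_current_amd64.deb
sudo dpkg -i mod-pagespeed-stable_current_amd64.deb
sudo apt-get -f install          # pull in any missing dependencies
sudo service apache2 restart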

Now you are ready to fight: start your backend, start Apache and test it. Usually I use the PageSpeed plugin for Firefox, because the one for Chromium doesn't work.

If you use Plone as the backend, many problems are easily fixed by installing plone.app.caching. Strong caching for static resources and no-cache for everything else may be a good way. Remember to enable gzip compression. Forget all about cache invalidation: it doesn't work, and I'm able to explain why. Contact me if you want to know.
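For the gzip part, one option is to let the Apache frontend do it; a minimal mod_deflate stanza might look like this (tune the MIME types to your site):

<IfModule mod_deflate.c>
    # Compress the usual text types; images are already compressed.
    AddOutputFilterByType DEFLATE text/html text/plain text/css application/javascript
</IfModule>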

After setting content expiration, the big problem is optimizing all the images: this link shows you how you can do that by cutting & pasting a few rows into the Apache config file.
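The rows in question boil down to enabling mod_pagespeed's image filters; an illustrative stanza (filter names as in the mod_pagespeed docs of that era):

ModPagespeed on
ModPagespeedEnableFilters rewrite_images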

My test was made using a production website that scores 84/100 with PageSpeed using Varnish as the frontend and Plone as the backend. I installed Apache instead of Varnish and applied the image optimization: the new score is 93/100.

A nice jump for 15 working minutes.

Secure SSD

Posted: 18/05/2012 in System Administration
http://www.tomshw.it/mobile/cont/news/ssd-che-si-auto-distrugge-cliccando-il-pulsante-rosso/37516/1.html (in Italian: an SSD that self-destructs when you click the red button)

Hi All,

Following the idea in a previous report about using AWS, this is a report after 3 years.

Differences? No, my opinion has not changed: a cloud solution is better than a physical server.

The main event/problem that happened during these last three years was the August 2011 power outage. During the blackout, a single zone was affected, with real downtime of around 48 hours. The recovery of the zone was completed in more than a week.

At first I thought that it was a very long time, but after some reflection I understood that this kind of problem reaches the limit of our technology. I remember some years ago when I worked on a physical server in a farm in Miami. One night a switch in the building started to burn, with all the consequences you can imagine. The farm was down for about 24 hours.

You can follow all the best practices you know, but there's always something unpredictable that breaks the eggs in your basket. So what is important? Having a plan B to recover everything.

An example: on a physical server you usually have a backup where you store the application data; only in a few situations do you back up all the files of a server. So restoring a server means reinstalling the OS and all user applications, and reconfiguring every service. This may take a lot of time.

On AWS, this is very fast: usually you have a static copy of your server called an AMI, which is different from the running copy of your server called an Instance. They are stored in two different stores located in two different zones: one in AWS's storage service (S3) and the other on the physical server. Creating an Instance from S3 takes less than 10 minutes, a big advantage compared to reinstalling a server. Creating an AMI from an Instance is easy too, but in this case it takes a little longer. What is really important is that you have all the instruments to do that, and you don't have to buy specific software or a special DAT resource.

Going back to the AWS August disaster: one zone was affected, putting down some services, but not completely. In my case I lost the RAID data disk but not the instance; others lost their instances. All the restore actions took less than two hours, mainly spent moving files from snapshots to volumes and restoring the Data.fs, the Plone/Zope store.

In the end, AWS confirms my ideas: the infrastructure is very powerful, and it requires less administration time compared to a physical server.

Cheers.