Sunday, October 11, 2015

Populating a PostgreSQL table / database

The other day I had to populate a table with hundreds of millions of records.
Here's how I managed to do it quite rapidly:

  1. Remove all the constraints and indexes from the table.
  2. Prepare your data in a CSV file that you put on the DB server.
  3. Open a console on the DB server: psql -d myDb
  4. And here comes the star of the show, the COPY statement:
    COPY myTable (theFirstColumn, theSecondColumn) FROM '/path/to/some/file.csv' WITH CSV;

    Append QUOTE AS '"' DELIMITER AS ',' ESCAPE AS '\' if required.
    The column list is not required if the columns in your file appear in the same order as in the table, but I would recommend against omitting it.
  5. Add the constraints and indexes

If you really have to use INSERT statements, at least put them in one big transaction.

Note that with the constraints removed, nothing will enforce NOT NULL until you add the constraints back!
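
To tie the steps together, here is a rough sketch of the whole run from the shell; the constraint and index names are made up, and only the COPY line is the exact one from step 4:

psql -d myDb <<'SQL'
ALTER TABLE myTable DROP CONSTRAINT IF EXISTS myTable_pkey;  -- step 1: drop constraints and indexes (names are hypothetical)
COPY myTable (theFirstColumn, theSecondColumn) FROM '/path/to/some/file.csv' WITH CSV;
ALTER TABLE myTable ADD PRIMARY KEY (theFirstColumn);        -- step 5: add them back
SQL

If you are stuck with INSERT statements, the same heredoc with BEGIN; and COMMIT; around them gives you the single big transaction mentioned above.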

Wednesday, October 7, 2015

OS X 10.11 El Capitan hangs when plugging a USB device in VMware Workstation 12.0

The other day I tried to "hot plug" my iPad into OS X running as a guest, which consistently resulted in the guest OS freezing / hanging, with no other solution than to reboot it.

I haven't found a real solution for this, nor do I really care to investigate the problem. My workaround is to connect it before OS X starts, and this way it's properly recognized.

A quick example about geofencing (sorry, "region monitoring") on iOS with Swift is coming up!

Monday, March 16, 2015

One more hour of sleep every night

Cheerful Sunset
(no copyright information)


It's 7 pm and the sky has this nice orange - red color.
But my computer screen does not. It's "shouting" a blue-ish tone at me. But not for long!

Everybody knows that watching screens (TV, computer or handheld devices) before going to bed is bad for your sleep. What you might not know is that if you do use screens at night, you'll sleep more and better if the screen is yellow - red instead of blue. Yes, blue! I won't start the blue/black gold/white dress war again, but your screen really is blue and you don't realize it: there is no such thing as white in nature, and color analysis is one of those very complex tasks your brain "computes", yet the result is subjective because your brain is not wired the same way as mine, so at some point you might see a gold dress where I see a blue one. So believe me, if you are looking at your screen right now, it's blue. If you think car headlights are white, put a car with regular headlights next to one with Xenon lamps: you will probably say one is yellow-ish and the other is blue-ish.
But that's the beauty of it. If your screen goes very slowly and smoothly from blue to yellow - red, you won't notice it.

So, about that one hour of sleep. There was a study I cannot find anymore, where they had people lying in bed doing nothing, and they measured how fast these people fell asleep. Then they repeated the experiment with a more yellow / red light, and found that people fell asleep faster with the second setup.

Redshift on Windows

I haven't tried it, but there is an experimental version for Windows. It does seem like f.lux is older but maybe it works better, I don't know. Tell me if you find out.

Installing and configuring Redshift on Kubuntu (or any Linux distribution really)

In this tutorial I'll assume you live in western Switzerland, and that you use Kubuntu.
The procedure is similar on other systems. I am also assuming you are not changing timezones all the time. (But there is a solution for you if you find yourself in this situation, look at the documentation.)

  1. Install Redshift:
    sudo apt-get install redshift
  2. Create and open the configuration file with your favorite editor:
    nano ~/.config/redshift.conf
  3. Paste this (Ctrl + Shift + V in Konsole) :
    [redshift]
    transition=1
    location-provider=manual
    adjustment-method=randr
    [manual]
    lat=46.7
    lon=7.1

    Change the last two lines with your latitude and longitude (yes, go ahead, click on this link). You can keep all digits if you want to. If your latitude reads South (Australia, New Zealand) or your longitude reads West (North America), use negative values where appropriate.

    If you can see "GNU nano" at the top left of the console window, press Ctrl+O to save the file and then Ctrl+X to close the editor when you are done.
  4. Start "redshift" from the Terminal to check if it works. Your screen should go a bit yellow in a matter of seconds if the sun is not up. Otherwise try to mess with the latitude and longitude or your computer clock. There should be no output on the console. (A quick one-shot test is also sketched after this list.)
  5. All well? Time to start Redshift automatically. Open the KDE menu and type "autostart". Select the entry that appears. Click "Add Program..." then type "redshift" (without quotes). Don't select anything just type "redshift" and click OK. Click OK again to close the window.
  6. Log out and in again. Your screen should be slightly yellow. Is it? Congratulations. You just bought yourself one hour of sleep each night.
Now I suggest you install a similar app like Twilight on your phone.
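
If you don't want to wait for the sun to set to check that the adjustment works at all, Redshift also has a one-shot mode (flags as I remember them; run redshift -h if they changed):

redshift -O 4500    # immediately force a warm color temperature
redshift -x         # reset the screen to its normal colors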

Thursday, March 5, 2015

Control Netflix from your phone

UPDATE: Teclado Flix has been discontinued.

Check out my new Android application Teclado Flix.



It's a remote control for people watching Netflix on their PC.
TVs from 2005 to 2014 feature HDMI inputs but cannot be connected to the Internet or run apps like Smart TVs can.
So what can Netflix fans with "old" TVs do? Just plug your laptop into the TV and watch Netflix.
But it would be silly to get up to pause and resume a video, so I made a remote control app to do that.

A computer program is required. I made a very tiny one that runs on any operating system.

Sunday, March 1, 2015

Netflix on Arch Linux (or Linux in general)

UPDATE: 
You don't need an unstable version of Chrome to play Netflix streamed media anymore. Simply download the latest stable version of Chrome.
Note that Chromium (exactly the same as Chrome, minus features such as syncing with a Google account and proprietary plugins) does not include the Widevine plugin and thus cannot play Netflix content.

Original article:

Netflix has thrown away its Microsoft Silverlight interface and is now using HTML5 to stream its content.

Well, almost. Not surprisingly, the videos are still encrypted and DRM'd. It is done using a plugin developed by Widevine Technologies (a Google company).
Chromium is the open-source part of the Google Chrome browser (everything but the Google branding, the syncing mechanism with the Google account and a few plugins, is open source). So nowadays you'll often find the Chromium browser on Linux distributions.

The Widevine plugin is not included in Chromium for licensing reasons, yet you would expect to be able to install it anyway. There seem to be people working on that, but I couldn't make it work.

What worked for me is this:

1) Install google-chrome-dev from AUR. The reason you need the unstable version is that you need a very recent version (42+) and that you need a Google-branded browser including the Widevine plugin.
yaourt google-chrome-dev
2) Launch it with the google-chrome-unstable command.
3) Open Netflix and check if it works.
4) If it doesn't work, make sure the Widevine plugin is installed and working. Type chrome://plugins and possibly chrome://components in the address bar and check it is enabled.


Also, check out this cool browser extension Flix Plus from Lifehacker. I find it very useful!

I have to say I am impressed by this company from a business and technological point of view. Did you know they release a lot of their code with an open source license? Did you know they don't manage any hardware but rely heavily on Amazon for their computing and storage needs?
There is a trend nowadays, except for a few giants like Facebook, for companies to transition from managing their own hardware and storage to having it hosted by platforms such as Amazon AWS, Heroku, GitHub, Bitbucket, Microsoft Azure and other online services.
In many IT companies, there's no server room anymore, except for switches and routers, as file servers and other tools are physically located outside the company in data centers.

Saturday, February 28, 2015

Arch Linux and Arduino with your favorite IDE

You can edit your Arduino code from any IDE and compile it / upload it without using the Arduino IDE. To do that, the ino tool comes to the rescue.

If you are using Arch Linux you might be interested to know I uploaded a package named ino-1.6-git on AUR. It has been patched to work with the latest version of the Arduino SDK.

Installation

yaourt ino-1.6-git
sudo gpasswd -a $USER tty
sudo gpasswd -a $USER uucp
sudo gpasswd -a $USER lock

Log out and in again, or reboot.

Quick Start:

cd my-arduino-project
ino init
$EDITOR src/sketch.ino
ino build
ino upload
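
If the defaults don't match your hardware, ino accepts the board model and serial port explicitly; the values below (an Uno on /dev/ttyACM0) are only an example:

ino build -m uno                   # -m selects the board model (ino list-models shows the choices)
ino upload -m uno -p /dev/ttyACM0  # -p selects the serial port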

Happy Arduino-ing!

Saturday, February 21, 2015

Remove audio from a video on Linux

If you are working on a computer vision project, there is a great chance that you have been recording videos with sound that you have no use for. To save space and processing power, you can strip the audio from the video just like this:

avconv -i input_video.mp4 -an output_video.mp4

(Replace avconv with ffmpeg if necessary.)

Just so you know, there is something dishonest about avconv: it makes you believe ffmpeg is dead or obsolete. In fact it's nothing like that. avconv is a fork of ffmpeg that has become the default on Ubuntu. So if you need a filter that isn't present in avconv, you have to download the latest version of ffmpeg from source or from a PPA. Read more on this subject here.
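
If you have a whole directory of recordings to process, a small shell loop around the same command does it (the _noaudio suffix is just a naming choice):

for f in *.mp4; do
    avconv -i "$f" -an "${f%.mp4}_noaudio.mp4"
done

If you don't want the video stream re-encoded in the process, adding -c copy before the output filename should keep it untouched.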

Rotate a video on Linux

When you record videos on your smartphone, the video itself is not rotated. The video recorder simply adds a tag telling the player how to rotate the video.
That's why mediainfo reports "Rotation: 180°" or other angles on videos recorded from my Samsung Galaxy S4.

Unfortunately unlike the video player on your phone, the vast majority of video players on desktop computers and TVs don't support this tag.

If the video is upside down:

avconv -i original_video.mp4 -vf "vflip,hflip" output_video.mp4
or
avconv -i original_video.mp4 -vf "transpose=2,transpose=2" output_video.mp4

(You can replace avconv with ffmpeg if avconv is not found on your system.)

Note that the first version is slightly faster (2-5 %).

If the video was recorded in portrait mode:

if the top of the image is on the left of the screen, it needs a +90° rotation:

avconv -i original_video.mp4 -vf "transpose=1" output_video.mp4

if the top of the image is on the right of the screen, it needs a -90° or +270° rotation:

avconv -i original_video.mp4 -vf "transpose=2" output_video.mp4
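
To find out which case you are in without playing the file, mediainfo (mentioned above) shows the rotation tag:

mediainfo original_video.mp4 | grep -i rotation    # prints the Rotation line, e.g. 90°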

Sunday, February 15, 2015

Improving FabulaTech USB over Network

I've been looking for a solution to share a USB device over the network for years. Unfortunately, as of today, no viable opensource solution exists. usbip is supposed to be that solution, but it's broken and I never managed to make it work on Ubuntu, despite the program being available in universe. The installation on Windows is a very tedious process, even more complicated IMO than the procedure below.

There are however commercial solutions. One of them is Fabulatech USB over Network. It will work on Windows, Mac and Linux.

On Linux, there is no graphical user interface, and they don't provide anything that would make our lives easier. Only a server program and a client program that can be invoked from the command-line.
That's why we'll have to write a daemon and a script to automatically share some devices after they are plugged in.

Step 0: Unzip the Linux tarball somewhere

Let's say this somewhere is /opt/ftusbnet.

Step 1: Start the daemon (ftusbnetd) at boot

Install supervisord: sudo apt-get install supervisor

Write a configuration file for supervisor by editing /etc/supervisor/conf.d/ftusbnetd.conf:

[program:ftusbnetd]
command=/opt/ftusbnet/sbin/ftusbnetd
autostart=true
autorestart=unexpected

Restart supervisord: sudo service supervisor restart
(You can use supervisorctl if you don't want to restart all the running services.)
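
With supervisorctl, the sequence looks roughly like this; reread/update pick up the new configuration file and status confirms the daemon is up:

sudo supervisorctl reread            # detect the new ftusbnetd.conf
sudo supervisorctl update            # start the newly added program
sudo supervisorctl status ftusbnetd  # should report RUNNING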

If you have never used supervisor before, you really should. Regular init scripts are still useful, but most of the time supervisor does the job much better, with only two lines.

UPDATE: supervisor 3.0b2-1 distributed with Ubuntu 14.04.1 LTS has a little bug. It won't start, after complaining that /var/log/supervisor doesn't exist.
You can either point the childlogdir=/var/log/supervisor line in /etc/supervisor/supervisord.conf at an existing directory, or create the directory before supervisord starts by modifying /etc/init.d/supervisor and adding the line mkdir /var/log/supervisor after case "$1" in start)

Step 2: Run a script when a USB device is connected

Read the previous blog post.
Let's assume the "add" script is /opt/ftusbnet/bin/udev-connect.

Usually, to share a device you would issue the ftusbnetctl command like this:
ftusbnetctl share 1234, where 1234 is an ID generated by ftusbnet that changes each time you plug the device in.
It would be nice if we could tell udev to run a script only after ftusbnetctl "sees" the USB device. Unfortunately we can't do that. The device won't be accessible until udev has run all the scripts we told it to run. We can't just "sleep" or start a background process.

The only workaround I found is to run the script 1 to 60 seconds after the USB device is processed by udev. So yes: You might have to wait up to a minute before the device is shared. 
For this we will use the at utility. Unfortunately, at is an old UNIX tool that only works at the granularity of minutes (or coarser units). Tell me if you find a better solution!

/opt/ftusbnet/bin/udev-connect contains this:

#!/bin/sh
echo "/opt/ftusbnet/bin/scansnap-connect" | at 'now + 1 minute'

In my case I am connecting a device that is shown as "ScanSnap" when executing ftusbnetctl list.
Make sure the script is executable.

Step 3: (Cont'd)

Here is the content of /opt/ftusbnet/bin/scansnap-connect:

#!/bin/sh

ftusbclient=/opt/ftusbnet/bin/ftusbnetctl

while ! $ftusbclient list | grep ScanSnap
do  
    sleep 1
done
sleep 1

ftid=$( ${ftusbclient} list | grep ScanSnap | awk '{print $1;}' )
logger -t ftusbnet-udev "Sharing device ID $ftid"
$ftusbclient share $ftid

Make sure the script is executable.

The second-to-last line will print a message to the syslog. If you never see that line, or the ID is truncated, you have a problem.

Ubuntu: run a script when a USB device is plugged in

There's a lot of stuff you can do with udev. One of them is running a script when a USB device is plugged in.

It's as simple as:

1. figure out the vendor ID and product ID with lsusb (see the example output after this list)
2. edit /etc/udev/rules.d/85-something.rules:

ACTION=="add", SUBSYSTEM=="usb", ATTR{idVendor}=="04c5", ATTR{idProduct}=="11a2", RUN+="/absolute/path/to/your/script/that/will/be/executed/as/root"

3. Write the script mentioned above and make sure it's executable.
4. Restart udev: service udev restart.
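
The two IDs show up after "ID" in the lsusb output, in vendor:product form. For the device used in the rule above (a Fujitsu scanner), the line looks roughly like this; bus and device numbers will differ:

lsusb
Bus 001 Device 004: ID 04c5:11a2 Fujitsu, Ltd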

You can do other things with udev such as managing /dev/ and the permissions on the device.
However you should not use udev to mount USB disks, there are other tools for that.

To run a script when detaching a USB device, replace ACTION=="add" with ACTION=="remove". You can put several lines in the same ".rules" file.


Two scripts for the Cisco VPN Client

I recently wrote a post about using openconnect instead of the Cisco VPN Client on Linux. If you really want to use the Cisco VPN Client but miss the handy vpnc and vpnd commands, this article is just for you.
It uses the expect tool from Don Libes, who wrote it in 1990 when he was working for the U.S. government. Therefore, Expect is in the public domain.

Before executing the commands and modifying the script, you should run /opt/cisco/anyconnect/bin/vpn to understand what is going on. The script tries to mimic a user waiting for the console to print things, and then type on the keyboard.

Here's your vpnc script:



And here is vpnd:

Friday, February 13, 2015

Speed up Windows updates

Apparently there's something you can do to speed up Windows updates:

1. Open a command-line prompt with elevated privileges (Win + S, search "prompt", right-click, Run as administrator)
2. sfc /scannow

The Windows System File Checker tool will detect corrupt system files, and some people have experienced a noticeable speed up of the Windows updates.

OpenVPN with two simple commands

In the previous blog post, I wrote about Cisco VPN. Now here is the same thing with OpenVPN.

It also contains a little trick I have to use because I can't manage to get the DNS configuration right after connecting. (I need to access the DNS server internal to the LAN I am connecting to.)

Here's the connect.sh script:
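
A minimal sketch of what such a connect script can look like, assuming the OpenVPN configuration is named client.conf and a ready-to-use resolv.conf sits in the same directory as the script:

#!/bin/sh
# sketch only: client.conf and a prepared resolv.conf live next to this script
cd "$(dirname "$0")"                        # so it can be run from any directory
sudo cp /etc/resolv.conf resolv.conf.orig   # keep the current DNS settings for later
sudo openvpn --config client.conf --daemon  # --daemon lets you close the terminal
sleep 5                                     # give the tunnel a moment to come up
sudo cp resolv.conf /etc/resolv.conf        # switch to the LAN's internal DNS server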



And here's the disconnect.sh script:
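
And a matching sketch for the disconnect script, stopping OpenVPN and restoring the saved DNS configuration:

#!/bin/sh
# sketch only: undoes what connect.sh did
cd "$(dirname "$0")"
sudo pkill openvpn
sudo cp resolv.conf.orig /etc/resolv.conf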



To make the scripts work, you have to put them in the same folder as your OpenVPN configuration file (here named "client.conf"). This directory must also contain a valid "resolv.conf" file that will be used while the VPN connection is up and running.

As in the previous article, you can execute connect.sh and then close the terminal.
I advise you to create aliases to these scripts. The scripts are made so that you can run them from any directory.

Cisco VPN on Linux, with free software

Many companies and schools use a VPN solution from Cisco.
While the client on Windows is fine, the Linux version does nasty stuff such as creating copies and aliases of the /etc/hosts file. Also, I don't like running closed-source software with root privileges.
The third reason not to use Cisco VPN Client on Linux: you can't manage it with the package manager.

There's an open source alternative called openconnect.

I've written two aliases:

1) vpnc initiates the session, then goes to the background. You can close the terminal.
2) vpnd disconnects you from the VPN network.

Put this in your ~/.bash_aliases or whatever:

alias vpnc='echo Your_Password | sudo openconnect vpn_server_address.domain.com --authgroup="AuthGroup as displayed on Cisco VPN client" --user=Your_Username --passwd-on-stdin --background'
alias vpnd='sudo pkill openconnect'

You can also put your password in a file, chmod 400 it, and use "cat my_file" instead of "echo Your_Password".
If you don't like my pkill solution, you can use openconnect's "--pid-file" argument.

There you have it. Your friends using Cisco VPN client will envy you so much. 4 characters and you are connected!

Wednesday, February 11, 2015

Google's No reCAPTCHA in PHP

Google bought reCAPTCHA in 2009, and recently they made their little revolution by replacing the usual (and in the end craaaaaaaazy) skewed texts with a simple checkbox, except in a few cases where you might still get a regular captcha or a small popup asking you to identify pictures of cats:


On my website I was still using the old library with PHP classes and functions. Now they provide a RESTful JSON web service to do the same. Unfortunately Google provides no example, and if you are using PHP on shared hosting you will run into problems if you try the methods commonly found on the Internet, StackOverflow included.

So here it is.

On the page displaying the captcha:


In the PHP script handling the form:



The public and private keys are provided by Google for each reCAPTCHA instance.

The Internet tells you to use the one-liner file_get_contents(url) to retrieve the JSON response; unfortunately on shared hosting this is not an option and you have to fall back to cURL instead.

The 1st generation

The story behind reCAPTCHA is an interesting one.
Originally the slogan was "Stop spam. Read books" because each day millions of captchas were solved to help digitize books where OCR technologies were unable to do so.

They had this system with two different OCR engines, and if neither of them was able to match a word against the dictionary, then the word would be presented to users. Answers from an OCR engine count for 0.5 point, and answers from humans for 1.0 point. When an image reaches 2.5 points for the same deciphered word, the word is accepted.
Words considered valid are then shown next to unknown words in order to tell humans and machines apart (this is called a Turing test). These valid words can be words that were decoded correctly by the two OCR engines or words resulting from the process described in the previous paragraph.

The New York Times Archive

At some point, the slogan disappeared because reCAPTCHA stopped reading books... Apparently it helped digitize 13 million articles from The New York Times.

The 2nd generation

On websites not using the new API demonstrated in the animated GIF above, the captcha image looks like this:

It is only today that I realized the number on the right comes straight from Google Street View, and made the connection between Google buying reCAPTCHA and these house numbers. So instead of helping culture, millions of people are now helping Google with its business and working for free for the company. (Yes, this blog is hosted by Google, I know ;-))

Like me, you are probably amazed that the folks at Google are able to automatically detect house numbers in all the images their cameras take, so you might ask yourself: is it really that complicated to identify a 3-digit house number?
No, it is not! But Google wants to make sure its OCR engine is right before filling its database with wrong information, so that it can provide its users accurate information and make more money.

The 3rd generation

It's great news for us humans: only having to click a checkbox instead of trying to decipher the impossibly complex CAPTCHAs found on download websites. Remember the one with the cats and dogs (and I am not talking about Microsoft's latest finding)?

Alternatives

TextCaptcha is another good alternative you should check out.

Good reads

At Stanford they studied how CAPTCHAs have become hard to solve for humans.

Last but not least, I couldn't resist posting this, a Turing test letting only computers in:



(Source: http://www.smbc-comics.com/?id=2999)

More ditching: OpenSSL

That thing must go in the trash, for real.
It turns out the Heartbleed vulnerability everybody was talking about in April 2014 was just the tip of the iceberg.
The reasons? The whole thing is buggy, very badly written, open to NSA backdoors, and, having tested it myself, the library (that is, the C interface) is impossible to use without losing one's temper...



Let's welcome LibreSSL, a fork of OpenSSL by the OpenBSD community. They have already developed a more user-friendly library (libtls) and they have been actively fixing the codebase since May.

LibreSSL has replaced OpenSSL in OpenBSD 5.6, released in November 2014. It is now production ready and can be trusted. [1]
In my opinion, people should move away from OpenSSL, and even more from Microsoft CryptoAPI which is closed-source. [2]

[1] http://www.openbsd.org/papers/eurobsdcon2014-libressl.html
[2] http://blog.cryptographyengineering.com/2013/09/on-nsa.html (An excellent blog for security lovers by the way! Love what this guy writes.)

Tuesday, February 10, 2015

Swisscom PLC device password Asoka PL7667-ETH

I have been looking to configure Swisscom's PLC / HomePlug AV device (Asoka PL7667-ETH) for quite some time now, but I couldn't find any manual mentioning the password of the web interface.

Here it is:

Username: admin
Password: welcome

(found in this document)

You might want to use the interface to:
  • switch the attached device on and off, thus reducing the electrical bill (I'll probably write a script for that ;-))
  • change the IP address and other settings

Monday, February 9, 2015

Python traps

Short post today: I love Python. Unfortunately, like all programming languages, it has some traps that you might fall into if you don't pay attention. Read this blog post for more information.

For instance

def myMethod(lst=[]):
    lst += 'a'
    return lst

doesn't do what you think it does. Indeed:

> myMethod()
['a']
> myMethod()
['a', 'a']
> myMethod()
['a', 'a', 'a']

Surprise!
For once, C++ is more user-friendly: it creates a new instance of the default parameter's value every time the function is called, which is not the case in Python. The usual Python fix is to default the parameter to None and create the empty list inside the function.

Downloading subtitles

Subliminal is a command-line utility written in Python that downloads subtitles for TV episodes or movies.

While media players such as Kodi (formerly known as XBMC) offer this feature, it's a painful process to pause the video, open the subtitles menu, download the subtitles file, wait until the download completes and so on.
I stream videos directly from my NAS, so that wasn't even an option: the smart TV is unable to store the subtitles file anywhere, and it has to be in the same location as the video.

With one little command, Subliminal fetches subtitles from various sources and finds what it thinks is the best subtitles track for the media files you throw at it.

Here's how to install and use it:

1. Try to find if you can install Subliminal directly from your package manager. If so, skip to step 4.
2. Install Python PIP.
3a. Recommended method: using virtualenv(wrapper): mkvirtualenv subtitles; pip install subliminal colorlog
3b. Alternate method: directly on the system: sudo pip install subliminal colorlog
4. Download the subtitles: subliminal -l en -s --color videoOrDirectory anotherVideoOrDirectory...
You can use other flags. These will download a subtitle track in English. The "-s" flag means the file will be named something.srt instead of something.en.srt. I use this option because some players are not able to associate the ".en.srt" subtitles file with the video. "--color" is just for eye candy. You can use "-v" for more messages.

To my knowledge subliminal won't browse directories recursively. You can achieve that effect with:
find . -type d -exec sh -c 'subliminal -l en -s  "{}"' \;

Subliminal is smart enough to detect if the file is a video, and if it needs subtitles.
I was confused by the message "No video to download subtitles for". It is printed if there is no video in the path(s) you provided, or if the video(s) already contain(s) a subtitles track. You can check that with mediainfo.

As with Kodi, you should follow certain naming conventions. It is best to include the full name of the show or movie in the title, and describe the episode as "S01E02" (Season 1 Episode 2) or "1x02". For best compatibility avoid spaces and non-ASCII characters. If you downloaded the video from the Internet, you should however try to keep the filename as it was, including the uploader's nickname and the video quality (HDTV, 720p, 1080p...) There might be a subtitles track for the exact video file you have.

Sunday, February 8, 2015

Ditch scp (SSH copy) and use rsync instead

That is a strong statement, I know.
You have been living happily, and using scp to transfer files between machines is a no-brainer. But here is the thing: HTTP, FTP, SMTP, and SSH are bad file transfer protocols. Why, you ask? Because they offer no guarantee that you have got yourself a carbon copy.

The solution? Checksums! And not MD5 or SHA1 which are both broken, but SHA256 or SHA512, both already available on your system (except for Windows...)

So you could sha256sum the file before transfer, transfer, then sha256sum the file after the transfer. But that's rather painful and you'd have to write a complicated script to automate this. There's a much simpler solution.

For some reason, people think rsync is a complicated tool. Maybe because it's got so many flags and options. But it's actually a very complete tool that does what you want without having to write complicated shell scripts.
Four amazing things it can do:

  • Syncing directories (great for backup), transferring only what needs to be transferred, deleting things that aren't present on one side if that's what you want
  • Resuming directory transfers where it left off, copying only what's needed
  • On-the-fly compression
  • Checksums
The last feature is a very interesting one, because you will know immediately if the transfer failed, whereas scp won't even tell you anything and might act like everything was fine. It's only 6 months later when you try to restore a backup that you realize all your data is lost forever...

So, replace this:

scp myfile.tar.gz remote-user@remote-host:/remote/directory/

with this:

rsync -avPe ssh myfile.tar.gz remote-user@remote-host:/remote/directory/

Of course you can transfer files the other way around. There is no additional flag needed for directories, but you have to understand the role that trailing slashes play.
Here I used the SSH protocol to perform the transfer, but there are other options, including the rsync protocol, which is commonly used to mirror distribution repositories across the world, amongst other things.
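
About those trailing slashes: without one, rsync copies the directory itself; with one, it copies the directory's contents. For example:

rsync -avPe ssh mydir  remote-user@remote-host:/backup/   # ends up as /backup/mydir/...
rsync -avPe ssh mydir/ remote-user@remote-host:/backup/   # contents land directly in /backup/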

Parallel Gzip with a progress bar

Everybody knows how to create a "tar.gz", probably using something like this:

tar zcvf foo.tar.gz bar-directory/

This is fine to compress something relatively small (a few megs).
However, if you have to archive hundreds of gigabytes, you probably want features such as:

  • Speeding things up
  • Displaying a progress bar
  • Displaying an ETA
This command* offers all of these:

tar cf - bar-directory/ -P | pv -s $(du -sb bar-directory/ | awk '{print $1}') | pigz > foo.tar.gz

You'll probably have to install pigz, which is a parallelized version of Gzip. You can substitute it with pbzip2 if you want a "tar.bz2" archive.

Supposing the CPU was the bottleneck and not the I/O, which is probably the case if you are working on the local filesystem, this will speed things up by a factor of "number of CPU cores". For instance, instead of 1 hour with gzip, it took only 15 minutes with pigz on my machine with 4 CPU cores.
In fact it was almost as fast as copying the directory without archiving / compression.

As a bonus you get a nice progress bar in your terminal:


*On OS X and *BSD, du uses another unit, use awk to convert the size into bytes.
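
To extract, the result is still a perfectly normal tar.gz, so either of these works:

tar zxvf foo.tar.gz
pigz -dc foo.tar.gz | tar xvf -    # -d decompress, -c write to standard output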

Friday, February 6, 2015

Anybody can look at your Instagram

Geek version: I just think everybody should know that Instagram uses unencrypted connections. Not only is it very easy to sniff the traffic it sends to or receives from its REST JSON web service, but the links to the pictures contained in the JSON messages are also publicly accessible to anyone with the right link...

Non-geek version: Anybody can look at your Instagram pictures, what's in your inbox, what you send and what you receive, especially if you use a company / school / shared WiFi network. Use a mobile connection or avoid using Instagram altogether.
Instagram automatically downloads data without your knowledge, so even if the app is not opened, people can look at your data.

Also, never use a password on more than one website (use a password manager instead) and use a code on your phone.

Saturday, January 31, 2015

Facebook reverse phone search is much more dangerous than you think

Facebook's got this dangerous feature allowing anybody to search for people using only their phone number.

Many companies are now taking advantage of this to associate a number with a name and other information from the profile. This is incredibly bad for your privacy.

Facebook's bad decisions

Facebook made two particularly bad decisions.
First they require a phone number for many of the site's functions.
Secondly, on your profile you can choose who can see your number. You can even select "Me only". But there is a different setting, which is enabled by default, that allows people to search for your profile using only your phone number.

There's a saying in software design that default settings should be good for most people. I don't think this is the case here. The problem is made even worse by the fact that Facebook is used by teenagers (and older people too) who are not aware of the consequences a lack of privacy can have on their lives, and their Facebook profile contains everything there is to know about them and that can be used against them.

On a larger scale...

Maybe you are thinking "Well, so someone knows a number and can find who owns it, what's so bad about it?" It would require a lot of time for people to look for the numbers*.
Then you are not aware of what can be done with computers:

  • Write a program to perform a brute-force search by trying every possible number there is out there, and build a database. Then sell this database.
  • You think Facebook would notice someone doing that kind of search? If they did, attackers would use different network paths for each connection, like it can be done with I2P.
  • I2P would be particularly slow though. Also, you would need a Facebook profile to do the search. So botnets would be used. People operating such networks have hundreds of thousands of "zombie" computers working for them (where supposedly a cookie on the computer allows them to perform the search), and these would use people's Facebook accounts to do the search. The attack could be done in minutes.
* Did you know there are people in India and elsewhere in the world currently solving CAPTCHAs by hand? Although there are advanced techniques to solve many kinds of CAPTCHAs, for the time and price it takes an engineer to write such a program, low-paid workers from poor countries will have solved millions of these stupid images.

Friday, January 30, 2015

Microsoft and Snapchat vulnerabilities got publicly exposed.

Here's a post to thank people who don't fear to publish information about security threats when the software companies don't care after they have been warned.

Microsoft for instance failed to correct serious issues within the 90 days after Google reported them through its Project Zero program. I mean, come on! Microsoft is a giant company. 2,160 hours is plenty of time to correct the bug.

Snapchat isn't a good player either.
For those who don't know, Snapchat is a social media service that mostly focuses on the exchange of so-called "snaps": photos or short videos that delete themselves upon viewing.

The recipient can only see the video or photo once and while holding a finger on the screen. The "snaps" get deleted afterwards.
Well, not quite. The truth is, the snaps are merely marked for deletion by the operating system: the files get renamed with a ".nomedia" extension and are only really removed from the file system later because, as you might have witnessed with the "Gallery" app, this operation is slow.

Many applications, which violate Snapchat's terms and conditions as well as Google Play's (and hence were removed from the store), did something very simple: copy the marked files elsewhere and rename them. Voilà, users could view the snaps as long as they wanted and re-share them.
To prevent this, Snapchat used some encryption.

If you know a thing or two about encryption, you might be aware that the term encryption refers to an infinite number of techniques trying to "hide" data in some way or another. For instance, reversing the letters from a text is considered encryption.
As this very interesting and very well written article from GibsonSec will tell you, Snapchat uses AES/CBC with a single symmetric key. The decryption function in Python is only 8 "instructions" long, including two requests to a web service.

Snapchat's founder said he doesn't care about security; he wants his users to have fun with the app. That's obviously something a product design major would say, not an engineer.
Because the truth is that if they want their service to exist in the future and make money, they should take this issue very seriously. If people can cheat and anybody can save the snaps by downloading an app publicly available on the Play store, their whole business idea goes to the bottom of the sea.

Even if Google removes the apps from the store, Android users, hopefully, are free to download and install what they want on their device. So simply removing apps exploiting the encryption weakness is useless.

Snapchat played its cards very badly, as users got their credentials stolen because the company still considers security a minor issue...
There are plenty of websites featuring stolen photos and videos from Snapchat. What did the company say? It's because users installed third-party apps. Boo hoo. If the government did the same, who would you blame? The citizens, the hackers? Nope. You'd blame the government. So you should blame Snapchat.

Snapchat doesn't give a **** about your privacy.

Wednesday, January 28, 2015

ownCloud Client asking for password when starting a KDE session

UPDATE: On another machine I have, the fix described below is not enough. You have to make sure that the KWallet daemon is enabled for the current user. Enable with: System Settings => Account details => KWallet => Enable

The ownCloud client is integrated with KDE's password manager (KWallet).
When it starts, it checks if the kwallet is unlocked and looks for the password.

Yeah, that's how it should be, and that is very good software design. Unfortunately it has never worked, and the ownCloud client asks for the password every single time!

In the latest version the problem is actually explained in the dialog ("No keychain available"):



In my case, which seems to be the default, the problem is that kwalletd is not running when the ownCloud client starts. It's a timing issue.

My solution :


1. Open KDE's "Autostart" control module (a quick search from the start menu will get you there)
2. Change the ownCloud entry to point to a script of your own.
3. Click OK.
4. Write a small shell script to work around the timing issue (a sketch is given after this list).



5. Make it executable.
6. Reboot (or kill kwalletd and try your script. Note that logging out doesn't kill the daemon.)
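
The script itself can be as small as the sketch below; kwalletd and owncloud are the binary names on my machine, adjust them if yours differ:

#!/bin/sh
# sketch: make sure the wallet daemon is up before the ownCloud client asks for the password
kwalletd &     # start the wallet daemon (it may complain if one is already running)
sleep 10       # crude answer to the timing issue described above
owncloud &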

Of course if your KWallet is protected by a password, then you will be asked to provide it.

VBoxManage made simple

As you might know, VirtualBox machines can be managed from the command-line. However, I find the syntax of the VBoxManage command-line utility cumbersome and hard to remember.

Copy this in .bash_aliases:
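
Something along these lines should do the trick; it is a sketch using bash functions (they take the VM name as an argument) plus a simple completion list, and it breaks on VM names containing spaces:

# VirtualBox helpers for ~/.bash_aliases (sketch)
vm-start()       { VBoxManage startvm "$1" --type headless; }
vm-savestate()   { VBoxManage controlvm "$1" savestate; }
vm-powerbutton() { VBoxManage controlvm "$1" acpipowerbutton; }
vm-poweroff()    {
    read -p "Really pull the plug on $1? [y/N] " answer
    [ "$answer" = "y" ] && VBoxManage controlvm "$1" poweroff
}
vm-running()     { VBoxManage list runningvms; }
# autocompletion built from the list of registered VMs (re-source this file after adding or removing a VM)
complete -W "$(VBoxManage list vms | awk -F'"' '{print $2}')" vm-start vm-savestate vm-powerbutton vm-poweroff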



Now you can do the following:

vm-start starts a VM in headless mode
vm-savestate suspends a VM to disk
vm-powerbutton simulates a press on the power button
vm-poweroff simulates unplugging the power cord (some OS don't shutdown completely)
vm-running lists the running VMs

Autocompletion is enabled and dangerous actions will ask for confirmation.

(You will need to re-login or "source" .bash_aliases whenever you add / remove a VM.)
(The script doesn't support VM names with spaces.)

Sunday, January 25, 2015

Backing up your data with Amazon S3 Glacier and rsnapshot. A complete guide, Part 4.

Aletsch Glacier in Switzerland
Photo from John Fowler

Amazon Glacier

Here we go! In Part I I wrote quite a lot about Glacier, now is the time to get our hands dirty.


Create an AWS account and a vault

Before starting you need an Amazon Web Services account. I realize there is no point in showing you screenshots or explaining the following process in detail because it is well documented by Amazon and will probably be outdated by the time you read this. Using the web management console:

  1. Create an AWS account. It will be referred by Amazon as the "root account".
  2. Create a group with permissions for Amazon Glacier.
  3. Create a user (IAM) and add it to the group you just created. Write down the access key and the secret key. We will need them in a minute.
  4. Switch to the datacenter you want your data to be stored in.
  5. Create a new vault in Glacier. Name it appropriately, for instance "yourHostname_backup".
  6. Configure the appropriate limit for the vault in the "Settings" panel from the Glacier page.

Glacier-Cmd

Amazon doesn't develop any command-line or graphical client. All they offer are wrappers in all languages for their REST API. The Java and .NET APIs offer high-level features that the others do not. But still, everybody needs to upload and download archives, so some people developed interfaces. One of them is Glacier-Cmd.

As a regular user:

git clone https://github.com/uskudnik/amazon-glacier-cmd-interface.git
cd amazon-glacier-cmd-interface
sudo python setup.py install

At the time of writing there was a pending patch to support "eu-central-1", Amazon's latest datacenter located in Frankfurt am Main, Germany, Central Europe.

With the user that will send the archives to Amazon Glacier:

Edit ~/.glacier-cmd:

[aws]
access_key=YOUR_ACCESS_KEY
secret_key=YOUR_SECRET_KEY

[glacier]
region=eu-central-1
#logfile=~/glacier-cmd.log
#loglevel=INFO
#output=print

Change the keys and region accordingly. The rest is optional. Attention! The region must match the datacenter you chose in the AWS web console when creating the vault.
The logging options don't seem to work.

Verify this is working:

glacier-cmd lsvault

By the way, you can create vaults with glacier-cmd.
The command-line is badly documented. Look at this page instead.

To upload an archive:
glacier-cmd -c ~/.glacier-cmd  upload --description "A description of the archive" your_vault_name archiveName.tar

Do I need to tell you to run this with Task Spooler or Screen?

I am not sure Glacier Cmd completely supports resuming. But in case you get a timeout, try this:
  1. glacier-cmd listmultiparts your_vault_name
  2. Copy the upload ID
  3. Retry the upload with "--resume --uploadid the_copied_upload_id".
The CLI says it resumes something even though the documentation says it doesn't support the feature, so I'm a bit lost. Maybe because the doc is 2 years old...
See also this solution in case of timeouts. (In short: while true; do glacier-cmd ... --resume --partid 1234 --partsize multiple_of_two; sleep 600; done)

Alternative: Glacier CLI. It also offers an interesting mechanism to work with git-annex.
Alternative on Windows and Mac: I love CyberDuck. It might well be the most user-friendly GUI for many storage providers including S3 Glacier.


Things to know

  • Upload is free
  • Download is not
  • Asking for the "inventory" takes time.
  • Asking for a download link takes at least 4 hours.
  • Download links are available for 24 hours. (Does it mean we have 24 hours to download the entire archive?)
  • It takes one day before deleted archives are not listed anymore in the "inventory".
  • Started but failed uploads are kept by Amazon, you must either resume them (see above) or remove them (glacier-cmd abortmultipart)

Automating

The next step is to automate the upload and the deletion of the archives created from rsnapshot. 
Remember you can only delete an archive for free 90 days after it has been uploaded.
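
A rough sketch of such an upload job, reusing the commands from this series (the vault name, GPG recipient and paths are the placeholder values used earlier); it could be run from cron right after the rsnapshot jobs:

#!/bin/sh
# sketch: archive, encrypt and upload the most recent daily snapshot
VAULT=yourHostname_backup
SNAPSHOT=/var/cache/rsnapshot/daily.0
NAME=backup-$(date +%Y-%m-%d)
tar cf - "$SNAPSHOT" | gpg -e -r NameOrEmail > "/tmp/$NAME.tar.gpg"
glacier-cmd upload --description "$NAME" "$VAULT" "/tmp/$NAME.tar.gpg" && rm "/tmp/$NAME.tar.gpg"
# deleting the archive that has now passed the 90-day mark is not shown: it requires the archive ID
# returned by glacier-cmd, which you have to record somewhere yourself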

Last words: Testing

Voilà. We are done. One last piece of advice: test your backups. To do this, create a small directory, for instance with a single "Hello World" text file, and configure rsnapshot to run a script that will create an archive, encrypt it and send it to S3 Glacier. Then download the archive with Glacier-Cmd or another tool (try one of the GUIs like SAGU), decrypt it, extract it and see if you can retrieve the "Hello World" text.

Backing up your data with Amazon S3 Glacier and rsnapshot. A complete guide, Part 3.


(Part I is here)

Remember when I told you it was a bad idea to back up files automatically with rsnapshot if they never change (e.g. photos from 5 years ago)? Still, you want to make one backup of these files.
That's what we'll do now. We'll create archives manually and encrypt them.

Creating archives from the rsnapshot folders and encrypting them is left as an exercise. It should be easy if you read the guide.

Creating an archive

The most simple way to create an archive of a folder is this:
tar cf somename.tar folderPath/

Then you could run "watch du somename.tar -h" to see how it progresses.
Fortunately, there is a (more complicated) way to get a progress bar!
tar cf - folderPath/ -P | pv -s $(du -sb folderPath/ | awk '{print $1}') | cat > somename.tar

This will display a nice progress bar. I suggest the use of screen for long-running jobs. You can also try Task Spooler, a barely known, badly documented yet very useful tool!
Note that the two commands above create an uncompressed archive. That's what you want if you are creating a backup of your music library, images, videos, ZIP files, ...

If you want compression:
tar zcf somename.tar.gz folderPath/
for a GZIP compressed file. It's slow because it uses one thread. There are multithreaded implementations of GZIP and BZIP2 (namely pigz and pbzip2) that will linearly accelerate the compression depending on the number of CPU cores.

With the progress bar:
tar cf - folderPath/ -P | pv -s $(du -sb folderPath/ | awk '{print $1}') | gzip > somename.tar.gz

Note: Instead of TAR you might want to take a look at DAR. It offers many useful features.
Note 2: I've found that with default options filesizes go like this: uncompressed > gzip > bzip2 > xz (~ 7zip).
Note 3: Here and in general: avoid ZIP, RAR and DMG files. Everything TAR based is 100 % open-source while these are not, or might get you in trouble. Also tar+gzip+bzip2 are available on all UN*X after the first boot.

Encryption

OpenSSL is able to do some compression but it's not meant for large files, so we can't use it. We are left with S/MIME and PGP. Here we will use GnuPG / GPG, an alternative to the original PGP (Privacy Guard / Pretty Good Privacy) proprietary software.

First we'll need to create a private and public key. I won't explain how PGP works, nor RSA or ElGamal... There are plenty of GUIs for all operating systems to create the keys, but you can create them with the command-line, as explained in this online guide:
gpg --gen-key

Make several copies of your private and public keys. Use a strong password. Whenever possible use 400 permissions for the private key. You can give your public key to anyone; in fact most people put theirs on a key server so other people can find the public key they need to send encrypted e-mails.
PGP is great to encrypt files or e-mails so only designated recipients can read them. In this case, the entity encrypting the file and the recipient are the same person. Let's encrypt our archive:
gpg --encrypt --recipient 'Your Name or e-mail here' archivename.tar
or even shorter:
gpg -e -r NameOrEmail archivename.tar

You'll end up with a new file named "archivename.tar.gpg", encrypted. You can now delete the unencrypted version. 
Exercise: combine the archive creation, compression and encryption using pipes. Yes you can.
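
One possible answer to that exercise, combining the progress bar, pigz compression and GPG encryption from this post into a single pipe:

tar cf - folderPath/ -P | pv -s $(du -sb folderPath/ | awk '{print $1}') | pigz | gpg -e -r NameOrEmail > somename.tar.gz.gpg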

I would advise checking this page from NASA, which explains how to use the AES256 cipher algorithm and the compression flags of GPG.

Backing up your data with Amazon S3 Glacier and rsnapshot. A complete guide, Part 2.


(Part I is here)

Let's get our hands dirty!
It's time to make automated backups with rsnapshot.

Remember rsnapshot lets you access full backups while minimizing space and letting you access older versions of the files.

Install rsnapshot

Step 1 is to install rsnapshot on your system.

Configure rsnapshot

rsnapshot can be configured to store files over the network and do pretty complicated stuff. It is in fact just a layer written in Perl on top of rsync and other common Linux commands.
The configuration file /etc/rsnapshot.conf will tell you plenty on how to configure the program. I just want you to pay attention to these points that are not that clear in tutorials and hard to find in the documentation:
  • Use TABS, not spaces. If, like me, your default in Vim is to replace tabs with spaces, you can temporarily disable this behavior for the current session (or file?) by typing ":set noexpandtab". The file may look oddly aligned when you "cat" it; that's expected.
  • Folder paths must end with a slash. Always.
  • Look at the rsync man page for the exclusion patterns you can use.
  • The retain lines should be read like below. Do not try to interpret it otherwise, it would be wrong.

    retain hourly  4


    Keep only the four most recent versions of the job named "hourly". Only a few people know this but "hourly" doesn't mean anything for rsnapshot. You could replace it with "darkvader" if you wanted to.
    Here are incorrect ways to read the "retain" lines:
    "4" is not the number of times per hour the backup must be done.
    "hourly 0.5" doesn't mean the job will be executed every two days.
  • The retain lines must be declared from the most to the least frequent. So: hourly, daily, weekly, monthly, yearly.
  • Again, the job name (e.g. "daily") doesn't mean anything. You can remove any of them. For instance you could have it configured to keep the last 4 "hourly" jobs and the last 2 "monthly" jobs without mentioning "daily" and "weekly".
  • I repeat for the third time: the job name has no meaning. So if you put "daily" before "hourly", then the folders named "daily" will actually contain the "hourly" backups.

Rsnapshot will create the output folder if it doesn't exist. On Debian, the default path is /var/cache/rsnapshot. The folder will be owned by root and inaccessible to anyone else.
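
To make those points concrete, here is a minimal sketch of what /etc/rsnapshot.conf can look like; the paths and retain counts are just examples:

# tabs, not spaces, between the fields below; paths end with a slash
snapshot_root	/var/cache/rsnapshot/
retain	hourly	4
retain	daily	7
retain	weekly	4
backup	/home/	localhost/
backup	/etc/	localhost/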

First run

The very first time, invoke rsnapshot manually as root from the command line (preferably with screen) in verbose mode and see what happens:

rsnapshot -v hourly
where "hourly" is the name of the first retain job in the configuration. The very first run will take much longer than all the other afterward because it has to make all the copies. The next runs are faster because only the modified files get copied.

Schedule rsnapshot to run every hour / day / week / month ...


If all went well, you can now create a few cron tasks to run rsnapshot automatically. Type "crontab -e" as root and enter something like this (I will explain it below):

# m h  dom mon dow   command
  0 1,7,13,19 * * * /usr/bin/rsnapshot hourly
  0 2  *   *   *    /usr/bin/rsnapshot daily
  0 6  *    *   1   /usr/bin/rsnapshot weekly
  0 11 1    *   *   /usr/bin/rsnapshot monthly

Quit the crontab editor.

hourly: I could have used "*/6" in the hour field to run the "hourly" backup every 6 hours, but I didn't want one of the runs to fall between midnight and 1 am because I know there are other cron jobs scheduled at that time, so I shifted the schedule to 1, 7, 13 and 19.
If you are keeping the last 4 "hourly" backups you probably want to make one every 6 hours. Does that make sense to you?

daily: There is one big risk with these cronjobs. It is that the hourly cronjob is not finished when you schedule the daily cronjob. In that case, the daily cronjob will be cancelled. I am pretty sure you can configure rsnapshot to run two jobs in parallel but I would advise against that. The best bet is to keep enough time for the "hourly" job to complete.

weekly: Same remark. Funny story: the value of "dow" can be [0 - 7]. Both "0" and "7" designate Sunday, for portability reasons. Here "1" is for Monday. (In a corporate environment you should probably run the weekly job on the weekend.) In my case the job runs every Monday at 6 am.

monthly: Same remark regarding the hour (not too close from the other jobs). In my case the monthly job runs every 1st day of the month at 11 am. 

Trick question: How can you schedule a backup to run every 3 days instead of one and keep all of the backups from the past month? You must keep the daily and weekly backups.

In /etc/rsnapshot.conf:
retain everyotherday 10 
where "everyotherday" could be "gogglydoe", and 10 is 30 days divided by 3 days.
The line must go between "daily" and "weekly".

In the crontab: 
# m h  dom mon dow   command
  0 0  */3 *   *    /usr/bin/rsnapshot everyotherday

Enjoy the power of full backups

You know what's nice with full backups (or kind of, as rsnapshot uses hard links to avoid duplication) ?
You can browse the backup folders in /var/cache/rsnapshot just like the "live" folders!

Continue to Part III

Backing up your data with Amazon S3 Glacier and rsnapshot. A complete guide, Part 1.


In this first part I'll tell you when to consider Amazon Glacier or not, compare full backups to incremental backups, and explain why you shouldn't "put all files in the same basket".

When to consider Glacier, and when not

Glacier is a great storage solution offered by Amazon for about $0.012 per GB, supposing :

  • You want something cheap but reliable ;
  • You understand that by "Glacier" Amazon means that your files are frozen, it takes a while to get to the glacier and heat up your data so you can retrieve it ;-)
  • You almost never need to access the data from the server (doing so will cost you something, and you will have to wait about 4 hours before getting a download link) ;
  • You already have some primary backup storage (a second disk will do) where you can restore data immediately if needed ;
  • You understand that Glacier is only meant to protect your data in case of fire or other major events, not simply to restore a file deleted by mistake on the "live" storage ;
  • You don't plan to delete your files less than 90 days after uploading them (otherwise it will cost something) ;
  • You are OK with the principle of storing and retrieving archives instead of single files.

With these considerations in mind, if the delay (~4 hours) to retrieve your data is unacceptable you are looking at the wrong product, try regular Amazon S3 storage. It costs 3 times as much but it's no slower than downloading this web page.
In fact there are plenty of use cases where Amazon Glacier is not the right solution except if you are willing to accept its limitations.

Full backups and incremental backups explained

If you copy a folder with all its attributes and links (cp -a src dest), you are doing a full backup. If the source folder is 100 GB and you want to keep the backups of the last 7 days, you will need 700 GB of storage, and each copy will take 20 to 25 minutes. If you have 1 TB, we are talking about 3 to 4 hours!

The nice thing about full backups is that you can browse the backup just like you would with the "live" copy because it's a plain old regular folder! There is no need to extract archives or to use the backup solution's command-line client.

But as you can see, full backups use a lot of storage and are not particularly quick. The alternative is incremental backups. Instead of making a whole copy of the source folder, you only do it the first time. The next time only the differences get saved. So if you add one character to a text file and that's all you did, the second backup is only 1 byte (I am simplifying but you get the idea). The technical term to describe this would be a "delta".
A good command-line program to make incremental backups is rdiff-backup. 
One big flaw of this system is that you can't access the files directly, because the complete content of a file is split across backups. You will need to rebuild it from all the small pieces.
What people who rely on incremental backups usually do is create a full backup every other week or so to mitigate the problem.

My personal preference is rsnapshot. It's probably the best of both worlds. It gives you full backup-like folders while saving only the files that changed. So yes, if you change one byte in the file, a complete copy is made. That's the price to pay.

The little magic trick that rsnapshot uses is hard links. You see, when you list the content of a folder or you type rm somefile, you are only dealing with a symbolic name pointing to a record (also named "inode") on the file system (the inode contains all sorts of metadata, but not the name). This means two things: not only is nothing erased from the disk when you ask to "remove a file", but you can also have two filenames pointing at the same content on the disk. This principle is known as a "hard link".
A "symbolic link", on the other hand, is the UN*X equivalent of shortcuts on Microsoft Windows. The "shortcut filename" points to a link-type inode which points to the real inode. If the real inode is marked as removed, the shortcut gets broken.
This means that rsnapshot never stores the same file with the same content more than once and explains why the very first time you run rsnapshot it takes much longer than say the exact same command run one hour later. That's why it is advised to run the first backup manually instead of letting cron do it, so you can verify it works like it should and because it takes a long time.
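
You can see the difference for yourself in a couple of commands (the file names are arbitrary):

echo hello > original.txt
ln original.txt hardcopy.txt        # hard link: a second name for the same inode, no extra space used
ln -s original.txt shortcut.txt     # symbolic link: a shortcut pointing at the name
ls -li original.txt hardcopy.txt shortcut.txt    # the first two show the same inode number
rm original.txt                     # hardcopy.txt still opens fine, shortcut.txt is now broken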

The dilemma

There is one problem with rsnapshot. If you make an archive of the last folder which is supposed to be only a few megabytes bigger than the folder from an hour ago, you end up with the full 100 GB backup. You can send it on Glacier and it will be great because when the time comes, you'll get a full copy requiring almost no more work than extracting it.
The bad news: you will need to pay to store the same file again and again.

Incremental backups are much less practical to store on Glacier. First you have to keep a log of some sort to know when you stored the first version of the file and where are all deltas you need to build the version of the file you are interested in. This is very complex and cannot be done by hand.

Not all files are born equal

I have 200 GB of data to back up. But here's the thing: you are probably like me, and 90 % of it is made of files that take a lot of space and never change, such as photos and videos. Incremental backups are useless for that kind of file.
You must be very picky when choosing the folders you want to back up automatically.
This way you don't make useless copies of files you know will never change, and you reduce your costs.

Stuff you are working on gets royal privileges

I've got two requirements regarding files related to projects I am currently working on: there must be at least two copies accessible immediately and they must be synchronized as often as possible.
This can be achieved with version control systems such as Git if you are working on code, or with Dropbox, Copy, Box, OwnCloud, ... for everything else.
If anything happens to my laptop, I can open a browser on another computer and access my files in less than a minute.
You think that's excessive? Imagine you are in a rush and have only a few (dozen) minutes to print a paper, a Master's thesis, the e-ticket for your flight in 3 hours, or the PowerPoint presentation that begins in 10 minutes...

There's a rule of thumb in the storage world:
The more often the data needs to be accessed, the faster the retrieval, and the higher the cost.

You should still save these files in the "slow backup system", because you shouldn't trust Dropbox and the like to keep multiple copies of your files in several locations, and they usually delete old versions after a few months.

Continue to Part II

Saturday, January 24, 2015

Linux firewall and DLNA server


MediaTomb is a DLNA server for Linux that is great for streaming movies and music from a NAS or other network storage to a TV or any compatible device.

The server magically appears on the TV, and from there you can browse the disk for media. Its one big flaw: there is no authentication. Anybody on the network can not only see that you have a DLNA server running, but also watch all your content.

There are many tutorials out there explaining how to set up MediaTomb (which is simple thanks to its Web interface and a single change in the XML configuration to enable some authentication on the web page).

What you don't find is how to prevent people from seeing the DLNA server and watching the content.
This can be done easily, supposing the IP address of the client (such as a TV) never changes.

Simply add the appropriate rules in Netfilter to allow the one client to access the server, and block traffic for everyone else:

In my configuration, the default policy for the INPUT chain is DROP. Rule number 3 allows anybody on the network to access services on the server (which is not that secure, but well...). You can display the rule numbers by typing iptables -vnL --line-numbers.

To let only 172.16.0.123 access MediaTomb, match on the source address of the incoming packets:

iptables -I INPUT 3 -i eth1 -p udp --dport 1900 -s 172.16.0.123 -j ACCEPT
iptables -I INPUT 4 -i eth1 -p tcp --dport 1900 -s 172.16.0.123 -j ACCEPT
iptables -I INPUT 5 -i eth1 -p tcp --dport 1900 -j DROP
iptables -I INPUT 6 -i eth1 -p udp --dport 1900 -j DROP

You can do the same for the web configuration interface, but I didn't bother because a username / password can be set there. I leave this as an exercise for you. (The IP address to allow will be your computer's.)

Note: there is probably a way to specify a port for both TCP and UDP in a single rule, but I couldn't find it.
Also, eth1 is my LAN network interface. On this interface everything except DLNA is accepted. On the WAN interface, accepted traffic is the exception, so there was no need to write new rules there.

DHCP subnet based on vendor in MAC address


As a network administrator you are probably doing some network segmentation, where you have internal servers in one subnet, IP phones in another, and so on.
You should probably use VLANs if you don't want these devices to "see" each other. But in other cases you only need to put them in a separate subnet and/or dynamically assign them particular IP addresses.

As you might know, the first half of the MAC address (the OUI) identifies the hardware vendor. If you are running a DHCP server such as the ISC DHCP Server on *nix, you can have devices from one vendor use a particular subnet / IP range.

Edit /etc/dhcp/dhcpd.conf:
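
Something along these lines should do it (a sketch: the OUI 00:04:f2 is a Polycom prefix used here as an example, and the subnet and ranges are placeholders to adapt):

class "ip-phones" {
    # the first byte of "hardware" is the type (1 = ethernet), bytes 2-4 are the vendor OUI
    match if substring(hardware, 1, 3) = 00:04:f2;
}

subnet 192.168.10.0 netmask 255.255.255.0 {
    option routers 192.168.10.1;

    pool {
        allow members of "ip-phones";
        range 192.168.10.100 192.168.10.149;
    }
    pool {
        deny members of "ip-phones";
        range 192.168.10.150 192.168.10.199;
    }
}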

Watch a folder and send files by e-mail


My printer is able to convert faxes to PDF documents and save them in a network folder.
From there I could do many things, but I need a way to watch for new documents in that folder.

But first, let's set up the Windows / Samba network folder.

Shared folder with Samba
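
A share definition along these lines does the job (a sketch: the share name is an example, the path matches the incron example further down, and the group is the one created below):

[IncomingFax]
    comment = Faxes received by the printer
    path = /srv/shr/IncomingFax
    browseable = yes
    writeable = yes
    valid users = @sambausers
    create mask = 0770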



The comment is shown in the file explorer. browseable controls whether the folder is listed, and writeable is needed to write and (more interestingly in this case) delete files.
valid users can contain users or groups (prefixed with @) that have permission to read and (if applicable) write to the folder.
create mask is required because the folder is meant to be shared with a group instead of belonging to a single user. Use 0750 if you only want the creator of a file to be able to remove it.

I created a sambausers group on my machine and put the appropriate users in it. Remember you need a system account for each user, and that you need to set each user's Samba password with smbpasswd. By the way, you can back Samba with a database or LDAP if you like.

You need to create a new directory, make the sambausers group own it, and give the group read / write / execute permissions (chmod g+rwx) on it. On a directory, the execute permission lets you enter it and reach its contents; combined with the write permission, it lets you create files inside.
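
Something like this (the path matches the incron example further down; adjust to taste):

sudo mkdir -p /srv/shr/IncomingFax
sudo chgrp sambausers /srv/shr/IncomingFax
sudo chmod g+rwx /srv/shr/IncomingFax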

Restart Samba. Check you can access the folder from your file manager with the appropriate credentials.

Watch directories on your file system

One great way to do this is the incron daemon. It lets you set up cron-like tasks that execute a shell script or any executable program whenever a change is detected in a folder.

Install incron on your system.

Log in as a user who has permission to read files from the shared folder on your Linux box. You can use sudo -i -u theUser.

Open man incrontab in one terminal window.
In another window, fire up incrontab -e to edit the tasks associated with the user.
I have mine configured with /srv/shr/IncomingFax IN_CREATE /home/myuser/somescript.sh
You can watch for other events, just read the manual you just opened!

Now, every time a file is created in the IncomingFax folder, the script will be executed.

E-mail the file just added

Here is an example of the shell script that I use. It might not be the smartest way to do what it does (particularly because the information about which file was added is lost in the process, because of how incron works).
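
Roughly, it looks like this (a sketch: the paths, the recipient address and the message text are placeholders, and the body is deliberately accent-free French, see the note below):

#!/bin/bash
# Sketch of the fax-forwarding script; adjust paths and addresses.
WATCH_DIR=/srv/shr/IncomingFax
STATE_FILE="$HOME/.lastfile"
LOG_FILE="$HOME/faxmail.log"
RECIPIENT=someone@example.com

# Give the printer a few seconds to finish writing the file (see the limitations below).
sleep 5

# Pick the most recent PDF; the filesystem is case-sensitive, so ".PDF" is not matched.
NEWFILE=$(ls -t "$WATCH_DIR"/*.pdf 2>/dev/null | head -n 1)
[ -z "$NEWFILE" ] && exit 0

# Skip the file if it was already sent (incron sometimes fires several times for the same document).
if [ -f "$STATE_FILE" ] && [ "$(cat "$STATE_FILE")" = "$NEWFILE" ]; then
    exit 0
fi

echo "$(date) sending $NEWFILE" >> "$LOG_FILE"
# mutt reads the body from stdin, -s sets the subject, -a attaches the file.
echo "Un nouveau fax est arrive." | mutt -s "Nouveau fax" -a "$NEWFILE" -- "$RECIPIENT"
echo "$NEWFILE" > "$STATE_FILE"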


The script keeps a bit of "state" in a ".lastfile", because it sometimes happens that the script is executed several times for the same document. I don't know why; I think it's an issue with the printer. You might not need that.
I also keep a log file of what the script does. You might not want that either. What you might like, though, is to verify the extension of the file. Note that the filesystem is case-sensitive, so ".PDF" files won't be matched.

Make sure you have mutt installed: sending attachments is a bit complicated, and sendmail alone is not enough. Note that you can attach several files at once. The argument to the echo command is the message body, and what comes after "-s" is the message subject.
I purposely left the message in French as a reminder that it is safer to avoid any encoding other than pure ASCII. There must be a way to cope with UTF-8, but I didn't have time to investigate.

This script has (at least) one limitation, and I should rewrite it to make it more robust: I assume it takes less than 5 seconds for the printer to transfer the file, and after those 5 seconds I send it by e-mail. There are better ways to know whether the transfer is finished:
  • Use a program that reads the file; that program probably knows whether the file is valid.
  • Wait until the file size has been constant for some time, then assume the transfer is finished.
  • If there are a lot of transfers, assume the previous file was completely transferred when a new file shows up.


There you go. We can now watch a folder, filter files by extension, and send new files by e-mail.

Saturday, January 17, 2015

Secure e-mail and encrypted files


In the last article I explained that you should encrypt your traffic. The article was focused on web browsing.

With emails it's another story.

Web traffic is like a phone conversation that can be tapped. Email is like regular mail. You don't even have a direct connection with the recipient.

Like regular mail, emails are "routed": you send a message from your computer to your email provider, your provider sends it to the mail server responsible for the recipient's domain, and from there it may be passed on again to a mailbox server (e.g. an IMAP server).

While it's always a good idea to send your e-mails over a secure SMTP connection, your email provider might not do the same when talking to the recipient's email server.

Furthermore, while web traffic is volatile, emails are stored on hard drives and are more likely to contain sensitive information. It's very easy for anyone with access to these servers (such as your boss, the email providers and the government) to look at your emails.

The solution to this problem can actually be used for almost anything and is not new technology at all.
This technology is asymmetric cryptography. (It's asymmetric because the encryption and decryption keys are not the same. The encryption key can be handed out to anyone and is usually published on public websites or key servers, while the decryption key never leaves you. In fact "encryption key" and "decryption key" are not the best names; we'd better call them the public and the private key.)

It can take two forms: PGP (with implementations such as GnuPG), the preferred solution,  or digital signatures / certificates (known as S/MIME).
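
As a concrete illustration with GnuPG (a sketch: the key file, the address alice@example.com and the file name are placeholders):

gpg --import alice.pub.asc                              # import the recipient's public key, once
gpg --encrypt --sign -r alice@example.com report.pdf    # writes report.pdf.gpg, readable by Alice only and signed by you
gpg --decrypt report.pdf.gpg > report.pdf               # what Alice runs on her side, using her private key and its passphrase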

I will list the advantages of using asymmetric cryptography, and a few drawbacks.

Pros


  • Only the sender and the recipient can look at the data. For people in the middle, it's just gibberish.
  • You encrypt the data for one recipient only. Even if someone else gets hold of the message, it is useless to them: the e-mail was encrypted with the recipient's public key, so only the recipient can decrypt it, with their private key.
  • The recipient can verify the identity of the sender and know for sure only someone with the proper private key and the password to that key could have sent the email.
  • Similarly, you can sign without encrypting when the data is not sensitive but you still want people to be able to verify that you created it. This is useful for software distribution, so you don't download a counterfeit copy with security holes.
  • Can be used to encrypt anything, any sequence of bytes, including files or your whole home directory.

Cons

  • This one you already knew: if you use a weak password for your private key, or reuse that password on other applications / websites, you achieve no security.
  • The recipient needs to use PGP or S/MIME so that the message gets encrypted and they can decrypt / verify it. So your recipient needs to be as tech-savvy as you are.
  • Even though there are some solutions for mobile devices and webmails, they are never as good as, say, the Enigmail extension for Thunderbird.
  • It's very easy to lose your private key, but it's risky to make copies. See the problem?