Monthly Archives: April, 2016

A manual backup routine using AWS

This post is also slightly off topic – not a data announcement, workshop, video, etc.  But it does contain one specific instance of something that everyone should be thinking about – data backup.  Everyone knows the rule of three – keep at least three backups of your precious files and make sure at least one of them is offsite in case of disaster.

I needed to develop a new routine for my home computer backup after deciding to seize control of my system back from SpiderOak.  I had been using that for a while, but then upgraded to SpiderOak One, and my incremental backups seemed to take forever, with the SpiderOak process constantly using lots of CPU and seemingly not accomplishing much.  [This is all on Linux as usual].  I realized that I understood very little of what the client was actually doing, and since the client was unresponsive, I could no longer rely on it to actually back up and retrieve my files.  I decided to go completely manual so that I would know exactly what my backup status was and what was happening.

Part 0 of my personal rule of three is that all of my family’s home machines get an rsync run periodically (i.e., whenever I remember) to back up their contents to the main home server.

Part 1 is a local backup to an internal hard drive on the server.  I leave this drive unmounted most of the time, then mount it and rsync the main drive to it.  The total file size is about 600 GB right now, partly because I do not really de-dupe anything or get rid of old stuff.  Also, I don’t have a high volume of video files to worry about at this point.

Part 2 is a similar rsync backup to a portable hard drive [encrypted].  I have two drives that I swap and carry back and forth to work every couple of weeks or so.  I have decided that I don’t really like frequent automated backup, because I’d be more worried about spreading a problem like accidental deletion of files, or a virus, before the problem is discovered.  I can live with the loss of a couple of weeks of my “machine learning” if disaster truly strikes.
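For anyone who wants to script Parts 1 and 2, here is a minimal sketch of the mount, rsync, unmount cycle. The source directory, the mount point, and the assumption that the drive has an /etc/fstab entry are all placeholders, not my actual setup. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch of the "mount, rsync, unmount" routine. All paths are placeholders.
SRC=${SRC:-/home}           # assumed: directory tree to protect
MNT=${MNT:-/mnt/backup}     # assumed: mount point with an /etc/fstab entry
DRY_RUN=${DRY_RUN:-1}       # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

backup_to_drive() {
    run mount "$MNT" || return 1
    # -a keeps permissions, times and symlinks; the trailing slash on the
    # source copies its contents rather than the directory itself.
    run rsync -a "$SRC/" "$MNT/"
    rc=$?
    run umount "$MNT"
    return $rc
}

backup_to_drive
```

Setting DRY_RUN=0 (as root) would run the commands for real. There is deliberately no --delete option, so files removed from the source stay on the backup drive.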

But what about Part 3?  I wanted to go really offsite, and not pay a great deal for the privilege.  I have grown more comfortable with AWS as I learn more about it, and so after some trial and error, devised this scheme…

On the server, I tar and zip my files, then encrypt them, taking checksums along the way:

tar -cvf mystuff.tar /home/mystuff

bzip2 mystuff.tar

sha256sum mystuff.tar.bz2 > mystuffsha

gpg -c --sign mystuff.tar.bz2

sha256sum mystuff.tar.bz2.gpg > mystuffgpgpsha

This takes some time to run, and generates some big files, but it is doable.  I actually do this in three parts because I have three main directories on my system.
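The checksum files earn their keep at restore time. As a small sketch on throwaway data (the temporary directory and the stand-in file contents are made up for the demo), this shows how sha256sum -c uses a recorded checksum to confirm a file is intact, and how it reports a damaged one:

```shell
#!/bin/sh
# Demo on throwaway data: record a checksum, verify it, then detect damage.
work=$(mktemp -d)
cd "$work" || exit 1

# Stand-in for the real archive; any file behaves the same way.
printf 'pretend this is a large archive\n' > mystuff.tar.bz2

# Same form as the step above: the output line holds the hash and file name.
sha256sum mystuff.tar.bz2 > mystuffsha

# -c re-reads the file named inside mystuffsha and compares hashes.
if sha256sum -c mystuffsha >/dev/null 2>&1; then
    intact_result=ok
else
    intact_result=mismatch
fi

# Simulate corruption in transit by appending one byte.
printf 'x' >> mystuff.tar.bz2
if sha256sum -c mystuffsha >/dev/null 2>&1; then
    damaged_result=ok
else
    damaged_result=mismatch
fi

echo "before damage: $intact_result, after damage: $damaged_result"
```

Because the checksum file stores the file name alongside the hash, the check has to run in a directory where the file has that same name.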

Then we need to get it up to the cloud.  Here is where file transfer really slows down.  I guess it is around 6 days total wait time for all of my files to transfer, although I do it in pieces.  The files need to be small enough that a breakdown in the process will not lose too much work, but large enough so that you don’t have thousands of files to keep track of.  I do this to split the data into 2GB chunks:

split -b 2147483648 mystuff.tar.bz2.gpg
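Before trusting split with hundreds of gigabytes, it is easy to convince yourself that the pieces go back together. This is a sketch on a small throwaway file, using 1 KiB chunks instead of 2 GiB; the sample file name and temporary directory exist only for the demo.

```shell
#!/bin/sh
# Round trip on a small throwaway file: split, reassemble, compare.
work=$(mktemp -d)
cd "$work" || exit 1

# A few KB of sample data standing in for mystuff.tar.bz2.gpg.
seq 1 1000 > sample.gpg

# Same command as above, with 1 KiB chunks instead of 2 GiB.
# With no prefix given, split names the pieces xaa, xab, xac, ...
split -b 1024 sample.gpg

# Count the pieces without parsing ls output.
set -- x??
chunk_count=$#

# The shell expands x?? in sorted order, which is the order split wrote them.
cat x?? > rebuilt.gpg

if cmp -s sample.gpg rebuilt.gpg; then
    round_trip=identical
else
    round_trip=different
fi
echo "$chunk_count chunks, reassembled copy is $round_trip"
```

The two-letter suffix allows 676 pieces, which at 2 GiB each is far more than a 600 GB archive needs.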

Now we have to upload it.  I want to get the data into AWS Glacier since it is cheap, and this is a backup just for emergencies.  Now Glacier does have a direct command line interface, but it requires the use of long IDs and is just fussy in terms of accepting slow uploads over a home cable modem.  Fortunately, getting data into S3 is easier and more reliable.  And, S3 allows you to set a lifecycle policy that will automatically transfer your data from S3 to Glacier after a set amount of time.  So the extra cost you incur for, say, letting your data sit in S3 for a day, is really pretty small.  I guess you could do this with a loop or a wildcard, but I just have a long shell file with each of my S3 commands on a separate line.  This requires you to install the AWS CLI on your system.

aws s3 cp xaa s3://your_unique_bucket_name_here

aws s3 cp xab s3://your_unique_bucket_name_here

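I just run that with a simple shell command that dumps any messages to a file (upload_commands.sh here stands for whatever you named the file of S3 commands; the 2>&1 is needed because the -xv trace goes to stderr).

sh -xv upload_commands.sh > special_output 2>&1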
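The long file of S3 commands does not have to be typed by hand. As a sketch, this function writes one "aws s3 cp" line per chunk; the function name, the upload_commands.sh file name, and the bucket name are all placeholders.

```shell
#!/bin/sh
# Write one "aws s3 cp" line per chunk into a command file.
# BUCKET and the output file name are placeholders.
BUCKET=${BUCKET:-your_unique_bucket_name_here}

make_upload_script() {
    out=$1
    shift
    : > "$out"                      # start with an empty file
    for chunk in "$@"; do
        printf 'aws s3 cp %s s3://%s\n' "$chunk" "$BUCKET" >> "$out"
    done
}

# Typical call from the directory holding the chunks
# (x?? matches xaa, xab, ...):
#   make_upload_script upload_commands.sh x??
#   sh -xv upload_commands.sh > special_output 2>&1
```

Keeping the commands in a file, rather than looping over the uploads directly, means that after an interruption you can delete the lines that already succeeded and rerun the rest. The sketch assumes chunk names without spaces, which is what split produces.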

And, voila…days later your files will be in the cloud.  You can choose an AWS region that will put the files on the other side of the planet from you if you think that will be more reliable.
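The rule that moves objects from S3 to Glacier can be set in the web console, or from the command line. Here is a sketch of the latter, assuming a rule that applies to the whole bucket after one day; the rule ID, the lifecycle.json file name, and the bucket name are placeholders.

```shell
#!/bin/sh
# Write a lifecycle rule that moves every object to Glacier after 1 day.
# The rule ID and file name are placeholders.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "to-glacier-after-one-day",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 1, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF

# Apply it with the AWS CLI (needs credentials, so left commented out here):
# aws s3api put-bucket-lifecycle-configuration \
#     --bucket your_unique_bucket_name_here \
#     --lifecycle-configuration file://lifecycle.json
```

The empty prefix makes the rule match every object in the bucket, so this fits best when the bucket holds nothing but backup chunks.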

To bring the files back down, you must request through the AWS interface for the files to be brought back from Glacier to S3, then download from S3, then use “cat” to fuse them together, and in general reverse all the other steps to decrypt, untar, checksum and such.  It worked for me on small scale tests, but I guess I should try it on my entire archive at least once to make sure this really works.
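The restore path can be sketched with the CLI too. The function names, the 7-day availability window, and the Standard retrieval tier are assumptions for illustration. The AWS calls default to a dry run that only prints them, and the local steps simply reverse the commands used on the way up.

```shell
#!/bin/sh
BUCKET=${BUCKET:-your_unique_bucket_name_here}   # placeholder
DRY_RUN=${DRY_RUN:-1}       # 1 = only print the AWS commands

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: ask for a temporary S3 copy of one archived chunk.
request_restore() {
    run aws s3api restore-object --bucket "$BUCKET" --key "$1" \
        --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'
}

# Step 2: hours later, once the restore has completed, download it.
fetch_chunk() {
    run aws s3 cp "s3://$BUCKET/$1" .
}

# Step 3: fuse the chunks and undo the encryption and compression,
# checking each stage against the checksum files saved earlier.
reassemble() {
    cat x?? > mystuff.tar.bz2.gpg &&
    sha256sum -c mystuffgpgpsha &&
    gpg -o mystuff.tar.bz2 -d mystuff.tar.bz2.gpg &&
    sha256sum -c mystuffsha &&
    tar -xjvf mystuff.tar.bz2
}

# Example:
#   for c in xaa xab; do request_restore "$c"; done
#   (wait for the restores to finish)
#   for c in xaa xab; do fetch_chunk "$c"; done
#   reassemble
```

The checksum files need to be kept somewhere other than inside the encrypted archive, or they cannot help at this stage.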

At least with this method, I know exactly what is in the cloud, how and when it got there, and how to get it back.  And it looks like it will only run me about $6 a month.

Data Visualization and R

Well, it has been a long time in coming, but I have finally finished converting my Data Visualization workshop series to a screencast video version.  See this YouTube playlist for the complete series, and the materials at GitHub.  This is the long version of the in-person 3-hour workshop.  The video series goes into even more detail, starting from a history of major developments in visualization, to various implementations of specific graphs, interactive visualizations, web viz, big data, and more.

I also have some ideas for some more up-to-date add-ins that I will probably record as lagniappe videos over the next few weeks.  Those didn’t quite fit into the existing sequence of videos.

The energy to complete these videos came from several musical sources, of which I would credit Harmogu and Linton Kwesi Johnson as leading lights.

Installing Debian/XFCE (Linux) on Dell XPS 13 9350

Well, this is off topic for the theme of the blog, but I felt the urge to record an expurgated version of my recent installation of Linux on a new Dell XPS 13 (9350), for the potential edification of the populace.  I will try to make it brief, by my standards 🙂  [but I guess I failed..]  This is not in any way an objective post, but I am just blowing off steam and letting my opinions fly.  Please skip over if you are looking for actual educational material…

As tweeted earlier, a Pi day discount tempted me into the purchase of a 13″ Dell XPS (model 9350).  I had been following Dell primarily due to their Linux support via the Developer Series, and had been tempted on many occasions in the past.

I should also mention that dating back to the fin-de-siècle, I have been a Linux user.  Although I have occasionally strayed away, for most of the time, Linux computers have formed the core of my computing infrastructure.  In Linux, I can do what I want to do, rather than simply obey the instructions of other OSes.  In recent years, I converted from the Fedora sect to become a Debian adherent, and I have been very satisfied with that choice.

Still, the computer I bought was NOT a Developer’s Edition, but a new Windows 10 machine.  In the past, I have usually put Linux onto either very standard or slightly older hardware, and didn’t have FEAR that it would not work.  My working laptop recently has been a leftover 2010 Macbook Pro running Debian only, and it has no real issues, but it is running hot and noisy.   Since this XPS laptop was brand new with the latest technology, I have to confess that for a moment or two I even considered leaving Windows on the machine and using it in dual boot mode.  But two minutes in Windows 10 erased any of those doubts.  Seriously, why would anyone voluntarily remain in that depressing environment if they had the possibility of escape?

So, I committed to putting Debian on the machine as its sole operating system, and began Googling to get ready.  I learned that the Dell-rebranded Broadcom wireless card was not being supported, except in bleeding edge kernels, and was not very good anyway.  And also that Intel wireless cards worked easily with Linux.  Thanks to Dell, because they put a wonderful service manual online and don’t mind users operating on their own hardware (unlike the fruit-themed gadget company).  I ordered an Intel Wireless card.  Due to a bit of carelessness, I picked a 9260 instead of the 9265 model.  The 9265 is supported natively in the kernel, whereas the 9260 requires a download of Intel proprietary drivers [more on this later].  But, in spite of my nerves, popping open and disassembling my brand new laptop was a piece of cake, and it went back together just as good as new.  I am liking Dell from the hardware perspective.

Then, I prepared a USB boot stick to install Debian 8 (Jessie).  So, I had to fidget about a bit with the UEFI/BIOS settings to get the Dell to want to boot from the stick, but eventually made it happen.  Then I went through a couple of abortive installation attempts because of the aforementioned wireless drivers, which needed to be loaded from a second USB stick.  I am sorry I can’t really document it completely here, but only more fiddling until I found the right combination that would recognize one USB as the boot media and one USB as the supplemental driver files allowed me to proceed.  During that phase, I began to worry that I had gone too far by buying a slim fancy device without an ethernet port, but I survived.  I would still lean towards getting a computer with a real ethernet port in the future though, just for safety.  Turns out the Linux drivers for the USB-C to Ethernet are reported to be fussy too.

On to the next complication… Did I mention that I not only like Linux and Debian, but prefer XFCE as my desktop?  Because I am old school and don’t care for eye candy at all.  It was the horrible broken experience of GNOME 3 that drove me away from Fedora.  I just want a desktop that stays out of my way and does the work (is that some antiquated colonialist mindset? perhaps, but I think it is still OK to exploit a computer, right?).  Anyway, XFCE has been my go-to for the last 4-5 years.  I respect Linux Mint/Cinnamon too for their attempt to correct the awful GNOME decisions that were forced on unsuspecting users.  But XFCE has done the job for me.  So, I was willing to work overtime to get XFCE as my desktop.

Now, I have done Debian/XFCE installs on a number of desktops, and my Macbook too.  But somehow, the Debian 8 XFCE install (at the time of writing) had one major issue.  I finished the installation, but could only log in to the XFCE desktop as root, not as a user.  There was some kind of weird permissions issue, or some problem with the install scripts.  I am experienced in Linux, but not expert, so extensive Googling on this topic failed to resolve the issue.  What did work was to do a standard Debian install with Cinnamon as the default desktop.  And only after that was working did I install XFCE.  That worked like a charm.  Hopefully someone who has more knowledge of what could cause this will fix it for the young generations.

I also had to try a few different configurations before getting my preferred configuration of encrypted hard drive and swap space, while leaving a bit of open /boot directory.  I wish there was a better-documented path for this too.  Somehow encryption is still considered to be a slightly exotic option, when it shouldn’t be.

Ok, so now I am excited because I have a working Debian/XFCE install on my new laptop.  My hand-installed wireless is working, and everything is looking up.  BUT, I have NO SOUND, and it appears that the lack of sound is also causing any standard (e.g., YouTube) videos to play too fast.  I take a deep breath.  More googling reveals that this is an issue with the Dell 9350 model’s audio, and that future kernels will handle it.  But my Debian 8 kernel does not handle it, and I cannot use my expensive laptop to watch my favorite YouTube videos!!!!  I use all of my experience in “taming my dog of desire” to reconcile myself to the situation.  I can use my laptop as a wonderful distraction-free zone to code and write wonderful things.  What do I need sound for?  After all, Plato and Muhammad both condemned music.  Yes, what do I need sound for?  Ok, I will live without sound on my laptop 😦

But, after getting everything else configured to my liking, I was ready to keep experimenting.  Is that not the whole point of Linux?  To experiment, to control your own working environment?   Not to blindly obey when a popup window says, “You must update now”, or “You must click ‘accept’ to continue”, or “Operation not permitted”.  Right, this is Linux, so let’s go!

In practice, what that meant was that I attempted a full upgrade from Debian 8 (jessie) to Debian 9 (stretch), even though 9 is not yet stable.  What was my motivation?  Well, to confess, at least 90% of the motivation was to get that audio working.  Because if this is the kind of world where we can’t listen to music on our laptops, is that the kind of world that we want to live in?  We want our music, and we want control of our computers too!

Now, the Debian instructions are very clear and very good, and after editing my apt sources, I was able to update and upgrade, and within a very, very short period of time, had my entire OS running a very current set of applications with crystal clear audio and video.  I now have pretty close to my full suite of applications (R, RStudio, Mathematica, Claws-mail, LaTeX, LyX, Gummi, all of the old favorites…).  Now I am content!  I then had to go and customize my desktop settings and browser to a very dark theme and plaster my laptop with some stickers to make it seem more like my own.  Too much, probably, but it is a small thing that makes me happy 🙂  My family says I’m crazy, but I get that all the time anyway…
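For reference, the apt sources edit behind the jessie-to-stretch upgrade comes down to a one-line substitution. This is a sketch rather than a record of exactly what I typed: the function name and backup suffix are made up, and the official Debian release notes remain the authority on the full procedure.

```shell
#!/bin/sh
# Point an apt sources file from jessie to stretch, keeping a backup copy.
switch_release() {
    list=$1
    cp "$list" "$list.jessie.bak" || return 1
    # Uses GNU sed's in-place option, which is what Debian ships.
    sed -i 's/jessie/stretch/g' "$list"
}

# On the real system this would be roughly (as root):
#   switch_release /etc/apt/sources.list
#   apt-get update
#   apt-get upgrade
#   apt-get dist-upgrade
```

Trying the function on a scratch copy of sources.list first is a cheap way to see what it will change.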
