Sunday, June 30, 2013

AWS Cost Saving Tip 12: Add Spot Instances with Amazon EMR [feedly]

Now this looks very promising. Background processing is something I don't want to spend a penny more on than is needed, and certain parts of the infrastructure fit this use case quite nicely.

Keeping cloud computing costs low while delivering the best performance is a key part of my role, and any article or tip in this area is always welcome.

This feed is becoming very useful to me indeed!
 
 
Shared via feedly // published on Cloud, Big Data and Mobile // visit site
AWS Cost Saving Tip 12: Add Spot Instances with Amazon EMR
In continuation of my post on "How elastic thinking can save costs on Amazon EMR cluster ?", I explore in this post how we can introduce Spot EC2 instances into an Amazon EMR cluster and achieve more cost savings.

Most of us know that Amazon Spot EC2 instances are usually a good choice for time-flexible and interruption-tolerant tasks. These instances are traded at a frequently changing Spot market price, and you can set your bid price using the AWS APIs or the AWS Console. Once free Spot EC2 instances are available at your bid price, AWS allots them for use in your account. Spot instances are usually much cheaper than On-Demand EC2 instances. Example: the On-Demand price for m1.xlarge is 0.48 USD per hour, and on the Spot market you can sometimes find it at 0.052 USD per hour. That is ~9 times cheaper than the On-Demand price; even if you bid competitively and get hold of Spot EC2 at around 0.24 USD most of the time, you save 50% from the On-Demand price straight away. Big data use cases usually need lots of EC2 nodes for processing, so adopting such techniques can make a vast difference to your infrastructure cost and operations in the long term. I am sharing my experience on this subject as tips and techniques you can adopt to save costs while using EMR clusters in Amazon for big data problems.
Note: While dealing with Spot you can be sure that you will never pay more than your maximum bid price per hour.
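
The arithmetic in the example above can be checked with a short Python sketch. The prices are the ones quoted in this post; the function names are purely illustrative, and real Spot prices change constantly.

```python
def price_ratio(on_demand_price: float, spot_price: float) -> float:
    """How many times cheaper the Spot price is than the On-Demand price."""
    return on_demand_price / spot_price


def savings_percent(on_demand_price: float, spot_price: float) -> float:
    """Percentage saved by paying the Spot price instead of On-Demand."""
    return (1 - spot_price / on_demand_price) * 100


ON_DEMAND = 0.48      # USD per hour for m1.xlarge, as quoted in this post
LOWEST_SPOT = 0.052   # USD per hour, the occasional Spot price quoted above
TYPICAL_SPOT = 0.24   # USD per hour, the "competitive bid" example above

print(f"Lowest Spot price is {price_ratio(ON_DEMAND, LOWEST_SPOT):.1f}x cheaper")
print(f"Saving at lowest Spot price: {savings_percent(ON_DEMAND, LOWEST_SPOT):.1f}%")
print(f"Saving at 0.24 USD: {savings_percent(ON_DEMAND, TYPICAL_SPOT):.1f}%")
```

Multiplying the per-hour difference by the number of nodes and hours in a job gives a rough idea of what is at stake for a large cluster.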

To know more about a real implementation of these tips, read the following case study: Lock, Stock and X Smoking EC2's.

Tip 1: Make the right choice (Spot vs On-Demand) for the cluster components
Data Critical workloads: For workloads which cannot afford to lose data, you can run the Master + Core nodes on Amazon On-Demand EC2 and your task nodes on Spot EC2. This is the most common pattern when combining Spot and On-Demand in an Amazon EMR cluster. Since the task nodes run at Spot prices, depending upon your bidding strategy you can save ~50% compared with running your task nodes on On-Demand EC2. You can save further (if you are lucky) by reserving your Core and Master nodes, but you will be tied to an AZ. In my opinion this is not a good or common technique, because some AZs can be very noisy with high Spot prices.
Cost Driven workloads: When solving big data problems, you sometimes face scenarios where cost is more important than time. Example: you are processing archives of old logs as low-priority jobs, where the cost of processing is very important and there is usually abundant time left. In such cases you can run all of the Master + Core + Task nodes on Spot EC2 to get further savings over the data critical workloads approach. Since all the nodes run at Spot prices, depending upon your bidding strategy you can save ~60% or more compared with running your nodes on On-Demand EC2. AWS has published a table indicating the Amazon EMR + Spot combinations that are widely used.
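
As a rough illustration of the two patterns, the sketch below compares the hourly cost of an all On-Demand cluster, a data critical mix and an all-Spot cluster. The node counts (1 master, 4 core, 10 task) and the 0.24 USD Spot price are assumptions picked for the example, not figures from AWS; the percentages you actually see depend entirely on the Spot price you obtain.

```python
ON_DEMAND_PRICE = 0.48  # USD per hour, m1.xlarge price quoted in this post
SPOT_PRICE = 0.24       # USD per hour, assumed average Spot price


def cluster_hourly_cost(on_demand_nodes: int, spot_nodes: int) -> float:
    """Hourly cost of a cluster with the given On-Demand / Spot split."""
    return on_demand_nodes * ON_DEMAND_PRICE + spot_nodes * SPOT_PRICE


# Hypothetical cluster: 1 master + 4 core + 10 task nodes.
all_on_demand = cluster_hourly_cost(15, 0)  # everything On-Demand
data_critical = cluster_hourly_cost(5, 10)  # master + core On-Demand, tasks on Spot
cost_driven = cluster_hourly_cost(0, 15)    # everything on Spot

for name, cost in [
    ("All On-Demand", all_on_demand),
    ("Data critical mix", data_critical),
    ("Cost driven (all Spot)", cost_driven),
]:
    saving = (1 - cost / all_on_demand) * 100
    print(f"{name}: {cost:.2f} USD/hour ({saving:.0f}% saved)")
```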
Tip 2: There is a free lunch sometimes
Spot Instances can be interrupted by AWS when the Spot price reaches your bid price. Interruption means that AWS can pull out the Spot EC2 instances assigned to your account when the Spot price matches or exceeds your bid. If your Spot task nodes are interrupted, AWS does not charge you for the partial hour of usage; i.e. if you started the instance at 10:05 am and it is interrupted by Spot price fluctuations at 10:45 am, you are not charged for that partial hour. If your processing is totally time-insensitive, you can keep your bid price close to the Spot price, so that your instances are easily interrupted by AWS, and exploit this partial-hour concept. Theoretically you can get most of the processing done through your task nodes for free* by exploiting this strategy.
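
The partial-hour rule can be modelled in a few lines of Python. This is only a sketch of the billing behaviour as described in this tip (hourly billing, with the last partial hour free when AWS interrupts the instance); the function name is made up, and current AWS billing rules should be checked before relying on it.

```python
def billed_hours(runtime_minutes: int, interrupted_by_aws: bool) -> int:
    """Hours charged for one Spot instance under the rule described above."""
    full_hours, leftover_minutes = divmod(runtime_minutes, 60)
    if interrupted_by_aws:
        # AWS pulled the instance: the unfinished hour is not charged.
        return full_hours
    # You stopped the instance yourself: a started hour counts as a full hour.
    return full_hours + (1 if leftover_minutes else 0)


# The example from this tip: started at 10:05 am, interrupted at 10:45 am.
print(billed_hours(40, interrupted_by_aws=True))   # 40 minutes, interrupted
print(billed_hours(40, interrupted_by_aws=False))  # 40 minutes, stopped by you
```

In the 10:05 to 10:45 example, the interrupted instance is billed for zero hours, while the same 40 minutes ended by you would be billed as one hour.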

Tip 3: Use the AZ wisely when it comes to spot
Different AZs inside an Amazon EC2 region have different Spot prices for the same instance type. Observe this pattern for a while, build some intelligence around the price data collected, and rebuild your cluster in the AZ with the lowest price. Since the Master + Core + Task nodes need to run in the same AZ for better latency, it is advisable to architect your EMR clusters in such a way that they can be switched (i.e. recreated) to different AZs according to Spot prices. If you can build this flexibility into your architecture, you can save costs by leveraging the inter-AZ price fluctuations. Refer to the images below for Spot price variations in 2 AZs inside the same Region for the same time period. "Make your choice wisely from time to time"
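
A minimal sketch of the "build some intelligence around the price data" step, assuming you have already collected Spot price samples per AZ. The AZ names and prices below are hypothetical, and picking the lowest average is just one possible rule; you might prefer the lowest maximum if interruptions hurt more than price.

```python
from statistics import mean
from typing import Dict, List


def cheapest_az(price_history: Dict[str, List[float]]) -> str:
    """Return the AZ whose observed Spot prices have the lowest average."""
    return min(price_history, key=lambda az: mean(price_history[az]))


# Hypothetical hourly Spot prices (USD) for one instance type in two AZs.
observed_prices = {
    "us-east-1a": [0.060, 0.250, 0.480, 0.070],  # noisy AZ with price spikes
    "us-east-1b": [0.052, 0.055, 0.060, 0.058],  # quieter AZ
}

print(f"Recreate the cluster in {cheapest_az(observed_prices)}")
```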


Tip 4: Keep your Job logic small and store intermediate outputs in S3
Break down your complex processing logic into small jobs, and design your jobs and tasks in the EMR cluster in such a way that they run for a very small period of time (for example, a few minutes). Store all the intermediate job outputs in Amazon S3. This approach is helpful in the EMR world and gives you the following benefits:

  • When your Core + Task nodes are interrupted frequently, you can still continue from the intermediate points, with the data read back from S3.
  • You now have the flexibility to recreate the EMR clusters in multiple AZs depending upon the Spot price fluctuations.
  • You can decide the number of nodes needed for your EMR cluster (even every hour) depending upon the data volume, density and velocity.

All 3 points above, when implemented, contribute to elasticity in your architecture and thereby help you save costs in the Amazon cloud. This recommendation is not suitable for all jobs; it has to be carefully mapped to the right use cases by the architects.
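
The idea of small jobs with stored intermediate outputs can be sketched as a resumable pipeline. This is an illustration only: a plain Python dict stands in for Amazon S3, and the step names and the "intermediate/" key prefix are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Store = Dict[str, str]  # stand-in for S3: object key -> stored output
Step = Tuple[str, Callable[[str], str]]


def run_pipeline(steps: List[Step], source: str, store: Store) -> str:
    """Run small steps in order, reusing any output already in the store."""
    data = source
    for name, step in steps:
        key = f"intermediate/{name}"
        if key in store:
            # Finished before an interruption: resume from the stored output.
            data = store[key]
            continue
        data = step(data)
        store[key] = data  # persist so a recreated cluster can pick up here
    return data


# Two hypothetical small jobs.
def to_upper(text: str) -> str:
    return text.upper()


def count_chars(text: str) -> str:
    return str(len(text))


STEPS: List[Step] = [("to_upper", to_upper), ("count_chars", count_chars)]

store: Store = {}
print(run_pipeline(STEPS, "spot", store))  # first run computes every step
print(run_pipeline(STEPS, "spot", store))  # second run only reads the store
```

Because each step's output is keyed by step name, a cluster recreated in another AZ after an interruption only repeats the steps that had not finished.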

To know more about a real implementation of the above tips, read the following case study: Lock, Stock and X Smoking EC2's.

Other Tips

Cost Saving Tip 1: Amazon SQS Long Polling and Batch requests
Cost Saving Tip 2: How right search technology choice saves cost in AWS ?
Cost Saving Tip 3: Using Amazon CloudFront Price Class to minimize costs
Cost Saving Tip 4 : Right Sizing Amazon ElastiCache Cluster
Cost Saving Tip 5: How Amazon Auto Scaling can save costs ?
Cost Saving Tip 6: Amazon Auto Scaling Termination policy and savings
Cost Saving Tip 7: Use Amazon S3 Object Expiration
Cost Saving Tip 8: Use Amazon S3 Reduced Redundancy Storage  (new)
Cost Saving Tip 9: Have efficient EBS Snapshots Retention strategy in place (new)
Cost Saving Tip 10: Make right choice between PIOPS vs Std EBS volumes and save costs (new)
Cost Saving Tip 11: How elastic thinking saves cost in Amazon EMR Clusters ? (new)

Go! - A Google Reader Replacement [feedly]


Now this is interesting, not so much because it's an open source Google Reader replacement but because I think it's written in Go, a language that a friend and ex-colleague introduced me to a while ago.

When I was introduced to Go, it was way faster in my friend's benchmarks than both Ruby on Rails and Microsoft MVC.

Could be worth a look purely as a reading and learning exercise.
 
Shared via feedly // published on you've been HAACKED // visit site
A Google Reader Replacement

Google is shuttering Google Reader in a little over a day (on July 1st, 2013) as I write this. If you use Google Reader to read my blog, this means you might miss out on my posts and I KNOW YOU DON'T WANT THIS!

Then again, maybe this is finally your chance to make a break, get some fresh air, stop reading blogs and start creating! I won't hold it against you.

But for the rest of you, it's a good time to find a replacement. Or at the very least follow me on Twitter since I do tweet when I blog.

There's a lot of Google Reader replacements out there, but only two that I like so far.

Feedly

Feedly is gorgeous. There are apps for many platforms, but the browser works pretty well. Also, you can use Google to log into it and import your Google Reader feeds. I hope Google allows exporting to Feedly and other aggregators after July 1st even as they close down the Google Reader site.

The problem I have with Feedly is that it doesn't work like Google Reader. It wouldn't be so bad if it had a better flow for reading items, but I find its interface to be quirky and in some cases, unintuitive. For example, it seems I have to mark items as read by clicking "mark above articles as read" rather than having it do it automatically like Reader does after you scroll past it.

This leads me to…

Go Read

Go Read is a late entry into the list, but there are three important things I really like about it:

  1. It is intended to be a clean and simple clone of Google Reader.
  2. It supports Google Reader's keyboard shortcuts.
  3. It is open source and up on GitHub!

For some more details, check out the announcement blog post by the author, Matt Jibson, a developer at Stack Exchange:

I would like to announce the release of Go Read. It is a Google Reader clone, and designed to be close to its simplicity and cleanliness. I wanted to build something as close to Google Reader as made sense for one person to build in a few months.

It's basically Google Reader, but without all the cruft and where you can send pull requests to improve things!

In fact, there's already a few pull requests with some nice user interface polish that should hopefully make it into the site soon.

Despite some false starts, I have it up and running on my machine. I sent a few pull requests to update the README to help other clueless folks like me get it set up for hacking on.

So check it out, import your Google Reader feeds, and never miss out on another Haacked.com post EVER!


Saturday, June 29, 2013

Mr. Reader is a Power User's RSS App, Now with Feedly Support and More [feedly]

Up until the Google Reader shutdown I was happily using Reeder on Mac OS X, iPhone and iPad. For now only the iPhone version supports Feedbin, which along with Feedly has my entire OPML RSS feed list for syncing, so I've been experimenting with the Feedly client for iOS and Reeder for iPhone.

So now this app has turned up on the scene and has been getting some pretty good reviews.

For now I will stick with the ones I have, but this is on my radar as a potential replacement if the Reeder app doesn't come to Mac OS X or iPad soon.

RSS isn't dead, and Google has opened up plenty of opportunity for developers to innovate and replace this much-missed service.
 
 
Shared via feedly // published on Lifehacker // visit site
Mr. Reader is a Power User's RSS App, Now with Feedly Support and More

iPad: The looming shutdown of Google Reader is a great opportunity to look at new RSS apps to go along with your new syncing service of choice. If you like to keep up with news on your iPad, nothing matches the powerful options available in Mr. Reader.




Marc Edwards' app design workflow [feedly]

This post kicks the tyres of the email-to-Blogger functionality and tests one of two Google Reader replacements: this one being Feedly, the other Feedbin.

And this looks like a promising read as well.
 
 
Shared via feedly // published on iMore - The #1 iPhone, iPad, and iPod touch blog // visit site
Marc Edwards' app design workflow

Don't know how I missed this. Marc Edwards, my co-host on Iterate is not only one of the best designers on the planet, but one of the most generous, and on top of all the articles and scripts he's already shared, he's now gone and posted his entire app design workflow on Bjango.com:

Here it is — my complete iOS, Android and Mac app design workflow, starting from the first time you open Photoshop, to the app release and beyond. Now seemed like a good time to document how I've been working, because my workflow is about to drastically change again, with the release of Skala.

Perhaps with iOS 7 as well? I'm really looking forward to seeing how Marc updates the Bjango apps, and if -- and how -- his workflow evolves. In the meantime, if you're interested in app design, check out how one of the best in the business goes about practicing his craft.

More: Bjango.com

    



How to download photos from Dropbox directly to your iPhone or iPad camera roll [feedly]

This is something I wanted to do the other day and was too busy to work it out for myself so this has arrived at a good time. 
Shared via feedly // published on iMore - The #1 iPhone, iPad, and iPod touch blog // visit site
How to download photos from Dropbox directly to your iPhone or iPad camera roll

If you use Dropbox to store photos and save space on your iPhone or iPad, there may come a time when you want to share those photos to a social network or with a friend. In order to do so, you'll most likely have to save them to your camera roll first.

As it happens, Dropbox gives you an easy way to do this. Here's how:

  1. Launch the Dropbox app from the Home screen of your iPhone or iPad.
  2. Find the photo that you'd like to download to your camera roll in your Dropbox app.
  3. Click on the Download button in the lower right hand corner.
  4. Now tap on the option for Save to Photo Library.
  5. The photo will export directly to your iPhone or iPad camera roll.

Once the export is done you can hop right into your Photos app and upload it to whatever service you'd like.

    



From twit to tweet: How Twitterrific helped Twitter get its verb - and bird - on [feedly]


Posting via feedly  
Shared via feedly // published on iMore - The #1 iPhone, iPad, and iPod touch blog // visit site
From twit to tweet: How Twitterrific helped Twitter get its verb - and bird - on

Last week the word "tweet" was added to the Oxford English Dictionary. Craig Hockenberry, a principal of the Iconfactory, co-creator of Twitterrific, and iMore hall of famer, gave some background as to its origins on his blog, Furbo.org:

It still feels strange to hear a word I helped create be mentioned over and over again in the media. It's a great word to go along with a great service, and in the end, I'm just happy we're not calling each other twits!

More than just the word "tweet", the Iconfactory and Twitterrific are responsible for the bird and a remarkable amount of Twitter's common branding and popular identity. The whole story is charming and enlightening, and a rare glimpse back at the very uncommon beginnings of something that now seems so commonplace.

More: Furbo.org

    



Saturday, June 15, 2013

Mac OSX Mountain Lion Time Machine Backup to NAS slow over Ethernet

Now this has happened to me enough times that it needs to be blogged about so that when I suffer with it again I can come back to these hacks to try and resolve it again.

So on Friday evening I started my Time Machine backup, and by today, Saturday evening, it had only backed up 1.5GB of the 13GB I wanted it to.

24 hours should be plenty to back this amount of data up!

My setup is as follows:
  • MacBook Pro connected via ethernet (Wi-Fi connection good but ethernet better for backups)
  • Western Digital MyBookLive NAS connected via ethernet.
  • All up to date in terms of Mountain Lion updates and MyBookLive firmware
macbook specs

So in order to fix this I tried the following. One of these worked, though I have no idea which one; I suspect the reboot into safe mode followed by a restart is the magic here.
  • Open a command prompt and type, without quotes, "sudo tmdiagnose" and wait for it to complete.
  • In a web browser open up http://mybooklive.local and log in to the Western Digital drive.
  • Turn off the Twonky media service if you don't use it.
  • Turn off remote access if you do not need it.
  • Turn off the energy saver checkbox so the hard disk does not go to sleep.
  • Reboot the Western Digital drive just in case. If in doubt, turn things off then back on again.
  • Ensure your Mac is not indexing the local disk by checking Spotlight. If it is, wait for it to finish.
  • And now I think the magic. Shut down your Mac via the Apple logo, then after a minute...
  • Turn on the Mac holding down the Shift key so that it boots into safe mode.
  • Stare at the screen for a while, then reboot your Mac.
  • Start the Time Machine backup again.
And then, to my relief, my Time Machine backup over ethernet zoomed along and I saw this screen within about an hour, which is much more acceptable.

time-machine-cleaning-up

Now all I need to work out is to back up my Western Digital NAS to Amazon's Glacier storage. I think I'll leave that for another weekend...

Monday, June 03, 2013

Aaronontheweb | What Do You Need to Become an Elite Developer?

What do you need to become an elite developer? What character traits does an elite developer have?

Aaron from MarkedUp makes some good points here.
