Smart Home, IoT, Data and Analytics – Reading List

Smart homes and the Internet of Things
A smart home is one in which the various electric and electronic appliances are wired up to a central control system so they can be switched on and off at certain times. Most homes already have a certain amount of “smartness” because many appliances already contain built-in sensors or electronic controllers. Virtually all modern washing machines have programmers that make them follow a distinct series of washes, rinses, and spins depending on how you set their various dials and knobs when you first switch on. If you have a natural-gas-powered central heating system, most likely you also have a thermostat on the wall that switches it on and off according to the room temperature, or an electronic programmer that activates it at certain times of day whether or not you’re in the house.
https://www.explainthatstuff.com/smart-home-automation.html

8 sensors to help you create a smart home
A list of sensors that you can deploy in your home to help keep you safe from the storm.
https://www.ibm.com/blogs/internet-of-things/sensors-smart-home/

Six Evolving Business Models for the Smart Home
Smart companies today understand that traditional models of one-time sell through of products, or standalone products that cannot be monitored intelligently, provide little or no opportunity to create a relationship with consumers and establish brand loyalty. They also realize that in today’s market, their products can serve as revenue generators with the data they gather based upon a homeowner’s energy usage, security connectivity and/or home entertainment preferences.
https://www.iotforall.com/business-models-for-smart-home/

The anatomy of an IoT solution: Oil, data and the humble washing machine
A lot of people think about data as the new gold, but a better analogy is data is the new oil. When oil comes out of the ground it is raw, it has intrinsic value but until that oil is refined into petrol and diesel, its true value is not gained. Data from sensors is very similar to oil. The data that comes from the sensor is raw, to gain insight from it, the data needs to be refined. Refining the data is at the heart of a successful Internet of Things project which leads to business growth and transformation.
https://www.ibm.com/blogs/internet-of-things/washing-iot-solution/

The best smart home systems: Top ecosystems explained
Apple, Google and Amazon are the major players in the smart home arena now, with their smart speakers, ecosystems and voice assistants on hand to not only make controlling your connected tech easier, but to make home automation a doddle.
https://www.the-ambient.com/guides/smart-home-ecosystems-152

How data analytics is adding value in the smart home
Analytics can be expected to foster whole-home integration of various Internet of Things devices by increasing awareness across multiple facets of the home, from thermostats to door locks to refrigerators to solar panels. Having insight across the entire home can enable machine learning and artificial intelligence technologies to eventually create smarter, more intuitive systems that not only make consumers’ lives more convenient, but also play a role in smart cities and the larger digitalized grid.
https://www.smartcitiesdive.com/news/how-data-analytics-is-adding-value-in-the-smart-home/446406/

Navigating Smart Home Data Security Concerns
Smart homes are no longer science fiction. But many consumers are slow to adopt. They’re worried about smart home data security breaches—and rightly so.
https://www.iotforall.com/smart-home-data-security/

User Perceptions of Smart Home IoT Privacy
Smart home Internet of Things (IoT) devices are rapidly increasing in popularity, with more households including Internet-connected devices that continuously monitor user activities. A report that analyzes smart home owners’ reasons for purchasing IoT devices, their perceptions of smart home privacy risks, and the actions they take to protect their privacy from those external to the home who create, manage, track, or regulate IoT devices and/or their data.
https://arxiv.org/pdf/1802.08182.pdf


GPU Powered: Accelerated computing for the future

GPUs are creating a big buzz in the market, and “GPU powered” is becoming the norm. You can also see it in NVIDIA’s share price rising at a faster pace. A couple of months back, I had to look into the GPU world, since there were multiple requests around doing data science using GPUs and several people had started asking me about it.

We all know that GPUs are used in mobile devices for things like games. My curiosity started rising, and some of the questions that came to mind were: how does a Graphics Processing Unit relate to the application/database/analytics world? How is it different from a CPU? Why is everyone suddenly talking about it?

Following are some of the links that helped me understand this world; I’m sharing my experience so far. You may want to start with these videos and posts.
What is a GPU and how does it work?
https://www.youtube.com/watch?v=0_TN845dxUU&t=1s
What is GPU Accelerated computing?
http://www.nvidia.com/object/what-is-gpu-computing.html
CPUs and GPUs – There’s enough room for everyone
http://sqream.com/cpu-and-gpus-theres-room-for-everyone/

First things first: you will realize that there are two types of GPUs, server-grade and consumer-grade.

Server-grade GPUs are designed so that they can be installed next to each other with no space separation, and fill all the available PCI slots available on the server, thus optimizing space usage and maximizing the amount of compute power per rack space unit. You can fit four Tesla K80 boards in a 1U server; that’s 8 GPUs total (K80 boards have 2 GPUs each), and that’s an impressive amount of compute throughput. The same applies to Tesla Pascal P100 models, with the due differences (one GPU per board). If you are building a supercomputer or a GPU-based server farm and buying hundreds or thousands of GPUs, these details matter a lot.
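As a back-of-the-envelope check on that density claim, here is a small Python sketch. The K80 board counts come from the quote above; the 42U rack height is my own assumption for illustration.

```python
# Back-of-the-envelope GPU density for server-grade boards.
# Assumption: a standard 42U rack filled with 1U servers, each
# holding four dual-GPU Tesla K80 boards, as quoted above.

BOARDS_PER_1U_SERVER = 4
GPUS_PER_K80_BOARD = 2      # each K80 board carries two GPUs
RACK_UNITS = 42             # assumed full-height rack

gpus_per_server = BOARDS_PER_1U_SERVER * GPUS_PER_K80_BOARD
gpus_per_rack = gpus_per_server * RACK_UNITS

print(gpus_per_server)  # 8 GPUs in a single 1U server
print(gpus_per_rack)    # 336 GPUs in a fully packed rack
```

At hundreds of GPUs per rack, it is easy to see why packaging details matter so much for supercomputers and GPU server farms.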

Consumer-grade GPUs typically have active cooling with a fan that ingests airflow orthogonally to the longitudinal axis of the board. That requires space clearance to accommodate the cooling airflow. That leads to less dense configurations than in servers. Typical consumers do not care because they have computer cases with more vertical development, less need for density, and most users only have one GPU card per host.

Via : Daniele Paolo Scarpazza, https://www.quora.com/What-is-the-different-between-gaming-GPU-vs-professional-graphics-programming-GPU

My interests are around server-grade GPUs. Where can I find a server-grade GPU to explore?
Use AWS – AWS has G2, P2, and F1 instances with accelerator support; P2 generally fits my ecosystem very well. Refer to the presentation Deep Dive on Amazon EC2 Instances (slide 42 onwards). Google and Microsoft also provide similar instances. NVIDIA has a GPU Cloud, which I believe is in private beta.

How do I start programming GPUs?
To program a GPU, NVIDIA created a platform called CUDA about 10 years ago. One can also use OpenCL. But these are low-level, C-like languages and may not be for everyone. Are there higher-level abstractions available? Yes. Let us start looking from the database side.
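To get a feel for the programming model before touching CUDA or OpenCL themselves, here is a plain-Python sketch of the data-parallel “one thread per element” style a GPU kernel uses. The names and the `launch` helper are purely illustrative, not real CUDA APIs.

```python
# A CUDA-style vector addition, mimicked in plain Python.
# On a real GPU the body of the kernel runs once per thread, with
# idx derived from block/thread indices; here we simply loop.

def vector_add_kernel(idx, a, b, out):
    # Each "thread" touches exactly one element: no shared state and
    # no loop-carried dependency, so all iterations could run in parallel.
    out[idx] = a[idx] + b[idx]

def launch(kernel, n, *args):
    # Stand-in for a kernel launch over a grid of n threads.
    for idx in range(n):
        kernel(idx, *args)

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)
launch(vector_add_kernel, len(a), a, b, out)
print(out)  # [11.0, 22.0, 33.0, 44.0]
```

Work that decomposes this cleanly into independent per-element operations is exactly what GPUs accelerate well.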

Welcome to the world of GPU-powered databases. MapD and Kinetica are very promising in this space, and there are other databases like BlazingDB. Some of the benchmarks these vendors quote are in the range of 30x to 100x over a typical MPP database in the market.
How do you set up one of these databases? AMIs are available in the AWS Marketplace. Use the open-source one to play with (it will work out cheaper). Try MapD’s New York City taxi rides demo in your environment. When I saw the demo for the first time, I was speechless for some time.

Watch these talks by Todd Mostak, the founder of MapD. Very insightful.
The Rise of the GPU: How GPUs Will Change the Way You Look at Big Data
https://www.youtube.com/watch?v=mwpd13urFog&t=25s
The Promise of GPU Analytics
https://www.youtube.com/watch?v=qxWSSz8x6NI
On a side note, it is also important to understand why column stores work better in an in-memory world; I now understand that the same idea already powers CPU-based in-memory databases.
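The column-store point is easy to demonstrate: an aggregate over one attribute only has to scan that attribute’s values when data is laid out column-wise, instead of stepping through every full row. A rough stdlib sketch (the layouts and sample data are my own illustration):

```python
# Row layout: one tuple per record; an aggregate over "amount"
# still has to walk every full row.
rows = [
    ("2016-09-01", "alice", 120.0),
    ("2016-09-01", "bob",    75.0),
    ("2016-09-02", "alice",  60.0),
]
total_row_layout = sum(row[2] for row in rows)

# Column layout: each attribute is a contiguous list, so the same
# aggregate scans one dense array -- cache-friendly and easy to
# compress or vectorize, which is why in-memory stores favor it.
columns = {
    "date":   ["2016-09-01", "2016-09-01", "2016-09-02"],
    "user":   ["alice", "bob", "alice"],
    "amount": [120.0, 75.0, 60.0],
}
total_col_layout = sum(columns["amount"])

print(total_row_layout, total_col_layout)  # 255.0 255.0
```

Both layouts give the same answer; the difference is how much memory the scan has to touch, which is what dominates in-memory (and GPU) analytics.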

What else can we use GPUs for? How about visualization? If the visualization layer doesn’t support this kind of model, you still won’t get the interactivity you are looking for.
I liked the way MapD has done their visualization, taking the power of GPUs to the consumer end for rendering. Look at OpenGL, Vega, and D3.js, and you will be able to see how GPUs can be used for visualization.
http://developer.download.nvidia.com/presentations/2008/NVISION/NVISION08_MGPU.pdf
You may want to take a look at these JS libraries
https://stardustjs.github.io/
http://gpu.rocks/

What else can we do with GPUs? The real advantage of GPUs comes when we leverage their power for parallel processing, which makes them extremely useful for complex mathematical calculations and scenarios like deep learning.
Today all the major deep learning frameworks (TensorFlow, Deeplearning4j, H2O) support GPUs natively. All you need to do is install the framework on a GPU machine and establish the device mapping; TensorFlow then takes care of it automatically. Setting this up is straightforward. This model supports not only CPUs and GPUs but also TPUs (Google’s latest) or anything that comes in the future.

To understand more watch this video: Effective TensorFlow for Non-Experts
https://www.youtube.com/watch?v=5DknTFbcGVM

BTW, I run Keras with TensorFlow as a backend and use Python (PyCharm) on my machine. :-)

Conclusion
I believe this is a great technology and it is here to stay. In my opinion it is not CPU vs. GPU; it is GPGPU, and that will be the way forward. MapD has raised $25M and Kinetica $50M recently, which definitely shows the potential in this space. I also believe this model will bring back the scale-up model from our current scale-out model. Power consumption/energy conservation is one definite plus, along with reduced maintenance effort. Though the cost of the server (a P2 instance vs. a T2 instance) is high right now, that is not an apples-to-apples comparison, and the prices will come down over time.

One thing I liked most in the GPU-powered database world is that, since you can run these queries and get sub-second performance, you can move away from all the pre-computations and aggregations we are used to in the analytics world. It also doesn’t have the 32-concurrent-user restriction a typical MPP database may have. You will see a lot more traction in this area, with all the major companies moving in this direction and acquisitions happening.
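To make the pre-computation point concrete, here is a stdlib sqlite3 sketch: the same answer can come from a maintained summary table or from an on-the-fly aggregate over the raw rows, and a fast enough engine makes the second option viable. The schema and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (city TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?)",
    [("nyc", 12.5), ("nyc", 7.0), ("boston", 9.0)],
)

# The classic approach: maintain a pre-aggregated summary table
# and keep it in sync with every load.
conn.execute(
    "CREATE TABLE fare_summary AS "
    "SELECT city, SUM(fare) AS total FROM rides GROUP BY city"
)
precomputed = dict(conn.execute("SELECT city, total FROM fare_summary"))

# With sub-second scans, the same aggregate can simply run against
# the raw rows on demand -- no summary table to maintain.
on_the_fly = dict(
    conn.execute("SELECT city, SUM(fare) FROM rides GROUP BY city")
)

print(precomputed == on_the_fly)  # True: same answer, less plumbing
```

The pipeline of refresh jobs that keeps summary tables current is exactly the plumbing that fast scans let you drop.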

As always, it’s great to play on the bleeding edge. You get to learn new things on a regular basis.

Happy Learning!

Data Platform : Exploratory Data Analysis

Today, everyone talks about storing data in its raw format so that it can be analyzed to generate insights at a later point in time. Fantastic idea. Data lakes deliver exactly that promise. However, the complexity of data is increasing day by day, and new data sources are getting added on a regular basis.

You don’t get to deal every day with data sets you are familiar with. Considering the kinds of new data being added, it is very likely that you will end up dealing with data sets outside your comfort zone.
Data science teams spend most of their time exploring and understanding data.

  • If you must deliver some quick insights on a set of data, will you go through it manually, or can we build something within the data lake that helps?
  • What would be an easy way for the data science team to quickly understand a data set, its patterns, and its relationships, so that we could generate some hypotheses?

Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to

  1. Maximize insight into a data set;
  2. Uncover underlying structure;
  3. Extract important variables;
  4. Detect outliers and anomalies;
  5. Test underlying assumptions;
  6. Develop parsimonious models; and
  7. Determine optimal factor settings.

Via : http://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm
Exploratory Data Analysis : http://www.statgraphics.com/exploratory-data-analysis
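As a small, concrete instance of goal 4 above, here is a stdlib sketch of IQR-based outlier detection, one of the standard EDA techniques. The sample data and the 1.5×IQR rule-of-thumb threshold are illustrative.

```python
import statistics

def iqr_outliers(values):
    """Flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

readings = [10.1, 9.8, 10.4, 10.0, 9.9, 10.2, 42.0]
print(iqr_outliers(readings))  # [42.0]
```

In practice a tool (or a boxplot) does this for you, but knowing what the flagging rule actually computes helps when you have to explain why a point was excluded.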

If you Google exploratory data analysis, you will get tons of material about doing EDA using R or Python.

Considering the shortened timelines, if you expect your data science teams to develop code just to understand the data, you may not be able to deliver value at the speed your business expects.

There are a couple of tools which may help you understand your data faster. Google “data profiling” and you will get tons of results on the topic. My favorite tools right now are

  1. Trifacta Wrangler
  2. Exploratory.io

Both are easy to use with a simple user interface, and you can use the free versions to get started. If you have an automated data pipeline using Spark, you can also generate profile statistics about the incoming data and store them as part of your catalog.
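What those profile statistics might look like can be sketched in a few lines of stdlib Python; a Spark job would compute the same kind of per-column record, just distributed. The columns and the particular stats chosen here are my own minimal example.

```python
import statistics

def profile_column(name, values):
    """Minimal per-column profile: the kind of record a pipeline
    could compute on ingest and store in the data catalog."""
    present = [v for v in values if v is not None]
    profile = {
        "column": name,
        "count": len(values),
        "null_ratio": 1 - len(present) / len(values),
        "distinct": len(set(present)),
    }
    # Add numeric summaries only when the column is numeric.
    if present and all(isinstance(v, (int, float)) for v in present):
        profile.update(
            min=min(present), max=max(present),
            mean=statistics.mean(present),
        )
    return profile

fares = [12.5, 7.0, None, 9.0]
print(profile_column("fare", fares))
```

One such record per column, per file, is already enough metadata to make a data lake searchable by quality rather than just by name.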

I really like this presentation on the topic: Data Profiling and Pipeline Processing with Spark.

Once you do this with Spark, you may want to update the data profile information and store it as part of your catalog. If you index your catalog with Elasticsearch, you can provide an API for your data science teams to search for files with certain quality attributes.
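The “search by quality attribute” idea then reduces to a filter over those catalog records; Elasticsearch would do this server-side at scale, but the shape of the query is simply the following (field names and entries are my own illustration):

```python
# Given catalog entries carrying per-column profile stats, let data
# scientists ask for files meeting a quality bar -- e.g. columns
# whose null ratio stays below a threshold.

catalog = [
    {"file": "rides/2016-09-01.csv", "column": "fare", "null_ratio": 0.02},
    {"file": "rides/2016-09-02.csv", "column": "fare", "null_ratio": 0.40},
]

def files_with_quality(catalog, column, max_null_ratio):
    return [
        entry["file"] for entry in catalog
        if entry["column"] == column
        and entry["null_ratio"] <= max_null_ratio
    ]

print(files_with_quality(catalog, "fare", 0.05))
# ['rides/2016-09-01.csv']
```

Exposing this as an API saves the data science team from opening each file to discover it is too dirty to use.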

The above tools will help you get a quick understanding of your data. But what if you want pointers on where to start analyzing it? A profiler alone will not help in that case. You may want to explore this product from IBM (yeah… you heard that right… it’s from IBM and I am using it daily). Check it out here: IBM Watson Analytics.

Watson Analytics is a SMART discovery service, and it is super smart. It is available at $80/user/month; for the value you get out of it, that is really nothing.
You can use it for data exploration and predictive analytics, and it is effortless. A free one-month subscription is available for you to play with.

I have looked around at various products and couldn’t find anything close to what Watson offers. If I have to mention a drawback, it doesn’t provide connectivity to S3; you may have to connect to PostgreSQL or Redshift to extract data.

If you can integrate it into your platform and use it effectively, you will be able to add value for your customers in literally no time.

Happy Learning!

Getting Started with Serverless Architecture


Serverless architecture is relatively new. I’ve been exploring it for our new platform architecture of late. Though it is very interesting, there is obviously a reasonable learning curve, and I don’t see a lot of best practices out there yet.

Everything looks greener on the other side… we will learn as we move forward.

Since we use AWS as our cloud provider, most of the examples you will see relate to AWS Lambda.

Specific Reasons for exploring Serverless Architecture 

  1. No operating systems to choose, secure, patch, or manage.
  2. No servers to right size, monitor, or scale out.
  3. No risk to your cost by over-provisioning.
  4. No risk to your performance by under-provisioning.

https://d0.awsstatic.com/whitepapers/AWS_Serverless_Multi-Tier_Architectures.pdf
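Since the examples here revolve around AWS Lambda, a minimal Python handler shows the programming model those four points buy you: no server or OS in sight, just a stateless function invoked per event. The event shape and names below are my own example, not from the whitepaper.

```python
import json

def handler(event, context):
    """Minimal AWS Lambda-style handler: stateless and single-purpose.
    The platform provisions, scales, and bills per invocation."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Locally, the handler is just a function call with a test event:
response = handler({"name": "serverless"}, None)
print(response["body"])  # {"message": "hello, serverless"}
```

Because the function holds no state between invocations, the platform is free to run zero or a thousand copies of it, which is what makes points 2 through 4 in the whitepaper’s list possible.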

One thing I learnt in the last few years about developing distributed applications is that it is not about learning new things… it is always about unlearning what you have done in the past.

If you are concerned about vendor lock-in, then this may not be a choice for you at all…

Following is my reading list on Serverless Architecture.

What is Serverless?
https://auth0.com/blog/what-is-serverless/

Serverless Architectures
http://martinfowler.com/articles/serverless.html

What is Serverless Computing and Why is it Important?
https://www.iron.io/what-is-serverless-computing/

Serverless Architecture in short
https://specify.io/concepts/serverless-architecture

Is “Serverless” architecture just a finely-grained rebranding of PaaS?
http://www.ben-morris.com/is-serverless-architecture-just-a-finely-grained-rebranding-of-paas/

IAAS, PAAS, Serverless.
https://read.acloud.guru/iaas-paas-serverless-the-next-big-deal-in-cloud-computing-34b8198c98a2#.m9us1c5fe

Serverless Delivery: Architecture
https://stelligent.com/2016/03/17/serverless-delivery-architecture-part-1/

Principles of Serverless Architectures
There are five principles of serverless architecture that describe how an ideal serverless system should be built. Use these principles to help guide your decisions when you create serverless architecture.
1. Use a compute service to execute code on demand (no servers)
2. Write single-purpose stateless functions
3. Design push-based, event-driven pipelines
4. Create thicker, more powerful front ends
5. Embrace third-party services
https://dzone.com/articles/serverless-architectures-on-aws

Serverless Architectures – Building a Serverless system to solve a problem
https://serverless.zone/serverless-architectures-9e23af71097a#.j9z60nxw1

Serverless architecture: Driving toward autonomous operations
https://www.slalom.com/thinking/serverless-architecture

Serverless Developers
https://serverless-developers.com/

The essential guide to Serverless technologies and architectures
http://techbeacon.com/essential-guide-serverless-technologies-architectures

Using AWS Lambda and API Gateway to create a serverless schedule
https://www.import.io/post/using-amazon-lambda-and-api-gateway/

Five Reasons to Consider Amazon API Gateway for Your Next Microservices Project
http://thenewstack.io/five-reasons-to-consider-amazon-api-gateway-for-your-next-microservices-project/

AWS Lambda and the Evolution of the Cloud
https://blog.fugue.co/2016-01-31-aws-lambda-and-the-evolution-of-the-cloud.html

SquirrelBin: A Serverless Microservice Using AWS Lambda
https://aws.amazon.com/blogs/compute/the-squirrelbin-architecture-a-serverless-microservice-using-aws-lambda/

A Crash Course in Amazon Serverless Architecture
http://cloudacademy.com/blog/amazon-serverless-api-gateway-lambda-cloudfront-s3/
AWS Lambda and Endless Serverless Possibilities
https://abhishek-tiwari.com/post/aws-lambda-and-endless-serverless-possibilities

Awesome Serverless – A Curated List
https://github.com/JustServerless/awesome-serverless

Happy Learning!

Data Infrastructure, Data Pipeline and Analytics – Reading List – Sep 27, 2016

Splunk vs ELK: The Log Management Tools Decision Making Guide
Much like promises made by politicians during an election campaign, production environments produce massive files filled with endless lines of text in the form of log files. Unlike election periods, they’re doing it all year around, with multiple GBs of unstructured plain text data generated each day.
http://blog.takipi.com/splunk-vs-elk-the-log-management-tools-decision-making-guide/

Building a Modern Bank Backend
https://monzo.com/blog/2016/09/19/building-a-modern-bank-backend/

An awesome list of microservices architecture principles and technologies.
https://github.com/mfornos/awesome-microservices#api-gateways–edge-services

Stream-based Architecture
Part of the Stream Architecture Book. An excellent overview on the topic.
https://www.mapr.com/ebooks/streaming-architecture/chapter-02-stream-based-architecture.html

The Hardest Part About Microservices: Your Data
Of the reasons we attempt a microservices architecture, chief among them is allowing your teams to be able to work on different parts of the system at different speeds with minimal impact across teams. So we want teams to be autonomous, capable of making decisions about how to best implement and operate their services, and free to make changes as quickly as the business may desire. If we have our teams organized to do this, then the reflection in our systems architecture will begin to evolve into something that looks like microservices.
http://blog.christianposta.com/microservices/the-hardest-part-about-microservices-data/

New Ways to Discover and Use Alexa Skills
Alexa, Amazon’s cloud-based voice service, powers voice experiences on millions of devices, including Amazon Echo and Echo Dot, Amazon Tap, Amazon Fire TV devices, and devices like Triby that use the Alexa Voice Service. One year ago, Amazon opened up Alexa to developers, enabling you to build Alexa skills with the Alexa Skills Kit and integrate Alexa into your own products with the Alexa Voice Service.
http://www.allthingsdistributed.com/2016/06/new-ways-to-discover-and-use-alexa-skills.html

Happy Learning!