Functional programming, Clojure, Hiring Developers and some learnings…

I have been helping a good friend of mine hire Clojure developers for the last couple of months. Along the way, I started learning a thing or two about FP and Clojure.
I thought of sharing my experience of getting started with both.
One question that comes up frequently when I speak to my friends is: why functional programming? Though many people have tried answering this in various shapes and forms, I couldn’t find an answer that explains in 10 bullets why functional programming matters and how it is different.

The first place you may want to start is Uncle Bob’s NDC 2014 session, Functional Programming; What? Why? When?
https://www.youtube.com/watch?v=7Zlp9rKHGD4
A couple of posts/videos that I thought were worth reading:
https://www.javaworld.com/article/3314640/java-101-functional-programming-for-java-developers-part-1.html
https://medium.com/@ertu.ctn/why-clojure-seriously-why-9f5e6f24dc29
https://www.intertech.com/Blog/why-functional-programming-is-on-the-rise-again/
https://clojure.org/about/rationale
https://www.quora.com/What-is-so-great-about-functional-programming-What-are-the-main-points-of-it-and-why-are-they-useful

How do I get started?
If, like me, you are too lazy to set up anything on your local machine, use https://repl.it/. My son and I are both using repl.it to learn programming these days 🙂

Disclaimer: I have read only a few chapters in all these books. Just too many things to cover these days 😦

So far, it has not been very easy for me. It’s definitely new, and it takes a lot of unlearning. I am taking my time to get used to this style of programming. Maybe I will soon be able to write a few lines of code on my own without any help.

I saw someone talking about this here. https://medium.com/@cscalfani/so-you-want-to-be-a-functional-programmer-part-1-1f15e387e536

If there are so many good things about functional programming, and the salary is literally 2 or 3 times that of a Java developer, why do we not see enough functional programming developers in India or elsewhere?
Let me use the analogy of agile and Scrum. Scrum is not new, and we all at least pretend to know what agile is. However, when you want to communicate a plan to someone, you still prepare a milestone chart and conveniently call it a release plan.
Why does this happen? Is it a capability issue? IMO, it’s more of a mindset issue. Your management/leadership/VP of XYZ/sales team wants it in a milestone format, because that’s what they understand.
I believe the same applies here. If you try to make everything stateless, your tech lead/architect may start saying that either you don’t know programming or your program uses a lot more memory than the other developer’s. Sorry state… But unless the senior members understand FP, I believe adoption will be very slow, and this might be a universal syndrome. For the senior members to move, it takes a lot of unlearning. Just as Waterfall is in our blood, the imperative programming style is also in our blood.
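To make "stateless" a little more concrete, here is a minimal sketch. It is in Python rather than Clojure purely for illustration, and the function names and sample prices are made up. It contrasts a loop that mutates a running total with a fold over a pure function, and shows a function that returns a new list instead of changing its input.

```python
from functools import reduce


def total_imperative(prices):
    """Imperative style: a local variable is mutated on every iteration."""
    total = 0
    for price in prices:
        total += price
    return total


def total_functional(prices):
    """Functional style: fold a pure function over the input, no reassignment."""
    return reduce(lambda acc, price: acc + price, prices, 0)


def apply_discount(prices, percent):
    """Pure function: builds and returns a new list, the argument is not changed."""
    return [price * (100 - percent) // 100 for price in prices]


if __name__ == "__main__":
    prices = [100, 250, 40]          # hypothetical sample data
    discounted = apply_discount(prices, 10)
    print(prices)                    # the original list is left untouched
    print(discounted)
    print(total_imperative(prices), total_functional(prices))
```

The new list built by `apply_discount` is exactly the kind of extra allocation a reviewer might point to as "uses more memory". Languages designed for this style, Clojure included, reduce that cost with persistent data structures that share structure between versions.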

A couple of years back, I was involved in developing a data platform, and we used Scala as our language for data processing. It took us a while to get there, but we made significant progress over time. The moment I moved out of the product, they moved to Python, citing resource unavailability. In the end, it’s a mindset 🙂

In this process, I have met a few people who have been developing FP-style applications for some time and are training others. Happy to see a small community!

Some useful references
https://github.com/ClojureBridge/curriculum
https://www.seas.upenn.edu/~cis194/fall16/lectures/01-intro.html
https://devcenter.heroku.com/articles/clojure-web-application

Happy Learning!

Getting Started with Serverless Architecture

Serverless architecture is relatively new. I’ve been exploring it for our new platform architecture of late. Though it is very interesting, there is obviously a reasonable learning curve, and I don’t see a lot of best practices out there yet.

The grass always looks greener on the other side… We will learn as we move forward.

Since we use AWS as our cloud provider, most of the examples you will see are related to AWS Lambda.

Specific Reasons for exploring Serverless Architecture 

  1. No operating systems to choose, secure, patch, or manage.
  2. No servers to right size, monitor, or scale out.
  3. No risk to your cost by over-provisioning.
  4. No risk to your performance by under-provisioning.

https://d0.awsstatic.com/whitepapers/AWS_Serverless_Multi-Tier_Architectures.pdf

One thing I have learnt in the last few years about developing distributed applications is that it is not about learning new things… it is always about unlearning what you have done in the past.

If you are particular about avoiding vendor lock-in, then this may not be a choice for you at all…

Following is my reading list on Serverless Architecture.

What is Serverless?
https://auth0.com/blog/what-is-serverless/

Serverless Architectures
http://martinfowler.com/articles/serverless.html

What is Serverless Computing and Why is it Important?
https://www.iron.io/what-is-serverless-computing/

Serverless Architecture in short
https://specify.io/concepts/serverless-architecture

Is “Serverless” architecture just a finely-grained rebranding of PaaS?
http://www.ben-morris.com/is-serverless-architecture-just-a-finely-grained-rebranding-of-paas/

IAAS, PAAS, Serverless.
https://read.acloud.guru/iaas-paas-serverless-the-next-big-deal-in-cloud-computing-34b8198c98a2#.m9us1c5fe

Serverless Delivery: Architecture
https://stelligent.com/2016/03/17/serverless-delivery-architecture-part-1/

Principles of Serverless Architectures
There are five principles of serverless architecture that describe how an ideal serverless system should be built. Use these principles to help guide your decisions when you create serverless architecture.
1. Use a compute service to execute code on demand (no servers)
2. Write single-purpose stateless functions
3. Design push-based, event-driven pipelines
4. Create thicker, more powerful front ends
5. Embrace third-party services
https://dzone.com/articles/serverless-architectures-on-aws
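Principles 1 and 2 can be sketched as a single AWS Lambda function in Python. This is only an illustration under some assumptions: the function sits behind API Gateway with proxy integration (hence the `queryStringParameters` key in the event and the `statusCode`/`body` shape of the response), and the `build_greeting` helper and the `name` parameter are hypothetical. The `lambda_handler(event, context)` signature is the usual convention for Python handlers.

```python
import json


def build_greeting(name):
    """Single-purpose, stateless helper: output depends only on the input."""
    return {"message": f"Hello, {name}!"}


def lambda_handler(event, context):
    """Entry point invoked on demand by the compute service.

    Nothing is kept between invocations; everything needed arrives in `event`.
    """
    # With proxy integration the key is present but null when no query string is sent.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(build_greeting(name)),
    }


if __name__ == "__main__":
    # Local smoke test with a hand-written event; no AWS account is needed.
    fake_event = {"queryStringParameters": {"name": "Lambda"}}
    print(lambda_handler(fake_event, None))
```

Because the handler keeps no state of its own, it can be called locally with a hand-written event, which also makes it easy to unit test before deploying.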

Serverless Architectures – Building a Serverless system to solve a problem
https://serverless.zone/serverless-architectures-9e23af71097a#.j9z60nxw1

Serverless architecture: Driving toward autonomous operations
https://www.slalom.com/thinking/serverless-architecture

Serverless Developers
https://serverless-developers.com/

The essential guide to Serverless technologies and architectures
http://techbeacon.com/essential-guide-serverless-technologies-architectures

Using AWS Lambda and API Gateway to create a serverless schedule
https://www.import.io/post/using-amazon-lambda-and-api-gateway/

Five Reasons to Consider Amazon API Gateway for Your Next Microservices Project
http://thenewstack.io/five-reasons-to-consider-amazon-api-gateway-for-your-next-microservices-project/

AWS Lambda and the Evolution of the Cloud
https://blog.fugue.co/2016-01-31-aws-lambda-and-the-evolution-of-the-cloud.html

SquirrelBin: A Serverless Microservice Using AWS Lambda
https://aws.amazon.com/blogs/compute/the-squirrelbin-architecture-a-serverless-microservice-using-aws-lambda/

A Crash Course in Amazon Serverless Architecture
http://cloudacademy.com/blog/amazon-serverless-api-gateway-lambda-cloudfront-s3/

AWS Lambda and Endless Serverless Possibilities
https://abhishek-tiwari.com/post/aws-lambda-and-endless-serverless-possibilities

Awesome Serverless – A Curated List
https://github.com/JustServerless/awesome-serverless

Happy Learning!

Developing a Robust Data Platform : Key Considerations

Developing a robust data platform definitely requires more than HDFS, Hive, Sqoop and Pig. Today there is a real need to bring data and compute as close together as possible. More and more requirements are forcing us to deal with high-throughput/low-latency scenarios. Thanks to in-memory solutions, things definitely seem possible right now.

One of the lessons I have learnt in the last few years is that it is hard to resist developing your own technology infrastructure while developing a platform. It is always important to remind ourselves that we are here to build solutions and not technology infrastructure.

Some of the key questions that need to be considered while embarking on such a journey are:

  1. How do we handle the ever growing volume of data (Data Repository)?
  2. How do we deal with the growing variety of data (Polyglot Persistence)?
  3. How do we ingest large volumes of data as we start growing (Ingestion Pipelines/Write Efficient)?
  4. How do we scale in terms of faster data retrieval so that the Analytics engine can provide something meaningful at a decent pace?
  5. How do we deal with the need for Interactive Analytics with a large dataset?
  6. How do we keep our cost per terabyte low while taking care of our platform growth?
  7. How do we move data securely between on-premise infrastructure and cloud infrastructure?
  8. How do we handle data governance, data lineage, data quality?
  9. What kind of monitoring infrastructure would be required to support distributed processing?
  10. How do we model metadata so that we can address domain specific problems?
  11. How do we test this infrastructure? What kind of automation is required?
  12. How do we create a service delivery platform for build and deployment?

One of the challenges I am seeing right now is the urge to use multiple technologies to solve similar problems. Though this gives my developers the edge to do things differently/efficiently, from a platform perspective it increases the total cost of operations, and it raises a few more questions:

  1. How do we support our customers in production?
  2. How can we make the life of our operations teams better?
  3. How do we take care of the reliability, durability, scalability, extensibility and maintainability of this platform?

I will talk about the data repository and possible choices in the next post.

Happy Learning!

Scaling data operations with in-memory OLTP

Data has become the center of our universe in the modern digital world. Applications are designed to collect and store more and more data. Companies are looking to integrate and analyse that data to generate insights and take action.

Data is a precious thing and will last longer than the systems themselves ~ Tim Berners-Lee

Can an existing relational database scale to high ingestion rates and improved read performance?

In-Memory OLTP seems to be the way forward if you want to build on your existing technology investments. Of course, if the company is open to changing technology, there would be more options.

I found a couple of very good articles related to SQL Server In-Memory OLTP. It looks like SQL Server 2016 fixes most of the issues with In-Memory OLTP.

I just think it is an amazing technology, and if we use it in the right way, it will definitely yield great results for our customers.

Introducing SQL Server In-Memory OLTP
https://msdn.microsoft.com/en-in/library/dn133186.aspx
https://www.simple-talk.com/sql/learn-sql-server/introducing-sql-server-in-memory-oltp/
http://blog.sqlauthority.com/2014/08/08/sql-server-introduction-to-sql-server-2014-in-memory-oltp/

The Use Cases for SQL Server 2014 In-Memory OLTP
http://sqlturbo.com/the-use-cases-for-sql-server-2014-in-memory-oltp/

SQL Server In-Memory OLTP Internals Overview
https://msdn.microsoft.com/en-us/library/dn720242.aspx

The Promise – and the Pitfalls – of In-Memory OLTP
https://www.simple-talk.com/sql/performance/the-promise—and-the-pitfalls—of-in-memory-oltp/
https://msdn.microsoft.com/en-us/library/dn246937.aspx

SQL Server 2016 : In-Memory OLTP Enhancements
http://sqlperformance.com/2015/11/sql-server-2016/in-memory-oltp-enhancements

Speeding up Business Analytics Using In-Memory Technology
https://blogs.technet.microsoft.com/dataplatforminsider/2015/12/08/speeding-up-business-analytics-using-in-memory-technology/

Dynamic Data Masking in SQL Server 2016
http://www.codeproject.com/Articles/1084808/Dynamic-Data-Masking-in-SQL-Server
https://blogs.technet.microsoft.com/dataplatforminsider/2016/01/25/use-dynamic-data-masking-to-obfuscate-your-sensitive-data/

Happy Learning!

“Data is long-term, Applications are temporary.”

Think data first. Data is long-term, applications are temporary. I recently happened to read this in a blog post, and I couldn’t agree more. Data remains one of the most strategic projects for most companies.

Every fifth person you talk to, every other start-up you come across and every job posting has something or other to say about data, analytics etc. But when I speak to people in my ecosystem, a lot of them think it is only about doing cool stuff in R.

Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.

If someone has been an application developer for the last 10 years, can he/she suddenly become an expert in statistics and algorithms, and start calling himself/herself a Data Scientist? Maybe… Nothing is impossible. But if that were your passion, you wouldn’t have been an application developer for the last 10 years. Right?

Is there anything else one can learn and contribute in the data world? I thought of sharing a couple of valuable links that can give you a very good idea of the various aspects and where one can fit in.

#1 Will Balkanization of Data Science lead to one Empire or many Republics? Via http://www.kdnuggets.com/2015/11/balkanization-data-science.html
#2 Becoming a Data Scientist via http://nirvacana.com/thoughts/becoming-a-data-scientist/
#3 Difference between Data Engineering and Data Science via http://www.galvanize.com/blog/difference-between-data-engineering-and-data-science/
#4 The world of data science: Who does what in the data world? Via http://cloudtweaks.com/2015/11/booming-world-data-science/

Data is one of the hottest stacks right now, and it is growing at a crazy speed. It would be extremely difficult for any single individual to cope with this change unless one’s basics are right.

Once you have the basics right, it is about Meta learning and evolving from there.

Having worked on various large-scale data projects for the last 15 months, the following is my high-level list of items one needs to know to have a reasonable understanding of data (Big/Small). This list is in no specific order. 😦

General
  • A basic overview of Descriptive, Diagnostic, Prescriptive, Predictive and Cognitive Analytics: understand the concepts and how they differ
Data Warehouses
  • OLAP VS OLTP
  • Dimensional Modelling (Star Schemas, Snowflake Schemas)
  • Difference between Multi-Dimensional, Relational, Hybrid
  • In-Memory OLAP
No SQL Databases
  • CAP Theorem
  • If you are from application development, this is where the most important change will be. So far, you would have dealt primarily with key-value stores and document stores. For analytics purposes (write efficient), it is important to start understanding column databases (e.g. Cassandra) and graph databases (e.g. Neo4j). This is again a big shift from what you would have done as an application developer. Spend some time on it.
  • In-Memory databases in general.
  • Apart from Cassandra and Neo4j, get an understanding of what MemSQL offers. Yes, it is MemSQL and not MySQL 🙂 It seems very impressive.
Outside EDWs
  • MPPs/PDWs – Difference between traditional EDWs and MPPs?
  • DWH on cloud: AWS Redshift, Azure SQL Data Warehouse
Data Mining
  • What does it mean?
  • Data Mining Algorithms
Hadoop
  • Hadoop and Various Hadoop Components
  • When to use Hadoop?
  • Parallelization and Map Reduce Fundamentals
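The map, shuffle and reduce steps behind the last bullet can be sketched on a single machine in plain Python. This is only a toy word count to show the shape of the idea: the function names are mine, and nothing here uses Hadoop's actual API or runs in parallel.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Map: turn one input record into a list of (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big deal"]))
```

Parallelization comes from the fact that `map_phase` looks at one record at a time and `reduce_phase` at one key at a time, so a framework is free to spread both over many machines.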
Outside Hadoop
  • Difference between Hadoop, Spark and Storm (I personally prefer Spark. RDDs give me the same comfort that I had with ADO.NET)
  • When to use Hadoop/Spark/Storm over MPP?
ETL
  • Data Munging/Wrangling
  • Scrubbing
  • Transforming
  • Reading and Loading Data
  • Exception Handling
  • Jobs/Tasks
Real-time Analytics
  • Working with streams: real-time analytics is something everyone talks about, but without understanding what stream processing means, you will never be able to figure it out.
If you come from an application background, start with:

  • Reactive Architecture (Responsive, Resilient, Elastic and Message driven)
  • Understand the difference between an Event and a Transaction.
  • Event Processing(CQRS, Actor Model[Akka], Complex Event Processing)

If you don’t understand the above, it will be difficult to move forward. Spend time on these before moving on to the other items.
Messaging/Data bus

  • Kafka

Processing Streams

  • Spark/Storm

Lambda Architecture

Machine Learning

  • Difference between Data Mining and Machine Learning
  • ML Algorithms

A couple of very good posts to read on this:
Machine Learning for Programmers: Leap from developer to machine learning practitioner via http://machinelearningmastery.com/machine-learning-for-programmers/
What Every Manager Should Know About Machine Learning via https://hbr.org/2015/07/what-every-manager-should-know-about-machine-learning
Most of what we are doing can be achieved at some level using Excel’s Analysis ToolPak. In fact, I would say Excel is the most powerful tool out there.

Recommendation Engines
  • Collaborative Filtering
  • Content-based Filtering
  • Hybrid

Once you are clear on the concepts, start implementing them using Apache Mahout.
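Before reaching for a library, the idea behind collaborative filtering fits in a few lines. Below is a rough user-based sketch in plain Python: users are compared by the cosine similarity of their ratings, and items the target user has not rated are scored by a similarity-weighted average. The user names, ratings and function names are all hypothetical, and this is not Mahout's API (Mahout is a JVM library).

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts; unrated items count as 0."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = sqrt(sum(value ** 2 for value in a.values()))
    norm_b = sqrt(sum(value ** 2 for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target, ratings, top_n=2):
    """Rank items the target has not rated by a similarity-weighted average."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine_similarity(ratings[target], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item in ratings[target]:
                continue  # only score items the target has not rated yet
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


if __name__ == "__main__":
    # Made-up ratings on a 1-5 scale.
    ratings = {
        "asha": {"A": 5, "B": 4},
        "ravi": {"A": 5, "B": 5, "C": 4, "D": 1},
        "meena": {"A": 1, "C": 2, "D": 5},
    }
    print(recommend("asha", ratings))
```

Content-based filtering would instead compare item attributes (genre, tags, text) with what the user already liked, and a hybrid combines both signals.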

Communication Protocols
  • JSON, Avro, Protocol Buffers and Thrift: if you are from application development, you would have used JSON extensively. It is time to understand the other ones as well. I keep arguing about this with my friend Sendhil (IMO, Avro seems to be the way to go where things are evolving and there is a need for self-documentation; it is cowboy friendly).
Time Series
  • Modelling
  • Databases (OpenTSDB)
  • Forecasting
  • Trend Analysis
Modern day HOLAP Engines
  • Apache Kylin (My favourite at this point)
Data Visualization
Self-service is the mantra here. Read this article: Data Scientists Should be Good Storytellers

“Most of the people in an organization cannot understand the outcome of analytics, however they do need the proof of analysis and data. Data storytellers incorporate data and analytics in a compelling way as their stories involve real people and organizations” via https://dzone.com/articles/data-scientists-should-be-good-storytellers

  • How to represent data (Graphs/Charts)?
  • Excel Power Pivot/ Power BI (Polybase)
  • Lumira
  • D3.js
Deep Learning
Though it may or may not be important at this point, try to understand what deep learning is. Read this: Deep Learning in a Nutshell: Core Concepts via http://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-core-concepts/
Data Lake
One of my favorite topics, and something I learnt after burning my hands, is the data lake.

  • Understand what a data lake means. Why do you need one? How do you build a data lake on your own?
  • Extract Load and Transform (ELT)
  • ELT vs ETL

Read this: https://azure.microsoft.com/en-in/solutions/data-lake/

Language
Though there are a bunch of things you can do with Python, R, Java etc., my choice is Scala (I love how expressive the language lets you be. Wish someone could afford me as a developer again 🙂)

If you have a good grasp of the above, then it is time for you to figure out when to use what (Creating Solutions).

 “If all you have is a hammer, everything looks like a nail”

Read this:  The Ethics of Wielding an Analytical Hammer via http://sloanreview.mit.edu/article/the-ethics-of-wielding-an-analytical-hammer/

Data is having an impact on business models and profitability. It’s hard to find a non-trivial application that doesn’t use data in a significant manner ~ Ben Lorica, O’Reilly Media

Ok, this looks like a large list. Where do I start?

  1. Focus on the basics. Get a good overview of the ecosystem
  2. Decide your area of specialization.
  3. Focus on your specialization and build skills.
  4. Iterate and change course as required.
  • If you have more than 10 years of experience, understand the business situation and figure out when to use what. Maybe pick 1 or 2 items and start implementing them in your environment.
  • If you have less than 10 years of experience, pick a scenario, try to implement it and see if it makes any business sense.

What have I not covered in the list? I haven’t gone into the details of:

  1. Hadoop Ecosystem and components (Pig/Hive etc.)
  2. Algorithms
    1. Nearest Neighbour
    2. K-Means Clustering
    3. Linear Regression
    4. Decision Trees etc.
  3. R in detail
  4. Infrastructure
    1. Env Setup
    2. Zookeeper, Yarn, Mesos
    3. Replication
  5. Vertical Industry Solutions
  6. Operational Systems (like Splunk)
  7. Data Governance

I keep hearing/seeing people who have never seen more than 1 GB of data saying that they do Big Data Analytics. Don’t learn or do something for the sake of doing it.

There is no short cut to a place worth going.

My favorite books on this topic.

If you want to know more about what I am learning, you can follow me on Twitter.

Happy Learning!

Microservices : Reading List

Modern-day businesses require agility to survive and to lead. If you translate this business requirement into a technology requirement, it means X deploys a day (time to market).

The big, bloated, complex applications that we have built over time do not allow us to meet this X deploys a day without compromising quality. If there is a way to decompose the big, bloated monolith into smaller chunks, it will help the business to extend, manage and deploy them, and eventually X deploys a day could become a reality.

How do we get there? Is there a way to achieve this? Microservices (lots of small applications) are one of the ways that could help in achieving this.

Microservices means developing a single, small, meaningful functional feature as single service, each service has its own process and communicate with lightweight mechanism, deployed in single or multiple servers.
Source
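To give a feel for how small such a service can be, here is a minimal sketch using only Python's standard library: one functional feature (a price lookup), running in its own process, talking JSON over HTTP as the lightweight mechanism. The price catalogue, the handler name and the port are made up for illustration; a real service would more likely use one of the frameworks listed further down.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory catalogue; a real service would own its own data store.
PRICES = {"apple": 30, "banana": 10}


def lookup_price(item):
    """The single feature of this service: return (HTTP status, JSON payload)."""
    if item in PRICES:
        return 200, {"item": item, "price": PRICES[item]}
    return 404, {"error": f"unknown item: {item}"}


class PriceHandler(BaseHTTPRequestHandler):
    """Exposes lookup_price over plain HTTP, e.g. GET /apple."""

    def do_GET(self):
        status, payload = lookup_price(self.path.strip("/"))
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8080):
    """Start the service in this process; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PriceHandler).serve_forever()
```

Calling `run()` starts the service, after which a GET request to http://127.0.0.1:8080/apple should return the JSON payload for that item. Keeping the feature in a plain function (`lookup_price`) separate from the HTTP plumbing means it can be tested without starting a server.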

Additional Reading List
The Twelve-Factor App
http://12factor.net/

Microservices Reading List
http://www.mattstine.com/microservices

Understanding Microservices
http://kpbird.com/2014/11/Monolithic-vs-MicroService-Architecture/
http://shakayumi.tumblr.com/post/95688359079/whats-the-big-idea-with-microservices
http://kpbird.com/2014/06/Microservice-Architecture-A-Quick-Guide/
http://www.infoq.com/articles/microservices-intro
http://www.slideshare.net/mstine/microservices-cf-summit
http://java.dzone.com/articles/microservice-architecture
http://tech.gilt.com/post/35711763311/how-gilt-com-give-came-to-be

Microservices Architecture and Scalability
http://www.pst.ifi.lmu.de/Lehre/wise-14-15/mse/microservice-architectures.pdf
http://technologyconversations.com/2015/01/26/microservices-development-with-scala-spray-mongodb-docker-and-ansible/

Microservices Patterns
http://blog.arungupta.me/microservice-design-patterns/
http://microservices.io/patterns/index.html

Simon Brown’s Video : Software Architecture & Balance with Agility
https://vimeo.com/user22258446/review/79382531/91467930a4

Books
Building Microservices
Software Architecture for Developers

Frameworks
http://gilliam.github.io/concepts.html
http://projects.spring.io/spring-boot/
http://fabric8.io/
http://azure.microsoft.com/en-us/campaigns/service-fabric/

Technology Ecosystem for the Modern Day Business Application Developer

Technology is changing at a rapid pace. Every day you see something new to learn that did not exist a few months back. If, like me, you come from an application development background, what does this change mean to you?

For sure, this is not for gyan. I tried depicting this in a form that I could use as a reference. I have purposely not included desktop applications. If you work on them, you may want to include them for yourself. Obviously, this may change when we revisit it in a couple of months.

Similarly, things like Programming Languages (Java, C#, Ruby), OOPS Concepts, TDD, SOLID Principles are foundations.

Technology Ecosystem

Is this Perfect? Not Necessarily. This is my version and you may have a different way of visualizing this. If you create one, please do share it with me 🙂

Did I cover all aspects? Not really. Take Analytics as an example. If you take Descriptive Analytics, you start looking at traditional Business Intelligence, Data Warehousing, Data Visualization etc. Each one is a separate block diagram on its own. Hence, I have stopped at a very high level here.

Can I be a master of all this? It may not be possible. But if we are to call ourselves techies, then we at least need to know what these are and where we can use them, and maybe pick a couple of items that interest us and master them.

Happy Learning!!!