Most of the time when a warehouse transforms into a datalike, question arises on how to normalize/denormalize the data. As an intro you can read an article, and follow the guidelines for deploying an IBM Industry Model to Hadoop.
To get an insight into a new highload big data platform you can read a blogpost on the keen website that highlights the architecture of several giants from a 10000 ft view.
Again serverless architectures are coming, the number of companies adopting serverless architectures are increasing. A good question related to skills for serverless development is covered in this high scalability blog post – whether serverless is the new VISUAL BASIC.
Yet another educational cassandra article that explains the dynamics that happen within a single node. A good place to start with your journey or refresh your knowledge.
Building robust systems is not a problem anymore, even fulfilling existing compliances and regulatory rules. It’s especially important in the era of big data analytics and data crunching. Check an example of serverless architecture working with healthcare and medical data.
Apache Beam version 2.0 has been released.
ML&DS
NLU/NLP is still an area which lacks deep learning, there are movements towards employing different architectures, though the results are not yet impressive. However there is a new library one can start using for initial data processing – SpaCy.
This area of data science is hot and affected by the hype around Machine Learning and Data Science. It brings new tools that are designed and implemented to ease the life of modern data scientists, while they perform data crunching. We encourage you to read a bit on data version control and how it improves a typical workflow.
A bit of fun with deep colorization – a ready to use model and Python notebook to colour grayscale images. Results are impressive and might give a new life for your grandpa’s photos.
Check also the articles about security and Machine Learning & security related topics.
Performance
As usual Brendan Gregg publishes deep insights into the performance of your system, this time he highlights a topic related to the CPU utilization and how one can misinterpret it.
And as a conclusion, check a real world case study from the telecom industry on improving spark streaming job performance.