I write about Technology, Classical Music, Personal Development, Finance, and the Workplace. 1x Engineer on weekdays. https://linktr.ee/kovid
Photo by Alexandre Debiève on Unsplash.

Inspired by Tim Denning, I am pinning this to my profile and will update it weekly.

This Week’s Most Viewed Story (9th January 2021)

My All-Time Most Viewed Stories

  1. Meet Google Meena
  2. 4 Advanced SQL Features You Haven’t Used Enough
  3. State of the Art Infrastructure as Code
  4. How To Avoid Writing Sloppy SQL
  5. Git Best Practices For SQL

You can find more of my work on Linktree, or you can get in touch with me directly on LinkedIn or via email. If none of that works for you, you can always buy someone (me) a coffee!


Photo by Jason Briscoe on Unsplash

Getting Started, DATA ENGINEERING

A short hands-on tutorial on how to use SQL extensions built for time-series data in QuestDB

In this tutorial, you are going to learn about the QuestDB SQL extensions, which are very useful for working with time-series data. Using some sample data sets, you will learn how designated timestamps work and how to use the extended SQL syntax to write queries on time-series data.

Introduction

Traditionally, SQL has been used for relational databases and data warehouses. In recent years there has been an exponential increase in the amount of data that connected systems produce, which has brought about a need for new ways to store and analyze such information. …


Photo by NASA on Unsplash

DATA ENGINEERING

A short introduction on how to store, process, analyze, and visualize geospatial data using AWS

With databases, warehouses, analysis, and visualization tools extending their capability to handle geospatial data, we are in a better position than ever to capture and use GIS data. While individual products supported GIS, cloud providers didn’t offer complete GIS solutions early on. With time, that has changed a lot. Cloud providers like AWS have extended support wherever it was required. Keep in mind that the products a software company like AWS chooses to build are driven directly by the requirements of its clients, usually the high-paying ones.

Now that a lot of companies see the need for geospatial data and are able to acquire and store it at a reasonably low cost (as location data is omnipresent and storage has gotten cheaper), we have seen wider adoption of geospatial databases, analysis, and visualization tools. Although AWS doesn’t have a full-fledged solution for geospatial data, it does provide some great features in different services. …


Photo by Luke Chesser on Unsplash

DATA ENGINEERING

A brief introduction to the time-series databases — InfluxDB, TimescaleDB, and QuestDB

A plethora of new databases has evolved from relational databases based on specific business requirements and use cases, from in-memory key-value stores to graph databases, and from geospatial databases to time-series databases. Each of these types of databases serves a specific use case where the general solution of using a relational database isn’t very efficient.

Although there are a lot of different types of databases, here we’re going to look at time-series databases — the databases required to handle time-series data.

Data that consists of successive measurements of something over a time interval is time-series data.

With the modernization of financial trading and with the advent of IoT, the need for time-series databases is evident. Stock and cryptocurrency prices change every second. To measure this changing data and to perform analysis on that data, we need an efficient way of storing and retrieving data. …


Photo by Kelly Sikkema on Unsplash

AD INFINITUM

A list of my writings on Medium for various publications this year

As Medium doesn’t yet have an authentic way to create a Table of Contents for private blogs or non-Medium-owned publications, one has to make do with a list of the kind I’m going to make here. The beginning of the end of 2020 is near. Finally, a year that has been hard on the world is coming to an end. Although 2021 is just a number and its arrival doesn’t mean that any of our problems from 2020 are going to go away with a change in dates, calendar years do make us put things in perspective.

This year I wrote about the things that surround me — technology and music. I also wrote a bit about personal finance but realized that I needed to educate myself more on that front. I have been reading a lot of books about personal finance lately. I have also been following some people like Aswath Damodaran, JL Collins, and Mr. Money Mustache. Reading and writing about personal finance is fun. And very important too. …


Photo by Chris Liverani on Unsplash

DATA ENGINEERING

How to ingest data into QuestDB using Sockets

Background

In this tutorial, you will learn how to ingest data into QuestDB using a Python script and QuestDB’s InfluxDB line protocol ingestion feature. You will also learn how QuestDB supports schemaless data and modifies table structure on-the-fly to support operational flexibility.

For anyone who is new to QuestDB, it is a high-performance, open-source time-series database. It has a wide range of use cases related to IoT, logging and application monitoring, financial trading, and more. It uses a relational model with column-oriented storage, supports SQL, and works best with append-only workloads.

Introduction to InfluxDB line protocol

With time-series data, the rate of ingestion can be very high and the effort that goes into parsing SQL insert statements really adds up. This is also true for any other format which requires a similar parsing effort, such as JSON. An alternative to this is to send data over sockets using a line protocol. The inspiration to use line protocol in databases comes from the networking world, where line protocols are commonly used. …
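To make the idea concrete, here is a minimal sketch of how a reading could be turned into one line of InfluxDB line protocol and written to a TCP socket. The helper names (`to_line`, `send_lines`), the `sensors` table, and its columns are hypothetical, the escaping is simplified, and the host and port (`localhost:9009`) are assumptions about a local QuestDB setup rather than values taken from the tutorial.

```python
import socket


def _escape_tag(text: str) -> str:
    # Tag keys, tag values, and field keys escape commas, equals signs, and spaces.
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _escape_measurement(text: str) -> str:
    # Measurement (table) names escape commas and spaces.
    return text.replace(",", "\\,").replace(" ", "\\ ")


def _format_field(value) -> str:
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # strings are double-quoted


def to_line(measurement, tags, fields, timestamp_ns=None) -> str:
    """Build one line: measurement,tag=value field=value timestamp."""
    head = _escape_measurement(measurement)
    for key, value in tags.items():
        head += f",{_escape_tag(key)}={_escape_tag(str(value))}"
    body = ",".join(
        f"{_escape_tag(key)}={_format_field(value)}" for key, value in fields.items()
    )
    line = f"{head} {body}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line + "\n"  # each record ends with a newline


def send_lines(lines, host="localhost", port=9009):
    # Assumed endpoint; not called here because it needs a running server.
    with socket.create_connection((host, port)) as sock:
        sock.sendall("".join(lines).encode("utf-8"))


line = to_line(
    "sensors",
    {"location": "london"},
    {"temperature": 21.5, "count": 3},
    1610000000000000000,
)
print(line, end="")
```

Each record is a single line of text with no statement to parse, which is why this format is cheaper to ingest at high rates than SQL inserts.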


Photo by Rajeshwar Bachu on Unsplash

AD INFINITUM

dbt’s main (probably only) competitor is now owned by Google

The two companies, Fishtown Analytics (the maker of dbt) and Dataform, are in the business of solving data transformation problems at scale. Both products aim to standardize data models that can be consumed across teams.

Although dbt is the more mature product of the two because of its community support and wide adoption, Dataform claimed a win yesterday when it was announced that Dataform will be joining Google Cloud.

Companies have realized that data transformation is one of the main unsolved pain points of the data engineering domain. Many companies, even traditional ETL companies, are trying to find better ways to solve the Transform in ETL. …


Photo by Gustas Brazaitis on Unsplash

DATABASES

A short guide for dealing with database indexes

Database indexes are often designed badly. The power that database indexes have is realized only if they are designed and used efficiently. Otherwise, an index is a sheer waste of disk space and a drag on database performance. You don’t want that, so let’s quickly go through some of the things that you need to do to design and use indexes properly.

Database indexes should be created for one reason only: to serve an existing or future load of queries on the database. That is, indexes have to be designed based on current or expected usage. …
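One way to check that an index actually serves a query is to compare the query plan before and after creating it. The sketch below uses Python's built-in `sqlite3` module purely for illustration; the `orders` table, its columns, the index name, and the `query_plan` helper are all hypothetical, and other databases expose their plans through their own `EXPLAIN` variants.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each plan row is (id, parent, notused, detail).
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

# The query load we expect: lookups by customer.
sql = "SELECT * FROM orders WHERE customer_id = ?"

before = query_plan(conn, sql, (42,))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, sql, (42,))  # the planner can now search the index

print("before:", before)
print("after: ", after)
```

If the plan does not change after the index is created, the index is not serving that query and is only costing disk space and write performance.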


Photograph by John Dempsie, c. 1978

AD INFINITUM

Earlier this year, I wrote about what Christopher Hitchens can teach us about writing. Here is a summary of his ideas in a few points:

  1. Is writing something you can’t not do?
  2. Find a voice
  3. Write more like the way you talk
  4. Don’t depend on booze
  5. Write to please yourself
  6. It matters not what you think, but how you think

Sam Harris, speaking at an event shortly after Christopher’s death, said:

The man had more wit and style and substance than a few civilizations I could name

I couldn’t agree more.


Photo by Ilya Pavlov on Unsplash

DATA ENGINEERING

Every team that takes Data seriously should get a DataOps person ASAP

It starts with a small team where a couple of data engineers (and database engineers) start catering to the requests of data analysts and scientists. Initially, the requests are simple. They don’t take much time to fulfil. As the team grows, the data engineers take on more and more work. A time comes when the data engineering team becomes a blocker for the business. How?

Data engineers start by catering to requests that are ad-hoc in nature. They deviate from their main responsibilities of developing new pipelines, adding new sources of data, fixing issues with old data, modelling the data in lakes and warehouses and so on. …
