My All-Time Most Viewed Stories
You can find more of my work here on Linktree, or you can get in touch with me directly on LinkedIn or via email. If none of that appeals, you can always buy someone (me) a coffee!
In this tutorial, you are going to learn about QuestDB SQL extensions, which prove very useful for working with time-series data. Using some sample data sets, you will learn how designated timestamps work and how to use the extended SQL syntax to write queries on time-series data.
Traditionally, SQL has been used for relational databases and data warehouses. In recent years there has been an exponential increase in the amount of data that connected systems produce, which has brought about a need for new ways to store and analyze such information. …
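To make those extensions concrete, here is a minimal sketch of querying QuestDB from Python over its REST API. It assumes a local instance on the default HTTP port 9000 and a hypothetical `trades` table whose designated timestamp column is `ts`; SAMPLE BY works here precisely because the designated timestamp keeps rows ordered by time.

```python
# A minimal sketch, assuming a local QuestDB instance with its HTTP API
# on the default port 9000 and a hypothetical `trades` table whose
# designated timestamp column is `ts`.
import json
import urllib.parse
import urllib.request

QUESTDB_EXEC = "http://localhost:9000/exec"

def run_query(sql: str) -> dict:
    """Send a SQL statement to QuestDB's REST /exec endpoint."""
    url = QUESTDB_EXEC + "?" + urllib.parse.urlencode({"query": sql})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# SAMPLE BY is one of QuestDB's time-series SQL extensions: because `ts`
# is the table's designated timestamp, rows are stored in time order and
# can be aggregated into fixed intervals without a GROUP BY.
result = run_query("SELECT ts, symbol, avg(price) FROM trades SAMPLE BY 1h")
print(result.get("dataset", []))
```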
With databases, warehouses, analysis, and visualization tools extending their capability to handle geospatial data, we are in a position to capture and use GIS data more than ever. While individual products supported GIS early on, cloud providers didn’t offer complete GIS solutions. With time, that has changed a lot; cloud providers like AWS have extended support wherever it was required. For a company like AWS, we need to understand that the choice of products it builds is directly driven by the requirements of its clients, usually the high-paying ones.
Now that many companies see the need for geospatial data and are able to acquire and store it at a reasonably low cost (location data is omnipresent and storage has gotten cheaper), geospatial databases, analysis, and visualization tools have seen wider adoption. Although AWS doesn’t have a full-fledged solution for geospatial data, it does provide some great features in different services. …
A plethora of new databases has evolved from relational databases based on specific business requirements and use cases: from in-memory key-value stores to graph databases, from geospatial databases to time-series databases. Each of these serves a specific use case where the general solution of a relational database isn’t very efficient.
Although there are a lot of different types of databases, here we’re going to look at time-series databases — the databases built to handle time-series data.
Time-series data consists of successive measurements of something over an interval of time.
With the modernization of financial trading and the advent of IoT, the need for time-series databases is evident. Stock and cryptocurrency prices change every second. To capture this constantly changing data and perform analysis on it, we need an efficient way of storing and retrieving it. …
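As a toy illustration of that definition, the sketch below builds a small series of successive (timestamp, price) measurements in plain Python; the ticker behavior and values are made up.

```python
# A toy illustration: a time series is just successive
# (timestamp, value) measurements over an interval.
from datetime import datetime, timedelta, timezone

start = datetime(2021, 1, 1, tzinfo=timezone.utc)

# One price observation per second for a hypothetical ticker.
series = [(start + timedelta(seconds=i), 100.0 + 0.1 * i) for i in range(5)]

for ts, price in series:
    print(ts.isoformat(), price)

# Typical time-series analysis operates over a window of recent points,
# e.g. the average price over the observed interval.
print("avg:", sum(p for _, p in series) / len(series))
```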
As Medium doesn’t yet have a native way to create a Table of Contents for private blogs or non-Medium-owned publications, one has to make do with a list of the kind I’m going to make here. The beginning of the end of 2020 is near. Finally, a year that has been hard on the world is coming to an end. Although 2021 is just a number and its arrival doesn’t really mean that any of our problems from 2020 will go away with a change in dates, calendar years do make us put things in perspective.
This year I wrote about the things that surround me — technology and music. I also wrote a bit about personal finance but realized that I needed to educate myself more on that front. I have been reading a lot of books about personal finance lately. I have also been following some people like Aswath Damodaran, JL Collins, and Mr. Money Mustache. Reading and writing about personal finance is fun. And very important too. …
In this tutorial, you will learn how to ingest data into QuestDB using a Python script and QuestDB’s InfluxDB line protocol ingestion feature. You will also learn how QuestDB supports schemaless data and modifies table structure on-the-fly to support operational flexibility.
For anyone who is new to QuestDB, it is a high-performance, open-source time-series database. It has a wide range of use cases related to IoT, logging & application monitoring, financial trading and more. It uses a relational model with column-oriented storage, supports SQL, and works best with append-only workloads.
With time-series data, the rate of ingestion can be very high and the effort that goes into parsing SQL insert statements really adds up. This is also true for any other format which requires a similar parsing effort, such as JSON. An alternative to this is to send data over sockets using a line protocol. The inspiration to use line protocol in databases comes from the networking world, where line protocols are commonly used. …
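As a preview of the approach, here is a minimal sketch of sending InfluxDB line protocol to QuestDB over a plain TCP socket from Python. It assumes a local instance listening on the default ILP port 9009; the `readings` table and its columns are hypothetical, and QuestDB creates them on the fly when the lines arrive.

```python
# A minimal sketch of schemaless ingestion over InfluxDB line protocol,
# assuming QuestDB is listening on its default ILP TCP port 9009. The
# `readings` table and its columns are hypothetical; the table (and any
# new columns) are created on the fly as the lines arrive.
import socket
import time

HOST, PORT = "localhost", 9009

# Each line: table,tag=value field=value timestamp-in-nanoseconds
now_ns = time.time_ns()
lines = (
    f"readings,city=London temperature=23.5 {now_ns}\n"
    # A second row with an extra column: no ALTER TABLE needed, the
    # schema is extended automatically.
    f"readings,city=Pune temperature=31.2,humidity=0.54 {now_ns}\n"
)

with socket.create_connection((HOST, PORT)) as sock:
    sock.sendall(lines.encode("utf-8"))
```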
The two companies, dbt (by Fishtown Analytics) and Dataform, are in the business of solving data transformation problems at scale. Both products aim to standardize data models that can be consumed across teams.
Although dbt is the more mature product of the two, thanks to community support and wide adoption, Dataform claimed a win yesterday when it was announced that Dataform will be joining Google Cloud.
Companies have realized that data transformation is one of the main unsolved pain points of the data engineering domain. Many companies, even traditional ETL companies, are trying to solve the Transform in ETL in better ways. …
Database indexes are often designed badly. The power of database indexes is realized only when they are designed and used efficiently; otherwise, an index is a sheer waste of disk space and a drag on database performance. You don’t want to waste disk space, so let’s quickly go through some of the things you need to do to design and use indexes properly.
Database indexes should be created for one reason only — to serve an existing or future load of queries on the database. That is, indexes have to be designed based on current or expected usage. …
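To see what designing for the query load means in practice, here is a small self-contained sketch using Python’s built-in sqlite3 module; the `orders` table, its columns, and the index name are made up for the example.

```python
# A small self-contained illustration using SQLite: create an index only
# because a known query needs it, then confirm the planner uses it.
# The table, columns, and index name are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT total FROM orders WHERE customer_id = ?"

# Before the index: the planner falls back to a full table scan.
print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())

# The index exists for one reason: to serve this known query load.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# After the index: the planner searches the index instead of scanning.
print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())
```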
Earlier this year, I wrote about what Christopher Hitchens can teach us about writing. Summarizing his ideas in a few points here:
Sam Harris, speaking at an event just after Christopher’s passing, said:
The man had more wit and style and substance than a few civilizations I could name
I couldn’t agree more.
It starts with a small team where a couple of data engineers (and database engineers) start catering to the requests of data analysts and scientists. Initially, the requests are simple and don’t take much time to fulfil. As the team grows, the data engineers take on more and more additional work. A time comes when the data engineering team becomes a blocker for the business. How?
Data engineers start by catering to requests that are ad-hoc in nature. They deviate from their main responsibilities of developing new pipelines, adding new sources of data, fixing issues with old data, modelling the data in lakes and warehouses and so on. …