Comparing the performance of pypolars and pandas

Photo by shiyang xu on Unsplash

pandas was first released in 2008 and is written in Python, Cython, and C. Today, we’re comparing the performance of this well-known library with pypolars, a rising DataFrame library written in Rust. We compare the two when sorting and concatenating a dataset of ~25 million records, and also when joining two CSVs.
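A comparison like this needs a consistent way to time each operation. Below is a minimal sketch of a timing helper using only the standard library. The helper name `time_call` and the synthetic usernames are assumptions for illustration, and plain `sorted` stands in for the DataFrame sort; in a real comparison you would pass the pandas or pypolars call instead.

```python
import random
import time


def time_call(func, *args, repeats=3):
    """Run func(*args) several times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


# Synthetic stand-in data (hypothetical, not the Kaggle file).
random.seed(0)
usernames = [f"user_{random.randrange(10**6)}" for _ in range(100_000)]

elapsed, ordered = time_call(sorted, usernames)
print(f"sorted {len(ordered)} names in {elapsed:.4f}s")
```

Taking the best of a few repeats reduces noise from other processes, which matters when the two libraries being compared differ by a small margin.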

Downloading Reddit Usernames data

Let’s first download a CSV file from Kaggle that contains ~26 million Reddit usernames:

Let’s also create another CSV file to use. You can create it with your favorite text editor or from the command line:

$ cat >> fake_users.csv


Now, let’s compare the sorting algorithm of the…

Photo by Theme Inn on Unsplash

Want to write a technical book or document your project?

Here I’ll walk you through how you can do that with Python and Sphinx. You can build a table of contents that contains the sections and chapters of your book. I will walk through a case study showing how to insert images, hyperlinks, mathematical formulas, syntax highlighting for your favorite programming languages, and more.

Assuming you have some basic knowledge of Python, let’s dive in and see what Sphinx can do for us.

What is Sphinx?

Sphinx is a documentation generator that can be used to generate documentation for your project and also to create content (e.g. …

Trends in your hand

Photo by Morning Brew on Unsplash

Social analytics companies use Twitter heavily to get insights about brands, celebrities, and trending topics. In this tutorial, you’ll learn how to find the countries that have trends on Twitter, see which topics are trending the most, and retrieve the URL and tweet volume for each.

Here I’ll walk you through how you can do that with Python and Tweepy. Tweepy can do much more than trending topics, but in this tutorial I will focus on getting trends. …

How to develop the habit of reading and the secret in the 3rd component

Photo by Henrikke Due on Unsplash

When you hear "baby steps", it might sound trivial, but it’s not. Baby steps (aka tiny habits) are super powerful, and they help people accomplish great things in the long run.

When you know how to create tiny habits, you can change your life forever.

~ BJ Fogg, PhD

Stanford University

BJ Fogg is the Director of the Stanford Persuasive Technology Lab and a social scientist who is currently a research scientist at Stanford University. In 2009, he published the Fogg Behavior Model (FBM), a model for analyzing and designing human behavior.

BJ explains the behavior model in a TEDx talk, which you can check out. …

A tutorial about cleaning a JSON file using the command-line program jq

Photo by DISRUPTIVO on Unsplash

jq is a lightweight command-line JSON processor written in C. It follows the Unix philosophy: it focuses on one thing and does it very well. In this tutorial, we see how jq can be used to clean JSON, retrieve the information you want, and get rid of what you don’t.

Some data is better suited to JSON than to CSV or any other format. Most modern APIs and NoSQL databases support JSON, and it is also useful when your data is hierarchical: a tree that can go to any depth, unlike CSV, which is two-dimensional and can only represent tabular data. …
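To make the "tree of any depth" point concrete, here is a minimal sketch in Python (not jq, which the tutorial itself uses) that walks a nested JSON document and drops undesired values. The sample document, its field names, and the helper `drop_nulls` are all hypothetical, and "undesired" is assumed to mean `null` values.

```python
import json

# Hypothetical nested document: objects inside objects, plus a list.
raw = """
{"user": {"name": "alice", "email": null, "tags": ["a", "b"],
          "profile": {"city": "Cairo", "bio": null}}}
"""


def drop_nulls(value):
    """Recursively remove null entries from dicts and lists at any depth."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value if v is not None]
    return value


cleaned = drop_nulls(json.loads(raw))
print(json.dumps(cleaned, indent=2))
```

The recursion mirrors the shape of the data: each level of nesting is one more call, which is something a flat, two-dimensional CSV row has no way to express.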

A tutorial about cleaning a large CSV file using the command-line programs csvkit and xsv, discussing their performance when sorting and concatenating

Photo by Yan Ots on Unsplash

In the last blog post, we showed that xsv is ~1882x faster than csvkit when joining two CSVs, saw how performant xsv is, and discussed when to use xsv or csvkit at the terminal.

Today, in part 2 of cleaning CSV data at the command line, we investigate a large CSV file (from Kaggle) that contains ~26 million users and the number of comments they posted from 2005 to 2017. We also covered cleaning text files in general using the command line, if you want to check that out.


You just need to install csvkit and xsv with the following…

An in-depth tutorial about cleaning a COVID-19 CSV file using the command-line programs csvkit and xsv, comparing the performance of each

Photo by CDC on Unsplash

Have you ever dealt with a big, scary CSV file that has many columns you don’t want and so many records that filtering for the information you need becomes slow?

This tutorial is about two command-line programs that can solve these problems: csvkit and xsv. We will compare the two at the end and see how performant each is and when to use one over the other in terms of speed, especially when processing a large CSV file. …
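The two problems described here, unwanted columns and unwanted records, can be sketched in Python with the standard `csv` module (this is a stand-in for illustration, not the csvkit or xsv commands the tutorial covers). The sample rows, the column names, and the helper `select_and_filter` are made up.

```python
import csv
import io

# Made-up sample data; column names and values are placeholders.
sample = io.StringIO(
    "region,date,count,notes\n"
    "A,2020-01-01,10,x\n"
    "B,2020-01-01,2000,y\n"
    "C,2020-01-02,1500,z\n"
)


def select_and_filter(handle, keep_columns, predicate):
    """Yield only rows matching predicate, reduced to keep_columns."""
    for row in csv.DictReader(handle):
        if predicate(row):
            yield {col: row[col] for col in keep_columns}


rows = list(
    select_and_filter(sample, ["region", "count"], lambda r: int(r["count"]) > 1000)
)
print(rows)
```

Because the helper is a generator, rows are read and filtered one at a time, so memory use stays flat even when the input file is large.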

Tips to help you get accepted in the software industry even if you don’t have a CS degree

Photo by Ben White on Unsplash

Last September I gave a talk at a Developer Student Club (DSC) event. DSC is a program powered by Google Developers, designed to help students learn web and mobile development skills and design thinking skills. The event covered personal branding, CV writing, and technical interviews. I talked about:

  • How to Build a Resume
  • How to Use LinkedIn Professionally
  • How I Landed a Full-Time Job Without a CS Degree

Here is the presentation if you want to have a look: career launch

In this blog post, I’m going to focus on the last point. I will try to deliver all the messages I gave in that talk and elaborate further, to help more people who are looking for jobs in the software industry land their dream jobs efficiently. I hope knowing these tips beforehand makes it easier for them, because I’ve been in this situation before. I consider myself lucky, but more importantly I was doing productive work that helped me get that opportunity. I’m looking forward to helping as many people as possible, whether you are a recent grad or someone who is not satisfied with their current position and wants a change. …

With Python and JavaScript solutions: Solving a medium LeetCode problem

Understanding the Problem

Minimum Numbers of Function Calls to Make Target Array is a medium problem on LeetCode. I hope you have read the problem carefully and tried solving it. The problem asks for the least number of operations needed to reach the desired array. The question now is how to perform these operations so that we reach the target array in the fewest possible steps.

Initial thoughts

Let’s take an example with [2, 2] as the target array: we want the minimum number of operations to reach it. We start with [0, 0] and add 1 to each element (2 operations so far), reaching [1, 1]. Then we multiply the whole array by 2 (1 operation), reaching the target [2, 2]. So we need 3 operations for this example, as shown…
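The counting in this example can be sketched in Python. This is one common way to solve the problem, not necessarily the approach the full post takes: each 1 bit in an element's binary form costs one increment, and the doublings are shared by the whole array, so only the largest element's bit length matters. The function name `min_operations` is an assumption.

```python
def min_operations(target):
    """Minimum operations to build target from zeros, where an operation
    either adds 1 to one element or doubles every element."""
    # Each set bit in an element requires exactly one "+1" on that element.
    increments = sum(bin(n).count("1") for n in target)
    # Doublings apply to the whole array, so the largest element decides:
    # a number with b bits needs b - 1 doublings.
    longest = max((n.bit_length() for n in target), default=0)
    doublings = max(longest - 1, 0)
    return increments + doublings


print(min_operations([2, 2]))  # 2 increments + 1 doubling = 3
```

For [2, 2], each element is binary 10 (one set bit, two bits long), which gives the 2 increments and 1 doubling counted in the walkthrough.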

A basic tutorial about cleaning data using command-line tools: tr, grep, sort, uniq, awk, sed, and csvlook

Photo by JESHOOTS.COM on Unsplash

Cleaning data is like cleaning the walls in your house: you wipe off any scribbles, remove the dust, and get rid of whatever makes the walls ugly. The same thing happens when cleaning your data: you keep what you want and remove what you don’t, so the raw data becomes useful and is no longer raw. …


Ezz El Din Abdullah

Data Engineer | data science, software engineering, self-help, and more at
