<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Tips, demos and solutions on important topics and practical issues in Python development, data analytics, data science and engineering]]></title><description><![CDATA[Tips, demos and solutions on essential topics and practical issues in Python development, data analytics, data science and engineering.]]></description><link>https://henryeleonu.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1671815184332/dWIw9LNJz.png</url><title>Tips, demos and solutions on important topics and practical issues in Python development, data analytics, data science and engineering</title><link>https://henryeleonu.com</link></image><generator>RSS for Node</generator><lastBuildDate>Mon, 20 Apr 2026 12:22:46 GMT</lastBuildDate><atom:link href="https://henryeleonu.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Importance of Writing Only A Single Task in a Function]]></title><description><![CDATA[I will be talking about the importance of writing only a single task in a function based on my recent experience in a project I was involved with. I was engaged in making some updates to simulation software for transport and logistics, and I noticed ...]]></description><link>https://henryeleonu.com/importance-of-writing-only-a-single-task-in-a-function</link><guid isPermaLink="true">https://henryeleonu.com/importance-of-writing-only-a-single-task-in-a-function</guid><category><![CDATA[Python]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Code Quality]]></category><category><![CDATA[software development]]></category><category><![CDATA[data-engineering]]></category><dc:creator><![CDATA[Henry Eleonu]]></dc:creator><pubDate>Sun, 07 May 2023 17:36:51 GMT</pubDate><content:encoded><![CDATA[<p>I will be talking about the importance of writing only a single task in a function based on my recent experience in a project I was involved with. I was engaged in making some updates to simulation software for transport and logistics, and I noticed that the transport and logistics department, which was supposed to use the software, was not using it because they couldn’t update it to meet their current needs and developers engaged previously had a hard time at maintaining the project. It was immediately apparent that the software was difficult to maintain because many tasks were lumped into each function, making the functions monolithic. It was surprising to learn that the company that developed the project is a well-known software consulting company, and as such, I did not expect them to deliver such poorly written code.</p>
<p>Apart from software projects meeting the functional and non-functional requirements of their deliverables, organisations should also set minimum standards for code quality. This is very important because poor code quality will ultimately have a negative impact on the maintainability of a software project. In this case, the solution to their problems was writing only a single task in a function.</p>
<p>Some of the benefits of doing this are:</p>
<ol>
<li><p>Reduces the code complexity of functions.</p>
</li>
<li><p>Reduces code coupling.</p>
</li>
<li><p>Makes code readable and easier to understand and maintain.</p>
</li>
<li><p>Improves code quality.</p>
</li>
<li><p>Encourages the adoption of test-driven software development.</p>
</li>
</ol>
<h2 id="heading-reduction-of-code-complexity-of-functions">Reduction of Code Complexity of Functions</h2>
<p>Writing only a single task in a function reduces the function's code complexity, most directly by reducing its number of lines of code. It also reduces complexity by limiting the nesting of conditional statements and loops. </p>
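<p>As a minimal, hypothetical sketch (the function and field names here are invented for illustration), compare a monolithic function with its single-task refactoring:</p>
<pre><code class="lang-python"># Monolithic: loads, validates and reports in one function.
def process_orders(path):
    with open(path) as f:
        rows = [line.strip().split(",") for line in f]
    valid = [r for r in rows if len(r) == 3 and r[2].isdigit()]
    total = sum(int(r[2]) for r in valid)
    print(f"{len(valid)} orders, {total} items")

# Refactored: one task per function.
def load_rows(path):
    with open(path) as f:
        return [line.strip().split(",") for line in f]

def keep_valid(rows):
    return [r for r in rows if len(r) == 3 and r[2].isdigit()]

def count_items(rows):
    return sum(int(r[2]) for r in rows)

def report(rows):
    print(f"{len(rows)} orders, {count_items(rows)} items")
</code></pre>
<p>Each refactored function can now be read, reused and tested on its own.</p>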
<h2 id="heading-reduce-code-coupling">Reduce Code Coupling</h2>
<p>It decouples tasks that would otherwise be coupled in one function into separate functions, which improves the reusability of those functions.</p>
<h2 id="heading-make-code-readable-easier-to-understand-and-maintain">Make Code Readable, Easier to Understand and Maintain</h2>
<p>Reducing code complexity and coupling are the major factors in improving code readability and understandability, and consequently code maintainability. </p>
<h2 id="heading-improve-code-quality">Improve Code Quality</h2>
<p>Improved readability, understandability and maintainability of code are key to improved code quality. </p>
<h2 id="heading-encourage-the-adoption-of-test-driven-software-development">Encourage The Adoption of Test-Driven Software Development</h2>
<p>When each function performs only a single task, it is easier to write a test for each task represented by a function. This makes it practical to adopt a test-driven software development approach, as in the sketch below.</p>
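<p>For example, a minimal pytest sketch against the hypothetical single-task functions from earlier (the module name orders is an assumption):</p>
<pre><code class="lang-python"># test_orders.py: each single-task function gets a small, focused test.
from orders import keep_valid, count_items  # hypothetical module name

def test_keep_valid_drops_malformed_rows():
    rows = [["1", "widget", "3"], ["bad row"]]
    assert keep_valid(rows) == [["1", "widget", "3"]]

def test_count_items_sums_quantities():
    assert count_items([["1", "widget", "3"], ["2", "bolt", "4"]]) == 7
</code></pre>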
]]></content:encoded></item><item><title><![CDATA[Some Best Practices for Improving Code Maintainability and Quality]]></title><description><![CDATA[I will discuss my experience while maintaining simulation software written in Python. I was engaged in making some updates to simulation software for transport and logistics, and I noticed that the transport and logistics department, which was suppos...]]></description><link>https://henryeleonu.com/some-best-practices-for-improving-code-maintainability-and-quality</link><guid isPermaLink="true">https://henryeleonu.com/some-best-practices-for-improving-code-maintainability-and-quality</guid><category><![CDATA[Code Quality]]></category><category><![CDATA[best practices]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Python]]></category><category><![CDATA[code maintainability]]></category><dc:creator><![CDATA[Henry Eleonu]]></dc:creator><pubDate>Sun, 07 May 2023 12:57:48 GMT</pubDate><content:encoded><![CDATA[<p>I will discuss my experience while maintaining simulation software written in Python. I was engaged in making some updates to simulation software for transport and logistics, and I noticed that the transport and logistics department, which was supposed to use the software, was not using it because they couldn’t update it to meet their current needs, and the developers engaged previously had a hard time maintaining the project. It was immediately apparent that the software was difficult to maintain because many tasks were lumped into each function, making the functions monolithic. It was surprising to learn that the company that developed the project is a well-known software consulting company, and as such, I did not expect them to deliver such poorly written code.</p>
<p>Apart from software projects meeting the functional and non-functional requirements of their deliverables, organisations should also set minimum standards for code quality. This is very important because poor code quality will ultimately negatively impact the maintainability of a software project. </p>
<p>When the project was no longer being used because it was difficult to update to meet current needs, it was regarded as a failed project by those in charge of transport and logistics. This shows the impact of not following software engineering best practices in software projects. The developers of the project failed to make sure, as much as possible, that each function in the software performed only one task. The functions had many lines of code, making the code very difficult to read and understand. The monolithic functions had higher complexity, which made them difficult to maintain. Also, because of the many tasks in each function, there was high code coupling, often resulting in spaghetti code that is hard to understand. Reusing the code for the many tasks embedded within each function was also impossible. And because many tasks were lumped into each function, it was difficult to write tests: any test for a function would have to exercise the functionality of all the tasks embedded in it simultaneously. This made it practically impossible to adopt a test-driven software development approach.</p>
]]></content:encoded></item><item><title><![CDATA[How To Encourage Best Practices in Python Programming By Complying With PEP8 Style Guide]]></title><description><![CDATA[As part of this blog post, I have added a YouTube demo on how to enable PEP8 compliance in Visual Studio Code.
Enabling PEP8 Compatibility of Python Code in Visual Studio Code - YouTube
https://www.youtube.com/watch?v=ZkwHwQ6l4wI
 
Some of the best p...]]></description><link>https://henryeleonu.com/how-to-encourage-best-practices-in-python-programming-by-complying-with-pep8-style-guide</link><guid isPermaLink="true">https://henryeleonu.com/how-to-encourage-best-practices-in-python-programming-by-complying-with-pep8-style-guide</guid><category><![CDATA[pep8]]></category><category><![CDATA[type checking]]></category><category><![CDATA[Python]]></category><category><![CDATA[vscode extensions]]></category><category><![CDATA[Visual Studio Code]]></category><dc:creator><![CDATA[Henry Eleonu]]></dc:creator><pubDate>Mon, 17 Apr 2023 14:28:10 GMT</pubDate><content:encoded><![CDATA[<p>As part of this blog post, I have added a YouTube demo on how to enable PEP8 compliance in Visual Studio Code.</p>
<p><a target="_blank" href="https://www.youtube.com/watch?v=ZkwHwQ6l4wI">Enabling PEP8 Compatibility of Python Code in Visual Studio Code - YouTube</a></p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=ZkwHwQ6l4wI">https://www.youtube.com/watch?v=ZkwHwQ6l4wI</a></div>
<p> </p>
<p>Some of the best practices in Python programming are:</p>
<ol>
<li><p>Comply with PEP8 conventions.</p>
</li>
<li><p>Enforce data type checking.</p>
</li>
<li><p>Autogenerate docstrings.</p>
</li>
<li><p>Install autopep8 for code formatting.</p>
</li>
</ol>
<h1 id="heading-complying-with-pep8-conventions">Complying With PEP8 Conventions</h1>
<p>The key benefit of following best practices and conventions in Python programming is improved readability and consistency of code. Readability is very important because code is read much more than it is written, and readable code is easier to maintain. Following the standards in PEP8 will go a long way towards improving the quality of the code we write. PEP8 provides a style guide for writing consistent and readable Python code, and I will show you how to enforce or encourage PEP8 conventions in Visual Studio Code.</p>
<p>The first step is to install pep8, a checker for Python coding standards covering issues such as variable naming style, module and function docstrings, and inconsistent indentation. Install it by running the following command:</p>
<p><code>$ pip install pep8</code></p>
<p>The next step is to enforce the PEP8 standard by installing Pylint in Visual Studio Code. Pylint is the tool that checks whether our code complies with the PEP8 standard and returns errors where we fail to comply. The easiest way to install Pylint is to go to the Extensions tab in Visual Studio Code, search for Pylint and then install it.<br />Another way to install Pylint is to run the following command on the terminal:</p>
<p><code>$ pip install pylint</code></p>
<p>Then Pylint needs to be enabled on VSCode by following these steps:</p>
<ol>
<li><p>Press "Ctrl + Shift + P" to get Command Palette</p>
</li>
<li><p>Type "Lint"</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681740639833/286d5a8a-dbf9-47b0-b62a-12c280337783.png" alt class="image--center mx-auto" /></p>
<ol start="3">
<li><p>Select "Python : Enable/Disable Linting", and click on "Enable"</p>
</li>
<li><p>Repeat Steps 1 &amp; 2, then select "Python : Select Linter" and select pylint from the options</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681740958155/b26ca261-5da8-4072-95cd-3cdd6de3b484.png" alt class="image--center mx-auto" /></p>
<p>Note: apart from highlighting stylistic problems, Pylint also highlights syntax errors in your code.</p>
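<p>As a hypothetical sketch of the kind of code Pylint complains about (the codes in the comments are typical Pylint message IDs, though the exact output varies by version):</p>
<pre><code class="lang-python"># No module docstring here: Pylint reports C0114 (missing-module-docstring).

def ProcessData(InputValue):  # C0103: invalid-name, should be snake_case
    """Double the input."""
    Result = InputValue * 2  # C0103: invalid-name
    return Result
</code></pre>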
<h1 id="heading-enable-type-checking">Enable Type Checking</h1>
<p>You might also enable type checking in VS Code by going to Settings, typing “type checking” and changing the type checking mode to either basic or strict. This will highlight, for example, function parameters without a data type specified, as in the sketch after the screenshot below.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681741183839/a9118f47-24d2-488d-8303-1adf3634c05e.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-autogenerate-docstring">Autogenerate docstring</h1>
<p>For a large project, writing docstrings can be cumbersome, so I use autoDocstring to automatically generate docstrings for modules and functions. To install autoDocstring, go to the Extensions tab in VS Code, search for autoDocstring and install it. To generate a docstring, the cursor must be on the line where you want your docstring to start; right-click and then click “Generate Docstring”. You must write your function first before you can generate a docstring for it.</p>
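<p>A sketch of the kind of template autoDocstring produces from a function signature (this assumes the default docstring style; the placeholders and layout vary with your settings):</p>
<pre><code class="lang-python">def load_orders(path: str, limit: int = 100) -&gt; list:
    """_summary_

    Args:
        path (str): _description_
        limit (int, optional): _description_. Defaults to 100.

    Returns:
        list: _description_
    """
</code></pre>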
<h1 id="heading-install-autopep8-formatter-extension-for-vscode">Install autopep8 Formatter Extension for VScode</h1>
<p>autopep8 automatically formats Python code to conform to the PEP8 style guide. Install the autopep8 formatter from the Extensions tab in VS Code.</p>
]]></content:encoded></item><item><title><![CDATA[Deploy Jupyter Notebook and Spark on AWS Elastic Kubernetes Service (EKS)]]></title><description><![CDATA[In this article, I am going to show the steps to follow to enable you to run Apache Spark on a cluster managed by Kubernetes. But before this, you have to first create the EKS cluster. I have another article on how to create an EKS cluster in AWS. Sp...]]></description><link>https://henryeleonu.com/deploy-jupyter-notebook-and-spark-on-aws-elastic-kubernetes-service-eks</link><guid isPermaLink="true">https://henryeleonu.com/deploy-jupyter-notebook-and-spark-on-aws-elastic-kubernetes-service-eks</guid><category><![CDATA[Jupyter Notebook ]]></category><category><![CDATA[#apache-spark]]></category><category><![CDATA[spark]]></category><category><![CDATA[AWS]]></category><category><![CDATA[EKS]]></category><dc:creator><![CDATA[Henry Eleonu]]></dc:creator><pubDate>Tue, 20 Dec 2022 17:28:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1671815627745/593527a0-9b1b-4546-8d72-33a7cb2e6483.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article, I am going to show the steps to follow to run Apache Spark on a cluster managed by Kubernetes. But before this, you have to first create the EKS cluster; I have another article on <a target="_blank" href="https://henryeleonu.hashnode.dev/how-to-create-an-aws-elastic-kubernetes-service-eks-cluster">how to create an EKS cluster in AWS</a>. Spark is a framework for big data processing that enables in-memory processing of large amounts of data by partitioning the data and distributing the partitions across the nodes that make up the cluster. The Dockerfile and YAML files used for the deployment of Spark on EKS can be found in my GitHub repository:</p>
<p><a target="_blank" href="https://github.com/henryeleonu/spark-kubernetes/tree/jupyter-spark-kube"><strong>https://github.com/henryeleonu/spark-kubernetes/tree/jupyter-spark-kube</strong></a></p>
<p>I also have a Youtube demo on the deployment of Spark on EKS.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=XGvdlSmNMvc">https://www.youtube.com/watch?v=XGvdlSmNMvc</a></div>
<p> </p>
<p><strong>Docker Image</strong></p>
<p>You will build a Spark Docker image from a Dockerfile. The Docker image is required to run the Spark containers in the Kubernetes cluster. Docker is a container runtime environment that is frequently used with Kubernetes. Spark ships with a Dockerfile that can be used to build the Spark image, and this official Dockerfile can be customized to meet an individual application’s needs. The first step is to download Apache Spark from <a target="_blank" href="https://spark.apache.org/downloads.html">https://spark.apache.org/downloads.html</a>. Choose the package type "Pre-built for Apache Hadoop 3.3 and later", then download the tar archive file to your local directory and extract it.</p>
<p>The next step is to build the Spark image. The Spark download has a <code>bin/docker-image-tool.sh</code> script that can be used to build the Spark images from the Dockerfiles found in the <code>kubernetes/dockerfiles/</code> directory. We will be building an additional PySpark image with the Dockerfile at <code>kubernetes/dockerfiles/spark/bindings/python/Dockerfile</code>.</p>
<p>Example</p>
<pre><code class="lang-bash"><span class="hljs-comment"># To build additional PySpark docker image</span>
$ ./bin/docker-image-tool.sh -r &lt;repo&gt; -t my-tag -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
</code></pre>
<p>Replace &lt;repo&gt; with the name of your Docker Hub repository and my-tag with the tag for the image.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># To build additional PySpark docker image</span>
$ ./bin/docker-image-tool.sh -r heleonu/spark-py -t 1.1 -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
</code></pre>
<p>After building, push the image to Docker Hub using the command:</p>
<pre><code class="lang-bash">docker push heleonu/spark-py:1.1
</code></pre>
<p>You will then create a Dockerfile based on the Spark image we built earlier. The image built from it will add the packages and libraries we need, such as Jupyter Notebook. Build an image from this Dockerfile and push it to Docker Hub:</p>
<pre><code class="lang-bash">docker build -t heleonu/spark-py-kube:1.2 .
docker push heleonu/spark-py-kube:1.2
</code></pre>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="64b247a5e6471203cdd7a83f9841775d"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/henryeleonu/64b247a5e6471203cdd7a83f9841775d" class="embed-card">https://gist.github.com/henryeleonu/64b247a5e6471203cdd7a83f9841775d</a></div><p> </p>
<p><strong>Create a Service Account and Permissions</strong></p>
<p>A service account provides an identity for processes that run in a Pod, and maps to a ServiceAccount object. A ClusterRole contains rules that represent a set of permissions. A ClusterRoleBinding grants the permissions defined in a ClusterRole to users, groups, or service accounts. The YAML configuration below has the ServiceAccount, ClusterRole and ClusterRoleBinding we used. Apply it with:</p>
<pre><code class="lang-bash">kubectl apply -f service-account.yaml
</code></pre>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="46e8a2505e82ab03d7aef98d50088baa"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/henryeleonu/46e8a2505e82ab03d7aef98d50088baa" class="embed-card">https://gist.github.com/henryeleonu/46e8a2505e82ab03d7aef98d50088baa</a></div><p> </p>
<p><strong>Create Secret</strong></p>
<p>Kubernetes Secrets can be used to provide credentials for a Spark application to access secured services. The secret YAML file has the login credentials Spark requires to communicate with the PostgreSQL database; the credentials are encoded with base64 encoding. I have another article on <a target="_blank" href="https://henryeleonu.hashnode.dev/how-to-deploy-postgresql-on-aws-elastic-kubernetes-service-eks">how to deploy PostgreSQL on EKS</a>.</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="268516c3e68be13d77936bb459e0f6ff"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/henryeleonu/268516c3e68be13d77936bb459e0f6ff" class="embed-card">https://gist.github.com/henryeleonu/268516c3e68be13d77936bb459e0f6ff</a></div><p> </p>
<p>Run the command to deploy the secret:</p>
<pre><code class="lang-bash">kubectl apply -f postgres-login-secret.yaml
</code></pre>
<p><strong>Creating Spark Pod</strong></p>
<p>You will create a spark pod with the YAML configuration below.</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="157c513a560d384cf05339549a9afb09"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/henryeleonu/157c513a560d384cf05339549a9afb09" class="embed-card">https://gist.github.com/henryeleonu/157c513a560d384cf05339549a9afb09</a></div><p> </p>
<p>In the YAML file, the service account we created earlier is bound to the spark pod, and we use the Spark image, heleonu/spark-py-kube:1.2, which we built earlier. This command,</p>
<pre><code class="lang-yaml"><span class="hljs-attr">command:</span> [<span class="hljs-string">"jupyter"</span>, <span class="hljs-string">"notebook"</span>, <span class="hljs-string">"--ip"</span>, <span class="hljs-string">"0.0.0.0"</span>, <span class="hljs-string">"--allow-root"</span>]
</code></pre>
<p>starts Jupyter Notebook when the Spark container starts running. By default, Jupyter Notebook runs on port 8888. The secret is attached to the spark pod using these lines:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">envFrom:</span>
          <span class="hljs-bullet">-</span> <span class="hljs-attr">secretRef:</span>
              <span class="hljs-attr">name:</span> <span class="hljs-string">mysecret</span>
</code></pre>
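<p>Inside the container, the secret's keys then appear as environment variables. A small sketch of reading them from the notebook, using the POSTGRES_DB, POSTGRES_USER and POSTGRES_PASSWORD keys the secret defines:</p>
<pre><code class="lang-python">import os

# These keys come from the secret attached via envFrom.
pg_db = os.environ["POSTGRES_DB"]
pg_user = os.environ["POSTGRES_USER"]
pg_password = os.environ["POSTGRES_PASSWORD"]
</code></pre>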
<p>Run the command to deploy the spark pod:</p>
<pre><code class="lang-bash">kubectl apply -f spark-pod.yaml
</code></pre>
<p><strong>Create Headless Service</strong></p>
<p>This headless service enables the Spark executors to communicate with the Spark driver.</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="e3c10027c1306aabf9f201c69b3e6e91"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/henryeleonu/e3c10027c1306aabf9f201c69b3e6e91" class="embed-card">https://gist.github.com/henryeleonu/e3c10027c1306aabf9f201c69b3e6e91</a></div><p> </p>
<p>Run the command to deploy the headless service:</p>
<pre><code class="lang-bash">kubectl apply -f spark-headless-service.yaml
</code></pre>
<p><strong>Start Jupyter Notebook</strong></p>
<p>We may also want to start Jupyter Notebook manually by using kubectl exec to get a shell in the spark pod:</p>
<pre><code class="lang-bash">kubectl exec -it spark-pod -- bash
</code></pre>
<p>And then start Jupyter Notebook in the spark pod by running:</p>
<pre><code class="lang-bash">jupyter notebook --ip 0.0.0.0 --allow-root
</code></pre>
<p>When Jupyter Notebook starts in the spark pod, it displays the notebook URL on the terminal. This URL includes the port on which Jupyter Notebook is running in the pod. You need to forward this port to run Jupyter Notebook in the browser on your local machine. Open another terminal and run:</p>
<pre><code class="lang-bash">kubectl port-forward pod/spark-pod 8889:8889
</code></pre>
<p>This command assumes that Jupyter Notebook is running on port 8889 in the spark pod and forwards that port to port 8889 on your local machine. You can then copy the Jupyter Notebook URL and paste it into the browser on your local machine. This way, you can submit Spark jobs to AWS EKS from your local browser. To better understand how this works, watch this YouTube demo.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=XGvdlSmNMvc">https://www.youtube.com/watch?v=XGvdlSmNMvc</a></div>
<p> </p>
<p>This is example code to use in your notebook to set the Spark configuration needed to submit Spark jobs on EKS.</p>
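<p>A minimal sketch of such a configuration (the API endpoint, service account name and driver host below are placeholders and assumptions to be replaced with your own cluster's values; the full version is in the gist that follows):</p>
<pre><code class="lang-python">from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Kubernetes master URL: your EKS API server endpoint (placeholder)
    .master("k8s://https://&lt;your-eks-api-endpoint&gt;:443")
    .appName("jupyter-on-eks")
    # The image we built and pushed earlier
    .config("spark.kubernetes.container.image", "heleonu/spark-py-kube:1.2")
    # The service account created earlier (name assumed)
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
    .config("spark.executor.instances", "2")
    # Driver host: the headless service name (assumed)
    .config("spark.driver.host", "spark-headless-service")
    .getOrCreate()
)
</code></pre>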
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="2d2116f732c525159240c974001d7a01"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/henryeleonu/2d2116f732c525159240c974001d7a01" class="embed-card">https://gist.github.com/henryeleonu/2d2116f732c525159240c974001d7a01</a></div>]]></content:encoded></item><item><title><![CDATA[How To Deploy PostgreSQL on AWS Elastic Kubernetes Service (EKS)]]></title><description><![CDATA[In this post, I am going to explain the step I followed to deploy PostgreSQL running in docker containers to AWS EKS. The YAML files I used for this deployment can be found in my GitHub repository:
https://github.com/henryeleonu/spark-kubernetes/tree...]]></description><link>https://henryeleonu.com/how-to-deploy-postgresql-on-aws-elastic-kubernetes-service-eks</link><guid isPermaLink="true">https://henryeleonu.com/how-to-deploy-postgresql-on-aws-elastic-kubernetes-service-eks</guid><category><![CDATA[AWS Elastic Kubernetes Service]]></category><category><![CDATA[PostgreSQL]]></category><category><![CDATA[AWS]]></category><category><![CDATA[EKS]]></category><category><![CDATA[Kubernetes]]></category><dc:creator><![CDATA[Henry Eleonu]]></dc:creator><pubDate>Mon, 19 Dec 2022 11:26:02 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1671815866692/8b07d2ab-4d7a-457a-81ff-c09d7e5fd236.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this post, I am going to explain the steps I followed to deploy PostgreSQL running in Docker containers to AWS EKS. The YAML files I used for this deployment can be found in my GitHub repository:</p>
<p><a target="_blank" href="https://github.com/henryeleonu/spark-kubernetes/tree/jupyter-spark-kube">https://github.com/henryeleonu/spark-kubernetes/tree/jupyter-spark-kube</a></p>
<p>In this YouTube video, I demoed the deployment of Apache Spark and PostgreSQL on EKS:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=XGvdlSmNMvc">https://www.youtube.com/watch?v=XGvdlSmNMvc</a></div>
<p> </p>
<p>Before you deploy PostgreSQL, you must deploy the EKS cluster on AWS. Click this link to see <a target="_blank" href="https://henryeleonu.hashnode.dev/how-to-create-an-aws-elastic-kubernetes-service-eks-cluster">how we deploy an EKS cluster</a>.</p>
<p><strong>Create a Storage Volume</strong></p>
<p>Docker containers are ephemeral: all the data generated by or in a container will be lost after the termination of the container instance. To preserve the data, we will be using persistent volume and persistent volume claim resources within Kubernetes to store the data on persistent storage. We use the manifest below to specify the persistent volume and persistent volume claim resources. Note that since we are deploying this database container to AWS EKS, the instance storage of the EC2 node will be used. This means that on termination of the EC2 instance, your data will be lost. For a production database, the best approach is to use storage external to the EC2 instance on which the database pod is running, such as an Elastic Block Store (EBS) volume or Elastic File System (EFS).</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="235c91bfd299767a369650b58103b44f"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/henryeleonu/235c91bfd299767a369650b58103b44f" class="embed-card">https://gist.github.com/henryeleonu/235c91bfd299767a369650b58103b44f</a></div><p> </p>
<p>To create the persistent volume and persistent volume claim, run one of these commands on the terminal:</p>
<pre><code class="lang-bash">kubectl create -f postgres-storage.yaml
</code></pre>
<p>OR</p>
<pre><code class="lang-bash">kubectl apply -f postgres-storage.yaml
</code></pre>
<p>To check the Persistent Volume, run:</p>
<pre><code class="lang-bash">kubectl get pv
</code></pre>
<p>To check the Persistent Volume Claim, run:</p>
<pre><code class="lang-bash">kubectl get pvc
</code></pre>
<p><strong>Create Secret</strong></p>
<p>A Secret is an object that contains a small amount of sensitive data such as a password, a token, or a key. By using a Secret, you don't need to include confidential data in your application code. Secrets are created independently of the Pods that use them, so there is less risk of confidential data being exposed during the workflow of creating, viewing and editing Pods. Kubernetes, and applications that run in your cluster, can also take additional precautions with Secrets, such as avoiding writing secret data to nonvolatile storage. I used base64 encoding to encode the values for POSTGRES_DB, POSTGRES_USER and POSTGRES_PASSWORD in the secret manifest. Below is the secret manifest we used.</p>
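<p>A quick sketch of producing such base64-encoded values in Python (the plaintext values here are placeholders, not the real credentials):</p>
<pre><code class="lang-python">import base64

for value in ("postgres", "admin", "s3cret"):  # placeholder values
    print(base64.b64encode(value.encode()).decode())
</code></pre>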
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="268516c3e68be13d77936bb459e0f6ff"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/henryeleonu/268516c3e68be13d77936bb459e0f6ff" class="embed-card">https://gist.github.com/henryeleonu/268516c3e68be13d77936bb459e0f6ff</a></div><p> </p>
<p>To create the secret, run the command on your terminal:</p>
<pre><code class="lang-bash">kubectl apply -f postgres-login-secret.yaml
</code></pre>
<p><strong>Creating PostgreSQL Deployment</strong></p>
<p>In the deployment manifest below, the pod is specified under template and uses the postgres:14-alpine image. It gets its configuration from the secret associated with the pod and mounts the volume created from the persistent volume and claim. The replica count of the deployment is set to 1. If you need to run multiple pods of the database in the Kubernetes cluster, do not use a Deployment for it, because it will not synchronize the data across the multiple database pods, which would compromise the integrity of the data. If you need multiple databases, like one master and one or more read replicas, use a StatefulSet; a manifest of kind: StatefulSet can keep the read replicas synchronized with the master pod.</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="87c278384453eaebcdbd60e285d707a6"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/henryeleonu/87c278384453eaebcdbd60e285d707a6" class="embed-card">https://gist.github.com/henryeleonu/87c278384453eaebcdbd60e285d707a6</a></div><p> </p>
<p>To create the Postgres deployment, run the command on your terminal:</p>
<pre><code class="lang-bash">kubectl create -f postgres-deployment.yaml
</code></pre>
<p>To get the list of Kubernetes deployments, run:</p>
<pre><code class="lang-bash">kubectl get deployments
</code></pre>
<p><strong>Create PostgreSQL Service</strong></p>
<p>To access the deployment or container, we need to expose the PostgreSQL service. Kubernetes provides different types of services, like ClusterIP, NodePort and LoadBalancer. With ClusterIP, we can access the PostgreSQL service within the Kubernetes cluster. NodePort exposes a service endpoint on the Kubernetes node, which is the EC2 instance in the case of AWS. For accessing PostgreSQL externally, you need to use a LoadBalancer service type, which exposes the service externally. Apart from a service, you can also use an Ingress to expose a service externally.</p>
<p>In our case, we don't need to expose the database externally, and it is good practice never to do this, because the database should not be accessed directly but through a front end. Therefore, the manifest below specifies a service of type ClusterIP.</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="3f5b100c327d5abf3cc970024803622c"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/henryeleonu/3f5b100c327d5abf3cc970024803622c" class="embed-card">https://gist.github.com/henryeleonu/3f5b100c327d5abf3cc970024803622c</a></div><p> </p>
<p>To create the Postgres service, run the command in the terminal:</p>
<pre><code class="lang-bash">kubectl create -f postgres-service.yaml
</code></pre>
<p>To verify the Kubernetes service, run:</p>
<pre><code class="lang-bash">kubectl get svc
</code></pre>
<p><strong>Connecting to PostgreSQL via kubectl Command</strong></p>
<p>To get into the PostgreSQL pod and open a psql session, run the kubectl exec command below. Be sure to change the pod name, the username and the database name:</p>
<pre><code class="lang-bash">kubectl exec -it &lt;name of postgres pod&gt; -- psql -h localhost -U &lt;database username&gt; --password -p &lt;postgres port number&gt; &lt;database name&gt;
</code></pre>
<p>For example:</p>
<pre><code class="lang-bash">kubectl exec -it postgres-574d8d5f-2488v -- psql -h localhost -U postgres --password -p 5432 postgres
</code></pre>
<p><strong>Delete PostgreSQL Deployments</strong></p>
<p>To delete the PostgreSQL resources, run the commands below:</p>
<pre><code class="lang-bash">kubectl delete -f postgres-deployment.yaml
kubectl delete -f postgres-login-secret.yaml
kubectl delete -f postgres-service.yaml
kubectl delete -f postgres-storage.yaml
</code></pre>
]]></content:encoded></item><item><title><![CDATA[How to Create an AWS Elastic Kubernetes Service (EKS) Cluster]]></title><description><![CDATA[We will be explaining the steps to follow to create an AWS EKS cluster.
Set Up The IDE or Command Line Interface
The first step to start from in creating an EKS cluster on AWS is to set up the interfaces and Integrated Development Environments ...]]></description><link>https://henryeleonu.com/how-to-create-an-aws-elastic-kubernetes-service-eks-cluster</link><guid isPermaLink="true">https://henryeleonu.com/how-to-create-an-aws-elastic-kubernetes-service-eks-cluster</guid><category><![CDATA[EKS cluster]]></category><category><![CDATA[Elastic Kubernetes Service]]></category><category><![CDATA[Kubernetes]]></category><category><![CDATA[EKS]]></category><category><![CDATA[AWS]]></category><dc:creator><![CDATA[Henry Eleonu]]></dc:creator><pubDate>Sun, 18 Dec 2022 22:30:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1671816086662/ba9ad733-47da-49be-9147-dc11ea49542d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We will be explaining the steps to follow to create an AWS EKS cluster.</p>
<p><strong>Set Up The IDE or Command Line Interface</strong></p>
<p>The first step in creating an EKS cluster on AWS is to set up the interfaces and Integrated Development Environments (IDE) that enable communication with the AWS APIs. You can set up the AWS Command Line Interface (AWS CLI) on your local machine or set up AWS Cloud9 (an IDE) on AWS. AWS CLI is preinstalled on Cloud9, unlike your local machine, which requires setting up AWS CLI. If you want to know how to set up Cloud9, I have another blog post on how to do this: <a target="_blank" href="https://henryeleonu.hashnode.dev/how-to-set-up-aws-cloud9-environment">click here</a>. Follow the steps <a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html">here</a> to download and install AWS CLI, and the steps <a target="_blank" href="https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html">here</a> to configure it.</p>
<p><strong>Install or Update kubectl</strong></p>
<p>kubectl is a command line tool that you use to communicate with the Kubernetes API server. kubectl is available in many package managers, and installation via a package manager is often easier than a manual download and install process. Steps for installing kubectl can be found <a target="_blank" href="https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html">here</a>.</p>
<p><strong>Install or Update eksctl</strong></p>
<p>eksctl is a simple CLI tool for creating and managing clusters on EKS. It is written in Go, uses CloudFormation, and was created by Weaveworks. eksctl provides the fastest and easiest way to create a new cluster with nodes for Amazon EKS. As a prerequisite, kubectl must be installed before eksctl. The short steps to install eksctl are <a target="_blank" href="https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html">here</a>.</p>
<p><strong>Create an AWS EKS Role</strong></p>
<p>To enable your Kubernetes clusters managed by Amazon EKS to make calls to other AWS services and manage the resources on AWS, you must create an IAM role with the following policy: <a target="_blank" href="https://us-east-1.console.aws.amazon.com/iam/home?region=us-east-1&amp;skipRegion=true#/policies/arn:aws:iam::aws:policy/AmazonEKSClusterPolicy$jsonEditor">AmazonEKSClusterPolicy</a>. <a target="_blank" href="https://docs.aws.amazon.com/eks/latest/userguide/service_IAM_role.html">Click here</a> to find the steps to create this role.</p>
<p><strong>Creating a VPC for your Amazon EKS cluster</strong></p>
<p>You may decide to create a VPC beforehand or create it during cluster creation. Follow the steps <a target="_blank" href="https://docs.aws.amazon.com/eks/latest/userguide/creating-a-vpc.html">here</a> to create a VPC beforehand.</p>
<p><strong>Create an EKS cluster</strong></p>
<p>To create the Kubernetes cluster, we will first write a manifest in a YAML file with the file name eksctl-cluster.yaml. This is the manifest I used to create the Kubernetes cluster on AWS. The YAML files used to create the EKS cluster can be found in my GitHub repository:</p>
<p><a target="_blank" href="https://github.com/henryeleonu/spark-kubernetes/tree/jupyter-spark-kube">https://github.com/henryeleonu/spark-kubernetes/tree/jupyter-spark-kube</a></p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="e395c72d413a2b7c8001f1d2cb839b32"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/henryeleonu/e395c72d413a2b7c8001f1d2cb839b32" class="embed-card">https://gist.github.com/henryeleonu/e395c72d413a2b7c8001f1d2cb839b32</a></div><p> </p>
<p>I ran the following command on my terminal to create the cluster:</p>
<pre><code class="lang-bash">eksctl create cluster -f eksctl-cluster.yaml
</code></pre>
<p>After the creation of the cluster, I ran the following commands:</p>
<p>To get all contexts:</p>
<pre><code class="lang-bash">kubectl config get-contexts
</code></pre>
<p>To get the current context:</p>
<pre><code class="lang-bash">kubectl config current-context
</code></pre>
<p>To set the context to the EKS cluster on AWS:</p>
<pre><code class="lang-bash">kubectl config use-context henry@spark-nodes.eu-west-2.eksctl.io
</code></pre>
<p><strong>Creating an IAM OIDC provider for your cluster</strong></p>
<p>A Kubernetes service account provides an identity for processes that run in a pod. If a pod needs access to AWS services, a service account is mapped to an AWS Identity and Access Management identity to grant that access. Your cluster has an OpenID Connect (OIDC) issuer URL associated with it. To use AWS Identity and Access Management (IAM) roles for service accounts, an IAM OIDC provider must exist for your cluster. Follow the steps <a target="_blank" href="https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html">here</a> to create an IAM OIDC provider for your cluster.</p>
<p><strong>Deploy The Cluster Autoscaler</strong></p>
<p>Autoscaling is a function that enables automatic horizontal scaling of your resources, that is, scaling resources up or down to meet changing demands. This is a crucial Kubernetes function that would otherwise be difficult to achieve if performed manually.</p>
<p>Amazon EKS supports two autoscaling products: the Kubernetes Cluster Autoscaler and the Karpenter open-source autoscaling project. The Cluster Autoscaler uses AWS scaling groups, while Karpenter works directly with the Amazon EC2 fleet. We will be using the Cluster Autoscaler.</p>
<p>The Cluster Autoscaler requires the following tags on your Auto Scaling groups so that they can be auto-discovered (replace my-cluster with your cluster name). If you used eksctl to create your node groups, these tags are automatically applied.</p>
<table><thead><tr><th>Key</th><th>Value</th></tr></thead><tbody><tr><td><code>k8s.io/cluster-autoscaler/my-cluster</code></td><td><code>owned</code></td></tr><tr><td><code>k8s.io/cluster-autoscaler/enabled</code></td><td><code>true</code></td></tr></tbody></table>
<p>Create an IAM policy that grants the permissions the Cluster Autoscaler requires, and attach it to an IAM role. Follow the steps <a target="_blank" href="https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html">here</a> to create the role and policy.</p>
<p>To deploy the Cluster Autoscaler, follow the steps <a target="_blank" href="https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html">here</a>.</p>
<p>Download the Cluster Autoscaler YAML file by running the following command:</p>
<pre><code class="lang-bash">curl -o cluster-autoscaler-autodiscover.yaml https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
</code></pre>
<p>Modify the YAML file and replace <code>&lt;YOUR CLUSTER NAME&gt;</code> with your cluster name. Also consider adjusting the cpu and memory values as determined by your environment.</p>
<p>Run the command on the terminal to deploy the Cluster Autoscaler:</p>
<pre><code class="lang-bash">kubectl apply -f cluster-autoscaler-autodiscover.yaml
</code></pre>
]]></content:encoded></item><item><title><![CDATA[How to Set Up AWS Cloud9 Environment]]></title><description><![CDATA[Cloud9 is a web-based IDE that runs on an AWS EC2 instance. This means we do not need to install any IDE on our local machine to be able to develop on AWS.
Create Cloud9 Environment
Navigate to the AWS console to create the environment. Click on Crea...]]></description><link>https://henryeleonu.com/how-to-set-up-aws-cloud9-environment</link><guid isPermaLink="true">https://henryeleonu.com/how-to-set-up-aws-cloud9-environment</guid><category><![CDATA[cloud9]]></category><category><![CDATA[Elastic Kubernetes Service]]></category><category><![CDATA[AWS]]></category><category><![CDATA[EKS]]></category><category><![CDATA[IDEs]]></category><dc:creator><![CDATA[Henry Eleonu]]></dc:creator><pubDate>Sun, 18 Dec 2022 13:11:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1671833582716/1871110b-bc8d-4c43-887b-5a301e6358f2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Cloud9 is a web-based IDE that runs on an AWS EC2 instance. This means we do not need to install any IDE on our local machine to be able to develop on AWS.</p>
<p><strong>Create Cloud9 Environment</strong></p>
<p>Navigate to the AWS console to create the environment. Click on Create environment and fill in the details, which include selecting the instance type.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671357667448/AUFTVZ0t4.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671357905300/99Y08TjYP.png" alt class="image--center mx-auto" /></p>
<p>Choose the default VPC under VPC settings</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671361317003/z4MVHghLN.png" alt class="image--center mx-auto" /></p>
<p>Then click on Create. After the Cloud9 environment is created, click on its name.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671363172363/hq_lJzG68.png" alt class="image--center mx-auto" /></p>
<p>Click on Open in Cloud9</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671363375107/rJjx7zSaU.png" alt class="image--center mx-auto" /></p>
<p>The Cloud9 IDE opens in another tab on your browser</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671363699114/_ScjZ7LgH.png" alt class="image--center mx-auto" /></p>
<p><strong>Create An IAM Role for EC2</strong></p>
<p>We will then create an IAM role for the EC2 instance on which Cloud9 runs to assume. This role will allow the EC2 instance to make API calls to other AWS services. Navigate to Identity and Access Management (IAM); under Access Management, click Roles and then click Create role. Under Trusted entity type, select AWS service, because the EC2 instance is an AWS service that will assume the role being created. Under Use case, select EC2.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671365583039/lOls38tAP.png" alt class="image--center mx-auto" /></p>
<p>Click next. To attach a policy to the role, search for and select AdministratorAccess from permissions policies and click next.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671366094655/kqCXy9iJi.png" alt class="image--center mx-auto" /></p>
<p>Click next, name the role and create the role.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671366429521/hyPUOHLUA.png" alt class="image--center mx-auto" /></p>
<p>Navigate to the EC2 console and click on Instances to see the EC2 instance on which Cloud9 runs.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671366973710/PUbfmzcmD.png" alt class="image--center mx-auto" /></p>
<p>Select the EC2 instance, click the Actions dropdown menu, select Security and then select Modify IAM role. From the menu, select the role we created earlier and then click Update IAM role.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671367512607/dEOpzRzsc.png" alt class="image--center mx-auto" /></p>
<p><strong>Remove The Temporary IAM Credentials for AWS Cloud9</strong></p>
<p>Go to the AWS Cloud9 IDE and choose Settings via the gear icon in the top right corner. Under Preferences, choose AWS Settings and then choose Credentials. Turn off AWS managed temporary credentials and close the tab.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1671368567162/SZ82s0sRH.png" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[Code Generation Using ChatGPT: Optimizing Language Models for Dialogue]]></title><description><![CDATA[OpenAI has released a trained model called ChatGPT that can interact conversationally. I am particularly interested in how it generates code. In this post, I will be writing about my interactions with ChatGPT and some of my observations.
My Interacti...]]></description><link>https://henryeleonu.com/code-generation-using-chatgpt-optimizing-language-models-for-dialogue</link><guid isPermaLink="true">https://henryeleonu.com/code-generation-using-chatgpt-optimizing-language-models-for-dialogue</guid><category><![CDATA[Automated code generation]]></category><category><![CDATA[chatgpt]]></category><category><![CDATA[chatbot]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[natural language processing]]></category><dc:creator><![CDATA[Henry Eleonu]]></dc:creator><pubDate>Fri, 16 Dec 2022 12:29:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1671833181440/94ac891e-2792-4316-b70e-9eda52c9642a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>OpenAI has released a trained model called ChatGPT that can interact conversationally. I am particularly interested in how it generates code. In this post, I will be writing about my interactions with ChatGPT and some of my observations.</p>
<h1 id="heading-my-interactions-with-chatgpt">My Interactions with ChatGPT</h1>
<p><strong>I asked this question:</strong></p>
<p>How can I scrape a table from a website in python?</p>
<p><strong>I got this response from ChatGPT:</strong></p>
<p>To scrape a table from a website in Python, you will need to use a library called Beautiful Soup. Beautiful Soup is a popular Python library for extracting data from HTML and XML files. It provides simple, idiomatic ways of navigating, searching, and modifying the parse tree.</p>
<p>Here is an example of how you can use Beautiful Soup to scrape a table from a website:</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="c398d95b3eabb64b7f28d891a4d44952"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/henryeleonu/c398d95b3eabb64b7f28d891a4d44952" class="embed-card">https://gist.github.com/henryeleonu/c398d95b3eabb64b7f28d891a4d44952</a></div><p> </p>
<p>This code will make a request to the specified URL, parse the HTML content of the page, find the table element in the HTML, and then loop through the rows and cells of the table, extracting the text from each cell and printing it to the console.</p>
<p>You can also use other libraries such as Selenium or Pandas to scrape tables from websites. Selenium is a tool for controlling a web browser through code, which can be useful if the table you want to scrape is generated dynamically by JavaScript. Pandas has a function called <code>read_html()</code> which can be used to automatically parse tables from an HTML page and return them as a list of DataFrame objects.</p>
<p>I hope this helps! Let me know if you have any questions.</p>
<p><strong>I asked the same question a second time and got a different response; this time it gave the steps without writing any code.</strong></p>
<p><strong>This is the response I got:</strong></p>
<p>There are a few ways to scrape a table from a website in Python:</p>
<ol>
<li>Using the BeautifulSoup library:</li>
</ol>
<ul>
<li><p>First, install the BeautifulSoup library using pip install beautifulsoup4</p>
</li>
<li><p>Then, use the requests library to send a GET request to the website and retrieve the HTML content</p>
</li>
<li><p>Parse the HTML content using BeautifulSoup and find the table element</p>
</li>
<li><p>Extract the rows and columns from the table and store the data in a list or a dictionary</p>
</li>
</ul>
<ol>
<li>Using the Selenium library:</li>
</ol>
<ul>
<li><p>First, install the Selenium library using pip install selenium</p>
</li>
<li><p>Then, use Selenium to open the website and retrieve the HTML content</p>
</li>
<li><p>Parse the HTML content and find the table element</p>
</li>
<li><p>Extract the rows and columns from the table and store the data in a list or a dictionary</p>
</li>
</ul>
<ol>
<li>Using the pandas library:</li>
</ol>
<ul>
<li><p>First, install the pandas library using pip install pandas</p>
</li>
<li><p>Then, use the <a target="_blank" href="http://pandas.read">pandas.read</a>_html() function to extract the tables from the website</p>
</li>
<li><p>The function will return a list of DataFrame objects, which can be accessed and manipulated as needed.</p>
</li>
</ul>
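<p>As a minimal sketch of the pandas approach described above (my illustration, not ChatGPT's output; it assumes pandas and its lxml dependency are installed):</p>
<pre><code class="lang-python">import pandas as pd

# read_html returns a list of DataFrames, one per table found on the page
tables = pd.read_html("https://en.wikipedia.org/wiki/List_of_airlines_of_the_United_Kingdom")
print(len(tables))
print(tables[0].head())
</code></pre>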
<p><strong>I then asked this more specific question:</strong></p>
<p>How can I write a python code to scrape a table from the website, <a target="_blank" href="https://en.wikipedia.org/wiki/List_of_airlines_of_the_United_Kingdom">https://en.wikipedia.org/wiki/List_of_airlines_of_the_United_Kingdom</a>?</p>
<p><strong>This is the response from ChatGPT:</strong></p>
<p>Here is a sample code that demonstrates how you can scrape the table from the website using Python's Beautiful Soup library:</p>
<div class="gist-block embed-wrapper" data-gist-show-loading="false" data-id="7d3f86733328e27cc6a56123d97459b4"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a href="https://gist.github.com/henryeleonu/7d3f86733328e27cc6a56123d97459b4" class="embed-card">https://gist.github.com/henryeleonu/7d3f86733328e27cc6a56123d97459b4</a></div><p> </p>
<p>This code will print out the contents of the first three columns of the table, which contain the name, IATA code, and ICAO code of the airlines. You can modify the code to extract other information from the table, such as the type of airline or the country of origin.</p>
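<p>For reference, a sketch along the same lines (my own reconstruction, not ChatGPT's exact output; the "wikitable" class selector is an assumption about the page's structure):</p>
<pre><code class="lang-python">import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_airlines_of_the_United_Kingdom"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Wikipedia list articles typically mark their tables with the "wikitable" class
table = soup.find("table", {"class": "wikitable"})
for row in table.find_all("tr"):
    cells = row.find_all(["th", "td"])
    print([cell.get_text(strip=True) for cell in cells[:3]])
</code></pre>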
<h1 id="heading-my-observations">My Observations</h1>
<p>From my interactions, I can say that ChatGPT is impressive, especially in the way it makes the interaction feel natural and in how detailed its responses are. But in terms of code generation, I would say that the code is more of a template that needs to be customized to meet your needs. The generated code is not guaranteed to run without errors; the code above all produced errors when I ran it. Generally speaking, the code generation aspect might be beneficial at the initial stage of a project, especially for a beginner programmer wanting to know the steps to take to solve a problem.</p>
]]></content:encoded></item><item><title><![CDATA[How to Install Multiple Versions of Python Using Virtualenv]]></title><description><![CDATA[Click this line to watch: how to install multiple versions of python using virtualenv - YouTube
There are situations when we need to have multiple versions of python, for instance, when we need to install dependencies that are not compatible with the...]]></description><link>https://henryeleonu.com/how-to-install-multiple-versions-of-python-using-virtualenv</link><guid isPermaLink="true">https://henryeleonu.com/how-to-install-multiple-versions-of-python-using-virtualenv</guid><category><![CDATA[Python]]></category><category><![CDATA[virtual environment]]></category><category><![CDATA[virtualenv]]></category><dc:creator><![CDATA[Henry Eleonu]]></dc:creator><pubDate>Thu, 15 Dec 2022 09:11:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1671833648682/fb2891d6-569a-4765-a840-2146cd21092c.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Click this line to watch:</strong> <a target="_blank" href="https://www.youtube.com/watch?v=sk-ikK90AyQ">how to install multiple versions of python using virtualenv - YouTube</a></p>
<p>There are situations when we need to have multiple versions of Python, for instance, when we need to install dependencies that are not compatible with the Python version we are running. I have run into a situation where a dependency I needed to install was only compatible with an earlier version of Python. To solve this problem, I had to run the earlier version of Python in a virtual environment. I used the virtualenv utility, which enabled me to run multiple versions of Python. Below are the steps I followed to get things running.</p>
<ol>
<li><p>Install virtualenv:<br /><code>pip install virtualenv</code></p>
</li>
<li><p>Download the desired version of Python.<br />I already had my main Python installed in this path: <code>C:\Python\Python311\</code><br />I installed the earlier version of Python here: <code>C:\Python\Python310\</code></p>
</li>
<li><p>Create a project directory.<br />I created a project directory here: <code>C:\Python_Workspace\my_project\</code></p>
</li>
<li><p>Create a virtual environment in your project directory.<br />Open the terminal and change directory to the project directory:<br /><code>cd C:\Python_Workspace\my_project\</code><br />Create your virtual environment:<br /><code>python -m virtualenv -p C:\Python\Python310\python.exe my-virtual-env</code></p>
</li>
<li><p>Activate the virtual environment (a quick verification sketch follows after this list):<br /><code>.\my-virtual-env\Scripts\activate</code></p>
</li>
<li><p>To deactivate the virtual environment, run:<br /><code>deactivate</code></p>
</li>
<li><p>We can now go ahead and install all the dependencies we need.</p>
</li>
<li><p>To create a requirements.txt file of all dependencies in the virtual environment, run:<br /><code>pip freeze &gt; requirements.txt</code></p>
</li>
</ol>
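<p>A quick sketch to confirm, from inside Python, that the activated environment uses the intended interpreter:</p>
<pre><code class="lang-python">import sys

print(sys.executable)  # should point inside my-virtual-env when activated
print(sys.version)     # should report the 3.10.x interpreter in this example
</code></pre>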
]]></content:encoded></item><item><title><![CDATA[Pros and Cons of Stock APIs]]></title><description><![CDATA[Real-time and historic stock or other financial datasets are essential for developing financial applications. Developers and Data Engineers typically want to extract data from Application Programming Interfaces (API) from within their code. Ther...]]></description><link>https://henryeleonu.com/pros-and-cons-of-stock-apis</link><guid isPermaLink="true">https://henryeleonu.com/pros-and-cons-of-stock-apis</guid><category><![CDATA[Stocks API]]></category><category><![CDATA[Python Stock Library]]></category><category><![CDATA[financial api]]></category><dc:creator><![CDATA[Henry Eleonu]]></dc:creator><pubDate>Thu, 15 Dec 2022 08:46:58 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1671833049761/97081e64-f4ef-407b-b354-e9e478e23266.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Real-time and historic stock or other financial datasets are essential for developing financial applications. Developers and Data Engineers typically want to extract data from Application Programming Interfaces (API) from within their code. There are many stock APIs out there, so it could be a bit difficult to decide which API to use.</p>
<p>We will be looking at some of the popular ones out there and comparing them based on cost, how well-supported the API is, ease of use, limitations on the size of the dataset, and support for real-time data.</p>
<p>In terms of cost, some are free, others give a free trial period, while some are paid and give no free trial.</p>
<p>We can measure how well supported an API is by its last update: for example, in the case of an API's Python library, the date of the latest release on PyPI. We can also measure this by the frequency of updates to its GitHub repository.</p>
<h1 id="heading-yahoo-finance"><strong>Yahoo Finance</strong></h1>
<p>Yahoo Finance publishes free stock data from the major stock markets around the world.</p>
<p><strong>Pros:</strong></p>
<ul>
<li><p>It is free.</p>
</li>
<li><p>We can get a huge amount of data from it.</p>
</li>
<li><p>It has two well-supported Python libraries, the pandas-datareader library as well as the yfinance library (a usage sketch follows after this list). The libraries simplify the extraction and use of the data by reducing the amount of code. The latest release of yfinance was on Nov 16, 2022.</p>
</li>
</ul>
<p><strong>Cons:</strong></p>
<ul>
<li><p>The API is not an official Yahoo Finance API.</p>
</li>
<li><p>Only basic datasets can be retrieved.</p>
</li>
<li><p>Because it is unofficial, the rate of API calls could be limited.</p>
</li>
</ul>
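<p>A minimal yfinance sketch (assuming the library is installed with pip install yfinance; the ticker is an arbitrary example):</p>
<pre><code class="lang-python">import yfinance as yf

# One year of daily bars for an example ticker
data = yf.download("AAPL", period="1y", interval="1d")
print(data.head())
</code></pre>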
<h1 id="heading-alpha-vantage"><strong>Alpha Vantage</strong></h1>
<p>Alpha Vantage provides enterprise-grade real-time and historical financial market data through a set of powerful and developer-friendly data APIs and spreadsheets, delivered through REST stock APIs, Excel, and Google Sheets.</p>
<p>Pros:</p>
<ul>
<li><p>It has a Python library, though it is not popular.</p>
</li>
<li><p>It is suitable for commercial applications.</p>
</li>
<li><p>Supports real-time and historical financial market data.</p>
</li>
</ul>
<p>Cons:</p>
<ul>
<li><p>It is not free.</p>
</li>
<li><p>Its unofficial Python library is not well supported; the latest release was on July 4, 2021.</p>
</li>
</ul>
<h1 id="heading-bloomberg-api"><strong>Bloomberg API</strong></h1>
<p>Pros:</p>
<ul>
<li><p>It has an official Python library.</p>
</li>
<li><p>Suitable for commercial applications.</p>
</li>
<li><p>Supports real-time and historical financial market data.</p>
</li>
</ul>
<p>Cons:</p>
<ul>
<li><p>Not free; plans start at $2,000 per month.</p>
</li>
<li><p>Its Python library is not well supported; the latest version was released Jan 1, 2019.</p>
</li>
</ul>
<h1 id="heading-stock-news"><strong>Stock News</strong></h1>
<p>Stock News is focused on stock news and summary reports.</p>
<p>Pros:</p>
<ul>
<li><p>Good at stock news.</p>
</li>
</ul>
<p>Cons:</p>
<ul>
<li><p>No custom Python library; you make use of the requests library to get data in JSON format.</p>
</li>
<li><p>Gives a 14-day free trial, with plans starting at $19.99 per month.</p>
</li>
<li><p>Requires details like the card number and the name on the card to make payment.</p>
</li>
</ul>
<h1 id="heading-iex-cloud"><strong>IEX Cloud</strong></h1>
<p>Pros:</p>
<ul>
<li><p>Has a free trial.</p>
</li>
<li><p>Supports real-time and historical financial market data.</p>
</li>
</ul>
<p>Cons:</p>
<ul>
<li><p>No custom Python library; you make use of the requests library to get data in JSON format.</p>
</li>
<li><p>Not free; plans start at $49.</p>
</li>
</ul>
<h1 id="heading-morning-star"><strong>Morning star</strong></h1>
<p>Pros:</p>
<ul>
<li><p>Has a Python library.</p>
</li>
</ul>
<p>Cons:</p>
<ul>
<li><p>Not free.</p>
</li>
<li><p>Its Python library is not well supported; the latest release was on Jun 16, 2020.</p>
</li>
</ul>
]]></content:encoded></item></channel></rss>