
Sunday, 18 May 2025

Displaying weather data using Seaborn with Python

Seaborn, a library built on top of Matplotlib, makes it convenient to display weather data, and Anaconda with Jupyter Notebook makes working with the data user friendly. First, let's look at a data set of Norwegian weather data. It contains weather data from 55 meteorological stations in Norway for 2020-2021, is 13.61 MB in size, and is available on the Kaggle.com website under the DbCL v1.0 license (meaning it is free to use, with an 'as-is' warranty).

https://www.kaggle.com/datasets/annbengardt/noway-meteorological-data/data

First, the following imports are done in Jupyter Notebook, a free, browser-based IDE that is part of the Anaconda distribution and is well suited for data visualization.

NorwayMeteoDemo1.ipynb


import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

Next, Pandas, Python's data analysis library, is used to load the data set, which is in CSV format, and new columns are added to it dynamically. A 14-day moving average of the daily maximum air temperature is added using the rolling method with the window set to 14. A date column is also added: the data set stores day, month and year as three int64 columns, and these are combined into a single date column.

NorwayMeteoDemo1.ipynb


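df = pd.read_csv("datasets/weather/NorwayMeteoDataCompleted.csv")
# Combine the int64 year, month and day columns into one datetime64 column
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
# 14-day moving average of the daily max temperature, computed per station in date order
df = df.sort_values(['sourceId', 'date'])
df['moving_max_air_temp_avg'] = df.groupby('sourceId')['max(air_temperature P1D)'].transform(lambda s: s.rolling(window=14, min_periods=1).mean())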
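To see what the window and min_periods parameters do, here is a minimal sketch on a hypothetical five-day toy series (made-up values, not the Kaggle data), using a 3-day window so the effect is easy to follow. It also shows pd.to_datetime assembling a date column from year, month and day columns, and a second toy series with one missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: five days of max temperatures for a single station
toy = pd.DataFrame({
    'year': [2020] * 5,
    'month': [1] * 5,
    'day': [1, 2, 3, 4, 5],
    'max(air_temperature P1D)': [2.0, 4.0, 6.0, 8.0, 10.0],
})

# pd.to_datetime accepts a DataFrame with year/month/day columns
toy['date'] = pd.to_datetime(toy[['year', 'month', 'day']])

temps = toy['max(air_temperature P1D)']
# Default min_periods equals the window size: the first window-1 values are NaN
strict = temps.rolling(window=3).mean()
# min_periods=1: early rows average whatever values are available so far
lenient = temps.rolling(window=3, min_periods=1).mean()

# Hypothetical series with one missing measurement
gap = pd.Series([2.0, 4.0, np.nan, 8.0, 10.0, 12.0])
# Every window that contains the NaN has too few values and yields NaN,
# but the average recovers once the NaN leaves the window
gap_strict = gap.rolling(window=3).mean()
gap_lenient = gap.rolling(window=3, min_periods=1).mean()

print(strict.tolist())
print(lenient.tolist())
```

With the default, strict starts with two NaN values and then gives 4.0, 6.0, 8.0, while lenient gives 2.0, 3.0, 4.0, 6.0, 8.0. The gap example shows that a missing value produces NaN only while it is inside the window; it does not spread to all later periods.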

Note that min_periods is set to 1 for the moving average. Without it, min_periods defaults to the window size, so the first 13 values of the moving average column would be NaN, and every 14-day window that contains a missing measurement would also produce NaN. Next, we choose which data to display. The demo shows the following data:
  • Station id: SN69100 (the Værnes - Trondheim Airport weather station)
  • Year 2020
The station ids can be looked up in this user's GitHub Gist: https://gist.github.com/ofaltins/c1f0158f1766c8bd695b2c8d545c052c
The following code filters the data and stores the result in a variable:

NorwayMeteoDemo1.ipynb


filtered_df_2020 = df[
    (df['sourceId'] == 'SN69100')  &
    (df['year'] == 2020)
]
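The CSV holds readings from all 55 stations in one table, so a rolling window computed over the whole DataFrame can reach from one station's rows into the next. A minimal sketch of the effect, assuming a hypothetical two-station toy frame (the station ids SN_A and SN_B and the values are made up) and a 2-day window: grouping by sourceId before rolling keeps each station's average separate, and the station is then selected with a boolean mask in the same way as the filter above.

```python
import pandas as pd

col = 'max(air_temperature P1D)'

# Hypothetical toy data: two stations, three consecutive days each
df = pd.DataFrame({
    'sourceId': ['SN_A'] * 3 + ['SN_B'] * 3,
    'date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03'] * 2),
    col: [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Rolling over the whole frame: SN_B's first window still includes SN_A's last value
ungrouped = df[col].rolling(window=2, min_periods=1).mean()

# Rolling within each station: groupby + transform returns values aligned to the original rows
grouped = df.groupby('sourceId')[col].transform(
    lambda s: s.rolling(window=2, min_periods=1).mean()
)

# Select one station with a boolean mask and attach both averages for comparison
station_b = df[df['sourceId'] == 'SN_B'].assign(ungrouped=ungrouped, grouped=grouped)
print(station_b)
```

On the first SN_B day the ungrouped average mixes 3.0 from SN_A with 10.0 from SN_B, while the grouped average uses only SN_B's own value.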

Next, the data is displayed. The demo shows two line plots drawn with Seaborn: the daily maximum air temperature and the 14-day moving average of the daily maximum air temperature. Below these two line plots, a bar chart shows the daily precipitation. Note that the bar chart is drawn with Matplotlib's bar method on the axes object, not with Seaborn's barplot function.

NorwayMeteoDemo1.ipynb



sns.set_style('whitegrid')  # white background with grid lines

# Create a 2x1 subplot layout; both panels share the date x-axis
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10), sharex=True)

# Upper panel (Seaborn): daily max air temperature and its 14-day moving average
sns.lineplot(data=filtered_df_2020, x='date', y='max(air_temperature P1D)', label='Daily max air temperature (C)', linewidth=1.5, color='pink', ax=ax1)

sns.lineplot(data=filtered_df_2020, x='date', y='moving_max_air_temp_avg', label='14-day moving average of daily max air temperature (C)', linewidth=1.8, color='red', ax=ax1)

# Lower panel (Matplotlib): daily precipitation as bars on a date axis
ax2.bar(filtered_df_2020['date'], filtered_df_2020['sum(precipitation_amount P1D)'], color='blue', label='Daily sum precipitation (mm)')
ax2.legend()  # Axes.bar does not draw a legend by itself

ax1.set_title('Værnes - Weather data - 2020')

ax1.set_xlabel('Date of year')
ax1.set_ylabel('Daily max air temperature (C)')

ax2.set_xlabel('Date of year')
ax2.set_ylabel('Daily sum precipitation (mm)')

plt.tight_layout()
plt.show()



The resulting figure has a 2x1 subplot layout. The upper plot shows the daily maximum air temperature for 2020 together with its 14-day moving average, which smooths out day-to-day fluctuations and makes the general temperature trend through the year easier to see. The lower plot is a bar chart drawn with Matplotlib, since Seaborn's barplot treats the x values as categories and does not place the dates of our data set (stored as datetime64) on a continuous time axis. Seaborn and Matplotlib offer a ton of plotting functionality.

Maybe .NET developers could also make more use of them? My previous article showed how to render images with Matplotlib on the backend and display them in a Blazor Server app, combining Python and .NET. That demo used the Python.NET library for interop between Python and .NET. The screenshot shows the Jupyter Notebook IDE, part of the Anaconda distribution of Python tailored for data analysis and data science, displaying the plots produced by the Python code above.
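The previous article's exact code is not reproduced here. As a rough sketch of the Python side only, assuming the .NET host just needs the image as PNG bytes or a base64 string, a figure can be rendered in memory with the Agg backend. The helper name render_png_base64 is hypothetical.

```python
import base64
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render without a display
import matplotlib.pyplot as plt


def render_png_base64(x, y, title):
    """Hypothetical helper: draw a line plot and return it as a base64-encoded PNG string."""
    fig, ax = plt.subplots(figsize=(6, 3))
    ax.plot(x, y)
    ax.set_title(title)
    buf = io.BytesIO()
    fig.savefig(buf, format='png')  # write the PNG into memory instead of a file
    plt.close(fig)  # release the figure, which matters in a long-running server process
    return base64.b64encode(buf.getvalue()).decode('ascii')


encoded = render_png_base64([1, 2, 3], [2.0, 4.0, 3.0], 'Demo')
png_bytes = base64.b64decode(encoded)
```

A string like this can be placed in a data URI (data:image/png;base64,...) as the src of an img element, which is one simple way for a Blazor page to show an image produced by Python.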