Code
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np; np.random.seed(22)
import seaborn as sns; sns.set(color_codes=True)
import pandas as pd
import math
Marks and Channels
Cody Appa
February 14, 2023
The following figure shows chromosomal counts of untreated and induced samples of Euo. By using these counts we can determine that the overall replication rate of the two treatments is the same.
I used points to show the individual timepoints and then lines to connect them and show the correlation between the two. I then used differing colors to indicate the different datasets. Length of the errorbar is to show diversion from the mean.
with plt.style.context('seaborn-white'):
fig, ax = plt.subplots(ncols=1)
sns.pointplot(x=df['hpi'], y=df['pyrG copies/ul'], hue=df['Treatment'], data=df, errwidth=2, capsize=.1, linestyles='-', palette=('orange', 'b'), dodge=True, ax=ax)
with plt.style.context('classic'):
ax.legend(loc='upper left', fontsize=10)
ax.set(ylabel='Cell Count')
ax.ticklabel_format(axis="y", style="sci", scilimits=(0,2))
fig.set_size_inches(4, 4)
I made a heat map of the datasets which really doesn’t show the correlation or anything. You can see that as we get towards 48hpi they seem to follow the same ish pattern but really you can’t get any correlations from this.
Expressive/Effective: the data presented is not shown in a way easily perceivable. Just by looking at the data you can’t really interpret anything unless I were to describe in detail what is occuring. There is too much overlap and many channels thrown at you; what does the color intensity mean? why is there a gap in the middle? what does orange vs blue mean?
Separability: the first dataset was easily separated by color and connecting lines. The second dataset, however, has a lot of overlap which makes it hard to be able to distinguish individual datapoints or any correlation as you look at it. you can just kinda see the colors vaguely mash together in the same areas.
---
title: "Assignment 4"
subtitle: "Marks and Channels"
author: "Cody Appa"
format: html
jupyter: python3
code-fold: true
code-tools: true
categories: [Assignment, DataViz]
date: 02/14/2023
image: IMG_7123 Small.jpeg
description: "a hopefully non atrocious attempt to delve into the importance of marks and channels"
---
```{python}
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np; np.random.seed(22)
import seaborn as sns; sns.set(color_codes=True)
import pandas as pd
import math
```
```{python}
datadir = '/Users/codyappa/Documents/GitHub/Data_Science_Portfolio/posts/MarksChannels/Sheet 3-Table 1'
```
```{python}
df = pd.read_csv('Sheet 3-Table 1.csv')
```
## Figure 1
The following figure shows chromosomal counts of untreated and induced samples of Euo. By using these counts we can determine that the overall replication rate of the two treatments is the same.
## Marks/Channels
I used points to show the individual timepoints and then lines to connect them and show the correlation between the two. I then used differing colors to indicate the different datasets. Length of the errorbar is to show diversion from the mean.
```{python}
with plt.style.context('seaborn-white'):
fig, ax = plt.subplots(ncols=1)
sns.pointplot(x=df['hpi'], y=df['pyrG copies/ul'], hue=df['Treatment'], data=df, errwidth=2, capsize=.1, linestyles='-', palette=('orange', 'b'), dodge=True, ax=ax)
with plt.style.context('classic'):
ax.legend(loc='upper left', fontsize=10)
ax.set(ylabel='Cell Count')
ax.ticklabel_format(axis="y", style="sci", scilimits=(0,2))
fig.set_size_inches(4, 4)
```
```{python}
plt.show()
```
## Figure 2 Marks/Channels
I made a heat map of the datasets which really doesn't show the correlation or anything. You can see that as we get towards 48hpi they seem to follow the same ish pattern but really you can't get any correlations from this.
- Expressive/Effective: the data presented is not shown in a way easily perceivable. Just by looking at the data you can't really interpret anything unless I were to describe in detail what is occuring. There is too much overlap and many channels thrown at you; what does the color intensity mean? why is there a gap in the middle? what does orange vs blue mean?
- Separability: the first dataset was easily separated by color and connecting lines. The second dataset, however, has a lot of overlap which makes it hard to be able to distinguish individual datapoints or any correlation as you look at it. you can just kinda see the colors vaguely mash together in the same areas.
## Marks/Channels Utilized
However poorly they were implemented, there are many somewhat clashing marks and channels attempted to be used here in this figure.
- Marks: I would make the argument that each individual box is a point, they do not differ in area and there are no lines connecting the data.
- Channels: the main channels in use here are positioning, color, and saturation. While these generally are very good channels to use for visualization, the way they mesh in this figure does not allow for easy understanding. While the spatial positioning allows one to see a general trend upwards and saturation allows us to see points of higher density, it isn't clear as to what it is conveying. Opposed to figure 1 which gives easy to see points with error bars, showing us how the two differing treatments follow the same trend figure 2 makes it much harder to glean that relationship.
```{python}
with plt.style.context('seaborn-white'):
fig, ax = plt.subplots(ncols=1)
sns.histplot(x=df['hpi'], y=df['pyrG copies/ul'], hue=df['Treatment'], data=df, ax=ax, linewidth=.9)
ax.set(ylabel='Cell Count')
ax.ticklabel_format(axis="y", style="sci", scilimits=(0,2))
```
```{python}
plt.show()
```