Chlamydia Cody’s Portfolio

Code

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np; np.random.seed(22)
import seaborn as sns; sns.set(color_codes=True)
import pandas as pd
import math

Code

datadir = '/Users/codyappa/Documents/GitHub/Data_Science_Portfolio/posts/MarksChannels/Sheet 3-Table 1'

Code

df = pd.read_csv('Sheet 3-Table 1.csv')

Figure 1

The following figure shows chromosomal counts of untreated and induced samples of Euo. By using these counts we can determine that the overall replication rate of the two treatments is the same.

Marks/Channels

I used points to show the individual timepoints and then lines to connect them and show the correlation between the two. I then used differing colors to indicate the different datasets. Length of the errorbar is to show diversion from the mean.

Code

with plt.style.context('seaborn-white'):
    fig, ax = plt.subplots(ncols=1)
    sns.pointplot(x=df['hpi'], y=df['pyrG copies/ul'], hue=df['Treatment'], data=df, errwidth=2, capsize=.1, linestyles='-', palette=('orange', 'b'), dodge=True, ax=ax)
    with plt.style.context('classic'):
        ax.legend(loc='upper left', fontsize=10)
        ax.set(ylabel='Cell Count')
ax.ticklabel_format(axis="y", style="sci", scilimits=(0,2))
fig.set_size_inches(4, 4)

Code

plt.show()

Figure 2 Marks/Channels

I made a heat map of the datasets which really doesn’t show the correlation or anything. You can see that as we get towards 48hpi they seem to follow the same ish pattern but really you can’t get any correlations from this.

Expressive/Effective: the data presented is not shown in a way easily perceivable. Just by looking at the data you can’t really interpret anything unless I were to describe in detail what is occuring. There is too much overlap and many channels thrown at you; what does the color intensity mean? why is there a gap in the middle? what does orange vs blue mean?
Separability: the first dataset was easily separated by color and connecting lines. The second dataset, however, has a lot of overlap which makes it hard to be able to distinguish individual datapoints or any correlation as you look at it. you can just kinda see the colors vaguely mash together in the same areas.

Code

with plt.style.context('seaborn-white'):
    fig, ax = plt.subplots(ncols=1)
    sns.histplot(x=df['hpi'], y=df['pyrG copies/ul'], hue=df['Treatment'], data=df, ax=ax, linewidth=.9)
    ax.set(ylabel='Cell Count')
    ax.ticklabel_format(axis="y", style="sci", scilimits=(0,2))

Code

plt.show()

--- title: "Assignment 4" subtitle: "Marks and Channels" author: "Cody Appa" format: html jupyter: python3 code-fold: true code-tools: true categories: [Assignment, DataViz] date: 02/14/2023 image: IMG_7123 Small.jpeg description: "a hopefully non atrocious attempt to delve into the importance of marks and channels" --- ```{python} import matplotlib.pyplot as plt import matplotlib.ticker as ticker import numpy as np; np.random.seed(22) import seaborn as sns; sns.set(color_codes=True) import pandas as pd import math ``` ```{python} datadir = '/Users/codyappa/Documents/GitHub/Data_Science_Portfolio/posts/MarksChannels/Sheet 3-Table 1' ``` ```{python} df = pd.read_csv('Sheet 3-Table 1.csv') ``` ## Figure 1 The following figure shows chromosomal counts of untreated and induced samples of Euo. By using these counts we can determine that the overall replication rate of the two treatments is the same. ## Marks/Channels I used points to show the individual timepoints and then lines to connect them and show the correlation between the two. I then used differing colors to indicate the different datasets. Length of the errorbar is to show diversion from the mean. ```{python} with plt.style.context('seaborn-white'): fig, ax = plt.subplots(ncols=1) sns.pointplot(x=df['hpi'], y=df['pyrG copies/ul'], hue=df['Treatment'], data=df, errwidth=2, capsize=.1, linestyles='-', palette=('orange', 'b'), dodge=True, ax=ax) with plt.style.context('classic'): ax.legend(loc='upper left', fontsize=10) ax.set(ylabel='Cell Count') ax.ticklabel_format(axis="y", style="sci", scilimits=(0,2)) fig.set_size_inches(4, 4) ``` ```{python} plt.show() ``` ## Figure 2 Marks/Channels I made a heat map of the datasets which really doesn't show the correlation or anything. You can see that as we get towards 48hpi they seem to follow the same ish pattern but really you can't get any correlations from this. - Expressive/Effective: the data presented is not shown in a way easily perceivable. Just by looking at the data you can't really interpret anything unless I were to describe in detail what is occuring. There is too much overlap and many channels thrown at you; what does the color intensity mean? why is there a gap in the middle? what does orange vs blue mean? - Separability: the first dataset was easily separated by color and connecting lines. The second dataset, however, has a lot of overlap which makes it hard to be able to distinguish individual datapoints or any correlation as you look at it. you can just kinda see the colors vaguely mash together in the same areas. ## Marks/Channels Utilized However poorly they were implemented, there are many somewhat clashing marks and channels attempted to be used here in this figure. - Marks: I would make the argument that each individual box is a point, they do not differ in area and there are no lines connecting the data. - Channels: the main channels in use here are positioning, color, and saturation. While these generally are very good channels to use for visualization, the way they mesh in this figure does not allow for easy understanding. While the spatial positioning allows one to see a general trend upwards and saturation allows us to see points of higher density, it isn't clear as to what it is conveying. Opposed to figure 1 which gives easy to see points with error bars, showing us how the two differing treatments follow the same trend figure 2 makes it much harder to glean that relationship. ```{python} with plt.style.context('seaborn-white'): fig, ax = plt.subplots(ncols=1) sns.histplot(x=df['hpi'], y=df['pyrG copies/ul'], hue=df['Treatment'], data=df, ax=ax, linewidth=.9) ax.set(ylabel='Cell Count') ax.ticklabel_format(axis="y", style="sci", scilimits=(0,2)) ``` ```{python} plt.show() ```