Source Documentation

`pyPheWAS` – Root package

`pyPheWAS.pyPhewasv2` – pyPhewas functions file

pyPheWAS.pyPhewasv2.calculate_odds_ratio(genotypes, phen_vector1, phen_vector2, reg_type, covariates, lr=0, response='', phen_vector3='')

Runs the regression for a specific phenotype vector relative to the genotype data and covariates.

Parameters:

genotypes (pandas DataFrame) – a DataFrame containing the genotype information
phen_vector (numpy array) – an array containing the phenotype vector
covariates (string) – a string containing all desired covariates

Note

The covariates must be a string that is delimited by ‘+’, not a list. If you are using a list of covariates and would like to convert it to the pyPhewas format, use the following:

l = ['genotype', 'age'] # a list of your covariates
covariates = '+'.join(l) # pyPhewas format

The covariates listed here must be column headers in your genotype CSV file.
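The header requirement can be checked before running a regression. The following is a minimal sketch, not part of pyPheWAS: the helper name `missing_covariates` and the example columns `id` and `sex` are hypothetical, and the genotype data is assumed to already be loaded into a pandas DataFrame.

```python
import pandas as pd


def missing_covariates(covariates, genotypes):
    """Return the covariate names that are not columns of `genotypes`.

    `covariates` is a '+'-delimited string, as described in the note above.
    This helper is hypothetical and is not part of pyPheWAS.
    """
    names = [name.strip() for name in covariates.split("+") if name.strip()]
    return [name for name in names if name not in genotypes.columns]


# Hypothetical genotype table; real data would come from the genotype CSV file.
genotypes = pd.DataFrame({"id": [1, 2], "genotype": [0, 1], "age": [34, 51]})

covariates = "+".join(["genotype", "age"])
print(missing_covariates(covariates, genotypes))
print(missing_covariates("genotype+sex", genotypes))
```

An empty list means every covariate has a matching header.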

pyPheWAS.pyPhewasv2.generate_feature_matrix(genotypes, phenotypes, reg_type, phewas_cov='')

Generates the feature matrix that will be used to run the regressions.

Parameters:

genotypes –
phenotypes –
Returns:
Return type:

pyPheWAS.pyPhewasv2.generate_icdfeature_matrix(genotypes, phenotypes, reg_type, phewas_cov='')

Generates the feature matrix that will be used to run the regressions.

Parameters:

genotypes –
phenotypes –
Returns:
Return type:

pyPheWAS.pyPhewasv2.get_bhy_thresh(p_values, power)

Calculate the false discovery rate threshold.

Parameters:

p_values (numpy array) – a list of p-values obtained by executing the regression
power (float) – the threshold power being used (usually 0.05)

Returns:	the false discovery rate threshold
Return type:	float
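The function name suggests the Benjamini-Hochberg-Yekutieli (BHY) procedure, which controls the false discovery rate under arbitrary dependence between tests. That reading is an assumption, and the sketch below is an independent illustration of the BHY step-up rule rather than the pyPheWAS implementation. The name `bhy_threshold` and the example p-values are hypothetical.

```python
import numpy as np


def bhy_threshold(p_values, power=0.05):
    """Largest p-value that passes the BHY step-up test (hypothetical helper)."""
    p = np.asarray(p_values, dtype=float)
    p = np.sort(p[np.isfinite(p)])            # drop NaN results, sort ascending
    m = p.size
    if m == 0:
        return float("nan")
    ranks = np.arange(1, m + 1)
    c_m = np.sum(1.0 / ranks)                 # harmonic correction term c(m)
    critical = power * ranks / (m * c_m)      # per-rank critical values
    passing = np.nonzero(p <= critical)[0]
    # If no p-value passes, return 0.0 so that nothing is called significant.
    return float(p[passing[-1]]) if passing.size else 0.0


p_values = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, np.nan])
threshold = bhy_threshold(p_values, power=0.05)
print(threshold)
```

P-values at or below the returned threshold would be treated as significant under this rule.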

pyPheWAS.pyPhewasv2.get_bon_thresh(normalized, power)

Calculate the Bonferroni correction threshold.

Divide the power by the number of finite values (all non-NaN values).

Parameters:

normalized (numpy array) – an array of all normalized p-values. Normalized p-values are -log10(p) where p is the p-value.
power (float) – the threshold power being used (usually 0.05)

Returns:	The Bonferroni correction threshold
Return type:	float
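The rule described above can be sketched in a few lines of NumPy. This is an illustration under the stated description, not the library's source. The name `bonferroni_threshold` and the example values are hypothetical.

```python
import numpy as np


def bonferroni_threshold(normalized, power=0.05):
    """Divide `power` by the number of finite entries in `normalized`."""
    n_tests = int(np.sum(np.isfinite(normalized)))  # NaN entries are not counted
    if n_tests == 0:
        return float("nan")                         # no usable regressions
    return power / n_tests


# Five regressions, two of which failed and produced NaN.
normalized = np.array([1.2, np.nan, 3.5, 0.4, np.nan])
threshold = bonferroni_threshold(normalized, power=0.05)
print(threshold)  # 0.05 divided by the 3 finite values
```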

pyPheWAS.pyPhewasv2.get_codes()

Gets the PheWAS codes from a local CSV file and loads them into a pandas DataFrame.

Returns:	All of the codes from the resource file.
Return type:	pandas DataFrame

pyPheWAS.pyPhewasv2.get_fdr_thresh(p_values, power)

Calculate the false discovery rate threshold.

Parameters:

p_values (numpy array) – a list of p-values obtained by executing the regression
power (float) – the threshold power being used (usually 0.05)

Returns:	the false discovery rate threshold
Return type:	float
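A common way to obtain a false discovery rate threshold is the Benjamini-Hochberg step-up procedure. The sketch below assumes that procedure. It is not taken from the pyPheWAS source, and the name `bh_threshold` and the example p-values are hypothetical.

```python
import numpy as np


def bh_threshold(p_values, power=0.05):
    """Largest p-value that passes the Benjamini-Hochberg step-up test."""
    p = np.asarray(p_values, dtype=float)
    p = np.sort(p[np.isfinite(p)])               # drop NaN results, sort ascending
    m = p.size
    if m == 0:
        return float("nan")
    critical = power * np.arange(1, m + 1) / m   # rank k gets power * k / m
    passing = np.nonzero(p <= critical)[0]
    # If no p-value passes, return 0.0 so that nothing is called significant.
    return float(p[passing[-1]]) if passing.size else 0.0


p_values = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, np.nan])
fdr = bh_threshold(p_values, power=0.05)
print(fdr)
```

With these eight finite p-values, only the two smallest fall under their rank-specific critical values, so the threshold is the second smallest p-value.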

pyPheWAS.pyPhewasv2.get_group_file(path, filename)

Read all of the genotype data from the given file and load it into a pandas DataFrame.

Parameters:

path (string) – The path to the file that contains the genotype data
filename (string) – The name of the file that contains the genotype data.

Returns:	The data from the genotype file.
Return type:	pandas DataFrame
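Loading a CSV file from a directory and a file name can be sketched with `pandas.read_csv`. The helper name `read_table`, the file name `group.csv`, and its columns are hypothetical, and the actual function may apply additional processing that is not documented here.

```python
import os
import tempfile

import pandas as pd


def read_table(path, filename):
    """Read a CSV file located at path/filename into a DataFrame."""
    return pd.read_csv(os.path.join(path, filename))


# Write a small hypothetical genotype file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "group.csv"), "w") as handle:
        handle.write("id,genotype,age\n1,0,34\n2,1,51\n")
    genotypes = read_table(tmp, "group.csv")

print(genotypes.columns.tolist())
```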

pyPheWAS.pyPhewasv2.get_icd_info(i_index)

Returns all of the info of the phewas code at the given index.

Parameters:	i_index (int) – The index of the desired phewas code
Returns:	A list including the code, the name, and the rollup of the phewas code. The rollup is a list of all of the ICD-9 codes that are grouped into this phewas code.
Return type:	list of strings

pyPheWAS.pyPhewasv2.get_imbalances(regressions)

Generates a numpy array of the imbalances.

For a value x where x is the beta of a regression:

x < 0	-1	The regression had a negative beta value
x = nan	0	The regression had a nan beta value (and a nan p-value)
x > 0	+1	The regression had a positive beta value

These values are then used to get the correct colors using the imbalance_colors.

Parameters:	regressions (pandas DataFrame) – DataFrame containing a variety of different output values from the regressions performed. The only values used by this function are the ‘beta’ values.
Returns:	An array with one element per regression performed. Each element is -1, 0, or +1, as explained above.
Return type:	numpy array
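The mapping from beta values to -1, 0, and +1 can be sketched with NumPy comparisons. This is an illustration of the table above rather than the library's code. The name `imbalance_signs` is hypothetical, and it takes the beta values directly instead of the full regressions DataFrame.

```python
import numpy as np


def imbalance_signs(betas):
    """Map each beta to -1 (negative), +1 (positive), or 0 (NaN)."""
    betas = np.asarray(betas, dtype=float)
    # Comparisons against NaN are False, so NaN betas end up as 0.
    # A beta of exactly 0 also maps to 0; the table above does not cover that case.
    return (betas > 0).astype(int) - (betas < 0).astype(int)


betas = np.array([-0.3, np.nan, 1.2, 0.01])
print(imbalance_signs(betas).tolist())
```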

pyPheWAS.pyPhewasv2.get_input(path, filename, reg_type)

Read all of the phenotype data from the given file and load it into a pandas DataFrame.

Parameters:

path (string) – The path to the file that contains the phenotype data
filename (string) – The name of the file that contains the phenotype data.
Returns:	The data from the phenotype file.
Return type:	pandas DataFrame

pyPheWAS.pyPhewasv2.get_phewas_info(p_index)

Returns all of the info of the phewas code at the given index.

Parameters:	p_index (int) – The index of the desired phewas code
Returns:	A list including the code, the name, and the rollup of the phewas code. The rollup is a list of all of the ICD-9 codes that are grouped into this phewas code.
Return type:	list of strings

pyPheWAS.pyPhewasv2.get_x_label_positions(categories, lines=True)

This method is used to get the positions of the x-labels and the lines between the columns.

Parameters:

categories – list of the categories
lines (bool) – a boolean which determines the locations returned (either the center of each category or the end)
Returns:	A list of positions
Return type:	list of ints
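One way to compute such positions is to count how many points fall into each category and accumulate offsets. The sketch below makes several assumptions: points are already grouped by category along the x-axis, `categories` holds one label per point, and a hypothetical `where` argument stands in for the `lines` flag, because this page does not state which value of `lines` selects which location.

```python
from collections import Counter


def category_positions(categories, where="center"):
    """Return the x position of the center or the end of each category block."""
    counts = Counter(categories)     # keeps first-seen order on Python 3.7+
    positions = []
    start = 0                        # x position where the current block begins
    for count in counts.values():
        offset = count // 2 if where == "center" else count
        positions.append(start + offset)
        start += count
    return positions


categories = ["a"] * 4 + ["b"] * 2 + ["c"] * 6
print(category_positions(categories, where="center"))
print(category_positions(categories, where="end"))
```

Center positions suit tick labels, while end positions suit separator lines between category blocks.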

pyPheWAS.pyPhewasv2.phewas(path, filename, groupfile, covariates, response='', phewas_cov='', reg_type=0, thresh_type=0, control_age=0, save='', saveb='', output='', show_imbalance=False)

The main phewas method. Takes a path, filename, groupfile, and a variety of different options.

Parameters:

path (str) – the path to the file that contains the phenotype data
filename (str) – the name of the phenotype file.
groupfile (str) – the name of the genotype file.
covariates (str) – a ‘+’-delimited string of covariates.
reg_type (int) – the type of regression to be used
thresh_type (int) – the type of threshold to be used
save (str) – the desired filename to save the phewas plot
output (str) – the desired filename to save the regression output
show_imbalance (bool) – determines whether or not to show the imbalance

pyPheWAS.pyPhewasv2.plot_data_points(x, y, thresh0, thresh1, thresh2, thresh_type, save='', path='', imbalances=array([], dtype=float64))

Plots the data with a variety of different options.

This function is the primary plotting function for pyPhewas.

Parameters:

x (numpy array) – an array of indices
y (numpy array) – an array of p-values
thresh (float) – the threshold power
save (str) – the output file to save to (if empty, display the plot)
imbalances (numpy array) – a list of imbalances

pyPheWAS.pyPhewasv2.plot_odds_ratio(y, p, thresh0, thresh1, thresh2, thresh_type, save='', path='', imbalances=array([], dtype=float64))

Plots the data with a variety of different options.

This function is the primary plotting function for pyPhewas.

Parameters:

x (numpy array) – an array of indices
y (numpy array) – an array of p-values
thresh (float) – the threshold power
save (str) – the output file to save to (if empty, display the plot)
imbalances (numpy array) – a list of imbalances

pyPheWAS.pyPhewasv2.run_icd_phewas(fm, genotypes, covariates, reg_type, response='', phewas_cov='')

For each phewas code in the feature matrix, run the specified type of regression and save all of the resulting p-values.

Parameters:

fm – The phewas feature matrix.
genotypes – A pandas DataFrame of the genotype file.
covariates – The covariates that the function is to be run on.
Returns:	A tuple containing indices, p-values, and all the regression data.

pyPheWAS.pyPhewasv2.run_phewas(fm, genotypes, covariates, reg_type, response='', phewas_cov='')

For each phewas code in the feature matrix, run the specified type of regression and save all of the resulting p-values.

Parameters:

fm – The phewas feature matrix.
genotypes – A pandas DataFrame of the genotype file.
covariates – The covariates that the function is to be run on.
Returns:	A tuple containing indices, p-values, and all the regression data.

`pyPheWAS.pyPhewasCore` – pyPhewas Research Tools file

pyPheWAS.pyPhewasCore.calculate_odds_ratio(genotypes, phen_vector1, phen_vector2, reg_type, covariates, response='', phen_vector3='')

Runs the regression for a specific phenotype vector relative to the genotype data and covariates.

Parameters:

genotypes (pandas DataFrame) – a DataFrame containing the genotype information
phen_vector (numpy array) – an array containing the phenotype vector
covariates (string) – a string containing all desired covariates

Note

The covariates must be a string that is delimited by ‘+’, not a list. If you are using a list of covariates and would like to convert it to the pyPhewas format, use the following:

l = ['genotype', 'age'] # a list of your covariates
covariates = '+'.join(l) # pyPhewas format

The covariates listed here must be column headers in your genotype CSV file.

pyPheWAS.pyPhewasCore.generate_feature_matrix(genotypes, phenotypes, reg_type)

Generates the feature matrix that will be used to run the regressions.

Parameters:

genotypes –
phenotypes –
Returns:
Return type:

pyPheWAS.pyPhewasCore.get_bon_thresh(normalized, power)

Calculate the Bonferroni correction threshold.

Divide the power by the number of finite values (all non-NaN values).

Parameters:

normalized (numpy array) – an array of all normalized p-values. Normalized p-values are -log10(p) where p is the p-value.
power (float) – the threshold power being used (usually 0.05)

Returns:	The Bonferroni correction threshold
Return type:	float

pyPheWAS.pyPhewasCore.get_codes()

Gets the PheWAS codes from a local CSV file and loads them into a pandas DataFrame.

Returns:	All of the codes from the resource file.
Return type:	pandas DataFrame

pyPheWAS.pyPhewasCore.get_fdr_thresh(p_values, power)

Calculate the false discovery rate threshold.

Parameters:

p_values (numpy array) – a list of p-values obtained by executing the regression
power (float) – the threshold power being used (usually 0.05)

Returns:	the false discovery rate threshold
Return type:	float

pyPheWAS.pyPhewasCore.get_group_file(path, filename)

Read all of the genotype data from the given file and load it into a pandas DataFrame.

Parameters:

path (string) – The path to the file that contains the genotype data
filename (string) – The name of the file that contains the genotype data.

Returns:	The data from the genotype file.
Return type:	pandas DataFrame

pyPheWAS.pyPhewasCore.get_imbalances(regressions)

Generates a numpy array of the imbalances.

For a value x where x is the beta of a regression:

x < 0	-1	The regression had a negative beta value
x = nan	0	The regression had a nan beta value (and a nan p-value)
x > 0	+1	The regression had a positive beta value

These values are then used to get the correct colors using the imbalance_colors.

Parameters:	regressions (pandas DataFrame) – DataFrame containing a variety of different output values from the regressions performed. The only values used by this function are the ‘beta’ values.
Returns:	An array with one element per regression performed. Each element is -1, 0, or +1, as explained above.
Return type:	numpy array

pyPheWAS.pyPhewasCore.get_input(path, filename, reg_type)

Read all of the phenotype data from the given file and load it into a pandas DataFrame.

Parameters:

path (string) – The path to the file that contains the phenotype data
filename (string) – The name of the file that contains the phenotype data.
Returns:	The data from the phenotype file.
Return type:	pandas DataFrame

pyPheWAS.pyPhewasCore.get_phewas_info(p_index)

Returns all of the info of the phewas code at the given index.

Parameters:	p_index (int) – The index of the desired phewas code
Returns:	A list including the code, the name, and the rollup of the phewas code. The rollup is a list of all of the ICD-9 codes that are grouped into this phewas code.
Return type:	list of strings

pyPheWAS.pyPhewasCore.get_x_label_positions(categories, lines=True)

This method is used to get the positions of the x-labels and the lines between the columns.

Parameters:

categories – list of the categories
lines (bool) – a boolean which determines the locations returned (either the center of each category or the end)
Returns:	A list of positions
Return type:	list of ints

pyPheWAS.pyPhewasCore.plot_data_points(y, thresh, save='', imbalances=array([], dtype=float64))

Plots the data with a variety of different options.

This function is the primary plotting function for pyPhewas.

Parameters:

x (numpy array) – an array of indices
y (numpy array) – an array of p-values
thresh (float) – the threshold power
save (str) – the output file to save to (if empty, display the plot)
imbalances (numpy array) – a list of imbalances

pyPheWAS.pyPhewasCore.run_phewas(fm, genotypes, covariates, reg_type)

For each phewas code in the feature matrix, run the specified type of regression and save all of the resulting p-values.

Parameters:

fm – The phewas feature matrix.
genotypes – A pandas DataFrame of the genotype file.
covariates – The covariates that the function is to be run on.
Returns:	A tuple containing indices, p-values, and all the regression data.
