Source Documentation

pyPheWAS – Root package

pyPheWAS.pyPhewasv2 – pyPhewas functions file

pyPheWAS.pyPhewasv2.calculate_odds_ratio(genotypes, phen_vector1, phen_vector2, reg_type, covariates, lr=0, response='', phen_vector3='')[source]

Runs the regression for a specific phenotype vector relative to the genotype data and covariates.

Parameters:
  • genotypes (pandas DataFrame) – a DataFrame containing the genotype information
  • phen_vector (numpy array) – an array containing the phenotype vector
  • covariates (string) – a string containing all desired covariates

Note

The covariates must be a string that is delimited by ‘+’, not a list. If you are using a list of covariates and would like to convert it to the pyPhewas format, use the following:

l = ['genotype', 'age'] # a list of your covariates
covariates = '+'.join(l) # pyPhewas format

The covariates that are listed here must be headers to your genotype CSV file.
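
As a minimal sketch of the note above, the following builds the covariate string from a list and checks that every covariate appears among the CSV headers. The helper name check_covariates, the in-memory CSV, and its column names are hypothetical and are not part of pyPheWAS.

```python
import csv
import io


def check_covariates(covariates, csv_file):
    """Return the covariates that are missing from the CSV header row.

    Hypothetical helper for illustration; not part of pyPheWAS.
    """
    header = next(csv.reader(csv_file))
    wanted = covariates.split('+')
    return [name for name in wanted if name not in header]


# Stand-in for a genotype CSV file (the column names are made up).
genotype_csv = "id,genotype,age\n1,0,54\n2,1,61\n"

l = ['genotype', 'age']        # a list of your covariates
covariates = '+'.join(l)       # pyPhewas format

# Every covariate is a header, so nothing is reported as missing.
missing = check_covariates(covariates, io.StringIO(genotype_csv))

# 'sex' is not a header in the stand-in file, so it is reported.
missing_bad = check_covariates('genotype+sex', io.StringIO(genotype_csv))
```

Running a check like this before the regressions gives a clearer error than a failure deep inside the regression call.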

pyPheWAS.pyPhewasv2.generate_feature_matrix(genotypes, phenotypes, reg_type, phewas_cov='')[source]

Generates the feature matrix that will be used to run the regressions.

Parameters:
  • genotypes
  • phenotypes
Returns:

Return type:

pyPheWAS.pyPhewasv2.generate_icdfeature_matrix(genotypes, phenotypes, reg_type, phewas_cov='')[source]

Generates the feature matrix that will be used to run the regressions.

Parameters:
  • genotypes
  • phenotypes
Returns:

Return type:

pyPheWAS.pyPhewasv2.get_bhy_thresh(p_values, power)[source]

Calculate the false discovery rate threshold.

Parameters:
  • p_values (numpy array) – a list of p-values obtained by executing the regression
  • power (float) – the threshold power being used (usually 0.05)
Returns:

the false discovery rate

Return type:

float
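
The function name suggests the Benjamini-Hochberg-Yekutieli procedure, although the source does not say so. Under that assumption, a minimal standard-library sketch of such a threshold looks like this. The name bhy_threshold, the handling of nan values, and the 0.0 returned when no p-value passes are all assumptions made for illustration.

```python
import math


def bhy_threshold(p_values, power=0.05):
    """Largest p-value that passes the Benjamini-Hochberg-Yekutieli test.

    Hypothetical illustration; not the pyPheWAS implementation.
    """
    # Drop nan entries (regressions that produced no p-value) and sort.
    finite = sorted(p for p in p_values if not math.isnan(p))
    m = len(finite)
    if m == 0:
        return 0.0
    # Harmonic correction factor c(m) = 1 + 1/2 + ... + 1/m.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    thresh = 0.0
    for k, p in enumerate(finite, start=1):
        if p <= k * power / (m * c_m):
            thresh = p  # keep the largest p-value that passes
    return thresh


p_vals = [0.001, 0.008, 0.039, 0.041, float('nan'), 0.6]
bhy = bhy_threshold(p_vals, 0.05)
```

With these made-up p-values only the two smallest pass, so the threshold is the second smallest p-value.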

pyPheWAS.pyPhewasv2.get_bon_thresh(normalized, power)[source]

Calculate the Bonferroni correction threshold.

Divide the power by the number of finite (non-nan) values.

Parameters:
  • normalized (numpy array) – an array of all normalized p-values. Normalized p-values are -log10(p) where p is the p-value.
  • power (float) – the threshold power being used (usually 0.05)
Returns:

The Bonferroni correction threshold

Return type:

float
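
The calculation described above can be sketched with the standard library alone. The sketch reads the divisor as the count of finite entries, which is an assumption, and bonferroni_threshold is a hypothetical name rather than the library function.

```python
import math


def bonferroni_threshold(normalized, power=0.05):
    """Divide power by the number of finite entries in `normalized`.

    Hypothetical illustration; assumes at least one finite entry.
    """
    n_tests = sum(1 for v in normalized if math.isfinite(v))
    return power / n_tests


p_values = [0.5, 0.01, float('nan'), 0.2, 0.04]

# Normalized p-values are -log10(p); a nan p-value stays nan.
normalized = [-math.log10(p) for p in p_values]

# Four of the five entries are finite, so the divisor is 4.
bon = bonferroni_threshold(normalized, 0.05)
```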

pyPheWAS.pyPhewasv2.get_codes()[source]

Gets the PheWAS codes from a local CSV file and loads them into a pandas DataFrame.

Returns: All of the codes from the resource file.
Return type: pandas DataFrame

pyPheWAS.pyPhewasv2.get_fdr_thresh(p_values, power)[source]

Calculate the false discovery rate threshold.

Parameters:
  • p_values (numpy array) – a list of p-values obtained by executing the regression
  • power (float) – the threshold power being used (usually 0.05)
Returns:

the false discovery rate

Return type:

float
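
The source does not state which false discovery rate procedure is used. A common choice is the Benjamini-Hochberg step-up procedure, sketched here under that assumption. The name fdr_threshold and the 0.0 returned when no p-value passes are hypothetical.

```python
import math


def fdr_threshold(p_values, power=0.05):
    """Largest p-value p_(k) with p_(k) <= k * power / m.

    Hypothetical Benjamini-Hochberg illustration; not the pyPheWAS code.
    """
    # Drop nan entries and rank the remaining p-values in ascending order.
    finite = sorted(p for p in p_values if not math.isnan(p))
    m = len(finite)
    thresh = 0.0
    for k, p in enumerate(finite, start=1):
        if p <= k * power / m:
            thresh = p  # keep the largest p-value that passes
    return thresh


p_vals = [0.7, 0.004, float('nan'), 0.028, 0.012, 0.039]
fdr = fdr_threshold(p_vals, 0.05)
```

Here the four smallest of the made-up p-values each fall under their rank-scaled limit, so the fourth smallest becomes the threshold.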

pyPheWAS.pyPhewasv2.get_group_file(path, filename)[source]

Read all of the genotype data from the given file and load it into a pandas DataFrame.

Parameters:
  • path (string) – The path to the file that contains the genotype data
  • filename (string) – The name of the file that contains the genotype data.
Returns:

The data from the genotype file.

Return type:

pandas DataFrame

pyPheWAS.pyPhewasv2.get_icd_info(i_index)[source]

Returns all of the info of the phewas code at the given index.

Parameters: i_index (int) – The index of the desired phewas code
Returns: A list including the code, the name, and the rollup of the phewas code. The rollup is a list of all of the ICD-9 codes that are grouped into this phewas code.
Return type: list of strings

pyPheWAS.pyPhewasv2.get_imbalances(regressions)[source]

Generates a numpy array of the imbalances.

For a value x where x is the beta of a regression:

x         Imbalance   Meaning
x < 0     -1          The regression had a negative beta value
x = nan   0           The regression had a nan beta value (and a nan p-value)
x > 0     +1          The regression had a positive beta value

These values are then used to get the correct colors using the imbalance_colors.
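
The mapping above can be sketched in plain Python as follows. The function name imbalances_from_betas is hypothetical, the library returns a numpy array rather than a list, and the treatment of a beta of exactly zero is an assumption because the source does not cover that case.

```python
import math


def imbalances_from_betas(betas):
    """Map each beta to -1, 0 or +1 following the mapping above.

    Hypothetical illustration; not the pyPheWAS implementation.
    """
    result = []
    for beta in betas:
        if math.isnan(beta):
            result.append(0)    # nan beta (and nan p-value)
        elif beta < 0:
            result.append(-1)   # negative beta
        elif beta > 0:
            result.append(1)    # positive beta
        else:
            result.append(0)    # beta == 0: not specified in the source, assumed 0
    return result


signs = imbalances_from_betas([-0.7, float('nan'), 1.3])
```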

Parameters: regressions (pandas DataFrame) – DataFrame containing a variety of different output values from the regressions performed. Only the ‘beta’ values are used by this function.
Returns: An array with one element per regression performed. Each element is -1, 0, or +1, as explained above.
Return type: numpy array

pyPheWAS.pyPhewasv2.get_input(path, filename, reg_type)[source]

Read all of the phenotype data from the given file and load it into a pandas DataFrame.

Parameters:
  • path (string) – The path to the file that contains the phenotype data
  • filename (string) – The name of the file that contains the phenotype data.
Returns:

The data from the phenotype file.

Return type:

pandas DataFrame

pyPheWAS.pyPhewasv2.get_phewas_info(p_index)[source]

Returns all of the info of the phewas code at the given index.

Parameters: p_index (int) – The index of the desired phewas code
Returns: A list including the code, the name, and the rollup of the phewas code. The rollup is a list of all of the ICD-9 codes that are grouped into this phewas code.
Return type: list of strings

pyPheWAS.pyPhewasv2.get_x_label_positions(categories, lines=True)[source]

This method is used to get the positions of the x-labels and the lines between the columns.

Parameters:
  • categories – list of the categories
  • lines (bool) – a boolean which determines the locations returned (either the center of each category or the end)
Returns:

A list of positions

Return type:

list of ints
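
One way to compute such positions is sketched below. It assumes that categories holds one label per plotted point, grouped by category, and that lines=True selects the center of each category while lines=False selects its end. The source does not state which value of lines maps to which behaviour, so that mapping and the name x_label_positions are assumptions.

```python
from collections import Counter


def x_label_positions(categories, lines=True):
    """Return one x position per category, in first-seen order.

    Hypothetical illustration; not the pyPheWAS implementation.
    """
    counts = Counter(categories)  # keeps insertion order on Python 3.7+
    start = 0
    positions = []
    for size in counts.values():
        # Assumed mapping: center of the category if lines is True, else its end.
        offset = size // 2 if lines else size
        positions.append(start + offset)
        start += size
    return positions


cats = ['a'] * 4 + ['b'] * 2 + ['c'] * 6
centers = x_label_positions(cats, lines=True)
ends = x_label_positions(cats, lines=False)
```

For these made-up categories of sizes 4, 2 and 6, the centers fall at 2, 5 and 9 and the ends at 4, 6 and 12.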

pyPheWAS.pyPhewasv2.phewas(path, filename, groupfile, covariates, response='', phewas_cov='', reg_type=0, thresh_type=0, control_age=0, save='', saveb='', output='', show_imbalance=False)[source]

The main phewas method. Takes a path, filename, groupfile, and a variety of different options.

Parameters:
  • path (str) – the path to the file that contains the phenotype data
  • filename (str) – the name of the phenotype file.
  • groupfile (str) – the name of the genotype file.
  • covariates (str) – the covariates, as a single string delimited by ‘+’.
  • reg_type (int) – the type of regression to be used
  • thresh_type (int) – the type of threshold to be used
  • save (str) – the desired filename to save the phewas plot
  • output (str) – the desired filename to save the regression output
  • show_imbalance (bool) – determines whether or not to show the imbalance

pyPheWAS.pyPhewasv2.plot_data_points(x, y, thresh0, thresh1, thresh2, thresh_type, save='', path='', imbalances=array([], dtype=float64))[source]

Plots the data with a variety of different options.

This function is the primary plotting function for pyPhewas.

Parameters:
  • x (numpy array) – an array of indices
  • y (numpy array) – an array of p-values
  • thresh (float) – the threshold power
  • save (str) – the output file to save to (if empty, display the plot)
  • imbalances (numpy array) – a list of imbalances

pyPheWAS.pyPhewasv2.plot_odds_ratio(y, p, thresh0, thresh1, thresh2, thresh_type, save='', path='', imbalances=array([], dtype=float64))[source]

Plots the data with a variety of different options.

This function is the primary plotting function for pyPhewas.

Parameters:
  • x (numpy array) – an array of indices
  • y (numpy array) – an array of p-values
  • thresh (float) – the threshold power
  • save (str) – the output file to save to (if empty, display the plot)
  • imbalances (numpy array) – a list of imbalances

pyPheWAS.pyPhewasv2.run_icd_phewas(fm, genotypes, covariates, reg_type, response='', phewas_cov='')[source]

For each phewas code in the feature matrix, run the specified type of regression and save all of the resulting p-values.

Parameters:
  • fm – The phewas feature matrix.
  • genotypes – A pandas DataFrame of the genotype file.
  • covariates – The covariates that the function is to be run on.
Returns:

A tuple containing indices, p-values, and all the regression data.

pyPheWAS.pyPhewasv2.run_phewas(fm, genotypes, covariates, reg_type, response='', phewas_cov='')[source]

For each phewas code in the feature matrix, run the specified type of regression and save all of the resulting p-values.

Parameters:
  • fm – The phewas feature matrix.
  • genotypes – A pandas DataFrame of the genotype file.
  • covariates – The covariates that the function is to be run on.
Returns:

A tuple containing indices, p-values, and all the regression data.

pyPheWAS.pyPhewasCore – pyPhewas Research Tools file

pyPheWAS.pyPhewasCore.calculate_odds_ratio(genotypes, phen_vector1, phen_vector2, reg_type, covariates, response='', phen_vector3='')[source]

Runs the regression for a specific phenotype vector relative to the genotype data and covariates.

Parameters:
  • genotypes (pandas DataFrame) – a DataFrame containing the genotype information
  • phen_vector (numpy array) – an array containing the phenotype vector
  • covariates (string) – a string containing all desired covariates

Note

The covariates must be a string that is delimited by ‘+’, not a list. If you are using a list of covariates and would like to convert it to the pyPhewas format, use the following:

l = ['genotype', 'age'] # a list of your covariates
covariates = '+'.join(l) # pyPhewas format

The covariates that are listed here must be headers to your genotype CSV file.

pyPheWAS.pyPhewasCore.generate_feature_matrix(genotypes, phenotypes, reg_type)[source]

Generates the feature matrix that will be used to run the regressions.

Parameters:
  • genotypes
  • phenotypes
Returns:

Return type:

pyPheWAS.pyPhewasCore.get_bon_thresh(normalized, power)[source]

Calculate the Bonferroni correction threshold.

Divide the power by the number of finite (non-nan) values.

Parameters:
  • normalized (numpy array) – an array of all normalized p-values. Normalized p-values are -log10(p) where p is the p-value.
  • power (float) – the threshold power being used (usually 0.05)
Returns:

The Bonferroni correction threshold

Return type:

float

pyPheWAS.pyPhewasCore.get_codes()[source]

Gets the PheWAS codes from a local CSV file and loads them into a pandas DataFrame.

Returns: All of the codes from the resource file.
Return type: pandas DataFrame

pyPheWAS.pyPhewasCore.get_fdr_thresh(p_values, power)[source]

Calculate the false discovery rate threshold.

Parameters:
  • p_values (numpy array) – a list of p-values obtained by executing the regression
  • power (float) – the threshold power being used (usually 0.05)
Returns:

the false discovery rate

Return type:

float

pyPheWAS.pyPhewasCore.get_group_file(path, filename)[source]

Read all of the genotype data from the given file and load it into a pandas DataFrame.

Parameters:
  • path (string) – The path to the file that contains the genotype data
  • filename (string) – The name of the file that contains the genotype data.
Returns:

The data from the genotype file.

Return type:

pandas DataFrame

pyPheWAS.pyPhewasCore.get_imbalances(regressions)[source]

Generates a numpy array of the imbalances.

For a value x where x is the beta of a regression:

x         Imbalance   Meaning
x < 0     -1          The regression had a negative beta value
x = nan   0           The regression had a nan beta value (and a nan p-value)
x > 0     +1          The regression had a positive beta value

These values are then used to get the correct colors using the imbalance_colors.

Parameters: regressions (pandas DataFrame) – DataFrame containing a variety of different output values from the regressions performed. Only the ‘beta’ values are used by this function.
Returns: An array with one element per regression performed. Each element is -1, 0, or +1, as explained above.
Return type: numpy array

pyPheWAS.pyPhewasCore.get_input(path, filename, reg_type)[source]

Read all of the phenotype data from the given file and load it into a pandas DataFrame.

Parameters:
  • path (string) – The path to the file that contains the phenotype data
  • filename (string) – The name of the file that contains the phenotype data.
Returns:

The data from the phenotype file.

Return type:

pandas DataFrame

pyPheWAS.pyPhewasCore.get_phewas_info(p_index)[source]

Returns all of the info of the phewas code at the given index.

Parameters: p_index (int) – The index of the desired phewas code
Returns: A list including the code, the name, and the rollup of the phewas code. The rollup is a list of all of the ICD-9 codes that are grouped into this phewas code.
Return type: list of strings

pyPheWAS.pyPhewasCore.get_x_label_positions(categories, lines=True)[source]

This method is used to get the positions of the x-labels and the lines between the columns.

Parameters:
  • categories – list of the categories
  • lines (bool) – a boolean which determines the locations returned (either the center of each category or the end)
Returns:

A list of positions

Return type:

list of ints

pyPheWAS.pyPhewasCore.plot_data_points(y, thresh, save='', imbalances=array([], dtype=float64))[source]

Plots the data with a variety of different options.

This function is the primary plotting function for pyPhewas.

Parameters:
  • x (numpy array) – an array of indices
  • y (numpy array) – an array of p-values
  • thresh (float) – the threshold power
  • save (str) – the output file to save to (if empty, display the plot)
  • imbalances (numpy array) – a list of imbalances

pyPheWAS.pyPhewasCore.run_phewas(fm, genotypes, covariates, reg_type)[source]

For each phewas code in the feature matrix, run the specified type of regression and save all of the resulting p-values.

Parameters:
  • fm – The phewas feature matrix.
  • genotypes – A pandas DataFrame of the genotype file.
  • covariates – The covariates that the function is to be run on.
Returns:

A tuple containing indices, p-values, and all the regression data.