Source Documentation
pyPheWAS.pyPhewasv2 – pyPhewas functions file
-
pyPheWAS.pyPhewasv2.calculate_odds_ratio(genotypes, phen_vector1, phen_vector2, reg_type, covariates, lr=0, response='', phen_vector3='')
Runs the regression for a specific phenotype vector relative to the genotype data and covariates.
Parameters: - genotypes (pandas DataFrame) – a DataFrame containing the genotype information
- phen_vector (numpy array) – an array containing the phenotype vector
- covariates (string) – a string containing all desired covariates
Note
The covariates must be a string that is delimited by ‘+’, not a list. If you are using a list of covariates and would like to convert it to the pyPhewas format, use the following:
l = ['genotype', 'age']  # a list of your covariates
covariates = '+'.join(l)  # pyPhewas format
The covariates that are listed here must be headers to your genotype CSV file.
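The two requirements in the note (a ‘+’-delimited string, and every covariate present as a header in the genotype file) can be checked together before running a regression. The following is a minimal sketch, not part of pyPheWAS: `build_covariate_string` is a hypothetical helper, and the small DataFrame stands in for a genotype CSV that has already been loaded.

```python
import pandas as pd


def build_covariate_string(covariate_list, genotypes):
    """Hypothetical helper: join covariates with '+' after checking headers.

    Raises ValueError if any covariate is not a column of `genotypes`.
    """
    missing = [c for c in covariate_list if c not in genotypes.columns]
    if missing:
        raise ValueError("covariates missing from genotype data: " + ", ".join(missing))
    return "+".join(covariate_list)


# Stand-in for a loaded genotype file (illustrative values only).
genotypes = pd.DataFrame(
    {"id": [1, 2, 3], "genotype": [0, 1, 1], "age": [34, 51, 47]}
)

covariates = build_covariate_string(["genotype", "age"], genotypes)
print(covariates)
```

Failing early on a missing header avoids a less readable error from inside the regression call.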
-
pyPheWAS.pyPhewasv2.generate_feature_matrix(genotypes, phenotypes, reg_type, phewas_cov='')
Generates the feature matrix that will be used to run the regressions.
Parameters: - genotypes –
- phenotypes –
Returns: Return type:
-
pyPheWAS.pyPhewasv2.generate_icdfeature_matrix(genotypes, phenotypes, reg_type, phewas_cov='')
Generates the feature matrix that will be used to run the regressions.
Parameters: - genotypes –
- phenotypes –
Returns: Return type:
-
pyPheWAS.pyPhewasv2.get_bhy_thresh(p_values, power)
Calculate the false discovery rate threshold.
Parameters: - p_values (numpy array) – a list of p-values obtained by executing the regression
- power (float) – the threshold power being used (usually 0.05)
Returns: the false discovery rate
Return type: float
-
pyPheWAS.pyPhewasv2.get_bon_thresh(normalized, power)
Calculate the Bonferroni correction threshold.
Divide the power by the number of finite (non-nan) values.
Parameters: - normalized (numpy array) – an array of all normalized p-values. Normalized p-values are -log10(p) where p is the p-value.
- power (float) – the threshold power being used (usually 0.05)
Returns: The Bonferroni correction threshold
Return type: float
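The calculation described for get_bon_thresh can be sketched in a few lines. This is an illustration under the assumption that the divisor is the count of finite entries in the normalized array; `bonferroni_threshold` is a hypothetical name, not the pyPheWAS function itself.

```python
import numpy as np


def bonferroni_threshold(normalized, power=0.05):
    """Sketch: power divided by the number of finite normalized p-values."""
    n_tests = int(np.sum(np.isfinite(normalized)))
    if n_tests == 0:
        raise ValueError("no finite values to correct over")
    return power / n_tests


# A failed regression yields nan, which stays nan after normalization
# and is therefore excluded from the count of tests.
p_values = np.array([0.01, 0.2, np.nan, 0.0005])
normalized = -np.log10(p_values)

thresh = bonferroni_threshold(normalized, 0.05)
print(thresh)
```

Here three of the four entries are finite, so the threshold is 0.05 / 3.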
-
pyPheWAS.pyPhewasv2.get_codes()
Gets the PheWAS codes from a local csv file and loads them into a pandas DataFrame.
Returns: All of the codes from the resource file.
Return type: pandas DataFrame
-
pyPheWAS.pyPhewasv2.get_fdr_thresh(p_values, power)
Calculate the false discovery rate threshold.
Parameters: - p_values (numpy array) – a list of p-values obtained by executing the regression
- power (float) – the threshold power being used (usually 0.05)
Returns: the false discovery rate
Return type: float
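The documentation does not say which false discovery rate procedure is used. As a point of reference only, the sketch below implements the standard Benjamini-Hochberg step-up rule; whether get_fdr_thresh follows exactly this rule is an assumption, and `bh_fdr_threshold` is a hypothetical name.

```python
import numpy as np


def bh_fdr_threshold(p_values, power=0.05):
    """Sketch of the Benjamini-Hochberg threshold (not the pyPheWAS code).

    Returns the largest sorted p-value p_(k) with p_(k) <= (k / m) * power,
    or 0.0 when no p-value passes.
    """
    p = np.asarray(p_values, dtype=float)
    p = np.sort(p[np.isfinite(p)])  # drop nan results, sort ascending
    m = p.size
    if m == 0:
        return 0.0
    critical = power * np.arange(1, m + 1) / m  # (k / m) * power for k = 1..m
    passing = np.nonzero(p <= critical)[0]
    if passing.size == 0:
        return 0.0
    return float(p[passing[-1]])


fdr = bh_fdr_threshold([0.001, 0.008, 0.039, 0.041, 0.6, np.nan], 0.05)
print(fdr)
```

With five finite p-values the critical values are 0.01, 0.02, 0.03, 0.04 and 0.05, so only the two smallest p-values pass and the threshold is 0.008.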
-
pyPheWAS.pyPhewasv2.get_group_file(path, filename)
Read all of the genotype data from the given file and load it into a pandas DataFrame.
Parameters: - path (string) – The path to the file that contains the genotype data
- filename (string) – The name of the file that contains the genotype data.
Returns: The data from the genotype file.
Return type: pandas DataFrame
-
pyPheWAS.pyPhewasv2.get_icd_info(i_index)
Returns all of the info of the phewas code at the given index.
Parameters: i_index (int) – The index of the desired phewas code
Returns: A list including the code, the name, and the rollup of the phewas code. The rollup is a list of all of the ICD-9 codes that are grouped into this phewas code.
Return type: list of strings
-
pyPheWAS.pyPhewasv2.get_imbalances(regressions)
Generates a numpy array of the imbalances.
For a value x where x is the beta of a regression:
- x < 0: -1. The regression had a negative beta value.
- x = nan: 0. The regression had a nan beta value (and a nan p-value).
- x > 0: +1. The regression had a positive beta value.
These values are then used to get the correct colors using the imbalance_colors.
Parameters: regressions (pandas DataFrame) – DataFrame containing a variety of different output values from the regression performed. Only the ‘beta’ values are used by this function.
Returns: A list that is the length of the number of regressions performed. Each element in the list is either a -1, 0, or +1. These are used as explained above.
Return type: numpy array
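The mapping from beta values to -1, 0 and +1 can be sketched as follows. This is an illustration rather than the pyPheWAS implementation: `imbalances_from_betas` is a hypothetical name, and it assumes the regression output stores the coefficients in a column named 'beta'.

```python
import numpy as np
import pandas as pd


def imbalances_from_betas(regressions):
    """Sketch: sign of each beta, with nan mapped to 0."""
    betas = regressions["beta"].to_numpy(dtype=float)
    # np.sign returns nan for nan input, so nan is replaced explicitly.
    # A beta of exactly 0 also maps to 0; the description does not cover that case.
    signs = np.where(np.isnan(betas), 0.0, np.sign(betas))
    return signs.astype(int)


# Illustrative regression output (values are made up).
regressions = pd.DataFrame(
    {"beta": [-0.3, np.nan, 1.2], "p-val": [0.04, np.nan, 0.001]}
)

imbalances = imbalances_from_betas(regressions)
print(imbalances)
```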
-
pyPheWAS.pyPhewasv2.get_input(path, filename, reg_type)
Read all of the phenotype data from the given file and load it into a pandas DataFrame.
Parameters: - path (string) – The path to the file that contains the phenotype data
- filename (string) – The name of the file that contains the phenotype data.
Returns: The data from the phenotype file.
Return type: pandas DataFrame
-
pyPheWAS.pyPhewasv2.get_phewas_info(p_index)
Returns all of the info of the phewas code at the given index.
Parameters: p_index (int) – The index of the desired phewas code
Returns: A list including the code, the name, and the rollup of the phewas code. The rollup is a list of all of the ICD-9 codes that are grouped into this phewas code.
Return type: list of strings
-
pyPheWAS.pyPhewasv2.get_x_label_positions(categories, lines=True)
This method is used to get the positions of the x-labels and of the lines between the columns.
Parameters: - categories – list of the categories
- lines (bool) – a boolean which determines the locations returned (either the center of each category or the end)
Returns: A list of positions
Return type: list of ints
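One way to read the description of get_x_label_positions is sketched below. It rests on assumptions the documentation does not confirm: that `categories` holds one category per plotted point with equal categories adjacent, that lines=True returns the index where each category ends, and that lines=False returns the center of each category. `label_positions` is a hypothetical name.

```python
from itertools import groupby


def label_positions(categories, lines=True):
    """Sketch: end (lines=True) or center (lines=False) of each category run."""
    positions = []
    start = 0
    for _, run in groupby(categories):
        length = len(list(run))
        end = start + length
        positions.append(end if lines else start + length // 2)
        start = end
    return positions


# Nine points in three contiguous categories of sizes 3, 2 and 4.
cats = [0, 0, 0, 1, 1, 2, 2, 2, 2]

ends = label_positions(cats, lines=True)
centers = label_positions(cats, lines=False)
print(ends, centers)
```

Under these assumptions the ends would place the separator lines and the centers would place the category labels on the x-axis.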
-
pyPheWAS.pyPhewasv2.phewas(path, filename, groupfile, covariates, response='', phewas_cov='', reg_type=0, thresh_type=0, control_age=0, save='', saveb='', output='', show_imbalance=False)
The main phewas method. Takes a path, filename, groupfile, and a variety of different options.
Parameters: - path (str) – the path to the file that contains the phenotype data
- filename (str) – the name of the phenotype file.
- groupfile (str) – the name of the genotype file.
- covariates (str) – the covariates, as a single string delimited by ‘+’.
- reg_type (int) – the type of regression to be used
- thresh_type (int) – the type of threshold to be used
- save (str) – the desired filename to save the phewas plot
- output (str) – the desired filename to save the regression output
- show_imbalance (bool) – determines whether or not to show the imbalance
-
pyPheWAS.pyPhewasv2.plot_data_points(x, y, thresh0, thresh1, thresh2, thresh_type, save='', path='', imbalances=array([], dtype=float64))
Plots the data with a variety of different options.
This function is the primary plotting function for pyPhewas.
Parameters: - x (numpy array) – an array of indices
- y (numpy array) – an array of p-values
- thresh (float) – the threshold power
- save (str) – the output file to save to (if empty, display the plot)
- imbalances (numpy array) – a list of imbalances
-
pyPheWAS.pyPhewasv2.plot_odds_ratio(y, p, thresh0, thresh1, thresh2, thresh_type, save='', path='', imbalances=array([], dtype=float64))
Plots the data with a variety of different options.
This function is the primary plotting function for pyPhewas.
Parameters: - x (numpy array) – an array of indices
- y (numpy array) – an array of p-values
- thresh (float) – the threshold power
- save (str) – the output file to save to (if empty, display the plot)
- imbalances (numpy array) – a list of imbalances
-
pyPheWAS.pyPhewasv2.run_icd_phewas(fm, genotypes, covariates, reg_type, response='', phewas_cov='')
For each phewas code in the feature matrix, run the specified type of regression and save all of the resulting p-values.
Parameters: - fm – The phewas feature matrix.
- genotypes – A pandas DataFrame of the genotype file.
- covariates – The covariates that the function is to be run on.
Returns: A tuple containing indices, p-values, and all the regression data.
-
pyPheWAS.pyPhewasv2.run_phewas(fm, genotypes, covariates, reg_type, response='', phewas_cov='')
For each phewas code in the feature matrix, run the specified type of regression and save all of the resulting p-values.
Parameters: - fm – The phewas feature matrix.
- genotypes – A pandas DataFrame of the genotype file.
- covariates – The covariates that the function is to be run on.
Returns: A tuple containing indices, p-values, and all the regression data.
pyPheWAS.pyPhewasCore – pyPhewas Research Tools file
-
pyPheWAS.pyPhewasCore.calculate_odds_ratio(genotypes, phen_vector1, phen_vector2, reg_type, covariates, response='', phen_vector3='')
Runs the regression for a specific phenotype vector relative to the genotype data and covariates.
Parameters: - genotypes (pandas DataFrame) – a DataFrame containing the genotype information
- phen_vector (numpy array) – an array containing the phenotype vector
- covariates (string) – a string containing all desired covariates
Note
The covariates must be a string that is delimited by ‘+’, not a list. If you are using a list of covariates and would like to convert it to the pyPhewas format, use the following:
l = ['genotype', 'age']  # a list of your covariates
covariates = '+'.join(l)  # pyPhewas format
The covariates that are listed here must be headers to your genotype CSV file.
-
pyPheWAS.pyPhewasCore.generate_feature_matrix(genotypes, phenotypes, reg_type)
Generates the feature matrix that will be used to run the regressions.
Parameters: - genotypes –
- phenotypes –
Returns: Return type:
-
pyPheWAS.pyPhewasCore.get_bon_thresh(normalized, power)
Calculate the Bonferroni correction threshold.
Divide the power by the number of finite (non-nan) values.
Parameters: - normalized (numpy array) – an array of all normalized p-values. Normalized p-values are -log10(p) where p is the p-value.
- power (float) – the threshold power being used (usually 0.05)
Returns: The Bonferroni correction threshold
Return type: float
-
pyPheWAS.pyPhewasCore.get_codes()
Gets the PheWAS codes from a local csv file and loads them into a pandas DataFrame.
Returns: All of the codes from the resource file.
Return type: pandas DataFrame
-
pyPheWAS.pyPhewasCore.get_fdr_thresh(p_values, power)
Calculate the false discovery rate threshold.
Parameters: - p_values (numpy array) – a list of p-values obtained by executing the regression
- power (float) – the threshold power being used (usually 0.05)
Returns: the false discovery rate
Return type: float
-
pyPheWAS.pyPhewasCore.get_group_file(path, filename)
Read all of the genotype data from the given file and load it into a pandas DataFrame.
Parameters: - path (string) – The path to the file that contains the genotype data
- filename (string) – The name of the file that contains the genotype data.
Returns: The data from the genotype file.
Return type: pandas DataFrame
-
pyPheWAS.pyPhewasCore.get_imbalances(regressions)
Generates a numpy array of the imbalances.
For a value x where x is the beta of a regression:
- x < 0: -1. The regression had a negative beta value.
- x = nan: 0. The regression had a nan beta value (and a nan p-value).
- x > 0: +1. The regression had a positive beta value.
These values are then used to get the correct colors using the imbalance_colors.
Parameters: regressions (pandas DataFrame) – DataFrame containing a variety of different output values from the regression performed. Only the ‘beta’ values are used by this function.
Returns: A list that is the length of the number of regressions performed. Each element in the list is either a -1, 0, or +1. These are used as explained above.
Return type: numpy array
-
pyPheWAS.pyPhewasCore.get_input(path, filename, reg_type)
Read all of the phenotype data from the given file and load it into a pandas DataFrame.
Parameters: - path (string) – The path to the file that contains the phenotype data
- filename (string) – The name of the file that contains the phenotype data.
Returns: The data from the phenotype file.
Return type: pandas DataFrame
-
pyPheWAS.pyPhewasCore.get_phewas_info(p_index)
Returns all of the info of the phewas code at the given index.
Parameters: p_index (int) – The index of the desired phewas code
Returns: A list including the code, the name, and the rollup of the phewas code. The rollup is a list of all of the ICD-9 codes that are grouped into this phewas code.
Return type: list of strings
-
pyPheWAS.pyPhewasCore.get_x_label_positions(categories, lines=True)
This method is used to get the positions of the x-labels and of the lines between the columns.
Parameters: - categories – list of the categories
- lines (bool) – a boolean which determines the locations returned (either the center of each category or the end)
Returns: A list of positions
Return type: list of ints
-
pyPheWAS.pyPhewasCore.plot_data_points(y, thresh, save='', imbalances=array([], dtype=float64))
Plots the data with a variety of different options.
This function is the primary plotting function for pyPhewas.
Parameters: - x (numpy array) – an array of indices
- y (numpy array) – an array of p-values
- thresh (float) – the threshold power
- save (str) – the output file to save to (if empty, display the plot)
- imbalances (numpy array) – a list of imbalances
-
pyPheWAS.pyPhewasCore.run_phewas(fm, genotypes, covariates, reg_type)
For each phewas code in the feature matrix, run the specified type of regression and save all of the resulting p-values.
Parameters: - fm – The phewas feature matrix.
- genotypes – A pandas DataFrame of the genotype file.
- covariates – The covariates that the function is to be run on.
Returns: A tuple containing indices, p-values, and all the regression data.