I am part of the Class of 2020! I am from northern Virginia and majoring in CAMS (computational applied math & statistics) and minoring in Computer Science. Hopefully working in Data Science in the future!
Extracting Column from csv Files (.py)
This sample script was originally written to extract gene names from files that contain gene names, counts, and various statistics. It writes the extracted column to a text file and drops the first item, which is typically the header word ‘gene’. The resulting file is useful for comparing genes later on, and the same approach works for any other column that has a header.
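As a minimal sketch of the behavior described above, the snippet below pulls one column out of CSV text and drops the header entry. The helper name `extract_column` and the sample data are made up for illustration, and it reads from an in-memory string rather than a file path.

```python
import csv
import io


def extract_column(csv_text, column=0):
    """Return the values of one zero-based column, skipping the header row."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # discard the header row (e.g. the word 'gene')
    return [row[column] for row in rows]


# Hypothetical sample data, not taken from a real results file.
sample = "gene,count,pvalue\nBRCA1,12,0.03\nTP53,7,0.20\n"
names = extract_column(sample)       # values of the first column
counts = extract_column(sample, 1)   # values of the second column, as strings
```

Note that `csv.reader` returns every field as a string, so numeric columns would need converting before any arithmetic.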
The script accepts an optional argument -c for the number of the column to extract. The index starts at 0, so 0 is the first column, 1 is the second, and so on. The only required argument is the path to a csv file; by default the first column is extracted.
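The argument handling can be tried on its own with a small sketch like the one below. The function name `build_parser` and the file name `results.csv` are placeholders, and this version assumes the file path is always given, which is slightly stricter than the script itself.

```python
import argparse


def build_parser():
    """Build a parser with a positional file path and an optional -c/--column."""
    parser = argparse.ArgumentParser(
        description="Extract one column from a csv file.")
    parser.add_argument("file", help="The file path")
    parser.add_argument("-c", "--column", type=int, default=0,
                        help="Zero-based index of the column to extract")
    return parser


# Parse explicit argument lists instead of sys.argv so the example runs anywhere.
# 'results.csv' is a placeholder file name; no file is opened here.
default_args = build_parser().parse_args(["results.csv"])
third_column = build_parser().parse_args(["-c", "2", "results.csv"])
```

With no -c given, `default_args.column` falls back to 0 (the first column); passing `-c 2` selects the third column.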
Usage: extract_column.py [-c COLUMN] <csv file path>
#!/usr/bin/env python
"""
Gets all of the content that is in one column of a csv (the first column by default)
Skips the header to make content more iterable
Usage: extract_column.py [-c COLUMN] <csv file path>
Output: Text file of the gene names in the same directory
Example output file name:
Sib_18_30_symmetric_difference_results.csv_names.txt
"""
import sys, csv, os, argparse
def main(fName,column,dirName = None):
    #with open(dirName + fName, 'r') as reader:
    with open(fName, 'r') as reader:
        data = [row for row in csv.reader(reader)]
    #with open(dirName + fName + '_names.txt', 'w') as writer:
    with open(fName + '_names.txt', 'w') as writer:
        count = 0
        for row in data:
            if (count != 0):
                writer.write(row[column] + '\n')
            count += 1
    print("Wrote", count, "rows")
"""
#this function is used if you want to run on a directory
def runOnDir(dirName):
    filenames = next(os.walk(dirName))[2]
    for fName in filenames:
        print("Running on file:", fName)
        main(fName, dirName)
"""
if __name__ == "__main__":
    """ for running on a directory
    dirName = sys.argv[1]
    if (dirName[-1] != '/'):
        dirName += '/'
    runOnDir(dirName)
    """
    #this is used if you want to prompt the user for a specific column
    #inputNum = input("Which column number do you want to output?")
    p = argparse.ArgumentParser(description="Extracts the first (by default), or a given column from a csv file. This will output a new text file with _names.txt at the end.")
    p.add_argument('file', nargs='?', help="The file path")
    p.add_argument("-c", "--column", type =int, default=0, help="The column number (starting at 0) that you want to be extracted.")
    args = p.parse_args()
    #fName = sys.argv[1] #without argparse
    main(args.file,args.column)
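
The directory mode that is commented out in the script could be sketched roughly as follows. This is an assumption about how it might be wired up rather than the original implementation: the function name `extract_from_directory` is hypothetical, paths are joined with `os.path.join`, and only files ending in .csv are processed so that earlier _names.txt output is not picked up again.

```python
import csv
import os


def extract_from_directory(dir_name, column=0):
    """Write a <name>_names.txt file next to every .csv file in dir_name."""
    written = []
    for f_name in sorted(os.listdir(dir_name)):
        path = os.path.join(dir_name, f_name)
        # Skip sub-directories and non-csv files (including earlier output).
        if not os.path.isfile(path) or not f_name.endswith(".csv"):
            continue
        with open(path, newline="") as reader:
            rows = list(csv.reader(reader))
        out_path = path + "_names.txt"
        with open(out_path, "w") as writer:
            for row in rows[1:]:  # rows[0] is the header
                writer.write(row[column] + "\n")
        written.append(out_path)
    return written
```

The function returns the list of output paths it wrote, which makes it easy to check what was produced for a given directory.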