So I have a bunch of tab-delimited data files like this:
subject phase condition trial trial type target loc targetid distid digit1 digit2 accuracy-t rt-p rt-t
2 1 9 1 cong bottom s h f t s h f t 7 2 1 742.69104 681.4379692
2 1 9 2 cong top p s t e p s t e 2 3 1 699.4130611 454.8609257
2 1 9 3 incong top s u g r y o u t h 6 5 1 979.2759418 31.06093407
2 1 9 4 incong top c h e e k g r o n 4 8 1 1025.339842 31.55088425
2 1 9 5 incong bottom s t l k l e v e 7 9 1 555.9248924 479.6338081
2 1 9 6 incong top b r n f e l d 4 5 2 976.7041206 31.50486946
2 1 9 7 incong bottom c r o w n p l t e 5 7 1 0 32.24992752
2 1 9 8 cong top s t n d s t n d 7 6 1 1092.888117 31.59618378
2 1 9 9 cong bottom r o u t e r o u t e 4 8 1 883.2840919 31.32796288
2 1 9 10 cong top f l o t f l o t 5 6 1 768.682003
What I want is to strip from each file the lines that have a value of '2' or '3' under the 'accuracy-t' heading (sorry, the columns are misaligned above; it's the value at index 10 once a line is split on tabs).
So the basic idea is a Python script that iterates over multiple files (seen here as 'studyfile') and spits out, for each one, a new tab-delimited text file with the offending lines removed (seen here as 'goodstudyfile'). I came up with this:
    groupvar = ['1', '2']
    subjectvar = ['1', '2']
    condvar = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']

    for group in groupvar:
        for subject in subjectvar:
            for condition in condvar:
                studyfile_name = '*/pruning/study 126/group_' + str(group) + '_subject_' + str(subject) + '_condition_' + str(condition) + '_phase_1.txt'
                studyfile = open(studyfile_name, 'r')
                goodstudyfile_name = '*/pruning/study 126/phase 1 no errors/group_' + str(group) + '_subject_' + str(subject) + '_condition_' + str(condition) + '_phase_1_fixed.txt'
                goodstudyfile = open(goodstudyfile_name, 'w')
                study_lines = studyfile.readlines()
                studyfile.close()
                first_block = study_lines[4].split('\t')[1].strip()
                nr_errors_removed = 0
                r_errors_removed = 0
                spoils_removed = 0
                low_cutoff_spoils = 0
                for study_line in study_lines:
                    if len(study_line.split('\t')) > 2:
                        if study_line.split('\t')[10] == '2':
                            if study_line.split('\t')[4] == 'incong':
                                study_lines.remove(study_line)
                                nr_errors_removed += 1
                            elif study_line.split('\t')[4] == 'cong':
                                study_lines.remove(study_line)
                                r_errors_removed += 1
                        elif study_line.split('\t')[10] == '3':
                            study_lines.remove(study_line)
                            spoils_removed += 1
                else:
                    for study_line in study_lines[1:]:
                        if int(float(study_line.split('\t')[12][:8])) < 100.00:
                            study_lines.remove(study_line)
                            low_cutoff_spoils += 1
                print 'group:' + str(group) + ' subject:' + str(subject) + ' condition:' + str(condition)
                print 'nr errors:' + str(nr_errors_removed)
                print 'r errors:' + str(r_errors_removed)
                print 'spoils:' + str(spoils_removed)
                print 'low cutoff spoils:' + str(low_cutoff_spoils)
                goodstudyfile.write('{}\t{}\t{}\t{}\t{}\t{}\t{}\t{}\n'.format(nr_errors_removed, 'nr errors removed', r_errors_removed, 'r errors removed', spoils_removed, 'spoils removed', low_cutoff_spoils, 'low cutoff spoils'))
                goodstudyfile.write('{}\n'.format(first_block))
                for line in study_lines:
                    goodstudyfile.write(line)
                goodstudyfile.close()
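As an aside on the core pattern here: calling `list.remove()` on the very list a `for` loop is iterating over makes the iterator skip the element immediately after each removed one, because that element slides into the removed item's slot. A minimal demonstration with toy data (not the study files):

```python
# Toy rows; pretend lines ending in 'bad' should be removed.
rows = ['trial1 ok', 'trial2 bad', 'trial3 bad', 'trial4 ok']

for row in rows:
    if row.endswith('bad'):
        rows.remove(row)  # mutating the list being iterated

# 'trial3 bad' survives: after 'trial2 bad' is removed, 'trial3 bad'
# slides into its index and the loop steps straight past it.
print(rows)  # ['trial1 ok', 'trial3 bad', 'trial4 ok']
```

This is exactly the symptom of a flagged line surviving whenever it sits right after another removed line.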
So it iterates fine across all of the files (48 files, based on the possible permutations of the group, subject, and condition variables), but for some reason it regularly misses lines that should be deleted. In the supposedly 'fixed' files, I'll still have a bunch of lines that ought to have been removed.
Nothing I try seems to fix or change the outcome, and the missed lines are consistent (i.e. it misses line 7 of group_2_subject_1_condition_6 every time, despite line 7 being tagged '2'). Can anyone tell me where I'm going wrong?
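If the in-place `remove()` calls are the culprit, one restructuring is to build a new list of kept lines instead of mutating `study_lines` while looping over it. A sketch under that assumption (field indices taken from the script above: [4] trial type, [10] accuracy-t, [12] rt-t; it assumes the first line is the header row, and the toy sample data below is made up):

```python
def prune_study_lines(study_lines):
    """Return (kept_lines, counts) without mutating the input list."""
    counts = {'nr_errors': 0, 'r_errors': 0, 'spoils': 0, 'low_cutoff': 0}
    kept = [study_lines[0]]                    # keep the header row
    for line in study_lines[1:]:
        fields = line.rstrip('\n').split('\t')
        if len(fields) > 2:
            if fields[10] == '2':
                if fields[4] == 'incong':
                    counts['nr_errors'] += 1
                else:
                    counts['r_errors'] += 1
                continue                       # drop this line
            if fields[10] == '3':
                counts['spoils'] += 1
                continue
            if float(fields[12]) < 100.0:      # low-RT cutoff
                counts['low_cutoff'] += 1
                continue
        kept.append(line)
    return kept, counts

# Toy rows shaped like the study files (invented values):
header = ('subject\tphase\tcondition\ttrial\ttrial type\ttarget loc\t'
          'targetid\tdistid\tdigit1\tdigit2\taccuracy-t\trt-p\trt-t\n')

def row(trial_type, acc, rt_t):
    return '\t'.join(['1', '1', '6', '25', trial_type, 'top', 'vlue',
                      'gude', '9', '7', acc, '300.0', rt_t]) + '\n'

sample = [header,
          row('cong',   '1', '500.0'),   # kept
          row('incong', '2', '866.7'),   # nr error, dropped
          row('cong',   '2', '450.0'),   # r error, dropped
          row('incong', '3', '600.0'),   # spoil, dropped
          row('cong',   '1', '31.5')]    # low-RT, dropped

kept, counts = prune_study_lines(sample)
```

Since nothing is removed from the list being iterated, no line can be skipped, and the counters come along for free.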
And here's an example of one of the lines it's missing:
subject phase condition trial trial type target loc targetid distid digit1 digit2 accuracy-t rt-p rt-t
1 1 6 25 incong top v l u e g u d e 9 7 2 304.780960083 866.713047028
which should have been pruned by the Python script, since it has a value of '2' under accuracy-t.