# ACADIA Article Metadata Parser and Sandpoints Archive generator 

Written and documented by [Constantinos Miltiadis](studioany.com/).  
This work was conducted in the context of the inaugural [ACADIA Cultural History Fellowship](http://acadia.org/news/QWMTMG), with the purpose of creating an open archive for all publications of [ACADIA](http://acadia.org/) (the Association for Computer Aided Design in Architecture, est. 1981).

Current preview website: https://pages.sandpoints.org/sandpoints/acadiaarchive-46619c43/

---

This Python program parses the ACADIA Conference articles catalogue (CSV) to: 
- identify individual authors, papers, and issues, 
- create Markdown entries for all of the above, 
- link articles with issues and contributors. 

For an outline of the archive-making process, see the page 'Making the archive' on the archive website.  


## Code outline

- **Part 1**: Setup; load CSV catalogue of articles; create articles dataframe. 
- **Part 2**: Parse `source` column and create Issues dataframe:
	- identify individual publication sources/proceedings names (column `clean issue`),
	- get page range for each paper (column `clean pages`),
	- create a new dataframe for individual issues ([issues dataframe (CSV)][]).
- **Part 3**: Parse and clean up keywords: 
	- Add publication year to keywords. 
	- For cases that feature 'category' in the keywords, keep that as a keyword with the prefix `category: `.
	- If there are no keywords, add the keyword `archive-note-no-tags`.
	- If there is no abstract, add the keyword `archive-note-no-abstract`.
	- Add the clean keyword array to column `clean keywords`.
- **Part 4**: Clean up paper titles 
	- remove double spaces, special characters, etc., and add to column `clean title`
	- generate short title to use as filename (column `filename`), and check filename uniqueness. 
- **Part 5**: Clean up and identify individual authors
	1. Manually scan similar names to identify author aliases and create a dictionary of names and aliases; match author aliases and reduce the author list (see [aliases](#aliases)).
	2. Create a new dataframe ([authors dataframe (CSV)][]) listing individual authors, aliases (if applicable), and indexes of papers. 
- **Part 6**: Create Markdown files for all entries: 
	- Contributor entries, including: 
		- a list of author name aliases (if applicable), 
		- list of articles, 
		- list of co-authors.
	- Article entries, including: 
		- abstract, 
		- keywords (including publication year), 
		- contributors (by frontmatter metadata association)
	- Issue entries, including: 
		- list of articles (by frontmatter metadata association),
		- publication year as keyword,
		- a table with key information such as proceedings ISBN, conference location and date, 
		- link to library entry of proceedings PDF.
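
As a rough illustration of what Part 6 produces, the sketch below assembles the text of one Markdown entry with a frontmatter block. The frontmatter layout and the field names (`title`, `keywords`) are assumptions made for this example, as is the helper name `buildEntry`; the format the archive actually uses is not shown in this section.

```python
import json

def buildEntry(title, keywords, body):
    """Return the text of a Markdown entry with a simple frontmatter block.

    Field names and layout are hypothetical; json.dumps is used only as a
    simple way to quote strings and lists.
    """
    lines = ["---"]
    lines.append("title: " + json.dumps(title))
    lines.append("keywords: " + json.dumps(keywords))
    lines.append("---")
    lines.append("")
    lines.append(body)
    return "\n".join(lines) + "\n"

entry = buildEntry(
    "Computing in Design Education",
    ["acadia proceedings", "1988"],
    "Example body text.",
)
print(entry)
```

Keeping the metadata in frontmatter is what allows articles to be associated with issues and contributors without repeating that information in the body text.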


Libraries:
1. [Pandas](https://pandas.pydata.org/) - Python library for working with spreadsheets  
<!-- 2. [mdutils](https://pypi.org/project/mdutils/) - Python library for managing Markdown files  -->
2. [unidecode](https://pypi.org/project/Unidecode/) 
3. unicodedata
4. re, os, collections, json, datetime, math


Tasks: 
- [x] clean up author names, and get rid of special characters carried over from Cumincad 
- [x] find author aliases and compress archive list 
- [x] clean up paper titles 
- [x] get short paper title - for filename
- [x] separate paper keywords (both `,` and `;` separators)
- [x] export authors files 
- [x] export papers (incl. abstract, keywords, year, authors)
- [x] export issues 

TODO: 
- [ ] cleanup 


# Part 1: Setup; load CSV catalogue of articles; create articles dataframe

- User: add your signature to `FILE_AUTHOR`

In [104]:
# path to article metadata file (downloaded as CSV from GDrive) 
# using an edited version, which required multiple manual corrections 
metadata_csv= "acadia-meta-edit-sc.csv" 
##########################################
# content folder paths 
contentFolder = "../../content/"
authorsFolder=contentFolder+"contributor/"
editorsFolder=contentFolder+"editor/"
articleFolder=contentFolder+"article/"
issueFolder=contentFolder+"issue/"
##########################################
# Vars for new columns 
ENTRY_TAGS= 'keywords'
ENTRY_YEAR= 'year'
ENTRY_FILENAME= 'filename'
CLEAN_TITLE_LABEL='clean title'
CLEAN_KEYWORDS_LABEL='clean keywords'
CLEAN_AUTHORS_LABEL='clean authors'
# SHORT_TITLE_LABEL='filename'#replaced with entry_filename
CLEAN_ISSUE_LABEL ='clean issue'
CLEAN_PAGES= "clean pages"
##########################################
# Vars for columns for Authors dataframe
ALIASES_LABEL='aliases'
UNICODE_NAME_LABEL='unicode name'
AUTHOR_NAME_LABEL= 'author name'
PAPER_INDEXES_LABEL='paper indexes'
##########################################
# ISSUE DATAFRAME COLUMNS 
ISSUE_NAME= "issue name" # main title 
ISSUE_PROCEEDINGS_TITLE= "proceedings title"
ISSUE_EDITORS='editors'
ISSUE_ISBN= 'ISBN'
ISSUE_PUBLISHER= 'publisher'
ISSUE_LOCATION= 'location'
ISSUE_HOST= 'host'
ISSUE_DATES= 'dates'
ISSUE_WEBSITE='website'
ISSUE_LIB_ID= 'library id'
CLEAN_EDITORS_COLUMN='clean editors'
EDITOR_FILENAMES_COLUMN='editor filenames'
ISSUE_ID='issue short title' # short unique title used for sorting/reference
##########################################
# ARCHIVIST TAGS - tags added manually 
TAG_NO_ABSTRACT= 'archive-note-no-abstract'# keyword for articles without abstracts
TAG_NO_TAGS= 'archive-note-no-tags' # keyword for articles without tags
TAG_ALIASES='archive-note-aliases'# keyword for contributors who have aliases
TAG_ISSUE='acadia proceedings' # keyword for issues which are proceedings 
#
NULL_TEXT="N/A" # how to display null/NaN entries 
###########################################
# ARCHIVING INFO 
PARSER_VERSION=1
FILE_AUTHOR="anybody" # archivist signature to keep track of who created files  <------------------

OVERWRITE='w'
NO_OVERWRITE='x'
WRITE_MODE= OVERWRITE
if (WRITE_MODE==OVERWRITE): 
    print(">>Write mode: Overwrite (!)")
else: 
    print(">>Write mode: No overwrite")



>>Write mode: Overwrite (!)
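
The two write modes correspond to the modes of Python's built-in `open()`: `'w'` truncates and rewrites an existing file, while `'x'` raises `FileExistsError` if the file already exists. A minimal sketch of how a writing helper could respect `WRITE_MODE`; the helper name `writeEntry`, the temporary folder, and the file name are assumptions for illustration and are not part of the notebook:

```python
import os
import tempfile

OVERWRITE = 'w'      # truncate and rewrite existing files
NO_OVERWRITE = 'x'   # exclusive creation: fail if the file already exists

def writeEntry(path, text, mode):
    """Write text to path; return False if the file exists and mode is 'x'."""
    try:
        with open(path, mode, encoding='utf-8') as f:
            f.write(text)
        return True
    except FileExistsError:
        return False

# usage in a throwaway folder (hypothetical file name)
demoFolder = tempfile.mkdtemp()
demoPath = os.path.join(demoFolder, "example-entry.md")
first = writeEntry(demoPath, "first version", NO_OVERWRITE)    # file is created
second = writeEntry(demoPath, "second version", NO_OVERWRITE)  # existing file is kept
third = writeEntry(demoPath, "third version", OVERWRITE)       # existing file is replaced
print(first, second, third)
```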


In [2]:
# check that files/directories are valid
import os

# metadata file 
print(">>Metadata CSV file \t"+("OK" if (os.path.isfile(metadata_csv)) else "NOT FOUND"))
    
# folders 
print(">>Content folder \t"+ ("OK" if os.path.isdir(contentFolder) else "NOT FOUND"))
print(">>Contributors folder \t"+ ("OK" if os.path.isdir(authorsFolder) else "NOT FOUND"))
print(">>Editors folder \t"+ ("OK" if os.path.isdir(editorsFolder) else "NOT FOUND"))
print(">>Article folder \t"+ ("OK" if os.path.isdir(articleFolder) else "NOT FOUND"))
print(">>Issue folder  \t"+ ("OK" if os.path.isdir(issueFolder) else "NOT FOUND"))

# regular expression (regex) library 
import re
# library for unicode decoding 
from unidecode import unidecode

#Util function to create valid filename from string (used for authors, papers, issues)
import unicodedata
def stringToFilename(text): 
    result = unicodedata.normalize('NFKC', text)
#     result = unicodedata.normalize('NFKD', result).encode('ascii', 'ignore').decode('ascii')
    result = re.sub(r'[^\w\s-]', '', result.lower())
    result= re.sub(r'[-\s]+', '-', result).strip('-_')
    return result

# util function to check for duplicates in list (useful for filenames)
import collections
def checkListDuplicates(myList): 
    print("Duplicate check:")
    print(  [item for item, count in collections.Counter(myList).items() if count > 1])
    
# string with printable chars 
import string
PRINTABLE = set(string.printable) # set of printable chars 

>>Metadata CSV file 	OK
>>Content folder 	OK
>>Contributors folder 	OK
>>Editors folder 	OK
>>Article folder 	OK
>>Issue folder  	OK
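
For reference, a self-contained sketch of how `stringToFilename` and the duplicate check behave on sample input. The function body is copied from the cell above; `findDuplicates` is a hypothetical variant of `checkListDuplicates` that returns the duplicates instead of printing them.

```python
import collections
import re
import unicodedata

def stringToFilename(text):
    result = unicodedata.normalize('NFKC', text)
    result = re.sub(r'[^\w\s-]', '', result.lower())     # drop punctuation
    result = re.sub(r'[-\s]+', '-', result).strip('-_')  # runs of spaces/dashes become one dash
    return result

def findDuplicates(myList):
    """Return the items that occur more than once."""
    return [item for item, count in collections.Counter(myList).items() if count > 1]

issueFilename = stringToFilename("ACADIA Workshop '85")
editorFilename = stringToFilename("Patricia G. Macintosh")
duplicates = findDuplicates(["acadia-2020", "acadia-2019", "acadia-2020"])
print(issueFilename, editorFilename, duplicates)
```

The two filenames are the same values that appear in the `filename` and `editor filenames` columns of the issues dataframe. Since `\w` is Unicode-aware for Python 3 strings, accented letters survive this function; transliteration to ASCII is left to `unidecode` where the notebook applies it.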


In [3]:
import pandas as pd

# Create dataframe from CSV (the default separator is a comma ','; if the CSV uses semicolons, add the argument `sep=';'`)
df=pd.read_csv(metadata_csv) 

# get metrics 
dfElements=df.size # num of elements
dfRows= len(df.index) # num of rows

print(">>Loaded CSV with "+ str(dfRows)+" rows and "+ str(dfElements)+ " elements.")

# show DF
df.head()

>>Loaded CSV with 1488 rows and 10416 elements.


Unnamed: 0,authors,year,title,source,summary,keywords,NOTES
0,"Lenart, Mihaly",1985,The Design of Buildings which Have Complex Mec...,ACADIA Workshop '85 [ACADIA Conference Proceed...,This paper presents a project under developmen...,,
1,"Kalay, Y.E., Harfmann, A.C. and Swerdloff, L.M.",1985,ALEX: A Knowledge-Based Architectural Design S...,ACADIA Workshop '85 [ACADIA Conference Proceed...,A methodology for the development of a knowled...,,
2,"Quadrel, Richard W. and Chassin, David P.",1985,Energy Graphics: A Progress Report on the Deve...,ACADIA Workshop '85 [ACADIA Conference Proceed...,Energy Graphics is a technique for determining...,,
3,"Wolchko, Matthew J.",1985,Strategies Toward Architectural Knowledge Engi...,ACADIA Workshop '85 [ACADIA Conference Proceed...,Conventional CAD-drafting systems become more ...,,
4,"Hall, Theodore W.",1985,Design-Aided Computing: Adapting Old Spaces to...,ACADIA Workshop '85 [ACADIA Conference Proceed...,The introduction of computer-aided design to a...,,
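
If it is unclear which separator an exported CSV uses, the header line can be inspected before loading. A minimal sketch, assuming the header contains only one of the two candidate separators; `guessSeparator` is a hypothetical helper, and the sample headers reuse the column names shown above:

```python
def guessSeparator(headerLine, candidates=(',', ';')):
    """Return the candidate separator that occurs most often in the header line."""
    return max(candidates, key=headerLine.count)

commaHeader = "authors,year,title,source,summary,keywords,NOTES"
semicolonHeader = "authors;year;title;source;summary;keywords;NOTES"

sepA = guessSeparator(commaHeader)
sepB = guessSeparator(semicolonHeader)
print(repr(sepA), repr(sepB))

# The result could then be passed on to pandas, for example:
# with open(metadata_csv, encoding='utf-8') as f:
#     sep = guessSeparator(f.readline())
# df = pd.read_csv(metadata_csv, sep=sep)
```

On a tie the first candidate (the comma) is returned, which matches the default of `pd.read_csv`.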


# Revised Part 2: Load Issues metadata from CSV

In [4]:
issues_csv='acadia-issues.csv'
ISSUE_SOURCE_TITLE_COLUMN= 'source title' # column in CSV for the issue title as mentioned in the 'source' column of each paper
ISSUE_SHORT_TITLE_LABEL= 'short title'

print(">>Issue CSV ("+issues_csv+") \t"+ ("OK" if (os.path.isfile(issues_csv)==True) else "NOT FOUND"))
print(">>Loading "+ issues_csv+ " as dataframe" )
issuedf= pd.read_csv(issues_csv)
nIssues= len(issuedf.index)
print(">>Issues found: "+ str(nIssues))

# get issue title as mentioned in the 'source' column of the papers
issueSourceTitles=list(issuedf[ISSUE_SOURCE_TITLE_COLUMN])
issueTitleList= issuedf['issue name']# get title per issue
issueShortTitleList= issuedf[ISSUE_SHORT_TITLE_LABEL]# get short title per issue, to save at paper entry for reference

# print(issueShortTitleList)

>>Issue CSV (acadia-issues.csv) 	OK
>>Loading acadia-issues.csv as dataframe
>>Issues found: 39


In [5]:
# GET ISSUE LIBRARY ID from copy of library catalog JSON 
# LIB IDs have been copied in the CSV

# Print lib key per 'acadia proceedings' tagged item - copy data to CSV 
# Can match by date
# import json
# # get library catalog and match titles 
# bibCatalogJson = 'library-catalog-copy.json'
# bibData=""
# if os.path.isfile(bibCatalogJson)==False: 
#     print("*ERROR: JSON file not found ("+ bibCatalogJson+")")
# else: 
#     jsonFile= open(bibCatalogJson)
#     bibData= json.load(jsonFile)

# print lib key, pub date, and title  
# # new dictionary of {title,libId}
# libraryTitleIdDict= {}
# for bibKey, values in bibData.items():
#     # append to dict 
#     if 'acadia proceedings' in values['tags']:
#         print(values['pubdate'])
#         print(values ['title'])
#         print(bibKey)
#         print('-----')

In [6]:
# Parse 'source' which contains bibliographic information: publication title, page range, location, date, etc.
# Match pattern with issueSourceTitles from issue metadata csv

# lists per paper
paperSourceList= df['source'] # source per paper 
paperPagesList=[] # empty array to add page range, per paper
pagesFoundCounter=0
paperIssueTitleList=[None]*dfRows# empty array for issue title, per paper 
paperIssueShortTitleList=[None]*dfRows# empty array for issue short title, per paper 

issueDictionary={} 
issueBreakAt= ['[', ' - ' ] # patterns to break 'source' description to get title 

issueMatchCounter=0
issuePaperIndexList=[ [] for _ in range(nIssues) ]#list of paper indexes per issue

for i in range  (len(paperSourceList)): 
    source= paperSourceList[i]
    #####################################
    # find pages for each article
    pages=""
    # get paper page range (raw string, so that the backslashes reach the regex engine unchanged)
    pages=re.search(r'\d{1,3}\-?\s?\d{1,3}?\s?\.?$', source)
    if (pages!=None): 
        pages= pages.group().strip()
        pages= pages.replace('- ', '-')
        pagesFoundCounter+=1
    else: 
        print("Page not found: "+ source)
    paperPagesList.append(pages)
    #####################################
    # get issue title (remove text tail) 
    issue= source # fallback: use the full source text if no break pattern is found
    for breakChar in issueBreakAt: 
        if source.find(breakChar)>0: 
            issue = (source.split(breakChar)[0])
            break
    # cleanup issue title
    issue= issue.replace(' // ', ' ') # remove " // " from recent titles
    issue= (''.join(filter(lambda x: x in PRINTABLE, issue))).replace('  ', ' ') # remove non-printable
    issue=issue.strip()
    
    if issue in issueSourceTitles: 
        # add paper index to list of paper indexes of issue
        issueMatchCounter+=1
        # get issue index 
        issueIndex= issueSourceTitles.index(issue)
        # add paper index to issue 
        issuePaperIndexList[issueIndex].append(i)
        # save reference at paper entry 
        paperIssueTitleList[i]=issueTitleList[issueIndex]
        # save short name as id 
        paperIssueShortTitleList[i]=issueShortTitleList[issueIndex]
    else: 
        print ("*Error: issue information not matched for "+ source)

print (">>Match source publication for "+ str(issueMatchCounter)+"/"+str(dfRows)+" papers.")
issuedf[PAPER_INDEXES_LABEL]= issuePaperIndexList
# print(">>Issue info added to ")
df[CLEAN_ISSUE_LABEL]= paperIssueTitleList # save issue title per paper 
df[ISSUE_ID]= paperIssueShortTitleList # save issue short title per paper (as id )
print(">>Found Pages for " + str(pagesFoundCounter)+"/"+ str(dfRows)+ " papers. Pages added to column: "+ CLEAN_PAGES)
#add pages to papers dataframe
df[CLEAN_PAGES]= paperPagesList

issuedf.head()
# df.head()

>>Match source publication for 1488/1488 papers.
>>Found Pages for 1488/1488 papers. Pages added to column: clean pages


Unnamed: 0,year,short title,issue name,subtitle,editors-straight-names,editors,proceedings title,ISBN,publisher,location,...,website,Notes,Original PDF,Compressed & OCR PDF Size,OCRd,in library,paper indexes,filename,source title,library id
0,1985,proceedings 1985,ACADIA Workshop '85,,Patricia G. Macintosh,"Macintosh, Patricia G.",ACADIA Conference Proceedings,,ACADIA,"Tempe, Arizona (USA)",...,,,19.3,6.8,True,True,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]",acadia-workshop-85,ACADIA Workshop '85,b6cadd10-10cc-4bbb-a171-10a66e02a002
1,1986,proceedings 1986,ACADIA Workshop '86 Proceedings,"Architectural Education, Research, and Practic...",James A. Turner,"Turner, James A.",,,ACADIA,"Houston, Texas (USA)",...,,,24.6,9.06,True,True,"[14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 2...",acadia-workshop-86-proceedings,ACADIA Workshop '86 Proceedings,100de8ee-d54d-4261-abee-a4fa462d560f
2,1987,proceedings 1987,Integrating Computers into the Architectural C...,,Barbara-Jo Novitski,"Novitski, Barbara-Jo",ACADIA Conference Proceedings,,ACADIA,"Raleigh, North Carolina (USA)",...,,,26.8,,False,True,"[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 4...",integrating-computers-into-the-architectural-c...,Integrating Computers into the Architectural C...,7ae3b72d-af50-4408-a03c-d5f854871f70
3,1988,proceedings 1988,Computing in Design Education,,Pamela J. Bancroft,"Bancroft, Pamela J.",ACADIA Conference Proceedings,,ACADIA,Ann Arbor Michigan (USA),...,,,41.4,14.9,False,True,"[46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 5...",computing-in-design-education,Computing in Design Education,7c6fd3f6-d3cc-4fa8-b735-0d2976bc66da
4,1989,proceedings 1989,New Ideas and Directions for the 1990's,,Chris I. Yessios,"Yessios, Chris I.",ACADIA Conference Proceedings,,,"Gainsville, Florida (USA)",...,,Partly OCR'd OK,6.2,,True,True,"[69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 8...",new-ideas-and-directions-for-the-1990s,New Ideas and Directions for the 1990's,e8a1fefe-82d2-49bd-ab54-4cef7567281a
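
To make the parsing step above easier to follow, here is a self-contained sketch that applies the same break patterns and page-range regex to a single `source` description. The function name `parseSource` and the sample string are assumptions (the real descriptions are truncated in the previews); the clean-up of `' // '` and non-printable characters is left out for brevity.

```python
import re

ISSUE_BREAK_AT = ['[', ' - ']  # patterns that mark the end of the issue title
PAGES_PATTERN = re.compile(r'\d{1,3}\-?\s?\d{1,3}?\s?\.?$')  # page range at the end of the string

def parseSource(source):
    """Return (issue title, page range) for one 'source' description."""
    # page range: None if the pattern is not found
    match = PAGES_PATTERN.search(source)
    pages = match.group().strip().replace('- ', '-') if match else None
    # issue title: text before the first break pattern that is found
    issue = source
    for breakChar in ISSUE_BREAK_AT:
        if source.find(breakChar) > 0:
            issue = source.split(breakChar)[0]
            break
    return issue.strip(), pages

# hypothetical source description
sample = "ACADIA Workshop '85 [ACADIA Conference Proceedings] Tempe, Arizona (USA), 1985, pp. 1-15"
issueTitle, pages = parseSource(sample)
print(issueTitle, "|", pages)
```

The returned title is what gets looked up in the `source title` column of the issues CSV, which is why that column has to match the text of the `source` field exactly.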


In [116]:
# # util function to put surname first 
# def surnameFirst(name): 
#     if name.find(',')>0: 
#         print("*Error in name reversal; already has comma: "+ name )
#         return
#     isJr= name.lower().find(' jr.')
#     hasDel= name.lower().find(' del ')
#     hasVan= name.lower().find(' van ')
#     if hasDel>0: 
#         reverse= name[hasDel: ].strip()+', '+ name[:hasDel].strip()
#         return reverse
#     elif hasVan>0: 
#         reverse = name[hasVan:].strip()+", "+ name[:hasVan].strip()
#         return reverse
#     else:     
#         bits= name.split(' ')
#         if len(bits)==2: 
#             return bits[1]+", "+ bits[0]
#         elif len(bits)==3: 
#             if bits[0].find('.')>0: # ex. J. Peter Jordan- > Jordan, J. Peter
#                 return bits[2]+", "+ bits[0]+ " "+ bits[1]
#             else: 
#                 return bits[2]+", "+ bits[0]+ " "+ bits[1]
#         else: 
#             print("UNCAUGHT: "+ name)
#             return (name)

# function to put name first from 'Surname, Name'
def nameFirst(name): 
    bits = name.split(', ')
    if len(bits)!=2: 
        print("*ERROR No or Multiple commas in \'Surname, Name\' : "+ name)
        return name # return the name unchanged, so that callers do not receive None
    else: 
        return bits[1]+" "+ bits[0]

# get short title, then get filename
def titleToFilename(title): 
    #shorten
    title= title.split(':')[0]
    title=title.split('-')[0].strip()
    return stringToFilename(unidecode(title))
    

issuedf.head()
editorsList= issuedf[ISSUE_EDITORS]
EDITORS_DELIMITER=';'

editorsDict={} # dictionary of (editor name, issue index)
editorsFoundCounter=0

# create new lists for clean editors names, and their filenames (to link) 
issueEditorList=  [None]*nIssues
issueEditorFilenameList= [None]*nIssues

for index, row in issuedf.iterrows(): 
    issueEditors=row[ISSUE_EDITORS]

    if type(issueEditors)==str: 
        editorsFoundCounter+=1
    else: 
        print("*EDITOR entry not string; continue: "+ str(index))
        continue
        
    # get individual editors 
    individualEditors= issueEditors.split(EDITORS_DELIMITER)
    
    for editor in individualEditors:
        # strip spaces
        editor=editor.strip()
        if editor=="": 
            print("*Error, empty editor in : "+ str(index)+ ' (continue): '+ issueEditors)
            continue 
        else: 
            # add editor and issue index in editors dictionary 
            if editor in editorsDict: 
                editorsDict[editor]['issue index'].append(index)
            else:
                # get filename
                editorFilename= stringToFilename(nameFirst(unidecode(editor)))
                editorsDict[editor]= {'issue index': [index], 'filename': editorFilename}
                
        # save individual editor names per issue, and their corresponding filenames
        if issueEditorList[index] is None: 
            issueEditorList[index]= [editor]
            issueEditorFilenameList[index]= [editorsDict[editor]['filename']]
        else: 
            issueEditorList[index].append(editor)
            issueEditorFilenameList[index].append (editorsDict[editor]['filename'])

# get filename per issue
issueFilenameList=[]
for index, row in issuedf.iterrows(): 
    # get issue filename 
    issueTitle= row[ISSUE_NAME]
    issueFilename= titleToFilename(issueTitle)
#     print(issueFilename)
    
    if issueFilename in issueFilenameList: 
        issueFilenameList[issueFilenameList.index(issueFilename)]= issueFilename+"-a"
        issueFilenameList.append(issueFilename+'-b')
        print("filename conflict resolved:"+ issueFilename)
    else: 
        issueFilenameList.append(issueFilename)

print(">>Issue Editors found for "+ str(editorsFoundCounter)+"/"+ str(nIssues))
print(">>Editors added to column: "+ CLEAN_EDITORS_COLUMN )
print(">>Editor filenames added to column: "+ EDITOR_FILENAMES_COLUMN)
print(">>Issue filename added to column: "+ ENTRY_FILENAME)
issuedf[CLEAN_EDITORS_COLUMN]=issueEditorList
issuedf[EDITOR_FILENAMES_COLUMN]= issueEditorFilenameList
issuedf[ENTRY_FILENAME]= issueFilenameList




# Export dataframe to CSV
exportIssuesDataframe=False
if exportIssuesDataframe: 
    exportName= "export-acadia-issues-list.csv"
    issuedf.to_csv(exportName)
    print(">>Exported Issues dataframe as: "+ exportName)

issuedf.head()

filename conflict resolved:acadia-2020
>>Issue Editors found for 39/39
>>Editors added to column: clean editors
>>Editor filenames added to column: editor filenames
>>Issue filename added to column: filename
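
The `filename conflict resolved` message above comes from the suffixing step in the issue filename loop. A self-contained sketch of that step on made-up input; `addFilename` is a hypothetical wrapper around the same logic:

```python
def addFilename(filenameList, filename):
    """Append filename; on a clash, rename the earlier entry to '-a' and add the new one as '-b'."""
    if filename in filenameList:
        filenameList[filenameList.index(filename)] = filename + "-a"
        filenameList.append(filename + "-b")
    else:
        filenameList.append(filename)

filenames = []
for name in ["acadia-2019", "acadia-2020", "acadia-2020"]:
    addFilename(filenames, name)
print(filenames)
```

This covers the case where two issues share a short title. It is not meant as a general de-duplication scheme: further clashes on the same name would need extra handling.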


Unnamed: 0,year,short title,issue name,subtitle,editors-straight-names,editors,proceedings title,ISBN,publisher,location,...,Original PDF,Compressed & OCR PDF Size,OCRd,in library,paper indexes,filename,source title,library id,clean editors,editor filenames
0,1985,proceedings 1985,ACADIA Workshop '85,,Patricia G. Macintosh,"Macintosh, Patricia G.",ACADIA Conference Proceedings,,ACADIA,"Tempe, Arizona (USA)",...,19.3,6.8,True,True,"[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]",acadia-workshop-85,ACADIA Workshop '85,b6cadd10-10cc-4bbb-a171-10a66e02a002,"[Macintosh, Patricia G.]",[patricia-g-macintosh]
1,1986,proceedings 1986,ACADIA Workshop '86 Proceedings,"Architectural Education, Research, and Practic...",James A. Turner,"Turner, James A.",,,ACADIA,"Houston, Texas (USA)",...,24.6,9.06,True,True,"[14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 2...",acadia-workshop-86-proceedings,ACADIA Workshop '86 Proceedings,100de8ee-d54d-4261-abee-a4fa462d560f,"[Turner, James A.]",[james-a-turner]
2,1987,proceedings 1987,Integrating Computers into the Architectural C...,,Barbara-Jo Novitski,"Novitski, Barbara-Jo",ACADIA Conference Proceedings,,ACADIA,"Raleigh, North Carolina (USA)",...,26.8,,False,True,"[30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 4...",integrating-computers-into-the-architectural-c...,Integrating Computers into the Architectural C...,7ae3b72d-af50-4408-a03c-d5f854871f70,"[Novitski, Barbara-Jo]",[barbara-jo-novitski]
3,1988,proceedings 1988,Computing in Design Education,,Pamela J. Bancroft,"Bancroft, Pamela J.",ACADIA Conference Proceedings,,ACADIA,Ann Arbor Michigan (USA),...,41.4,14.9,False,True,"[46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 5...",computing-in-design-education,Computing in Design Education,7c6fd3f6-d3cc-4fa8-b735-0d2976bc66da,"[Bancroft, Pamela J.]",[pamela-j-bancroft]
4,1989,proceedings 1989,New Ideas and Directions for the 1990's,,Chris I. Yessios,"Yessios, Chris I.",ACADIA Conference Proceedings,,,"Gainsville, Florida (USA)",...,6.2,,True,True,"[69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 8...",new-ideas-and-directions-for-the-1990s,New Ideas and Directions for the 1990's,e8a1fefe-82d2-49bd-ab54-4cef7567281a,"[Yessios, Chris I.]",[chris-i-yessios]
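
The `clean editors` and `editor filenames` columns come from splitting each `editors` cell and collecting the editors in a dictionary. A condensed, self-contained sketch of that bookkeeping follows. It assumes well-formed 'Surname, Name' entries and leaves out `unidecode`; `buildEditorsDict` is a hypothetical name, and the second sample cell is made up to show an editor shared between two issues.

```python
import re
import unicodedata

EDITORS_DELIMITER = ';'

def stringToFilename(text):
    result = unicodedata.normalize('NFKC', text)
    result = re.sub(r'[^\w\s-]', '', result.lower())
    return re.sub(r'[-\s]+', '-', result).strip('-_')

def nameFirst(name):
    """'Surname, Name' -> 'Name Surname' (assumes exactly one ', ' in the name)."""
    surname, given = name.split(', ')
    return given + " " + surname

def buildEditorsDict(editorCells):
    """Map each editor to the indexes of their issues and to a filename."""
    editorsDict = {}
    for index, cell in enumerate(editorCells):
        for editor in cell.split(EDITORS_DELIMITER):
            editor = editor.strip()
            if editor == "":
                continue  # skip empty entries, e.g. after a trailing delimiter
            if editor in editorsDict:
                editorsDict[editor]['issue index'].append(index)
            else:
                editorsDict[editor] = {
                    'issue index': [index],
                    'filename': stringToFilename(nameFirst(editor)),
                }
    return editorsDict

cells = ["Macintosh, Patricia G.", "Turner, James A.; Macintosh, Patricia G."]
editors = buildEditorsDict(cells)
print(editors)
```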


# DEPRECATED 
Part 2: Create Issues dataframe: parse 'source' column, clean up individual issues, and get pages for each paper 
- get issue 
- get page range per paper
- get issue location/date
- get issue ISBN 
- get conference title 
- get proceedings title 
- create dataframe of issues

In [8]:
# # get source column 
# issueList = df['source']
# issueDictionary={} # dictionary for unique issues 
# issueBreakAt= ['[', ' - ' ] # patterns to break 'source' description to get title 

# paperIssue=[] # list for issue of paper
# paperPages=[] # list of pages of paper 

# #counters 
# counter=0
# pagesFoundCounter=0
# # loop through 'source' column for all papers 
# for i in range (len(issueList)): 
#     issueInfo= issueList[i]
    
#     issue=""
#     pages=""
    
#     # get issue title (remove tail) 
#     for breakChar in issueBreakAt: 
#         if issueInfo.find(breakChar)>0: 
#             issue = (issueInfo.split(breakChar)[0])
#             break
        
#         if issueInfo.endswith('.'): 
#             issueInfo= issueInfo[:len(issueInfo)-1]

#     # get paper page range 
#     pages=re.search('\d{1,3}\-?\s?\d{1,3}?\s?\.?$', issueInfo)
#     if (pages!=None): 
#         pages= pages.group().strip()
#         pages= pages.replace('- ', '-')
#         pagesFoundCounter+=1
#     else: 
#         print("Page not found: "+ issueInfo)

#     # cleanup issue title
#     issue= issue.replace(' // ', ' ') # remove " // " from recent titles
#     issue= (''.join(filter(lambda x: x in PRINTABLE, issue))).replace('  ', ' ') # remove non-printable
#     issue=issue.strip()
#     #add pages to list 
#     paperPages.append(pages)
#     #add issue to list 
#     paperIssue.append(issue)
#     # add to dictionary 
#     if issue not in issueDictionary:
#         issueDictionary[issue]=[i]
#     else: 
#         issueDictionary[issue].append(i)
        
#     counter+=1
    
# print(">>Found Pages for :" + str(pagesFoundCounter)+"/"+ str(counter)+ " papers")
# df[CLEAN_ISSUE_LABEL]= paperIssue
# df[CLEAN_PAGES]= paperPages
# print(">>Cleaned up paper issue, and pages; added to columns \'"+CLEAN_ISSUE_LABEL+" \' and \'"+CLEAN_PAGES+"\'"  )

In [9]:
# ################################################
# # Go through issues and find: 
# # title, proceedings title, location/date, ISBN, year, and generate filename
# issueTitleList=[]
# issueLocationList =[]
# issueISBNList=[]
# issueYearList=[]
# issueFilenameList=[]

# for key in issueDictionary.keys(): 
    
#     #get first entry for year and full description 
#     firstEntry= df.iloc[issueDictionary[key][0]]
#     issueYear= firstEntry[ENTRY_YEAR] 
#     issueDescription = firstEntry['source']
    
#     issueISBN=""
#     location=""
#     #######################################################################
#     # get proceedings title and ISBN (OK)
#     title= re.search('\[.*\]', issueDescription)
#     if title!=None: 
#         title=title.group()
#         title=title.replace(']','').replace('[','') # remove brackets 
        
#         if title.find("ISBN")>0: 
#             bits= title.split('ISBN')
#             title= bits[0].replace('/','').strip()
#             issueISBN= bits[1].strip()
#         else: 
#             bits= title.split('/')
#             if len(bits)>1: 
#                 title= bits[0]
#                 issueISBN= bits[1].strip()
#     # if null values add 'N/A'
#     if title==None: 
#         title= NULL_TEXT
#     if issueISBN==None or issueISBN=="": 
#         issueISBN=NULL_TEXT
#     #######################################################################
#     # Get location 
#     location = re.search('\].*\d{4}', issueDescription)
#     if location!=None:
#         location=location.group() 
#         location=location.split(' ', 1)[1]
#         location= location.split('] ')[-1]# this is to get the Banff conf, that has a different format 
#         location=location.strip()
#         location=location.replace('] ', '')
#         location=location.replace('].', '')
# #         location=location.replace(' - ', ', ')
#         location=location.replace(' / ', ', ')
#         location=location.replace('(', '')
#         location=location.replace(')', '')
#     else: # get other case 
#         bits = issueDescription.split(' - ', 1)
#         if len(bits)>1: 
#             location= (bits[1].split(', pp')[0])

#     # get rid of accent in Quebec 
#     if location!= None:
#         location=unidecode(str(location))
#         #remove false location (1992)
#         if len(location)<10: 
#             location=None
#     # overwrite None with controlled text
#     if location==None: 
#         location= NULL_TEXT

#     #######################################################################
    
#     # GET FILENAME
#     issueFilename= key.split(':')[0]
#     issueFilename= issueFilename.split(' - ')[0]
#     issueFilename= stringToFilename(issueFilename)
    
#     #add to lists
#     issueTitleList.append(title)
#     issueLocationList.append(location)
#     issueISBNList.append(issueISBN)
#     issueYearList.append(issueYear)
#     if issueFilename in issueFilenameList: 
#         issueFilenameList[issueFilenameList.index(issueFilename)]= issueFilename+"-a"
#         issueFilenameList.append(issueFilename+'-b')
# #         print("filename conflict resolved:"+ issueFilename)
#     else: 
#         issueFilenameList.append(issueFilename)
                
#     # LOG  change flag to true to print
#     if False:
#         print("KEY:"+ key)
#         print("PTITLE: "+ str(title))
#         print("ISBN: "+ str(issueISBN))
#         print("Location: "+ str(location))
#         print("YEAR: "+ str(issueYear))
# #         print("Filename: "+issueFilename )
# #         print('\t'+ issueDescription)
#         print("----------")
        
# print(">>Parsed issue information.")

In [10]:
# #create issue dataframe
# #get data

# nIssues= len(issueDictionary)
# print("nIssues: "+ str(nIssues))
# dummyColumn =[None]* nIssues # empty column 

# issueData= {
#     ISSUE_NAME:issueDictionary.keys(), 
#     ISSUE_PROCEEDINGS_TITLE: issueTitleList, 
#     ENTRY_YEAR: issueYearList,
#     #included papers 
#     PAPER_INDEXES_LABEL: issueDictionary.values(),
#     ENTRY_FILENAME: issueFilenameList,
#     # publication info
#     ISSUE_ISBN: issueISBNList, 
#     ISSUE_PUBLISHER: dummyColumn, 
#     # conference info  info
#     ISSUE_LOCATION: issueLocationList, # atm both location/host and dates 
#     ISSUE_HOST: dummyColumn, 
#     ISSUE_DATES: dummyColumn, 
#     # following are empty for later use
#     ISSUE_EDITORS: dummyColumn,
#     ENTRY_TAGS: dummyColumn, # tags 
#     ISSUE_LIB_ID: dummyColumn, # library item  
    
#     }
# #create dataframe 
# issuedf= pd.DataFrame(issueData)
# print(">>Created Issues dataframe")


# # Export dataframe to CSV
# exportIssuesDataframe=True
# if exportIssuesDataframe: 
#     exportName= "export-acadia-issues-list.csv"
#     issuedf.to_csv(exportName)
#     print(">>Exported Issues dataframe as: "+ exportName)

# issuedf.tail()

## 2.1 Add quarterlies from lib catalog to issues dataframe

In [11]:
# QUATERLIES_LIB_TAG= 'acadia quarterlies'
# bibCatalogJson = 'python-parser/library-catalog-copy.json'
# bibData=""
# if os.path.isfile(bibCatalogJson)==False: print("*ERROR: JSON file not found ("+ bibCatalogJson+")")
# else: 
#     jsonFile= open(bibCatalogJson)
#     bibData= json.load(jsonFile)
# #     print(jsonData)


# def createIssueRow(title, proceedingsTitle,isbn,location, pubYear, abstract, tags, libId): 
#     return {
#         ISSUE_NAME: title,
#         ISSUE_PROCEEDINGS_TITLE: NULL_TEXT if proceedingsTitle==None else proceedingsTitle,  
#         ISSUE_ISBN: NULL_TEXT if isbn==None else isbn, 
#         ISSUE_LOCATION: NULL_TEXT if location==None else location, 
#         ENTRY_YEAR: pubYear, 
#         PAPER_INDEXES_LABEL: [], 
#         ENTRY_FILENAME:"NULL FOR NOW", 
# #         ISSUE_LIB_ID: libId, 
# #         ENTRY_TAGS: tags 
#     }

# for key in bibData: 
#     # if is quarterly 
#     if QUATERLIES_LIB_TAG in bibData[key]['tags']: 
#         # get data 
#         title=(bibData[key]['title'])
# #         printTitle
#         pubYear= bibData[key]['pubdate'].split('-')[0]
#         libId= bibData[key]['library_uuid']
#         tags= bibData[key]['tags']
#         abstract=bibData[key]['abstract']
#         # create row to add to dataframe 
#         newRow= createIssueRow(title, None, None,None, pubYear, abstract, tags, libId)
#         print("SKIP NEW ROW: "+ str(newRow))
# #         issuedf.loc[len(issuedf)]=newRow 

print(">>SKIP (adding Quarterlies to issues df)")

>>SKIP (adding Quarterlies to issues df)


# Part 3: Clean up keywords 
- keywords are separated either by commas (',') or by semicolons (';')
- some cases have both 'category:' and 'keywords:'; the category is kept as a keyword with the `category: ` prefix 
- add publication year to keywords 
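
The rules above can be sketched on a single keyword cell as follows. This is a simplified, self-contained version that handles only the case where `category:` comes first and that skips the character replacements and `unidecode`; the function name `cleanKeywords` and the sample strings are made up for illustration.

```python
def cleanKeywords(bulk, year):
    """Split one keyword cell into a list; keep the category and add the year."""
    bulk = bulk.lower()
    category = ""
    # case 'category: X; keywords: a, b'
    if bulk.startswith("category:"):
        parts = bulk.split("; keywords:")
        if len(parts) > 1:
            category = parts[0][len("category:"):].strip()
            bulk = parts[1]
    # split at commas; if there are none, split at semicolons
    keywords = bulk.split(",")
    if len(keywords) == 1:
        keywords = bulk.split(";")
    keywords = [key.strip() for key in keywords]
    if category != "":
        keywords.append("category: " + category)
    if year not in keywords:
        keywords.append(year)
    return keywords

withCategory = cleanKeywords("Category: Research; Keywords: BIM, Parametric Design", "2012")
semicolons = cleanKeywords("robotics; fabrication", "1999")
print(withCategory)
print(semicolons)
```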

In [12]:
keywordList= df['keywords']
paperYearList= df[ENTRY_YEAR]

keywordReplace={
    '/': '-', 
    ' & ': '-', 
    '&': '-', 
#     ',':'', # remove comma 
    '.': '', # remove period 
    '  ': ' ', # remove double space
    '--': '-'# last double dash to single 
}

cleanKeywordList=[]
cCounter=0
for i in range(len( keywordList)): 
    keywordBulk = keywordList[i] # get keyword cell 
    paperYear= str(paperYearList[i])# paper year
    
    if type (keywordBulk) ==str : # is not NaN
        
        keywordBulk = keywordBulk.lower() # make lower case 
        
        # check if it has 'category:'
        category=""
        if keywordBulk.find("category")>=0: 
#             print(str(cCounter)+'-'+ keywordBulk)
            cCounter+=1
            if keywordBulk.startswith('category:'):  # If 'category:' is at start
#                 print("STARTS WITH: "+ keywordBulk)
                kw = keywordBulk.split('; keywords:')
                if len(kw)>1: 
                    keywordBulk= kw[1] # set keyword bulk 
                    category= kw[0][10:]
#                     print("KEYWORDS: " + keywordBulk)
#                     print("CAT:"+ category)
                else: 
                    print("ERROR in getting category")
            else:  # if 'category:' is at the end  
                keywordBulk = re.sub(r';\(category\)', ' category:', keywordBulk) # raw string avoids invalid escape sequences
#                 print("IN MIDDLE: "+ keywordBulk)
                
                kw = keywordBulk.split('category: ')
                keywordBulk= kw[0]
                category= kw[1]
#                 print("KW:"+ kw[0])
#                 print("CAT:"+ kw[1])
            # replace characters in the category 
            for pattern,replace in keywordReplace.items(): 
                category=category.replace(pattern,replace)
            
        
        # replace characters 
        for pattern,replace in keywordReplace.items(): 
            keywordBulk= keywordBulk.replace(pattern, replace)
        
        # split at commas 
        paperKeywords = keywordBulk.split(',')
         # if no commas, split at semicolons 
        if len(paperKeywords)==1: 
            paperKeywords= keywordBulk.split(';')
            
        # strip whitespace and transliterate to ASCII (e.g. 'façade' -> 'facade')
        paperKeywords= [unidecode(key.strip()) for key in paperKeywords]
        #append category keywords
        if category!="": 
            paperKeywords.append("category: "+ category.strip())
    
        #append year of publication 
        if paperYear not in paperKeywords: paperKeywords.append(paperYear)
        cleanKeywordList.append(paperKeywords)
    else: # if no tags append paper year, and NOTE for no tags 
        cleanKeywordList.append([paperYear, TAG_NO_TAGS]) # add paper year 

    # Get abstract, if no abstract, add note 
    abstract = df.iloc[i]['summary']
    if type(abstract)!=str: 
        cleanKeywordList[i].append(TAG_NO_ABSTRACT)
#         print("ADDED NO ABSTRACT NOTE for "+ str(i))
#         print(cleanKeywordList[i])

    
# add clean keywords to dataframe
df[CLEAN_KEYWORDS_LABEL]= cleanKeywordList
print(">>Retrieved and split keywords; added to dataframe column \'"+CLEAN_KEYWORDS_LABEL+"\'" )

#preview dataframe 
# df.tail()

>>Retrieved and split keywords; added to dataframe column 'clean keywords'


# Part 4: Clean up paper titles 

In [13]:
paperTitles= df['title']


print("Paper#:"+ str(len(paperTitles)))

# shortTitleReplace={
#     ' the ': ' ', 
#     ' and, ': ' ',
#     ' and ': ' ',
#     ' of ': ' ',
#     ' an ' : ' ',
#     ' on ': ' ', 
#     ' as ': ' ', 
#     ' to ' : ' ',
#     ' in ': ' ',
#     ' a ': ' ' , 
#     ' into ': ' ', 
#     ' for ' : ' ',
#     ' & ': ' ',
#     '©': '',
#     ' - ': '-',
#     '[': '', 
#     ']': '', 
#     '\'': '', 
#     '\"': '',
#     '+' :' ', 
#     '|': ' ', 
#     '.': '', # remove periods
#     ', ': ' ', # replace with space
#     '  ': ' ' # cleanup
# }

# Remove if title starts with 
titleStartRemove= ['The ', 'A ', 'An ']

breakTitleAt =['?',' - ',' – ', ':','–','–', ';', ',', '/', '. ', ] #'|',
#################################
cleanTitles=[]
cleanShortTitles=[]
paperTitleDict={}
#################################

def cleanPaperTitle(paperTitle): 
    cleanTitle= paperTitle
    # replace double quotes with single ones 
    cleanTitle= re.sub('"', '\'', cleanTitle)
    # collapse any run of whitespace into a single space
    cleanTitle= re.sub(r'\s+', ' ', cleanTitle)
    # remove period at end
    if cleanTitle.endswith('.'): cleanTitle= cleanTitle[:-1]
    
    # if all caps, capitalize as title 
    if cleanTitle.isupper(): 
        cleanTitle= (cleanTitle.title())
    #strip
    cleanTitle= cleanTitle.strip()
    return cleanTitle

# Get Clean title: 
for i in range (len(paperTitles)): 
    paperTitle= paperTitles[i]
    cleanTitle= cleanPaperTitle(paperTitle)
    cleanTitles.append(cleanTitle)# append to list 
    
df[CLEAN_TITLE_LABEL]= cleanTitles # add clean title to df 

df.head()

Paper#:1488


Unnamed: 0,authors,year,title,source,summary,keywords,NOTES,clean issue,issue short title,clean pages,clean keywords,clean title
0,"Lenart, Mihaly",1985,The Design of Buildings which Have Complex Mec...,ACADIA Workshop '85 [ACADIA Conference Proceed...,This paper presents a project under developmen...,,,ACADIA Workshop '85,proceedings 1985,52-68,"[1985, archive-note-no-tags]",The Design of Buildings which Have Complex Mec...
1,"Kalay, Y.E., Harfmann, A.C. and Swerdloff, L.M.",1985,ALEX: A Knowledge-Based Architectural Design S...,ACADIA Workshop '85 [ACADIA Conference Proceed...,A methodology for the development of a knowled...,,,ACADIA Workshop '85,proceedings 1985,96-108,"[1985, archive-note-no-tags]",ALEX: A Knowledge-Based Architectural Design S...
2,"Quadrel, Richard W. and Chassin, David P.",1985,Energy Graphics: A Progress Report on the Deve...,ACADIA Workshop '85 [ACADIA Conference Proceed...,Energy Graphics is a technique for determining...,,,ACADIA Workshop '85,proceedings 1985,129-141,"[1985, archive-note-no-tags]",Energy Graphics: A Progress Report on the Deve...
3,"Wolchko, Matthew J.",1985,Strategies Toward Architectural Knowledge Engi...,ACADIA Workshop '85 [ACADIA Conference Proceed...,Conventional CAD-drafting systems become more ...,,,ACADIA Workshop '85,proceedings 1985,69-82,"[1985, archive-note-no-tags]",Strategies Toward Architectural Knowledge Engi...
4,"Hall, Theodore W.",1985,Design-Aided Computing: Adapting Old Spaces to...,ACADIA Workshop '85 [ACADIA Conference Proceed...,The introduction of computer-aided design to a...,,,ACADIA Workshop '85,proceedings 1985,25-34,"[1985, archive-note-no-tags]",Design-Aided Computing: Adapting Old Spaces to...


# Part 4.1: Get short paper titles for filenames

In [14]:
# tests=[
#     'i+a: Explorations in Emerging Architectural / Typologies and Design Processes', 
#     'Pneuma-Technics // Methods for Soft Adaptive Environments', 
#     'Context-Aware Multi-Agent Systems: Negotiating Intensive Fields',
#     'FORM{less}', 
#     'Foll(i)cle', 
#     'Patty & Jan',
#     '[BENT]',
#     'POLYBRICK 2.0'
# ]

breakTitleAt=[': ',':', '?', '–','–',]
removeChars = [',','{', '}', '[', ']', ')', '(', '/','+','&','.', "\"", "\'", "|",'%'] # '/' was written as '\/', which only matches a backslash followed by a slash
removePatterns=[', and ', ' and ', ' for ', ' in ', ' at ', ' of ', ' the ', ' a ', ' an ', ' as ']
removeStart= ['the ', 'a ','an ', 'in ']

shortTitleDict={}
shortTitleList=[]

for i in range (len(cleanTitles)): #   title in tests: 
    
    fullTitle=cleanTitles[i]
    shortTitle= unidecode(fullTitle.lower())
    
    # remove chars 
    for char in removeChars: 
        shortTitle= shortTitle.replace(char, ' ')
        
    # remove start
    for startSign in removeStart: 
        if shortTitle.startswith(startSign): 
            shortTitle=shortTitle[len(startSign):]
            
    for pattern in removePatterns: 
        if shortTitle.find(pattern)>=0: 
            shortTitle=shortTitle.replace(pattern, ' ')
    
    # break title 
    for breakSign in breakTitleAt: 
        if shortTitle.find(breakSign)>0: 
            potentialST= shortTitle.split(breakSign)[0]
            if len(potentialST)<13: # If it results in a small string then just replace and stay in the loop
                shortTitle= shortTitle.replace(breakSign, ' ') 
#                 print("SKIP ST: "+ breakSign +"||"+ shortTitle+ " >"+ fullTitle)
            else: #apply  
                shortTitle= potentialST
                continue
    
    # format to valid filename
    shortTitle= stringToFilename(shortTitle)
    
    # Limit short title to 6 words 
    maxNumBits= 6
    shortTitleBits=shortTitle.split('-')
    if len(shortTitleBits)>maxNumBits: 
        # join the first bits directly; an inner `for i` loop would overwrite
        # the outer loop index `i`, which is used below to look up the paper year
        shortTitle= '-'.join(shortTitleBits[:maxNumBits])
        
    # add to list 
    if (shortTitle not in shortTitleList): 
        shortTitleList.append(shortTitle)
    else: 
        #prepend paper year
        paperYear= int(df.iloc[i]['year'])
        newTitle= str(paperYear)+ "-"+shortTitle
        
        otherIndex= shortTitleList.index(shortTitle)
        otherYear= int(df.iloc[otherIndex]['year'])
        otherNewTitle=str(otherYear)+"-"+ shortTitleList[otherIndex]
          
        if (otherYear-paperYear)==0: # equals operation was not working 
            newTitle=newTitle+'-a'
            otherNewTitle=otherNewTitle+'-b'
            print("Filename conflict Settled:"+ newTitle + "||"+ otherNewTitle)
            
        #append 
        shortTitleList.append(newTitle) 
        shortTitleList[otherIndex]= otherNewTitle

# check for duplicates again
checkListDuplicates(shortTitleList)

df[ENTRY_FILENAME]= shortTitleList
df.head()

Filename conflict Settled:2003-digital-curricula-a||2003-digital-curricula-b
Filename conflict Settled:2004-digital-tectonics-a||2004-digital-tectonics-b
Filename conflict Settled:2014-context-aware-multi-agent-systems-a||2014-context-aware-multi-agent-systems-b
Filename conflict Settled:2014-centennial-chromagraph-a||2014-centennial-chromagraph-b
Duplicate check:
[]


Unnamed: 0,authors,year,title,source,summary,keywords,NOTES,clean issue,issue short title,clean pages,clean keywords,clean title,filename
0,"Lenart, Mihaly",1985,The Design of Buildings which Have Complex Mec...,ACADIA Workshop '85 [ACADIA Conference Proceed...,This paper presents a project under developmen...,,,ACADIA Workshop '85,proceedings 1985,52-68,"[1985, archive-note-no-tags]",The Design of Buildings which Have Complex Mec...,design-buildings-which-have-complex-mechanical
1,"Kalay, Y.E., Harfmann, A.C. and Swerdloff, L.M.",1985,ALEX: A Knowledge-Based Architectural Design S...,ACADIA Workshop '85 [ACADIA Conference Proceed...,A methodology for the development of a knowled...,,,ACADIA Workshop '85,proceedings 1985,96-108,"[1985, archive-note-no-tags]",ALEX: A Knowledge-Based Architectural Design S...,alex-knowledge-based-architectural-design-system
2,"Quadrel, Richard W. and Chassin, David P.",1985,Energy Graphics: A Progress Report on the Deve...,ACADIA Workshop '85 [ACADIA Conference Proceed...,Energy Graphics is a technique for determining...,,,ACADIA Workshop '85,proceedings 1985,129-141,"[1985, archive-note-no-tags]",Energy Graphics: A Progress Report on the Deve...,energy-graphics
3,"Wolchko, Matthew J.",1985,Strategies Toward Architectural Knowledge Engi...,ACADIA Workshop '85 [ACADIA Conference Proceed...,Conventional CAD-drafting systems become more ...,,,ACADIA Workshop '85,proceedings 1985,69-82,"[1985, archive-note-no-tags]",Strategies Toward Architectural Knowledge Engi...,strategies-toward-architectural-knowledge-engi...
4,"Hall, Theodore W.",1985,Design-Aided Computing: Adapting Old Spaces to...,ACADIA Workshop '85 [ACADIA Conference Proceed...,The introduction of computer-aided design to a...,,,ACADIA Workshop '85,proceedings 1985,25-34,"[1985, archive-note-no-tags]",Design-Aided Computing: Adapting Old Spaces to...,design-aided-computing


# Part 5: Clean up and identify individual authors

1. Get individual authors from paper catalogue
2. Create a dictionary of {author-name,[paperIndexes]}
3. Find author duplicates and aliases, and reduce list

## Author listing convention problems 

The metadata CSV follows several different conventions and has many inconsistencies. 
Some corrections were made manually in a copy of the CSV file (included).

Up to about row 596, authors are listed as: 
- Surname, Name (single authors)
- Surname1, Name1, Surname2, Name2 and Surname3, Name3 (multiple authors)

From about row 596 onwards, author names are separated with semicolons, as in: 
- Name Surname; Name Surname 

However, there are multiple inconsistencies. 
- Some authors are listed as Surname, Name (et al.) 
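
The two main conventions described above can be told apart by the presence of a semicolon. The sketch below illustrates that idea only; `split_authors` is a hypothetical helper, not the notebook's code. It assumes that in the `Name Surname` form the last word is the surname, and it ignores the special cases (particles such as 'de', 'et al.', rows with an uneven number of commas) that the notebook cell handles.

```python
def split_authors(cell):
    """Return a list of 'Surname, Name' strings from one authors cell.

    Hypothetical helper for illustration; covers only the two main conventions.
    """
    cell = cell.strip()
    if ';' in cell:
        # Newer rows: "Name Surname; Name Surname"
        authors = []
        for name in cell.split(';'):
            bits = name.split()
            if not bits:
                continue  # skip empty fragments, e.g. after a trailing ';'
            if len(bits) == 1:
                authors.append(bits[0])  # single word: keep as-is
            else:
                # Assumption: the last word is the surname.
                authors.append(bits[-1] + ', ' + ' '.join(bits[:-1]))
        return authors
    # Older rows: "Surname1, Name1, Surname2, Name2 and Surname3, Name3"
    flat = cell.replace(', and ', ', ').replace(' and ', ', ')
    parts = [p.strip() for p in flat.split(',')]
    if len(parts) == 1:
        return [cell]  # no comma at all (e.g. a studio name)
    # Pair up consecutive (surname, name) fragments.
    return [parts[i] + ', ' + parts[i + 1] for i in range(0, len(parts) - 1, 2)]


print(split_authors("Kalay, Y.E., Harfmann, A.C. and Swerdloff, L.M."))
# ['Kalay, Y.E.', 'Harfmann, A.C.', 'Swerdloff, L.M.']
print(split_authors("Jane Doe; John Smith"))
# ['Doe, Jane', 'Smith, John']
```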

## Special characters 

The paper spreadsheet contains many illegible character sequences (originally non-ASCII characters that were converted to something else):
- some special character sets were replaced in the spreadsheet
- others were replaced dynamically (see dictionary `replaceDict` below)
- others (e.g. `?` characters) were replaced manually 
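
Several of the garbled sequences (for example `√©` for `é`) look like UTF-8 bytes that were read as Mac OS Roman. That is an observation about the pattern, not something recorded about the spreadsheet's history. Under that assumption, such sequences can be reversed generically with Python's built-in `mac_roman` codec, as sketched below. `undo_mojibake` is a hypothetical helper and is not used by the notebook, which relies on the explicit dictionary instead.

```python
def undo_mojibake(text):
    """Try to reverse UTF-8 text that was wrongly decoded as Mac OS Roman.

    Hypothetical helper for illustration; returns the input unchanged when
    the round trip is not possible (e.g. the text was never garbled).
    """
    try:
        return text.encode('mac_roman').decode('utf-8')
    except UnicodeError:
        return text


print(undo_mojibake('Ant√≥nio'))   # António
print(undo_mojibake('Doe, Jane'))   # Doe, Jane (plain ASCII is unchanged)
```

An explicit dictionary is easier to audit and works even when only part of a string is garbled, which may be why it suits a manually corrected spreadsheet better than a blanket conversion.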


In [15]:
# Dictionary for replacing illegible special characters 
replaceDict = {
    '√â': 'É',
    '√¥': 'ô', 
    '√†': 'à', 
    '√Ø': 'ï', 
    '√§': 'ä',
    '√®': 'è', 
    '√∂': 'ö', 
    '√≠': 'í', 
    '√°': 'á', 
    '√ß': 'ç', 
    '√©': 'é', 
    '√º': 'ü', 
    '‚Äô':'’', 
    '√∏': 'ø', 
    '√ò': 'Ø', 
    '√≥': 'ó', 
    '√±': 'ñ', 
#     '√±': 'ń', # disabled: same key as the line above; a repeated key would silently replace the 'ñ' mapping
    '≈°': 'š', 
#     '√ß': 'ç', 
    '√ñ': 'Ö', 
    '√∫': 'ú', 
    '√£' : 'ã'  # Leit?o,Ant√≥nio
}

def fixAscii(authorName):
    for wrong,right in replaceDict.items(): 
        authorName= authorName.replace(wrong, right)
    return authorName

In [16]:
#cleanup authors 
# get authors column 
authorsList = df['authors']
# author separator 
authSeparator=';'
# Surname-name separator 
surnameSeparator=", "
################################################################


def checkAuthorName(authorName, fixNonAscii, verbose=False):
    
    #strip 
    authorName=authorName.strip()
    
    # fix special characters 
    asciiFixed=False
    if fixNonAscii: 
        if not authorName.isascii(): 
            authorName= fixAscii(authorName)
            asciiFixed=True
            if (verbose): print("nonASCII:: "+ authorName)
            

    #split on commas 
    parts= authorName.split(',')
    
    if len(parts)==2: # "Doe, Jane" >  proceed
        result= authorName.replace(' ,', ',')# remove space before comma
    
    elif len(parts)==1: # No comma
        bits= authorName.split(' ') # Split at spaces 
        
        if len(bits)==2:  # if 2 bits, return [1], [0] // Jane Doe -> Doe, Jane
            result= bits[1]+surnameSeparator+bits[0]
        elif len(bits)==3: # if 3 # Jane X Doe
            if bits[1].lower()=='de': # Jane De Doe
                result= bits[1]+ " "+ bits[2]+surnameSeparator+ bits[0] # De Doe, Jane
            else: # Jane M. Doe or Jane Mary Doe (both handled the same way)
                result= bits[2]+ surnameSeparator+ bits[0]+ " "+ bits [1] # Doe, Jane M.
#                 print(">REARNG3: "+ result)
        else:
            print(">>>>>>>>>>>>>>>>UNCAUGHT: "+ authorName)
            result=authorName 
                
    else: # many commas (->correct spreadsheet)
        print("MANY COMMAS!: "+ authorName)
        result= authorName

    result=result.replace('  ', ' ')
    result=result.strip()
    
    # Add period if name ends in an uppercase initial after a space (' X') 
    endsWithInitial= re.search(' [A-Z]$', result)
    if endsWithInitial is not None: 
        result+='.'
        if (verbose): print("Add period at the end: " + result)
    
    #LOG 
    if verbose and (asciiFixed or result!=authorName): 
        print("NameEdit: "+ result+" || org: "+authorName)
        
    return result 
    # END OF FUNCTION 
    
################################################################

cleanAuthorList=[0]*dfRows
verbose=False 
fixNonAscii=True 
nameChangeVerbose=False
################################################################


# cycle through contribution (co)authors, and try to get individual author names 
for i in range (dfRows): 
    fullAuthors = authorsList[i]# get row 
    authors = fullAuthors 
    
    contributionAuthors=[] # List to add all individual coauthors 
    
    if verbose: print(str(i)+ ":"+ authors)
    

    # LAST AUTHOR (' AND ')
    # get last author after ' and ' or ', and '
    lastAuthor=""
    hasLast=False
    if (authors.find(', and ')>0): 
        hasLast=True
        parts = authors.split(', and ')
        lastAuthor =parts [1]
        authors=parts[0].strip()
    elif (authors.find(' and ')>0): 
        hasLast=True
        parts = authors.split(' and ')
        lastAuthor =parts [1]
        authors=parts[0].strip()
        
        
    #SPLIT at SEMICOLON 
    comps= authors.split(';')
    nComps= len(comps)
    
    if nComps==1: # one block no semicolons 
        parts= authors.split(',')# split at commas 
        nParts = len(parts)
        if (nParts==1): # no comma, only 2 cases (ShoP Architects / Cook+Fox Architects)
            #print ("***Authors without comma:" + authors)
            noSpaceAuth= authors.strip()
            #ADD as-is (previously appended `fixedAuthor`, which is not set in this branch)
            contributionAuthors.append(noSpaceAuth)
            if (verbose): print("\t NS+ "+ noSpaceAuth)  # log author without comma
        elif (nParts==2): # one author (Surname, Name)
            singleAuthor= authors
            #FIX AUTH
            fixedAuthor = checkAuthorName(singleAuthor, fixNonAscii, nameChangeVerbose)
            #ADD
            contributionAuthors.append(fixedAuthor)
            if (verbose): print("\t S+ "+ fixedAuthor)  # log single author
        elif (nParts%2==0): #multiple authors should have an odd amount of commas > even amt of bits between commas 
            for c in range(0, nParts, 2): 
                coauth= parts[c].strip()+surnameSeparator+parts[c+1].strip()
                #FIX AUTH
                fixedAuthor = checkAuthorName(coauth, fixNonAscii, nameChangeVerbose)
                contributionAuthors.append(fixedAuthor)
                if (verbose): print("\t C+ "+ fixedAuthor)  # log coauthor
        else: 
            print ("*****Error: uneven amount of commas (skipping): "+ authors)
                
    else: # authors separated by semicolons 
        for a in range (nComps): 
            #FIX AUTH
            fixedAuthor = checkAuthorName(comps[a], fixNonAscii, nameChangeVerbose)
            #ADD
            contributionAuthors.append(fixedAuthor)
            if (verbose): print("\t C+ "+ fixedAuthor)  # log coauthor semicolons
    
    # add last author found from (' and ')
    if (hasLast): 
        #FIX AUTH
        fixedAuthor = checkAuthorName(lastAuthor, fixNonAscii, nameChangeVerbose)
        #ADD
        contributionAuthors.append(fixedAuthor)
        if (verbose): print("\t L+ "+ fixedAuthor) # log last author

    # add authors to list
    cleanAuthorList[i]=(contributionAuthors)# add to global array
    
#add authors list in DF
# cleanAuthorList
df[CLEAN_AUTHORS_LABEL]= cleanAuthorList
print(">>Cleaned up authors; added to column \'"+CLEAN_AUTHORS_LABEL +'\'')
df.head()

>>Cleaned up authors; added to column 'clean authors'


Unnamed: 0,authors,year,title,source,summary,keywords,NOTES,clean issue,issue short title,clean pages,clean keywords,clean title,filename,clean authors
0,"Lenart, Mihaly",1985,The Design of Buildings which Have Complex Mec...,ACADIA Workshop '85 [ACADIA Conference Proceed...,This paper presents a project under developmen...,,,ACADIA Workshop '85,proceedings 1985,52-68,"[1985, archive-note-no-tags]",The Design of Buildings which Have Complex Mec...,design-buildings-which-have-complex-mechanical,"[Lenart, Mihaly]"
1,"Kalay, Y.E., Harfmann, A.C. and Swerdloff, L.M.",1985,ALEX: A Knowledge-Based Architectural Design S...,ACADIA Workshop '85 [ACADIA Conference Proceed...,A methodology for the development of a knowled...,,,ACADIA Workshop '85,proceedings 1985,96-108,"[1985, archive-note-no-tags]",ALEX: A Knowledge-Based Architectural Design S...,alex-knowledge-based-architectural-design-system,"[Kalay, Y.E., Harfmann, A.C., Swerdloff, L.M.]"
2,"Quadrel, Richard W. and Chassin, David P.",1985,Energy Graphics: A Progress Report on the Deve...,ACADIA Workshop '85 [ACADIA Conference Proceed...,Energy Graphics is a technique for determining...,,,ACADIA Workshop '85,proceedings 1985,129-141,"[1985, archive-note-no-tags]",Energy Graphics: A Progress Report on the Deve...,energy-graphics,"[Quadrel, Richard W., Chassin, David P.]"
3,"Wolchko, Matthew J.",1985,Strategies Toward Architectural Knowledge Engi...,ACADIA Workshop '85 [ACADIA Conference Proceed...,Conventional CAD-drafting systems become more ...,,,ACADIA Workshop '85,proceedings 1985,69-82,"[1985, archive-note-no-tags]",Strategies Toward Architectural Knowledge Engi...,strategies-toward-architectural-knowledge-engi...,"[Wolchko, Matthew J.]"
4,"Hall, Theodore W.",1985,Design-Aided Computing: Adapting Old Spaces to...,ACADIA Workshop '85 [ACADIA Conference Proceed...,The introduction of computer-aided design to a...,,,ACADIA Workshop '85,proceedings 1985,25-34,"[1985, archive-note-no-tags]",Design-Aided Computing: Adapting Old Spaces to...,design-aided-computing,"[Hall, Theodore W.]"


In [17]:
# Create a dictionary of unique authors (authorname, [paper indexes])
authorsDictionary={}

for paperIndex in range (len (cleanAuthorList)):
    paperAuthors= cleanAuthorList[paperIndex]
    
    for pAuthor in paperAuthors: 
        if pAuthor in authorsDictionary: 
            authorsDictionary[pAuthor].append(paperIndex)
        else: 
            authorsDictionary[pAuthor]=[paperIndex]

print("Created author dictionary: 'authorsDictionary'")
print("Sorted dictionary")
authorsDictionary= dict(sorted(authorsDictionary.items()))
# authorsDictionary
nAuthosInitial = len(authorsDictionary)
print("Authors found: "+ str(nAuthosInitial))

Created author dictionary: 'authorsDictionary'
Sorted dictionary
Authors found: 2228


## Part 5.2: Find author aliases and reduce author list

- Create a dictionary with author names and their aliases. 
- Follow the convention `Surname, Name` 
- Remove middle names or initials when possible
- Find author aliases and compact dictionary 
- Export aliases into JSON (for review and saving)

Examples:  

```
'Ahrens, Chandler' : ['Ahrens, Chandler', 'Ahrens, C.'], 
'Akleman, Ergun': ['Akleman, Ergun' , 'Akleman, E.'], 
'Kalisperis, Loukas' : ['Kalisperis, Loukas N.' , 'Kalisperis, L.'],
 ```
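
Given an alias table in this shape, the reduction step amounts to mapping every alias to its preferred name and pooling the paper indexes. The following is a minimal sketch of that idea, assuming an authors dictionary of the form `{name: [paper indexes]}`. `merge_aliases` is a hypothetical name and this is not the notebook's own implementation.

```python
def merge_aliases(authors, aliases):
    """Merge paper indexes of aliased names under the preferred name.

    authors: {name: [paper indexes]}; aliases: {preferred: [alias, ...]}.
    Hypothetical helper for illustration; not the notebook's implementation.
    """
    # Invert the alias table: alias -> preferred name.
    preferred_of = {}
    for preferred, names in aliases.items():
        for name in names:
            preferred_of[name] = preferred

    merged = {}
    for name, papers in authors.items():
        key = preferred_of.get(name, name)  # names without aliases stay as-is
        merged.setdefault(key, set()).update(papers)
    # Sets remove repeated indexes; sorting makes the result deterministic.
    return {name: sorted(papers) for name, papers in merged.items()}


authors = {'Ahrens, Chandler': [10], 'Ahrens, C.': [42], 'Hall, T.W.': [4]}
aliases = {'Ahrens, Chandler': ['Ahrens, Chandler', 'Ahrens, C.']}
print(merge_aliases(authors, aliases))
# {'Ahrens, Chandler': [10, 42], 'Hall, T.W.': [4]}
```

Collecting indexes in a set means that an alias listed twice by mistake does not count a paper twice.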

In [18]:
# (manually) Check for similar names under different aliases  
def printSimilarNames(dictionary): 
    previousSurname=""
    previousName=""
    for name in dictionary: 
        bits = name.split(',')
        surname=bits[0] 
        if previousSurname==surname : 
            print(':[\''+ previousName+"\', \'"+name+"\'],")
        elif previousSurname== unidecode(surname): 
            print("UNICODE: \t"+ ':[\''+ previousName+"\', \'"+name+"\']," )
        previousSurname=surname
        previousName= name
        
# printSimilarNames(authorsDictionary)

In [114]:
# LIST AUTHOR ALIASES (Manually) 
# preferred name as key, aliases as values 
authorAliases ={
    # A
    'Ahrens, Chandler' : ['Ahrens, Chandler', 'Ahrens, C.'], 
    'Akleman, Ergun': ['Akleman, Ergun' , 'Akleman, E.'], 
    'Angulo, Antonieta': ['Angulo, Antonieta' , 'Angulo, A.H.'],
    #B
    'Barrow, Larry': ['Barrow, Larry R.' , 'Barrow, Larry'],
    'Battaglia, Christopher': ['Battaglia, Christopher A.' , 'Battaglia, Christopher'],
    'Beaman, Michael': ['Beaman, Michael Leighton' , 'Beaman, Michael'],
    'Beetz, Jakob':['Beetz, Jakob' , 'Beetz, J.'],
    'Bell, Bradley': ['Bell, Bradley' , 'Bell, Brad'],
    'Brell-Cokcan, Sigrid' : ['Brell-Cokcan, Sigrid', 'Cokcan, Sigrid Brell'], 
    'Beorkrem, Christopher': ['Beorkrem, Christopher' , 'Beorkrem, Chris'],
    'Bermúdez, Julio':  ['Bermúdez, Julio','Bermudez, Julio' , 'Bermudez, J.'],
    'Biloria, Nimish': ['Biloria, Nimish' , 'Biloria, N.','Biloria, Nimish M.'],
    'Burry, Jane': ['Burry, Jane' , 'Burry, J.'],
    'Burry, Mark' :['Burry, M.C.' , 'Burry, M.','Burry, Mark' ],
    # C
    'Campbell, Dace': ['Campbell, Dace A.' , 'Campbell, Dace'],
    'Cantrell, Bradley': ['Cantrell, Bradley E.' , 'Cantrell, Bradley'],
    'Capeluto, Isaac Guedi' : ['Capeluto, Isaac Guedi' , 'Capeluto, I.G.'],
    'Carrara, Gianfranco': ['Carrara, Gianfranco' , 'Carrara, G.'],
    'Ceccato, Cristiano':['Ceccato, Cristiano' , 'Ceccato, C.'],
    'Chase, Scott':['Chase, Scott C.' , 'Chase, Scott'],
    'Chastain, Thomas' : ['Chastain, Thomas' , 'Chastain, Th.'],
    'Cheng, Nancy': ['Cheng, Nancy Yen-Wen' , 'Cheng, N.', 'Cheng, Nancy Yen-wen'], 
    'Choi, Jin Won': ['Choi, Jin-Won' , 'Choi, Jin Won', 'Choi, Jinwon' ],
    'Chronis, Angelos': ['Chronis, Angelos' , 'Chronis, Angelo'],
    'Clayton, Mark' : ['Clayton, Mark' , 'Clayton, M.J.' , 'Clayton, M.'],
    'Corte, Daniel' : ['Corte, Daniel' , 'Corte, Dan'],
    'Crolla, Kristof': ['Crolla, Kristof' , 'Crolla, Kristo'],
    # D
    'd’Estrée Sterk, Tristan':['d\'Estree Sterk, Tristan','d\'Estrée Sterk, Tristan'],
    'Danahy, John': ['Danahy, John' , 'Danahy, J.'],
    'Datta, Sambit': ['Datta, Sambit' , 'Datta, S.'],
    'Davidson, James': ['Davidson, J.N.' , 'Davidson, J.', 'Davidson, James N.'], 
    'del Campo, Matias': ['del Campo, Matias', 'Del Campo, Matias'],
    'Do, Ellen Yi-Luen': ['Do, E. Y.L.' , 'Do, E.', 'Do, Ellen Yi-Luen'],
    'Donath, Dirk' : ['Donath, Dirk' , 'Donath, D.'],
    'Dorta, Tomás': ['Dorta, Tomás' , 'Dorta, T.'],
    'Doyle, Shelby Elizabeth': ['Doyle, Shelby'],
    # E
    'Eastman, Charles': ['Eastman, Charles M.' , 'Eastman, Charles'],
    'Engeli, Maia': ['Engeli, Maia' , 'Engeli, M.'],
    'Erhan, Halil': ['Erhan, Halil I.' , 'Erhan, Halil'], 
    'Estévez, Alberto' :['Estevez, Alberto T.', 'Estévez, Alberto T.'],
    # F
    'Fargas, Josep': ['Fargas, Josep' , 'Fargas, J.'],
    'Flemming, Ulrich': ['Flemming, Ulrich' , 'Flemming, U.'],
    'Flöry, Simon':['Flory, Simon', 'Flöry, Simon'],
    'Foresti, Stefano' : ['Foresti, Stefano' , 'Foresti, S.'],
    'Foged, Isak Worre':['Foged, Isak Worre','Worre Foged, Isak'],
    # G
    'García del Castillo, Jose Luis': ['García del Castillo, Jose Luis', 'Garcia del Castillo y López, Jose Luis','García del Castillo y López, Jose Luis'],
    'Gerber, David': ['Gerber, David Jason' , 'Gerber, David', 'Gerber, Dr. David Jason'],
    'Gero, John':['Gero, John S.' , 'Gero, John'],
    'Gerzso, Michael':['Gerzso, Michael' , 'Gerzso, J. Michael'],
    'Goldman, Glenn': ['Goldman, Glen' , 'Goldman, G.', 'Goldman, Glenn'],
    'Gorbet, Robert': ['Gorbet, Robert' , 'Gorbet, Rob'],
    'Gramazio, Fabio' : ['Gramazio, Fabio' , 'Gramazio, F.'],
    'Greenberg, Evan': ['Greenberg, Evan L.' , 'Greenberg, Evan'], 
    'Gross, Mark': [ 'Gross, Mark', 'Gross, Mark D.'],
    'Gupta, Shawn': ['Gupta, Shawn' , 'Gupta, Sachin Sean'],
    'Gutierrez, Maria Paz':['Gutierrez, Maria Paz', 'Gutierrez, Maria-Paz'],
    'Gün, Onur Yüce':['Gün, Onur Yüce', 'Yüce Gün, Onur'], 
    # H
    'Hall, Theodore' : ['Hall, Theodore W.' , 'Hall, T.W.'],
    'Harfmann, Anton' : ['Harfmann, Anton' , 'Harfmann, A.C.', 'Harfmann, Anton C.'],
    'Hegre, Erik': ['Hegre, Erik D.' , 'Hegre, Erik'],
    'Hemsath, Timothy': ['Hemsath, Timothy L.' , 'Hemsath, Timothy'],
    'Hill, Pamela':['Hill, Pamela J.' , 'Hill, Pamela'],
    'Hunt, Erin': ['Hunt, Erin Linsey' , 'Hunt, Erin'],
    # I 
    'Imbern, Matías': ['Imbern, Matías' , 'Imbern, Matias'],
    # J
    'Jabi, Wassim': ['Jabi, Wassim' , 'Jabi, W.'],
    'Johnson, Brian': ['Johnson, Brian R.' , 'Johnson, Brian' ],
    'Johnson, Robert' :['Johnson, Robert E.' , 'Johnson, R.E.'], 
    'Joyce, Sam':['Joyce, Sam Conrad' , 'Joyce, Sam'],
    # K                  
    'Kalay, Yehuda':  ['Kalay, Yehuda' , 'Kalay, Y.E.', 'Kalay, Yehuda E.' ], 
    'Kalisperis, Loukas' : ['Kalisperis, Loukas N.' , 'Kalisperis, L.'], 
    'Kellett, Ronald': ['Kellett, Ronald' , 'Kellett, R.'], 
    'Kensek, Karen' : ['Kensek, Karen' , 'Kensek, K.', 'Kensek, Karen M.' ], 
    'Klinger, Kevin': ['Klinger, Kevin R.' , 'Klinger, Kevin'], 
    'Krawczyk, Robert': ['Krawczyk, Robert' , 'Krawczyk, R.', 'Krawczyk, Robert J.'],
    'Körner, Axel': ['Körner, Axel', 'Korner, Axel'],  
    'Kohler, Matthias': ['Kohler, M.', 'Kohler, Matthias'], 
    'Kudless, Andrew': ['Kudless, A.', 'Kudless, Andrew'],
    'Kumar, Shilpi':['Kumar, S.', 'Kumar, Shilpi'],
    'Kvan, Thomas' :['Kvan, T.', 'Kvan, Thomas'],
    # L
    'Leitão, António':['Leitão, Antonio', 'Leitão, António'],
    'Liapi, Katherine':['Liapi, Katherine', 'Liapi, Katherine A.'],
    'Liu, Jingyang':['Liu, Jingyang', 'Liu, Jingyang (Leo)'],
    'Luhan, Gregory' :['Luhan, G.A.', 'Luhan, Greg A. (et al.)', 'Luhan, Gregory A.'], #  ET AL!
    'Lömker, Thorsten':['Lömker, Thorsten M.', 'Lömker, Thorsten Michael'],
    'López, Déborah': ['López, Déborah', 'Lopez, Deborah', 'López Lobato, Déborah'], 
    # M
    'Martens, Bob' :['Martens, B.', 'Martens, Bob'],
    'Mathew, Anijo':['Mathew, Anijo', 'Mathew, Anijo Punnen'],
    'Maze, John':['Maze, J.', 'Maze, John'],
    'McCall, Raymond':['McCall, R.J.', 'McCall, Ray', 'McCall, Raymond'],
    'Meibodi, Mania' :['Meibodi, Mania Aghaei', 'Aghaei Meibodi, Mania'],
    'Meyboom, AnnaLisa':['Meyboom, AnnaLisa', 'Meyboom, Annalisa'],
    'Mitchell, William' :['Mitchell, W.', 'Mitchell, William J.'],
    'More, Gregory':['More, G.', 'More, Gregory'],
    'Mueller, Caitlin':['Mueller, Caitlin', 'Mueller, Caitlin T.'],
    'Mueller, Völker':['Mueller, Volker', 'Mueller, Völker'],
    'Muramoto, Katsuhiko':['Muramoto, K.', 'Muramoto, Katsuhiko'],
    # N
    'Nagakura, Takehiko':['Nagakura, T.', 'Nagakura, Takehiko'],
    'Narahara, Taro':['Narahara, T.', 'Narahara, Taro'],
    'Neiman, Bennett':['Neiman, Bennett', 'Neiman, Bennett R.'],
    'Neuman, Eran':['Neuman, E.', 'Neuman, Eran'],
    'Noble, Douglas':['Noble, D.', 'Noble, Douglas', 'Noble, Douglas E.'],
    'Norman, Frederick Stacy':['Norman, F. Stacy', 'Norman, Frederick','Norman, Frederick Stacy'],
    'Novak, Marcos':['Novak, Marcos', 'Novak, Marcos J.'],
    'Novembri, Gabriele':['Novembri, G.', 'Novembri, Gabriele'],
    # O
    'O’Malley, Mary': ['O\'Malley, Mary'],#  'O\'Malley, Mary'], ##
    'Oosterhuis, Kas':['Oosterhuis, K.', 'Oosterhuis, Kas'],
    'Özel, Güvenç': ['Özel, Güvenç', 'Ozel, Guvenc'],
    # P
    'Papazian, Pegor':['Papazian, P.', 'Papazian, Pegor'],
    'Payne, Andrew':['Payne, Andrew', 'Payne, Andrew O.'],
    'Pena, Alexander': ['Pena de Leon, Alex'], # alias was listed twice
    'Peri, Christopher':['Peri, Ch.', 'Peri, Christopher'],
    'Petersen, Kirstin':['Petersen, Dr. Kirstin', 'Petersen, Kirstin'],
    'Pedersen, Ole Egholm': ['Pedersen, Ole Egholm','Egholm-Pedersen, Ole'], 
    'Petzold, Frank':['Petzold, F.', 'Petzold, Frank'],
    'Pigram, David' :['Pigram, Dave', 'Pigram, David'],
    # R
    'Radford, Antony' :['Radford, A.D.', 'Radford, Anthony D.', 'Radford, Antony D.'],
    'Reeves, David' :['Reeves, Dave', 'Reeves, David'],
    'Rocker, Ingeborg':['Rocker, Ingeborg', 'Rocker, Ingeborg M.'],
    # S
    'Sabin, Jenny':['Sabin, Jenny', 'Sabin, Jenny E.'],
    'Sanchez del Valle, Carmina': ['Sanchez-Del-Valle, Carmina','Sanchez del Valle, Carmina'],
    'Salim, Flora':['Salim, Flora', 'Salim, Flora Dilys'],
    'Sass, Lawrence':['Sass, Larry', 'Sass, Lawrence'],
    'Schmitt, Gerhard':['Schmitt, G.', 'Schmitt, Gerhard','Schmitt, Gerhard N.'],
    'Selkowitz, Steven':['Selkowitz, S.', 'Selkowitz, Steven'],
    'Senagala, Mahesh':['Senagala, M.', 'Senagala, Mahesh'],
    'Shelden, Dennis':['Shelden, D.', 'Shelden, Dennis'],
    'Søndergaard, Asbjørn':['Sondergaard, Asbjorn'],## name in paper 
    'Sprecher, Aaron':['Sprecher, A.', 'Sprecher, Aaron'],
    'Streich, Bernd':['Streich, B.', 'Streich, Bernd'],
#     :['Swackhamer, Marc', 'Swackhamer, Marc (et al.)'],
    # T 
    'Talbott, Kyle':['Talbott, K.', 'Talbott, Kyle'],
    'Taron, Joshua':['Taron, Joshua', 'Taron, Joshua M.'],
    'Terzidis, Kostas':['Terzidis, Constantinos A.', 'Terzidis, Costas', 'Terzidis, K.', 'Terzidis, Kostas'],
    'Thomsen, Mette Ramsgaard':  ['Thomsen, Mette Ramsgaard', 'Ramsgaard Thomsen, Mette'],
    'Tracy, Kenneth':['Tracy, Ken', 'Tracy, Kenneth'],
    'Thün, Geoffrey':['Thun, Geoffrey', 'Thün, Geoffrey'],
    'Tunçer, Bige':['Tunçer, B.', 'Tunçer, Bige'],
    'Turk, Ziga':['Turk, Z.', 'Turk, Ziga'],
    # V
    'Van Wyk, Skip': ['Van Wyk, C.G. Skip', 'Van Wyk, C.S.G.'],
#     'Van Wyk, C.S.G.':['Van Wyk, C.G. Skip', 'Van Wyk, C.S.G.'],
    'Von Buelow, Peter': ['Von Buelow, Peter', 'von Buelow, Peter'],
    # W
    'Williamson, Shane':['Williamson, R. Shane', 'Williamson, Shane'],
    'Wiscombe, Tom':['Wiscombe, Thomas', 'Wiscombe, Tom'],
    'Wit, Andrew':['Wit, Andrew', 'Wit, Andrew John'],
    'Wojtowicz, Jerzy':['Wojtowicz, J.', 'Wojtowicz, Jerzy'],
    'Woodbury, Robert':['Woodbury, R.','Woodbury, R.F.', 'Woodbury, Robert', 'Woodbury, Robert F.'],
    # Y 
    'Yan, Wei':['Yan, Dr. Wei', 'Yan, Wei'],
    'Yessios, Chris':['Yessios, Chris', 'Yessios, Chris I.'],
    # Z
    'Zdepski, Stephen':['Zdepski, M. Stephen', 'Zdepski, Stephen']
}
# :['Verbeke, Johan', 'Verbeke,Johan'],# CORRECTED in CSV
# ['Krieg, Oliver David', 'Krieg,Oliver David'],#corrected in CSV 

######################################################
# Export and import dictionary 
import json
authorAliasesJsonFile= 'author-aliases.json'

exportAuthorAliases=False
if exportAuthorAliases:
    if (os.path.isfile(authorAliasesJsonFile)): print(">>Note: File exists; will overwrite. ")
    with open(authorAliasesJsonFile, "w") as exportFile:
        json.dump(authorAliases, exportFile)
        print(">>Exported author aliases as: "+ authorAliasesJsonFile)
        
importAuthorAliases= False
if importAuthorAliases:    
    if os.path.isfile(authorAliasesJsonFile)==False: print("*ERROR: JSON file not found ("+ authorAliasesJsonFile+")")
    else: 
        # the context manager closes the file once the data is loaded
        with open(authorAliasesJsonFile) as jsonFile:
            jsonData= json.load(jsonFile)
        print(jsonData)
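
The export/import step above can be sketched as a round trip. This is a minimal sketch, not the notebook's own code: the helper names `save_aliases` and `load_aliases`, the one-entry sample dictionary, and the temporary folder are assumptions for illustration. Writing with `ensure_ascii=False` and UTF-8 keeps names such as 'Özel, Güvenç' readable in the JSON file.

```python
import json
import os
import tempfile

def save_aliases(aliases, path):
    # ensure_ascii=False writes 'Ö' as-is instead of the escape '\u00d6'
    with open(path, "w", encoding="utf-8") as f:
        json.dump(aliases, f, ensure_ascii=False, indent=2)

def load_aliases(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# hypothetical one-entry sample taken from the alias dictionary above
sample = {"Özel, Güvenç": ["Özel, Güvenç", "Ozel, Guvenc"]}
path = os.path.join(tempfile.mkdtemp(), "author-aliases.json")
save_aliases(sample, path)
restored = load_aliases(path)
```

To reuse the loaded data in the rest of the notebook, the result would have to be assigned back to `authorAliases`; the cell above only prints it.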

In [95]:
# Make new dictionary of reduced author list, and add all paper indexes from aliases
compactAuthorsDictionary={}

replacedKeys=[]
for preferredName in authorAliases: 
    aliases= authorAliases[preferredName]
    allPapers=[]
    for alias in aliases: 
        replacedKeys.append(alias) # add alias to list 
        papers= authorsDictionary[alias]
        allPapers.extend(papers) # add all paper indexes
#         print(alias+"-"+ str(papers))
#     print(preferredName+ "-"+ str(aliases)+ " All papers:"+ str(allPapers))
    compactAuthorsDictionary[preferredName]= allPapers
    
#add missing keys (authors without aliases)
for author in authorsDictionary: 
    if author not in replacedKeys: 
        # add author to the compact dictionary with their own paper indexes
        compactAuthorsDictionary[author]= authorsDictionary[author]
        
compactAuthorsDictionary= dict(sorted(compactAuthorsDictionary.items()))
        
nAuthors= len(compactAuthorsDictionary)
print(">>Compact author list length: " + str(nAuthors))
print(">>Author list reduced by: " + str(nAuthosInitial-nAuthors))

#check for illegible characters ('?' anywhere in the name, including the first position)
for author in compactAuthorsDictionary: 
    if '?' in author: 
        print(author)

# compactAuthorsDictionary

>>Compact author list length: 2057
>>Author list reduced by: 171
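
The merge performed in the cell above can be sketched on toy data. This is a hedged sketch under stated assumptions: `merge_aliases` is a hypothetical helper, the paper indexes are invented, and, unlike the cell above, it skips aliases that are listed twice and tolerates aliases that are absent from the catalogue.

```python
def merge_aliases(authors_dict, alias_map):
    merged = {}
    replaced = set()
    for preferred, aliases in alias_map.items():
        papers = []
        # dict.fromkeys drops repeated aliases while keeping their order
        for alias in dict.fromkeys(aliases):
            replaced.add(alias)
            # .get tolerates an alias that never occurs in the catalogue
            papers.extend(authors_dict.get(alias, []))
        merged[preferred] = papers
    # authors without aliases keep their own paper indexes
    for author, papers in authors_dict.items():
        if author not in replaced and author not in merged:
            merged[author] = papers
    return dict(sorted(merged.items()))

# toy data: the paper indexes are invented for illustration
authors = {"Sass, Larry": [3], "Sass, Lawrence": [7, 9], "Yan, Wei": [4]}
aliases = {"Sass, Lawrence": ["Sass, Larry", "Sass, Larry", "Sass, Lawrence"]}
compact = merge_aliases(authors, aliases)
```

Skipping repeated aliases matters because an alias listed twice would otherwise contribute its paper indexes twice.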


In [86]:
correctedCleanAuthorList=[]
counter=0
notMatched=0

# cycle through the paper-author list 
for i in range (len(cleanAuthorList)): 
    paperAuthors= cleanAuthorList[i]# paper co-authors 
    
    cleanPaperAuthors=[]
    
    #cycle through coauthors 
    for author in paperAuthors: 
        if author in compactAuthorsDictionary: 
            cleanPaperAuthors.append(author)
        else: 
#             print(str(i)+ " looking for: "+ author)
            found=False
            for k,v in authorAliases.items(): 
                if author in v: 
#                     print (str(i)+ " found in v:"+ author)
                    cleanPaperAuthors.append(k)
                    found=True
                    break
            if found==False: 
                print("*ERROR: AUTHOR NOT MATCHED WITH ALIAS: "+ author)
    correctedCleanAuthorList.append(cleanPaperAuthors)


# overwrite authors (reduce aliases)
df[CLEAN_AUTHORS_LABEL]= correctedCleanAuthorList
print(">>Rewrote author list (column \'" +CLEAN_AUTHORS_LABEL+  "\' ) to account for author aliases")
df.head()


>>Rewrote author list (column 'clean authors' ) to account for author aliases


Unnamed: 0,authors,year,title,source,summary,keywords,NOTES,clean issue,issue short title,clean pages,clean keywords,clean title,filename,clean authors
0,"Lenart, Mihaly",1985,The Design of Buildings which Have Complex Mec...,ACADIA Workshop '85 [ACADIA Conference Proceed...,This paper presents a project under developmen...,,,ACADIA Workshop '85,proceedings 1985,52-68,"[1985, archive-note-no-tags]",The Design of Buildings which Have Complex Mec...,design-buildings-which-have-complex-mechanical,"[Lenart, Mihaly]"
1,"Kalay, Y.E., Harfmann, A.C. and Swerdloff, L.M.",1985,ALEX: A Knowledge-Based Architectural Design S...,ACADIA Workshop '85 [ACADIA Conference Proceed...,A methodology for the development of a knowled...,,,ACADIA Workshop '85,proceedings 1985,96-108,"[1985, archive-note-no-tags]",ALEX: A Knowledge-Based Architectural Design S...,alex-knowledge-based-architectural-design-system,"[Kalay, Yehuda, Harfmann, Anton, Swerdloff, L.M.]"
2,"Quadrel, Richard W. and Chassin, David P.",1985,Energy Graphics: A Progress Report on the Deve...,ACADIA Workshop '85 [ACADIA Conference Proceed...,Energy Graphics is a technique for determining...,,,ACADIA Workshop '85,proceedings 1985,129-141,"[1985, archive-note-no-tags]",Energy Graphics: A Progress Report on the Deve...,energy-graphics,"[Quadrel, Richard W., Chassin, David P.]"
3,"Wolchko, Matthew J.",1985,Strategies Toward Architectural Knowledge Engi...,ACADIA Workshop '85 [ACADIA Conference Proceed...,Conventional CAD-drafting systems become more ...,,,ACADIA Workshop '85,proceedings 1985,69-82,"[1985, archive-note-no-tags]",Strategies Toward Architectural Knowledge Engi...,strategies-toward-architectural-knowledge-engi...,"[Wolchko, Matthew J.]"
4,"Hall, Theodore W.",1985,Design-Aided Computing: Adapting Old Spaces to...,ACADIA Workshop '85 [ACADIA Conference Proceed...,The introduction of computer-aided design to a...,,,ACADIA Workshop '85,proceedings 1985,25-34,"[1985, archive-note-no-tags]",Design-Aided Computing: Adapting Old Spaces to...,design-aided-computing,"[Hall, Theodore]"
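
The alias replacement above scans the whole alias dictionary for every unmatched name. An alternative is to invert the dictionary once, so that each name resolves with a single lookup. The following is a sketch under assumptions: `build_alias_lookup` and `canonical_authors` are hypothetical names, and unknown names pass through unchanged instead of being reported as errors.

```python
def build_alias_lookup(alias_map):
    # invert {preferred: [aliases]} into {alias: preferred}
    lookup = {}
    for preferred, aliases in alias_map.items():
        for alias in aliases:
            lookup[alias] = preferred
    return lookup

def canonical_authors(paper_authors, lookup):
    # names without an alias entry map to themselves
    return [lookup.get(name, name) for name in paper_authors]

alias_map = {"Wiscombe, Tom": ["Wiscombe, Thomas", "Wiscombe, Tom"]}
lookup = build_alias_lookup(alias_map)
result = canonical_authors(["Wiscombe, Thomas", "Hall, Theodore"], lookup)
```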


In [87]:
# Export main (articles) dataframe to CSV
exportPapersDataframe=True
if exportPapersDataframe: 
    exportName= "export-acadia-articles-list.csv"
    df.to_csv(exportName)
    print(">>Exported Articles dataframe as: "+ exportName)

>>Exported Articles dataframe as: export-acadia-articles-list.csv


# Part 5.3: Create new dataframe with author names, paper indexes, and aliases 

In [88]:
# From author name get valid filename: from 'Surname, Name' > 'name-surname'
def getAuthorEntryFilename(name): 

    # Reverse surname,name to name surname
    tempName=""
    if (name.find(',')>0):  # if there are commas
        bits = name.split (', ') #split at comma 
        if (len(bits)<2): print ("ONE WORD with comma?" + name)
        else: # rearrange name 
            tempName= bits[1]+" "+bits[0]
    else: 
#         print("No comma: "+ name)
        tempName=name
    # get filename 
    tempName= unidecode(stringToFilename(tempName))    
    
    return tempName
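
For reference, the 'Surname, Name' to 'name-surname' conversion can be sketched with the standard library only. This is an assumption-laden stand-in, not the notebook's implementation: `stringToFilename` is defined elsewhere and `unidecode` is a third-party package, so the sketch replaces both with a hypothetical `slugify` built on `unicodedata`. Note that NFKD normalisation only strips combining accents; letters without a decomposition (for example 'ø') are dropped rather than transliterated, which `unidecode` handles better.

```python
import re
import unicodedata

def slugify(text):
    # split accented letters into base letter + accent, then drop non-ASCII
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # lower-case and replace every run of other characters with one hyphen
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

def author_entry_filename(name):
    # 'Surname, Name' -> 'Name Surname'; names without ', ' are kept as-is
    surname, sep, given = name.partition(", ")
    reordered = given + " " + surname if sep else name
    return slugify(reordered)

examples = [author_entry_filename(n) for n in ("Aagaard, Anders Kruse", "Aalbers, C.", "Özel, Güvenç")]
```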

In [89]:
#Create new dataframe with author name, paper indexes and aliases 
authorNames= compactAuthorsDictionary.keys()
paperIndexes= compactAuthorsDictionary.values()
aliases = [ authorAliases[author] if author in authorAliases else None for author in authorNames  ]
authorNameUnicode= [unidecode(author) for author in authorNames]
authorEntryFilename= [getAuthorEntryFilename(author) for author in list(compactAuthorsDictionary.keys())]

authorNamesList= list(authorNames) # to use for index finding
# duplicates   
checkListDuplicates(authorEntryFilename)

# collect data
authorData= {
    AUTHOR_NAME_LABEL:authorNames, 
    PAPER_INDEXES_LABEL: paperIndexes, 
    ALIASES_LABEL: aliases,
    UNICODE_NAME_LABEL: authorNameUnicode, 
    ENTRY_FILENAME: authorEntryFilename
    }
# create dataframe
authordf= pd.DataFrame(authorData)

# SAVING DATAFRAME 
exportAuthorsDataframe=True
if exportAuthorsDataframe: 
    exportName= "export-acadia-author-list.csv"
    authordf.to_csv(exportName)
    print("Exported authors dataframe as: "+ exportName)
    
authordf.head()

Duplicate check:
[]
Exported authors dataframe as: export-acadia-author-list.csv


Unnamed: 0,author name,paper indexes,aliases,unicode name,filename
0,"Aagaard, Anders Kruse",[1379],,"Aagaard, Anders Kruse",anders-kruse-aagaard
1,"Aalbers, C.",[518],,"Aalbers, C.",c-aalbers
2,"Abbasy-Asbagh, Ghazal",[942],,"Abbasy-Asbagh, Ghazal",ghazal-abbasy-asbagh
3,"Abdel-Rahman, Amira",[1225],,"Abdel-Rahman, Amira",amira-abdel-rahman
4,"Abdelmawla, S.",[305],,"Abdelmawla, S.",s-abdelmawla


In [109]:
#########################################################
# Util functions for authors 
def isAuthorInList (authorName): 
    return authorName in authorNamesList

def getAuthorIndex(authorName): 
    return authorNamesList.index(authorName)

def hasAliases(authorName): 
    return authorName in authorAliases

def getAuthorAliases(authorName): 
    return authorAliases[authorName]

def isNameAlias(authorName): 
    for name, aliases in authorAliases.items(): 
        if authorName in aliases: 
            return name
    return False
#tName='Van Wyk, Skip'# 
# tName= 'Doyle, Shelby Elizabeth'
# print(getAuthorIndex(tName))
# print(isNameAlias(tName))
# print(" Is author: "+str( isAuthorInList(tName)))
# print(hasAliases (tName))
# print(getAuthorAliases(tName))

def getAuthorIndexByName(editor): 
    if isAuthorInList(editor): 
        return getAuthorIndex(editor)
    else: 
        alias= isNameAlias(editor)# returns false or preferred name
        if alias != False: 
            return getAuthorIndex(alias)
        else: return -1 # not an author 
        

# Part 6: Create Markdown Files 

In [66]:
from datetime import datetime
import math # for nan 

#YAML-MD formatting constants 
YAMLBAR="---"
NL= '\n'
VB='|'
QUO="\""# quote 
BUL="- "# bullet
MD='.md'
DATEFORMAT="%Y-%m-%d" #T%H,%M" # +%S,00"  #hugo date:  "2023-07-20T16,44,32+03,00"

#MD frontmatter items
TITLE="title: "
DATE= "date: "
ABSTRACT="abstract: "
KEYWORDS="keywords: "
# for article entries 
CONTRIBUTORS="contributors: "
#For issue entries
EDITORS="editors: "
HAS_ARTICLES = "has_articles: " #
# to add signature of file creator 
AUTHOR= "author: " 

###################################
### Util functions for Markdown 
###################################

def isNan(val): 
    return  math.isnan(val)

# if None or nan return 'N/A'
def nanToString(val): 
    if val is None or (type(val)!=str and math.isnan(val)):
        return 'N/A'
    else: return val

def encloseInComment(text): 
    return NL +'<!--'+ NL + text + NL + '-->' + NL 

def encloseInParentheses(text): 
    return "("+str(text)+")"

def encloseInBrackets(text): 
    return "["+str(text)+"]"

def mdBold(text): 
    return '**'+text+'**'

# list or string to YAML
def toYAMLList(items):
    result=""
    if type(items)==list: 
        for item in items: 
            result+= QUO+ str(item)+ QUO +", "
        result= result[:len(result)-2] # remove comma and space from last entry
        return encloseInBrackets(result) # add brackets and return
    if type(items)==str: 
        result =QUO+ items+ QUO
        return encloseInBrackets(result)# add brackets and return 
    else: 
        print("*ERROR, uncaught input type (toYAMLList)")

###########################################################################################
##### function to generate Markdown YAML header + content 
##### (content: text to add after header; contributors: for paper entries; has: for parenting (issues to journal))
def createHeader (title, content=None, abstract= None ,keywords=None,contributors=None, has=None, editors=None): 
    
    # COLLECT FRONT MATTER INFO 
    result = YAMLBAR+NL  # OPEN YAML 
    result+= TITLE+ QUO+title +QUO+NL    # TITLE 
    ## ADD ARCHIVIST 
    result+= AUTHOR+ QUO +FILE_AUTHOR+  QUO + NL  ############## TO KEEP TRACK WHO CREATED THESE FILES
    ###########################################################################################
    # contributors (for article entries)
    if contributors!= None: 
        contributorHeader=""
#         contributorHeader+= CONTRIBUTORS+"["
        for i in range(len(contributors)): 
            contributorHeader+= QUO+ contributors[i]+QUO
            if i<len(contributors)-1: contributorHeader+=", " # comma separation
#         contributorHeader+="]"+ NL 
        contributorHeader= CONTRIBUTORS+ encloseInBrackets(contributorHeader)+ NL 
        result+= contributorHeader
        
    # ABSTRACT 
    if abstract != None: 
        result+= ABSTRACT+QUO+ abstract +QUO+NL
        
    # KEYWORDS 
    if (keywords!=None): 
        result+= KEYWORDS + toYAMLList(keywords)+ NL 
    # EDITORS 
    if (editors!=None): 
        result+=EDITORS + toYAMLList(editors)+ NL 
        
    # parenting flag for issues 
    if (has): 
        result+= has+NL 
        
    #DATE (CURRENT)
    timeStamp = datetime.now().strftime(DATEFORMAT)
    result+=DATE+ QUO+ str(timeStamp) +QUO+ NL
    
    result+=YAMLBAR+NL # CLOSE YAML 
    ###########################################################################################
    #add content 
    if content!=None: 
        result+=content 
    ###########################################################################################
    
    # add comment signature 
    stamp= "This file was created via the Python parser version "+ str(PARSER_VERSION)+ NL 
    stamp+= "On "+ str(datetime.now()) + NL
    # parentheses keep the "by " prefix in both cases
    stamp+= "by "+ (FILE_AUTHOR if FILE_AUTHOR!=None and FILE_AUTHOR!="" else "unknown creator.")
    
    result+=NL +NL + encloseInComment(stamp)+ NL  
        
    return result

###########################################################################
### Util functions for internal links 
###########################################################################
# apostrophes or single quotes in link caption should be preceded by a backslash 
def fixLinkCaption(caption): 
    caption= caption.replace('\'', '\\\'')
    caption= caption.replace('&', '\\&') # escaped backslash avoids the invalid escape sequence '\&'
    return caption
# util function for internal and external links 
def getInternalLink(caption, destination): 
    caption= fixLinkCaption(caption)
    return "!["+caption+"]("+destination+")"
def getExternalLink(caption, destination): 
    return "["+caption+"]("+destination+")"
# Util function to link to some article 
def getLinkToArticleByIndex(index, caption=None):
    if caption==None: 
        caption= df.iloc[index][CLEAN_TITLE_LABEL]
    caption=fixLinkCaption(caption)
    articleFilename= df.iloc[index][ENTRY_FILENAME]
    return "!["+caption+"](article:"+articleFilename+")"
# Util function to get link to author 
def getLinkToContributorByIndex(index):
    caption= authordf.iloc[index][UNICODE_NAME_LABEL]
    caption=fixLinkCaption(caption)
    articleFilename= authordf.iloc[index][ENTRY_FILENAME]
    return "!["+unidecode(caption)+"](contributor:"+articleFilename+")"
# util function to create link to library item by id: 
# form ![Caption](bib:bibId); without a caption, one will be generated by Sandpoints 
# (a single definition with an optional caption, since a second def would replace the first)
def getIssueLibraryLink(bibId, caption=None):
    if caption is None: 
        return "![](bib:"+str(bibId)+")"
    return getInternalLink(caption, "bib:"+str(bibId))
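# remove double spaces tabs etc 
def util_removeDoubleSpacesTabsLines(text): 
    # collapse runs of two or more whitespace characters (spaces, tabs, new lines) into one space
    # (str.replace matches literally, so a regular expression is needed here)
    return re.sub(r'\s\s+', ' ', text)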
def util_doubleToSingleQuotes(text): 
    return re.sub('"', '\'', text)

def util_cleanAbstract(abstract): 
    abstract= util_removeDoubleSpacesTabsLines(abstract) # collapse repeated whitespace
    abstract= util_doubleToSingleQuotes(abstract) # double to single quotes
    # keep only letters, digits, spaces, commas, periods and apostrophes
    abstract= re.sub(r"[^A-Za-z0-9 ,.']+", '', abstract)
    return abstract

print(">>Loaded Markdown related functions.")

>>Loaded Markdown related functions.
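
`createHeader` wraps titles and list items in double quotes without escaping, so a title that itself contains a double quote would produce invalid YAML. One way around this, sketched below, is to let `json.dumps` do the quoting, since JSON string syntax is, for the common cases, also valid YAML double-quoted syntax. The helper names (`yaml_quote`, `yaml_flow_list`, `front_matter`) and the sample title are hypothetical and not part of the notebook.

```python
import json

def yaml_quote(value):
    # json.dumps adds the surrounding quotes and escapes inner quotes and backslashes
    return json.dumps(str(value), ensure_ascii=False)

def yaml_flow_list(items):
    # accept a single string or a list, as toYAMLList does
    if isinstance(items, str):
        items = [items]
    return "[" + ", ".join(yaml_quote(item) for item in items) + "]"

def front_matter(title, keywords=None):
    lines = ["---", "title: " + yaml_quote(title)]
    if keywords is not None:
        lines.append("keywords: " + yaml_flow_list(keywords))
    lines.append("---")
    return "\n".join(lines) + "\n"

# hypothetical title containing double quotes
header = front_matter('The "Smart" Studio', [1985, "archive-note-no-tags"])
```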


## Part 6.1: Create MD file per author 

In [69]:

def writeAuthorEntries(export=True, breakCount=None): 
    counter=0
    
    for index, row in authordf.iterrows():

        filename= row[ENTRY_FILENAME]+MD
        entryTitle = row[UNICODE_NAME_LABEL]
        authorName = row[AUTHOR_NAME_LABEL]
        aliases= row[ALIASES_LABEL]
       
        # add text 
        text=""
        #################################################
        hasAliases=False # flag for adding alias keyword 
        # Aliases table 
        if (aliases)!=None:
            hasAliases= True # raise flag 
            #create table 
            aliasTable="|Aliases|"+ NL 
            aliasTable+="|-|"+NL 
            for alias in aliases: 
                aliasTable+=("|"+alias+"|"+ NL)
                
            text+=aliasTable+ NL 
#             print(aliasTable)
        #################################################
        # Paper List
        # get every paper, paper year, and generate link to that paper 
        paperList="List of contributions:"+NL
        # Create dictionary with title as key and year as value
        # sort dictionary by value (date); to cater for papers published
        # the same year by the same author 
        paperIndexes= row[PAPER_INDEXES_LABEL]
        articleDict={}
        # use just paper title (not link) for this list
        # since there are already backlinks to papers 
        for paperIndex in paperIndexes:         
            paperTitle= unidecode(df.iloc[paperIndex][CLEAN_TITLE_LABEL])
#             paperFilename= df.iloc[paperIndex][ENTRY_FILENAME]
#             paperLink = getLinkToArticleByIndex(paperIndex)
            paperYear = df.iloc[paperIndex]['year']   
             # assemble string
            ##SORT by year 
            articleDict[paperTitle]= paperYear
        #sort dictionary by value (year of publication)
        sortedPapers= {k: v for k, v in sorted(articleDict.items(), key=lambda item: item[1], reverse=True)}
#         if len(sortedPapers)>2: print(sortedPapers)
        for title, year in sortedPapers.items(): 
            paperList+= BUL+"("+ str(year)+") "+ title+ NL
        text+= paperList+ NL 
        #################################################
        # get coauthors 
        coauthors=[]
        for paperIndex in paperIndexes: 
            paperAuthors= df.iloc[paperIndex][CLEAN_AUTHORS_LABEL]
            for author in paperAuthors: 
                if author!= authorName and author not in coauthors: 
                    coauthors.append(author)
#         print(authorName)
        coauthorsBulletList="List of co-authors: "+NL 
        for coauthor in coauthors: 
            coauthorIndex= authorNamesList.index(coauthor)
            coauthorsBulletList+= BUL+ getLinkToContributorByIndex(coauthorIndex)+ NL 
    
        if len(coauthors)>0: 
            text+= coauthorsBulletList
        #################################################
        
        # Create string 
        mdContent = createHeader(entryTitle, text, None , TAG_ALIASES if hasAliases else None)
    
        # Export 
        if (export): 
            with open(authorsFolder+filename, WRITE_MODE) as md: #create file
                md.write(mdContent)# write stuff; the with-block closes the file
            
        counter+=1
        if breakCount!= None and counter>=breakCount: 
            print("BREAKING "+ str(counter)) 
            break
    
    #LOG  
    if (export): 
        print("Wrote "+ str(counter)+ " Author entries.") 
    else: 
        print("Export False; Processed " + str(counter)+ " Author entries.")

# True/False to export, 
# 2nd argument is integer as break counter for testing
writeAuthorEntries(False)

Wrote 2057 Author entries.
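
The contribution list in `writeAuthorEntries` uses a dictionary keyed by paper title, so two papers with the same title by the same author would collapse into one entry. A sketch of the same sorting with a list of `(title, year)` pairs avoids this. The function names and the sample titles below are assumptions for illustration.

```python
def sorted_contributions(papers):
    # papers: list of (title, year); newest first, ties broken by title
    return sorted(papers, key=lambda paper: (-paper[1], paper[0]))

def contributions_list(papers):
    lines = ["List of contributions:"]
    for title, year in sorted_contributions(papers):
        lines.append("- (" + str(year) + ") " + title)
    return "\n".join(lines) + "\n"

# hypothetical titles; 'Editorial' appears twice on purpose
papers = [("Thresholds", 2002), ("Editorial", 1999), ("Editorial", 2004)]
text = contributions_list(papers)
```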


# Part 6.2: Create MD Files per Article

Include 
- [x] abstract (`summary`)
- [x] keywords 
- [x] Short Title / filename
- [x] co-authors
- [x] issue


In [70]:
   
def writePaperEntries(export=True, breakCount=None): 
    counter=0    
    
#     issueTitleList= issuedf[ISSUE_NAME]
    print("TODO: Add library link")
    
    # iterate papers dataframe 
    for index, row in df.iterrows():
        #title 
        title= unidecode(row[CLEAN_TITLE_LABEL])
        ################################################        
        #issue 
        issue= row[CLEAN_ISSUE_LABEL] 
        # Get Library link if applicable 
        #year 
        year= row['year']
        # FILENAME 
        shortTitle= row[ENTRY_FILENAME]
        filename=shortTitle +MD
        ################################################
        # get abstract 
        abstract= row['summary']
        hasAbstract=True
        # check if abstract is nan (empty)
        if type(abstract)!= str and math.isnan(abstract): 
            hasAbstract=False
        else: #cleanup abstract 
            abstract= util_cleanAbstract(unidecode(abstract))
        ################################################
        # KEYWORDS 
        keywords=[]
        keywords= row[CLEAN_KEYWORDS_LABEL]
        ################################################
        # AUTHORS  
        authors= row[CLEAN_AUTHORS_LABEL]
        authorsText=""
        contributors=[]
        for author in authors: 
            # get author index 
            authorIndex= authorNamesList.index(author)
            # get author filename
            authorFilename= authordf.iloc[authorIndex][ENTRY_FILENAME]
            contributors.append( authorFilename+MD) # populate contributors 
            # get link to author 
            authorLink= getLinkToContributorByIndex(authorIndex)
#             print(authorLink)
            authorsText+= author+ "; " #  authorLink
        authorsText= authorsText[:len(authorsText)-2]# cut the last semicolon
        #add period if it doesn't end with a period
        if authorsText.endswith('.')==False: authorsText+="."

#         print(contributors)
        ################################################        
        #pages 
        pages= row[CLEAN_PAGES]
        ################################################        
        text=""
        ################################################        
        # INFO TABLE       
        table= '| | |'+NL 
        table+= '|-|-|'+NL
        table+= '|'+ mdBold('Year') + '|' + str(year)+ '|'+ NL 
        table+= '|'+ mdBold('Authors') + '|' + unidecode(authorsText) + '|'+ NL 
        table+= '|'+ mdBold('Issue') + '|' + (issue)+ '|'+ NL 
        table+= '|'+ mdBold('Pages') + '|' + (pages)+ '|'+ NL 
        table+= '|'+ mdBold('Entry filename') + '|' +(shortTitle)+ '|'+ NL 
        text+=table+NL 
#         print(table)
        
        ################################################        
        #append publication year on title 
        title+= " "+ encloseInParentheses(year)
        # get file content 
#         title, content=None, abstract= None ,keywords=None,contributors=None, has=None
        mdContent = createHeader(title,  text , abstract if hasAbstract else None, keywords, contributors)
#         print(mdContent)
        ################################################        
        # Export 
        if (export): 
#             print("Saving: "+ str(index)+" - "+ filename)
            try: 
                with open(articleFolder+filename, WRITE_MODE) as md: #create file
                    md.write(mdContent)# write stuff; the with-block closes the file
            except FileNotFoundError:
                print('no such file:', filename)
    
        # stop counter
        counter= counter+1
        if breakCount!= None and counter>=breakCount:  
            print("BREAKING "+ str(counter))
            break 
    print("Processed " + str(counter) + " article entries.")

# 1st argument True/False to export or just process
# 2nd optional argument is integer, break point counter, for testing
writePaperEntries(False)

TODO: Add library link
Processed 1488 article entries.
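
The two-column info table assembled in `writePaperEntries` can be sketched as a small helper that takes label/value pairs. This is only an illustration: `md_bold` and `info_table` are hypothetical names, and escaping the pipe character is an added precaution (a `|` inside a value would otherwise split the cell), not something the notebook does.

```python
def md_bold(text):
    return "**" + str(text) + "**"

def info_table(rows):
    # empty header row plus separator, as in the cell above
    lines = ["| | |", "|-|-|"]
    for label, value in rows:
        # escape pipes so that a value cannot split a table cell
        cell = str(value).replace("|", "\\|")
        lines.append("|" + md_bold(label) + "|" + cell + "|")
    return "\n".join(lines) + "\n"

table = info_table([("Year", 1985), ("Pages", "52-68"), ("Issue", "ACADIA Workshop '85")])
```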


# Part 6.3: Export Editors 

In [112]:
# # compare editors with authors 
# # if editor is also author return author index 


def writeEditorEntries(export=False, breakCount=None): 
    counter=0
    for editor in editorsDict.keys(): 

        # get filename
        filename= editorsDict[editor]['filename']+MD
        
        ########## Is editor also author 
        # match names to find if editor is also author 
        authorIndex= getAuthorIndexByName(editor)
        
        text= ""
        # If editor is also author, add link 
        if authorIndex!=-1: 
            text= "See also editor's contributor entry: "+ getLinkToContributorByIndex(authorIndex)
        #################
        # create MD content 
        mdContent = createHeader(unidecode(editor), text)
        
        # Export 
        if (export): 
        #             print("Saving: "+ str(index)+" - "+ filename)
            try: 
                with open(editorsFolder+filename, WRITE_MODE) as md: #create file
                    md.write(mdContent)# write stuff; the with-block closes the file

            except FileNotFoundError:
                print('no such file:', filename)

        counter+=1
        if breakCount!=None and counter>=breakCount: 
            print("BREAKING: "+ str(counter))
            break 

    if export: 
        print(">>Exported " + str( counter)+ " editors.")
    else: 
        print(">>Processed " + str( counter)+ " editors (no export).")

writeEditorEntries(True)

>>Exported 116 editors.


# Part 6.4: Export Issue entries 

In [67]:

def writeIssueEntries (export=False, breakCount=None): 
    issueFilenameHas=""
    counter=0
    for index, row in issuedf.iterrows(): 
        
        name=row[ISSUE_NAME]
        proceedingsTitle= row[ISSUE_PROCEEDINGS_TITLE]
        isbn= row[ISSUE_ISBN]
        publisher= row[ISSUE_PUBLISHER]
        
        year= row[ENTRY_YEAR]
        dates= row[ISSUE_DATES]
        host= row [ISSUE_HOST]
        location=row[ISSUE_LOCATION]
        website= row[ISSUE_WEBSITE]
    
        filename= row[ENTRY_FILENAME]
        
        paperIndexes=row[PAPER_INDEXES_LABEL]
        editorFilenames= row[EDITOR_FILENAMES_COLUMN]
        libraryRef= row[ISSUE_LIB_ID]
            
        ####################################################################
        # issue articles & author count 
        hasArticles=""
        authorCount=0
        for i in range(len( paperIndexes)):
            paperIndex= paperIndexes[i]
            # get list of authors 
            authorList= df.iloc[paperIndex][CLEAN_AUTHORS_LABEL]
            #if array add length, if integer add 1 
            if type (authorList)==list:  
                authorCount += len(authorList)# how many authors 
            else: 
                authorCount+=1 
            
            hasArticles+= QUO+ df.iloc[paperIndex][ENTRY_FILENAME]+MD + QUO
            if i<len(paperIndexes)-1: 
                hasArticles+=", "
        hasArticles= 'has_articles: ['+hasArticles+']' # MAKE REF 
        
        ####################################################################

        text=""
#         abstract=" SOME ABSTRACT " #str(proceedingsTitle)+ "  "+ str(location)
        table= '| | |'+ NL 
        table+='|-|-|'+ NL
        table+= mdBold("Name")+'|'+name +'|'+NL 
        table+= mdBold("Proceedings name")+'|'+nanToString(proceedingsTitle) +'|'+NL 
        table+= mdBold("Article count")+'|'+str(len(paperIndexes)) +'|'+NL 
        table+= mdBold("Contributor count")+'|'+str(authorCount) +'|'+NL 
        table+= mdBold("Year")+'|'+str(year) +'|'+NL 
        # conf info 
        table+= mdBold("Location")+'|'+nanToString(location) +'|'+NL 
        table+= mdBold("Host")+'|'+nanToString(host) +'|'+NL 
        table+= mdBold("Dates")+'|'+nanToString(dates) +'|'+NL 
        #bib info 
        table+= mdBold("ISBN")+'|'+nanToString(isbn) +'|'+NL 
        table+= mdBold("Publisher")+'|'+nanToString(publisher) +'|'+NL
        # library link, if available
        libLink= getIssueLibraryLink(libraryRef) if type(libraryRef)==str else NULL_TEXT
        table+= mdBold("Library link")+'|' +libLink +'|'+NL 
        table+= mdBold("Entry filename")+'|' +filename +'|'+NL 
    
        text+= table
        
        keywords=[year,TAG_ISSUE]
        # add year in parentheses in title 
        name+= " "+ encloseInParentheses(year)
         # get file content 
        mdContent = createHeader(name,  text , None, keywords,None, hasArticles, editorFilenames)

        # Export 
        if (export): 
            ################## SAVE File 
            try: 
                with open(issueFolder+filename+MD, WRITE_MODE) as md: #create file
                    md.write(mdContent)# write stuff; the with-block closes the file

            except FileNotFoundError:
                print('no such file:', filename)
            
            issueFilenameHas+= QUO+ filename+MD+QUO +","
            ##################  
            
        counter+=1
        
        if breakCount!=None and counter>=breakCount: 
            print("BREAKING: "+ str(counter))
            break 
            
    if export: 
        print(">>Exported " + str( counter)+ " issues.")
        print(">> HAS STRING: copy to header of parent at--> has_issues: [<here>])")
        print(issueFilenameHas)
    else: 
        print(">>Processed " + str( counter)+ " issues (no export).")

    
# 1st argument is to export or just process entries 
# 2nd optional argument, is counter break point (integer) for testing 
writeIssueEntries(True)

>>Exported 39 issues.
>> HAS STRING: copy to header of parent at--> has_issues: [<here>])
"acadia-workshop-85","acadia-workshop-86-proceedings","integrating-computers-into-the-architectural-curriculum","computing-in-design-education","new-ideas-and-directions-for-the-1990s","from-research-to-practice","reality-and-virtual-reality","computer-supported-design-in-architecture","education-and-practice","reconnecting","computing-in-design","design-computation","design-and-representation","digital-design-studios","media-and-design-process","eternity-infinity-and-virtuality-in-architecture","reinventing-the-discourse","thresholds","connecting-crossroads-of-digital-discourse","fabrication","smart-architecture","synthetic-landscapes","expanding-bodies","silicon-skin","acadia-09","acadia-10","acadia-11","acadia-12","acadia-13","acadia-14","acadia-2105","acadia-2016","acadia-2017","acadia-2018","acadia-19","acadia-2020-a","acadia-2020-b","realignments","hybrids-haecceities",
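
The printed string ends with a trailing comma and has to be pasted by hand into the parent entry. A sketch that builds the complete `has_issues` line with `join` avoids the trailing comma. `has_issues_line` is a hypothetical helper, and whether the entries should carry the `.md` extension is an assumption, exposed here as a parameter.

```python
def has_issues_line(filenames, extension=".md"):
    # quote every filename and join with commas; join never leaves a trailing comma
    quoted = ['"' + name + extension + '"' for name in filenames]
    return "has_issues: [" + ", ".join(quoted) + "]"

line = has_issues_line(["acadia-workshop-85", "thresholds"])
bare = has_issues_line(["acadia-workshop-85"], extension="")
```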
