I'm currently working on a project that involves querying large datasets. My computer can't handle processing very large files, so I've been struggling a bit. I finally found a workflow that gets the job done, and I wanted to share it in case anyone else is running into similar issues.
What I'm trying to do is isolate specific property records associated with certain people or businesses. I have the list of people/businesses who own the land broken out by state, which helps ease the file-size issue a bit, but some larger states were still giving me trouble. Here's what I've been doing to accomplish what I need without access to a server.
1) I had the property attribute data for New Hampshire downloaded to my computer, divided into separate .txt files for each county. I wanted to query the whole state, so first I needed to combine the .txt files. To do this I downloaded EmEditor, added all the NH county files to it, and used its combine tool to create a single state file. Then I defined the delimiter as pipe and identified which row I wanted as the header.
2) Once I had the property attributes combined into one state file for NH, I opened SQL Server Management Studio. Using the script provided with my download of the property attribute data, I created a table for NH that I could load the data into.
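The actual CREATE TABLE script ships with the data download, so the real schema may differ. As a rough sketch of what that step looks like (the table and column names below are hypothetical placeholders, not the vendor's schema):

```sql
-- Hedged sketch only: the real schema comes with the data download.
-- Table and column names here are illustrative placeholders.
CREATE TABLE dbo.NH_PropertyAttributes (
    ParcelID     VARCHAR(50),
    OwnerName    VARCHAR(255),
    SitusAddress VARCHAR(255),
    County       VARCHAR(50)
);
```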
3) To import the data I used the following script (with part of the path name blurred out):
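For anyone who can't read the screenshot, a pipe-delimited import like this is typically done with BULK INSERT. This is a hedged sketch, not the exact script from the post: the table name matches the hypothetical schema above, and the file path is a placeholder standing in for the blurred one.

```sql
-- Sketch of a BULK INSERT for a pipe-delimited file with a header row.
-- The path below is a placeholder; the original script's path was blurred.
BULK INSERT dbo.NH_PropertyAttributes
FROM 'C:\data\NH_PropertyAttributes.txt'  -- placeholder path
WITH (
    FIELDTERMINATOR = '|',   -- pipe delimiter, as set in EmEditor
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,     -- skip the header row
    TABLOCK                  -- speeds up large single-session loads
);
```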
4) Then I was ready to query the data and find the property records I needed:
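The query itself depends on who you're looking for; as an illustration (owner names and columns here are hypothetical), matching against a short list of owners might look like:

```sql
-- Illustrative query: pull records for specific owners of interest.
-- Owner names and column names are hypothetical examples.
SELECT ParcelID, OwnerName, SitusAddress, County
FROM dbo.NH_PropertyAttributes
WHERE OwnerName IN ('ACME HOLDINGS LLC', 'SMITH JOHN');
```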
So far I'm having success with this method. Has anyone else had to work with large files in an environment that can't typically handle them? If so, I'd be interested to hear other approaches.
Data Product Marketing Manager
New York, NY