Data Points

  • 1.  Working with large files (SQL Server & EmEditor)

    Posted 05-03-2019 16:10
    I'm currently working on a project where I need to work with and query large datasets. My computer can't handle processing really big files so I've been struggling a bit. I finally found a workflow that's getting the job done that I wanted to share incase anyone else is experiencing similar issues.

    What I'm trying to do is isolate specific property records associated with certain people or businesses. I have the list of people/businesses who own the land broken out by state, which helps ease the file-size issue a bit, but some of the larger states were still giving me trouble. Here's what I've been doing to accomplish what I need without access to a server.

    1) I had the property attribute data for New Hampshire downloaded to my computer, divided into separate .txt files for each county. I wanted to query the whole state, so first I needed to combine the .txt files. To do this I downloaded EmEditor, added all of the NH county files to it, and used the combine tool to create a single state file. Then I defined the delimiter as pipe and identified which row I wanted as the header.

    2) Once I had the property attributes combined into one state file for NH, I opened SQL Server Management Studio. Using the script provided with my download of the property attribute data, I created a table for NH that I could load the data into.
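    The real column list comes from that provided script, but as a rough sketch it's a CREATE TABLE statement along these lines (the column names below are just placeholders):

        -- Simplified example; the provided script defines the actual column list
        CREATE TABLE dbo.NH_PropertyAttributes (
            ParcelID      VARCHAR(50),
            OwnerName     VARCHAR(255),
            SitusAddress  VARCHAR(255),
            County        VARCHAR(100),
            AssessedValue DECIMAL(18, 2)
        );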
     
    3) To import the data I used the following script (with the exception of part of the path name, which I had to blur out):
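    For illustration, a pipe-delimited file can be loaded with a BULK INSERT along these lines (the table name and path below are placeholders, not my actual ones):

        -- Placeholder table name and path; FIRSTROW = 2 skips the header row
        BULK INSERT dbo.NH_PropertyAttributes
        FROM 'C:\data\NH\NH_statewide.txt'
        WITH (
            FIELDTERMINATOR = '|',
            ROWTERMINATOR = '\n',
            FIRSTROW = 2
        );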

    4) Then I was ready to query the data and find the property records I needed: 
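    For example, a query along these lines pulls the records for a given set of owners (the names here are made up; the real list comes from my people/business file):

        -- Made-up owner names for illustration only
        SELECT *
        FROM dbo.NH_PropertyAttributes
        WHERE OwnerName IN ('ACME HOLDINGS LLC', 'SMITH, JOHN');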

    So far I'm having success with this method. Has anyone else ever had to work with large file sizes in an environment that can't typically handle them? If so, I'm interested to hear what other ways I can work on this.

    ------------------------------
    Briana Brown
    Data Product Marketing Manager
    New York, NY
    ------------------------------


  • 2.  RE: Working with large files (SQL Server & EmEditor)

    Employee
    Posted 05-04-2019 05:09
    Query and Analysis using Open-Source Software (PostgreSQL)
    • Install PostgreSQL for your operating system (Windows or macOS)
    • Copy the data to a local drive
    • Execute a CREATE TABLE statement to create the table into which the files will be loaded (specify table name, field names, data types and lengths, indexing info, primary key, etc.); a sketch is shown after this list
    • Create a batch file in the same folder where the files are present
      Example of batch file content to load multiple *.txt files:
    for %%f in (*.txt) do psql -d your_database_name -h localhost -U postgres -p 5432 -c "\COPY public.tablename FROM '%%~dpnxf' DELIMITER '|' CSV"
    pause

    • Run the batch file from the command prompt to load the data into the created table
    • Execute standard SQL queries to extract the required information from the loaded table
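    As an illustration, the CREATE TABLE statement and a follow-up query might look something like this (the column names are placeholders and should be matched to the layout of your files):

        -- Placeholder columns; match these to the layout of your *.txt files
        CREATE TABLE public.tablename (
            parcel_id      VARCHAR(50),
            owner_name     VARCHAR(255),
            county         VARCHAR(100),
            assessed_value NUMERIC(18, 2)
        );

        -- Once the batch file has loaded the data, query it with standard SQL
        SELECT owner_name, COUNT(*) AS parcels
        FROM public.tablename
        GROUP BY owner_name;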


    ------------------------------
    Dr Neena Priyanka
    Pitney Bowes Software India PVT. Ltd
    Noida
    ------------------------------



  • 3.  RE: Working with large files (SQL Server & EmEditor)

    Employee
    Posted 05-04-2019 05:21
    Amazon provides a few services to query and analyze large files if you are not familiar with big data technologies.
    Query and Analysis using Amazon Athena
    • Set up an Amazon account, or use the enterprise account set up by your organisation
    • Copy data to an S3 bucket (ensure the extracted file is kept as *.txt, *.csv, or *.tsv if the compression is anything other than GZIP or BZIP2)
    • Connect Athena to the data stored in Amazon Simple Storage Service (S3), identify the fields, run queries, and get results in seconds; a small sketch follows the note and link below
    Note: Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using Standard SQL Queries. You don't have to build, manage, or tune a cluster or any other infrastructure, and you pay only for the queries that you run.
    Weblink: https://aws.amazon.com/blogs/aws/amazon-athena-interactive-sql-queries-for-data-in-amazon-s3/
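    As an illustration, a pipe-delimited file in S3 can be exposed to Athena with an external table like the one below (the bucket, prefix, and column names are placeholders):

        -- Placeholder bucket, prefix, and columns
        CREATE EXTERNAL TABLE property_attributes (
            parcel_id      string,
            owner_name     string,
            county         string,
            assessed_value double
        )
        ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '|'
        LOCATION 's3://your-bucket/property-data/'
        TBLPROPERTIES ('skip.header.line.count' = '1');

        -- Query the files in place with standard SQL
        SELECT owner_name, COUNT(*) AS parcels
        FROM property_attributes
        GROUP BY owner_name;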

    Query and Analysis using Redshift
    • Set up an Amazon account, or use the enterprise account set up by your organisation
    • Copy data to an S3 bucket (ensure the extracted file is kept as *.txt, *.csv, or *.tsv if the compression is anything other than GZIP or BZIP2)
    • Launch an Amazon Redshift cluster.
    • Install a SQL client
    • Connect your SQL client to the cluster master database.
    • Load data from S3 into the Redshift cluster and start analyzing it using standard SQL queries; a COPY sketch follows the note and link below
    Note: Amazon Redshift is an enterprise-level, petabyte-scale, fully managed data warehousing service.
    Weblink: https://docs.aws.amazon.com/redshift/latest/dg/t_Loading_data.html
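    As an illustration, once the target table exists in Redshift, the load from S3 is a single COPY command along these lines (the table, bucket, and IAM role are placeholders):

        -- Placeholder table, bucket, and IAM role
        COPY public.property_attributes
        FROM 's3://your-bucket/property-data/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-role'
        DELIMITER '|'
        IGNOREHEADER 1;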


    ------------------------------
    Dr Neena Priyanka
    Pitney Bowes Software India PVT. Ltd
    Noida
    ------------------------------



  • 4.  RE: Working with large files (SQL Server & EmEditor)

    Employee
    Posted 05-06-2019 08:44
    Hi Briana,

    Over the years I have found that the easiest way (for me) to combine multiple text files into a single file is by using the command line in a DOS window. The "type" command writes out all records within a file to your screen, and you can easily redirect that output to a file that you specify instead of to the screen.

    So in your case, if each of your county files starts with "NH", you could use the command "type NH*.txt > output.txt". This essentially writes all of the records from each NH*.txt file into output.txt. The only trick is handling the header rows in each file: you can either remove them all first and reinsert a single header into your output file, or find them after the fact and remove them from the output file.

    ------------------------------
    Colleen Reed
    PITNEY BOWES SOFTWARE, INC
    Maitland FL
    ------------------------------