How to Load CSV Data Into MySQL

Niole Nelson
Niole Net
Published in
6 min readDec 6, 2020

--

I wanted to load data into the database on startup. I had been working on an api server for a while. I eventually found myself wanting to test drive it, maybe hook it up to a web client and bash it a little bit. I wanted to see what it could do and find its weak spots.

I had chosen mysql server in order to support a chart UI which needed to show data aggregated by day, month, year, etc. Mysql seemed like the appropriate technology (it was).

Now I needed a faster and more straightforward way to seed the database with lots of data. I had imported data files into mongodb instances before, which was delightfully easy.

How hard could it be to do the same with mysql?

Honestly, it was super annoying. I fiddled around with the mysql system variables docs for way too long. The worst of it is, that in the end I followed the information that is provided on the mysql docker image page (I was using docker to do this) and got something working in less than an hour.

Nonetheless, I think there is something to say. And also, more people need to document these “annoying details” in blog form. It’s true: I spent way too long looking in the wrong places, googling my problems just to find solutions that only work in mysql workbench. Maybe this is because mysql just isn’t written about as hot technologies like React or Kubernetes. Maybe I just can’t read the docs. I don’t care. I’m writing about it.

Working Example

We’re doing a working example first, because there is nothing worse than a long gabby treatise on “who-gives-a-shit” when you are “just trying to make it work”.

Goal

Load csv data into a new mysql server on startup. The implementation should be able to handle multiple tables, multiple rows per table, and special data types like enums and timestamps.

Dependencies

docker

The Command

docker run \
--name my-db \
-p 3306:3306 # for y'all trying to hit it from localhost
-v ./init.sql:/docker-entrypoint-initdb.d/init.sql \
-v ./init_data:/docker-entrypoint-initdb.d/init_data \
-e MYSQL_ROOT_PASSWORD=pw \
-e MYSQL_USER=my-database-user \
-e MYSQL_PASSWORD=pw \
-e MYSQL_DATABASE=my-database-name \
-d mysql:8.0.22 \
--secure-file-priv=docker-entrypoint-initdb.d

Important bits

— secure-file-priv=docker-entrypoint-initdb.d

secure_file_priv is a system variable. It tells mysql server that “it’s ok” to load data from files from the specified directory.

Why docker-entrypoint-initdb.d? When your mysql docker container starts for the first time, it will execute files in this directory with extensions .sql (as well as a couple others). Next we introduce the .sql file that we created.

init.sql

This file will contain your database and table initialization queries as well as your file data loading queries. You will see a syntax example later on.

init_data/

This directory contains your .csv data files. The init.sql code will reference these files in order to load their contents into mysql tables.

Code

Forgive the non-standard capitalization choices.

init.sql

CREATE DATABASE IF NOT EXISTS environment;CREATE TABLE IF NOT EXISTS environment.Events (Id varchar(36) PRIMARY KEY NOT NULL,Type ENUM('THERMAL', 'PARTICLE') NOT NULL,Unit ENUM('F','C','PM2.5') NOT NULL,Timestamp timestamp NOT NULL);LOAD DATA INFILE '/docker-entrypoint-initdb.d/init_data/events.csv'INTO TABLE environment.EventsFIELDS TERMINATED BY ','LINES TERMINATED BY '\n' (Id,Type,Unit,Timestamp);

gotchas

If mysql complains about not being able to find or use the init.sql file, make sure that it’s set to be readable/writeable/executable by the owning user: chmod 700 init.sql.

If you get an error like “Incorrect datetime value: ‘2020–01–01 10:10:10 eventid2' for column ‘Timestamp’ at row 1”, it might seem like you formatted your timestamp incorrectly (that’s certainly what I thought), but if you followed the example csv file below, it’s more likely that your LINES TERMINATED BY clause doesn’t match up with reality. Check that then line at row 1 ends with a new line character and that there are no mistakes with the clause itself.

init_data/events.csv

eventid1,55,THERMAL,2020–01–01 10:10:10eventid2,201,PARTICLE,2020–01–02 10:10:10

Gotchas

If you see something like “Data truncated for column ‘Value’ at row 1”, you might have a space messing up parsing. Leave no spaces between the delimiting commas and the field values.

Also, don’t put quotes around string values and don’t put quotes around the timestamp value. I know that I was tempted to, due to the space between the date and the time part of the value.

The load data statement can parse these values just fine on its own. If you get parse errors, they likely have to do with rogue spaces and end-of-line values.

Why so gotcha?

Why was it such a pain to load data files into the mysql server? Mysql server has a lot of fiddly configuration.

In order to see what I mean, ssh into your running mysql container and run:

mysqld --help

Notice how it gives you no information, but then tells you to run:

mysqld --help --verbose

That’s because there’s so much configuration. I bet that once upon a time, a PM brought up how unusable the help page is. Then when the team tried to refactor it there was too much to do. So, they just shoved it all behind this verbose flag with some messaging to warn users that they are about to be visually assaulted by several pages of information.

The most annoying part of trying to make this work was the winding path that I took in order to find a working solution and then the realization that the answer was in front of me all along.

At first, I used LOAD DATA LOCAL INFILE, which requires jumping through many more hoops than my ultimate solution. Both the client and the server must set the local_infile flag. I mounted a /etc/my.cnf file in order to do the server part of that. Then I wanted to make sure that the data was loaded on startup, so I elected to use the init_file flag for that:

Then I still needed to reassure mysql server that it was indeed okay to execute the init_file, so I had to set the secure_file_priv flag as well.

In the end, according to the logs it was this flag that refused to be set. How could this be, you might ask, since I used it successfully in my working version.

Between all of the moving parts, the three different flags, the fiddly details around client and server needing to opt in to load_infile, the load file syntax itself and the .sql file, doing this all at startup, which introduces a timing complication, not to mention that I was using the mysql docker image, which might behave differently that just running mysqld on its own, the real failure might not have been with secure_file_priv at all. Heck, the particular configuration I was trying to force might not even be possible in this scenario.

In the end, I decided to simplify. Since I was most paranoid about the mysql docker image being a special snowflake, I decided to read the mysql docker page in order to see what they recommend. I quickly got the init.sql file to execute. A few messages about syntax errors later, I was loading data into a database.

So, what did we learn today?

Read the docs, but also keep things simple. In my first go around, I certainly read the docs. I tried very hard to implement the mysql docs to the letter. We want however, to be less “Rube Goldberg machine” and more “I-know-how-it-works machine”. We want something more along the lines of several components with lighter configuration, instead of plugging everything into mysql and hoping that not only are all of the values set correctly, but also it all runs in the right order, permissions are correct, etc.

--

--