You've probably come across a situation where you needed to do some kind of mass update or sync with external data. A naive approach might look something like this:
collect($csvContacts)->each(function (array $row) {
    $contact = Contact::updateOrCreate(
        ['email' => $row['email']],
        ['name' => $row['name'], 'address' => $row['address']]
    );
});
Under the hood, this will make two queries for each record. First, it runs a query to find the first row matching all of the key/value pairs in the first array. If a matching row exists, it is updated with the attributes in the second array; otherwise, a new row is inserted.
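Conceptually, each iteration of that loop behaves roughly like the sketch below (a simplification of what updateOrCreate does internally, using the same Contact model and columns):

// Query 1: look the contact up by email.
$contact = Contact::where('email', $row['email'])->first();

if ($contact !== null) {
    // Query 2: a matching row exists, so update it.
    $contact->update(['name' => $row['name'], 'address' => $row['address']]);
} else {
    // Query 2: no match, so insert a new row.
    Contact::create([
        'email' => $row['email'],
        'name' => $row['name'],
        'address' => $row['address'],
    ]);
}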
With small imports this will work fine, but imagine you had 100,000 rows in your CSV. This would result in 200,000 queries, which is going to take forever.
When using upserts, a single query is made containing all of the rows. This query includes ON DUPLICATE KEY UPDATE on MySQL and ON CONFLICT ... DO UPDATE SET on Postgres, which instructs the database to update records that already exist.
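Laravel exposes this through the upsert method on Eloquent models and the query builder. As a small illustration (the contact data here is made up), a call like the one below compiles down to a single query using the clauses above:

Contact::upsert(
    [
        ['email' => 'jane@example.com', 'name' => 'Jane', 'address' => '1 Main St'],
        ['email' => 'joe@example.com', 'name' => 'Joe', 'address' => '2 High St'],
    ],
    ['email'],            // column(s) that uniquely identify a row
    ['name', 'address']   // columns to update when a matching row already exists
);

The second argument is the column (or columns) used to detect an existing row, and the optional third argument limits which columns are updated when a match is found; if you omit it, every column in the values array is updated.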
It's important to note that the database must have either a primary or unique index on the column you are upserting on. If you forget to add the index, you'll get an error.
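If the contacts table doesn't have one yet, a unique index can be added with a migration along these lines (assuming the table and column names used throughout this post):

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::table('contacts', function (Blueprint $table) {
            // upsert() relies on this unique index to detect rows that already exist.
            $table->unique('email');
        });
    }

    public function down(): void
    {
        Schema::table('contacts', function (Blueprint $table) {
            $table->dropUnique(['email']);
        });
    }
};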
Obviously, making a single query is much more efficient, and it lets the database detect duplicate rows internally (which is very quick, since you have an index on the column).
It's also a good idea to chunk the rows into batches, especially if you're inserting or updating a lot of data. If you don't, you may hit query size limits on some databases.
collect($csvContacts)
    ->map(function (array $row) {
        // Keep only the columns we actually want to upsert.
        return Arr::only($row, ['email', 'name', 'address']);
    })
    ->chunk(1000)
    ->each(function (Collection $chunk) {
        // upsert() expects a plain array, so convert the chunk first.
        Contact::upsert($chunk->all(), 'email');
    });