Commit 2aa4cd9

mhodge, freekmurze, and claude authored
Add trimValues() and formatValuesUsing() to the reader (#195)
* Add trimValues() and formatValuesUsing() to the reader

The reader could already clean up header names (trimHeaderRow, headersToSnakeCase, formatHeadersUsing) but had no equivalent for the actual cell values. This adds two symmetric reader methods:

- trimValues(?string $characters = null)
- formatValuesUsing(callable $callback) // receives ($value, $key)

Non-string values (e.g. dates) are left untouched by trimValues(). Formatting runs per-row inside the LazyCollection, preserving the package's low memory usage.

Includes tests and README docs.

* Strengthen no-header-row trim test to assert trimming on whitespace rows

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Freek Van der Herten <freek@spatie.be>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
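
The "per-row" point in the message can be illustrated without the package. The following is a minimal sketch that uses a plain PHP generator as a stand-in for the LazyCollection; `readRows()` and `cleanRows()` are hypothetical helpers invented for this illustration, not part of the library. Each row is cleaned only when the consumer pulls it, so only one row needs to be held in memory at a time.

```php
<?php

/**
 * Hypothetical stand-in for a streaming reader: yields one row at a time.
 */
function readRows(): Generator
{
    yield ['email' => ' john@example.com ', 'first_name' => ' john'];
    yield ['email' => 'jane@example.com', 'first_name' => 'jane '];
}

/**
 * Applies a per-row transformation lazily: a row is only processed
 * at the moment the consumer iterates to it.
 */
function cleanRows(iterable $rows, callable $processRow): Generator
{
    foreach ($rows as $row) {
        yield $processRow($row);
    }
}

// Trim string cells only; anything else is returned unchanged.
$trimRow = fn (array $row): array => array_map(
    fn (mixed $value): mixed => is_string($value) ? trim($value) : $value,
    $row,
);

$cleaned = iterator_to_array(cleanRows(readRows(), $trimRow), false);

print_r($cleaned);
```

Collecting the generator with `iterator_to_array()` is done here only to make the result easy to inspect; a real import would normally iterate the rows one by one.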
1 parent 1ffa9cf commit 2aa4cd9

5 files changed

Lines changed: 188 additions & 2 deletions

README.md

Lines changed: 40 additions & 0 deletions
@@ -260,6 +260,46 @@ $rows = SimpleExcelReader::create($pathToCsv)
     });
 ```

+#### Trimming and formatting values
+
+Just like you can clean up header names, you can clean up the data values themselves. This is handy when importing files where cells contain stray whitespace or need to be normalized.
+
+Use `trimValues()` to strip whitespace from every value.
+
+```csv
+email,first_name
+john@example.com , john
+jane@example.com,jane
+```
+
+```php
+$rows = SimpleExcelReader::create($pathToCsv)
+    ->trimValues()
+    ->getRows()
+    ->each(function(array $rowProperties) {
+        // in the first pass $rowProperties will contain
+        // ['email' => 'john@example.com', 'first_name' => 'john']
+    });
+```
+
+Like `trim`, `trimValues()` accepts an optional argument specifying which characters to trim. This argument is a *set of characters* stripped from both ends of each value (exactly like PHP's [`trim`](https://www.php.net/manual/en/function.trim.php)), **not** a suffix. For example, `trimValues('*')` removes any leading or trailing `*` from every value. Be careful with letters: `trimValues('.com')` would also turn `Tom` into `T`.
+
+```php
+$rows = SimpleExcelReader::create($pathToCsv)
+    ->trimValues('*')
+    ->getRows();
+```
+
+For full control, use `formatValuesUsing()` and pass a closure. The closure receives the value and its header key, so you can normalize values per column. Non-string values (such as dates read from an Excel file) are passed through untouched by `trimValues()`, but your own closure is responsible for handling them.
+
+```php
+$rows = SimpleExcelReader::create($pathToCsv)
+    ->formatValuesUsing(function ($value, $key) {
+        return $key === 'email' ? strtolower($value) : $value;
+    })
+    ->getRows();
+```
+
 #### Manually working with the reader object

 Under the hood this package uses the [openspout/spout](https://github.com/openspout/openspout) package. You can get to the underlying reader that implements `\OpenSpout\Reader\ReaderInterface` by calling the `getReader` method.
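
Two points from the README text above lend themselves to a quick check in plain PHP, with no package required: the character-set behaviour of `trim()`, and a formatter closure that guards against non-string cells. This is a sketch only; the closure name `$lowercaseEmail` and the sample values are made up for illustration.

```php
<?php

// trim()'s second argument is a set of characters, not a literal prefix or suffix.
$stripped = trim('*john*', '*');           // asterisks removed from both ends
$surprise = trim('Tom', '.com');           // 'o' and 'm' are in the set, so only 'T' remains
$default  = trim("  john@example.com\n");  // the default set covers whitespace

// A formatter that leaves non-string cells (for example dates) alone.
$lowercaseEmail = function (mixed $value, string|int $key): mixed {
    if (! is_string($value)) {
        return $value;
    }

    return $key === 'email' ? strtolower($value) : $value;
};

$date = new DateTimeImmutable('2024-01-01');

var_dump(
    $stripped,
    $surprise,
    $default,
    $lowercaseEmail('John@Example.COM', 'email'),
    $lowercaseEmail($date, 'created_at') === $date,
);
```

A closure shaped like `$lowercaseEmail` could be passed to `formatValuesUsing()`; the early `is_string()` return is what keeps a call such as `strtolower()` from ever receiving a date object.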

src/SimpleExcelReader.php

Lines changed: 46 additions & 2 deletions
@@ -26,6 +26,9 @@ class SimpleExcelReader
     protected bool $trimHeader = false;
     protected bool $headersToSnakeCase = false;
     protected bool $parseFormulas = true;
+    protected bool $trimValues = false;
+    protected ?string $trimValueCharacters = null;
+    protected mixed $formatValuesUsing = null;
     protected ?string $trimHeaderCharacters = null;
     protected mixed $formatHeadersUsing = null;
     protected ?array $headers = null;
@@ -160,6 +163,21 @@ public function headersToSnakeCase(): self
         return $this;
     }

+    public function trimValues(?string $characters = null): self
+    {
+        $this->trimValues = true;
+        $this->trimValueCharacters = $characters;
+
+        return $this;
+    }
+
+    public function formatValuesUsing(callable $callback): self
+    {
+        $this->formatValuesUsing = $callback;
+
+        return $this;
+    }
+
     public function keepFormulas()
     {
         $this->parseFormulas = false;
@@ -376,7 +394,7 @@ protected function getValueFromRow(Row $row): array
         $headers = $this->customHeaders ?: $this->headers;

         if (! $headers) {
-            return $values;
+            return $this->processValues($values);
         }

         $values = array_slice($values, 0, count($headers));
@@ -385,7 +403,33 @@ protected function getValueFromRow(Row $row): array
             $values[] = '';
         }

-        return array_combine($headers, $values);
+        return $this->processValues(array_combine($headers, $values));
+    }
+
+    protected function processValues(array $values): array
+    {
+        if ($this->trimValues) {
+            $values = array_map([$this, 'trimValue'], $values);
+        }
+
+        if ($this->formatValuesUsing) {
+            $keys = array_keys($values);
+            $formatted = array_map($this->formatValuesUsing, array_values($values), $keys);
+            $values = array_combine($keys, $formatted);
+        }
+
+        return $values;
+    }
+
+    protected function trimValue(mixed $value): mixed
+    {
+        if (! is_string($value)) {
+            return $value;
+        }
+
+        return is_null($this->trimValueCharacters)
+            ? trim($value)
+            : trim($value, $this->trimValueCharacters);
     }

     protected function getSheet(): SheetInterface
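
The trim-then-format order in `processValues()` can be tried in isolation. Below is a standalone approximation, not the class method itself: `processRow()` is a hypothetical free function, it always trims (the reader only does so after `trimValues()` has been called), and its `$characters` and `$formatter` parameters stand in for the reader's configured options.

```php
<?php

/**
 * Standalone approximation of the trim-then-format step.
 * Unlike the reader, this sketch always trims string values.
 */
function processRow(array $row, ?string $characters = null, ?callable $formatter = null): array
{
    // array_map() with a single array keeps the original keys.
    $row = array_map(
        fn (mixed $value): mixed => is_string($value)
            ? ($characters === null ? trim($value) : trim($value, $characters))
            : $value,
        $row,
    );

    if ($formatter === null) {
        return $row;
    }

    // array_map() with two arrays discards keys, so they are re-attached afterwards.
    $keys = array_keys($row);

    return array_combine($keys, array_map($formatter, array_values($row), $keys));
}

// With headers, the formatter sees the header name as the key.
$withHeaders = processRow(
    ['email' => ' John@Example.com ', 'first_name' => ' john'],
    null,
    fn ($value, $key) => $key === 'email' ? strtolower($value) : $value,
);

// Without a header row the keys are plain integer indexes.
$withoutHeaders = processRow([' a ', ' b '], null, fn ($value, $key) => "{$key}:{$value}");

var_dump($withHeaders, $withoutHeaders);
```

Because trimming runs first, the formatter already receives trimmed strings, which matches the behaviour exercised by the "can trim and format values together" test.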

tests/SimpleExcelReaderTest.php

Lines changed: 97 additions & 0 deletions
@@ -496,6 +496,103 @@ function () {
     ]);
 });

+it('can trim whitespace from values', function () {
+    $rows = SimpleExcelReader::create(getStubPath('values-with-spaces.csv'))
+        ->trimValues()
+        ->getRows()
+        ->toArray();
+
+    expect($rows)->toEqual([
+        [
+            'email' => 'john@example.com',
+            'first_name' => 'john',
+            'last_name' => 'doe',
+        ],
+        [
+            'email' => 'mary-jane@example.com',
+            'first_name' => 'mary jane',
+            'last_name' => 'doe',
+        ],
+    ]);
+});
+
+it('can trim values with custom characters', function () {
+    $rows = SimpleExcelReader::create(getStubPath('padded-values.csv'))
+        ->trimValues('*')
+        ->getRows()
+        ->first();
+
+    expect($rows)->toEqual([
+        'email' => 'john@example.com',
+        'first_name' => 'john',
+    ]);
+});
+
+it('can format values using a callback', function () {
+    $rows = SimpleExcelReader::create(getStubPath('header-and-rows.csv'))
+        ->formatValuesUsing(fn ($value) => strtoupper($value))
+        ->getRows()
+        ->first();
+
+    expect($rows)->toEqual([
+        'email' => 'JOHN@EXAMPLE.COM',
+        'first_name' => 'JOHN',
+        'last_name' => 'DOE',
+    ]);
+});
+
+it('passes the header key to the value formatter', function () {
+    $rows = SimpleExcelReader::create(getStubPath('header-and-rows.csv'))
+        ->formatValuesUsing(fn ($value, $key) => $key === 'email' ? strtoupper($value) : $value)
+        ->getRows()
+        ->first();
+
+    expect($rows)->toEqual([
+        'email' => 'JOHN@EXAMPLE.COM',
+        'first_name' => 'john',
+        'last_name' => 'doe',
+    ]);
+});
+
+it('can trim and format values together', function () {
+    $rows = SimpleExcelReader::create(getStubPath('values-with-spaces.csv'))
+        ->trimValues()
+        ->formatValuesUsing(fn ($value) => strtoupper($value))
+        ->getRows()
+        ->first();
+
+    expect($rows)->toEqual([
+        'email' => 'JOHN@EXAMPLE.COM',
+        'first_name' => 'JOHN',
+        'last_name' => 'DOE',
+    ]);
+});
+
+it('can format values when there is no header row', function () {
+    $rows = SimpleExcelReader::create(getStubPath('values-with-spaces.csv'))
+        ->noHeaderRow()
+        ->trimValues()
+        ->getRows()
+        ->toArray();
+
+    expect($rows)->toEqual([
+        ['email', 'first_name', 'last_name'],
+        ['john@example.com', 'john', 'doe'],
+        ['mary-jane@example.com', 'mary jane', 'doe'],
+    ]);
+});
+
+it('does not trim non-string values', function () {
+    $dates = SimpleExcelReader::create(getStubPath('formatted_dates.xlsx'))
+        ->trimValues()
+        ->getRows()
+        ->pluck('created_at')
+        ->toArray();
+
+    expect($dates[0])->toBeInstanceOf(DateTimeImmutable::class);
+    expect($dates[1])->toBeInstanceOf(DateTimeImmutable::class);
+});
+
 it('can retrieve rows with a different delimiter', function () {
     $rows = SimpleExcelReader::create(getStubPath('header-and-rows-other-delimiter.csv'))
         ->useDelimiter(';')

tests/stubs/padded-values.csv

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+email,first_name
+*john@example.com*,*john*

tests/stubs/values-with-spaces.csv

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+email,first_name,last_name
+john@example.com , john,doe
+mary-jane@example.com,mary jane , doe
