Pandas: Looking up the list of sheets in an Excel file

lottoking 2020. 8. 20. 19:07

Newer versions of pandas load Excel files through the following interface:

read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

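But what if I don't know which sheets are available?

For example, I am working with Excel files that contain the following sheets:

Data 1, Data 2, ..., Data N, foo, bar

but I don't know N a priori.

Is there a way to get the list of sheets from an Excel document in pandas?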

The ExcelFile class (and its sheet_names attribute) is still available:

import pandas as pd

xl = pd.ExcelFile('foo.xls')
xl.sheet_names  # see all sheet names
xl.parse(sheet_name)  # read a specific sheet to DataFrame

See the parse documentation for more options...
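The question asks how to handle sheets named Data 1 through Data N when N is unknown. The following is a minimal sketch of one way to do that on top of sheet_names. The helper names data_sheet_names and read_data_sheets and the "Data" prefix pattern are assumptions made for illustration; they are not part of pandas.

```python
import re


def data_sheet_names(sheet_names, prefix="Data"):
    """Return the sheets named '<prefix> <number>', sorted by that number.

    Hypothetical helper: the naming pattern is assumed from the question.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}\s*(\d+)$")
    matched = []
    for name in sheet_names:
        m = pattern.match(name)
        if m:
            matched.append((int(m.group(1)), name))
    # Sort numerically so that 'Data 10' comes after 'Data 2'.
    return [name for _, name in sorted(matched)]


def read_data_sheets(path, prefix="Data"):
    """Read every matching sheet into a dict of DataFrames (needs pandas)."""
    import pandas as pd  # imported lazily so the helper above works without it

    with pd.ExcelFile(path) as xl:
        wanted = data_sheet_names(xl.sheet_names, prefix)
        return {name: xl.parse(name) for name in wanted}
```

With this, read_data_sheets('foo.xls') would return only the Data sheets and skip foo and bar, without knowing N in advance.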

You need to explicitly pass the second parameter (sheetname, named sheet_name in current pandas) as None, like this:

df = pandas.read_excel("/yourPath/FileName.xlsx", None)

"df" is then a dictionary of DataFrames, one per sheet, which you can check by running:

df.keys()

The result looks something like this:

[u'201610', u'201601', u'201701', u'201702', u'201703', u'201704', u'201705', u'201706', u'201612', u'fund', u'201603', u'201602', u'201605', u'201607', u'201606', u'201608', u'201512', u'201611', u'201604']

See the pandas documentation for details: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

This is the fastest way I have found, inspired by @diving. All of the answers based on xlrd, openpyxl, or pandas are slow because they load the entire file first.

from zipfile import ZipFile
from bs4 import BeautifulSoup  # you also need to install "lxml" for the XML parser

with ZipFile(file) as zipped_file:  # file: path to the .xlsx file
    summary = zipped_file.open(r'xl/workbook.xml').read()
soup = BeautifulSoup(summary, "xml")
sheets = [sheet.get("name") for sheet in soup.find_all("sheet")]

Building on @dhwanil_shah's answer, there is no need to extract the whole file. With zf.open you can read directly from the zipped file.

import xml.etree.ElementTree as ET
import zipfile

def xlsxSheets(f):
    zf = zipfile.ZipFile(f)

    f = zf.open(r'xl/workbook.xml')

    l = f.readline()
    l = f.readline()
    root = ET.fromstring(l)
    sheets=[]
    for c in root.findall('{http://schemas.openxmlformats.org/spreadsheetml/2006/main}sheets/*'):
        sheets.append(c.attrib['name'])
    return sheets

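The two consecutive readline calls are ugly, but the content is only on the second line of the text, so there is no need to parse the whole file.

This solution is much faster than the read_excel version, and most likely faster than the full-extraction version as well.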

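I tried xlrd, pandas, openpyxl, and other such libraries, and because they read the entire file, all of them seem to take exponentially more time as the file size grows. The other solutions mentioned above that use 'on_demand' did not work for me. If you only want to get the sheet names up front, the following function works for xlsx files.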

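import os
import shutil
import zipfile

import xmltodict  # third-party package
from django.conf import settings  # assumption: settings.MEDIA_ROOT suggests Django settings

def get_sheet_details(file_path):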
    sheets = []
    file_name = os.path.splitext(os.path.split(file_path)[-1])[0]
    # Make a temporary directory with the file name
    directory_to_extract_to = os.path.join(settings.MEDIA_ROOT, file_name)
    os.mkdir(directory_to_extract_to)

    # Extract the xlsx file as it is just a zip file
    zip_ref = zipfile.ZipFile(file_path, 'r')
    zip_ref.extractall(directory_to_extract_to)
    zip_ref.close()

    # Open the workbook.xml which is very light and only has meta data, get sheets from it
    path_to_workbook = os.path.join(directory_to_extract_to, 'xl', 'workbook.xml')
    with open(path_to_workbook, 'r') as f:
        xml = f.read()
        dictionary = xmltodict.parse(xml)
        for sheet in dictionary['workbook']['sheets']['sheet']:
            sheet_details = {
                'id': sheet['@sheetId'],
                'name': sheet['@name']
            }
            sheets.append(sheet_details)

    # Delete the extracted files directory
    shutil.rmtree(directory_to_extract_to)
    return sheets

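Since every xlsx file is basically a zipped archive, we extract the underlying XML data and read the sheet names directly from the workbook. This takes a fraction of a second compared to the library functions.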

Benchmark (6 MB xlsx file with 4 sheets):
Pandas, xlrd: 12 seconds
openpyxl: 24 seconds
Proposed method: 0.4 seconds
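To reproduce this kind of comparison on your own files, a small timing helper may be enough. This is a minimal sketch, not the harness used for the numbers above; the name time_call and the best-of-three default are assumptions.

```python
import time


def time_call(func, *args, repeat=3, **kwargs):
    """Call func(*args, **kwargs) `repeat` times.

    Returns (result of the last call, best wall-clock time in seconds).
    """
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best
```

For example, time_call(get_sheet_details, file_path) could be compared against time_call(lambda: pd.ExcelFile(file_path).sheet_names) for the same file.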

Since my requirement was just to read the sheet names, the unnecessary overhead of reading the whole file was bugging me, so I took this route instead.

Reference URL: https://stackoverflow.com/questions/17977540/pandas-looking-up-the-list-of-sheets-in-an-excel-file
